I wrote the function ‘create_matrices()’ to collect mutation counts. In theory, its results should be able to address most questions about the counts of mutations observed in the data.
Categorize the data with at least 3 indexes per mutant
## Loading Rerrrt
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:hpgltools':
##
## combine
## The following object is masked from 'package:Biobase':
##
## combine
## The following objects are masked from 'package:BiocGenerics':
##
## combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Loading required package: tidyr
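sample_sheet <- "sample_sheets/all_samples.xlsx"
ident_column <- "identtable"    ## Sample sheet column naming the identical-read files.
mut_column <- "mutationtable"   ## Sample sheet column naming the mutation files.
min_reads <- 3                  ## Drop indexes with fewer than this many reads.
min_indexes <- 3                ## Keep mutations observed in at least this many indexes.
min_sequencer <- 10
min_position <- 24              ## Ignore differences before this position.
max_position <- 176             ## Ignore differences after this position.
max_mutations_per_read <- NULL  ## Defined here, but not passed to create_matrices() below.
prune_n <- TRUE                 ## Remove reads with 'N' as the hit.
verbose <- TRUE
excel <- "excel/triples.xlsx"   ## Output workbook for the resulting tables.
triples <- create_matrices(sample_sheet=sample_sheet,
                           ident_column=ident_column, mut_column=mut_column,
                           min_reads=min_reads, min_indexes=min_indexes,
                           min_sequencer=min_sequencer,
                           min_position=min_position, max_position=max_position,
                           prune_n=prune_n, verbose=verbose, excel=excel)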
## Starting sample: 1.
## Reading the file containing mutations: preprocessing/s1/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s1/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 1156535 reads.
## Mutation data: after min-position pruning, there are: 1037310 reads: 119225 lost or 10.31%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 1037310 reads.
## Mutation data: after max-position pruning, there are: 968161 reads: 69149 lost or 6.67%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 953181 reads: 14980 lost or 1.55%.
## Mutation data: all filters removed 203354 reads, or 17.58%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1742165 indexes in all the data.
## After reads/index pruning, there are: 837608 indexes: 904557 lost or 51.92%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 953181 changed reads.
## All data: before reads/index pruning, there are: 4681501 identical reads.
## All data: after index pruning, there are: 491995 changed reads: 51.62%.
## All data: after index pruning, there are: 3663004 identical reads: 78.24%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 3663004 identical reads.
## Before classification, there are 491995 reads with mutations.
## After classification, there are 2738199 reads/indexes which are only identical.
## After classification, there are 11023 reads/indexes which are strictly sequencer.
## After classification, there are 26963 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 7018785 forward reads and 7148314 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: 2.
## Reading the file containing mutations: preprocessing/s2/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s2/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 3421203 reads.
## Mutation data: after min-position pruning, there are: 1758479 reads: 1662724 lost or 48.60%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 1758479 reads.
## Mutation data: after max-position pruning, there are: 1667302 reads: 91177 lost or 5.18%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 1642969 reads: 24333 lost or 1.46%.
## Mutation data: all filters removed 1778234 reads, or 51.98%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1261478 indexes in all the data.
## After reads/index pruning, there are: 693725 indexes: 567753 lost or 45.01%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 1642969 changed reads.
## All data: before reads/index pruning, there are: 5230976 identical reads.
## All data: after index pruning, there are: 814407 changed reads: 49.57%.
## All data: after index pruning, there are: 4834092 identical reads: 92.41%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 4834092 identical reads.
## Before classification, there are 814407 reads with mutations.
## After classification, there are 2802107 reads/indexes which are only identical.
## After classification, there are 111708 reads/indexes which are strictly sequencer.
## After classification, there are 126921 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 11803361 forward reads and 12275547 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: 3.
## Reading the file containing mutations: preprocessing/s3/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s3/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 4309681 reads.
## Mutation data: after min-position pruning, there are: 1564155 reads: 2745526 lost or 63.71%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 1564155 reads.
## Mutation data: after max-position pruning, there are: 1482559 reads: 81596 lost or 5.22%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 1452047 reads: 30512 lost or 2.06%.
## Mutation data: all filters removed 2857634 reads, or 66.31%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 884042 indexes in all the data.
## After reads/index pruning, there are: 463445 indexes: 420597 lost or 47.58%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 1452047 changed reads.
## All data: before reads/index pruning, there are: 3583390 identical reads.
## All data: after index pruning, there are: 730397 changed reads: 50.30%.
## All data: after index pruning, there are: 3332136 identical reads: 92.99%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 3332136 identical reads.
## Before classification, there are 730397 reads with mutations.
## After classification, there are 1851177 reads/indexes which are only identical.
## After classification, there are 90341 reads/indexes which are strictly sequencer.
## After classification, there are 244494 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 9104237 forward reads and 9257103 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Plotting index densities.
## Skipping table: delete_reads_by_position
## Skipping table: delete_indexes_by_position
## Skipping table: delete_sequencer_by_position
## Skipping table: delete_reads_by_nt
## Skipping table: delete_indexes_by_nt
## Skipping table: delete_sequencer_by_nt
## Deleting the file excel/triples.xlsx before writing the tables.
## Warning: Removed 76384 rows containing non-finite values (stat_density).
## Warning: Removed 76384 rows containing non-finite values (stat_density).
## Warning: Removed 3693 rows containing non-finite values (stat_density).
## Warning: Removed 3693 rows containing non-finite values (stat_density).
##                         Length Class      Mode   
## metadata                 5     data.frame list   
## samples                  3     -none-     list   
## filtered                18     -none-     numeric
## reads_remaining         24     -none-     numeric
## indexes_remaining       15     -none-     numeric
## reads_per_sample         3     -none-     numeric
## indexes_per_sample       3     -none-     numeric
## matrices                33     -none-     list   
## matrices_cpm            33     -none-     list   
## matrices_cpmlength      33     -none-     list   
## matrices_counts         33     -none-     list   
## matrices_countslength   33     -none-     list   
## pre_index_density_plot   9     gg         list   
## post_index_density_plot  9     gg         list   
## plots                    5     -none-     list   
## excel                    1     -none-     numeric
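As a sanity check on the log, the percentages reported for sample 1 can be recomputed from the read counts it prints. The following is a minimal sketch in base R; the helpers `pct_lost()` and `pct_kept()` are hypothetical names introduced here for illustration and are not part of Rerrrt.

```r
## Read counts copied from the sample 1 log.
start_reads   <- 1156535  # before any pruning
after_min_pos <- 1037310  # after removing differences before position 24
after_max_pos <- 968161   # after removing differences after position 176
after_n       <- 953181   # after removing reads with 'N' as the hit

## Hypothetical helpers (not Rerrrt functions).
pct_lost <- function(before, after) round(100 * (before - after) / before, 2)
pct_kept <- function(before, after) round(100 * after / before, 2)

## The position and N filters report the percentage of reads lost at each step.
lost <- c(
  min_position = pct_lost(start_reads, after_min_pos),
  max_position = pct_lost(after_min_pos, after_max_pos),
  n_pruning    = pct_lost(after_max_pos, after_n),
  all_filters  = pct_lost(start_reads, after_n)
)
print(lost)

## The reads/index pruning lines for reads match the percentage kept instead.
kept <- c(
  changed_reads   = pct_kept(953181, 491995),
  identical_reads = pct_kept(4681501, 3663004)
)
print(kept)
```

These values agree with the log: 10.31, 6.67, 1.55 and 17.58 percent lost for the position and N filters, and 51.62 and 78.24 percent for the read counts after index pruning. That last pair only matches when computed as the fraction kept, so those two lines appear to report reads remaining rather than reads lost.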
Categorize the data with at least 5 indexes per mutant
## Starting sample: 1.
## Reading the file containing mutations: preprocessing/s1/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s1/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 1156535 reads.
## Mutation data: after min-position pruning, there are: 1037310 reads: 119225 lost or 10.31%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 1037310 reads.
## Mutation data: after max-position pruning, there are: 968161 reads: 69149 lost or 6.67%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 953181 reads: 14980 lost or 1.55%.
## Mutation data: all filters removed 203354 reads, or 17.58%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1742165 indexes in all the data.
## After reads/index pruning, there are: 837608 indexes: 904557 lost or 51.92%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 953181 changed reads.
## All data: before reads/index pruning, there are: 4681501 identical reads.
## All data: after index pruning, there are: 491995 changed reads: 51.62%.
## All data: after index pruning, there are: 3663004 identical reads: 78.24%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 3663004 identical reads.
## Before classification, there are 491995 reads with mutations.
## After classification, there are 2738199 reads/indexes which are only identical.
## After classification, there are 11023 reads/indexes which are strictly sequencer.
## After classification, there are 26963 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 7018785 forward reads and 7148314 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: 2.
## Reading the file containing mutations: preprocessing/s2/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s2/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 3421203 reads.
## Mutation data: after min-position pruning, there are: 1758479 reads: 1662724 lost or 48.60%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 1758479 reads.
## Mutation data: after max-position pruning, there are: 1667302 reads: 91177 lost or 5.18%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 1642969 reads: 24333 lost or 1.46%.
## Mutation data: all filters removed 1778234 reads, or 51.98%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1261478 indexes in all the data.
## After reads/index pruning, there are: 693725 indexes: 567753 lost or 45.01%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 1642969 changed reads.
## All data: before reads/index pruning, there are: 5230976 identical reads.
## All data: after index pruning, there are: 814407 changed reads: 49.57%.
## All data: after index pruning, there are: 4834092 identical reads: 92.41%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 4834092 identical reads.
## Before classification, there are 814407 reads with mutations.
## After classification, there are 2802107 reads/indexes which are only identical.
## After classification, there are 111708 reads/indexes which are strictly sequencer.
## After classification, there are 126921 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 11803361 forward reads and 12275547 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: 3.
## Reading the file containing mutations: preprocessing/s3/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s3/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 4309681 reads.
## Mutation data: after min-position pruning, there are: 1564155 reads: 2745526 lost or 63.71%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 1564155 reads.
## Mutation data: after max-position pruning, there are: 1482559 reads: 81596 lost or 5.22%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 1452047 reads: 30512 lost or 2.06%.
## Mutation data: all filters removed 2857634 reads, or 66.31%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 884042 indexes in all the data.
## After reads/index pruning, there are: 463445 indexes: 420597 lost or 47.58%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 1452047 changed reads.
## All data: before reads/index pruning, there are: 3583390 identical reads.
## All data: after index pruning, there are: 730397 changed reads: 50.30%.
## All data: after index pruning, there are: 3332136 identical reads: 92.99%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 3332136 identical reads.
## Before classification, there are 730397 reads with mutations.
## After classification, there are 1851177 reads/indexes which are only identical.
## After classification, there are 90341 reads/indexes which are strictly sequencer.
## After classification, there are 244494 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 9104237 forward reads and 9257103 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Plotting index densities.
## Skipping table: delete_reads_by_position
## Skipping table: delete_indexes_by_position
## Skipping table: delete_sequencer_by_position
## Skipping table: delete_reads_by_nt
## Skipping table: delete_indexes_by_nt
## Skipping table: delete_sequencer_by_nt
##                         Length Class      Mode   
## metadata                 5     data.frame list   
## samples                  3     -none-     list   
## filtered                18     -none-     numeric
## reads_remaining         24     -none-     numeric
## indexes_remaining       15     -none-     numeric
## reads_per_sample         3     -none-     numeric
## indexes_per_sample       3     -none-     numeric
## matrices                33     -none-     list   
## matrices_cpm            33     -none-     list   
## matrices_cpmlength      33     -none-     list   
## matrices_counts         33     -none-     list   
## matrices_countslength   33     -none-     list   
## pre_index_density_plot   9     gg         list   
## post_index_density_plot  9     gg         list   
## plots                    5     -none-     list   
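In the log, the only line that differs between the 3-index and 5-index runs is the "Subsetting based on mutations with at least N indexes" step. Everything upstream of it reports the same counts. The sketch below illustrates what such a threshold does on invented toy data. It is an assumption about the general idea, not the Rerrrt implementation: the helper `filter_by_indexes()`, the column names, and the mutation string format are all hypothetical.

```r
## Toy data, invented for illustration: one row per (index, mutation) observation.
toy <- data.frame(
  index = c("AAA", "AAC", "AAG", "AAT", "ACA", "ACC", "ACG", "ACT", "AGA"),
  mutation = c(rep("50_A_G", 5), rep("72_C_T", 3), "101_G_A"),
  stringsAsFactors = FALSE
)

## Hypothetical helper (not a Rerrrt function): keep only the mutations
## which were observed in at least min_indexes distinct indexes.
filter_by_indexes <- function(df, min_indexes) {
  unique_pairs <- unique(df[, c("index", "mutation")])
  indexes_per_mutation <- table(unique_pairs$mutation)
  keep <- names(indexes_per_mutation)[indexes_per_mutation >= min_indexes]
  df[df$mutation %in% keep, , drop = FALSE]
}

kept_3 <- filter_by_indexes(toy, min_indexes = 3)
kept_5 <- filter_by_indexes(toy, min_indexes = 5)
print(unique(kept_3$mutation))
print(unique(kept_5$mutation))
```

With this toy input, a threshold of 3 keeps the mutations seen in five and three indexes, while a threshold of 5 keeps only the first. The mutation seen in a single index is dropped in both cases.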
## Starting sample: 1.
## Reading the file containing mutations: preprocessing/s1/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s1/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 1156535 reads.
## Mutation data: after min-position pruning, there are: 1037310 reads: 119225 lost or 10.31%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 1037310 reads.
## Mutation data: after max-position pruning, there are: 968161 reads: 69149 lost or 6.67%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 953181 reads: 14980 lost or 1.55%.
## Mutation data: all filters removed 203354 reads, or 17.58%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1742165 indexes in all the data.
## After reads/index pruning, there are: 837608 indexes: 904557 lost or 51.92%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 953181 changed reads.
## All data: before reads/index pruning, there are: 4681501 identical reads.
## All data: after index pruning, there are: 491995 changed reads: 51.62%.
## All data: after index pruning, there are: 3663004 identical reads: 78.24%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 3663004 identical reads.
## Before classification, there are 491995 reads with mutations.
## After classification, there are 2738199 reads/indexes which are only identical.
## After classification, there are 11023 reads/indexes which are strictly sequencer.
## After classification, there are 26963 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 7018785 forward reads and 7148314 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: 2.
## Reading the file containing mutations: preprocessing/s2/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s2/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 3421203 reads.
## Mutation data: after min-position pruning, there are: 1758479 reads: 1662724 lost or 48.60%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 1758479 reads.
## Mutation data: after max-position pruning, there are: 1667302 reads: 91177 lost or 5.18%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 1642969 reads: 24333 lost or 1.46%.
## Mutation data: all filters removed 1778234 reads, or 51.98%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1261478 indexes in all the data.
## After reads/index pruning, there are: 693725 indexes: 567753 lost or 45.01%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 1642969 changed reads.
## All data: before reads/index pruning, there are: 5230976 identical reads.
## All data: after index pruning, there are: 814407 changed reads: 49.57%.
## All data: after index pruning, there are: 4834092 identical reads: 92.41%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 4834092 identical reads.
## Before classification, there are 814407 reads with mutations.
## After classification, there are 2802107 reads/indexes which are only identical.
## After classification, there are 111708 reads/indexes which are strictly sequencer.
## After classification, there are 126921 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 11803361 forward reads and 12275547 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: 3.
## Reading the file containing mutations: preprocessing/s3/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s3/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 4309681 reads.
## Mutation data: after min-position pruning, there are: 1564155 reads: 2745526 lost or 63.71%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 1564155 reads.
## Mutation data: after max-position pruning, there are: 1482559 reads: 81596 lost or 5.22%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 1452047 reads: 30512 lost or 2.06%.
## Mutation data: all filters removed 2857634 reads, or 66.31%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 884042 indexes in all the data.
## After reads/index pruning, there are: 463445 indexes: 420597 lost or 47.58%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 1452047 changed reads.
## All data: before reads/index pruning, there are: 3583390 identical reads.
## All data: after index pruning, there are: 730397 changed reads: 50.30%.
## All data: after index pruning, there are: 3332136 identical reads: 92.99%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 3332136 identical reads.
## Before classification, there are 730397 reads with mutations.
## After classification, there are 1851177 reads/indexes which are only identical.
## After classification, there are 90341 reads/indexes which are strictly sequencer.
## After classification, there are 244494 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 9104237 forward reads and 9257103 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Plotting index densities.
## Skipping table: delete_reads_by_position
## Skipping table: delete_indexes_by_position
## Skipping table: delete_sequencer_by_position
## Skipping table: delete_reads_by_nt
## Skipping table: delete_indexes_by_nt
## Skipping table: delete_sequencer_by_nt
## Length Class Mode
## metadata 5 data.frame list
## samples 3 -none- list
## filtered 18 -none- numeric
## reads_remaining 24 -none- numeric
## indexes_remaining 15 -none- numeric
## reads_per_sample 3 -none- numeric
## indexes_per_sample 3 -none- numeric
## matrices 33 -none- list
## matrices_cpm 33 -none- list
## matrices_cpmlength 33 -none- list
## matrices_counts 33 -none- list
## matrices_countslength 33 -none- list
## pre_index_density_plot 9 gg list
## post_index_density_plot 9 gg list
## plots 5 -none- list
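As a quick sanity check, the per-step losses reported in the log can be recomputed from the read counts it prints. The following is a minimal sketch in plain R, using only the sample 1 numbers copied from the log above; the variable names (`reads`, `lost`, `pct_lost`, `total_lost`, `total_pct`) are made up for this illustration and are not part of `create_matrices()` or its return value.

```r
# Read counts for sample 1, copied by hand from the log output above.
# These names are hypothetical; they do not come from create_matrices().
reads <- c(start = 1156535,
           min_position = 1037310,
           max_position = 968161,
           n_pruned = 953181)

# Reads lost at each filtering step.
lost <- -diff(reads)

# Percentage lost at each step, relative to the reads entering that step.
pct_lost <- round(100 * lost / head(reads, -1), 2)

# Overall loss relative to the starting number of reads.
total_lost <- reads[["start"]] - reads[["n_pruned"]]
total_pct <- round(100 * total_lost / reads[["start"]], 2)

print(data.frame(lost = lost, pct_lost = pct_lost))
message("All filters removed ", total_lost, " reads, or ", total_pct, "%.")
```

The same arithmetic applies to samples 2 and 3 by substituting their counts from the log. Note that each per-step percentage is relative to the reads entering that step, while the "all filters" percentage is relative to the starting count.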