I wrote the function ‘create_matrices()’ to collect mutation counts. In theory, its results should be able to address most questions about the counts of mutations observed in the data.
Categorize the data with at least 3 indexes per mutant
devtools::load_all("Rerrrt")
## Loading Rerrrt
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:hpgltools':
##
## combine
## The following object is masked from 'package:testthat':
##
## matches
## The following object is masked from 'package:Biobase':
##
## combine
## The following objects are masked from 'package:BiocGenerics':
##
## combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Loading required package: tidyr
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:testthat':
##
## matches
sample_sheet <- "sample_sheets/new_samples.xlsx"
ident_column <- "identtable"
mut_column <- "mutationtable"
min_reads <- 3
min_indexes <- 3
min_sequencer <- 6
min_position <- 24
max_position <- 176
max_mutations_per_read <- NULL
prune_n <- TRUE
verbose <- TRUE
excel <- glue::glue("excel/{rundate}_new_triples-v{ver}.xlsx")
triples <- create_matrices(sample_sheet=sample_sheet,
                           ident_column=ident_column, mut_column=mut_column,
                           min_reads=min_reads, min_indexes=min_indexes,
                           min_sequencer=min_sequencer,
                           min_position=min_position, max_position=max_position,
                           prune_n=prune_n, verbose=verbose, excel=excel)
## Dropped 6 rows from the sample metadata because they were blank.
## Starting sample: s7_control_dna.
## Reading the file containing mutations: preprocessing/s7_control_dna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s7_control_dna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 548445 reads.
## Mutation data: after min-position pruning, there are: 525666 reads: 22779 lost or 4.15%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 525666 reads.
## Mutation data: after max-position pruning, there are: 482118 reads: 43548 lost or 8.28%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 468057 reads: 14061 lost or 2.92%.
## Mutation data: all filters removed 80388 reads, or 14.66%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1822433 indexes in all the data.
## After reads/index pruning, there are: 211096 indexes: 1611337 lost or 88.42%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 468057 changed reads.
## All data: before reads/index pruning, there are: 2421460 identical reads.
## All data: after index pruning, there are: 121275 changed reads: 25.91%.
## All data: after index pruning, there are: 628664 identical reads: 25.96%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 628664 identical reads.
## Before classification, there are 121275 reads with mutations.
## After classification, there are 543985 reads/indexes which are only identical.
## After classification, there are 1636 reads/indexes which are strictly sequencer.
## After classification, there are 55444 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 982674 forward reads and 1112622 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s9_low_dna.
## Reading the file containing mutations: preprocessing/s9_low_dna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s9_low_dna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 418833 reads.
## Mutation data: after min-position pruning, there are: 398387 reads: 20446 lost or 4.88%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 398387 reads.
## Mutation data: after max-position pruning, there are: 357781 reads: 40606 lost or 10.19%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 350717 reads: 7064 lost or 1.97%.
## Mutation data: all filters removed 68116 reads, or 16.26%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1945123 indexes in all the data.
## After reads/index pruning, there are: 176434 indexes: 1768689 lost or 90.93%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 350717 changed reads.
## All data: before reads/index pruning, there are: 2566212 identical reads.
## All data: after index pruning, there are: 73320 changed reads: 20.91%.
## All data: after index pruning, there are: 539291 identical reads: 21.02%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 539291 identical reads.
## Before classification, there are 73320 reads with mutations.
## After classification, there are 469094 reads/indexes which are only identical.
## After classification, there are 1004 reads/indexes which are strictly sequencer.
## After classification, there are 18533 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 770075 forward reads and 900874 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s11_high_dna.
## Reading the file containing mutations: preprocessing/s11_high_dna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s11_high_dna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 344009 reads.
## Mutation data: after min-position pruning, there are: 325611 reads: 18398 lost or 5.35%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 325611 reads.
## Mutation data: after max-position pruning, there are: 288540 reads: 37071 lost or 11.39%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 282511 reads: 6029 lost or 2.09%.
## Mutation data: all filters removed 61498 reads, or 17.88%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1390098 indexes in all the data.
## After reads/index pruning, there are: 257737 indexes: 1132361 lost or 81.46%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 282511 changed reads.
## All data: before reads/index pruning, there are: 2240589 identical reads.
## All data: after index pruning, there are: 99596 changed reads: 35.25%.
## All data: after index pruning, there are: 869226 identical reads: 38.79%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 869226 identical reads.
## Before classification, there are 99596 reads with mutations.
## After classification, there are 743162 reads/indexes which are only identical.
## After classification, there are 5098 reads/indexes which are strictly sequencer.
## After classification, there are 11216 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1339382 forward reads and 1519238 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s8_control_rna.
## Reading the file containing mutations: preprocessing/s8_control_rna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s8_control_rna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 267014 reads.
## Mutation data: after min-position pruning, there are: 253197 reads: 13817 lost or 5.17%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 253197 reads.
## Mutation data: after max-position pruning, there are: 225653 reads: 27544 lost or 10.88%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 220454 reads: 5199 lost or 2.30%.
## Mutation data: all filters removed 46560 reads, or 17.44%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1153820 indexes in all the data.
## After reads/index pruning, there are: 209685 indexes: 944135 lost or 81.83%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 220454 changed reads.
## All data: before reads/index pruning, there are: 1854668 identical reads.
## All data: after index pruning, there are: 75760 changed reads: 34.37%.
## All data: after index pruning, there are: 710768 identical reads: 38.32%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 710768 identical reads.
## Before classification, there are 75760 reads with mutations.
## After classification, there are 611191 reads/indexes which are only identical.
## After classification, there are 4004 reads/indexes which are strictly sequencer.
## After classification, there are 8979 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1103672 forward reads and 1253565 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s10_low_rna.
## Reading the file containing mutations: preprocessing/s10_low_rna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s10_low_rna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 387777 reads.
## Mutation data: after min-position pruning, there are: 370081 reads: 17696 lost or 4.56%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 370081 reads.
## Mutation data: after max-position pruning, there are: 335412 reads: 34669 lost or 9.37%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 324417 reads: 10995 lost or 3.28%.
## Mutation data: all filters removed 63360 reads, or 16.34%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1303577 indexes in all the data.
## After reads/index pruning, there are: 254438 indexes: 1049139 lost or 80.48%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 324417 changed reads.
## All data: before reads/index pruning, there are: 2083008 identical reads.
## All data: after index pruning, there are: 119184 changed reads: 36.74%.
## All data: after index pruning, there are: 832044 identical reads: 39.94%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 832044 identical reads.
## Before classification, there are 119184 reads with mutations.
## After classification, there are 710943 reads/indexes which are only identical.
## After classification, there are 4525 reads/indexes which are strictly sequencer.
## After classification, there are 34403 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1344297 forward reads and 1447138 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s12_high_rna.
## Reading the file containing mutations: preprocessing/s12_high_rna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s12_high_rna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 558061 reads.
## Mutation data: after min-position pruning, there are: 535712 reads: 22349 lost or 4.00%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 535712 reads.
## Mutation data: after max-position pruning, there are: 485875 reads: 49837 lost or 9.30%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 473035 reads: 12840 lost or 2.64%.
## Mutation data: all filters removed 85026 reads, or 15.24%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1218719 indexes in all the data.
## After reads/index pruning, there are: 252358 indexes: 966361 lost or 79.29%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 473035 changed reads.
## All data: before reads/index pruning, there are: 1855921 identical reads.
## All data: after index pruning, there are: 188945 changed reads: 39.94%.
## All data: after index pruning, there are: 770419 identical reads: 41.51%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 770419 identical reads.
## Before classification, there are 188945 reads with mutations.
## After classification, there are 646343 reads/indexes which are only identical.
## After classification, there are 4799 reads/indexes which are strictly sequencer.
## After classification, there are 98204 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1307113 forward reads and 1448739 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Writing a legend.
## Plotting Index density for mutant reads before filtering.
## Plotting Index density for identical reads before filtering.
## Plotting Index density for all reads before filtering.
## Plotting Index density for mutant reads after filtering.
## Plotting Index density for identical reads after filtering.
## Plotting Index density for all reads after filtering.
## Writing raw data.
## Writing cpm data.
## Writing data normalized by reads/indexes.
## Writing data normalized by reads/indexes and length.
## Writing data normalized by cpm(reads/indexes) and length.
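The percentages in the log above are each relative to the number of reads entering that filter step, while the "all filters" line is relative to the starting count. A small check in base R makes this explicit, using the s7_control_dna read counts copied from the log. The helper name `pct_lost` is invented for this illustration and is not part of Rerrrt.

```r
## Read counts for s7_control_dna, copied from the log above.
reads <- c(start = 548445, min_position = 525666,
           max_position = 482118, prune_n = 468057)

## Hypothetical helper: percent of reads lost going from 'before' to 'after'.
pct_lost <- function(before, after) {
  round(100 * (before - after) / before, 2)
}

## Loss at each step, relative to the reads entering that step.
before <- reads[-length(reads)]
after <- reads[-1]
step_loss <- pct_lost(before, after)
names(step_loss) <- names(after)
print(step_loss)

## Loss across all three filters, relative to the starting count.
total_loss <- pct_lost(reads[["start"]], reads[["prune_n"]])
print(total_loss)
```

These values match the 4.15%, 8.28%, 2.92%, and 14.66% figures reported for this sample, so the per-step percentages should not be summed to get the overall loss.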
max_mutations_per_read <- 10
excel <- glue::glue("excel/{rundate}_triples_tenmpr-v{ver}.xlsx")
# NOTE: max_mutations_per_read is set above but is not passed in the call below.
triples_tenmpr <- create_matrices(sample_sheet=sample_sheet,
                                  ident_column=ident_column, mut_column=mut_column,
                                  min_reads=min_reads, min_indexes=min_indexes,
                                  min_sequencer=min_sequencer,
                                  min_position=min_position, max_position=max_position,
                                  prune_n=prune_n, verbose=verbose, excel=excel)
## Dropped 6 rows from the sample metadata because they were blank.
## Starting sample: s7_control_dna.
## Reading the file containing mutations: preprocessing/s7_control_dna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s7_control_dna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 548445 reads.
## Mutation data: after min-position pruning, there are: 525666 reads: 22779 lost or 4.15%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 525666 reads.
## Mutation data: after max-position pruning, there are: 482118 reads: 43548 lost or 8.28%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 468057 reads: 14061 lost or 2.92%.
## Mutation data: all filters removed 80388 reads, or 14.66%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1822433 indexes in all the data.
## After reads/index pruning, there are: 211096 indexes: 1611337 lost or 88.42%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 468057 changed reads.
## All data: before reads/index pruning, there are: 2421460 identical reads.
## All data: after index pruning, there are: 121275 changed reads: 25.91%.
## All data: after index pruning, there are: 628664 identical reads: 25.96%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 628664 identical reads.
## Before classification, there are 121275 reads with mutations.
## After classification, there are 543985 reads/indexes which are only identical.
## After classification, there are 1636 reads/indexes which are strictly sequencer.
## After classification, there are 55444 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 982674 forward reads and 1112622 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s9_low_dna.
## Reading the file containing mutations: preprocessing/s9_low_dna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s9_low_dna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 418833 reads.
## Mutation data: after min-position pruning, there are: 398387 reads: 20446 lost or 4.88%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 398387 reads.
## Mutation data: after max-position pruning, there are: 357781 reads: 40606 lost or 10.19%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 350717 reads: 7064 lost or 1.97%.
## Mutation data: all filters removed 68116 reads, or 16.26%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1945123 indexes in all the data.
## After reads/index pruning, there are: 176434 indexes: 1768689 lost or 90.93%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 350717 changed reads.
## All data: before reads/index pruning, there are: 2566212 identical reads.
## All data: after index pruning, there are: 73320 changed reads: 20.91%.
## All data: after index pruning, there are: 539291 identical reads: 21.02%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 539291 identical reads.
## Before classification, there are 73320 reads with mutations.
## After classification, there are 469094 reads/indexes which are only identical.
## After classification, there are 1004 reads/indexes which are strictly sequencer.
## After classification, there are 18533 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 770075 forward reads and 900874 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s11_high_dna.
## Reading the file containing mutations: preprocessing/s11_high_dna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s11_high_dna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 344009 reads.
## Mutation data: after min-position pruning, there are: 325611 reads: 18398 lost or 5.35%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 325611 reads.
## Mutation data: after max-position pruning, there are: 288540 reads: 37071 lost or 11.39%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 282511 reads: 6029 lost or 2.09%.
## Mutation data: all filters removed 61498 reads, or 17.88%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1390098 indexes in all the data.
## After reads/index pruning, there are: 257737 indexes: 1132361 lost or 81.46%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 282511 changed reads.
## All data: before reads/index pruning, there are: 2240589 identical reads.
## All data: after index pruning, there are: 99596 changed reads: 35.25%.
## All data: after index pruning, there are: 869226 identical reads: 38.79%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 869226 identical reads.
## Before classification, there are 99596 reads with mutations.
## After classification, there are 743162 reads/indexes which are only identical.
## After classification, there are 5098 reads/indexes which are strictly sequencer.
## After classification, there are 11216 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1339382 forward reads and 1519238 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s8_control_rna.
## Reading the file containing mutations: preprocessing/s8_control_rna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s8_control_rna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 267014 reads.
## Mutation data: after min-position pruning, there are: 253197 reads: 13817 lost or 5.17%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 253197 reads.
## Mutation data: after max-position pruning, there are: 225653 reads: 27544 lost or 10.88%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 220454 reads: 5199 lost or 2.30%.
## Mutation data: all filters removed 46560 reads, or 17.44%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1153820 indexes in all the data.
## After reads/index pruning, there are: 209685 indexes: 944135 lost or 81.83%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 220454 changed reads.
## All data: before reads/index pruning, there are: 1854668 identical reads.
## All data: after index pruning, there are: 75760 changed reads: 34.37%.
## All data: after index pruning, there are: 710768 identical reads: 38.32%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 710768 identical reads.
## Before classification, there are 75760 reads with mutations.
## After classification, there are 611191 reads/indexes which are only identical.
## After classification, there are 4004 reads/indexes which are strictly sequencer.
## After classification, there are 8979 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1103672 forward reads and 1253565 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s10_low_rna.
## Reading the file containing mutations: preprocessing/s10_low_rna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s10_low_rna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 387777 reads.
## Mutation data: after min-position pruning, there are: 370081 reads: 17696 lost or 4.56%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 370081 reads.
## Mutation data: after max-position pruning, there are: 335412 reads: 34669 lost or 9.37%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 324417 reads: 10995 lost or 3.28%.
## Mutation data: all filters removed 63360 reads, or 16.34%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1303577 indexes in all the data.
## After reads/index pruning, there are: 254438 indexes: 1049139 lost or 80.48%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 324417 changed reads.
## All data: before reads/index pruning, there are: 2083008 identical reads.
## All data: after index pruning, there are: 119184 changed reads: 36.74%.
## All data: after index pruning, there are: 832044 identical reads: 39.94%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 832044 identical reads.
## Before classification, there are 119184 reads with mutations.
## After classification, there are 710943 reads/indexes which are only identical.
## After classification, there are 4525 reads/indexes which are strictly sequencer.
## After classification, there are 34403 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1344297 forward reads and 1447138 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s12_high_rna.
## Reading the file containing mutations: preprocessing/s12_high_rna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s12_high_rna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 558061 reads.
## Mutation data: after min-position pruning, there are: 535712 reads: 22349 lost or 4.00%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 535712 reads.
## Mutation data: after max-position pruning, there are: 485875 reads: 49837 lost or 9.30%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 473035 reads: 12840 lost or 2.64%.
## Mutation data: all filters removed 85026 reads, or 15.24%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1218719 indexes in all the data.
## After reads/index pruning, there are: 252358 indexes: 966361 lost or 79.29%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 473035 changed reads.
## All data: before reads/index pruning, there are: 1855921 identical reads.
## All data: after index pruning, there are: 188945 changed reads: 39.94%.
## All data: after index pruning, there are: 770419 identical reads: 41.51%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 770419 identical reads.
## Before classification, there are 188945 reads with mutations.
## After classification, there are 646343 reads/indexes which are only identical.
## After classification, there are 4799 reads/indexes which are strictly sequencer.
## After classification, there are 98204 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1307113 forward reads and 1448739 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Writing a legend.
## Plotting Index density for mutant reads before filtering.
## Plotting Index density for identical reads before filtering.
## Plotting Index density for all reads before filtering.
## Plotting Index density for mutant reads after filtering.
## Plotting Index density for identical reads after filtering.
## Plotting Index density for all reads after filtering.
## Writing raw data.
## Writing cpm data.
## Writing data normalized by reads/indexes.
## Writing data normalized by reads/indexes and length.
## Writing data normalized by cpm(reads/indexes) and length.
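The log reports two count-based thresholds: indexes with fewer than `min_reads` reads are removed, and mutations are then kept only if they appear in at least `min_indexes` indexes. The following is a minimal sketch of that idea on toy data in base R. It is not the Rerrrt implementation; the column names `index` and `mutation` and the mutation strings are assumptions made for this example.

```r
## Toy data: one row per read, tagged with its index and the mutation it
## carries. Column names and mutation strings are invented for this sketch.
toy <- data.frame(
  index = c("AAA", "AAA", "AAA", "CCC", "CCC", "CCC",
            "GGG", "GGG", "GGG", "TTT", "TTT", "ACG"),
  mutation = c(rep("30_A_G", 9), rep("51_C_T", 3)),
  stringsAsFactors = FALSE)

min_reads <- 3
min_indexes <- 3

## Step 1: keep only indexes supported by at least min_reads reads.
reads_per_index <- table(toy[["index"]])
keep_idx <- as.vector(reads_per_index >= min_reads)
kept_indexes <- names(reads_per_index)[keep_idx]
filtered <- toy[toy[["index"]] %in% kept_indexes, ]

## Step 2: keep only mutations seen in at least min_indexes distinct indexes.
indexes_per_mutation <- tapply(filtered[["index"]], filtered[["mutation"]],
                               function(x) length(unique(x)))
keep_mut <- as.vector(indexes_per_mutation >= min_indexes)
kept_mutations <- names(indexes_per_mutation)[keep_mut]
print(kept_mutations)
```

In this toy data the second mutation is dropped at step 1, because neither of its indexes reaches three reads, which is the same reason most indexes disappear in the reads/index pruning lines of the log.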
max_mutations_per_read <- 5
excel <- glue::glue("excel/{rundate}_triples_fivempr-v{ver}.xlsx")
# NOTE: max_mutations_per_read is set above but is not passed in the call below.
triples_fivempr <- create_matrices(sample_sheet=sample_sheet,
                                   ident_column=ident_column, mut_column=mut_column,
                                   min_reads=min_reads, min_indexes=min_indexes,
                                   min_sequencer=min_sequencer,
                                   min_position=min_position, max_position=max_position,
                                   prune_n=prune_n, verbose=verbose, excel=excel)
## Dropped 6 rows from the sample metadata because they were blank.
## Starting sample: s7_control_dna.
## Reading the file containing mutations: preprocessing/s7_control_dna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s7_control_dna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 548445 reads.
## Mutation data: after min-position pruning, there are: 525666 reads: 22779 lost or 4.15%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 525666 reads.
## Mutation data: after max-position pruning, there are: 482118 reads: 43548 lost or 8.28%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 468057 reads: 14061 lost or 2.92%.
## Mutation data: all filters removed 80388 reads, or 14.66%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1822433 indexes in all the data.
## After reads/index pruning, there are: 211096 indexes: 1611337 lost or 88.42%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 468057 changed reads.
## All data: before reads/index pruning, there are: 2421460 identical reads.
## All data: after index pruning, there are: 121275 changed reads: 25.91%.
## All data: after index pruning, there are: 628664 identical reads: 25.96%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 628664 identical reads.
## Before classification, there are 121275 reads with mutations.
## After classification, there are 543985 reads/indexes which are only identical.
## After classification, there are 1636 reads/indexes which are strictly sequencer.
## After classification, there are 55444 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 982674 forward reads and 1112622 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s9_low_dna.
## Reading the file containing mutations: preprocessing/s9_low_dna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s9_low_dna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 418833 reads.
## Mutation data: after min-position pruning, there are: 398387 reads: 20446 lost or 4.88%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 398387 reads.
## Mutation data: after max-position pruning, there are: 357781 reads: 40606 lost or 10.19%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 350717 reads: 7064 lost or 1.97%.
## Mutation data: all filters removed 68116 reads, or 16.26%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1945123 indexes in all the data.
## After reads/index pruning, there are: 176434 indexes: 1768689 lost or 90.93%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 350717 changed reads.
## All data: before reads/index pruning, there are: 2566212 identical reads.
## All data: after index pruning, there are: 73320 changed reads: 20.91%.
## All data: after index pruning, there are: 539291 identical reads: 21.02%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 539291 identical reads.
## Before classification, there are 73320 reads with mutations.
## After classification, there are 469094 reads/indexes which are only identical.
## After classification, there are 1004 reads/indexes which are strictly sequencer.
## After classification, there are 18533 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 770075 forward reads and 900874 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s11_high_dna.
## Reading the file containing mutations: preprocessing/s11_high_dna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s11_high_dna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 344009 reads.
## Mutation data: after min-position pruning, there are: 325611 reads: 18398 lost or 5.35%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 325611 reads.
## Mutation data: after max-position pruning, there are: 288540 reads: 37071 lost or 11.39%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 282511 reads: 6029 lost or 2.09%.
## Mutation data: all filters removed 61498 reads, or 17.88%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1390098 indexes in all the data.
## After reads/index pruning, there are: 257737 indexes: 1132361 lost or 81.46%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 282511 changed reads.
## All data: before reads/index pruning, there are: 2240589 identical reads.
## All data: after index pruning, there are: 99596 changed reads: 35.25%.
## All data: after index pruning, there are: 869226 identical reads: 38.79%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 869226 identical reads.
## Before classification, there are 99596 reads with mutations.
## After classification, there are 743162 reads/indexes which are only identical.
## After classification, there are 5098 reads/indexes which are strictly sequencer.
## After classification, there are 11216 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1339382 forward reads and 1519238 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s8_control_rna.
## Reading the file containing mutations: preprocessing/s8_control_rna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s8_control_rna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 267014 reads.
## Mutation data: after min-position pruning, there are: 253197 reads: 13817 lost or 5.17%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 253197 reads.
## Mutation data: after max-position pruning, there are: 225653 reads: 27544 lost or 10.88%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 220454 reads: 5199 lost or 2.30%.
## Mutation data: all filters removed 46560 reads, or 17.44%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1153820 indexes in all the data.
## After reads/index pruning, there are: 209685 indexes: 944135 lost or 81.83%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 220454 changed reads.
## All data: before reads/index pruning, there are: 1854668 identical reads.
## All data: after index pruning, there are: 75760 changed reads: 34.37%.
## All data: after index pruning, there are: 710768 identical reads: 38.32%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 710768 identical reads.
## Before classification, there are 75760 reads with mutations.
## After classification, there are 611191 reads/indexes which are only identical.
## After classification, there are 4004 reads/indexes which are strictly sequencer.
## After classification, there are 8979 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1103672 forward reads and 1253565 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s10_low_rna.
## Reading the file containing mutations: preprocessing/s10_low_rna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s10_low_rna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 387777 reads.
## Mutation data: after min-position pruning, there are: 370081 reads: 17696 lost or 4.56%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 370081 reads.
## Mutation data: after max-position pruning, there are: 335412 reads: 34669 lost or 9.37%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 324417 reads: 10995 lost or 3.28%.
## Mutation data: all filters removed 63360 reads, or 16.34%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1303577 indexes in all the data.
## After reads/index pruning, there are: 254438 indexes: 1049139 lost or 80.48%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 324417 changed reads.
## All data: before reads/index pruning, there are: 2083008 identical reads.
## All data: after index pruning, there are: 119184 changed reads: 36.74%.
## All data: after index pruning, there are: 832044 identical reads: 39.94%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 832044 identical reads.
## Before classification, there are 119184 reads with mutations.
## After classification, there are 710943 reads/indexes which are only identical.
## After classification, there are 4525 reads/indexes which are strictly sequencer.
## After classification, there are 34403 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1344297 forward reads and 1447138 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s12_high_rna.
## Reading the file containing mutations: preprocessing/s12_high_rna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s12_high_rna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 558061 reads.
## Mutation data: after min-position pruning, there are: 535712 reads: 22349 lost or 4.00%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 535712 reads.
## Mutation data: after max-position pruning, there are: 485875 reads: 49837 lost or 9.30%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 473035 reads: 12840 lost or 2.64%.
## Mutation data: all filters removed 85026 reads, or 15.24%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1218719 indexes in all the data.
## After reads/index pruning, there are: 252358 indexes: 966361 lost or 79.29%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 473035 changed reads.
## All data: before reads/index pruning, there are: 1855921 identical reads.
## All data: after index pruning, there are: 188945 changed reads: 39.94%.
## All data: after index pruning, there are: 770419 identical reads: 41.51%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 770419 identical reads.
## Before classification, there are 188945 reads with mutations.
## After classification, there are 646343 reads/indexes which are only identical.
## After classification, there are 4799 reads/indexes which are strictly sequencer.
## After classification, there are 98204 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1307113 forward reads and 1448739 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Writing a legend.
## Plotting Index density for mutant reads before filtering.
## Plotting Index density for identical reads before filtering.
## Plotting Index density for all reads before filtering.
## Plotting Index density for mutant reads after filtering.
## Plotting Index density for identical reads after filtering.
## Plotting Index density for all reads after filtering.
## Writing raw data.
## Writing cpm data.
## Writing data normalized by reads/indexes.
## Writing data normalized by reads/indexes and length.
## Writing data normalized by cpm(reads/indexes) and length.
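The percentages in these log lines are not all measured against the same denominator. As a quick sanity check, the following sketch recomputes the s12_high_rna figures from the counts printed above. The variable names are mine and are not part of Rerrrt. Each position/N filter appears to report the loss relative to the reads entering that step, the "all filters" line reports the loss relative to the starting count, and the "after index pruning" lines report the fraction retained rather than lost.

```r
# Read counts copied from the s12_high_rna log lines above.
reads <- c(start = 558061, min_position = 535712,
           max_position = 485875, n_pruned = 473035)

# Reads lost at each filter, and the loss relative to the reads
# that entered that filter.
lost <- -diff(reads)
pct_lost <- round(100 * lost / head(reads, -1), 2)

# Total loss across all three filters, relative to the starting count.
total_lost <- reads[["start"]] - reads[["n_pruned"]]
total_pct <- round(100 * total_lost / reads[["start"]], 2)

# The index-pruning lines report the percentage of reads kept.
changed_kept <- round(100 * 188945 / 473035, 2)
identical_kept <- round(100 * 770419 / 1855921, 2)

print(pct_lost)
print(c(total_lost = total_lost, total_pct = total_pct))
print(c(changed_kept = changed_kept, identical_kept = identical_kept))
```

The recomputed values agree with the logged 4.00%, 9.30%, 2.64%, 15.24%, 39.94%, and 41.51% for this sample.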
Categorize the data with at least 5 indexes per mutant
min_indexes <- 5
max_mutations_per_read <- NULL
excel <- glue::glue("excel/{rundate}_quints-v{ver}.xlsx")
quints <- create_matrices(sample_sheet=sample_sheet,
                          ident_column=ident_column, mut_column=mut_column,
                          min_reads=min_reads, min_indexes=min_indexes,
                          min_sequencer=min_sequencer,
                          min_position=min_position, max_position=max_position,
                          prune_n=prune_n, verbose=verbose, excel=excel)
## Dropped 6 rows from the sample metadata because they were blank.
## Starting sample: s7_control_dna.
## Reading the file containing mutations: preprocessing/s7_control_dna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s7_control_dna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 548445 reads.
## Mutation data: after min-position pruning, there are: 525666 reads: 22779 lost or 4.15%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 525666 reads.
## Mutation data: after max-position pruning, there are: 482118 reads: 43548 lost or 8.28%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 468057 reads: 14061 lost or 2.92%.
## Mutation data: all filters removed 80388 reads, or 14.66%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1822433 indexes in all the data.
## After reads/index pruning, there are: 211096 indexes: 1611337 lost or 88.42%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 468057 changed reads.
## All data: before reads/index pruning, there are: 2421460 identical reads.
## All data: after index pruning, there are: 121275 changed reads: 25.91%.
## All data: after index pruning, there are: 628664 identical reads: 25.96%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 628664 identical reads.
## Before classification, there are 121275 reads with mutations.
## After classification, there are 543985 reads/indexes which are only identical.
## After classification, there are 1636 reads/indexes which are strictly sequencer.
## After classification, there are 55444 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 982674 forward reads and 1112622 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s9_low_dna.
## Reading the file containing mutations: preprocessing/s9_low_dna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s9_low_dna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 418833 reads.
## Mutation data: after min-position pruning, there are: 398387 reads: 20446 lost or 4.88%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 398387 reads.
## Mutation data: after max-position pruning, there are: 357781 reads: 40606 lost or 10.19%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 350717 reads: 7064 lost or 1.97%.
## Mutation data: all filters removed 68116 reads, or 16.26%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1945123 indexes in all the data.
## After reads/index pruning, there are: 176434 indexes: 1768689 lost or 90.93%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 350717 changed reads.
## All data: before reads/index pruning, there are: 2566212 identical reads.
## All data: after index pruning, there are: 73320 changed reads: 20.91%.
## All data: after index pruning, there are: 539291 identical reads: 21.02%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 539291 identical reads.
## Before classification, there are 73320 reads with mutations.
## After classification, there are 469094 reads/indexes which are only identical.
## After classification, there are 1004 reads/indexes which are strictly sequencer.
## After classification, there are 18533 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 770075 forward reads and 900874 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s11_high_dna.
## Reading the file containing mutations: preprocessing/s11_high_dna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s11_high_dna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 344009 reads.
## Mutation data: after min-position pruning, there are: 325611 reads: 18398 lost or 5.35%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 325611 reads.
## Mutation data: after max-position pruning, there are: 288540 reads: 37071 lost or 11.39%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 282511 reads: 6029 lost or 2.09%.
## Mutation data: all filters removed 61498 reads, or 17.88%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1390098 indexes in all the data.
## After reads/index pruning, there are: 257737 indexes: 1132361 lost or 81.46%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 282511 changed reads.
## All data: before reads/index pruning, there are: 2240589 identical reads.
## All data: after index pruning, there are: 99596 changed reads: 35.25%.
## All data: after index pruning, there are: 869226 identical reads: 38.79%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 869226 identical reads.
## Before classification, there are 99596 reads with mutations.
## After classification, there are 743162 reads/indexes which are only identical.
## After classification, there are 5098 reads/indexes which are strictly sequencer.
## After classification, there are 11216 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1339382 forward reads and 1519238 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s8_control_rna.
## Reading the file containing mutations: preprocessing/s8_control_rna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s8_control_rna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 267014 reads.
## Mutation data: after min-position pruning, there are: 253197 reads: 13817 lost or 5.17%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 253197 reads.
## Mutation data: after max-position pruning, there are: 225653 reads: 27544 lost or 10.88%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 220454 reads: 5199 lost or 2.30%.
## Mutation data: all filters removed 46560 reads, or 17.44%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1153820 indexes in all the data.
## After reads/index pruning, there are: 209685 indexes: 944135 lost or 81.83%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 220454 changed reads.
## All data: before reads/index pruning, there are: 1854668 identical reads.
## All data: after index pruning, there are: 75760 changed reads: 34.37%.
## All data: after index pruning, there are: 710768 identical reads: 38.32%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 710768 identical reads.
## Before classification, there are 75760 reads with mutations.
## After classification, there are 611191 reads/indexes which are only identical.
## After classification, there are 4004 reads/indexes which are strictly sequencer.
## After classification, there are 8979 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1103672 forward reads and 1253565 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s10_low_rna.
## Reading the file containing mutations: preprocessing/s10_low_rna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s10_low_rna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 387777 reads.
## Mutation data: after min-position pruning, there are: 370081 reads: 17696 lost or 4.56%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 370081 reads.
## Mutation data: after max-position pruning, there are: 335412 reads: 34669 lost or 9.37%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 324417 reads: 10995 lost or 3.28%.
## Mutation data: all filters removed 63360 reads, or 16.34%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1303577 indexes in all the data.
## After reads/index pruning, there are: 254438 indexes: 1049139 lost or 80.48%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 324417 changed reads.
## All data: before reads/index pruning, there are: 2083008 identical reads.
## All data: after index pruning, there are: 119184 changed reads: 36.74%.
## All data: after index pruning, there are: 832044 identical reads: 39.94%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 832044 identical reads.
## Before classification, there are 119184 reads with mutations.
## After classification, there are 710943 reads/indexes which are only identical.
## After classification, there are 4525 reads/indexes which are strictly sequencer.
## After classification, there are 34403 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1344297 forward reads and 1447138 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s12_high_rna.
## Reading the file containing mutations: preprocessing/s12_high_rna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s12_high_rna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 558061 reads.
## Mutation data: after min-position pruning, there are: 535712 reads: 22349 lost or 4.00%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 535712 reads.
## Mutation data: after max-position pruning, there are: 485875 reads: 49837 lost or 9.30%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 473035 reads: 12840 lost or 2.64%.
## Mutation data: all filters removed 85026 reads, or 15.24%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1218719 indexes in all the data.
## After reads/index pruning, there are: 252358 indexes: 966361 lost or 79.29%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 473035 changed reads.
## All data: before reads/index pruning, there are: 1855921 identical reads.
## All data: after index pruning, there are: 188945 changed reads: 39.94%.
## All data: after index pruning, there are: 770419 identical reads: 41.51%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 770419 identical reads.
## Before classification, there are 188945 reads with mutations.
## After classification, there are 646343 reads/indexes which are only identical.
## After classification, there are 4799 reads/indexes which are strictly sequencer.
## After classification, there are 98204 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1307113 forward reads and 1448739 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Writing a legend.
## Plotting Index density for mutant reads before filtering.
## Plotting Index density for identical reads before filtering.
## Plotting Index density for all reads before filtering.
## Plotting Index density for mutant reads after filtering.
## Plotting Index density for identical reads after filtering.
## Plotting Index density for all reads after filtering.
## Writing raw data.
## Writing cpm data.
## Writing data normalized by reads/indexes.
## Writing data normalized by reads/indexes and length.
## Writing data normalized by cpm(reads/indexes) and length.
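The only difference between the triples and quints runs is the min_indexes threshold; every log line before "Subsetting based on mutations with at least N indexes" is the same in both. To illustrate what such a threshold does, here is a toy sketch using dplyr. It is not the code inside create_matrices(): the data frame, its column names (index, mutation), and the helper keep_by_indexes() are hypothetical, and I am assuming the filter counts distinct indexes per mutation.

```r
library(dplyr)

# Toy data: one row per (index, mutation) observation.
# The column names and mutation strings are made up for this example.
toy <- data.frame(
  index = c("i1", "i2", "i3", "i4", "i5", "i1", "i2", "i3", "i6"),
  mutation = c(rep("50_A_G", 5), rep("80_C_T", 3), "120_G_A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep mutations seen in at least min_indexes
# distinct indexes.
keep_by_indexes <- function(df, min_indexes) {
  df %>%
    group_by(mutation) %>%
    summarise(n_indexes = n_distinct(index)) %>%
    filter(n_indexes >= min_indexes)
}

triples_toy <- keep_by_indexes(toy, 3)
quints_toy <- keep_by_indexes(toy, 5)

print(triples_toy)
print(quints_toy)
```

With this toy input, the threshold of 3 keeps the two mutations seen in five and three indexes, while the threshold of 5 keeps only the first. The stricter setting trades sensitivity for confidence that a mutation is not an artifact of a few indexes.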
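# max_mutations_per_read is assigned here, but it is not among the arguments
# passed to create_matrices() in the call below.
max_mutations_per_read <- 10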
excel <- glue::glue("excel/{rundate}_quints_tenmpr-v{ver}.xlsx")
quints_tenmpr <- create_matrices(sample_sheet=sample_sheet,
                                 ident_column=ident_column, mut_column=mut_column,
                                 min_reads=min_reads, min_indexes=min_indexes,
                                 min_sequencer=min_sequencer,
                                 min_position=min_position, max_position=max_position,
                                 prune_n=prune_n, verbose=verbose, excel=excel)
## Dropped 6 rows from the sample metadata because they were blank.
## Starting sample: s7_control_dna.
## Reading the file containing mutations: preprocessing/s7_control_dna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s7_control_dna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 548445 reads.
## Mutation data: after min-position pruning, there are: 525666 reads: 22779 lost or 4.15%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 525666 reads.
## Mutation data: after max-position pruning, there are: 482118 reads: 43548 lost or 8.28%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 468057 reads: 14061 lost or 2.92%.
## Mutation data: all filters removed 80388 reads, or 14.66%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1822433 indexes in all the data.
## After reads/index pruning, there are: 211096 indexes: 1611337 lost or 88.42%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 468057 changed reads.
## All data: before reads/index pruning, there are: 2421460 identical reads.
## All data: after index pruning, there are: 121275 changed reads: 25.91%.
## All data: after index pruning, there are: 628664 identical reads: 25.96%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 628664 identical reads.
## Before classification, there are 121275 reads with mutations.
## After classification, there are 543985 reads/indexes which are only identical.
## After classification, there are 1636 reads/indexes which are strictly sequencer.
## After classification, there are 55444 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 982674 forward reads and 1112622 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s9_low_dna.
## Reading the file containing mutations: preprocessing/s9_low_dna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s9_low_dna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 418833 reads.
## Mutation data: after min-position pruning, there are: 398387 reads: 20446 lost or 4.88%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 398387 reads.
## Mutation data: after max-position pruning, there are: 357781 reads: 40606 lost or 10.19%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 350717 reads: 7064 lost or 1.97%.
## Mutation data: all filters removed 68116 reads, or 16.26%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1945123 indexes in all the data.
## After reads/index pruning, there are: 176434 indexes: 1768689 lost or 90.93%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 350717 changed reads.
## All data: before reads/index pruning, there are: 2566212 identical reads.
## All data: after index pruning, there are: 73320 changed reads: 20.91%.
## All data: after index pruning, there are: 539291 identical reads: 21.02%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 539291 identical reads.
## Before classification, there are 73320 reads with mutations.
## After classification, there are 469094 reads/indexes which are only identical.
## After classification, there are 1004 reads/indexes which are strictly sequencer.
## After classification, there are 18533 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 770075 forward reads and 900874 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s11_high_dna.
## Reading the file containing mutations: preprocessing/s11_high_dna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s11_high_dna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 344009 reads.
## Mutation data: after min-position pruning, there are: 325611 reads: 18398 lost or 5.35%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 325611 reads.
## Mutation data: after max-position pruning, there are: 288540 reads: 37071 lost or 11.39%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 282511 reads: 6029 lost or 2.09%.
## Mutation data: all filters removed 61498 reads, or 17.88%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1390098 indexes in all the data.
## After reads/index pruning, there are: 257737 indexes: 1132361 lost or 81.46%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 282511 changed reads.
## All data: before reads/index pruning, there are: 2240589 identical reads.
## All data: after index pruning, there are: 99596 changed reads: 35.25%.
## All data: after index pruning, there are: 869226 identical reads: 38.79%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 869226 identical reads.
## Before classification, there are 99596 reads with mutations.
## After classification, there are 743162 reads/indexes which are only identical.
## After classification, there are 5098 reads/indexes which are strictly sequencer.
## After classification, there are 11216 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1339382 forward reads and 1519238 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s8_control_rna.
## Reading the file containing mutations: preprocessing/s8_control_rna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s8_control_rna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 267014 reads.
## Mutation data: after min-position pruning, there are: 253197 reads: 13817 lost or 5.17%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 253197 reads.
## Mutation data: after max-position pruning, there are: 225653 reads: 27544 lost or 10.88%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 220454 reads: 5199 lost or 2.30%.
## Mutation data: all filters removed 46560 reads, or 17.44%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1153820 indexes in all the data.
## After reads/index pruning, there are: 209685 indexes: 944135 lost or 81.83%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 220454 changed reads.
## All data: before reads/index pruning, there are: 1854668 identical reads.
## All data: after index pruning, there are: 75760 changed reads: 34.37%.
## All data: after index pruning, there are: 710768 identical reads: 38.32%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 710768 identical reads.
## Before classification, there are 75760 reads with mutations.
## After classification, there are 611191 reads/indexes which are only identical.
## After classification, there are 4004 reads/indexes which are strictly sequencer.
## After classification, there are 8979 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1103672 forward reads and 1253565 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s10_low_rna.
## Reading the file containing mutations: preprocessing/s10_low_rna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s10_low_rna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 387777 reads.
## Mutation data: after min-position pruning, there are: 370081 reads: 17696 lost or 4.56%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 370081 reads.
## Mutation data: after max-position pruning, there are: 335412 reads: 34669 lost or 9.37%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 324417 reads: 10995 lost or 3.28%.
## Mutation data: all filters removed 63360 reads, or 16.34%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1303577 indexes in all the data.
## After reads/index pruning, there are: 254438 indexes: 1049139 lost or 80.48%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 324417 changed reads.
## All data: before reads/index pruning, there are: 2083008 identical reads.
## All data: after index pruning, there are: 119184 changed reads: 36.74%.
## All data: after index pruning, there are: 832044 identical reads: 39.94%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 832044 identical reads.
## Before classification, there are 119184 reads with mutations.
## After classification, there are 710943 reads/indexes which are only identical.
## After classification, there are 4525 reads/indexes which are strictly sequencer.
## After classification, there are 34403 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1344297 forward reads and 1447138 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s12_high_rna.
## Reading the file containing mutations: preprocessing/s12_high_rna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s12_high_rna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 558061 reads.
## Mutation data: after min-position pruning, there are: 535712 reads: 22349 lost or 4.00%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 535712 reads.
## Mutation data: after max-position pruning, there are: 485875 reads: 49837 lost or 9.30%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 473035 reads: 12840 lost or 2.64%.
## Mutation data: all filters removed 85026 reads, or 15.24%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1218719 indexes in all the data.
## After reads/index pruning, there are: 252358 indexes: 966361 lost or 79.29%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 473035 changed reads.
## All data: before reads/index pruning, there are: 1855921 identical reads.
## All data: after index pruning, there are: 188945 changed reads: 39.94%.
## All data: after index pruning, there are: 770419 identical reads: 41.51%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 770419 identical reads.
## Before classification, there are 188945 reads with mutations.
## After classification, there are 646343 reads/indexes which are only identical.
## After classification, there are 4799 reads/indexes which are strictly sequencer.
## After classification, there are 98204 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1307113 forward reads and 1448739 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Writing a legend.
## Plotting Index density for mutant reads before filtering.
## Plotting Index density for identical reads before filtering.
## Plotting Index density for all reads before filtering.
## Plotting Index density for mutant reads after filtering.
## Plotting Index density for identical reads after filtering.
## Plotting Index density for all reads after filtering.
## Writing raw data.
## Writing cpm data.
## Writing data normalized by reads/indexes.
## Writing data normalized by reads/indexes and length.
## Writing data normalized by cpm(reads/indexes) and length.
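The "lost or N%" figures in the log above appear to be the number of dropped items divided by the count before the filter was applied. A minimal sketch of that arithmetic, using the s12_high_rna index counts from the log; the helper name `loss_summary` is hypothetical and is not part of Rerrrt:

```r
# Hypothetical helper (not part of Rerrrt): summarize how much a filter removed.
loss_summary <- function(before, after) {
  lost <- before - after
  pct <- 100 * lost / before
  sprintf("%d lost or %.2f%%", lost, pct)
}

# Index counts for s12_high_rna before and after reads/index pruning.
loss_summary(1218719L, 252358L)
# "966361 lost or 79.29%"
```

The same division reproduces the other "lost" percentages, provided the denominator is the count entering that particular filter step rather than the original total.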
max_mutations_per_read <- 5
excel <- glue::glue("excel/{rundate}_quints_fivempr-v{ver}.xlsx")
# NOTE: max_mutations_per_read is assigned above but is not among the
# arguments passed to create_matrices() below, so this call does not receive it.
quints_fivempr <- create_matrices(sample_sheet=sample_sheet,
ident_column=ident_column, mut_column=mut_column,
min_reads=min_reads, min_indexes=min_indexes,
min_sequencer=min_sequencer,
min_position=min_position, max_position=max_position,
prune_n=prune_n, verbose=verbose, excel=excel)
## Dropped 6 rows from the sample metadata because they were blank.
## Starting sample: s7_control_dna.
## Reading the file containing mutations: preprocessing/s7_control_dna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s7_control_dna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 548445 reads.
## Mutation data: after min-position pruning, there are: 525666 reads: 22779 lost or 4.15%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 525666 reads.
## Mutation data: after max-position pruning, there are: 482118 reads: 43548 lost or 8.28%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 468057 reads: 14061 lost or 2.92%.
## Mutation data: all filters removed 80388 reads, or 14.66%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1822433 indexes in all the data.
## After reads/index pruning, there are: 211096 indexes: 1611337 lost or 88.42%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 468057 changed reads.
## All data: before reads/index pruning, there are: 2421460 identical reads.
## All data: after index pruning, there are: 121275 changed reads: 25.91%.
## All data: after index pruning, there are: 628664 identical reads: 25.96%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 628664 identical reads.
## Before classification, there are 121275 reads with mutations.
## After classification, there are 543985 reads/indexes which are only identical.
## After classification, there are 1636 reads/indexes which are strictly sequencer.
## After classification, there are 55444 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 982674 forward reads and 1112622 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s9_low_dna.
## Reading the file containing mutations: preprocessing/s9_low_dna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s9_low_dna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 418833 reads.
## Mutation data: after min-position pruning, there are: 398387 reads: 20446 lost or 4.88%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 398387 reads.
## Mutation data: after max-position pruning, there are: 357781 reads: 40606 lost or 10.19%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 350717 reads: 7064 lost or 1.97%.
## Mutation data: all filters removed 68116 reads, or 16.26%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1945123 indexes in all the data.
## After reads/index pruning, there are: 176434 indexes: 1768689 lost or 90.93%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 350717 changed reads.
## All data: before reads/index pruning, there are: 2566212 identical reads.
## All data: after index pruning, there are: 73320 changed reads: 20.91%.
## All data: after index pruning, there are: 539291 identical reads: 21.02%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 539291 identical reads.
## Before classification, there are 73320 reads with mutations.
## After classification, there are 469094 reads/indexes which are only identical.
## After classification, there are 1004 reads/indexes which are strictly sequencer.
## After classification, there are 18533 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 770075 forward reads and 900874 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s11_high_dna.
## Reading the file containing mutations: preprocessing/s11_high_dna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s11_high_dna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 344009 reads.
## Mutation data: after min-position pruning, there are: 325611 reads: 18398 lost or 5.35%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 325611 reads.
## Mutation data: after max-position pruning, there are: 288540 reads: 37071 lost or 11.39%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 282511 reads: 6029 lost or 2.09%.
## Mutation data: all filters removed 61498 reads, or 17.88%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1390098 indexes in all the data.
## After reads/index pruning, there are: 257737 indexes: 1132361 lost or 81.46%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 282511 changed reads.
## All data: before reads/index pruning, there are: 2240589 identical reads.
## All data: after index pruning, there are: 99596 changed reads: 35.25%.
## All data: after index pruning, there are: 869226 identical reads: 38.79%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 869226 identical reads.
## Before classification, there are 99596 reads with mutations.
## After classification, there are 743162 reads/indexes which are only identical.
## After classification, there are 5098 reads/indexes which are strictly sequencer.
## After classification, there are 11216 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1339382 forward reads and 1519238 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s8_control_rna.
## Reading the file containing mutations: preprocessing/s8_control_rna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s8_control_rna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 267014 reads.
## Mutation data: after min-position pruning, there are: 253197 reads: 13817 lost or 5.17%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 253197 reads.
## Mutation data: after max-position pruning, there are: 225653 reads: 27544 lost or 10.88%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 220454 reads: 5199 lost or 2.30%.
## Mutation data: all filters removed 46560 reads, or 17.44%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1153820 indexes in all the data.
## After reads/index pruning, there are: 209685 indexes: 944135 lost or 81.83%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 220454 changed reads.
## All data: before reads/index pruning, there are: 1854668 identical reads.
## All data: after index pruning, there are: 75760 changed reads: 34.37%.
## All data: after index pruning, there are: 710768 identical reads: 38.32%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 710768 identical reads.
## Before classification, there are 75760 reads with mutations.
## After classification, there are 611191 reads/indexes which are only identical.
## After classification, there are 4004 reads/indexes which are strictly sequencer.
## After classification, there are 8979 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1103672 forward reads and 1253565 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s10_low_rna.
## Reading the file containing mutations: preprocessing/s10_low_rna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s10_low_rna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 387777 reads.
## Mutation data: after min-position pruning, there are: 370081 reads: 17696 lost or 4.56%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 370081 reads.
## Mutation data: after max-position pruning, there are: 335412 reads: 34669 lost or 9.37%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 324417 reads: 10995 lost or 3.28%.
## Mutation data: all filters removed 63360 reads, or 16.34%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1303577 indexes in all the data.
## After reads/index pruning, there are: 254438 indexes: 1049139 lost or 80.48%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 324417 changed reads.
## All data: before reads/index pruning, there are: 2083008 identical reads.
## All data: after index pruning, there are: 119184 changed reads: 36.74%.
## All data: after index pruning, there are: 832044 identical reads: 39.94%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 832044 identical reads.
## Before classification, there are 119184 reads with mutations.
## After classification, there are 710943 reads/indexes which are only identical.
## After classification, there are 4525 reads/indexes which are strictly sequencer.
## After classification, there are 34403 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1344297 forward reads and 1447138 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s12_high_rna.
## Reading the file containing mutations: preprocessing/s12_high_rna/step4.txt.xz
## Reading the file containing the identical reads: preprocessing/s12_high_rna/step2_identical_reads.txt.xz
## Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 558061 reads.
## Mutation data: after min-position pruning, there are: 535712 reads: 22349 lost or 4.00%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 535712 reads.
## Mutation data: after max-position pruning, there are: 485875 reads: 49837 lost or 9.30%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 473035 reads: 12840 lost or 2.64%.
## Mutation data: all filters removed 85026 reads, or 15.24%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1218719 indexes in all the data.
## After reads/index pruning, there are: 252358 indexes: 966361 lost or 79.29%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 473035 changed reads.
## All data: before reads/index pruning, there are: 1855921 identical reads.
## All data: after index pruning, there are: 188945 changed reads: 39.94%.
## All data: after index pruning, there are: 770419 identical reads: 41.51%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 770419 identical reads.
## Before classification, there are 188945 reads with mutations.
## After classification, there are 646343 reads/indexes which are only identical.
## After classification, there are 4799 reads/indexes which are strictly sequencer.
## After classification, there are 98204 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1307113 forward reads and 1448739 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Writing a legend.
## Plotting Index density for mutant reads before filtering.
## Plotting Index density for identical reads before filtering.
## Plotting Index density for all reads before filtering.
## Plotting Index density for mutant reads after filtering.
## Plotting Index density for identical reads after filtering.
## Plotting Index density for all reads after filtering.
## Writing raw data.
## Writing cpm data.
## Writing data normalized by reads/indexes.
## Writing data normalized by reads/indexes and length.
## Writing data normalized by cpm(reads/indexes) and length.
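The log line "removing indexes with fewer than 3 reads/index" describes a filter on how many reads share each index. The toy sketch below illustrates that idea in base R. It is not the Rerrrt implementation, and the data frame and its column names are invented for illustration:

```r
# Toy data: one row per read, tagged with the molecular index it carries.
# The column names are assumptions made for this example only.
reads <- data.frame(
  index = c("AAAC", "AAAC", "AAAC", "GGTA", "GGTA", "TTCA"),
  mutated = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
min_reads <- 3

# Count reads per index, then keep only indexes seen at least min_reads times.
reads_per_index <- table(reads$index)
kept_indexes <- names(reads_per_index)[as.vector(reads_per_index) >= min_reads]
pruned <- reads[reads$index %in% kept_indexes, ]

# Only the three reads carrying index "AAAC" survive the filter.
nrow(pruned)
```

Raising `min_reads` trades depth for confidence: fewer indexes survive, but each surviving index is supported by more independent reads.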