1 Calculating error rates.

I wrote the function create_matrices() to collect mutation counts. In principle, its results should be sufficient to address most questions about the mutation counts observed in the data.
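The core idea can be illustrated with a toy example: given one long table of observed mutations per sample, tabulating it yields a count matrix that most downstream questions reduce to indexing. This is only a sketch of the concept; the data frame and its column names ('sample', 'mutation') are invented for illustration and are not the actual inputs to create_matrices().

```r
## Hypothetical per-read mutation observations, two samples.
mutations <- data.frame(
  sample = c("s07", "s07", "s08", "s08", "s08"),
  mutation = c("A22G", "C45T", "A22G", "A22G", "C45T"))
## Cross-tabulate mutation identity against sample to get a count matrix.
count_matrix <- table(mutations$mutation, mutations$sample)
count_matrix
```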

1.1 Categorize the data with at least 3 indexes per mutant

devtools::load_all("Rerrrt")
## Loading Rerrrt
## Warning: replacing previous import 'data.table::last' by 'dplyr::last' when
## loading 'Rerrrt'
## Warning: replacing previous import 'data.table::first' by 'dplyr::first' when
## loading 'Rerrrt'
## Warning: replacing previous import 'data.table::between' by 'dplyr::between'
## when loading 'Rerrrt'
## Warning: replacing previous import 'dplyr::collapse' by 'glue::collapse' when
## loading 'Rerrrt'
ident_column <- "identtable"
mut_column <- "mutationtable"
min_reads <- 3
min_indexes <- 3
min_sequencer <- 6
min_position <- 22
max_position <- 185
max_mutations_per_read <- NULL
prune_n <- TRUE
verbose <- TRUE
plot_order <- c("dna_control", "dna_low", "dna_high", "rna_control", "rna_low", "rna_high")
sample_sheet <- "sample_sheets/recent_samples_2020.xlsx"
excel <- glue::glue("excel/{rundate}_recent_samples_2020_triples-v{ver}.xlsx")
written <- create_matrices(sample_sheet=sample_sheet,
                           ident_column=ident_column, mut_column=mut_column,
                           min_reads=min_reads, min_indexes=min_indexes,
                           min_sequencer=min_sequencer,
                           min_position=min_position, max_position=max_position,
                           prune_n=prune_n, verbose=verbose, excel=excel)
## Starting sample: s07.
##   Reading the file containing mutations: preprocessing/s07/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s07/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 344009 reads.
##    Mutation data: after min-position pruning, there are: 332034 reads: 11975 lost or 3.48%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 332034 reads.
##    Mutation data: after max-position pruning, there are: 309791 reads: 22243 lost or 6.70%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 309791 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 34218 reads, or 9.95%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1392027 indexes in all the data.
## After reads/index pruning, there are: 258515 indexes: 1133512 lost or 81.43%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 309791 changed reads.
## All data: before reads/index pruning, there are: 2240589 identical reads.
## All data: after index pruning, there are: 106390 changed reads: 34.34%.
## All data: after index pruning, there are: 870351 identical reads: 38.84%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 870351 identical reads.
## Before classification, there are 106390 reads with mutations.
## After classification, there are 741425 reads/indexes which are only identical.
## After classification, there are 5422 reads/indexes which are strictly sequencer.
## After classification, there are 11996 reads/indexes which are consistently repeated.
## Counted by direction: 1337121 forward reads and 1516515 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s17.
##   Reading the file containing mutations: preprocessing/s17/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s17/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 2004637 reads.
##    Mutation data: after min-position pruning, there are: 1978823 reads: 25814 lost or 1.29%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 1978823 reads.
##    Mutation data: after max-position pruning, there are: 838424 reads: 1140399 lost or 57.63%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 836843 reads: 1581 lost or 0.19%.
##   Mutation data: all filters removed 1167794 reads, or 58.25%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1673173 indexes in all the data.
## After reads/index pruning, there are: 302181 indexes: 1370992 lost or 81.94%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 836843 changed reads.
## All data: before reads/index pruning, there are: 2472166 identical reads.
## All data: after index pruning, there are: 286196 changed reads: 34.20%.
## All data: after index pruning, there are: 962051 identical reads: 38.92%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 962051 identical reads.
## Before classification, there are 286196 reads with mutations.
## After classification, there are 716862 reads/indexes which are only identical.
## After classification, there are 10570 reads/indexes which are strictly sequencer.
## After classification, there are 30217 reads/indexes which are consistently repeated.
## Counted by direction: 1339101 forward reads and 1413584 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s08.
##   Reading the file containing mutations: preprocessing/s08/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s08/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 418833 reads.
##    Mutation data: after min-position pruning, there are: 407418 reads: 11415 lost or 2.73%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 407418 reads.
##    Mutation data: after max-position pruning, there are: 382773 reads: 24645 lost or 6.05%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 382773 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 36060 reads, or 8.61%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1949558 indexes in all the data.
## After reads/index pruning, there are: 177278 indexes: 1772280 lost or 90.91%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 382773 changed reads.
## All data: before reads/index pruning, there are: 2566212 identical reads.
## All data: after index pruning, there are: 78228 changed reads: 20.44%.
## All data: after index pruning, there are: 540224 identical reads: 21.05%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 540224 identical reads.
## Before classification, there are 78228 reads with mutations.
## After classification, there are 468416 reads/indexes which are only identical.
## After classification, there are 1067 reads/indexes which are strictly sequencer.
## After classification, there are 20032 reads/indexes which are consistently repeated.
## Counted by direction: 770967 forward reads and 901245 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s13.
##   Reading the file containing mutations: preprocessing/s13/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s13/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 2902085 reads.
##    Mutation data: after min-position pruning, there are: 2869333 reads: 32752 lost or 1.13%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 2869333 reads.
##    Mutation data: after max-position pruning, there are: 1150787 reads: 1718546 lost or 59.89%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 1148128 reads: 2659 lost or 0.23%.
##   Mutation data: all filters removed 1753957 reads, or 60.44%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 2989249 indexes in all the data.
## After reads/index pruning, there are: 300278 indexes: 2688971 lost or 89.95%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 1148128 changed reads.
## All data: before reads/index pruning, there are: 3778131 identical reads.
## All data: after index pruning, there are: 242985 changed reads: 21.16%.
## All data: after index pruning, there are: 902374 identical reads: 23.88%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 902374 identical reads.
## Before classification, there are 242985 reads with mutations.
## After classification, there are 713715 reads/indexes which are only identical.
## After classification, there are 4901 reads/indexes which are strictly sequencer.
## After classification, there are 39399 reads/indexes which are consistently repeated.
## Counted by direction: 1283592 forward reads and 1344612 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s09.
##   Reading the file containing mutations: preprocessing/s09/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s09/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 548445 reads.
##    Mutation data: after min-position pruning, there are: 535384 reads: 13061 lost or 2.38%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 535384 reads.
##    Mutation data: after max-position pruning, there are: 510657 reads: 24727 lost or 4.62%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 510657 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 37788 reads, or 6.89%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1828143 indexes in all the data.
## After reads/index pruning, there are: 212279 indexes: 1615864 lost or 88.39%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 510657 changed reads.
## All data: before reads/index pruning, there are: 2421460 identical reads.
## All data: after index pruning, there are: 128266 changed reads: 25.12%.
## All data: after index pruning, there are: 629763 identical reads: 26.01%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 629763 identical reads.
## Before classification, there are 128266 reads with mutations.
## After classification, there are 543040 reads/indexes which are only identical.
## After classification, there are 1746 reads/indexes which are strictly sequencer.
## After classification, there are 58224 reads/indexes which are consistently repeated.
## Counted by direction: 984786 forward reads and 1113674 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s15.
##   Reading the file containing mutations: preprocessing/s15/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s15/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 2673515 reads.
##    Mutation data: after min-position pruning, there are: 2647455 reads: 26060 lost or 0.97%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 2647455 reads.
##    Mutation data: after max-position pruning, there are: 991764 reads: 1655691 lost or 62.54%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 989845 reads: 1919 lost or 0.19%.
##   Mutation data: all filters removed 1683670 reads, or 62.98%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 2141654 indexes in all the data.
## After reads/index pruning, there are: 184062 indexes: 1957592 lost or 91.41%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 989845 changed reads.
## All data: before reads/index pruning, there are: 2456860 identical reads.
## All data: after index pruning, there are: 203622 changed reads: 20.57%.
## All data: after index pruning, there are: 527371 identical reads: 21.47%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 527371 identical reads.
## Before classification, there are 203622 reads with mutations.
## After classification, there are 409128 reads/indexes which are only identical.
## After classification, there are 4209 reads/indexes which are strictly sequencer.
## After classification, there are 77986 reads/indexes which are consistently repeated.
## Counted by direction: 829602 forward reads and 883735 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s10.
##   Reading the file containing mutations: preprocessing/s10/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s10/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 267014 reads.
##    Mutation data: after min-position pruning, there are: 258305 reads: 8709 lost or 3.26%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 258305 reads.
##    Mutation data: after max-position pruning, there are: 241183 reads: 17122 lost or 6.63%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 241183 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 25831 reads, or 9.67%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1155332 indexes in all the data.
## After reads/index pruning, there are: 210220 indexes: 945112 lost or 81.80%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 241183 changed reads.
## All data: before reads/index pruning, there are: 1854668 identical reads.
## All data: after index pruning, there are: 80470 changed reads: 33.36%.
## All data: after index pruning, there are: 711560 identical reads: 38.37%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 711560 identical reads.
## Before classification, there are 80470 reads with mutations.
## After classification, there are 609800 reads/indexes which are only identical.
## After classification, there are 4234 reads/indexes which are strictly sequencer.
## After classification, there are 9529 reads/indexes which are consistently repeated.
## Counted by direction: 1102061 forward reads and 1251091 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s11.
##   Reading the file containing mutations: preprocessing/s11/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s11/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 387777 reads.
##    Mutation data: after min-position pruning, there are: 376861 reads: 10916 lost or 2.82%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 376861 reads.
##    Mutation data: after max-position pruning, there are: 355520 reads: 21341 lost or 5.66%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 355520 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 32257 reads, or 8.32%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1305752 indexes in all the data.
## After reads/index pruning, there are: 255341 indexes: 1050411 lost or 80.44%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 355520 changed reads.
## All data: before reads/index pruning, there are: 2083008 identical reads.
## All data: after index pruning, there are: 125781 changed reads: 35.38%.
## All data: after index pruning, there are: 833241 identical reads: 40.00%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 833241 identical reads.
## Before classification, there are 125781 reads with mutations.
## After classification, there are 709409 reads/indexes which are only identical.
## After classification, there are 4759 reads/indexes which are strictly sequencer.
## After classification, there are 35770 reads/indexes which are consistently repeated.
## Counted by direction: 1343231 forward reads and 1445534 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s14.
##   Reading the file containing mutations: preprocessing/s14/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s14/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 1504677 reads.
##    Mutation data: after min-position pruning, there are: 1487621 reads: 17056 lost or 1.13%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 1487621 reads.
##    Mutation data: after max-position pruning, there are: 594257 reads: 893364 lost or 60.05%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 592951 reads: 1306 lost or 0.22%.
##   Mutation data: all filters removed 911726 reads, or 60.59%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1460860 indexes in all the data.
## After reads/index pruning, there are: 148473 indexes: 1312387 lost or 89.84%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 592951 changed reads.
## All data: before reads/index pruning, there are: 1841367 identical reads.
## All data: after index pruning, there are: 127239 changed reads: 21.46%.
## All data: after index pruning, there are: 437737 identical reads: 23.77%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 437737 identical reads.
## Before classification, there are 127239 reads with mutations.
## After classification, there are 346512 reads/indexes which are only identical.
## After classification, there are 1769 reads/indexes which are strictly sequencer.
## After classification, there are 26548 reads/indexes which are consistently repeated.
## Counted by direction: 616448 forward reads and 666447 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s12.
##   Reading the file containing mutations: preprocessing/s12/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s12/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 558061 reads.
##    Mutation data: after min-position pruning, there are: 545764 reads: 12297 lost or 2.20%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 545764 reads.
##    Mutation data: after max-position pruning, there are: 516128 reads: 29636 lost or 5.43%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 516128 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 41933 reads, or 7.51%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1223666 indexes in all the data.
## After reads/index pruning, there are: 254006 indexes: 969660 lost or 79.24%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 516128 changed reads.
## All data: before reads/index pruning, there are: 1855921 identical reads.
## All data: after index pruning, there are: 200673 changed reads: 38.88%.
## All data: after index pruning, there are: 771535 identical reads: 41.57%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 771535 identical reads.
## Before classification, there are 200673 reads with mutations.
## After classification, there are 644720 reads/indexes which are only identical.
## After classification, there are 5123 reads/indexes which are strictly sequencer.
## After classification, there are 103952 reads/indexes which are consistently repeated.
## Counted by direction: 1310889 forward reads and 1452706 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s16.
##   Reading the file containing mutations: preprocessing/s16/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s16/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 3156347 reads.
##    Mutation data: after min-position pruning, there are: 3124938 reads: 31409 lost or 1.00%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 3124938 reads.
##    Mutation data: after max-position pruning, there are: 1167013 reads: 1957925 lost or 62.65%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 1165064 reads: 1949 lost or 0.17%.
##   Mutation data: all filters removed 1991283 reads, or 63.09%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1971830 indexes in all the data.
## After reads/index pruning, there are: 386564 indexes: 1585266 lost or 80.40%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 1165064 changed reads.
## All data: before reads/index pruning, there are: 2817590 identical reads.
## All data: after index pruning, there are: 433242 changed reads: 37.19%.
## All data: after index pruning, there are: 1165354 identical reads: 41.36%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 1165354 identical reads.
## Before classification, there are 433242 reads with mutations.
## After classification, there are 884532 reads/indexes which are only identical.
## After classification, there are 13764 reads/indexes which are strictly sequencer.
## After classification, there are 151734 reads/indexes which are consistently repeated.
## Counted by direction: 1900567 forward reads and 1934815 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Warning in melt.data.table(data = mtrx, value.name = "norm", id.vars = "names"):
## 'measure.vars' [s07, s17, s08, s13, ...] are not all of the same type. By
## order of hierarchy, the molten data value column will be of type 'double'. All
## measure variables not of type 'double' will be coerced too. Check DETAILS in ?
## melt.data.table for more on coercion.
## Warning in melt.data.table(data = mtrx, value.name = "norm", id.vars = "names"):
## 'measure.vars' [s07, s17, s08, s13, ...] are not all of the same type. By
## order of hierarchy, the molten data value column will be of type 'double'. All
## measure variables not of type 'double' will be coerced too. Check DETAILS in ?
## melt.data.table for more on coercion.

## Warning in melt.data.table(data = mtrx, value.name = "norm", id.vars = "names"):
## 'measure.vars' [s07, s17, s08, s13, ...] are not all of the same type. By
## order of hierarchy, the molten data value column will be of type 'double'. All
## measure variables not of type 'double' will be coerced too. Check DETAILS in ?
## melt.data.table for more on coercion.
##   Writing a legend.
## Warning in rbind(c("Parameter", "Purpose", "Setting"), c("min_reads", "Minimum
## number of reads for an index to be deemed 'real'", : number of columns of result
## is not a multiple of vector length (arg 2)
## Plotting Index density for mutant reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale
## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for identical reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for all reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for mutant reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for identical reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for all reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
##   Writing raw data.
##   Writing cpm data.
##   Writing data normalized by reads/indexes.
##   Writing data normalized by reads/indexes and length.
##   Writing data normalized by cpm(reads/indexes) and length.
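The "lost or x%" figures in the log above can be sanity-checked by hand. For example, for s07's min-position pruning step, the reported before/after read counts imply the logged loss and percentage (this snippet just reproduces the arithmetic, it is not part of the package):

```r
## Read counts reported for s07's min-position pruning step.
before <- 344009
after <- 332034
lost <- before - after          # reads removed by the filter
pct <- 100 * lost / before      # percentage of the pre-filter total
sprintf("%d lost or %.2f%%", lost, pct)
```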
## Repeat the same parameters using all samples
sample_sheet <- "sample_sheets/all_samples_202101.xlsx"
excel <- glue::glue("excel/{rundate}_all_samples_triples-v{ver}.xlsx")
written <- create_matrices(sample_sheet=sample_sheet,
                           ident_column=ident_column, mut_column=mut_column,
                           min_reads=min_reads, min_indexes=min_indexes,
                           min_sequencer=min_sequencer,
                           min_position=min_position, max_position=max_position,
                           prune_n=prune_n, verbose=verbose, excel=excel)
## Error in read.xlsx.default(xlsxFile = file, sheet = 1) : 
##   File does not exist.
## Error in read_metadata(meta_file, ...): Unable to read the metadata file: sample_sheets/all_samples_202101.xlsx
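The error above came from a sample sheet that does not exist on disk. A simple guard before the call would turn the mid-run failure into a skip; this is a sketch of the pattern, not part of create_matrices() itself, and the remaining arguments are elided for brevity:

```r
sample_sheet <- "sample_sheets/all_samples_202101.xlsx"
if (file.exists(sample_sheet)) {
  ## Proceed only when the metadata file is actually present.
  written <- create_matrices(sample_sheet = sample_sheet, excel = excel)
} else {
  warning("Sample sheet not found, skipping: ", sample_sheet)
}
```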
## Repeat with only the recent RNA samples
sample_sheet <- "sample_sheets/rna_samples_202101.xlsx"
excel <- glue::glue("excel/{rundate}_rna_samples_triples-v{ver}.xlsx")
written <- create_matrices(sample_sheet=sample_sheet,
                           ident_column=ident_column, mut_column=mut_column,
                           min_reads=min_reads, min_indexes=min_indexes,
                           min_sequencer=min_sequencer,
                           min_position=min_position, max_position=max_position,
                           prune_n=prune_n, verbose=verbose, excel=excel)
## Starting sample: s10.
##   Reading the file containing mutations: preprocessing/s10/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s10/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 267014 reads.
##    Mutation data: after min-position pruning, there are: 258305 reads: 8709 lost or 3.26%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 258305 reads.
##    Mutation data: after max-position pruning, there are: 241183 reads: 17122 lost or 6.63%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 241183 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 25831 reads, or 9.67%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1155332 indexes in all the data.
## After reads/index pruning, there are: 210220 indexes: 945112 lost or 81.80%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 241183 changed reads.
## All data: before reads/index pruning, there are: 1854668 identical reads.
## All data: after index pruning, there are: 80470 changed reads: 33.36%.
## All data: after index pruning, there are: 711560 identical reads: 38.37%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 711560 identical reads.
## Before classification, there are 80470 reads with mutations.
## After classification, there are 609800 reads/indexes which are only identical.
## After classification, there are 4234 reads/indexes which are strictly sequencer.
## After classification, there are 9529 reads/indexes which are consistently repeated.
## Counted by direction: 1102061 forward reads and 1251091 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s11.
##   Reading the file containing mutations: preprocessing/s11/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s11/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 387777 reads.
##    Mutation data: after min-position pruning, there are: 376861 reads: 10916 lost or 2.82%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 376861 reads.
##    Mutation data: after max-position pruning, there are: 355520 reads: 21341 lost or 5.66%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 355520 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 32257 reads, or 8.32%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1305752 indexes in all the data.
## After reads/index pruning, there are: 255341 indexes: 1050411 lost or 80.44%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 355520 changed reads.
## All data: before reads/index pruning, there are: 2083008 identical reads.
## All data: after index pruning, there are: 125781 changed reads: 35.38%.
## All data: after index pruning, there are: 833241 identical reads: 40.00%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 833241 identical reads.
## Before classification, there are 125781 reads with mutations.
## After classification, there are 709409 reads/indexes which are only identical.
## After classification, there are 4759 reads/indexes which are strictly sequencer.
## After classification, there are 35770 reads/indexes which are consistently repeated.
## Counted by direction: 1343231 forward reads and 1445534 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s14.
##   Reading the file containing mutations: preprocessing/s14/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s14/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 1504677 reads.
##    Mutation data: after min-position pruning, there are: 1487621 reads: 17056 lost or 1.13%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 1487621 reads.
##    Mutation data: after max-position pruning, there are: 594257 reads: 893364 lost or 60.05%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 592951 reads: 1306 lost or 0.22%.
##   Mutation data: all filters removed 911726 reads, or 60.59%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1460860 indexes in all the data.
## After reads/index pruning, there are: 148473 indexes: 1312387 lost or 89.84%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 592951 changed reads.
## All data: before reads/index pruning, there are: 1841367 identical reads.
## All data: after index pruning, there are: 127239 changed reads: 21.46%.
## All data: after index pruning, there are: 437737 identical reads: 23.77%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 437737 identical reads.
## Before classification, there are 127239 reads with mutations.
## After classification, there are 346512 reads/indexes which are only identical.
## After classification, there are 1769 reads/indexes which are strictly sequencer.
## After classification, there are 26548 reads/indexes which are consistently repeated.
## Counted by direction: 616448 forward reads and 666447 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s12.
##   Reading the file containing mutations: preprocessing/s12/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s12/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 558061 reads.
##    Mutation data: after min-position pruning, there are: 545764 reads: 12297 lost or 2.20%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 545764 reads.
##    Mutation data: after max-position pruning, there are: 516128 reads: 29636 lost or 5.43%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 516128 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 41933 reads, or 7.51%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1223666 indexes in all the data.
## After reads/index pruning, there are: 254006 indexes: 969660 lost or 79.24%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 516128 changed reads.
## All data: before reads/index pruning, there are: 1855921 identical reads.
## All data: after index pruning, there are: 200673 changed reads: 38.88%.
## All data: after index pruning, there are: 771535 identical reads: 41.57%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 771535 identical reads.
## Before classification, there are 200673 reads with mutations.
## After classification, there are 644720 reads/indexes which are only identical.
## After classification, there are 5123 reads/indexes which are strictly sequencer.
## After classification, there are 103952 reads/indexes which are consistently repeated.
## Counted by direction: 1310889 forward reads and 1452706 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s16.
##   Reading the file containing mutations: preprocessing/s16/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s16/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 3156347 reads.
##    Mutation data: after min-position pruning, there are: 3124938 reads: 31409 lost or 1.00%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 3124938 reads.
##    Mutation data: after max-position pruning, there are: 1167013 reads: 1957925 lost or 62.65%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 1165064 reads: 1949 lost or 0.17%.
##   Mutation data: all filters removed 1991283 reads, or 63.09%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1971830 indexes in all the data.
## After reads/index pruning, there are: 386564 indexes: 1585266 lost or 80.40%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 1165064 changed reads.
## All data: before reads/index pruning, there are: 2817590 identical reads.
## All data: after index pruning, there are: 433242 changed reads: 37.19%.
## All data: after index pruning, there are: 1165354 identical reads: 41.36%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 1165354 identical reads.
## Before classification, there are 433242 reads with mutations.
## After classification, there are 884532 reads/indexes which are only identical.
## After classification, there are 13764 reads/indexes which are strictly sequencer.
## After classification, there are 151734 reads/indexes which are consistently repeated.
## Counted by direction: 1900567 forward reads and 1934815 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Warning in melt.data.table(data = mtrx, value.name = "norm", id.vars = "names"):
## 'measure.vars' [s10, s11, s14, s12, ...] are not all of the same type. By
## order of hierarchy, the molten data value column will be of type 'double'. All
## measure variables not of type 'double' will be coerced too. Check DETAILS in ?
## melt.data.table for more on coercion.
## Warning in melt.data.table(data = mtrx, value.name = "norm", id.vars = "names"):
## 'measure.vars' [s10, s11, s14, s12, ...] are not all of the same type. By
## order of hierarchy, the molten data value column will be of type 'double'. All
## measure variables not of type 'double' will be coerced too. Check DETAILS in ?
## melt.data.table for more on coercion.

## Warning in melt.data.table(data = mtrx, value.name = "norm", id.vars = "names"):
## 'measure.vars' [s10, s11, s14, s12, ...] are not all of the same type. By
## order of hierarchy, the molten data value column will be of type 'double'. All
## measure variables not of type 'double' will be coerced too. Check DETAILS in ?
## melt.data.table for more on coercion.
##   Writing a legend.
## Warning in rbind(c("Parameter", "Purpose", "Setting"), c("min_reads", "Minimum
## number of reads for an index to be deemed 'real'", : number of columns of result
## is not a multiple of vector length (arg 2)
## Plotting Index density for mutant reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale
## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for identical reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for all reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for mutant reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for identical reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for all reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
##   Writing raw data.
##   Writing cpm data.
##   Writing data normalized by reads/indexes.
##   Writing data normalized by reads/indexes and length.
##   Writing data normalized by cpm(reads/indexes) and length.
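As a sanity check, the percentages in the verbose log are easy to reproduce by hand. A minimal sketch, using the s11 min-position numbers printed above:

```r
# Recompute the s11 log line:
#   "before pruning, there are: 387777 reads."
#   "after min-position pruning, there are: 376861 reads: 10916 lost or 2.82%."
before <- 387777
after <- 376861
lost <- before - after
sprintf("%d lost or %.2f%%", lost, 100 * lost / before)
## "10916 lost or 2.82%"
```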
Repeat the analysis, this time capping each read at ten mutations. The cap is passed explicitly to create_matrices(); assigning the variable alone does not change the result.

max_mutations_per_read <- 10
excel <- glue::glue("excel/{rundate}_triples_tenmpr-v{ver}.xlsx")
written <- create_matrices(sample_sheet=sample_sheet,
                           ident_column=ident_column, mut_column=mut_column,
                           min_reads=min_reads, min_indexes=min_indexes,
                           min_sequencer=min_sequencer,
                           min_position=min_position, max_position=max_position,
                           max_mutations_per_read=max_mutations_per_read,
                           prune_n=prune_n, verbose=verbose, excel=excel)
## Starting sample: s10.
##   Reading the file containing mutations: preprocessing/s10/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s10/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 267014 reads.
##    Mutation data: after min-position pruning, there are: 258305 reads: 8709 lost or 3.26%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 258305 reads.
##    Mutation data: after max-position pruning, there are: 241183 reads: 17122 lost or 6.63%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 241183 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 25831 reads, or 9.67%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1155332 indexes in all the data.
## After reads/index pruning, there are: 210220 indexes: 945112 lost or 81.80%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 241183 changed reads.
## All data: before reads/index pruning, there are: 1854668 identical reads.
## All data: after index pruning, there are: 80470 changed reads: 33.36%.
## All data: after index pruning, there are: 711560 identical reads: 38.37%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 711560 identical reads.
## Before classification, there are 80470 reads with mutations.
## After classification, there are 609800 reads/indexes which are only identical.
## After classification, there are 4234 reads/indexes which are strictly sequencer.
## After classification, there are 9529 reads/indexes which are consistently repeated.
## Counted by direction: 1102061 forward reads and 1251091 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s11.
##   Reading the file containing mutations: preprocessing/s11/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s11/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 387777 reads.
##    Mutation data: after min-position pruning, there are: 376861 reads: 10916 lost or 2.82%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 376861 reads.
##    Mutation data: after max-position pruning, there are: 355520 reads: 21341 lost or 5.66%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 355520 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 32257 reads, or 8.32%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1305752 indexes in all the data.
## After reads/index pruning, there are: 255341 indexes: 1050411 lost or 80.44%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 355520 changed reads.
## All data: before reads/index pruning, there are: 2083008 identical reads.
## All data: after index pruning, there are: 125781 changed reads: 35.38%.
## All data: after index pruning, there are: 833241 identical reads: 40.00%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 833241 identical reads.
## Before classification, there are 125781 reads with mutations.
## After classification, there are 709409 reads/indexes which are only identical.
## After classification, there are 4759 reads/indexes which are strictly sequencer.
## After classification, there are 35770 reads/indexes which are consistently repeated.
## Counted by direction: 1343231 forward reads and 1445534 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s14.
##   Reading the file containing mutations: preprocessing/s14/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s14/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 1504677 reads.
##    Mutation data: after min-position pruning, there are: 1487621 reads: 17056 lost or 1.13%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 1487621 reads.
##    Mutation data: after max-position pruning, there are: 594257 reads: 893364 lost or 60.05%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 592951 reads: 1306 lost or 0.22%.
##   Mutation data: all filters removed 911726 reads, or 60.59%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1460860 indexes in all the data.
## After reads/index pruning, there are: 148473 indexes: 1312387 lost or 89.84%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 592951 changed reads.
## All data: before reads/index pruning, there are: 1841367 identical reads.
## All data: after index pruning, there are: 127239 changed reads: 21.46%.
## All data: after index pruning, there are: 437737 identical reads: 23.77%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 437737 identical reads.
## Before classification, there are 127239 reads with mutations.
## After classification, there are 346512 reads/indexes which are only identical.
## After classification, there are 1769 reads/indexes which are strictly sequencer.
## After classification, there are 26548 reads/indexes which are consistently repeated.
## Counted by direction: 616448 forward reads and 666447 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s12.
##   Reading the file containing mutations: preprocessing/s12/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s12/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 558061 reads.
##    Mutation data: after min-position pruning, there are: 545764 reads: 12297 lost or 2.20%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 545764 reads.
##    Mutation data: after max-position pruning, there are: 516128 reads: 29636 lost or 5.43%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 516128 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 41933 reads, or 7.51%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1223666 indexes in all the data.
## After reads/index pruning, there are: 254006 indexes: 969660 lost or 79.24%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 516128 changed reads.
## All data: before reads/index pruning, there are: 1855921 identical reads.
## All data: after index pruning, there are: 200673 changed reads: 38.88%.
## All data: after index pruning, there are: 771535 identical reads: 41.57%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 771535 identical reads.
## Before classification, there are 200673 reads with mutations.
## After classification, there are 644720 reads/indexes which are only identical.
## After classification, there are 5123 reads/indexes which are strictly sequencer.
## After classification, there are 103952 reads/indexes which are consistently repeated.
## Counted by direction: 1310889 forward reads and 1452706 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s16.
##   Reading the file containing mutations: preprocessing/s16/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s16/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 3156347 reads.
##    Mutation data: after min-position pruning, there are: 3124938 reads: 31409 lost or 1.00%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 3124938 reads.
##    Mutation data: after max-position pruning, there are: 1167013 reads: 1957925 lost or 62.65%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 1165064 reads: 1949 lost or 0.17%.
##   Mutation data: all filters removed 1991283 reads, or 63.09%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1971830 indexes in all the data.
## After reads/index pruning, there are: 386564 indexes: 1585266 lost or 80.40%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 1165064 changed reads.
## All data: before reads/index pruning, there are: 2817590 identical reads.
## All data: after index pruning, there are: 433242 changed reads: 37.19%.
## All data: after index pruning, there are: 1165354 identical reads: 41.36%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 1165354 identical reads.
## Before classification, there are 433242 reads with mutations.
## After classification, there are 884532 reads/indexes which are only identical.
## After classification, there are 13764 reads/indexes which are strictly sequencer.
## After classification, there are 151734 reads/indexes which are consistently repeated.
## Counted by direction: 1900567 forward reads and 1934815 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Warning in melt.data.table(data = mtrx, value.name = "norm", id.vars = "names"):
## 'measure.vars' [s10, s11, s14, s12, ...] are not all of the same type. By
## order of hierarchy, the molten data value column will be of type 'double'. All
## measure variables not of type 'double' will be coerced too. Check DETAILS in ?
## melt.data.table for more on coercion.
## Warning in melt.data.table(data = mtrx, value.name = "norm", id.vars = "names"):
## 'measure.vars' [s10, s11, s14, s12, ...] are not all of the same type. By
## order of hierarchy, the molten data value column will be of type 'double'. All
## measure variables not of type 'double' will be coerced too. Check DETAILS in ?
## melt.data.table for more on coercion.

## Warning in melt.data.table(data = mtrx, value.name = "norm", id.vars = "names"):
## 'measure.vars' [s10, s11, s14, s12, ...] are not all of the same type. By
## order of hierarchy, the molten data value column will be of type 'double'. All
## measure variables not of type 'double' will be coerced too. Check DETAILS in ?
## melt.data.table for more on coercion.
##   Writing a legend.
## Warning in rbind(c("Parameter", "Purpose", "Setting"), c("min_reads", "Minimum
## number of reads for an index to be deemed 'real'", : number of columns of result
## is not a multiple of vector length (arg 2)
## Plotting Index density for mutant reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale
## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for identical reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for all reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for mutant reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for identical reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for all reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
##   Writing raw data.
##   Writing cpm data.
##   Writing data normalized by reads/indexes.
##   Writing data normalized by reads/indexes and length.
##   Writing data normalized by cpm(reads/indexes) and length.
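Since the point of these matrices is error-rate calculation, the classification counts already suggest a crude per-read proportion. A hedged sketch using the s16 counts from the log above; this is an illustrative ratio only, not necessarily the rate create_matrices() itself reports:

```r
# Classification counts for s16, copied from the verbose log.
identical_only <- 884532
strictly_sequencer <- 13764
consistently_repeated <- 151734
total <- identical_only + strictly_sequencer + consistently_repeated
# Fraction of classified reads/indexes attributed to sequencer error
# versus consistently repeated (putatively real) mutations.
c(sequencer = strictly_sequencer / total,
  repeated = consistently_repeated / total)
```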
Repeat once more with a stricter cap of five mutations per read, again passing the cap through to create_matrices().

max_mutations_per_read <- 5
excel <- glue::glue("excel/{rundate}_triples_fivempr-v{ver}.xlsx")
written <- create_matrices(sample_sheet=sample_sheet,
                           ident_column=ident_column, mut_column=mut_column,
                           min_reads=min_reads, min_indexes=min_indexes,
                           min_sequencer=min_sequencer,
                           min_position=min_position, max_position=max_position,
                           max_mutations_per_read=max_mutations_per_read,
                           prune_n=prune_n, verbose=verbose, excel=excel)
## Starting sample: s10.
##   Reading the file containing mutations: preprocessing/s10/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s10/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 267014 reads.
##    Mutation data: after min-position pruning, there are: 258305 reads: 8709 lost or 3.26%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 258305 reads.
##    Mutation data: after max-position pruning, there are: 241183 reads: 17122 lost or 6.63%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 241183 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 25831 reads, or 9.67%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1155332 indexes in all the data.
## After reads/index pruning, there are: 210220 indexes: 945112 lost or 81.80%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 241183 changed reads.
## All data: before reads/index pruning, there are: 1854668 identical reads.
## All data: after index pruning, there are: 80470 changed reads: 33.36%.
## All data: after index pruning, there are: 711560 identical reads: 38.37%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 711560 identical reads.
## Before classification, there are 80470 reads with mutations.
## After classification, there are 609800 reads/indexes which are only identical.
## After classification, there are 4234 reads/indexes which are strictly sequencer.
## After classification, there are 9529 reads/indexes which are consistently repeated.
## Counted by direction: 1102061 forward reads and 1251091 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s11.
##   Reading the file containing mutations: preprocessing/s11/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s11/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 387777 reads.
##    Mutation data: after min-position pruning, there are: 376861 reads: 10916 lost or 2.82%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 376861 reads.
##    Mutation data: after max-position pruning, there are: 355520 reads: 21341 lost or 5.66%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 355520 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 32257 reads, or 8.32%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1305752 indexes in all the data.
## After reads/index pruning, there are: 255341 indexes: 1050411 lost or 80.44%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 355520 changed reads.
## All data: before reads/index pruning, there are: 2083008 identical reads.
## All data: after index pruning, there are: 125781 changed reads: 35.38%.
## All data: after index pruning, there are: 833241 identical reads: 40.00%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 833241 identical reads.
## Before classification, there are 125781 reads with mutations.
## After classification, there are 709409 reads/indexes which are only identical.
## After classification, there are 4759 reads/indexes which are strictly sequencer.
## After classification, there are 35770 reads/indexes which are consistently repeated.
## Counted by direction: 1343231 forward reads and 1445534 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s14.
##   Reading the file containing mutations: preprocessing/s14/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s14/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 1504677 reads.
##    Mutation data: after min-position pruning, there are: 1487621 reads: 17056 lost or 1.13%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 1487621 reads.
##    Mutation data: after max-position pruning, there are: 594257 reads: 893364 lost or 60.05%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 592951 reads: 1306 lost or 0.22%.
##   Mutation data: all filters removed 911726 reads, or 60.59%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1460860 indexes in all the data.
## After reads/index pruning, there are: 148473 indexes: 1312387 lost or 89.84%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 592951 changed reads.
## All data: before reads/index pruning, there are: 1841367 identical reads.
## All data: after index pruning, there are: 127239 changed reads: 21.46%.
## All data: after index pruning, there are: 437737 identical reads: 23.77%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 437737 identical reads.
## Before classification, there are 127239 reads with mutations.
## After classification, there are 346512 reads/indexes which are only identical.
## After classification, there are 1769 reads/indexes which are strictly sequencer.
## After classification, there are 26548 reads/indexes which are consistently repeated.
## Counted by direction: 616448 forward reads and 666447 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s12.
##   Reading the file containing mutations: preprocessing/s12/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s12/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 558061 reads.
##    Mutation data: after min-position pruning, there are: 545764 reads: 12297 lost or 2.20%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 545764 reads.
##    Mutation data: after max-position pruning, there are: 516128 reads: 29636 lost or 5.43%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 516128 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 41933 reads, or 7.51%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1223666 indexes in all the data.
## After reads/index pruning, there are: 254006 indexes: 969660 lost or 79.24%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 516128 changed reads.
## All data: before reads/index pruning, there are: 1855921 identical reads.
## All data: after index pruning, there are: 200673 changed reads: 38.88%.
## All data: after index pruning, there are: 771535 identical reads: 41.57%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 771535 identical reads.
## Before classification, there are 200673 reads with mutations.
## After classification, there are 644720 reads/indexes which are only identical.
## After classification, there are 5123 reads/indexes which are strictly sequencer.
## After classification, there are 103952 reads/indexes which are consistently repeated.
## Counted by direction: 1310889 forward reads and 1452706 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s16.
##   Reading the file containing mutations: preprocessing/s16/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s16/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 3156347 reads.
##    Mutation data: after min-position pruning, there are: 3124938 reads: 31409 lost or 1.00%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 3124938 reads.
##    Mutation data: after max-position pruning, there are: 1167013 reads: 1957925 lost or 62.65%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 1165064 reads: 1949 lost or 0.17%.
##   Mutation data: all filters removed 1991283 reads, or 63.09%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1971830 indexes in all the data.
## After reads/index pruning, there are: 386564 indexes: 1585266 lost or 80.40%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 1165064 changed reads.
## All data: before reads/index pruning, there are: 2817590 identical reads.
## All data: after index pruning, there are: 433242 changed reads: 37.19%.
## All data: after index pruning, there are: 1165354 identical reads: 41.36%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 1165354 identical reads.
## Before classification, there are 433242 reads with mutations.
## After classification, there are 884532 reads/indexes which are only identical.
## After classification, there are 13764 reads/indexes which are strictly sequencer.
## After classification, there are 151734 reads/indexes which are consistently repeated.
## Counted by direction: 1900567 forward reads and 1934815 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Warning in melt.data.table(data = mtrx, value.name = "norm", id.vars = "names"):
## 'measure.vars' [s10, s11, s14, s12, ...] are not all of the same type. By
## order of hierarchy, the molten data value column will be of type 'double'. All
## measure variables not of type 'double' will be coerced too. Check DETAILS in ?
## melt.data.table for more on coercion.
## Warning in melt.data.table(data = mtrx, value.name = "norm", id.vars = "names"):
## 'measure.vars' [s10, s11, s14, s12, ...] are not all of the same type. By
## order of hierarchy, the molten data value column will be of type 'double'. All
## measure variables not of type 'double' will be coerced too. Check DETAILS in ?
## melt.data.table for more on coercion.

## Warning in melt.data.table(data = mtrx, value.name = "norm", id.vars = "names"):
## 'measure.vars' [s10, s11, s14, s12, ...] are not all of the same type. By
## order of hierarchy, the molten data value column will be of type 'double'. All
## measure variables not of type 'double' will be coerced too. Check DETAILS in ?
## melt.data.table for more on coercion.
##   Writing a legend.
## Warning in rbind(c("Parameter", "Purpose", "Setting"), c("min_reads", "Minimum
## number of reads for an index to be deemed 'real'", : number of columns of result
## is not a multiple of vector length (arg 2)
## Plotting Index density for mutant reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale
## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for identical reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for all reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for mutant reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for identical reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for all reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
##   Writing raw data.
##   Writing cpm data.
##   Writing data normalized by reads/indexes.
##   Writing data normalized by reads/indexes and length.
##   Writing data normalized by cpm(reads/indexes) and length.

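Every per-sample log above follows the same filtering cascade: min-position pruning, max-position pruning, optional 'N' pruning, then reads/index pruning. As a rough sketch of that cascade (this is not the actual Rerrrt internals; the table and column names `muts`, `position`, and `hit` are hypothetical), the steps amount to:

```r
## Hypothetical sketch of the pruning cascade reported in the logs above.
## 'muts' stands in for the mutation table; column names are made up.
prune_mutations <- function(muts, min_position = 22, max_position = 185,
                            prune_n = TRUE) {
  start <- nrow(muts)
  muts <- muts[muts[["position"]] >= min_position, ]  ## min-position pruning
  muts <- muts[muts[["position"]] <= max_position, ]  ## max-position pruning
  if (isTRUE(prune_n)) {
    muts <- muts[muts[["hit"]] != "N", ]              ## drop reads with 'N' hits
  }
  lost <- start - nrow(muts)
  message(sprintf("all filters removed %d reads, or %.2f%%.",
                  lost, 100 * lost / start))
  muts
}
```

The reads/index pruning then keeps only indexes observed at least `min_reads` times before the identical/sequencer/repeated classification is made.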
1.2 Categorize the data with at least 5 indexes per mutant

min_indexes <- 5
max_mutations_per_read <- NULL
sample_sheet <- "sample_sheets/recent_samples_2020.xlsx"
excel <- glue::glue("excel/{rundate}_recent_samples_quints-v{ver}.xlsx")
written <- create_matrices(sample_sheet=sample_sheet,
                           ident_column=ident_column, mut_column=mut_column,
                           min_reads=min_reads, min_indexes=min_indexes,
                           min_sequencer=min_sequencer,
                           min_position=min_position, max_position=max_position,
                           prune_n=prune_n, verbose=verbose, excel=excel)
## Starting sample: s07.
##   Reading the file containing mutations: preprocessing/s07/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s07/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 344009 reads.
##    Mutation data: after min-position pruning, there are: 332034 reads: 11975 lost or 3.48%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 332034 reads.
##    Mutation data: after max-position pruning, there are: 309791 reads: 22243 lost or 6.70%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 309791 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 34218 reads, or 9.95%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1392027 indexes in all the data.
## After reads/index pruning, there are: 258515 indexes: 1133512 lost or 81.43%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 309791 changed reads.
## All data: before reads/index pruning, there are: 2240589 identical reads.
## All data: after index pruning, there are: 106390 changed reads: 34.34%.
## All data: after index pruning, there are: 870351 identical reads: 38.84%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 870351 identical reads.
## Before classification, there are 106390 reads with mutations.
## After classification, there are 741425 reads/indexes which are only identical.
## After classification, there are 5422 reads/indexes which are strictly sequencer.
## After classification, there are 11996 reads/indexes which are consistently repeated.
## Counted by direction: 1337121 forward reads and 1516515 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s17.
##   Reading the file containing mutations: preprocessing/s17/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s17/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 2004637 reads.
##    Mutation data: after min-position pruning, there are: 1978823 reads: 25814 lost or 1.29%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 1978823 reads.
##    Mutation data: after max-position pruning, there are: 838424 reads: 1140399 lost or 57.63%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 836843 reads: 1581 lost or 0.19%.
##   Mutation data: all filters removed 1167794 reads, or 58.25%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1673173 indexes in all the data.
## After reads/index pruning, there are: 302181 indexes: 1370992 lost or 81.94%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 836843 changed reads.
## All data: before reads/index pruning, there are: 2472166 identical reads.
## All data: after index pruning, there are: 286196 changed reads: 34.20%.
## All data: after index pruning, there are: 962051 identical reads: 38.92%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 962051 identical reads.
## Before classification, there are 286196 reads with mutations.
## After classification, there are 716862 reads/indexes which are only identical.
## After classification, there are 10570 reads/indexes which are strictly sequencer.
## After classification, there are 30217 reads/indexes which are consistently repeated.
## Counted by direction: 1339101 forward reads and 1413584 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s08.
##   Reading the file containing mutations: preprocessing/s08/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s08/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 418833 reads.
##    Mutation data: after min-position pruning, there are: 407418 reads: 11415 lost or 2.73%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 407418 reads.
##    Mutation data: after max-position pruning, there are: 382773 reads: 24645 lost or 6.05%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 382773 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 36060 reads, or 8.61%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1949558 indexes in all the data.
## After reads/index pruning, there are: 177278 indexes: 1772280 lost or 90.91%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 382773 changed reads.
## All data: before reads/index pruning, there are: 2566212 identical reads.
## All data: after index pruning, there are: 78228 changed reads: 20.44%.
## All data: after index pruning, there are: 540224 identical reads: 21.05%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 540224 identical reads.
## Before classification, there are 78228 reads with mutations.
## After classification, there are 468416 reads/indexes which are only identical.
## After classification, there are 1067 reads/indexes which are strictly sequencer.
## After classification, there are 20032 reads/indexes which are consistently repeated.
## Counted by direction: 770967 forward reads and 901245 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s13.
##   Reading the file containing mutations: preprocessing/s13/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s13/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 2902085 reads.
##    Mutation data: after min-position pruning, there are: 2869333 reads: 32752 lost or 1.13%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 2869333 reads.
##    Mutation data: after max-position pruning, there are: 1150787 reads: 1718546 lost or 59.89%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 1148128 reads: 2659 lost or 0.23%.
##   Mutation data: all filters removed 1753957 reads, or 60.44%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 2989249 indexes in all the data.
## After reads/index pruning, there are: 300278 indexes: 2688971 lost or 89.95%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 1148128 changed reads.
## All data: before reads/index pruning, there are: 3778131 identical reads.
## All data: after index pruning, there are: 242985 changed reads: 21.16%.
## All data: after index pruning, there are: 902374 identical reads: 23.88%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 902374 identical reads.
## Before classification, there are 242985 reads with mutations.
## After classification, there are 713715 reads/indexes which are only identical.
## After classification, there are 4901 reads/indexes which are strictly sequencer.
## After classification, there are 39399 reads/indexes which are consistently repeated.
## Counted by direction: 1283592 forward reads and 1344612 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s09.
##   Reading the file containing mutations: preprocessing/s09/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s09/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 548445 reads.
##    Mutation data: after min-position pruning, there are: 535384 reads: 13061 lost or 2.38%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 535384 reads.
##    Mutation data: after max-position pruning, there are: 510657 reads: 24727 lost or 4.62%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 510657 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 37788 reads, or 6.89%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1828143 indexes in all the data.
## After reads/index pruning, there are: 212279 indexes: 1615864 lost or 88.39%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 510657 changed reads.
## All data: before reads/index pruning, there are: 2421460 identical reads.
## All data: after index pruning, there are: 128266 changed reads: 25.12%.
## All data: after index pruning, there are: 629763 identical reads: 26.01%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 629763 identical reads.
## Before classification, there are 128266 reads with mutations.
## After classification, there are 543040 reads/indexes which are only identical.
## After classification, there are 1746 reads/indexes which are strictly sequencer.
## After classification, there are 58224 reads/indexes which are consistently repeated.
## Counted by direction: 984786 forward reads and 1113674 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s15.
##   Reading the file containing mutations: preprocessing/s15/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s15/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 2673515 reads.
##    Mutation data: after min-position pruning, there are: 2647455 reads: 26060 lost or 0.97%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 2647455 reads.
##    Mutation data: after max-position pruning, there are: 991764 reads: 1655691 lost or 62.54%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 989845 reads: 1919 lost or 0.19%.
##   Mutation data: all filters removed 1683670 reads, or 62.98%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 2141654 indexes in all the data.
## After reads/index pruning, there are: 184062 indexes: 1957592 lost or 91.41%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 989845 changed reads.
## All data: before reads/index pruning, there are: 2456860 identical reads.
## All data: after index pruning, there are: 203622 changed reads: 20.57%.
## All data: after index pruning, there are: 527371 identical reads: 21.47%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 527371 identical reads.
## Before classification, there are 203622 reads with mutations.
## After classification, there are 409128 reads/indexes which are only identical.
## After classification, there are 4209 reads/indexes which are strictly sequencer.
## After classification, there are 77986 reads/indexes which are consistently repeated.
## Counted by direction: 829602 forward reads and 883735 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s10.
##   Reading the file containing mutations: preprocessing/s10/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s10/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 267014 reads.
##    Mutation data: after min-position pruning, there are: 258305 reads: 8709 lost or 3.26%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 258305 reads.
##    Mutation data: after max-position pruning, there are: 241183 reads: 17122 lost or 6.63%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 241183 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 25831 reads, or 9.67%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1155332 indexes in all the data.
## After reads/index pruning, there are: 210220 indexes: 945112 lost or 81.80%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 241183 changed reads.
## All data: before reads/index pruning, there are: 1854668 identical reads.
## All data: after index pruning, there are: 80470 changed reads: 33.36%.
## All data: after index pruning, there are: 711560 identical reads: 38.37%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 711560 identical reads.
## Before classification, there are 80470 reads with mutations.
## After classification, there are 609800 reads/indexes which are only identical.
## After classification, there are 4234 reads/indexes which are strictly sequencer.
## After classification, there are 9529 reads/indexes which are consistently repeated.
## Counted by direction: 1102061 forward reads and 1251091 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s11.
##   Reading the file containing mutations: preprocessing/s11/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s11/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 387777 reads.
##    Mutation data: after min-position pruning, there are: 376861 reads: 10916 lost or 2.82%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 376861 reads.
##    Mutation data: after max-position pruning, there are: 355520 reads: 21341 lost or 5.66%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 355520 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 32257 reads, or 8.32%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1305752 indexes in all the data.
## After reads/index pruning, there are: 255341 indexes: 1050411 lost or 80.44%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 355520 changed reads.
## All data: before reads/index pruning, there are: 2083008 identical reads.
## All data: after index pruning, there are: 125781 changed reads: 35.38%.
## All data: after index pruning, there are: 833241 identical reads: 40.00%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 833241 identical reads.
## Before classification, there are 125781 reads with mutations.
## After classification, there are 709409 reads/indexes which are only identical.
## After classification, there are 4759 reads/indexes which are strictly sequencer.
## After classification, there are 35770 reads/indexes which are consistently repeated.
## Counted by direction: 1343231 forward reads and 1445534 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s14.
##   Reading the file containing mutations: preprocessing/s14/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s14/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 1504677 reads.
##    Mutation data: after min-position pruning, there are: 1487621 reads: 17056 lost or 1.13%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 1487621 reads.
##    Mutation data: after max-position pruning, there are: 594257 reads: 893364 lost or 60.05%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 592951 reads: 1306 lost or 0.22%.
##   Mutation data: all filters removed 911726 reads, or 60.59%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1460860 indexes in all the data.
## After reads/index pruning, there are: 148473 indexes: 1312387 lost or 89.84%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 592951 changed reads.
## All data: before reads/index pruning, there are: 1841367 identical reads.
## All data: after index pruning, there are: 127239 changed reads: 21.46%.
## All data: after index pruning, there are: 437737 identical reads: 23.77%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 437737 identical reads.
## Before classification, there are 127239 reads with mutations.
## After classification, there are 346512 reads/indexes which are only identical.
## After classification, there are 1769 reads/indexes which are strictly sequencer.
## After classification, there are 26548 reads/indexes which are consistently repeated.
## Counted by direction: 616448 forward reads and 666447 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s12.
##   Reading the file containing mutations: preprocessing/s12/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s12/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 558061 reads.
##    Mutation data: after min-position pruning, there are: 545764 reads: 12297 lost or 2.20%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 545764 reads.
##    Mutation data: after max-position pruning, there are: 516128 reads: 29636 lost or 5.43%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 516128 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 41933 reads, or 7.51%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1223666 indexes in all the data.
## After reads/index pruning, there are: 254006 indexes: 969660 lost or 79.24%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 516128 changed reads.
## All data: before reads/index pruning, there are: 1855921 identical reads.
## All data: after index pruning, there are: 200673 changed reads: 38.88%.
## All data: after index pruning, there are: 771535 identical reads: 41.57%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 771535 identical reads.
## Before classification, there are 200673 reads with mutations.
## After classification, there are 644720 reads/indexes which are only identical.
## After classification, there are 5123 reads/indexes which are strictly sequencer.
## After classification, there are 103952 reads/indexes which are consistently repeated.
## Counted by direction: 1310889 forward reads and 1452706 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s16.
##   Reading the file containing mutations: preprocessing/s16/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s16/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 3156347 reads.
##    Mutation data: after min-position pruning, there are: 3124938 reads: 31409 lost or 1.00%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 3124938 reads.
##    Mutation data: after max-position pruning, there are: 1167013 reads: 1957925 lost or 62.65%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 1165064 reads: 1949 lost or 0.17%.
##   Mutation data: all filters removed 1991283 reads, or 63.09%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1971830 indexes in all the data.
## After reads/index pruning, there are: 386564 indexes: 1585266 lost or 80.40%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 1165064 changed reads.
## All data: before reads/index pruning, there are: 2817590 identical reads.
## All data: after index pruning, there are: 433242 changed reads: 37.19%.
## All data: after index pruning, there are: 1165354 identical reads: 41.36%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 1165354 identical reads.
## Before classification, there are 433242 reads with mutations.
## After classification, there are 884532 reads/indexes which are only identical.
## After classification, there are 13764 reads/indexes which are strictly sequencer.
## After classification, there are 151734 reads/indexes which are consistently repeated.
## Counted by direction: 1900567 forward reads and 1934815 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Warning in melt.data.table(data = mtrx, value.name = "norm", id.vars = "names"):
## 'measure.vars' [s07, s17, s08, s13, ...] are not all of the same type. By
## order of hierarchy, the molten data value column will be of type 'double'. All
## measure variables not of type 'double' will be coerced too. Check DETAILS in ?
## melt.data.table for more on coercion.

## Warning in melt.data.table(data = mtrx, value.name = "norm", id.vars = "names"):
## 'measure.vars' [s07, s17, s08, s13, ...] are not all of the same type. By
## order of hierarchy, the molten data value column will be of type 'double'. All
## measure variables not of type 'double' will be coerced too. Check DETAILS in ?
## melt.data.table for more on coercion.

## Warning in melt.data.table(data = mtrx, value.name = "norm", id.vars = "names"):
## 'measure.vars' [s07, s17, s08, s13, ...] are not all of the same type. By
## order of hierarchy, the molten data value column will be of type 'double'. All
## measure variables not of type 'double' will be coerced too. Check DETAILS in ?
## melt.data.table for more on coercion.
##   Writing a legend.
## Warning in rbind(c("Parameter", "Purpose", "Setting"), c("min_reads", "Minimum
## number of reads for an index to be deemed 'real'", : number of columns of result
## is not a multiple of vector length (arg 2)
## Plotting Index density for mutant reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale
## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for identical reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for all reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for mutant reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for identical reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for all reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
##   Writing raw data.
##   Writing cpm data.
##   Writing data normalized by reads/indexes.
##   Writing data normalized by reads/indexes and length.
##   Writing data normalized by cpm(reads/indexes) and length.
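The "lost or N%" figures in these logs are easy to recompute as a sanity check; for example, s07's min-position step (11975 of 344009 reads removed):

```r
## Recompute one of the logged loss rates (s07, min-position pruning).
before <- 344009
after <- 332034
lost <- before - after            ## 11975 reads removed
pct <- 100 * lost / before        ## fraction of the starting reads
sprintf("%d lost or %.2f%%", lost, pct)
## [1] "11975 lost or 3.48%"
```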
Repeat the 5-index analysis, but this time cap the number of mutations allowed per read at 10.

max_mutations_per_read <- 10
excel <- glue::glue("excel/{rundate}_recent_samples_quints_tenmpr-v{ver}.xlsx")
written <- create_matrices(sample_sheet=sample_sheet,
                           ident_column=ident_column, mut_column=mut_column,
                           min_reads=min_reads, min_indexes=min_indexes,
                           min_sequencer=min_sequencer,
                           min_position=min_position, max_position=max_position,
                           max_mutations_per_read=max_mutations_per_read,
                           prune_n=prune_n, verbose=verbose, excel=excel)
## Starting sample: s07.
##   Reading the file containing mutations: preprocessing/s07/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s07/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 344009 reads.
##    Mutation data: after min-position pruning, there are: 332034 reads: 11975 lost or 3.48%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 332034 reads.
##    Mutation data: after max-position pruning, there are: 309791 reads: 22243 lost or 6.70%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 309791 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 34218 reads, or 9.95%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1392027 indexes in all the data.
## After reads/index pruning, there are: 258515 indexes: 1133512 lost or 81.43%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 309791 changed reads.
## All data: before reads/index pruning, there are: 2240589 identical reads.
## All data: after index pruning, there are: 106390 changed reads: 34.34%.
## All data: after index pruning, there are: 870351 identical reads: 38.84%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 870351 identical reads.
## Before classification, there are 106390 reads with mutations.
## After classification, there are 741425 reads/indexes which are only identical.
## After classification, there are 5422 reads/indexes which are strictly sequencer.
## After classification, there are 11996 reads/indexes which are consistently repeated.
## Counted by direction: 1337121 forward reads and 1516515 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s17.
##   Reading the file containing mutations: preprocessing/s17/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s17/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 2004637 reads.
##    Mutation data: after min-position pruning, there are: 1978823 reads: 25814 lost or 1.29%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 1978823 reads.
##    Mutation data: after max-position pruning, there are: 838424 reads: 1140399 lost or 57.63%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 836843 reads: 1581 lost or 0.19%.
##   Mutation data: all filters removed 1167794 reads, or 58.25%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1673173 indexes in all the data.
## After reads/index pruning, there are: 302181 indexes: 1370992 lost or 81.94%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 836843 changed reads.
## All data: before reads/index pruning, there are: 2472166 identical reads.
## All data: after index pruning, there are: 286196 changed reads: 34.20%.
## All data: after index pruning, there are: 962051 identical reads: 38.92%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 962051 identical reads.
## Before classification, there are 286196 reads with mutations.
## After classification, there are 716862 reads/indexes which are only identical.
## After classification, there are 10570 reads/indexes which are strictly sequencer.
## After classification, there are 30217 reads/indexes which are consistently repeated.
## Counted by direction: 1339101 forward reads and 1413584 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s08.
##   Reading the file containing mutations: preprocessing/s08/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s08/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 418833 reads.
##    Mutation data: after min-position pruning, there are: 407418 reads: 11415 lost or 2.73%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 407418 reads.
##    Mutation data: after max-position pruning, there are: 382773 reads: 24645 lost or 6.05%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 382773 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 36060 reads, or 8.61%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1949558 indexes in all the data.
## After reads/index pruning, there are: 177278 indexes: 1772280 lost or 90.91%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 382773 changed reads.
## All data: before reads/index pruning, there are: 2566212 identical reads.
## All data: after index pruning, there are: 78228 changed reads: 20.44%.
## All data: after index pruning, there are: 540224 identical reads: 21.05%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 540224 identical reads.
## Before classification, there are 78228 reads with mutations.
## After classification, there are 468416 reads/indexes which are only identical.
## After classification, there are 1067 reads/indexes which are strictly sequencer.
## After classification, there are 20032 reads/indexes which are consistently repeated.
## Counted by direction: 770967 forward reads and 901245 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s13.
##   Reading the file containing mutations: preprocessing/s13/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s13/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 2902085 reads.
##    Mutation data: after min-position pruning, there are: 2869333 reads: 32752 lost or 1.13%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 2869333 reads.
##    Mutation data: after max-position pruning, there are: 1150787 reads: 1718546 lost or 59.89%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 1148128 reads: 2659 lost or 0.23%.
##   Mutation data: all filters removed 1753957 reads, or 60.44%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 2989249 indexes in all the data.
## After reads/index pruning, there are: 300278 indexes: 2688971 lost or 89.95%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 1148128 changed reads.
## All data: before reads/index pruning, there are: 3778131 identical reads.
## All data: after index pruning, there are: 242985 changed reads: 21.16%.
## All data: after index pruning, there are: 902374 identical reads: 23.88%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 902374 identical reads.
## Before classification, there are 242985 reads with mutations.
## After classification, there are 713715 reads/indexes which are only identical.
## After classification, there are 4901 reads/indexes which are strictly sequencer.
## After classification, there are 39399 reads/indexes which are consistently repeated.
## Counted by direction: 1283592 forward reads and 1344612 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s09.
##   Reading the file containing mutations: preprocessing/s09/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s09/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 548445 reads.
##    Mutation data: after min-position pruning, there are: 535384 reads: 13061 lost or 2.38%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 535384 reads.
##    Mutation data: after max-position pruning, there are: 510657 reads: 24727 lost or 4.62%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 510657 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 37788 reads, or 6.89%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1828143 indexes in all the data.
## After reads/index pruning, there are: 212279 indexes: 1615864 lost or 88.39%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 510657 changed reads.
## All data: before reads/index pruning, there are: 2421460 identical reads.
## All data: after index pruning, there are: 128266 changed reads: 25.12%.
## All data: after index pruning, there are: 629763 identical reads: 26.01%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 629763 identical reads.
## Before classification, there are 128266 reads with mutations.
## After classification, there are 543040 reads/indexes which are only identical.
## After classification, there are 1746 reads/indexes which are strictly sequencer.
## After classification, there are 58224 reads/indexes which are consistently repeated.
## Counted by direction: 984786 forward reads and 1113674 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s15.
##   Reading the file containing mutations: preprocessing/s15/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s15/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 2673515 reads.
##    Mutation data: after min-position pruning, there are: 2647455 reads: 26060 lost or 0.97%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 2647455 reads.
##    Mutation data: after max-position pruning, there are: 991764 reads: 1655691 lost or 62.54%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 989845 reads: 1919 lost or 0.19%.
##   Mutation data: all filters removed 1683670 reads, or 62.98%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 2141654 indexes in all the data.
## After reads/index pruning, there are: 184062 indexes: 1957592 lost or 91.41%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 989845 changed reads.
## All data: before reads/index pruning, there are: 2456860 identical reads.
## All data: after index pruning, there are: 203622 changed reads: 20.57%.
## All data: after index pruning, there are: 527371 identical reads: 21.47%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 527371 identical reads.
## Before classification, there are 203622 reads with mutations.
## After classification, there are 409128 reads/indexes which are only identical.
## After classification, there are 4209 reads/indexes which are strictly sequencer.
## After classification, there are 77986 reads/indexes which are consistently repeated.
## Counted by direction: 829602 forward reads and 883735 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s10.
##   Reading the file containing mutations: preprocessing/s10/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s10/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 267014 reads.
##    Mutation data: after min-position pruning, there are: 258305 reads: 8709 lost or 3.26%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 258305 reads.
##    Mutation data: after max-position pruning, there are: 241183 reads: 17122 lost or 6.63%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 241183 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 25831 reads, or 9.67%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1155332 indexes in all the data.
## After reads/index pruning, there are: 210220 indexes: 945112 lost or 81.80%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 241183 changed reads.
## All data: before reads/index pruning, there are: 1854668 identical reads.
## All data: after index pruning, there are: 80470 changed reads: 33.36%.
## All data: after index pruning, there are: 711560 identical reads: 38.37%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 711560 identical reads.
## Before classification, there are 80470 reads with mutations.
## After classification, there are 609800 reads/indexes which are only identical.
## After classification, there are 4234 reads/indexes which are strictly sequencer.
## After classification, there are 9529 reads/indexes which are consistently repeated.
## Counted by direction: 1102061 forward reads and 1251091 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s11.
##   Reading the file containing mutations: preprocessing/s11/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s11/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 387777 reads.
##    Mutation data: after min-position pruning, there are: 376861 reads: 10916 lost or 2.82%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 376861 reads.
##    Mutation data: after max-position pruning, there are: 355520 reads: 21341 lost or 5.66%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 355520 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 32257 reads, or 8.32%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1305752 indexes in all the data.
## After reads/index pruning, there are: 255341 indexes: 1050411 lost or 80.44%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 355520 changed reads.
## All data: before reads/index pruning, there are: 2083008 identical reads.
## All data: after index pruning, there are: 125781 changed reads: 35.38%.
## All data: after index pruning, there are: 833241 identical reads: 40.00%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 833241 identical reads.
## Before classification, there are 125781 reads with mutations.
## After classification, there are 709409 reads/indexes which are only identical.
## After classification, there are 4759 reads/indexes which are strictly sequencer.
## After classification, there are 35770 reads/indexes which are consistently repeated.
## Counted by direction: 1343231 forward reads and 1445534 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s14.
##   Reading the file containing mutations: preprocessing/s14/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s14/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 1504677 reads.
##    Mutation data: after min-position pruning, there are: 1487621 reads: 17056 lost or 1.13%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 1487621 reads.
##    Mutation data: after max-position pruning, there are: 594257 reads: 893364 lost or 60.05%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 592951 reads: 1306 lost or 0.22%.
##   Mutation data: all filters removed 911726 reads, or 60.59%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1460860 indexes in all the data.
## After reads/index pruning, there are: 148473 indexes: 1312387 lost or 89.84%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 592951 changed reads.
## All data: before reads/index pruning, there are: 1841367 identical reads.
## All data: after index pruning, there are: 127239 changed reads: 21.46%.
## All data: after index pruning, there are: 437737 identical reads: 23.77%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 437737 identical reads.
## Before classification, there are 127239 reads with mutations.
## After classification, there are 346512 reads/indexes which are only identical.
## After classification, there are 1769 reads/indexes which are strictly sequencer.
## After classification, there are 26548 reads/indexes which are consistently repeated.
## Counted by direction: 616448 forward reads and 666447 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s12.
##   Reading the file containing mutations: preprocessing/s12/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s12/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 558061 reads.
##    Mutation data: after min-position pruning, there are: 545764 reads: 12297 lost or 2.20%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 545764 reads.
##    Mutation data: after max-position pruning, there are: 516128 reads: 29636 lost or 5.43%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 516128 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 41933 reads, or 7.51%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1223666 indexes in all the data.
## After reads/index pruning, there are: 254006 indexes: 969660 lost or 79.24%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 516128 changed reads.
## All data: before reads/index pruning, there are: 1855921 identical reads.
## All data: after index pruning, there are: 200673 changed reads: 38.88%.
## All data: after index pruning, there are: 771535 identical reads: 41.57%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 771535 identical reads.
## Before classification, there are 200673 reads with mutations.
## After classification, there are 644720 reads/indexes which are only identical.
## After classification, there are 5123 reads/indexes which are strictly sequencer.
## After classification, there are 103952 reads/indexes which are consistently repeated.
## Counted by direction: 1310889 forward reads and 1452706 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s16.
##   Reading the file containing mutations: preprocessing/s16/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s16/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 3156347 reads.
##    Mutation data: after min-position pruning, there are: 3124938 reads: 31409 lost or 1.00%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 3124938 reads.
##    Mutation data: after max-position pruning, there are: 1167013 reads: 1957925 lost or 62.65%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 1165064 reads: 1949 lost or 0.17%.
##   Mutation data: all filters removed 1991283 reads, or 63.09%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1971830 indexes in all the data.
## After reads/index pruning, there are: 386564 indexes: 1585266 lost or 80.40%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 1165064 changed reads.
## All data: before reads/index pruning, there are: 2817590 identical reads.
## All data: after index pruning, there are: 433242 changed reads: 37.19%.
## All data: after index pruning, there are: 1165354 identical reads: 41.36%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 1165354 identical reads.
## Before classification, there are 433242 reads with mutations.
## After classification, there are 884532 reads/indexes which are only identical.
## After classification, there are 13764 reads/indexes which are strictly sequencer.
## After classification, there are 151734 reads/indexes which are consistently repeated.
## Counted by direction: 1900567 forward reads and 1934815 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Warning in melt.data.table(data = mtrx, value.name = "norm", id.vars = "names"):
## 'measure.vars' [s07, s17, s08, s13, ...] are not all of the same type. By
## order of hierarchy, the molten data value column will be of type 'double'. All
## measure variables not of type 'double' will be coerced too. Check DETAILS in ?
## melt.data.table for more on coercion.
## Warning in melt.data.table(data = mtrx, value.name = "norm", id.vars = "names"):
## 'measure.vars' [s07, s17, s08, s13, ...] are not all of the same type. By
## order of hierarchy, the molten data value column will be of type 'double'. All
## measure variables not of type 'double' will be coerced too. Check DETAILS in ?
## melt.data.table for more on coercion.

## Warning in melt.data.table(data = mtrx, value.name = "norm", id.vars = "names"):
## 'measure.vars' [s07, s17, s08, s13, ...] are not all of the same type. By
## order of hierarchy, the molten data value column will be of type 'double'. All
## measure variables not of type 'double' will be coerced too. Check DETAILS in ?
## melt.data.table for more on coercion.
##   Writing a legend.
## Warning in rbind(c("Parameter", "Purpose", "Setting"), c("min_reads", "Minimum
## number of reads for an index to be deemed 'real'", : number of columns of result
## is not a multiple of vector length (arg 2)
## Plotting Index density for mutant reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale
## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for identical reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for all reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for mutant reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for identical reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for all reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
##   Writing raw data.
##   Writing cpm data.
##   Writing data normalized by reads/indexes.
##   Writing data normalized by reads/indexes and length.
##   Writing data normalized by cpm(reads/indexes) and length.
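The reads/index pruning step logged above keeps only molecular indexes backed by at least `min_reads` reads. A toy base-R sketch of that filter (made-up index strings, not Rerrrt's actual implementation):

```r
## Toy reads table: one row per read, tagged with its molecular index.
## The index strings here are invented for illustration.
reads <- data.frame(
  index = c("AAAA", "AAAA", "AAAA", "CCCC", "CCCC", "GGGG"),
  mutated = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE))

min_reads <- 3
## Count reads per index, then keep indexes seen in >= min_reads reads,
## mirroring "removing indexes with fewer than 3 reads/index" in the log.
per_index <- table(reads[["index"]])
kept <- names(per_index)[per_index >= min_reads]
filtered <- reads[reads[["index"]] %in% kept, ]
```

In this toy example only the "AAAA" index survives, dropping 3 of 6 reads; the log reports the analogous (much larger) attrition for the real data, e.g. roughly 80-91% of indexes lost per sample.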
max_mutations_per_read <- 5
excel <- glue::glue("excel/{rundate}_recent_samples_quints_fivempr-v{ver}.xlsx")
written <- create_matrices(sample_sheet=sample_sheet,
                           ident_column=ident_column, mut_column=mut_column,
                           min_reads=min_reads, min_indexes=min_indexes,
                           min_sequencer=min_sequencer,
                           min_position=min_position, max_position=max_position,
                           max_mutations_per_read=max_mutations_per_read,
                           prune_n=prune_n, verbose=verbose, excel=excel)
## Starting sample: s07.
##   Reading the file containing mutations: preprocessing/s07/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s07/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 344009 reads.
##    Mutation data: after min-position pruning, there are: 332034 reads: 11975 lost or 3.48%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 332034 reads.
##    Mutation data: after max-position pruning, there are: 309791 reads: 22243 lost or 6.70%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 309791 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 34218 reads, or 9.95%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1392027 indexes in all the data.
## After reads/index pruning, there are: 258515 indexes: 1133512 lost or 81.43%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 309791 changed reads.
## All data: before reads/index pruning, there are: 2240589 identical reads.
## All data: after index pruning, there are: 106390 changed reads: 34.34%.
## All data: after index pruning, there are: 870351 identical reads: 38.84%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 870351 identical reads.
## Before classification, there are 106390 reads with mutations.
## After classification, there are 741425 reads/indexes which are only identical.
## After classification, there are 5422 reads/indexes which are strictly sequencer.
## After classification, there are 11996 reads/indexes which are consistently repeated.
## Counted by direction: 1337121 forward reads and 1516515 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s17.
##   Reading the file containing mutations: preprocessing/s17/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s17/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 2004637 reads.
##    Mutation data: after min-position pruning, there are: 1978823 reads: 25814 lost or 1.29%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 1978823 reads.
##    Mutation data: after max-position pruning, there are: 838424 reads: 1140399 lost or 57.63%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 836843 reads: 1581 lost or 0.19%.
##   Mutation data: all filters removed 1167794 reads, or 58.25%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1673173 indexes in all the data.
## After reads/index pruning, there are: 302181 indexes: 1370992 lost or 81.94%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 836843 changed reads.
## All data: before reads/index pruning, there are: 2472166 identical reads.
## All data: after index pruning, there are: 286196 changed reads: 34.20%.
## All data: after index pruning, there are: 962051 identical reads: 38.92%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 962051 identical reads.
## Before classification, there are 286196 reads with mutations.
## After classification, there are 716862 reads/indexes which are only identical.
## After classification, there are 10570 reads/indexes which are strictly sequencer.
## After classification, there are 30217 reads/indexes which are consistently repeated.
## Counted by direction: 1339101 forward reads and 1413584 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s08.
##   Reading the file containing mutations: preprocessing/s08/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s08/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 418833 reads.
##    Mutation data: after min-position pruning, there are: 407418 reads: 11415 lost or 2.73%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 407418 reads.
##    Mutation data: after max-position pruning, there are: 382773 reads: 24645 lost or 6.05%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 382773 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 36060 reads, or 8.61%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1949558 indexes in all the data.
## After reads/index pruning, there are: 177278 indexes: 1772280 lost or 90.91%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 382773 changed reads.
## All data: before reads/index pruning, there are: 2566212 identical reads.
## All data: after index pruning, there are: 78228 changed reads: 20.44%.
## All data: after index pruning, there are: 540224 identical reads: 21.05%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 540224 identical reads.
## Before classification, there are 78228 reads with mutations.
## After classification, there are 468416 reads/indexes which are only identical.
## After classification, there are 1067 reads/indexes which are strictly sequencer.
## After classification, there are 20032 reads/indexes which are consistently repeated.
## Counted by direction: 770967 forward reads and 901245 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s13.
##   Reading the file containing mutations: preprocessing/s13/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s13/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 2902085 reads.
##    Mutation data: after min-position pruning, there are: 2869333 reads: 32752 lost or 1.13%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 2869333 reads.
##    Mutation data: after max-position pruning, there are: 1150787 reads: 1718546 lost or 59.89%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 1148128 reads: 2659 lost or 0.23%.
##   Mutation data: all filters removed 1753957 reads, or 60.44%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 2989249 indexes in all the data.
## After reads/index pruning, there are: 300278 indexes: 2688971 lost or 89.95%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 1148128 changed reads.
## All data: before reads/index pruning, there are: 3778131 identical reads.
## All data: after index pruning, there are: 242985 changed reads: 21.16%.
## All data: after index pruning, there are: 902374 identical reads: 23.88%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 902374 identical reads.
## Before classification, there are 242985 reads with mutations.
## After classification, there are 713715 reads/indexes which are only identical.
## After classification, there are 4901 reads/indexes which are strictly sequencer.
## After classification, there are 39399 reads/indexes which are consistently repeated.
## Counted by direction: 1283592 forward reads and 1344612 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s09.
##   Reading the file containing mutations: preprocessing/s09/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s09/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 548445 reads.
##    Mutation data: after min-position pruning, there are: 535384 reads: 13061 lost or 2.38%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 535384 reads.
##    Mutation data: after max-position pruning, there are: 510657 reads: 24727 lost or 4.62%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 510657 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 37788 reads, or 6.89%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1828143 indexes in all the data.
## After reads/index pruning, there are: 212279 indexes: 1615864 lost or 88.39%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 510657 changed reads.
## All data: before reads/index pruning, there are: 2421460 identical reads.
## All data: after index pruning, there are: 128266 changed reads: 25.12%.
## All data: after index pruning, there are: 629763 identical reads: 26.01%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 629763 identical reads.
## Before classification, there are 128266 reads with mutations.
## After classification, there are 543040 reads/indexes which are only identical.
## After classification, there are 1746 reads/indexes which are strictly sequencer.
## After classification, there are 58224 reads/indexes which are consistently repeated.
## Counted by direction: 984786 forward reads and 1113674 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s15.
##   Reading the file containing mutations: preprocessing/s15/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s15/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 2673515 reads.
##    Mutation data: after min-position pruning, there are: 2647455 reads: 26060 lost or 0.97%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 2647455 reads.
##    Mutation data: after max-position pruning, there are: 991764 reads: 1655691 lost or 62.54%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 989845 reads: 1919 lost or 0.19%.
##   Mutation data: all filters removed 1683670 reads, or 62.98%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 2141654 indexes in all the data.
## After reads/index pruning, there are: 184062 indexes: 1957592 lost or 91.41%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 989845 changed reads.
## All data: before reads/index pruning, there are: 2456860 identical reads.
## All data: after index pruning, there are: 203622 changed reads: 20.57%.
## All data: after index pruning, there are: 527371 identical reads: 21.47%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 527371 identical reads.
## Before classification, there are 203622 reads with mutations.
## After classification, there are 409128 reads/indexes which are only identical.
## After classification, there are 4209 reads/indexes which are strictly sequencer.
## After classification, there are 77986 reads/indexes which are consistently repeated.
## Counted by direction: 829602 forward reads and 883735 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s10.
##   Reading the file containing mutations: preprocessing/s10/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s10/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 267014 reads.
##    Mutation data: after min-position pruning, there are: 258305 reads: 8709 lost or 3.26%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 258305 reads.
##    Mutation data: after max-position pruning, there are: 241183 reads: 17122 lost or 6.63%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 241183 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 25831 reads, or 9.67%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1155332 indexes in all the data.
## After reads/index pruning, there are: 210220 indexes: 945112 lost or 81.80%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 241183 changed reads.
## All data: before reads/index pruning, there are: 1854668 identical reads.
## All data: after index pruning, there are: 80470 changed reads: 33.36%.
## All data: after index pruning, there are: 711560 identical reads: 38.37%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 711560 identical reads.
## Before classification, there are 80470 reads with mutations.
## After classification, there are 609800 reads/indexes which are only identical.
## After classification, there are 4234 reads/indexes which are strictly sequencer.
## After classification, there are 9529 reads/indexes which are consistently repeated.
## Counted by direction: 1102061 forward reads and 1251091 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s11.
##   Reading the file containing mutations: preprocessing/s11/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s11/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 387777 reads.
##    Mutation data: after min-position pruning, there are: 376861 reads: 10916 lost or 2.82%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 376861 reads.
##    Mutation data: after max-position pruning, there are: 355520 reads: 21341 lost or 5.66%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 355520 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 32257 reads, or 8.32%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1305752 indexes in all the data.
## After reads/index pruning, there are: 255341 indexes: 1050411 lost or 80.44%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 355520 changed reads.
## All data: before reads/index pruning, there are: 2083008 identical reads.
## All data: after index pruning, there are: 125781 changed reads: 35.38%.
## All data: after index pruning, there are: 833241 identical reads: 40.00%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 833241 identical reads.
## Before classification, there are 125781 reads with mutations.
## After classification, there are 709409 reads/indexes which are only identical.
## After classification, there are 4759 reads/indexes which are strictly sequencer.
## After classification, there are 35770 reads/indexes which are consistently repeated.
## Counted by direction: 1343231 forward reads and 1445534 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s14.
##   Reading the file containing mutations: preprocessing/s14/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s14/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 1504677 reads.
##    Mutation data: after min-position pruning, there are: 1487621 reads: 17056 lost or 1.13%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 1487621 reads.
##    Mutation data: after max-position pruning, there are: 594257 reads: 893364 lost or 60.05%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 592951 reads: 1306 lost or 0.22%.
##   Mutation data: all filters removed 911726 reads, or 60.59%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1460860 indexes in all the data.
## After reads/index pruning, there are: 148473 indexes: 1312387 lost or 89.84%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 592951 changed reads.
## All data: before reads/index pruning, there are: 1841367 identical reads.
## All data: after index pruning, there are: 127239 changed reads: 21.46%.
## All data: after index pruning, there are: 437737 identical reads: 23.77%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 437737 identical reads.
## Before classification, there are 127239 reads with mutations.
## After classification, there are 346512 reads/indexes which are only identical.
## After classification, there are 1769 reads/indexes which are strictly sequencer.
## After classification, there are 26548 reads/indexes which are consistently repeated.
## Counted by direction: 616448 forward reads and 666447 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s12.
##   Reading the file containing mutations: preprocessing/s12/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s12/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 558061 reads.
##    Mutation data: after min-position pruning, there are: 545764 reads: 12297 lost or 2.20%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 545764 reads.
##    Mutation data: after max-position pruning, there are: 516128 reads: 29636 lost or 5.43%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 516128 reads: 0 lost or 0.00%.
##   Mutation data: all filters removed 41933 reads, or 7.51%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1223666 indexes in all the data.
## After reads/index pruning, there are: 254006 indexes: 969660 lost or 79.24%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 516128 changed reads.
## All data: before reads/index pruning, there are: 1855921 identical reads.
## All data: after index pruning, there are: 200673 changed reads: 38.88%.
## All data: after index pruning, there are: 771535 identical reads: 41.57%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 771535 identical reads.
## Before classification, there are 200673 reads with mutations.
## After classification, there are 644720 reads/indexes which are only identical.
## After classification, there are 5123 reads/indexes which are strictly sequencer.
## After classification, there are 103952 reads/indexes which are consistently repeated.
## Counted by direction: 1310889 forward reads and 1452706 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s16.
##   Reading the file containing mutations: preprocessing/s16/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s16/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
##    Mutation data: removing any differences before position: 22.
##    Mutation data: before pruning, there are: 3156347 reads.
##    Mutation data: after min-position pruning, there are: 3124938 reads: 31409 lost or 1.00%.
##    Mutation data: removing any differences after position: 185.
##    Mutation data: before pruning, there are: 3124938 reads.
##    Mutation data: after max-position pruning, there are: 1167013 reads: 1957925 lost or 62.65%.
##    Mutation data: removing any reads with 'N' as the hit.
##    Mutation data: after N pruning, there are: 1165064 reads: 1949 lost or 0.17%.
##   Mutation data: all filters removed 1991283 reads, or 63.09%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1971830 indexes in all the data.
## After reads/index pruning, there are: 386564 indexes: 1585266 lost or 80.40%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 1165064 changed reads.
## All data: before reads/index pruning, there are: 2817590 identical reads.
## All data: after index pruning, there are: 433242 changed reads: 37.19%.
## All data: after index pruning, there are: 1165354 identical reads: 41.36%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 1165354 identical reads.
## Before classification, there are 433242 reads with mutations.
## After classification, there are 884532 reads/indexes which are only identical.
## After classification, there are 13764 reads/indexes which are strictly sequencer.
## After classification, there are 151734 reads/indexes which are consistently repeated.
## Counted by direction: 1900567 forward reads and 1934815 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Warning in melt.data.table(data = mtrx, value.name = "norm", id.vars = "names"):
## 'measure.vars' [s07, s17, s08, s13, ...] are not all of the same type. By
## order of hierarchy, the molten data value column will be of type 'double'. All
## measure variables not of type 'double' will be coerced too. Check DETAILS in ?
## melt.data.table for more on coercion.
##   Writing a legend.
## Warning in rbind(c("Parameter", "Purpose", "Setting"), c("min_reads", "Minimum
## number of reads for an index to be deemed 'real'", : number of columns of result
## is not a multiple of vector length (arg 2)
## Plotting Index density for mutant reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale
## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for identical reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for all reads before filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for mutant reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for identical reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
## Plotting Index density for all reads after filtering.
## Warning: Ignoring unknown parameters: trim, scale

## Warning: Ignoring unknown aesthetics: violinwidth
##   Writing raw data.
##   Writing cpm data.
##   Writing data normalized by reads/indexes.
##   Writing data normalized by reads/indexes and length.
##   Writing data normalized by cpm(reads/indexes) and length.

2 Questions from Dr. DeStefano

I think what is best is to get the number of recovered mutations of each type from each data set. That would be A to T, A to G, A to C; T to A, T to G, T to C; G to A, G to C, G to T; and C to A, C to G, C to T; as well as deletions and insertions. I would then need the total number of reads that met all our criteria (i.e. at least 3 good recovered reads for that 14 nt index). Each set of 3 or more would act as “1” read of that particular index, so I would need the total with this in mind. I also need to know the total number of nucleotides in the region we decided to consider in the analysis. We may want to try this for 3 or more and for 5 or more recovered indexes, if it is not hard. This information does not include the specific positions on the template where errors occurred, but we can look at that later. Right now I just want to get a general error rate and the types of errors. It would basically be calculated by dividing the number of recovered mutations of a particular type by the total number of reads times the number of nucleotides screened in the template. As it ends up, this number does not really have a lot of meaning on its own, but it can be used to calculate the overall mutation rate as well as the rates for transversions, transitions, deletions, and insertions.
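The calculation described above amounts to simple arithmetic, which can be sketched as follows. All counts in this sketch are invented for illustration; only the screened region (positions 22 to 185, per the filters used in the run above) is taken from this document.

```r
## Sketch of the proposed error-rate arithmetic. The counts are invented;
## only the screened region (positions 22-185) comes from the run above.
recovered_a_to_g <- 120      # hypothetical recovered mutations of one type
qualifying_reads <- 200000   # hypothetical indexes meeting the >=3 reads/index criterion
nt_screened <- 185 - 22 + 1  # nucleotides in the considered region

## Rate for one type: mutations / (reads * nucleotides screened).
rate_a_to_g <- recovered_a_to_g / (qualifying_reads * nt_screened)
## The overall mutation rate sums the per-type counts before dividing;
## transition/transversion rates sum only the relevant types.
```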

3 Answers

To address those queries, I invoked create_matrices() with a minimum index count of both 3 and 5. Note that this is not the same as requiring 3 or 5 reads per index: in both cases I require at least 3 reads per index; what changes is the minimum number of indexes required per mutation.
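For reference, the invocations behind the objects used below would presumably look like the following. This is a reconstruction, not a record of the actual calls: the _tenmpr and _fivempr suffixes are assumed to correspond to max_mutations_per_read limits of 10 and 5 (and I am assuming create_matrices() accepts that argument, as the variable defined at the top suggests); distinct excel filenames would also be needed so the runs do not overwrite each other.

```r
## Hypothetical reconstruction of the six runs compared below. Only
## min_indexes = 3 vs. 5 is documented above; the max_mutations_per_read
## values for the *_tenmpr/*_fivempr variants are assumptions.
run <- function(min_indexes, max_mutations_per_read = NULL) {
  create_matrices(sample_sheet = sample_sheet,
                  ident_column = ident_column, mut_column = mut_column,
                  min_reads = min_reads, min_indexes = min_indexes,
                  min_sequencer = min_sequencer,
                  min_position = min_position, max_position = max_position,
                  max_mutations_per_read = max_mutations_per_read,
                  prune_n = prune_n, verbose = verbose, excel = excel)
}
triples <- run(min_indexes = 3)
triples_tenmpr <- run(min_indexes = 3, max_mutations_per_read = 10)
triples_fivempr <- run(min_indexes = 3, max_mutations_per_read = 5)
quints <- run(min_indexes = 5)
quints_tenmpr <- run(min_indexes = 5, max_mutations_per_read = 10)
quints_fivempr <- run(min_indexes = 5, max_mutations_per_read = 5)
```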

3.1 Recovered mutations of each type

I am interpreting this question as asking for the number of indexes recovered for each mutation type. I collect this information in two ways of interest: the indexes of each type deemed to come from the RT, and those deemed to come from the sequencer. In addition, I calculate a normalized (cpm) version of this information, which may be used to look for changes across samples.

3.1.1 Mutations by RT index

The following blocks should print tables of the number of mutant indexes observed for each mutation type, for both the RT and the sequencer. One would hope that the sequencer error would be consistent across all samples, but I suspect the results will instead show that my metric is not yet stringent enough.

knitr::kable(triples[["matrices"]][["miss_indexes_by_type"]])
## Error in knitr::kable(triples[["matrices"]][["miss_indexes_by_type"]]): object 'triples' not found
knitr::kable(triples_tenmpr[["matrices"]][["miss_indexes_by_type"]])
## Error in knitr::kable(triples_tenmpr[["matrices"]][["miss_indexes_by_type"]]): object 'triples_tenmpr' not found
knitr::kable(triples_fivempr[["matrices"]][["miss_indexes_by_type"]])
## Error in knitr::kable(triples_fivempr[["matrices"]][["miss_indexes_by_type"]]): object 'triples_fivempr' not found
knitr::kable(quints[["matrices"]][["miss_indexes_by_type"]])
## Error in knitr::kable(quints[["matrices"]][["miss_indexes_by_type"]]): object 'quints' not found
knitr::kable(quints_tenmpr[["matrices"]][["miss_indexes_by_type"]])
## Error in knitr::kable(quints_tenmpr[["matrices"]][["miss_indexes_by_type"]]): object 'quints_tenmpr' not found
knitr::kable(quints_fivempr[["matrices"]][["miss_indexes_by_type"]])
## Error in knitr::kable(quints_fivempr[["matrices"]][["miss_indexes_by_type"]]): object 'quints_fivempr' not found
knitr::kable(triples[["matrices"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(triples[["matrices"]][["miss_sequencer_by_type"]]): object 'triples' not found
knitr::kable(triples_tenmpr[["matrices"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(triples_tenmpr[["matrices"]][["miss_sequencer_by_type"]]): object 'triples_tenmpr' not found
knitr::kable(triples_fivempr[["matrices"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(triples_fivempr[["matrices"]][["miss_sequencer_by_type"]]): object 'triples_fivempr' not found
knitr::kable(quints[["matrices"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(quints[["matrices"]][["miss_sequencer_by_type"]]): object 'quints' not found
knitr::kable(quints_tenmpr[["matrices"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(quints_tenmpr[["matrices"]][["miss_sequencer_by_type"]]): object 'quints_tenmpr' not found
knitr::kable(quints_fivempr[["matrices"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(quints_fivempr[["matrices"]][["miss_sequencer_by_type"]]): object 'quints_fivempr' not found

Plots of the same information:

triples[["plots"]][["counts"]][["miss_indexes_by_type"]]
## Error in eval(expr, envir, enclos): object 'triples' not found
triples_tenmpr[["plots"]][["counts"]][["miss_indexes_by_type"]]
## Error in eval(expr, envir, enclos): object 'triples_tenmpr' not found
triples_fivempr[["plots"]][["counts"]][["miss_indexes_by_type"]]
## Error in eval(expr, envir, enclos): object 'triples_fivempr' not found
quints[["plots"]][["counts"]][["miss_indexes_by_type"]]
## Error in eval(expr, envir, enclos): object 'quints' not found
quints_tenmpr[["plots"]][["counts"]][["miss_indexes_by_type"]]
## Error in eval(expr, envir, enclos): object 'quints_tenmpr' not found
quints_fivempr[["plots"]][["counts"]][["miss_indexes_by_type"]]
## Error in eval(expr, envir, enclos): object 'quints_fivempr' not found

This suggests to me that this information needs to be normalized in some more sensible fashion. Thus the following:

3.1.2 Mutations by RT index, post normalization

The same numbers may be expressed in the context of the number of indexes observed per sample and/or as a cpm across samples. In the first instance one can examine the apparent error rate within each sample; in the second, one may look for relative changes in the apparent error rate across samples.
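As a minimal sketch of the cpm idea (an invented toy matrix, not the package's actual implementation), each sample's column is scaled to counts per million of its column total:

```r
## Toy cpm scaling: each column divided by its total, times one million.
counts <- matrix(c(10, 40, 50,
                   5, 20, 75),
                 nrow = 3,
                 dimnames = list(c("A_to_G", "C_to_T", "G_to_A"),
                                 c("s07", "s08")))
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6
cpm
## s07 column becomes 100000, 400000, 500000; s08 becomes 50000, 200000, 750000.
```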

3.1.2.1 Rewriting the matrices as cpm to account for library sizes

knitr::kable(triples[["normalized"]][["miss_indexes_by_type"]])
## Error in knitr::kable(triples[["normalized"]][["miss_indexes_by_type"]]): object 'triples' not found
knitr::kable(triples_tenmpr[["normalized"]][["miss_indexes_by_type"]])
## Error in knitr::kable(triples_tenmpr[["normalized"]][["miss_indexes_by_type"]]): object 'triples_tenmpr' not found
knitr::kable(triples_fivempr[["normalized"]][["miss_indexes_by_type"]])
## Error in knitr::kable(triples_fivempr[["normalized"]][["miss_indexes_by_type"]]): object 'triples_fivempr' not found
knitr::kable(quints[["normalized"]][["miss_indexes_by_type"]])
## Error in knitr::kable(quints[["normalized"]][["miss_indexes_by_type"]]): object 'quints' not found
knitr::kable(quints_tenmpr[["normalized"]][["miss_indexes_by_type"]])
## Error in knitr::kable(quints_tenmpr[["normalized"]][["miss_indexes_by_type"]]): object 'quints_tenmpr' not found
knitr::kable(quints_fivempr[["normalized"]][["miss_indexes_by_type"]])
## Error in knitr::kable(quints_fivempr[["normalized"]][["miss_indexes_by_type"]]): object 'quints_fivempr' not found
knitr::kable(triples[["normalized"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(triples[["normalized"]][["miss_sequencer_by_type"]]): object 'triples' not found
knitr::kable(triples_tenmpr[["normalized"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(triples_tenmpr[["normalized"]][["miss_sequencer_by_type"]]): object 'triples_tenmpr' not found
knitr::kable(triples_fivempr[["normalized"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(triples_fivempr[["normalized"]][["miss_sequencer_by_type"]]): object 'triples_fivempr' not found
knitr::kable(quints[["normalized"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(quints[["normalized"]][["miss_sequencer_by_type"]]): object 'quints' not found
knitr::kable(quints_tenmpr[["normalized"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(quints_tenmpr[["normalized"]][["miss_sequencer_by_type"]]): object 'quints_tenmpr' not found
knitr::kable(quints_fivempr[["normalized"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(quints_fivempr[["normalized"]][["miss_sequencer_by_type"]]): object 'quints_fivempr' not found

3.1.2.2 Rewriting the matrices by dividing by all indexes

This, I think, starts to address the later part of your query.
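A minimal sketch of the per-index version, again with invented numbers: each mutation-type count is divided by the total number of indexes observed in its sample, giving the fraction of recovered indexes carrying that mutation type:

```r
## Invented counts of mutant indexes per type, and total indexes per sample.
counts <- matrix(c(120, 45,
                   260, 90),
                 nrow = 2,
                 dimnames = list(c("A_to_G", "C_to_T"), c("s07", "s08")))
total_indexes <- c(s07 = 200000, s08 = 180000)
by_counts <- sweep(counts, 2, total_indexes, "/")
by_counts  # fraction of indexes carrying each mutation type, per sample
```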

knitr::kable(triples[["matrices_by_counts"]][["miss_indexes_by_type"]])
## Error in knitr::kable(triples[["matrices_by_counts"]][["miss_indexes_by_type"]]): object 'triples' not found
knitr::kable(quints[["matrices_by_counts"]][["miss_indexes_by_type"]])
## Error in knitr::kable(quints[["matrices_by_counts"]][["miss_indexes_by_type"]]): object 'quints' not found
knitr::kable(triples[["matrices_by_counts"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(triples[["matrices_by_counts"]][["miss_sequencer_by_type"]]): object 'triples' not found
knitr::kable(quints[["matrices_by_counts"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(quints[["matrices_by_counts"]][["miss_sequencer_by_type"]]): object 'quints' not found

3.1.2.3 Rewriting the matrices by dividing by all indexes and cpm

I think this might prove to be where we get the most meaningful results.

The nicest aspect is that, after accounting for library sizes and total indexes observed, we finally see that the sequencer error is mostly consistent across all samples and mutation types, with a couple of notable exceptions.

By the same token, among the mutation types which are essentially identical for the sequencer, some are decidedly different in the non-sequencer data. The most notable examples, I think, are A to G (but not G to A) and C to T.

knitr::kable(triples[["normalized_by_counts"]][["miss_indexes_by_type"]])
## Error in knitr::kable(triples[["normalized_by_counts"]][["miss_indexes_by_type"]]): object 'triples' not found
knitr::kable(triples_tenmpr[["normalized_by_counts"]][["miss_indexes_by_type"]])
## Error in knitr::kable(triples_tenmpr[["normalized_by_counts"]][["miss_indexes_by_type"]]): object 'triples_tenmpr' not found
knitr::kable(triples_fivempr[["normalized_by_counts"]][["miss_indexes_by_type"]])
## Error in knitr::kable(triples_fivempr[["normalized_by_counts"]][["miss_indexes_by_type"]]): object 'triples_fivempr' not found
knitr::kable(quints[["normalized_by_counts"]][["miss_indexes_by_type"]])
## Error in knitr::kable(quints[["normalized_by_counts"]][["miss_indexes_by_type"]]): object 'quints' not found
knitr::kable(quints_tenmpr[["normalized_by_counts"]][["miss_indexes_by_type"]])
## Error in knitr::kable(quints_tenmpr[["normalized_by_counts"]][["miss_indexes_by_type"]]): object 'quints_tenmpr' not found
knitr::kable(quints_fivempr[["normalized_by_counts"]][["miss_indexes_by_type"]])
## Error in knitr::kable(quints_fivempr[["normalized_by_counts"]][["miss_indexes_by_type"]]): object 'quints_fivempr' not found
knitr::kable(triples[["normalized_by_counts"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(triples[["normalized_by_counts"]][["miss_sequencer_by_type"]]): object 'triples' not found
knitr::kable(triples_tenmpr[["normalized_by_counts"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(triples_tenmpr[["normalized_by_counts"]][["miss_sequencer_by_type"]]): object 'triples_tenmpr' not found
knitr::kable(triples_fivempr[["normalized_by_counts"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(triples_fivempr[["normalized_by_counts"]][["miss_sequencer_by_type"]]): object 'triples_fivempr' not found
knitr::kable(quints[["normalized_by_counts"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(quints[["normalized_by_counts"]][["miss_sequencer_by_type"]]): object 'quints' not found
knitr::kable(quints_tenmpr[["normalized_by_counts"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(quints_tenmpr[["normalized_by_counts"]][["miss_sequencer_by_type"]]): object 'quints_tenmpr' not found
knitr::kable(quints_fivempr[["normalized_by_counts"]][["miss_sequencer_by_type"]])
## Error in knitr::kable(quints_fivempr[["normalized_by_counts"]][["miss_sequencer_by_type"]]): object 'quints_fivempr' not found

3.1.3 Indels by RT index

The following blocks repeat the above, but for insertions. The data do not contain enough deletions to count them reliably.

knitr::kable(triples[["matrices"]][["insert_indexes_by_nt"]])
## Error in knitr::kable(triples[["matrices"]][["insert_indexes_by_nt"]]): object 'triples' not found
knitr::kable(triples_tenmpr[["matrices"]][["insert_indexes_by_nt"]])
## Error in knitr::kable(triples_tenmpr[["matrices"]][["insert_indexes_by_nt"]]): object 'triples_tenmpr' not found
knitr::kable(triples_fivempr[["matrices"]][["insert_indexes_by_nt"]])
## Error in knitr::kable(triples_fivempr[["matrices"]][["insert_indexes_by_nt"]]): object 'triples_fivempr' not found
knitr::kable(quints[["matrices"]][["insert_indexes_by_nt"]])
## Error in knitr::kable(quints[["matrices"]][["insert_indexes_by_nt"]]): object 'quints' not found
knitr::kable(quints_tenmpr[["matrices"]][["insert_indexes_by_nt"]])
## Error in knitr::kable(quints_tenmpr[["matrices"]][["insert_indexes_by_nt"]]): object 'quints_tenmpr' not found
knitr::kable(quints_fivempr[["matrices"]][["insert_indexes_by_nt"]])
## Error in knitr::kable(quints_fivempr[["matrices"]][["insert_indexes_by_nt"]]): object 'quints_fivempr' not found
knitr::kable(triples[["matrices"]][["insert_sequencer_by_nt"]])
## Error in knitr::kable(triples[["matrices"]][["insert_sequencer_by_nt"]]): object 'triples' not found
knitr::kable(triples_tenmpr[["matrices"]][["insert_sequencer_by_nt"]])
## Error in knitr::kable(triples_tenmpr[["matrices"]][["insert_sequencer_by_nt"]]): object 'triples_tenmpr' not found
knitr::kable(triples_fivempr[["matrices"]][["insert_sequencer_by_nt"]])
## Error in knitr::kable(triples_fivempr[["matrices"]][["insert_sequencer_by_nt"]]): object 'triples_fivempr' not found
knitr::kable(quints[["matrices"]][["insert_sequencer_by_nt"]])
## Error in knitr::kable(quints[["matrices"]][["insert_sequencer_by_nt"]]): object 'quints' not found
knitr::kable(quints_tenmpr[["matrices"]][["insert_sequencer_by_nt"]])
## Error in knitr::kable(quints_tenmpr[["matrices"]][["insert_sequencer_by_nt"]]): object 'quints_tenmpr' not found
knitr::kable(quints_fivempr[["matrices"]][["insert_sequencer_by_nt"]])
## Error in knitr::kable(quints_fivempr[["matrices"]][["insert_sequencer_by_nt"]]): object 'quints_fivempr' not found

Plots of this information

triples[["plots"]][["counts"]][["insert_indexes_by_nt"]]
## Error in eval(expr, envir, enclos): object 'triples' not found
triples_tenmpr[["plots"]][["counts"]][["insert_indexes_by_nt"]]
## Error in eval(expr, envir, enclos): object 'triples_tenmpr' not found
triples_fivempr[["plots"]][["counts"]][["insert_indexes_by_nt"]]
## Error in eval(expr, envir, enclos): object 'triples_fivempr' not found
quints[["plots"]][["counts"]][["insert_indexes_by_nt"]]
## Error in eval(expr, envir, enclos): object 'quints' not found
quints_tenmpr[["plots"]][["counts"]][["insert_indexes_by_nt"]]
## Error in eval(expr, envir, enclos): object 'quints_tenmpr' not found
quints_fivempr[["plots"]][["counts"]][["insert_indexes_by_nt"]]
## Error in eval(expr, envir, enclos): object 'quints_fivempr' not found

3.1.4 Insertions by RT index, post normalization

3.1.4.1 Rewriting the matrices as cpm to account for library sizes.

knitr::kable(triples[["normalized"]][["insert_indexes_by_nt"]])
## Error in knitr::kable(triples[["normalized"]][["insert_indexes_by_nt"]]): object 'triples' not found
knitr::kable(triples_tenmpr[["normalized"]][["insert_indexes_by_nt"]])
## Error in knitr::kable(triples_tenmpr[["normalized"]][["insert_indexes_by_nt"]]): object 'triples_tenmpr' not found
knitr::kable(triples_fivempr[["normalized"]][["insert_indexes_by_nt"]])
## Error in knitr::kable(triples_fivempr[["normalized"]][["insert_indexes_by_nt"]]): object 'triples_fivempr' not found
knitr::kable(quints[["normalized"]][["insert_indexes_by_nt"]])
## Error in knitr::kable(quints[["normalized"]][["insert_indexes_by_nt"]]): object 'quints' not found
knitr::kable(quints_tenmpr[["normalized"]][["insert_indexes_by_nt"]])
## Error in knitr::kable(quints_tenmpr[["normalized"]][["insert_indexes_by_nt"]]): object 'quints_tenmpr' not found
knitr::kable(quints_fivempr[["normalized"]][["insert_indexes_by_nt"]])
## Error in knitr::kable(quints_fivempr[["normalized"]][["insert_indexes_by_nt"]]): object 'quints_fivempr' not found
knitr::kable(triples[["normalized"]][["insert_sequencer_by_nt"]])
## Error in knitr::kable(triples[["normalized"]][["insert_sequencer_by_nt"]]): object 'triples' not found
knitr::kable(triples_tenmpr[["normalized"]][["insert_sequencer_by_nt"]])
## Error in knitr::kable(triples_tenmpr[["normalized"]][["insert_sequencer_by_nt"]]): object 'triples_tenmpr' not found
knitr::kable(triples_fivempr[["normalized"]][["insert_sequencer_by_nt"]])
## Error in knitr::kable(triples_fivempr[["normalized"]][["insert_sequencer_by_nt"]]): object 'triples_fivempr' not found
knitr::kable(quints[["normalized"]][["insert_sequencer_by_nt"]])
## Error in knitr::kable(quints[["normalized"]][["insert_sequencer_by_nt"]]): object 'quints' not found
knitr::kable(quints_tenmpr[["normalized"]][["insert_sequencer_by_nt"]])
## Error in knitr::kable(quints_tenmpr[["normalized"]][["insert_sequencer_by_nt"]]): object 'quints_tenmpr' not found
knitr::kable(quints_fivempr[["normalized"]][["insert_sequencer_by_nt"]])
## Error in knitr::kable(quints_fivempr[["normalized"]][["insert_sequencer_by_nt"]]): object 'quints_fivempr' not found

3.1.4.2 Rewriting the matrices by dividing by all indexes

I suspect there are too few insertion events for this rescaling to behave well. I will double-check the logic, but that is my initial guess given how few insertions I saw when reading the outputs manually. Unfortunately, this also means I cannot provide a cpm measurement for these.

knitr::kable(triples[["matrices_by_counts"]][["insert_indexes_by_nt"]])
## Error in knitr::kable(triples[["matrices_by_counts"]][["insert_indexes_by_nt"]]): object 'triples' not found
knitr::kable(triples_tenmpr[["matrices_by_counts"]][["insert_indexes_by_nt"]])
## Error in knitr::kable(triples_tenmpr[["matrices_by_counts"]][["insert_indexes_by_nt"]]): object 'triples_tenmpr' not found
knitr::kable(triples_fivempr[["matrices_by_counts"]][["insert_indexes_by_nt"]])
## Error in knitr::kable(triples_fivempr[["matrices_by_counts"]][["insert_indexes_by_nt"]]): object 'triples_fivempr' not found
knitr::kable(quints[["matrices_by_counts"]][["insert_indexes_by_nt"]])
## Error in knitr::kable(quints[["matrices_by_counts"]][["insert_indexes_by_nt"]]): object 'quints' not found
knitr::kable(quints_tenmpr[["matrices_by_counts"]][["insert_indexes_by_nt"]])
## Error in knitr::kable(quints_tenmpr[["matrices_by_counts"]][["insert_indexes_by_nt"]]): object 'quints_tenmpr' not found
knitr::kable(quints_fivempr[["matrices_by_counts"]][["insert_indexes_by_nt"]])
## Error in knitr::kable(quints_fivempr[["matrices_by_counts"]][["insert_indexes_by_nt"]]): object 'quints_fivempr' not found
knitr::kable(triples[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
## Error in knitr::kable(triples[["matrices_by_counts"]][["insert_sequencer_by_nt"]]): object 'triples' not found
knitr::kable(triples_tenmpr[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
## Error in knitr::kable(triples_tenmpr[["matrices_by_counts"]][["insert_sequencer_by_nt"]]): object 'triples_tenmpr' not found
knitr::kable(triples_fivempr[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
## Error in knitr::kable(triples_fivempr[["matrices_by_counts"]][["insert_sequencer_by_nt"]]): object 'triples_fivempr' not found
knitr::kable(quints[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
## Error in knitr::kable(quints[["matrices_by_counts"]][["insert_sequencer_by_nt"]]): object 'quints' not found
knitr::kable(quints_tenmpr[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
## Error in knitr::kable(quints_tenmpr[["matrices_by_counts"]][["insert_sequencer_by_nt"]]): object 'quints_tenmpr' not found
knitr::kable(quints_fivempr[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
## Error in knitr::kable(quints_fivempr[["matrices_by_counts"]][["insert_sequencer_by_nt"]]): object 'quints_fivempr' not found

What follows is my previous version of this worksheet, which simply dumped the various tables.

---
title: "Counting RT mutations from illumina sequencing data."
author: "atb abelew@gmail.com"
date: "`r Sys.Date()`"
output:
  html_document:
    code_download: true
    code_folding: show
    fig_caption: true
    fig_height: 7
    fig_width: 7
    highlight: tango
    keep_md: false
    mode: selfcontained
    number_sections: true
    self_contained: true
    theme: readable
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
  rmdformats::readthedown:
    code_download: true
    code_folding: show
    df_print: paged
    fig_caption: true
    fig_height: 7
    fig_width: 7
    highlight: tango
    width: 300
    keep_md: false
    mode: selfcontained
    toc_float: true
  BiocStyle::html_document:
    code_download: true
    code_folding: show
    fig_caption: true
    fig_height: 7
    fig_width: 7
    highlight: tango
    keep_md: false
    mode: selfcontained
    toc_float: true
---

<style type="text/css">
body, td {
  font-size: 16px;
}
code.r{
  font-size: 16px;
}
pre {
 font-size: 16px
}
</style>

```{r options, include=FALSE}
library("hpgltools")
tt <- devtools::load_all("~/hpgltools")
knitr::opts_knit$set(width=120,
                     progress=TRUE,
                     verbose=TRUE)
## echo is a chunk option, not a knit option, so it belongs in opts_chunk.
knitr::opts_chunk$set(error=TRUE,
                      echo=TRUE,
                      dpi=96)
old_options <- options(digits=4,
                       stringsAsFactors=FALSE,
                       knitr.duplicate.label="allow")
ggplot2::theme_set(ggplot2::theme_bw(base_size=10))
rundate <- format(Sys.Date(), format="%Y%m%d")
previous_file <- "index.Rmd"
ver <- "20200314"

##tmp <- sm(loadme(filename=paste0(gsub(pattern="\\.Rmd", replace="", x=previous_file), "-v", ver, ".rda.xz")))
rmd_file <- "error_quant_202101.Rmd"
```

# Calculating error rates.

I wrote the function `create_matrices()` to collect mutation counts.  In
principle, its results should be able to address most questions regarding the
counts of mutations observed in the data.

## Categorize the data with at least 3 indexes per mutant

```{r triples}
devtools::load_all("Rerrrt")
ident_column <- "identtable"
mut_column <- "mutationtable"
min_reads <- 3
min_indexes <- 3
min_sequencer <- 6
min_position <- 22
max_position <- 185
max_mutations_per_read <- NULL
prune_n <- TRUE
verbose <- TRUE
plot_order <- c("dna_control", "dna_low", "dna_high", "rna_control", "rna_low", "rna_high")
sample_sheet <- "sample_sheets/recent_samples_2020.xlsx"
excel <- glue::glue("excel/{rundate}_recent_samples_2020_triples-v{ver}.xlsx")
## Keep this result as 'triples'; it is referenced by that name below.
triples <- create_matrices(sample_sheet=sample_sheet,
                           ident_column=ident_column, mut_column=mut_column,
                           min_reads=min_reads, min_indexes=min_indexes,
                           min_sequencer=min_sequencer,
                           min_position=min_position, max_position=max_position,
                           prune_n=prune_n, verbose=verbose, excel=excel)
## Repeat the same parameters using all samples
sample_sheet <- "sample_sheets/all_samples_202101.xlsx"
excel <- glue::glue("excel/{rundate}_all_samples_triples-v{ver}.xlsx")
written <- create_matrices(sample_sheet=sample_sheet,
                           ident_column=ident_column, mut_column=mut_column,
                           min_reads=min_reads, min_indexes=min_indexes,
                           min_sequencer=min_sequencer,
                           min_position=min_position, max_position=max_position,
                           prune_n=prune_n, verbose=verbose, excel=excel)
## Repeat with only the recent RNA samples
sample_sheet <- "sample_sheets/rna_samples_202101.xlsx"
excel <- glue::glue("excel/{rundate}_rna_samples_triples-v{ver}.xlsx")
written <- create_matrices(sample_sheet=sample_sheet,
                           ident_column=ident_column, mut_column=mut_column,
                           min_reads=min_reads, min_indexes=min_indexes,
                           min_sequencer=min_sequencer,
                           min_position=min_position, max_position=max_position,
                           prune_n=prune_n, verbose=verbose, excel=excel)
max_mutations_per_read <- 10
excel <- glue::glue("excel/{rundate}_triples_tenmpr-v{ver}.xlsx")
## Assumption: create_matrices() accepts max_mutations_per_read to cap the
## mutations tolerated per read; without passing it, the variable is unused.
triples_tenmpr <- create_matrices(sample_sheet=sample_sheet,
                                  ident_column=ident_column, mut_column=mut_column,
                                  min_reads=min_reads, min_indexes=min_indexes,
                                  min_sequencer=min_sequencer,
                                  min_position=min_position, max_position=max_position,
                                  max_mutations_per_read=max_mutations_per_read,
                                  prune_n=prune_n, verbose=verbose, excel=excel)
max_mutations_per_read <- 5
excel <- glue::glue("excel/{rundate}_triples_fivempr-v{ver}.xlsx")
## As above, pass the per-read mutation cap (assumed argument).
triples_fivempr <- create_matrices(sample_sheet=sample_sheet,
                                   ident_column=ident_column, mut_column=mut_column,
                                   min_reads=min_reads, min_indexes=min_indexes,
                                   min_sequencer=min_sequencer,
                                   min_position=min_position, max_position=max_position,
                                   max_mutations_per_read=max_mutations_per_read,
                                   prune_n=prune_n, verbose=verbose, excel=excel)
```

## Categorize the data with at least 5 indexes per mutant

```{r quints}
min_indexes <- 5
max_mutations_per_read <- NULL
sample_sheet <- "sample_sheets/recent_samples_2020.xlsx"
excel <- glue::glue("excel/{rundate}_recent_samples_quints-v{ver}.xlsx")
## Keep this result as 'quints'; it is referenced by that name below.
quints <- create_matrices(sample_sheet=sample_sheet,
                          ident_column=ident_column, mut_column=mut_column,
                          min_reads=min_reads, min_indexes=min_indexes,
                          min_sequencer=min_sequencer,
                          min_position=min_position, max_position=max_position,
                          prune_n=prune_n, verbose=verbose, excel=excel)
max_mutations_per_read <- 10
excel <- glue::glue("excel/{rundate}_recent_samples_quints_tenmpr-v{ver}.xlsx")
## Assumption: create_matrices() accepts max_mutations_per_read to cap the
## mutations tolerated per read; without passing it, the variable is unused.
quints_tenmpr <- create_matrices(sample_sheet=sample_sheet,
                                 ident_column=ident_column, mut_column=mut_column,
                                 min_reads=min_reads, min_indexes=min_indexes,
                                 min_sequencer=min_sequencer,
                                 min_position=min_position, max_position=max_position,
                                 max_mutations_per_read=max_mutations_per_read,
                                 prune_n=prune_n, verbose=verbose, excel=excel)
max_mutations_per_read <- 5
excel <- glue::glue("excel/{rundate}_recent_samples_quints_fivempr-v{ver}.xlsx")
## As above, pass the per-read mutation cap (assumed argument).
quints_fivempr <- create_matrices(sample_sheet=sample_sheet,
                                  ident_column=ident_column, mut_column=mut_column,
                                  min_reads=min_reads, min_indexes=min_indexes,
                                  min_sequencer=min_sequencer,
                                  min_position=min_position, max_position=max_position,
                                  max_mutations_per_read=max_mutations_per_read,
                                  prune_n=prune_n, verbose=verbose, excel=excel)
```

# Questions from Dr. DeStefano

I think what is best is to get the number of recovered mutations of each type
from each data set.  That would be A to T, A to G, A to C; T to A, T to G, T to
C; G to A, G to C, G to T; and C to A, C to G, C to T; as well as deletions and
insertions.  I would then need the sum number of the reads that met all our
criteria (i.e. at least 3 good recovered reads for that 14 nt index).  Each set
of 3 or more would count as "1" read of that particular index, so I would need
the total with this in mind.  I also need to know the total number of
nucleotides that were in the region we decided to consider in the analysis.  We
may want to try this for 3 or more and 5 or more recovered indexes if it is not
hard.  This information does not include specific positions on the template
where errors occurred, but we can look at that later.  Right now I just want to
get a general error rate and type of error.  It would basically be calculated
by dividing the number of recovered mutations of a particular type by the sum
number of the reads times the number of nucleotides screened in the template.
As it ends up, this number does not really have a lot of meaning, but it can be
used to calculate the overall mutation rate as well as the rates for
transversions, transitions, deletions, and insertions.

# Answers

In order to address those queries, I invoked create_matrices() with minimum
index counts of 3 and 5.  Note that this is not the same as requiring 3 or 5
reads per index; in both cases I require at least 3 reads per index.
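The rate calculation described in the query above can be sketched as follows;
every number here is a made-up placeholder for illustration, not a result from
this data set, and the region length simply uses the min/max positions set
earlier in this worksheet.

```{r rate_sketch, eval=FALSE}
## Hypothetical numbers purely to illustrate the arithmetic.
recovered_a_to_g <- 120    ## indexes carrying an A to G mutation (placeholder)
qualifying_reads <- 250000 ## index groups passing the >= 3 read criterion (placeholder)
region_nt <- 185 - 22 + 1  ## 164 nt between min_position and max_position, inclusive
a_to_g_rate <- recovered_a_to_g / (qualifying_reads * region_nt)
```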

## Recovered mutations of each type

I interpret this question as asking for the number of indexes recovered for
each mutation type.  I collect this information in two ways of interest:
indexes by type deemed to come from the RT, and indexes by type deemed to come
from the sequencer.  In addition, I calculate a normalized (cpm) version of
this information, which may be used to look for changes across samples.

### Mutations by RT index

The following block should print tables of the numbers of mutant indexes
observed for each type, for both the RT and the sequencer.  One would hope that
the sequencer would be consistent across all samples, but I think the results
will instead suggest that my metric is not yet stringent enough.

```{r mutation_index_count, results='asis'}
knitr::kable(triples[["matrices"]][["miss_indexes_by_type"]])
knitr::kable(triples_tenmpr[["matrices"]][["miss_indexes_by_type"]])
knitr::kable(triples_fivempr[["matrices"]][["miss_indexes_by_type"]])
knitr::kable(quints[["matrices"]][["miss_indexes_by_type"]])
knitr::kable(quints_tenmpr[["matrices"]][["miss_indexes_by_type"]])
knitr::kable(quints_fivempr[["matrices"]][["miss_indexes_by_type"]])

knitr::kable(triples[["matrices"]][["miss_sequencer_by_type"]])
knitr::kable(triples_tenmpr[["matrices"]][["miss_sequencer_by_type"]])
knitr::kable(triples_fivempr[["matrices"]][["miss_sequencer_by_type"]])
knitr::kable(quints[["matrices"]][["miss_sequencer_by_type"]])
knitr::kable(quints_tenmpr[["matrices"]][["miss_sequencer_by_type"]])
knitr::kable(quints_fivempr[["matrices"]][["miss_sequencer_by_type"]])
```

Plots of this information

```{r mutation_index_count_plots}
triples[["plots"]][["counts"]][["miss_indexes_by_type"]]
triples_tenmpr[["plots"]][["counts"]][["miss_indexes_by_type"]]
triples_fivempr[["plots"]][["counts"]][["miss_indexes_by_type"]]

quints[["plots"]][["counts"]][["miss_indexes_by_type"]]
quints_tenmpr[["plots"]][["counts"]][["miss_indexes_by_type"]]
quints_fivempr[["plots"]][["counts"]][["miss_indexes_by_type"]]
```

This suggests to me that the information needs to be normalized in a more
sensible fashion; hence the following.

### Mutations by RT index, post normalization

The same numbers may be expressed in the context of the number of indexes
observed / sample and/or as a cpm across samples.  Thus in the first instance
one can look at the apparent error rate for each sample, and in the second
instance one may look for relative changes in apparent error rate across
samples.
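
A minimal sketch of the cpm rescaling with placeholder data; this is my
understanding of the idea, not verified against the internals of
create_matrices().

```{r cpm_sketch, eval=FALSE}
## Toy matrix of mutant-index counts: rows are mutation types, columns samples.
counts <- matrix(c(10, 4, 25, 9), nrow=2,
                 dimnames=list(c("A_G", "C_T"), c("s07", "s08")))
## Rescale each column to counts per million of that sample's column total.
cpm <- t(t(counts) / colSums(counts)) * 1e6
```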

#### Rewriting the matrices as cpm to account for library sizes.

```{r mutation_index_normalized, results='asis'}
knitr::kable(triples[["normalized"]][["miss_indexes_by_type"]])
knitr::kable(triples_tenmpr[["normalized"]][["miss_indexes_by_type"]])
knitr::kable(triples_fivempr[["normalized"]][["miss_indexes_by_type"]])
knitr::kable(quints[["normalized"]][["miss_indexes_by_type"]])
knitr::kable(quints_tenmpr[["normalized"]][["miss_indexes_by_type"]])
knitr::kable(quints_fivempr[["normalized"]][["miss_indexes_by_type"]])

knitr::kable(triples[["normalized"]][["miss_sequencer_by_type"]])
knitr::kable(triples_tenmpr[["normalized"]][["miss_sequencer_by_type"]])
knitr::kable(triples_fivempr[["normalized"]][["miss_sequencer_by_type"]])
knitr::kable(quints[["normalized"]][["miss_sequencer_by_type"]])
knitr::kable(quints_tenmpr[["normalized"]][["miss_sequencer_by_type"]])
knitr::kable(quints_fivempr[["normalized"]][["miss_sequencer_by_type"]])
```

#### Rewriting the matrices by dividing by all indexes

This, I think, starts to address the later portion of your query.
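
A sketch of what I mean by dividing by all indexes, using placeholder counts
and hypothetical per-sample index totals:

```{r by_counts_sketch, eval=FALSE}
## Toy counts, plus made-up totals of all indexes observed per sample.
counts <- matrix(c(10, 4, 25, 9), nrow=2,
                 dimnames=list(c("A_G", "C_T"), c("s07", "s08")))
index_totals <- c(s07=14000, s08=34000)
## Express each mutation-type count as a fraction of that sample's indexes.
by_counts <- t(t(counts) / index_totals)
```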

```{r mutation_index_normalized_by_counts, results='asis'}
knitr::kable(triples[["matrices_by_counts"]][["miss_indexes_by_type"]])
knitr::kable(quints[["matrices_by_counts"]][["miss_indexes_by_type"]])

knitr::kable(triples[["matrices_by_counts"]][["miss_sequencer_by_type"]])
knitr::kable(quints[["matrices_by_counts"]][["miss_sequencer_by_type"]])
```

#### Rewriting the matrices by dividing by all indexes and cpm

I think this might prove to be where we get the most meaningful results.

The nicest aspect is that, after accounting for library sizes and total
indexes observed, we finally see that the sequencer error is mostly consistent
across all samples and mutation types, with a couple of notable exceptions.

By the same token, among the mutation types which _are_ identical for the
sequencer, some are decidedly different in the non-sequencer data.  The most
notable examples, I think, are A to G (but _not_ G to A) and C to T.
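
My understanding is that this combines the two rescalings in sequence; a
sketch with placeholder data, not verified against the actual
create_matrices() implementation:

```{r cpm_by_counts_sketch, eval=FALSE}
## Toy counts, plus made-up totals of all indexes observed per sample.
counts <- matrix(c(10, 4, 25, 9), nrow=2,
                 dimnames=list(c("A_G", "C_T"), c("s07", "s08")))
index_totals <- c(s07=14000, s08=34000)
## First divide by per-sample index totals, then rescale the columns to cpm.
by_counts <- t(t(counts) / index_totals)
normalized_by_counts <- t(t(by_counts) / colSums(by_counts)) * 1e6
```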

```{r mutation_index_cpm_by_counts, results='asis'}
knitr::kable(triples[["normalized_by_counts"]][["miss_indexes_by_type"]])
knitr::kable(triples_tenmpr[["normalized_by_counts"]][["miss_indexes_by_type"]])
knitr::kable(triples_fivempr[["normalized_by_counts"]][["miss_indexes_by_type"]])
knitr::kable(quints[["normalized_by_counts"]][["miss_indexes_by_type"]])
knitr::kable(quints_tenmpr[["normalized_by_counts"]][["miss_indexes_by_type"]])
knitr::kable(quints_fivempr[["normalized_by_counts"]][["miss_indexes_by_type"]])

knitr::kable(triples[["normalized_by_counts"]][["miss_sequencer_by_type"]])
knitr::kable(triples_tenmpr[["normalized_by_counts"]][["miss_sequencer_by_type"]])
knitr::kable(triples_fivempr[["normalized_by_counts"]][["miss_sequencer_by_type"]])
knitr::kable(quints[["normalized_by_counts"]][["miss_sequencer_by_type"]])
knitr::kable(quints_tenmpr[["normalized_by_counts"]][["miss_sequencer_by_type"]])
knitr::kable(quints_fivempr[["normalized_by_counts"]][["miss_sequencer_by_type"]])
```

### Indels by RT index

The following blocks repeat the above, but for insertions.  These data do not
contain enough deletions to count them reliably.

```{r insert_index_count, results='asis'}
knitr::kable(triples[["matrices"]][["insert_indexes_by_nt"]])
knitr::kable(triples_tenmpr[["matrices"]][["insert_indexes_by_nt"]])
knitr::kable(triples_fivempr[["matrices"]][["insert_indexes_by_nt"]])
knitr::kable(quints[["matrices"]][["insert_indexes_by_nt"]])
knitr::kable(quints_tenmpr[["matrices"]][["insert_indexes_by_nt"]])
knitr::kable(quints_fivempr[["matrices"]][["insert_indexes_by_nt"]])

knitr::kable(triples[["matrices"]][["insert_sequencer_by_nt"]])
knitr::kable(triples_tenmpr[["matrices"]][["insert_sequencer_by_nt"]])
knitr::kable(triples_fivempr[["matrices"]][["insert_sequencer_by_nt"]])
knitr::kable(quints[["matrices"]][["insert_sequencer_by_nt"]])
knitr::kable(quints_tenmpr[["matrices"]][["insert_sequencer_by_nt"]])
knitr::kable(quints_fivempr[["matrices"]][["insert_sequencer_by_nt"]])
```

Plots of this information

```{r insert_index_count_plots}
triples[["plots"]][["counts"]][["insert_indexes_by_nt"]]
triples_tenmpr[["plots"]][["counts"]][["insert_indexes_by_nt"]]
triples_fivempr[["plots"]][["counts"]][["insert_indexes_by_nt"]]

quints[["plots"]][["counts"]][["insert_indexes_by_nt"]]
quints_tenmpr[["plots"]][["counts"]][["insert_indexes_by_nt"]]
quints_fivempr[["plots"]][["counts"]][["insert_indexes_by_nt"]]
```

### Insertions by RT index, post normalization

#### Rewriting the matrices as cpm to account for library sizes.

```{r insert_index_normalized, results='asis'}
knitr::kable(triples[["normalized"]][["insert_indexes_by_nt"]])
knitr::kable(triples_tenmpr[["normalized"]][["insert_indexes_by_nt"]])
knitr::kable(triples_fivempr[["normalized"]][["insert_indexes_by_nt"]])
knitr::kable(quints[["normalized"]][["insert_indexes_by_nt"]])
knitr::kable(quints_tenmpr[["normalized"]][["insert_indexes_by_nt"]])
knitr::kable(quints_fivempr[["normalized"]][["insert_indexes_by_nt"]])

knitr::kable(triples[["normalized"]][["insert_sequencer_by_nt"]])
knitr::kable(triples_tenmpr[["normalized"]][["insert_sequencer_by_nt"]])
knitr::kable(triples_fivempr[["normalized"]][["insert_sequencer_by_nt"]])
knitr::kable(quints[["normalized"]][["insert_sequencer_by_nt"]])
knitr::kable(quints_tenmpr[["normalized"]][["insert_sequencer_by_nt"]])
knitr::kable(quints_fivempr[["normalized"]][["insert_sequencer_by_nt"]])
```

#### Rewriting the matrices by dividing by all indexes

I suspect there are too few insertion events for this rescaling to behave
well.  I will double-check the logic, but that is my initial guess given how
few insertions I saw when reading the outputs manually.  Unfortunately, this
also means I cannot provide a cpm measurement for these.

```{r insert_index_normalized_by_counts, results='asis'}
knitr::kable(triples[["matrices_by_counts"]][["insert_indexes_by_nt"]])
knitr::kable(triples_tenmpr[["matrices_by_counts"]][["insert_indexes_by_nt"]])
knitr::kable(triples_fivempr[["matrices_by_counts"]][["insert_indexes_by_nt"]])
knitr::kable(quints[["matrices_by_counts"]][["insert_indexes_by_nt"]])
knitr::kable(quints_tenmpr[["matrices_by_counts"]][["insert_indexes_by_nt"]])
knitr::kable(quints_fivempr[["matrices_by_counts"]][["insert_indexes_by_nt"]])

knitr::kable(triples[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
knitr::kable(triples_tenmpr[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
knitr::kable(triples_fivempr[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
knitr::kable(quints[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
knitr::kable(quints_tenmpr[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
knitr::kable(quints_fivempr[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
```

What follows is my previous version of this worksheet, which simply dumped the
various tables.

# Print raw tables

```{r raw, results='asis'}
for (t in seq_along(triples[["matrices"]])) {
  table_name <- names(triples[["matrices"]])[t]
  message("Raw table: ", table_name, ".")
  print(knitr::kable(triples[["matrices"]][[t]]))
}
```

# Print raw plots

```{r raw_plots}
for (t in 1:length(triples[["plots"]][["matrices"]])) {
  message("Raw table: ", table_name, ".")
  print(triples[["plots"]][["matrices"]][t])
}
```

# Print normalized tables

```{r norm, results='asis'}
for (t in seq_along(triples[["matrices_by_counts"]])) {
  table_name <- names(triples[["matrices_by_counts"]])[t]
  message("Normalized table: ", table_name, ".")
  print(knitr::kable(triples[["matrices_by_counts"]][[t]]))
}
```

# Print normalized plots

```{r norm_plots}
for (t in 1:length(triples[["plots"]][["counts"]])) {
  message("Normalized table: ", table_name, ".")
  print(triples[["plots"]][["counts"]][t])
}
```

```{r saveme}
pander::pander(sessionInfo())
message(paste0("This is hpgltools commit: ", get_git_commit()))
this_save <- paste0(gsub(pattern="\\.Rmd", replace="", x=rmd_file), "-v", ver, ".rda.xz")
message(paste0("Saving to ", this_save))
tmp <- sm(saveme(filename=this_save))
```


```{r loadme, eval=FALSE}
loadme(filename=this_save)
```
