1 Calculating error rates.

I wrote the function 'create_matrices()' to collect mutation counts. In theory, its results should be able to address most questions about the counts of mutations observed in the data.

1.1 Categorize the data with at least 3 indexes per mutant

devtools::load_all("Rerrrt")
## Loading Rerrrt
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:hpgltools':
## 
##     combine
## The following object is masked from 'package:testthat':
## 
##     matches
## The following object is masked from 'package:Biobase':
## 
##     combine
## The following objects are masked from 'package:BiocGenerics':
## 
##     combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## Loading required package: tidyr
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:testthat':
## 
##     matches
sample_sheet <- "sample_sheets/new_samples.xlsx"
ident_column <- "identtable"
mut_column <- "mutationtable"
min_reads <- 3
min_indexes <- 3
min_sequencer <- 6
min_position <- 24
max_position <- 176
max_mutations_per_read <- NULL
prune_n <- TRUE
verbose <- TRUE
excel <- glue::glue("excel/{rundate}_new_triples-v{ver}.xlsx")
triples <- create_matrices(sample_sheet=sample_sheet,
                           ident_column=ident_column, mut_column=mut_column,
                           min_reads=min_reads, min_indexes=min_indexes,
                           min_sequencer=min_sequencer,
                           min_position=min_position, max_position=max_position,
                           prune_n=prune_n, verbose=verbose, excel=excel)
## Dropped 6 rows from the sample metadata because they were blank.
## Starting sample: s7_control_dna.
##   Reading the file containing mutations: preprocessing/s7_control_dna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s7_control_dna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 548445 reads.
## Mutation data: after min-position pruning, there are: 525666 reads: 22779 lost or 4.15%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 525666 reads.
## Mutation data: after max-position pruning, there are: 482118 reads: 43548 lost or 8.28%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 468057 reads: 14061 lost or 2.92%.
##   Mutation data: all filters removed 80388 reads, or 14.66%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1822433 indexes in all the data.
## After reads/index pruning, there are: 211096 indexes: 1611337 lost or 88.42%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 468057 changed reads.
## All data: before reads/index pruning, there are: 2421460 identical reads.
## All data: after index pruning, there are: 121275 changed reads: 25.91%.
## All data: after index pruning, there are: 628664 identical reads: 25.96%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 628664 identical reads.
## Before classification, there are 121275 reads with mutations.
## After classification, there are 543985 reads/indexes which are only identical.
## After classification, there are 1636 reads/indexes which are strictly sequencer.
## After classification, there are 55444 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 982674 forward reads and 1112622 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s9_low_dna.
##   Reading the file containing mutations: preprocessing/s9_low_dna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s9_low_dna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 418833 reads.
## Mutation data: after min-position pruning, there are: 398387 reads: 20446 lost or 4.88%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 398387 reads.
## Mutation data: after max-position pruning, there are: 357781 reads: 40606 lost or 10.19%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 350717 reads: 7064 lost or 1.97%.
##   Mutation data: all filters removed 68116 reads, or 16.26%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1945123 indexes in all the data.
## After reads/index pruning, there are: 176434 indexes: 1768689 lost or 90.93%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 350717 changed reads.
## All data: before reads/index pruning, there are: 2566212 identical reads.
## All data: after index pruning, there are: 73320 changed reads: 20.91%.
## All data: after index pruning, there are: 539291 identical reads: 21.02%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 539291 identical reads.
## Before classification, there are 73320 reads with mutations.
## After classification, there are 469094 reads/indexes which are only identical.
## After classification, there are 1004 reads/indexes which are strictly sequencer.
## After classification, there are 18533 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 770075 forward reads and 900874 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s11_high_dna.
##   Reading the file containing mutations: preprocessing/s11_high_dna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s11_high_dna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 344009 reads.
## Mutation data: after min-position pruning, there are: 325611 reads: 18398 lost or 5.35%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 325611 reads.
## Mutation data: after max-position pruning, there are: 288540 reads: 37071 lost or 11.39%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 282511 reads: 6029 lost or 2.09%.
##   Mutation data: all filters removed 61498 reads, or 17.88%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1390098 indexes in all the data.
## After reads/index pruning, there are: 257737 indexes: 1132361 lost or 81.46%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 282511 changed reads.
## All data: before reads/index pruning, there are: 2240589 identical reads.
## All data: after index pruning, there are: 99596 changed reads: 35.25%.
## All data: after index pruning, there are: 869226 identical reads: 38.79%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 869226 identical reads.
## Before classification, there are 99596 reads with mutations.
## After classification, there are 743162 reads/indexes which are only identical.
## After classification, there are 5098 reads/indexes which are strictly sequencer.
## After classification, there are 11216 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1339382 forward reads and 1519238 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s8_control_rna.
##   Reading the file containing mutations: preprocessing/s8_control_rna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s8_control_rna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 267014 reads.
## Mutation data: after min-position pruning, there are: 253197 reads: 13817 lost or 5.17%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 253197 reads.
## Mutation data: after max-position pruning, there are: 225653 reads: 27544 lost or 10.88%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 220454 reads: 5199 lost or 2.30%.
##   Mutation data: all filters removed 46560 reads, or 17.44%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1153820 indexes in all the data.
## After reads/index pruning, there are: 209685 indexes: 944135 lost or 81.83%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 220454 changed reads.
## All data: before reads/index pruning, there are: 1854668 identical reads.
## All data: after index pruning, there are: 75760 changed reads: 34.37%.
## All data: after index pruning, there are: 710768 identical reads: 38.32%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 710768 identical reads.
## Before classification, there are 75760 reads with mutations.
## After classification, there are 611191 reads/indexes which are only identical.
## After classification, there are 4004 reads/indexes which are strictly sequencer.
## After classification, there are 8979 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1103672 forward reads and 1253565 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s10_low_rna.
##   Reading the file containing mutations: preprocessing/s10_low_rna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s10_low_rna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 387777 reads.
## Mutation data: after min-position pruning, there are: 370081 reads: 17696 lost or 4.56%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 370081 reads.
## Mutation data: after max-position pruning, there are: 335412 reads: 34669 lost or 9.37%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 324417 reads: 10995 lost or 3.28%.
##   Mutation data: all filters removed 63360 reads, or 16.34%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1303577 indexes in all the data.
## After reads/index pruning, there are: 254438 indexes: 1049139 lost or 80.48%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 324417 changed reads.
## All data: before reads/index pruning, there are: 2083008 identical reads.
## All data: after index pruning, there are: 119184 changed reads: 36.74%.
## All data: after index pruning, there are: 832044 identical reads: 39.94%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 832044 identical reads.
## Before classification, there are 119184 reads with mutations.
## After classification, there are 710943 reads/indexes which are only identical.
## After classification, there are 4525 reads/indexes which are strictly sequencer.
## After classification, there are 34403 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1344297 forward reads and 1447138 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s12_high_rna.
##   Reading the file containing mutations: preprocessing/s12_high_rna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s12_high_rna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 558061 reads.
## Mutation data: after min-position pruning, there are: 535712 reads: 22349 lost or 4.00%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 535712 reads.
## Mutation data: after max-position pruning, there are: 485875 reads: 49837 lost or 9.30%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 473035 reads: 12840 lost or 2.64%.
##   Mutation data: all filters removed 85026 reads, or 15.24%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1218719 indexes in all the data.
## After reads/index pruning, there are: 252358 indexes: 966361 lost or 79.29%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 473035 changed reads.
## All data: before reads/index pruning, there are: 1855921 identical reads.
## All data: after index pruning, there are: 188945 changed reads: 39.94%.
## All data: after index pruning, there are: 770419 identical reads: 41.51%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 770419 identical reads.
## Before classification, there are 188945 reads with mutations.
## After classification, there are 646343 reads/indexes which are only identical.
## After classification, there are 4799 reads/indexes which are strictly sequencer.
## After classification, there are 98204 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1307113 forward reads and 1448739 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
##   Writing a legend.
## Plotting Index density for mutant reads before filtering.
## Plotting Index density for identical reads before filtering.
## Plotting Index density for all reads before filtering.
## Plotting Index density for mutant reads after filtering.
## Plotting Index density for identical reads after filtering.
## Plotting Index density for all reads after filtering.
##   Writing raw data.
##   Writing cpm data.
##   Writing data normalized by reads/indexes.
##   Writing data normalized by reads/indexes and length.
##   Writing data normalized by cpm(reads/indexes) and length.
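As a sanity check on the log above, the loss percentages for s7_control_dna can be recomputed from the read counts it reports. This is a minimal sketch in base R; the variable names are illustrative and are not part of Rerrrt. Note that the log reports the percentage lost for the position, N, and index filters, but the percentage retained on the "after index pruning" read lines.

```r
# Read counts copied from the s7_control_dna log lines above.
reads <- c(start = 548445, min_position = 525666,
           max_position = 482118, prune_n = 468057)

# Reads lost at each filter, as a percentage of the reads entering that filter.
lost <- -diff(reads)
pct_lost <- round(100 * lost / head(reads, -1), 2)

# Total loss across the three filters, relative to the starting count.
total_lost <- reads[["start"]] - reads[["prune_n"]]
pct_total <- round(100 * total_lost / reads[["start"]], 2)

# Index pruning: the log gives the percentage of indexes lost ...
indexes_before <- 1822433
indexes_after <- 211096
pct_indexes_lost <- round(100 * (indexes_before - indexes_after) / indexes_before, 2)

# ... but the percentage of changed reads kept after index pruning.
pct_changed_kept <- round(100 * 121275 / 468057, 2)

print(pct_lost)
print(c(total_lost = total_lost, pct_total = pct_total))
print(c(pct_indexes_lost = pct_indexes_lost, pct_changed_kept = pct_changed_kept))
```

The same check can be repeated for the other samples by swapping in their counts from the log.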
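# Note: max_mutations_per_read is assigned here but is not among the arguments passed to create_matrices() below.
max_mutations_per_read <- 10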
excel <- glue::glue("excel/{rundate}_triples_tenmpr-v{ver}.xlsx")
triples_tenmpr <- create_matrices(sample_sheet=sample_sheet,
                                  ident_column=ident_column, mut_column=mut_column,
                                  min_reads=min_reads, min_indexes=min_indexes,
                                  min_sequencer=min_sequencer,
                                  min_position=min_position, max_position=max_position,
                                  prune_n=prune_n, verbose=verbose, excel=excel)
## Dropped 6 rows from the sample metadata because they were blank.
## Starting sample: s7_control_dna.
##   Reading the file containing mutations: preprocessing/s7_control_dna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s7_control_dna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 548445 reads.
## Mutation data: after min-position pruning, there are: 525666 reads: 22779 lost or 4.15%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 525666 reads.
## Mutation data: after max-position pruning, there are: 482118 reads: 43548 lost or 8.28%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 468057 reads: 14061 lost or 2.92%.
##   Mutation data: all filters removed 80388 reads, or 14.66%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1822433 indexes in all the data.
## After reads/index pruning, there are: 211096 indexes: 1611337 lost or 88.42%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 468057 changed reads.
## All data: before reads/index pruning, there are: 2421460 identical reads.
## All data: after index pruning, there are: 121275 changed reads: 25.91%.
## All data: after index pruning, there are: 628664 identical reads: 25.96%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 628664 identical reads.
## Before classification, there are 121275 reads with mutations.
## After classification, there are 543985 reads/indexes which are only identical.
## After classification, there are 1636 reads/indexes which are strictly sequencer.
## After classification, there are 55444 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 982674 forward reads and 1112622 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s9_low_dna.
##   Reading the file containing mutations: preprocessing/s9_low_dna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s9_low_dna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 418833 reads.
## Mutation data: after min-position pruning, there are: 398387 reads: 20446 lost or 4.88%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 398387 reads.
## Mutation data: after max-position pruning, there are: 357781 reads: 40606 lost or 10.19%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 350717 reads: 7064 lost or 1.97%.
##   Mutation data: all filters removed 68116 reads, or 16.26%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1945123 indexes in all the data.
## After reads/index pruning, there are: 176434 indexes: 1768689 lost or 90.93%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 350717 changed reads.
## All data: before reads/index pruning, there are: 2566212 identical reads.
## All data: after index pruning, there are: 73320 changed reads: 20.91%.
## All data: after index pruning, there are: 539291 identical reads: 21.02%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 539291 identical reads.
## Before classification, there are 73320 reads with mutations.
## After classification, there are 469094 reads/indexes which are only identical.
## After classification, there are 1004 reads/indexes which are strictly sequencer.
## After classification, there are 18533 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 770075 forward reads and 900874 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s11_high_dna.
##   Reading the file containing mutations: preprocessing/s11_high_dna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s11_high_dna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 344009 reads.
## Mutation data: after min-position pruning, there are: 325611 reads: 18398 lost or 5.35%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 325611 reads.
## Mutation data: after max-position pruning, there are: 288540 reads: 37071 lost or 11.39%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 282511 reads: 6029 lost or 2.09%.
##   Mutation data: all filters removed 61498 reads, or 17.88%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1390098 indexes in all the data.
## After reads/index pruning, there are: 257737 indexes: 1132361 lost or 81.46%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 282511 changed reads.
## All data: before reads/index pruning, there are: 2240589 identical reads.
## All data: after index pruning, there are: 99596 changed reads: 35.25%.
## All data: after index pruning, there are: 869226 identical reads: 38.79%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 869226 identical reads.
## Before classification, there are 99596 reads with mutations.
## After classification, there are 743162 reads/indexes which are only identical.
## After classification, there are 5098 reads/indexes which are strictly sequencer.
## After classification, there are 11216 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1339382 forward reads and 1519238 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s8_control_rna.
##   Reading the file containing mutations: preprocessing/s8_control_rna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s8_control_rna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 267014 reads.
## Mutation data: after min-position pruning, there are: 253197 reads: 13817 lost or 5.17%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 253197 reads.
## Mutation data: after max-position pruning, there are: 225653 reads: 27544 lost or 10.88%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 220454 reads: 5199 lost or 2.30%.
##   Mutation data: all filters removed 46560 reads, or 17.44%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1153820 indexes in all the data.
## After reads/index pruning, there are: 209685 indexes: 944135 lost or 81.83%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 220454 changed reads.
## All data: before reads/index pruning, there are: 1854668 identical reads.
## All data: after index pruning, there are: 75760 changed reads: 34.37%.
## All data: after index pruning, there are: 710768 identical reads: 38.32%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 710768 identical reads.
## Before classification, there are 75760 reads with mutations.
## After classification, there are 611191 reads/indexes which are only identical.
## After classification, there are 4004 reads/indexes which are strictly sequencer.
## After classification, there are 8979 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1103672 forward reads and 1253565 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s10_low_rna.
##   Reading the file containing mutations: preprocessing/s10_low_rna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s10_low_rna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 387777 reads.
## Mutation data: after min-position pruning, there are: 370081 reads: 17696 lost or 4.56%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 370081 reads.
## Mutation data: after max-position pruning, there are: 335412 reads: 34669 lost or 9.37%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 324417 reads: 10995 lost or 3.28%.
##   Mutation data: all filters removed 63360 reads, or 16.34%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1303577 indexes in all the data.
## After reads/index pruning, there are: 254438 indexes: 1049139 lost or 80.48%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 324417 changed reads.
## All data: before reads/index pruning, there are: 2083008 identical reads.
## All data: after index pruning, there are: 119184 changed reads: 36.74%.
## All data: after index pruning, there are: 832044 identical reads: 39.94%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 832044 identical reads.
## Before classification, there are 119184 reads with mutations.
## After classification, there are 710943 reads/indexes which are only identical.
## After classification, there are 4525 reads/indexes which are strictly sequencer.
## After classification, there are 34403 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1344297 forward reads and 1447138 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s12_high_rna.
##   Reading the file containing mutations: preprocessing/s12_high_rna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s12_high_rna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 558061 reads.
## Mutation data: after min-position pruning, there are: 535712 reads: 22349 lost or 4.00%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 535712 reads.
## Mutation data: after max-position pruning, there are: 485875 reads: 49837 lost or 9.30%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 473035 reads: 12840 lost or 2.64%.
##   Mutation data: all filters removed 85026 reads, or 15.24%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1218719 indexes in all the data.
## After reads/index pruning, there are: 252358 indexes: 966361 lost or 79.29%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 473035 changed reads.
## All data: before reads/index pruning, there are: 1855921 identical reads.
## All data: after index pruning, there are: 188945 changed reads: 39.94%.
## All data: after index pruning, there are: 770419 identical reads: 41.51%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 770419 identical reads.
## Before classification, there are 188945 reads with mutations.
## After classification, there are 646343 reads/indexes which are only identical.
## After classification, there are 4799 reads/indexes which are strictly sequencer.
## After classification, there are 98204 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1307113 forward reads and 1448739 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
##   Writing a legend.
## Plotting Index density for mutant reads before filtering.
## Plotting Index density for identical reads before filtering.
## Plotting Index density for all reads before filtering.
## Plotting Index density for mutant reads after filtering.
## Plotting Index density for identical reads after filtering.
## Plotting Index density for all reads after filtering.
##   Writing raw data.
##   Writing cpm data.
##   Writing data normalized by reads/indexes.
##   Writing data normalized by reads/indexes and length.
##   Writing data normalized by cpm(reads/indexes) and length.
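The two thresholds used in these calls, min_reads (reads per index) and min_indexes (indexes per mutation), can be illustrated on toy data. The sketch below is an assumption about the general idea, not the code inside create_matrices(); the data frame, its column names, the mutation strings, and the helper filter_toy() are all hypothetical.

```r
# Toy table: one row per read, with its molecular index and mutation string.
# All values are made up for illustration.
toy <- data.frame(
  index = c("AAA", "AAA", "AAA", "CCC", "CCC", "CCC",
            "GGG", "GGG", "GGG", "TTT", "TTT", "ACG"),
  mutation = c("30_A_G", "30_A_G", "30_A_G", "30_A_G", "30_A_G", "30_A_G",
               "30_A_G", "30_A_G", "30_A_G", "55_C_T", "55_C_T", "55_C_T"),
  stringsAsFactors = FALSE)

filter_toy <- function(df, min_reads = 3, min_indexes = 3) {
  # Step 1: keep only indexes supported by at least min_reads reads.
  reads_per_index <- table(df[["index"]])
  kept_indexes <- names(reads_per_index)[reads_per_index >= min_reads]
  df <- df[df[["index"]] %in% kept_indexes, , drop = FALSE]
  # Step 2: count distinct surviving indexes per mutation and keep
  # mutations observed in at least min_indexes of them.
  pairs <- unique(df[, c("index", "mutation")])
  indexes_per_mutation <- table(pairs[["mutation"]])
  indexes_per_mutation[indexes_per_mutation >= min_indexes]
}

kept <- filter_toy(toy)
print(kept)
```

In this toy case the indexes TTT and ACG have fewer than three reads and are dropped, so the mutation they carry no longer has any supporting index and does not survive the second step.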
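# Note: max_mutations_per_read is assigned here but is not among the arguments passed to create_matrices() below.
max_mutations_per_read <- 5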
excel <- glue::glue("excel/{rundate}_triples_fivempr-v{ver}.xlsx")
triples_fivempr <- create_matrices(sample_sheet=sample_sheet,
                                   ident_column=ident_column, mut_column=mut_column,
                                   min_reads=min_reads, min_indexes=min_indexes,
                                   min_sequencer=min_sequencer,
                                   min_position=min_position, max_position=max_position,
                                   prune_n=prune_n, verbose=verbose, excel=excel)
## Dropped 6 rows from the sample metadata because they were blank.
## Starting sample: s7_control_dna.
##   Reading the file containing mutations: preprocessing/s7_control_dna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s7_control_dna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 548445 reads.
## Mutation data: after min-position pruning, there are: 525666 reads: 22779 lost or 4.15%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 525666 reads.
## Mutation data: after max-position pruning, there are: 482118 reads: 43548 lost or 8.28%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 468057 reads: 14061 lost or 2.92%.
##   Mutation data: all filters removed 80388 reads, or 14.66%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1822433 indexes in all the data.
## After reads/index pruning, there are: 211096 indexes: 1611337 lost or 88.42%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 468057 changed reads.
## All data: before reads/index pruning, there are: 2421460 identical reads.
## All data: after index pruning, there are: 121275 changed reads: 25.91%.
## All data: after index pruning, there are: 628664 identical reads: 25.96%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 628664 identical reads.
## Before classification, there are 121275 reads with mutations.
## After classification, there are 543985 reads/indexes which are only identical.
## After classification, there are 1636 reads/indexes which are strictly sequencer.
## After classification, there are 55444 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 982674 forward reads and 1112622 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s9_low_dna.
##   Reading the file containing mutations: preprocessing/s9_low_dna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s9_low_dna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 418833 reads.
## Mutation data: after min-position pruning, there are: 398387 reads: 20446 lost or 4.88%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 398387 reads.
## Mutation data: after max-position pruning, there are: 357781 reads: 40606 lost or 10.19%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 350717 reads: 7064 lost or 1.97%.
##   Mutation data: all filters removed 68116 reads, or 16.26%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1945123 indexes in all the data.
## After reads/index pruning, there are: 176434 indexes: 1768689 lost or 90.93%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 350717 changed reads.
## All data: before reads/index pruning, there are: 2566212 identical reads.
## All data: after index pruning, there are: 73320 changed reads: 20.91%.
## All data: after index pruning, there are: 539291 identical reads: 21.02%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 539291 identical reads.
## Before classification, there are 73320 reads with mutations.
## After classification, there are 469094 reads/indexes which are only identical.
## After classification, there are 1004 reads/indexes which are strictly sequencer.
## After classification, there are 18533 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 770075 forward reads and 900874 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s11_high_dna.
##   Reading the file containing mutations: preprocessing/s11_high_dna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s11_high_dna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 344009 reads.
## Mutation data: after min-position pruning, there are: 325611 reads: 18398 lost or 5.35%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 325611 reads.
## Mutation data: after max-position pruning, there are: 288540 reads: 37071 lost or 11.39%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 282511 reads: 6029 lost or 2.09%.
##   Mutation data: all filters removed 61498 reads, or 17.88%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1390098 indexes in all the data.
## After reads/index pruning, there are: 257737 indexes: 1132361 lost or 81.46%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 282511 changed reads.
## All data: before reads/index pruning, there are: 2240589 identical reads.
## All data: after index pruning, there are: 99596 changed reads: 35.25%.
## All data: after index pruning, there are: 869226 identical reads: 38.79%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 869226 identical reads.
## Before classification, there are 99596 reads with mutations.
## After classification, there are 743162 reads/indexes which are only identical.
## After classification, there are 5098 reads/indexes which are strictly sequencer.
## After classification, there are 11216 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1339382 forward reads and 1519238 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s8_control_rna.
##   Reading the file containing mutations: preprocessing/s8_control_rna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s8_control_rna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 267014 reads.
## Mutation data: after min-position pruning, there are: 253197 reads: 13817 lost or 5.17%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 253197 reads.
## Mutation data: after max-position pruning, there are: 225653 reads: 27544 lost or 10.88%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 220454 reads: 5199 lost or 2.30%.
##   Mutation data: all filters removed 46560 reads, or 17.44%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1153820 indexes in all the data.
## After reads/index pruning, there are: 209685 indexes: 944135 lost or 81.83%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 220454 changed reads.
## All data: before reads/index pruning, there are: 1854668 identical reads.
## All data: after index pruning, there are: 75760 changed reads: 34.37%.
## All data: after index pruning, there are: 710768 identical reads: 38.32%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 710768 identical reads.
## Before classification, there are 75760 reads with mutations.
## After classification, there are 611191 reads/indexes which are only identical.
## After classification, there are 4004 reads/indexes which are strictly sequencer.
## After classification, there are 8979 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1103672 forward reads and 1253565 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s10_low_rna.
##   Reading the file containing mutations: preprocessing/s10_low_rna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s10_low_rna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 387777 reads.
## Mutation data: after min-position pruning, there are: 370081 reads: 17696 lost or 4.56%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 370081 reads.
## Mutation data: after max-position pruning, there are: 335412 reads: 34669 lost or 9.37%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 324417 reads: 10995 lost or 3.28%.
##   Mutation data: all filters removed 63360 reads, or 16.34%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1303577 indexes in all the data.
## After reads/index pruning, there are: 254438 indexes: 1049139 lost or 80.48%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 324417 changed reads.
## All data: before reads/index pruning, there are: 2083008 identical reads.
## All data: after index pruning, there are: 119184 changed reads: 36.74%.
## All data: after index pruning, there are: 832044 identical reads: 39.94%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 832044 identical reads.
## Before classification, there are 119184 reads with mutations.
## After classification, there are 710943 reads/indexes which are only identical.
## After classification, there are 4525 reads/indexes which are strictly sequencer.
## After classification, there are 34403 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1344297 forward reads and 1447138 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s12_high_rna.
##   Reading the file containing mutations: preprocessing/s12_high_rna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s12_high_rna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 558061 reads.
## Mutation data: after min-position pruning, there are: 535712 reads: 22349 lost or 4.00%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 535712 reads.
## Mutation data: after max-position pruning, there are: 485875 reads: 49837 lost or 9.30%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 473035 reads: 12840 lost or 2.64%.
##   Mutation data: all filters removed 85026 reads, or 15.24%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1218719 indexes in all the data.
## After reads/index pruning, there are: 252358 indexes: 966361 lost or 79.29%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 473035 changed reads.
## All data: before reads/index pruning, there are: 1855921 identical reads.
## All data: after index pruning, there are: 188945 changed reads: 39.94%.
## All data: after index pruning, there are: 770419 identical reads: 41.51%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 770419 identical reads.
## Before classification, there are 188945 reads with mutations.
## After classification, there are 646343 reads/indexes which are only identical.
## After classification, there are 4799 reads/indexes which are strictly sequencer.
## After classification, there are 98204 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1307113 forward reads and 1448739 reverse_reads.
## Subsetting based on mutations with at least 3 indexes.
## Classified mutation strings according to various queries.
##   Writing a legend.
## Plotting Index density for mutant reads before filtering.
## Plotting Index density for identical reads before filtering.
## Plotting Index density for all reads before filtering.
## Plotting Index density for mutant reads after filtering.
## Plotting Index density for identical reads after filtering.
## Plotting Index density for all reads after filtering.
##   Writing raw data.
##   Writing cpm data.
##   Writing data normalized by reads/indexes.
##   Writing data normalized by reads/indexes and length.
##   Writing data normalized by cpm(reads/indexes) and length.
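
The percentages in the log above follow two conventions: the position, N, and index-count lines report the fraction lost, while the "after index pruning" read lines report the fraction retained. The following is a small sanity check of that reading, using counts copied from the s7_control_dna log. The helper `pct()` and the list `s7` are names introduced here for illustration and are not part of Rerrrt.

```r
# Hypothetical helper (not part of Rerrrt): a percentage rounded to 2 decimals.
pct <- function(part, whole) {
  round(100 * part / whole, 2)
}

# Counts copied from the s7_control_dna log.
s7 <- list(
  reads_start = 548445, reads_min_pos = 525666,
  reads_max_pos = 482118, reads_no_n = 468057,
  indexes_before = 1822433, indexes_after = 211096,
  changed_after = 121275,
  identical_before = 2421460, identical_after = 628664)

# Position, N, and index-count pruning report the percentage LOST at each step.
lost_min_pos <- pct(s7$reads_start - s7$reads_min_pos, s7$reads_start)        # 4.15
lost_max_pos <- pct(s7$reads_min_pos - s7$reads_max_pos, s7$reads_min_pos)    # 8.28
lost_n <- pct(s7$reads_max_pos - s7$reads_no_n, s7$reads_max_pos)             # 2.92
lost_all <- pct(s7$reads_start - s7$reads_no_n, s7$reads_start)               # 14.66
lost_indexes <- pct(s7$indexes_before - s7$indexes_after, s7$indexes_before)  # 88.42

# The "after index pruning" read lines report the percentage RETAINED.
kept_changed <- pct(s7$changed_after, s7$reads_no_n)            # 25.91
kept_identical <- pct(s7$identical_after, s7$identical_before)  # 25.96
```

Each per-step percentage uses the read count entering that step as its denominator, so the three step values do not sum to the overall 14.66%.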

1.2 Categorize the data with at least 5 indexes per mutant

min_indexes <- 5
max_mutations_per_read <- NULL
excel <- glue::glue("excel/{rundate}_quints-v{ver}.xlsx")
quints <- create_matrices(sample_sheet=sample_sheet,
                          ident_column=ident_column, mut_column=mut_column,
                          min_reads=min_reads, min_indexes=min_indexes,
                          min_sequencer=min_sequencer,
                          min_position=min_position, max_position=max_position,
                          prune_n=prune_n, verbose=verbose, excel=excel)
## Dropped 6 rows from the sample metadata because they were blank.
## Starting sample: s7_control_dna.
##   Reading the file containing mutations: preprocessing/s7_control_dna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s7_control_dna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 548445 reads.
## Mutation data: after min-position pruning, there are: 525666 reads: 22779 lost or 4.15%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 525666 reads.
## Mutation data: after max-position pruning, there are: 482118 reads: 43548 lost or 8.28%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 468057 reads: 14061 lost or 2.92%.
##   Mutation data: all filters removed 80388 reads, or 14.66%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1822433 indexes in all the data.
## After reads/index pruning, there are: 211096 indexes: 1611337 lost or 88.42%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 468057 changed reads.
## All data: before reads/index pruning, there are: 2421460 identical reads.
## All data: after index pruning, there are: 121275 changed reads: 25.91%.
## All data: after index pruning, there are: 628664 identical reads: 25.96%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 628664 identical reads.
## Before classification, there are 121275 reads with mutations.
## After classification, there are 543985 reads/indexes which are only identical.
## After classification, there are 1636 reads/indexes which are strictly sequencer.
## After classification, there are 55444 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 982674 forward reads and 1112622 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s9_low_dna.
##   Reading the file containing mutations: preprocessing/s9_low_dna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s9_low_dna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 418833 reads.
## Mutation data: after min-position pruning, there are: 398387 reads: 20446 lost or 4.88%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 398387 reads.
## Mutation data: after max-position pruning, there are: 357781 reads: 40606 lost or 10.19%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 350717 reads: 7064 lost or 1.97%.
##   Mutation data: all filters removed 68116 reads, or 16.26%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1945123 indexes in all the data.
## After reads/index pruning, there are: 176434 indexes: 1768689 lost or 90.93%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 350717 changed reads.
## All data: before reads/index pruning, there are: 2566212 identical reads.
## All data: after index pruning, there are: 73320 changed reads: 20.91%.
## All data: after index pruning, there are: 539291 identical reads: 21.02%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 539291 identical reads.
## Before classification, there are 73320 reads with mutations.
## After classification, there are 469094 reads/indexes which are only identical.
## After classification, there are 1004 reads/indexes which are strictly sequencer.
## After classification, there are 18533 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 770075 forward reads and 900874 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s11_high_dna.
##   Reading the file containing mutations: preprocessing/s11_high_dna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s11_high_dna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 344009 reads.
## Mutation data: after min-position pruning, there are: 325611 reads: 18398 lost or 5.35%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 325611 reads.
## Mutation data: after max-position pruning, there are: 288540 reads: 37071 lost or 11.39%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 282511 reads: 6029 lost or 2.09%.
##   Mutation data: all filters removed 61498 reads, or 17.88%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1390098 indexes in all the data.
## After reads/index pruning, there are: 257737 indexes: 1132361 lost or 81.46%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 282511 changed reads.
## All data: before reads/index pruning, there are: 2240589 identical reads.
## All data: after index pruning, there are: 99596 changed reads: 35.25%.
## All data: after index pruning, there are: 869226 identical reads: 38.79%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 869226 identical reads.
## Before classification, there are 99596 reads with mutations.
## After classification, there are 743162 reads/indexes which are only identical.
## After classification, there are 5098 reads/indexes which are strictly sequencer.
## After classification, there are 11216 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1339382 forward reads and 1519238 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s8_control_rna.
##   Reading the file containing mutations: preprocessing/s8_control_rna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s8_control_rna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 267014 reads.
## Mutation data: after min-position pruning, there are: 253197 reads: 13817 lost or 5.17%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 253197 reads.
## Mutation data: after max-position pruning, there are: 225653 reads: 27544 lost or 10.88%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 220454 reads: 5199 lost or 2.30%.
##   Mutation data: all filters removed 46560 reads, or 17.44%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1153820 indexes in all the data.
## After reads/index pruning, there are: 209685 indexes: 944135 lost or 81.83%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 220454 changed reads.
## All data: before reads/index pruning, there are: 1854668 identical reads.
## All data: after index pruning, there are: 75760 changed reads: 34.37%.
## All data: after index pruning, there are: 710768 identical reads: 38.32%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 710768 identical reads.
## Before classification, there are 75760 reads with mutations.
## After classification, there are 611191 reads/indexes which are only identical.
## After classification, there are 4004 reads/indexes which are strictly sequencer.
## After classification, there are 8979 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1103672 forward reads and 1253565 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s10_low_rna.
##   Reading the file containing mutations: preprocessing/s10_low_rna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s10_low_rna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 387777 reads.
## Mutation data: after min-position pruning, there are: 370081 reads: 17696 lost or 4.56%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 370081 reads.
## Mutation data: after max-position pruning, there are: 335412 reads: 34669 lost or 9.37%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 324417 reads: 10995 lost or 3.28%.
##   Mutation data: all filters removed 63360 reads, or 16.34%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1303577 indexes in all the data.
## After reads/index pruning, there are: 254438 indexes: 1049139 lost or 80.48%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 324417 changed reads.
## All data: before reads/index pruning, there are: 2083008 identical reads.
## All data: after index pruning, there are: 119184 changed reads: 36.74%.
## All data: after index pruning, there are: 832044 identical reads: 39.94%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 832044 identical reads.
## Before classification, there are 119184 reads with mutations.
## After classification, there are 710943 reads/indexes which are only identical.
## After classification, there are 4525 reads/indexes which are strictly sequencer.
## After classification, there are 34403 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1344297 forward reads and 1447138 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s12_high_rna.
##   Reading the file containing mutations: preprocessing/s12_high_rna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s12_high_rna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 558061 reads.
## Mutation data: after min-position pruning, there are: 535712 reads: 22349 lost or 4.00%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 535712 reads.
## Mutation data: after max-position pruning, there are: 485875 reads: 49837 lost or 9.30%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 473035 reads: 12840 lost or 2.64%.
##   Mutation data: all filters removed 85026 reads, or 15.24%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1218719 indexes in all the data.
## After reads/index pruning, there are: 252358 indexes: 966361 lost or 79.29%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 473035 changed reads.
## All data: before reads/index pruning, there are: 1855921 identical reads.
## All data: after index pruning, there are: 188945 changed reads: 39.94%.
## All data: after index pruning, there are: 770419 identical reads: 41.51%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 770419 identical reads.
## Before classification, there are 188945 reads with mutations.
## After classification, there are 646343 reads/indexes which are only identical.
## After classification, there are 4799 reads/indexes which are strictly sequencer.
## After classification, there are 98204 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1307113 forward reads and 1448739 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
##   Writing a legend.
## Plotting Index density for mutant reads before filtering.
## Plotting Index density for identical reads before filtering.
## Plotting Index density for all reads before filtering.
## Plotting Index density for mutant reads after filtering.
## Plotting Index density for identical reads after filtering.
## Plotting Index density for all reads after filtering.
##   Writing raw data.
##   Writing cpm data.
##   Writing data normalized by reads/indexes.
##   Writing data normalized by reads/indexes and length.
##   Writing data normalized by cpm(reads/indexes) and length.
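
The only log line that differs between the 3-index and 5-index runs is "Subsetting based on mutations with at least N indexes". The sketch below shows one way such a subset could work on toy data; it is not the code inside create_matrices(). The column names `index` and `mutation`, the mutation-string format, and the helper `keep_mutations()` are all assumptions made for illustration.

```r
# Hypothetical toy data: each row is one index (molecular barcode) in which a
# given mutation string was observed. Names and formats are assumptions.
toy <- data.frame(
  index = c("AAA", "AAC", "AAG", "AAT", "ACA", "ACC", "ACG", "ACT", "AGA"),
  mutation = c(rep("30_A_G", 5), rep("57_C_T", 3), "101_G_A"),
  stringsAsFactors = FALSE)

# Count the distinct indexes supporting each mutation.
indexes_per_mutation <- tapply(toy$index, toy$mutation,
                               function(x) length(unique(x)))

# Keep the mutations supported by at least `min_indexes` distinct indexes.
keep_mutations <- function(counts, min_indexes) {
  names(counts)[counts >= min_indexes]
}

triples_kept <- keep_mutations(indexes_per_mutation, min_indexes = 3)
quints_kept <- keep_mutations(indexes_per_mutation, min_indexes = 5)
```

On this toy table the 3-index threshold keeps two mutations and the 5-index threshold keeps one, so a stricter threshold can only shrink the set of mutations that reach the output matrices.
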
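max_mutations_per_read <- 10
excel <- glue::glue("excel/{rundate}_quints_tenmpr-v{ver}.xlsx")
# Note: max_mutations_per_read is set above but is not among the arguments
# passed below. Unless create_matrices() picks it up some other way, this run
# applies the same filters as 'quints'.
quints_tenmpr <- create_matrices(sample_sheet=sample_sheet,
                                 ident_column=ident_column, mut_column=mut_column,
                                 min_reads=min_reads, min_indexes=min_indexes,
                                 min_sequencer=min_sequencer,
                                 min_position=min_position, max_position=max_position,
                                 prune_n=prune_n, verbose=verbose, excel=excel)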
## Dropped 6 rows from the sample metadata because they were blank.
## Starting sample: s7_control_dna.
##   Reading the file containing mutations: preprocessing/s7_control_dna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s7_control_dna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 548445 reads.
## Mutation data: after min-position pruning, there are: 525666 reads: 22779 lost or 4.15%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 525666 reads.
## Mutation data: after max-position pruning, there are: 482118 reads: 43548 lost or 8.28%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 468057 reads: 14061 lost or 2.92%.
##   Mutation data: all filters removed 80388 reads, or 14.66%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1822433 indexes in all the data.
## After reads/index pruning, there are: 211096 indexes: 1611337 lost or 88.42%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 468057 changed reads.
## All data: before reads/index pruning, there are: 2421460 identical reads.
## All data: after index pruning, there are: 121275 changed reads: 25.91%.
## All data: after index pruning, there are: 628664 identical reads: 25.96%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 628664 identical reads.
## Before classification, there are 121275 reads with mutations.
## After classification, there are 543985 reads/indexes which are only identical.
## After classification, there are 1636 reads/indexes which are strictly sequencer.
## After classification, there are 55444 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 982674 forward reads and 1112622 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s9_low_dna.
##   Reading the file containing mutations: preprocessing/s9_low_dna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s9_low_dna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 418833 reads.
## Mutation data: after min-position pruning, there are: 398387 reads: 20446 lost or 4.88%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 398387 reads.
## Mutation data: after max-position pruning, there are: 357781 reads: 40606 lost or 10.19%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 350717 reads: 7064 lost or 1.97%.
##   Mutation data: all filters removed 68116 reads, or 16.26%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1945123 indexes in all the data.
## After reads/index pruning, there are: 176434 indexes: 1768689 lost or 90.93%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 350717 changed reads.
## All data: before reads/index pruning, there are: 2566212 identical reads.
## All data: after index pruning, there are: 73320 changed reads: 20.91%.
## All data: after index pruning, there are: 539291 identical reads: 21.02%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 539291 identical reads.
## Before classification, there are 73320 reads with mutations.
## After classification, there are 469094 reads/indexes which are only identical.
## After classification, there are 1004 reads/indexes which are strictly sequencer.
## After classification, there are 18533 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 770075 forward reads and 900874 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s11_high_dna.
##   Reading the file containing mutations: preprocessing/s11_high_dna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s11_high_dna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 344009 reads.
## Mutation data: after min-position pruning, there are: 325611 reads: 18398 lost or 5.35%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 325611 reads.
## Mutation data: after max-position pruning, there are: 288540 reads: 37071 lost or 11.39%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 282511 reads: 6029 lost or 2.09%.
##   Mutation data: all filters removed 61498 reads, or 17.88%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1390098 indexes in all the data.
## After reads/index pruning, there are: 257737 indexes: 1132361 lost or 81.46%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 282511 changed reads.
## All data: before reads/index pruning, there are: 2240589 identical reads.
## All data: after index pruning, there are: 99596 changed reads: 35.25%.
## All data: after index pruning, there are: 869226 identical reads: 38.79%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 869226 identical reads.
## Before classification, there are 99596 reads with mutations.
## After classification, there are 743162 reads/indexes which are only identical.
## After classification, there are 5098 reads/indexes which are strictly sequencer.
## After classification, there are 11216 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1339382 forward reads and 1519238 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s8_control_rna.
##   Reading the file containing mutations: preprocessing/s8_control_rna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s8_control_rna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 267014 reads.
## Mutation data: after min-position pruning, there are: 253197 reads: 13817 lost or 5.17%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 253197 reads.
## Mutation data: after max-position pruning, there are: 225653 reads: 27544 lost or 10.88%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 220454 reads: 5199 lost or 2.30%.
##   Mutation data: all filters removed 46560 reads, or 17.44%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1153820 indexes in all the data.
## After reads/index pruning, there are: 209685 indexes: 944135 lost or 81.83%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 220454 changed reads.
## All data: before reads/index pruning, there are: 1854668 identical reads.
## All data: after index pruning, there are: 75760 changed reads: 34.37%.
## All data: after index pruning, there are: 710768 identical reads: 38.32%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 710768 identical reads.
## Before classification, there are 75760 reads with mutations.
## After classification, there are 611191 reads/indexes which are only identical.
## After classification, there are 4004 reads/indexes which are strictly sequencer.
## After classification, there are 8979 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1103672 forward reads and 1253565 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s10_low_rna.
##   Reading the file containing mutations: preprocessing/s10_low_rna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s10_low_rna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 387777 reads.
## Mutation data: after min-position pruning, there are: 370081 reads: 17696 lost or 4.56%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 370081 reads.
## Mutation data: after max-position pruning, there are: 335412 reads: 34669 lost or 9.37%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 324417 reads: 10995 lost or 3.28%.
##   Mutation data: all filters removed 63360 reads, or 16.34%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1303577 indexes in all the data.
## After reads/index pruning, there are: 254438 indexes: 1049139 lost or 80.48%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 324417 changed reads.
## All data: before reads/index pruning, there are: 2083008 identical reads.
## All data: after index pruning, there are: 119184 changed reads: 36.74%.
## All data: after index pruning, there are: 832044 identical reads: 39.94%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 832044 identical reads.
## Before classification, there are 119184 reads with mutations.
## After classification, there are 710943 reads/indexes which are only identical.
## After classification, there are 4525 reads/indexes which are strictly sequencer.
## After classification, there are 34403 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1344297 forward reads and 1447138 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s12_high_rna.
##   Reading the file containing mutations: preprocessing/s12_high_rna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s12_high_rna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 558061 reads.
## Mutation data: after min-position pruning, there are: 535712 reads: 22349 lost or 4.00%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 535712 reads.
## Mutation data: after max-position pruning, there are: 485875 reads: 49837 lost or 9.30%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 473035 reads: 12840 lost or 2.64%.
##   Mutation data: all filters removed 85026 reads, or 15.24%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1218719 indexes in all the data.
## After reads/index pruning, there are: 252358 indexes: 966361 lost or 79.29%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 473035 changed reads.
## All data: before reads/index pruning, there are: 1855921 identical reads.
## All data: after index pruning, there are: 188945 changed reads: 39.94%.
## All data: after index pruning, there are: 770419 identical reads: 41.51%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 770419 identical reads.
## Before classification, there are 188945 reads with mutations.
## After classification, there are 646343 reads/indexes which are only identical.
## After classification, there are 4799 reads/indexes which are strictly sequencer.
## After classification, there are 98204 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1307113 forward reads and 1448739 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
##   Writing a legend.
## Plotting Index density for mutant reads before filtering.
## Plotting Index density for identical reads before filtering.
## Plotting Index density for all reads before filtering.
## Plotting Index density for mutant reads after filtering.
## Plotting Index density for identical reads after filtering.
## Plotting Index density for all reads after filtering.
##   Writing raw data.
##   Writing cpm data.
##   Writing data normalized by reads/indexes.
##   Writing data normalized by reads/indexes and length.
##   Writing data normalized by cpm(reads/indexes) and length.
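# Note: max_mutations_per_read is assigned here but is not among the
# arguments passed to create_matrices() in the call below.
max_mutations_per_read <- 5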
excel <- glue::glue("excel/{rundate}_quints_fivempr-v{ver}.xlsx")
quints_fivempr <- create_matrices(sample_sheet=sample_sheet,
                                  ident_column=ident_column, mut_column=mut_column,
                                  min_reads=min_reads, min_indexes=min_indexes,
                                  min_sequencer=min_sequencer,
                                  min_position=min_position, max_position=max_position,
                                  prune_n=prune_n, verbose=verbose, excel=excel)
## Dropped 6 rows from the sample metadata because they were blank.
## Starting sample: s7_control_dna.
##   Reading the file containing mutations: preprocessing/s7_control_dna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s7_control_dna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 548445 reads.
## Mutation data: after min-position pruning, there are: 525666 reads: 22779 lost or 4.15%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 525666 reads.
## Mutation data: after max-position pruning, there are: 482118 reads: 43548 lost or 8.28%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 468057 reads: 14061 lost or 2.92%.
##   Mutation data: all filters removed 80388 reads, or 14.66%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1822433 indexes in all the data.
## After reads/index pruning, there are: 211096 indexes: 1611337 lost or 88.42%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 468057 changed reads.
## All data: before reads/index pruning, there are: 2421460 identical reads.
## All data: after index pruning, there are: 121275 changed reads: 25.91%.
## All data: after index pruning, there are: 628664 identical reads: 25.96%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 628664 identical reads.
## Before classification, there are 121275 reads with mutations.
## After classification, there are 543985 reads/indexes which are only identical.
## After classification, there are 1636 reads/indexes which are strictly sequencer.
## After classification, there are 55444 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 982674 forward reads and 1112622 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s9_low_dna.
##   Reading the file containing mutations: preprocessing/s9_low_dna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s9_low_dna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 418833 reads.
## Mutation data: after min-position pruning, there are: 398387 reads: 20446 lost or 4.88%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 398387 reads.
## Mutation data: after max-position pruning, there are: 357781 reads: 40606 lost or 10.19%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 350717 reads: 7064 lost or 1.97%.
##   Mutation data: all filters removed 68116 reads, or 16.26%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1945123 indexes in all the data.
## After reads/index pruning, there are: 176434 indexes: 1768689 lost or 90.93%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 350717 changed reads.
## All data: before reads/index pruning, there are: 2566212 identical reads.
## All data: after index pruning, there are: 73320 changed reads: 20.91%.
## All data: after index pruning, there are: 539291 identical reads: 21.02%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 539291 identical reads.
## Before classification, there are 73320 reads with mutations.
## After classification, there are 469094 reads/indexes which are only identical.
## After classification, there are 1004 reads/indexes which are strictly sequencer.
## After classification, there are 18533 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 770075 forward reads and 900874 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s11_high_dna.
##   Reading the file containing mutations: preprocessing/s11_high_dna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s11_high_dna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 344009 reads.
## Mutation data: after min-position pruning, there are: 325611 reads: 18398 lost or 5.35%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 325611 reads.
## Mutation data: after max-position pruning, there are: 288540 reads: 37071 lost or 11.39%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 282511 reads: 6029 lost or 2.09%.
##   Mutation data: all filters removed 61498 reads, or 17.88%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1390098 indexes in all the data.
## After reads/index pruning, there are: 257737 indexes: 1132361 lost or 81.46%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 282511 changed reads.
## All data: before reads/index pruning, there are: 2240589 identical reads.
## All data: after index pruning, there are: 99596 changed reads: 35.25%.
## All data: after index pruning, there are: 869226 identical reads: 38.79%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 869226 identical reads.
## Before classification, there are 99596 reads with mutations.
## After classification, there are 743162 reads/indexes which are only identical.
## After classification, there are 5098 reads/indexes which are strictly sequencer.
## After classification, there are 11216 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1339382 forward reads and 1519238 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s8_control_rna.
##   Reading the file containing mutations: preprocessing/s8_control_rna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s8_control_rna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 267014 reads.
## Mutation data: after min-position pruning, there are: 253197 reads: 13817 lost or 5.17%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 253197 reads.
## Mutation data: after max-position pruning, there are: 225653 reads: 27544 lost or 10.88%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 220454 reads: 5199 lost or 2.30%.
##   Mutation data: all filters removed 46560 reads, or 17.44%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1153820 indexes in all the data.
## After reads/index pruning, there are: 209685 indexes: 944135 lost or 81.83%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 220454 changed reads.
## All data: before reads/index pruning, there are: 1854668 identical reads.
## All data: after index pruning, there are: 75760 changed reads: 34.37%.
## All data: after index pruning, there are: 710768 identical reads: 38.32%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 710768 identical reads.
## Before classification, there are 75760 reads with mutations.
## After classification, there are 611191 reads/indexes which are only identical.
## After classification, there are 4004 reads/indexes which are strictly sequencer.
## After classification, there are 8979 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1103672 forward reads and 1253565 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s10_low_rna.
##   Reading the file containing mutations: preprocessing/s10_low_rna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s10_low_rna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 387777 reads.
## Mutation data: after min-position pruning, there are: 370081 reads: 17696 lost or 4.56%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 370081 reads.
## Mutation data: after max-position pruning, there are: 335412 reads: 34669 lost or 9.37%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 324417 reads: 10995 lost or 3.28%.
##   Mutation data: all filters removed 63360 reads, or 16.34%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1303577 indexes in all the data.
## After reads/index pruning, there are: 254438 indexes: 1049139 lost or 80.48%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 324417 changed reads.
## All data: before reads/index pruning, there are: 2083008 identical reads.
## All data: after index pruning, there are: 119184 changed reads: 36.74%.
## All data: after index pruning, there are: 832044 identical reads: 39.94%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 832044 identical reads.
## Before classification, there are 119184 reads with mutations.
## After classification, there are 710943 reads/indexes which are only identical.
## After classification, there are 4525 reads/indexes which are strictly sequencer.
## After classification, there are 34403 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1344297 forward reads and 1447138 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
## Starting sample: s12_high_rna.
##   Reading the file containing mutations: preprocessing/s12_high_rna/step4.txt.xz
##   Reading the file containing the identical reads: preprocessing/s12_high_rna/step2_identical_reads.txt.xz
##   Counting indexes before filtering.
## Mutation data: removing any differences before position: 24.
## Mutation data: before pruning, there are: 558061 reads.
## Mutation data: after min-position pruning, there are: 535712 reads: 22349 lost or 4.00%.
## Mutation data: removing any differences after position: 176.
## Mutation data: before pruning, there are: 535712 reads.
## Mutation data: after max-position pruning, there are: 485875 reads: 49837 lost or 9.30%.
## Mutation data: removing any reads with 'N' as the hit.
## Mutation data: after N pruning, there are: 473035 reads: 12840 lost or 2.64%.
##   Mutation data: all filters removed 85026 reads, or 15.24%.
## Gathering information about the number of reads per index.
## Before reads/index pruning, there are: 1218719 indexes in all the data.
## After reads/index pruning, there are: 252358 indexes: 966361 lost or 79.29%.
## All data: removing indexes with fewer than 3 reads/index.
## All data: before reads/index pruning, there are: 473035 changed reads.
## All data: before reads/index pruning, there are: 1855921 identical reads.
## All data: after index pruning, there are: 188945 changed reads: 39.94%.
## All data: after index pruning, there are: 770419 identical reads: 41.51%.
## Gathering identical, mutant, and sequencer reads/indexes.
## Before classification, there are 770419 identical reads.
## Before classification, there are 188945 reads with mutations.
## After classification, there are 646343 reads/indexes which are only identical.
## After classification, there are 4799 reads/indexes which are strictly sequencer.
## After classification, there are 98204 reads/indexes which are deemed from reverse transcriptase.
## Counted by direction: 1307113 forward reads and 1448739 reverse_reads.
## Subsetting based on mutations with at least 5 indexes.
## Classified mutation strings according to various queries.
##   Writing a legend.
## Plotting Index density for mutant reads before filtering.
## Plotting Index density for identical reads before filtering.
## Plotting Index density for all reads before filtering.
## Plotting Index density for mutant reads after filtering.
## Plotting Index density for identical reads after filtering.
## Plotting Index density for all reads after filtering.
##   Writing raw data.
##   Writing cpm data.
##   Writing data normalized by reads/indexes.
##   Writing data normalized by reads/indexes and length.
##   Writing data normalized by cpm(reads/indexes) and length.

2 Questions from Dr. DeStefano

I think what is best is to get the number of recovered mutations of each type from each data set. That would be A to T, A to G, A to C; T to A, T to G, T to C; G to A, G to C, G to T; and C to A, C to G, C to T; as well as deletions and insertions. I would then need the total number of reads that met all our criteria (i.e. at least 3 good recovered reads for that 14 nt index). Each set of 3 or more would count as “1” read of that particular index, so I would need the total with this in mind. I also need to know the total number of nucleotides that were in the region we decided to consider in the analysis. We may want to try this for 3 or more and 5 or more recovered indexes if it is not hard. This information does not include specific positions on the template where errors occurred, but we can look at that later. Right now I just want to get a general error rate and type of error. It would basically be calculated by dividing the number of recovered mutations of a particular type by the total number of reads times the number of nucleotides screened in the template. As it ends up, this number does not really have a lot of meaning, but it can be used to calculate the overall mutation rate as well as the rates for transversions, transitions, deletions, and insertions.
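The calculation described above can be sketched in R. This is only an illustration, not the code inside create_matrices(). The helper name error_rate is hypothetical. The screened region is assumed to run from min_position to max_position inclusive (24 through 176). The denominator uses the number of indexes left after reads/index pruning, as reported in the log for s7_control_dna; whether that is the intended denominator is an assumption.

```r
# Hypothetical helper: mutations of one type divided by
# (number of counted indexes * nucleotides screened per index).
error_rate <- function(mutations, indexes, nt_screened) {
  mutations / (indexes * nt_screened)
}

min_position <- 24
max_position <- 176
# Assumption: both boundary positions belong to the screened region.
nt_screened <- max_position - min_position + 1

# Values copied from this report for s7_control_dna:
# A_G RT indexes (triples) and indexes remaining after reads/index pruning.
a_to_g <- 8689
indexes <- 211096

rate <- error_rate(a_to_g, indexes, nt_screened)
print(signif(rate, 3))
```

The same helper could be applied to each row of a by-type table to get one rate per substitution type.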

3 Answers

To address those queries, I invoked create_matrices() with minimum index counts of 3 and of 5. Note that this is not the same as requiring 3 or 5 reads per index: in both cases I require at least 3 reads per index, and the 3 or 5 refers to the number of distinct indexes in which a mutation must be observed.
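The difference between the two thresholds can be shown on toy data. The sketch below is not the implementation in create_matrices(); the data frame, its column names, and the mutation labels m1 and m2 are all made up for illustration. Only min_reads and min_indexes mirror the parameters set earlier.

```r
min_reads <- 3    # reads required before an index is counted at all
min_indexes <- 3  # distinct indexes required before a mutation is kept

# Toy data (hypothetical): one row per read.
reads <- data.frame(
  index = rep(c("idx1", "idx2", "idx3", "idx4", "idx5", "idx6"),
              times = c(3, 3, 3, 2, 3, 4)),
  mutation = rep(c("m1", "m1", "m1", "m1", "m2", "m2"),
                 times = c(3, 3, 3, 2, 3, 4)),
  stringsAsFactors = FALSE
)

# Step 1: count reads per index and drop indexes with too few reads.
reads_per_index <- table(reads$index)
kept_indexes <- names(reads_per_index)[reads_per_index >= min_reads]
kept <- reads[reads$index %in% kept_indexes, ]

# Step 2: each surviving index counts once per mutation; keep mutations
# that were seen in at least min_indexes distinct indexes.
per_index <- unique(kept[, c("index", "mutation")])
indexes_per_mutation <- table(per_index$mutation)
kept_mutations <- names(indexes_per_mutation)[indexes_per_mutation >= min_indexes]
print(kept_mutations)
```

Here idx4 is dropped in step 1 because it has only 2 reads, and m2 is dropped in step 2 because it appears in only 2 indexes. Raising min_indexes to 5 would drop m1 as well, while min_reads stays at 3.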

3.1 Recovered mutations of each type

I am interpreting this question as asking for the number of indexes recovered for each mutation type. I collect this information in two ways: the indexes by type which are deemed to come from the RT, and the indexes by type which are deemed to come from the sequencer. In addition, I calculate a normalized (cpm) version of this information which may be used to look for changes across samples.

3.1.1 Mutations by RT index

The following block should print tables of the number of mutant indexes observed for each type, for the RT and for the sequencer. One would hope that the sequencer counts will be consistent across all samples, but I think the results will instead suggest that my metric is not yet stringent enough.

knitr::kable(triples[["matrices"]][["miss_indexes_by_type"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
A_C 1150 503 267 216 1139 1960
A_G 8689 1474 340 226 4101 10582
A_T 987 318 155 76 1077 1922
C_A 5917 3233 3385 3541 6822 11465
C_G 8260 1123 258 248 2011 4290
C_T 12143 1755 727 563 4734 21910
G_A 7463 4834 698 363 3829 20448
G_C 337 226 135 69 363 632
G_T 7123 3646 4313 2918 2955 3679
T_A 671 294 105 70 994 1440
T_C 1135 478 316 209 1734 13391
T_G 1454 579 435 404 1160 2933
knitr::kable(triples_tenmpr[["matrices"]][["miss_indexes_by_type"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
A_C 1150 503 267 216 1139 1960
A_G 8689 1474 340 226 4101 10582
A_T 987 318 155 76 1077 1922
C_A 5917 3233 3385 3541 6822 11465
C_G 8260 1123 258 248 2011 4290
C_T 12143 1755 727 563 4734 21910
G_A 7463 4834 698 363 3829 20448
G_C 337 226 135 69 363 632
G_T 7123 3646 4313 2918 2955 3679
T_A 671 294 105 70 994 1440
T_C 1135 478 316 209 1734 13391
T_G 1454 579 435 404 1160 2933
knitr::kable(triples_fivempr[["matrices"]][["miss_indexes_by_type"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
A_C 1150 503 267 216 1139 1960
A_G 8689 1474 340 226 4101 10582
A_T 987 318 155 76 1077 1922
C_A 5917 3233 3385 3541 6822 11465
C_G 8260 1123 258 248 2011 4290
C_T 12143 1755 727 563 4734 21910
G_A 7463 4834 698 363 3829 20448
G_C 337 226 135 69 363 632
G_T 7123 3646 4313 2918 2955 3679
T_A 671 294 105 70 994 1440
T_C 1135 478 316 209 1734 13391
T_G 1454 579 435 404 1160 2933
knitr::kable(quints[["matrices"]][["miss_indexes_by_type"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
A_C 1146 497 260 197 1136 1960
A_G 8689 1474 320 203 4101 10582
A_T 987 306 133 38 1077 1922
C_A 5917 3233 3385 3541 6822 11465
C_G 8260 1117 224 220 2011 4290
C_T 12143 1755 714 538 4734 21910
G_A 7463 4834 694 338 3829 20448
G_C 313 210 118 49 342 632
G_T 7123 3646 4313 2914 2955 3679
T_A 664 280 74 50 994 1437
T_C 1135 468 312 181 1734 13391
T_G 1447 561 424 387 1149 2933
knitr::kable(quints_tenmpr[["matrices"]][["miss_indexes_by_type"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
A_C 1146 497 260 197 1136 1960
A_G 8689 1474 320 203 4101 10582
A_T 987 306 133 38 1077 1922
C_A 5917 3233 3385 3541 6822 11465
C_G 8260 1117 224 220 2011 4290
C_T 12143 1755 714 538 4734 21910
G_A 7463 4834 694 338 3829 20448
G_C 313 210 118 49 342 632
G_T 7123 3646 4313 2914 2955 3679
T_A 664 280 74 50 994 1437
T_C 1135 468 312 181 1734 13391
T_G 1447 561 424 387 1149 2933
knitr::kable(quints_fivempr[["matrices"]][["miss_indexes_by_type"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
A_C 1146 497 260 197 1136 1960
A_G 8689 1474 320 203 4101 10582
A_T 987 306 133 38 1077 1922
C_A 5917 3233 3385 3541 6822 11465
C_G 8260 1117 224 220 2011 4290
C_T 12143 1755 714 538 4734 21910
G_A 7463 4834 694 338 3829 20448
G_C 313 210 118 49 342 632
G_T 7123 3646 4313 2914 2955 3679
T_A 664 280 74 50 994 1437
T_C 1135 468 312 181 1734 13391
T_G 1447 561 424 387 1149 2933
knitr::kable(triples[["matrices"]][["miss_sequencer_by_type"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
A_C 220 121 665 519 600 564
A_G 93 44 408 327 336 402
A_T 13 3 89 63 87 100
C_A 135 95 489 402 501 569
C_G 162 62 455 305 349 445
C_T 25 17 217 192 245 209
G_A 28 18 183 156 156 193
G_C 22 13 106 96 79 95
G_T 78 37 323 247 288 333
T_A 44 9 148 82 84 107
T_C 94 26 326 287 283 251
T_G 453 325 1524 1143 1350 1366
knitr::kable(triples_tenmpr[["matrices"]][["miss_sequencer_by_type"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
A_C 220 121 665 519 600 564
A_G 93 44 408 327 336 402
A_T 13 3 89 63 87 100
C_A 135 95 489 402 501 569
C_G 162 62 455 305 349 445
C_T 25 17 217 192 245 209
G_A 28 18 183 156 156 193
G_C 22 13 106 96 79 95
G_T 78 37 323 247 288 333
T_A 44 9 148 82 84 107
T_C 94 26 326 287 283 251
T_G 453 325 1524 1143 1350 1366
knitr::kable(triples_fivempr[["matrices"]][["miss_sequencer_by_type"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
A_C 220 121 665 519 600 564
A_G 93 44 408 327 336 402
A_T 13 3 89 63 87 100
C_A 135 95 489 402 501 569
C_G 162 62 455 305 349 445
C_T 25 17 217 192 245 209
G_A 28 18 183 156 156 193
G_C 22 13 106 96 79 95
G_T 78 37 323 247 288 333
T_A 44 9 148 82 84 107
T_C 94 26 326 287 283 251
T_G 453 325 1524 1143 1350 1366
knitr::kable(quints[["matrices"]][["miss_sequencer_by_type"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
A_C 207 117 633 504 580 550
A_G 76 20 387 299 312 383
A_T 0 0 69 30 50 79
C_A 113 81 470 380 479 551
C_G 147 45 437 285 337 423
C_T 8 0 172 111 180 172
G_A 10 5 170 113 114 166
G_C 18 13 69 65 53 69
G_T 68 23 288 197 250 300
T_A 31 5 107 58 63 84
T_C 80 5 316 272 259 226
T_G 447 318 1511 1124 1329 1346
knitr::kable(quints_tenmpr[["matrices"]][["miss_sequencer_by_type"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
A_C 207 117 633 504 580 550
A_G 76 20 387 299 312 383
A_T 0 0 69 30 50 79
C_A 113 81 470 380 479 551
C_G 147 45 437 285 337 423
C_T 8 0 172 111 180 172
G_A 10 5 170 113 114 166
G_C 18 13 69 65 53 69
G_T 68 23 288 197 250 300
T_A 31 5 107 58 63 84
T_C 80 5 316 272 259 226
T_G 447 318 1511 1124 1329 1346
knitr::kable(quints_fivempr[["matrices"]][["miss_sequencer_by_type"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
A_C 207 117 633 504 580 550
A_G 76 20 387 299 312 383
A_T 0 0 69 30 50 79
C_A 113 81 470 380 479 551
C_G 147 45 437 285 337 423
C_T 8 0 172 111 180 172
G_A 10 5 170 113 114 166
G_C 18 13 69 65 53 69
G_T 68 23 288 197 250 300
T_A 31 5 107 58 63 84
T_C 80 5 316 272 259 226
T_G 447 318 1511 1124 1329 1346
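The twelve substitution types in these tables can be collapsed into transitions and transversions, which is how Dr. DeStefano's question frames the rates. The sketch below does this for one column only, using the s7_control_dna counts from the first (triples) RT table in this section. It is an illustration and not part of create_matrices(); the variable names are my own.

```r
# RT-attributed index counts for s7_control_dna, copied from the
# triples "miss_indexes_by_type" table.
counts <- c(A_C = 1150, A_G = 8689, A_T = 987,
            C_A = 5917, C_G = 8260, C_T = 12143,
            G_A = 7463, G_C = 337, G_T = 7123,
            T_A = 671, T_C = 1135, T_G = 1454)

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T);
# every other substitution is a transversion.
transitions <- c("A_G", "G_A", "C_T", "T_C")
class_of <- ifelse(names(counts) %in% transitions, "transition", "transversion")

# Sum the counts within each class.
totals <- tapply(counts, class_of, sum)
print(totals)
```

The same grouping could be applied to every column of a table with rowsum(), using class_of as the grouping vector.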

Plots of this information

triples[["plots"]][["counts"]][["miss_indexes_by_type"]]
## NULL
triples_tenmpr[["plots"]][["counts"]][["miss_indexes_by_type"]]
## NULL
triples_fivempr[["plots"]][["counts"]][["miss_indexes_by_type"]]
## NULL
quints[["plots"]][["counts"]][["miss_indexes_by_type"]]
## NULL
quints_tenmpr[["plots"]][["counts"]][["miss_indexes_by_type"]]
## NULL
quints_fivempr[["plots"]][["counts"]][["miss_indexes_by_type"]]
## NULL

This suggests to me that this information needs to be normalized in some more sensible fashion. Thus the following:

3.1.2 Mutations by RT index, post normalization

The same numbers may be expressed relative to the number of indexes observed per sample and/or as a cpm across samples. In the first instance one can look at the apparent error rate for each sample, and in the second instance one may look for relative changes in apparent error rate across samples.

3.1.2.1 Rewriting the matrices as cpm to account for library sizes.
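As a reminder of what a cpm transformation does, here is a minimal sketch on three rows and three columns copied from the triples RT table. It assumes the library size of a sample is simply the sum of its column in this small matrix; the library size actually used by create_matrices() is not shown in this report and may differ.

```r
# Counts copied from the triples RT table for three substitution types
# and three samples; used here only as a small worked example.
counts <- matrix(
  c(8689, 1474, 340,
    12143, 1755, 727,
    7463, 4834, 698),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A_G", "C_T", "G_A"),
                  c("s7_control_dna", "s9_low_dna", "s11_high_dna"))
)

# Assumption: the library size of a sample is the sum of its column.
library_sizes <- colSums(counts)

# Divide each column by its library size and scale to one million.
cpm <- sweep(counts, 2, library_sizes, "/") * 1e6
print(round(cpm))
```

After this scaling every column sums to one million, so values are comparable across samples with different depths.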

knitr::kable(triples[["normalized"]][["miss_indexes_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples_tenmpr[["normalized"]][["miss_indexes_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples_fivempr[["normalized"]][["miss_indexes_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints[["normalized"]][["miss_indexes_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints_tenmpr[["normalized"]][["miss_indexes_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints_fivempr[["normalized"]][["miss_indexes_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples[["normalized"]][["miss_sequencer_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples_tenmpr[["normalized"]][["miss_sequencer_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples_fivempr[["normalized"]][["miss_sequencer_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints[["normalized"]][["miss_sequencer_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints_tenmpr[["normalized"]][["miss_sequencer_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints_fivempr[["normalized"]][["miss_sequencer_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

3.1.2.2 Rewriting the matrices by dividing by all indexes

This I think starts to address the later text in your query.

knitr::kable(triples[["matrices_by_counts"]][["miss_indexes_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints[["matrices_by_counts"]][["miss_indexes_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples[["matrices_by_counts"]][["miss_sequencer_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints[["matrices_by_counts"]][["miss_sequencer_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

3.1.2.3 Rewriting the matrices by dividing by all indexes and cpm

I think this might prove to be where we get the most meaningful results.

The nicest thing in it is that after accounting for library sizes and total indexes observed, we finally see that the sequencer error is mostly consistent across all samples and mutation types – with a couple of notable exceptions.

By the same token, for the mutations which are identical for the sequencer, we have some which are decidedly different for the non-sequencer data. The most notable examples I think are A to G but not G to A; and C to T.

knitr::kable(triples[["normalized_by_counts"]][["miss_indexes_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples_tenmpr[["normalized_by_counts"]][["miss_indexes_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples_fivempr[["normalized_by_counts"]][["miss_indexes_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints[["normalized_by_counts"]][["miss_indexes_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints_tenmpr[["normalized_by_counts"]][["miss_indexes_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints_fivempr[["normalized_by_counts"]][["miss_indexes_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples[["normalized_by_counts"]][["miss_sequencer_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples_tenmpr[["normalized_by_counts"]][["miss_sequencer_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples_fivempr[["normalized_by_counts"]][["miss_sequencer_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints[["normalized_by_counts"]][["miss_sequencer_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints_tenmpr[["normalized_by_counts"]][["miss_sequencer_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints_fivempr[["normalized_by_counts"]][["miss_sequencer_by_type"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

3.1.3 Indels by RT index

The following blocks repeat the above, but for insertions. These data do not contain enough deletions to count them properly.

knitr::kable(triples[["matrices"]][["insert_indexes_by_nt"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
A 50 16 0 0 471 516
C 9 6 0 0 217 348
G 13 0 0 0 55 38
T 29 22 0 3 2726 2640
knitr::kable(triples_tenmpr[["matrices"]][["insert_indexes_by_nt"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
A 50 16 0 0 471 516
C 9 6 0 0 217 348
G 13 0 0 0 55 38
T 29 22 0 3 2726 2640
knitr::kable(triples_fivempr[["matrices"]][["insert_indexes_by_nt"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
A 50 16 0 0 471 516
C 9 6 0 0 217 348
G 13 0 0 0 55 38
T 29 22 0 3 2726 2640
knitr::kable(quints[["matrices"]][["insert_indexes_by_nt"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
A 47 16 0 0 447 500
C 6 6 0 0 213 332
G 10 0 0 0 31 23
T 29 22 0 0 2692 2605
knitr::kable(quints_tenmpr[["matrices"]][["insert_indexes_by_nt"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
A 47 16 0 0 447 500
C 6 6 0 0 213 332
G 10 0 0 0 31 23
T 29 22 0 0 2692 2605
knitr::kable(quints_fivempr[["matrices"]][["insert_indexes_by_nt"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
A 47 16 0 0 447 500
C 6 6 0 0 213 332
G 10 0 0 0 31 23
T 29 22 0 0 2692 2605
knitr::kable(triples[["matrices"]][["insert_sequencer_by_nt"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
knitr::kable(triples_tenmpr[["matrices"]][["insert_sequencer_by_nt"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
knitr::kable(triples_fivempr[["matrices"]][["insert_sequencer_by_nt"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
knitr::kable(quints[["matrices"]][["insert_sequencer_by_nt"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
knitr::kable(quints_tenmpr[["matrices"]][["insert_sequencer_by_nt"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna
knitr::kable(quints_fivempr[["matrices"]][["insert_sequencer_by_nt"]])
names s7_control_dna s9_low_dna s11_high_dna s8_control_rna s10_low_rna s12_high_rna

Plots of this information

triples[["plots"]][["counts"]][["insert_indexes_by_nt"]]
## NULL
triples_tenmpr[["plots"]][["counts"]][["insert_indexes_by_nt"]]
## NULL
triples_fivempr[["plots"]][["counts"]][["insert_indexes_by_nt"]]
## NULL
quints[["plots"]][["counts"]][["insert_indexes_by_nt"]]
## NULL
quints_tenmpr[["plots"]][["counts"]][["insert_indexes_by_nt"]]
## NULL
quints_fivempr[["plots"]][["counts"]][["insert_indexes_by_nt"]]
## NULL

3.1.4 Insertions by RT index, post normalization

3.1.4.1 Rewriting the matrices as cpm to account for library sizes.

knitr::kable(triples[["normalized"]][["insert_indexes_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples_tenmpr[["normalized"]][["insert_indexes_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples_fivempr[["normalized"]][["insert_indexes_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints[["normalized"]][["insert_indexes_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints_tenmpr[["normalized"]][["insert_indexes_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints_fivempr[["normalized"]][["insert_indexes_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples[["normalized"]][["insert_sequencer_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples_tenmpr[["normalized"]][["insert_sequencer_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples_fivempr[["normalized"]][["insert_sequencer_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints[["normalized"]][["insert_sequencer_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints_tenmpr[["normalized"]][["insert_sequencer_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints_fivempr[["normalized"]][["insert_sequencer_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

3.1.4.2 Rewriting the matrices by dividing by all indexes

I suspect there are too few insertion events for this division to produce usable values. I will double check the logic, but that is my initial guess given how few insertions I saw when reading the outputs manually. Unfortunately, this means I also cannot provide a cpm measurement for them.

knitr::kable(triples[["matrices_by_counts"]][["insert_indexes_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples_tenmpr[["matrices_by_counts"]][["insert_indexes_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples_fivempr[["matrices_by_counts"]][["insert_indexes_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints[["matrices_by_counts"]][["insert_indexes_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints_tenmpr[["matrices_by_counts"]][["insert_indexes_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints_fivempr[["matrices_by_counts"]][["insert_indexes_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples_tenmpr[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(triples_fivempr[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints_tenmpr[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

knitr::kable(quints_fivempr[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
## Warning in kable_pipe(x = structure(character(0), .Dim = c(0L, 0L), .Dimnames =
## list(: The table should have a header (column names)

|| || || ||

The following is my previous version of this worksheet, which just dumped the various tables.

---
title: "Counting RT mutations from illumina sequencing data."
author: "atb abelew@gmail.com"
date: "`r Sys.Date()`"
output:
  html_document:
    code_download: true
    code_folding: show
    fig_caption: true
    fig_height: 7
    fig_width: 7
    highlight: tango
    keep_md: false
    mode: selfcontained
    number_sections: true
    self_contained: true
    theme: readable
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
  rmdformats::readthedown:
    code_download: true
    code_folding: show
    df_print: paged
    fig_caption: true
    fig_height: 7
    fig_width: 7
    highlight: tango
    width: 300
    keep_md: false
    mode: selfcontained
    toc_float: true
  BiocStyle::html_document:
    code_download: true
    code_folding: show
    fig_caption: true
    fig_height: 7
    fig_width: 7
    highlight: tango
    keep_md: false
    mode: selfcontained
    toc_float: true
---

<style type="text/css">
body, td {
  font-size: 16px;
}
code.r{
  font-size: 16px;
}
pre {
 font-size: 16px
}
</style>

```{r options, include=FALSE}
library("hpgltools")
tt <- devtools::load_all("/data/hpgltools")
knitr::opts_knit$set(width=120,
                     progress=TRUE,
                     verbose=TRUE,
                     echo=TRUE)
knitr::opts_chunk$set(error=TRUE,
                      dpi=96)
old_options <- options(digits=4,
                       stringsAsFactors=FALSE,
                       knitr.duplicate.label="allow")
ggplot2::theme_set(ggplot2::theme_bw(base_size=10))
rundate <- format(Sys.Date(), format="%Y%m%d")
previous_file <- "index.Rmd"
ver <- "20200314"

##tmp <- sm(loadme(filename=paste0(gsub(pattern="\\.Rmd", replace="", x=previous_file), "-v", ver, ".rda.xz")))
rmd_file <- "error_quant_new.Rmd"
```

# Calculating error rates.

I wrote the function 'create_matrices()' to collect mutation counts.  At least
in theory the results from it should be able to address most/any question
regarding the counts of mutations observed in the data.

## Categorize the data with at least 3 indexes per mutant

```{r triples}
devtools::load_all("Rerrrt")
sample_sheet <- "sample_sheets/new_samples.xlsx"
ident_column <- "identtable"
mut_column <- "mutationtable"
min_reads <- 3
min_indexes <- 3
min_sequencer <- 6
min_position <- 24
max_position <- 176
max_mutations_per_read <- NULL
prune_n <- TRUE
verbose <- TRUE
excel <- glue::glue("excel/{rundate}_new_triples-v{ver}.xlsx")
triples <- create_matrices(sample_sheet=sample_sheet,
                           ident_column=ident_column, mut_column=mut_column,
                           min_reads=min_reads, min_indexes=min_indexes,
                           min_sequencer=min_sequencer,
                           min_position=min_position, max_position=max_position,
                           prune_n=prune_n, verbose=verbose, excel=excel)

## Pass the cap on mutations per read explicitly; this assumes that
## create_matrices() accepts an argument named max_mutations_per_read.
max_mutations_per_read <- 10
excel <- glue::glue("excel/{rundate}_triples_tenmpr-v{ver}.xlsx")
triples_tenmpr <- create_matrices(sample_sheet=sample_sheet,
                                  ident_column=ident_column, mut_column=mut_column,
                                  min_reads=min_reads, min_indexes=min_indexes,
                                  min_sequencer=min_sequencer,
                                  min_position=min_position, max_position=max_position,
                                  max_mutations_per_read=max_mutations_per_read,
                                  prune_n=prune_n, verbose=verbose, excel=excel)
max_mutations_per_read <- 5
excel <- glue::glue("excel/{rundate}_triples_fivempr-v{ver}.xlsx")
triples_fivempr <- create_matrices(sample_sheet=sample_sheet,
                                   ident_column=ident_column, mut_column=mut_column,
                                   min_reads=min_reads, min_indexes=min_indexes,
                                   min_sequencer=min_sequencer,
                                   min_position=min_position, max_position=max_position,
                                   max_mutations_per_read=max_mutations_per_read,
                                   prune_n=prune_n, verbose=verbose, excel=excel)
```

## Categorize the data with at least 5 indexes per mutant

```{r quints}
min_indexes <- 5
max_mutations_per_read <- NULL
excel <- glue::glue("excel/{rundate}_quints-v{ver}.xlsx")
quints <- create_matrices(sample_sheet=sample_sheet,
                          ident_column=ident_column, mut_column=mut_column,
                          min_reads=min_reads, min_indexes=min_indexes,
                          min_sequencer=min_sequencer,
                          min_position=min_position, max_position=max_position,
                          prune_n=prune_n, verbose=verbose, excel=excel)
## Pass the cap on mutations per read explicitly; this assumes that
## create_matrices() accepts an argument named max_mutations_per_read.
max_mutations_per_read <- 10
excel <- glue::glue("excel/{rundate}_quints_tenmpr-v{ver}.xlsx")
quints_tenmpr <- create_matrices(sample_sheet=sample_sheet,
                                 ident_column=ident_column, mut_column=mut_column,
                                 min_reads=min_reads, min_indexes=min_indexes,
                                 min_sequencer=min_sequencer,
                                 min_position=min_position, max_position=max_position,
                                 max_mutations_per_read=max_mutations_per_read,
                                 prune_n=prune_n, verbose=verbose, excel=excel)
max_mutations_per_read <- 5
excel <- glue::glue("excel/{rundate}_quints_fivempr-v{ver}.xlsx")
quints_fivempr <- create_matrices(sample_sheet=sample_sheet,
                                  ident_column=ident_column, mut_column=mut_column,
                                  min_reads=min_reads, min_indexes=min_indexes,
                                  min_sequencer=min_sequencer,
                                  min_position=min_position, max_position=max_position,
                                  max_mutations_per_read=max_mutations_per_read,
                                  prune_n=prune_n, verbose=verbose, excel=excel)
```

# Questions from Dr. DeStefano

I think what is best is to get the number of recovered mutations of each type
from each data set.  That would be A to T, A to G, A to C; T to A, T to G, T to
C; G to A, G to C, G to T; and C to A, C to G, C to T; as well as deletions and
insertions.  I would then need the sum number of the reads that met all our
criteria (i.e. at least 3 good recovered reads for that 14 nt index).  Each set
of 3 or more would count as "1" read of that particular index so I would need the
total with this in mind.  I also need to know the total number of nucleotides
that were in the region we decided to consider in the analysis.  We may want to
try this for 3 or more and 5 or more recovered indexes if it is not hard.  This
information does not include specific positions on the template where errors
occurred but we can look at that later.  Right now I just want to get a general
error rate and type of error.  It would basically be calculated by dividing the
number of recovered mutations of a particular type by sum number of the reads
times the number of nucleotides screened in the template.  As it ends up, this
number does not really have a lot of meaning but it can be used to calculate the
overall mutation rate as well as the rate for transversions, transitions, and
deletions and insertions.

# Answers

In order to address those queries, I invoked create_matrices() with a minimum
index count of 3 and again with 5.  Note that this is not the same as requiring
3 or 5 reads per index; in both cases I require 3 reads per index.
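
To make the distinction concrete, here is a minimal sketch on a made-up table.
It is not the code inside create_matrices(); the data frame `toy` and its
columns are hypothetical, and I am assuming that the read filter is applied to
each index before the indexes are counted for each mutation.

```r
## Toy table: one row per (index, mutation) pair with its read count.
toy <- data.frame(
  index=c("idx1", "idx2", "idx3", "idx4", "idx5", "idx6"),
  mutation=c("A_G", "A_G", "A_G", "C_T", "C_T", "A_G"),
  reads=c(3, 5, 4, 7, 2, 1),
  stringsAsFactors=FALSE)

min_reads <- 3
min_indexes <- 3

## Step 1: drop indexes supported by fewer than min_reads reads.
kept <- toy[toy[["reads"]] >= min_reads, ]
## Step 2: count the surviving indexes for each mutation.
indexes_per_mutation <- table(kept[["mutation"]])
## Step 3: keep mutations observed in at least min_indexes indexes.
passing <- names(indexes_per_mutation)[as.vector(indexes_per_mutation) >= min_indexes]
print(passing)
```

Raising min_indexes to 5 in this toy example would leave nothing, while
min_reads stays at 3 in both cases.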

## Recovered mutations of each type

I am interpreting this question as the number of indexes recovered for each
mutation type.  I collect this information in two ways: the indexes by type
deemed to come from the RT, and those deemed to come from the sequencer.  In
addition, I calculate a normalized (cpm) version of this information which may
be used to look for changes across samples.

### Mutations by RT index

The following block should print out tables of the numbers of mutant indexes
observed for each type for the RT and the sequencer.  One would hope that the
sequencer will be consistent for all samples, but I think the results will
instead suggest that my metric is not yet stringent enough.

```{r mutation_index_count, results='asis'}
knitr::kable(triples[["matrices"]][["miss_indexes_by_type"]])
knitr::kable(triples_tenmpr[["matrices"]][["miss_indexes_by_type"]])
knitr::kable(triples_fivempr[["matrices"]][["miss_indexes_by_type"]])
knitr::kable(quints[["matrices"]][["miss_indexes_by_type"]])
knitr::kable(quints_tenmpr[["matrices"]][["miss_indexes_by_type"]])
knitr::kable(quints_fivempr[["matrices"]][["miss_indexes_by_type"]])

knitr::kable(triples[["matrices"]][["miss_sequencer_by_type"]])
knitr::kable(triples_tenmpr[["matrices"]][["miss_sequencer_by_type"]])
knitr::kable(triples_fivempr[["matrices"]][["miss_sequencer_by_type"]])
knitr::kable(quints[["matrices"]][["miss_sequencer_by_type"]])
knitr::kable(quints_tenmpr[["matrices"]][["miss_sequencer_by_type"]])
knitr::kable(quints_fivempr[["matrices"]][["miss_sequencer_by_type"]])
```

Plots of this information

```{r mutation_index_count_plots}
triples[["plots"]][["counts"]][["miss_indexes_by_type"]]
triples_tenmpr[["plots"]][["counts"]][["miss_indexes_by_type"]]
triples_fivempr[["plots"]][["counts"]][["miss_indexes_by_type"]]

quints[["plots"]][["counts"]][["miss_indexes_by_type"]]
quints_tenmpr[["plots"]][["counts"]][["miss_indexes_by_type"]]
quints_fivempr[["plots"]][["counts"]][["miss_indexes_by_type"]]
```

This suggests to me that this information needs to be normalized in some more
sensible fashion.  Thus the following:

### Mutations by RT index, post normalization

The same numbers may be expressed relative to the number of indexes observed
per sample, as cpm across samples, or both.  The first gives the apparent error
rate for each sample; the second shows relative changes in apparent error rate
across samples.

#### Rewriting the matrices as cpm to account for library sizes.

```{r mutation_index_normalized, results='asis'}
knitr::kable(triples[["normalized"]][["miss_indexes_by_type"]])
knitr::kable(triples_tenmpr[["normalized"]][["miss_indexes_by_type"]])
knitr::kable(triples_fivempr[["normalized"]][["miss_indexes_by_type"]])
knitr::kable(quints[["normalized"]][["miss_indexes_by_type"]])
knitr::kable(quints_tenmpr[["normalized"]][["miss_indexes_by_type"]])
knitr::kable(quints_fivempr[["normalized"]][["miss_indexes_by_type"]])

knitr::kable(triples[["normalized"]][["miss_sequencer_by_type"]])
knitr::kable(triples_tenmpr[["normalized"]][["miss_sequencer_by_type"]])
knitr::kable(triples_fivempr[["normalized"]][["miss_sequencer_by_type"]])
knitr::kable(quints[["normalized"]][["miss_sequencer_by_type"]])
knitr::kable(quints_tenmpr[["normalized"]][["miss_sequencer_by_type"]])
knitr::kable(quints_fivempr[["normalized"]][["miss_sequencer_by_type"]])
```
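
For reference, a cpm conversion can be sketched as below.  This is only an
illustration with made-up counts and is not necessarily what create_matrices()
does internally; in particular, taking the column sum as the library size is an
assumption, and the names `counts`, `library_sizes` and `cpm` are hypothetical.

```r
## Made-up counts: rows are mutation types, columns are samples.
counts <- matrix(c(10, 30, 60,
                   200, 300, 500),
                 nrow=3,
                 dimnames=list(c("A_G", "C_T", "G_A"), c("sample_a", "sample_b")))
## Assumption: the library size of a sample is the sum of its column.
library_sizes <- colSums(counts)
## Divide each column by its library size and scale to one million.
cpm <- sweep(counts, 2, library_sizes, FUN="/") * 1e6
print(cpm)
```

After this scaling every column sums to one million, so samples with very
different depths can be compared row by row.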

#### Rewriting the matrices by dividing by all indexes

This I think starts to address the later text in your query.

```{r mutation_index_normalized_by_counts, results='asis'}
knitr::kable(triples[["matrices_by_counts"]][["miss_indexes_by_type"]])
knitr::kable(quints[["matrices_by_counts"]][["miss_indexes_by_type"]])

knitr::kable(triples[["matrices_by_counts"]][["miss_sequencer_by_type"]])
knitr::kable(quints[["matrices_by_counts"]][["miss_sequencer_by_type"]])
```
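
The division requested in the query can be sketched with made-up numbers as
follows.  The helper `error_rate()` and the counts are hypothetical and not part
of Rerrrt; the only values taken from this worksheet are the region bounds
(min_position 24, max_position 176), and I am assuming that both end positions
are included in the screened region.

```r
## Hypothetical helper: rate = mutant indexes / (total indexes * nucleotides screened).
error_rate <- function(mutant_indexes, total_indexes,
                       min_position=24, max_position=176) {
  ## Assumption: the screened region includes both end positions.
  screened_nt <- max_position - min_position + 1
  mutant_indexes / (total_indexes * screened_nt)
}

## Made-up counts for one sample, not taken from the data.
mutant_indexes <- c("A_G"=120, "C_T"=45, "G_A"=30)
total_indexes <- 100000
rates <- error_rate(mutant_indexes, total_indexes)
print(rates)
## The overall rate is the sum over the mutation types.
print(sum(rates))
```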

#### Rewriting the matrices by dividing by all indexes and cpm

I think this might prove to be where we get the most meaningful results.

The nicest thing in it is that after accounting for library sizes and total
indexes observed, we finally see that the sequencer error is mostly consistent
across all samples and mutation types -- with a couple of notable exceptions.

By the same token, for the mutations which _are_ identical for the sequencer, we
have some which are decidedly different for the non-sequencer data.  The most
notable examples I think are A to G but _not_ G to A; and C to T.

```{r mutation_index_cpm_by_counts, results='asis'}
knitr::kable(triples[["normalized_by_counts"]][["miss_indexes_by_type"]])
knitr::kable(triples_tenmpr[["normalized_by_counts"]][["miss_indexes_by_type"]])
knitr::kable(triples_fivempr[["normalized_by_counts"]][["miss_indexes_by_type"]])
knitr::kable(quints[["normalized_by_counts"]][["miss_indexes_by_type"]])
knitr::kable(quints_tenmpr[["normalized_by_counts"]][["miss_indexes_by_type"]])
knitr::kable(quints_fivempr[["normalized_by_counts"]][["miss_indexes_by_type"]])

knitr::kable(triples[["normalized_by_counts"]][["miss_sequencer_by_type"]])
knitr::kable(triples_tenmpr[["normalized_by_counts"]][["miss_sequencer_by_type"]])
knitr::kable(triples_fivempr[["normalized_by_counts"]][["miss_sequencer_by_type"]])
knitr::kable(quints[["normalized_by_counts"]][["miss_sequencer_by_type"]])
knitr::kable(quints_tenmpr[["normalized_by_counts"]][["miss_sequencer_by_type"]])
knitr::kable(quints_fivempr[["normalized_by_counts"]][["miss_sequencer_by_type"]])
```

### Indels by RT index

The following blocks repeat the above, but for insertions.
These data do not contain enough deletions to count them properly.

```{r insert_index_count, results='asis'}
knitr::kable(triples[["matrices"]][["insert_indexes_by_nt"]])
knitr::kable(triples_tenmpr[["matrices"]][["insert_indexes_by_nt"]])
knitr::kable(triples_fivempr[["matrices"]][["insert_indexes_by_nt"]])
knitr::kable(quints[["matrices"]][["insert_indexes_by_nt"]])
knitr::kable(quints_tenmpr[["matrices"]][["insert_indexes_by_nt"]])
knitr::kable(quints_fivempr[["matrices"]][["insert_indexes_by_nt"]])

knitr::kable(triples[["matrices"]][["insert_sequencer_by_nt"]])
knitr::kable(triples_tenmpr[["matrices"]][["insert_sequencer_by_nt"]])
knitr::kable(triples_fivempr[["matrices"]][["insert_sequencer_by_nt"]])
knitr::kable(quints[["matrices"]][["insert_sequencer_by_nt"]])
knitr::kable(quints_tenmpr[["matrices"]][["insert_sequencer_by_nt"]])
knitr::kable(quints_fivempr[["matrices"]][["insert_sequencer_by_nt"]])
```

Plots of this information

```{r insert_index_count_plots}
triples[["plots"]][["counts"]][["insert_indexes_by_nt"]]
triples_tenmpr[["plots"]][["counts"]][["insert_indexes_by_nt"]]
triples_fivempr[["plots"]][["counts"]][["insert_indexes_by_nt"]]

quints[["plots"]][["counts"]][["insert_indexes_by_nt"]]
quints_tenmpr[["plots"]][["counts"]][["insert_indexes_by_nt"]]
quints_fivempr[["plots"]][["counts"]][["insert_indexes_by_nt"]]
```

### Insertions by RT index, post normalization

#### Rewriting the matrices as cpm to account for library sizes.

```{r insert_index_normalized, results='asis'}
knitr::kable(triples[["normalized"]][["insert_indexes_by_nt"]])
knitr::kable(triples_tenmpr[["normalized"]][["insert_indexes_by_nt"]])
knitr::kable(triples_fivempr[["normalized"]][["insert_indexes_by_nt"]])
knitr::kable(quints[["normalized"]][["insert_indexes_by_nt"]])
knitr::kable(quints_tenmpr[["normalized"]][["insert_indexes_by_nt"]])
knitr::kable(quints_fivempr[["normalized"]][["insert_indexes_by_nt"]])

knitr::kable(triples[["normalized"]][["insert_sequencer_by_nt"]])
knitr::kable(triples_tenmpr[["normalized"]][["insert_sequencer_by_nt"]])
knitr::kable(triples_fivempr[["normalized"]][["insert_sequencer_by_nt"]])
knitr::kable(quints[["normalized"]][["insert_sequencer_by_nt"]])
knitr::kable(quints_tenmpr[["normalized"]][["insert_sequencer_by_nt"]])
knitr::kable(quints_fivempr[["normalized"]][["insert_sequencer_by_nt"]])
```

#### Rewriting the matrices by dividing by all indexes

I suspect there are too few insertion events for this division to produce usable
values.  I will double check the logic, but that is my initial guess given how
few insertions I saw when reading the outputs manually.  Unfortunately, this
means I also cannot provide a cpm measurement for them.

```{r insert_index_normalized_by_counts, results='asis'}
knitr::kable(triples[["matrices_by_counts"]][["insert_indexes_by_nt"]])
knitr::kable(triples_tenmpr[["matrices_by_counts"]][["insert_indexes_by_nt"]])
knitr::kable(triples_fivempr[["matrices_by_counts"]][["insert_indexes_by_nt"]])
knitr::kable(quints[["matrices_by_counts"]][["insert_indexes_by_nt"]])
knitr::kable(quints_tenmpr[["matrices_by_counts"]][["insert_indexes_by_nt"]])
knitr::kable(quints_fivempr[["matrices_by_counts"]][["insert_indexes_by_nt"]])

knitr::kable(triples[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
knitr::kable(triples_tenmpr[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
knitr::kable(triples_fivempr[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
knitr::kable(quints[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
knitr::kable(quints_tenmpr[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
knitr::kable(quints_fivempr[["matrices_by_counts"]][["insert_sequencer_by_nt"]])
```

The following is my previous version of this worksheet, which just dumped the
various tables.

# Print raw tables

```{r raw, results='asis'}
## seq_along() avoids iterating over 1:0 when the list is empty.
for (t in seq_along(triples[["matrices"]])) {
  table_name <- names(triples[["matrices"]])[t]
  message("Raw table: ", table_name, ".")
  print(knitr::kable(triples[["matrices"]][[t]]))
}
```

# Print raw plots

```{r raw_plots}
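for (t in seq_along(triples[["plots"]][["matrices"]])) {
  ## Look up the name inside this loop rather than reusing a stale value.
  table_name <- names(triples[["plots"]][["matrices"]])[t]
  message("Raw plot: ", table_name, ".")
  print(triples[["plots"]][["matrices"]][[t]])
}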
```

# Print normalized tables

```{r norm, results='asis'}
## seq_along() avoids iterating over 1:0 when the list is empty.
for (t in seq_along(triples[["matrices_counts"]])) {
  table_name <- names(triples[["matrices_counts"]])[t]
  message("Normalized table: ", table_name, ".")
  print(knitr::kable(triples[["matrices_counts"]][[t]]))
}
```

# Print normalized plots

```{r norm_plots}
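for (t in seq_along(triples[["plots"]][["counts"]])) {
  ## Look up the name inside this loop rather than reusing a stale value.
  table_name <- names(triples[["plots"]][["counts"]])[t]
  message("Normalized plot: ", table_name, ".")
  print(triples[["plots"]][["counts"]][[t]])
}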
```

```{r saveme}
pander::pander(sessionInfo())
message(paste0("This is hpgltools commit: ", get_git_commit()))
this_save <- paste0(gsub(pattern="\\.Rmd", replacement="", x=rmd_file), "-v", ver, ".rda.xz")
message(paste0("Saving to ", this_save))
tmp <- sm(saveme(filename=this_save))
```


```{r loadme, eval=FALSE}
loadme(filename=this_save)
```
