This dataset has gone through a couple of sequencing iterations, the second of which appears to have been processed in three different ways. By that I mean the following: when I downloaded the data, there were three directories for each sample: the top-level directory, one named ‘UGMC’, and one called ‘index 1 only’. There was also a note with some text describing these three trees.
In my emails with Luke, I got the strong sense that a likely way to handle these is to concatenate them into a single set. The following block does that and uses my consolidate_reads.pl script to search each read pair for a TA.
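The consolidation logic itself lives in consolidate_reads.pl, which I am not pasting here. As a rough python sketch of the idea (the IR_END string and the trim_to_ta name are mine for illustration, not anything from the actual script): find the end of the transposon's inverted repeat in a read and keep everything from the TA junction onward.

```python
# Hypothetical sketch of the consolidation idea; the real logic is in
# consolidate_reads.pl.  IR_END is my guess at the repeat fragment,
# taken from the cutadapt -g adapter above minus its trailing TA.
IR_END = "ACTTATCAGCCAACCTGT"

def trim_to_ta(read, ir=IR_END):
    """Return the read from its TA junction onward, or None if no IR match."""
    idx = read.find(ir)
    if idx < 0:
        return None
    junction = idx + len(ir)
    # The insertion site's TA should immediately follow the inverted repeat.
    if read[junction:junction + 2] != "TA":
        return None
    return read[junction:]

print(trim_to_ta("NNN" + IR_END + "TACCGGTT"))  # TACCGGTT
```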
Note: every time I insert an example script produced by cyoa, I am inserting it from the l1_input scripts/ directory. I am reasonably certain the scripts for the other samples are functionally identical, since I invoke them all via shell for loops. (That was actually not true for a couple of steps, because I have had some weird and annoying problems getting some things to work in this data, most notably the first steps: trimming, consolidation, and getting tpp to finish running.)
start="$(pwd)/preprocessing/reseq"
dirs="l1_input l2_input l3_input l1_blood l2_blood l3_blood l1_brain l2_brain l3_brain"
module add cutadapt
cd ${start}
for dir in $dirs; do
echo $dir
mkdir -p "${start}/${dir}/concat"
cd "${start}/${dir}/concat"
cutadapt -m 12 --trim-n --revcomp -g "ACTTATCAGCCAACCTGTTA;anywhere" -a GGGGG -o r1_v2.fastq.gz *_R1_*.fastq.gz
cd $start
done
start="$(pwd)/preprocessing/reseq"
dirs="l1_input l2_input l3_input l1_blood l2_blood l3_blood l1_brain l2_brain l3_brain"
## Re-run only the two samples that gave me trouble:
dirs="l1_blood l1_brain"
input_file="r1_v2.fastq.gz"
output_name=${input_file%.fastq.gz}
cd ${start}
for dir in $dirs; do
echo $dir
cd "${start}/${dir}/concat"
cyoa --method tpp --species sagalactiae_cjb111 --input ${input_file} \
--gff_type CDS --gff_tag locus_tag --output ${output_name}
cd ${start}
done
So, this finished Friday evening and I started getting TA counts very similar (identical?) to Luke’s. However, I keep looking at the reads and am confused. Here are some comparisons using the l1_input:
My trimming/read consolidation results in a broadly similar number of reads (15.7M vs. 20.1M, so I dropped ~22% of the reads Luke’s trim kept). I read the beginnings of these populations side-by-side. My own dataset makes sense to me: every read starts with the insertion TA; of them, 22% map to pKrmit, 50% map to CJB111, and many (all?) of the rest map to some weirdo expression vector; the ones which do map to CJB111 account for 17,497 different TAs. In the case of the newly trimmed data, not every read starts with TA (maybe ~25%), but somehow this slightly larger dataset manages to hit 3x more TAs in the genome?
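The what-fraction-starts-with-TA question is easy to check directly. A hedged sketch (toy in-memory FASTQ here; for the real thing, pass a handle from e.g. gzip.open("r1_v2.fastq.gz")):

```python
import gzip, io

def ta_fraction(handle):
    """Fraction of FASTQ records whose sequence begins with 'TA'."""
    total = ta = 0
    while True:
        header = handle.readline()
        if not header:
            break
        seq = handle.readline().strip()
        handle.readline()  # '+' separator
        handle.readline()  # quality string
        total += 1
        ta += seq.startswith(b"TA")
    return ta / total if total else 0.0

# Two toy reads, one starting with TA, one not:
toy = b"@r1\nTACCGT\n+\nIIIIII\n@r2\nGGTACC\n+\nIIIIII\n"
with gzip.open(io.BytesIO(gzip.compress(toy)), "rb") as handle:
    print(ta_fraction(handle))  # 0.5
```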
As a table of the input libraries: l1_input, l2_input, l3_input
|Start reads r1|trimmed+consolidated|trimmedv2 |consolidated_mapped|trimmedv2_mapped|consolidated_TAs|trimmedv2_TAs|
|--------------|--------------------|----------|-------------------|----------------|----------------|-------------|
|20,115,227 |15,699,404 |20,105,787|6,901,757 |10,097,937 |17,497 |51,787 |
|18,512,684 |14,371,440 |18,503,591|9,498,527 |12,179,959 |16,908 |50,533 |
|7,997,303 |7,887,059 |7,994,922 |4,981,438 |5,028,583 |14,372 |46,424 |
So, let’s start with the pathologically weird case of the first of the three. We get pretty similar read sets at trimming/consolidation. I scanned the first 100 or so reads of the two sets and could eventually line them up; they are mostly the same. I see some reads which are explicitly missing from my set, but mostly mine are either reverse-complemented and/or a little more trimmed. That fits the numbers: my trimmed dataset is ~1.3% smaller. This trend holds when looking at the sam mappings; I mapped 47,145 fewer reads (0.9% fewer).
Let’s take a moment and see if they are in fact similar mappings.
Here are the first few sam outputs from my bt2 mapping against CJB111:
VH00601:2:AACHNC5M5:1:1102:17872:7342 0 CP063198.1 1 2 3M11I108M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT CCCCCC;-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCC AS:i:-53 XS:i:-59 XN:i:0 XM:i:3 XO:i:1 XG:i:11 NM:i:14 MD:Z:0C0C0A108 YT:Z:UU
VH00601:2:AACHNC5M5:1:1102:38057:15218 0 CP063198.1 1 0 3M11I108M * 0 0 TGTCTAACCGACCACCTGAAATAGTACCTGCATAAGCAATCACAATTTCCCTAGTATCCGCATCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCATGCATAACATTTGAAGTATA C-CCCCC-;CC;CC;C;CCCC-C;-CCCCCCCC;----CC;C--C--C--CC--C-CC;C---;-C;-----C-;---C-CCC;-CCCCC;;---CC;-CC-C;-C;--CC-C-;C--CC;- AS:i:-72 XN:i:0 XM:i:10 XO:i:1 XG:i:11 NM:i:21 MD:Z:0C0C0A29T5A10G0T39T6C12T0 YT:Z:UU
VH00601:2:AACHNC5M5:1:1102:50952:37954 0 CP063198.1 1 2 3M11I108M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT ;CCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCC;CCCCC AS:i:-52 XS:i:-58 XN:i:0 XM:i:3 XO:i:1 XG:i:11 NM:i:14 MD:Z:0C0C0A108 YT:Z:UU
VH00601:2:AACHNC5M5:1:1102:61688:40320 0 CP063198.1 1 2 3M11I108M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC AS:i:-53 XS:i:-59 XN:i:0 XM:i:3 XO:i:1 XG:i:11 NM:i:14 MD:Z:0C0C0A108 YT:Z:UU
VH00601:2:AACHNC5M5:1:1102:49607:55200 0 CP063198.1 1 2 3M11I108M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCC AS:i:-53 XS:i:-59 XN:i:0 XM:i:3 XO:i:1 XG:i:11 NM:i:14 MD:Z:0C0C0A108 YT:Z:UU
VH00601:2:AACHNC5M5:1:1102:38454:56127 0 CP063198.1 1 2 3M11I108M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCCC AS:i:-53 XS:i:-59 XN:i:0 XM:i:3 XO:i:1 XG:i:11 NM:i:14 MD:Z:0C0C0A108 YT:Z:UU
VH00601:2:AACHNC5M5:1:1103:72576:35569 0 CP063198.1 1 2 3M11I108M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT CCCCCCCCCCCCCCCCCCCCCC;CCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCC-CCCCC-CCC;CCCCC;;CCCCCCCCCCCCC AS:i:-53 XS:i:-59 XN:i:0 XM:i:3 XO:i:1 XG:i:11 NM:i:14 MD:Z:0C0C0A108 YT:Z:UU
VH00601:2:AACHNC5M5:1:1104:58943:53174 0 CP063198.1 1 2 3M11I108M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT CCCCCC-CCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCC;CCCCCCC;-CCCCCCCCCC-CC;CC AS:i:-53 XS:i:-59 XN:i:0 XM:i:3 XO:i:1 XG:i:11 NM:i:14 MD:Z:0C0C0A108 YT:Z:UU
VH00601:2:AACHNC5M5:1:1201:73693:20670 0 CP063198.1 1 2 3M11I108M * 0 0 AGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCCCTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTAGCATAGCTTGCATACCATTTGTAGTATT --CCCCC-;-;-C;---;CC--C--CCC-CCCCC--CCCCCCCCCCC;C-;C--C-CC--;CCC-CCCC;CCCCCC;-CCCC;CCCC;C-C-CC--C;--C--CCC;;;CCCC;C-CC;;CC AS:i:-58 XS:i:-64 XN:i:0 XM:i:6 XO:i:1 XG:i:11 NM:i:17 MD:Z:0C0C0A35A44C20A6 YT:Z:UU
VH00601:2:AACHNC5M5:1:1201:7855:35114 0 CP063198.1 1 2 3M11I108M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT ;CCCCCC;CCCCCCCCCCCCCCCCCCCCCCC
vs. these from the new trimming
VH00601:2:AACHNC5M5:1:1101:14671:20083 0 CP063198.1 1 60 11S84M27S * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACACTCACCGCTCTTGTAGCTCGTGCCTC * NM:i:0 MD:Z:84 AS:i:84 XS:i:43
VH00601:2:AACHNC5M5:1:1102:17872:7342 0 CP063198.1 1 60 11S111M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT * NM:i:0 MD:Z:111 AS:i:111 XS:i:65
VH00601:2:AACHNC5M5:1:1102:38057:15218 0 CP063198.1 1 60 11S111M * 0 0 TGTCTAACCGACCACCTGAAATAGTACCTGCATAAGCAATCACAATTTCCCTAGTATCCGCATCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCATGCATAACATTTGAAGTATA * NM:i:7 MD:Z:32T5A10G0T39T6C12T0 AS:i:80 XS:i:34
VH00601:2:AACHNC5M5:1:1102:20693:20727 0 CP063198.1 1 60 11S79M32S * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCACTCACCGCTCTTGTAGGAGAATTCACTGTCTC * NM:i:0 MD:Z:79 AS:i:79 XS:i:43
VH00601:2:AACHNC5M5:1:1102:50952:37954 0 CP063198.1 1 60 11S111M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT * NM:i:0 MD:Z:111 AS:i:111 XS:i:65
VH00601:2:AACHNC5M5:1:1102:61688:40320 0 CP063198.1 1 60 11S111M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT * NM:i:0 MD:Z:111 AS:i:111 XS:i:65
VH00601:2:AACHNC5M5:1:1102:49607:55200 0 CP063198.1 1 60 11S111M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT * NM:i:0 MD:Z:111 AS:i:111 XS:i:65
VH00601:2:AACHNC5M5:1:1102:38454:56127 0 CP063198.1 1 60 11S111M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT * NM:i:0 MD:Z:111 AS:i:111 XS:i:65
VH00601:2:AACHNC5M5:1:1103:30028:16051 0 CP063198.1 1 60 11S111M * 0 0 ACTCTAACCGACCACCTGAAATAGTACCTGCATAAGCAATCAATAATTCACTAGAATCCGGTTCCGCTTTTGTCCTTTATTATACGAGCAACTACCATAGCTTGCATAACATTTGAAGAATT * NM:i:9 MD:Z:31C2T8T9A13T4T0G23C9T3 AS:i:67 XS:i:28
VH00601:2:AACHNC5M5:1:1103:72576:35569 0 CP063198.1 1 60 11S111M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT * NM:i:0 MD:Z:111 AS:i:111 XS:i:65
VH00601:2:AACHNC5M5:1:1104:41408:33562 0 CP063198.1 1 37 11S54M57S * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCACTCACCGCTCTTGTAGGAGGTTGCGCTGTCTCTTATACACATCTCCGAGCCCACGAG * NM:i:0 MD:Z:54 AS:i:54 XS:i:43
VH00601:2:AACHNC5M5:1:1104:36864:39544 0 CP063198.1 1 60 15S107M * 0 0 GCTATGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAG * NM:i:0 MD:Z:107 AS:i:107 XS:i:65
VH00601:2:AACHNC5M5:1:1104:58943:53174 0 CP063198.1 1 60 11S111M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT * NM:i:0 MD:Z:111 AS:i:111 XS:i:65
VH00601:2:AACHNC5M5:1:1201:73693:20670 0 CP063198.1 1 60 11S111M * 0 0 AGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCCCTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTAGCATAGCTTGCATACCATTTGTAGTATT * NM:i:3 MD:Z:38A44C20A6 AS:i:96 XS:i:49
VH00601:2:AACHNC5M5:1:1201:7855:35114 0 CP063198.1 1 60 11S111M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT * NM:i:0 MD:Z:111 AS:i:111 XS:i:65
VH00601:2:AACHNC5M5:1:1201:42639:46075 0 CP063198.1 1 60 11S111M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT * NM:i:0 MD:Z:111 AS:i:111 XS:i:65
VH00601:2:AACHNC5M5:1:1202:55193:12851 0 CP063198.1 1 60 11S111M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT * NM:i:0 MD:Z:111 AS:i:111 XS:i:65
VH00601:2:AACHNC5M5:1:1202:30899:18815 0 CP063198.1 1 60 11S100M11S * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCACTCACCGCTCT * NM:i:0 MD:Z:100 AS:i:100 XS:i:0
VH00601:2:AACHNC5M5:1:1202:35974:32975 0 CP063198.1 1 21 11S33M78S * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTCACCGCTCTTGTAGCCAGTACCGCTGTCTCTTATACACATCTCCGAGCCCACGAGACTACGCTGCATCTCGTATTCCG * NM:i:0 MD:Z:33 AS:i:33 XS:i:29 XA:Z:CP063198.1,-2005,78S44M,3;
VH00601:2:AACHNC5M5:1:1203:47051:10352 0 CP063198.1 1 60 11S111M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT * NM:i:0 MD:Z:111 AS:i:111 XS:i:65
VH00601:2:AACHNC5M5:1:1203:24764:49710 0 CP063198.1 1 60 11S111M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT * NM:i:0 MD:Z:111 AS:i:111 XS:i:65
VH00601:2:AACHNC5M5:1:1204:27794:36458 0 CP063198.1 1 60 11S77M34S * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGACTCACCGCTCTTGTAGCCGCATTCTCTGTCTCT * NM:i:0 MD:Z:77 AS:i:77 XS:i:43
VH00601:2:AACHNC5M5:1:1301:27282:2363 0 CP063198.1 1 60 11S111M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT * NM:i:0 MD:Z:111 AS:i:111 XS:i:65
VH00601:2:AACHNC5M5:1:1301:40026:44864 0 CP063198.1 1 60 11S77M34S * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGACACACCGCTCTTGTAGGGTCGTTTACTGTCTCT * NM:i:0 MD:Z:77 AS:i:77 XS:i:43
VH00601:2:AACHNC5M5:1:1301:39837:45242 0 CP063198.1 1 60 11S77M34S * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGACTCACCGCTCTTGTAGGGTCGTTTACTGTCTCT * NM:i:0 MD:Z:77 AS:i:77 XS:i:43
VH00601:2:AACHNC5M5:1:1301:55193:47874 0 CP063198.1 1 60 11S95M16S * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCACTCACCGCTCTTGTAG * NM:i:0 MD:Z:95 AS:i:95 XS:i:0
VH00601:2:AACHNC5M5:1:1302:70569:20576 0 CP063198.1 1 60 11S111M * 0 0 TGTCTAAACGACCACCTGAAATAGTACCTGCATAAGCAATCACTATTTCACTAGTATCCGGTTCAGCTTTTGTCCTTTTTTATTGGAGCAACTACCATAGCTTGCATACCATTTGAAGTATT * NM:i:0 MD:Z:111 AS:i:111 XS:i:65
OK, I immediately learned a lesson: don’t copy/paste the first few reads when you run samtools on a mapping against a circular genome that was sorted by position. Those first few reads all started somewhere just before 12 o’clock and are confusing. So I tried again and skipped down to position 616, where a bunch of identical reads mapped.
OK, one thing jumps right out: the alignments at position 616 start immediately after the TA in consolidated.fastq. Why? Let’s first double-check the genome in IGV at that region and make sure it makes sense. Actually, while we are at it, let’s just load these two alignments (l1_input using my consolidation trimming and the new trimming) side-by-side for this region. WTF, they are both quite similar and both quite explicitly start immediately after each TA!? Having written that, the newly trimmed data does have some reads which cross over the TA that mine do not; could that be what is causing TPP to report so many more TAs hit? That doesn’t make sense. Another mystery. Here is an image: the top is the alignment by TPP using the newly trimmed data, the bottom is the same but using my consolidated reads. The TAs are easy to see; they are the little white lines running down the screen. Note: when I looked up a couple of these read IDs, they START_WITH_THAT_TA. So what the heck is going on here? So weird!
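A toy model of the thing I am squinting at in IGV: enumerate the TA sites in a (made-up) genome, then ask which read 5’ mapping positions land exactly on a TA. Positions here are 0-based; IGV displays 1-based, and the genome and read starts are invented for illustration.

```python
def ta_sites(genome):
    """0-based positions of every 'TA' dinucleotide in the genome string."""
    return [i for i in range(len(genome) - 1) if genome[i:i + 2] == "TA"]

genome = "GGTACCGGTTAACG"
sites = ta_sites(genome)                      # [2, 9]
read_starts = [2, 5, 9]                       # hypothetical 5' mapping positions
on_ta = [pos for pos in read_starts if pos in set(sites)]
print(sites, on_ta)                           # [2, 9] [2, 9]
```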
images/igv_snapshot.png
OK, I am getting frustrated by not understanding what is going on. I am going to step away, because I think it doesn’t really matter from the perspective of interpreting the results: even with these weirdly higher numbers of TAs getting hit, we are still pretty drastically below the minimum for statistical significance.
I think I may still use this to play with the downstream transit tools and improve how I handle the data, in the hope that a deeper sequencing set comes in for this data, because it is exceedingly interesting.
cd ${start}
for dir in $dirs; do
echo $dir
cd "${start}/${dir}/concat"
cyoa --method bt2 --species krmit --input ${input_file}
cyoa --method bt2 --species sagalactiae_cjb111 --input ${input_file} \
--gff_type CDS --gff_tag locus_tag --stranded no
cd ${start}
done
At this point I have the wig files from TPP along with the various alignments from bowtie2, so let us start poking at them. Note the grep -v in the creation of the following bash variables; that is because my preprocessing script by default creates every file with the same name (‘concat.wig’ in this case). I am going to change this as soon as I finish here, but in the meantime I symbolically linked each file to its parent directory’s name: l1_input.wig, etc…
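That symlinking step, sketched in python (the directory layout is assumed from the find commands, and link_wigs is my own throwaway name, not part of any tool):

```python
import os, tempfile

def link_wigs(base, samples, wig_name="concat.wig"):
    """Symlink each sample's identically-named concat.wig to <sample>.wig."""
    made = []
    for sample in samples:
        sample_dir = os.path.join(base, sample, "concat")
        src = os.path.join(sample_dir, wig_name)
        dst = os.path.join(sample_dir, sample + ".wig")
        if os.path.exists(src) and not os.path.exists(dst):
            os.symlink(src, dst)
            made.append(dst)
    return made

# Tiny demonstration against a throwaway tree:
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "l1_input", "concat"))
    open(os.path.join(base, "l1_input", "concat", "concat.wig"), "w").close()
    print(link_wigs(base, ["l1_input"]))
```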
cd $start
control_files=$(find . -name '*.wig' | grep input | grep -v concat.wig | sed 's/\.\///g')
echo $control_files
blood_files=$(find . -name '*.wig' | grep blood | grep -v concat.wig | sed 's/\.\///g')
echo $blood_files
brain_files=$(find . -name '*.wig' | grep brain | grep -v concat.wig | sed 's/\.\///g')
echo $brain_files
For some crazy reason, the python I start in elpy sometimes starts with my working directory as cwd and sometimes doesn’t. So I added an explicit chdir(). I suspect that python, like R, doesn’t like it when the sshfs filesystem gets disconnected. However, the R session I am using for this is still in the correct directory; so if this hypothesis is correct, then python must be more aggressive about doing fstat()? Whatever, I just did a manual chdir and it seems fine.
import os
##os.chdir("/home/trey/scratch/tnseq/sagalacticae_2022")
import importlib ## Allows me to reload pytransit as I play with/edit it via importlib.reload()
import io ## Used for logging not to STDOUT/STDERR
import logging
import contextlib ## Used to redirect STDOUT/STDERR
import numpy
import scipy
import pandas
import matplotlib.pyplot as plt
import pytransit
import pytransit.analysis as methods
import pytransit.norm_tools as norm_tools
import pytransit.tnseq_tools as tnseq_tools
import pytransit.transit_tools as transit_tools
## import pytransit.transit_gui as transit_gui
import pytransit.plot_tools as plotter
import pytransit.analysis
import pytransit.export
import pytransit.convert
methods = pytransit.analysis.methods
export_methods = pytransit.export.methods
convert_methods = pytransit.convert.methods
all_methods = {}
all_methods.update(methods)
input_wig = ['preprocessing/reseq/l3_input/concat/outputs/61transit_sagalactiae_cjb111/l3_input.wig',
             'preprocessing/reseq/l1_input/concat/outputs/61transit_sagalactiae_cjb111/l1_input.wig',
             'preprocessing/reseq/l2_input/concat/outputs/61transit_sagalactiae_cjb111/l2_input.wig',]
blood_wig = ['preprocessing/reseq/l3_blood/concat/outputs/61transit_sagalactiae_cjb111/l3_blood.wig',
             'preprocessing/reseq/l1_blood/concat/outputs/61transit_sagalactiae_cjb111/l1_blood.wig',
             'preprocessing/reseq/l2_blood/concat/outputs/61transit_sagalactiae_cjb111/l2_blood.wig',]
brain_wig = ['preprocessing/reseq/l3_brain/concat/outputs/61transit_sagalactiae_cjb111/l3_brain.wig',
             'preprocessing/reseq/l1_brain/concat/outputs/61transit_sagalactiae_cjb111/l1_brain.wig',
             'preprocessing/reseq/l2_brain/concat/outputs/61transit_sagalactiae_cjb111/l2_brain.wig',]
(input_data, input_positions) = tnseq_tools.get_data(input_wig)
(blood_data, blood_positions) = tnseq_tools.get_data(blood_wig)
(brain_data, brain_positions) = tnseq_tools.get_data(brain_wig)
## Hmm it looks like the documentation is a little out of date...
## The following does not work, but is trivially run using normalize_data()
## input_norm = norm.aBGC_norm(input_data)
As I understand it, transit parses the input gff file and creates a ‘prot_table’ file. I assumed it did this while TPP was running, but I think that is incorrect.
gff_file = 'reference/sagalactiae_cjb111.gff'
## Make sure I typed that correctly (I must have, emacs did it for me, but I am learning python here)
os.stat(gff_file)
## Note, I am using my forked copy of transit in which I have made some small changes to improve my QoL.
## os.stat_result(st_mode=33188, st_ino=2041, st_dev=90, st_nlink=1, st_uid=10186, st_gid=18062, st_size=996278, st_atime=1678375207, st_mtime=1678375207, st_ctime=1678375207)
gff_annot = tnseq_tools.get_pos_hash_gff(gff_file, feature_type = 'CDS',
                                         feature_tag = 'ID')
## Out of 4350 lines, 2013 were of type CDS and have the ID tag.
chosen_norm = 'quantile'
input_norm = norm_tools.normalize_data(input_data, method = chosen_norm)
blood_norm = norm_tools.normalize_data(blood_data, method = chosen_norm)
brain_norm = norm_tools.normalize_data(brain_data, method = chosen_norm)
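As a reminder of what the ‘quantile’ method does conceptually, here is my own toy numpy sketch (not TRANSIT’s implementation, and it ignores ties): every dataset is forced to share the same sorted-value distribution, namely the average of the per-dataset sorted values.

```python
import numpy

def quantile_normalize(data):
    """Force every row (dataset) to share the same sorted-value distribution.
    Naive rank-based version; ties are broken arbitrarily."""
    data = numpy.asarray(data, dtype=float)
    ranks = data.argsort(axis=1).argsort(axis=1)          # per-row ranks
    mean_sorted = numpy.sort(data, axis=1).mean(axis=0)   # averaged distribution
    return mean_sorted[ranks]

counts = numpy.array([[5.0, 2.0, 3.0],
                      [4.0, 1.0, 6.0]])
print(quantile_normalize(counts))
# [[5.5 1.5 3.5]
#  [3.5 1.5 5.5]]
```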
So, it turns out I make too many assumptions about device interaction in R/python. I assumed in the following block that it would return a plot object I can play with. Fix that now.
I am not sure if plt.figure() is helpful when making a markdown report, but when running this interactively it is nice to make sure that I get back a fresh plot device for the next plot invocation.
## Plot input 1 vs. blood 1
l1_b1 = plotter.plot_scatter(input_data, blood_data)
plt.figure()
## Compare input 1 vs input 2
input_r1r2 = plotter.plot_scatter(input_data, input_data, control_rep = 0, experimental_rep = 1)
plt.figure()
## input 1 vs input 3
input_r1r3 = plotter.plot_scatter(input_data, input_data, control_rep = 0, experimental_rep = 2)
plt.figure()
## input 2 vs input 3
input_r2r3 = plotter.plot_scatter(input_data, input_data, control_rep = 1, experimental_rep = 2)
plt.figure()
## blood 1 vs brain 1
l1_blbr = plotter.plot_scatter(blood_data, brain_data, control_rep = 0, experimental_rep = 0)
plt.figure()
My favorite method is the Gumbel estimator.
annotations = 'reference/only_cds.gff'
## I should add my feature_tag and feature_type args to the call chain for this.
def gumbel_sample(input_set, annotations, output_file = 'gumbel.tsv'):
    gumbel_out = open(output_file, mode = "w")
    gumbel = all_methods['gumbel']
    gumbel_setup = gumbel.method(input_set, annotations, gumbel_out)
    logging.basicConfig(filename = 'error.log', level = logging.DEBUG)
    log = io.StringIO()
    print('Starting the gumbel gene essentiality estimator and logging to "error.log"')
    with contextlib.redirect_stdout(log), contextlib.redirect_stderr(log):
        gumbel_run = gumbel_setup.Run()
    logging.info(log.getvalue())
    print('Compressing output tsv file.')
    xz_string = f'xz -9e -f {output_file}'
    stream = os.popen(xz_string)
input_gumbel = gumbel_sample(input_wig, annotations, output_file = 'input_gumbel.tsv')
## Starting the gumbel gene essentiality estimator and logging to "error.log"
## Compressing output tsv file.
blood_gumbel = gumbel_sample(blood_wig, annotations, output_file = 'blood_gumbel.tsv')
## Starting the gumbel gene essentiality estimator and logging to "error.log"
## Compressing output tsv file.
brain_gumbel = gumbel_sample(brain_wig, annotations, output_file = 'brain_gumbel.tsv')
## Starting the gumbel gene essentiality estimator and logging to "error.log"
## Compressing output tsv file.
## I should add my feature_tag and feature_type args to the call chain for this.
def resampling_sample(control_set, input_set, annotations, output_file = 'resample.tsv'):
    resampling_out = open(output_file, mode = "w")
    resampling = all_methods['resampling']
    resampling_setup = resampling.method(control_set, input_set, annotations, resampling_out)
    logging.basicConfig(filename = 'error.log', level = logging.DEBUG)
    log = io.StringIO()
    print('Starting the resampling estimator and logging to "error.log"')
    with contextlib.redirect_stdout(log), contextlib.redirect_stderr(log):
        resampling_run = resampling_setup.Run()
    logging.info(log.getvalue())
    print('Compressing output tsv file.')
    xz_string = f'xz -9e -f {output_file}'
    stream = os.popen(xz_string)
blood_resampling = resampling_sample(input_wig, blood_wig, annotations, output_file = 'blood_vs_input_resampling.tsv')
## [resampling] site_restricted=False
## Starting the resampling estimator and logging to "error.log"
## Compressing output tsv file.
brain_resampling = resampling_sample(input_wig, brain_wig, annotations, output_file = 'brain_vs_input_resampling.tsv')
## [resampling] site_restricted=False
## Starting the resampling estimator and logging to "error.log"
## Compressing output tsv file.
This method looks at gaps between TAs as an estimator, neat!
def griffin_sample(input_set, annotations, output_file = 'griffin.tsv'):
    griffin_out = open(output_file, mode = "w")
    griffin = all_methods['griffin']
    griffin_setup = griffin.method(input_set, annotations, griffin_out)
    logging.basicConfig(filename = 'error.log', level = logging.DEBUG)
    log = io.StringIO()
    print('Starting the griffin gene essentiality estimator and logging to "error.log"')
    with contextlib.redirect_stdout(log), contextlib.redirect_stderr(log):
        griffin_run = griffin_setup.Run()
    logging.info(log.getvalue())
    print('Compressing output tsv file.')
    xz_string = f'xz -9e -f {output_file}'
    stream = os.popen(xz_string)
input_griffin = griffin_sample(input_wig, annotations, output_file = 'input_griffin.tsv')
## Starting the griffin gene essentiality estimator and logging to "error.log"
## Compressing output tsv file.
blood_griffin = griffin_sample(blood_wig, annotations, output_file = 'blood_griffin.tsv')
## Starting the griffin gene essentiality estimator and logging to "error.log"
## Compressing output tsv file.
brain_griffin = griffin_sample(brain_wig, annotations, output_file = 'brain_griffin.tsv')
## Starting the griffin gene essentiality estimator and logging to "error.log"
## Compressing output tsv file.
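The gap idea above, in miniature: given per-TA-site read counts for one gene, the statistic of interest is the longest run of consecutive empty sites, since a long gap in an otherwise saturated library is evidence of essentiality. A toy sketch with made-up counts, not transit's actual statistic:

```python
def longest_empty_run(site_counts):
    """Length of the longest run of consecutive zero-count TA sites."""
    longest = current = 0
    for count in site_counts:
        current = current + 1 if count == 0 else 0
        longest = max(longest, current)
    return longest

print(longest_empty_run([12, 0, 0, 0, 7, 0, 3]))  # 3
```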
This method compares the data to an expected binomial distribution. This is my interpretation after glancing at the source code for 10 seconds, so who knows how correct I am.
def binomial_sample(input_set, annotations, output_file = 'binomial.tsv'):
    binomial_out = open(output_file, mode = "w")
    binomial = all_methods['binomial']
    binomial_setup = binomial.method(input_set, annotations, binomial_out)
    logging.basicConfig(filename = 'error.log', level = logging.DEBUG)
    log = io.StringIO()
    print('Starting the binomial gene essentiality estimator and logging to "error.log"')
    with contextlib.redirect_stdout(log), contextlib.redirect_stderr(log):
        binomial_run = binomial_setup.Run()
    logging.info(log.getvalue())
    print('Compressing output tsv file.')
    xz_string = f'xz -9e -f {output_file}'
    stream = os.popen(xz_string)
input_binomial = binomial_sample(input_wig, annotations, output_file = 'input_binomial.tsv')
## Starting the binomial gene essentiality estimator and logging to "error.log"
## Compressing output tsv file.
blood_binomial = binomial_sample(blood_wig, annotations, output_file = 'blood_binomial.tsv')
## Starting the binomial gene essentiality estimator and logging to "error.log"
## Compressing output tsv file.
brain_binomial = binomial_sample(brain_wig, annotations, output_file = 'brain_binomial.tsv')
## Starting the binomial gene essentiality estimator and logging to "error.log"
## Compressing output tsv file.
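My equally quick reading of the binomial idea, as a stdlib-only sketch with hypothetical numbers (this is not transit's actual model): if a fraction p of TA sites genome-wide carry an insertion, then the hit count for a gene with n sites should look binomial, and a tiny lower-tail probability flags a candidate essential.

```python
from math import comb

def binom_tail_le(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# A gene with 30 TA sites and zero insertions, when ~40% of sites are hit
# genome-wide, is very unlikely to be empty by chance alone:
print(binom_tail_le(0, 30, 0.4))  # ~2.2e-07
```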
## I should add my feature_tag and feature_type args to the call chain for this.
def rankproduct_sample(control_set, input_set, annotations, output_file = 'rankproduct.tsv'):
    rankproduct_out = open(output_file, mode = "w")
    rankproduct = all_methods['rankproduct']
    rankproduct_setup = rankproduct.method(control_set, input_set, annotations, rankproduct_out)
    logging.basicConfig(filename = 'error.log', level = logging.DEBUG)
    log = io.StringIO()
    print('Starting the rankproduct estimator and logging to "error.log"')
    with contextlib.redirect_stdout(log), contextlib.redirect_stderr(log):
        rankproduct_run = rankproduct_setup.Run()
    logging.info(log.getvalue())
    print('Compressing output tsv file.')
    xz_string = f'xz -9e -f {output_file}'
    stream = os.popen(xz_string)
blood_rankproduct = rankproduct_sample(input_wig, blood_wig, annotations, output_file = 'blood_vs_input_rankproduct.tsv')
## Starting the rankproduct estimator and logging to "error.log"
## Compressing output tsv file.
brain_rankproduct = rankproduct_sample(input_wig, brain_wig, annotations, output_file = 'brain_vs_input_rankproduct.tsv')
## Starting the rankproduct estimator and logging to "error.log"
## Compressing output tsv file.
cjb_microbes <- load_microbesonline_annotations(species = "CJB111")
## Found 1 entry.
## Streptococcus agalactiae CJB111Firmicutesyes2007-05-08noNANA2208342617
## The species being downloaded is: Streptococcus agalactiae CJB111
## Downloading: http://www.microbesonline.org/cgi-bin/genomeInfo.cgi?tId=342617;export=tab
cjb_gff <- load_gff_annotations("~/scratch/libraries/genome/sagalactiae_cjb111.gff", type = "CDS")
## Trying attempt: rtracklayer::import.gff3(gff, sequenceRegionsAsSeqinfo = TRUE)
## Had a successful gff import with rtracklayer::import.gff3(gff, sequenceRegionsAsSeqinfo = TRUE)
## Returning a df with 42 columns and 2013 rows.
cjb_microbes <- as.data.frame(cjb_microbes)
rownames(cjb_gff) <- make.names(cjb_gff[["locus_tag"]], unique = TRUE)
## I am going to only pay attention to the first annotation for each locus tag from microbesonline.
cjb_microbes[["sysName"]] <- make.names(cjb_microbes[["sysName"]], unique = TRUE)
## I should ask Lindsey if they would like me to merge these
cjb_annot <- cjb_gff
## cjb_annot <- merge(cjb_gff, cjb_microbes, by.x = "old_locus_tag", by.y = "sysName")
## rownames(cjb_annot) <- make.names(cjb_annot[["locus_tag"]], unique = TRUE)
tpp_expt <- create_expt(metadata = "sample_sheets/reseq_samples.xlsx",
                        gene_info = cjb_annot, file_column = "tpphtseqcounts")
## Reading the sample metadata.
## The sample definitions comprises: 9 rows(samples) and 30 columns(metadata fields).
## Matched 2009 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## Saving the expressionset to 'expt.rda'.
## The final expressionset has 2009 features and 9 samples.
bt2_expt <- create_expt(metadata = "sample_sheets/reseq_samples.xlsx",
                        gene_info = cjb_annot, file_column = "bowtie2htseqcounts")
## Reading the sample metadata.
## The sample definitions comprises: 9 rows(samples) and 30 columns(metadata fields).
## Matched 2009 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## Saving the expressionset to 'expt.rda'.
## The final expressionset has 2009 features and 9 samples.
tpp_written <- write_expt(tpp_expt, batch = "raw",
                          excel = glue::glue("excel/{rundate}-tpp_counts-v{ver}.xlsx"))
## Writing the first sheet, containing a legend and some summary data.
## `geom_smooth()` using formula = 'y ~ x'
## Loading required package: Matrix
##
##
## Attaching package: 'Matrix'
##
##
## The following object is masked from 'package:S4Vectors':
##
## expand
##
##
##
## Total:14 s
##
## `geom_smooth()` using formula = 'y ~ x'
##
## Total:11 s
bt2_written <- write_expt(bt2_expt, batch = "raw",
                          excel = glue::glue("excel/{rundate}-bt2_counts-v{ver}.xlsx"))
## Writing the first sheet, containing a legend and some summary data.
## `geom_smooth()` using formula = 'y ~ x'
## Total:12 s
## `geom_smooth()` using formula = 'y ~ x'
## Total:13 s
I took a minute to look at the xlsx files produced above and compare the various plots from the bowtie2 and TPP mappings. They are exceedingly similar (almost identical), which is good.
"legend_plot"]]
tpp_written[["raw_libsize"]]
tpp_written[["raw_density"]]
tpp_written[[## awesome
"norm_disheat"]]
tpp_written[["norm_corheat"]] tpp_written[[
Interestingly, if I set model_batch to TRUE, limma’s residual-based removeBatchEffect() throws an error. My assumption is that this is because of the extraordinary difference in gene coverage between the input and the experimental samples?
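Whatever the cause here, the "computationally singular" error itself is easy to reproduce: whenever one model column is a linear combination of the others (e.g. a batch indicator that exactly duplicates a condition indicator), t(X) %*% X loses rank and cannot be inverted. A toy numpy demonstration with made-up labels:

```python
import numpy

# Columns: intercept, condition-B indicator, batch-y indicator.  Batch y
# contains exactly the condition-B samples, so columns 2 and 3 are identical.
X = numpy.array([[1, 0, 0],
                 [1, 0, 0],
                 [1, 1, 1],
                 [1, 1, 1]], dtype=float)
xtx = X.T @ X
print(numpy.linalg.matrix_rank(xtx))  # 2, not 3: solve() must fail
```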
keepers <- list(
    "blood_input" = c("blood", "input"),
    "brain_input" = c("brain", "input"),
    "brain_blood" = c("brain", "blood"))
tpp_pairwise <- all_pairwise(tpp_expt, model_batch = TRUE, filter = "simple")
## This DE analysis will perform all pairwise comparisons among:
##
## input blood brain
## 3 3 3
## This analysis will include a batch factor in the model comprised of:
##
## b1 b2 b3
## 3 3 3
## This will pre-filter the input data using normalize_expt's: simple argument.
## Using limma's removeBatchEffect to visualize with(out) batch inclusion.
## Error in solve.default(t(mod) %*% mod) :
## system is computationally singular: reciprocal condition number = 9.25186e-18
## Error in solve.default(t(mod) %*% mod) :
## system is computationally singular: reciprocal condition number = 9.25186e-18
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
bt2_pairwise <- all_pairwise(bt2_expt, model_batch = TRUE, filter = "simple")
## This DE analysis will perform all pairwise comparisons among:
##
## input blood brain
## 3 3 3
## This analysis will include a batch factor in the model comprised of:
##
## b1 b2 b3
## 3 3 3
## This will pre-filter the input data using normalize_expt's: simple argument.
## Using limma's removeBatchEffect to visualize with(out) batch inclusion.
## Error in solve.default(t(mod) %*% mod) :
## system is computationally singular: reciprocal condition number = 9.25186e-18
## Error in solve.default(t(mod) %*% mod) :
## system is computationally singular: reciprocal condition number = 9.25186e-18
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
tpp_table <- combine_de_tables(
  tpp_pairwise, keepers = keepers,
  excel = glue::glue("excel/tpp_de_tables-v{ver}.xlsx"))
## Deleting the file excel/tpp_de_tables-v202303.xlsx before writing the tables.
## Getting the genes >= 1.5 stdevs away from the mean of all.
## Getting the genes >= 1.5 stdevs away from the mean of all.
## Getting the genes >= 1.5 stdevs away from the mean of all.
## Getting the genes >= 1.5 stdevs away from the mean of all.
## Getting the genes >= 1.5 stdevs away from the mean of all.
## Getting the genes >= 1.5 stdevs away from the mean of all.
## Getting the genes >= 1.5 stdevs away from the mean of all.
## Getting the genes >= 1.5 stdevs away from the mean of all.
## Getting the genes >= 1.5 stdevs away from the mean of all.
bt2_table <- combine_de_tables(
  bt2_pairwise, keepers = keepers,
  excel = glue::glue("excel/bt2_de_tables-v{ver}.xlsx"))
## Deleting the file excel/bt2_de_tables-v202303.xlsx before writing the tables.
## Getting the genes >= 1.5 stdevs away from the mean of all.
## Getting the genes >= 1.5 stdevs away from the mean of all.
## Getting the genes >= 1.5 stdevs away from the mean of all.
## Getting the genes >= 1.5 stdevs away from the mean of all.
## Getting the genes >= 1.5 stdevs away from the mean of all.
## Getting the genes >= 1.5 stdevs away from the mean of all.
## Getting the genes >= 1.5 stdevs away from the mean of all.
## Getting the genes >= 1.5 stdevs away from the mean of all.
## Getting the genes >= 1.5 stdevs away from the mean of all.
tpp_table[["plots"]][["blood_input"]][["deseq_ma_plots"]]
tpp_table[["plots"]][["brain_input"]][["deseq_ma_plots"]]
tpp_table[["plots"]][["brain_blood"]][["deseq_ma_plots"]]
bt2_table[["plots"]][["blood_input"]][["deseq_ma_plots"]]
bt2_table[["plots"]][["brain_input"]][["deseq_ma_plots"]]
bt2_table[["plots"]][["brain_blood"]][["deseq_ma_plots"]]
pander::pander(sessionInfo())
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C
attached base packages: splines, stats4, stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: ruv(v.0.9.7.1), lme4(v.1.1-31), Matrix(v.1.5-3), BiocParallel(v.1.32.5), variancePartition(v.1.28.4), hpgltools(v.1.0), testthat(v.3.1.6), SummarizedExperiment(v.1.28.0), GenomicRanges(v.1.50.2), GenomeInfoDb(v.1.34.9), IRanges(v.2.32.0), S4Vectors(v.0.36.1), MatrixGenerics(v.1.10.0), matrixStats(v.0.63.0), Biobase(v.2.58.0) and BiocGenerics(v.0.44.0)
loaded via a namespace (and not attached): rappdirs(v.0.3.3), rtracklayer(v.1.58.0), R.methodsS3(v.1.8.2), tidyr(v.1.3.0), ggplot2(v.3.4.1), clusterGeneration(v.1.3.7), bit64(v.4.0.5), knitr(v.1.42), R.utils(v.2.12.2), DelayedArray(v.0.24.0), data.table(v.1.14.6), KEGGREST(v.1.38.0), RCurl(v.1.98-1.10), doParallel(v.1.0.17), generics(v.0.1.3), preprocessCore(v.1.60.2), GenomicFeatures(v.1.50.4), callr(v.3.7.3), RhpcBLASctl(v.0.23-42), cowplot(v.1.1.1), usethis(v.2.1.6), RSQLite(v.2.2.20), shadowtext(v.0.1.2), tzdb(v.0.3.0), bit(v.4.0.5), enrichplot(v.1.18.3), xml2(v.1.3.3), httpuv(v.1.6.8), assertthat(v.0.2.1), viridis(v.0.6.2), xfun(v.0.37), hms(v.1.1.2), jquerylib(v.0.1.4), IHW(v.1.26.0), evaluate(v.0.20), promises(v.1.2.0.1), DEoptimR(v.1.0-11), fansi(v.1.0.4), restfulr(v.0.0.15), progress(v.1.2.2), caTools(v.1.18.2), dbplyr(v.2.3.0), geneplotter(v.1.76.0), igraph(v.1.4.0), DBI(v.1.1.3), htmlwidgets(v.1.6.1), purrr(v.1.0.1), ellipsis(v.0.3.2), selectr(v.0.4-2), dplyr(v.1.1.0), backports(v.1.4.1), annotate(v.1.76.0), aod(v.1.3.2), biomaRt(v.2.54.0), vctrs(v.0.5.2), remotes(v.2.4.2), here(v.1.0.1), cachem(v.1.0.6), withr(v.2.5.0), ggforce(v.0.4.1), HDO.db(v.0.99.1), robustbase(v.0.95-0), vroom(v.1.6.1), GenomicAlignments(v.1.34.0), treeio(v.1.22.0), fdrtool(v.1.2.17), prettyunits(v.1.1.1), DOSE(v.3.24.2), ape(v.5.6-2), lazyeval(v.0.2.2), crayon(v.1.5.2), genefilter(v.1.80.3), slam(v.0.1-50), labeling(v.0.4.2), edgeR(v.3.40.2), pkgconfig(v.2.0.3), tweenr(v.2.0.2), nlme(v.3.1-162), pkgload(v.1.3.2), devtools(v.2.4.5), rlang(v.1.0.6), lifecycle(v.1.0.3), miniUI(v.0.1.1.1), downloader(v.0.4), filelock(v.1.0.2), BiocFileCache(v.2.6.0), directlabels(v.2021.1.13), rprojroot(v.2.0.3), polyclip(v.1.10-4), graph(v.1.76.0), aplot(v.0.1.9), lpsymphony(v.1.26.3), boot(v.1.3-28.1), processx(v.3.8.0), png(v.0.1-8), viridisLite(v.0.4.1), rjson(v.0.2.21), bitops(v.1.0-7), R.oo(v.1.25.0), gson(v.0.0.9), KernSmooth(v.2.23-20), pander(v.0.6.5), Biostrings(v.2.66.0), blob(v.1.2.3), 
stringr(v.1.5.0), qvalue(v.2.30.0), readr(v.2.1.4), remaCor(v.0.0.11), gridGraphics(v.0.5-1), scales(v.1.2.1), memoise(v.2.0.1), GSEABase(v.1.60.0), magrittr(v.2.0.3), plyr(v.1.8.8), gplots(v.3.1.3), zlibbioc(v.1.44.0), compiler(v.4.2.0), scatterpie(v.0.1.8), BiocIO(v.1.8.0), RColorBrewer(v.1.1-3), DESeq2(v.1.38.3), Rsamtools(v.2.14.0), cli(v.3.6.0), XVector(v.0.38.0), urlchecker(v.1.0.1), patchwork(v.1.1.2), ps(v.1.7.2), MASS(v.7.3-58.2), mgcv(v.1.8-41), tidyselect(v.1.2.0), stringi(v.1.7.12), highr(v.0.10), yaml(v.2.3.7), GOSemSim(v.2.24.0), locfit(v.1.5-9.7), ggrepel(v.0.9.3), grid(v.4.2.0), sass(v.0.4.5), fastmatch(v.1.1-3), tools(v.4.2.0), parallel(v.4.2.0), rstudioapi(v.0.14), foreach(v.1.5.2), gridExtra(v.2.3), Rtsne(v.0.16), farver(v.2.1.1), ggraph(v.2.1.0), digest(v.0.6.31), shiny(v.1.7.4), quadprog(v.1.5-8), Rcpp(v.1.0.10), broom(v.1.0.3), later(v.1.3.0), httr(v.1.4.4), AnnotationDbi(v.1.60.0), Rdpack(v.2.4), colorspace(v.2.1-0), rvest(v.1.0.3), brio(v.1.1.3), XML(v.3.99-0.13), fs(v.1.6.1), reticulate(v.1.28), RBGL(v.1.74.0), yulab.utils(v.0.0.6), PROPER(v.1.30.0), tidytree(v.0.4.2), graphlayouts(v.0.8.4), ggplotify(v.0.1.0), plotly(v.4.10.1), sessioninfo(v.1.2.2), xtable(v.1.8-4), jsonlite(v.1.8.4), nloptr(v.2.0.3), ggtree(v.3.6.2), tidygraph(v.1.2.3), corpcor(v.1.6.10), ggfun(v.0.0.9), Vennerable(v.3.1.0.9000), R6(v.2.5.1), RUnit(v.0.4.32), profvis(v.0.3.7), pillar(v.1.8.1), htmltools(v.0.5.4), mime(v.0.12), glue(v.1.6.2), fastmap(v.1.1.0), minqa(v.1.2.5), clusterProfiler(v.4.6.0), codetools(v.0.2-19), fgsea(v.1.24.0), pkgbuild(v.1.4.0), mvtnorm(v.1.1-3), utf8(v.1.2.3), lattice(v.0.20-45), bslib(v.0.4.2), tibble(v.3.1.8), sva(v.3.46.0), pbkrtest(v.0.5.2), curl(v.5.0.0), gtools(v.3.9.4), zip(v.2.2.2), openxlsx(v.4.2.5.2), GO.db(v.3.16.0), survival(v.3.5-3), limma(v.3.54.1), rmarkdown(v.2.20), desc(v.1.4.2), munsell(v.0.5.0), fastcluster(v.1.2.3), GenomeInfoDbData(v.1.2.9), iterators(v.1.0.14), reshape2(v.1.4.4), gtable(v.0.3.1) and rbibutils(v.2.2.13)
message(paste0("This is hpgltools commit: ", get_git_commit()))
## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset c100d5666f5032d24c933739015d267ef651c323
## This is hpgltools commit: Wed Mar 1 09:50:14 2023 -0500: c100d5666f5032d24c933739015d267ef651c323
this_save <- paste0(gsub(pattern="\\.Rmd", replace="", x=rmd_file), "-v", ver, ".rda.xz")
message(paste0("Saving to ", this_save))
## Saving to indexv2-v202303.rda.xz
tmp <- sm(saveme(filename=this_save))
loadme(filename=this_save)