This document lays out my process for poking at the DNA sequencing results of a series of Pseudomonas aeruginosa PA14 and PAK strains.
If I understand Dr. Lee and co.’s goal, they wish to ensure that these strains are still reasonably close to the associated reference strains. I therefore am running my default trimming/mapping/variant search methods.
I have a single command that can run all of these steps at once, but I have been actively breaking my tools recently, so I decided to run them one at a time on the assumption that something would not work (everything did work on the first try, so that was nice).
I downloaded the .zip archive file using the link in Dr. Lee’s email. I did not save it though, so if we need to download the data again, we will have to go to him. I created my usual work directory ‘preprocessing/’ within this tree and moved the archive there. I unzipped it and moved each pair of reads to a directory which follows Dr. Lee’s desired naming convention.
I then created the directories: ‘reference/’ and ‘sample_sheets/’. The sample_sheets remained empty for a while, but I immediately downloaded the full genbank flat file for the Pseudomonas PAK strain from NCBI, found here:
https://www.ncbi.nlm.nih.gov/nuccore/NZ_CP020659
Note that when downloading, one must hit the ‘customize view’ button on the right and ensure that the entire sequence and all annotations are included. Then hit the ‘send to’ button and send it to a file. I copied this file to reference/paeruginosa_pak.gb.
Given the full PAK genbank file, I converted it to the expected fasta/gff file for mapping:
cd reference
cyoa --method gb2gff --input paeruginosa_pak.gb
This command created a series of fasta and gff files which provide the coordinates for the various annotations (genes/cds/rRNA/intercds) and the sequences for the genome, CDS nucleotides, and amino acids. I then copied the genome/gff files to my global reference directory and prepared them for use by my favorite mapper:
cd ~/libraries/genome
cyoa --method indexhisat --species paeruginosa_pak
Now all of the pieces are in place for me to play. Each of the following steps was performed twice, once for the PA14 samples, once for the PAK samples. The only difference in the invocations was due to the fact that the PAK annotations provide different tags. E.g. I used the ‘Alias’ tag for PA14 and the ‘locus_tag’ tag for PAK. As a result I am only going to write down in this document the PA14 invocations and assume the reader can figure out the difference.
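Since the two sets of invocations differ only in the species name and the gff tag, the mapping can be written down once. What follows is a minimal sketch and not part of cyoa: `strain_settings` is a hypothetical helper, and the PAK species mirrors the `paeruginosa_pa01` invocations recorded in this document (swap in `paeruginosa_pak` to use the PAK reference indexed above).

```shell
# Hypothetical helper (not part of cyoa): print the --species and --gff_tag
# values for a sample directory, chosen by its name prefix.
strain_settings() {
    case "$1" in
        PA14*) printf '%s %s\n' paeruginosa_pa14 Alias ;;
        PAK*)  printf '%s %s\n' paeruginosa_pa01 locus_tag ;;
        *)     printf 'unknown strain: %s\n' "$1" >&2; return 1 ;;
    esac
}

# Usage: split the two fields into positional parameters.
set -- $(strain_settings PA14_exoUTY)
echo "species=$1 gff_tag=$2"
```

Keeping the strain-specific values in one place makes it harder to pair the wrong tag with the wrong annotation file.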
I have a couple of trimming methods; in this instance I just used the default and will operate under the assumption that it is sufficient until I see otherwise.
cd preprocessing
start=$(pwd)
for i in $(/bin/ls -d PA14*); do
cd $i
cyoa --method trim --input $(/bin/ls *.fastq.gz | tr '\n' ':' | sed 's/:$//g')
cd $start
done
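The `ls | tr | sed` pipeline above builds a colon-separated list of the read files. Here is a minimal sketch of the same idea without parsing `ls` output; `join_colon` is a hypothetical helper name, and it relies on the shell joining `$*` with the first character of `IFS`.

```shell
# Hypothetical helper: join all arguments with ':'.
# The function body is a subshell, so the IFS change does not leak out.
join_colon() (
    IFS=:
    printf '%s\n' "$*"
)

# Usage inside a sample directory; the glob expands in sorted order,
# so R1 comes before R2:
#   cyoa --method trim --input "$(join_colon *.fastq.gz)"
join_colon 7_UTY_S138_R1_001.fastq.gz 7_UTY_S138_R2_001.fastq.gz
```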
The above command line invocation produced a series of trimming jobs which when examined look like this (I am only showing examples from PA14_exoUTY, and am leaving off the beginning and end).
## This is a portion of file:
## preprocessing/PA14_exoUTY/scripts/01trim_7_UTY_S138_R1_001.sh
module add trimomatic
mkdir -p outputs/01trimomatic
## Note that trimomatic prints all output and errors to STDERR, so send both to output
trimmomatic PE \
-threads 1 \
-phred33 \
\
7_UTY_S138_R1_001.fastq.gz 7_UTY_S138_R2_001.fastq.gz \
7_UTY_S138_R1_001-trimmed_paired.fastq 7_UTY_S138_R1_001-trimmed_unpaired.fastq \
7_UTY_S138_R2_001-trimmed_paired.fastq 7_UTY_S138_R2_001-trimmed_unpaired.fastq \
ILLUMINACLIP:/fs/cbcb-software/RedHat-8-x86_64/local/cyoa/202302/prefix/lib/perl5/auto/share/dist/Bio-Adventure/genome/adapters.fa:2:20:10:2:keepBothReads \
SLIDINGWINDOW:4:20 MINLEN:50 1>outputs/01trimomatic/7_UTY_S138_R1_001-trimomatic.stdout \
2>outputs/01trimomatic/7_UTY_S138_R1_001-trimomatic.stderr
excepted=$( { grep "Exception" "outputs/01trimomatic/7_UTY_S138_R1_001-trimomatic.stdout" || test $? = 1; } )
One thing I did not include in the above: upon completion, the script aggressively compresses the trimmed output and symbolically links it to r1_trimmed.fastq.xz and r2_trimmed.fastq.xz. Thus any following steps can use the same input name (r1_trimmed.fastq.xz:r2_trimmed.fastq.xz).
My default mappers run the actual alignment, convert it to a compressed/indexed bam, and count it against the reference genome. In this context, the counting is a little silly, but does have the potential to help find duplications and such.
cd preprocessing
start=$(pwd)
for i in $(/bin/ls -d PA14*); do
cd $i
cyoa --method hisat --input r1_trimmed.fastq.xz:r2_trimmed.fastq.xz \
--stranded no --species paeruginosa_pa14 --gff_type gene --gff_tag Alias
cd $start
done
## Here is what I ran for PAK
cd preprocessing
start=$(pwd)
for i in $(/bin/ls -d PAK*); do
cd $i
cyoa --method hisat --input r1_trimmed.fastq.xz:r2_trimmed.fastq.xz \
--stranded no --species paeruginosa_pa01 --gff_type gene --gff_tag locus_tag
cd $start
done
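Each of the loops in this document follows the same pattern: `cd` into the sample directory, run something, `cd` back. Here is a sketch of one way to factor that out, assuming a hypothetical helper named `for_each_sample`. Running the command in a subshell means a failed command cannot leave the loop stranded in the wrong directory.

```shell
# Hypothetical helper: run a command inside every directory matching a glob.
# $1 is a glob pattern (quoted by the caller); the rest is the command.
for_each_sample() {
    pattern=$1
    shift
    for dir in $pattern; do      # intentionally unquoted so the glob expands
        [ -d "$dir" ] || continue
        ( cd "$dir" && "$@" )    # subshell: the caller's directory is untouched
    done
}

# Usage from preprocessing/, mirroring the PA14 loop above:
#   for_each_sample 'PA14*' cyoa --method hisat \
#       --input r1_trimmed.fastq.xz:r2_trimmed.fastq.xz \
#       --stranded no --species paeruginosa_pa14 --gff_type gene --gff_tag Alias
```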
Similarly, I am only showing the meaty part of the resulting script.
module add hisat2 samtools htseq bamtools
mkdir -p outputs/40hisat2_paeruginosa_pa14
hisat2 -x ${HOME}/libraries/genome/indexes/paeruginosa_pa14 \
-p 8 \
-q -1 <(less /home/trey/sshfs/scratch/atb/dnaseq/paeruginosa_strains_202304/preprocessing/PA14_exoUTY/r1_trimmed.fastq.xz) -2 <(less /home/trey/sshfs/scratch/atb/dnaseq/paeruginosa_strains_202304/preprocessing/PA14_exoUTY/r2_trimmed.fastq.xz) \
--phred33 \
--un outputs/40hisat2_paeruginosa_pa14/unaldis_paeruginosa_pa14_genome.fastq \
--al outputs/40hisat2_paeruginosa_pa14/aldis_paeruginosa_pa14_genome.fastq \
--un-conc outputs/40hisat2_paeruginosa_pa14/unalcon_paeruginosa_pa14_genome.fastq \
--al-conc outputs/40hisat2_paeruginosa_pa14/alcon_paeruginosa_pa14_genome.fastq \
-S outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.sam \
2>outputs/40hisat2_paeruginosa_pa14/hisat2_paeruginosa_pa14_genome_PA14_exoUTY.stderr \
1>outputs/40hisat2_paeruginosa_pa14/hisat2_paeruginosa_pa14_genome_PA14_exoUTY.stdout
The above cyoa invocation also creates this script. It is a little long because it does some checks and creates a couple of filtered versions of the output.
module add samtools bamtools
echo "Starting samtools"
if [[ -f "outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam" && -f "outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.sam" ]]; then
echo "Both the bam and sam files exist, rerunning."
elif [[ -f "outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam" ]]; then
echo "The output file exists, quitting."
exit 0
elif [[ ! -f "outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.sam" ]]; then
echo "Could not find the samtools input file."
exit 1
fi
## If a previous sort file exists due to running out of memory,
## then we need to get rid of them first.
## hg38_100_genome-sorted.bam.tmp.0000.bam
if [[ -f "outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam.tmp.000.bam" ]]; then
rm -f outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam.tmp.*.bam
fi
samtools view -u -t ${HOME}/libraries/genome/paeruginosa_pa14.fasta \
-S outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.sam -o outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam \
2>outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam_samtools.stderr \
1>outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam_samtools.stdout
echo "First samtools command finished with $?"
samtools sort -l 9 outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam \
-o outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-sorted.bam \
2>>outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam_samtools.stderr \
1>>outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam_samtools.stdout
rm outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam
rm outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.sam
mv outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-sorted.bam outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam
samtools index outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam \
2>>outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam_samtools.stderr \
1>>outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam_samtools.stdout
echo "Second samtools command finished with $?"
bamtools stats -in outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam \
2>>outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam_samtools.stats 1>&2
echo "Bamtools finished with $?"
## The following will fail if this is single-ended.
samtools view -b -f 2 \
-o outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired.bam \
\
outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam 2>>outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam_samtools.stderr \
1>>outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam_samtools.stdout
samtools index outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired.bam \
2>>outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam_samtools.stderr \
1>>outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam_samtools.stdout
bamtools stats -in outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired.bam \
2>>outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam_samtools.stats 1>&2
bamtools filter -tag XM:0 \
-in outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam \
-out outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-sorted_nomismatch.bam \
2>>outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam_samtools.stats 1>&2
echo "bamtools filter finished with: $?"
samtools index \
\
outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-sorted_nomismatch.bam 2>>outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam_samtools.stderr \
1>>outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam_samtools.stdout
echo "final samtools index finished with: $?"
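The guard at the top of this script boils down to a small decision based on which of the sam and bam files exist. Here is a sketch of that logic as a standalone function; `samtools_action` is a hypothetical name, and the printed action words are mine rather than cyoa's.

```shell
# Hypothetical helper mirroring the guard in the generated script.
# $1 = expected bam path, $2 = expected sam path.
samtools_action() {
    if [ -f "$1" ] && [ -f "$2" ]; then
        echo "rerun"      # both exist: redo the conversion, as the script does
    elif [ -f "$1" ]; then
        echo "skip"       # only the bam exists: the work is already done
    elif [ ! -f "$2" ]; then
        echo "missing"    # no sam to convert: nothing to work from
    else
        echo "convert"    # only the sam exists: the normal first run
    fi
}
```

Because the script deletes the sam file after a successful sort, finding both files suggests an earlier run did not finish, which is why that case reruns rather than quits.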
Note that the htseq counting step is not really useful for a DNAseq dataset in most instances. I also have the default orientation set to reverse because most of the samples off our sequencer are reversed, but that is likely not true for this dataset. If it turns out we actually care about these counts, I may need to come back and rerun them.
module add htseq
htseq-count \
-q -f bam \
-s reverse -a 0 \
--type all --idattr Alias \
\
outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired.bam \
/home/trey/libraries/genome/paeruginosa_pa14.gff 2>outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired_sreverse_all_Alias.stderr \
1>outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired_sreverse_all_Alias.count
xz -f -9e outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired_sreverse_all_Alias.count
I tend to like to use freebayes for the variant search. It is a little conservative, but it seems to work quite well. I can also use mpileup and snippy. freebayes and mpileup are set up to feed a post-processing script which I think is kind of fun and will be described momentarily.
cd preprocessing
start=$(pwd)
for i in $(/bin/ls -d PA14*); do
cd $i
cyoa --method freebayes \
--input outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired.bam \
--species paeruginosa_pa14 --gff_type gene --gff_tag Alias --intron 0
cd $start
done
## Here is what I ran for PAK
for i in $(/bin/ls -d PAK*); do
cd $i
cyoa --method freebayes \
--input outputs/40hisat2_paeruginosa_pa01/paeruginosa_pa01_genome-paired.bam \
--species paeruginosa_pa01 --gff_type gene --gff_tag locus_tag --intron 0
cd $start
done
Unlike with hisat, I include the conversion to the binary/compressed/indexed format in the same script as the variant search. I also include the duplicate search functionality from gatk.
module add gatk freebayes libgsl libhts samtools bcftools vcftools
mkdir -p outputs/50freebayes_paeruginosa_pa14
gatk MarkDuplicates \
-I outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome.bam \
-O outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14_genome_deduplicated.bam \
-M outputs/50freebayes_paeruginosa_pa14/deduplication_stats.txt --REMOVE_DUPLICATES true --COMPRESSION_LEVEL 9 \
2>outputs/50freebayes_paeruginosa_pa14/deduplication.stderr \
1>outputs/50freebayes_paeruginosa_pa14/deduplication.stdout
echo "Finished gatk deduplication." >> outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.stdout
samtools index outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14_genome_deduplicated.bam
echo "Finished samtools index." >> outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.stdout
freebayes -f /home/trey/libraries/genome/paeruginosa_pa14.fasta \
-v outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.vcf \
\
outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14_genome_deduplicated.bam 1>>outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.stdout \
2>>outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.stderr
echo "Finished freebayes." >> outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.stdout
bcftools convert outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.vcf \
-Ob -o outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.bcf \
2>>outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.stderr \
1>>outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.stdout
echo "Finished bcftools convert." >> outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.stdout
bcftools index outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.bcf \
2>>outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.stderr \
1>>outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.stdout
echo "Finished bcftools index." >> outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.stdout
rm outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.vcf
The result from the above freebayes script is a bcf containing the high-quality observed variants. The cyoa invocation also creates the following script, which will require a bit of explanation.
use Bio::Adventure::SNP;
my $result = $h->Bio::Adventure::SNP::SNP_Ratio_Worker(
  input => 'outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.bcf',
  species => 'paeruginosa_pa14',
  vcf_method => 'freebayes',
  vcf_cutoff => '5',
  vcf_minpct => '0.8',
  gff_tag => 'Alias',
  gff_type => 'gene',
  output_dir => 'outputs/50freebayes_paeruginosa_pa14',
  output => 'outputs/50freebayes_paeruginosa_pa14/all_tags.txt',
  output_count => 'outputs/50freebayes_paeruginosa_pa14/count.txt',
  output_genome => 'outputs/50freebayes_paeruginosa_pa14/modified.fasta',
  output_by_gene => 'outputs/50freebayes_paeruginosa_pa14/variants_by_gene.txt',
  output_pkm => 'outputs/50freebayes_paeruginosa_pa14/pkm.txt',
);
The function ‘SNP_Ratio_Worker()’ reads the reference genome, the set of variants, and the genome annotations in order to create a new copy of the genome (modified.fasta) which should be equivalent to the input reads. It also rewrites the bcf data into a matrix which is easier to play with in R/python (all_tags.txt). Finally, it uses the annotation information to explicitly show the amino acid substitutions observed in every ORF (variants_by_gene.txt). In theory it should also give an rpkm-esque copy of the variants observed / ORF, but I turned that off because it doesn’t seem very useful and it is a little tricky to get right.
In order to play further with the data, I will need a sample sheet. So I will start out by creating a blank one in excel (libreoffice) which contains only the sample names in the same format as my directories in preprocessing/.
Once completed, I can use it as the input for my hpgltools package and it should extract the interesting information from the preprocessing logs and fill out the sample sheet accordingly. Let's see if it works!
Here is the before:
knitr::kable(extract_metadata("sample_sheets/all_samples.xlsx"))
## Did not find the condition column in the sample sheet.
## Filling it in as undefined.
## Did not find the batch column in the sample sheet.
## Filling it in as undefined.
(row name) | sampleid | condition | batch |
---|---|---|---|
PA14_exoUTY | PA14_exoUTY | undefined | undefined |
PA14_JC | PA14_JC | undefined | undefined |
PA14_lux | PA14_lux | undefined | undefined |
PA14_NBH | PA14_NBH | undefined | undefined |
PA14_pscD_A5 | PA14_pscD_A5 | undefined | undefined |
PA14_pscD_E4 | PA14_pscD_E4 | undefined | undefined |
PA14_xcp | PA14_xcp | undefined | undefined |
PA14_xcp_pscD | PA14_xcp_pscD | undefined | undefined |
PAK | PAK | undefined | undefined |
PAK_pscC | PAK_pscC | undefined | undefined |
PAK_xcp | PAK_xcp | undefined | undefined |
PAK_xcp_pscC | PAK_xcp_pscC | undefined | undefined |
Like I said, not much going on. Let's see what it looks like after I run the gatherer on it… (Note, I have been meaning to change this to drop the unused columns, but have not yet).
spec <- make_dnaseq_spec()
queried_species <- c("paeruginosa_pak", "paeruginosa_pa01", "paeruginosa_pa14")
modified <- sm(gather_preprocessing_metadata("sample_sheets/all_samples.xlsx",
species = queried_species, verbose = FALSE,
specification = spec))
knitr::kable(extract_metadata("sample_sheets/all_samples_modified.xlsx"))
## Did not find the condition column in the sample sheet.
## Filling it in as undefined.
## Did not find the batch column in the sample sheet.
## Filling it in as undefined.
(row name) | rownames | trimomaticinput | trimomaticoutput | trimomaticpercent | hisatgenomesingleconcordantpaeruginosapak | hisatgenomesingleconcordantpaeruginosapa01 | hisatgenomesingleconcordantpaeruginosapa14 | hisatgenomemulticoncordantpaeruginosapak | hisatgenomemulticoncordantpaeruginosapa01 | hisatgenomemulticoncordantpaeruginosapa14 | hisatgenomesingleallpaeruginosapak | hisatgenomesingleallpaeruginosapa01 | hisatgenomesingleallpaeruginosapa14 | hisatgenomemultiallpaeruginosapak | hisatgenomemultiallpaeruginosapa01 | hisatgenomemultiallpaeruginosapa14 | gatkunpairedpaeruginosapak | gatkunpairedpaeruginosapa01 | gatkunpairedpaeruginosapa14 | gatkpairedpaeruginosapak | gatkpairedpaeruginosapa01 | gatkpairedpaeruginosapa14 | gatksupplementarypaeruginosapak | gatksupplementarypaeruginosapa01 | gatksupplementarypaeruginosapa14 | gatkunmappedpaeruginosapak | gatkunmappedpaeruginosapa01 | gatkunmappedpaeruginosapa14 | gatkunpairedduplicatespaeruginosapak | gatkunpairedduplicatespaeruginosapa01 | gatkunpairedduplicatespaeruginosapa14 | gatkpairedduplicatespaeruginosapak | gatkpairedduplicatespaeruginosapa01 | gatkpairedduplicatespaeruginosapa14 | gatkpairedoptduplicatespaeruginosapak | gatkpairedoptduplicatespaeruginosapa01 | gatkpairedoptduplicatespaeruginosapa14 | gatkduplicatepctpaeruginosapak | gatkduplicatepctpaeruginosapa01 | gatkduplicatepctpaeruginosapa14 | gatklibsizepaeruginosapak | gatklibsizepaeruginosapa01 | gatklibsizepaeruginosapa14 | variantsobservedpaeruginosapak | variantsobservedpaeruginosapa01 | variantsobservedpaeruginosapa14 | hisatcounttablepaeruginosapak | hisatcounttablepaeruginosapa01 | hisatcounttablepaeruginosapa14 | variantsbygenefilepaeruginosapak | variantsbygenefilepaeruginosapa01 | variantsbygenefilepaeruginosapa14 | variantsbcftablepaeruginosapak | variantsbcftablepaeruginosapa01 | variantsbcftablepaeruginosapa14 | variantsmodifiedgenomepaeruginosapak | variantsmodifiedgenomepaeruginosapa01 | variantsmodifiedgenomepaeruginosapa14 | variantsbcffilepaeruginosapak | variantsbcffilepaeruginosapa01 | variantsbcffilepaeruginosapa14 | variantspenetrancefilepaeruginosapak | variantspenetrancefilepaeruginosapa01 | variantspenetrancefilepaeruginosapa14 | condition | batch |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PA14_exoUTY | PA14_exoUTY | 4706372 | 4350712 | 0.924 | 3690737 | NA | 4280240 | 664 | NA | 25722 | 307644 | NA | 34468 | 664 | NA | 25722 | 0 | 4305962 | 83320 | 0 | 0 | 823232 | 81874 | 0.191184 | 10580352 | 199 | preprocessing/PA14_exoUTY/outputs/40hisat2_paeruginosa_pak/paeruginosa_pak_genome-paired_sno_gene_locus_tag.count.xz | preprocessing/PA14_exoUTY/outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired_sreverse_all_Alias.count.xz | preprocessing/PA14_exoUTY/outputs/50freebayes_paeruginosa_pa14/variants_by_gene.txt.xz | preprocessing/PA14_exoUTY/outputs/50freebayes_paeruginosa_pa14/all_tags.txt.xz | preprocessing/PA14_exoUTY/outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14-PA14_exoUTY.fasta | preprocessing/PA14_exoUTY/outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.bcf | preprocessing/PA14_exoUTY/outputs/50freebayes_paeruginosa_pa14/variants_penetrance.txt.xz | undefined | undefined | |||||||||||||||||||||||||||||||
PA14_JC | PA14_JC | 5786839 | 5336197 | 0.922 | 4550108 | NA | 5275687 | 1028 | NA | 32239 | 330047 | NA | 23521 | 1028 | NA | 32239 | 0 | 5307926 | 107378 | 0 | 0 | 1127763 | 103875 | 0.212468 | 11426698 | 196 | preprocessing/PA14_JC/outputs/40hisat2_paeruginosa_pak/paeruginosa_pak_genome-paired_sno_gene_locus_tag.count.xz | preprocessing/PA14_JC/outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired_sreverse_all_Alias.count.xz | preprocessing/PA14_JC/outputs/50freebayes_paeruginosa_pa14/variants_by_gene.txt.xz | preprocessing/PA14_JC/outputs/50freebayes_paeruginosa_pa14/all_tags.txt.xz | preprocessing/PA14_JC/outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14-PA14_JC.fasta | preprocessing/PA14_JC/outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.bcf | preprocessing/PA14_JC/outputs/50freebayes_paeruginosa_pa14/variants_penetrance.txt.xz | undefined | undefined | |||||||||||||||||||||||||||||||
PA14_lux | PA14_lux | 6622570 | 6099776 | 0.921 | 5205608 | NA | 6028065 | 999 | NA | 36827 | 354183 | NA | 24917 | 999 | NA | 36827 | 0 | 6064892 | 121596 | 0 | 0 | 1432296 | 127776 | 0.236162 | 11448973 | 197 | preprocessing/PA14_lux/outputs/40hisat2_paeruginosa_pak/paeruginosa_pak_genome-paired_sno_gene_locus_tag.count.xz | preprocessing/PA14_lux/outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired_sreverse_all_Alias.count.xz | preprocessing/PA14_lux/outputs/50freebayes_paeruginosa_pa14/variants_by_gene.txt.xz | preprocessing/PA14_lux/outputs/50freebayes_paeruginosa_pa14/all_tags.txt.xz | preprocessing/PA14_lux/outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14-PA14_lux.fasta | preprocessing/PA14_lux/outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.bcf | preprocessing/PA14_lux/outputs/50freebayes_paeruginosa_pa14/variants_penetrance.txt.xz | undefined | undefined | |||||||||||||||||||||||||||||||
PA14_NBH | PA14_NBH | 5151127 | 4581433 | 0.889 | 3883544 | NA | 4516421 | 711 | NA | 26915 | 313829 | NA | 32560 | 711 | NA | 26915 | 0 | 4543336 | 88454 | 0 | 0 | 896720 | 83669 | 0.19737 | 10694127 | 196 | preprocessing/PA14_NBH/outputs/40hisat2_paeruginosa_pak/paeruginosa_pak_genome-paired_sno_gene_locus_tag.count.xz | preprocessing/PA14_NBH/outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired_sreverse_all_Alias.count.xz | preprocessing/PA14_NBH/outputs/50freebayes_paeruginosa_pa14/variants_by_gene.txt.xz | preprocessing/PA14_NBH/outputs/50freebayes_paeruginosa_pa14/all_tags.txt.xz | preprocessing/PA14_NBH/outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14-PA14_NBH.fasta | preprocessing/PA14_NBH/outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.bcf | preprocessing/PA14_NBH/outputs/50freebayes_paeruginosa_pa14/variants_penetrance.txt.xz | undefined | undefined | |||||||||||||||||||||||||||||||
PA14_pscD_A5 | PA14_pscD_A5 | 5898210 | 5417082 | 0.918 | 4579807 | NA | 5359077 | 989 | NA | 32852 | 338340 | NA | 21806 | 989 | NA | 32852 | 0 | 5391929 | 110988 | 0 | 0 | 1134595 | 110041 | 0.210425 | 11790543 | 204 | preprocessing/PA14_pscD_A5/outputs/40hisat2_paeruginosa_pak/paeruginosa_pak_genome-paired_sno_gene_locus_tag.count.xz | preprocessing/PA14_pscD_A5/outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired_sreverse_all_Alias.count.xz | preprocessing/PA14_pscD_A5/outputs/50freebayes_paeruginosa_pa14/variants_by_gene.txt.xz | preprocessing/PA14_pscD_A5/outputs/50freebayes_paeruginosa_pa14/all_tags.txt.xz | preprocessing/PA14_pscD_A5/outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14-PA14_pscD_A5.fasta | preprocessing/PA14_pscD_A5/outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.bcf | preprocessing/PA14_pscD_A5/outputs/50freebayes_paeruginosa_pa14/variants_penetrance.txt.xz | undefined | undefined | |||||||||||||||||||||||||||||||
PA14_pscD_E4 | PA14_pscD_E4 | 5854559 | 5418227 | 0.925 | 4589839 | NA | 5361935 | 920 | NA | 33424 | 325823 | NA | 19561 | 920 | NA | 33424 | 0 | 5395359 | 112228 | 0 | 0 | 1204336 | 118910 | 0.223217 | 10998073 | 203 | preprocessing/PA14_pscD_E4/outputs/40hisat2_paeruginosa_pak/paeruginosa_pak_genome-paired_sno_gene_locus_tag.count.xz | preprocessing/PA14_pscD_E4/outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired_sreverse_all_Alias.count.xz | preprocessing/PA14_pscD_E4/outputs/50freebayes_paeruginosa_pa14/variants_by_gene.txt.xz | preprocessing/PA14_pscD_E4/outputs/50freebayes_paeruginosa_pa14/all_tags.txt.xz | preprocessing/PA14_pscD_E4/outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14-PA14_pscD_E4.fasta | preprocessing/PA14_pscD_E4/outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.bcf | preprocessing/PA14_pscD_E4/outputs/50freebayes_paeruginosa_pa14/variants_penetrance.txt.xz | undefined | undefined | |||||||||||||||||||||||||||||||
PA14_xcp | PA14_xcp | 5683132 | 5214875 | 0.918 | 4430035 | NA | 5134054 | 1300 | NA | 30352 | 356590 | NA | 39479 | 1300 | NA | 30352 | 0 | 5164406 | 97676 | 0 | 0 | 1092613 | 103935 | 0.211566 | 11202466 | 195 | preprocessing/PA14_xcp/outputs/40hisat2_paeruginosa_pak/paeruginosa_pak_genome-paired_sno_gene_locus_tag.count.xz | preprocessing/PA14_xcp/outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired_sreverse_all_Alias.count.xz | preprocessing/PA14_xcp/outputs/50freebayes_paeruginosa_pa14/variants_by_gene.txt.xz | preprocessing/PA14_xcp/outputs/50freebayes_paeruginosa_pa14/all_tags.txt.xz | preprocessing/PA14_xcp/outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14-PA14_xcp.fasta | preprocessing/PA14_xcp/outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.bcf | preprocessing/PA14_xcp/outputs/50freebayes_paeruginosa_pa14/variants_penetrance.txt.xz | undefined | undefined | |||||||||||||||||||||||||||||||
PA14_xcp_pscD | PA14_xcp_pscD | 2026150 | 1514509 | 0.747 | 1223766 | NA | 1471930 | 191 | NA | 10238 | 137765 | NA | 27094 | 191 | NA | 10238 | 0 | 1482168 | 41804 | 0 | 0 | 188223 | 20007 | 0.126992 | 5857317 | 216 | preprocessing/PA14_xcp_pscD/outputs/40hisat2_paeruginosa_pak/paeruginosa_pak_genome-paired_sno_gene_locus_tag.count.xz | preprocessing/PA14_xcp_pscD/outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired_sreverse_all_Alias.count.xz | preprocessing/PA14_xcp_pscD/outputs/50freebayes_paeruginosa_pa14/variants_by_gene.txt.xz | preprocessing/PA14_xcp_pscD/outputs/50freebayes_paeruginosa_pa14/all_tags.txt.xz | preprocessing/PA14_xcp_pscD/outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14-PA14_xcp_pscD.fasta | preprocessing/PA14_xcp_pscD/outputs/50freebayes_paeruginosa_pa14/paeruginosa_pa14.bcf | preprocessing/PA14_xcp_pscD/outputs/50freebayes_paeruginosa_pa14/variants_penetrance.txt.xz | undefined | undefined | |||||||||||||||||||||||||||||||
PAK | PAK | 4779558 | 4318745 | 0.904 | 4049183 | 3925784 | 3731781 | 1093 | 22836 | 19482 | 179265 | 179736 | 308170 | 1093 | 22836 | 19482 | 168494 | 0 | 4092362 | 3948620 | 3393 | 99172 | 284272 | 0 | 126668 | 0 | 902830 | 873059 | 98051 | 94715 | 0.231327 | 0.221105 | 8530650 | 8207871 | 333 | 26786 | preprocessing/PAK/outputs/40hisat2_paeruginosa_pak/paeruginosa_pak_genome-paired_sreverse_all_locus_tag.count.xz | preprocessing/PAK/outputs/40hisat2_paeruginosa_pa01/paeruginosa_pa01_genome-paired_sno_gene_locus_tag.count.xz | preprocessing/PAK/outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired_sno_gene_Alias.count.xz | preprocessing/PAK/outputs/50freebayes_paeruginosa_pak/variants_by_gene.txt.xz | preprocessing/PAK/outputs/50freebayes_paeruginosa_pa01/variants_by_gene.txt.xz | preprocessing/PAK/outputs/50freebayes_paeruginosa_pak/all_tags.txt.xz | preprocessing/PAK/outputs/50freebayes_paeruginosa_pa01/all_tags.txt.xz | preprocessing/PAK/outputs/50freebayes_paeruginosa_pak/paeruginosa_pak-PAK.fasta | preprocessing/PAK/outputs/50freebayes_paeruginosa_pak/paeruginosa_pak.bcf | preprocessing/PAK/outputs/50freebayes_paeruginosa_pa01/paeruginosa_pa01.bcf | preprocessing/PAK/outputs/50freebayes_paeruginosa_pak/variants_penetrance.txt.xz | preprocessing/PAK/outputs/50freebayes_paeruginosa_pa01/variants_penetrance.txt.xz | undefined | undefined | ||||||||||||||||
PAK_pscC | PAK_pscC | 5734960 | 5271470 | 0.919 | 5090759 | 4853631 | 4638113 | 1343 | 29403 | 25963 | 116625 | 126483 | 266811 | 1343 | 29403 | 25963 | 115014 | 0 | 5097071 | 4883034 | 4081 | 142224 | 233784 | 0 | 92389 | 0 | 1102959 | 1056234 | 106532 | 102207 | 0.222938 | 0.216307 | 10771706 | 10325724 | 373 | 26856 | preprocessing/PAK_pscC/outputs/40hisat2_paeruginosa_pak/paeruginosa_pak_genome-paired_sreverse_gene_locus_tag.count.xz | preprocessing/PAK_pscC/outputs/40hisat2_paeruginosa_pa01/paeruginosa_pa01_genome-paired_sno_gene_locus_tag.count.xz | preprocessing/PAK_pscC/outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired_sno_gene_Alias.count.xz | preprocessing/PAK_pscC/outputs/50freebayes_paeruginosa_pak/variants_by_gene.txt.xz | preprocessing/PAK_pscC/outputs/50freebayes_paeruginosa_pa01/variants_by_gene.txt.xz | preprocessing/PAK_pscC/outputs/50freebayes_paeruginosa_pak/all_tags.txt.xz | preprocessing/PAK_pscC/outputs/50freebayes_paeruginosa_pa01/all_tags.txt.xz | preprocessing/PAK_pscC/outputs/50freebayes_paeruginosa_pak/paeruginosa_pak-PAK_pscC.fasta | preprocessing/PAK_pscC/outputs/50freebayes_paeruginosa_pak/paeruginosa_pak.bcf | preprocessing/PAK_pscC/outputs/50freebayes_paeruginosa_pa01/paeruginosa_pa01.bcf | preprocessing/PAK_pscC/outputs/50freebayes_paeruginosa_pak/variants_penetrance.txt.xz | preprocessing/PAK_pscC/outputs/50freebayes_paeruginosa_pa01/variants_penetrance.txt.xz | undefined | undefined | ||||||||||||||||
PAK_xcp | PAK_xcp | 4843414 | 4443669 | 0.917 | 4293814 | 4088322 | 3904688 | 977 | 24341 | 21592 | 96933 | 110085 | 229330 | 977 | 24341 | 21592 | 95387 | 0 | 4299065 | 4112663 | 3145 | 116582 | 193821 | 0 | 73985 | 0 | 889293 | 849827 | 90548 | 86772 | 0.213098 | 0.206637 | 9634785 | 9231062 | 363 | 26756 | preprocessing/PAK_xcp/outputs/40hisat2_paeruginosa_pak/paeruginosa_pak_genome-paired_sreverse_gene_locus_tag.count.xz | preprocessing/PAK_xcp/outputs/40hisat2_paeruginosa_pa01/paeruginosa_pa01_genome-paired_sno_gene_locus_tag.count.xz | preprocessing/PAK_xcp/outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired_sno_gene_Alias.count.xz | preprocessing/PAK_xcp/outputs/50freebayes_paeruginosa_pak/variants_by_gene.txt.xz | preprocessing/PAK_xcp/outputs/50freebayes_paeruginosa_pa01/variants_by_gene.txt.xz | preprocessing/PAK_xcp/outputs/50freebayes_paeruginosa_pak/all_tags.txt.xz | preprocessing/PAK_xcp/outputs/50freebayes_paeruginosa_pa01/all_tags.txt.xz | preprocessing/PAK_xcp/outputs/50freebayes_paeruginosa_pak/paeruginosa_pak-PAK_xcp.fasta | preprocessing/PAK_xcp/outputs/50freebayes_paeruginosa_pak/paeruginosa_pak.bcf | preprocessing/PAK_xcp/outputs/50freebayes_paeruginosa_pa01/paeruginosa_pa01.bcf | preprocessing/PAK_xcp/outputs/50freebayes_paeruginosa_pak/variants_penetrance.txt.xz | preprocessing/PAK_xcp/outputs/50freebayes_paeruginosa_pa01/variants_penetrance.txt.xz | undefined | undefined | ||||||||||||||||
PAK_xcp_pscC | PAK_xcp_pscC | 5195158 | 4611474 | 0.888 | 4344138 | 4220150 | 4008749 | 1070 | 23899 | 20629 | 177601 | 174592 | 308658 | 1070 | 23899 | 20629 | 168983 | 0 | 4376720 | 4244049 | 3164 | 102714 | 300525 | 0 | 129907 | 0 | 943947 | 917470 | 92772 | 89839 | 0.226149 | 0.216178 | 9299455 | 8989454 | 384 | 26949 | preprocessing/PAK_xcp_pscC/outputs/40hisat2_paeruginosa_pak/paeruginosa_pak_genome-paired_sreverse_gene_locus_tag.count.xz | preprocessing/PAK_xcp_pscC/outputs/40hisat2_paeruginosa_pa01/paeruginosa_pa01_genome-paired_sno_gene_locus_tag.count.xz | preprocessing/PAK_xcp_pscC/outputs/40hisat2_paeruginosa_pa14/paeruginosa_pa14_genome-paired_sno_gene_Alias.count.xz | preprocessing/PAK_xcp_pscC/outputs/50freebayes_paeruginosa_pak/variants_by_gene.txt.xz | preprocessing/PAK_xcp_pscC/outputs/50freebayes_paeruginosa_pa01/variants_by_gene.txt.xz | preprocessing/PAK_xcp_pscC/outputs/50freebayes_paeruginosa_pak/all_tags.txt.xz | preprocessing/PAK_xcp_pscC/outputs/50freebayes_paeruginosa_pa01/all_tags.txt.xz | preprocessing/PAK_xcp_pscC/outputs/50freebayes_paeruginosa_pak/paeruginosa_pak-PAK_xcp_pscC.fasta | preprocessing/PAK_xcp_pscC/outputs/50freebayes_paeruginosa_pak/paeruginosa_pak.bcf | preprocessing/PAK_xcp_pscC/outputs/50freebayes_paeruginosa_pa01/paeruginosa_pa01.bcf | preprocessing/PAK_xcp_pscC/outputs/50freebayes_paeruginosa_pak/variants_penetrance.txt.xz | preprocessing/PAK_xcp_pscC/outputs/50freebayes_paeruginosa_pa01/variants_penetrance.txt.xz | undefined | undefined |
I reran the missing PAK samples and looked into the logs. It may be that the PAK genome I downloaded is of lower quality than the PA14 genome, and that this is skewing the results somewhat.
Let's go one small step further. I have a series of modified genomes as well as the references, so we can make a quick tree of them. First I will copy each modified genome to the tree/ directory and rename it to the sampleID.
start=$(pwd)
mkdir tree
cd preprocessing
for i in $(/bin/ls -d PA*); do
cp $i/outputs/50*/paeruginosa_pak-*.fasta ${start}/tree/
cp $i/outputs/50*/paeruginosa_pa14-*.fasta ${start}/tree/
done
cd $start
cp ~/libraries/genome/paeruginosa_pa14.fa ${start}/tree
cp ~/libraries/genome/paeruginosa_pak.fa ${start}/tree
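The copy loop above leaves the modified genomes named with the reference prefix (for example paeruginosa_pak-PAK_xcp.fasta). The renaming to the sampleID could be sketched as below. This is only an illustration: the function name strip_reference_prefix is hypothetical, and it assumes the reference name itself never contains a '-'.

```shell
# Hypothetical helper (not part of cyoa): rename "<reference>-<sampleID>.fasta"
# files in a directory to "<sampleID>.fasta".
strip_reference_prefix() {
  dir="$1"
  for f in "${dir}"/*-*.fasta; do
    # If the glob matched nothing, the literal pattern is returned; skip it.
    [ -e "${f}" ] || continue
    base=$(basename "${f}")
    # Remove the shortest leading match of "*-", e.g. "paeruginosa_pak-".
    new="${base#*-}"
    mv "${f}" "${dir}/${new}"
  done
}

# Example call, assuming the tree/ directory populated above:
# strip_reference_prefix tree
```

Files without a '-' in their name, such as the copied reference genomes, are not matched by the glob and are left untouched.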
Oh, it turns out that at the time of this writing, I forgot to run 3 samples, so this section will need to be redone. But I can at least run it for the samples that I didn’t forget.
funkytown <- genomic_sequence_phylo("tree", root = "paeruginosa_pa14")
## Reading tree/PA14_exoUTY.fasta
## Reading tree/PA14_JC.fasta
## Reading tree/PA14_lux.fasta
## Reading tree/PA14_NBH.fasta
## Reading tree/PA14_pscD_A5.fasta
## Reading tree/PA14_pscD_E4.fasta
## Reading tree/PA14_xcp_pscD.fasta
## Reading tree/paeruginosa_pa14.fasta
## Reading tree/paeruginosa_pak-PAK_pscC.fasta
## Reading tree/paeruginosa_pak-PAK_xcp_pscC.fasta
## Reading tree/paeruginosa_pak-PAK_xcp.fasta
## Reading tree/paeruginosa_pak-PAK.fasta
## Reading tree/paeruginosa_pak.fasta
## Reading tree/PAK_pscC.fasta
## Reading tree/PAK_xcp_pscC.fasta
## Reading tree/PAK_xcp.fasta
## Reading tree/PAK.fasta
plot(funkytown$phy)
The counts from hisat in theory are not very interesting for DNAseq data, except in this instance we want to see the coverage of the knockouts.
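A quick way to eyeball knockout coverage outside of R is to list the genes with few or no reads in a count table. The sketch below assumes the .count.xz files are two-column, tab-delimited htseq-count style tables (gene ID, count) with summary rows starting with "__"; I have not confirmed that here, and the function name low_coverage_genes is hypothetical.

```shell
# Hypothetical helper: read a two-column count table on stdin and print the
# genes whose count is at or below a threshold (default 0).
# Assumes htseq-count style output; summary rows such as "__no_feature" are skipped.
low_coverage_genes() {
  threshold="${1:-0}"
  awk -F '\t' -v max="${threshold}" \
    '$1 !~ /^__/ && ($2 + 0) <= (max + 0) { print $1 "\t" $2 }'
}

# Example call on one of the count tables listed in the sample sheet:
# xz -dc preprocessing/PAK_xcp/outputs/40hisat2_paeruginosa_pak/paeruginosa_pak_genome-paired_sreverse_gene_locus_tag.count.xz | low_coverage_genes 0
```

A deleted gene should appear in this list for the knockout strain but not for its parent, although genes with repeated or unmappable sequence may also show up.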
pa14_annot <- load_gff_annotations("~/libraries/genome/paeruginosa_pa14.gff", type = "gene", id_col = "Alias")
## Trying attempt: rtracklayer::import.gff3(gff, sequenceRegionsAsSeqinfo = TRUE)
## Had a successful gff import with rtracklayer::import.gff3(gff, sequenceRegionsAsSeqinfo = TRUE)
## Returning a df with 16 columns and 5979 rows.
rownames(pa14_annot) <- pa14_annot[["Alias"]]
pa14_expt <- create_expt("sample_sheets/all_samples_modified.xlsx", gene_info = pa14_annot,
                         file_column = "hisatcounttablepaeruginosapa14")
## Reading the sample metadata.
## Did not find the condition column in the sample sheet.
## Filling it in as undefined.
## Did not find the batch column in the sample sheet.
## Filling it in as undefined.
## The sample definitions comprises: 12 rows(samples) and 66 columns(metadata fields).
## Matched 5979 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'subset_expt' for signature '"ExpressionSet"'
pa14_write <- write_expt(pa14_expt, excel = "excel/pa14_strains.xlsx")
## Deleting the file excel/pa14_strains.xlsx before writing the tables.
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'exprs': object 'pa14_expt' not found
pak_annot <- load_gff_annotations("~/libraries/genome/paeruginosa_pak.gff", type = "gene", id_col = "locus_tag")
## Trying attempt: rtracklayer::import.gff3(gff, sequenceRegionsAsSeqinfo = TRUE)
## Trying attempt: rtracklayer::import.gff3(gff, sequenceRegionsAsSeqinfo = FALSE)
## Had a successful gff import with rtracklayer::import.gff3(gff, sequenceRegionsAsSeqinfo = FALSE)
## Returning a df with 35 columns and 5871 rows.
rownames(pak_annot) <- pak_annot[["locus_tag"]]
pak_expt <- create_expt("sample_sheets/all_samples_modified.xlsx", file_column = "hisatcounttablepaeruginosapak")
## Reading the sample metadata.
## Did not find the condition column in the sample sheet.
## Filling it in as undefined.
## Did not find the batch column in the sample sheet.
## Filling it in as undefined.
## The sample definitions comprises: 12 rows(samples) and 66 columns(metadata fields).
## Matched 5871 annotations and counts.
## Bringing together the count matrix and gene information.
## Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'subset_expt' for signature '"ExpressionSet"'
pak_write <- write_expt(pak_expt, excel = "excel/pak_strains.xlsx")
## Deleting the file excel/pak_strains.xlsx before writing the tables.
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'exprs': object 'pak_expt' not found
pa14_variants <- pData(pa14_expt)[["variantspenetrancefilepaeruginosapa14"]]
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': object 'pa14_expt' not found
names(pa14_variants) <- rownames(pData(pa14_expt))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': error in evaluating the argument 'object' in selecting a method for function 'pData': object 'pa14_expt' not found
start <- init_xlsx(excel = "excel/pa14_variants.xlsx")
## Deleting the file excel/pa14_variants.xlsx before writing the tables.
wb <- start[["wb"]]
for (s in seq_len(length(pa14_variants))) {
  sample_name <- names(pa14_variants)[[s]]
  if (pa14_variants[[s]] == "") {
    next
  }
  sample_data <- readr::read_tsv(pa14_variants[[s]])
  if (nrow(sample_data) == 0) {
    next
  }
  written <- write_xlsx(data = sample_data, sheet = sample_name, wb = wb)
}
## Error in eval(expr, envir, enclos): object 'pa14_variants' not found
saved <- openxlsx::saveWorkbook(written[["workbook"]], file = "excel/pa14_variants.xlsx")
## Error in "Workbook" %in% class(wb): object 'written' not found
pak_variants <- pData(pak_expt)[["variantspenetrancefilepaeruginosapak"]]
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': object 'pak_expt' not found
names(pak_variants) <- rownames(pData(pak_expt))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': error in evaluating the argument 'object' in selecting a method for function 'pData': object 'pak_expt' not found
start <- init_xlsx(excel = "excel/pak_variants.xlsx")
## Deleting the file excel/pak_variants.xlsx before writing the tables.
wb <- start[["wb"]]
for (s in seq_len(length(pak_variants))) {
  sample_name <- names(pak_variants)[[s]]
  if (pak_variants[[s]] == "") {
    next
  }
  sample_data <- readr::read_tsv(pak_variants[[s]])
  if (nrow(sample_data) == 0) {
    next
  }
  written <- write_xlsx(data = sample_data, sheet = sample_name, wb = wb)
}
## Error in eval(expr, envir, enclos): object 'pak_variants' not found
saved <- openxlsx::saveWorkbook(written[["workbook"]], file = "excel/pak_variants.xlsx")
## Error in "Workbook" %in% class(wb): object 'written' not found
In the following block we will instead write out the nt/aa mutations of the CDS/proteins.
pa14_mutations <- pData(pa14_expt)[["variantsbygenefilepaeruginosapa14"]]
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': object 'pa14_expt' not found
names(pa14_mutations) <- rownames(pData(pa14_expt))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': error in evaluating the argument 'object' in selecting a method for function 'pData': object 'pa14_expt' not found
start <- init_xlsx(excel = "excel/pa14_mutations.xlsx")
## Deleting the file excel/pa14_mutations.xlsx before writing the tables.
wb <- start[["wb"]]
for (s in seq_len(length(pa14_mutations))) {
  sample_name <- names(pa14_mutations)[[s]]
  if (pa14_mutations[[s]] == "") {
    next
  }
  sample_data <- readr::read_tsv(pa14_mutations[[s]])
  if (nrow(sample_data) == 0) {
    next
  }
  written <- write_xlsx(data = sample_data, sheet = sample_name, wb = wb)
}
## Error in eval(expr, envir, enclos): object 'pa14_mutations' not found
saved <- openxlsx::saveWorkbook(written[["workbook"]], file = "excel/pa14_mutations.xlsx")
## Error in "Workbook" %in% class(wb): object 'written' not found
pak_mutations <- pData(pak_expt)[["variantsbygenefilepaeruginosapak"]]
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'pData': object 'pak_expt' not found
names(pak_mutations) <- rownames(pData(pak_expt))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': error in evaluating the argument 'object' in selecting a method for function 'pData': object 'pak_expt' not found
start <- init_xlsx(excel = "excel/pak_mutations.xlsx")
## Deleting the file excel/pak_mutations.xlsx before writing the tables.
wb <- start[["wb"]]
for (s in seq_len(length(pak_mutations))) {
  sample_name <- names(pak_mutations)[[s]]
  if (pak_mutations[[s]] == "") {
    next
  }
  sample_data <- readr::read_tsv(pak_mutations[[s]])
  if (nrow(sample_data) == 0) {
    next
  }
  written <- write_xlsx(data = sample_data, sheet = sample_name, wb = wb)
}
## Error in eval(expr, envir, enclos): object 'pak_mutations' not found
saved <- openxlsx::saveWorkbook(written[["workbook"]], file = "excel/pak_mutations.xlsx")
## Error in "Workbook" %in% class(wb): object 'written' not found
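The loops above skip any sample whose table has no rows. To preview which samples will actually get a worksheet, the per-sample tables can be tallied from the shell first. This is a rough sketch which assumes each variants_by_gene.txt.xz file is a text table whose first line is a header; the function name count_data_rows is hypothetical.

```shell
# Hypothetical helper: count the non-empty data rows of a table read from
# stdin, assuming the first line is a header.
count_data_rows() {
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}

# Example loop over the PAK-mapped samples, using the paths from the sample sheet:
# for f in preprocessing/*/outputs/50freebayes_paeruginosa_pak/variants_by_gene.txt.xz; do
#   printf '%s\t%s\n' "${f}" "$(xz -dc "${f}" | count_data_rows)"
# done
```

A count of 0 corresponds to the nrow(sample_data) == 0 case, which the R loop skips with next.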
pander::pander(sessionInfo())
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C
attached base packages: stats4, stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: hpgltools(v.1.0), testthat(v.3.1.7), reticulate(v.1.28), SummarizedExperiment(v.1.28.0), GenomicRanges(v.1.50.2), GenomeInfoDb(v.1.34.9), IRanges(v.2.32.0), S4Vectors(v.0.36.2), MatrixGenerics(v.1.10.0), matrixStats(v.0.63.0), Biobase(v.2.58.0) and BiocGenerics(v.0.44.0)
loaded via a namespace (and not attached): rappdirs(v.0.3.3), rtracklayer(v.1.58.0), tidyr(v.1.3.0), ggplot2(v.3.4.2), clusterGeneration(v.1.3.7), bit64(v.4.0.5), knitr(v.1.42), DelayedArray(v.0.24.0), data.table(v.1.14.8), KEGGREST(v.1.38.0), RCurl(v.1.98-1.12), doParallel(v.1.0.17), generics(v.0.1.3), GenomicFeatures(v.1.50.4), callr(v.3.7.3), RhpcBLASctl(v.0.23-42), cowplot(v.1.1.1), usethis(v.2.1.6), RSQLite(v.2.3.1), shadowtext(v.0.1.2), bit(v.4.0.5), enrichplot(v.1.18.3), xml2(v.1.3.3), httpuv(v.1.6.9), viridis(v.0.6.2), xfun(v.0.38), hms(v.1.1.3), jquerylib(v.0.1.4), evaluate(v.0.20), promises(v.1.2.0.1), fansi(v.1.0.4), restfulr(v.0.0.15), progress(v.1.2.2), caTools(v.1.18.2), dbplyr(v.2.3.2), igraph(v.1.4.1), DBI(v.1.1.3), htmlwidgets(v.1.6.2), purrr(v.1.0.1), ellipsis(v.0.3.2), dplyr(v.1.1.1), backports(v.1.4.1), annotate(v.1.76.0), aod(v.1.3.2), biomaRt(v.2.54.1), vctrs(v.0.6.1), remotes(v.2.4.2), cachem(v.1.0.7), withr(v.2.5.0), ggforce(v.0.4.1), HDO.db(v.0.99.1), GenomicAlignments(v.1.34.1), treeio(v.1.22.0), prettyunits(v.1.1.1), kmer(v.1.1.2), DOSE(v.3.24.2), ape(v.5.7-1), lazyeval(v.0.2.2), crayon(v.1.5.2), genefilter(v.1.80.3), edgeR(v.3.40.2), pkgconfig(v.2.0.3), tweenr(v.2.0.2), nlme(v.3.1-162), pkgload(v.1.3.2), devtools(v.2.4.5), rlang(v.1.1.0), lifecycle(v.1.0.3), miniUI(v.0.1.1.1), downloader(v.0.4), filelock(v.1.0.2), BiocFileCache(v.2.6.1), rprojroot(v.2.0.3), polyclip(v.1.10-4), graph(v.1.76.0), Matrix(v.1.5-4), aplot(v.0.1.10), boot(v.1.3-28.1), processx(v.3.8.0), png(v.0.1-8), viridisLite(v.0.4.1), rjson(v.0.2.21), bitops(v.1.0-7), gson(v.0.1.0), KernSmooth(v.2.23-20), pander(v.0.6.5), Biostrings(v.2.66.0), blob(v.1.2.4), phylogram(v.2.1.0), stringr(v.1.5.0), qvalue(v.2.30.0), remaCor(v.0.0.11), gridGraphics(v.0.5-1), scales(v.1.2.1), memoise(v.2.0.1), GSEABase(v.1.60.0), magrittr(v.2.0.3), plyr(v.1.8.8), gplots(v.3.1.3), zlibbioc(v.1.44.0), compiler(v.4.2.0), scatterpie(v.0.1.8), BiocIO(v.1.8.0), RColorBrewer(v.1.1-3), lme4(v.1.1-32), 
Rsamtools(v.2.14.0), cli(v.3.6.1), XVector(v.0.38.0), urlchecker(v.1.0.1), patchwork(v.1.1.2), ps(v.1.7.4), MASS(v.7.3-58.3), mgcv(v.1.8-41), tidyselect(v.1.2.0), stringi(v.1.7.12), highr(v.0.10), yaml(v.2.3.7), GOSemSim(v.2.24.0), locfit(v.1.5-9.7), ggrepel(v.0.9.3), grid(v.4.2.0), sass(v.0.4.5), fastmatch(v.1.1-3), tools(v.4.2.0), parallel(v.4.2.0), rstudioapi(v.0.14), foreach(v.1.5.2), gridExtra(v.2.3), farver(v.2.1.1), ggraph(v.2.1.0), digest(v.0.6.31), shiny(v.1.7.4), Rcpp(v.1.0.10), broom(v.1.0.4), later(v.1.3.0), httr(v.1.4.5), AnnotationDbi(v.1.60.2), Rdpack(v.2.4), colorspace(v.2.1-0), brio(v.1.1.3), XML(v.3.99-0.14), fs(v.1.6.1), splines(v.4.2.0), yulab.utils(v.0.0.6), PROPER(v.1.30.0), tidytree(v.0.4.2), graphlayouts(v.0.8.4), ggplotify(v.0.1.0), plotly(v.4.10.1), sessioninfo(v.1.2.2), xtable(v.1.8-4), jsonlite(v.1.8.4), nloptr(v.2.0.3), ggtree(v.3.6.2), tidygraph(v.1.2.3), ggfun(v.0.0.9), R6(v.2.5.1), RUnit(v.0.4.32), profvis(v.0.3.7), pillar(v.1.9.0), htmltools(v.0.5.5), mime(v.0.12), glue(v.1.6.2), fastmap(v.1.1.1), minqa(v.1.2.5), clusterProfiler(v.4.6.2), BiocParallel(v.1.32.6), codetools(v.0.2-19), fgsea(v.1.24.0), pkgbuild(v.1.4.0), mvtnorm(v.1.1-3), utf8(v.1.2.3), lattice(v.0.20-45), bslib(v.0.4.2), tibble(v.3.2.1), sva(v.3.46.0), pbkrtest(v.0.5.2), curl(v.5.0.0), gtools(v.3.9.4), zip(v.2.2.2), GO.db(v.3.16.0), openxlsx(v.4.2.5.2), survival(v.3.5-5), limma(v.3.54.2), rmarkdown(v.2.21), desc(v.1.4.2), munsell(v.0.5.0), GenomeInfoDbData(v.1.2.9), iterators(v.1.0.14), variancePartition(v.1.28.9), reshape2(v.1.4.4), gtable(v.0.3.3) and rbibutils(v.2.2.13)
message("This is hpgltools commit: ", get_git_commit())
## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset bbc75e24b763faa635cb62c86fc51c0efb3424a1
## This is hpgltools commit: Fri Apr 21 14:33:16 2023 -0400: bbc75e24b763faa635cb62c86fc51c0efb3424a1
this_save <- paste0(gsub(pattern = "\\.Rmd", replace = "", x = rmd_file), "-v", ver, ".rda.xz")
message("Saving to ", this_save)
## Saving to index-v20230501.rda.xz
tmp <- sm(saveme(filename = this_save))