Preprocessing some Clinical Samples
1. Introduction
This document revisits my SL processing tasks and introduces a few new aspects. One important change is that I created a series of variables to describe the species we are comparing against. When I performed PCA of the samples, we observed close clustering with the two primary zymodemes, z2.2 and z2.3, along with a group of samples which clustered with neither and are instead most similar to an intermediate strain. In the TMRC2 samples that strain was called 13794, so I named it z13794. We are therefore now using four references: the MHOM/COL/81L13 reference and three modified versions of it: z22, z23, and z13794.
This document handles the tasks performed when preprocessing clinical samples. The following block sets the environment variables which will be used throughout. Note that I recently set an explicit zymodeme 2.1 reference using some of these samples.
    export LP_SP=lpanamensis_v68
    export HS_SP=hg38_115
    export LP_TYPE=protein_coding_gene
    export HS_TYPE=gene
    export LP_TAG=ID
    export HS_TAG=ID
    export LP_22=lpanamensis_z22
    export LP_23=lpanamensis_z23
    export LP_21=lpanamensis_z21
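Because every later step depends on these variables, it may help to confirm they are all set before launching anything. The following is a minimal sketch, not part of the original workflow: `check_vars` is a hypothetical helper name, and it relies on bash indirect expansion to look up each variable by name.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of cyoa): report every named variable
# which is unset or empty, and return non-zero if any were missing.
check_vars() {
  local missing=0 name
  for name in "$@"; do
    # ${!name} expands to the value of the variable whose name is in $name.
    if [[ -z "${!name:-}" ]]; then
      echo "Missing variable: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, `check_vars LP_SP HS_SP LP_TYPE HS_TYPE LP_TAG HS_TAG LP_22 LP_23 LP_21 || exit 1` near the top of a script would stop before any sample directory is touched.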
Now let us move into the tree and check that everything is in place.
    cd preprocessing
    start=$(pwd)
    ls | head
    ls -ltr ~/libraries/genome/hg38*.fasta
    ls -ltr ~/libraries/genome/lpanamensis*.fasta
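The `ls` calls above are checked by eye. A scripted version of the same check might look like the sketch below. It assumes that each reference lives at `<genome_dir>/<species>.fasta`, which is only inferred from the glob patterns used here, and `check_references` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm that <genome_dir>/<species>.fasta exists and
# is non-empty for each species given. The file naming is an assumption.
check_references() {
  local genome_dir="$1" species status=0
  shift
  for species in "$@"; do
    if [[ ! -s "${genome_dir}/${species}.fasta" ]]; then
      echo "No reference found: ${genome_dir}/${species}.fasta" >&2
      status=1
    fi
  done
  return "${status}"
}
```

A call such as `check_references ~/libraries/genome "${LP_SP}" "${HS_SP}" "${LP_22}" "${LP_23}" "${LP_21}"` would then report any reference which is missing before the mapping jobs are submitted.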
For the moment at least, the following loop is the primary driver for running the various tools.
    prefix=202510
    cd preprocessing
    start=$(pwd)
    module add cyoa
    samples=$(/bin/ls -d PRCS*)
    for s in ${samples}; do
      cd ${start}/${s}
      ## Remove the results of any previous run.
      if [[ -d outputs/ ]]; then
        rm -r outputs/ scripts/
      fi
      ## Pause long enough to interrupt the loop if the variables are missing.
      if [[ -z "${LP_SP}" ]]; then
        echo "You forgot to set the environment variables."
        sleep 500
      fi
      ## Move any raw reads left in the sample directory into unprocessed/.
      ## The glob is quoted so that compgen performs the match itself.
      found=$(compgen -G "*.fastq.gz")
      if [[ -n "${found}" ]]; then
        mkdir -p unprocessed && mv *.fastq.gz unprocessed/
      fi
      ## Trim and map against the primary reference.
      cyoa --method prnaseq --species ${LP_SP} \
        --gff_type ${LP_TYPE} --gff_tag ${LP_TAG} \
        --introns 0 --stranded reverse --jprefix $prefix \
        --input $(/bin/ls unprocessed/*.fastq.gz | tr '\n' ':' | sed 's/:$//g')
      ## Strip the trailing colon here as well, as is done for the raw reads.
      trimmed=$(/bin/ls outputs/01trimomatic/*-trimmed.fastq* | tr '\n' ':' | sed 's/:$//g')
      cyoa --method dantools --introns 0 --species $LP_SP --gff_cds_parent_type $LP_TYPE --gff_tag $LP_TAG \
        --input $trimmed --jprefix $prefix
      ## Call variants against the primary reference and each zymodeme reference.
      bam=$(/bin/ls outputs/03hisat_lpanamensis_v68/*-paired.bam)
      cyoa --method freebayes --introns 0 --species $LP_SP --gff_type $LP_TYPE --gff_tag $LP_TAG \
        --input $bam --jprefix $prefix
      cyoa --method freebayes --introns 0 --species $LP_22 --gff_type $LP_TYPE --gff_tag $LP_TAG \
        --input $bam --jprefix $prefix
      cyoa --method freebayes --introns 0 --species $LP_23 --gff_type $LP_TYPE --gff_tag $LP_TAG \
        --input $bam --jprefix $prefix
      cyoa --method freebayes --introns 0 --species $LP_21 --gff_type $LP_TYPE --gff_tag $LP_TAG \
        --input $bam --jprefix $prefix
      bcf_ref=$(/bin/ls outputs/${prefix}freebayes_${LP_SP}/${LP_SP}.bcf)
      ## The AB flag is the ratio of reference/new observed
      ## Parse the same bcf file at three upper limits for AB: 1.0, 0.8, and 0.2.
      cyoa --method parsebcf --introns 0 --species $LP_SP --gff_type $LP_TYPE --gff_tag $LP_TAG \
        --input $bcf_ref --jprefix $prefix --max_value 1.0 --min_value 0 --chosen_tag AB
      cyoa --method parsebcf --introns 0 --species $LP_SP --gff_type $LP_TYPE --gff_tag $LP_TAG \
        --input $bcf_ref --jprefix $prefix --max_value 0.8 --min_value 0 --chosen_tag AB
      cyoa --method parsebcf --introns 0 --species $LP_SP --gff_type $LP_TYPE --gff_tag $LP_TAG \
        --input $bcf_ref --jprefix $prefix --max_value 0.2 --min_value 0 --chosen_tag AB
    done
    cd $start
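The driver loop builds its colon-separated `--input` values by piping `ls` through `tr` and `sed`. A possible alternative, shown here only as a sketch, joins the arguments inside the shell by setting `IFS` locally. `join_colon` is a hypothetical helper name and is not something provided by cyoa.

```bash
#!/usr/bin/env bash
# Hypothetical helper: join all arguments with ':' and no trailing separator.
# "$*" joins the positional parameters using the first character of IFS.
join_colon() {
  local IFS=':'
  echo "$*"
}
```

With this in place, `--input "$(join_colon unprocessed/*.fastq.gz)"` would produce the same kind of list without parsing the output of `ls`, and filenames containing spaces would stay intact.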
This is a citation in the PNAS style.(1)