Preprocessing some Clinical Samples

1. Introduction

This document revisits my SL processing tasks, and this round introduces a few new aspects. Most importantly, I created a series of variables to describe the species we are comparing against. When I performed PCA of the samples, we observed close clustering with the two primary zymodemes, z2.2 and z2.3, along with a group of samples which clustered with neither and are instead most similar to an intermediate strain. In the TMRC2 samples that strain was called 13794, so I named it z13794. We are therefore now using 4 references: the MHOM/COL/81L13 reference and three modified versions of it: z22, z23, and z13794.

This document handles the tasks performed when preprocessing clinical samples. The following block sets the environment variables which will be used throughout. Note that I recently explicitly set a zymodeme 2.1 reference using some of these samples.

export LP_SP=lpanamensis_v68
export HS_SP=hg38_115
export LP_TYPE=protein_coding_gene
export HS_TYPE=gene
export LP_TAG=ID
export HS_TAG=ID
export LP_22=lpanamensis_z22
export LP_23=lpanamensis_z23
export LP_21=lpanamensis_z21
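
Because the loop below depends on every one of these variables, it may be worth confirming they are all exported before starting. The following is a minimal sketch and not part of the original workflow; the function name `check_vars` is hypothetical.

```shell
# Hypothetical helper: succeed only if every named variable is exported and non-empty.
# The body runs in a subshell, so its working variables do not leak into the caller.
check_vars() (
    missing=0
    for name in "$@"; do
        # printenv prints the value of an exported variable (nothing if it is unset).
        if [ -z "$(printenv "${name}")" ]; then
            echo "Missing environment variable: ${name}" >&2
            missing=1
        fi
    done
    [ "${missing}" -eq 0 ]
)
```

One possible use is `check_vars LP_SP HS_SP LP_TYPE HS_TYPE LP_TAG HS_TAG LP_22 LP_23 LP_21 || echo "Set the variables first."`, run once before entering the sample loop.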

Now let us move into the tree and check that everything is in place.

cd preprocessing
start=$(pwd)
ls | head
ls -ltr ~/libraries/genome/hg38*.fasta
ls -ltr ~/libraries/genome/lpanamensis*.fasta

For the moment at least, the following loop is the primary driver for running the various tools.

prefix=202510
cd preprocessing
start=$(pwd)
module add cyoa
samples=$(/bin/ls -d PRCS*)
for s in ${samples}; do
    cd ${start}/${s}
    if [[ -d outputs/ ]]; then
        rm -r outputs/ scripts/
    fi
    if [[ -z "${LP_SP}" ]]; then
        echo "You forgot to set the environment variables."
        sleep 500
    fi
    ## Quote the pattern so compgen, not the shell, expands it.
    found=$(compgen -G "*.fastq.gz")
    if [[ -n "${found}" ]]; then
        mkdir -p unprocessed && mv *.fastq.gz unprocessed/
    fi
    cyoa --method prnaseq --species ${LP_SP} \
         --gff_type ${LP_TYPE} --gff_tag ${LP_TAG} \
         --introns 0 --stranded reverse --jprefix $prefix \
         --input $(/bin/ls unprocessed/*.fastq.gz | tr '\n' ':' | sed 's/:$//g')
    trimmed=$(/bin/ls outputs/01trimomatic/*-trimmed.fastq* | tr '\n' ':' | sed 's/:$//g')
    cyoa --method dantools --introns 0 --species $LP_SP --gff_cds_parent_type $LP_TYPE --gff_tag $LP_TAG \
         --input $trimmed --jprefix $prefix
    bam=$(/bin/ls  outputs/03hisat_lpanamensis_v68/*-paired.bam)
    cyoa --method freebayes --introns 0 --species $LP_SP --gff_type $LP_TYPE --gff_tag $LP_TAG \
         --input $bam --jprefix $prefix
    cyoa --method freebayes --introns 0 --species $LP_22 --gff_type $LP_TYPE --gff_tag $LP_TAG \
         --input $bam --jprefix $prefix
    cyoa --method freebayes --introns 0 --species $LP_23 --gff_type $LP_TYPE --gff_tag $LP_TAG \
         --input $bam --jprefix $prefix
    cyoa --method freebayes --introns 0 --species $LP_21 --gff_type $LP_TYPE --gff_tag $LP_TAG \
         --input $bam --jprefix $prefix
    bcf_ref=$(/bin/ls outputs/${prefix}freebayes_${LP_SP}/${LP_SP}.bcf)
    ## The AB tag is the allele balance: the fraction of reads showing the
    ## reference allele among all reads at heterozygous sites.
    cyoa --method parsebcf --introns 0 --species $LP_SP --gff_type $LP_TYPE --gff_tag $LP_TAG \
         --input $bcf_ref --jprefix $prefix --max_value 0.8 --min_value 0 --chosen_tag AB
    cyoa --method parsebcf --introns 0 --species $LP_SP --gff_type $LP_TYPE --gff_tag $LP_TAG \
         --input $bcf_ref --jprefix $prefix --max_value 0.2 --min_value 0 --chosen_tag AB
done
cd $start
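
The loop above builds the colon-separated --input list by piping ls through tr and sed. As an alternative sketch under my own assumptions (the function name `join_colon` is hypothetical and is not part of cyoa), the same string can be produced from a glob without parsing ls output:

```shell
# Hypothetical helper: join its arguments with ':' and no trailing separator.
# The body runs in a subshell so the IFS change does not affect the caller.
join_colon() (
    IFS=':'
    # "$*" joins the positional parameters with the first character of IFS.
    printf '%s\n' "$*"
)
```

For example, `join_colon unprocessed/*.fastq.gz` would print the matching files separated by colons, which could then be passed to --input. Note that if the glob matches nothing, the shell passes the pattern through literally, so the check on the unprocessed files is still needed.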

Author: Ashton Belew

Created: 2025-11-03 Mon 11:44
