M.tuberculosis 20180913: Preprocessing DIA data with two libraries.

OpenSwathWorkflow against the Tuberculist transitions

The following invocations of OpenMS/pyprophet/tric are included in:

dia_invocation_20180913.sh

This script covers both the generation of a comet-based transition library from locally acquired DDA samples and the steps used to process the downloaded transition libraries.

Gathering parameters

In this first block, I will set a couple of variables and source a file containing the parameters for the rest of the script.
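The block itself is not shown in this report. As a rough sketch only, sourcing and sanity-checking a parameter file could look like the following. The function name `load_parameters` and whatever file path is passed to it are hypothetical; the variable names checked are the ones used later in this document's OpenSwathWorkflow loop.

```shell
#!/usr/bin/env bash
# Sketch only: load_parameters and the parameter file path are hypothetical.
# The variable names checked here are those used by the loop later in this document.
load_parameters() {
    local param_file="$1"
    if [[ ! -r "${param_file}" ]]; then
        echo "Cannot read parameter file: ${param_file}" >&2
        return 1
    fi
    # Plain assignments in the sourced file become global shell variables.
    source "${param_file}"
    local required=(VERSION TYPE DDA_METHOD MZ_WINDOWS
                    TUBERCULIST_PQP TUBERCULIST_OUTDIR PYPROPHET_OUTDIR)
    local var
    for var in "${required[@]}"; do
        # ${!var} is bash indirect expansion: the value of the variable named by var.
        if [[ -z "${!var:-}" ]]; then
            echo "Parameter ${var} is not set in ${param_file}" >&2
            return 1
        fi
    done
    return 0
}
```

Failing early on a missing parameter avoids discovering an empty `${TUBERCULIST_PQP}` or `${VERSION}` only after the first sample has been processed.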

Plotting metrics of the raw data

The raw data files provide an opportunity to check that the later invocations of OpenMS, pyprophet, and TRIC are likely to work; for example, if too few transitions are observed here, one should not be surprised if pyprophet and TRIC fail later.
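The plots in this section are produced in R. As a coarser check that can be run from the shell before any plotting, one could count the MS2 scans recorded in each mzXML file. This is a sketch under assumptions: `count_ms2_scans` and `report_ms2_scans` are hypothetical helpers, and the count of `msLevel="2"` attributes is only a proxy for how much fragment data a run contains.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical and not part of the invocation script.
# Count occurrences of msLevel="2" in an mzXML file as a proxy for the number of MS2 scans.
count_ms2_scans() {
    local mzxml="$1"
    grep -o 'msLevel="2"' "${mzxml}" | wc -l | tr -d '[:space:]'
}

# Print one "file<TAB>count" line per mzXML file in a directory.
report_ms2_scans() {
    local dir="$1"
    local f
    for f in "${dir}"/*.mzXML; do
        [[ -e "${f}" ]] || continue   # the glob matched nothing
        printf '%s\t%s\n' "$(basename "${f}")" "$(count_ms2_scans "${f}")"
    done
}
```

It could be called as `report_ms2_scans "results/01mzXML/dia/${VERSION}"`; a run whose count is far below the others is a candidate for closer inspection before it is passed to OpenSwathWorkflow.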

sampleid | TubeID | tubelabel | FigureReplicate | Figure.Name | Sample.Description | Bio.Replicate | LC.Run.(tentative.certainly.wrong.until.I.bug.Yan) | MS.Run.(tentative.wrong.until.I.bug.Yan) | Technical.Replicate | Replicate.State | Run | expt_id | Genotype | Collection.Type | Condition | batch | windowsize | enzyme | harvestdate | prepdate | rundate | runinfo | rawfile | Filename | dia_scored | tuberculist_scored | include_exclude | Run_note
2018_0817BrikenTrypsinDIA01 | 20180404.07 | 4.4.18 07-7 | 01 | delta_filtrate_01 | H37Rv ΔEsx-5A; Culture Filtrate | br01 | 20180828 | 01 | tr06 | br01_tr06 | 2018_0817BrikenTrypsinDIA01 | aug2018 | delta | filtrate | delta_filtrate | aug2018 | 8 | Trypsin | 20171019 | 20180828 | 20180717 | 8 m/z SWATH windows | results/00raw/20180828/2018_0817BrikenTrypsinDIA01.raw | results/01mzXML/dia/20180817/2018_0817BrikenTrypsinDIA01.mzXML | results/08pyprophet/20180817/whole_8mz/2018_0817BrikenTrypsinDIA01_vs_20180817_whole_HCD_dia_scored.tsv | results/08pyprophet/20180817/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA01_vs_20180817_whole_HCD_dia_scored.tsv | NA | NA
2018_0817BrikenTrypsinDIA02 | 20180404.08 | 4.4.18 07-8 | 02 | delta_filtrate_02 | H37Rv ΔEsx-5A; Culture Filtrate | br02 | 20180828 | 01 | tr06 | br02_tr06 | 2018_0817BrikenTrypsinDIA02 | aug2018 | delta | filtrate | delta_filtrate | aug2018 | 8 | Trypsin | 20171019 | 20180828 | 20180717 | 8 m/z SWATH windows | results/00raw/20180828/2018_0817BrikenTrypsinDIA02.raw | results/01mzXML/dia/20180817/2018_0817BrikenTrypsinDIA02.mzXML | results/08pyprophet/20180817/whole_8mz/2018_0817BrikenTrypsinDIA02_vs_20180817_whole_HCD_dia_scored.tsv | results/08pyprophet/20180817/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA02_vs_20180817_whole_HCD_dia_scored.tsv | NA | NA
2018_0817BrikenTrypsinDIA03 | 20180404.09 | 4.4.18 07-9 | 03 | delta_filtrate_03 | H37Rv ΔEsx-5A; Culture Filtrate | br03 | 20180828 | 01 | tr06 | br03_tr06 | 2018_0817BrikenTrypsinDIA03 | aug2018 | delta | filtrate | delta_filtrate | aug2018 | 8 | Trypsin | 20171019 | 20180828 | 20180717 | 8 m/z SWATH windows | results/00raw/20180828/2018_0817BrikenTrypsinDIA03.raw | results/01mzXML/dia/20180817/2018_0817BrikenTrypsinDIA03.mzXML | NA | results/08pyprophet/20180817/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA03_vs_20180817_whole_HCD_dia_scored.tsv | NA | NA
2018_0817BrikenTrypsinDIA07 | 20180215.04 | 2.15.18-4 | 01 | wt_filtrate_01 | WT H37Rv Culture Filtrate | br07 | 20180828 | 01 | tr06 | br07_tr06 | 2018_0817BrikenTrypsinDIA07 | aug2018 | wt | filtrate | wt_filtrate | aug2018 | 8 | Trypsin | 20171019 | 20180828 | 20180717 | 8 m/z SWATH windows | results/00raw/20180828/2018_0817BrikenTrypsinDIA07.raw | results/01mzXML/dia/20180817/2018_0817BrikenTrypsinDIA07.mzXML | results/08pyprophet/20180817/whole_8mz/2018_0817BrikenTrypsinDIA07_vs_20180817_whole_HCD_dia_scored.tsv | results/08pyprophet/20180817/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA07_vs_20180817_whole_HCD_dia_scored.tsv | NA | NA
2018_0817BrikenTrypsinDIA08 | 20180215.05 | 2.15.18-5 | 02 | wt_filtrate_02 | WT H37Rv Culture Filtrate | br08 | 20180828 | 01 | tr06 | br08_tr06 | 2018_0817BrikenTrypsinDIA08 | aug2018 | wt | filtrate | wt_filtrate | aug2018 | 8 | Trypsin | 20171019 | 20180828 | 20180717 | 8 m/z SWATH windows | results/00raw/20180828/2018_0817BrikenTrypsinDIA08.raw | results/01mzXML/dia/20180817/2018_0817BrikenTrypsinDIA08.mzXML | results/08pyprophet/20180817/whole_8mz/2018_0817BrikenTrypsinDIA08_vs_20180817_whole_HCD_dia_scored.tsv | results/08pyprophet/20180817/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA08_vs_20180817_whole_HCD_dia_scored.tsv | NA | NA
2018_0817BrikenTrypsinDIA09 | 20180215.06 | 2.15.18-6 | 03 | wt_filtrate_03 | WT H37Rv Culture Filtrate | br09 | 20180828 | 01 | tr06 | br09_tr06 | 2018_0817BrikenTrypsinDIA09 | aug2018 | wt | filtrate | wt_filtrate | aug2018 | 8 | Trypsin | 20171019 | 20180828 | 20180717 | 8 m/z SWATH windows | results/00raw/20180828/2018_0817BrikenTrypsinDIA09.raw | results/01mzXML/dia/20180817/2018_0817BrikenTrypsinDIA09.mzXML | results/08pyprophet/20180817/whole_8mz/2018_0817BrikenTrypsinDIA09_vs_20180817_whole_HCD_dia_scored.tsv | results/08pyprophet/20180817/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA09_vs_20180817_whole_HCD_dia_scored.tsv | NA | NA
2018_0817BrikenTrypsinDIA11 | 20180404.01 | 4.4.18 07-1 | 01 | delta_whole_01 | H37Rv ΔEsx-5A; Whole Cell Lysate | br10 | 20180828 | 01 | tr06 | br10_tr06 | 2018_0817BrikenTrypsinDIA11 | aug2018 | delta | whole | delta_whole | aug2018 | 8 | Trypsin | 20171019 | 20180828 | 20180717 | 8 m/z SWATH windows | results/00raw/20180828/2018_0817BrikenTrypsinDIA11.raw | results/01mzXML/dia/20180817/2018_0817BrikenTrypsinDIA11.mzXML | results/08pyprophet/20180817/whole_8mz/2018_0817BrikenTrypsinDIA11_vs_20180817_whole_HCD_dia_scored.tsv | results/08pyprophet/20180817/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA11_vs_20180817_whole_HCD_dia_scored.tsv | NA | NA
2018_0817BrikenTrypsinDIA12 | 20180404.02 | 4.4.18 07-2 | 02 | delta_whole_02 | H37Rv ΔEsx-5A; Whole Cell Lysate | br11 | 20180828 | 01 | tr06 | br11_tr06 | 2018_0817BrikenTrypsinDIA12 | aug2018 | delta | whole | delta_whole | aug2018 | 8 | Trypsin | 20171019 | 20180828 | 20180717 | 8 m/z SWATH windows | results/00raw/20180828/2018_0817BrikenTrypsinDIA12.raw | results/01mzXML/dia/20180817/2018_0817BrikenTrypsinDIA12.mzXML | results/08pyprophet/20180817/whole_8mz/2018_0817BrikenTrypsinDIA12_vs_20180817_whole_HCD_dia_scored.tsv | results/08pyprophet/20180817/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA12_vs_20180817_whole_HCD_dia_scored.tsv | NA | NA
2018_0817BrikenTrypsinDIA13 | 20180404.03 | 4.4.18 07-3 | 03 | delta_whole_03 | H37Rv ΔEsx-5A; Whole Cell Lysate | br12 | 20180828 | 01 | tr06 | br12_tr06 | 2018_0817BrikenTrypsinDIA13 | aug2018 | delta | whole | delta_whole | aug2018 | 8 | Trypsin | 20171019 | 20180828 | 20180717 | 8 m/z SWATH windows | results/00raw/20180828/2018_0817BrikenTrypsinDIA13.raw | results/01mzXML/dia/20180817/2018_0817BrikenTrypsinDIA13.mzXML | results/08pyprophet/20180817/whole_8mz/2018_0817BrikenTrypsinDIA13_vs_20180817_whole_HCD_dia_scored.tsv | results/08pyprophet/20180817/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA13_vs_20180817_whole_HCD_dia_scored.tsv | NA | NA
2018_0817BrikenTrypsinDIA17 | 20180215.01 | 2.15.18-1 | 01 | wt_whole_01 | WT H37Rv Whole Cell Lysate | br16 | 20180828 | 01 | tr06 | br16_tr06 | 2018_0817BrikenTrypsinDIA17 | aug2018 | wt | whole | wt_whole | aug2018 | 8 | Trypsin | 20171019 | 20180828 | 20180717 | 8 m/z SWATH windows | results/00raw/20180828/2018_0817BrikenTrypsinDIA17.raw | results/01mzXML/dia/20180817/2018_0817BrikenTrypsinDIA17.mzXML | results/08pyprophet/20180817/whole_8mz/2018_0817BrikenTrypsinDIA17_vs_20180817_whole_HCD_dia_scored.tsv | results/08pyprophet/20180817/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA17_vs_20180817_whole_HCD_dia_scored.tsv | NA | NA
2018_0817BrikenTrypsinDIA18 | 20180215.02 | 2.15.18-2 | 02 | wt_whole_02 | WT H37Rv Whole Cell Lysate | br17 | 20180828 | 01 | tr06 | br17_tr06 | 2018_0817BrikenTrypsinDIA18 | aug2018 | wt | whole | wt_whole | aug2018 | 8 | Trypsin | 20171019 | 20180828 | 20180717 | 8 m/z SWATH windows | results/00raw/20180828/2018_0817BrikenTrypsinDIA18.raw | results/01mzXML/dia/20180817/2018_0817BrikenTrypsinDIA18.mzXML | results/08pyprophet/20180817/whole_8mz/2018_0817BrikenTrypsinDIA18_vs_20180817_whole_HCD_dia_scored.tsv | results/08pyprophet/20180817/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA18_vs_20180817_whole_HCD_dia_scored.tsv | NA | NA
2018_0817BrikenTrypsinDIA19 | 20180215.03 | 2.15.18-3 | 03 | wt_whole_03 | WT H37Rv Whole Cell Lysate | br18 | 20180828 | 01 | tr06 | br18_tr06 | 2018_0817BrikenTrypsinDIA19 | aug2018 | wt | whole | wt_whole | aug2018 | 8 | Trypsin | 20171019 | 20180828 | 20180717 | 8 m/z SWATH windows | results/00raw/20180828/2018_0817BrikenTrypsinDIA19.raw | results/01mzXML/dia/20180817/2018_0817BrikenTrypsinDIA19.mzXML | results/08pyprophet/20180817/whole_8mz/2018_0817BrikenTrypsinDIA19_vs_20180817_whole_HCD_dia_scored.tsv | results/08pyprophet/20180817/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA19_vs_20180817_whole_HCD_dia_scored.tsv | NA | NA
## Adding 2018_0817BrikenTrypsinDIA01
## Adding 2018_0817BrikenTrypsinDIA02
## Adding 2018_0817BrikenTrypsinDIA03
## Adding 2018_0817BrikenTrypsinDIA07
## Adding 2018_0817BrikenTrypsinDIA08
## Adding 2018_0817BrikenTrypsinDIA09
## Adding 2018_0817BrikenTrypsinDIA11
## Adding 2018_0817BrikenTrypsinDIA12
## Adding 2018_0817BrikenTrypsinDIA13
## Adding 2018_0817BrikenTrypsinDIA17
## Adding 2018_0817BrikenTrypsinDIA18
## Adding 2018_0817BrikenTrypsinDIA19
## This data will benefit from being displayed on the log scale.
## If this is not desired, set scale='raw'
## Some entries are 0.  We are on log scale, adding 1 to the data.
## Changed 176131 zero count features.
## Writing the image to: images/20180913_dia_mzxml_intensities-v20180913.png and calling dev.off().

## Adding 2018_0817BrikenTrypsinDIA01
## Adding 2018_0817BrikenTrypsinDIA02
## Adding 2018_0817BrikenTrypsinDIA03
## Adding 2018_0817BrikenTrypsinDIA07
## Adding 2018_0817BrikenTrypsinDIA08
## Adding 2018_0817BrikenTrypsinDIA09
## Adding 2018_0817BrikenTrypsinDIA11
## Adding 2018_0817BrikenTrypsinDIA12
## Adding 2018_0817BrikenTrypsinDIA13
## Adding 2018_0817BrikenTrypsinDIA17
## Adding 2018_0817BrikenTrypsinDIA18
## Adding 2018_0817BrikenTrypsinDIA19
## Writing the image to: images/20180913_dia_mzxml_retention-v20180913.png and calling dev.off().

## Adding 2018_0817BrikenTrypsinDIA01
## Adding 2018_0817BrikenTrypsinDIA02
## Adding 2018_0817BrikenTrypsinDIA03
## Adding 2018_0817BrikenTrypsinDIA07
## Adding 2018_0817BrikenTrypsinDIA08
## Adding 2018_0817BrikenTrypsinDIA09
## Adding 2018_0817BrikenTrypsinDIA11
## Adding 2018_0817BrikenTrypsinDIA12
## Adding 2018_0817BrikenTrypsinDIA13
## Adding 2018_0817BrikenTrypsinDIA17
## Adding 2018_0817BrikenTrypsinDIA18
## Adding 2018_0817BrikenTrypsinDIA19
## Writing the image to: images/20180913_dia_mzxml_mzbase-v20180913.png and calling dev.off().

## Adding 2018_0817BrikenTrypsinDIA01
## Adding 2018_0817BrikenTrypsinDIA02
## Adding 2018_0817BrikenTrypsinDIA03
## Adding 2018_0817BrikenTrypsinDIA07
## Adding 2018_0817BrikenTrypsinDIA08
## Adding 2018_0817BrikenTrypsinDIA09
## Adding 2018_0817BrikenTrypsinDIA11
## Adding 2018_0817BrikenTrypsinDIA12
## Adding 2018_0817BrikenTrypsinDIA13
## Adding 2018_0817BrikenTrypsinDIA17
## Adding 2018_0817BrikenTrypsinDIA18
## Adding 2018_0817BrikenTrypsinDIA19
## Writing the image to: images/20180913_dia_mzxml_scanintensity-v20180913.png and calling dev.off().

OpenSwathWorkflow invocation

Block 16 of the invocation script contains the commands used to run OpenSwathWorkflow and pyprophet. They are repeated here so that they can be tested interactively when needed.

echo "Invoking the OpenSwathWorkflow using the tuberculist transitions."
base_mzxmldir="results/01mzXML/dia/${VERSION}"
swath_inputs=$(/bin/ls "${base_mzxmldir}")
echo "Checking in, the inputs are: ${swath_inputs}"
mkdir -p "${TUBERCULIST_OUTDIR}"
pypdir="${PYPROPHET_OUTDIR}_tuberculist"
mkdir -p "${pypdir}"
for input in ${swath_inputs}
do
    in_mzxml="${base_mzxmldir}/${input}"
    name=$(basename "${input}" .mzXML)
    echo "Starting openswath run of ${name} using ${MZ_WINDOWS} windows at $(date)."
    tb_output_prefix="${TUBERCULIST_OUTDIR}/${name}_vs_${VERSION}_${TYPE}_${DDA_METHOD}_dia"
    pyprophet_output_prefix="${pypdir}/${name}_vs_${VERSION}_${TYPE}_${DDA_METHOD}_dia"
    echo "Deleting previous swath output file: ${tb_output_prefix}.osw"
    rm -f "${tb_output_prefix}.osw"
    OpenSwathWorkflow \
        -ini "parameters/openms_${VERSION}.ini" \
        -in "${in_mzxml}" \
        -swath_windows_file "windows/openswath_${name}.txt" \
        -tr "${TUBERCULIST_PQP}" \
        -out_osw "${tb_output_prefix}.osw" \
        2>"${tb_output_prefix}_osw.log" 1>&2
    if [[ "$?" -ne "0" ]]; then
        echo "OpenSwathWorkflow for ${name} failed."
    fi

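    ## The scored .osw is written under the pyprophet prefix, so that is the stale file to remove.
    rm -f "${pyprophet_output_prefix}_scored.osw"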
    echo "Scoring individual swath run: ${tb_output_prefix}"
    pyprophet \
        score \
        --level ms1 \
        --in "${tb_output_prefix}.osw" \
        --out "${pyprophet_output_prefix}_scored.osw" \
        2>>"${pyprophet_output_prefix}_pyprophet_ms1.log" 1>&2
    if [[ "$?" -ne "0" ]]; then
        echo "MS1 scoring ${pyprophet_output_prefix}_scored.osw failed."
    fi

    pyprophet \
        score \
        --level ms2 \
        --in "${pyprophet_output_prefix}_scored.osw" \
        --out "${pyprophet_output_prefix}_scored.osw" \
        2>>"${pyprophet_output_prefix}_pyprophet_ms2.log" 1>&2
    if [[ "$?" -ne "0" ]]; then
        echo "MS2 scoring ${pyprophet_output_prefix}_scored.osw failed."
    fi

    pyprophet \
        protein \
        --in "${pyprophet_output_prefix}_scored.osw" \
        --context run-specific \
        2>>"${pyprophet_output_prefix}_pyprophet_protein.log" 1>&2
    if [[ "$?" -ne "0" ]]; then
        echo "Protein scoring ${pyprophet_output_prefix}_scored.osw failed."
    fi

    rm -f "${pyprophet_output_prefix}_scored.tsv"
    echo "Exporting individual swath run to ${pyprophet_output_prefix}_scored.tsv"
    pyprophet \
        export \
        --in "${pyprophet_output_prefix}_scored.osw" \
        --out "${pyprophet_output_prefix}_scored.tsv" \
        2>>"${pyprophet_output_prefix}_pyprophet_export.log" 1>&2
    ## Check the export itself before anything else overwrites its exit status.
    if [[ "$?" -ne "0" ]]; then
        echo "Exporting ${pyprophet_output_prefix}_scored.tsv failed."
    fi
    ## Workaround: the tsv files are written in the cwd as run_filename.tsv
    ## no matter what is passed to --out, so move the result into place by hand.
    mv "${input}.tsv" "${pyprophet_output_prefix}_scored.tsv"
done
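
Because the loop only echoes a message when a step fails and then moves on to the next sample, it can be useful to list afterwards which runs did not yield a scored TSV. A minimal sketch, assuming the output naming used in the loop above; `list_missing_scored` is a hypothetical helper and not part of the invocation script.

```shell
#!/usr/bin/env bash
# Sketch only: list_missing_scored is a hypothetical helper.
# For every mzXML input, print the run name if there is no non-empty file named
# <out_dir>/<name>_vs_<suffix>_scored.tsv.
list_missing_scored() {
    local mzxml_dir="$1"   # e.g. results/01mzXML/dia/${VERSION}
    local out_dir="$2"     # e.g. ${PYPROPHET_OUTDIR}_tuberculist
    local suffix="$3"      # e.g. ${VERSION}_${TYPE}_${DDA_METHOD}_dia
    local f name
    for f in "${mzxml_dir}"/*.mzXML; do
        [[ -e "${f}" ]] || continue   # the glob matched nothing
        name=$(basename "${f}" .mzXML)
        if [[ ! -s "${out_dir}/${name}_vs_${suffix}_scored.tsv" ]]; then
            echo "${name}"
        fi
    done
}
```

An empty result means every input has a scored TSV; any name printed corresponds to a sample whose logs are worth reading before moving on to alignment.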

Merging the Tuberculist-derived data with TRIC

Finally, block 17 of the invocation script provides the command used to produce the final, feature-aligned data used by SWATH2stats and related tools. It also generates a matrix of intensities by sample and some metadata. Once again, it is copied here to allow interactive testing.
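
The block-17 command is not reproduced in this report. The sketch below only illustrates the general shape of a TRIC run with the `feature_alignment.py` script from msproteomicstools. The thresholds, the alignment method, and the output file names are placeholders chosen for illustration, not the values used in the invocation script. The helper `build_tric_command` is hypothetical and only prints the command, so that it can be inspected before being run.

```shell
#!/usr/bin/env bash
# Sketch only: build_tric_command is a hypothetical helper, and every threshold
# and output name below is a placeholder rather than the value used in block 17.
build_tric_command() {
    local out_prefix="$1"
    shift   # the remaining arguments are the scored per-run input files
    local cmd=(feature_alignment.py
               --file_format openswath
               --method best_overall
               --max_rt_diff 30
               --target_fdr 0.01
               --max_fdr_quality 0.05
               --out "${out_prefix}_aligned.tsv"
               --out_matrix "${out_prefix}_matrix.tsv"
               --out_meta "${out_prefix}_meta.tsv"
               --in "$@")
    # Print the command with each word shell-quoted, for inspection.
    printf '%q ' "${cmd[@]}"
    printf '\n'
}
```

Here `--out_matrix` and `--out_meta` correspond to the matrix of intensities by sample and the metadata mentioned above. Printing the command first keeps the list of input files visible, which matters when some samples failed scoring and should be left out of the alignment.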

Index version: 20180913

TODO

  • 2017-06-14:
## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset f791b304824ce2290b018f28c7495bc9b4af9b38
## This is hpgltools commit: Thu Oct 25 15:57:52 2018 -0400: f791b304824ce2290b018f28c7495bc9b4af9b38
## Saving to 01_preprocessing_comet_20180806-v20180913.rda.xz

R version 3.5.1 (2018-07-02)

Platform: x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=en_US.utf8, LC_NUMERIC=C, LC_TIME=en_US.utf8, LC_COLLATE=en_US.utf8, LC_MONETARY=en_US.utf8, LC_MESSAGES=en_US.utf8, LC_PAPER=en_US.utf8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.utf8 and LC_IDENTIFICATION=C

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: foreach(v.1.4.4) and hpgltools(v.2018.03)

loaded via a namespace (and not attached): tidyselect(v.0.2.5), xfun(v.0.3), pander(v.0.6.2), purrr(v.0.2.5), doSNOW(v.1.0.16), colorspace(v.1.3-2), snow(v.0.4-3), miniUI(v.0.1.1.1), htmltools(v.0.3.6), yaml(v.2.2.0), base64enc(v.0.1-3), rlang(v.0.3.0), later(v.0.7.5), pillar(v.1.3.0), glue(v.1.3.0), withr(v.2.1.2), RColorBrewer(v.1.1-2), BiocGenerics(v.0.26.0), bindrcpp(v.0.2.2), questionr(v.0.6.3), plyr(v.1.8.4), bindr(v.0.1.1), stringr(v.1.3.1), munsell(v.0.5.0), commonmark(v.1.6), gtable(v.0.2.0), zip(v.1.0.0), devtools(v.1.13.6), codetools(v.0.2-15), evaluate(v.0.12), memoise(v.1.1.0), Biobase(v.2.40.0), knitr(v.1.20), rmdformats(v.0.3.3), doParallel(v.1.0.14), httpuv(v.1.4.5), parallel(v.3.5.1), highr(v.0.7), Rcpp(v.0.12.19), xtable(v.1.8-3), promises(v.1.0.1), scales(v.1.0.0), backports(v.1.1.2), mime(v.0.6), ggplot2(v.3.0.0), digest(v.0.6.18), openxlsx(v.4.1.0), stringi(v.1.2.4), bookdown(v.0.7), dplyr(v.0.7.7), shiny(v.1.1.0), grid(v.3.5.1), rprojroot(v.1.3-2), tools(v.3.5.1), magrittr(v.1.5), lazyeval(v.0.2.1), tibble(v.1.4.2), crayon(v.1.3.4), pkgconfig(v.2.0.2), xml2(v.1.2.0), data.table(v.1.11.8), assertthat(v.0.2.0), rmarkdown(v.1.10), roxygen2(v.6.1.0), rstudioapi(v.0.8), iterators(v.1.0.10), R6(v.2.3.0) and compiler(v.3.5.1)

atb abelew@gmail.com

2018-10-26

