1 20190228

Today in lab meeting, Dallas showed a couple of Western blots which suggest that esxG, esxH, and lpqH are not changing when looking at whole cell lysates between the wild-type and mutant samples.

I would therefore like to step through the data and figure out if I can pinpoint the source of the discrepency.

2 20180913

This is a copy of 20180817 but where I will further simplify to include only the wt/mutant and only the most recent technical replicate.

Therefore, the first thing I did was to copy the sample sheet and remove most of the rows, leaving behind only the 12 which comprise the 3 biological replicates for wt/mutant and the technical run dated 20180817.

3 Notes

  1. Remove complement.
  2. Work only with newest samples.
  3. Define intersections between limma/edger/deseq/msstats for multiple thresholds.
  4. Ontology tests

4 New samples!

In this copy of the worksheet, I am going to deal only with the tuberculist libraries, and only with the most recent technical replicate of the data.

This worksheet is a copy of 20180806 but this time with feeling! (or new samples) A minor caveat: I had some difficulties converting the new data to a openMS suitable format. I am not certain if I messed up a priori, or if a change in the version of the software on the spec caused the problem, but for future reference:

To properly convert the data to mzXML, make sure to use the TPP MSConvert utility and have ‘TPP’ compatibility on.

5 Analyzing data from openMS and friends.

5.1 SWATH2stats preprocessing

I am using my slightly modified copy of SWATH2stats. This seeks to ensure that changes in the case of columns in the metadata from one version of OpenMS to another do not trouble me.

5.1.1 Creating a swath2stats experiment using the tuberculist-derived library data

There is one important caveat in the following block: I used a regex to remove the second half of geneID_geneName so that later when I merge in the annotation data I have it will match.

5.2 Notes 20180913

If I want to subset like this, I need to delete the pyprophet outputs for anything not included in this set and rerun feature_alignment.py.

6 SWATH2stats continued

The various metrics and filters provided by SWATH2stats seem quite reasonable to me. The only thing that really bothers me is that they are all case sensitive and I found that the most recent tric changed the capitalization of a column, causing these to all fall down. Therefore I went in and made everything case insensitive in a fashion similar to that done by MSstats (except I hate capital letters, so I used tolower() rather than toupper()).

6.1 Perform filters

The following block performs the metrics and filters suggested by swath2stats. These first define the decoy hit rate in the data, then filter the data based on that. It also filters out hits with less than ideal m-scores and proteins with non-optimal distributions of peptide hits (either due to too few peptides or a weird distribution of intensities).

## Going to write the image to: images/20190327_tb_swath2stats_sample_cor.png when dev.off() is called.

## PNG 
##   2

## Number of non-decoy peptides: 20811
## Number of decoy peptides: 1002
## Decoy rate: 0.0481

## The average FDR by run on assay level is 0.007
## The average FDR by run on peptide level is 0.007
## The average FDR by run on protein level is 0.023

## Target assay FDR: 0.02
## Required overall m-score cutoff: 0.0070795
## achieving assay FDR: 0.019
## Target protein FDR: 0.02
## Required overall m-score cutoff: 0.00089125
## achieving protein FDR: 0.0172
## Original dimension: 232977, new dimension: 222739, difference: 10238.
## Peptides need to have been quantified in more conditions than: 34.4 in order to pass this percentage-based threshold.
## Fraction of peptides selected: 0.018
## Original dimension: 235512, new dimension: 18173, difference: 217339.
## Target protein FDR: 0.000891250938133746
## Required overall m-score cutoff: 0.01
## achieving protein FDR: 0
## filter_mscore_fdr is filtering the data...
## finding m-score cutoff to achieve desired protein FDR in protein master list..
## finding m-score cutoff to achieve desired global peptide FDR..
## Target peptide FDR: 0.05
## Required overall m-score cutoff: 0.01
## Achieving peptide FDR: 0
## Proteins selected: 
## Total proteins selected: 3044
## Final target proteins: 3044
## Final decoy proteins: 0
## Peptides mapping to these protein entries selected:
## Total mapping peptides: 20243
## Final target peptides: 20243
## Final decoy peptides: 0
## Total peptides selected from:
## Total peptides: 20243
## Final target peptides: 20243
## Final decoy peptides: 0
## Individual run FDR quality of the peptides was not calculated
## as not every run contains a decoy.
## The decoys have been removed from the returned data.
## Number of proteins detected: 3059
## Protein identifiers: Rv1270c, Rv0161, Rv1613, Rv0440, Rv0053, Rv0242c
## Number of proteins detected that are supported by a proteotypic peptide: 2929
## Number of proteotypic peptides detected: 20091
## Number of proteins detected: 2931
## First 6 protein identifiers: Rv1270c, Rv0161, Rv1613, Rv0440, Rv0053, Rv0242c
## Before filtering:
##   Number of proteins: 2929
##   Number of peptides: 20091
## 
## Percentage of peptides removed: 19.21%
## 
## After filtering:
##   Number of proteins: 2903
##   Number of peptides: 16232
## Before filtering:
##   Number of proteins: 2903
##   Number of peptides: 16232
## 
## Percentage of peptides removed: 0.02%
## 
## After filtering:
##   Number of proteins: 2674
##   Number of peptides: 16229

6.2 Write out matrices of the results

swath2stats provides a couple of ways to print out its results, one in a format specifically intended for MSstats, and another as a more canonical matrix of rows = proteins, columns = samples.

## Protein overview matrix results/swath2stats/20190327/protein_all.csv written to working folder.
## [1] 3934   44
## Protein overview matrix results/swath2stats/20190327/protein_matrix_mscore.csv written to working folder.
## [1] 3044   44
## Peptide overview matrix results/swath2stats/20190327/peptide_matrix_mscore.csv written to working folder.
## [1] 20243    44
## Protein overview matrix results/swath2stats/20190327/protein_matrix_filtered.csv written to working folder.
## [1] 2674   44
## Peptide overview matrix results/swath2stats/20190327/peptide_matrix_filtered.csv written to working folder.
## [1] 171784     44

## The library contains 5 transitions per precursor.
## The data table was transformed into a table containing one row per transition.
## One or several columns required by MSstats were not in the data. The columns were created and filled with NAs.
## Missing columns: productcharge, isotopelabeltype
## isotopelabeltype was filled with light.

7 Test aLFQ

I want to revisit aLFQ, I think it might provide better protein-level quantification methods. aLFQ looks promising, but I have not figured out valid parameters for using it.

7.1 Some new plots

In response to some interesting queries from Yan, I made a few little functions which query and plot data from the scored data provided by openswath/pyprophet. Let us look at their results here.

## Warning in if (class(metadata) == "data.frame") {: the condition has length
## > 1 and only the first element will be used
## Attempting to read the tsv file for: 2018_0315Briken02: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0315Briken02.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0315Briken03: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0315Briken03.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0315Briken04: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0315Briken04.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0315Briken05: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0315Briken05.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0315Briken06: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0315Briken06.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0315Briken21: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0315Briken21.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0315Briken22: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0315Briken22.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0315Briken23: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0315Briken23.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0315Briken24: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0315Briken24.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0315Briken25: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0315Briken25.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0315Briken26: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0315Briken26.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0502BrikenDIA01: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0502BrikenDIA01.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0502BrikenDIA02: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0502BrikenDIA02.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0502BrikenDIA03: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0502BrikenDIA03.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0502BrikenDIA04: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0502BrikenDIA04.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0502BrikenDIA05: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0502BrikenDIA05.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0502BrikenDIA06: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0502BrikenDIA06.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0502BrikenDIA08: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0502BrikenDIA08.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0502BrikenDIA09: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0502BrikenDIA09.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0502BrikenDIA11: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0502BrikenDIA11.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0502BrikenDIA12: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0502BrikenDIA12.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0726Briken02: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0726Briken02.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0726Briken05: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0726Briken05.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0726Briken06: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0726Briken06.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0726Briken07: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0726Briken07.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0726Briken08: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0726Briken08.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0726Briken09: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0726Briken09.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0726Briken13: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0726Briken13.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0726Briken16: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0726Briken16.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0817BrikenTrypsinDIA01: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA01.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0817BrikenTrypsinDIA02: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA02.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0817BrikenTrypsinDIA03: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA03.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0817BrikenTrypsinDIA04: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA04.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0817BrikenTrypsinDIA05: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA05.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0817BrikenTrypsinDIA06: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA06.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0817BrikenTrypsinDIA07: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA07.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0817BrikenTrypsinDIA08: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA08.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0817BrikenTrypsinDIA09: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA09.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0817BrikenTrypsinDIA11: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA11.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0817BrikenTrypsinDIA12: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA12.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0817BrikenTrypsinDIA13: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA13.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0817BrikenTrypsinDIA14: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA14.mzXML.tsv.
## Attempting to read the tsv file for: 2018_0817BrikenTrypsinDIA19: results/08pyprophet/20190327/whole_8mz_tuberculist/2018_0817BrikenTrypsinDIA19.mzXML.tsv.
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in self$trans$transform(x): NaNs produced
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning in self$trans$transform(x): NaNs produced
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 134 rows containing non-finite values (stat_ydensity).
## Warning: Removed 134 rows containing missing values (geom_point).
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Adding wt_filtrate_01
## Adding wt_filtrate_03
## Adding wt_whole_01
## Adding wt_whole_03
## Adding wt_whole_02
## Adding wt_whole_01.1
## Adding wt_whole_03.1
## Adding wt_filtrate_01.1
## Adding wt_filtrate_03.1
## Adding wt_filtrate_02
## Adding wt_filtrate_02.1
## Adding delta_filtrate_01
## Adding delta_filtrate_02
## Adding delta_filtrate_03
## Adding comp_filtrate_01
## Adding comp_filtrate_02
## Adding comp_filtrate_03
## Adding delta_whole_02
## Adding delta_whole_03
## Adding comp_whole_02
## Adding comp_whole_03
## Adding delta_filtrate_02.1
## Adding comp_filtrate_02.1
## Adding comp_filtrate_03.1
## Adding wt_filtrate_01.2
## Adding wt_filtrate_02.2
## Adding wt_filtrate_03.2
## Adding delta_whole_03.1
## Adding comp_whole_03.1
## Adding delta_filtrate_01.1
## Adding delta_filtrate_02.2
## Adding delta_filtrate_03.1
## Adding comp_filtrate_01.1
## Adding comp_filtrate_02.2
## Adding comp_filtrate_03.2
## Adding wt_filtrate_01.3
## Adding wt_filtrate_02.3
## Adding wt_filtrate_03.3
## Adding delta_whole_01
## Adding delta_whole_02.1
## Adding delta_whole_03.2
## Adding comp_whole_01
## Adding wt_whole_03.2
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Adding wt_filtrate_01
## Adding wt_filtrate_03
## Adding wt_whole_01
## Adding wt_whole_03
## Adding wt_whole_02
## Adding wt_whole_01.1
## Adding wt_whole_03.1
## Adding wt_filtrate_01.1
## Adding wt_filtrate_03.1
## Adding wt_filtrate_02
## Adding wt_filtrate_02.1
## Adding delta_filtrate_01
## Adding delta_filtrate_02
## Adding delta_filtrate_03
## Adding comp_filtrate_01
## Adding comp_filtrate_02
## Adding comp_filtrate_03
## Adding delta_whole_02
## Adding delta_whole_03
## Adding comp_whole_02
## Adding comp_whole_03
## Adding delta_filtrate_02.1
## Adding comp_filtrate_02.1
## Adding comp_filtrate_03.1
## Adding wt_filtrate_01.2
## Adding wt_filtrate_02.2
## Adding wt_filtrate_03.2
## Adding delta_whole_03.1
## Adding comp_whole_03.1
## Adding delta_filtrate_01.1
## Adding delta_filtrate_02.2
## Adding delta_filtrate_03.1
## Adding comp_filtrate_01.1
## Adding comp_filtrate_02.2
## Adding comp_filtrate_03.2
## Adding wt_filtrate_01.3
## Adding wt_filtrate_02.3
## Adding wt_filtrate_03.3
## Adding delta_whole_01
## Adding delta_whole_02.1
## Adding delta_whole_03.2
## Adding comp_whole_01
## Adding wt_whole_03.2
## Warning in self$trans$transform(x): NaNs produced
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning in self$trans$transform(x): NaNs produced
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 185 rows containing non-finite values (stat_ydensity).
## Warning: Removed 185 rows containing missing values (geom_point).
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Adding wt_filtrate_01
## Adding wt_filtrate_03
## Adding wt_whole_01
## Adding wt_whole_03
## Adding wt_whole_02
## Adding wt_whole_01.1
## Adding wt_whole_03.1
## Adding wt_filtrate_01.1
## Adding wt_filtrate_03.1
## Adding wt_filtrate_02
## Adding wt_filtrate_02.1
## Adding delta_filtrate_01
## Adding delta_filtrate_02
## Adding delta_filtrate_03
## Adding comp_filtrate_01
## Adding comp_filtrate_02
## Adding comp_filtrate_03
## Adding delta_whole_02
## Adding delta_whole_03
## Adding comp_whole_02
## Adding comp_whole_03
## Adding delta_filtrate_02.1
## Adding comp_filtrate_02.1
## Adding comp_filtrate_03.1
## Adding wt_filtrate_01.2
## Adding wt_filtrate_02.2
## Adding wt_filtrate_03.2
## Adding delta_whole_03.1
## Adding comp_whole_03.1
## Adding delta_filtrate_01.1
## Adding delta_filtrate_02.2
## Adding delta_filtrate_03.1
## Adding comp_filtrate_01.1
## Adding comp_filtrate_02.2
## Adding comp_filtrate_03.2
## Adding wt_filtrate_01.3
## Adding wt_filtrate_02.3
## Adding wt_filtrate_03.3
## Adding delta_whole_01
## Adding delta_whole_02.1
## Adding delta_whole_03.2
## Adding comp_whole_01
## Adding wt_whole_03.2
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Adding wt_filtrate_01
## Adding wt_filtrate_03
## Adding wt_whole_01
## Adding wt_whole_03
## Adding wt_whole_02
## Adding wt_whole_01.1
## Adding wt_whole_03.1
## Adding wt_filtrate_01.1
## Adding wt_filtrate_03.1
## Adding wt_filtrate_02
## Adding wt_filtrate_02.1
## Adding delta_filtrate_01
## Adding delta_filtrate_02
## Adding delta_filtrate_03
## Adding comp_filtrate_01
## Adding comp_filtrate_02
## Adding comp_filtrate_03
## Adding delta_whole_02
## Adding delta_whole_03
## Adding comp_whole_02
## Adding comp_whole_03
## Adding delta_filtrate_02.1
## Adding comp_filtrate_02.1
## Adding comp_filtrate_03.1
## Adding wt_filtrate_01.2
## Adding wt_filtrate_02.2
## Adding wt_filtrate_03.2
## Adding delta_whole_03.1
## Adding comp_whole_03.1
## Adding delta_filtrate_01.1
## Adding delta_filtrate_02.2
## Adding delta_filtrate_03.1
## Adding comp_filtrate_01.1
## Adding comp_filtrate_02.2
## Adding comp_filtrate_03.2
## Adding wt_filtrate_01.3
## Adding wt_filtrate_02.3
## Adding wt_filtrate_03.3
## Adding delta_whole_01
## Adding delta_whole_02.1
## Adding delta_whole_03.2
## Adding comp_whole_01
## Adding wt_whole_03.2
## Warning in self$trans$transform(x): NaNs produced
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning in self$trans$transform(x): NaNs produced
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 11 rows containing non-finite values (stat_ydensity).
## Warning: Removed 11 rows containing missing values (geom_point).
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Adding wt_filtrate_01
## Adding wt_filtrate_03
## Adding wt_whole_01
## Adding wt_whole_03
## Adding wt_whole_02
## Adding wt_whole_01.1
## Adding wt_whole_03.1
## Adding wt_filtrate_01.1
## Adding wt_filtrate_03.1
## Adding wt_filtrate_02
## Adding wt_filtrate_02.1
## Adding delta_filtrate_01
## Adding delta_filtrate_02
## Adding delta_filtrate_03
## Adding comp_filtrate_01
## Adding comp_filtrate_02
## Adding comp_filtrate_03
## Adding delta_whole_02
## Adding delta_whole_03
## Adding comp_whole_02
## Adding comp_whole_03
## Adding delta_filtrate_02.1
## Adding comp_filtrate_02.1
## Adding comp_filtrate_03.1
## Adding wt_filtrate_01.2
## Adding wt_filtrate_02.2
## Adding wt_filtrate_03.2
## Adding delta_whole_03.1
## Adding comp_whole_03.1
## Adding delta_filtrate_01.1
## Adding delta_filtrate_02.2
## Adding delta_filtrate_03.1
## Adding comp_filtrate_01.1
## Adding comp_filtrate_02.2
## Adding comp_filtrate_03.2
## Adding wt_filtrate_01.3
## Adding wt_filtrate_02.3
## Adding wt_filtrate_03.3
## Adding delta_whole_01
## Adding delta_whole_02.1
## Adding delta_whole_03.2
## Adding comp_whole_01
## Adding wt_whole_03.2
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

7.2 MSstats

msstats.org seems to provide a complete solution for performing reasonable metrics of this data.

I am currently reading: http://msstats.org/wp-content/uploads/2017/01/MSstats_v3.7.3_manual.pdf

I made some moderately intrusive changes to MSstats to make it clearer, as well.

8 Create hpgltools expressionset

Since I am not certain I understand these data, I will take the intensities from SWATH2stats, metadata, and annotation data; attempt to create a ‘normal’ expressionset; poke at it to see what I can learn.

8.1 Massaging the metadata

I want to use the same metadata as were used for MSstats. It has a few important differences from the requirements of hpgltools: pretty much only that I do not allow rownames/sampleIDs to start with a number.

8.3 Merge the pieces

Now we should have sufficient pieces to make an expressionset.

While here, I will also split the data into a cf and whole-cell pair of data structures.

## There were 43, now there are 17 samples.
## There were 43, now there are 26 samples.
## Writing the first sheet, containing a legend and some summary data.
## Writing the raw reads.
## Hey, you merged the annotation data and did not reset the column names!
## Graphing the raw reads.
## Warning in svg(filename = high_quality): unable to load shared object '/sw/local/R/3.5.3/lib/R/library/grDevices/libs//cairo.so':
##   /sw/local/R/3.5.3/lib/R/library/grDevices/libs//cairo.so: cannot open shared object file: No such file or directory
## Warning in svg(filename = high_quality): failed to load cairo DLL
## Warning in grid.Call.graphics(C_rect, x$x, x$y, x$width, x$height,
## resolveHJust(x$just, : semi-transparency is not supported on this device:
## reported only once per page

## Warning in grid.Call.graphics(C_rect, x$x, x$y, x$width, x$height,
## resolveHJust(x$just, : semi-transparency is not supported on this device:
## reported only once per page
## Warning in grid.Call.graphics(C_polygon, x$x, x$y, index): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_polygon, x$x, x$y, index): semi-
## transparency is not supported on this device: reported only once per page
## Warning in svg(filename = high_quality): failed to load cairo DLL

## Warning in svg(filename = high_quality): failed to load cairo DLL

## Warning in svg(filename = high_quality): failed to load cairo DLL
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in grid.Call.graphics(C_polygon, x$x, x$y, index): semi-
## transparency is not supported on this device: reported only once per page
## Warning in svg(filename = high_quality): failed to load cairo DLL

## Warning in svg(filename = high_quality): failed to load cairo DLL
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in grid.Call.graphics(C_polygon, x$x, x$y, index): semi-
## transparency is not supported on this device: reported only once per page
## Warning in svg(filename = high_quality): failed to load cairo DLL
## Warning in grid.Call.graphics(C_polygon, x$x, x$y, index): semi-
## transparency is not supported on this device: reported only once per page
## Warning in svg(filename = high_quality): failed to load cairo DLL
## Warning in grid.Call.graphics(C_polygon, x$x, x$y, index): semi-
## transparency is not supported on this device: reported only once per page
## Attempting mixed linear model with: ~  (1|condition) + (1|batch)
## Fitting the expressionset to the model, this is slow.
## Warning in serialize(data, node$con): 'package:variancePartition' may not
## be available when loading
## Warning in serialize(data, node$con): 'package:variancePartition' may not
## be available when loading

## Warning in serialize(data, node$con): 'package:variancePartition' may not
## be available when loading

## Warning in serialize(data, node$con): 'package:variancePartition' may not
## be available when loading

## Warning in serialize(data, node$con): 'package:variancePartition' may not
## be available when loading

## Warning in serialize(data, node$con): 'package:variancePartition' may not
## be available when loading
## 
## Finished...
## Total: 29 s
## Placing factor: condition at the beginning of the model.
## Writing the normalized reads.
## Graphing the normalized reads.
## Warning in grid.Call.graphics(C_polygon, x$x, x$y, index): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_polygon, x$x, x$y, index): semi-
## transparency is not supported on this device: reported only once per page
## Warning in svg(filename = high_quality): failed to load cairo DLL

## Warning in svg(filename = high_quality): failed to load cairo DLL

## Warning in svg(filename = high_quality): failed to load cairo DLL
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in grid.Call.graphics(C_polygon, x$x, x$y, index): semi-
## transparency is not supported on this device: reported only once per page
## Warning in svg(filename = high_quality): failed to load cairo DLL

## Warning in svg(filename = high_quality): failed to load cairo DLL
## Warning in grid.Call.graphics(C_polygon, x$x, x$y, index): semi-
## transparency is not supported on this device: reported only once per page
## Attempting mixed linear model with: ~  (1|condition) + (1|batch)
## Fitting the expressionset to the model, this is slow.
## 
## Finished...
## Total: 30 s
## Placing factor: condition at the beginning of the model.
## Writing the median reads by factor.
## The factor comp_filtrate has 8 rows.
## The factor comp_whole has 4 rows.
## The factor delta_filtrate has 7 rows.
## The factor delta_whole has 6 rows.
## The factor wt_filtrate has 11 rows.
## The factor wt_whole has 7 rows.
## Note: zip::zip() is deprecated, please use zip::zipr() instead

8.4 Metrics of the full data set

Generate some metrics of the full data set, later the same things will be performed using the data subsets. There is a weird caveat: sva fails under some strange circumstances for this data. It seems that if the data is cpmmed, then sva sees the resulting matrix as containing singular values. I am guessing that this means that there are a bunch of low values in the filtrate data which, when cpm(data) is performed get dropped to near 0 and therefore trip up sva. There are two likely ways around this:

  1. Do not cpm the data.
  2. Filter the data more aggressively so that there are no zero-values after cpm().

8.6 Variance partition

This is a potential argument against including only the newest technical replicate of the data; as when all the technicals were included, these metrics looked much more sensible.

## Attempting mixed linear model with: ~  (1|condition) + (1|batch)
## Fitting the expressionset to the model, this is slow.
## 
## Finished...
## Total: 29 s
## Placing factor: condition at the beginning of the model.

## Naively calculating coefficient of variation/dispersion with respect to condition.
## Finished calculating dispersion estimates.

## This data will benefit from being displayed on the log scale.
## If this is not desired, set scale='raw'
## Some entries are 0.  We are on log scale, adding 1 to the data.
## Changed 60551 zero count features.
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

8.7 Metrics of the filtrate data set

Once again, here we have metrics of the subset data; this time of filtrate data.

## This function will replace the expt$expressionset slot with:
## cbcb(data)
## It will save copies of each step along the way
##  in expt$normalized with the corresponding libsizes. Keep libsizes in mind
##  when invoking limma.  The appropriate libsize is non-log(cpm(normalized)).
##  This is most likely kept at:
##  'new_expt$normalized$intermediate_counts$normalization$libsizes'
##  A copy of this may also be found at:
##  new_expt$best_libsize
## Leaving the data in its current base format, keep in mind that
##  some metrics are easier to see when the data is log2 transformed, but
##  EdgeR/DESeq do not accept transformed data.
## Leaving the data unconverted.  It is often advisable to cpm/rpkm
##  the data to normalize for sampling differences, keep in mind though that rpkm
##  has some annoying biases, and voom() by default does a cpm (though hpgl_voom()
##  will try to detect this).
## Leaving the data unnormalized.  This is necessary for DESeq, but
##  EdgeR/limma might benefit from normalization.  Good choices include quantile,
##  size-factor, tmm, etc.
## Not correcting the count-data for batch effects.  If batch is
##  included in EdgerR/limma's model, then this is probably wise; but in extreme
##  batch effects this is a good parameter to play with.
## Step 1: performing count filter with option: cbcb
## Removing 1315 low-count genes (1359 remaining).
## Step 2: not normalizing the data.
## Step 3: not converting the data.
## Step 4: not transforming the data.
## Step 5: not doing batch correction.
## This function will replace the expt$expressionset slot with:
## cbcb(data)
## It will save copies of each step along the way
##  in expt$normalized with the corresponding libsizes. Keep libsizes in mind
##  when invoking limma.  The appropriate libsize is non-log(cpm(normalized)).
##  This is most likely kept at:
##  'new_expt$normalized$intermediate_counts$normalization$libsizes'
##  A copy of this may also be found at:
##  new_expt$best_libsize
## Leaving the data in its current base format, keep in mind that
##  some metrics are easier to see when the data is log2 transformed, but
##  EdgeR/DESeq do not accept transformed data.
## Leaving the data unconverted.  It is often advisable to cpm/rpkm
##  the data to normalize for sampling differences, keep in mind though that rpkm
##  has some annoying biases, and voom() by default does a cpm (though hpgl_voom()
##  will try to detect this).
## Leaving the data unnormalized.  This is necessary for DESeq, but
##  EdgeR/limma might benefit from normalization.  Good choices include quantile,
##  size-factor, tmm, etc.
## Not correcting the count-data for batch effects.  If batch is
##  included in EdgerR/limma's model, then this is probably wise; but in extreme
##  batch effects this is a good parameter to play with.
## Step 1: performing count filter with option: cbcb
## Removing 1315 low-count genes (1359 remaining).
## Step 2: not normalizing the data.
## Step 3: not converting the data.
## Step 4: not transforming the data.
## Step 5: not doing batch correction.
## This data will benefit from being displayed on the log scale.
## If this is not desired, set scale='raw'
## Some entries are 0.  We are on log scale, setting them to 0.5.
## Changed 15319 zero count features.
## Warning in grid.Call.graphics(C_polygon, x$x, x$y, index): semi-
## transparency is not supported on this device: reported only once per page
## [1] TRUE
## [1] TRUE
## The factor br01 has 5 rows.
## The factor br02 has 4 rows.
## The factor br03 has 3 rows.
## The factor br04 has 2 rows.
## The factor br05 has 3 rows.
## The factor br06 has 3 rows.
## The factor br07 has 2 rows.
## The factor br08 has 2 rows.
## The factor br09 has 2 rows.
## Reading the sample metadata.
## Warning in `[<-.factor`(`*tmp*`, iseq, value = c("undefined",
## "undefined", : invalid factor level, NA generated
## The sample definitions comprises: 9 rows(samples) and 29 columns(metadata fields).
## Matched 613 annotations and counts.
## Bringing together the count matrix and gene information.
## The final expressionset has 613 rows and 9 columns.

mean_by_bioreplicate() does the following:

  1. Extracts the expression matrix.
  2. Backfills all 0s with NA
  3. Performs a cpm on the result.
  4. Does a mean of all non-NA samples by biological replicate.

9 Examine the mean by bioreplicate data

## This function will replace the expt$expressionset slot with:
## log2(cpm(quant(pofa(data))))
## It will save copies of each step along the way
##  in expt$normalized with the corresponding libsizes. Keep libsizes in mind
##  when invoking limma.  The appropriate libsize is non-log(cpm(normalized)).
##  This is most likely kept at:
##  'new_expt$normalized$intermediate_counts$normalization$libsizes'
##  A copy of this may also be found at:
##  new_expt$best_libsize
## Not correcting the count-data for batch effects.  If batch is
##  included in EdgerR/limma's model, then this is probably wise; but in extreme
##  batch effects this is a good parameter to play with.
## Step 1: performing count filter with option: pofa
## Removing 1 low-count genes (612 remaining).
## Step 2: normalizing the data with quant.
## Using normalize.quantiles.robust due to a thread error in preprocessCore.
## Step 3: converting the data with cpm.
## Step 4: transforming the data with log2.
## Step 5: not doing batch correction.
## Warning in grid.Call.graphics(C_segments, x$x0, x$y0, x$x1, x$y1, x$arrow):
## semi-transparency is not supported on this device: reported only once per
## page

9.0.1 Something fun for Najib

## Warning in grid.Call.graphics(C_segments, x$x0, x$y0, x$x1, x$y1, x$arrow):
## semi-transparency is not supported on this device: reported only once per
## page

9.1 Plot some metrics

In the previous blocks, I generated the metrics; in this block I will print them with the assumption that some of them will end up being included in whatever publication comes from this work; with that in mind I should probably change this to sva or pdf or something not png.

## Warning in cairo_pdf(filename = file, ...): failed to load cairo DLL
## Writing the image to: images/20190501_libsize.pdf and calling dev.off().
## Warning in cairo_pdf(filename = file, ...): failed to load cairo DLL
## Going to write the image to: images/20190501_norm_pca.pdf when dev.off() is called.
## Defaulting to tiff.
## Going to write the image to: images/20190501_fsva_pca.pdfg when dev.off() is called.
## Warning in cairo_pdf(filename = file, ...): failed to load cairo DLL
## Writing the image to: images/20190501_norm_corheat.pdf and calling dev.off().
## Warning in cairo_pdf(filename = file, ...): failed to load cairo DLL
## Writing the image to: images/20190501_raw_density.pdf and calling dev.off().
## Warning in cairo_pdf(filename = file, ...): failed to load cairo DLL
## Writing the image to: images/20190501_boxplot.pdf and calling dev.off().
## Warning in cairo_pdf(filename = file, ...): failed to load cairo DLL
## Writing the image to: images/20190501_whole_libsize.pdf and calling dev.off().
## Warning in cairo_pdf(filename = file, ...): failed to load cairo DLL
## Going to write the image to: images/20190501_whole_norm_pca.pdf when dev.off() is called.
## Warning in cairo_pdf(filename = file, ...): failed to load cairo DLL
## Going to write the image to: images/20190501_whole_fsva_pca.pdf when dev.off() is called.
## Warning in cairo_pdf(filename = file, ...): failed to load cairo DLL
## Writing the image to: images/20190501_whole_norm_corheat.pdf and calling dev.off().
## Warning in cairo_pdf(filename = file, ...): failed to load cairo DLL
## Writing the image to: images/20190501_whole_raw_density.pdf and calling dev.off().
## Warning in cairo_pdf(filename = file, ...): failed to load cairo DLL
## Writing the image to: images/20190501_whole_boxplot.pdf and calling dev.off().
## Warning in cairo_pdf(filename = file, ...): failed to load cairo DLL
## Writing the image to: images/20190501_libsize.pdf and calling dev.off().
## Warning in cairo_pdf(filename = file, ...): failed to load cairo DLL
## Going to write the image to: images/20190501_norm_pca.pdf when dev.off() is called.
## Warning in cairo_pdf(filename = file, ...): failed to load cairo DLL
## Going to write the image to: images/20190501_fsva_pca.pdf when dev.off() is called.
## Warning in cairo_pdf(filename = file, ...): failed to load cairo DLL
## Writing the image to: images/20190501_norm_corheat.pdf and calling dev.off().
## Warning in cairo_pdf(filename = file, ...): failed to load cairo DLL
## Writing the image to: images/20190501_raw_density.pdf and calling dev.off().
## Warning in cairo_pdf(filename = file, ...): failed to load cairo DLL
## Writing the image to: images/20190501_boxplot.pdf and calling dev.off().

10 Perform hpgltools Differential Expression Analyses

Earlier MSstats was used to contrast the various conditions in this data. In this block the same will be performed using the limma/deseq/edger/ebseq/basic methods. As an aside, running all of these methods in serial takes ~ 1/5th the time of running any one step of MSstats.

10.1 Idea from Volker

Given the initial pairwise data, look at wt edgeR results ratio filtrate/culture; then set a cutoff as log2fc <= 0.75. Proteins which survive this cutoff are then used in the ratio of ratios and analyses of mutant/wt.

The survivors of this initial cutoff is the set of secreted proteins. Compare this set with the set of known secreted proteins from other papers (Cox). Also Collins, Abesol (likely same as the synthetic Mtb library).

Separate, simultaneous filter: Look at the distribution of the culture filtrate data and filter out the (relatively) low-intensity bump. Then take the surviving set and perform all analyses with it. In theory, these two filter methods should get us to a similar place.

10.2 lfc cutoff cutoff

As mentioned below, Volker had an interesting cutoff suggestion: keep only those with lfc >= 0.75 filtrate/whole

## Before removal, there were 2674 entries.
## Now there are 1359 entries.
## Percent kept: 99.994, 99.985, 99.995, 99.867, 99.990, 99.995, 99.998, 99.993, 92.738, 90.837, 89.258, 94.174, 99.998, 99.968, 99.996, 100.000, 99.893, 99.970, 99.991, 89.431, 90.287, 90.537, 92.753, 96.150, 90.465, 100.000, 99.996, 99.995, 99.997, 99.988, 99.942, 99.997, 99.920, 99.995, 99.980, 99.994, 92.805, 91.347, 92.444, 90.941, 92.021, 91.954, 89.626
## Percent removed: 0.006, 0.015, 0.005, 0.133, 0.010, 0.005, 0.002, 0.007, 7.262, 9.163, 10.742, 5.826, 0.002, 0.032, 0.004, 0.000, 0.107, 0.030, 0.009, 10.569, 9.713, 9.463, 7.247, 3.850, 9.535, 0.000, 0.004, 0.005, 0.003, 0.012, 0.058, 0.003, 0.080, 0.005, 0.020, 0.006, 7.195, 8.653, 7.556, 9.059, 7.979, 8.046, 10.374
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq. If they freak out, check the state of the count table and ensure that it is in integer counts.
## Choosing the non-intercept containing model.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.

## Before removal, there were 2674 entries.
## Now there are 1359 entries.
## Percent kept: 99.994, 99.985, 99.995, 99.867, 99.990, 99.995, 99.998, 99.993, 92.738, 90.837, 89.258, 94.174, 99.998, 99.968, 99.996, 100.000, 99.893, 99.970, 99.991, 89.431, 90.287, 90.537, 92.753, 96.150, 90.465, 100.000, 99.996, 99.995, 99.997, 99.988, 99.942, 99.997, 99.920, 99.995, 99.980, 99.994, 92.805, 91.347, 92.444, 90.941, 92.021, 91.954, 89.626
## Percent removed: 0.006, 0.015, 0.005, 0.133, 0.010, 0.005, 0.002, 0.007, 7.262, 9.163, 10.742, 5.826, 0.002, 0.032, 0.004, 0.000, 0.107, 0.030, 0.009, 10.569, 9.713, 9.463, 7.247, 3.850, 9.535, 0.000, 0.004, 0.005, 0.003, 0.012, 0.058, 0.003, 0.080, 0.005, 0.020, 0.006, 7.195, 8.653, 7.556, 9.059, 7.979, 8.046, 10.374
## Before removal, there were 2674 entries.
## Now there are 311 entries.
## Percent kept: 84.415, 77.586, 87.214, 94.832, 72.157, 83.903, 84.976, 69.529, 22.867, 18.218, 14.713, 18.716, 93.900, 75.232, 86.505, 97.506, 74.756, 90.158, 83.104, 17.095, 17.639, 18.935, 12.986, 17.680, 19.688, 85.951, 92.342, 83.666, 92.711, 82.561, 89.206, 84.308, 86.852, 83.157, 85.627, 81.339, 21.076, 16.414, 21.305, 18.076, 20.558, 20.299, 18.363
## Percent removed: 15.585, 22.414, 12.786, 5.168, 27.843, 16.097, 15.024, 30.471, 77.133, 81.782, 85.287, 81.284, 6.100, 24.768, 13.495, 2.494, 25.244, 9.842, 16.896, 82.905, 82.361, 81.065, 87.014, 82.320, 80.312, 14.049, 7.658, 16.334, 7.289, 17.439, 10.794, 15.692, 13.148, 16.843, 14.373, 18.661, 78.924, 83.586, 78.695, 81.924, 79.442, 79.701, 81.637
## Deleting the file excel/20190501_combined_contrasts_all-v20190327.xlsx before writing the tables.
## Writing a legend of columns.
## Printing a pca plot before/after surrogates/batch estimation.
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in grid.Call.graphics(C_polygon, x$x, x$y, index): semi-
## transparency is not supported on this device: reported only once per page
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure

## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in grid.Call.graphics(C_polygon, x$x, x$y, index): semi-
## transparency is not supported on this device: reported only once per page
## Working on 1/5: wt_cfwhole which is: wt_filtrate/wt_whole.
## Found inverse table with wt_whole_vs_wt_filtrate
## 20181210 a pthread error in normalize.quantiles leads me to robust.
## Used Bon Ferroni corrected t test(s) between columns.
## Used Bon Ferroni corrected t test(s) between columns.
## Used Bon Ferroni corrected t test(s) between columns.
## Working on 2/5: delta_cfwhole which is: delta_filtrate/delta_whole.
## Found inverse table with delta_whole_vs_delta_filtrate
## 20181210 a pthread error in normalize.quantiles leads me to robust.
## Used Bon Ferroni corrected t test(s) between columns.
## Used Bon Ferroni corrected t test(s) between columns.
## Used Bon Ferroni corrected t test(s) between columns.
## Working on 3/5: whole_deltawt which is: delta_whole/wt_whole.
## Found inverse table with wt_whole_vs_delta_whole
## 20181210 a pthread error in normalize.quantiles leads me to robust.
## Used Bon Ferroni corrected t test(s) between columns.
## Used Bon Ferroni corrected t test(s) between columns.
## Used Bon Ferroni corrected t test(s) between columns.
## Working on 4/5: cf_deltawt which is: delta_filtrate/wt_filtrate.
## Found inverse table with wt_filtrate_vs_delta_filtrate
## 20181210 a pthread error in normalize.quantiles leads me to robust.
## Used Bon Ferroni corrected t test(s) between columns.
## Used Bon Ferroni corrected t test(s) between columns.
## Used Bon Ferroni corrected t test(s) between columns.
## Working on 5/5: rofr which is: delta_cfwhole/wt_cfwhole.
## Found table with delta_cfwhole_vs_wt_cfwhole
## Used the inverse table, might need to -1 the logFC and stat.
## Warning in combine_single_de_table(li = limma, ed = edger, eb = ebseq, de =
## deseq, : The deseq table seems to be missing.
## Used the inverse table, might need to -1 the logFC.
## Warning in combine_single_de_table(li = limma, ed = edger, eb = ebseq, de =
## deseq, : The ebseq table seems to be missing.
## Used the inverse table, might need to -1 the logFC.
## Warning in combine_single_de_table(li = limma, ed = edger, eb = ebseq, de =
## deseq, : The basic table seems to be missing.
## 20181210 a pthread error in normalize.quantiles leads me to robust.
## Used Bon Ferroni corrected t test(s) between columns.
## Used Bon Ferroni corrected t test(s) between columns.
## Used Bon Ferroni corrected t test(s) between columns.
## Adding venn plots for wt_cfwhole.
## Limma expression coefficients for wt_cfwhole; R^2: 0.25; equation: y = 0.41x - 0.276
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_rect, x$x, x$y, x$width, x$height,
## resolveHJust(x$just, : semi-transparency is not supported on this device:
## reported only once per page
## Edger expression coefficients for wt_cfwhole; R^2: 0.532; equation: y = 0.859x + 1.78
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_rect, x$x, x$y, x$width, x$height,
## resolveHJust(x$just, : semi-transparency is not supported on this device:
## reported only once per page
## DESeq2 expression coefficients for wt_cfwhole; R^2: 0.677; equation: y = 0.834x + 1.61
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_rect, x$x, x$y, x$width, x$height,
## resolveHJust(x$just, : semi-transparency is not supported on this device:
## reported only once per page
## Adding venn plots for delta_cfwhole.
## Limma expression coefficients for delta_cfwhole; R^2: 0.21; equation: y = 0.378x - 0.222
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_rect, x$x, x$y, x$width, x$height,
## resolveHJust(x$just, : semi-transparency is not supported on this device:
## reported only once per page
## Edger expression coefficients for delta_cfwhole; R^2: 0.0469; equation: y = 0.15x + 16.1
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_rect, x$x, x$y, x$width, x$height,
## resolveHJust(x$just, : semi-transparency is not supported on this device:
## reported only once per page
## DESeq2 expression coefficients for delta_cfwhole; R^2: 0.515; equation: y = 0.739x + 4.53
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_rect, x$x, x$y, x$width, x$height,
## resolveHJust(x$just, : semi-transparency is not supported on this device:
## reported only once per page
## Adding venn plots for whole_deltawt.
## Limma expression coefficients for whole_deltawt; R^2: 0.882; equation: y = 0.898x + 0.164
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_rect, x$x, x$y, x$width, x$height,
## resolveHJust(x$just, : semi-transparency is not supported on this device:
## reported only once per page
## Edger expression coefficients for whole_deltawt; R^2: 0.972; equation: y = 0.984x + 0.17
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_rect, x$x, x$y, x$width, x$height,
## resolveHJust(x$just, : semi-transparency is not supported on this device:
## reported only once per page
## DESeq2 expression coefficients for whole_deltawt; R^2: 0.982; equation: y = 0.955x + 0.651
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_rect, x$x, x$y, x$width, x$height,
## resolveHJust(x$just, : semi-transparency is not supported on this device:
## reported only once per page
## Adding venn plots for cf_deltawt.
## Limma expression coefficients for cf_deltawt; R^2: 0.766; equation: y = 0.85x - 0.093
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_rect, x$x, x$y, x$width, x$height,
## resolveHJust(x$just, : semi-transparency is not supported on this device:
## reported only once per page
## Edger expression coefficients for cf_deltawt; R^2: 0.944; equation: y = 0.965x + 0.733
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_rect, x$x, x$y, x$width, x$height,
## resolveHJust(x$just, : semi-transparency is not supported on this device:
## reported only once per page
## DESeq2 expression coefficients for cf_deltawt; R^2: 0.965; equation: y = 0.926x + 1.74
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_rect, x$x, x$y, x$width, x$height,
## resolveHJust(x$just, : semi-transparency is not supported on this device:
## reported only once per page
## Adding venn plots for rofr.
## Writing summary information.
## Attempting to add the comparison plot to pairwise_summary at row: 27 and column: 1
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Performing save of the workbook.
## Writing a legend of columns.
## The count is: 1 and the test is: edger.
## Writing excel data according to edger for wt_cfwhole: 1/5.
## After (adj)p filter, the up genes table has 111 genes.
## After (adj)p filter, the down genes table has 470 genes.
## After fold change filter, the up genes table has 111 genes.
## After fold change filter, the down genes table has 470 genes.
## Writing excel data according to edger for delta_cfwhole: 2/5.
## After (adj)p filter, the up genes table has 135 genes.
## After (adj)p filter, the down genes table has 309 genes.
## After fold change filter, the up genes table has 135 genes.
## After fold change filter, the down genes table has 309 genes.
## Writing excel data according to edger for whole_deltawt: 3/5.
## After (adj)p filter, the up genes table has 13 genes.
## After (adj)p filter, the down genes table has 15 genes.
## After fold change filter, the up genes table has 13 genes.
## After fold change filter, the down genes table has 14 genes.
## Writing excel data according to edger for cf_deltawt: 4/5.
## After (adj)p filter, the up genes table has 188 genes.
## After (adj)p filter, the down genes table has 148 genes.
## After fold change filter, the up genes table has 187 genes.
## After fold change filter, the down genes table has 148 genes.
## Writing excel data according to edger for rofr: 5/5.
## After (adj)p filter, the up genes table has 117 genes.
## After (adj)p filter, the down genes table has 100 genes.
## After fold change filter, the up genes table has 114 genes.
## After fold change filter, the down genes table has 100 genes.
## Printing significant genes to the file: excel/20190501_siglfc0.75_proteins_all-v20190327.xlsx
## 1/5: Creating significant table up_1edger_wt_cfwhole
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## 2/5: Creating significant table up_2edger_delta_cfwhole
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## 3/5: Creating significant table up_3edger_whole_deltawt
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## 4/5: Creating significant table up_4edger_cf_deltawt
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## 5/5: Creating significant table up_5edger_rofr
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Adding significance bar plots.
## Writing a legend of columns.
## The count is: 1 and the test is: edger.
## Writing excel data according to edger for wt_cfwhole: 1/5.
## After (adj)p filter, the up genes table has 418 genes.
## After (adj)p filter, the down genes table has 926 genes.
## After fold change filter, the up genes table has 299 genes.
## After fold change filter, the down genes table has 801 genes.
## Writing excel data according to edger for delta_cfwhole: 2/5.
## After (adj)p filter, the up genes table has 516 genes.
## After (adj)p filter, the down genes table has 835 genes.
## After fold change filter, the up genes table has 385 genes.
## After fold change filter, the down genes table has 712 genes.
## Writing excel data according to edger for whole_deltawt: 3/5.
## After (adj)p filter, the up genes table has 502 genes.
## After (adj)p filter, the down genes table has 833 genes.
## After fold change filter, the up genes table has 159 genes.
## After fold change filter, the down genes table has 304 genes.
## Writing excel data according to edger for cf_deltawt: 4/5.
## After (adj)p filter, the up genes table has 678 genes.
## After (adj)p filter, the down genes table has 640 genes.
## After fold change filter, the up genes table has 455 genes.
## After fold change filter, the down genes table has 390 genes.
## Writing excel data according to edger for rofr: 5/5.
## After (adj)p filter, the up genes table has 788 genes.
## After (adj)p filter, the down genes table has 569 genes.
## After fold change filter, the up genes table has 534 genes.
## After fold change filter, the down genes table has 338 genes.
## Printing significant genes to the file: excel/20190501_siglfc0.75_proteins_all-v20190327.xlsx
## 1/5: Creating significant table up_1edger_wt_cfwhole
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## 2/5: Creating significant table up_2edger_delta_cfwhole
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## 3/5: Creating significant table up_3edger_whole_deltawt
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## 4/5: Creating significant table up_4edger_cf_deltawt
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## 5/5: Creating significant table up_5edger_rofr
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Adding significance bar plots.
## Writing a legend of columns.
## The count is: 1 and the test is: edger.
## Writing excel data according to edger for wt_cfwhole: 1/5.
## After (adj)p filter, the up genes table has 144 genes.
## After (adj)p filter, the down genes table has 167 genes.
## After fold change filter, the up genes table has 123 genes.
## After fold change filter, the down genes table has 105 genes.
## Writing excel data according to edger for delta_cfwhole: 2/5.
## After (adj)p filter, the up genes table has 164 genes.
## After (adj)p filter, the down genes table has 143 genes.
## After fold change filter, the up genes table has 142 genes.
## After fold change filter, the down genes table has 87 genes.
## Writing excel data according to edger for whole_deltawt: 3/5.
## After (adj)p filter, the up genes table has 97 genes.
## After (adj)p filter, the down genes table has 199 genes.
## After fold change filter, the up genes table has 40 genes.
## After fold change filter, the down genes table has 106 genes.
## Writing excel data according to edger for cf_deltawt: 4/5.
## After (adj)p filter, the up genes table has 134 genes.
## After (adj)p filter, the down genes table has 177 genes.
## After fold change filter, the up genes table has 67 genes.
## After fold change filter, the down genes table has 80 genes.
## Writing excel data according to edger for rofr: 5/5.
## After (adj)p filter, the up genes table has 180 genes.
## After (adj)p filter, the down genes table has 131 genes.
## After fold change filter, the up genes table has 106 genes.
## After fold change filter, the down genes table has 68 genes.
## Printing significant genes to the file: excel/20190501_siglfc0.75_proteins_prefiltered-v20190327.xlsx
## 1/5: Creating significant table up_1edger_wt_cfwhole
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## 2/5: Creating significant table up_2edger_delta_cfwhole
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## 3/5: Creating significant table up_3edger_whole_deltawt
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## 4/5: Creating significant table up_4edger_cf_deltawt
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## 5/5: Creating significant table up_5edger_rofr
## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): semi-
## transparency is not supported on this device: reported only once per page
## Adding significance bar plots.

10.2.1 Show a few metrics from the hpgltools pairwise comparisons

## Writing the image to: images/de_heat.png and calling dev.off().

11 For each msstats run, do a DE table

11.1 wt_cf vs wt_whole

## Error in eval(expr, envir, enclos): object 'msstats_results' not found
## Error in eval(expr, envir, enclos): object 'msstats_result' not found
## Error in combine_de_tables(extra_lfc_filter_de, keepers = keepers, extra_annot = msstats_result, : object 'msstats_result' not found
## Error in eval(expr, envir, enclos): object 'wtcf_nobatch_wtwhole_tables' not found
## Error in cor.test(comp_table$log2fc, comp_table$deseq_logfc, method = "spearman"): object 'comp_table' not found
## Error in cor.test(comp_table$log2fc, comp_table$limma_logfc, method = "spearman"): object 'comp_table' not found
## Error in cor.test(comp_table$log2fc, comp_table$ebseq_logfc, method = "spearman"): object 'comp_table' not found
## Error in extract_significant_genes(wtcf_nobatch_wtwhole_tables, lfc = 0.75, : object 'wtcf_nobatch_wtwhole_tables' not found

11.2 delta_cf vs delta_whole

## Error in eval(expr, envir, enclos): object 'msstats_results' not found
## Error in eval(expr, envir, enclos): object 'msstats_result' not found
## Error in combine_de_tables(extra_lfc_filter_de, keepers = keepers, extra_annot = msstats_result, : object 'msstats_result' not found
## Error in eval(expr, envir, enclos): object 'deltacf_deltawhole_tables' not found
## Error in cor.test(comp_table$log2fc, comp_table$deseq_logfc, method = "spearman"): object 'comp_table' not found
## Error in data.frame(df[, c(1, 2)]): object 'comp_table' not found
## NULL
## Error in extract_significant_genes(deltacf_deltawhole_tables, lfc = 0.75, : object 'deltacf_deltawhole_tables' not found

11.3 delta_cf vs wt_cf

Something is weird about this particular contrast, I keep getting negative correlations!

## Error in eval(expr, envir, enclos): object 'msstats_results' not found
## Error in eval(expr, envir, enclos): object 'msstats_result' not found
## Error in combine_de_tables(extra_lfc_filter_de, keepers = keepers, extra_annot = msstats_result, : object 'msstats_result' not found
## Error in eval(expr, envir, enclos): object 'deltacf_wtcf_tables' not found
## Error in cor.test(comp_table$log2fc, comp_table$deseq_logfc, method = "spearman"): object 'comp_table' not found
## Error in cor.test(comp_table$log2fc, comp_table$limma_logfc, method = "spearman"): object 'comp_table' not found
## Error in cor.test(comp_table$log2fc, comp_table$edger_logfc, method = "spearman"): object 'comp_table' not found
## Error in cor.test(comp_table$log2fc, comp_table$ebseq_logfc, method = "spearman"): object 'comp_table' not found
## Error in data.frame(df[, c(1, 2)]): object 'comp_table' not found
## Error in eval(expr, envir, enclos): object 'scatter_test' not found
## Error in eval(expr, envir, enclos): object 'comp_table' not found
## Error in eval(expr, envir, enclos): object 'comp_table' not found
## Error in eval(expr, envir, enclos): object 'com_table' not found
## Error in com_table[neg_inf_idx, "log2fc"] <- -100: object 'com_table' not found
## Error in eval(expr, envir, enclos): object 'com_table' not found
## Error in com_table[pos_inf_idx, "log2fc"] <- 100: object 'com_table' not found
## Error in cor.test(com_table$log2fc, com_table$deseq_logfc): object 'com_table' not found
## Error in data.frame(df[, c(1, 2)]): object 'com_table' not found
## Error in extract_significant_genes(deltacf_wtcf_tables, lfc = 0.75, p = 0.1, : object 'deltacf_wtcf_tables' not found

Ok, I have been resisting this, but lets look more closely at the data for this contrast. One thing I can do to look for correctness in what I am seeing is to look at the mean numerators/denominators for this contrast and see if they agree with msstats or the others.

11.4 delta_whole vs wt_whole

## Error in eval(expr, envir, enclos): object 'msstats_results' not found
## Error in eval(expr, envir, enclos): object 'msstats_result' not found
## Error in combine_de_tables(extra_lfc_filter_de, keepers = keepers, extra_annot = msstats_result, : object 'msstats_result' not found
## Error in eval(expr, envir, enclos): object 'deltawhole_wtwhole_tables' not found
## Error in cor.test(comp_table$log2fc, comp_table$deseq_logfc, method = "spearman"): object 'comp_table' not found
## Error in extract_significant_genes(deltawhole_wtwhole_tables, lfc = 0.75, : object 'deltawhole_wtwhole_tables' not found

11.5 Venn of MSstats vs. others for wt whole/cf

Najib asked for the overlap in perceived significance.

## Error in eval(expr, envir, enclos): object 'wtcf_nobatch_wtwhole_tables' not found
## Error in start[["adjpvalue"]]: object of type 'closure' is not subsettable
## Error in `[<-`(`*tmp*`, na_idx, "adjpvalue", value = 1): object 'na_idx' not found
## Error in start[["adjpvalue"]]: object of type 'closure' is not subsettable
## Error in start[["deseq_adjp"]]: object of type 'closure' is not subsettable
## Error in start[["edger_adjp"]]: object of type 'closure' is not subsettable
## Error in start[["limma_adjp"]]: object of type 'closure' is not subsettable
## Error in start[["ebseq_ppee"]]: object of type 'closure' is not subsettable
## Error in start[ms_sig_idx, ]: object 'ms_sig_idx' not found
## Error in start[de_sig_idx, ]: object 'de_sig_idx' not found
## Error in start[ed_sig_idx, ]: object 'ed_sig_idx' not found
## Error in start[lm_sig_idx, ]: object 'lm_sig_idx' not found
## Error in start[eb_sig_idx, ]: object 'eb_sig_idx' not found
## Error in rownames(ms_sig): object 'ms_sig' not found
## Error in Vennerable::Venn(Sets = method_lst): object 'method_lst' not found
## Error in Vennerable::plot(method_venn, doWeights = FALSE): object 'method_venn' not found

13 Ontology tests

13.1 Wt CF vs Wt whole

## The species being downloaded is: Mycobacterium tuberculosis H37Rv and is being downloaded as 83332.tab.
## # A tibble: 6 x 2
##   sysName GO        
##   <chr>   <chr>     
## 1 Rv0001  GO:0006270
## 2 Rv0001  GO:0006275
## 3 Rv0001  GO:0003688
## 4 Rv0001  GO:0017111
## 5 Rv0001  GO:0005524
## 6 Rv0002  GO:0006260
## Error in eval(expr, envir, enclos): object 'wtcf_nobatch_wtwhole_sig' not found
## Error in simple_goseq(sig_genes = wtcf_wtwhole_up, go_db = mtb_go, length_db = mtb_lengths, : object 'wtcf_wtwhole_up' not found
## Error in eval(expr, envir, enclos): object 'wtcf_wtwhole_up_goseq_nb' not found
## Error in eval(expr, envir, enclos): object 'wtcf_nobatch_wtwhole_sig' not found
## Error in simple_goseq(sig_genes = wtcf_wtwhole_down, go_db = mtb_go, length_db = mtb_lengths, : object 'wtcf_wtwhole_down' not found
## Error in eval(expr, envir, enclos): object 'wtcf_wtwhole_down_goseq_nb' not found

13.2 delta CF vs delta whole

## Error in eval(expr, envir, enclos): object 'deltacf_deltawhole_sig' not found
## Error in simple_goseq(sig_genes = deltacf_deltawhole_up, go_db = mtb_go, : object 'deltacf_deltawhole_up' not found
## Error in eval(expr, envir, enclos): object 'deltacf_deltawhole_up_goseq_nb' not found
## Error in eval(expr, envir, enclos): object 'deltacf_deltawhole_sig' not found
## Error in simple_goseq(sig_genes = deltacf_deltawhole_down, go_db = mtb_go, : object 'deltacf_deltawhole_down' not found
## Error in eval(expr, envir, enclos): object 'deltacf_deltawhole_down_goseq_nb' not found

13.3 delta filtrate vs wt filtrate

## Error in eval(expr, envir, enclos): object 'deltacf_wtcf_sig' not found
## Error in simple_goseq(sig_genes = deltacf_wtcf_up, go_db = mtb_go, length_db = mtb_lengths, : object 'deltacf_wtcf_up' not found
## Error in eval(expr, envir, enclos): object 'deltacf_wtcf_up_goseq_nb' not found
## Error in eval(expr, envir, enclos): object 'deltacf_wtcf_sig' not found
## Error in simple_goseq(sig_genes = deltacf_wtcf_down, go_db = mtb_go, length_db = mtb_lengths, : object 'deltacf_wtcf_down' not found
## Error in eval(expr, envir, enclos): object 'deltacf_wtcf_down_goseq_nb' not found

13.4 delta whole vs wt whole

## Error in eval(expr, envir, enclos): object 'deltawhole_wtwhole_sig' not found
## Error in simple_goseq(sig_genes = deltawhole_wtwhole_up, go_db = mtb_go, : object 'deltawhole_wtwhole_up' not found
## Error in eval(expr, envir, enclos): object 'deltawhole_wtwhole_up_goseq_nb' not found
## Error in eval(expr, envir, enclos): object 'deltawhole_wtwhole_sig' not found
## Error in simple_goseq(sig_genes = deltawhole_wtwhole_down, go_db = mtb_go, : object 'deltawhole_wtwhole_down' not found
## Error in eval(expr, envir, enclos): object 'deltawhole_wtwhole_down_goseq_nb' not found

14 circos

## Trying attempt: rtracklayer::import.gff3(gff, sequenceRegionsAsSeqinfo=TRUE)
## Trying attempt: rtracklayer::import.gff3(gff, sequenceRegionsAsSeqinfo=FALSE)
## Had a successful gff import with rtracklayer::import.gff3(gff, sequenceRegionsAsSeqinfo=FALSE)
## Returning a df with 15 columns and 4093 rows.
## The species being downloaded is: Mycobacterium tuberculosis H37Rv
## Warning: Setting row names on a tibble is deprecated.
## This assumes you have a colors.conf in circos/colors/ and fonts.conf in circos/fonts/
## It also assumes you have conf/ideogram.conf, conf/ticks.conf, and conf/housekeeping.conf
## It will write circos/conf/mtb.conf with a reasonable first approximation config file.
## Wrote karyotype to circos/conf/ideograms/mtb.conf
## This should match the karyotype= line in mtb.conf
## Wrote ticks to circos/conf/ticks_mtb.conf
## Wrote karyotype to circos/conf/karyotypes/mtb.conf
## This should match the karyotype= line in mtb.conf
## Writing data file: circos/data/mtb_plus_go.txt with the + strand GO data.
## Writing data file: circos/data/mtb_minus_go.txt with the - strand GO data.
## Wrote the +/- config files.  Appending their inclusion to the master file.
## Returning the inner width: 0.84.  Use it as the outer for the next ring.
## Warning in merge.data.frame(df, annot_df, by = "row.names"): column name
## 'Row.names' is duplicated in the result
## Writing data file: circos/data/mtb_wt_wholewt_whole_heatmap.txt with the wt_wholewt_whole column.
## Warning in merge.data.frame(df, annot_df, by = "row.names"): column name
## 'Row.names' is duplicated in the result
## Writing data file: circos/data/mtb_delta_wholedelta_whole_heatmap.txt with the delta_wholedelta_whole column.
## Warning in merge.data.frame(df, annot_df, by = "row.names"): column name
## 'Row.names' is duplicated in the result
## Writing data file: circos/data/mtb_wt_filtratewt_filtrate_heatmap.txt with the wt_filtratewt_filtrate column.
## Warning in merge.data.frame(df, annot_df, by = "row.names"): column name
## 'Row.names' is duplicated in the result
## Writing data file: circos/data/mtb_delta_filtratedelta_filtrate_heatmap.txt with the delta_filtratedelta_filtrate column.
## Warning in merge.data.frame(df, annot_df, by = "row.names"): column name
## 'Row.names' is duplicated in the result
## Writing data file: circos/data/mtb_delta_filtrate_heatdelta_filtrate_heatmap.txt with the delta_filtrate_heatdelta_filtrate column.

circos plot

15 TODO

  • 2018-04-10: Make sure my invocations of SWATH2stats/MSstats are correct.
## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset ecdbbe5efd47607d11e6bdd0b8ebf489a1cef51a
## This is hpgltools commit: Mon Apr 29 16:23:57 2019 -0400: ecdbbe5efd47607d11e6bdd0b8ebf489a1cef51a
## Saving to 03_swath2stats_20190327-v20190327.rda.xz
## Error in save(list = ls(all.names = TRUE, envir = globalenv()), envir = globalenv(), : error writing to connection
---
title: "M.tuberculosis 20181112: Most samples! Analyzing data from EncyclopeDIA and OpenSwathWorkFlow/TRIC."
author: "atb abelew@gmail.com"
date: "`r Sys.Date()`"
output:
  rmdformats::readthedown:
    code_download: true
    code_folding: show
    df_print: paged
    fig_caption: true
    fig_height: 7
    fig_width: 7
    highlight: tango
    width: 300
    keep_md: false
    mode: selfcontained
    toc_float: true
  BiocStyle::html_document:
    code_download: true
    code_folding: show
    fig_caption: true
    fig_height: 7
    fig_width: 7
    highlight: tango
    keep_md: false
    mode: selfcontained
    toc_float: true
  html_document:
    code_download: true
    code_folding: show
    fig_caption: true
    fig_height: 7
    fig_width: 7
    highlight: tango
    keep_md: false
    mode: selfcontained
    number_sections: true
    self_contained: true
    theme: readable
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
---

<style type="text/css">
body, td {
  font-size: 16px;
}
code.r{
  font-size: 16px;
}
pre {
 font-size: 16px
}
</style>

```{r options, include=FALSE}
library("hpgltools")
tt <- devtools::load_all("~/hpgltools")
knitr::opts_knit$set(width=120,
                     progress=TRUE,
                     verbose=TRUE,
                     echo=TRUE)
knitr::opts_chunk$set(error=TRUE,
                      dpi=96)
old_options <- options(digits=4,
                       stringsAsFactors=FALSE,
                       knitr.duplicate.label="allow")
ggplot2::theme_set(ggplot2::theme_bw(base_size=10))
previous_file <- "01_annotation_20181112.Rmd"
rundate <- format(Sys.Date(), format="%Y%m%d")
ver <- "20181112"
tmp <- try(sm(loadme(filename=paste0(gsub(pattern="\\.Rmd", replace="", x=previous_file), "-v", ver, ".rda.xz"))))

ver <- "20190327"
rmd_file <- paste0("03_swath2stats_", ver, ".Rmd")
```

# 20190228

Today in lab meeting, Dallas showed a couple of Western blots which suggest that
esxG, esxH, and lpqH are not changing when looking at whole cell lysates between
the wild-type and mutant samples.

I would therefore like to step through the data and figure out if I can pinpoint
the source of the discrepency.

# 20180913

This is a copy of 20180817 but where I will further simplify to include only the
wt/mutant and only the most recent technical replicate.

Therefore, the first thing I did was to copy the sample sheet and remove most of
the rows, leaving behind only the 12 which comprise the 3 biological replicates
for wt/mutant and the technical run dated 20180817.

# Notes

1.  Remove complement.
2.  Work only with newest samples.
3.  Define intersections between limma/edger/deseq/msstats for multiple thresholds.
4.  Ontology tests

# New samples!

In this copy of the worksheet, I am going to deal _only_ with the tuberculist
libraries, and only with the most recent technical replicate of the data.

This worksheet is a copy of 20180806 but this time with feeling! (or new samples)
A minor caveat: I had some difficulties converting the new data to a openMS
suitable format.  I am not certain if I messed up a priori, or if a change in
the version of the software on the spec caused the problem, but for future
reference:

To properly convert the data to mzXML, make sure to use the TPP MSConvert
utility and have 'TPP' compatibility on.

Analyzing data from openMS and friends.
=======================================

## SWATH2stats preprocessing

I am using my slightly modified copy of SWATH2stats.  This seeks to ensure that
changes in the case of columns in the metadata from one version of OpenMS to
another do not trouble me.

### Creating a swath2stats experiment using the tuberculist-derived library data

There is one important caveat in the following block: I used a regex to remove
the second half of geneID_geneName so that later when I merge in the annotation
data I have it will match.

## Notes 20180913

If I want to subset like this, I need to delete the pyprophet outputs
for anything not included in this set and rerun feature_alignment.py.

## Notes 20190430

In my recent run of this report, I accidently used a different version of
SWATH2stats and so much of this failed.

```{r tb_swath2stats_initial}
tric_data <- read.csv(
  paste0("results/tric/", ver, "/whole_8mz_tuberculist/comet_HCD.tsv"), sep="\t")
tric_data[["ProteinName"]] <- gsub(pattern="^(.*)_.*$", replacement="\\1",
                                   x=tric_data[["ProteinName"]])
sample_annot <- extract_metadata(paste0("sample_sheets/Mtb_dia_samples_", ver, ".xlsx"))
kept <- ! grepl(x=rownames(sample_annot), pattern="^s\\.\\.")
sample_annot <- sample_annot[kept, ]
devtools::load_all("~/scratch/git/SWATH2stats_myforked")
s2s_exp <- sample_annotation(data=tric_data,
                             sample_annotation=sample_annot,
                             fullpeptidename_column="fullpeptidename")
```

Now I have a couple data structures which should prove useful for the metrics
provided by SWATH2stats, MSstats, and my own hpgltools.

# SWATH2stats continued

The various metrics and filters provided by SWATH2stats seem quite reasonable to
me.  The only thing that really bothers me is that they are all case sensitive
and I found that the most recent tric changed the capitalization of a column,
causing these to all fall down.  Therefore I went in and made everything case
insensitive in a fashion similar to that done by MSstats (except I hate capital
letters, so I used tolower() rather than toupper()).

## Perform filters

The following block performs the metrics and filters suggested by swath2stats.
These first define the decoy hit rate in the data, then filter the data based on
that.  It also filters out hits with less than ideal m-scores and proteins with
non-optimal distributions of peptide hits (either due to too few peptides or a
weird distribution of intensities).

```{r tb_swath2stats_processing}
## Get correlations on a sample by sample basis
pp(file="images/20190327_tb_swath2stats_sample_cor.png")
sample_cor <- plot_correlation_between_samples(
  s2s_exp, size=2,
  fun.aggregate=mean,
  comparison=transition_group_id ~ condition + bioreplicate,
  column.values="intensity")
dev.off()
sample_cond_rep_cor <- plot_correlation_between_samples(
  s2s_exp, size=2,
  comparison=transition_group_id ~
    condition + bioreplicate + run,
  fun.aggregate=mean,
  column.values="intensity")

decoy_lists <- assess_decoy_rate(s2s_exp)
## This seems a bit high to me, yesno?
fdr_overall <- assess_fdr_overall(s2s_exp, output="Rconsole", plot=TRUE)

byrun_fdr <- assess_fdr_byrun(s2s_exp, FFT=0.7, plot=TRUE, output="Rconsole")
chosen_mscore <- mscore4assayfdr(s2s_exp, FFT=0.7, fdr_target=0.02)
prot_score <- mscore4protfdr(s2s_exp, FFT=0.7, fdr_target=0.02)

filtered_ms <- filter_mscore(s2s_exp, chosen_mscore)
filtered_fq <- filter_mscore_freqobs(s2s_exp, 0.01, 0.8, rm.decoy=FALSE)
filtered_ms_fdr <- filter_mscore_fdr(filtered_ms, FFT=0.7,
                                     overall_protein_fdr_target=prot_score,
                                     upper_overall_peptide_fdr_limit=0.05)
filtered_ms_fdr_pr <- filter_proteotypic_peptides(filtered_ms_fdr)
filtered_ms_fdr_pr_all <- filter_all_peptides(filtered_ms_fdr_pr)
filtered_ms_fdr_pr_all_str <- filter_on_max_peptides(data=filtered_ms_fdr_pr_all, n_peptides=10)
filtered_all_filters <- filter_on_min_peptides(data=filtered_ms_fdr_pr_all_str, n_peptides=3)
```

## Write out matrices of the results

swath2stats provides a couple of ways to print out its results, one in a format
specifically intended for MSstats, and another as a more canonical matrix of
rows = proteins, columns = samples.

```{r swath2stats_matrices}
## I think these matrixes are probably smarter to use than the raw outmatrix from tric.
## But I am not a fan of rerwriting the sample column names.
matrix_prefix <- file.path("results", "swath2stats", ver)
if (!file.exists(matrix_prefix)) {
  dir.create(matrix_prefix)
}
protein_matrix_all <- write_matrix_proteins(
  s2s_exp, write.csv=TRUE,
  filename=file.path(matrix_prefix, "protein_all.csv"))
dim(protein_matrix_all)
protein_matrix_mscore <- write_matrix_proteins(
  filtered_ms, write.csv=TRUE,
  filename=file.path(matrix_prefix, "protein_matrix_mscore.csv"))
dim(protein_matrix_mscore)
peptide_matrix_mscore <- write_matrix_peptides(
  filtered_ms, write.csv=TRUE,
  filename=file.path(matrix_prefix, "peptide_matrix_mscore.csv"))
dim(peptide_matrix_mscore)
protein_matrix_filtered <- write_matrix_proteins(
  filtered_all_filters, write.csv=TRUE,
  filename=file.path(matrix_prefix, "protein_matrix_filtered.csv"))
dim(protein_matrix_filtered)
peptide_matrix_filtered <- write_matrix_peptides(
  filtered_all_filters, write.csv=TRUE,
  filename=file.path(matrix_prefix, "peptide_matrix_filtered.csv"))
dim(peptide_matrix_filtered)

rt_cor <- plot_correlation_between_samples(
  filtered_all_filters, column.values="intensity", fun.aggregate=sum, size=2)
## I have no effing clue what this plot means.
variation <- plot_variation(filtered_all_filters, fun.aggregate=sum)

cols <- colnames(filtered_all_filters)
disaggregated <- disaggregate(filtered_all_filters, all.columns=TRUE)
msstats_input <- convert_MSstats(disaggregated)
```

# Test aLFQ

I want to revisit aLFQ, I think it might provide better protein-level
quantification methods.  aLFQ looks promising, but I have not figured out valid
parameters for using it.

```{r alfq, eval=FALSE}
summary(msstats_input)
devtools::load_all("~/scratch/git/aLFQ")
alfq_input <- tric_data[, c("align_origfilename", "ProteinName", "FullPeptideName", "transition_group_id",
                            "FullPeptideName", "Charge", "Intensity")]
colnames(alfq_input) <- c("run_id", "protein_id", "peptide_id", "transition_id", "peptide_sequence",
                          "precursor_charge", "transition_intensity")
alfq_input[["concentration"]] <- "?"
alfq_inference <- aLFQ::ProteinInference.default(alfq_input, consensus_proteins=FALSE,
                                                 consensus_peptides=FALSE, transition_strictness="loose",
                                                 consensus_transitions=FALSE)
alfq_quantities <- aLFQ::AbsoluteQuantification.default(alfq_inference)
```

## Some new plots

In response to some interesting queries from Yan, I made a few little functions
which query and plot data from the scored data provided by openswath/pyprophet.
Let us look at their results here.

```{r tb_pyprophet_plots}
pyp_metadata <- glue::glue("sample_sheets/Mtb_dia_samples_{ver}.xlsx")
pyprophet_fun <- extract_pyprophet_data(metadata=pyp_metadata,
                                       pyprophet_column="diascored")
mass_plot <- sm(plot_pyprophet_distribution(pyprophet_fun, column="mass",
                                            expt_names="figurename"))
mass_plot[["violin"]]

deltart_plot_all <- sm(plot_pyprophet_distribution(
  pyprophet_fun, column="delta_rt", expt_names="figurename"))
deltart_plot_all[["violin"]]

deltart_plot_real <- sm(plot_pyprophet_distribution(
  pyprophet_fun, expt_names="figurename",
  column="delta_rt", keep_decoys=FALSE))
deltart_plot_real[["violin"]]

deltart_plot_decoys <- sm(plot_pyprophet_distribution(
  pyprophet_fun, expt_names="figurename",
  column="delta_rt", keep_real=FALSE))
deltart_plot_decoys[["violin"]]

intensities_esxG <- sm(plot_pyprophet_protein(pyprophet_fun, expt_names="figurename",
                                              column="intensity", protein="Rv0287"))
intensities_esxG
drt_esxG <- sm(plot_pyprophet_protein(pyprophet_fun, expt_names="figurename",
                                      column="delta_rt", protein="Rv0287",
                                      min_data=100, max_data=1000))
drt_esxG

intensities_esxH <- plot_pyprophet_protein(pyprophet_fun, expt_names="figurename",
                                           column="intensity", protein="Rv0288")
intensities_esxH
drt_esxH <- plot_pyprophet_protein(pyprophet_fun, expt_names="figurename",
                                   column="delta_rt", protein="Rv0288", min_data=100, max_data=1000)
drt_esxH

intensities_lpqH <- plot_pyprophet_protein(pyprophet_fun, expt_names="figurename",
                                           column="intensity", protein="lpqH")
intensities_lpqH
drt_lpqH <- plot_pyprophet_protein(pyprophet_fun, expt_names="figurename",
                                   column="delta_rt", protein="lpqH", min_data=100, max_data=1000)
drt_lpqH


plot_pyprophet_protein(pyprophet_fun, expt_names="figurename",
                       column="intensity", protein="0562")
```

## MSstats

msstats.org seems to provide a complete solution for performing reasonable metrics of this data.

I am currently reading: http://msstats.org/wp-content/uploads/2017/01/MSstats_v3.7.3_manual.pdf

I made some moderately intrusive changes to MSstats to make it clearer, as well.

```{r tb_msstats_quant, eval=FALSE}
tt <- sm(devtools::load_all("~/scratch/git/MSstats"))
checkpoint <- paste0("msstats_dataprocess-v", ver, ".rda")
if (file.exists(checkpoint)) {
  load(file=checkpoint)
} else {
  msstats_quant <- dataProcess(msstats_input)
  save(file=checkpoint, list=c("msstats_quant"))
}
msstats_plots <- sm(dataProcessPlots(msstats_quant, type="QCPLOT"))

my_levels <- levels(as.factor(msstats_input$condition))
my_levels
comparisons <- make_simplified_contrast_matrix(
  numerators=c("wt_filtrate", "delta_filtrate", "delta_filtrate", "delta_whole"),
  denominators=c("wt_whole", "delta_whole", "wt_filtrate", "wt_whole"))

msstats_results <- list()
checkpoint <- paste0("msstats_group-v", ver, ".rda")
if (file.exists(checkpoint)) {
  load(file=checkpoint)
} else {
  for (c in 1:length(rownames(comparisons))) {
    name <- rownames(comparisons)[c]
    message("Starting ", name)
    comp <- comparisons[c, ]
    comp <- t(as.matrix(comp))
    rownames(comp) <- name
    msstats_results[[name]] <- sm(MSstats::groupComparison(contrast.matrix=comp,
                                                           data=msstats_quant))
    message("Finished ", name)
  }
  save(file=checkpoint, list=c("msstats_results"))
}
```

### P/PE protein QC plots for Yan

Yan asked for the p/pe protein qc plots. ok.  I changed the dataProcessPlots to
return something useful, so that should be possible now.

```{r pe, eval=FALSE}
pe_genes <- read.table("reference/annotated_pe_genes.txt")[[1]]

## Unfortunately, the names did not get set in my changed version of dataProcessPlots...
plotlst <- msstats_plots$QCPLOT
available_plots <- gsub(pattern="^1/", replacement="",
                        x=levels(msstats_quant$ProcessedData$PROTEIN))
names(plotlst) <- available_plots

pe_in_avail_idx <- pe_genes %in% available_plots
pe_in_avail <- pe_genes[pe_in_avail_idx]
pe_plots <- plotlst[pe_in_avail]
pdf(file="pe_qc_plots.pdf")
for (p in 1:length(pe_plots)) {
  plot(pe_plots[[p]])
}
dev.off()
length(pe_plots)
```

# Create hpgltools expressionset

Since I am not certain I understand these data, I will take the intensities from
SWATH2stats, metadata, and annotation data;  attempt to create a 'normal'
expressionset; poke at it to see what I can learn.

## Massaging the metadata

I want to use the same metadata as were used for MSstats.  It has a few
important differences from the requirements of hpgltools: pretty much only that
I do not allow rownames/sampleIDs to start with a number.

## Massaging the intensity matrix

I do not want the \1 before the protein names, I already merged them into one
entry per gene via SWATH2stats.

```{r tb_protein_matrix}
prot_mtrx <- read.csv(file.path("results", "swath2stats", ver, "protein_matrix_filtered.csv"))
rownames(prot_mtrx) <- gsub(pattern="^1\\/", replacement="", x=prot_mtrx[["proteinname"]])
prot_mtrx <- prot_mtrx[, -1]
## Important question: Did SWATH2stats reorder my data?
colnames(prot_mtrx) <- gsub(pattern="^(.*)(2018.*)$", replacement="s\\2", x=colnames(prot_mtrx))
```

## Merge the pieces

Now we should have sufficient pieces to make an expressionset.

While here, I will also split the data into a cf and whole-cell pair of data structures.

```{r tb_expt, fig.show="hide"}
## Drop the metadata not in the protein matrix:
## And ensure that they are the same order.
reordered <- colnames(prot_mtrx)
metadata <- sample_annot[reordered, ]
protein_expt <- sm(create_expt(metadata,
                               count_dataframe=prot_mtrx,
                               gene_info=mtb_annotations))

written_file <- glue::glue("excel/{rundate}_protein_expt-v{ver}.xlsx")
whole_expt <- subset_expt(protein_expt, subset="collectiontype=='whole'")
cf_expt <- subset_expt(protein_expt, subset="collectiontype=='filtrate'")
protein_write <- write_expt(protein_expt, expt_names="figurename",
                            violin=TRUE, batch="raw", excel=written_file)
```
## Metrics of the full data set

Generate some metrics of the full data set, later the same things will be
performed using the data subsets.  There is a weird caveat: sva fails under some
strange circumstances for this data.  It seems that if the data is cpmmed, then
sva sees the resulting matrix as containing singular values.  I am guessing that
this means that there are a bunch of low values in the filtrate data which, when
cpm(data) is performed get dropped to near 0 and therefore trip up sva.  There
are two likely ways around this:

1.  Do not cpm the data.
2.  Filter the data more aggressively so that there are no zero-values after cpm().

```{r tb_protein_metrics, fig.show='hide'}
protein_metrics <- sm(graph_metrics(protein_expt))
protein_norm <- sm(normalize_expt(protein_expt, transform="log2", convert="cpm",
                                  norm="quant", filter=TRUE))
protein_norm_metrics <- sm(graph_metrics(protein_norm))
protein_sva <- sm(normalize_expt(protein_expt, transform="log2",
                                 batch="svaseq", filter=TRUE))
protein_sva_metrics <- sm(graph_metrics(protein_sva))
```

## Metrics of the whole-cell data set

Here are some of the promised subset data.

```{r tb_whole_metrics, fig.show='hide'}
whole_metrics <- sm(graph_metrics(whole_expt))
whole_norm <- sm(normalize_expt(whole_expt, transform="log2", convert="cpm",
                                norm="quant", filter=TRUE))
whole_norm_metrics <- sm(graph_metrics(whole_norm))
whole_sva <- sm(normalize_expt(whole_expt, transform="log2", convert="cpm",
                               batch="svaseq", filter=TRUE))
whole_sva_metrics <- sm(graph_metrics(whole_sva))
```

## Variance partition

This is a potential argument against including only the newest technical
replicate of the data; as when all the technicals were included, these metrics
looked much more sensible.

```{r tb_variance}
whole_varpart <- simple_varpart(whole_norm)
whole_varpart$partition_plot
whole_cv <- plot_variance_coefficients(protein_expt)
whole_cv$disp
##tb_batch_cv <- plot_variance_coefficients(tb_protein_expt, x_axis="batch")
##tb_batch_cv$disp
##genotype_cv <- plot_variance_coefficients(protein_expt, x_axis="genotype")
##genotype_cv$disp
##tb_prep_cv <- plot_variance_coefficients(tb_protein_expt, x_axis="prepdate")
##tb_prep_cv$disp

vio <- plot_boxplot(protein_expt, violin=TRUE)
vio
```

## Metrics of the filtrate data set

Once again, here we have metrics of the subset data; this time of filtrate data.

```{r tb_cf_metrics, fig.show='hide'}
cf_metrics <- sm(graph_metrics(cf_expt))
cf_norm <- sm(normalize_expt(cf_expt, transform="log2", convert="cpm",
                             norm="quant", filter=TRUE))
cf_norm_metrics <- sm(graph_metrics(cf_norm))
cf_sva <- sm(normalize_expt(cf_expt, transform="log2", convert="cpm",
                             batch="svaseq", filter=TRUE))
cf_sva_metrics <- sm(graph_metrics(cf_sva))

## Get the distribution of the filtrate data
cf_filt <- normalize_expt(cf_expt, filter=TRUE)
sample_summary <- summary(exprs(cf_filt))

cf_remaining <- normalize_expt(cf_expt, filter="cbcb")
plot_density(cf_remaining)$plot
keeper_ids <- rownames(exprs(cf_remaining))
"Rv1746" %in% rownames(exprs(cf_remaining))
"Rv3019c" %in% rownames(exprs(cf_remaining))

changed_ids <- c("s2018_0502BrikenDIA04", "s2018_0817BrikenTrypsinDIA04", "s2018_0502BrikenDIA05",
                 "s2018_0726Briken05", "s2018_0817BrikenTrypsinDIA05", "s2018_0502BrikenDIA06",
                 "s2018_0726Briken06", "s2018_0817BrikenTrypsinDIA06", "s2018_0502BrikenDIA01",
                 "s2018_0817BrikenTrypsinDIA01", "s2018_0502BrikenDIA02", "s2018_0726Briken02",
                 "s2018_0817BrikenTrypsinDIA02", "s2018_0502BrikenDIA03",
                 "s2018_0817BrikenTrypsinDIA03")
new_fact <- rep("not_wt", length(changed_ids))
test <- set_expt_conditions(cf_expt, fact=new_fact, ids=changed_ids)
cf_merged <- mean_by_bioreplicate(test)
```

mean_by_bioreplicate() does the following:

1.  Extracts the expression matrix.
2.  Backfills all 0s with NA
3.  Performs a cpm on the result.
4.  Does a mean of all non-NA samples by biological replicate.

# Examine the mean by bioreplicate data

```{r bioreplicate}
cf_merged_norm <- normalize_expt(cf_merged, norm="quant", convert="cpm", filter="pofa", p=0.9,
                                 transform="log2")
plot_pca(cf_merged_norm)$plot
```

### Something fun for Najib

```{r pca3d}
pca3d <- plot_pca(cf_merged_norm)
pca3d$plot
silly <- plot_3d_pca(pca3d)
silly$plot
```

## Plot some metrics

In the previous blocks, I generated the metrics; in this block I will print them
with the assumption that some of them will end up being included in whatever
publication comes from this work; with that in mind I should probably change
this to sva or pdf or something not png.

```{r tb_print_metrics}
pp(image=protein_metrics$libsize,
   file=file.path("images", paste0(rundate, "_libsize.pdf")))
## It seems to me that the scale of the data is all within an order of magnitude or two.
## I cannot get used to these absurdly large numbers though.
pp(image=protein_norm_metrics$pcaplot,
   file=file.path("images", paste0(rundate, "_norm_pca.pdf")))
## There appears to be a nice split in the data, however the un-assayable batch
## effect is a problem.
pp(image=protein_norm_metrics$pcaplot,
   file=file.path("images", paste0(rundate, "_fsva_pca.pdfg")))
## fsva seems to get some handle on the data, but I don't think we should rely
## upon it.
pp(image=protein_norm_metrics$corheat,
   file=file.path("images", paste0(rundate, "_norm_corheat.pdf")))
## Once again, the whole-cell/culture-filtrate split is very large.
pp(image=protein_metrics$density,
   file=file.path("images", paste0(rundate, "_raw_density.pdf")))
## There are two obvious distributions in the data, once again split between types.
pp(image=protein_metrics$boxplot,
   file=file.path("images", paste0(rundate, "_boxplot.pdf")))
## This recapitulates the previous plot.

pp(image=whole_metrics$libsize,
   file=file.path("images", paste0(rundate, "_whole_libsize.pdf")))
pp(image=whole_norm_metrics$pcaplot,
   file=file.path("images", paste0(rundate, "_whole_norm_pca.pdf")))
pp(image=whole_sva_metrics$pcaplot,
   file=file.path("images", paste0(rundate, "_whole_fsva_pca.pdf")))
pp(image=whole_norm_metrics$corheat,
   file=file.path("images", paste0(rundate, "_whole_norm_corheat.pdf")))
pp(image=whole_metrics$density,
   file=file.path("images", paste0(rundate, "_whole_raw_density.pdf")))
pp(image=whole_metrics$boxplot,
   file=file.path("images", paste0(rundate, "_whole_boxplot.pdf")))

pp(image=cf_metrics$libsize,
   file=file.path("images", paste0(rundate, "_libsize.pdf")))
pp(image=cf_norm_metrics$pcaplot,
   file=file.path("images", paste0(rundate, "_norm_pca.pdf")))
pp(image=cf_sva_metrics$pcaplot,
   file=file.path("images", paste0(rundate, "_fsva_pca.pdf")))
pp(image=cf_norm_metrics$corheat,
   file=file.path("images", paste0(rundate, "_norm_corheat.pdf")))
pp(image=cf_metrics$density,
   file=file.path("images", paste0(rundate, "_raw_density.pdf")))
pp(image=cf_metrics$boxplot,
   file=file.path("images", paste0(rundate, "_boxplot.pdf")))
```

# Perform hpgltools Differential Expression Analyses

Earlier MSstats was used to contrast the various conditions in this data.  In
this block the same will be performed using the limma/deseq/edger/ebseq/basic
methods.  As an aside, running all of these methods in serial takes ~ 1/5th the
time of running any one step of MSstats.

## Idea from Volker

Given the initial pairwise data, look at wt edgeR results ratio
filtrate/culture; then set a cutoff as log2fc <= 0.75.  Proteins which survive
this cutoff are then used in the ratio of ratios and analyses of mutant/wt.

The survivors of this initial cutoff is the set of secreted proteins.
Compare this set with the set of known secreted proteins from other papers
(Cox).  Also Collins, Abesol (likely same as the synthetic Mtb library).

Separate, simultaneous filter:  Look at the distribution of the culture filtrate
data and filter out the (relatively) low-intensity bump.  Then take the
surviving set and perform all analyses with it.  In theory, these two filter
methods should get us to a similar place.

## lfc cutoff cutoff

As mentioned below, Volker had an interesting cutoff suggestion: keep only those
with lfc >= 0.75 filtrate/whole

```{r lfc_filter}
lfc_input <- exclude_genes_expt(protein_expt, method="keep", ids=keeper_ids)
test <- edger_pairwise(lfc_input)
query <- test[["all_tables"]][["wt_whole_vs_wt_filtrate"]]
## This one of course put the desired factor on the bottom, so I will look for lfc <= -0.75
lfc_keeper_idx <- query[["logFC"]] <= -0.75
lfc_keeper_ids <- rownames(query[lfc_keeper_idx, ])

## keeper_ids is comprised of all the filtrate IDs
test_venn_lst <- list("filtrate" = keeper_ids,
                      "lfc_sig" = lfc_keeper_ids)
test_venn <- Vennerable::Venn(Sets=test_venn_lst)
Vennerable::plot(test_venn)
```

```{r compressed_de, fig.show="hide"}
## Here I perform the same cutoff as shown in the density plots above.
## keeper_ids comes from section 'tb_cf_metrics' following filtering of the filtrate data.
input <- exclude_genes_expt(protein_expt, method="keep", ids=keeper_ids)
lfc_cutoff_input <- exclude_genes_expt(protein_expt, method="keep", ids=lfc_keeper_ids)

filter_de <- sm(all_pairwise(
  input, model_batch=FALSE, force=TRUE,
  do_ebseq=TRUE, parallel=FALSE))
lfc_filter_de <- sm(all_pairwise(
  lfc_cutoff_input, model_batch=FALSE, force=TRUE,
  do_ebseq=TRUE, parallel=FALSE))
## Interesting, when I run this interactively, no error, but it appears that the report
## had problems with it.
extra <- "delta_cfwhole_vs_wt_cfwhole=(delta_filtrate-delta_whole)-(wt_filtrate-wt_whole)"
extra_filter_de <- sm(all_pairwise(
  input, model_batch=FALSE, force=TRUE,
  do_ebseq=TRUE, parallel=FALSE,
  extra_contrasts=extra))
extra_lfc_filter_de <- sm(all_pairwise(
  lfc_cutoff_input, model_batch=FALSE, force=TRUE,
  do_ebseq=TRUE, parallel=FALSE,
  extra_contrasts=extra))

extra_keepers <- list(
  "wt_cfwhole" = c("wt_filtrate", "wt_whole"),
  "delta_cfwhole" = c("delta_filtrate", "delta_whole"),
  "whole_deltawt" = c("delta_whole", "wt_whole"),
  "cf_deltawt" = c("delta_filtrate", "wt_filtrate"),
  "rofr" = c("delta_cfwhole", "wt_cfwhole"))

all_contrast_file <- glue::glue("excel/{rundate}_combined_contrasts_all-v{ver}.xlsx")
extra_filter_tables <- combine_de_tables(
  extra_filter_de,
  keepers=extra_keepers,
  excel=all_contrast_file)

lfc_filt_contrast_file <- glue::glue("excel/{rundate}_combined_contrasts_prefiltered-v{ver}.xlsx")
  extra_lfc_filter_tables <- sm(combine_de_tables(
  extra_lfc_filter_de,
  keepers=extra_keepers,
  excel=lfc_filt_contrast_file))

## FIXME:
## No-intensity filtered all contrasts data:  {rundate}_combined_contrasts-{version}.xlsx
## Low-intensity filtered all contrasts data: {rundate}_combined_contrasts_li-{version}.xlsx
## For each of these, have significance filters: 2.0 and 0.75
##   {rundate}_combined_contrasts_li_lfc2.0-{version}.xlsx
##   {rundate}_combined_contrasts_li_lfc0.75-{version}.xlsx
## Note to self, I effed this up and these are not properly named below.
## Also, change these to 10% adjusted FDR.

sig_file <- glue::glue("excel/{rundate}_siglfc0.75_proteins_all-v{ver}.xlsx")
extra_filter_sig <- extract_significant_genes(
  extra_filter_tables, according_to="edger",
  lfc=0.75, p=0.10,
  excel=sig_file)

nop_sig_file <- glue::glue("excel/{rundate}_siglfc0.75nop_proteins_all-v{ver}.xlsx")
extra_filter_sig <- extract_significant_genes(
  extra_filter_tables, according_to="edger",
  lfc=0.75, p=1.0,
  excel=sig_file)

sig_file <- glue::glue("excel/{rundate}_siglfc0.75_proteins_prefiltered-v{ver}.xlsx")
extra_lfc_filter_sig <- sm(extract_significant_genes(
  extra_lfc_filter_tables, according_to="edger",
  lfc=0.75, p=0.10,
  excel=sig_file))

nop_sig_file <- glue::glue("excel/{rundate}_siglfc0.75nop_proteins_prefiltered-v{ver}.xlsx")
extra_filter_sig <- extract_significant_genes(
  extra_lfc_filter_tables, according_to="edger",
  lfc=0.75, p=1.0,
  excel=sig_file)
```

```{r play_de, eval=FALSE}
test_limma <- sm(limma_pairwise(input, model_batch=FALSE, extra_contrasts=extra))
test_edger <- sm(edger_pairwise(input, model_batch=FALSE, extra_contrasts=extra))
test_deseq <- sm(deseq_pairwise(input, model_batch=FALSE, extra_contrasts=extra, force=TRUE))
test_basic <- sm(basic_pairwise(input, model_batch=FALSE, extra_contrasts=extra, force=TRUE))
test_ebseq <- sm(ebseq_pairwise(input, model_batch=FALSE, extra_contrasts=extra, force=TRUE))
```

### Show a few metrics from the hpgltools pairwise comparisons

```{r show_de_testing}
pp(file="images/de_heat.png", image=extra_lfc_filter_de$comparison$heat)
extra_lfc_filter_de$comparison$heat
```

# For each msstats run, do a DE table

## wt_cf vs wt_whole

```{r tb_combine_wtcf_wtwhole, fig.show='hide'}
keepers <- list(
  "wtcf_vs_wtwhole" = c("wt_filtrate", "wt_whole"))
ms_set_name <- "wt_filtrate_vs_wt_whole"
msstats_result <- msstats_results[[ms_set_name]][["ComparisonResult"]]
droppers <- c("undefined")
names(droppers) <- "log2fc"
## Make sure to set the rownames so it will merge into the excel file.
rownames(msstats_result) <- msstats_result[["Protein"]]
combined_file <- glue::glue("excel/{rundate}_wtcf_vs_wtwhole_all-v{ver}.xlsx")
wtcf_nobatch_wtwhole_tables <- sm(combine_de_tables(
  extra_lfc_filter_de, keepers=keepers, extra_annot=msstats_result,
  excludes=droppers,
  excel=combined_file))
comp_table <- wtcf_nobatch_wtwhole_tables$data[[names(keepers)]]
cor.test(comp_table$log2fc, comp_table$deseq_logfc, method="spearman")
cor.test(comp_table$log2fc, comp_table$limma_logfc, method="spearman")
cor.test(comp_table$log2fc, comp_table$ebseq_logfc, method="spearman")
sig_file <- glue::glue("excel/{rundate}_wtcf_vs_wtwhole_siglfc0.75_proteins_all-v{ver}.xlsx")
wtcf_nobatch_wtwhole_sig <- sm(extract_significant_genes(
  wtcf_nobatch_wtwhole_tables, lfc=0.75, p=0.10,
  excel=sig_file))
```

## delta_cf vs delta_whole

```{r tb_combine_deltacf_deltawhole, fig.show='hide'}
keepers <- list(
  "deltacf_vs_deltawhole" = c("delta_filtrate", "delta_whole"))
ms_set_name <- "delta_filtrate_vs_delta_whole"
msstats_result <- msstats_results[[ms_set_name]][["ComparisonResult"]]
## Make sure to set the rownames so it will merge into the excel file.
rownames(msstats_result) <- msstats_result[["Protein"]]
combined_file <- glue::glue("excel/{rundate}_deltacf_vs_deltawhole_all-v{ver}.xlsx")
deltacf_deltawhole_tables <- sm(combine_de_tables(
  extra_lfc_filter_de, keepers=keepers, extra_annot=msstats_result,
  excel=combined_file))
comp_table <- deltacf_deltawhole_tables$data[[names(keepers)]]
cor.test(comp_table$log2fc, comp_table$deseq_logfc, method="spearman")
tt <- plot_linear_scatter(comp_table[, c("log2fc", "edger_logfc")])
tt$scatter

sig_file <- glue::glue("excel/{rundate}_deltacf_vs_deltawhole_siglfc0.75_proteins_all-v{ver}.xlsx")
deltacf_deltawhole_sig <- sm(extract_significant_genes(
  deltacf_deltawhole_tables, lfc=0.75, p=0.10,
  excel=sig_file))
```

## delta_cf vs wt_cf

Something is weird about this particular contrast, I keep getting negative correlations!

```{r tb_combine_deltacf_wtcf, fig.show='hide'}
keepers <- list(
  "deltacf_vs_wtcf" = c("delta_filtrate", "wt_filtrate"))
ms_set_name <- "delta_filtrate_vs_wt_filtrate"
msstats_result <- msstats_results[[ms_set_name]][["ComparisonResult"]]
rownames(msstats_result) <- msstats_result[["Protein"]]
combined_file <- glue::glue("excel/{rundate}_deltacf_vs_wtcf_all-v{ver}.xlsx")
deltacf_wtcf_tables <- sm(combine_de_tables(
  extra_lfc_filter_de, keepers=keepers, extra_annot=msstats_result,
  combined_file))

comp_table <- deltacf_wtcf_tables$data[[names(keepers)]]
cor.test(comp_table$log2fc, comp_table$deseq_logfc, method="spearman")
cor.test(comp_table$log2fc, comp_table$limma_logfc, method="spearman")
cor.test(comp_table$log2fc, comp_table$edger_logfc, method="spearman")
cor.test(comp_table$log2fc, comp_table$ebseq_logfc, method="spearman")
scatter_test <- plot_linear_scatter(comp_table[, c("log2fc", "limma_logfc")])
scatter_test$scatter
filtered_idx <- ! is.na(comp_table$log2fc)
com_table <- comp_table[filtered_idx, ]
neg_inf_idx <- com_table$log2fc == -Inf
com_table[neg_inf_idx, "log2fc"] <- -100
pos_inf_idx <- com_table$log2fc == Inf
com_table[pos_inf_idx, "log2fc"] <- 100
cor.test(com_table$log2fc, com_table$deseq_logfc)
scatter_test <- plot_linear_scatter(com_table[, c("log2fc", "edger_logfc")])

sig_file <- glue::glue("excel/{rundate}_deltacf_vs_wtcf_siglfc0.75_proteins_all-v{ver}.xlsx")
deltacf_wtcf_sig <- extract_significant_genes(
  deltacf_wtcf_tables, lfc=0.75, p=0.10,
  excel=sig_file)
```

Ok, I have been resisting this, but lets look more closely at the data for this
contrast.
One thing I can do to look for correctness in what I am seeing is to look at the
mean numerators/denominators for this contrast and see if they agree with
msstats or the others.

## delta_whole vs wt_whole

```{r tb_combine_deltawhole_wtwhole, fig.show='hide'}
keepers <- list(
  "deltawhole_vs_wtwhole" = c("delta_whole", "wt_whole"))
ms_set_name <- "delta_whole_vs_wt_whole"
msstats_result <- msstats_results[[ms_set_name]][["ComparisonResult"]]
rownames(msstats_result) <- msstats_result[["Protein"]]
combined_file <- glue::glue("excel/{rundate}_deltawhole_vs_wtwhole_all-v{ver}.xlsx")
deltawhole_wtwhole_tables <- sm(combine_de_tables(
  extra_lfc_filter_de, keepers=keepers, extra_annot=msstats_result,
  excel=combined_file))
comp_table <- deltawhole_wtwhole_tables$data[[names(keepers)]]
cor.test(comp_table$log2fc, comp_table$deseq_logfc, method="spearman")
sig_file <- glue::glue("excel/{rundate}_deltawhole_vs_wtwhole_siglfc0.75_proteins_all-v{ver}.xlsx")
deltawhole_wtwhole_sig <- sm(extract_significant_genes(
  deltawhole_wtwhole_tables, lfc=0.75, p=0.10,
  excel=sig_file))
```

## Venn of MSstats vs. others for wt whole/cf

Najib asked for the overlap in perceived significance.

```{r msstats_other_venn}
start <- wtcf_nobatch_wtwhole_tables[["data"]][[1]]
na_idx <- is.na(start[["adjpvalue"]])
start[na_idx, "adjpvalue"] <- 1
ms_sig_idx <- start[["adjpvalue"]] <= 0.05

de_sig_idx <- start[["deseq_adjp"]] <= 0.05
ed_sig_idx <- start[["edger_adjp"]] <= 0.05
lm_sig_idx <- start[["limma_adjp"]] <= 0.05
eb_sig_idx <- start[["ebseq_ppee"]] <= 0.05

ms_sig <- start[ms_sig_idx, ]
de_sig <- start[de_sig_idx, ]
ed_sig <- start[ed_sig_idx, ]
lm_sig <- start[lm_sig_idx, ]
eb_sig <- start[eb_sig_idx, ]

method_lst <- list(
  "msstats" = rownames(ms_sig),
  "deseq" = rownames(de_sig),
  "edger" = rownames(ed_sig),
  "limma" = rownames(lm_sig))
method_venn <- Vennerable::Venn(Sets=method_lst)
Vennerable::plot(method_venn, doWeights=FALSE)
```

# Intersections

```{r intersect}
chosen_intersection <- intersect_significant(
  wtcf_nobatch_wtwhole_tables,
  selectors=c("limma", "msstats"),
  fc_column="log2fc", p_column="adjpvalue",
  excel="excel/testing_intersections_two.xlsx")
chosen_intersection <- intersect_significant(
  wtcf_nobatch_wtwhole_tables,
  selectors=c("limma", "ebseq", "deseq"),
  fc_column="log2fc", p_column="adjpvalue",
  excel="excel/testing_intersections_three.xlsx")
chosen_intersection <- intersect_significant(
  wtcf_nobatch_wtwhole_tables,
  selectors=c("limma", "ebseq", "deseq", "edger"),
  fc_column="log2fc", p_column="adjpvalue",
  excel="excel/testing_intersections_four.xlsx")
```

# Ontology tests

## Wt CF vs Wt whole

```{r ontologywtcfwtwhole}
mtb_go <- load_microbesonline_go(id=83332, id_column="sysName")
head(mtb_go)
colnames(mtb_go) <- c("ID", "GO")
mtb_lengths <- mtb_annotations[, c("ID", "width")]

## What sets of genes to perform goseq on?
## Arbitrary decision time, lets use deseq data for each contrast, once without sva, once with.
wtcf_wtwhole_up <- wtcf_nobatch_wtwhole_sig[["deseq"]][["ups"]][[1]]
wtcf_wtwhole_up_goseq_nb <- simple_goseq(
  sig_genes=wtcf_wtwhole_up, go_db=mtb_go, length_db=mtb_lengths,
  excel=paste0("excel/wtcf_wtwhole_up_deseq_nobatch_goseq-v", ver, ".xlsx"))
wtcf_wtwhole_up_goseq_nb$pvalue_plots[[1]]

wtcf_wtwhole_down <- wtcf_nobatch_wtwhole_sig[["deseq"]][["downs"]][[1]]
wtcf_wtwhole_down_goseq_nb <- simple_goseq(
  sig_genes=wtcf_wtwhole_down, go_db=mtb_go, length_db=mtb_lengths,
  excel=paste0("excel/wtcf_wtwhole_down_deseq_nobatch_goseq-v", ver, ".xlsx"))
wtcf_wtwhole_down_goseq_nb$pvalue_plots[[1]]
```

## delta CF vs delta whole

```{r ontology_dcf_dwhole}
deltacf_deltawhole_up <- deltacf_deltawhole_sig[["deseq"]][["ups"]][[1]]
deltacf_deltawhole_up_goseq_nb <- simple_goseq(
  sig_genes=deltacf_deltawhole_up, go_db=mtb_go, length_db=mtb_lengths,
  excel=paste0("excel/deltacf_deltawhole_up_deseq_nobatch_goseq-v", ver, ".xlsx"))
deltacf_deltawhole_up_goseq_nb$pvalue_plots[[1]]

deltacf_deltawhole_down <- deltacf_deltawhole_sig[["deseq"]][["downs"]][[1]]
deltacf_deltawhole_down_goseq_nb <- simple_goseq(
  sig_genes=deltacf_deltawhole_down, go_db=mtb_go, length_db=mtb_lengths,
  excel=paste0("excel/deltacf_deltawhole_down_deseq_nobatch_goseq-v", ver, ".xlsx"))
deltacf_deltawhole_down_goseq_nb$pvalue_plots[[1]]
```

## delta filtrate vs wt filtrate

```{r ontology_dcf_wtcf}
deltacf_wtcf_up <- deltacf_wtcf_sig[["deseq"]][["ups"]][[1]]
deltacf_wtcf_up_goseq_nb <- simple_goseq(
  sig_genes=deltacf_wtcf_up, go_db=mtb_go, length_db=mtb_lengths,
  excel=paste0("excel/deltacf_wtcf_up_deseq_nobatch_goseq-v", ver, ".xlsx"))
deltacf_wtcf_up_goseq_nb$pvalue_plots[[1]]

deltacf_wtcf_down <- deltacf_wtcf_sig[["deseq"]][["downs"]][[1]]
deltacf_wtcf_down_goseq_nb <- simple_goseq(
  sig_genes=deltacf_wtcf_down, go_db=mtb_go, length_db=mtb_lengths,
  excel=paste0("excel/deltacf_wtcf_down_deseq_nobatch_goseq-v", ver, ".xlsx"))
deltacf_wtcf_down_goseq_nb$pvalue_plots[[1]]
```

## delta whole vs wt whole

```{r ontology_dwhole_wtwhole}
deltawhole_wtwhole_up <- deltawhole_wtwhole_sig[["deseq"]][["ups"]][[1]]
deltawhole_wtwhole_up_goseq_nb <- simple_goseq(
  sig_genes=deltawhole_wtwhole_up, go_db=mtb_go, length_db=mtb_lengths,
  excel=paste0("excel/deltawhole_wtwhole_up_deseq_nobatch_goseq-v", ver, ".xlsx"))
deltawhole_wtwhole_up_goseq_nb$pvalue_plots[[1]]

deltawhole_wtwhole_down <- deltawhole_wtwhole_sig[["deseq"]][["downs"]][[1]]
deltawhole_wtwhole_down_goseq_nb <- simple_goseq(
  sig_genes=deltawhole_wtwhole_down, go_db=mtb_go, length_db=mtb_lengths,
  excel=paste0("excel/deltawhole_wtwhole_down_deseq_nobatch_goseq-v", ver, ".xlsx"))
deltawhole_wtwhole_down_goseq_nb$pvalue_plots[[1]]
```

# circos

```{r circos}
gff_df <- load_gff_annotations("reference/mtuberculosis_h37rv2.gff")
rownames(gff_df) <- make.names(gff_df[["ID"]], unique=TRUE)
microbes_annot <- load_microbesonline_annotations(id=83332)
rownames(microbes_annot) <- make.names(microbes_annot[["sysName"]], unique=TRUE)

coefficients <- extra_lfc_filter_de$deseq$coefficients
values <- merge(microbes_annot, coefficients, by="row.names")

cog_groups <- values[, c("Row.names", "start", "stop", "strand", "COGFun")]
colnames(cog_groups) <- c("seqnames", "start", "stop", "strand", "COGFun")
rownames(cog_groups) <- cog_groups[["seqnames"]]

wt_whole <- values[, c("Row.names", "wt_whole")]
rownames(wt_whole) <- wt_whole[["Row.names"]]
wtcf <- values[, c("Row.names", "wt_filtrate")]
rownames(wtcf) <- wtcf[["Row.names"]]
delta_whole <- values[, c("Row.names", "delta_whole")]
rownames(delta_whole) <- delta_whole[["Row.names"]]
delta_filtrate <- values[, c("Row.names", "delta_filtrate")]
rownames(delta_filtrate) <- delta_filtrate[["Row.names"]]

circos_mtb <- circos_prefix(name="mtb")
mtb_karyotype <- circos_karyotype("mtb", length=4411532)
mtb_plus <- circos_plus_minus(cog_groups, circos_mtb)
mtb_wtwhole <- circos_heatmap(
  wt_whole, gff_df, colname="wt_whole",
  cfgout=circos_mtb, basename="wt_whole", outer=mtb_plus)
mtb_deltawhole <- circos_heatmap(
  delta_whole, gff_df, colname="delta_whole",
  cfgout=circos_mtb, basename="delta_whole", outer=mtb_wtwhole)
mtb_wtcf <- circos_heatmap(
  wtcf, gff_df, colname="wt_filtrate",
  cfgout=circos_mtb, basename="wt_filtrate", outer=mtb_deltawhole)
mtb_deltacf <- circos_heatmap(
  delta_filtrate, gff_df, colname="delta_filtrate",
  cfgout=circos_mtb, basename="delta_filtrate", outer=mtb_wtcf)
mtb_deltacf_heat <- circos_heatmap(
  delta_filtrate, gff_df, colname="delta_filtrate",
  cfgout=circos_mtb, basename="delta_filtrate_heat", outer=mtb_deltacf)
circos_suffix(cfgout=circos_mtb)
circos_made <- sm(circos_make(target="mtb"))
```

[circos plot](circos/mtb.png)

# TODO

* 2018-04-10:  Make sure my invocations of SWATH2stats/MSstats are correct.

```{r saveme}
if (!isTRUE(get0("skip_load"))) {
  message(paste0("This is hpgltools commit: ", get_git_commit()))
  this_save <- paste0(gsub(pattern="\\.Rmd", replace="", x=rmd_file), "-v", ver, ".rda.xz")
  message(paste0("Saving to ", this_save))
  tmp <- sm(saveme(filename=this_save))
  pander::pander(sessionInfo())
}
```

```{r loadme, eval=FALSE}
tmp <- loadme(filename=this_save)
```
