1 TODO

  • Remove MSstats logging. – done
  • Make explicit Spearman correlations between methods. – done
    • Do both for all data and for the top-50/bottom-50 – done, but weird.
  • Make 100% certain that the samples are annotated correctly. – done.
  • Limma/MSstats/EdgeR Venn diagram for all up in CF
    • Repeat in all down CF
    • Repeat with a cutoff, top-50
  • Individual plots for P/PE proteins.
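
As a rough illustration of the correlation and Venn items above, here is a minimal sketch on simulated data. The data frame `lfc`, its column names, and the cutoff of 1 are all assumptions made for the example; none of them come from the real limma/MSstats/EdgeR results.

```r
set.seed(1)
n <- 200
## Hypothetical per-protein log fold changes from two methods (toy data).
truth <- rnorm(n)
lfc <- data.frame(
  limma = truth + rnorm(n, sd = 0.3),
  msstats = truth + rnorm(n, sd = 0.3),
  row.names = paste0("prot", seq_len(n)))

## Spearman correlation across all proteins.
rho_all <- cor(lfc[["limma"]], lfc[["msstats"]], method = "spearman")

## Restrict to the top-50 and bottom-50 proteins as ranked by one method.
ord <- order(lfc[["limma"]], decreasing = TRUE)
extremes <- c(head(ord, 50), tail(ord, 50))
rho_extremes <- cor(lfc[extremes, "limma"], lfc[extremes, "msstats"],
                    method = "spearman")

## Overlap of 'up' calls: these counts are what a Venn diagram would display.
up_limma <- rownames(lfc)[lfc[["limma"]] > 1]
up_msstats <- rownames(lfc)[lfc[["msstats"]] > 1]
venn_counts <- c(
  limma_only = length(setdiff(up_limma, up_msstats)),
  both = length(intersect(up_limma, up_msstats)),
  msstats_only = length(setdiff(up_msstats, up_limma)))
```

Note that choosing the extremes by one method's ranking and then correlating against another method selects on one of the two variables, which may be part of why the top-50/bottom-50 numbers look odd.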

2 Analyzing data from openMS and friends.

In preprocessing_comet_highres.Rmd, I followed the openMS tutorials and the supplemental materials from a couple of papers to perform (hopefully correctly) the various preprocessing tasks required to extract intensity data from DIA/SWATH transitions.

The final steps of that process combined the transition intensities from every sample into a metadata file (results/tric/HCD_meta.tsv), an intensity matrix (results/tric/HCD_outmatrix.tsv), and a feature-aligned output matrix (results/tric/aligned_comet_HCD.tsv).

My reading of the SWATH2stats and MSstats source code suggests that the log2(intensities) of the feature-aligned data are our final proxy for protein abundance. At first glance, these data might therefore follow a distribution similar to RNASeq data (negative binomial, but perhaps with a bigger tail?). In addition, by the time we use tric on the data, we have a count matrix and sample annotation data frames which look remarkably similar to those used in an RNASeq expressionset. Indeed, by the end of the MSstats processing, it creates an MSnSet class of its own which uses fData/exprs/pData.

For the curious, my reasoning for saying that the log intensities are our proxy for abundance comes from MSstats/R/DataProcess.R in a clause which looks like:

if (logTrans == 2) {
  work[["ABUNDANCE"]] <- log2(work[["ABUNDANCE"]])
} else if (logTrans == 10) {
  work[["ABUNDANCE"]] <- log10(work[["ABUNDANCE"]])
} else {
  ## Above there was a check for only log 2 and 10, but we can do e if we want.
  ## I might go back up there and remove that check. Long live e! 2.718282 rules!
  work[["ABUNDANCE"]] <- log(work[["ABUNDANCE"]]) / log(logTrans)
}

(Note: I added the natural log to the set of conditions, but otherwise the logic is unchanged.)
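
The final branch of that clause is just the change-of-base identity, log_b(x) = log(x) / log(b). A minimal sketch on toy intensities, where `log_transform()` is a hypothetical helper written for this example and is not part of MSstats:

```r
## Toy intensities; the real values come from the feature-aligned matrix.
intensity <- c(1024, 4096, 65536)

## Hypothetical helper mirroring the final branch above: any base via
## the change-of-base identity.
log_transform <- function(x, base = 2) {
  log(x) / log(base)
}

log2_direct <- log2(intensity)
log2_generic <- log_transform(intensity, base = 2)
## Using e as the base reduces to the natural log.
ln_generic <- log_transform(intensity, base = exp(1))
```

The generic branch therefore agrees with log2() and log10() up to floating point error, so the two explicit branches are a convenience and not a different calculation.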

With that in mind, I want to use some tools with which I am familiar in order to try to understand these data. Therefore I will first attempt to coerce my tric aligned data and annotations into a ‘normal’ expressionset. Then I want to do some diagnostic plots which, if I am wrong and these distributions are not as expected, will be conceptually incorrect (I don’t yet think I am wrong).
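
To make the intended coercion concrete, here is a minimal sketch on a toy long-format table. The column names (`protein`, `run`, `intensity`), the sample names, and the choice of sum() as the aggregation are assumptions for illustration; the real tric output has many more columns and the appropriate aggregation is still an open question. The Biobase step only runs if that package happens to be installed.

```r
## Toy long-format table standing in for the tric-aligned data.
long <- data.frame(
  protein = c("Rv0001", "Rv0001", "Rv0002", "Rv0002", "Rv0001", "Rv0002"),
  run = c("s1", "s1", "s1", "s2", "s2", "s2"),
  intensity = c(100, 300, 50, 80, 600, 20))

## Sum intensities per protein and run, giving a features x samples matrix.
## Combinations with no observations would come back as NA.
mat <- tapply(long[["intensity"]], list(long[["protein"]], long[["run"]]), sum)
log_mat <- log2(mat)

## Sample annotations; the rownames must match the matrix column names.
pd <- data.frame(condition = c("wt", "mutant"), row.names = c("s1", "s2"))
stopifnot(identical(rownames(pd), colnames(log_mat)))

## Optional: wrap both pieces in an ExpressionSet when Biobase is available.
if (requireNamespace("Biobase", quietly = TRUE)) {
  eset <- Biobase::ExpressionSet(
    assayData = log_mat,
    phenoData = Biobase::AnnotatedDataFrame(pd))
}
```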

2.1 Sample annotation via SWATH2stats

I am using the SWATH2stats vignette as my primary source of information. It uses OpenSWATH_SM3_GoldStandardAutomatedResults_human_peakgroups.txt, which has a format nearly identical to my tric output matrix. For the moment I will therefore assume that the proper input for SWATH2stats is ‘results/tric/comet_HCD.tsv’ and neither the metadata nor the output matrix.

I keep a sample sheet of all the DIA samples used in this analysis in ‘sample_sheets/dia_samples.xlsx’. It should contain all the other required data, with one important caveat: I removed 1 sample by ‘commenting’ it out (i.e. prefixing it with ‘##’), which is an admittedly dumb thing to do in an Excel file.

One last caveat: I hacked the SWATH2stats sample_annotation() function to add a couple columns in an attempt to make it a little more robust when faced with sample sheets with differently named columns.
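
The kind of robustness meant here can be sketched as a column-name normalization step. Everything below is hypothetical: `standardize_columns()`, the alias table, and the canonical names are invented for the example and are not the actual changes made to sample_annotation().

```r
## Hypothetical alias table: canonical name -> accepted spellings (lowercase).
aliases <- list(
  filename = c("filename", "file", "mzxml"),
  condition = c("condition", "cond"),
  bioreplicate = c("bioreplicate", "replicate", "biorep"),
  run = c("run", "runid"))

## Rename the columns of a sample sheet to the canonical names, matching
## case-insensitively and failing loudly when a required column is absent.
standardize_columns <- function(annot, aliases) {
  lowered <- tolower(colnames(annot))
  for (canonical in names(aliases)) {
    hit <- which(lowered %in% aliases[[canonical]])
    if (length(hit) == 0) {
      stop("No column found for: ", canonical)
    }
    ## Use the first match if several spellings are present.
    colnames(annot)[hit[1]] <- canonical
  }
  annot
}

sheet <- data.frame(File = "a.mzXML", Cond = "wt", Replicate = 1, RunID = 1)
fixed <- standardize_columns(sheet, aliases)
```

Failing with an explicit message at this point is arguably friendlier than the "undefined columns selected" error that appears much later when a column is silently missing.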

In addition, SWATH2stats provides some nice filtering and combination functions which should be considered when generating various expressionset data structures later.

tric_data <- read.csv("results/tric_20180530/comet_HCD.tsv", sep="\t")
## Warning in file(file, "rt"): cannot open file 'results/tric_20180530/comet_HCD.tsv': No
## such file or directory
## Error in file(file, "rt"): cannot open the connection
sample_annot <- openxlsx::read.xlsx("sample_sheets/Mtb_dia_samples.xlsx")
rownames(sample_annot) <- make.names(sample_annot[["sampleid"]], unique=TRUE)
## Drop samples starting with comments
keep_idx <- ! grepl(pattern="##", x=sample_annot[["sampleid"]])
sample_annot <- sample_annot[keep_idx, ]
expt_idx <- sample_annot[["expt_id"]] == "may2018" | sample_annot[["expt_id"]] == "mar2018"
expt_idx[is.na(sample_annot[["expt_id"]])] <- FALSE
sample_annot <- sample_annot[expt_idx, ]
mz_idx <- sample_annot[["windowsize"]] == "8"
sample_annot <- sample_annot[mz_idx, ]
## Set the mzXML column to match the filename column in the data.
loaded <- sm(devtools::load_all("~/scratch/git/SWATH2stats"))
## s2s, my witty way of shortening SWATH2stats...
s2s_exp <- sample_annotation(data=tric_data,
                             sample_annotation=sample_annot,
                             fullpeptidename_column="fullunimodpeptidename")
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, : The
## number of sample annotation condition and filenames in data are equal.
## Warning in if (missing_samples_from_data > 0) {: the condition has length > 1 and only the
## first element will be used
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, : The
## missing data samples from the annotation are: mzXML/dia_20180530/2018_0315Briken01.mzXML,
## mzXML/dia_20180530/2018_0315Briken02.mzXML, mzXML/dia_20180530/2018_0315Briken03.mzXML,
## mzXML/dia_20180530/2018_0315Briken04.mzXML, mzXML/dia_20180530/2018_0315Briken05.mzXML,
## mzXML/dia_20180530/2018_0315Briken06.mzXML, mzXML/dia_20180530/2018_0315Briken21.mzXML,
## mzXML/dia_20180530/2018_0315Briken22.mzXML, mzXML/dia_20180530/2018_0315Briken23.mzXML,
## mzXML/dia_20180530/2018_0315Briken24.mzXML, mzXML/dia_20180530/2018_0315Briken25.mzXML,
## mzXML/dia_20180530/2018_0315Briken26.mzXML, mzXML/dia_20180530/2018_0502BrikenDIA01.mzXML,
## mzXML/dia_20180530/2018_0502BrikenDIA02.mzXML, mzXML/
## dia_20180530/2018_0502BrikenDIA03.mzXML, mzXML/dia_20180530/2018_0502BrikenDIA04.mzXML,
## mzXML/dia_20180530/2018_0502BrikenDIA05.mzXML, mzXML/
## dia_20180530/2018_0502BrikenDIA06.mzXML, mzXML/dia_20180530/2018_0502BrikenDIA07.mzXML,
## mzXML/dia_20180530/2018_0502BrikenDIA08.mzXML, mzXML/
## dia_20180530/2018_0502BrikenDIA09.mzXML, mzXML/dia_20180530/2018_0502BrikenDIA11.mzXML,
## mzXML/dia_20180530/2018_0502BrikenDIA12.mzXML.
## Warning in if (missing_samples_from_annot > 0) {: the condition has length > 1 and only
## the first element will be used
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## The missing data samples from the data are: mzXML/dia_20mz/2018_0116BrikenDIA11.mzXML,
## mzXML/dia_20mz/2018_0116BrikenDIA12.mzXML, mzXML/dia_20mz/2018_0116BrikenDIA13.mzXML,
## mzXML/dia_20mz/2018_0315Briken11.mzXML, mzXML/dia_20mz/2018_0315Briken12.mzXML, mzXML/
## dia_20mz/2018_0315Briken13.mzXML, mzXML/dia_20mz/2018_0315Briken15.mzXML, mzXML/
## dia_20mz/2018_0315Briken16.mzXML, mzXML/dia_8mz/2018_0116BrikenDIA01.mzXML, mzXML/
## dia_8mz/2018_0116BrikenDIA02.mzXML, mzXML/dia_8mz/2018_0116BrikenDIA03.mzXML, mzXML/
## dia_8mz/2018_0315Briken01.mzXML, mzXML/dia_8mz/2018_0315Briken02.mzXML, mzXML/
## dia_8mz/2018_0315Briken03.mzXML, mzXML/dia_8mz/2018_0315Briken04.mzXML, mzXML/
## dia_8mz/2018_0315Briken05.mzXML, mzXML/dia_8mz/2018_0315Briken06.mzXML, mzXML/
## dia_8mz/2018_0315Briken21.mzXML, mzXML/dia_8mz/2018_0315Briken22.mzXML, mzXML/
## dia_8mz/2018_0315Briken23.mzXML, mzXML/dia_8mz/2018_0315Briken24.mzXML, mzXML/dia_8mz/
## 2018_0315Briken25.mzXML, mzXML/dia_8mz/2018_0315Briken26.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0315Briken01.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0315Briken02.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0315Briken03.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0315Briken04.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0315Briken05.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0315Briken06.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0315Briken21.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0315Briken22.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0315Briken23.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0315Briken24.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0315Briken25.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0315Briken26.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0502BrikenDIA01.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0502BrikenDIA02.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0502BrikenDIA03.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0502BrikenDIA04.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0502BrikenDIA05.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0502BrikenDIA06.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0502BrikenDIA07.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0502BrikenDIA08.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0502BrikenDIA09.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0502BrikenDIA11.mzXML.
## Warning in sample_annotation(data = tric_data, sample_annotation = sample_annot, :
## No measurement value found for this sample in the data file: mzXML/
## dia_20180530/2018_0502BrikenDIA12.mzXML.
## The following columns were missing from the data:
## fullunimodpeptidename
## Error in `[.data.frame`(data, , sel_colnames): undefined columns selected

Now I have a couple data structures which should prove useful for the metrics provided by SWATH2stats, MSstats, and my own hpgltools.

3 SWATH2stats continued

Let's return to some of the metrics provided by SWATH2stats.

## Get correlations on a sample by sample basis
pp(file="images/20180523_swath2stats_sample_cor.png")
## Going to write the image to: images/20180523_swath2stats_sample_cor.png when dev.off() is called.
sample_cor <- plot_correlation_between_samples(s2s_exp,
                                               fun.aggregate=sum,
                                               column.values="intensity")
dev.off()
## X11cairo 
##        2
sample_cond_rep_cor <- plot_correlation_between_samples(s2s_exp,
                                                        comparison=transition_group_id ~
                                                          condition + bioreplicate + run,
                                                        fun.aggregate=sum,
                                                        column.values="intensity")

## I am a little concerned that these values do not seem to change when I took
## filtered/normalized data.  So I am rerunning them manually for a moment --
## perhaps I messed something up when I rewrote portions of the
## sample_annotation() function in SWATH2stats.

## ahh I think I see the problem.  The default value for fun.aggregate is NULL,
## which causes dcast to default to length.  I think this is not likely to be
## valid for this data.  I am not certain, however, what is the appropriate
## function.  If I had to guess, I would go with sum()?
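
To see why an aggregation by length would leave the correlations unchanged after filtering or normalizing, here is a minimal sketch using base R tapply() as a stand-in for dcast (the toy data and column names are assumptions). Counting observations ignores the intensity values entirely, while summing does not.

```r
## Toy long-format data: two transitions per peptide per run.
long <- data.frame(
  peptide = rep(c("pepA", "pepB"), each = 4),
  run = rep(c("r1", "r1", "r2", "r2"), times = 2),
  intensity = c(10, 20, 30, 40, 1, 2, 3, 4))

groups <- list(long[["peptide"]], long[["run"]])

## Aggregating with length() only counts rows per peptide and run.
by_length <- tapply(long[["intensity"]], groups, length)

## Aggregating with sum() actually reflects the measured intensities.
by_sum <- tapply(long[["intensity"]], groups, sum)

## Rescaling the intensities changes the sums but not the counts.
by_length_scaled <- tapply(long[["intensity"]] * 10, groups, length)
```

Whether sum() is the right choice for these data remains a guess; mean() or median() would behave differently when the number of transitions per peptide varies between runs.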

assess_decoy_rate(s2s_exp)
## Number of non-decoy peptides: 3227
## Number of decoy peptides: 244
## Decoy rate: 0.0756
## This seems a bit high to me, yesno?
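
As a quick check of the arithmetic, the printed rate is consistent with decoys divided by non-decoys, not decoys divided by all peptides. This is inferred from the numbers above, not confirmed against the SWATH2stats source.

```r
non_decoy <- 3227
decoy <- 244

## Decoys relative to non-decoy peptides; matches the printed 0.0756.
rate_vs_targets <- decoy / non_decoy

## Decoys as a fraction of everything, for comparison; slightly lower.
rate_vs_total <- decoy / (decoy + non_decoy)

round(c(vs_targets = rate_vs_targets, vs_total = rate_vs_total), 4)
```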
fdr_overall <- assess_fdr_overall(s2s_exp, output="Rconsole", plot=TRUE)

byrun_fdr <- assess_fdr_byrun(s2s_exp, FFT=0.7, plot=TRUE, output="Rconsole")
## The average FDR by run on assay level is 0.016
## The average FDR by run on peptide level is 0.016
## The average FDR by run on protein level is 0.042

chosen_mscore <- mscore4assayfdr(s2s_exp, FFT=0.7, fdr_target=0.02)
## Target assay FDR: 0.02
## Required overall m-score cutoff: 0.0031623
## achieving assay FDR: 0.0198
prot_score <- mscore4protfdr(s2s_exp, FFT=0.7, fdr_target=0.02)
## Target protein FDR: 0.02
## Required overall m-score cutoff: 0.00070795
## achieving protein FDR: 0.0181
mscore_filtered <- filter_mscore(s2s_exp, chosen_mscore)
## Original dimension: 46587, new dimension: 41365, difference: 5222.
data_filtered_mscore <- filter_mscore_freqobs(s2s_exp, 0.01, 0.8, rm.decoy=FALSE)
## Treshold, peptides need to have been quantified in more conditions than: 18.4
## Fraction of peptides selected: 1
## Original dimension: 47663, new dimension: 47663, difference: 0.
data_filtered_fdr <- filter_mscore_fdr(mscore_filtered, FFT=0.7,
                                       overall_protein_fdr_target=prot_score,
                                       upper_overall_peptide_fdr_limit=0.05)
## Target protein FDR: 0.000707945784384137
## Required overall m-score cutoff: 0.01
## achieving protein FDR: 0
## filter_mscore_fdr is filtering the data...
## finding m-score cutoff to achieve desired protein FDR in protein master list..
## finding m-score cutoff to achieve desired global peptide FDR..
## Target peptide FDR: 0.05
## Required overall m-score cutoff: 0.01
## Achieving peptide FDR: 0
## Proteins selected: 
## Total proteins selected: 931
## Final target proteins: 931
## Final decoy proteins: 0
## Peptides mapping to these protein entries selected:
## Total mapping peptides: 3071
## Final target peptides: 3071
## Final decoy peptides: 0
## Total peptides selected from:
## Total peptides: 3071
## Final target peptides: 3071
## Final decoy peptides: 0
## Individual run FDR quality of the peptides was not calculated
## as not every run contains a decoy.
## The decoys have been removed from the returned data.
only_proteotypic <- filter_proteotypic_peptides(data_filtered_fdr)
## Number of proteins detected: 933
## Protein identifiers: Rv1908c, Rv0242c, Rv3224, Rv1133c, Rv3036c, Rv1098c
## Number of proteins detected that are supported by a proteotypic peptide: 916
## Number of proteotypic peptides detected: 3028
all_filtered <- filter_all_peptides(only_proteotypic)
## Number of proteins detected: 916
## First 6 protein identifiers: Rv1908c, Rv0242c, Rv3224, Rv1133c, Rv3036c, Rv1098c
only_strong <- filter_on_max_peptides(data=all_filtered, n_peptides=10)
## Before filtering: 
##   Number of proteins: 916
##   Number of peptides: 3028
## 
## Percentage of peptides removed: 8.19%
## 
## After filtering: 
##   Number of proteins: 914
##   Number of peptides: 2780
only_minimum <- filter_on_min_peptides(data=only_strong, n_peptides=3)
## Before filtering: 
##   Number of proteins: 914
##   Number of peptides: 2780
## 
## Percentage of peptides removed: 0%
## 
## After filtering: 
##   Number of proteins: 867
##   Number of peptides: 2780
## I think these matrixes are probably smarter to use than the raw outmatrix from tric.
## But I am not a fan of rewriting the sample column names.
protein_matrix_all <- write_matrix_proteins(
  s2s_exp, write.csv=TRUE,
  filename=paste0("results/swath2stats_", ver, "/protein_all.csv"))
## Warning in file(file, ifelse(append, "a", "w")): cannot open file 'results/
## swath2stats_20180528/protein_all.csv': No such file or directory
## Error in file(file, ifelse(append, "a", "w")): cannot open the connection
dim(protein_matrix_all)
## [1] 1146   24
protein_matrix_mscore <- write_matrix_proteins(
  mscore_filtered, write.csv=TRUE,
  filename=paste0("results/swath2stats_", ver, "/protein_matrix_mscore.csv"))
## Warning in file(file, ifelse(append, "a", "w")): cannot open file 'results/
## swath2stats_20180528/protein_matrix_mscore.csv': No such file or directory
## Error in file(file, ifelse(append, "a", "w")): cannot open the connection
dim(protein_matrix_mscore)
## [1] 931  24
peptide_matrix_mscore <- write_matrix_peptides(
  mscore_filtered, write.csv=TRUE,
  filename=paste0("results/swath2stats_", ver, "/peptide_matrix_mscore.csv"))
## Warning in file(file, ifelse(append, "a", "w")): cannot open file 'results/
## swath2stats_20180528/peptide_matrix_mscore.csv': No such file or directory
## Error in file(file, ifelse(append, "a", "w")): cannot open the connection
dim(peptide_matrix_mscore)
## [1] 3071   24
protein_matrix_minimum <- write_matrix_proteins(
  only_minimum, write.csv=TRUE,
  filename=paste0("results/swath2stats_", ver, "/protein_matrix_minimum.csv"))
## Warning in file(file, ifelse(append, "a", "w")): cannot open file 'results/
## swath2stats_20180528/protein_matrix_minimum.csv': No such file or directory
## Error in file(file, ifelse(append, "a", "w")): cannot open the connection
dim(protein_matrix_minimum)
## [1] 867  24
peptide_matrix_minimum <- write_matrix_peptides(
  only_minimum, write.csv=TRUE,
  filename=paste0("results/swath2stats_", ver, "/peptide_matrix_minimum.csv"))
## Warning in file(file, ifelse(append, "a", "w")): cannot open file 'results/
## swath2stats_20180528/peptide_matrix_minimum.csv': No such file or directory
## Error in file(file, ifelse(append, "a", "w")): cannot open the connection
dim(peptide_matrix_minimum)
## [1] 35164    24
rt_cor <- plot_correlation_between_samples(
  only_minimum, column.values="intensity", fun.aggregate=sum)

## I have no effing clue what this plot means.
variation <- plot_variation(only_minimum, fun.aggregate=sum)

## Something in SWATH2stats::disaggregate was written poorly and is looking for
## a variable named 'cols'
cols <- colnames(only_minimum)
disaggregated <- disaggregate(only_minimum, all.columns=TRUE)
## The library contains 6 transitions per precursor.
## The data table was transformed into a table containing one row per transition.
msstats_input <- convert4MSstats(disaggregated)
## One or several columns required by MSstats were not in the data. The columns were created and filled with NAs.
## Missing columns: fragmention, productcharge, isotopelabeltype
## isotopelabeltype was filled with light.
##alfq_input <- sm(convert4aLFQ(disaggregated))
##mapdia_input <- sm(convert4mapDIA(disaggregated, RT=TRUE))

3.1 Some new plots

In response to some interesting queries from Yan, I made a few little functions which query and plot data from the scored data provided by openswath/pyprophet. Let us look at their results here.

pyprophet_fun <- extract_pyprophet_data(metadata="sample_sheets/Mtb_dia_samples.xlsx")
## Warning in extract_pyprophet_data(metadata = "sample_sheets/Mtb_dia_samples.xlsx"): It
## appears that some files are missing in the metadata.
## Attempting to read the tsv file for: 2018_0315Briken01: results/openswath_201805/whole_8mz/2018_0315Briken01_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0315Briken01_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0315Briken02: results/openswath_201805/whole_8mz/2018_0315Briken02_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0315Briken02_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0315Briken03: results/openswath_201805/whole_8mz/2018_0315Briken03_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0315Briken03_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0315Briken04: results/openswath_201805/whole_8mz/2018_0315Briken04_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0315Briken04_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0315Briken05: results/openswath_201805/whole_8mz/2018_0315Briken05_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0315Briken05_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0315Briken06: results/openswath_201805/whole_8mz/2018_0315Briken06_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0315Briken06_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0315Briken11: results/openswath_201805/whole_8mz/2018_0315Briken11_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0315Briken11_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0315Briken12: results/openswath_201805/whole_8mz/2018_0315Briken12_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0315Briken12_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0315Briken13: results/openswath_201805/whole_8mz/2018_0315Briken13_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0315Briken13_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0315Briken15: results/openswath_201805/whole_8mz/2018_0315Briken15_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0315Briken15_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0315Briken16: results/openswath_201805/whole_8mz/2018_0315Briken16_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0315Briken16_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0315Briken21: results/openswath_201805/whole_8mz/2018_0315Briken21_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0315Briken21_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0315Briken22: results/openswath_201805/whole_8mz/2018_0315Briken22_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0315Briken22_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0315Briken23: results/openswath_201805/whole_8mz/2018_0315Briken23_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0315Briken23_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0315Briken24: results/openswath_201805/whole_8mz/2018_0315Briken24_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0315Briken24_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0315Briken25: results/openswath_201805/whole_8mz/2018_0315Briken25_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0315Briken25_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0315Briken26: results/openswath_201805/whole_8mz/2018_0315Briken26_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0315Briken26_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0502BrikenDIA07: results/openswath_201805/whole_8mz/2018_0502BrikenDIA07_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0502BrikenDIA07_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0502BrikenDIA08: results/openswath_201805/whole_8mz/2018_0502BrikenDIA08_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0502BrikenDIA08_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0502BrikenDIA09: results/openswath_201805/whole_8mz/2018_0502BrikenDIA09_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0502BrikenDIA09_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0502BrikenDIA11: results/openswath_201805/whole_8mz/2018_0502BrikenDIA11_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0502BrikenDIA11_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0502BrikenDIA12: results/openswath_201805/whole_8mz/2018_0502BrikenDIA12_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0502BrikenDIA12_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0502BrikenDIA01: results/openswath_201805/whole_8mz/2018_0502BrikenDIA01_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0502BrikenDIA01_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0502BrikenDIA02: results/openswath_201805/whole_8mz/2018_0502BrikenDIA02_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0502BrikenDIA02_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0502BrikenDIA03: results/openswath_201805/whole_8mz/2018_0502BrikenDIA03_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0502BrikenDIA03_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0502BrikenDIA04: results/openswath_201805/whole_8mz/2018_0502BrikenDIA04_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0502BrikenDIA04_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0502BrikenDIA05: results/openswath_201805/whole_8mz/2018_0502BrikenDIA05_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0502BrikenDIA05_vs_201805_whole_HCD_dia.tsv': No such file or directory
## Attempting to read the tsv file for: 2018_0502BrikenDIA06: results/openswath_201805/whole_8mz/2018_0502BrikenDIA06_vs_201805_whole_HCD_dia.tsv.
## Warning in file(file, "rt"): cannot open file 'results/openswath_201805/whole_8mz/
## 2018_0502BrikenDIA06_vs_201805_whole_HCD_dia.tsv': No such file or directory
mass_plot <- sm(plot_pyprophet_boxplot(pyprophet_fun, column="mass"))
## Error in sample_data[[i]]: subscript out of bounds
mass_plot
## Error in eval(expr, envir, enclos): object 'mass_plot' not found
deltart_plot_all <- sm(plot_pyprophet_boxplot(pyprophet_fun, column="delta_rt"))
## Error in sample_data[[i]]: subscript out of bounds
deltart_plot_all
## Error in eval(expr, envir, enclos): object 'deltart_plot_all' not found
deltart_plot_real <- sm(plot_pyprophet_boxplot(pyprophet_fun,
                                               column="delta_rt", keep_decoys=FALSE))
## Error in sample_data[[i]]: subscript out of bounds
deltart_plot_real
## Error in eval(expr, envir, enclos): object 'deltart_plot_real' not found
deltart_plot_decoys <- sm(plot_pyprophet_boxplot(pyprophet_fun,
                                                 column="delta_rt", keep_real=FALSE))
## Error in sample_data[[i]]: subscript out of bounds
deltart_plot_decoys
## Error in eval(expr, envir, enclos): object 'deltart_plot_decoys' not found
testing <- sm(plot_pyprophet_data(pyprophet_fun))
## Error in sample_data[[i]]: subscript out of bounds
testing$plot
## Error in eval(expr, envir, enclos): object 'testing' not found
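
The errors above all follow from the missing tsv files: nothing is read, so the later plotting calls have nothing to index. A minimal sketch of checking for the files before reading them, using a temporary directory and made-up sample names and columns (none of this reflects the internals of extract_pyprophet_data()):

```r
## Hypothetical sample ids and file layout, created in a temp directory.
sample_ids <- c("sampleA", "sampleB")
tmp_dir <- tempfile("openswath_")
dir.create(tmp_dir)
paths <- file.path(tmp_dir, paste0(sample_ids, "_scored.tsv"))
names(paths) <- sample_ids

## Only the first file exists in this toy setup.
write.table(data.frame(mass = c(500.1, 612.3), delta_rt = c(0.2, -0.1)),
            file = paths[["sampleA"]], sep = "\t", row.names = FALSE)

## Report the missing files once, then read only those which exist.
found <- file.exists(paths)
if (!all(found)) {
  message("Skipping missing files: ", toString(names(paths)[!found]))
}
sample_data <- lapply(paths[found], read.delim)

## Stop early with a clear message if there is nothing to plot.
if (length(sample_data) == 0) {
  stop("None of the expected files were found; nothing to plot.")
}
```

Stopping early like this would replace the long run of "cannot open file" warnings and the later "subscript out of bounds" errors with a single actionable message.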

3.2 MSstats

MSstats (msstats.org) seems to provide a complete solution for summarizing these data and computing reasonable metrics from them.

I am currently reading: http://msstats.org/wp-content/uploads/2017/01/MSstats_v3.7.3_manual.pdf

I made some moderately intrusive changes to MSstats to make it clearer, as well.

devtools::load_all("~/scratch/git/MSstats")
## Loading MSstats
## Warning: character(0)
msstats_quant <- dataProcess(msstats_input)
## 2018-06-11 12:17:32 WARNING::The required inputs: ISOTOPLABELCHARGE were not provided. the required inputs are: 
## Proteinname, PeptideSequence/PeptideModifiedSequence, PrecursorCharge, FragmentIon,
## ProductCharge, IsotopeLabelType, Condition, BioReplicate, Run, Intensity.
## The provided inputs are: PROTEINNAME, PEPTIDESEQUENCE, PRECURSORCHARGE, FRAGMENTION, PRODUCTCHARGE, ISOTOPELABELTYPE, INTENSITY, BIOREPLICATE, CONDITION, RUN
## 2018-06-11 12:17:32 INFO::The summary method is: TMP
## 2018-06-11 12:17:32 INFO::The cutoff censor method is: minFeature.
## 2018-06-11 12:17:32 INFO::The censored int is: NA
## 2018-06-11 12:17:33 INFO::Data successfully reformatted for further analyses.
## 2018-06-11 12:17:33 INFO::Log 2 transformation complete.
## 2018-06-11 12:17:38 INFO::The fillincomplete rows option is: TRUE
## 2018-06-11 12:17:38 WARNING::CAUTION: the input dataset has incomplete rows. If missing peaks
##  occur they should be included in the dataset as separate rows, and the missing intensity
##  values should be indicated with 'NA'. The incomplete rows are listed below.
## 2018-06-11 12:17:38 INFO::*** Subject : br2, Condition : wt_cf has incomplete rows for some features
##  2018-06-11 12:17:38 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:17:38 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:17:38 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:17:38 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:17:38 INFO::*** Subject : br3, Condition : wt_cf has incomplete rows for some features
##  2018-06-11 12:17:38 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:17:38 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:17:38 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:17:38 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:17:39 INFO::*** Subject : br4, Condition : wt_cf has incomplete rows for some features
##  2018-06-11 12:17:39 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:17:39 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:17:39 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:17:39 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:17:40 INFO::*** Subject : br5, Condition : wt_whole has incomplete rows for some features
##  2018-06-11 12:17:40 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:17:40 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:17:40 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:17:40 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:17:41 INFO::*** Subject : br6, Condition : wt_whole has incomplete rows for some features
##  2018-06-11 12:17:41 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:17:41 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:17:41 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAHVLPDAQLTAAEQR_2_NA_NA,
##  2018-06-11 12:17:41 INFO::AAAAHVLPDAQLTAAEQR_3_NA_NA
## 2018-06-11 12:17:42 INFO::*** Subject : br7, Condition : wt_whole has incomplete rows for some features
##  2018-06-11 12:17:42 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:17:42 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:17:42 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:17:42 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:17:43 INFO::*** Subject : brt, Condition : wt_whole has incomplete rows for some features
##  2018-06-11 12:17:43 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:17:43 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:17:43 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:17:43 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:17:45 INFO::*** Subject : bru, Condition : wt_whole has incomplete rows for some features
##  2018-06-11 12:17:45 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:17:45 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:17:45 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:17:45 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:17:46 INFO::*** Subject : brv, Condition : wt_cf has incomplete rows for some features
##  2018-06-11 12:17:46 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:17:46 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:17:46 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:17:46 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:17:47 INFO::*** Subject : brx, Condition : wt_cf has incomplete rows for some features
##  2018-06-11 12:17:47 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:17:47 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:17:47 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:17:47 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:17:48 INFO::*** Subject : bry, Condition : wt_cf has incomplete rows for some features
##  2018-06-11 12:17:48 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:17:48 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:17:48 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:17:48 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:17:49 INFO::*** Subject : brz, Condition : wt_whole has incomplete rows for some features
##  2018-06-11 12:17:49 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:17:49 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:17:49 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:17:49 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:17:51 INFO::*** Subject : br14, Condition : delta_cf has incomplete rows for some features
##  2018-06-11 12:17:51 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:17:51 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:17:51 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:17:51 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:17:52 INFO::*** Subject : br15, Condition : delta_cf has incomplete rows for some features
##  2018-06-11 12:17:52 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:17:52 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:17:52 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:17:52 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:17:53 INFO::*** Subject : br16, Condition : delta_cf has incomplete rows for some features
##  2018-06-11 12:17:53 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:17:53 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:17:53 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:17:53 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:17:55 INFO::*** Subject : br17, Condition : comp_cf has incomplete rows for some features
##  2018-06-11 12:17:55 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:17:55 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:17:55 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:17:55 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:17:57 INFO::*** Subject : br18, Condition : comp_cf has incomplete rows for some features
##  2018-06-11 12:17:57 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:17:57 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:17:57 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:17:57 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:17:59 INFO::*** Subject : br19, Condition : comp_cf has incomplete rows for some features
##  2018-06-11 12:17:59 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:17:59 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:17:59 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:17:59 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:18:00 INFO::*** Subject : br8, Condition : delta_whole has incomplete rows for some features
##  2018-06-11 12:18:00 INFO::(AAAAAPSGTAVGAGAR_2_NA_NA, AAAAAPSGTAVGAGAR_3_NA_NA,
##  2018-06-11 12:18:00 INFO::AAAAGGQVIAEPADIPSVGR_2_NA_NA, AAAAGGQVIAEPADIPSVGR_3_NA_NA,
##  2018-06-11 12:18:00 INFO::AAAAHVLPDAQLTAAEQR_2_NA_NA, AAAAHVLPDAQLTAAEQR_3_NA_NA
## 2018-06-11 12:18:02 INFO::*** Subject : br9, Condition : delta_whole has incomplete rows for some features
##  2018-06-11 12:18:02 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:18:02 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:18:02 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:18:02 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:18:03 INFO::*** Subject : br10, Condition : delta_whole has incomplete rows for some
##  2018-06-11 12:18:03 INFO::features (AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:18:03 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:18:03 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:18:03 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:18:05 INFO::*** Subject : br12, Condition : comp_whole has incomplete rows for some features
##  2018-06-11 12:18:05 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:18:05 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAAPSGTAVGAGAR_2_NA_NA,
##  2018-06-11 12:18:05 INFO::AAAAAPSGTAVGAGAR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:18:05 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA
## 2018-06-11 12:18:07 INFO::*** Subject : br13, Condition : comp_whole has incomplete rows for some features
##  2018-06-11 12:18:07 INFO::(AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA,
##  2018-06-11 12:18:07 INFO::AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAAAGGQVIAEPADIPSVGR_2_NA_NA,
##  2018-06-11 12:18:07 INFO::AAAAGGQVIAEPADIPSVGR_3_NA_NA, AAADSAELPLFR_2_NA_NA, AAADSAELPLFR_3_NA_NA
## Incomplete rows for missing peaks are added with intensity values=NA.
## 2018-06-11 12:18:10 INFO::Incomplete rows for missing peaks are added with intensity values=NA.
## *** Subject : br2, Condition : wt_cf has multiple rows (duplicate rows) for some features (AAESTAPHKVK_4_NA_NA, AAIEALGPIALPVK_2_NA_NA, AAIEALGPIALPVK_3_NA_NA, AAIEALGPIALPVK_4_NA_NA, AAPATVSELDR_2_NA_NA, AAPATVSELDR_3_NA_NA)
## 2018-06-11 12:18:10 WARNING::*** Subject : br2, Condition : wt_cf has multiple rows (duplicate rows) for some features (AAESTAPHKVK_4_NA_NA, AAIEALGPIALPVK_2_NA_NA, AAIEALGPIALPVK_3_NA_NA, AAIEALGPIALPVK_4_NA_NA, AAPATVSELDR_2_NA_NA, AAPATVSELDR_3_NA_NA)
## *** Subject : br3, Condition : wt_cf has multiple rows (duplicate rows) for some features (AADSAESDAGADQTGPQVK_2_NA_NA, AADSAESDAGADQTGPQVK_3_NA_NA, AFAEPAGIK_2_NA_NA, AFAEPAGIK_3_NA_NA, AFAEPAGIKIEASDISVAAR_2_NA_NA, AFAEPAGIKIEASDISVAAR_3_NA_NA)
## 2018-06-11 12:18:13 WARNING::*** Subject : br3, Condition : wt_cf has multiple rows (duplicate rows) for some features (AADSAESDAGADQTGPQVK_2_NA_NA, AADSAESDAGADQTGPQVK_3_NA_NA, AFAEPAGIK_2_NA_NA, AFAEPAGIK_3_NA_NA, AFAEPAGIKIEASDISVAAR_2_NA_NA, AFAEPAGIKIEASDISVAAR_3_NA_NA)
## *** Subject : br4, Condition : wt_cf has multiple rows (duplicate rows) for some features (AAGFSDRPFSVMR_2_NA_NA, AAGFSDRPFSVMR_3_NA_NA, AALEKDYDLVGNNLGGR_2_NA_NA, AALEKDYDLVGNNLGGR_3_NA_NA, AALGGSDISAIK_2_NA_NA, AALGGSDISAIK_3_NA_NA)
## 2018-06-11 12:18:14 WARNING::*** Subject : br4, Condition : wt_cf has multiple rows (duplicate rows) for some features (AAGFSDRPFSVMR_2_NA_NA, AAGFSDRPFSVMR_3_NA_NA, AALEKDYDLVGNNLGGR_2_NA_NA, AALEKDYDLVGNNLGGR_3_NA_NA, AALGGSDISAIK_2_NA_NA, AALGGSDISAIK_3_NA_NA)
## *** Subject : br5, Condition : wt_whole has multiple rows (duplicate rows) for some features (AAALVELLSPFEADEGKAPAGR_2_NA_NA, AAALVELLSPFEADEGKAPAGR_3_NA_NA, AAIGLGDGVVR_2_NA_NA, AAIGLGDGVVR_3_NA_NA, AAVTAVSDAVR_2_NA_NA, AAVTAVSDAVR_3_NA_NA)
## 2018-06-11 12:18:15 WARNING::*** Subject : br5, Condition : wt_whole has multiple rows (duplicate rows) for some features (AAALVELLSPFEADEGKAPAGR_2_NA_NA, AAALVELLSPFEADEGKAPAGR_3_NA_NA, AAIGLGDGVVR_2_NA_NA, AAIGLGDGVVR_3_NA_NA, AAVTAVSDAVR_2_NA_NA, AAVTAVSDAVR_3_NA_NA)
## *** Subject : br6, Condition : wt_whole has multiple rows (duplicate rows) for some features (AAAAGGQVIAEPADIPSVGR_2_NA_NA, AAAAGGQVIAEPADIPSVGR_3_NA_NA, AAAQERPA_2_NA_NA, AADDAVYTALDANADR_2_NA_NA, AADDAVYTALDANADR_3_NA_NA, AAEGYLEAATSR_2_NA_NA)
## 2018-06-11 12:18:16 WARNING::*** Subject : br6, Condition : wt_whole has multiple rows (duplicate rows) for some features (AAAAGGQVIAEPADIPSVGR_2_NA_NA, AAAAGGQVIAEPADIPSVGR_3_NA_NA, AAAQERPA_2_NA_NA, AADDAVYTALDANADR_2_NA_NA, AADDAVYTALDANADR_3_NA_NA, AAEGYLEAATSR_2_NA_NA)
## *** Subject : br7, Condition : wt_whole has multiple rows (duplicate rows) for some features (AAANLGINLSR_2_NA_NA, AAANLGINLSR_3_NA_NA, AAANLGINLSR_4_NA_NA, AAGAPVIC(UniMod_4)ETADQGR_2_NA_NA, AAGAPVIC(UniMod_4)ETADQGR_3_NA_NA, AAGAPVIC(UniMod_4)ETADQGR_4_NA_NA)
## 2018-06-11 12:18:17 WARNING::*** Subject : br7, Condition : wt_whole has multiple rows (duplicate rows) for some features (AAANLGINLSR_2_NA_NA, AAANLGINLSR_3_NA_NA, AAANLGINLSR_4_NA_NA, AAGAPVIC(UniMod_4)ETADQGR_2_NA_NA, AAGAPVIC(UniMod_4)ETADQGR_3_NA_NA, AAGAPVIC(UniMod_4)ETADQGR_4_NA_NA)
## *** Subject : brt, Condition : wt_whole has multiple rows (duplicate rows) for some features (AAEAWSGGYFAK_2_NA_NA, AAEAWSGGYFAK_3_NA_NA, AAHWEHTVAVTDDGPR_2_NA_NA, AAHWEHTVAVTDDGPR_3_NA_NA, AAPVAPVLSAR_2_NA_NA, AAPVAPVLSAR_3_NA_NA)
## 2018-06-11 12:18:18 WARNING::*** Subject : brt, Condition : wt_whole has multiple rows (duplicate rows) for some features (AAEAWSGGYFAK_2_NA_NA, AAEAWSGGYFAK_3_NA_NA, AAHWEHTVAVTDDGPR_2_NA_NA, AAHWEHTVAVTDDGPR_3_NA_NA, AAPVAPVLSAR_2_NA_NA, AAPVAPVLSAR_3_NA_NA)
## *** Subject : bru, Condition : wt_whole has multiple rows (duplicate rows) for some features (AAADSAELPLFR_2_NA_NA, AAADSAELPLFR_3_NA_NA, AEPIFATVAPGVAAAPR_2_NA_NA, AEPIFATVAPGVAAAPR_3_NA_NA, AEQLDSDRL_2_NA_NA, AEQLDSDRL_3_NA_NA)
## 2018-06-11 12:18:19 WARNING::*** Subject : bru, Condition : wt_whole has multiple rows (duplicate rows) for some features (AAADSAELPLFR_2_NA_NA, AAADSAELPLFR_3_NA_NA, AEPIFATVAPGVAAAPR_2_NA_NA, AEPIFATVAPGVAAAPR_3_NA_NA, AEQLDSDRL_2_NA_NA, AEQLDSDRL_3_NA_NA)
## *** Subject : brv, Condition : wt_cf has multiple rows (duplicate rows) for some features (AAAIHDQFVATLASSASSYAATEVANAAAAS_2_NA_NA, AAM(UniMod_35)AAQLQAVPGAAQYIGLVESVAGSC(UniMod_4)NNY_2_NA_NA, AAMAAQLQAVPGAAQYIGLVESVAGSC(UniMod_4)NNY_2_NA_NA, AAVGTTSDINQQDPATLQDGGNLR_2_NA_NA, AAVGTTSDINQQDPATLQDGGNLR_3_NA_NA, AAVGTTSDINQQDPATLQDGGNLR_4_NA_NA)
## 2018-06-11 12:18:20 WARNING::*** Subject : brv, Condition : wt_cf has multiple rows (duplicate rows) for some features (AAAIHDQFVATLASSASSYAATEVANAAAAS_2_NA_NA, AAM(UniMod_35)AAQLQAVPGAAQYIGLVESVAGSC(UniMod_4)NNY_2_NA_NA, AAMAAQLQAVPGAAQYIGLVESVAGSC(UniMod_4)NNY_2_NA_NA, AAVGTTSDINQQDPATLQDGGNLR_2_NA_NA, AAVGTTSDINQQDPATLQDGGNLR_3_NA_NA, AAVGTTSDINQQDPATLQDGGNLR_4_NA_NA)
## *** Subject : brx, Condition : wt_cf has multiple rows (duplicate rows) for some features (AAALEKAAAAR_2_NA_NA, AAALEKAAAAR_3_NA_NA, AAALNIVPTSTGAAK_2_NA_NA, AAALNIVPTSTGAAK_3_NA_NA, AAAMTASAEYLR_2_NA_NA, AAAMTASAEYLR_3_NA_NA)
## 2018-06-11 12:18:21 WARNING::*** Subject : brx, Condition : wt_cf has multiple rows (duplicate rows) for some features (AAALEKAAAAR_2_NA_NA, AAALEKAAAAR_3_NA_NA, AAALNIVPTSTGAAK_2_NA_NA, AAALNIVPTSTGAAK_3_NA_NA, AAAMTASAEYLR_2_NA_NA, AAAMTASAEYLR_3_NA_NA)
## *** Subject : bry, Condition : wt_cf has multiple rows (duplicate rows) for some features (AAELLGWR_2_NA_NA, AAELLGWR_3_NA_NA, AASGAVLSALGPK_2_NA_NA, AASGAVLSALGPK_3_NA_NA, AAVADLVTAGTHPSC(UniMod_4)PKPAR_2_NA_NA, AAVADLVTAGTHPSC(UniMod_4)PKPAR_3_NA_NA)
## 2018-06-11 12:18:22 WARNING::*** Subject : bry, Condition : wt_cf has multiple rows (duplicate rows) for some features (AAELLGWR_2_NA_NA, AAELLGWR_3_NA_NA, AASGAVLSALGPK_2_NA_NA, AASGAVLSALGPK_3_NA_NA, AAVADLVTAGTHPSC(UniMod_4)PKPAR_2_NA_NA, AAVADLVTAGTHPSC(UniMod_4)PKPAR_3_NA_NA)
## *** Subject : brz, Condition : wt_whole has multiple rows (duplicate rows) for some features (AAAGALSPLAPLR_2_NA_NA, AAAGALSPLAPLR_3_NA_NA, AADILKDESYK_2_NA_NA, AADILKDESYK_3_NA_NA, AADILKDESYK_4_NA_NA, AAGAEAIQINDAHR_2_NA_NA)
## 2018-06-11 12:18:23 WARNING::*** Subject : brz, Condition : wt_whole has multiple rows (duplicate rows) for some features (AAAGALSPLAPLR_2_NA_NA, AAAGALSPLAPLR_3_NA_NA, AADILKDESYK_2_NA_NA, AADILKDESYK_3_NA_NA, AADILKDESYK_4_NA_NA, AAGAEAIQINDAHR_2_NA_NA)
## *** Subject : br14, Condition : delta_cf has multiple rows (duplicate rows) for some features (AAASGATVLC(UniMod_4)VSK_2_NA_NA, AAASGATVLC(UniMod_4)VSK_3_NA_NA, AAASGATVLC(UniMod_4)VSKDLPFAQKR_2_NA_NA, AADM(UniMod_35)WGPSSDPAWER_2_NA_NA, AADM(UniMod_35)WGPSSDPAWER_3_NA_NA, AADMWGPSSDPAWER_2_NA_NA)
## 2018-06-11 12:18:24 WARNING::*** Subject : br14, Condition : delta_cf has multiple rows (duplicate rows) for some features (AAASGATVLC(UniMod_4)VSK_2_NA_NA, AAASGATVLC(UniMod_4)VSK_3_NA_NA, AAASGATVLC(UniMod_4)VSKDLPFAQKR_2_NA_NA, AADM(UniMod_35)WGPSSDPAWER_2_NA_NA, AADM(UniMod_35)WGPSSDPAWER_3_NA_NA, AADMWGPSSDPAWER_2_NA_NA)
## *** Subject : br15, Condition : delta_cf has multiple rows (duplicate rows) for some features (ADALQADAER_2_NA_NA, ADALQADAER_3_NA_NA, ADAMLADAQSR_2_NA_NA, ADAMLADAQSR_3_NA_NA, AELPGVDPDKDVDIM(UniMod_35)VR_2_NA_NA, AELPGVDPDKDVDIMVR_2_NA_NA)
## 2018-06-11 12:18:25 WARNING::*** Subject : br15, Condition : delta_cf has multiple rows (duplicate rows) for some features (ADALQADAER_2_NA_NA, ADALQADAER_3_NA_NA, ADAMLADAQSR_2_NA_NA, ADAMLADAQSR_3_NA_NA, AELPGVDPDKDVDIM(UniMod_35)VR_2_NA_NA, AELPGVDPDKDVDIMVR_2_NA_NA)
## *** Subject : br16, Condition : delta_cf has multiple rows (duplicate rows) for some features (AAGDPDLLVSR_2_NA_NA, AAGDPDLLVSR_3_NA_NA, AAGDPDLLVSR_4_NA_NA, AASENGVPVTAR_2_NA_NA, AASENGVPVTAR_3_NA_NA, AASENGVPVTAR_4_NA_NA)
## 2018-06-11 12:18:26 WARNING::*** Subject : br16, Condition : delta_cf has multiple rows (duplicate rows) for some features (AAGDPDLLVSR_2_NA_NA, AAGDPDLLVSR_3_NA_NA, AAGDPDLLVSR_4_NA_NA, AASENGVPVTAR_2_NA_NA, AASENGVPVTAR_3_NA_NA, AASENGVPVTAR_4_NA_NA)
## *** Subject : br17, Condition : comp_cf has multiple rows (duplicate rows) for some features (AAC(UniMod_4)LDYVEK_2_NA_NA, AAC(UniMod_4)LDYVEK_3_NA_NA, AAEHGDLPLSFSVTNIQPAAAGSATADVSVSGPK_2_NA_NA, AAEHGDLPLSFSVTNIQPAAAGSATADVSVSGPK_3_NA_NA, AAEHGDLPLSFSVTNIQPAAAGSATADVSVSGPK_4_NA_NA, AAELLASTNR_2_NA_NA)
## 2018-06-11 12:18:27 WARNING::*** Subject : br17, Condition : comp_cf has multiple rows (duplicate rows) for some features (AAC(UniMod_4)LDYVEK_2_NA_NA, AAC(UniMod_4)LDYVEK_3_NA_NA, AAEHGDLPLSFSVTNIQPAAAGSATADVSVSGPK_2_NA_NA, AAEHGDLPLSFSVTNIQPAAAGSATADVSVSGPK_3_NA_NA, AAEHGDLPLSFSVTNIQPAAAGSATADVSVSGPK_4_NA_NA, AAELLASTNR_2_NA_NA)
## *** Subject : br18, Condition : comp_cf has multiple rows (duplicate rows) for some features (AADTDVFSAVR_2_NA_NA, AADTDVFSAVR_3_NA_NA, AAEAAAIAALPPSEDFESGARR_2_NA_NA, AAEAAAIAALPPSEDFESGARR_3_NA_NA, AAGGAAGPLGAK_2_NA_NA, AAGGAAGPLGAK_3_NA_NA)
## 2018-06-11 12:18:28 WARNING::*** Subject : br18, Condition : comp_cf has multiple rows (duplicate rows) for some features (AADTDVFSAVR_2_NA_NA, AADTDVFSAVR_3_NA_NA, AAEAAAIAALPPSEDFESGARR_2_NA_NA, AAEAAAIAALPPSEDFESGARR_3_NA_NA, AAGGAAGPLGAK_2_NA_NA, AAGGAAGPLGAK_3_NA_NA)
## *** Subject : br19, Condition : comp_cf has multiple rows (duplicate rows) for some features (AAEPAPQPEQPDTPALGGEQAELTAES_2_NA_NA, AAGAAGIPIR_2_NA_NA, AAGAAGIPIR_3_NA_NA, AAIDTAHAQGAR_2_NA_NA, AALAAAGVQPETVGVVEAHGTGTPIGDPIEYR_2_NA_NA, AALAAAGVQPETVGVVEAHGTGTPIGDPIEYR_3_NA_NA)
## 2018-06-11 12:18:29 WARNING::*** Subject : br19, Condition : comp_cf has multiple rows (duplicate rows) for some features (AAEPAPQPEQPDTPALGGEQAELTAES_2_NA_NA, AAGAAGIPIR_2_NA_NA, AAGAAGIPIR_3_NA_NA, AAIDTAHAQGAR_2_NA_NA, AALAAAGVQPETVGVVEAHGTGTPIGDPIEYR_2_NA_NA, AALAAAGVQPETVGVVEAHGTGTPIGDPIEYR_3_NA_NA)
## *** Subject : br8, Condition : delta_whole has multiple rows (duplicate rows) for some features (AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA, AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAGAELLDYDEVVAR_2_NA_NA, AAGAELLDYDEVVAR_3_NA_NA, AAVILDSDPWR_2_NA_NA, AAYLAEGRQPVVR_2_NA_NA)
## 2018-06-11 12:18:31 WARNING::*** Subject : br8, Condition : delta_whole has multiple rows (duplicate rows) for some features (AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_2_NA_NA, AAAAADPGPPTRPAHNAAGVSPEMVQVPAEAQR_3_NA_NA, AAGAELLDYDEVVAR_2_NA_NA, AAGAELLDYDEVVAR_3_NA_NA, AAVILDSDPWR_2_NA_NA, AAYLAEGRQPVVR_2_NA_NA)
## *** Subject : br9, Condition : delta_whole has multiple rows (duplicate rows) for some features (AAAKAPLRQTAVSAAALGLR_2_NA_NA, AAAKAPLRQTAVSAAALGLR_3_NA_NA, AAKADDLGAQQVAK_2_NA_NA, ADDLGAQQVAK_2_NA_NA, ADDLGAQQVAK_3_NA_NA, ADSGVPIVMLTAK_2_NA_NA)
## 2018-06-11 12:18:32 WARNING::*** Subject : br9, Condition : delta_whole has multiple rows (duplicate rows) for some features (AAAKAPLRQTAVSAAALGLR_2_NA_NA, AAAKAPLRQTAVSAAALGLR_3_NA_NA, AAKADDLGAQQVAK_2_NA_NA, ADDLGAQQVAK_2_NA_NA, ADDLGAQQVAK_3_NA_NA, ADSGVPIVMLTAK_2_NA_NA)
## *** Subject : br10, Condition : delta_whole has multiple rows (duplicate rows) for some features (AADAVSEALLASATPVSGK_2_NA_NA, AADAVSEALLASATPVSGK_3_NA_NA, AFGGPTVTNDGVTVAR_2_NA_NA, AFGGPTVTNDGVTVAR_3_NA_NA, AFTVASAGASADR_2_NA_NA, AGGLM(UniMod_35)SALTPQFGSK_2_NA_NA)
## 2018-06-11 12:18:33 WARNING::*** Subject : br10, Condition : delta_whole has multiple rows (duplicate rows) for some features (AADAVSEALLASATPVSGK_2_NA_NA, AADAVSEALLASATPVSGK_3_NA_NA, AFGGPTVTNDGVTVAR_2_NA_NA, AFGGPTVTNDGVTVAR_3_NA_NA, AFTVASAGASADR_2_NA_NA, AGGLM(UniMod_35)SALTPQFGSK_2_NA_NA)
## *** Subject : br12, Condition : comp_whole has multiple rows (duplicate rows) for some features (AADWVDRAEAEAEVQR_2_NA_NA, AADWVDRAEAEAEVQR_3_NA_NA, AAEIVAGPPR_2_NA_NA, AAEIVAGPPR_3_NA_NA, AAEIVAGPPR_4_NA_NA, AAEIVAGPPRK_3_NA_NA)
## 2018-06-11 12:18:34 WARNING::*** Subject : br12, Condition : comp_whole has multiple rows (duplicate rows) for some features (AADWVDRAEAEAEVQR_2_NA_NA, AADWVDRAEAEAEVQR_3_NA_NA, AAEIVAGPPR_2_NA_NA, AAEIVAGPPR_3_NA_NA, AAEIVAGPPR_4_NA_NA, AAEIVAGPPRK_3_NA_NA)
## *** Subject : br13, Condition : comp_whole has multiple rows (duplicate rows) for some features (AAAAAPSGTAVGAGAR_2_NA_NA, AAAAAPSGTAVGAGAR_3_NA_NA, AAAAHVLPDAQLTAAEQR_2_NA_NA, AAAAHVLPDAQLTAAEQR_3_NA_NA, AAIAVGHGVTVR_2_NA_NA, AAIAVGHGVTVR_3_NA_NA)
## 2018-06-11 12:18:37 WARNING::*** Subject : br13, Condition : comp_whole has multiple rows (duplicate rows) for some features (AAAAAPSGTAVGAGAR_2_NA_NA, AAAAAPSGTAVGAGAR_3_NA_NA, AAAAHVLPDAQLTAAEQR_2_NA_NA, AAAAHVLPDAQLTAAEQR_3_NA_NA, AAIAVGHGVTVR_2_NA_NA, AAIAVGHGVTVR_3_NA_NA)
## 2018-06-11 12:18:37 WARNING::Please remove duplicate rows in the list above.
## 2018-06-11 12:18:39 INFO::Recast the following columns as factors:
## group, subject, group_original, subject_original, subject_original_nested, feature, and run.
## 2018-06-11 12:18:40 INFO::Performing equalize medians normalization.
## 2018-06-11 12:18:40 INFO::Between run interference score was not calculated.
## 2018-06-11 12:18:40 INFO::** Log2 intensities under cutoff =14.452 were considered as censored missing values.
## 2018-06-11 12:18:40 INFO::** Log2 intensities = NA were considered as censored missing values.
## 2018-06-11 12:18:40 INFO::Feature Subset: using all features in the data set.
## 2018-06-11 12:18:40 INFO::Summary: 1 levels of isotope labeling were observed.
## 2018-06-11 12:19:20 INFO::Summary of Features:
## 2018-06-11 12:19:20 INFO::Number of Proteins : 867
## 2018-06-11 12:19:20 INFO::Number of Peptides/Protein : 3-494
## 2018-06-11 12:19:20 INFO::Number of Transitions/Peptide : 1-1
##                       
##   Summary of Samples :
## 2018-06-11 12:20:20 INFO::comp_cf   comp_whole   delta_cf   delta_whole   wt_cf   wt_whole
##  2018-06-11 12:20:20 INFO::--------------------------------  --------  -----------  ---------  ------------  ------  ---------
##  2018-06-11 12:20:20 INFO::Number of MS runs                        3            2          3             3       6          6
##  2018-06-11 12:20:20 INFO::Number of Biological Replicates          3            2          3             3       6          6
##  2018-06-11 12:20:20 INFO::Number of Technical Replicates           1            1          1             1       1          1
## 2018-06-11 12:20:20 INFO::Missingness summary: 5374 are missing completely in one condition.
## 2018-06-11 12:20:20 INFO::-> AADWVDRAEAEAEVQR_2_NA_NA, AAIEEMAPQLAR_2_NA_NA, AAVLASGM(UniMod_35)PVTSGGVQLNR_2_NA_NA, ADAVDDEELLELVEMEVR_2_NA_NA, ADDLLTATLLTAR_2_NA_NA ...
## 2018-06-11 12:20:20 INFO::Missingness summary: 0 are missing 75% observations.
## 2018-06-11 12:20:20 INFO::-> AADWVDRAEAEAEVQR_2_NA_NA, AAIEEMAPQLAR_2_NA_NA, AAVLASGM(UniMod_35)PVTSGGVQLNR_2_NA_NA, ADAVDDEELLELVEMEVR_2_NA_NA, ADDLLTATLLTAR_2_NA_NA ...
## 2018-06-11 12:20:20 INFO::Processing data for analysis is complete.
## 
  |                                                                                      
  |                                                                                |   0%
## Warning in survreg.fit(X, Y, weights, offset, init = init, controlvals = control, : Ran
## out of iterations and did not converge
## Warning in survreg.fit(X, Y, weights, offset, init = init, controlvals = control, : Ran
## out of iterations and did not converge

## Warning in survreg.fit(X, Y, weights, offset, init = init, controlvals = control, : Ran
## out of iterations and did not converge
## 
## 2018-06-11 12:28:20 INFO::The summarization per subplot by method: TMP is finished.
msstats_plots <- sm(dataProcessPlots(msstats_quant, type="QCPLOT"))

my_levels <- levels(as.factor(msstats_input$condition))

my_levels
## [1] "comp_cf"     "comp_whole"  "delta_cf"    "delta_whole" "wt_cf"       "wt_whole"
comparisons <- ghetto_contrast_matrix(
  numerators=c("wt_cf", "delta_cf", "comp_cf",
               "delta_cf", "comp_cf", "delta_whole",
               "comp_whole"),
  denominators=c("wt_whole", "delta_whole", "comp_whole",
                 "wt_cf", "wt_cf", "wt_whole",
                 "wt_whole"))
results <- list()
for (i in seq_len(nrow(comparisons))) {
  name <- rownames(comparisons)[i]
  message("Starting ", name)
  comp <- comparisons[i, ]
  comparison <- t(as.matrix(comp))
  rownames(comparison) <- name
  ## groupComparison() returns a list; keep only its ComparisonResult table.
  ## Assigning the whole list with results[name] stores just the first element
  ## and warns that the replacement length does not match.
  comparison_result <- sm(MSstats::groupComparison(contrast.matrix=comparison,
                                                   data=msstats_quant))
  results[[name]] <- comparison_result[["ComparisonResult"]]
}
## Starting wt_cf_vs_wt_whole
## Starting delta_cf_vs_delta_whole
## Starting comp_cf_vs_comp_whole
## Starting delta_cf_vs_wt_cf
## Starting comp_cf_vs_wt_cf
## Starting delta_whole_vs_wt_whole
## Starting comp_whole_vs_wt_whole
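
For readers without hpgltools at hand, here is a minimal sketch of the kind of contrast matrix that MSstats::groupComparison() expects: one row per comparison, one column per condition, +1 for the numerator and -1 for the denominator. The helper name make_contrast_matrix() is hypothetical and is not the function used above; I am only assuming it produces an equivalent layout.

```r
## Hypothetical helper: build a contrast matrix with one row per comparison.
make_contrast_matrix <- function(numerators, denominators, conditions) {
  stopifnot(length(numerators) == length(denominators),
            all(numerators %in% conditions),
            all(denominators %in% conditions))
  contrasts <- matrix(0, nrow=length(numerators), ncol=length(conditions),
                      dimnames=list(paste0(numerators, "_vs_", denominators),
                                    conditions))
  for (i in seq_along(numerators)) {
    contrasts[i, numerators[i]] <- 1
    contrasts[i, denominators[i]] <- -1
  }
  contrasts
}

## The columns follow the (alphabetical) factor levels of the conditions.
conditions <- c("comp_cf", "comp_whole", "delta_cf", "delta_whole", "wt_cf", "wt_whole")
example <- make_contrast_matrix(numerators=c("wt_cf", "delta_cf"),
                                denominators=c("wt_whole", "wt_cf"),
                                conditions=conditions)
example["wt_cf_vs_wt_whole", ]
```

Each row sums to zero, which is a quick sanity check that every comparison has exactly one numerator and one denominator.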

3.2.1 P/PE protein QC plots for Yan

Yan asked for the P/PE protein QC plots. I changed dataProcessPlots() to return something useful, so that should be possible now.

pe_genes <- read.table("reference/annotated_pe_genes.txt")[[1]]

## Unfortunately, the names did not get set in my changed version of dataProcessPlots...
plotlst <- msstats_plots$QCPLOT
available_plots <- gsub(pattern="^1/", replacement="", x=levels(msstats_quant$ProcessedData$PROTEIN))
names(plotlst) <- available_plots

pe_in_avail_idx <- pe_genes %in% available_plots
pe_in_avail <- pe_genes[pe_in_avail_idx]
pe_plots <- plotlst[pe_in_avail]
pdf(file="pe_qc_plots.pdf")
## seq_along() avoids iterating over c(1, 0) when no P/PE plots are available.
for (p in seq_along(pe_plots)) {
  plot(pe_plots[[p]])
}
dev.off()

4 Create hpgltools expressionset

Since I am not certain I understand these data, I will take the intensities from SWATH2stats along with the metadata and annotation data, attempt to create a ‘normal’ expressionset, and poke at it to see what I can learn.

4.1 Massaging the metadata

I want to use the same metadata as was used for MSstats. It differs from the requirements of hpgltools in essentially one respect: I do not allow rownames/sampleIDs to start with a number.

metadata <- sample_annot
metadata[["sampleid"]] <- paste0("s", metadata[["sampleid"]])
rownames(metadata) <- metadata[["sampleid"]]
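
As a small illustration of why a leading digit is awkward, the sketch below shows what base R does to such names and one way to add the "s" prefix so that re-running the chunk does not prefix twice. The sample IDs are made-up stand-ins and add_prefix() is a hypothetical helper, not part of hpgltools; it also assumes no real sample ID already begins with "s".

```r
## Hypothetical sample IDs resembling the ones in this experiment.
sample_ids <- c("2018_0315Briken01", "2018_0502BrikenDIA01")

## Base R prepends "X" to names starting with a digit when it sanitizes them.
sanitized <- make.names(sample_ids)

## Add the prefix only when it is missing, so the operation is idempotent.
add_prefix <- function(ids, prefix="s") {
  needs_prefix <- !startsWith(ids, prefix)
  ids[needs_prefix] <- paste0(prefix, ids[needs_prefix])
  ids
}
once <- add_prefix(sample_ids)
twice <- add_prefix(once)
```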

4.2 Massaging the gene annotation data and adding the msstats data.

I have my own annotation data from the gff file/microbesonline/whatever. I can add the MSstats results to it so that later I can print them all together.
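
A minimal sketch of that merge, using base R only. Both tables here are toy stand-ins: the annotation columns are hypothetical, and I am assuming the MSstats table looks like the ComparisonResult element of groupComparison() with Protein, log2FC, and adj.pvalue columns and the same "1/" prefix seen elsewhere in this document.

```r
## Hypothetical annotation table, keyed by gene ID in the row names.
annotations <- data.frame(
  description=c("example protein A", "example protein B"),
  row.names=c("Rv0001", "Rv0002"))

## Hypothetical MSstats-style result for a single contrast.
msstats_result <- data.frame(
  Protein=c("1/Rv0001", "1/Rv0003"),
  log2FC=c(1.5, -0.7),
  adj.pvalue=c(0.01, 0.20))
msstats_result[["Protein"]] <- gsub(pattern="^1/", replacement="",
                                    x=msstats_result[["Protein"]])

## all.x=TRUE keeps every annotated gene and fills NA where MSstats has no row.
merged <- merge(annotations, msstats_result,
                by.x="row.names", by.y="Protein", all.x=TRUE)
rownames(merged) <- merged[["Row.names"]]
merged[["Row.names"]] <- NULL
```

Keeping the unmatched annotation rows (rather than an inner join) makes it obvious later which genes simply were not quantified.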

4.3 Massaging the intensity matrix

I do not want the 1/ prefix before the protein names; I already merged them into one entry per gene via SWATH2stats.

prot_mtrx <- read.csv(paste0("results/swath2stats_", ver, "/protein_matrix_minimum.csv"))
## Warning in file(file, "rt"): cannot open file 'results/swath2stats_20180528/
## protein_matrix_minimum.csv': No such file or directory
## Error in file(file, "rt"): cannot open the connection
rownames(prot_mtrx) <- gsub(pattern="^1\\/", replacement="", x=prot_mtrx[["proteinname"]])
## Error in `.rowNamesDF<-`(x, value = value): invalid 'row.names' length
prot_mtrx <- prot_mtrx[, -1]
## Important question: Did SWATH2stats reorder my data?
fun <- gsub(pattern="^.*_(2018.*$)", replacement="\\1", x=colnames(prot_mtrx))
colnames(prot_mtrx) <- paste0("s", fun)
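
To make the two regular expressions above concrete, here is a small sketch on made-up names (the real column names written by SWATH2stats may differ). It also shows one plausible way to end up with a doubled "ss" prefix like the one in the create_expt() output below: if a column name already starts with "s2018", the second pattern does not match, gsub() returns the name unchanged, and paste0() adds another "s". That explanation is an assumption on my part, not something verified against the actual files.

```r
## Hypothetical protein names: strip the leading "1/" where it is present.
protein_names <- c("1/Rv0001", "1/Rv3875", "Rv0440")
cleaned <- gsub(pattern="^1\\/", replacement="", x=protein_names)

## Hypothetical column names: one fresh from disk, one already prefixed.
column_names <- c("run_a_2018_0315Briken01", "s2018_0315Briken02")
extracted <- gsub(pattern="^.*_(2018.*$)", replacement="\\1", x=column_names)
prefixed <- paste0("s", extracted)
```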

4.4 Merge the pieces

Now we should have sufficient pieces to make an expressionset.

While here, I will also split the data into a cf and whole-cell pair of data structures.

## Drop the metadata not in the protein matrix:
## And ensure that they are the same order.
reordered <- colnames(prot_mtrx)
metadata <- metadata[reordered, ]

protein_expt <- create_expt(metadata,
                            count_dataframe=prot_mtrx,
                            gene_info=mtb_annotations)
## Reading the sample metadata.
## The sample definitions comprises: 0, 2 rows, columns.
## The count table column names are: ss2018_0315Briken01, ss2018_0315Briken02, ss2018_0315Briken03, ss2018_0315Briken04, ss2018_0315Briken05, ss2018_0315Briken06, ss2018_0315Briken21, ss2018_0315Briken22, ss2018_0315Briken23, ss2018_0315Briken24, ss2018_0315Briken25, ss2018_0315Briken26, ss2018_0502BrikenDIA01, ss2018_0502BrikenDIA02, ss2018_0502BrikenDIA03, ss2018_0502BrikenDIA05, ss2018_0502BrikenDIA06, ss2018_0502BrikenDIA07, ss2018_0502BrikenDIA08, ss2018_0502BrikenDIA09, ss2018_0502BrikenDIA11, ss2018_0502BrikenDIA12
## The  meta   data  row  names are:
## Error in create_expt(metadata, count_dataframe = prot_mtrx, gene_info = mtb_annotations): The count table column names are not the same as the sample definition row names.
whole_expt <- subset_expt(protein_expt, subset="collectiontype=='whole'")
## There were 23, now there are 11 samples.
cf_expt <- subset_expt(protein_expt, subset="collectiontype=='cf'")
## There were 23, now there are 12 samples.
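
The create_expt() error above complains that the count table column names and the metadata row names disagree, and the echoed column names start with "ss", which hints that the "s" prefix may have been applied twice. The following is a rough base R sketch of how one might check the two sets of names before building the expressionset. The toy objects and the `add_prefix()` helper are hypothetical and are not part of hpgltools.

```r
## Hypothetical stand-ins for prot_mtrx and metadata.
toy_mtrx <- data.frame(
  "s2018_0315Briken01" = c(10, 20),
  "s2018_0315Briken02" = c(30, 40),
  row.names = c("Rv0001", "Rv0002"))
toy_meta <- data.frame(
  "collectiontype" = c("whole", "cf", "cf"),
  row.names = c("s2018_0315Briken01", "s2018_0315Briken02", "s2018_0315Briken03"),
  stringsAsFactors = FALSE)

## Add the "s" prefix only when it is missing, so rerunning the chunk is harmless.
add_prefix <- function(ids) {
  ifelse(grepl(pattern="^s", x=ids), ids, paste0("s", ids))
}
colnames(toy_mtrx) <- add_prefix(colnames(toy_mtrx))

## Which samples are in one table but not the other?
missing_meta <- setdiff(colnames(toy_mtrx), rownames(toy_meta))
unused_meta <- setdiff(rownames(toy_meta), colnames(toy_mtrx))
if (length(missing_meta) > 0) {
  stop("Samples without metadata: ", toString(missing_meta))
}

## Reorder the metadata only once every column is known to have a row.
toy_meta <- toy_meta[colnames(toy_mtrx), , drop=FALSE]
```

Printing `missing_meta` and `unused_meta` before subsetting should make a naming mismatch obvious, rather than silently producing an empty metadata table.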

4.5 Metrics of the full data set

protein_metrics <- sm(graph_metrics(protein_expt))
protein_norm <- sm(normalize_expt(protein_expt, transform="log2", convert="cpm",
                                  norm="quant", filter=TRUE))
protein_norm_metrics <- sm(graph_metrics(protein_norm))
protein_fsva <- sm(normalize_expt(protein_expt, transform="log2", convert="cpm",
                                  batch="fsva", filter=TRUE))
protein_fsva_metrics <- sm(graph_metrics(protein_fsva))
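
For readers unfamiliar with the options passed to normalize_expt(), here is a rough base R sketch of what a cpm conversion, a log2 transform, and a quantile normalization generally do. This is not the hpgltools implementation: the toy matrix, the pseudocount of 1, the order of the steps, and the `quantile_normalize()` helper are all assumptions made for illustration.

```r
## Toy intensity matrix: rows are proteins, columns are samples.
toy_counts <- matrix(c(5, 2, 3,
                       4, 1, 6,
                       3, 4, 8),
                     nrow=3,
                     dimnames=list(c("Rv0001", "Rv0002", "Rv0003"),
                                   c("s01", "s02", "s03")))

## Counts per million: scale every column so that it sums to one million.
toy_cpm <- t(t(toy_counts) / colSums(toy_counts)) * 1e6

## log2 transform with a pseudocount of 1 (an assumption) to avoid log2(0).
toy_log <- log2(toy_cpm + 1)

## Hypothetical helper: give every sample the same distribution by replacing
## each value with the mean of the values sharing its rank across samples.
quantile_normalize <- function(mtrx) {
  ranks <- apply(mtrx, 2, rank, ties.method="first")
  sorted <- apply(mtrx, 2, sort)
  means <- rowMeans(sorted)
  normed <- apply(ranks, 2, function(r) means[r])
  dimnames(normed) <- dimnames(mtrx)
  normed
}
toy_norm <- quantile_normalize(toy_log)
```

After the last step every column of `toy_norm` contains the same set of values, only in a different order, which is why the normalized density and box plots tend to line up.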

4.6 Metrics of the whole-cell data set

whole_metrics <- sm(graph_metrics(whole_expt))
whole_norm <- sm(normalize_expt(whole_expt, transform="log2", convert="cpm",
                                  norm="quant", filter=TRUE))
whole_norm_metrics <- sm(graph_metrics(whole_norm))
whole_fsva <- sm(normalize_expt(whole_expt, transform="log2", convert="cpm",
                                  batch="fsva", filter=TRUE))
whole_fsva_metrics <- sm(graph_metrics(whole_fsva))

4.7 Metrics of the filtrate data set

cf_metrics <- sm(graph_metrics(cf_expt))
cf_norm <- sm(normalize_expt(cf_expt, transform="log2", convert="cpm",
                                  norm="quant", filter=TRUE))
cf_norm_metrics <- sm(graph_metrics(cf_norm))
cf_fsva <- sm(normalize_expt(cf_expt, transform="log2", convert="cpm",
                                  batch="fsva", filter=TRUE))
cf_fsva_metrics <- sm(graph_metrics(cf_fsva))

4.8 Plot some metrics

pp(image=protein_metrics$libsize, file="images/20180523_libsize.png")
## Writing the image to: images/20180523_libsize.png and calling dev.off().
## It seems to me that the scale of the data is all within an order of magnitude or two.
## I cannot get used to these absurdly large numbers though.
pp(image=protein_norm_metrics$pcaplot, file="images/20180523_norm_pca.png")
## Writing the image to: images/20180523_norm_pca.png and calling dev.off().
## Too few points to calculate an ellipse

## There appears to be a nice split in the data, however the un-assayable batch
## effect is a problem.
pp(image=protein_fsva_metrics$pcaplot, file="images/20180523_fsva_pca.png")
## Writing the image to: images/20180523_fsva_pca.png and calling dev.off().
## Too few points to calculate an ellipse

## fsva seems to get some handle on the data, but I don't think we should rely
## upon it.
pp(image=protein_norm_metrics$corheat, file="images/20180523_norm_corheat.png")
## Writing the image to: images/20180523_norm_corheat.png and calling dev.off().

## Once again, the whole-cell/culture-filtrate split is very large.
pp(image=protein_metrics$density, file="images/20180523_raw_density.png")
## Writing the image to: images/20180523_raw_density.png and calling dev.off().

## There are two obvious distributions in the data, once again split between types.
pp(image=protein_metrics$boxplot, file="images/20180523_boxplot.png")
## Writing the image to: images/20180523_boxplot.png and calling dev.off().

## This recapitulates the previous plot.

pp(image=whole_metrics$libsize, file="images/20180523_whole_libsize.png")
## Writing the image to: images/20180523_whole_libsize.png and calling dev.off().

pp(image=whole_norm_metrics$pcaplot, file="images/20180523_whole_norm_pca.png")
## Writing the image to: images/20180523_whole_norm_pca.png and calling dev.off().
## Too few points to calculate an ellipse

pp(image=whole_fsva_metrics$pcaplot, file="images/20180523_whole_fsva_pca.png")
## Writing the image to: images/20180523_whole_fsva_pca.png and calling dev.off().
## Too few points to calculate an ellipse

pp(image=whole_norm_metrics$corheat, file="images/20180523_whole_norm_corheat.png")
## Writing the image to: images/20180523_whole_norm_corheat.png and calling dev.off().

pp(image=whole_metrics$density, file="images/20180523_whole_raw_density.png")
## Writing the image to: images/20180523_whole_raw_density.png and calling dev.off().

pp(image=whole_metrics$boxplot, file="images/20180523_whole_boxplot.png")
## Writing the image to: images/20180523_whole_boxplot.png and calling dev.off().

pp(image=cf_metrics$libsize, file="images/20180523_cf_libsize.png")
## Writing the image to: images/20180523_cf_libsize.png and calling dev.off().

pp(image=cf_norm_metrics$pcaplot, file="images/20180523_cf_norm_pca.png")
## Writing the image to: images/20180523_cf_norm_pca.png and calling dev.off().
## Too few points to calculate an ellipse

pp(image=cf_fsva_metrics$pcaplot, file="images/20180523_cf_fsva_pca.png")
## Writing the image to: images/20180523_cf_fsva_pca.png and calling dev.off().
## Too few points to calculate an ellipse

pp(image=cf_norm_metrics$corheat, file="images/20180523_cf_norm_corheat.png")
## Writing the image to: images/20180523_cf_norm_corheat.png and calling dev.off().

pp(image=cf_metrics$density, file="images/20180523_cf_raw_density.png")
## Writing the image to: images/20180523_cf_raw_density.png and calling dev.off().

pp(image=cf_metrics$boxplot, file="images/20180523_cf_boxplot.png")
## Writing the image to: images/20180523_cf_boxplot.png and calling dev.off().

5 Attempt some quantification comparisons?

pairwise_filt <- sm(normalize_expt(protein_expt, filter=TRUE))
pairwise_comp <- sm(all_pairwise(pairwise_filt, model_batch="fsva", force=TRUE))

pairwise_nobatch <- sm(all_pairwise(pairwise_filt, model_batch=FALSE, force=TRUE))

6 For each MSstats run, make a DE table

6.1 wt_cf vs wt_whole

keepers <- list(
  "wtcf_vs_wtwhole" = c("wt_cf", "wt_whole"))
droppers <- c("undefined")
names(droppers) <- "log2fc"

## Make sure to set the rownames so it will merge into the excel file.
rownames(results[[1]]) <- results[[1]][["Protein"]]
wtcf_wtwhole_tables <- sm(combine_de_tables(
  pairwise_comp, keepers=keepers, extra_annot=results[[1]],
  excludes=droppers,
  excel=paste0("excel/wtcf_vs_wtwhole_tables-v", ver, ".xlsx")))

wtcf_nobatch_wtwhole_tables <- sm(combine_de_tables(
  pairwise_nobatch, keepers=keepers, extra_annot=results[[1]],
  excludes=droppers,
  excel=paste0("excel/wtcf_vs_wtwhole_nobatch_tables-v", ver, ".xlsx")))

comp_table <- wtcf_wtwhole_tables$data[[1]]
cor.test(comp_table$log2fc, comp_table$deseq_logfc, method="spearman")
## Warning in cor.test.default(comp_table$log2fc, comp_table$deseq_logfc, method =
## "spearman"): Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  comp_table$log2fc and comp_table$deseq_logfc
## S = 5.7e+07, p-value <2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##    rho 
## 0.4486
comp_table <- wtcf_nobatch_wtwhole_tables$data[[1]]
cor.test(comp_table$log2fc, comp_table$deseq_logfc, method="spearman")
## Warning in cor.test.default(comp_table$log2fc, comp_table$deseq_logfc, method =
## "spearman"): Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  comp_table$log2fc and comp_table$deseq_logfc
## S = 4.9e+07, p-value <2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##    rho 
## 0.5303
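
The TODO list asks for these correlations both across all of the data and for the top-50/bottom-50 proteins. Here is a small base R sketch of one way to do that on simulated data. The toy table is made up and only borrows the column names used above, and passing `exact=FALSE` is my own choice to avoid the warning about ties.

```r
## Simulated stand-in for one of the combined tables.
set.seed(20180528)
toy_table <- data.frame("log2fc" = rnorm(200))
toy_table[["deseq_logfc"]] <- toy_table[["log2fc"]] + rnorm(200, sd=0.5)

## Correlation across every row; exact=FALSE requests the asymptotic p-value.
all_cor <- cor.test(toy_table[["log2fc"]], toy_table[["deseq_logfc"]],
                    method="spearman", exact=FALSE)

## Correlation using only the 50 lowest and 50 highest log2fc values.
n <- 50
ordered <- toy_table[order(toy_table[["log2fc"]]), ]
extremes <- rbind(head(ordered, n=n), tail(ordered, n=n))
extreme_cor <- cor.test(extremes[["log2fc"]], extremes[["deseq_logfc"]],
                        method="spearman", exact=FALSE)

all_cor[["estimate"]]
extreme_cor[["estimate"]]
```

Selecting rows by the extremes of one method inflates the apparent agreement, which may be part of why the top/bottom comparison looked odd.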

6.2 delta_cf vs delta_whole

keepers <- list(
  "deltacf_vs_deltawhole" = c("delta_cf", "delta_whole"))
## Make sure to set the rownames so it will merge into the excel file.
rownames(results[[2]]) <- results[[2]][["Protein"]]
deltacf_deltawhole_tables <- sm(combine_de_tables(
  pairwise_comp, keepers=keepers, extra_annot=results[[2]],
  excel=paste0("excel/deltacf_vs_deltawhole_tables-v", ver, ".xlsx")))

6.3 comp_cf vs comp_whole

keepers <- list(
  "compcf_vs_compwhole" = c("comp_cf", "comp_whole"))
## Make sure to set the rownames so it will merge into the excel file.
rownames(results[[3]]) <- results[[3]][["Protein"]]
compcf_compwhole_tables <- sm(combine_de_tables(
  pairwise_comp, keepers=keepers, extra_annot=results[[3]],
  excel=paste0("excel/compcf_vs_compwhole_tables-v", ver, ".xlsx")))

6.4 delta_cf vs wt_cf

keepers <- list(
  "deltacf_vs_wtcf" = c("delta_cf", "wt_cf"))
## Make sure to set the rownames so it will merge into the excel file.
rownames(results[[4]]) <- results[[4]][["Protein"]]
deltacf_wtcf_tables <- sm(combine_de_tables(
  pairwise_comp, keepers=keepers, extra_annot=results[[4]],
  excel=paste0("excel/deltacf_vs_wtcf_tables-v", ver, ".xlsx")))

6.5 comp_cf vs wt_cf

keepers <- list(
  "compcf_vs_wtcf" = c("comp_cf", "wt_cf"))
## Make sure to set the rownames so it will merge into the excel file.
rownames(results[[5]]) <- results[[5]][["Protein"]]
compcf_wtcf_tables <- sm(combine_de_tables(
  pairwise_comp, keepers=keepers, extra_annot=results[[5]],
  excel=paste0("excel/compcf_vs_wtcf_tables-v", ver, ".xlsx")))

6.6 delta_whole vs wt_whole

keepers <- list(
  "deltawhole_vs_wtwhole" = c("delta_whole", "wt_whole"))
## Make sure to set the rownames so it will merge into the excel file.
rownames(results[[6]]) <- results[[6]][["Protein"]]
deltawhole_wtwhole_tables <- sm(combine_de_tables(
  pairwise_comp, keepers=keepers, extra_annot=results[[6]],
  excel=paste0("excel/deltawhole_vs_wtwhole_tables-v", ver, ".xlsx")))

6.7 comp_whole vs wt_whole

keepers <- list(
  "compwhole_vs_wtwhole" = c("comp_whole", "wt_whole"))
## Make sure to set the rownames so it will merge into the excel file.
rownames(results[[7]]) <- results[[7]][["Protein"]]
compwhole_wtwhole_tables <- sm(combine_de_tables(
  pairwise_comp, keepers=keepers, extra_annot=results[[7]],
  excel=paste0("excel/compwhole_vs_wtwhole_tables-v", ver, ".xlsx")))
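
The seven blocks above differ only in the contrast name, the pair of conditions, and the index into results, so they could plausibly be driven from a single list. This is a sketch of the bookkeeping only, in base R. The `plan` structure is hypothetical, the value of `ver` is assumed from the index version, and the actual call to combine_de_tables() is left as a comment because it needs the real pairwise_comp and results objects.

```r
## Assumed value; the real document defines ver elsewhere.
ver <- "20180528"

## One entry per section above, in the same order as the elements of results.
contrasts <- list(
  "wtcf_vs_wtwhole" = c("wt_cf", "wt_whole"),
  "deltacf_vs_deltawhole" = c("delta_cf", "delta_whole"),
  "compcf_vs_compwhole" = c("comp_cf", "comp_whole"),
  "deltacf_vs_wtcf" = c("delta_cf", "wt_cf"),
  "compcf_vs_wtcf" = c("comp_cf", "wt_cf"),
  "deltawhole_vs_wtwhole" = c("delta_whole", "wt_whole"),
  "compwhole_vs_wtwhole" = c("comp_whole", "wt_whole"))

## Collect the arguments for each contrast in one place.
plan <- lapply(seq_along(contrasts), function(i) {
  name <- names(contrasts)[i]
  list(
    "keepers" = contrasts[i],  ## A one-element named list, like keepers above.
    "annot_index" = i,         ## Which element of results to use as extra_annot.
    "excel" = paste0("excel/", name, "_tables-v", ver, ".xlsx"))
})
names(plan) <- names(contrasts)

## Each entry could then be used along these lines:
## entry <- plan[["deltacf_vs_wtcf"]]
## tables <- combine_de_tables(pairwise_comp, keepers=entry[["keepers"]],
##                             extra_annot=results[[entry[["annot_index"]]]],
##                             excel=entry[["excel"]])
```

Keeping the names, conditions, and file names in one list makes it harder for a copied block to keep the previous contrast by accident.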

7 Index version: 20180528

8 TODO

  • 2018-04-10: Make sure my invocations of SWATH2stats/MSstats are correct.

if (!isTRUE(get0("skip_load"))) {
  message(paste0("This is hpgltools commit: ", get_git_commit()))
  this_save <- paste0(gsub(pattern="\\.Rmd$", replacement="", x=rmd_file), "-v", ver, ".rda.xz")
  message(paste0("Saving to ", this_save))
  tmp <- sm(saveme(filename=this_save))
  pander::pander(sessionInfo())
}
