RNAsequencing of CL14 and CLBrener

This document attempts to perform analyses of an RNA sequencing experiment to compare the RNA expression profiles of the T. cruzi strains: CL14 and CLBrener. These collections were performed with Epimastigotes, Trypomastigotes, and Amastigotes at times 60 and 96 hours after infection.

Loading/Saving data and generating reports

The following block is intended to only be evaluated manually when playing with data and generating reports. It is not evaluated by knitr so that the data is generated de-novo when the report is made.

load("RData")
rm(list=ls())
save(list=ls(all=TRUE), file="RData")
render("rnaseq_cl14clb.rmd", output_format="pdf_document")
render("rnaseq_cl14clb.rmd", output_format="html_document")
render("rnaseq_cl14clb.rmd", 'knitrBootstrap::bootstrap_document')

Genome annotation input

Annotation data for the genes is important for everything which follows. The TritrypDB reference gff files are the source for this information.

## uh what the hell did I do to this?  I no longer have tooltip_data created.
## Well I guess that is what git history is for...
## I got the original lines back with the following commands:
###  git log  ## This listed me the commit names so I could find the culprit
###  git diff d3cd139df893f1c8197a187eda48e6bb12794d8b^  ## This showed me the differences between
###  the commit (d3cd...) and the current branch (^) and I found where I deleted the lines.
tcruzi_annotations = import.gff3("reference/gff/clbrener_8.1_complete.gff.gz")
annotation_info = as.data.frame(tcruzi_annotations)

genes = annotation_info[annotation_info$type=="gene",]
gene_annotations = genes
rownames(genes) = genes$Name
tooltip_data = genes
tooltip_data = tooltip_data[,c(11,12)]
tooltip_data$tooltip = paste(tooltip_data$Name, tooltip_data$description, sep=": ")
tooltip_data$tooltip = gsub("\\+", " ", tooltip_data$tooltip)
rownames(tooltip_data) = tooltip_data$Name
tooltip_data = tooltip_data[-1]
tooltip_data = tooltip_data[-1]
colnames(tooltip_data) = c("name.tooltip")
head(tooltip_data)

##                                                                                       name.tooltip
## TcCLB.397937.10                TcCLB.397937.10: pumilio/PUF RNA binding protein 1, putative (PUF1)
## TcCLB.397937.5  TcCLB.397937.5: N-acetylglucosamine-6-phosphate deacetylase-like protein, putative
## TcCLB.398343.9                                     TcCLB.398343.9: surface protease GP63, putative
## TcCLB.398345.10                                   TcCLB.398345.10: hypothetical protein, conserved
## TcCLB.398751.10                                   TcCLB.398751.10: hypothetical protein, conserved
## TcCLB.399033.10                                              TcCLB.399033.10: hypothetical protein

Set up experimental design

The following couple of lines reads the csv of samples and sets some initial colors for plots.

Sample removal

The .csv file "all_samples.csv" contains every sequences sample. A few were commented out with '#' marks and the file was re-saved as "kept_samples.csv". The suffix argument is used to define the count tables, which were made as hpgl###.htseq.gz or somesuch.

all_expt = create_expt("all_samples.csv", suffix=".htseq.gz")

## [1] "This function needs the conditions and batches to be an explicit column in the sample sheet."
## [1] "Please note that thus function assumes a specific set of columns in the sample sheet:"
## [1] "The most important ones are: Sample.ID, Stage, Type."
## [1] "Other columns it will attempt to create by itself, but if"
## [1] "batch and condition are provided, that is a nice help."

kept_expt = create_expt("kept_samples.csv", suffix=".htseq.gz" )

## [1] "This function needs the conditions and batches to be an explicit column in the sample sheet."
## [1] "Please note that thus function assumes a specific set of columns in the sample sheet:"
## [1] "The most important ones are: Sample.ID, Stage, Type."
## [1] "Other columns it will attempt to create by itself, but if"
## [1] "batch and condition are provided, that is a nice help."

Create testing subsets

The following block splits the data into an 'all' set and one of just clbrener. It also prints out lots of plots describing the state of the samples.

clbr = expt_subset(all_expt, "condition=='clbr_epi'|condition=='clbr_tryp'|condition=='clbr_a60'|condition=='clbr_a96'")
cl14 = expt_subset(all_expt, "condition=='cl14_epi'|condition=='cl14_tryp'|condition=='cl14_a60'|condition=='cl14_a96'")

cl14_metrics = graph_metrics(expt=cl14)

## [1] "Graphing number of non-zero genes with respect to CPM by library."
## [1] "Graphing library sizes."
## [1] "Adding log10"
## [1] "Graphing a raw data boxplot on log scale."

## Using id as id variables

## [1] "Graphing a normalized boxplot."

## Using id as id variables

## [1] "Graphing a raw-data correlation heatmap."

## [1] "Graphing a raw-data standard median correlation."

## [1] "Graphing a normalized correlation heatmap."

## [1] "Graphing a normalized standard median correlation."

## [1] "Graphing a raw-data distance heatmap."

## [1] "Graphing a raw-data standard median distance."

## [1] "Graphing a normalized distance heatmap."

## [1] "Graphing a normalized standard median distance."

## [1] "Graphing a PCA plot of the raw data."
##    propVar cumPropVar cond.R2 batch.R2
## 1    99.66      99.66   94.55    78.62
## 2     0.22      99.88   91.79    87.55
## 3     0.06      99.94    2.79    99.16
## 4     0.04      99.98    8.76    58.83
## 5     0.01      99.99    2.04    78.55
## 6     0.01     100.00    0.07    43.37
## 7     0.00     100.00    8.50    45.98
## 8     0.00     100.00   89.47    77.21
## 9     0.00     100.00    0.65    28.61
## 10    0.00     100.00    0.11     0.34
## 11    0.00     100.00    1.27     1.78
## 12    0.00     100.00    2.48    11.57
## [1] "Printing a qqplot of the raw data."

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## [1] "Loading grid"

## [1] "Loading grid"

## [1] "This plot looks neat if you do position='fill' or position='stack'"

## Using  as id variables

## [1] "This plot looks neat if you do position='fill' or position='stack'"
## [1] "Graphing a PCA plot of the normalized data."
##    propVar cumPropVar cond.R2 batch.R2
## 1    48.63      48.63   90.30    88.96
## 2    23.31      71.94   39.56    89.51
## 3    11.91      83.85   90.40    82.20
## 4     8.49      92.34   52.80    80.44
## 5     2.45      94.79   20.98    21.97
## 6     1.73      96.52    2.35    91.75
## 7     1.22      97.74    3.19    56.87
## 8     0.74      98.48    0.31     5.62
## 9     0.62      99.10    0.01     1.03
## 10    0.58      99.68    0.08     0.23
## 11    0.30      99.98    0.02    81.42
## [1] "Printing a qqplot of the normalized data."
## [1] "Loading grid"

## [1] "Loading grid"

## Using id as id variables

##   propVar cumPropVar cond.R2 batch.R2
## 1   99.82      99.82   30.58        0
## 2    0.17      99.99    2.60        0
## 3    0.01     100.00    0.14        0
## 4    0.00     100.00    0.01        0
## 5    0.00     100.00    0.00        0

cl14_metrics$norm_corheat

Graph metrics of the subsets

This section will generate a very large number of plots. Much more publication quality versions of them reside in the images/ directory.

clbr_metrics = graph_metrics(expt=clbr)

## [1] "Graphing number of non-zero genes with respect to CPM by library."
## [1] "Graphing library sizes."
## [1] "Adding log10"
## [1] "Graphing a raw data boxplot on log scale."

## Using id as id variables

## [1] "Graphing a normalized boxplot."

## Using id as id variables

## [1] "Graphing a raw-data correlation heatmap."

## [1] "Graphing a raw-data standard median correlation."

## [1] "Graphing a normalized correlation heatmap."

## [1] "Graphing a normalized standard median correlation."

## [1] "Graphing a raw-data distance heatmap."

## [1] "Graphing a raw-data standard median distance."

## [1] "Graphing a normalized distance heatmap."

## [1] "Graphing a normalized standard median distance."

## [1] "Graphing a PCA plot of the raw data."
##    propVar cumPropVar cond.R2 batch.R2
## 1    99.28      99.28   94.53    94.97
## 2     0.64      99.92   55.13    97.30
## 3     0.06      99.98   47.33    98.96
## 4     0.02     100.00   91.78    96.77
## 5     0.01     100.01    1.25     1.93
## 6     0.00     100.01    1.00     1.02
## 7     0.00     100.01    5.71     5.72
## 8     0.00     100.01    1.25     1.28
## 9     0.00     100.01    1.78     1.80
## 10    0.00     100.01    0.24     0.24
## 11    0.00     100.01    0.02     0.02
## 12    0.00     100.01    1.94     1.94
## [1] "Printing a qqplot of the raw data."

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## [1] "Loading grid"

## [1] "Loading grid"

## [1] "This plot looks neat if you do position='fill' or position='stack'"

## Using  as id variables

## [1] "This plot looks neat if you do position='fill' or position='stack'"
## [1] "Graphing a PCA plot of the normalized data."
##    propVar cumPropVar cond.R2 batch.R2
## 1    51.38      51.38   85.44    99.95
## 2    24.61      75.99   99.72    99.98
## 3    15.15      91.14   92.16    99.61
## 4     5.11      96.25   22.21    99.92
## 5     1.04      97.29    0.35     0.35
## 6     0.87      98.16    0.02     0.04
## 7     0.66      98.82    0.03     0.03
## 8     0.54      99.36    0.00     0.03
## 9     0.25      99.61    0.03     0.03
## 10    0.24      99.85    0.03     0.03
## 11    0.15     100.00    0.02     0.02
## [1] "Printing a qqplot of the normalized data."
## [1] "Loading grid"

## [1] "Loading grid"

## Using id as id variables

##   propVar cumPropVar cond.R2 batch.R2
## 1   99.68      99.68       0        0
## 2    0.28      99.96       0        0
## 3    0.03      99.99       0        0
## 4    0.01     100.00       0        0
## 5    0.01     100.01       0        0
## 6    0.00     100.01       0        0
## 7    0.00     100.01       0        0
## 8    0.00     100.01     100      100

all_metrics = graph_metrics(expt=all_expt)

## [1] "Graphing number of non-zero genes with respect to CPM by library."
## [1] "Graphing library sizes."
## [1] "Adding log10"
## [1] "Graphing a raw data boxplot on log scale."

## Using id as id variables

## [1] "Graphing a normalized boxplot."

## Using id as id variables

## [1] "Graphing a raw-data correlation heatmap."

## [1] "Graphing a raw-data standard median correlation."

## [1] "Graphing a normalized correlation heatmap."

## [1] "Graphing a normalized standard median correlation."

## [1] "Graphing a raw-data distance heatmap."

## [1] "Graphing a raw-data standard median distance."

## [1] "Graphing a normalized distance heatmap."

## [1] "Graphing a normalized standard median distance."

## [1] "Graphing a PCA plot of the raw data."
##    propVar cumPropVar cond.R2 batch.R2
## 1    99.43      99.43   94.82    82.08
## 2     0.43      99.86   60.37    82.16
## 3     0.06      99.92   61.52    44.83
## 4     0.03      99.95   15.78     9.54
## 5     0.02      99.97   58.40    51.86
## 6     0.01      99.98   82.60    31.91
## 7     0.01      99.99   70.58    48.97
## 8     0.01     100.00   10.87    38.42
## 9     0.00     100.00   22.63    11.61
## 10    0.00     100.00    7.63    17.60
## 11    0.00     100.00    5.02    45.59
## 12    0.00     100.00   28.23    61.55
## 13    0.00     100.00    5.90    14.68
## 14    0.00     100.00    4.00     5.18
## 15    0.00     100.00   39.52    24.00
## 16    0.00     100.00   46.35    27.39
## 17    0.00     100.00   74.65    65.19
## 18    0.00     100.00    3.40    12.22
## 19    0.00     100.00    0.09     6.92
## 20    0.00     100.00    0.05     1.27
## 21    0.00     100.00    0.11     0.13
## 22    0.00     100.00    2.48     6.58
## 23    0.00     100.00    5.01    10.33
## [1] "Printing a qqplot of the raw data."

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## [1] "Loading grid"

## [1] "Loading grid"

## [1] "This plot looks neat if you do position='fill' or position='stack'"

## Using  as id variables

## [1] "This plot looks neat if you do position='fill' or position='stack'"
## [1] "Graphing a PCA plot of the normalized data."
##    propVar cumPropVar cond.R2 batch.R2
## 1    36.05      36.05   93.28    81.54
## 2    20.96      57.01   87.35    63.96
## 3    14.53      71.54   55.38    45.82
## 4    10.65      82.19   77.26    56.87
## 5     5.81      88.00   89.46    45.83
## 6     2.64      90.64   61.16    62.48
## 7     2.04      92.68   24.19    36.00
## 8     1.39      94.07   76.58    24.67
## 9     1.23      95.30   71.61    63.33
## 10    0.90      96.20   14.78    39.27
## 11    0.83      97.03   33.36    38.79
## 12    0.52      97.55   10.95    48.00
## 13    0.40      97.95    0.95     1.20
## 14    0.34      98.29    0.40     4.21
## 15    0.30      98.59    0.43     0.74
## 16    0.29      98.88    1.84     7.09
## 17    0.27      99.15    0.09     0.72
## 18    0.26      99.41    0.19     0.93
## 19    0.20      99.61    0.28    14.73
## 20    0.13      99.74    0.26    56.74
## 21    0.10      99.84    0.04     0.07
## 22    0.09      99.93    0.11     6.82
## 23    0.07     100.00    0.05     0.19
## [1] "Printing a qqplot of the normalized data."
## [1] "Loading grid"

## [1] "Loading grid"

## Using id as id variables

##    propVar cumPropVar cond.R2 batch.R2
## 1    99.11      99.11   39.42        0
## 2     0.51      99.62   35.49        0
## 3     0.15      99.77   24.33        0
## 4     0.13      99.90   66.42        0
## 5     0.04      99.94   42.90        0
## 6     0.02      99.96    3.13        0
## 7     0.02      99.98    7.29        0
## 8     0.01      99.99   45.08        0
## 9     0.00      99.99    1.13        0
## 10    0.00      99.99    6.52        0
## 11    0.00      99.99    0.18        0
## 12    0.00      99.99    0.13        0
## 13    0.00      99.99    0.16        0
## 14    0.00      99.99    0.01        0
## 15    0.00      99.99    0.03        0
## 16    0.00      99.99    0.00        0

pdf(file="images/all_metrics.pdf", title="Metric graphs of all data.")
all_metrics$nonzero
all_metrics$libsize
all_metrics$raw_boxplot

## Warning: Removed 29553 rows containing non-finite values (stat_boxplot).

all_metrics$norm_boxplot
all_metrics$raw_corheat
all_metrics$norm_corheat
all_metrics$raw_smc
all_metrics$norm_smc
all_metrics$raw_disheat
all_metrics$norm_disheat
all_metrics$raw_smd
all_metrics$norm_smd
all_metrics$raw_pcaplot

## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have 8.
## Consider specifying shapes manually. if you must have them.

## Warning: Removed 5 rows containing missing values (geom_point).

## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have 8.
## Consider specifying shapes manually. if you must have them.

all_metrics$norm_pcaplot

## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have 8.
## Consider specifying shapes manually. if you must have them.

## Warning: Removed 5 rows containing missing values (geom_point).

## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have 8.
## Consider specifying shapes manually. if you must have them.

all_metrics$raw_qq

## $logs
##
## $ratios
##
## $medians
## $medians[[1]]
## [1] -0.02118
##
## $medians[[2]]
## [1] 0.2904
##
## $medians[[3]]
## [1] -0.6493
##
## $medians[[4]]
## [1] 0.4428
##
## $medians[[5]]
## [1] 0.2657
##
## $medians[[6]]
## [1] 0.1003
##
## $medians[[7]]
## [1] 0.2919
##
## $medians[[8]]
## [1] 0.227
##
## $medians[[9]]
## [1] 0.5087
##
## $medians[[10]]
## [1] -1.23
##
## $medians[[11]]
## [1] -1.312
##
## $medians[[12]]
## [1] -1.197
##
## $medians[[13]]
## [1] -0.2684
##
## $medians[[14]]
## [1] -0.1241
##
## $medians[[15]]
## [1] -0.06543
##
## $medians[[16]]
## [1] -1.416
##
## $medians[[17]]
## [1] -2.522
##
## $medians[[18]]
## [1] -0.8053
##
## $medians[[19]]
## [1] -1.212
##
## $medians[[20]]
## [1] -1.677
##
## $medians[[21]]
## [1] -1.719
##
## $medians[[22]]
## [1] 0.4435
##
## $medians[[23]]
## [1] 0.6136
##
## $medians[[24]]
## [1] 0.6618

all_metrics$norm_qq

## $logs
##
## $ratios
##
## $medians
## $medians[[1]]
## [1] 0.003293
##
## $medians[[2]]
## [1] 0.003288
##
## $medians[[3]]
## [1] 0.003316
##
## $medians[[4]]
## [1] 0.003285
##
## $medians[[5]]
## [1] 0.003284
##
## $medians[[6]]
## [1] 0.003285
##
## $medians[[7]]
## [1] 0.003273
##
## $medians[[8]]
## [1] 0.003291
##
## $medians[[9]]
## [1] 0.003289
##
## $medians[[10]]
## [1] 0.003466
##
## $medians[[11]]
## [1] 0.003491
##
## $medians[[12]]
## [1] 0.003466
##
## $medians[[13]]
## [1] 0.003339
##
## $medians[[14]]
## [1] 0.003321
##
## $medians[[15]]
## [1] 0.003308
##
## $medians[[16]]
## [1] 0.003486
##
## $medians[[17]]
## [1] 0.003968
##
## $medians[[18]]
## [1] 0.003378
##
## $medians[[19]]
## [1] 0.003464
##
## $medians[[20]]
## [1] 0.003604
##
## $medians[[21]]
## [1] 0.003615
##
## $medians[[22]]
## [1] 0.003297
##
## $medians[[23]]
## [1] 0.003277
##
## $medians[[24]]
## [1] 0.003287

all_metrics$raw_density

## [1] "Error in `colnames<-`(`*tmp*`, value = c(\"gene\", \"cond\", \"counts\")) : \n  'names' attribute [3] must be the same length as the vector [2]\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
##

all_metrics$norm_density
all_metrics$batch_boxplot

## Warning in scale$trans$trans(x): NaNs produced

## Warning: Removed 24336 rows containing non-finite values (stat_boxplot).

all_metrics$batch_disheat
all_metrics$batch_corheat
all_metrics$batch_pcaplot

## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have 8.
## Consider specifying shapes manually. if you must have them.

## Warning: Removed 5 rows containing missing values (geom_point).

## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have 8.
## Consider specifying shapes manually. if you must have them.

dev.off()

## pdf
##   2

##kept = expt_subset(kept_expt, "")
##kept$names = paste(kept$design$condition, kept$design$batch, sep='-')
kept_metrics = graph_metrics(expt=kept_expt)

## [1] "Graphing number of non-zero genes with respect to CPM by library."
## [1] "Graphing library sizes."
## [1] "Adding log10"
## [1] "Graphing a raw data boxplot on log scale."

## Using id as id variables

## [1] "Graphing a normalized boxplot."

## Using id as id variables

## [1] "Graphing a raw-data correlation heatmap."

## [1] "Graphing a raw-data standard median correlation."

## [1] "Graphing a normalized correlation heatmap."

## [1] "Graphing a normalized standard median correlation."

## [1] "Graphing a raw-data distance heatmap."

## [1] "Graphing a raw-data standard median distance."

## [1] "Graphing a normalized distance heatmap."

## [1] "Graphing a normalized standard median distance."

## [1] "Graphing a PCA plot of the raw data."
##    propVar cumPropVar cond.R2 batch.R2
## 1    99.47      99.47   95.46    81.55
## 2     0.44      99.91   93.50    80.37
## 3     0.05      99.96   90.31    59.16
## 4     0.02      99.98   80.49    66.13
## 5     0.01      99.99   81.03    59.86
## 6     0.01     100.00    9.35    39.82
## 7     0.00     100.00   24.27    10.66
## 8     0.00     100.00    8.28    17.31
## 9     0.00     100.00    7.11    43.76
## 10    0.00     100.00   30.78    32.03
## 11    0.00     100.00    7.96    10.04
## 12    0.00     100.00    8.47     8.38
## 13    0.00     100.00   46.85    25.65
## 14    0.00     100.00   36.64     7.26
## 15    0.00     100.00   69.54    24.26
## 16    0.00     100.00    2.06    11.61
## 17    0.00     100.00    0.09     4.25
## 18    0.00     100.00    0.03     0.84
## 19    0.00     100.00    0.26     0.40
## 20    0.00     100.00    2.37     6.30
## 21    0.00     100.00    5.12    10.38
## [1] "Printing a qqplot of the raw data."

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## Warning in KernSmooth::bkde2D(x, bandwidth = bandwidth, gridsize = nbin, :
## Binning grid too coarse for current (small) bandwidth: consider increasing
## 'gridsize'

## [1] "Loading grid"

## [1] "Loading grid"

## [1] "This plot looks neat if you do position='fill' or position='stack'"

## Using  as id variables

## [1] "This plot looks neat if you do position='fill' or position='stack'"
## [1] "Graphing a PCA plot of the normalized data."
##    propVar cumPropVar cond.R2 batch.R2
## 1    42.84      42.84   98.59    80.74
## 2    24.88      67.72   94.55    54.63
## 3    13.28      81.00   99.93    97.75
## 4     6.00      87.00   96.40    53.58
## 5     3.14      90.14   76.26    66.30
## 6     2.22      92.36   92.02    40.81
## 7     1.84      94.20   83.50    27.03
## 8     1.09      95.29   15.15    28.36
## 9     1.03      96.32   30.21    42.10
## 10    0.65      96.97    9.27    18.97
## 11    0.50      97.47    0.75     0.61
## 12    0.41      97.88    0.44     3.79
## 13    0.38      98.26    0.32     0.85
## 14    0.36      98.62    1.34     5.92
## 15    0.34      98.96    0.15     0.59
## 16    0.32      99.28    0.20     0.71
## 17    0.25      99.53    0.28    13.25
## 18    0.16      99.69    0.48    56.85
## 19    0.12      99.81    0.03     0.02
## 20    0.12      99.93    0.12     6.99
## 21    0.08     100.01    0.02     0.15
## [1] "Printing a qqplot of the normalized data."
## [1] "Loading grid"

## [1] "Loading grid"

## Using id as id variables

##    propVar cumPropVar cond.R2 batch.R2
## 1    99.34      99.34   32.57        0
## 2     0.54      99.88   49.69        0
## 3     0.06      99.94   42.31        0
## 4     0.03      99.97    6.90        0
## 5     0.02      99.99    4.99        0
## 6     0.01     100.00   44.15        0
## 7     0.00     100.00   11.90        0
## 8     0.00     100.00   44.91        0
## 9     0.00     100.00    1.35        0
## 10    0.00     100.00   25.16        0
## 11    0.00     100.00    3.08        0
## 12    0.00     100.00    1.24        0
## 13    0.00     100.00    0.59        0
## 14    0.00     100.00    0.03        0
## 15    0.00     100.00    0.01        0

pdf(file="images/kept_metrics.pdf")
kept_metrics$nonzero
kept_metrics$libsize
kept_metrics$raw_boxplot

## Warning: Removed 26694 rows containing non-finite values (stat_boxplot).

kept_metrics$norm_boxplot
kept_metrics$raw_corheat
kept_metrics$norm_corheat
kept_metrics$raw_smc
kept_metrics$norm_smc
kept_metrics$raw_disheat
kept_metrics$norm_disheat
kept_metrics$raw_smd
kept_metrics$norm_smd
kept_metrics$raw_pcaplot

## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have 7.
## Consider specifying shapes manually. if you must have them.

## Warning: Removed 3 rows containing missing values (geom_point).

## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have 7.
## Consider specifying shapes manually. if you must have them.

kept_metrics$norm_pcaplot

## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have 7.
## Consider specifying shapes manually. if you must have them.

## Warning: Removed 3 rows containing missing values (geom_point).

## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have 7.
## Consider specifying shapes manually. if you must have them.

kept_metrics$raw_qq

## $logs
##
## $ratios
##
## $medians
## $medians[[1]]
## [1] 0.03633
##
## $medians[[2]]
## [1] 0.3476
##
## $medians[[3]]
## [1] 0.5029
##
## $medians[[4]]
## [1] 0.3223
##
## $medians[[5]]
## [1] 0.1577
##
## $medians[[6]]
## [1] 0.3489
##
## $medians[[7]]
## [1] 0.2838
##
## $medians[[8]]
## [1] -1.173
##
## $medians[[9]]
## [1] -1.256
##
## $medians[[10]]
## [1] -1.14
##
## $medians[[11]]
## [1] -0.2089
##
## $medians[[12]]
## [1] -0.06449
##
## $medians[[13]]
## [1] -0.004287
##
## $medians[[14]]
## [1] -1.36
##
## $medians[[15]]
## [1] -2.465
##
## $medians[[16]]
## [1] -0.7495
##
## $medians[[17]]
## [1] -1.155
##
## $medians[[18]]
## [1] -1.62
##
## $medians[[19]]
## [1] -1.663
##
## $medians[[20]]
## [1] 0.5059
##
## $medians[[21]]
## [1] 0.6746
##
## $medians[[22]]
## [1] 0.7226

kept_metrics$norm_qq

## $logs
##
## $ratios
##
## $medians
## $medians[[1]]
## [1] 0.003575
##
## $medians[[2]]
## [1] 0.003579
##
## $medians[[3]]
## [1] 0.003569
##
## $medians[[4]]
## [1] 0.003575
##
## $medians[[5]]
## [1] 0.003578
##
## $medians[[6]]
## [1] 0.003573
##
## $medians[[7]]
## [1] 0.003572
##
## $medians[[8]]
## [1] 0.003694
##
## $medians[[9]]
## [1] 0.003722
##
## $medians[[10]]
## [1] 0.003677
##
## $medians[[11]]
## [1] 0.003589
##
## $medians[[12]]
## [1] 0.003588
##
## $medians[[13]]
## [1] 0.00359
##
## $medians[[14]]
## [1] 0.0037
##
## $medians[[15]]
## [1] 0.00422
##
## $medians[[16]]
## [1] 0.003613
##
## $medians[[17]]
## [1] 0.003678
##
## $medians[[18]]
## [1] 0.003812
##
## $medians[[19]]
## [1] 0.00382
##
## $medians[[20]]
## [1] 0.003568
##
## $medians[[21]]
## [1] 0.003574
##
## $medians[[22]]
## [1] 0.003565

kept_metrics$raw_density

## [1] "Error in `colnames<-`(`*tmp*`, value = c(\"gene\", \"cond\", \"counts\")) : \n  'names' attribute [3] must be the same length as the vector [2]\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
##

kept_metrics$norm_density
kept_metrics$batch_boxplot

## Warning in scale$trans$trans(x): NaNs produced

## Warning: Removed 15259 rows containing non-finite values (stat_boxplot).

kept_metrics$batch_disheat
kept_metrics$batch_corheat
kept_metrics$batch_pcaplot

## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have 7.
## Consider specifying shapes manually. if you must have them.

## Warning: Removed 3 rows containing missing values (geom_point).

## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have 7.
## Consider specifying shapes manually. if you must have them.

dev.off()

## pdf
##   2

Compare normalizations

The following block will do a few box plots so that we can look at differences in normalization strategy. It leads off with a completely unmodified data set.

The CL14 'E' epimastigote sample is a big mess, as will be shown below. I think it should be removed post-haste.

hpgl_libsize(expt=all_expt, text=TRUE)

## [1] "Adding log10"

hpgl_boxplot(expt=all_expt, scale="log")

## Using id as id variables

## Warning: Removed 29553 rows containing non-finite values (stat_boxplot).

##all_sfcpml2_df = hpgl_norm(expt=all_expt, transform="log2", norm="sf", convert="cpm")
all_sfcpml2 = normalize_expt(all_expt, transform="log2", norm="sf", convert="cpm")

## [1] "This function defaults to using the original expressionset for normalization."

##exprs(all_sfcpml2$expressionset) = as.matrix(all_sfcpml2_df$counts)
hpgl_boxplot(expt=all_sfcpml2, names=all_sfcpml2$names)

## Using id as id variables

##all_qcpml2_df = hpgl_norm(expt=all, transform="log2", norm="quant", convert="cpm", names=all_sfcpml2$names)
all_qcpml2 = normalize_expt(all_expt, transform="log2", norm="quant", convert="cpm")

## [1] "This function defaults to using the original expressionset for normalization."

##exprs(all_qcpml2$expressionset) = as.matrix(all_qcpml2_df$counts)
hpgl_boxplot(expt=all_qcpml2)

## Using id as id variables

hpgl_corheat(expt=all_qcpml2)

#kept_sfcpml2_df = hpgl_norm(expt=kept, transform="log2", norm="sf", convert="cpm")
kept_sfcpml2 = normalize_expt(kept_expt, transform="log2", norm="sf", convert="cpm")

## [1] "This function defaults to using the original expressionset for normalization."

#exprs(kept_sfcpml2$expressionset) = as.matrix(kept_sfcpml2_df$counts)
hpgl_boxplot(expt=kept_sfcpml2, names=kept_sfcpml2$names)

## Using id as id variables

#kept_qcpml2_df = hpgl_norm(expt=kept, transform="log2", norm="quant", convert="cpm")
kept_qcpml2 = normalize_expt(kept_expt, transform="log2", norm="quant", convert="cpm")

## [1] "This function defaults to using the original expressionset for normalization."

##exprs(kept_qcpml2$expressionset) = as.matrix(kept_qcpml2_df$counts)
hpgl_boxplot(expt=kept_qcpml2, names=kept_qcpml2$names)

## Using id as id variables

hpgl_corheat(expt=kept_qcpml2, names=kept_qcpml2$names)

Testing density and qq

density_test = hpgl_density_plot(expt=kept_qcpml2, position="fill", fill=TRUE)

## [1] "This plot looks neat if you do position='fill' or position='stack'"

density_test

test_data = exprs(kept_qcpml2$expressionset)
qq_fun = suppressWarnings(hpgl_qq_all(expt=kept_qcpml2))

## [1] "Loading grid"

## [1] "Loading grid"

print(qq_fun$logs)

print(qq_fun$ratios)

Playing with material from EdgeR

I want to try and figure out some of the metrics suggested in the EdgeR manual.

test = exprs(kept_qcpml2$expressionset)^2
test = DGEList(counts=test)
test_disp = estimateDisp(test)
summary(test_disp$prior.df)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   2.549   2.549   2.549   2.549   2.549   2.549

sqrt(test_disp$common.disp)

## [1] 0.111491

plotBCV(test_disp)

plotMDS(test, method="bcv")

hpgl_plot_bcv(all_expt)

Voom/limma invocation

The following few comparisons are going to use the 'all' data set. When I do a larger set of limma contrasts, I will just use the 'kept' set.

Simple pairwise comparisons

The following blocks are no longer being evaluated because they remove most of the samples from the data and therefore over-estimate significances. But they do provide reasonable examples of how one might perform simple pairwise comparisons in data.

Compare epimastigotes

epi_subset = expt_subset(all_qcpml2, "condition=='clbr_epi'|condition=='cl14_epi'")
epi_graphs = graph_metrics(expt=epi_subset)
epi_graphs$norm_pcaplot
epi_graphs$norm_disheat
epi_graphs$norm_corheat
epi_comparison = simple_comparison(epi_subset)
epi_table = epi_comparison$table
epi_comparison$pvalue_histogram
epi_comparison$volcano_plot
epi_comparison$ma_plot
epi_comparison$coefficient_scatter

Compare Trypomastigotes

tryp_subset = expt_subset(all_qcpml2, "condition=='clbr_tryp'|condition=='cl14_tryp'")
tryp_graphs = graph_metrics(expt=tryp_subset)
tryp_graphs$norm_pcaplot
tryp_graphs$norm_disheat
tryp_graphs$norm_corheat
tryp_comparison = simple_comparison(tryp_subset)
tryp_table = tryp_comparison$table
tryp_comparison$pvalue_histogram
tryp_comparison$volcano_plot
tryp_comparison$ma_plot
tryp_comparison$coefficient_scatter

Compare Amastigotes 60 hours

a60_subset = expt_subset(all_qcpml2, "condition=='clbr_a60'|condition=='cl14_a60'")
a60_graphs = graph_metrics(expt=a60_subset)
a60_graphs$norm_pcaplot
a60_graphs$norm_corheat
a60_graphs$norm_disheat
a60_comparison = simple_comparison(a60_subset)
a60_table = a60_comparison$table
a60_comparison$pvalue_histogram
a60_comparison$volcano_plot
a60_comparison$ma_plot
a60_comparison$coefficient_scatter

Single-factor all the conditions and batches

A more appropriate method for performing the various comparisons of strains across time is to put all the extant data into a single contrast. The following block does that and attempts to take into account the relative contribution of each batch and sample.

The variables which were created to hold the relative contributions of these samples were also used to perform the various subtractions. Thus, when limma is actually run, it will provide data structures of the abundances of each sample type as well as the various comparisons.

## acb stands for "all_conditions_batches"  which takes too long to
## type when setting up the contrasts.
acb = paste0(kept_qcpml2$conditions, kept_qcpml2$batches)
kept_data = exprs(kept_qcpml2$expressionset)
table(acb)

## acb
##  cl14_a60A  cl14_a60B  cl14_a96C  cl14_epiF cl14_trypB cl14_trypD
##          2          1          3          2          1          1
## cl14_trypG  clbr_a60A  clbr_a96C  clbr_epiE clbr_trypG
##          1          3          3          3          2

## The invocation of table() keptows me to count up the contribution of
## each condition/batch combination to the whole data set.

## Doing this (as I understand it) means I do nothave to worry about
## balanced samples so much, but must be more careful to understand
## the relative contribution of each sample type to the entire data
## set.

complete_model = model.matrix(~0 + acb)
complete_fit = lmFit(kept_data, complete_model)
complete_voom = hpgl_voom(kept_data, complete_model)
complete_voom$plot

complete_model

##    acbcl14_a60A acbcl14_a60B acbcl14_a96C acbcl14_epiF acbcl14_trypB
## 1             0            0            0            1             0
## 2             0            0            0            1             0
## 3             0            0            0            0             0
## 4             0            0            0            0             0
## 5             0            0            0            0             0
## 6             0            0            0            0             0
## 7             0            0            0            0             1
## 8             0            0            1            0             0
## 9             0            0            1            0             0
## 10            0            0            1            0             0
## 11            0            0            0            0             0
## 12            0            0            0            0             0
## 13            0            0            0            0             0
## 14            1            0            0            0             0
## 15            0            1            0            0             0
## 16            1            0            0            0             0
## 17            0            0            0            0             0
## 18            0            0            0            0             0
## 19            0            0            0            0             0
## 20            0            0            0            0             0
## 21            0            0            0            0             0
## 22            0            0            0            0             0
##    acbcl14_trypD acbcl14_trypG acbclbr_a60A acbclbr_a96C acbclbr_epiE
## 1              0             0            0            0            0
## 2              0             0            0            0            0
## 3              0             0            0            0            1
## 4              0             0            0            0            1
## 5              0             0            0            0            1
## 6              1             0            0            0            0
## 7              0             0            0            0            0
## 8              0             0            0            0            0
## 9              0             0            0            0            0
## 10             0             0            0            0            0
## 11             0             0            0            1            0
## 12             0             0            0            1            0
## 13             0             0            0            1            0
## 14             0             0            0            0            0
## 15             0             0            0            0            0
## 16             0             0            0            0            0
## 17             0             0            1            0            0
## 18             0             0            1            0            0
## 19             0             0            1            0            0
## 20             0             1            0            0            0
## 21             0             0            0            0            0
## 22             0             0            0            0            0
##    acbclbr_trypG
## 1              0
## 2              0
## 3              0
## 4              0
## 5              0
## 6              0
## 7              0
## 8              0
## 9              0
## 10             0
## 11             0
## 12             0
## 13             0
## 14             0
## 15             0
## 16             0
## 17             0
## 18             0
## 19             0
## 20             0
## 21             1
## 22             1
## attr(,"assign")
##  [1] 1 1 1 1 1 1 1 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$acb
## [1] "contr.treatment"

epi_cl14 = "acbcl14_epiF"
epi_clbr = "acbclbr_epiE"
tryp_cl14 = "(acbcl14_trypB + acbcl14_trypD + acbcl14_trypG) / 3"
tryp_clbr = "acbclbr_trypG"
a60_cl14 =  "(acbcl14_a60A * 2/3) + (acbcl14_a60B * 1/3)"
a60_clbr = "acbclbr_a60A"
a96_cl14 = "acbcl14_a96C"
a96_clbr = "acbclbr_a96C"
epi_cl14clbr = paste0("(",epi_cl14,")", "  -  ", "(",epi_clbr,")")
tryp_cl14clbr = paste0("(",tryp_cl14,")", "  -  ", "(",tryp_clbr,")")
a60_cl14clbr = paste0("(",a60_cl14,")", "  -  ", "(",a60_clbr,")")
a96_cl14clbr = paste0("(",a96_cl14,")", "  -  ", "(",a96_clbr,")")
epitryp_cl14 = paste0("(",tryp_cl14,")", "  -  ", "(",epi_cl14,")")
epitryp_clbr = paste0("(",tryp_clbr,")", "  -  ", "(",epi_clbr,")")
epia60_cl14 = paste0("(",a60_cl14,")", "  -  ", "(",epi_cl14,")")
epia60_clbr = paste0("(",a60_clbr,")", "  -  ", "(",epi_clbr,")")
a60a96_cl14 = paste0("(",a96_cl14,")", "  -  ", "(",a60_cl14,")")
a60a96_clbr = paste0("(",a96_clbr,")", "  -  ", "(",a60_clbr,")")
a60tryp_cl14 = paste0("(",tryp_cl14,")", "  -  ", "(",a60_cl14,")")
a60tryp_clbr = paste0("(",tryp_clbr,")", "  -  ", "(",a60_clbr,")")
a96tryp_cl14 = paste0("(",tryp_cl14,")", "  -  ", "(",a96_cl14,")")
a96tryp_clbr = paste0("(",tryp_clbr,")", "  -  ", "(",a96_clbr,")")

## The following contrast is messed up in some as of yet unknown way.
epitryp_cl14clbr = paste0("(",epitryp_cl14,")", "  -  ", "(",epitryp_clbr,")")
## So I will add some more contrasts using data which doesn't get screwed up
epia60_cl14clbr = paste0("(",epia60_cl14,")", "  -  ", "(",epia60_clbr,")")
a60tryp_cl14clbr = paste0("(",a60tryp_cl14,")", "  -  ", "(",a60tryp_clbr,")")
a60a96_cl14clbr = paste0("(",a60a96_cl14,")", "  -  ", "(",a60a96_clbr,")")

complete_contrasts_v2 = makeContrasts(
    epi_cl14=epi_cl14,
    epi_clbr=epi_clbr,
    tryp_cl14=tryp_cl14,
    tryp_clbr=tryp_clbr,
    a60_cl14=a60_cl14,
    a60_clbr=a60_clbr,
    a96_cl14=a96_cl14,
    a96_clbr=a96_clbr,
    epi_cl14clbr=epi_cl14clbr,
    tryp_cl14clbr=tryp_cl14clbr,
    a60_cl14clbr=a60_cl14clbr,
    a96_cl14clbr=a96_cl14clbr,
    epitryp_cl14=epitryp_cl14,
    epitryp_clbr=epitryp_clbr,
    epia60_cl14=epia60_cl14,
    epia60_clbr=epia60_clbr,
    a60a96_cl14=a60a96_cl14,
    a60a96_clbr=a60a96_clbr,
    a60tryp_cl14=a60tryp_cl14,
    a60tryp_clbr=a60tryp_clbr,
    a96tryp_cl14=a96tryp_cl14,
    a96tryp_clbr=a96tryp_clbr,
    epitryp_cl14clbr=epitryp_cl14clbr,
    epia60_cl14clbr=epia60_cl14clbr,
    a60tryp_cl14clbr=a60tryp_cl14clbr,
    a60a96_cl14clbr=a60a96_cl14clbr,
    levels=complete_voom$design)
## This colnames() is annoyingly necessary to avoid really obnoxious contrast names.
colnames(complete_contrasts_v2) = c("epi_cl14","epi_clbr","tryp_cl14","tryp_clbr","a60_cl14","a60_clbr","a96_cl14","a96_clbr","epi_cl14clbr","tryp_cl14clbr","a60_cl14clbr","a96_cl14clbr","epitryp_cl14","epitryp_clbr","epia60_cl14","epia60_clbr","a60tryp_cl14","a60tryp_clbr","a96tryp_cl14","a96tryp_clbr","a60a96_cl14","a60a96_clbr","epitryp_cl14clbr","epia60_cl14clbr","a60tryp_cl14clbr","a60a96_cl14clbr")
kept_fits = contrasts.fit(complete_fit, complete_contrasts_v2)
kept_comparisons = eBayes(kept_fits)

Analysis with only condition in the model

con = kept_qcpml2$conditions
condition_model = model.matrix(~0 + con)
condition_voom = hpgl_voom(kept_data, condition_model)
condition_lmfit = lmFit(condition_voom, condition_model)
des = condition_voom$design
condition_contrasts = makeContrasts(
    epi_cl14=concl14_epi,
    epi_clbr=conclbr_epi,
    a60_cl14=concl14_a60,
    a60_clbr=conclbr_a60,
    a96_cl14=concl14_a96,
    a96_clbr=conclbr_a96,
    tryp_cl14=concl14_tryp,
    tryp_clbr=conclbr_tryp,
    epi_cl14clbr=concl14_epi - conclbr_epi,
    tryp_cl14clbr=concl14_tryp - conclbr_tryp,
    a60_cl14clbr=concl14_a60 - conclbr_a60,
    a96_cl14clbr=concl14_a96 - conclbr_a96,
    a60a96_cl14=concl14_a96 - concl14_a60,
    a60a96_clbr=conclbr_a96 - conclbr_a60,
    a96tryp_cl14=concl14_tryp - concl14_a96,
    a96tryp_clbr=conclbr_tryp - conclbr_a96,
    a60a96_cl14clbr=(concl14_a96-concl14_a60)-(conclbr_a96-conclbr_a60),
    a96tryp_cl14clbr=(concl14_tryp-concl14_a96)-(conclbr_tryp-conclbr_a96),
    levels=des)
colnames(condition_contrasts) = c("epi_cl14","epi_clbr","a60_cl14","a60_clbr","a96_cl14","a96_clbr","tryp_cl14","tryp_clbr","epi_cl14clbr","tryp_cl14clbr","a60_cl14clbr","a96_cl14clbr","a60a96_cl14","a60a96_clbr","a96tryp_cl14","a96_tryp_clbr","a60a96_cl14clbr","a96tryp_cl14clbr")
condition_fits = contrasts.fit(condition_lmfit, condition_contrasts)
condition_comparisons = eBayes(condition_fits)
condition_table = topTable(condition_comparisons, adjust="fdr", n=nrow(kept_data))
condition_tables = write_limma(condition_comparisons, excel=FALSE, csv=FALSE)

## [1] "Printing table: epi_cl14"
## [1] "ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method."

## The qvalue estimation failed for epi_cl14.

## [1] "Printing table: epi_clbr"
## [1] "ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method."

## The qvalue estimation failed for epi_clbr.

## [1] "Printing table: a60_cl14"
## [1] "ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method."

## The qvalue estimation failed for a60_cl14.

## [1] "Printing table: a60_clbr"
## [1] "ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method."

## The qvalue estimation failed for a60_clbr.

## [1] "Printing table: a96_cl14"
## [1] "ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method."

## The qvalue estimation failed for a96_cl14.

## [1] "Printing table: a96_clbr"
## [1] "ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method."

## The qvalue estimation failed for a96_clbr.

## [1] "Printing table: tryp_cl14"
## [1] "ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method."

## The qvalue estimation failed for tryp_cl14.

## [1] "Printing table: tryp_clbr"
## [1] "ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method."

## The qvalue estimation failed for tryp_clbr.

## [1] "Printing table: epi_cl14clbr"
## [1] "Printing table: tryp_cl14clbr"
## [1] "Printing table: a60_cl14clbr"
## [1] "Printing table: a96_cl14clbr"
## [1] "Printing table: a60a96_cl14"
## [1] "Printing table: a60a96_clbr"
## [1] "Printing table: a96tryp_cl14"
## [1] "Printing table: a96_tryp_clbr"
## [1] "Printing table: a60a96_cl14clbr"
## [1] "Printing table: a96tryp_cl14clbr"

a60_cl14clbr_ls = hpgl_linear_scatter(condition_table[,c("a60_cl14", "a60_clbr")], loess=TRUE)

## [1] "No binwidth nor bins provided, setting it to 0.0406815273104036 in order to have 500 bins."

a60_cl14clbr_ls$scatter

a60_cl14clbr_ls$both_histogram$plot

a60_cl14clbr_ls$lm_model

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##
## Coefficients:
## (Intercept)        first
##     -0.4255       1.0402

a60_cl14clbr_ls$lm_summary

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##  \--> method = "SMDM"
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.9224 -0.2125  0.0263  0.2540  6.2850
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.425451   0.016316  -26.08   <2e-16 ***
## first        1.040162   0.001829  568.59   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Robust residual standard error: 0.4087
## Multiple R-squared:  0.9374, Adjusted R-squared:  0.9374
## Convergence in 10 IRWLS iterations
##
## Robustness weights:
##  366 observations c(16199,16758,17747,17806,17837,17931,18086,18218,18224,18245,18367,18392,18405,18412,18465,18468,18476,18504,18524,18533,18537,18539,18555,18557,18598,18643,18648,18664,18670,18675,18695,18716,18729,18736,18743,18759,18795,18830,18881,18885,18938,18944,18948,18963,18971,18985,19048,19055,19114,19137,19149,19157,19158,19171,19174,19177,19187,19204,19219,19224,19226,19234,19244,19252,19259,19260,19278,19285,19297,19303,19329,19331,19332,19339,19343,19348,19353,19387,19393,19394,19397,19415,19418,19420,19446,19471,19502,19511,19518,19531,19562,19571,19584,19594,19596,19599,19604,19611,19634,19639,19660,19663,19664,19666,19670,19673,19681,19699,19720,19721,19725,19726,19728,19730,19738,19741,19755,19758,19820,19826,19843,19847,19849,19860,19869,19876,19907,19908,19912,19918,19932,19934,19939,19948,19963,19964,19973,19974,19977,19979,19985,19986,20007,20008,20012,20015,20017,20023,20036,20048,20052,20056,20059,20080,20084,20093,20099,20100,20105,20107,20110,20126,20132,20134,20139,20146,20149,20150,20169,20174,20175,20176,20178,20184,20195,20205,20210,20212,20216,20219,20225,20234,20241,20243,20247,20248,20253,20256,20260,20262,20264,20267,20271,20280,20284,20285,20289,20291,20294,20309,20312,20316,20322,20328,20336,20344,20349,20354,20357,20362,20371,20380,20381,20383,20391,20395,20397,20406,20408,20411,20418,20426,20436,20446,20448,20451,20457,20471,20477,20487,20492,20496,20499,20501,20502,20510,20514,20515,20520,20549,20550,20554,20555,20557,20564,20569,20574,20575,20577,20579,20586,20588,20590,20601,20608,20611,20620,20621,20625,20634,20636,20640,20646,20650,20651,20654,20659,20663,20669,20672,20677,20679,20691,20697,20706,20709,20717,20724,20728,20741,20743,20745,20749,20754,20756,20770,20777,20780,20783,20786,20797,20805,20820,20823,20827,20829,20830,20832,20846,20848,20849,20851,20865,20874,20878,20881,20884,20885,20896,20911,20917,20919,20922,20927,20932,20944,20945,20946,20956,20959,20965,20974,20977,20978,20988,20991,21002,21003,21011,21017,21039,21041,21054,21057,21060,21061,21067,21072,21083,21086,21087,21104,21123,21129,21133,21135,21154,21159,21170,21172,21175,21195,21201,21202,21207,21210,21214,21217,21221,21225,21226,21244,21250,21257,21263,21292)
##   are outliers with |weight| <= 4.7e-06 ( < 4.7e-06);
##  15166 weights are ~= 1. The remaining 5762 ones are summarized as
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
## 0.000006 0.532500 0.839400 0.719700 0.965500 0.999000
## Algorithmic parameters:
##       tuning.chi1       tuning.chi2       tuning.chi3       tuning.chi4
##        -5.000e-01         1.500e+00                NA         5.000e-01
##                bb       tuning.psi1       tuning.psi2       tuning.psi3
##         5.000e-01        -5.000e-01         1.500e+00         9.500e-01
##       tuning.psi4        refine.tol           rel.tol         solve.tol
##                NA         1.000e-07         1.000e-07         1.000e-07
##       eps.outlier             eps.x warn.limit.reject warn.limit.meanrw
##         4.696e-06         4.070e-11         5.000e-01         5.000e-01
##      nResample         max.it       best.r.s       k.fast.s          k.max
##            500             50              2              1            200
##    maxit.scale      trace.lev            mts     compute.rd      numpoints
##            200              0           1000              0             10
## fast.s.large.n
##           2000
##                   psi           subsampling                   cov
##                 "lqq"         "nonsingular"             ".vcov.w"
## compute.outlier.stats
##                "SMDM"
## seed : int(0)

a60_cl14clbr_ls$correlation

##
##  Pearson's product-moment correlation
##
## data:  df[, 1] and df[, 2]
## t = 293.3474, df = 21292, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8926516 0.8979807
## sample estimates:
##       cor
## 0.8953482

a96_cl14clbr_ls = hpgl_linear_scatter(condition_table[,c("a96_cl14", "a96_clbr")], loess=TRUE)

## [1] "No binwidth nor bins provided, setting it to 0.0441692997385908 in order to have 500 bins."

a96_cl14clbr_ls$scatter

a96_cl14clbr_ls$both_histogram$plot

a96_cl14clbr_ls$lm_model

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##
## Coefficients:
## (Intercept)        first
##      0.3635       0.9511

a96_cl14clbr_ls$lm_summary

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##  \--> method = "SMDM"
## Residuals:
##      Min       1Q   Median       3Q      Max
## -6.59299 -0.30323 -0.00265  0.31022  4.68608
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.363468   0.020546   17.69   <2e-16 ***
## first       0.951084   0.002301  413.36   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Robust residual standard error: 0.5431
## Multiple R-squared:  0.8865, Adjusted R-squared:  0.8865
## Convergence in 14 IRWLS iterations
##
## Robustness weights:
##  7 observations c(11668,13570,15500,17657,18418,19976,21287)
##   are outliers with |weight| = 0 ( < 4.7e-06);
##  15396 weights are ~= 1. The remaining 5891 ones are summarized as
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.0000333 0.5751000 0.8631000 0.7403000 0.9636000 0.9990000
## Algorithmic parameters:
##       tuning.chi1       tuning.chi2       tuning.chi3       tuning.chi4
##        -5.000e-01         1.500e+00                NA         5.000e-01
##                bb       tuning.psi1       tuning.psi2       tuning.psi3
##         5.000e-01        -5.000e-01         1.500e+00         9.500e-01
##       tuning.psi4        refine.tol           rel.tol         solve.tol
##                NA         1.000e-07         1.000e-07         1.000e-07
##       eps.outlier             eps.x warn.limit.reject warn.limit.meanrw
##         4.696e-06         4.070e-11         5.000e-01         5.000e-01
##      nResample         max.it       best.r.s       k.fast.s          k.max
##            500             50              2              1            200
##    maxit.scale      trace.lev            mts     compute.rd      numpoints
##            200              0           1000              0             10
## fast.s.large.n
##           2000
##                   psi           subsampling                   cov
##                 "lqq"         "nonsingular"             ".vcov.w"
## compute.outlier.stats
##                "SMDM"
## seed : int(0)

a96_cl14clbr_ls$correlation

##
##  Pearson's product-moment correlation
##
## data:  df[, 1] and df[, 2]
## t = 319.9738, df = 21292, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9075158 0.9121412
## sample estimates:
##       cor
## 0.9098568

tryp_cl14clbr_ls = hpgl_linear_scatter(condition_table[,c("tryp_cl14", "tryp_clbr")], loess=TRUE)

## [1] "No binwidth nor bins provided, setting it to 0.044226555188013 in order to have 500 bins."

tryp_cl14clbr_ls$scatter

tryp_cl14clbr_ls$both_histogram$plot

tryp_cl14clbr_ls$lm_model

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##
## Coefficients:
## (Intercept)        first
##     -0.1744       1.0063

tryp_cl14clbr_ls$lm_summary

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##  \--> method = "SMDM"
## Residuals:
##      Min       1Q   Median       3Q      Max
## -4.44078 -0.29409  0.03021  0.33090  8.44905
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.174380   0.023936  -7.285 3.32e-13 ***
## first        1.006282   0.002673 376.479  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Robust residual standard error: 0.5833
## Multiple R-squared:  0.8695, Adjusted R-squared:  0.8695
## Convergence in 13 IRWLS iterations
##
## Robustness weights:
##  168 observations c(15266,16199,17483,17747,17806,17931,17935,18086,18218,18405,18465,18537,18643,18676,18713,18743,18795,18881,18938,18944,18971,18985,19048,19137,19149,19158,19174,19219,19224,19246,19260,19285,19332,19339,19353,19387,19393,19397,19420,19511,19531,19569,19584,19594,19596,19634,19670,19681,19720,19721,19730,19741,19755,19820,19826,19843,19847,19907,19912,19918,19932,19934,19939,19964,19973,19986,20012,20017,20036,20048,20056,20059,20080,20093,20100,20105,20110,20126,20132,20169,20175,20178,20184,20195,20212,20216,20225,20243,20248,20264,20267,20271,20284,20289,20291,20309,20311,20316,20331,20336,20349,20380,20397,20446,20457,20466,20492,20499,20502,20510,20515,20549,20554,20564,20586,20587,20588,20590,20608,20611,20620,20621,20636,20646,20651,20654,20697,20709,20717,20724,20728,20749,20754,20756,20770,20780,20786,20797,20823,20829,20830,20849,20851,20865,20874,20884,20919,20922,20944,20945,20956,20958,20959,20977,21011,21039,21057,21061,21067,21083,21087,21104,21129,21135,21154,21172,21226,21250)
##   are outliers with |weight| <= 2e-06 ( < 4.7e-06);
##  14947 weights are ~= 1. The remaining 6179 ones are summarized as
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
## 0.000005 0.540800 0.853500 0.722800 0.965100 0.999000
## Algorithmic parameters:
##       tuning.chi1       tuning.chi2       tuning.chi3       tuning.chi4
##        -5.000e-01         1.500e+00                NA         5.000e-01
##                bb       tuning.psi1       tuning.psi2       tuning.psi3
##         5.000e-01        -5.000e-01         1.500e+00         9.500e-01
##       tuning.psi4        refine.tol           rel.tol         solve.tol
##                NA         1.000e-07         1.000e-07         1.000e-07
##       eps.outlier             eps.x warn.limit.reject warn.limit.meanrw
##         4.696e-06         4.070e-11         5.000e-01         5.000e-01
##      nResample         max.it       best.r.s       k.fast.s          k.max
##            500             50              2              1            200
##    maxit.scale      trace.lev            mts     compute.rd      numpoints
##            200              0           1000              0             10
## fast.s.large.n
##           2000
##                   psi           subsampling                   cov
##                 "lqq"         "nonsingular"             ".vcov.w"
## compute.outlier.stats
##                "SMDM"
## seed : int(0)

tryp_cl14clbr_ls$correlation

##
##  Pearson's product-moment correlation
##
## data:  df[, 1] and df[, 2]
## t = 220.9448, df = 21292, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8303204 0.8384797
## sample estimates:
##       cor
## 0.8344457

Write out the results of the contrasts

topTable() will get called quite a lot in order to generate the various tables of interest. write_xls() will follow to write out the results into some excel spreadsheets.

## I wrote a function 'write_limma' which takes care of making a data structure of all the tables and printing the results in excel.
## Lets use that rather than this stupid list of topTable() calls.
## In addition, I am adding q-values and other metrics to improve the output from toptable, so it is a win I think.

kept_table = topTable(kept_comparisons, adjust="fdr", n=nrow(kept_data))
## Use the coef argument to pull out adjusted p-values for subsections of the data
all_tables = write_limma(kept_comparisons)

## [1] "Printing table: epi_cl14"
## [1] "ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method."

## The qvalue estimation failed for epi_cl14.

## [1] "Printing table: epi_clbr"
## [1] "ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method."

## The qvalue estimation failed for epi_clbr.

## [1] "Printing table: tryp_cl14"
## [1] "ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method."

## The qvalue estimation failed for tryp_cl14.

## [1] "Printing table: tryp_clbr"
## [1] "ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method."

## The qvalue estimation failed for tryp_clbr.

## [1] "Printing table: a60_cl14"
## [1] "ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method."

## The qvalue estimation failed for a60_cl14.

## [1] "Printing table: a60_clbr"
## [1] "ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method."

## The qvalue estimation failed for a60_clbr.

## [1] "Printing table: a96_cl14"
## [1] "ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method."

## The qvalue estimation failed for a96_cl14.

## [1] "Printing table: a96_clbr"
## [1] "ERROR: The estimated pi0 <= 0. Check that you have valid p-values or use another lambda method."

## The qvalue estimation failed for a96_clbr.

## [1] "Printing table: epi_cl14clbr"
## [1] "Printing table: tryp_cl14clbr"
## [1] "Printing table: a60_cl14clbr"
## [1] "Printing table: a96_cl14clbr"
## [1] "Printing table: epitryp_cl14"
## [1] "Printing table: epitryp_clbr"
## [1] "Printing table: epia60_cl14"
## [1] "Printing table: epia60_clbr"
## [1] "Printing table: a60tryp_cl14"
## [1] "Printing table: a60tryp_clbr"
## [1] "Printing table: a96tryp_cl14"
## [1] "Printing table: a96tryp_clbr"
## [1] "Printing table: a60a96_cl14"
## [1] "Printing table: a60a96_clbr"
## [1] "Printing table: epitryp_cl14clbr"
## [1] "Printing table: epia60_cl14clbr"
## [1] "Printing table: a60tryp_cl14clbr"
## [1] "Printing table: a60a96_cl14clbr"

Plot some results

The function hpgl_linear_scatter() provides an easy way to perform some simple visualizations of the contrasts

Compare Epimastigotes from CL14 to CLBrener

## Make some scatter plots of the input data to see if these tables seem sane.
epi_cl14clbr_ls = hpgl_linear_scatter(kept_table[,c("epi_cl14", "epi_clbr")], gvis_filename="html/epi_cl14clbr.html", tooltip_data=tooltip_data, loess=TRUE, gvis_trendline="linear")

## [1] "No binwidth nor bins provided, setting it to 0.0440923149269775 in order to have 500 bins."

epi_cl14clbr_ls$scatter

epi_cl14clbr_ls$both_histogram$plot

epi_cl14clbr_ls$lm_model

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##
## Coefficients:
## (Intercept)        first
##      0.1435       0.9772

epi_cl14clbr_ls$lm_summary

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##  \--> method = "SMDM"
## Residuals:
##      Min       1Q   Median       3Q      Max
## -4.27378 -0.33421 -0.03518  0.34692  5.14334
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.143537   0.021205   6.769 1.33e-11 ***
## first       0.977160   0.002389 409.101  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Robust residual standard error: 0.552
## Multiple R-squared:  0.8857, Adjusted R-squared:  0.8857
## Convergence in 10 IRWLS iterations
##
## Robustness weights:
##  2 observations c(12806,21292)
##   are outliers with |weight| <= 1.8e-06 ( < 4.7e-06);
##  15302 weights are ~= 1. The remaining 5990 ones are summarized as
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.0000797 0.6743000 0.8959000 0.7773000 0.9745000 0.9990000
## Algorithmic parameters:
##       tuning.chi1       tuning.chi2       tuning.chi3       tuning.chi4
##        -5.000e-01         1.500e+00                NA         5.000e-01
##                bb       tuning.psi1       tuning.psi2       tuning.psi3
##         5.000e-01        -5.000e-01         1.500e+00         9.500e-01
##       tuning.psi4        refine.tol           rel.tol         solve.tol
##                NA         1.000e-07         1.000e-07         1.000e-07
##       eps.outlier             eps.x warn.limit.reject warn.limit.meanrw
##         4.696e-06         4.070e-11         5.000e-01         5.000e-01
##      nResample         max.it       best.r.s       k.fast.s          k.max
##            500             50              2              1            200
##    maxit.scale      trace.lev            mts     compute.rd      numpoints
##            200              0           1000              0             10
## fast.s.large.n
##           2000
##                   psi           subsampling                   cov
##                 "lqq"         "nonsingular"             ".vcov.w"
## compute.outlier.stats
##                "SMDM"
## seed : int(0)

epi_cl14clbr_ls$correlation

##
##  Pearson's product-moment correlation
##
## data:  df[, 1] and df[, 2]
## t = 333.5902, df = 21292, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9140015 0.9183164
## sample estimates:
##       cor
## 0.9161855

Compare trypomastigotes from CL14 to CLBrener

tryp_cl14clbr_ls = hpgl_linear_scatter(kept_table[,c("tryp_cl14", "tryp_clbr")], gvis_filename="html/tryp_cl14clbr.html", tooltip_data=tooltip_data, loess=TRUE, gvis_trendline="linear")

## [1] "No binwidth nor bins provided, setting it to 0.044226555188013 in order to have 500 bins."

tryp_cl14clbr_ls$scatter

tryp_cl14clbr_ls$both_histogram$plot

tryp_cl14clbr_ls$lm_model

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##
## Coefficients:
## (Intercept)        first
##     -0.1744       1.0063

tryp_cl14clbr_ls$lm_summary

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##  \--> method = "SMDM"
## Residuals:
##      Min       1Q   Median       3Q      Max
## -4.44078 -0.29409  0.03021  0.33090  8.44905
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.174381   0.023936  -7.285 3.32e-13 ***
## first        1.006283   0.002673 376.479  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Robust residual standard error: 0.5833
## Multiple R-squared:  0.8695, Adjusted R-squared:  0.8695
## Convergence in 13 IRWLS iterations
##
## Robustness weights:
##  168 observations c(17291,19611,19823,20100,20194,20211,20268,20275,20280,20299,20320,20323,20361,20395,20426,20448,20454,20474,20481,20482,20485,20486,20488,20519,20523,20531,20540,20545,20548,20553,20558,20565,20568,20572,20583,20647,20652,20658,20673,20674,20680,20688,20701,20703,20712,20723,20727,20733,20738,20745,20751,20761,20772,20778,20780,20786,20792,20795,20800,20806,20813,20817,20818,20819,20821,20824,20834,20841,20845,20846,20853,20881,20885,20891,20892,20908,20909,20913,20920,20922,20924,20926,20927,20934,20936,20937,20939,20944,20945,20946,20949,20951,20953,20958,20964,20969,20977,20978,20980,20982,20990,20993,20997,20998,20999,21002,21005,21009,21014,21017,21019,21020,21026,21027,21028,21030,21033,21049,21052,21053,21064,21069,21075,21078,21081,21082,21087,21088,21094,21096,21099,21100,21116,21123,21131,21134,21136,21138,21139,21143,21156,21159,21160,21163,21167,21168,21170,21173,21177,21183,21191,21196,21197,21207,21211,21214,21219,21228,21236,21241,21246,21247,21261,21262,21268,21280,21285,21288)
##   are outliers with |weight| <= 2e-06 ( < 4.7e-06);
##  14947 weights are ~= 1. The remaining 6179 ones are summarized as
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
## 0.000005 0.540800 0.853500 0.722800 0.965100 0.999000
## Algorithmic parameters:
##       tuning.chi1       tuning.chi2       tuning.chi3       tuning.chi4
##        -5.000e-01         1.500e+00                NA         5.000e-01
##                bb       tuning.psi1       tuning.psi2       tuning.psi3
##         5.000e-01        -5.000e-01         1.500e+00         9.500e-01
##       tuning.psi4        refine.tol           rel.tol         solve.tol
##                NA         1.000e-07         1.000e-07         1.000e-07
##       eps.outlier             eps.x warn.limit.reject warn.limit.meanrw
##         4.696e-06         4.070e-11         5.000e-01         5.000e-01
##      nResample         max.it       best.r.s       k.fast.s          k.max
##            500             50              2              1            200
##    maxit.scale      trace.lev            mts     compute.rd      numpoints
##            200              0           1000              0             10
## fast.s.large.n
##           2000
##                   psi           subsampling                   cov
##                 "lqq"         "nonsingular"             ".vcov.w"
## compute.outlier.stats
##                "SMDM"
## seed : int(0)

tryp_cl14clbr_ls$correlation

##
##  Pearson's product-moment correlation
##
## data:  df[, 1] and df[, 2]
## t = 220.9448, df = 21292, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8303204 0.8384797
## sample estimates:
##       cor
## 0.8344458

Compare Amastigote 60 hr from CL14 to CLBrener

a60_cl14clbr_ls = hpgl_linear_scatter(kept_table[,c("a60_cl14", "a60_clbr")], gvis_filename="html/a60_cl14clbr.html", tooltip_data=tooltip_data, loess=TRUE, gvis_trendline="linear")

## [1] "No binwidth nor bins provided, setting it to 0.0406815273104036 in order to have 500 bins."

a60_cl14clbr_ls$scatter

a60_cl14clbr_ls$both_histogram$plot

a60_cl14clbr_ls$lm_model

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##
## Coefficients:
## (Intercept)        first
##     -0.4255       1.0402

a60_cl14clbr_ls$lm_summary

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##  \--> method = "SMDM"
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.92250 -0.21249  0.02628  0.25404  6.28503
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.425464   0.016315  -26.08   <2e-16 ***
## first        1.040164   0.001829  568.61   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Robust residual standard error: 0.4087
## Multiple R-squared:  0.9374, Adjusted R-squared:  0.9374
## Convergence in 10 IRWLS iterations
##
## Robustness weights:
##  366 observations c(17811,19117,19211,19230,19667,19768,19936,19941,19965,19998,20007,20012,20048,20075,20099,20100,20121,20131,20135,20139,20169,20176,20194,20195,20206,20211,20230,20237,20255,20256,20257,20268,20275,20276,20284,20286,20294,20298,20299,20320,20323,20331,20346,20361,20365,20370,20373,20376,20378,20389,20390,20395,20403,20422,20424,20426,20436,20440,20445,20448,20452,20454,20463,20464,20471,20481,20482,20485,20486,20488,20490,20496,20497,20499,20504,20516,20518,20519,20523,20525,20528,20530,20533,20534,20540,20544,20545,20548,20550,20552,20553,20554,20555,20558,20562,20564,20565,20572,20581,20583,20586,20588,20589,20596,20598,20611,20612,20620,20623,20627,20628,20629,20638,20640,20641,20643,20647,20650,20651,20652,20653,20663,20665,20669,20673,20674,20675,20678,20680,20682,20683,20689,20693,20701,20703,20707,20712,20717,20723,20727,20733,20734,20735,20738,20740,20743,20745,20748,20751,20755,20758,20761,20764,20771,20772,20778,20780,20784,20786,20789,20792,20795,20799,20800,20806,20808,20810,20811,20812,20813,20817,20818,20819,20821,20823,20824,20827,20829,20833,20834,20841,20845,20846,20853,20863,20864,20865,20869,20872,20879,20881,20882,20885,20892,20893,20894,20898,20904,20905,20908,20909,20910,20913,20914,20915,20917,20919,20920,20922,20924,20926,20927,20932,20934,20936,20937,20939,20941,20942,20943,20944,20945,20946,20949,20951,20953,20958,20961,20964,20969,20971,20976,20977,20978,20979,20980,20982,20984,20988,20991,20993,20995,20996,20997,20998,20999,21002,21005,21006,21007,21008,21009,21010,21014,21017,21018,21019,21020,21025,21026,21027,21028,21030,21033,21039,21040,21043,21045,21046,21049,21052,21053,21057,21060,21063,21064,21069,21075,21078,21079,21081,21082,21083,21087,21088,21091,21092,21094,21095,21096,21097,21098,21099,21100,21103,21104,21109,21111,21114,21116,21118,21119,21121,21123,21124,21131,21134,21135,21136,21138,21139,21142,21143,21145,21147,21149,21151,21152,21153,21156,21159,21160,21163,21165,21167,21168,21170,21171,21173,21175,21176,21177,21179,21183,21186,21191,21196,21197,21202,21207,21214,21219,21228,21235,21236,21237,21238,21241,21246,21247,21252,21254,21256,21261,21262,21263,21267,21268,21270,21277,21279,21280,21283,21285,21287,21288)
##   are outliers with |weight| <= 4.7e-06 ( < 4.7e-06);
##  15167 weights are ~= 1. The remaining 5761 ones are summarized as
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.0000059 0.5325000 0.8393000 0.7196000 0.9655000 0.9990000
## Algorithmic parameters:
##       tuning.chi1       tuning.chi2       tuning.chi3       tuning.chi4
##        -5.000e-01         1.500e+00                NA         5.000e-01
##                bb       tuning.psi1       tuning.psi2       tuning.psi3
##         5.000e-01        -5.000e-01         1.500e+00         9.500e-01
##       tuning.psi4        refine.tol           rel.tol         solve.tol
##                NA         1.000e-07         1.000e-07         1.000e-07
##       eps.outlier             eps.x warn.limit.reject warn.limit.meanrw
##         4.696e-06         4.070e-11         5.000e-01         5.000e-01
##      nResample         max.it       best.r.s       k.fast.s          k.max
##            500             50              2              1            200
##    maxit.scale      trace.lev            mts     compute.rd      numpoints
##            200              0           1000              0             10
## fast.s.large.n
##           2000
##                   psi           subsampling                   cov
##                 "lqq"         "nonsingular"             ".vcov.w"
## compute.outlier.stats
##                "SMDM"
## seed : int(0)

a60_cl14clbr_ls$correlation

##
##  Pearson's product-moment correlation
##
## data:  df[, 1] and df[, 2]
## t = 293.3484, df = 21292, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8926522 0.8979812
## sample estimates:
##       cor
## 0.8953488

Compare Amastigote 96 hr CL14 to CLBrener

a96_cl14clbr_ls = hpgl_linear_scatter(kept_table[,c("a96_cl14", "a96_clbr")], gvis_filename="html/a96_cl14clbr.html", tooltip_data=tooltip_data, loess=TRUE, gvis_trendline="linear")

## [1] "No binwidth nor bins provided, setting it to 0.0441692997385908 in order to have 500 bins."

a96_cl14clbr_ls$scatter

a96_cl14clbr_ls$both_histogram$plot

a96_cl14clbr_ls$lm_model

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##
## Coefficients:
## (Intercept)        first
##      0.3635       0.9511

a96_cl14clbr_ls$lm_summary

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##  \--> method = "SMDM"
## Residuals:
##      Min       1Q   Median       3Q      Max
## -6.59299 -0.30323 -0.00265  0.31022  4.68608
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.363468   0.020546   17.69   <2e-16 ***
## first       0.951084   0.002301  413.36   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Robust residual standard error: 0.5431
## Multiple R-squared:  0.8865, Adjusted R-squared:  0.8865
## Convergence in 14 IRWLS iterations
##
## Robustness weights:
##  7 observations c(8200,10451,10510,18670,18746,20929,21293)
##   are outliers with |weight| = 0 ( < 4.7e-06);
##  15396 weights are ~= 1. The remaining 5891 ones are summarized as
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.0000333 0.5751000 0.8631000 0.7403000 0.9636000 0.9990000
## Algorithmic parameters:
##       tuning.chi1       tuning.chi2       tuning.chi3       tuning.chi4
##        -5.000e-01         1.500e+00                NA         5.000e-01
##                bb       tuning.psi1       tuning.psi2       tuning.psi3
##         5.000e-01        -5.000e-01         1.500e+00         9.500e-01
##       tuning.psi4        refine.tol           rel.tol         solve.tol
##                NA         1.000e-07         1.000e-07         1.000e-07
##       eps.outlier             eps.x warn.limit.reject warn.limit.meanrw
##         4.696e-06         4.070e-11         5.000e-01         5.000e-01
##      nResample         max.it       best.r.s       k.fast.s          k.max
##            500             50              2              1            200
##    maxit.scale      trace.lev            mts     compute.rd      numpoints
##            200              0           1000              0             10
## fast.s.large.n
##           2000
##                   psi           subsampling                   cov
##                 "lqq"         "nonsingular"             ".vcov.w"
## compute.outlier.stats
##                "SMDM"
## seed : int(0)

a96_cl14clbr_ls$correlation

##
##  Pearson's product-moment correlation
##
## data:  df[, 1] and df[, 2]
## t = 319.9738, df = 21292, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9075158 0.9121413
## sample estimates:
##       cor
## 0.9098568

Compare Epimastigote to Trypomastigote in CL14

epitryp_cl14_ls = hpgl_linear_scatter(kept_table[,c("epi_cl14", "tryp_cl14")], gvis_filename="html/epitryp_cl14.html", tooltip_data=tooltip_data, loess=TRUE, gvis_trendline="linear")

## [1] "No binwidth nor bins provided, setting it to 0.039656612304945 in order to have 500 bins."

epitryp_cl14_ls$scatter

epitryp_cl14_ls$both_histogram$plot

epitryp_cl14_ls$lm_model

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##
## Coefficients:
## (Intercept)        first
##      1.6517       0.8073

epitryp_cl14_ls$lm_summary

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##  \--> method = "SMDM"
## Residuals:
##       Min        1Q    Median        3Q       Max
## -5.725399 -0.530079  0.005257  0.543036  8.679218
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.651744   0.035273   46.83   <2e-16 ***
## first       0.807347   0.003972  203.27   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Robust residual standard error: 0.9171
## Multiple R-squared:  0.6565, Adjusted R-squared:  0.6565
## Convergence in 12 IRWLS iterations
##
## Robustness weights:
##  4 observations c(7320,8316,11201,13965)
##   are outliers with |weight| <= 4.3e-06 ( < 4.7e-06);
##  15353 weights are ~= 1. The remaining 5937 ones are summarized as
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.0000685 0.6263000 0.8838000 0.7680000 0.9720000 0.9990000
## Algorithmic parameters:
##       tuning.chi1       tuning.chi2       tuning.chi3       tuning.chi4
##        -5.000e-01         1.500e+00                NA         5.000e-01
##                bb       tuning.psi1       tuning.psi2       tuning.psi3
##         5.000e-01        -5.000e-01         1.500e+00         9.500e-01
##       tuning.psi4        refine.tol           rel.tol         solve.tol
##                NA         1.000e-07         1.000e-07         1.000e-07
##       eps.outlier             eps.x warn.limit.reject warn.limit.meanrw
##         4.696e-06         4.070e-11         5.000e-01         5.000e-01
##      nResample         max.it       best.r.s       k.fast.s          k.max
##            500             50              2              1            200
##    maxit.scale      trace.lev            mts     compute.rd      numpoints
##            200              0           1000              0             10
## fast.s.large.n
##           2000
##                   psi           subsampling                   cov
##                 "lqq"         "nonsingular"             ".vcov.w"
## compute.outlier.stats
##                "SMDM"
## seed : int(0)

epitryp_cl14_ls$correlation

##
##  Pearson's product-moment correlation
##
## data:  df[, 1] and df[, 2]
## t = 161.3096, df = 21292, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7354967 0.7475870
## sample estimates:
##       cor
## 0.7416021

Compare Epimastigotes to Trypomastigotes in CLBrener

epitryp_clbr_ls = hpgl_linear_scatter(kept_table[,c("epi_clbr", "tryp_clbr")], gvis_filename="html/epitryp_clbr.html", tooltip_data=tooltip_data, loess=TRUE, gvis_trendline="linear")

## [1] "No binwidth nor bins provided, setting it to 0.044226555188013 in order to have 500 bins."

epitryp_clbr_ls$scatter

epitryp_clbr_ls$both_histogram$plot

epitryp_clbr_ls$lm_model

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##
## Coefficients:
## (Intercept)        first
##      1.8701       0.7782

epitryp_clbr_ls$lm_summary

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##  \--> method = "SMDM"
## Residuals:
##      Min       1Q   Median       3Q      Max
## -6.11363 -0.56572  0.05638  0.57374  7.73642
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.870149   0.037041   50.49   <2e-16 ***
## first       0.778172   0.004162  186.98   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Robust residual standard error: 0.9761
## Multiple R-squared:  0.618,  Adjusted R-squared:  0.6179
## Convergence in 14 IRWLS iterations
##
## Robustness weights:
##  2 observations c(640,19764) are outliers with |weight| = 0 ( < 4.7e-06);
##  15465 weights are ~= 1. The remaining 5827 ones are summarized as
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.0004763 0.6137000 0.8683000 0.7568000 0.9666000 0.9990000
## Algorithmic parameters:
##       tuning.chi1       tuning.chi2       tuning.chi3       tuning.chi4
##        -5.000e-01         1.500e+00                NA         5.000e-01
##                bb       tuning.psi1       tuning.psi2       tuning.psi3
##         5.000e-01        -5.000e-01         1.500e+00         9.500e-01
##       tuning.psi4        refine.tol           rel.tol         solve.tol
##                NA         1.000e-07         1.000e-07         1.000e-07
##       eps.outlier             eps.x warn.limit.reject warn.limit.meanrw
##         4.696e-06         4.070e-11         5.000e-01         5.000e-01
##      nResample         max.it       best.r.s       k.fast.s          k.max
##            500             50              2              1            200
##    maxit.scale      trace.lev            mts     compute.rd      numpoints
##            200              0           1000              0             10
## fast.s.large.n
##           2000
##                   psi           subsampling                   cov
##                 "lqq"         "nonsingular"             ".vcov.w"
## compute.outlier.stats
##                "SMDM"
## seed : int(0)

epitryp_clbr_ls$correlation

##
##  Pearson's product-moment correlation
##
## data:  df[, 1] and df[, 2]
## t = 142.8638, df = 21292, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6926670 0.7063837
## sample estimates:
##       cor
## 0.6995898

Compare Amastigote 96 hr to Trypomastigotes in CLBrener

a96tryp_clbr_ls = hpgl_linear_scatter(kept_table[,c("a96_clbr", "tryp_clbr")], gvis_filename="html/a96tryp_clbr.html", tooltip_data=tooltip_data, loess=TRUE, gvis_trendline="linear")

## [1] "No binwidth nor bins provided, setting it to 0.044226555188013 in order to have 500 bins."

a96tryp_clbr_ls$scatter

a96tryp_clbr_ls$both_histogram$plot

a96tryp_clbr_ls$lm_model

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##
## Coefficients:
## (Intercept)        first
##      0.8443       0.8938

a96tryp_clbr_ls$lm_summary

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##  \--> method = "SMDM"
## Residuals:
##      Min       1Q   Median       3Q      Max
## -8.92371 -0.39694  0.01579  0.42319  6.90098
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.844290   0.026541   31.81   <2e-16 ***
## first       0.893760   0.002982  299.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Robust residual standard error: 0.6995
## Multiple R-squared:  0.8055, Adjusted R-squared:  0.8055
## Convergence in 12 IRWLS iterations
##
## Robustness weights:
##  26 observations c(14326,14471,16339,16389,16901,17823,18060,18182,18459,18499,18691,18749,18889,19187,19222,19764,19872,19971,19981,20388,20619,20702,20861,20955,21146,21223)
##   are outliers with |weight| = 0 ( < 4.7e-06);
##  15303 weights are ~= 1. The remaining 5965 ones are summarized as
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.0000071 0.6106000 0.8734000 0.7474000 0.9681000 0.9990000
## Algorithmic parameters:
##       tuning.chi1       tuning.chi2       tuning.chi3       tuning.chi4
##        -5.000e-01         1.500e+00                NA         5.000e-01
##                bb       tuning.psi1       tuning.psi2       tuning.psi3
##         5.000e-01        -5.000e-01         1.500e+00         9.500e-01
##       tuning.psi4        refine.tol           rel.tol         solve.tol
##                NA         1.000e-07         1.000e-07         1.000e-07
##       eps.outlier             eps.x warn.limit.reject warn.limit.meanrw
##         4.696e-06         4.070e-11         5.000e-01         5.000e-01
##      nResample         max.it       best.r.s       k.fast.s          k.max
##            500             50              2              1            200
##    maxit.scale      trace.lev            mts     compute.rd      numpoints
##            200              0           1000              0             10
## fast.s.large.n
##           2000
##                   psi           subsampling                   cov
##                 "lqq"         "nonsingular"             ".vcov.w"
## compute.outlier.stats
##                "SMDM"
## seed : int(0)

a96tryp_clbr_ls$correlation

##
##  Pearson's product-moment correlation
##
## data:  df[, 1] and df[, 2]
## t = 216.4479, df = 21292, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8249322 0.8333270
## sample estimates:
##       cor
## 0.8291763

Compare Amastigote 96 hr to Trypomastigotes in CL14

a96tryp_cl14_ls = hpgl_linear_scatter(kept_table[,c("a96_cl14", "tryp_cl14")], gvis_filename="html/a96tryp_cl14.html", tooltip_data=tooltip_data, loess=TRUE, gvis_trendline="linear")

## [1] "No binwidth nor bins provided, setting it to 0.0419614414349972 in order to have 500 bins."

a96tryp_cl14_ls$scatter

a96tryp_cl14_ls$both_histogram$plot

a96tryp_cl14_ls$lm_model

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##
## Coefficients:
## (Intercept)        first
##      2.4439       0.7246

a96tryp_cl14_ls$lm_summary

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##  \--> method = "SMDM"
## Residuals:
##      Min       1Q   Median       3Q      Max
## -7.17396 -0.53649 -0.01285  0.50926  7.76434
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.443859   0.033627   72.67   <2e-16 ***
## first       0.724598   0.003789  191.23   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Robust residual standard error: 0.9223
## Multiple R-squared:  0.6229, Adjusted R-squared:  0.6229
## Convergence in 10 IRWLS iterations
##
## Robustness weights:
##  4 observations c(7320,18889,19558,19872)
##   are outliers with |weight| = 0 ( < 4.7e-06);
##  15374 weights are ~= 1. The remaining 5916 ones are summarized as
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.0000376 0.5493000 0.8661000 0.7305000 0.9672000 0.9990000
## Algorithmic parameters:
##       tuning.chi1       tuning.chi2       tuning.chi3       tuning.chi4
##        -5.000e-01         1.500e+00                NA         5.000e-01
##                bb       tuning.psi1       tuning.psi2       tuning.psi3
##         5.000e-01        -5.000e-01         1.500e+00         9.500e-01
##       tuning.psi4        refine.tol           rel.tol         solve.tol
##                NA         1.000e-07         1.000e-07         1.000e-07
##       eps.outlier             eps.x warn.limit.reject warn.limit.meanrw
##         4.696e-06         4.070e-11         5.000e-01         5.000e-01
##      nResample         max.it       best.r.s       k.fast.s          k.max
##            500             50              2              1            200
##    maxit.scale      trace.lev            mts     compute.rd      numpoints
##            200              0           1000              0             10
## fast.s.large.n
##           2000
##                   psi           subsampling                   cov
##                 "lqq"         "nonsingular"             ".vcov.w"
## compute.outlier.stats
##                "SMDM"
## seed : int(0)

a96tryp_cl14_ls$correlation

##
##  Pearson's product-moment correlation
##
## data:  df[, 1] and df[, 2]
## t = 134.9087, df = 21292, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6715570 0.6860411
## sample estimates:
##       cor
## 0.6788651

Compare Amastigotes 60 to 96 hours in CL14

a60a96_cl14_ls = hpgl_linear_scatter(kept_table[,c("a60_cl14", "a96_cl14")], gvis_filename="html/a60a96_cl14.html", tooltip_data=tooltip_data, loess=TRUE, gvis_trendline="linear")

## [1] "No binwidth nor bins provided, setting it to 0.0419614414349972 in order to have 500 bins."

a60a96_cl14_ls$scatter

a60a96_cl14_ls$both_histogram$plot

a60a96_cl14_ls$lm_model

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##
## Coefficients:
## (Intercept)        first
##     -0.1704       1.0095

a60a96_cl14_ls$lm_summary

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##  \--> method = "SMDM"
## Residuals:
##      Min       1Q   Median       3Q      Max
## -4.41572 -0.29344  0.01941  0.32703  6.74616
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.170375   0.020729  -8.219   <2e-16 ***
## first        1.009456   0.002321 434.917   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Robust residual standard error: 0.5158
## Multiple R-squared:  0.8994, Adjusted R-squared:  0.8994
## Convergence in 11 IRWLS iterations
##
## Robustness weights:
##  184 observations c(17811,19998,20194,20284,20299,20323,20361,20365,20378,20395,20426,20436,20440,20448,20454,20463,20481,20482,20485,20497,20519,20523,20525,20533,20558,20583,20588,20620,20628,20640,20667,20673,20674,20675,20680,20689,20693,20701,20703,20712,20727,20734,20735,20738,20745,20748,20751,20755,20761,20771,20772,20778,20784,20786,20789,20792,20800,20806,20808,20813,20817,20818,20821,20824,20829,20833,20834,20841,20845,20846,20853,20862,20881,20882,20885,20891,20892,20893,20898,20908,20909,20913,20919,20920,20922,20924,20926,20927,20929,20932,20934,20936,20937,20939,20942,20944,20946,20951,20958,20964,20969,20978,20980,20982,20991,20993,20996,20997,20998,21002,21005,21009,21010,21014,21017,21018,21019,21020,21025,21026,21027,21028,21030,21031,21039,21045,21049,21052,21053,21069,21078,21079,21081,21082,21083,21088,21091,21094,21095,21096,21099,21100,21109,21111,21116,21118,21121,21123,21124,21131,21134,21136,21138,21139,21143,21151,21156,21159,21160,21163,21170,21171,21173,21179,21183,21186,21191,21196,21197,21207,21214,21219,21228,21232,21235,21246,21262,21263,21268,21276,21277,21280,21285,21288)
##   are outliers with |weight| <= 1.8e-06 ( < 4.7e-06);
##  15088 weights are ~= 1. The remaining 6022 ones are summarized as
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.0000144 0.6264000 0.8691000 0.7456000 0.9671000 0.9990000
## Algorithmic parameters:
##       tuning.chi1       tuning.chi2       tuning.chi3       tuning.chi4
##        -5.000e-01         1.500e+00                NA         5.000e-01
##                bb       tuning.psi1       tuning.psi2       tuning.psi3
##         5.000e-01        -5.000e-01         1.500e+00         9.500e-01
##       tuning.psi4        refine.tol           rel.tol         solve.tol
##                NA         1.000e-07         1.000e-07         1.000e-07
##       eps.outlier             eps.x warn.limit.reject warn.limit.meanrw
##         4.696e-06         4.070e-11         5.000e-01         5.000e-01
##      nResample         max.it       best.r.s       k.fast.s          k.max
##            500             50              2              1            200
##    maxit.scale      trace.lev            mts     compute.rd      numpoints
##            200              0           1000              0             10
## fast.s.large.n
##           2000
##                   psi           subsampling                   cov
##                 "lqq"         "nonsingular"             ".vcov.w"
## compute.outlier.stats
##                "SMDM"
## seed : int(0)

a60a96_cl14_ls$correlation

##
##  Pearson's product-moment correlation
##
## data:  df[, 1] and df[, 2]
## t = 265.6017, df = 21292, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.873292 0.879521
## sample estimates:
##       cor
## 0.8764432

Compare Amastigote 60 to 96 hours in CLBrener

a60a96_clbr_ls = hpgl_linear_scatter(kept_table[,c("a60_clbr", "a96_clbr")], gvis_filename="html/a60a96_clbr.html", tooltip_data=tooltip_data, loess=TRUE, gvis_trendline="linear")

## [1] "No binwidth nor bins provided, setting it to 0.0441692997385908 in order to have 500 bins."

a60a96_clbr_ls$scatter

a60a96_clbr_ls$both_histogram$plot

a60a96_clbr_ls$lm_model

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##
## Coefficients:
## (Intercept)        first
##      0.1656       0.9675

a60a96_clbr_ls$lm_summary

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##  \--> method = "SMDM"
## Residuals:
##      Min       1Q   Median       3Q      Max
## -5.07857 -0.28603 -0.03774  0.32364  5.70401
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.165578   0.021536   7.689 1.55e-14 ***
## first       0.967465   0.002401 402.871  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Robust residual standard error: 0.5515
## Multiple R-squared:  0.881,  Adjusted R-squared:  0.881
## Convergence in 16 IRWLS iterations
##
## Robustness weights:
##  33 observations c(5666,6627,6673,6943,7394,7691,8200,8628,10368,10451,10510,12433,13332,13744,13989,14512,16035,16874,17196,17574,17792,17975,17988,18459,18670,19551,20870,21031,21232,21269,21273,21276,21290)
##   are outliers with |weight| = 0 ( < 4.7e-06);
##  15182 weights are ~= 1. The remaining 6079 ones are summarized as
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.0000325 0.4939000 0.8367000 0.7031000 0.9588000 0.9990000
## Algorithmic parameters:
##       tuning.chi1       tuning.chi2       tuning.chi3       tuning.chi4
##        -5.000e-01         1.500e+00                NA         5.000e-01
##                bb       tuning.psi1       tuning.psi2       tuning.psi3
##         5.000e-01        -5.000e-01         1.500e+00         9.500e-01
##       tuning.psi4        refine.tol           rel.tol         solve.tol
##                NA         1.000e-07         1.000e-07         1.000e-07
##       eps.outlier             eps.x warn.limit.reject warn.limit.meanrw
##         4.696e-06         4.070e-11         5.000e-01         5.000e-01
##      nResample         max.it       best.r.s       k.fast.s          k.max
##            500             50              2              1            200
##    maxit.scale      trace.lev            mts     compute.rd      numpoints
##            200              0           1000              0             10
## fast.s.large.n
##           2000
##                   psi           subsampling                   cov
##                 "lqq"         "nonsingular"             ".vcov.w"
## compute.outlier.stats
##                "SMDM"
## seed : int(0)

a60a96_clbr_ls$correlation

##
##  Pearson's product-moment correlation
##
## data:  df[, 1] and df[, 2]
## t = 269.6644, df = 21292, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8764190 0.8825038
## sample estimates:
##       cor
## 0.8794973

Compare (epi/tryp)cl14 / (epi/tryp)clbr

## Something is messed up here
## I am going to add some more contrasts to figure out what went wrong
epitryp_cl14clbr_ls = hpgl_linear_scatter(kept_table[,c("epitryp_cl14", "epitryp_clbr")], gvis_filename="html/epitryp_cl14clbr.html", tooltip_data=tooltip_data, loess=TRUE, gvis_trendline="linear")

## [1] "No binwidth nor bins provided, setting it to 0.0309143539606324 in order to have 500 bins."

epitryp_cl14clbr_ls$scatter

epitryp_cl14clbr_ls$both_histogram$plot

epitryp_cl14clbr_ls$lm_model

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##
## Coefficients:
## (Intercept)        first
##    -0.03758      0.95988

epitryp_cl14clbr_ls$lm_summary

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##  \--> method = "SMDM"
## Residuals:
##      Min       1Q   Median       3Q      Max
## -5.09267 -0.34077  0.01846  0.35249  5.68291
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.037582   0.004087  -9.195   <2e-16 ***
## first        0.959876   0.003786 253.502   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Robust residual standard error: 0.6065
## Multiple R-squared:  0.7448, Adjusted R-squared:  0.7448
## Convergence in 12 IRWLS iterations
##
## Robustness weights:
##  22 observations c(16901,17291,17823,19424,19723,19731,19737,20053,20130,20151,20200,20212,20224,20251,20435,20438,20447,20720,20726,20794,20795,21292)
##   are outliers with |weight| <= 1.8e-06 ( < 4.7e-06);
##  15470 weights are ~= 1. The remaining 5802 ones are summarized as
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.0001011 0.5792000 0.8652000 0.7399000 0.9655000 0.9990000
## Algorithmic parameters:
##       tuning.chi1       tuning.chi2       tuning.chi3       tuning.chi4
##        -5.000e-01         1.500e+00                NA         5.000e-01
##                bb       tuning.psi1       tuning.psi2       tuning.psi3
##         5.000e-01        -5.000e-01         1.500e+00         9.500e-01
##       tuning.psi4        refine.tol           rel.tol         solve.tol
##                NA         1.000e-07         1.000e-07         1.000e-07
##       eps.outlier             eps.x warn.limit.reject warn.limit.meanrw
##         4.696e-06         1.641e-11         5.000e-01         5.000e-01
##      nResample         max.it       best.r.s       k.fast.s          k.max
##            500             50              2              1            200
##    maxit.scale      trace.lev            mts     compute.rd      numpoints
##            200              0           1000              0             10
## fast.s.large.n
##           2000
##                   psi           subsampling                   cov
##                 "lqq"         "nonsingular"             ".vcov.w"
## compute.outlier.stats
##                "SMDM"
## seed : int(0)

epitryp_cl14clbr_ls$correlation

##
##  Pearson's product-moment correlation
##
## data:  df[, 1] and df[, 2]
## t = 192.7764, df = 21292, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7923955 0.8021814
## sample estimates:
##       cor
## 0.7973408

Compare (epi/a60)cl14 / (epi/a60)clbr

## OK, so the following two are solidly telling me that the epimastigote data is the weirdo, I think we can assume the CL14.
epia60_cl14clbr_ls = hpgl_linear_scatter(kept_table[,c("epia60_cl14", "epia60_clbr")], gvis_filename="html/epia60_cl14clbr.html", tooltip_data=tooltip_data, loess=TRUE, gvis_trendline="linear")

## [1] "No binwidth nor bins provided, setting it to 0.0360293302559337 in order to have 500 bins."

epia60_cl14clbr_ls$scatter

epia60_cl14clbr_ls$both_histogram$plot

epia60_cl14clbr_ls$lm_model

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##
## Coefficients:
## (Intercept)        first
##     0.03125      0.60917

epia60_cl14clbr_ls$lm_summary

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##  \--> method = "SMDM"
## Residuals:
##      Min       1Q   Median       3Q      Max
## -5.43410 -0.32952  0.04357  0.32798  6.35967
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.031249   0.003665   8.526   <2e-16 ***
## first       0.609168   0.006077 100.241   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Robust residual standard error: 0.5359
## Multiple R-squared:  0.3191, Adjusted R-squared:  0.3191
## Convergence in 21 IRWLS iterations
##
## Robustness weights:
##  6 observations c(12806,15909,18938,20619,20955,21292)
##   are outliers with |weight| = 0 ( < 4.7e-06);
##  15450 weights are ~= 1. The remaining 5838 ones are summarized as
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.0007114 0.6680000 0.8885000 0.7854000 0.9730000 0.9990000
## Algorithmic parameters:
##       tuning.chi1       tuning.chi2       tuning.chi3       tuning.chi4
##        -5.000e-01         1.500e+00                NA         5.000e-01
##                bb       tuning.psi1       tuning.psi2       tuning.psi3
##         5.000e-01        -5.000e-01         1.500e+00         9.500e-01
##       tuning.psi4        refine.tol           rel.tol         solve.tol
##                NA         1.000e-07         1.000e-07         1.000e-07
##       eps.outlier             eps.x warn.limit.reject warn.limit.meanrw
##         4.696e-06         1.441e-11         5.000e-01         5.000e-01
##      nResample         max.it       best.r.s       k.fast.s          k.max
##            500             50              2              1            200
##    maxit.scale      trace.lev            mts     compute.rd      numpoints
##            200              0           1000              0             10
## fast.s.large.n
##           2000
##                   psi           subsampling                   cov
##                 "lqq"         "nonsingular"             ".vcov.w"
## compute.outlier.stats
##                "SMDM"
## seed : int(0)

epia60_cl14clbr_ls$correlation

##
##  Pearson's product-moment correlation
##
## data:  df[, 1] and df[, 2]
## t = 101.8364, df = 21292, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5632054 0.5812709
## sample estimates:
##       cor
## 0.5723076

Compare (tryp/a60)cl14 / (tryp/a60)clbr err the other way around...

a60tryp_cl14clbr_ls = hpgl_linear_scatter(kept_table[,c("a60tryp_cl14", "a60tryp_clbr")], gvis_filename="html/a60tryp_cl14clbr.html", tooltip_data=tooltip_data, loess=TRUE, gvis_trendline="linear")

## [1] "No binwidth nor bins provided, setting it to 0.0234571806362081 in order to have 500 bins."

a60tryp_cl14clbr_ls$scatter

a60tryp_cl14clbr_ls$both_histogram$plot

a60tryp_cl14clbr_ls$lm_model

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##
## Coefficients:
## (Intercept)        first
##    -0.13034      0.07546

a60tryp_cl14clbr_ls$lm_summary

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##  \--> method = "SMDM"
## Residuals:
##      Min       1Q   Median       3Q      Max
## -4.65310 -0.28151 -0.04327  0.32856  5.64755
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.130340   0.003675  -35.47   <2e-16 ***
## first        0.075463   0.004543   16.61   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Robust residual standard error: 0.5449
## Multiple R-squared:  0.01232,    Adjusted R-squared:  0.01227
## Convergence in 9 IRWLS iterations
##
## Robustness weights:
##  39 observations c(5666,6627,6673,6943,7394,7691,8200,8628,10368,10451,10510,12186,12433,13332,13744,13989,14512,15262,15296,16035,16874,17196,17574,17792,17975,17988,18178,18459,18574,18670,18935,19493,19551,19986,20067,20870,21273,21276,21290)
##   are outliers with |weight| <= 1.1e-07 ( < 4.7e-06);
##  15232 weights are ~= 1. The remaining 6023 ones are summarized as
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.0000108 0.4655000 0.8346000 0.6948000 0.9590000 0.9990000
## Algorithmic parameters:
##       tuning.chi1       tuning.chi2       tuning.chi3       tuning.chi4
##        -5.000e-01         1.500e+00                NA         5.000e-01
##                bb       tuning.psi1       tuning.psi2       tuning.psi3
##         5.000e-01        -5.000e-01         1.500e+00         9.500e-01
##       tuning.psi4        refine.tol           rel.tol         solve.tol
##                NA         1.000e-07         1.000e-07         1.000e-07
##       eps.outlier             eps.x warn.limit.reject warn.limit.meanrw
##         4.696e-06         1.205e-11         5.000e-01         5.000e-01
##      nResample         max.it       best.r.s       k.fast.s          k.max
##            500             50              2              1            200
##    maxit.scale      trace.lev            mts     compute.rd      numpoints
##            200              0           1000              0             10
## fast.s.large.n
##           2000
##                   psi           subsampling                   cov
##                 "lqq"         "nonsingular"             ".vcov.w"
## compute.outlier.stats
##                "SMDM"
## seed : int(0)

a60tryp_cl14clbr_ls$correlation

##
##  Pearson's product-moment correlation
##
## data:  df[, 1] and df[, 2]
## t = 16.8619, df = 21292, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1015185 0.1280275
## sample estimates:
##       cor
## 0.1147934

Compare (a60/a96)cl14 / (a60/a96)clbr

a60a96_cl14clbr_ls = hpgl_linear_scatter(kept_table[,c("a60a96_cl14", "a60a96_clbr")], gvis_filename="html/a60a96_cl14clbr.html", tooltip_data=tooltip_data, loess=TRUE, gvis_trendline="linear")

## [1] "No binwidth nor bins provided, setting it to 0.0367283498193497 in order to have 500 bins."

a60a96_cl14clbr_ls$scatter

a60a96_cl14clbr_ls$both_histogram$plot

a60a96_cl14clbr_ls$lm_model

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##
## Coefficients:
## (Intercept)        first
##    -0.05782      0.39108

a60a96_cl14clbr_ls$lm_summary

##
## Call:
## robustbase::lmrob(formula = second ~ first, data = df, method = "SMDM")
##  \--> method = "SMDM"
## Residuals:
##        Min         1Q     Median         3Q        Max
## -5.9540308 -0.3461099 -0.0005758  0.3556397  6.0482432
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.057818   0.004271  -13.54   <2e-16 ***
## first        0.391075   0.003689  106.01   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Robust residual standard error: 0.6295
## Multiple R-squared:  0.341,  Adjusted R-squared:  0.341
## Convergence in 15 IRWLS iterations
##
## Robustness weights:
##  11 observations c(15945,17823,18256,18459,18749,18799,20041,20619,20679,20702,21011)
##   are outliers with |weight| <= 3.6e-07 ( < 4.7e-06);
##  15288 weights are ~= 1. The remaining 5995 ones are summarized as
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
## 0.000088 0.619200 0.874700 0.754700 0.966500 0.999000
## Algorithmic parameters:
##       tuning.chi1       tuning.chi2       tuning.chi3       tuning.chi4
##        -5.000e-01         1.500e+00                NA         5.000e-01
##                bb       tuning.psi1       tuning.psi2       tuning.psi3
##         5.000e-01        -5.000e-01         1.500e+00         9.500e-01
##       tuning.psi4        refine.tol           rel.tol         solve.tol
##                NA         1.000e-07         1.000e-07         1.000e-07
##       eps.outlier             eps.x warn.limit.reject warn.limit.meanrw
##         4.696e-06         1.690e-11         5.000e-01         5.000e-01
##      nResample         max.it       best.r.s       k.fast.s          k.max
##            500             50              2              1            200
##    maxit.scale      trace.lev            mts     compute.rd      numpoints
##            200              0           1000              0             10
## fast.s.large.n
##           2000
##                   psi           subsampling                   cov
##                 "lqq"         "nonsingular"             ".vcov.w"
## compute.outlier.stats
##                "SMDM"
## seed : int(0)

a60a96_cl14clbr_ls$correlation

##
##  Pearson's product-moment correlation
##
## data:  df[, 1] and df[, 2]
## t = 104.4489, df = 21292, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5731059 0.5908690
## sample estimates:
##       cor
## 0.5820569

I removed a section regarding batch effect removal from here. This section is not needed because I included its functionality in myr::graph_metrics()

My email to Kwame

The following is the text of a message I sent Kwame, which lays out my concerns in this data:

I just wanted to ping you -- I poked further at my concern. I am 99% certain I set up the model matrix correctly and assigned the relative contributions of each sample in makeContrasts() correctly. The results I get when performing the contrasts look for the most part very much like what I would expect. My only concern is when I perform operations like: (x-y)-(a-b); in most instances they too look very much like what I expect, but in one instance one term(the x from my example) is consistently (~4/3 * a) which leaves an apparently linear relationship when there really should not be one. This caused me to be concerned that I misattributed the samples in makeContrasts(). Upon further examination I came to the conclusion that they were in fact correct. My second guess was (given things I have heard you/Hector/Najib say about the relationship between normalization and read depth) that either the x or a term was much higher or lower in depth than the others. I checked and this is not true (there are libsize differences, but they don't look significant to me). My last guess is that one of the samples in x does not cluster as nicely as all the rest of the samples in the data set and this is somehow leading to a consistent effect throughout the dataset, but for the life of me I cannot find a reason why.

Gene Ontology searches

Lets set up some simple data sets for goseq/clusterProfiler/topGO

Extracting ontology annotations from the TriTrypDB

I need to extract the ontology/pFAM data from the Tcruzi annotation tables.

I have a perl script 'dump_tritryp_GO.pl' which can do this for me.


cd $LAB/ref_data/gene_ontology/
./dump_tritryp_GO.pl $LAB/ref_data/tritryp_text/Tcruzi.txt > tcruzi_go.tab
## The actual commands were slightly different, just because I didn't feel like copy/pasting the filenames.

Preparing to use goseq

I need a couple vectors before being able to use goseq I also need some rda files generated for clusterProfiler and topGO but I will do those later...

## goseq uses the gene lengths, clusterprofiler and topgo don't
gene_lengths = genes[,c("ID","width")]
## go_ids needs to be a 2 column data frame with names 'ID', 'GO' which are
## gene names and GO names respectively.  I actually think I made my goseq() function
## smart enough that this is no longer needed, but nonetheless.

go_ids = read.table("reference/go/tcruzi_all_go.tab.gz", header=0, sep="\t")
colnames(go_ids) = c("ID","GO","ontology","term","source")
## Since there are multiple sources, I am doing a unique of the table...
go_ids = unique(go_ids[,c("ID","GO")])
goid_file = "reference/go/topgo_goids.tab.gz"

Compare Epimastigotes between CL14 and CLBr, high expression

One important note here, I needed to modify the data structure from readGff() in clusterProfiler before it was able to successfully generate the file 'geneTable.rda' Unfortunately, I was watching a movie at the time and didn't properly write down what I did. However, I did write it out in the function myr::Gff2GeneTable() but in order to make it work one must load readGff() from clusterProfiler. I think the real solution is for me to grab a version of readGff() from clusterProfiler and make it exportable so that it will work consistently.

## Changes in epimastigotes from CLBrener to CL14
pval_thresh = 0.0001
epi_cl14clbr_table = all_tables$epi_cl14clbr
epi_cl14clbr_summary = summary(epi_cl14clbr_table$logFC)
epi_cl14clbr_pvals = epi_cl14clbr_table$adj.P.Val
names(epi_cl14clbr_pvals) = rownames(epi_cl14clbr_table)
## Extract the genes with high expression in CL14 compared to Brener and very low pvalues
epi_cl14clbr_high = subset(epi_cl14clbr_table, logFC >= epi_cl14clbr_summary[5] & adj.P.Val <= pval_thresh)
## Do a goseq comparison of the high genes
epi_cl14clbr_high_go = simple_goseq(epi_cl14clbr_high, lengths=gene_lengths, goids=go_ids)

## [1] "simple_goseq() makes some pretty hard assumptions about the data it is fed:"
## [1] "It requires 2 tables, one of GOids which must have columns (gene)ID and GO(category)"
## [1] "The other table is of gene lengths with columns (gene)ID and (gene)width."
## [1] "Other columns are fine, but ignored."

## Warning in pcls(G): initial point very close to some inequality
## constraints

## Using manually entered categories.
## Calculating the p-values...

## [1] "Calculating q-values"
## [1] "Filling godata table with term information, this takes a while."
## [1] "Making pvalue plots for the ontologies."

epi_cl14clbr_high_go$pvalue_histogram

epi_cl14clbr_high_go$bpp_plot

## Oh crap, I cannot make trees until I generate a id2go R mapping, I will need to check my previous script for how to do that (I think I do it automatically in simple_topgo()...
##epi_cl14clbr_high_trees = goseq_trees(degenes=epi_cl14clbr_high, epi_cl14clbr_high_go)
## I had to hack the geneTable() function in order to make it not take infinite memory because it was trying to do some stupid merge of every gff column against every other column...
## Once the geneTable.rda file exists and included real information, this seems to work fine, though.
epi_cl14clbr_high_cl = simple_clusterprofiler(epi_cl14clbr_high)

## [1] "Using GO mapping data located in GO2EG.rda"
## [1] "Testing gseGO"
## NULL
## [1] "Starting MF(molecular function) analysis"
## The minimum observed adjusted pvalue is: 0.011973
## The minimum observed adjusted pvalue is: 0.299321
## [1] "Starting BP(biological process) analysis"
## The minimum observed adjusted pvalue is: 0.003762
## The minimum observed adjusted pvalue is: 0.410924
## [1] "Starting CC(cellular component) analysis"
## The minimum observed adjusted pvalue is: 0.351077
## The minimum observed adjusted pvalue is: 1.000000
## [1] "Attempting to include the cnetplots from clusterProfiler."
## [1] "They fail often, if this is causing errors, set:"
## [1] "include_cnetplots to FALSE"
## [1] "cnetplot just failed for the MF ontology.  Do not be concerned with the previous error."
## [1] "cnetplot just failed for the BP ontology.  Do not be concerned with the previous error."
## [1] "cnetplot just failed for the CC ontology.  Do not be concerned with the previous error."

epi_cl14clbr_high_cl$bp_all_barplot

## The first time this is run in a fresh directory, it needs to be run like this:
## simple_topgo(epi_cl14clbr_high, goids_df=go_ids)
## setting goids_df will tell it to generate a table of genes->GO<-genes in the format expected.
epi_cl14clbr_high_top = simple_topgo(epi_cl14clbr_high, goids=goid_file, pvals=epi_cl14clbr_pvals)

##
## Building most specific GOs ..... ( 620 GO terms found. )
##
## Build GO DAG topology .......... ( 993 GO terms and 1244 relations. )
##
## Annotating nodes ............... ( 4223 genes annotated to the GO terms. )
##
## Building most specific GOs ..... ( 566 GO terms found. )
##
## Build GO DAG topology .......... ( 1521 GO terms and 3182 relations. )
##
## Annotating nodes ............... ( 3841 genes annotated to the GO terms. )
##
## Building most specific GOs ..... ( 203 GO terms found. )
##
## Build GO DAG topology .......... ( 363 GO terms and 771 relations. )
##
## Annotating nodes ............... ( 3453 genes annotated to the GO terms. )
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 660 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 1073 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 278 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 993 nontrivial nodes
##       parameters:
##           test statistic:  KS tests
##           score order:  increasing
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 1521 nontrivial nodes
##       parameters:
##           test statistic:  KS tests
##           score order:  increasing
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 363 nontrivial nodes
##       parameters:
##           test statistic:  KS tests
##           score order:  increasing
##
##           -- Elim Algorithm --
##
##       the algorithm is scoring 993 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##           cutOff:  0.01
##           score order:  increasing
##
##   Level 15:  3 nodes to be scored    (0 eliminated genes)
##
##   Level 14:  3 nodes to be scored    (0 eliminated genes)
##
##   Level 13:  6 nodes to be scored    (0 eliminated genes)
##
##   Level 12:  7 nodes to be scored    (0 eliminated genes)
##
##   Level 11:  13 nodes to be scored   (0 eliminated genes)
##
##   Level 10:  21 nodes to be scored   (0 eliminated genes)
##
##   Level 9:   45 nodes to be scored   (0 eliminated genes)
##
##   Level 8:   91 nodes to be scored   (3 eliminated genes)
##
##   Level 7:   184 nodes to be scored  (3 eliminated genes)
##
##   Level 6:   273 nodes to be scored  (172 eliminated genes)
##
##   Level 5:   172 nodes to be scored  (199 eliminated genes)
##
##   Level 4:   113 nodes to be scored  (519 eliminated genes)
##
##   Level 3:   47 nodes to be scored   (582 eliminated genes)
##
##   Level 2:   14 nodes to be scored   (582 eliminated genes)
##
##   Level 1:   1 nodes to be scored    (582 eliminated genes)
##
##           -- Elim Algorithm --
##
##       the algorithm is scoring 1521 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##           cutOff:  0.01
##           score order:  increasing
##
##   Level 16:  2 nodes to be scored    (0 eliminated genes)
##
##   Level 15:  3 nodes to be scored    (0 eliminated genes)
##
##   Level 14:  7 nodes to be scored    (0 eliminated genes)
##
##   Level 13:  18 nodes to be scored   (0 eliminated genes)
##
##   Level 12:  68 nodes to be scored   (3 eliminated genes)
##
##   Level 11:  107 nodes to be scored  (3 eliminated genes)
##
##   Level 10:  182 nodes to be scored  (3 eliminated genes)
##
##   Level 9:   214 nodes to be scored  (58 eliminated genes)
##
##   Level 8:   205 nodes to be scored  (58 eliminated genes)
##
##   Level 7:   199 nodes to be scored  (66 eliminated genes)
##
##   Level 6:   195 nodes to be scored  (66 eliminated genes)
##
##   Level 5:   172 nodes to be scored  (66 eliminated genes)
##
##   Level 4:   96 nodes to be scored   (187 eliminated genes)
##
##   Level 3:   37 nodes to be scored   (187 eliminated genes)
##
##   Level 2:   15 nodes to be scored   (187 eliminated genes)
##
##   Level 1:   1 nodes to be scored    (187 eliminated genes)
##
##           -- Elim Algorithm --
##
##       the algorithm is scoring 363 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##           cutOff:  0.01
##           score order:  increasing
##
##   Level 15:  2 nodes to be scored    (0 eliminated genes)
##
##   Level 14:  8 nodes to be scored    (0 eliminated genes)
##
##   Level 13:  21 nodes to be scored   (3 eliminated genes)
##
##   Level 12:  22 nodes to be scored   (3 eliminated genes)
##
##   Level 11:  42 nodes to be scored   (3 eliminated genes)
##
##   Level 10:  58 nodes to be scored   (3 eliminated genes)
##
##   Level 9:   35 nodes to be scored   (3 eliminated genes)
##
##   Level 8:   53 nodes to be scored   (3 eliminated genes)
##
##   Level 7:   29 nodes to be scored   (365 eliminated genes)
##
##   Level 6:   32 nodes to be scored   (365 eliminated genes)
##
##   Level 5:   20 nodes to be scored   (365 eliminated genes)
##
##   Level 4:   24 nodes to be scored   (365 eliminated genes)
##
##   Level 3:   10 nodes to be scored   (365 eliminated genes)
##
##   Level 2:   6 nodes to be scored    (365 eliminated genes)
##
##   Level 1:   1 nodes to be scored    (365 eliminated genes)
##
##           -- Weight Algorithm --
##
##       The algorithm is scoring 660 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test : ratio
##
##   Level 15:  2 nodes to be scored.
##
##   Level 14:  2 nodes to be scored.
##
##   Level 13:  4 nodes to be scored.
##
##   Level 12:  4 nodes to be scored.
##
##   Level 11:  9 nodes to be scored.
##
##   Level 10:  13 nodes to be scored.
##
##   Level 9:   31 nodes to be scored.
##
##   Level 8:   53 nodes to be scored.
##
##   Level 7:   107 nodes to be scored.
##
##   Level 6:   166 nodes to be scored.
##
##   Level 5:   122 nodes to be scored.
##
##   Level 4:   92 nodes to be scored.
##
##   Level 3:   41 nodes to be scored.
##
##   Level 2:   13 nodes to be scored.
##
##   Level 1:   1 nodes to be scored.
##
##           -- Weight Algorithm --
##
##       The algorithm is scoring 1073 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test : ratio
##
##   Level 16:  2 nodes to be scored.
##
##   Level 15:  2 nodes to be scored.
##
##   Level 14:  6 nodes to be scored.
##
##   Level 13:  11 nodes to be scored.
##
##   Level 12:  43 nodes to be scored.
##
##   Level 11:  75 nodes to be scored.
##
##   Level 10:  119 nodes to be scored.
##
##   Level 9:   143 nodes to be scored.
##
##   Level 8:   142 nodes to be scored.
##
##   Level 7:   131 nodes to be scored.
##
##   Level 6:   146 nodes to be scored.
##
##   Level 5:   132 nodes to be scored.
##
##   Level 4:   77 nodes to be scored.
##
##   Level 3:   29 nodes to be scored.
##
##   Level 2:   14 nodes to be scored.
##
##   Level 1:   1 nodes to be scored.
##
##           -- Weight Algorithm --
##
##       The algorithm is scoring 278 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test : ratio
##
##   Level 14:  5 nodes to be scored.
##
##   Level 13:  14 nodes to be scored.
##
##   Level 12:  16 nodes to be scored.
##
##   Level 11:  33 nodes to be scored.
##
##   Level 10:  48 nodes to be scored.
##
##   Level 9:   30 nodes to be scored.
##
##   Level 8:   36 nodes to be scored.
##
##   Level 7:   19 nodes to be scored.
##
##   Level 6:   21 nodes to be scored.
##
##   Level 5:   18 nodes to be scored.
##
##   Level 4:   21 nodes to be scored.
##
##   Level 3:   10 nodes to be scored.
##
##   Level 2:   6 nodes to be scored.
##
##   Level 1:   1 nodes to be scored.

topgo_pval = topgo_pval_plot(epi_cl14clbr_high_top)
topgo_pval$BP

top_trees = topgo_trees(epi_cl14clbr_high_top)

top_trees$bp_fisher_tree

Compare Epimastigotes between CL14 and CLBr, low expression

## Extract the genes with low expression in CL14 compared to Brener and very low pvalues
epi_cl14clbr_low = subset(epi_cl14clbr_table, logFC <= epi_cl14clbr_summary[2] & adj.P.Val <= pval_thresh)
epi_cl14clbr_low_go = simple_goseq(de_genes=epi_cl14clbr_low, lengths=gene_lengths, goids=go_ids)

## [1] "simple_goseq() makes some pretty hard assumptions about the data it is fed:"
## [1] "It requires 2 tables, one of GOids which must have columns (gene)ID and GO(category)"
## [1] "The other table is of gene lengths with columns (gene)ID and (gene)width."
## [1] "Other columns are fine, but ignored."

## Warning in pcls(G): initial point very close to some inequality
## constraints

## Using manually entered categories.
## Calculating the p-values...

## [1] "Calculating q-values"
## [1] "Filling godata table with term information, this takes a while."
## [1] "Making pvalue plots for the ontologies."

epi_cl14clbr_low_go$pvalue_histogram

epi_cl14clbr_low_go$bpp_plot

##epi_cl14clbr_low_cl = simple_clusterprofiler(epi_cl14clbr_low, gff="reference/gff/clbrener_8.1_complete_genes.gff.gz")
##epi_cl14clbr_low_cl$bp_all_barplot
epi_cl14clbr_low_top = simple_topgo(epi_cl14clbr_low, goids=goid_file, pvals=epi_cl14clbr_pvals)

##
## Building most specific GOs ..... ( 620 GO terms found. )
##
## Build GO DAG topology .......... ( 993 GO terms and 1244 relations. )
##
## Annotating nodes ............... ( 4223 genes annotated to the GO terms. )
##
## Building most specific GOs ..... ( 566 GO terms found. )
##
## Build GO DAG topology .......... ( 1521 GO terms and 3182 relations. )
##
## Annotating nodes ............... ( 3841 genes annotated to the GO terms. )
##
## Building most specific GOs ..... ( 203 GO terms found. )
##
## Build GO DAG topology .......... ( 363 GO terms and 771 relations. )
##
## Annotating nodes ............... ( 3453 genes annotated to the GO terms. )
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 660 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 1073 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 278 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 993 nontrivial nodes
##       parameters:
##           test statistic:  KS tests
##           score order:  increasing
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 1521 nontrivial nodes
##       parameters:
##           test statistic:  KS tests
##           score order:  increasing
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 363 nontrivial nodes
##       parameters:
##           test statistic:  KS tests
##           score order:  increasing
##
##           -- Elim Algorithm --
##
##       the algorithm is scoring 993 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##           cutOff:  0.01
##           score order:  increasing
##
##   Level 15:  3 nodes to be scored    (0 eliminated genes)
##
##   Level 14:  3 nodes to be scored    (0 eliminated genes)
##
##   Level 13:  6 nodes to be scored    (0 eliminated genes)
##
##   Level 12:  7 nodes to be scored    (0 eliminated genes)
##
##   Level 11:  13 nodes to be scored   (0 eliminated genes)
##
##   Level 10:  21 nodes to be scored   (0 eliminated genes)
##
##   Level 9:   45 nodes to be scored   (0 eliminated genes)
##
##   Level 8:   91 nodes to be scored   (3 eliminated genes)
##
##   Level 7:   184 nodes to be scored  (3 eliminated genes)
##
##   Level 6:   273 nodes to be scored  (172 eliminated genes)
##
##   Level 5:   172 nodes to be scored  (199 eliminated genes)
##
##   Level 4:   113 nodes to be scored  (519 eliminated genes)
##
##   Level 3:   47 nodes to be scored   (582 eliminated genes)
##
##   Level 2:   14 nodes to be scored   (582 eliminated genes)
##
##   Level 1:   1 nodes to be scored    (582 eliminated genes)
##
##           -- Elim Algorithm --
##
##       the algorithm is scoring 1521 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##           cutOff:  0.01
##           score order:  increasing
##
##   Level 16:  2 nodes to be scored    (0 eliminated genes)
##
##   Level 15:  3 nodes to be scored    (0 eliminated genes)
##
##   Level 14:  7 nodes to be scored    (0 eliminated genes)
##
##   Level 13:  18 nodes to be scored   (0 eliminated genes)
##
##   Level 12:  68 nodes to be scored   (3 eliminated genes)
##
##   Level 11:  107 nodes to be scored  (3 eliminated genes)
##
##   Level 10:  182 nodes to be scored  (3 eliminated genes)
##
##   Level 9:   214 nodes to be scored  (58 eliminated genes)
##
##   Level 8:   205 nodes to be scored  (58 eliminated genes)
##
##   Level 7:   199 nodes to be scored  (66 eliminated genes)
##
##   Level 6:   195 nodes to be scored  (66 eliminated genes)
##
##   Level 5:   172 nodes to be scored  (66 eliminated genes)
##
##   Level 4:   96 nodes to be scored   (187 eliminated genes)
##
##   Level 3:   37 nodes to be scored   (187 eliminated genes)
##
##   Level 2:   15 nodes to be scored   (187 eliminated genes)
##
##   Level 1:   1 nodes to be scored    (187 eliminated genes)
##
##           -- Elim Algorithm --
##
##       the algorithm is scoring 363 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##           cutOff:  0.01
##           score order:  increasing
##
##   Level 15:  2 nodes to be scored    (0 eliminated genes)
##
##   Level 14:  8 nodes to be scored    (0 eliminated genes)
##
##   Level 13:  21 nodes to be scored   (3 eliminated genes)
##
##   Level 12:  22 nodes to be scored   (3 eliminated genes)
##
##   Level 11:  42 nodes to be scored   (3 eliminated genes)
##
##   Level 10:  58 nodes to be scored   (3 eliminated genes)
##
##   Level 9:   35 nodes to be scored   (3 eliminated genes)
##
##   Level 8:   53 nodes to be scored   (3 eliminated genes)
##
##   Level 7:   29 nodes to be scored   (365 eliminated genes)
##
##   Level 6:   32 nodes to be scored   (365 eliminated genes)
##
##   Level 5:   20 nodes to be scored   (365 eliminated genes)
##
##   Level 4:   24 nodes to be scored   (365 eliminated genes)
##
##   Level 3:   10 nodes to be scored   (365 eliminated genes)
##
##   Level 2:   6 nodes to be scored    (365 eliminated genes)
##
##   Level 1:   1 nodes to be scored    (365 eliminated genes)
##
##           -- Weight Algorithm --
##
##       The algorithm is scoring 660 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test : ratio
##
##   Level 15:  2 nodes to be scored.
##
##   Level 14:  2 nodes to be scored.
##
##   Level 13:  4 nodes to be scored.
##
##   Level 12:  4 nodes to be scored.
##
##   Level 11:  9 nodes to be scored.
##
##   Level 10:  13 nodes to be scored.
##
##   Level 9:   31 nodes to be scored.
##
##   Level 8:   53 nodes to be scored.
##
##   Level 7:   107 nodes to be scored.
##
##   Level 6:   166 nodes to be scored.
##
##   Level 5:   122 nodes to be scored.
##
##   Level 4:   92 nodes to be scored.
##
##   Level 3:   41 nodes to be scored.
##
##   Level 2:   13 nodes to be scored.
##
##   Level 1:   1 nodes to be scored.
##
##           -- Weight Algorithm --
##
##       The algorithm is scoring 1073 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test : ratio
##
##   Level 16:  2 nodes to be scored.
##
##   Level 15:  2 nodes to be scored.
##
##   Level 14:  6 nodes to be scored.
##
##   Level 13:  11 nodes to be scored.
##
##   Level 12:  43 nodes to be scored.
##
##   Level 11:  75 nodes to be scored.
##
##   Level 10:  119 nodes to be scored.
##
##   Level 9:   143 nodes to be scored.
##
##   Level 8:   142 nodes to be scored.
##
##   Level 7:   131 nodes to be scored.
##
##   Level 6:   146 nodes to be scored.
##
##   Level 5:   132 nodes to be scored.
##
##   Level 4:   77 nodes to be scored.
##
##   Level 3:   29 nodes to be scored.
##
##   Level 2:   14 nodes to be scored.
##
##   Level 1:   1 nodes to be scored.
##
##           -- Weight Algorithm --
##
##       The algorithm is scoring 278 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test : ratio
##
##   Level 14:  5 nodes to be scored.
##
##   Level 13:  14 nodes to be scored.
##
##   Level 12:  16 nodes to be scored.
##
##   Level 11:  33 nodes to be scored.
##
##   Level 10:  48 nodes to be scored.
##
##   Level 9:   30 nodes to be scored.
##
##   Level 8:   36 nodes to be scored.
##
##   Level 7:   19 nodes to be scored.
##
##   Level 6:   21 nodes to be scored.
##
##   Level 5:   18 nodes to be scored.
##
##   Level 4:   21 nodes to be scored.
##
##   Level 3:   10 nodes to be scored.
##
##   Level 2:   6 nodes to be scored.
##
##   Level 1:   1 nodes to be scored.

topgo_pval = topgo_pval_plot(epi_cl14clbr_high_top)
topgo_pval$BP

top_trees = topgo_trees(epi_cl14clbr_low_top)

top_trees$bp_fisher_tree

Compare trypo CL14/CLBr

## Changes in epimastigotes from CLBrener to CL14
tryp_cl14clbr_table = all_tables$tryp_cl14clbr
tryp_cl14clbr_summary = summary(tryp_cl14clbr_table$logFC)
tryp_cl14clbr_pvals = tryp_cl14clbr_table$adj.P.Val
names(tryp_cl14clbr_pvals) = rownames(tryp_cl14clbr_table)
## Extract the genes with high expression in CL14 compared to Brener and very low pvalues
tryp_cl14clbr_high = subset(tryp_cl14clbr_table, logFC >= tryp_cl14clbr_summary[5] & adj.P.Val <= pval_thresh)
## Do a goseq comparison of the high genes
tryp_cl14clbr_high_go = simple_goseq(tryp_cl14clbr_high, lengths=gene_lengths, goids=go_ids)

## [1] "simple_goseq() makes some pretty hard assumptions about the data it is fed:"
## [1] "It requires 2 tables, one of GOids which must have columns (gene)ID and GO(category)"
## [1] "The other table is of gene lengths with columns (gene)ID and (gene)width."
## [1] "Other columns are fine, but ignored."

## Using manually entered categories.
## Calculating the p-values...

## [1] "Calculating q-values"
## [1] "Filling godata table with term information, this takes a while."
## [1] "Making pvalue plots for the ontologies."

tryp_cl14clbr_high_go$pvalue_histogram

tryp_cl14clbr_high_go$bpp_plot

## Oh crap, I cannot make trees until I generate a id2go R mapping, I will need to check my previous script for how to do that (I think I do it automatically in simple_topgo()...
##tryp_cl14clbr_high_trees = goseq_trees(degenes=tryp_cl14clbr_high, tryp_cl14clbr_high_go)
## I had to hack the geneTable() function in order to make it not take infinite memory because it was trying to do some stupid merge of every gff column against every other column...
## Once the geneTable.rda file exists and included real information, this seems to work fine, though.
tryp_cl14clbr_high_cl = simple_clusterprofiler(tryp_cl14clbr_high)

## [1] "Using GO mapping data located in GO2EG.rda"
## [1] "Testing gseGO"
## NULL
## [1] "Starting MF(molecular function) analysis"
## The minimum observed adjusted pvalue is: 0.125401
## The minimum observed adjusted pvalue is: 1.000000
## [1] "Starting BP(biological process) analysis"
## The minimum observed adjusted pvalue is: 0.017197
## The minimum observed adjusted pvalue is: 0.658689
## [1] "Starting CC(cellular component) analysis"
## The minimum observed adjusted pvalue is: 0.499209
## The minimum observed adjusted pvalue is: 1.000000
## [1] "Attempting to include the cnetplots from clusterProfiler."
## [1] "They fail often, if this is causing errors, set:"
## [1] "include_cnetplots to FALSE"
## [1] "cnetplot just failed for the MF ontology.  Do not be concerned with the previous error."
## [1] "cnetplot just failed for the BP ontology.  Do not be concerned with the previous error."
## [1] "cnetplot just failed for the CC ontology.  Do not be concerned with the previous error."

tryp_cl14clbr_high_cl$bp_all_barplot

## The first time this is run in a fresh directory, it needs to be run like this:
## simple_topgo(tryp_cl14clbr_high, goids_df=go_ids)
## setting goids_df will tell it to generate a table of genes->GO<-genes in the format expected.
tryp_cl14clbr_high_top = simple_topgo(tryp_cl14clbr_high, goids=goid_file, pvals=tryp_cl14clbr_pvals)

##
## Building most specific GOs ..... ( 620 GO terms found. )
##
## Build GO DAG topology .......... ( 993 GO terms and 1244 relations. )
##
## Annotating nodes ............... ( 4223 genes annotated to the GO terms. )
##
## Building most specific GOs ..... ( 566 GO terms found. )
##
## Build GO DAG topology .......... ( 1521 GO terms and 3182 relations. )
##
## Annotating nodes ............... ( 3841 genes annotated to the GO terms. )
##
## Building most specific GOs ..... ( 203 GO terms found. )
##
## Build GO DAG topology .......... ( 363 GO terms and 771 relations. )
##
## Annotating nodes ............... ( 3453 genes annotated to the GO terms. )
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 676 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 1053 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 261 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 993 nontrivial nodes
##       parameters:
##           test statistic:  KS tests
##           score order:  increasing
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 1521 nontrivial nodes
##       parameters:
##           test statistic:  KS tests
##           score order:  increasing
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 363 nontrivial nodes
##       parameters:
##           test statistic:  KS tests
##           score order:  increasing
##
##           -- Elim Algorithm --
##
##       the algorithm is scoring 993 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##           cutOff:  0.01
##           score order:  increasing
##
##   Level 15:  3 nodes to be scored    (0 eliminated genes)
##
##   Level 14:  3 nodes to be scored    (0 eliminated genes)
##
##   Level 13:  6 nodes to be scored    (0 eliminated genes)
##
##   Level 12:  7 nodes to be scored    (0 eliminated genes)
##
##   Level 11:  13 nodes to be scored   (0 eliminated genes)
##
##   Level 10:  21 nodes to be scored   (0 eliminated genes)
##
##   Level 9:   45 nodes to be scored   (0 eliminated genes)
##
##   Level 8:   91 nodes to be scored   (3 eliminated genes)
##
##   Level 7:   184 nodes to be scored  (12 eliminated genes)
##
##   Level 6:   273 nodes to be scored  (120 eliminated genes)
##
##   Level 5:   172 nodes to be scored  (120 eliminated genes)
##
##   Level 4:   113 nodes to be scored  (154 eliminated genes)
##
##   Level 3:   47 nodes to be scored   (172 eliminated genes)
##
##   Level 2:   14 nodes to be scored   (172 eliminated genes)
##
##   Level 1:   1 nodes to be scored    (172 eliminated genes)
##
##           -- Elim Algorithm --
##
##       the algorithm is scoring 1521 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##           cutOff:  0.01
##           score order:  increasing
##
##   Level 16:  2 nodes to be scored    (0 eliminated genes)
##
##   Level 15:  3 nodes to be scored    (0 eliminated genes)
##
##   Level 14:  7 nodes to be scored    (0 eliminated genes)
##
##   Level 13:  18 nodes to be scored   (0 eliminated genes)
##
##   Level 12:  68 nodes to be scored   (0 eliminated genes)
##
##   Level 11:  107 nodes to be scored  (9 eliminated genes)
##
##   Level 10:  182 nodes to be scored  (17 eliminated genes)
##
##   Level 9:   214 nodes to be scored  (52 eliminated genes)
##
##   Level 8:   205 nodes to be scored  (54 eliminated genes)
##
##   Level 7:   199 nodes to be scored  (189 eliminated genes)
##
##   Level 6:   195 nodes to be scored  (306 eliminated genes)
##
##   Level 5:   172 nodes to be scored  (322 eliminated genes)
##
##   Level 4:   96 nodes to be scored   (322 eliminated genes)
##
##   Level 3:   37 nodes to be scored   (322 eliminated genes)
##
##   Level 2:   15 nodes to be scored   (322 eliminated genes)
##
##   Level 1:   1 nodes to be scored    (322 eliminated genes)
##
##           -- Elim Algorithm --
##
##       the algorithm is scoring 363 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##           cutOff:  0.01
##           score order:  increasing
##
##   Level 15:  2 nodes to be scored    (0 eliminated genes)
##
##   Level 14:  8 nodes to be scored    (0 eliminated genes)
##
##   Level 13:  21 nodes to be scored   (0 eliminated genes)
##
##   Level 12:  22 nodes to be scored   (0 eliminated genes)
##
##   Level 11:  42 nodes to be scored   (15 eliminated genes)
##
##   Level 10:  58 nodes to be scored   (15 eliminated genes)
##
##   Level 9:   35 nodes to be scored   (50 eliminated genes)
##
##   Level 8:   53 nodes to be scored   (50 eliminated genes)
##
##   Level 7:   29 nodes to be scored   (74 eliminated genes)
##
##   Level 6:   32 nodes to be scored   (74 eliminated genes)
##
##   Level 5:   20 nodes to be scored   (74 eliminated genes)
##
##   Level 4:   24 nodes to be scored   (74 eliminated genes)
##
##   Level 3:   10 nodes to be scored   (74 eliminated genes)
##
##   Level 2:   6 nodes to be scored    (74 eliminated genes)
##
##   Level 1:   1 nodes to be scored    (74 eliminated genes)
##
##           -- Weight Algorithm --
##
##       The algorithm is scoring 676 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test : ratio
##
##   Level 15:  2 nodes to be scored.
##
##   Level 14:  2 nodes to be scored.
##
##   Level 13:  4 nodes to be scored.
##
##   Level 12:  5 nodes to be scored.
##
##   Level 11:  9 nodes to be scored.
##
##   Level 10:  17 nodes to be scored.
##
##   Level 9:   27 nodes to be scored.
##
##   Level 8:   57 nodes to be scored.
##
##   Level 7:   113 nodes to be scored.
##
##   Level 6:   166 nodes to be scored.
##
##   Level 5:   127 nodes to be scored.
##
##   Level 4:   95 nodes to be scored.
##
##   Level 3:   39 nodes to be scored.
##
##   Level 2:   12 nodes to be scored.
##
##   Level 1:   1 nodes to be scored.
##
##           -- Weight Algorithm --
##
##       The algorithm is scoring 1053 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test : ratio
##
##   Level 16:  1 nodes to be scored.
##
##   Level 15:  1 nodes to be scored.
##
##   Level 14:  4 nodes to be scored.
##
##   Level 13:  9 nodes to be scored.
##
##   Level 12:  39 nodes to be scored.
##
##   Level 11:  74 nodes to be scored.
##
##   Level 10:  116 nodes to be scored.
##
##   Level 9:   133 nodes to be scored.
##
##   Level 8:   140 nodes to be scored.
##
##   Level 7:   132 nodes to be scored.
##
##   Level 6:   145 nodes to be scored.
##
##   Level 5:   134 nodes to be scored.
##
##   Level 4:   79 nodes to be scored.
##
##   Level 3:   31 nodes to be scored.
##
##   Level 2:   14 nodes to be scored.
##
##   Level 1:   1 nodes to be scored.
##
##           -- Weight Algorithm --
##
##       The algorithm is scoring 261 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test : ratio
##
##   Level 14:  4 nodes to be scored.
##
##   Level 13:  11 nodes to be scored.
##
##   Level 12:  17 nodes to be scored.
##
##   Level 11:  32 nodes to be scored.
##
##   Level 10:  43 nodes to be scored.
##
##   Level 9:   27 nodes to be scored.
##
##   Level 8:   35 nodes to be scored.
##
##   Level 7:   17 nodes to be scored.
##
##   Level 6:   21 nodes to be scored.
##
##   Level 5:   17 nodes to be scored.
##
##   Level 4:   20 nodes to be scored.
##
##   Level 3:   10 nodes to be scored.
##
##   Level 2:   6 nodes to be scored.
##
##   Level 1:   1 nodes to be scored.

topgo_pval = topgo_pval_plot(tryp_cl14clbr_high_top)
topgo_pval$BP

top_trees = topgo_trees(tryp_cl14clbr_high_top)

top_trees$bp_fisher_tree

## Warning in replayPlot(x): zero-length arrow is of indeterminate angle and
## so skipped

Compare cl14/clbr tryp other side

## Changes in epimastigotes from CLBrener to CL14
tryp_cl14clbr_table = all_tables$tryp_cl14clbr
tryp_cl14clbr_summary = summary(tryp_cl14clbr_table$logFC)
tryp_cl14clbr_pvals = tryp_cl14clbr_table$adj.P.Val
names(tryp_cl14clbr_pvals) = rownames(tryp_cl14clbr_table)
## Extract the genes with low expression in CL14 compared to Brener and very low pvalues
tryp_cl14clbr_low = subset(tryp_cl14clbr_table, logFC <= tryp_cl14clbr_summary[2] & adj.P.Val <= pval_thresh)
## Do a goseq comparison of the low genes
tryp_cl14clbr_low_go = simple_goseq(tryp_cl14clbr_low, lengths=gene_lengths, goids=go_ids)

## [1] "simple_goseq() makes some pretty hard assumptions about the data it is fed:"
## [1] "It requires 2 tables, one of GOids which must have columns (gene)ID and GO(category)"
## [1] "The other table is of gene lengths with columns (gene)ID and (gene)width."
## [1] "Other columns are fine, but ignored."

## Warning in pcls(G): initial point very close to some inequality
## constraints

## Using manually entered categories.
## Calculating the p-values...

## [1] "Calculating q-values"
## [1] "Filling godata table with term information, this takes a while."
## [1] "Making pvalue plots for the ontologies."

tryp_cl14clbr_low_go$pvalue_histogram

tryp_cl14clbr_low_go$bpp_plot

## Oh crap, I cannot make trees until I generate a id2go R mapping, I will need to check my previous script for how to do that (I think I do it automatically in simple_topgo()...
##tryp_cl14clbr_low_trees = goseq_trees(degenes=tryp_cl14clbr_low, tryp_cl14clbr_low_go)
## I had to hack the geneTable() function in order to make it not take infinite memory because it was trying to do some stupid merge of every gff column against every other column...
## Once the geneTable.rda file exists and included real information, this seems to work fine, though.
tryp_cl14clbr_low_cl = simple_clusterprofiler(tryp_cl14clbr_low)

## [1] "Using GO mapping data located in GO2EG.rda"
## [1] "Testing gseGO"
## NULL
## [1] "Starting MF(molecular function) analysis"
## The minimum observed adjusted pvalue is: 0.011047
## The minimum observed adjusted pvalue is: 0.281686
## [1] "Starting BP(biological process) analysis"
## The minimum observed adjusted pvalue is: 0.036816
## The minimum observed adjusted pvalue is: 1.000000
## [1] "Starting CC(cellular component) analysis"
## The minimum observed adjusted pvalue is: 0.391918
## The minimum observed adjusted pvalue is: 1.000000
## [1] "Attempting to include the cnetplots from clusterProfiler."
## [1] "They fail often, if this is causing errors, set:"
## [1] "include_cnetplots to FALSE"
## [1] "cnetplot just failed for the MF ontology.  Do not be concerned with the previous error."
## [1] "cnetplot just failed for the BP ontology.  Do not be concerned with the previous error."
## [1] "cnetplot just failed for the CC ontology.  Do not be concerned with the previous error."

tryp_cl14clbr_low_cl$bp_all_barplot

## The first time this is run in a fresh directory, it needs to be run like this:
## simple_topgo(tryp_cl14clbr_low, goids_df=go_ids)
## setting goids_df will tell it to generate a table of genes->GO<-genes in the format expected.
tryp_cl14clbr_low_top = simple_topgo(tryp_cl14clbr_low, goids=goid_file, pvals=tryp_cl14clbr_pvals)

##
## Building most specific GOs ..... ( 620 GO terms found. )
##
## Build GO DAG topology .......... ( 993 GO terms and 1244 relations. )
##
## Annotating nodes ............... ( 4223 genes annotated to the GO terms. )
##
## Building most specific GOs ..... ( 566 GO terms found. )
##
## Build GO DAG topology .......... ( 1521 GO terms and 3182 relations. )
##
## Annotating nodes ............... ( 3841 genes annotated to the GO terms. )
##
## Building most specific GOs ..... ( 203 GO terms found. )
##
## Build GO DAG topology .......... ( 363 GO terms and 771 relations. )
##
## Annotating nodes ............... ( 3453 genes annotated to the GO terms. )
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 676 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 1053 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 261 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 993 nontrivial nodes
##       parameters:
##           test statistic:  KS tests
##           score order:  increasing
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 1521 nontrivial nodes
##       parameters:
##           test statistic:  KS tests
##           score order:  increasing
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 363 nontrivial nodes
##       parameters:
##           test statistic:  KS tests
##           score order:  increasing
##
##           -- Elim Algorithm --
##
##       the algorithm is scoring 993 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##           cutOff:  0.01
##           score order:  increasing
##
##   Level 15:  3 nodes to be scored    (0 eliminated genes)
##
##   Level 14:  3 nodes to be scored    (0 eliminated genes)
##
##   Level 13:  6 nodes to be scored    (0 eliminated genes)
##
##   Level 12:  7 nodes to be scored    (0 eliminated genes)
##
##   Level 11:  13 nodes to be scored   (0 eliminated genes)
##
##   Level 10:  21 nodes to be scored   (0 eliminated genes)
##
##   Level 9:   45 nodes to be scored   (0 eliminated genes)
##
##   Level 8:   91 nodes to be scored   (3 eliminated genes)
##
##   Level 7:   184 nodes to be scored  (12 eliminated genes)
##
##   Level 6:   273 nodes to be scored  (120 eliminated genes)
##
##   Level 5:   172 nodes to be scored  (120 eliminated genes)
##
##   Level 4:   113 nodes to be scored  (154 eliminated genes)
##
##   Level 3:   47 nodes to be scored   (172 eliminated genes)
##
##   Level 2:   14 nodes to be scored   (172 eliminated genes)
##
##   Level 1:   1 nodes to be scored    (172 eliminated genes)
##
##           -- Elim Algorithm --
##
##       the algorithm is scoring 1521 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##           cutOff:  0.01
##           score order:  increasing
##
##   Level 16:  2 nodes to be scored    (0 eliminated genes)
##
##   Level 15:  3 nodes to be scored    (0 eliminated genes)
##
##   Level 14:  7 nodes to be scored    (0 eliminated genes)
##
##   Level 13:  18 nodes to be scored   (0 eliminated genes)
##
##   Level 12:  68 nodes to be scored   (0 eliminated genes)
##
##   Level 11:  107 nodes to be scored  (9 eliminated genes)
##
##   Level 10:  182 nodes to be scored  (17 eliminated genes)
##
##   Level 9:   214 nodes to be scored  (52 eliminated genes)
##
##   Level 8:   205 nodes to be scored  (54 eliminated genes)
##
##   Level 7:   199 nodes to be scored  (189 eliminated genes)
##
##   Level 6:   195 nodes to be scored  (306 eliminated genes)
##
##   Level 5:   172 nodes to be scored  (322 eliminated genes)
##
##   Level 4:   96 nodes to be scored   (322 eliminated genes)
##
##   Level 3:   37 nodes to be scored   (322 eliminated genes)
##
##   Level 2:   15 nodes to be scored   (322 eliminated genes)
##
##   Level 1:   1 nodes to be scored    (322 eliminated genes)
##
##           -- Elim Algorithm --
##
##       the algorithm is scoring 363 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##           cutOff:  0.01
##           score order:  increasing
##
##   Level 15:  2 nodes to be scored    (0 eliminated genes)
##
##   Level 14:  8 nodes to be scored    (0 eliminated genes)
##
##   Level 13:  21 nodes to be scored   (0 eliminated genes)
##
##   Level 12:  22 nodes to be scored   (0 eliminated genes)
##
##   Level 11:  42 nodes to be scored   (15 eliminated genes)
##
##   Level 10:  58 nodes to be scored   (15 eliminated genes)
##
##   Level 9:   35 nodes to be scored   (50 eliminated genes)
##
##   Level 8:   53 nodes to be scored   (50 eliminated genes)
##
##   Level 7:   29 nodes to be scored   (74 eliminated genes)
##
##   Level 6:   32 nodes to be scored   (74 eliminated genes)
##
##   Level 5:   20 nodes to be scored   (74 eliminated genes)
##
##   Level 4:   24 nodes to be scored   (74 eliminated genes)
##
##   Level 3:   10 nodes to be scored   (74 eliminated genes)
##
##   Level 2:   6 nodes to be scored    (74 eliminated genes)
##
##   Level 1:   1 nodes to be scored    (74 eliminated genes)
##
##           -- Weight Algorithm --
##
##       The algorithm is scoring 676 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test : ratio
##
##   Level 15:  2 nodes to be scored.
##
##   Level 14:  2 nodes to be scored.
##
##   Level 13:  4 nodes to be scored.
##
##   Level 12:  5 nodes to be scored.
##
##   Level 11:  9 nodes to be scored.
##
##   Level 10:  17 nodes to be scored.
##
##   Level 9:   27 nodes to be scored.
##
##   Level 8:   57 nodes to be scored.
##
##   Level 7:   113 nodes to be scored.
##
##   Level 6:   166 nodes to be scored.
##
##   Level 5:   127 nodes to be scored.
##
##   Level 4:   95 nodes to be scored.
##
##   Level 3:   39 nodes to be scored.
##
##   Level 2:   12 nodes to be scored.
##
##   Level 1:   1 nodes to be scored.
##
##           -- Weight Algorithm --
##
##       The algorithm is scoring 1053 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test : ratio
##
##   Level 16:  1 nodes to be scored.
##
##   Level 15:  1 nodes to be scored.
##
##   Level 14:  4 nodes to be scored.
##
##   Level 13:  9 nodes to be scored.
##
##   Level 12:  39 nodes to be scored.
##
##   Level 11:  74 nodes to be scored.
##
##   Level 10:  116 nodes to be scored.
##
##   Level 9:   133 nodes to be scored.
##
##   Level 8:   140 nodes to be scored.
##
##   Level 7:   132 nodes to be scored.
##
##   Level 6:   145 nodes to be scored.
##
##   Level 5:   134 nodes to be scored.
##
##   Level 4:   79 nodes to be scored.
##
##   Level 3:   31 nodes to be scored.
##
##   Level 2:   14 nodes to be scored.
##
##   Level 1:   1 nodes to be scored.
##
##           -- Weight Algorithm --
##
##       The algorithm is scoring 261 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test : ratio
##
##   Level 14:  4 nodes to be scored.
##
##   Level 13:  11 nodes to be scored.
##
##   Level 12:  17 nodes to be scored.
##
##   Level 11:  32 nodes to be scored.
##
##   Level 10:  43 nodes to be scored.
##
##   Level 9:   27 nodes to be scored.
##
##   Level 8:   35 nodes to be scored.
##
##   Level 7:   17 nodes to be scored.
##
##   Level 6:   21 nodes to be scored.
##
##   Level 5:   17 nodes to be scored.
##
##   Level 4:   20 nodes to be scored.
##
##   Level 3:   10 nodes to be scored.
##
##   Level 2:   6 nodes to be scored.
##
##   Level 1:   1 nodes to be scored.

topgo_pval = topgo_pval_plot(tryp_cl14clbr_low_top)
topgo_pval$BP

top_trees = topgo_trees(tryp_cl14clbr_low_top)

top_trees$bp_fisher_tree

## Warning in replayPlot(x): zero-length arrow is of indeterminate angle and
## so skipped

Amastigote a96 cl14/clbr

## Changes in epimastigotes from CLBrener to CL14
a96_cl14clbr_table = all_tables$a96_cl14clbr
a96_cl14clbr_summary = summary(a96_cl14clbr_table$logFC)
a96_cl14clbr_pvals = a96_cl14clbr_table$adj.P.Val
names(a96_cl14clbr_pvals) = rownames(a96_cl14clbr_table)
## Extract the genes with low expression in CL14 compared to Brener and very low pvalues
a96_cl14clbr_low = subset(a96_cl14clbr_table, logFC >= a96_cl14clbr_summary[5] & adj.P.Val <= pval_thresh)
## Do a goseq comparison of the low genes
a96_cl14clbr_low_go = simple_goseq(a96_cl14clbr_low, lengths=gene_lengths, goids=go_ids)

## [1] "simple_goseq() makes some pretty hard assumptions about the data it is fed:"
## [1] "It requires 2 tables, one of GOids which must have columns (gene)ID and GO(category)"
## [1] "The other table is of gene lengths with columns (gene)ID and (gene)width."
## [1] "Other columns are fine, but ignored."

## Warning in pcls(G): initial point very close to some inequality
## constraints

## Using manually entered categories.
## Calculating the p-values...

## [1] "Calculating q-values"
## [1] "Filling godata table with term information, this takes a while."
## [1] "Making pvalue plots for the ontologies."

a96_cl14clbr_low_go$pvalue_histogram

a96_cl14clbr_low_go$bpp_plot

## Oh crap, I cannot make trees until I generate a id2go R mapping, I will need to check my previous script for how to do that (I think I do it automatically in simple_topgo()...
##a96_cl14clbr_low_trees = goseq_trees(degenes=a96_cl14clbr_low, a96_cl14clbr_low_go)
## I had to hack the geneTable() function in order to make it not take infinite memory because it was trying to do some stupid merge of every gff column against every other column...
## Once the geneTable.rda file exists and included real information, this seems to work fine, though.
a96_cl14clbr_low_cl = simple_clusterprofiler(a96_cl14clbr_low)

## [1] "Using GO mapping data located in GO2EG.rda"
## [1] "Testing gseGO"
## NULL
## [1] "Starting MF(molecular function) analysis"
## The minimum observed adjusted pvalue is: 0.062178
## The minimum observed adjusted pvalue is: 1.000000
## [1] "Starting BP(biological process) analysis"
## The minimum observed adjusted pvalue is: 0.062178
## The minimum observed adjusted pvalue is: 1.000000
## [1] "Starting CC(cellular component) analysis"
## The minimum observed adjusted pvalue is: 0.659855
## The minimum observed adjusted pvalue is: 1.000000
## [1] "Attempting to include the cnetplots from clusterProfiler."
## [1] "They fail often, if this is causing errors, set:"
## [1] "include_cnetplots to FALSE"
## [1] "cnetplot just failed for the MF ontology.  Do not be concerned with the previous error."
## [1] "cnetplot just failed for the BP ontology.  Do not be concerned with the previous error."
## [1] "cnetplot just failed for the CC ontology.  Do not be concerned with the previous error."

a96_cl14clbr_low_cl$bp_all_barplot

## The first time this is run in a fresh directory, it needs to be run like this:
## simple_topgo(a96_cl14clbr_low, goids_df=go_ids)
## setting goids_df will tell it to generate a table of genes->GO<-genes in the format expected.
a96_cl14clbr_low_top = simple_topgo(a96_cl14clbr_low, goids=goid_file, pvals=a96_cl14clbr_pvals)

##
## Building most specific GOs ..... ( 620 GO terms found. )
##
## Build GO DAG topology .......... ( 993 GO terms and 1244 relations. )
##
## Annotating nodes ............... ( 4223 genes annotated to the GO terms. )
##
## Building most specific GOs ..... ( 566 GO terms found. )
##
## Build GO DAG topology .......... ( 1521 GO terms and 3182 relations. )
##
## Annotating nodes ............... ( 3841 genes annotated to the GO terms. )
##
## Building most specific GOs ..... ( 203 GO terms found. )
##
## Build GO DAG topology .......... ( 363 GO terms and 771 relations. )
##
## Annotating nodes ............... ( 3453 genes annotated to the GO terms. )
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 717 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 1081 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 288 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 993 nontrivial nodes
##       parameters:
##           test statistic:  KS tests
##           score order:  increasing
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 1521 nontrivial nodes
##       parameters:
##           test statistic:  KS tests
##           score order:  increasing
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 363 nontrivial nodes
##       parameters:
##           test statistic:  KS tests
##           score order:  increasing
##
##           -- Elim Algorithm --
##
##       the algorithm is scoring 993 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##           cutOff:  0.01
##           score order:  increasing
##
##   Level 15:  3 nodes to be scored    (0 eliminated genes)
##
##   Level 14:  3 nodes to be scored    (0 eliminated genes)
##
##   Level 13:  6 nodes to be scored    (0 eliminated genes)
##
##   Level 12:  7 nodes to be scored    (0 eliminated genes)
##
##   Level 11:  13 nodes to be scored   (0 eliminated genes)
##
##   Level 10:  21 nodes to be scored   (0 eliminated genes)
##
##   Level 9:   45 nodes to be scored   (0 eliminated genes)
##
##   Level 8:   91 nodes to be scored   (0 eliminated genes)
##
##   Level 7:   184 nodes to be scored  (0 eliminated genes)
##
##   Level 6:   273 nodes to be scored  (144 eliminated genes)
##
##   Level 5:   172 nodes to be scored  (144 eliminated genes)
##
##   Level 4:   113 nodes to be scored  (199 eliminated genes)
##
##   Level 3:   47 nodes to be scored   (199 eliminated genes)
##
##   Level 2:   14 nodes to be scored   (274 eliminated genes)
##
##   Level 1:   1 nodes to be scored    (274 eliminated genes)
##
##           -- Elim Algorithm --
##
##       the algorithm is scoring 1521 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##           cutOff:  0.01
##           score order:  increasing
##
##   Level 16:  2 nodes to be scored    (0 eliminated genes)
##
##   Level 15:  3 nodes to be scored    (0 eliminated genes)
##
##   Level 14:  7 nodes to be scored    (0 eliminated genes)
##
##   Level 13:  18 nodes to be scored   (0 eliminated genes)
##
##   Level 12:  68 nodes to be scored   (0 eliminated genes)
##
##   Level 11:  107 nodes to be scored  (4 eliminated genes)
##
##   Level 10:  182 nodes to be scored  (42 eliminated genes)
##
##   Level 9:   214 nodes to be scored  (42 eliminated genes)
##
##   Level 8:   205 nodes to be scored  (42 eliminated genes)
##
##   Level 7:   199 nodes to be scored  (42 eliminated genes)
##
##   Level 6:   195 nodes to be scored  (42 eliminated genes)
##
##   Level 5:   172 nodes to be scored  (42 eliminated genes)
##
##   Level 4:   96 nodes to be scored   (55 eliminated genes)
##
##   Level 3:   37 nodes to be scored   (409 eliminated genes)
##
##   Level 2:   15 nodes to be scored   (409 eliminated genes)
##
##   Level 1:   1 nodes to be scored    (409 eliminated genes)
##
##           -- Elim Algorithm --
##
##       the algorithm is scoring 363 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##           cutOff:  0.01
##           score order:  increasing
##
##   Level 15:  2 nodes to be scored    (0 eliminated genes)
##
##   Level 14:  8 nodes to be scored    (0 eliminated genes)
##
##   Level 13:  21 nodes to be scored   (0 eliminated genes)
##
##   Level 12:  22 nodes to be scored   (0 eliminated genes)
##
##   Level 11:  42 nodes to be scored   (0 eliminated genes)
##
##   Level 10:  58 nodes to be scored   (51 eliminated genes)
##
##   Level 9:   35 nodes to be scored   (51 eliminated genes)
##
##   Level 8:   53 nodes to be scored   (51 eliminated genes)
##
##   Level 7:   29 nodes to be scored   (366 eliminated genes)
##
##   Level 6:   32 nodes to be scored   (366 eliminated genes)
##
##   Level 5:   20 nodes to be scored   (366 eliminated genes)
##
##   Level 4:   24 nodes to be scored   (366 eliminated genes)
##
##   Level 3:   10 nodes to be scored   (366 eliminated genes)
##
##   Level 2:   6 nodes to be scored    (366 eliminated genes)
##
##   Level 1:   1 nodes to be scored    (1101 eliminated genes)
##
##           -- Weight Algorithm --
##
##       The algorithm is scoring 717 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test : ratio
##
##   Level 15:  3 nodes to be scored.
##
##   Level 14:  3 nodes to be scored.
##
##   Level 13:  5 nodes to be scored.
##
##   Level 12:  6 nodes to be scored.
##
##   Level 11:  8 nodes to be scored.
##
##   Level 10:  14 nodes to be scored.
##
##   Level 9:   29 nodes to be scored.
##
##   Level 8:   60 nodes to be scored.
##
##   Level 7:   118 nodes to be scored.
##
##   Level 6:   184 nodes to be scored.
##
##   Level 5:   134 nodes to be scored.
##
##   Level 4:   97 nodes to be scored.
##
##   Level 3:   42 nodes to be scored.
##
##   Level 2:   13 nodes to be scored.
##
##   Level 1:   1 nodes to be scored.
##
##           -- Weight Algorithm --
##
##       The algorithm is scoring 1081 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test : ratio
##
##   Level 16:  2 nodes to be scored.
##
##   Level 15:  2 nodes to be scored.
##
##   Level 14:  4 nodes to be scored.
##
##   Level 13:  10 nodes to be scored.
##
##   Level 12:  36 nodes to be scored.
##
##   Level 11:  65 nodes to be scored.
##
##   Level 10:  117 nodes to be scored.
##
##   Level 9:   144 nodes to be scored.
##
##   Level 8:   148 nodes to be scored.
##
##   Level 7:   143 nodes to be scored.
##
##   Level 6:   154 nodes to be scored.
##
##   Level 5:   132 nodes to be scored.
##
##   Level 4:   79 nodes to be scored.
##
##   Level 3:   31 nodes to be scored.
##
##   Level 2:   13 nodes to be scored.
##
##   Level 1:   1 nodes to be scored.
##
##           -- Weight Algorithm --
##
##       The algorithm is scoring 288 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test : ratio
##
##   Level 15:  1 nodes to be scored.
##
##   Level 14:  6 nodes to be scored.
##
##   Level 13:  17 nodes to be scored.
##
##   Level 12:  20 nodes to be scored.
##
##   Level 11:  34 nodes to be scored.
##
##   Level 10:  46 nodes to be scored.
##
##   Level 9:   29 nodes to be scored.
##
##   Level 8:   38 nodes to be scored.
##
##   Level 7:   20 nodes to be scored.
##
##   Level 6:   23 nodes to be scored.
##
##   Level 5:   17 nodes to be scored.
##
##   Level 4:   20 nodes to be scored.
##
##   Level 3:   10 nodes to be scored.
##
##   Level 2:   6 nodes to be scored.
##
##   Level 1:   1 nodes to be scored.

topgo_pval = topgo_pval_plot(a96_cl14clbr_low_top)
topgo_pval$BP

top_trees = topgo_trees(a96_cl14clbr_low_top)

top_trees$bp_fisher_tree

## Changes in epimastigotes from CLBrener to CL14
a96_cl14clbr_table = all_tables$a96_cl14clbr
a96_cl14clbr_summary = summary(a96_cl14clbr_table$logFC)
a96_cl14clbr_pvals = a96_cl14clbr_table$adj.P.Val
names(a96_cl14clbr_pvals) = rownames(a96_cl14clbr_table)
## Extract the genes with low expression in CL14 compared to Brener and very low pvalues
a96_cl14clbr_low = subset(a96_cl14clbr_table, logFC <= a96_cl14clbr_summary[2] & adj.P.Val <= pval_thresh)
## Do a goseq comparison of the low genes
a96_cl14clbr_low_go = simple_goseq(a96_cl14clbr_low, lengths=gene_lengths, goids=go_ids)

## [1] "simple_goseq() makes some pretty hard assumptions about the data it is fed:"
## [1] "It requires 2 tables, one of GOids which must have columns (gene)ID and GO(category)"
## [1] "The other table is of gene lengths with columns (gene)ID and (gene)width."
## [1] "Other columns are fine, but ignored."

## Warning in pcls(G): initial point very close to some inequality
## constraints

## Using manually entered categories.
## Calculating the p-values...

## [1] "Calculating q-values"
## [1] "Filling godata table with term information, this takes a while."
## [1] "Making pvalue plots for the ontologies."

a96_cl14clbr_low_go$pvalue_histogram

a96_cl14clbr_low_go$bpp_plot

## Error in grid.Call.graphics(L_raster, x$raster, x$x, x$y, x$width, x$height, : Empty raster

## Oh crap, I cannot make trees until I generate a id2go R mapping, I will need to check my previous script for how to do that (I think I do it automatically in simple_topgo()...
##a96_cl14clbr_low_trees = goseq_trees(degenes=a96_cl14clbr_low, a96_cl14clbr_low_go)
## I had to hack the geneTable() function in order to make it not take infinite memory because it was trying to do some stupid merge of every gff column against every other column...
## Once the geneTable.rda file exists and included real information, this seems to work fine, though.
a96_cl14clbr_low_cl = simple_clusterprofiler(a96_cl14clbr_low)

## [1] "Using GO mapping data located in GO2EG.rda"
## [1] "Testing gseGO"
## NULL
## [1] "Starting MF(molecular function) analysis"
## The minimum observed adjusted pvalue is: 0.724427
## The minimum observed adjusted pvalue is: 1.000000
## [1] "Starting BP(biological process) analysis"
## The minimum observed adjusted pvalue is: 0.837915
## The minimum observed adjusted pvalue is: 1.000000
## [1] "Starting CC(cellular component) analysis"
## The minimum observed adjusted pvalue is: 0.234275
## The minimum observed adjusted pvalue is: 1.000000
## [1] "Attempting to include the cnetplots from clusterProfiler."
## [1] "They fail often, if this is causing errors, set:"
## [1] "include_cnetplots to FALSE"
## [1] "cnetplot just failed for the MF ontology.  Do not be concerned with the previous error."
## [1] "cnetplot just failed for the BP ontology.  Do not be concerned with the previous error."
## [1] "cnetplot just failed for the CC ontology.  Do not be concerned with the previous error."

a96_cl14clbr_low_cl$bp_all_barplot
## The first time this is run in a fresh directory, it needs to be run like this:
## simple_topgo(a96_cl14clbr_low, goids_df=go_ids)
## setting goids_df will tell it to generate a table of genes->GO<-genes in the format expected.
a96_cl14clbr_low_top = simple_topgo(a96_cl14clbr_low, goids=goid_file, pvals=a96_cl14clbr_pvals)

##
## Building most specific GOs ..... ( 620 GO terms found. )
##
## Build GO DAG topology .......... ( 993 GO terms and 1244 relations. )
##
## Annotating nodes ............... ( 4223 genes annotated to the GO terms. )
##
## Building most specific GOs ..... ( 566 GO terms found. )
##
## Build GO DAG topology .......... ( 1521 GO terms and 3182 relations. )
##
## Annotating nodes ............... ( 3841 genes annotated to the GO terms. )
##
## Building most specific GOs ..... ( 203 GO terms found. )
##
## Build GO DAG topology .......... ( 363 GO terms and 771 relations. )
##
## Annotating nodes ............... ( 3453 genes annotated to the GO terms. )
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 717 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 1081 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 288 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 993 nontrivial nodes
##       parameters:
##           test statistic:  KS tests
##           score order:  increasing
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 1521 nontrivial nodes
##       parameters:
##           test statistic:  KS tests
##           score order:  increasing
##
##           -- Classic Algorithm --
##
##       the algorithm is scoring 363 nontrivial nodes
##       parameters:
##           test statistic:  KS tests
##           score order:  increasing
##
##           -- Elim Algorithm --
##
##       the algorithm is scoring 993 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##           cutOff:  0.01
##           score order:  increasing
##
##   Level 15:  3 nodes to be scored    (0 eliminated genes)
##
##   Level 14:  3 nodes to be scored    (0 eliminated genes)
##
##   Level 13:  6 nodes to be scored    (0 eliminated genes)
##
##   Level 12:  7 nodes to be scored    (0 eliminated genes)
##
##   Level 11:  13 nodes to be scored   (0 eliminated genes)
##
##   Level 10:  21 nodes to be scored   (0 eliminated genes)
##
##   Level 9:   45 nodes to be scored   (0 eliminated genes)
##
##   Level 8:   91 nodes to be scored   (0 eliminated genes)
##
##   Level 7:   184 nodes to be scored  (0 eliminated genes)
##
##   Level 6:   273 nodes to be scored  (144 eliminated genes)
##
##   Level 5:   172 nodes to be scored  (144 eliminated genes)
##
##   Level 4:   113 nodes to be scored  (199 eliminated genes)
##
##   Level 3:   47 nodes to be scored   (199 eliminated genes)
##
##   Level 2:   14 nodes to be scored   (274 eliminated genes)
##
##   Level 1:   1 nodes to be scored    (274 eliminated genes)
##
##           -- Elim Algorithm --
##
##       the algorithm is scoring 1521 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##           cutOff:  0.01
##           score order:  increasing
##
##   Level 16:  2 nodes to be scored    (0 eliminated genes)
##
##   Level 15:  3 nodes to be scored    (0 eliminated genes)
##
##   Level 14:  7 nodes to be scored    (0 eliminated genes)
##
##   Level 13:  18 nodes to be scored   (0 eliminated genes)
##
##   Level 12:  68 nodes to be scored   (0 eliminated genes)
##
##   Level 11:  107 nodes to be scored  (4 eliminated genes)
##
##   Level 10:  182 nodes to be scored  (42 eliminated genes)
##
##   Level 9:   214 nodes to be scored  (42 eliminated genes)
##
##   Level 8:   205 nodes to be scored  (42 eliminated genes)
##
##   Level 7:   199 nodes to be scored  (42 eliminated genes)
##
##   Level 6:   195 nodes to be scored  (42 eliminated genes)
##
##   Level 5:   172 nodes to be scored  (42 eliminated genes)
##
##   Level 4:   96 nodes to be scored   (55 eliminated genes)
##
##   Level 3:   37 nodes to be scored   (409 eliminated genes)
##
##   Level 2:   15 nodes to be scored   (409 eliminated genes)
##
##   Level 1:   1 nodes to be scored    (409 eliminated genes)
##
##           -- Elim Algorithm --
##
##       the algorithm is scoring 363 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test
##           cutOff:  0.01
##           score order:  increasing
##
##   Level 15:  2 nodes to be scored    (0 eliminated genes)
##
##   Level 14:  8 nodes to be scored    (0 eliminated genes)
##
##   Level 13:  21 nodes to be scored   (0 eliminated genes)
##
##   Level 12:  22 nodes to be scored   (0 eliminated genes)
##
##   Level 11:  42 nodes to be scored   (0 eliminated genes)
##
##   Level 10:  58 nodes to be scored   (51 eliminated genes)
##
##   Level 9:   35 nodes to be scored   (51 eliminated genes)
##
##   Level 8:   53 nodes to be scored   (51 eliminated genes)
##
##   Level 7:   29 nodes to be scored   (366 eliminated genes)
##
##   Level 6:   32 nodes to be scored   (366 eliminated genes)
##
##   Level 5:   20 nodes to be scored   (366 eliminated genes)
##
##   Level 4:   24 nodes to be scored   (366 eliminated genes)
##
##   Level 3:   10 nodes to be scored   (366 eliminated genes)
##
##   Level 2:   6 nodes to be scored    (366 eliminated genes)
##
##   Level 1:   1 nodes to be scored    (1101 eliminated genes)
##
##           -- Weight Algorithm --
##
##       The algorithm is scoring 717 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test : ratio
##
##   Level 15:  3 nodes to be scored.
##
##   Level 14:  3 nodes to be scored.
##
##   Level 13:  5 nodes to be scored.
##
##   Level 12:  6 nodes to be scored.
##
##   Level 11:  8 nodes to be scored.
##
##   Level 10:  14 nodes to be scored.
##
##   Level 9:   29 nodes to be scored.
##
##   Level 8:   60 nodes to be scored.
##
##   Level 7:   118 nodes to be scored.
##
##   Level 6:   184 nodes to be scored.
##
##   Level 5:   134 nodes to be scored.
##
##   Level 4:   97 nodes to be scored.
##
##   Level 3:   42 nodes to be scored.
##
##   Level 2:   13 nodes to be scored.
##
##   Level 1:   1 nodes to be scored.
##
##           -- Weight Algorithm --
##
##       The algorithm is scoring 1081 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test : ratio
##
##   Level 16:  2 nodes to be scored.
##
##   Level 15:  2 nodes to be scored.
##
##   Level 14:  4 nodes to be scored.
##
##   Level 13:  10 nodes to be scored.
##
##   Level 12:  36 nodes to be scored.
##
##   Level 11:  65 nodes to be scored.
##
##   Level 10:  117 nodes to be scored.
##
##   Level 9:   144 nodes to be scored.
##
##   Level 8:   148 nodes to be scored.
##
##   Level 7:   143 nodes to be scored.
##
##   Level 6:   154 nodes to be scored.
##
##   Level 5:   132 nodes to be scored.
##
##   Level 4:   79 nodes to be scored.
##
##   Level 3:   31 nodes to be scored.
##
##   Level 2:   13 nodes to be scored.
##
##   Level 1:   1 nodes to be scored.
##
##           -- Weight Algorithm --
##
##       The algorithm is scoring 288 nontrivial nodes
##       parameters:
##           test statistic:  Fisher test : ratio
##
##   Level 15:  1 nodes to be scored.
##
##   Level 14:  6 nodes to be scored.
##
##   Level 13:  17 nodes to be scored.
##
##   Level 12:  20 nodes to be scored.
##
##   Level 11:  34 nodes to be scored.
##
##   Level 10:  46 nodes to be scored.
##
##   Level 9:   29 nodes to be scored.
##
##   Level 8:   38 nodes to be scored.
##
##   Level 7:   20 nodes to be scored.
##
##   Level 6:   23 nodes to be scored.
##
##   Level 5:   17 nodes to be scored.
##
##   Level 4:   20 nodes to be scored.
##
##   Level 3:   10 nodes to be scored.
##
##   Level 2:   6 nodes to be scored.
##
##   Level 1:   1 nodes to be scored.

topgo_pval = topgo_pval_plot(a96_cl14clbr_low_top)
topgo_pval$BP
top_trees = topgo_trees(a96_cl14clbr_low_top)
top_trees$bp_fisher_tree

KEGG

Oh yeah, I wrote a toy to wrap KEGG pathway stuff, lets try that. The KEGG abbreviation for t. cruzi is tcr

In theory, the paths data structure returned by hpglpathview() should be a table of the numer of up/down genes observed in each KEGG path. However, there are some weird NAs getting returned. Effing hippies. -- fixed.

kegg_list = epi_cl14clbr_table$logFC
names(kegg_list) = rownames(epi_cl14clbr_table)
epi_cl14clbr_paths = hpgl_pathview(kegg_list, species="tcr", indir="pathview_in", outdir="pathview_epi_cl14clbr", string_from="TcCLB.", string_to="")

## Warning: 'Rgraphviz' namespace cannot be unloaded:
##   namespace 'Rgraphviz' is imported by 'pathview' so cannot be unloaded

## Warning: 'graph' namespace cannot be unloaded:
##   namespace 'graph' is imported by 'Rgraphviz', 'RCytoscape' so cannot be unloaded

## Loading required package: KEGGgraph
## Loading required package: graph
##
## Attaching package: 'graph'
##
## The following object is masked from 'package:Biostrings':
##
##     complement
##
## The following object is masked from 'package:XML':
##
##     addNode
##
## The following object is masked from 'package:plyr':
##
##     join
##
##
## Attaching package: 'KEGGgraph'
##
## The following objects are masked from 'package:seqinr':
##
##     getName, getType
##
## ##############################################################################
## Pathview is an open source software package distributed under GNU General
## Public License version 3 (GPLv3). Details of GPLv3 is available at
## http://www.gnu.org/licenses/gpl-3.0.html.
##
## The pathview downloads and uses KEGG data. Academic users may freely use the
## KEGG website at http://www.kegg.jp/ or its mirror site at GenomeNet
## http://www.genome.jp/kegg/. Academic users may also freely link to the KEGG
## website. Non-academic users may use the KEGG website as end users for
## non-commercial purposes, but any other use requires a license agreement
## (details at http://www.kegg.jp/kegg/legal.html).
## ##############################################################################

## Start tag expected, '<' not found

## Start tag expected, '<' not found

## Start tag expected, '<' not found

## [1] "some node width is different from others, and hence adjusted!"

## Start tag expected, '<' not found

## Start tag expected, '<' not found

## Start tag expected, '<' not found

## Start tag expected, '<' not found

head(epi_cl14clbr_paths)

##                                                file genes up down
## tcr01040 pathview_epi_cl14clbr/tcr01040_colored.png     0  0    0
## tcr00565 pathview_epi_cl14clbr/tcr00565_colored.png     4  0    1
## tcr00360 pathview_epi_cl14clbr/tcr00360_colored.png     2  0    2
## tcr00072 pathview_epi_cl14clbr/tcr00072_colored.png     4  0    2
## tcr00600 pathview_epi_cl14clbr/tcr00600_colored.png     5  0    3
## tcr00592 pathview_epi_cl14clbr/tcr00592_colored.png     3  0    3

a60_cl14clbr_table = all_tables$a60_cl14clbr
kegg_list = a60_cl14clbr_table$logFC
names(kegg_list) = rownames(a60_cl14clbr_table)
a60_cl14clbr_paths = hpgl_pathview(kegg_list, species="tcr", indir="pathview_in", outdir="pathview_a60_cl14clbr", string_from="TcCLB.", string_to="")

## Warning: 'graph' namespace cannot be unloaded:
##   namespace 'graph' is imported by 'Rgraphviz', 'RCytoscape' so cannot be unloaded

## Loading required package: KEGGgraph
## Loading required package: graph
##
## Attaching package: 'graph'
##
## The following object is masked from 'package:Biostrings':
##
##     complement
##
## The following object is masked from 'package:XML':
##
##     addNode
##
## The following object is masked from 'package:plyr':
##
##     join
##
##
## Attaching package: 'KEGGgraph'
##
## The following objects are masked from 'package:seqinr':
##
##     getName, getType
##
## ##############################################################################
## Pathview is an open source software package distributed under GNU General
## Public License version 3 (GPLv3). Details of GPLv3 is available at
## http://www.gnu.org/licenses/gpl-3.0.html.
##
## The pathview downloads and uses KEGG data. Academic users may freely use the
## KEGG website at http://www.kegg.jp/ or its mirror site at GenomeNet
## http://www.genome.jp/kegg/. Academic users may also freely link to the KEGG
## website. Non-academic users may use the KEGG website as end users for
## non-commercial purposes, but any other use requires a license agreement
## (details at http://www.kegg.jp/kegg/legal.html).
## ##############################################################################

## Start tag expected, '<' not found

## Start tag expected, '<' not found

## Start tag expected, '<' not found

## [1] "some node width is different from others, and hence adjusted!"

## Start tag expected, '<' not found

## Start tag expected, '<' not found

## Start tag expected, '<' not found

## Start tag expected, '<' not found

head(a60_cl14clbr_paths)

##                                                file genes up down
## tcr01040 pathview_a60_cl14clbr/tcr01040_colored.png     0  0    0
## tcr02010 pathview_a60_cl14clbr/tcr02010_colored.png     3  0    0
## tcr00592 pathview_a60_cl14clbr/tcr00592_colored.png     3  0    3
## tcr00750 pathview_a60_cl14clbr/tcr00750_colored.png     4  0    4
## tcr00360 pathview_a60_cl14clbr/tcr00360_colored.png     2  1    0
## tcr00740 pathview_a60_cl14clbr/tcr00740_colored.png     2  1    0

a96_cl14clbr_table = all_tables$a96_cl14clbr
kegg_list = a96_cl14clbr_table$logFC
names(kegg_list) = rownames(a96_cl14clbr_table)
a96_cl14clbr_paths = hpgl_pathview(kegg_list, species="tcr", indir="pathview_in", outdir="pathview_a96_cl14clbr", string_from="TcCLB.", string_to="")

## Warning: 'graph' namespace cannot be unloaded:
##   namespace 'graph' is imported by 'Rgraphviz', 'RCytoscape' so cannot be unloaded

## Loading required package: KEGGgraph
## Loading required package: graph
##
## Attaching package: 'graph'
##
## The following object is masked from 'package:Biostrings':
##
##     complement
##
## The following object is masked from 'package:XML':
##
##     addNode
##
## The following object is masked from 'package:plyr':
##
##     join
##
##
## Attaching package: 'KEGGgraph'
##
## The following objects are masked from 'package:seqinr':
##
##     getName, getType
##
## ##############################################################################
## Pathview is an open source software package distributed under GNU General
## Public License version 3 (GPLv3). Details of GPLv3 is available at
## http://www.gnu.org/licenses/gpl-3.0.html.
##
## The pathview downloads and uses KEGG data. Academic users may freely use the
## KEGG website at http://www.kegg.jp/ or its mirror site at GenomeNet
## http://www.genome.jp/kegg/. Academic users may also freely link to the KEGG
## website. Non-academic users may use the KEGG website as end users for
## non-commercial purposes, but any other use requires a license agreement
## (details at http://www.kegg.jp/kegg/legal.html).
## ##############################################################################

## Start tag expected, '<' not found

## Start tag expected, '<' not found

## Start tag expected, '<' not found

## [1] "some node width is different from others, and hence adjusted!"

## Start tag expected, '<' not found

## Start tag expected, '<' not found

## Start tag expected, '<' not found

## Start tag expected, '<' not found

head(a96_cl14clbr_paths)

##                                                file genes up down
## tcr01040 pathview_a96_cl14clbr/tcr01040_colored.png     0  0    0
## tcr00524 pathview_a96_cl14clbr/tcr00524_colored.png     2  0    1
## tcr00360 pathview_a96_cl14clbr/tcr00360_colored.png     2  1    0
## tcr00740 pathview_a96_cl14clbr/tcr00740_colored.png     2  1    0
## tcr00565 pathview_a96_cl14clbr/tcr00565_colored.png     4  1    1
## tcr00500 pathview_a96_cl14clbr/tcr00500_colored.png     7  1    4

tryp_cl14clbr_table = all_tables$a96_cl14clbr
kegg_list = tryp_cl14clbr_table$logFC
names(kegg_list) = rownames(tryp_cl14clbr_table)
tryp_cl14clbr_paths = hpgl_pathview(kegg_list, species="tcr", indir="pathview_in", outdir="pathview_tryp_cl14clbr", string_from="TcCLB.", string_to="")

## Warning: 'graph' namespace cannot be unloaded:
##   namespace 'graph' is imported by 'Rgraphviz', 'RCytoscape' so cannot be unloaded

## Loading required package: KEGGgraph
## Loading required package: graph
##
## Attaching package: 'graph'
##
## The following object is masked from 'package:Biostrings':
##
##     complement
##
## The following object is masked from 'package:XML':
##
##     addNode
##
## The following object is masked from 'package:plyr':
##
##     join
##
##
## Attaching package: 'KEGGgraph'
##
## The following objects are masked from 'package:seqinr':
##
##     getName, getType
##
## ##############################################################################
## Pathview is an open source software package distributed under GNU General
## Public License version 3 (GPLv3). Details of GPLv3 is available at
## http://www.gnu.org/licenses/gpl-3.0.html.
##
## The pathview downloads and uses KEGG data. Academic users may freely use the
## KEGG website at http://www.kegg.jp/ or its mirror site at GenomeNet
## http://www.genome.jp/kegg/. Academic users may also freely link to the KEGG
## website. Non-academic users may use the KEGG website as end users for
## non-commercial purposes, but any other use requires a license agreement
## (details at http://www.kegg.jp/kegg/legal.html).
## ##############################################################################

## Start tag expected, '<' not found

## Start tag expected, '<' not found

## Start tag expected, '<' not found

## [1] "some node width is different from others, and hence adjusted!"

## Start tag expected, '<' not found

## Start tag expected, '<' not found

## Start tag expected, '<' not found

## Start tag expected, '<' not found

head(tryp_cl14clbr_paths)

##                                                 file genes up down
## tcr01040 pathview_tryp_cl14clbr/tcr01040_colored.png     0  0    0
## tcr00524 pathview_tryp_cl14clbr/tcr00524_colored.png     2  0    1
## tcr00360 pathview_tryp_cl14clbr/tcr00360_colored.png     2  1    0
## tcr00740 pathview_tryp_cl14clbr/tcr00740_colored.png     2  1    0
## tcr00565 pathview_tryp_cl14clbr/tcr00565_colored.png     4  1    1
## tcr00500 pathview_tryp_cl14clbr/tcr00500_colored.png     7  1    4