1 RNA sequencing of Mycoplasma tuberculosis

Keith sent me a sample sheet describing some Mtb data. I do not quite know what the long-term goals are, but the immediate goal is to understand better some of the sources of variance and/or batch effect(s) in the data.

2 Poke at the mtb data

Because I am an asshat, I changed 2 column headings: Sample Name -> Sample ID and filepath -> file

## Reading the sample metadata.
## The sample definitions comprises: 19, 8 rows, columns.
## Reading count tables.
## Reading count tables with read.table().
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0618.count contains 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0619.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0620.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0621.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0622.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0623.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0624.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0625.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0626.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0628.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0629.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0664.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0665.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0666.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0668.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0669.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0670.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0671.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0672.count contains 41096 rows and merges to 41096 rows.
## Finished reading count tables.
## Matched 41091 annotations and counts.
## Bringing together the count matrix and gene information.
##          sampleid            condition batch infectiondate
## HPGL0618 HPGL0618 Uninfected_plus_IFNb     A    2015/05/06
## HPGL0619 HPGL0619                   Rv     A    2015/05/06
## HPGL0620 HPGL0620           delta_esxA     A    2015/05/06
## HPGL0621 HPGL0621      delta_esxA_comp     A    2015/05/06
## HPGL0622 HPGL0622 Uninfected_plus_IFNb     B    2015/05/06
## HPGL0623 HPGL0623                   Rv     B    2015/05/06
## HPGL0624 HPGL0624           delta_esxA     B    2015/05/06
## HPGL0625 HPGL0625      delta_esxA_comp     B    2015/05/06
## HPGL0626 HPGL0626 Uninfected_plus_IFNb     C    2015/05/12
## HPGL0628 HPGL0628           delta_esxA     C    2015/05/12
## HPGL0629 HPGL0629      delta_esxA_comp     C    2015/05/12
## HPGL0664 HPGL0664           Uninfected     D    2015/11/04
## HPGL0665 HPGL0665 Uninfected_plus_IFNb     D    2015/11/04
## HPGL0666 HPGL0666           Uninfected     E    2015/11/13
## HPGL0668 HPGL0668                   Rv     E    2015/11/13
## HPGL0669 HPGL0669           delta_esxA     E    2015/11/13
## HPGL0670 HPGL0670      delta_esxA_comp     E    2015/11/13
## HPGL0671 HPGL0671           Uninfected     F    2015/05/06
## HPGL0672 HPGL0672 Uninfected_plus_IFNb     F    2015/05/12
##          rnacollectiondate rnaisolationdate libraryprepdate
## HPGL0618        2015/05/06       2015/06/28      2015/07/01
## HPGL0619        2015/05/06       2015/06/28      2015/07/01
## HPGL0620        2015/05/06       2015/06/28      2015/07/01
## HPGL0621        2015/05/06       2015/06/28      2015/07/01
## HPGL0622        2015/05/06       2015/06/28      2015/07/01
## HPGL0623        2015/05/06       2015/06/28      2015/07/01
## HPGL0624        2015/05/06       2015/06/28      2015/07/01
## HPGL0625        2015/05/06       2015/06/28      2015/07/01
## HPGL0626        2015/05/12       2015/06/28      2015/07/01
## HPGL0628        2015/05/12       2015/06/28      2015/07/01
## HPGL0629        2015/05/12       2015/06/28      2015/07/01
## HPGL0664        2015/11/04       2015/12/16      2015/12/18
## HPGL0665        2015/11/04       2015/12/16      2015/12/18
## HPGL0666        2015/11/13       2015/12/16      2015/12/18
## HPGL0668        2015/11/13       2015/12/16      2015/12/18
## HPGL0669        2015/11/13       2015/12/16      2015/12/18
## HPGL0670        2015/11/13       2015/12/16      2015/12/18
## HPGL0671        2015/12/10       2015/12/16      2015/12/18
## HPGL0672        2015/12/10       2015/12/16      2015/12/18
##                                                                           file
## HPGL0618 /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0618.count
## HPGL0619 /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0619.count
## HPGL0620 /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0620.count
## HPGL0621 /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0621.count
## HPGL0622 /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0622.count
## HPGL0623 /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0623.count
## HPGL0624 /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0624.count
## HPGL0625 /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0625.count
## HPGL0626 /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0626.count
## HPGL0628 /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0628.count
## HPGL0629 /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0629.count
## HPGL0664 /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0664.count
## HPGL0665 /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0665.count
## HPGL0666 /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0666.count
## HPGL0668 /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0668.count
## HPGL0669 /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0669.count
## HPGL0670 /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0670.count
## HPGL0671 /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0671.count
## HPGL0672 /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0672.count
## Reading the sample metadata.
## The sample definitions comprises: 19, 11 rows, columns.
## Reading count tables.
## Reading count tables with read.table().
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0618.count contains 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0619.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0620.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0621.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0622.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0623.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0624.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0625.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0626.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0628.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0629.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0664.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0665.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0666.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0668.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0669.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0670.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0671.count contains 41096 rows and merges to 41096 rows.
## /cbcb/nelsayed-scratch/keith/mtb_ifnb/counts/mmusculus/HPGL0672.count contains 41096 rows and merges to 41096 rows.
## Finished reading count tables.
## Matched 41091 annotations and counts.
## Bringing together the count matrix and gene information.

3 Simple metrics

initial_metrics <- sm(graph_metrics(mtb_expt, qq=TRUE, ma=FALSE))

dirty_norm <- sm(normalize_expt(mtb_expt, transform="log2", convert="cpm", norm="quant", filter=TRUE))
dirty_metrics <- sm(graph_metrics(dirty_norm))

initial_metrics$libsize

initial_metrics$nonzero

initial_metrics$boxplot

initial_metrics$density

## Now the normalized plots
dirty_metrics$pcaplot
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

### hmm too many things going on for me to see
dirty_info <- pca_information(
  dirty_norm,
  expt_factors=c("condition", "batch", "rnaisolationdate", "libraryprepdate",
                 "infectiondate", "rnacollectiondate"),
  plot_pcas=TRUE)
## More shallow curves in these plots suggest more genes in this principle component.

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

dirty_info$anova_fstat_heatmap

## hmm so PC2 is pretty hard up for all the metadata factors we don't want
dirty_info$pca_plots$PC1_PC3
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## to my eyes, pulling PCs 1 and 3 puts everybody except the esx samples in reasonably good order.
## In a moment, lets look at those jerks alone

3.1 Look at a few subsets alone

## Pull just Rv vs. uninfected
rv_uninf <- subset_expt(dirty_norm, subset="condition=='Rv'|condition=='Uninfected'")
## There were 19, now there are 6 samples.
rv_uninf_pca <- plot_pca(rv_uninf)
summary(rv_uninf_pca)
##          Length Class      Mode   
## pca      3      -none-     list   
## plot     9      gg         list   
## table    8      data.frame list   
## res      4      data.frame list   
## variance 5      -none-     numeric
rv_uninf_pca$plot
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## ok, so RV is left with 3 batches intersecting on E while uninfected is on right with 2 separate batches.
## 66+% of the variance is in PCA1/2

## the two strains labeled 'esx' arbitrarily chosen
esx <- subset_expt(dirty_norm, subset="condition=='delta_esxA'|condition=='delta_esxA_comp'")
## There were 19, now there are 8 samples.
esx_pca <- plot_pca(esx)
esx_pca$plot

def <- subset_expt(dirty_norm, subset="batch=='D'|batch=='E'|batch=='F'")
## There were 19, now there are 8 samples.
tt <- pca_information(
  def,
  expt_factors=c("condition", "batch", "rnaisolationdate", "libraryprepdate",
                 "infectiondate", "rnacollectiondate"),
  plot_pcas=TRUE)
## More shallow curves in these plots suggest more genes in this principle component.

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## The standard deviation was 0 for rnaisolationdate and PC1.
## The standard deviation was 0 for rnaisolationdate and PC2.
## The standard deviation was 0 for rnaisolationdate and PC3.
## The standard deviation was 0 for rnaisolationdate and PC4.
## The standard deviation was 0 for rnaisolationdate and PC5.
## The standard deviation was 0 for rnaisolationdate and PC6.
## The standard deviation was 0 for libraryprepdate and PC1.
## The standard deviation was 0 for libraryprepdate and PC2.
## The standard deviation was 0 for libraryprepdate and PC3.
## The standard deviation was 0 for libraryprepdate and PC4.
## The standard deviation was 0 for libraryprepdate and PC5.
## The standard deviation was 0 for libraryprepdate and PC6.

plot_pca(def)$plot
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

4 Backing up

Quick conversation with Keith, acquired some other ideas about the meta-information Therefore made a new design sheet with a couple extra factors

atb_norm <- normalize_expt(mtb_atb, transform="log2", convert="cpm", filter=TRUE, norm="quant")
## This function will replace the expt$expressionset slot with:
## log2(cpm(quant(hpgl(data))))
## It backs up the current data into a slot named:
##  expt$backup_expressionset. It will also save copies of each step along the way
##  in expt$normalized with the corresponding libsizes. Keep the libsizes in mind
##  when invoking limma.  The appropriate libsize is the non-log(cpm(normalized)).
##  This is most likely kept at:
##  'new_expt$normalized$intermediate_counts$normalization$libsizes'
##  A copy of this may also be found at:
##  new_expt$best_libsize
## Not correcting the count-data for batch effects.  If batch is
##  included in EdgerR/limma's model, then this is probably wise; but in extreme
##  batch effects this is a good parameter to play with.
## Step 1: performing count filter with option: hpgl
## Removing 29475 low-count genes (11616 remaining).
## Step 2: normalizing the data with quant.
## Step 3: converting the data with cpm.
## Step 4: transforming the data with log2.
## transform_counts: Found 58 values equal to 0, adding 1 to the matrix.
## Step 5: not doing batch correction.
atb_pca <- pca_information(
  atb_norm,
  num_components=5,
  expt_factors=c("condition1", "condition", "batch", "ifnadded", "infectstatus",
                 "rnaisolationdate", "libraryprepdate", "infectiondate", "rnacollectiondate"),
  plot_pcas=TRUE)
## More shallow curves in these plots suggest more genes in this principle component.

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

atb_pca$anova_p
##                            PC1      PC2        PC3    PC4    PC5
## condition1        0.0001992444 0.035224 0.36320890 0.8672 0.8384
## condition         0.0689378520 0.309709 0.72108528 0.7170 0.5377
## batch             0.0151229165 0.384079 0.00871834 0.3846 0.5685
## ifnadded          0.0001003657 0.006406 0.74148980 0.8211 0.8428
## infectstatus      0.0000002038 0.176583 0.32590940 0.8795 0.6271
## rnaisolationdate  0.0034424334 0.441193 0.00142764 0.4846 0.6499
## libraryprepdate   0.0034424334 0.441193 0.00142764 0.4846 0.6499
## infectiondate     0.2267300056 0.537462 0.00009331 0.8140 0.6277
## rnacollectiondate 0.0103726974 0.413962 0.00610853 0.2198 0.7327
fun <- compare_surrogate_estimates(
  mtb_atb,
  extra_factors=c("ifnadded", "infectstatus", "rnaisolationdate"))
## The expt has not been filtered, set filter_type/filter_it if you want other options.
## The be method chose 3 surrogate variable(s).
## Attempting pca surrogate estimation with 3 surrogates.
## The be method chose 3 surrogate variable(s).
## Attempting sva supervised surrogate estimation with 3 surrogates.
## The be method chose 3 surrogate variable(s).
## Attempting sva unsupervised surrogate estimation with 3 surrogates.
## The be method chose 3 surrogate variable(s).
## Attempting ruvseq supervised surrogate estimation with 3 surrogates.
## The be method chose 3 surrogate variable(s).
## Attempting ruvseq residual surrogate estimation with 3 surrogates.
## The be method chose 3 surrogate variable(s).
## Attempting ruvseq empirical surrogate estimation with 3 surrogates.
## 1/8: Performing lmFit(data) etc. with null in the model.
## Setting options('download.file.method.GEOquery'='auto')
## Setting options('GEOquery.inmemory.gpl'=FALSE)

## 2/8: Performing lmFit(data) etc. with + batch_adjustments$batch in the model.
## 3/8: Performing lmFit(data) etc. with + batch_adjustments$pca in the model.
## 4/8: Performing lmFit(data) etc. with + batch_adjustments$sva_sup in the model.
## 5/8: Performing lmFit(data) etc. with + batch_adjustments$sva_unsup in the model.
## 6/8: Performing lmFit(data) etc. with + batch_adjustments$ruv_sup in the model.
## 7/8: Performing lmFit(data) etc. with + batch_adjustments$ruv_resid in the model.
## 8/8: Performing lmFit(data) etc. with + batch_adjustments$ruv_emp in the model.

fun$pca_plots
## $null
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## 
## $pca
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## 
## $svasup
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## 
## $svaunsup
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## 
## $ruvsup
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## 
## $ruvresid
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## 
## $ruvemp
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

if (!isTRUE(get0("skip_load"))) {
  pander::pander(sessionInfo())
  message(paste0("This is hpgltools commit: ", get_git_commit()))
  this_save <- paste0(gsub(pattern="\\.Rmd", replace="", x=rmd_file), "-v", ver, ".rda.xz")
  message(paste0("Saving to ", this_save))
  tmp <- sm(saveme(filename=this_save))
}
## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset 7de4503f6bb5724c28cce24af5dbee22bb1c0cae
## R> packrat::restore()
## This is hpgltools commit: Thu Apr 12 22:08:53 2018 -0400: 7de4503f6bb5724c28cce24af5dbee22bb1c0cae
## Saving to index-v20180330.rda.xz
LS0tCnRpdGxlOiAiTS50dWJlcmN1bG9zaXMgMjAxNjogQSBzbWFsbCBjb2xsZWN0aW9uIG9mIGFuYWx5c2VzLiIKYXV0aG9yOiAiYXRiIGFiZWxld0BnbWFpbC5jb20iCmRhdGU6ICJgciBTeXMuRGF0ZSgpYCIKb3V0cHV0OgogaHRtbF9kb2N1bWVudDoKICBjb2RlX2Rvd25sb2FkOiB0cnVlCiAgY29kZV9mb2xkaW5nOiBzaG93CiAgZmlnX2NhcHRpb246IHRydWUKICBmaWdfaGVpZ2h0OiA3CiAgZmlnX3dpZHRoOiA3CiAgaGlnaGxpZ2h0OiBkZWZhdWx0CiAga2VlcF9tZDogZmFsc2UKICBtb2RlOiBzZWxmY29udGFpbmVkCiAgbnVtYmVyX3NlY3Rpb25zOiB0cnVlCiAgc2VsZl9jb250YWluZWQ6IHRydWUKICB0aGVtZTogcmVhZGFibGUKICB0b2M6IHRydWUKICB0b2NfZmxvYXQ6CiAgICBjb2xsYXBzZWQ6IGZhbHNlCiAgICBzbW9vdGhfc2Nyb2xsOiBmYWxzZQotLS0KCjxzdHlsZT4KICBib2R5IC5tYWluLWNvbnRhaW5lciB7CiAgICBtYXgtd2lkdGg6IDE2MDBweDsKICB9Cjwvc3R5bGU+CgpgYGB7ciBvcHRpb25zLCBpbmNsdWRlPUZBTFNFfQppZiAoIWlzVFJVRShnZXQwKCJza2lwX2xvYWQiKSkpIHsKICBsaWJyYXJ5KGhwZ2x0b29scykKICB0dCA8LSBkZXZ0b29sczo6bG9hZF9hbGwoIn4vaHBnbHRvb2xzIikKICBrbml0cjo6b3B0c19rbml0JHNldChwcm9ncmVzcz1UUlVFLAogICAgICAgICAgICAgICAgICAgICAgIHZlcmJvc2U9VFJVRSwKICAgICAgICAgICAgICAgICAgICAgICB3aWR0aD05MCwKICAgICAgICAgICAgICAgICAgICAgICBlY2hvPVRSVUUpCiAga25pdHI6Om9wdHNfY2h1bmskc2V0KGVycm9yPVRSVUUsCiAgICAgICAgICAgICAgICAgICAgICAgIGZpZy53aWR0aD04LAogICAgICAgICAgICAgICAgICAgICAgICBmaWcuaGVpZ2h0PTgsCiAgICAgICAgICAgICAgICAgICAgICAgIGRwaT05NikKICBvbGRfb3B0aW9ucyA8LSBvcHRpb25zKGRpZ2l0cz00LAogICAgICAgICAgICAgICAgICAgICAgICAgc3RyaW5nc0FzRmFjdG9ycz1GQUxTRSwKICAgICAgICAgICAgICAgICAgICAgICAgIGtuaXRyLmR1cGxpY2F0ZS5sYWJlbD0iYWxsb3ciKQogIGdncGxvdDI6OnRoZW1lX3NldChnZ3Bsb3QyOjp0aGVtZV9idyhiYXNlX3NpemU9MTApKQogIHZlciA8LSAiMjAxODAzMzAiCiAgcHJldmlvdXNfZmlsZSA8LSAiaW5kZXguUm1kIgoKICB0bXAgPC0gdHJ5KHNtKGxvYWRtZShmaWxlbmFtZT1wYXN0ZTAoZ3N1YihwYXR0ZXJuPSJcXC5SbWQiLCByZXBsYWNlPSIiLCB4PXByZXZpb3VzX2ZpbGUpLCAiLXYiLCB2ZXIsICIucmRhLnh6IikpKSkKICBybWRfZmlsZSA8LSAiaW5kZXguUm1kIgp9CmBgYAoKIyBSTkEgc2VxdWVuY2luZyBvZiBNeWNvcGxhc21hIHR1YmVyY3Vsb3NpcwoKS2VpdGggc2VudCBtZSBhIHNhbXBsZSBzaGVldCBkZXNjcmliaW5nIHNvbWUgTXRiIGRhdGEuICBJIGRvIG5vdCBxdWl0ZSBrbm93IHdoYXQgdGhlIGxvbmctdGVybSBnb2FscyBhcmUsIGJ1dCB0aGUgaW1tZWRpYXRlCmdvYWwgaXMgdG8gdW5kZXJzdGFuZCBiZXR0ZXIgc29tZSBvZiB0aGUgc291cmNlcyBvZiB2YXJpYW5jZSBhbmQvb3IgYmF0Y2ggZWZmZWN0KHMpIGluIHRoZSBkYXRhLgoKIyBQb2tlIGF0IHRoZSBtdGIgZGF0YQoKQmVjYXVzZSBJIGFtIGFuIGFzc2hhdCwgSSBjaGFuZ2VkIDIgY29sdW1uIGhlYWRpbmdzOiBTYW1wbGUgTmFtZSAtPiBTYW1wbGUgSUQgYW5kIGZpbGVwYXRoIC0+IGZpbGUKCmBgYHtyIGxvYWRpbmcsIGVjaG89RkFMU0V9Cm10Yl9leHB0IDwtIGNyZWF0ZV9leHB0KCJtb3VzZV9tdGJfaWZuYi5jc3YiKQptdGJfZXhwdCRkZXNpZ24KbXRiX2F0YiA8LSBjcmVhdGVfZXhwdCgibW91c2VfbXRiX2lmbmItYXRiLmNzdiIpCmBgYAoKIyBTaW1wbGUgbWV0cmljcwoKYGBge3IgbWV0cmljcywgc2hvdy5maWc9ImhpZGUifQppbml0aWFsX21ldHJpY3MgPC0gc20oZ3JhcGhfbWV0cmljcyhtdGJfZXhwdCwgcXE9VFJVRSwgbWE9RkFMU0UpKQpkaXJ0eV9ub3JtIDwtIHNtKG5vcm1hbGl6ZV9leHB0KG10Yl9leHB0LCB0cmFuc2Zvcm09ImxvZzIiLCBjb252ZXJ0PSJjcG0iLCBub3JtPSJxdWFudCIsIGZpbHRlcj1UUlVFKSkKZGlydHlfbWV0cmljcyA8LSBzbShncmFwaF9tZXRyaWNzKGRpcnR5X25vcm0pKQpgYGAKCmBgYHtyIHByaW50X21ldHJpY3N9CmluaXRpYWxfbWV0cmljcyRsaWJzaXplCmluaXRpYWxfbWV0cmljcyRub256ZXJvCmluaXRpYWxfbWV0cmljcyRib3hwbG90CmluaXRpYWxfbWV0cmljcyRkZW5zaXR5CgojIyBOb3cgdGhlIG5vcm1hbGl6ZWQgcGxvdHMKZGlydHlfbWV0cmljcyRwY2FwbG90CiMjIyBobW0gdG9vIG1hbnkgdGhpbmdzIGdvaW5nIG9uIGZvciBtZSB0byBzZWUKZGlydHlfaW5mbyA8LSBwY2FfaW5mb3JtYXRpb24oCiAgZGlydHlfbm9ybSwKICBleHB0X2ZhY3RvcnM9YygiY29uZGl0aW9uIiwgImJhdGNoIiwgInJuYWlzb2xhdGlvbmRhdGUiLCAibGlicmFyeXByZXBkYXRlIiwKICAgICAgICAgICAgICAgICAiaW5mZWN0aW9uZGF0ZSIsICJybmFjb2xsZWN0aW9uZGF0ZSIpLAogIHBsb3RfcGNhcz1UUlVFKQpkaXJ0eV9pbmZvJGFub3ZhX2ZzdGF0X2hlYXRtYXAKIyMgaG1tIHNvIFBDMiBpcyBwcmV0dHkgaGFyZCB1cCBmb3IgYWxsIHRoZSBtZXRhZGF0YSBmYWN0b3JzIHdlIGRvbid0IHdhbnQKZGlydHlfaW5mbyRwY2FfcGxvdHMkUEMxX1BDMwojIyB0byBteSBleWVzLCBwdWxsaW5nIFBDcyAxIGFuZCAzIHB1dHMgZXZlcnlib2R5IGV4Y2VwdCB0aGUgZXN4IHNhbXBsZXMgaW4gcmVhc29uYWJseSBnb29kIG9yZGVyLgojIyBJbiBhIG1vbWVudCwgbGV0cyBsb29rIGF0IHRob3NlIGplcmtzIGFsb25lCmBgYAoKIyMgTG9vayBhdCBhIGZldyBzdWJzZXRzIGFsb25lCgpgYGB7ciBzdWJzZXRzfQojIyBQdWxsIGp1c3QgUnYgdnMuIHVuaW5mZWN0ZWQKcnZfdW5pbmYgPC0gc3Vic2V0X2V4cHQoZGlydHlfbm9ybSwgc3Vic2V0PSJjb25kaXRpb249PSdSdid8Y29uZGl0aW9uPT0nVW5pbmZlY3RlZCciKQpydl91bmluZl9wY2EgPC0gcGxvdF9wY2EocnZfdW5pbmYpCnN1bW1hcnkocnZfdW5pbmZfcGNhKQpydl91bmluZl9wY2EkcGxvdAojIyBvaywgc28gUlYgaXMgbGVmdCB3aXRoIDMgYmF0Y2hlcyBpbnRlcnNlY3Rpbmcgb24gRSB3aGlsZSB1bmluZmVjdGVkIGlzIG9uIHJpZ2h0IHdpdGggMiBzZXBhcmF0ZSBiYXRjaGVzLgojIyA2NislIG9mIHRoZSB2YXJpYW5jZSBpcyBpbiBQQ0ExLzIKCiMjIHRoZSB0d28gc3RyYWlucyBsYWJlbGVkICdlc3gnIGFyYml0cmFyaWx5IGNob3Nlbgplc3ggPC0gc3Vic2V0X2V4cHQoZGlydHlfbm9ybSwgc3Vic2V0PSJjb25kaXRpb249PSdkZWx0YV9lc3hBJ3xjb25kaXRpb249PSdkZWx0YV9lc3hBX2NvbXAnIikKZXN4X3BjYSA8LSBwbG90X3BjYShlc3gpCmVzeF9wY2EkcGxvdAoKZGVmIDwtIHN1YnNldF9leHB0KGRpcnR5X25vcm0sIHN1YnNldD0iYmF0Y2g9PSdEJ3xiYXRjaD09J0UnfGJhdGNoPT0nRiciKQp0dCA8LSBwY2FfaW5mb3JtYXRpb24oCiAgZGVmLAogIGV4cHRfZmFjdG9ycz1jKCJjb25kaXRpb24iLCAiYmF0Y2giLCAicm5haXNvbGF0aW9uZGF0ZSIsICJsaWJyYXJ5cHJlcGRhdGUiLAogICAgICAgICAgICAgICAgICJpbmZlY3Rpb25kYXRlIiwgInJuYWNvbGxlY3Rpb25kYXRlIiksCiAgcGxvdF9wY2FzPVRSVUUpCnBsb3RfcGNhKGRlZikkcGxvdApgYGAKCiMgQmFja2luZyB1cAoKUXVpY2sgY29udmVyc2F0aW9uIHdpdGggS2VpdGgsIGFjcXVpcmVkIHNvbWUgb3RoZXIgaWRlYXMgYWJvdXQgdGhlIG1ldGEtaW5mb3JtYXRpb24KVGhlcmVmb3JlIG1hZGUgYSBuZXcgZGVzaWduIHNoZWV0IHdpdGggYSBjb3VwbGUgZXh0cmEgZmFjdG9ycwoKYGBge3IgZXhwbG9yZV9hdGJ9CmF0Yl9ub3JtIDwtIG5vcm1hbGl6ZV9leHB0KG10Yl9hdGIsIHRyYW5zZm9ybT0ibG9nMiIsIGNvbnZlcnQ9ImNwbSIsIGZpbHRlcj1UUlVFLCBub3JtPSJxdWFudCIpCmF0Yl9wY2EgPC0gcGNhX2luZm9ybWF0aW9uKAogIGF0Yl9ub3JtLAogIG51bV9jb21wb25lbnRzPTUsCiAgZXhwdF9mYWN0b3JzPWMoImNvbmRpdGlvbjEiLCAiY29uZGl0aW9uIiwgImJhdGNoIiwgImlmbmFkZGVkIiwgImluZmVjdHN0YXR1cyIsCiAgICAgICAgICAgICAgICAgInJuYWlzb2xhdGlvbmRhdGUiLCAibGlicmFyeXByZXBkYXRlIiwgImluZmVjdGlvbmRhdGUiLCAicm5hY29sbGVjdGlvbmRhdGUiKSwKICBwbG90X3BjYXM9VFJVRSkKYXRiX3BjYSRhbm92YV9wCgpmdW4gPC0gY29tcGFyZV9zdXJyb2dhdGVfZXN0aW1hdGVzKAogIG10Yl9hdGIsCiAgZXh0cmFfZmFjdG9ycz1jKCJpZm5hZGRlZCIsICJpbmZlY3RzdGF0dXMiLCAicm5haXNvbGF0aW9uZGF0ZSIpKQpmdW4kcGNhX3Bsb3RzCmBgYAoKYGBge3Igc2F2ZW1lfQppZiAoIWlzVFJVRShnZXQwKCJza2lwX2xvYWQiKSkpIHsKICBwYW5kZXI6OnBhbmRlcihzZXNzaW9uSW5mbygpKQogIG1lc3NhZ2UocGFzdGUwKCJUaGlzIGlzIGhwZ2x0b29scyBjb21taXQ6ICIsIGdldF9naXRfY29tbWl0KCkpKQogIHRoaXNfc2F2ZSA8LSBwYXN0ZTAoZ3N1YihwYXR0ZXJuPSJcXC5SbWQiLCByZXBsYWNlPSIiLCB4PXJtZF9maWxlKSwgIi12IiwgdmVyLCAiLnJkYS54eiIpCiAgbWVzc2FnZShwYXN0ZTAoIlNhdmluZyB0byAiLCB0aGlzX3NhdmUpKQogIHRtcCA8LSBzbShzYXZlbWUoZmlsZW5hbWU9dGhpc19zYXZlKSkKfQpgYGAK