1 Sample Estimation, Spyogenes: 20180717

This document analyzes TNSeq data from S. pyogenes.

rpmi_expt <- subset_expt(expt=sp_expt, subset="experiment=='metal homeostasis'")

## There were 48, now there are 42 samples.

rpmi_metrics <- graph_metrics(expt=rpmi_expt)

## Graphing number of non-zero genes with respect to CPM by library.

## Graphing library sizes.

## Graphing a boxplot.

## This data will benefit from being displayed on the log scale.

## If this is not desired, set scale='raw'

## Some entries are 0.  We are on log scale, adding 1 to the data.

## Changed 10893 zero count features.

## Graphing a correlation heatmap.

## Graphing a standard median correlation.

## Performing correlation.

## Graphing a distance heatmap.

## Graphing a standard median distance.

## Performing distance.

## Graphing a PCA plot.

## There is just one batch in this data.

## Graphing a T-SNE plot.

## There is just one batch in this data.

## Plotting a density plot.

## This data will benefit from being displayed on the log scale.

## If this is not desired, set scale='raw'

## Some entries are 0.  We are on log scale, setting them to 0.5.

## Changed 10893 zero count features.

## Plotting the representation of the top-n genes.

## Printing a color to condition legend.

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

rpmi_filt <- sm(normalize_expt(rpmi_expt, filter=TRUE))
rpmi_norm <- sm(normalize_expt(rpmi_expt, filter=TRUE, convert="cpm", norm="quant", transform="log2"))
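
The `convert="cpm"` and `transform="log2"` options above correspond to familiar operations. As a rough illustration only, and not hpgltools' actual implementation, the following base R sketch applies them to a small made-up count matrix. The names `toy_counts`, `toy_libsizes`, `toy_cpm`, and `toy_log2` are hypothetical, and the filtering and quantile normalization steps are omitted.

```r
## Hypothetical toy data: 4 genes (rows) by 3 samples (columns).
toy_counts <- matrix(c(10, 0, 90, 900,
                       20, 5, 175, 1800,
                       5, 0, 45, 450),
                     nrow=4,
                     dimnames=list(paste0("gene", 1:4), paste0("sample", 1:3)))
## Library size is the total count per sample.
toy_libsizes <- colSums(toy_counts)
## Counts per million: divide each column by its library size, then scale by 1e6.
toy_cpm <- sweep(toy_counts, 2, toy_libsizes, FUN="/") * 1e6
## log2 transform, adding 1 so that zero counts stay finite.
toy_log2 <- log2(toy_cpm + 1)
round(toy_log2, 2)
```

After the conversion every column of `toy_cpm` sums to one million, which is what makes samples of different sequencing depth comparable before the log transform.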

rpmi_norm_metrics <- graph_metrics(expt=rpmi_norm)

## Graphing number of non-zero genes with respect to CPM by library.

## Graphing library sizes.

## Graphing a boxplot.

## Graphing a correlation heatmap.

## Graphing a standard median correlation.

## Performing correlation.

## Graphing a distance heatmap.

## Graphing a standard median distance.

## Performing distance.

## Graphing a PCA plot.

## There is just one batch in this data.

## Graphing a T-SNE plot.

## There is just one batch in this data.

## Plotting a density plot.

## Plotting the representation of the top-n genes.

## Printing a color to condition legend.

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

1.1 Show some graphs from before normalization

rpmi_metrics$legend

rpmi_metrics$libsize

## A few samples might be a problem: hpgl0898, hpgl0879; but I am guessing that a
## less than 4-fold difference between the largest and smallest libraries is tolerable.
rpmi_metrics$density

## The sample densities are nicely consistent.
rpmi_metrics$corheat
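
The fold difference mentioned in the library-size comment above can be checked with a couple of lines of base R. This is a minimal sketch using made-up library sizes; the sample names and values are hypothetical and do not come from this experiment. With a real count matrix one would start from `colSums()` of the counts.

```r
## Hypothetical library sizes (total reads per sample); not the real values.
toy_libsizes <- c(sample_a=1.2e6, sample_b=3.1e6, sample_c=4.4e6)
## Ratio between the largest and smallest library.
libsize_ratio <- max(toy_libsizes) / min(toy_libsizes)
libsize_ratio
## Names of the samples at the two extremes.
names(which.max(toy_libsizes))
names(which.min(toy_libsizes))
```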

1.2 Now some plots from after normalization

norm_pca <- plot_pca(rpmi_norm, cis=FALSE)

## There is just one batch in this data.

norm_pca$plot

## Too few points to calculate an ellipse

## This clustering is kind of terrible.

1.3 Try a couple of surrogate variables

Given the wretched clustering observed above, I will try a couple of tools from ruv/sva to see if they help.

rpmi_batch1 <- normalize_expt(rpmi_expt, transform="log2", convert="cpm",
                              filter=TRUE, batch="fsva")

## This function will replace the expt$expressionset slot with:

## log2(fsva(cpm(hpgl(data))))

## It backs up the current data into a slot named:
##  expt$backup_expressionset. It will also save copies of each step along the way
##  in expt$normalized with the corresponding libsizes. Keep the libsizes in mind
##  when invoking limma.  The appropriate libsize is the non-log(cpm(normalized)).
##  This is most likely kept at:
##  'new_expt$normalized$intermediate_counts$normalization$libsizes'
##  A copy of this may also be found at:
##  new_expt$best_libsize

## Leaving the data unnormalized.  This is necessary for DESeq, but
##  EdgeR/limma might benefit from normalization.  Good choices include quantile,
##  size-factor, tmm, etc.

## Step 1: performing count filter with option: hpgl

## Removing 364 low-count genes (1450 remaining).

## Step 2: not normalizing the data.

## Step 3: converting the data with cpm.

## Step 4: transforming the data with log2.

## transform_counts: Found 274 values equal to 0, adding 1 to the matrix.

## Step 5: doing batch correction with fsva.

## In norm_batch, after testing logic of surrogate method/number, the
## number of surrogates is:  and the method is: be.

## Note to self:  If you get an error like 'x contains missing values'; I think this
##  means that the data has too many 0's and needs to have a better low-count filter applied.

## batch_counts: Before batch correction, 2114 entries 0<x<1.

## batch_counts: Before batch correction, 274 entries are >= 0.

## After checking/setting the number of surrogates, it is: 3.

## batch_counts: Using sva::fsva for batch correction.

## The number of elements which are < 0 after batch correction is: 84

## The variable low_to_zero sets whether to change <0 values to 0 and is: FALSE

plot_pca(rpmi_batch1)$plot

## There is just one batch in this data.

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

plot_corheat(rpmi_batch1)$plot

## This looks a bit more encouraging.

rpmi_batch2 <- normalize_expt(rpmi_expt, transform="log2", convert="cpm",
                              filter=TRUE, batch="svaseq")

## This function will replace the expt$expressionset slot with:

## log2(svaseq(cpm(hpgl(data))))

## It backs up the current data into a slot named:
##  expt$backup_expressionset. It will also save copies of each step along the way
##  in expt$normalized with the corresponding libsizes. Keep the libsizes in mind
##  when invoking limma.  The appropriate libsize is the non-log(cpm(normalized)).
##  This is most likely kept at:
##  'new_expt$normalized$intermediate_counts$normalization$libsizes'
##  A copy of this may also be found at:
##  new_expt$best_libsize

## Leaving the data unnormalized.  This is necessary for DESeq, but
##  EdgeR/limma might benefit from normalization.  Good choices include quantile,
##  size-factor, tmm, etc.

## Step 1: performing count filter with option: hpgl

## Removing 364 low-count genes (1450 remaining).

## Step 2: not normalizing the data.

## Step 3: converting the data with cpm.

## Step 4: transforming the data with log2.

## transform_counts: Found 274 values equal to 0, adding 1 to the matrix.

## Step 5: doing batch correction with svaseq.

## In norm_batch, after testing logic of surrogate method/number, the
## number of surrogates is:  and the method is: be.

## Note to self:  If you get an error like 'x contains missing values'; I think this
##  means that the data has too many 0's and needs to have a better low-count filter applied.

## batch_counts: Before batch correction, 2114 entries 0<x<1.

## batch_counts: Before batch correction, 274 entries are >= 0.

## After checking/setting the number of surrogates, it is: 4.

## batch_counts: Using sva::svaseq for batch correction.

## Note to self:  If you feed svaseq a data frame you will get an error like:

## data %*% (Id - mod %*% blah blah requires numeric/complex arguments.

## The number of elements which are < 0 after batch correction is: 81

## The variable low_to_zero sets whether to change <0 values to 0 and is: FALSE

plot_pca(rpmi_batch2)$plot

## There is just one batch in this data.

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## As does this.
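
To give a feel for what a surrogate variable is, here is a deliberately crude base R sketch: fit the known design to each gene, then take the leading singular vector of the residuals. This is a PCA-of-residuals stand-in under simulated data, not the algorithm used by sva::fsva or sva::svaseq, and every name in it (`toy_expr`, `hidden`, `surrogate`, and so on) is hypothetical.

```r
set.seed(1)
## Hypothetical toy data: 200 genes by 8 samples, 2 conditions of 4 samples each.
n_genes <- 200
condition <- factor(rep(c("a", "b"), each=4))
## A hidden nuisance factor which alternates within each condition.
hidden <- rep(c(-1, 1), times=4)
gene_loadings <- rnorm(n_genes)
toy_expr <- matrix(rnorm(n_genes * 8, sd=0.2), nrow=n_genes) +
  outer(gene_loadings, hidden)
## Fit the known design to every gene at once and keep the residuals.
design <- model.matrix(~ condition)
fit <- lm.fit(x=design, y=t(toy_expr))
resid_mat <- fit$residuals  ## samples in rows, genes in columns
## The leading left singular vector of the residuals is one surrogate estimate.
surrogate <- svd(resid_mat)$u[, 1]
abs(cor(surrogate, hidden))
```

The sign of a singular vector is arbitrary, which is why the absolute correlation is reported. In this simulation the hidden factor is balanced within each condition, so the design cannot absorb it and it survives in the residuals.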

rpmi_batch_written <- write_expt(expt=rpmi_expt, transform="log2", convert="cpm",
                                 filter=TRUE, batch="fsva", violin=TRUE,
                                 excel=paste0("excel/rpmi_fsva-v", ver, ".xlsx"))

## Writing the legend.

## Writing the raw reads.

## Graphing the raw reads.

## Too few points to calculate an ellipse

## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse

## There is just one batch in this data.

## There is just one batch in this data.

## Too few points to calculate an ellipse

## varpart sees only 1 batch, adjusting the model accordingly.

## Attempting mixed linear model with: ~  (1|condition)

## Fitting the expressionset to the model, this is slow.

## Projected run time: ~ 0.09 min

## Placing factor: condition at the beginning of the model.

## Writing the normalized reads.

## Graphing the normalized reads.

## Too few points to calculate an ellipse

## Too few points to calculate an ellipse

## varpart sees only 1 batch, adjusting the model accordingly.

## Attempting mixed linear model with: ~  (1|condition)

## Fitting the expressionset to the model, this is slow.

## Projected run time: ~ 0.08 min

## Placing factor: condition at the beginning of the model.

## Writing the median reads by factor.

## The factor thy_t1 has 2 rows.

## The factor rpmi_t2 has 20 rows.

## The factor rpmi_t2_lowcu has 4 rows.

## The factor rpmi_t2_highcu has 4 rows.

## The factor rpmi_t2_lowzn has 4 rows.

## The factor rpmi_t2_highzn has 4 rows.

## The factor rpmi_t3 has 20 rows.

## The factor rpmi_t3_lowcu has 4 rows.

## The factor rpmi_t3_highcu has 4 rows.

## The factor rpmi_t3_lowzn has 4 rows.

## The factor rpmi_t3_highzn has 4 rows.

varpart_test <- varpart(expt=rpmi_filt, predictor=NULL,
                        factors=c("coverage", "replicate", "time", "cuzn", "medium"))

## varpart sees only 1 batch, adjusting the model accordingly.

## Attempting mixed linear model with: ~  (1|coverage) + (1|replicate) + (1|time) + (1|cuzn) + (1|medium)

## Fitting the expressionset to the model, this is slow.

## Projected run time: ~ 0.2 min

## Placing factor: coverage at the beginning of the model.

varpart_test$partition_plot

varpart_test$percent_plot
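
The model in the log above, `~ (1|coverage) + (1|replicate) + ...`, is a mixed linear model fitted per gene. As a much simpler illustration of attributing a share of variance to a factor, the base R sketch below computes the share of one simulated gene's sum of squares explained by a single factor with a one-way ANOVA. This is a swapped-in simplification, not the mixed model, and all names and values are hypothetical.

```r
set.seed(1)
## Hypothetical factor with 3 levels and 4 samples per level.
time <- factor(rep(c("t1", "t2", "t3"), each=4))
## Hypothetical expression for one gene: the level means differ, plus noise.
expr <- c(t1=5, t2=7, t3=9)[as.character(time)] + rnorm(12, sd=0.5)
## One-way ANOVA table: the first row is the factor, the second the residual.
tab <- anova(lm(expr ~ time))
ss <- tab[["Sum Sq"]]
## Share of the total sum of squares attributed to the factor.
fraction_time <- ss[1] / sum(ss)
fraction_time
```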

surrogate_test <- compare_surrogate_estimates(rpmi_filt)

## There is 1 batch in the data, fitting condition+batch will fail.

## There is just one batch in this data.

## The be method chose 4 surrogate variable(s).

## Attempting pca surrogate estimation with 4 surrogates.

## There is just one batch in this data.

## The be method chose 4 surrogate variable(s).

## Attempting sva supervised surrogate estimation with 4 surrogates.

## There is just one batch in this data.

## The be method chose 3 surrogate variable(s).

## Attempting sva unsupervised surrogate estimation with 3 surrogates.

## There is just one batch in this data.

## The be method chose 4 surrogate variable(s).

## Attempting ruvseq supervised surrogate estimation with 4 surrogates.

## There is just one batch in this data.

## The be method chose 4 surrogate variable(s).

## Attempting ruvseq residual surrogate estimation with 4 surrogates.

## There is just one batch in this data.

## The be method chose 4 surrogate variable(s).

## Attempting ruvseq empirical surrogate estimation with 4 surrogates.

## There is just one batch in this data.

## Warning in cor(first_svs): the standard deviation is zero

## 1/8: Performing lmFit(data) etc. with null in the model.

## A friendly reminder that there is only 1 batch in the data.

## 3/8: Performing lmFit(data) etc. with + batch_adjustments$pca in the model.

## 4/8: Performing lmFit(data) etc. with + batch_adjustments$sva_sup in the model.

## 5/8: Performing lmFit(data) etc. with + batch_adjustments$sva_unsup in the model.

## 6/8: Performing lmFit(data) etc. with + batch_adjustments$ruv_sup in the model.

## 7/8: Performing lmFit(data) etc. with + batch_adjustments$ruv_resid in the model.

## 8/8: Performing lmFit(data) etc. with + batch_adjustments$ruv_emp in the model.

surrogate_test$sva_unsupervised_adjust$svs_sample

rpmi_metrics$libsize

## Given the contribution of coverage in the variancePartition results above, one might
## assume that the library sizes would correspond to the surrogates detected by sva and friends.
## This does not appear to be the case.
surrogate_test$plot

## It looks like the various surrogate estimators mostly agree on this data.
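
One way to put a number on the library-size observation above is to correlate per-sample library size with a surrogate variable. The base R sketch below does this on made-up vectors; `toy_libsizes` and `toy_sv` are hypothetical and are not taken from `rpmi_metrics` or `surrogate_test`.

```r
## Hypothetical values for 6 samples; neither vector comes from the real data.
toy_libsizes <- c(1.2e6, 3.1e6, 4.4e6, 2.0e6, 2.7e6, 3.6e6)
toy_sv <- c(0.41, -0.38, 0.40, -0.42, 0.39, -0.40)
## Spearman correlation is rank based, so a few very large libraries
## do not dominate the result.
libsize_sv_cor <- cor(log2(toy_libsizes), toy_sv, method="spearman")
libsize_sv_cor
```

A coefficient near zero, as in this toy case, would be consistent with the surrogates capturing something other than sequencing depth.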

message(paste0("This is hpgltools commit: ", get_git_commit()))

## If you wish to reproduce this exact build of hpgltools, invoke the following:

## > git clone http://github.com/abelew/hpgltools.git

## > git reset c730ef178f8e57bbf3819e21cf5e6cfe879e6328

## R> packrat::restore()

## This is hpgltools commit: Fri Jul 13 17:21:39 2018 -0400: c730ef178f8e57bbf3819e21cf5e6cfe879e6328

pander::pander(sessionInfo())

R version 3.5.1 (2018-07-02)

Platform: x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=en_US.utf8, LC_NUMERIC=C, LC_TIME=en_US.utf8, LC_COLLATE=en_US.utf8, LC_MONETARY=en_US.utf8, LC_MESSAGES=en_US.utf8, LC_PAPER=en_US.utf8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.utf8 and LC_IDENTIFICATION=C

attached base packages: parallel, stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: ruv(v.0.9.7), variancePartition(v.1.10.0), Biobase(v.2.40.0), BiocGenerics(v.0.26.0), foreach(v.1.4.4), ggplot2(v.3.0.0) and hpgltools(v.2018.03)

loaded via a namespace (and not attached): Rtsne(v.0.13), minqa(v.1.2.4), colorspace(v.1.3-2), hwriter(v.1.3.2), colorRamps(v.2.3), rprojroot(v.1.3-2), corpcor(v.1.6.9), XVector(v.0.20.0), GenomicRanges(v.1.32.4), base64enc(v.0.1-3), roxygen2(v.6.0.1), ggrepel(v.0.8.0), bit64(v.0.9-7), AnnotationDbi(v.1.42.1), xml2(v.1.2.0), R.methodsS3(v.1.7.1), codetools(v.0.2-15), splines(v.3.5.1), doParallel(v.1.0.11), DESeq(v.1.32.0), geneplotter(v.1.58.0), knitr(v.1.20), nloptr(v.1.0.4), Rsamtools(v.1.32.2), pbkrtest(v.0.4-7), annotate(v.1.58.0), R.oo(v.1.22.0), compiler(v.3.5.1), httr(v.1.3.1), backports(v.1.1.2), assertthat(v.0.2.0), Matrix(v.1.2-14), lazyeval(v.0.2.1), limma(v.3.36.2), htmltools(v.0.3.6), prettyunits(v.1.0.2), tools(v.3.5.1), bindrcpp(v.0.2.2), gtable(v.0.2.0), glue(v.1.2.0), GenomeInfoDbData(v.1.1.0), reshape2(v.1.4.3), dplyr(v.0.7.6), ShortRead(v.1.38.0), Rcpp(v.0.12.17), Biostrings(v.2.48.0), gdata(v.2.18.0), preprocessCore(v.1.42.0), nlme(v.3.1-137), rtracklayer(v.1.40.3), iterators(v.1.0.10), stringr(v.1.3.1), testthat(v.2.0.0), openxlsx(v.4.1.0), lme4(v.1.1-17), gtools(v.3.8.1), devtools(v.1.13.6), statmod(v.1.4.30), XML(v.3.98-1.12), edgeR(v.3.22.3), directlabels(v.2018.05.22), zlibbioc(v.1.26.0), MASS(v.7.3-50), scales(v.0.5.0), aroma.light(v.3.10.0), hms(v.0.4.2), SummarizedExperiment(v.1.10.1), RColorBrewer(v.1.1-2), yaml(v.2.1.19), memoise(v.1.1.0), RUVSeq(v.1.14.0), gridExtra(v.2.3), pander(v.0.6.2), biomaRt(v.2.36.1), latticeExtra(v.0.6-28), stringi(v.1.2.3), RSQLite(v.2.1.1), genefilter(v.1.62.0), S4Vectors(v.0.18.3), corrplot(v.0.84), GenomicFeatures(v.1.32.0), caTools(v.1.17.1), zip(v.1.0.0), BiocParallel(v.1.14.2), GenomeInfoDb(v.1.16.0), rlang(v.0.2.1), pkgconfig(v.2.0.1), commonmark(v.1.5), matrixStats(v.0.53.1), bitops(v.1.0-6), evaluate(v.0.10.1), lattice(v.0.20-35), purrr(v.0.2.5), bindr(v.0.1.1), GenomicAlignments(v.1.16.0), labeling(v.0.3), bit(v.1.1-14), tidyselect(v.0.2.4), plyr(v.1.8.4), magrittr(v.1.5), R6(v.2.2.2), 
IRanges(v.2.14.10), gplots(v.3.0.1), DelayedArray(v.0.6.1), DBI(v.1.0.0), pillar(v.1.3.0), withr(v.2.1.2), mgcv(v.1.8-24), survival(v.2.42-6), RCurl(v.1.95-4.11), EDASeq(v.2.14.1), tibble(v.1.4.2), crayon(v.1.3.4), KernSmooth(v.2.23-15), rmarkdown(v.1.10), progress(v.1.2.0), locfit(v.1.5-9.1), grid(v.3.5.1), sva(v.3.28.0), data.table(v.1.11.4), blob(v.1.1.1), digest(v.0.6.15), xtable(v.1.8-2), R.utils(v.2.6.0), stats4(v.3.5.1), munsell(v.0.5.0) and quadprog(v.1.5-5)

this_save <- paste0(gsub(pattern="\\.Rmd", replace="", x=rmd_file), "-v", ver, ".rda.xz")
message(paste0("Saving to ", this_save))

## Saving to 02_sample_estimation-v20180717.rda.xz

tmp <- sm(saveme(filename=this_save))


Sample estimation for TNSeq of Spyogenes (2017 including Andrew).

atb

2018-07-17
