1 Sample Estimation, Macrophages: 20180822

This document analyzes RNAseq data from both human and parasite during the infection of human macrophages. Several strains of L. panamensis were used, and the samples are divided into three groups: ‘self-healing’, ‘chronic’, and ‘uninfected’. Two separate sets of libraries were generated: an earlier set with greater coverage, and a later set containing only 1 uninfected sample and 2-3 chronic samples.

2 Figure S1

Figure S1 presents publication-quality versions of the sample metrics. The normalization chosen is batch-in-model.

First, however, we will make some plots of the raw data.

Sample names follow the pattern ‘infectionstate_strainnumber’, for example chr_7721.
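The naming scheme can be illustrated with a hypothetical helper, `parse_sample_name`, written in Python (the analysis itself was run in R with hpgltools). It splits a name into its infection state and suffix, then attaches the group name and the plot color. The abbreviations and colors are the ones in the PCA table later in this report. For uninfected samples (uninf_1, uninf_2) the suffix is a replicate number rather than a strain, so the field is called `id`.

```python
# Hypothetical helper: abbreviation -> (group name, plot color), as used in this report.
CONDITIONS = {
    "uninf": ("uninfected", "#009900"),
    "sh": ("self-healing", "#000099"),
    "chr": ("chronic", "#990000"),
}


def parse_sample_name(name):
    """Split 'infectionstate_strainnumber' (e.g. 'chr_7721') into its parts."""
    state, _, suffix = name.partition("_")
    if state not in CONDITIONS or not suffix:
        raise ValueError(f"unexpected sample name: {name!r}")
    group, color = CONDITIONS[state]
    return {"state": state, "group": group, "id": suffix, "color": color}


print(parse_sample_name("chr_7721"))
print(parse_sample_name("uninf_2"))
```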

  • Panel A: Library sizes.
  • Panel B: Distance heatmap of the raw data.
  • Panel C: PCA
  • Panel D: TSNE
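Panels A and B reduce to two small computations on a genes x samples count matrix: per-sample column sums, and pairwise distances between samples after a log transform. Below is a minimal NumPy sketch. The log2 CPM transform, the prior of 1, and Euclidean distance are assumptions for illustration. The hpgltools plotting functions may use a different transformation or a correlation-based distance.

```python
import numpy as np


def library_sizes(counts):
    """Panel A: total counts per sample (columns of a genes x samples matrix)."""
    return counts.sum(axis=0)


def log2_cpm(counts, prior=1.0):
    """log2 counts per million; the prior avoids taking log2(0)."""
    return np.log2((counts + prior) / library_sizes(counts) * 1e6)


def sample_distances(expr):
    """Panel B: Euclidean distance between every pair of samples (columns)."""
    diff = expr[:, :, None] - expr[:, None, :]
    return np.sqrt((diff ** 2).sum(axis=0))


# Toy counts: 3 genes x 3 samples (made-up numbers).
counts = np.array([[10, 20, 0], [5, 5, 10], [85, 75, 90]])
dist = sample_distances(log2_cpm(counts))
print(library_sizes(counts))
print(np.round(dist, 3))
```

The distance matrix is what a heatmap such as panel B would display, usually after hierarchical clustering of its rows and columns.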

First, look for clustering patterns in the raw data, using all features and then only the CDS regions.
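Comparing all features against CDS-only features amounts to running the same PCA on two row subsets. The sketch below uses a hypothetical `pca_samples` function with an optional boolean row mask. The `is_cds` mask is also hypothetical; in practice it would come from the gene annotation used for counting. The sketch centers each gene, takes the SVD, and returns the right singular vectors as sample coordinates, along with the percent variance of each component. Signs of singular vectors are arbitrary, so plots from different runs may be mirrored. t-SNE (panel D) is not sketched.

```python
import numpy as np


def pca_samples(expr, keep=None, n_pcs=2):
    """PCA of samples (columns) of a genes x samples log-scale matrix.

    keep: optional boolean mask over rows (e.g. True for CDS features).
    Returns (sample coordinates, percent variance) for the first n_pcs components.
    """
    if keep is not None:
        expr = expr[keep, :]
    centered = expr - expr.mean(axis=1, keepdims=True)  # center each gene
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    pct = 100 * s**2 / np.sum(s**2)
    return vt[:n_pcs].T, pct[:n_pcs]


# Toy data: 50 features x 6 samples; pretend the first 30 rows are CDS.
rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 6))
is_cds = np.arange(50) < 30
coords_all, pct_all = pca_samples(expr)
coords_cds, pct_cds = pca_samples(expr, keep=is_cds)
print(np.round(pct_all, 1), np.round(pct_cds, 1))
```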

2.1 Write S1 images

Once the experiment object has been built with the various normalizations applied, we can use it to print and view some representative plots.

## Writing the image to: images/figure_s1a_cds.pdf and calling dev.off().

## Writing the image to: images/figure_s1b_cds.pdf and calling dev.off().

## Writing the image to: images/figure_s1c_cds.pdf and calling dev.off().

## Writing the image to: images/figure_s1d_cds.pdf and calling dev.off().

3 Figure 1

  • Figure 1a: Distance heatmap of the normalized data.
  • Figure 1b: PCA of the normalized data.

As with Figure S1 above, I print the Figure 1 images.

## Writing the image to: images/figure_1a_cds.pdf and calling dev.off().

## Writing the image to: images/figure_1b_cds.pdf and calling dev.off().

sample    sampleid  condition  batch  batch_int  colors   labels       PC1     PC2    pc_1    pc_2    pc_3    pc_4    pc_5    pc_6    pc_7    pc_8    pc_9   pc_10
uninf_1   HPGL0241  uninf      a      1          #009900  HPGL0241 -0.5987 -0.2058 -0.5987 -0.2058  0.2790 -0.1521  0.4206  0.1964  0.2007  0.2818 -0.0099  0.2684
uninf_2   HPGL0637  uninf      b      2          #009900  HPGL0637  0.1306 -0.4520  0.1306 -0.4520  0.5594  0.2434 -0.2295  0.1217 -0.0561 -0.0296  0.0009 -0.4940
sh_2271   HPGL0242  sh         a      1          #000099  HPGL0242 -0.3149  0.0726 -0.3149  0.0726  0.0099 -0.1401 -0.5103 -0.0125 -0.0491 -0.4678 -0.4939  0.2432
sh_2272   HPGL0243  sh         a      1          #000099  HPGL0243 -0.1489 -0.2395 -0.1489 -0.2395 -0.1900  0.0505 -0.0781 -0.2077 -0.3461 -0.2708  0.7086  0.2155
sh_2189   HPGL0638  sh         b      2          #000099  HPGL0638  0.0391 -0.1299  0.0391 -0.1299 -0.3943 -0.6842  0.0739  0.0102  0.0430  0.0219 -0.0258 -0.5085
sh_1022   HPGL0639  sh         b      2          #000099  HPGL0639 -0.3629  0.5818 -0.3629  0.5818 -0.1045  0.4153  0.1574 -0.1299 -0.0116 -0.0384  0.0331 -0.4595
chr_5433  HPGL0244  chr        a      1          #990000  HPGL0244  0.4558  0.1832  0.4558  0.1832  0.2411 -0.0841  0.5662  0.0524 -0.0595 -0.4941 -0.0872  0.1547
chr_1320  HPGL0245  chr        a      1          #990000  HPGL0245  0.2144 -0.1355  0.2144 -0.1355 -0.2775  0.2678 -0.1065 -0.0517  0.8093 -0.0492  0.0875  0.1302
chr_2504  HPGL0246  chr        a      1          #990000  HPGL0246  0.2220  0.3663  0.2220  0.3663  0.3686 -0.2812 -0.2576 -0.4617  0.0324  0.4324  0.1288  0.1630
chr_5430  HPGL0247  chr        a      1          #990000  HPGL0247  0.1616 -0.2929  0.1616 -0.2929 -0.3317  0.3060  0.1752 -0.2835 -0.3661  0.3492 -0.4554  0.1385
chr_5397  HPGL0248  chr        a      1          #990000  HPGL0248  0.2020  0.2518  0.2020  0.2518 -0.1600  0.0588 -0.2114  0.7664 -0.1969  0.2647  0.1134  0.1486
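A quick check on the table's PC columns, using the pc_1 and pc_2 values copied from it. To four-decimal rounding, each column has unit length and sums to zero, and the two columns are orthogonal. This is consistent with the coordinates being right singular vectors of a gene-centered matrix. That is my reading of the numbers, not something stated in the report. PC1 and PC2 repeat pc_1 and pc_2.

```python
import math

# pc_1 and pc_2 copied from the table above, in row order (uninf_1 ... chr_5397).
pc_1 = [-0.5987, 0.1306, -0.3149, -0.1489, 0.0391, -0.3629,
        0.4558, 0.2144, 0.2220, 0.1616, 0.2020]
pc_2 = [-0.2058, -0.4520, 0.0726, -0.2395, -0.1299, 0.5818,
        0.1832, -0.1355, 0.3663, -0.2929, 0.2518]


def summarize(a, b):
    """Length, sum and mutual dot product of two coordinate columns."""
    return {
        "norm_1": math.sqrt(sum(x * x for x in a)),
        "norm_2": math.sqrt(sum(x * x for x in b)),
        "sum_1": sum(a),
        "sum_2": sum(b),
        "dot_12": sum(x * y for x, y in zip(a, b)),
    }


stats = summarize(pc_1, pc_2)
for key, value in stats.items():
    print(f"{key}: {value:.4f}")
```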

4 Sample metrics

The following is mostly used to compare different methodologies, so it is unlikely to be useful to most readers.

The various metrics used to describe and examine the data are computed once before and once after normalization.

Other analyses suggest that two samples may have been switched. To see what the data would look like if that were true, we can swap those two samples in a separate experimental design.
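A minimal sketch of building the alternative design, assuming the design is held as a mapping from sample name to metadata. `swap_samples` is a hypothetical helper. The pair swapped in the usage lines is illustrative only, because the report does not say which two samples are suspected. The original design is left untouched, so both versions can be analyzed side by side.

```python
import copy


def swap_samples(design, sample_a, sample_b, fields=("condition",)):
    """Return a copy of design with the given fields exchanged between two samples.

    design maps sample name -> dict of metadata; the input is not modified.
    """
    swapped = copy.deepcopy(design)
    for field in fields:
        swapped[sample_a][field] = design[sample_b][field]
        swapped[sample_b][field] = design[sample_a][field]
    return swapped


# Two rows of the sample sheet (values from the PCA table); the pair is illustrative.
design = {
    "sh_2271": {"sampleid": "HPGL0242", "condition": "sh", "batch": "a"},
    "chr_5433": {"sampleid": "HPGL0244", "condition": "chr", "batch": "a"},
}
alt_design = swap_samples(design, "sh_2271", "chr_5433")
print(alt_design["sh_2271"]["condition"], design["sh_2271"]["condition"])
```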

5 TODO 201611 #4

PBMC: variance partition, human. “Are there any genes whose variance we can ascribe to condition?” That set of genes is the most interesting: use the variance partition table, pull the top 200 genes, and report the % variance explained by condition for each of them.
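The TODO can be sketched as follows. It assumes the variance partition results are available as one row per gene, giving the fraction of variance attributed to each factor. In R, variancePartition's fitExtractVarPartModel produces such per-gene fractions. Ranking by the condition fraction is my reading of "top 200". `top_condition_genes` and the toy gene names are hypothetical.

```python
def top_condition_genes(fractions, n=200, factor="condition"):
    """Return (gene, percent of variance) for the n genes ranked highest by factor.

    fractions maps gene -> {factor name: fraction of variance in [0, 1]}.
    """
    ranked = sorted(fractions.items(), key=lambda item: item[1][factor], reverse=True)
    return [(gene, 100 * parts[factor]) for gene, parts in ranked[:n]]


# Toy table with made-up genes and fractions (each row sums to 1).
toy = {
    "gene_a": {"condition": 0.25, "batch": 0.50, "Residuals": 0.25},
    "gene_b": {"condition": 0.50, "batch": 0.125, "Residuals": 0.375},
    "gene_c": {"condition": 0.125, "batch": 0.25, "Residuals": 0.625},
}
top = top_condition_genes(toy, n=2)
print(top)
```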

6 Variance Partition

In the following, I attempt to estimate the variance associated with each experimental factor in the macrophage data, using the reads mapped against the human transcriptome.

## There were 25, now there are 7 samples.
## The new colors are a character, changing according to condition.

Now let's try removing some of the surrogates. The two primary candidates are the strain, which I proxy with three columns (snpclade, snpcladev2, and snpcladev3), and the batch, which in this data set is either a or b.

In theory, sva should pick up both of those in one invocation.

6.1 Start with a pca from sva


6.2 Try limma’s removeBatchEffect

Another option is to use limmaresid to explicitly regress out both columns (strain and batch).
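The idea behind limma's removeBatchEffect() can be sketched as a per-gene linear fit. The protected factor (condition) and the nuisance factors (batch, a strain clade column) are fitted together, and only the fitted nuisance part is subtracted. This is a generic NumPy illustration, not the code behind limmaresid or limma. Like limma, it uses sum-to-zero coding for the nuisance factors. Condition is protected only because it is in the design, which limma handles through its `design` argument. limma itself takes at most two factors (`batch`, `batch2`) plus covariates, whereas this sketch accepts a list. If a nuisance factor is confounded with condition, no fit can separate the two.

```python
import numpy as np


def sum_coded(labels):
    """Sum-to-zero contrast columns for a factor (the last sorted level gets -1)."""
    levels = sorted(set(labels))
    cols = [[1.0 if lab == lev else -1.0 if lab == levels[-1] else 0.0 for lab in labels]
            for lev in levels[:-1]]
    return np.array(cols, dtype=float).T.reshape(len(labels), len(levels) - 1)


def treatment_coded(labels):
    """Intercept plus indicator columns for every level but the first."""
    levels = sorted(set(labels))
    ind = [np.array([1.0 if lab == lev else 0.0 for lab in labels]) for lev in levels[1:]]
    return np.column_stack([np.ones(len(labels))] + ind)


def remove_nuisance(expr, condition, nuisance):
    """Subtract fitted nuisance effects from a genes x samples log-scale matrix.

    condition: per-sample labels whose effect is kept (protected).
    nuisance: list of per-sample label lists, e.g. [batch, snpclade].
    """
    keep = treatment_coded(condition)
    drop = np.column_stack([sum_coded(f) for f in nuisance])
    design = np.column_stack([keep, drop])
    # Fit every gene at once: expr.T (samples x genes) ~ design @ coef.
    coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    nuisance_coef = coef[keep.shape[1]:]
    return expr - (drop @ nuisance_coef).T


condition = ["uninf", "uninf", "sh", "sh", "chr", "chr"]
batch = ["a", "b", "a", "b", "a", "b"]
# One toy gene: a condition effect plus +1 in batch a and -1 in batch b.
expr = np.array([[1.0, -1.0, 3.0, 1.0, 6.0, 4.0]])
cleaned = remove_nuisance(expr, condition, [batch])
print(np.round(cleaned, 6))
```

With this balanced toy design, the batch term is removed exactly and the per-condition differences survive.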

A third option is to run the PCA-based correction in two separate invocations, one per factor.

6.2.1 Show the plots from above.

## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset 95ee596d3b8007c0f633314bc34b759e00fb3132
## This is hpgltools commit: Mon Feb 4 11:44:16 2019 -0500: 95ee596d3b8007c0f633314bc34b759e00fb3132
## Saving to 02_estimation_macrophage_20180822-v20180822.rda.xz

R version 3.5.2 (2018-12-20)

Platform: x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=en_US.utf8, LC_NUMERIC=C, LC_TIME=en_US.utf8, LC_COLLATE=en_US.utf8, LC_MONETARY=en_US.utf8, LC_MESSAGES=en_US.utf8, LC_PAPER=en_US.utf8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.utf8 and LC_IDENTIFICATION=C

attached base packages: parallel, stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: variancePartition(v.1.12.1), ruv(v.0.9.7), bindrcpp(v.0.2.2), hpgltools(v.2018.11), Biobase(v.2.42.0) and BiocGenerics(v.0.28.0)

loaded via a namespace (and not attached): backports(v.1.1.3), fastmatch(v.1.1-0), plyr(v.1.8.4), igraph(v.1.2.2), lazyeval(v.0.2.1), splines(v.3.5.2), BiocParallel(v.1.16.5), usethis(v.1.4.0), GenomeInfoDb(v.1.18.1), ggplot2(v.3.1.0), urltools(v.1.7.1), sva(v.3.30.1), digest(v.0.6.18), foreach(v.1.4.4), htmltools(v.0.3.6), GOSemSim(v.2.8.0), viridis(v.0.5.1), GO.db(v.3.7.0), gdata(v.2.18.0), magrittr(v.1.5), memoise(v.1.1.0), doParallel(v.1.0.14), openxlsx(v.4.1.0), limma(v.3.38.3), remotes(v.2.0.2), Biostrings(v.2.50.2), annotate(v.1.60.0), matrixStats(v.0.54.0), enrichplot(v.1.2.0), prettyunits(v.1.0.2), colorspace(v.1.4-0), blob(v.1.1.1), ggrepel(v.0.8.0), xfun(v.0.4), dplyr(v.0.7.8), jsonlite(v.1.6), callr(v.3.1.1), crayon(v.1.3.4), RCurl(v.1.95-4.11), graph(v.1.60.0), genefilter(v.1.64.0), lme4(v.1.1-19), bindr(v.0.1.1), survival(v.2.43-3), iterators(v.1.0.10), glue(v.1.3.0), gtable(v.0.2.0), zlibbioc(v.1.28.0), XVector(v.0.22.0), UpSetR(v.1.3.3), DelayedArray(v.0.8.0), pkgbuild(v.1.0.2), scales(v.1.0.0), DOSE(v.3.8.2), edgeR(v.3.24.3), DBI(v.1.0.0), Rcpp(v.1.0.0), viridisLite(v.0.3.0), xtable(v.1.8-3), progress(v.1.2.0), units(v.0.6-2), gridGraphics(v.0.3-0), bit(v.1.1-14), europepmc(v.0.3), preprocessCore(v.1.45.0), OrganismDbi(v.1.24.0), stats4(v.3.5.2), httr(v.1.4.0), fgsea(v.1.8.0), gplots(v.3.0.1), RColorBrewer(v.1.1-2), pkgconfig(v.2.0.2), XML(v.3.98-1.16), farver(v.1.1.0), locfit(v.1.5-9.1), labeling(v.0.3), ggplotify(v.0.0.3), tidyselect(v.0.2.5), rlang(v.0.3.1), reshape2(v.1.4.3), AnnotationDbi(v.1.44.0), munsell(v.0.5.0), tools(v.3.5.2), cli(v.1.0.1), RSQLite(v.2.1.1), ggridges(v.0.5.1), devtools(v.2.0.1), evaluate(v.0.12), stringr(v.1.3.1), yaml(v.2.2.0), processx(v.3.2.1), knitr(v.1.21), bit64(v.0.9-7), fs(v.1.2.6), pander(v.0.6.3), zip(v.1.0.0), caTools(v.1.17.1.1), purrr(v.0.2.5), ggraph(v.1.0.2), packrat(v.0.5.0), RBGL(v.1.58.1), nlme(v.3.1-137), xml2(v.1.2.0), DO.db(v.2.9), biomaRt(v.2.38.0), compiler(v.3.5.2), pbkrtest(v.0.4-7), 
rstudioapi(v.0.9.0), testthat(v.2.0.1), tibble(v.2.0.1), tweenr(v.1.0.1), stringi(v.1.2.4), highr(v.0.7), ps(v.1.3.0), GenomicFeatures(v.1.34.1), desc(v.1.2.0), lattice(v.0.20-38), Matrix(v.1.2-15), nloptr(v.1.2.1), pillar(v.1.3.1), BiocManager(v.1.30.4), triebeard(v.0.3.0), corpcor(v.1.6.9), data.table(v.1.12.0), cowplot(v.0.9.4), bitops(v.1.0-6), rtracklayer(v.1.42.1), GenomicRanges(v.1.34.0), qvalue(v.2.14.1), colorRamps(v.2.3), directlabels(v.2018.05.22), R6(v.2.3.0), KernSmooth(v.2.23-15), gridExtra(v.2.3), IRanges(v.2.16.0), sessioninfo(v.1.1.1), codetools(v.0.2-16), MASS(v.7.3-51.1), gtools(v.3.8.1), assertthat(v.0.2.0), pkgload(v.1.0.2), SummarizedExperiment(v.1.12.0), rprojroot(v.1.3-2), withr(v.2.1.2), GenomicAlignments(v.1.18.1), Rsamtools(v.1.34.0), S4Vectors(v.0.20.1), GenomeInfoDbData(v.1.2.0), mgcv(v.1.8-26), hms(v.0.4.2), clusterProfiler(v.3.10.1), quadprog(v.1.5-5), grid(v.3.5.2), tidyr(v.0.8.2), minqa(v.1.2.4), rvcheck(v.0.1.3), rmarkdown(v.1.11), Rtsne(v.0.15), ggforce(v.0.1.3) and base64enc(v.0.1-3)

