1 Annotation version: 20190205

1.1 Genome annotation input

There are a few ways to import annotation data into R. The following are two attempts; the second is the one currently used in these analyses.

1.2 OrganismDb

Since this document was originally written, I have made substantial changes to how I create, load, and manipulate the EuPathDB annotation data. As a result, this section needs to be significantly reworked.

AnnotationHub is the newer and fancier version of what OrganismDb does. Keith already made these for the parasites, though, so let's try to use one of those.

Assuming the above packages got created, we may load them and extract the annotation data.

## Starting metadata download.
## Finished metadata download.
## Found the following hits: Leishmania major strain Friedlin, Leishmania major strain LV39c5, Leishmania major strain SD 75.1, choosing the first.
## org.Lmajor.Friedlin.v41.eg.db
## Starting metadata download.
## Finished metadata download.
## Found the following hits: Leishmania panamensis MHOM/COL/81/L13, Leishmania panamensis strain MHOM/PA/94/PSC-1, choosing the first.
## org.Lpanamensis.MHOMCOL81L13.v41.eg.db
## Starting metadata download.
## Finished metadata download.
## Found the following hits: Leishmania braziliensis MHOM/BR/75/M2903, Leishmania braziliensis MHOM/BR/75/M2904, choosing the first.
## org.Lbraziliensis.MHOMBR75M2903.v41.eg.db

1.4 Extracting Cell Types

Maria Adelaida requested adding the xCell cell types to the data.

##             Length Class             Mode     
## spill           3  -none-            list     
## spill.array     3  -none-            list     
## signatures    489  GeneSetCollection list     
## genes       10808  -none-            character
## Loading required package: annotate
## Loading required package: XML
## Loading required package: graph
## 
## Attaching package: 'graph'
## The following object is masked from 'package:XML':
## 
##     addNode
## setName: aDC%HPCA%1.txt 
## geneIds: C1QA, C1QB, ..., CCL22 (total: 8)
## geneIdType: Null
## collectionType: Null 
## setIdentifier: PEDS-092FVH8-LT:623:Tue Jun  6 14:36:33 2017:2
## description: 
## organism: 
## pubMedIds: 
## urls: 
## contributor: 
## setVersion: 0.0.1
## creationDate:
##  [1] "aDC%HPCA%1.txt"          "aDC%HPCA%2.txt"         
##  [3] "aDC%HPCA%3.txt"          "aDC%IRIS%1.txt"         
##  [5] "aDC%IRIS%2.txt"          "aDC%IRIS%3.txt"         
##  [7] "Adipocytes%ENCODE%1.txt" "Adipocytes%ENCODE%2.txt"
##  [9] "Adipocytes%ENCODE%3.txt" "Adipocytes%FANTOM%1.txt"
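The signature names printed above appear to follow a cell type%source%index.txt pattern. The following is a minimal sketch, in base R only, of how those names could be split into a small lookup table. The reading of the three fields is my assumption from the names themselves, and the column names (cell_type, source, replicate) are hypothetical.

```r
# The first ten signature names, copied from the output above.
signature_names <- c(
  "aDC%HPCA%1.txt", "aDC%HPCA%2.txt", "aDC%HPCA%3.txt",
  "aDC%IRIS%1.txt", "aDC%IRIS%2.txt", "aDC%IRIS%3.txt",
  "Adipocytes%ENCODE%1.txt", "Adipocytes%ENCODE%2.txt",
  "Adipocytes%ENCODE%3.txt", "Adipocytes%FANTOM%1.txt")

# Split each name on the literal "%" separator.
parts <- strsplit(signature_names, "%", fixed = TRUE)
pick <- function(i) vapply(parts, function(x) x[i], character(1))

# Assumed meaning of the fields: cell type, data source, replicate index.
signature_table <- data.frame(
  name = signature_names,
  cell_type = pick(1),
  source = pick(2),
  replicate = as.integer(sub("\\.txt$", "", pick(3))),
  stringsAsFactors = FALSE)

# Count how many signatures each cell type has in this subset.
signatures_per_type <- table(signature_table$cell_type)
print(signatures_per_type)
```

With the full set of 489 names, the same table would give one row per signature and make it easy to list the distinct cell types.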

2 Putting the pieces together

The macrophage experiment has samples across two contexts: the host and the parasite. The following block sets up one experiment for each. If you open the all_samples-species.xlsx files, you will immediately note that a few different attempts were made at ascertaining the most likely experimental factors contributing to the readily apparent batch effects.

2.1 The human transcriptome mappings

Keep in mind that if I change the experimental design with new annotations, I must regenerate the following.

A row of the human experimental design (sample HPGL0241), one field per line:

sampleid                         HPGL0241
pathogenstrain                   none
experimentname                   macrophage
tubelabel                        TM130-Nil (Blue label)
alias                            Nil
condition                        uninf
batch                            a
anotherbatch                     a
snpclade                         undef
snpcladev2                       undef
snpcladev3                       undef
pathogenstrain.1                 none
label                            uninf_1
donor                            d130
time                             undef
pctmappedparasite                undef
pctcategory                      0
state                            uninfected
sourcelab                        Ade
expperson                        Adriana
pathogen                         none
host                             Human
hostcelltype                     Human macs
noofhostcells                    Max 2 mill
infectionperiodhpitimeofharvest  2h - 24h chase period
moiexposure                      NA
parasitespercell                 unknown
pctinf                           unknown
rnangul                          468
rnaqcpassed                      Y
libraryconst                     Wanderson
libqcpassed                      Y
index                            1
descriptonandremarks             Uninfected human macrophages
observation                      NA
lowercaseid                      hpgl0241
humanfile                        preprocessing/hpgl0241/outputs/tophat_hsapiens/accepted_paired.count.xz
parasitefile                     undef
file                             null

2.2 The parasite transcriptome mappings

The first three rows of the parasite experimental design, shown transposed (one column per sample):

sampleid                         HPGL0242                     HPGL0244                     HPGL0245
pathogenstrain                   s2271                        s5433                        s1320
experimentname                   macrophage                   macrophage                   macrophage
tubelabel                        TM130-2271                   TM130-5433                   TM130-1320
alias                            Self-Healing                 Chronic                      Chronic
condition                        sh                           chr                          chr
batch                            a                            a                            a
anotherbatch                     a                            a                            a
snpclade                         white                        blue_self                    multicolor
snpcladev2                       whitepink                    blue                         yellowbrownmulti
snpcladev3                       right                        left                         right
pathogenstrain.1                 s2271                        s5433                        s1320
label                            sh_2271                      chr_5433                     chr_1320
donor                            d130                         d130                         d130
time                             undef                        undef                        undef
pctmappedparasite                30                           15                           40
pctcategory                      3                            1                            4
state                            self_heal                    chronic                      chronic
sourcelab                        Ade                          Ade                          Ade
expperson                        Adriana                      Adriana                      Adriana
pathogen                         Lp                           Lp                           Lp
host                             Human                        Human                        Human
hostcelltype                     Human macs                   Human macs                   Human macs
noofhostcells                    Max 2 mill                   Max 2 mill                   Max 2 mill
infectionperiodhpitimeofharvest  2h - 24h chase period        2h - 24h chase period        2h - 24h chase period
moiexposure                      0.0486111111111111           0.0486111111111111           0.0486111111111111
parasitespercell                 unknown                      unknown                      unknown
pctinf                           unknown                      unknown                      unknown
rnangul                          276                          261                          199
rnaqcpassed                      Y                            Y                            Y
libraryconst                     Wanderson                    Wanderson                    Wanderson
libqcpassed                      Y                            Y                            Y
index                            8                            27                           11
descriptonandremarks             Infected human macrophages.  Infected human macrophages   Infected human macrophages
observation                      NA                           NA                           NA
lowercaseid                      hpgl0242                     hpgl0244                     hpgl0245
humanfile                        preprocessing/hpgl0242/outputs/tophat_hsapiens/accepted_paired.count.xz  preprocessing/hpgl0244/outputs/tophat_hsapiens/accepted_paired.count.xz  preprocessing/hpgl0245/outputs/tophat_hsapiens/accepted_paired.count.xz
parasitefile                     preprocessing/hpgl0242/outputs/tophat_lpanamensis/accepted_paired.count.xz  preprocessing/hpgl0244/outputs/tophat_lpanamensis/accepted_paired.count.xz  preprocessing/hpgl0245/outputs/tophat_lpanamensis/accepted_paired.count.xz
file                             null                         null                         null
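The two designs above differ in which count file they point at, and the uninfected sample has no parasite file. The following is a rough sketch, in base R, of how one combined sample sheet might be split into a host design and a parasite design. It assumes that a parasitefile value of "undef" marks samples without parasite counts, which is my reading of the rows shown and not necessarily how the original block does it. Only a few of the columns are reproduced.

```r
# A reduced copy of the four samples shown in the two designs above.
ids <- c("HPGL0241", "HPGL0242", "HPGL0244", "HPGL0245")
lower <- tolower(ids)
design <- data.frame(
  sampleid = ids,
  condition = c("uninf", "sh", "chr", "chr"),
  humanfile = sprintf(
    "preprocessing/%s/outputs/tophat_hsapiens/accepted_paired.count.xz", lower),
  parasitefile = c("undef", sprintf(
    "preprocessing/%s/outputs/tophat_lpanamensis/accepted_paired.count.xz",
    lower[-1])),
  row.names = ids,
  stringsAsFactors = FALSE)

# Host design: every sample has human counts.
host_design <- design[, c("sampleid", "condition", "humanfile")]

# Parasite design: drop samples without parasite counts
# (assumption: these are marked "undef" in the parasitefile column).
has_parasite <- design$parasitefile != "undef"
parasite_design <- design[has_parasite, c("sampleid", "condition", "parasitefile")]

print(rownames(parasite_design))
```

Keeping one sheet and deriving both designs from it means a change to the annotations only has to be made in one place before regenerating the experiments.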

3 Supplemental Table 1

Table S1 is going to be a summary of the metadata in all_samples-combined. It may also include some of the numbers regarding mapping %, etc.

Wanted columns:

  • Sample ID: HPGLxxxx
  • Donor Code: TM130 or PG1xx
  • Cell Type: Macrophage or PBMC
  • Infection Status: Infected or Uninfected
  • Disease Outcome: Chronic or Self-Healing or NA
  • Batch: A or B (macrophage); NA for PBMC
  • Number of reads that passed Illumina filter
  • Number of reads after trimming
  • Number of reads mapped - human
  • % reads mapped - human
  • Number of reads mapped - L.panamensis
  • % reads mapped - L.panamensis
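The two percentage columns in the list above can be derived from the read counts. Here is a small sketch of that arithmetic in base R. The read counts are invented placeholders, the column names are hypothetical, and using the number of reads after trimming as the denominator is an assumption; the list does not say which total the percentages are taken against.

```r
# Placeholder counts, invented purely for illustration.
counts <- data.frame(
  sample_id = c("HPGLxxxx_a", "HPGLxxxx_b"),
  reads_trimmed = c(1000000, 2000000),
  mapped_human = c(900000, 1500000),
  mapped_lpanamensis = c(50000, 0),
  stringsAsFactors = FALSE)

# Percent of reads mapped, rounded to two decimals.
# Assumption: the denominator is the number of reads after trimming.
pct_mapped <- function(mapped, total) {
  round(100 * mapped / total, 2)
}

counts$pct_human <- pct_mapped(counts$mapped_human, counts$reads_trimmed)
counts$pct_lpanamensis <- pct_mapped(counts$mapped_lpanamensis, counts$reads_trimmed)

print(counts)
```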

Use the Tcruzi colors.

  • A1 is a large title: “Macrophage Samples”
  • Row 2 contains the blue column headings
  • Rows 3 through m contain the macrophage metadata
  • Row m+1 is blank
  • Row m+2 is a large title: “PBMC Samples”
  • Rows m+3 through n contain the PBMC metadata
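The row layout above can be written down as a small helper that turns the two sample counts into row indices. This is only a sketch in base R; the function name table_s1_rows and the sample counts in the example call are hypothetical. It follows the list literally, so the PBMC block has a title row but no separate heading row.

```r
# Compute worksheet row indices for the Table S1 layout described above.
table_s1_rows <- function(n_macrophage, n_pbmc) {
  stopifnot(n_macrophage >= 1, n_pbmc >= 1)
  m <- 2 + n_macrophage              # last row of macrophage metadata
  n <- m + 2 + n_pbmc                # last row of PBMC metadata
  list(
    macrophage_title = 1,            # A1: "Macrophage Samples"
    column_headings = 2,             # blue column headings
    macrophage_data = seq(3, m),     # rows 3 through m
    blank = m + 1,                   # spacer row
    pbmc_title = m + 2,              # "PBMC Samples"
    pbmc_data = seq(m + 3, n))       # rows m+3 through n
}

# Example with made-up sample counts: 4 macrophage and 3 PBMC samples.
layout <- table_s1_rows(n_macrophage = 4, n_pbmc = 3)
str(layout)
```

If the table is written with openxlsx, these indices could be passed as the startRow argument of writeData for each block.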

4 End

At this point, we should have everything necessary to perform the various analyses of the four sub-experiments, so save the current data for reuse elsewhere.

The experimental design is available here.

## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset 45b7ec8bad3682886965df63000dc4e98bcbf69f
## This is hpgltools commit: Wed Feb 6 16:38:23 2019 -0500: 45b7ec8bad3682886965df63000dc4e98bcbf69f
## Saving to 01_annotation_20190205-v20190205.rda.xz
## The savefile is: /cbcb/nelsayed-scratch/atb/rnaseq/lpanamensis_2016/savefiles/01_annotation_20190205-v20190205.rda.xz
## Renaming /cbcb/nelsayed-scratch/atb/rnaseq/lpanamensis_2016/savefiles/01_annotation_20190205-v20190205.rda.xz to /cbcb/nelsayed-scratch/atb/rnaseq/lpanamensis_2016/savefiles/01_annotation_20190205-v20190205.rda.xz.01.
## The save string is: con <- pipe(paste0('pxz -T6 > /cbcb/nelsayed-scratch/atb/rnaseq/lpanamensis_2016/savefiles/01_annotation_20190205-v20190205.rda.xz'), 'wb'); save(list=ls(all.names=TRUE, envir=globalenv()),
##      envir=globalenv(), file=con, compress=FALSE); close(con)
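The save string above pipes the saved objects through an external pxz process. For anyone without pxz, a rough equivalent is base R's built-in xz compression. This is a sketch of a swapped-in technique, not what hpgltools does, and the object name and temporary file path are placeholders.

```r
# Save into a temporary file so the sketch does not touch real savefiles.
savefile <- file.path(tempdir(), "example_annotation.rda.xz")

# Use a dedicated environment instead of globalenv() for the example.
example_env <- new.env()
assign("design_ids", c("HPGL0241", "HPGL0242"), envir = example_env)

# Base R alternative to the pxz pipe: single-threaded xz compression.
save(list = ls(example_env, all.names = TRUE), envir = example_env,
     file = savefile, compress = "xz")

# Reload into a fresh environment to check the round trip.
restored <- new.env()
loaded_names <- load(savefile, envir = restored)
print(loaded_names)
```

The trade-off is speed: compress = "xz" runs on one core, while the pxz pipe in the save string uses six threads.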

R version 3.5.2 (2018-12-20)

Platform: x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=en_US.utf8, LC_NUMERIC=C, LC_TIME=en_US.utf8, LC_COLLATE=en_US.utf8, LC_MONETARY=en_US.utf8, LC_MESSAGES=en_US.utf8, LC_PAPER=en_US.utf8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.utf8 and LC_IDENTIFICATION=C

attached base packages: stats4, parallel, stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: Homo.sapiens(v.1.3.1), TxDb.Hsapiens.UCSC.hg19.knownGene(v.3.2.2), org.Hs.eg.db(v.3.7.0), GO.db(v.3.7.0), OrganismDbi(v.1.24.0), GenomicFeatures(v.1.34.1), GenomicRanges(v.1.34.0), GenomeInfoDb(v.1.18.1), GSEABase(v.1.44.0), graph(v.1.60.0), annotate(v.1.60.0), XML(v.3.98-1.16), xCell(v.1.1.0), org.Lbraziliensis.MHOMBR75M2903.v38.eg.db(v.2018.08), org.Lpanamensis.MHOMCOL81L13.v38.eg.db(v.2018.08), org.Lmajor.Friedlin.v38.eg.db(v.2018.08), AnnotationDbi(v.1.44.0), IRanges(v.2.16.0), S4Vectors(v.0.20.1), AnnotationHub(v.2.14.2), hpgltools(v.2018.11), Biobase(v.2.42.0) and BiocGenerics(v.0.28.0)

loaded via a namespace (and not attached): tidyselect(v.0.2.5), lme4(v.1.1-19), RSQLite(v.2.1.1), grid(v.3.5.2), BiocParallel(v.1.16.5), devtools(v.2.0.1), munsell(v.0.5.0), codetools(v.0.2-16), units(v.0.6-2), withr(v.2.1.2), colorspace(v.1.4-0), GOSemSim(v.2.8.0), highr(v.0.7), knitr(v.1.21), rstudioapi(v.0.9.0), DOSE(v.3.8.2), labeling(v.0.3), urltools(v.1.7.1), GenomeInfoDbData(v.1.2.0), bit64(v.0.9-7), farver(v.1.1.0), rprojroot(v.1.3-2), xfun(v.0.4), R6(v.2.3.0), doParallel(v.1.0.14), bitops(v.1.0-6), fgsea(v.1.8.0), gridGraphics(v.0.3-0), DelayedArray(v.0.8.0), assertthat(v.0.2.0), promises(v.1.0.1), scales(v.1.0.0), ggraph(v.1.0.2), enrichplot(v.1.2.0), gtable(v.0.2.0), sva(v.3.30.1), processx(v.3.2.1), rlang(v.0.3.1), genefilter(v.1.64.0), splines(v.3.5.2), rtracklayer(v.1.42.1), lazyeval(v.0.2.1), europepmc(v.0.3), BiocManager(v.1.30.4), yaml(v.2.2.0), reshape2(v.1.4.3), backports(v.1.1.3), httpuv(v.1.4.5.1), qvalue(v.2.14.1), RBGL(v.1.58.1), clusterProfiler(v.3.10.1), tools(v.3.5.2), usethis(v.1.4.0), ggplotify(v.0.0.3), ggplot2(v.3.1.0), gplots(v.3.0.1), RColorBrewer(v.1.1-2), sessioninfo(v.1.1.1), ggridges(v.0.5.1), Rcpp(v.1.0.0), plyr(v.1.8.4), base64enc(v.0.1-3), progress(v.1.2.0), zlibbioc(v.1.28.0), purrr(v.0.2.5), RCurl(v.1.95-4.11), ps(v.1.3.0), prettyunits(v.1.0.2), viridis(v.0.5.1), cowplot(v.0.9.4), SummarizedExperiment(v.1.12.0), ggrepel(v.0.8.0), colorRamps(v.2.3), fs(v.1.2.6), variancePartition(v.1.12.1), magrittr(v.1.5), data.table(v.1.12.0), openxlsx(v.4.1.0), DO.db(v.2.9), triebeard(v.0.3.0), packrat(v.0.5.0), matrixStats(v.0.54.0), pkgload(v.1.0.2), hms(v.0.4.2), mime(v.0.6), evaluate(v.0.12), GSVA(v.1.30.0), xtable(v.1.8-3), pbkrtest(v.0.4-7), gridExtra(v.2.3), testthat(v.2.0.1), compiler(v.3.5.2), biomaRt(v.2.38.0), tibble(v.2.0.1), KernSmooth(v.2.23-15), crayon(v.1.3.4), minqa(v.1.2.4), htmltools(v.0.3.6), mgcv(v.1.8-26), later(v.0.7.5.9000), tidyr(v.0.8.2), geneplotter(v.1.60.0), DBI(v.1.0.0), tweenr(v.1.0.1), MASS(v.7.3-51.1), 
Matrix(v.1.2-15), cli(v.1.0.1), gdata(v.2.18.0), bindr(v.0.1.1), igraph(v.1.2.2), pkgconfig(v.2.0.2), rvcheck(v.0.1.3), GenomicAlignments(v.1.18.1), xml2(v.1.2.0), foreach(v.1.4.4), XVector(v.0.22.0), rvest(v.0.3.2), stringr(v.1.3.1), callr(v.3.1.1), digest(v.0.6.18), pracma(v.2.2.2), Biostrings(v.2.50.2), EuPathDB(v.1.0.1), rmarkdown(v.1.11), fastmatch(v.1.1-0), curl(v.3.3), shiny(v.1.2.0), Rsamtools(v.1.34.0), gtools(v.3.8.1), nloptr(v.1.2.1), nlme(v.3.1-137), jsonlite(v.1.6), bindrcpp(v.0.2.2), desc(v.1.2.0), viridisLite(v.0.3.0), limma(v.3.38.3), pillar(v.1.3.1), lattice(v.0.20-38), httr(v.1.4.0), pkgbuild(v.1.0.2), survival(v.2.43-3), interactiveDisplayBase(v.1.20.0), glue(v.1.3.0), remotes(v.2.0.2), zip(v.1.0.0), UpSetR(v.1.3.3), shinythemes(v.1.1.2), iterators(v.1.0.10), pander(v.0.6.3), bit(v.1.1-14), ggforce(v.0.1.3), stringi(v.1.2.4), blob(v.1.1.1), caTools(v.1.17.1.1), memoise(v.1.1.0) and dplyr(v.0.7.8)

