1 Annotation version: 20190914

1.1 Genome annotation input

There are several ways to import annotation data into R. The following are two attempts; the second is the one currently used in these analyses.

1.2 OrganismDb

Since this document was originally written, I have made substantial changes to how I create, load, and manipulate the EuPathDB annotation data. As a result, this section needs to be significantly reworked.

AnnotationHub is the newer and fancier version of what OrganismDb does. Keith already made these for the parasites, though, so let's try to use one of those.

Assuming the above packages got created, we may load them and extract the annotation data.

## Loading required package: GenomeInfoDbData
## 
## This is EuPathDB version 1.6.0
##  Read 'EuPathDB()' to get started.
## 
## Attaching package: 'EuPathDB'
## The following objects are masked from 'package:hpgltools':
## 
##     download_uniprot_proteome, get_kegg_orgn,
##     load_kegg_annotations, load_orgdb_annotations, load_orgdb_go,
##     load_uniprot_annotations, orgdb_from_ah
## Found the following hits: Leishmania major strain Friedlin, Leishmania major strain LV39c5, Leishmania major strain SD 75.1, choosing the first.
## Using: Leishmania major strain Friedlin.
## org.Lmajor.Friedlin.v45.eg.db
## Found the following hits: Leishmania panamensis MHOM/COL/81/L13, Leishmania panamensis strain MHOM/PA/94/PSC-1, choosing the first.
## Using: Leishmania panamensis MHOM/COL/81/L13.
## org.Lpanamensis.MHOMCOL81L13.v45.eg.db
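
A minimal sketch of pulling a few annotation rows out of one of the OrgDb packages named in the log above. It assumes the package is installed and exports an object named after itself, which is the usual OrgDb convention. The helper name `peek_orgdb`, the preference for a `GID` key type, and the choice of the first few `columns()` entries are illustrative assumptions, not what this analysis actually ran.

```r
# Hypothetical helper: show a few annotation rows from an installed OrgDb package.
peek_orgdb <- function(pkg, n = 5, max_columns = 4) {
  # Return NULL instead of failing when the packages are not installed.
  if (!requireNamespace("AnnotationDbi", quietly = TRUE) ||
      !requireNamespace(pkg, quietly = TRUE)) {
    message("Package ", pkg, " (or AnnotationDbi) is not installed; skipping.")
    return(NULL)
  }
  # OrgDb packages conventionally export an object with the package's name.
  orgdb <- getExportedValue(pkg, pkg)
  key_types <- AnnotationDbi::keytypes(orgdb)
  # Assumption: use "GID" as the key type when present, else the first one listed.
  key_type <- if ("GID" %in% key_types) "GID" else key_types[1]
  gene_ids <- head(AnnotationDbi::keys(orgdb, keytype = key_type), n)
  wanted <- head(AnnotationDbi::columns(orgdb), max_columns)
  AnnotationDbi::select(orgdb, keys = gene_ids, columns = wanted,
                        keytype = key_type)
}

lp_annot <- peek_orgdb("org.Lpanamensis.MHOMCOL81L13.v45.eg.db")
```

Calling `AnnotationDbi::columns()` on the loaded object first is a cheap way to see which fields a given build actually provides before asking for specific ones.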

2 Putting the pieces together

The macrophage experiment has samples across two contexts: the host and the parasite. The following block sets up one experiment for each. If you open the all_samples-species.xlsx files, you will immediately note that a few different attempts were made at ascertaining the most likely experimental factors that contributed to the readily apparent batch effects.

2.1 The human transcriptome mappings

Keep in mind that if I change the experimental design with new annotations, I must regenerate the following.

|  | sampleid | pathogenstrain | experimentname | tubelabel | alias | condition | batch | anotherbatch | snpclade | snpcladev2 | snpcladev3 | pathogenstrain.1 | label | donor | time | pctmappedparasite | pctcategory | state | sourcelab | expperson | pathogen | host | hostcelltype | noofhostcells | infectionperiodhpitimeofharvest | moiexposure | parasitespercell | pctinf | rnangul | rnaqcpassed | libraryconst | libqcpassed | index | descriptonandremarks | observation | lowercaseid | humanfile | parasitefile | bcftable | file |
| HPGL0241 | HPGL0241 | none | macrophage | TM130-Nil (Blue label) | Nil | uninf | a | a | undef | undef | undef | none | uninf_1 | d130 | undef | undef | 0 | uninfected | Ade | Adriana | none | Human | Human macs | Max 2 mill | 2h - 24h chase period | NA | unknown | unknown | 468 | Y | Wanderson | Y | 1 | Uninfected human macrophages | NA | hpgl0241 | preprocessing/hpgl0241/outputs/tophat_hsapiens/accepted_paired.count.xz | undef | undef | null |

2.2 The parasite transcriptome mappings

The first three rows of the parasite experimental design.
|  | sampleid | pathogenstrain | experimentname | tubelabel | alias | condition | batch | anotherbatch | snpclade | snpcladev2 | snpcladev3 | pathogenstrain.1 | label | donor | time | pctmappedparasite | pctcategory | state | sourcelab | expperson | pathogen | host | hostcelltype | noofhostcells | infectionperiodhpitimeofharvest | moiexposure | parasitespercell | pctinf | rnangul | rnaqcpassed | libraryconst | libqcpassed | index | descriptonandremarks | observation | lowercaseid | humanfile | parasitefile | bcftable | file |
| HPGL0242 | HPGL0242 | s2271 | macrophage | TM130-2271 | Self-Healing | sh | a | a | white | whitepink | right | s2271 | sh_2271 | d130 | undef | 30 | 3 | self_heal | Ade | Adriana | Lp | Human | Human macs | Max 2 mill | 2h - 24h chase period | 0.0486111111111111 | unknown | unknown | 276 | Y | Wanderson | Y | 8 | Infected human macrophages. | NA | hpgl0242 | preprocessing/hpgl0242/outputs/tophat_hsapiens/accepted_paired.count.xz | preprocessing/hpgl0242/outputs/tophat_lpanamensis/accepted_paired.count.xz | preprocessing/outputs/hpgl0242_parsed_count.txt | null |
| HPGL0244 | HPGL0244 | s5433 | macrophage | TM130-5433 | Chronic | chr | a | a | blue_self | blue | left | s5433 | chr_5433 | d130 | undef | 15 | 1 | chronic | Ade | Adriana | Lp | Human | Human macs | Max 2 mill | 2h - 24h chase period | 0.0486111111111111 | unknown | unknown | 261 | Y | Wanderson | Y | 27 | Infected human macrophages | NA | hpgl0244 | preprocessing/hpgl0244/outputs/tophat_hsapiens/accepted_paired.count.xz | preprocessing/hpgl0244/outputs/tophat_lpanamensis/accepted_paired.count.xz | preprocessing/outputs/hpgl0244_parsed_count.txt | null |
| HPGL0245 | HPGL0245 | s1320 | macrophage | TM130-1320 | Chronic | chr | a | a | multicolor | yellowbrownmulti | right | s1320 | chr_1320 | d130 | undef | 40 | 4 | chronic | Ade | Adriana | Lp | Human | Human macs | Max 2 mill | 2h - 24h chase period | 0.0486111111111111 | unknown | unknown | 199 | Y | Wanderson | Y | 11 | Infected human macrophages | NA | hpgl0245 | preprocessing/hpgl0245/outputs/tophat_hsapiens/accepted_paired.count.xz | preprocessing/hpgl0245/outputs/tophat_lpanamensis/accepted_paired.count.xz | preprocessing/outputs/hpgl0245_parsed_count.txt | null |
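
The rows above suggest how a sample ends up in one experiment or the other: `humanfile` and `parasitefile` hold the count-table paths, and `undef` marks a mapping that does not exist (the uninfected HPGL0241 has no parasite file). A minimal sketch of that split, assuming `undef` is the only sentinel in use. The `design` data frame carries just a few columns copied from the rows shown, and `count_path()` and `design_for()` are hypothetical helpers, not hpgltools functions.

```r
# Illustrative subset of the design: a few columns copied from the rows above.
sample_ids <- c("HPGL0241", "HPGL0242", "HPGL0244", "HPGL0245")

# Hypothetical helper reproducing the count-table path pattern seen in the table.
count_path <- function(ids, species) {
  sprintf("preprocessing/%s/outputs/tophat_%s/accepted_paired.count.xz",
          tolower(ids), species)
}

design <- data.frame(
  sampleid = sample_ids,
  condition = c("uninf", "sh", "chr", "chr"),
  batch = "a",
  humanfile = count_path(sample_ids, "hsapiens"),
  # The uninfected sample has no parasite mapping, recorded as "undef".
  parasitefile = c("undef", count_path(sample_ids[-1], "lpanamensis")),
  row.names = sample_ids,
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the samples that have a count table for one species
# and copy that path into the generic 'file' column.
design_for <- function(design, file_column) {
  paths <- design[[file_column]]
  keep <- !is.na(paths) & paths != "undef"
  species_design <- design[keep, , drop = FALSE]
  species_design$file <- species_design[[file_column]]
  species_design
}

hs_design <- design_for(design, "humanfile")
lp_design <- design_for(design, "parasitefile")
print(rownames(hs_design))
print(rownames(lp_design))
```

Keeping one master sheet and deriving both designs from it means a change to the metadata only has to be made once, at the cost of regenerating both experiments afterwards.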

3 Supplemental Table 1

Table S1 will summarize the metadata in all_samples-combined. It may also include some of the mapping statistics (percent mapped, etc.).

Wanted columns:

  • Sample ID: HPGLxxxx
  • Donor Code: TM130 or PG1xx
  • Cell Type: Macrophage or PBMC
  • Infection Status: Infected or Uninfected
  • Disease Outcome: Chronic or Self-Healing or NA
  • Batch: A or B (macrophage); NA for PBMC
  • Number of reads that passed Illumina filter
  • Number of reads after trimming
  • Number of reads mapped - human
  • % reads mapped - human
  • Number of reads mapped - L.panamensis
  • % reads mapped - L.panamensis
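
The two percentage columns follow directly from the read counts. A minimal sketch, assuming the denominator is the number of reads remaining after trimming (the list above does not say which total is intended). The sample names and counts are made-up placeholders chosen for easy arithmetic, not values from this experiment, and `pct_mapped()` is a hypothetical helper.

```r
# Hypothetical helper: percentage of reads mapped, NA when the total is zero.
pct_mapped <- function(mapped, total) {
  ifelse(total > 0, round(100 * mapped / total, 2), NA_real_)
}

# Placeholder counts for illustration only; these are NOT experimental values.
reads <- data.frame(
  sample_id = c("sampleA", "sampleB"),
  trimmed = c(1000, 2000),
  mapped_human = c(900, 1500),
  mapped_lpanamensis = c(50, 0),
  stringsAsFactors = FALSE
)

# Assumption: both percentages use the trimmed read count as the denominator.
reads$pct_human <- pct_mapped(reads$mapped_human, reads$trimmed)
reads$pct_lpanamensis <- pct_mapped(reads$mapped_lpanamensis, reads$trimmed)
print(reads)
```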

Use the Tcruzi colors.

  • A1 is a large title: “Macrophage Samples”
  • Row 2 is the blue column headings
  • Rows 3 through m contain the macrophage metadata
  • Row m+1 is blank
  • Row m+2 is a large title: “PBMC Samples”
  • Rows m+3 through n contain the PBMC metadata
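
The row arithmetic in this layout can be checked with a small helper. A sketch, assuming the PBMC block has no heading row of its own (the list names none) and that m and n are the last macrophage and last PBMC rows. `table_s1_layout()` is a hypothetical name and the sample counts in the call are arbitrary.

```r
# Hypothetical helper: compute the spreadsheet rows used by each part of Table S1.
table_s1_layout <- function(n_macrophage, n_pbmc) {
  m <- 2 + n_macrophage  # title row + heading row + macrophage samples
  n <- m + 2 + n_pbmc    # blank row + PBMC title row + PBMC samples
  list(
    macrophage_title = 1,
    column_headings = 2,
    macrophage_rows = c(first = 3, last = m),
    blank_row = m + 1,
    pbmc_title = m + 2,
    pbmc_rows = c(first = m + 3, last = n)
  )
}

# Arbitrary counts, only to show the arithmetic.
layout <- table_s1_layout(n_macrophage = 4, n_pbmc = 3)
str(layout)
```

With 4 macrophage and 3 PBMC samples this gives m = 6 and n = 11. The resulting row numbers could then be used as the `startRow` argument of `openxlsx::writeData()` when the sheet is written.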

4 End

At this point, we should have everything necessary to perform the various analyses of the 4 sub-experiments, so save the current data for reuse elsewhere.

The experimental design is available here.

## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset f3c1e03852c87dc60c7e72e726bb640572e695ff
## This is hpgltools commit: Thu Aug 22 15:32:44 2019 -0400: f3c1e03852c87dc60c7e72e726bb640572e695ff
## Saving to 01_annotation_20190914-v20190914.rda.xz
## The savefile is: /cbcbsub/fs/cbcb-lab/nelsayed/scratch/atb/rnaseq/lpanamensis_2016/savefiles/01_annotation_20190914-v20190914.rda.xz
## The file does not yet exist.
## The save string is: con <- pipe(paste0('pxz > /cbcbsub/fs/cbcb-lab/nelsayed/scratch/atb/rnaseq/lpanamensis_2016/savefiles/01_annotation_20190914-v20190914.rda.xz'), 'wb'); save(list=ls(all.names=TRUE, envir=globalenv()),
##      envir=globalenv(), file=con, compress=FALSE); close(con)
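
The save string above pipes an uncompressed `save()` through `pxz`. Below is a portable sketch of the same round trip that swaps in R's built-in xz compression, for machines where `pxz` is not available. The file name and the `design` object are placeholders, and the file is written to a temporary directory instead of the savefiles directory.

```r
# Placeholder path: a temporary file standing in for the real savefile.
savefile <- file.path(tempdir(), "01_annotation_example.rda.xz")

# Stand-in for the global environment, holding one small example object.
session_env <- new.env()
session_env$design <- data.frame(
  sampleid = c("HPGL0241", "HPGL0242"),
  condition = c("uninf", "sh"),
  stringsAsFactors = FALSE
)

# Save every object in the environment with xz compression (single-threaded,
# unlike pxz, but requiring no external program).
save(list = ls(session_env, all.names = TRUE), envir = session_env,
     file = savefile, compress = "xz")

# Restore into a fresh environment; load() detects the compression itself
# and returns the names of the restored objects.
restored_env <- new.env()
restored_names <- load(savefile, envir = restored_env)
print(restored_names)
```

Loading into a separate environment first makes it easy to inspect what a savefile contains before letting it overwrite objects in the current session.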

R version 3.6.0 (2019-04-26)

Platform: x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=en_US.utf8, LC_NUMERIC=C, LC_TIME=en_US.utf8, LC_COLLATE=en_US.utf8, LC_MONETARY=en_US.utf8, LC_MESSAGES=en_US.utf8, LC_PAPER=en_US.utf8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.utf8 and LC_IDENTIFICATION=C

attached base packages: stats4, parallel, stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: Homo.sapiens(v.1.3.1), TxDb.Hsapiens.UCSC.hg19.knownGene(v.3.2.2), org.Hs.eg.db(v.3.8.2), GO.db(v.3.8.2), OrganismDbi(v.1.26.0), GenomicFeatures(v.1.36.4), GenomicRanges(v.1.36.1), GenomeInfoDb(v.1.20.0), org.Lpanamensis.MHOMCOL81L13.v45.eg.db(v.2019.09), org.Lmajor.Friedlin.v45.eg.db(v.2019.09), AnnotationDbi(v.1.46.1), IRanges(v.2.18.2), S4Vectors(v.0.22.1), futile.logger(v.1.4.3), EuPathDB(v.1.6.0), GenomeInfoDbData(v.1.2.1), hpgltools(v.1.0), Biobase(v.2.44.0) and BiocGenerics(v.0.30.0)

loaded via a namespace (and not attached): RUnit(v.0.4.32), tidyselect(v.0.2.5), lme4(v.1.1-21), RSQLite(v.2.1.2), htmlwidgets(v.1.3), grid(v.3.6.0), BiocParallel(v.1.18.1), devtools(v.2.2.0), munsell(v.0.5.0), codetools(v.0.2-16), DT(v.0.8), withr(v.2.1.2), colorspace(v.1.4-1), GOSemSim(v.2.10.0), highr(v.0.8), knitr(v.1.24), rstudioapi(v.0.10), DOSE(v.3.10.2), labeling(v.0.3), urltools(v.1.7.3), polyclip(v.1.10-0), bit64(v.0.9-7), farver(v.1.1.0), rprojroot(v.1.3-2), vctrs(v.0.2.0), lambda.r(v.1.2.3), xfun(v.0.9), BiocFileCache(v.1.8.0), R6(v.2.4.0), doParallel(v.1.0.15), graphlayouts(v.0.5.0), bitops(v.1.0-6), fgsea(v.1.10.1), gridGraphics(v.0.4-1), DelayedArray(v.0.10.0), assertthat(v.0.2.1), promises(v.1.0.1), scales(v.1.0.0), ggraph(v.2.0.0), enrichplot(v.1.4.0), gtable(v.0.3.0), biocViews(v.1.52.2), sva(v.3.32.1), processx(v.3.4.1), tidygraph(v.1.1.2), rlang(v.0.4.0), zeallot(v.0.1.0), genefilter(v.1.66.0), splines(v.3.6.0), rtracklayer(v.1.44.4), lazyeval(v.0.2.2), europepmc(v.0.3), BiocManager(v.1.30.4), yaml(v.2.2.0), reshape2(v.1.4.3), backports(v.1.1.4), httpuv(v.1.5.2), qvalue(v.2.16.0), RBGL(v.1.60.0), clusterProfiler(v.3.12.0), tools(v.3.6.0), usethis(v.1.5.1), ggplotify(v.0.0.4), ggplot2(v.3.2.1), ellipsis(v.0.2.0.1), gplots(v.3.0.1.1), RColorBrewer(v.1.1-2), sessioninfo(v.1.1.1), ggridges(v.0.5.1), Rcpp(v.1.0.2), plyr(v.1.8.4), base64enc(v.0.1-3), progress(v.1.2.2), zlibbioc(v.1.30.0), purrr(v.0.3.2), RCurl(v.1.95-4.12), ps(v.1.3.0), prettyunits(v.1.0.2), viridis(v.0.5.1), cowplot(v.1.0.0), SummarizedExperiment(v.1.14.1), ggrepel(v.0.8.1), colorRamps(v.2.3), fs(v.1.3.1), variancePartition(v.1.14.0), magrittr(v.1.5), futile.options(v.1.0.1), data.table(v.1.12.2), openxlsx(v.4.1.0.1), DO.db(v.2.9), triebeard(v.0.3.0), matrixStats(v.0.55.0), pkgload(v.1.0.2), hms(v.0.5.1), mime(v.0.7), evaluate(v.0.14), xtable(v.1.8-4), pbkrtest(v.0.4-7), XML(v.3.98-1.20), gridExtra(v.2.3), testthat(v.2.2.1), compiler(v.3.6.0), biomaRt(v.2.40.4), tibble(v.2.1.3), 
KernSmooth(v.2.23-15), crayon(v.1.3.4), minqa(v.1.2.4), htmltools(v.0.3.6), mgcv(v.1.8-28), later(v.0.8.0), tidyr(v.0.8.3), DBI(v.1.0.0), formatR(v.1.7), tweenr(v.1.0.1), dbplyr(v.1.4.2), MASS(v.7.3-51.4), rappdirs(v.0.3.1), boot(v.1.3-23), Matrix(v.1.2-17), cli(v.1.1.0), gdata(v.2.18.0), igraph(v.1.2.4.1), pkgconfig(v.2.0.2), rvcheck(v.0.1.3), GenomicAlignments(v.1.20.1), xml2(v.1.2.2), foreach(v.1.4.7), annotate(v.1.62.0), XVector(v.0.24.0), AnnotationForge(v.1.26.0), rvest(v.0.3.4), stringr(v.1.4.0), callr(v.3.3.1), digest(v.0.6.20), graph(v.1.62.0), Biostrings(v.2.52.0), rmarkdown(v.1.15), fastmatch(v.1.1-0), curl(v.4.1), shiny(v.1.3.2), Rsamtools(v.2.0.0), gtools(v.3.8.1), nloptr(v.1.2.1), nlme(v.3.1-141), jsonlite(v.1.6), desc(v.1.2.0), viridisLite(v.0.3.0), limma(v.3.40.6), pillar(v.1.4.2), lattice(v.0.20-38), httr(v.1.4.1), pkgbuild(v.1.0.5), survival(v.2.44-1.1), interactiveDisplayBase(v.1.22.0), glue(v.1.3.1), remotes(v.2.1.0), zip(v.2.0.4), UpSetR(v.1.4.0), iterators(v.1.0.12), pander(v.0.6.3), bit(v.1.1-14), ggforce(v.0.3.1), stringi(v.1.4.3), blob(v.1.2.0), AnnotationHub(v.2.16.1), caTools(v.1.17.1.2), AnnotationHubData(v.1.14.0), memoise(v.1.1.0), rBiopaxParser(v.2.24.0) and dplyr(v.0.8.3)

