1 Annotation version: 20180822

1.1 Genome annotation input

There are a few methods of importing annotation data into R. The following are two attempts; the second is the one currently used in these analyses.

1.2 OrganismDb

Since this document was originally written, I have made substantial changes to how I create, load, and manipulate the EuPathDB annotation data. As a result, this section needs to be significantly reworked.

AnnotationHub is the newer and fancier version of what OrganismDb does. Keith already made these packages for the parasites, though, so let's try to use one of those.

Assuming the above packages were created, we can load them and extract the annotation data.

## Starting metadata download.
## Finished metadata download.
## Found the following hits: Leishmania major strain Friedlin, Leishmania major strain LV39c5, Leishmania major strain SD 75.1, choosing the first.
## org.Lmajor.Friedlin.v39.eg.db
## Starting metadata download.
## Finished metadata download.
## Found the following hits: Leishmania panamensis MHOM/COL/81/L13, choosing the first.
## org.Lpanamensis.MHOMCOL81L13.v39.eg.db
## Starting metadata download.
## Finished metadata download.
## Found the following hits: Leishmania braziliensis MHOM/BR/75/M2903, Leishmania braziliensis MHOM/BR/75/M2904, choosing the first.
## org.Lbraziliensis.MHOMBR75M2903.v39.eg.db
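The log above shows each lookup picking the first matching species and reporting an OrgDb package name built from the species string. As a sketch only, that naming rule can be reproduced in base R. The rule is inferred from the three names printed above. Both `orgdb_package_name()` and `choose_first_hit()` are hypothetical helpers, not functions from hpgltools or AnnotationHub.

```r
## Hypothetical helpers (not hpgltools/AnnotationHub API); the naming rule is
## inferred from the three package names printed in the log above.
orgdb_package_name <- function(species, version = "v39") {
  words <- strsplit(species, "[[:space:]]+")[[1]]
  genus_initial <- substr(words[1], 1, 1)
  species_name <- words[2]
  ## Remaining words describe the strain: drop the literal word "strain" and
  ## strip anything that is not a letter or digit (e.g. the "/" separators).
  strain_words <- words[-(1:2)]
  strain_words <- strain_words[strain_words != "strain"]
  strain <- gsub("[^[:alnum:]]", "", paste(strain_words, collapse = ""))
  paste0("org.", genus_initial, species_name, ".", strain, ".", version, ".eg.db")
}

## Mimic the "choosing the first" behaviour reported in the log.
choose_first_hit <- function(pattern, available) {
  hits <- grep(pattern, available, value = TRUE, fixed = TRUE)
  if (length(hits) == 0) {
    stop("No species matched: ", pattern)
  }
  message("Found the following hits: ", paste(hits, collapse = ", "),
          ", choosing the first.")
  hits[1]
}

## The three L. major strains listed in the log above.
lmajor_strains <- c("Leishmania major strain Friedlin",
                    "Leishmania major strain LV39c5",
                    "Leishmania major strain SD 75.1")
chosen <- choose_first_hit("Leishmania major", lmajor_strains)
orgdb_package_name(chosen)  # "org.Lmajor.Friedlin.v39.eg.db"
```

Taking the first hit silently is convenient, but it means a strain other than the intended reference could be picked if the hit order ever changes. Checking the chosen name explicitly is a cheap guard.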

2 Putting the pieces together

The macrophage experiment has samples in two contexts: the host and the parasite. The following block sets up one experiment for each. If you open the all_samples-species.xlsx files, you will immediately notice that several attempts were made to identify the experimental factors most likely responsible for the readily apparent batch effects.

2.1 The human transcriptome mappings

Keep in mind that if I change the experimental design with new annotations, I must regenerate the following.

| Column | HPGL0241 |
|---|---|
| sampleid | HPGL0241 |
| experimentname | macrophage |
| tubelabel | TM130-Nil (Blue label) |
| alias | Nil |
| condition | uninf |
| batch | a |
| anotherbatch | a |
| snpclade | undef |
| snpcladev2 | undef |
| snpcladev3 | undef |
| pathogenstrain | none |
| label | uninf_1 |
| donor | d130 |
| time | undef |
| pctmappedparasite | undef |
| pctcategory | 0 |
| state | uninfected |
| sourcelab | Ade |
| expperson | Adriana |
| pathogen | none |
| host | Human |
| hostcelltype | Human macs |
| noofhostcells | Max 2 mill |
| infectionperiodhpitimeofharvest | 2h - 24h chase period |
| moiexposure | NA |
| parasitespercell | unknown |
| pctinf | unknown |
| rnangul | 468 |
| rnaqcpassed | Y |
| libraryconst | Wanderson |
| libqcpassed | Y |
| index | 1 |
| descriptonandremarks | Uninfected human macrophages |
| observation | NA |
| lowercaseid | hpgl0241 |
| humanfile | preprocessing/hpgl0241/outputs/tophat_hsapiens/accepted_paired.count.xz |
| parasitefile | undef |
| file | null |

2.2 The parasite transcriptome mappings

The first three rows of the parasite experimental design, transposed so that each sample is a column:

| Column | HPGL0242 | HPGL0243 | HPGL0638 |
|---|---|---|---|
| sampleid | HPGL0242 | HPGL0243 | HPGL0638 |
| experimentname | macrophage | macrophage | macrophage |
| tubelabel | TM130-2271 | TM130-2272 | TM130-2189 |
| alias | Self-Healing | Self-Healing | Self-Healing |
| condition | sh | sh | sh |
| batch | a | a | b |
| anotherbatch | a | a | b |
| snpclade | white | white | pink |
| snpcladev2 | whitepink | whitepink | whitepink |
| snpcladev3 | right | right | right |
| pathogenstrain | s2271 | s2272 | s2189 |
| label | sh_2271 | sh_2272 | sh_2189 |
| donor | d130 | d130 | d130 |
| time | undef | undef | undef |
| pctmappedparasite | 30 | 30 | 55 |
| pctcategory | 3 | 3 | 5 |
| state | self_heal | self_heal | self_heal |
| sourcelab | Ade | Ade | Ade |
| expperson | Adriana | Adriana | Adriana |
| pathogen | Lp | Lp | Lp |
| host | Human | Human | Human |
| hostcelltype | Human macs | Human macs | Human macs |
| noofhostcells | Max 2 mill | Max 2 mill | Max 2 mill |
| infectionperiodhpitimeofharvest | 2h - 24h chase period | 2h - 24h chase period | 2h - 24h chase period |
| moiexposure | 0.0486111111111111 | 0.0486111111111111 | 0.0486111111111111 |
| parasitespercell | unknown | unknown | unknown |
| pctinf | unknown | unknown | unknown |
| rnangul | 276 | 532 | 37 |
| rnaqcpassed | Y | Y | Y |
| libraryconst | Wanderson | Wanderson | Adelaida |
| libqcpassed | Y | Y | Y |
| index | 8 | 10 | 4 |
| descriptonandremarks | Infected human macrophages. | Infected human macrophages | Infected human macrophages |
| observation | NA | NA | /RNA QC 2013-03-26 sample TM130.8 Library constructed with 342 ng total RNA. |
| lowercaseid | hpgl0242 | hpgl0243 | hpgl0638 |
| humanfile | preprocessing/hpgl0242/outputs/tophat_hsapiens/accepted_paired.count.xz | preprocessing/hpgl0243/outputs/tophat_hsapiens/accepted_paired.count.xz | preprocessing/hpgl0638/outputs/tophat_hsapiens/accepted_paired.count.xz |
| parasitefile | preprocessing/hpgl0242/outputs/tophat_lpanamensis/accepted_paired.count.xz | preprocessing/hpgl0243/outputs/tophat_lpanamensis/accepted_paired.count.xz | preprocessing/hpgl0638/outputs/tophat_lpanamensis/accepted_paired.count.xz |
| file | null | null | null |
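The uninfected sample HPGL0241 has `undef` as its `parasitefile`, so it has no parasite counts. Only samples with a parasite count file belong in the parasite experiment.

The split can be sketched in base R under two assumptions: the design is a data frame with the `humanfile` and `parasitefile` columns shown above, and `"undef"`, an empty string, or `NA` means "no file". `split_design_by_context()` is a hypothetical helper, not the code this document actually runs.

```r
## Hypothetical helper: split one design into host and parasite designs,
## keeping only the samples that have a count file for each context.
split_design_by_context <- function(design) {
  ## TRUE where a usable count file path is present.
  has_file <- function(paths) {
    paths <- as.character(paths)
    !is.na(paths) & paths != "" & paths != "undef"
  }
  host <- design[has_file(design[["humanfile"]]), , drop = FALSE]
  parasite <- design[has_file(design[["parasitefile"]]), , drop = FALSE]
  list(host = host, parasite = parasite)
}

## Two rows taken from the designs shown above.
design <- data.frame(
  sampleid = c("HPGL0241", "HPGL0242"),
  condition = c("uninf", "sh"),
  humanfile = c("preprocessing/hpgl0241/outputs/tophat_hsapiens/accepted_paired.count.xz",
                "preprocessing/hpgl0242/outputs/tophat_hsapiens/accepted_paired.count.xz"),
  parasitefile = c("undef",
                   "preprocessing/hpgl0242/outputs/tophat_lpanamensis/accepted_paired.count.xz"),
  stringsAsFactors = FALSE)

contexts <- split_design_by_context(design)
contexts$host$sampleid      # "HPGL0241" "HPGL0242"
contexts$parasite$sampleid  # "HPGL0242"
```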

3 Supplemental Table 1

Table S1 will summarize the metadata in all_samples-combined. It may also include some of the mapping statistics (e.g. % reads mapped).

Wanted columns:

  • Sample ID: HPGLxxxx
  • Donor Code: TM130 or PG1xx
  • Cell Type: Macrophage or PBMC
  • Infection Status: Infected or Uninfected
  • Disease Outcome: Chronic or Self-Healing or NA
  • Batch: A or B (macrophage); NA for PBMC
  • Number of reads that passed the Illumina filter
  • Number of reads after trimming
  • Number of reads mapped - human
  • % reads mapped - human
  • Number of reads mapped - L. panamensis
  • % reads mapped - L. panamensis
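Several of the wanted metadata columns can be derived from design columns shown in section 2. The following is a sketch under labeled assumptions, not the code used to build the real table:

- The donor code is the part of `tubelabel` before the first "-".
- `condition == "uninf"` marks uninfected samples.
- The disease outcome comes from `alias` for infected samples.
- The batch is reported only for macrophage samples.
- The denominator for "% reads mapped" is left to the caller.

`make_s1_metadata()` and `pct_mapped()` are hypothetical helpers.

```r
## Hypothetical Table S1 helpers; every mapping below is an assumption based
## on the design columns shown in section 2.
make_s1_metadata <- function(design, cell_type = c("Macrophage", "PBMC")) {
  cell_type <- match.arg(cell_type)
  infected <- as.character(design[["condition"]]) != "uninf"
  ## Batch is only meaningful for the macrophage samples (NA for PBMC).
  batch <- if (cell_type == "Macrophage") {
    toupper(as.character(design[["batch"]]))
  } else {
    NA_character_
  }
  data.frame(
    "Sample ID" = as.character(design[["sampleid"]]),
    "Donor Code" = sub("-.*$", "", as.character(design[["tubelabel"]])),
    "Cell Type" = cell_type,
    "Infection Status" = ifelse(infected, "Infected", "Uninfected"),
    "Disease Outcome" = ifelse(infected, as.character(design[["alias"]]), NA_character_),
    "Batch" = batch,
    check.names = FALSE, stringsAsFactors = FALSE)
}

## Percentage of reads mapped. Whether the denominator should be the reads
## passing the Illumina filter or the reads after trimming is an assumption
## to settle before filling in the table.
pct_mapped <- function(mapped, total) {
  round(100 * mapped / total, 2)
}

## Three rows taken from the designs shown above.
design <- data.frame(
  sampleid = c("HPGL0241", "HPGL0242", "HPGL0638"),
  tubelabel = c("TM130-Nil (Blue label)", "TM130-2271", "TM130-2189"),
  alias = c("Nil", "Self-Healing", "Self-Healing"),
  condition = c("uninf", "sh", "sh"),
  batch = c("a", "a", "b"),
  stringsAsFactors = FALSE)
make_s1_metadata(design, "Macrophage")
```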

Use the Tcruzi colors.

  • Cell A1 is a large title: “Macrophage Samples”
  • Row 2 contains the blue column headings
  • Rows 3 to m contain the macrophage metadata
  • Row m+1 is blank
  • Row m+2 is a large title: “PBMC Samples”
  • Rows m+3 to n contain the PBMC metadata
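The layout above depends only on how many macrophage and PBMC samples there are. A small base R sketch can turn it into concrete row numbers, which could then be passed as `startRow` to a writer such as `openxlsx::writeData()`. `s1_layout()` is a hypothetical helper, and it follows the list literally: there is no second header row before the PBMC block.

```r
## Hypothetical helper: row positions for the Table S1 sheet layout.
s1_layout <- function(n_macrophage, n_pbmc) {
  m <- 2 + n_macrophage    # last macrophage metadata row
  n <- m + 2 + n_pbmc      # last PBMC metadata row
  list(
    macrophage_title = 1,                   # cell A1
    header = 2,                             # blue column headings
    macrophage_rows = seq_len(n_macrophage) + 2,
    blank = m + 1,
    pbmc_title = m + 2,
    pbmc_rows = seq_len(n_pbmc) + m + 2,    # rows m+3 .. n
    last_row = n)
}

layout <- s1_layout(n_macrophage = 3, n_pbmc = 2)
layout$pbmc_title  # 7
```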

4 End

At this point, we should have everything necessary to perform the various analyses of the four sub-experiments, so save the current data for reuse elsewhere.
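The log below reports saving to an xz-compressed `.rda.xz` file. A minimal base R sketch of the same idea uses `save()` with `compress = "xz"` and `load()` into a fresh environment. The object names (`hs_expt`, `lp_expt`) and the helpers `save_for_reuse()` and `reload()` are hypothetical placeholders, not the objects or functions this document actually saves.

```r
## Save selected objects to an xz-compressed .rda file.
save_for_reuse <- function(file, objects, envir = parent.frame()) {
  save(list = objects, file = file, envir = envir, compress = "xz")
  invisible(file)
}

## Load a saved file into its own environment so nothing in the current
## session is silently overwritten.
reload <- function(file) {
  env <- new.env()
  load(file, envir = env)
  env
}

## Placeholder objects standing in for the host and parasite experiments.
hs_expt <- list(note = "placeholder for the human experiment")
lp_expt <- list(note = "placeholder for the parasite experiment")
rda <- file.path(tempdir(), "01_annotation-example.rda.xz")
save_for_reuse(rda, c("hs_expt", "lp_expt"))
restored <- reload(rda)
sort(ls(restored))  # "hs_expt" "lp_expt"
```

Loading into a separate environment keeps later documents from clobbering their own variables when they pull in this saved state.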

The experimental design is available here.

## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset 7c4477bb4fa3639cc6cf7940216e4c4b8cbee7ce
## This is hpgltools commit: Fri Oct 26 17:27:11 2018 -0400: 7c4477bb4fa3639cc6cf7940216e4c4b8cbee7ce
## Saving to 01_annotation_20180822-v20180822.rda.xz

R version 3.5.1 (2018-07-02)

Platform: x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=en_US.utf8, LC_NUMERIC=C, LC_TIME=en_US.utf8, LC_COLLATE=en_US.utf8, LC_MONETARY=en_US.utf8, LC_MESSAGES=en_US.utf8, LC_PAPER=en_US.utf8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.utf8 and LC_IDENTIFICATION=C

attached base packages: stats4, parallel, stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: Homo.sapiens(v.1.3.1), TxDb.Hsapiens.UCSC.hg19.knownGene(v.3.2.2), org.Hs.eg.db(v.3.6.0), GO.db(v.3.6.0), OrganismDbi(v.1.22.0), GenomicFeatures(v.1.32.3), GenomicRanges(v.1.32.7), GenomeInfoDb(v.1.16.0), org.Lbraziliensis.MHOMBR75M2903.v38.eg.db(v.2018.08), org.Lpanamensis.MHOMCOL81L13.v38.eg.db(v.2018.08), org.Lmajor.Friedlin.v38.eg.db(v.2018.08), AnnotationDbi(v.1.42.1), IRanges(v.2.14.12), S4Vectors(v.0.18.3), Biobase(v.2.40.0), AnnotationHub(v.2.12.1), BiocGenerics(v.0.26.0) and hpgltools(v.2018.03)

loaded via a namespace (and not attached): bitops(v.1.0-6), matrixStats(v.0.54.0), devtools(v.1.13.6), bit64(v.0.9-7), RColorBrewer(v.1.1-2), progress(v.1.2.0), httr(v.1.3.1), rprojroot(v.1.3-2), tools(v.3.5.1), backports(v.1.1.2), R6(v.2.3.0), DBI(v.1.0.0), lazyeval(v.0.2.1), colorspace(v.1.3-2), withr(v.2.1.2), tidyselect(v.0.2.5), prettyunits(v.1.0.2), bit(v.1.1-14), curl(v.3.2), compiler(v.3.5.1), graph(v.1.58.2), xml2(v.1.2.0), DelayedArray(v.0.6.6), rtracklayer(v.1.40.6), labeling(v.0.3), scales(v.1.0.0), RBGL(v.1.56.0), commonmark(v.1.6), stringr(v.1.3.1), digest(v.0.6.18), Rsamtools(v.1.32.3), rmarkdown(v.1.10), XVector(v.0.20.0), base64enc(v.0.1-3), pkgconfig(v.2.0.2), htmltools(v.0.3.6), highr(v.0.7), rlang(v.0.3.0), RSQLite(v.2.1.1), BiocInstaller(v.1.30.0), shiny(v.1.1.0), bindr(v.0.1.1), BiocParallel(v.1.14.2), zip(v.1.0.0), dplyr(v.0.7.7), RCurl(v.1.95-4.11), magrittr(v.1.5), GenomeInfoDbData(v.1.1.0), Matrix(v.1.2-14), Rcpp(v.0.12.19), munsell(v.0.5.0), stringi(v.1.2.4), yaml(v.2.2.0), SummarizedExperiment(v.1.10.1), zlibbioc(v.1.26.0), plyr(v.1.8.4), grid(v.3.5.1), blob(v.1.1.1), promises(v.1.0.1), crayon(v.1.3.4), lattice(v.0.20-35), Biostrings(v.2.48.0), pander(v.0.6.2), hms(v.0.4.2), knitr(v.1.20), pillar(v.1.3.0), codetools(v.0.2-15), biomaRt(v.2.36.1), XML(v.3.98-1.16), glue(v.1.3.0), evaluate(v.0.12), data.table(v.1.11.8), httpuv(v.1.4.5), foreach(v.1.4.4), gtable(v.0.2.0), purrr(v.0.2.5), assertthat(v.0.2.0), ggplot2(v.3.0.0), openxlsx(v.4.1.0), mime(v.0.6), xtable(v.1.8-3), roxygen2(v.6.1.0), later(v.0.7.5), tibble(v.1.4.2), iterators(v.1.0.10), GenomicAlignments(v.1.16.0), memoise(v.1.1.0), bindrcpp(v.0.2.2) and interactiveDisplayBase(v.1.18.0)

