Annotation
We take the annotation data from ensembl’s biomart instance. The genome which was used to map the data was hg38 revision 91. My default when using biomart is to load the data from 1 year before the current date, which provides annotations which match hg38 91 almost perfectly.
hs_annot <- load_biomart_annotations()
## The biomart annotations file already exists, loading from it.
hs_annot <- hs_annot[["annotation"]]
hs_annot[["transcript"]] <- paste0(rownames(hs_annot), ".", hs_annot[["version"]])
rownames(hs_annot) <- make.names(hs_annot[["ensembl_gene_id"]], unique=TRUE)
tx_gene_map <- hs_annot[, c("transcript", "ensembl_gene_id")]
Sample Estimation
I used two mapping methods for this data, hisat2 and salmon. Most analyses use hisat2, which is a more traditional map-and-count method. In contrast, salmon uses what may be thought of as a indexed voting method (so that multi-matches are discounted and the votes split among all matches). Salmon also required a pre-existing database of known transcripts (though later versions may actually use mapping from things like hisat), while hisat uses the genome and a database of known transcripts (and optionally can search for splicing junctions to find new transcripts).
Generate expressionsets
The first thing to note is the large range in coverage. There are multiple samples with coverage which is too low to use. These will be removed shortly.
hs_expt <- sm(create_expt("sample_sheets/tmrc3_samples_20201105.xlsx",
file_column="hg3891hisatfile", savefile="hs_expt_all.rda",
gene_info=hs_annot))
Minimum coverage sample filtering
I arbitrarily chose 3,000,000 counts as a minimal level of coverage. We may want this to be higher.
hs_valid <- subset_expt(hs_expt, coverage=3000000)
## Subsetting given a minimal number of counts/sample.
## There were 70, now there are 64 samples.
Macrophages
These samples are rather different from all of the others. The following section is therefore written primarily in response to a separate set of emails from Olga and Maria Adelaida; here is a snippet:
Dear all, about the samples corresponding to infected macrophages with three sensitive (2.2) and three resistant (2.3) clinical strains of L. (V.) panamensis, I send you the results of parasite burden by detection of 7SLRNA. I think these results are interesting, but the sample size is very small. Doctor Najib or Trey could you please send me the quality data and PCA analysis of these samples?
and
Hi Doctor, thank you. These samples corresponding to primary macrophages infected with clinical strains 2.2 (n=3) and 2.3 (n = 3). These information is in the file: TMRC project 3: excel Host TMRC3 v1.1, rows 137 to 150.
Thus I added 3 columns to the end of the sample sheet which seek to include this information. The first is ‘drug’ and encodes both the infection state (no for the two controls and yes for everything else), the second is zymodeme which I took from the tmrc2 sample sheet, the last is drug, which is either no or sb.
macr <- subset_expt(hs_expt, subset="typeofcells=='Macrophages'")
## Using a subset expression.
## There were 70, now there are 12 samples.
macr <- set_expt_conditions(macr, fact="zymodeme")
macr <- set_expt_batches(macr, fact="macrdrug")
macr_norm <- sm(normalize_expt(macr, transform="log2", norm="quant", convert="cpm", filter=TRUE))
norm_pca <- plot_pca(macr_norm, plot_labels=FALSE)
## Not putting labels on the PC plot.
pp(file="macrophage_side_experiment/norm_pca.png", image=norm_pca$plot)
## Writing the image to: macrophage_side_experiment/norm_pca.png and calling dev.off().

plot_3d_pca(norm_pca)
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## $plot
##
## $file
## [1] "3dpca.html"
macr_nb <- sm(normalize_expt(macr, batch="svaseq", filter=TRUE))
macr_nb <- sm(normalize_expt(macr_nb, norm="quant", convert="cpm", transform="log2"))
nb_pca <- plot_pca(macr_nb, plot_labels=FALSE)
## Not putting labels on the PC plot.
pp(file="macrophage_side_experiment/normbatch_pca.png", image=nb_pca$plot)
## Writing the image to: macrophage_side_experiment/normbatch_pca.png and calling dev.off().

Write the data
macr_written <- sm(write_expt(macr, excel="macrophage_side_experiment/macrophage_expt.xlsx"))
---
title: "L. panamensis 202011: A Macrophage Side Experiment"
author: "atb abelew@gmail.com"
date: "`r Sys.Date()`"
output:
 html_document:
  code_download: true
  code_folding: show
  fig_caption: true
  fig_height: 7
  fig_width: 7
  highlight: default
  keep_md: false
  mode: selfcontained
  number_sections: true
  self_contained: true
  theme: readable
  toc: true
  toc_float:
   collapsed: false
   smooth_scroll: false
---

<style>
  body .main-container {
    max-width: 1600px;
  }
</style>

```{r options, include=FALSE}
if (!isTRUE(get0("skip_load"))) {
  library(hpgltools)
  tt <- sm(devtools::load_all("/data/hpgltools"))
  knitr::opts_knit$set(progress=TRUE,
                       verbose=TRUE,
                       width=90,
                       echo=TRUE)
  knitr::opts_chunk$set(error=TRUE,
                        fig.width=8,
                        fig.height=8,
                        dpi=96)
  old_options <- options(digits=4,
                         stringsAsFactors=FALSE,
                         knitr.duplicate.label="allow")
  ggplot2::theme_set(ggplot2::theme_bw(base_size=12))
  ver <- "202011"
  previous_file <- paste0("none", ver, ".Rmd")
  rundate <- format(Sys.Date(), format="%Y%m%d")
  tmp <- try(sm(loadme(filename=gsub(pattern="\\.Rmd", replace="\\.rda\\.xz", x=previous_file))))
  rmd_file <- "macrophage_side_experiment.Rmd"
  savefile <- gsub(pattern="\\.Rmd", replace="\\.rda\\.xz", x=rmd_file)
}
```

# Introduction

This document is intended to provide an overview of TMRC3 samples which have
been sequenced.  It includes some plots and analyses showing the relationships
among the samples as well as some differential analyses when possible.

# Annotation

We take the annotation data from ensembl's biomart instance.  The genome which
was used to map the data was hg38 revision 91.  My default when using biomart is
to load the data from 1 year before the current date, which provides annotations
which match hg38 91 almost perfectly.

```{r hs_annot}
hs_annot <- load_biomart_annotations()
hs_annot <- hs_annot[["annotation"]]
hs_annot[["transcript"]] <- paste0(rownames(hs_annot), ".", hs_annot[["version"]])
rownames(hs_annot) <- make.names(hs_annot[["ensembl_gene_id"]], unique=TRUE)
tx_gene_map <- hs_annot[, c("transcript", "ensembl_gene_id")]
```

# Sample Estimation

I used two mapping methods for this data, hisat2 and salmon.  Most analyses use
hisat2, which is a more traditional map-and-count method.  In contrast, salmon
uses what may be thought of as a indexed voting method (so that multi-matches are
discounted and the votes split among all matches).  Salmon also required a
pre-existing database of known transcripts (though later versions may actually
use mapping from things like hisat), while hisat uses the genome and a database
of known transcripts (and optionally can search for splicing junctions to find
new transcripts).

## Generate expressionsets

The first thing to note is the large range in coverage.  There are multiple
samples with coverage which is too low to use.  These will be removed shortly.

```{r all_new_hisat2}
hs_expt <- sm(create_expt("sample_sheets/tmrc3_samples_20201105.xlsx",
                          file_column="hg3891hisatfile", savefile="hs_expt_all.rda",
                          gene_info=hs_annot))
```

## Minimum coverage sample filtering

I arbitrarily chose 3,000,000 counts as a minimal level of coverage.  We may
want this to be higher.

```{r hisat2_write, fig.show="hide"}
hs_valid <- subset_expt(hs_expt, coverage=3000000)
```

# Macrophages

These samples are rather different from all of the others.  The following
section is therefore written primarily in response to a separate set of emails
from Olga and Maria Adelaida; here is a snippet:

 Dear all, about the samples corresponding to infected macrophages with three
 sensitive (2.2) and three resistant (2.3) clinical strains of L. (V.)
 panamensis, I send you the results of parasite burden by detection of 7SLRNA. I
 think these results are interesting, but the sample size is very small.
Doctor Najib or Trey could you please send me the quality data and PCA analysis
 of these samples?

and

 Hi Doctor, thank you.  These samples corresponding to primary macrophages
 infected with clinical strains 2.2 (n=3) and 2.3 (n = 3). These information is
 in the file: TMRC project 3: excel Host TMRC3 v1.1, rows 137 to 150.

Thus I added 3 columns to the end of the sample sheet which seek to
include this information.  The first is 'drug' and encodes both the infection
state (no for the two controls and yes for everything else), the second is
zymodeme which I took from the tmrc2 sample sheet, the last is drug, which is
either no or sb.

```{r macrophage_zymodeme_experiment}
macr <- subset_expt(hs_expt, subset="typeofcells=='Macrophages'")
macr <- set_expt_conditions(macr, fact="zymodeme")
macr <- set_expt_batches(macr, fact="macrdrug")

macr_norm <- sm(normalize_expt(macr, transform="log2", norm="quant", convert="cpm", filter=TRUE))
norm_pca <- plot_pca(macr_norm, plot_labels=FALSE)
pp(file="macrophage_side_experiment/norm_pca.png", image=norm_pca$plot)
plot_3d_pca(norm_pca)

macr_nb <- sm(normalize_expt(macr, batch="svaseq", filter=TRUE))
macr_nb <- sm(normalize_expt(macr_nb, norm="quant", convert="cpm", transform="log2"))
nb_pca <- plot_pca(macr_nb, plot_labels=FALSE)
pp(file="macrophage_side_experiment/normbatch_pca.png", image=nb_pca$plot)
```

## Write the data

```{r write, fig.show="hide"}
macr_written <- sm(write_expt(macr, excel="macrophage_side_experiment/macrophage_expt.xlsx"))
```

## Perform DE

```{r de, fig.show="hide"}
##tmp <- normalize_expt(macr, filter=TRUE)
##zy_de <- deseq_pairwise(tmp, model_batch="svaseq")
##zy_ed <- edger_pairwise(tmp, model_batch="svaseq")
zymo_de <- sm(all_pairwise(macr, model_batch="svaseq", filter=TRUE, parallel=FALSE))
zymo_table <- sm(combine_de_tables(zymo_de,
                                   excel="macrophage_side_experiment/macrophage_de.xlsx",
                                   parallel=FALSE))
```

```{r saveme}
if (!isTRUE(get0("skip_load"))) {
  pander::pander(sessionInfo())
  message(paste0("This is hpgltools commit: ", get_git_commit()))
  message(paste0("Saving to ", savefile))
  tmp <- sm(saveme(filename=savefile))
}
```

```{r loadme_after, eval=FALSE}
tmp <- loadme(filename=savefile)
```
