1 Introduction

This document is intended to provide an overview of TMRC3 samples which have been sequenced. It includes some plots and analyses showing the relationships among the samples as well as some differential analyses when possible.

2 Annotation

We take the annotation data from ensembl’s biomart instance. The genome which was used to map the data was hg38 revision 91. My default when using biomart is to load the data from 1 year before the current date, which provides annotations which match hg38 91 almost perfectly.

hs_annot <- load_biomart_annotations()
## The biomart annotations file already exists, loading from it.
hs_annot <- hs_annot[["annotation"]]
hs_annot[["transcript"]] <- paste0(rownames(hs_annot), ".", hs_annot[["version"]])
rownames(hs_annot) <- make.names(hs_annot[["ensembl_gene_id"]], unique=TRUE)
tx_gene_map <- hs_annot[, c("transcript", "ensembl_gene_id")]

3 Sample Estimation

I used two mapping methods for this data, hisat2 and salmon. Most analyses use hisat2, which is a more traditional map-and-count method. In contrast, salmon uses what may be thought of as a indexed voting method (so that multi-matches are discounted and the votes split among all matches). Salmon also required a pre-existing database of known transcripts (though later versions may actually use mapping from things like hisat), while hisat uses the genome and a database of known transcripts (and optionally can search for splicing junctions to find new transcripts).

3.1 Generate expressionsets

The first thing to note is the large range in coverage. There are multiple samples with coverage which is too low to use. These will be removed shortly.

macr <- create_expt("sample_sheets/tmrc3_samples_20210512.xlsx",
                    file_column = "hg38100hisatfile",
                    savefile = "hs_expt_all.rda",
                    gene_info = hs_annot) %>%
  exclude_genes_expt(column = "gene_biotype", method = "keep",
                     pattern = "protein_coding") %>%
  subset_expt(nonzero = 11000) %>%
  subset_expt(subset = "typeofcells=='Macrophages'") %>%
  set_expt_conditions(fact = "infectionzymodeme") %>%
  set_expt_batches(fact = "infectiondrug")
## Reading the sample metadata.
## Dropped 113 rows from the sample metadata because they were blank.
## The sample definitions comprises: 131 rows(samples) and 80 columns(metadata fields).
## Warning in create_expt("sample_sheets/tmrc3_samples_20210512.xlsx", file_column
## = "hg38100hisatfile", : Some samples were removed when cross referencing the
## samples against the count data.
## Matched 21452 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## The final expressionset has 21481 rows and 119 columns.
## Before removal, there were 21481 genes, now there are 19941.
## There are 13 samples which kept less than 90 percent counts.
## TMRC30015 TMRC30017 TMRC30019 TMRC30044 TMRC30045 TMRC30097 TMRC30075 TMRC30087 
##     79.24     85.72     89.75     80.34     73.33     89.90     86.97     83.63 
## TMRC30101 TMRC30104 TMRC30114 TMRC30131 TMRC30073 
##     88.41     80.29     87.62     86.82     89.26
## The samples (and read coverage) removed when filtering 11000 non-zero genes are:
## TMRC30010 TMRC30050 TMRC30052 
##     52471    808149   3087347
## subset_expt(): There were 119, now there are 116 samples.
## subset_expt(): There were 116, now there are 12 samples.

4 Macrophages

These samples are rather different from all of the others. The following section is therefore written primarily in response to a separate set of emails from Olga and Maria Adelaida; here is a snippet:

Dear all, about the samples corresponding to infected macrophages with three sensitive (2.2) and three resistant (2.3) clinical strains of L. (V.) panamensis, I send you the results of parasite burden by detection of 7SLRNA. I think these results are interesting, but the sample size is very small. Doctor Najib or Trey could you please send me the quality data and PCA analysis of these samples?

and

Hi Doctor, thank you. These samples corresponding to primary macrophages infected with clinical strains 2.2 (n=3) and 2.3 (n = 3). These information is in the file: TMRC project 3: excel Host TMRC3 v1.1, rows 137 to 150.

Thus I added 3 columns to the end of the sample sheet which seek to include this information. The first is ‘drug’ and encodes both the infection state (no for the two controls and yes for everything else), the second is zymodeme which I took from the tmrc2 sample sheet, the last is drug, which is either no or sb.

macr_norm <- sm(normalize_expt(macr, transform="log2", norm="quant", convert="cpm", filter=TRUE))
norm_pca <- plot_pca(macr_norm, plot_labels=FALSE)
pp(file="macrophage_side_experiment/norm_pca.png", image=norm_pca$plot)

plot_3d_pca(norm_pca)$plot
macr_nb <- sm(normalize_expt(macr, norm = "quant", convert = "cpm",
                             transform = "log2", batch = "svaseq", filter = TRUE))
nb_pca <- plot_pca(macr_nb, plot_labels=FALSE)
pp(file="macrophage_side_experiment/normbatch_pca.png", image=nb_pca$plot)

4.1 Write the data

macr_written <- sm(write_expt(macr, excel="macrophage_side_experiment/macrophage_expt.xlsx"))
## Error in checkModelStatus(fit, showWarnings = showWarnings, colinearityCutoff = colinearityCutoff) : 
##   The variables specified in this model are redundant,
## so the design matrix is not full rank
## Error in checkModelStatus(fit, showWarnings = showWarnings, colinearityCutoff = colinearityCutoff) : 
##   The variables specified in this model are redundant,
## so the design matrix is not full rank

4.2 Perform DE

There were a couple of contrasts explicitly requested for this data:

  • Macrophages infected with 2.3 without SbV vs macrophagues infected with 2.2 without SbV
  • Macrophages infected with 2.3 without SbV vs macrophages uninfected (M0)

Unfortunately, I think to really answer these questions we will require more replicates of a few of these conditions, most notably the uninfected samples.

##tmp <- normalize_expt(macr, filter=TRUE)
##zy_de <- deseq_pairwise(tmp, model_batch="svaseq")
##zy_ed <- edger_pairwise(tmp, model_batch="svaseq")
combined_condition <- paste0(pData(macr)[["infectiondrug"]],
                             pData(macr)[["infectionzymodeme"]])
pData(macr[["expressionset"]])[["combined"]] <- combined_condition
macr <- set_expt_conditions(macr, fact = "combined")
keepers <- list(
    "zymo_nodrug" = c("infz23", "infz22"),
    "z23_uninf" = c("infz23", "uninfnozymo"))
    
zymo_de <- all_pairwise(macr, model_batch = "svaseq", filter = TRUE,
                        do_ebseq = FALSE)
## batch_counts: Before batch/surrogate estimation, 57 entries are x==0: 0%.
## Plotting a PCA before surrogate/batch inclusion.
## Using svaseq to visualize before/after batch inclusion.
## Performing a test normalization with: raw
## Removing 0 low-count genes (11030 remaining).
## batch_counts: Before batch/surrogate estimation, 57 entries are x==0: 0%.
## batch_counts: Before batch/surrogate estimation, 3668 entries are 0<x<1: 3%.
## Error in solve.default(t(mod) %*% mod) : 
##   system is computationally singular: reciprocal condition number = 2.27998e-18
## Warning in all_adjusters(count_table, design = design, estimate_type = method, :
## It is highly likely that the underlying reason for this error is too many 0's in
## the dataset, please try doing a filtering of the data and retry.
## Error in solve.default(t(mod) %*% mod) : 
##   system is computationally singular: reciprocal condition number = 2.27072e-18
## Warning in do_batch(count_table, method = batch, expt_design = expt_design, :
## The batch_counts call failed. Returning non-batch reduced data.
## transform_counts: Found 57 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
zymo_table <- combine_de_tables(zymo_de, keepers=keepers,
                                excel="macrophage_side_experiment/macrophage_de.xlsx")
## Deleting the file macrophage_side_experiment/macrophage_de.xlsx before writing the tables.
if (!isTRUE(get0("skip_load"))) {
  pander::pander(sessionInfo())
  message(paste0("This is hpgltools commit: ", get_git_commit()))
  message(paste0("Saving to ", savefile))
  tmp <- sm(saveme(filename=savefile))
}
## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset 68b1ce610bf0c750d9a3ed2f6bd2a529b1744c29
## This is hpgltools commit: Thu May 27 17:01:01 2021 -0400: 68b1ce610bf0c750d9a3ed2f6bd2a529b1744c29
## Saving to macrophage_experiment_202105.rda.xz
tmp <- loadme(filename=savefile)
---
title: "L. panamensis 202105: A Separate Macrophage Experiment"
author: "atb abelew@gmail.com"
date: "`r Sys.Date()`"
output:
 html_document:
  code_download: true
  code_folding: show
  fig_caption: true
  fig_height: 7
  fig_width: 7
  highlight: default
  keep_md: false
  mode: selfcontained
  number_sections: true
  self_contained: true
  theme: readable
  toc: true
  toc_float:
   collapsed: false
   smooth_scroll: false
---

<style>
  body .main-container {
    max-width: 1600px;
  }
</style>

```{r options, include=FALSE}
library(hpgltools)
tt <- sm(devtools::load_all("/data/hpgltools"))
knitr::opts_knit$set(progress=TRUE,
                     verbose=TRUE,
                     width=90,
                     echo=TRUE)
knitr::opts_chunk$set(error=TRUE,
                      fig.width=8,
                      fig.height=8,
                      dpi=96)
old_options <- options(digits=4,
                       stringsAsFactors=FALSE,
                       knitr.duplicate.label="allow")
ggplot2::theme_set(ggplot2::theme_bw(base_size=12))
ver <- "202105"
previous_file <- paste0("none", ver, ".Rmd")
rundate <- format(Sys.Date(), format="%Y%m%d")
tmp <- try(sm(loadme(filename=gsub(pattern="\\.Rmd", replace="\\.rda\\.xz", x=previous_file))))
rmd_file <- "macrophage_experiment_202105.Rmd"
savefile <- gsub(pattern="\\.Rmd", replace="\\.rda\\.xz", x=rmd_file)
```

# Introduction

This document is intended to provide an overview of TMRC3 samples which have
been sequenced.  It includes some plots and analyses showing the relationships
among the samples as well as some differential analyses when possible.

# Annotation

We take the annotation data from ensembl's biomart instance.  The genome which
was used to map the data was hg38 revision 91.  My default when using biomart is
to load the data from 1 year before the current date, which provides annotations
which match hg38 91 almost perfectly.

```{r hs_annot}
hs_annot <- load_biomart_annotations()
hs_annot <- hs_annot[["annotation"]]
hs_annot[["transcript"]] <- paste0(rownames(hs_annot), ".", hs_annot[["version"]])
rownames(hs_annot) <- make.names(hs_annot[["ensembl_gene_id"]], unique=TRUE)
tx_gene_map <- hs_annot[, c("transcript", "ensembl_gene_id")]
```

# Sample Estimation

I used two mapping methods for this data, hisat2 and salmon.  Most analyses use
hisat2, which is a more traditional map-and-count method.  In contrast, salmon
uses what may be thought of as a indexed voting method (so that multi-matches are
discounted and the votes split among all matches).  Salmon also required a
pre-existing database of known transcripts (though later versions may actually
use mapping from things like hisat), while hisat uses the genome and a database
of known transcripts (and optionally can search for splicing junctions to find
new transcripts).

## Generate expressionsets

The first thing to note is the large range in coverage.  There are multiple
samples with coverage which is too low to use.  These will be removed shortly.

```{r all_new_hisat2}
macr <- create_expt("sample_sheets/tmrc3_samples_20210512.xlsx",
                    file_column = "hg38100hisatfile",
                    savefile = "hs_expt_all.rda",
                    gene_info = hs_annot) %>%
  exclude_genes_expt(column = "gene_biotype", method = "keep",
                     pattern = "protein_coding") %>%
  subset_expt(nonzero = 11000) %>%
  subset_expt(subset = "typeofcells=='Macrophages'") %>%
  set_expt_conditions(fact = "infectionzymodeme") %>%
  set_expt_batches(fact = "infectiondrug")
```

# Macrophages

These samples are rather different from all of the others.  The following
section is therefore written primarily in response to a separate set of emails
from Olga and Maria Adelaida; here is a snippet:

 Dear all, about the samples corresponding to infected macrophages with three
 sensitive (2.2) and three resistant (2.3) clinical strains of L. (V.)
 panamensis, I send you the results of parasite burden by detection of 7SLRNA. I
 think these results are interesting, but the sample size is very small.
Doctor Najib or Trey could you please send me the quality data and PCA analysis
 of these samples?

and

 Hi Doctor, thank you.  These samples corresponding to primary macrophages
 infected with clinical strains 2.2 (n=3) and 2.3 (n = 3). These information is
 in the file: TMRC project 3: excel Host TMRC3 v1.1, rows 137 to 150.

Thus I added 3 columns to the end of the sample sheet which seek to
include this information.  The first is 'drug' and encodes both the infection
state (no for the two controls and yes for everything else), the second is
zymodeme which I took from the tmrc2 sample sheet, the last is drug, which is
either no or sb.

```{r macrophage_zymodeme_experiment}
macr_norm <- sm(normalize_expt(macr, transform="log2", norm="quant", convert="cpm", filter=TRUE))
norm_pca <- plot_pca(macr_norm, plot_labels=FALSE)
pp(file="macrophage_side_experiment/norm_pca.png", image=norm_pca$plot)
plot_3d_pca(norm_pca)$plot

macr_nb <- sm(normalize_expt(macr, norm = "quant", convert = "cpm",
                             transform = "log2", batch = "svaseq", filter = TRUE))
nb_pca <- plot_pca(macr_nb, plot_labels=FALSE)
pp(file="macrophage_side_experiment/normbatch_pca.png", image=nb_pca$plot)
```

## Write the data

```{r write, fig.show="hide"}
macr_written <- sm(write_expt(macr, excel="macrophage_side_experiment/macrophage_expt.xlsx"))
```

## Perform DE

There were a couple of contrasts explicitly requested for this data:

* Macrophages infected with 2.3 without SbV vs macrophagues infected with 2.2 without SbV
* Macrophages infected with 2.3 without SbV vs macrophages uninfected (M0)

Unfortunately, I think to really answer these questions we will
require more replicates of a few of these conditions, most notably the
uninfected samples.

```{r de, fig.show="hide"}
##tmp <- normalize_expt(macr, filter=TRUE)
##zy_de <- deseq_pairwise(tmp, model_batch="svaseq")
##zy_ed <- edger_pairwise(tmp, model_batch="svaseq")
combined_condition <- paste0(pData(macr)[["infectiondrug"]],
                             pData(macr)[["infectionzymodeme"]])
pData(macr[["expressionset"]])[["combined"]] <- combined_condition
macr <- set_expt_conditions(macr, fact = "combined")
keepers <- list(
    "zymo_nodrug" = c("infz23", "infz22"),
    "z23_uninf" = c("infz23", "uninfnozymo"))
    
zymo_de <- all_pairwise(macr, model_batch = "svaseq", filter = TRUE,
                        do_ebseq = FALSE)
zymo_table <- combine_de_tables(zymo_de, keepers=keepers,
                                excel="macrophage_side_experiment/macrophage_de.xlsx")
```

```{r saveme}
if (!isTRUE(get0("skip_load"))) {
  pander::pander(sessionInfo())
  message(paste0("This is hpgltools commit: ", get_git_commit()))
  message(paste0("Saving to ", savefile))
  tmp <- sm(saveme(filename=savefile))
}
```

```{r loadme_after, eval=FALSE}
tmp <- loadme(filename=savefile)
```
