1 M. musculus

This will be a very minimal analysis until we get some replicates.

1.2 Metadata

I am going to write a quick sample sheet in the current working directory called ‘all_samples.xlsx’ and put the names of the count tables in it.

1.3 Create expressionsets

Here I combine the metadata, count data, and annotations.

It is worth noting that the gene IDs from htseq-count probably do not match the annotations retrieved because they are likely exon-based rather than gene based. This is not really a problem, but don’t forget it!

## Reading the sample metadata.
## The sample definitions comprises: 8 rows(samples) and 8 columns(metadata fields).
## Reading count tables.
## Using the transcript to gene mapping.
## Reading salmon data with tximport.
## Finished reading count tables.
## Matched 6764 annotations and counts.
## Bringing together the count matrix and gene information.
## The mapped IDs are not the rownames of your gene information, changing them now.
## Some annotations were lost in merging, setting them to 'undefined'.
## The final expressionset has 6764 rows and 8 columns.
## Reading the sample metadata.
## The sample definitions comprises: 8 rows(samples) and 8 columns(metadata fields).
## Reading count tables.
## Reading salmon data with tximport.
## Finished reading count tables.
## Matched 6897 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## The final expressionset has 56886 rows and 8 columns.

1.4 Query expressionsets

In this block I will calculate all the diagnostic plots, but not show them. I will show them next with a little annotation.

I will leave the output for the first of each invocation and silence it for the second.

## Graphing number of non-zero genes with respect to CPM by library.
## Graphing library sizes.
## Graphing a boxplot.
## This data will benefit from being displayed on the log scale.
## If this is not desired, set scale='raw'
## Some entries are 0.  We are on log scale, adding 1 to the data.
## Changed 21398 zero count features.
## Graphing a correlation heatmap.
## Graphing a standard median correlation.
## Performing correlation.
## Graphing a distance heatmap.
## Graphing a standard median distance.
## Performing distance.
## Graphing a PCA plot.
## Graphing a T-SNE plot.
## Plotting a density plot.
## This data will benefit from being displayed on the log scale.
## If this is not desired, set scale='raw'
## Some entries are 0.  We are on log scale, setting them to 0.5.
## Changed 21398 zero count features.
## Plotting a CV plot.
## Naively calculating coefficient of variation/dispersion with respect to condition.
## Finished calculating dispersion estimates.
## Plotting the representation of the top-n genes.
## Plotting the expression of the top-n PC loaded genes.
## Printing a color to condition legend.
## This function will replace the expt$expressionset slot with:
## log2(cpm(quant(cbcb(data))))
## It will save copies of each step along the way
##  in expt$normalized with the corresponding libsizes. Keep libsizes in mind
##  when invoking limma.  The appropriate libsize is non-log(cpm(normalized)).
##  This is most likely kept at:
##  'new_expt$normalized$intermediate_counts$normalization$libsizes'
##  A copy of this may also be found at:
##  new_expt$best_libsize
## Not correcting the count-data for batch effects.  If batch is
##  included in EdgerR/limma's model, then this is probably wise; but in extreme
##  batch effects this is a good parameter to play with.
## Step 1: performing count filter with option: cbcb
## Removing 2794 low-count genes (3970 remaining).
## Step 2: normalizing the data with quant.
## Step 3: converting the data with cpm.
## Step 4: transforming the data with log2.
## transform_counts: Found 1105 values equal to 0, adding 1 to the matrix.
## Step 5: not doing batch correction.

1.6 Do a simple DE

The only interesting DE I see in this is to compare the retinas to the dlgns. I can treat them as replicates and compare.

## This function will replace the expt$expressionset slot with:
## log2(cpm(quant(cbcb(data))))
## It will save copies of each step along the way
##  in expt$normalized with the corresponding libsizes. Keep libsizes in mind
##  when invoking limma.  The appropriate libsize is non-log(cpm(normalized)).
##  This is most likely kept at:
##  'new_expt$normalized$intermediate_counts$normalization$libsizes'
##  A copy of this may also be found at:
##  new_expt$best_libsize
## Not correcting the count-data for batch effects.  If batch is
##  included in EdgerR/limma's model, then this is probably wise; but in extreme
##  batch effects this is a good parameter to play with.
## Step 1: performing count filter with option: cbcb
## Removing 2794 low-count genes (3970 remaining).
## Step 2: normalizing the data with quant.
## Step 3: converting the data with cpm.
## Step 4: transforming the data with log2.
## transform_counts: Found 1105 values equal to 0, adding 1 to the matrix.
## Step 5: not doing batch correction.
## This function will replace the expt$expressionset slot with:
## log2(cpm(quant(cbcb(data))))
## It will save copies of each step along the way
##  in expt$normalized with the corresponding libsizes. Keep libsizes in mind
##  when invoking limma.  The appropriate libsize is non-log(cpm(normalized)).
##  This is most likely kept at:
##  'new_expt$normalized$intermediate_counts$normalization$libsizes'
##  A copy of this may also be found at:
##  new_expt$best_libsize
## Not correcting the count-data for batch effects.  If batch is
##  included in EdgerR/limma's model, then this is probably wise; but in extreme
##  batch effects this is a good parameter to play with.
## Step 1: performing count filter with option: cbcb
## Removing 2794 low-count genes (3970 remaining).
## Step 2: normalizing the data with quant.
## Step 3: converting the data with cpm.
## Step 4: transforming the data with log2.
## transform_counts: Found 1105 values equal to 0, adding 1 to the matrix.
## Step 5: not doing batch correction.
## Plotting a PCA before surrogates/batch inclusion.
## Not putting labels on the plot.
## Assuming no batch in model for testing pca.
## Not putting labels on the plot.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
dlgn_ko <- "iprgc_06"
dlgn_wt <- "iprgc_01"
dlgn_ht <- "iprgc_04"
scn_ko <- "iprgc_08"
scn_wt <- "iprgc_03"
retina_ko <- "iprgc_07"
retina_wt <- "iprgc_02"
retina_ht <- "iprgc_05"

het_retina_vs_het_dlgn <- exprs(mm38_norm)[, retina_ht] - exprs(mm38_norm)[, dlgn_ht]
ko_retina_vs_ko_dlgn <- exprs(mm38_norm)[, retina_ko] - exprs(mm38_norm)[, dlgn_ko]
ko_retina_vs_ko_scn <- exprs(mm38_norm)[, retina_ko] - exprs(mm38_norm)[, scn_ko]
het_retina_vs_ko_retina <- exprs(mm38_norm)[, retina_ht] - exprs(mm38_norm)[, retina_ko]
het_dlgn_vs_ko_scn <- exprs(mm38_norm)[, dlgn_ht] - exprs(mm38_norm)[, scn_ko]
ko_dlgn_vs_ko_scn <- exprs(mm38_norm)[, dlgn_ko] - exprs(mm38_norm)[, scn_ko]

dlgn_kowt <- exprs(mm38_norm)[, dlgn_ko] - exprs(mm38_norm)[, dlgn_wt]
dlgn_htwt <- exprs(mm38_norm)[, dlgn_ht] - exprs(mm38_norm)[, dlgn_wt]
dlgn_koht <- exprs(mm38_norm)[, dlgn_ko] - exprs(mm38_norm)[, dlgn_ht]
scn_kowt <- exprs(mm38_norm)[, scn_ko] - exprs(mm38_norm)[, scn_wt]
retina_kowt <- exprs(mm38_norm)[, retina_ko] - exprs(mm38_norm)[, retina_wt]
retina_htwt <- exprs(mm38_norm)[, retina_ht] - exprs(mm38_norm)[, retina_wt]
retina_koht <- exprs(mm38_norm)[, retina_ko] - exprs(mm38_norm)[, retina_ht]

pair_mtrx <- cbind(het_retina_vs_het_dlgn,
                   ko_retina_vs_ko_dlgn,
                   ko_retina_vs_ko_scn,
                   het_retina_vs_ko_retina,
                   het_dlgn_vs_ko_scn,
                   ko_dlgn_vs_ko_scn,
                   dlgn_kowt, dlgn_htwt, dlgn_koht,
                   scn_kowt,
                   retina_kowt, retina_htwt, retina_koht)

mm_tables <- combine_de_tables(mm_de, extra_annot=pair_mtrx,
                               excel=glue::glue("excel/{rundate}mm_tables.xlsx"))
## Writing a legend of columns.
## Printing a pca plot before/after surrogates/batch estimation.
## Working on table 1/3: retina_vs_dlgn
## Working on table 2/3: scn_vs_dlgn
## Working on table 3/3: scn_vs_retina
## Adding venn plots for retina_vs_dlgn.
## Limma expression coefficients for retina_vs_dlgn; R^2: 0.934; equation: y = 0.967x + 0.0347
## Warning: Removed 1 rows containing missing values (geom_vline).
## Warning: Removed 1 rows containing missing values (geom_hline).
## Warning: Removed 1 rows containing missing values (geom_vline).
## Warning: Removed 1 rows containing missing values (geom_hline).
## Deseq expression coefficients for retina_vs_dlgn; R^2: 0.822; equation: y = 0.955x + 0.0956
## Edger expression coefficients for retina_vs_dlgn; R^2: 0.917; equation: y = 0.977x + 0.0265
## Warning: Removed 1 rows containing missing values (geom_hline).

## Warning: Removed 1 rows containing missing values (geom_hline).
## Adding venn plots for scn_vs_dlgn.
## Limma expression coefficients for scn_vs_dlgn; R^2: 0.934; equation: y = 0.967x + 0.0347
## Warning: Removed 1 rows containing missing values (geom_vline).

## Warning: Removed 1 rows containing missing values (geom_hline).
## Warning: Removed 1 rows containing missing values (geom_vline).
## Warning: Removed 1 rows containing missing values (geom_hline).
## Deseq expression coefficients for scn_vs_dlgn; R^2: 0.822; equation: y = 0.955x + 0.0956
## Edger expression coefficients for scn_vs_dlgn; R^2: 0.917; equation: y = 0.977x + 0.0265
## Warning: Removed 1 rows containing missing values (geom_hline).

## Warning: Removed 1 rows containing missing values (geom_hline).
## Adding venn plots for scn_vs_retina.
## Limma expression coefficients for scn_vs_retina; R^2: 0.934; equation: y = 0.967x + 0.0347
## Warning: Removed 1 rows containing missing values (geom_vline).

## Warning: Removed 1 rows containing missing values (geom_hline).
## Warning: Removed 1 rows containing missing values (geom_vline).
## Warning: Removed 1 rows containing missing values (geom_hline).
## Deseq expression coefficients for scn_vs_retina; R^2: 0.822; equation: y = 0.955x + 0.0956
## Edger expression coefficients for scn_vs_retina; R^2: 0.917; equation: y = 0.977x + 0.0265
## Warning: Removed 1 rows containing missing values (geom_hline).

## Warning: Removed 1 rows containing missing values (geom_hline).
## Writing summary information, compare_plot is: TRUE.
## Performing save of excel/20200108mm_tables.xlsx.
---
title: "M. musculus 3 cell types, 1 timepoint, 3 genotypes, and 1 replicate."
author: "atb abelew@gmail.com"
date: "`r Sys.Date()`"
output:
  html_document:
    code_download: true
    code_folding: show
    fig_caption: true
    fig_height: 7
    fig_width: 7
    highlight: tango
    keep_md: false
    mode: selfcontained
    number_sections: true
    self_contained: true
    theme: readable
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
  rmdformats::readthedown:
    code_download: true
    code_folding: show
    df_print: paged
    fig_caption: true
    fig_height: 7
    fig_width: 7
    highlight: tango
    width: 300
    keep_md: false
    mode: selfcontained
    toc_float: true
  BiocStyle::html_document:
    code_download: true
    code_folding: show
    fig_caption: true
    fig_height: 7
    fig_width: 7
    highlight: tango
    keep_md: false
    mode: selfcontained
    toc_float: true
---

<style type="text/css">
body, td {
  font-size: 16px;
}
code.r{
  font-size: 16px;
}
pre {
 font-size: 16px
}
</style>

```{r options, include=FALSE}
library("hpgltools")
tt <- devtools::load_all("/data/hpgltools")
knitr::opts_knit$set(width=120,
                     progress=TRUE,
                     verbose=TRUE,
                     echo=TRUE)
knitr::opts_chunk$set(error=TRUE,
                      dpi=96)
old_options <- options(digits=4,
                       stringsAsFactors=FALSE,
                       knitr.duplicate.label="allow")
ggplot2::theme_set(ggplot2::theme_bw(base_size=10))
rundate <- format(Sys.Date(), format="%Y%m%d")
previous_file <- "undefined.Rmd"
ver <- "20191216"

##tmp <- sm(loadme(filename=paste0(gsub(pattern="\\.Rmd", replace="", x=previous_file), "-v", ver, ".rda.xz")))
##rmd_file <- "03_expression_infection_20180822.Rmd"
```

# M. musculus

This will be a very minimal analysis until we get some replicates.

## Annotations

I am using mm38_95.

```{r annotations}
## My load_biomart_annotations() function defaults to human, so that will be quick.
mm_annot <- load_biomart_annotations(species="mmusculus")
mm_annot <- mm_annot[["annotation"]]
mm_annot[["txid"]] <- paste0(mm_annot[["ensembl_transcript_id"]], ".", mm_annot[["version"]])
rownames(mm_annot) <- make.names(mm_annot[["ensembl_gene_id"]], unique=TRUE)

tx_gene_map <- mm_annot[, c("txid", "ensembl_gene_id")]
```

So, I now have 2 data frames of parasite annotations and 1 human.

## Metadata

I am going to write a quick sample sheet in the current working directory called
'all_samples.xlsx' and put the names of the count tables in it.

## Create expressionsets

Here I combine the metadata, count data, and annotations.

It is worth noting that the gene IDs from htseq-count probably do not match the
annotations retrieved because they are likely exon-based rather than gene
based.  This is not really a problem, but don't forget it!

```{r expt}
mm38_expt <- create_expt("sample_sheets/all_samples.xlsx", tx_gene_map=tx_gene_map,
                         gene_info=mm_annot, file_column="salmonfile")

mmtx_annot <- mm_annot
rownames(mmtx_annot) <- mm_annot[["txid"]]
mm38_tx <- create_expt("sample_sheets/all_samples.xlsx",
                         gene_info=mmtx_annot, file_column="salmonfile")
```

## Query expressionsets

In this block I will calculate all the diagnostic plots, but not show them.  I
will show them next with a little annotation.

I will leave the output for the first of each invocation and silence it for the second.

```{r query, fig.show="hide"}
mm38_plots <- graph_metrics(mm38_expt)

mm38_norm <- normalize_expt(mm38_expt, norm="quant", convert="cpm", transform="log2", filter=TRUE)

mm38n_plots <- sm(graph_metrics(mm38_norm))
```

## Show some plots

```{r show_plots}
mm38_plots$legend
mm38_plots$libsize
mm38_plots$nonzero
mm38n_plots$density
mm38n_plots$pc_plot
```

## Do a simple DE

The only interesting DE I see in this is to compare the retinas to the dlgns.
I can treat them as replicates and compare.

```{r de, fig.show="hide"}
mm <- set_expt_conditions(mm38_expt, fact="celltype")
mm_norm <- normalize_expt(mm, norm="quant",
                          convert="cpm", transform="log2", filter=TRUE)
plot_pca(mm_norm)$plot

mm_de <- all_pairwise(mm, model_batch=FALSE)

dlgn_ko <- "iprgc_06"
dlgn_wt <- "iprgc_01"
dlgn_ht <- "iprgc_04"
scn_ko <- "iprgc_08"
scn_wt <- "iprgc_03"
retina_ko <- "iprgc_07"
retina_wt <- "iprgc_02"
retina_ht <- "iprgc_05"

het_retina_vs_het_dlgn <- exprs(mm38_norm)[, retina_ht] - exprs(mm38_norm)[, dlgn_ht]
ko_retina_vs_ko_dlgn <- exprs(mm38_norm)[, retina_ko] - exprs(mm38_norm)[, dlgn_ko]
ko_retina_vs_ko_scn <- exprs(mm38_norm)[, retina_ko] - exprs(mm38_norm)[, scn_ko]
het_retina_vs_ko_retina <- exprs(mm38_norm)[, retina_ht] - exprs(mm38_norm)[, retina_ko]
het_dlgn_vs_ko_scn <- exprs(mm38_norm)[, dlgn_ht] - exprs(mm38_norm)[, scn_ko]
ko_dlgn_vs_ko_scn <- exprs(mm38_norm)[, dlgn_ko] - exprs(mm38_norm)[, scn_ko]

dlgn_kowt <- exprs(mm38_norm)[, dlgn_ko] - exprs(mm38_norm)[, dlgn_wt]
dlgn_htwt <- exprs(mm38_norm)[, dlgn_ht] - exprs(mm38_norm)[, dlgn_wt]
dlgn_koht <- exprs(mm38_norm)[, dlgn_ko] - exprs(mm38_norm)[, dlgn_ht]
scn_kowt <- exprs(mm38_norm)[, scn_ko] - exprs(mm38_norm)[, scn_wt]
retina_kowt <- exprs(mm38_norm)[, retina_ko] - exprs(mm38_norm)[, retina_wt]
retina_htwt <- exprs(mm38_norm)[, retina_ht] - exprs(mm38_norm)[, retina_wt]
retina_koht <- exprs(mm38_norm)[, retina_ko] - exprs(mm38_norm)[, retina_ht]

pair_mtrx <- cbind(het_retina_vs_het_dlgn,
                   ko_retina_vs_ko_dlgn,
                   ko_retina_vs_ko_scn,
                   het_retina_vs_ko_retina,
                   het_dlgn_vs_ko_scn,
                   ko_dlgn_vs_ko_scn,
                   dlgn_kowt, dlgn_htwt, dlgn_koht,
                   scn_kowt,
                   retina_kowt, retina_htwt, retina_koht)

mm_tables <- combine_de_tables(mm_de, extra_annot=pair_mtrx,
                               excel=glue::glue("excel/{rundate}mm_tables.xlsx"))
```


```{r saveme, eval=FALSE}
pander::pander(sessionInfo())
message(paste0("This is hpgltools commit: ", get_git_commit()))
this_save <- paste0(gsub(pattern="\\.Rmd", replace="", x=rmd_file), "-v", ver, ".rda.xz")
message(paste0("Saving to ", this_save))
tmp <- sm(saveme(filename=this_save))
```
