1 Contrasts

zymodeme_keeper <- list(
    "zymodeme" = c("z23", "z22"))
susceptibility_keepers <- list(
    "resistant_sensitive" = c("resistant", "sensitive"),
    "resistant_ambiguous" = c("resistant", "ambiguous"),
    "sensitive_ambiguous" = c("sensitive", "ambiguous"))

1.1 Zymodeme enzyme gene IDs

Najib read me an email listing the gene names associated with the zymodeme classification. I cross-referenced those names against the Leishmania panamensis gene annotations and found the following gene IDs:

  1. ALAT: LPAL13_120010900 – alanine aminotransferase
  2. ASAT: LPAL13_340013000 – aspartate aminotransferase
  3. G6PD: LPAL13_000054100 – glucose-6-phosphate 1-dehydrogenase
  4. NH: LPAL13_140006100, LPAL13_180018500 – inosine-guanine nucleoside hydrolase
  5. MPI: LPAL13_320022300 (tentative match) – mannose phosphate isomerase (I chose the gene annotated as phosphomannose isomerase)

Given these six gene IDs (NH has two), I can look for specific expression differences among the samples.
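The `row_label` argument used in the heatmap code presumably relies on the labels being in the same order as the rows of the filtered data, and I have not confirmed that `exclude_genes_expt()` preserves the order of the requested IDs. A minimal base-R sketch of an order-independent alternative, assuming only that the expression matrix carries gene IDs as row names: keep the ID-to-enzyme mapping in a named vector and index it by the matrix's own row names. The matrix here is a hypothetical placeholder, not real data.

```r
# Mapping of gene ID -> zymodeme enzyme label, taken from the list above.
# NH has two gene IDs, labelled NHv1 and NHv2.
zymo_lookup <- c(
  "LPAL13_120010900" = "ALAT",
  "LPAL13_340013000" = "ASAT",
  "LPAL13_000054100" = "G6PD",
  "LPAL13_140006100" = "NHv1",
  "LPAL13_180018500" = "NHv2",
  "LPAL13_320022300" = "MPI")

# Hypothetical placeholder matrix whose rows are in a different order than the
# lookup (reversed here), standing in for a subsetted expression matrix.
toy_matrix <- matrix(seq_len(12), nrow = 6, ncol = 2,
                     dimnames = list(rev(names(zymo_lookup)), c("s1", "s2")))

# Index the lookup by the matrix's own row names so each label follows its gene.
row_labels <- unname(zymo_lookup[rownames(toy_matrix)])
print(row_labels)
```

Because the labels are looked up by ID rather than by position, reordering or dropping rows upstream cannot silently mislabel a gene.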

1.1.1 Expression levels of zymodeme genes

The following creates a colorspace (red to green) heatmap showing the observed expression of these genes in every sample.

my_genes <- c("LPAL13_120010900", "LPAL13_340013000", "LPAL13_000054100",
              "LPAL13_140006100", "LPAL13_180018500", "LPAL13_320022300")
## One label per gene ID above; NH has two gene IDs (NHv1, NHv2).
my_names <- c("ALAT", "ASAT", "G6PD", "NHv1", "NHv2", "MPI")

zymo_six_genes <- exclude_genes_expt(lp_two_strains, ids = my_genes, method = "keep")
strain_norm <- normalize_expt(zymo_six_genes, convert="rpkm", filter=TRUE, transform="log2")
## Removing 0 low-count genes (6 remaining).
zymo_heatmap <- plot_sample_heatmap(strain_norm, row_label = my_names)

lp_norm <- normalize_expt(lp_two_strains, filter=TRUE, convert="rpkm",
                          norm="quant", transform="log2")
## Removing 152 low-count genes (8558 remaining).
## There appear to be 23 genes without a length.
## transform_counts: Found 103 values equal to 0, adding 1 to the matrix.
zymo_heatmap_all <- plot_sample_heatmap(lp_norm)

1.2 Compare to highly expressed, variant genes

I want to compare the above heatmap with one composed of all genes with relatively high expression and a non-negligible coefficient of variation (CV).
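For reference, the coefficient of variation is the standard deviation divided by the mean. A minimal sketch of a per-gene CV filter in base R, assuming a count matrix with genes in rows; I have not checked exactly how `normalize_expt(filter = "cv")` computes or applies it (for example, on which scale), so the helper `cv_filter()` and its behaviour are hypothetical.

```r
# Hypothetical helper: keep genes whose coefficient of variation (sd / mean
# across samples) is at least cv_min. Genes with a zero mean give NaN and are dropped.
cv_filter <- function(counts, cv_min = 0.9) {
  gene_means <- rowMeans(counts)
  gene_sds <- apply(counts, 1, sd)
  gene_cv <- gene_sds / gene_means
  keep <- !is.na(gene_cv) & gene_cv >= cv_min
  counts[keep, , drop = FALSE]
}

# Placeholder counts: one steady gene, one variable gene, one silent gene.
toy_counts <- rbind(
  steady   = c(100, 102, 98, 100),
  variable = c(10, 200, 5, 150),
  silent   = c(0, 0, 0, 0))
kept <- cv_filter(toy_counts, cv_min = 0.9)
rownames(kept)
```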

zymo_high_genes <- normalize_expt(lp_two_strains, filter="cv", cv_min=0.9)
## Removing 5310 low-count genes (3400 remaining).
high_strain_norm <- normalize_expt(zymo_high_genes, convert="rpkm", norm="quant", transform="log2")
## There appear to be 104 genes without a length.
## transform_counts: Found 238 values equal to 0, adding 1 to the matrix.
zymo_heatmap_high <- plot_sample_heatmap(high_strain_norm)

I think this plot suggests that the difference between the two primary strains is not driven by a few specific genes, but is instead a global pattern.

2 Zymodeme differential expression

2.1 No attempt at batch estimation

two_zymo <- set_expt_conditions(lp_two_strains, fact = "zymodemecategorical") %>%
  subset_expt(subset = "condition!='unknown'")
zymo_de_nobatch <- all_pairwise(two_zymo, filter = TRUE, model_batch = FALSE)
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
zymo_table_nobatch <- combine_de_tables(
    zymo_de_nobatch, keepers = zymodeme_keeper,
    rda = glue::glue("rda/zymo_tables_nobatch-v{ver}.rda"),
    excel = glue::glue("excel/zymo_tables_nobatch-v{ver}.xlsx"))
zymo_sig_nobatch <- extract_significant_genes(
    zymo_table_nobatch,
    according_to = "deseq", current_id = "GID", required_id = "GID",
    gmt = glue::glue("gmt/zymodeme_nobatch-v{ver}.gmt"),
    excel = glue::glue("excel/zymo_sig_nobatch_deseq-v{ver}.xlsx"))
## Number of down IDs in contrast zymodeme: 85.

2.1.1 Plot DE genes without batch estimation/adjustment



Log ratio vs. mean average (MA) plot and volcano plot of the comparison of the two primary zymodeme transcriptomes. When the transcriptomes of the two main strains (43 z2.3 and 41 z2.2 samples) were compared with DESeq2 without any attempt at batch/surrogate estimation, 45 and 85 genes were observed as significantly higher in z2.3 and z2.2, respectively, using cutoffs of 1.0 log2 fold change and 0.05 FDR-adjusted p-value. Many additional genes are likely significantly different between the two strains but fall below the 2-fold difference required for ‘significance.’ This is consistent with prior observations that the parasite transcriptomes are constitutively expressed.

When the same data are shown as a volcano plot, the relatively narrow range of fold changes compared with the wide range of adjusted p-values is apparent.
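The counting rule described above (an FDR-adjusted p-value cutoff of 0.05 plus a log2 fold-change cutoff of 1.0, i.e. a 2-fold difference) can be sketched in base R. The column names `logFC` and `adj_p` and the toy values are hypothetical; the real tables from `combine_de_tables()` use their own column names. The third toy gene illustrates the case discussed above: a tiny adjusted p-value but less than a 2-fold change, so it is not counted.

```r
# Hypothetical helper: count genes passing both an adjusted p-value cutoff and
# a log2 fold-change cutoff, split by direction. NA p-values never pass.
count_significant <- function(table, lfc_cutoff = 1, p_cutoff = 0.05) {
  passes_p <- !is.na(table$adj_p) & table$adj_p <= p_cutoff
  up <- passes_p & table$logFC >= lfc_cutoff
  down <- passes_p & table$logFC <= -lfc_cutoff
  c(up = sum(up), down = sum(down))
}

# Placeholder results; column names are assumptions, not the real table layout.
toy_results <- data.frame(
  logFC = c(1.5, -2.0, 0.8, -1.2, 3.0),
  adj_p = c(0.01, 0.001, 1e-10, 0.20, NA))
count_significant(toy_results)
```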

2.2 Attempt SVA estimate

zymo_de_sva <- all_pairwise(two_zymo, filter = TRUE, model_batch = "svaseq")
## Comparing analyses.
zymo_table_sva <- combine_de_tables(
    zymo_de_sva, keepers = zymodeme_keeper,
    rda = glue::glue("rda/zymo_tables_sva-v{ver}.rda"),
    excel = glue::glue("excel/zymo_tables_sva-v{ver}.xlsx"))
zymo_sig_sva <- extract_significant_genes(
    zymo_table_sva, according_to = "deseq",
    current_id = "GID", required_id = "GID",
    gmt = glue::glue("gmt/zymodeme_sva-v{ver}.gmt"),
    excel = glue::glue("excel/zymo_sig_sva-v{ver}.xlsx"))
2.2.1 Plot zymodeme DE genes with sva batch estimation/adjustment

When estimates from SVA were included in the statistical models used by edgeR, DESeq2, and limma, a nearly identical view of the data emerged. I think this strongly suggests that the sva surrogate variables have little effect on this dataset.
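One way to put a number on ‘nearly identical’ is to correlate the fold changes from the two models over the genes they share. A minimal sketch, assuming the log2 fold changes have already been pulled out of `zymo_table_nobatch` and `zymo_table_sva` as numeric vectors named by gene ID (that extraction step is not shown, and the toy values are placeholders); Spearman correlation is used so that only the ranking of genes matters.

```r
# Hypothetical helper: rank correlation of two named fold-change vectors,
# restricted to the gene IDs present in both.
compare_logfc <- function(a, b) {
  shared <- intersect(names(a), names(b))
  cor(a[shared], b[shared], method = "spearman")
}

# Placeholder vectors; the second is in a different order and has an extra gene.
nobatch_lfc <- c(g1 = 1.2, g2 = -0.5, g3 = 2.1, g4 = 0.1)
sva_lfc <- c(g3 = 2.0, g1 = 1.1, g4 = 0.2, g2 = -0.4, g5 = 0.9)
compare_logfc(nobatch_lfc, sva_lfc)
```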



3 Parasite Susceptibility to Drug (Current)

This susceptibility comparison uses the ‘current’ dataset.

sus_de_nobatch <- all_pairwise(lp_susceptibility, filter = TRUE, model_batch = FALSE)
## Comparing analyses.

sus_table_nobatch <- combine_de_tables(
    sus_de_nobatch, keepers = susceptibility_keepers,
    excel = glue::glue("excel/sus_tables_nobatch-v{ver}.xlsx"))
sus_de_sva <- all_pairwise(lp_susceptibility, filter = TRUE, model_batch = "svaseq")
## Comparing analyses.

sus_table_sva <- combine_de_tables(
    sus_de_sva, keepers = susceptibility_keepers,
    excel = glue::glue("excel/sus_tables_sva-v{ver}.xlsx"))
sus_sig_sva <- extract_significant_genes(
    sus_table_sva, according_to = "deseq",
    excel = glue::glue("excel/sus_sig_sva-v{ver}.xlsx"))
## To get a truer sense of sensitive vs resistant with sva, we need to remove the
## unknown samples and perhaps the ambiguous ones.
no_ambiguous <- subset_expt(lp_susceptibility, subset="condition!='ambiguous'") %>%
  subset_expt(subset="condition!='unknown'")
## subset_expt(): There were 101, now there are 88 samples.
## subset_expt(): There were 88, now there are 64 samples.
no_ambiguous_de_sva <- all_pairwise(no_ambiguous, filter = TRUE, model_batch = "svaseq")
## Comparing analyses.
## Let us see if my keeper code will fail hard or soft with extra contrasts...
no_ambiguous_table_sva <- combine_de_tables(
    no_ambiguous_de_sva, keepers = susceptibility_keepers,
    excel = glue::glue("excel/no_ambiguous_tables_sva-v{ver}.xlsx"))
no_ambiguous_sig_sva <- extract_significant_genes(
    no_ambiguous_table_sva, according_to = "deseq",
    excel = glue::glue("excel/no_ambiguous_sig_sva-v{ver}.xlsx"))
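Regarding the comment above about whether the keeper code fails hard or soft: once the ambiguous samples are removed, two of the three `susceptibility_keepers` contrasts refer to a level that no longer exists. A minimal base-R sketch of pruning such contrasts before calling `combine_de_tables()`, assuming the remaining condition levels are available as a character vector. The helper `drop_missing_keepers()` is hypothetical, not part of hpgltools.

```r
# Hypothetical helper: keep only the contrasts whose two levels both remain.
drop_missing_keepers <- function(keepers, levels_present) {
  usable <- vapply(keepers, function(pair) all(pair %in% levels_present),
                   logical(1))
  keepers[usable]
}

# Same structure as susceptibility_keepers defined at the top of this document.
example_keepers <- list(
  "resistant_sensitive" = c("resistant", "sensitive"),
  "resistant_ambiguous" = c("resistant", "ambiguous"),
  "sensitive_ambiguous" = c("sensitive", "ambiguous"))

remaining <- drop_missing_keepers(example_keepers, c("resistant", "sensitive"))
names(remaining)
```

Here the levels would presumably come from something like `as.character(unique(pData(no_ambiguous)[["condition"]]))`; I have not tested that against the real object.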
3.0.1 Plot susceptibility DE genes with sva batch estimation/adjustment







Given that resistance/sensitivity tends to be correlated with strain, one might expect results similar to the zymodeme comparison. One caveat in this context, though: fewer strains have resistance/sensitivity definitions. Thus, when the analysis was repeated without the ambiguous/unknown samples, a few more genes were observed as significant.

4 Parasite Susceptibility to Drug (Historical)

This susceptibility comparison uses the historical dataset.

sushist_de_nobatch <- all_pairwise(lp_susceptibility_historical, filter = TRUE, model_batch = FALSE)
## Comparing analyses.

sushist_table_nobatch <- combine_de_tables(
    sushist_de_nobatch, keepers = susceptibility_keepers,
    excel = glue::glue("excel/sushist_tables_nobatch-v{ver}.xlsx"))
sushist_de_sva <- all_pairwise(lp_susceptibility_historical, filter = TRUE, model_batch = "svaseq")
## Comparing analyses.

sushist_table_sva <- combine_de_tables(
    sushist_de_sva, keepers = susceptibility_keepers,
    excel = glue::glue("excel/sushist_tables_sva-v{ver}.xlsx"))
5 Cure/Fail association

##cf_nb_input <- subset_expt(cf_expt, subset="condition!='unknown'")
cf_de_nobatch <- all_pairwise(lp_cf_known, filter = TRUE, model_batch = FALSE)
## Comparing analyses.
cf_table_nobatch <- combine_de_tables(cf_de_nobatch, excel = glue::glue("excel/cf_tables_nobatch-v{ver}.xlsx"))
cf_de <- all_pairwise(lp_cf_known, filter = TRUE, model_batch = "svaseq")
## Comparing analyses.
cf_table <- combine_de_tables(cf_de, excel = glue::glue("excel/cf_tables-v{ver}.xlsx"))
5.1 Cure/Fail DE plots

It is not surprising that few or no genes are deemed significantly differentially expressed between samples taken from patients who were cured and those whose treatment failed.


dev <- pp(file = "images/cf_ma.png")
closed <- dev.off()

6 Combining the macrophage infected amastigotes with in-vitro promastigotes

One query we have not yet addressed: what are the similarities and differences among the strains used to infect the macrophage samples and the promastigote samples used in the TMRC2 parasite data?

tmrc2_macrophage_norm <- normalize_expt(lp_macrophage, transform="log2", convert="cpm",
                                        norm="quant", filter=TRUE)
## Removing 169 low-count genes (8541 remaining).
## transform_counts: Found 23 values equal to 0, adding 1 to the matrix.
all_tmrc2 <- combine_expts(lp_expt, lp_macrophage)
all_nosb <- all_tmrc2
pData(all_nosb)[["stage"]] <- "promastigote"
na_idx <- is.na(pData(all_nosb)[["macrophagetreatment"]])
pData(all_nosb)[na_idx, "macrophagetreatment"] <- "undefined"
all_nosb <- subset_expt(all_nosb, subset="macrophagetreatment!='inf_sb'")
## subset_expt(): There were 121, now there are 119 samples.
ama_idx <- pData(all_nosb)[["macrophagetreatment"]] == "inf"
pData(all_nosb)[ama_idx, "stage" ] <- "amastigote"
pData(all_nosb)[["batch"]] <- pData(all_nosb)[["stage"]]
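The NA replacement before the comparison matters: `==` on a column containing NA yields NA, and base R's data-frame assignment rejects logical subscripts that contain NA. A minimal base-R sketch of the same stage-labelling steps on a hypothetical metadata data frame (the treatment values are placeholders, not the real sample sheet):

```r
# Placeholder sample metadata; macrophagetreatment is NA for promastigote samples.
meta <- data.frame(
  macrophagetreatment = c(NA, "inf", "inf_sb", NA, "inf"),
  stringsAsFactors = FALSE)

# Default every sample to promastigote, as in the code above.
meta[["stage"]] <- "promastigote"
# Replace NA first so the comparisons below yield only TRUE/FALSE.
na_idx <- is.na(meta[["macrophagetreatment"]])
meta[na_idx, "macrophagetreatment"] <- "undefined"
# Drop the 'inf_sb' samples, then relabel the infected samples as amastigotes.
meta <- meta[meta[["macrophagetreatment"]] != "inf_sb", , drop = FALSE]
ama_idx <- meta[["macrophagetreatment"]] == "inf"
meta[ama_idx, "stage"] <- "amastigote"
table(meta[["stage"]])
```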

I think the above picture is roughly the opposite of what we want to compare in a DE analysis of this data; i.e. we want to compare promastigotes to amastigotes.

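## Swap the factors so that the DE analysis compares stages: the old
## conditions become the batches and the stage becomes the condition.
all_nosb <- set_expt_batches(all_nosb, fact="condition") %>%
  set_expt_conditions(fact="stage")
two_zymo <- subset_expt(all_nosb, subset="zymodemecategorical=='z22'|zymodemecategorical=='z23'|zymodemecategorical=='unknown'")
## subset_expt(): There were 119, now there are 86 samples.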
pro_ama <- all_pairwise(all_nosb, filter=TRUE, model_batch="svaseq")
## This DE analysis will perform all pairwise comparisons among:
##   amastigote promastigote 
##           18          101
## This analysis will include surrogate estimates from: svaseq.
## This will pre-filter the input data using normalize_expt's: TRUE argument.
## Removing 0 low-count genes (8585 remaining).
## Setting 696 low elements to zero.
## transform_counts: Found 696 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
pro_ama_table <- combine_de_tables(
    pro_ama,
    excel = glue::glue("excel/tmrc2_pro_vs_ama_table-v{ver}.xlsx"))
6.0.1 Plot promastigote/amastigote DE genes


I am a little surprised by this plot; I had expected few genes to pass the 2-fold difference demarcation line.

if (!isTRUE(get0("skip_load"))) {
  message(paste0("This is hpgltools commit: ", get_git_commit()))
  message(paste0("Saving to ", savefile))
  tmp <- sm(saveme(filename = savefile))
}
