1 Changelog

  • 202310: Cleaning up to make everything pass within a containerized environment.
  • 202310: Received a set of colors and contrasts of interest for a barplot of significance.
  • 20230410: Making some changes to improve the differential expression plots as well as prepare for some different pathway/GSEA/GSVA analyses on the data.

2 Notes

Showing plots of counts with respect to drug treatment: query to Najib to clarify the normalization. I think I can make versions of these plots which use SL/normalized counts to address Najib’s query.

mean ratio of SL/total by condition?
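A minimal base-R sketch of the quantity in the note above. It assumes per-sample SL read counts and total read counts are available as numeric vectors; neither is a column known to exist in the expt metadata, so the function and argument names here are hypothetical. It averages the per-sample ratios within each condition.

```r
## Hypothetical helper: mean per-sample SL/total ratio within each condition.
## sl_reads, total_reads, and condition are assumed inputs, not expt columns.
mean_sl_ratio_by_condition <- function(sl_reads, total_reads, condition) {
  stopifnot(length(sl_reads) == length(total_reads),
            length(sl_reads) == length(condition))
  ratio <- sl_reads / total_reads                  # one fraction per sample
  vapply(split(ratio, condition), mean, numeric(1)) # named by condition
}

## Toy values for illustration only.
toy <- data.frame(
  sl_reads = c(10, 30, 5, 15),
  total_reads = c(100, 100, 100, 50),
  condition = c("antimony", "antimony", "none", "none"))
mean_sl_ratio_by_condition(toy$sl_reads, toy$total_reads, toy$condition)
```

Averaging per-sample ratios weights every sample equally; pooling first (summed SL over summed total per condition) would instead weight deeper libraries more heavily. Which is preferable is a judgment call.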

3 Introduction

Having established that the TMRC2 macrophage data looks robust and illustrates a couple of interesting questions, let us perform several differential expression analyses on it.

Also note that as of 202212, we received a new set of samples, including some of a completely different cell type, U937. As their ATCC page states, they are malignant cells taken from the pleural effusion of a 37-year-old white male with histiocytic lymphoma, and they exhibit the morphology of monocytes. This document therefore also includes comparisons between the cell types, as well as among the macrophage donors (of which there are now more).

3.1 Human data

I am moving the dataset manipulations here so that I can look at them all together before running the various DE analyses.

3.2 Create sets focused on drug, celltype, strain, and combinations

Let us start by playing with the metadata a little, creating sets with the condition set to:

  • Drug treatment
  • Cell type (macrophage or U937)
  • Donor
  • Infection Strain
  • Some useful combinations thereof

In addition, keep mental track of which datasets comprise all samples, only macrophages, or only U937 cells; hence the prefixes all_human, hs_macr, and u937 on the data structures.

These recreations of the data should probably live in the datastructures worksheet.

all_human <- sanitize_expt_pData(hs_macrophage, columns = "drug") %>%
  set_expt_conditions(fact = "drug") %>%
  set_expt_batches(fact = "typeofcells")
## The numbers of samples by condition are:
## 
## antimony     none 
##       34       34
## The number of samples by batch are:
## 
## Macrophages        U937 
##          54          14
## The following 3 lines were copy/pasted to datastructures and should be removed soon.
no_strain_idx <- pData(all_human)[["strainid"]] == "none"
##pData(all_human)[["strainid"]] <- paste0("s", pData(all_human)[["strainid"]],
##                                         "_", pData(all_human)[["macrophagezymodeme"]])
pData(all_human)[no_strain_idx, "strainid"] <- "none"
table(pData(all_human)[["strainid"]])
## 
## 10763 10772 10977 11026 11075 11126 12251 12309 12355 12367  2169  7158  none 
##     2     8     2     2     2     8     7     8     2     7     8     2    10
all_human_types <- set_expt_conditions(all_human, fact = "typeofcells") %>%
  set_expt_batches(fact = "drug")
## The numbers of samples by condition are:
## 
## Macrophages        U937 
##          54          14
## The number of samples by batch are:
## 
## antimony     none 
##       34       34
type_zymo_fact <- paste0(pData(all_human_types)[["condition"]], "_",
                         pData(all_human_types)[["macrophagezymodeme"]])
type_zymo <- set_expt_conditions(all_human_types, fact = type_zymo_fact)
## The numbers of samples by condition are:
## 
## Macrophages_none  Macrophages_z22  Macrophages_z23        U937_none 
##                8               23               23                2 
##         U937_z22         U937_z23 
##                6                6
type_drug_fact <- paste0(pData(all_human_types)[["condition"]], "_",
                         pData(all_human_types)[["drug"]])
type_drug <- set_expt_conditions(all_human_types, fact = type_drug_fact)
## The numbers of samples by condition are:
## 
## Macrophages_antimony     Macrophages_none        U937_antimony 
##                   27                   27                    7 
##            U937_none 
##                    7
strain_fact <- pData(all_human_types)[["strainid"]]
table(strain_fact)
## strain_fact
## 10763 10772 10977 11026 11075 11126 12251 12309 12355 12367  2169  7158  none 
##     2     8     2     2     2     8     7     8     2     7     8     2    10
new_conditions <- paste0(pData(hs_macrophage)[["macrophagetreatment"]], "_",
                         pData(hs_macrophage)[["macrophagezymodeme"]])
## Note the sanitize() call is redundant with the addition of sanitize() in the
## datastructures file, but I don't want to wait to rerun that.
hs_macr <- set_expt_conditions(hs_macrophage, fact = new_conditions) %>%
  sanitize_expt_pData(columns = "drug") %>%
  subset_expt(subset = "typeofcells!='U937'")
## The numbers of samples by condition are:
## 
##      inf_z22      inf_z23    infsb_z22    infsb_z23   uninf_none uninfsb_none 
##           14           15           15           14            5            5
## The samples excluded are: TMRC30309, TMRC30293, TMRC30294, TMRC30291, TMRC30292, TMRC30307, TMRC30308, TMRC30310, TMRC30331, TMRC30311, TMRC30332, TMRC30305, TMRC30306, TMRC30330.
## subset_expt(): There were 68, now there are 54 samples.
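Before running any contrasts, it can help to cross-tabulate the two factors being pasted together so that small or empty cells stand out. The sketch below rebuilds a stand-in metadata table from the per-condition counts printed above (68 samples) rather than reading the real pData(); only the column names typeofcells and macrophagezymodeme mirror the expt.

```r
## Stand-in metadata rebuilt from the type_zymo counts printed above;
## this is not the real pData(), only its two column names are reused.
meta <- data.frame(
  typeofcells = rep(c("Macrophages", "U937"), times = c(54, 14)),
  macrophagezymodeme = c(rep(c("none", "z22", "z23"), times = c(8, 23, 23)),
                         rep(c("none", "z22", "z23"), times = c(2, 6, 6))))
## Combined condition, built the same way as type_zymo_fact above.
meta$type_zymo <- paste0(meta$typeofcells, "_", meta$macrophagezymodeme)
table(meta$type_zymo)
## Two-way view of the same design: rows are cell types, columns zymodemes.
design_table <- table(meta$typeofcells, meta$macrophagezymodeme)
design_table
```

On the real data, table(pData(all_human)[["typeofcells"]], pData(all_human)[["macrophagezymodeme"]]) should give the same layout.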

3.2.1 Separate Macrophage samples

Once again, we should reconsider where the following block is placed, but these data structures are likely to be used in many of the subsequent analyses.

hs_macr_drug_expt <- set_expt_conditions(hs_macr, fact = "drug")
## The numbers of samples by condition are:
## 
## antimony     none 
##       27       27
hs_macr_strain_expt <- set_expt_conditions(hs_macr, fact = "macrophagezymodeme") %>%
  subset_expt(subset = "macrophagezymodeme != 'none'")
## The numbers of samples by condition are:
## 
## none  z22  z23 
##    8   23   23
## The samples excluded are: TMRC30059, TMRC30060, TMRC30266, TMRC30268, TMRC30326, TMRC30327, TMRC30312, TMRC30313.
## subset_expt(): There were 54, now there are 46 samples.
table(pData(hs_macr)[["strainid"]])
## 
## 10763 10772 10977 11026 11075 11126 12251 12309 12355 12367  2169  7158  none 
##     2     6     2     2     2     6     5     6     2     5     6     2     8

3.2.2 Refactor U937 samples

The U937 samples were separated in the datastructures file, but we want to use them almost exclusively with the combined drug/zymodeme condition.

new_conditions <- paste0(pData(hs_u937)[["macrophagetreatment"]], "_",
                         pData(hs_u937)[["macrophagezymodeme"]])
u937_expt <- set_expt_conditions(hs_u937, fact = new_conditions)
## The numbers of samples by condition are:
## 
##      inf_z22      inf_z23    infsb_z22    infsb_z23   uninf_none uninfsb_none 
##            3            3            3            3            1            1
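Two of the U937 conditions (uninf_none and uninfsb_none) contain a single sample each, so any contrast between them has no within-group replicate. The hypothetical helper below reports the smallest group size behind each numerator/denominator pair. It assumes keeper names are the condition levels with underscores removed, which is inferred from how the keepers lists in this document are written, not from the all_pairwise() source.

```r
## Hypothetical helper: for each keeper pair, the smaller of the two group sizes.
## Assumes keeper terms are condition levels with underscores removed
## (e.g. "uninf_none" -> "uninfnone"); NA means a term was not found.
min_replicates <- function(keepers, condition_counts) {
  names(condition_counts) <- gsub("_", "", names(condition_counts))
  vapply(keepers, function(pair) {
    n <- condition_counts[pair]
    if (anyNA(n)) NA_integer_ else as.integer(min(n))
  }, integer(1))
}

## Samples per condition, copied from the U937 output above.
u937_counts <- c(inf_z22 = 3, inf_z23 = 3, infsb_z22 = 3, infsb_z23 = 3,
                 uninf_none = 1, uninfsb_none = 1)
min_replicates(list(z23nosb_vs_z22nosb = c("infz23", "infz22"),
                    sb_vs_uninf = c("uninfsbnone", "uninfnone")),
               u937_counts)
```

Here sb_vs_uninf reports 1: any variance estimate for such a contrast has to be borrowed from the other groups in the model, so its results deserve extra caution.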

3.3 Contrasts used in this document

Given the various ways we have chopped up this dataset, there are a few general types of contrasts we will perform, which we will then combine into more complex ones:

  • drug treatment
  • strains used
  • cell types
  • donors

In the end, our actual goal is to consider the variable effects of drug+strain and see if we can discern patterns which lead to better or worse drug treatment outcome.

The contrasts of primary interest in this data follow. I also created one ratio of ratios contrast which I think has the potential to address our biggest question.

tmrc2_human_extra <- "z23drugnodrug_vs_z22drugnodrug = (infsbz23 - infz23) - (infsbz22 - infz22), z23z22drug_vs_z23z22nodrug = (infsbz23 - infsbz22) - (infz23 - infz22)"
tmrc2_human_keepers <- list(
  "z23nosb_vs_uninf" = c("infz23", "uninfnone"),
  "z22nosb_vs_uninf" = c("infz22", "uninfnone"),
  "z23nosb_vs_z22nosb" = c("infz23", "infz22"),
  "z23sb_vs_z22sb" = c("infsbz23", "infsbz22"),
  "z23sb_vs_z23nosb" = c("infsbz23", "infz23"),
  "z22sb_vs_z22nosb" = c("infsbz22", "infz22"),
  "z23sb_vs_sb" = c("infsbz23", "uninfsbnone"),
  "z22sb_vs_sb" = c("infsbz22", "uninfsbnone"),
  "z23sb_vs_uninf" = c("infsbz23", "uninfnone"),
  "z22sb_vs_uninf" = c("infsbz22", "uninfnone"),
  "sb_vs_uninf" = c("uninfsbnone", "uninfnone"),
  "extra_z2322" = c("z23drugnodrug", "z22drugnodrug"),
  "extra_drugnodrug" = c("z23z22drug", "z23z22nodrug"))
single_tmrc2_keeper <- list(
  "z22sb_vs_sb" = c("infsbz22", "uninfsbnone"))
tmrc2_drug_keepers <- list(
  "drug" = c("antimony", "none"))
tmrc2_type_keepers <- list(
  "type" = c("U937", "Macrophages"))
tmrc2_strain_keepers <- list(
  "strain" = c("z23", "z22"))
type_zymo_extra <- "zymos_vs_types = (U937z23 - U937z22) - (Macrophagesz23 - Macrophagesz22)"
tmrc2_typezymo_keepers <- list(
  "u937_macr" = c("Macrophagesnone", "U937none"),
  "zymo_macr" = c("Macrophagesz23", "Macrophagesz22"),
  "zymo_u937" = c("U937z23", "U937z22"),
  "z23_types" = c("U937z23", "Macrophagesz23"),
  "z22_types" = c("U937z22", "Macrophagesz22"),
  "zymos_types" = c("zymos_vs_types"))
tmrc2_typedrug_keepers <- list(
  "type_nodrug" = c("U937none", "Macrophagesnone"),
  "type_drug" = c("U937antimony", "Macrophagesantimony"),
  "macr_drugs" = c("Macrophagesantimony", "Macrophagesnone"),
  "u937_drugs" = c("U937antimony", "U937none"))
u937_keepers <- list(
  "z23nosb_vs_uninf" = c("infz23", "uninfnone"),
  "z22nosb_vs_uninf" = c("infz22", "uninfnone"),
  "z23nosb_vs_z22nosb" = c("infz23", "infz22"),
  "z23sb_vs_z22sb" = c("infsbz23", "infsbz22"),
  "z23sb_vs_z23nosb" = c("infsbz23", "infz23"),
  "z22sb_vs_z22nosb" = c("infsbz22", "infz22"),
  "z23sb_vs_sb" = c("infsbz23", "uninfsbnone"),
  "z22sb_vs_sb" = c("infsbz22", "uninfsbnone"),
  "z23sb_vs_uninf" = c("infsbz23", "uninfnone"),
  "z22sb_vs_uninf" = c("infsbz22", "uninfnone"),
  "sb_vs_uninf" = c("uninfsbnone", "uninfnone"))
high_expression <- 128
high_expression_column <- "deseq_basemean"
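Because combine_de_tables() appears to look contrasts up by name, a typo in a keepers list tends to surface only as a missing table (compare the "Did not find" messages printed for type_zymo_table). The sketch below is a hypothetical pre-flight check: it flags keeper entries that match neither a condition level (underscores removed) nor an extra contrast. It assumes extra contrasts use the "name = expression, ..." layout of tmrc2_human_extra and that a two-term keeper c(a, b) refers to the extra contrast named a_vs_b.

```r
## Hypothetical helper: left-hand-side names of an extra-contrast string,
## assuming the "name = expression, name = expression" layout used above.
extra_names <- function(extra) {
  if (is.null(extra) || !nzchar(extra)) {
    return(character(0))
  }
  parts <- strsplit(extra, ",", fixed = TRUE)[[1]]
  trimws(sub("=.*$", "", parts))
}

## Hypothetical helper: names of keepers whose terms match neither a condition
## level (underscores removed) nor an extra contrast.
unknown_keepers <- function(keepers, condition_levels, extra = NULL) {
  known <- gsub("_", "", condition_levels)
  extras <- extra_names(extra)
  ok <- vapply(keepers, function(pair) {
    if (all(pair %in% known)) {
      return(TRUE)
    }
    if (length(pair) == 1) {
      return(pair %in% extras)
    }
    paste0(pair[1], "_vs_", pair[2]) %in% extras
  }, logical(1))
  names(keepers)[!ok]
}

## Toy check: "typo" misspells uninfnone and should be flagged.
toy_keepers <- list(
  z23nosb_vs_uninf = c("infz23", "uninfnone"),
  extra_z2322 = c("z23drugnodrug", "z22drugnodrug"),
  typo = c("infz23", "uninfnon"))
toy_extra <- "z23drugnodrug_vs_z22drugnodrug = (infsbz23 - infz23) - (infsbz22 - infz22)"
unknown_keepers(toy_keepers, c("inf_z23", "uninf_none"), toy_extra)
```

Applied to tmrc2_human_keepers with the hs_macr condition levels and tmrc2_human_extra, I would expect an empty result.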

## Write a compact per-contrast table (symbol, DESeq2 logFC, adjusted p, mean,
## numerator/denominator values) for each keeper in a combined result.
## Note: despite the name, write_xlsx() produces .xlsx files.
combined_to_tsv <- function(combined, celltype = "all") {
  keepers <- combined[["keepers"]]
  wanted <- c("hgncsymbol", "deseq_logfc", "deseq_adjp", "deseq_basemean", "deseq_num", "deseq_den")
  for (kname in names(keepers)) {
    numerator <- keepers[[kname]][1]
    denominator <- keepers[[kname]][2]
    filename <- glue("excel/macrophage_de/{ver}/tsv_tables/tmrc2_{celltype}_{kname}_n{numerator}_d{denominator}-v{ver}.xlsx")
    kdata <- combined[["data"]][[kname]]
    ## Skip tables lacking the DESeq2 columns, e.g. extra contrasts which were
    ## only evaluated by limma/edgeR (such as zymos_types).
    if (is.null(kdata) || !all(wanted %in% colnames(kdata))) {
      next
    }
    wanted_data <- kdata[, wanted]
    colnames(wanted_data) <- c("hgncsymbol", "deseq_logfc", "deseq_adjp", "deseq_mean", "deseq_numerator", "deseq_denominator")
    write_xlsx(data = wanted_data, excel = filename)
  }
}

## Write each set of g:Profiler results to its own workbook, named after the set.
write_all_gp <- function(all_gp) {
  for (name in names(all_gp)) {
    filename <- glue("excel/macrophage_de/{ver}/gprofiler/{name}_gprofiler-v{ver}.xlsx")
    ## sm() quiets the messages printed while writing.
    sm(write_gprofiler_data(all_gp[[name]], excel = filename))
  }
}
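combined_to_tsv() writes .xlsx files despite its name. If plain tab-separated files are wanted, a base-R variant could look like the sketch below. write_keeper_tsv() is a hypothetical helper; it assumes each combined table is a data.frame with gene IDs as row names and the same DESeq2 columns selected in combined_to_tsv().

```r
## Hypothetical base-R alternative: write the six DESeq2 columns of one table
## as a tab-separated file. Assumes gene IDs are the row names of `kdata`.
write_keeper_tsv <- function(kdata, filename) {
  wanted <- c("hgncsymbol", "deseq_logfc", "deseq_adjp", "deseq_basemean",
              "deseq_num", "deseq_den")
  absent <- setdiff(wanted, colnames(kdata))
  if (length(absent) > 0) {
    warning("Skipping ", filename, "; missing columns: ",
            paste(absent, collapse = ", "))
    return(invisible(NULL))
  }
  out <- kdata[, wanted]
  colnames(out) <- c("hgncsymbol", "deseq_logfc", "deseq_adjp", "deseq_mean",
                     "deseq_numerator", "deseq_denominator")
  dir.create(dirname(filename), recursive = TRUE, showWarnings = FALSE)
  ## col.names = NA leaves an empty header cell above the row-name column.
  write.table(out, file = filename, sep = "\t", quote = FALSE,
              row.names = TRUE, col.names = NA)
  invisible(out)
}
```

Reading a file back with read.delim(filename, row.names = 1) restores the gene IDs as row names.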

3.3.1 Primary queries

There is a series of initial questions which make some sense to me, but these do not necessarily match the set of questions which are most pressing. I am hoping to address both sets of queries at once.

Before extracting these groups of queries, let us invoke the all_pairwise() function to get all of the likely contrasts, along with one or more extras that might prove useful (the extra_contrasts argument).

3.3.2 Combined U937 and Macrophages: Compare drug effects

When the U937 cells are in the same dataset as the macrophages, we have an interesting opportunity to see whether we can observe drug-dependent effects which are shared across both cell types.

drug_de <- all_pairwise(all_human, filter = TRUE, model_batch = "svaseq", do_noiseq = FALSE)
## 
## antimony     none 
##       34       34
## Removing 0 low-count genes (12283 remaining).
## Setting 3092 low elements to zero.
## transform_counts: Found 3092 values equal to 0, adding 1 to the matrix.
drug_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 10 comparisons.
## The logFC agreement among the methods follows:
##                nn_vs_ntmn
## limma_vs_deseq     0.9665
## limma_vs_edger     0.9688
## limma_vs_ebseq     0.9728
## limma_vs_basic     0.9635
## deseq_vs_edger     0.9988
## deseq_vs_ebseq     0.9344
## deseq_vs_basic     0.9874
## edger_vs_ebseq     0.9383
## edger_vs_basic     0.9866
## ebseq_vs_basic     0.9143
drug_table <- combine_de_tables(
  drug_de, keepers = tmrc2_drug_keepers,
  excel = glue("excel/macrophage_de/{ver}/de_tables/macrophage_drug_comparison-v{ver}.xlsx"))
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: ebseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
drug_table
## A set of combined differential expression results.
##                       table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 none_vs_antimony-inverted         480           764         480           759
##   limma_sigup limma_sigdown
## 1         471           700
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?
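The "-inverted" suffix and the "Inverting column" messages appear to mean that all_pairwise() estimated none_vs_antimony while the keeper asked for antimony relative to none, so combine_de_tables() flipped the direction. For log fold changes, flipping a ratio is a sign change, since log2(a/b) = -log2(b/a); genes counted as up in one orientation are counted as down in the other. A toy illustration with made-up values:

```r
## Toy log2 fold changes for a none-vs-antimony contrast (made-up values).
none_vs_antimony <- c(GENE1 = 2.0, GENE2 = -1.5, GENE3 = 0.3)
## Reversing the contrast direction negates each log2 fold change.
antimony_vs_none <- -none_vs_antimony
## The identity behind it: log2(a / b) == -log2(b / a).
all.equal(log2(8 / 2), -log2(2 / 8))
## GENE2 was "down" in none_vs_antimony and is "up" after inversion.
names(antimony_vs_none)[antimony_vs_none >= 1]
```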

combined_to_tsv(drug_table, celltype = "all")

drug_sig <- extract_significant_genes(
  drug_table,
  excel = glue("excel/macrophage_de/{ver}/sig_tables/macrophage_drug_sig-v{ver}.xlsx"))
drug_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##      limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## drug      471        700      480        759      480        764      323
##      ebseq_down basic_up basic_down
## drug        577      444        590

drug_highsig <- extract_significant_genes(
  drug_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("excel/macrophage_de/{ver}/sig_tables/macrophage_drug_highsig-v{ver}.xlsx"))
drug_highsig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##      limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## drug      222        388      233        427      231        429      162
##      ebseq_down basic_up basic_down
## drug        346      211        331
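The significance summaries above report a logFC cutoff of 1 and an adjusted p cutoff of 0.05, and the highsig variant additionally requires deseq_basemean of at least high_expression (128). The sketch below counts up/down genes for one method under those rules. It is a hypothetical re-implementation for checking numbers by hand, and whether extract_significant_genes() uses >= or > at each boundary is an assumption. When the mean column is absent it warns and skips the expression filter, roughly analogous to the warning printed for type_zymo_highsig.

```r
## Hypothetical re-implementation for hand-checking one method's counts.
## Boundary handling (>= vs >) is an assumption.
count_significant <- function(df, lfc_col = "deseq_logfc", p_col = "deseq_adjp",
                              lfc = 1, p = 0.05, min_mean = NULL,
                              mean_col = "deseq_basemean") {
  keep <- !is.na(df[[p_col]]) & df[[p_col]] <= p
  if (!is.null(min_mean)) {
    if (mean_col %in% colnames(df)) {
      keep <- keep & df[[mean_col]] >= min_mean
    } else {
      warning("Column ", mean_col, " is not in the table; not filtering by expression.")
    }
  }
  c(up = sum(keep & df[[lfc_col]] >= lfc, na.rm = TRUE),
    down = sum(keep & df[[lfc_col]] <= -lfc, na.rm = TRUE))
}

## Made-up values for five genes.
toy <- data.frame(deseq_logfc = c(2.1, -1.4, 0.5, 3.0, -2.2),
                  deseq_adjp = c(0.001, 0.03, 0.0001, 0.2, 0.01),
                  deseq_basemean = c(500, 90, 1000, 300, 130))
count_significant(toy)                   # all expression levels
count_significant(toy, min_mean = 128)   # "highsig"-style filter
```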

all_drug_gp <- all_gprofiler(drug_sig)
all_drug_gp
## Running gProfiler on every set of significant genes found:
##            GO KEGG REAC WP  TF MIRNA HPA CORUM HP
## drug_up   159    5   76  8  27     1   0     0  0
## drug_down 470    0    1  0 313     1   0     0  0
write_all_gp(all_drug_gp)

3.3.3 Combined U937 and Macrophages: compare cell types

There are a couple of ways one might want to directly compare the two cell types:

  • Given that the variance between the two cell types is so large, simply compare all samples.
  • Compare them in combination with drug or zymodeme, to look at interaction effects.

type_de <- all_pairwise(all_human_types, filter = TRUE, model_batch = "svaseq", do_noiseq = FALSE)
## 
## Macrophages        U937 
##          54          14
## Removing 0 low-count genes (12283 remaining).
## Setting 8682 low elements to zero.
## transform_counts: Found 8682 values equal to 0, adding 1 to the matrix.
type_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 10 comparisons.
## The logFC agreement among the methods follows:
##                U937_vs_Mc
## limma_vs_deseq     0.9851
## limma_vs_edger     0.9859
## limma_vs_ebseq     0.9490
## limma_vs_basic     0.9992
## deseq_vs_edger     0.9976
## deseq_vs_ebseq     0.9806
## deseq_vs_basic     0.9833
## edger_vs_ebseq     0.9836
## edger_vs_basic     0.9843
## ebseq_vs_basic     0.9461
type_table <- combine_de_tables(
  type_de, keepers = tmrc2_type_keepers,
  excel = glue("excel/macrophage_de/{ver}/de_tables/macrophage_type_comparison-v{ver}.xlsx"))
type_table
## A set of combined differential expression results.
##                 table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 U937_vs_Macrophages        2105          2436        2077          2462
##   limma_sigup limma_sigdown
## 1        2247          2129
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?

combined_to_tsv(type_table, celltype = "all")

type_sig <- extract_significant_genes(
  type_table,
  excel = glue("excel/macrophage_de/{ver}/sig_tables/macrophage_type_sig-v{ver}.xlsx"))
type_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##      limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## type     2247       2129     2077       2462     2105       2436     1880
##      ebseq_down basic_up basic_down
## type       2485     2231       2097

type_highsig <- extract_significant_genes(
  type_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("excel/macrophage_de/{ver}/sig_tables/macrophage_type_highsig-v{ver}.xlsx"))
type_highsig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##      limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## type     1365       1632     1298       1764     1322       1736     1181
##      ebseq_down basic_up basic_down
## type       1789     1354       1617

3.3.3.1 Combined factors of interest: celltype+zymodeme

Given the above explicit comparison of all samples from the two cell types, let us now look at cell type+zymodeme status across all samples, macrophages and U937.

type_zymo_de <- all_pairwise(type_zymo, filter = TRUE, model_batch = "svaseq", do_noiseq = FALSE,
                             extra_contrasts = type_zymo_extra)
## 
## Macrophages_none  Macrophages_z22  Macrophages_z23        U937_none 
##                8               23               23                2 
##         U937_z22         U937_z23 
##                6                6
## Removing 0 low-count genes (12283 remaining).
## Setting 9655 low elements to zero.
## transform_counts: Found 9655 values equal to 0, adding 1 to the matrix.

type_zymo_de
## A pairwise differential expression with results from: basic, deseq, edger, limma.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 6 comparisons.
type_zymo_table <- combine_de_tables(
  type_zymo_de, keepers = tmrc2_typezymo_keepers,
  excel = glue("excel/macrophage_de/{ver}/de_tables/macrophage_type_zymo_comparison-v{ver}.xlsx"))
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Did not find NA or zymos_vs_types.
## Did not find NA or zymos_vs_types.
combined_to_tsv(type_zymo_table, celltype = "all")

type_zymo_sig <- extract_significant_genes(
  type_zymo_table,
  excel = glue("excel/macrophage_de/{ver}/sig_tables/macrophage_type_zymo_sig-v{ver}.xlsx"))
## There is no deseq_logfc column in the table.
## The columns are: ensemblgeneid, ensembltranscriptid, version, transcriptversion, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, hgncsymbol, uniprotgnsymbol, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no basic_logfc column in the table.
## The columns are: ensemblgeneid, ensembltranscriptid, version, transcriptversion, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, hgncsymbol, uniprotgnsymbol, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
type_zymo_sig
## A set of genes deemed significant according to limma, edger, deseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##             limma_up limma_down edger_up edger_down deseq_up deseq_down
## u937_macr       2019       2206     2363       2089     2351       2082
## zymo_macr        384        319      298        463      300        459
## zymo_u937          0          0        1          3        1          2
## z23_types       2295       2181     2123       2498     2152       2468
## z22_types       2271       2154     2004       2558     2024       2540
## zymos_types      187        223      337        220        0          0
##             basic_up basic_down
## u937_macr       1319       1697
## zymo_macr        362        288
## zymo_u937          0          0
## z23_types       2269       2098
## z22_types       2268       2132
## zymos_types        0          0

type_zymo_highsig <- extract_significant_genes(
  type_zymo_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("excel/macrophage_de/{ver}/sig_tables/macrophage_type_zymo_highsig-v{ver}.xlsx"))
## Warning in get_sig_genes(this_table, lfc = lfc, p = p, z = z, n = n, column =
## this_fc_column, : The column deseq_basemean does not appears to be in the
## table, cannot filter by expression.
## Warning in get_sig_genes(this_table, lfc = lfc, p = p, z = z, n = n, column =
## this_fc_column, : The column deseq_basemean does not appears to be in the
## table, cannot filter by expression.
## There is no deseq_logfc column in the table.
## The columns are: ensemblgeneid, ensembltranscriptid, version, transcriptversion, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, hgncsymbol, uniprotgnsymbol, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no basic_logfc column in the table.
## The columns are: ensemblgeneid, ensembltranscriptid, version, transcriptversion, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, hgncsymbol, uniprotgnsymbol, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var

3.3.3.2 Combined factors of interest: celltype+drug

The ‘type_drug’ data structure is built like the one above, but its condition is the concatenation of the cell type and drug treatment.

type_drug_de <- all_pairwise(type_drug, filter = TRUE, model_batch = "svaseq")
## 
## Macrophages_antimony     Macrophages_none        U937_antimony 
##                   27                   27                    7 
##            U937_none 
##                    7
## Removing 0 low-count genes (12283 remaining).
## Setting 9642 low elements to zero.
## transform_counts: Found 9642 values equal to 0, adding 1 to the matrix.
type_drug_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 15 comparisons.
type_drug_table <- combine_de_tables(
  type_drug_de, keepers = tmrc2_typedrug_keepers,
  excel = glue("excel/macrophage_de/{ver}/de_tables/macrophage_type_drug_comparison-v{ver}.xlsx"))
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: ebseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: noiseq_logfc.
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: ebseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: noiseq_logfc.
type_drug_table
## A set of combined differential expression results.

##                                             table deseq_sigup deseq_sigdown
## 1                     U937none_vs_Macrophagesnone        2099          2639
## 2             U937antimony_vs_Macrophagesantimony        2102          2375
## 3 Macrophagesnone_vs_Macrophagesantimony-inverted         599           963
## 4               U937none_vs_U937antimony-inverted         423           167
##   edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1        2063          2665        2288          2197
## 2        2083          2387        2254          2130
## 3         605           963         669           914
## 4         439           176         209           162

combined_to_tsv(type_drug_table, celltype = "all")
type_drug_sig <- extract_significant_genes(
  type_drug_table,
  excel = glue("excel/macrophage_de/{ver}/sig_tables/macrophage_type_drug_sig-v{ver}.xlsx"))
type_drug_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##             limma_up limma_down edger_up edger_down deseq_up deseq_down
## type_nodrug     2288       2197     2063       2665     2099       2639
## type_drug       2254       2130     2083       2387     2102       2375
## macr_drugs       669        914      605        963      599        963
## u937_drugs       209        162      439        176      423        167
##             ebseq_up ebseq_down basic_up basic_down
## type_nodrug     1956       2465     2315       2164
## type_drug       2008       2312     2254       2151
## macr_drugs       482        881      669        858
## u937_drugs       359        157      233        179

type_drug_highsig <- extract_significant_genes(
  type_drug_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("excel/macrophage_de/{ver}/sig_tables/macrophage_type_drug_highsig-v{ver}.xlsx"))
type_drug_highsig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##             limma_up limma_down edger_up edger_down deseq_up deseq_down
## type_nodrug     1391       1694     1313       1892     1343       1866
## type_drug       1385       1651     1302       1736     1320       1721
## macr_drugs       328        523      301        563      301        564
## u937_drugs       101         84      209        105      203        101
##             ebseq_up ebseq_down basic_up basic_down
## type_nodrug     1275       1789     1414       1675
## type_drug       1266       1719     1392       1681
## macr_drugs       243        517      330        497
## u937_drugs       168        100      118         99

4 Individual cell types

At this point, I think it is fair to say that the two cell types are sufficiently different that they do not really belong together in a single analysis.

4.1 drug or strain effects, single cell type

One of the queries Najib asked, which I think I misinterpreted, was to look at drug and/or strain effects. My interpretation is somewhere below and was not what he was looking for. Instead, he wanted to see all(macrophage) drug/nodrug and all(macrophage) z23/z22 and compare them to each other. This may still be the wrong interpretation; if so, the most likely comparison is one of:

  • (z23drug/z22drug) / (z23nodrug/z22nodrug), or perhaps
  • (z23drug/z23nodrug) / (z22drug/z22nodrug).

I am not sure which of these is intended (they confuse me), but at least one of them is performed below.
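One way to see why the two formulations are easy to confuse: on the log scale used by the DE tools, both ratios of ratios reduce to the same linear combination of group means, infsbz23 - infsbz22 - infz23 + infz22, which is the drug-by-strain interaction. They are therefore the same contrast, and the two entries in tmrc2_human_extra (z23drugnodrug_vs_z22drugnodrug and z23z22drug_vs_z23z22nodrug) should yield identical estimates. The contrast coefficients can be checked directly:

```r
## The four infected group means, as named in tmrc2_human_keepers.
lv <- c("infz22", "infz23", "infsbz22", "infsbz23")
## Unit coefficient vector selecting one group mean.
e <- function(name) setNames(as.numeric(lv == name), lv)
## (z23drug / z22drug) / (z23nodrug / z22nodrug), on the log scale.
strain_within_drug <- (e("infsbz23") - e("infsbz22")) - (e("infz23") - e("infz22"))
## (z23drug / z23nodrug) / (z22drug / z22nodrug), on the log scale.
drug_within_strain <- (e("infsbz23") - e("infz23")) - (e("infsbz22") - e("infz22"))
strain_within_drug
identical(strain_within_drug, drug_within_strain)
```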

4.1.1 Macrophages

In these blocks we will explicitly query only one factor at a time: drug or strain. The eventual goal is to see whether drug treatment and infection strain have effects which are shared.

4.1.1.1 Macrophage Drug only

Thus we will start with the pure drug query. In this block we will look only at the drug/nodrug effect.

hs_macr_drug_de <- all_pairwise(hs_macr_drug_expt, filter = TRUE, model_batch = "svaseq")
## 
## antimony     none 
##       27       27
## Removing 0 low-count genes (11756 remaining).
## Setting 1309 low elements to zero.
## transform_counts: Found 1309 values equal to 0, adding 1 to the matrix.
hs_macr_drug_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 15 comparisons.
## The logFC agreement among the methods follows:
##                 nn_vs_ntmn
## limma_vs_deseq      0.9911
## limma_vs_edger      0.9911
## limma_vs_ebseq      0.9694
## limma_vs_basic      0.9922
## limma_vs_noiseq    -0.8786
## deseq_vs_edger      0.9997
## deseq_vs_ebseq      0.9647
## deseq_vs_basic      0.9916
## deseq_vs_noiseq    -0.8861
## edger_vs_ebseq      0.9643
## edger_vs_basic      0.9911
## edger_vs_noiseq    -0.8872
## ebseq_vs_basic      0.9618
## ebseq_vs_noiseq    -0.8578
## basic_vs_noiseq    -0.8903
hs_macr_drug_table <- combine_de_tables(
  hs_macr_drug_de, keepers = tmrc2_drug_keepers,
  excel = glue("excel/macrophage_de/macrophage_onlydrug_table-v{ver}.xlsx"))
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: ebseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: noiseq_logfc.
hs_macr_drug_table
## A set of combined differential expression results.
##                       table deseq_sigup deseq_sigdown edger_sigup edger_sigdown
## 1 none_vs_antimony-inverted         519           862         525           852
##   limma_sigup limma_sigdown
## 1         556           808
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?

combined_to_tsv(hs_macr_drug_table, celltype = "macrophage")

hs_macr_drug_sig <- extract_significant_genes(
  hs_macr_drug_table,
  excel = glue("excel/macrophage_de/macrophageonly_drug_sig-v{ver}.xlsx"))
hs_macr_drug_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##      limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## drug      556        808      525        852      519        862      425
##      ebseq_down basic_up basic_down
## drug        821      573        766

hs_macr_drug_highsig <- extract_significant_genes(
  hs_macr_drug_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("excel/macrophage_de/macrophageonly_drug_highsig-v{ver}.xlsx"))
hs_macr_drug_highsig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##      limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## drug      283        492      273        539      268        548      225
##      ebseq_down basic_up basic_down
## drug        511      292        482
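The logFC agreement table for hs_macr_drug_de reports values between about -0.86 and -0.89 for every comparison involving noiseq, while all other pairs are above 0.96. If that table is a correlation of log fold changes across genes (the metric is not stated in the output, so this is an assumption), a strongly negative value is what one would expect if noiseq reports its fold changes in the opposite orientation. A toy illustration with made-up values:

```r
## Made-up log2 fold changes for five genes from two methods.
lfc <- cbind(deseq = c(2.0, -1.0, 0.5, 3.0, -2.5),
             limma = c(1.8, -0.9, 0.7, 2.6, -2.2))
## A third "method" with the same magnitudes but the opposite orientation.
lfc <- cbind(lfc, noiseq = -lfc[, "deseq"])
agreement <- cor(lfc)   # Pearson correlation between every pair of columns
round(agreement, 4)
## Flipping the sign back restores perfect agreement with deseq.
cor(lfc[, "deseq"], -lfc[, "noiseq"])
```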

4.1.1.2 Macrophage Strain only

In a similar fashion, let us look for effects which are observed when we consider only the strain used during infection.

hs_macr_strain_de <- all_pairwise(hs_macr_strain_expt, filter = TRUE, model_batch = "svaseq")
## 
## z22 z23 
##  23  23
## Removing 0 low-count genes (11720 remaining).
## Setting 1017 low elements to zero.
## transform_counts: Found 1017 values equal to 0, adding 1 to the matrix.
hs_macr_strain_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 15 comparisons.
## The logFC agreement among the methods follows:
##                 z23_vs_z22
## limma_vs_deseq      0.9637
## limma_vs_edger      0.9668
## limma_vs_ebseq      0.9614
## limma_vs_basic      0.9839
## limma_vs_noiseq    -0.8442
## deseq_vs_edger      0.9991
## deseq_vs_ebseq      0.9721
## deseq_vs_basic      0.9596
## deseq_vs_noiseq    -0.8301
## edger_vs_ebseq      0.9726
## edger_vs_basic      0.9623
## edger_vs_noiseq    -0.8338
## ebseq_vs_basic      0.9399
## ebseq_vs_noiseq    -0.8013
## basic_vs_noiseq    -0.8625
hs_macr_strain_table <- combine_de_tables(
  hs_macr_strain_de, keepers = tmrc2_strain_keepers,
  excel = glue("excel/macrophage_de/macrophage_onlystrain_table-v{ver}.xlsx"))
hs_macr_strain_table
## A set of combined differential expression results.
##        table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup
## 1 z23_vs_z22         291           371         290           366         337
##   limma_sigdown
## 1           275
## `geom_line()`: Each group consists of only one observation.
## i Do you need to adjust the group aesthetic?

combined_to_tsv(hs_macr_strain_table, celltype = "macrophage")

hs_macr_strain_sig <- extract_significant_genes(
  hs_macr_strain_table,
  excel = glue("excel/macrophage_de/macrophageonly_onlystrain_sig-v{ver}.xlsx"))
hs_macr_strain_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##        limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## strain      337        275      290        366      291        371      199
##        ebseq_down basic_up basic_down
## strain        216      317        253

hs_macr_strain_highsig <- extract_significant_genes(
  hs_macr_strain_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("excel/macrophage_de/macrophageonly_onlystrain_highsig-v{ver}.xlsx"))
hs_macr_strain_highsig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##        limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## strain      193        101      194        110      194        112      156
##        ebseq_down basic_up basic_down
## strain         51      187        101

4.1.1.3 Compare Drug and Strain Effects

Now let us consider the two comparisons above together. First, I will plot their logFC values against each other (drug on the x-axis and strain on the y-axis). Then we can extract the significant genes in a few combined categories of interest; I expect these to focus exclusively on the categories which include the introduction of the drug.

drug_strain_comp_df <- merge(hs_macr_drug_table[["data"]][["drug"]],
                             hs_macr_strain_table[["data"]][["strain"]],
                             by = "row.names")
drug_strain_comp_plot <- plot_linear_scatter(
  drug_strain_comp_df[, c("deseq_logfc.x", "deseq_logfc.y")])
## Contrasts: antimony/none, z23/z22; x-axis: drug, y-axis: strain
## top left: higher no drug, z23; top right: higher drug, z23
## bottom left: higher no drug, z22; bottom right: higher drug, z22
drug_strain_comp_plot$scatter
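The `deseq_logfc.x` and `deseq_logfc.y` column names come from `merge()`. Both tables have a `deseq_logfc` column, so the merged result suffixes them with `.x` (first table, drug) and `.y` (second table, strain). The toy sketch below shows that behaviour and then labels each gene by scatter-plot quadrant. The data are hypothetical. The quadrant names assume, following the comment in the block above, that a positive drug logFC means "higher with drug" and a positive strain logFC means "higher in z23".

```r
## Toy stand-ins for the two DE tables (hypothetical values).
drug_tbl <- data.frame(deseq_logfc = c(2.0, -1.5, 0.8),
                       row.names = c("geneA", "geneB", "geneC"))
strain_tbl <- data.frame(deseq_logfc = c(1.2, 1.1, -2.0),
                         row.names = c("geneA", "geneB", "geneC"))

## merge() on row names adds a "Row.names" column and suffixes the shared
## column name with .x (first table) and .y (second table).
comp <- merge(drug_tbl, strain_tbl, by = "row.names")
colnames(comp)

## Quadrant of each gene: x = drug logFC, y = strain logFC.
comp$quadrant <- ifelse(comp$deseq_logfc.x >= 0,
                        ifelse(comp$deseq_logfc.y >= 0, "top right", "bottom right"),
                        ifelse(comp$deseq_logfc.y >= 0, "top left", "bottom left"))
comp[, c("Row.names", "quadrant")]
```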

As noted in the comments above, some quadrants of the scatter plot (those on the right side) are likely to be of greater interest to us than others. Because I get confused sometimes, the following block explicitly names the categories of likely interest, asks how many genes are shared among them, and finally uses UpSetR to extract the various gene intersection/union categories.

higher_drug <- hs_macr_drug_sig[["deseq"]][["downs"]][[1]]
higher_nodrug <- hs_macr_drug_sig[["deseq"]][["ups"]][[1]]
higher_z23 <- hs_macr_strain_sig[["deseq"]][["ups"]][[1]]
higher_z22 <- hs_macr_strain_sig[["deseq"]][["downs"]][[1]]
sum(rownames(higher_drug) %in% rownames(higher_z23))
## [1] 94
sum(rownames(higher_drug) %in% rownames(higher_z22))
## [1] 87
sum(rownames(higher_nodrug) %in% rownames(higher_z23))
## [1] 26
sum(rownames(higher_nodrug) %in% rownames(higher_z22))
## [1] 73
drug_z23_lst <- list("drug" = rownames(higher_drug),
                     "z23" = rownames(higher_z23))
upset_input <- UpSetR::fromList(drug_z23_lst)
higher_drug_z23 <- upset(upset_input, text.scale = 2)
higher_drug_z23

drug_z23_shared_genes <- overlap_groups(drug_z23_lst)
shared_genes_drug_z23 <- overlap_geneids(drug_z23_shared_genes, "drug:z23")
shared_genes_drug_z23 <- attr(drug_z23_shared_genes, "elements")[drug_z23_shared_genes[["drug:z23"]]]

drug_z22_lst <- list("drug" = rownames(higher_drug),
                     "z22" = rownames(higher_z22))
higher_drug_z22 <- upset(UpSetR::fromList(drug_z22_lst), text.scale = 2)
higher_drug_z22

drug_z22_shared_genes <- overlap_groups(drug_z22_lst)
shared_genes_drug_z22 <- overlap_geneids(drug_z22_shared_genes, "drug:z22")
shared_genes_drug_z22 <- attr(drug_z22_shared_genes, "elements")[drug_z22_shared_genes[["drug:z22"]]]
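The overlap counts and UpSet categories above reduce to plain set operations on gene IDs. The sketch below uses hypothetical ID vectors in place of `rownames(higher_drug)` and `rownames(higher_z23)`. It reproduces the `%in%` counting idiom and computes the exclusive "drug only", "z23 only" and "drug:z23" categories an UpSet plot displays. It also builds a 0/1 membership table similar in spirit to what `UpSetR::fromList()` produces.

```r
## Hypothetical gene IDs standing in for rownames(higher_drug) / rownames(higher_z23).
drug_ids <- c("g1", "g2", "g3", "g4", "g5")
z23_ids <- c("g4", "g5", "g6")

## Same idiom as sum(rownames(a) %in% rownames(b)) above.
n_shared <- sum(drug_ids %in% z23_ids)

## UpSet-style exclusive categories.
shared <- intersect(drug_ids, z23_ids)
only_drug <- setdiff(drug_ids, z23_ids)
only_z23 <- setdiff(z23_ids, drug_ids)
counts <- lengths(list(drug = only_drug, z23 = only_z23, "drug:z23" = shared))
counts

## 0/1 membership table, one row per gene and one column per set.
all_ids <- union(drug_ids, z23_ids)
membership <- data.frame(drug = as.integer(all_ids %in% drug_ids),
                         z23 = as.integer(all_ids %in% z23_ids),
                         row.names = all_ids)
membership
```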

4.1.1.4 Perform gProfiler on drug/strain effect shared genes

Now that we have some populations of genes which are shared across the drug/strain effects, let us pass them to gProfiler for an over-representation analysis and see what pops out.

wanted <- drug_z23_shared_genes[["drug:z23"]]
shared_genes_drug_z23 <- attr(drug_z23_shared_genes, "elements")[wanted]
shared_drug_z23_gp <- simple_gprofiler(shared_genes_drug_z23)
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## Add a little logic here to use enrichplot::dotplot().
shared_drug_z23_gp[["pvalue_plots"]][["MF"]]

shared_drug_z23_gp[["pvalue_plots"]][["BP"]]

shared_drug_z23_gp[["pvalue_plots"]][["REAC"]]

wanted <- drug_z22_shared_genes[["drug:z22"]]
shared_genes_drug_z22 <- attr(drug_z22_shared_genes, "elements")[wanted]
shared_drug_z22_gp <- simple_gprofiler(shared_genes_drug_z22)
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## Add a little logic here to use enrichplot::dotplot().
shared_drug_z22_gp[["pvalue_plots"]][["BP"]]
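gProfiler's g:GOSt is, to my understanding, an over-representation test of the hypergeometric (one-sided Fisher) kind, followed by its own multiple-testing correction. That may help explain the repeated "No results to show" messages: a few dozen genes rarely overlap any single term strongly enough to survive correction. The sketch below runs the test for one hypothetical GO term in base R. `query_size` reuses the 94 genes counted earlier as shared between `higher_drug` and `higher_z23`; every other number is hypothetical, and no multiple-testing correction is applied.

```r
## One-term over-representation test (hypothetical numbers except query_size).
universe_size <- 10000  # genes that could have been selected (hypothetical)
term_size <- 200        # genes annotated to one GO term (hypothetical)
query_size <- 94        # size of the drug/z23 overlap counted earlier
overlap <- 5            # query genes annotated to the term (hypothetical)

## Expected overlap under random sampling.
expected <- query_size * term_size / universe_size

## P(X >= overlap) for X ~ Hypergeometric(term_size, universe_size - term_size, query_size).
p_hyper <- phyper(overlap - 1, term_size, universe_size - term_size, query_size,
                  lower.tail = FALSE)

## The same p-value from a one-sided Fisher's exact test on the 2x2 table.
contingency <- matrix(c(overlap, term_size - overlap,
                        query_size - overlap,
                        universe_size - term_size - query_size + overlap),
                      nrow = 2,
                      dimnames = list(query = c("yes", "no"), term = c("yes", "no")))
p_fisher <- fisher.test(contingency, alternative = "greater")$p.value

c(expected = expected, p_hyper = p_hyper, p_fisher = p_fisher)
```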

4.2 Our main question of interest

The data structure hs_macr contains our primary macrophage samples which, as shown above, are the data we can really sink our teeth into.

Note that we expect some errors when running combine_de_tables(), because not all of the methods I use can handle the ratio-of-ratios contrasts we added via the extra_contrasts argument. As a result, those contrasts fail when the results are combined into the larger output tables. This does not stop it from writing the rest of the results, however.

#test = deseq_pairwise(normalize_expt(hs_macr, filter=TRUE),
#                      model_batch = "svaseq", filter = TRUE,
#                      extra_contrasts = tmrc2_human_extra)

hs_macr_de <- all_pairwise(hs_macr, model_batch = "svaseq",
                           filter = TRUE, extra_contrasts = tmrc2_human_extra)
## 
##      inf_z22      inf_z23    infsb_z22    infsb_z23   uninf_none uninfsb_none 
##           11           12           12           11            4            4
## Removing 0 low-count genes (11756 remaining).
## Setting 2374 low elements to zero.
## transform_counts: Found 2374 values equal to 0, adding 1 to the matrix.
hs_macr_de
## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 10 comparisons.
hs_single_table <- combine_de_tables(
  hs_macr_de, keepers = single_tmrc2_keeper,
  excel = glue("excel/macrophage_de/hs_macr_drug_zymo_z22sb_sb-v{ver}.xlsx"))
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: noiseq_logfc.
hs_single_table
## A set of combined differential expression results.
##                              table deseq_sigup deseq_sigdown edger_sigup
## 1 uninfsbnone_vs_infsbz22-inverted          33             0          31
##   edger_sigdown limma_sigup limma_sigdown
## 1             0           2             0
## Error in colSums(temp_data): 'x' must be an array of at least two dimensions
hs_macr_table <- combine_de_tables(
  hs_macr_de, keepers = tmrc2_human_keepers,
  excel = glue("excel/macrophage_de/hs_macr_drug_zymo_table_macr_only-v{ver}.xlsx"))
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: noiseq_logfc.
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: noiseq_logfc.
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: noiseq_logfc.
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: noiseq_logfc.
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: noiseq_logfc.
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: noiseq_logfc.
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: noiseq_logfc.
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: noiseq_logfc.
## Warning in extract_keepers(extracted, keepers, table_names, all_coefficients, :
## The table for extra_z2322 does not appear in the pairwise data.
## Warning in extract_keepers(extracted, keepers, table_names, all_coefficients, :
## The table for extra_z2322 does not appear in the pairwise data.
## Did not find z22drugnodrug or z23drugnodrug.
## Did not find z22drugnodrug or z23drugnodrug.
## Warning in extract_keepers(extracted, keepers, table_names, all_coefficients, :
## The table for extra_drugnodrug does not appear in the pairwise data.
## Warning in extract_keepers(extracted, keepers, table_names, all_coefficients, :
## The table for extra_drugnodrug does not appear in the pairwise data.
## Did not find z23z22nodrug or z23z22drug.
## Did not find z23z22nodrug or z23z22drug.
hs_macr_table
## A set of combined differential expression results.

##                               table deseq_sigup deseq_sigdown edger_sigup
## 1      uninfnone_vs_infz23-inverted         478           265         472
## 2      uninfnone_vs_infz22-inverted         359             6         340
## 3                  infz23_vs_infz22         349           539         360
## 4              infsbz23_vs_infsbz22         343           252         340
## 5       infz23_vs_infsbz23-inverted         619           828         625
## 6       infz22_vs_infsbz22-inverted         505          1040         520
## 7  uninfsbnone_vs_infsbz23-inverted         461           247         464
## 8  uninfsbnone_vs_infsbz22-inverted          33             0          31
## 9    uninfnone_vs_infsbz23-inverted         839           923         854
## 10   uninfnone_vs_infsbz22-inverted         660           746         672
## 11         uninfsbnone_vs_uninfnone         561           748         564
## 12   z23drugnodrug_vs_z22drugnodrug           0             0         330
## 13       z23z22drug_vs_z23z22nodrug           0             0         330
##    edger_sigdown limma_sigup limma_sigdown
## 1            270         392           251
## 2              6         264            72
## 3            528         451           390
## 4            253         378           216
## 5            821         571           746
## 6           1009         671           925
## 7            249         374           233
## 8              0           2             0
## 9            906         805           914
## 10           733         556           745
## 11           741         514           696
## 12            63         244           135
## 13            63         244           135
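Rows 12 and 13 of the table above (the two extra contrasts) report identical counts for every method that produced them. That is consistent with the two ratios of ratios being the same quantity. In log2 space, both (z23 drug/no drug)/(z22 drug/no drug) and (z23/z22 with drug)/(z23/z22 without drug) reduce to the interaction term z23drug - z23nodrug - z22drug + z22nodrug. This assumes the contrast names mean what they appear to mean. A small worked example with hypothetical log2 expression values for one gene:

```r
## Hypothetical mean log2 expression of one gene in the four infected conditions.
e <- c(z23drug = 8.0, z23nodrug = 6.5, z22drug = 7.0, z22nodrug = 6.0)

## (z23 drug vs. no drug) relative to (z22 drug vs. no drug)
drug_effect_by_strain <- (e[["z23drug"]] - e[["z23nodrug"]]) -
  (e[["z22drug"]] - e[["z22nodrug"]])

## (z23 vs. z22 with drug) relative to (z23 vs. z22 without drug)
strain_effect_by_drug <- (e[["z23drug"]] - e[["z22drug"]]) -
  (e[["z23nodrug"]] - e[["z22nodrug"]])

## Both equal z23drug - z23nodrug - z22drug + z22nodrug = 0.5 here.
c(drug_effect_by_strain, strain_effect_by_drug)
```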

combined_to_tsv(hs_macr_table, "macrophage")

hs_macr_sig <- extract_significant_genes(
  hs_macr_table,
  excel = glue("excel/macrophage_de/hs_macr_drug_zymo_sig-v{ver}.xlsx"))
## There is no deseq_logfc column in the table.
## The columns are: ensemblgeneid, ensembltranscriptid, version, transcriptversion, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, hgncsymbol, uniprotgnsymbol, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no deseq_logfc column in the table.
## The columns are: ensemblgeneid, ensembltranscriptid, version, transcriptversion, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, hgncsymbol, uniprotgnsymbol, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no basic_logfc column in the table.
## The columns are: ensemblgeneid, ensembltranscriptid, version, transcriptversion, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, hgncsymbol, uniprotgnsymbol, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no basic_logfc column in the table.
## The columns are: ensemblgeneid, ensembltranscriptid, version, transcriptversion, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, hgncsymbol, uniprotgnsymbol, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
hs_macr_sig
## A set of genes deemed significant according to limma, edger, deseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                    limma_up limma_down edger_up edger_down deseq_up deseq_down
## z23nosb_vs_uninf        392        251      472        270      478        265
## z22nosb_vs_uninf        264         72      340          6      359          6
## z23nosb_vs_z22nosb      451        390      360        528      349        539
## z23sb_vs_z22sb          378        216      340        253      343        252
## z23sb_vs_z23nosb        571        746      625        821      619        828
## z22sb_vs_z22nosb        671        925      520       1009      505       1040
## z23sb_vs_sb             374        233      464        249      461        247
## z22sb_vs_sb               2          0       31          0       33          0
## z23sb_vs_uninf          805        914      854        906      839        923
## z22sb_vs_uninf          556        745      672        733      660        746
## sb_vs_uninf             514        696      564        741      561        748
## extra_z2322             244        135      330         63        0          0
## extra_drugnodrug        244        135      330         63        0          0
##                    basic_up basic_down
## z23nosb_vs_uninf        203        117
## z22nosb_vs_uninf         78         10
## z23nosb_vs_z22nosb      425        407
## z23sb_vs_z22sb          195        116
## z23sb_vs_z23nosb        538        700
## z22sb_vs_z22nosb        668        892
## z23sb_vs_sb             130         58
## z22sb_vs_sb               5          0
## z23sb_vs_uninf          478        628
## z22sb_vs_uninf          277        414
## sb_vs_uninf             136        132
## extra_z2322               0          0
## extra_drugnodrug          0          0

hs_macr_highsig <- extract_significant_genes(
  hs_macr_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("excel/macrophage_de/hs_macr_drug_zymo_highsig-v{ver}.xlsx"))
## Warning in get_sig_genes(this_table, lfc = lfc, p = p, z = z, n = n, column =
## this_fc_column, : The column deseq_basemean does not appears to be in the
## table, cannot filter by expression.
## Warning in get_sig_genes(this_table, lfc = lfc, p = p, z = z, n = n, column =
## this_fc_column, : The column deseq_basemean does not appears to be in the
## table, cannot filter by expression.
## Warning in get_sig_genes(this_table, lfc = lfc, p = p, z = z, n = n, column =
## this_fc_column, : The column deseq_basemean does not appears to be in the
## table, cannot filter by expression.
## Warning in get_sig_genes(this_table, lfc = lfc, p = p, z = z, n = n, column =
## this_fc_column, : The column deseq_basemean does not appears to be in the
## table, cannot filter by expression.
## There is no deseq_logfc column in the table.
## The columns are: ensemblgeneid, ensembltranscriptid, version, transcriptversion, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, hgncsymbol, uniprotgnsymbol, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no deseq_logfc column in the table.
## The columns are: ensemblgeneid, ensembltranscriptid, version, transcriptversion, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, hgncsymbol, uniprotgnsymbol, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no basic_logfc column in the table.
## The columns are: ensemblgeneid, ensembltranscriptid, version, transcriptversion, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, hgncsymbol, uniprotgnsymbol, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
## There is no basic_logfc column in the table.
## The columns are: ensemblgeneid, ensembltranscriptid, version, transcriptversion, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, hgncsymbol, uniprotgnsymbol, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_fdr, edger_adjp_fdr, lfc_meta, lfc_var, lfc_varbymed, p_meta, p_var
hs_macr_highsig
## A set of genes deemed significant according to limma, edger, deseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                    limma_up limma_down edger_up edger_down deseq_up deseq_down
## z23nosb_vs_uninf        269        139      317        139      314        138
## z22nosb_vs_uninf        103          4      110          0      115          0
## z23nosb_vs_z22nosb      221        154      247        174      238        178
## z23sb_vs_z22sb          211        105      210         86      211         84
## z23sb_vs_z23nosb        305        482      306        566      303        570
## z22sb_vs_z22nosb        330        545      301        572      288        598
## z23sb_vs_sb             250        130      279        140      274        140
## z22sb_vs_sb               2          0        9          0       13          0
## z23sb_vs_uninf          499        603      491        605      482        618
## z22sb_vs_uninf          310        479      318        501      303        513
## sb_vs_uninf             291        459      294        495      291        498
## extra_z2322             244        135      330         63        0          0
## extra_drugnodrug        244        135      330         63        0          0
##                    basic_up basic_down
## z23nosb_vs_uninf        163         79
## z22nosb_vs_uninf         33          0
## z23nosb_vs_z22nosb      225        188
## z23sb_vs_z22sb          127         52
## z23sb_vs_z23nosb        305        484
## z22sb_vs_z22nosb        334        551
## z23sb_vs_sb             108         28
## z22sb_vs_sb               0          0
## z23sb_vs_uninf          336        436
## z22sb_vs_uninf          175        281
## sb_vs_uninf             103        109
## extra_z2322               0          0
## extra_drugnodrug          0          0

4.3 Gene group UpSet plots

nodrug_upset <- upsetr_combined_de(hs_macr_table,
                                   desired_contrasts = c("z22nosb_vs_uninf", "z23nosb_vs_uninf"))
## Error in upsetr_combined_de(hs_macr_table, desired_contrasts = c("z22nosb_vs_uninf", : could not find function "upsetr_combined_de"
pp(file = "images/nodrug_upset.png")
nodrug_upset
## Error in eval(expr, envir, enclos): object 'nodrug_upset' not found
dev.off()
## png 
##   2
drug_upset <- upsetr_combined_de(hs_macr_table,
                                 desired_contrasts = c("z22sb_vs_sb", "z23sb_vs_sb"))
## Error in upsetr_combined_de(hs_macr_table, desired_contrasts = c("z22sb_vs_sb", : could not find function "upsetr_combined_de"
pp(file = "images/drug_upset.png")
drug_upset
## Error in eval(expr, envir, enclos): object 'drug_upset' not found
dev.off()
## png 
##   2

5 Significance barplot of interest

Olga kindly sent a set of particularly interesting contrasts and colors for a significance barplot. The contrasts are:

  • z2.3 vs. uninfected
  • z2.2 vs. uninfected
  • z2.3 vs. z2.2
  • z2.3Sbv vs. z2.3
  • z2.2Sbv vs. z2.2
  • z2.3Sbv vs. z2.2Sbv
  • Sbv vs. uninfected

The corresponding set of ‘keepers’, excised from the existing ‘tmrc2_human_keepers’, is as follows:

barplot_keepers <- list(
  ## z2.3 vs uninfected
  "z23nosb_vs_uninf" = c("infz23", "uninfnone"),
  ## z2.2 vs uninfected
  "z22nosb_vs_uninf" = c("infz22", "uninfnone"),
  ## z2.3 vs z2.2
  "z23nosb_vs_z22nosb" = c("infz23", "infz22"),
  ## z2.3Sbv vs z2.3
  "z23sb_vs_z23nosb" = c("infsbz23", "infz23"),
  ## z2.2Sbv vs z2.2
  "z22sb_vs_z22nosb" = c("infsbz22", "infz22"),
  ## z2.3Sbv vs z2.2Sbv
  "z23sb_vs_z22sb" = c("infsbz23", "infsbz22"),
  ## Sbv vs uninfected.
  "sb_vs_uninf" = c("uninfsbnone", "uninfnone"))
barplot_combined <- combine_de_tables(
  hs_macr_de, keepers = barplot_keepers,
  excel = glue("excel/macrophage_de/hs_macr_drug_zymo_7contrasts-v{ver}.xlsx"))
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: noiseq_logfc.
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: noiseq_logfc.
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: noiseq_logfc.
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: noiseq_logfc.
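Each set of five "Inverting column" messages above corresponds to one keeper whose requested order (numerator first) is the reverse of a table that all_pairwise() produced. Four of the seven barplot keepers fall into that case; in the combined tables these appear with an "-inverted" suffix, e.g. uninfnone_vs_infz23-inverted. The sketch below shows the lookup-and-flip idea. `resolve_keeper()` is a hypothetical helper, and simple name matching is an assumption rather than the combine_de_tables() implementation. Flipping a contrast negates the logFC; a two-sided adjusted p-value is unchanged.

```r
## Hypothetical helper: find the pairwise table for a keeper c(numerator, denominator).
resolve_keeper <- function(keeper, table_names) {
  forward <- paste0(keeper[1], "_vs_", keeper[2])
  reverse <- paste0(keeper[2], "_vs_", keeper[1])
  if (forward %in% table_names) {
    list(table = forward, invert = FALSE)
  } else if (reverse %in% table_names) {
    list(table = reverse, invert = TRUE)
  } else {
    list(table = NA_character_, invert = NA)
  }
}

## A few table names as they appear in this document's pairwise results.
pairwise_names <- c("uninfnone_vs_infz23", "infz23_vs_infz22", "infz23_vs_infsbz23")

## "z23nosb_vs_uninf" = c("infz23", "uninfnone") only exists in reverse.
hit <- resolve_keeper(c("infz23", "uninfnone"), pairwise_names)
logfc <- c(geneA = -2, geneB = 1.5)      # hypothetical, as reported for uninfnone_vs_infz23
if (isTRUE(hit$invert)) logfc <- -logfc  # now expressed as infz23 vs. uninfnone
logfc
```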

Now let us use the colors suggested by Olga to make a barplot of these…

## Three purple shades followed by three red shades.
color_list <- c("#de8bf9", "#ad07e3", "#410257", "#ffa0a0", "#f94040", "#a00000")
barplot_sig <- extract_significant_genes(
  barplot_combined, color_list = color_list, according_to = "deseq",
  excel = glue("excel/macrophage_de/hs_macr_drug_zymo_7contrasts_sig-v{ver}.xlsx"))
barplot_sig
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                    deseq_up deseq_down
## z23nosb_vs_uninf        478        265
## z22nosb_vs_uninf        359          6
## z23nosb_vs_z22nosb      349        539
## z23sb_vs_z23nosb        619        828
## z22sb_vs_z22nosb        505       1040
## z23sb_vs_z22sb          343        252
## sb_vs_uninf             561        748
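The barplot summarizes, per contrast, how many genes pass the stated cutoffs (LFC cutoff 1, adjusted p cutoff 0.05, according to deseq), split into up and down. Below is a minimal sketch of that counting step. The helper `count_sig()` and the toy contrasts are hypothetical, and the boundary handling (`>=`, `<=`) is an assumption rather than a description of extract_significant_genes().

```r
## Hypothetical helper: count up/down genes passing |logFC| and adjusted p cutoffs.
count_sig <- function(logfc, adjp, lfc = 1, p = 0.05) {
  passes <- !is.na(adjp) & adjp <= p
  c(up = sum(passes & logfc >= lfc, na.rm = TRUE),
    down = sum(passes & logfc <= -lfc, na.rm = TRUE))
}

## Two toy contrasts with hypothetical values.
toy_contrasts <- list(
  contrast_one = data.frame(logfc = c(2.1, -1.3, 0.2, 1.7),
                            adjp = c(0.01, 0.04, 0.001, 0.3)),
  contrast_two = data.frame(logfc = c(-2.2, -1.0, 3.0),
                            adjp = c(0.02, 0.05, 0.049)))

## One row per contrast, with columns up and down, like the deseq_up/deseq_down table above.
counts <- t(sapply(toy_contrasts, function(d) count_sig(d$logfc, d$adjp)))
counts
```

A matrix like `counts` can then be drawn with `graphics::barplot()`, e.g. `barplot(t(counts), beside = TRUE)`.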

6 PROPER

In our last meeting there were some questions about the statistical power of different future experimental designs. One option is to use PROPER to estimate the power of an existing dataset and, from that, infer the likely power of other designs.

To use PROPER (here via simple_proper()), one must provide one or more DE tables.
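PROPER estimates power by simulating whole-transcriptome experiments and controlling the FDR. For intuition only, the sketch below swaps in a much simpler technique: the power of a per-gene two-sample t-test at the same replicate numbers PROPER reports (3, 5, 7, 10) and the |log2FC| >= 1 threshold used throughout. The within-group standard deviation of 0.5 on the log2 scale is a hypothetical assumption. No multiple-testing correction is applied, so these numbers are not comparable to PROPER's output.

```r
## Swapped-in technique: single-gene two-sample t-test power, not PROPER.
replicates <- c(3, 5, 7, 10)  # sample sizes per group, as in the PROPER table
log2_fc <- 1                  # effect size on the log2 scale
assumed_sd <- 0.5             # hypothetical within-group SD on the log2 scale

power_by_n <- sapply(replicates, function(n) {
  power.t.test(n = n, delta = log2_fc, sd = assumed_sd, sig.level = 0.05)$power
})
names(power_by_n) <- paste0("n=", replicates)
round(power_by_n, 2)
```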

power_estimate <- simple_proper(hs_single_table)
## Working on contrast 1: uninfsbnone_vs_infsbz22-inverted.
## Loading required package: edgeR
## Loading required package: limma
## 
## Attaching package: 'limma'
## The following object is masked from 'package:BiocGenerics':
## 
##     plotMA
##     SS=3,3 SS=5,5 SS=7,7 SS=10,10
## 0.2   0.70   0.72   0.73     0.75
## 0.5   0.78   0.79   0.79     0.79
## 1     0.96   0.98   0.98     0.99
## 2     0.84   0.84   0.84     0.84
## 5     0.88   0.88   0.88     0.88
## 10    0.89   0.88   0.88     0.88
power_estimate
## Assuming similar expression patterns and variance to the
## provided experiment, comparing uninfsbnone_vs_infsbz22, and a FDR
## cutoff of 0.05, simulations by PROPER (DOI:10.1093/bioinformatics/btu640)
## suggest that it should be possible to identify 80% of DE genes with a |log2FC| >= 1
## when the sequencing depth is in the range of (80,160] using 5
## replicates in each group.
## 
##   Assuming the 11756 genes used have a mean length of 2000 and the sequencing run
## produces 200nt per read, ~834,123,060 reads will be required per sample to
## approach 160 reads per gene.
## Error in xy.coords(x, y, xlabel, ylabel, log): 'x' is a list, but does not have components 'x' and 'y'
power_estimate[[1]][["power_plot"]]

power_estimate[[1]][["powertd_plot"]]

power_estimate[[1]][["powerfd_plot"]]

6.0.1 Our main questions in U937

Let us perform the same comparisons on the U937 samples, omitting the extra contrasts, primarily because I think this smaller dataset is less likely to support them.
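The U937 design is small. The all_pairwise() output in this section reports three samples per infected condition and a single sample for each uninfected condition. The sketch below rebuilds those condition labels from the printed counts and flags groups that cannot contribute a within-group variance estimate. It also computes the residual degrees of freedom left by a one-coefficient-per-group model. Any surrogate variables estimated by svaseq would consume further degrees of freedom.

```r
## Condition labels rebuilt from the sample counts printed for U937.
groups <- c("inf_z22", "inf_z23", "infsb_z22", "infsb_z23", "uninf_none", "uninfsb_none")
conditions <- factor(rep(groups, times = c(3, 3, 3, 3, 1, 1)), levels = groups)

reps <- table(conditions)
## Groups with a single sample provide no within-group variance information.
single_sample_groups <- names(reps)[as.vector(reps) < 2]
## Residual degrees of freedom for a model with one coefficient per group.
residual_df <- length(conditions) - nlevels(conditions)

single_sample_groups
residual_df
```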

u937_de <- all_pairwise(u937_expt, model_batch = "svaseq",
                        filter = TRUE, do_noiseq = FALSE)
## 
##      inf_z22      inf_z23    infsb_z22    infsb_z23   uninf_none uninfsb_none 
##            3            3            3            3            1            1
## Removing 0 low-count genes (10751 remaining).
## Setting 5 low elements to zero.
## transform_counts: Found 5 values equal to 0, adding 1 to the matrix.
u937_de
## A pairwise differential expression with results from: basic, deseq, edger, limma.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 6 comparisons.
u937_table <- combine_de_tables(
  u937_de, keepers = u937_keepers,
  excel = glue("excel/macrophage_de/u937_drug_zymo_table-v{ver}.xlsx"))
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
## Inverting column: basic_logfc.
## Inverting column: deseq_logfc.
## Inverting column: edger_logfc.
## Inverting column: limma_logfc.
u937_table
## A set of combined differential expression results.

##                               table deseq_sigup deseq_sigdown edger_sigup
## 1      uninfnone_vs_infz23-inverted           0             5           2
## 2      uninfnone_vs_infz22-inverted           0             0           0
## 3                  infz23_vs_infz22           1             0          17
## 4              infsbz23_vs_infsbz22           0             0           0
## 5       infz23_vs_infsbz23-inverted         256           171         311
## 6       infz22_vs_infsbz22-inverted         298           154         305
## 7  uninfsbnone_vs_infsbz23-inverted           0             0           2
## 8  uninfsbnone_vs_infsbz22-inverted           0             0           2
## 9    uninfnone_vs_infsbz23-inverted         296           151         306
## 10   uninfnone_vs_infsbz22-inverted         294           169         300
## 11         uninfsbnone_vs_uninfnone         239           119         261
##    edger_sigdown limma_sigup limma_sigdown
## 1              5           0             3
## 2              5           0             2
## 3              6           3             4
## 4              1           0             2
## 5            176         226           196
## 6            149         220           190
## 7              0           0             0
## 8              5           1             4
## 9            155         235           183
## 10           175         230           211
## 11           127         196           155

combined_to_tsv(u937_table, celltype = "u937")

u937_sig <- extract_significant_genes(
  u937_table,
  excel = glue("excel/macrophage_de/u937_drug_zymo_sig-v{ver}.xlsx"))
u937_sig
## A set of genes deemed significant according to limma, edger, deseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                    limma_up limma_down edger_up edger_down deseq_up deseq_down
## z23nosb_vs_uninf          0          3        2          5        0          5
## z22nosb_vs_uninf          0          2        0          5        0          0
## z23nosb_vs_z22nosb        3          4       17          6        1          0
## z23sb_vs_z22sb            0          2        0          1        0          0
## z23sb_vs_z23nosb        226        196      311        176      256        171
## z22sb_vs_z22nosb        220        190      305        149      298        154
## z23sb_vs_sb               0          0        2          0        0          0
## z22sb_vs_sb               1          4        2          5        0          0
## z23sb_vs_uninf          235        183      306        155      296        151
## z22sb_vs_uninf          230        211      300        175      294        169
## sb_vs_uninf             196        155      261        127      239        119
##                    basic_up basic_down
## z23nosb_vs_uninf          0          0
## z22nosb_vs_uninf          0          0
## z23nosb_vs_z22nosb        0          0
## z23sb_vs_z22sb            0          0
## z23sb_vs_z23nosb        111         97
## z22sb_vs_z22nosb         86         68
## z23sb_vs_sb               0          0
## z22sb_vs_sb               0          0
## z23sb_vs_uninf            0          0
## z22sb_vs_uninf            0          0
## sb_vs_uninf               0          0

u937_highsig <- extract_significant_genes(
  u937_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("excel/macrophage_de/u937_drug_zymo_highsig-v{ver}.xlsx"))
u937_highsig
## A set of genes deemed significant according to limma, edger, deseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                    limma_up limma_down edger_up edger_down deseq_up deseq_down
## z23nosb_vs_uninf          0          3        0          4        0          4
## z22nosb_vs_uninf          0          1        0          4        0          0
## z23nosb_vs_z22nosb        2          3        6          4        1          0
## z23sb_vs_z22sb            0          0        0          0        0          0
## z23sb_vs_z23nosb        149        125      174        116      160        120
## z22sb_vs_z22nosb        130        111      152        104      149        107
## z23sb_vs_sb               0          0        0          0        0          0
## z22sb_vs_sb               0          1        0          1        0          0
## z23sb_vs_uninf          145         99      155         97      154         96
## z22sb_vs_uninf          143        119      155        115      155        116
## sb_vs_uninf             126         91      137         89      136         89
##                    basic_up basic_down
## z23nosb_vs_uninf          0          0
## z22nosb_vs_uninf          0          0
## z23nosb_vs_z22nosb        0          0
## z23sb_vs_z22sb            0          0
## z23sb_vs_z23nosb         91         83
## z22sb_vs_z22nosb         71         52
## z23sb_vs_sb               0          0
## z22sb_vs_sb               0          0
## z23sb_vs_uninf            0          0
## z22sb_vs_uninf            0          0
## sb_vs_uninf               0          0

6.0.1.1 Compare (no)Sb z2.3/z2.2 treatments among macrophages

upset_plots_hs_macr <- upsetr_sig(
  hs_macr_sig, both = TRUE,
  contrasts = c("z23sb_vs_z22sb", "z23nosb_vs_z22nosb"))
upset_plots_hs_macr[["both"]]

groups <- upset_plots_hs_macr[["both_groups"]]
shared_genes <- attr(groups, "elements")[groups[[2]]] %>%
  gsub(pattern = "^gene:", replacement = "")
length(shared_genes)
## [1] 387
shared_gp <- simple_gprofiler(shared_genes)
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## Add a little logic here to use enrichplot::dotplot().
shared_gp[["pvalue_plots"]][["MF"]]

shared_gp[["pvalue_plots"]][["BP"]]

shared_gp[["pvalue_plots"]][["REAC"]]

drug_genes <- attr(groups, "elements")[groups[["z23sb_vs_z22sb"]]] %>%
  gsub(pattern = "^gene:", replacement = "")
drugonly_gp <- simple_gprofiler(drug_genes)
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## Add a little logic here to use enrichplot::dotplot().
drugonly_gp[["pvalue_plots"]][["BP"]]

I want to try something: include the U937 data directly in this comparison, alongside the macrophage contrasts.

both_sig <- hs_macr_sig
names(both_sig[["deseq"]][["ups"]]) <- paste0("macr_", names(both_sig[["deseq"]][["ups"]]))
names(both_sig[["deseq"]][["downs"]]) <- paste0("macr_", names(both_sig[["deseq"]][["downs"]]))
u937_deseq <- u937_sig[["deseq"]]
names(u937_deseq[["ups"]]) <- paste0("u937_", names(u937_deseq[["ups"]]))
names(u937_deseq[["downs"]]) <- paste0("u937_", names(u937_deseq[["downs"]]))
both_sig[["deseq"]][["ups"]] <- c(both_sig[["deseq"]][["ups"]], u937_deseq[["ups"]])
both_sig[["deseq"]][["downs"]] <- c(both_sig[["deseq"]][["downs"]], u937_deseq[["downs"]])
summary(both_sig[["deseq"]][["ups"]])
##                         Length Class      Mode
## macr_z23nosb_vs_uninf   58     data.frame list
## macr_z22nosb_vs_uninf   58     data.frame list
## macr_z23nosb_vs_z22nosb 58     data.frame list
## macr_z23sb_vs_z22sb     58     data.frame list
## macr_z23sb_vs_z23nosb   58     data.frame list
## macr_z22sb_vs_z22nosb   58     data.frame list
## macr_z23sb_vs_sb        58     data.frame list
## macr_z22sb_vs_sb        58     data.frame list
## macr_z23sb_vs_uninf     58     data.frame list
## macr_z22sb_vs_uninf     58     data.frame list
## macr_sb_vs_uninf        58     data.frame list
## macr_extra_z2322         0     data.frame list
## macr_extra_drugnodrug    0     data.frame list
## u937_z23nosb_vs_uninf   50     data.frame list
## u937_z22nosb_vs_uninf   50     data.frame list
## u937_z23nosb_vs_z22nosb 50     data.frame list
## u937_z23sb_vs_z22sb     50     data.frame list
## u937_z23sb_vs_z23nosb   50     data.frame list
## u937_z22sb_vs_z22nosb   50     data.frame list
## u937_z23sb_vs_sb        50     data.frame list
## u937_z22sb_vs_sb        50     data.frame list
## u937_z23sb_vs_uninf     50     data.frame list
## u937_z22sb_vs_uninf     50     data.frame list
## u937_sb_vs_uninf        50     data.frame list
upset_plots_both <- upsetr_sig(
  both_sig, both = TRUE,
  contrasts = c("macr_z23sb_vs_z22sb", "macr_z23nosb_vs_z22nosb",
                "u937_z23sb_vs_z22sb", "u937_z23nosb_vs_z22nosb"))
upset_plots_both$both
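The prefix-and-concatenate steps above are repeated once per direction, so a copy-paste slip between the `ups` and `downs` lines is easy to make. Below is a minimal sketch of a helper that handles both directions in one loop. `combine_sig_lists()` is a hypothetical name, not an hpgltools function. It assumes only that each significance object is a list holding `[[method]][["ups"]]` and `[[method]][["downs"]]` lists of per-contrast tables, which is what the `summary()` output above suggests.

```r
## Hypothetical helper (not part of hpgltools): prefix the per-contrast names
## of two significance objects and concatenate them, for ups and downs alike.
combine_sig_lists <- function(first, second, method = "deseq",
                              prefixes = c("macr_", "u937_")) {
  combined <- first
  for (direction in c("ups", "downs")) {
    first_set <- first[[method]][[direction]]
    second_set <- second[[method]][[direction]]
    names(first_set) <- paste0(prefixes[1], names(first_set))
    names(second_set) <- paste0(prefixes[2], names(second_set))
    ## c() on two lists of data frames keeps each data frame as one element.
    combined[[method]][[direction]] <- c(first_set, second_set)
  }
  combined
}

## Usage, mirroring the manual steps above:
## both_sig <- combine_sig_lists(hs_macr_sig, u937_sig)
```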

6.0.1.2 Compare DE results from macrophages and U937 samples

Looking a bit more closely at these, I think the U937 data are too sparse to compare effectively.

macr_u937_comparison <- compare_de_results(hs_macr_table, u937_table)

macr_u937_comparison$lfc_heat

macr_u937_venns <- compare_significant_contrasts(hs_macr_sig, second_sig_tables = u937_sig,
                                                 contrasts = "z23sb_vs_z23nosb")

macr_u937_venns$up_plot

macr_u937_venns$down_plot

macr_u937_venns_v2 <- compare_significant_contrasts(hs_macr_sig, second_sig_tables = u937_sig,
                                                    contrasts = "z22sb_vs_z22nosb")

macr_u937_venns_v2$up_plot

macr_u937_venns_v2$down_plot

macr_u937_venns_v3 <- compare_significant_contrasts(hs_macr_sig, second_sig_tables = u937_sig,
                                                    contrasts = "sb_vs_uninf")

macr_u937_venns_v3$up_plot

macr_u937_venns_v3$down_plot

6.0.2 Compare macrophage/U937 results with respect to z2.3/z2.2

comparison_df <- merge(hs_macr_table[["data"]][["z23sb_vs_z22sb"]],
                       u937_table[["data"]][["z23sb_vs_z22sb"]],
                       by = "row.names")
macru937_z23z22_plot <- plot_linear_scatter(comparison_df[, c("deseq_logfc.x", "deseq_logfc.y")])
macru937_z23z22_plot$scatter

comparison_df <- merge(hs_macr_table[["data"]][["z23nosb_vs_z22nosb"]],
                       u937_table[["data"]][["z23nosb_vs_z22nosb"]],
                       by = "row.names")
macru937_z23z22_plot <- plot_linear_scatter(comparison_df[, c("deseq_logfc.x", "deseq_logfc.y")])
macru937_z23z22_plot$scatter
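The comparisons here and in the donor sections below all follow the same pattern: join two DE tables on their row names (gene IDs) and ask how well the `deseq_logfc` columns agree. The following is a base-R sketch of that pattern. `compare_logfc()` is a hypothetical helper and is not how `plot_linear_scatter()` is implemented. For an ordinary least-squares fit with an intercept, the R² computed here is exactly the squared Pearson correlation. The `lm_rsq` values printed by `plot_linear_scatter()` further down (for example 0.8266 next to a correlation of 0.8789) are not cor², so that function presumably fits a different model, and this sketch should not be expected to reproduce them.

```r
## Sketch: join two DE tables on gene IDs (row names) and summarise how well
## their log2 fold changes agree. Illustration only.
compare_logfc <- function(first, second, column = "deseq_logfc") {
  ## merge() on row names keeps only genes present in both tables and adds
  ## .x/.y suffixes to columns that exist in both.
  merged <- merge(first, second, by = "row.names")
  x <- merged[[paste0(column, ".x")]]
  y <- merged[[paste0(column, ".y")]]
  fit <- lm(y ~ x)
  list(n_shared = nrow(merged),
       correlation = cor.test(x, y),  # Pearson by default
       lm_rsq = summary(fit)[["r.squared"]])
}
```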

6.0.2.1 Add donor to the contrasts, no sva

no_power_fact <- paste0(pData(hs_macr)[["donor"]], "_",
                        pData(hs_macr)[["condition"]])
table(pData(hs_macr)[["donor"]])
## 
## d01 d02 d09 d81 
##  13  14  13  14
table(no_power_fact)
## no_power_fact
##      d01_inf_z22      d01_inf_z23    d01_infsb_z22    d01_infsb_z23 
##                2                3                3                3 
##   d01_uninf_none d01_uninfsb_none      d02_inf_z22      d02_inf_z23 
##                1                1                3                3 
##    d02_infsb_z22    d02_infsb_z23   d02_uninf_none d02_uninfsb_none 
##                3                3                1                1 
##      d09_inf_z22      d09_inf_z23    d09_infsb_z22    d09_infsb_z23 
##                3                3                3                2 
##   d09_uninf_none d09_uninfsb_none      d81_inf_z22      d81_inf_z23 
##                1                1                3                3 
##    d81_infsb_z22    d81_infsb_z23   d81_uninf_none d81_uninfsb_none 
##                3                3                1                1
hs_nopower <- set_expt_conditions(hs_macr, fact = no_power_fact)
## The numbers of samples by condition are:
## 
##      d01_inf_z22      d01_inf_z23    d01_infsb_z22    d01_infsb_z23 
##                2                3                3                3 
##   d01_uninf_none d01_uninfsb_none      d02_inf_z22      d02_inf_z23 
##                1                1                3                3 
##    d02_infsb_z22    d02_infsb_z23   d02_uninf_none d02_uninfsb_none 
##                3                3                1                1 
##      d09_inf_z22      d09_inf_z23    d09_infsb_z22    d09_infsb_z23 
##                3                3                3                2 
##   d09_uninf_none d09_uninfsb_none      d81_inf_z22      d81_inf_z23 
##                1                1                3                3 
##    d81_infsb_z22    d81_infsb_z23   d81_uninf_none d81_uninfsb_none 
##                3                3                1                1
hs_nopower <- subset_expt(hs_nopower, subset="macrophagezymodeme!='none'")
## The samples excluded are: TMRC30059, TMRC30060, TMRC30266, TMRC30268, TMRC30326, TMRC30327, TMRC30312, TMRC30313.
## subset_expt(): There were 54, now there are 46 samples.
hs_nopower_nosva_de <- all_pairwise(hs_nopower, model_batch = FALSE, filter = TRUE)
## 
##   d01_inf_z22   d01_inf_z23 d01_infsb_z22 d01_infsb_z23   d02_inf_z22 
##             2             3             3             3             3 
##   d02_inf_z23 d02_infsb_z22 d02_infsb_z23   d09_inf_z22   d09_inf_z23 
##             3             3             3             3             3 
## d09_infsb_z22 d09_infsb_z23   d81_inf_z22   d81_inf_z23 d81_infsb_z22 
##             3             2             3             3             3 
## d81_infsb_z23 
##             3

nopower_keepers <- list(
  "d01_zymo" = c("d01infz23", "d01infz22"),
  "d01_sbzymo" = c("d01infsbz23", "d01infsbz22"),
  "d02_zymo" = c("d02infz23", "d02infz22"),
  "d02_sbzymo" = c("d02infsbz23", "d02infsbz22"),
  "d09_zymo" = c("d09infz23", "d09infz22"),
  "d09_sbzymo" = c("d09infsbz23", "d09infsbz22"),
  "d81_zymo" = c("d81infz23", "d81infz22"),
  "d81_sbzymo" = c("d81infsbz23", "d81infsbz22"))
hs_nopower_nosva_table <- combine_de_tables(
  hs_nopower_nosva_de, keepers = nopower_keepers,
  excel = glue("excel/macrophage_de/hs_nopower_table-v{ver}.xlsx"))
hs_nopower_nosva_sig <- extract_significant_genes(
  hs_nopower_nosva_table,
  excel = glue("excel/macrophage_de/hs_nopower_nosva_sig-v{ver}.xlsx"))
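The `nopower_keepers` list is regular: two contrasts per donor, z2.3 against z2.2 with and without antimony. A sketch that builds the same list programmatically follows, which could reduce typos if more donors arrive. Reading each pair as c(numerator, denominator) is inferred from the contrast naming used elsewhere in this document, not from `combine_de_tables()` documentation. `generated_keepers` is a hypothetical name chosen so the original list is not overwritten.

```r
## Sketch: generate the per-donor keepers list instead of typing it out.
## Each element pairs z2.3 (first) with z2.2 (second), as in nopower_keepers.
donors <- c("d01", "d02", "d09", "d81")
generated_keepers <- list()
for (donor in donors) {
  generated_keepers[[paste0(donor, "_zymo")]] <-
    c(paste0(donor, "infz23"), paste0(donor, "infz22"))
  generated_keepers[[paste0(donor, "_sbzymo")]] <-
    c(paste0(donor, "infsbz23"), paste0(donor, "infsbz22"))
}
```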

d01d02_zymo_nosva_comp <- merge(hs_nopower_nosva_table[["data"]][["d01_zymo"]],
                                hs_nopower_nosva_table[["data"]][["d02_zymo"]],
                                by="row.names")
d0102_zymo_nosva_plot <- plot_linear_scatter(d01d02_zymo_nosva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
d0102_zymo_nosva_plot$scatter

d0102_zymo_nosva_plot$correlation
## 
##  Pearson's product-moment correlation
## 
## data:  df[[xcol]] and df[[ycol]]
## t = 199, df = 11718, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8747 0.8829
## sample estimates:
##    cor 
## 0.8789
d0102_zymo_nosva_plot$lm_rsq
## [1] 0.8266
d09d81_zymo_nosva_comp <- merge(hs_nopower_nosva_table[["data"]][["d09_zymo"]],
                                hs_nopower_nosva_table[["data"]][["d81_zymo"]],
                                by="row.names")
d0981_zymo_nosva_plot <- plot_linear_scatter(d09d81_zymo_nosva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
d0981_zymo_nosva_plot$scatter

d0981_zymo_nosva_plot$correlation
## 
##  Pearson's product-moment correlation
## 
## data:  df[[xcol]] and df[[ycol]]
## t = 103, df = 11718, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6801 0.6991
## sample estimates:
##    cor 
## 0.6897
d0981_zymo_nosva_plot$lm_rsq
## [1] 0.4553
d01d81_zymo_nosva_comp <- merge(hs_nopower_nosva_table[["data"]][["d01_zymo"]],
                                hs_nopower_nosva_table[["data"]][["d81_zymo"]],
                                by="row.names")
d0181_zymo_nosva_plot <- plot_linear_scatter(d01d81_zymo_nosva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
d0181_zymo_nosva_plot$scatter

d0181_zymo_nosva_plot$correlation
## 
##  Pearson's product-moment correlation
## 
## data:  df[[xcol]] and df[[ycol]]
## t = 84, df = 11718, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6018 0.6244
## sample estimates:
##    cor 
## 0.6133
d0181_zymo_nosva_plot$lm_rsq
## [1] 0.2654
upset_plots_nosva <- upsetr_sig(hs_nopower_nosva_sig, both=TRUE,
                                contrasts=c("d01_zymo", "d02_zymo", "d09_zymo", "d81_zymo"))
upset_plots_nosva$up

upset_plots_nosva$down

upset_plots_nosva$both

## The 7th element of the both_groups list is the set shared among all four donors;
## selecting it by position avoids writing out the full x:y:z:a intersection name.
groups <- upset_plots_nosva[["both_groups"]]
shared_genes <- attr(groups, "elements")[groups[[7]]] %>%
  gsub(pattern = "^gene:", replacement = "")
shared_gp <- simple_gprofiler(shared_genes)
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## Add a little logic here to use enrichplot::dotplot().
shared_gp$pvalue_plots$MF

shared_gp$pvalue_plots$BP

shared_gp$pvalue_plots$REAC

shared_gp$pvalue_plots$WP
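Selecting the all-donor set by position (`groups[[7]]`) depends on how `upsetr_sig()` orders its intersections. A position-independent cross-check is to intersect the per-donor gene lists directly. In an UpSet plot, the exclusive intersection of all sets is the same as their plain intersection, so the two should agree. How to pull per-contrast gene IDs out of `hs_nopower_nosva_sig` is not shown here, so the input below is an assumed named list of ID vectors, and `shared_by_all()` is a hypothetical helper.

```r
## Sketch: genes present in every donor contrast, computed directly.
## gene_sets: hypothetical named list of gene-ID vectors, one per contrast.
shared_by_all <- function(gene_sets) {
  cleaned <- lapply(gene_sets, function(ids) {
    gsub(pattern = "^gene:", replacement = "", x = ids)
  })
  ## Reduce() applies intersect() pairwise across all vectors.
  Reduce(intersect, cleaned)
}
```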

6.0.2.2 Add donor to the contrasts, sva

hs_nopower_sva_de <- all_pairwise(hs_nopower, model_batch = "svaseq", filter = TRUE)
## 
##   d01_inf_z22   d01_inf_z23 d01_infsb_z22 d01_infsb_z23   d02_inf_z22 
##             2             3             3             3             3 
##   d02_inf_z23 d02_infsb_z22 d02_infsb_z23   d09_inf_z22   d09_inf_z23 
##             3             3             3             3             3 
## d09_infsb_z22 d09_infsb_z23   d81_inf_z22   d81_inf_z23 d81_infsb_z22 
##             3             2             3             3             3 
## d81_infsb_z23 
##             3
## Removing 0 low-count genes (11720 remaining).
## Setting 2174 low elements to zero.
## transform_counts: Found 2174 values equal to 0, adding 1 to the matrix.

nopower_keepers <- list(
  "d01_zymo" = c("d01infz23", "d01infz22"),
  "d01_sbzymo" = c("d01infsbz23", "d01infsbz22"),
  "d02_zymo" = c("d02infz23", "d02infz22"),
  "d02_sbzymo" = c("d02infsbz23", "d02infsbz22"),
  "d09_zymo" = c("d09infz23", "d09infz22"),
  "d09_sbzymo" = c("d09infsbz23", "d09infsbz22"),
  "d81_zymo" = c("d81infz23", "d81infz22"),
  "d81_sbzymo" = c("d81infsbz23", "d81infsbz22"))
## Use a distinct filename so the no-sva table written above is not overwritten.
hs_nopower_sva_table <- combine_de_tables(
  hs_nopower_sva_de, keepers = nopower_keepers,
  excel = glue("excel/macrophage_de/hs_nopower_sva_table-v{ver}.xlsx"))
hs_nopower_sva_sig <- extract_significant_genes(
  hs_nopower_sva_table,
  excel = glue("excel/macrophage_de/hs_nopower_sva_sig-v{ver}.xlsx"))

d01d02_zymo_sva_comp <- merge(hs_nopower_sva_table[["data"]][["d01_zymo"]],
                              hs_nopower_sva_table[["data"]][["d02_zymo"]],
                              by="row.names")
d0102_zymo_sva_plot <- plot_linear_scatter(d01d02_zymo_sva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
d0102_zymo_sva_plot$scatter

d0102_zymo_sva_plot$correlation
## 
##  Pearson's product-moment correlation
## 
## data:  df[[xcol]] and df[[ycol]]
## t = 163, df = 11718, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8277 0.8387
## sample estimates:
##    cor 
## 0.8333
d0102_zymo_sva_plot$lm_rsq
## [1] 0.72
d09d81_zymo_sva_comp <- merge(hs_nopower_sva_table[["data"]][["d09_zymo"]],
                              hs_nopower_sva_table[["data"]][["d81_zymo"]],
                              by="row.names")
d0981_zymo_sva_plot <- plot_linear_scatter(d09d81_zymo_sva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
d0981_zymo_sva_plot$scatter

d0981_zymo_sva_plot$correlation
## 
##  Pearson's product-moment correlation
## 
## data:  df[[xcol]] and df[[ycol]]
## t = 103, df = 11718, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.680 0.699
## sample estimates:
##    cor 
## 0.6897
d0981_zymo_sva_plot$lm_rsq
## [1] 0.4552
d01d81_zymo_sva_comp <- merge(hs_nopower_sva_table[["data"]][["d01_zymo"]],
                              hs_nopower_sva_table[["data"]][["d81_zymo"]],
                              by="row.names")
d0181_zymo_sva_plot <- plot_linear_scatter(d01d81_zymo_sva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
d0181_zymo_sva_plot$scatter

d0181_zymo_sva_plot$correlation
## 
##  Pearson's product-moment correlation
## 
## data:  df[[xcol]] and df[[ycol]]
## t = 77, df = 11718, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5691 0.5931
## sample estimates:
##    cor 
## 0.5813
d0181_zymo_sva_plot$lm_rsq
## [1] 0.2183
upset_plots_sva <- upsetr_sig(hs_nopower_sva_sig, both=TRUE,
                              contrasts=c("d01_zymo", "d02_zymo", "d09_zymo", "d81_zymo"))
upset_plots_sva$up

upset_plots_sva$down

upset_plots_sva$both

## The 7th element of the both_groups list is the set shared among all four donors;
## selecting it by position avoids writing out the full x:y:z:a intersection name.
groups <- upset_plots_sva[["both_groups"]]
shared_genes <- attr(groups, "elements")[groups[[7]]] %>%
  gsub(pattern = "^gene:", replacement = "")
shared_gp <- simple_gprofiler(shared_genes)
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## Add a little logic here to use enrichplot::dotplot().
shared_gp$pvalue_plots$MF

shared_gp$pvalue_plots$BP

shared_gp$pvalue_plots$REAC

shared_gp$pvalue_plots$WP

6.0.3 Donor comparison

hs_donors <- set_expt_conditions(hs_macr, fact = "donor")
## The numbers of samples by condition are:
## 
## d01 d02 d09 d81 
##  13  14  13  14
donor_de <- all_pairwise(hs_donors, model_batch="svaseq", filter=TRUE)
## 
## d01 d02 d09 d81 
##  13  14  13  14
## Removing 0 low-count genes (11756 remaining).
## Setting 1225 low elements to zero.
## transform_counts: Found 1225 values equal to 0, adding 1 to the matrix.
donor_de
## A pairwise differential expression with results from: basic, deseq, ebseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 15 comparisons.
donor_table <- combine_de_tables(
  donor_de,
  excel=glue("excel/macrophage_de/donor_tables-v{ver}.xlsx"))
donor_table
## A set of combined differential expression results.

##        table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup
## 1 d02_vs_d01         299           386         310           375         356
## 2 d09_vs_d01         535           471         537           465         524
## 3 d81_vs_d01         677           762         680           757         664
## 4 d09_vs_d02         408           269         407           276         369
## 5 d81_vs_d02         574           652         568           655         529
## 6 d81_vs_d09         217           417         213           421         217
##   limma_sigdown
## 1           356
## 2           468
## 3           760
## 4           306
## 5           677
## 6           423

donor_sig <- extract_significant_genes(
  donor_table,
  excel = glue("excel/macrophage_de/donor_sig-v{ver}.xlsx"))
donor_sig
## A set of genes deemed significant according to limma, edger, deseq, ebseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##            limma_up limma_down edger_up edger_down deseq_up deseq_down ebseq_up
## d02_vs_d01      356        356      310        375      299        386      242
## d09_vs_d01      524        468      537        465      535        471      485
## d81_vs_d01      664        760      680        757      677        762      576
## d09_vs_d02      369        306      407        276      408        269      211
## d81_vs_d02      529        677      568        655      574        652      299
## d81_vs_d09      217        423      213        421      217        417       86
##            ebseq_down basic_up basic_down
## d02_vs_d01        136      270        279
## d09_vs_d01        190      475        409
## d81_vs_d01        385      603        625
## d09_vs_d02        115      268        202
## d81_vs_d02        378      393        449
## d81_vs_d09        200      133        226
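With four donors there are choose(4, 2) = 6 unordered pairs, which matches the six rows of the donor tables above. The base-R sketch below reproduces those contrast names and their order using the "later level vs. earlier level" convention visible in the tables. It is an illustration only: `all_pairwise()` may build its contrasts differently, and its own summary reports 15 comparisons, so it evidently does more than enumerate these six.

```r
## Sketch: enumerate pairwise donor contrasts, named later_vs_earlier.
donors <- c("d01", "d02", "d09", "d81")
pairs <- combn(donors, 2)  # one column per unordered pair, in level order
donor_contrasts <- paste0(pairs[2, ], "_vs_", pairs[1, ])
donor_contrasts
```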

6.0.3.1 Primary query contrasts

The final contrast in this list is interesting because it depends on the extra contrasts applied to the all_pairwise() call above. In my view, the primary comparisons to consider are either cross-drug or cross-strain, but not both. However, I think in at least a few instances Olga is interested in strain+drug vs. uninfected+no drug.

6.0.3.2 Write contrast results

Now let us write out the xlsx files containing the above contrasts. The file with the suffix _table-v{ver} contains all genes, while the file with the suffix _sig-v{ver} contains only those deemed significant by our default criteria: DESeq2 |logFC| >= 1.0 and adjusted p-value <= 0.05.
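The default significance rule (|logFC| >= 1, adjusted p <= 0.05, matching the "LFC cutoff: 1 adj P cutoff: 0.05" lines printed by the sig objects) can be sketched for one table as follows. This assumes only the `deseq_logfc` and `deseq_adjp` column names used throughout this document. `filter_significant()` is a hypothetical helper, not the code inside `extract_significant_genes()`, which also reports per-method counts for limma, edgeR, EBSeq, and basic.

```r
## Sketch of the default rule applied to one DE table. Rows with a missing
## fold change or adjusted p-value are never called significant.
filter_significant <- function(table, lfc_cutoff = 1.0, p_cutoff = 0.05) {
  lfc <- table[["deseq_logfc"]]
  adjp <- table[["deseq_adjp"]]
  keep <- !is.na(lfc) & !is.na(adjp) &
    abs(lfc) >= lfc_cutoff & adjp <= p_cutoff
  list(up = table[keep & lfc > 0, , drop = FALSE],
       down = table[keep & lfc < 0, , drop = FALSE])
}
```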

7 Over representation searches

I decided to make a change to the organization of this document that starts small but will, I think, quickly grow: I am moving the over-representation (gProfiler) searches up to immediately after the DE. I will then move the plots of the gProfiler results to immediately after the corresponding volcano plots so that they are easier to interpret.

all_gp <- all_gprofiler(hs_macr_sig)
## Write one xlsx file of gProfiler results per element of all_gp.
for (name in names(all_gp)) {
  filename <- glue("excel/macrophage_de/gprofiler/{name}_gprofiler-v{ver}.xlsx")
  written <- sm(write_gprofiler_data(all_gp[[name]], excel = filename))
}

8 Plot contrasts of interest

One suggestion I received recently was to fix the axis limits of these volcano plots rather than letting ggplot choose them. I am assuming this is only relevant for pairs of contrasts, but that might not be true.
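Fixing the x axis with `scale_x_continuous(limits = x_limits)`, as done in this section, has a side effect worth knowing about. ggplot2 sets values outside scale limits to NA and then drops those rows, which is the source of the "Removed 1 row containing missing values or values outside the scale range" warnings below. `coord_cartesian(xlim = ...)` instead zooms the view and keeps every row in the data. The minimal comparison below uses hypothetical toy data, not values from this experiment.

```r
library(ggplot2)

## Hypothetical toy volcano data; one point (logfc = -25) lies outside the
## fixed limits used in this document.
toy <- data.frame(logfc = c(-25, -3, -0.5, 0.2, 2, 6),
                  neglogp = c(40, 8, 0.5, 0.1, 5, 12))
x_limits <- c(-20, 10)

## 1. Scale limits: out-of-range values become NA and the rows are removed,
##    triggering the "Removed 1 row ..." warning when the plot is drawn.
p_scale <- ggplot(toy, aes(x = logfc, y = neglogp)) +
  geom_point() +
  scale_x_continuous(limits = x_limits)

## 2. Coordinate limits: the panel is zoomed, the data are untouched, and
##    points beyond the limits are clipped from view without a warning.
p_coord <- ggplot(toy, aes(x = logfc, y = neglogp)) +
  geom_point() +
  coord_cartesian(xlim = x_limits)

## How many points fall outside the shared limits.
n_outside <- sum(toy$logfc < x_limits[1] | toy$logfc > x_limits[2])
```

Either way the extreme gene is not visible inside the panel, so it may be worth checking a count like `n_outside` for each contrast before settling on shared limits.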

8.1 Individual zymodemes vs. uninfected

8.1.1 Infected with z2.3 no Antimonial vs. Uninfected

plot_colors <- get_expt_colors(hs_macr_table[["input"]][["input"]])
x_limits <- c(-20, 10)

## The original plot from my xlsx file
hs_macr_table$plots$z23nosb_vs_uninf$deseq_vol_plots
## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

z23nosb_vs_uninf_volcano <- plot_volcano_condition_de(
  input = hs_macr_table[["data"]][["z23nosb_vs_uninf"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["uninfnone"]], color_low = plot_colors[["infz23"]])
z23nosb_vs_uninf_volcano$plot +
  scale_x_continuous(limits = x_limits)
## Warning: Removed 1 row containing missing values or values outside the scale
## range (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale
## range (`geom_text_repel()`).
## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

plotly::ggplotly(z23nosb_vs_uninf_volcano$plot)
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
z23nosb_vs_uninf_volcano_nol <- plot_volcano_condition_de(
  input = hs_macr_table[["data"]][["z23nosb_vs_uninf"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = NULL, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["uninfnone"]], color_low = plot_colors[["infz23"]])
z23nosb_vs_uninf_volcano_nol$plot +
  scale_x_continuous(limits = x_limits)
## Warning: Removed 1 row containing missing values or values outside the scale
## range (`geom_point()`).

all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["REAC"]]

## Reactome, zymodeme2.3 without drug vs. uninfected without drug, up.
all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["KEGG"]]

## KEGG, zymodeme2.3 without drug vs. uninfected without drug, up.
all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["MF"]]

## MF, zymodeme2.3 without drug vs. uninfected without drug, up.
all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["TF"]]

## TF, zymodeme2.3 without drug vs. uninfected without drug, up.
all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["WP"]]

## WikiPathways, zymodeme2.3 without drug vs. uninfected without drug, up.
all_gp[["z23nosb_vs_uninf_up"]][["interactive_plots"]][["WP"]]
all_gp[["z23nosb_vs_uninf_down"]][["pvalue_plots"]][["REAC"]]
## NULL
## Reactome, zymodeme2.3 without drug vs. uninfected without drug, down.
all_gp[["z23nosb_vs_uninf_down"]][["pvalue_plots"]][["MF"]]

## MF, zymodeme2.3 without drug vs. uninfected without drug, down.
all_gp[["z23nosb_vs_uninf_down"]][["pvalue_plots"]][["TF"]]

## TF, zymodeme2.3 without drug vs. uninfected without drug, down.

8.1.2 Infected with z2.2 no Antimonial vs. Uninfected

## The original plot
hs_macr_table$plots$z22nosb_vs_uninf$deseq_vol_plots
## Warning: ggrepel: 10 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

z22nosb_vs_uninf_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z22nosb_vs_uninf"]], "z22nosb_vs_uninf",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["uninfnone"]], color_low = plot_colors[["infz22"]])
z22nosb_vs_uninf_volcano$plot +
  scale_x_continuous(limits = x_limits)
## Warning: ggrepel: 10 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

plotly::ggplotly(z22nosb_vs_uninf_volcano$plot)
## Warning in geom2trace.default(dots[[1L]][[3L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
## Warning in geom2trace.default(dots[[1L]][[3L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
## Warning in geom2trace.default(dots[[1L]][[3L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
z22nosb_vs_uninf_volcano_nol <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z22nosb_vs_uninf"]], "z22nosb_vs_uninf",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = NULL, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["uninfnone"]], color_low = plot_colors[["infz22"]])
z22nosb_vs_uninf_volcano_nol$plot +
  scale_x_continuous(limits = x_limits)

all_gp[["z22nosb_vs_uninf_up"]][["pvalue_plots"]][["REAC"]]

## Reactome, zymodeme2.2 without drug vs. uninfected without drug, up.
all_gp[["z22nosb_vs_uninf_up"]][["pvalue_plots"]][["MF"]]

## MF, zymodeme2.2 without drug vs. uninfected without drug, up.
all_gp[["z22nosb_vs_uninf_up"]][["pvalue_plots"]][["TF"]]

## TF, zymodeme2.2 without drug vs. uninfected without drug, up.
all_gp[["z22nosb_vs_uninf_up"]][["pvalue_plots"]][["WP"]]

## WikiPathways, zymodeme2.2 without drug vs. uninfected without drug, up.

all_gp[["z22nosb_vs_uninf_down"]][["pvalue_plots"]][["REAC"]]
## NULL
## Reactome, zymodeme2.2 without drug vs. uninfected without drug, down.
all_gp[["z22nosb_vs_uninf_down"]][["pvalue_plots"]][["MF"]]
## NULL
## MF, zymodeme2.2 without drug vs. uninfected without drug, down.
all_gp[["z22nosb_vs_uninf_down"]][["pvalue_plots"]][["TF"]]
## NULL
## TF, zymodeme2.2 without drug vs. uninfected without drug, down.

8.1.3 Infected with z2.3 treated vs. Uninfected treated

## The original plot
hs_macr_table$plots$z23sb_vs_sb$deseq_vol_plots
## Warning: ggrepel: 5 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

z23sb_vs_uninfsb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23sb_vs_sb"]], "z23sb_vs_sb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["infsbz23"]], color_low = plot_colors[["uninfsbnone"]])
z23sb_vs_uninfsb_volcano$plot +
  scale_x_continuous(limits = x_limits)
## Warning: Removed 1 row containing missing values or values outside the scale
## range (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale
## range (`geom_text_repel()`).
## Warning: ggrepel: 5 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

plotly::ggplotly(z23sb_vs_uninfsb_volcano$plot)
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
z23sb_vs_uninfsb_volcano_nol <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23sb_vs_sb"]], "z23sb_vs_sb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = NULL, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["infsbz23"]], color_low = plot_colors[["uninfsbnone"]])
z23sb_vs_uninfsb_volcano_nol$plot +
  scale_x_continuous(limits = x_limits)
## Warning: Removed 1 row containing missing values or values outside the scale
## range (`geom_point()`).

8.1.4 Infected with z2.3 untreated vs. z2.2 untreated

## The original plot
hs_macr_table$plots$z23nosb_vs_z22nosb$deseq_vol_plots

z23nosb_vs_z22nosb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23nosb_vs_z22nosb"]], "z23nosb_vs_z22nosb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["infz23"]], color_low = plot_colors[["infz22"]])
z23nosb_vs_z22nosb_volcano$plot +
  scale_x_continuous(limits = x_limits)

8.1.5 Infected with z2.3 treated vs. z2.2 treated

## The original plot
hs_macr_table$plots$z23sb_vs_z22sb$deseq_vol_plots
## Warning: ggrepel: 3 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

z23sb_vs_z22sb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23sb_vs_z22sb"]], "z23sb_vs_z22sb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = FALSE,
  color_high = plot_colors[["infsbz23"]], color_low = plot_colors[["infsbz22"]])
z23sb_vs_z22sb_volcano$plot +
  scale_x_continuous(limits = x_limits)
## Warning: Removed 1 row containing missing values or values outside the scale
## range (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale
## range (`geom_text_repel()`).
## Warning: ggrepel: 3 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

8.1.6 Infected with z2.3 SB treated vs. z2.3 untreated

## The original plot
hs_macr_table$plots$z23sb_vs_z23nosb$deseq_vol_plots

z23sb_vs_z23nosb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23sb_vs_z23nosb"]], "z23sb_vs_z23nosb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_low = plot_colors[["infsbz23"]], color_high = plot_colors[["infz23"]])
z23sb_vs_z23nosb_volcano$plot +
  scale_x_continuous(limits = x_limits)
## Warning: Removed 1 row containing missing values or values outside the scale
## range (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale
## range (`geom_text_repel()`).

8.1.7 Infected with z2.2 SB treated vs. z2.2 untreated

## The original plot
hs_macr_table$plots$z22sb_vs_z22nosb$deseq_vol_plots

z22sb_vs_z22nosb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z22sb_vs_z22nosb"]], "z22sb_vs_z22nosb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_low = plot_colors[["infsbz22"]], color_high = plot_colors[["infz22"]])
z22sb_vs_z22nosb_volcano$plot +
  scale_x_continuous(limits = x_limits)

8.1.8 Infected with z2.3 SB treated vs. uninfected treated

## The original plot
hs_macr_table$plots$z23sb_vs_sb$deseq_vol_plots
## Warning: ggrepel: 5 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

z23sb_vs_sb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23sb_vs_sb"]], "z23sb_vs_sb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_low = plot_colors[["infsbz23"]], color_high = plot_colors[["uninfsbnone"]])
z23sb_vs_sb_volcano$plot +
  scale_x_continuous(limits = x_limits)
## Warning: Removed 1 row containing missing values or values outside the scale
## range (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale
## range (`geom_text_repel()`).
## Warning: ggrepel: 5 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

8.1.9 Infected with z2.2 SB treated vs. uninfected treated

## The original plot
hs_macr_table$plots$z22sb_vs_sb$deseq_vol_plots
## Warning: ggrepel: 2 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

z22sb_vs_sb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z22sb_vs_sb"]], "z22sb_vs_sb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_low = plot_colors[["infsbz22"]], color_high = plot_colors[["uninfsbnone"]])
z22sb_vs_sb_volcano$plot +
  scale_x_continuous(limits = x_limits)
## Warning: ggrepel: 13 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

Check that my perception of the number of significant up/down genes matches what the table/venn says.

shared <- Vennerable::Venn(list("drug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23sb_vs_uninf"]]),
                                "nodrug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23nosb_vs_uninf"]])))
pp(file="images/z23_vs_uninf_venn_up.png")
Vennerable::plot(shared)
dev.off()
## png 
##   2
Vennerable::plot(shared)

## I see 839 z23sb/uninf and 670 z23nosb/uninf genes in the venn diagram.
length(shared@IntersectionSets[["10"]]) + length(shared@IntersectionSets[["11"]])
## [1] 839
dim(hs_macr_sig[["deseq"]][["ups"]][["z23sb_vs_uninf"]])
## [1] 839  58
shared <- Vennerable::Venn(list("drug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z22sb_vs_uninf"]]),
                                "nodrug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z22nosb_vs_uninf"]])))
pp(file="images/z22_vs_uninf_venn_up.png")
Vennerable::plot(shared)
dev.off()
## png 
##   2
Vennerable::plot(shared)

length(shared@IntersectionSets[["10"]]) + length(shared@IntersectionSets[["11"]])
## [1] 660
dim(hs_macr_sig[["deseq"]][["ups"]][["z22sb_vs_uninf"]])
## [1] 660  58
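The check above relies on the region names in `shared@IntersectionSets`. For two sets, as I understand Vennerable's convention, "10" holds genes only in the first set ("drug"), "01" genes only in the second, and "11" genes in both, so "10" plus "11" should equal the size of the first set. A minimal base-R sketch of the same bookkeeping, using made-up gene IDs and a hypothetical helper `venn_regions()` (not part of Vennerable):

```r
## Hypothetical helper: split two gene ID vectors into the three non-empty
## regions of a two-set Venn diagram, named like Vennerable's IntersectionSets.
venn_regions <- function(first, second) {
  first <- unique(first)
  second <- unique(second)
  list("10" = setdiff(first, second),    # only in the first set
       "01" = setdiff(second, first),    # only in the second set
       "11" = intersect(first, second))  # shared by both sets
}

## Toy gene IDs standing in for rownames() of the significant-gene tables.
drug <- c("g1", "g2", "g3", "g4")
nodrug <- c("g3", "g4", "g5")
regions <- venn_regions(drug, nodrug)

## The first-only region plus the shared region recovers the first set.
first_total <- length(regions[["10"]]) + length(regions[["11"]])
first_total == length(drug)  # TRUE
```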

Note to self: there is an error in my volcano plot code which is triggered when the numerator and denominator of the all_pairwise() contrasts differ from those in combine_de_tables(). It puts the ups/downs on the correct sides of the plot, but labels the down genes ‘up’ and vice versa. This happens because I did check for this case, but used the wrong argument to handle it.
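One general way to avoid this class of bug, sketched here in base R and not taken from the hpgltools source: apply any inversion to the fold-change vector once, up front, and derive both the plotted position and the up/down label from that same vector, so the two cannot disagree. The helper `label_direction()` is hypothetical.

```r
## Hypothetical helper: flip the fold changes once (when the contrast was
## defined in the opposite order) and label direction from the flipped values,
## so the plot position and the label share a single source of truth.
label_direction <- function(logfc, invert = FALSE) {
  if (isTRUE(invert)) {
    logfc <- -logfc
  }
  direction <- ifelse(logfc > 0, "up", ifelse(logfc < 0, "down", "none"))
  data.frame(logfc = logfc, direction = direction, stringsAsFactors = FALSE)
}

res <- label_direction(c(2, -1, 0), invert = TRUE)
res
```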

A likely bit of text for these volcano plots:

The set of genes differentially expressed between the zymodeme 2.3 and uninfected samples without drug treatment was quantified with DESeq2 and included surrogate variable estimates from SVA. Given the significance criteria of |logFC| >= 1.0 and a false discovery rate adjusted p-value <= 0.05, 670 genes were observed as significantly increased between the infected and uninfected samples and 386 as decreased. The genes most increased relative to the uninfected samples include some which are potentially indicative of a strong innate immune response and of the inflammatory response.
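The significance criteria in the paragraph above (|logFC| >= 1.0 and adjusted p-value <= 0.05) can be sketched as a small base-R counting function. This only illustrates the stated thresholds and is not necessarily how `extract_significant_genes()` applies them; the column names follow the `deseq_logfc`/`deseq_adjp` columns used in the plotting calls, while `count_significant()` and the toy table are hypothetical.

```r
## Hypothetical helper: count genes passing |logFC| >= lfc_cutoff and
## adjusted p <= p_cutoff, split by the direction of change.
count_significant <- function(tab, fc_col = "deseq_logfc", p_col = "deseq_adjp",
                              lfc_cutoff = 1.0, p_cutoff = 0.05) {
  fc <- tab[[fc_col]]
  adjp <- tab[[p_col]]
  passes_p <- !is.na(adjp) & adjp <= p_cutoff  # NA p-values never pass
  up <- sum(passes_p & !is.na(fc) & fc >= lfc_cutoff)
  down <- sum(passes_p & !is.na(fc) & fc <= -lfc_cutoff)
  c(up = up, down = down)
}

## Toy table: two genes up, one down, three failing one criterion or another.
toy <- data.frame(
  deseq_logfc = c(2.5, 1.0, -1.4, 0.5, 3.0, -2.0),
  deseq_adjp  = c(0.001, 0.04, 0.01, 0.001, 0.2, NA))
count_significant(toy)
```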

In contrast, when the genes differentially expressed between the zymodeme 2.2 and uninfected samples were visualized, only 7 genes were observed as decreased and 435 as increased. The inflammatory response was much less apparent in this set, which instead included genes related to transporter activity and oxidoreductases.

8.2 Direct zymodeme comparisons

An orthogonal comparison to that performed above is to directly compare the zymodeme 2.3 and 2.2 samples with and without antimonial treatment.

z23nosb_vs_z22nosb_volcano <- plot_volcano_de(
  table = hs_macr_table[["data"]][["z23nosb_vs_z22nosb"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc",  label = 10, label_column = "hgncsymbol")
plotly::ggplotly(z23nosb_vs_z22nosb_volcano$plot)
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
z23sb_vs_z22sb_volcano <- plot_volcano_de(
  table = hs_macr_table[["data"]][["z23sb_vs_z22sb"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc",  label = 10, label_column = "hgncsymbol")
plotly::ggplotly(z23sb_vs_z22sb_volcano$plot)
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
z23nosb_vs_z22nosb_volcano$plot +
  xlim(-10, 10) +
  ylim(0, 60)

pp(file="images/z23nosb_vs_z22nosb_reactome_up.png", image=all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["REAC"]], height=12, width=9)
## Warning in pp(file = "images/z23nosb_vs_z22nosb_reactome_up.png", image =
## all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["REAC"]], : There is no
## device to shut down.
## Reactome, zymodeme2.3 without drug vs. zymodeme2.2 without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["KEGG"]]
## KEGG, zymodeme2.3 without drug vs. zymodeme2.2 without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 without drug vs. zymodeme2.2 without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 without drug vs. zymodeme2.2 without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["WP"]]
## WikiPathways, zymodeme2.3 without drug vs. zymodeme2.2 without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["interactive_plots"]][["WP"]]
all_gp[["z23nosb_vs_z22nosb_down"]][["pvalue_plots"]][["REAC"]]
## NULL
## Reactome, zymodeme2.3 without drug vs. zymodeme2.2 without drug, down.
all_gp[["z23nosb_vs_z22nosb_down"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 without drug vs. zymodeme2.2 without drug, down.
all_gp[["z23nosb_vs_z22nosb_down"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 without drug vs. zymodeme2.2 without drug, down.
z23sb_vs_z22sb_volcano$plot +
  xlim(-10, 10) +
  ylim(0, 60)
## Warning: Removed 1 row containing missing values or values outside the scale
## range (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale
## range (`geom_text_repel()`).
## Warning: ggrepel: 3 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

pp(file="images/z23sb_vs_z22sb_reactome_up.png", image=all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["REAC"]], height=12, width=9)
## Warning in pp(file = "images/z23sb_vs_z22sb_reactome_up.png", image =
## all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["REAC"]], : There is no device
## to shut down.
## Reactome, zymodeme2.3 with drug vs. zymodeme2.2 with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["KEGG"]]
## KEGG, zymodeme2.3 with drug vs. zymodeme2.2 with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 with drug vs. zymodeme2.2 with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 with drug vs. zymodeme2.2 with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["WP"]]
## WikiPathways, zymodeme2.3 with drug vs. zymodeme2.2 with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["interactive_plots"]][["WP"]]
all_gp[["z23sb_vs_z22sb_down"]][["pvalue_plots"]][["REAC"]]
## Reactome, zymodeme2.3 with drug vs. zymodeme2.2 with drug, down.
all_gp[["z23sb_vs_z22sb_down"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 with drug vs. zymodeme2.2 with drug, down.
all_gp[["z23sb_vs_z22sb_down"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 with drug vs. zymodeme2.2 with drug, down.
shared <- Vennerable::Venn(list("drug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23sb_vs_z22sb"]]),
                                "nodrug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23nosb_vs_z22nosb"]])))
pp(file="images/drug_nodrug_venn_up.png")
Vennerable::plot(shared)
dev.off()
## png 
##   2
Vennerable::plot(shared)

shared <- Vennerable::Venn(list("drug" = rownames(hs_macr_sig[["deseq"]][["downs"]][["z23sb_vs_z22sb"]]),
                                "nodrug" = rownames(hs_macr_sig[["deseq"]][["downs"]][["z23nosb_vs_z22nosb"]])))
pp(file="images/drug_nodrug_venn_down.png")
Vennerable::plot(shared)
dev.off()
## png 
##   2

A slightly different way of looking at the differences between the two zymodeme infections is to directly compare the two sets of infected samples, first without and then with drug. When the untreated zymodeme 2.3 vs. 2.2 comparison was shown as a volcano plot, 484 genes were observed as increased and 422 as decreased; these groups include many of the same inflammatory (up) and membrane (down) genes.

Similar patterns were observed when the antimonial was included. When Venn diagrams of the increased and decreased gene sets were plotted, a substantial number of genes were observed as increased (313) and decreased (244) in both the untreated and antimonial-treated comparisons.

8.3 Drug effects on each zymodeme infection

Another likely question is to directly compare the treated vs. untreated samples for each zymodeme infection in order to visualize the effects of the antimonial.

z23sb_vs_z23nosb_volcano <- plot_volcano_de(
  table = hs_macr_table[["data"]][["z23sb_vs_z23nosb"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc",  label = 10, label_column = "hgncsymbol")
plotly::ggplotly(z23sb_vs_z23nosb_volcano$plot)
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
z22sb_vs_z22nosb_volcano <- plot_volcano_de(
  table = hs_macr_table[["data"]][["z22sb_vs_z22nosb"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc",  label = 10, label_column = "hgncsymbol")
plotly::ggplotly(z22sb_vs_z22nosb_volcano$plot)
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
z23sb_vs_z23nosb_volcano$plot +
  xlim(-8, 8) +
  ylim(0, 210)
## Warning: Removed 1 row containing missing values or values outside the scale
## range (`geom_point()`).
## Warning: Removed 1 row containing missing values or values outside the scale
## range (`geom_text_repel()`).

pp(file="images/z23sb_vs_z23nosb_reactome_up.png",
   image=all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["REAC"]], height=12, width=9)
## Warning in pp(file = "images/z23sb_vs_z23nosb_reactome_up.png", image =
## all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["REAC"]], : There is no
## device to shut down.
## Reactome, zymodeme2.3 with drug vs. zymodeme2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["KEGG"]]
## KEGG, zymodeme2.3 with drug vs. zymodeme2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 with drug vs. zymodeme2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 with drug vs. zymodeme2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["WP"]]
## WikiPathways, zymodeme2.3 with drug vs. zymodeme2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["interactive_plots"]][["WP"]]
all_gp[["z23sb_vs_z23nosb_down"]][["pvalue_plots"]][["REAC"]]
## Warning: Removed 1 row containing missing values or values outside the scale
## range (`geom_col()`).
## Reactome, zymodeme2.3 with drug vs. zymodeme2.3 without drug, down.
all_gp[["z23sb_vs_z23nosb_down"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 with drug vs. zymodeme2.3 without drug, down.
all_gp[["z23sb_vs_z23nosb_down"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 with drug vs. zymodeme2.3 without drug, down.
z22sb_vs_z22nosb_volcano$plot +
  xlim(-8, 8) +
  ylim(0, 210)

pp(file="images/z22sb_vs_z22nosb_reactome_up.png",
   image=all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["REAC"]], height=12, width=9)
## Warning in pp(file = "images/z22sb_vs_z22nosb_reactome_up.png", image =
## all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["REAC"]], : There is no
## device to shut down.
## Reactome, zymodeme2.2 with drug vs. zymodeme2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["KEGG"]]
## KEGG, zymodeme2.2 with drug vs. zymodeme2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.2 with drug vs. zymodeme2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.2 with drug vs. zymodeme2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["WP"]]
## WikiPathways, zymodeme2.2 with drug vs. zymodeme2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["interactive_plots"]][["WP"]]
all_gp[["z22sb_vs_z22nosb_down"]][["pvalue_plots"]][["REAC"]]
## Reactome, zymodeme2.2 with drug vs. zymodeme2.2 without drug, down.
all_gp[["z22sb_vs_z22nosb_down"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.2 with drug vs. zymodeme2.2 without drug, down.
all_gp[["z22sb_vs_z22nosb_down"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.2 with drug vs. zymodeme2.2 without drug, down.
shared <- Vennerable::Venn(list("z23" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23sb_vs_z23nosb"]]),
                                "z22" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z22sb_vs_z22nosb"]])))
pp(file="images/z23_z22_drug_venn_up.png")
Vennerable::plot(shared)
dev.off()
## png 
##   2
Vennerable::plot(shared)

shared <- Vennerable::Venn(list("z23" = rownames(hs_macr_sig[["deseq"]][["downs"]][["z23sb_vs_z23nosb"]]),
                                "z22" = rownames(hs_macr_sig[["deseq"]][["downs"]][["z22sb_vs_z22nosb"]])))
pp(file="images/z23_z22_drug_venn_down.png")
Vennerable::plot(shared)
dev.off()
## png 
##   2
Vennerable::plot(shared)

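Note: I am setting the x- and y-axis boundaries by first letting the plotter pick its own axes, writing down the ranges I observe, and then setting both plots of a pair to the larger of the two ranges. It is therefore possible that one or more genes lie outside the chosen range; the "Removed 1 row containing missing values or values outside the scale range" warnings above suggest that at least one point was dropped from some of these plots.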
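A sketch of a programmatic alternative to reading the ranges off by eye: compute symmetric x limits and a y upper bound from the largest values across every table that should share axes. It assumes the tables carry `deseq_logfc` and `deseq_adjp` columns, as in the calls above, and that the y axis is -log10(adjusted p-value); `shared_volcano_limits()` is a hypothetical helper and the toy tables are made up.

```r
## Hypothetical helper: symmetric x limits and a y upper bound covering every
## point across several DE tables, so no gene falls outside the shared axes.
shared_volcano_limits <- function(tables, fc_col = "deseq_logfc",
                                  p_col = "deseq_adjp") {
  fcs <- unlist(lapply(tables, function(tab) tab[[fc_col]]))
  ys <- unlist(lapply(tables, function(tab) -log10(tab[[p_col]])))
  ys <- ys[is.finite(ys)]  # drop NA and p = 0 (which gives Inf)
  x_max <- ceiling(max(abs(fcs), na.rm = TRUE))
  list(x = c(-x_max, x_max), y = c(0, ceiling(max(ys))))
}

tab_a <- data.frame(deseq_logfc = c(-3.2, 1.5), deseq_adjp = c(2e-6, 0.01))
tab_b <- data.frame(deseq_logfc = c(7.6, -0.4), deseq_adjp = c(3e-21, 0))
lims <- shared_volcano_limits(list(tab_a, tab_b))
lims
```

The resulting `lims$x` and `lims$y` could then be passed to `xlim()` and `ylim()` for both plots of a pair. Note that a p-value of exactly 0 is skipped when choosing the y bound, so such a point would still fall outside it.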

The contrasts in section 8.2 sought to show changes between the two strains, z2.3 and z2.2. Conversely, the volcano plots in this section directly compare each strain before and after drug treatment.

8.4 LRT of the Human Macrophage

tmrc2_lrt_strain_drug <- deseq_lrt(hs_macr, interactor_column = "drug",
                                   interest_column = "macrophagezymodeme", factors = c("drug", "macrophagezymodeme"))
## converting counts to integer mode
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## -- replacing outliers and refitting for 38 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)
## estimating dispersions
## fitting model and testing
## rlog() may take a long time with 50 or more samples,
## vst() is a much faster transformation
## Working with 858 genes.
## Working with 855 genes after filtering: minc > 3
## Joining with `by = join_by(merge)`
## Joining with `by = join_by(merge)`
## Warning in `labels<-.dendrogram`(dend, value = value, ...): The lengths of the
## new labels is shorter than the number of leaves in the dendrogram - labels are
## recycled.

tmrc2_lrt_strain_drug$cluster_data$plot

8.5 Parasite

lp_macrophage_de <- all_pairwise(lp_macrophage_nosb,
                                 model_batch="svaseq", filter=TRUE)
## 
## z2.2 z2.3 
##   14   15
## Removing 0 low-count genes (8591 remaining).
## Setting 889 low elements to zero.
## transform_counts: Found 889 values equal to 0, adding 1 to the matrix.
tmrc2_parasite_keepers <- list(
  "z23_vs_z22" = c("z23", "z22"))
lp_macrophage_table <- combine_de_tables(
  lp_macrophage_de, keepers = tmrc2_parasite_keepers,
  excel = glue("excel/macrophage_de/macrophage_parasite_infection_de-v{ver}.xlsx"))
lp_macrophage_sig <- extract_significant_genes(
  lp_macrophage_table,
  excel = glue("excel/macrophage_de/macrophage_parasite_sig-v{ver}.xlsx"))

## The table is named after the keeper, z23_vs_z22, not z23nosb_vs_z22nosb.
lp_macrophage_table[["plots"]][["z23_vs_z22"]][["deseq_vol_plots"]]
up_genes <- lp_macrophage_sig[["deseq"]][["ups"]][[1]]
dim(up_genes)
## [1] 48 67
down_genes <- lp_macrophage_sig[["deseq"]][["downs"]][[1]]
dim(down_genes)
## [1] 91 67
lp_z23sb_vs_z22sb_volcano <- plot_volcano_de(
  table = lp_macrophage_table[["data"]][["z23_vs_z22"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc",  label = 10, label_column = "hgncsymbol")
plotly::ggplotly(lp_z23sb_vs_z22sb_volcano$plot)
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
lp_z23sb_vs_z22sb_volcano$plot
## Warning: ggrepel: 7 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

lp_lengths <- all_lp_annot[, c("gid", "annot_cds_length")]
colnames(lp_lengths)  <- c("ID", "length")

up_goseq <- simple_goseq(up_genes, go_db = lp_go, length_db = lp_lengths)
## Found 18 go_db genes and 48 length_db genes out of 48.
## View categories over-represented in the 2.3 samples
up_goseq$pvalue_plots$bpp_plot_over
## NULL
down_goseq <- simple_goseq(down_genes, go_db = lp_go, length_db = lp_lengths)
## Found 25 go_db genes and 91 length_db genes out of 91.
## View categories over-represented in the 2.2 samples
down_goseq$pvalue_plots$bpp_plot_over
## NULL
written_goseq <- write_goseq_data(up_goseq,
                                  excel = glue("lp_macrophage_increased_z2.3_goseq-v{ver}.xlsx"))
## Writing a sheet containing the legend.
## Error in names(x) <- value: 'names' attribute [2] must be the same length as the vector [1]
written_goseq <- write_goseq_data(down_goseq,
                                  excel = glue("lp_macrophage_increased_z2.2_goseq-v{ver}.xlsx"))
## Writing a sheet containing the legend.
## Error in names(x) <- value: 'names' attribute [2] must be the same length as the vector [1]

9 GSVA

hs_infected <- subset_expt(hs_macrophage, subset="macrophagetreatment!='uninf'") %>%
  subset_expt(subset="macrophagetreatment!='uninf_sb'")
## The samples excluded are: TMRC30059, TMRC30266, TMRC30326, TMRC30312, TMRC30309.
## subset_expt(): There were 68, now there are 63 samples.
## The samples excluded are: .
## subset_expt(): There were 63, now there are 63 samples.
hs_gsva_c2 <- simple_gsva(hs_infected)
## Converting the rownames() of the expressionset to ENTREZID.
## 1655 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 21481 entries.
## After conversion, the expressionset has 19785 entries.
## Adding descriptions and IDs to the gene set annotations.
hs_gsva_c2_meta <- get_msigdb_metadata(hs_gsva_c2, msig_xml="reference/msigdb_v7.2.xml")
## Error in get_msigdb_metadata(hs_gsva_c2, msig_xml = "reference/msigdb_v7.2.xml"): unused argument (msig_xml = "reference/msigdb_v7.2.xml")
hs_gsva_c2_sig <- get_sig_gsva_categories(hs_gsva_c2_meta, excel = "excel/macrophage_de/hs_macrophage_gsva_c2_sig.xlsx")
## Error in eval(expr, envir, enclos): object 'hs_gsva_c2_meta' not found
hs_gsva_c2_sig$raw_plot
## Error in eval(expr, envir, enclos): object 'hs_gsva_c2_sig' not found
hs_gsva_c7 <- simple_gsva(hs_infected, signature_category = "c7")
## Converting the rownames() of the expressionset to ENTREZID.
## 1655 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 21481 entries.
## After conversion, the expressionset has 19785 entries.
## Adding descriptions and IDs to the gene set annotations.
hs_gsva_c7_meta <- get_msigdb_metadata(hs_gsva_c7, msig_xml="reference/msigdb_v7.2.xml")
## Error in get_msigdb_metadata(hs_gsva_c7, msig_xml = "reference/msigdb_v7.2.xml"): unused argument (msig_xml = "reference/msigdb_v7.2.xml")
hs_gsva_c7_sig <- get_sig_gsva_categories(hs_gsva_c7, excel = "excel/macrophage_de/hs_macrophage_gsva_c7_sig.xlsx")
## Starting limma pairwise comparison.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/3: Creating table: infsb_vs_inf.  Adjust = BH
## Limma step 6/6: 2/3: Creating table: uninfsb_vs_inf.  Adjust = BH
## Limma step 6/6: 3/3: Creating table: uninfsb_vs_infsb.  Adjust = BH
## Limma step 6/6: 1/3: Creating table: inf.  Adjust = BH
## Limma step 6/6: 2/3: Creating table: infsb.  Adjust = BH
## Limma step 6/6: 3/3: Creating table: uninfsb.  Adjust = BH
## The factor inf has 29 rows.
## The factor inf_sb has 29 rows.
## The factor uninf_sb has 5 rows.
## Testing each factor against the others.
## Scoring inf against everything else.
## Scoring inf_sb against everything else.
## Scoring uninf_sb against everything else.
hs_gsva_c7_sig$raw_plot

10 Try out a new tool

Two reasons: Najib loves him some PCA, and this tool (pathwayPCA) uses WikiPathways, which I think is neat.

Ok, I spent some time looking through the code and I have some problems with some of the design decisions.

Most importantly, it requires a data.frame() which has the following format:

  1. No rownames; instead, column 1 is the sample ID.
  2. Columns 2 through m are the categorical/survival/etc. metrics.
  3. Columns m+1 through n are the genes, one per column, with log2 values.

But on reflection I think I get the idea: they want to make modelling against response factors easier.
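The reshaping described in the list above, from Biobase's genes-by-samples matrix to a samples-by-genes data frame whose first column is the sample ID, can be sketched in base R. `to_sample_rows()` is a hypothetical helper; it does not log-transform anything, so whether the values are already log2-scaled depends on how the expressionset was normalized.

```r
## Hypothetical helper: transpose a genes x samples matrix into a
## samples x genes data.frame with the sample IDs in the first column.
to_sample_rows <- function(expr_mat) {
  by_sample <- as.data.frame(t(expr_mat))
  out <- data.frame(SampleID = rownames(by_sample), by_sample,
                    check.names = FALSE, stringsAsFactors = FALSE)
  rownames(out) <- NULL  # the IDs now live in the SampleID column
  out
}

## Toy matrix standing in for exprs(expt): 2 genes x 3 samples.
toy_mat <- matrix(1:6, nrow = 2,
                  dimnames = list(c("g1", "g2"), c("S1", "S2", "S3")))
sample_df <- to_sample_rows(toy_mat)
sample_df
```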

library(pathwayPCA)
## Error in library(pathwayPCA): there is no package called 'pathwayPCA'
library(rWikiPathways)
## Error in library(rWikiPathways): there is no package called 'rWikiPathways'
downloaded <- downloadPathwayArchive(organism = "Homo sapiens", format = "gmt")
## Error in downloadPathwayArchive(organism = "Homo sapiens", format = "gmt"): could not find function "downloadPathwayArchive"
data_path <- system.file("extdata", package = "pathwayPCA")
wikipathways <- read_gmt(paste0(data_path, "/wikipathways_human_symbol.gmt"),
                         description = TRUE)
## Error in read_gmt(paste0(data_path, "/wikipathways_human_symbol.gmt"), : could not find function "read_gmt"
expt <- subset_expt(hs_macrophage, subset = "macrophagetreatment!='uninf'") %>%
  subset_expt(subset = "macrophagetreatment!='uninf_sb'")
## The samples excluded are: TMRC30059, TMRC30266, TMRC30326, TMRC30312, TMRC30309.
## subset_expt(): There were 68, now there are 63 samples.
## The samples excluded are: .
## subset_expt(): There were 63, now there are 63 samples.
expt <- set_expt_conditions(expt, fact = "macrophagezymodeme")
## The numbers of samples by condition are:
## 
## none  z22  z23 
##    5   29   29
## Gene symbol column; the same one used to label the volcano plots above.
symbol_column <- "hgncsymbol"
symbol_vector <- fData(expt)[[symbol_column]]
names(symbol_vector) <- rownames(fData(expt))
symbol_df <- as.data.frame(symbol_vector)
assay_df <- merge(symbol_df, as.data.frame(exprs(expt)), by = "row.names")
assay_df[["Row.names"]] <- NULL
rownames(assay_df) <- make.names(assay_df[["symbol_vector"]], unique = TRUE)
assay_df[["symbol_vector"]] <- NULL
assay_df <- as.data.frame(t(assay_df))
assay_df[["SampleID"]] <- rownames(assay_df)
assay_df <- dplyr::select(assay_df, "SampleID", everything())
factor_df <- as.data.frame(pData(expt))
factor_df[["SampleID"]] <- rownames(factor_df)
factor_df <- dplyr::select(factor_df, "SampleID", everything())
## The response factor, matching the condition set with set_expt_conditions() above.
factors <- "macrophagezymodeme"
factor_df <- factor_df[, c("SampleID", factors)]
tt <- CreateOmics(
  assayData_df = assay_df,
  pathwayCollection_ls = wikipathways,
  response = factor_df,
  respType = "categorical",
  minPathSize=5)
## Error in CreateOmics(assayData_df = assay_df, pathwayCollection_ls = wikipathways, : could not find function "CreateOmics"
super <- AESPCA_pVals(
  object = tt,
  numPCs = 2,
  parallel = FALSE,
  numCores = 8,
  numReps = 2,
  adjustment = "BH")
## Error in AESPCA_pVals(object = tt, numPCs = 2, parallel = FALSE, numCores = 8, : could not find function "AESPCA_pVals"
## Stopping this because it takes forever
##if (!isTRUE(get0("skip_load"))) {
##  pander::pander(sessionInfo())
##  message("This is hpgltools commit: ", get_git_commit())
##  message("Saving to ", savefile)
##  tmp <- sm(saveme(filename = savefile))
##}
tmp <- loadme(filename = savefile)
