1 Changelog

  • 20230410: Making some changes to improve the differential expression plots as well as prepare for some different pathway/GSEA/GSVA analyses on the data.

2 Introduction

Having established that the TMRC2 macrophage data looks robust and illustrates a couple of interesting questions, let us perform a few differential expression analyses of it.

Also note that in 202212 we received a new set of samples, some of which are of a completely different cell type, U937. As their ATCC page states, these are malignant cells taken from the pleural effusion of a 37-year-old white male with histiocytic lymphoma, and they exhibit the morphology of monocytes. Thus, this document now includes some comparisons of the cell types as well as of the various macrophage donors (given that there are now more donors too).

2.1 Human data

I am moving the dataset manipulations here so that I can look at them all together before running the various DE analyses.

2.2 Create sets focused on drug, celltype, strain, and combinations

Let us start by playing with the metadata a little and create sets with the condition set to:

  • Drug treatment
  • Cell type (macrophage or U937)
  • Donor
  • Infection Strain
  • Some useful combinations thereof

In addition, keep mental track of which datasets comprise all samples vs. those which are only macrophage vs. those which are only U937. (Thus the use of all_human vs. hs_macr vs. u937 as prefixes for the data structures.)

Ideally, these recreations of the data should perhaps be in the datastructures worksheet.

all_human <- sanitize_expt_metadata(hs_macrophage, columns = "drug") %>%
  set_expt_conditions(fact = "drug") %>%
  set_expt_batches(fact = "typeofcells")
## 
## antimony     none 
##       34       34 
## 
## Macrophages        U937 
##          54          14
## The following 3 lines were copy/pasted to datastructures and should be removed soon.
no_strain_idx <- pData(all_human)[["strainid"]] == "none"
##pData(all_human)[["strainid"]] <- paste0("s", pData(all_human)[["strainid"]],
##                                         "_", pData(all_human)[["macrophagezymodeme"]])
pData(all_human)[no_strain_idx, "strainid"] <- "none"
table(pData(all_human)[["strainid"]])
## 
## 10763 10772 10977 11026 11075 11126 12251 12309 12355 12367  2169  7158  none 
##     2     8     2     2     2     8     7     8     2     7     8     2    10
all_human_types <- set_expt_conditions(all_human, fact = "typeofcells") %>%
  set_expt_batches(fact = "drug")
## 
## Macrophages        U937 
##          54          14 
## 
## antimony     none 
##       34       34
type_zymo_fact <- paste0(pData(all_human_types)[["condition"]], "_",
                         pData(all_human_types)[["macrophagezymodeme"]])
type_zymo <- set_expt_conditions(all_human_types, fact = type_zymo_fact)
## 
## Macrophages_none  Macrophages_z22  Macrophages_z23        U937_none         U937_z22 
##                8               23               23                2                6 
##         U937_z23 
##                6
type_drug_fact <- paste0(pData(all_human_types)[["condition"]], "_",
                         pData(all_human_types)[["drug"]])
type_drug <- set_expt_conditions(all_human_types, fact = type_drug_fact)
## 
## Macrophages_antimony     Macrophages_none        U937_antimony            U937_none 
##                   27                   27                    7                    7
strain_fact <- pData(all_human_types)[["strainid"]]
table(strain_fact)
## strain_fact
## 10763 10772 10977 11026 11075 11126 12251 12309 12355 12367  2169  7158  none 
##     2     8     2     2     2     8     7     8     2     7     8     2    10
new_conditions <- paste0(pData(hs_macrophage)[["macrophagetreatment"]], "_",
                         pData(hs_macrophage)[["macrophagezymodeme"]])
## Note the sanitize() call is redundant with the addition of sanitize() in the
## datastructures file, but I don't want to wait to rerun that.
hs_macr <- set_expt_conditions(hs_macrophage, fact = new_conditions) %>%
  sanitize_expt_metadata(columns = "drug") %>%
  subset_expt(subset = "typeofcells!='U937'")
## 
##      inf_z22      inf_z23    infsb_z22    infsb_z23   uninf_none uninfsb_none 
##           14           15           15           14            5            5
## subset_expt(): There were 68, now there are 54 samples.
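
As a minimal sketch of the pattern used above, the following builds a combined condition from two metadata columns, tabulates it, and then drops one cell type. The data frame toy_meta and its values are invented for illustration, and only base R is used; this is not how the hpgltools helpers are implemented.

```r
## Toy metadata standing in for pData(); the values are invented.
toy_meta <- data.frame(
  typeofcells = c("Macrophages", "Macrophages", "Macrophages", "U937", "U937"),
  drug = c("antimony", "none", "antimony", "antimony", "none"),
  stringsAsFactors = FALSE)
## Concatenate two columns into a single condition per sample.
toy_condition <- paste0(toy_meta[["typeofcells"]], "_", toy_meta[["drug"]])
toy_counts <- table(toy_condition)
print(toy_counts)
## Keep only the non-U937 samples, analogous in spirit to the subset_expt() call.
toy_macr <- toy_meta[toy_meta[["typeofcells"]] != "U937", ]
print(nrow(toy_macr))
```

The table() output plays the same role as the sample counts printed after each set_expt_conditions() call: a quick check that every combined condition has the expected number of samples.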

2.2.1 Separate Macrophage samples

Once again, we should reconsider where the following block is placed, but these datastructures are likely to be used in many of the following analyses.

hs_macr_drug_expt <- set_expt_conditions(hs_macr, fact = "drug")
## 
## antimony     none 
##       27       27
hs_macr_strain_expt <- set_expt_conditions(hs_macr, fact = "macrophagezymodeme") %>%
  subset_expt(subset = "macrophagezymodeme != 'none'")
## 
## none  z22  z23 
##    8   23   23
## subset_expt(): There were 54, now there are 46 samples.
table(pData(hs_macr)[["strainid"]])
## 
## 10763 10772 10977 11026 11075 11126 12251 12309 12355 12367  2169  7158  none 
##     2     6     2     2     2     6     5     6     2     5     6     2     8

2.2.2 Refactor U937 samples

The U937 samples were separated in the datastructures file, but we want to use the combination of drug/zymodeme with them pretty much exclusively.

new_conditions <- paste0(pData(hs_u937)[["macrophagetreatment"]], "_",
                         pData(hs_u937)[["macrophagezymodeme"]])
u937_expt <- set_expt_conditions(hs_u937, fact = new_conditions)
## 
##      inf_z22      inf_z23    infsb_z22    infsb_z23   uninf_none uninfsb_none 
##            3            3            3            3            1            1

2.3 Contrasts used in this document

Given the various ways we have chopped up this dataset, there are a few general types of contrasts we will perform, which will then be combined into greater complexity:

  • drug treatment
  • strains used
  • cell types
  • donors

In the end, our actual goal is to consider the variable effects of drug+strain and see if we can discern patterns which lead to better or worse drug treatment outcome.

The contrasts in which we are primarily interested follow. I also created one ratio-of-ratios contrast which I think has the potential to ask our biggest question.
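
To make the ratio-of-ratios idea concrete, here is a small worked example for a single gene. The expression values are invented; the only point is that, because fold changes are computed on the log2 scale, subtracting one drug effect from another is the same as dividing one linear-scale ratio by another.

```r
## Invented linear-scale mean expression of one gene in the four infected groups.
toy_expr <- c(infz22 = 100, infsbz22 = 200, infz23 = 100, infsbz23 = 800)
toy_log <- log2(toy_expr)
## Drug effect within each zymodeme, expressed as a log2 fold change.
drug_z23 <- toy_log[["infsbz23"]] - toy_log[["infz23"]]
drug_z22 <- toy_log[["infsbz22"]] - toy_log[["infz22"]]
## The difference of differences on the log scale ...
log_ratio_of_ratios <- drug_z23 - drug_z22
## ... matches the log of the ratio of ratios on the linear scale.
linear_ratio_of_ratios <- (800 / 100) / (200 / 100)
print(all.equal(log_ratio_of_ratios, log2(linear_ratio_of_ratios)))
```

In this toy case the drug raises the gene 8-fold with z23 but only 2-fold with z22, so the ratio of ratios is 4 (a log2 value of 2).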

tmrc2_human_extra <- "z23drugnodrug_vs_z22drugnodrug = (infsbz23 - infz23) - (infsbz22 - infz22), z23z22drug_vs_z23z22nodrug = (infsbz23 - infsbz22) - (infz23 - infz22)"
tmrc2_human_keepers <- list(
  "z23nosb_vs_uninf" = c("infz23", "uninfnone"),
  "z22nosb_vs_uninf" = c("infz22", "uninfnone"),
  "z23nosb_vs_z22nosb" = c("infz23", "infz22"),
  "z23sb_vs_z22sb" = c("infsbz23", "infsbz22"),
  "z23sb_vs_z23nosb" = c("infsbz23", "infz23"),
  "z22sb_vs_z22nosb" = c("infsbz22", "infz22"),
  "z23sb_vs_sb" = c("infsbz23", "uninfsbnone"),
  "z22sb_vs_sb" = c("infsbz22", "uninfsbnone"),
  "z23sb_vs_uninf" = c("infsbz23", "uninfnone"),
  "z22sb_vs_uninf" = c("infsbz22", "uninfnone"),
  "sb_vs_uninf" = c("uninfsbnone", "uninfnone"),
  "extra_z2322" = c("z23drugnodrug", "z22drugnodrug"),
  "extra_drugnodrug" = c("z23z22drug", "z23z22nodrug"))
tmrc2_drug_keepers <- list(
  "drug" = c("antimony", "none"))
tmrc2_type_keepers <- list(
  "type" = c("U937", "Macrophages"))
tmrc2_strain_keepers <- list(
  "strain" = c("z23", "z22"))
type_zymo_extra <- "zymos_vs_types = (U937z23 - U937z22) - (Macrophagesz23 - Macrophagesz22)"
tmrc2_typezymo_keepers <- list(
  "u937_macr" = c("Macrophagesnone", "U937none"),
  "zymo_macr" = c("Macrophagesz23", "Macrophagesz22"),
  "zymo_u937" = c("U937z23", "U937z22"),
  "z23_types" = c("U937z23", "Macrophagesz23"),
  "z22_types" = c("U937z22", "Macrophagesz22"),
  "zymos_types" = c("zymos_vs_types"))
tmrc2_typedrug_keepers <- list(
  "type_nodrug" = c("U937none", "Macrophagesnone"),
  "type_drug" = c("U937antimony", "Macrophagesantimony"),
  "macr_drugs" = c("Macrophagesantimony", "Macrophagesnone"),
  "u937_drugs" = c("U937antimony", "U937none"))
u937_keepers <- list(
  "z23nosb_vs_uninf" = c("infz23", "uninfnone"),
  "z22nosb_vs_uninf" = c("infz22", "uninfnone"),
  "z23nosb_vs_z22nosb" = c("infz23", "infz22"),
  "z23sb_vs_z22sb" = c("infsbz23", "infsbz22"),
  "z23sb_vs_z23nosb" = c("infsbz23", "infz23"),
  "z22sb_vs_z22nosb" = c("infsbz22", "infz22"),
  "z23sb_vs_sb" = c("infsbz23", "uninfsbnone"),
  "z22sb_vs_sb" = c("infsbz22", "uninfsbnone"),
  "z23sb_vs_uninf" = c("infsbz23", "uninfnone"),
  "z22sb_vs_uninf" = c("infsbz22", "uninfnone"),
  "sb_vs_uninf" = c("uninfsbnone", "uninfnone"))
high_expression <- 128
high_expression_column <- "deseq_basemean"
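
The two values above are later passed as min_mean_exprs and exprs_column. As a rough sketch of what such a cutoff does, assuming that rows are kept when the chosen column is at or above the cutoff (the exact comparison used inside extract_significant_genes() may differ), consider this toy table with invented values:

```r
## Invented DE table; only the column names mirror the real tables.
toy_table <- data.frame(
  deseq_basemean = c(12, 128, 950, 64),
  deseq_logfc = c(2.1, -1.4, 0.3, 3.0),
  row.names = c("geneA", "geneB", "geneC", "geneD"))
toy_cutoff <- 128
toy_column <- "deseq_basemean"
## Assumption: keep genes whose mean expression is at or above the cutoff.
toy_high <- toy_table[toy_table[[toy_column]] >= toy_cutoff, ]
print(rownames(toy_high))
```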

## Write a reduced TSV (gene symbol plus the DESeq2 columns) for every contrast
## in a combine_de_tables() result. Uses the global 'ver' when naming files.
combined_to_tsv <- function(combined, celltype = "all") {
  keepers <- combined[["keepers"]]
  for (k in seq_along(keepers)) {
    kname <- names(keepers)[k]
    ## Each keeper is c(numerator, denominator).
    numerator <- keepers[[k]][1]
    denominator <- keepers[[k]][2]
    filename <- glue("analyses/macrophage_de/{ver}/tsv_tables/tmrc2_{celltype}_{kname}_n{numerator}_d{denominator}-v{ver}.tsv")
    kdata <- combined[["data"]][[kname]]
    ## Skip contrasts whose table lacks the basic_num column.
    if (is.null(kdata[["basic_num"]])) {
      next
    }
    wanted <- c("hgncsymbol", "deseq_logfc", "deseq_adjp", "deseq_basemean", "deseq_num", "deseq_den")
    wanted_data <- kdata[, wanted]
    colnames(wanted_data) <- c("hgncsymbol", "deseq_logfc", "deseq_adjp", "deseq_mean", "deseq_numerator", "deseq_denominator")
    readr::write_tsv(x = wanted_data %>% tibble::rownames_to_column(), file = filename)
  }
}

## Write one xlsx file per element of an all_gprofiler() result.
write_all_gp <- function(all_gp) {
  for (g in seq_along(all_gp)) {
    name <- names(all_gp)[g]
    datum <- all_gp[[name]]
    filename <- glue("analyses/macrophage_de/{ver}/gprofiler/{name}_gprofiler-v{ver}.xlsx")
    written <- sm(write_gprofiler_data(datum, excel = filename))
  }
}

2.3.1 Primary queries

There is a series of initial questions which make some sense to me, but these do not necessarily match the set of questions which are most pressing. I am hoping to pull both of these sets of queries into one.

Before extracting these groups of queries, let us invoke the all_pairwise() function and get all of the likely contrasts along with one or more extras that might prove useful (the ‘extra_contrasts’ argument).

2.3.2 Combined U937 and Macrophages: Compare drug effects

When we have the U937 cells in the same dataset as the macrophages, that provides an interesting opportunity to see if we can observe drug-dependent effects which are shared across both cell types.

drug_de <- all_pairwise(all_human, filter = TRUE, model_batch = "svaseq")
## 
## antimony     none 
##       34       34
## Removing 0 low-count genes (12283 remaining).
## Setting 3092 low elements to zero.
## transform_counts: Found 3092 values equal to 0, adding 1 to the matrix.
drug_table <- combine_de_tables(
  drug_de, keepers = tmrc2_drug_keepers,
  excel = glue("analyses/macrophage_de/{ver}/de_tables/tmrc2_macrophage_drug_comparison-v{ver}.xlsx"))
## Error in names(x) <- value: 'names' attribute [8] must be the same length as the vector [6]
combined_to_tsv(drug_table, celltype = "all")
## Error in combined_to_tsv(drug_table, celltype = "all"): object 'drug_table' not found
drug_sig <- extract_significant_genes(
  drug_table,
  excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_drug_sig-v{ver}.xlsx"))
## Error in extract_significant_genes(drug_table, excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_drug_sig-v{ver}.xlsx")): object 'drug_table' not found
drug_highsig <- extract_significant_genes(
  drug_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_drug_highsig-v{ver}.xlsx"))
## Error in extract_significant_genes(drug_table, min_mean_exprs = high_expression, : object 'drug_table' not found
all_drug_gp <- all_gprofiler(drug_sig)
## Error in all_gprofiler(drug_sig): object 'drug_sig' not found
write_all_gp(all_drug_gp)
## Error in write_all_gp(all_drug_gp): object 'all_drug_gp' not found
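
The first error in the block above is a generic base R failure: some object with 6 elements was handed 8 names. Where inside combine_de_tables() this happens cannot be determined from the output shown, so the following only reproduces the same class of error with invented objects; every later "object not found" message is a consequence of drug_table never being created.

```r
## Invented objects: a vector of length 6 and a set of 8 names.
toy_vector <- 1:6
toy_names <- letters[1:8]
## Capture the error message instead of stopping.
toy_error <- tryCatch({
  names(toy_vector) <- toy_names
  NULL
}, error = function(e) conditionMessage(e))
print(toy_error)
```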

2.3.3 Combined U937 and Macrophages: compare cell types

There are a couple of ways one might want to directly compare the two cell types.

  • Given that the variance between the two celltypes is so huge, just compare all samples.
  • One might want to compare them with the interaction effects of drug/zymodeme.

type_de <- all_pairwise(all_human_types, filter = TRUE, model_batch = "svaseq")
## 
## Macrophages        U937 
##          54          14
## Removing 0 low-count genes (12283 remaining).
## Setting 8682 low elements to zero.
## transform_counts: Found 8682 values equal to 0, adding 1 to the matrix.
type_table <- combine_de_tables(
  type_de, keepers = tmrc2_type_keepers,
  excel = glue("analyses/macrophage_de/{ver}/de_tables/tmrc2_macrophage_type_comparison-v{ver}.xlsx"))
## Error in names(x) <- value: 'names' attribute [8] must be the same length as the vector [6]
combined_to_tsv(type_table, celltype = "all")
## Error in combined_to_tsv(type_table, celltype = "all"): object 'type_table' not found
type_sig <- extract_significant_genes(
  type_table,
  excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_type_sig-v{ver}.xlsx"))
## Error in extract_significant_genes(type_table, excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_type_sig-v{ver}.xlsx")): object 'type_table' not found
type_highsig <- extract_significant_genes(
  type_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_type_highsig-v{ver}.xlsx"))
## Error in extract_significant_genes(type_table, min_mean_exprs = high_expression, : object 'type_table' not found

2.3.3.1 Combined factors of interest: celltype+zymodeme

Given the above explicit comparison of all samples comprising the two cell types, now let us look at the cell type+zymodeme status with all samples, macrophages and U937.

type_zymo_de <- all_pairwise(type_zymo, filter = TRUE, model_batch = "svaseq",
                             extra_contrasts = type_zymo_extra)
## 
## Macrophages_none  Macrophages_z22  Macrophages_z23        U937_none         U937_z22 
##                8               23               23                2                6 
##         U937_z23 
##                6
## Removing 0 low-count genes (12283 remaining).
## Setting 9655 low elements to zero.
## transform_counts: Found 9655 values equal to 0, adding 1 to the matrix.

type_zymo_table <- combine_de_tables(
  type_zymo_de, keepers = tmrc2_typezymo_keepers,
  excel = glue("analyses/macrophage_de/{ver}/de_tables/tmrc2_macrophage_type_zymo_comparison-v{ver}.xlsx"))
## Error in names(x) <- value: 'names' attribute [8] must be the same length as the vector [6]
combined_to_tsv(type_zymo_table, celltype = "all")
## Error in combined_to_tsv(type_zymo_table, celltype = "all"): object 'type_zymo_table' not found
type_zymo_sig <- extract_significant_genes(
  type_zymo_table,
  excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_type_zymo_sig-v{ver}.xlsx"))
## Error in extract_significant_genes(type_zymo_table, excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_type_zymo_sig-v{ver}.xlsx")): object 'type_zymo_table' not found
type_zymo_highsig <- extract_significant_genes(
  type_zymo_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_type_zymo_highsig-v{ver}.xlsx"))
## Error in extract_significant_genes(type_zymo_table, min_mean_exprs = high_expression, : object 'type_zymo_table' not found

2.3.3.2 Combined factors of interest: celltype+drug

The ‘type_drug’ datastructure is the same as above, but the condition is created from the concatenation of the cell type and drug treatment.

type_drug_de <- all_pairwise(type_drug, filter = TRUE, model_batch = "svaseq")
## 
## Macrophages_antimony     Macrophages_none        U937_antimony            U937_none 
##                   27                   27                    7                    7
## Removing 0 low-count genes (12283 remaining).
## Setting 9642 low elements to zero.
## transform_counts: Found 9642 values equal to 0, adding 1 to the matrix.

type_drug_table <- combine_de_tables(
  type_drug_de, keepers = tmrc2_typedrug_keepers,
  excel = glue("analyses/macrophage_de/{ver}/de_tables/tmrc2_macrophage_type_drug_comparison-v{ver}.xlsx"))
## Error in names(x) <- value: 'names' attribute [8] must be the same length as the vector [6]
combined_to_tsv(type_drug_table, celltype = "all")
## Error in combined_to_tsv(type_drug_table, celltype = "all"): object 'type_drug_table' not found
type_drug_sig <- extract_significant_genes(
  type_drug_table,
  excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_type_drug_sig-v{ver}.xlsx"))
## Error in extract_significant_genes(type_drug_table, excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_type_drug_sig-v{ver}.xlsx")): object 'type_drug_table' not found
type_drug_highsig <- extract_significant_genes(
  type_drug_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_type_drug_highsig-v{ver}.xlsx"))
## Error in extract_significant_genes(type_drug_table, min_mean_exprs = high_expression, : object 'type_drug_table' not found

3 Individual cell types

At this point, I think it is fair to say that the two cell types are sufficiently different that they do not really belong together in a single analysis.

3.1 drug or strain effects, single cell type

One of the queries Najib asked, which I think I misinterpreted, was to look at drug and/or strain effects. My interpretation is somewhere below and was not what he was looking for. Instead, he wanted to see all(macrophage) drug/nodrug and all(macrophage) z23/z22 and compare them to each other. This may still be a wrong interpretation; if so, the most likely comparison is either:

  • (z23drug/z22drug) / (z23nodrug/z22nodrug), or perhaps
  • (z23drug/z23nodrug) / (z22drug/z22nodrug),

I am not sure; those confuse me, and at least one of them is below.
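
One thing which may reduce the confusion: on the log scale the two candidates are the same four terms grouped differently, so they should give the same number for every gene. The following is a quick numerical check with invented log2 values, not a statement about how any particular DE method handles the contrast.

```r
set.seed(1)
## Invented log2 expression values for 5 genes in the four infected groups.
toy <- data.frame(
  infz22 = rnorm(5), infsbz22 = rnorm(5),
  infz23 = rnorm(5), infsbz23 = rnorm(5))
## (z23drug/z22drug) / (z23nodrug/z22nodrug)
strain_first <- (toy$infsbz23 - toy$infsbz22) - (toy$infz23 - toy$infz22)
## (z23drug/z23nodrug) / (z22drug/z22nodrug)
drug_first <- (toy$infsbz23 - toy$infz23) - (toy$infsbz22 - toy$infz22)
print(all.equal(strain_first, drug_first))
```

The two groupings differ only in how one describes the result: "does the strain effect change with drug" versus "does the drug effect change with strain".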

3.1.1 Macrophages

In these blocks we will explicitly query only one factor at a time, drug and strain. The eventual goal, I think, is to look for effects which are shared between drug treatment and infecting strain.

3.1.1.1 Macrophage Drug only

Thus we will start with the pure drug query. In this block we will look only at the drug/nodrug effect.

hs_macr_drug_de <- all_pairwise(hs_macr_drug_expt, filter = TRUE, model_batch = "svaseq")
## 
## antimony     none 
##       27       27
## Removing 0 low-count genes (11756 remaining).
## Setting 1309 low elements to zero.
## transform_counts: Found 1309 values equal to 0, adding 1 to the matrix.
hs_macr_drug_table <- combine_de_tables(
  hs_macr_drug_de, keepers = tmrc2_drug_keepers,
  excel = glue("analyses/macrophage_de/tmrc2_macrophage_onlydrug_table-v{ver}.xlsx"))
## Error in names(x) <- value: 'names' attribute [8] must be the same length as the vector [6]
combined_to_tsv(hs_macr_drug_table, celltype = "macrophage")
## Error in combined_to_tsv(hs_macr_drug_table, celltype = "macrophage"): object 'hs_macr_drug_table' not found
hs_macr_drug_sig <- extract_significant_genes(
  hs_macr_drug_table,
  excel = glue("analyses/macrophage_de/tmrc2_macrophageonly_drug_sig-v{ver}.xlsx"))
## Error in extract_significant_genes(hs_macr_drug_table, excel = glue("analyses/macrophage_de/tmrc2_macrophageonly_drug_sig-v{ver}.xlsx")): object 'hs_macr_drug_table' not found
hs_macr_drug_highsig <- extract_significant_genes(
  hs_macr_drug_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/tmrc2_macrophageonly_drug_highsig-v{ver}.xlsx"))
## Error in extract_significant_genes(hs_macr_drug_table, min_mean_exprs = high_expression, : object 'hs_macr_drug_table' not found

3.1.1.2 Macrophage Strain only

In a similar fashion, let us look for effects which are observed when we consider only the strain used during infection.

hs_macr_strain_de <- all_pairwise(hs_macr_strain_expt, filter = TRUE, model_batch = "svaseq")
## 
## z22 z23 
##  23  23
## Removing 0 low-count genes (11720 remaining).
## Setting 1017 low elements to zero.
## transform_counts: Found 1017 values equal to 0, adding 1 to the matrix.
hs_macr_strain_table <- combine_de_tables(
  hs_macr_strain_de, keepers = tmrc2_strain_keepers,
  excel = glue("analyses/macrophage_de/tmrc2_macrophage_onlystrain_table-v{ver}.xlsx"))
## Error in names(x) <- value: 'names' attribute [8] must be the same length as the vector [6]
combined_to_tsv(hs_macr_strain_table, celltype = "macrophage")
## Error in combined_to_tsv(hs_macr_strain_table, celltype = "macrophage"): object 'hs_macr_strain_table' not found
hs_macr_strain_sig <- extract_significant_genes(
  hs_macr_strain_table,
  excel = glue("analyses/macrophage_de/tmrc2_macrophageonly_onlystrain_sig-v{ver}.xlsx"))
## Error in extract_significant_genes(hs_macr_strain_table, excel = glue("analyses/macrophage_de/tmrc2_macrophageonly_onlystrain_sig-v{ver}.xlsx")): object 'hs_macr_strain_table' not found
hs_macr_strain_highsig <- extract_significant_genes(
  hs_macr_strain_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/tmrc2_macrophageonly_onlystrain_highsig-v{ver}.xlsx"))
## Error in extract_significant_genes(hs_macr_strain_table, min_mean_exprs = high_expression, : object 'hs_macr_strain_table' not found

3.1.1.3 Compare Drug and Strain Effects

Now let us consider the above two comparisons together. First, I will plot the logFC values of them against each other (drug on x-axis and strain on the y-axis). Then we can extract the significant genes in a few combined categories of interest. I assume these will focus exclusively on the categories which include the introduction of the drug.

drug_strain_comp_df <- merge(hs_macr_drug_table[["data"]][["drug"]],
                             hs_macr_strain_table[["data"]][["strain"]],
                             by = "row.names")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'merge': object 'hs_macr_drug_table' not found
drug_strain_comp_plot <- plot_linear_scatter(
  drug_strain_comp_df[, c("deseq_logfc.x", "deseq_logfc.y")])
## Error in is.data.frame(x): object 'drug_strain_comp_df' not found
## Contrasts: antimony/none, z23/z22; x-axis: drug, y-axis: strain
## top left: higher no drug, z23; top right: higher drug z23
## bottom left: higher no drug, z22; bottom right: higher drug z22
drug_strain_comp_plot$scatter
## Error in eval(expr, envir, enclos): object 'drug_strain_comp_plot' not found
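
Since the block above failed, here is a small base R sketch of what it is meant to do, using invented fold changes. Merging by "row.names" keeps only the genes present in both tables and suffixes the clashing column names with .x (first table, drug) and .y (second table, strain); the sign of each value then places a gene in one of the four quadrants named in the comments.

```r
## Invented logFC tables; only the column name mirrors the real data.
toy_drug <- data.frame(deseq_logfc = c(2, -1, 1.5, -2),
                       row.names = c("geneA", "geneB", "geneC", "geneD"))
toy_strain <- data.frame(deseq_logfc = c(1, 1.2, -0.5, -1),
                         row.names = c("geneA", "geneB", "geneC", "geneE"))
## Shared genes only; the columns become deseq_logfc.x and deseq_logfc.y.
toy_comp <- merge(toy_drug, toy_strain, by = "row.names")
## Label each gene by the signs of its drug (x) and strain (y) fold changes.
toy_comp[["quadrant"]] <- paste0(
  ifelse(toy_comp[["deseq_logfc.x"]] > 0, "higher drug", "higher no drug"), ", ",
  ifelse(toy_comp[["deseq_logfc.y"]] > 0, "z23", "z22"))
print(toy_comp)
```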

As I noted in the comments above, some quadrants of the scatter plot are likely to be of greater interest to us than others (the right side). Because I get confused sometimes, the following block will explicitly name the categories of likely interest, then ask which genes are shared among them, and finally use UpSetR to extract the various gene intersection/union categories.

## The drug contrast is antimony/none, so 'ups' are higher with drug and 'downs' are higher without it.
higher_drug <- hs_macr_drug_sig[["deseq"]][["ups"]][[1]]
## Error in eval(expr, envir, enclos): object 'hs_macr_drug_sig' not found
higher_nodrug <- hs_macr_drug_sig[["deseq"]][["downs"]][[1]]
## Error in eval(expr, envir, enclos): object 'hs_macr_drug_sig' not found
higher_z23 <- hs_macr_strain_sig[["deseq"]][["ups"]][[1]]
## Error in eval(expr, envir, enclos): object 'hs_macr_strain_sig' not found
higher_z22 <- hs_macr_strain_sig[["deseq"]][["downs"]][[1]]
## Error in eval(expr, envir, enclos): object 'hs_macr_strain_sig' not found
sum(rownames(higher_drug) %in% rownames(higher_z23))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function '%in%': error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'higher_drug' not found
sum(rownames(higher_drug) %in% rownames(higher_z22))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function '%in%': error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'higher_drug' not found
sum(rownames(higher_nodrug) %in% rownames(higher_z23))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function '%in%': error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'higher_nodrug' not found
sum(rownames(higher_nodrug) %in% rownames(higher_z22))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function '%in%': error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'higher_nodrug' not found
drug_z23_lst <- list("drug" = rownames(higher_drug),
                     "z23" = rownames(higher_z23))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'higher_drug' not found
higher_drug_z23 <- upset(UpSetR::fromList(drug_z23_lst), text.scale = 2)
## Error in unlist(input): object 'drug_z23_lst' not found
higher_drug_z23
## Error in eval(expr, envir, enclos): object 'higher_drug_z23' not found
drug_z23_shared_genes <- overlap_groups(drug_z23_lst)
## Error in unlist(input): object 'drug_z23_lst' not found
shared_genes_drug_z23 <- overlap_geneids(drug_z23_shared_genes, "drug:z23")
## Error in overlap_geneids(drug_z23_shared_genes, "drug:z23"): object 'drug_z23_shared_genes' not found
#shared_genes_drug_z23 <- attr(drug_z23_shared_genes, "elements")[drug_z23_shared_genes[["drug:z23"]]]

drug_z22_lst <- list("drug" = rownames(higher_drug),
                     "z22" = rownames(higher_z22))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'higher_drug' not found
higher_drug_z22 <- upset(UpSetR::fromList(drug_z22_lst), text.scale = 2)
## Error in unlist(input): object 'drug_z22_lst' not found
higher_drug_z22
## Error in eval(expr, envir, enclos): object 'higher_drug_z22' not found
drug_z22_shared_genes <- overlap_groups(drug_z22_lst)
## Error in unlist(input): object 'drug_z22_lst' not found
shared_genes_drug_z22 <- overlap_geneids(drug_z22_shared_genes, "drug:z22")
## Error in overlap_geneids(drug_z22_shared_genes, "drug:z22"): object 'drug_z22_shared_genes' not found
#shared_genes_drug_z22 <- attr(drug_z22_shared_genes, "elements")[drug_z22_shared_genes[["drug:z22"]]]
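
The intersection logic above can be sketched in base R with invented gene names. This is only an illustration of the set operations involved; overlap_groups() and overlap_geneids() are not reimplemented here.

```r
## Invented gene lists standing in for rownames(higher_drug) and rownames(higher_z23).
toy_lst <- list(
  "drug" = c("geneA", "geneB", "geneC"),
  "z23" = c("geneB", "geneC", "geneD"))
## Genes significant in both categories.
toy_shared <- intersect(toy_lst[["drug"]], toy_lst[["z23"]])
## Genes unique to each category.
toy_drug_only <- setdiff(toy_lst[["drug"]], toy_lst[["z23"]])
toy_z23_only <- setdiff(toy_lst[["z23"]], toy_lst[["drug"]])
## The same count as the sum(... %in% ...) calls above.
toy_n_shared <- sum(toy_lst[["drug"]] %in% toy_lst[["z23"]])
print(toy_shared)
```

An UpSet plot of toy_lst would show these three groups as its bars: the shared set plus the two category-specific sets.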

3.1.1.4 Perform gProfiler on drug/strain effect shared genes

Now that we have some populations of genes which are shared across the drug/strain effects, let us pass them to gProfiler and see what pops out.

wanted <- drug_z23_shared_genes[["drug:z23"]]
## Error in eval(expr, envir, enclos): object 'drug_z23_shared_genes' not found
shared_genes_drug_z23 <- attr(drug_z23_shared_genes, "elements")[wanted]
## Error in eval(expr, envir, enclos): object 'drug_z23_shared_genes' not found
shared_drug_z23_gp <- simple_gprofiler(shared_genes_drug_z23)
## Error in "character" %in% class(sig_genes): object 'shared_genes_drug_z23' not found
shared_drug_z23_gp[["pvalue_plots"]][["MF"]]
## Error in eval(expr, envir, enclos): object 'shared_drug_z23_gp' not found
shared_drug_z23_gp[["pvalue_plots"]][["BP"]]
## Error in eval(expr, envir, enclos): object 'shared_drug_z23_gp' not found
shared_drug_z23_gp[["pvalue_plots"]][["REAC"]]
## Error in eval(expr, envir, enclos): object 'shared_drug_z23_gp' not found
wanted <- drug_z22_shared_genes[["drug:z22"]]
## Error in eval(expr, envir, enclos): object 'drug_z22_shared_genes' not found
shared_genes_drug_z22 <- attr(drug_z22_shared_genes, "elements")[wanted]
## Error in eval(expr, envir, enclos): object 'drug_z22_shared_genes' not found
shared_drug_z22_gp <- simple_gprofiler(shared_genes_drug_z22)
## Error in "character" %in% class(sig_genes): object 'shared_genes_drug_z22' not found
shared_drug_z22_gp[["pvalue_plots"]][["BP"]]
## Error in eval(expr, envir, enclos): object 'shared_drug_z22_gp' not found

3.2 Our main question of interest

The data structure hs_macr contains our primary macrophages, which are, as shown above, the data we can really sink our teeth into.

Note, we expect some errors when running combine_de_tables() because not all of the methods I use are comfortable with the ratio-of-ratios contrasts we added via the ‘extra_contrasts’ argument. As a result, when we combine them into the larger output tables, those peculiar contrasts fail. This does not stop it from writing the rest of the results, however.

## test = deseq_pairwise(normalize_expt(hs_macr, filter=TRUE), model_batch = "svaseq", filter = TRUE, extra_contrasts = tmrc2_human_extra)
hs_macr_de <- all_pairwise(
  hs_macr, model_batch = "svaseq", parallel = FALSE,
  filter = TRUE,
  extra_contrasts = tmrc2_human_extra)
## 
##      inf_z22      inf_z23    infsb_z22    infsb_z23   uninf_none uninfsb_none 
##           11           12           12           11            4            4
## Removing 0 low-count genes (11756 remaining).
## Setting 2374 low elements to zero.
## transform_counts: Found 2374 values equal to 0, adding 1 to the matrix.
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating mean and variance tables.
## Basic step 2/3: Performing 21 comparisons.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## Starting DESeq2 pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## DESeq2 step 1/5: Including a matrix of batch estimates in the deseq model.
## converting counts to integer mode
## DESeq2 step 2/5: Estimate size factors.
## DESeq2 step 3/5: Estimate dispersions.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## Using a parametric fitting seems to have worked.
## DESeq2 step 4/5: nbinomWaldTest.
## The contrast z23drugnodrug is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast z23z22drug is not in the results.
## If this is not an extra contrast, then this is an error.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.

##                        Length Class         Mode     
## title                   1     -none-        character
## notes                   1     -none-        character
## initial_metadata       71     data.frame    list     
## expressionset           1     ExpressionSet S4       
## design                 71     data.frame    list     
## conditions             54     -none-        character
## batches                54     -none-        character
## samplenames            54     -none-        character
## colors                 54     -none-        character
## state                   5     -none-        list     
## libsize                54     -none-        numeric  
## original_expressionset  1     ExpressionSet S4       
## normalized              6     -none-        list     
## best_libsize           54     -none-        numeric  
## norm_result             6     -none-        list
## Starting limma pairwise comparison.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.

## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/17: Creating table: infsbz23_vs_infsbz22.  Adjust = BH
## Limma step 6/6: 2/17: Creating table: infz22_vs_infsbz22.  Adjust = BH
## Limma step 6/6: 3/17: Creating table: infz23_vs_infsbz22.  Adjust = BH
## Limma step 6/6: 4/17: Creating table: uninfnone_vs_infsbz22.  Adjust = BH
## Limma step 6/6: 5/17: Creating table: uninfsbnone_vs_infsbz22.  Adjust = BH
## Limma step 6/6: 6/17: Creating table: infz22_vs_infsbz23.  Adjust = BH
## Limma step 6/6: 7/17: Creating table: infz23_vs_infsbz23.  Adjust = BH
## Limma step 6/6: 8/17: Creating table: uninfnone_vs_infsbz23.  Adjust = BH
## Limma step 6/6: 9/17: Creating table: uninfsbnone_vs_infsbz23.  Adjust = BH
## Limma step 6/6: 10/17: Creating table: infz23_vs_infz22.  Adjust = BH
## Limma step 6/6: 11/17: Creating table: uninfnone_vs_infz22.  Adjust = BH
## Limma step 6/6: 12/17: Creating table: uninfsbnone_vs_infz22.  Adjust = BH
## Limma step 6/6: 13/17: Creating table: uninfnone_vs_infz23.  Adjust = BH
## Limma step 6/6: 14/17: Creating table: uninfsbnone_vs_infz23.  Adjust = BH
## Limma step 6/6: 15/17: Creating table: uninfsbnone_vs_uninfnone.  Adjust = BH
## Limma step 6/6: 16/17: Creating table: z23drugnodrug_vs_z22drugnodrug.  Adjust = BH
## Limma step 6/6: 17/17: Creating table: z23z22drug_vs_z23z22nodrug.  Adjust = BH
## Limma step 6/6: 1/6: Creating table: infsbz22.  Adjust = BH
## Limma step 6/6: 2/6: Creating table: infsbz23.  Adjust = BH
## Limma step 6/6: 3/6: Creating table: infz22.  Adjust = BH
## Limma step 6/6: 4/6: Creating table: infz23.  Adjust = BH
## Limma step 6/6: 5/6: Creating table: uninfnone.  Adjust = BH
## Limma step 6/6: 6/6: Creating table: uninfsbnone.  Adjust = BH

##                        Length Class         Mode     
## title                   1     -none-        character
## notes                   1     -none-        character
## initial_metadata       71     data.frame    list     
## expressionset           1     ExpressionSet S4       
## design                 71     data.frame    list     
## conditions             54     -none-        character
## batches                54     -none-        character
## samplenames            54     -none-        character
## colors                 54     -none-        character
## state                   5     -none-        list     
## libsize                54     -none-        numeric  
## original_expressionset  1     ExpressionSet S4       
## normalized              6     -none-        list     
## best_libsize           54     -none-        numeric  
## norm_result             6     -none-        list

tmp_keepers <- tmrc2_human_keepers[13]

hs_macr_table <- combine_de_tables(
  hs_macr_de,
  keepers = tmrc2_human_keepers,
  excel = glue("analyses/macrophage_de/hs_macr_drug_zymo_table_testing_macr_only-v{ver}.xlsx"))
## Warning in combine_extracted_plots(entry_name, combined, wanted_denominator, : I think this
## is an extra contrast table, the plots may be weird.
## Did not find z22drugnodrug or z23drugnodrug.
## Warning in combine_extracted_plots(entry_name, combined, wanted_denominator, : I think this
## is an extra contrast table, the plots may be weird.
## Did not find z22drugnodrug or z23drugnodrug.
## Warning in combine_extracted_plots(entry_name, combined, wanted_denominator, : I think this
## is an extra contrast table, the plots may be weird.
## Did not find z23z22nodrug or z23z22drug.
## Warning in combine_extracted_plots(entry_name, combined, wanted_denominator, : I think this
## is an extra contrast table, the plots may be weird.
## Did not find z23z22nodrug or z23z22drug.
## Adding venn plots for z23nosb_vs_uninf.
## Adding venn plots for z22nosb_vs_uninf.
## Adding venn plots for z23nosb_vs_z22nosb.
## Adding venn plots for z23sb_vs_z22sb.
## Adding venn plots for z23sb_vs_z23nosb.
## Adding venn plots for z22sb_vs_z22nosb.
## Adding venn plots for z23sb_vs_sb.
## Adding venn plots for z22sb_vs_sb.
## Adding venn plots for z23sb_vs_uninf.
## Adding venn plots for z22sb_vs_uninf.
## Adding venn plots for sb_vs_uninf.
## Adding venn plots for extra_z2322.
## Adding venn plots for extra_drugnodrug.
combined_to_tsv(hs_macr_table, "macrophage")

hs_macr_sig <- extract_significant_genes(
  hs_macr_table,
  excel = glue("analyses/macrophage_de/hs_macr_drug_zymo_sig-v{ver}.xlsx"))
## There is no deseq_logfc column in the table.
## The columns are: ensembltranscriptid, ensemblgeneid, version, transcriptversion, hgncsymbol, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_ihw, edger_adjp_ihw
## There is no deseq_logfc column in the table.
## The columns are: ensembltranscriptid, ensemblgeneid, version, transcriptversion, hgncsymbol, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_ihw, edger_adjp_ihw
## There is no basic_logfc column in the table.
## The columns are: ensembltranscriptid, ensemblgeneid, version, transcriptversion, hgncsymbol, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_ihw, edger_adjp_ihw
## There is no basic_logfc column in the table.
## The columns are: ensembltranscriptid, ensemblgeneid, version, transcriptversion, hgncsymbol, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_ihw, edger_adjp_ihw
hs_macr_highsig <- extract_significant_genes(
  hs_macr_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/hs_macr_drug_zymo_highsig-v{ver}.xlsx"))
## Warning in get_sig_genes(this_table, lfc = lfc, p = p, z = z, n = n, column = this_fc_column,
## : The column deseq_basemean does not appears to be in the table, cannot filter by expression.
## Warning in get_sig_genes(this_table, lfc = lfc, p = p, z = z, n = n, column = this_fc_column,
## : The column deseq_basemean does not appears to be in the table, cannot filter by expression.

## Warning in get_sig_genes(this_table, lfc = lfc, p = p, z = z, n = n, column = this_fc_column,
## : The column deseq_basemean does not appears to be in the table, cannot filter by expression.

## Warning in get_sig_genes(this_table, lfc = lfc, p = p, z = z, n = n, column = this_fc_column,
## : The column deseq_basemean does not appears to be in the table, cannot filter by expression.
## There is no deseq_logfc column in the table.
## The columns are: ensembltranscriptid, ensemblgeneid, version, transcriptversion, hgncsymbol, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_ihw, edger_adjp_ihw
## There is no deseq_logfc column in the table.
## The columns are: ensembltranscriptid, ensemblgeneid, version, transcriptversion, hgncsymbol, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_ihw, edger_adjp_ihw
## There is no basic_logfc column in the table.
## The columns are: ensembltranscriptid, ensemblgeneid, version, transcriptversion, hgncsymbol, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_ihw, edger_adjp_ihw
## There is no basic_logfc column in the table.
## The columns are: ensembltranscriptid, ensemblgeneid, version, transcriptversion, hgncsymbol, description, genebiotype, cdslength, chromosomename, strand, startposition, endposition, transcript, edger_logfc, edger_adjp, limma_logfc, limma_adjp, edger_logcpm, edger_lr, edger_p, limma_ave, limma_t, limma_b, limma_p, limma_adjp_ihw, edger_adjp_ihw

3.2.1 Our main questions in U937

Let us perform the same comparisons in the U937 samples. I will skip the extra contrasts, primarily because I think this smaller dataset is less likely to support them.

u937_de <- all_pairwise(u937_expt, model_batch = "svaseq", filter = TRUE)
## 
##      inf_z22      inf_z23    infsb_z22    infsb_z23   uninf_none uninfsb_none 
##            3            3            3            3            1            1
## Removing 0 low-count genes (10751 remaining).
## Setting 5 low elements to zero.
## transform_counts: Found 5 values equal to 0, adding 1 to the matrix.
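
The condition counts printed above show that the two uninfected U937 groups have a single sample each. A minimal base R sketch of flagging such groups before choosing contrasts follows; the `conditions` vector is a hypothetical stand-in built to mirror the printed counts, not something pulled from `u937_expt`.

```r
# Hypothetical condition labels mirroring the counts printed above.
conditions <- c(rep("inf_z22", 3), rep("inf_z23", 3),
                rep("infsb_z22", 3), rep("infsb_z23", 3),
                "uninf_none", "uninfsb_none")

# Tabulate replicates per condition and flag those with fewer than two;
# a single sample provides no within-group variance estimate of its own.
replicate_counts <- table(conditions)
unreplicated <- names(replicate_counts)[replicate_counts < 2]
unreplicated
```

Contrasts involving the flagged groups probably deserve extra caution when reading the resulting tables.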

u937_table <- combine_de_tables(
  u937_de,
  keepers = u937_keepers,
  excel = glue("analyses/macrophage_de/u937_drug_zymo_table-v{ver}.xlsx"))
## Error in names(x) <- value: 'names' attribute [8] must be the same length as the vector [6]
combined_to_tsv(u937_table, celltype = "u937")
## Error in combined_to_tsv(u937_table, celltype = "u937"): object 'u937_table' not found
u937_sig <- extract_significant_genes(
  u937_table,
  excel = glue("analyses/macrophage_de/u937_drug_zymo_sig-v{ver}.xlsx"))
## Error in extract_significant_genes(u937_table, excel = glue("analyses/macrophage_de/u937_drug_zymo_sig-v{ver}.xlsx")): object 'u937_table' not found
u937_highsig <- extract_significant_genes(
  u937_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/u937_drug_zymo_highsig-v{ver}.xlsx"))
## Error in extract_significant_genes(u937_table, min_mean_exprs = high_expression, : object 'u937_table' not found

3.2.1.1 Compare (no)Sb z2.3/z2.2 treatments among macrophages

upset_plots_hs_macr <- upsetr_sig(
  hs_macr_sig, both = TRUE,
  contrasts = c("z23sb_vs_z22sb", "z23nosb_vs_z22nosb"))
upset_plots_hs_macr[["both"]]

groups <- upset_plots_hs_macr[["both_groups"]]
shared_genes <- attr(groups, "elements")[groups[[2]]] %>%
  gsub(pattern = "^gene:", replacement = "")
length(shared_genes)
## [1] 387
shared_gp <- simple_gprofiler(shared_genes)
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
shared_gp[["pvalue_plots"]][["MF"]]

shared_gp[["pvalue_plots"]][["BP"]]

shared_gp[["pvalue_plots"]][["REAC"]]

drug_genes <- attr(groups, "elements")[groups[["z23sb_vs_z22sb"]]] %>%
  gsub(pattern = "^gene:", replacement = "")
drugonly_gp <- simple_gprofiler(drug_genes)
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
drugonly_gp[["pvalue_plots"]][["BP"]]

I want to try something: directly including the U937 data in this…

both_sig <- hs_macr_sig
names(both_sig[["deseq"]][["ups"]]) <- paste0("macr_", names(both_sig[["deseq"]][["ups"]]))
names(both_sig[["deseq"]][["downs"]]) <- paste0("macr_", names(both_sig[["deseq"]][["downs"]]))
u937_deseq <- u937_sig[["deseq"]]
## Error in eval(expr, envir, enclos): object 'u937_sig' not found
names(u937_deseq[["ups"]]) <- paste0("u937_", names(u937_deseq[["ups"]]))
## Error in paste0("u937_", names(u937_deseq[["ups"]])): object 'u937_deseq' not found
names(u937_deseq[["downs"]]) <- paste0("u937_", names(u937_deseq[["downs"]]))
## Error in paste0("u937_", names(u937_deseq[["downs"]])): object 'u937_deseq' not found
both_sig[["deseq"]][["ups"]] <- c(both_sig[["deseq"]][["ups"]], u937_deseq[["ups"]])
## Error in eval(expr, envir, enclos): object 'u937_deseq' not found
both_sig[["deseq"]][["downs"]] <- c(both_sig[["deseq"]][["downs"]], u937_deseq[["downs"]])
## Error in eval(expr, envir, enclos): object 'u937_deseq' not found
summary(both_sig[["deseq"]][["ups"]])
##                         Length Class      Mode
## macr_z23nosb_vs_uninf   49     data.frame list
## macr_z22nosb_vs_uninf   49     data.frame list
## macr_z23nosb_vs_z22nosb 49     data.frame list
## macr_z23sb_vs_z22sb     49     data.frame list
## macr_z23sb_vs_z23nosb   49     data.frame list
## macr_z22sb_vs_z22nosb   49     data.frame list
## macr_z23sb_vs_sb        49     data.frame list
## macr_z22sb_vs_sb        49     data.frame list
## macr_z23sb_vs_uninf     49     data.frame list
## macr_z22sb_vs_uninf     49     data.frame list
## macr_sb_vs_uninf        49     data.frame list
## macr_extra_z2322         0     data.frame list
## macr_extra_drugnodrug    0     data.frame list
upset_plots_both <- upsetr_sig(
  both_sig, both = TRUE,
  contrasts = c("macr_z23sb_vs_z22sb", "macr_z23nosb_vs_z22nosb",
                "u937_z23sb_vs_z22sb", "u937_z23nosb_vs_z22nosb"))
upset_plots_both$both
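
The renaming steps above repeat the same pattern for the up and down lists. One possible way to reduce the chance of mixing them up is a small helper; `prefix_names()` and the toy tables below are hypothetical and are not part of hpgltools.

```r
# Hypothetical helper: prepend a cell-type prefix to every element name.
prefix_names <- function(tables, prefix) {
  names(tables) <- paste0(prefix, names(tables))
  tables
}

# Toy stand-ins for the per-contrast lists of significant genes.
macr_ups <- list(z23sb_vs_z22sb = data.frame(lfc = 1.2),
                 sb_vs_uninf = data.frame(lfc = 2.0))
u937_ups <- list(z23sb_vs_z22sb = data.frame(lfc = 0.7))

# Prefix each list, then concatenate so contrast names stay unique.
both_ups <- c(prefix_names(macr_ups, "macr_"),
              prefix_names(u937_ups, "u937_"))
names(both_ups)
```

The same call would be repeated for the down lists, so each direction is only ever combined with its own counterpart.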

3.2.1.2 Compare DE results from macrophages and U937 samples

Looking more closely at these, I think the U937 data are too sparse to compare effectively.

macr_u937_comparison <- compare_de_results(hs_macr_table, u937_table)
## Error: object 'u937_table' not found
macr_u937_comparison$lfc_heat
## Error in eval(expr, envir, enclos): object 'macr_u937_comparison' not found
macr_u937_venns <- compare_significant_contrasts(hs_macr_sig, second_sig_tables = u937_sig,
                                                 contrasts = "z23sb_vs_z23nosb")
## Error: object 'u937_sig' not found
macr_u937_venns$up_plot
## Error in eval(expr, envir, enclos): object 'macr_u937_venns' not found
macr_u937_venns$down_plot
## Error in eval(expr, envir, enclos): object 'macr_u937_venns' not found
macr_u937_venns_v2 <- compare_significant_contrasts(hs_macr_sig, second_sig_tables = u937_sig,
                                                    contrasts = "z22sb_vs_z22nosb")
## Error in compare_significant_contrasts(hs_macr_sig, second_sig_tables = u937_sig, : object 'u937_sig' not found
macr_u937_venns_v2$up_plot
## Error in eval(expr, envir, enclos): object 'macr_u937_venns_v2' not found
macr_u937_venns_v2$down_plot
## Error in eval(expr, envir, enclos): object 'macr_u937_venns_v2' not found
macr_u937_venns_v3 <- compare_significant_contrasts(hs_macr_sig, second_sig_tables = u937_sig,
                                                    contrasts = "sb_vs_uninf")
## Error in compare_significant_contrasts(hs_macr_sig, second_sig_tables = u937_sig, : object 'u937_sig' not found
macr_u937_venns_v3$up_plot
## Error in eval(expr, envir, enclos): object 'macr_u937_venns_v3' not found
macr_u937_venns_v3$down_plot
## Error in eval(expr, envir, enclos): object 'macr_u937_venns_v3' not found

3.2.2 Compare macrophage/u937 with respect to z2.3/z2.2

comparison_df <- merge(hs_macr_table[["data"]][["z23sb_vs_z22sb"]],
                       u937_table[["data"]][["z23sb_vs_z22sb"]],
                       by = "row.names")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'y' in selecting a method for function 'merge': object 'u937_table' not found
macru937_z23z22_plot <- plot_linear_scatter(comparison_df[, c("deseq_logfc.x", "deseq_logfc.y")])
## Error in is.data.frame(x): object 'comparison_df' not found
macru937_z23z22_plot$scatter
## Error in eval(expr, envir, enclos): object 'macru937_z23z22_plot' not found
comparison_df <- merge(hs_macr_table[["data"]][["z23nosb_vs_z22nosb"]],
                       u937_table[["data"]][["z23nosb_vs_z22nosb"]],
                       by = "row.names")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'y' in selecting a method for function 'merge': object 'u937_table' not found
macru937_z23z22_plot <- plot_linear_scatter(comparison_df[, c("deseq_logfc.x", "deseq_logfc.y")])
## Error in is.data.frame(x): object 'comparison_df' not found
macru937_z23z22_plot$scatter
## Error in eval(expr, envir, enclos): object 'macru937_z23z22_plot' not found
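
For reference, the merge-then-scatter comparison attempted above can be sketched with toy data in base R. The two data frames are hypothetical stand-ins for the per-contrast tables; only the `deseq_logfc` column name and the `by = "row.names"` merge are taken from the code above.

```r
# Hypothetical per-contrast tables with partially overlapping genes.
first <- data.frame(deseq_logfc = c(2.0, -1.5, 0.3, 1.1, -0.2),
                    row.names = paste0("gene", 1:5))
second <- data.frame(deseq_logfc = c(-1.0, 0.1, 0.9, -0.4, 1.6),
                     row.names = paste0("gene", 2:6))

# Merging on row names keeps only the shared genes; the duplicated
# column receives the suffixes .x (first) and .y (second).
merged <- merge(first, second, by = "row.names")

# Pearson correlation of the two sets of log fold changes.
lfc_cor <- cor(merged[["deseq_logfc.x"]], merged[["deseq_logfc.y"]])
merged
lfc_cor
```

Because the merge silently drops genes missing from either table, it may be worth checking `nrow(merged)` against the inputs before interpreting the correlation.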

3.2.2.1 Add donor to the contrasts, no sva

no_power_fact <- paste0(pData(hs_macr)[["donor"]], "_",
                        pData(hs_macr)[["condition"]])
table(pData(hs_macr)[["donor"]])
## 
## d01 d02 d09 d81 
##  13  14  13  14
table(no_power_fact)
## no_power_fact
##      d01_inf_z22      d01_inf_z23    d01_infsb_z22    d01_infsb_z23   d01_uninf_none 
##                2                3                3                3                1 
## d01_uninfsb_none      d02_inf_z22      d02_inf_z23    d02_infsb_z22    d02_infsb_z23 
##                1                3                3                3                3 
##   d02_uninf_none d02_uninfsb_none      d09_inf_z22      d09_inf_z23    d09_infsb_z22 
##                1                1                3                3                3 
##    d09_infsb_z23   d09_uninf_none d09_uninfsb_none      d81_inf_z22      d81_inf_z23 
##                2                1                1                3                3 
##    d81_infsb_z22    d81_infsb_z23   d81_uninf_none d81_uninfsb_none 
##                3                3                1                1
hs_nopower <- set_expt_conditions(hs_macr, fact = no_power_fact)
## 
##      d01_inf_z22      d01_inf_z23    d01_infsb_z22    d01_infsb_z23   d01_uninf_none 
##                2                3                3                3                1 
## d01_uninfsb_none      d02_inf_z22      d02_inf_z23    d02_infsb_z22    d02_infsb_z23 
##                1                3                3                3                3 
##   d02_uninf_none d02_uninfsb_none      d09_inf_z22      d09_inf_z23    d09_infsb_z22 
##                1                1                3                3                3 
##    d09_infsb_z23   d09_uninf_none d09_uninfsb_none      d81_inf_z22      d81_inf_z23 
##                2                1                1                3                3 
##    d81_infsb_z22    d81_infsb_z23   d81_uninf_none d81_uninfsb_none 
##                3                3                1                1
hs_nopower <- subset_expt(hs_nopower, subset="macrophagezymodeme!='none'")
## subset_expt(): There were 54, now there are 46 samples.
hs_nopower_nosva_de <- all_pairwise(hs_nopower, model_batch = FALSE, filter = TRUE)
## 
##   d01_inf_z22   d01_inf_z23 d01_infsb_z22 d01_infsb_z23   d02_inf_z22   d02_inf_z23 
##             2             3             3             3             3             3 
## d02_infsb_z22 d02_infsb_z23   d09_inf_z22   d09_inf_z23 d09_infsb_z22 d09_infsb_z23 
##             3             3             3             3             3             2 
##   d81_inf_z22   d81_inf_z23 d81_infsb_z22 d81_infsb_z23 
##             3             3             3             3

nopower_keepers <- list(
  "d01_zymo" = c("d01infz23", "d01infz22"),
  "d01_sbzymo" = c("d01infsbz23", "d01infsbz22"),
  "d02_zymo" = c("d02infz23", "d02infz22"),
  "d02_sbzymo" = c("d02infsbz23", "d02infsbz22"),
  "d09_zymo" = c("d09infz23", "d09infz22"),
  "d09_sbzymo" = c("d09infsbz23", "d09infsbz22"),
  "d81_zymo" = c("d81infz23", "d81infz22"),
  "d81_sbzymo" = c("d81infsbz23", "d81infsbz22"))
hs_nopower_nosva_table <- combine_de_tables(
  hs_nopower_nosva_de, keepers = nopower_keepers,
  excel = glue("analyses/macrophage_de/hs_nopower_nosva_table-v{ver}.xlsx"))
## Error in names(x) <- value: 'names' attribute [8] must be the same length as the vector [6]
##                                  extra_contrasts = extra)
hs_nopower_nosva_sig <- extract_significant_genes(
  hs_nopower_nosva_table,
  excel = glue("analyses/macrophage_de/hs_nopower_nosva_sig-v{ver}.xlsx"))
## Error in extract_significant_genes(hs_nopower_nosva_table, excel = glue("analyses/macrophage_de/hs_nopower_nosva_sig-v{ver}.xlsx")): object 'hs_nopower_nosva_table' not found
d01d02_zymo_nosva_comp <- merge(hs_nopower_nosva_table[["data"]][["d01_zymo"]],
                                hs_nopower_nosva_table[["data"]][["d02_zymo"]],
                                by="row.names")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'merge': object 'hs_nopower_nosva_table' not found
d0102_zymo_nosva_plot <- plot_linear_scatter(d01d02_zymo_nosva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
## Error in is.data.frame(x): object 'd01d02_zymo_nosva_comp' not found
d0102_zymo_nosva_plot$scatter
## Error in eval(expr, envir, enclos): object 'd0102_zymo_nosva_plot' not found
d0102_zymo_nosva_plot$correlation
## Error in eval(expr, envir, enclos): object 'd0102_zymo_nosva_plot' not found
d0102_zymo_nosva_plot$lm_rsq
## Error in eval(expr, envir, enclos): object 'd0102_zymo_nosva_plot' not found
d09d81_zymo_nosva_comp <- merge(hs_nopower_nosva_table[["data"]][["d09_zymo"]],
                                hs_nopower_nosva_table[["data"]][["d81_zymo"]],
                                by="row.names")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'merge': object 'hs_nopower_nosva_table' not found
d0981_zymo_nosva_plot <- plot_linear_scatter(d09d81_zymo_nosva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
## Error in is.data.frame(x): object 'd09d81_zymo_nosva_comp' not found
d0981_zymo_nosva_plot$scatter
## Error in eval(expr, envir, enclos): object 'd0981_zymo_nosva_plot' not found
d0981_zymo_nosva_plot$correlation
## Error in eval(expr, envir, enclos): object 'd0981_zymo_nosva_plot' not found
d0981_zymo_nosva_plot$lm_rsq
## Error in eval(expr, envir, enclos): object 'd0981_zymo_nosva_plot' not found
d01d81_zymo_nosva_comp <- merge(hs_nopower_nosva_table[["data"]][["d01_zymo"]],
                                hs_nopower_nosva_table[["data"]][["d81_zymo"]],
                                by="row.names")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'merge': object 'hs_nopower_nosva_table' not found
d0181_zymo_nosva_plot <- plot_linear_scatter(d01d81_zymo_nosva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
## Error in is.data.frame(x): object 'd01d81_zymo_nosva_comp' not found
d0181_zymo_nosva_plot$scatter
## Error in eval(expr, envir, enclos): object 'd0181_zymo_nosva_plot' not found
d0181_zymo_nosva_plot$correlation
## Error in eval(expr, envir, enclos): object 'd0181_zymo_nosva_plot' not found
d0181_zymo_nosva_plot$lm_rsq
## Error in eval(expr, envir, enclos): object 'd0181_zymo_nosva_plot' not found
upset_plots_nosva <- upsetr_sig(hs_nopower_nosva_sig, both=TRUE,
                                contrasts=c("d01_zymo", "d02_zymo", "d09_zymo", "d81_zymo"))
## Error in upsetr_sig(hs_nopower_nosva_sig, both = TRUE, contrasts = c("d01_zymo", : object 'hs_nopower_nosva_sig' not found
upset_plots_nosva$up
## Error in eval(expr, envir, enclos): object 'upset_plots_nosva' not found
upset_plots_nosva$down
## Error in eval(expr, envir, enclos): object 'upset_plots_nosva' not found
upset_plots_nosva$both
## Error in eval(expr, envir, enclos): object 'upset_plots_nosva' not found
## The 7th element in the both groups list is the set shared among all donors.
## I don't feel like writing out x:y:z:a
groups <- upset_plots_nosva[["both_groups"]]
## Error in eval(expr, envir, enclos): object 'upset_plots_nosva' not found
shared_genes <- attr(groups, "elements")[groups[[7]]] %>%
  gsub(pattern = "^gene:", replacement = "")
## Error in (function (cond) : error in evaluating the argument 'x' in selecting a method for function 'gsub': subscript out of bounds
shared_gp <- simple_gprofiler(shared_genes)
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
shared_gp$pvalue_plots$MF

shared_gp$pvalue_plots$BP

shared_gp$pvalue_plots$REAC

shared_gp$pvalue_plots$WP
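
The code above picks the all-donor set by its position (the 7th group). A position-independent alternative, sketched here with hypothetical gene lists, is to intersect the per-donor sets directly; this is an illustration in base R and not how `upsetr_sig()` computes its groups.

```r
# Hypothetical significant-gene IDs per donor contrast.
sig_by_donor <- list(
  d01_zymo = c("gene:A", "gene:B", "gene:C"),
  d02_zymo = c("gene:B", "gene:C", "gene:D"),
  d09_zymo = c("gene:C", "gene:B"),
  d81_zymo = c("gene:B", "gene:E", "gene:C"))

# Reduce() applies intersect() pairwise across the list, leaving only
# the genes present in every donor.
shared_all <- Reduce(intersect, sig_by_donor)

# Strip the "gene:" prefix as done above before passing IDs to gProfiler.
shared_all <- gsub(pattern = "^gene:", replacement = "", x = shared_all)
shared_all
```

This avoids depending on the order in which the groups are returned, at the cost of recomputing the intersection outside the plotting function.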

3.2.2.2 Add donor to the contrasts, sva

hs_nopower_sva_de <- all_pairwise(hs_nopower, model_batch = "svaseq", filter = TRUE)
## 
##   d01_inf_z22   d01_inf_z23 d01_infsb_z22 d01_infsb_z23   d02_inf_z22   d02_inf_z23 
##             2             3             3             3             3             3 
## d02_infsb_z22 d02_infsb_z23   d09_inf_z22   d09_inf_z23 d09_infsb_z22 d09_infsb_z23 
##             3             3             3             3             3             2 
##   d81_inf_z22   d81_inf_z23 d81_infsb_z22 d81_infsb_z23 
##             3             3             3             3
## Removing 0 low-count genes (11720 remaining).
## Setting 2174 low elements to zero.
## transform_counts: Found 2174 values equal to 0, adding 1 to the matrix.

nopower_keepers <- list(
  "d01_zymo" = c("d01infz23", "d01infz22"),
  "d01_sbzymo" = c("d01infsbz23", "d01infsbz22"),
  "d02_zymo" = c("d02infz23", "d02infz22"),
  "d02_sbzymo" = c("d02infsbz23", "d02infsbz22"),
  "d09_zymo" = c("d09infz23", "d09infz22"),
  "d09_sbzymo" = c("d09infsbz23", "d09infsbz22"),
  "d81_zymo" = c("d81infz23", "d81infz22"),
  "d81_sbzymo" = c("d81infsbz23", "d81infsbz22"))
hs_nopower_sva_table <- combine_de_tables(
  hs_nopower_sva_de, keepers = nopower_keepers,
  excel = glue("analyses/macrophage_de/hs_nopower_sva_table-v{ver}.xlsx"))
## Error in names(x) <- value: 'names' attribute [8] must be the same length as the vector [6]
##                                  extra_contrasts = extra)
hs_nopower_sva_sig <- extract_significant_genes(
  hs_nopower_sva_table,
  excel = glue("analyses/macrophage_de/hs_nopower_sva_sig-v{ver}.xlsx"))
## Error in extract_significant_genes(hs_nopower_sva_table, excel = glue("analyses/macrophage_de/hs_nopower_sva_sig-v{ver}.xlsx")): object 'hs_nopower_sva_table' not found
d01d02_zymo_sva_comp <- merge(hs_nopower_sva_table[["data"]][["d01_zymo"]],
                              hs_nopower_sva_table[["data"]][["d02_zymo"]],
                              by="row.names")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'merge': object 'hs_nopower_sva_table' not found
d0102_zymo_sva_plot <- plot_linear_scatter(d01d02_zymo_sva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
## Error in is.data.frame(x): object 'd01d02_zymo_sva_comp' not found
d0102_zymo_sva_plot$scatter
## Error in eval(expr, envir, enclos): object 'd0102_zymo_sva_plot' not found
d0102_zymo_sva_plot$correlation
## Error in eval(expr, envir, enclos): object 'd0102_zymo_sva_plot' not found
d0102_zymo_sva_plot$lm_rsq
## Error in eval(expr, envir, enclos): object 'd0102_zymo_sva_plot' not found
d09d81_zymo_sva_comp <- merge(hs_nopower_sva_table[["data"]][["d09_zymo"]],
                              hs_nopower_sva_table[["data"]][["d81_zymo"]],
                              by="row.names")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'merge': object 'hs_nopower_sva_table' not found
d0981_zymo_sva_plot <- plot_linear_scatter(d09d81_zymo_sva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
## Error in is.data.frame(x): object 'd09d81_zymo_sva_comp' not found
d0981_zymo_sva_plot$scatter
## Error in eval(expr, envir, enclos): object 'd0981_zymo_sva_plot' not found
d0981_zymo_sva_plot$correlation
## Error in eval(expr, envir, enclos): object 'd0981_zymo_sva_plot' not found
d0981_zymo_sva_plot$lm_rsq
## Error in eval(expr, envir, enclos): object 'd0981_zymo_sva_plot' not found
d01d81_zymo_sva_comp <- merge(hs_nopower_sva_table[["data"]][["d01_zymo"]],
                              hs_nopower_sva_table[["data"]][["d81_zymo"]],
                              by="row.names")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'merge': object 'hs_nopower_sva_table' not found
d0181_zymo_sva_plot <- plot_linear_scatter(d01d81_zymo_sva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
## Error in is.data.frame(x): object 'd01d81_zymo_sva_comp' not found
d0181_zymo_sva_plot$scatter
## Error in eval(expr, envir, enclos): object 'd0181_zymo_sva_plot' not found
d0181_zymo_sva_plot$correlation
## Error in eval(expr, envir, enclos): object 'd0181_zymo_sva_plot' not found
d0181_zymo_sva_plot$lm_rsq
## Error in eval(expr, envir, enclos): object 'd0181_zymo_sva_plot' not found
upset_plots_sva <- upsetr_sig(hs_nopower_sva_sig, both=TRUE,
                              contrasts=c("d01_zymo", "d02_zymo", "d09_zymo", "d81_zymo"))
## Error in upsetr_sig(hs_nopower_sva_sig, both = TRUE, contrasts = c("d01_zymo", : object 'hs_nopower_sva_sig' not found
upset_plots_sva$up
## Error in eval(expr, envir, enclos): object 'upset_plots_sva' not found
upset_plots_sva$down
## Error in eval(expr, envir, enclos): object 'upset_plots_sva' not found
upset_plots_sva$both
## Error in eval(expr, envir, enclos): object 'upset_plots_sva' not found
## The 7th element in the both groups list is the set shared among all donors.
## I don't feel like writing out x:y:z:a
groups <- upset_plots_sva[["both_groups"]]
## Error in eval(expr, envir, enclos): object 'upset_plots_sva' not found
shared_genes <- attr(groups, "elements")[groups[[7]]] %>%
  gsub(pattern = "^gene:", replacement = "")
## Error in (function (cond) : error in evaluating the argument 'x' in selecting a method for function 'gsub': subscript out of bounds
shared_gp <- simple_gprofiler(shared_genes)
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
## No results to show
## Please make sure that the organism is correct or set significant = FALSE
shared_gp$pvalue_plots$MF

shared_gp$pvalue_plots$BP

shared_gp$pvalue_plots$REAC

shared_gp$pvalue_plots$WP

3.2.3 Donor comparison

hs_donors <- set_expt_conditions(hs_macr, fact = "donor")
## 
## d01 d02 d09 d81 
##  13  14  13  14
donor_de <- all_pairwise(hs_donors, model_batch="svaseq", filter=TRUE)
## 
## d01 d02 d09 d81 
##  13  14  13  14
## Removing 0 low-count genes (11756 remaining).
## Setting 1225 low elements to zero.
## transform_counts: Found 1225 values equal to 0, adding 1 to the matrix.

donor_table <- combine_de_tables(
  donor_de,
  excel=glue("analyses/macrophage_de/donor_tables-v{ver}.xlsx"))
## Error in names(x) <- value: 'names' attribute [8] must be the same length as the vector [6]
donor_sig <- extract_significant_genes(
  donor_table,
  excel = glue("analyses/macrophage_de/donor_sig-v{ver}.xlsx"))
## Error in extract_significant_genes(donor_table, excel = glue("analyses/macrophage_de/donor_sig-v{ver}.xlsx")): object 'donor_table' not found

3.2.3.1 Primary query contrasts

The final contrast in this list is interesting because it depends on the extra contrasts applied to the all_pairwise() above. In my way of thinking, the primary comparisons to consider are either cross-drug or cross-strain, but not both. However, in at least a few instances I think Olga is interested in strain+drug / uninfected+nodrug.

3.2.3.2 Write contrast results

Now let us write out the xlsx file containing the above contrasts. The file with the suffix _table-version will therefore contain all genes and the file with the suffix _sig-version will contain only those deemed significant via our default criteria of DESeq2 |logFC| >= 1.0 and adjusted p-value <= 0.05.
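
As a rough illustration of those default criteria, here is a base R sketch which applies them to a toy table. The column names deseq_logfc and deseq_adjp are the ones used by the plotting calls in this document; the toy values and the helper filter_significant() are invented for this example and are not how extract_significant_genes() is implemented.

```r
## Toy table standing in for one sheet of the combined DE results (values invented).
toy_de <- data.frame(
  deseq_logfc = c(2.5, -1.2, 0.4, -3.0, 1.0),
  deseq_adjp = c(0.001, 0.04, 0.0001, 0.20, 0.05),
  row.names = c("geneA", "geneB", "geneC", "geneD", "geneE"))

## Hypothetical helper: keep rows passing |logFC| >= lfc and adjusted p <= adjp,
## then split the survivors into increased and decreased sets.
filter_significant <- function(de, lfc = 1.0, adjp = 0.05,
                               fc_col = "deseq_logfc", p_col = "deseq_adjp") {
  keep <- !is.na(de[[fc_col]]) & !is.na(de[[p_col]]) &
    abs(de[[fc_col]]) >= lfc & de[[p_col]] <= adjp
  sig <- de[keep, , drop = FALSE]
  list(
    ups = sig[sig[[fc_col]] > 0, , drop = FALSE],
    downs = sig[sig[[fc_col]] < 0, , drop = FALSE])
}

toy_sig <- filter_significant(toy_de)
rownames(toy_sig[["ups"]])
rownames(toy_sig[["downs"]])
```

Both thresholds are inclusive, so a gene sitting exactly at logFC 1.0 with an adjusted p-value of 0.05 is kept.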

4 Over representation searches

I decided to make one change to the organization of this document which starts small but will, I think, quickly become large: I am moving the GSEA searches up to immediately after the DE. I will then move the plots of the gprofiler results to immediately after the various volcano plots so that it is easier to interpret them.

all_gp <- all_gprofiler(hs_macr_sig)
for (g in seq_along(all_gp)) {
  name <- names(all_gp)[g]
  datum <- all_gp[[name]]
  filename <- glue("analyses/macrophage_de/gprofiler/{name}_gprofiler-v{ver}.xlsx")
  written <- sm(write_gprofiler_data(datum, excel = filename))
}
## Error in wb$check_overwrite_tables(sheet = sheet, new_rows = c(startRow,  : 
##   Cannot overwrite existing table with another table.

5 Plot contrasts of interest

One suggestion I received recently was to set the axes for these volcano plots to be static rather than let ggplot choose its own. I am assuming this is only relevant for pairs of contrasts, but that might not be true.

5.1 Individual zymodemes vs. uninfected

5.1.1 Infected with z2.3 no Antimonial vs. Uninfected

plot_colors <- get_expt_colors(hs_macr_table[["input"]][["input"]])
x_limits <- c(-20, 10)

## The original plot from my xlsx file
hs_macr_table$plots$z23nosb_vs_uninf$deseq_vol_plots
## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider increasing
## max.overlaps

z23nosb_vs_uninf_volcano <- plot_volcano_condition_de(
  input = hs_macr_table[["data"]][["z23nosb_vs_uninf"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["uninfnone"]], color_low = plot_colors[["infz23"]])
z23nosb_vs_uninf_volcano$plot +
  scale_x_continuous(limits = x_limits)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing missing values (`geom_text_repel()`).
## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider increasing
## max.overlaps

plotly::ggplotly(z23nosb_vs_uninf_volcano$plot)
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
z23nosb_vs_uninf_volcano_nol <- plot_volcano_condition_de(
  input = hs_macr_table[["data"]][["z23nosb_vs_uninf"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = NULL, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["uninfnone"]], color_low = plot_colors[["infz23"]])
z23nosb_vs_uninf_volcano_nol$plot +
  scale_x_continuous(limits = x_limits)
## Warning: Removed 1 rows containing missing values (`geom_point()`).

all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["REAC"]]

## Reactome, zymodeme2.3 without drug vs. uninfected without drug, up.
all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["KEGG"]]

## KEGG, zymodeme2.3 without drug vs. uninfected without drug, up.
all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["MF"]]

## MF, zymodeme2.3 without drug vs. uninfected without drug, up.
all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["TF"]]

## TF, zymodeme2.3 without drug vs. uninfected without drug, up.
all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["WP"]]

## WikiPathways, zymodeme2.3 without drug vs. uninfected without drug, up.
all_gp[["z23nosb_vs_uninf_up"]][["interactive_plots"]][["WP"]]
all_gp[["z23nosb_vs_uninf_down"]][["pvalue_plots"]][["REAC"]]
## NULL
## Reactome, zymodeme2.3 without drug vs. uninfected without drug, down.
all_gp[["z23nosb_vs_uninf_down"]][["pvalue_plots"]][["MF"]]

## MF, zymodeme2.3 without drug vs. uninfected without drug, down.
all_gp[["z23nosb_vs_uninf_down"]][["pvalue_plots"]][["TF"]]

## TF, zymodeme2.3 without drug vs. uninfected without drug, down.

5.1.2 Infected with z2.2 no Antimonial vs. Uninfected

## The original plot
hs_macr_table$plots$z22nosb_vs_uninf$deseq_vol_plots
## Warning: ggrepel: 9 unlabeled data points (too many overlaps). Consider increasing
## max.overlaps

z22nosb_vs_uninf_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z22nosb_vs_uninf"]], "z22nosb_vs_uninf",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["uninfnone"]], color_low = plot_colors[["infz22"]])
z22nosb_vs_uninf_volcano$plot +
  scale_x_continuous(limits = x_limits)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing missing values (`geom_text_repel()`).
## Warning: ggrepel: 10 unlabeled data points (too many overlaps). Consider increasing
## max.overlaps

plotly::ggplotly(z22nosb_vs_uninf_volcano$plot)
## Warning in geom2trace.default(dots[[1L]][[3L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
## Warning in geom2trace.default(dots[[1L]][[3L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues

## Warning in geom2trace.default(dots[[1L]][[3L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
z22nosb_vs_uninf_volcano_nol <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z22nosb_vs_uninf"]], "z22nosb_vs_uninf",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = NULL, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["uninfnone"]], color_low = plot_colors[["infz22"]])
z22nosb_vs_uninf_volcano_nol$plot +
  scale_x_continuous(limits = x_limits)
## Warning: Removed 1 rows containing missing values (`geom_point()`).

all_gp[["z22nosb_vs_uninf_up"]][["pvalue_plots"]][["REAC"]]

## Reactome, zymodeme2.2 without drug vs. uninfected without drug, up.
all_gp[["z22nosb_vs_uninf_up"]][["pvalue_plots"]][["MF"]]

## MF, zymodeme2.2 without drug vs. uninfected without drug, up.
all_gp[["z22nosb_vs_uninf_up"]][["pvalue_plots"]][["TF"]]

## TF, zymodeme2.2 without drug vs. uninfected without drug, up.
all_gp[["z22nosb_vs_uninf_up"]][["pvalue_plots"]][["WP"]]

## WikiPathways, zymodeme2.2 without drug vs. uninfected without drug, up.

all_gp[["z22nosb_vs_uninf_down"]][["pvalue_plots"]][["REAC"]]
## NULL
## Reactome, zymodeme2.2 without drug vs. uninfected without drug, down.
all_gp[["z22nosb_vs_uninf_down"]][["pvalue_plots"]][["MF"]]
## NULL
## MF, zymodeme2.2 without drug vs. uninfected without drug, down.
all_gp[["z22nosb_vs_uninf_down"]][["pvalue_plots"]][["TF"]]
## NULL
## TF, zymodeme2.2 without drug vs. uninfected without drug, down.

5.1.3 Infected with z2.3 treated vs. Uninfected treated

## The original plot
hs_macr_table$plots$z23sb_vs_sb$deseq_vol_plots
## Warning: ggrepel: 4 unlabeled data points (too many overlaps). Consider increasing
## max.overlaps

z23sb_vs_uninfsb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23sb_vs_sb"]], "z23sb_vs_sb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["infsbz23"]], color_low = plot_colors[["uninfsbnone"]])
z23sb_vs_uninfsb_volcano$plot +
  scale_x_continuous(limits = x_limits)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing missing values (`geom_text_repel()`).
## Warning: ggrepel: 5 unlabeled data points (too many overlaps). Consider increasing
## max.overlaps

plotly::ggplotly(z23sb_vs_uninfsb_volcano$plot)
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
z23sb_vs_uninfsb_volcano_nol <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23sb_vs_sb"]], "z23sb_vs_sb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = NULL, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["infsbz23"]], color_low = plot_colors[["uninfsbnone"]])
z23sb_vs_uninfsb_volcano_nol$plot +
  scale_x_continuous(limits = x_limits)
## Warning: Removed 1 rows containing missing values (`geom_point()`).

5.1.4 Infected with z2.3 untreated vs. z2.2 untreated

## The original plot
hs_macr_table$plots$z23nosb_vs_z22nosb$deseq_vol_plots

z23nosb_vs_z22nosb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23nosb_vs_z22nosb"]], "z23nosb_vs_z22nosb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["infz23"]], color_low = plot_colors[["infz22"]])
z23nosb_vs_z22nosb_volcano$plot +
  scale_x_continuous(limits = x_limits)

5.1.5 Infected with z2.3 treated vs. z2.2 treated

## The original plot
hs_macr_table$plots$z23sb_vs_z22sb$deseq_vol_plots
## Warning: ggrepel: 3 unlabeled data points (too many overlaps). Consider increasing
## max.overlaps

z23sb_vs_z22sb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23sb_vs_z22sb"]], "z23sb_vs_z22sb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = FALSE,
  color_high = plot_colors[["infsbz23"]], color_low = plot_colors[["infsbz22"]])
z23sb_vs_z22sb_volcano$plot +
  scale_x_continuous(limits = x_limits)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing missing values (`geom_text_repel()`).
## Warning: ggrepel: 3 unlabeled data points (too many overlaps). Consider increasing
## max.overlaps

5.1.6 Infected with z2.3 SB treated vs. z2.3 untreated

## The original plot
hs_macr_table$plots$z23sb_vs_z23nosb$deseq_vol_plots

z23sb_vs_z23nosb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23sb_vs_z23nosb"]], "z23sb_vs_z23nosb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_low = plot_colors[["infsbz23"]], color_high = plot_colors[["infz23"]])
z23sb_vs_z23nosb_volcano$plot +
  scale_x_continuous(limits = x_limits)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing missing values (`geom_text_repel()`).

5.1.7 Infected with z2.2 SB treated vs. z2.2 untreated

## The original plot
hs_macr_table$plots$z22sb_vs_z22nosb$deseq_vol_plots

z22sb_vs_z22nosb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z22sb_vs_z22nosb"]], "z22sb_vs_z22nosb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_low = plot_colors[["infsbz22"]], color_high = plot_colors[["infz22"]])
z22sb_vs_z22nosb_volcano$plot +
  scale_x_continuous(limits = x_limits)

5.1.8 Infected with z2.3 SB treated vs. uninfected treated

## The original plot
hs_macr_table$plots$z23sb_vs_sb$deseq_vol_plots
## Warning: ggrepel: 4 unlabeled data points (too many overlaps). Consider increasing
## max.overlaps

z23sb_vs_sb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23sb_vs_sb"]], "z23sb_vs_sb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_low = plot_colors[["infsbz23"]], color_high = plot_colors[["uninfsbnone"]])
z23sb_vs_sb_volcano$plot +
  scale_x_continuous(limits = x_limits)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing missing values (`geom_text_repel()`).
## Warning: ggrepel: 5 unlabeled data points (too many overlaps). Consider increasing
## max.overlaps

5.1.9 Infected with z2.2 SB treated vs. uninfected treated

## The original plot
hs_macr_table$plots$z22sb_vs_sb$deseq_vol_plots
## Warning: ggrepel: 11 unlabeled data points (too many overlaps). Consider increasing
## max.overlaps

z22sb_vs_sb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z22sb_vs_sb"]], "z22sb_vs_sb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_low = plot_colors[["infsbz22"]], color_high = plot_colors[["uninfsbnone"]])
z22sb_vs_sb_volcano$plot +
  scale_x_continuous(limits = x_limits)
## Warning: ggrepel: 15 unlabeled data points (too many overlaps). Consider increasing
## max.overlaps

Check that my perception of the number of significant up/down genes matches what the table/venn says.

shared <- Vennerable::Venn(list("drug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23sb_vs_uninf"]]),
                                "nodrug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23nosb_vs_uninf"]])))
pp(file="images/z23_vs_uninf_venn_up.png")
Vennerable::plot(shared)
dev.off()
## png 
##   2
Vennerable::plot(shared)

## I see 910 z23sb/uninf and 670 z23nosb/uninf genes in the venn diagram.
length(shared@IntersectionSets[["10"]]) + length(shared@IntersectionSets[["11"]])
## [1] 839
dim(hs_macr_sig[["deseq"]][["ups"]][["z23sb_vs_uninf"]])
## [1] 839  49
shared <- Vennerable::Venn(list("drug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z22sb_vs_uninf"]]),
                                "nodrug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z22nosb_vs_uninf"]])))
pp(file="images/z22_vs_uninf_venn_up.png")
Vennerable::plot(shared)
dev.off()
## png 
##   2
Vennerable::plot(shared)

length(shared@IntersectionSets[["10"]]) + length(shared@IntersectionSets[["11"]])
## [1] 660
dim(hs_macr_sig[["deseq"]][["ups"]][["z22sb_vs_uninf"]])
## [1] 660  49
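
The checks above add the drug-only and shared regions of the Venn diagram and compare the sum to the number of rows in the drug table. The same arithmetic can be sketched with base R set operations, which is a handy cross-check when Vennerable is not available. The gene identifiers below are invented; in practice they would be the rownames() of the two 'ups' tables.

```r
## Invented gene identifiers standing in for rownames() of two 'ups' tables.
drug_ups <- c("g1", "g2", "g3", "g4")
nodrug_ups <- c("g3", "g4", "g5")

## The three regions of a two-set Venn diagram.
venn_counts <- c(
  drug_only = length(setdiff(drug_ups, nodrug_ups)),
  shared = length(intersect(drug_ups, nodrug_ups)),
  nodrug_only = length(setdiff(nodrug_ups, drug_ups)))
venn_counts

## Each input set should equal its exclusive region plus the shared region.
stopifnot(
  venn_counts[["drug_only"]] + venn_counts[["shared"]] == length(drug_ups),
  venn_counts[["nodrug_only"]] + venn_counts[["shared"]] == length(nodrug_ups))
```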

Note to self: There is an error in my volcano plot code which takes effect when the numerator and denominator of the all_pairwise contrasts are different than those in combine_de_tables. It is putting the ups/downs on the correct sides of the plot, but calling the down genes ‘up’ and vice-versa. The reason for this is that I did a check for this happening, but used the wrong argument to handle it.
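
To make the intended behaviour concrete for myself, here is a minimal sketch of the principle: when a contrast is inverted, the sign flip must happen before the up/down labels are assigned, so that position and label always agree. This is not the hpgltools code and not the actual fix; label_direction() and the toy fold changes are invented for illustration.

```r
## Invented log2 fold changes for a contrast computed as B vs. A.
toy_fc <- c(geneA = 2.0, geneB = -1.5, geneC = 0.2)

## Hypothetical helper: flip the sign first when the requested contrast is the
## reverse of the computed one, then derive the labels from the flipped values.
label_direction <- function(logfc, invert = FALSE, lfc = 1.0) {
  if (isTRUE(invert)) {
    logfc <- -1 * logfc
  }
  state <- ifelse(logfc >= lfc, "up",
                  ifelse(logfc <= -lfc, "down", "insignificant"))
  data.frame(gene = names(logfc), logfc = unname(logfc),
             state = unname(state), stringsAsFactors = FALSE)
}

as_computed <- label_direction(toy_fc)
as_inverted <- label_direction(toy_fc, invert = TRUE)
as_computed
as_inverted
```

Because the labels are computed from the already-flipped values, a gene plotted on the right-hand side can never be called 'down'.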

A likely bit of text for these volcano plots:

The set of genes differentially expressed between the zymodeme 2.3 and uninfected samples without drug treatment was quantified with DESeq2 and included surrogate estimates from SVA. Given the significance criteria of abs(logFC) >= 1.0 and false discovery rate adjusted p-value <= 0.05, 670 genes were observed as significantly increased between the infected and uninfected samples and 386 were observed as decreased. The most increased genes relative to the uninfected samples include some which are potentially indicative of a strong innate immune response and the inflammatory response.

In contrast, when the set of genes differentially expressed between the zymodeme 2.2 and uninfected samples was visualized, only 7 genes were observed as decreased and 435 increased. The inflammatory response was significantly less apparent in this set, but instead included genes related to transporter activity and oxidoreductases.

5.2 Direct zymodeme comparisons

An orthogonal comparison to that performed above is to directly compare the zymodeme 2.3 and 2.2 samples with and without antimonial treatment.

z23nosb_vs_z22nosb_volcano <- plot_volcano_de(
  table = hs_macr_table[["data"]][["z23nosb_vs_z22nosb"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc",  label = 10, label_column = "hgncsymbol")
plotly::ggplotly(z23nosb_vs_z22nosb_volcano$plot)
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues

## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
z23sb_vs_z22sb_volcano <- plot_volcano_de(
  table = hs_macr_table[["data"]][["z23sb_vs_z22sb"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc",  label = 10, label_column = "hgncsymbol")
plotly::ggplotly(z23sb_vs_z22sb_volcano$plot)
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues

## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
z23nosb_vs_z22nosb_volcano$plot +
  xlim(-10, 10) +
  ylim(0, 60)

pp(file="images/z23nosb_vs_z22nosb_reactome_up.png", image=all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["REAC"]], height=12, width=9)
## Warning in pp(file = "images/z23nosb_vs_z22nosb_reactome_up.png", image =
## all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["REAC"]], : There is no device to shut
## down.
## Reactome, zymodeme2.3 without drug vs. zymodeme2.2 without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["KEGG"]]
## KEGG, zymodeme2.3 without drug vs. zymodeme2.2 without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 without drug vs. zymodeme2.2 without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 without drug vs. zymodeme2.2 without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["WP"]]
## WikiPathways, zymodeme2.3 without drug vs. zymodeme2.2 without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["interactive_plots"]][["WP"]]
all_gp[["z23nosb_vs_z22nosb_down"]][["pvalue_plots"]][["REAC"]]
## NULL
## Reactome, zymodeme2.3 without drug vs. zymodeme2.2 without drug, down.
all_gp[["z23nosb_vs_z22nosb_down"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 without drug vs. zymodeme2.2 without drug, down.
all_gp[["z23nosb_vs_z22nosb_down"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 without drug vs. zymodeme2.2 without drug, down.
z23sb_vs_z22sb_volcano$plot +
  xlim(-10, 10) +
  ylim(0, 60)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing missing values (`geom_text_repel()`).
## Warning: ggrepel: 3 unlabeled data points (too many overlaps). Consider increasing
## max.overlaps

pp(file="images/z23sb_vs_z22sb_reactome_up.png", image=all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["REAC"]], height=12, width=9)
## Warning in pp(file = "images/z23sb_vs_z22sb_reactome_up.png", image =
## all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["REAC"]], : There is no device to shut down.
## Reactome, zymodeme2.3 with drug vs. zymodeme2.2 with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["KEGG"]]
## KEGG, zymodeme2.3 with drug vs. zymodeme2.2 with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 with drug vs. zymodeme2.2 with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 with drug vs. zymodeme2.2 with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["WP"]]
## WikiPathways, zymodeme2.3 with drug vs. zymodeme2.2 with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["interactive_plots"]][["WP"]]
all_gp[["z23sb_vs_z22sb_down"]][["pvalue_plots"]][["REAC"]]
## Reactome, zymodeme2.3 with drug vs. zymodeme2.2 with drug, down.
all_gp[["z23sb_vs_z22sb_down"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 with drug vs. zymodeme2.2 with drug, down.
all_gp[["z23sb_vs_z22sb_down"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 with drug vs. zymodeme2.2 with drug, down.
shared <- Vennerable::Venn(list("drug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23sb_vs_z22sb"]]),
                                "nodrug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23nosb_vs_z22nosb"]])))
pp(file="images/drug_nodrug_venn_up.png")
Vennerable::plot(shared)
dev.off()
## png 
##   2
Vennerable::plot(shared)

shared <- Vennerable::Venn(list("drug" = rownames(hs_macr_sig[["deseq"]][["downs"]][["z23sb_vs_z22sb"]]),
                                "nodrug" = rownames(hs_macr_sig[["deseq"]][["downs"]][["z23nosb_vs_z22nosb"]])))
pp(file="images/drug_nodrug_venn_down.png")
Vennerable::plot(shared)
dev.off()
## png 
##   2

A slightly different way of looking at the differences between the two zymodeme infections is to directly compare the infected samples with and without drug. Thus, when a volcano plot showing the comparison of the zymodeme 2.3 vs. 2.2 samples was plotted, 484 genes were observed as increased and 422 decreased; these groups include many of the same inflammatory (up) and membrane (down) genes.

Similar patterns were observed when the antimonial was included. Thus, when Venn diagrams of the increased and decreased gene sets were plotted, many genes were shared between the untreated and antimonial-treated comparisons: 313 were increased in both and 244 were decreased in both.

5.3 Drug effects on each zymodeme infection

Another likely question is to directly compare the treated vs untreated samples for each zymodeme infection in order to visualize the effects of antimonial.

z23sb_vs_z23nosb_volcano <- plot_volcano_de(
  table = hs_macr_table[["data"]][["z23sb_vs_z23nosb"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc",  label = 10, label_column = "hgncsymbol")
plotly::ggplotly(z23sb_vs_z23nosb_volcano$plot)
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues

## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
z22sb_vs_z22nosb_volcano <- plot_volcano_de(
  table = hs_macr_table[["data"]][["z22sb_vs_z22nosb"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc",  label = 10, label_column = "hgncsymbol")
plotly::ggplotly(z22sb_vs_z22nosb_volcano$plot)
## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues

## Warning in geom2trace.default(dots[[1L]][[2L]], dots[[2L]][[1L]], dots[[3L]][[1L]]): geom_GeomTextRepel() has yet to be implemented in plotly.
##   If you'd like to see this geom implemented,
##   Please open an issue with your example code at
##   https://github.com/ropensci/plotly/issues
z23sb_vs_z23nosb_volcano$plot +
  xlim(-8, 8) +
  ylim(0, 210)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
## Warning: Removed 1 rows containing missing values (`geom_text_repel()`).

pp(file="images/z23sb_vs_z23nosb_reactome_up.png",
   image=all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["REAC"]], height=12, width=9)
## Warning in pp(file = "images/z23sb_vs_z23nosb_reactome_up.png", image =
## all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["REAC"]], : There is no device to shut
## down.
## Reactome, zymodeme2.3 with drug vs. zymodeme2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["KEGG"]]
## KEGG, zymodeme2.3 with drug vs. zymodeme2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 with drug vs. zymodeme2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 with drug vs. zymodeme2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["WP"]]
## WikiPathways, zymodeme2.3 with drug vs. zymodeme2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["interactive_plots"]][["WP"]]
all_gp[["z23sb_vs_z23nosb_down"]][["pvalue_plots"]][["REAC"]]
## Warning: Removed 1 rows containing missing values (`geom_col()`).
## Reactome, zymodeme2.3 with drug vs. zymodeme2.3 without drug, down.
all_gp[["z23sb_vs_z23nosb_down"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 with drug vs. zymodeme2.3 without drug, down.
all_gp[["z23sb_vs_z23nosb_down"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 with drug vs. zymodeme2.3 without drug, down.
z22sb_vs_z22nosb_volcano$plot +
  xlim(-8, 8) +
  ylim(0, 210)

pp(file="images/z22sb_vs_z22nosb_reactome_up.png",
   image=all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["REAC"]], height=12, width=9)
## Warning in pp(file = "images/z22sb_vs_z22nosb_reactome_up.png", image =
## all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["REAC"]], : There is no device to shut
## down.
## Reactome, zymodeme2.2 with drug vs. zymodeme2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["KEGG"]]
## KEGG, zymodeme2.2 with drug vs. zymodeme2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.2 with drug vs. zymodeme2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.2 with drug vs. zymodeme2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["WP"]]
## WikiPathways, zymodeme2.2 with drug vs. zymodeme2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["interactive_plots"]][["WP"]]
all_gp[["z22sb_vs_z22nosb_down"]][["pvalue_plots"]][["REAC"]]
## Reactome, zymodeme2.2 with drug vs. zymodeme2.2 without drug, down.
all_gp[["z22sb_vs_z22nosb_down"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.2 with drug vs. zymodeme2.2 without drug, down.
all_gp[["z22sb_vs_z22nosb_down"]][["pvalue_plots"]][["TF"]]
## Warning: Removed 1 rows containing missing values (`geom_col()`).
## TF, zymodeme2.2 with drug vs. zymodeme2.2 without drug, down.
shared <- Vennerable::Venn(list("z23" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23sb_vs_z23nosb"]]),
                                "z22" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z22sb_vs_z22nosb"]])))
pp(file="images/z23_z22_drug_venn_up.png")
Vennerable::plot(shared)
dev.off()
## png 
##   2
Vennerable::plot(shared)

shared <- Vennerable::Venn(list("z23" = rownames(hs_macr_sig[["deseq"]][["downs"]][["z23sb_vs_z23nosb"]]),
                                "z22" = rownames(hs_macr_sig[["deseq"]][["downs"]][["z22sb_vs_z22nosb"]])))
pp(file="images/z23_z22_drug_venn_down.png")
Vennerable::plot(shared)
dev.off()
## png 
##   2
Vennerable::plot(shared)

Note: I am setting the x and y-axis boundaries by allowing the plotter to pick its own axes the first time, writing down the ranges I observe, and then setting them to the largest of the pair. It is therefore possible that I missed one or more genes which lie outside that range.
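
A less error-prone alternative to writing the ranges down by hand might be to compute the shared limits from the tables themselves. The following base R sketch assumes each table has the deseq_logfc and deseq_adjp columns used throughout this document; the toy values and the helper shared_volcano_limits() are invented here and are not part of hpgltools.

```r
## Invented fold changes and adjusted p-values for two contrasts shown as a pair.
contrast_a <- data.frame(deseq_logfc = c(-6.2, 0.5, 7.1),
                         deseq_adjp = c(1e-50, 0.5, 3e-200))
contrast_b <- data.frame(deseq_logfc = c(-3.0, 1.5, 4.4),
                         deseq_adjp = c(1e-20, 0.1, 1e-80))

## Hypothetical helper: symmetric x limits and a y ceiling which cover every table.
shared_volcano_limits <- function(..., fc_col = "deseq_logfc", p_col = "deseq_adjp") {
  tables <- list(...)
  all_fc <- unlist(lapply(tables, function(tbl) tbl[[fc_col]]))
  all_p <- unlist(lapply(tables, function(tbl) tbl[[p_col]]))
  ## Guard against p-values of exactly zero, which would give an infinite y value.
  all_p <- pmax(all_p, .Machine[["double.xmin"]])
  x_max <- ceiling(max(abs(all_fc), na.rm = TRUE))
  y_max <- ceiling(max(-log10(all_p), na.rm = TRUE))
  list(x = c(-x_max, x_max), y = c(0, y_max))
}

limits <- shared_volcano_limits(contrast_a, contrast_b)
limits
```

The resulting limits[["x"]] could then be handed to scale_x_continuous(limits = ...) for both plots of a pair, so that no point is silently dropped for falling outside a hand-picked range.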

The contrasts plotted in section 5.2 sought to show changes between the two strains, z2.3 and z2.2. Conversely, the volcano plots in this section seek to directly compare each strain before and after drug treatment.

5.4 LRT of the Human Macrophage

tmrc2_lrt_strain_drug <- deseq_lrt(hs_macr, interactor_column = "drug",
                                   interest_column = "macrophagezymodeme", factors = c("drug", "macrophagezymodeme"))
## converting counts to integer mode
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## -- replacing outliers and refitting for 38 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)
## estimating dispersions
## fitting model and testing
## rlog() may take a long time with 50 or more samples,
## vst() is a much faster transformation
## Working with 858 genes.
## Working with 855 genes after filtering: minc > 3
## Joining with `by = join_by(merge)`
## Joining with `by = join_by(merge)`

tmrc2_lrt_strain_drug$cluster_data$plot

5.5 Parasite

lp_macrophage_de <- all_pairwise(lp_macrophage_nosb,
                                 model_batch="svaseq", filter=TRUE)
## 
## z2.2 z2.3 
##   11    7
## Removing 0 low-count genes (8539 remaining).
## Setting 110 low elements to zero.
## transform_counts: Found 110 values equal to 0, adding 1 to the matrix.
tmrc2_parasite_keepers <- list(
  "z23_vs_z22" = c("z23", "z22"))
lp_macrophage_table <- combine_de_tables(
  lp_macrophage_de, keepers = tmrc2_parasite_keepers,
  excel = glue("analyses/macrophage_de/macrophage_parasite_infection_de-v{ver}.xlsx"))
## Error in names(x) <- value: 'names' attribute [8] must be the same length as the vector [6]
lp_macrophage_sig <- extract_significant_genes(
  lp_macrophage_table,
  excel = glue("analyses/macrophage_de/macrophage_parasite_sig-v{ver}.xlsx"))
## Error in extract_significant_genes(lp_macrophage_table, excel = glue("analyses/macrophage_de/macrophage_parasite_sig-v{ver}.xlsx")): object 'lp_macrophage_table' not found
lp_macrophage_table[["plots"]][["z23nosb_vs_z22nosb"]][["deseq_vol_plots"]][["plot"]]
## Error in eval(expr, envir, enclos): object 'lp_macrophage_table' not found
up_genes <- lp_macrophage_sig[["deseq"]][["ups"]][[1]]
## Error in eval(expr, envir, enclos): object 'lp_macrophage_sig' not found
dim(up_genes)
## Error in eval(expr, envir, enclos): object 'up_genes' not found
down_genes <- lp_macrophage_sig[["deseq"]][["downs"]][[1]]
## Error in eval(expr, envir, enclos): object 'lp_macrophage_sig' not found
dim(down_genes)
## Error in eval(expr, envir, enclos): object 'down_genes' not found
lp_z23sb_vs_z22sb_volcano <- plot_volcano_de(
  table = lp_macrophage_table[["data"]][["z23_vs_z22"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc",  label = 10, label_column = "hgncsymbol")
## Error in is.data.frame(x): object 'lp_macrophage_table' not found
plotly::ggplotly(lp_z23sb_vs_z22sb_volcano$plot)
## Error in plotly::ggplotly(lp_z23sb_vs_z22sb_volcano$plot): object 'lp_z23sb_vs_z22sb_volcano' not found
lp_z23sb_vs_z22sb_volcano$plot
## Error in eval(expr, envir, enclos): object 'lp_z23sb_vs_z22sb_volcano' not found
up_goseq <- simple_goseq(up_genes, go_db = lp_go, length_db = lp_lengths)
## Error in simple_goseq(up_genes, go_db = lp_go, length_db = lp_lengths): object 'up_genes' not found
## View categories over represented in the 2.3 samples
up_goseq$pvalue_plots$bpp_plot_over
## Error in eval(expr, envir, enclos): object 'up_goseq' not found
down_goseq <- simple_goseq(down_genes, go_db = lp_go, length_db = lp_lengths)
## Error in simple_goseq(down_genes, go_db = lp_go, length_db = lp_lengths): object 'down_genes' not found
## View categories over represented in the 2.2 samples
down_goseq$pvalue_plots$bpp_plot_over
## Error in eval(expr, envir, enclos): object 'down_goseq' not found
written_goseq <- write_goseq_data(up_goseq,
                                  excel = glue("lp_macrophage_increased_z2.3_goseq-v{ver}.xlsx"))
## Writing a sheet containing the legend.
## Error in nrow(goseq_result[["bp_subset"]]): object 'up_goseq' not found
written_goseq <- write_goseq_data(down_goseq,
                                  excel = glue("lp_macrophage_increased_z2.2_goseq-v{ver}.xlsx"))
## Writing a sheet containing the legend.
## Error in nrow(goseq_result[["bp_subset"]]): object 'down_goseq' not found

6 GSVA

hs_infected <- subset_expt(hs_macrophage, subset="macrophagetreatment!='uninf'") %>%
  subset_expt(subset="macrophagetreatment!='uninf_sb'")
## subset_expt(): There were 68, now there are 63 samples.
## subset_expt(): There were 63, now there are 63 samples.
hs_gsva_c2 <- simple_gsva(hs_infected)
## Converting the rownames() of the expressionset to ENTREZID.
## 1630 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 21481 entries.
## After conversion, the expressionset has 20006 entries.
hs_gsva_c2_meta <- get_msigdb_metadata(hs_gsva_c2, msig_xml="reference/msigdb_v7.2.xml")
## The downloaded msigdb contained 2725 rownames shared with the gsva result out of 2989.
hs_gsva_c2_sig <- get_sig_gsva_categories(hs_gsva_c2_meta, excel = "analyses/macrophage_de/hs_macrophage_gsva_c2_sig.xlsx")
##                  Length Class         Mode     
## title             1     -none-        character
## notes             1     -none-        character
## initial_metadata 71     data.frame    list     
## expressionset     1     ExpressionSet S4       
## design           71     data.frame    list     
## conditions       63     -none-        character
## batches          63     -none-        character
## samplenames      63     -none-        character
## colors           63     -none-        character
## state             5     -none-        list     
## libsize          63     -none-        numeric
## Starting limma pairwise comparison.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/3: Creating table: infsb_vs_inf.  Adjust = BH
## Limma step 6/6: 2/3: Creating table: uninfsb_vs_inf.  Adjust = BH
## Limma step 6/6: 3/3: Creating table: uninfsb_vs_infsb.  Adjust = BH
## Limma step 6/6: 1/3: Creating table: inf.  Adjust = BH
## Limma step 6/6: 2/3: Creating table: infsb.  Adjust = BH
## Limma step 6/6: 3/3: Creating table: uninfsb.  Adjust = BH
##                  Length Class         Mode     
## title             1     -none-        character
## notes             1     -none-        character
## initial_metadata 71     data.frame    list     
## expressionset     1     ExpressionSet S4       
## design           71     data.frame    list     
## conditions       63     -none-        character
## batches          63     -none-        character
## samplenames      63     -none-        character
## colors           63     -none-        character
## state             5     -none-        list     
## libsize          63     -none-        numeric
## The factor inf has 29 rows.
## The factor inf_sb has 29 rows.
## The factor uninf_sb has 5 rows.
## Testing each factor against the others.
## Scoring inf against everything else.
## Scoring inf_sb against everything else.
## Scoring uninf_sb against everything else.
hs_gsva_c2_sig$raw_plot

hs_gsva_c7 <- simple_gsva(hs_infected, signature_category = "c7")
## Converting the rownames() of the expressionset to ENTREZID.
## 1630 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 21481 entries.
## After conversion, the expressionset has 20006 entries.
hs_gsva_c7_meta <- get_msigdb_metadata(hs_gsva_c7, msig_xml="reference/msigdb_v7.2.xml")
## The downloaded msigdb contained 2725 rownames shared with the gsva result out of 2989.
hs_gsva_c7_sig <- get_sig_gsva_categories(hs_gsva_c7_meta, excel = "analyses/macrophage_de/hs_macrophage_gsva_c7_sig.xlsx")
##                  Length Class         Mode     
## title             1     -none-        character
## notes             1     -none-        character
## initial_metadata 71     data.frame    list     
## expressionset     1     ExpressionSet S4       
## design           71     data.frame    list     
## conditions       63     -none-        character
## batches          63     -none-        character
## samplenames      63     -none-        character
## colors           63     -none-        character
## state             5     -none-        list     
## libsize          63     -none-        numeric
## Starting limma pairwise comparison.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/3: Creating table: infsb_vs_inf.  Adjust = BH
## Limma step 6/6: 2/3: Creating table: uninfsb_vs_inf.  Adjust = BH
## Limma step 6/6: 3/3: Creating table: uninfsb_vs_infsb.  Adjust = BH
## Limma step 6/6: 1/3: Creating table: inf.  Adjust = BH
## Limma step 6/6: 2/3: Creating table: infsb.  Adjust = BH
## Limma step 6/6: 3/3: Creating table: uninfsb.  Adjust = BH
##                  Length Class         Mode     
## title             1     -none-        character
## notes             1     -none-        character
## initial_metadata 71     data.frame    list     
## expressionset     1     ExpressionSet S4       
## design           71     data.frame    list     
## conditions       63     -none-        character
## batches          63     -none-        character
## samplenames      63     -none-        character
## colors           63     -none-        character
## state             5     -none-        list     
## libsize          63     -none-        numeric
## The factor inf has 29 rows.
## The factor inf_sb has 29 rows.
## The factor uninf_sb has 5 rows.
## Testing each factor against the others.
## Scoring inf against everything else.
## Scoring inf_sb against everything else.
## Scoring uninf_sb against everything else.
hs_gsva_c7_sig$raw_plot

7 Try out a new tool

Two reasons to try pathwayPCA: Najib loves him some PCA, and it uses wikipathways, which I think is neat.

Ok, I spent some time looking through the code and I have some problems with some of the design decisions.

Most importantly, it requires a data.frame() which has the following format:

  1. No rownames, instead column #1 is the sample ID.
  2. Columns 2 through m are the categorical/survival/etc metrics.
  3. Columns m+1 through n are one gene per column, with log2 values.

But when I think about it, I think I get the idea: they want to make modelling against response factors easier.
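
The layout described above can be sketched with base R on toy data. Everything here (the gene names, sample names, and the `toy_*` objects) is made up for illustration; it is only my reading of the expected shape, not code from pathwayPCA.

```r
## Toy genes-by-samples matrix of log2 values; all names here are made up.
set.seed(1)
toy_exprs <- matrix(rnorm(12), nrow = 4, ncol = 3,
                    dimnames = list(c("GENE1", "GENE2", "GENE3", "GENE4"),
                                    c("s1", "s2", "s3")))
toy_meta <- data.frame(condition = c("z22", "z23", "z22"),
                       row.names = c("s1", "s2", "s3"))

## Transpose so samples are rows, then put the sample ID in column 1.
toy_assay <- as.data.frame(t(toy_exprs))
toy_assay <- cbind(SampleID = rownames(toy_assay), toy_assay)
rownames(toy_assay) <- NULL

## The response table uses the same SampleID column as its key.
toy_response <- data.frame(SampleID = rownames(toy_meta),
                           condition = toy_meta[["condition"]])

## One wide table: ID, then the categorical column, then one column per gene.
toy_wide <- merge(toy_response, toy_assay, by = "SampleID")
colnames(toy_wide)
```

The point of the shape is that every row is a sample, so a response column can sit right next to the gene columns it is modelled against.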

library(pathwayPCA)
library(rWikiPathways)
## 
## Attaching package: 'rWikiPathways'
## The following object is masked from 'package:GenomeInfoDb':
## 
##     listOrganisms
downloaded <- downloadPathwayArchive(organism = "Homo sapiens", format = "gmt")
data_path <- system.file("extdata", package = "pathwayPCA")
wikipathways <- read_gmt(paste0(data_path, "/wikipathways_human_symbol.gmt"),
                         description = TRUE)

expt <- subset_expt(hs_macrophage, subset = "macrophagetreatment!='uninf'") %>%
  subset_expt(subset = "macrophagetreatment!='uninf_sb'")
## subset_expt(): There were 68, now there are 63 samples.
## subset_expt(): There were 63, now there are 63 samples.
expt <- set_expt_conditions(expt, fact = "macrophagezymodeme")
## 
## none  z22  z23 
##    5   29   29
symbol_vector <- fData(expt)[[symbol_column]]
## Error in (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x, : object 'symbol_column' not found
names(symbol_vector) <- rownames(fData(expt))
## Error in names(symbol_vector) <- rownames(fData(expt)): object 'symbol_vector' not found
symbol_df <- as.data.frame(symbol_vector)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': object 'symbol_vector' not found
assay_df <- merge(symbol_df, as.data.frame(exprs(expt)), by = "row.names")
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'merge': object 'symbol_df' not found
assay_df[["Row.names"]] <- NULL
## Error in assay_df[["Row.names"]] <- NULL: object 'assay_df' not found
rownames(assay_df) <- make.names(assay_df[["symbol_vector"]], unique = TRUE)
## Error in make.names(assay_df[["symbol_vector"]], unique = TRUE): object 'assay_df' not found
assay_df[["symbol_vector"]] <- NULL
## Error in assay_df[["symbol_vector"]] <- NULL: object 'assay_df' not found
assay_df <- as.data.frame(t(assay_df))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': error in evaluating the argument 'x' in selecting a method for function 't': object 'assay_df' not found
assay_df[["SampleID"]] <- rownames(assay_df)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'assay_df' not found
assay_df <- dplyr::select(assay_df, "SampleID", everything())
## Error in dplyr::select(assay_df, "SampleID", everything()): object 'assay_df' not found
factor_df <- as.data.frame(pData(expt))
factor_df[["SampleID"]] <- rownames(factor_df)
factor_df <- dplyr::select(factor_df, "SampleID", everything())
factor_df <- factor_df[, c("SampleID", factors)]
## Error in `[.data.frame`(factor_df, , c("SampleID", factors)): object 'factors' not found
tt <- CreateOmics(
  assayData_df = assay_df,
  pathwayCollection_ls = wikipathways,
  response = factor_df,
  respType = "categorical",
  minPathSize=5)
## Error in CreateOmics(assayData_df = assay_df, pathwayCollection_ls = wikipathways, : object 'assay_df' not found
super <- AESPCA_pVals(
  object = tt,
  numPCs = 2,
  parallel = FALSE,
  numCores = 8,
  numReps = 2,
  adjustment = "BH")
## Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'AESPCA_pVals' for signature '"list"'
## Stopping this because it takes forever
##if (!isTRUE(get0("skip_load"))) {
##  pander::pander(sessionInfo())
##  message("This is hpgltools commit: ", get_git_commit())
##  message("Saving to ", savefile)
##  tmp <- sm(saveme(filename = savefile))
##}
tmp <- loadme(filename = savefile)
---
title: "TMRC2 202304: Macrophage Differential Expression."
author: "atb abelew@gmail.com"
date: "`r Sys.Date()`"
output:
  html_document:
    code_download: true
    code_folding: show
    fig_caption: true
    fig_height: 7
    fig_width: 7
    highlight: zenburn
    keep_md: false
    mode: selfcontained
    number_sections: true
    self_contained: true
    theme: readable
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
---

<style>
body .main-container {
max-width: 1600px;
}
</style>

```{r options, include = FALSE}
library(ggplot2)
library(glue)
library(Heatplus)
library(hpgltools)
library(UpSetR)
library(tibble)

tt <- devtools::load_all("~/hpgltools")
knitr::opts_knit$set(progress = TRUE, verbose = TRUE, width = 90, echo = TRUE)
knitr::opts_chunk$set(error = TRUE, fig.width = 8, fig.height = 8, dpi = 96)
old_options <- options(digits = 4, stringsAsFactors = FALSE, knitr.duplicate.label = "allow")
ggplot2::theme_set(ggplot2::theme_bw(base_size = 12))
ver <- "202304"
previous_file <- ""
rundate <- format(Sys.Date(), format = "%Y%m%d")

## tmp <- try(sm(loadme(filename = gsub(pattern = "\\.Rmd", replace = "\\.rda\\.xz", x = previous_file))))
rmd_file <- glue("tmrc2_macrophage_differential_expression_{ver}.Rmd")
loaded <- load(file = glue("rda/tmrc2_data_structures-v{ver}.rda"))
savefile <- gsub(pattern = "\\.Rmd", replace = "\\.rda\\.xz", x = rmd_file)
```

# Changelog

* 20230410: Making some changes to improve the differential expression
plots as well as prepare for some different pathway/GSEA/GSVA
analyses on the data.

# Introduction

Having established that the TMRC2 macrophage data looks robust and
illustrative of a couple of interesting questions, let us perform a
couple of differential analyses of it.

Also note that as of 202212, we received a new set of samples which
now include some which are of a completely different cell type,
U937. As their ATCC page states, they are malignant cells taken from
the pleural effusion of a 37 year old white male with histiocytic
lymphoma and which exhibit the morphology of monocytes.  Thus, this
document now includes some comparisons of the cell types as well as
the various macrophage donors (given that there are now more donors
too).

## Human data

I am moving the dataset manipulations here so that I can look at them
all together before running the various DE analyses.

## Create sets focused on drug, celltype, strain, and combinations

Let us start by playing with the metadata a little and create sets
with the condition set to:

* Drug treatment
* Cell type (macrophage or U937)
* Donor
* Infection Strain
* Some useful combinations thereof

In addition, keep mental track of which datasets are comprised of all
samples vs. those which are only macrophage vs. those which are only
U937.  (Thus, the usage of all_human vs. hs_macr vs. u937 as prefixes
for the data structures.)

Ideally, these recreations of the data should perhaps be in the
datastructures worksheet.

```{r de_datasets}
all_human <- sanitize_expt_metadata(hs_macrophage, columns = "drug") %>%
  set_expt_conditions(fact = "drug") %>%
  set_expt_batches(fact = "typeofcells")

## The following 3 lines were copy/pasted to datastructures and should be removed soon.
no_strain_idx <- pData(all_human)[["strainid"]] == "none"
##pData(all_human)[["strainid"]] <- paste0("s", pData(all_human)[["strainid"]],
##                                         "_", pData(all_human)[["macrophagezymodeme"]])
pData(all_human)[no_strain_idx, "strainid"] <- "none"
table(pData(all_human)[["strainid"]])

all_human_types <- set_expt_conditions(all_human, fact = "typeofcells") %>%
  set_expt_batches(fact = "drug")

type_zymo_fact <- paste0(pData(all_human_types)[["condition"]], "_",
                         pData(all_human_types)[["macrophagezymodeme"]])
type_zymo <- set_expt_conditions(all_human_types, fact = type_zymo_fact)

type_drug_fact <- paste0(pData(all_human_types)[["condition"]], "_",
                         pData(all_human_types)[["drug"]])
type_drug <- set_expt_conditions(all_human_types, fact = type_drug_fact)

strain_fact <- pData(all_human_types)[["strainid"]]
table(strain_fact)

new_conditions <- paste0(pData(hs_macrophage)[["macrophagetreatment"]], "_",
                         pData(hs_macrophage)[["macrophagezymodeme"]])
## Note the sanitize() call is redundant with the addition of sanitize() in the
## datastructures file, but I don't want to wait to rerun that.
hs_macr <- set_expt_conditions(hs_macrophage, fact = new_conditions) %>%
  sanitize_expt_metadata(columns = "drug") %>%
  subset_expt(subset = "typeofcells!='U937'")
```

### Separate Macrophage samples

Once again, we should reconsider where the following block is placed,
but these datastructures are likely to be used in many of the
following analyses.

```{r hs_macr_drug_strain}
hs_macr_drug_expt <- set_expt_conditions(hs_macr, fact = "drug")

hs_macr_strain_expt <- set_expt_conditions(hs_macr, fact = "macrophagezymodeme") %>%
  subset_expt(subset = "macrophagezymodeme != 'none'")

table(pData(hs_macr)[["strainid"]])
```

### Refactor U937 samples

The U937 samples were separated in the datastructures file, but we
want to use the combination of drug/zymodeme with them pretty much
exclusively.

```{r u937_samples}
new_conditions <- paste0(pData(hs_u937)[["macrophagetreatment"]], "_",
                         pData(hs_u937)[["macrophagezymodeme"]])
u937_expt <- set_expt_conditions(hs_u937, fact = new_conditions)
```

## Contrasts used in this document

Given the various ways we have chopped up this dataset, there are a
few general types of contrasts we will perform, which will then be
combined into greater complexity:

* drug treatment
* strains used
* cell types
* donors

In the end, our actual goal is to consider the variable effects of
drug+strain and see if we can discern patterns which lead to better or
worse drug treatment outcome.

There is a set of contrasts in which we are primarily interested for
this data; they follow.  I also created one ratio of ratios contrast
which I think has the potential to ask our biggest question.

```{r tmrc2_human_keepers}
tmrc2_human_extra <- "z23drugnodrug_vs_z22drugnodrug = (infsbz23 - infz23) - (infsbz22 - infz22), z23z22drug_vs_z23z22nodrug = (infsbz23 - infsbz22) - (infz23 - infz22)"
tmrc2_human_keepers <- list(
  "z23nosb_vs_uninf" = c("infz23", "uninfnone"),
  "z22nosb_vs_uninf" = c("infz22", "uninfnone"),
  "z23nosb_vs_z22nosb" = c("infz23", "infz22"),
  "z23sb_vs_z22sb" = c("infsbz23", "infsbz22"),
  "z23sb_vs_z23nosb" = c("infsbz23", "infz23"),
  "z22sb_vs_z22nosb" = c("infsbz22", "infz22"),
  "z23sb_vs_sb" = c("infsbz23", "uninfsbnone"),
  "z22sb_vs_sb" = c("infsbz22", "uninfsbnone"),
  "z23sb_vs_uninf" = c("infsbz23", "uninfnone"),
  "z22sb_vs_uninf" = c("infsbz22", "uninfnone"),
  "sb_vs_uninf" = c("uninfsbnone", "uninfnone"),
  "extra_z2322" = c("z23drugnodrug", "z22drugnodrug"),
  "extra_drugnodrug" = c("z23z22drug", "z23z22nodrug"))
tmrc2_drug_keepers <- list(
  "drug" = c("antimony", "none"))
tmrc2_type_keepers <- list(
  "type" = c("U937", "Macrophages"))
tmrc2_strain_keepers <- list(
  "strain" = c("z23", "z22"))
type_zymo_extra <- "zymos_vs_types = (U937z23 - U937z22) - (Macrophagesz23 - Macrophagesz22)"
tmrc2_typezymo_keepers <- list(
  "u937_macr" = c("Macrophagesnone", "U937none"),
  "zymo_macr" = c("Macrophagesz23", "Macrophagesz22"),
  "zymo_u937" = c("U937z23", "U937z22"),
  "z23_types" = c("U937z23", "Macrophagesz23"),
  "z22_types" = c("U937z22", "Macrophagesz22"),
  "zymos_types" = c("zymos_vs_types"))
tmrc2_typedrug_keepers <- list(
  "type_nodrug" = c("U937none", "Macrophagesnone"),
  "type_drug" = c("U937antimony", "Macrophagesantimony"),
  "macr_drugs" = c("Macrophagesantimony", "Macrophagesnone"),
  "u937_drugs" = c("U937antimony", "U937none"))
u937_keepers <- list(
  "z23nosb_vs_uninf" = c("infz23", "uninfnone"),
  "z22nosb_vs_uninf" = c("infz22", "uninfnone"),
  "z23nosb_vs_z22nosb" = c("infz23", "infz22"),
  "z23sb_vs_z22sb" = c("infsbz23", "infsbz22"),
  "z23sb_vs_z23nosb" = c("infsbz23", "infz23"),
  "z22sb_vs_z22nosb" = c("infsbz22", "infz22"),
  "z23sb_vs_sb" = c("infsbz23", "uninfsbnone"),
  "z22sb_vs_sb" = c("infsbz22", "uninfsbnone"),
  "z23sb_vs_uninf" = c("infsbz23", "uninfnone"),
  "z22sb_vs_uninf" = c("infsbz22", "uninfnone"),
  "sb_vs_uninf" = c("uninfsbnone", "uninfnone"))
high_expression <- 128
high_expression_column <- "deseq_basemean"

## Write one TSV per kept contrast containing the main DESeq2 columns.
combined_to_tsv <- function(combined, celltype = "all") {
  keepers <- combined[["keepers"]]
  for (k in seq_along(keepers)) {
    kname <- names(keepers)[k]
    ## Each keeper is c(numerator, denominator).
    numerator <- keepers[[k]][1]
    denominator <- keepers[[k]][2]
    filename <- glue("analyses/macrophage_de/{ver}/tsv_tables/tmrc2_{celltype}_{kname}_n{numerator}_d{denominator}-v{ver}.tsv")
    kdata <- combined[["data"]][[kname]]
    ## Skip contrasts whose table lacks the basic_num column.
    if (is.null(kdata[["basic_num"]])) {
      next
    }
    wanted <- c("hgncsymbol", "deseq_logfc", "deseq_adjp", "deseq_basemean", "deseq_num", "deseq_den")
    wanted_data <- kdata[, wanted]
    ## Rename the columns to something friendlier for the output file.
    colnames(wanted_data) <- c("hgncsymbol", "deseq_logfc", "deseq_adjp", "deseq_mean", "deseq_numerator", "deseq_denominator")
    readr::write_tsv(x = wanted_data %>% tibble::rownames_to_column(), file = filename)
  }
}

## Write one xlsx file per gProfiler result in the input list.
write_all_gp <- function(all_gp) {
  for (g in seq_along(all_gp)) {
    name <- names(all_gp)[g]
    datum <- all_gp[[name]]
    filename <- glue("analyses/macrophage_de/{ver}/gprofiler/{name}_gprofiler-v{ver}.xlsx")
    written <- sm(write_gprofiler_data(datum, excel = filename))
  }
}
```
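
To keep the ratio of ratios contrast from the block above straight in my head, here is a small worked example on made-up numbers.  The group names copy the condition names used in this document; the mean values and the `toy_means`/`weights` objects are invented for illustration and are not taken from the data.

```r
## Made-up mean log2 expression of one gene in the four infected groups.
toy_means <- c(infz22 = 5.0, infz23 = 5.5, infsbz22 = 6.0, infsbz23 = 8.0)

## Contrast weights for (infsbz23 - infz23) - (infsbz22 - infz22).
weights <- c(infz22 = 1, infz23 = -1, infsbz22 = -1, infsbz23 = 1)

## The drug effect within each zymodeme, on the log2 scale.
drug_effect_z23 <- toy_means[["infsbz23"]] - toy_means[["infz23"]]  # 2.5
drug_effect_z22 <- toy_means[["infsbz22"]] - toy_means[["infz22"]]  # 1.0

## The contrast is the difference of those two differences.
interaction_lfc <- sum(weights * toy_means[names(weights)])
interaction_lfc
## On the linear scale this is the ratio of the two fold changes.
2 ^ interaction_lfc
```

Because the values are log2, subtracting differences is the same as dividing ratios, which is where the name comes from.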

### Primary queries

There is a series of initial questions which make some sense
to me, but these do not necessarily match the set of questions which
are most pressing.  I am hoping to address both of these sets of
queries in one pass.

Before extracting these groups of queries, let us invoke the
all_pairwise() function and get all of the likely contrasts along with
one or more extras that might prove useful (the 'extra_contrasts' argument).
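
As a rough sketch of what "all of the likely contrasts" means, the following enumerates every pair of conditions with base R and then looks up one keeper among them.  The `a_vs_b` naming and the orientation of each pair are assumptions made for illustration; hpgltools may name or orient its tables differently.

```r
## Condition names follow the pattern used in this document.
conditions <- c("uninfnone", "uninfsbnone", "infz22", "infz23",
                "infsbz22", "infsbz23")

## Every unordered pair of conditions, one column per pair.
pairs <- combn(conditions, 2)
contrast_names <- paste0(pairs[2, ], "_vs_", pairs[1, ])
length(contrast_names)

## A keeper names the numerator first and the denominator second.
keeper <- c("infz23", "uninfnone")
forward <- paste0(keeper[1], "_vs_", keeper[2])
reverse <- paste0(keeper[2], "_vs_", keeper[1])
found <- c(forward = forward %in% contrast_names,
           reverse = reverse %in% contrast_names)
found
```

Six conditions give 15 pairs, and each pair exists in only one orientation; a table stored the other way around would need the sign of its logFC flipped to match the keeper.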

### Combined U937 and Macrophages: Compare drug effects

Having the U937 cells in the same dataset as the macrophages
provides an interesting opportunity to see if we can observe
drug-dependent effects which are shared across both cell types.

```{r both_types_drug}
drug_de <- all_pairwise(all_human, filter = TRUE, model_batch = "svaseq")

drug_table <- combine_de_tables(
  drug_de, keepers = tmrc2_drug_keepers,
  excel = glue("analyses/macrophage_de/{ver}/de_tables/tmrc2_macrophage_drug_comparison-v{ver}.xlsx"))
combined_to_tsv(drug_table, celltype = "all")

drug_sig <- extract_significant_genes(
  drug_table,
  excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_drug_sig-v{ver}.xlsx"))
drug_highsig <- extract_significant_genes(
  drug_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_drug_highsig-v{ver}.xlsx"))
```
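
The difference between the 'sig' and 'highsig' tables can be sketched on a toy table.  The column names copy those used by combined_to_tsv() above; the logFC and adjusted p-value cutoffs (and whether the comparisons are inclusive) are assumptions for illustration, since I did not set them explicitly in the block above.

```r
## Toy table; the values are invented.
toy_table <- data.frame(
  deseq_logfc = c(2.1, -1.6, 0.4, 3.0),
  deseq_adjp = c(0.001, 0.02, 0.0001, 0.03),
  deseq_basemean = c(500, 90, 2000, 40),
  row.names = c("geneA", "geneB", "geneC", "geneD"))

## Assumed cutoffs for illustration only: |logFC| >= 1 and adjusted p <= 0.05.
lfc_cutoff <- 1
adjp_cutoff <- 0.05
## The same minimum mean expression used in this document.
high_expression <- 128

sig_idx <- abs(toy_table[["deseq_logfc"]]) >= lfc_cutoff &
  toy_table[["deseq_adjp"]] <= adjp_cutoff
high_idx <- toy_table[["deseq_basemean"]] >= high_expression

toy_sig <- toy_table[sig_idx, ]
toy_highsig <- toy_table[sig_idx & high_idx, ]
rownames(toy_sig)
rownames(toy_highsig)
```

The extra expression filter is there to drop genes with large fold changes which come from very few reads.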

```{r gprofiler_all_drug}
all_drug_gp <- all_gprofiler(drug_sig)
write_all_gp(all_drug_gp)
```

### Combined U937 and Macrophages: compare cell types

There are a couple of ways one might want to directly compare the two
cell types.

* Given that the variance between the two celltypes is so huge, just
compare all samples.
* One might want to compare them with the interaction effects of drug/zymodeme.

```{r both_types_compare}
type_de <- all_pairwise(all_human_types, filter = TRUE, model_batch = "svaseq")

type_table <- combine_de_tables(
  type_de, keepers = tmrc2_type_keepers,
  excel = glue("analyses/macrophage_de/{ver}/de_tables/tmrc2_macrophage_type_comparison-v{ver}.xlsx"))
combined_to_tsv(type_table, celltype = "all")

type_sig <- extract_significant_genes(
  type_table,
  excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_type_sig-v{ver}.xlsx"))
type_highsig <- extract_significant_genes(
  type_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_type_highsig-v{ver}.xlsx"))
```

#### Combined factors of interest: celltype+zymodeme

Given the above explicit comparison of all samples comprising the two
cell types, now let us look at the cell type+zymodeme status with
all samples, macrophages and U937.

```{r all_samples_zymo_type}
type_zymo_de <- all_pairwise(type_zymo, filter = TRUE, model_batch = "svaseq",
                             extra_contrasts = type_zymo_extra)

type_zymo_table <- combine_de_tables(
  type_zymo_de, keepers = tmrc2_typezymo_keepers,
  excel = glue("analyses/macrophage_de/{ver}/de_tables/tmrc2_macrophage_type_zymo_comparison-v{ver}.xlsx"))
combined_to_tsv(type_zymo_table, celltype = "all")

type_zymo_sig <- extract_significant_genes(
  type_zymo_table,
  excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_type_zymo_sig-v{ver}.xlsx"))
type_zymo_highsig <- extract_significant_genes(
  type_zymo_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_type_zymo_highsig-v{ver}.xlsx"))
```

#### Combined factors of interest: celltype+drug

The 'type_drug' datastructure is the same as above, but the condition
is created from the concatenation of the cell type and drug treatment.

```{r all_samples_zymo_type_sva}
type_drug_de <- all_pairwise(type_drug, filter = TRUE, model_batch = "svaseq")

type_drug_table <- combine_de_tables(
  type_drug_de, keepers = tmrc2_typedrug_keepers,
  excel = glue("analyses/macrophage_de/{ver}/de_tables/tmrc2_macrophage_type_drug_comparison-v{ver}.xlsx"))
combined_to_tsv(type_drug_table, celltype = "all")

type_drug_sig <- extract_significant_genes(
  type_drug_table,
  excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_type_drug_sig-v{ver}.xlsx"))
type_drug_highsig <- extract_significant_genes(
  type_drug_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/{ver}/sig_tables/tmrc2_macrophage_type_drug_highsig-v{ver}.xlsx"))
```

# Individual cell types

At this point, I think it is fair to say that the two cell types are
sufficiently different that they do not really belong together in a
single analysis.

## drug or strain effects, single cell type

One of the queries Najib asked which I think I misinterpreted was to
look at drug and/or strain effects.  My interpretation is somewhere
below and was not what he was looking for.  Instead, he was looking to
see all(macrophage) drug/nodrug and all(macrophage) z23/z22 and
compare them to each other.  It may be that this is still a wrong
interpretation; if so, the most likely comparison is either:

*  (z23drug/z22drug) / (z23nodrug/z22nodrug), or perhaps
*  (z23drug/z23nodrug) / (z22drug/z22nodrug),

I am not sure; those confuse me, and at least one of them is below.
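
For what it is worth, writing the two candidates out shows that they are algebraically the same quantity, so choosing between them only changes how the result is described.  The same holds for the two expressions in tmrc2_human_extra, which are these ratios on the log2 scale.

$$
\frac{z23_{drug} / z22_{drug}}{z23_{nodrug} / z22_{nodrug}}
= \frac{z23_{drug} \cdot z22_{nodrug}}{z22_{drug} \cdot z23_{nodrug}}
= \frac{z23_{drug} / z23_{nodrug}}{z22_{drug} / z22_{nodrug}}
$$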

### Macrophages

In these blocks we will explicitly query only one factor at a time,
drug and strain.  The eventual goal, I think, is to look for effects
which are shared between drug treatment and strain.

#### Macrophage Drug only

Thus we will start with the pure drug query.  In this block we will
look only at the drug/nodrug effect.

```{r macrophage_drugonly_de}
hs_macr_drug_de <- all_pairwise(hs_macr_drug_expt, filter = TRUE, model_batch = "svaseq")

hs_macr_drug_table <- combine_de_tables(
  hs_macr_drug_de, keepers = tmrc2_drug_keepers,
  excel = glue("analyses/macrophage_de/tmrc2_macrophage_onlydrug_table-v{ver}.xlsx"))
combined_to_tsv(hs_macr_drug_table, celltype = "macrophage")

hs_macr_drug_sig <- extract_significant_genes(
  hs_macr_drug_table,
  excel = glue("analyses/macrophage_de/tmrc2_macrophageonly_drug_sig-v{ver}.xlsx"))
hs_macr_drug_highsig <- extract_significant_genes(
  hs_macr_drug_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/tmrc2_macrophageonly_drug_highsig-v{ver}.xlsx"))
```

#### Macrophage Strain only

In a similar fashion, let us look for effects which are observed when
we consider only the strain used during infection.

```{r macrophage_strainonly_de}
hs_macr_strain_de <- all_pairwise(hs_macr_strain_expt, filter = TRUE, model_batch = "svaseq")

hs_macr_strain_table <- combine_de_tables(
  hs_macr_strain_de, keepers = tmrc2_strain_keepers,
  excel = glue("analyses/macrophage_de/tmrc2_macrophage_onlystrain_table-v{ver}.xlsx"))
combined_to_tsv(hs_macr_strain_table, celltype = "macrophage")

hs_macr_strain_sig <- extract_significant_genes(
  hs_macr_strain_table,
  excel = glue("analyses/macrophage_de/tmrc2_macrophageonly_onlystrain_sig-v{ver}.xlsx"))
hs_macr_strain_highsig <- extract_significant_genes(
  hs_macr_strain_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/tmrc2_macrophageonly_onlystrain_highsig-v{ver}.xlsx"))
```

#### Compare Drug and Strain Effects

Now let us consider the above two comparisons together.  First, I will
plot their logFC values against each other (drug on the x-axis and
strain on the y-axis).  Then we can extract the significant genes in a
few combined categories of interest.  I assume these will focus
exclusively on the categories which include the introduction of the
drug.

```{r compare_drug_strain_effects}
drug_strain_comp_df <- merge(hs_macr_drug_table[["data"]][["drug"]],
                             hs_macr_strain_table[["data"]][["strain"]],
                             by = "row.names")
drug_strain_comp_plot <- plot_linear_scatter(
  drug_strain_comp_df[, c("deseq_logfc.x", "deseq_logfc.y")])
## Contrasts: antimony/none, z23/z22; x-axis: drug, y-axis: strain
## top left: higher no drug, z23; top right: higher drug z23
## bottom left: higher no drug, z22; bottom right: higher drug z22
drug_strain_comp_plot$scatter
```
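
The quadrant labels in the comments above can be made explicit with a toy version of the merged table.  The `.x`/`.y` suffixes mimic what merge() adds to the duplicated column names; the gene names and values are invented, and the labels assume the contrasts are oriented antimony/none and z23/z22 as stated in the comments.

```r
## Toy merged table; values are invented.
toy_comp <- data.frame(
  deseq_logfc.x = c(2.0, -1.5, 1.2, -0.8),   # drug contrast: antimony/none
  deseq_logfc.y = c(1.0, 2.5, -1.1, -2.0),   # strain contrast: z23/z22
  row.names = c("geneA", "geneB", "geneC", "geneD"))

## Right of zero on the x-axis means higher with drug.
drug_side <- ifelse(toy_comp[["deseq_logfc.x"]] > 0,
                    "higher drug", "higher no drug")
## Above zero on the y-axis means higher in z23.
strain_side <- ifelse(toy_comp[["deseq_logfc.y"]] > 0, "z23", "z22")

toy_comp[["quadrant"]] <- paste(drug_side, strain_side, sep = ", ")
table(toy_comp[["quadrant"]])
```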

As I noted in the comments above, some quadrants of the scatter plot
are likely to be of greater interest to us than others (the right
side).  Because I get confused sometimes, the following block will
explicitly name the categories of likely interest, then ask which
genes are shared among them, and finally use UpSetR to extract the
various gene intersection/union categories.

```{r drug_strain_scatter_subgroups}
## The drug contrast is antimony/none, so positive logFC ('ups') means higher with drug.
higher_drug <- hs_macr_drug_sig[["deseq"]][["ups"]][[1]]
higher_nodrug <- hs_macr_drug_sig[["deseq"]][["downs"]][[1]]
higher_z23 <- hs_macr_strain_sig[["deseq"]][["ups"]][[1]]
higher_z22 <- hs_macr_strain_sig[["deseq"]][["downs"]][[1]]
sum(rownames(higher_drug) %in% rownames(higher_z23))
sum(rownames(higher_drug) %in% rownames(higher_z22))
sum(rownames(higher_nodrug) %in% rownames(higher_z23))
sum(rownames(higher_nodrug) %in% rownames(higher_z22))

drug_z23_lst <- list("drug" = rownames(higher_drug),
                     "z23" = rownames(higher_z23))
higher_drug_z23 <- upset(UpSetR::fromList(drug_z23_lst), text.scale = 2)
higher_drug_z23

drug_z23_shared_genes <- overlap_groups(drug_z23_lst)
shared_genes_drug_z23 <- overlap_geneids(drug_z23_shared_genes, "drug:z23")
#shared_genes_drug_z23 <- attr(drug_z23_shared_genes, "elements")[drug_z23_shared_genes[["drug:z23"]]]

drug_z22_lst <- list("drug" = rownames(higher_drug),
                     "z22" = rownames(higher_z22))
higher_drug_z22 <- upset(UpSetR::fromList(drug_z22_lst), text.scale = 2)
higher_drug_z22

drug_z22_shared_genes <- overlap_groups(drug_z22_lst)
shared_genes_drug_z22 <- overlap_geneids(drug_z22_shared_genes, "drug:z22")
#shared_genes_drug_z22 <- attr(drug_z22_shared_genes, "elements")[drug_z22_shared_genes[["drug:z22"]]]
```
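
The overlap counts in the block above can be checked by hand with base R set operations.  This is a sketch on made-up gene IDs, not a replacement for overlap_groups(); the `toy_lst` object and its contents are hypothetical.

```r
## Two toy gene lists, named like the lists passed to UpSetR above.
toy_lst <- list(
  "drug" = c("g1", "g2", "g3", "g4"),
  "z23" = c("g3", "g4", "g5"))

## Genes in both lists, and genes unique to each.
shared <- intersect(toy_lst[["drug"]], toy_lst[["z23"]])
only_drug <- setdiff(toy_lst[["drug"]], toy_lst[["z23"]])
only_z23 <- setdiff(toy_lst[["z23"]], toy_lst[["drug"]])

## The sum(... %in% ...) idiom used above counts the same shared genes.
n_shared <- sum(toy_lst[["drug"]] %in% toy_lst[["z23"]])
shared
n_shared
```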

#### Perform gProfiler on drug/strain effect shared genes

Now that we have some populations of genes which are shared across the
drug/strain effects, let us pass them to gProfiler for an
over-representation analysis and see what pops out.

```{r gp_drug_strain}
wanted <- drug_z23_shared_genes[["drug:z23"]]
shared_genes_drug_z23 <- attr(drug_z23_shared_genes, "elements")[wanted]
shared_drug_z23_gp <- simple_gprofiler(shared_genes_drug_z23)
shared_drug_z23_gp[["pvalue_plots"]][["MF"]]
shared_drug_z23_gp[["pvalue_plots"]][["BP"]]
shared_drug_z23_gp[["pvalue_plots"]][["REAC"]]

wanted <- drug_z22_shared_genes[["drug:z22"]]
shared_genes_drug_z22 <- attr(drug_z22_shared_genes, "elements")[wanted]
shared_drug_z22_gp <- simple_gprofiler(shared_genes_drug_z22)
shared_drug_z22_gp[["pvalue_plots"]][["BP"]]
```

## Our main question of interest

The data structure hs_macr contains our primary macrophages, which
are, as shown above, the data we can really sink our teeth into.

Note, we expect some errors when running combine_de_tables()
because not all of the methods I use can handle the ratio-of-ratios
contrasts we added via the 'extra_contrasts' argument.  As a result, when
we combine them into the larger output tables, those peculiar
contrasts fail.  This does not stop it from writing the rest of the
results, however.

```{r hs_de}
## test = deseq_pairwise(normalize_expt(hs_macr, filter=TRUE), model_batch = "svaseq", filter = TRUE, extra_contrasts = tmrc2_human_extra)
hs_macr_de <- all_pairwise(
  hs_macr, model_batch = "svaseq", parallel = FALSE,
  filter = TRUE,
  extra_contrasts = tmrc2_human_extra)
tmp_keepers <- tmrc2_human_keepers[13]

hs_macr_table <- combine_de_tables(
  hs_macr_de,
  keepers = tmrc2_human_keepers,
  excel = glue("analyses/macrophage_de/hs_macr_drug_zymo_table_testing_macr_only-v{ver}.xlsx"))
combined_to_tsv(hs_macr_table, "macrophage")

hs_macr_sig <- extract_significant_genes(
  hs_macr_table,
  excel = glue("analyses/macrophage_de/hs_macr_drug_zymo_sig-v{ver}.xlsx"))
hs_macr_highsig <- extract_significant_genes(
  hs_macr_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/hs_macr_drug_zymo_highsig-v{ver}.xlsx"))
```

### Our main questions in U937

Let us do the same comparisons in the U937 samples, though I will not
do the extra contrasts, primarily because I think the dataset is less
likely to support them.

```{r hs_u937_de}
u937_de <- all_pairwise(u937_expt, model_batch = "svaseq", filter = TRUE)

u937_table <- combine_de_tables(
  u937_de,
  keepers = u937_keepers,
  excel = glue("analyses/macrophage_de/u937_drug_zymo_table-v{ver}.xlsx"))
combined_to_tsv(u937_table, celltype = "u937")

u937_sig <- extract_significant_genes(
  u937_table,
  excel = glue("analyses/macrophage_de/u937_drug_zymo_sig-v{ver}.xlsx"))
u937_highsig <- extract_significant_genes(
  u937_table, min_mean_exprs = high_expression, exprs_column = high_expression_column,
  excel = glue("analyses/macrophage_de/u937_drug_zymo_highsig-v{ver}.xlsx"))
```

#### Compare (no)Sb z2.3/z2.2 treatments among macrophages

```{r compare_drug_z2322}
upset_plots_hs_macr <- upsetr_sig(
  hs_macr_sig, both = TRUE,
  contrasts = c("z23sb_vs_z22sb", "z23nosb_vs_z22nosb"))
upset_plots_hs_macr[["both"]]
groups <- upset_plots_hs_macr[["both_groups"]]
shared_genes <- attr(groups, "elements")[groups[[2]]] %>%
  gsub(pattern = "^gene:", replacement = "")
length(shared_genes)

shared_gp <- simple_gprofiler(shared_genes)
shared_gp[["pvalue_plots"]][["MF"]]
shared_gp[["pvalue_plots"]][["BP"]]
shared_gp[["pvalue_plots"]][["REAC"]]

drug_genes <- attr(groups, "elements")[groups[["z23sb_vs_z22sb"]]] %>%
  gsub(pattern = "^gene:", replacement = "")
drugonly_gp <- simple_gprofiler(drug_genes)
drugonly_gp[["pvalue_plots"]][["BP"]]
```

I want to try something: directly including the U937 data in this comparison.

```{r add_u937}
both_sig <- hs_macr_sig
names(both_sig[["deseq"]][["ups"]]) <- paste0("macr_", names(both_sig[["deseq"]][["ups"]]))
names(both_sig[["deseq"]][["downs"]]) <- paste0("macr_", names(both_sig[["deseq"]][["downs"]]))
u937_deseq <- u937_sig[["deseq"]]
names(u937_deseq[["ups"]]) <- paste0("u937_", names(u937_deseq[["ups"]]))
names(u937_deseq[["downs"]]) <- paste0("u937_", names(u937_deseq[["downs"]]))
both_sig[["deseq"]][["ups"]] <- c(both_sig[["deseq"]][["ups"]], u937_deseq[["ups"]])
both_sig[["deseq"]][["downs"]] <- c(both_sig[["deseq"]][["downs"]], u937_deseq[["downs"]])
summary(both_sig[["deseq"]][["ups"]])

upset_plots_both <- upsetr_sig(
  both_sig, both = TRUE,
  contrasts = c("macr_z23sb_vs_z22sb", "macr_z23nosb_vs_z22nosb",
                "u937_z23sb_vs_z22sb", "u937_z23nosb_vs_z22nosb"))
upset_plots_both$both
```
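
The prefix-and-concatenate pattern used above is easy to get wrong by copy/paste, so here is a small sketch of the same idea written as a loop over the two directions. The list contents and the helper prefix_names() are hypothetical stand-ins; the real significance lists hold data frames rather than integer vectors.

```r
## Toy stand-ins for the ups/downs lists of two significance results.
toy_macr <- list("ups" = list("a_vs_b" = 1:3), "downs" = list("a_vs_b" = 4:5))
toy_u937 <- list("ups" = list("a_vs_b" = 6:7), "downs" = list("a_vs_b" = 8L))

## Hypothetical helper: prepend a prefix to every name in a list.
prefix_names <- function(lst, prefix) {
  names(lst) <- paste0(prefix, names(lst))
  lst
}

## Combine each direction only with the matching direction from the other set.
toy_combined <- list()
for (direction in c("ups", "downs")) {
  toy_combined[[direction]] <- c(prefix_names(toy_macr[[direction]], "macr_"),
                                 prefix_names(toy_u937[[direction]], "u937_"))
}
names(toy_combined[["ups"]])
names(toy_combined[["downs"]])
```

Looping over the direction names means the ups can never be pasted into the downs by accident.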

#### Compare DE results from macrophages and U937 samples

Looking a bit more closely at these, I think the u937 data is too
sparse to effectively compare.

```{r compare_de_u937_macro}
macr_u937_comparison <- compare_de_results(hs_macr_table, u937_table)
macr_u937_comparison$lfc_heat

macr_u937_venns <- compare_significant_contrasts(hs_macr_sig, second_sig_tables = u937_sig,
                                                 contrasts = "z23sb_vs_z23nosb")
macr_u937_venns$up_plot
macr_u937_venns$down_plot

macr_u937_venns_v2 <- compare_significant_contrasts(hs_macr_sig, second_sig_tables = u937_sig,
                                                    contrasts = "z22sb_vs_z22nosb")
macr_u937_venns_v2$up_plot
macr_u937_venns_v2$down_plot

macr_u937_venns_v3 <- compare_significant_contrasts(hs_macr_sig, second_sig_tables = u937_sig,
                                                    contrasts = "sb_vs_uninf")
macr_u937_venns_v3$up_plot
macr_u937_venns_v3$down_plot
```

### Compare macrophage/u937 with respect to z2.3/z2.2

```{r macr_u937_z23z22}
comparison_df <- merge(hs_macr_table[["data"]][["z23sb_vs_z22sb"]],
                       u937_table[["data"]][["z23sb_vs_z22sb"]],
                       by = "row.names")
macru937_z23z22_plot <- plot_linear_scatter(comparison_df[, c("deseq_logfc.x", "deseq_logfc.y")])
macru937_z23z22_plot$scatter

comparison_df <- merge(hs_macr_table[["data"]][["z23nosb_vs_z22nosb"]],
                       u937_table[["data"]][["z23nosb_vs_z22nosb"]],
                       by = "row.names")
macru937_z23z22_plot <- plot_linear_scatter(comparison_df[, c("deseq_logfc.x", "deseq_logfc.y")])
macru937_z23z22_plot$scatter
```
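
For reference, the merge-then-compare step can be sketched with base R alone. The tables below are toy data, and I am assuming only that each table has a deseq_logfc column and gene identifiers as row names; plot_linear_scatter() does more than this, so treat the sketch as the bare idea rather than a replacement.

```r
## Toy tables with gene IDs as row names; the values are invented.
toy_macr_tbl <- data.frame(deseq_logfc = c(2.0, -1.0, 0.5, 3.0),
                           row.names = c("g1", "g2", "g3", "g4"))
toy_u937_tbl <- data.frame(deseq_logfc = c(1.5, -0.5, 2.5),
                           row.names = c("g1", "g2", "g4"))

## merge() by row names keeps only the genes present in both tables and
## disambiguates the shared column name with the suffixes .x and .y.
toy_merged <- merge(toy_macr_tbl, toy_u937_tbl, by = "row.names")
colnames(toy_merged)

## Pearson correlation and the R^2 of a simple linear fit of one logFC on the other.
toy_cor <- cor(toy_merged[["deseq_logfc.x"]], toy_merged[["deseq_logfc.y"]])
toy_fit <- lm(deseq_logfc.y ~ deseq_logfc.x, data = toy_merged)
toy_rsq <- summary(toy_fit)[["r.squared"]]
```

Note that genes filtered out of either table silently disappear from the merged result, so it is worth checking nrow() of the merge when one dataset is sparse.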

#### Add donor to the contrasts, no sva

```{r nopower_nosva}
no_power_fact <- paste0(pData(hs_macr)[["donor"]], "_",
                        pData(hs_macr)[["condition"]])
table(pData(hs_macr)[["donor"]])
table(no_power_fact)
hs_nopower <- set_expt_conditions(hs_macr, fact = no_power_fact)
hs_nopower <- subset_expt(hs_nopower, subset="macrophagezymodeme!='none'")
hs_nopower_nosva_de <- all_pairwise(hs_nopower, model_batch = FALSE, filter = TRUE)
nopower_keepers <- list(
  "d01_zymo" = c("d01infz23", "d01infz22"),
  "d01_sbzymo" = c("d01infsbz23", "d01infsbz22"),
  "d02_zymo" = c("d02infz23", "d02infz22"),
  "d02_sbzymo" = c("d02infsbz23", "d02infsbz22"),
  "d09_zymo" = c("d09infz23", "d09infz22"),
  "d09_sbzymo" = c("d09infsbz23", "d09infsbz22"),
  "d81_zymo" = c("d81infz23", "d81infz22"),
  "d81_sbzymo" = c("d81infsbz23", "d81infsbz22"))
hs_nopower_nosva_table <- combine_de_tables(
  hs_nopower_nosva_de, keepers = nopower_keepers,
  excel = glue("analyses/macrophage_de/hs_nopower_table-v{ver}.xlsx"))
##                                  extra_contrasts = extra)
hs_nopower_nosva_sig <- extract_significant_genes(
  hs_nopower_nosva_table,
  excel = glue("analyses/macrophage_de/hs_nopower_nosva_sig-v{ver}.xlsx"))

d01d02_zymo_nosva_comp <- merge(hs_nopower_nosva_table[["data"]][["d01_zymo"]],
                                hs_nopower_nosva_table[["data"]][["d02_zymo"]],
                                by="row.names")
d0102_zymo_nosva_plot <- plot_linear_scatter(d01d02_zymo_nosva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
d0102_zymo_nosva_plot$scatter
d0102_zymo_nosva_plot$correlation
d0102_zymo_nosva_plot$lm_rsq

d09d81_zymo_nosva_comp <- merge(hs_nopower_nosva_table[["data"]][["d09_zymo"]],
                                hs_nopower_nosva_table[["data"]][["d81_zymo"]],
                                by="row.names")
d0981_zymo_nosva_plot <- plot_linear_scatter(d09d81_zymo_nosva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
d0981_zymo_nosva_plot$scatter
d0981_zymo_nosva_plot$correlation
d0981_zymo_nosva_plot$lm_rsq

d01d81_zymo_nosva_comp <- merge(hs_nopower_nosva_table[["data"]][["d01_zymo"]],
                                hs_nopower_nosva_table[["data"]][["d81_zymo"]],
                                by="row.names")
d0181_zymo_nosva_plot <- plot_linear_scatter(d01d81_zymo_nosva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
d0181_zymo_nosva_plot$scatter
d0181_zymo_nosva_plot$correlation
d0181_zymo_nosva_plot$lm_rsq

upset_plots_nosva <- upsetr_sig(hs_nopower_nosva_sig, both=TRUE,
                                contrasts=c("d01_zymo", "d02_zymo", "d09_zymo", "d81_zymo"))
upset_plots_nosva$up
upset_plots_nosva$down
upset_plots_nosva$both
## The 7th element in the both groups list is the set shared among all donors.
## I don't feel like writing out x:y:z:a
groups <- upset_plots_nosva[["both_groups"]]
shared_genes <- attr(groups, "elements")[groups[[7]]] %>%
  gsub(pattern = "^gene:", replacement = "")
shared_gp <- simple_gprofiler(shared_genes)
shared_gp$pvalue_plots$MF
shared_gp$pvalue_plots$BP
shared_gp$pvalue_plots$REAC
shared_gp$pvalue_plots$WP
```

#### Add donor to the contrasts, sva

```{r donor_drug_zymo_etc}
hs_nopower_sva_de <- all_pairwise(hs_nopower, model_batch = "svaseq", filter = TRUE)
nopower_keepers <- list(
  "d01_zymo" = c("d01infz23", "d01infz22"),
  "d01_sbzymo" = c("d01infsbz23", "d01infsbz22"),
  "d02_zymo" = c("d02infz23", "d02infz22"),
  "d02_sbzymo" = c("d02infsbz23", "d02infsbz22"),
  "d09_zymo" = c("d09infz23", "d09infz22"),
  "d09_sbzymo" = c("d09infsbz23", "d09infsbz22"),
  "d81_zymo" = c("d81infz23", "d81infz22"),
  "d81_sbzymo" = c("d81infsbz23", "d81infsbz22"))
hs_nopower_sva_table <- combine_de_tables(
  hs_nopower_sva_de, keepers = nopower_keepers,
  excel = glue("analyses/macrophage_de/hs_nopower_sva_table-v{ver}.xlsx"))
##                                  extra_contrasts = extra)
hs_nopower_sva_sig <- extract_significant_genes(
  hs_nopower_sva_table,
  excel = glue("analyses/macrophage_de/hs_nopower_sva_sig-v{ver}.xlsx"))

d01d02_zymo_sva_comp <- merge(hs_nopower_sva_table[["data"]][["d01_zymo"]],
                              hs_nopower_sva_table[["data"]][["d02_zymo"]],
                              by="row.names")
d0102_zymo_sva_plot <- plot_linear_scatter(d01d02_zymo_sva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
d0102_zymo_sva_plot$scatter
d0102_zymo_sva_plot$correlation
d0102_zymo_sva_plot$lm_rsq

d09d81_zymo_sva_comp <- merge(hs_nopower_sva_table[["data"]][["d09_zymo"]],
                              hs_nopower_sva_table[["data"]][["d81_zymo"]],
                              by="row.names")
d0981_zymo_sva_plot <- plot_linear_scatter(d09d81_zymo_sva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
d0981_zymo_sva_plot$scatter
d0981_zymo_sva_plot$correlation
d0981_zymo_sva_plot$lm_rsq

d01d81_zymo_sva_comp <- merge(hs_nopower_sva_table[["data"]][["d01_zymo"]],
                              hs_nopower_sva_table[["data"]][["d81_zymo"]],
                              by="row.names")
d0181_zymo_sva_plot <- plot_linear_scatter(d01d81_zymo_sva_comp[, c("deseq_logfc.x", "deseq_logfc.y")])
d0181_zymo_sva_plot$scatter
d0181_zymo_sva_plot$correlation
d0181_zymo_sva_plot$lm_rsq

upset_plots_sva <- upsetr_sig(hs_nopower_sva_sig, both=TRUE,
                              contrasts=c("d01_zymo", "d02_zymo", "d09_zymo", "d81_zymo"))
upset_plots_sva$up
upset_plots_sva$down
upset_plots_sva$both
## The 7th element in the both groups list is the set shared among all donors.
## I don't feel like writing out x:y:z:a
groups <- upset_plots_sva[["both_groups"]]
shared_genes <- attr(groups, "elements")[groups[[7]]] %>%
  gsub(pattern = "^gene:", replacement = "")
shared_gp <- simple_gprofiler(shared_genes)
shared_gp$pvalue_plots$MF
shared_gp$pvalue_plots$BP
shared_gp$pvalue_plots$REAC
shared_gp$pvalue_plots$WP
```

### Donor comparison

```{r donor_de}
hs_donors <- set_expt_conditions(hs_macr, fact = "donor")
donor_de <- all_pairwise(hs_donors, model_batch="svaseq", filter=TRUE)
donor_table <- combine_de_tables(
  donor_de,
  excel=glue("analyses/macrophage_de/donor_tables-v{ver}.xlsx"))
donor_sig <- extract_significant_genes(
  donor_table,
  excel = glue("analyses/macrophage_de/donor_sig-v{ver}.xlsx"))
```

#### Primary query contrasts

The final contrast in this list is interesting because it depends on
the extra contrasts applied to the all_pairwise() above.  In my way of
thinking, the primary comparisons to consider are either cross-drug or
cross-strain, but not both.  However I think in at least a few
instances Olga is interested in strain+drug / uninfected+nodrug.

#### Write contrast results

Now let us write out the xlsx file containing the above contrasts.
The file with the suffix _table-version will therefore contain all
genes and the file with the suffix _sig-version will contain only
those deemed significant via our default criteria of DESeq2 |logFC| >= 1.0
and adjusted p-value <= 0.05.
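
To make those criteria concrete, here is a minimal sketch of the filter applied to a toy table. The column names deseq_logfc and deseq_adjp match those used elsewhere in this document; the data and the variable names are invented, and extract_significant_genes() is assumed to do considerably more bookkeeping than this.

```r
## Toy DE table; the values are invented to exercise the boundary cases.
toy_table <- data.frame(
  deseq_logfc = c(2.5, -1.2, 0.4, 1.0, -3.0),
  deseq_adjp = c(0.001, 0.04, 0.0001, 0.05, 0.2),
  row.names = paste0("gene", 1:5))

lfc_cutoff <- 1.0
p_cutoff <- 0.05

## A gene is significant when both criteria hold; the comparisons are inclusive.
toy_significant <- abs(toy_table[["deseq_logfc"]]) >= lfc_cutoff &
  toy_table[["deseq_adjp"]] <= p_cutoff

## Split the significant genes by the sign of the fold change.
toy_ups <- toy_table[toy_significant & toy_table[["deseq_logfc"]] > 0, ]
toy_downs <- toy_table[toy_significant & toy_table[["deseq_logfc"]] < 0, ]
rownames(toy_ups)
rownames(toy_downs)
```

gene3 fails on fold change despite its small p-value, and gene5 fails on p-value despite its large fold change; both criteria must be satisfied.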

# Over representation searches

I decided to make one change to the organization of this document
which starts small but will, I think, quickly become large:  I am
moving the over-representation searches up to immediately after the
DE.  I will then move the plots of the gprofiler results to
immediately after the various volcano plots so that it is easier to
interpret them.

```{r over_represent_data}
all_gp <- all_gprofiler(hs_macr_sig)
for (g in seq_along(all_gp)) {
  name <- names(all_gp)[g]
  datum <- all_gp[[name]]
  filename <- glue("analyses/macrophage_de/gprofiler/{name}_gprofiler-v{ver}.xlsx")
  written <- sm(write_gprofiler_data(datum, excel = filename))
}
```

# Plot contrasts of interest

One suggestion I received recently was to set the axes for these
volcano plots to be static rather than let ggplot choose its own.  I
am assuming this is only relevant for pairs of contrasts, but that
might not be true.

## Individual zymodemes vs. uninfected

### Infected with z2.3 no Antimonial vs. Uninfected

```{r volcano_z23uninf_nosb}
plot_colors <- get_expt_colors(hs_macr_table[["input"]][["input"]])
x_limits <- c(-20, 10)

## The original plot from my xlsx file
hs_macr_table$plots$z23nosb_vs_uninf$deseq_vol_plots

z23nosb_vs_uninf_volcano <- plot_volcano_condition_de(
  input = hs_macr_table[["data"]][["z23nosb_vs_uninf"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["uninfnone"]], color_low = plot_colors[["infz23"]])
z23nosb_vs_uninf_volcano$plot +
  scale_x_continuous(limits = x_limits)
plotly::ggplotly(z23nosb_vs_uninf_volcano$plot)

z23nosb_vs_uninf_volcano_nol <- plot_volcano_condition_de(
  input = hs_macr_table[["data"]][["z23nosb_vs_uninf"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = NULL, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["uninfnone"]], color_low = plot_colors[["infz23"]])
z23nosb_vs_uninf_volcano_nol$plot +
  scale_x_continuous(limits = x_limits)

all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["REAC"]]
## Reactome, zymodeme2.3 without drug vs. uninfected without drug, up.
all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["KEGG"]]
## KEGG, zymodeme2.3 without drug vs. uninfected without drug, up.
all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 without drug vs. uninfected without drug, up.
all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 without drug vs. uninfected without drug, up.
all_gp[["z23nosb_vs_uninf_up"]][["pvalue_plots"]][["WP"]]
## WikiPathways, zymodeme2.3 without drug vs. uninfected without drug, up.
all_gp[["z23nosb_vs_uninf_up"]][["interactive_plots"]][["WP"]]

all_gp[["z23nosb_vs_uninf_down"]][["pvalue_plots"]][["REAC"]]
## Reactome, zymodeme2.3 without drug vs. uninfected without drug, down.
all_gp[["z23nosb_vs_uninf_down"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 without drug vs. uninfected without drug, down.
all_gp[["z23nosb_vs_uninf_down"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 without drug vs. uninfected without drug, down.
```

### Infected with z2.2 no Antimonial vs. Uninfected

```{r volcano_z22uninf_nosb}
## The original plot
hs_macr_table$plots$z22nosb_vs_uninf$deseq_vol_plots

z22nosb_vs_uninf_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z22nosb_vs_uninf"]], "z22nosb_vs_uninf",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["uninfnone"]], color_low = plot_colors[["infz22"]])
z22nosb_vs_uninf_volcano$plot +
  scale_x_continuous(limits = x_limits)
plotly::ggplotly(z22nosb_vs_uninf_volcano$plot)

z22nosb_vs_uninf_volcano_nol <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z22nosb_vs_uninf"]], "z22nosb_vs_uninf",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = NULL, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["uninfnone"]], color_low = plot_colors[["infz22"]])
z22nosb_vs_uninf_volcano_nol$plot +
  scale_x_continuous(limits = x_limits)

all_gp[["z22nosb_vs_uninf_up"]][["pvalue_plots"]][["REAC"]]
## Reactome, zymodeme2.2 without drug vs. uninfected without drug, up.
all_gp[["z22nosb_vs_uninf_up"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.2 without drug vs. uninfected without drug, up.
all_gp[["z22nosb_vs_uninf_up"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.2 without drug vs. uninfected without drug, up.
all_gp[["z22nosb_vs_uninf_up"]][["pvalue_plots"]][["WP"]]
## WikiPathways, zymodeme2.2 without drug vs. uninfected without drug, up.

all_gp[["z22nosb_vs_uninf_down"]][["pvalue_plots"]][["REAC"]]
## Reactome, zymodeme2.2 without drug vs. uninfected without drug, down.
all_gp[["z22nosb_vs_uninf_down"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.2 without drug vs. uninfected without drug, down.
all_gp[["z22nosb_vs_uninf_down"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.2 without drug vs. uninfected without drug, down.
```

### Infected with z2.3 treated vs. Uninfected treated

```{r volcano_z23uninf_sb}
## The original plot
hs_macr_table$plots$z23sb_vs_sb$deseq_vol_plots

z23sb_vs_uninfsb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23sb_vs_sb"]], "z23sb_vs_sb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["infsbz23"]], color_low = plot_colors[["uninfsbnone"]])
z23sb_vs_uninfsb_volcano$plot +
  scale_x_continuous(limits = x_limits)
plotly::ggplotly(z23sb_vs_uninfsb_volcano$plot)

## The same plot without labels, stored separately so the labeled version is not overwritten.
z23sb_vs_uninfsb_volcano_nol <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23sb_vs_sb"]], "z23sb_vs_sb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = NULL, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["infsbz23"]], color_low = plot_colors[["uninfsbnone"]])
z23sb_vs_uninfsb_volcano_nol$plot +
  scale_x_continuous(limits = x_limits)
```

### Infected with z2.3 untreated vs. z2.2 untreated

```{r volcano_z23nosb_z22nosb}
## The original plot
hs_macr_table$plots$z23nosb_vs_z22nosb$deseq_vol_plots

z23nosb_vs_z22nosb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23nosb_vs_z22nosb"]], "z23nosb_vs_z22nosb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_high = plot_colors[["infz23"]], color_low = plot_colors[["infz22"]])
z23nosb_vs_z22nosb_volcano$plot +
  scale_x_continuous(limits = x_limits)
```

### Infected with z2.3 treated vs. z2.2 treated

```{r volcano_z23sb_z22sb}
## The original plot
hs_macr_table$plots$z23sb_vs_z22sb$deseq_vol_plots

z23sb_vs_z22sb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23sb_vs_z22sb"]], "z23sb_vs_z22sb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = FALSE,
  color_high = plot_colors[["infsbz23"]], color_low = plot_colors[["infsbz22"]])
z23sb_vs_z22sb_volcano$plot +
  scale_x_continuous(limits = x_limits)
```

### Infected with z2.3 SB treated vs. z2.3 untreated

```{r volcano_z23sb_z23nosb}
## The original plot
hs_macr_table$plots$z23sb_vs_z23nosb$deseq_vol_plots

z23sb_vs_z23nosb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23sb_vs_z23nosb"]], "z23sb_vs_z23nosb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_low = plot_colors[["infsbz23"]], color_high = plot_colors[["infz23"]])
z23sb_vs_z23nosb_volcano$plot +
  scale_x_continuous(limits = x_limits)
```

### Infected with z2.2 SB treated vs. z2.2 untreated

```{r volcano_z22sb_z22nosb}
## The original plot
hs_macr_table$plots$z22sb_vs_z22nosb$deseq_vol_plots

z22sb_vs_z22nosb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z22sb_vs_z22nosb"]], "z22sb_vs_z22nosb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_low = plot_colors[["infsbz22"]], color_high = plot_colors[["infz22"]])
z22sb_vs_z22nosb_volcano$plot +
  scale_x_continuous(limits = x_limits)
```

### Infected with z2.3 SB treated vs. uninfected treated

```{r volcano_z23sb_uninfnosb}
## The original plot
hs_macr_table$plots$z23sb_vs_sb$deseq_vol_plots

z23sb_vs_sb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z23sb_vs_sb"]], "z23sb_vs_sb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_low = plot_colors[["infsbz23"]], color_high = plot_colors[["uninfsbnone"]])
z23sb_vs_sb_volcano$plot +
  scale_x_continuous(limits = x_limits)
```

### Infected with z2.2 SB treated vs. uninfected treated

```{r volcano_z22sb_uninfnosb}
## The original plot
hs_macr_table$plots$z22sb_vs_sb$deseq_vol_plots

z22sb_vs_sb_volcano <- plot_volcano_condition_de(
  hs_macr_table[["data"]][["z22sb_vs_sb"]], "z22sb_vs_sb",
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  label = 10, label_column = "hgncsymbol", invert = TRUE,
  color_low = plot_colors[["infsbz22"]], color_high = plot_colors[["uninfsbnone"]])
z22sb_vs_sb_volcano$plot +
  scale_x_continuous(limits = x_limits)
```





Check that my perception of the number of significant up/down genes
matches what the table/venn says.

```{r check_sig_venn01}
shared <- Vennerable::Venn(list("drug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23sb_vs_uninf"]]),
                                "nodrug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23nosb_vs_uninf"]])))
pp(file="images/z23_vs_uninf_venn_up.png")
Vennerable::plot(shared)
dev.off()
Vennerable::plot(shared)
## I see 910 z23sb/uninf and 670 z23nosb/uninf genes in the venn diagram.
length(shared@IntersectionSets[["10"]]) + length(shared@IntersectionSets[["11"]])
dim(hs_macr_sig[["deseq"]][["ups"]][["z23sb_vs_uninf"]])

shared <- Vennerable::Venn(list("drug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z22sb_vs_uninf"]]),
                                "nodrug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z22nosb_vs_uninf"]])))
pp(file="images/z22_vs_uninf_venn_up.png")
Vennerable::plot(shared)
dev.off()
Vennerable::plot(shared)

length(shared@IntersectionSets[["10"]]) + length(shared@IntersectionSets[["11"]])
dim(hs_macr_sig[["deseq"]][["ups"]][["z22sb_vs_uninf"]])
```
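
The check above relies on the fact that the drug-only region plus the shared region must add up to the full drug set. A base-R sketch of that bookkeeping, using invented gene IDs, follows; the region labels "10", "11", and "01" in the comments simply mirror the IntersectionSets names used in the chunk above.

```r
## Hypothetical up-regulated gene sets.
toy_drug_up <- c("g1", "g2", "g3", "g4", "g5")
toy_nodrug_up <- c("g4", "g5", "g6")

toy_only_drug <- setdiff(toy_drug_up, toy_nodrug_up)    ## region "10"
toy_both <- intersect(toy_drug_up, toy_nodrug_up)       ## region "11"
toy_only_nodrug <- setdiff(toy_nodrug_up, toy_drug_up)  ## region "01"

## The two regions touching the first set must sum to the size of that set.
stopifnot(length(toy_only_drug) + length(toy_both) == length(toy_drug_up))
## Likewise for the second set.
stopifnot(length(toy_only_nodrug) + length(toy_both) == length(toy_nodrug_up))
```

If either stopifnot() ever fails on real data, the most likely culprit is duplicated row names in one of the input sets.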

*Note to self*: There is an error in my volcano plot code which takes
effect when the numerator and denominator of the all_pairwise
contrasts are different than those in combine_de_tables.  It is
putting the ups/downs on the correct sides of the plot, but calling
the down genes 'up' and vice-versa.  The reason for this is that I did
a check for this happening, but used the wrong argument to handle it.
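
To illustrate why the orientation matters, here is a small sketch with invented log fold changes. Swapping the numerator and denominator of a contrast negates every logFC, so the 'up' and 'down' calls trade places; call_direction() is a hypothetical helper written for this example and is not the function used by my plotting code.

```r
## Invented log2 fold changes for the contrast a/b.
toy_a_vs_b <- c(g1 = 2.0, g2 = -1.5, g3 = 0.2)
## The reversed contrast b/a has exactly the opposite sign.
toy_b_vs_a <- -1 * toy_a_vs_b

## Hypothetical helper: label each gene by direction given a |logFC| cutoff.
call_direction <- function(lfc, cutoff = 1.0) {
  ifelse(lfc >= cutoff, "up",
         ifelse(lfc <= -cutoff, "down", "insignificant"))
}

call_direction(toy_a_vs_b)
call_direction(toy_b_vs_a)
```

Any code which inverts a contrast therefore needs to flip the labels and the plotted side together, or the two will disagree.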

A likely bit of text for these volcano plots:

The set of genes differentially expressed between the zymodeme 2.3
and uninfected samples without drug treatment was quantified with
DESeq2 and included surrogate variable estimates from SVA.  Given the
significance criteria of |logFC| >= 1.0 and false discovery rate
adjusted p-value <= 0.05, 670 genes were observed as significantly
increased in the infected samples relative to the uninfected samples
and 386 were observed as decreased. The most increased genes relative
to the uninfected samples include some which are potentially
indicative of a strong innate immune response and the inflammatory
response.

In contrast, when the set of genes differentially expressed between
the zymodeme 2.2 and uninfected samples was visualized, only 7 genes
were observed as decreased and 435 increased.  The inflammatory
response was significantly less apparent in this set, but instead
included genes related to transporter activity and oxidoreductases.

## Direct zymodeme comparisons

An orthogonal comparison to that performed above is to directly
compare the zymodeme 2.3 and 2.2 samples with and without antimonial
treatment.

```{r z22z23_comparison_plots}
z23nosb_vs_z22nosb_volcano <- plot_volcano_de(
  table = hs_macr_table[["data"]][["z23nosb_vs_z22nosb"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc",  label = 10, label_column = "hgncsymbol")
plotly::ggplotly(z23nosb_vs_z22nosb_volcano$plot)

z23sb_vs_z22sb_volcano <- plot_volcano_de(
  table = hs_macr_table[["data"]][["z23sb_vs_z22sb"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc",  label = 10, label_column = "hgncsymbol")
plotly::ggplotly(z23sb_vs_z22sb_volcano$plot)
```

```{r z23nosb_vs_z22nosb_plots}
z23nosb_vs_z22nosb_volcano$plot +
  xlim(-10, 10) +
  ylim(0, 60)

pp(file="images/z23nosb_vs_z22nosb_reactome_up.png", image=all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["REAC"]], height=12, width=9)
## Reactome, zymodeme2.3 without drug vs. zymodeme2.2 without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["KEGG"]]
## KEGG, zymodeme2.3 without drug vs. zymodeme2.2 without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 without drug vs. zymodeme2.2 without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 without drug vs. zymodeme2.2 without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["pvalue_plots"]][["WP"]]
## WikiPathways, zymodeme2.3 without drug vs. zymodeme2.2 without drug, up.
all_gp[["z23nosb_vs_z22nosb_up"]][["interactive_plots"]][["WP"]]

all_gp[["z23nosb_vs_z22nosb_down"]][["pvalue_plots"]][["REAC"]]
## Reactome, zymodeme2.3 without drug vs. zymodeme2.2 without drug, down.
all_gp[["z23nosb_vs_z22nosb_down"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 without drug vs. zymodeme2.2 without drug, down.
all_gp[["z23nosb_vs_z22nosb_down"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 without drug vs. zymodeme2.2 without drug, down.
```

```{r z23_vs_z22sb_plots}
z23sb_vs_z22sb_volcano$plot +
  xlim(-10, 10) +
  ylim(0, 60)

pp(file="images/z23sb_vs_z22sb_reactome_up.png", image=all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["REAC"]], height=12, width=9)
## Reactome, zymodeme2.3 with drug vs. zymodeme2.2 with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["KEGG"]]
## KEGG, zymodeme2.3 with drug vs. zymodeme2.2 with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 with drug vs. zymodeme2.2 with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 with drug vs. zymodeme2.2 with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["pvalue_plots"]][["WP"]]
## WikiPathways, zymodeme2.3 with drug vs. zymodeme2.2 with drug, up.
all_gp[["z23sb_vs_z22sb_up"]][["interactive_plots"]][["WP"]]

all_gp[["z23sb_vs_z22sb_down"]][["pvalue_plots"]][["REAC"]]
## Reactome, zymodeme2.3 with drug vs. zymodeme2.2 with drug, down.
all_gp[["z23sb_vs_z22sb_down"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 with drug vs. zymodeme2.2 with drug, down.
all_gp[["z23sb_vs_z22sb_down"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 with drug vs. zymodeme2.2 with drug, down.
```


```{r z23sb_vs_z22sb_venn}
shared <- Vennerable::Venn(list("drug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23sb_vs_z22sb"]]),
                                "nodrug" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23nosb_vs_z22nosb"]])))
pp(file="images/drug_nodrug_venn_up.png")
Vennerable::plot(shared)
dev.off()
Vennerable::plot(shared)

shared <- Vennerable::Venn(list("drug" = rownames(hs_macr_sig[["deseq"]][["downs"]][["z23sb_vs_z22sb"]]),
                                "nodrug" = rownames(hs_macr_sig[["deseq"]][["downs"]][["z23nosb_vs_z22nosb"]])))
pp(file="images/drug_nodrug_venn_down.png")
Vennerable::plot(shared)
dev.off()
```

A slightly different way of looking at the differences between the two
zymodeme infections is to directly compare the infected samples with
and without drug.  Thus, when a volcano plot showing the comparison of
the zymodeme 2.3 vs. 2.2 samples was plotted, 484 genes were observed
as increased and 422 decreased; these groups include many of the same
inflammatory (up) and membrane (down) genes.

Similar patterns were observed when the antimonial was included.
Thus, when Venn diagrams of the increased and decreased gene sets
were plotted, a substantial number of genes were observed as
increased (313) and decreased (244) in both the untreated and
antimonial treated samples.

## Drug effects on each zymodeme infection

Another likely question is to directly compare the treated vs
untreated samples for each zymodeme infection in order to visualize
the effects of antimonial.

```{r z23drug_z23nodrug_z22drug_z22nodrug_plots}
z23sb_vs_z23nosb_volcano <- plot_volcano_de(
  table = hs_macr_table[["data"]][["z23sb_vs_z23nosb"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc",  label = 10, label_column = "hgncsymbol")
plotly::ggplotly(z23sb_vs_z23nosb_volcano$plot)
z22sb_vs_z22nosb_volcano <- plot_volcano_de(
  table = hs_macr_table[["data"]][["z22sb_vs_z22nosb"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc",  label = 10, label_column = "hgncsymbol")
plotly::ggplotly(z22sb_vs_z22nosb_volcano$plot)
```

```{r z23sb_vs_z23nosb_plots}
z23sb_vs_z23nosb_volcano$plot +
  xlim(-8, 8) +
  ylim(0, 210)

pp(file="images/z23sb_vs_z23nosb_reactome_up.png",
   image=all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["REAC"]], height=12, width=9)
## Reactome, zymodeme2.3 with drug vs. zymodeme2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["KEGG"]]
## KEGG, zymodeme2.3 with drug vs. zymodeme2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 with drug vs. zymodeme2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 with drug vs. zymodeme2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["pvalue_plots"]][["WP"]]
## WikiPathways, zymodeme2.3 with drug vs. zymodeme2.3 without drug, up.
all_gp[["z23sb_vs_z23nosb_up"]][["interactive_plots"]][["WP"]]

all_gp[["z23sb_vs_z23nosb_down"]][["pvalue_plots"]][["REAC"]]
## Reactome, zymodeme2.3 with drug vs. zymodeme2.3 without drug, down.
all_gp[["z23sb_vs_z23nosb_down"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.3 with drug vs. zymodeme2.3 without drug, down.
all_gp[["z23sb_vs_z23nosb_down"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.3 with drug vs. zymodeme2.3 without drug, down.
```

```{r z22sb_vs_z22nosb_plots}
z22sb_vs_z22nosb_volcano$plot +
  xlim(-8, 8) +
  ylim(0, 210)

pp(file="images/z22sb_vs_z22nosb_reactome_up.png",
   image=all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["REAC"]], height=12, width=9)
## Reactome, zymodeme2.2 with drug vs. zymodeme2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["KEGG"]]
## KEGG, zymodeme2.2 with drug vs. zymodeme2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.2 with drug vs. zymodeme2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.2 with drug vs. zymodeme2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["pvalue_plots"]][["WP"]]
## WikiPathways, zymodeme2.2 with drug vs. zymodeme2.2 without drug, up.
all_gp[["z22sb_vs_z22nosb_up"]][["interactive_plots"]][["WP"]]

all_gp[["z22sb_vs_z22nosb_down"]][["pvalue_plots"]][["REAC"]]
## Reactome, zymodeme2.2 with drug vs. zymodeme2.2 without drug, down.
all_gp[["z22sb_vs_z22nosb_down"]][["pvalue_plots"]][["MF"]]
## MF, zymodeme2.2 with drug vs. zymodeme2.2 without drug, down.
all_gp[["z22sb_vs_z22nosb_down"]][["pvalue_plots"]][["TF"]]
## TF, zymodeme2.2 with drug vs. zymodeme2.2 without drug, down.
```

```{r z22sb_vs_z22nosb_venns}
shared <- Vennerable::Venn(list("z23" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z23sb_vs_z23nosb"]]),
                                "z22" = rownames(hs_macr_sig[["deseq"]][["ups"]][["z22sb_vs_z22nosb"]])))
pp(file="images/z23_z22_drug_venn_up.png")
Vennerable::plot(shared)
dev.off()
Vennerable::plot(shared)

shared <- Vennerable::Venn(list("z23" = rownames(hs_macr_sig[["deseq"]][["downs"]][["z23sb_vs_z23nosb"]]),
                                "z22" = rownames(hs_macr_sig[["deseq"]][["downs"]][["z22sb_vs_z22nosb"]])))
pp(file="images/z23_z22_drug_venn_down.png")
Vennerable::plot(shared)
dev.off()
Vennerable::plot(shared)
```
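
The Venn diagrams above compare the rownames of the up and down tables for the two drug contrasts. The same overlaps can be tabulated directly with base R set operations, which is handy when the actual gene lists are wanted and not just the counts. This is a minimal sketch: the gene IDs are made-up stand-ins for the `rownames()` of the `hs_macr_sig` tables.

```r
## Hypothetical gene IDs standing in for rownames() of the significant tables.
z23_up <- c("geneA", "geneB", "geneC", "geneD")
z22_up <- c("geneC", "geneD", "geneE")

## Split the union of the two sets into the three regions of a 2-way Venn.
overlap <- list(
  "shared" = intersect(z23_up, z22_up),
  "z23_only" = setdiff(z23_up, z22_up),
  "z22_only" = setdiff(z22_up, z23_up))
## Number of genes in each region.
overlap_counts <- lengths(overlap)
overlap_counts
```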

Note: I am setting the x- and y-axis boundaries by allowing the plotter
to pick its own axes the first time, writing down the ranges I
observe, and then setting both plots to the larger of each pair.  It is
therefore possible that I missed one or more genes which lie outside
that range.
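
One way to avoid missing genes is to compute the shared limits from the tables themselves instead of reading them off the plots. The following is a sketch under assumptions: `shared_limits()` is a hypothetical helper (not part of hpgltools), the toy tables only borrow the `deseq_logfc` and `deseq_adjp` column names used elsewhere in this document, and I am assuming the volcano y-axis is -log10 of the adjusted p-value.

```r
## Toy DE tables; the values are made up for illustration.
z23_table <- data.frame(
  deseq_logfc = c(-6.5, 0.2, 3.1),
  deseq_adjp = c(1e-180, 0.5, 1e-20))
z22_table <- data.frame(
  deseq_logfc = c(-2.0, 7.4, 1.1),
  deseq_adjp = c(1e-50, 3e-205, 0.9))

## Hypothetical helper: find symmetric x limits and a y limit covering every table.
shared_limits <- function(tables, fc_col = "deseq_logfc", p_col = "deseq_adjp") {
  max_fc <- max(vapply(tables, function(x) {
    max(abs(x[[fc_col]]), na.rm = TRUE)
  }, numeric(1)))
  max_y <- max(vapply(tables, function(x) {
    y <- -log10(x[[p_col]])
    ## Adjusted p-values of exactly 0 become Inf; ignore them when choosing the limit.
    max(y[is.finite(y)], na.rm = TRUE)
  }, numeric(1)))
  list(
    "x" = c(-ceiling(max_fc), ceiling(max_fc)),
    "y" = c(0, ceiling(max_y)))
}

limits <- shared_limits(list(z23_table, z22_table))
limits
```

The resulting vectors could then be handed to `xlim(limits[["x"]])` and `ylim(limits[["y"]])` on each volcano plot so that both share the same range.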

The earlier plotted contrasts sought to show changes between the two
strains z2.3 and z2.2.  Conversely, the volcano plots immediately above
directly compare each strain before/after drug treatment.

## LRT of the Human Macrophage

```{r lrt_tmrc2_macr}
tmrc2_lrt_strain_drug <- deseq_lrt(hs_macr, interactor_column = "drug",
                                   interest_column = "macrophagezymodeme", factors = c("drug", "macrophagezymodeme"))
tmrc2_lrt_strain_drug$cluster_data$plot
```

## Parasite

```{r lp_de}
lp_macrophage_de <- all_pairwise(lp_macrophage_nosb,
                                 model_batch="svaseq", filter=TRUE)
tmrc2_parasite_keepers <- list(
  "z23_vs_z22" = c("z23", "z22"))
lp_macrophage_table <- combine_de_tables(
  lp_macrophage_de, keepers = tmrc2_parasite_keepers,
  excel = glue("analyses/macrophage_de/macrophage_parasite_infection_de-v{ver}.xlsx"))
lp_macrophage_sig <- extract_significant_genes(
  lp_macrophage_table,
  excel = glue("analyses/macrophage_de/macrophage_parasite_sig-v{ver}.xlsx"))

lp_macrophage_table[["plots"]][["z23_vs_z22"]][["deseq_vol_plots"]][["plot"]]

up_genes <- lp_macrophage_sig[["deseq"]][["ups"]][[1]]
dim(up_genes)
down_genes <- lp_macrophage_sig[["deseq"]][["downs"]][[1]]
dim(down_genes)
```

```{r parasite_volcano}
lp_z23sb_vs_z22sb_volcano <- plot_volcano_de(
  table = lp_macrophage_table[["data"]][["z23_vs_z22"]],
  fc_col = "deseq_logfc", p_col = "deseq_adjp",
  shapes_by_state = FALSE, color_by = "fc", label = 10, label_column = "hgncsymbol")
plotly::ggplotly(lp_z23sb_vs_z22sb_volcano$plot)
lp_z23sb_vs_z22sb_volcano$plot
```

```{r goseq_lp}
up_goseq <- simple_goseq(up_genes, go_db = lp_go, length_db = lp_lengths)
## View categories over represented in the 2.3 samples
up_goseq$pvalue_plots$bpp_plot_over
down_goseq <- simple_goseq(down_genes, go_db = lp_go, length_db = lp_lengths)
## View categories over represented in the 2.2 samples
down_goseq$pvalue_plots$bpp_plot_over

written_goseq <- write_goseq_data(up_goseq,
                                  excel = glue("lp_macrophage_increased_z2.3_goseq-v{ver}.xlsx"))
written_goseq <- write_goseq_data(down_goseq,
                                  excel = glue("lp_macrophage_increased_z2.2_goseq-v{ver}.xlsx"))
```

# GSVA

```{r gsva}
hs_infected <- subset_expt(hs_macrophage, subset="macrophagetreatment!='uninf'") %>%
  subset_expt(subset="macrophagetreatment!='uninf_sb'")
hs_gsva_c2 <- simple_gsva(hs_infected)
hs_gsva_c2_meta <- get_msigdb_metadata(hs_gsva_c2, msig_xml="reference/msigdb_v7.2.xml")
hs_gsva_c2_sig <- get_sig_gsva_categories(hs_gsva_c2_meta, excel = "analyses/macrophage_de/hs_macrophage_gsva_c2_sig.xlsx")
hs_gsva_c2_sig$raw_plot

hs_gsva_c7 <- simple_gsva(hs_infected, signature_category = "c7")
hs_gsva_c7_meta <- get_msigdb_metadata(hs_gsva_c7, msig_xml="reference/msigdb_v7.2.xml")
hs_gsva_c7_sig <- get_sig_gsva_categories(hs_gsva_c7_meta, excel = "analyses/macrophage_de/hs_macrophage_gsva_c7_sig.xlsx")
hs_gsva_c7_sig$raw_plot
```

# Try out a new tool

Two reasons: Najib loves him some PCA, and this uses WikiPathways, which I think is neat.

Ok, I spent some time looking through the code and I have some
problems with some of the design decisions.

Most importantly, it requires a data.frame() which has the following format:

1.  No rownames; instead column #1 is the sample ID.
2.  Columns 2 to m are the categorical/survival/etc. metrics.
3.  Columns m+1 to n are 1 gene-per-column with log2 values.

But when I think about it, I think I get the idea: they want to be able to do modelling stuff
more easily with response factors.
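
To make that layout concrete, here is a small base R sketch that reshapes a genes-by-samples matrix plus a metadata table into the wide, one-row-per-sample form described above. Everything in it is a toy assumption: the gene symbols, sample IDs, and values are made up, and only the `macrophagezymodeme` column name is borrowed from this document.

```r
## Toy log2 expression matrix: genes in rows, samples in columns (made-up values).
toy_exprs <- matrix(
  c(5.1, 6.2, 4.8, 7.0,
    2.3, 2.9, 3.1, 2.2,
    9.4, 8.8, 9.1, 9.9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("TNF", "IL6", "ACTB"), c("s1", "s2", "s3", "s4")))
## Toy sample annotations keyed by sample ID.
toy_meta <- data.frame(
  macrophagezymodeme = c("z22", "z22", "z23", "z23"),
  row.names = c("s1", "s2", "s3", "s4"))

## Transpose so that each gene becomes a column and each sample a row.
gene_cols <- as.data.frame(t(toy_exprs))
## Column 1: sample ID; then the response column(s); then one column per gene.
wide <- data.frame(
  SampleID = rownames(gene_cols),
  toy_meta[rownames(gene_cols), , drop = FALSE],
  gene_cols)
## The sample IDs now live in a column, so drop the rownames.
rownames(wide) <- NULL
wide
```

Indexing `toy_meta` by `rownames(gene_cols)` keeps the metadata rows in the same order as the expression rows, which matters when the two tables come from different sources.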

```{r pathwayPCA}
library(pathwayPCA)
library(rWikiPathways)

downloaded <- downloadPathwayArchive(organism = "Homo sapiens", format = "gmt")
data_path <- system.file("extdata", package = "pathwayPCA")
wikipathways <- read_gmt(paste0(data_path, "/wikipathways_human_symbol.gmt"),
                         description = TRUE)

expt <- subset_expt(hs_macrophage, subset = "macrophagetreatment!='uninf'") %>%
  subset_expt(subset = "macrophagetreatment!='uninf_sb'")
expt <- set_expt_conditions(expt, fact = "macrophagezymodeme")

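## symbol_column is not set in this chunk; it is assumed to already exist in the session
## and to name the fData() column containing gene symbols.
symbol_vector <- fData(expt)[[symbol_column]]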
names(symbol_vector) <- rownames(fData(expt))
symbol_df <- as.data.frame(symbol_vector)

assay_df <- merge(symbol_df, as.data.frame(exprs(expt)), by = "row.names")
assay_df[["Row.names"]] <- NULL
rownames(assay_df) <- make.names(assay_df[["symbol_vector"]], unique = TRUE)
assay_df[["symbol_vector"]] <- NULL
assay_df <- as.data.frame(t(assay_df))
assay_df[["SampleID"]] <- rownames(assay_df)
assay_df <- dplyr::select(assay_df, "SampleID", everything())

factor_df <- as.data.frame(pData(expt))
factor_df[["SampleID"]] <- rownames(factor_df)
factor_df <- dplyr::select(factor_df, "SampleID", everything())
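## factors is not set in this chunk; it is assumed to already exist in the session
## as a character vector of pData() column names to use as the response.
factor_df <- factor_df[, c("SampleID", factors)]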

tt <- CreateOmics(
  assayData_df = assay_df,
  pathwayCollection_ls = wikipathways,
  response = factor_df,
  respType = "categorical",
  minPathSize=5)

super <- AESPCA_pVals(
  object = tt,
  numPCs = 2,
  parallel = FALSE,
  numCores = 8,
  numReps = 2,
  adjustment = "BH")
```

```{r saveme}
## Stopping this because it takes forever
##if (!isTRUE(get0("skip_load"))) {
##  pander::pander(sessionInfo())
##  message("This is hpgltools commit: ", get_git_commit())
##  message("Saving to ", savefile)
##  tmp <- sm(saveme(filename = savefile))
##}
```

```{r loadme_after, eval = FALSE}
tmp <- loadme(filename = savefile)
```
