Previous papers did not perform an explicit subtraction; instead they compared against WT and kept the genes which were higher in delta/het vs. wt. There are multiple ways to deal with this and that query has not yet been defined. Later, Theresa came to the conclusion that the subtraction method is not appropriate.
In this document I hope to explore the freshly processed samples and perform some comparisons to see that we have the expected similarities and differences from the prior analysis performed by Theresa.
There is one way in which I expect any/all of these analyses to be explicitly different: this should include the changes produced by April’s renaming of some samples.
My intention is to produce a sample sheet which includes one column with non-umi-deduplicated results and one with deduplicated results. With the exception of the previous point, I hope that the first will be identical (or at least very close to identical) to Theresa’s result while the second I expect will be subtly different – but I am hoping subtly enough that it will not significantly change the interpretation but be a little more precise.
Let’s see! I therefore need to change my metadata gathering function to include the umi deduplicated result. I am thinking of creating a separate specification for umi-barcoded samples, because looking through the logs for umi stuff when they are not used will be too much of a pain…
I have a couple pictures of RPL22 to help me remember the experimental design:
That second picture came from: (Li et al. (2022))
I would like to improve this document by comparing/contrasting the methodologies used by other groups with those I use here. I never fully appreciated the suite of computational methods applied by previous groups when examining TRAP data; I instead simply followed Theresa’s notebook without considering other possibilities.
I therefore spent a little time stepping through her thesis and pulling out the relevant papers in the hopes of learning these various methods. I should therefore be able soon to compare/contrast the various methods employed by other labs in addition to copying Theresa’s logic.
umi_spec <- make_rnaseq_spec(umi = TRUE)
iprgc_2022_meta <- gather_preprocessing_metadata("sample_sheets/20240606_only_umd_sequenced.xlsx",
spec = umi_spec, species = "mm39_112", verbose = FALSE,
basedir = "preprocessing/umd_sequenced")
## Dropped 1 rows from the sample metadata because the sample ID is blank.
## Warning in extract_metadata(starting_metadata): There were NA values in the condition column, setting them to 'undefined'.
## Warning in extract_metadata(starting_metadata): There were NA values in the condition column, setting them to 'undefined'.
## Warning in dispatch_regex_search(meta, search, replace, input_file_spec, : NAs introduced by coercion
## Warning in dispatch_regex_search(meta, search, replace, input_file_spec, : NAs introduced by coercion
## Warning in readLines(input_handle): line 58 appears to contain an embedded nul
## Warning in readLines(input_handle): line 58 appears to contain an embedded nul
## Warning in readLines(input_handle): line 58 appears to contain an embedded nul
## Warning in readLines(input_handle): line 58 appears to contain an embedded nul
## Warning in readLines(input_handle): line 58 appears to contain an embedded nul
## Warning in readLines(input_handle): line 58 appears to contain an embedded nul
## Writing new metadata to: sample_sheets/20240606_only_umd_sequenced_modified.xlsx
## Deleting the file sample_sheets/20240606_only_umd_sequenced_modified.xlsx before writing the tables.
From this point on, I am hoping/intending to pull liberally from Theresa’s notebook with a diversion to compare the three datasets:
Let’s find out! But first, annotations!
I am pulling this from Theresa’s anxontrapR_pipeline.Rmd, primarily because it looks similar to the other documents, but was modified more recently. I will change it slightly, primarily because I grabbed a new mmusculus assembly and therefore I will pull the mmusculus annotations from a specific biomart (Smedley et al. (2009)) archive that should match it.
## The biomart annotations file already exists, loading from it.
The primary differences between my block and Theresa’s are:
Given that we are excluding a bunch of the older samples, the set of colors I expect to find is different; so I will make explicit here the various colors used to denote location/genotype/time/etc.
April turned me onto this website ‘paletton.com’ for this kind of stuff and I will try and pick out palettes which basically match what I am getting with the original colors.
color_choices <- list(
"all" = list(
"p08_het_dlgn" = "#9ECAE1",
"p15_het_dlgn" = "#9ECAE1",
"p08_het_retina" = "#F46D43",
"p15_het_retina" = "#F46D43",
"p08_het_scn" = "#2CA25F",
"p15_het_scn" = "#2CA25F",
"p08_ko_dlgn" = "#3182BD",
"p15_ko_dlgn" = "#3182BD",
"p08_ko_retina" = "#FDAE61",
"p15_ko_retina" = "#FDAE61",
"p08_ko_scn" = "#006D2C",
"p15_ko_scn" = "#006D2C",
"p08_wt_dlgn" = "#DEEBF7",
"p15_wt_dlgn" = "#DEEBF7",
"p08_wt_retina" = "#D73027",
"p15_wt_retina" = "#D73027",
"p08_wt_scn" = "#66C2A4",
"p15_wt_scn" = "#66C2A4"),
"geno_loc" = list(
"het_dlgn" = "#9ECAE1",
"het_retina" = "#F46D43",
"het_scn" = "#2CA25F",
"ko_dlgn" = "#3182BD",
"ko_retina" = "#FDAE61",
"ko_scn" = "#006D2C",
"wt_dlgn" = "#DEEBF7",
"wt_retina" = "#D73027",
"wt_scn" = "#66C2A4"),
## These colors are coming from ipRGC_summaryplots.html
## I am using kcolorchooser to grab them rather than get confused by the text
"location" = list(
"retina" = "#d73027",
"dlgn" = "#3182bd",
"scn" = "#006b29"),
"genotype" = list(
"wt" = "#D4D4D4",
"het" = "#787878",
"ko" = "#313131"),
"time" = list(
"p08" = "#5E104B",
"p15" = "#4E9231"))
label_column <- "mgisymbol" ## Set the column used to extract gene symbols rather than ENSG.....
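The "all" palette above repeats each genotype/location color once per time point. A minimal sketch of deriving it programmatically, assuming that pattern is intentional; `expand_palette()` is a hypothetical helper and only three of the nine colors are shown:

```r
## Hypothetical helper: prefix every palette name with each time point,
## reusing the same color for all time points.
expand_palette <- function(palette, prefixes) {
  expanded <- list()
  for (name in names(palette)) {
    for (prefix in prefixes) {
      expanded[[paste0(prefix, "_", name)]] <- palette[[name]]
    }
  }
  expanded
}

## A subset of the geno_loc colors defined above.
geno_loc <- c(het_dlgn = "#9ECAE1", het_retina = "#F46D43", het_scn = "#2CA25F")
all_colors <- expand_palette(geno_loc, c("p08", "p15"))
names(all_colors)
```

Building the list this way keeps the two palettes from drifting apart if a color is changed later.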
There is one noteworthy sample: iprgc_103. It was effectively replaced when April renamed the samples, so it exists in the v1 data but not in v2/v3; they instead have the newly named samples which I called iprgc_123 to iprgc_130. As a result, I copied the annotations for iprgc_123 to my column so that there is no discrepancy in terms of genotype/location/time.
mm38_hisat_v1 <- create_expt(iprgc_2022_meta[["new_meta"]],
gene_info = mm_annot,
file_column = "symlink") %>%
set_expt_conditions(fact = "genolocatb") %>%
set_expt_batches(fact = "timeatb") %>%
set_expt_colors(color_choices[["geno_loc"]])
## Reading the sample metadata.
## The sample definitions comprises: 69 rows(samples) and 72 columns(metadata fields).
## Warning in create_expt(iprgc_2022_meta[["new_meta"]], gene_info = mm_annot, : Some samples were removed when cross referencing the samples against the count
## data.
## Matched 25663 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## Saving the expressionset to 'expt.rda'.
## The final expressionset has 25760 features and 61 samples.
## The numbers of samples by condition are:
##
## het_dlgn het_retina het_scn ko_dlgn ko_retina ko_scn wt_dlgn wt_retina wt_scn
## 7 7 7 5 6 5 9 11 4
## The number of samples by batch are:
##
## p08 p15 p60
## 26 32 3
## An expressionSet containing experiment with 25760
## gene and 61 samples. There are 72 metadata columns and
## 15 annotation columns; the primary condition is comprised of:
## het_dlgn, het_retina, het_scn, ko_dlgn, ko_retina, ko_scn, wt_dlgn, wt_retina, wt_scn.
## Its current state is: raw(data).
mm38_hisat_v2 <- create_expt(iprgc_2022_meta[["new_meta"]], gene_info = mm_annot,
file_column = "hisat_count_table") %>%
set_expt_conditions(fact = "genolocatb") %>%
set_expt_batches(fact = "timeatb") %>%
set_expt_colors(color_choices[["geno_loc"]])
## Reading the sample metadata.
## The sample definitions comprises: 69 rows(samples) and 72 columns(metadata fields).
## Warning in create_expt(iprgc_2022_meta[["new_meta"]], gene_info = mm_annot, : Some samples were removed when cross referencing the samples against the count
## data.
## Matched 25404 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## Saving the expressionset to 'expt.rda'.
## The final expressionset has 25425 features and 68 samples.
## The numbers of samples by condition are:
##
## het_dlgn het_retina het_scn ko_dlgn ko_retina ko_scn wt_dlgn wt_retina wt_scn
## 7 7 7 6 6 6 11 11 7
## The number of samples by batch are:
##
## p08 p15 p60
## 31 34 3
## An expressionSet containing experiment with 25425
## gene and 68 samples. There are 72 metadata columns and
## 15 annotation columns; the primary condition is comprised of:
## het_dlgn, het_retina, het_scn, ko_dlgn, ko_retina, ko_scn, wt_dlgn, wt_retina, wt_scn.
## Its current state is: raw(data).
mm38_hisat_v3 <- create_expt(iprgc_2022_meta[["new_meta"]], gene_info = mm_annot,
file_column = "umi_dedup_output_count") %>%
set_expt_conditions(fact = "genolocatb") %>%
set_expt_batches(fact = "timeatb") %>%
set_expt_colors(color_choices[["geno_loc"]])
## Reading the sample metadata.
## The sample definitions comprises: 69 rows(samples) and 72 columns(metadata fields).
## Warning in create_expt(iprgc_2022_meta[["new_meta"]], gene_info = mm_annot, : Some samples were removed when cross referencing the samples against the count
## data.
## Matched 25404 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## Saving the expressionset to 'expt.rda'.
## The final expressionset has 25425 features and 68 samples.
## The numbers of samples by condition are:
##
## het_dlgn het_retina het_scn ko_dlgn ko_retina ko_scn wt_dlgn wt_retina wt_scn
## 7 7 7 6 6 6 11 11 7
## The number of samples by batch are:
##
## p08 p15 p60
## 31 34 3
## An expressionSet containing experiment with 25425
## gene and 68 samples. There are 72 metadata columns and
## 15 annotation columns; the primary condition is comprised of:
## het_dlgn, het_retina, het_scn, ko_dlgn, ko_retina, ko_scn, wt_dlgn, wt_retina, wt_scn.
## Its current state is: raw(data).
all_fact <- paste0(pData(mm38_hisat_v3)[["timeatb"]], "_",
pData(mm38_hisat_v3)[["genolocatb"]])
pData(mm38_hisat_v3)[["time_geno_loc"]] <- all_fact
Note that at the end of the previous block I created a factor out of the combination of time, genotype, and location. In a future invocation of this notebook, I will change the pairwise comparisons to add each of these three factors to the statistical model instead of this. The code to do that is not quite ready yet.
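To make the difference between the two approaches concrete, here is a minimal base R sketch using made-up metadata (the `meta` data frame and its values are hypothetical, not the real sample sheet). It contrasts a design built from the single combined factor with an additive design that has one term per factor:

```r
## Hypothetical metadata for eight samples.
meta <- data.frame(
  time = factor(c("p08", "p08", "p15", "p15", "p08", "p15", "p08", "p15")),
  genotype = factor(c("wt", "het", "wt", "het", "wt", "het", "het", "wt")),
  location = factor(c("retina", "retina", "retina", "retina",
                      "dlgn", "dlgn", "dlgn", "dlgn")))

## Combined factor, analogous to the time_geno_loc column created above.
meta[["time_geno_loc"]] <- factor(paste(meta[["time"]], meta[["genotype"]],
                                        meta[["location"]], sep = "_"))

## One coefficient per observed time/genotype/location combination.
combined_design <- model.matrix(~ 0 + time_geno_loc, data = meta)

## Intercept plus one coefficient per non-reference level of each factor.
additive_design <- model.matrix(~ time + genotype + location, data = meta)

ncol(combined_design)
ncol(additive_design)
```

The combined design needs a coefficient for every group, while the additive design estimates far fewer parameters but assumes the three effects do not interact.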
Let’s look at the number of non-zero genes for all samples versus the coverage.
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## The colors used in the expressionset are: #9ECAE1, #F46D43, #2CA25F, #3182BD, #FDAE61, #006D2C, #DEEBF7, #D73027, #66C2A4.
## The following samples have less than 16744 genes.
## [1] "iprgc_62" "iprgc_63" "iprgc_64" "iprgc_65" "iprgc_66" "iprgc_67" "iprgc_68" "iprgc_70" "iprgc_71" "iprgc_72" "iprgc_73" "iprgc_74"
## [13] "iprgc_75" "iprgc_77" "iprgc_78" "iprgc_79" "iprgc_80" "iprgc_81" "iprgc_82" "iprgc_83" "iprgc_84" "iprgc_85" "iprgc_86" "iprgc_88"
## [25] "iprgc_93" "iprgc_95" "iprgc_96" "iprgc_98" "iprgc_100" "iprgc_105" "iprgc_106" "iprgc_107" "iprgc_108" "iprgc_111" "iprgc_117"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## A non-zero genes plot of 61 samples.
## These samples have an average 15.13 CPM coverage and 16229 genes observed, ranging from 13031 to
## 17347.
## Warning: ggrepel: 27 unlabeled data points (too many overlaps). Consider increasing max.overlaps
## The following samples have less than 16526.25 genes.
## [1] "iprgc_62" "iprgc_63" "iprgc_64" "iprgc_66" "iprgc_67" "iprgc_68" "iprgc_70" "iprgc_71" "iprgc_72" "iprgc_73" "iprgc_74" "iprgc_75"
## [13] "iprgc_77" "iprgc_78" "iprgc_80" "iprgc_81" "iprgc_82" "iprgc_83" "iprgc_84" "iprgc_85" "iprgc_86" "iprgc_87" "iprgc_88" "iprgc_89"
## [25] "iprgc_90" "iprgc_91" "iprgc_92" "iprgc_93" "iprgc_94" "iprgc_95" "iprgc_96" "iprgc_97" "iprgc_98" "iprgc_100" "iprgc_102" "iprgc_104"
## [37] "iprgc_105" "iprgc_106" "iprgc_107" "iprgc_108" "iprgc_110" "iprgc_111" "iprgc_112" "iprgc_113" "iprgc_114" "iprgc_115" "iprgc_117" "iprgc_118"
## [49] "iprgc_121" "iprgc_123" "iprgc_124" "iprgc_125" "iprgc_126" "iprgc_127" "iprgc_128" "iprgc_129" "iprgc_130"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## A non-zero genes plot of 68 samples.
## These samples have an average 13.7 CPM coverage and 15744 genes observed, ranging from 13692 to
## 17083.
## Warning: ggrepel: 15 unlabeled data points (too many overlaps). Consider increasing max.overlaps
## The following samples have less than 16526.25 genes.
## [1] "iprgc_62" "iprgc_63" "iprgc_64" "iprgc_66" "iprgc_67" "iprgc_68" "iprgc_70" "iprgc_71" "iprgc_72" "iprgc_73" "iprgc_74" "iprgc_75"
## [13] "iprgc_77" "iprgc_78" "iprgc_80" "iprgc_81" "iprgc_82" "iprgc_83" "iprgc_84" "iprgc_85" "iprgc_86" "iprgc_87" "iprgc_88" "iprgc_89"
## [25] "iprgc_90" "iprgc_91" "iprgc_92" "iprgc_93" "iprgc_94" "iprgc_95" "iprgc_96" "iprgc_97" "iprgc_98" "iprgc_100" "iprgc_102" "iprgc_104"
## [37] "iprgc_105" "iprgc_106" "iprgc_107" "iprgc_108" "iprgc_110" "iprgc_111" "iprgc_112" "iprgc_113" "iprgc_114" "iprgc_115" "iprgc_117" "iprgc_118"
## [49] "iprgc_119" "iprgc_121" "iprgc_123" "iprgc_124" "iprgc_125" "iprgc_126" "iprgc_127" "iprgc_128" "iprgc_129" "iprgc_130"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## A non-zero genes plot of 68 samples.
## These samples have an average 4.803 CPM coverage and 15787 genes observed, ranging from 13868 to
## 17101.
## Warning: ggrepel: 19 unlabeled data points (too many overlaps). Consider increasing max.overlaps
Oh wow, I did not expect such a profound effect on the cpm values on the more saturated libraries. I guess in retrospect I should have?
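A minimal sketch of why this happens, assuming the "CPM coverage" reported above tracks library size: deduplication removes far more reads from deeply sequenced (saturated) libraries, while the number of genes with at least one read barely moves. The matrices below are made-up toy values, not real counts:

```r
## Toy counts before and after deduplication (genes in rows, samples in columns).
gene_ids <- paste0("gene", 1:4)
sample_ids <- c("sample_a", "sample_b")
raw_counts <- matrix(c(100, 0, 50, 4000,
                       10, 0, 20, 900),
                     nrow = 4, dimnames = list(gene_ids, sample_ids))
dedup_counts <- matrix(c(40, 0, 30, 600,
                         9, 0, 18, 500),
                       nrow = 4, dimnames = list(gene_ids, sample_ids))

## Per-sample library size and number of observed (non-zero) genes.
summarize_counts <- function(mat) {
  data.frame(libsize = colSums(mat), nonzero = colSums(mat > 0))
}
raw_summary <- summarize_counts(raw_counts)
dedup_summary <- summarize_counts(dedup_counts)

## Fraction of reads surviving deduplication in each sample.
retained <- dedup_summary[["libsize"]] / raw_summary[["libsize"]]
names(retained) <- sample_ids
retained
```

In this toy case the deeper library (sample_a) keeps a much smaller fraction of its reads, yet both samples observe the same genes before and after.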
Also note to self, we are not messing with p60.
## The samples excluded are: iprgc_78, iprgc_79, iprgc_80.
## subset_expt(): There were 61, now there are 58 samples.
## The samples excluded are: iprgc_78, iprgc_79, iprgc_80.
## subset_expt(): There were 68, now there are 65 samples.
## The samples excluded are: iprgc_78, iprgc_79, iprgc_80.
## subset_expt(): There were 68, now there are 65 samples.
v1_norm <- normalize_expt(mm38_hisat_v1, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
## Removing 10970 low-count genes (14790 remaining).
## transform_counts: Found 3225 values equal to 0, adding 1 to the matrix.
v2_norm <- normalize_expt(mm38_hisat_v2, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
## Removing 10298 low-count genes (15127 remaining).
## transform_counts: Found 8465 values equal to 0, adding 1 to the matrix.
v3_norm <- normalize_expt(mm38_hisat_v3, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
## Removing 10156 low-count genes (15269 remaining).
## transform_counts: Found 9347 values equal to 0, adding 1 to the matrix.
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by het_dlgn, het_retina, het_scn, ko_dlgn, ko_retina, ko_scn, wt_dlgn, wt_retina, wt_scn
## Shapes are defined by p08, p15.
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by het_dlgn, het_retina, het_scn, ko_dlgn, ko_retina, ko_scn, wt_dlgn, wt_retina, wt_scn
## Shapes are defined by p08, p15.
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by het_dlgn, het_retina, het_scn, ko_dlgn, ko_retina, ko_scn, wt_dlgn, wt_retina, wt_scn
## Shapes are defined by p08, p15.
To my eyes it looks like we just have 1 weirdo p15 sample? Deduplication had a minor but significant effect on the PCA.
With that in mind, let us look at Theresa’s WORKING document and see what we can recapitulate.
Theresa’s document: The TRAP protocol has some variability which is introduced at different steps including homogenization, antibody labeling, pulldown efficiency/specificity, sample handling during cleanup steps, and library prep/sequencing. We know from Rashmi’s QC that there is variability at the level of pulldown efficiency (amount of RNA isolated). She is doing a good job of keeping track of this for all her samples and we have validated her P8 results (attached supplementary figure 3D). We consistently see clear differences between control and cre samples for the retina, which makes sense because the cell bodies are in the retina. The target tissue differences are smaller, which also makes sense for axon-TRAP. We think that some of her P15 samples are not good based on low amounts of isolated RNA from cre(+) retina samples. We plan to drop these samples and not perform additional isolations at this time point. Based on this (and the general lack of large developmental effects), we were planning to focus on presenting the P8 data only in the paper. Interested to hear your thoughts in this…
My notes: Theresa’s first operations in this notebook were to:
v3_loc_geno <- set_expt_conditions(mm38_hisat_v3, fact = "locationatb",
colors = color_choices[["location"]]) %>%
set_expt_batches(fact = "genotypeatb")
## The numbers of samples by condition are:
##
## dlgn retina scn
## 23 23 19
## The number of samples by batch are:
##
## het ko wt
## 21 18 26
At different times, it appears to me that Theresa has preferred slightly different normalization methods, primarily a mix of TMM and quantile.
Thus I will use different suffix letters to denote various normalizations employed, and if they turn out the same I will pick one arbitrarily.
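For reference, here is a minimal base R sketch of the pieces behind the quantile variant: counts per million, a log2 transform with a pseudocount, and quantile normalization. The toy matrix is made up, `quantile_normalize()` is a hypothetical helper, and the order of operations is an illustrative choice that may not match what normalize_expt() does internally. TMM is not sketched here.

```r
## Toy count matrix: genes in rows, samples in columns (made-up values).
counts <- matrix(c(10, 200, 30, 0,
                   5, 400, 60, 2,
                   20, 100, 90, 8),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))

## Counts per million: divide each column by its library size.
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6
log_cpm <- log2(cpm + 1)

## Quantile normalization: force every sample to share one distribution,
## namely the mean of the sorted values across samples.
quantile_normalize <- function(mat) {
  ranks <- apply(mat, 2, rank, ties.method = "first")
  sorted <- apply(mat, 2, sort)
  reference <- rowMeans(sorted)
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(mat)
  normalized
}
qn <- quantile_normalize(log_cpm)
```

After this step every column of `qn` contains the same set of values, only assigned to different genes, which is why quantile normalization can hide genuine global differences between samples.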
loc_geno_nq <- normalize_expt(v3_loc_geno, transform = "log2", convert = "cpm",
filter = TRUE, norm = "quant")
## Removing 10156 low-count genes (15269 remaining).
## transform_counts: Found 9347 values equal to 0, adding 1 to the matrix.
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by dlgn, retina, scn
## Shapes are defined by het, ko, wt.
## ok, I have two weirdo samples which look very much like they are actually dlgn.
## These are sample IDs iprgc_66 and iprgc_130
loc_geno_nt <- normalize_expt(v3_loc_geno, transform = "log2", convert = "cpm",
filter = TRUE, norm = "tmm")
## Removing 10156 low-count genes (15269 remaining).
## transform_counts: Found 42869 values equal to 0, adding 1 to the matrix.
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by dlgn, retina, scn
## Shapes are defined by het, ko, wt.
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
A random thought about these PCA plots: it might be worthwhile to add a panel below the legend with the sample numbers per condition/batch.
Of course, the same information is provided in a more fun fashion via my silly sankey function:
sample_sankey <- plot_meta_sankey(v3_loc_geno, color_choices = color_choices,
factors = c("genotypeatb", "locationatb", "timeatb"))
sample_sankey
## A sankey plot describing the metadata of 65 samples,
## including 30 out of 0 nodes and traversing metadata factors:
## .
Rashmi came by and we discussed the samples a little. She suggested that it is likely that we will need to exclude the 202205 samples. These may be identified in a few ways, most easily I think via the ‘projectah’ column: they are the 021_1 samples.
My sense was that she concurred with my interpretation of the umi deduplication, so I will continue using the deduplicated results exclusively, at least for now.
One of Theresa’s first checks was wisely for melanopsin. Let us repeat a version of this:
opn4_exprs <- data.frame(combined = pData(loc_geno_nt)[["genolocatb"]],
location = pData(loc_geno_nt)[["locationatb"]],
genotype = pData(loc_geno_nt)[["genotypeatb"]],
opn = exprs(loc_geno_nt)["ENSMUSG00000021799", ])
groupedstats::grouped_summary(opn4_exprs, location, opn)
## # A tibble: 3 × 16
## location skim_type skim_variable missing complete mean sd min p25 median p75 max n std.error mean.conf.low mean.conf.high
## <fct> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 dlgn numeric opn 0 1 0.0849 0.195 0 0 0 0 0.675 23 0.0407 0.000476 0.169
## 2 retina numeric opn 0 1 2.82 2.10 0 1.59 2.14 4.68 6.57 23 0.439 1.91 3.73
## 3 scn numeric opn 0 1 0.0450 0.142 0 0 0 0 0.561 19 0.0326 -0.0234 0.113
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
## Number of labels is greater than default palette color count.
## • Select another color `palette` (and/or `package`).
## Warning in min(x): no non-missing arguments to min; returning Inf
## Warning in max(x): no non-missing arguments to max; returning -Inf
ok, so I plotted the question a bit differently, but got the same answer.
Here is the text of Theresa’s notebook following this analysis:
“Ugh oh, looks like there is at least one retina KO sample that has some melanopsin expression in it. Turns out ipRGC_07 is a bad egg which is supposed to be a KO but has melanopsin expression. It’s friends which were pooled from the same mice are iprgc_06 and iprgc_08, so we need to exclude all these samples.”
I am also seeing some knockout expression with some caveats: I do not have the affected samples in my dataset (iprgc_07) and the levels I am seeing are quite low – I will look in IGV to double check, but I strongly suspect that these are some piddly reads near the UTRs.
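A minimal sketch of how such samples could be flagged before going to IGV, assuming a per-sample table shaped like opn4_exprs. The values and the 0.5 cutoff below are made up for illustration and are not taken from the data:

```r
## Hypothetical per-sample Opn4 expression (log2 scale), retina only.
opn4 <- data.frame(
  sample = paste0("s", 1:6),
  genotype = c("wt", "wt", "het", "het", "ko", "ko"),
  location = "retina",
  opn = c(4.7, 2.1, 6.5, 3.2, 0.0, 0.6))

## Median expression per genotype, for orientation.
group_medians <- aggregate(opn ~ genotype, data = opn4, FUN = median)

## Flag knockout samples whose Opn4 signal exceeds an arbitrary cutoff.
threshold <- 0.5
is_suspect <- opn4[["genotype"]] == "ko" & opn4[["opn"]] > threshold
suspect <- opn4[is_suspect, "sample"]
suspect
```

Any sample flagged this way still needs to be inspected by eye, since a handful of reads near the UTRs can clear a low cutoff without indicating real expression.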
Onward!
Theresa’s next operation was to perform libsize/nonzero plots. I already did the pre/post deduplication nonzero; here is the analogous libsize.
v2 is pre-deduplication and v3 is post.
## Library sizes of 65 samples,
## ranging from 3,717,242 to 24,538,069.
## Library sizes of 65 samples,
## ranging from 1,264,475 to 10,979,038.
I am a bit concerned about some of these library sizes post-deduplication.
Let us look at the relationship between reads and duplication, which I assume will be relatively linear.
test <- pData(mm38_hisat_v3)[, c("hisatgenomesingleall", "umideduppctreads")]
test_plot <- plot_linear_scatter(test)
test_plot[["scatter"]]
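The linearity assumption can also be checked numerically. A minimal sketch with base R's lm(), using made-up values; it assumes the second column is the percentage of reads kept after deduplication, which I have not confirmed:

```r
## Hypothetical values: mapped reads and percent of reads kept after deduplication.
reads <- c(4e6, 8e6, 12e6, 18e6, 24e6)
pct_kept <- c(62, 55, 47, 38, 30)

## Ordinary least squares fit of the percentage on the read count.
fit <- lm(pct_kept ~ reads)
slope <- coef(fit)[["reads"]]
r_squared <- summary(fit)[["r.squared"]]

c(slope_per_million_reads = slope * 1e6, r_squared = r_squared)
```

A high R-squared would support the linear assumption, while visible curvature in the residuals (plot(fit, which = 1)) would suggest saturation instead.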
Theresa also produced a density/sample plot, which might prove quite useful here given the significantly larger variance across samples (due to deduplication).
## Changed 627293 zero count features.
## Density plot describing 65 samples.
There is some difference across sample densities, but it is not too crazytown.
Theresa’s first pca was of log2 cpm values. I might add quantile/tmm to this?
v3_location <- set_expt_conditions(mm38_hisat_v3, fact = "locationatb") %>%
set_expt_batches(fact = "genotypeatb")
## The numbers of samples by condition are:
##
## dlgn retina scn
## 23 23 19
## The number of samples by batch are:
##
## het ko wt
## 21 18 26
v3_location_norm <- normalize_expt(v3_location, filter = TRUE, norm = "quant",
transform = "log2", convert = "cpm")
## Removing 10156 low-count genes (15269 remaining).
## transform_counts: Found 9347 values equal to 0, adding 1 to the matrix.
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by dlgn, retina, scn
## Shapes are defined by het, ko, wt.
Once again we see that samples iprgc_66 and iprgc_130 are likely actually DLGN and not SCN. I am therefore going to add a column to the sample sheet noting this, and remove them from the expressionset.
I will thus replot the data after removing those two. If we want to see what it looks like with the re-attributed locations, we can do so.
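The eyeball check on the PCA can be turned into a rough numeric one. The sketch below runs on simulated data rather than the real expressionset and uses a leave-one-out nearest-centroid rule on the first two principal components; it is an illustrative heuristic, not the method used in this notebook:

```r
set.seed(1)
n_genes <- 50
## Simulated expression: samples in rows here (the transpose of exprs()).
make_group <- function(n, shift) {
  matrix(rnorm(n * n_genes, mean = shift), nrow = n)
}
expr <- rbind(make_group(5, 0), make_group(5, 3), make_group(5, -3))
rownames(expr) <- paste0("s", 1:15)
labels <- rep(c("dlgn", "retina", "scn"), each = 5)
labels[5] <- "scn"  # deliberately mislabel one dlgn-like sample

pcs <- prcomp(expr)$x[, 1:2]

## For each sample, find the closest group centroid computed without that sample.
nearest_label <- vapply(seq_len(nrow(pcs)), function(i) {
  others <- setdiff(seq_len(nrow(pcs)), i)
  dists <- vapply(unique(labels), function(lab) {
    members <- others[labels[others] == lab]
    centroid <- colMeans(pcs[members, , drop = FALSE])
    sqrt(sum((pcs[i, ] - centroid)^2))
  }, numeric(1))
  names(dists)[which.min(dists)]
}, character(1))

flagged <- rownames(pcs)[nearest_label != labels]
flagged
```

Samples whose recorded label disagrees with their nearest centroid are candidates for a closer look; with only a few samples per group this should be treated as a prompt for inspection rather than proof of a swap.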
Theresa has a nice change to the PCA plotter in which she sets the alpha channel as an additional visual cue for a metadata factor…
mm38_hisat_v3 <- subset_expt(mm38_hisat_v3, subset="sampleid!='iprgc_130'") %>%
subset_expt(subset="sampleid!='iprgc_66'")
## The samples excluded are: iprgc_130.
## subset_expt(): There were 65, now there are 64 samples.
## The samples excluded are: iprgc_66.
## subset_expt(): There were 64, now there are 63 samples.
v3_location <- set_expt_conditions(mm38_hisat_v3, fact = "locationatb") %>%
set_expt_batches(fact = "genotypeatb")
## The numbers of samples by condition are:
##
## dlgn retina scn
## 23 23 17
## The number of samples by batch are:
##
## het ko wt
## 20 18 25
v3_location_norm <- normalize_expt(v3_location, filter = TRUE, norm = "quant",
transform = "log2", convert = "cpm")
## Removing 10162 low-count genes (15263 remaining).
## transform_counts: Found 8867 values equal to 0, adding 1 to the matrix.
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by dlgn, retina, scn
## Shapes are defined by het, ko, wt.
removed_sankey <- plot_meta_sankey(v3_location, color_choices = color_choices,
factors = c("genotypeatb", "locationatb", "timeatb"))
removed_sankey
## A sankey plot describing the metadata of 63 samples,
## including 30 out of 0 nodes and traversing metadata factors:
## .
Here is Theresa’s text, recall once again that I do not have some of these older samples (iprgc_62):
PC1 vs PC2 identifies retina vs axon is still the main component of variation. We do see though that in the PC2 direction, we see with the new samples added, we don’t see separation based on axonal targets (dLGN vs SCN). In the PC1 vs PC3 plot, we see that it’s PC3 where we start to see variation correlated with axonal compartment. Let’s look at PC1 vs PC2 colored by batch (when they were processed/sequenced) to see if that is what is contributing so much variation in PC2.
Side note: ipRGC 62 seems like an odd ball. This seems to me like it should have been a dLGN P08 sample. Is there any possibility this got mislabeled early on? I went back and double checked to see if all my processing is correct and it indeed was labeled an SCN P15 from the time I got the samples, and it is indeed.
I now switched to Theresa’s document ‘WORKING_axonTRAP…’ and will start pulling sections from it. I am reasonably certain I have reasonably similar sample distributions, so I presume I can invoke similar/identical calls for DESeq and friends.
In the block immediately before the DE analyses, Theresa created a subset expressionset of only p08 retinas. Thus this initial DE I assume will be used to subtract for the SCN/DLGN analyses that follow. (I guess I could read ahead and find out, but no! I want to be a blank slate)
Theresa’s primary workflow makes heavy use of DESeq2 (Love, Huber, and Anders (2014)) and sva (Leek et al. (2012)). In some (most?) of Theresa’s invocations of the all_pairwise() function, she excludes the other methods that it performs. In this workbook, I left those methods on, thus we can evaluate the relative performance of DESeq2 vs. some (all? I may have disabled EBSeq/dream because they were taking too long) of the following:
## The samples excluded are: iprgc_62, iprgc_63, iprgc_64, iprgc_65, iprgc_67, iprgc_68, iprgc_69, iprgc_70, iprgc_71, iprgc_72, iprgc_73, iprgc_74, iprgc_75, iprgc_76, iprgc_77, iprgc_81, iprgc_82, iprgc_85, iprgc_87, iprgc_88, iprgc_89, iprgc_92, iprgc_93, iprgc_94, iprgc_95, iprgc_96, iprgc_97, iprgc_98, iprgc_99, iprgc_100, iprgc_101, iprgc_102, iprgc_104, iprgc_105, iprgc_106, iprgc_107, iprgc_108, iprgc_110, iprgc_111, iprgc_112, iprgc_113, iprgc_114, iprgc_116, iprgc_119, iprgc_122, iprgc_123, iprgc_124, iprgc_125, iprgc_126, iprgc_127, iprgc_128, iprgc_129.
## subset_expt(): There were 63, now there are 11 samples.
##
## het_retina ko_retina wt_retina
## 3 3 5
## Removing 0 low-count genes (13424 remaining).
## Setting 46 low elements to zero.
## transform_counts: Found 46 values equal to 0, adding 1 to the matrix.
## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 10 comparisons.
The following invocation performed by Theresa filters the wt/het comparison for only those genes which increased by at least 0.25 logFC with a significant adjusted p-value. I assume that this is to use the wt samples as a translational control for the het/ko comparisons; for my purposes, I am therefore thinking that I will separate the contrasts from all_pairwise and do this in a stepwise fashion…
The block of code immediately following Theresa’s all_pairwise() invocation is a little confusing for me and warrants some explanation by me to me in the hopes that I do not misunderstand what is happening and the goals therein.
I think I can safely assume that the goal here is to pull out the IDs which increased in het with respect to wild type; even if by a small margin, as long as it is statistically significant vis a vis the adjusted p-value.
I am going to perform what I think is the same thing in a slightly different fashion so that I can share a copy of the results with whomever is interested. I will also repeat Theresa’s invocation and prove to myself that I understood and got the same answer.
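The two approaches differ mainly in the direction of the contrast: genes that go down in wt vs. het are the same genes that go up in het vs. wt. A minimal sketch with a made-up results table (the column names and values are hypothetical) shows that both filters select the same IDs:

```r
## Hypothetical DE results; logFC is expressed as wt relative to het.
de <- data.frame(
  row.names = paste0("gene", 1:5),
  logFC_wt_vs_het = c(-1.2, -0.3, -0.1, 0.8, -0.5),
  adj_p = c(0.001, 0.04, 0.01, 0.002, 0.20))

## Flipping the contrast only negates the fold change.
de[["logFC_het_vs_wt"]] <- -de[["logFC_wt_vs_het"]]

## Filter on the wt_vs_het contrast: decreased in wt.
wt_vs_het_down <- rownames(de)[de[["logFC_wt_vs_het"]] <= -0.25 &
                               de[["adj_p"]] <= 0.05]

## Filter on the het_vs_wt contrast: increased in het.
het_vs_wt_up <- rownames(de)[de[["logFC_het_vs_wt"]] >= 0.25 &
                             de[["adj_p"]] <= 0.05]

identical(wt_vs_het_down, het_vs_wt_up)
```

Here gene3 fails the fold-change cutoff, gene4 changes in the wrong direction, and gene5 fails the adjusted p-value cutoff, so only gene1 and gene2 survive either way.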
wt_het_keeper <- list("het_vs_wt" = c("hetretina", "wtretina"))
het_wt_table <- combine_de_tables(mm_normal_p8_ret_de, keepers = wt_het_keeper,
label_column = label_column,
excel = "excel/het_retina_control.xlsx")
## Deleting the file excel/het_retina_control.xlsx before writing the tables.
wanted_sig <- extract_significant_genes(het_wt_table,
lfc = 0.25,
according_to = "deseq")
wanted_het_increased <- wanted_sig[["deseq"]][["ups"]][["het_vs_wt"]]
increased_het_genes <- rownames(wanted_het_increased)
Here are Theresa’s next lines:
mm_de_normal_p8_ret <- mm_normal_p8_ret_de
hetkeeper_genes <- mm_de_normal_p8_ret$deseq$all_tables$wtretina_vs_hetretina %>%
filter(logFC <= -0.25 & adj.P.Val <= 0.05)
kokeeper_genes <- mm_de_normal_p8_ret$deseq$all_tables$wtretina_vs_koretina %>%
filter(logFC <= -0.25 & adj.P.Val <= 0.05)
keepergenes <- unique(c(rownames(hetkeeper_genes),
rownames(kokeeper_genes)))
## We know a priori that Opn4 is ENSMUSG00000021799
## I do not expect to see it in this set, it should be higher in wt
## retina vs ko retina by a significant margin.
"ENSMUSG00000021799" %in% keepergenes
## [1] TRUE
I think Rashmi made a compelling point which illustrates why we likely should expect the expression of Opn4 to be significantly higher in the heterozygotes vs wild-type:
This makes me wonder if any normalization methods exist which do something like multiply the values by some value related to the proportion of observed genes; and/or if this is a good/bad/indifferent idea.
Also, just a note for me to remember: it is RPL22, not RPS22. For some reason I keep thinking of the small subunit.
hetkeeper_genes <- mm_normal_p8_ret_de$deseq$all_tables$wtretina_vs_hetretina %>%
filter(logFC <= -0.25 & adj.P.Val <= 0.05)
testthat::expect_true(nrow(hetkeeper_genes) == length(increased_het_genes))
taa_keepers <- sort(rownames(hetkeeper_genes))
atb_keepers <- sort(increased_het_genes)
testthat::expect_equal(taa_keepers, atb_keepers)
Yay! I can read! Now let us repeat for the KO vs. wt comparison.
wt_ko_keeper <- list("ko_vs_wt" = c("koretina", "wtretina"))
ko_wt_table <- combine_de_tables(mm_normal_p8_ret_de, keepers = wt_ko_keeper,
label_column = label_column,
excel = "excel/ko_retina_control.xlsx")
## Deleting the file excel/ko_retina_control.xlsx before writing the tables.
wanted_sig <- extract_significant_genes(ko_wt_table,
lfc = 0.25,
according_to = "deseq")
wanted_ko_increased <- wanted_sig[["deseq"]][["ups"]][["ko_vs_wt"]]
increased_ko_genes <- rownames(wanted_ko_increased)
The next thing performed in Theresa’s document is a unique(concatenation of these two gene groups), thus sucking up every gene which was significantly higher in either the knockout or heterozygous samples with respect to wild-type.
This was followed by a couple of merge operations of a little bit of the annotation data; I am not sure I understand the goal yet…
Here is her code. I copied the annotation ‘mgi_symbol’ column to ‘external_gene_name’ so that I need not change any of her code. I am assuming this is the appropriate column of interest, I do not know this for certain, but it seems quite likely.
While I am at it, here is the set_sig_limma() function from Theresa’s helpers.R
set_sig_limma <- function(limma_tbl, factors = NULL) {
  if (is.null(factors)) {
    #set significance for plotting colors
    limma_tbl$Significance <- NA
    limma_tbl[abs(limma_tbl$logFC) < 1 | limma_tbl$adj.P.Val > .05, "Significance"] <- "Not \nEnriched"
    limma_tbl[limma_tbl$logFC >= 1 & limma_tbl$adj.P.Val <= .05, ][["Significance"]] <- "Disease \nUpregulated"
    limma_tbl[limma_tbl$logFC <= -1 & limma_tbl$adj.P.Val <= .05, ][["Significance"]] <- "Disease \nDownregulated"
    limma_tbl$Significance <- factor(limma_tbl$Significance, levels = c("Upregulated", "Downregulated", "Not \nEnriched"))
  } else {
    limma_tbl$Significance <- NA
    limma_tbl[abs(limma_tbl$logFC) < 1 | limma_tbl$adj.P.Val > .05, "Significance"] <- "Not \nEnriched"
    if (nrow(limma_tbl[limma_tbl$logFC >= 1 & limma_tbl$adj.P.Val <= .05, ]) != 0) {
      limma_tbl[limma_tbl$logFC >= 1 & limma_tbl$adj.P.Val <= .05, ][["Significance"]] <- factors[1]
    }
    if (nrow(limma_tbl[limma_tbl$logFC <= -1 & limma_tbl$adj.P.Val <= .05, ]) != 0) {
      limma_tbl[limma_tbl$logFC <= -1 & limma_tbl$adj.P.Val <= .05, ][["Significance"]] <- factors[2]
    }
    limma_tbl$Significance <- factor(limma_tbl$Significance, levels = c(factors, "Not \nEnriched"))
  }
  return(limma_tbl)
}
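For reference, the same three-way classification can be written more compactly. The following is a sketch only and is not from helpers.R: `classify_sig()` is a hypothetical name, the toy table is invented, and treating a missing logFC or adjusted p-value as "not enriched" is an assumption of this sketch. One detail worth noting in the `factors = NULL` branch above: the labels it assigns ("Disease \nUpregulated", "Disease \nDownregulated") are not among the levels passed to `factor()` on the following line, and `factor()` converts values outside its levels to NA. Both calls to `set_sig_limma()` in this section pass `factors`, so that branch is not exercised here.

```r
## Hypothetical helper (not from helpers.R): same cutoffs as set_sig_limma(),
## written with vectorized assignments.
classify_sig <- function(tbl, labels = c("Up", "Down"), lfc = 1, padj = 0.05,
                         neutral = "Not \nEnriched") {
  fc <- tbl[["logFC"]]
  pv <- tbl[["adj.P.Val"]]
  ## Assumption: rows with missing values are treated as not enriched.
  passes <- !is.na(fc) & !is.na(pv) & pv <= padj
  sig <- rep(neutral, nrow(tbl))
  sig[passes & fc >= lfc] <- labels[1]
  sig[passes & fc <= -lfc] <- labels[2]
  ## The levels are built from the same labels that were assigned.
  tbl[["Significance"]] <- factor(sig, levels = c(labels, neutral))
  tbl
}

## Toy table (values invented).
toy <- data.frame(logFC = c(2.0, -1.5, 0.2, 3.0),
                  adj.P.Val = c(0.01, 0.03, 0.001, 0.20))
toy_sig <- classify_sig(toy, labels = c("Het Enriched", "KO Enriched"))
print(table(toy_sig[["Significance"]]))
```

Building the factor levels from the assigned labels keeps the two from drifting apart, which matters later when a named color vector has to match those levels exactly.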
mm_annot[["external_gene_name"]] <- mm_annot[["mgi_symbol"]]
keepergenes <- unique(c(rownames(hetkeeper_genes), rownames(kokeeper_genes)))
length(keepergenes)
## [1] 3632
annots_to_merge <- mm_annot %>%
select(ensembl_gene_id, external_gene_name) %>%
filter(ensembl_gene_id %in% rownames(mm_de_normal_p8_ret$deseq$all_tables$koretina_vs_hetretina)) %>%
distinct()
mm_de_normal_p8_ret$deseq$all_tables$koretina_vs_hetretina <- merge(
mm_de_normal_p8_ret$deseq$all_tables$koretina_vs_hetretina, annots_to_merge,
by.x = 0, by.y = "ensembl_gene_id", all.x = TRUE)
df <- mm_de_normal_p8_ret$deseq$all_tables$koretina_vs_hetretina %>%
dplyr::mutate(logFC = -logFC) %>%
set_sig_limma(factors = c("Het Enriched", "KO Enriched"))
My version of the above task makes use of the excludes option of combine_de_tables. Given the set of unique gene IDs increased in the het/ko, I can ask to exclude anything not in that set. I could also have more parsimoniously directly excluded any gene ID increased in the wt samples. But Theresa already provided the code to do the former, so it will be less typing (and less opportunity for silly mistakes) to just do that.
both_increased_genes <- unique(c(increased_het_genes, increased_ko_genes))
## arbitrarily grab all genes from one of my data structures.
all_genes <- rownames(exprs(mm38_hisat_v3))
exclude_idx <- all_genes %in% both_increased_genes
summary(exclude_idx)
## Mode FALSE TRUE
## logical 21793 3632
exclude_increased_genes <- all_genes[exclude_idx]
retina_keepers <- list(
"het_vs_wt" = c("hetretina", "wtretina"),
"ko_vs_wt" = c("koretina", "wtretina"),
"ko_vs_het" = c("koretina", "hetretina"))
## A reminder to myself: there is also a parameter 'wanted_genes'
## which does effectively the same thing as excludes in this context;
## excludes was originally written to allow flexible, keyword-based
## exclusion.
p8_retina_tables <- combine_de_tables(
mm_normal_p8_ret_de, keepers = retina_keepers,
wanted_genes = both_increased_genes, label_column = label_column,
excel = glue("excel/p8_retina_kept_genes_increased_in_wt_tables-v{ver}.xlsx"))
## Deleting the file excel/p8_retina_kept_genes_increased_in_wt_tables-v20240917.xlsx before writing the tables.
p8_retina_sig <- extract_significant_genes(
p8_retina_tables,
excel = glue("excel/p8_retina_kept_genes_increased_in_wt_sig-v{ver}.xlsx"),
according_to = "deseq")
## Deleting the file excel/p8_retina_kept_genes_increased_in_wt_sig-v20240917.xlsx before writing the tables.
opposite_p8_retina_tables <- combine_de_tables(
mm_normal_p8_ret_de, keepers = retina_keepers,
excludes = both_increased_genes, label_column = label_column,
excel = glue("excel/p8_retina_removed_genes_increased_in_wt_tables-v{ver}.xlsx"))
## Deleting the file excel/p8_retina_removed_genes_increased_in_wt_tables-v20240917.xlsx before writing the tables.
opposite_p8_retina_sig <- extract_significant_genes(
opposite_p8_retina_tables,
excel = glue("excel/p8_retina_removed_genes_increased_in_wt_sig-v{ver}.xlsx"),
according_to = "deseq")
## Deleting the file excel/p8_retina_removed_genes_increased_in_wt_sig-v20240917.xlsx before writing the tables.
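The keep/remove split used above comes down to partitioning the gene universe by membership in one ID vector. A minimal sketch in base R with invented IDs follows; how combine_de_tables applies `wanted_genes` and `excludes` internally is not shown in this document, so this only illustrates the set logic, and the variable names are hypothetical.

```r
## Toy universe and a toy "increased in het or ko" set (IDs invented).
all_ids <- c("g1", "g2", "g3", "g4", "g5")
increased_ids <- c("g2", "g5")

## Analogous to the wanted_genes tables: keep only the listed IDs.
kept <- all_ids[all_ids %in% increased_ids]
## Analogous to the excludes tables: everything except the listed IDs.
removed <- setdiff(all_ids, increased_ids)

## The two pieces should cover the universe with no overlap.
stopifnot(length(intersect(kept, removed)) == 0,
          setequal(c(kept, removed), all_ids))
print(kept)
print(removed)
```

The `summary(exclude_idx)` output above is the same kind of check on the real data: the FALSE and TRUE counts together account for every gene in the universe.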
The following is a copy/paste from Theresa containing the remaining tasks she performed and will provide the template for implementation of the final tasks.
This picks up with the lines from her notebook immediately following the invocation of ‘set_sig_limma(factors = c(“Het Enriched” …’.
For all of the remaining blocks I will copy in her code, turn off its evaluation, run the blocks manually, compare them to her notebook output, then enable each block as I ensure I understand it.
I will likely therefore introduce some small formatting changes and add some additional GSEA/enrichment tasks once the non-specific filtering is complete.
df <- df %>%
filter(Row.names %in% keepergenes)
labels_ups <- df %>%
filter(adj.P.Val <= 0.05 & abs(logFC) > 1) %>%
arrange(logFC) %>%
head(n = 9)
labels_downs <- df %>%
filter(adj.P.Val <= 0.05 & abs(logFC) > 1) %>%
arrange(-logFC) %>%
head(n = 11)
labels <- rbind(labels_ups, labels_downs)
res_tbl <- df
DEplot <- ggplot(res_tbl, aes(x = logFC, y = -log10(adj.P.Val), label = external_gene_name)) +
geom_point(aes(colour = Significance), size = 4) +
geom_vline(xintercept = c(-1,1)) +
geom_hline(yintercept = -log10(.05)) +
theme_classic(base_size = 20) +
xlab("log2(FC)") +
ylab("-log10(p-value)") +
theme(legend.position = "right") +
scale_color_manual(values = c("#F8766D", "#00BFC4", "Grey")) +
geom_label_repel(
data = filter(df,
## c('s5_het_dlgn', 's5_het_ret', 's5_het_scn')),
external_gene_name %in% labels$external_gene_name),
## nudge_x = -0.5,
nudge_y = 3, max.overlaps = 15) +
xlim(c(-3, 6))
setPDF()
## Error in setPDF(): could not find function "setPDF"
## Warning: Removed 2 rows containing missing values or values outside the scale range (`geom_point()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range (`geom_label_repel()`).
## png
## 2
## Warning: Removed 2 rows containing missing values or values outside the scale range (`geom_point()`).
## Removed 2 rows containing missing values or values outside the scale range (`geom_label_repel()`).
## [1] 21
## [1] 69
alldysregulated_genes <- res_tbl %>%
filter(adj.P.Val <= 0.05) %>%
arrange(logFC) %>%
select(Row.names, logFC, adj.P.Val, external_gene_name, Significance) %>%
filter(abs(logFC) >= 1)
## gsea_result_ko <- gost(query = ko_genes$external_gene_name,
## organism = "mmusculus",
## evcodes = TRUE,
## ordered_query = TRUE)
gsea_result_het <- gost(query = het_enriched$external_gene_name,
organism = "mmusculus",
evcodes = TRUE,
ordered_query = TRUE)
gsea_result_alldysregulated <- gost(query = alldysregulated_genes$external_gene_name,
organism = "mmusculus",
evcodes = TRUE,
ordered_query = TRUE)
I have a function in my package which seeks to make gProfiler queries a bit more complete and easy. Let us see how similar the result is…
rownames(alldysregulated_genes) <- alldysregulated_genes[["Row.names"]]
alldysregulated_genes[["Row.names"]] <- NULL
het_gp <- simple_gprofiler(rownames(alldysregulated_genes),
species = "mmusculus",
excel = glue("excel/het_gprofiler-v{ver}.xlsx"))
het_gp
## A set of ontologies produced by gprofiler using 90
## genes against the mmusculus annotations and significance cutoff 0.05.
## There are:
## 5 MF
## 2 BP
## 6 KEGG
## 21 REAC
## 1 WP
## 1 TF
## 0 MIRNA
## 0 CORUM
## 0 HP hits.
## Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'dotplot' for signature '"NULL"'
## Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'pairwise_termsim' for signature '"NULL"'
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'emapplot': object 'gp_pair' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'ssplot': object 'gp_pair' not found
## Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'treeplot': object 'gp_pair' not found
## Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'upsetplot' for signature '"NULL"'
## Warning in emapplot.enrichResult(x, showCategory = showCategory, ...): Use 'layout.params = list(coords = your_value)' instead of 'coords'.
## The coords parameter will be removed in the next version.
## Warning in emapplot.enrichResult(x, showCategory = showCategory, ...): Use 'edge.params = list(show = your_value)' instead of 'with_edge'.
## The with_edge parameter will be removed in the next version.
## Warning in emapplot.enrichResult(x, showCategory = showCategory, ...): Use 'cluster.params = list(cluster = your_value)' instead of 'group_category'.
## The group_category parameter will be removed in the next version.
## ! # Invaild edge matrix for <phylo>. A <tbl_df> is returned.
I make a somewhat arbitrary distinction between over-representation (enrichment) analyses and GSEA: the former (as performed by gProfiler) (Raudvere et al. (2019)) seeks groups of genes which are overrepresented in GO/Reactome/etc. categories. The input genes are taken exclusively from the top-n/bottom-n genes with respect to fold-change between the conditions of interest; in this case, those most different from wt in the p08 retina ko or het samples.
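The over-representation idea can be sketched for a single category as a one-sided hypergeometric test. This is a minimal illustration in base R with invented counts; it assumes a simple hypergeometric null and does not reproduce gProfiler's own background set or multiple-testing correction.

```r
## Toy over-representation test for one category (all counts invented).
universe_size <- 20000   # genes that could have been selected
term_size <- 200         # genes annotated to the category
query_size <- 100        # genes in the significant list
overlap <- 6             # significant genes carrying the annotation

## P(X >= overlap) when query_size genes are drawn at random.
p_hyper <- phyper(overlap - 1, term_size, universe_size - term_size,
                  query_size, lower.tail = FALSE)

## The same question as a one-sided Fisher exact test on the 2x2 table
## (rows: in category / not; columns: in query / not).
tab <- matrix(c(overlap, query_size - overlap,
                term_size - overlap,
                universe_size - term_size - (query_size - overlap)),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")[["p.value"]]

print(c(hypergeometric = p_hyper, fisher = p_fisher))
```

Only membership in the query list matters here; the size of each gene's fold-change plays no role once the list is chosen, which is the main contrast with the ranked-list approach.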
With that in mind, I can invoke a similar function using the full table of DE results to get what I call the GSEA result using clusterProfiler (Yu (n.d.)). In the following block I will use the ‘all_cprofiler’ function on the data structures named ‘p8_retina_tables’ and ‘opposite_p8_retina_tables’ in order to get these GSEA results for each contrast performed (het/wt, ko/wt, het/ko). I will follow that up with ‘all_gprofiler’ which does the same, but uses gProfiler’s enrichment analyses (it will therefore include what we just looked at).
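To make the contrast with over-representation concrete, here is a toy version of the running-sum statistic behind GSEA, using the whole ranked list rather than a cutoff. It is a simplified, unweighted sketch in base R with invented genes and values, and `running_es()` is a hypothetical helper; real implementations typically weight each hit by its ranking statistic and estimate significance by permutation, neither of which is done here.

```r
## Toy ranked list: names are invented gene IDs, values invented logFCs.
ranked <- sort(c(g1 = 2.5, g2 = 1.8, g3 = 1.1, g4 = 0.4, g5 = -0.2,
                 g6 = -0.9, g7 = -1.6, g8 = -2.2), decreasing = TRUE)
gene_set <- c("g1", "g2", "g4")

## Hypothetical helper: unweighted (Kolmogorov-Smirnov-like) running sum.
## Step up at each gene in the set, step down at each gene outside it,
## so the walk ends at zero; the score is the largest deviation from zero.
running_es <- function(ranked, gene_set) {
  hits <- names(ranked) %in% gene_set
  n_hit <- sum(hits)
  n_miss <- length(ranked) - n_hit
  steps <- ifelse(hits, 1 / n_hit, -1 / n_miss)
  walk <- cumsum(steps)
  walk[which.max(abs(walk))]
}

es <- running_es(ranked, gene_set)
print(es)
```

A set whose members cluster near the top of the list gives a large positive score; a set scattered evenly through the list keeps the walk near zero.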
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Reading KEGG annotation online: "https://rest.kegg.jp/link/mmu/pathway"...
## Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/mmu"...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## --> No gene can be mapped....
##
## --> Expected input gene ID: 56421,394432,171210,69080,60525,68738
##
## --> return NULL...
## No gene sets have size between 5 and 500 ...
##
## --> return NULL...
## Starting: het_vs_wt_up.
## Starting: het_vs_wt_down.
## Starting: ko_vs_wt_up.
## Starting: ko_vs_wt_down.
## Starting: ko_vs_het_up.
## Starting: ko_vs_het_down.
pp(file = "images/gsea_p08_retina_ko_vs_het_top_hit.png")
p08_topn_gsea[["GO_ko_vs_het_up"]][[1]]
dev.off()
## png
## 2
#gsea_ko <- gsea_result_ko[["result"]] %>%
# select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
# arrange(desc(recall)) %>%
# head(n = 10)
# gsea_plots_ko <- ggplot(gsea_ko, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
# geom_bar(stat = "identity")+
# scale_fill_continuous(low = "blue", high = "red") +
# theme_bw()+
# ylab("") +
# xlab("GSEA Score")
gsea_het <- gsea_result_het[["result"]] %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
head(n = 10)
gsea_plots_het <- ggplot(gsea_het, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
geom_bar(stat = "identity") +
scale_fill_continuous(low = "blue", high = "red") +
theme_bw() +
ylab("") +
xlab("GSEA Score")
gsea_all <- gsea_result_alldysregulated[["result"]] %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
head(n = 10)
gsea_plots_all <- ggplot(gsea_all, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
geom_bar(stat = "identity") +
scale_fill_continuous(low = "blue", high = "red") +
theme_bw() +
ylab("") +
xlab("GSEA Score")
setPDF()
## Error in setPDF(): could not find function "setPDF"
postscript(file = "images/GSEA_p08_axontrap_retinahet_upregulated_vs_retinako.pdf")
gsea_plots_het
dev.off()
## png
## 2
## Error in setPDF(): could not find function "setPDF"
postscript(file = "images/GSEA_p08_retina_axontrap_alldysregulatedgenes.pdf")
gsea_plots_all
dev.off()
## png
## 2
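The bar plots above put the `recall` column from gost() on the x axis under the label "GSEA Score". In the gprofiler2 documentation, recall is defined as intersection_size / term_size, so it is a fraction of the term recovered rather than a GSEA statistic. The sketch below recomputes it and takes the top rows in base R; the table is invented and only mimics a few of the column names selected above.

```r
## Toy rows shaped like a few columns of gost()$result (values invented).
toy_terms <- data.frame(
  term_name = c("term A", "term B", "term C"),
  term_size = c(50, 400, 20),
  intersection_size = c(5, 8, 4),
  p_value = c(0.010, 0.001, 0.030))

## Fraction of each term's genes that appear in the query.
toy_terms[["recall"]] <- toy_terms[["intersection_size"]] /
  toy_terms[["term_size"]]

## Same idea as arrange(desc(recall)) followed by head(n = ...).
top_terms <- head(toy_terms[order(-toy_terms[["recall"]]), ], n = 2)
print(top_terms[, c("term_name", "recall", "p_value")])
```

In this toy table the term with the smallest p-value ("term B") has the lowest recall and drops out of the top two, so ranking by recall tends to favor small terms.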
It is only now that I realized we are splitting the data by location for each set of comparisons. I think that, left to my own devices, I would prefer to keep the input data structure intact, perform the somewhat larger number of contrasts, and then split up the results. Ideally this will slightly improve the fidelity of the results returned by DESeq2 and friends. But, I will run the state of Theresa’s notebook with as few changes as possible first, then add this.
I am going to skip this PCA plot for a couple of reasons: I already did a superset of it, and the subset Theresa performed is not valid given the set of samples included in my sample sheet, and figuring out the actually corresponding subset will take me forever… In addition, I want to use my mm38_hisat_v3 for everything…
mm38_subset <- subset_expt(
mm38_hisat,
subset = "(batch == '4' | batch == '5' | batch == '6') & time == 'p08' & location == 'scn' | sampleid == 'iprgc_03'")
mm38_norm <- normalize_expt(mm38_subset, filter = TRUE, convert = "cpm",
transform = "log2", batch = "svaseq")
mm38_norm <- set_expt_batches(mm38_norm, fact = "location")
mm38_norm <- set_expt_conditions(mm38_norm, fact = "genotype")
pca_norm <- plot_pca(mm38_norm, max_overlaps = 70)
pca_norm$plot
Instead I will simplify the subset and see what happens…
scn_samples <- subset_expt(mm38_hisat_v3,
subset = "locationatb == 'scn'") %>%
set_expt_batches(fact = "locationatb") %>%
set_expt_conditions(fact = "genotypeatb")
## The samples excluded are: iprgc_62, iprgc_63, iprgc_64, iprgc_65, iprgc_68, iprgc_69, iprgc_71, iprgc_72, iprgc_73, iprgc_74, iprgc_75, iprgc_76, iprgc_81, iprgc_82, iprgc_83, iprgc_84, iprgc_86, iprgc_87, iprgc_88, iprgc_90, iprgc_91, iprgc_92, iprgc_94, iprgc_96, iprgc_98, iprgc_99, iprgc_100, iprgc_101, iprgc_104, iprgc_105, iprgc_106, iprgc_107, iprgc_108, iprgc_109, iprgc_115, iprgc_116, iprgc_117, iprgc_118, iprgc_119, iprgc_120, iprgc_121, iprgc_122, iprgc_123, iprgc_125, iprgc_126, iprgc_127.
## subset_expt(): There were 63, now there are 17 samples.
## The number of samples by batch are:
##
## scn
## 17
## The numbers of samples by condition are:
##
## het ko wt
## 6 6 5
scn_norm <- normalize_expt(scn_samples, filter = TRUE, convert = "cpm",
transform = "log2", batch = "svaseq")
## Removing 11109 low-count genes (14316 remaining).
## Setting 919 low elements to zero.
## transform_counts: Found 919 values equal to 0, adding 1 to the matrix.
## Error in eval(expr, envir, enclos): object 'scn_norm_pc' not found
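The `convert = "cpm"` and `transform = "log2"` arguments, together with the "adding 1 to the matrix" message above, suggest a transformation along the following lines. This is a minimal sketch on an invented count matrix, assuming plain library-size CPM; the low-count filtering and svaseq adjustment that normalize_expt also performs are not reproduced, and the order in which it applies its steps is not shown here.

```r
## Toy count matrix: 4 genes x 3 samples (values invented).
counts <- matrix(c(10, 0, 250, 40,
                   20, 5, 300, 75,
                   0, 0, 100, 0),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

## Counts per million: scale each column by its library size.
lib_sizes <- colSums(counts)
cpm <- sweep(counts, 2, lib_sizes, "/") * 1e6

## log2 with a pseudocount of 1 so that zeros stay finite (and map to 0).
log_cpm <- log2(cpm + 1)
print(round(log_cpm, 2))
```

The pseudocount is why zero counts come out as exactly 0 on the log scale instead of -Inf.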
At this point in the document I read ahead a bit and came to the conclusion that it repeats the above logic of taking the union of wt comparisons to remove genes from the appropriate het/ko or p15/p08 or location comparisons. This seems quite reasonable to me, but I would prefer to not separate all the data, so I will attempt to duplicate and slightly streamline this logic on the full dataset. Thus I am going to skip down to the end and attempt to implement this.
mm_de_normal_p8_scn <- all_pairwise(mm38_subset, model_batch = "svaseq",
parallel = FALSE, do_ebseq = FALSE, do_basic = FALSE,
do_dream = FALSE, do_noiseq = FALSE, do_edger = FALSE,
filter = TRUE)
annots_to_merge <- mm_annot %>%
select(ensembl_gene_id, external_gene_name) %>%
filter(ensembl_gene_id %in% rownames(mm_de_normal_p8_scn$deseq$all_tables$koscn_vs_hetscn)) %>%
distinct()
mm_de_normal_p8_scn$deseq$all_tables$koscn_vs_hetscn <- merge(
mm_de_normal_p8_scn$deseq$all_tables$koscn_vs_hetscn,
annots_to_merge, by.x = 0, by.y = "ensembl_gene_id", all.x = TRUE)
hetkeeper_genes <- mm_de_normal_p8_scn$deseq$all_tables$wtscn_vs_hetscn %>%
filter(logFC <= -.1 & adj.P.Val <= 0.05)
kokeeper_genes <- mm_de_normal_p8_scn$deseq$all_tables$wtscn_vs_koscn %>%
filter(logFC <= -.1 & adj.P.Val <= 0.05)
keepergenes <- unique(c(rownames(hetkeeper_genes), rownames(kokeeper_genes)))
df <- mm_de_normal_p8_scn$deseq$all_tables$koscn_vs_hetscn %>%
dplyr::mutate(logFC = -logFC) %>%
set_sig_limma(factors = c("Het Enriched",
"KO Enriched"))
df <- df %>%
filter(Row.names %in% keepergenes)
labels_ups <- df %>%
filter(abs(logFC) > 1) %>%
arrange(logFC) %>%
head(n = 1)
labels_downs <- df %>%
filter(abs(logFC) > 1) %>%
arrange(-logFC) %>%
head(n = 1)
labels <- rbind(labels_ups, labels_downs)
res_tbl <- df
DEplot <- ggplot(res_tbl, aes(x = logFC, y = -log10(adj.P.Val), label = external_gene_name)) +
geom_point(aes(colour = Significance), size = 4) +
geom_vline(xintercept = c(-1,1)) +
geom_hline(yintercept = -log10(.05)) +
theme_classic(base_size = 20) +
xlab("log2(FC)") +
ylab("-log10(p-value)") +
## ggtitle(title, subtitle = subtitle) +
theme(legend.position="right") +
scale_color_manual(values = c("Het Enriched" = "#F8766D",
"KO Enriched" = "#00BFC4",
"Not \nEnriched" = "Grey")) +
geom_label_repel(data=filter(df,
## c('s5_het_dlgn', 's5_het_ret', 's5_het_scn')),
external_gene_name %in% labels$external_gene_name),
## nudge_x = -0.5,
nudge_y = 3, max.overlaps = 15) +
ggtitle("SCN Het vs KO Translatome")
setPDF()
postscript(file = "images/p08_scn_DE_1312024.pdf")
DEplot
dev.off()
writexl::write_xlsx(df, path = "excel/scnhet_vs_scnko_WTfiltered.xlsx")
ko_genes <- res_tbl %>%
filter(adj.P.Val <= 0.05) %>%
arrange(-abs(logFC)) %>%
select(Row.names, logFC, adj.P.Val, external_gene_name, Significance) %>%
filter(logFC <= -1)
het_genes <- res_tbl %>%
filter(adj.P.Val <= 0.05) %>%
arrange(-abs(logFC)) %>%
select(Row.names, logFC, adj.P.Val, external_gene_name, Significance) %>%
filter(logFC >= 1)
alldysregulated_genes <- res_tbl %>%
filter(adj.P.Val <= 0.05) %>%
arrange(logFC) %>%
select(Row.names, logFC, adj.P.Val, external_gene_name, Significance) %>%
filter(abs(logFC) >= 1)
gsea_result_ko <- gost(query = ko_genes$external_gene_name,
organism = "mmusculus",
evcodes = TRUE,
ordered_query = TRUE)
gsea_result_het <- gost(query = het_genes$external_gene_name,
organism = "mmusculus",
evcodes = TRUE,
ordered_query = TRUE)
gsea_result_alldysregulated <- gost(query = alldysregulated_genes$external_gene_name,
organism = "mmusculus",
evcodes = TRUE,
ordered_query = TRUE)
gsea_ko <- gsea_result_ko[["result"]] %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
head(n = 10)
gsea_plots_ko <- ggplot(gsea_ko, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
geom_bar(stat = "identity") +
scale_fill_continuous(low = "blue", high = "red") +
theme_bw() +
ylab("") +
xlab("GSEA Score")
gsea_het <- gsea_result_het[["result"]] %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
head(n = 10)
gsea_plots_het <- ggplot(gsea_het, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
geom_bar(stat = "identity") +
scale_fill_continuous(low = "blue", high = "red") +
theme_bw() +
ylab("") +
xlab("GSEA Score")
gsea_all <- gsea_result_alldysregulated[["result"]] %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
head(n = 10)
gsea_plots_all <- ggplot(gsea_all, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
geom_bar(stat = "identity") +
scale_fill_continuous(low = "blue", high = "red") +
theme_bw() +
ylab("") +
xlab("GSEA Score")
setPDF()
postscript(file = "images/GSEA_p08_retina_axontrap_alldysregulatedgenes.pdf")
gsea_plots_all
dev.off()
mm38_subset2 <- subset_expt(
mm38_hisat,
subset = "(batch == '4' | batch == '5' | batch == '6') & time == 'p08' & genotype != 'ko' & location != 'dlgn' | sampleid == 'iprgc_03'")
mm38_subset2 <- subset_expt(mm38_subset2, subset = "sampleid != 'iprgc_89'")
mm38_subset2$design %>%
select(genotype, location) %>%
table()
mm38_norm2 <- normalize_expt(mm38_subset2, filter=TRUE,
convert="cpm",
transform="log2", batch = "svaseq")
mm_de_subset2 <- all_pairwise(mm38_subset2,
model_batch="svaseq",
parallel=FALSE, do_ebseq=FALSE,
do_basic = FALSE, do_dream = FALSE,
do_noiseq = FALSE, do_edger = FALSE,
filter = TRUE)
retinakeeper_genes <- mm_de_subset2$deseq$all_tables$wtretina_vs_hetretina %>%
filter(logFC <= -.1 & adj.P.Val <= 0.05)
scnkeeper_genes <- mm_de_subset2$deseq$all_tables$wtscn_vs_hetscn %>%
filter(logFC <= -.1 & adj.P.Val <= 0.05)
keepergenes <- unique(c(rownames(retinakeeper_genes), rownames(scnkeeper_genes)))
annots_to_merge <- mm_annot %>%
select(ensembl_gene_id, external_gene_name) %>%
filter(ensembl_gene_id %in% rownames(mm_de_subset2$deseq$all_tables$hetscn_vs_hetretina)) %>%
distinct()
mm_de_subset2$deseq$all_tables$hetscn_vs_hetretina <- merge(
mm_de_subset2$deseq$all_tables$hetscn_vs_hetretina,
annots_to_merge, by.x = 0,
by.y = "ensembl_gene_id", all.x = TRUE)
df <- mm_de_subset2$deseq$all_tables$hetscn_vs_hetretina %>%
mutate(Significance = case_when(logFC <= -1 ~ "Retina Enriched",
logFC >= 1 ~ "SCN Enriched",
logFC > -1 & logFC < 1 ~ "Not\n Enriched"))
df <- df %>%
filter(Row.names %in% keepergenes)
scn_enriched <- df %>%
filter(adj.P.Val <= 0.05 & logFC >= 1) %>%
arrange(-logFC) %>%
select(Row.names, external_gene_name, logFC, adj.P.Val) %>%
mutate(Significance = "SCN Enriched") %>%
filter(Row.names %in% rownames(scnkeeper_genes))
retina_enriched <- df %>%
filter(adj.P.Val <= 0.05 & logFC <= -1) %>%
arrange(logFC) %>%
select(Row.names, external_gene_name, logFC, adj.P.Val) %>%
mutate(Significance = "Retina Enriched") %>%
filter(Row.names %in% rownames(retinakeeper_genes))
notenriched <- df %>%
select(Row.names, external_gene_name, logFC, adj.P.Val, Significance) %>%
filter(Row.names %in% c(rownames(retinakeeper_genes),
rownames(scnkeeper_genes))[duplicated(c(rownames(retinakeeper_genes),
rownames(scnkeeper_genes)))]) %>%
filter(Significance == "Not\n Enriched")
df <- rbind(scn_enriched, retina_enriched, notenriched)
df <- df %>%
distinct()
## writexl::write_xlsx(df, path = "axonTRAP_DE_results_20240202/retinahet_vs_scn_het_WTfiltered.xlsx")
labels_ups <- df %>%
filter(adj.P.Val <= 0.05 & abs(logFC) > 1) %>%
arrange(logFC) %>%
head(n = 10)
labels_downs <- df %>%
filter(adj.P.Val <= 0.05 & abs(logFC) > 1) %>%
arrange(-logFC) %>%
head(n = 10)
labels <- rbind(labels_ups, labels_downs)
labels_requested <- c("Cdh10","Cdh12","Cdh13","Cdh18",
"Cdh7","Cdh8","Cdh9","Cntn3",
"Cntn4","Cntn5","Cntn6","Kirrel3",
"Nrxn1","Nrxn3","Sema3c","Sema6d",
"Tenm1","Tenm2","Tenm4")
res_tbl <- df
DEplot <- ggplot(res_tbl, aes(x = logFC, y = -log10(adj.P.Val), label = external_gene_name)) +
geom_point(aes(colour = Significance), size = 4) +
geom_vline(xintercept = c(-1,1)) +
geom_hline(yintercept = -log10(.05)) +
theme_classic(base_size = 20) +
xlab("log2(FC)") +
ylab("-log10(p-value)") +
## ggtitle(title, subtitle = subtitle) +
theme(legend.position="right") +
scale_color_manual(values=c("Grey", "#F8766D", "#00BFC4")) +
geom_label_repel(data=filter(df,
external_gene_name %in% labels_requested),
## c(labels$external_gene_name, "Opn4")), #c('s5_het_dlgn', 's5_het_ret', 's5_het_scn')),
## nudge_x = -0.5,
nudge_y = 15, max.overlaps = 25)
#setPDF()
#postscript(file = "axonTRAP_Volcanoplots_20240202/p08_retinavsscnhet_DE_requested_genelabels_02052024.pdf")
DEplot
#dev.off()
scn_enriched <- df %>%
filter(adj.P.Val <= 0.05 & logFC >= 1) %>%
arrange(-logFC) %>%
select(Row.names, external_gene_name, logFC, adj.P.Val, Significance)
retina_enriched <- df %>%
filter(adj.P.Val <= 0.05 & logFC <= -1) %>%
arrange(logFC) %>%
select(Row.names, external_gene_name, logFC, adj.P.Val, Significance)
scn_enriched
retina_enriched
df %>%
filter(Significance == "Not\n Enriched")
gsea_result_scn <- gost(query = scn_enriched$external_gene_name,
organism = "mmusculus", evcodes = TRUE,
ordered_query = TRUE, source = c("GO"))
gsea_result_ret <- gost(query = retina_enriched$external_gene_name,
organism = "mmusculus", evcodes = TRUE,
ordered_query = TRUE, source = c("GO"))
gsea_scn <- gsea_result_scn[["result"]] %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
head(n = 20)
gsea_plots_scn <- ggplot(gsea_scn, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
geom_bar(stat = "identity") +
scale_fill_continuous(low = "blue", high = "red") +
theme_bw() +
ylab("") +
xlab("GSEA Score")
setPDF()
postscript(file = "images/GSEA_SCNhet_vs_retina_enriched_P08.pdf")
gsea_plots_scn
dev.off()
gsea_ret <- gsea_result_ret[["result"]] %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
head(n = 20)
gsea_plots_ret <- ggplot(gsea_ret, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
geom_bar(stat = "identity") +
scale_fill_continuous(low = "blue", high = "red") +
theme_bw() +
ylab("") +
xlab("GSEA Score")
setPDF()
postscript(file = "images/GSEA_Retinahet_vs_SCN_enriched_P08.pdf")
gsea_plots_ret
dev.off()
mm38_subset3 <- subset_expt(
mm38_hisat,
subset = "(batch == '4' | batch == '5' | batch == '6') & time == 'p08' & genotype != 'het' & location != 'dlgn' | sampleid == 'iprgc_03'")
mm38_subset3 <- subset_expt(mm38_subset3, subset = "sampleid != 'iprgc_86'")
mm38_subset3$design %>%
select(genotype, location) %>%
table()
mm38_norm3 <- normalize_expt(mm38_subset3, filter=TRUE,
convert="cpm", transform="log2", batch = "svaseq")
mm_de_subset3 <- all_pairwise(mm38_subset3,
model_batch="svaseq",
parallel=FALSE, do_ebseq=FALSE,
do_basic = FALSE, do_dream = FALSE,
do_noiseq = FALSE, do_edger = FALSE,
filter = TRUE)
retinakeeper_genes <- mm_de_subset3$deseq$all_tables$wtretina_vs_koretina %>%
filter(logFC <= -1 & adj.P.Val <= 0.05)
scnkeeper_genes <- mm_de_subset3$deseq$all_tables$wtscn_vs_koscn %>%
filter(logFC <= -1 & adj.P.Val <= 0.05)
keepergenes <- unique(c(rownames(retinakeeper_genes), rownames(scnkeeper_genes)))
annots_to_merge <- mm_annot %>%
select(ensembl_gene_id, external_gene_name) %>%
filter(ensembl_gene_id %in% rownames(mm_de_subset3$deseq$all_tables$koscn_vs_koretina)) %>%
distinct()
mm_de_subset3$deseq$all_tables$koscn_vs_koretina <- merge(
mm_de_subset3$deseq$all_tables$koscn_vs_koretina,
annots_to_merge, by.x = 0,
by.y = "ensembl_gene_id", all.x = TRUE)
df <- mm_de_subset3$deseq$all_tables$koscn_vs_koretina %>%
mutate(Significance = case_when(logFC <= -1 ~ "Retina Enriched",
logFC >= 1 ~ "SCN Enriched",
logFC > -1 & logFC < 1 ~ "Not\n Enriched"))
df <- df %>%
filter(Row.names %in% keepergenes)
scn_enriched <- df %>%
filter(adj.P.Val <= 0.05 & logFC >= 1) %>%
arrange(-logFC) %>%
select(Row.names, external_gene_name, logFC, adj.P.Val) %>%
mutate(Significance = "SCN Enriched") %>%
filter(Row.names %in% rownames(scnkeeper_genes))
retina_enriched <- df %>%
  filter(adj.P.Val <= 0.05 & logFC <= -1) %>%
  arrange(logFC) %>%
  select(Row.names, external_gene_name, logFC, adj.P.Val) %>%
  mutate(Significance = "Retina Enriched") %>%
  filter(Row.names %in% rownames(retinakeeper_genes))
notenriched <- df %>%
select(Row.names, external_gene_name, logFC, adj.P.Val, Significance) %>%
filter(Row.names %in% c(rownames(retinakeeper_genes),
rownames(scnkeeper_genes))[duplicated(c(rownames(retinakeeper_genes),
rownames(scnkeeper_genes)))])
df <- rbind(scn_enriched, retina_enriched, notenriched)
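## The ten most negative logFC values are the retina-enriched (down) genes,
## the ten most positive are the SCN-enriched (up) genes.
labels_downs <- df %>%
  filter(adj.P.Val <= 0.05 & abs(logFC) > 1) %>%
  arrange(logFC) %>%
  head(n = 10)
labels_ups <- df %>%
  filter(adj.P.Val <= 0.05 & abs(logFC) > 1) %>%
  arrange(-logFC) %>%
  head(n = 10)
labels <- rbind(labels_downs, labels_ups)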
## wanted_column <- "Significance"
res_tbl <- df
DEplot <- ggplot(res_tbl, aes(x = logFC, y = -log10(adj.P.Val), label = external_gene_name)) +
geom_point(aes(colour = Significance), size = 4) +
## geom_point(aes(colour = !!sym(wanted_column)), size = 4) +
geom_vline(xintercept = c(-1, 1)) +
geom_hline(yintercept = -log10(0.05)) +
theme_classic(base_size = 20) +
xlab("log2(FC)") +
ylab("-log10(adjusted p-value)") +
## ggtitle(title, subtitle = subtitle) +
theme(legend.position = "right") +
scale_color_manual(values = c("Grey", "#F8766D", "#00BFC4")) +
geom_label_repel(data = filter(
df, external_gene_name %in% c(labels$external_gene_name, "Opn4")),
## c('s5_het_dlgn', 's5_het_ret', 's5_het_scn')),
## nudge_x = -0.5,
nudge_y = 10, max.overlaps = 25)
setPDF()
## postscript() writes PostScript whatever the file extension; pdf() matches the .pdf name.
pdf(file = "images/p08_retinavsscnko_DE_1312024.pdf")
DEplot
dev.off()
gsea_result_scn <- gost(query = scn_enriched$external_gene_name,
organism = "mmusculus",
evcodes = TRUE,
ordered_query = TRUE,
sources = c("GO"))
gsea_result_ret <- gost(query = retina_enriched$external_gene_name,
organism = "mmusculus",
evcodes = TRUE,
ordered_query = TRUE,
sources = c("GO"))
gsea_scn <- gsea_result_scn[["result"]] %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
head(n = 20)
gsea_plots_scn <- ggplot(gsea_scn, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
geom_bar(stat = "identity") +
scale_fill_continuous(low = "blue", high = "red") +
theme_bw() +
ylab("") +
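xlab("Recall") ## The x axis is the recall column from gost(), not a GSEA score.
setPDF()
## postscript() writes PostScript whatever the file extension; pdf() matches the .pdf name.
pdf(file = "images/GSEA_SCNko_enriched_vs_retina_P08.pdf")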
gsea_plots_scn
dev.off()
gsea_ret <- gsea_result_ret[["result"]] %>%
select(term_name, p_value, term_size, intersection_size, recall, source, intersection) %>%
arrange(desc(recall)) %>%
head(n = 20)
gsea_plots_ret <- ggplot(gsea_ret, aes(x = recall, y = reorder(term_name, recall), fill = p_value)) +
geom_bar(stat = "identity") +
scale_fill_continuous(low = "blue", high = "red") +
theme_bw() +
ylab("") +
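xlab("Recall") ## The x axis is the recall column from gost(), not a GSEA score.
setPDF()
## postscript() writes PostScript whatever the file extension; pdf() matches the .pdf name.
pdf(file = "images/GSEA_Retinako_enriched_vs_SCN_P08.pdf")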
gsea_plots_ret
dev.off()
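The bar plots above are drawn from the recall column returned by gost(). As I understand the gprofiler2 documentation, recall is the fraction of a term's annotated genes which the query recovers (intersection_size / term_size), while precision divides by the query size instead. Here is a minimal sketch of that arithmetic; the term names and every number in it are invented for illustration and are not results from this data.

```r
## Hypothetical enrichment rows; the numbers are invented for illustration.
toy_result <- data.frame(
  term_name = c("term_a", "term_b", "term_c"),
  term_size = c(200, 50, 1000),
  intersection_size = c(20, 10, 30),
  query_size = c(100, 100, 100))
## recall: the share of the term which the query recovers.
toy_result[["recall"]] <- toy_result[["intersection_size"]] / toy_result[["term_size"]]
## precision: the share of the query which is annotated to the term.
toy_result[["precision"]] <- toy_result[["intersection_size"]] / toy_result[["query_size"]]
## Sort the way the plots above do: highest recall first.
toy_result <- toy_result[order(-toy_result[["recall"]]), ]
print(toy_result)
```

Because the denominator is the term size, a small term can reach a high recall with only a handful of genes; that may be worth keeping in mind when reading the top 20 terms.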
I want an invocation of all_pairwise() which uses all samples. In the following blocks I will set that up using a set of ‘keepers’ which will be named by time, location, then 2 letters for the numerator/denominator: w for WT, h for het, d for delta. Thus “p08_retina_hw” compares het/wt for the p08 retina samples.
If they are of interest, I will have a separate set which follows the same convention, with names like “p08_ko_sr”, to compare the p08 deltas with SCN as the numerator and retina as the denominator.
The most peculiar aspect of this analysis is choosing which genes to consider when comparing the genotypes/locations/times. The general idea is pretty clear: find the genes which are non-specifically pulled down in the WT samples and either exclude or discount them. The various potential methods for performing this are confusing:
Theresa’s current worksheet implements a version of 1b in which she separated the various input gene sets to define the exclusion genes. I am going to repeat this, but leave the starting data structure intact.
In this first iteration, I will do that by creating a simplified model of the data which combines time/genotype/location into a single factor, and by using sva. In my next iteration I will use a full statistical model containing each of those factors (and probably also sva).
Note: my color choices are kind of garbage.
In addition, the exclusion dataset is the same as the analysis dataset; it is really only the contrasts which will be different.
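To make the difference between the simplified and the full model concrete, here is a small sketch using only base R. The sample sheet is hypothetical (the real design lives in the expt object), and sva is left out entirely; it only shows how the three factors collapse into one condition and what each design matrix looks like.

```r
## Toy sample sheet; the sample IDs and their assignments are hypothetical.
toy_design <- data.frame(
  sampleid = paste0("s", 1:6),
  time = c("p08", "p08", "p15", "p15", "p08", "p15"),
  genotype = c("het", "wt", "ko", "wt", "ko", "het"),
  location = c("retina", "retina", "scn", "scn", "dlgn", "dlgn"))
## Simplified model: collapse the three factors into one condition per sample.
toy_design[["time_geno_loc"]] <- as.factor(paste(
  toy_design[["time"]], toy_design[["genotype"]], toy_design[["location"]], sep = "_"))
simple_model <- model.matrix(~ 0 + time_geno_loc, data = toy_design)
## Full model: one term per experimental factor.
full_model <- model.matrix(~ time + genotype + location, data = toy_design)
colnames(simple_model)
colnames(full_model)
```

The simplified model makes every pairwise contrast a difference between two columns, which is convenient for keepers; the full model instead estimates each factor's effect separately.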
v3_pairwise_input <- set_expt_conditions(mm38_hisat_v3, fact = "time_geno_loc",
colors = color_choices[["all"]])
## The numbers of samples by condition are:
##
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 3 3 3 3 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn p15_wt_dlgn p15_wt_retina p15_wt_scn
## 4 3 3 3 3 5 5 2
In the following few blocks I will set up the various comparisons of interest, starting with the contrasts against wt. These define the gene sets to carry forward: genes observed higher in the het/ko samples than in the corresponding wt samples, where binding is non-specific.
Each entry is named by time, genotype, and location, and contains the numerator (het or ko) followed by the wt denominator.
Put slightly differently, for every term of interest I will create a contrast with the desired term as numerator and the wt as denominator, then pull out the genes increased relative to wt.
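The hand-written list that follows is regular enough that it could also be generated. This is only a sketch of that idea in base R, assuming the same three factors and the same naming scheme; the variable names (combos, generated) are mine and are not used elsewhere in this document.

```r
## Build the numerator/denominator pairs from the factor levels.
times <- c("p15", "p08")
genotypes <- c("het", "ko")
locations <- c("dlgn", "retina", "scn")
## expand.grid() varies its first argument fastest, so the order is
## time within genotype within location.
combos <- expand.grid(time = times, genotype = genotypes, location = locations,
                      stringsAsFactors = FALSE)
generated <- list()
for (i in seq_len(nrow(combos))) {
  row <- combos[i, ]
  name <- paste(row[["time"]], row[["genotype"]], row[["location"]], sep = "_")
  numerator <- paste0(row[["time"]], row[["genotype"]], row[["location"]])
  ## The denominator is always the wt sample from the same time and location.
  denominator <- paste0(row[["time"]], "wt", row[["location"]])
  generated[[name]] <- c(numerator, denominator)
}
str(generated)
```

I still prefer the explicit list for readability, but generating it removes the chance of a copy/paste typo in one of the twelve entries.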
inclusions <- list(
## I like alphabetizing things, start with dlgn
"p15_het_dlgn" = c("p15hetdlgn", "p15wtdlgn"),
"p08_het_dlgn" = c("p08hetdlgn", "p08wtdlgn"),
"p15_ko_dlgn" = c("p15kodlgn", "p15wtdlgn"),
"p08_ko_dlgn" = c("p08kodlgn", "p08wtdlgn"),
## Then retinas
"p15_het_retina" = c("p15hetretina", "p15wtretina"),
"p08_het_retina" = c("p08hetretina", "p08wtretina"),
"p15_ko_retina" = c("p15koretina", "p15wtretina"),
"p08_ko_retina" = c("p08koretina", "p08wtretina"),
## Then scn
"p15_het_scn" = c("p15hetscn", "p15wtscn"),
"p08_het_scn" = c("p08hetscn", "p08wtscn"),
"p15_ko_scn" = c("p15koscn", "p15wtscn"),
"p08_ko_scn" = c("p08koscn", "p08wtscn"))
For each location/genotype of interest, let us compare p15/p08:
time_keepers <- list(
## DLGN
"t_het_dlgn" = c("p15hetdlgn", "p08hetdlgn"),
"t_ko_dlgn" = c("p15kodlgn", "p08kodlgn"),
## Retina
"t_het_retina" = c("p15hetretina", "p08hetretina"),
"t_ko_retina" = c("p15koretina", "p08koretina"),
## SCN
"t_het_scn" = c("p15hetscn", "p08hetscn"),
"t_ko_scn" = c("p15koscn", "p08koscn"))
Compare locations and keep time/genotype consistent. I will use the location initials to define numerator/denominator.
location_keepers <- list(
## dlgn/retina
"dr_p08_het" = c("p08hetdlgn", "p08hetretina"),
"dr_p15_het" = c("p15hetdlgn", "p15hetretina"),
"dr_p08_ko" = c("p08kodlgn", "p08koretina"),
"dr_p15_ko" = c("p15kodlgn", "p15koretina"),
## scn/retina
"sr_p08_het" = c("p08hetscn", "p08hetretina"),
"sr_p15_het" = c("p15hetscn", "p15hetretina"),
"sr_p08_ko" = c("p08koscn", "p08koretina"),
"sr_p15_ko" = c("p15koscn", "p15koretina"),
## dlgn/scn
"ds_p08_het" = c("p08hetdlgn", "p08hetscn"),
"ds_p15_het" = c("p15hetdlgn", "p15hetscn"),
"ds_p08_ko" = c("p08kodlgn", "p08koscn"),
"ds_p15_ko" = c("p15kodlgn", "p15koscn"))
Compare ko/het while keeping time/location constant. Similarly, use the initials to denote numerator/denominator, which will always be kh.
genotype_keepers <- list(
## DLGN
"kh_p08_dlgn" = c("p08kodlgn", "p08hetdlgn"),
"kh_p15_dlgn" = c("p15kodlgn", "p15hetdlgn"),
## Retina
"kh_p08_retina" = c("p08koretina", "p08hetretina"),
"kh_p15_retina" = c("p15koretina", "p15hetretina"),
## SCN
"kh_p08_scn" = c("p08koscn", "p08hetscn"),
"kh_p15_scn" = c("p15koscn", "p15hetscn"))
My all_pairwise() function now has a parameter which allows me to choose which contrasts to perform instead of literally doing every possible comparison. That is well suited for these operations:
lfc_cutoff <- 0.1
adjp_cutoff <- 0.1
inclusion_de <- all_pairwise(v3_pairwise_input, filter = TRUE,
keepers = inclusions, model_batch = "svaseq")
##
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 3 3 3 3 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn p15_wt_dlgn p15_wt_retina p15_wt_scn
## 4 3 3 3 3 5 5 2
## Removing 0 low-count genes (15263 remaining).
## Setting 9594 low elements to zero.
## transform_counts: Found 9594 values equal to 0, adding 1 to the matrix.
## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 10 comparisons.
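Since the keepers are matched to the conditions by name, a quick check that every requested name corresponds to a real condition seems worthwhile. The helper below, check_keepers(), is hypothetical and is not part of hpgltools; it assumes the conditions are written with underscores (p08_het_dlgn) while the keepers are not (p08hetdlgn), as is the case above.

```r
## Return the keeper entries which do not correspond to any known condition.
check_keepers <- function(keepers, conditions) {
  ## Strip the underscores so that p08_het_dlgn matches p08hetdlgn.
  known <- gsub(x = conditions, pattern = "_", replacement = "")
  wanted <- unique(unlist(keepers))
  missing <- setdiff(wanted, known)
  return(missing)
}
## Toy input: the scn conditions are deliberately absent.
toy_conditions <- c("p08_het_dlgn", "p08_ko_dlgn", "p15_het_dlgn", "p15_ko_dlgn")
toy_keepers <- list(
  "kh_p08_dlgn" = c("p08kodlgn", "p08hetdlgn"),
  "kh_p15_dlgn" = c("p15kodlgn", "p15hetdlgn"),
  "kh_p08_scn" = c("p08koscn", "p08hetscn"))
check_keepers(toy_keepers, toy_conditions)
```

An empty result means every numerator and denominator was found among the conditions.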
inclusion_tables <- combine_de_tables(
inclusion_de, keepers = inclusions, label_column = label_column,
excel = glue("wt_comparisons/inclusion_tables-v{ver}.xlsx"))
## Deleting the file wt_comparisons/inclusion_tables-v20240917.xlsx before writing the tables.
## A set of combined differential expression results.
##    table                       deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
##  1 p15hetdlgn_vs_p15wtdlgn             509          1028         574          1095         622           516
##  2 p08hetdlgn_vs_p08wtdlgn              41            54          90            60          53            27
##  3 p15kodlgn_vs_p15wtdlgn              810          1370         946          1362         837           803
##  4 p08kodlgn_vs_p08wtdlgn              382           523         426           561         485           385
##  5 p15hetretina_vs_p15wtretina         305            58         327            79         350            52
##  6 p08hetretina_vs_p08wtretina         852           615         870           712         862           605
##  7 p15koretina_vs_p15wtretina          111            28         136            34          42            17
##  8 p08koretina_vs_p08wtretina          711           221         733           291         721           284
##  9 p15hetscn_vs_p15wtscn                 4             6           4             8           0             0
## 10 p08hetscn_vs_p08wtscn                61            23          54            25           0             0
## 11 p15koscn_vs_p15wtscn                  1             3           1             8           2             0
## 12 p08koscn_vs_p08wtscn                  2             1           3             1           4             1
## Plot describing unique/shared genes in a differential expression table.
inclusion_sig <- extract_significant_genes(
inclusion_tables, lfc = lfc_cutoff, p = adjp_cutoff,
according_to = "deseq",
excel = glue("wt_comparisons/inclusion_sig-v{ver}.xlsx"))
## Deleting the file wt_comparisons/inclusion_sig-v20240917.xlsx before writing the tables.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 0.1 adj P cutoff: 0.1
##                deseq_up deseq_down
## p15_het_dlgn       2191       2645
## p08_het_dlgn        369        388
## p15_ko_dlgn        2623       3073
## p08_ko_dlgn        1397       1462
## p15_het_retina      948        371
## p08_het_retina     2284       2094
## p15_ko_retina       262         68
## p08_ko_retina      1728       1047
## p15_het_scn           7          6
## p08_het_scn         311        138
## p15_ko_scn            1          3
## p08_ko_scn            3          1
Up above, Theresa applied a 0.25 log2FC and 0.05 adjp filter, which provided a set of 2,640 genes observed higher in the p08 het retinas vs. wt retinas. I should see that in this inclusion_sig data structure.
There is an important caveat though: for her filter above, Theresa performed the DE using only the retina samples, but I used all samples. I expected that this would give basically the same result (I actually assumed I would get a few more genes), but instead it appears to have retrieved a significantly smaller number of genes (about 1/2; happily, pretty much all of them appear in the previous filter). As a result, I am going to try relaxing my constraints slightly to see if I can recapitulate her filter (which would match Theresa’s later filter, though I guess that in turn will lead to a smaller set of genes compared to her later, relaxed 0.1 filter).
comparison <- inclusion_sig[["deseq"]][["ups"]][["p08_het_retina"]]
comp <- list(
"taa" = taa_keepers,
"new" = rownames(comparison))
test_comparison <- Vennerable::Venn(comp)
plot(test_comparison)
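The Venn diagram shows the overlap, but the same numbers can be pulled out directly with base R set operations. This is a minimal sketch; the gene IDs are invented stand-ins for taa_keepers and rownames(comparison).

```r
## Hypothetical gene IDs standing in for taa_keepers and rownames(comparison).
toy_taa <- c("gene_a", "gene_b", "gene_c", "gene_d", "gene_e", "gene_f")
toy_new <- c("gene_b", "gene_c", "gene_d", "gene_g")
shared <- intersect(toy_taa, toy_new)
only_taa <- setdiff(toy_taa, toy_new)
only_new <- setdiff(toy_new, toy_taa)
## The fraction of the new set which is already in the previous filter.
fraction_recovered <- length(shared) / length(toy_new)
message("Shared: ", length(shared), ", only previous: ", length(only_taa),
        ", only new: ", length(only_new),
        ", fraction of new in previous: ", fraction_recovered)
```

The only_new vector is the interesting one for the caveat above: if it is nearly empty, the new filter is essentially a subset of the previous one.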
I want to have a little function which, given a contrast of interest, will extract the gene sets which should be included/excluded given the above.
## Write each clusterProfiler result in a named list to its own xlsx file.
write_all_cp <- function(all_cp) {
  all_written <- list()
  for (g in seq_along(all_cp)) {
    name <- names(all_cp)[g]
    datum <- all_cp[[name]]
    filename <- glue("cprofiler/{ver}/{name}_cprofiler-v{ver}.xlsx")
    written <- sm(write_cp_data(datum, excel = filename))
    all_written[[g]] <- written
  }
  return(all_written)
}
## The same, for the gProfiler results.
write_all_gp <- function(all_gp) {
  all_written <- list()
  for (g in seq_along(all_gp)) {
    name <- names(all_gp)[g]
    datum <- all_gp[[name]]
    filename <- glue("gprofiler/{ver}/{name}_gprofiler-v{ver}.xlsx")
    written <- sm(write_gprofiler_data(datum, excel = filename))
    all_written[[g]] <- written
  }
  return(all_written)
}
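## For each contrast in 'keepers', collect the union of the genes observed higher than wt
## in either the numerator or the denominator condition, along with a merged table of the
## corresponding vs. wt statistics.
## Note: the 'inclusions' argument is accepted but not currently used in the body.
extract_inclusions <- function(inclusion_sig, inclusion_tables, inclusions, keepers, all_genes,
according_to = "deseq", which = "ups") {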
retlist <- list()
table_names <- names(inclusion_sig[[according_to]][[which]])
for (c_num in seq_along(keepers)) {
contrast <- names(keepers)[c_num]
numerator_name <- keepers[[c_num]][1]
denominator_name <- keepers[[c_num]][2]
## In my new branch I cleaned up the sanitizer function for contrasts so this is not needed.
numerator_name <- gsub(x = numerator_name, pattern = "(het|ko|wt)", replacement = "_\\1_")
denominator_name <- gsub(x = denominator_name, pattern = "(het|ko|wt)", replacement = "_\\1_")
numerator_table <- inclusion_sig[[according_to]][[which]][[numerator_name]]
numerator_genes <- rownames(numerator_table)
denominator_table <- inclusion_sig[[according_to]][[which]][[denominator_name]]
denominator_genes <- rownames(denominator_table)
df_columns <- paste0("deseq_", c("logfc", "adjp", "den"))
included_num <- inclusion_tables[["data"]][[numerator_name]][, df_columns]
colnames(included_num) <- c("numerator_vs_wt_logfc", "numerator_vs_wt_adjp", "num_wt_mean_exprs")
included_den <- inclusion_tables[["data"]][[denominator_name]][, df_columns]
colnames(included_den) <- c("denominator_vs_wt_logfc", "denominator_vs_wt_adjp", "den_wt_mean_exprs")
included_df <- merge(included_num, included_den, by = "row.names")
rownames(included_df) <- included_df[["Row.names"]]
included_df[["Row.names"]] <- NULL
include_genes <- unique(c(numerator_genes, denominator_genes))
message("The set of unique genes higher in ", numerator_name,
" vs. wt is ", length(numerator_genes), ".")
message("The set of unique genes higher in ", denominator_name,
" vs. wt is ", length(denominator_genes), ".")
message("The unique union of them is ", length(include_genes), " genes.")
include_name <- paste0("inc_", contrast)
include_idx <- all_genes %in% include_genes
include_genes <- all_genes[include_idx]
df_name <- paste0("df_", contrast)
retlist[[df_name]] <- included_df
written_inclusion <- write_xlsx(data = included_df,
excel = glue("included_genes/{include_name}-v{ver}.xlsx"))
retlist[[include_name]] <- include_genes
retlist[[contrast]] <- include_genes
}
return(retlist)
}
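Two small steps inside extract_inclusions() are easy to misread, so here they are in isolation: the gsub() which turns a keeper condition (p15hetdlgn) into the table name used by inclusion_sig (p15_het_dlgn), and the reordering of the union by all_genes. The toy gene IDs and the to_table_name() wrapper are hypothetical.

```r
## The same substitution which is used inside extract_inclusions().
to_table_name <- function(condition) {
  gsub(x = condition, pattern = "(het|ko|wt)", replacement = "_\\1_")
}
table_names <- to_table_name(c("p15hetdlgn", "p08koretina", "p08wtscn"))
print(table_names)
## The union of two gene sets, returned in the order of the full gene list.
toy_all_genes <- c("g1", "g2", "g3", "g4", "g5")
toy_numerator <- c("g4", "g2")
toy_denominator <- c("g2", "g5")
toy_union <- unique(c(toy_numerator, toy_denominator))
toy_ordered <- toy_all_genes[toy_all_genes %in% toy_union]
print(toy_ordered)
```

Note that gsub() replaces every match, so this only works as long as no time or location label contains het, ko, or wt.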
Now, using that function, pull out the IDs of the genes which were observed higher than wt on either side of every contrast we are likely to perform; these are the genes which will be carried forward.
all_genes <- rownames(exprs(v3_pairwise_input))
time_inclusions <- extract_inclusions(inclusion_sig, inclusion_tables, inclusions,
time_keepers, all_genes)
## The set of unique genes higher in p15_het_dlgn vs. wt is 2191.
## The set of unique genes higher in p08_het_dlgn vs. wt is 369.
## The unique union of them is 2241 genes.
## Deleting the file included_genes/inc_t_het_dlgn-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p15_ko_dlgn vs. wt is 2623.
## The set of unique genes higher in p08_ko_dlgn vs. wt is 1397.
## The unique union of them is 3318 genes.
## Deleting the file included_genes/inc_t_ko_dlgn-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p15_het_retina vs. wt is 948.
## The set of unique genes higher in p08_het_retina vs. wt is 2284.
## The unique union of them is 2443 genes.
## Deleting the file included_genes/inc_t_het_retina-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p15_ko_retina vs. wt is 262.
## The set of unique genes higher in p08_ko_retina vs. wt is 1728.
## The unique union of them is 1810 genes.
## Deleting the file included_genes/inc_t_ko_retina-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p15_het_scn vs. wt is 7.
## The set of unique genes higher in p08_het_scn vs. wt is 311.
## The unique union of them is 316 genes.
## Deleting the file included_genes/inc_t_het_scn-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p15_ko_scn vs. wt is 1.
## The set of unique genes higher in p08_ko_scn vs. wt is 3.
## The unique union of them is 4 genes.
## Deleting the file included_genes/inc_t_ko_scn-v20240917.xlsx before writing the tables.
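The messages above report the size of each set and of their union, which is enough to recover how many genes the two time points share: shared = n_p15 + n_p08 - n_union. A small sketch of that arithmetic, using the numbers printed above:

```r
## Set sizes copied from the messages printed by extract_inclusions() above.
overlap <- data.frame(
  contrast = c("t_het_dlgn", "t_ko_dlgn", "t_het_retina",
               "t_ko_retina", "t_het_scn", "t_ko_scn"),
  n_p15 = c(2191, 2623, 948, 262, 7, 1),
  n_p08 = c(369, 1397, 2284, 1728, 311, 3),
  n_union = c(2241, 3318, 2443, 1810, 316, 4))
## Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
overlap[["shared"]] <- overlap[["n_p15"]] + overlap[["n_p08"]] - overlap[["n_union"]]
## What fraction of the smaller set also appears in the larger one?
overlap[["fraction_of_smaller"]] <- overlap[["shared"]] /
  pmin(overlap[["n_p15"]], overlap[["n_p08"]])
print(overlap)
```

By this arithmetic, 319 of the 369 p08 het dLGN genes also appear in the p15 set, while the SCN sets barely overlap at all.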
location_inclusions <- extract_inclusions(inclusion_sig, inclusion_tables, inclusions,
location_keepers, all_genes)
## The set of unique genes higher in p08_het_dlgn vs. wt is 369.
## The set of unique genes higher in p08_het_retina vs. wt is 2284.
## The unique union of them is 2493 genes.
## Deleting the file included_genes/inc_dr_p08_het-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p15_het_dlgn vs. wt is 2191.
## The set of unique genes higher in p15_het_retina vs. wt is 948.
## The unique union of them is 2787 genes.
## Deleting the file included_genes/inc_dr_p15_het-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p08_ko_dlgn vs. wt is 1397.
## The set of unique genes higher in p08_ko_retina vs. wt is 1728.
## The unique union of them is 2749 genes.
## Deleting the file included_genes/inc_dr_p08_ko-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p15_ko_dlgn vs. wt is 2623.
## The set of unique genes higher in p15_ko_retina vs. wt is 262.
## The unique union of them is 2809 genes.
## Deleting the file included_genes/inc_dr_p15_ko-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p08_het_scn vs. wt is 311.
## The set of unique genes higher in p08_het_retina vs. wt is 2284.
## The unique union of them is 2546 genes.
## Deleting the file included_genes/inc_sr_p08_het-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p15_het_scn vs. wt is 7.
## The set of unique genes higher in p15_het_retina vs. wt is 948.
## The unique union of them is 955 genes.
## Deleting the file included_genes/inc_sr_p15_het-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p08_ko_scn vs. wt is 3.
## The set of unique genes higher in p08_ko_retina vs. wt is 1728.
## The unique union of them is 1731 genes.
## Deleting the file included_genes/inc_sr_p08_ko-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p15_ko_scn vs. wt is 1.
## The set of unique genes higher in p15_ko_retina vs. wt is 262.
## The unique union of them is 263 genes.
## Deleting the file included_genes/inc_sr_p15_ko-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p08_het_dlgn vs. wt is 369.
## The set of unique genes higher in p08_het_scn vs. wt is 311.
## The unique union of them is 671 genes.
## Deleting the file included_genes/inc_ds_p08_het-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p15_het_dlgn vs. wt is 2191.
## The set of unique genes higher in p15_het_scn vs. wt is 7.
## The unique union of them is 2197 genes.
## Deleting the file included_genes/inc_ds_p15_het-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p08_ko_dlgn vs. wt is 1397.
## The set of unique genes higher in p08_ko_scn vs. wt is 3.
## The unique union of them is 1399 genes.
## Deleting the file included_genes/inc_ds_p08_ko-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p15_ko_dlgn vs. wt is 2623.
## The set of unique genes higher in p15_ko_scn vs. wt is 1.
## The unique union of them is 2624 genes.
## Deleting the file included_genes/inc_ds_p15_ko-v20240917.xlsx before writing the tables.
genotype_inclusions <- extract_inclusions(inclusion_sig, inclusion_tables, inclusions,
genotype_keepers, all_genes)
## The set of unique genes higher in p08_ko_dlgn vs. wt is 1397.
## The set of unique genes higher in p08_het_dlgn vs. wt is 369.
## The unique union of them is 1475 genes.
## Deleting the file included_genes/inc_kh_p08_dlgn-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p15_ko_dlgn vs. wt is 2623.
## The set of unique genes higher in p15_het_dlgn vs. wt is 2191.
## The unique union of them is 2906 genes.
## Deleting the file included_genes/inc_kh_p15_dlgn-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p08_ko_retina vs. wt is 1728.
## The set of unique genes higher in p08_het_retina vs. wt is 2284.
## The unique union of them is 2653 genes.
## Deleting the file included_genes/inc_kh_p08_retina-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p15_ko_retina vs. wt is 262.
## The set of unique genes higher in p15_het_retina vs. wt is 948.
## The unique union of them is 1030 genes.
## Deleting the file included_genes/inc_kh_p15_retina-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p08_ko_scn vs. wt is 3.
## The set of unique genes higher in p08_het_scn vs. wt is 311.
## The unique union of them is 314 genes.
## Deleting the file included_genes/inc_kh_p08_scn-v20240917.xlsx before writing the tables.
## The set of unique genes higher in p15_ko_scn vs. wt is 1.
## The set of unique genes higher in p15_het_scn vs. wt is 7.
## The unique union of them is 8 genes.
## Deleting the file included_genes/inc_kh_p15_scn-v20240917.xlsx before writing the tables.
genotype_de <- all_pairwise(v3_pairwise_input, filter = TRUE,
keepers = genotype_keepers, model_batch = "svaseq")
##
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 3 3 3 3 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn p15_wt_dlgn p15_wt_retina p15_wt_scn
## 4 3 3 3 3 5 5 2
## Removing 0 low-count genes (15263 remaining).
## Setting 9594 low elements to zero.
## transform_counts: Found 9594 values equal to 0, adding 1 to the matrix.
## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 10 comparisons.
location_de <- all_pairwise(v3_pairwise_input, filter = TRUE,
keepers = location_keepers, model_batch = "svaseq")
##
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 3 3 3 3 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn p15_wt_dlgn p15_wt_retina p15_wt_scn
## 4 3 3 3 3 5 5 2
## Removing 0 low-count genes (15263 remaining).
## Setting 9594 low elements to zero.
## transform_counts: Found 9594 values equal to 0, adding 1 to the matrix.
## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 10 comparisons.
time_de <- all_pairwise(v3_pairwise_input, filter = TRUE,
keepers = time_keepers, model_batch = "svaseq")
##
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 3 3 3 3 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn p15_wt_dlgn p15_wt_retina p15_wt_scn
## 4 3 3 3 3 5 5 2
## Removing 0 low-count genes (15263 remaining).
## Setting 9594 low elements to zero.
## transform_counts: Found 9594 values equal to 0, adding 1 to the matrix.
## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: svaseq.
## The primary analysis performed 10 comparisons.
I will start with the tables and no inclusions so I can check my work.
In this first block I will explain a little more thoroughly what is going on:
genotype_tables_full <- combine_de_tables(
genotype_de, keepers = genotype_keepers, label_column = label_column,
excel = glue("full_contrasts/genotype_full_tables-v{ver}.xlsx"))
## Deleting the file full_contrasts/genotype_full_tables-v20240917.xlsx before writing the tables.
## A set of combined differential expression results.
##   table                       deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p08kodlgn_vs_p08hetdlgn              34             1          43             2          41             1
## 2 p15kodlgn_vs_p15hetdlgn              49             2          85             2           0             0
## 3 p08koretina_vs_p08hetretina           6             2           4             2           3             1
## 4 p15koretina_vs_p15hetretina           6             4           8             4           0             3
## 5 p08koscn_vs_p08hetscn                51           128          82           136          31            22
## 6 p15koscn_vs_p15hetscn                 0            12           0            25           0             1
## Plot describing unique/shared genes in a differential expression table.
genotype_sig_full <- extract_significant_genes(
genotype_tables_full, according_to = "deseq",
excel = glue("full_contrasts/genotype_full_sig-v{ver}.xlsx"))
## Deleting the file full_contrasts/genotype_full_sig-v20240917.xlsx before writing the tables.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##               deseq_up deseq_down
## kh_p08_dlgn         34          1
## kh_p15_dlgn         49          2
## kh_p08_retina        6          2
## kh_p15_retina        6          4
## kh_p08_scn          51        128
## kh_p15_scn           0         12
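The loop which follows repeats this for each contrast after restricting the table to the included genes (via wanted_genes). The core of that restriction can be sketched in base R; the table and its values are invented, and I am assuming the cutoffs are inclusive (>= and <=), which may not be exactly how extract_significant_genes() treats the boundaries.

```r
## Hypothetical DE table; the values are invented for illustration.
toy_table <- data.frame(
  deseq_logfc = c(2.1, -1.5, 0.3, 1.2, -2.4),
  deseq_adjp = c(0.001, 0.02, 0.5, 0.2, 0.01),
  row.names = c("g1", "g2", "g3", "g4", "g5"))
toy_includes <- c("g1", "g3", "g4", "g5")
## Keep only the genes which are on the inclusion list.
kept <- toy_table[rownames(toy_table) %in% toy_includes, ]
## Apply the cutoffs reported above: |logFC| of 1 and adjusted p of 0.05.
ups <- kept[kept[["deseq_logfc"]] >= 1 & kept[["deseq_adjp"]] <= 0.05, ]
downs <- kept[kept[["deseq_logfc"]] <= -1 & kept[["deseq_adjp"]] <= 0.05, ]
num_rows <- nrow(ups) + nrow(downs)
message("There are ", num_rows, " significant up and down genes.")
```

In this toy case g2 would have been significant but is dropped because it is not on the inclusion list, which is the same reason some of the filtered counts are lower than the full-table counts.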
genotype_tables <- list()
genotype_sig <- list()
genotype_gp <- list()
genotype_cp <- list()
for (k in seq_along(genotype_keepers)) {
name <- names(genotype_keepers)[k]
message("Examining ", name)
keeper <- genotype_keepers[name]
include_name <- paste0("inc_", name)
include_df_name <- paste0("df_", name)
include_df <- genotype_inclusions[[include_df_name]]
includes <- genotype_inclusions[[include_name]]
summary(rownames(genotype_sig_full[["deseq"]][["ups"]][[name]]) %in% includes)
include_filename <- glue("genotype_contrasts/genotype_{name}_including_wt_{lfc_cutoff}_decreased_table-v{ver}.xlsx")
include_sig_filename <- glue("genotype_contrasts/genotype_{name}_including_wt_{lfc_cutoff}_decreased_sig-v{ver}.xlsx")
genotype_tables[[name]] <- combine_de_tables(
genotype_de, extra_annot = include_df,
keepers = keeper, label_column = label_column,
excel = include_filename, wanted_genes = includes)
print(genotype_tables[[name]])
genotype_sig[[name]] <- extract_significant_genes(
genotype_tables[[name]], according_to = "deseq",
excel = include_sig_filename)
print(genotype_sig[[name]])
num_rows <- nrow(genotype_sig[[name]][["deseq"]][["ups"]][[name]]) +
nrow(genotype_sig[[name]][["deseq"]][["downs"]][[name]])
message("There are ", num_rows, " significant up and down genes.")
if (num_rows >= 10) {
message("Performing gprofiler/clusterProfiler.")
genotype_gp[[name]] <- all_gprofiler(genotype_sig[[name]], species = "mmusculus")
gp_written <- write_all_gp(genotype_gp[[name]])
genotype_cp[[name]] <- all_cprofiler(genotype_sig[[name]], genotype_tables[[name]],
orgdb = "org.Mm.eg.db")
cp_written <- write_all_cp(genotype_cp[[name]])
} else {
warning("There are less than 10 genes up and down in the ", name, " comparison.")
message("There are less than 10 genes up and down in the ", name, " comparison.")
}
}
## Examining kh_p08_dlgn
## Deleting the file genotype_contrasts/genotype_kh_p08_dlgn_including_wt_0.1_decreased_table-v20240917.xlsx before writing the tables.
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p08kodlgn_vs_p08hetdlgn 34 1 37 0 31 0
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## Deleting the file genotype_contrasts/genotype_kh_p08_dlgn_including_wt_0.1_decreased_sig-v20240917.xlsx before writing the tables.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## kh_p08_dlgn 34 1
## There are 35 significant up and down genes.
## Performing gprofiler/clusterProfiler.
## There are only, 1 returning null.
## preparing geneSet collections...
## GSEA analysis...
## no term enriched under specific pvalueCutoff...
## preparing geneSet collections...
## GSEA analysis...
## no term enriched under specific pvalueCutoff...
## --> No gene can be mapped....
## --> Expected input gene ID: 328099,94180,225913,26876,14979,226265
## --> return NULL...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Examining kh_p15_dlgn
##
## Deleting the file genotype_contrasts/genotype_kh_p15_dlgn_including_wt_0.1_decreased_table-v20240917.xlsx before writing the tables.
##
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p15kodlgn_vs_p15hetdlgn 43 0 76 0 0 0
## Only kh_p15_dlgn_up has information, cannot create an UpSet.
## Plot describing unique/shared genes in a differential expression table.
## NULL
## Deleting the file genotype_contrasts/genotype_kh_p15_dlgn_including_wt_0.1_decreased_sig-v20240917.xlsx before writing the tables.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## kh_p15_dlgn 43 0
## There are 43 significant up and down genes.
## Performing gprofiler/clusterProfiler.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Examining kh_p08_retina
## Deleting the file genotype_contrasts/genotype_kh_p08_retina_including_wt_0.1_decreased_table-v20240917.xlsx before writing the tables.
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p08koretina_vs_p08hetretina 3 1 3 1 3 1
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## Deleting the file genotype_contrasts/genotype_kh_p08_retina_including_wt_0.1_decreased_sig-v20240917.xlsx before writing the tables.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## kh_p08_retina 3 1
## There are 4 significant up and down genes.
## Warning: There are less than 10 genes up and down in the kh_p08_retina comparison.
## There are less than 10 genes up and down in the kh_p08_retina comparison.
## Examining kh_p15_retina
## Deleting the file genotype_contrasts/genotype_kh_p15_retina_including_wt_0.1_decreased_table-v20240917.xlsx before writing the tables.
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p15koretina_vs_p15hetretina 5 4 6 3 0 3
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## Deleting the file genotype_contrasts/genotype_kh_p15_retina_including_wt_0.1_decreased_sig-v20240917.xlsx before writing the tables.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## kh_p15_retina 5 4
## There are 9 significant up and down genes.
## Warning: There are less than 10 genes up and down in the kh_p15_retina comparison.
## There are less than 10 genes up and down in the kh_p15_retina comparison.
## Examining kh_p08_scn
## Deleting the file genotype_contrasts/genotype_kh_p08_scn_including_wt_0.1_decreased_table-v20240917.xlsx before writing the tables.
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p08koscn_vs_p08hetscn 3 68 3 67 1 13
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## Deleting the file genotype_contrasts/genotype_kh_p08_scn_including_wt_0.1_decreased_sig-v20240917.xlsx before writing the tables.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## kh_p08_scn 3 68
## There are 71 significant up and down genes.
## Performing gprofiler/clusterProfiler.
## There are only, 3 returning null.
## preparing geneSet collections...
## GSEA analysis...
## no term enriched under specific pvalueCutoff...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## preparing geneSet collections...
## GSEA analysis...
## no term enriched under specific pvalueCutoff...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Examining kh_p15_scn
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p15koscn_vs_p15hetscn 0 5 0 5 0 0
## Only kh_p15_scn_down has information, cannot create an UpSet.
## Plot describing unique/shared genes in a differential expression table.
## NULL
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## kh_p15_scn 0 5
## There are 5 significant up and down genes.
## Warning: There are less than 10 genes up and down in the kh_p15_scn comparison.
## There are less than 10 genes up and down in the kh_p15_scn comparison.
A few specific plots of interest: Colenso asked me to label a few genes in the knockout vs. het contrasts for the p08 retina, p08 SCN, and p08 dLGN: either the top 15 or all significant genes. I am pretty sure that if I ask for 15 and there are fewer significant genes than that, the plotter will label only the significant ones. Let us find out!
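To make the expected behaviour concrete, here is a minimal sketch in base R of how a "top n, or all significant if there are fewer" rule could work. It is not the code inside plot_volcano_condition_de(); the name pick_labels() and the toy gene names are hypothetical, and the only things borrowed from this document are the column names (mgisymbol, deseq_logfc, deseq_adjp) and the cutoffs (|logFC| >= 1, adjusted p <= 0.05).

```r
## Hypothetical helper: choose which genes to label on a volcano/MA plot.
pick_labels <- function(de_table, n = 15, lfc_cutoff = 1, p_cutoff = 0.05,
                        fc_col = "deseq_logfc", p_col = "deseq_adjp",
                        label_column = "mgisymbol") {
  significant <- !is.na(de_table[[p_col]]) &
    !is.na(de_table[[fc_col]]) &
    de_table[[p_col]] <= p_cutoff &
    abs(de_table[[fc_col]]) >= lfc_cutoff
  sig_table <- de_table[significant, , drop = FALSE]
  ## Order by adjusted p-value so the strongest hits come first.
  sig_table <- sig_table[order(sig_table[[p_col]]), , drop = FALSE]
  ## head() returns everything when fewer than n rows are available.
  head(sig_table[[label_column]], n)
}

## Toy table with invented values, only to exercise the function.
toy <- data.frame(
  mgisymbol = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
  deseq_logfc = c(2.5, -1.8, 0.3, 1.2, -3.0),
  deseq_adjp = c(0.001, 0.02, 0.0001, 0.2, 0.04),
  stringsAsFactors = FALSE)
pick_labels(toy, n = 15)  ## only three genes pass both cutoffs
pick_labels(toy, n = 2)   ## capped at the two smallest adjusted p-values
```

Asking for more labels than there are significant genes simply returns all significant genes, which is the behaviour I am hoping the real plotter has.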
For some crazy reason, the volcano plot below is double-labelling the genes!
table_input <- genotype_tables[["kh_p08_retina"]]
table_name <- "kh_p08_retina"
table <- table_input[["data"]][[table_name]]
interesting <- c("Opn4", "Gm9008", "Lrr1", "Cnbd1")
kh_p08_retina_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
label_column = "mgisymbol", label = interesting, alpha = 1.0,
size = 4, label_type = "label")
pp(file = "images/kh_p08_retina_volcano.pdf", width = 9, height = 9)
kh_p08_retina_volcano[["plot"]]
dev.off()
## png
## 2
## why in the crap is it double-labelling!?
## My MA plotter isn't as smart as the volcano plotter, the genes are:
kh_p08_retina_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
p_col = "deseq_adjp", label_column = "mgisymbol", label = interesting)
pp(file = "images/kh_p08_retina_ma.pdf", width = 9, height = 9)
kh_p08_retina_ma[["plot"]]
dev.off()
## png
## 2
Holy crappers, the MA plot did not double-label! The culprit is in my volcano plotter: it has a check to see if there are too few/too many labels, and I foolishly allowed that check to concatenate the labels. What in the crap was I thinking?
I am going to make an executive decision for the next plot: 15 labels is too many and makes it crazy cluttered.
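The failure mode described above can be reproduced in a few lines of base R. This is a guess at the shape of the bug, not the actual plotter code: resolve_labels() is a hypothetical name, and the assumption is that the requested genes were appended to an automatically chosen set without removing repeats.

```r
## Hypothetical sketch: combine requested labels with automatically chosen ones.
resolve_labels <- function(requested, automatic, dedupe = TRUE) {
  combined <- c(requested, automatic)
  if (dedupe) {
    ## unique() keeps the first occurrence, so the requested order is preserved.
    combined <- unique(combined)
  }
  combined
}

requested <- c("Opn4", "Gm9008", "Lrr1", "Cnbd1")
## Assume the automatic pick happened to choose the same four genes.
automatic <- requested
length(resolve_labels(requested, automatic, dedupe = FALSE))  ## every gene appears twice
length(resolve_labels(requested, automatic))                  ## one label per gene
```

If the real check behaves like the dedupe = FALSE case, a unique() on the final label vector should be enough to stop the double labels.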
table_input <- genotype_tables[["kh_p08_scn"]]
table_name <- "kh_p08_scn"
table <- table_input[["data"]][[table_name]]
interesting_genes <- c("Fign", "Nrn1", "Dpysl2", "Actb", "Fgf9", "Otx2", "Sec23",
"Ncam1", "Map4", "Sec22b", "Nlgn3", "Marcks", "Cd47",
"Dpysl3", "Lin7c", "Cadm1", "Snx12", "Rhoa", "Inpp5f",
"Atg12", "Set", "Gsk3b", "Pdcd4", "Gabra2", "Tmco1", "Anapc16")
kh_p08_scn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
label_column = "mgisymbol", label = interesting_genes, size = 4, alpha = 1.0,
label_type = "label")
pp(file = "images/kh_p08_scn_volcano.pdf", width = 9, height = 9)
kh_p08_scn_volcano[["plot"]]
dev.off()
## png
## 2
## why in the crap is it double-labelling!?
## My MA plotter isn't as smart as the volcano plotter, the genes are:
kh_p08_scn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
p_col = "deseq_adjp", label_column = "mgisymbol", label = interesting_genes)
pp(file = "images/kh_p08_scn_ma.pdf", width = 9, height = 9)
kh_p08_scn_ma[["plot"]]
## Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider increasing max.overlaps
dev.off()
## png
## 2
## Warning: ggrepel: 14 unlabeled data points (too many overlaps). Consider increasing max.overlaps
table_input <- genotype_tables[["kh_p08_dlgn"]]
table_name <- "kh_p08_dlgn"
table <- table_input[["data"]][[table_name]]
kh_p08_dlgn_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
label_column = "mgisymbol", label = 10, size = 4, alpha = 1.0,
label_type = "label")
pp(file = "images/kh_p08_dlgn_volcano.pdf", width = 9, height = 9)
kh_p08_dlgn_volcano[["plot"]]
dev.off()
## png
## 2
## My MA plotter isn't as smart as the volcano plotter, the genes are:
kh_p08_dlgn_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
p_col = "deseq_adjp", label_column = "mgisymbol", label = 10)
pp(file = "images/kh_p08_dlgn_ma.pdf", width = 9, height = 9)
kh_p08_dlgn_ma[["plot"]]
dev.off()
## png
## 2
Repeat the same block for the location contrasts, with a find/replace of genotype -> location.
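Find/replace across copied blocks is easy to get subtly wrong, so one alternative is to wrap the per-contrast loop in a function and pass in the pieces that change. The sketch below uses only base R; run_contrast_family() and its arguments are hypothetical names, and the worker functions are supplied by the caller so that nothing is assumed about the signatures of the real table, significance, and enrichment functions.

```r
## Hypothetical wrapper around the per-contrast loop used in this document.
## table_fn, sig_fn and enrich_fn are stand-ins supplied by the caller.
run_contrast_family <- function(keepers, table_fn, sig_fn, enrich_fn,
                                min_genes = 10) {
  results <- list(tables = list(), sig = list(), enrich = list())
  for (name in names(keepers)) {
    message("Examining ", name)
    results[["tables"]][[name]] <- table_fn(name, keepers[name])
    sig <- sig_fn(results[["tables"]][[name]])
    results[["sig"]][[name]] <- sig
    num_rows <- nrow(sig[["ups"]]) + nrow(sig[["downs"]])
    message("There are ", num_rows, " significant up and down genes.")
    ## Only attempt enrichment when there are enough genes to be worth it.
    if (num_rows > min_genes) {
      results[["enrich"]][[name]] <- enrich_fn(sig)
    }
  }
  results
}

## Toy stand-ins so the sketch runs without any analysis packages.
toy_keepers <- list(small = c("a", "b"), large = c("c", "d"))
toy_table <- function(name, keeper) {
  n <- if (name == "large") 12 else 2
  data.frame(gene = paste0(name, seq_len(n)), logfc = rep(c(2, -2), n / 2),
             stringsAsFactors = FALSE)
}
toy_sig <- function(tbl) {
  list(ups = tbl[tbl$logfc > 0, , drop = FALSE],
       downs = tbl[tbl$logfc < 0, , drop = FALSE])
}
toy_enrich <- function(sig) nrow(sig[["ups"]])
family <- run_contrast_family(toy_keepers, toy_table, toy_sig, toy_enrich)
names(family[["enrich"]])
```

Because every result is stored under the same `results` object, there is no second set of variable names to forget to rename when moving from one family of contrasts to the next.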
location_tables_full <- combine_de_tables(
location_de, keepers = location_keepers, label_column = label_column,
excel = glue("full_contrasts/location_full_tables-v{ver}.xlsx"))
location_tables_full
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p08hetdlgn_vs_p08hetretina 2174 1563 2232 1639 1895 1646
## 2 p15hetdlgn_vs_p15hetretina 2429 3362 2584 3348 2738 2634
## 3 p08kodlgn_vs_p08koretina 2198 1864 2163 2089 2131 1961
## 4 p15kodlgn_vs_p15koretina 2711 3941 2930 3852 3243 2974
## 5 p08hetscn_vs_p08hetretina 2630 1701 2593 1892 2225 1894
## 6 p15hetscn_vs_p15hetretina 2846 2381 2724 2617 2729 2360
## 7 p08koscn_vs_p08koretina 2733 1716 2674 1937 2412 2174
## 8 p15koscn_vs_p15koretina 2620 3023 2613 3159 2827 2646
## 9 p08hetdlgn_vs_p08hetscn 646 785 760 783 645 759
## 10 p15hetdlgn_vs_p15hetscn 1698 2769 1985 2620 1965 2014
## 11 p08kodlgn_vs_p08koscn 986 1312 1104 1411 1162 1277
## 12 p15kodlgn_vs_p15koscn 1841 2529 2161 2413 1892 2006
## Plot describing unique/shared genes in a differential expression table.
location_sig_full <- extract_significant_genes(
location_tables_full, according_to = "deseq",
excel = glue("full_contrasts/location_full_sig-v{ver}.xlsx"))
location_sig_full
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## dr_p08_het 2174 1563
## dr_p15_het 2429 3362
## dr_p08_ko 2198 1864
## dr_p15_ko 2711 3941
## sr_p08_het 2630 1701
## sr_p15_het 2846 2381
## sr_p08_ko 2733 1716
## sr_p15_ko 2620 3023
## ds_p08_het 646 785
## ds_p15_het 1698 2769
## ds_p08_ko 986 1312
## ds_p15_ko 1841 2529
location_tables <- list()
location_sig <- list()
location_gp <- list()
location_cp <- list()
for (k in seq_along(location_keepers)) {
name <- names(location_keepers)[k]
message("Examining ", name)
keeper <- location_keepers[name]
include_name <- paste0("inc_", name)
include_df_name <- paste0("df_", name)
include_df <- location_inclusions[[include_df_name]]
includes <- location_inclusions[[include_name]]
summary(rownames(location_sig_full[["deseq"]][["ups"]][[name]]) %in% includes)
include_filename <- glue("location_contrasts/location_{name}_including_wt_{lfc_cutoff}_decreased_table-v{ver}.xlsx")
include_sig_filename <- glue("location_contrasts/location_{name}_including_wt_{lfc_cutoff}_decreased_sig-v{ver}.xlsx")
location_tables[[name]] <- combine_de_tables(
location_de, extra_annot = include_df,
keepers = keeper, label_column = label_column,
excel = include_filename, wanted_genes = includes)
print(location_tables[[name]])
location_sig[[name]] <- extract_significant_genes(
location_tables[[name]], according_to = "deseq",
excel = include_sig_filename)
print(location_sig[[name]])
num_rows <- nrow(location_sig[[name]][["deseq"]][["ups"]][[name]]) +
nrow(location_sig[[name]][["deseq"]][["downs"]][[name]])
message("There are ", num_rows, " significant up and down genes.")
if (num_rows > 10) {
location_gp[[name]] <- all_gprofiler(location_sig[[name]], species = "mmusculus")
gp_written <- write_all_gp(location_gp[[name]])
location_cp[[name]] <- all_cprofiler(location_sig[[name]], location_tables[[name]],
orgdb = "org.Mm.eg.db")
cp_written <- write_all_cp(location_cp[[name]])
}
}
## Examining dr_p08_het
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p08hetdlgn_vs_p08hetretina 463 178 466 180 420 188
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## dr_p08_het 463 178
## There are 641 significant up and down genes.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Examining dr_p15_het
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p15hetdlgn_vs_p15hetretina 992 137 1051 133 965 122
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## dr_p15_het 992 137
## There are 1129 significant up and down genes.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Examining dr_p08_ko
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p08kodlgn_vs_p08koretina 829 145 801 158 756 162
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## dr_p08_ko 829 145
## There are 974 significant up and down genes.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Examining dr_p15_ko
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p15kodlgn_vs_p15koretina 1111 259 1181 250 1237 219
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## dr_p15_ko 1111 259
## There are 1370 significant up and down genes.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Examining sr_p08_het
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p08hetscn_vs_p08hetretina 548 265 531 291 473 291
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## sr_p08_het 548 265
## There are 813 significant up and down genes.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Examining sr_p15_het
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p15hetscn_vs_p15hetretina 254 129 237 136 231 126
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## sr_p15_het 254 129
## There are 383 significant up and down genes.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Examining sr_p08_ko
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p08koscn_vs_p08koretina 402 151 387 164 390 181
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## sr_p08_ko 402 151
## There are 553 significant up and down genes.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Examining sr_p15_ko
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p15koscn_vs_p15koretina 77 54 75 57 89 49
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## sr_p15_ko 77 54
## There are 131 significant up and down genes.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Examining ds_p08_het
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p08hetdlgn_vs_p08hetscn 51 57 63 48 49 39
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## ds_p08_het 51 57
## There are 108 significant up and down genes.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Examining ds_p15_het
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p15hetdlgn_vs_p15hetscn 1039 3 1191 3 1043 2
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## ds_p15_het 1039 3
## There are 1042 significant up and down genes.
## There are only, 3 returning null.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Examining ds_p08_ko
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p08kodlgn_vs_p08koscn 365 12 397 12 339 13
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## ds_p08_ko 365 12
## There are 377 significant up and down genes.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Examining ds_p15_ko
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p15kodlgn_vs_p15koscn 1117 4 1286 3 1024 6
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## ds_p15_ko 1117 4
## There are 1121 significant up and down genes.
## There are only, 4 returning null.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
Colenso sent a specific query of interest: comparing SCN vs. retina at p08 in the heterozygotes, with a set of genes of particular interest. Perhaps I can use some of these as markers to quality control my work in the future?
Here are the genes:
Opn4, Eomes, Trpc7, Oprm1, Nr4a3, Tbx20, Irx6, AW551984, Pcdh19, Adcyap1, Baiap3, Chl1, Grin3a, Igf1, Gria1, Grin2d, Grin3a, Chrna6, Chrna3, Htr5a, Htr2a, Htr7, Irx4, PlxnC1, Sema6d, Sema4f, Sema4a, Sema6b, Lrrc4b, Lrrc58, Lrrc3b, Wnt4, Wnt9b, Ctxn3, Tenm1, Gna14, Rgs4, Rgs6, Rgs5
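One way to use these genes as quality-control markers would be to check, for each run, which of them are present in the table and what fold change and adjusted p-value they received. The sketch below is base R on a toy table; check_markers() is a hypothetical name, the toy values are invented purely for illustration, and whether a given marker is expected to go up or down is something the analyst has to supply, since it is not stated in this document.

```r
## Hypothetical QC helper: report presence and statistics of marker genes.
check_markers <- function(de_table, markers, fc_col = "deseq_logfc",
                          p_col = "deseq_adjp", label_column = "mgisymbol") {
  ## Duplicated entries in the marker list are dropped.
  markers <- unique(markers)
  idx <- match(markers, de_table[[label_column]])
  data.frame(
    marker = markers,
    found = !is.na(idx),
    logfc = de_table[[fc_col]][idx],
    adjp = de_table[[p_col]][idx],
    stringsAsFactors = FALSE)
}

## Toy table; the numbers are made up and carry no biological meaning.
toy <- data.frame(
  mgisymbol = c("Opn4", "Eomes", "Grin3a"),
  deseq_logfc = c(-4.0, -2.5, 1.1),
  deseq_adjp = c(1e-10, 1e-4, 0.03),
  stringsAsFactors = FALSE)
qc <- check_markers(toy, c("Opn4", "Eomes", "Grin3a", "Grin3a", "Trpc7"))
qc[!qc[["found"]], "marker"]  ## markers absent from the table
```

A marker that is missing from the table is worth a look before plotting: it may have been filtered out, or its symbol may be spelled differently in the annotation.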
table_input <- location_tables[["sr_p08_het"]]
table_name <- "sr_p08_het"
table <- table_input[["data"]][[table_name]]
interesting_genes <- c("Opn4", "Eomes", "Trpc7", "Oprm1", "Nr4a3", "Tbx20",
"Irx6", "AW551984", "Pcdh19", "Adcyap1r1", "Baiap3",
"Chl1", "Grin3a", "Igf1", "Gria1", "Grin2d", "Grin3a",
"Chrna6", "Chrna3", "Htr5a", "Htr2a", "Htr7", "Irx4",
"PlxnC1", "Sema6d", "Sema4f", "Sema4a", "Sema6b", "Lrrc4b",
"Lrrc58", "Lrrc3b", "Wnt4", "Wnt9b", "Ctxn3", "Tenm1", "Gna14",
"Rgs4", "Rgs6", "Rgs5", "Pou4f2", "Chrnb3", "Bcan")
sr_p08_het_volcano <- plot_volcano_condition_de(
table, table_name, fc_col = "deseq_logfc", p_col = "deseq_adjp",
label_column = "mgisymbol", label = interesting_genes, alpha = 1.0,
label_type = "label", size = 4)
pp(file = "images/sr_p08_het_volcano.pdf", width = 9, height = 9)
sr_p08_het_volcano[["plot"]]
dev.off()
## png
## 2
sr_p08_het_ma <- plot_ma_condition_de(
table, table_name, expr_col = "deseq_basemean", fc_col = "deseq_logfc",
p_col = "deseq_adjp", label_column = "mgisymbol", label = interesting_genes)
pp(file = "images/sr_p08_het_ma.pdf", width = 9, height = 9)
sr_p08_het_ma[["plot"]]
## Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider increasing max.overlaps
dev.off()
## png
## 2
## Warning: ggrepel: 10 unlabeled data points (too many overlaps). Consider increasing max.overlaps
time_tables_full <- combine_de_tables(
time_de, keepers = time_keepers,
label_column = label_column,
excel = glue("full_contrasts/time_full_tables-v{ver}.xlsx"))
time_sig_full <- extract_significant_genes(
time_tables_full, according_to = "deseq",
excel = glue("full_contrasts/time_full_sig-v{ver}.xlsx"))
time_tables <- list()
time_sig <- list()
time_gp <- list()
time_cp <- list()
for (k in seq_along(time_keepers)) {
name <- names(time_keepers)[k]
message("Examining ", name)
keeper <- time_keepers[name]
include_name <- paste0("inc_", name)
include_df_name <- paste0("df_", name)
include_df <- time_inclusions[[include_df_name]]
includes <- time_inclusions[[include_name]]
summary(rownames(time_sig_full[["deseq"]][["ups"]][[name]]) %in% includes)
include_filename <- glue("time_contrasts/time_{name}_including_wt_{lfc_cutoff}_decreased_table-v{ver}.xlsx")
include_sig_filename <- glue("time_contrasts/time_{name}_including_wt_{lfc_cutoff}_decreased_sig-v{ver}.xlsx")
time_tables[[name]] <- combine_de_tables(
time_de, extra_annot = include_df,
keepers = keeper, label_column = label_column,
excel = include_filename, wanted_genes = includes)
print(time_tables[[name]])
time_sig[[name]] <- extract_significant_genes(
time_tables[[name]], according_to = "deseq",
excel = include_sig_filename)
print(time_sig[[name]])
num_rows <- nrow(time_sig[[name]][["deseq"]][["ups"]][[name]]) +
nrow(time_sig[[name]][["deseq"]][["downs"]][[name]])
message("There are ", num_rows, " significant up and down genes.")
if (num_rows > 10) {
time_gp[[name]] <- all_gprofiler(time_sig[[name]], species = "mmusculus")
gp_written <- write_all_gp(time_gp[[name]])
time_cp[[name]] <- all_cprofiler(time_sig[[name]], time_tables[[name]],
orgdb = "org.Mm.eg.db")
cp_written <- write_all_cp(time_cp[[name]])
}
}
## Examining t_het_dlgn
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p15hetdlgn_vs_p08hetdlgn 499 17 534 15 436 17
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## Deleting the file time_contrasts/time_t_het_dlgn_including_wt_0.1_decreased_table-v20240917.xlsx before writing the tables.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## t_het_dlgn 499 17
## There are 516 significant up and down genes.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Examining t_ko_dlgn
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p15kodlgn_vs_p08kodlgn 719 229 846 205 624 196
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## Deleting the file time_contrasts/time_t_ko_dlgn_including_wt_0.1_decreased_table-v20240917.xlsx before writing the tables.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## t_ko_dlgn 719 229
## There are 948 significant up and down genes.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Examining t_het_retina
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p15hetretina_vs_p08hetretina 74 273 78 280 57 358
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## Deleting the file time_contrasts/time_t_het_retina_including_wt_0.1_decreased_table-v20240917.xlsx before writing the tables.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## t_het_retina 74 273
## There are 347 significant up and down genes.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Examining t_ko_retina
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p15koretina_vs_p08koretina 47 339 52 358 39 516
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## Deleting the file time_contrasts/time_t_ko_retina_including_wt_0.1_decreased_table-v20240917.xlsx before writing the tables.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## t_ko_retina 47 339
## There are 386 significant up and down genes.
## preparing geneSet collections...
## GSEA analysis...
## no term enriched under specific pvalueCutoff...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## preparing geneSet collections...
## GSEA analysis...
## no term enriched under specific pvalueCutoff...
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Scale for size is already present.
## Adding another scale for size, which will replace the existing scale.
## Examining t_het_scn
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p15hetscn_vs_p08hetscn 1 2 1 2 0 2
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?
## Plot describing unique/shared genes in a differential expression table.
## Deleting the file time_contrasts/time_t_het_scn_including_wt_0.1_decreased_table-v20240917.xlsx before writing the tables.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## t_het_scn 1 2
## There are 3 significant up and down genes.
## Examining t_ko_scn
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 p15koscn_vs_p08koscn 0 3 1 3 1 2
## Only t_ko_scn_down has information, cannot create an UpSet.
## Plot describing unique/shared genes in a differential expression table.
## NULL
## Deleting the file time_contrasts/time_t_ko_scn_including_wt_0.1_decreased_table-v20240917.xlsx before writing the tables.
## A set of genes deemed significant according to deseq.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
## deseq_up deseq_down
## t_ko_scn 0 3
## There are 3 significant up and down genes.
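The DESeq2 tallies printed in the log above can be collected into one small table, which makes it easier to see why only the dLGN and retina contrasts reached the enrichment step. This is a minimal sketch: the numbers are copied by hand from the log, and `sig_counts` is a hypothetical name, not an object from the analysis.

```r
## Counts copied by hand from the log above
## (DESeq2, LFC cutoff 1, adjusted P cutoff 0.05).
sig_counts <- data.frame(
  contrast = c("t_het_dlgn", "t_ko_dlgn", "t_het_retina",
               "t_ko_retina", "t_het_scn", "t_ko_scn"),
  up = c(499, 719, 74, 47, 1, 0),
  down = c(17, 229, 273, 339, 2, 3))
sig_counts[["total"]] <- sig_counts[["up"]] + sig_counts[["down"]]
## The loop only runs the gProfiler/clusterProfiler steps when more
## than 10 genes are significant.
sig_counts[["enrichment_run"]] <- sig_counts[["total"]] > 10
print(sig_counts)
```

Both SCN time contrasts fall below the threshold of 10 genes, which matches the absence of any GSEA messages after them in the log.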
In conversation, Colenso suggested a series of contrasts which would be interesting to attempt in order to query the changes across both locations and genotypes and/or both locations and time, for example:
(p08_het_scn / p08_het_retina) / (p08_ko_scn / p08_ko_retina)
We can definitely do these, but they do not work equally well with all of the methods employed (I think they work best with limma and edgeR).
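Since the model coefficients in these methods are on a log scale, a ratio of ratios like the one above turns into a difference of differences, which is the form a contrast string needs to take. A minimal sketch with made-up expression values (the numbers are purely illustrative and are not taken from the data):

```r
## Hypothetical mean expression values on the linear scale for one gene.
expr <- c(p08_het_scn = 80, p08_het_retina = 20,
          p08_ko_scn = 30, p08_ko_retina = 15)
## Ratio of ratios on the linear scale.
ratio_of_ratios <- (expr[["p08_het_scn"]] / expr[["p08_het_retina"]]) /
  (expr[["p08_ko_scn"]] / expr[["p08_ko_retina"]])
## The same quantity as a difference of differences on the log2 scale.
lg <- log2(expr)
diff_of_diffs <- (lg[["p08_het_scn"]] - lg[["p08_het_retina"]]) -
  (lg[["p08_ko_scn"]] - lg[["p08_ko_retina"]])
## Compare the two; all.equal() tolerates floating point error.
print(all.equal(log2(ratio_of_ratios), diff_of_diffs))
```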
Let's find out!
scn_extra <- glue("\\
p08het = (p08_het_scn - p08_het_retina), \\
p08ko = (p08_ko_scn - p08_ko_retina), \\
p08het_vs_p08ko = (p08_het_scn - p08_het_retina) - (p08_ko_scn - p08_ko_retina), \\
p15het = (p15_het_scn - p15_het_retina), \\
p15ko = (p15_ko_scn - p15_ko_retina), \\
p15het_vs_p15ko = (p15_het_scn - p15_het_retina) - (p15_ko_scn - p15_ko_retina)")
scn_translatome_de_keepers <- list(
"p08het" = c("p08_het_scn", "p08_het_retina"),
"p08ko" = c("p08_ko_scn", "p08_ko_retina"),
"p15het" = c("p15_het_scn", "p15_het_retina"),
"p15ko" = c("p15_ko_scn", "p15_ko_retina"))
scn_translatome_keepers <- list(
"p08het" = c("p08_het_scn", "p08_het_retina"),
"p08ko" = c("p08_ko_scn", "p08_ko_retina"),
"p08_scn_translatome" = c("p08het", "p08ko"),
"p15het" = c("p15_het_scn", "p15_het_retina"),
"p15ko" = c("p15_ko_scn", "p15_ko_retina"),
"p15_scn_translatome" = c("p15het", "p15ko"))
filt <- normalize_expt(v3_pairwise_input, filter = TRUE)
## Removing 10162 low-count genes (15263 remaining).
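## The argument is spelled extra_contrasts; with a misspelled name no error is raised
## and only the 4 keeper tables are written, without the extra contrasts.
limma_test <- limma_pairwise(filt,
keepers = scn_translatome_de_keepers,
keep_underscore = TRUE,
model_batch = FALSE, extra_contrasts = scn_extra)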
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/4: Creating table: p08_het_scn_vs_p08_het_retina. Adjust = BH
## Limma step 6/6: 2/4: Creating table: p08_ko_scn_vs_p08_ko_retina. Adjust = BH
## Limma step 6/6: 3/4: Creating table: p15_het_scn_vs_p15_het_retina. Adjust = BH
## Limma step 6/6: 4/4: Creating table: p15_ko_scn_vs_p15_ko_retina. Adjust = BH
## Limma step 6/6: 1/18: Creating table: p08_het_dlgn. Adjust = BH
## Limma step 6/6: 2/18: Creating table: p08_het_retina. Adjust = BH
## Limma step 6/6: 3/18: Creating table: p08_het_scn. Adjust = BH
## Limma step 6/6: 4/18: Creating table: p08_ko_dlgn. Adjust = BH
## Limma step 6/6: 5/18: Creating table: p08_ko_retina. Adjust = BH
## Limma step 6/6: 6/18: Creating table: p08_ko_scn. Adjust = BH
## Limma step 6/6: 7/18: Creating table: p08_wt_dlgn. Adjust = BH
## Limma step 6/6: 8/18: Creating table: p08_wt_retina. Adjust = BH
## Limma step 6/6: 9/18: Creating table: p08_wt_scn. Adjust = BH
## Limma step 6/6: 10/18: Creating table: p15_het_dlgn. Adjust = BH
## Limma step 6/6: 11/18: Creating table: p15_het_retina. Adjust = BH
## Limma step 6/6: 12/18: Creating table: p15_het_scn. Adjust = BH
## Limma step 6/6: 13/18: Creating table: p15_ko_dlgn. Adjust = BH
## Limma step 6/6: 14/18: Creating table: p15_ko_retina. Adjust = BH
## Limma step 6/6: 15/18: Creating table: p15_ko_scn. Adjust = BH
## Limma step 6/6: 16/18: Creating table: p15_wt_dlgn. Adjust = BH
## Limma step 6/6: 17/18: Creating table: p15_wt_retina. Adjust = BH
## Limma step 6/6: 18/18: Creating table: p15_wt_scn. Adjust = BH
edger_test <- edger_pairwise(filt,
keepers = scn_translatome_de_keepers,
keep_underscore = TRUE,
model_batch = FALSE, extra_contrasts = scn_extra)
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
scn_translatome_de <- all_pairwise(v3_pairwise_input, filter = TRUE,
keepers = scn_translatome_de_keepers,
model_batch = FALSE,
do_basic = FALSE, do_dream = FALSE,
do_noiseq = FALSE, do_ebseq = FALSE,
extra_contrasts = scn_extra,
parallel = FALSE, keep_underscore = TRUE)
##
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 3 3 3 3 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn p15_wt_dlgn p15_wt_retina p15_wt_scn
## 4 3 3 3 3 5 5 2
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The contrast p08het is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p08ko is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p08het is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p15het is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p15ko is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p15het is not in the results.
## If this is not an extra contrast, then this is an error.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/10: Creating table: p08_het_scn_vs_p08_het_retina. Adjust = BH
## Limma step 6/6: 2/10: Creating table: p08_ko_scn_vs_p08_ko_retina. Adjust = BH
## Limma step 6/6: 3/10: Creating table: p15_het_scn_vs_p15_het_retina. Adjust = BH
## Limma step 6/6: 4/10: Creating table: p15_ko_scn_vs_p15_ko_retina. Adjust = BH
## Limma step 6/6: 5/10: Creating table: p08het. Adjust = BH
## Limma step 6/6: 6/10: Creating table: p08ko. Adjust = BH
## Limma step 6/6: 7/10: Creating table: p08het_vs_p08ko. Adjust = BH
## Limma step 6/6: 8/10: Creating table: p15het. Adjust = BH
## Limma step 6/6: 9/10: Creating table: p15ko. Adjust = BH
## Limma step 6/6: 10/10: Creating table: p15het_vs_p15ko. Adjust = BH
## Limma step 6/6: 1/18: Creating table: p08_het_dlgn. Adjust = BH
## Limma step 6/6: 2/18: Creating table: p08_het_retina. Adjust = BH
## Limma step 6/6: 3/18: Creating table: p08_het_scn. Adjust = BH
## Limma step 6/6: 4/18: Creating table: p08_ko_dlgn. Adjust = BH
## Limma step 6/6: 5/18: Creating table: p08_ko_retina. Adjust = BH
## Limma step 6/6: 6/18: Creating table: p08_ko_scn. Adjust = BH
## Limma step 6/6: 7/18: Creating table: p08_wt_dlgn. Adjust = BH
## Limma step 6/6: 8/18: Creating table: p08_wt_retina. Adjust = BH
## Limma step 6/6: 9/18: Creating table: p08_wt_scn. Adjust = BH
## Limma step 6/6: 10/18: Creating table: p15_het_dlgn. Adjust = BH
## Limma step 6/6: 11/18: Creating table: p15_het_retina. Adjust = BH
## Limma step 6/6: 12/18: Creating table: p15_het_scn. Adjust = BH
## Limma step 6/6: 13/18: Creating table: p15_ko_dlgn. Adjust = BH
## Limma step 6/6: 14/18: Creating table: p15_ko_retina. Adjust = BH
## Limma step 6/6: 15/18: Creating table: p15_ko_scn. Adjust = BH
## Limma step 6/6: 16/18: Creating table: p15_wt_dlgn. Adjust = BH
## Limma step 6/6: 17/18: Creating table: p15_wt_retina. Adjust = BH
## Limma step 6/6: 18/18: Creating table: p15_wt_scn. Adjust = BH
scn_combined_test <- combine_de_tables(scn_translatome_de, keepers = scn_translatome_keepers,
excel = "excel/test_scn_translatome.xlsx")
## Deleting the file excel/test_scn_translatome.xlsx before writing the tables.
## Did not find p08ko or p08het.
## Did not find p08ko or p08het.
## Did not find p15ko or p15het.
## Did not find p15ko or p15het.
p08_dlgn_extra <- "p08het_vs_p08ko = (p08_het_dlgn - p08_het_retina) - (p08_ko_dlgn - p08_ko_retina)"
p08_dlgn_translatome_de_keepers <- list(
"p08het" = c("p08_het_dlgn", "p08_het_retina"),
"p08ko" = c("p08_ko_dlgn", "p08_ko_retina"))
p08_dlgn_translatome_keepers <- list(
"p08_het_dlgn_vs_retina" = c("p08_het_dlgn", "p08_het_retina"),
"p08_ko_dlgn_vs_retina" = c("p08_ko_dlgn", "p08_ko_retina"),
"p08_dlgn_translatome" = c("p08het", "p08ko"))
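## As in the SCN call above, all_pairwise() gets the keepers which name real conditions;
## the list containing c("p08het", "p08ko") is kept for combine_de_tables().
p08_dlgn_translatome_de <- all_pairwise(v3_pairwise_input, filter = TRUE,
keepers = p08_dlgn_translatome_de_keepers,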
model_batch = FALSE,
do_basic = FALSE, do_dream = FALSE,
do_noiseq = FALSE, do_ebseq = FALSE,
extra_contrasts = p08_dlgn_extra,
parallel = FALSE, keep_underscore = TRUE)
##
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 3 3 3 3 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn p15_wt_dlgn p15_wt_retina p15_wt_scn
## 4 3 3 3 3 5 5 2
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## Error in eval(ej, envir = levelsenv) : object 'p08het' not found
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## Error in eval(ej, envir = levelsenv) : object 'p08het' not found
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Error in eval(ej, envir = levelsenv) : object 'p08het' not found
## Error in retlst[[meth]]: attempt to select less than one element in get1index
p08_dlgn_combined_test <- combine_de_tables(p08_dlgn_translatome_de, keepers = p08_dlgn_translatome_keepers,
excel = "excel/test_p08_dlgn_translatome.xlsx")
## Error in eval(expr, envir, enclos): object 'p08_dlgn_translatome_de' not found
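The `object 'p08het' not found` errors are consistent with how contrast strings get evaluated: each expression is evaluated in an environment which contains only the condition names of the design, so any other name (such as the name of another contrast) fails. The sketch below mimics that mechanism in base R; `eval_contrast()` is a hypothetical helper written for illustration and is not a function from limma or hpgltools.

```r
## Hypothetical helper: evaluate a contrast expression against a set of
## known condition names, the way a contrast-matrix builder might.
eval_contrast <- function(expression_string, conditions) {
  env <- new.env()
  for (i in seq_along(conditions)) {
    ## Each condition becomes an indicator vector (1 in its own slot).
    indicator <- numeric(length(conditions))
    indicator[i] <- 1
    assign(conditions[i], indicator, envir = env)
  }
  result <- eval(parse(text = expression_string), envir = env)
  names(result) <- conditions
  result
}

conditions <- c("p08_het_dlgn", "p08_het_retina", "p08_ko_dlgn", "p08_ko_retina")
## An expression built only from real conditions evaluates to contrast weights.
ok <- eval_contrast(
  "(p08_het_dlgn - p08_het_retina) - (p08_ko_dlgn - p08_ko_retina)", conditions)
print(ok)
## An expression which refers to the name of another contrast fails.
bad <- tryCatch(eval_contrast("p08het - p08ko", conditions),
                error = function(e) conditionMessage(e))
print(bad)
```

If this is indeed what happened, a keepers pair such as c("p08het", "p08ko") can only be handed to the pairwise functions when every name in it is a condition in the design.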
time_scn_extra <- glue("\\
p15het = (p15_het_scn - p15_het_retina), \\
p08het = (p08_het_scn - p08_het_retina), \\
p15het_vs_p08het = (p15_het_scn - p15_het_retina) - (p08_het_scn - p08_het_retina), \\
p15ko = (p15_ko_scn - p15_ko_retina), \\
p08ko = (p08_ko_scn - p08_ko_retina), \\
p15ko_vs_p08ko = (p15_ko_scn - p15_ko_retina) - (p08_ko_scn - p08_ko_retina)")
time_scn_translatome_de_keepers <- list(
"p15het" = c("p15_het_scn", "p15_het_retina"),
"p08het" = c("p08_het_scn", "p08_het_retina"),
"p15ko" = c("p15_ko_scn", "p15_ko_retina"),
"p08ko" = c("p08_ko_scn", "p08_ko_retina"))
time_scn_translatome_keepers <- list(
"p15_het_sc_vs_retina" = c("p15_het_scn", "p15_het_retina"),
"p08_het_sc_vs_retina" = c("p08_het_scn", "p08_het_retina"),
"scn_het_translatome" = c("p15het", "p08het"),
"scn_ko_translatome" = c("p15ko", "p08ko"))
time_scn_translatome_de <- all_pairwise(v3_pairwise_input, filter = TRUE,
keepers = time_scn_translatome_de_keepers,
model_batch = FALSE,
do_basic = FALSE, do_dream = FALSE,
do_noiseq = FALSE, do_ebseq = FALSE,
extra_contrasts = time_scn_extra,
parallel = FALSE, keep_underscore = TRUE)
##
## p08_het_dlgn p08_het_retina p08_het_scn p08_ko_dlgn p08_ko_retina p08_ko_scn p08_wt_dlgn p08_wt_retina p08_wt_scn p15_het_dlgn
## 3 3 3 3 3 3 5 5 3 4
## p15_het_retina p15_het_scn p15_ko_dlgn p15_ko_retina p15_ko_scn p15_wt_dlgn p15_wt_retina p15_wt_scn
## 4 3 3 3 3 5 5 2
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## converting counts to integer mode
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## The contrast p15het is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p08het is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p15het is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p15ko is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p08ko is not in the results.
## If this is not an extra contrast, then this is an error.
## The contrast p15ko is not in the results.
## If this is not an extra contrast, then this is an error.
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq/EBSeq.
## If they freak out, check the state of the count table
## and ensure that it is in integer counts.
## EdgeR step 1/9: Importing and normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'edger_test'.
## EdgeR step 8/9: Making pairwise contrasts.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Limma step 1/6: choosing model.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Using normalize.method = quantile for voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Finished make_pairwise_contrasts.
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/10: Creating table: p15_het_scn_vs_p15_het_retina. Adjust = BH
## Limma step 6/6: 2/10: Creating table: p08_het_scn_vs_p08_het_retina. Adjust = BH
## Limma step 6/6: 3/10: Creating table: p15_ko_scn_vs_p15_ko_retina. Adjust = BH
## Limma step 6/6: 4/10: Creating table: p08_ko_scn_vs_p08_ko_retina. Adjust = BH
## Limma step 6/6: 5/10: Creating table: p15het. Adjust = BH
## Limma step 6/6: 6/10: Creating table: p08het. Adjust = BH
## Limma step 6/6: 7/10: Creating table: p15het_vs_p08het. Adjust = BH
## Limma step 6/6: 8/10: Creating table: p15ko. Adjust = BH
## Limma step 6/6: 9/10: Creating table: p08ko. Adjust = BH
## Limma step 6/6: 10/10: Creating table: p15ko_vs_p08ko. Adjust = BH
## Limma step 6/6: 1/18: Creating table: p08_het_dlgn. Adjust = BH
## Limma step 6/6: 2/18: Creating table: p08_het_retina. Adjust = BH
## Limma step 6/6: 3/18: Creating table: p08_het_scn. Adjust = BH
## Limma step 6/6: 4/18: Creating table: p08_ko_dlgn. Adjust = BH
## Limma step 6/6: 5/18: Creating table: p08_ko_retina. Adjust = BH
## Limma step 6/6: 6/18: Creating table: p08_ko_scn. Adjust = BH
## Limma step 6/6: 7/18: Creating table: p08_wt_dlgn. Adjust = BH
## Limma step 6/6: 8/18: Creating table: p08_wt_retina. Adjust = BH
## Limma step 6/6: 9/18: Creating table: p08_wt_scn. Adjust = BH
## Limma step 6/6: 10/18: Creating table: p15_het_dlgn. Adjust = BH
## Limma step 6/6: 11/18: Creating table: p15_het_retina. Adjust = BH
## Limma step 6/6: 12/18: Creating table: p15_het_scn. Adjust = BH
## Limma step 6/6: 13/18: Creating table: p15_ko_dlgn. Adjust = BH
## Limma step 6/6: 14/18: Creating table: p15_ko_retina. Adjust = BH
## Limma step 6/6: 15/18: Creating table: p15_ko_scn. Adjust = BH
## Limma step 6/6: 16/18: Creating table: p15_wt_dlgn. Adjust = BH
## Limma step 6/6: 17/18: Creating table: p15_wt_retina. Adjust = BH
## Limma step 6/6: 18/18: Creating table: p15_wt_scn. Adjust = BH
time_scn_translatome_test <- combine_de_tables(time_scn_translatome_de,
keepers = time_scn_translatome_keepers,
excel = "excel/test_time_scn_translatome.xlsx")