This document turns to the infection of PBMC cells with L.panamensis. This data is particularly strangely affected by the different strains used to infect the cells, and as a result is both useful and troubling.
Given the observations above, we have some ideas of ways to pass the data for differential expression analyses which may or may not be ‘better’. Lets try some and see what happens.
Given the above ways to massage the data, lets use a few of them for limma/deseq/edger. The main caveat in this is that those tools really do expect specific distributions of data which we horribly violate if we use log2() data, which is why in the previous blocks I named them l2blahblah, thus we can do the same sets of normalization but without that and forcibly push the resulting data into limma/edger/deseq.
Everything I did in 02_estimation_infection.html suggests that there are no significant differences visible if one looks just at chronic/self-healing in this data. Further testing has seemingly proven this statement, as a result most of the following analyses will look at chronic/uninfected and self-healing/uninfected followed by attempts to reconcile those results.
To save some time and annoyance with sva, lets filter the data now. In addition, write down a small function used to extract the sets of significant genes across different contrasts (notably self/uninfected vs. chronic/uninfected).
excel_file <- glue::glue("excel/{rundate}_hs_infect_patbatch_contr-v{ver}.xlsx")
hs_combined_batch <- sm(combine_de_tables(
hs_pairwise_batch,
excel=excel_file,
keepers=keepers_uninf))
excel_file <- glue::glue("excel/{rundate}_hs_infect_patbatch_sig-v{ver}.xlsx")
hs_sig_batch <- sm(extract_significant_genes(
hs_combined_batch,
excel=excel_file))
hs_sig_batch[["deseq"]][["counts"]]
## change_counts_up change_counts_down
## sh_nil 906 298
## ch_nil 932 279
## ch_sh 0 0
excel_file <- glue::glue("excel/{rundate}_hs_infect_sva_contr-v{ver}.xlsx")
hs_combined_sva <- sm(combine_de_tables(
hs_pairwise_sva,
excel=excel_file,
keepers=keepers_uninf))
excel_file <- glue::glue("excel/{rundate}_hs_infect_sva_sig-v{ver}.xlsx")
hs_sig_sva <- sm(extract_significant_genes(
hs_combined_sva,
excel=excel_file))
hs_sig_sva$deseq$counts
## change_counts_up change_counts_down
## sh_nil 877 330
## ch_nil 899 327
## ch_sh 0 0
At this point, we should see that there are no significant differences between the chronic and self-healing samples when we look at all samples. The following will attempt to query why this is the case and decide on what to do about it.
While sitting with Hector in 201812, we ended up focusing on the distribution of p-values observed when performing a chronic vs. self-healing comparison. This was eventually expanded to include that distribution for each of the three individual donors.
## There were 18, now there are 6 samples.
d107_pairwise <- sm(all_pairwise(d107, model_batch=FALSE))
d107_table <- sm(combine_de_tables(d107_pairwise))
summary(d107_table$data)
## Length Class Mode
## sh_vs_chr 46 data.frame list
d107_sig <- sm(extract_significant_genes(d107_table, according_to="deseq", p=0.1, p_type="raw"))
dim(d107_sig[["deseq"]][["ups"]][[1]])
## [1] 5 46
pvalues <- sm(plot_de_pvals(d107_table[["data"]][["sh_vs_chr"]], type="deseq", p_type="raw"))
pvalues
## There were 18, now there are 6 samples.
d108_pairwise <- sm(all_pairwise(d108, model_batch=FALSE))
d108_table <- sm(combine_de_tables(d108_pairwise))
summary(d108_table$data)
## Length Class Mode
## sh_vs_chr 46 data.frame list
d108_sig <- sm(extract_significant_genes(d108_table, according_to="deseq", p=0.1, p_type="raw"))
head(d108_sig$deseq$ups[[1]])
## ensembltranscriptid ensemblgeneid hgncsymbol
## ENSG00000119457 ENST00000374228 ENSG00000119457 SLC46A2
## ENSG00000162493 ENST00000294489 ENSG00000162493 PDPN
## description
## ENSG00000119457 solute carrier family 46 member 2 [Source:HGNC Symbol;Acc:HGNC:16055]
## ENSG00000162493 podoplanin [Source:HGNC Symbol;Acc:HGNC:29602]
## genebiotype deseq_logfc deseq_adjp edger_logfc
## ENSG00000119457 protein_coding 1.124 0.9999 1.110
## ENSG00000162493 protein_coding 1.054 0.9999 1.047
## edger_adjp limma_logfc limma_adjp basic_nummed
## ENSG00000119457 1 1.0310 0.9984 0.8026
## ENSG00000162493 1 0.9181 0.9984 1.5200
## basic_denmed basic_numvar basic_denvar basic_logfc basic_t
## ENSG00000119457 -0.09514 2.568e-01 6.194e-02 0.8977 3.281
## ENSG00000162493 0.80260 9.344e-01 5.950e-01 0.7173 1.414
## basic_p basic_adjp deseq_basemean deseq_lfcse deseq_stat
## ENSG00000119457 4.843e-02 1e+00 14.09 0.5510 2.039
## ENSG00000162493 2.335e-01 1e+00 22.92 0.5347 1.971
## deseq_p ebseq_fc ebseq_logfc ebseq_postfc ebseq_mean
## ENSG00000119457 0.04144 2.246 1.167 2.187 14.19
## ENSG00000162493 0.04873 2.177 1.122 2.143 23.12
## ebseq_ppee ebseq_ppde ebseq_adjp edger_logcpm edger_lr
## ENSG00000119457 1 2.896e-05 1 0.5923 6.374
## ENSG00000162493 1 4.423e-06 1 1.2200 4.785
## edger_p limma_ave limma_t limma_b limma_p limma_adjp_fdr
## ENSG00000119457 0.01158 0.2681 2.918 -4.453 0.01789 9.984e-01
## ENSG00000162493 0.02871 0.8417 1.655 -4.532 0.13370 9.984e-01
## deseq_adjp_fdr edger_adjp_fdr basic_adjp_fdr lfc_meta
## ENSG00000119457 1.000e+00 1.000e+00 1e+00 1.117
## ENSG00000162493 1.000e+00 1.000e+00 1e+00 1.017
## lfc_var lfc_varbymed p_meta p_var
## ENSG00000119457 0.000e+00 0.000e+00 2.364e-02 2.477e-04
## ENSG00000162493 3.373e-03 3.317e-03 7.038e-02 3.107e-03
pvalues <- sm(plot_de_pvals(d108_table[["data"]][["sh_vs_chr"]], type="deseq", p_type="raw"))
pvalues
## There were 18, now there are 6 samples.
d110_pairwise <- sm(all_pairwise(d110, model_batch=FALSE))
d110_table <- sm(combine_de_tables(d110_pairwise))
summary(d110_table$data)
## Length Class Mode
## sh_vs_chr 46 data.frame list
d110_sig <- sm(extract_significant_genes(d110_table, according_to="deseq", p=0.1, p_type="raw"))
head(d110_sig$deseq$ups[[1]])
## ensembltranscriptid ensemblgeneid hgncsymbol
## ENSG00000143333 ENST00000367558 ENSG00000143333 RGS16
## ENSG00000120436 ENST00000366834 ENSG00000120436 GPR31
## ENSG00000099985 ENST00000215781 ENSG00000099985 OSM
## ENSG00000137834 ENST00000288840 ENSG00000137834 SMAD6
## ENSG00000177807 ENST00000368089 ENSG00000177807 KCNJ10
## ENSG00000114115 ENST00000232219 ENSG00000114115 RBP1
## description
## ENSG00000143333 regulator of G-protein signaling 16 [Source:HGNC Symbol;Acc:HGNC:9997]
## ENSG00000120436 G protein-coupled receptor 31 [Source:HGNC Symbol;Acc:HGNC:4486]
## ENSG00000099985 oncostatin M [Source:HGNC Symbol;Acc:HGNC:8506]
## ENSG00000137834 SMAD family member 6 [Source:HGNC Symbol;Acc:HGNC:6772]
## ENSG00000177807 potassium voltage-gated channel subfamily J member 10 [Source:HGNC Symbol;Acc:HGNC:6256]
## ENSG00000114115 retinol binding protein 1 [Source:HGNC Symbol;Acc:HGNC:9919]
## genebiotype deseq_logfc deseq_adjp edger_logfc
## ENSG00000143333 protein_coding 1.510 6.358e-05 1.4950
## ENSG00000120436 protein_coding 1.478 1.000e+00 1.4550
## ENSG00000099985 protein_coding 1.226 5.618e-06 1.2100
## ENSG00000137834 protein_coding 1.155 1.000e+00 1.1340
## ENSG00000177807 protein_coding 1.035 1.593e-01 1.0190
## ENSG00000114115 protein_coding 1.012 1.000e+00 0.9872
## edger_adjp limma_logfc limma_adjp basic_nummed
## ENSG00000143333 4.147e-06 1.5450 0.2282 2.7930
## ENSG00000120436 5.293e-02 1.4270 0.2438 1.1730
## ENSG00000099985 1.314e-06 1.2660 0.2282 5.6940
## ENSG00000137834 8.498e-02 1.1040 0.2282 0.7448
## ENSG00000177807 1.213e-01 0.9465 0.3169 3.0800
## ENSG00000114115 1.662e-01 1.1950 0.2322 1.0860
## basic_denmed basic_numvar basic_denvar basic_logfc basic_t
## ENSG00000143333 1.47500 1.042e-01 1.044e-01 1.319 5.790
## ENSG00000120436 -0.77380 8.287e-01 1.687e-01 1.946 2.391
## ENSG00000099985 4.19700 6.884e-02 1.701e-01 1.497 4.498
## ENSG00000137834 -0.53530 1.577e-01 9.161e-02 1.280 3.796
## ENSG00000177807 1.91800 9.209e-01 1.252e-01 1.162 1.501
## ENSG00000114115 -0.04987 1.772e-02 8.064e-01 1.136 2.286
## basic_p basic_adjp deseq_basemean deseq_lfcse deseq_stat
## ENSG00000143333 4.422e-03 4.809e-01 68.76 0.2758 5.477
## ENSG00000120436 1.034e-01 4.811e-01 16.89 0.5436 2.719
## ENSG00000099985 1.573e-02 4.809e-01 484.20 0.2032 6.031
## ENSG00000137834 2.169e-02 4.809e-01 15.79 0.4878 2.368
## ENSG00000177807 2.461e-01 5.601e-01 73.35 0.3745 2.765
## ENSG00000114115 1.442e-01 4.928e-01 21.96 0.4624 2.188
## deseq_p ebseq_fc ebseq_logfc ebseq_postfc ebseq_mean
## ENSG00000143333 4.315e-08 2.874 1.523 2.851 69.31
## ENSG00000120436 6.544e-03 2.807 1.489 2.719 16.93
## ENSG00000099985 1.634e-09 2.396 1.261 2.394 488.89
## ENSG00000137834 1.787e-02 2.256 1.174 2.201 15.95
## ENSG00000177807 5.700e-03 2.060 1.043 2.050 73.55
## ENSG00000114115 2.866e-02 2.073 1.051 2.040 22.14
## ebseq_ppee ebseq_ppde ebseq_adjp edger_logcpm edger_lr
## ENSG00000143333 4.467e-10 1.000e+00 4.467e-10 2.2710 36.250
## ENSG00000120436 2.182e-01 7.818e-01 2.182e-01 0.3850 11.210
## ENSG00000099985 1.874e-06 1.000e+00 1.874e-06 5.0590 39.490
## ENSG00000137834 1.000e+00 1.816e-12 1.000e+00 0.2622 9.693
## ENSG00000177807 7.014e-01 2.986e-01 7.014e-01 2.3790 8.339
## ENSG00000114115 2.122e-01 7.878e-01 2.122e-01 0.7058 6.987
## edger_p limma_ave limma_t limma_b limma_p
## ENSG00000143333 1.736e-09 1.99000 6.492 -0.2393 0.0002045
## ENSG00000120436 8.155e-04 -0.09402 2.825 -3.3940 0.0227200
## ENSG00000099985 3.300e-10 4.85000 5.981 0.5318 0.0003530
## ENSG00000137834 1.850e-03 -0.02053 3.616 -2.8330 0.0070230
## ENSG00000177807 3.880e-03 2.09500 2.052 -3.9700 0.0748700
## ENSG00000114115 8.210e-03 0.38370 3.042 -3.1270 0.0163500
## limma_adjp_fdr deseq_adjp_fdr edger_adjp_fdr
## ENSG00000143333 2.282e-01 7.363e-05 4.147e-06
## ENSG00000120436 2.438e-01 1.888e-01 5.294e-02
## ENSG00000099985 2.282e-01 6.505e-06 1.314e-06
## ENSG00000137834 2.282e-01 2.751e-01 8.499e-02
## ENSG00000177807 3.169e-01 1.816e-01 1.213e-01
## ENSG00000114115 2.322e-01 3.192e-01 1.662e-01
## basic_adjp_fdr lfc_meta lfc_var lfc_varbymed p_meta
## ENSG00000143333 4.809e-01 1.5280 1.951e-03 1.277e-03 6.818e-05
## ENSG00000120436 4.811e-01 1.4780 4.320e-04 2.922e-04 1.003e-02
## ENSG00000099985 4.809e-01 1.3010 2.058e-02 1.582e-02 1.177e-04
## ENSG00000137834 4.809e-01 1.1050 4.602e-03 4.164e-03 8.914e-03
## ENSG00000177807 5.600e-01 0.9865 4.913e-03 4.980e-03 2.815e-02
## ENSG00000114115 4.928e-01 1.0720 1.590e-02 1.483e-02 1.774e-02
## p_var
## ENSG00000143333 1.394e-08
## ENSG00000120436 1.290e-04
## ENSG00000099985 4.154e-08
## ENSG00000137834 6.684e-05
## ENSG00000177807 1.638e-03
## ENSG00000114115 1.060e-04
pvalues <- sm(plot_de_pvals(d110_table[["data"]][["sh_vs_chr"]], type="deseq", p_type="raw"))
pvalues
hs_tmp <- set_expt_batches(hs_inf_filt, fact="pathogenstrain")
hs_tmp2 <- sm(normalize_expt(hs_tmp, transform="log2", convert="cpm", norm="quant", filter=TRUE))
plot_pca(hs_tmp2)$plot
The above plots suggest, along with the PCA plots of all samples, that one or more strains are either switched or at least very problematic. Specifically, samples from the strains annotated 2504 and 2272. Therefore, we next embarked on a series of comparisons of what happens when we remove each of them individually, and then both.
This block will first remove strain 2504, then 2272. The resulting subsets will be used for another round of these differential expression analyses.
In this block we will perform the various removals, creating ‘remove_2504’, ‘remove_2272’, and ‘remove_both’. In addition, there are versions of this with and without the uninfected samples. We eventually decided to use the data with the uninfected samples for our final analyses; but we will have some pca etc. plots without them first because the uninfected make it harder to see the distributions because they are so very different than the infected.
## There were 18, now there are 15 samples.
remove_2504_inf_filt <- sm(normalize_expt(remove_2504_inf, filter=TRUE))
remove_2272_inf <- subset_expt(hs_inf_filt, subset="pathogenstrain!='s2272'")
## There were 18, now there are 15 samples.
remove_2272_inf_filt <- sm(normalize_expt(remove_2272_inf, filter=TRUE))
remove_both_inf <- subset_expt(remove_2504_inf, subset="pathogenstrain!='s2272'")
## There were 15, now there are 12 samples.
remove_both_inf_filt <- sm(normalize_expt(remove_both_inf, filter=TRUE))
remove_2504_uninf <- subset_expt(hs_uninf_filt, subset="pathogenstrain!='s2504'")
## There were 21, now there are 18 samples.
remove_2504_uninf_filt <- sm(normalize_expt(remove_2504_uninf, filter=TRUE))
remove_2272_uninf <- subset_expt(hs_uninf_filt, subset="pathogenstrain!='s2272'")
## There were 21, now there are 18 samples.
remove_2272_uninf_filt <- sm(normalize_expt(remove_2272_uninf, filter=TRUE))
remove_both_uninf <- subset_expt(remove_2504_uninf, subset="pathogenstrain!='s2272'")
## There were 18, now there are 15 samples.
remove_both_uninf_filt <- sm(normalize_expt(remove_both_uninf, filter=TRUE))
remove_2504_d107 <- subset_expt(remove_2504_inf, subset="donor=='d107'")
## There were 15, now there are 5 samples.
## There were 15, now there are 5 samples.
## There were 15, now there are 5 samples.
## There were 15, now there are 5 samples.
## There were 15, now there are 5 samples.
## There were 15, now there are 5 samples.
## There were 12, now there are 4 samples.
## There were 12, now there are 4 samples.
## There were 12, now there are 4 samples.
Strain 2504 was the first thing Hector focused upon, so let us perform a few metrics without it.
remove_2504_inf_norm <- sm(normalize_expt(remove_2504_inf_filt, transform="log2",
convert="cpm", norm="quant"))
plot_pca(remove_2504_inf_norm)$plot
remove_2504_inf_norm <- sm(normalize_expt(remove_2504_inf_filt, transform="log2",
batch="svaseq",
norm="quant"))
plot_pca(remove_2504_inf_norm)$plot
remove_2504_d107_filt <- sm(normalize_expt(remove_2504_d107, filter=TRUE))
remove_2504_d107_de <- sm(all_pairwise(remove_2504_d107_filt, model_batch="svaseq"))
remove_2504_d107_table <- sm(combine_de_tables(remove_2504_d107_de))
pvalues <- plot_de_pvals(remove_2504_d107_table$data[[1]], type="deseq", p_type="raw")
pvalues
remove_2504_d108_filt <- sm(normalize_expt(remove_2504_d108, filter=TRUE))
remove_2504_d108_de <- sm(all_pairwise(remove_2504_d108_filt, model_batch="svaseq"))
remove_2504_d108_table <- sm(combine_de_tables(remove_2504_d108_de))
pvalues <- plot_de_pvals(remove_2504_d108_table$data[[1]], type="deseq", p_type="raw")
pvalues
remove_2504_d110_filt <- sm(normalize_expt(remove_2504_d110, filter=TRUE))
remove_2504_d110_de <- sm(all_pairwise(remove_2504_d110_filt, model_batch="svaseq"))
remove_2504_d110_table <- sm(combine_de_tables(remove_2504_d110_de))
pvalues <- plot_de_pvals(remove_2504_d110_table$data[[1]], type="deseq", p_type="raw")
pvalues
The above should give us a clue as to whether removing sample 2504 did anything helpful to the resulting distribution of p-values. Let us now bring the donors back together and see how that looks.
Given the clustering of the samples after removing the 2504 strain samples, let us now perform the pairwise analysis and see how it looks.
Note that these are performed with the uninfected samples while the previous metrics were without them. This will include one round with batch in the model and one round using svaseq. In addition, we are relaxing the log fold-change and p-value constraints to 0.6 and 0.1 respectively.
excel_file <- glue::glue("excel/{rundate}_hs_infect_remove2504uninf_batchmodel_contr-v{ver}.xlsx")
remove_2504_uninf_table <- sm(combine_de_tables(remove_2504_uninf_de, excel=excel_file,
keepers=keepers_uninf))
excel_file <- glue::glue("excel/{rundate}_hs_infect_remove2504uninf_batchmodel_sig-v{ver}.xlsx")
remove_2504_uninf_sig <- sm(extract_significant_genes(remove_2504_uninf_table, excel=excel_file,
according_to="deseq", p=0.1, lfc=0.6))
remove_2504_uninf_de_sva <- sm(all_pairwise(remove_2504_uninf_filt, model_batch="svaseq"))
excel_file <- glue::glue("excel/{rundate}_hs_infect_remove2504uninf_svaseq_contr-v{ver}.xlsx")
remove_2504_uninf_table_sva <- sm(combine_de_tables(remove_2504_uninf_de_sva, excel=excel_file,
keepers=keepers_uninf))
excel_file <- glue::glue("excel/{rundate}_hs_infect_remove2504uninf_svaseq_sig-v{ver}.xlsx")
remove_2504_uninf_sig_sva <- sm(extract_significant_genes(remove_2504_uninf_table_sva,
excel=excel_file,
according_to="deseq", p=0.1, lfc=0.6))
This should be an exact repetition of the 2504 removal above, so I removed the commentary.
remove_2272_inf_norm <- sm(normalize_expt(remove_2272_inf_filt, transform="log2",
convert="cpm", norm="quant"))
plot_pca(remove_2272_inf_norm)$plot
remove_2272_inf_norm <- sm(normalize_expt(remove_2272_inf_filt, transform="log2",
batch="svaseq",
norm="quant"))
plot_pca(remove_2272_inf_norm)$plot
remove_2272_d107_filt <- sm(normalize_expt(remove_2272_d107, filter=TRUE))
remove_2272_d107_de <- sm(all_pairwise(remove_2272_d107_filt, model_batch="svaseq"))
remove_2272_d107_table <- sm(combine_de_tables(remove_2272_d107_de))
pvalues <- plot_de_pvals(remove_2272_d107_table$data[[1]], type="deseq", p_type="raw")
pvalues
remove_2272_d108_filt <- sm(normalize_expt(remove_2272_d108, filter=TRUE))
remove_2272_d108_de <- sm(all_pairwise(remove_2272_d108_filt, model_batch="svaseq"))
remove_2272_d108_table <- sm(combine_de_tables(remove_2272_d108_de))
pvalues <- plot_de_pvals(remove_2272_d108_table$data[[1]], type="deseq", p_type="raw")
pvalues
remove_2272_d110_filt <- sm(normalize_expt(remove_2272_d110, filter=TRUE))
remove_2272_d110_de <- sm(all_pairwise(remove_2272_d110_filt, model_batch="svaseq"))
remove_2272_d110_table <- sm(combine_de_tables(remove_2272_d110_de))
pvalues <- plot_de_pvals(remove_2272_d110_table$data[[1]], type="deseq", p_type="raw")
pvalues
excel_file <- glue::glue("excel/{rundate}_hs_infect_remove2272uninf_batchmodel_contr-v{ver}.xlsx")
remove_2272_uninf_table <- sm(combine_de_tables(remove_2272_uninf_de, excel=excel_file,
keepers=keepers_uninf))
excel_file <- glue::glue("excel/{rundate}_hs_infect_remove2272uninf_batchmodel_sig-v{ver}.xlsx")
remove_2272_uninf_sig <- sm(extract_significant_genes(remove_2272_uninf_table, excel=excel_file,
according_to="deseq", p=0.1, lfc=0.6))
remove_2272_uninf_de_sva <- sm(all_pairwise(remove_2272_uninf_filt, model_batch="svaseq"))
excel_file <- glue::glue("excel/{rundate}_hs_infect_remove2272uninf_svaseq_contr-v{ver}.xlsx")
remove_2272_uninf_table_sva <- sm(combine_de_tables(remove_2272_uninf_de_sva, excel=excel_file,
keepers=keepers_uninf))
excel_file <- glue::glue("excel/{rundate}_hs_infect_remove2272uninf_svaseq_sig-v{ver}.xlsx")
remove_2272_uninf_sig_sva <- sm(extract_significant_genes(remove_2272_uninf_table_sva,
excel=excel_file,
according_to="deseq", p=0.1, lfc=0.6))
This should be an exact repetition of the 2504/2272 removals above, so I removed the commentary.
remove_both_inf_norm <- sm(normalize_expt(remove_both_inf_filt, transform="log2",
convert="cpm", norm="quant"))
plot_pca(remove_both_inf_norm)$plot
remove_both_inf_norm <- sm(normalize_expt(remove_both_inf_filt, transform="log2",
batch="svaseq",
norm="quant"))
plot_pca(remove_both_inf_norm)$plot
remove_both_d107_filt <- sm(normalize_expt(remove_both_d107, filter=TRUE))
remove_both_d107_de <- sm(all_pairwise(remove_both_d107_filt, model_batch="svaseq"))
remove_both_d107_table <- sm(combine_de_tables(remove_both_d107_de))
pvalues <- plot_de_pvals(remove_both_d107_table$data[[1]], type="deseq", p_type="raw")
pvalues
remove_both_d108_filt <- sm(normalize_expt(remove_both_d108, filter=TRUE))
remove_both_d108_de <- sm(all_pairwise(remove_both_d108_filt, model_batch="svaseq"))
remove_both_d108_table <- sm(combine_de_tables(remove_both_d108_de))
pvalues <- plot_de_pvals(remove_both_d108_table$data[[1]], type="deseq", p_type="raw")
pvalues
remove_both_d110_filt <- sm(normalize_expt(remove_both_d110, filter=TRUE))
remove_both_d110_de <- sm(all_pairwise(remove_both_d110_filt, model_batch="svaseq"))
remove_both_d110_table <- sm(combine_de_tables(remove_both_d110_de))
pvalues <- plot_de_pvals(remove_both_d110_table$data[[1]], type="deseq", p_type="raw")
pvalues
excel_file <- glue::glue("excel/{rundate}_hs_infect_removebothuninf_batchmodel_contr-v{ver}.xlsx")
remove_both_uninf_table <- sm(combine_de_tables(remove_both_uninf_de, excel=excel_file,
keepers=keepers_uninf))
excel_file <- glue::glue("excel/{rundate}_hs_infect_removebothuninf_batchmodel_sig-v{ver}.xlsx")
remove_both_uninf_sig <- sm(extract_significant_genes(remove_both_uninf_table, excel=excel_file,
according_to="deseq", p=0.1, lfc=0.6))
remove_both_uninf_de_sva <- sm(all_pairwise(remove_both_uninf_filt, model_batch="svaseq"))
excel_file <- glue::glue("excel/{rundate}_hs_infect_removebothuninf_svaseq_contr-v{ver}.xlsx")
remove_both_uninf_table_sva <- sm(combine_de_tables(remove_both_uninf_de_sva, excel=excel_file,
keepers=keepers_uninf))
excel_file <- glue::glue("excel/{rundate}_hs_infect_removebothuninf_svaseq_sig-v{ver}.xlsx")
remove_both_uninf_sig_sva <- sm(extract_significant_genes(remove_both_uninf_table_sva,
excel=excel_file,
according_to="deseq", p=0.1, lfc=0.6))
remove_both_uninf_sig_sva$deseq$counts
## change_counts_up change_counts_down
## sh_nil 1511 1012
## ch_nil 1568 1096
## ch_sh 43 36
excel_file <- glue::glue("excel/{rundate}_hs_infect_removebothuninf_svaseq_sig-v{ver}_normal.xlsx")
remove_both_uninf_sig_sva_normal <- sm(extract_significant_genes(remove_both_uninf_table_sva,
excel=excel_file))
remove_both_uninf_sig_sva_normal$deseq$counts
## change_counts_up change_counts_down
## sh_nil 866 296
## ch_nil 953 401
## ch_sh 4 14
Now that we have performed a set of analyses looking at the various combinations of strains and donors, let us look at how similar are the distributions of logFC and rank orders.
compare <- sm(compare_de_results(d107_table, d108_table,
cor_method="spearman", try_methods="deseq"))
compare$result$deseq
## $sh_vs_chr
## $sh_vs_chr$logfc
## [1] 0.04986
##
## $sh_vs_chr$p
## [1] -0.01345
##
## $sh_vs_chr$adjp
## [1] 0.04919
## Holy crapola!
compare <- sm(compare_de_results(d107_table, d110_table,
cor_method="spearman", try_methods="deseq"))
compare$result$deseq
## $sh_vs_chr
## $sh_vs_chr$logfc
## [1] 0.5236
##
## $sh_vs_chr$p
## [1] 0.2382
##
## $sh_vs_chr$adjp
## [1] 0.0238
compare <- sm(compare_de_results(d108_table, d110_table,
cor_method="spearman", try_methods="deseq"))
compare$result$deseq
## $sh_vs_chr
## $sh_vs_chr$logfc
## [1] -0.1194
##
## $sh_vs_chr$p
## [1] 0.003014
##
## $sh_vs_chr$adjp
## [1] 0.04387
Wow I had forgotten how ridiculously different the 3 donors are.
compare <- compare_de_results(remove_2272_uninf_table, remove_both_uninf_table,
cor_method="spearman", try_methods="deseq")
## Testing method: deseq.
## Adding method: deseq to the set.
## $sh_nil
## $sh_nil$logfc
## [1] 0.9999
##
## $sh_nil$p
## [1] 0.9953
##
## $sh_nil$adjp
## [1] 0.9953
##
##
## $ch_nil
## $ch_nil$logfc
## [1] 0.9923
##
## $ch_nil$p
## [1] 0.9739
##
## $ch_nil$adjp
## [1] 0.9739
##
##
## $ch_sh
## $ch_sh$logfc
## [1] 0.9257
##
## $ch_sh$p
## [1] 0.796
##
## $ch_sh$adjp
## [1] 0.7625
compare <- compare_de_results(remove_2504_uninf_table, remove_both_uninf_table,
cor_method="spearman", try_methods="deseq")
## Testing method: deseq.
## Adding method: deseq to the set.
## $sh_nil
## $sh_nil$logfc
## [1] 0.9917
##
## $sh_nil$p
## [1] 0.9724
##
## $sh_nil$adjp
## [1] 0.9724
##
##
## $ch_nil
## $ch_nil$logfc
## [1] 0.9999
##
## $ch_nil$p
## [1] 0.9974
##
## $ch_nil$adjp
## [1] 0.9974
##
##
## $ch_sh
## $ch_sh$logfc
## [1] 0.9508
##
## $ch_sh$p
## [1] 0.8686
##
## $ch_sh$adjp
## [1] 0.7791
compare <- compare_de_results(remove_2272_uninf_table_sva, remove_both_uninf_table_sva,
cor_method="spearman", try_methods="deseq")
## Testing method: deseq.
## Adding method: deseq to the set.
## $sh_nil
## $sh_nil$logfc
## [1] 0.997
##
## $sh_nil$p
## [1] 0.9878
##
## $sh_nil$adjp
## [1] 0.9878
##
##
## $ch_nil
## $ch_nil$logfc
## [1] 0.9889
##
## $ch_nil$p
## [1] 0.9665
##
## $ch_nil$adjp
## [1] 0.9665
##
##
## $ch_sh
## $ch_sh$logfc
## [1] 0.9125
##
## $ch_sh$p
## [1] 0.7655
##
## $ch_sh$adjp
## [1] 0.7843
compare <- compare_de_results(remove_2504_uninf_table_sva, remove_both_uninf_table_sva,
cor_method="spearman", try_methods="deseq")
## Testing method: deseq.
## Adding method: deseq to the set.
## $sh_nil
## $sh_nil$logfc
## [1] 0.9893
##
## $sh_nil$p
## [1] 0.9674
##
## $sh_nil$adjp
## [1] 0.9674
##
##
## $ch_nil
## $ch_nil$logfc
## [1] 0.9994
##
## $ch_nil$p
## [1] 0.9965
##
## $ch_nil$adjp
## [1] 0.9965
##
##
## $ch_sh
## $ch_sh$logfc
## [1] 0.9553
##
## $ch_sh$p
## [1] 0.8818
##
## $ch_sh$adjp
## [1] 0.8223
compare <- sm(compare_de_results(remove_both_uninf_table, hs_combined_batch,
cor_method="spearman", try_methods="deseq"))
compare$result$deseq
## $sh_nil
## $sh_nil$logfc
## [1] 0.9917
##
## $sh_nil$p
## [1] 0.9699
##
## $sh_nil$adjp
## [1] 0.9699
##
##
## $ch_nil
## $ch_nil$logfc
## [1] 0.9926
##
## $ch_nil$p
## [1] 0.9728
##
## $ch_nil$adjp
## [1] 0.9728
##
##
## $ch_sh
## $ch_sh$logfc
## [1] 0.8154
##
## $ch_sh$p
## [1] 0.5951
##
## $ch_sh$adjp
## [1] 0.542
compare <- sm(compare_de_results(remove_both_uninf_table_sva, hs_combined_sva,
cor_method="spearman", try_methods="deseq"))
compare$result$deseq
## $sh_nil
## $sh_nil$logfc
## [1] 0.9903
##
## $sh_nil$p
## [1] 0.9678
##
## $sh_nil$adjp
## [1] 0.9678
##
##
## $ch_nil
## $ch_nil$logfc
## [1] 0.9919
##
## $ch_nil$p
## [1] 0.9717
##
## $ch_nil$adjp
## [1] 0.9717
##
##
## $ch_sh
## $ch_sh$logfc
## [1] 0.8215
##
## $ch_sh$p
## [1] 0.5961
##
## $ch_sh$adjp
## [1] 0.5484
lp_pairwise_nobatch <- sm(all_pairwise(lp_inf_filt, model_batch=FALSE,
do_ebseq=FALSE))
excel_file <- glue::glue("excel/{rundate}_lp_infect_nobatch_contr-v{ver}.xlsx")
lp_combined_nobatch <- sm(combine_de_tables(lp_pairwise_nobatch, excel=excel_file))
excel_file <- glue::glue("excel/{rundate}_lp_infect_nobatch_sig-v{ver}.xlsx")
lp_sig_nobatch <- sm(extract_significant_genes(lp_combined_nobatch, excel=excel_file))
lp_pairwise_batch <- sm(all_pairwise(lp_inf_filt, model_batch=TRUE))
excel_file <- glue::glue("excel/{rundate}_lp_infect_batch_contr-v{ver}.xlsx")
lp_combined_batch <- sm(combine_de_tables(lp_pairwise_batch, excel=excel_file))
excel_file <- glue::glue("excel/{rundate}_lp_infect_batch_sig-v{ver}.xlsx")
lp_sig_batch <- sm(extract_significant_genes(lp_combined_batch, excel=excel_file))
lp_pairwise_ssva <- sm(all_pairwise(lp_inf_filt, model_batch="ssva"))
excel_file <- glue::glue("excel/{rundate}_lp_infect_ssva_contr-v{ver}.xlsx")
lp_combined_ssva <- sm(combine_de_tables(lp_pairwise_ssva, excel=excel_file))
excel_file <- glue::glue("excel/{rundate}_lp_infect_ssva_sig-v{ver}.xlsx")
lp_sig_ssva <- sm(extract_significant_genes(lp_combined_ssva, excel=excel_file))
lp_pairwise_fsva <- sm(all_pairwise(lp_inf_filt, model_batch="fsva"))
excel_file <- glue::glue("excel/{rundate}_lp_infect_fsva_contr-v{ver}.xlsx")
lp_combined_fsva <- sm(combine_de_tables(lp_pairwise_fsva, excel=excel_file))