This document turns to the infection of PBMC cells with L.panamensis. This data is particularly strangely affected by the different strains used to infect the cells, and as a result is both useful and troubling.
Given the observations above, we have some ideas of ways to pass the data for differential expression analyses which may or may not be ‘better’. Lets try some and see what happens.
Given the above ways to massage the data, lets use a few of them for limma/deseq/edger. The main caveat in this is that those tools really do expect specific distributions of data which we horribly violate if we use log2() data, which is why in the previous blocks I named them l2blahblah, thus we can do the same sets of normalization but without that and forcibly push the resulting data into limma/edger/deseq.
Everything I did in 02_estimation_infection.html suggests that there are no significant differences visible if one looks just at chronic/self-healing in this data. Further testing has seemingly proven this statement, as a result most of the following analyses will look at chronic/uninfected and self-healing/uninfected followed by attempts to reconcile those results.
To save some time and annoyance with sva, lets filter the data now. In addition, write down a small function used to extract the sets of significant genes across different contrasts (notably self/uninfected vs. chronic/uninfected).
excel_file <- glue::glue("excel/{rundate}_hs_infect_patbatch_contr-v{ver}.xlsx")
hs_combined_batch <- sm(combine_de_tables(
hs_pairwise_batch,
excel=excel_file,
keepers=keepers_uninf))
excel_file <- glue::glue("excel/{rundate}_hs_infect_patbatch_sig-v{ver}.xlsx")
hs_sig_batch <- sm(extract_significant_genes(
hs_combined_batch,
excel=excel_file))
hs_sig_batch[["deseq"]][["counts"]]
## change_counts_up change_counts_down
## sh_nil 883 280
## ch_nil 958 368
## ch_sh 4 12
excel_file <- glue::glue("excel/{rundate}_hs_infect_sva_contr-v{ver}.xlsx")
hs_combined_sva <- sm(combine_de_tables(
hs_pairwise_sva,
excel=excel_file,
keepers=keepers_uninf))
excel_file <- glue::glue("excel/{rundate}_hs_infect_sva_sig-v{ver}.xlsx")
hs_sig_sva <- sm(extract_significant_genes(
hs_combined_sva,
excel=excel_file))
hs_sig_sva$deseq$counts
## change_counts_up change_counts_down
## sh_nil 866 296
## ch_nil 953 401
## ch_sh 4 14
At this point, we should see that there are no significant differences between the chronic and self-healing samples when we look at all samples. The following will attempt to query why this is the case and decide on what to do about it.
While sitting with Hector in 201812, we ended up focusing on the distribution of p-values observed when performing a chronic vs. self-healing comparison. This was eventually expanded to include that distribution for each of the three individual donors.
## There were 12, now there are 4 samples.
d107_pairwise <- sm(all_pairwise(d107, model_batch=FALSE))
d107_table <- sm(combine_de_tables(d107_pairwise))
summary(d107_table$data)
## Length Class Mode
## sh_vs_chr 112 data.frame list
d107_sig <- sm(extract_significant_genes(d107_table, according_to="deseq", p=0.1, p_type="raw"))
dim(d107_sig[["deseq"]][["ups"]][[1]])
## [1] 25 112
pvalues <- sm(plot_de_pvals(d107_table[["data"]][["sh_vs_chr"]], type="deseq", p_type="raw"))
pvalues
## There were 12, now there are 4 samples.
d108_pairwise <- sm(all_pairwise(d108, model_batch=FALSE))
d108_table <- sm(combine_de_tables(d108_pairwise))
summary(d108_table$data)
## Length Class Mode
## sh_vs_chr 112 data.frame list
d108_sig <- sm(extract_significant_genes(d108_table, according_to="deseq", p=0.1, p_type="raw"))
head(d108_sig$deseq$ups[[1]])
## hgncsymbol ensembltranscriptid ensemblgeneid
## ENSG00000064300 NGFR ENST00000172229 ENSG00000064300
## ENSG00000177675 CD163L1 ENST00000313599 ENSG00000177675
## ENSG00000115414 FN1 ENST00000323926 ENSG00000115414
## ENSG00000182853 VMO1 ENST00000328739 ENSG00000182853
## ENSG00000110077 MS4A6A ENST00000412309 ENSG00000110077
## ENSG00000129538 RNASE1 ENST00000340900 ENSG00000129538
## description
## ENSG00000064300 nerve growth factor receptor [Source:HGNC Symbol;Acc:HGNC:7809]
## ENSG00000177675 CD163 molecule like 1 [Source:HGNC Symbol;Acc:HGNC:30375]
## ENSG00000115414 fibronectin 1 [Source:HGNC Symbol;Acc:HGNC:3778]
## ENSG00000182853 vitelline membrane outer layer 1 homolog [Source:HGNC Symbol;Acc:HGNC:30387]
## ENSG00000110077 membrane spanning 4-domains A6A [Source:HGNC Symbol;Acc:HGNC:13375]
## ENSG00000129538 ribonuclease A family member 1, pancreatic [Source:HGNC Symbol;Acc:HGNC:10044]
## genebiotype found adc adipocytes astrocytes bcells
## ENSG00000064300 protein_coding 0 FALSE FALSE FALSE FALSE
## ENSG00000177675 protein_coding FALSE FALSE FALSE FALSE FALSE
## ENSG00000115414 protein_coding 0 FALSE FALSE TRUE FALSE
## ENSG00000182853 protein_coding FALSE FALSE FALSE FALSE FALSE
## ENSG00000110077 protein_coding 0 FALSE FALSE FALSE FALSE
## ENSG00000129538 protein_coding FALSE FALSE FALSE FALSE FALSE
## basophils cd4.memory.tcells cd4.naive.tcells cd4.tcells
## ENSG00000064300 FALSE FALSE FALSE FALSE
## ENSG00000177675 FALSE FALSE FALSE FALSE
## ENSG00000115414 FALSE FALSE FALSE FALSE
## ENSG00000182853 FALSE FALSE FALSE FALSE
## ENSG00000110077 FALSE FALSE FALSE FALSE
## ENSG00000129538 FALSE FALSE FALSE FALSE
## cd4.tcm cd4.tem cd8.naive.tcells cd8.tcells cd8.tcm
## ENSG00000064300 FALSE FALSE FALSE FALSE FALSE
## ENSG00000177675 FALSE FALSE FALSE FALSE FALSE
## ENSG00000115414 FALSE FALSE FALSE FALSE FALSE
## ENSG00000182853 FALSE FALSE FALSE FALSE FALSE
## ENSG00000110077 FALSE FALSE FALSE FALSE FALSE
## ENSG00000129538 FALSE FALSE FALSE FALSE FALSE
## cd8.tem cdc chondrocytes classswitched.memory.bcells
## ENSG00000064300 FALSE FALSE FALSE FALSE
## ENSG00000177675 FALSE FALSE FALSE FALSE
## ENSG00000115414 FALSE FALSE FALSE FALSE
## ENSG00000182853 FALSE FALSE FALSE FALSE
## ENSG00000110077 FALSE FALSE FALSE FALSE
## ENSG00000129538 FALSE FALSE FALSE FALSE
## clp cmp dc endothelial.cells eosinophils
## ENSG00000064300 FALSE FALSE FALSE FALSE FALSE
## ENSG00000177675 FALSE FALSE FALSE FALSE FALSE
## ENSG00000115414 FALSE FALSE FALSE FALSE FALSE
## ENSG00000182853 FALSE FALSE FALSE FALSE FALSE
## ENSG00000110077 FALSE FALSE TRUE FALSE FALSE
## ENSG00000129538 FALSE FALSE FALSE FALSE FALSE
## epithelial.cells erythrocytes fibroblasts gmp
## ENSG00000064300 FALSE FALSE FALSE FALSE
## ENSG00000177675 FALSE FALSE FALSE FALSE
## ENSG00000115414 FALSE FALSE FALSE FALSE
## ENSG00000182853 FALSE FALSE FALSE FALSE
## ENSG00000110077 FALSE FALSE FALSE FALSE
## ENSG00000129538 FALSE FALSE FALSE FALSE
## hepatocytes hsc idc keratinocytes ly.endothelial.cells
## ENSG00000064300 FALSE FALSE FALSE TRUE FALSE
## ENSG00000177675 FALSE FALSE FALSE FALSE FALSE
## ENSG00000115414 FALSE FALSE FALSE FALSE FALSE
## ENSG00000182853 FALSE FALSE FALSE FALSE FALSE
## ENSG00000110077 FALSE FALSE FALSE FALSE FALSE
## ENSG00000129538 FALSE FALSE FALSE FALSE FALSE
## macrophages macrophages.m1 macrophages.m2 mast.cells
## ENSG00000064300 FALSE FALSE FALSE FALSE
## ENSG00000177675 FALSE FALSE FALSE FALSE
## ENSG00000115414 FALSE FALSE FALSE FALSE
## ENSG00000182853 FALSE FALSE FALSE FALSE
## ENSG00000110077 TRUE FALSE FALSE FALSE
## ENSG00000129538 FALSE FALSE FALSE FALSE
## megakaryocytes melanocytes memory.bcells mep
## ENSG00000064300 FALSE FALSE FALSE FALSE
## ENSG00000177675 FALSE FALSE FALSE FALSE
## ENSG00000115414 FALSE FALSE FALSE FALSE
## ENSG00000182853 FALSE FALSE FALSE FALSE
## ENSG00000110077 FALSE FALSE FALSE FALSE
## ENSG00000129538 FALSE FALSE FALSE FALSE
## mesangial.cells monocytes mpp msc mv.endothelial.cells
## ENSG00000064300 FALSE FALSE FALSE FALSE FALSE
## ENSG00000177675 FALSE FALSE FALSE FALSE FALSE
## ENSG00000115414 TRUE FALSE FALSE FALSE FALSE
## ENSG00000182853 FALSE FALSE FALSE FALSE FALSE
## ENSG00000110077 FALSE TRUE FALSE FALSE FALSE
## ENSG00000129538 FALSE FALSE FALSE FALSE FALSE
## myocytes naive.bcells neurons neutrophils nk.cells nkt
## ENSG00000064300 FALSE FALSE FALSE FALSE FALSE FALSE
## ENSG00000177675 FALSE FALSE FALSE FALSE FALSE FALSE
## ENSG00000115414 FALSE FALSE FALSE FALSE FALSE FALSE
## ENSG00000182853 FALSE FALSE FALSE FALSE FALSE FALSE
## ENSG00000110077 FALSE FALSE FALSE FALSE FALSE FALSE
## ENSG00000129538 FALSE FALSE FALSE FALSE FALSE FALSE
## osteoblast pdc pericytes plasma.cells platelets
## ENSG00000064300 FALSE FALSE FALSE FALSE FALSE
## ENSG00000177675 FALSE FALSE FALSE FALSE FALSE
## ENSG00000115414 FALSE FALSE FALSE FALSE FALSE
## ENSG00000182853 FALSE FALSE FALSE FALSE FALSE
## ENSG00000110077 FALSE FALSE FALSE FALSE FALSE
## ENSG00000129538 FALSE FALSE FALSE FALSE FALSE
## preadipocytes pro.bcells sebocytes skeletal.muscle
## ENSG00000064300 FALSE FALSE FALSE FALSE
## ENSG00000177675 FALSE FALSE FALSE FALSE
## ENSG00000115414 FALSE FALSE FALSE FALSE
## ENSG00000182853 FALSE FALSE FALSE FALSE
## ENSG00000110077 FALSE FALSE FALSE FALSE
## ENSG00000129538 FALSE FALSE FALSE FALSE
## smooth.muscle tgd.cells th1.cells th2.cells tregs
## ENSG00000064300 FALSE FALSE FALSE FALSE FALSE
## ENSG00000177675 FALSE FALSE FALSE FALSE FALSE
## ENSG00000115414 FALSE FALSE FALSE FALSE FALSE
## ENSG00000182853 FALSE FALSE FALSE FALSE FALSE
## ENSG00000110077 FALSE FALSE FALSE FALSE FALSE
## ENSG00000129538 FALSE FALSE FALSE FALSE FALSE
## xcelltypes deseq_logfc deseq_adjp edger_logfc edger_adjp
## ENSG00000064300 FALSE 8.039 1.0000000 8.578 1.240e-03
## ENSG00000177675 FALSE 2.888 0.4430000 2.836 7.689e-02
## ENSG00000115414 FALSE 2.767 0.0007941 2.763 1.530e-17
## ENSG00000182853 FALSE 2.481 0.4104000 2.437 2.151e-03
## ENSG00000110077 FALSE 2.435 0.2503000 2.406 1.224e-02
## ENSG00000129538 FALSE 2.175 0.2069000 2.157 1.677e-03
## limma_logfc limma_adjp basic_nummed basic_denmed
## ENSG00000064300 3.803 0.8713 -0.5145 -4.2780
## ENSG00000177675 2.037 0.8326 0.6936 -1.3700
## ENSG00000115414 2.732 0.1784 7.6410 4.8970
## ENSG00000182853 2.200 0.3871 1.1890 -0.9205
## ENSG00000110077 2.041 0.7394 1.4840 -0.5330
## ENSG00000129538 2.004 0.4144 1.7290 -0.2478
## basic_numvar basic_denvar basic_logfc basic_t basic_p
## ENSG00000064300 2.832e+01 4.880e-12 3.763 1.000 5.000e-01
## ENSG00000177675 4.072e+00 1.256e+00 2.063 1.264 3.624e-01
## ENSG00000115414 4.896e-01 2.920e-01 2.744 4.389 5.383e-02
## ENSG00000182853 8.482e-02 7.009e-02 2.109 7.579 1.740e-02
## ENSG00000110077 1.208e+00 1.626e-01 2.017 2.437 2.038e-01
## ENSG00000129538 2.227e-01 4.880e-12 1.976 5.922 1.065e-01
## basic_adjp deseq_basemean deseq_lfcse deseq_stat deseq_p
## ENSG00000064300 9.837e-01 23.77 4.7870 1.679 9.311e-02
## ENSG00000177675 9.837e-01 14.09 1.0450 2.765 5.691e-03
## ENSG00000115414 9.837e-01 1192.00 0.5868 4.715 2.416e-06
## ENSG00000182853 9.837e-01 12.84 0.8724 2.844 4.460e-03
## ENSG00000110077 9.837e-01 18.65 0.7974 3.053 2.263e-03
## ENSG00000129538 9.837e-01 20.27 0.6950 3.130 1.749e-03
## ebseq_fc ebseq_logfc ebseq_postfc ebseq_mean ebseq_ppee
## ENSG00000064300 4761.911 12.217 78.015 23.80 0.000000
## ENSG00000177675 7.655 2.936 6.609 14.10 0.902426
## ENSG00000115414 6.919 2.791 6.907 1197.58 0.016757
## ENSG00000182853 5.709 2.513 5.069 12.94 1.000000
## ENSG00000110077 5.511 2.462 5.079 18.71 0.739878
## ENSG00000129538 4.563 2.190 4.290 20.38 0.006923
## ebseq_ppde ebseq_adjp edger_logcpm edger_lr edger_p
## ENSG00000064300 0.000e+00 0.000000 1.3850 20.38 6.333e-06
## ENSG00000177675 9.757e-02 0.902426 0.7071 10.55 1.162e-03
## ENSG00000115414 9.832e-01 0.016757 6.9110 91.21 1.292e-21
## ENSG00000182853 1.532e-08 1.000000 0.5701 19.11 1.235e-05
## ENSG00000110077 2.601e-01 0.739878 1.0610 14.98 1.086e-04
## ENSG00000129538 9.931e-01 0.006923 1.1630 19.76 8.779e-06
## limma_ave limma_t limma_b limma_p limma_adjp_fdr
## ENSG00000064300 -2.497000 1.119 -4.593 0.3030000 8.713e-01
## ENSG00000177675 -0.445500 1.652 -4.324 0.1462000 8.329e-01
## ENSG00000115414 6.193000 7.205 1.060 0.0002613 1.784e-01
## ENSG00000182853 -0.002466 5.044 -2.515 0.0019070 3.870e-01
## ENSG00000110077 0.378200 2.838 -3.417 0.0274500 7.393e-01
## ENSG00000129538 0.650500 4.729 -2.278 0.0026710 4.144e-01
## deseq_adjp_fdr edger_adjp_fdr basic_adjp_fdr lfc_meta
## ENSG00000064300 1.000e+00 1.240e-03 9.837e-01 8.308
## ENSG00000177675 4.434e-01 7.689e-02 9.837e-01 2.613
## ENSG00000115414 7.949e-04 1.530e-17 9.837e-01 3.215
## ENSG00000182853 4.108e-01 2.151e-03 9.837e-01 2.459
## ENSG00000110077 2.505e-01 1.225e-02 9.837e-01 2.332
## ENSG00000129538 2.072e-01 1.677e-03 9.837e-01 2.158
## lfc_var lfc_varbymed p_meta p_var
## ENSG00000064300 0.000e+00 0.000e+00 1.320e-01 2.409e-02
## ENSG00000177675 1.855e-01 7.098e-02 5.102e-02 6.800e-03
## ENSG00000115414 6.066e-01 1.887e-01 8.791e-05 2.255e-08
## ENSG00000182853 0.000e+00 0.000e+00 2.126e-03 4.982e-06
## ENSG00000110077 2.359e-02 1.011e-02 9.941e-03 2.311e-04
## ENSG00000129538 9.321e-03 4.319e-03 1.476e-03 1.828e-06
pvalues <- sm(plot_de_pvals(d108_table[["data"]][["sh_vs_chr"]], type="deseq", p_type="raw"))
pvalues
## There were 12, now there are 4 samples.
d110_pairwise <- sm(all_pairwise(d110, model_batch=FALSE))
d110_table <- sm(combine_de_tables(d110_pairwise))
summary(d110_table$data)
## Length Class Mode
## sh_vs_chr 112 data.frame list
d110_sig <- sm(extract_significant_genes(d110_table, according_to="deseq", p=0.1, p_type="raw"))
head(d110_sig$deseq$ups[[1]])
## hgncsymbol ensembltranscriptid ensemblgeneid
## ENSG00000064300 NGFR ENST00000172229 ENSG00000064300
## ENSG00000077063 CTTNBP2 ENST00000160373 ENSG00000077063
## ENSG00000124491 F13A1 ENST00000264870 ENSG00000124491
## ENSG00000130558 OLFM1 ENST00000252854 ENSG00000130558
## ENSG00000157168 NRG1 ENST00000287842 ENSG00000157168
## ENSG00000121807 CCR2 ENST00000292301 ENSG00000121807
## description
## ENSG00000064300 nerve growth factor receptor [Source:HGNC Symbol;Acc:HGNC:7809]
## ENSG00000077063 cortactin binding protein 2 [Source:HGNC Symbol;Acc:HGNC:15679]
## ENSG00000124491 coagulation factor XIII A chain [Source:HGNC Symbol;Acc:HGNC:3531]
## ENSG00000130558 olfactomedin 1 [Source:HGNC Symbol;Acc:HGNC:17187]
## ENSG00000157168 neuregulin 1 [Source:HGNC Symbol;Acc:HGNC:7997]
## ENSG00000121807 C-C motif chemokine receptor 2 [Source:HGNC Symbol;Acc:HGNC:1603]
## genebiotype found adc adipocytes astrocytes bcells
## ENSG00000064300 protein_coding 0 FALSE FALSE FALSE FALSE
## ENSG00000077063 protein_coding FALSE FALSE FALSE FALSE FALSE
## ENSG00000124491 protein_coding 0 FALSE FALSE FALSE FALSE
## ENSG00000130558 protein_coding FALSE FALSE FALSE FALSE FALSE
## ENSG00000157168 protein_coding FALSE FALSE FALSE FALSE FALSE
## ENSG00000121807 protein_coding 0 FALSE FALSE FALSE FALSE
## basophils cd4.memory.tcells cd4.naive.tcells cd4.tcells
## ENSG00000064300 FALSE FALSE FALSE FALSE
## ENSG00000077063 FALSE FALSE FALSE FALSE
## ENSG00000124491 FALSE FALSE FALSE FALSE
## ENSG00000130558 FALSE FALSE FALSE FALSE
## ENSG00000157168 FALSE FALSE FALSE FALSE
## ENSG00000121807 FALSE FALSE FALSE TRUE
## cd4.tcm cd4.tem cd8.naive.tcells cd8.tcells cd8.tcm
## ENSG00000064300 FALSE FALSE FALSE FALSE FALSE
## ENSG00000077063 FALSE FALSE FALSE FALSE FALSE
## ENSG00000124491 FALSE FALSE FALSE FALSE FALSE
## ENSG00000130558 FALSE FALSE FALSE FALSE FALSE
## ENSG00000157168 FALSE FALSE FALSE FALSE FALSE
## ENSG00000121807 FALSE TRUE FALSE FALSE FALSE
## cd8.tem cdc chondrocytes classswitched.memory.bcells
## ENSG00000064300 FALSE FALSE FALSE FALSE
## ENSG00000077063 FALSE FALSE FALSE FALSE
## ENSG00000124491 FALSE FALSE FALSE FALSE
## ENSG00000130558 FALSE FALSE FALSE FALSE
## ENSG00000157168 FALSE FALSE FALSE FALSE
## ENSG00000121807 FALSE FALSE FALSE FALSE
## clp cmp dc endothelial.cells eosinophils
## ENSG00000064300 FALSE FALSE FALSE FALSE FALSE
## ENSG00000077063 FALSE FALSE FALSE FALSE FALSE
## ENSG00000124491 FALSE FALSE TRUE FALSE FALSE
## ENSG00000130558 FALSE FALSE FALSE FALSE FALSE
## ENSG00000157168 FALSE FALSE FALSE FALSE FALSE
## ENSG00000121807 FALSE FALSE FALSE FALSE FALSE
## epithelial.cells erythrocytes fibroblasts gmp
## ENSG00000064300 FALSE FALSE FALSE FALSE
## ENSG00000077063 FALSE FALSE FALSE FALSE
## ENSG00000124491 FALSE FALSE FALSE FALSE
## ENSG00000130558 FALSE FALSE FALSE FALSE
## ENSG00000157168 FALSE FALSE FALSE FALSE
## ENSG00000121807 FALSE FALSE FALSE FALSE
## hepatocytes hsc idc keratinocytes ly.endothelial.cells
## ENSG00000064300 FALSE FALSE FALSE TRUE FALSE
## ENSG00000077063 FALSE FALSE FALSE FALSE FALSE
## ENSG00000124491 FALSE FALSE TRUE FALSE FALSE
## ENSG00000130558 FALSE FALSE FALSE FALSE FALSE
## ENSG00000157168 FALSE FALSE FALSE FALSE FALSE
## ENSG00000121807 FALSE FALSE FALSE FALSE FALSE
## macrophages macrophages.m1 macrophages.m2 mast.cells
## ENSG00000064300 FALSE FALSE FALSE FALSE
## ENSG00000077063 FALSE FALSE FALSE FALSE
## ENSG00000124491 FALSE FALSE FALSE FALSE
## ENSG00000130558 FALSE FALSE FALSE FALSE
## ENSG00000157168 FALSE FALSE FALSE FALSE
## ENSG00000121807 FALSE FALSE FALSE FALSE
## megakaryocytes melanocytes memory.bcells mep
## ENSG00000064300 FALSE FALSE FALSE FALSE
## ENSG00000077063 FALSE FALSE FALSE FALSE
## ENSG00000124491 FALSE FALSE FALSE FALSE
## ENSG00000130558 FALSE FALSE FALSE FALSE
## ENSG00000157168 FALSE FALSE FALSE FALSE
## ENSG00000121807 FALSE FALSE FALSE FALSE
## mesangial.cells monocytes mpp msc mv.endothelial.cells
## ENSG00000064300 FALSE FALSE FALSE FALSE FALSE
## ENSG00000077063 FALSE FALSE FALSE FALSE FALSE
## ENSG00000124491 FALSE TRUE FALSE FALSE FALSE
## ENSG00000130558 FALSE FALSE FALSE FALSE FALSE
## ENSG00000157168 FALSE FALSE FALSE FALSE FALSE
## ENSG00000121807 FALSE TRUE FALSE FALSE FALSE
## myocytes naive.bcells neurons neutrophils nk.cells nkt
## ENSG00000064300 FALSE FALSE FALSE FALSE FALSE FALSE
## ENSG00000077063 FALSE FALSE FALSE FALSE FALSE FALSE
## ENSG00000124491 FALSE FALSE FALSE FALSE FALSE FALSE
## ENSG00000130558 FALSE FALSE FALSE FALSE FALSE FALSE
## ENSG00000157168 FALSE FALSE FALSE FALSE FALSE FALSE
## ENSG00000121807 FALSE FALSE FALSE FALSE FALSE FALSE
## osteoblast pdc pericytes plasma.cells platelets
## ENSG00000064300 FALSE FALSE FALSE FALSE FALSE
## ENSG00000077063 FALSE FALSE FALSE FALSE FALSE
## ENSG00000124491 FALSE FALSE FALSE FALSE TRUE
## ENSG00000130558 FALSE FALSE FALSE FALSE FALSE
## ENSG00000157168 FALSE FALSE FALSE FALSE FALSE
## ENSG00000121807 FALSE TRUE FALSE FALSE FALSE
## preadipocytes pro.bcells sebocytes skeletal.muscle
## ENSG00000064300 FALSE FALSE FALSE FALSE
## ENSG00000077063 FALSE FALSE FALSE FALSE
## ENSG00000124491 FALSE FALSE FALSE FALSE
## ENSG00000130558 FALSE FALSE FALSE FALSE
## ENSG00000157168 FALSE FALSE FALSE FALSE
## ENSG00000121807 FALSE FALSE FALSE FALSE
## smooth.muscle tgd.cells th1.cells th2.cells tregs
## ENSG00000064300 FALSE FALSE FALSE FALSE FALSE
## ENSG00000077063 FALSE FALSE FALSE FALSE FALSE
## ENSG00000124491 FALSE FALSE FALSE FALSE FALSE
## ENSG00000130558 FALSE FALSE FALSE FALSE FALSE
## ENSG00000157168 FALSE FALSE FALSE FALSE FALSE
## ENSG00000121807 FALSE TRUE FALSE FALSE FALSE
## xcelltypes deseq_logfc deseq_adjp edger_logfc edger_adjp
## ENSG00000064300 FALSE 5.325 1.000e+00 5.238 2.057e-03
## ENSG00000077063 FALSE 2.757 1.000e+00 2.706 1.547e-07
## ENSG00000124491 FALSE 2.293 1.000e+00 2.270 2.290e-03
## ENSG00000130558 FALSE 2.253 1.000e+00 2.238 2.464e-14
## ENSG00000157168 FALSE 2.076 1.000e+00 2.060 3.860e-05
## ENSG00000121807 FALSE 1.760 5.746e-16 1.745 3.410e-27
## limma_logfc limma_adjp basic_nummed basic_denmed
## ENSG00000064300 2.646 0.525200 0.4877 -2.3320
## ENSG00000077063 2.744 0.029070 1.5470 -1.1610
## ENSG00000124491 1.743 0.222700 2.4700 0.6189
## ENSG00000130558 2.293 0.015510 3.0130 0.7146
## ENSG00000157168 2.126 0.037900 1.2910 -0.7716
## ENSG00000121807 1.796 0.003128 5.0380 3.2410
## basic_numvar basic_denvar basic_logfc basic_t basic_p
## ENSG00000064300 2.476e+01 9.780e-01 2.819 0.7859 5.680e-01
## ENSG00000077063 1.672e-01 1.626e-01 2.708 6.6680 2.177e-02
## ENSG00000124491 3.142e+00 5.565e-02 1.852 1.4640 3.755e-01
## ENSG00000130558 2.851e-03 1.379e-01 2.298 8.6650 6.745e-02
## ENSG00000157168 8.629e-05 4.436e-01 2.062 4.3790 1.429e-01
## ENSG00000121807 8.491e-03 1.296e-03 1.798 25.7000 1.023e-02
## basic_adjp deseq_basemean deseq_lfcse deseq_stat deseq_p
## ENSG00000064300 7.385e-01 55.86 2.5290 2.106 3.521e-02
## ENSG00000077063 5.622e-01 23.64 0.6570 4.197 2.706e-05
## ENSG00000124491 6.216e-01 63.37 1.1120 2.062 3.919e-02
## ENSG00000130558 5.622e-01 66.83 0.3795 5.937 2.894e-09
## ENSG00000157168 5.622e-01 21.13 0.6503 3.193 1.407e-03
## ENSG00000121807 5.622e-01 295.30 0.1972 8.923 4.544e-19
## ebseq_fc ebseq_logfc ebseq_postfc ebseq_mean ebseq_ppee
## ENSG00000064300 43.692 5.449 35.040 59.07 7.570e-03
## ENSG00000077063 7.146 2.837 6.522 24.20 4.028e-07
## ENSG00000124491 5.230 2.387 5.100 65.75 7.709e-01
## ENSG00000130558 4.905 2.294 4.795 67.98 7.438e-15
## ENSG00000157168 4.290 2.101 4.039 21.48 1.000e+00
## ENSG00000121807 3.523 1.817 3.510 299.77 0.000e+00
## ebseq_ppde ebseq_adjp edger_logcpm edger_lr edger_p
## ENSG00000064300 9.924e-01 7.570e-03 1.8820 15.57 7.936e-05
## ENSG00000077063 1.000e+00 4.028e-07 0.7988 38.20 6.400e-10
## ENSG00000124491 2.291e-01 7.709e-01 2.1230 15.29 9.223e-05
## ENSG00000130558 1.000e+00 7.438e-15 2.2510 71.88 2.289e-17
## ENSG00000157168 6.371e-07 1.000e+00 0.6714 25.24 5.052e-07
## ENSG00000121807 1.000e+00 0.000e+00 4.3840 133.10 8.637e-31
## limma_ave limma_t limma_b limma_p limma_adjp_fdr
## ENSG00000064300 -1.0500 1.012 -4.8300 3.425e-01 5.252e-01
## ENSG00000077063 0.1028 6.676 -0.5113 1.973e-04 2.907e-02
## ENSG00000124491 1.4850 2.032 -4.0750 7.850e-02 2.227e-01
## ENSG00000130558 1.8050 9.349 1.7790 1.965e-05 1.552e-02
## ENSG00000157168 0.1839 5.688 -0.8544 5.546e-04 3.790e-02
## ENSG00000121807 4.0770 14.020 5.1590 1.056e-06 3.127e-03
## deseq_adjp_fdr edger_adjp_fdr basic_adjp_fdr lfc_meta
## ENSG00000064300 2.286e-01 2.057e-03 7.384e-01 4.432
## ENSG00000077063 1.667e-03 1.547e-07 5.620e-01 3.582
## ENSG00000124491 2.418e-01 2.290e-03 6.216e-01 2.105
## ENSG00000130558 1.008e-06 2.465e-14 5.620e-01 2.258
## ENSG00000157168 3.168e-02 3.860e-05 5.620e-01 2.127
## ENSG00000121807 7.688e-16 3.410e-27 5.620e-01 1.858
## lfc_var lfc_varbymed p_meta p_var
## ENSG00000064300 2.168e+00 4.891e-01 1.259e-01 3.549e-02
## ENSG00000077063 2.168e+00 6.052e-01 7.479e-05 1.144e-08
## ENSG00000124491 9.328e-02 4.431e-02 3.926e-02 1.537e-03
## ENSG00000130558 4.320e-04 1.914e-04 6.551e-06 1.287e-10
## ENSG00000157168 1.050e-02 4.937e-03 6.540e-04 5.020e-07
## ENSG00000121807 3.318e-02 1.786e-02 3.520e-07 3.717e-13
pvalues <- sm(plot_de_pvals(d110_table[["data"]][["sh_vs_chr"]], type="deseq", p_type="raw"))
pvalues
hs_tmp <- set_expt_batches(hs_inf_filt, fact="pathogenstrain")
hs_tmp2 <- sm(normalize_expt(hs_tmp, transform="log2", convert="cpm", norm="quant", filter=TRUE))
plot_pca(hs_tmp2)$plot
The above plots suggest, along with the PCA plots of all samples, that one or more strains are either switched or at least very problematic. Specifically, samples from the strains annotated 2504 and 2272. Therefore, we next embarked on a series of comparisons of what happens when we remove each of them individually, and then both.
This block will first remove strain 2504, then 2272. The resulting subsets will be used for another round of these differential expression analyses.
In this block we will perform the various removals, creating ‘remove_2504’, ‘remove_2272’, and ‘remove_both’. In addition, there are versions of this with and without the uninfected samples. We eventually decided to use the data with the uninfected samples for our final analyses; but we will have some pca etc. plots without them first because the uninfected make it harder to see the distributions because they are so very different than the infected.
## There were 12, now there are 12 samples.
remove_2504_inf_filt <- sm(normalize_expt(remove_2504_inf, filter=TRUE))
remove_2272_inf <- subset_expt(hs_inf_filt, subset="pathogenstrain!='s2272'")
## There were 12, now there are 12 samples.
remove_2272_inf_filt <- sm(normalize_expt(remove_2272_inf, filter=TRUE))
remove_both_inf <- subset_expt(remove_2504_inf, subset="pathogenstrain!='s2272'")
## There were 12, now there are 12 samples.
remove_both_inf_filt <- sm(normalize_expt(remove_both_inf, filter=TRUE))
remove_2504_uninf <- subset_expt(hs_uninf_filt, subset="pathogenstrain!='s2504'")
## There were 15, now there are 15 samples.
remove_2504_uninf_filt <- sm(normalize_expt(remove_2504_uninf, filter=TRUE))
remove_2272_uninf <- subset_expt(hs_uninf_filt, subset="pathogenstrain!='s2272'")
## There were 15, now there are 15 samples.
remove_2272_uninf_filt <- sm(normalize_expt(remove_2272_uninf, filter=TRUE))
remove_both_uninf <- subset_expt(remove_2504_uninf, subset="pathogenstrain!='s2272'")
## There were 15, now there are 15 samples.
remove_both_uninf_filt <- sm(normalize_expt(remove_both_uninf, filter=TRUE))
remove_2504_d107 <- subset_expt(remove_2504_inf, subset="donor=='d107'")
## There were 12, now there are 4 samples.
## There were 12, now there are 4 samples.
## There were 12, now there are 4 samples.
## There were 12, now there are 4 samples.
## There were 12, now there are 4 samples.
## There were 12, now there are 4 samples.
## There were 12, now there are 4 samples.
## There were 12, now there are 4 samples.
## There were 12, now there are 4 samples.
Strain 2504 was the first thing Hector focused upon, so let us perform a few metrics without it.
remove_2504_inf_norm <- sm(normalize_expt(remove_2504_inf_filt, transform="log2",
convert="cpm", norm="quant"))
plot_pca(remove_2504_inf_norm)$plot
remove_2504_inf_norm <- sm(normalize_expt(remove_2504_inf_filt, transform="log2",
batch="svaseq",
norm="quant"))
plot_pca(remove_2504_inf_norm)$plot
remove_2504_d107_filt <- sm(normalize_expt(remove_2504_d107, filter=TRUE))
remove_2504_d107_de <- sm(all_pairwise(remove_2504_d107_filt, model_batch="svaseq"))
remove_2504_d107_table <- sm(combine_de_tables(remove_2504_d107_de))
pvalues <- plot_de_pvals(remove_2504_d107_table$data[[1]], type="deseq", p_type="raw")
pvalues
remove_2504_d108_filt <- sm(normalize_expt(remove_2504_d108, filter=TRUE))
remove_2504_d108_de <- sm(all_pairwise(remove_2504_d108_filt, model_batch="svaseq"))
remove_2504_d108_table <- sm(combine_de_tables(remove_2504_d108_de))
pvalues <- plot_de_pvals(remove_2504_d108_table$data[[1]], type="deseq", p_type="raw")
pvalues
remove_2504_d110_filt <- sm(normalize_expt(remove_2504_d110, filter=TRUE))
remove_2504_d110_de <- sm(all_pairwise(remove_2504_d110_filt, model_batch="svaseq"))
remove_2504_d110_table <- sm(combine_de_tables(remove_2504_d110_de))
pvalues <- plot_de_pvals(remove_2504_d110_table$data[[1]], type="deseq", p_type="raw")
pvalues
The above should give us a clue as to whether removing sample 2504 did anything helpful to the resulting distribution of p-values. Let us now bring the donors back together and see how that looks.
Given the clustering of the samples after removing the 2504 strain samples, let us now perform the pairwise analysis and see how it looks.
Note that these are performed with the uninfected samples while the previous metrics were without them. This will include one round with batch in the model and one round using svaseq. In addition, we are relaxing the log fold-change and p-value constraints to 0.6 and 0.1 respectively.
excel_file <- glue::glue("excel/{rundate}_hs_infect_remove2504uninf_batchmodel_contr-v{ver}.xlsx")
remove_2504_uninf_table <- sm(combine_de_tables(remove_2504_uninf_de, excel=excel_file,
keepers=keepers_uninf))
excel_file <- glue::glue("excel/{rundate}_hs_infect_remove2504uninf_batchmodel_sig-v{ver}.xlsx")
remove_2504_uninf_sig <- sm(extract_significant_genes(remove_2504_uninf_table, excel=excel_file,
according_to="deseq", p=0.1, lfc=0.6))
remove_2504_uninf_de_sva <- sm(all_pairwise(remove_2504_uninf_filt, model_batch="svaseq"))
excel_file <- glue::glue("excel/{rundate}_hs_infect_remove2504uninf_svaseq_contr-v{ver}.xlsx")
remove_2504_uninf_table_sva <- sm(combine_de_tables(remove_2504_uninf_de_sva, excel=excel_file,
keepers=keepers_uninf))
excel_file <- glue::glue("excel/{rundate}_hs_infect_remove2504uninf_svaseq_sig-v{ver}.xlsx")
remove_2504_uninf_sig_sva <- sm(extract_significant_genes(remove_2504_uninf_table_sva,
excel=excel_file,
according_to="deseq", p=0.1, lfc=0.6))
This should be an exact repetition of the 2504 removal above, so I removed the commentary.
remove_2272_inf_norm <- sm(normalize_expt(remove_2272_inf_filt, transform="log2",
convert="cpm", norm="quant"))
plot_pca(remove_2272_inf_norm)$plot
remove_2272_inf_norm <- sm(normalize_expt(remove_2272_inf_filt, transform="log2",
batch="svaseq",
norm="quant"))
plot_pca(remove_2272_inf_norm)$plot
remove_2272_d107_filt <- sm(normalize_expt(remove_2272_d107, filter=TRUE))
remove_2272_d107_de <- sm(all_pairwise(remove_2272_d107_filt, model_batch="svaseq"))
remove_2272_d107_table <- sm(combine_de_tables(remove_2272_d107_de))
pvalues <- plot_de_pvals(remove_2272_d107_table$data[[1]], type="deseq", p_type="raw")
pvalues
remove_2272_d108_filt <- sm(normalize_expt(remove_2272_d108, filter=TRUE))
remove_2272_d108_de <- sm(all_pairwise(remove_2272_d108_filt, model_batch="svaseq"))
remove_2272_d108_table <- sm(combine_de_tables(remove_2272_d108_de))
pvalues <- plot_de_pvals(remove_2272_d108_table$data[[1]], type="deseq", p_type="raw")
pvalues
remove_2272_d110_filt <- sm(normalize_expt(remove_2272_d110, filter=TRUE))
remove_2272_d110_de <- sm(all_pairwise(remove_2272_d110_filt, model_batch="svaseq"))
remove_2272_d110_table <- sm(combine_de_tables(remove_2272_d110_de))
pvalues <- plot_de_pvals(remove_2272_d110_table$data[[1]], type="deseq", p_type="raw")
pvalues
excel_file <- glue::glue("excel/{rundate}_hs_infect_remove2272uninf_batchmodel_contr-v{ver}.xlsx")
remove_2272_uninf_table <- sm(combine_de_tables(remove_2272_uninf_de, excel=excel_file,
keepers=keepers_uninf))
excel_file <- glue::glue("excel/{rundate}_hs_infect_remove2272uninf_batchmodel_sig-v{ver}.xlsx")
remove_2272_uninf_sig <- sm(extract_significant_genes(remove_2272_uninf_table, excel=excel_file,
according_to="deseq", p=0.1, lfc=0.6))
remove_2272_uninf_de_sva <- sm(all_pairwise(remove_2272_uninf_filt, model_batch="svaseq"))
excel_file <- glue::glue("excel/{rundate}_hs_infect_remove2272uninf_svaseq_contr-v{ver}.xlsx")
remove_2272_uninf_table_sva <- sm(combine_de_tables(remove_2272_uninf_de_sva, excel=excel_file,
keepers=keepers_uninf))
excel_file <- glue::glue("excel/{rundate}_hs_infect_remove2272uninf_svaseq_sig-v{ver}.xlsx")
remove_2272_uninf_sig_sva <- sm(extract_significant_genes(remove_2272_uninf_table_sva,
excel=excel_file,
according_to="deseq", p=0.1, lfc=0.6))
This should be an exact repetition of the 2504/2272 removals above, so I removed the commentary.
remove_both_inf_norm <- sm(normalize_expt(remove_both_inf_filt, transform="log2",
convert="cpm", norm="quant"))
plot_pca(remove_both_inf_norm)$plot
remove_both_inf_norm <- sm(normalize_expt(remove_both_inf_filt, transform="log2",
batch="svaseq",
norm="quant"))
plot_pca(remove_both_inf_norm)$plot
remove_both_d107_filt <- sm(normalize_expt(remove_both_d107, filter=TRUE))
remove_both_d107_de <- sm(all_pairwise(remove_both_d107_filt, model_batch="svaseq"))
remove_both_d107_table <- sm(combine_de_tables(remove_both_d107_de))
pvalues <- plot_de_pvals(remove_both_d107_table$data[[1]], type="deseq", p_type="raw")
pvalues
remove_both_d108_filt <- sm(normalize_expt(remove_both_d108, filter=TRUE))
remove_both_d108_de <- sm(all_pairwise(remove_both_d108_filt, model_batch="svaseq"))
remove_both_d108_table <- sm(combine_de_tables(remove_both_d108_de))
pvalues <- plot_de_pvals(remove_both_d108_table$data[[1]], type="deseq", p_type="raw")
pvalues
remove_both_d110_filt <- sm(normalize_expt(remove_both_d110, filter=TRUE))
remove_both_d110_de <- sm(all_pairwise(remove_both_d110_filt, model_batch="svaseq"))
remove_both_d110_table <- sm(combine_de_tables(remove_both_d110_de))
pvalues <- plot_de_pvals(remove_both_d110_table$data[[1]], type="deseq", p_type="raw")
pvalues
excel_file <- glue::glue("excel/{rundate}_hs_infect_removebothuninf_batchmodel_contr-v{ver}.xlsx")
remove_both_uninf_table <- sm(combine_de_tables(remove_both_uninf_de, excel=excel_file,
keepers=keepers_uninf))
excel_file <- glue::glue("excel/{rundate}_hs_infect_removebothuninf_batchmodel_sig-v{ver}.xlsx")
remove_both_uninf_sig <- sm(extract_significant_genes(remove_both_uninf_table, excel=excel_file,
according_to="deseq", p=0.1, lfc=0.6))
remove_both_uninf_de_sva <- sm(all_pairwise(remove_both_uninf_filt, model_batch="svaseq"))
excel_file <- glue::glue("excel/{rundate}_hs_infect_removebothuninf_svaseq_contr-v{ver}.xlsx")
remove_both_uninf_table_sva <- sm(combine_de_tables(remove_both_uninf_de_sva, excel=excel_file,
keepers=keepers_uninf))
excel_file <- glue::glue("excel/{rundate}_hs_infect_removebothuninf_svaseq_sig-v{ver}.xlsx")
remove_both_uninf_sig_sva <- sm(extract_significant_genes(remove_both_uninf_table_sva,
excel=excel_file,
according_to="deseq", p=0.1, lfc=0.6))
remove_both_uninf_sig_sva$deseq$counts
## change_counts_up change_counts_down
## sh_nil 1511 1012
## ch_nil 1568 1096
## ch_sh 43 36
excel_file <- glue::glue("excel/{rundate}_hs_infect_removebothuninf_svaseq_sig-v{ver}_normal.xlsx")
remove_both_uninf_sig_sva_normal <- sm(extract_significant_genes(remove_both_uninf_table_sva,
excel=excel_file))
remove_both_uninf_sig_sva_normal$deseq$counts
## change_counts_up change_counts_down
## sh_nil 866 296
## ch_nil 953 401
## ch_sh 4 14
Now that we have performed a set of analyses looking at the various combinations of strains and donors, let us look at how similar are the distributions of logFC and rank orders.
compare <- sm(compare_de_results(d107_table, d108_table,
cor_method="spearman", try_methods="deseq"))
compare$result$deseq
## $sh_vs_chr
## $sh_vs_chr$logfc
## [1] 0.1951
##
## $sh_vs_chr$p
## [1] 0.08561
##
## $sh_vs_chr$adjp
## [1] 0.1286
## Holy crapola!
compare <- sm(compare_de_results(d107_table, d110_table,
cor_method="spearman", try_methods="deseq"))
compare$result$deseq
## $sh_vs_chr
## $sh_vs_chr$logfc
## [1] 0.5934
##
## $sh_vs_chr$p
## [1] 0.3445
##
## $sh_vs_chr$adjp
## [1] 0.187
compare <- sm(compare_de_results(d108_table, d110_table,
cor_method="spearman", try_methods="deseq"))
compare$result$deseq
## $sh_vs_chr
## $sh_vs_chr$logfc
## [1] 0.06535
##
## $sh_vs_chr$p
## [1] 0.1053
##
## $sh_vs_chr$adjp
## [1] 0.09951
Wow I had forgotten how ridiculously different the 3 donors are.
compare <- compare_de_results(remove_2272_uninf_table, remove_both_uninf_table,
cor_method="spearman", try_methods="deseq")
## Testing method: deseq.
## Adding method: deseq to the set.
## $sh_nil
## $sh_nil$logfc
## [1] 1
##
## $sh_nil$p
## [1] 1
##
## $sh_nil$adjp
## [1] 1
##
##
## $ch_nil
## $ch_nil$logfc
## [1] 1
##
## $ch_nil$p
## [1] 1
##
## $ch_nil$adjp
## [1] 1
##
##
## $ch_sh
## $ch_sh$logfc
## [1] 1
##
## $ch_sh$p
## [1] 1
##
## $ch_sh$adjp
## [1] 1
compare <- compare_de_results(remove_2504_uninf_table, remove_both_uninf_table,
cor_method="spearman", try_methods="deseq")
## Testing method: deseq.
## Adding method: deseq to the set.
## $sh_nil
## $sh_nil$logfc
## [1] 1
##
## $sh_nil$p
## [1] 1
##
## $sh_nil$adjp
## [1] 1
##
##
## $ch_nil
## $ch_nil$logfc
## [1] 1
##
## $ch_nil$p
## [1] 1
##
## $ch_nil$adjp
## [1] 1
##
##
## $ch_sh
## $ch_sh$logfc
## [1] 1
##
## $ch_sh$p
## [1] 1
##
## $ch_sh$adjp
## [1] 1
compare <- compare_de_results(remove_2272_uninf_table_sva, remove_both_uninf_table_sva,
cor_method="spearman", try_methods="deseq")
## Testing method: deseq.
## Adding method: deseq to the set.
## $sh_nil
## $sh_nil$logfc
## [1] 1
##
## $sh_nil$p
## [1] 1
##
## $sh_nil$adjp
## [1] 1
##
##
## $ch_nil
## $ch_nil$logfc
## [1] 1
##
## $ch_nil$p
## [1] 1
##
## $ch_nil$adjp
## [1] 1
##
##
## $ch_sh
## $ch_sh$logfc
## [1] 1
##
## $ch_sh$p
## [1] 1
##
## $ch_sh$adjp
## [1] 1
compare <- compare_de_results(remove_2504_uninf_table_sva, remove_both_uninf_table_sva,
cor_method="spearman", try_methods="deseq")
## Testing method: deseq.
## Adding method: deseq to the set.
## $sh_nil
## $sh_nil$logfc
## [1] 1
##
## $sh_nil$p
## [1] 1
##
## $sh_nil$adjp
## [1] 1
##
##
## $ch_nil
## $ch_nil$logfc
## [1] 1
##
## $ch_nil$p
## [1] 1
##
## $ch_nil$adjp
## [1] 1
##
##
## $ch_sh
## $ch_sh$logfc
## [1] 1
##
## $ch_sh$p
## [1] 1
##
## $ch_sh$adjp
## [1] 1
compare <- sm(compare_de_results(remove_both_uninf_table, hs_combined_batch,
cor_method="spearman", try_methods="deseq"))
compare$result$deseq
## $sh_nil
## $sh_nil$logfc
## [1] 1
##
## $sh_nil$p
## [1] 1
##
## $sh_nil$adjp
## [1] 1
##
##
## $ch_nil
## $ch_nil$logfc
## [1] 1
##
## $ch_nil$p
## [1] 1
##
## $ch_nil$adjp
## [1] 1
##
##
## $ch_sh
## $ch_sh$logfc
## [1] 1
##
## $ch_sh$p
## [1] 1
##
## $ch_sh$adjp
## [1] 1
compare <- sm(compare_de_results(remove_both_uninf_table_sva, hs_combined_sva,
cor_method="spearman", try_methods="deseq"))
compare$result$deseq
## $sh_nil
## $sh_nil$logfc
## [1] 1
##
## $sh_nil$p
## [1] 1
##
## $sh_nil$adjp
## [1] 1
##
##
## $ch_nil
## $ch_nil$logfc
## [1] 1
##
## $ch_nil$p
## [1] 1
##
## $ch_nil$adjp
## [1] 1
##
##
## $ch_sh
## $ch_sh$logfc
## [1] 1
##
## $ch_sh$p
## [1] 1
##
## $ch_sh$adjp
## [1] 1
lp_inf_filt <- sm(normalize_expt(lp_inf, filter=TRUE))
reminder_data <- set_expt_batches(lp_inf_filt, fact="pathogenstrain")
##reminder_data <- subset_expt(reminder_data, subset="pathogenstrain!='s2272'")
##reminder_data <- subset_expt(reminder_data, subset="pathogenstrain!='s2504'")
reminder_norm <- normalize_expt(reminder_data, norm="quant", transform="log2", convert="cpm")
## This function will replace the expt$expressionset slot with:
## log2(cpm(quant(data)))
## It backs up the current data into a slot named:
## expt$backup_expressionset. It will also save copies of each step along the way
## in expt$normalized with the corresponding libsizes. Keep the libsizes in mind
## when invoking limma. The appropriate libsize is the non-log(cpm(normalized)).
## This is most likely kept at:
## 'new_expt$normalized$intermediate_counts$normalization$libsizes'
## A copy of this may also be found at:
## new_expt$best_libsize
## Filter is false, this should likely be set to something, good
## choices include cbcb, kofa, pofa (anything but FALSE). If you want this to
## stay FALSE, keep in mind that if other normalizations are performed, then the
## resulting libsizes are likely to be strange (potentially negative!)
## Not correcting the count-data for batch effects. If batch is
## included in EdgerR/limma's model, then this is probably wise; but in extreme
## batch effects this is a good parameter to play with.
## Step 1: not doing count filtering.
## Step 2: normalizing the data with quant.
## Using normalize.quantiles.robust due to a thread error in preprocessCore.
## Step 3: converting the data with cpm.
## Step 4: transforming the data with log2.
## transform_counts: Found 40 values equal to 0, adding 1 to the matrix.
## Step 5: not doing batch correction.
Oh yeah, I remember now, the primary difference when looking at the parasite samples is one which matches the variant pattern. Thus, if one looks at the clustering observed in the parasite data, one will find the same pattern observed in 04_variants.Rmd, where the samples on one side of the hclust cladogram are on one side of the PCA and vice versa.
This does not bode well for our ability to use differential expression anayses in order to find parasite strain differences between chronic/self-healing; but it is great if we want to see differences between one family of strains and the other; sadly both families have self-healing and chronic members.
lp_pairwise_nobatch <- sm(all_pairwise(lp_inf_filt, model_batch=FALSE,
do_ebseq=FALSE))
excel_file <- glue::glue("excel/{rundate}_lp_infect_nobatch_contr-v{ver}.xlsx")
lp_combined_nobatch <- sm(combine_de_tables(lp_pairwise_nobatch, excel=excel_file))
excel_file <- glue::glue("excel/{rundate}_lp_infect_nobatch_sig-v{ver}.xlsx")
lp_sig_nobatch <- sm(extract_significant_genes(lp_combined_nobatch, excel=excel_file))
lp_pairwise_batch <- sm(all_pairwise(lp_inf_filt, model_batch=TRUE))
excel_file <- glue::glue("excel/{rundate}_lp_infect_batch_contr-v{ver}.xlsx")
lp_combined_batch <- sm(combine_de_tables(lp_pairwise_batch, excel=excel_file))
excel_file <- glue::glue("excel/{rundate}_lp_infect_batch_sig-v{ver}.xlsx")
lp_sig_batch <- sm(extract_significant_genes(lp_combined_batch, excel=excel_file))
lp_pairwise_ssva <- sm(all_pairwise(lp_inf_filt, model_batch="ssva"))
excel_file <- glue::glue("excel/{rundate}_lp_infect_ssva_contr-v{ver}.xlsx")
lp_combined_ssva <- sm(combine_de_tables(lp_pairwise_ssva, excel=excel_file))
excel_file <- glue::glue("excel/{rundate}_lp_infect_ssva_sig-v{ver}.xlsx")
lp_sig_ssva <- sm(extract_significant_genes(lp_combined_ssva, excel=excel_file))
lp_pairwise_fsva <- sm(all_pairwise(lp_inf_filt, model_batch="fsva"))
excel_file <- glue::glue("excel/{rundate}_lp_infect_fsva_contr-v{ver}.xlsx")
lp_combined_fsva <- sm(combine_de_tables(lp_pairwise_fsva, excel=excel_file))