This document holds all of the visualization and diagnostic tasks. The metadata and gene annotation data collection tasks therefore live in tmrc3_data_structures.Rmd. The reasons for some of the data structures created in that document are made clear here, but they are all created there.
Thus the lesion size is the more inclusive metric, but potentially ulcer size is more informative? Any inflammation in the skin causes the person to be defined as failure.
These samples are from patients who either successfully cleared a Leishmania panamensis infection following treatment, or did not. They include biopsies from each patient along with purifications for monocytes, neutrophils, and eosinophils. When possible, this process was repeated over three visits, but some patients did not return for the second or third visit.
The over-arching goal is to look for attributes (most likely genes) which distinguish patients who do and do not cure the infection after treatment. If possible, these will be apparent on the first visit.
plot_legend(hs_expt)$plot
## plot labels was not set and there are more than 100 samples, disabling it.
all_nz <- plot_nonzero(hs_expt)
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
all_nz$plot
## Warning: ggrepel: 195 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
The following plot is essentially identical to the previous with two exceptions:
nz_post <- plot_nonzero(tc_valid)
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
nz_post$plot
## Warning: ggrepel: 163 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
Maria Adelaida’s quote: “I would like one picture of all samples including the miltefosine so that I can keep in my mind why we removed them.”
We need to keep track of how many samples of each type are lost when we apply our various filters. Thus I am repeating the same set of tallies. This will likely happen one more time, following the removal of the samples which came from Cali.
table(pData(tc_valid)$drug)
##
## antimony
## 184
table(pData(tc_valid)$clinic)
##
## Cali Tumaco
## 61 123
table(pData(tc_valid)$finaloutcome)
##
## cure failure
## 122 62
table(pData(tc_valid)$typeofcells)
##
## biopsy eosinophils monocytes neutrophils
## 18 41 63 62
table(pData(tc_valid)$visit)
##
## 3 2 1
## 51 50 83
summary(as.numeric(pData(tc_valid)$eb_lc_tiempo_evolucion))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 4.00 6.00 8.19 12.00 21.00
summary(as.numeric(pData(tc_valid)$eb_lc_tto_mcto_glucan_dosis))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13.0 14.8 19.0 17.5 20.0 20.0
summary(as.numeric(pData(tc_valid)$v3_lc_ejey_lesion_mm_1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 7.2 32.0 303.4 999.0 999.0
summary(as.numeric(pData(tc_valid)$v3_lc_lesion_area_1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 226 999 2328 2448 16965
summary(as.numeric(pData(tc_valid)$v3_lc_ejex_ulcera_mm_1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 0.0 12.5 295.9 999.0 999.0
table(pData(tc_valid)$eb_lc_sexo)
##
## 1 2
## 156 28
table(pData(tc_valid)$eb_lc_etnia)
##
## 1 2 3
## 91 46 47
summary(as.numeric(pData(tc_valid)$edad))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.0 25.0 28.5 30.7 36.0 51.0
table(pData(tc_valid)$eb_lc_peso)
##
## 53.9 57.9 58 58.1 58.3 58.6 59 59.6 62 63 67 69.4 72
## 9 2 6 7 10 3 8 1 6 6 6 10 9
## 75 76.5 77 78 79.2 82 83.3 83.4 86.4 87 89 93.3 100
## 2 3 18 10 10 9 4 10 9 3 9 7 5
## 100.8
## 2
table(pData(tc_valid)$eb_lc_estatura)
##
## 152 154 155 156 158 159 160 163 164 165 166 167 169 172 173 174 176 177 182 183
## 1 10 9 6 15 2 3 9 15 12 19 3 2 10 9 32 1 7 9 10
length(unique(pData(tc_valid)[["codigo_paciente"]]))
## [1] 29
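Because the same tallies are repeated after each filter, they could be collected by a small helper. The following is only a sketch in base R: the `meta` data frame is an invented stand-in for `pData(tc_valid)`, and `tally_columns()` is a hypothetical helper, not a function from the packages used in this document.

```r
## Toy stand-in for pData(tc_valid); the column names mirror the ones used
## above, but the values are invented for illustration.
meta <- data.frame(
  clinic = c("Cali", "Tumaco", "Tumaco", "Tumaco"),
  finaloutcome = c("cure", "cure", "failure", "failure"),
  typeofcells = c("biopsy", "monocytes", "monocytes", "neutrophils"),
  stringsAsFactors = FALSE)

## Hypothetical helper: tabulate a set of categorical columns in one call.
tally_columns <- function(meta, columns) {
  tallies <- lapply(columns, function(column) table(meta[[column]]))
  names(tallies) <- columns
  tallies
}

wanted <- c("clinic", "finaloutcome", "typeofcells")
## Tallies before and after an example filter (dropping the Cali samples).
before <- tally_columns(meta, wanted)
after <- tally_columns(meta[meta[["clinic"]] == "Tumaco", ], wanted)
```

Comparing `before` and `after` element by element shows which categories lost samples to the filter.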
The sets of samples used to visualize the data will also comprise the sets used when later performing the various differential expression analyses.
Start out with some initial metrics of all samples. The most obvious are plots of the numbers of non-zero genes observed, heatmaps showing the relative relationships among the samples, the relative library sizes, and some PCA. It might be smart to split the library sizes up across subsets of the data, because they have expanded too far to see well on a computer screen.
The most likely factors to query when considering the entire dataset are cure/fail, visit, and cell type. This is the level at which we will choose samples to exclude from future analyses.
plot_legend(tc_biopsies)$plot
plot_libsize(tc_biopsies)$plot
plot_nonzero(tc_biopsies)$plot
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
There is a (relatively) new function in the following block: plot_libsize_prepost attempts to provide an idea of how much data is lost when low-count filtering the data.
The first plot it produces is a barplot of the number of reads removed by the filter from each sample. The second plot has two bars per sample: the top bar is labeled with the number of low-count genes before the filter, and the lower bar represents the number after the filter, which is assumed to be quite low.
biopsy_prepost <- plot_libsize_prepost(tc_biopsies)
biopsy_prepost$count_plot
biopsy_prepost$lowgene_plot
## Warning: Using alpha for a discrete variable is not advised.
## Minimum number of biopsy genes: ~ 14,000
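The two quantities described above can be sketched in base R on a toy matrix. This is an illustration only: the counts are invented, and the filter (mean count across samples of at least 2) is an assumption, not necessarily the filter that plot_libsize_prepost applies.

```r
## Toy count matrix: 5 genes (rows) by 3 samples (columns); invented values.
counts <- matrix(c(0, 1, 0,
                   2, 0, 1,
                   50, 60, 40,
                   100, 90, 120,
                   0, 0, 3),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:5), paste0("s", 1:3)))

## Assumed filter: keep genes whose mean count across samples is >= 2.
threshold <- 2
keep <- rowMeans(counts) >= threshold
filtered <- counts[keep, , drop = FALSE]

## Reads removed from each sample by the filter (the first plot).
reads_removed <- colSums(counts) - colSums(filtered)

## Genes per sample with counts under the threshold, before and after
## the filter (the two bars of the second plot).
low_before <- colSums(counts < threshold)
low_after <- colSums(filtered < threshold)
```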
plot_libsize(tc_eosinophils)$plot
plot_nonzero(tc_eosinophils)$plot
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
## Warning: ggrepel: 18 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
eosinophil_prepost <- plot_libsize_prepost(tc_eosinophils)
eosinophil_prepost$count_plot
eosinophil_prepost$lowgene_plot
## Warning: Using alpha for a discrete variable is not advised.
## Minimum number of eosinophil genes: ~ 13,500
plot_libsize(tc_monocytes)$plot
plot_nonzero(tc_monocytes)$plot
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
## Warning: ggrepel: 48 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
monocyte_prepost <- plot_libsize_prepost(tc_monocytes)
monocyte_prepost$count_plot
monocyte_prepost$lowgene_plot
## Warning: Using alpha for a discrete variable is not advised.
## Minimum number of monocyte genes: ~ 7,500 before setting the minimum.
plot_libsize(tc_neutrophils)$plot
plot_nonzero(tc_neutrophils)$plot
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
## Warning: ggrepel: 41 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
neutrophil_prepost <- plot_libsize_prepost(tc_neutrophils)
neutrophil_prepost$count_plot
neutrophil_prepost$lowgene_plot
## Warning: Using alpha for a discrete variable is not advised.
## Minimum number of neutrophil genes: ~ 10,000 before setting minimum coverage.
The above block just repeats the same two plots on a per-celltype basis: the number of reads observed / sample and a plot of observed genes with respect to coverage. I made some comments with my observations about the number of genes.
Now that those ‘global’ metrics are out of the way, let's look at some global metrics of the data following normalization; the most likely plots are of course PCA but also a couple of heatmaps.
In the google doc TMRC3_Aug18_2021, there is an example of an image for the first figure:
“Transcriptomic profiles of primary innate cells of CL patients show unique transcriptional signatures - Remove PBMCs and M0, maybe biopsies as well (but Remove WT samples)”
During a meeting, however, it sounded like there was some desire to keep all cell types. Therefore the following block has one image with everything and one following the suggestion above.
tc_type <- set_expt_conditions(tc_valid, fact="typeofcells") %>%
set_expt_batches(fact="finaloutcome") %>%
set_expt_colors(color_choices[["type"]])
tc_norm <- sm(normalize_expt(tc_type, transform="log2", norm="quant",
convert="cpm", filter=TRUE))
tc_pca <- plot_pca(tc_norm, plot_labels=FALSE,
plot_title="PCA - Cell type", size_column="visitnumber")
dev <- pp(file=glue("images/tmrc3_pca_nolabels-v{ver}.png"))
tc_pca$plot
closed <- dev.off()
tc_pca$plot
tc_pca_nosize <- plot_pca(tc_norm, plot_labels=FALSE)
tc_pca_nosize$plot
write.csv(tc_pca$table, file="coords/tc_donor_pca_coords.csv")
tc_cf_norm <- set_expt_batches(tc_norm,
fact="visitnumber")
tc_cf_corheat <- plot_corheat(tc_cf_norm, plot_title="Hierarchical clustering:
cell types")
dev <- pp(file=glue("images/tmrc3_corheat_cf-v{ver}.png"), height=12, width=12)
tc_cf_corheat$plot
closed <- dev.off()
tc_cf_corheat$plot
tc_cf_disheat <- plot_disheat(tc_cf_norm, plot_title="Hierarchical clustering:
cell types")
dev <- pp(file=glue("images/tmrc3_disheat_cf-v{ver}.png"), height=12, width=12)
tc_cf_disheat$plot
closed <- dev.off()
tc_cf_disheat$plot
A potential figure legend for the following images might include:
The observed counts per gene for all of the clinical samples were filtered, log transformed, cpm converted, and quantile normalized. The colors were defined by cell types and shapes by patient visit. When the first two principal components were plotted, clustering was observed by cell type. The biopsy samples were markedly different from the innate immune cell types.
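The steps named in this legend can be sketched in base R on simulated counts. Everything here is an assumption made for illustration: the data are invented, `quantile_normalize()` is a hypothetical helper, and the order of the steps (filter, cpm, quantile, log2) may differ from the order normalize_expt uses internally.

```r
set.seed(1)
## Simulated counts: 200 genes by 6 samples, two invented "cell types".
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 2), nrow = 200,
                 dimnames = list(paste0("g", 1:200), paste0("s", 1:6)))
## The first 50 genes are high in samples 1-3 and low in samples 4-6.
counts[1:50, 1:3] <- rnbinom(150, mu = 500, size = 2)
counts[1:50, 4:6] <- rnbinom(150, mu = 5, size = 2)
## Ten genes which are never observed, so the filter has something to drop.
counts <- rbind(counts, matrix(0, nrow = 10, ncol = 6))

## Low-count filter (assumed threshold: mean count of at least 2).
filtered <- counts[rowMeans(counts) >= 2, , drop = FALSE]

## Counts per million.
cpm <- t(t(filtered) / colSums(filtered)) * 1e6

## Hypothetical helper: force every sample onto the same distribution.
quantile_normalize <- function(mat) {
  ranks <- apply(mat, 2, rank, ties.method = "first")
  reference <- rowMeans(apply(mat, 2, sort))
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(mat)
  normalized
}
qn <- quantile_normalize(cpm)

## log2 transform with a pseudocount, then PCA with samples as rows.
logged <- log2(qn + 1)
pca <- prcomp(t(logged))
pc1 <- pca$x[, 1]
```

In this toy example the first principal component separates samples 1-3 from samples 4-6, which is the same kind of clustering by cell type described in the legend.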
fig1v2_norm <- normalize_expt(tc_type, transform="log2",
convert="cpm", norm="quant", filter=TRUE)
## Removing 5633 low-count genes (14290 remaining).
## transform_counts: Found 675 values equal to 0, adding 1 to the matrix.
fig1v2_pca <- plot_pca(fig1v2_norm, cis=FALSE)
## plot labels was not set and there are more than 100 samples, disabling it.
dev <- pp(file=glue("images/tmrc3_fig1v2.png"))
fig1v2_pca$plot
closed <- dev.off()
fig1v2_pca$plot
fig1v3_norm <- normalize_expt(tc_type, transform="log2",
convert="cpm", norm="quant", filter=TRUE)
## Removing 5633 low-count genes (14290 remaining).
## transform_counts: Found 675 values equal to 0, adding 1 to the matrix.
fig1v3_pca <- plot_pca(fig1v3_norm, cis=FALSE)
## plot labels was not set and there are more than 100 samples, disabling it.
dev <- pp(file="images/tmrc3_fig1v3.png")
fig1v3_pca$plot
closed <- dev.off()
fig1v3_pca$plot
Spoiler alert: This section will eventually suggest pretty strongly that we will not easily be able to use the Cali samples. Thus, after finishing it, we will likely exclude those samples.
Take a moment to view the biopsy samples. We separated them by clinic (Cali or Tumaco), and this view of the samples is the only one which does not suggest a strong difference between the two clinics. However, it also suggests that the biopsy samples will not prove very helpful.
tc_biopsies_norm <- normalize_expt(tc_biopsies, transform="log2",
convert="cpm", norm="quant", filter=TRUE)
## Removing 6315 low-count genes (13608 remaining).
## transform_counts: Found 206 values equal to 0, adding 1 to the matrix.
tc_biopsies_pca <- plot_pca(tc_biopsies_norm, plot_labels=FALSE)
dev <- pp(file="images/biopsy_place.png")
tc_biopsies_pca$plot
closed <- dev.off()
tc_biopsies_pca$plot
tc_biopsies_nb <- normalize_expt(tc_biopsies, transform="log2",
convert="cpm", batch="svaseq", filter=TRUE)
## Removing 6315 low-count genes (13608 remaining).
## Setting 290 low elements to zero.
## transform_counts: Found 290 values equal to 0, adding 1 to the matrix.
tc_biopsies_nb_pca <- plot_pca(tc_biopsies_nb, plot_labels=FALSE)
dev <- pp(file="images/biopsy_place_nb.png")
tc_biopsies_nb_pca$plot
closed <- dev.off()
tc_biopsies_nb_pca$plot
In contrast, the eosinophil samples do have significant amounts of variance which discriminate the two clinics. At the time of this writing, there are fewer eosinophil samples than monocytes or neutrophils; as a result, there are no samples from Cali which failed. This is somewhat limiting if we wish to look for differences between the cure and fail samples which came from the two clinics.
tc_eosinophils_norm <- normalize_expt(tc_eosinophils, transform="log2",
convert="cpm", norm="quant", filter=TRUE)
## Removing 9059 low-count genes (10864 remaining).
## transform_counts: Found 5 values equal to 0, adding 1 to the matrix.
tc_eosinophils_pca <- plot_pca(tc_eosinophils_norm, plot_labels=FALSE)
dev <- pp(file="images/eosinophil_place.png")
tc_eosinophils_pca$plot
closed <- dev.off()
tc_eosinophils_pca$plot
tc_eosinophils_nb <- normalize_expt(tc_eosinophils, transform="log2",
convert="cpm", batch="svaseq", filter=TRUE)
## Removing 9059 low-count genes (10864 remaining).
## Setting 1043 low elements to zero.
## transform_counts: Found 1043 values equal to 0, adding 1 to the matrix.
tc_eosinophils_nb_pca <- plot_pca(tc_eosinophils_nb, plot_labels=FALSE)
dev <- pp(file="images/eosinophil_place_nb.png")
tc_eosinophils_nb_pca$plot
closed <- dev.off()
tc_eosinophils_nb_pca$plot
In contrast with the eosinophil samples, we have one patient’s monocyte and neutrophil samples which did not cure. As we will see, there is one person from Cali who did not cure; with respect to the transcriptome, this person is not different from the other people from Cali.
tc_monocytes_norm <- normalize_expt(tc_monocytes, transform="log2",
convert="cpm", norm="quant", filter=TRUE)
## Removing 8819 low-count genes (11104 remaining).
## transform_counts: Found 12 values equal to 0, adding 1 to the matrix.
tc_monocytes_pca <- plot_pca(tc_monocytes_norm, plot_labels=FALSE)
dev <- pp(file="images/monocytes_place.png")
tc_monocytes_pca$plot
closed <- dev.off()
tc_monocytes_pca$plot
tc_monocytes_nb <- normalize_expt(tc_monocytes, transform="log2",
convert="cpm", batch="svaseq", filter=TRUE)
## Removing 8819 low-count genes (11104 remaining).
## Setting 1447 low elements to zero.
## transform_counts: Found 1447 values equal to 0, adding 1 to the matrix.
tc_monocytes_nb_pca <- plot_pca(tc_monocytes_nb, plot_labels=FALSE)
dev <- pp(file="images/monocytes_place_nb.png")
tc_monocytes_nb_pca$plot
closed <- dev.off()
tc_monocytes_nb_pca$plot
Finally, in the neutrophil samples, that same person does appear to be different from the others from Cali.
tc_neutrophils_norm <- normalize_expt(tc_neutrophils, transform="log2",
convert="cpm", norm="quant", filter=TRUE)
## Removing 10681 low-count genes (9242 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
tc_neutrophils_pca <- plot_pca(tc_neutrophils_norm, plot_labels=FALSE)
dev <- pp(file="images/neutrophil_place.png")
tc_neutrophils_pca$plot
closed <- dev.off()
tc_neutrophils_pca$plot
tc_neutrophils_nb <- normalize_expt(tc_neutrophils, transform="log2",
convert="cpm", batch="svaseq", filter=TRUE)
## Removing 10681 low-count genes (9242 remaining).
## Setting 1541 low elements to zero.
## transform_counts: Found 1541 values equal to 0, adding 1 to the matrix.
tc_neutrophils_nb_pca <- plot_pca(tc_neutrophils_nb, plot_labels=FALSE)
dev <- pp(file="images/neutrophil_place_nb.png")
tc_neutrophils_nb_pca$plot
closed <- dev.off()
tc_neutrophils_nb_pca$plot
Now that we have these various subsets, perform an explicit comparison of the samples which came from the two clinics.
tc_clinic_type <- tc_valid %>%
set_expt_conditions(fact="clinic") %>%
set_expt_batches(fact="typeofcells")
tc_clinic_type_norm <- normalize_expt(tc_clinic_type, transform="log2", convert="cpm",
norm="quant", filter=TRUE)
## Removing 5633 low-count genes (14290 remaining).
## transform_counts: Found 675 values equal to 0, adding 1 to the matrix.
tc_clinic_type_pca <- plot_pca(tc_clinic_type_norm)
## plot labels was not set and there are more than 100 samples, disabling it.
tc_clinic_type_pca$plot
tc_clinic_type_nb <- normalize_expt(tc_clinic_type, transform="log2", convert="cpm",
batch="svaseq", filter=TRUE)
## Removing 5633 low-count genes (14290 remaining).
## Setting 31271 low elements to zero.
## transform_counts: Found 31271 values equal to 0, adding 1 to the matrix.
tc_clinic_type_nb_pca <- plot_pca(tc_clinic_type_nb)
## plot labels was not set and there are more than 100 samples, disabling it.
tc_clinic_type_nb_pca$plot
tc_clinical_norm <- sm(normalize_expt(tc_clinical, filter="simple", transform="log2",
norm="quant", convert="cpm"))
clinical_pca <- plot_pca(tc_clinical_norm, plot_labels=FALSE,
cis=NULL,
plot_title="PCA - clinical samples")
dev <- pp(file=glue("images/all_clinical_nobatch_pca-v{ver}.png"), height=8, width=16)
clinical_pca$plot
closed <- dev.off()
clinical_pca$plot
tc_clinical_nb <- normalize_expt(tc_clinical, filter="simple", transform="log2",
batch="svaseq", convert="cpm")
## Removing 1872 low-count genes (18051 remaining).
## Setting 156640 low elements to zero.
## transform_counts: Found 156640 values equal to 0, adding 1 to the matrix.
tc_clinical_nb_pca <- plot_pca(tc_clinical_nb)
## plot labels was not set and there are more than 100 samples, disabling it.
dev <- pp(file=glue("images/all_clinical_svaseqbatch_pca-v{ver}.png"), height=6, width=8)
tc_clinical_nb_pca$plot
closed <- dev.off()
tc_clinical_nb_pca$plot
clinical_pca_info <- pca_information(
tc_clinical_norm, plot_pcas=TRUE, num_components = 30,
expt_factors=c("visitnumber", "typeofcells", "finaloutcome",
"clinic", "donor"))
## plot labels was not set and there are more than 100 samples, disabling it.
dev <- pp(file="images/clinical_samples_neglogp_pcs.png")
clinical_pca_info$anova_neglogp_heatmap
closed <- dev.off()
clinical_pca_info$anova_neglogp_heatmap
clinical_pca_info$pca_plots$PC4_PC7
## Warning: ggrepel: 114 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
clinical_scores <- pca_highscores(tc_clinical_norm)
clinical_scores[["highest"]][,"Comp.4"]
## [1] "15.73:ENSG00000168329" "14.96:ENSG00000133574" "14.03:ENSG00000204389"
## [4] "14.02:ENSG00000171115" "13.89:ENSG00000163563" "13.47:ENSG00000179144"
## [7] "13.17:ENSG00000004799" "13.11:ENSG00000180871" "13:ENSG00000172086"
## [10] "12.77:ENSG00000091106" "12.61:ENSG00000121858" "12.37:ENSG00000123405"
## [13] "12.36:ENSG00000175538" "12.04:ENSG00000138449" "12.01:ENSG00000109971"
## [16] "11.84:ENSG00000165118" "11.6:ENSG00000088986" "11.59:ENSG00000135828"
## [19] "11.37:ENSG00000038274" "11.17:ENSG00000130150"
Another way to explore the effect of SVA is to iteratively increase the number of SVs removed by it and look at some simple plots of the resulting data. Ideally, this should complement the methods employed by Theresa.
first <- normalize_expt(tc_clinical, transform="log2", convert="cpm",
filter = TRUE, batch="svaseq", surrogates=1)
## Removing 5633 low-count genes (14290 remaining).
## Setting 192779 low elements to zero.
## transform_counts: Found 192779 values equal to 0, adding 1 to the matrix.
first_info <- pca_information(
first, plot_pcas=TRUE, num_components = 30,
expt_factors=c("visitnumber", "typeofcells",
"finaloutcome", "clinic"))
## plot labels was not set and there are more than 100 samples, disabling it.
first_info$anova_neglogp_heatmap
first_info$pca_plots[["PC1_PC2"]]
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning: ggrepel: 176 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
second <- normalize_expt(tc_clinical, transform="log2", convert="cpm",
filter = TRUE, batch="svaseq", surrogates=2) %>%
set_expt_batches(fact="clinic")
## Removing 5633 low-count genes (14290 remaining).
## Setting 31218 low elements to zero.
## transform_counts: Found 31218 values equal to 0, adding 1 to the matrix.
second_info <- pca_information(
second, plot_pcas=TRUE, num_components = 30,
expt_factors=c("visitnumber", "typeofcells",
"finaloutcome", "clinic"))
## plot labels was not set and there are more than 100 samples, disabling it.
second_info$anova_neglogp_heatmap
third <- normalize_expt(tc_clinical, transform="log2", convert="cpm",
filter = TRUE, batch="svaseq", surrogates=3) %>%
set_expt_batches(fact="clinic")
## Removing 5633 low-count genes (14290 remaining).
## Setting 27267 low elements to zero.
## transform_counts: Found 27267 values equal to 0, adding 1 to the matrix.
third_info <- pca_information(
third, plot_pcas=TRUE, num_components = 30,
expt_factors=c("visitnumber", "typeofcells",
"finaloutcome", "clinic"))
## plot labels was not set and there are more than 100 samples, disabling it.
third_info$anova_neglogp_heatmap
fourth <- normalize_expt(tc_clinical, transform="log2", convert="cpm",
filter = TRUE, batch="svaseq", surrogates=4) %>%
set_expt_batches(fact="clinic")
## Removing 5633 low-count genes (14290 remaining).
## Setting 25946 low elements to zero.
## transform_counts: Found 25946 values equal to 0, adding 1 to the matrix.
fourth_info <- pca_information(
fourth, plot_pcas=TRUE, num_components = 30,
expt_factors=c("visitnumber", "typeofcells",
"finaloutcome", "clinic"))
## plot labels was not set and there are more than 100 samples, disabling it.
fourth_info$anova_neglogp_heatmap
fourth_info[["pca_plots"]][["PC1_PC2"]]
## Warning: ggrepel: 109 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
fifth <- normalize_expt(tc_clinical, transform="log2", convert="cpm",
filter = TRUE, batch="svaseq", surrogates=5) %>%
set_expt_batches(fact="clinic")
## Removing 5633 low-count genes (14290 remaining).
## Setting 27033 low elements to zero.
## transform_counts: Found 27033 values equal to 0, adding 1 to the matrix.
fifth_info <- pca_information(
fifth, plot_pcas=TRUE, num_components = 30,
expt_factors=c("visitnumber", "typeofcells",
"finaloutcome", "clinic"))
## plot labels was not set and there are more than 100 samples, disabling it.
fifth_info$anova_neglogp_heatmap
fifth_info[["pca_plots"]][["PC1_PC12"]]
## Warning: ggrepel: 112 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
sixth <- normalize_expt(tc_clinical, transform="log2", convert="cpm",
filter = TRUE, batch="svaseq", surrogates=6) %>%
set_expt_batches(fact="clinic")
## Removing 5633 low-count genes (14290 remaining).
## Setting 23957 low elements to zero.
## transform_counts: Found 23957 values equal to 0, adding 1 to the matrix.
sixth_info <- pca_information(
sixth, plot_pcas=TRUE, num_components = 30,
expt_factors=c("visitnumber", "typeofcells",
"finaloutcome", "clinic"))
## plot labels was not set and there are more than 100 samples, disabling it.
sixth_info$anova_neglogp_heatmap
seventh <- normalize_expt(tc_clinical, transform="log2", convert="cpm",
filter = TRUE, batch="svaseq", surrogates=7) %>%
set_expt_batches(fact="clinic")
## Removing 5633 low-count genes (14290 remaining).
## Setting 24476 low elements to zero.
## transform_counts: Found 24476 values equal to 0, adding 1 to the matrix.
seventh_info <- pca_information(
seventh, plot_pcas=TRUE, num_components = 30,
expt_factors=c("visitnumber", "typeofcells",
"finaloutcome", "clinic"))
## plot labels was not set and there are more than 100 samples, disabling it.
seventh_info$anova_neglogp_heatmap
eighth <- normalize_expt(tc_clinical, transform="log2", convert="cpm",
filter = TRUE, batch="svaseq", surrogates=8)
## Removing 5633 low-count genes (14290 remaining).
## Setting 24108 low elements to zero.
## transform_counts: Found 24108 values equal to 0, adding 1 to the matrix.
eighth_info <- pca_information(
eighth, plot_pcas=TRUE, num_components = 30,
expt_factors=c("visitnumber", "typeofcells",
"finaloutcome", "clinic"))
## plot labels was not set and there are more than 100 samples, disabling it.
eighth_info$anova_neglogp_heatmap
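The repeated blocks above follow one pattern: remove k surrogates, then ask how strongly the remaining components associate with a known factor. A loop version of that pattern is sketched below in base R. It is a deliberately simplified stand-in, not svaseq: the data are simulated, and the hypothetical `remove_surrogates()` simply takes the top right singular vectors of the residuals after fitting the condition, while `pc1_neglogp()` is an assumed reading of the -log10(p) summary and only examines the first component.

```r
set.seed(2)
n_genes <- 300
n_samples <- 12
outcome <- factor(rep(c("cure", "failure"), each = 6))
clinic <- factor(rep(c("Cali", "Tumaco"), times = 6))

## Invented log-scale expression with a clinic effect on every gene.
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)
clinic_effect <- rnorm(n_genes, sd = 2)
expr <- expr + outer(clinic_effect, as.numeric(clinic == "Tumaco"))

## Hypothetical helper: estimate k surrogate-like variables from the
## residuals and regress them out while keeping the condition.
remove_surrogates <- function(expr, condition, k) {
  design <- model.matrix(~ condition)
  resid <- t(lm.fit(design, t(expr))$residuals)    # genes x samples
  sv <- svd(resid)$v[, seq_len(k), drop = FALSE]   # samples x k
  coefs <- lm.fit(cbind(design, sv), t(expr))$coefficients
  sv_coefs <- coefs[ncol(design) + seq_len(k), , drop = FALSE]
  expr - t(sv %*% sv_coefs)
}

## -log10 of the ANOVA p-value for PC1 against a metadata factor.
pc1_neglogp <- function(expr, factor_values) {
  pc1 <- prcomp(t(expr))$x[, 1]
  p <- anova(lm(pc1 ~ factor_values))[["Pr(>F)"]][1]
  -log10(p)
}

## Increase the number of removed surrogates and record the clinic signal.
results <- sapply(0:3, function(k) {
  cleaned <- if (k == 0) expr else remove_surrogates(expr, outcome, k)
  pc1_neglogp(cleaned, clinic)
})
names(results) <- paste0("sv", 0:3)
```

In this simulation the clinic association of the first component drops once a single surrogate is removed; the real data need not behave so cleanly.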
At least in theory, everything which follows will be using the above ‘clinical’ data structure. Thus, let us count it up and get a sense of what we will work with.
table(pData(t_clinical)$drug)
##
## antimony
## 123
table(pData(t_clinical)$clinic)
##
## Tumaco
## 123
table(pData(t_clinical)$finaloutcome)
##
## cure failure
## 67 56
table(pData(t_clinical)$typeofcells)
##
## biopsy eosinophils monocytes neutrophils
## 14 26 42 41
table(pData(t_clinical)$visit)
##
## 3 2 1
## 34 35 54
summary(as.numeric(pData(t_clinical)$eb_lc_tiempo_evolucion))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 4.00 4.00 7.03 8.00 21.00
summary(as.numeric(pData(t_clinical)$eb_lc_tto_mcto_glucan_dosis))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13 14 17 17 20 20
summary(as.numeric(pData(t_clinical)$v3_lc_ejey_lesion_mm_1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.0 7.2 31.3 389.6 999.0 999.0
summary(as.numeric(pData(t_clinical)$v3_lc_lesion_area_1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 46 222 999 1089 999 5055
summary(as.numeric(pData(t_clinical)$v3_lc_ejex_ulcera_mm_1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 0 383 999 999
table(pData(t_clinical)$eb_lc_sexo)
##
## 1 2
## 101 22
table(pData(t_clinical)$eb_lc_etnia)
##
## 1 2 3
## 76 19 28
summary(as.numeric(pData(t_clinical)$edad))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.0 23.0 25.0 28.5 34.0 51.0
table(pData(t_clinical)$eb_lc_peso)
##
## 53.9 57.9 58.1 58.3 58.6 59 59.6 62 63 69.4 77 78 79.2
## 9 2 7 10 3 8 1 6 6 10 9 10 10
## 83.3 83.4 86.4 93.3 100.8
## 4 10 9 7 2
table(pData(t_clinical)$eb_lc_estatura)
##
## 152 154 158 159 163 164 165 166 172 173 174 176 177 182 183
## 1 10 15 2 9 15 12 10 10 4 8 1 7 9 10
length(unique(pData(t_clinical)[["codigo_paciente"]]))
## [1] 19
only_cure <- pData(t_clinical)[["finaloutcome"]] == "cure"
c_meta <- pData(t_clinical)[only_cure, ]
length(unique(c_meta[["codigo_paciente"]]))
## [1] 10
only_fail <- pData(t_clinical)[["finaloutcome"]] == "failure"
f_meta <- pData(t_clinical)[only_fail, ]
length(unique(f_meta[["codigo_paciente"]]))
## [1] 9
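The per-outcome patient counts above can also be computed in a single call. A minimal sketch, assuming a metadata frame with the same two column names; the values here are invented.

```r
## Toy stand-in for pData(t_clinical); one row per sample.
meta <- data.frame(
  codigo_paciente = c("p1", "p1", "p2", "p3", "p3", "p3"),
  finaloutcome = c("cure", "cure", "failure", "cure", "cure", "cure"),
  stringsAsFactors = FALSE)

## Number of distinct patients contributing samples to each outcome.
patients_per_outcome <- tapply(meta[["codigo_paciente"]],
                               meta[["finaloutcome"]],
                               function(ids) length(unique(ids)))
```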
t_clinical_nobiop_norm <- normalize_expt(t_clinical_nobiop, filter=TRUE, norm="quant",
convert="cpm", transform="log2")
## Removing 8016 low-count genes (11907 remaining).
## transform_counts: Found 93 values equal to 0, adding 1 to the matrix.
t_clinical_nobiop_pca <- plot_pca(t_clinical_nobiop_norm, plot_labels=FALSE)
dev <- pp(file="images/clinical_nobiopsys_tumaco_norm_pca.png")
t_clinical_nobiop_pca$plot
closed <- dev.off()
t_clinical_nobiop_pca$plot
t_clinical_nobiop_nb <- normalize_expt(t_clinical_nobiop, filter=TRUE, convert="cpm",
transform="log2", batch="svaseq")
## Removing 8016 low-count genes (11907 remaining).
## Setting 9578 low elements to zero.
## transform_counts: Found 9578 values equal to 0, adding 1 to the matrix.
t_clinical_nobiop_nb_pca <- plot_pca(t_clinical_nobiop_nb, plot_labels=FALSE)
dev <- pp(file="images/clinical_nobiopsys_tumaco_nb_pca.png")
t_clinical_nobiop_nb_pca$plot
closed <- dev.off()
t_clinical_nobiop_nb_pca$plot
Now we have a new, smaller set of primary samples which are categorized by cell type.
Sadly, the biopsy samples remain basically impenetrable. It would be particularly nice if we could judge cure/fail from a visit 1 biopsy.
t_biopsies_norm <- normalize_expt(t_biopsies, transform="log2", convert="cpm",
norm="quant", filter=TRUE)
## Removing 6417 low-count genes (13506 remaining).
## transform_counts: Found 136 values equal to 0, adding 1 to the matrix.
t_biopsies_pca <- plot_pca(t_biopsies_norm,
plot_labels=FALSE)
dev <- pp(file="images/biopsys_tumaco_norm.png")
t_biopsies_pca$plot
closed <- dev.off()
t_biopsies_pca$plot
t_biopsies_nb <- normalize_expt(t_biopsies, transform="log2", convert="cpm",
batch="svaseq", filter=TRUE)
## Removing 6417 low-count genes (13506 remaining).
## Setting 145 low elements to zero.
## transform_counts: Found 145 values equal to 0, adding 1 to the matrix.
t_biopsies_nb_pca <- plot_pca(t_biopsies_nb, plot_labels=FALSE)
dev <- pp(file="images/biopsys_tumaco_norm_sva.png")
t_biopsies_nb_pca$plot
closed <- dev.off()
t_biopsies_nb_pca$plot
In contrast, I suspect that we can get meaningful data from the other cell types. The monocyte samples are still a bit messy.
t_monocyte_norm <- normalize_expt(t_monocytes, transform="log2", convert="cpm",
norm="quant", filter=TRUE)
## Removing 9064 low-count genes (10859 remaining).
## transform_counts: Found 5 values equal to 0, adding 1 to the matrix.
t_monocyte_pca <- plot_pca(t_monocyte_norm,
plot_labels=FALSE)
dev <- pp(file="images/monocytes_tumaco_norm.png")
t_monocyte_pca$plot
closed <- dev.off()
t_monocyte_pca$plot
t_monocyte_nb <- normalize_expt(t_monocytes, transform="log2", convert="cpm",
batch="svaseq", filter=TRUE)
## Removing 9064 low-count genes (10859 remaining).
## Setting 730 low elements to zero.
## transform_counts: Found 730 values equal to 0, adding 1 to the matrix.
t_monocyte_nb_pca <- plot_pca(t_monocyte_nb, plot_labels=FALSE)
dev <- pp(file="images/monocytes_tumaco_norm_sva.png")
t_monocyte_nb_pca$plot
closed <- dev.off()
t_monocyte_nb_pca$plot
Well, really all the cell types remain pretty messy. There is always at least one person in one visit or another who really does not fit well with the rest of the cohort.
t_neutrophil_norm <- normalize_expt(t_neutrophils, transform="log2", convert="cpm",
norm="quant", filter=TRUE)
## Removing 10824 low-count genes (9099 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
t_neutrophil_pca <- plot_pca(t_neutrophil_norm,
plot_labels=FALSE)
dev <- pp(file="images/neutrophils_tumaco_norm.png")
t_neutrophil_pca$plot
closed <- dev.off()
t_neutrophil_pca$plot
t_neutrophil_nb <- normalize_expt(t_neutrophils, transform="log2", convert="cpm",
batch="svaseq", filter=TRUE)
## Removing 10824 low-count genes (9099 remaining).
## Setting 750 low elements to zero.
## transform_counts: Found 750 values equal to 0, adding 1 to the matrix.
t_neutrophil_nb_pca <- plot_pca(t_neutrophil_nb, plot_labels=FALSE)
dev <- pp(file="images/neutrophils_tumaco_norm_sva.png")
t_neutrophil_nb_pca$plot
closed <- dev.off()
t_neutrophil_nb_pca$plot
t_eosinophil_norm <- normalize_expt(t_eosinophils, transform="log2", convert="cpm",
norm="quant", filter=TRUE)
## Removing 9393 low-count genes (10530 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
t_eosinophil_pca <- plot_pca(t_eosinophil_norm,
plot_labels=FALSE)
dev <- pp(file="images/eosinophils_tumaco_norm.png")
t_eosinophil_pca$plot
closed <- dev.off()
t_eosinophil_pca$plot
t_eosinophil_nb <- normalize_expt(t_eosinophils, transform="log2", convert="cpm",
batch="svaseq", filter=TRUE)
## Removing 9393 low-count genes (10530 remaining).
## Setting 325 low elements to zero.
## transform_counts: Found 325 values equal to 0, adding 1 to the matrix.
t_eosinophil_nb_pca <- plot_pca(t_eosinophil_nb, plot_labels=FALSE)
dev <- pp(file="images/eosinophils_tumaco_norm_sva.png")
t_eosinophil_nb_pca$plot
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
closed <- dev.off()
t_eosinophil_nb_pca$plot
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
t_monocyte_v1 <- subset_expt(t_monocytes, subset = "visitnumber=='1'")
## subset_expt(): There were 42, now there are 16 samples.
t_monocyte_v1_norm <- normalize_expt(t_monocyte_v1, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)
## Removing 9444 low-count genes (10479 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
t_monocyte_v1_pca <- plot_pca(t_monocyte_v1_norm, plot_labels = FALSE)
dev <- pp(file="images/monocytes_v1_cf_norm_pca.png")
t_monocyte_v1_pca$plot
closed <- dev.off()
t_monocyte_v1_pca$plot
t_monocyte_v1_nb <- normalize_expt(t_monocyte_v1, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")
## Removing 9444 low-count genes (10479 remaining).
## Setting 187 low elements to zero.
## transform_counts: Found 187 values equal to 0, adding 1 to the matrix.
t_monocyte_v1_nb_pca <- plot_pca(t_monocyte_v1_nb, plot_labels = FALSE)
dev <- pp(file="images/monocytes_v1_cf_norm_sva_pca.png")
t_monocyte_v1_nb_pca$plot
closed <- dev.off()
t_monocyte_v1_nb_pca$plot
t_monocyte_v2 <- subset_expt(t_monocytes, subset = "visitnumber=='2'")
## subset_expt(): There were 42, now there are 13 samples.
t_monocyte_v2_norm <- normalize_expt(t_monocyte_v2, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)
## Removing 9403 low-count genes (10520 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
t_monocyte_v2_pca <- plot_pca(t_monocyte_v2_norm, plot_labels = FALSE)
dev <- pp(file="images/monocytes_v2_cf_norm_pca.png")
t_monocyte_v2_pca$plot
closed <- dev.off()
t_monocyte_v2_pca$plot
t_monocyte_v2_nb <- normalize_expt(t_monocyte_v2, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")
## Removing 9403 low-count genes (10520 remaining).
## Setting 115 low elements to zero.
## transform_counts: Found 115 values equal to 0, adding 1 to the matrix.
t_monocyte_v2_nb_pca <- plot_pca(t_monocyte_v2_nb, plot_labels = FALSE)
dev <- pp(file="images/monocytes_v2_cf_norm_sva_pca.png")
t_monocyte_v2_nb_pca$plot
closed <- dev.off()
t_monocyte_v2_nb_pca$plot
t_monocyte_v3 <- subset_expt(t_monocytes, subset = "visitnumber=='3'")
## subset_expt(): There were 42, now there are 13 samples.
t_monocyte_v3_norm <- normalize_expt(t_monocyte_v3, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)
## Removing 9549 low-count genes (10374 remaining).
## transform_counts: Found 16 values equal to 0, adding 1 to the matrix.
t_monocyte_v3_pca <- plot_pca(t_monocyte_v3_norm, plot_labels = FALSE)
dev <- pp(file="images/monocytes_v3_cf_norm_pca.png")
t_monocyte_v3_pca$plot
closed <- dev.off()
t_monocyte_v3_pca$plot
t_monocyte_v3_nb <- normalize_expt(t_monocyte_v3, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")
## Removing 9549 low-count genes (10374 remaining).
## Setting 55 low elements to zero.
## transform_counts: Found 55 values equal to 0, adding 1 to the matrix.
t_monocyte_v3_nb_pca <- plot_pca(t_monocyte_v3_nb, plot_labels = FALSE)
dev <- pp(file="images/monocytes_v3_cf_norm_sva_pca.png")
t_monocyte_v3_nb_pca$plot
closed <- dev.off()
t_monocyte_v3_nb_pca$plot
t_neutrophil_v1 <- subset_expt(t_neutrophils, subset = "visitnumber=='1'")
## subset_expt(): There were 41, now there are 16 samples.
t_neutrophil_v1_norm <- normalize_expt(t_neutrophil_v1, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)
## Removing 11208 low-count genes (8715 remaining).
## transform_counts: Found 2 values equal to 0, adding 1 to the matrix.
t_neutrophil_v1_pca <- plot_pca(t_neutrophil_v1_norm, plot_labels = FALSE)
dev <- pp(file="images/neutrophils_v1_cf_norm_pca.png")
t_neutrophil_v1_pca$plot
closed <- dev.off()
t_neutrophil_v1_pca$plot
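## Note: this block uses batch = "ruvg" rather than "svaseq" like the other visits,
## although the image written below still has "sva" in its file name.
t_neutrophil_v1_nb <- normalize_expt(t_neutrophil_v1, convert = "cpm",
transform = "log2", filter = TRUE, batch = "ruvg")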
## Removing 11208 low-count genes (8715 remaining).
## Warning in RUVSeq::RUVg(linear_mtrx, ruv_controls, k = chosen_surrogates): The expression matrix does not contain counts.
## Please, pass a matrix of counts (not logged) or set isLog to TRUE to skip the log transformation
## Setting 192 low elements to zero.
## transform_counts: Found 192 values equal to 0, adding 1 to the matrix.
t_neutrophil_v1_nb_pca <- plot_pca(t_neutrophil_v1_nb, plot_labels = FALSE)
dev <- pp(file="images/neutrophils_v1_cf_norm_sva_pca.png")
t_neutrophil_v1_nb_pca$plot
closed <- dev.off()
t_neutrophil_v1_nb_pca$plot
t_neutrophil_v2 <- subset_expt(t_neutrophils, subset = "visitnumber=='2'")
## subset_expt(): There were 41, now there are 13 samples.
t_neutrophil_v2_norm <- normalize_expt(t_neutrophil_v2, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)
## Removing 11473 low-count genes (8450 remaining).
## transform_counts: Found 2 values equal to 0, adding 1 to the matrix.
t_neutrophil_v2_pca <- plot_pca(t_neutrophil_v2_norm, plot_labels = FALSE)
dev <- pp(file="images/neutrophils_v2_cf_norm_pca.png")
t_neutrophil_v2_pca$plot
closed <- dev.off()
t_neutrophil_v2_pca$plot
t_neutrophil_v2_nb <- normalize_expt(t_neutrophil_v2, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")
## Removing 11473 low-count genes (8450 remaining).
## Setting 78 low elements to zero.
## transform_counts: Found 78 values equal to 0, adding 1 to the matrix.
t_neutrophil_v2_nb_pca <- plot_pca(t_neutrophil_v2_nb, plot_labels = FALSE)
dev <- pp(file="images/neutrophils_v2_cf_norm_sva_pca.png")
t_neutrophil_v2_nb_pca$plot
closed <- dev.off()
t_neutrophil_v2_nb_pca$plot
t_neutrophil_v3 <- subset_expt(t_neutrophils, subset = "visitnumber=='3'")
## subset_expt(): There were 41, now there are 12 samples.
t_neutrophil_v3_norm <- normalize_expt(t_neutrophil_v3, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)
## Removing 11420 low-count genes (8503 remaining).
## transform_counts: Found 2 values equal to 0, adding 1 to the matrix.
t_neutrophil_v3_pca <- plot_pca(t_neutrophil_v3_norm, plot_labels = FALSE)
dev <- pp(file="images/neutrophils_v3_cf_norm_pca.png")
t_neutrophil_v3_pca$plot
closed <- dev.off()
t_neutrophil_v3_pca$plot
t_neutrophil_v3_nb <- normalize_expt(t_neutrophil_v3, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")
## Removing 11420 low-count genes (8503 remaining).
## Setting 83 low elements to zero.
## transform_counts: Found 83 values equal to 0, adding 1 to the matrix.
t_neutrophil_v3_nb_pca <- plot_pca(t_neutrophil_v3_nb, plot_labels = FALSE)
dev <- pp(file="images/neutrophils_v3_cf_norm_sva_pca.png")
t_neutrophil_v3_nb_pca$plot
closed <- dev.off()
t_neutrophil_v3_nb_pca$plot
t_eosinophil_v1 <- subset_expt(t_eosinophils, subset = "visitnumber=='1'")
## subset_expt(): There were 26, now there are 8 samples.
t_eosinophil_v1_norm <- normalize_expt(t_eosinophil_v1, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)
## Removing 9946 low-count genes (9977 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
t_eosinophil_v1_pca <- plot_pca(t_eosinophil_v1_norm, plot_labels = FALSE)
dev <- pp(file="images/eosinophils_v1_cf_norm_pca.png")
t_eosinophil_v1_pca$plot
closed <- dev.off()
t_eosinophil_v1_pca$plot
t_eosinophil_v1_nb <- normalize_expt(t_eosinophil_v1, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")
## Removing 9946 low-count genes (9977 remaining).
## Setting 57 low elements to zero.
## transform_counts: Found 57 values equal to 0, adding 1 to the matrix.
t_eosinophil_v1_nb_pca <- plot_pca(t_eosinophil_v1_nb, plot_labels = FALSE)
dev <- pp(file="images/eosinophils_v1_cf_norm_sva_pca.png")
t_eosinophil_v1_nb_pca$plot
closed <- dev.off()
t_eosinophil_v1_nb_pca$plot
t_eosinophil_v2 <- subset_expt(t_eosinophils, subset = "visitnumber=='2'")
## subset_expt(): There were 26, now there are 9 samples.
t_eosinophil_v2_norm <- normalize_expt(t_eosinophil_v2, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)
## Removing 9808 low-count genes (10115 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
t_eosinophil_v2_pca <- plot_pca(t_eosinophil_v2_norm, plot_labels = FALSE)
dev <- pp(file="images/eosinophils_v2_cf_norm_pca.png")
t_eosinophil_v2_pca$plot
closed <- dev.off()
t_eosinophil_v2_pca$plot
t_eosinophil_v2_nb <- normalize_expt(t_eosinophil_v2, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")
## Removing 9808 low-count genes (10115 remaining).
## Setting 90 low elements to zero.
## transform_counts: Found 90 values equal to 0, adding 1 to the matrix.
t_eosinophil_v2_nb_pca <- plot_pca(t_eosinophil_v2_nb, plot_labels = FALSE)
dev <- pp(file="images/eosinophils_v2_cf_norm_sva_pca.png")
t_eosinophil_v2_nb_pca$plot
closed <- dev.off()
t_eosinophil_v2_nb_pca$plot
t_eosinophil_v3 <- subset_expt(t_eosinophils, subset = "visitnumber=='3'")
## subset_expt(): There were 26, now there are 9 samples.
t_eosinophil_v3_norm <- normalize_expt(t_eosinophil_v3, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)
## Removing 9845 low-count genes (10078 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
t_eosinophil_v3_pca <- plot_pca(t_eosinophil_v3_norm, plot_labels = FALSE)
dev <- pp(file="images/eosinophils_v3_cf_norm_pca.png")
t_eosinophil_v3_pca$plot
closed <- dev.off()
t_eosinophil_v3_pca$plot
t_eosinophil_v3_nb <- normalize_expt(t_eosinophil_v3, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")
## Removing 9845 low-count genes (10078 remaining).
## Setting 48 low elements to zero.
## transform_counts: Found 48 values equal to 0, adding 1 to the matrix.
t_eosinophil_v3_nb_pca <- plot_pca(t_eosinophil_v3_nb, plot_labels = FALSE)
dev <- pp(file="images/eosinophils_v3_cf_norm_sva_pca.png")
t_eosinophil_v3_nb_pca$plot
closed <- dev.off()
t_eosinophil_v3_nb_pca$plot
In the following block, the experimental condition is reset to the concatenation of clinical outcome and cell type. There are too few biopsy samples for them to be useful in this visualization, so they are excluded.
desired_levels <- c("cure_biopsy", "failure_biopsy", "cure_eosinophils", "failure_eosinophils",
"cure_monocytes", "failure_monocytes", "cure_neutrophils", "failure_neutrophils")
new_fact <- factor(
paste0(pData(t_clinical)[["condition"]], "_",
pData(t_clinical)[["batch"]]),
levels=desired_levels)
t_clinical_concat <- set_expt_conditions(t_clinical, fact = new_fact) %>%
set_expt_batches(fact = "visitnumber") %>%
set_expt_colors(color_choices[["cf_type"]]) %>%
subset_expt(subset="typeofcells!='biopsy'")
## subset_expt(): There were 123, now there are 109 samples.
## Try to ensure that the levels stay in the order I want
meta <- pData(t_clinical_concat) %>%
mutate(condition = fct_relevel(condition, desired_levels))
## Warning: Unknown levels in `f`: cure_biopsy, failure_biopsy
pData(t_clinical_concat) <- meta
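The 'Unknown levels' warning above is presumably a side effect of removing the biopsy samples before releveling; I have not confirmed that against subset_expt() itself. The following is a minimal base R sketch of the same concatenate-then-subset idea on made-up metadata. The toy_ names are hypothetical and the real values come from pData(t_clinical).

```r
## Hypothetical toy metadata standing in for pData(t_clinical).
toy_meta <- data.frame(
  condition = c("cure", "failure", "cure", "failure", "cure"),
  batch = c("biopsy", "monocytes", "neutrophils", "eosinophils", "monocytes"))
desired_levels <- c("cure_biopsy", "failure_biopsy", "cure_eosinophils", "failure_eosinophils",
                    "cure_monocytes", "failure_monocytes", "cure_neutrophils", "failure_neutrophils")
## Concatenate outcome and cell type, keeping the preferred level order.
toy_fact <- factor(paste0(toy_meta[["condition"]], "_", toy_meta[["batch"]]),
                   levels = desired_levels)
## Remove the biopsy samples and drop the levels which are no longer used.
kept <- toy_meta[["batch"]] != "biopsy"
toy_kept <- droplevels(toy_fact[kept])
levels(toy_kept)
## Restricting the desired order to the levels still present means a later
## relevel never names a level which does not exist.
present_levels <- intersect(desired_levels, levels(toy_kept))
toy_kept <- factor(toy_kept, levels = present_levels)
```

If that guess is right, passing only the levels that are still present to fct_relevel() would keep the order without the warning.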
The following block is pretty wild to my eyes; it seems to me that the variances introduced by cell type basically wipe out the apparent differences between cure/fail that we were able to see previously.
I suppose this is not entirely surprising, but when we had the Cali samples it at least looked like there were differences explicitly between cure and fail across cell types. I suppose this means those differences were actually coming from the imbalance of cure/fail samples between the two clinics.
t_clinical_concat_norm <- normalize_expt(t_clinical_concat, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
## Removing 8016 low-count genes (11907 remaining).
## transform_counts: Found 93 values equal to 0, adding 1 to the matrix.
t_clinical_concat_norm_pca <- plot_pca(t_clinical_concat_norm)
## plot labels was not set and there are more than 100 samples, disabling it.
dev <- pp(file=glue("images/clinical_concatenated_normalized_pca-v{ver}.png"), height=6, width=10)
t_clinical_concat_norm_pca$plot
closed <- dev.off()
t_clinical_concat_norm_pca$plot
t_clinical_concat_nb <- normalize_expt(t_clinical_concat, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)
## Removing 8016 low-count genes (11907 remaining).
## Setting 9896 low elements to zero.
## transform_counts: Found 9896 values equal to 0, adding 1 to the matrix.
t_clinical_concat_nb_pca <- plot_pca(t_clinical_concat_nb)
## plot labels was not set and there are more than 100 samples, disabling it.
dev <- pp(file=glue("images/clinical_concatenated_svaseqbatch_pca-v{ver}.png"), height=6, width=12)
t_clinical_concat_nb_pca$plot
closed <- dev.off()
t_clinical_concat_nb_pca$plot
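To make the point about cell type swamping cure/fail a little more concrete, here is a toy sketch on simulated data. It assumes a log2-scale matrix with a large cell type effect and a small outcome effect, and uses base R prcomp() instead of plot_pca(). None of it is TMRC3 data; the effect sizes and the toy_ names are invented for illustration.

```r
## Toy illustration only: simulated log2-scale expression, not TMRC3 data.
set.seed(1)
n_genes <- 200
toy_meta <- data.frame(
  celltype = rep(c("monocytes", "neutrophils"), each = 6),
  outcome = rep(c("cure", "failure"), times = 6))
toy_mtrx <- matrix(rnorm(n_genes * nrow(toy_meta), mean = 5, sd = 0.2),
                   nrow = n_genes)
## A large cell type effect on half of the genes.
neut_idx <- toy_meta[["celltype"]] == "neutrophils"
toy_mtrx[1:100, neut_idx] <- toy_mtrx[1:100, neut_idx] + 3
## A much smaller outcome effect on ten other genes.
fail_idx <- toy_meta[["outcome"]] == "failure"
toy_mtrx[101:110, fail_idx] <- toy_mtrx[101:110, fail_idx] + 0.5
## PCA with samples as rows.
toy_pca <- prcomp(t(toy_mtrx), center = TRUE, scale. = FALSE)
toy_pct <- toy_pca[["sdev"]]^2 / sum(toy_pca[["sdev"]]^2)
## Proportion of variance on the first three components.
round(toy_pct[1:3], 3)
## The PC1 scores, split by cell type.
split(round(toy_pca[["x"]][, 1], 1), toy_meta[["celltype"]])
```

In this simulation the first component is taken up almost entirely by cell type, which is the same shape of result as the concatenated PCA above.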
Let us shift the focus from cell type and/or Cure/Fail to the visit number, and so to changes observed over time. As you are likely aware, the three visits are significantly spread apart according to the clinical treatment of each patient. Thus we will now separate the samples by visit in order to more easily see what new patterns emerge.
t_visit_expt <- set_expt_conditions(t_clinical, fact = "visitnumber") %>%
set_expt_batches(fact = "finaloutcome") %>%
subset_expt(subset="typeofcells!='biopsy'")
## subset_expt(): There were 123, now there are 109 samples.
t_visit_norm <- normalize_expt(t_visit_expt, transform="log2", convert="cpm",
norm="quant", filter=TRUE)
## Removing 8016 low-count genes (11907 remaining).
## transform_counts: Found 93 values equal to 0, adding 1 to the matrix.
plot_pca(t_visit_norm)$plot
## plot labels was not set and there are more than 100 samples, disabling it.
t_visit_nb <- normalize_expt(t_visit_expt, transform = "log2", convert="cpm",
filter = TRUE, batch = "svaseq")
## Removing 8016 low-count genes (11907 remaining).
## Setting 9614 low elements to zero.
## transform_counts: Found 9614 values equal to 0, adding 1 to the matrix.
t_visit_nb_pca <- plot_pca(t_visit_nb)
## plot labels was not set and there are more than 100 samples, disabling it.
dev <- pp(file=glue("images/visit_svaseqbatch_pca-v{ver}.png"), height=7, width=9)
t_visit_nb_pca$plot
closed <- dev.off()
t_visit_nb_pca$plot
When looking at all cell types, it is quite difficult to see differences among the three visits.
When we had both Cali and Tumaco samples, it looked like there was variance suggesting differences between cure and fail for visit 1. I think the following block will suggest pretty strongly that this was not true.
tv1_norm <- normalize_expt(tv1_samples, transform="log2", convert="cpm",
norm="quant", filter=TRUE)
## Removing 5907 low-count genes (14016 remaining).
## transform_counts: Found 272 values equal to 0, adding 1 to the matrix.
plot_pca(tv1_norm)$plot
## Warning: ggrepel: 38 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
tv1_nb <- normalize_expt(tv1_samples, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq")
## Removing 5907 low-count genes (14016 remaining).
## Setting 7615 low elements to zero.
## transform_counts: Found 7615 values equal to 0, adding 1 to the matrix.
plot_pca(tv1_nb, plot_labels = FALSE)$plot
tv2_clinical <- subset_expt(tv2_samples, subset="visitnumber=='2'") %>%
set_expt_conditions(fact = "finaloutcome") %>%
set_expt_batches(fact = "typeofcells")
## subset_expt(): There were 35, now there are 35 samples.
tv2_nb <- normalize_expt(tv2_clinical, transform = "log2", convert = "cpm", norm = "quant",
filter = TRUE, batch = "svaseq")
## Warning in normalize_expt(tv2_clinical, transform = "log2", convert = "cpm", :
## Quantile normalization and sva do not always play well together.
## Removing 8364 low-count genes (11559 remaining).
## Setting 1786 low elements to zero.
## transform_counts: Found 1786 values equal to 0, adding 1 to the matrix.
plot_pca(tv2_nb, plot_labels = FALSE)$plot
tv3_clinical <- subset_expt(tv3_samples, subset="visitnumber=='3'") %>%
set_expt_conditions(fact = "finaloutcome") %>%
set_expt_batches(fact = "typeofcells")
## subset_expt(): There were 34, now there are 34 samples.
tv3_nb <- normalize_expt(tv3_clinical, transform = "log2", convert = "cpm", norm = "quant",
filter = TRUE, batch = "svaseq")
## Warning in normalize_expt(tv3_clinical, transform = "log2", convert = "cpm", :
## Quantile normalization and sva do not always play well together.
## Removing 8474 low-count genes (11449 remaining).
## Setting 1481 low elements to zero.
## transform_counts: Found 1481 values equal to 0, adding 1 to the matrix.
plot_pca(tv3_nb, plot_labels = FALSE)$plot
Separate the samples by cell type in order to more easily observe patterns with respect to visit and clinical outcome.
t_visitcf_monocyte_norm <- normalize_expt(t_visitcf_monocyte, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)
## Removing 9064 low-count genes (10859 remaining).
## transform_counts: Found 5 values equal to 0, adding 1 to the matrix.
t_visitcf_monocyte_pca <- plot_pca(t_visitcf_monocyte_norm, plot_labels = FALSE)
dev <- pp(file="images/visit_monocytes_cf_norm_pca.png")
t_visitcf_monocyte_pca$plot
closed <- dev.off()
t_visitcf_monocyte_pca$plot
t_visitcf_monocyte_disheat <- plot_disheat(t_visitcf_monocyte_norm)
dev <- pp(file="images/visit_monocytes_cf_norm_disheat.png")
t_visitcf_monocyte_disheat$plot
closed <- dev.off()
t_visitcf_monocyte_disheat$plot
t_visitcf_monocyte_nb <- normalize_expt(t_visitcf_monocyte, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")
## Removing 9064 low-count genes (10859 remaining).
## Setting 688 low elements to zero.
## transform_counts: Found 688 values equal to 0, adding 1 to the matrix.
t_visitcf_monocyte_nb_pca <- plot_pca(t_visitcf_monocyte_nb, plot_labels = FALSE)
dev <- pp(file="images/visit_monocytes_cf_norm_sva_pca.png")
t_visitcf_monocyte_nb_pca$plot
closed <- dev.off()
t_visitcf_monocyte_nb_pca$plot
See if there are any patterns which look usable.
## All
t_persistence_norm <- normalize_expt(t_persistence, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
## Removing 8537 low-count genes (11386 remaining).
## transform_counts: Found 15 values equal to 0, adding 1 to the matrix.
plot_pca(t_persistence_norm)$plot
## Warning: ggrepel: 6 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
t_persistence_nb <- normalize_expt(t_persistence, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)
## Removing 8537 low-count genes (11386 remaining).
## Setting 1538 low elements to zero.
## transform_counts: Found 1538 values equal to 0, adding 1 to the matrix.
plot_pca(t_persistence_nb)$plot
## Biopsies
##persistence_biopsy_norm <- normalize_expt(persistence_biopsy, transform = "log2", convert = "cpm",
## norm = "quant", filter = TRUE)
##plot_pca(persistence_biopsy_norm)$plot
## Insufficient data
## Monocytes
t_persistence_monocyte_norm <- normalize_expt(t_persistence_monocyte, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
## Removing 9597 low-count genes (10326 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
plot_pca(t_persistence_monocyte_norm)$plot
t_persistence_monocyte_nb <- normalize_expt(t_persistence_monocyte, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)
## Removing 9597 low-count genes (10326 remaining).
## Setting 46 low elements to zero.
## transform_counts: Found 46 values equal to 0, adding 1 to the matrix.
plot_pca(t_persistence_monocyte_nb)$plot
## Neutrophils
t_persistence_neutrophil_norm <- normalize_expt(t_persistence_neutrophil, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
## Removing 11531 low-count genes (8392 remaining).
## transform_counts: Found 2 values equal to 0, adding 1 to the matrix.
plot_pca(t_persistence_neutrophil_norm)$plot
t_persistence_neutrophil_nb <- normalize_expt(t_persistence_neutrophil, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)
## Removing 11531 low-count genes (8392 remaining).
## Setting 46 low elements to zero.
## transform_counts: Found 46 values equal to 0, adding 1 to the matrix.
plot_pca(t_persistence_neutrophil_nb)$plot
## Eosinophils
t_persistence_eosinophil_norm <- normalize_expt(t_persistence_eosinophil, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
## Removing 9895 low-count genes (10028 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
plot_pca(t_persistence_eosinophil_norm)$plot
t_persistence_eosinophil_nb <- normalize_expt(t_persistence_eosinophil, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)
## Removing 9895 low-count genes (10028 remaining).
## Setting 25 low elements to zero.
## transform_counts: Found 25 values equal to 0, adding 1 to the matrix.
plot_pca(t_persistence_eosinophil_nb)$plot
I wrote out all the z2.2- and z2.3-specific variants to a couple of files. I want to see if I can classify a human sample as infected with 2.2 or 2.3.
z22 <- read.csv("csv/variants_22.csv")
z23 <- read.csv("csv/variants_23.csv")
cure <- read.csv("csv/cure_variants.txt")
fail <- read.csv("csv/fail_variants.txt")
z22_vec <- gsub(pattern="\\-", replacement="_", x=z22[["x"]])
z23_vec <- gsub(pattern="\\-", replacement="_", x=z23[["x"]])
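## gsub() needs a character vector rather than a data frame; this assumes the
## variant identifiers are in the first column of each file.
cure_vec <- gsub(pattern="\\-", replacement="_", x=cure[[1]])
fail_vec <- gsub(pattern="\\-", replacement="_", x=fail[[1]])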
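## Report how many of the observed variant positions in one sample are z2.2- or
## z2.3-specific, each expressed relative to the size of that variant set.
classify_zymo <- function(sample) {
arbitrary_tags <- sm(readr::read_tsv(sample))
arbitrary_ids <- arbitrary_tags[["position"]]
message("Length: ", length(arbitrary_ids), ", z22: ",
sum(arbitrary_ids %in% z22_vec) / (length(z22_vec)), ", z23: ",
sum(arbitrary_ids %in% z23_vec) / (length(z23_vec)))
}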
arbitrary_sample <- "preprocessing/TMRC30156/outputs/40freebayes_lpanamensis_v36/all_tags.txt.xz"
classify_zymo(arbitrary_sample)
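The function above only prints the two fractions. A small sketch of how one might return them and pick the larger follows. It uses invented variant identifiers; zymo_fractions() and the toy_ vectors are hypothetical names and are not part of hpgltools or of this analysis.

```r
## Hypothetical helper: the fraction of each strain-specific variant set
## which is observed in a sample, along with the larger of the two.
zymo_fractions <- function(observed, z22_ref, z23_ref) {
  fractions <- c(
    z22 = sum(z22_ref %in% observed) / length(z22_ref),
    z23 = sum(z23_ref %in% observed) / length(z23_ref))
  list(fractions = fractions, call = names(which.max(fractions)))
}
## Made-up variant identifiers, for illustration only.
toy_z22 <- c("chr1_100_A_G", "chr1_250_C_T", "chr2_40_G_A", "chr3_77_T_C")
toy_z23 <- c("chr1_130_A_C", "chr2_90_G_T", "chr3_15_C_A")
toy_observed <- c("chr1_100_A_G", "chr1_250_C_T", "chr2_40_G_A",
                  "chr2_90_G_T", "chr9_5_A_T")
toy_result <- zymo_fractions(toy_observed, toy_z22, toy_z23)
toy_result
```

Counting reference variants found in the sample, rather than sample positions found in the reference, keeps each fraction between 0 and 1 even if a position is listed more than once. Note that which.max() returns the first entry on a tie, so a real classifier would probably want a minimum margin before making a call.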
if (!isTRUE(get0("skip_load"))) {
pander::pander(sessionInfo())
message(paste0("This is hpgltools commit: ", get_git_commit()))
message(paste0("Saving to ", savefile))
tmp <- sm(saveme(filename=savefile))
}
tmp <- loadme(filename=savefile)