The set of analyses performed in tmrc3 sample estimation has become unorganized and difficult to follow. This is intended to start at the level of broad organization before stepping into the analyses.
As of 20220224, this document is attempting to synchronize with the google doc named ‘TMRC3_Aug18_2021’. I am hoping therefore to have blocks named according to the figures/etc in that document.
These samples are from patients who either successfully cleared a Leishmania panamensis infection following treatment, or did not. They include biopsies from each patient along with purifications for Monocytes, Neutrophils, and Eosinophils. When possible, this process was repeated over three visits; but some patients did not return for the second or third visit.
The over-arching goal is to look for attributes(most likely genes) which distinguish patients who do and do not cure the infection after treatment. If possible, these will be apparent on the first visit.
The metadata factors (in no particular order) of the experiment are:
Metadata was also collected for the patients and there are many factors which have potential to affect the outcome. These analyses will generally remain agnostic for these factors, which include (among other things):
The samples used in these analyses were all collected, purified, and libraries generated by the scientists/doctors at CIDEIM. The sequencing libraries were generated via the TruSeq non-stranded library kit and sequenced either at JHU or UMD; earlier samples were single-ended, but most were paired.
All samples were trimmed with trimomatic using the same set of parameters. All mapping was performed with hisat2 version 2.2.1. Quantifications were also performed with salmon version 1.2.0. The reference genome used was hg38 100 (released 202004). When samples were mapped against L.panamensis, the TriTrypDB version 36 reference was used. The set of annotations were therefore limited to ensembl’s 2020 release. The early samples were actually first mapped with hg38 91 and later redone.
This document will limit itself to a set of canonical RNAseq analyses. It must therefore create files containing the raw data to facilitate sharing the data. It will plot metrics of the data to demonstrate the sequencing quality and clustering of the samples under the various conditions examined and normalizations employed. It will perform differential expression analyses for the metadata factors of interest alongside likelihood ratio tests for factors like celltype and time. Given the sets of over/under expressed genes observed in the various DE methods, it will perform the likely gene set tests for over represented gene ontology groups, reactome, etc. Simultaneously, the raw data will be passed to gene set variance analyses to see if there are groups of genes overrepresented in other experiments.
There are two metadata sources:
samplesheet <- "sample_sheets/tmrc3_samples_202203.xlsx"
crf_metadata <- "sample_sheets/20210825_EXP_ESPECIAL_TMRC3_VERSION_2.xlsx"
The primary annotation sources are:
Both provide GO data. They also provide helpful links to other data sources. For the moment, we are focusing on the human annotations.
These analyses have focused on gene-level abundances/differences. Thus, when htseq-count was invoked against the hisat2-based mappings, parameters were chosen to count genes rather than transcripts. Similarly, when salmon counts were used via tximport, a mapping of genes to transcripts was used to collapse the matrix to gene-level abundances. This decision may be revisited.
hs_annot <- load_biomart_annotations(year="2020", month="jan")
## The biomart annotations file already exists, loading from it.
hs_annot <- hs_annot[["annotation"]]
hs_annot[["transcript"]] <- paste0(rownames(hs_annot), ".", hs_annot[["version"]])
rownames(hs_annot) <- make.names(hs_annot[["ensembl_gene_id"]], unique=TRUE)
tx_gene_map <- hs_annot[, c("transcript", "ensembl_gene_id")]
summary(hs_annot)
## ensembl_transcript_id ensembl_gene_id version transcript_version
## Length:227784 Length:227784 Min. : 1.0 Min. : 1.00
## Class :character Class :character 1st Qu.: 6.0 1st Qu.: 1.00
## Mode :character Mode :character Median :12.0 Median : 1.00
## Mean :10.7 Mean : 3.08
## 3rd Qu.:15.0 3rd Qu.: 5.00
## Max. :28.0 Max. :17.00
##
## hgnc_symbol description gene_biotype cds_length
## Length:227784 Length:227784 Length:227784 Min. : 3
## Class :character Class :character Class :character 1st Qu.: 357
## Mode :character Mode :character Mode :character Median : 694
## Mean : 1140
## 3rd Qu.: 1446
## Max. :107976
## NA's :127222
## chromosome_name strand start_position end_position
## Length:227784 Length:227784 Min. :5.77e+02 Min. :6.47e+02
## Class :character Class :character 1st Qu.:3.11e+07 1st Qu.:3.12e+07
## Mode :character Mode :character Median :6.04e+07 Median :6.06e+07
## Mean :7.41e+07 Mean :7.42e+07
## 3rd Qu.:1.09e+08 3rd Qu.:1.09e+08
## Max. :2.49e+08 Max. :2.49e+08
##
## transcript
## Length:227784
## Class :character
## Mode :character
##
##
##
##
The set of GO categories has not been limited to the 2020 data at the time of this writing.
hs_go <- load_biomart_go()[["go"]]
## The biomart annotations file already exists, loading from it.
hs_length <- hs_annot[, c("ensembl_gene_id", "cds_length")]
colnames(hs_length) <- c("ID", "length")
Before we do any of the following subsets/analyses of the data, we need to collect it all in one place. Let’s do that here and name it ‘hs_valid’ meaning it is the set of valid data for Homo sapiens.
sanitize_columns <- c("visitnumber", "clinicaloutcome", "donor",
"typeofcells", "clinicalpresentation", "drug",
"condition", "batch")
hs_expt <- create_expt(samplesheet,
file_column="hg38100hisatfile",
savefile=glue::glue("rda/hs_expt_all-v{ver}.rda"),
gene_info=hs_annot) %>%
exclude_genes_expt(column="gene_biotype", method="keep",
pattern="protein_coding", meta_column="ncrna_lost") %>%
sanitize_expt_metadata(columns=sanitize_columns) %>%
set_expt_factors(columns=sanitize_columns, class="factor") %>%
set_expt_conditions(fact="clinicaloutcome") %>%
set_expt_batches(fact="visitnumber")
## Reading the sample metadata.
## Dropped 69 rows from the sample metadata because the sample ID is blank.
## The sample definitions comprises: 254 rows(samples) and 84 columns(metadata fields).
## Warning in create_expt(samplesheet, file_column = "hg38100hisatfile", savefile =
## glue::glue("rda/hs_expt_all-v{ver}.rda"), : Some samples were removed when cross
## referencing the samples against the count data.
## Matched 21447 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## The final expressionset has 21481 features and 246 samples.
## remove_genes_expt(), before removal, there were 21481 genes, now there are 19923.
## There are 8 samples which kept less than 90 percent counts.
## TMRC30044 TMRC30045 TMRC30154 TMRC30017 TMRC30019 TMRC30241 TMRC30015 TMRC30269
## 80.29 73.28 83.16 85.66 89.69 89.37 79.19 89.18
## The following should make visit 1 the largest if one uses that column as a size factor when plotting.
meta <- pData(hs_expt) %>%
mutate(visitnumber = fct_relevel(visitnumber, c("notapplicable", "3", "2", "1")))
pData(hs_expt) <- meta
The above block does the following:
Let us also merge in the clinician’s metadata. I worry a little that this might not be allowed for dbGap data, but if it is a problem I suspect we can just remove the bad columns from it. Also note that I rarely use the join function, but it is somewhat required here because I do not want to risk shuffling the metadata when I add the new metadata, which comes from a spreadsheet sorted by patient, not sample. In doing this I therefore just created a new column ‘join’ which contains the shared information, e.g. the patient ID from the existing metadata and the same ID from the CRF file which has been coerced into lowercase.
hs_pd <- pData(hs_expt)
start <- rownames(hs_pd)
hs_crf <- openxlsx::read.xlsx(crf_metadata)
hs_crf[["join"]] <- tolower(hs_crf[["codigo_paciente"]])
hs_pd[["join"]] <- hs_pd[["tubelabelorigin"]]
test <- plyr::join(hs_pd, hs_crf, by="join")
test[["join"]] <- NULL
rownames(test) <- rownames(hs_pd)
na_idx <- is.na(test)
test[na_idx] <- "undefined"
pData(hs_expt) <- test
The above block does the following:
In our shared online sample sheet there is a clinical outcome column which should match the final CRF column ‘ef_lc_estado_final_estudio’ with the caveat that the CRF data is numeric:
0: curacion definitiva 1: fall terapeutica 2: perdida druante el seguimiento 3: excludio durante el estudio
two_columns <- pData(hs_expt)[, c("clinicaloutcome", "ef_lc_estado_final_estudio")]
undef_idx <- two_columns[[2]] == "undefined"
two_columns <- two_columns[!undef_idx, ]
two_columns[["rewritten"]] <- "undef"
cure_idx <- two_columns[[2]] == 0
two_columns[cure_idx, "rewritten"] <- "cure"
fail_idx <- two_columns[[2]] == 1
two_columns[fail_idx, "rewritten"] <- "failure"
lost_idx <- two_columns[[2]] == 2
two_columns[lost_idx, "rewritten"] <- "lost"
same_idx <- two_columns[["clinicaloutcome"]] == two_columns[["rewritten"]]
broken <- two_columns[!same_idx, ]\
broken
## Error: <text>:12:35: unexpected '\\'
## 11: same_idx <- two_columns[["clinicaloutcome"]] == two_columns[["rewritten"]]
## 12: broken <- two_columns[!same_idx, ]\
## ^
dim(pData(hs_expt))
## [1] 246 144
table(pData(hs_expt)$drug)
##
## antimony miltefosine none
## 216 8 22
table(pData(hs_expt)$clinic)
##
## Cali Tumaco undefined
## 75 143 28
table(pData(hs_expt)$typeofcells)
##
## biopsy eosinophils macrophages monocytes neutrophils pbmcs
## 21 45 28 74 72 6
table(pData(hs_expt)$visit)
##
## notapplicable 3 2 1
## 28 54 56 108
summary(as.numeric(pData(hs_expt)$eb_lc_tiempo_evolucion))
## Warning in summary(as.numeric(pData(hs_expt)$eb_lc_tiempo_evolucion)): NAs
## introduced by coercion
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 2.00 4.00 6.00 7.37 8.00 21.00 60
summary(as.numeric(pData(hs_expt)$eb_lc_tto_mcto_glucan_dosis))
## Warning in summary(as.numeric(pData(hs_expt)$eb_lc_tto_mcto_glucan_dosis)): NAs
## introduced by coercion
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 13.0 14.0 19.0 59.5 20.0 999.0 60
summary(as.numeric(pData(hs_expt)$v3_lc_ejey_lesion_mm_1))
## Warning in summary(as.numeric(pData(hs_expt)$v3_lc_ejey_lesion_mm_1)): NAs
## introduced by coercion
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 7.2 32.4 300.0 999.0 999.0 60
summary(as.numeric(pData(hs_expt)$v3_lc_lesion_area_1))
## Warning in summary(as.numeric(pData(hs_expt)$v3_lc_lesion_area_1)): NAs
## introduced by coercion
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0 223 999 2133 3016 11487 60
summary(as.numeric(pData(hs_expt)$v3_lc_ejex_ulcera_mm_1))
## Warning in summary(as.numeric(pData(hs_expt)$v3_lc_ejex_ulcera_mm_1)): NAs
## introduced by coercion
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 0.0 12.5 292.8 999.0 999.0 60
table(pData(hs_expt)$eb_lc_sexo)
##
## 1 2 undefined
## 155 31 60
table(pData(hs_expt)$eb_lc_etnia)
##
## 1 2 3 undefined
## 99 53 34 60
summary(as.numeric(pData(hs_expt)$edad))
## Warning in summary(as.numeric(pData(hs_expt)$edad)): NAs introduced by coercion
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 18.0 24.0 28.0 29.9 36.0 51.0 60
table(pData(hs_expt)$eb_lc_peso)
##
## 100.8 53.9 55.9 57.9 58.1 58.3 58.6 59
## 2 9 2 2 7 10 11 14
## 59.6 62 63 67 69.4 74.7 75.6 76.5
## 1 8 6 6 10 3 3 3
## 77 78 79.2 82 83.3 83.4 86.4 87
## 18 10 10 9 4 10 9 3
## 89 93.3 undefined
## 9 7 60
table(pData(hs_expt)$eb_lc_estatura)
##
## 145 152 154 158 159 160 162 163
## 6 1 10 17 2 6 10 9
## 164 165 166 167 172 173 174 175
## 15 12 19 3 10 4 30 2
## 176 177 182 183 undefined
## 1 10 9 10 60
table(pData(hs_expt)$clinic)
##
## Cali Tumaco undefined
## 75 143 28
table(pData(hs_expt)$ef_lc_estado_final_estudio)
##
## 0 1 2 undefined
## 75 69 16 86
table(pData(hs_expt)$clinicaloutcome)
##
## cure failure lost notapplicable
## 104 60 18 64
There are lots of ways which we will categorize the data, here are some potential color choices for them.
cf_colors <- list(
"cure" = "#998EC3",
"failure" = "#F1A340")
type_visit_colors <- list(
"monocytes_v1" = "#DD1C77",
"monocytes_v2" = "#C994C7",
"monocytes_v3" = "#E7E1EF",
"eosinophils_v1" = "#31A354",
"eosinophils_v2" = "#ADDD8E",
"eosinophils_v3" = "#F7FCD9",
"neutrophils_v1" = "#3182BD",
"neutrophils_v2" = "#9ECAE1",
"neutrophils_v3" = "#DEEBF7",
"biopsy_v1" = "#D95F0E")
type_colors <- list(
"monocytes" = "#DD1C77",
"eosinophils" = "#31A354",
"neutrophils" = "#3182BD",
"biopsy" = "#D95F0E")
visit_colors <- list()
cf_type_colors <- list(
cure_biopsy = "#D95F0E",
failure_biopsy = "#FEC44F",
cure_monocytes = "#DD1C77",
failure_monocytes = "#C994C7",
cure_eosinophils = "#31A354",
failure_eosinophils = "#ADDD8E",
cure_neutrophils = "#3182BD",
failure_neutrophils = "#9ECAE1")
Note from Maria Adelaida: Some chemokines are suggestive of Eosinophil recruitment.
The following block contains the primary dataset which is the parent of everything which follows.
There exists a baseline coverage below which we do not wish to fall. One likely way to approach it heuristically is to assume we should observe some number of genes. With that in mind, I arbitrarily chose 11,000 non-zero genes as the minimum.
With this in mind, here is a non-zero plot before a cutoff of 11,000 genes.
all_nz <- plot_nonzero(hs_expt)
all_nz$plot
## Warning: ggrepel: 229 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
To my eyes, there are 3 or 4 samples which are likely candidates for removal.
hs_valid <- subset_expt(hs_expt, nonzero=11000) %>%
subset_expt(subset="clinicaloutcome!='lost'") %>%
subset_expt(subset="clinicaloutcome!='notapplicable'") %>%
subset_expt(subset="clinicaloutcome!='null'") %>%
set_expt_colors(cf_colors)
## The samples (and read coverage) removed when filtering 11000 non-zero genes are:
## TMRC30050 TMRC30052 TMRC30010
## 807571 3086349 52429
## subset_expt(): There were 246, now there are 243 samples.
## subset_expt(): There were 243, now there are 226 samples.
## subset_expt(): There were 226, now there are 162 samples.
## subset_expt(): There were 162, now there are 162 samples.
Here is the distribution of samples after dropping the less than 11,000 gene-samples.
nz_post <- plot_nonzero(hs_valid)
nz_post$plot
## Warning: ggrepel: 143 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
I recently convinced myself that there is a difference between the data when it does and does not include the miltefosine treated patients.
However, when I actually counted up the samples by a few criteria I immediately relized this is spurious.
table(pData(hs_valid)$drug)
##
## antimony miltefosine
## 159 3
table(pData(hs_valid)$clinic)
##
## Cali Tumaco
## 38 124
table(pData(hs_valid)$clinicaloutcome)
##
## cure failure
## 104 58
table(pData(hs_valid)$typeofcells)
##
## biopsy eosinophils monocytes neutrophils
## 17 34 56 55
table(pData(hs_valid)$visit)
##
## 3 2 1
## 44 45 73
summary(as.numeric(pData(hs_valid)$eb_lc_tiempo_evolucion))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 4.00 5.00 7.14 8.00 21.00
summary(as.numeric(pData(hs_valid)$eb_lc_tto_mcto_glucan_dosis))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13.0 15.0 19.0 35.7 20.0 999.0
summary(as.numeric(pData(hs_valid)$v3_lc_ejey_lesion_mm_1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 7.2 32.7 340.2 999.0 999.0
summary(as.numeric(pData(hs_valid)$v3_lc_lesion_area_1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 222 999 2000 2448 11487
summary(as.numeric(pData(hs_valid)$v3_lc_ejex_ulcera_mm_1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 35 336 999 999
table(pData(hs_valid)$eb_lc_sexo)
##
## 1 2
## 137 25
table(pData(hs_valid)$eb_lc_etnia)
##
## 1 2 3
## 91 37 34
summary(as.numeric(pData(hs_valid)$edad))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.0 25.0 28.0 29.9 35.0 51.0
table(pData(hs_valid)$eb_lc_peso)
##
## 100.8 53.9 57.9 58.1 58.3 58.6 59 59.6 62 63 67 69.4 75.6
## 2 9 2 7 10 1 8 1 6 6 6 10 3
## 76.5 77 78 79.2 82 83.3 83.4 86.4 87 89 93.3
## 3 18 10 10 9 4 10 9 3 8 7
table(pData(hs_valid)$eb_lc_estatura)
##
## 152 154 158 159 160 163 164 165 166 167 172 173 174 176 177 182 183
## 1 10 15 2 6 9 15 12 19 3 10 4 29 1 7 9 10
table(pData(hs_valid)$clinic)
##
## Cali Tumaco
## 38 124
The above block does what it says on the tin, subsets the data to exclude any sample with less than 11,000 observed genes.
One of the first global metrics I would like to provide is the set of library sizes. Unfortunately, we have too many samples to fit on a screen now. Therefore, I am going to do an early split of the data by cell type in the hopes that doing so will make it possible to visualize the library sizes/nonzero genes.
The initial factor for this is ‘typeofcells’.
table(pData(hs_valid)[["typeofcells"]])
##
## biopsy eosinophils monocytes neutrophils
## 17 34 56 55
biopsy_samples <- subset_expt(hs_valid, subset="typeofcells=='biopsy'")
## subset_expt(): There were 162, now there are 17 samples.
eosinophil_samples <- subset_expt(hs_valid, subset="typeofcells=='eosinophils'")
## subset_expt(): There were 162, now there are 34 samples.
monocyte_samples <- subset_expt(hs_valid, subset="typeofcells=='monocytes'")
## subset_expt(): There were 162, now there are 56 samples.
neutrophil_samples <- subset_expt(hs_valid, subset="typeofcells=='neutrophils'")
## subset_expt(): There were 162, now there are 55 samples.
## Currently, these are not used, but instead I pulled the samples from the hs_clinical
## which means the biopsies are not included.
v1_samples <- subset_expt(hs_valid, subset="visitnumber=='1'")
## subset_expt(): There were 162, now there are 73 samples.
v1_monocytes <- subset_expt(v1_samples, subset="typeofcells=='monocytes'")
## subset_expt(): There were 73, now there are 22 samples.
v1_neutrophils <- subset_expt(v1_samples, subset="typeofcells=='neutrophils'")
## subset_expt(): There were 73, now there are 22 samples.
v1_eosinophils <- subset_expt(v1_samples, subset="typeofcells=='eosinophils'")
## subset_expt(): There were 73, now there are 12 samples.
v2_samples <- subset_expt(hs_valid, subset="visitnumber=='2'")
## subset_expt(): There were 162, now there are 45 samples.
v2_monocytes <- subset_expt(v2_samples, subset="typeofcells=='monocytes'")
## subset_expt(): There were 45, now there are 17 samples.
v2_neutrophils <- subset_expt(v2_samples, subset="typeofcells=='neutrophils'")
## subset_expt(): There were 45, now there are 17 samples.
v2_eosinophils <- subset_expt(v2_samples, subset="typeofcells=='eosinophils'")
## subset_expt(): There were 45, now there are 11 samples.
v3_samples <- subset_expt(hs_valid, subset="visitnumber=='3'")
## subset_expt(): There were 162, now there are 44 samples.
v3_monocytes <- subset_expt(v3_samples, subset="typeofcells=='monocytes'")
## subset_expt(): There were 44, now there are 17 samples.
v3_neutrophils <- subset_expt(v3_samples, subset="typeofcells=='neutrophils'")
## subset_expt(): There were 44, now there are 16 samples.
v3_eosinophils <- subset_expt(v3_samples, subset="typeofcells=='eosinophils'")
## subset_expt(): There were 44, now there are 11 samples.
The above block does a lot of subset operations to create separate data structures on a per-celltype and per-visit basis. Ergo, our large data structure is now joined by ~21 new, smaller data structures.
Let us see if we can make an expressionset of the parasite reads in the TMRC3 samples and distinguish between the faux and real reads. E.g: Are there samples which definitively contain parasites?
lp_expt <- create_expt("sample_sheets/tmrc3_samples_20211207.xlsx",
file_column="lpanamensisv36hisatfile", gene_info = NULL) %>%
subset_expt(coverage=1000) %>%
set_expt_conditions(fact="typeofcells")
visit_fact <- pData(lp_expt)[["visitnumber"]]
batch_na <- is.na(visit_fact)
visit_fact[batch_na] <- "undefined"
lp_expt <- set_expt_batches(lp_expt, fact = visit_fact)
lp_norm <- normalize_expt(lp_expt, filter="simple", norm="quant",
convert="cpm", transform="log2")
plotted <- plot_pca(lp_norm, plot_labels=FALSE)
plotted$plot
plotted_3d <- plot_3d_pca(plotted)
The above block is similar in concept to the previous expressionset creation. It uses a different column and currently ignores the gene annotations. Given that many of the samples have essentially 0 reads, I set a cutoff of only 20 observed genes. Finally, I did a quick and dirty PCA plot of this peculiar data structure in the hopes of being able to ‘see’ the difference between what are assumed to be ‘real’ samples with a significant number of ‘real’ parasite reads vs. those samples which just have a couple of potentially spurious reads.
This might be a bit early to consider the contrasts, but I think we should consider this question immediately. The main thing we will be comparing is of course cure vs. fail; but we may also look at the visits and compare cell types.
cf_contrasts <- list(
"fail_vs_cure" = c("failure", "cure"))
visit_contrasts <- list(
"v2v1" = c("c2", "c1"),
"v3v1" = c("c3", "c1"),
"v3v2" = c("c3", "c2"))
type_contrasts <- list(
"mono_biopsy" = c("monocytes", "biopsy"),
"eosinophil_biopsy" = c("eosinophils", "biopsy"),
"neutrophil_biopsy" = c("neutrophils", "biopsy"))
visit_cf_contrasts <- list(
"v1fail_vs_cure" = c("v1failure", "v1cure"),
"v2fail_vs_cure" = c("v2failure", "v2cure"),
"v3fail_vs_cure" = c("v3failure", "v3cure"))
clinic_contrasts <- list(
"clinics" = c("Cali", "Tumaco"))
Sample IDs starting with 1: Cali IDs starting with 2: Tumaco
The sets of samples used to visualize the data will also comprise the sets used when later performing the various differential expression analyses.
Start out with some initial metrics of all samples. The most obvious are plots of the numbers of non-zero genes observed, heatmaps showing the relative relationships among the samples, the relative library sizes, and some PCA. It might be smart to split the library sizes up across subsets of the data, because they have expanded too far to see well on a computer screen.
The most likely factors to query when considering the entire dataset are cure/fail, visit, and cell type. This is the level at which we will choose samples to exclude from future analyses.
plot_legend(biopsy_samples)$plot
plot_libsize(biopsy_samples)$plot
plot_nonzero(biopsy_samples)$plot
biopsy_prepost <- plot_libsize_prepost(biopsy_samples)
biopsy_prepost$count_plot
biopsy_prepost$lowgene_plot
## Warning: Using alpha for a discrete variable is not advised.
## Minimum number of biopsy genes: ~ 14,000
plot_libsize(eosinophil_samples)$plot
plot_nonzero(eosinophil_samples)$plot
## Warning: ggrepel: 12 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
eosinophil_prepost <- plot_libsize_prepost(eosinophil_samples)
eosinophil_prepost$count_plot
eosinophil_prepost$lowgene_plot
## Warning: Using alpha for a discrete variable is not advised.
## Minimum number of eosinophil genes: ~ 13,500
plot_libsize(monocyte_samples)$plot
plot_nonzero(monocyte_samples)$plot
## Warning: ggrepel: 43 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
monocyte_prepost <- plot_libsize_prepost(monocyte_samples)
monocyte_prepost$count_plot
monocyte_prepost$lowgene_plot
## Warning: Using alpha for a discrete variable is not advised.
## Minimum number of monocyte genes: ~ 7,500 before setting the minimum.
plot_libsize(neutrophil_samples)$plot
plot_nonzero(neutrophil_samples)$plot
## Warning: ggrepel: 37 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
neutrophil_prepost <- plot_libsize_prepost(neutrophil_samples)
neutrophil_prepost$count_plot
neutrophil_prepost$lowgene_plot
## Warning: Using alpha for a discrete variable is not advised.
## Minimum number of neutrophil genes: ~ 10,000 before setting minimum coverage.
The above block just repeats the same two plots on a per-celltype basis: the number of reads observed / sample and a plot of observed genes with respect to coverage. I made some comments with my observations about the number of genes.
Now that those ‘global’ metrics are out of the way, lets look at some global metrics of the data following normalization; the most likely plots are of course PCA but also a couple of heatmaps.
In the google doc TMRC3_Aug18_2021, there is an example of an image for the first figure:
“Transcriptomic profiles of primary innate cells of CL patients show unique transcriptional signatures - Remove PBMCs and M0, maybe biopsies as well (but Remove WT samples)”
While we were talking in a meeting however, it sounded like there was some desire to keep all cell types. Therefore the following block will have one image with everything and one following the above.
type_valid <- set_expt_conditions(hs_valid, fact="typeofcells") %>%
set_expt_batches(fact="clinicaloutcome") %>%
set_expt_colors(type_colors)
all_norm <- sm(normalize_expt(type_valid, transform="log2", norm="quant",
convert="cpm", filter=TRUE))
all_pca <- plot_pca(all_norm, plot_labels=FALSE,
plot_title="PCA - Cell type", size_column="visitnumber")
dev <- pp(file=glue("images/tmrc3_pca_nolabels-v{ver}.png"))
all_pca$plot
closed <- dev.off()
all_pca$plot
all_pca_nosize <- plot_pca(all_norm, plot_labels=FALSE)
all_pca_nosize$plot
write.csv(all_pca$table, file="coords/hs_donor_pca_coords.csv")
all_cf_norm <- set_expt_batches(all_norm,
fact="visitnumber")
all_cf_corheat <- plot_corheat(all_cf_norm, plot_title="Heirarchical clustering:
cell types")
dev <- pp(file=glue("images/tmrc3_corheat_cf-v{ver}.png"))
all_cf_corheat$plot
closed <- dev.off()
all_cf_corheat$plot
all_cf_disheat <- plot_disheat(all_cf_norm, plot_title="Heirarchical clustering:
cell types")
dev <- pp(file=glue("images/tmrc3_disheat_cf-v{ver}.png"))
all_cf_disheat$plot
closed <- dev.off()
all_cf_disheat$plot
The biggest caveat for this is to ensure that there are no Wellcome Trust samples.
A potential figure legend for the following images might include:
The observed counts per gene for all of the clinical samples were filtered, log transformed, cpm converted, and quantile normalized. The colors were defined by cell types and shapes by patient visit. When the first two principle components were plotted, clustering was observed by cell type. The biopsy samples were significantly different from the innate immune cell types.
fig1v2_norm <- normalize_expt(type_valid, transform="log2",
convert="cpm", norm="quant", filter=TRUE)
## Removing 5663 low-count genes (14260 remaining).
## transform_counts: Found 506 values equal to 0, adding 1 to the matrix.
fig1v2_pca <- plot_pca(fig1v2_norm, cis=FALSE)
## plot labels was not set and there are more than 100 samples, disabling it.
dev <- pp(file=glue("images/tmrc3_fig1v2.png"))
fig1v2_pca$plot
closed <- dev.off()
fig1v2_pca$plot
fig1v3_samples <- subset_expt(type_valid, subset="condition!='biopsy'")
## subset_expt(): There were 162, now there are 145 samples.
fig1v3_norm <- normalize_expt(fig1v3_samples, transform="log2",
convert="cpm", norm="quant", filter=TRUE)
## Removing 7868 low-count genes (12055 remaining).
## transform_counts: Found 110 values equal to 0, adding 1 to the matrix.
fig1v3_pca <- plot_pca(fig1v3_norm, cis=FALSE)
## plot labels was not set and there are more than 100 samples, disabling it.
dev <- pp(file="images/tmrc3_fig1v3.png")
fig1v3_pca$plot
closed <- dev.off()
fig1v3_pca$plot
Continue looking, but switch the conditions/colors so that the clinical outcome becomes the focus and get rid of a few samples which are not actually a part of the TMRC3 focus (e.g. the PBMC and macrophage samples, which are all from the Wellcome Trust).
Included in this group will be the samples from patients who were lost.
In our 20220218 meeting, it was decided that we would focus explicitly on the cure and fail samples, ignoring lost/NA/null samples.
Thus I am adding explicit filters right at the top to exclude them.
hs_clinical <- hs_valid %>%
set_expt_conditions(fact="clinicaloutcome") %>%
set_expt_batches(fact="typeofcells") %>%
set_expt_colors(cf_colors)
hs_clinical_nobiop <- subset_expt(hs_clinical, subset="typeofcells!='biopsy'")
## subset_expt(): There were 162, now there are 145 samples.
clinic_biopsy <- hs_valid %>%
set_expt_conditions(fact="clinic") %>%
set_expt_batches(fact="clinicaloutcome") %>%
subset_expt(subset="typeofcells=='biopsy'")
## subset_expt(): There were 162, now there are 17 samples.
clinic_cf <- paste0(pData(clinic_biopsy)$condition, "_",
pData(clinic_biopsy)$batch)
table(clinic_cf)
## clinic_cf
## Cali_cure Tumaco_cure Tumaco_failure
## 3 9 5
clinic_biopsy <- set_expt_conditions(clinic_biopsy, fact=clinic_cf) %>%
set_expt_batches(fact="visitnumber")
clinic_biopsy_norm <- normalize_expt(clinic_biopsy, transform="log2",
convert="cpm", norm="quant", filter=TRUE)
## Removing 6321 low-count genes (13602 remaining).
## transform_counts: Found 169 values equal to 0, adding 1 to the matrix.
clinic_biopsy_pca <- plot_pca(clinic_biopsy_norm, plot_labels=FALSE)
dev <- pp(file="images/biopsy_place.png")
clinic_biopsy_pca$plot
closed <- dev.off()
clinic_biopsy_pca$plot
clinic_biopsy_nb <- normalize_expt(clinic_biopsy, transform="log2",
convert="cpm", batch="svaseq", filter=TRUE)
## Removing 6321 low-count genes (13602 remaining).
## Setting 275 low elements to zero.
## transform_counts: Found 275 values equal to 0, adding 1 to the matrix.
clinic_biopsy_nb_pca <- plot_pca(clinic_biopsy_nb, plot_labels=FALSE)
dev <- pp(file="images/biopsy_place_nb.png")
clinic_biopsy_nb_pca$plot
closed <- dev.off()
clinic_biopsy_nb_pca$plot
clinic_eosinophil <- hs_clinical %>%
set_expt_conditions(fact="clinic") %>%
set_expt_batches(fact="clinicaloutcome") %>%
subset_expt(subset="typeofcells=='eosinophils'")
## subset_expt(): There were 162, now there are 34 samples.
clinic_cf <- paste0(pData(clinic_eosinophil)$condition, "_",
pData(clinic_eosinophil)$batch)
table(clinic_cf)
## clinic_cf
## Cali_cure Tumaco_cure Tumaco_failure
## 7 18 9
clinic_eosinophil <- set_expt_conditions(clinic_eosinophil, fact=clinic_cf) %>%
set_expt_batches(fact="visitnumber")
clinic_eosinophil_norm <- normalize_expt(clinic_eosinophil, transform="log2",
convert="cpm", norm="quant", filter=TRUE)
## Removing 9238 low-count genes (10685 remaining).
## transform_counts: Found 4 values equal to 0, adding 1 to the matrix.
clinic_eosinophil_pca <- plot_pca(clinic_eosinophil_norm, plot_labels=FALSE)
dev <- pp(file="images/eosinophil_place.png")
clinic_eosinophil_pca$plot
closed <- dev.off()
clinic_eosinophil_pca$plot
clinic_eosinophil_nb <- normalize_expt(clinic_eosinophil, transform="log2",
convert="cpm", batch="svaseq", filter=TRUE)
## Removing 9238 low-count genes (10685 remaining).
## Setting 581 low elements to zero.
## transform_counts: Found 581 values equal to 0, adding 1 to the matrix.
clinic_eosinophil_nb_pca <- plot_pca(clinic_eosinophil_nb, plot_labels=FALSE)
dev <- pp(file="images/eosinophil_place_nb.png")
clinic_eosinophil_nb_pca$plot
closed <- dev.off()
clinic_eosinophil_nb_pca$plot
clinic_monocyte <- hs_clinical %>%
set_expt_conditions(fact="clinic") %>%
set_expt_batches(fact="clinicaloutcome") %>%
subset_expt(subset="typeofcells=='monocytes'")
## subset_expt(): There were 162, now there are 56 samples.
clinic_cf <- paste0(pData(clinic_monocyte)$condition, "_",
pData(clinic_monocyte)$batch)
table(clinic_cf)
## clinic_cf
## Cali_cure Cali_failure Tumaco_cure Tumaco_failure
## 11 3 23 19
clinic_monocyte <- set_expt_conditions(clinic_monocyte, fact=clinic_cf) %>%
set_expt_batches(fact="visitnumber")
clinic_monocyte_norm <- normalize_expt(clinic_monocyte, transform="log2",
convert="cpm", norm="quant", filter=TRUE)
## Removing 8867 low-count genes (11056 remaining).
## transform_counts: Found 5 values equal to 0, adding 1 to the matrix.
clinic_monocyte_pca <- plot_pca(clinic_monocyte_norm, plot_labels=FALSE)
dev <- pp(file="images/monocytes_place.png")
clinic_monocyte_pca$plot
closed <- dev.off()
clinic_monocyte_pca$plot
clinic_monocyte_nb <- normalize_expt(clinic_monocyte, transform="log2",
convert="cpm", batch="svaseq", filter=TRUE)
## Removing 8867 low-count genes (11056 remaining).
## Setting 1134 low elements to zero.
## transform_counts: Found 1134 values equal to 0, adding 1 to the matrix.
clinic_monocyte_nb_pca <- plot_pca(clinic_monocyte_nb, plot_labels=FALSE)
dev <- pp(file="images/monocytes_place_nb.png")
clinic_monocyte_nb_pca$plot
closed <- dev.off()
clinic_monocyte_nb_pca$plot
clinic_neutrophil <- hs_clinical %>%
set_expt_conditions(fact="clinic") %>%
set_expt_batches(fact="clinicaloutcome") %>%
subset_expt(subset="typeofcells=='neutrophils'")
## subset_expt(): There were 162, now there are 55 samples.
clinic_cf <- paste0(pData(clinic_neutrophil)$condition, "_",
pData(clinic_neutrophil)$batch)
table(clinic_cf)
## clinic_cf
## Cali_cure Cali_failure Tumaco_cure Tumaco_failure
## 11 3 22 19
clinic_neutrophil <- set_expt_conditions(clinic_neutrophil, fact=clinic_cf) %>%
set_expt_batches(fact="visitnumber")
clinic_neutrophil_norm <- normalize_expt(clinic_neutrophil, transform="log2",
convert="cpm", norm="quant", filter=TRUE)
## Removing 10713 low-count genes (9210 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
clinic_neutrophil_pca <- plot_pca(clinic_neutrophil_norm, plot_labels=FALSE)
dev <- pp(file="images/neutrophil_place.png")
clinic_neutrophil_pca$plot
closed <- dev.off()
clinic_neutrophil_pca$plot
clinic_neutrophil_nb <- normalize_expt(clinic_neutrophil, transform="log2",
convert="cpm", batch="svaseq", filter=TRUE)
## Removing 10713 low-count genes (9210 remaining).
## Setting 1119 low elements to zero.
## transform_counts: Found 1119 values equal to 0, adding 1 to the matrix.
clinic_neutrophil_nb_pca <- plot_pca(clinic_neutrophil_nb, plot_labels=FALSE)
dev <- pp(file="images/neutrophil_place_nb.png")
clinic_neutrophil_nb_pca$plot
closed <- dev.off()
clinic_neutrophil_nb_pca$plot
A majority of samples came from tumaco, let us see what happens if we limit the data to them.
Let us therefore consider a couple of different view of this portion of the data.
tumaco_clinical <- hs_clinical %>%
subset_expt(subset="clinic=='Tumaco'")
## subset_expt(): There were 162, now there are 124 samples.
tumaco_monocyte <- tumaco_clinical %>%
subset_expt(subset="typeofcells=='monocytes'") %>%
set_expt_conditions(fact="clinicaloutcome") %>%
set_expt_batches(fact="visitnumber") %>%
set_expt_samplenames("tubelabelorigin")
## subset_expt(): There were 124, now there are 42 samples.
tumaco_monocyte_norm <- normalize_expt(tumaco_monocyte, transform="log2", convert="cpm",
norm="quant", filter=TRUE)
## Removing 9057 low-count genes (10866 remaining).
## transform_counts: Found 5 values equal to 0, adding 1 to the matrix.
tumaco_monocyte_pca <- plot_pca(tumaco_monocyte_norm, plot_labels=FALSE)
dev <- pp(file="images/monocytes_oneplace_norm.png")
tumaco_monocyte_pca$plot
closed <- dev.off()
tumaco_monocyte_pca$plot
tumaco_monocyte_nb <- normalize_expt(tumaco_monocyte, transform="log2", convert="cpm",
batch="svaseq", filter=TRUE)
## Removing 9057 low-count genes (10866 remaining).
## Setting 705 low elements to zero.
## transform_counts: Found 705 values equal to 0, adding 1 to the matrix.
tumaco_monocyte_nb_pca <- plot_pca(tumaco_monocyte_nb, plot_labels=FALSE)
dev <- pp(file="images/monocytes_oneplace_norm_sva.png")
tumaco_monocyte_nb_pca$plot
closed <- dev.off()
tumaco_monocyte_nb_pca$plot
Let us move a direct comparison of the two clinics right to the top.
hs_clinic <- hs_valid %>%
set_expt_conditions(fact="clinic") %>%
set_expt_batches(fact="typeofcells")
hs_clinic_norm <- normalize_expt(hs_clinic, transform="log2", convert="cpm",
norm="quant", filter=TRUE)
## Removing 5663 low-count genes (14260 remaining).
## transform_counts: Found 506 values equal to 0, adding 1 to the matrix.
hs_clinic_pca <- plot_pca(hs_clinic_norm)
## plot labels was not set and there are more than 100 samples, disabling it.
hs_clinic_pca$plot
hs_clinic_nb <- normalize_expt(hs_clinic, transform="log2", convert="cpm",
batch="svaseq", filter=TRUE)
## Removing 5663 low-count genes (14260 remaining).
## Setting 24623 low elements to zero.
## transform_counts: Found 24623 values equal to 0, adding 1 to the matrix.
hs_clinic_nb_pca <- plot_pca(hs_clinic_nb)
## plot labels was not set and there are more than 100 samples, disabling it.
hs_clinic_nb_pca$plot
hs_clinic_de <- all_pairwise(hs_clinic, model_batch="svaseq", filter=TRUE)
## Removing 0 low-count genes (14260 remaining).
## Setting 24623 low elements to zero.
## transform_counts: Found 24623 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
hs_clinic_table <- combine_de_tables(
hs_clinic_de, keepers=clinic_contrasts,
excel=glue::glue("excel/hs_clinic_table-v{ver}.xlsx"))
## Deleting the file excel/hs_clinic_table-v202203.xlsx before writing the tables.
hs_clinic_sig <- extract_significant_genes(
hs_clinic_table,
excel=glue::glue("excel/hs_clinic_sig-v{ver}.xlsx"))
## Deleting the file excel/hs_clinic_sig-v202203.xlsx before writing the tables.
clinic_biopsy_de <- all_pairwise(clinic_biopsy, model_batch="svaseq", filter=TRUE)
## Removing 0 low-count genes (13602 remaining).
## Setting 275 low elements to zero.
## transform_counts: Found 275 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
clinic_biopsy_table <- combine_de_tables(
clinic_biopsy_de,
excel=glue::glue("excel/clinic_biopsy_table-v{ver}.xlsx"))
## Deleting the file excel/clinic_biopsy_table-v202203.xlsx before writing the tables.
clinic_biopsy_sig <- extract_significant_genes(
clinic_biopsy_table,
excel=glue::glue("excel/clinic_biopsy_sig-v{ver}.xlsx"))
## Deleting the file excel/clinic_biopsy_sig-v202203.xlsx before writing the tables.
clinic_eosinophil_de <- all_pairwise(clinic_eosinophil, model_batch=FALSE, filter=TRUE)
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
clinic_eosinophil_table <- combine_de_tables(
clinic_eosinophil_de,
excel=glue::glue("excel/clinic_eosinophil_table-v{ver}.xlsx"))
## Deleting the file excel/clinic_eosinophil_table-v202203.xlsx before writing the tables.
clinic_eosinophil_sig <- extract_significant_genes(
clinic_eosinophil_table,
excel=glue::glue("excel/clinic_eosinophil_sig-v{ver}.xlsx"))
## Deleting the file excel/clinic_eosinophil_sig-v202203.xlsx before writing the tables.
clinic_monocyte_de <- all_pairwise(clinic_monocyte, model_batch=FALSE, filter=TRUE)
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
clinic_monocyte_table <- combine_de_tables(
clinic_monocyte_de,
excel=glue::glue("excel/clinic_monocyte_table-v{ver}.xlsx"))
## Deleting the file excel/clinic_monocyte_table-v202203.xlsx before writing the tables.
clinic_monocyte_sig <- extract_significant_genes(
clinic_monocyte_table,
excel=glue::glue("excel/clinic_monocyte_sig-v{ver}.xlsx"))
## Deleting the file excel/clinic_monocyte_sig-v202203.xlsx before writing the tables.
clinic_neutrophil_de <- all_pairwise(clinic_neutrophil, model_batch=FALSE, filter=TRUE)
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
clinic_neutrophil_table <- combine_de_tables(
clinic_neutrophil_de,
excel=glue::glue("excel/clinic_neutrophil_table-v{ver}.xlsx"))
## Deleting the file excel/clinic_neutrophil_table-v202203.xlsx before writing the tables.
clinic_neutrophil_sig <- extract_significant_genes(
clinic_neutrophil_table,
excel=glue::glue("excel/clinic_neutrophil_sig-v{ver}.xlsx"))
## Deleting the file excel/clinic_neutrophil_sig-v202203.xlsx before writing the tables.
clinic_sigenes_up <- rownames(hs_clinic_sig[["deseq"]][["ups"]][[1]])
clinic_sigenes_down <- rownames(hs_clinic_sig[["deseq"]][["downs"]][[1]])
clinic_sigenes <- c(clinic_sigenes_up, clinic_sigenes_down)
contrast <- "Tumacocure_vs_Calicure"
clinic_biopsy_sigenes <- c(rownames(clinic_biopsy_sig[["deseq"]][["ups"]][[contrast]]),
rownames(clinic_biopsy_sig[["deseq"]][["downs"]][[contrast]]))
clinic_eosinophil_sigenes_up <- rownames(clinic_eosinophil_sig[["deseq"]][["ups"]][[contrast]])
clinic_eosinophil_sigenes_down <- rownames(clinic_eosinophil_sig[["deseq"]][["downs"]][[contrast]])
clinic_monocyte_sigenes_up <- rownames(clinic_monocyte_sig[["deseq"]][["ups"]][[contrast]])
clinic_monocyte_sigenes_down <- rownames(clinic_monocyte_sig[["deseq"]][["downs"]][[contrast]])
clinic_neutrophil_sigenes_up <- rownames(clinic_neutrophil_sig[["deseq"]][["ups"]][[contrast]])
clinic_neutrophil_sigenes_down <- rownames(clinic_neutrophil_sig[["deseq"]][["downs"]][[contrast]])
clinic_eosinophil_sigenes <- c(clinic_eosinophil_sigenes_up,
clinic_eosinophil_sigenes_down)
clinic_monocyte_sigenes <- c(clinic_monocyte_sigenes_up,
clinic_monocyte_sigenes_down)
clinic_neutrophil_sigenes <- c(clinic_neutrophil_sigenes_up,
clinic_neutrophil_sigenes_down)
clinic_gp <- simple_gprofiler(clinic_sigenes)
## Performing gProfiler GO search of 915 genes against hsapiens.
## GO search found 106 hits.
## Performing gProfiler KEGG search of 915 genes against hsapiens.
## KEGG search found 5 hits.
## Performing gProfiler REAC search of 915 genes against hsapiens.
## REAC search found 12 hits.
## Performing gProfiler MI search of 915 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 915 genes against hsapiens.
## TF search found 33 hits.
## Performing gProfiler CORUM search of 915 genes against hsapiens.
## CORUM search found 0 hits.
## Performing gProfiler HP search of 915 genes against hsapiens.
## HP search found 8 hits.
clinic_gp$pvalue_plots$kegg_plot_over
clinic_gp$pvalue_plots$reactome_plot_over
clinic_gp <- simple_gprofiler(clinic_sigenes)
## Performing gProfiler GO search of 915 genes against hsapiens.
## GO search found 106 hits.
## Performing gProfiler KEGG search of 915 genes against hsapiens.
## KEGG search found 5 hits.
## Performing gProfiler REAC search of 915 genes against hsapiens.
## REAC search found 12 hits.
## Performing gProfiler MI search of 915 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 915 genes against hsapiens.
## TF search found 33 hits.
## Performing gProfiler CORUM search of 915 genes against hsapiens.
## CORUM search found 0 hits.
## Performing gProfiler HP search of 915 genes against hsapiens.
## HP search found 8 hits.
clinic_gp$pvalue_plots$kegg_plot_over
clinic_gp$pvalue_plots$reactome_plot_over
clinic_gp <- simple_gprofiler(clinic_sigenes)
## Performing gProfiler GO search of 915 genes against hsapiens.
## GO search found 106 hits.
## Performing gProfiler KEGG search of 915 genes against hsapiens.
## KEGG search found 5 hits.
## Performing gProfiler REAC search of 915 genes against hsapiens.
## REAC search found 12 hits.
## Performing gProfiler MI search of 915 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 915 genes against hsapiens.
## TF search found 33 hits.
## Performing gProfiler CORUM search of 915 genes against hsapiens.
## CORUM search found 0 hits.
## Performing gProfiler HP search of 915 genes against hsapiens.
## HP search found 8 hits.
clinic_gp$pvalue_plots$kegg_plot_over
clinic_gp$pvalue_plots$reactome_plot_over
clinic_eosinophil_gp <- simple_gprofiler(clinic_eosinophil_sigenes)
## Performing gProfiler GO search of 1136 genes against hsapiens.
## GO search found 255 hits.
## Performing gProfiler KEGG search of 1136 genes against hsapiens.
## KEGG search found 22 hits.
## Performing gProfiler REAC search of 1136 genes against hsapiens.
## REAC search found 6 hits.
## Performing gProfiler MI search of 1136 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 1136 genes against hsapiens.
## TF search found 360 hits.
## Performing gProfiler CORUM search of 1136 genes against hsapiens.
## CORUM search found 6 hits.
## Performing gProfiler HP search of 1136 genes against hsapiens.
## HP search found 0 hits.
clinic_eosinophil_gp$pvalue_plots$kegg_plot_over
clinic_eosinophil_gp$pvalue_plots$reactome_plot_over
clinic_eosinophil_up_gp <- simple_gprofiler(clinic_eosinophil_sigenes_up)
## Performing gProfiler GO search of 590 genes against hsapiens.
## GO search found 232 hits.
## Performing gProfiler KEGG search of 590 genes against hsapiens.
## KEGG search found 19 hits.
## Performing gProfiler REAC search of 590 genes against hsapiens.
## REAC search found 3 hits.
## Performing gProfiler MI search of 590 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 590 genes against hsapiens.
## TF search found 351 hits.
## Performing gProfiler CORUM search of 590 genes against hsapiens.
## CORUM search found 6 hits.
## Performing gProfiler HP search of 590 genes against hsapiens.
## HP search found 0 hits.
clinic_eosinophil_up_gp$pvalue_plots$kegg_plot_over
clinic_eosinophil_up_gp$pvalue_plots$reactome_plot_over
clinic_eosinophil_down_gp <- simple_gprofiler(clinic_eosinophil_sigenes_down)
## Performing gProfiler GO search of 546 genes against hsapiens.
## GO search found 73 hits.
## Performing gProfiler KEGG search of 546 genes against hsapiens.
## KEGG search found 4 hits.
## Performing gProfiler REAC search of 546 genes against hsapiens.
## REAC search found 11 hits.
## Performing gProfiler MI search of 546 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 546 genes against hsapiens.
## TF search found 22 hits.
## Performing gProfiler CORUM search of 546 genes against hsapiens.
## CORUM search found 6 hits.
## Performing gProfiler HP search of 546 genes against hsapiens.
## HP search found 0 hits.
clinic_eosinophil_down_gp$pvalue_plots$kegg_plot_over
clinic_eosinophil_down_gp$pvalue_plots$reactome_plot_over
clinic_monocyte_gp <- simple_gprofiler(clinic_monocyte_sigenes)
## Performing gProfiler GO search of 1241 genes against hsapiens.
## GO search found 408 hits.
## Performing gProfiler KEGG search of 1241 genes against hsapiens.
## KEGG search found 16 hits.
## Performing gProfiler REAC search of 1241 genes against hsapiens.
## REAC search found 8 hits.
## Performing gProfiler MI search of 1241 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 1241 genes against hsapiens.
## TF search found 347 hits.
## Performing gProfiler CORUM search of 1241 genes against hsapiens.
## CORUM search found 6 hits.
## Performing gProfiler HP search of 1241 genes against hsapiens.
## HP search found 6 hits.
clinic_monocyte_gp$pvalue_plots$kegg_plot_over
clinic_monocyte_gp$pvalue_plots$reactome_plot_over
clinic_monocyte_up_gp <- simple_gprofiler(clinic_monocyte_sigenes_up)
## Performing gProfiler GO search of 596 genes against hsapiens.
## GO search found 369 hits.
## Performing gProfiler KEGG search of 596 genes against hsapiens.
## KEGG search found 16 hits.
## Performing gProfiler REAC search of 596 genes against hsapiens.
## REAC search found 5 hits.
## Performing gProfiler MI search of 596 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 596 genes against hsapiens.
## TF search found 342 hits.
## Performing gProfiler CORUM search of 596 genes against hsapiens.
## CORUM search found 5 hits.
## Performing gProfiler HP search of 596 genes against hsapiens.
## HP search found 3 hits.
clinic_monocyte_up_gp$pvalue_plots$kegg_plot_over
clinic_monocyte_up_gp$pvalue_plots$reactome_plot_over
clinic_monocyte_down_gp <- simple_gprofiler(clinic_monocyte_sigenes_down)
## Performing gProfiler GO search of 645 genes against hsapiens.
## GO search found 60 hits.
## Performing gProfiler KEGG search of 645 genes against hsapiens.
## KEGG search found 2 hits.
## Performing gProfiler REAC search of 645 genes against hsapiens.
## REAC search found 4 hits.
## Performing gProfiler MI search of 645 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 645 genes against hsapiens.
## TF search found 206 hits.
## Performing gProfiler CORUM search of 645 genes against hsapiens.
## CORUM search found 0 hits.
## Performing gProfiler HP search of 645 genes against hsapiens.
## HP search found 4 hits.
clinic_monocyte_down_gp$pvalue_plots$kegg_plot_over
clinic_monocyte_down_gp$pvalue_plots$reactome_plot_over
clinic_neutrophil_gp <- simple_gprofiler(clinic_neutrophil_sigenes)
## Performing gProfiler GO search of 981 genes against hsapiens.
## GO search found 185 hits.
## Performing gProfiler KEGG search of 981 genes against hsapiens.
## KEGG search found 10 hits.
## Performing gProfiler REAC search of 981 genes against hsapiens.
## REAC search found 25 hits.
## Performing gProfiler MI search of 981 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 981 genes against hsapiens.
## TF search found 402 hits.
## Performing gProfiler CORUM search of 981 genes against hsapiens.
## CORUM search found 1 hits.
## Performing gProfiler HP search of 981 genes against hsapiens.
## HP search found 1 hits.
clinic_neutrophil_gp$pvalue_plots$kegg_plot_over
clinic_neutrophil_gp$pvalue_plots$reactome_plot_over
clinic_neutrophil_up_gp <- simple_gprofiler(clinic_neutrophil_sigenes_up)
## Performing gProfiler GO search of 542 genes against hsapiens.
## GO search found 157 hits.
## Performing gProfiler KEGG search of 542 genes against hsapiens.
## KEGG search found 7 hits.
## Performing gProfiler REAC search of 542 genes against hsapiens.
## REAC search found 2 hits.
## Performing gProfiler MI search of 542 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 542 genes against hsapiens.
## TF search found 391 hits.
## Performing gProfiler CORUM search of 542 genes against hsapiens.
## CORUM search found 0 hits.
## Performing gProfiler HP search of 542 genes against hsapiens.
## HP search found 0 hits.
clinic_neutrophil_up_gp$pvalue_plots$kegg_plot_over
clinic_neutrophil_up_gp$pvalue_plots$reactome_plot_over
clinic_neutrophil_down_gp <- simple_gprofiler(clinic_neutrophil_sigenes_down)
## Performing gProfiler GO search of 439 genes against hsapiens.
## GO search found 39 hits.
## Performing gProfiler KEGG search of 439 genes against hsapiens.
## KEGG search found 8 hits.
## Performing gProfiler REAC search of 439 genes against hsapiens.
## REAC search found 75 hits.
## Performing gProfiler MI search of 439 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 439 genes against hsapiens.
## TF search found 15 hits.
## Performing gProfiler CORUM search of 439 genes against hsapiens.
## CORUM search found 9 hits.
## Performing gProfiler HP search of 439 genes against hsapiens.
## HP search found 7 hits.
clinic_neutrophil_down_gp$pvalue_plots$kegg_plot_over
clinic_neutrophil_down_gp$pvalue_plots$reactome_plot_over
Let us recolor the same plot by cure/fail followed by a concatenation of the cell type and cure/fail. In the following block, the clinical samples are plotted once with the most common normalization (log,cpm,quant,filtered) method followed by a plot of the data using svaseq adjusted values and without quantile normalization.
Thus, the following block switches the colors of the groups to the clinical state (cure/fail) and shapes by cell type.
hs_clinical_norm <- sm(normalize_expt(hs_clinical, filter="simple", transform="log2",
norm="quant", convert="cpm"))
clinical_pca <- plot_pca(hs_clinical_norm, plot_labels=FALSE,
cis=NULL,
plot_title="PCA - clinical samples")
dev <- pp(file=glue("images/all_clinical_nobatch_pca-v{ver}.png"), height=8, width=16)
clinical_pca$plot
closed <- dev.off()
clinical_pca$plot
hs_clinical_nb <- normalize_expt(hs_clinical, filter="simple", transform="log2",
batch="svaseq", convert="cpm")
## Removing 1894 low-count genes (18029 remaining).
## Setting 137385 low elements to zero.
## transform_counts: Found 137385 values equal to 0, adding 1 to the matrix.
hs_clinical_nb_pca <- plot_pca(hs_clinical_nb)
## plot labels was not set and there are more than 100 samples, disabling it.
dev <- pp(file=glue("images/all_clinical_svaseqbatch_pca-v{ver}.png"), height=6, width=8)
hs_clinical_nb_pca$plot
closed <- dev.off()
hs_clinical_nb_pca$plot
clinical_pca_info <- pca_information(
hs_clinical_norm, plot_pcas=TRUE, num_components = 30,
expt_factors=c("visitnumber", "typeofcells", "clinicaloutcome",
"clinic"))
## plot labels was not set and there are more than 100 samples, disabling it.
dev <- pp(file="images/clinical_samples_neglogp_pcs.png")
clinical_pca_info$anova_neglogp_heatmap
closed <- dev.off()
clinical_pca_info$anova_neglogp_heatmap
clinical_pca_info$pca_plots$PC20_PC29
## Warning: ggrepel: 115 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
clinical_scores <- pca_highscores(hs_clinical_norm)
clinical_scores[["highest"]][,"Comp.20"]
## [1] "7.756:ENSG00000266302" "5.744:ENSG00000067048" "4.721:ENSG00000129824"
## [4] "4.715:ENSG00000171860" "4.551:ENSG00000154529" "4.2:ENSG00000162722"
## [7] "4.154:ENSG00000179344" "4.071:ENSG00000163993" "3.937:ENSG00000198692"
## [10] "3.934:ENSG00000196735" "3.901:ENSG00000164125" "3.83:ENSG00000276070"
## [13] "3.771:ENSG00000076716" "3.754:ENSG00000196664" "3.672:ENSG00000115607"
## [16] "3.598:ENSG00000105991" "3.444:ENSG00000104361" "3.263:ENSG00000198502"
## [19] "3.234:ENSG00000184350" "3.213:ENSG00000121807"
clinical_scores[["highest"]][,"Comp.27"]
## [1] "7.498:ENSG00000137801" "7.383:ENSG00000110203" "7.085:ENSG00000106565"
## [4] "6.72:ENSG00000002933" "5.895:ENSG00000276085" "5.525:ENSG00000049247"
## [7] "5.313:ENSG00000147408" "4.428:ENSG00000115607" "4.021:ENSG00000124882"
## [10] "3.846:ENSG00000154153" "3.613:ENSG00000050820" "3.591:ENSG00000196126"
## [13] "3.419:ENSG00000179344" "3.355:ENSG00000002726" "3.326:ENSG00000168280"
## [16] "3.294:ENSG00000276070" "3.155:ENSG00000090382" "3.117:ENSG00000171916"
## [19] "3.104:ENSG00000149591" "2.972:ENSG00000269028"
first <- normalize_expt(hs_clinical, transform="log2", convert="cpm",
filter = TRUE, batch="svaseq", surrogates=1)
## Removing 5663 low-count genes (14260 remaining).
## Setting 170679 low elements to zero.
## transform_counts: Found 170679 values equal to 0, adding 1 to the matrix.
first_info <- pca_information(
first, plot_pcas=TRUE, num_components = 30,
expt_factors=c("visitnumber", "typeofcells", "clinicaloutcome",
"clinic"))
## plot labels was not set and there are more than 100 samples, disabling it.
first_info$anova_neglogp_heatmap
first_info$pca_plots[["PC1_PC2"]]
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning: ggrepel: 152 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
second <- normalize_expt(hs_clinical, transform="log2", convert="cpm",
filter = TRUE, batch="svaseq", surrogates=2) %>%
set_expt_batches(fact="clinic")
## Removing 5663 low-count genes (14260 remaining).
## Setting 28688 low elements to zero.
## transform_counts: Found 28688 values equal to 0, adding 1 to the matrix.
second_info <- pca_information(
second, plot_pcas=TRUE, num_components = 30,
expt_factors=c("visitnumber", "typeofcells", "clinicaloutcome",
"clinic"))
## plot labels was not set and there are more than 100 samples, disabling it.
second_info$anova_neglogp_heatmap
third <- normalize_expt(hs_clinical, transform="log2", convert="cpm",
filter = TRUE, batch="svaseq", surrogates=3) %>%
set_expt_batches(fact="clinic")
## Removing 5663 low-count genes (14260 remaining).
## Setting 23196 low elements to zero.
## transform_counts: Found 23196 values equal to 0, adding 1 to the matrix.
third_info <- pca_information(
third, plot_pcas=TRUE, num_components = 30,
expt_factors=c("visitnumber", "typeofcells", "clinicaloutcome",
"clinic"))
## plot labels was not set and there are more than 100 samples, disabling it.
third_info$anova_neglogp_heatmap
fourth <- normalize_expt(hs_clinical, transform="log2", convert="cpm",
filter = TRUE, batch="svaseq", surrogates=4) %>%
set_expt_batches(fact="clinic")
## Removing 5663 low-count genes (14260 remaining).
## Setting 22153 low elements to zero.
## transform_counts: Found 22153 values equal to 0, adding 1 to the matrix.
fourth_info <- pca_information(
fourth, plot_pcas=TRUE, num_components = 30,
expt_factors=c("visitnumber", "typeofcells", "clinicaloutcome",
"clinic"))
## plot labels was not set and there are more than 100 samples, disabling it.
fourth_info$anova_neglogp_heatmap
fourth_info[["pca_plots"]][["PC1_PC2"]]
## Warning: ggrepel: 76 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
fifth <- normalize_expt(hs_clinical, transform="log2", convert="cpm",
filter = TRUE, batch="svaseq", surrogates=5) %>%
set_expt_batches(fact="clinic")
## Removing 5663 low-count genes (14260 remaining).
## Setting 23511 low elements to zero.
## transform_counts: Found 23511 values equal to 0, adding 1 to the matrix.
fifth_info <- pca_information(
fifth, plot_pcas=TRUE, num_components = 30,
expt_factors=c("visitnumber", "typeofcells", "clinicaloutcome",
"clinic"))
## plot labels was not set and there are more than 100 samples, disabling it.
fifth_info$anova_neglogp_heatmap
fifth_info[["pca_plots"]][["PC1_PC12"]]
## Warning: ggrepel: 111 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
sixth <- normalize_expt(hs_clinical, transform="log2", convert="cpm",
filter = TRUE, batch="svaseq", surrogates=6) %>%
set_expt_batches(fact="clinic")
## Removing 5663 low-count genes (14260 remaining).
## Setting 21292 low elements to zero.
## transform_counts: Found 21292 values equal to 0, adding 1 to the matrix.
sixth_info <- pca_information(
sixth, plot_pcas=TRUE, num_components = 30,
expt_factors=c("visitnumber", "typeofcells", "clinicaloutcome",
"clinic"))
## plot labels was not set and there are more than 100 samples, disabling it.
sixth_info$anova_neglogp_heatmap
seventh <- normalize_expt(hs_clinical, transform="log2", convert="cpm",
filter = TRUE, batch="svaseq", surrogates=7) %>%
set_expt_batches(fact="clinic")
## Removing 5663 low-count genes (14260 remaining).
## Setting 19835 low elements to zero.
## transform_counts: Found 19835 values equal to 0, adding 1 to the matrix.
seventh_info <- pca_information(
seventh, plot_pcas=TRUE, num_components = 30,
expt_factors=c("visitnumber", "typeofcells", "clinicaloutcome",
"clinic"))
## plot labels was not set and there are more than 100 samples, disabling it.
seventh_info$anova_neglogp_heatmap
eighth <- normalize_expt(hs_clinical, transform="log2", convert="cpm",
filter = TRUE, batch="svaseq", surrogates=8)
## Removing 5663 low-count genes (14260 remaining).
## Setting 19967 low elements to zero.
## transform_counts: Found 19967 values equal to 0, adding 1 to the matrix.
eighth_info <- pca_information(
eighth, plot_pcas=TRUE, num_components = 30,
expt_factors=c("visitnumber", "typeofcells", "clinicaloutcome",
"clinic"))
## plot labels was not set and there are more than 100 samples, disabling it.
eighth_info$anova_neglogp_heatmap
In the following block the experimental condition was reset to the concatenation of clinical outcome and type of cells. There are an insufficient number of biopsy samples for them to be useful in this visualization, so they are ignored.
desired_levels <- c("cure_biopsy", "failure_biopsy", "cure_eosinophils", "failure_eosinophils",
"cure_monocytes", "failure_monocytes", "cure_neutrophils", "failure_neutrophils")
new_fact <- factor(
paste0(pData(hs_clinical)[["condition"]], "_",
pData(hs_clinical)[["batch"]]),
levels=desired_levels)
hs_clinical_concat <- set_expt_conditions(hs_clinical, fact = new_fact) %>%
set_expt_batches(fact = "visitnumber") %>%
set_expt_colors(cf_type_colors) %>%
subset_expt(subset="typeofcells!='biopsy'")
## subset_expt(): There were 162, now there are 145 samples.
## Try to ensure that the levels stay in the order I want
meta <- pData(hs_clinical_concat) %>%
mutate(condition = fct_relevel(condition, desired_levels))
## Warning: Unknown levels in `f`: cure_biopsy, failure_biopsy
pData(hs_expt) <- meta
clinical_concat_norm <- normalize_expt(hs_clinical_concat, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
## Removing 7868 low-count genes (12055 remaining).
## transform_counts: Found 110 values equal to 0, adding 1 to the matrix.
clinical_concat_norm_pca <- plot_pca(clinical_concat_norm)
## plot labels was not set and there are more than 100 samples, disabling it.
dev <- pp(file=glue("images/clinical_concatenated_normalized_pca-v{ver}.png"), height=6, width=10)
clinical_concat_norm_pca$plot
closed <- dev.off()
clinical_concat_norm_pca$plot
clinical_concat_nb <- normalize_expt(hs_clinical_concat, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)
## Removing 7868 low-count genes (12055 remaining).
## Setting 16075 low elements to zero.
## transform_counts: Found 16075 values equal to 0, adding 1 to the matrix.
clinical_concat_nb_pca <- plot_pca(clinical_concat_nb)
## plot labels was not set and there are more than 100 samples, disabling it.
dev <- pp(file=glue("images/clinical_concatenated_svaseqbatch_pca-v{ver}.png"), height=6, width=12)
clinical_concat_nb_pca$plot
closed <- dev.off()
clinical_concat_nb_pca$plot
tumaco_concat <- subset_expt(hs_clinical_concat, subset="clinic=='Tumaco'")
## subset_expt(): There were 145, now there are 110 samples.
tumaco_concat_norm <- normalize_expt(tumaco_concat, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
## Removing 8006 low-count genes (11917 remaining).
## transform_counts: Found 95 values equal to 0, adding 1 to the matrix.
tumaco_concat_norm_pca <- plot_pca(tumaco_concat_norm)
## plot labels was not set and there are more than 100 samples, disabling it.
dev <- pp(file=glue("images/tumaco_concatenated_normalized_pca-v{ver}.png"), height=6, width=10)
tumaco_concat_norm_pca$plot
closed <- dev.off()
tumaco_concat_norm_pca$plot
tumaco_concat_nb <- normalize_expt(tumaco_concat, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)
## Removing 8006 low-count genes (11917 remaining).
## Setting 19028 low elements to zero.
## transform_counts: Found 19028 values equal to 0, adding 1 to the matrix.
tumaco_concat_nb_pca <- plot_pca(tumaco_concat_nb)
## plot labels was not set and there are more than 100 samples, disabling it.
dev <- pp(file=glue("images/clinical_concatenated_svaseqbatch_pca-v{ver}.png"), height=6, width=12)
tumaco_concat_nb_pca$plot
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
closed <- dev.off()
tumaco_concat_nb_pca$plot
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
library(ggplot2)
wanted_legend_order <- c("cure_eosinophils","cure_monocytes","cure_neutrophils",
"failure_eosinophils","failure_monocytes","failure_neutrophils")
test <- plot_pca(clinical_concat_norm)
## plot labels was not set and there are more than 100 samples, disabling it.
test$plot
test$plot +
scale_color_discrete(breaks=wanted_legend_order) +
guides(color=guide_legend(ncol=2))
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.
test2 <- plot_legend(clinical_concat_norm)
## plot labels was not set and there are more than 100 samples, disabling it.
Separate the samples by visit in order to more easily see what patterns emerge across cell type and clinical outcome.
The most likely data set for this is hs_clinical.
Now let us shift the view slightly to focus on changes observed over time.
visit_expt <- set_expt_conditions(hs_clinical, fact = "visitnumber") %>%
set_expt_batches(fact = "clinicaloutcome") %>%
subset_expt(subset="typeofcells!='biopsy'")
## subset_expt(): There were 162, now there are 145 samples.
visit_norm <- normalize_expt(visit_expt, transform="log2", convert="cpm",
norm="quant", filter=TRUE)
## Removing 7868 low-count genes (12055 remaining).
## transform_counts: Found 110 values equal to 0, adding 1 to the matrix.
plot_pca(visit_norm)$plot
## plot labels was not set and there are more than 100 samples, disabling it.
visit_nb <- normalize_expt(visit_expt, transform = "log2", convert="cpm",
filter = TRUE, batch = "svaseq")
## Removing 7868 low-count genes (12055 remaining).
## Setting 13664 low elements to zero.
## transform_counts: Found 13664 values equal to 0, adding 1 to the matrix.
visit_nb_pca <- plot_pca(visit_nb)
## plot labels was not set and there are more than 100 samples, disabling it.
dev <- pp(file=glue("images/visit_svaseqbatch_pca-v{ver}.png"), height=7, width=9)
visit_nb_pca$plot
closed <- dev.off()
visit_nb_pca$plot
repeat with only the most expressed genes from one cell type.
rpkm_values <- normalize_expt(type_valid, transform="log2", convert="rpkm",
filter=TRUE, column="cds_length")
## Removing 5663 low-count genes (14260 remaining).
## transform_counts: Found 178981 values equal to 0, adding 1 to the matrix.
rpkm_median <- median_by_factor(rpkm_values)
## The factor biopsy has 17 rows.
## The factor eosinophils has 34 rows.
## The factor monocytes has 56 rows.
## The factor neutrophils has 55 rows.
rpkm_tumaco_monocyte_idx <- order(rpkm_median[["medians"]][["monocytes"]], decreasing=TRUE)
rpkm_tumaco_monocyte <- rpkm_median[["medians"]][rpkm_tumaco_monocyte_idx, ]
tumaco_monocyte_gene <- head(rownames(rpkm_tumaco_monocyte), n=1000)
Now separate the visits by cell type and whether or not the drug used was the antimonial.
visit_monocyte <- subset_expt(visit_expt, subset = "typeofcells=='monocytes'")
## subset_expt(): There were 145, now there are 56 samples.
visit_neutrophil <- subset_expt(visit_expt, subset = "typeofcells=='neutrophils'")
## subset_expt(): There were 145, now there are 55 samples.
visit_eosinophil <- subset_expt(visit_expt, subset = "typeofcells=='eosinophils'")
## subset_expt(): There were 145, now there are 34 samples.
v1_clinical <- subset_expt(visit_expt, subset = "visitnumber=='1'") %>%
set_expt_conditions(fact = "clinicaloutcome") %>%
set_expt_batches(fact = "typeofcells")
## subset_expt(): There were 145, now there are 56 samples.
v1_norm <- normalize_expt(v1_clinical, transform="log2", convert="cpm", norm="quant", filter=TRUE)
## Removing 8242 low-count genes (11681 remaining).
## transform_counts: Found 53 values equal to 0, adding 1 to the matrix.
plot_pca(v1_norm)$plot
## Warning: ggrepel: 55 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
v1_nb <- normalize_expt(v1_clinical, transform = "log2", convert = "cpm",
filter = TRUE, batch = "svaseq")
## Removing 8242 low-count genes (11681 remaining).
## Setting 3870 low elements to zero.
## transform_counts: Found 3870 values equal to 0, adding 1 to the matrix.
plot_pca(v1_nb, plot_labels = FALSE)$plot
So, given the data on hand, there is slight ability to discern cure and fail among the visit 1 samples.
v2_clinical <- subset_expt(visit_expt, subset="visitnumber=='2'") %>%
set_expt_conditions(fact = "clinicaloutcome") %>%
set_expt_batches(fact = "typeofcells")
## subset_expt(): There were 145, now there are 45 samples.
v2_nb <- normalize_expt(v2_clinical, transform = "log2", convert = "cpm", norm = "quant",
filter = TRUE, batch = "svaseq")
## Warning in normalize_expt(v2_clinical, transform = "log2", convert = "cpm", :
## Quantile normalization and sva do not always play well together.
## Removing 8243 low-count genes (11680 remaining).
## Setting 2461 low elements to zero.
## transform_counts: Found 2461 values equal to 0, adding 1 to the matrix.
plot_pca(v2_nb, plot_labels = FALSE)$plot
There might be some variance among the visit 2 samples which are cure/fail-ish.
v3_clinical <- subset_expt(visit_expt, subset="visitnumber=='3'") %>%
set_expt_conditions(fact = "clinicaloutcome") %>%
set_expt_batches(fact = "typeofcells")
## subset_expt(): There were 145, now there are 44 samples.
v3_nb <- normalize_expt(v3_clinical, transform = "log2", convert = "cpm", norm = "quant",
filter = TRUE, batch = "svaseq")
## Warning in normalize_expt(v3_clinical, transform = "log2", convert = "cpm", :
## Quantile normalization and sva do not always play well together.
## Removing 8329 low-count genes (11594 remaining).
## Setting 2018 low elements to zero.
## transform_counts: Found 2018 values equal to 0, adding 1 to the matrix.
plot_pca(v3_nb, plot_labels = FALSE)$plot
In a similar fashion, it seems that the visit 3 samples are not particularly informative.
Separate the samples by cell type in order to more easily observe patterns with respect to visit and clinical outcome.
biopsy_norm <- normalize_expt(biopsy_samples, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)
## Removing 6321 low-count genes (13602 remaining).
## transform_counts: Found 169 values equal to 0, adding 1 to the matrix.
plot_pca(biopsy_norm, plot_labels = FALSE)$plot
biopsy_nb <- normalize_expt(biopsy_samples, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")
## Removing 6321 low-count genes (13602 remaining).
## Setting 208 low elements to zero.
## transform_counts: Found 208 values equal to 0, adding 1 to the matrix.
plot_pca(biopsy_nb, plot_labels = FALSE)$plot
monocyte_norm <- normalize_expt(monocyte_samples, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)
## Removing 8867 low-count genes (11056 remaining).
## transform_counts: Found 5 values equal to 0, adding 1 to the matrix.
monocyte_pca <- plot_pca(monocyte_norm, plot_labels = FALSE)
dev <- pp(file="images/monocytes_cf_norm_pca.png")
monocyte_pca$plot
closed <- dev.off()
monocyte_pca$plot
monocyte_disheat <- plot_disheat(monocyte_norm)
dev <- pp(file="images/monocytes_cf_norm_disheat.png")
monocyte_disheat$plot
closed <- dev.off()
monocyte_disheat$plot
monocyte_nb <- normalize_expt(monocyte_samples, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")
## Removing 8867 low-count genes (11056 remaining).
## Setting 1086 low elements to zero.
## transform_counts: Found 1086 values equal to 0, adding 1 to the matrix.
monocyte_nb_pca <- plot_pca(monocyte_nb, plot_labels = FALSE)
dev <- pp(file="images/monocytes_cf_norm_sva_pca.png")
monocyte_nb_pca$plot
closed <- dev.off()
monocyte_nb_pca$plot
monocyte_v1 <- subset_expt(monocyte_samples, subset = "visitnumber=='1'")
## subset_expt(): There were 56, now there are 22 samples.
monocyte_v1_norm <- normalize_expt(monocyte_v1, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)
## Removing 9179 low-count genes (10744 remaining).
## transform_counts: Found 2 values equal to 0, adding 1 to the matrix.
monocyte_v1_pca <- plot_pca(monocyte_v1_norm, plot_labels = FALSE)
dev <- pp(file="images/monocytes_v1_cf_norm_pca.png")
monocyte_v1_pca$plot
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
closed <- dev.off()
monocyte_v1_pca$plot
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
## Warning in MASS::cov.trob(data[, vars]): Probable convergence failure
monocyte_v1_nb <- normalize_expt(monocyte_v1, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")
## Removing 9179 low-count genes (10744 remaining).
## Setting 344 low elements to zero.
## transform_counts: Found 344 values equal to 0, adding 1 to the matrix.
monocyte_v1_nb_pca <- plot_pca(monocyte_v1_nb, plot_labels = FALSE)
dev <- pp(file="images/monocytes_v1_cf_norm_sva_pca.png")
monocyte_v1_nb_pca$plot
closed <- dev.off()
monocyte_v1_nb_pca$plot
The monocytes, on the other hand, appear to have some real information lurking in them.
neutrophil_norm <- normalize_expt(neutrophil_samples, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)
## Removing 10713 low-count genes (9210 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
plot_pca(neutrophil_norm, plot_labels = FALSE)$plot
neutrophil_nb <- normalize_expt(neutrophil_samples, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")
## Removing 10713 low-count genes (9210 remaining).
## Setting 1238 low elements to zero.
## transform_counts: Found 1238 values equal to 0, adding 1 to the matrix.
plot_pca(neutrophil_nb, plot_labels = FALSE)$plot
This appears also to be the case for the neutrophil samples.
eosinophil_norm <- normalize_expt(eosinophil_samples, norm = "quant", convert = "cpm",
transform = "log2", filter = TRUE)
## Removing 9238 low-count genes (10685 remaining).
## transform_counts: Found 4 values equal to 0, adding 1 to the matrix.
plot_pca(eosinophil_norm, plot_labels = FALSE)$plot
eosinophil_nb <- normalize_expt(eosinophil_samples, convert = "cpm",
transform = "log2", filter = TRUE, batch = "svaseq")
## Removing 9238 low-count genes (10685 remaining).
## Setting 614 low elements to zero.
## transform_counts: Found 614 values equal to 0, adding 1 to the matrix.
plot_pca(eosinophil_nb, plot_labels = FALSE)$plot
We have fewer eosinophil samples currently, but they appear to have some real differences.
The primary goal is to learn about cure vs. fail.
Now let us start performing the various differential expression analyses, starting with the set of all/most clinical samples.
cf_clinical_de <- all_pairwise(hs_clinical, model_batch = "svaseq", filter = TRUE)
## Removing 0 low-count genes (14260 remaining).
## Setting 23511 low elements to zero.
## transform_counts: Found 23511 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
cf_clinical_tables <- combine_de_tables(
cf_clinical_de,
keepers = cf_contrasts,
excel = glue::glue("excel/cf_clinical_tables-v{ver}.xlsx"))
## Deleting the file excel/cf_clinical_tables-v202203.xlsx before writing the tables.
cf_clinical_sig <- extract_significant_genes(
cf_clinical_tables,
excel = glue::glue("excel/cf_clinical_sig-v{ver}.xlsx"))
## Deleting the file excel/cf_clinical_sig-v202203.xlsx before writing the tables.
cf_biopsy_de <- all_pairwise(biopsy_samples, model_batch = "svaseq", filter = TRUE)
## Removing 0 low-count genes (13602 remaining).
## Setting 208 low elements to zero.
## transform_counts: Found 208 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
cf_biopsy_tables <- combine_de_tables(
cf_biopsy_de,
keepers = cf_contrasts,
excel = glue::glue("excel/cf_biopsy_tables-v{ver}.xlsx"))
## Deleting the file excel/cf_biopsy_tables-v202203.xlsx before writing the tables.
cf_biopsy_sig <- extract_significant_genes(
cf_biopsy_tables,
excel = glue::glue("excel/cf_biopsy_sig-v{ver}.xlsx"))
## Deleting the file excel/cf_biopsy_sig-v202203.xlsx before writing the tables.
cf_monocyte_sva_de <- all_pairwise(monocyte_samples, model_batch = "svaseq", filter = TRUE)
## Removing 0 low-count genes (11056 remaining).
## Setting 1086 low elements to zero.
## transform_counts: Found 1086 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
cf_monocyte_sva_tables <- combine_de_tables(
cf_monocyte_sva_de,
keepers = cf_contrasts,
excel = glue::glue("excel/cf_monocyte_sva_tables-v{ver}.xlsx"))
## Deleting the file excel/cf_monocyte_sva_tables-v202203.xlsx before writing the tables.
cf_monocyte_sva_sig <- extract_significant_genes(
cf_monocyte_sva_tables,
excel = glue::glue("excel/cf_monocyte_sva_sig-v{ver}.xlsx"))
## Deleting the file excel/cf_monocyte_sva_sig-v202203.xlsx before writing the tables.
cf_monocyte_batchvisit_de <- all_pairwise(monocyte_samples, model_batch = TRUE, filter = TRUE)
## Using limma's removeBatchEffect to visualize with(out) batch inclusion.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
cf_monocyte_batchvisit_tables <- combine_de_tables(
cf_monocyte_batchvisit_de,
keepers = cf_contrasts,
excel = glue::glue("excel/cf_monocyte_batchvisit_tables-v{ver}.xlsx"))
## Deleting the file excel/cf_monocyte_batchvisit_tables-v202203.xlsx before writing the tables.
cf_monocyte_batchvisit_sig <- extract_significant_genes(
cf_monocyte_batchvisit_tables,
excel = glue::glue("excel/cf_monocyte_batchvisit_sig-v{ver}.xlsx"))
## Deleting the file excel/cf_monocyte_batchvisit_sig-v202203.xlsx before writing the tables.
cf_tumaco_monocyte_sva_de <- all_pairwise(tumaco_monocyte, model_batch = "svaseq", filter = TRUE)
## Removing 0 low-count genes (10866 remaining).
## Setting 705 low elements to zero.
## transform_counts: Found 705 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
cf_tumaco_monocyte_sva_tables <- combine_de_tables(
cf_tumaco_monocyte_sva_de,
keepers = cf_contrasts,
excel = glue::glue("excel/cf_monocyte_tables_sva_tumaco-v{ver}.xlsx"))
## Deleting the file excel/cf_monocyte_tables_sva_tumaco-v202203.xlsx before writing the tables.
cf_tumaco_monocyte_sva_sig <- extract_significant_genes(
cf_tumaco_monocyte_sva_tables,
excel = glue::glue("excel/cf_monocyte_sva_tumaco_sig-v{ver}.xlsx"))
## Deleting the file excel/cf_monocyte_sva_tumaco_sig-v202203.xlsx before writing the tables.
cf_tumaco_monocyte_batchvisit_de <- all_pairwise(tumaco_monocyte, model_batch = TRUE, filter = TRUE)
## Using limma's removeBatchEffect to visualize with(out) batch inclusion.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
cf_tumaco_monocyte_batchvisit_tables <- combine_de_tables(
cf_tumaco_monocyte_batchvisit_de,
keepers = cf_contrasts,
excel = glue::glue("excel/cf_monocyte_tables_batchvisit_tumaco-v{ver}.xlsx"))
## Deleting the file excel/cf_monocyte_tables_batchvisit_tumaco-v202203.xlsx before writing the tables.
cf_tumaco_monocyte_batchvisit_sig <- extract_significant_genes(
cf_tumaco_monocyte_batchvisit_tables,
excel = glue::glue("excel/cf_monocyte_batchvisit_tumaco_sig-v{ver}.xlsx"))
## Deleting the file excel/cf_monocyte_batchvisit_tumaco_sig-v202203.xlsx before writing the tables.
tumaco_sva_aucc <- calculate_aucc(cf_tumaco_monocyte_sva_tables[["data"]][["fail_vs_cure"]],
tbl2=cf_monocyte_sva_tables[["data"]][["fail_vs_cure"]],
py="deseq_adjp", ly="deseq_logfc")
tumaco_sva_aucc
## $aucc
## [1] 0.388
##
## $plot
shared_ids <- rownames(cf_tumaco_monocyte_sva_tables[["data"]][["fail_vs_cure"]]) %in%
rownames(cf_monocyte_sva_tables[["data"]][["fail_vs_cure"]])
first <- cf_monocyte_sva_tables[["data"]][["fail_vs_cure"]][shared_ids, ]
second <- cf_tumaco_monocyte_sva_tables[["data"]][["fail_vs_cure"]][rownames(first), ]
cor.test(first[["deseq_logfc"]], second[["deseq_logfc"]])
##
## Pearson's product-moment correlation
##
## data: first[["deseq_logfc"]] and second[["deseq_logfc"]]
## t = 188, df = 10864, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8703 0.8791
## sample estimates:
## cor
## 0.8748
tumaco_batch_aucc <- calculate_aucc(cf_tumaco_monocyte_batchvisit_tables[["data"]][["fail_vs_cure"]],
tbl2=cf_monocyte_batchvisit_tables[["data"]][["fail_vs_cure"]],
py="deseq_adjp", ly="deseq_logfc")
tumaco_batch_aucc
## $aucc
## [1] 0.5309
##
## $plot
shared_ids <- rownames(cf_tumaco_monocyte_batchvisit_tables[["data"]][["fail_vs_cure"]]) %in%
rownames(cf_monocyte_batchvisit_tables[["data"]][["fail_vs_cure"]])
first <- cf_monocyte_batchvisit_tables[["data"]][["fail_vs_cure"]][shared_ids, ]
second <- cf_tumaco_monocyte_batchvisit_tables[["data"]][["fail_vs_cure"]][rownames(first), ]
cor.test(first[["deseq_logfc"]], second[["deseq_logfc"]])
##
## Pearson's product-moment correlation
##
## data: first[["deseq_logfc"]] and second[["deseq_logfc"]]
## t = 220, df = 10864, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9005 0.9074
## sample estimates:
## cor
## 0.904
first_sig_names <- rownames(cf_monocyte_batchvisit_sig[["deseq"]][["ups"]][["fail_vs_cure"]])
second_sig_names <- rownames(cf_tumaco_monocyte_batchvisit_sig[["deseq"]][["ups"]][["fail_vs_cure"]])
batch_venn_lst <- list(all_batch = first_sig_names, tumaco_batch = second_sig_names)
batch_venn <- Vennerable::Venn(batch_venn_lst)
Vennerable::plot(batch_venn)
cf_neutrophil_de <- all_pairwise(neutrophil_samples, model_batch = "svaseq", filter = TRUE)
## Removing 0 low-count genes (9210 remaining).
## Setting 1238 low elements to zero.
## transform_counts: Found 1238 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
cf_neutrophil_tables <- combine_de_tables(
cf_neutrophil_de,
keepers = cf_contrasts,
excel = glue::glue("excel/cf_neutrophil_tables-v{ver}.xlsx"))
## Deleting the file excel/cf_neutrophil_tables-v202203.xlsx before writing the tables.
cf_neutrophil_sig <- extract_significant_genes(
cf_neutrophil_tables,
excel = glue::glue("excel/cf_neutrophil_sig-v{ver}.xlsx"))
## Deleting the file excel/cf_neutrophil_sig-v202203.xlsx before writing the tables.
cf_eosinophil_de <- all_pairwise(eosinophil_samples, model_batch = "svaseq", filter = TRUE)
## Removing 0 low-count genes (10685 remaining).
## Setting 614 low elements to zero.
## transform_counts: Found 614 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
cf_eosinophil_tables <- combine_de_tables(
cf_eosinophil_de,
keepers = cf_contrasts,
excel = glue::glue("excel/cf_eosinophil_tables-v{ver}.xlsx"))
## Deleting the file excel/cf_eosinophil_tables-v202203.xlsx before writing the tables.
cf_eosinophil_sig <- extract_significant_genes(
cf_eosinophil_tables,
excel = glue::glue("excel/cf_eosinophil_sig-v{ver}.xlsx"))
## Deleting the file excel/cf_eosinophil_sig-v202203.xlsx before writing the tables.
cf_nobiopsy_de <- all_pairwise(hs_clinical_nobiop, model_batch = "svaseq", filter = TRUE)
## Removing 0 low-count genes (12055 remaining).
## Setting 13520 low elements to zero.
## transform_counts: Found 13520 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
cf_nobiopsy_tables <- combine_de_tables(
cf_nobiopsy_de,
keepers = cf_contrasts,
excel = glue::glue("excel/cf_nobiopsy_tables-v{ver}.xlsx"))
## Deleting the file excel/cf_nobiopsy_tables-v202203.xlsx before writing the tables.
cf_nobiopsy_sig <- extract_significant_genes(
cf_nobiopsy_tables,
excel = glue::glue("excel/cf_nobiopsy_sig-v{ver}.xlsx"))
## Deleting the file excel/cf_nobiopsy_sig-v202203.xlsx before writing the tables.
For these contrasts, we want to see fail_v1 vs. cure_v1, fail_v2 vs. cure_v2 etc. As a result, we will need to juggle the data slightly and add another set of contrasts.
visit_cf_expt_factor <- paste0("v", pData(hs_clinical)[["visitnumber"]],
pData(hs_clinical)[["condition"]])
visit_cf_expt <- set_expt_conditions(hs_clinical, fact = visit_cf_expt_factor)
visit_cf_eosinophil <- subset_expt(visit_cf_expt, subset="typeofcells=='eosinophils'")
## subset_expt(): There were 162, now there are 34 samples.
visit_cf_tumaco_eosinophil <- subset_expt(visit_cf_eosinophil, subset="clinic=='Tumaco'")
## subset_expt(): There were 34, now there are 27 samples.
visit_cf_monocyte <- subset_expt(visit_cf_expt, subset="typeofcells=='monocytes'")
## subset_expt(): There were 162, now there are 56 samples.
visit_cf_tumaco_monocyte <- subset_expt(visit_cf_monocyte, subset="clinic=='Tumaco'")
## subset_expt(): There were 56, now there are 42 samples.
visit_cf_neutrophil <- subset_expt(visit_cf_expt, subset="typeofcells=='neutrophils'")
## subset_expt(): There were 162, now there are 55 samples.
visit_cf_tumaco_neutrophil <- subset_expt(visit_cf_neutrophil, subset="clinic=='Tumaco'")
## subset_expt(): There were 55, now there are 41 samples.
visit_cf_all_de <- all_pairwise(visit_cf_expt, model_batch = "svaseq", filter = TRUE)
## Removing 0 low-count genes (14260 remaining).
## Setting 23481 low elements to zero.
## transform_counts: Found 23481 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
visit_cf_all_tables <- combine_de_tables(
visit_cf_all_de,
keepers = visit_cf_contrasts,
excel = glue::glue("excel/visit_cf_all_tables-v{ver}.xlsx"))
## Deleting the file excel/visit_cf_all_tables-v202203.xlsx before writing the tables.
visit_cf_all_sig <- extract_significant_genes(
visit_cf_all_tables,
excel = glue::glue("excel/visit_cf_all_sig-v{ver}.xlsx"))
## Deleting the file excel/visit_cf_all_sig-v202203.xlsx before writing the tables.
visit_cf_monocyte_de <- all_pairwise(visit_cf_monocyte, model_batch = "svaseq", filter = TRUE)
## Removing 0 low-count genes (11056 remaining).
## Setting 1147 low elements to zero.
## transform_counts: Found 1147 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
visit_cf_monocyte_tables <- combine_de_tables(
visit_cf_monocyte_de,
keepers = visit_cf_contrasts,
excel = glue::glue("excel/visit_cf_monocyte_tables-v{ver}.xlsx"))
## Deleting the file excel/visit_cf_monocyte_tables-v202203.xlsx before writing the tables.
visit_cf_monocyte_sig <- extract_significant_genes(
visit_cf_monocyte_tables,
excel = glue::glue("excel/visit_cf_monocyte_sig-v{ver}.xlsx"))
## Deleting the file excel/visit_cf_monocyte_sig-v202203.xlsx before writing the tables.
v1fc_deseq_ma <- visit_cf_monocyte_tables[["plots"]][["v1fail_vs_cure"]][["deseq_ma_plots"]][["plot"]]
dev <- pp(file="images/monocyte_cf_de_v1_maplot.png")
v1fc_deseq_ma
closed <- dev.off()
v1fc_deseq_ma
v2fc_deseq_ma <- visit_cf_monocyte_tables[["plots"]][["v2fail_vs_cure"]][["deseq_ma_plots"]][["plot"]]
dev <- pp(file="images/monocyte_cf_de_v2_maplot.png")
v2fc_deseq_ma
closed <- dev.off()
v2fc_deseq_ma
v3fc_deseq_ma <- visit_cf_monocyte_tables[["plots"]][["v3fail_vs_cure"]][["deseq_ma_plots"]][["plot"]]
dev <- pp(file="images/monocyte_cf_de_v3_maplot.png")
v3fc_deseq_ma
closed <- dev.off()
v3fc_deseq_ma
## Repeat for the tumaco subset
visit_cf_tumaco_monocyte_de <- all_pairwise(visit_cf_tumaco_monocyte,
model_batch = "svaseq", filter = TRUE)
## Removing 0 low-count genes (10866 remaining).
## Setting 682 low elements to zero.
## transform_counts: Found 682 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
visit_cf_tumaco_monocyte_tables <- combine_de_tables(
visit_cf_tumaco_monocyte_de,
keepers = visit_cf_contrasts,
excel = glue::glue("excel/visit_cf_tumaco_monocyte_tables-v{ver}.xlsx"))
## Deleting the file excel/visit_cf_tumaco_monocyte_tables-v202203.xlsx before writing the tables.
visit_cf_tumaco_monocyte_sig <- extract_significant_genes(
visit_cf_tumaco_monocyte_tables,
excel = glue::glue("excel/visit_cf_tumaco_monocyte_sig-v{ver}.xlsx"))
## Deleting the file excel/visit_cf_tumaco_monocyte_sig-v202203.xlsx before writing the tables.
One query from Alejandro is to look at the genes shared up/down across visits. I am not entirely certain we have enough samples for this to work, but let us find out.
I am thinking this is a good place to use the AUCC curves I learned about thanks to Julie Cridland.
Note that the following is all monocyte samples, this should therefore potentially be moved up and a version of this with only the Tumaco samples put here?
v1fc <- visit_cf_monocyte_tables[["data"]][["v1fail_vs_cure"]]
v2fc <- visit_cf_monocyte_tables[["data"]][["v2fail_vs_cure"]]
v3fc <- visit_cf_monocyte_tables[["data"]][["v3fail_vs_cure"]]
v1_sig <- c(
rownames(visit_cf_monocyte_sig[["deseq"]][["ups"]][["v1fail_vs_cure"]]),
rownames(visit_cf_monocyte_sig[["deseq"]][["downs"]][["v1fail_vs_cure"]]))
v2_sig <- c(
rownames(visit_cf_monocyte_sig[["deseq"]][["ups"]][["v2fail_vs_cure"]]),
rownames(visit_cf_monocyte_sig[["deseq"]][["downs"]][["v2fail_vs_cure"]]))
v3_sig <- c(
rownames(visit_cf_monocyte_sig[["deseq"]][["ups"]][["v2fail_vs_cure"]]),
rownames(visit_cf_monocyte_sig[["deseq"]][["downs"]][["v2fail_vs_cure"]]))
monocyte_visit_aucc_v2v1 <- calculate_aucc(v1fc, tbl2=v2fc, py="deseq_adjp", ly="deseq_logfc")
dev <- pp(file="images/monocyte_visit_v2v1_aucc.png")
monocyte_visit_aucc_v2v1[["plot"]]
closed <- dev.off()
monocyte_visit_aucc_v2v1[["plot"]]
monocyte_visit_aucc_v3v1 <- calculate_aucc(v1fc, tbl2=v3fc, py="deseq_adjp", ly="deseq_logfc")
dev <- pp(file="images/monocyte_visit_v3v1_aucc.png")
monocyte_visit_aucc_v3v1[["plot"]]
closed <- dev.off()
monocyte_visit_aucc_v3v1[["plot"]]
v1fc_tumaco <- visit_cf_tumaco_monocyte_tables[["data"]][["v1fail_vs_cure"]]
v2fc_tumaco <- visit_cf_tumaco_monocyte_tables[["data"]][["v2fail_vs_cure"]]
v3fc_tumaco <- visit_cf_tumaco_monocyte_tables[["data"]][["v3fail_vs_cure"]]
v1_tumaco_sig <- c(
rownames(visit_cf_tumaco_monocyte_sig[["deseq"]][["ups"]][["v1fail_vs_cure"]]),
rownames(visit_cf_tumaco_monocyte_sig[["deseq"]][["downs"]][["v1fail_vs_cure"]]))
v2_tumaco_sig <- c(
rownames(visit_cf_tumaco_monocyte_sig[["deseq"]][["ups"]][["v2fail_vs_cure"]]),
rownames(visit_cf_tumaco_monocyte_sig[["deseq"]][["downs"]][["v2fail_vs_cure"]]))
v3_tumaco_sig <- c(
rownames(visit_cf_tumaco_monocyte_sig[["deseq"]][["ups"]][["v2fail_vs_cure"]]),
rownames(visit_cf_tumaco_monocyte_sig[["deseq"]][["downs"]][["v2fail_vs_cure"]]))
monocyte_tumaco_visit_aucc_v2v1 <- calculate_aucc(v1fc_tumaco, tbl2=v2fc_tumaco,
py="deseq_adjp", ly="deseq_logfc")
dev <- pp(file="images/monocyte_visit_tumaco_v2v1_aucc.png")
monocyte_tumaco_visit_aucc_v2v1[["plot"]]
closed <- dev.off()
monocyte_tumaco_visit_aucc_v2v1[["plot"]]
monocyte_tumaco_visit_aucc_v3v1 <- calculate_aucc(v1fc_tumaco, tbl2=v3fc_tumaco,
py="deseq_adjp", ly="deseq_logfc")
dev <- pp(file="images/monocyte_visit_tumaco_v3v1_aucc.png")
monocyte_tumaco_visit_aucc_v3v1[["plot"]]
closed <- dev.off()
monocyte_tumaco_visit_aucc_v3v1[["plot"]]
monocyte_tumaco_visit_aucc_v3v2 <- calculate_aucc(v3fc_tumaco, tbl2=v2fc_tumaco,
py="deseq_adjp", ly="deseq_logfc")
dev <- pp(file="images/monocyte_visit_tumaco_v3v2_aucc.png")
monocyte_tumaco_visit_aucc_v3v2[["plot"]]
closed <- dev.off()
monocyte_tumaco_visit_aucc_v3v2[["plot"]]
a
visit_cf_neutrophil_de <- all_pairwise(visit_cf_neutrophil, model_batch = "svaseq", filter = TRUE)
## Removing 0 low-count genes (9210 remaining).
## Setting 1086 low elements to zero.
## transform_counts: Found 1086 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
visit_cf_neutrophil_tables <- combine_de_tables(
visit_cf_neutrophil_de,
keepers = visit_cf_contrasts,
excel = glue::glue("excel/visit_cf_neutrophil_tables-v{ver}.xlsx"))
## Deleting the file excel/visit_cf_neutrophil_tables-v202203.xlsx before writing the tables.
visit_cf_neutrophil_sig <- extract_significant_genes(
visit_cf_neutrophil_tables,
excel = glue::glue("excel/visit_cf_neutrophil_sig-v{ver}.xlsx"))
## Deleting the file excel/visit_cf_neutrophil_sig-v202203.xlsx before writing the tables.
visit_cf_eosinophil_de <- all_pairwise(visit_cf_eosinophil, model_batch = "svaseq", filter = TRUE)
## Removing 0 low-count genes (10685 remaining).
## Setting 596 low elements to zero.
## transform_counts: Found 596 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
visit_cf_eosinophil_tables <- combine_de_tables(
visit_cf_eosinophil_de,
keepers = visit_cf_contrasts,
excel = glue::glue("excel/visit_cf_eosinophil_tables-v{ver}.xlsx"))
## Deleting the file excel/visit_cf_eosinophil_tables-v202203.xlsx before writing the tables.
visit_cf_eosinophil_sig <- extract_significant_genes(
visit_cf_eosinophil_tables,
excel = glue::glue("excel/visit_cf_eosinophil_sig-v{ver}.xlsx"))
## Deleting the file excel/visit_cf_eosinophil_sig-v202203.xlsx before writing the tables.
Having put some SL read mapping information in the sample sheet, Maria Adelaida added a new column using it with the putative persistence state on a per-sample basis. One question which arised from that: what differences are observable between the persistent yes vs. no samples on a per-cell-type basis among the visit 3 samples.
First things first, create the datasets.
persistence_expt <- subset_expt(hs_clinical, subset = "persistence=='Y'|persistence=='N'") %>%
subset_expt(subset = 'visitnumber==3') %>%
set_expt_conditions(fact = 'persistence')
## subset_expt(): There were 162, now there are 94 samples.
## subset_expt(): There were 94, now there are 25 samples.
## persistence_biopsy <- subset_expt(persistence_expt, subset = "typeofcells=='biopsy'")
persistence_monocyte <- subset_expt(persistence_expt, subset = "typeofcells=='monocytes'")
## subset_expt(): There were 25, now there are 10 samples.
persistence_neutrophil <- subset_expt(persistence_expt, subset = "typeofcells=='neutrophils'")
## subset_expt(): There were 25, now there are 9 samples.
persistence_eosinophil <- subset_expt(persistence_expt, subset = "typeofcells=='eosinophils'")
## subset_expt(): There were 25, now there are 6 samples.
See if there are any patterns which look usable.
## All
persistence_norm <- normalize_expt(persistence_expt, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
## Removing 8563 low-count genes (11360 remaining).
## transform_counts: Found 35 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_norm)$plot
persistence_nb <- normalize_expt(persistence_expt, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)
## Removing 8563 low-count genes (11360 remaining).
## Setting 1234 low elements to zero.
## transform_counts: Found 1234 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_nb)$plot
## Biopsies
##persistence_biopsy_norm <- normalize_expt(persistence_biopsy, transform = "log2", convert = "cpm",
## norm = "quant", filter = TRUE)
##plot_pca(persistence_biopsy_norm)$plot
## Insufficient data
## Monocytes
persistence_monocyte_norm <- normalize_expt(persistence_monocyte, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
## Removing 9552 low-count genes (10371 remaining).
## transform_counts: Found 3 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_monocyte_norm)$plot
persistence_monocyte_nb <- normalize_expt(persistence_monocyte, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)
## Removing 9552 low-count genes (10371 remaining).
## Setting 44 low elements to zero.
## transform_counts: Found 44 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_monocyte_nb)$plot
## Neutrophils
persistence_neutrophil_norm <- normalize_expt(persistence_neutrophil, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
## Removing 11563 low-count genes (8360 remaining).
## transform_counts: Found 2 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_neutrophil_norm)$plot
persistence_neutrophil_nb <- normalize_expt(persistence_neutrophil, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)
## Removing 11563 low-count genes (8360 remaining).
## Setting 53 low elements to zero.
## transform_counts: Found 53 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_neutrophil_nb)$plot
## Eosinophils
persistence_eosinophil_norm <- normalize_expt(persistence_eosinophil, transform = "log2", convert = "cpm",
norm = "quant", filter = TRUE)
## Removing 9983 low-count genes (9940 remaining).
## transform_counts: Found 6 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_eosinophil_norm)$plot
persistence_eosinophil_nb <- normalize_expt(persistence_eosinophil, transform = "log2", convert = "cpm",
batch = "svaseq", filter = TRUE)
## Removing 9983 low-count genes (9940 remaining).
## Setting 19 low elements to zero.
## transform_counts: Found 19 values equal to 0, adding 1 to the matrix.
plot_pca(persistence_eosinophil_nb)$plot
persistence_de <- all_pairwise(persistence_expt, filter = TRUE, model_batch = "svaseq")
## Removing 0 low-count genes (11360 remaining).
## Setting 1234 low elements to zero.
## transform_counts: Found 1234 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
persistence_table <- combine_de_tables(
persistence_de,
excel = glue::glue("excel/persistence_all_de-v{ver}.xlsx"))
## Deleting the file excel/persistence_all_de-v202203.xlsx before writing the tables.
persistence_monocyte_de <- all_pairwise(persistence_monocyte, filter = TRUE, model_batch = "svaseq")
## Removing 0 low-count genes (10371 remaining).
## Setting 44 low elements to zero.
## transform_counts: Found 44 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
persistence_monocyte_table <- combine_de_tables(
persistence_monocyte_de,
excel = glue::glue("excel/persistence_monocyte_de-v{ver}.xlsx"))
## Deleting the file excel/persistence_monocyte_de-v202203.xlsx before writing the tables.
persistence_neutrophil_de <- all_pairwise(persistence_neutrophil, filter = TRUE, model_batch = "svaseq")
## Removing 0 low-count genes (8360 remaining).
## Setting 53 low elements to zero.
## transform_counts: Found 53 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
persistence_neutrophil_table <- combine_de_tables(
persistence_neutrophil_de,
excel = glue::glue("excel/persistence_neutrophil_de-v{ver}.xlsx"))
## Deleting the file excel/persistence_neutrophil_de-v202203.xlsx before writing the tables.
persistence_eosinophil_de <- all_pairwise(persistence_eosinophil, filter = TRUE, model_batch = "svaseq")
## Removing 0 low-count genes (9940 remaining).
## Setting 19 low elements to zero.
## transform_counts: Found 19 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
persistence_eosinophil_table <- combine_de_tables(
persistence_eosinophil_de,
excel = glue::glue("excel/persistence_eosinophil_de-v{ver}.xlsx"))
## Deleting the file excel/persistence_eosinophil_de-v202203.xlsx before writing the tables.
visit_all_de <- all_pairwise(visit_expt, filter = TRUE, model_batch = "svaseq")
## Removing 0 low-count genes (12055 remaining).
## Setting 13664 low elements to zero.
## transform_counts: Found 13664 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
visit_all_table <- combine_de_tables(
visit_all_de,
keepers = visit_contrasts,
excel = glue::glue("excel/visit_all_de-v{ver}.xlsx"))
## Deleting the file excel/visit_all_de-v202203.xlsx before writing the tables.
visit_all_sig <- extract_significant_genes(
visit_all_table,
excel = glue::glue("excel/visit_all_sig-v{ver}.xlsx"))
## Deleting the file excel/visit_all_sig-v202203.xlsx before writing the tables.
visit_all_saved <- save(list = "visit_all_table",
file = glue::glue("rda/visit_all_table-v{ver}.rda"))
visit_monocyte_de <- all_pairwise(visit_monocyte, filter = TRUE, model_batch = "svaseq")
## Removing 0 low-count genes (11056 remaining).
## Setting 1077 low elements to zero.
## transform_counts: Found 1077 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
visit_monocyte_table <- combine_de_tables(
visit_monocyte_de,
keepers = visit_contrasts,
excel = glue::glue("excel/visit_monocyte_de-v{ver}.xlsx"))
## Deleting the file excel/visit_monocyte_de-v202203.xlsx before writing the tables.
visit_monocyte_sig <- extract_significant_genes(
visit_monocyte_table,
excel = glue::glue("excel/visit_monocyte_sig-v{ver}.xlsx"))
## Deleting the file excel/visit_monocyte_sig-v202203.xlsx before writing the tables.
visit_neutrophil_de <- all_pairwise(visit_neutrophil, filter = TRUE, model_batch = "svaseq")
## Removing 0 low-count genes (9210 remaining).
## Setting 1087 low elements to zero.
## transform_counts: Found 1087 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
visit_neutrophil_table <- combine_de_tables(
visit_neutrophil_de,
keepers = visit_contrasts,
excel = glue::glue("excel/visit_neutrophil_de-v{ver}.xlsx"))
## Deleting the file excel/visit_neutrophil_de-v202203.xlsx before writing the tables.
visit_neutrophil_sig <- extract_significant_genes(
visit_neutrophil_table,
excel = glue::glue("excel/visit_neutrophil_sig-v{ver}.xlsx"))
## Deleting the file excel/visit_neutrophil_sig-v202203.xlsx before writing the tables.
visit_neutrophil_saved <- save(list = "visit_neutrophil_table",
file = glue::glue("rda/visit_neutrophil_table-v{ver}.rda"))
visit_eosinophil_de <- all_pairwise(visit_eosinophil, filter = TRUE, model_batch = "svaseq")
## Removing 0 low-count genes (10685 remaining).
## Setting 470 low elements to zero.
## transform_counts: Found 470 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
visit_eosinophil_table <- combine_de_tables(
visit_eosinophil_de,
keepers = visit_contrasts,
excel = glue::glue("excel/visit_eosinophil_de-v{ver}.xlsx"))
## Deleting the file excel/visit_eosinophil_de-v202203.xlsx before writing the tables.
visit_eosinophil_sig <- extract_significant_genes(
visit_eosinophil_table,
excel = glue::glue("excel/visit_eosinophil_sig-v{ver}.xlsx"))
## Deleting the file excel/visit_eosinophil_sig-v202203.xlsx before writing the tables.
visit_eosinophil_saved <- save(list = "visit_eosinophil_table",
file = glue::glue("rda/visit_eosinophil_table-v{ver}.rda"))
monocytes_by_place <- set_expt_conditions(monocyte_samples, fact="clinic")
eosinophils_by_place <- set_expt_conditions(eosinophil_samples, fact="clinic")
neutrophils_by_place <- set_expt_conditions(neutrophil_samples, fact="clinic")
biopsies_by_place <- set_expt_conditions(biopsy_samples, fact="clinic")
monocyte_place_de <- all_pairwise(monocytes_by_place, model_batch=TRUE)
## Using limma's removeBatchEffect to visualize with(out) batch inclusion.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
monocyte_table <- combine_de_tables(
monocyte_place_de,
excel="excel/monocytes_by_place-table.xlsx")
## Deleting the file excel/monocytes_by_place-table.xlsx before writing the tables.
monocyte_sig <- extract_significant_genes(
monocyte_table,
excel="excel/monocytes_by_place-sig.xlsx")
## Deleting the file excel/monocytes_by_place-sig.xlsx before writing the tables.
eosinophil_place_de <- all_pairwise(eosinophils_by_place, model_batch=TRUE)
## Using limma's removeBatchEffect to visualize with(out) batch inclusion.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
eosinophil_table <- combine_de_tables(
eosinophil_place_de,
excel="excel/eosinophils_by_place-table.xlsx")
## Deleting the file excel/eosinophils_by_place-table.xlsx before writing the tables.
eosinophil_sig <- extract_significant_genes(
eosinophil_table,
excel="excel/eosinophils_by_place-sig.xlsx")
## Deleting the file excel/eosinophils_by_place-sig.xlsx before writing the tables.
neutrophil_place_de <- all_pairwise(neutrophils_by_place, model_batch=TRUE)
## Using limma's removeBatchEffect to visualize with(out) batch inclusion.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
neutrophil_table <- combine_de_tables(
neutrophil_place_de,
excel="excel/neutrophils_by_place-table.xlsx")
## Deleting the file excel/neutrophils_by_place-table.xlsx before writing the tables.
neutrophil_sig <- extract_significant_genes(
neutrophil_table,
excel="excel/neutrophils_by_place-sig.xlsx")
## Deleting the file excel/neutrophils_by_place-sig.xlsx before writing the tables.
biopsy_place_de <- all_pairwise(biopsies_by_place, model_batch=TRUE)
## Using limma's removeBatchEffect to visualize with(out) batch inclusion.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
biopsy_table <- combine_de_tables(
biopsy_place_de,
excel="biopsies_by_place-table.xlsx")
## Deleting the file biopsies_by_place-table.xlsx before writing the tables.
biopsy_sig <- extract_significant_genes(
biopsy_table,
excel="biopsies_by_place-sig.xlsx")
## Deleting the file biopsies_by_place-sig.xlsx before writing the tables.
The gene sets we are most likely to consider are the results of the various preceding differential expression analyses.
In the context of cure vs. fail, we have mixed and matched the data in quite a few different ways.
For the moment, let us assume that the default definition of ‘significant’ is sufficient for these analyses, with the knowledge that is not likely to remain true.
The resulting data structures are organized as:
thing[[“deseq”]][[ups"]][[contrast]]
In addition, we have a few ontology-esque tools available, so let us mix and match and see what pops out.
cf_all_up <- cf_clinical_sig[["deseq"]][["ups"]][["fail_vs_cure"]]
dim(cf_all_up)
## [1] 75 58
cf_all_down <- cf_clinical_sig[["deseq"]][["downs"]][["fail_vs_cure"]]
dim(cf_all_down)
## [1] 24 58
cf_all_up_gp <- simple_gprofiler(cf_all_up)
## Performing gProfiler GO search of 75 genes against hsapiens.
## GO search found 49 hits.
## Performing gProfiler KEGG search of 75 genes against hsapiens.
## KEGG search found 16 hits.
## Performing gProfiler REAC search of 75 genes against hsapiens.
## REAC search found 6 hits.
## Performing gProfiler MI search of 75 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 75 genes against hsapiens.
## TF search found 9 hits.
## Performing gProfiler CORUM search of 75 genes against hsapiens.
## CORUM search found 0 hits.
## Performing gProfiler HP search of 75 genes against hsapiens.
## HP search found 0 hits.
cf_all_up_gp[["pvalue_plots"]][["reactome_plot_over"]]
cf_all_up_gp[["pvalue_plots"]][["kegg_plot_over"]]
cf_all_up_gp[["pvalue_plots"]][["mfp_plot_over"]]
cf_all_up_gp[["pvalue_plots"]][["bpp_plot_over"]]
cf_all_down_gp <- simple_gprofiler(cf_all_down)
## Performing gProfiler GO search of 24 genes against hsapiens.
## GO search found 9 hits.
## Performing gProfiler KEGG search of 24 genes against hsapiens.
## KEGG search found 0 hits.
## Performing gProfiler REAC search of 24 genes against hsapiens.
## REAC search found 0 hits.
## Performing gProfiler MI search of 24 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 24 genes against hsapiens.
## TF search found 0 hits.
## Performing gProfiler CORUM search of 24 genes against hsapiens.
## CORUM search found 4 hits.
## Performing gProfiler HP search of 24 genes against hsapiens.
## HP search found 0 hits.
cf_all_down_gp[["pvalue_plots"]][["reactome_plot_over"]]
## NULL
cf_all_down_gp[["pvalue_plots"]][["kegg_plot_over"]]
## NULL
cf_all_down_gp[["pvalue_plots"]][["mfp_plot_over"]]
## NULL
cf_all_down_gp[["pvalue_plots"]][["bpp_plot_over"]]
## NULL
There are no genes deemed significant in the biopsy comparisons of cure/fail. Oddly, when I looked manually, I thought I saw one candidate gene. In addition, when I stepped through the function manually I got the same gene…
Try again with monocytes.
cf_monocyte_up <- cf_monocyte_sva_sig[["deseq"]][["ups"]][["fail_vs_cure"]]
dim(cf_monocyte_up)
## [1] 30 58
cf_monocyte_down <- cf_monocyte_sva_sig[["deseq"]][["downs"]][["fail_vs_cure"]]
dim(cf_monocyte_down)
## [1] 19 58
cf_monocyte_up_gp <- simple_gprofiler(cf_monocyte_up)
## Performing gProfiler GO search of 30 genes against hsapiens.
## GO search found 33 hits.
## Performing gProfiler KEGG search of 30 genes against hsapiens.
## KEGG search found 8 hits.
## Performing gProfiler REAC search of 30 genes against hsapiens.
## REAC search found 12 hits.
## Performing gProfiler MI search of 30 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 30 genes against hsapiens.
## TF search found 2 hits.
## Performing gProfiler CORUM search of 30 genes against hsapiens.
## CORUM search found 0 hits.
## Performing gProfiler HP search of 30 genes against hsapiens.
## HP search found 35 hits.
cf_monocyte_up_gp[["pvalue_plots"]][["reactome_plot_over"]]
cf_monocyte_up_gp[["pvalue_plots"]][["kegg_plot_over"]]
cf_monocyte_up_gp[["pvalue_plots"]][["bpp_plot_over"]]
cf_monocyte_down_gp <- simple_gprofiler(cf_monocyte_down)
## Performing gProfiler GO search of 19 genes against hsapiens.
## GO search found 0 hits.
## Performing gProfiler KEGG search of 19 genes against hsapiens.
## KEGG search found 1 hits.
## Performing gProfiler REAC search of 19 genes against hsapiens.
## REAC search found 0 hits.
## Performing gProfiler MI search of 19 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 19 genes against hsapiens.
## TF search found 0 hits.
## Performing gProfiler CORUM search of 19 genes against hsapiens.
## CORUM search found 0 hits.
## Performing gProfiler HP search of 19 genes against hsapiens.
## HP search found 0 hits.
cf_monocyte_down_gp[["pvalue_plots"]][["reactome_plot_over"]]
## NULL
cf_monocyte_down_gp[["pvalue_plots"]][["kegg_plot_over"]]
cf_neutrophil_up <- cf_neutrophil_sig[["deseq"]][["ups"]][["fail_vs_cure"]]
dim(cf_neutrophil_up)
## [1] 24 58
cf_neutrophil_down <- cf_neutrophil_sig[["deseq"]][["downs"]][["fail_vs_cure"]]
dim(cf_neutrophil_down)
## [1] 17 58
cf_neutrophil_up_gp <- simple_gprofiler(cf_neutrophil_up)
## Performing gProfiler GO search of 24 genes against hsapiens.
## GO search found 0 hits.
## Performing gProfiler KEGG search of 24 genes against hsapiens.
## KEGG search found 0 hits.
## Performing gProfiler REAC search of 24 genes against hsapiens.
## REAC search found 0 hits.
## Performing gProfiler MI search of 24 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 24 genes against hsapiens.
## TF search found 12 hits.
## Performing gProfiler CORUM search of 24 genes against hsapiens.
## CORUM search found 1 hits.
## Performing gProfiler HP search of 24 genes against hsapiens.
## HP search found 0 hits.
cf_neutrophil_down_gp <- simple_gprofiler(cf_neutrophil_down)
## Performing gProfiler GO search of 17 genes against hsapiens.
## GO search found 0 hits.
## Performing gProfiler KEGG search of 17 genes against hsapiens.
## KEGG search found 1 hits.
## Performing gProfiler REAC search of 17 genes against hsapiens.
## REAC search found 0 hits.
## Performing gProfiler MI search of 17 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 17 genes against hsapiens.
## TF search found 0 hits.
## Performing gProfiler CORUM search of 17 genes against hsapiens.
## CORUM search found 0 hits.
## Performing gProfiler HP search of 17 genes against hsapiens.
## HP search found 0 hits.
cf_neutrophil_down_gp[["pvalue_plots"]][["kegg_plot_over"]]
cf_eosinophil_up <- cf_eosinophil_sig[["deseq"]][["ups"]][["fail_vs_cure"]]
dim(cf_eosinophil_up)
## [1] 111 58
cf_eosinophil_down <- cf_eosinophil_sig[["deseq"]][["downs"]][["fail_vs_cure"]]
dim(cf_eosinophil_down)
## [1] 68 58
cf_eosinophil_up_gp <- simple_gprofiler(cf_eosinophil_up)
## Performing gProfiler GO search of 111 genes against hsapiens.
## GO search found 123 hits.
## Performing gProfiler KEGG search of 111 genes against hsapiens.
## KEGG search found 8 hits.
## Performing gProfiler REAC search of 111 genes against hsapiens.
## REAC search found 9 hits.
## Performing gProfiler MI search of 111 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 111 genes against hsapiens.
## TF search found 27 hits.
## Performing gProfiler CORUM search of 111 genes against hsapiens.
## CORUM search found 5 hits.
## Performing gProfiler HP search of 111 genes against hsapiens.
## HP search found 0 hits.
cf_eosinophil_up_gp[["pvalue_plots"]][["kegg_plot_over"]]
cf_eosinophil_up_gp[["pvalue_plots"]][["reactome_plot_over"]]
cf_eosinophil_up_gp[["pvalue_plots"]][["mfp_plot_over"]]
cf_eosinophil_up_gp[["pvalue_plots"]][["bpp_plot_over"]]
cf_eosinophil_down_gp <- simple_gprofiler(cf_eosinophil_down)
## Performing gProfiler GO search of 68 genes against hsapiens.
## GO search found 27 hits.
## Performing gProfiler KEGG search of 68 genes against hsapiens.
## KEGG search found 3 hits.
## Performing gProfiler REAC search of 68 genes against hsapiens.
## REAC search found 2 hits.
## Performing gProfiler MI search of 68 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 68 genes against hsapiens.
## TF search found 3 hits.
## Performing gProfiler CORUM search of 68 genes against hsapiens.
## CORUM search found 0 hits.
## Performing gProfiler HP search of 68 genes against hsapiens.
## HP search found 0 hits.
cf_eosinophil_down_gp[["pvalue_plots"]][["reactome_plot_over"]]
cf_eosinophil_down_gp[["pvalue_plots"]][["mfp_plot_over"]]
cf_eosinophil_down_gp[["pvalue_plots"]][["bpp_plot_over"]]
cf_nobiopsy_up <- cf_nobiopsy_sig[["deseq"]][["ups"]][["fail_vs_cure"]]
dim(cf_nobiopsy_up)
## [1] 85 58
cf_nobiopsy_down <- cf_nobiopsy_sig[["deseq"]][["downs"]][["fail_vs_cure"]]
dim(cf_nobiopsy_down)
## [1] 53 58
cf_nobiopsy_up_gp <- simple_gprofiler(cf_nobiopsy_up)
## Performing gProfiler GO search of 85 genes against hsapiens.
## GO search found 43 hits.
## Performing gProfiler KEGG search of 85 genes against hsapiens.
## KEGG search found 5 hits.
## Performing gProfiler REAC search of 85 genes against hsapiens.
## REAC search found 6 hits.
## Performing gProfiler MI search of 85 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 85 genes against hsapiens.
## TF search found 3 hits.
## Performing gProfiler CORUM search of 85 genes against hsapiens.
## CORUM search found 0 hits.
## Performing gProfiler HP search of 85 genes against hsapiens.
## HP search found 0 hits.
cf_nobiopsy_up_gp[["pvalue_plots"]][["kegg_plot_over"]]
cf_nobiopsy_up_gp[["pvalue_plots"]][["reactome_plot_over"]]
cf_nobiopsy_up_gp[["pvalue_plots"]][["mfp_plot_over"]]
## NULL
cf_nobiopsy_up_gp[["pvalue_plots"]][["bpp_plot_over"]]
cf_nobiopsy_down_gp <- simple_gprofiler(cf_nobiopsy_down)
## Performing gProfiler GO search of 53 genes against hsapiens.
## GO search found 1 hits.
## Performing gProfiler KEGG search of 53 genes against hsapiens.
## KEGG search found 0 hits.
## Performing gProfiler REAC search of 53 genes against hsapiens.
## REAC search found 0 hits.
## Performing gProfiler MI search of 53 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 53 genes against hsapiens.
## TF search found 2 hits.
## Performing gProfiler CORUM search of 53 genes against hsapiens.
## CORUM search found 6 hits.
## Performing gProfiler HP search of 53 genes against hsapiens.
## HP search found 2 hits.
cf_nobiopsy_down_gp[["pvalue_plots"]][["reactome_plot_over"]]
## NULL
cf_nobiopsy_down_gp[["pvalue_plots"]][["mfp_plot_over"]]
## NULL
cf_nobiopsy_down_gp[["pvalue_plots"]][["bpp_plot_over"]]
## NULL
hs_celltype_gsva_c2 <- simple_gsva(hs_valid)
## Converting the rownames() of the expressionset to ENTREZID.
## 588 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19923 entries.
## After conversion, the expressionset has 19495 entries.
hs_celltype_gsva_c2_sig <- get_sig_gsva_categories(
hs_celltype_gsva_c2,
excel="excel/individual_celltypes_gsva_c2.xlsx")
## Starting limma pairwise comparison.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Choosing the non-intercept containing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/1: Creating table: failure_vs_cure. Adjust = BH
## Limma step 6/6: 1/2: Creating table: cure. Adjust = BH
## Limma step 6/6: 2/2: Creating table: failure. Adjust = BH
## The factor cure has 104 rows.
## The factor failure has 58 rows.
## plot labels was not set and there are more than 100 samples, disabling it.
## Testing each factor against the others.
## Scoring cure against everything else.
## Scoring failure against everything else.
## Deleting the file excel/individual_celltypes_gsva_c2.xlsx before writing the tables.
hs_celltype_gsva_c2_sig$subset_plot
hs_celltype_gsva_c2_sig$score_plot
broad_c7 <- load_gmt_signatures(signatures="reference/msigdb/c7.all.v7.2.entrez.gmt",
signature_category="c7")
hs_celltype_gsva_c7 <- simple_gsva(hs_valid,
signatures="reference/msigdb/c7.all.v7.2.entrez.gmt",
signature_category="c7",
msig_xml="reference/msigdb_v7.2.xml",
cores=10)
## Converting the rownames() of the expressionset to ENTREZID.
## 588 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19923 entries.
## After conversion, the expressionset has 19495 entries.
## Adding annotations from reference/msigdb_v7.2.xml.
hs_celltype_gsva_c7_sig <- get_sig_gsva_categories(
hs_celltype_gsva_c7,
excel="excel/individual_celltypes_gsva_c7.xlsx")
## Starting limma pairwise comparison.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Limma step 1/6: choosing model.
## Choosing the non-intercept containing model.
## Assuming this data is similar to a micro array and not performign voom.
## Limma step 3/6: running lmFit with method: ls.
## Limma step 4/6: making and fitting contrasts with no intercept. (~ 0 + factors)
## Limma step 5/6: Running eBayes with robust = FALSE and trend = FALSE.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/1: Creating table: failure_vs_cure. Adjust = BH
## Limma step 6/6: 1/2: Creating table: cure. Adjust = BH
## Limma step 6/6: 2/2: Creating table: failure. Adjust = BH
## The factor cure has 104 rows.
## The factor failure has 58 rows.
## plot labels was not set and there are more than 100 samples, disabling it.
## Testing each factor against the others.
## Scoring cure against everything else.
## Scoring failure against everything else.
## Deleting the file excel/individual_celltypes_gsva_c7.xlsx before writing the tables.
## The raw heatmap of the C2 values
hs_celltype_gsva_c2_sig[["raw_plot"]]
## The 'significance' scores of the C2 values
hs_celltype_gsva_c2_sig[["score_plot"]]
## The subset of scores for categories deemed significantly different.
hs_celltype_gsva_c2_sig[["subset_plot"]]
## The raw heatmap of the C7 values
hs_celltype_gsva_c7_sig[["raw_plot"]]
## The 'significance' scores of the C7 values
hs_celltype_gsva_c7_sig[["score_plot"]]
## The subset of scores for categories deemed significantly different.
hs_celltype_gsva_c7_sig[["subset_plot"]]
Let us compare various results to see how well they agreed
monocyte_cor_subset <- cf_monocyte_sva_tables[["data"]][["fail_vs_cure"]][, c("deseq_logfc", "deseq_adjp")]
neutrophil_cor_subset <- cf_neutrophil_tables[["data"]][["fail_vs_cure"]][, c("deseq_logfc", "deseq_adjp")]
eosinophil_cor_subset <- cf_eosinophil_tables[["data"]][["fail_vs_cure"]][, c("deseq_logfc", "deseq_adjp")]
mono_neut_cor <- merge(monocyte_cor_subset, neutrophil_cor_subset, by="row.names")
mono_eo_cor <- merge(monocyte_cor_subset, eosinophil_cor_subset, by="row.names")
neut_eo_cor <- merge(neutrophil_cor_subset, eosinophil_cor_subset, by="row.names")
cor.test(mono_neut_cor[["deseq_logfc.x"]], mono_neut_cor[["deseq_logfc.y"]])
##
## Pearson's product-moment correlation
##
## data: mono_neut_cor[["deseq_logfc.x"]] and mono_neut_cor[["deseq_logfc.y"]]
## t = 41, df = 8722, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3845 0.4197
## sample estimates:
## cor
## 0.4022
monocyte_neutrophil_aucc <- calculate_aucc(cf_monocyte_sva_tables[["data"]][["fail_vs_cure"]],
cf_neutrophil_tables[["data"]][["fail_vs_cure"]],
px="deseq_p", py="deseq_p", ly="deseq_logfc", lx="deseq_logfc")
monocyte_neutrophil_aucc$plot
cor.test(mono_eo_cor[["deseq_logfc.x"]], mono_eo_cor[["deseq_logfc.y"]])
##
## Pearson's product-moment correlation
##
## data: mono_eo_cor[["deseq_logfc.x"]] and mono_eo_cor[["deseq_logfc.y"]]
## t = 44, df = 9953, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3858 0.4187
## sample estimates:
## cor
## 0.4024
monocyte_eosinophil_aucc <- calculate_aucc(cf_monocyte_sva_tables[["data"]][["fail_vs_cure"]],
cf_eosinophil_tables[["data"]][["fail_vs_cure"]],
px="deseq_p", py="deseq_p", ly="deseq_logfc", lx="deseq_logfc")
monocyte_eosinophil_aucc$plot
cor.test(neut_eo_cor[["deseq_logfc.x"]], neut_eo_cor[["deseq_logfc.y"]])
##
## Pearson's product-moment correlation
##
## data: neut_eo_cor[["deseq_logfc.x"]] and neut_eo_cor[["deseq_logfc.y"]]
## t = 31, df = 8689, p-value <2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2998 0.3376
## sample estimates:
## cor
## 0.3188
neutrophil_eosinophil_aucc <- calculate_aucc(cf_neutrophil_tables[["data"]][["fail_vs_cure"]],
cf_eosinophil_tables[["data"]][["fail_vs_cure"]],
px="deseq_p", py="deseq_p", ly="deseq_logfc", lx="deseq_logfc")
neutrophil_eosinophil_aucc$plot
I wrote out all the z2.2 and z2.3 specific variants to a couple files, I want to see if I can classify a human sample as infected with 2.2 or 2.3.
z22 <- read.csv("csv/variants_22.csv")
z23 <- read.csv("csv/variants_23.csv")
cure <- read.csv("csv/cure_variants.txt")
fail <- read.csv("csv/fail_variants.txt")
z22_vec <- gsub(pattern="\\-", replacement="_", x=z22[["x"]])
z23_vec <- gsub(pattern="\\-", replacement="_", x=z23[["x"]])
cure_vec <- gsub(pattern="\\-", replacement="_", x=cure)
fail_vec <- gsub(pattern="\\-", replacement="_", x=fail)
classify_zymo <- function(sample) {
arbitrary_tags <- sm(readr::read_tsv(sample))
arbitrary_ids <- arbitrary_tags[["position"]]
message("Length: ", length(arbitrary_ids), ", z22: ",
sum(arbitrary_ids %in% z22_vec) / (length(z22_vec)), " z23: ",
sum(arbitrary_ids %in% z23_vec) / (length(z23_vec)))
}
arbitrary_sample <- "preprocessing/TMRC30156/outputs/40freebayes_lpanamensis_v36/all_tags.txt.xz"
classify_zymo(arbitrary_sample)
if (!isTRUE(get0("skip_load"))) {
pander::pander(sessionInfo())
message(paste0("This is hpgltools commit: ", get_git_commit()))
message(paste0("Saving to ", savefile))
tmp <- sm(saveme(filename=savefile))
}
tmp <- loadme(filename=savefile)