I think there have been some new/improved annotations in the online sample sheet, so let us go download a fresh copy and see.
samplesheet <- "sample_sheets/tmrc3_samples_20211012.xlsx"
crf_metadata <- "sample_sheets/20210825_EXP_ESPECIAL_TMRC3_VERSION_2.xlsx"
We take the annotation data from ensembl’s biomart instance. The genome which was used to map the data was hg38 revision 100. My default when using biomart is to load the data from 1 year before the current date.
hs_annot <- sm(load_biomart_annotations(year="2020"))
hs_annot <- hs_annot[["annotation"]]
hs_annot[["transcript"]] <- paste0(rownames(hs_annot), ".", hs_annot[["version"]])
rownames(hs_annot) <- make.names(hs_annot[["ensembl_gene_id"]], unique=TRUE)
tx_gene_map <- hs_annot[, c("transcript", "ensembl_gene_id")]
summary(hs_annot)
## ensembl_transcript_id ensembl_gene_id version transcript_version
## Length:227921 Length:227921 Min. : 1.0 Min. : 1.00
## Class :character Class :character 1st Qu.: 6.0 1st Qu.: 1.00
## Mode :character Mode :character Median :12.0 Median : 1.00
## Mean :10.7 Mean : 3.08
## 3rd Qu.:16.0 3rd Qu.: 5.00
## Max. :29.0 Max. :17.00
##
## hgnc_symbol description gene_biotype cds_length
## Length:227921 Length:227921 Length:227921 Min. : 3
## Class :character Class :character Class :character 1st Qu.: 357
## Mode :character Mode :character Mode :character Median : 694
## Mean : 1139
## 3rd Qu.: 1446
## Max. :107976
## NA's :127343
## chromosome_name strand start_position end_position
## Length:227921 Length:227921 Min. :5.77e+02 Min. :6.47e+02
## Class :character Class :character 1st Qu.:3.11e+07 1st Qu.:3.12e+07
## Mode :character Mode :character Median :6.04e+07 Median :6.06e+07
## Mean :7.41e+07 Mean :7.42e+07
## 3rd Qu.:1.09e+08 3rd Qu.:1.09e+08
## Max. :2.49e+08 Max. :2.49e+08
##
## transcript
## Length:227921
## Class :character
## Mode :character
##
##
##
##
hs_go <- sm(load_biomart_go()[["go"]])
hs_length <- hs_annot[, c("ensembl_gene_id", "cds_length")]
colnames(hs_length) <- c("ID", "length")
This document is intended to provide an overview of TMRC3 samples which have been sequenced. It includes some plots and analyses showing the relationships among the samples as well as some differential analyses when possible.
The sample sheet is copied from our shared online sheet and updated with each release of sequencing data.
The first thing to note is the large range in coverage. There are multiple samples with coverage which is too low to use. These will be removed shortly.
In the following block I immediately exclude any non-coding reads as well.
## Create the expressionset and immediately pass it to a filter
## removing the non protein coding genes.
sanitize_columns <- c("visitnumber", "clinicaloutcome", "donor",
"typeofcells", "clinicalpresentation",
"condition", "batch")
hs_expt <- create_expt(samplesheet,
file_column="hg38100hisatfile",
savefile=glue::glue("rda/hs_expt_all-v{ver}.rda"),
gene_info=hs_annot) %>%
exclude_genes_expt(column="gene_biotype", method="keep",
pattern="protein_coding", meta_column="ncrna_lost") %>%
sanitize_expt_metadata(columns=sanitize_columns) %>%
set_expt_factors(columns=sanitize_columns, class="factor")
## Reading the sample metadata.
## Dropped 54 rows from the sample metadata because they were blank.
## The sample definitions comprises: 238 rows(samples) and 75 columns(metadata fields).
## Warning in create_expt(samplesheet, file_column = "hg38100hisatfile", savefile =
## glue::glue("rda/hs_expt_all-v{ver}.rda"), : Some samples were removed when cross
## referencing the samples against the count data.
## Matched 21452 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## The final expressionset has 21481 rows and 205 columns.
## Before removal, there were 21481 genes, now there are 19941.
## There are 19 samples which kept less than 90 percent counts.
## TMRC30015 TMRC30017 TMRC30019 TMRC30044 TMRC30045 TMRC30154 TMRC30097 TMRC30075
## 79.24 85.72 89.75 80.34 73.33 83.20 89.90 86.97
## TMRC30087 TMRC30101 TMRC30104 TMRC30114 TMRC30126 TMRC30127 TMRC30120 TMRC30128
## 83.63 88.41 80.29 87.62 84.52 89.49 79.16 82.53
## TMRC30141 TMRC30131 TMRC30073
## 89.40 86.82 89.26
levels(pData(hs_expt[["expressionset"]])[["visitnumber"]]) <- list(
'0'="notapplicable", '1'=1, '2'=2, '3'=3)
Let us also merge in the clinician’s metadata. I worry a little that this might not be allowed for dbGap data, but if it is a problem I suspect we can just remove the bad columns from it. Also note that I rarely use the join function, but it is somewhat required here because I do not want to risk shuffling the metadata when I add the new metadata, which comes from a spreadsheet sorted by patient, not sample. In doing this I therefore just created a new column ‘join’ which contains the shared information, e.g. the patient ID from the existing metadata and the same ID from the CRF file which has been coerced into lowercase.
hs_pd <- pData(hs_expt)
start <- rownames(hs_pd)
hs_crf <- openxlsx::read.xlsx(crf_metadata)
hs_crf[["join"]] <- tolower(hs_crf[["codigo_paciente"]])
hs_pd[["join"]] <- hs_pd[["tubelabelorigin"]]
test <- plyr::join(hs_pd, hs_crf, by="join")
test[["join"]] <- NULL
rownames(test) <- rownames(hs_pd)
na_idx <- is.na(test)
test[na_idx] <- "undefined"
pData(hs_expt) <- test
This added a bunch of new columns, of which there are a few which Theresa showed are of likely interest:
Split this data into CDS and lncRNA. Oh crap in order to do that I need to recount the data. Running now (20210518)
## lnc_expt <- create_expt(samplesheet,
## file_column="hg38100lncfile",
## gene_info=hs_annot)
Once the data was loaded, there are a couple of metrics which may be plotted immediately.
nonzero <- plot_nonzero(hs_expt)
nonzero$plot
## Warning: ggrepel: 182 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
ncrna_lost_df <- as.data.frame(pData(hs_expt)[["ncrna_lost"]])
rownames(ncrna_lost_df) <- rownames(pData(hs_expt))
colnames(ncrna_lost_df) <- "ncrna_lost"
tmpdf <- merge(nonzero$table, ncrna_lost_df, by="row.names")
rownames(tmpdf) <- tmpdf[["Row.names"]]
tmpdf[["Row.names"]] <- NULL
ggplot(tmpdf, aes(x=ncrna_lost, y=nonzero_genes)) +
ggplot2::geom_point() +
ggplot2::ggtitle("Nonzero genes with respect to percent counts
lost when ncRNA was removed.")
Najib doesn’t want this plot, but I am using it to check new samples, so will hide it from general use.
libsize <- plot_libsize(hs_expt)
libsize$plot
I arbitrarily chose 11,000 non-zero genes as a minimum. We may want this to be higher.
hs_valid <- subset_expt(hs_expt, nonzero=11000)
## The samples (and read coverage) removed when filtering 11000 non-zero genes are:
## TMRC30010 TMRC30050 TMRC30052
## 52471 808149 3087347
## subset_expt(): There were 205, now there are 202 samples.
valid_write <- sm(write_expt(hs_valid, excel=glue("excel/hs_valid-v{ver}.xlsx")))
## Error in quantile.default(prop_median, p = c(1, 3)/4) :
## missing values and NaN's not allowed if 'na.rm' is FALSE
## Error in if (sum(warning_idx) > 0) { :
## missing value where TRUE/FALSE needed
The project seeks to determine the relationship of the innate immune response and inflammatory signaling to the clinical outcome of antileishmanial drug treatment. We will test the hypothesis that the profile of innate immune cell activation and their dynamics through the course of treatment differ between CL patients with prospectively determined therapeutic cure or failure.
This will be achieved through the characterization of the in vivo dynamics of blood-derived monocyte, neutrophil and eosinophil transcriptome before, during and at the end of treatment in CL patients. Cell-type specific transcriptomes, composite signatures and time-response expression profiles will be contrasted among patients with therapeutic cure or failure.
To address these, I added to the end of the sample sheet columns named ‘condition’, ‘batch’, ‘donor’, and ‘time’. These are filled in with shorthand values according to the above.
Before addressing the questions explicitly by subsetting the data, I want to get a look at the samples as they are.
new_names <- pData(hs_valid)[["samplename"]]
hs_valid <- hs_valid %>%
set_expt_batches(fact="cellssource") %>%
set_expt_conditions(fact="typeofcells") %>%
set_expt_samplenames(newnames=new_names)
all_norm <- sm(normalize_expt(hs_valid, transform="log2", norm="quant",
convert="cpm", filter=TRUE))
all_pca <- plot_pca(all_norm, plot_labels=FALSE,
plot_title="PCA - Cell type", size_column="visitnumber")
pp(file=glue("images/tmrc3_pca_nolabels-v{ver}.png"), image=all_pca$plot)
write.csv(all_pca$table, file="coords/hs_donor_pca_coords.csv")
plot_corheat(all_norm, plot_title="Heirarchical clustering:
cell types")$plot
Now let us consider only the samples for which we have a clinical outcome. These fall primarily into either ‘cured’ or ‘failed’, but some people have not yet returned to the clinic after the first or second visit. These are deemed ‘lost’.
hs_clinical <- hs_valid %>%
set_expt_conditions(fact="clinicaloutcome") %>%
set_expt_batches(fact="typeofcells") %>%
subset_expt(subset="typeofcells!='pbmcs'&typeofcells!='macrophages'")
## subset_expt(): There were 202, now there are 182 samples.
chosen_colors <- c("#D95F02", "#7570B3", "#1B9E77", "#FF0000", "#FF0000")
names(chosen_colors) <- c("cure", "failure", "lost", "null", "notapplicable")
hs_clinical <- set_expt_colors(hs_clinical, colors=chosen_colors)
## Warning in set_expt_colors(hs_clinical, colors = chosen_colors): Colors for the
## following categories are not being used: null.
newnames <- make.names(pData(hs_clinical)[["samplename"]], unique=TRUE)
hs_clinical <- set_expt_samplenames(hs_clinical, newnames=newnames)
hs_clinical_norm <- sm(normalize_expt(hs_clinical, filter=TRUE, transform="log2",
convert="cpm", norm="quant"))
clinical_pca <- plot_pca(hs_clinical_norm, plot_labels=FALSE,
size_column="visitnumber", cis=NULL,
plot_title="PCA - clinical samples")
pp(file=glue("images/all_clinical_nobatch_pca-v{ver}.png"), image=clinical_pca$plot,
height=8, width=20)
hs_clinical_nobiop <- hs_clinical %>%
subset_expt(subset="typeofcells!='biopsy'") %>%
subset_expt(subset="condition=='lost'|condition=='cure'|condition=='failure'")
## subset_expt(): There were 182, now there are 130 samples.
## subset_expt(): There were 130, now there are 128 samples.
hs_clinical_nobiop_norm <- sm(normalize_expt(hs_clinical_nobiop, filter=TRUE, transform="log2",
convert="cpm", norm="quant"))
clinical_nobiop_pca <- plot_pca(hs_clinical_nobiop_norm, plot_labels=FALSE, cis=NULL,
plot_title="PCA - clinical samples without biopsies")
pp(file=glue("images/all_clinical_nobiop_nobatch_pca-v{ver}.png"),
image=clinical_nobiop_pca$plot)
At this time we have two primary data structures of interest: hs_clinical and hs_clinical_nobiop
hs_clinical_nb <- normalize_expt(hs_clinical, filter=TRUE, batch="svaseq",
transform="log2", convert="cpm")
## Removing 5216 low-count genes (14725 remaining).
## batch_counts: Before batch/surrogate estimation, 198077 entries are x==0: 7%.
## batch_counts: Before batch/surrogate estimation, 528392 entries are 0<x<1: 20%.
## Setting 27938 low elements to zero.
## transform_counts: Found 27938 values equal to 0, adding 1 to the matrix.
clinical_batch_pca <- plot_pca(hs_clinical_nb, plot_labels=FALSE, cis=NULL,
size_column="visitnumber", plot_title="PCA - clinical samples")
clinical_batch_pca$plot
hs_clinical_nobiop_nb <- sm(normalize_expt(hs_clinical_nobiop, filter=TRUE, batch="svaseq",
transform="log2", convert="cpm"))
clinical_nobiop_batch_pca <- plot_pca(hs_clinical_nobiop_nb,
plot_title="PCA - clinical samples without biopsies",
plot_labels=FALSE)
pp(file="images/clinical_batch.png", image=clinical_nobiop_batch_pca$plot)
test <- plot_pca(hs_clinical_nobiop_nb, size_column="visitnumber",
plot_title="PCA - clinical samples without biopsies",
plot_labels=FALSE)
test$plot
clinical_nobiop_batch_tsne <- plot_tsne(hs_clinical_nobiop_nb,
plot_title="tSNE - clinical samples without biopsies",
plot_labels=FALSE)
clinical_nobiop_batch_tsne$plot
individual_celltypes <- subset_expt(hs_clinical_nobiop, subset="condition!='lost'")
## subset_expt(): There were 128, now there are 115 samples.
hs_clinic_de <- sm(all_pairwise(individual_celltypes, model_batch="svaseq", filter=TRUE))
hs_clinic_table <- sm(combine_de_tables(
hs_clinic_de,
excel=glue::glue("excel/individual_celltypes_table-v{ver}.xlsx")))
hs_clinic_sig <- sm(extract_significant_genes(
hs_clinic_table,
excel=glue::glue("excel/individual_celltypes_sig-v{ver}.xlsx")))
hs_clinic_sig[["summary_df"]]
## limma_up limma_down edger_up edger_down deseq_up deseq_down
## failure_vs_cure 66 65 149 82 147 85
## ebseq_up ebseq_down basic_up basic_down
## failure_vs_cure 79 151 27 16
hs_clinic_de[["comparison"]][["heat"]]
## NULL
I am not sure if we have enough samples across the three visit to completely work as well as we would like, but there is only 1 way to find out! Now that I think about it, one thing which might be awesome is to use cell type as an interacting factor…
I figure this might be a place where the biopsy samples might prove useful.
clinical_nolost <- subset_expt(hs_clinical, subset="condition!='lost'&condition!='notapplicable'")
## subset_expt(): There were 182, now there are 165 samples.
## Currently (202109), this is not returning any significant hits.
## In previous iterations it did.
lrt_visit_clinical_test <- sm(deseq_lrt(clinical_nolost, transform="vst",
interactor_column="visitnumber",
interest_column="clinicaloutcome"))
summary(lrt_visit_clinical_test[["deseq_table"]])
## gene baseMean log2FoldChange lfcSE
## Length:19941 Min. : 0 Min. :-4.5 Min. :0.0
## Class :character 1st Qu.: 6 1st Qu.:-0.1 1st Qu.:0.4
## Mode :character Median : 186 Median : 0.1 Median :0.6
## Mean : 1771 Mean : 0.1 Mean :1.0
## 3rd Qu.: 1107 3rd Qu.: 0.3 3rd Qu.:1.0
## Max. :355344 Max. : 8.0 Max. :7.0
## NA's :1624 NA's :1624
## stat pvalue padj
## Min. :-1.6 Min. :0.0 Min. :0.4
## 1st Qu.: 0.1 1st Qu.:0.6 1st Qu.:1.0
## Median : 0.4 Median :0.8 Median :1.0
## Mean : 0.7 Mean :0.8 Mean :1.0
## 3rd Qu.: 0.9 3rd Qu.:0.9 3rd Qu.:1.0
## Max. :21.6 Max. :1.0 Max. :1.0
## NA's :1624 NA's :1624 NA's :1625
written <- write_xlsx(data=as.data.frame(lrt_visit_clinical_test[["deseq_table"]]),
excel=glue::glue("excel/lrt_clinical_visit-v{ver}.xlsx"))
lrt_celltype_clinical_test <- sm(deseq_lrt(clinical_nolost, transform="vst",
interactor_column="typeofcells",
interest_column="clinicaloutcome"))
summary(lrt_celltype_clinical_test[["favorite_genes"]])
## genes cluster
## Length:556 Min. : 1.00
## Class :character 1st Qu.: 2.00
## Mode :character Median : 5.00
## Mean : 6.39
## 3rd Qu.:10.00
## Max. :17.00
favorite_lrt_df <- merge(hs_annot, lrt_celltype_clinical_test[["favorite_genes"]], all.y=TRUE,
by="row.names")
written <- write_xlsx(data=favorite_lrt_df,
excel=glue::glue("excel/lrt_clinical_celltype_favorites-v{ver}.xlsx"))
## Error : colNames must be a unique vector (case sensitive)
deseq_lrt_df <- merge(hs_annot, as.data.frame(lrt_celltype_clinical_test[["deseq_table"]]), all.y=TRUE,
by="row.names")
rownames(deseq_lrt_df) <- deseq_lrt_df[["Row.names"]]
deseq_lrt_df[["Row.names"]] <- NULL
written <- write_xlsx(data=deseq_lrt_df,
excel=glue::glue("excel/lrt_clinical_celltype-v{ver}.xlsx"))
lrt_celltype_clinical_test$cluster_data$plot
cluster_23_genes_idx <- lrt_celltype_clinical_test[["cluster_data"]][["df"]][["cluster"]] == 23
cluster_23_genes <- lrt_celltype_clinical_test[["cluster_data"]][["df"]][cluster_23_genes_idx, ]
cluster_23_genes
## [1] genes cluster
## <0 rows> (or 0-length row.names)
cluster_24_genes_idx <- lrt_celltype_clinical_test[["cluster_data"]][["df"]][["cluster"]] == 24
cluster_24_genes <- lrt_celltype_clinical_test[["cluster_data"]][["df"]][cluster_24_genes_idx, ]
cluster_24_genes
## [1] genes cluster
## <0 rows> (or 0-length row.names)
cluster_22_genes_idx <- lrt_celltype_clinical_test[["cluster_data"]][["df"]][["cluster"]] == 22
cluster_22_genes <- lrt_celltype_clinical_test[["cluster_data"]][["df"]][cluster_22_genes_idx, ]
cluster_22_genes
## [1] genes cluster
## <0 rows> (or 0-length row.names)
A good suggestion from Theresa was to examine only the most variant genes from failure vs. cure and see how they change the clustering/etc results. This is my attempt to address this query.
hs_clinic_topn <- sm(extract_significant_genes(hs_clinic_table, n=100))
table <- "failure_vs_cure"
wanted <- rbind(hs_clinic_topn[["deseq"]][["ups"]][[table]],
hs_clinic_topn[["deseq"]][["downs"]][[table]])
small_expt <- exclude_genes_expt(hs_clinical_nobiop, ids=rownames(wanted), method="keep")
## Before removal, there were 19941 genes, now there are 185.
## There are 128 samples which kept less than 90 percent counts.
## X1034n1 X1034n2 X1034m2 X1034m2. X2052e1 X2052m2 X2052n2 X2052m3
## 2.2274 2.6091 1.1142 1.0829 0.4526 0.5752 1.4151 0.6020
## X2052n3 X2065m1 X2065n1 X2066m1 X2066n1 X2065m2 X2065n2 X2065e2
## 1.1405 0.9576 2.7549 0.3155 0.7269 0.4954 0.3815 1.1328
## X2066m2 X2066n2 X2066e2 X2068m1 X2068n1 X2068e1 X2072m1 X2072n1
## 0.4227 0.5830 0.7075 0.3947 0.4228 0.7894 0.4116 0.3634
## X2072e1 X2071m1 X2071n1 X2071e1 X2073m1 X2073n1 X2073e1 X2068m2
## 2.2533 0.6788 1.5732 0.7201 0.5611 1.7058 0.7313 0.2834
## X2068n2 X2068e2 X2072m2 X2072n2 X2072e2 X2073m2 X2073n2 X2073e2
## 0.4449 0.7470 0.3621 0.5655 0.6087 0.9586 2.6473 0.9657
## X2066m3 X2066n3 X2066e3 X2065m3 X2065n3 X2065e3 X2068m3 X2068n3
## 0.3645 0.5962 0.6344 0.3224 0.4705 0.7163 0.3410 0.4551
## X2068e3 X2072m3 X2072n3 X2072e3 X2073m3 X2073n3 X2073e3 X2162m1
## 0.6287 0.6529 1.8931 0.6974 0.4709 0.6388 0.6440 0.3844
## X2162n1 X2162e1 X2162m2 X2162n2 X2162e2 X2162n3 X2162e3 X2167m1
## 0.3656 0.7524 0.4541 0.4905 0.6892 0.5677 0.6321 0.4773
## X2167n1 X2167e1 X2168m1 X2168n1 X2168e1 X2168m2 X2168n2 X2168e2
## 0.5943 0.8638 0.9937 2.2257 0.9736 0.8788 2.2667 0.8118
## X2167m2 X2167n2 X2167e2 X2167m3 X2167n3 X2167e3 X2168m3 X2168n3
## 0.2899 0.5627 0.6555 0.5068 0.9561 0.8044 1.2773 3.5622
## X2168e3 X2172m1 X2172n1 X2172e1 X2172m2 X2172n2 X2173m1 X2173n1
## 1.0071 0.3799 0.3831 0.6905 0.5475 0.6967 0.3014 0.2521
## X2172m3 X2172e3 X1168m1 X1168n1 X1168e1 X1168m2 X1168n2 X1168e2
## 0.5066 0.9251 0.6825 1.3526 0.6068 0.6667 1.1999 0.6026
## X1168m3 X1168n3 X1167m3 X1167n3 X2183m1 X2183n1 X2183m2 X2183n2
## 0.6623 0.9424 0.4544 0.7630 0.4737 1.1239 0.3716 0.5072
## X2183m3 X2183n3 X2184m1 X2184n1 X2184m3 X2184n3 X2190m1 X2190n1
## 0.3410 0.4222 0.5341 1.2013 0.8209 1.9456 0.3757 0.6372
## X2190m2 X2190n2 X2190m3 X2190n3 X1176m1 X1176n1 X1176e1 X1176m2
## 0.3435 0.3881 0.2749 0.3130 0.4005 0.6181 0.8713 0.3835
## X1176n2 X1176m3 X1176n3 X1176e3 X1167m1 X1167n1 X1167m2 X1167n2
## 0.4984 0.5570 0.8030 0.9950 0.4504 0.6149 0.4886 0.6292
small_norm <- sm(normalize_expt(small_expt, transform="log2", convert="cpm",
norm="quant", filter=TRUE))
plot_pca(small_norm)$plot
## plot labels was not set and there are more than 100 samples, disabling it.
small_nb <- normalize_expt(small_expt, transform="log2", convert="cpm",
batch="svaseq", norm="quant", filter=TRUE)
## Warning in normalize_expt(small_expt, transform = "log2", convert = "cpm", :
## Quantile normalization and sva do not always play well together.
## Removing 0 low-count genes (185 remaining).
## batch_counts: Before batch/surrogate estimation, 1 entries are x==0: 0%.
## batch_counts: Before batch/surrogate estimation, 1705 entries are 0<x<1: 7%.
## Setting 741 low elements to zero.
## transform_counts: Found 741 values equal to 0, adding 1 to the matrix.
plot_pca(small_nb)$plot
## plot labels was not set and there are more than 100 samples, disabling it.
## DESeq2 MA plot of failure / cure
hs_clinic_table[["plots"]][["failure_vs_cure"]][["deseq_ma_plots"]]$plot
## DESeq2 Volcano plot of failure / cure
hs_clinic_table[["plots"]][["failure_vs_cure"]][["deseq_vol_plots"]]$plot
ups <- hs_clinic_sig[["deseq"]][["ups"]][[1]]
downs <- hs_clinic_sig[["deseq"]][["downs"]][[1]]
hs_clinic_gprofiler_ups <- simple_gprofiler(ups)
## Performing gProfiler GO search of 147 genes against hsapiens.
## GO search found 45 hits.
## Performing gProfiler KEGG search of 147 genes against hsapiens.
## KEGG search found 5 hits.
## Performing gProfiler REAC search of 147 genes against hsapiens.
## REAC search found 7 hits.
## Performing gProfiler MI search of 147 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 147 genes against hsapiens.
## TF search found 35 hits.
## Performing gProfiler CORUM search of 147 genes against hsapiens.
## CORUM search found 0 hits.
## Performing gProfiler HP search of 147 genes against hsapiens.
## HP search found 0 hits.
hs_clinic_gprofiler_ups[["pvalue_plots"]][["bpp_plot_over"]]
hs_clinic_gprofiler_ups[["pvalue_plots"]][["mfp_plot_over"]]
hs_clinic_gprofiler_ups[["pvalue_plots"]][["reactome_plot_over"]]
##hs_try2 <- simple_gprofiler2(ups)
hs_clinic_gprofiler_downs <- simple_gprofiler(downs)
## Performing gProfiler GO search of 85 genes against hsapiens.
## GO search found 40 hits.
## Performing gProfiler KEGG search of 85 genes against hsapiens.
## KEGG search found 1 hits.
## Performing gProfiler REAC search of 85 genes against hsapiens.
## REAC search found 4 hits.
## Performing gProfiler MI search of 85 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 85 genes against hsapiens.
## TF search found 0 hits.
## Performing gProfiler CORUM search of 85 genes against hsapiens.
## CORUM search found 0 hits.
## Performing gProfiler HP search of 85 genes against hsapiens.
## HP search found 8 hits.
hs_clinic_gprofiler_downs[["pvalue_plots"]][["bpp_plot_over"]]
hs_clinic_gprofiler_downs[["pvalue_plots"]][["mfp_plot_over"]]
## NULL
hs_clinic_gprofiler_downs[["pvalue_plots"]][["reactome_plot_over"]]
hs_celltype_gsva_c2 <- sm(simple_gsva(individual_celltypes))
hs_celltype_gsva_c2_sig <- sm(get_sig_gsva_categories(
hs_celltype_gsva_c2,
excel="excel/individual_celltypes_gsva_c2.xlsx"))
## broad_c7 <- GSEABase::getGmt("reference/msigdb/c7.all.v7.2.entrez.gmt",
## collectionType=GSEABase::BroadCollection(category="c7"),
## geneIdType=GSEABase::EntrezIdentifier())
## hs_celltype_gsva_c7 <- sm(simple_gsva(individual_celltypes, signatures=broad_c7,
## msig_xml="reference/msigdb_v7.2.xml", cores=10))
hs_celltype_gsva_c7 <- simple_gsva(individual_celltypes,
signatures="reference/msigdb/c7.all.v7.2.entrez.gmt",
signature_category="c7",
msig_xml="reference/msigdb_v7.2.xml",
cores=10)
## Converting the rownames() of the expressionset to ENTREZID.
## 583 ENSEMBL ID's didn't have a matching ENTEREZ ID. Dropping them now.
## Before conversion, the expressionset has 19941 entries.
## After conversion, the expressionset has 19519 entries.
## Adding annotations from reference/msigdb_v7.2.xml.
## Warning: `xml_nodes()` was deprecated in rvest 1.0.0.
## Please use `html_elements()` instead.
hs_celltype_gsva_c7_sig <- sm(get_sig_gsva_categories(
hs_celltype_gsva_c7,
excel="excel/individual_celltypes_gsva_c7.xlsx"))
## The raw heatmap of the C2 values
hs_celltype_gsva_c2_sig[["raw_plot"]]
## The 'significance' scores of the C2 values
hs_celltype_gsva_c2_sig[["score_plot"]]
## The subset of scores for categories deemed significantly different.
hs_celltype_gsva_c2_sig[["subset_plot"]]
## The raw heatmap of the C7 values
hs_celltype_gsva_c7_sig[["raw_plot"]]
## The 'significance' scores of the C7 values
hs_celltype_gsva_c7_sig[["score_plot"]]
## The subset of scores for categories deemed significantly different.
hs_celltype_gsva_c7_sig[["subset_plot"]]
Upon returning from Maine, I sat with Najib, Theresa, and Maria Adelaida. A whole host of potential ideas were considered; here are a couple of them:
My TODO text says: "Variance partition of TMRC3 samples with at least time, cell-type, and cure/fail. I mixed this idea with the visit 1 previously.
vp_factors <- c("condition", "batch", "visitnumber", "eb_lc_ulcera_area_1")
v1_vp <- simple_varpart(hs_clinical, factors=vp_factors, do_fit=TRUE)
## Error in checkModelStatus(fit, showWarnings = showWarnings, colinearityCutoff = colinearityCutoff) :
## The variables specified in this model are redundant,
## so the design matrix is not full rank
## Retrying with only condition in the model.
##
## Total:109 s
##
## Total:47 s
pp(file="images/v1_vp.png", image=v1_vp$partition_plot)
Here is the note from my TODO list: “Using v1, perform cell-type comparisons/signatures as well as a cure/fail differential expression”.
Implicit in this query is the idea that we should do an explicit visualization of the visit 1 samples without the subsequent visits.
## For the moment, let us ignore the lost/NA samples
v1_expt <- subset_expt(hs_clinical, subset="visitnumber=='1'") %>%
subset_expt(subset="condition=='cure'|condition=='failure'") %>%
subset_expt(subset="batch!='biopsy'")
## subset_expt(): There were 182, now there are 76 samples.
## subset_expt(): There were 76, now there are 68 samples.
## subset_expt(): There were 68, now there are 42 samples.
v1_norm <- normalize_expt(v1_expt, filter=TRUE, transform="log2", convert="cpm",
batch="svaseq")
## Removing 8361 low-count genes (11580 remaining).
## batch_counts: Before batch/surrogate estimation, 8014 entries are x==0: 2%.
## batch_counts: Before batch/surrogate estimation, 75129 entries are 0<x<1: 15%.
## Setting 2645 low elements to zero.
## transform_counts: Found 2645 values equal to 0, adding 1 to the matrix.
plot_pca(v1_norm)$plot
v1_de <- all_pairwise(v1_expt, filter=TRUE, model_batch="svaseq")
## batch_counts: Before batch/surrogate estimation, 8014 entries are x==0: 2%.
## Plotting a PCA before surrogate/batch inclusion.
## Using svaseq to visualize before/after batch inclusion.
## Performing a test normalization with: raw
## Removing 0 low-count genes (11580 remaining).
## batch_counts: Before batch/surrogate estimation, 8014 entries are x==0: 2%.
## batch_counts: Before batch/surrogate estimation, 75129 entries are 0<x<1: 15%.
## Setting 2645 low elements to zero.
## transform_counts: Found 2645 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
v1_table <- combine_de_tables(v1_de, excel=glue::glue("excel/v1_tables-v{ver}.xlsx"))
It might be interesting to add ‘parasitemappingrate’ here but I have not filled it in for all samples yet.
samplecollectiondate should be useful too but is not complete.
This is also a great place to pull in some of the clinical information. I need to ask Maria Adelaida for it…
v1_vp <- simple_varpart(v1_expt, do_fit=TRUE)
##
## Total:104 s
##
## Total:59 s
v1_vp$partition_plot
## condition is cure/fail batch is cell type.
v1_eo <- subset_expt(v1_expt, subset="batch=='eosinophils'")
## subset_expt(): There were 42, now there are 10 samples.
v1_eo_norm <- normalize_expt(v1_eo, transform="log2", convert="cpm",
filter=TRUE, norm="quant")
## Removing 9789 low-count genes (10152 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
plot_pca(v1_eo_norm)$plot
v1_eo_nb <- normalize_expt(v1_eo, transform="log2", convert="cpm",
filter=TRUE, batch="svaseq")
## Removing 9789 low-count genes (10152 remaining).
## batch_counts: Before batch/surrogate estimation, 93 entries are x==0: 0%.
## batch_counts: Before batch/surrogate estimation, 3050 entries are 0<x<1: 3%.
## Setting 106 low elements to zero.
## transform_counts: Found 106 values equal to 0, adding 1 to the matrix.
plot_pca(v1_eo_nb)$plot
v1_mono <- subset_expt(v1_expt, subset="batch=='monocytes'")
## subset_expt(): There were 42, now there are 16 samples.
v1_mono_norm <- normalize_expt(v1_mono, transform="log2", convert="cpm",
filter=TRUE, norm="quant")
## Removing 9329 low-count genes (10612 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
plot_pca(v1_mono_norm)$plot
v1_mono_nb <- normalize_expt(v1_mono, transform="log2", convert="cpm",
filter=TRUE, batch="svaseq")
## Removing 9329 low-count genes (10612 remaining).
## batch_counts: Before batch/surrogate estimation, 226 entries are x==0: 0%.
## batch_counts: Before batch/surrogate estimation, 6672 entries are 0<x<1: 4%.
## Setting 205 low elements to zero.
## transform_counts: Found 205 values equal to 0, adding 1 to the matrix.
plot_pca(v1_mono_nb)$plot
v1_neut <- subset_expt(v1_expt, subset="batch=='neutrophils'")
## subset_expt(): There were 42, now there are 16 samples.
v1_neut_norm <- normalize_expt(v1_neut, transform="log2", convert="cpm",
filter=TRUE, norm="quant")
## Removing 11186 low-count genes (8755 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
plot_pca(v1_neut_norm)$plot
v1_neut_nb <- normalize_expt(v1_neut, transform="log2", convert="cpm",
filter=TRUE, batch="svaseq")
## Removing 11186 low-count genes (8755 remaining).
## batch_counts: Before batch/surrogate estimation, 138 entries are x==0: 0%.
## batch_counts: Before batch/surrogate estimation, 6557 entries are 0<x<1: 5%.
## Setting 167 low elements to zero.
## transform_counts: Found 167 values equal to 0, adding 1 to the matrix.
plot_pca(v1_neut_nb)$plot
Compare visits independent of cure/fail
time_expt <- hs_clinical %>%
set_expt_conditions(fact="visitnumber")
pData(time_expt)[["condition"]] <- paste0("v", pData(time_expt)[["condition"]])
time_norm <- normalize_expt(time_expt, filter=TRUE, norm="quant", convert="cpm",
transform="log2")
## Removing 5216 low-count genes (14725 remaining).
## transform_counts: Found 107 values equal to 0, adding 1 to the matrix.
time_nb <- normalize_expt(time_expt, filter=TRUE, batch="svaseq", convert="cpm",
transform="log2")
## Removing 5216 low-count genes (14725 remaining).
## batch_counts: Before batch/surrogate estimation, 198077 entries are x==0: 7%.
## batch_counts: Before batch/surrogate estimation, 528392 entries are 0<x<1: 20%.
## Setting 33769 low elements to zero.
## transform_counts: Found 33769 values equal to 0, adding 1 to the matrix.
plot_pca(time_nb)$plot
## plot labels was not set and there are more than 100 samples, disabling it.
time_de <- all_pairwise(time_expt, filter=TRUE, model_batch="svaseq")
## batch_counts: Before batch/surrogate estimation, 198077 entries are x==0: 7%.
## Plotting a PCA before surrogate/batch inclusion.
## Using svaseq to visualize before/after batch inclusion.
## Performing a test normalization with: raw
## Removing 0 low-count genes (14725 remaining).
## batch_counts: Before batch/surrogate estimation, 198077 entries are x==0: 7%.
## batch_counts: Before batch/surrogate estimation, 528392 entries are 0<x<1: 20%.
## Setting 33769 low elements to zero.
## transform_counts: Found 33769 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
keepers <- list(
"v2v1"=c("v2", "v1"),
"v3v1"=c("v3", "v1"),
"v3v2"=c("v3", "v2"))
time_tables <- combine_de_tables(time_de, keepers=keepers,
excel=glue::glue("excel/compare_visits-v{ver}.xlsx"))
Now let us consider the 3 visits from the lens of cell type.
time_type_factor <- paste0(pData(time_expt)[["condition"]], "_", pData(time_expt)[["batch"]])
time_type_expt <- set_expt_conditions(time_expt, fact=time_type_factor)
time_type_biopsy <- subset_expt(time_type_expt, subset="batch=='biopsy'")
## subset_expt(): There were 182, now there are 52 samples.
time_type_neutrophil <- subset_expt(time_type_expt, subset="batch=='neutrophils'")
## subset_expt(): There were 182, now there are 50 samples.
time_type_eosinophil <- subset_expt(time_type_expt, subset="batch=='eosinophils'")
## subset_expt(): There were 182, now there are 30 samples.
time_type_monocyte <- subset_expt(time_type_expt, subset="batch=='monocytes'")
## subset_expt(): There were 182, now there are 50 samples.
time_biopsy <- normalize_expt(time_type_biopsy, filter=TRUE, norm="quant", convert="cpm",
transform="log2")
## Removing 5738 low-count genes (14203 remaining).
## transform_counts: Found 16 values equal to 0, adding 1 to the matrix.
time_biopsy_nb <- normalize_expt(time_type_biopsy, filter=TRUE, batch="svaseq", convert="cpm",
transform="log2")
## Removing 5738 low-count genes (14203 remaining).
## batch_counts: Before batch/surrogate estimation, 5504 entries are x==0: 1%.
## batch_counts: Before batch/surrogate estimation, 44637 entries are 0<x<1: 6%.
## Setting 1749 low elements to zero.
## transform_counts: Found 1749 values equal to 0, adding 1 to the matrix.
plot_pca(time_biopsy)$plot
plot_pca(time_biopsy_nb)$plot
time_biopsy_de <- all_pairwise(time_type_biopsy, model_batch="svaseq", filter=TRUE)
## batch_counts: Before batch/surrogate estimation, 5504 entries are x==0: 1%.
## Plotting a PCA before surrogate/batch inclusion.
## Using svaseq to visualize before/after batch inclusion.
## Performing a test normalization with: raw
## Removing 0 low-count genes (14203 remaining).
## batch_counts: Before batch/surrogate estimation, 5504 entries are x==0: 1%.
## batch_counts: Before batch/surrogate estimation, 44637 entries are 0<x<1: 6%.
## Setting 1749 low elements to zero.
## transform_counts: Found 1749 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
keepers <- list(
"v2v1"=c("v2_biopsy", "v1_biopsy"),
"v3v1"=c("v3_biopsy", "v1_biopsy"),
"v3v2"=c("v3_biopsy", "v2_biopsy"))
time_biopsy_tables <- combine_de_tables(
time_biopsy_de, keepers=keepers,
excel=glue::glue("excel/time_biopsy_tables-v{ver}.xlsx"))
time_neutrophil <- normalize_expt(time_type_neutrophil, filter=TRUE, norm="quant", convert="cpm",
transform="log2")
## Removing 10599 low-count genes (9342 remaining).
## transform_counts: Found 1 values equal to 0, adding 1 to the matrix.
time_neutrophil_nb <- normalize_expt(time_type_neutrophil, filter=TRUE, batch="svaseq", convert="cpm",
transform="log2")
## Removing 10599 low-count genes (9342 remaining).
## batch_counts: Before batch/surrogate estimation, 1669 entries are x==0: 0%.
## batch_counts: Before batch/surrogate estimation, 42558 entries are 0<x<1: 9%.
## Setting 775 low elements to zero.
## transform_counts: Found 775 values equal to 0, adding 1 to the matrix.
plot_pca(time_neutrophil)$plot
plot_pca(time_neutrophil_nb)$plot
time_neutrophil_de <- all_pairwise(time_type_neutrophil, model_batch="svaseq", filter=TRUE)
## batch_counts: Before batch/surrogate estimation, 1669 entries are x==0: 0%.
## Plotting a PCA before surrogate/batch inclusion.
## Using svaseq to visualize before/after batch inclusion.
## Performing a test normalization with: raw
## Removing 0 low-count genes (9342 remaining).
## batch_counts: Before batch/surrogate estimation, 1669 entries are x==0: 0%.
## batch_counts: Before batch/surrogate estimation, 42558 entries are 0<x<1: 9%.
## Setting 775 low elements to zero.
## transform_counts: Found 775 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
keepers <- list(
"v2v1"=c("v2_neutrophils", "v1_neutrophils"),
"v3v1"=c("v3_neutrophils", "v1_neutrophils"),
"v3v2"=c("v3_neutrophils", "v2_neutrophils"))
time_neutrophil_tables <- combine_de_tables(
time_neutrophil_de, keepers=keepers,
excel=glue::glue("excel/time_neutrophil_tables-v{ver}.xlsx"))
time_eosinophil <- normalize_expt(time_type_eosinophil, filter=TRUE, norm="quant", convert="cpm",
transform="log2")
## Removing 9190 low-count genes (10751 remaining).
## transform_counts: Found 5 values equal to 0, adding 1 to the matrix.
time_eosinophil_nb <- normalize_expt(time_type_eosinophil, filter=TRUE, batch="svaseq", convert="cpm",
transform="log2")
## Removing 9190 low-count genes (10751 remaining).
## batch_counts: Before batch/surrogate estimation, 785 entries are x==0: 0%.
## batch_counts: Before batch/surrogate estimation, 18837 entries are 0<x<1: 6%.
## Setting 428 low elements to zero.
## transform_counts: Found 428 values equal to 0, adding 1 to the matrix.
plot_pca(time_eosinophil)$plot
plot_pca(time_eosinophil_nb)$plot
time_eosinophil_de <- all_pairwise(time_type_eosinophil, model_batch="svaseq", filter=TRUE)
## batch_counts: Before batch/surrogate estimation, 785 entries are x==0: 0%.
## Plotting a PCA before surrogate/batch inclusion.
## Using svaseq to visualize before/after batch inclusion.
## Performing a test normalization with: raw
## Removing 0 low-count genes (10751 remaining).
## batch_counts: Before batch/surrogate estimation, 785 entries are x==0: 0%.
## batch_counts: Before batch/surrogate estimation, 18837 entries are 0<x<1: 6%.
## Setting 428 low elements to zero.
## transform_counts: Found 428 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
keepers <- list(
"v2v1"=c("v2_eosinophils", "v1_eosinophils"),
"v3v1"=c("v3_eosinophils", "v1_eosinophils"),
"v3v2"=c("v3_eosinophils", "v2_eosinophils"))
time_eosinophil_tables <- combine_de_tables(
time_eosinophil_de, keepers=keepers,
excel=glue::glue("excel/time_eosinophil_tables-v{ver}.xlsx"))
time_monocyte <- normalize_expt(time_type_monocyte, filter=TRUE, norm="quant", convert="cpm",
transform="log2")
## Removing 8776 low-count genes (11165 remaining).
## transform_counts: Found 11 values equal to 0, adding 1 to the matrix.
time_monocyte_nb <- normalize_expt(time_type_monocyte, filter=TRUE, batch="svaseq", convert="cpm",
transform="log2")
## Removing 8776 low-count genes (11165 remaining).
## batch_counts: Before batch/surrogate estimation, 2380 entries are x==0: 0%.
## batch_counts: Before batch/surrogate estimation, 36412 entries are 0<x<1: 7%.
## Setting 1029 low elements to zero.
## transform_counts: Found 1029 values equal to 0, adding 1 to the matrix.
plot_pca(time_monocyte)$plot
plot_pca(time_monocyte_nb)$plot
time_monocyte_de <- all_pairwise(time_type_monocyte, model_batch="svaseq", filter=TRUE)
## batch_counts: Before batch/surrogate estimation, 2380 entries are x==0: 0%.
## Plotting a PCA before surrogate/batch inclusion.
## Using svaseq to visualize before/after batch inclusion.
## Performing a test normalization with: raw
## Removing 0 low-count genes (11165 remaining).
## batch_counts: Before batch/surrogate estimation, 2380 entries are x==0: 0%.
## batch_counts: Before batch/surrogate estimation, 36412 entries are 0<x<1: 7%.
## Setting 1029 low elements to zero.
## transform_counts: Found 1029 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
keepers <- list(
"v2v1"=c("v2_monocytes", "v1_monocytes"),
"v3v1"=c("v3_monocytes", "v1_monocytes"),
"v3v2"=c("v3_monocytes", "v2_monocytes"))
time_monocyte_tables <- combine_de_tables(
time_monocyte_de, keepers=keepers,
excel=glue::glue("excel/time_monocyte_tables-v{ver}.xlsx"))
time_cf_factor <- paste0(pData(time_type_biopsy)[["condition"]], "_", pData(time_type_biopsy)[["clinicaloutcome"]])
biopsy_time_cf <- set_expt_conditions(time_type_biopsy, fact=time_cf_factor)
time_cf_biopsy <- normalize_expt(biopsy_time_cf, filter=TRUE, norm="quant", convert="cpm",
transform="log2")
## Removing 5738 low-count genes (14203 remaining).
## transform_counts: Found 16 values equal to 0, adding 1 to the matrix.
time_cf_biopsy_nb <- normalize_expt(biopsy_time_cf, filter=TRUE, batch="svaseq", convert="cpm",
transform="log2")
## Removing 5738 low-count genes (14203 remaining).
## batch_counts: Before batch/surrogate estimation, 5504 entries are x==0: 1%.
## batch_counts: Before batch/surrogate estimation, 44637 entries are 0<x<1: 6%.
## Setting 1721 low elements to zero.
## transform_counts: Found 1721 values equal to 0, adding 1 to the matrix.
plot_pca(time_cf_biopsy)$plot
plot_pca(time_cf_biopsy_nb)$plot
biopsy_time_de <- all_pairwise(time_type_biopsy, filter=TRUE, model_batch="svaseq")
## batch_counts: Before batch/surrogate estimation, 5504 entries are x==0: 1%.
## Plotting a PCA before surrogate/batch inclusion.
## Using svaseq to visualize before/after batch inclusion.
## Performing a test normalization with: raw
## Removing 0 low-count genes (14203 remaining).
## batch_counts: Before batch/surrogate estimation, 5504 entries are x==0: 1%.
## batch_counts: Before batch/surrogate estimation, 44637 entries are 0<x<1: 6%.
## Setting 1749 low elements to zero.
## transform_counts: Found 1749 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
biopsy_time_tables <- combine_de_tables(time_type_de, keepers=keepers,
excel=glue::glue("excel/compare_biopsy_visits-v{ver}.xlsx"))
## Error in combine_de_tables(time_type_de, keepers = keepers, excel = glue::glue("excel/compare_biopsy_visits-v{ver}.xlsx")): object 'time_type_de' not found
The following blocks split the samples into a few groups by sample type and look at the distributions between them.
Get top/bottom n genes for each cell type, using clinical outcome as the factor of interest. For the moment, use sva for the DE analysis. Provide cpms for the top/bottom n genes.
Start with top/bottom 200. Perform default logFC and p-value as well.
mono <- subset_expt(hs_valid, subset="typeofcells=='monocytes'") %>%
set_expt_conditions(fact="clinicaloutcome") %>%
set_expt_batches(fact="visitnumber") %>%
set_expt_colors(colors=chosen_colors)
## subset_expt(): There were 202, now there are 50 samples.
## Warning in set_expt_colors(., colors = chosen_colors): Colors for the following
## categories are not being used: null.
## FIXME set_expt_colors should speak up if there are mismatches here!!!
save_result <- save(mono, file="rda/monocyte_expt.rda")
mono_norm <- normalize_expt(mono, convert="cpm", filter=TRUE,
transform="log2", norm="quant")
## Removing 8776 low-count genes (11165 remaining).
## transform_counts: Found 11 values equal to 0, adding 1 to the matrix.
plt <- plot_pca(mono_norm, plot_labels=FALSE)$plot
pp(file=glue("images/mono_pca_normalized-v{ver}.pdf"), image=plt)
mono_nb <- normalize_expt(mono, convert="cpm", filter=TRUE,
transform="log2", batch="svaseq")
## Removing 8776 low-count genes (11165 remaining).
## batch_counts: Before batch/surrogate estimation, 2380 entries are x==0: 0%.
## batch_counts: Before batch/surrogate estimation, 36412 entries are 0<x<1: 7%.
## Setting 1199 low elements to zero.
## transform_counts: Found 1199 values equal to 0, adding 1 to the matrix.
plt <- plot_pca(mono_nb, plot_labels=FALSE)$plot
pp(file=glue("images/mono_pca_normalized_batch-v{ver}.pdf"), image=plt)
mono_de <- sm(all_pairwise(mono, model_batch=FALSE, filter=TRUE))
mono_tables <- sm(combine_de_tables(
mono_de, keepers=keepers,
excel=glue::glue("excel/monocyte_clinical_all_tables-v{ver}.xlsx")))
written <- write_xlsx(data=mono_tables[["data"]][[1]],
excel=glue::glue("excel/monocyte_clinical_table-v{ver}.xlsx"))
mono_sig <- sm(extract_significant_genes(mono_tables, according_to="deseq"))
written <- write_xlsx(data=mono_sig[["deseq"]][["ups"]][[1]],
excel=glue::glue("excel/monocyte_clinical_sigup-v{ver}.xlsx"))
written <- write_xlsx(data=mono_sig[["deseq"]][["downs"]][[1]],
excel=glue::glue("excel/monocyte_clinical_sigdown-v{ver}.xlsx"))
mono_pct_sig <- sm(extract_significant_genes(mono_tables, n=200,
lfc=NULL, p=NULL, according_to="deseq"))
written <- write_xlsx(data=mono_pct_sig[["deseq"]][["ups"]][[1]],
excel=glue::glue("excel/monocyte_clinical_sigup_pct-v{ver}.xlsx"))
written <- write_xlsx(data=mono_pct_sig[["deseq"]][["downs"]][[1]],
excel=glue::glue("excel/monocyte_clinical_sigdown_pct-v{ver}.xlsx"))
mono_sig$summary_df
## data frame with 0 columns and 0 rows
## Print out a table of the cpm values for other explorations.
mono_cpm <- sm(normalize_expt(mono, convert="cpm"))
written <- write_xlsx(data=exprs(mono_cpm),
excel=glue::glue("excel/monocyte_cpm_before_batch-v{ver}.xlsx"))
mono_bcpm <- sm(normalize_expt(mono, filter=TRUE, convert="cpm", batch="svaseq"))
written <- write_xlsx(data=exprs(mono_bcpm),
excel=glue::glue("excel/monocyte_cpm_after_batch-v{ver}.xlsx"))
mono_de_sva <- sm(all_pairwise(mono, model_batch="svaseq", filter=TRUE))
mono_tables_sva <- sm(combine_de_tables(
mono_de_sva, keepers=keepers,
excel=glue::glue("excel/monocyte_clinical_all_tables_sva-v{ver}.xlsx")))
mono_sig_sva <- sm(extract_significant_genes(
mono_tables_sva,
excel=glue::glue("excel/monocyte_clinical_sig_tables_sva-v{ver}.xlsx"),
according_to="deseq"))
## Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 0, 1
First print out the DE plots without and then with sva estimates.
## DESeq2 MA plot of failure / cure
mono_tables[["plots"]][["fail_vs_cure"]][["deseq_ma_plots"]]$plot
## DESeq2 Volcano plot of failure / cure
mono_tables[["plots"]][["fail_vs_cure"]][["deseq_vol_plots"]]$plot
## DESeq2 MA plot of failure / cure with svaseq
mono_tables_sva[["plots"]][["fail_vs_cure"]][["deseq_ma_plots"]]$plot
## DESeq2 Volcano plot of failure / cure with svaseq
mono_tables_sva[["plots"]][["fail_vs_cure"]][["deseq_vol_plots"]]$plot
ups <- mono_sig[["deseq"]][["ups"]][["fail_vs_cure"]]
downs <- mono_sig[["deseq"]][["downs"]][["fail_vs_cure"]]
mono_up_gp <- simple_gprofiler(sig_genes=ups)
## Performing gProfiler GO search of 64 genes against hsapiens.
## GO search found 15 hits.
## Performing gProfiler KEGG search of 64 genes against hsapiens.
## KEGG search found 3 hits.
## Performing gProfiler REAC search of 64 genes against hsapiens.
## REAC search found 0 hits.
## Performing gProfiler MI search of 64 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 64 genes against hsapiens.
## TF search found 2 hits.
## Performing gProfiler CORUM search of 64 genes against hsapiens.
## CORUM search found 1 hits.
## Performing gProfiler HP search of 64 genes against hsapiens.
## HP search found 0 hits.
mono_up_gp[["pvalue_plots"]][["bpp_plot_over"]]
mono_up_gp[["pvalue_plots"]][["mfp_plot_over"]]
## NULL
mono_up_gp[["pvalue_plots"]][["reactome_plot_over"]]
## NULL
mono_down_gp <- simple_gprofiler(sig_genes=downs)
## Performing gProfiler GO search of 178 genes against hsapiens.
## GO search found 18 hits.
## Performing gProfiler KEGG search of 178 genes against hsapiens.
## KEGG search found 2 hits.
## Performing gProfiler REAC search of 178 genes against hsapiens.
## REAC search found 2 hits.
## Performing gProfiler MI search of 178 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 178 genes against hsapiens.
## TF search found 23 hits.
## Performing gProfiler CORUM search of 178 genes against hsapiens.
## CORUM search found 0 hits.
## Performing gProfiler HP search of 178 genes against hsapiens.
## HP search found 0 hits.
mono_down_gp[["pvalue_plots"]][["bpp_plot_over"]]
mono_down_gp[["pvalue_plots"]][["mfp_plot_over"]]
## NULL
mono_down_gp[["pvalue_plots"]][["reactome_plot_over"]]
## broad_c7 <- GSEABase::getGmt("reference/msigdb/c7.all.v7.2.entrez.gmt",
## collectionType=GSEABase::BroadCollection(category="c7"),
## geneIdType=GSEABase::EntrezIdentifier())
broad_c7 <- load_gmt_signatures(signatures="reference/msigdb/c7.all.v7.2.entrez.gmt",
signature_category="c7")
mono_up_goseq_msig <- goseq_msigdb(sig_genes=ups, signatures=broad_c7,
signature_category="c7", length_db=hs_length)
## Before conversion: 64, after conversion: 65.
## Before conversion: 227921, after conversion: 35481.
## Found 62 go_db genes and 65 length_db genes out of 65.
mono_down_goseq_msig <- goseq_msigdb(sig_genes=downs, signatures=broad_c7,
signature_category="c7", length_db=hs_length)
## Before conversion: 178, after conversion: 177.
## Before conversion: 227921, after conversion: 35481.
## Found 169 go_db genes and 177 length_db genes out of 177.
## Monocyte genes with increased expression in the failed samples
## share genes with the following experiments
mono_up_goseq_msig[["pvalue_plots"]][["mfp_plot_over"]]
## Monocyte genes with increased expression in the cured samples
## share genes with the following experiments
mono_down_goseq_msig[["pvalue_plots"]][["mfp_plot_over"]]
neut <- subset_expt(hs_valid, subset="typeofcells=='neutrophils'") %>%
set_expt_conditions(fact="clinicaloutcome") %>%
set_expt_batches(fact="visitnumber") %>%
set_expt_colors(colors=chosen_colors)
## subset_expt(): There were 202, now there are 50 samples.
## Warning in set_expt_colors(., colors = chosen_colors): Colors for the following
## categories are not being used: null.
save_result <- save(neut, file="rda/neutrophil_expt.rda")
neut_norm <- sm(normalize_expt(neut, convert="cpm", filter=TRUE, transform="log2"))
plt <- plot_pca(neut_norm, plot_labels=FALSE)$plot
pp(file=glue("images/neut_pca_normalized-v{ver}.pdf"), image=plt)
neut_nb <- sm(normalize_expt(neut, convert="cpm", filter=TRUE,
transform="log2", batch="svaseq"))
plt <- plot_pca(neut_nb, plot_labels=FALSE)$plot
pp(file=glue("images/neut_pca_normalized_svaseq-v{ver}.pdf"), image=plt)
neut_de <- sm(all_pairwise(neut, model_batch=FALSE, filter=TRUE))
neut_tables <- sm(combine_de_tables(
neut_de, keepers=keepers,
excel=glue::glue("excel/neutrophil_clinical_all_tables-v{ver}.xlsx")))
written <- write_xlsx(data=neut_tables[["data"]][[1]],
excel=glue::glue("excel/neutrophil_clinical_table-v{ver}.xlsx"))
neut_sig <- sm(extract_significant_genes(neut_tables, according_to="deseq"))
written <- write_xlsx(data=neut_sig[["deseq"]][["ups"]][[1]],
excel=glue::glue("excel/neutrophil_clinical_sigup-v{ver}.xlsx"))
written <- write_xlsx(data=neut_sig[["deseq"]][["downs"]][[1]],
excel=glue::glue("excel/neutrophil_clinical_sigdown-v{ver}.xlsx"))
neut_pct_sig <- sm(extract_significant_genes(neut_tables, n=200, lfc=NULL,
p=NULL, according_to="deseq"))
written <- write_xlsx(data=neut_sig[["deseq"]][["ups"]][[1]],
excel=glue::glue("excel/neutrophil_clinical_sigup_pct-v{ver}.xlsx"))
written <- write_xlsx(data=neut_sig[["deseq"]][["downs"]][[1]],
excel=glue::glue("excel/neutrophil_clinical_sigdown_pct-v{ver}.xlsx"))
neut_cpm <- sm(normalize_expt(neut, convert="cpm"))
written <- write_xlsx(data=exprs(neut_cpm),
excel=glue::glue("excel/neutrophil_cpm_before_batch-v{ver}.xlsx"))
neut_bcpm <- sm(normalize_expt(neut, filter=TRUE, batch="svaseq", convert="cpm"))
written <- write_xlsx(data=exprs(neut_bcpm),
excel=glue::glue("excel/neutrophil_cpm_after_batch-v{ver}.xlsx"))
neut_de_sva <- sm(all_pairwise(neut, model_batch="svaseq", filter=TRUE))
neut_tables_sva <- sm(combine_de_tables(
neut_de_sva, keepers=keepers,
excel=glue::glue("excel/neutrophil_clinical_all_tables_sva-v{ver}.xlsx")))
neut_sig_sva <- sm(extract_significant_genes(
neut_tables_sva,
excel=glue::glue("excel/neutrophil_clinical_sig_tables_sva-v{ver}.xlsx"),
according_to="deseq"))
## Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 0, 1
## DESeq2 MA plot of failure / cure
neut_tables[["plots"]][["fail_vs_cure"]][["deseq_ma_plots"]]$plot
## DESeq2 Volcano plot of failure / cure
neut_tables[["plots"]][["fail_vs_cure"]][["deseq_vol_plots"]]$plot
## DESeq2 MA plot of failure / cure with sva
neut_tables_sva[["plots"]][["fail_vs_cure"]][["deseq_ma_plots"]]$plot
## DESeq2 Volcano plot of failure / cure with sva
neut_tables_sva[["plots"]][["fail_vs_cure"]][["deseq_vol_plots"]]$plot
ups <- neut_sig[["deseq"]][["ups"]][["fail_vs_cure"]]
downs <- neut_sig[["deseq"]][["downs"]][["fail_vs_cure"]]
neut_up_gp <- simple_gprofiler(sig_genes=ups)
## Performing gProfiler GO search of 66 genes against hsapiens.
## GO search found 0 hits.
## Performing gProfiler KEGG search of 66 genes against hsapiens.
## KEGG search found 0 hits.
## Performing gProfiler REAC search of 66 genes against hsapiens.
## REAC search found 0 hits.
## Performing gProfiler MI search of 66 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 66 genes against hsapiens.
## TF search found 13 hits.
## Performing gProfiler CORUM search of 66 genes against hsapiens.
## CORUM search found 0 hits.
## Performing gProfiler HP search of 66 genes against hsapiens.
## HP search found 0 hits.
neut_up_gp[["pvalue_plots"]][["bpp_plot_over"]]
## NULL
neut_up_gp[["pvalue_plots"]][["mfp_plot_over"]]
## NULL
neut_up_gp[["pvalue_plots"]][["reactome_plot_over"]]
## NULL
neut_down_gp <- simple_gprofiler(downs)
## Performing gProfiler GO search of 71 genes against hsapiens.
## GO search found 2 hits.
## Performing gProfiler KEGG search of 71 genes against hsapiens.
## KEGG search found 1 hits.
## Performing gProfiler REAC search of 71 genes against hsapiens.
## REAC search found 0 hits.
## Performing gProfiler MI search of 71 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 71 genes against hsapiens.
## TF search found 3 hits.
## Performing gProfiler CORUM search of 71 genes against hsapiens.
## CORUM search found 0 hits.
## Performing gProfiler HP search of 71 genes against hsapiens.
## HP search found 0 hits.
neut_down_gp[["pvalue_plots"]][["bpp_plot_over"]]
## NULL
neut_down_gp[["pvalue_plots"]][["mfp_plot_over"]]
neut_down_gp[["pvalue_plots"]][["reactome_plot_over"]]
## NULL
neut_up_goseq_msig <- goseq_msigdb(sig_genes=ups, signatures=broad_c7,
signature_category="c7", length_db=hs_length)
## Before conversion: 66, after conversion: 68.
## Before conversion: 227921, after conversion: 35481.
## Found 65 go_db genes and 68 length_db genes out of 68.
neut_down_goseq_msig <- goseq_msigdb(sig_genes=downs, signatures=broad_c7,
signature_category="c7", length_db=hs_length)
## Before conversion: 71, after conversion: 69.
## Before conversion: 227921, after conversion: 35481.
## Found 62 go_db genes and 69 length_db genes out of 69.
## Neutrophil genes with increased expression in the failed samples
## share genes with the following experiments
neut_up_goseq_msig[["pvalue_plots"]][["mfp_plot_over"]]
## Neutrophil genes with increased expression in the cured samples
## share genes with the following experiments
neut_down_goseq_msig[["pvalue_plots"]][["mfp_plot_over"]]
eo <- subset_expt(hs_valid, subset="typeofcells=='eosinophils'") %>%
set_expt_conditions(fact="clinicaloutcome") %>%
set_expt_batches(fact="visitnumber") %>%
set_expt_colors(colors=chosen_colors)
## subset_expt(): There were 202, now there are 30 samples.
## Warning in set_expt_colors(., colors = chosen_colors): Colors for the following
## categories are not being used: nullnotapplicable.
save_result <- save(eo, file="rda/eosinophil_expt.rda")
eo_norm <- sm(normalize_expt(eo, convert="cpm", transform="log2",
norm="quant", filter=TRUE))
plt <- plot_pca(eo_norm, plot_labels=FALSE)$plot
pp(file=glue("images/eo_pca_normalized-v{ver}.pdf"), image=plt)
eo_nb <- sm(normalize_expt(eo, convert="cpm", transform="log2",
filter=TRUE, batch="svaseq"))
plt <- plot_pca(eo_nb, plot_labels=FALSE)$plot
plt
eo_de <- sm(all_pairwise(eo, model_batch=FALSE, filter=TRUE))
eo_tables <- sm(combine_de_tables(
eo_de, keepers=keepers,
excel=glue::glue("excel/eosinophil_clinical_all_tables-v{ver}.xlsx")))
written <- write_xlsx(data=eo_tables[["data"]][[1]],
excel=glue::glue("excel/eosinophil_clinical_table-v{ver}.xlsx"))
eo_sig <- sm(extract_significant_genes(eo_tables, according_to="deseq"))
written <- write_xlsx(data=eo_sig[["deseq"]][["ups"]][[1]],
excel=glue::glue("excel/eosinophil_clinical_sigup-v{ver}.xlsx"))
written <- write_xlsx(data=eo_sig[["deseq"]][["downs"]][[1]],
excel=glue::glue("excel/eosinophil_clinical_sigdown-v{ver}.xlsx"))
eo_pct_sig <- sm(extract_significant_genes(eo_tables, n=200,
lfc=NULL, p=NULL, according_to="deseq"))
written <- write_xlsx(data=eo_pct_sig[["deseq"]][["ups"]][[1]],
excel=glue::glue("excel/eosinophil_clinical_sigup_pct-v{ver}.xlsx"))
written <- write_xlsx(data=eo_pct_sig[["deseq"]][["downs"]][[1]],
excel=glue::glue("excel/eosinophil_clinical_sigdown_pct-v{ver}.xlsx"))
eo_cpm <- sm(normalize_expt(eo, convert="cpm"))
written <- write_xlsx(data=exprs(eo_cpm),
excel=glue::glue("excel/eosinophil_cpm_before_batch-v{ver}.xlsx"))
eo_bcpm <- sm(normalize_expt(eo, filter=TRUE, batch="svaseq", convert="cpm"))
written <- write_xlsx(data=exprs(eo_bcpm),
excel=glue::glue("excel/eosinophil_cpm_after_batch-v{ver}.xlsx"))
eo_de_sva <- sm(all_pairwise(eo, model_batch="svaseq", filter=TRUE))
eo_tables_sva <- sm(combine_de_tables(
eo_de_sva, keepers=keepers,
excel=glue::glue("excel/eosinophil_clinical_all_tables_sva-v{ver}.xlsx")))
eo_sig_sva <- sm(extract_significant_genes(
eo_tables_sva,
excel=glue::glue("excel/eosinophil_clinical_sig_tables_sva-v{ver}.xlsx"),
according_to="deseq"))
## Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 0, 1
## DESeq2 MA plot of failure / cure
eo_tables[["plots"]][["fail_vs_cure"]][["deseq_ma_plots"]]$plot
## DESeq2 Volcano plot of failure / cure
eo_tables[["plots"]][["fail_vs_cure"]][["deseq_vol_plots"]]$plot
## DESeq2 MA plot of failure / cure with sva
eo_tables_sva[["plots"]][["fail_vs_cure"]][["deseq_ma_plots"]]$plot
## DESeq2 Volcano plot of failure / cure with sva
eo_tables_sva[["plots"]][["fail_vs_cure"]][["deseq_vol_plots"]]$plot
ups <- eo_sig[["deseq"]][["ups"]][["fail_vs_cure"]]
downs <- eo_sig[["deseq"]][["downs"]][["fail_vs_cure"]]
eo_up_gp <- simple_gprofiler(sig_genes=ups)
## Performing gProfiler GO search of 108 genes against hsapiens.
## GO search found 95 hits.
## Performing gProfiler KEGG search of 108 genes against hsapiens.
## KEGG search found 5 hits.
## Performing gProfiler REAC search of 108 genes against hsapiens.
## REAC search found 8 hits.
## Performing gProfiler MI search of 108 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 108 genes against hsapiens.
## TF search found 46 hits.
## Performing gProfiler CORUM search of 108 genes against hsapiens.
## CORUM search found 0 hits.
## Performing gProfiler HP search of 108 genes against hsapiens.
## HP search found 0 hits.
eo_up_gp[["pvalue_plots"]][["bpp_plot_over"]]
eo_up_gp[["pvalue_plots"]][["mfp_plot_over"]]
eo_up_gp[["pvalue_plots"]][["reactome_plot_over"]]
eo_down_gp <- simple_gprofiler(downs)
## Performing gProfiler GO search of 37 genes against hsapiens.
## GO search found 0 hits.
## Performing gProfiler KEGG search of 37 genes against hsapiens.
## KEGG search found 0 hits.
## Performing gProfiler REAC search of 37 genes against hsapiens.
## REAC search found 0 hits.
## Performing gProfiler MI search of 37 genes against hsapiens.
## MI search found 0 hits.
## Performing gProfiler TF search of 37 genes against hsapiens.
## TF search found 0 hits.
## Performing gProfiler CORUM search of 37 genes against hsapiens.
## CORUM search found 0 hits.
## Performing gProfiler HP search of 37 genes against hsapiens.
## HP search found 0 hits.
eo_down_gp[["pvalue_plots"]][["bpp_plot_over"]]
## NULL
eo_down_gp[["pvalue_plots"]][["mfp_plot_over"]]
## NULL
eo_down_gp[["pvalue_plots"]][["reactome_plot_over"]]
## NULL
eo_up_goseq_msig <- goseq_msigdb(sig_genes=ups, signatures=broad_c7,
signature_category="c7", length_db=hs_length)
## Before conversion: 108, after conversion: 108.
## Before conversion: 227921, after conversion: 35481.
## Found 105 go_db genes and 108 length_db genes out of 108.
eo_down_goseq_msig <- goseq_msigdb(sig_genes=downs, signatures=broad_c7,
signature_category="c7", length_db=hs_length)
## Before conversion: 37, after conversion: 36.
## Before conversion: 227921, after conversion: 35481.
## Found 32 go_db genes and 36 length_db genes out of 36.
## Eosinophil genes with increased expression in the failed samples
## share genes with the following experiments
eo_up_goseq_msig[["pvalue_plots"]][["mfp_plot_over"]]
## Eosinophil genes with increased expression in the cured samples
## share genes with the following experiments
eo_down_goseq_msig[["pvalue_plots"]][["mfp_plot_over"]]
biop <- subset_expt(hs_valid, subset="typeofcells=='biopsy'") %>%
set_expt_conditions(fact="clinicaloutcome") %>%
set_expt_batches(fact="visitnumber") %>%
set_expt_colors(colors=chosen_colors)
## subset_expt(): There were 202, now there are 52 samples.
## Warning in set_expt_colors(., colors = chosen_colors): Colors for the following
## categories are not being used: nullnotapplicable.
save_result <- save(biop, file="rda/biopsy_expt.rda")
biop_norm <- normalize_expt(biop, filter=TRUE, convert="cpm",
transform="log2", norm="quant")
## Removing 5738 low-count genes (14203 remaining).
## transform_counts: Found 16 values equal to 0, adding 1 to the matrix.
plt <- plot_pca(biop_norm, plot_labels=FALSE)$plot
pp(file=glue("images/biop_pca_normalized-v{ver}.pdf"), image=plt)
biop_nb <- sm(normalize_expt(biop, convert="cpm", filter=TRUE,
transform="log2", batch="svaseq"))
plt <- plot_pca(biop_nb, plot_labels=FALSE)$plot
pp(file=glue("images/biop_pca_normalized_svaseq-v{ver}.pdf"), image=plt)
biop_de <- sm(all_pairwise(biop, model_batch=FALSE, filter=TRUE))
biop_tables <- combine_de_tables(biop_de, keepers=keepers,
excel=glue::glue("excel/biopsy_clinical_all_tables-v{ver}.xlsx"))
written <- write_xlsx(data=biop_tables[["data"]][[1]],
excel=glue::glue("excel/biopsy_clinical_table-v{ver}.xlsx"))
biop_sig <- extract_significant_genes(biop_tables, according_to="deseq")
##written <- write_xlsx(data=biop_sig[["deseq"]][["ups"]][[1]],
## excel=glue::glue("excel/biopsy_clinical_sigup-v{ver}.xlsx"))
written <- write_xlsx(data=biop_sig[["deseq"]][["downs"]][[1]],
excel=glue::glue("excel/biopsy_clinical_sigdown-v{ver}.xlsx"))
biop_pct_sig <- extract_significant_genes(biop_tables, n=200, lfc=NULL, p=NULL, according_to="deseq")
## Getting the top and bottom 200 genes.
written <- write_xlsx(data=biop_pct_sig[["deseq"]][["ups"]][[1]],
excel=glue::glue("excel/biopsy_clinical_sigup_pct-v{ver}.xlsx"))
written <- write_xlsx(data=biop_pct_sig[["deseq"]][["downs"]][[1]],
excel=glue::glue("excel/biopsy_clinical_sigdown_pct-v{ver}.xlsx"))
biop_cpm <- sm(normalize_expt(biop, convert="cpm"))
written <- write_xlsx(data=exprs(biop_cpm),
excel=glue::glue("excel/biopsy_cpm_before_batch-v{ver}.xlsx"))
biop_bcpm <- sm(normalize_expt(biop, filter=TRUE, batch="svaseq", convert="cpm"))
written <- write_xlsx(data=exprs(biop_bcpm),
excel=glue::glue("excel/biopsy_cpm_after_batch-v{ver}.xlsx"))
biop_de_sva <- sm(all_pairwise(biop, model_batch="svaseq", filter=TRUE))
biop_tables_sva <- sm(combine_de_tables(
biop_de_sva, keepers=keepers,
excel=glue::glue("excel/biopsy_clinical_all_tables_sva-v{ver}.xlsx")))
biop_sig_sva <- sm(extract_significant_genes(
biop_tables_sva,
excel=glue::glue("excel/biopsy_clinical_sig_tables_sva-v{ver}.xlsx"),
according_to="deseq"))
## Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 0, 1
## DESeq2 MA plot of failure / cure
biop_tables[["plots"]][["fail_vs_cure"]][["deseq_ma_plots"]]$plot
## DESeq2 Volcano plot of failure / cure
biop_tables[["plots"]][["fail_vs_cure"]][["deseq_vol_plots"]]$plot
## DESeq2 MA plot of failure / cure
biop_tables_sva[["plots"]][["fail_vs_cure"]][["deseq_ma_plots"]]$plot
## DESeq2 Volcano plot of failure / cure
biop_tables_sva[["plots"]][["fail_vs_cure"]][["deseq_vol_plots"]]$plot
visit_colors <- chosen_colors <- c("#D95F02", "#7570B3", "#1B9E77")
names(visit_colors) <- c(1, 2, 3)
mono_visit <- subset_expt(hs_valid, subset="typeofcells=='monocytes'") %>%
set_expt_conditions(fact="visitnumber") %>%
set_expt_batches(fact="clinicaloutcome") %>%
set_expt_colors(colors=chosen_colors)
## subset_expt(): There were 202, now there are 50 samples.
mono_visit_norm <- normalize_expt(mono_visit, filter=TRUE, norm="quant", convert="cpm",
transform="log2")
## Removing 8776 low-count genes (11165 remaining).
## transform_counts: Found 11 values equal to 0, adding 1 to the matrix.
mono_visit_pca <- plot_pca(mono_visit_norm)
pp(file="images/monocyte_by_visit.png", image=mono_visit_pca$plot)
mono_visit_nb <- normalize_expt(mono_visit, filter=TRUE, convert="cpm",
batch="svaseq", transform="log2")
## Removing 8776 low-count genes (11165 remaining).
## batch_counts: Before batch/surrogate estimation, 2380 entries are x==0: 0%.
## batch_counts: Before batch/surrogate estimation, 36412 entries are 0<x<1: 7%.
## Setting 1029 low elements to zero.
## transform_counts: Found 1029 values equal to 0, adding 1 to the matrix.
mono_visit_nb_pca <- plot_pca(mono_visit_nb)
pp(file="images/monocyte_by_visit_nb.png", image=mono_visit_nb_pca$plot)
table(pData(mono_visit_norm)$batch)
##
## cure failure lost notapplicable
## 19 25 5 1
keepers <- list(
"second_vs_first"=c("c2", "c1"),
"third_vs_second"=c("c3", "c2"),
"third_vs_first"=c("c3", "c1"))
mono_visit_de <- all_pairwise(mono_visit, model_batch="svaseq", filter=TRUE)
## batch_counts: Before batch/surrogate estimation, 2380 entries are x==0: 0%.
## Plotting a PCA before surrogate/batch inclusion.
## Using svaseq to visualize before/after batch inclusion.
## Performing a test normalization with: raw
## Removing 0 low-count genes (11165 remaining).
## batch_counts: Before batch/surrogate estimation, 2380 entries are x==0: 0%.
## batch_counts: Before batch/surrogate estimation, 36412 entries are 0<x<1: 7%.
## Setting 1029 low elements to zero.
## transform_counts: Found 1029 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
mono_visit_tables <- combine_de_tables(
mono_visit_de,
keepers=keepers,
excel=glue::glue("excel/mono_visit_tables-v{ver}.xlsx"))
new_factor <- as.character(pData(mono_visit)[["visitnumber"]])
not_one_idx <- new_factor != 1
new_factor[not_one_idx] <- "not_1"
mono_one_vs <- set_expt_conditions(mono_visit, new_factor)
mono_one_vs_de <- all_pairwise(mono_one_vs, model_batch="svaseq", filter=TRUE)
## batch_counts: Before batch/surrogate estimation, 2380 entries are x==0: 0%.
## Plotting a PCA before surrogate/batch inclusion.
## Using svaseq to visualize before/after batch inclusion.
## Performing a test normalization with: raw
## Removing 0 low-count genes (11165 remaining).
## batch_counts: Before batch/surrogate estimation, 2380 entries are x==0: 0%.
## batch_counts: Before batch/surrogate estimation, 36412 entries are 0<x<1: 7%.
## Setting 936 low elements to zero.
## transform_counts: Found 936 values equal to 0, adding 1 to the matrix.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
mono_one_vs_tables <- combine_de_tables(
mono_one_vs_de,
excel=glue::glue("excel/mono_one_vs_tables-v{ver}.xlsx"))
In writing the following, I quickly realized that tspair was not joking when it said it is intended for small numbers of genes. For a full expressionset of human data it is struggling. I like the idea, it may prove worth while to spend some time optimizing the package so that it is more usable.
expt <- hs_clinical_nobiop
simple_tsp <- function(expt, column="condition") {
facts <- levels(as.factor(pData(expt)[[column]]))
retlist <- list()
if (length(facts) < 2) {
stop("This requires factors with at least 2 levels.")
} else if (length(facts) == 2) {
retlist <- simple_tsp_pair(expt, column=column)
} else {
for (first in 1:(length(facts) - 1)) {
for (second in 2:(length(facts))) {
if (first < second) {
name <- glue::glue("{facts[first]}_vs_{facts[second]}")
message("Starting ", name, ".")
substring <- glue::glue("{column}=='{facts[first]}'|{column}=='{facts[second]}'")
subby <- subset_expt(expt, subset=as.character(substring))
retlist[[name]] <- simple_tsp_pair(subby, column=column)
}
}
}
}
}
simple_tsp_pair <- function(subby, column="condition", repetitions=50) {
tsp_input <- subby[["expressionset"]]
tsp_output <- tspcalc(tsp_input, column)
tsp_scores <- tspsig(tsp_input, column, B=repetitions)
}
tsp1 <- tspcalc(tsp_input, "condition")
if (!isTRUE(get0("skip_load"))) {
pander::pander(sessionInfo())
message(paste0("This is hpgltools commit: ", get_git_commit()))
message(paste0("Saving to ", savefile))
tmp <- sm(saveme(filename=savefile))
}
## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset 3126ec4ed453b24c357acd29c97436ef8d784005
## This is hpgltools commit: Thu Oct 7 19:29:41 2021 -0400: 3126ec4ed453b24c357acd29c97436ef8d784005
## Saving to tmrc3_02sample_estimation_v202110.rda.xz
tmp <- loadme(filename=savefile)