We are splitting the creation of the various data structures and their analysis into separate documents.
Currently we use the sample metadata and the CRF data from the clinic.
## samplesheet <- "sample_sheets/tmrc3_samples_20220615.xlsx"
## Redundant sample sheet entries so we can test if something changed in June
samplesheet <- "sample_sheets/tmrc3_samples_20220728.xlsx"
crf_metadata <- "sample_sheets/20220509_EXP_ESPECIAL_TMRC3_V2.xlsx"
The primary annotation sources are:
Both provide GO data. They also provide helpful links to other data sources. For the moment, we are focusing on the human annotations.
These analyses have focused on gene-level abundances/differences. Thus, when htseq-count was invoked against the hisat2-based mappings, parameters were chosen to count genes rather than transcripts. In this context, a gene refers to the non-redundant union of the transcripts’ exons. Similarly, when salmon counts were used via tximport, a mapping of genes to transcripts was used to collapse the matrix to gene-level abundances. This decision may be revisited.
hs_annot <- load_biomart_annotations(year="2020", month="jan")
## The biomart annotations file already exists, loading from it.
hs_annot <- hs_annot[["annotation"]]
## The next two lines make a bridge between the gff file used by hisat2
## and the gene IDs.
hs_annot[["transcript"]] <- paste0(rownames(hs_annot), ".", hs_annot[["version"]])
rownames(hs_annot) <- make.names(hs_annot[["ensembl_gene_id"]], unique=TRUE)
## The tx_gene_map is used for tximport and salmon quantifications.
tx_gene_map <- hs_annot[, c("transcript", "ensembl_gene_id")]
summary(hs_annot)
## ensembl_transcript_id ensembl_gene_id version transcript_version
## Length:227784 Length:227784 Min. : 1.0 Min. : 1.00
## Class :character Class :character 1st Qu.: 6.0 1st Qu.: 1.00
## Mode :character Mode :character Median :12.0 Median : 1.00
## Mean :10.7 Mean : 3.08
## 3rd Qu.:15.0 3rd Qu.: 5.00
## Max. :28.0 Max. :17.00
##
## hgnc_symbol description gene_biotype cds_length
## Length:227784 Length:227784 Length:227784 Min. : 3
## Class :character Class :character Class :character 1st Qu.: 357
## Mode :character Mode :character Mode :character Median : 694
## Mean : 1140
## 3rd Qu.: 1446
## Max. :107976
## NA's :127222
## chromosome_name strand start_position end_position
## Length:227784 Length:227784 Min. :5.77e+02 Min. :6.47e+02
## Class :character Class :character 1st Qu.:3.11e+07 1st Qu.:3.12e+07
## Mode :character Mode :character Median :6.04e+07 Median :6.06e+07
## Mean :7.41e+07 Mean :7.42e+07
## 3rd Qu.:1.09e+08 3rd Qu.:1.09e+08
## Max. :2.49e+08 Max. :2.49e+08
##
## transcript
## Length:227784
## Class :character
## Mode :character
##
##
##
##
The set of GO categories has not been limited to the 2020 data at the time of this writing. The GO categories is collected along with lengths for goseq. The other methods either have built-in databases of human data (gProfiler) or support orgDB data (org.Hs.eg.db) (clusterProfiler/topGO/gostats).
hs_go <- load_biomart_go()[["go"]]
## The biomart annotations file already exists, loading from it.
hs_length <- hs_annot[, c("ensembl_gene_id", "cds_length")]
colnames(hs_length) <- c("ID", "length")
Before we do any of the following subsets/analyses of the data, we need to collect it all in one place. Let’s do that here and name it ‘hs_expt’, it will comprise the full set of human data. When we cull some samples it will be renamed to ‘hs_valid’.
The variable ‘data_structures’ will contain the names of every variable I create in this document which I wish to save to disk.
Similarly, the character vector ‘sanitize_columns’ is the list of metadata columns over which I will iterate and sanitize the data, removing any spurious spaces, capitalization, etc.
data_structures <- c()
sanitize_columns <- c("visitnumber", "finaloutcome", "donor",
"typeofcells", "clinicalpresentation", "drug",
"condition", "batch")
hs_expt <- create_expt(samplesheet,
file_column="hg38100hisatfile",
savefile=glue::glue("rda/tmrc3_hs_expt_all_raw-v{ver}.rda"),
gene_info=hs_annot) %>%
exclude_genes_expt(column="gene_biotype", method="keep",
pattern="protein_coding", meta_column="ncrna_lost") %>%
sanitize_expt_metadata(columns=sanitize_columns) %>%
set_expt_factors(columns=sanitize_columns, class="factor") %>%
set_expt_conditions(fact="finaloutcome") %>%
set_expt_batches(fact="visitnumber") %>%
subset_expt(subset="typeofcells!='pbmcs'") %>%
subset_expt(subset="typeofcells!='macrophages'") %>%
subset_expt(subset="clinicalpresentation!='h'")
## Reading the sample metadata.
## Dropped 69 rows from the sample metadata because the sample ID is blank.
## The sample definitions comprises: 254 rows(samples) and 88 columns(metadata fields).
## Warning in create_expt(samplesheet, file_column = "hg38100hisatfile", savefile =
## glue::glue("rda/tmrc3_hs_expt_all_raw-v{ver}.rda"), : Some samples were removed
## when cross referencing the samples against the count data.
## Matched 21447 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## The final expressionset has 21481 features and 246 samples.
## remove_genes_expt(), before removal, there were 21481 genes, now there are 19923.
## There are 8 samples which kept less than 90 percent counts.
## TMRC30015 TMRC30269 TMRC30017 TMRC30019 TMRC30044 TMRC30045 TMRC30154 TMRC30241
## 79.19 89.18 85.66 89.69 80.29 73.28 83.16 89.37
## subset_expt(): There were 246, now there are 240 samples.
## subset_expt(): There were 240, now there are 212 samples.
## subset_expt(): There were 212, now there are 210 samples.
data_structures <- c(data_structures, "hs_expt")
## The following should make visit 1 the largest if one uses that column as a size factor when plotting.
meta <- pData(hs_expt) %>%
mutate(visitnumber = fct_relevel(visitnumber, c("3", "2", "1")))
pData(hs_expt) <- meta
save(list = "hs_expt", file = glue::glue("rda/tmrc3_hs_expt_all_sanitized-v{ver}.rda"))
There are some parameters provided by the clinicians which may prove to be of interest. I therefore am adding the CRF metadata here. I will also run some simple checks to try to make sure that the clinician’s metadata agrees with the sample metadata.
hs_pd <- pData(hs_expt)
start <- rownames(hs_pd)
hs_crf <- openxlsx::read.xlsx(crf_metadata)
hs_crf[["join"]] <- tolower(hs_crf[["codigo_paciente"]])
hs_pd[["join"]] <- hs_pd[["tubelabelorigin"]]
test <- plyr::join(hs_pd, hs_crf, by="join")
test[["join"]] <- NULL
rownames(test) <- rownames(hs_pd)
na_idx <- is.na(test)
test[na_idx] <- "undefined"
pData(hs_expt) <- test
save(list = "hs_expt", file = glue::glue("rda/tmrc3_hs_expt_all_sanitized_crf-v{ver}.rda"))
Check to make sure the CRF agrees with the metadata.
two_columns <- pData(hs_expt)[, c("finaloutcome", "ef_lc_estado_final_estudio")]
undef_idx <- two_columns[[2]] == "undefined"
two_columns <- two_columns[!undef_idx, ]
two_columns[["rewritten"]] <- "undef"
cure_idx <- two_columns[[2]] == 0
two_columns[cure_idx, "rewritten"] <- "cure"
fail_idx <- two_columns[[2]] == 1
two_columns[fail_idx, "rewritten"] <- "failure"
lost_idx <- two_columns[[2]] == 2
two_columns[lost_idx, "rewritten"] <- "lost"
same_idx <- two_columns[["finaloutcome"]] == two_columns[["rewritten"]]
broken <- two_columns[!same_idx, ]
broken
## [1] finaloutcome ef_lc_estado_final_estudio
## [3] rewritten
## <0 rows> (or 0-length row.names)
There are some metadata factors for which I think it will be nice to see the numbers before and after our filters. The following shows how many samples we have of the primary types before filtering.
dim(pData(hs_expt))
## [1] 210 148
table(pData(hs_expt)$drug)
##
## antimony miltefosine
## 202 8
table(pData(hs_expt)$clinic)
##
## Cali Tumaco
## 67 143
table(pData(hs_expt)$finaloutcome)
##
## cure failure lost
## 130 64 16
table(pData(hs_expt)$typeofcells)
##
## biopsy eosinophils monocytes neutrophils
## 21 45 73 71
table(pData(hs_expt)$visit)
##
## 3 2 1
## 54 56 100
summary(as.numeric(pData(hs_expt)$eb_lc_tiempo_evolucion))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 4.00 6.00 8.25 12.00 21.00
summary(as.numeric(pData(hs_expt)$eb_lc_tto_mcto_glucan_dosis))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13.0 14.0 19.0 54.6 20.0 999.0
summary(as.numeric(pData(hs_expt)$v3_lc_ejey_lesion_mm_1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 7.4 32.0 269.5 771.8 999.0
summary(as.numeric(pData(hs_expt)$v3_lc_lesion_area_1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 226 999 2412 3016 16965
summary(as.numeric(pData(hs_expt)$v3_lc_ejex_ulcera_mm_1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 0.0 9.1 259.6 761.7 999.0
table(pData(hs_expt)$eb_lc_sexo)
##
## 1 2
## 173 37
table(pData(hs_expt)$eb_lc_etnia)
##
## 1 2 3
## 101 62 47
summary(as.numeric(pData(hs_expt)$edad))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.0 25.0 28.5 30.7 36.0 51.0
table(pData(hs_expt)$eb_lc_peso)
##
## 53.9 55.9 57.9 58 58.1 58.3 58.6 59 59.6 62 63 67 69.4
## 9 2 2 6 7 10 13 14 1 8 6 6 10
## 72 74.7 75 75.6 76.5 77 78 79.2 82 83.3 83.4 86.4 87
## 9 3 2 3 3 18 10 10 9 4 10 9 3
## 89 93.3 100 100.8
## 9 7 5 2
table(pData(hs_expt)$eb_lc_estatura)
##
## 145 152 154 155 156 158 159 160 162 163 164 165 166 167 169 172 173 174 175 176
## 6 1 10 9 6 17 2 6 10 9 15 12 19 3 2 10 9 32 2 1
## 177 182 183
## 10 9 10
table(pData(hs_expt)$ef_lc_estado_final_estudio)
##
## 0 1 2
## 130 64 16
table(pData(hs_expt)$finaloutcome)
##
## cure failure lost
## 130 64 16
unique(pData(hs_expt)$codigo_paciente)
## [1] "SU1034" "SU1154" "SU1167" "SU1168" "SU1174" "SU1176" "SU1175" "SU2050"
## [9] "SU2052" "SU2065" "SU1195" "SU1196" "SU1197" "SU2066" "SU1198" "SU2068"
## [17] "SU2071" "SU2072" "SU2073" "SU2159" "SU2162" "SU2161" "SU2164" "SU2167"
## [25] "SU2168" "SU2172" "SU2173" "SU2183" "SU2184" "SU2195" "SU2188" "SU2190"
## [33] "SU2194" "SU2201"
length(unique(pData(hs_expt)$codigo_paciente))
## [1] 34
Get the sex of the unique patients.
unique_people <- unique(pData(hs_expt)$codigo_paciente)
unique_people
## [1] "SU1034" "SU1154" "SU1167" "SU1168" "SU1174" "SU1176" "SU1175" "SU2050"
## [9] "SU2052" "SU2065" "SU1195" "SU1196" "SU1197" "SU2066" "SU1198" "SU2068"
## [17] "SU2071" "SU2072" "SU2073" "SU2159" "SU2162" "SU2161" "SU2164" "SU2167"
## [25] "SU2168" "SU2172" "SU2173" "SU2183" "SU2184" "SU2195" "SU2188" "SU2190"
## [33] "SU2194" "SU2201"
length(unique_people)
## [1] 34
df <- pData(hs_expt)
people <- df[["codigo_paciente"]]
first_indices <- order(people)[!duplicated(sort(people))]
table(df[first_indices, "eb_lc_sexo"])
##
## 1 2
## 27 7
table(df[first_indices, "finaloutcome"])
##
## cure failure lost
## 22 10 2
There are lots of ways which we will categorize the data, here are some potential color choices for them.
color_choices <- list(
"cf" = list(
"cure" = "#998EC3",
"failure" = "#F1A340"),
"type_visit" = list(
"monocytes_v1" = "#DD1C77",
"monocytes_v2" = "#C994C7",
"monocytes_v3" = "#E7E1EF",
"eosinophils_v1" = "#31A354",
"eosinophils_v2" = "#ADDD8E",
"eosinophils_v3" = "#F7FCD9",
"neutrophils_v1" = "#3182BD",
"neutrophils_v2" = "#9ECAE1",
"neutrophils_v3" = "#DEEBF7",
"biopsy_v1" = "#D95F0E"),
"type" = list(
"monocytes" = "#DD1C77",
"eosinophils" = "#31A354",
"neutrophils" = "#3182BD",
"biopsy" = "#D95F0E"),
"visit" = list(
## Turns out we never decided on visit-specific colors...
"v1" = "#DD0000",
"v2" = "#00DD00",
"v3" = "#0000DD"),
"cf_type" <- list(
cure_biopsy = "#D95F0E",
failure_biopsy = "#FEC44F",
cure_monocytes = "#DD1C77",
failure_monocytes = "#C994C7",
cure_eosinophils = "#31A354",
failure_eosinophils = "#ADDD8E",
cure_neutrophils = "#3182BD",
failure_neutrophils = "#9ECAE1"))
data_structures <- c(data_structures, "color_choices")
Given the starting point above, we will start extracting groups of samples of interest and giving them hopefully useful names.
The first set of samples removed from the data are those with too many missing genes.
all_nz <- plot_nonzero(hs_expt)
## The following samples have less than 12949.95 genes.
## [1] "TMRC30010" "TMRC30140" "TMRC30280" "TMRC30284" "TMRC30050" "TMRC30056"
## [7] "TMRC30052" "TMRC30058" "TMRC30031" "TMRC30038" "TMRC30265"
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
all_nz$plot
## Warning: ggrepel: 195 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
xlsx_prefix <- "analyses"
cpm_data <- normalize_expt(hs_expt, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel=glue::glue("{xlsx_prefix}/3_cali_and_tumaco/CPM/all_samples.xlsx"))
## Deleting the file analyses/3_cali_and_tumaco/CPM/all_samples.xlsx before writing the tables.
rpkm_data <- normalize_expt(hs_expt, filter=TRUE, convert="rpkm",
column="cds_length", na_to_zero=TRUE)
## Removing 5149 low-count genes (14774 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/hs_expt_rpkm-v{ver}.csv"))
To my eyes, there are 3 or 4 samples which are likely candidates for removal. In addition, we will remove samples which were lost during the treatment and/or ones which were used in other experiments but included in the TMRC3 sample sheet (thus the ‘notapplicable’ or ‘null’).
NOTE: I changed the prefix of the data structures to reflect the clinic(s) from which they were acquired. This will likely be a problem until I carefully step through the file; but will make sure I do not accidently upload data with the Cali samples.
tc_valid <- subset_expt(hs_expt, nonzero=11000) %>%
subset_expt(subset="finaloutcome!='lost'") %>%
subset_expt(subset="drug=='antimony'") %>%
set_expt_colors(color_choices[["cf"]])
## The samples (and read coverage) removed when filtering 11000 non-zero genes are:
## TMRC30010 TMRC30050 TMRC30052
## 52429 807571 3086349
## subset_expt(): There were 210, now there are 207 samples.
## subset_expt(): There were 207, now there are 192 samples.
## subset_expt(): There were 192, now there are 184 samples.
save(list = "tc_valid", file = glue::glue("rda/tmrc3_tc_valid-v{ver}.rda"))
data_structures <- c(data_structures, "tc_valid")
The following plot is essentially identical to the previous with two exceptions:
nz_post <- plot_nonzero(tc_valid)
## The following samples have less than 12949.95 genes.
## [1] "TMRC30140" "TMRC30280" "TMRC30284" "TMRC30056" "TMRC30058" "TMRC30031"
## [7] "TMRC30265"
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
nz_post$plot
## Warning: ggrepel: 163 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
We need to keep track of how many of each sample type is lost when we do our various filters. Thus I am repeating the same set of tallies. This will likely happen one more time, following the removal of samples which came from Cali.
table(pData(tc_valid)$drug)
##
## antimony
## 184
table(pData(tc_valid)$clinic)
##
## Cali Tumaco
## 61 123
table(pData(tc_valid)$finaloutcome)
##
## cure failure
## 122 62
table(pData(tc_valid)$typeofcells)
##
## biopsy eosinophils monocytes neutrophils
## 18 41 63 62
table(pData(tc_valid)$visit)
##
## 3 2 1
## 51 50 83
summary(as.numeric(pData(tc_valid)$eb_lc_tiempo_evolucion))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 4.00 6.00 8.19 12.00 21.00
summary(as.numeric(pData(tc_valid)$eb_lc_tto_mcto_glucan_dosis))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13.0 14.8 19.0 17.5 20.0 20.0
summary(as.numeric(pData(tc_valid)$v3_lc_ejey_lesion_mm_1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 7.2 32.0 303.4 999.0 999.0
summary(as.numeric(pData(tc_valid)$v3_lc_lesion_area_1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 226 999 2328 2448 16965
summary(as.numeric(pData(tc_valid)$v3_lc_ejex_ulcera_mm_1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 0.0 12.5 295.9 999.0 999.0
table(pData(tc_valid)$eb_lc_sexo)
##
## 1 2
## 156 28
table(pData(tc_valid)$eb_lc_etnia)
##
## 1 2 3
## 91 46 47
summary(as.numeric(pData(tc_valid)$edad))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.0 25.0 28.5 30.7 36.0 51.0
table(pData(tc_valid)$eb_lc_peso)
##
## 53.9 57.9 58 58.1 58.3 58.6 59 59.6 62 63 67 69.4 72
## 9 2 6 7 10 3 8 1 6 6 6 10 9
## 75 76.5 77 78 79.2 82 83.3 83.4 86.4 87 89 93.3 100
## 2 3 18 10 10 9 4 10 9 3 9 7 5
## 100.8
## 2
table(pData(tc_valid)$eb_lc_estatura)
##
## 152 154 155 156 158 159 160 163 164 165 166 167 169 172 173 174 176 177 182 183
## 1 10 9 6 15 2 3 9 15 12 19 3 2 10 9 32 1 7 9 10
length(unique(pData(tc_valid)[["codigo_paciente"]]))
## [1] 29
tc_cure <- subset_expt(tc_valid, subset="finaloutcome=='cure'")
## subset_expt(): There were 184, now there are 122 samples.
table(pData(tc_cure)$visitnumber)
##
## 3 2 1
## 32 33 57
tc_fail <- subset_expt(tc_valid, subset="finaloutcome=='failure'")
## subset_expt(): There were 184, now there are 62 samples.
table(pData(tc_fail)$visitnumber)
##
## 3 2 1
## 19 17 26
all_types <- table(pData(tc_valid)[["typeofcells"]])
all_types
##
## biopsy eosinophils monocytes neutrophils
## 18 41 63 62
all_times <- table(pData(tc_valid)[["visitnumber"]])
all_times
##
## 3 2 1
## 51 50 83
cpm_data <- normalize_expt(tc_valid, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel=glue::glue("{xlsx_prefix}/3_cali_and_tumaco/CPM/all_valid_samples.xlsx"))
## Deleting the file analyses/3_cali_and_tumaco/CPM/all_valid_samples.xlsx before writing the tables.
rpkm_data <- normalize_expt(tc_valid, filter=TRUE, convert="rpkm",
column="cds_length", na_to_zero=TRUE)
## Removing 5633 low-count genes (14290 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tc_all_valid_samples-v{ver}.csv"))
## Note, I will save these data structures in a little bit when I decide how I want
## to set the conditions, batches, and colors.
tc_eosinophils <- subset_expt(tc_valid, subset="typeofcells=='eosinophils'")
## subset_expt(): There were 184, now there are 41 samples.
tc_monocytes <- subset_expt(tc_valid, subset="typeofcells=='monocytes'")
## subset_expt(): There were 184, now there are 63 samples.
tc_neutrophils <- subset_expt(tc_valid, subset="typeofcells=='neutrophils'")
## subset_expt(): There were 184, now there are 62 samples.
cpm_data <- normalize_expt(tc_eosinophils, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel=glue::glue("{xlsx_prefix}/3_cali_and_tumaco/CPM/tc_eosinophils.xlsx"))
## Deleting the file analyses/3_cali_and_tumaco/CPM/tc_eosinophils.xlsx before writing the tables.
rpkm_data <- normalize_expt(tc_eosinophils, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 9059 low-count genes (10864 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tc_eosinophils-v{ver}.csv"))
cpm_data <- normalize_expt(tc_monocytes, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel=glue::glue("{xlsx_prefix}/3_cali_and_tumaco/CPM/tc_monocytes.xlsx"))
## Deleting the file analyses/3_cali_and_tumaco/CPM/tc_monocytes.xlsx before writing the tables.
rpkm_data <- normalize_expt(tc_monocytes, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 8819 low-count genes (11104 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tc_monocytes-v{ver}.csv"))
cpm_data <- normalize_expt(tc_neutrophils, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel=glue::glue("{xlsx_prefix}/3_cali_and_tumaco/CPM/tc_neutrophils.xlsx"))
## Deleting the file analyses/3_cali_and_tumaco/CPM/tc_neutrophils.xlsx before writing the tables.
rpkm_data <- normalize_expt(tc_neutrophils, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 10681 low-count genes (9242 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tc_neutrophils-v{ver}.csv"))
## Currently, these are not used, but instead I pulled the samples from the hs_clinical
## which means the biopsies are not included.
tcv1_samples <- subset_expt(tc_valid, subset="visitnumber=='1'")
## subset_expt(): There were 184, now there are 83 samples.
save(list = "tcv1_samples", file = glue::glue("rda/tmrc3_tcv1_samples-v{ver}.rda"))
data_structures <- c(data_structures, "tcv1_samples")
tcv2_samples <- subset_expt(tc_valid, subset="visitnumber=='2'")
## subset_expt(): There were 184, now there are 50 samples.
save(list = "tcv2_samples", file = glue::glue("rda/tmrc3_tcv2_samples-v{ver}.rda"))
data_structures <- c(data_structures, "tcv2_samples")
tcv3_samples <- subset_expt(tc_valid, subset="visitnumber=='3'")
## subset_expt(): There were 184, now there are 51 samples.
save(list = "tcv3_samples", file = glue::glue("rda/tmrc3_tcv3_samples-v{ver}.rda"))
data_structures <- c(data_structures, "tcv3_samples")
cpm_data <- normalize_expt(tcv1_samples, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel=glue::glue("{xlsx_prefix}/3_cali_and_tumaco/CPM/tcv1_samples.xlsx"))
## Deleting the file analyses/3_cali_and_tumaco/CPM/tcv1_samples.xlsx before writing the tables.
rpkm_data <- normalize_expt(tcv1_samples, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 5772 low-count genes (14151 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tcv1_samples-v{ver}.csv"))
cpm_data <- normalize_expt(tcv2_samples, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel=glue::glue("{xlsx_prefix}/3_cali_and_tumaco/CPM/tcv2_samples.xlsx"))
## Deleting the file analyses/3_cali_and_tumaco/CPM/tcv2_samples.xlsx before writing the tables.
rpkm_data <- normalize_expt(tcv2_samples, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 8184 low-count genes (11739 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tcv2_samples-v{ver}.csv"))
cpm_data <- normalize_expt(tcv3_samples, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel=glue::glue("{xlsx_prefix}/3_cali_and_tumaco/CPM/tcv3_samples.xlsx"))
## Deleting the file analyses/3_cali_and_tumaco/CPM/tcv3_samples.xlsx before writing the tables.
rpkm_data <- normalize_expt(tcv3_samples, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 8259 low-count genes (11664 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tcv3_samples-v{ver}.csv"))
All data structures which start with the prefix ‘tc’ are Tumaco and Cali. Those with ‘t’ are only Tumaco.
tc_clinical <- tc_valid %>%
set_expt_conditions(fact="finaloutcome") %>%
set_expt_batches(fact="typeofcells") %>%
set_expt_colors(color_choices[["cf"]])
save(list = "tc_clinical", file = glue::glue("rda/tmrc3_tc_clinical-v{ver}.rda"))
data_structures <- c(data_structures, "tc_clinical")
tc_clinical_nobiop <- subset_expt(tc_clinical, subset="typeofcells!='biopsy'")
## subset_expt(): There were 184, now there are 166 samples.
save(list = "tc_clinical_nobiop", file = glue::glue("rda/tmrc3_tc_clinical_nobiop-v{ver}.rda"))
data_structures <- c(data_structures, "tc_clinical_nobiop")
tc_biopsies <- tc_valid %>%
set_expt_conditions(fact="clinic") %>%
set_expt_batches(fact="finaloutcome") %>%
subset_expt(subset="typeofcells=='biopsy'")
## subset_expt(): There were 184, now there are 18 samples.
tc_cf <- paste0(pData(tc_biopsies)[["clinic"]], "_",
pData(tc_biopsies)[["finaloutcome"]])
table(tc_cf)
## tc_cf
## Cali_cure Tumaco_cure Tumaco_failure
## 4 9 5
tc_biopsies <- set_expt_conditions(tc_biopsies, fact=tc_cf) %>%
set_expt_batches(fact="visitnumber")
save(list = "tc_biopsies", file = glue::glue("rda/tmrc3_tc_biopsies-v{ver}.rda"))
data_structures <- c(data_structures, "tc_biopsies")
clinic_cf <- paste0(pData(tc_eosinophils)[["clinic"]], "_",
pData(tc_eosinophils)[["finaloutcome"]])
table(clinic_cf)
## clinic_cf
## Cali_cure Tumaco_cure Tumaco_failure
## 15 17 9
tc_eosinophils <- set_expt_conditions(tc_eosinophils, fact=clinic_cf) %>%
set_expt_batches(fact="visitnumber")
save(list = "tc_eosinophils", file = glue::glue("rda/tmrc3_tc_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "tc_eosinophils")
tcv1_eosinophils <- subset_expt(tc_eosinophils, subset="visitnumber=='1'")
## subset_expt(): There were 41, now there are 14 samples.
save(list = "tcv1_eosinophils", file = glue::glue("rda/tmrc3_tcv1_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "tcv1_eosinophils")
tcv2_eosinophils <- subset_expt(tc_eosinophils, subset="visitnumber=='2'")
## subset_expt(): There were 41, now there are 14 samples.
save(list = "tcv2_eosinophils", file = glue::glue("rda/tmrc3_tcv2_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "tcv2_eosinophils")
tcv3_eosinophils <- subset_expt(tc_eosinophils, subset="visitnumber=='3'")
## subset_expt(): There were 41, now there are 13 samples.
save(list = "tcv3_eosinophils", file = glue::glue("rda/tmrc3_tcv3_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "tcv3_eosinophils")
clinic_cf <- paste0(pData(tc_monocytes)[["clinic"]], "_",
pData(tc_monocytes)[["finaloutcome"]])
table(clinic_cf)
## clinic_cf
## Cali_cure Cali_failure Tumaco_cure Tumaco_failure
## 18 3 21 21
tc_monocytes <- set_expt_conditions(tc_monocytes, fact=clinic_cf) %>%
set_expt_batches(fact="visitnumber")
save(list = "tc_monocytes", file = glue::glue("rda/tmrc3_tc_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "tc_monocytes")
tcv1_monocytes <- subset_expt(tc_monocytes, subset="visitnumber=='1'")
## subset_expt(): There were 63, now there are 26 samples.
save(list = "tcv1_monocytes", file = glue::glue("rda/tmrc3_tcv1_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "tcv1_monocytes")
tcv2_monocytes <- subset_expt(tc_monocytes, subset="visitnumber=='2'")
## subset_expt(): There were 63, now there are 18 samples.
save(list = "tcv2_monocytes", file = glue::glue("rda/tmrc3_tcv2_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "tcv2_monocytes")
tcv3_monocytes <- subset_expt(tc_monocytes, subset="visitnumber=='3'")
## subset_expt(): There were 63, now there are 19 samples.
save(list = "tcv3_monocytes", file = glue::glue("rda/tmrc3_tcv3_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "tcv3_monocytes")
clinic_cf <- paste0(pData(tc_neutrophils)[["clinic"]], "_",
pData(tc_neutrophils)[["finaloutcome"]])
table(clinic_cf)
## clinic_cf
## Cali_cure Cali_failure Tumaco_cure Tumaco_failure
## 18 3 20 21
tc_neutrophils <- set_expt_conditions(tc_neutrophils, fact=clinic_cf) %>%
set_expt_batches(fact="visitnumber")
save(list = "tc_neutrophils", file = glue::glue("rda/tmrc3_tc_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "tc_neutrophils")
tcv1_neutrophils <- subset_expt(tc_neutrophils, subset="visitnumber=='1'")
## subset_expt(): There were 62, now there are 25 samples.
save(list = "tcv1_neutrophils", file = glue::glue("rda/tmrc3_tcv1_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "tcv1_neutrophils")
tcv2_neutrophils <- subset_expt(tc_neutrophils, subset="visitnumber=='2'")
## subset_expt(): There were 62, now there are 18 samples.
save(list = "tcv2_neutrophils", file = glue::glue("rda/tmrc3_tcv2_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "tcv2_neutrophils")
tcv3_neutrophils <- subset_expt(tc_neutrophils, subset="visitnumber=='3'")
## subset_expt(): There were 62, now there are 19 samples.
save(list = "tcv3_neutrophils", file = glue::glue("rda/tmrc3_tcv3_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "tcv3_neutrophils")
Here is an outline of the samples in their current state:
Our recent discussions have settled one big question regarding which samples to use: We will limit our analyses to only those samples from Tumaco.
The following block will therefore set that group as our default for future analyses.
t_clinical <- tc_clinical %>%
subset_expt(subset="clinic=='Tumaco'")
## subset_expt(): There were 184, now there are 123 samples.
save(list = "t_clinical", file = glue::glue("rda/tmrc3_t_clinical-v{ver}.rda"))
data_structures <- c(data_structures, "t_clinical")
cpm_data <- normalize_expt(t_clinical, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/t_clinical_cpm.xlsx")
## Deleting the file analyses/4_tumaco/CPM/t_clinical_cpm.xlsx before writing the tables.
rpkm_data <- normalize_expt(t_clinical, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 5774 low-count genes (14149 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/t_clinical-v{ver}.csv"))
cpm_data <- normalize_expt(t_clinical, convert="cpm", filter=TRUE, batch="svaseq")
## Removing 5774 low-count genes (14149 remaining).
## Setting 17282 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/t_clinical_cpm_sva.xlsx")
## Deleting the file analyses/4_tumaco/CPM/t_clinical_cpm_sva.xlsx before writing the tables.
rpkm_data <- normalize_expt(t_clinical, filter=TRUE, batch="svaseq") %>%
normalize_expt(convert="rpkm", column="cds_length", na_to_zero=TRUE)
## Removing 5774 low-count genes (14149 remaining).
## Setting 9132 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/t_clinical_sva-v{ver}.csv"))
t_clinical_nobiop <- subset_expt(t_clinical, subset="typeofcells!='biopsy'")
## subset_expt(): There were 123, now there are 109 samples.
save(list = "t_clinical_nobiop", file = glue::glue("rda/tmrc3_t_clinical_nobiop-v{ver}.rda"))
data_structures <- c(data_structures, "t_clinical_nobiop")
cpm_data <- normalize_expt(t_clinical_nobiop, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/t_clinical_nobiop_cpm.xlsx")
## Deleting the file analyses/4_tumaco/CPM/t_clinical_nobiop_cpm.xlsx before writing the tables.
rpkm_data <- normalize_expt(t_clinical, filter=TRUE, batch="svaseq",
column="cds_length") %>%
normalize_expt(convert="rpkm", column="cds_length", na_to_zero=TRUE)
## Removing 5774 low-count genes (14149 remaining).
## Setting 9132 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/t_clinical_sva-v{ver}.csv"))
cpm_data <- normalize_expt(t_clinical_nobiop, convert="cpm", filter=TRUE, batch="svaseq")
## Removing 8016 low-count genes (11907 remaining).
## Setting 9578 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/t_clinical_nobiop_cpm_sva.xlsx")
## Deleting the file analyses/4_tumaco/CPM/t_clinical_nobiop_cpm_sva.xlsx before writing the tables.
rpkm_data <- normalize_expt(t_clinical_nobiop, filter=TRUE, batch="svaseq",
column="cds_length") %>%
normalize_expt(convert="rpkm", column="cds_length", na_to_zero=TRUE)
## Removing 8016 low-count genes (11907 remaining).
## Setting 1834 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/t_clinical_nobiop_sva-v{ver}.csv"))
At least in theory, everything which follows will be using the above ‘clinical’ data structure. Thus, let us count it up and get a sense of what we will work with.
table(pData(t_clinical)$drug)
##
## antimony
## 123
table(pData(t_clinical)$clinic)
##
## Tumaco
## 123
table(pData(t_clinical)$finaloutcome)
##
## cure failure
## 67 56
table(pData(t_clinical)$typeofcells)
##
## biopsy eosinophils monocytes neutrophils
## 14 26 42 41
table(pData(t_clinical)$visit)
##
## 3 2 1
## 34 35 54
summary(as.numeric(pData(t_clinical)$eb_lc_tiempo_evolucion))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 4.00 4.00 7.03 8.00 21.00
summary(as.numeric(pData(t_clinical)$eb_lc_tto_mcto_glucan_dosis))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13 14 17 17 20 20
summary(as.numeric(pData(t_clinical)$v3_lc_ejey_lesion_mm_1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.0 7.2 31.3 389.6 999.0 999.0
summary(as.numeric(pData(t_clinical)$v3_lc_lesion_area_1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 46 222 999 1089 999 5055
summary(as.numeric(pData(t_clinical)$v3_lc_ejex_ulcera_mm_1))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 0 383 999 999
table(pData(t_clinical)$eb_lc_sexo)
##
## 1 2
## 101 22
table(pData(t_clinical)$eb_lc_etnia)
##
## 1 2 3
## 76 19 28
summary(as.numeric(pData(t_clinical)$edad))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.0 23.0 25.0 28.5 34.0 51.0
table(pData(t_clinical)$eb_lc_peso)
##
## 53.9 57.9 58.1 58.3 58.6 59 59.6 62 63 69.4 77 78 79.2
## 9 2 7 10 3 8 1 6 6 10 9 10 10
## 83.3 83.4 86.4 93.3 100.8
## 4 10 9 7 2
table(pData(t_clinical)$eb_lc_estatura)
##
## 152 154 158 159 163 164 165 166 172 173 174 176 177 182 183
## 1 10 15 2 9 15 12 10 10 4 8 1 7 9 10
length(unique(pData(t_clinical)[["codigo_paciente"]]))
## [1] 19
only_cure <- pData(t_clinical)[["finaloutcome"]] == "cure"
c_meta <- pData(t_clinical)[only_cure, ]
length(unique(c_meta[["codigo_paciente"]]))
## [1] 10
only_fail <- pData(t_clinical)[["finaloutcome"]] == "failure"
f_meta <- pData(t_clinical)[only_fail, ]
length(unique(f_meta[["codigo_paciente"]]))
## [1] 9
t_cure <- subset_expt(t_clinical, subset="finaloutcome=='cure'")
## subset_expt(): There were 123, now there are 67 samples.
table(pData(t_cure)$visitnumber)
##
## 3 2 1
## 17 20 30
t_fail <- subset_expt(t_clinical, subset="finaloutcome=='failure'")
## subset_expt(): There were 123, now there are 56 samples.
table(pData(t_fail)$visitnumber)
##
## 3 2 1
## 17 15 24
I previously made a bunch of data subsets by visit, cell type, etc. So let us overwrite them all with versions which contain only the Tumaco samples.
There is no going back to the Cali samples after this block unless we regenerate the data from the original expressionsets.
t_biopsies <- tc_biopsies %>%
subset_expt(subset="clinic=='Tumaco'")
## subset_expt(): There were 18, now there are 14 samples.
save(list = "t_biopsies", file = glue::glue("rda/tmrc3_t_biopsies-v{ver}.rda"))
data_structures <- c(data_structures, "t_biopsies")
cpm_data <- normalize_expt(t_biopsies, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/t_biopsies_cpm.xlsx")
## Deleting the file analyses/4_tumaco/CPM/t_biopsies_cpm.xlsx before writing the tables.
rpkm_data <- normalize_expt(t_biopsies, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 6417 low-count genes (13506 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/t_biopsies-v{ver}.csv"))
cpm_data <- normalize_expt(t_biopsies, convert="cpm", filter=TRUE, batch="svaseq")
## Removing 6417 low-count genes (13506 remaining).
## Setting 145 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/t_biopsies_cpm_sva.xlsx")
## Deleting the file analyses/4_tumaco/CPM/t_biopsies_cpm_sva.xlsx before writing the tables.
t_eosinophils <- tc_eosinophils %>%
subset_expt(subset="clinic=='Tumaco'")
## subset_expt(): There were 41, now there are 26 samples.
save(list = "t_eosinophils", file = glue::glue("rda/tmrc3_t_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "t_eosinophils")
cpm_data <- normalize_expt(t_eosinophils, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/t_eosinophils_cpm.xlsx")
## Deleting the file analyses/4_tumaco/CPM/t_eosinophils_cpm.xlsx before writing the tables.
cpm_data <- normalize_expt(t_eosinophils, convert="cpm", filter=TRUE, batch="svaseq")
## Removing 9393 low-count genes (10530 remaining).
## Setting 325 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/t_eosinophils_cpm_sva.xlsx")
## Deleting the file analyses/4_tumaco/CPM/t_eosinophils_cpm_sva.xlsx before writing the tables.
rpkm_data <- normalize_expt(t_eosinophils, filter=TRUE, batch="svaseq",
column="cds_length") %>%
normalize_expt(convert="rpkm", column="cds_length", na_to_zero=TRUE)
## Removing 9393 low-count genes (10530 remaining).
## Setting 106 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/t_eosinophils_sva-v{ver}.csv"))
t_monocytes <- tc_monocytes %>%
subset_expt(subset="clinic=='Tumaco'")
## subset_expt(): There were 63, now there are 42 samples.
save(list = "t_monocytes", file = glue::glue("rda/tmrc3_t_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "t_monocytes")
cpm_data <- normalize_expt(t_monocytes, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/t_monocytes_cpm.xlsx")
## Deleting the file analyses/4_tumaco/CPM/t_monocytes_cpm.xlsx before writing the tables.
rpkm_data <- normalize_expt(t_monocytes, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 9064 low-count genes (10859 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/t_monocytes-v{ver}.csv"))
cpm_data <- normalize_expt(t_monocytes, convert="cpm", filter=TRUE, batch="svaseq")
## Removing 9064 low-count genes (10859 remaining).
## Setting 730 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/t_monocytes_cpm_sva.xlsx")
## Deleting the file analyses/4_tumaco/CPM/t_monocytes_cpm_sva.xlsx before writing the tables.
rpkm_data <- normalize_expt(t_monocytes, filter=TRUE, batch="svaseq",
column="cds_length") %>%
normalize_expt(convert="rpkm", column="cds_length", na_to_zero=TRUE)
## Removing 9064 low-count genes (10859 remaining).
## Setting 296 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/t_monocytes_sva-v{ver}.csv"))
t_neutrophils <- tc_neutrophils %>%
subset_expt(subset="clinic=='Tumaco'")
## subset_expt(): There were 62, now there are 41 samples.
save(list = "t_neutrophils", file = glue::glue("rda/tmrc3_t_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "t_neutrophils")
cpm_data <- normalize_expt(t_neutrophils, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/t_neutrophils_cpm.xlsx")
## Deleting the file analyses/4_tumaco/CPM/t_neutrophils_cpm.xlsx before writing the tables.
rpkm_data <- normalize_expt(t_neutrophils, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 10824 low-count genes (9099 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/t_neutrophils-v{ver}.csv"))
cpm_data <- normalize_expt(t_neutrophils, convert="cpm", filter=TRUE, batch="svaseq")
## Removing 10824 low-count genes (9099 remaining).
## Setting 750 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/t_neutrophils_cpm_sva.xlsx")
## Deleting the file analyses/4_tumaco/CPM/t_neutrophils_cpm_sva.xlsx before writing the tables.
rpkm_data <- normalize_expt(t_neutrophils, filter=TRUE, batch="svaseq",
column="cds_length") %>%
normalize_expt(convert="rpkm", column="cds_length", na_to_zero=TRUE)
## Removing 10824 low-count genes (9099 remaining).
## Setting 162 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/t_neutrophils_sva-v{ver}.csv"))
tv1_samples <- tcv1_samples %>%
subset_expt(subset="clinic=='Tumaco'")
## subset_expt(): There were 83, now there are 54 samples.
save(list = "tv1_samples", file = glue::glue("rda/tmrc3_tv1_samples-v{ver}.rda"))
data_structures <- c(data_structures, "tv1_samples")
cpm_data <- normalize_expt(tv1_samples, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv1_samples_cpm.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv1_samples_cpm.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv1_samples, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 5907 low-count genes (14016 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv1_samples-v{ver}.csv"))
cpm_data <- normalize_expt(tv1_samples, convert="cpm", filter=TRUE, batch="svaseq")
## Removing 5907 low-count genes (14016 remaining).
## Setting 7615 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv1_samples_cpm_sva.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv1_samples_cpm_sva.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv1_samples, filter=TRUE, batch="svaseq",
column="cds_length") %>%
normalize_expt(convert="rpkm", column="cds_length", na_to_zero=TRUE)
## Removing 5907 low-count genes (14016 remaining).
## Setting 2539 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv1_samples_sva-v{ver}.csv"))
tv2_samples <- tcv2_samples %>%
subset_expt(subset="clinic=='Tumaco'")
## subset_expt(): There were 50, now there are 35 samples.
save(list = "tv2_samples", file = glue::glue("rda/tmrc3_tv2_samples-v{ver}.rda"))
data_structures <- c(data_structures, "tv2_samples")
cpm_data <- normalize_expt(tv2_samples, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv2_samples_cpm.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv2_samples_cpm.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv2_samples, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 8364 low-count genes (11559 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv2_samples-v{ver}.csv"))
cpm_data <- normalize_expt(tv2_samples, convert="cpm", filter=TRUE, batch="svaseq")
## Removing 8364 low-count genes (11559 remaining).
## Setting 2848 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv2_samples_cpm_sva.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv2_samples_cpm_sva.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv2_samples, filter=TRUE, batch="svaseq",
column="cds_length") %>%
normalize_expt(convert="rpkm", column="cds_length", na_to_zero=TRUE)
## Removing 8364 low-count genes (11559 remaining).
## Setting 410 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv2_samples_sva-v{ver}.csv"))
tv3_samples <- tcv3_samples %>%
subset_expt(subset="clinic=='Tumaco'")
## subset_expt(): There were 51, now there are 34 samples.
save(list = "tv3_samples", file = glue::glue("rda/tmrc3_tv3_samples-v{ver}.rda"))
data_structures <- c(data_structures, "tv3_samples")
cpm_data <- normalize_expt(tv3_samples, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv3_samples_cpm.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv3_samples_cpm.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv3_samples, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 8474 low-count genes (11449 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv3_samples-v{ver}.csv"))
cpm_data <- normalize_expt(tv3_samples, convert="cpm", filter=TRUE, batch="svaseq")
## Removing 8474 low-count genes (11449 remaining).
## Setting 1878 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv3_samples_cpm_sva.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv3_samples_cpm_sva.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv3_samples, filter=TRUE, batch="svaseq",
column="cds_length") %>%
normalize_expt(convert="rpkm", column="cds_length", na_to_zero=TRUE)
## Removing 8474 low-count genes (11449 remaining).
## Setting 248 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv3_samples_sva-v{ver}.csv"))
tv1_eosinophils <- tcv1_eosinophils %>%
subset_expt(subset="clinic=='Tumaco'")
## subset_expt(): There were 14, now there are 8 samples.
save(list = "tv1_eosinophils", file = glue::glue("rda/tmrc3_tv1_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "tv1_eosinophils")
cpm_data <- normalize_expt(tv1_eosinophils, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv1_eosinophils_cpm.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv1_eosinophils_cpm.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv1_eosinophils, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 9946 low-count genes (9977 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv1_eosinophils-v{ver}.csv"))
cpm_data <- normalize_expt(tv1_eosinophils, convert="cpm", filter=TRUE, batch="svaseq")
## Removing 9946 low-count genes (9977 remaining).
## Setting 57 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv1_eosinophils_cpm_sva.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv1_eosinophils_cpm_sva.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv1_eosinophils, filter=TRUE, batch="svaseq",
column="cds_length") %>%
normalize_expt(convert="rpkm", column="cds_length", na_to_zero=TRUE)
## Removing 9946 low-count genes (9977 remaining).
## Setting 20 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv1_eosinophils_sva-v{ver}.csv"))
tv2_eosinophils <- tcv2_eosinophils %>%
subset_expt(subset="clinic=='Tumaco'")
## subset_expt(): There were 14, now there are 9 samples.
save(list = "tv2_eosinophils", file = glue::glue("rda/tmrc3_tv2_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "tv2_eosinophils")
cpm_data <- normalize_expt(tv2_eosinophils, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv2_eosinophils_cpm.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv2_eosinophils_cpm.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv2_eosinophils, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 9808 low-count genes (10115 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv2_eosinophils-v{ver}.csv"))
cpm_data <- normalize_expt(tv2_eosinophils, convert="cpm", filter=TRUE, batch="svaseq")
## Removing 9808 low-count genes (10115 remaining).
## Setting 90 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv2_eosinophils_cpm_sva.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv2_eosinophils_cpm_sva.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv2_eosinophils, filter=TRUE, batch="svaseq",
column="cds_length") %>%
normalize_expt(convert="rpkm", column="cds_length", na_to_zero=TRUE)
## Removing 9808 low-count genes (10115 remaining).
## Setting 39 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv2_eosinophils_sva-v{ver}.csv"))
tv3_eosinophils <- tcv3_eosinophils %>%
subset_expt(subset="clinic=='Tumaco'")
## subset_expt(): There were 13, now there are 9 samples.
save(list = "tv3_eosinophils", file = glue::glue("rda/tmrc3_tv3_eosinophils-v{ver}.rda"))
data_structures <- c(data_structures, "tv3_eosinophils")
cpm_data <- normalize_expt(tv3_eosinophils, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv3_eosinophils_cpm.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv3_eosinophils_cpm.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv3_eosinophils, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 9845 low-count genes (10078 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv3_eosinophils-v{ver}.csv"))
cpm_data <- normalize_expt(tv3_eosinophils, convert="cpm", filter=TRUE, batch="svaseq")
## Removing 9845 low-count genes (10078 remaining).
## Setting 48 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv3_eosinophils_cpm_sva.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv3_eosinophils_cpm_sva.xlsx before writing the tables.
## Note, the cbcb filter left behind a sufficient number of zeros that it confused cpm.
rpkm_data <- normalize_expt(tv3_eosinophils, filter="simple", batch="svaseq") %>%
normalize_expt(convert="rpkm", column="cds_length", na_to_zero=TRUE)
## Removing 3335 low-count genes (16588 remaining).
## Setting 5428 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv3_eosinophils_sva-v{ver}.csv"))
tv1_monocytes <- tcv1_monocytes %>%
subset_expt(subset="clinic=='Tumaco'")
## subset_expt(): There were 26, now there are 16 samples.
save(list = "tv1_monocytes", file = glue::glue("rda/tmrc3_tv1_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "tv1_monocytes")
cpm_data <- normalize_expt(tv1_monocytes, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv1_monocytes_cpm.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv1_monocytes_cpm.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv1_monocytes, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 9444 low-count genes (10479 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv1_monocytes-v{ver}.csv"))
cpm_data <- normalize_expt(tv1_monocytes, convert="cpm", filter=TRUE, batch="svaseq")
## Removing 9444 low-count genes (10479 remaining).
## Setting 187 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv1_monocytes_cpm_sva.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv1_monocytes_cpm_sva.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv1_monocytes, filter=TRUE, batch="svaseq",
column="cds_length") %>%
normalize_expt(convert="rpkm", column="cds_length", na_to_zero=TRUE)
## Removing 9444 low-count genes (10479 remaining).
## Setting 80 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv1_monocytes_sva-v{ver}.csv"))
tv2_monocytes <- tcv2_monocytes %>%
subset_expt(subset="clinic=='Tumaco'")
## subset_expt(): There were 18, now there are 13 samples.
save(list = "tv2_monocytes", file = glue::glue("rda/tmrc3_tv2_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "tv2_monocytes")
cpm_data <- normalize_expt(tv2_monocytes, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv2_monocytes_cpm.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv2_monocytes_cpm.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv2_monocytes, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 9403 low-count genes (10520 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv2_monocytes-v{ver}.csv"))
cpm_data <- normalize_expt(tv2_monocytes, convert="cpm", filter=TRUE, batch="svaseq")
## Removing 9403 low-count genes (10520 remaining).
## Setting 115 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv2_monocytes_cpm_sva.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv2_monocytes_cpm_sva.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv2_monocytes, filter=TRUE, batch="svaseq",
column="cds_length") %>%
normalize_expt(convert="rpkm", column="cds_length", na_to_zero=TRUE)
## Removing 9403 low-count genes (10520 remaining).
## Setting 39 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv2_monocytes_sva-v{ver}.csv"))
tv3_monocytes <- tcv3_monocytes %>%
subset_expt(subset="clinic=='Tumaco'")
## subset_expt(): There were 19, now there are 13 samples.
save(list = "tv3_monocytes", file = glue::glue("rda/tmrc3_tv3_monocytes-v{ver}.rda"))
data_structures <- c(data_structures, "tv3_monocytes")
cpm_data <- normalize_expt(tv3_monocytes, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv3_monocytes_cpm.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv3_monocytes_cpm.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv3_monocytes, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 9549 low-count genes (10374 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv3_monocytes-v{ver}.csv"))
cpm_data <- normalize_expt(tv3_monocytes, convert="cpm", filter=TRUE, batch="svaseq")
## Removing 9549 low-count genes (10374 remaining).
## Setting 55 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv3_monocytes_cpm_sva.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv3_monocytes_cpm_sva.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv3_monocytes, filter=TRUE, batch="svaseq",
column="cds_length") %>%
normalize_expt(convert="rpkm", column="cds_length", na_to_zero=TRUE)
## Removing 9549 low-count genes (10374 remaining).
## Setting 28 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv3_monocytes_sva-v{ver}.csv"))
tv1_neutrophils <- tcv1_neutrophils %>%
subset_expt(subset="clinic=='Tumaco'")
## subset_expt(): There were 25, now there are 16 samples.
save(list = "tv1_neutrophils", file = glue::glue("rda/tmrc3_tv1_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "tv1_neutrophils")
cpm_data <- normalize_expt(tv1_neutrophils, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv1_neutrophils_cpm.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv1_neutrophils_cpm.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv1_neutrophils, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 11208 low-count genes (8715 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv1_neutrophils-v{ver}.csv"))
cpm_data <- normalize_expt(tv1_neutrophils, convert="cpm", filter=TRUE, batch="svaseq")
## Removing 11208 low-count genes (8715 remaining).
## Setting 145 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv1_neutrophils_cpm_sva.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv1_neutrophils_cpm_sva.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv1_neutrophils, filter=TRUE, batch="svaseq",
column="cds_length") %>%
normalize_expt(convert="rpkm", column="cds_length", na_to_zero=TRUE)
## Removing 11208 low-count genes (8715 remaining).
## Setting 41 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv1_neutrophils_sva-v{ver}.csv"))
tv2_neutrophils <- tcv2_neutrophils %>%
subset_expt(subset="clinic=='Tumaco'")
## subset_expt(): There were 18, now there are 13 samples.
save(list = "tv2_neutrophils", file = glue::glue("rda/tmrc3_tv2_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "tv2_neutrophils")
cpm_data <- normalize_expt(tv2_neutrophils, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv2_neutrophils_cpm.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv2_neutrophils_cpm.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv2_neutrophils, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 11473 low-count genes (8450 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv2_neutrophils-v{ver}.csv"))
cpm_data <- normalize_expt(tv2_neutrophils, convert="cpm", filter=TRUE, batch="svaseq")
## Removing 11473 low-count genes (8450 remaining).
## Setting 78 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv2_neutrophils_cpm_sva.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv2_neutrophils_cpm_sva.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv2_neutrophils, filter=TRUE, batch="svaseq",
column="cds_length") %>%
normalize_expt(convert="rpkm", column="cds_length", na_to_zero=TRUE)
## Removing 11473 low-count genes (8450 remaining).
## Setting 41 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv2_neutrophils_sva-v{ver}.csv"))
tv3_neutrophils <- tcv3_neutrophils %>%
subset_expt(subset="clinic=='Tumaco'")
## subset_expt(): There were 19, now there are 12 samples.
save(list = "tv3_neutrophils", file = glue::glue("rda/tmrc3_tv3_neutrophils-v{ver}.rda"))
data_structures <- c(data_structures, "tv3_neutrophils")
cpm_data <- normalize_expt(tv3_neutrophils, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv3_neutrophils_cpm.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv3_neutrophils_cpm.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv3_neutrophils, filter=TRUE, convert="rpkm",
na_to_zero=TRUE, column="cds_length")
## Removing 11420 low-count genes (8503 remaining).
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv3_neutrophils-v{ver}.csv"))
cpm_data <- normalize_expt(tv3_neutrophils, convert="cpm", filter=TRUE, batch="svaseq")
## Removing 11420 low-count genes (8503 remaining).
## Setting 83 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/tv3_neutrophils_cpm_sva.xlsx")
## Deleting the file analyses/4_tumaco/CPM/tv3_neutrophils_cpm_sva.xlsx before writing the tables.
rpkm_data <- normalize_expt(tv3_neutrophils, filter=TRUE, batch="svaseq",
column="cds_length") %>%
normalize_expt(convert="rpkm", column="cds_length", na_to_zero=TRUE)
## Removing 11420 low-count genes (8503 remaining).
## Setting 18 low elements to zero.
write.csv(x = as.data.frame(exprs(rpkm_data)),
file = glue::glue("data/rpkm/tv3_neutrophils_sva-v{ver}.csv"))
Is this redundant?
t_visit <- set_expt_conditions(t_clinical, fact = "visitnumber") %>%
set_expt_batches(fact = "finaloutcome") %>%
subset_expt(subset="typeofcells!='biopsy'")
## subset_expt(): There were 123, now there are 109 samples.
save(list = "t_visit", file = glue::glue("rda/tmrc3_t_visit-v{ver}.rda"))
data_structures <- c(data_structures, "t_visit")
cpm_data <- normalize_expt(t_visit, convert="cpm")
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/t_visit_cpm.xlsx")
## Deleting the file analyses/4_tumaco/CPM/t_visit_cpm.xlsx before writing the tables.
cpm_data <- normalize_expt(t_visit, convert="cpm", filter=TRUE, batch="svaseq")
## Removing 8016 low-count genes (11907 remaining).
## Setting 9614 low elements to zero.
written <- write_xlsx(
exprs(cpm_data),
excel="analyses/4_tumaco/CPM/t_visit_cpm_sva.xlsx")
## Deleting the file analyses/4_tumaco/CPM/t_visit_cpm_sva.xlsx before writing the tables.
visit_cf_expt_factor <- paste0("v", pData(t_clinical)[["visitnumber"]],
pData(t_clinical)[["finaloutcome"]])
t_visitcf <- set_expt_conditions(t_clinical, fact = visit_cf_expt_factor)
save(list = "t_visitcf", file = glue::glue("rda/tmrc3_t_visitcf-v{ver}.rda"))
data_structures <- c(data_structures, "t_visitcf")
t_visitcf_eosinophil <- subset_expt(t_visitcf, subset="typeofcells=='eosinophils'")
## subset_expt(): There were 123, now there are 26 samples.
save(list = "t_visitcf_eosinophil", file = glue::glue("rda/tmrc3_t_visitcf_eosinophil-v{ver}.rda"))
data_structures <- c(data_structures, "t_visitcf_eosinophil")
t_visitcf_monocyte <- subset_expt(t_visitcf, subset="typeofcells=='monocytes'")
## subset_expt(): There were 123, now there are 42 samples.
save(list = "t_visitcf_monocyte", file = glue::glue("rda/tmrc3_t_visitcf_monocyte-v{ver}.rda"))
data_structures <- c(data_structures, "t_visitcf_monocyte")
t_visitcf_neutrophil <- subset_expt(t_visitcf, subset="typeofcells=='neutrophils'")
## subset_expt(): There were 123, now there are 41 samples.
save(list = "t_visitcf_neutrophil", file = glue::glue("rda/tmrc3_t_visitcf_neutrophil-v{ver}.rda"))
data_structures <- c(data_structures, "t_visitcf_neutrophil")
t_persistence <- subset_expt(t_clinical, subset = "persistence=='Y'|persistence=='N'") %>%
subset_expt(subset = 'visitnumber==3') %>%
set_expt_conditions(fact = 'persistence')
## subset_expt(): There were 123, now there are 97 samples.
## subset_expt(): There were 97, now there are 30 samples.
save(list = "t_persistence", file = glue::glue("rda/tmrc3_t_persistence-v{ver}.rda"))
data_structures <- c(data_structures, "t_persistence")
t_persistence_monocyte <- subset_expt(t_persistence, subset = "typeofcells=='monocytes'")
## subset_expt(): There were 30, now there are 12 samples.
save(list = "t_persistence_monocyte", file = glue::glue("rda/tmrc3_t_persistence_monocyte-v{ver}.rda"))
data_structures <- c(data_structures, "t_persistence_monocyte")
t_persistence_neutrophil <- subset_expt(t_persistence, subset = "typeofcells=='neutrophils'")
## subset_expt(): There were 30, now there are 10 samples.
save(list = "t_persistence_neutrophil", file = glue::glue("rda/tmrc3_t_persistence_neutrophil-v{ver}.rda"))
data_structures <- c(data_structures, "t_persistence_neutrophil")
t_persistence_eosinophil <- subset_expt(t_persistence, subset = "typeofcells=='eosinophils'")
## subset_expt(): There were 30, now there are 8 samples.
save(list = "t_persistence_eosinophil", file = glue::glue("rda/tmrc3_t_persistence_eosinophil-v{ver}.rda"))
data_structures <- c(data_structures, "t_persistence_eosinophil")
save(list = data_structures, file = glue::glue("rda/tmrc3_data_structures-v{ver}.rda"))
Here is an outline of the samples in their current state:
ncol(exprs(t_clinical))
## [1] 123
sum(pData(t_clinical)[["finaloutcome"]] == "cure")
## [1] 67
sum(pData(t_clinical)[["finaloutcome"]] == "failure")
## [1] 56
all_types[["biopsy"]]
## [1] 18
sum(pData(t_biopsies)[["finaloutcome"]] == "cure")
## [1] 9
sum(pData(t_biopsies)[["finaloutcome"]] == "failure")
## [1] 5
all_types[["eosinophils"]]
## [1] 41
sum(pData(t_eosinophils)[["finaloutcome"]] == "cure")
## [1] 17
sum(pData(t_eosinophils)[["finaloutcome"]] == "failure")
## [1] 9
nrow(pData(tv1_eosinophils))
## [1] 8
sum(pData(tv1_eosinophils)[["finaloutcome"]] == "cure")
## [1] 5
sum(pData(tv1_eosinophils)[["finaloutcome"]] == "failure")
## [1] 3
nrow(pData(tv2_eosinophils))
## [1] 9
sum(pData(tv2_eosinophils)[["finaloutcome"]] == "cure")
## [1] 6
sum(pData(tv2_eosinophils)[["finaloutcome"]] == "failure")
## [1] 3
nrow(pData(tv3_eosinophils))
## [1] 9
sum(pData(tv3_eosinophils)[["finaloutcome"]] == "cure")
## [1] 6
sum(pData(tv3_eosinophils)[["finaloutcome"]] == "failure")
## [1] 3
all_types[["monocytes"]]
## [1] 63
sum(pData(t_monocytes)[["finaloutcome"]] == "cure")
## [1] 21
sum(pData(t_monocytes)[["finaloutcome"]] == "failure")
## [1] 21
nrow(pData(tv1_monocytes))
## [1] 16
sum(pData(tv1_monocytes)[["finaloutcome"]] == "cure")
## [1] 8
sum(pData(tv1_monocytes)[["finaloutcome"]] == "failure")
## [1] 8
nrow(pData(tv2_monocytes))
## [1] 13
sum(pData(tv2_monocytes)[["finaloutcome"]] == "cure")
## [1] 7
sum(pData(tv2_monocytes)[["finaloutcome"]] == "failure")
## [1] 6
nrow(pData(tv3_monocytes))
## [1] 13
sum(pData(tv3_monocytes)[["finaloutcome"]] == "cure")
## [1] 6
sum(pData(tv3_monocytes)[["finaloutcome"]] == "failure")
## [1] 7
all_types[["neutrophils"]]
## [1] 62
sum(pData(t_neutrophils)[["finaloutcome"]] == "cure")
## [1] 20
sum(pData(t_neutrophils)[["finaloutcome"]] == "failure")
## [1] 21
nrow(pData(tv1_neutrophils))
## [1] 16
sum(pData(tv1_neutrophils)[["finaloutcome"]] == "cure")
## [1] 8
sum(pData(tv1_neutrophils)[["finaloutcome"]] == "failure")
## [1] 8
nrow(pData(tv2_monocytes))
## [1] 13
sum(pData(tv2_neutrophils)[["finaloutcome"]] == "cure")
## [1] 7
sum(pData(tv2_neutrophils)[["finaloutcome"]] == "failure")
## [1] 6
nrow(pData(tv3_neutrophils))
## [1] 12
sum(pData(tv3_neutrophils)[["finaloutcome"]] == "cure")
## [1] 5
sum(pData(tv3_neutrophils)[["finaloutcome"]] == "failure")
## [1] 7
if (!isTRUE(get0("skip_load"))) {
pander::pander(sessionInfo())
message(paste0("This is hpgltools commit: ", get_git_commit()))
message(paste0("Saving to ", savefile))
tmp <- sm(saveme(filename=savefile))
}
Make an expressionset of the parasite reads in the TMRC3 samples and distinguish between the faux and real reads. E.g: Are there samples which definitively contain parasites?
Later, I manually went through the mappings of samples with a significant number of parasite reads in IGV with some TMRC2 known zymodeme samples. In many cases it is possible to state definitively the classification of the parasite which infected an individual.
lp_expt <- sm(create_expt(
samplesheet,
file_column="lpanamensisv36hisatfile", gene_info = NULL)) %>%
subset_expt(coverage=1000) %>%
set_expt_conditions(fact="typeofcells")
## The samples removed (and read coverage) when filtering samples with less than 1000 reads are:
## TMRC30001 TMRC30002 TMRC30003 TMRC30004 TMRC30005 TMRC30006 TMRC30007 TMRC30008
## 12 8 9 16 25 29 3 16
## TMRC30009 TMRC30010 TMRC30011 TMRC30012 TMRC30013 TMRC30185 TMRC30186 TMRC30178
## 16 0 5 9 13 0 852 420
## TMRC30179 TMRC30221 TMRC30222 TMRC30223 TMRC30224 TMRC30269 TMRC30148 TMRC30150
## 515 0 0 0 2 58 706 470
## TMRC30140 TMRC30138 TMRC30151 TMRC30234 TMRC30235 TMRC30270 TMRC30225 TMRC30226
## 153 156 480 0 0 21 0 0
## TMRC30227 TMRC30228 TMRC30229 TMRC30230 TMRC30231 TMRC30232 TMRC30233 TMRC30018
## 0 0 0 0 0 1 0 110
## TMRC30209 TMRC30210 TMRC30211 TMRC30212 TMRC30213 TMRC30216 TMRC30214 TMRC30215
## 0 0 0 0 0 0 0 1
## TMRC30271 TMRC30273 TMRC30275 TMRC30272 TMRC30274 TMRC30276 TMRC30254 TMRC30277
## 4 2 2 0 0 1 208 6
## TMRC30239 TMRC30240 TMRC30278 TMRC30279 TMRC30280 TMRC30257 TMRC30281 TMRC30283
## 209 314 0 0 3 427 548 18
## TMRC30284 TMRC30282 TMRC30050 TMRC30285 TMRC30164 TMRC30119 TMRC30025 TMRC30032
## 4 1 345 0 424 419 954 22
## TMRC30028 TMRC30118 TMRC30014 TMRC30196 TMRC30030 TMRC30021 TMRC30023 TMRC30037
## 93 747 4 687 3 88 120 9
## TMRC30031 TMRC30165 TMRC30027 TMRC30194 TMRC30033 TMRC30038 TMRC30166 TMRC30036
## 8 433 108 198 36 208 496 11
## TMRC30024 TMRC30195 TMRC30034 TMRC30045 TMRC30040 TMRC30035 TMRC30041 TMRC30171
## 49 458 21 334 896 153 264 370
## TMRC30192 TMRC30139 TMRC30042 TMRC30158 TMRC30160 TMRC30189 TMRC30123 TMRC30181
## 384 140 545 284 296 307 304 613
## TMRC30043 TMRC30159 TMRC30129 TMRC30172 TMRC30174 TMRC30137 TMRC30161 TMRC30142
## 571 405 353 383 644 389 300 4
## TMRC30190 TMRC30145 TMRC30143 TMRC30197 TMRC30146 TMRC30182 TMRC30198 TMRC30266
## 537 0 1 202 0 468 158 822
## TMRC30201 TMRC30200 TMRC30203 TMRC30202 TMRC30205 TMRC30152 TMRC30155 TMRC30154
## 149 96 104 145 169 495 508 398
## TMRC30206 TMRC30207 TMRC30238 TMRC30217 TMRC30208 TMRC30219 TMRC30218 TMRC30260
## 125 0 335 0 1 0 2 291
## TMRC30220 TMRC30262 TMRC30261 TMRC30173 TMRC30264 TMRC30263 TMRC30144 TMRC30147
## 0 905 500 898 309 431 0 1
## TMRC30265
## 596
## subset_expt(): There were 244, now there are 99 samples.
visit_fact <- pData(lp_expt)[["visitnumber"]]
batch_na <- is.na(visit_fact)
visit_fact[batch_na] <- "undefined"
lp_expt <- set_expt_batches(lp_expt, fact = visit_fact)
save(list = "lp_expt", file = glue::glue("rda/tmrc3_lp_expt_all_sanitized-v{ver}.rda"))
data_structures <- c(data_structures, "lp_expt")
The above block is similar in concept to the previous expressionset creation. It uses a different column and currently ignores the gene annotations. Given that many of the samples have essentially 0 reads, I set a cutoff of only 20 observed genes. Finally, I did a quick and dirty PCA plot of this peculiar data structure in the hopes of being able to ‘see’ the difference between what are assumed to be ‘real’ samples with a significant number of ‘real’ parasite reads vs. those samples which just have a couple of potentially spurious reads.
tmp <- loadme(filename=savefile)