index.html preprocessing.html annotation.html sample_estimation.html
In ‘sample_estimation’, we did a series of analyses to try to pick out some of the surrogate variables in the data. Now we will perform a set of differential expression analyses using the results from that.
In the following, I will perform a set of differential expression analyses for transcript samples, once using the miRNA alignments, and once with the transcript alignments. Depending on what happens with them, I will repeat after separating the data between exosomes/cells.
I am also going to need to consider different methodologies for the DE analyses, but since the first round takes a while, I just want to see what they look like.
extra <- "exo_vs_cell=((exo_polya_mut-cell_polya_mut)-(exo_polya_wt-cell_polya_wt))"
mmtx_large <- normalize_expt(mmtx_large, filter=TRUE)## This function will replace the expt$expressionset slot with:
## cbcb(data)
## It backs up the current data into a slot named:
## expt$backup_expressionset. It will also save copies of each step along the way
## in expt$normalized with the corresponding libsizes. Keep the libsizes in mind
## when invoking limma. The appropriate libsize is the non-log(cpm(normalized)).
## This is most likely kept at:
## 'new_expt$normalized$intermediate_counts$normalization$libsizes'
## A copy of this may also be found at:
## new_expt$best_libsize
## Leaving the data in its current base format, keep in mind that
## some metrics are easier to see when the data is log2 transformed, but
## EdgeR/DESeq do not accept transformed data.
## Leaving the data unconverted. It is often advisable to cpm/rpkm
## the data to normalize for sampling differences, keep in mind though that rpkm
## has some annoying biases, and voom() by default does a cpm (though hpgl_voom()
## will try to detect this).
## Leaving the data unnormalized. This is necessary for DESeq, but
## EdgeR/limma might benefit from normalization. Good choices include quantile,
## size-factor, tmm, etc.
## Not correcting the count-data for batch effects. If batch is
## included in EdgerR/limma's model, then this is probably wise; but in extreme
## batch effects this is a good parameter to play with.
## Step 1: performing count filter with option: cbcb
## Removing 47765 low-count genes (40433 remaining).
## Step 2: not normalizing the data.
## Step 3: not converting the data.
## Step 4: not transforming the data.
## Step 5: not doing batch correction.
mmtx_norm <- normalize_expt(mmtx_large, convert="cpm")## This function will replace the expt$expressionset slot with:
## cpm(data)
## It backs up the current data into a slot named:
## expt$backup_expressionset. It will also save copies of each step along the way
## in expt$normalized with the corresponding libsizes. Keep the libsizes in mind
## when invoking limma. The appropriate libsize is the non-log(cpm(normalized)).
## This is most likely kept at:
## 'new_expt$normalized$intermediate_counts$normalization$libsizes'
## A copy of this may also be found at:
## new_expt$best_libsize
## Filter is false, this should likely be set to something, good
## choices include cbcb, kofa, pofa (anything but FALSE). If you want this to
## stay FALSE, keep in mind that if other normalizations are performed, then the
## resulting libsizes are likely to be strange (potentially negative!)
## Leaving the data in its current base format, keep in mind that
## some metrics are easier to see when the data is log2 transformed, but
## EdgeR/DESeq do not accept transformed data.
## Leaving the data unnormalized. This is necessary for DESeq, but
## EdgeR/limma might benefit from normalization. Good choices include quantile,
## size-factor, tmm, etc.
## Not correcting the count-data for batch effects. If batch is
## included in EdgerR/limma's model, then this is probably wise; but in extreme
## batch effects this is a good parameter to play with.
## Step 1: not doing count filtering.
## Step 2: not normalizing the data.
## Step 3: converting the data with cpm.
## Step 4: not transforming the data.
## Step 5: not doing batch correction.
extra_data <- Biobase::exprs(mmtx_norm$expressionset)
mmtx_medians <- median_by_factor(extra_data, mmtx_norm$condition)## The factor cell_polya_mut has 3 rows.
## The factor cell_polya_wt has 2 rows.
## The factor exo_polya_mut has 4 rows.
## The factor exo_polya_wt has 2 rows.
extra_data <- merge(extra_data, mmtx_medians, by="row.names")
rownames(extra_data) <- extra_data$Row.names
extra_data <- extra_data[, -1]
tx_comparisons_svamodel <- all_pairwise(mmtx_large, model_batch="sva",
extra_contrasts=extra, parallel=FALSE,
edger_method="long", test_type="lrt")## The be method chose 2 surrogate variable(s).
## Estimate type 'sva' is shorthand for 'sva_unsupervised'.
## Other sva options include: sva_supervised and svaseq.
## Attempting sva unsupervised surrogate estimation.
## Starting limma pairwise comparison.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Including batch estimates from sva/ruv/pca in the model.
## Choosing the intercept containing model.
## Limma step 1/6: choosing model.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Limma step 3/6: running lmFit.
## Limma step 4/6: making and fitting contrasts.
## Limma step 5/6: Running eBayes and topTable.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/11: Creating table: cell_polya_mut.
## Limma step 6/6: 2/11: Creating table: cell_polya_wt.
## Limma step 6/6: 3/11: Creating table: exo_polya_mut.
## Limma step 6/6: 4/11: Creating table: exo_polya_wt.
## Limma step 6/6: 5/11: Creating table: cell_polya_wt_vs_cell_polya_mut.
## Limma step 6/6: 6/11: Creating table: exo_polya_mut_vs_cell_polya_mut.
## Limma step 6/6: 7/11: Creating table: exo_polya_wt_vs_cell_polya_mut.
## Limma step 6/6: 8/11: Creating table: exo_polya_mut_vs_cell_polya_wt.
## Limma step 6/6: 9/11: Creating table: exo_polya_wt_vs_cell_polya_wt.
## Limma step 6/6: 10/11: Creating table: exo_polya_wt_vs_exo_polya_mut.
## Limma step 6/6: 11/11: Creating table: exo_vs_cell.
## Starting DESeq2 pairwise comparisons.
## The data should be suitable for EdgeR/DESeq.
## If EdgeR/DESeq freaks out, check the state of the count table and ensure that it is in integer counts.
## Including batch estimates from sva/ruv/pca in the model.
## DESeq2 step 1/5: Including a matrix of batch estimates from sva/ruv/pca in the deseq model.
## converting counts to integer mode
## DESeq2 step 2/5: Estimate size factors.
## DESeq2 step 3/5: Estimate dispersions.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## DESeq2 step 4/5: nbinomWaldTest.
## DESeq2 step 5/5: 1/6: Creating table: cell_polya_wt_vs_cell_polya_mut
## DESeq2 step 5/5: 2/6: Creating table: exo_polya_mut_vs_cell_polya_mut
## DESeq2 step 5/5: 3/6: Creating table: exo_polya_wt_vs_cell_polya_mut
## DESeq2 step 5/5: 4/6: Creating table: exo_polya_mut_vs_cell_polya_wt
## DESeq2 step 5/5: 5/6: Creating table: exo_polya_wt_vs_cell_polya_wt
## DESeq2 step 5/5: 6/6: Creating table: exo_polya_wt_vs_exo_polya_mut
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq.
## If EdgeR/DESeq freaks out, check the state of the count table and ensure that it is in integer counts.
## Including batch estimates from sva/ruv/pca in the model.
## Choosing the intercept containing model.
## EdgeR step 1/9: normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'test_type'.
## EdgeR step 8/9: Making pairwise contrasts.
## EdgeR step 9/9: 1/7: Creating table: cell_polya_wt_vs_cell_polya_mut.
## EdgeR step 9/9: 2/7: Creating table: exo_polya_mut_vs_cell_polya_mut.
## EdgeR step 9/9: 3/7: Creating table: exo_polya_wt_vs_cell_polya_mut.
## EdgeR step 9/9: 4/7: Creating table: exo_polya_mut_vs_cell_polya_wt.
## EdgeR step 9/9: 5/7: Creating table: exo_polya_wt_vs_cell_polya_wt.
## EdgeR step 9/9: 6/7: Creating table: exo_polya_wt_vs_exo_polya_mut.
## EdgeR step 9/9: 7/7: Creating table: exo_vs_cell.
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating median and variance tables.
## Basic step 2/3: Performing comparisons.
## Basic step 2/3: 1/6: Performing log2 subtraction: cell_polya_wt_vs_cell_polya_mut.
## Basic step 2/3: 2/6: Performing log2 subtraction: exo_polya_mut_vs_cell_polya_mut.
## Basic step 2/3: 3/6: Performing log2 subtraction: exo_polya_wt_vs_cell_polya_mut.
## Basic step 2/3: 4/6: Performing log2 subtraction: exo_polya_mut_vs_cell_polya_wt.
## Basic step 2/3: 5/6: Performing log2 subtraction: exo_polya_wt_vs_cell_polya_wt.
## Basic step 2/3: 6/6: Performing log2 subtraction: exo_polya_wt_vs_exo_polya_mut.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## Comparing analyses 1/6: cell_polya_wt_vs_cell_polya_mut
## Comparing analyses 2/6: exo_polya_mut_vs_cell_polya_mut
## Comparing analyses 3/6: exo_polya_wt_vs_cell_polya_mut
## Comparing analyses 4/6: exo_polya_mut_vs_cell_polya_wt
## Comparing analyses 5/6: exo_polya_wt_vs_cell_polya_wt
## Comparing analyses 6/6: exo_polya_wt_vs_exo_polya_mut
tx_keepers <- list(
"exocell_mut" = c("exo_polya_mut", "cell_polya_mut"),
"exocell_wt" = c("exo_polya_wt", "cell_polya_wt"),
"mutwt_cell" = c("cell_polya_mut", "cell_polya_wt"),
"mutwt_exo" = c("exo_polya_mut", "exo_polya_wt"),
"exo_vs_cell" = c("exo_vs_cell"))
tx_tables_svamodel <- combine_de_tables(tx_comparisons_svamodel, extra_annot=extra_data,
keepers=tx_keepers,
excel=paste0("excel/all_tx_svamodel_comparisons-v", ver, ".xlsx"))## Deleting the file excel/all_tx_svamodel_comparisons-v17.xlsx before writing the tables.
## Writing a legend of columns.
## Working on 1/5: exocell_mut
## Found table with exo_polya_mut_vs_cell_polya_mut
## Working on 2/5: exocell_wt
## Found table with exo_polya_wt_vs_cell_polya_wt
## Working on 3/5: mutwt_cell
## Found inverse table with cell_polya_wt_vs_cell_polya_mut
## Working on 4/5: mutwt_exo
## Found inverse table with exo_polya_wt_vs_exo_polya_mut
## Working on 5/5: exo_vs_cell
## Found table with exo_vs_cell
## Adding venn plots for exocell_mut.
## Limma expression coefficients for exocell_mut; R^2: 0.0305; equation: y = -0.305x - 0.408
## Warning: Removed 30257 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_vline).
## Warning: Removed 1 rows containing missing values (geom_hline).
## Warning: Removed 1 rows containing missing values (geom_hline).
## Warning: Removed 30257 rows containing missing values (geom_point).
## Warning: Removed 2005 rows containing missing values (geom_point).
## Warning: Removed 5207 rows containing missing values (geom_point).
## Warning: Removed 30257 rows containing missing values (geom_point).
## Edger expression coefficients for exocell_mut; R^2: 0.0429; equation: y = -0.305x + 18.2
## Warning: Computation failed in `stat_smooth()`:
## 'Calloc' could not allocate memory (18446744071867199488 of 8 bytes)
## DESeq2 expression coefficients for exocell_mut; R^2: 0.0723; equation: y = -0.402x + 6.28
## Warning: Removed 6097 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_hline).
## Warning: Removed 6097 rows containing missing values (geom_point).
## Warning: Removed 983 rows containing missing values (geom_point).
## Warning: Removed 115 rows containing missing values (geom_point).
## Warning: Removed 6097 rows containing missing values (geom_point).
## Adding venn plots for exocell_wt.
## Limma expression coefficients for exocell_wt; R^2: 0.0195; equation: y = -0.233x - 0.177
## Warning: Removed 28556 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_vline).
## Warning: Removed 1 rows containing missing values (geom_hline).
## Warning: Removed 1 rows containing missing values (geom_hline).
## Warning: Removed 28556 rows containing missing values (geom_point).
## Warning: Removed 2143 rows containing missing values (geom_point).
## Warning: Removed 4865 rows containing missing values (geom_point).
## Warning: Removed 28556 rows containing missing values (geom_point).
## Edger expression coefficients for exocell_wt; R^2: 0.0234; equation: y = -0.226x + 17.3
## Warning: Computation failed in `stat_smooth()`:
## 'Calloc' could not allocate memory (18446744071867199488 of 8 bytes)
## DESeq2 expression coefficients for exocell_wt; R^2: 0.0466; equation: y = -0.303x + 6.06
## Warning: Removed 531 rows containing non-finite values (stat_smooth).
## Warning: Computation failed in `stat_smooth()`:
## 'Calloc' could not allocate memory (18446744071803228160 of 8 bytes)
## Warning: Removed 531 rows containing missing values (geom_point).
## Warning: Removed 71 rows containing missing values (geom_point).
## Warning: Removed 531 rows containing missing values (geom_point).
## Adding venn plots for mutwt_cell.
## Limma expression coefficients for mutwt_cell; R^2: 0.922; equation: y = 1.07x - 0.143
## Warning: Removed 24108 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_vline).
## Warning: Removed 1 rows containing missing values (geom_hline).
## Warning: Removed 1 rows containing missing values (geom_hline).
## Warning: Removed 1 rows containing missing values (geom_vline).
## Warning: Removed 24108 rows containing missing values (geom_point).
## Warning: Removed 5646 rows containing missing values (geom_point).
## Warning: Removed 3497 rows containing missing values (geom_point).
## Warning: Removed 24108 rows containing missing values (geom_point).
## Edger expression coefficients for mutwt_cell; R^2: 0.904; equation: y = 0.973x + 0.178
## Warning: Computation failed in `stat_smooth()`:
## 'Calloc' could not allocate memory (18446744071867199488 of 8 bytes)
## DESeq2 expression coefficients for mutwt_cell; R^2: 0.882; equation: y = 1.03x - 0.535
## Warning: Removed 6261 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_hline).
## Warning: Removed 6261 rows containing missing values (geom_point).
## Warning: Removed 1842 rows containing missing values (geom_point).
## Warning: Removed 331 rows containing missing values (geom_point).
## Warning: Removed 6261 rows containing missing values (geom_point).
## Adding venn plots for mutwt_exo.
## Limma expression coefficients for mutwt_exo; R^2: 0.804; equation: y = 0.963x - 0.229
## Warning: Removed 14434 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_vline).
## Warning: Removed 1 rows containing missing values (geom_hline).
## Warning: Removed 14434 rows containing missing values (geom_point).
## Warning: Removed 5009 rows containing missing values (geom_point).
## Warning: Removed 3523 rows containing missing values (geom_point).
## Warning: Removed 14434 rows containing missing values (geom_point).
## Edger expression coefficients for mutwt_exo; R^2: 0.855; equation: y = 0.945x + 0.712
## Warning: Computation failed in `stat_smooth()`:
## 'Calloc' could not allocate memory (18446744071867199488 of 8 bytes)
## DESeq2 expression coefficients for mutwt_exo; R^2: 0.831; equation: y = 0.973x + 0.357
## Warning: Removed 218 rows containing non-finite values (stat_smooth).
## Warning: Computation failed in `stat_smooth()`:
## 'Calloc' could not allocate memory (18446744071840866304 of 8 bytes)
## Warning: Removed 218 rows containing missing values (geom_point).
## Warning: Removed 200 rows containing missing values (geom_point).
## Warning: Removed 9 rows containing missing values (geom_point).
## Warning: Removed 218 rows containing missing values (geom_point).
## Adding venn plots for exo_vs_cell.
## Writing summary information.
## The sheet: pairwise_summary is in legend, exocell_mut, exocell_wt, mutwt_cell, mutwt_exo, exo_vs_cell, pairwise_summary.
## Attempting to add the comparison plot to pairwise_summary at row: 23 and column: 1
## Performing save of the workbook.
## Change extract_significant_genes to state exactly what conditions are used.
tx_sig_svamodel <- sm(extract_significant_genes(tx_tables_svamodel, p_type="p",
excel=paste0("excel/all_tx_svamodel_signficant-v", ver, ".xlsx"),
according_to="all"))## Let us see if we can explicitly ask for batch in the model and allow edger/deseq to fail, print the table with a bunch of blanks, and just remove the offending columns.
tx_comparisons_batchmodel <- all_pairwise(mmtx_large, model_batch=TRUE, parallel=FALSE)## Starting limma pairwise comparison.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$best_libsize.
## Choosing the intercept containing model.
## Limma step 1/6: choosing model.
## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.
## Limma step 3/6: running lmFit.
## Limma step 4/6: making and fitting contrasts.
## Limma step 5/6: Running eBayes and topTable.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/10: Creating table: cell_polya_mut.
## Limma step 6/6: 2/10: Creating table: cell_polya_wt.
## Limma step 6/6: 3/10: Creating table: exo_polya_mut.
## Limma step 6/6: 4/10: Creating table: exo_polya_wt.
## Limma step 6/6: 5/10: Creating table: cell_polya_wt_vs_cell_polya_mut.
## Limma step 6/6: 6/10: Creating table: exo_polya_mut_vs_cell_polya_mut.
## Limma step 6/6: 7/10: Creating table: exo_polya_wt_vs_cell_polya_mut.
## Limma step 6/6: 8/10: Creating table: exo_polya_mut_vs_cell_polya_wt.
## Limma step 6/6: 9/10: Creating table: exo_polya_wt_vs_cell_polya_wt.
## Limma step 6/6: 10/10: Creating table: exo_polya_wt_vs_exo_polya_mut.
## Starting DESeq2 pairwise comparisons.
## The data should be suitable for EdgeR/DESeq.
## If EdgeR/DESeq freaks out, check the state of the count table and ensure that it is in integer counts.
## DESeq2 step 1/5: Including batch and condition in the deseq model.
## converting counts to integer mode
## DESeq2 step 2/5: Estimate size factors.
## DESeq2 step 3/5: Estimate dispersions.
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## DESeq2 step 4/5: nbinomWaldTest.
## DESeq2 step 5/5: 1/6: Creating table: cell_polya_wt_vs_cell_polya_mut
## DESeq2 step 5/5: 2/6: Creating table: exo_polya_mut_vs_cell_polya_mut
## DESeq2 step 5/5: 3/6: Creating table: exo_polya_wt_vs_cell_polya_mut
## DESeq2 step 5/5: 4/6: Creating table: exo_polya_mut_vs_cell_polya_wt
## DESeq2 step 5/5: 5/6: Creating table: exo_polya_wt_vs_cell_polya_wt
## DESeq2 step 5/5: 6/6: Creating table: exo_polya_wt_vs_exo_polya_mut
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq.
## If EdgeR/DESeq freaks out, check the state of the count table and ensure that it is in integer counts.
## Choosing the intercept containing model.
## EdgeR step 1/9: normalizing data.
## EdgeR step 2/9: Estimating the common dispersion.
## EdgeR step 3/9: Estimating dispersion across genes.
## EdgeR step 4/9: Estimating GLM Common dispersion.
## EdgeR step 5/9: Estimating GLM Trended dispersion.
## EdgeR step 6/9: Estimating GLM Tagged dispersion.
## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'test_type'.
## EdgeR step 8/9: Making pairwise contrasts.
## EdgeR step 9/9: 1/6: Creating table: cell_polya_wt_vs_cell_polya_mut.
## EdgeR step 9/9: 2/6: Creating table: exo_polya_mut_vs_cell_polya_mut.
## EdgeR step 9/9: 3/6: Creating table: exo_polya_wt_vs_cell_polya_mut.
## EdgeR step 9/9: 4/6: Creating table: exo_polya_mut_vs_cell_polya_wt.
## EdgeR step 9/9: 5/6: Creating table: exo_polya_wt_vs_cell_polya_wt.
## EdgeR step 9/9: 6/6: Creating table: exo_polya_wt_vs_exo_polya_mut.
## Starting basic pairwise comparison.
## Basic step 0/3: Normalizing data.
## Basic step 0/3: Converting data.
## Basic step 0/3: Transforming data.
## Basic step 1/3: Creating median and variance tables.
## Basic step 2/3: Performing comparisons.
## Basic step 2/3: 1/6: Performing log2 subtraction: cell_polya_wt_vs_cell_polya_mut.
## Basic step 2/3: 2/6: Performing log2 subtraction: exo_polya_mut_vs_cell_polya_mut.
## Basic step 2/3: 3/6: Performing log2 subtraction: exo_polya_wt_vs_cell_polya_mut.
## Basic step 2/3: 4/6: Performing log2 subtraction: exo_polya_mut_vs_cell_polya_wt.
## Basic step 2/3: 5/6: Performing log2 subtraction: exo_polya_wt_vs_cell_polya_wt.
## Basic step 2/3: 6/6: Performing log2 subtraction: exo_polya_wt_vs_exo_polya_mut.
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## Comparing analyses 1/6: cell_polya_wt_vs_cell_polya_mut
## Comparing analyses 2/6: exo_polya_mut_vs_cell_polya_mut
## Comparing analyses 3/6: exo_polya_wt_vs_cell_polya_mut
## Comparing analyses 4/6: exo_polya_mut_vs_cell_polya_wt
## Comparing analyses 5/6: exo_polya_wt_vs_cell_polya_wt
## Comparing analyses 6/6: exo_polya_wt_vs_exo_polya_mut
tx_tables_batchmodel <- sm(combine_de_tables(tx_comparisons_batchmodel,
keepers=tx_keepers,
excel="excel/all_tx_batchmodel_comparisons-v", ver, ".xlsx"))tx_sig_batchmodel <- sm(extract_significant_genes(tx_tables_batchmodel, p_type="p",
excel=paste0("excel/all_tx_batchmodel_signficant-v", ver, ".xlsx"),
according_to="all"))tmp <- sm(saveme(filename="differential_expression.rda.xz"))index.html preprocessing.html annotation.html sample_estimation.html