RNA sequencing of Saccharomyces cerevisiae wt/mutant cbf5.

This document performs the differential expression analyses of wt/mut CBF5 yeast strains. Depending on my mood, it may also include some comparisons against an external data set of upf1/2/3 mutant strains which I randomly found.

Perform simplified differential expression

My function all_pairwise() makes use of limma, deseq, edger, and implements a basic differential expression algorithm. Using it we should be able to compare/contrast the wt/mutant as well as different tools.

There is an important caveat for this data: There is exactly 1 batch and 4 replicates. Thus we need to make sure that the statistical models never attempt to include experimental batch (which is the default). Later, we will include the exogeneous data and make them batch against one another.

Do the comparison of only the wt/mutants

our_pairwise <- sm(all_pairwise(our_expt, model_batch=FALSE))
our_pairwise$comparison
## $limma_vs_edger
## $limma_vs_edger$wt_vs_mut
##    cor 
## 0.9803 
## 
## 
## $limma_vs_deseq
## $limma_vs_deseq$wt_vs_mut
##   cor 
## 0.985 
## 
## 
## $limma_vs_basic
## $limma_vs_basic$wt_vs_mut
##    cor 
## 0.9784 
## 
## 
## $edger_vs_deseq
## $edger_vs_deseq$wt_vs_mut
##   cor 
## 0.958 
## 
## 
## $edger_vs_basic
## $edger_vs_basic$wt_vs_mut
##    cor 
## 0.9583 
## 
## 
## $deseq_vs_basic
## $deseq_vs_basic$wt_vs_mut
##    cor 
## 0.9606 
## 
## 
## $limma_vs_edger_scatter
## $limma_vs_edger_scatter$wt_vs_mut

## 
## 
## $limma_vs_deseq_scatter
## $limma_vs_deseq_scatter$wt_vs_mut

## 
## 
## $limma_vs_basic_scatter
## $limma_vs_basic_scatter$wt_vs_mut

## 
## 
## $edger_vs_deseq_scatter
## $edger_vs_deseq_scatter$wt_vs_mut

## 
## 
## $edger_vs_basic_scatter
## $edger_vs_basic_scatter$wt_vs_mut

## 
## 
## $deseq_vs_basic_scatter
## $deseq_vs_basic_scatter$wt_vs_mut

## 
## 
## $comp
##    wt_vs_mut
## le    0.9803
## ld    0.9850
## ed    0.9580
## lb    0.9784
## eb    0.9583
## db    0.9606
## 
## $heat
## NULL
## The different tools agree remarkably well.

Create tables of the data and extract ‘significant’ genes

The following lines take the results from limma/deseq/edger/basic and create 2 excel workbooks: all data sig data

direction <- list(mutvwt = c("mut","wt")) ## Make certain that it compares mutant / wt and not wt / mutant!!
our_tables <- s_p(combine_de_tables(our_pairwise, keepers=direction, excel="excel/our-mut_vs_wt.xlsx"))$result
## Deleting the file excel/our-mut_vs_wt.xlsx before writing the tables.
## Writing a legend of columns.
## Working on 1/1: mutvwt
## Found inverse table with wt_vs_mut
## Running create_combined_table with inverse=TRUE
## The table is: wt_vs_mut
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## Attempting to add a coefficient plot for mutvwt at column 52
## Warning: Removed 878 rows containing non-finite values (stat_smooth).
## Warning: Removed 878 rows containing missing values (geom_point).
## Warning: Removed 168 rows containing missing values (geom_point).
## Warning: Removed 149 rows containing missing values (geom_point).
## Warning: Removed 878 rows containing missing values (geom_point).

## Warning: Removed 137 rows containing non-finite values (stat_smooth).
## Warning: Removed 137 rows containing missing values (geom_point).
## Warning: Removed 6 rows containing missing values (geom_point).
## Warning: Removed 8 rows containing missing values (geom_point).
## Warning: Removed 137 rows containing missing values (geom_point).
## Writing summary information.
## The sheet pairwise_summary already exists, it will get overwritten
## Attempting to add the comparison plot to pairwise_summary at row: 19 and column: 1

## Performing save of the workbook.

our_sig <- sm(extract_significant_genes(our_tables, according_to="all", excel="excel/our-sig_mut_vs_wt.xlsx"))

Do a similar comparison but including the exogeneous data

If I want to try including the exogeneous data, then I will need to make sure batch stays in the model and/or I will need to use batch-adjusted data and remove batch from the model. I should probably try both ways and see how they compare…

Also note that I combined upf1/upf2/upf3 into one condition called ‘nmd’.

exo_expt$state
## $lowfilter
## [1] "raw"
## 
## $normalization
## [1] "raw"
## 
## $conversion
## [1] "raw"
## 
## $batch
## [1] "raw"
## 
## $transform
## [1] "raw"
## Make sure that the exo expt hasn't been screwed up

exo_pairwise <- all_pairwise(exo_expt)
## Starting limma pairwise comparison.
## libsize was not specified, this parameter has profound effects on limma's result.
## Using the libsize from expt$libsize.
## Choosing the intercept containing model.
## Limma step 1/6: choosing model.
## Limma step 2/6: running voom
## The voom input was not cpm, converting now.
## The voom input was not log2, transforming now.
## Limma step 3/6: running lmFit
## Limma step 4/6: making and fitting contrasts.
## Limma step 5/6: Running eBayes and topTable.
## Limma step 6/6: Writing limma outputs.
## Limma step 6/6: 1/6: Printing table: mut.
## Limma step 6/6: 2/6: Printing table: nmd.
## Limma step 6/6: 3/6: Printing table: wt.
## Limma step 6/6: 4/6: Printing table: nmd_vs_mut.
## Limma step 6/6: 5/6: Printing table: wt_vs_mut.
## Limma step 6/6: 6/6: Printing table: wt_vs_nmd.
## Starting DESeq2 pairwise comparisons.
## The data should be suitable for EdgeR/DESeq.
## If EdgeR/DESeq freaks out, check the state of the count table and ensure that it is in integer counts.
## Choosing the intercept containing model.
## DESeq2 step 1/5: Including batch and condition in the deseq model.
## DESeq2 step 2/5: Estimate size factors.
## DESeq2 step 3/5: Estimate dispersions.
## DESeq2 step 4/5: nbinomWaldTest.
## DESeq2 step 5/5: 1/3: Printing table: nmd_vs_mut
## DESeq2 step 5/5: 2/3: Printing table: wt_vs_mut
## DESeq2 step 5/5: 3/3: Printing table: wt_vs_nmd
## Collected coefficients for: mut
## Collected coefficients for: nmd
## Collected coefficients for: wt
## Starting edgeR pairwise comparisons.
## The data should be suitable for EdgeR/DESeq.
## If EdgeR/DESeq freaks out, check the state of the count table and ensure that it is in integer counts.
## Choosing the intercept containing model.
## EdgeR step 1/9: normalizing data.
## EdgeR step 9/9: 1/3: Printing table: nmd_vs_mut.
## EdgeR step 9/9: 2/3: Printing table: wt_vs_mut.
## EdgeR step 9/9: 3/3: Printing table: wt_vs_nmd.
## Starting basic pairwise comparison.
## This basic pairwise function assumes log2, converted, normalized counts, normalizing now.
## Basic step 1/3: Creating median and variance tables.
## Basic step 2/3: Performing comparisons.
## Basic step 2/3: 1/3: Performing log2 subtraction: nmd_vs_mut
## Basic step 2/3: 2/3: Performing log2 subtraction: wt_vs_mut
## Basic step 2/3: 3/3: Performing log2 subtraction: wt_vs_nmd
## Basic step 3/3: Creating faux DE Tables.
## Basic: Returning tables.
## Comparing analyses 1/3: nmd_vs_mut
## Comparing analyses 2/3: wt_vs_mut
## Comparing analyses 3/3: wt_vs_nmd

directions <- list(mutvwt = c("mut","wt"),
                   upfvwt = c("nmd","wt"),
                   upfvmut = c("nmd","mut"))
exo_tables <- combine_de_tables(exo_pairwise, keepers=directions, excel="excel/exo-mut_vs_wt.xlsx")
## Writing a legend of columns.
## Working on 1/3: mutvwt
## Found inverse table with wt_vs_mut
## Running create_combined_table with inverse=TRUE
## The table is: wt_vs_mut
## Working on 2/3: upfvwt
## Found inverse table with wt_vs_nmd
## Running create_combined_table with inverse=TRUE
## The table is: wt_vs_nmd
## Working on 3/3: upfvmut
## Found table with nmd_vs_mut
## Running create_combined_table with inverse=FALSE
## The table is: nmd_vs_mut
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## Attempting to add a coefficient plot for mutvwt at column 52
## Warning: Removed 878 rows containing non-finite values (stat_smooth).
## Warning: Removed 878 rows containing missing values (geom_point).
## Warning: Removed 168 rows containing missing values (geom_point).
## Warning: Removed 146 rows containing missing values (geom_point).
## Warning: Removed 878 rows containing missing values (geom_point).

## Warning: Removed 109 rows containing non-finite values (stat_smooth).
## Warning: Removed 109 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 6 rows containing missing values (geom_point).
## Warning: Removed 109 rows containing missing values (geom_point).
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## Attempting to add a coefficient plot for upfvwt at column 52

## Warning: Removed 847 rows containing non-finite values (stat_smooth).
## Warning: Removed 847 rows containing missing values (geom_point).
## Warning: Removed 165 rows containing missing values (geom_point).
## Warning: Removed 302 rows containing missing values (geom_point).
## Warning: Removed 847 rows containing missing values (geom_point).

## Warning: Removed 108 rows containing non-finite values (stat_smooth).
## Warning: Removed 108 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 17 rows containing missing values (geom_point).
## Warning: Removed 108 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_smooth).
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## Attempting to add a coefficient plot for upfvmut at column 52

## Warning: Removed 860 rows containing non-finite values (stat_smooth).
## Warning: Removed 860 rows containing missing values (geom_point).
## Warning: Removed 253 rows containing missing values (geom_point).
## Warning: Removed 146 rows containing missing values (geom_point).
## Warning: Removed 860 rows containing missing values (geom_point).

## Warning: Removed 107 rows containing non-finite values (stat_smooth).
## Warning: Removed 107 rows containing missing values (geom_point).

## Warning: Removed 107 rows containing missing values (geom_point).
## Writing summary information.
## The sheet pairwise_summary already exists, it will get overwritten
## Attempting to add the comparison plot to pairwise_summary at row: 21 and column: 1

## Performing save of the workbook.

## haha that is awesome, the basic analysis completely fails (which it should!!)
exo_sig <- extract_significant_genes(exo_tables, according_to="all", excel="excel/exo-sig_mut_vs_wt.xlsx")
## Writing excel data sheet 1/6
## Assuming the fold changes are on the log scale and so taking >< 0
## After (adj)p filter, the up genes table has 2215 genes.
## After (adj)p filter, the down genes table has 1249 genes.
## Assuming the fold changes are on the log scale and so taking -1 * fc
## After fold change filter, the up genes table has 460 genes.
## After fold change filter, the down genes table has 363 genes.
## Writing excel data sheet 2/6
## Assuming the fold changes are on the log scale and so taking >< 0
## After (adj)p filter, the up genes table has 1223 genes.
## After (adj)p filter, the down genes table has 71 genes.
## Assuming the fold changes are on the log scale and so taking -1 * fc
## After fold change filter, the up genes table has 806 genes.
## After fold change filter, the down genes table has 25 genes.
## Writing excel data sheet 3/6
## Assuming the fold changes are on the log scale and so taking >< 0
## After (adj)p filter, the up genes table has 1483 genes.
## After (adj)p filter, the down genes table has 866 genes.
## Assuming the fold changes are on the log scale and so taking -1 * fc
## After fold change filter, the up genes table has 1090 genes.
## After fold change filter, the down genes table has 360 genes.
## Printing significant genes to the file: excel/exo-sig_mut_vs_wt.xlsx
## Converted change_counts_up to characters.
## Converted change_counts_down to characters.
## 1/3: Writing excel data sheet up_limma_mutvwt
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 1/3: Writing excel data sheet down_limma_mutvwt
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 2/3: Writing excel data sheet up_limma_upfvwt
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 2/3: Writing excel data sheet down_limma_upfvwt
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 3/3: Writing excel data sheet up_limma_upfvmut
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 3/3: Writing excel data sheet down_limma_upfvmut
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## Writing changed genes summary on last sheet.
## Writing excel data sheet 4/6
## Assuming the fold changes are on the log scale and so taking >< 0
## After (adj)p filter, the up genes table has 1752 genes.
## After (adj)p filter, the down genes table has 1789 genes.
## Assuming the fold changes are on the log scale and so taking -1 * fc
## After fold change filter, the up genes table has 282 genes.
## After fold change filter, the down genes table has 511 genes.
## Writing excel data sheet 5/6
## Assuming the fold changes are on the log scale and so taking >< 0
## After (adj)p filter, the up genes table has 716 genes.
## After (adj)p filter, the down genes table has 928 genes.
## Assuming the fold changes are on the log scale and so taking -1 * fc
## After fold change filter, the up genes table has 481 genes.
## After fold change filter, the down genes table has 82 genes.
## Writing excel data sheet 6/6
## Assuming the fold changes are on the log scale and so taking >< 0
## After (adj)p filter, the up genes table has 1126 genes.
## After (adj)p filter, the down genes table has 1598 genes.
## Assuming the fold changes are on the log scale and so taking -1 * fc
## After fold change filter, the up genes table has 797 genes.
## After fold change filter, the down genes table has 677 genes.
## Printing significant genes to the file: excel/exo-sig_mut_vs_wt.xlsx
## The sheet number_changed_genes already exists, it will get overwritten
## Converted change_counts_up to characters.
## Converted change_counts_down to characters.
## 1/3: Writing excel data sheet up_edger_mutvwt
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 1/3: Writing excel data sheet down_edger_mutvwt
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 2/3: Writing excel data sheet up_edger_upfvwt
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 2/3: Writing excel data sheet down_edger_upfvwt
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 3/3: Writing excel data sheet up_edger_upfvmut
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 3/3: Writing excel data sheet down_edger_upfvmut
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## Writing changed genes summary on last sheet.
## Writing excel data sheet 7/6
## Assuming the fold changes are on the log scale and so taking >< 0
## After (adj)p filter, the up genes table has 1737 genes.
## After (adj)p filter, the down genes table has 1909 genes.
## Assuming the fold changes are on the log scale and so taking -1 * fc
## After fold change filter, the up genes table has 236 genes.
## After fold change filter, the down genes table has 458 genes.
## Writing excel data sheet 8/6
## Assuming the fold changes are on the log scale and so taking >< 0
## After (adj)p filter, the up genes table has 792 genes.
## After (adj)p filter, the down genes table has 1222 genes.
## Assuming the fold changes are on the log scale and so taking -1 * fc
## After fold change filter, the up genes table has 424 genes.
## After fold change filter, the down genes table has 80 genes.
## Writing excel data sheet 9/6
## Assuming the fold changes are on the log scale and so taking >< 0
## After (adj)p filter, the up genes table has 1259 genes.
## After (adj)p filter, the down genes table has 1692 genes.
## Assuming the fold changes are on the log scale and so taking -1 * fc
## After fold change filter, the up genes table has 770 genes.
## After fold change filter, the down genes table has 628 genes.
## Printing significant genes to the file: excel/exo-sig_mut_vs_wt.xlsx
## The sheet number_changed_genes already exists, it will get overwritten
## Converted change_counts_up to characters.
## Converted change_counts_down to characters.
## 1/3: Writing excel data sheet up_deseq_mutvwt
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 1/3: Writing excel data sheet down_deseq_mutvwt
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 2/3: Writing excel data sheet up_deseq_upfvwt
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 2/3: Writing excel data sheet down_deseq_upfvwt
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 3/3: Writing excel data sheet up_deseq_upfvmut
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 3/3: Writing excel data sheet down_deseq_upfvmut
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## Writing changed genes summary on last sheet.
## Writing excel data sheet 10/6
## Assuming the fold changes are on the log scale and so taking >< 0
## After (adj)p filter, the up genes table has 345 genes.
## After (adj)p filter, the down genes table has 154 genes.
## Assuming the fold changes are on the log scale and so taking -1 * fc
## After fold change filter, the up genes table has 118 genes.
## After fold change filter, the down genes table has 70 genes.
## Writing excel data sheet 11/6
## Assuming the fold changes are on the log scale and so taking >< 0
## After (adj)p filter, the up genes table has 1703 genes.
## After (adj)p filter, the down genes table has 1943 genes.
## Assuming the fold changes are on the log scale and so taking -1 * fc
## After fold change filter, the up genes table has 1274 genes.
## After fold change filter, the down genes table has 1306 genes.
## Writing excel data sheet 12/6
## Assuming the fold changes are on the log scale and so taking >< 0
## After (adj)p filter, the up genes table has 2529 genes.
## After (adj)p filter, the down genes table has 2579 genes.
## Assuming the fold changes are on the log scale and so taking -1 * fc
## After fold change filter, the up genes table has 1543 genes.
## After fold change filter, the down genes table has 1517 genes.
## Printing significant genes to the file: excel/exo-sig_mut_vs_wt.xlsx
## The sheet number_changed_genes already exists, it will get overwritten
## Converted change_counts_up to characters.
## Converted change_counts_down to characters.
## 1/3: Writing excel data sheet up_basic_mutvwt
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 1/3: Writing excel data sheet down_basic_mutvwt
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 2/3: Writing excel data sheet up_basic_upfvwt
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 2/3: Writing excel data sheet down_basic_upfvwt
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 3/3: Writing excel data sheet up_basic_upfvmut
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## 3/3: Writing excel data sheet down_basic_upfvmut
## Converted seqnames to characters.
## Converted strand to characters.
## Converted source to characters.
## Converted type to characters.
## Writing changed genes summary on last sheet.
## Try again using batch-adjusted data and no batch in model
exo_data <- sm(normalize_expt(exo_expt, filter=TRUE, batch="sva"))
exo_batch_pairwise <- sm(all_pairwise(exo_expt, model_batch=FALSE, force=TRUE))

exo_batch_tables <- sm(combine_de_tables(exo_batch_pairwise, keepers=directions, excel="excel/exo-batch-mut_vs_wt.xlsx"))

exo_sig <- sm(extract_significant_genes(exo_batch_tables, according_to="all", excel="excel/exo-sig-batch-mut_vs_wt.xlsx"))
## awesome, the basic analysis is once again unable to deal with non-canonical data.  This is a good thing I think.