Shared data for RNAseq of S.cerevisiae: Annotation data using R’s SacCer package and biomart.

index.html annotation.html

Hmm well, setting up to do this was rather more difficult than I expected. I was able to fairly easily collect the set of introns according to ensembl, yeastgenome.org, and the ncbi copy of the yeast intron database. Interestingly, the only multi-intronic genes in yeast are all in the mitochrondrial genome – where (for example) COX1 has 8 introns!

The problem though, lies in the fact that I need to get my new intron information into a gff file in order to count them. A few mis-steps lead me to eventually cross reference the intron data against my existing gff database where I found that the coordinates did not quite match because of version mismatches in the genomes…

So, I decided to just use the existing gff annotation and extract the introns from it, and add them to it. The genometools software package has some functionality to do that automatically. However, that in turns requires a very perfectly formatted gff3 file… The yeast genome gff we have is not, so I wrote a perl script to reformat it as a canonical gff3. I then did

gt gff3 -addintrons testme.gff

This gave me a new testme.gff file with 427 new lines which looked like:

I       .       intron  87388   87500   .       +       .       .

Well, that is pretty close to what I need, but I cannot count the intron reads without the 9th column of the gff filled out. Nothing I did was able to fill in this final column, so I eventually decided to open the gff in the emacs text editor and write a small macro which takes the annotation from the previous line’s 9th column and copy/paste it to replace the terminal ‘.’ of the current line. It took about 15 minutes for me to hit control-s control-3 and invoke my macro on every intron feature, but the end result was a file with lines which look like:

I       .       intron  87388   87500   .       +       .    ID="SNC1";exon_number="1";gene_id="YAL030W";p_id="P3423";seqedit="false";transcript_id="YAL030W";transcript_name="SNC1";tss_id="TSS6280"

which is exactly what I need. The resulting file is in reference under scerevisiae_introns.gff.

Finally, armed with this new annotation database, I logged into the computer cluster and did a for loop over the samples to run:

cd ${sample}/outputs/bowtie2_scerevisiae/ && htseq-count -q -f bam -s no -i ID -t intron \ ${sample}_forward-trimmed.bam ${wd}/reference/scerevisiae_introns.gff 1>introns.count
2>introns.count.err && xz -9e introns.count

This allows me to add a new column in my sample sheet excel file ‘introns_file’…

intron_counts <- sm(create_expt("sample_sheets/all_samples.xlsx", gene_info=sc_gff_annotations,
                                file_column="intron_file"))
## The same mappings for the genome are in v3_expt

## Let us check that I mapped correctly and can xref these data...
head(Biobase::exprs(intron_counts$expressionset))

##           hpgl0774 hpgl0775 hpgl0776 hpgl0777 hpgl0778 hpgl0779 hpgl0780 hpgl0781
## X21S_rRNA        0        0        0        0        0        0        0        0
## ABP140         271      468      188      301      326      159      179      288
## ACT1           227      369      344      436      269      499      298      341
## AI2              0        0        0        0        0        0        0        0
## AI3              0        0        0        0        0        0        0        0
## AI4              0        0        0        0        0        0        0        0
##           hpgl0782 hpgl0783 hpgl0784 hpgl0785 hpgl0786 hpgl0787 hpgl0788 hpgl0789
## X21S_rRNA        0        0        0        0        0        0        0        0
## ABP140         127       83      131       59      224       53      146       39
## ACT1           337      396      246      376      342      478      384      359
## AI2              0        0        0        0        0        0        0        0
## AI3              0        0        0        0        0        0        0        0
## AI4              0        0        0        0        0        0        0        0

head(Biobase::exprs(v3_expt$expressionset))

##           hpgl0774 hpgl0775 hpgl0776 hpgl0777 hpgl0778 hpgl0779 hpgl0780 hpgl0781
## X15S_rRNA        0        0        0        0        0        0        0        0
## X21S_rRNA        0        0        0        0        0        0        0        0
## AAC1           536      477      743      443      634      188      763      414
## AAC3           126      216       93      765      152      154      102      738
## AAD10         1784     1928     2327     3869     2172      994     2472     3551
## AAD14         1054      901     1222     1863     1106      836     1307     1588
##           hpgl0782 hpgl0783 hpgl0784 hpgl0785 hpgl0786 hpgl0787 hpgl0788 hpgl0789
## X15S_rRNA        0        0        0        0        0        0        0        0
## X21S_rRNA        0        0        0        0        0        0        0        0
## AAC1           175      145      140      237      124      142      141      181
## AAC3           295      119      341      542      210      118      438     1071
## AAD10          365      589     1476     1593      352      542     1782     2082
## AAD14          542      766     1580     1814      439      795     1924     2333

## giggity looks good.
dim(Biobase::exprs(intron_counts$expressionset))

## [1] 396  16

dim(Biobase::exprs(v3_expt$expressionset))

## [1] 7126   16

Ok so now I have 2 datasets, one which is rather larger than the other. As I understand the stated goal from Dr. Mount, I want to ratio the intron/gene counts for each intron containing gene and see if there is a significant difference across conditions…

To do that I am thinking I will copy the existing expressionset to something like ‘intron_ratio’, extract the matrices of counts from the big and little datasets for only the intron containing genes, and finally do the division.

intron_ratio <- intron_counts

intron_mtrx <- as.matrix(Biobase::exprs(intron_counts$expressionset))
all_mtrx <- as.matrix(Biobase::exprs(v3_expt$expressionset))
all_subset <- all_mtrx[rownames(intron_mtrx), ]
intron_ratio_mtrx <- intron_mtrx / all_subset
nans <- is.nan(intron_ratio_mtrx)
intron_ratio_mtrx[nans] <- 0

Biobase::exprs(intron_ratio$expressionset) <- intron_ratio_mtrx

ratio_norm <- normalize_expt(intron_ratio, filter="simple")

## This function will replace the expt$expressionset slot with:

## simple(data)

## It backs up the current data into a slot named:
##  expt$backup_expressionset. It will also save copies of each step along the way
##  in expt$normalized with the corresponding libsizes. Keep the libsizes in mind
##  when invoking limma.  The appropriate libsize is the non-log(cpm(normalized)).
##  This is most likely kept at:
##  'new_expt$normalized$intermediate_counts$normalization$libsizes'
##  A copy of this may also be found at:
##  new_expt$best_libsize

## Leaving the data in its current base format, keep in mind that
##  some metrics are easier to see when the data is log2 transformed, but
##  EdgeR/DESeq do not accept transformed data.

## Leaving the data unconverted.  It is often advisable to cpm/rpkm
##  the data to normalize for sampling differences, keep in mind though that rpkm
##  has some annoying biases, and voom() by default does a cpm (though hpgl_voom()
##  will try to detect this).

## Leaving the data unnormalized.  This is necessary for DESeq, but
##  EdgeR/limma might benefit from normalization.  Good choices include quantile,
##  size-factor, tmm, etc.

## Not correcting the count-data for batch effects.  If batch is
##  included in EdgerR/limma's model, then this is probably wise; but in extreme
##  batch effects this is a good parameter to play with.

## Step 1: performing count filter with option: simple

## Removing 331 low-count genes (65 remaining).

## Step 2: not normalizing the data.

## Step 3: not converting the data.

## Step 4: not transforming the data.

## Step 5: not doing batch correction.

plot_density(ratio_norm)

## This data will benefit from being displayed on the log scale.

## If this is not desired, set scale='raw'

## Some entries are 0.  We are on log scale, setting them to 0.5.

## Changed 236 zero count features.

That seems a dumb way to look at this, lets try something else.

## How about I subset the v3 data to as above
## Then do identical log2(cpm(something)) normalizations
## Then subtract the log scale numbers or something?

## Set1 will be the intron counts, set2 will be the exon counts subsetted for the intron genes.
intron_set1 <- intron_counts
intron_set2 <- intron_counts
## replace the data with the exon counts so that I now have 2 data sets of the same genes.
Biobase::exprs(intron_set2$expressionset) <- all_subset

norm_set1 <- normalize_expt(intron_set1, filter=TRUE, norm="quant", convert="cpm", transform="log2")

## This function will replace the expt$expressionset slot with:

## log2(cpm(quant(cbcb(data))))

## It backs up the current data into a slot named:
##  expt$backup_expressionset. It will also save copies of each step along the way
##  in expt$normalized with the corresponding libsizes. Keep the libsizes in mind
##  when invoking limma.  The appropriate libsize is the non-log(cpm(normalized)).
##  This is most likely kept at:
##  'new_expt$normalized$intermediate_counts$normalization$libsizes'
##  A copy of this may also be found at:
##  new_expt$best_libsize

## Not correcting the count-data for batch effects.  If batch is
##  included in EdgerR/limma's model, then this is probably wise; but in extreme
##  batch effects this is a good parameter to play with.

## Step 1: performing count filter with option: cbcb

## Removing 97 low-count genes (299 remaining).

## Step 2: normalizing the data with quant.

## Step 3: converting the data with cpm.

## Step 4: transforming the data with log2.

## transform_counts: Found 253 values equal to 0, adding 1 to the matrix.

## Step 5: not doing batch correction.

norm_set2 <- normalize_expt(intron_set2, filter=TRUE, norm="quant", convert="cpm", transform="log2")

## This function will replace the expt$expressionset slot with:

## log2(cpm(quant(cbcb(data))))

## It backs up the current data into a slot named:
##  expt$backup_expressionset. It will also save copies of each step along the way
##  in expt$normalized with the corresponding libsizes. Keep the libsizes in mind
##  when invoking limma.  The appropriate libsize is the non-log(cpm(normalized)).
##  This is most likely kept at:
##  'new_expt$normalized$intermediate_counts$normalization$libsizes'
##  A copy of this may also be found at:
##  new_expt$best_libsize

## Not correcting the count-data for batch effects.  If batch is
##  included in EdgerR/limma's model, then this is probably wise; but in extreme
##  batch effects this is a good parameter to play with.

## Step 1: performing count filter with option: cbcb

## Removing 85 low-count genes (311 remaining).

## Step 2: normalizing the data with quant.

## Step 3: converting the data with cpm.

## Step 4: transforming the data with log2.

## transform_counts: Found 2 values equal to 0, adding 1 to the matrix.

## Step 5: not doing batch correction.

set1_pca <- plot_pca(norm_set1)
set1_pca$plot

set2_pca <- plot_pca(norm_set2)
set2_pca$plot

hmm well that is interesting. There remains a fairly profound batch effect, but some other differences seemed to jump out. Should I try sva or somesuch?

norm_set3 <- normalize_expt(intron_set1, filter=TRUE, convert="cpm",
                            transform="log2", batch="fsva")

## This function will replace the expt$expressionset slot with:

## log2(fsva(cpm(cbcb(data))))

## It backs up the current data into a slot named:
##  expt$backup_expressionset. It will also save copies of each step along the way
##  in expt$normalized with the corresponding libsizes. Keep the libsizes in mind
##  when invoking limma.  The appropriate libsize is the non-log(cpm(normalized)).
##  This is most likely kept at:
##  'new_expt$normalized$intermediate_counts$normalization$libsizes'
##  A copy of this may also be found at:
##  new_expt$best_libsize

## Leaving the data unnormalized.  This is necessary for DESeq, but
##  EdgeR/limma might benefit from normalization.  Good choices include quantile,
##  size-factor, tmm, etc.

## Step 1: performing count filter with option: cbcb

## Removing 97 low-count genes (299 remaining).

## Step 2: not normalizing the data.

## Step 3: converting the data with cpm.

## Step 4: transforming the data with log2.

## transform_counts: Found 338 values equal to 0, adding 1 to the matrix.

## Step 5: doing batch correction with fsva.

## Note to self:  If you get an error like 'x contains missing values'; I think this
##  means that the data has too many 0's and needs to have a better low-count filter applied.

## batch_counts: Before batch correction, 338 entries are >= 0.

## batch_counts: Using sva::fsva for batch correction.

## The number of elements which are < 0 after batch correction is: 121

## The variable low_to_zero sets whether to change <0 values to 0 and is: FALSE

norm_set4 <- normalize_expt(intron_set2, filter=TRUE, convert="cpm",
                            transform="log2", batch="fsva")

## This function will replace the expt$expressionset slot with:

## log2(fsva(cpm(cbcb(data))))

## It backs up the current data into a slot named:
##  expt$backup_expressionset. It will also save copies of each step along the way
##  in expt$normalized with the corresponding libsizes. Keep the libsizes in mind
##  when invoking limma.  The appropriate libsize is the non-log(cpm(normalized)).
##  This is most likely kept at:
##  'new_expt$normalized$intermediate_counts$normalization$libsizes'
##  A copy of this may also be found at:
##  new_expt$best_libsize

## Leaving the data unnormalized.  This is necessary for DESeq, but
##  EdgeR/limma might benefit from normalization.  Good choices include quantile,
##  size-factor, tmm, etc.

## Step 1: performing count filter with option: cbcb

## Removing 85 low-count genes (311 remaining).

## Step 2: not normalizing the data.

## Step 3: converting the data with cpm.

## Step 4: transforming the data with log2.

## transform_counts: Found 101 values equal to 0, adding 1 to the matrix.

## Step 5: doing batch correction with fsva.

## Note to self:  If you get an error like 'x contains missing values'; I think this
##  means that the data has too many 0's and needs to have a better low-count filter applied.

## batch_counts: Before batch correction, 102 entries 0<x<1.

## batch_counts: Before batch correction, 101 entries are >= 0.

## batch_counts: Using sva::fsva for batch correction.

## The number of elements which are < 0 after batch correction is: 27

## The variable low_to_zero sets whether to change <0 values to 0 and is: FALSE

set3_pca <- plot_pca(norm_set3)
set3_pca$plot

set4_pca <- plot_pca(norm_set4)
set4_pca$plot

set3_cor <- plot_corheat(norm_set3)

set4_cor <- plot_corheat(norm_set4)

If I read this correctly, taking in the surrogate variables makes them look similar. Oh crap, this data still has all the tRNA features! Hmm that is actually only 20 features of my remaining 301.

Yeah I really am not sure how to think about this, so I am defaulting to just treating it like another expressionset, which is almost certainly incorrect. I think I will send this to Carol, Briggs, and perhaps Dr. Mount and see what they think.

Let us try some simple contrasts, I am guessing it is best to keep sva in the mix on both sides.

norm_set1 <- normalize_expt(intron_set1, filter=TRUE)

## This function will replace the expt$expressionset slot with:

## cbcb(data)

## It backs up the current data into a slot named:
##  expt$backup_expressionset. It will also save copies of each step along the way
##  in expt$normalized with the corresponding libsizes. Keep the libsizes in mind
##  when invoking limma.  The appropriate libsize is the non-log(cpm(normalized)).
##  This is most likely kept at:
##  'new_expt$normalized$intermediate_counts$normalization$libsizes'
##  A copy of this may also be found at:
##  new_expt$best_libsize

## Leaving the data in its current base format, keep in mind that
##  some metrics are easier to see when the data is log2 transformed, but
##  EdgeR/DESeq do not accept transformed data.

## Leaving the data unconverted.  It is often advisable to cpm/rpkm
##  the data to normalize for sampling differences, keep in mind though that rpkm
##  has some annoying biases, and voom() by default does a cpm (though hpgl_voom()
##  will try to detect this).

## Leaving the data unnormalized.  This is necessary for DESeq, but
##  EdgeR/limma might benefit from normalization.  Good choices include quantile,
##  size-factor, tmm, etc.

## Not correcting the count-data for batch effects.  If batch is
##  included in EdgerR/limma's model, then this is probably wise; but in extreme
##  batch effects this is a good parameter to play with.

## Step 1: performing count filter with option: cbcb

## Removing 97 low-count genes (299 remaining).

## Step 2: not normalizing the data.

## Step 3: not converting the data.

## Step 4: not transforming the data.

## Step 5: not doing batch correction.

norm1_pairwise <- all_pairwise(norm_set1, model_batch="svaunsup", parallel=FALSE,
                               which_voom="voom", deseq_method="short")

## The be method chose 1 surrogate variable(s).

## Did not understand svaunsup, assuming supervised sva.

## sva warning: controls provided so supervised sva is being performed.
## Number of significant surrogate variables is:  1

## Starting limma pairwise comparison.

## libsize was not specified, this parameter has profound effects on limma's result.

## Using the libsize from expt$best_libsize.

## Including batch estimates from sva/ruv/pca in the model.

## Choosing the intercept containing model.

## Limma step 1/6: choosing model.

## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.

## Limma step 3/6: running lmFit.

## Limma step 4/6: making and fitting contrasts.

## Limma step 5/6: Running eBayes and topTable.

## Limma step 6/6: Writing limma outputs.

## Limma step 6/6: 1/10: Creating table: mtc_mtu.

## The qvalue estimation failed for mtc_mtu.

## Limma step 6/6: 2/10: Creating table: mtc_wtu.

## The qvalue estimation failed for mtc_wtu.

## Limma step 6/6: 3/10: Creating table: wtc_mtu.

## The qvalue estimation failed for wtc_mtu.

## Limma step 6/6: 4/10: Creating table: wtc_wtu.

## The qvalue estimation failed for wtc_wtu.

## Limma step 6/6: 5/10: Creating table: mtc_wtu_vs_mtc_mtu.

## Limma step 6/6: 6/10: Creating table: wtc_mtu_vs_mtc_mtu.

## Limma step 6/6: 7/10: Creating table: wtc_wtu_vs_mtc_mtu.

## Limma step 6/6: 8/10: Creating table: wtc_mtu_vs_mtc_wtu.

## Limma step 6/6: 9/10: Creating table: wtc_wtu_vs_mtc_wtu.

## Limma step 6/6: 10/10: Creating table: wtc_wtu_vs_wtc_mtu.

## Starting DESeq2 pairwise comparisons.

## The data should be suitable for EdgeR/DESeq.

## If EdgeR/DESeq freaks out, check the state of the count table and ensure that it is in integer counts.

## Including batch estimates from sva/ruv/pca in the model.

## DESeq2 step 1/5: Including a matrix of batch estimates from sva/ruv/pca in the deseq model.

## converting counts to integer mode

## DESeq steps 2-4 in one shot.

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## DESeq2 step 5/5: 1/6: Creating table: mtc_wtu_vs_mtc_mtu

## DESeq2 step 5/5: 2/6: Creating table: wtc_mtu_vs_mtc_mtu

## DESeq2 step 5/5: 3/6: Creating table: wtc_wtu_vs_mtc_mtu

## DESeq2 step 5/5: 4/6: Creating table: wtc_mtu_vs_mtc_wtu

## DESeq2 step 5/5: 5/6: Creating table: wtc_wtu_vs_mtc_wtu

## DESeq2 step 5/5: 6/6: Creating table: wtc_wtu_vs_wtc_mtu

## Starting edgeR pairwise comparisons.

## The data should be suitable for EdgeR/DESeq.

## If EdgeR/DESeq freaks out, check the state of the count table and ensure that it is in integer counts.

## Including batch estimates from sva/ruv/pca in the model.

## Choosing the intercept containing model.

## EdgeR step 1/9: normalizing data.

## EdgeR step 2/9: Estimating the common dispersion.

## EdgeR step 3/9: Estimating dispersion across genes.

## EdgeR step 4/9: Estimating GLM Common dispersion.

## EdgeR step 5/9: Estimating GLM Trended dispersion.

## EdgeR step 6/9: Estimating GLM Tagged dispersion.

## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'test_type'.

## EdgeR step 8/9: Making pairwise contrasts.

## EdgeR step 9/9: 1/6: Creating table: mtc_wtu_vs_mtc_mtu.

## EdgeR step 9/9: 2/6: Creating table: wtc_mtu_vs_mtc_mtu.

## EdgeR step 9/9: 3/6: Creating table: wtc_wtu_vs_mtc_mtu.

## EdgeR step 9/9: 4/6: Creating table: wtc_mtu_vs_mtc_wtu.

## EdgeR step 9/9: 5/6: Creating table: wtc_wtu_vs_mtc_wtu.

## EdgeR step 9/9: 6/6: Creating table: wtc_wtu_vs_wtc_mtu.

## Starting basic pairwise comparison.

## Basic step 0/3: Normalizing data.

## Basic step 0/3: Converting data.

## Basic step 0/3: Transforming data.

## Basic step 1/3: Creating median and variance tables.

## Basic step 2/3: Performing comparisons.

## Basic step 2/3: 1/6: Performing log2 subtraction: mtc_wtu_vs_mtc_mtu.

## Basic step 2/3: 2/6: Performing log2 subtraction: wtc_mtu_vs_mtc_mtu.

## Basic step 2/3: 3/6: Performing log2 subtraction: wtc_wtu_vs_mtc_mtu.

## Basic step 2/3: 4/6: Performing log2 subtraction: wtc_mtu_vs_mtc_wtu.

## Basic step 2/3: 5/6: Performing log2 subtraction: wtc_wtu_vs_mtc_wtu.

## Basic step 2/3: 6/6: Performing log2 subtraction: wtc_wtu_vs_wtc_mtu.

## Basic step 3/3: Creating faux DE Tables.

## Basic: Returning tables.

## Comparing analyses 1/6: mtc_wtu_vs_mtc_mtu

## Comparing analyses 2/6: wtc_mtu_vs_mtc_mtu

## Comparing analyses 3/6: wtc_wtu_vs_mtc_mtu

## Comparing analyses 4/6: wtc_mtu_vs_mtc_wtu

## Comparing analyses 5/6: wtc_wtu_vs_mtc_wtu

## Comparing analyses 6/6: wtc_wtu_vs_wtc_mtu

norm_set2 <- normalize_expt(intron_set2, filter=TRUE)

## This function will replace the expt$expressionset slot with:

## cbcb(data)

## It backs up the current data into a slot named:
##  expt$backup_expressionset. It will also save copies of each step along the way
##  in expt$normalized with the corresponding libsizes. Keep the libsizes in mind
##  when invoking limma.  The appropriate libsize is the non-log(cpm(normalized)).
##  This is most likely kept at:
##  'new_expt$normalized$intermediate_counts$normalization$libsizes'
##  A copy of this may also be found at:
##  new_expt$best_libsize

## Leaving the data in its current base format, keep in mind that
##  some metrics are easier to see when the data is log2 transformed, but
##  EdgeR/DESeq do not accept transformed data.

## Leaving the data unconverted.  It is often advisable to cpm/rpkm
##  the data to normalize for sampling differences, keep in mind though that rpkm
##  has some annoying biases, and voom() by default does a cpm (though hpgl_voom()
##  will try to detect this).

## Leaving the data unnormalized.  This is necessary for DESeq, but
##  EdgeR/limma might benefit from normalization.  Good choices include quantile,
##  size-factor, tmm, etc.

## Not correcting the count-data for batch effects.  If batch is
##  included in EdgerR/limma's model, then this is probably wise; but in extreme
##  batch effects this is a good parameter to play with.

## Step 1: performing count filter with option: cbcb

## Removing 85 low-count genes (311 remaining).

## Step 2: not normalizing the data.

## Step 3: not converting the data.

## Step 4: not transforming the data.

## Step 5: not doing batch correction.

norm2_pairwise <- all_pairwise(norm_set2, model_batch="svaunsup", parallel=FALSE,
                               which_voom="voom", deseq_method="short")

## The be method chose 1 surrogate variable(s).

## Did not understand svaunsup, assuming supervised sva.

## sva warning: controls provided so supervised sva is being performed.
## Number of significant surrogate variables is:  1

## Starting limma pairwise comparison.

## libsize was not specified, this parameter has profound effects on limma's result.

## Using the libsize from expt$best_libsize.

## Including batch estimates from sva/ruv/pca in the model.

## Choosing the intercept containing model.

## Limma step 1/6: choosing model.

## Limma step 2/6: running limma::voom(), switch with the argument 'which_voom'.

## Limma step 3/6: running lmFit.

## Limma step 4/6: making and fitting contrasts.

## Limma step 5/6: Running eBayes and topTable.

## Limma step 6/6: Writing limma outputs.

## Limma step 6/6: 1/10: Creating table: mtc_mtu.

## Limma step 6/6: 2/10: Creating table: mtc_wtu.

## The qvalue estimation failed for mtc_wtu.

## Limma step 6/6: 3/10: Creating table: wtc_mtu.

## Limma step 6/6: 4/10: Creating table: wtc_wtu.

## Limma step 6/6: 5/10: Creating table: mtc_wtu_vs_mtc_mtu.

## Limma step 6/6: 6/10: Creating table: wtc_mtu_vs_mtc_mtu.

## Limma step 6/6: 7/10: Creating table: wtc_wtu_vs_mtc_mtu.

## Limma step 6/6: 8/10: Creating table: wtc_mtu_vs_mtc_wtu.

## Limma step 6/6: 9/10: Creating table: wtc_wtu_vs_mtc_wtu.

## Limma step 6/6: 10/10: Creating table: wtc_wtu_vs_wtc_mtu.

## Starting DESeq2 pairwise comparisons.

## The data should be suitable for EdgeR/DESeq.

## If EdgeR/DESeq freaks out, check the state of the count table and ensure that it is in integer counts.

## Including batch estimates from sva/ruv/pca in the model.

## DESeq2 step 1/5: Including a matrix of batch estimates from sva/ruv/pca in the deseq model.

## converting counts to integer mode

## DESeq steps 2-4 in one shot.

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

## DESeq2 step 5/5: 1/6: Creating table: mtc_wtu_vs_mtc_mtu

## DESeq2 step 5/5: 2/6: Creating table: wtc_mtu_vs_mtc_mtu

## DESeq2 step 5/5: 3/6: Creating table: wtc_wtu_vs_mtc_mtu

## DESeq2 step 5/5: 4/6: Creating table: wtc_mtu_vs_mtc_wtu

## DESeq2 step 5/5: 5/6: Creating table: wtc_wtu_vs_mtc_wtu

## DESeq2 step 5/5: 6/6: Creating table: wtc_wtu_vs_wtc_mtu

## Starting edgeR pairwise comparisons.

## The data should be suitable for EdgeR/DESeq.

## If EdgeR/DESeq freaks out, check the state of the count table and ensure that it is in integer counts.

## Including batch estimates from sva/ruv/pca in the model.

## Choosing the intercept containing model.

## EdgeR step 1/9: normalizing data.

## EdgeR step 2/9: Estimating the common dispersion.

## EdgeR step 3/9: Estimating dispersion across genes.

## EdgeR step 4/9: Estimating GLM Common dispersion.

## EdgeR step 5/9: Estimating GLM Trended dispersion.

## EdgeR step 6/9: Estimating GLM Tagged dispersion.

## EdgeR step 7/9: Running glmFit, switch to glmQLFit by changing the argument 'test_type'.

## EdgeR step 8/9: Making pairwise contrasts.

## EdgeR step 9/9: 1/6: Creating table: mtc_wtu_vs_mtc_mtu.

## EdgeR step 9/9: 2/6: Creating table: wtc_mtu_vs_mtc_mtu.

## EdgeR step 9/9: 3/6: Creating table: wtc_wtu_vs_mtc_mtu.

## EdgeR step 9/9: 4/6: Creating table: wtc_mtu_vs_mtc_wtu.

## EdgeR step 9/9: 5/6: Creating table: wtc_wtu_vs_mtc_wtu.

## EdgeR step 9/9: 6/6: Creating table: wtc_wtu_vs_wtc_mtu.

## Starting basic pairwise comparison.

## Basic step 0/3: Normalizing data.

## Basic step 0/3: Converting data.

## Basic step 0/3: Transforming data.

## Basic step 1/3: Creating median and variance tables.

## Basic step 2/3: Performing comparisons.

## Basic step 2/3: 1/6: Performing log2 subtraction: mtc_wtu_vs_mtc_mtu.

## Basic step 2/3: 2/6: Performing log2 subtraction: wtc_mtu_vs_mtc_mtu.

## Basic step 2/3: 3/6: Performing log2 subtraction: wtc_wtu_vs_mtc_mtu.

## Basic step 2/3: 4/6: Performing log2 subtraction: wtc_mtu_vs_mtc_wtu.

## Basic step 2/3: 5/6: Performing log2 subtraction: wtc_wtu_vs_mtc_wtu.

## Basic step 2/3: 6/6: Performing log2 subtraction: wtc_wtu_vs_wtc_mtu.

## Basic step 3/3: Creating faux DE Tables.

## Basic: Returning tables.

## Comparing analyses 1/6: mtc_wtu_vs_mtc_mtu

## Comparing analyses 2/6: wtc_mtu_vs_mtc_mtu

## Comparing analyses 3/6: wtc_wtu_vs_mtc_mtu

## Comparing analyses 4/6: wtc_mtu_vs_mtc_wtu

## Comparing analyses 5/6: wtc_wtu_vs_mtc_wtu

## Comparing analyses 6/6: wtc_wtu_vs_wtc_mtu

intron_keepers <- list(
    "mcwu_vs_wcwu" = c("mtc_wtu", "wtc_wtu"),
    "wcmu_vs_wcwu" = c("wtc_mtu", "wtc_wtu"),
    "mcmu_vw_wcwu" = c("mtc_mtu", "wtc_wtu"),
    "mcmu_vs_mcwu" = c("mtc_mtu", "mtc_wtu"),
    "mcmu_vs_wcmu" = c("mtc_mtu", "wtc_mtu"),
    "mcwu_vs_wcmu" = c("mtc_wtu", "wtc_mtu"),
    "wcmu_vs_mcwu" = c("wtc_mtu", "mtc_wtu"))
intron1_tables <- combine_de_tables(norm1_pairwise, keepers=intron_keepers,
                                    excel=paste0("excel/introns_set1-v", ver, ".xlsx"))

## Deleting the file excel/introns_set1-v20170310.xlsx before writing the tables.

## Writing a legend of columns.

## Working on 1/7: mcwu_vs_wcwu

## Found inverse table with wtc_wtu_vs_mtc_wtu

## Working on 2/7: wcmu_vs_wcwu

## Found inverse table with wtc_wtu_vs_wtc_mtu

## Working on 3/7: mcmu_vw_wcwu

## Found inverse table with wtc_wtu_vs_mtc_mtu

## Working on 4/7: mcmu_vs_mcwu

## Found inverse table with mtc_wtu_vs_mtc_mtu

## Working on 5/7: mcmu_vs_wcmu

## Found inverse table with wtc_mtu_vs_mtc_mtu

## Working on 6/7: mcwu_vs_wcmu

## Found inverse table with wtc_mtu_vs_mtc_wtu

## Working on 7/7: wcmu_vs_mcwu

## Found table with wtc_mtu_vs_mtc_wtu

## Adding venn plots for mcwu_vs_wcwu.

## Limma expression coefficients for mcwu_vs_wcwu; R^2: 0.965; equation: y = 1.03x + 0.0112

## Edger expression coefficients for mcwu_vs_wcwu; R^2: 0.962; equation: y = 1x - 0.168

## DESeq2 expression coefficients for mcwu_vs_wcwu; R^2: 0.978; equation: y = 1.01x - 0.229

## Adding venn plots for wcmu_vs_wcwu.

## Limma expression coefficients for wcmu_vs_wcwu; R^2: 0.96; equation: y = 1.01x - 0.147

## Edger expression coefficients for wcmu_vs_wcwu; R^2: 0.958; equation: y = 0.993x - 0.0213

## DESeq2 expression coefficients for wcmu_vs_wcwu; R^2: 0.98; equation: y = 0.985x + 0.0398

## Adding venn plots for mcmu_vw_wcwu.

## Limma expression coefficients for mcmu_vw_wcwu; R^2: 0.93; equation: y = 1.05x - 0.424

## Edger expression coefficients for mcmu_vw_wcwu; R^2: 0.921; equation: y = 0.983x + 0.000606

## DESeq2 expression coefficients for mcmu_vw_wcwu; R^2: 0.968; equation: y = 1x - 0.223

## Adding venn plots for mcmu_vs_mcwu.

## Limma expression coefficients for mcmu_vs_mcwu; R^2: 0.948; equation: y = 1x - 0.275

## Edger expression coefficients for mcmu_vs_mcwu; R^2: 0.942; equation: y = 0.974x + 0.226

## DESeq2 expression coefficients for mcmu_vs_mcwu; R^2: 0.98; equation: y = 0.984x + 0.0487

## Adding venn plots for mcmu_vs_wcmu.

## Limma expression coefficients for mcmu_vs_wcmu; R^2: 0.971; equation: y = 1.03x - 0.154

## Edger expression coefficients for mcmu_vs_wcmu; R^2: 0.968; equation: y = 1x - 0.0667

## DESeq2 expression coefficients for mcmu_vs_wcmu; R^2: 0.978; equation: y = 1.02x - 0.237

## Adding venn plots for mcwu_vs_wcmu.

## Limma expression coefficients for mcwu_vs_wcmu; R^2: 0.954; equation: y = 0.984x + 0.477

## Edger expression coefficients for mcwu_vs_wcmu; R^2: 0.942; equation: y = 0.955x + 0.206

## DESeq2 expression coefficients for mcwu_vs_wcmu; R^2: 0.962; equation: y = 1.01x - 0.161

## Adding venn plots for wcmu_vs_mcwu.

## Limma expression coefficients for wcmu_vs_mcwu; R^2: 0.954; equation: y = 0.984x + 0.477

## Edger expression coefficients for wcmu_vs_mcwu; R^2: 0.942; equation: y = 0.955x + 0.206

## DESeq2 expression coefficients for wcmu_vs_mcwu; R^2: 0.962; equation: y = 1.01x - 0.161

## Writing summary information.

## The sheet: pairwise_summary is in legend, mcwu_vs_wcwu, wcmu_vs_wcwu, mcmu_vw_wcwu, mcmu_vs_mcwu, mcmu_vs_wcmu, mcwu_vs_wcmu, wcmu_vs_mcwu, pairwise_summary.

## Attempting to add the comparison plot to pairwise_summary at row: 25 and column: 1

## Performing save of the workbook.

intron2_tables <- combine_de_tables(norm2_pairwise, keepers=intron_keepers,
                                    excel=paste0("excel/introns_set2-v", ver, ".xlsx"))

## Deleting the file excel/introns_set2-v20170310.xlsx before writing the tables.

## Writing a legend of columns.

## Working on 1/7: mcwu_vs_wcwu

## Found inverse table with wtc_wtu_vs_mtc_wtu

## Working on 2/7: wcmu_vs_wcwu

## Found inverse table with wtc_wtu_vs_wtc_mtu

## Working on 3/7: mcmu_vw_wcwu

## Found inverse table with wtc_wtu_vs_mtc_mtu

## Working on 4/7: mcmu_vs_mcwu

## Found inverse table with mtc_wtu_vs_mtc_mtu

## Working on 5/7: mcmu_vs_wcmu

## Found inverse table with wtc_mtu_vs_mtc_mtu

## Working on 6/7: mcwu_vs_wcmu

## Found inverse table with wtc_mtu_vs_mtc_wtu

## Working on 7/7: wcmu_vs_mcwu

## Found table with wtc_mtu_vs_mtc_wtu

## Adding venn plots for mcwu_vs_wcwu.

## Limma expression coefficients for mcwu_vs_wcwu; R^2: 0.988; equation: y = 1.04x - 0.155

## Edger expression coefficients for mcwu_vs_wcwu; R^2: 0.987; equation: y = 1.05x - 0.575

## DESeq2 expression coefficients for mcwu_vs_wcwu; R^2: 0.987; equation: y = 1.03x - 0.47

## Adding venn plots for wcmu_vs_wcwu.

## Limma expression coefficients for wcmu_vs_wcwu; R^2: 0.987; equation: y = 1.01x + 0.0168

## Edger expression coefficients for wcmu_vs_wcwu; R^2: 0.988; equation: y = 1x - 0.219

## DESeq2 expression coefficients for wcmu_vs_wcwu; R^2: 0.99; equation: y = 0.989x + 0.0364

## Adding venn plots for mcmu_vw_wcwu.

## Limma expression coefficients for mcmu_vw_wcwu; R^2: 0.972; equation: y = 0.999x + 0.459

## Edger expression coefficients for mcmu_vw_wcwu; R^2: 0.971; equation: y = 0.991x - 0.261

## DESeq2 expression coefficients for mcmu_vw_wcwu; R^2: 0.987; equation: y = 0.995x - 0.0969

## Adding venn plots for mcmu_vs_mcwu.

## Limma expression coefficients for mcmu_vs_mcwu; R^2: 0.985; equation: y = 0.961x + 0.531

## Edger expression coefficients for mcmu_vs_mcwu; R^2: 0.984; equation: y = 0.949x + 0.272

## DESeq2 expression coefficients for mcmu_vs_mcwu; R^2: 0.993; equation: y = 0.952x + 0.491

## Adding venn plots for mcmu_vs_wcmu.

## Limma expression coefficients for mcmu_vs_wcmu; R^2: 0.988; equation: y = 0.994x + 0.395

## Edger expression coefficients for mcmu_vs_wcmu; R^2: 0.986; equation: y = 0.985x - 0.0148

## DESeq2 expression coefficients for mcmu_vs_wcmu; R^2: 0.986; equation: y = 0.994x + 0.00793

## Adding venn plots for mcwu_vs_wcmu.

## Limma expression coefficients for mcwu_vs_wcmu; R^2: 0.986; equation: y = 1.03x - 0.103

## Edger expression coefficients for mcwu_vs_wcmu; R^2: 0.984; equation: y = 1.04x - 0.304

## DESeq2 expression coefficients for mcwu_vs_wcmu; R^2: 0.976; equation: y = 1.03x - 0.402

## Adding venn plots for wcmu_vs_mcwu.

## Limma expression coefficients for wcmu_vs_mcwu; R^2: 0.986; equation: y = 1.03x - 0.103

## Edger expression coefficients for wcmu_vs_mcwu; R^2: 0.984; equation: y = 1.04x - 0.304

## DESeq2 expression coefficients for wcmu_vs_mcwu; R^2: 0.976; equation: y = 1.03x - 0.402

## Writing summary information.

## The sheet: pairwise_summary is in legend, mcwu_vs_wcwu, wcmu_vs_wcwu, mcmu_vw_wcwu, mcmu_vs_mcwu, mcmu_vs_wcmu, mcwu_vs_wcmu, wcmu_vs_mcwu, pairwise_summary.

## Attempting to add the comparison plot to pairwise_summary at row: 25 and column: 1

## Performing save of the workbook.

limma_introns <- intron1_tables[["data"]][["mcwu_vs_wcwu"]][["limma_logfc"]]
limma_exons <- intron2_tables[["data"]][["mcwu_vs_wcwu"]][["limma_logfc"]]

intron_table <- intron1_tables[["data"]][["mcwu_vs_wcwu"]]
exon_table <- intron2_tables[["data"]][["mcwu_vs_wcwu"]]

ratio_table <- merge(intron_table, exon_table, by="row.names")
ratio_table[["ratio_ratios"]] <- ratio_table[["limma_logfc.x"]] - ratio_table[["limma_logfc.y"]]

big_table <- write_xls(data=ratio_table)
openxlsx::saveWorkbook(big_table$workbook, file="excel/intron_exon_table.xlsx")

## Error in openxlsx::saveWorkbook(big_table$workbook, file = "excel/intron_exon_table.xlsx"): File already exists!

LS0tCnRpdGxlOiAiU2hhcmVkIGRhdGEgZm9yIFJOQXNlcSBvZiBTLmNlcmV2aXNpYWU6ICBBbm5vdGF0aW9uIGRhdGEgdXNpbmcgUidzIFNhY0NlciBwYWNrYWdlIGFuZCBiaW9tYXJ0LiIKYXV0aG9yOiAiYXRiIGFiZWxld0BnbWFpbC5jb20iCmRhdGU6ICJgciBTeXMuRGF0ZSgpYCIKb3V0cHV0OgogaHRtbF9kb2N1bWVudDoKICBjb2RlX2Rvd25sb2FkOiB0cnVlCiAgY29kZV9mb2xkaW5nOiBzaG93CiAgZmlnX2NhcHRpb246IHRydWUKICBmaWdfaGVpZ2h0OiA3CiAgZmlnX3dpZHRoOiA3CiAgaGlnaGxpZ2h0OiB0YW5nbwogIGtlZXBfbWQ6IGZhbHNlCiAgbW9kZTogc2VsZmNvbnRhaW5lZAogIG51bWJlcl9zZWN0aW9uczogdHJ1ZQogIHNlbGZfY29udGFpbmVkOiB0cnVlCiAgdGhlbWU6IGNvc21vCiAgdG9jOiB0cnVlCiAgdG9jX2Zsb2F0OgogICAgY29sbGFwc2VkOiBmYWxzZQogICAgc21vb3RoX3Njcm9sbDogZmFsc2UKLS0tCgo8c3R5bGU+CiAgPCEtLSBEb2N1bWVudCBwcmVsdWRlIHJldmlzaW9uIDIwMTctMDIgLS0+CmJvZHkgLm1haW4tY29udGFpbmVyIHsKbWF4LXdpZHRoOiAxNjAwcHg7Cn0KPC9zdHlsZT4KCmBgYHtyIG9wdGlvbnMsIGluY2x1ZGU9RkFMU0V9CiMjIFRoZXNlIGFyZSB0aGUgb3B0aW9ucyBJIHRlbmQgdG8gZmF2b3IKbGlicmFyeSgiaHBnbHRvb2xzIikKa25pdHI6Om9wdHNfa25pdCRzZXQoCiAgICBwcm9ncmVzcyA9IFRSVUUsCiAgICB2ZXJib3NlID0gVFJVRSwKICAgIHdpZHRoID0gOTAsCiAgICBlY2hvID0gVFJVRSkKa25pdHI6Om9wdHNfY2h1bmskc2V0KAogICAgZXJyb3IgPSBUUlVFLAogICAgZmlnLndpZHRoID0gOCwKICAgIGZpZy5oZWlnaHQgPSA4LAogICAgZHBpID0gOTYpCm9wdGlvbnMoCiAgICBkaWdpdHMgPSA0LAogICAgc3RyaW5nc0FzRmFjdG9ycyA9IEZBTFNFLAogICAga25pdHIuZHVwbGljYXRlLmxhYmVsID0gImFsbG93IikKc2V0LnNlZWQoMSkKcHJldmlvdXNfZmlsZSA8LSAiYW5ub3RhdGlvbi5SbWQiCnJtZF9maWxlIDwtICJpbnRyb25zLlJtZCIKdmVyIDwtICIyMDE3MDMxMCIKcHJldmlvdXNfc2F2ZSA8LSBwYXN0ZTAoZ3N1YihwYXR0ZXJuPSJcXC5SbWQiLCByZXBsYWNlPSIucmRhLnh6IiwgeD1wcmV2aW91c19maWxlKSkKdGhpc19zYXZlIDwtIHBhc3RlMChnc3ViKHBhdHRlcm49IlxcLlJtZCIsIHJlcGxhY2U9Ii5yZGEueHoiLCB4PXJtZF9maWxlKSkKYGBgCgpgYGB7ciByZW5kZXJpbmcsIGluY2x1ZGU9RkFMU0UsIGV2YWw9RkFMU0V9CiMjIFRoaXMgYmxvY2sgaXMgdXNlZCB0byByZW5kZXIgYSBkb2N1bWVudCBmcm9tIHdpdGhpbiBpdC4Kcm1hcmtkb3duOjpyZW5kZXIocm1kX2ZpbGUpCgojIyBBbiBleHRyYSByZW5kZXJlciBmb3IgcGRmIG91dHB1dAoKCnJtYXJrZG93bjo6cmVuZGVyKHJtZF9maWxlLCBvdXRwdXRfZm9ybWF0PSJwZGZfZG9jdW1lbnQiLCBvdXRwdXRfb3B0aW9ucz1jKCJza2lwX2h0bWwiKSkKIyMgT3IgdG8gc2F2ZS9sb2FkIGxhcmdlIFJkYXRhIGZpbGVzLgpocGdsdG9vbHM6OjpzYXZlbWUoKQpocGdsdG9vbHM6Ojpsb2FkbWUoKQpybShsaXN0PWxzKCkpCmBgYAoKYGBge3IgbG9hZG1lLCBpbmNsdWRlPUZBTFNFLCBldmFsPUZBTFNFfQp0bXAgPC0gc20obG9hZG1lKGZpbGVuYW1lPXByZXZpb3VzX3NhdmUpKQpgYGAKCltpbmRleC5odG1sXShpbmRleC5odG1sKSBbYW5ub3RhdGlvbi5odG1sXShhbm5vdGF0aW9uLmh0bWwpCgpIbW0gd2VsbCwgc2V0dGluZyB1cCB0byBkbyB0aGlzIHdhcyByYXRoZXIgbW9yZSBkaWZmaWN1bHQgdGhhbiBJIGV4cGVjdGVkLiAgSSB3YXMgYWJsZSB0byBmYWlybHkKZWFzaWx5IGNvbGxlY3QgdGhlIHNldCBvZiBpbnRyb25zIGFjY29yZGluZyB0byBlbnNlbWJsLCB5ZWFzdGdlbm9tZS5vcmcsIGFuZCB0aGUgbmNiaSBjb3B5IG9mIHRoZQp5ZWFzdCBpbnRyb24gZGF0YWJhc2UuICBJbnRlcmVzdGluZ2x5LCB0aGUgb25seSBtdWx0aS1pbnRyb25pYyBnZW5lcyBpbiB5ZWFzdCBhcmUgYWxsIGluIHRoZQptaXRvY2hyb25kcmlhbCBnZW5vbWUgLS0gd2hlcmUgKGZvciBleGFtcGxlKSBDT1gxIGhhcyA4IGludHJvbnMhCgpUaGUgcHJvYmxlbSB0aG91Z2gsIGxpZXMgaW4gdGhlIGZhY3QgdGhhdCBJIG5lZWQgdG8gZ2V0IG15IG5ldyBpbnRyb24gaW5mb3JtYXRpb24gaW50byBhIGdmZiBmaWxlIGluCm9yZGVyIHRvIGNvdW50IHRoZW0uICBBIGZldyBtaXMtc3RlcHMgbGVhZCBtZSB0byBldmVudHVhbGx5IGNyb3NzIHJlZmVyZW5jZSB0aGUgaW50cm9uIGRhdGEgYWdhaW5zdApteSBleGlzdGluZyBnZmYgZGF0YWJhc2Ugd2hlcmUgSSBmb3VuZCB0aGF0IHRoZSBjb29yZGluYXRlcyBkaWQgbm90IHF1aXRlIG1hdGNoIGJlY2F1c2Ugb2YgdmVyc2lvbgptaXNtYXRjaGVzIGluIHRoZSBnZW5vbWVzLi4uCgpTbywgSSBkZWNpZGVkIHRvIGp1c3QgdXNlIHRoZSBleGlzdGluZyBnZmYgYW5ub3RhdGlvbiBhbmQgZXh0cmFjdCB0aGUgaW50cm9ucyBmcm9tIGl0LCBhbmQgYWRkIHRoZW0KdG8gaXQuICBUaGUgZ2Vub21ldG9vbHMgc29mdHdhcmUgcGFja2FnZSBoYXMgc29tZSBmdW5jdGlvbmFsaXR5IHRvIGRvIHRoYXQgYXV0b21hdGljYWxseS4gIEhvd2V2ZXIsCnRoYXQgaW4gdHVybnMgcmVxdWlyZXMgYSB2ZXJ5IHBlcmZlY3RseSBmb3JtYXR0ZWQgZ2ZmMyBmaWxlLi4uICBUaGUgeWVhc3QgZ2Vub21lIGdmZiB3ZSBoYXZlIGlzIG5vdCwKc28gSSB3cm90ZSBhIHBlcmwgc2NyaXB0IHRvIHJlZm9ybWF0IGl0IGFzIGEgY2Fub25pY2FsIGdmZjMuICBJIHRoZW4gZGlkCgo+IGd0IGdmZjMgLWFkZGludHJvbnMgdGVzdG1lLmdmZgoKVGhpcyBnYXZlIG1lIGEgbmV3IHRlc3RtZS5nZmYgZmlsZSB3aXRoIDQyNyBuZXcgbGluZXMgd2hpY2ggbG9va2VkIGxpa2U6Cgo8cHJlPgpJICAgICAgIC4gICAgICAgaW50cm9uICA4NzM4OCAgIDg3NTAwICAgLiAgICAgICArICAgICAgIC4gICAgICAgLgo8L3ByZT4KCldlbGwsIHRoYXQgaXMgcHJldHR5IGNsb3NlIHRvIHdoYXQgSSBuZWVkLCBidXQgSSBjYW5ub3QgY291bnQgdGhlIGludHJvbiByZWFkcyB3aXRob3V0IHRoZSA5dGgKY29sdW1uIG9mIHRoZSBnZmYgZmlsbGVkIG91dC4gIE5vdGhpbmcgSSBkaWQgd2FzIGFibGUgdG8gZmlsbCBpbiB0aGlzIGZpbmFsIGNvbHVtbiwgc28gSSBldmVudHVhbGx5CmRlY2lkZWQgdG8gb3BlbiB0aGUgZ2ZmIGluIHRoZSBlbWFjcyB0ZXh0IGVkaXRvciBhbmQgd3JpdGUgYSBzbWFsbCBtYWNybyB3aGljaCB0YWtlcyB0aGUgYW5ub3RhdGlvbgpmcm9tIHRoZSBwcmV2aW91cyBsaW5lJ3MgOXRoIGNvbHVtbiBhbmQgY29weS9wYXN0ZSBpdCB0byByZXBsYWNlIHRoZSB0ZXJtaW5hbCAnLicgb2YgdGhlIGN1cnJlbnQKbGluZS4gIEl0IHRvb2sgYWJvdXQgMTUgbWludXRlcyBmb3IgbWUgdG8gaGl0IGNvbnRyb2wtcyBjb250cm9sLTMgYW5kIGludm9rZSBteSBtYWNybyBvbiBldmVyeQppbnRyb24gZmVhdHVyZSwgYnV0IHRoZSBlbmQgcmVzdWx0IHdhcyBhIGZpbGUgd2l0aCBsaW5lcyB3aGljaCBsb29rIGxpa2U6Cgo8cHJlPgpJICAgICAgIC4gICAgICAgaW50cm9uICA4NzM4OCAgIDg3NTAwICAgLiAgICAgICArICAgICAgIC4gICAgSUQ9IlNOQzEiO2V4b25fbnVtYmVyPSIxIjtnZW5lX2lkPSJZQUwwMzBXIjtwX2lkPSJQMzQyMyI7c2VxZWRpdD0iZmFsc2UiO3RyYW5zY3JpcHRfaWQ9IllBTDAzMFciO3RyYW5zY3JpcHRfbmFtZT0iU05DMSI7dHNzX2lkPSJUU1M2MjgwIgo8L3ByZT4KCndoaWNoIGlzIGV4YWN0bHkgd2hhdCBJIG5lZWQuICBUaGUgcmVzdWx0aW5nIGZpbGUgaXMgaW4gcmVmZXJlbmNlIHVuZGVyIHNjZXJldmlzaWFlX2ludHJvbnMuZ2ZmLgoKRmluYWxseSwgYXJtZWQgd2l0aCB0aGlzIG5ldyBhbm5vdGF0aW9uIGRhdGFiYXNlLCBJIGxvZ2dlZCBpbnRvIHRoZSBjb21wdXRlciBjbHVzdGVyIGFuZCBkaWQgYSBmb3IKbG9vcCBvdmVyIHRoZSBzYW1wbGVzIHRvIHJ1bjoKCj4gY2QgJHtzYW1wbGV9L291dHB1dHMvYm93dGllMl9zY2VyZXZpc2lhZS8gJiYgaHRzZXEtY291bnQgLXEgLWYgYmFtIC1zIG5vIC1pIElEIC10IGludHJvbiBcCiAgJHtzYW1wbGV9X2ZvcndhcmQtdHJpbW1lZC5iYW0gJHt3ZH0vcmVmZXJlbmNlL3NjZXJldmlzaWFlX2ludHJvbnMuZ2ZmIDE+aW50cm9ucy5jb3VudCBcCiAgMj5pbnRyb25zLmNvdW50LmVyciAmJiB4eiAtOWUgaW50cm9ucy5jb3VudAoKVGhpcyBhbGxvd3MgbWUgdG8gYWRkIGEgbmV3IGNvbHVtbiBpbiBteSBzYW1wbGUgc2hlZXQgZXhjZWwgZmlsZSAnaW50cm9uc19maWxlJy4uLgoKYGBge3IgbWFrZV9pbnRyb25zfQppbnRyb25fY291bnRzIDwtIHNtKGNyZWF0ZV9leHB0KCJzYW1wbGVfc2hlZXRzL2FsbF9zYW1wbGVzLnhsc3giLCBnZW5lX2luZm89c2NfZ2ZmX2Fubm90YXRpb25zLAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIGZpbGVfY29sdW1uPSJpbnRyb25fZmlsZSIpKQojIyBUaGUgc2FtZSBtYXBwaW5ncyBmb3IgdGhlIGdlbm9tZSBhcmUgaW4gdjNfZXhwdAoKIyMgTGV0IHVzIGNoZWNrIHRoYXQgSSBtYXBwZWQgY29ycmVjdGx5IGFuZCBjYW4geHJlZiB0aGVzZSBkYXRhLi4uCmhlYWQoQmlvYmFzZTo6ZXhwcnMoaW50cm9uX2NvdW50cyRleHByZXNzaW9uc2V0KSkKaGVhZChCaW9iYXNlOjpleHBycyh2M19leHB0JGV4cHJlc3Npb25zZXQpKQojIyBnaWdnaXR5IGxvb2tzIGdvb2QuCmRpbShCaW9iYXNlOjpleHBycyhpbnRyb25fY291bnRzJGV4cHJlc3Npb25zZXQpKQpkaW0oQmlvYmFzZTo6ZXhwcnModjNfZXhwdCRleHByZXNzaW9uc2V0KSkKYGBgCgpPayBzbyBub3cgSSBoYXZlIDIgZGF0YXNldHMsIG9uZSB3aGljaCBpcyByYXRoZXIgbGFyZ2VyIHRoYW4gdGhlIG90aGVyLgpBcyBJIHVuZGVyc3RhbmQgdGhlIHN0YXRlZCBnb2FsIGZyb20gRHIuIE1vdW50LCBJIHdhbnQgdG8gcmF0aW8gdGhlIGludHJvbi9nZW5lIGNvdW50cyBmb3IgZWFjaAppbnRyb24gY29udGFpbmluZyBnZW5lIGFuZCBzZWUgaWYgdGhlcmUgaXMgYSBzaWduaWZpY2FudCBkaWZmZXJlbmNlIGFjcm9zcyBjb25kaXRpb25zLi4uCgpUbyBkbyB0aGF0IEkgYW0gdGhpbmtpbmcgSSB3aWxsIGNvcHkgdGhlIGV4aXN0aW5nIGV4cHJlc3Npb25zZXQgdG8gc29tZXRoaW5nIGxpa2UgJ2ludHJvbl9yYXRpbycsCmV4dHJhY3QgdGhlIG1hdHJpY2VzIG9mIGNvdW50cyBmcm9tIHRoZSBiaWcgYW5kIGxpdHRsZSBkYXRhc2V0cyBmb3Igb25seSB0aGUgaW50cm9uIGNvbnRhaW5pbmcKZ2VuZXMsIGFuZCBmaW5hbGx5IGRvIHRoZSBkaXZpc2lvbi4KCmBgYHtyIGludHJvbl9zdWJzZXR9CmludHJvbl9yYXRpbyA8LSBpbnRyb25fY291bnRzCgppbnRyb25fbXRyeCA8LSBhcy5tYXRyaXgoQmlvYmFzZTo6ZXhwcnMoaW50cm9uX2NvdW50cyRleHByZXNzaW9uc2V0KSkKYWxsX210cnggPC0gYXMubWF0cml4KEJpb2Jhc2U6OmV4cHJzKHYzX2V4cHQkZXhwcmVzc2lvbnNldCkpCmFsbF9zdWJzZXQgPC0gYWxsX210cnhbcm93bmFtZXMoaW50cm9uX210cngpLCBdCmludHJvbl9yYXRpb19tdHJ4IDwtIGludHJvbl9tdHJ4IC8gYWxsX3N1YnNldApuYW5zIDwtIGlzLm5hbihpbnRyb25fcmF0aW9fbXRyeCkKaW50cm9uX3JhdGlvX210cnhbbmFuc10gPC0gMAoKQmlvYmFzZTo6ZXhwcnMoaW50cm9uX3JhdGlvJGV4cHJlc3Npb25zZXQpIDwtIGludHJvbl9yYXRpb19tdHJ4CgpyYXRpb19ub3JtIDwtIG5vcm1hbGl6ZV9leHB0KGludHJvbl9yYXRpbywgZmlsdGVyPSJzaW1wbGUiKQpwbG90X2RlbnNpdHkocmF0aW9fbm9ybSkKYGBgCgpUaGF0IHNlZW1zIGEgZHVtYiB3YXkgdG8gbG9vayBhdCB0aGlzLCBsZXRzIHRyeSBzb21ldGhpbmcgZWxzZS4KCmBgYHtyIGludHJvbl92c19hbGxfdjJ9CiMjIEhvdyBhYm91dCBJIHN1YnNldCB0aGUgdjMgZGF0YSB0byBhcyBhYm92ZQojIyBUaGVuIGRvIGlkZW50aWNhbCBsb2cyKGNwbShzb21ldGhpbmcpKSBub3JtYWxpemF0aW9ucwojIyBUaGVuIHN1YnRyYWN0IHRoZSBsb2cgc2NhbGUgbnVtYmVycyBvciBzb21ldGhpbmc/CgojIyBTZXQxIHdpbGwgYmUgdGhlIGludHJvbiBjb3VudHMsIHNldDIgd2lsbCBiZSB0aGUgZXhvbiBjb3VudHMgc3Vic2V0dGVkIGZvciB0aGUgaW50cm9uIGdlbmVzLgppbnRyb25fc2V0MSA8LSBpbnRyb25fY291bnRzCmludHJvbl9zZXQyIDwtIGludHJvbl9jb3VudHMKIyMgcmVwbGFjZSB0aGUgZGF0YSB3aXRoIHRoZSBleG9uIGNvdW50cyBzbyB0aGF0IEkgbm93IGhhdmUgMiBkYXRhIHNldHMgb2YgdGhlIHNhbWUgZ2VuZXMuCkJpb2Jhc2U6OmV4cHJzKGludHJvbl9zZXQyJGV4cHJlc3Npb25zZXQpIDwtIGFsbF9zdWJzZXQKCm5vcm1fc2V0MSA8LSBub3JtYWxpemVfZXhwdChpbnRyb25fc2V0MSwgZmlsdGVyPVRSVUUsIG5vcm09InF1YW50IiwgY29udmVydD0iY3BtIiwgdHJhbnNmb3JtPSJsb2cyIikKbm9ybV9zZXQyIDwtIG5vcm1hbGl6ZV9leHB0KGludHJvbl9zZXQyLCBmaWx0ZXI9VFJVRSwgbm9ybT0icXVhbnQiLCBjb252ZXJ0PSJjcG0iLCB0cmFuc2Zvcm09ImxvZzIiKQpzZXQxX3BjYSA8LSBwbG90X3BjYShub3JtX3NldDEpCnNldDFfcGNhJHBsb3QKCnNldDJfcGNhIDwtIHBsb3RfcGNhKG5vcm1fc2V0MikKc2V0Ml9wY2EkcGxvdApgYGAKCmhtbSB3ZWxsIHRoYXQgaXMgaW50ZXJlc3RpbmcuICBUaGVyZSByZW1haW5zIGEgZmFpcmx5IHByb2ZvdW5kIGJhdGNoIGVmZmVjdCwgYnV0IHNvbWUgb3RoZXIKZGlmZmVyZW5jZXMgc2VlbWVkIHRvIGp1bXAgb3V0LiAgU2hvdWxkIEkgdHJ5IHN2YSBvciBzb21lc3VjaD8KCmBgYHtyIHRyeV9hZ2Fpbl9iYXRjaH0Kbm9ybV9zZXQzIDwtIG5vcm1hbGl6ZV9leHB0KGludHJvbl9zZXQxLCBmaWx0ZXI9VFJVRSwgY29udmVydD0iY3BtIiwKICAgICAgICAgICAgICAgICAgICAgICAgICAgIHRyYW5zZm9ybT0ibG9nMiIsIGJhdGNoPSJmc3ZhIikKbm9ybV9zZXQ0IDwtIG5vcm1hbGl6ZV9leHB0KGludHJvbl9zZXQyLCBmaWx0ZXI9VFJVRSwgY29udmVydD0iY3BtIiwKICAgICAgICAgICAgICAgICAgICAgICAgICAgIHRyYW5zZm9ybT0ibG9nMiIsIGJhdGNoPSJmc3ZhIikKc2V0M19wY2EgPC0gcGxvdF9wY2Eobm9ybV9zZXQzKQpzZXQzX3BjYSRwbG90CnNldDRfcGNhIDwtIHBsb3RfcGNhKG5vcm1fc2V0NCkKc2V0NF9wY2EkcGxvdAoKc2V0M19jb3IgPC0gcGxvdF9jb3JoZWF0KG5vcm1fc2V0MykKc2V0NF9jb3IgPC0gcGxvdF9jb3JoZWF0KG5vcm1fc2V0NCkKYGBgCgpJZiBJIHJlYWQgdGhpcyBjb3JyZWN0bHksIHRha2luZyBpbiB0aGUgc3Vycm9nYXRlIHZhcmlhYmxlcyBtYWtlcyB0aGVtIGxvb2sgc2ltaWxhci4gIE9oIGNyYXAsIHRoaXMKZGF0YSBzdGlsbCBoYXMgYWxsIHRoZSB0Uk5BIGZlYXR1cmVzISAgSG1tIHRoYXQgaXMgYWN0dWFsbHkgb25seSAyMCBmZWF0dXJlcyBvZiBteSByZW1haW5pbmcgMzAxLgoKWWVhaCBJIHJlYWxseSBhbSBub3Qgc3VyZSBob3cgdG8gdGhpbmsgYWJvdXQgdGhpcywgc28gSSBhbSBkZWZhdWx0aW5nIHRvIGp1c3QgdHJlYXRpbmcgaXQgbGlrZQphbm90aGVyIGV4cHJlc3Npb25zZXQsIHdoaWNoIGlzIGFsbW9zdCBjZXJ0YWlubHkgaW5jb3JyZWN0LiAgSSB0aGluayBJIHdpbGwgc2VuZCB0aGlzIHRvIENhcm9sLApCcmlnZ3MsIGFuZCBwZXJoYXBzIERyLiBNb3VudCBhbmQgc2VlIHdoYXQgdGhleSB0aGluay4KCkxldCB1cyB0cnkgc29tZSBzaW1wbGUgY29udHJhc3RzLCBJIGFtIGd1ZXNzaW5nIGl0IGlzIGJlc3QgdG8ga2VlcCBzdmEgaW4gdGhlIG1peCBvbiBib3RoIHNpZGVzLgoKYGBge3IgZGVfaXNofQpub3JtX3NldDEgPC0gbm9ybWFsaXplX2V4cHQoaW50cm9uX3NldDEsIGZpbHRlcj1UUlVFKQpub3JtMV9wYWlyd2lzZSA8LSBhbGxfcGFpcndpc2Uobm9ybV9zZXQxLCBtb2RlbF9iYXRjaD0ic3ZhdW5zdXAiLCBwYXJhbGxlbD1GQUxTRSwKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIHdoaWNoX3Zvb209InZvb20iLCBkZXNlcV9tZXRob2Q9InNob3J0IikKbm9ybV9zZXQyIDwtIG5vcm1hbGl6ZV9leHB0KGludHJvbl9zZXQyLCBmaWx0ZXI9VFJVRSkKbm9ybTJfcGFpcndpc2UgPC0gYWxsX3BhaXJ3aXNlKG5vcm1fc2V0MiwgbW9kZWxfYmF0Y2g9InN2YXVuc3VwIiwgcGFyYWxsZWw9RkFMU0UsCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICB3aGljaF92b29tPSJ2b29tIiwgZGVzZXFfbWV0aG9kPSJzaG9ydCIpCgppbnRyb25fa2VlcGVycyA8LSBsaXN0KAogICAgIm1jd3VfdnNfd2N3dSIgPSBjKCJtdGNfd3R1IiwgInd0Y193dHUiKSwKICAgICJ3Y211X3ZzX3djd3UiID0gYygid3RjX210dSIsICJ3dGNfd3R1IiksCiAgICAibWNtdV92d193Y3d1IiA9IGMoIm10Y19tdHUiLCAid3RjX3d0dSIpLAogICAgIm1jbXVfdnNfbWN3dSIgPSBjKCJtdGNfbXR1IiwgIm10Y193dHUiKSwKICAgICJtY211X3ZzX3djbXUiID0gYygibXRjX210dSIsICJ3dGNfbXR1IiksCiAgICAibWN3dV92c193Y211IiA9IGMoIm10Y193dHUiLCAid3RjX210dSIpLAogICAgIndjbXVfdnNfbWN3dSIgPSBjKCJ3dGNfbXR1IiwgIm10Y193dHUiKSkKaW50cm9uMV90YWJsZXMgPC0gY29tYmluZV9kZV90YWJsZXMobm9ybTFfcGFpcndpc2UsIGtlZXBlcnM9aW50cm9uX2tlZXBlcnMsCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIGV4Y2VsPXBhc3RlMCgiZXhjZWwvaW50cm9uc19zZXQxLXYiLCB2ZXIsICIueGxzeCIpKQppbnRyb24yX3RhYmxlcyA8LSBjb21iaW5lX2RlX3RhYmxlcyhub3JtMl9wYWlyd2lzZSwga2VlcGVycz1pbnRyb25fa2VlcGVycywKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgZXhjZWw9cGFzdGUwKCJleGNlbC9pbnRyb25zX3NldDItdiIsIHZlciwgIi54bHN4IikpCgpsaW1tYV9pbnRyb25zIDwtIGludHJvbjFfdGFibGVzW1siZGF0YSJdXVtbIm1jd3VfdnNfd2N3dSJdXVtbImxpbW1hX2xvZ2ZjIl1dCmxpbW1hX2V4b25zIDwtIGludHJvbjJfdGFibGVzW1siZGF0YSJdXVtbIm1jd3VfdnNfd2N3dSJdXVtbImxpbW1hX2xvZ2ZjIl1dCgppbnRyb25fdGFibGUgPC0gaW50cm9uMV90YWJsZXNbWyJkYXRhIl1dW1sibWN3dV92c193Y3d1Il1dCmV4b25fdGFibGUgPC0gaW50cm9uMl90YWJsZXNbWyJkYXRhIl1dW1sibWN3dV92c193Y3d1Il1dCgpyYXRpb190YWJsZSA8LSBtZXJnZShpbnRyb25fdGFibGUsIGV4b25fdGFibGUsIGJ5PSJyb3cubmFtZXMiKQpyYXRpb190YWJsZVtbInJhdGlvX3JhdGlvcyJdXSA8LSByYXRpb190YWJsZVtbImxpbW1hX2xvZ2ZjLngiXV0gLSByYXRpb190YWJsZVtbImxpbW1hX2xvZ2ZjLnkiXV0KCmJpZ190YWJsZSA8LSB3cml0ZV94bHMoZGF0YT1yYXRpb190YWJsZSkKb3Blbnhsc3g6OnNhdmVXb3JrYm9vayhiaWdfdGFibGUkd29ya2Jvb2ssIGZpbGU9ImV4Y2VsL2ludHJvbl9leG9uX3RhYmxlLnhsc3giKQpgYGAK

Shared data for RNAseq of S.cerevisiae: Annotation data using R’s SacCer package and biomart.

atb abelew@gmail.com

2017-03-09