1 Revisisting some much older RNASeq data

One primary goal of revisiting this data is to look at some intergenic regions. I took a very simplistic route to consider this question.

1.1 Create sample sheet

I recently wrote a nifty function which in theory should fill in a sample sheet, let us see how well it works for this data.

##new_samplesheet <- gather_preprocessing_metadata("sample_sheets/all_samples.xlsx")

It turns out that function is not yet well suited to this data. Notably it tries to sanitize the date columns and does so poorly.

Also, damnit I am an idiot and redid the mapping off of my notes rather than remember that the sample sheet explicitly states that this is 5448 and not 5005. This will not make a huge difference, but I will remap as well.

Note that for the annotations, I am almost certain that the gene IDs have different lexical values between the genome downloaded from NCBI and the annotation data at microbesonline. If I recall it is something like the difference between ‘M5005_Spy_001’ and ‘M5005Spy_001’ or something similarly wacky.

annot <- as.data.frame(load_microbesonline_annotations("5005"))
## Found 1 entry.
## Streptococcus pyogenes MGAS5005Firmicutesyes2005-08-25yes101950293653
## The species being downloaded is: Streptococcus pyogenes MGAS5005
## Downloading: http://www.microbesonline.org/cgi-bin/genomeInfo.cgi?tId=293653;export=tab
rownames(annot) <- make.names(gsub(x=annot[["sysName"]], pattern="M5005_Spy_", replacement="M5005_Spy"), unique=TRUE)
sample_sheet <- "sample_sheets/all_samples.xlsx"
genome_expt <- create_expt(sample_sheet, gene_info=annot, file_column="counttable")
## Reading the sample metadata.
## Did not find the condition column in the sample sheet.
## Filling it in as undefined.
## Did not find the batch column in the sample sheet.
## Filling it in as undefined.
## The sample definitions comprises: 20 rows(samples) and 28 columns(metadata fields).
## Matched 1926 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## Saving the expressionset to 'expt.rda'.
## The final expressionset has 1926 rows and 20 columns.
inter_expt <- create_expt(sample_sheet, gene_info=annot, file_column="intercount")
## Reading the sample metadata.
## Did not find the condition column in the sample sheet.
## Filling it in as undefined.
## Did not find the batch column in the sample sheet.
## Filling it in as undefined.
## The sample definitions comprises: 20 rows(samples) and 28 columns(metadata fields).
## Matched 1841 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## Saving the expressionset to 'expt.rda'.
## The final expressionset has 1841 rows and 20 columns.

1.2 Decide what I want to consider as condition/batch/etc

genome_expt <- genome_expt %>%
  set_expt_conditions(fact="genotype") %>%
  set_expt_batches(fact="time")

inter_expt <- inter_expt %>%
  set_expt_conditions(fact="genotype") %>%
  set_expt_batches(fact="time")

1.3 A few descriptive plots

1.3.1 CDS

cds_plots <- graph_metrics(genome_expt)
## 2077 entries are 0.  We are on a log scale, adding 1 to the data.
## Warning in plot.window(...): "title" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "title" is not a graphical parameter
## Warning in title(...): "title" is not a graphical parameter
## Performing correlation.
## Warning in plot.window(...): "title" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "title" is not a graphical parameter
## Warning in title(...): "title" is not a graphical parameter
## Performing distance.
## Changed 2077 zero count features.
## Naively calculating coefficient of variation/dispersion with respect to condition.
## Finished calculating dispersion estimates.
cds_plots$libsize

cds_norm <- normalize_expt(genome_expt, filter=TRUE, norm="quant", convert="cpm", transform="log2")
## Removing 127 low-count genes (1799 remaining).
cds_norm_plots <- graph_metrics(cds_norm)
## Warning in plot.window(...): "title" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "title" is not a graphical parameter
## Warning in title(...): "title" is not a graphical parameter
## Performing correlation.
## Warning in plot.window(...): "title" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "title" is not a graphical parameter
## Warning in title(...): "title" is not a graphical parameter
## Performing distance.
## Naively calculating coefficient of variation/dispersion with respect to condition.
## Finished calculating dispersion estimates.
cds_norm_plots$pc_plot

cds_de <- all_pairwise(genome_expt, filter=TRUE, model_batch=TRUE)
## Plotting a PCA before surrogate/batch inclusion.
## Using limma's removeBatchEffect to visualize with(out) batch inclusion.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
cds_table <- combine_de_tables(cds_de, excel="excel/cds_table.xlsx")
## Deleting the file excel/cds_table.xlsx before writing the tables.

1.3.2 InterCDS

inter_plots <- graph_metrics(inter_expt)
## 1342 entries are 0.  We are on a log scale, adding 1 to the data.
## Warning in plot.window(...): "title" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "title" is not a graphical parameter
## Warning in title(...): "title" is not a graphical parameter
## Performing correlation.
## Warning in plot.window(...): "title" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "title" is not a graphical parameter
## Warning in title(...): "title" is not a graphical parameter
## Performing distance.
## Changed 1342 zero count features.
## Naively calculating coefficient of variation/dispersion with respect to condition.
## Finished calculating dispersion estimates.
inter_plots$libsize

inter_norm <- normalize_expt(inter_expt, filter=TRUE, norm="quant", convert="cpm", transform="log2")
## Removing 23 low-count genes (1818 remaining).
inter_norm_plots <- graph_metrics(inter_norm)
## Warning in plot.window(...): "title" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "title" is not a graphical parameter
## Warning in title(...): "title" is not a graphical parameter
## Performing correlation.
## Warning in plot.window(...): "title" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "title" is not a graphical parameter
## Warning in title(...): "title" is not a graphical parameter
## Performing distance.
## Naively calculating coefficient of variation/dispersion with respect to condition.
## Finished calculating dispersion estimates.
inter_norm_plots$pc_plot

inter_de <- all_pairwise(inter_expt, filter=TRUE, model_batch=TRUE)
## Plotting a PCA before surrogate/batch inclusion.
## Using limma's removeBatchEffect to visualize with(out) batch inclusion.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
inter_table <- combine_de_tables(inter_de, excel="excel/inter_table.xlsx")
## Deleting the file excel/inter_table.xlsx before writing the tables.

1.4 Compare CDS/interCDS

comp <- compare_de_results(cds_table, inter_table)
## Testing method: limma.
## Adding method: limma to the set.
## Testing method: deseq.
## Adding method: deseq to the set.
## Testing method: edger.
## Adding method: edger to the set.
##  Starting method limma, table wt_vs_mga.
##  Starting method deseq, table wt_vs_mga.
##  Starting method edger, table wt_vs_mga.
as.data.frame(comp$result)
##   limma.wt_vs_mga.logfc limma.wt_vs_mga.p limma.wt_vs_mga.adjp
## 1                0.8272            0.5079                0.586
##   deseq.wt_vs_mga.logfc deseq.wt_vs_mga.p deseq.wt_vs_mga.adjp
## 1                0.8973             0.559               0.6215
##   edger.wt_vs_mga.logfc edger.wt_vs_mga.p edger.wt_vs_mga.adjp
## 1                0.8988            0.6019               0.6984

Similar, but definitely not the same, so there may be some interesting differences.

pander::pander(sessionInfo())
message(paste0("This is hpgltools commit: ", get_git_commit()))
this_save <- paste0(gsub(pattern="\\.Rmd", replace="", x=rmd_file), "-v", ver, ".rda.xz")
message(paste0("Saving to ", this_save))
tmp <- sm(saveme(filename=this_save))
LS0tCnRpdGxlOiAiSSBhbSBub3QgY2VydGFpbiB3aGF0IHRvIHRpdGxlIHRoaXMuIgphdXRob3I6ICJhdGIgYWJlbGV3QGdtYWlsLmNvbSIKZGF0ZTogImByIFN5cy5EYXRlKClgIgpvdXRwdXQ6CiAgaHRtbF9kb2N1bWVudDoKICAgIGNvZGVfZG93bmxvYWQ6IHRydWUKICAgIGNvZGVfZm9sZGluZzogc2hvdwogICAgZmlnX2NhcHRpb246IHRydWUKICAgIGZpZ19oZWlnaHQ6IDcKICAgIGZpZ193aWR0aDogNwogICAgaGlnaGxpZ2h0OiB0YW5nbwogICAga2VlcF9tZDogZmFsc2UKICAgIG1vZGU6IHNlbGZjb250YWluZWQKICAgIG51bWJlcl9zZWN0aW9uczogdHJ1ZQogICAgc2VsZl9jb250YWluZWQ6IHRydWUKICAgIHRoZW1lOiByZWFkYWJsZQogICAgdG9jOiB0cnVlCiAgICB0b2NfZmxvYXQ6CiAgICAgIGNvbGxhcHNlZDogZmFsc2UKICAgICAgc21vb3RoX3Njcm9sbDogZmFsc2UKICBybWRmb3JtYXRzOjpyZWFkdGhlZG93bjoKICAgIGNvZGVfZG93bmxvYWQ6IHRydWUKICAgIGNvZGVfZm9sZGluZzogc2hvdwogICAgZGZfcHJpbnQ6IHBhZ2VkCiAgICBmaWdfY2FwdGlvbjogdHJ1ZQogICAgZmlnX2hlaWdodDogNwogICAgZmlnX3dpZHRoOiA3CiAgICBoaWdobGlnaHQ6IHRhbmdvCiAgICB3aWR0aDogMzAwCiAgICBrZWVwX21kOiBmYWxzZQogICAgbW9kZTogc2VsZmNvbnRhaW5lZAogICAgdG9jX2Zsb2F0OiB0cnVlCiAgQmlvY1N0eWxlOjpodG1sX2RvY3VtZW50OgogICAgY29kZV9kb3dubG9hZDogdHJ1ZQogICAgY29kZV9mb2xkaW5nOiBzaG93CiAgICBmaWdfY2FwdGlvbjogdHJ1ZQogICAgZmlnX2hlaWdodDogNwogICAgZmlnX3dpZHRoOiA3CiAgICBoaWdobGlnaHQ6IHRhbmdvCiAgICBrZWVwX21kOiBmYWxzZQogICAgbW9kZTogc2VsZmNvbnRhaW5lZAogICAgdG9jX2Zsb2F0OiB0cnVlCi0tLQoKPHN0eWxlIHR5cGU9InRleHQvY3NzIj4KYm9keSwgdGQgewogIGZvbnQtc2l6ZTogMTZweDsKfQpjb2RlLnJ7CiAgZm9udC1zaXplOiAxNnB4Owp9CnByZSB7CiBmb250LXNpemU6IDE2cHgKfQo8L3N0eWxlPgoKYGBge3Igb3B0aW9ucywgaW5jbHVkZT1GQUxTRX0KbGlicmFyeSgiaHBnbHRvb2xzIikKdHQgPC0gZGV2dG9vbHM6OmxvYWRfYWxsKCJ+L2hwZ2x0b29scyIpCmtuaXRyOjpvcHRzX2tuaXQkc2V0KHdpZHRoPTEyMCwKICAgICAgICAgICAgICAgICAgICAgcHJvZ3Jlc3M9VFJVRSwKICAgICAgICAgICAgICAgICAgICAgdmVyYm9zZT1UUlVFLAogICAgICAgICAgICAgICAgICAgICBlY2hvPVRSVUUpCmtuaXRyOjpvcHRzX2NodW5rJHNldChlcnJvcj1UUlVFLAogICAgICAgICAgICAgICAgICAgICAgZHBpPTk2KQpvbGRfb3B0aW9ucyA8LSBvcHRpb25zKGRpZ2l0cz00LAogICAgICAgICAgICAgICAgICAgICAgIHN0cmluZ3NBc0ZhY3RvcnM9RkFMU0UsCiAgICAgICAgICAgICAgICAgICAgICAga25pdHIuZHVwbGljYXRlLmxhYmVsPSJhbGxvdyIpCmdncGxvdDI6OnRoZW1lX3NldChnZ3Bsb3QyOjp0aGVtZV9idyhiYXNlX3NpemU9MTApKQpydW5kYXRlIDwtIGZvcm1hdChTeXMuRGF0ZSgpLCBmb3JtYXQ9IiVZJW0lZCIpCnByZXZpb3VzX2ZpbGUgPC0gIiIKdmVyIDwtIGZvcm1hdChTeXMuRGF0ZSgpLCAiJVklbSVkIikKCiMjdG1wIDwtIHNtKGxvYWRtZShmaWxlbmFtZT1wYXN0ZTAoZ3N1YihwYXR0ZXJuPSJcXC5SbWQiLCByZXBsYWNlPSIiLCB4PXByZXZpb3VzX2ZpbGUpLCAiLXYiLCB2ZXIsICIucmRhLnh6IikpKQpybWRfZmlsZSA8LSAiMjAyMV9hbmFseXNlcy5SbWQiCmBgYAoKIyBSZXZpc2lzdGluZyBzb21lIG11Y2ggb2xkZXIgUk5BU2VxIGRhdGEKCk9uZSBwcmltYXJ5IGdvYWwgb2YgcmV2aXNpdGluZyB0aGlzIGRhdGEgaXMgdG8gbG9vayBhdCBzb21lIGludGVyZ2VuaWMKcmVnaW9ucy4gIEkgdG9vayBhIHZlcnkgc2ltcGxpc3RpYyByb3V0ZSB0byBjb25zaWRlciB0aGlzIHF1ZXN0aW9uLgoKIyMgQ3JlYXRlIHNhbXBsZSBzaGVldAoKSSByZWNlbnRseSB3cm90ZSBhIG5pZnR5IGZ1bmN0aW9uIHdoaWNoIGluIHRoZW9yeSBzaG91bGQgZmlsbCBpbiBhCnNhbXBsZSBzaGVldCwgbGV0IHVzIHNlZSBob3cgd2VsbCBpdCB3b3JrcyBmb3IgdGhpcyBkYXRhLgoKYGBge3IgdGVzdF9yfQojI25ld19zYW1wbGVzaGVldCA8LSBnYXRoZXJfcHJlcHJvY2Vzc2luZ19tZXRhZGF0YSgic2FtcGxlX3NoZWV0cy9hbGxfc2FtcGxlcy54bHN4IikKYGBgCgpJdCB0dXJucyBvdXQgdGhhdCBmdW5jdGlvbiBpcyBub3QgeWV0IHdlbGwgc3VpdGVkIHRvIHRoaXMgZGF0YS4KTm90YWJseSBpdCB0cmllcyB0byBzYW5pdGl6ZSB0aGUgZGF0ZSBjb2x1bW5zIGFuZCBkb2VzIHNvIHBvb3JseS4KCkFsc28sIGRhbW5pdCBJIGFtIGFuIGlkaW90IGFuZCByZWRpZCB0aGUgbWFwcGluZyBvZmYgb2YgbXkgbm90ZXMKcmF0aGVyIHRoYW4gcmVtZW1iZXIgdGhhdCB0aGUgc2FtcGxlIHNoZWV0IGV4cGxpY2l0bHkgc3RhdGVzIHRoYXQgdGhpcwppcyA1NDQ4IGFuZCBub3QgNTAwNS4gIFRoaXMgd2lsbCBub3QgbWFrZSBhIGh1Z2UgZGlmZmVyZW5jZSwgYnV0IEkKd2lsbCByZW1hcCBhcyB3ZWxsLgoKTm90ZSB0aGF0IGZvciB0aGUgYW5ub3RhdGlvbnMsIEkgYW0gYWxtb3N0IGNlcnRhaW4gdGhhdCB0aGUgZ2VuZSBJRHMKaGF2ZSBkaWZmZXJlbnQgbGV4aWNhbCB2YWx1ZXMgYmV0d2VlbiB0aGUgZ2Vub21lIGRvd25sb2FkZWQgZnJvbSBOQ0JJCmFuZCB0aGUgYW5ub3RhdGlvbiBkYXRhIGF0IG1pY3JvYmVzb25saW5lLiAgSWYgSSByZWNhbGwgaXQgaXMKc29tZXRoaW5nIGxpa2UgdGhlIGRpZmZlcmVuY2UgYmV0d2VlbiAnTTUwMDVfU3B5XzAwMScgYW5kCidNNTAwNVNweV8wMDEnIG9yIHNvbWV0aGluZyBzaW1pbGFybHkgd2Fja3kuCgpgYGB7ciBhbm5vdGF0aW9uc30KYW5ub3QgPC0gYXMuZGF0YS5mcmFtZShsb2FkX21pY3JvYmVzb25saW5lX2Fubm90YXRpb25zKCI1MDA1IikpCnJvd25hbWVzKGFubm90KSA8LSBtYWtlLm5hbWVzKGdzdWIoeD1hbm5vdFtbInN5c05hbWUiXV0sIHBhdHRlcm49Ik01MDA1X1NweV8iLCByZXBsYWNlbWVudD0iTTUwMDVfU3B5IiksIHVuaXF1ZT1UUlVFKQpgYGAKCmBgYHtyIGV4cHJlc3Npb25zZXRzfQpzYW1wbGVfc2hlZXQgPC0gInNhbXBsZV9zaGVldHMvYWxsX3NhbXBsZXMueGxzeCIKZ2Vub21lX2V4cHQgPC0gY3JlYXRlX2V4cHQoc2FtcGxlX3NoZWV0LCBnZW5lX2luZm89YW5ub3QsIGZpbGVfY29sdW1uPSJjb3VudHRhYmxlIikKaW50ZXJfZXhwdCA8LSBjcmVhdGVfZXhwdChzYW1wbGVfc2hlZXQsIGdlbmVfaW5mbz1hbm5vdCwgZmlsZV9jb2x1bW49ImludGVyY291bnQiKQpgYGAKCiMjIERlY2lkZSB3aGF0IEkgd2FudCB0byBjb25zaWRlciBhcyBjb25kaXRpb24vYmF0Y2gvZXRjCgpgYGB7ciBzZXRfY29uZGl0aW9uc30KZ2Vub21lX2V4cHQgPC0gZ2Vub21lX2V4cHQgJT4lCiAgc2V0X2V4cHRfY29uZGl0aW9ucyhmYWN0PSJnZW5vdHlwZSIpICU+JQogIHNldF9leHB0X2JhdGNoZXMoZmFjdD0idGltZSIpCgppbnRlcl9leHB0IDwtIGludGVyX2V4cHQgJT4lCiAgc2V0X2V4cHRfY29uZGl0aW9ucyhmYWN0PSJnZW5vdHlwZSIpICU+JQogIHNldF9leHB0X2JhdGNoZXMoZmFjdD0idGltZSIpCmBgYAoKCiMjIEEgZmV3IGRlc2NyaXB0aXZlIHBsb3RzCgojIyMgQ0RTCgpgYGB7ciBjZHNfcGxvdHN9CmNkc19wbG90cyA8LSBncmFwaF9tZXRyaWNzKGdlbm9tZV9leHB0KQpjZHNfcGxvdHMkbGlic2l6ZQpjZHNfbm9ybSA8LSBub3JtYWxpemVfZXhwdChnZW5vbWVfZXhwdCwgZmlsdGVyPVRSVUUsIG5vcm09InF1YW50IiwgY29udmVydD0iY3BtIiwgdHJhbnNmb3JtPSJsb2cyIikKY2RzX25vcm1fcGxvdHMgPC0gZ3JhcGhfbWV0cmljcyhjZHNfbm9ybSkKY2RzX25vcm1fcGxvdHMkcGNfcGxvdAoKY2RzX2RlIDwtIGFsbF9wYWlyd2lzZShnZW5vbWVfZXhwdCwgZmlsdGVyPVRSVUUsIG1vZGVsX2JhdGNoPVRSVUUpCmNkc190YWJsZSA8LSBjb21iaW5lX2RlX3RhYmxlcyhjZHNfZGUsIGV4Y2VsPSJleGNlbC9jZHNfdGFibGUueGxzeCIpCmBgYAoKIyMjIEludGVyQ0RTCgpgYGB7ciBpbnRlcl9wbG90c30KaW50ZXJfcGxvdHMgPC0gZ3JhcGhfbWV0cmljcyhpbnRlcl9leHB0KQppbnRlcl9wbG90cyRsaWJzaXplCmludGVyX25vcm0gPC0gbm9ybWFsaXplX2V4cHQoaW50ZXJfZXhwdCwgZmlsdGVyPVRSVUUsIG5vcm09InF1YW50IiwgY29udmVydD0iY3BtIiwgdHJhbnNmb3JtPSJsb2cyIikKaW50ZXJfbm9ybV9wbG90cyA8LSBncmFwaF9tZXRyaWNzKGludGVyX25vcm0pCmludGVyX25vcm1fcGxvdHMkcGNfcGxvdAoKaW50ZXJfZGUgPC0gYWxsX3BhaXJ3aXNlKGludGVyX2V4cHQsIGZpbHRlcj1UUlVFLCBtb2RlbF9iYXRjaD1UUlVFKQppbnRlcl90YWJsZSA8LSBjb21iaW5lX2RlX3RhYmxlcyhpbnRlcl9kZSwgZXhjZWw9ImV4Y2VsL2ludGVyX3RhYmxlLnhsc3giKQpgYGAKCiMjIENvbXBhcmUgQ0RTL2ludGVyQ0RTCgpgYGB7ciBjb21wYXJlX2RlfQpjb21wIDwtIGNvbXBhcmVfZGVfcmVzdWx0cyhjZHNfdGFibGUsIGludGVyX3RhYmxlKQphcy5kYXRhLmZyYW1lKGNvbXAkcmVzdWx0KQpgYGAKClNpbWlsYXIsIGJ1dCBkZWZpbml0ZWx5IG5vdCB0aGUgc2FtZSwgc28gdGhlcmUgbWF5IGJlIHNvbWUgaW50ZXJlc3RpbmcgZGlmZmVyZW5jZXMuCgpgYGB7ciBzYXZlbWUsIGV2YWw9RkFMU0V9CnBhbmRlcjo6cGFuZGVyKHNlc3Npb25JbmZvKCkpCm1lc3NhZ2UocGFzdGUwKCJUaGlzIGlzIGhwZ2x0b29scyBjb21taXQ6ICIsIGdldF9naXRfY29tbWl0KCkpKQp0aGlzX3NhdmUgPC0gcGFzdGUwKGdzdWIocGF0dGVybj0iXFwuUm1kIiwgcmVwbGFjZT0iIiwgeD1ybWRfZmlsZSksICItdiIsIHZlciwgIi5yZGEueHoiKQptZXNzYWdlKHBhc3RlMCgiU2F2aW5nIHRvICIsIHRoaXNfc2F2ZSkpCnRtcCA8LSBzbShzYXZlbWUoZmlsZW5hbWU9dGhpc19zYXZlKSkKYGBgCg==