1 A fresh run of all proteomics tasks

I think I have finally worked out all (most?) of the kinks in the processing of the DIA (data-independent acquisition) data, so I want a fresh run of all the tasks required to interpret the results.

2 Annotation version: 20180611

2.1 Genome annotation input

2.1.1 Read a gff file

Most annotations of interest can be loaded directly from the GFF files used in the alignments. More in-depth information for the human transcriptome may be extracted from biomart.

## The old way of getting genome/annotation data
mtb_gff <- "reference/mycobacterium_tuberculosis_h37rv_2.gff.gz"

mtb_genome <- "reference/mtuberculosis_h37rv_genbank.fasta"
mtb_cds <- "reference/mtb_cds.fasta"

mtb_annotations <- sm(load_gff_annotations(mtb_gff, type="gene"))
colnames(mtb_annotations) <- gsub(pattern="\\.", replacement="", x=colnames(mtb_annotations))
mtb_annotations[["description"]] <- gsub(pattern="\\+", replacement=" ",
                                         x=mtb_annotations[["description"]])
mtb_annotations[["function"]] <- gsub(pattern="\\+", replacement=" ",
                                      x=mtb_annotations[["function"]])
rownames(mtb_annotations) <- mtb_annotations[["ID"]]
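The cleanup steps above can be tried without the reference files. The following is a minimal sketch in base R, assuming the GFF import yields a data frame with dotted column names and "+"-encoded text; the toy table, its column names, and its values are invented for illustration and are not taken from the real annotation.

```r
## Toy stand-in for the data frame returned by a GFF import; the column
## names and values here are invented for illustration.
toy_annot <- data.frame(
  "ID"=c("gene_a", "gene_b"),
  "locus.tag"=c("tag_1", "tag_2"),
  "description"=c("chromosomal+replication+initiator", "DNA+polymerase+III"),
  stringsAsFactors=FALSE)
## Remove the dots from the column names.
colnames(toy_annot) <- gsub(pattern="\\.", replacement="", x=colnames(toy_annot))
## The encoded text has "+" where spaces belong; swap them back.
toy_annot[["description"]] <- gsub(pattern="\\+", replacement=" ",
                                   x=toy_annot[["description"]])
## Index the table by gene ID so rows can be looked up by name.
rownames(toy_annot) <- toy_annot[["ID"]]
toy_annot["gene_b", "description"]
```

Both patterns escape their metacharacter ("\\." and "\\+"), since an unescaped "." would match every character and an unescaped "+" is a quantifier rather than a literal plus sign.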

2.1.2 Download from microbesonline

Apparently I queried microbesonline too often, and now I get an error whenever I try to use it. This disappoints me.

## First figure out the ID for the Mtb genome:
ids <- get_microbesonline_ids("37")
head(ids)
## Mycobacterium tuberculosis H37Rv is the first entry and has id: 83332
mtb_microbes <- load_microbesonline_annotations(ids=83332)

## I made a nifty function to do this stuff: load_uniprotws_annotations().
## It is slow, though.
mtb_uniprot_annot <- load_uniprotws_annotations()
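Since remote services such as these can refuse or fail a request, one option is to wrap each query so that a failure yields NULL rather than an error. This is only a sketch of that idea in base R: query_remote_annotations() and safe_query() are hypothetical names, and the stand-in query always fails on purpose; it is not how hpgltools handles the problem.

```r
## Hypothetical stand-in for a remote annotation query; it always fails
## here to imitate the service refusing the request.
query_remote_annotations <- function(id) {
  stop("remote service refused the request for id ", id)
}

## Return NULL instead of aborting when the query fails.
safe_query <- function(id) {
  tryCatch(
    query_remote_annotations(id),
    error=function(e) {
      message("Query failed: ", conditionMessage(e))
      NULL
    })
}

remote_annot <- safe_query(83332)
is.null(remote_annot)
```

Downstream code can then test is.null(remote_annot) and fall back to the GFF-derived annotations.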

2.2 Getting ontology data

mtb_go <- load_microbesonline_go(id=83332)

if (!isTRUE(get0("skip_load"))) {
  pander::pander(sessionInfo())
  message(paste0("This is hpgltools commit: ", get_git_commit()))
  this_save <- paste0(gsub(pattern="\\.Rmd", replacement="", x=rmd_file), "-v", ver, ".rda.xz")
  message(paste0("Saving to ", this_save))
  tmp <- sm(saveme(filename=this_save))
}
## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset f62f1ecc8572dec3d4dfd004ad6e7b48661234c0
## R> packrat::restore()
## This is hpgltools commit: Fri Jun 8 09:23:24 2018 -0400: f62f1ecc8572dec3d4dfd004ad6e7b48661234c0
## Saving to 01_annotation_20180611-v20180611.rda.xz
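The save file name printed above can be reproduced on its own. In this small sketch the values of ver and rmd_file are assumptions inferred from the "Saving to" line and the annotation version heading, and the pattern is anchored with "$" so that only a trailing ".Rmd" is stripped.

```r
## Assumed values, inferred from the report's version and output line.
ver <- "20180611"
rmd_file <- paste0("01_annotation_", ver, ".Rmd")
## Strip the .Rmd extension, then append the version and compression suffix.
this_save <- paste0(gsub(pattern="\\.Rmd$", replacement="", x=rmd_file),
                    "-v", ver, ".rda.xz")
this_save
```

Because the version appears in both the file stem and the suffix, each versioned report gets its own save file.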