1 How many genes of what length?

We always arbitrarily say the average gene is 1.5 kb…

Pull some code from annotation.Rmd and check that assumption.

## The species being downloaded is: Pseudomonas aeruginosa UCBPP-PA14
locusId | accession | GI | scaffoldId | start | stop | strand | sysName | name | desc | COG | COGFun | COGDesc | TIGRFam | TIGRRoles | GO | EC | ECDesc
2194572 | YP_788156.1 | 116053721 | 4582 | 483 | 2027 | + | PA14_00010 | dnaA | chromosomal replication initiator protein DnaA (NCBI) | COG593 | L | ATPase involved in DNA replication initiation | TIGR00362 chromosomal replication initiator protein DnaA [dnaA] | DNA metabolism:DNA replication, recombination, and repair | GO:0006270,GO:0006275,GO:0003688,GO:0017111,GO:0005524 | |
2194573 | YP_788157.1 | 116053722 | 4582 | 2056 | 3159 | + | PA14_00020 | dnaN | DNA polymerase III, beta chain (NCBI) | COG592 | L | DNA polymerase sliding clamp subunit (PCNA homolog) | TIGR00663 DNA polymerase III, beta subunit [dnaN] | DNA metabolism:DNA replication, recombination, and repair | GO:0006260,GO:0003677,GO:0003893,GO:0008408,GO:0016449,GO:0019984,GO:0003889,GO:0003894,GO:0015999,GO:0016450,GO:0003890,GO:0003895,GO:0016000,GO:0016451,GO:0003891,GO:0016448,GO:0016452 | 2.7.7.7 | DNA-directed DNA polymerase.
2194574 | YP_788158.1 | 116053723 | 4582 | 3169 | 4278 | + | PA14_00030 | recF | DNA replication and repair protein RecF (NCBI) | COG1195 | L | Recombinational DNA repair ATPase (RecF pathway) | TIGR00611 DNA replication and repair protein RecF [recF] | DNA metabolism:DNA replication, recombination, and repair | GO:0006281,GO:0005694,GO:0005524,GO:0017111,GO:0003697 | |
2194575 | YP_788159.1 | 116053724 | 4582 | 4275 | 6695 | + | PA14_00050 | gyrB | DNA gyrase subunit B (NCBI) | COG187 | L | Type IIA topoisomerase (DNA gyrase/topo II, topoisomerase IV), B subunit | TIGR01059 DNA gyrase, B subunit [gyrB] | DNA metabolism:DNA replication, recombination, and repair | GO:0006304,GO:0006265,GO:0005694,GO:0003918,GO:0005524 | 5.99.1.3 | DNA topoisomerase (ATP-hydrolyzing).
2194576 | YP_788160.1 | 116053725 | 4582 | 7791 | 7018 | - | PA14_00060 | PA14_00060 | putative acyltransferase (NCBI) | COG204 | I | 1-acyl-sn-glycerol-3-phosphate acyltransferase | | | GO:0008152,GO:0003841 | 2.3.1.51 | 1-acylglycerol-3-phosphate O-acyltransferase.
2194577 | YP_788161.1 | 116053726 | 4582 | 8339 | 7803 | - | PA14_00070 | PA14_00070 | putative histidinol-phosphatase (NCBI) | COG241 | E | Histidinol phosphatase and related phosphatases | TIGR01656 histidinol-phosphate phosphatase domain,TIGR01662 HAD hydrolase, family IIIA | Unknown function:Enzymes of unknown specificity | GO:0000105,GO:0004401 | 3.1.3.- |
## [1] "GRanges object with 5890 ranges and 20 metadata columns"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      72     525     858     986    1254   15639
## [1] 6537648
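
As a quick check of the 1.5 kb habit, here is a small sketch using only the numbers printed above (5890 ranges, mean width 986 nt, genome length 6537648 nt). The variable names are mine, and the total is approximate because it uses the rounded mean and ignores any overlapping cds.

```r
## Numbers copied from the output above.
n_cds <- 5890          # ranges in the GRanges object
mean_cds_nt <- 986     # mean cds width from summary()
genome_nt <- 6537648   # genome length

## Approximate total cds sequence and the fraction of the genome it covers.
total_cds_nt <- n_cds * mean_cds_nt
coding_fraction <- total_cds_nt / genome_nt

## How far off is the usual 1.5 kb guess relative to the observed mean?
assumed_nt <- 1500
ratio_assumed_to_mean <- assumed_nt / mean_cds_nt

round(coding_fraction, 3)
round(ratio_assumed_to_mean, 2)
```

So by this rough measure a bit under 90% of the genome is cds, and 1.5 kb overestimates the mean cds length by roughly half.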

2 Some numbers from Illumina

I would like to back these up with some numbers from Yoann’s RNASeq. But until I do, the ‘HiSeq_2500_1500_Sequencing_systems_specsheet.pdf’ Najib sent me shows the following:

  • 2 x 50 nt reads: 135-150 Gb in normal mode. (Call it 65 Gb for 50 nt single reads.)
  • 2 x 50 nt reads: 25-30 Gb in rapid-run mode. (Call it 12 Gb for 50 nt single reads.)

  • ‘Up to 1.5 billion single reads passing quality filters.’
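
To compare the Gb figures with the read count, a minimal sketch converting yield to reads. `gb_to_reads` is a hypothetical helper of mine, and the 65 Gb and 12 Gb inputs are the rounded single-read guesses from the bullets above, not spec-sheet values.

```r
## Convert a yield in gigabases into a number of reads of a given length.
gb_to_reads <- function(gb, read_nt) {
  (gb * 1e9) / read_nt
}

normal_reads <- gb_to_reads(65, 50)   # "call it 65 Gb" of 50 nt single reads
rapid_reads <- gb_to_reads(12, 50)    # "call it 12 Gb" of 50 nt single reads

format(c(normal = normal_reads, rapid = rapid_reads), scientific = TRUE)
normal_reads / rapid_reads            # ratio between the two modes
```

The 65 Gb guess works out to 1.3 billion reads, which sits comfortably under the quoted "up to 1.5 billion".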

So let us assume that the genome is completely and equally expressed.

We therefore expect a constant number of reads per nt: the total number of reads divided by the number of nucleotides.

The following assumes a normal-mode run; divide all numbers by 5 for a rapid run.

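## Reads / genomic nucleotide: 1.5E9 / 6537648
## [1] 229.4
## Nucleotides sequenced / genomic nucleotide: (1.5E9 / 6537648) * 50
## [1] 11472
## Reads / cds: 1.5E9 / 6069
## [1] 247158
## Reads / cds nucleotide: (1.5E9 / 6069) / 971.3
## [1] 254.5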
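
The same arithmetic can be wrapped in a function so the rapid-run numbers come along for free. This is only a sketch: `expected_coverage` and its argument names are mine. Note that the printed values are reproduced by 6069 cds with a mean of 971.3 nt, which differs a little from the 5890 ranges and 986 nt mean in the summary further up; I am guessing these came from a different annotation version.

```r
## Expected coverage if every nucleotide were expressed equally.
expected_coverage <- function(reads, genome_nt, n_cds, mean_cds_nt, read_nt = 50) {
  reads_per_cds <- reads / n_cds
  c(reads_per_genome_nt = reads / genome_nt,
    nt_sequenced_per_genome_nt = (reads / genome_nt) * read_nt,
    reads_per_cds = reads_per_cds,
    reads_per_cds_nt = reads_per_cds / mean_cds_nt)
}

## Normal mode: 1.5 billion 50 nt single reads.
normal <- expected_coverage(reads = 1.5e9, genome_nt = 6537648,
                            n_cds = 6069, mean_cds_nt = 971.3)

## Rapid run: the rough "divide by 5" scaling from the text.
rapid <- normal / 5

signif(rbind(normal, rapid), 4)
```

Swapping in `n_cds = 5890` and `mean_cds_nt = 986` changes the per-cds numbers only slightly.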

3 Using Yoann’s RNASeq for a concrete example.

Caveat: this genome is only 1.8E6 nt rather than your 6.5E6.

Also, let's jump straight to the number of reads aligned to cds sequences.

These were all sequenced in a single run (I do not know whether rapid or regular) as 100 nt single reads. Note that I am only counting reads which mapped successfully to the cds, including reads which map to more than 1 place (these are assigned randomly to one location, so each is counted only once).

Sample: Total counts to cds with strict mapping parameters (1,814 cds entries)

ids reads
hpgl0531 5729423
hpgl0532 6032465
hpgl0533 5554651
hpgl0534 5946316
hpgl0535 5711986
hpgl0536 6451776
hpgl0537 3356330
hpgl0538 4022852
hpgl0539 5203054
hpgl0540 5655144
hpgl0541 5324956
hpgl0542 5953598
hpgl0543 5556916
hpgl0544 5203653
hpgl0545 5076034
hpgl0546 5475249
hpgl0547 6150507
hpgl0548 4712725
hpgl0549 5176689
hpgl0550 5788482
hpgl0551 6052472
hpgl0552 5555723
hpgl0553 5655540
hpgl0554 6403623
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 3356330 5203503 5606030 5489590 5948136 6451776
## [1] "1.318e+08"
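
To put these totals on the same footing as the reads-per-cds estimate above, a small sketch dividing the summary values by the 1,814 cds entries. This again assumes equal expression across cds, which is certainly not true; the names are mine.

```r
## Per-sample read counts, copied from the summary() output above.
reads_summary <- c(min = 3356330, median = 5606030, mean = 5489590, max = 6451776)
n_cds <- 1814   # cds entries in this genome

## Average reads per cds for the smallest, typical, and largest samples.
reads_per_cds <- reads_summary / n_cds
round(reads_per_cds)
```

That is roughly 1,900 to 3,600 reads per cds per sample, with a typical sample near 3,000.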

So I think we can assume that they loaded 1 lane of a normal 8-lane run?
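
The reasoning behind that guess, as a sketch: split the spec-sheet maximum evenly over 8 lanes and compare it with the total mapped to cds. The even split is my assumption, and the mapped total excludes unmapped and non-cds reads, so the true yield of the run was higher.

```r
reads_per_flowcell <- 1.5e9   # "up to 1.5 billion single reads" from the spec sheet
lanes <- 8
reads_per_lane <- reads_per_flowcell / lanes

mapped_reads <- 131750164     # sum of the per-sample counts (1.318e+08)
fraction_of_lane <- mapped_reads / reads_per_lane

reads_per_lane
round(fraction_of_lane, 2)
```

The mapped reads come to about 70% of one nominal lane, which is at least consistent with a single lane of a normal run.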

But the most interesting piece of information, to my mind: the number of reads per sample varied by about a factor of 2 (from about 3.4E6 to 6.5E6).
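
A quick sketch of where that factor comes from, using the per-sample counts from the table. The variable names are mine.

```r
## Per-sample counts copied from the table above (hpgl0531 to hpgl0554).
reads <- c(5729423, 6032465, 5554651, 5946316, 5711986, 6451776, 3356330, 4022852,
           5203054, 5655144, 5324956, 5953598, 5556916, 5203653, 5076034, 5475249,
           6150507, 4712725, 5176689, 5788482, 6052472, 5555723, 5655540, 6403623)

## Largest sample relative to the smallest.
fold_range <- max(reads) / min(reads)

## The same ratio after dropping the two smallest samples.
fold_without_two_lowest <- max(reads) / sort(reads)[3]

round(c(all = fold_range, without_two_lowest = fold_without_two_lowest), 2)
```

Most of the spread comes from the two smallest samples (hpgl0537 and hpgl0538); without them the range is closer to 1.4-fold.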
