A quick analysis of some small RNAs from Pseudomonas aeruginosa

I hope this document makes it easy for anyone to redo or improve what I did. The logic has been split into the following pieces:

  1. Preprocessing: Take the raw reads, remove sequencing adapters, count n-mers, and put them into a usable format.
  2. Annotation: Download/extract the P. aeruginosa genome and gene annotations.
  3. 2mers: Perform a simplistic differential-expression-like analysis for 2mers.
  4. 3mers: Perform a simplistic differential-expression-like analysis for 3mers.
  5. 4mers: Perform a simplistic differential-expression-like analysis for 4mers.
  6. 5mers: Perform a simplistic differential-expression-like analysis for 5mers.
  7. 6mers: Perform a simplistic differential-expression-like analysis for 6mers.
  8. 7mers: Perform a simplistic differential-expression-like analysis for 7mers.
  9. 8mers: Perform a simplistic differential-expression-like analysis for 8mers.
  10. 9mers: Perform a simplistic differential-expression-like analysis for 9mers.
  11. 10mers: Perform a simplistic differential-expression-like analysis for 10mers.
  12. 11mers: Perform a simplistic differential-expression-like analysis for 11mers.
  13. 12mers: Perform a simplistic differential-expression-like analysis for 12mers.
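
As a rough illustration of the counting and comparison steps listed above, here is a minimal sketch in Python. It is not the R code this analysis actually uses. It assumes that an "n-mer" means an adapter-trimmed read of exactly n nucleotides (if substrings of longer reads were meant, a sliding window would be needed instead), and that the "differential-expression-like" comparison can be approximated by a log2 ratio of counts per million. The function names `count_nmers` and `log2_ratios`, the pseudocount, and the example reads are all hypothetical.

```python
from collections import Counter
import math


def count_nmers(reads, n):
    """Tally trimmed reads that are exactly n nucleotides long, keyed by sequence.

    Assumption: an n-mer is a whole read of length n, not a substring.
    """
    return Counter(read.upper() for read in reads if len(read) == n)


def log2_ratios(counts_a, counts_b, pseudocount=1.0):
    """Return log2(sample B / sample A) per sequence, using counts per million.

    The pseudocount (an assumed value) avoids dividing by zero when a
    sequence is absent from one of the samples.
    """
    total_a = sum(counts_a.values()) or 1
    total_b = sum(counts_b.values()) or 1
    ratios = {}
    for seq in sorted(set(counts_a) | set(counts_b)):
        cpm_a = counts_a.get(seq, 0) / total_a * 1e6
        cpm_b = counts_b.get(seq, 0) / total_b * 1e6
        ratios[seq] = math.log2((cpm_b + pseudocount) / (cpm_a + pseudocount))
    return ratios


# Made-up, already adapter-trimmed reads from two hypothetical samples.
control = ["AG", "AG", "GC", "AGC", "ACGT"]
treated = ["AG", "GC", "GC", "GC", "AGCT"]

# Repeat the same comparison for each read length of interest.
for n in range(2, 5):
    ratios = log2_ratios(count_nmers(control, n), count_nmers(treated, n))
    for seq, value in ratios.items():
        print(f"{n}mer {seq}: log2 ratio {value:+.2f}")
```

A real analysis would use replicates and a proper statistical model rather than a bare ratio; the loop only shows how the same steps can be repeated per length.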

So far I have only extended the 2…n analysis to lengths of 2-10 nucleotides. This is being done via a small Makefile which invokes the following:

R version 3.5.1 (2018-07-02)

Platform: x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=en_US.utf8, LC_NUMERIC=C, LC_TIME=en_US.utf8, LC_COLLATE=en_US.utf8, LC_MONETARY=en_US.utf8, LC_MESSAGES=en_US.utf8, LC_PAPER=en_US.utf8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.utf8 and LC_IDENTIFICATION=C

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: pander(v.0.6.2) and hpgltools(v.2018.03)

loaded via a namespace (and not attached): Rcpp(v.0.12.17), compiler(v.3.5.1), pillar(v.1.3.0), plyr(v.1.8.4), bindr(v.0.1.1), iterators(v.1.0.10), tools(v.3.5.1), digest(v.0.6.15), evaluate(v.0.11), tibble(v.1.4.2), gtable(v.0.2.0), pkgconfig(v.2.0.1), rlang(v.0.2.1), foreach(v.1.4.4), yaml(v.2.1.19), parallel(v.3.5.1), bindrcpp(v.0.2.2), stringr(v.1.3.1), dplyr(v.0.7.6), knitr(v.1.20), rprojroot(v.1.3-2), grid(v.3.5.1), tidyselect(v.0.2.4), glue(v.1.3.0), data.table(v.1.11.4), Biobase(v.2.40.0), R6(v.2.2.2), rmarkdown(v.1.10), ggplot2(v.3.0.0), purrr(v.0.2.5), magrittr(v.1.5), backports(v.1.1.2), scales(v.0.5.0), codetools(v.0.2-15), htmltools(v.0.3.6), BiocGenerics(v.0.26.0), assertthat(v.0.2.0), colorspace(v.1.3-2), stringi(v.1.2.3), lazyeval(v.0.2.1), munsell(v.0.5.0) and crayon(v.1.3.4)