Introduction

Overview

This file contains the code used to generate all of the tables and figures for the transcriptome analysis portion of this study. In addition to the figures and tables in the main text, several supplementary figures and tables are included.

To regenerate all of the results in this document, see the section “Reproducing this analysis” below.

Reproducing this analysis

This PDF was generated using the knitr and rmarkdown packages for R.

To reproduce the figures and results in this PDF, download a copy of the source .Rmd file along with the associated annotation files and count tables.

Using the Bioconductor biocLite() function, install the following dependencies:
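
What follows is a minimal sketch of this step. It assumes the required packages are the ones reported as attached in the “System Information” section at the bottom of this document. The editor helpers nvimcom and colorout are left out. Infrastructure packages such as IRanges, S4Vectors, BiocGenerics and rJava are expected to be installed automatically as dependencies. The deps vector is therefore an assumption and not the author's original list. The script reports which packages cannot be loaded. It installs them with biocLite() only if install_missing is set to TRUE.

```r
# ASSUMPTION: package list inferred from "other attached packages" in the
# System Information section; it is not the author's original dependency list.
deps <- c("knitr", "rmarkdown", "ggplot2", "limma", "preprocessCore", "goseq",
          "geneLenDataBase", "BiasedUrn", "GO.db", "AnnotationDbi", "Biobase",
          "genomeIntervals", "intervals", "dplyr", "gplots", "RColorBrewer",
          "viridis", "pander", "knitcitations", "venneuler")

# TRUE for every package whose namespace can be loaded from the local library.
is_installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- deps[!is_installed]
print(missing_pkgs)

# Set to TRUE to install the missing packages with the biocLite() installer
# that shipped with the R 3.3-era Bioconductor releases.
install_missing <- FALSE
if (install_missing && length(missing_pkgs) > 0) {
  source("https://bioconductor.org/biocLite.R")
  biocLite(missing_pkgs)
}
```

biocLite() belongs to the Bioconductor releases contemporary with R 3.3.2. Newer releases replace it with BiocManager::install().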

Next, open an R console in the directory containing XXX.Rmd and run:

```r
rmarkdown::render('XXX.Rmd', output_format='pdf_document')
```

Note that you can also generate an HTML version of the output by replacing pdf_document with html_document in the function call above.

If all of the dependencies specified above are properly installed, and the versions are not significantly different from those used at the time of writing this manuscript, then you should be able to regenerate all of the tables and figures in the PDF as they appear below.

To find out which specific package versions were used, refer to the “System Information” section at the bottom of this document.
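
The snippet below sketches that comparison. It checks a handful of the versions recorded in the “System Information” section against whatever is installed locally. The packages in this subset were chosen for illustration only. sessionInfo() prints the full report for the current session.

```r
# Versions copied from the System Information section (illustrative subset).
recorded <- c(limma = "3.30.4", goseq = "1.26.0", preprocessCore = "1.36.0",
              ggplot2 = "2.2.0", knitr = "1.15.1", rmarkdown = "1.2")

# Installed version as a string, or NA if the package cannot be loaded.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

installed <- vapply(names(recorded), installed_version, character(1))
version_check <- data.frame(
  package   = names(recorded),
  recorded  = unname(recorded),
  installed = unname(installed),
  stringsAsFactors = FALSE
)
# FALSE when the package is missing or its version differs from the record.
version_check$matches <- !is.na(version_check$installed) &
  version_check$installed == version_check$recorded
print(version_check)
```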

Figures and Tables

RNA-Seq samples

| HPGL_ID | Condition | Reads |
|---|---|---:|
| HPGL0130 | Mtb | 6,135,076 |
| HPGL0131 | Mtb | 5,624,817 |
| HPGL0132 | Mtb | 5,290,752 |
| HPGL0133 | MtbΔRv3167c | 6,201,775 |
| HPGL0134 | MtbΔRv3167c | 7,332,840 |
| HPGL0135 | MtbΔRv3167c | 4,581,716 |
| HPGL0136 | MtbΔRv3167c::Comp | 15,053,918 |
| HPGL0137 | MtbΔRv3167c::Comp | 16,388,081 |
| HPGL0138 | MtbΔRv3167c::Comp | 12,962,623 |

RNA-Seq sample counts. The number of RNA-Seq reads successfully mapped for each sample, before any normalization was applied.
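
Read totals differ considerably between conditions, which is one reason the counts are normalized before samples are compared. The sketch below uses only the numbers from the “RNA-Seq samples” table. It computes the ratio of the largest to the smallest library, which is about 3.6. It also computes the mean library size per condition: roughly 5.7 million mapped reads per Mtb sample, 6.0 million per MtbΔRv3167c sample, and 14.8 million per MtbΔRv3167c::Comp sample.

```r
# Mapped reads per sample, copied from the "RNA-Seq samples" table.
reads <- c(HPGL0130 = 6135076, HPGL0131 = 5624817, HPGL0132 = 5290752,
           HPGL0133 = 6201775, HPGL0134 = 7332840, HPGL0135 = 4581716,
           HPGL0136 = 15053918, HPGL0137 = 16388081, HPGL0138 = 12962623)
cond_levels <- c("Mtb", "Mtb\u0394Rv3167c", "Mtb\u0394Rv3167c::Comp")
condition <- factor(rep(cond_levels, each = 3), levels = cond_levels)

# Spread between the largest and the smallest library.
size_ratio <- max(reads) / min(reads)
print(round(size_ratio, 2))

# Mean library size per condition, in millions of mapped reads.
print(round(tapply(reads, condition, mean) / 1e6, 2))
```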

Sample PCA plot. PCA was used to examine the relationship between samples and conditions after count filtering and normalization were applied. The tight clustering of replicates within each condition suggests that each condition represents a distinct transcriptional state.
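
The following is a minimal PCA sketch. It assumes the common approach of applying prcomp() to log2-transformed normalized counts, with samples as rows and genes centred but not scaled. The filtering thresholds and transformation actually used for the figure are not stated here. The example also uses only the three genes whose normalized counts are tabulated in this document, so its coordinates will not match the plot.

```r
# Quantile-normalized counts for the three tabulated genes
# (rows = genes, columns = samples HPGL0130-HPGL0138).
norm_counts <- rbind(
  Rv3167c = c(134.7778, 143.2222, 114.2778, 374.4444, 76.500, 321.000,
              88543.889, 88543.889, 88543.889),
  Rv3168  = c(1483.0000, 1148.0000, 1193.8333, 8248.0000, 8473.667, 8384.667,
              2194.778, 1871.222, 2271.667),
  Rv3169  = c(1381.3333, 1080.7778, 1159.6667, 5491.5556, 5754.222, 5754.222,
              1111.333, 1019.222, 1150.389)
)
colnames(norm_counts) <- sprintf("HPGL%04d", 130:138)
cond_levels <- c("Mtb", "Mtb\u0394Rv3167c", "Mtb\u0394Rv3167c::Comp")
condition <- factor(rep(cond_levels, each = 3), levels = cond_levels)

# Samples become rows; log2(x + 1) damps the influence of very highly
# expressed genes, and prcomp() centres each gene before decomposition.
pca <- prcomp(t(log2(norm_counts + 1)), center = TRUE, scale. = FALSE)

# Share of the total variance captured by each principal component.
print(summary(pca)$importance["Proportion of Variance", ])

# Per-sample coordinates on the first two components, labelled by condition.
scores <- data.frame(pca$x[, 1:2], condition = condition)
print(scores)
```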

Sample heatmap. Pearson correlation was used to measure the similarity between each pair of RNA-Seq samples, and biclustering was applied to generate a heatmap depicting the relationships between samples.
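
The sample heatmap can be sketched in base R as shown below. The sketch assumes Pearson correlation on log2-transformed normalized counts. It also assumes average-linkage hierarchical clustering, with 1 minus the correlation as the distance. Both clustering settings are assumptions, and the input is limited to the three tabulated genes, so the picture is only illustrative.

```r
# Quantile-normalized counts for the three tabulated genes (genes x samples).
norm_counts <- rbind(
  Rv3167c = c(134.7778, 143.2222, 114.2778, 374.4444, 76.500, 321.000,
              88543.889, 88543.889, 88543.889),
  Rv3168  = c(1483.0000, 1148.0000, 1193.8333, 8248.0000, 8473.667, 8384.667,
              2194.778, 1871.222, 2271.667),
  Rv3169  = c(1381.3333, 1080.7778, 1159.6667, 5491.5556, 5754.222, 5754.222,
              1111.333, 1019.222, 1150.389)
)
colnames(norm_counts) <- sprintf("HPGL%04d", 130:138)

# Pearson correlation between every pair of samples (columns).
sample_cor <- cor(log2(norm_counts + 1), method = "pearson")

# Hierarchical clustering of samples with 1 - correlation as the distance.
hc <- hclust(as.dist(1 - sample_cor), method = "average")

# Correlation heatmap with the same dendrogram on both axes.
dend <- as.dendrogram(hc)
heatmap(sample_cor, Rowv = dend, Colv = dend, symm = TRUE, scale = "none")
```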

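Raw read counts (Mtb)

| Gene | HPGL0130 | HPGL0131 | HPGL0132 |
|---|---:|---:|---:|
| Rv3167c | 89 | 82 | 80 |
| Rv3168 | 1014 | 686 | 742 |
| Rv3169 | 952 | 645 | 723 |

Raw read counts (MtbΔRv3167c)

| Gene | HPGL0133 | HPGL0134 | HPGL0135 |
|---|---:|---:|---:|
| Rv3167c | 263 | 69 | 185 |
| Rv3168 | 6052 | 7508 | 4571 |
| Rv3169 | 3944 | 5066 | 3179 |

Raw read counts (MtbΔRv3167c::Comp)

| Gene | HPGL0136 | HPGL0137 | HPGL0138 |
|---|---:|---:|---:|
| Rv3167c | 144585 | 198535 | 134943 |
| Rv3168 | 3494 | 3377 | 3364 |
| Rv3169 | 1800 | 1874 | 1715 |
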
Normalized read counts (Mtb)

| Gene | HPGL0130 | HPGL0131 | HPGL0132 |
|---|---:|---:|---:|
| Rv3167c | 134.7778 | 143.2222 | 114.2778 |
| Rv3168 | 1483.0000 | 1148.0000 | 1193.8333 |
| Rv3169 | 1381.3333 | 1080.7778 | 1159.6667 |

Normalized read counts (MtbΔRv3167c)

| Gene | HPGL0133 | HPGL0134 | HPGL0135 |
|---|---:|---:|---:|
| Rv3167c | 374.4444 | 76.500 | 321.000 |
| Rv3168 | 8248.0000 | 8473.667 | 8384.667 |
| Rv3169 | 5491.5556 | 5754.222 | 5754.222 |

Normalized read counts (MtbΔRv3167c::Comp)

| Gene | HPGL0136 | HPGL0137 | HPGL0138 |
|---|---:|---:|---:|
| Rv3167c | 88543.889 | 88543.889 | 88543.889 |
| Rv3168 | 2194.778 | 1871.222 | 2271.667 |
| Rv3169 | 1111.333 | 1019.222 | 1150.389 |

Normalized expression values (Rv3168)

| Condition | Mean | SD |
|---|---:|---:|
| Mtb | 1274.94 | 181.63 |
| MtbΔRv3167c | 8368.78 | 113.67 |
| MtbΔRv3167c::Comp | 2112.56 | 212.51 |

Normalized expression values (Rv3169)

| Condition | Mean | SD |
|---|---:|---:|
| Mtb | 1207.259 | 155.82747 |
| MtbΔRv3167c | 5666.667 | 151.65067 |
| MtbΔRv3167c::Comp | 1093.648 | 67.34796 |

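
The mean and SD columns match the ordinary mean and sample standard deviation (n - 1 denominator) of the three replicate values in the normalized count tables. The sketch below recomputes them from those values, and the results agree with the tables to within rounding.

```r
# Replicate values copied from the normalized count tables.
cond_levels <- c("Mtb", "Mtb\u0394Rv3167c", "Mtb\u0394Rv3167c::Comp")
norm_values <- data.frame(
  gene = rep(c("Rv3168", "Rv3169"), each = 9),
  condition = factor(rep(rep(cond_levels, each = 3), times = 2),
                     levels = cond_levels),
  value = c(1483.0000, 1148.0000, 1193.8333, 8248.0000, 8473.667, 8384.667,
            2194.778, 1871.222, 2271.667,
            1381.3333, 1080.7778, 1159.6667, 5491.5556, 5754.222, 5754.222,
            1111.333, 1019.222, 1150.389),
  stringsAsFactors = FALSE
)

# Mean and sample standard deviation per condition (rows) and gene (columns).
means <- with(norm_values, tapply(value, list(condition, gene), mean))
sds   <- with(norm_values, tapply(value, list(condition, gene), sd))
print(round(means, 2))
print(round(sds, 2))
```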
Rv3168 expression. For each of the three experimental conditions, mRNA was collected and sequenced in triplicate. The mean quantile-normalized mRNA count for Rv3168 is shown for each condition.
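
Quantile normalization forces every sample onto the same distribution. The reference value for the k-th smallest count is the mean, across samples, of each sample's k-th smallest count. Each count is then replaced by the reference value for its rank within its own sample. The sketch below applies this to the raw counts of the three tabulated genes. It is not necessarily the implementation used here; the preprocessCore package listed under System Information provides normalize.quantiles(), which may treat ties differently. Because the reference is built from three genes rather than the full gene set, the output will not match the normalized tables. It does show why the three MtbΔRv3167c::Comp values for Rv3167c are identical: each is the largest count in its sample, so each receives the same reference value.

```r
# Raw counts copied from the raw read count tables (genes x samples).
raw_counts <- rbind(
  Rv3167c = c(89, 82, 80, 263, 69, 185, 144585, 198535, 134943),
  Rv3168  = c(1014, 686, 742, 6052, 7508, 4571, 3494, 3377, 3364),
  Rv3169  = c(952, 645, 723, 3944, 5066, 3179, 1800, 1874, 1715)
)
colnames(raw_counts) <- sprintf("HPGL%04d", 130:138)

quantile_normalize <- function(x) {
  # Reference distribution: mean of the k-th smallest value across samples.
  reference <- rowMeans(apply(x, 2, sort))
  # Rank of each value within its sample (ties broken by order of appearance).
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Replace each value with the reference value for its rank.
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(x)
  out
}

qn <- quantile_normalize(raw_counts)
print(round(qn, 1))
```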

Rv3169 expression. For each of the three experimental conditions, mRNA was collected and sequenced in triplicate. The mean quantile-normalized mRNA count for Rv3169 is shown for each condition.

Expression of genes in the PDIM operon. Biclustering heatmap showing the normalized expression level of each gene in the PDIM operon across all samples.

Mtb vs. MtbΔRv3167c. Limma was used to detect genes that were differentially expressed between the Mtb and MtbΔRv3167c samples; 1407 genes were identified as differentially expressed. In the figure, each point represents a single gene, and red points indicate differentially expressed genes.
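
The exact limma model and significance cutoffs are not given in this document. To show the shape of a two-group comparison, the sketch below swaps in a simpler technique. It runs an ordinary Welch t-test per gene on log2 normalized counts, followed by Benjamini-Hochberg adjustment, on the three tabulated genes. limma's moderated t-test instead moderates each gene's variance using information from all genes, so its p-values would differ. The column names logFC, P.Value and adj.P.Val mirror those in limma's topTable() output.

```r
cond_levels <- c("Mtb", "Mtb\u0394Rv3167c")
group <- factor(rep(cond_levels, each = 3), levels = cond_levels)

# log2(x + 1) of the normalized counts for HPGL0130-HPGL0135.
log_counts <- log2(rbind(
  Rv3167c = c(134.7778, 143.2222, 114.2778, 374.4444, 76.500, 321.000),
  Rv3168  = c(1483.0000, 1148.0000, 1193.8333, 8248.0000, 8473.667, 8384.667),
  Rv3169  = c(1381.3333, 1080.7778, 1159.6667, 5491.5556, 5754.222, 5754.222)
) + 1)

# Welch t-test per gene: a simplified stand-in, NOT limma's moderated t-test.
per_gene <- function(y) {
  mutant    <- y[group == cond_levels[2]]
  wild_type <- y[group == cond_levels[1]]
  c(logFC   = mean(mutant) - mean(wild_type),
    P.Value = t.test(mutant, wild_type)$p.value)
}
results <- as.data.frame(t(apply(log_counts, 1, per_gene)))

# Benjamini-Hochberg adjustment across the genes tested.
results$adj.P.Val <- p.adjust(results$P.Value, method = "BH")
print(results)
```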

MtbΔRv3167c vs. MtbΔRv3167c::Comp. Limma was used to detect genes that were differentially expressed between the MtbΔRv3167c and MtbΔRv3167c::Comp samples; 929 genes were identified as differentially expressed. In the figure, each point represents a single gene, and red points indicate differentially expressed genes.

Mtb vs. MtbΔRv3167c::Comp. Limma was used to detect genes that were differentially expressed between the Mtb and MtbΔRv3167c::Comp samples; 1404 genes were identified as differentially expressed. In the figure, each point represents a single gene, and red points indicate differentially expressed genes.

Determination of putative Rv3167c-regulated genes. To identify genes that are potentially regulated, directly or indirectly, by Rv3167c, mRNA sequencing was performed on three replicates each of Mtb, MtbΔRv3167c, and MtbΔRv3167c::Comp. Candidate regulated genes were required to be differentially expressed in both the Mtb vs. MtbΔRv3167c contrast and the MtbΔRv3167c vs. MtbΔRv3167c::Comp contrast.
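
This selection rule is a set intersection of the genes called significant in each of the two contrasts. The minimal sketch below illustrates it. The gene IDs, statistics and the 0.05 cutoff are all hypothetical placeholders, laid out like limma's topTable() output.

```r
# HYPOTHETICAL result tables for the two contrasts (placeholder values only).
wt_vs_mut <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD"),
  logFC = c(2.1, -1.4, 0.3, 1.8),
  adj.P.Val = c(0.001, 0.010, 0.600, 0.020),
  stringsAsFactors = FALSE
)
mut_vs_comp <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD"),
  logFC = c(-1.9, 1.2, 0.1, 0.2),
  adj.P.Val = c(0.003, 0.040, 0.800, 0.700),
  stringsAsFactors = FALSE
)
alpha <- 0.05  # hypothetical significance cutoff

# Genes passing the cutoff in a single contrast.
de_genes <- function(tab, cutoff) tab$gene[tab$adj.P.Val < cutoff]

# Candidates must be significant in BOTH contrasts.
candidates <- intersect(de_genes(wt_vs_mut, alpha), de_genes(mut_vs_comp, alpha))
print(candidates)
```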

Enriched GO terms (Mtb vs. MtbΔRv3167c)

| category | TERM | numDEInCat | numInCat | over_pval_adj |
|---|---|---:|---:|---:|
| GO:0005618 | cell wall | 295 | 659 | 0.0134636 |
| GO:0071770 | DIM/DIP cell wall layer assembly | 9 | 9 | 0.0348816 |

Enriched GO terms (MtbΔRv3167c-deregulated)

| category | TERM | numDEInCat | numInCat | over_pval_adj |
|---|---|---:|---:|---:|
| GO:0071770 | DIM/DIP cell wall layer assembly | 9 | 9 | 0.0000054 |
| GO:0005618 | cell wall | 123 | 659 | 0.0010785 |
| GO:0006633 | fatty acid biosynthetic process | 6 | 7 | 0.0039282 |

Enriched GO terms (MtbΔRv3167c-deregulated, upregulated genes only)

| category | TERM | numDEInCat | numInCat | over_pval_adj |
|---|---|---:|---:|---:|
| GO:0005618 | cell wall | 54 | 659 | 0.010955 |

Depleted GO terms (MtbΔRv3167c-deregulated, downregulated genes only)

| category | TERM | numDEInCat | numInCat | over_pval_adj |
|---|---|---:|---:|---:|
| GO:0071770 | DIM/DIP cell wall layer assembly | 9 | 9 | 0.0000001 |
| GO:0006633 | fatty acid biosynthetic process | 6 | 7 | 0.0004165 |

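
In each GO table, numDEInCat is the number of a category's genes that are differentially expressed, and numInCat is the number of genes annotated to that category. goseq, listed under System Information, tests this over-representation while correcting for gene-length bias. The _adj suffix on over_pval_adj suggests the p-values were also adjusted for multiple testing. The sketch below is a simplified stand-in that ignores length bias: a one-sided hypergeometric test for GO:0071770 in the Mtb vs. MtbΔRv3167c contrast. The total number of genes tested is not given in this document, so the value of 4000 is a hypothetical placeholder. The resulting p-value is not expected to match the table.

```r
# Simplified over-representation test (hypergeometric), NOT goseq's method.
n_genes  <- 4000  # HYPOTHETICAL number of genes tested (assumption)
n_de     <- 1407  # DE genes, Mtb vs. MtbΔRv3167c (from the text)
n_in_cat <- 9     # numInCat for GO:0071770 (from the table)
n_de_cat <- 9     # numDEInCat for GO:0071770 (from the table)

# P(X >= n_de_cat) when n_de genes are drawn without replacement from
# n_in_cat category genes and (n_genes - n_in_cat) other genes.
p_over <- phyper(n_de_cat - 1, n_in_cat, n_genes - n_in_cat, n_de,
                 lower.tail = FALSE)
print(p_over)
```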


System Information

R version 3.3.2 (2016-10-31)

**Platform:** x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=en_US.UTF-8, LC_ADDRESS=en_US.UTF-8, LC_TELEPHONE=en_US.UTF-8, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=en_US.UTF-8

attached base packages: parallel, stats4, tools, stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: pander(v.0.6.0), viridis(v.0.3.4), knitcitations(v.1.0.7), venneuler(v.1.1-0), rJava(v.0.9-8), preprocessCore(v.1.36.0), limma(v.3.30.4), dplyr(v.0.5.0), genomeIntervals(v.1.30.0), intervals(v.0.15.1), RColorBrewer(v.1.1-2), GO.db(v.3.4.0), AnnotationDbi(v.1.36.0), IRanges(v.2.8.1), S4Vectors(v.0.12.0), Biobase(v.2.34.0), BiocGenerics(v.0.20.0), gplots(v.3.0.1), goseq(v.1.26.0), geneLenDataBase(v.1.10.0), BiasedUrn(v.1.07), ggplot2(v.2.2.0), knitr(v.1.15.1), rmarkdown(v.1.2), nvimcom(v.0.9-25) and colorout(v.1.1-0)

loaded via a namespace (and not attached): httr(v.1.2.1), gtools(v.3.5.0), assertthat(v.0.1), highr(v.0.6), Rsamtools(v.1.26.1), yaml(v.2.1.14), RSQLite(v.1.0.0), backports(v.1.0.4), lattice(v.0.20-34), digest(v.0.6.10), GenomicRanges(v.1.26.1), XVector(v.0.14.0), RefManageR(v.0.13.1), colorspace(v.1.3-1), htmltools(v.0.3.5), Matrix(v.1.2-7.1), plyr(v.1.8.4), XML(v.3.98-1.5), bibtex(v.0.4.0), biomaRt(v.2.30.0), zlibbioc(v.1.20.0), scales(v.0.4.1), gdata(v.2.17.0), BiocParallel(v.1.8.1), tibble(v.1.2), mgcv(v.1.8-16), SummarizedExperiment(v.1.4.0), GenomicFeatures(v.1.26.0), lazyeval(v.0.2.0.9000), RJSONIO(v.1.3-0), magrittr(v.1.5), evaluate(v.0.10), nlme(v.3.1-128), stringr(v.1.1.0), munsell(v.0.4.3), Biostrings(v.2.42.0), GenomeInfoDb(v.1.10.1), caTools(v.1.17.1), grid(v.3.3.2), RCurl(v.1.95-4.8), labeling(v.0.3), bitops(v.1.0-6), gtable(v.0.2.0), DBI(v.0.5-1), R6(v.2.2.0), GenomicAlignments(v.1.10.0), gridExtra(v.2.2.1), lubridate(v.1.6.0), rtracklayer(v.1.34.1), rprojroot(v.1.1), KernSmooth(v.2.23-15), stringi(v.1.1.2) and Rcpp(v.0.12.8)