It looks to me, that despite the oddities in processing the raw reads, there is nice coverage and some obviously essential genes. The next question: did any change status as more calprotectin was added?
The ID for strain a909 at microbesonline.org is: 205921.
Let us load up annotations from my gff file along with the microbesonline.
As a heads up, the count tables are using IDs which look like: SAK_RS00185. This appears to be the ‘sysName’ column from microbesonline and the locus_tag column from the gff annotations. In addition, there are a bunch of unused columns in both data sets which we likely want to prune.
Ahh, that is incorrect, the microbesonline ‘sysName’ is the same as ‘old_locus_tag’ column.
## The species being downloaded is: Streptococcus agalactiae A909
## Downloading: http://www.microbesonline.org/cgi-bin/genomeInfo.cgi?tId=205921;export=tab
## Trying attempt: rtracklayer::import.gff3(gff, sequenceRegionsAsSeqinfo=TRUE)
## Trying attempt: rtracklayer::import.gff3(gff, sequenceRegionsAsSeqinfo=FALSE)
## Had a successful gff import with rtracklayer::import.gff3(gff, sequenceRegionsAsSeqinfo=FALSE)
## Returning a df with 34 columns and 4426 rows.
a909_microbes <- as.data.frame(a909_microbes)
rownames(a909_gff) <- make.names(a909_gff[["locus_tag"]], unique=TRUE)
## I am going to only pay attention to the first annotation for each locus tag from microbesonline.
a909_microbes[["sysName"]] <- make.names(a909_microbes[["sysName"]], unique=TRUE)
all_annot <- merge(a909_gff, a909_microbes, by.x="old_locus_tag", by.y="sysName")
rownames(all_annot) <- make.names(all_annot[["locus_tag"]], unique=TRUE)
a909_expt <- create_expt(metadata="sample_sheets/sagalactiae_samples.xlsx",
gene_info=all_annot, file_column="filename")
## Reading the sample metadata.
## The sample definitions comprises: 9 rows(samples) and 15 columns(metadata fields).
## Reading count tables.
## Reading count tables with read.table().
## /mnt/sshfs/cbcbsub/fs/cbcb-lab/nelsayed/scratch/atb/tnseq/sagalacticae_2019/preprocessing/01/outputs/bowtie_sagalactiae_a909/trimmed_ca-v0M1.count.xz contains 2212 rows.
## preprocessing/02/outputs/bowtie_sagalactiae_a909/trimmed_ca-v0M1.count.xz contains 2212 rows and merges to 2212 rows.
## preprocessing/03/outputs/bowtie_sagalactiae_a909/trimmed_ca-v0M1.count.xz contains 2212 rows and merges to 2212 rows.
## preprocessing/04/outputs/bowtie_sagalactiae_a909/trimmed_ca-v0M1.count.xz contains 2212 rows and merges to 2212 rows.
## preprocessing/05/outputs/bowtie_sagalactiae_a909/trimmed_ca-v0M1.count.xz contains 2212 rows and merges to 2212 rows.
## preprocessing/06/outputs/bowtie_sagalactiae_a909/trimmed_ca-v0M1.count.xz contains 2212 rows and merges to 2212 rows.
## preprocessing/07/outputs/bowtie_sagalactiae_a909/trimmed_ca-v0M1.count.xz contains 2212 rows and merges to 2212 rows.
## preprocessing/08/outputs/bowtie_sagalactiae_a909/trimmed_ca-v0M1.count.xz contains 2212 rows and merges to 2212 rows.
## preprocessing/09/outputs/bowtie_sagalactiae_a909/trimmed_ca-v0M1.count.xz contains 2212 rows and merges to 2212 rows.
## Finished reading count tables.
## Matched 2048 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## The final expressionset has 2207 rows and 9 columns.
## Writing the first sheet, containing a legend and some summary data.
## Writing the raw reads.
## Graphing the raw reads.
## Warning in doTryCatch(return(expr), name, parentenv, handler): display list
## redraw incomplete
## Warning in doTryCatch(return(expr), name, parentenv, handler): display list
## redraw incomplete
## Writing the normalized reads.
## Graphing the normalized reads.
## Warning in doTryCatch(return(expr), name, parentenv, handler): display list
## redraw incomplete
## Warning in doTryCatch(return(expr), name, parentenv, handler): display list
## redraw incomplete
## Writing the median reads by factor.
## Note: zip::zip() is deprecated, please use zip::zipr() instead
I think this looks reasonable, though it makes me slightly wonder if 04 and 09 are switched. But as long as we are willing to state that the primary difference is between calprotectin and control, then I would suggest against considering it. I think it is reasonable to assume the samples are not switched and this is just how they are. If however, the primary goal is to investigate changing concentrations of calprotectin, then I would want to check into this distribution of samples or make the statement that these two concentrations have no significant difference unless we get more samples to look at.
For differential expression, I am going to assume until I hear otherwise, that my batch assignments are not correct and that the 1,2,3 assignments of the sample names do not actually delineate separate batches. Though, if they do delineate separate batches, it might be taken as a (very)small degree of evidence that 04 and 09 were switched.
## This function will replace the expt$expressionset slot with:
## log2(cpm(quant(cbcb(data))))
## It will save copies of each step along the way
## in expt$normalized with the corresponding libsizes. Keep libsizes in mind
## when invoking limma. The appropriate libsize is non-log(cpm(normalized)).
## This is most likely kept at:
## 'new_expt$normalized$intermediate_counts$normalization$libsizes'
## A copy of this may also be found at:
## new_expt$best_libsize
## Not correcting the count-data for batch effects. If batch is
## included in EdgerR/limma's model, then this is probably wise; but in extreme
## batch effects this is a good parameter to play with.
## Step 1: performing count filter with option: cbcb
## Removing 294 low-count genes (1913 remaining).
## Step 2: normalizing the data with quant.
## Step 3: converting the data with cpm.
## Step 4: transforming the data with log2.
## transform_counts: Found 73 values equal to 0, adding 1 to the matrix.
## Step 5: not doing batch correction.
## Plotting a PCA before surrogates/batch inclusion.
## Assuming no batch in model for testing pca.
## Finished running DE analyses, collecting outputs.
## Comparing analyses.
a909_contrasts <- list(
"low_vs_control" = c("cal_low", "control"),
"high_vs_control" = c("cal_high", "control"))
a909_tables <- combine_de_tables(
a909_de, keepers=a909_contrasts,
excel=glue::glue("excel/a909_tables-v{ver}.xlsx"))
## Deleting the file excel/a909_tables-v20191105.xlsx before writing the tables.
## Writing a legend of columns.
## Printing a pca plot before/after surrogates/batch estimation.
## Working on 1/2: low_vs_control which is: cal_low/control.
## Found inverse table with control_vs_cal_low
## Working on 2/2: high_vs_control which is: cal_high/control.
## Found inverse table with control_vs_cal_high
## Adding venn plots for low_vs_control.
## Limma expression coefficients for low_vs_control; R^2: 0.994; equation: y = 0.996x + 0.0272
## Edger expression coefficients for low_vs_control; R^2: 0.994; equation: y = 0.996x + 0.0363
## DESeq2 expression coefficients for low_vs_control; R^2: 0.987; equation: y = 0.985x + 0.167
## Adding venn plots for high_vs_control.
## Limma expression coefficients for high_vs_control; R^2: 0.994; equation: y = 0.999x - 0.0103
## Edger expression coefficients for high_vs_control; R^2: 0.995; equation: y = 0.997x + 0.0218
## DESeq2 expression coefficients for high_vs_control; R^2: 0.988; equation: y = 0.993x + 0.0786
## Writing summary information.
## Performing save of excel/a909_tables-v20191105.xlsx.
saturation_01 <- tnseq_saturation("preprocessing/01/outputs/essentiality/trimmed_ca-v0M1.wig")
saturation_01$plot
## Warning: Removed 22368 rows containing non-finite values (stat_bin).
## Warning: Removed 22368 rows containing non-finite values (stat_density).
## Warning: Removed 2 rows containing missing values (geom_bar).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 6 16 40 37 8057
ess_plts <- plot_essentiality("preprocessing/01/outputs/essentiality/mh_ess-trimmed_ca-v0M1_gene_tas_m8.csv")
ess_plts[["zbar"]]
saturation_02 <- tnseq_saturation("preprocessing/02/outputs/essentiality/trimmed_ca-v0M1.wig")
saturation_02$plot
## Warning: Removed 21315 rows containing non-finite values (stat_bin).
## Warning: Removed 21315 rows containing non-finite values (stat_density).
## Warning: Removed 2 rows containing missing values (geom_bar).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 6 16 39 36 10021
ess_plts <- plot_essentiality("preprocessing/02/outputs/essentiality/mh_ess-trimmed_ca-v0M1_gene_tas_m8.csv")
ess_plts[["zbar"]]
saturation_03 <- tnseq_saturation("preprocessing/03/outputs/essentiality/trimmed_ca-v0M1.wig")
saturation_03$plot
## Warning: Removed 24101 rows containing non-finite values (stat_bin).
## Warning: Removed 24101 rows containing non-finite values (stat_density).
## Warning: Removed 2 rows containing missing values (geom_bar).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 6 17 41 38 10713
ess_plts <- plot_essentiality("preprocessing/03/outputs/essentiality/mh_ess-trimmed_ca-v0M1_gene_tas_m8.csv")
ess_plts[["zbar"]]
saturation_04 <- tnseq_saturation("preprocessing/04/outputs/essentiality/trimmed_ca-v0M1.wig")
saturation_04$plot
## Warning: Removed 23905 rows containing non-finite values (stat_bin).
## Warning: Removed 23905 rows containing non-finite values (stat_density).
## Warning: Removed 2 rows containing missing values (geom_bar).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 6 17 43 39 7596
ess_plts <- plot_essentiality("preprocessing/04/outputs/essentiality/mh_ess-trimmed_ca-v0M1_gene_tas_m8.csv")
ess_plts[["zbar"]]
saturation_05 <- tnseq_saturation("preprocessing/05/outputs/essentiality/trimmed_ca-v0M1.wig")
saturation_05$plot
## Warning: Removed 24770 rows containing non-finite values (stat_bin).
## Warning: Removed 24770 rows containing non-finite values (stat_density).
## Warning: Removed 2 rows containing missing values (geom_bar).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 6 14 35 32 5765
ess_plts <- plot_essentiality("preprocessing/06/outputs/essentiality/mh_ess-trimmed_ca-v0M1_gene_tas_m8.csv")
ess_plts[["zbar"]]
saturation_06 <- tnseq_saturation("preprocessing/06/outputs/essentiality/trimmed_ca-v0M1.wig")
saturation_06$plot
## Warning: Removed 23586 rows containing non-finite values (stat_bin).
## Warning: Removed 23586 rows containing non-finite values (stat_density).
## Warning: Removed 2 rows containing missing values (geom_bar).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 6 14 32 30 10041
ess_plts <- plot_essentiality("preprocessing/06/outputs/essentiality/mh_ess-trimmed_ca-v0M1_gene_tas_m8.csv")
ess_plts[["zbar"]]
saturation_07 <- tnseq_saturation("preprocessing/07/outputs/essentiality/trimmed_ca-v0M1.wig")
saturation_07$plot
## Warning: Removed 23584 rows containing non-finite values (stat_bin).
## Warning: Removed 23584 rows containing non-finite values (stat_density).
## Warning: Removed 2 rows containing missing values (geom_bar).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 6 15 38 35 5946
## Warning in lmrob.fit(x, y, control, init = init): M-step did NOT converge.
## Returning unconverged SM-estimate
saturation_08 <- tnseq_saturation("preprocessing/08/outputs/essentiality/trimmed_ca-v0M1.wig")
saturation_08$plot
## Warning: Removed 20952 rows containing non-finite values (stat_bin).
## Warning: Removed 20952 rows containing non-finite values (stat_density).
## Warning: Removed 2 rows containing missing values (geom_bar).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 6 14 32 30 8235
ess_plts <- plot_essentiality("preprocessing/08/outputs/essentiality/mh_ess-trimmed_ca-v0M1_gene_tas_m8.csv")
ess_plts[["zbar"]]
saturation_09 <- tnseq_saturation("preprocessing/09/outputs/essentiality/trimmed_ca-v0M1.wig")
saturation_09$plot
## Warning: Removed 22581 rows containing non-finite values (stat_bin).
## Warning: Removed 22581 rows containing non-finite values (stat_density).
## Warning: Removed 2 rows containing missing values (geom_bar).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 6 15 37 34 10604
## Warning in lmrob.fit(x, y, control, init = init): M-step did NOT converge.
## Returning unconverged SM-estimate
## The factor control has 3 rows.
## The factor cal_low has 3 rows.
## The factor cal_high has 3 rows.
## control cal_low cal_high
## SAK_RS00260 E E U
## SAK_RS00325 NE U U
## SAK_RS00415 NE U U
## SAK_RS00420 U E U
## SAK_RS00470 U NE U
## SAK_RS00475 U E U
## SAK_RS00485 E U E
## SAK_RS00500 U NE U
## SAK_RS00530 U E U
## SAK_RS00535 U NE E
## SAK_RS00540 U NE U
## SAK_RS00545 U E E
## SAK_RS00560 U E U
## SAK_RS00565 U E E
## SAK_RS00570 U U E
## SAK_RS00580 E E U
## SAK_RS00735 U U NE
## SAK_RS00740 U NE NE
## SAK_RS00820 U NE U
## SAK_RS00830 U NE NE
## SAK_RS00840 E U NE
## SAK_RS00955 E NE NE
## SAK_RS00960 U NE NE
## SAK_RS00975 U NE NE
## SAK_RS01305 U NE U
## SAK_RS01320 U U E
## SAK_RS01360 U E E
## SAK_RS01380 U U E
## SAK_RS01410 U U NE
## SAK_RS01425 E U E
## SAK_RS01645 NE U U
## SAK_RS01685 E U E
## SAK_RS01700 U U NE
## SAK_RS01770 E E U
## SAK_RS01805 E E U
## SAK_RS01810 NE U NE
## SAK_RS01850 U U NE
## SAK_RS01860 U NE NE
## SAK_RS02025 U NE NE
## SAK_RS02065 NE U U
## SAK_RS02075 NE NE U
## SAK_RS02090 U U NE
## SAK_RS02095 U U E
## SAK_RS02105 E U E
## SAK_RS02120 U U NE
## SAK_RS02125 NE U E
## SAK_RS02395 U NE NE
## SAK_RS02455 NE E NE
## SAK_RS02460 NE E NE
## SAK_RS02470 U NE U
## SAK_RS02480 NE U NE
## SAK_RS02490 NE E E
## SAK_RS02495 NE E U
## SAK_RS02500 U E E
## SAK_RS02505 U U E
## SAK_RS02545 NE NE U
## SAK_RS02590 U U E
## SAK_RS02610 E U NE
## SAK_RS02695 U E E
## SAK_RS02700 NE U NE
## SAK_RS02765 E E U
## SAK_RS02800 U NE NE
## SAK_RS02865 U NE NE
## SAK_RS02905 U U NE
## SAK_RS02920 E U E
## SAK_RS02970 U U NE
## SAK_RS02980 U NE NE
## SAK_RS03055 U E E
## SAK_RS03075 E U E
## SAK_RS03105 E U E
## SAK_RS03135 U U E
## SAK_RS03190 U E E
## SAK_RS03200 U E E
## SAK_RS03210 E E U
## SAK_RS03225 U E E
## SAK_RS03275 NE NE U
## SAK_RS03350 E U E
## SAK_RS03415 E NE U
## SAK_RS03435 E U E
## SAK_RS03440 U NE NE
## SAK_RS03445 U U NE
## SAK_RS03530 E U E
## SAK_RS03575 NE U NE
## SAK_RS03635 U NE NE
## SAK_RS03650 U E U
## SAK_RS03670 U E E
## SAK_RS03680 NE NE U
## SAK_RS03685 NE NE U
## SAK_RS03730 E U NE
## SAK_RS03740 U NE U
## SAK_RS03745 U U E
## SAK_RS03765 NE NE U
## SAK_RS03770 E E U
## SAK_RS03785 U E E
## SAK_RS04040 NE NE U
## SAK_RS04115 U NE NE
## SAK_RS04125 NE U NE
## SAK_RS04140 NE U NE
## SAK_RS04200 U NE NE
## SAK_RS04210 E U E
## SAK_RS04220 U E NE
## SAK_RS04235 U E NE
## SAK_RS04250 E E U
## SAK_RS04385 U NE NE
## SAK_RS04470 U NE NE
## SAK_RS04575 NE U NE
## SAK_RS04645 U NE NE
## SAK_RS04745 U NE E
## SAK_RS04765 NE U NE
## SAK_RS04780 E E U
## SAK_RS04840 U NE NE
## SAK_RS04845 E NE U
## SAK_RS04915 E U E
## SAK_RS04930 E NE E
## SAK_RS04945 NE U NE
## SAK_RS04950 U U NE
## SAK_RS05010 E E U
## SAK_RS05015 E U U
## SAK_RS05020 E U E
## SAK_RS05025 E E U
## SAK_RS05050 U E E
## SAK_RS05075 NE U NE
## SAK_RS05125 NE U NE
## SAK_RS05145 U NE NE
## SAK_RS05265 U NE U
## SAK_RS05285 U NE NE
## SAK_RS05295 U E U
## SAK_RS05370 U NE U
## SAK_RS05380 U NE U
## SAK_RS05410 E U E
## SAK_RS05570 E U E
## SAK_RS05585 E NE E
## SAK_RS05615 U U E
## SAK_RS05630 NE U U
## SAK_RS05660 U U E
## SAK_RS05665 U U NE
## SAK_RS05675 NE NE U
## SAK_RS05685 U NE NE
## SAK_RS05705 U NE U
## SAK_RS05710 E NE NE
## SAK_RS05740 E NE NE
## SAK_RS05770 U E E
## SAK_RS05875 NE U NE
## SAK_RS05945 U NE NE
## SAK_RS05955 U NE U
## SAK_RS05965 NE U U
## SAK_RS05995 U E U
## SAK_RS06010 E U E
## SAK_RS06040 U NE NE
## SAK_RS06105 U NE NE
## SAK_RS06140 U NE NE
## SAK_RS06175 U U NE
## SAK_RS06195 NE NE U
## SAK_RS06220 NE NE U
## SAK_RS06275 E U E
## SAK_RS06395 E U NE
## SAK_RS06425 U NE NE
## SAK_RS06440 U E E
## SAK_RS06485 NE U NE
## SAK_RS06530 U NE NE
## SAK_RS06545 NE U NE
## SAK_RS06560 U NE NE
## SAK_RS06635 U NE NE
## SAK_RS06675 U U NE
## SAK_RS06715 U NE U
## SAK_RS06720 U NE NE
## SAK_RS06760 E NE U
## SAK_RS06770 NE U U
## SAK_RS06845 E U E
## SAK_RS06875 NE U NE
## SAK_RS06880 E NE NE
## SAK_RS06925 U NE NE
## SAK_RS06935 E E U
## SAK_RS06940 U NE NE
## SAK_RS06970 E U E
## SAK_RS07030 U NE NE
## SAK_RS07035 U E U
## SAK_RS07055 E U NE
## SAK_RS07060 E U U
## SAK_RS07095 NE U NE
## SAK_RS07105 U NE NE
## SAK_RS07120 E U U
## SAK_RS07125 U NE U
## SAK_RS07315 E U E
## SAK_RS07335 U E E
## SAK_RS07360 NE NE U
## SAK_RS07405 E E U
## SAK_RS07535 NE NE U
## SAK_RS07575 NE U U
## SAK_RS07580 U NE U
## SAK_RS07605 U E U
## SAK_RS07650 U NE NE
## SAK_RS07655 NE U NE
## SAK_RS07660 NE U NE
## SAK_RS07670 U NE U
## SAK_RS07720 U NE E
## SAK_RS07725 NE NE U
## SAK_RS07745 U E E
## SAK_RS07750 NE NE U
## SAK_RS07770 NE U E
## SAK_RS07775 E U E
## SAK_RS07875 NE NE U
## SAK_RS08005 NE NE U
## SAK_RS08015 U E U
## SAK_RS08060 U NE U
## SAK_RS08120 U NE NE
## SAK_RS08200 U U NE
## SAK_RS08205 U NE U
## SAK_RS08210 U NE NE
## SAK_RS08260 E U E
## SAK_RS08265 NE U NE
## SAK_RS08310 U E NE
## SAK_RS08325 U NE NE
## SAK_RS08330 U NE NE
## SAK_RS08355 E U E
## SAK_RS08430 E U E
## SAK_RS08445 E E U
## SAK_RS08500 U NE NE
## SAK_RS08520 NE NE U
## SAK_RS08545 E U E
## SAK_RS08560 E U E
## SAK_RS08570 U NE NE
## SAK_RS08580 U E NE
## SAK_RS08620 U U NE
## SAK_RS08675 NE U NE
## SAK_RS08755 U E E
## SAK_RS08945 NE U NE
## SAK_RS08965 U E E
## SAK_RS08990 U NE E
## SAK_RS09020 U E U
## SAK_RS09030 E U E
## SAK_RS09045 U NE NE
## SAK_RS09050 E U E
## SAK_RS09100 NE E NE
## SAK_RS09105 U U NE
## SAK_RS09110 U NE NE
## SAK_RS09120 U U NE
## SAK_RS09145 U U NE
## SAK_RS09325 U U E
## SAK_RS09395 E E U
## SAK_RS09440 U U NE
## SAK_RS09500 U E E
## SAK_RS09570 NE U NE
## SAK_RS09575 E E U
## SAK_RS09645 NE NE U
## SAK_RS09715 U U NE
## SAK_RS09800 E U E
## SAK_RS09855 U NE NE
## SAK_RS09885 E E NE
## SAK_RS10205 U NE NE
## SAK_RS10210 E E NE
## SAK_RS10220 NE U U
## SAK_RS10225 E U E
## SAK_RS10240 E E U
## SAK_RS10410 NE NE U
## SAK_RS10455 U E E
## SAK_RS10460 U E E
## SAK_RS10465 U U E
## SAK_RS10490 U NE U
## SAK_RS10495 E E U
## SAK_RS10500 U E U
## SAK_RS10510 U E E
## SAK_RS10520 U NE NE
## SAK_RS10530 U E U
## SAK_RS10550 U NE U
## SAK_RS10560 U E E
## SAK_RS10570 U NE NE
## SAK_RS10595 U U NE
## SAK_RS10725 U U E
## SAK_RS10730 NE E U
## SAK_RS10755 E NE E
## SAK_RS10780 NE U U
## SAK_RS10815 U U NE
## SAK_RS10875 NE NE E
## SAK_RS10935 U U E
## SAK_RS11085 U NE NE
## SAK_RS11090 U NE NE
## SAK_RS11110 U NE NE