Sample metrics of RNA/TN sequencing of S.pyogenes.

Lets play with some sample metrics when combining TNSeq and RNASeq for S.pyogenes.

sp_metrics <- sm(graph_metrics(sp_expt))

sp_metrics$libsize

## TNSeq library 4 time 0 is poor, otherwise there are no disasters here
sp_metrics$density

## We see pretty divergent densities/gene from rnaseq to tnseq
sp_metrics$boxplot

sp_metrics$corheat

## Hey, guess what, RNASeq is different from TNSeq

Split the data into functional pieces

I believe that this is in fact 6-8 experiments, lets define them now:

thyexpress: RNASeq alone: Compare gene expression of 3 time points in 5448; exponential, transition, and stationary.
1. Include all Type==RNASeq
thytnseq: TNSeq in THY: Use differential expression tools to assess gene fitness with respect to time in THY.
1. Include all samples TNSeq which are also condition==t0|t1|t2|t3
thyrnatn: RNASeq+TNSeq in THY: Contrast changing fitness and gene expression in THY with respect to time; even though the time scales of the two experiments are very different.
1. Include all Type==RNASeq as well as condition==t0|t1|t2|t3
subcu: 5448 fitness during subcutaneous infection in SKH1 mice; use DE tools for fitness vs. time.
1. Include all condition beginning with “subcut”
pmnexp1: 5448 fitness while being attacked by pmn cells experiment 1 (why is 1 different than 2?)
1. Include samples: pmnlavt0, pmnlavt1, pmnylbt0, pmnylbt1
pmnexp2: 5448 fitness while being attacked by pmn cells experiment 2 (because the first one had 24 hours outgrowth while this experiment has only 4 hours outgrowth)
1. Include samples: pmnlavt0v2, pmnlavexp1v2, pmnlavexp2v2, pmnlavplasmav2, pmnlavt0v2, pmnylbexpv2, pmnylbplasmav2, pmnylbt0v2
abscess: 5448 fitness within a rabbit abscess
1. Include all samples starting with ‘abscess’
rpoe: A group of wt, rpoe deletion, and complement samples

Create experimental subsets

thyexpress <- expt_subset(expt=sp_expt, subset="type=='RNASeq'")
thytnseq <- expt_subset(expt=sp_expt, subset="condition=='thyd0'|condition=='thyd1'|condition=='thyd2'|condition=='thyd3'")
thyrnatn <- expt_subset(expt=sp_expt, subset="type=='RNASeq'|condition=='thyd0'|condition=='thyd1'|condition=='thyd2'|condition=='thyd3'")
subcu <- expt_subset(expt=sp_expt, subset="media=='mouse_skin'|condition=='subcut0'")
pmnexp1 <- expt_subset(expt=sp_expt, subset="batch=='k'|batch=='l'")
pmnexp2 <- expt_subset(expt=sp_expt, subset="batch=='m'|batch=='n'|batch=='o'")
abscess <- expt_subset(expt=sp_expt, subset="media=='rabbit_abscess'")
rpoe <- expt_subset(expt=sp_expt, subset="batch=='z'")
tmp <- sm(saveme(filename="subsets.rda.xz"))

THY Expression

thyexpress_metrics <- sm(graph_metrics(thyexpress))

thyexpress_norm <- sm(normalize_expt(expt=thyexpress, transform="log2", convert="cpm", norm="quant", filter=TRUE))
thyexpress_norm_metrics <- sm(graph_metrics(thyexpress_norm))

thyexpress_normbatch <- sm(normalize_expt(expt=thyexpress, transform="log2", convert="cpm", norm="quant", filter=TRUE, batch="sva"))
thyexpress_normbatch_metrics <- sm(graph_metrics(thyexpress_normbatch))

Now show some plots of interest

thyexpress_metrics$libsize

## We see that 537 and 538 are low (2 of the 4 late log samples)

thyexpress_metrics$density

## Not too bad

thyexpress_norm_metrics$disheat

## nice
thyexpress_norm_metrics$corheat

## here we see them(the 2 late log samples) clustering with the early log samples, which is going to be annoying in a pca plot
thyexpress_norm_metrics$pcaplot

## And lo, they suck

thyexpress_normbatch_metrics$pcaplot

## Allowing sva to minimize this effect seems to help, though, I suspect at the cost of variance that we should have been able to use
## Maybe we want to this, maybe we don't I am not sure.

THY TNSeq

thytnseq <- set_expt_colors(thytnseq, colors=c("darkred","forestgreen","blue","black"))

## Error in set_expt_colors(thytnseq, colors = c("darkred", "forestgreen", : object 'sample_colors' not found

thytnseq_metrics <- sm(graph_metrics(thytnseq))

## holy crap the colors suck, oh yeah yoann has some colors he wanted for this... where did I put them?
thytnseq_norm <- sm(normalize_expt(expt=thytnseq, transform="log2", convert="cpm", norm="quant", filter=TRUE))
thytnseq_norm_metrics <- sm(graph_metrics(thytnseq_norm))

thytnseq_normbatch <- sm(normalize_expt(expt=thytnseq, transform="log2", convert="cpm", norm="quant", filter=TRUE, batch="combat_scale"))

thytnseq_normbatch_metrics <- sm(graph_metrics(thytnseq_normbatch))

Now show some plots of interest

thytnseq_metrics$libsize

## We see that 537 and 538 are low (2 of the 4 late log samples)

thytnseq_metrics$density

## That is a bit funky

thytnseq_norm_metrics$disheat

## Wow, these really do split by library!!
thytnseq_norm_metrics$corheat

## This split by library is pretty profound indeed
thytnseq_norm_metrics$pcaplot

## See what I mean!?

thytnseq_normbatch_metrics$pcaplot

## But when the library effect is removed, that picture changes quite drastically and I think becomes something quite assayable.

THY Expression and TNSeq

Subcutaneous infection

In this first block we create a set of initial pictures and we will show them below.

subcu_metrics <- sm(graph_metrics(subcu))

subcu_norm <- sm(normalize_expt(subcu, transform="log2", convert="cpm", norm="quant", filter=TRUE))
subcu_norm_metrics <- sm(graph_metrics(subcu_norm))

Now we have raw-data metrics and default-normalized metrics. Lets visualize some of them.

subcu_metrics$libsize

## mouset48r2 is low, subcut0 is high  They are not tragic
subcu_metrics$density

## Ok, I am a bit concerned about the 48 hour data
subcu_norm_metrics$corheat

## t0 is brightest pink, t12 is 2nd brightest, t24 starting to get brown, t48 is brownish
## There is some funky batch effect going on which we will need to address
## Yoann wants colors:  t0='red', t12='orange', t24='yellow', t48='green'
subcu_norm_metrics$pcaplot

## the one 24 hour sample is a bit of an arsehole
subcu_norm_metrics$smc

## t24 m2v2

Ok, so subcutaneous sample time 24 m2v2 might be too damaged to use. Lets see if it is possible to do some sort of batch correction to salvage it?

attempt <- sm(normalize_expt(subcu, transform="log2", convert="cpm", norm="quant", filter=TRUE, batch="sva"))
attempt_metrics <- graph_metrics(attempt, qq=TRUE)

## Graphing number of non-zero genes with respect to CPM by library.

## Graphing library sizes.

## Graphing a boxplot.

## Graphing a correlation heatmap.

## Graphing a standard median correlation.

## Performing correlation.

## Graphing a distance heatmap.

## Graphing a standard median distance.

## Performing distance.

## Graphing a PCA plot.

## Plotting a density plot.

## QQ plotting!

## Making plot of mouset0r1v1(1) vs. a sample distribution.

## Making plot of mouset0r2v1(2) vs. a sample distribution.

## Making plot of mouset12r1v1(3) vs. a sample distribution.

## Making plot of mouset12r2v1(4) vs. a sample distribution.

## Making plot of mouset24r1v1(5) vs. a sample distribution.

## Making plot of mouset24r2v1(6) vs. a sample distribution.

## Making plot of mouset48r1v1(7) vs. a sample distribution.

## Making plot of mouset48r2v1(8) vs. a sample distribution.

## Making plot of subcut0v2(9) vs. a sample distribution.

## Making plot of subcut12m1v2(10) vs. a sample distribution.

## Making plot of subcut24m1v2(11) vs. a sample distribution.

## Making plot of subcut24m2v2(12) vs. a sample distribution.

## Making plot of subcut48m1v2(13) vs. a sample distribution.

## Making plot of subcut48m2v2(14) vs. a sample distribution.

attempt_metrics$pcaplot

## This might be valid?
attempt_metrics$corheat

## The early time points are very similar to each other and so mixed
## Oh, 1 t12 sample clusters with t24, that is annoying
attempt_metrics$disheat

## Ditto
attempt_metrics$smc

## This suggests t48r1v1 as will the distance
attempt_metrics$smd

## hmm using this normalization, t48r1v1 might be a problem.
attempt_metrics$qqrat

## According to this, the following samples might be jerks: mouset48r2v1, mouset48r1v1, subcut12m1v2

The question now is, can we drop one or more of these samples and still have a valid pairwise comparison?

The primary candidate for dropping is: t48r1v1 I want to suggest dropping this sample.

attempt <- expt_subset(expt=attempt, subset="sampleid!='mouset48r1v1'")
plot_pca(attempt)$plot

plot_corheat(attempt)

## mouset12r1v1 is problematic, it might get removed

PMN experiment 1

PMN experiment 2

Rabbit abscess

RPOE

rpoe_metrics <- sm(graph_metrics(rpoe))

rpoe_norm <- sm(normalize_expt(rpoe, transform="log2", convert="cpm", norm="quant", filter=TRUE))
rpoe_norm_metrics <- sm(graph_metrics(rpoe_norm))

rpoe_metrics$libsize

## mouset48r2 is low, rpoet0 is high  They are not tragic
rpoe_metrics$density

## Ok, I am a bit concerned about the 48 hour data
rpoe_norm_metrics$corheat

## t0 is brightest pink, t12 is 2nd brightest, t24 starting to get brown, t48 is brownish
## There is some funky batch effect going on which we will need to address
## Yoann wants colors:  t0='red', t12='orange', t24='yellow', t48='green'
rpoe_norm_metrics$pcaplot

## the one 24 hour sample is a bit of an arsehole
rpoe_norm_metrics$smc

## t24 m2v2

Holy ass crackers, the data is beautiful.

tmp <- sm(saveme(filename="sample_metrics.rda.xz"))

Sample metrics: RNAseq/TNSeq of S.pyogenes

atb abelew@gmail.com

2016-07-29