Lets play with some sample metrics when combining TNSeq and RNASeq for S.pyogenes.
sp_metrics <- sm(graph_metrics(sp_expt))
sp_metrics$libsize
## TNSeq library 4 time 0 is poor, otherwise there are no disasters here
sp_metrics$density
## We see pretty divergent densities/gene from rnaseq to tnseq
sp_metrics$boxplot
sp_metrics$corheat
## Hey, guess what, RNASeq is different from TNSeq
I believe that this is in fact 6-8 experiments, lets define them now:
thyexpress <- expt_subset(expt=sp_expt, subset="type=='RNASeq'")
thytnseq <- expt_subset(expt=sp_expt, subset="condition=='thyd0'|condition=='thyd1'|condition=='thyd2'|condition=='thyd3'")
thyrnatn <- expt_subset(expt=sp_expt, subset="type=='RNASeq'|condition=='thyd0'|condition=='thyd1'|condition=='thyd2'|condition=='thyd3'")
subcu <- expt_subset(expt=sp_expt, subset="media=='mouse_skin'|condition=='subcut0'")
pmnexp1 <- expt_subset(expt=sp_expt, subset="batch=='k'|batch=='l'")
pmnexp2 <- expt_subset(expt=sp_expt, subset="batch=='m'|batch=='n'|batch=='o'")
abscess <- expt_subset(expt=sp_expt, subset="media=='rabbit_abscess'")
rpoe <- expt_subset(expt=sp_expt, subset="batch=='z'")
tmp <- sm(saveme(filename="subsets.rda.xz"))
thyexpress_metrics <- sm(graph_metrics(thyexpress))
thyexpress_norm <- sm(normalize_expt(expt=thyexpress, transform="log2", convert="cpm", norm="quant", filter=TRUE))
thyexpress_norm_metrics <- sm(graph_metrics(thyexpress_norm))
thyexpress_normbatch <- sm(normalize_expt(expt=thyexpress, transform="log2", convert="cpm", norm="quant", filter=TRUE, batch="sva"))
thyexpress_normbatch_metrics <- sm(graph_metrics(thyexpress_normbatch))
Now show some plots of interest
thyexpress_metrics$libsize
## We see that 537 and 538 are low (2 of the 4 late log samples)
thyexpress_metrics$density
## Not too bad
thyexpress_norm_metrics$disheat
## nice
thyexpress_norm_metrics$corheat
## here we see them(the 2 late log samples) clustering with the early log samples, which is going to be annoying in a pca plot
thyexpress_norm_metrics$pcaplot
## And lo, they suck
thyexpress_normbatch_metrics$pcaplot
## Allowing sva to minimize this effect seems to help, though, I suspect at the cost of variance that we should have been able to use
## Maybe we want to this, maybe we don't I am not sure.
thytnseq <- set_expt_colors(thytnseq, colors=c("darkred","forestgreen","blue","black"))
## Error in set_expt_colors(thytnseq, colors = c("darkred", "forestgreen", : object 'sample_colors' not found
thytnseq_metrics <- sm(graph_metrics(thytnseq))
## holy crap the colors suck, oh yeah yoann has some colors he wanted for this... where did I put them?
thytnseq_norm <- sm(normalize_expt(expt=thytnseq, transform="log2", convert="cpm", norm="quant", filter=TRUE))
thytnseq_norm_metrics <- sm(graph_metrics(thytnseq_norm))
thytnseq_normbatch <- sm(normalize_expt(expt=thytnseq, transform="log2", convert="cpm", norm="quant", filter=TRUE, batch="combat_scale"))
thytnseq_normbatch_metrics <- sm(graph_metrics(thytnseq_normbatch))
Now show some plots of interest
thytnseq_metrics$libsize
## We see that 537 and 538 are low (2 of the 4 late log samples)
thytnseq_metrics$density
## That is a bit funky
thytnseq_norm_metrics$disheat
## Wow, these really do split by library!!
thytnseq_norm_metrics$corheat
## This split by library is pretty profound indeed
thytnseq_norm_metrics$pcaplot
## See what I mean!?
thytnseq_normbatch_metrics$pcaplot
## But when the library effect is removed, that picture changes quite drastically and I think becomes something quite assayable.
In this first block we create a set of initial pictures and we will show them below.
subcu_metrics <- sm(graph_metrics(subcu))
subcu_norm <- sm(normalize_expt(subcu, transform="log2", convert="cpm", norm="quant", filter=TRUE))
subcu_norm_metrics <- sm(graph_metrics(subcu_norm))
Now we have raw-data metrics and default-normalized metrics. Lets visualize some of them.
subcu_metrics$libsize
## mouset48r2 is low, subcut0 is high They are not tragic
subcu_metrics$density
## Ok, I am a bit concerned about the 48 hour data
subcu_norm_metrics$corheat
## t0 is brightest pink, t12 is 2nd brightest, t24 starting to get brown, t48 is brownish
## There is some funky batch effect going on which we will need to address
## Yoann wants colors: t0='red', t12='orange', t24='yellow', t48='green'
subcu_norm_metrics$pcaplot
## the one 24 hour sample is a bit of an arsehole
subcu_norm_metrics$smc
## t24 m2v2
Ok, so subcutaneous sample time 24 m2v2 might be too damaged to use. Lets see if it is possible to do some sort of batch correction to salvage it?
attempt <- sm(normalize_expt(subcu, transform="log2", convert="cpm", norm="quant", filter=TRUE, batch="sva"))
attempt_metrics <- graph_metrics(attempt, qq=TRUE)
## Graphing number of non-zero genes with respect to CPM by library.
## Graphing library sizes.
## Graphing a boxplot.
## Graphing a correlation heatmap.
## Graphing a standard median correlation.
## Performing correlation.
## Graphing a distance heatmap.
## Graphing a standard median distance.
## Performing distance.
## Graphing a PCA plot.
## Plotting a density plot.
## QQ plotting!
## Making plot of mouset0r1v1(1) vs. a sample distribution.
## Making plot of mouset0r2v1(2) vs. a sample distribution.
## Making plot of mouset12r1v1(3) vs. a sample distribution.
## Making plot of mouset12r2v1(4) vs. a sample distribution.
## Making plot of mouset24r1v1(5) vs. a sample distribution.
## Making plot of mouset24r2v1(6) vs. a sample distribution.
## Making plot of mouset48r1v1(7) vs. a sample distribution.
## Making plot of mouset48r2v1(8) vs. a sample distribution.
## Making plot of subcut0v2(9) vs. a sample distribution.
## Making plot of subcut12m1v2(10) vs. a sample distribution.
## Making plot of subcut24m1v2(11) vs. a sample distribution.
## Making plot of subcut24m2v2(12) vs. a sample distribution.
## Making plot of subcut48m1v2(13) vs. a sample distribution.
## Making plot of subcut48m2v2(14) vs. a sample distribution.
attempt_metrics$pcaplot
## This might be valid?
attempt_metrics$corheat
## The early time points are very similar to each other and so mixed
## Oh, 1 t12 sample clusters with t24, that is annoying
attempt_metrics$disheat
## Ditto
attempt_metrics$smc
## This suggests t48r1v1 as will the distance
attempt_metrics$smd
## hmm using this normalization, t48r1v1 might be a problem.
attempt_metrics$qqrat
## According to this, the following samples might be jerks: mouset48r2v1, mouset48r1v1, subcut12m1v2
The question now is, can we drop one or more of these samples and still have a valid pairwise comparison?
The primary candidate for dropping is: t48r1v1 I want to suggest dropping this sample.
attempt <- expt_subset(expt=attempt, subset="sampleid!='mouset48r1v1'")
plot_pca(attempt)$plot
plot_corheat(attempt)
## mouset12r1v1 is problematic, it might get removed
rpoe_metrics <- sm(graph_metrics(rpoe))
rpoe_norm <- sm(normalize_expt(rpoe, transform="log2", convert="cpm", norm="quant", filter=TRUE))
rpoe_norm_metrics <- sm(graph_metrics(rpoe_norm))
rpoe_metrics$libsize
## mouset48r2 is low, rpoet0 is high They are not tragic
rpoe_metrics$density
## Ok, I am a bit concerned about the 48 hour data
rpoe_norm_metrics$corheat
## t0 is brightest pink, t12 is 2nd brightest, t24 starting to get brown, t48 is brownish
## There is some funky batch effect going on which we will need to address
## Yoann wants colors: t0='red', t12='orange', t24='yellow', t48='green'
rpoe_norm_metrics$pcaplot
## the one 24 hour sample is a bit of an arsehole
rpoe_norm_metrics$smc
## t24 m2v2
Holy ass crackers, the data is beautiful.
tmp <- sm(saveme(filename="sample_metrics.rda.xz"))