1 Introduction

Take a look at these samples. I will spend a few minutes reading their previous document to fill this intro out.

2 Annotation information

I used the ensembl mm39 release 112. That should work find with the most recent biomart gene annotations (202411).

For the moment, the useast server seems to be down, I think I will just use the archive until that is fixed.

##mm_annot <- load_biomart_annotations(species = "mmusculus", archive = FALSE)
mm_annot <- load_biomart_annotations(species = "mmusculus")
## The biomart annotations file already exists, loading from it.
mm_genes <- mm_annot[["gene_annotations"]]

3 Extract metadata logs

I will probably disable the following block once I have the starting metadata in a completed state so that if someone wants to see all the numbers of mapped reads etc without having my working tree they will still be available.

sample_sheet <- "sample_sheets/TtR1_metadata.xlsx"
meta <- gather_preprocessing_metadata("sample_sheets/all_samples.xlsx", species = "mm39_112",
                                      new_metadata = sample_sheet)

This results in a new column ‘hisat_count_table’, so let us use that, and the gene annotations, to create an expressionset.

4 Create Expressionset

mm_expt <- create_expt(sample_sheet,
                       gene_info = mm_genes,
                       file_column = "hisat_count_table") %>%
  set_expt_conditions(fact = "flox")
## Reading the sample metadata.
## The sample definitions comprises: 24 rows(samples) and 27 columns(metadata fields).
## Matched 25409 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## Saving the expressionset to 'expt.rda'.
## The final expressionset has 25425 features and 24 samples.
## The numbers of samples by condition are:
## 
## control_1a control_2b control_s9 ko_1a_c25b ko_2b_54C1 ko_2b_54C2   ko_s9_s8  so_s9_r14 
##          3          3          3          3          3          3          3          3
mm_expt
## A modified expressionSet containing 25425 genes and 24 sample. There are 28 metadata columns and 13 annotation columns.
## The primary condition is comprised of:
## control_1a, control_2b, control_s9, ko_1a_c25b, ko_2b_54C1, ko_2b_54C2, ko_s9_s8, so_s9_r14.
## Its current state is: raw(data).
combined <- paste0(pData(mm_expt)[["state"]], "_",
                   pData(mm_expt)[["line"]])
pData(mm_expt)[["combined"]] <- combined
plot_meta_sankey(mm_expt, factors = c("flox", "line", "batch"))
## A sankey plot describing the metadata of 24 samples,
## including 40 out of 0 nodes and traversing metadata factors:
## .

5 Initial visualization

plot_libsize(mm_expt)
## Library sizes of 24 samples, 
## ranging from 14,686,859 to 17,960,838.

plot_nonzero(mm_expt)
## The following samples have less than 16526.25 genes.
##  [1] "1_IATF-1A_" "2_IATF-1A_" "3_IATF-1A_" "4_IATF-2B_" "5_IATF-2B_" "6_IATF-2B_" "7_IATF-S9_" "8_IATF-S9_"
##  [9] "9_IATF-S9_" "10_IATF-1A" "11_IATF-1A" "12_IATF-1A" "13_IATF-2B" "14_IATF-2B" "15_IATF-2B" "16_IATF-2B"
## [17] "19_IATF-S9" "20_IATF-S9" "21_IATF-S9" "22_IATF-S9" "23_IATF-S9" "24_IATF-S9"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## A non-zero genes plot of 24 samples.
## These samples have an average 16.65 CPM coverage and 15156 genes observed, ranging from 14576 to
## 16635.

plot_corheat(mm_expt)
## A heatmap of pairwise sample correlations ranging from: 
## 0.573161898662224 to 0.999309770057417.

6 Normalized visualization

mm_norm <- normalize_expt(mm_expt, transform = "log2", convert = "cpm",
                          norm = "quant", filter = TRUE)
## Removing 13165 low-count genes (12260 remaining).
## transform_counts: Found 1918 values equal to 0, adding 1 to the matrix.
plot_disheat(mm_norm)
## A heatmap of pairwise sample distances ranging from: 
## 14.9259724949041 to 139.831122474879.

plot_pca(mm_norm)
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by control_1a, control_2b, control_s9, ko_1a_c25b, ko_2b_54C1, ko_2b_54C2, ko_s9_s8, so_s9_r14
## Shapes are defined by b1, b2, b3.

Wow, these samples are clustering absurdly well.

7 Set up some contrasts

mm_combined <- set_expt_conditions(mm_expt, fact = "combined")
## The numbers of samples by condition are:
## 
## control_l1a control_l2b control_ls9      ko_l1a      ko_l2b      ko_ls9 
##           3           3           3           3           6           6
keepers <- list(
  "l1a" = c("ko_l1a", "control_l1a"),
  "l2b" = c("ko_l2b", "control_l2b"),
  "ls9" = c("ko_ls9", "control_ls9"))
mm_de <- all_pairwise(mm_combined, keepers = keepers, keep_underscore = TRUE)
## 
## control_l1a control_l2b control_ls9      ko_l1a      ko_l2b      ko_ls9 
##           3           3           3           3           6           6 
## 
## b1 b2 b3 
##  8  8  8
mm_de
## A pairwise differential expression with results from: basic, deseq, edger, limma, noiseq.
## This used a surrogate/batch estimate from: batch in model/limma.
## The primary analysis performed 10 comparisons.
mm_tables <- combine_de_tables(mm_de, excel = "excel/three_contrasts.xlsx")
mm_tables
## A set of combined differential expression results.
##                   table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 ko_l1a_vs_control_l1a        1203          1651        1333          1714        1711          1457
## 2 ko_l2b_vs_control_l2b        1612          1557        1759          1684        1723          1655
## 3 ko_ls9_vs_control_ls9        1089           539        1081           581         752           735
## Plot describing unique/shared genes in a differential expression table.

mm_sig <- extract_significant_genes(mm_tables, excel = "excel/three_contrasts-sig.xlsx")
mm_sig
## A set of genes deemed significant according to limma, edger, deseq, basic.
## The parameters defining significant were:
## LFC cutoff: 1 adj P cutoff: 0.05
##                       limma_up limma_down edger_up edger_down deseq_up deseq_down basic_up basic_down
## ko_l1a_vs_control_l1a     1711       1457     1333       1714     1203       1651     1198       1079
## ko_l2b_vs_control_l2b     1723       1655     1759       1684     1612       1557     1261       1194
## ko_ls9_vs_control_ls9      752        735     1081        581     1089        539      536        588

8 Take a look with gprofiler

mm_gp <- all_gprofiler(mm_sig, species = "mmusculus", excel = "excel/all_gp.xlsx")
mm_gp
## Running gProfiler on every set of significant genes found:
##                             BP CORUM HP KEGG MIRNA MF REAC  TF WP
## ko_l1a_vs_control_l1a_up   330     0  0    0     0 63    0 150  0
## ko_l1a_vs_control_l1a_down 882     0  0    0     0 87    5 293  0
## ko_l2b_vs_control_l2b_up   181     0  0    0     0 69    0 106  0
## ko_l2b_vs_control_l2b_down 896     0  1    2     0 89   15 302  1
## ko_ls9_vs_control_ls9_up   517     0  0    4     0 61    5 254  0
## ko_ls9_vs_control_ls9_down 429     0  0    0     2 31    3  21  0
LS0tCnRpdGxlOiAiUHJlcHJvY2Vzc2luZyB2YXJpb3VzIGFkaXBvc2UgZmVycml0aW4gdHJhbnNwb3J0ZXIgc2FtcGxlcy4iCmF1dGhvcjogImF0YiBhYmVsZXdAZ21haWwuY29tIgpiaWJsaW9ncmFwaHk6IC9ob21lL3RyZXkvc2NyYXRjaC96b3Rlcm9fbGlicmFyeS9hdGIuYmliCmRhdGU6ICJgciBTeXMuRGF0ZSgpYCIKb3V0cHV0OgogIGh0bWxfZG9jdW1lbnQ6CiAgICBjb2RlX2Rvd25sb2FkOiB0cnVlCiAgICBjb2RlX2ZvbGRpbmc6IHNob3cKICAgIGZpZ19jYXB0aW9uOiB0cnVlCiAgICBmaWdfaGVpZ2h0OiA3CiAgICBmaWdfd2lkdGg6IDcKICAgIGhpZ2hsaWdodDogemVuYnVybgogICAga2VlcF9tZDogZmFsc2UKICAgIG1vZGU6IHNlbGZjb250YWluZWQKICAgIG51bWJlcl9zZWN0aW9uczogdHJ1ZQogICAgc2VsZl9jb250YWluZWQ6IHRydWUKICAgIHRoZW1lOiByZWFkYWJsZQogICAgdG9jOiB0cnVlCiAgICB0b2NfZmxvYXQ6CiAgICAgIGNvbGxhcHNlZDogZmFsc2UKICAgICAgc21vb3RoX3Njcm9sbDogZmFsc2UKLS0tCgoKYGBge3Igb3B0aW9ucywgaW5jbHVkZSA9IEZBTFNFfQpsaWJyYXJ5KGRwbHlyKQpsaWJyYXJ5KGZvcmNhdHMpCmxpYnJhcnkoZ2x1ZSkKbGlicmFyeShocGdsdG9vbHMpCmxpYnJhcnkodGlkeXIpCmRldnRvb2xzOjpsb2FkX2FsbCgifi9ocGdsdG9vbHMiKQprbml0cjo6b3B0c19rbml0JHNldChwcm9ncmVzcyA9IFRSVUUsIHZlcmJvc2UgPSBUUlVFLCB3aWR0aCA9IDkwLCBlY2hvID0gVFJVRSkKa25pdHI6Om9wdHNfY2h1bmskc2V0KAogIGVycm9yID0gVFJVRSwgZmlnLndpZHRoID0gOCwgZmlnLmhlaWdodCA9IDgsIGZpZy5yZXRpbmEgPSAyLAogIG91dC53aWR0aCA9ICIxMDAlIiwgZGV2ID0gInBuZyIsCiAgZGV2LmFyZ3MgPSBsaXN0KHBuZyA9IGxpc3QodHlwZSA9ICJjYWlyby1wbmciKSkpCm9sZF9vcHRpb25zIDwtIG9wdGlvbnMoZGlnaXRzID0gNCwgc3RyaW5nc0FzRmFjdG9ycyA9IEZBTFNFLCBrbml0ci5kdXBsaWNhdGUubGFiZWwgPSAiYWxsb3ciKQpnZ3Bsb3QyOjp0aGVtZV9zZXQoZ2dwbG90Mjo6dGhlbWVfYncoYmFzZV9zaXplID0gMTIpKQp2ZXIgPC0gU3lzLmdldGVudigiVkVSU0lPTiIpCnJ1bmRhdGUgPC0gZm9ybWF0KFN5cy5EYXRlKCksIGZvcm1hdCA9ICIlWSVtJWQiKQoKcm1kX2ZpbGUgPC0gIjAxZGF0YXNldHMuUm1kIgpzYXZlZmlsZSA8LSBnc3ViKHBhdHRlcm4gPSAiXFwuUm1kIiwgcmVwbGFjZSA9ICJcXC5yZGFcXC54eiIsIHggPSBybWRfZmlsZSkKbWV0aG9kcyA8LSBsaXN0KCJiYXNpYyIgPSBUUlVFLCAiZGVzZXEiID0gVFJVRSwgImRyZWFtIiA9IEZBTFNFLAogICAgICAgICAgICAgICAgImVic2VxIiA9IEZBTFNFLCAiZWRnZXIiID0gVFJVRSwgImxpbW1hIiA9IFRSVUUsICJub2lzZXEiID0gVFJVRSkKZGF0YV9zdHJ1Y3R1cmVzIDwtIGMoIm1ldGhvZHMiKQpgYGAKCiMgSW50cm9kdWN0aW9uCgpUYWtlIGEgbG9vayBhdCB0aGVzZSBzYW1wbGVzLiAgSSB3aWxsIHNwZW5kIGEgZmV3IG1pbnV0ZXMgcmVhZGluZwp0aGVpciBwcmV2aW91cyBkb2N1bWVudCB0byBmaWxsIHRoaXMgaW50cm8gb3V0LgoKIyBBbm5vdGF0aW9uIGluZm9ybWF0aW9uCgpJIHVzZWQgdGhlIGVuc2VtYmwgbW0zOSByZWxlYXNlIDExMi4gIFRoYXQgc2hvdWxkIHdvcmsgZmluZCB3aXRoIHRoZQptb3N0IHJlY2VudCBiaW9tYXJ0IGdlbmUgYW5ub3RhdGlvbnMgKDIwMjQxMSkuCgpGb3IgdGhlIG1vbWVudCwgdGhlIHVzZWFzdCBzZXJ2ZXIgc2VlbXMgdG8gYmUgZG93biwgSSB0aGluayBJIHdpbGwKanVzdCB1c2UgdGhlIGFyY2hpdmUgdW50aWwgdGhhdCBpcyBmaXhlZC4KCmBgYHtyfQojI21tX2Fubm90IDwtIGxvYWRfYmlvbWFydF9hbm5vdGF0aW9ucyhzcGVjaWVzID0gIm1tdXNjdWx1cyIsIGFyY2hpdmUgPSBGQUxTRSkKbW1fYW5ub3QgPC0gbG9hZF9iaW9tYXJ0X2Fubm90YXRpb25zKHNwZWNpZXMgPSAibW11c2N1bHVzIikKbW1fZ2VuZXMgPC0gbW1fYW5ub3RbWyJnZW5lX2Fubm90YXRpb25zIl1dCmBgYAoKIyBFeHRyYWN0IG1ldGFkYXRhIGxvZ3MKCkkgd2lsbCBwcm9iYWJseSBkaXNhYmxlIHRoZSBmb2xsb3dpbmcgYmxvY2sgb25jZSBJIGhhdmUgdGhlIHN0YXJ0aW5nCm1ldGFkYXRhIGluIGEgY29tcGxldGVkIHN0YXRlIHNvIHRoYXQgaWYgc29tZW9uZSB3YW50cyB0byBzZWUgYWxsIHRoZQpudW1iZXJzIG9mIG1hcHBlZCByZWFkcyBldGMgd2l0aG91dCBoYXZpbmcgbXkgd29ya2luZyB0cmVlIHRoZXkgd2lsbApzdGlsbCBiZSBhdmFpbGFibGUuCgpgYGB7cn0Kc2FtcGxlX3NoZWV0IDwtICJzYW1wbGVfc2hlZXRzL1R0UjFfbWV0YWRhdGEueGxzeCIKYGBgCgpgYGB7ciwgZXZhbD1GQUxTRX0KbWV0YSA8LSBnYXRoZXJfcHJlcHJvY2Vzc2luZ19tZXRhZGF0YSgic2FtcGxlX3NoZWV0cy9hbGxfc2FtcGxlcy54bHN4Iiwgc3BlY2llcyA9ICJtbTM5XzExMiIsCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgbmV3X21ldGFkYXRhID0gc2FtcGxlX3NoZWV0KQpgYGAKClRoaXMgcmVzdWx0cyBpbiBhIG5ldyBjb2x1bW4gJ2hpc2F0X2NvdW50X3RhYmxlJywgc28gbGV0IHVzIHVzZSB0aGF0LCBhbmQgdGhlIGdlbmUKYW5ub3RhdGlvbnMsIHRvIGNyZWF0ZSBhbiBleHByZXNzaW9uc2V0LgoKIyBDcmVhdGUgRXhwcmVzc2lvbnNldAoKYGBge3J9Cm1tX2V4cHQgPC0gY3JlYXRlX2V4cHQoc2FtcGxlX3NoZWV0LAogICAgICAgICAgICAgICAgICAgICAgIGdlbmVfaW5mbyA9IG1tX2dlbmVzLAogICAgICAgICAgICAgICAgICAgICAgIGZpbGVfY29sdW1uID0gImhpc2F0X2NvdW50X3RhYmxlIikgJT4lCiAgc2V0X2V4cHRfY29uZGl0aW9ucyhmYWN0ID0gImZsb3giKQptbV9leHB0CmNvbWJpbmVkIDwtIHBhc3RlMChwRGF0YShtbV9leHB0KVtbInN0YXRlIl1dLCAiXyIsCiAgICAgICAgICAgICAgICAgICBwRGF0YShtbV9leHB0KVtbImxpbmUiXV0pCnBEYXRhKG1tX2V4cHQpW1siY29tYmluZWQiXV0gPC0gY29tYmluZWQKcGxvdF9tZXRhX3NhbmtleShtbV9leHB0LCBmYWN0b3JzID0gYygiZmxveCIsICJsaW5lIiwgImJhdGNoIikpCmBgYAoKIyBJbml0aWFsIHZpc3VhbGl6YXRpb24KCmBgYHtyfQpwbG90X2xpYnNpemUobW1fZXhwdCkKcGxvdF9ub256ZXJvKG1tX2V4cHQpCnBsb3RfY29yaGVhdChtbV9leHB0KQpgYGAKCiMgTm9ybWFsaXplZCB2aXN1YWxpemF0aW9uCgpgYGB7cn0KbW1fbm9ybSA8LSBub3JtYWxpemVfZXhwdChtbV9leHB0LCB0cmFuc2Zvcm0gPSAibG9nMiIsIGNvbnZlcnQgPSAiY3BtIiwKICAgICAgICAgICAgICAgICAgICAgICAgICBub3JtID0gInF1YW50IiwgZmlsdGVyID0gVFJVRSkKcGxvdF9kaXNoZWF0KG1tX25vcm0pCnBsb3RfcGNhKG1tX25vcm0pCmBgYAoKV293LCB0aGVzZSBzYW1wbGVzIGFyZSBjbHVzdGVyaW5nIGFic3VyZGx5IHdlbGwuCgojIFNldCB1cCBzb21lIGNvbnRyYXN0cwoKYGBge3J9Cm1tX2NvbWJpbmVkIDwtIHNldF9leHB0X2NvbmRpdGlvbnMobW1fZXhwdCwgZmFjdCA9ICJjb21iaW5lZCIpCmtlZXBlcnMgPC0gbGlzdCgKICAibDFhIiA9IGMoImtvX2wxYSIsICJjb250cm9sX2wxYSIpLAogICJsMmIiID0gYygia29fbDJiIiwgImNvbnRyb2xfbDJiIiksCiAgImxzOSIgPSBjKCJrb19sczkiLCAiY29udHJvbF9sczkiKSkKbW1fZGUgPC0gYWxsX3BhaXJ3aXNlKG1tX2NvbWJpbmVkLCBrZWVwZXJzID0ga2VlcGVycywga2VlcF91bmRlcnNjb3JlID0gVFJVRSkKbW1fZGUKbW1fdGFibGVzIDwtIGNvbWJpbmVfZGVfdGFibGVzKG1tX2RlLCBleGNlbCA9ICJleGNlbC90aHJlZV9jb250cmFzdHMueGxzeCIpCm1tX3RhYmxlcwptbV9zaWcgPC0gZXh0cmFjdF9zaWduaWZpY2FudF9nZW5lcyhtbV90YWJsZXMsIGV4Y2VsID0gImV4Y2VsL3RocmVlX2NvbnRyYXN0cy1zaWcueGxzeCIpCm1tX3NpZwpgYGAKCiMgVGFrZSBhIGxvb2sgd2l0aCBncHJvZmlsZXIKCmBgYHtyfQptbV9ncCA8LSBhbGxfZ3Byb2ZpbGVyKG1tX3NpZywgc3BlY2llcyA9ICJtbXVzY3VsdXMiLCBleGNlbCA9ICJleGNlbC9hbGxfZ3AueGxzeCIpCm1tX2dwCmBgYAo=