I downloaded a new version of cellranger along with the various reference files provided by 10x for the VD(J) references etc. I got a bit distracted by the pipeline language implemented by 10x called ‘martian’. I have the feeling that it might prove a good thing to play with.
Here are the commands I ran to separate the samples and perform the alignments. There are 4 sample names and each was done with one run of the ‘normal’ GEX scRNASeq method and one of the (new to me) V(D)J library.
The sample names are below, but the types are GEX
module add cellranger
names="A_Control A_IM_Mex09 A_IN_Mex09 A_Mock_Mex09"
for name in names; do
echo $name
cellranger count --fastqs=preprocessing \
--transcriptome=/sw/local/cellranger/202212/reference/refdata-gex-mm10-2020-A \
--id="${name}_GEX" --sample="${name}_GEX"
cellranger vdj --fastqs preprocessing/ \
--reference /sw/local/cellranger/202212/reference/refdata-cellranger-vdj-GRCm38-alts-ensembl-7.0.0/ \
--id="${name}_VDJ" --sample="${name}_VDJ"
done
April kindly sent some information from 10x which shows that I should have used the multi pipline when preprocessing the data.
Intra-Muscular vs. Nasal
I wrote 4 separate configuration csv files using the templates I downloaded and following a little reading. It seemed to me that I should be able to process them all as a single csv file, but when I attempted that, cellranger did not react well. It also took a few tries before I got the various reference/library options correct.
Note that once cellranger successfully ran for the samples I moved them to the multi/ directory so that I can compare the outputs to when I simply did the ‘count’ operation.
The following invocations of cellranger all appear to work without any problems. Ideally I would like them to be done in a single run, though.
module add cellranger
cellranger multi --id control --csv sample_sheets/multi_config_try03_control.csv
cellranger multi --id mock --csv sample_sheets/multi_config_try03_mock.csv
cellranger multi --id m_vaccine --csv sample_sheets/multi_config_try03_m.csv
cellranger multi --id n_vaccine --csv sample_sheets/multi_config_try03_n.csv
mv control mock m_vaccine n_vaccine multi/
The following command, which in theory should do the preprocessing of all samples together, fails. It seems my understanding of the multi csv input is incorrect.
module add cellranger
rm -rf run01 && cellranger multi --id run01 --csv sample_sheets/multi_config_try04_all.csv
## Error in running command bash
read.csv("sample_sheets/multi_config_try04_all.csv", comment.char="#")
## X.gene.expression.
## 1 reference
## 2 no-target-umi-filter
## 3 chemistry
## 4 expect-cells
## 5 include-introns
## 6 no-secondary
## 7 no-bam
## 8 min-assignment-confidence
## 9
## 10
## 11
## 12
## 13
## 14
## 15
## 16 [feature]
## 17
## 18
## 19
## 20
## 21 [vdj]
## 22 reference
## 23
## 24
## 25
## 26
## 27
## 28
## 29
## 30
## 31 [libraries]
## 32 fastq_id
## 33 feature_types
## 34
## 35
## 36 A_Control_GEX
## 37 gene expression
## 38
## 39
## 40 A_Mock_Mex09_GEX
## 41 gene expression
## 42
## 43
## 44 A_IM_Mex09_GEX
## 45 gene expression
## 46
## 47
## 48 A_IN_Mex09_GEX
## 49 gene expression
## 50
## 51
## 52 A_Control_VDJ
## 53 vdj
## 54
## 55
## 56 A_Mock_Mex09_VDJ
## 57 vdj
## 58
## 59
## 60 A_IM_Mex09_VDJ
## 61 vdj
## 62
## 63
## 64 A_IN_Mex09_VDJ
## 65 vdj
## 66
## 67
## X
## 1 /sw/local/cellranger/202212/reference/refdata-gex-mm10-2020-A
## 2 false
## 3 auto
## 4 10000
## 5 false
## 6 false
## 7 false
## 8 0.9
## 9
## 10
## 11
## 12
## 13
## 14
## 15
## 16
## 17
## 18
## 19
## 20
## 21
## 22 /sw/local/cellranger/202212/reference/refdata-cellranger-vdj-GRCm38-alts-ensembl-7.0.0/
## 23
## 24
## 25
## 26
## 27
## 28
## 29
## 30
## 31
## 32 fastqs
## 33 subsample_rate
## 34
## 35
## 36 preprocessing
## 37 1
## 38
## 39
## 40 preprocessing
## 41 1
## 42
## 43
## 44 preprocessing
## 45 1
## 46
## 47
## 48 preprocessing
## 49 1
## 50
## 51
## 52 preprocessing
## 53 1
## 54
## 55
## 56 preprocessing
## 57 1
## 58
## 59
## 60 preprocessing
## 61 1
## 62
## 63
## 64 preprocessing
## 65 1
## 66
## 67
## X.1
## 1
## 2
## 3
## 4
## 5
## 6
## 7
## 8 # The minimum estimated likelihood to call a sample as tagged with a Cell Multiplexing Oligo instead of Unassigned". Optional. Default: 0.9. Introduced in Cell Ranger 6.0.2."
## 9
## 10
## 11
## 12
## 13
## 14
## 15
## 16
## 17
## 18
## 19
## 20
## 21
## 22
## 23
## 24
## 25
## 26
## 27
## 28
## 29
## 30
## 31
## 32 physical_library_id
## 33
## 34
## 35
## 36 gex
## 37
## 38
## 39
## 40 gex
## 41
## 42
## 43
## 44 gex
## 45
## 46
## 47
## 48 gex
## 49
## 50
## 51
## 52 vdj
## 53
## 54
## 55
## 56 vdj
## 57
## 58
## 59
## 60 vdj
## 61
## 62
## 63
## 64 vdj
## 65
## 66
## 67
Here is a snippet of code copy-pasted from:
which, if I read it correctly, will read in the vdj output and add it as a set of annotations to the seurat object. Note, I made some changes to it because I simply cannot help myself.
<- function(seurat_obj, name = "control", type = "t") {
add_clonotype <- glue::glue("multi/{name}/outs/per_sample_outs/{name}/vdj_t")
vdj_directory <- glue::glue("{vdj_directory}/filtered_contig_annotations.csv")
vdj_csv <- glue::glue("{vdj_directory}/clonotypes.csv")
reference_csv <- readr::read_csv(vdj_csv, show_col_types = FALSE)
tcr <- readr::read_csv(reference_csv, show_col_types = FALSE)
ref
<- duplicated(tcr[["barcode"]])
tcr_duplicate_barcode_idx <- tcr[!tcr_duplicate_barcode_idx, ]
tcr_nodup
<- as.data.frame(merge(tcr_nodup, ref, by.x = "raw_clonotype_id", by.y = "clonotype_id"))
both rownames(both) <- both[["barcode"]]
"barcode"]] <- NULL
both[[
<- Seurat::AddMetaData(object = seurat_obj, metadata = both)
new return(new)
}
The following block is mostly a cut/paste of itself where I set the (over)simplified name of each sample. This then becomes the template for the path and parameters used to read the data, create a seurat object, and add the clonotype data from the vdj run.
<- "control"
name <- glue::glue("multi/{name}/outs/per_sample_outs/{name}/count/sample_filtered_feature_bc_matrix")
path <- Seurat::Read10X(path) %>%
a_control CreateSeuratObject(project = name) %>%
add_clonotype(name = name)
<- "mock"
name <- glue::glue("multi/{name}/outs/per_sample_outs/{name}/count/sample_filtered_feature_bc_matrix")
path <- Seurat::Read10X(path) %>%
a_mock CreateSeuratObject(project = name) %>%
add_clonotype(name = name)
<- "m"
name <- glue::glue("multi/{name}/outs/per_sample_outs/{name}/count/sample_filtered_feature_bc_matrix")
path <- Seurat::Read10X(path) %>%
a_m CreateSeuratObject(project = name) %>%
add_clonotype(name = name)
<- "n"
name <- glue::glue("multi/{name}/outs/per_sample_outs/{name}/count/sample_filtered_feature_bc_matrix")
path <- Seurat::Read10X(path) %>%
a_n CreateSeuratObject(project = name) %>%
add_clonotype(name = name)
For the moment I want to be able to play with the individual samples as well as the aggregate so that I can better understand the data. So I guess it works out that I didn’t figure out how to run all the samples at the same time via ‘cellranger multi’.
<- merge(a_control, a_mock) %>%
all merge(a_m) %>%
merge(a_n)
## Warning in CheckDuplicateCellNames(object.list = objects): Some cell names are
## duplicated across objects provided. Renaming to enforce unique cell names.
## Warning in CheckDuplicateCellNames(object.list = objects): Some cell names are
## duplicated across objects provided. Renaming to enforce unique cell names.
<- !is.na(a_control[["raw_clonotype_id"]])
control_clono summary(control_clono)
## raw_clonotype_id
## Mode :logical
## FALSE:12601
## TRUE :1937
<- !is.na(a_mock[["raw_clonotype_id"]])
mock_clono summary(mock_clono)
## raw_clonotype_id
## Mode :logical
## FALSE:10480
## TRUE :3014
<- !is.na(a_m[["raw_clonotype_id"]])
m_clono summary(m_clono)
## raw_clonotype_id
## Mode :logical
## FALSE:9256
## TRUE :4113
<- !is.na(a_n[["raw_clonotype_id"]])
n_clono summary(n_clono)
## raw_clonotype_id
## Mode :logical
## FALSE:5412
## TRUE :3033
<- as.factor(LETTERS[Idents(object=all)])
cluster_letters names(cluster_letters) <- colnames(x=all)
<- as.character(cluster_letters)
control <- control
mock <- control
m_vaccine <- control
n_vaccine
<- control == "A"
control_idx <- "Control"
control[control_idx] !control_idx] <- "Stimulated"
control[<- mock == "B"
mock_idx <- "Mock"
mock[mock_idx] !mock_idx] <- "Not mock"
mock[<- m_vaccine == "C"
m_vaccine_idx <- "M"
m_vaccine[m_vaccine_idx] !m_vaccine_idx] <- "Not M"
m_vaccine[<- n_vaccine == "D"
n_vaccine_idx <- "N"
n_vaccine[n_vaccine_idx] !n_vaccine_idx] <- "Not N"
n_vaccine[
<- AddMetaData(
all object=all,
metadata=cluster_letters,
col.name="cluster_letters")
<- AddMetaData(
all object=all,
metadata=control,
col.name="control")
<- AddMetaData(
all object=all,
metadata=mock,
col.name="mock")
<- AddMetaData(
all object=all,
metadata=m_vaccine,
col.name="M")
<- AddMetaData(
all object=all,
metadata=n_vaccine,
col.name="N")
"percent_mt"]] <- PercentageFeatureSet(all, pattern="^mt-")
all[["percent_ribo"]] <- PercentageFeatureSet(all, pattern="^Rp[sl]")
all[[VlnPlot(all, features="nFeature_RNA", pt.size=0)
VlnPlot(all, features="percent_mt", pt.size=0)
VlnPlot(all, features="percent_ribo", pt.size=0)
VlnPlot(all, features="nCount_RNA", pt.size=0)
FeatureScatter(all, "percent_ribo", "percent_mt")
FeatureScatter(all, "nCount_RNA", "nFeature_RNA")
FeatureScatter(all, "nCount_RNA", "nFeature_RNA")
## How many cells have specific chains associated with them
sum(!is.na(all$chain))
## [1] 12097
ncol(all)
## [1] 49846
## Get a summary of the counts / cell
summary(colSums(all@assays$RNA@counts))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 500 1301 2210 3421 4576 37440
dim(all)
## [1] 32285 49846
<- WhichCells(all, expression = nFeature_RNA > 200)
sufficient_rna_observed <- subset(all, cells = sufficient_rna_observed)
filt dim(filt)
## [1] 32285 49838
<- rowSums(all) > 3
sufficiently_observed_idx summary(sufficiently_observed_idx)
## Mode FALSE TRUE
## logical 12113 20172
<- subset(filt, features = rownames(all)[sufficiently_observed_idx])
filt dim(filt)
## [1] 20172 49838
## Keep cells with at least some ribosomal reads
## Note the Percent function above actually puts in a floating point
## number from 0-100, not (as I assumed from 0-1).
<- WhichCells(filt, expression = percent_ribo > 5)
high_ribosomal <- subset(filt, cells = high_ribosomal)
filt dim(filt)
## [1] 20172 37946
<- WhichCells(filt, expression = percent_mt < 20)
low_mitochondrial <- subset(filt, cells = low_mitochondrial)
filt dim(filt)
## [1] 20172 37929
## Now drop mitochondrial genes
<- grepl("^mt-", rownames(filt))
mt_idx summary(mt_idx)
## Mode FALSE TRUE
## logical 20159 13
<- filt[!mt_idx, ]
filt dim(filt)
## [1] 20159 37929
<- NormalizeData(object=all) %>%
all_norm FindVariableFeatures() %>%
ScaleData() %>%
RunPCA() %>%
FindNeighbors() %>%
FindClusters() %>%
RunTSNE() %>%
RunUMAP(reduction = "pca", dims = 1:10)
## Centering and scaling data matrix
## PC_ 1
## Positive: Ager, Sparc, Mettl7a1, Cldn18, Emp2, Cd63, Cystm1, Aqp5, Clic3, Krt7
## Gsn, Limch1, Cryab, Alcam, Selenbp1, Scnn1a, Selenbp2, Timp3, Krt18, Krt19
## Sec14l3, Gprc5a, Cyp2b10, Krt8, Igfbp6, Igfbp7, Mal2, Myh14, Npnt, Scd2
## Negative: Rac2, Coro1a, Arhgdib, Ms4a4b, Laptm5, Cd3e, Srgn, Selplg, Cd52, Cd3g
## Cd3d, Ptprc, Lat, Thy1, Ltb, Vps37b, Lck, Ms4a6b, Gimap3, Ptprcap
## Il7r, Sept1, Itgb7, Gramd3, Lsp1, Skap1, Cd53, Cytip, Cd2, Fyb
## PC_ 2
## Positive: Wfdc2, Atp1b1, Cbr2, Ctsh, Sftpd, Napsa, Cldn3, Sftpb, Cxcl15, Slc34a2
## Sftpa1, Chil1, Muc1, Lyz2, Sftpc, Lamp3, Chchd10, Ppp1r14c, Lgi3, Krt18
## Scd1, Cldn7, Ptprf, Car8, Irx1, Baiap2l1, Hc, Epcam, Krt8, Dram1
## Negative: Sod3, Serping1, Bgn, Prelp, Plpp3, Olfml3, Tcf21, Spon1, Hsd11b1, Fn1
## Rarres2, Loxl1, Mfap4, Plxdc2, Gpx3, Cdo1, Itga8, Pdgfra, Clec3b, Dpep1
## Col1a2, Pcolce2, Ptgis, Gyg, Col1a1, Inmt, Ogn, Fhl1, Fmo2, Lrp1
## PC_ 3
## Positive: Rtkn2, Spock2, Scnn1g, Igfbp2, Pdpn, Col4a3, Sema3e, Cyp2b10, Scnn1b, Col4a4
## Flrt3, Hopx, Cytl1, Lama3, Tacstd2, Ndnf, Tmem37, Itgb6, Krt7, Krt19
## Lamb3, Ptpre, Bdnf, Cyp4b1, Clic5, Crlf1, Cdkn2b, Mab21l4, Scnn1a, Hs2st1
## Negative: Cbr2, Cxcl15, Slc34a2, Sftpb, Sftpd, Chil1, Sftpa1, Cldn3, Napsa, Dram1
## Lyz2, Lamp3, Sftpc, Muc1, Ppp1r14c, Scd1, Mgst1, Hc, Lgi3, Car8
## Nupr1, Fabp5, S100g, Ctsc, Egfl6, Hp, Cpm, Atp1b1, Fasn, Chia1
## PC_ 4
## Positive: Cd74, H2-Ab1, H2-Aa, Slc34a2, Sftpc, H2-Eb1, Napsa, Sftpa1, Sftpb, Lamp3
## Cxcl15, Lyz2, Hc, Chil1, Lgi3, Scd1, Sftpd, Lpcat1, Cd93, Dram1
## S100g, Ly6c1, Egfl6, Tmem252, Sema3c, Pla2g1b, Etv5, Car4, Chia1, Ctsc
## Negative: Fam183b, Foxj1, Tmem212, Cyp2s1, 1110017D15Rik, Gm19935, 1700007K13Rik, Ccdc153, Cfap126, Tctex1d4
## Spaca9, 1700094D03Rik, AU040972, BC051019, Rsph1, Dnali1, Dynlrb2, Nme5, Mns1, Gm867
## 1700016K19Rik, Pifo, Arhgdig, 1700001C02Rik, Ccdc113, Tekt4, Mlf1, Odf3b, 2410004P03Rik, Sntn
## PC_ 5
## Positive: Coro1a, Rac2, Lgals1, Cd3e, Arhgdib, Laptm5, Cd52, Ms4a4b, Selplg, Cd3g
## Ifi27l2a, Ptprc, Thy1, Ptprcap, Lsp1, Cd3d, Lat, Sept1, Lck, Ltb
## Cytip, Ms4a6b, Itgb7, Gimap3, Skap1, Icos, Il2rb, Cd2, Vps37b, Fyb
## Negative: Ly6a, Tm4sf1, Ly6c1, Ly6e, Cd93, Sema3c, Car4, Tmem252, Kdr, Stmn2
## Plaur, Id1, Tmcc2, H2-Ab1, Lyve1, Gja4, Efnb2, Fam183b, Hspa1b, Hspb1
## Cyp4b1, Foxj1, Odf3b, Tbx3, Cd74, Tmem212, Rgs12, 1110017D15Rik, Ccdc153, 1700007K13Rik
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 49846
## Number of edges: 1551538
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9142
## Number of communities: 27
## Elapsed time: 26 seconds
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 17:03:42 UMAP embedding parameters a = 0.9922 b = 1.112
## 17:03:42 Read 49846 rows and found 10 numeric columns
## 17:03:42 Using Annoy for neighbor search, n_neighbors = 30
## 17:03:42 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 17:03:51 Writing NN index file to temp file /tmp/RtmpQMNruO/file1f36bb6741cd59
## 17:03:51 Searching Annoy index using 1 thread, search_k = 3000
## 17:04:19 Annoy recall = 100%
## 17:04:21 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 17:04:27 Initializing from normalized Laplacian + noise (using irlba)
## 17:04:37 Commencing optimization for 200 epochs, with 2065552 positive edges
## 17:05:05 Optimization finished
DimPlot(object=all_norm, reduction="tsne")
<- DimPlot(all_norm, reduction="umap", group.by="cluster_letters", label=TRUE)
plotted plotted
<- NormalizeData(object=filt) %>%
filt_norm FindVariableFeatures() %>%
ScaleData()
## Centering and scaling data matrix
<- JackStraw(filt_norm, num.replicate=10) filt_norm
## Error: Cannot find 'pca' in this Seurat object
<- ScoreJackStraw(filt_norm) filt_norm
## Error: Cannot find 'pca' in this Seurat object
JackStrawPlot(filt_norm)
## Error: Cannot find 'pca' in this Seurat object
ElbowPlot(filt_norm)
## Error: Cannot find 'pca' in this Seurat object
## So I am thinking maybe 4-10?
<- 6
wanted_dims
##filt_norm <- filt_norm %>%
## RunPCA() %>%
## FindNeighbors(k.param = wanted_dims) %>%
## FindClusters(resolution = 0.75) %>%
## RunTSNE() %>%
## RunUMAP(reduction = "pca", dims = 1:wanted_dims)
<- FindNeighbors(filt_norm, dims = 1:wanted_dims) %>%
test FindClusters(resolution = 0.5)
## Error: Cannot find 'pca' in this Seurat object
<- StashIdent(test, save.name = "old_ident") test
## Error in StashIdent(test, save.name = "old_ident"): object 'test' not found
<- FindClusters(filt_norm, resolution = 0.1) %>%
test2 FindNeighbors(k.param = 6, do.plot = TRUE)
## Error in FindClusters.Seurat(filt_norm, resolution = 0.1): Provided graph.name not present in Seurat object
<- RunUMAP(test2, dims = 1:9) test3
## Error in RunUMAP(test2, dims = 1:9): object 'test2' not found
DimPlot(test3, label = TRUE)
## Error in is(x, "classRepresentation"): object 'test3' not found
<- test2 concat
## Error in eval(expr, envir, enclos): object 'test2' not found
<- concat[["orig.ident"]][["orig.ident"]] identity_vector
## Error in eval(expr, envir, enclos): object 'concat' not found
class(identity_vector)
## Error in eval(expr, envir, enclos): object 'identity_vector' not found
<- as.character(concat[["seurat_clusters"]][["seurat_clusters"]]) cluster_vector
## Error in eval(expr, envir, enclos): object 'concat' not found
<- paste0(identity_vector, "_", cluster_vector) concatenated
## Error in paste0(identity_vector, "_", cluster_vector): object 'identity_vector' not found
"cluster_sample"]] <- concatenated concat[[
## Error in eval(expr, envir, enclos): object 'concatenated' not found
<- FindMarkers(concat, group.by = "cluster_sample",
controlm_0 ident.1 = "control_0", ident.2 = "m_0")
## Error in FindMarkers(concat, group.by = "cluster_sample", ident.1 = "control_0", : object 'concat' not found
summary(controlm_0)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'summary': object 'controlm_0' not found
DimPlot(object=filt_norm, reduction="tsne", group.by = "orig.ident", label = TRUE)
## Error: Cannot find 'tsne' in this Seurat object
<- DimPlot(filt_norm, reduction="umap", group.by="seurat_clusters", label=TRUE) plotted
## Error: Cannot find 'umap' in this Seurat object
plotted
<- FindVariableFeatures(object = filt_norm)
var <- head(VariableFeatures(var), 30)
most_var <- VariableFeaturePlot(var)
variable_plot <- LabelPoints(plot = variable_plot, points = most_var, repel = TRUE) variable_plot
## When using repel, set xnudge and ynudge to 0 for optimal results
variable_plot
## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: ggrepel: 12 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
DefaultAssay(filt_norm) <- "RNA"
<- FindConservedMarkers(test2,
markers ident.1 = "control",
ident.2 = "m",
grouping.var = "seurat_clusters",
verbose=TRUE)
## Error in FetchData(object = object, vars = grouping.var): object 'test2' not found
<- FindMarkers(filt_norm, group.by = "orig.ident",
mock_vs_control ident.1 = "mock",
ident.2 = "control")
head(mock_vs_control)
## p_val avg_log2FC pct.1 pct.2 p_val_adj
## Stat1 0 1.5039 0.496 0.157 0
## Sp100 0 1.0433 0.415 0.161 0
## B2m 0 0.9737 0.967 0.863 0
## Zbp1 0 1.5934 0.346 0.026 0
## Gbp7 0 1.0010 0.387 0.142 0
## Gbp3 0 0.7871 0.223 0.044 0
<- FindMarkers(filt_norm, group.by = "orig.ident",
muscular_vs_mock ident.1 = "m",
ident.2 = "mock")
summary(muscular_vs_mock)
## p_val avg_log2FC pct.1 pct.2
## Min. :0.0000 Min. :-2.9002 Min. :0.028 Min. :0.065
## 1st Qu.:0.0000 1st Qu.:-0.3886 1st Qu.:0.186 1st Qu.:0.221
## Median :0.0000 Median :-0.2541 Median :0.352 Median :0.384
## Mean :0.0021 Mean :-0.0509 Mean :0.426 Mean :0.435
## 3rd Qu.:0.0000 3rd Qu.: 0.3379 3rd Qu.:0.612 3rd Qu.:0.592
## Max. :0.3900 Max. : 0.9502 Max. :0.991 Max. :0.990
## p_val_adj
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.0489
## 3rd Qu.:0.0000
## Max. :1.0000
<- FindMarkers(filt, group.by = "orig.ident",
nasal_vs_mock min.pct = 0.25, ident.1 = "n",
ident.2 = "mock")
summary(nasal_vs_mock)
## p_val avg_log2FC pct.1 pct.2
## Min. :0.0000 Min. :-0.534 Min. :0.176 Min. :0.094
## 1st Qu.:0.0000 1st Qu.: 0.293 1st Qu.:0.410 1st Qu.:0.337
## Median :0.0000 Median : 0.355 Median :0.554 Median :0.482
## Mean :0.0038 Mean : 0.405 Mean :0.589 Mean :0.521
## 3rd Qu.:0.0000 3rd Qu.: 0.458 3rd Qu.:0.739 3rd Qu.:0.678
## Max. :0.7851 Max. : 1.404 Max. :0.997 Max. :0.998
## p_val_adj
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.0306
## 3rd Qu.:0.0000
## Max. :1.0000
FeaturePlot(filt, features=c("Rgcc"),
split.by="orig.ident", max.cutoff=3, cols=c("darkgreen", "darkred"))
## Error: Unable to find a DimReduc matching one of 'umap', 'tsne', or 'pca', please specify a dimensional reduction to use
This is a neat idea, I think we can repurpose it to immunology gene sets.
<- CellCycleScoring(
filt object = filt_norm,
g2m.features = cc.genes$g2m.genes,
s.features = cc.genes$s.genes)
## Warning: The following features are not present in the object: MCM5, PCNA, TYMS,
## FEN1, MCM2, MCM4, RRM1, UNG, GINS2, MCM6, CDCA7, DTL, PRIM1, UHRF1, MLF1IP,
## HELLS, RFC2, RPA2, NASP, RAD51AP1, GMNN, WDR76, SLBP, CCNE2, UBR7, POLD3, MSH2,
## ATAD2, RAD51, RRM2, CDC45, CDC6, EXO1, TIPIN, DSCC1, BLM, CASP8AP2, USP1, CLSPN,
## POLA1, CHAF1B, BRIP1, E2F8, not searching for symbol synonyms
## Warning: The following features are not present in the object: HMGB2, CDK1,
## NUSAP1, UBE2C, BIRC5, TPX2, TOP2A, NDC80, CKS2, NUF2, CKS1B, MKI67, TMPO, CENPF,
## TACC3, FAM64A, SMC4, CCNB2, CKAP2L, CKAP2, AURKB, BUB1, KIF11, ANP32E, TUBB4B,
## GTSE1, KIF20B, HJURP, CDCA3, HN1, CDC20, TTK, CDC25C, KIF2C, RANGAP1, NCAPD2,
## DLGAP5, CDCA2, CDCA8, ECT2, KIF23, HMMR, AURKA, PSRC1, ANLN, LBR, CKAP5, CENPE,
## CTCF, NEK2, G2E3, GAS2L3, CBX5, CENPA, not searching for symbol synonyms
## Warning in AddModuleScore(object = object, features = features, name = name, :
## Could not find enough features in the object from the following feature lists:
## S.Score Attempting to match case...Could not find enough features in the object
## from the following feature lists: G2M.Score Attempting to match case...
VlnPlot(filt, features = c("S.Score", "G2M.Score"),
group.by = "orig.ident",
ncol = 4, pt.size = 0)
Having written the following I realized I used an older version of my mSigDB reference… FIXME: Redo it with the 7.5+ data.
<- load_gmt_signatures(signatures = "reference/c7.all.v7.2.symbols.gmt",
broad_c2 signature_category = "c2")
<- grepl(x = names(broad_c2), pattern = "FLU")
influenza_categories_idx <- broad_c2[influenza_categories_idx]
influenza_categories head(names(influenza_categories))
## [1] "GSE29614_CTRL_VS_TIV_FLU_VACCINE_PBMC_2007_UP"
## [2] "GSE29614_CTRL_VS_TIV_FLU_VACCINE_PBMC_2007_DN"
## [3] "GSE29614_CTRL_VS_DAY3_TIV_FLU_VACCINE_PBMC_UP"
## [4] "GSE29614_CTRL_VS_DAY3_TIV_FLU_VACCINE_PBMC_DN"
## [5] "GSE29614_CTRL_VS_DAY7_TIV_FLU_VACCINE_PBMC_UP"
## [6] "GSE29614_CTRL_VS_DAY7_TIV_FLU_VACCINE_PBMC_DN"
## Note that the gene IDs in the Seurat datastructure are all
## Firstlettercapital, while those in Broad are ALLUPPER...
summary(geneIds(influenza_categories[[1]]) %in% toupper(rownames(filt)))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'summary': error in evaluating the argument 'x' in selecting a method for function '%in%': could not find function "geneIds"
summary(stringr::str_to_title(geneIds(influenza_categories[[1]])) %in%
rownames(filt))
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'summary': error in evaluating the argument 'x' in selecting a method for function '%in%': could not find function "geneIds"
names(influenza_categories)[1]
## [1] "GSE29614_CTRL_VS_TIV_FLU_VACCINE_PBMC_2007_UP"
<- list("up" = geneIds(influenza_categories[[17]]),
test_features "down" = geneIds(influenza_categories[[18]]))
## Error in geneIds(influenza_categories[[17]]): could not find function "geneIds"
<- AddModuleScore(object = filt_norm, features = test_features,
wtf name = "influenza1")
## Error in AddModuleScore(object = filt_norm, features = test_features, : object 'test_features' not found
VlnPlot(wtf, features = c("influenza11", "influenza12"),
group.by = "orig.ident", same.y.lims = TRUE,
ncol = 4, pt.size = 0.01)
## Error in DefaultAssay(object = object): object 'wtf' not found
<- list("up" = geneIds(influenza_categories[[27]]),
test_features "down" = geneIds(influenza_categories[[28]]))
## Error in geneIds(influenza_categories[[27]]): could not find function "geneIds"
<- AddModuleScore(object = filt_norm, features = test_features,
wtf name = "influenza2")
## Error in AddModuleScore(object = filt_norm, features = test_features, : object 'test_features' not found
VlnPlot(wtf, features = c("influenza21", "influenza22"),
group.by = "orig.ident",
ncol = 4, pt.size = 0.01)
## Error in DefaultAssay(object = object): object 'wtf' not found
::pander(sessionInfo()) pander
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C
attached base packages: stats4, stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: ggplot2(v.3.4.0), SeuratObject(v.4.1.3), Seurat(v.4.3.0), hpgltools(v.1.0), testthat(v.3.1.6), reticulate(v.1.26), SummarizedExperiment(v.1.28.0), GenomicRanges(v.1.50.2), GenomeInfoDb(v.1.34.6), IRanges(v.2.32.0), S4Vectors(v.0.36.1), MatrixGenerics(v.1.10.0), matrixStats(v.0.63.0), Biobase(v.2.58.0) and BiocGenerics(v.0.44.0)
loaded via a namespace (and not attached): ica(v.1.0-3), ps(v.1.7.2), Rsamtools(v.2.14.0), foreach(v.1.5.2), lmtest(v.0.9-40), rprojroot(v.2.0.3), crayon(v.1.5.2), rbibutils(v.2.2.11), MASS(v.7.3-58.1), nlme(v.3.1-161), backports(v.1.4.1), sva(v.3.46.0), GOSemSim(v.2.24.0), rlang(v.1.0.6), XVector(v.0.38.0), HDO.db(v.0.99.1), ROCR(v.1.0-11), irlba(v.2.3.5.1), nloptr(v.2.0.3), callr(v.3.7.3), limma(v.3.54.0), filelock(v.1.0.2), BiocParallel(v.1.32.5), rjson(v.0.2.21), bit64(v.4.0.5), glue(v.1.6.2), sctransform(v.0.3.5), pbkrtest(v.0.5.1), parallel(v.4.2.0), processx(v.3.8.0), vipor(v.0.4.5), spatstat.sparse(v.3.0-0), AnnotationDbi(v.1.60.0), DOSE(v.3.24.2), spatstat.geom(v.3.0-3), tidyselect(v.1.2.0), usethis(v.2.1.6), fitdistrplus(v.1.1-8), variancePartition(v.1.28.0), XML(v.3.99-0.13), tidyr(v.1.2.1), zoo(v.1.8-11), qqconf(v.1.3.1), GenomicAlignments(v.1.34.0), xtable(v.1.8-4), magrittr(v.2.0.3), evaluate(v.0.19), Rdpack(v.2.4), cli(v.3.5.0), zlibbioc(v.1.44.0), sn(v.2.1.0), rstudioapi(v.0.14), miniUI(v.0.1.1.1), sp(v.1.5-1), bslib(v.0.4.2), mathjaxr(v.1.6-0), fastmatch(v.1.1-3), aod(v.1.3.2), treeio(v.1.22.0), shiny(v.1.7.4), xfun(v.0.36), multtest(v.2.54.0), pkgbuild(v.1.4.0), gson(v.0.0.9), cluster(v.2.1.4), caTools(v.1.18.2), tidygraph(v.1.2.2), KEGGREST(v.1.38.0), tibble(v.3.1.8), ggrepel(v.0.9.2), ape(v.5.6-2), listenv(v.0.9.0), Biostrings(v.2.66.0), png(v.0.1-8), future(v.1.30.0), withr(v.2.5.0), bitops(v.1.0-7), ggforce(v.0.4.1), plyr(v.1.8.8), GSEABase(v.1.60.0), pillar(v.1.8.1), gplots(v.3.1.3), cachem(v.1.0.6), GenomicFeatures(v.1.50.3), multcomp(v.1.4-20), fs(v.1.5.2), clusterProfiler(v.4.6.0), vctrs(v.0.5.1), ellipsis(v.0.3.2), generics(v.0.1.3), devtools(v.2.4.5), metap(v.1.8), tools(v.4.2.0), beeswarm(v.0.4.0), munsell(v.0.5.0), tweenr(v.2.0.2), fgsea(v.1.24.0), DelayedArray(v.0.24.0), fastmap(v.1.1.0), compiler(v.4.2.0), pkgload(v.1.3.2), abind(v.1.4-5), httpuv(v.1.6.7), rtracklayer(v.1.58.0), sessioninfo(v.1.2.2), plotly(v.4.10.1), GenomeInfoDbData(v.1.2.9), gridExtra(v.2.3), edgeR(v.3.40.1), lattice(v.0.20-45), deldir(v.1.0-6), mutoss(v.0.1-12), utf8(v.1.2.2), later(v.1.3.0), dplyr(v.1.0.10), BiocFileCache(v.2.6.0), jsonlite(v.1.8.4), scales(v.1.2.1), graph(v.1.76.0), tidytree(v.0.4.2), pbapply(v.1.6-0), genefilter(v.1.80.2), lazyeval(v.0.2.2), promises(v.1.2.0.1), doParallel(v.1.0.17), R.utils(v.2.12.2), goftest(v.1.2-3), spatstat.utils(v.3.0-1), sandwich(v.3.0-2), rmarkdown(v.2.19), cowplot(v.1.1.1), Rtsne(v.0.16), pander(v.0.6.5), downloader(v.0.4), uwot(v.0.1.14), igraph(v.1.3.5), plotrix(v.3.8-2), numDeriv(v.2016.8-1.1), survival(v.3.4-0), yaml(v.2.3.6), htmltools(v.0.5.4), memoise(v.2.0.1), profvis(v.0.3.7), BiocIO(v.1.8.0), locfit(v.1.5-9.7), graphlayouts(v.0.8.4), viridisLite(v.0.4.1), digest(v.0.6.31), assertthat(v.0.2.1), RhpcBLASctl(v.0.21-247.1), mime(v.0.12), rappdirs(v.0.3.3), RSQLite(v.2.2.20), yulab.utils(v.0.0.6), future.apply(v.1.10.0), remotes(v.2.4.2), data.table(v.1.14.6), urlchecker(v.1.0.1), blob(v.1.2.3), R.oo(v.1.25.0), splines(v.4.2.0), labeling(v.0.4.2), RCurl(v.1.98-1.9), broom(v.1.0.2), hms(v.1.1.2), colorspace(v.2.0-3), mnormt(v.2.1.1), ggbeeswarm(v.0.7.1), aplot(v.0.1.9), ggrastr(v.1.0.1), sass(v.0.4.4), Rcpp(v.1.0.9), RANN(v.2.6.1), mvtnorm(v.1.1-3), enrichplot(v.1.18.3), fansi(v.1.0.3), tzdb(v.0.3.0), brio(v.1.1.3), parallelly(v.1.33.0), R6(v.2.5.1), grid(v.4.2.0), ggridges(v.0.5.4), lifecycle(v.1.0.3), TFisher(v.0.2.0), curl(v.4.3.3), minqa(v.1.2.5), leiden(v.0.4.3), jquerylib(v.0.1.4), PROPER(v.1.30.0), Matrix(v.1.5-3), qvalue(v.2.30.0), TH.data(v.1.1-1), desc(v.1.4.2), RcppAnnoy(v.0.0.20), RColorBrewer(v.1.1-3), iterators(v.1.0.14), spatstat.explore(v.3.0-5), stringr(v.1.5.0), htmlwidgets(v.1.6.0), polyclip(v.1.10-4), biomaRt(v.2.54.0), purrr(v.1.0.0), shadowtext(v.0.1.2), gridGraphics(v.0.5-1), mgcv(v.1.8-41), globals(v.0.16.2), patchwork(v.1.1.2), spatstat.random(v.3.0-1), progressr(v.0.12.0), codetools(v.0.2-18), GO.db(v.3.16.0), gtools(v.3.9.4), prettyunits(v.1.1.1), dbplyr(v.2.2.1), R.methodsS3(v.1.8.2), gtable(v.0.3.1), DBI(v.1.1.3), highr(v.0.10), ggfun(v.0.0.9), tensor(v.1.5), httr(v.1.4.4), KernSmooth(v.2.23-20), stringi(v.1.7.8), vroom(v.1.6.0), progress(v.1.2.2), reshape2(v.1.4.4), farver(v.2.1.1), annotate(v.1.76.0), viridis(v.0.6.2), ggtree(v.3.6.2), xml2(v.1.3.3), boot(v.1.3-28.1), lme4(v.1.1-31), restfulr(v.0.0.15), readr(v.2.1.3), ggplotify(v.0.1.0), scattermore(v.0.8), bit(v.4.0.5), scatterpie(v.0.1.8), spatstat.data(v.3.0-0), ggraph(v.2.1.0), pkgconfig(v.2.0.3) and knitr(v.1.41)
message(paste0("This is hpgltools commit: ", get_git_commit()))
## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset 911e7d4beebdc73267ec4be631a305574289efd3
## This is hpgltools commit: Tue Jan 17 10:36:44 2023 -0500: 911e7d4beebdc73267ec4be631a305574289efd3
<- paste0(gsub(pattern="\\.Rmd", replace="", x=rmd_file), "-v", ver, ".rda.xz")
this_save message(paste0("Saving to ", this_save))
## Saving to index-v202301.rda.xz
<- sm(saveme(filename=this_save)) tmp