I downloaded a new version of cellranger along with the various reference files provided by 10x for the VD(J) references etc. I got a bit distracted by the pipeline language implemented by 10x called ‘martian’. I have the feeling that it might prove a good thing to play with.
Here are the commands I ran to separate the samples and perform the alignments. There are 4 sample names and each was done with one run of the ‘normal’ GEX scRNASeq method and one of the (new to me) V(D)J library.
April kindly sent some information from 10x which shows that I should have used the multi pipline when preprocessing the data.
Intra-Muscular vs. Nasal
I wrote 4 separate configuration csv files using the templates I downloaded and following a little reading. It seemed to me that I should be able to process them all as a single csv file, but when I attempted that, cellranger did not react well. It also took a few tries before I got the various reference/library options correct.
Note that once cellranger successfully ran for the samples I moved them to the multi/ directory so that I can compare the outputs to when I simply did the ‘count’ operation.
The following invocations of cellranger all appear to work without any problems. Ideally I would like them to be done in a single run, though.
My attempts so far to use the csv configuration to concatenate multiple vdj libraries have not worked, so I chose to do it the stupid way, which is what I should have just done to begin with. Caveat, it works fine for the gex libraries to do it the way the documentation suggests.
cd preprocessing
for i in R1 R2; do
for j in Control Mock_Mex09 IM_Mex09 IN_Mex09; do
A_file=$(/bin/ls A_${j}_VDJ*_${i}_001.fastq.gz)
B_file=$(/bin/ls B_${j}_VDJ*_${i}_001.fastq.gz)
out_file="Concat_${j}_VDJ_${i}.fastq.gz"
cp_cmd="cp ${A_file} ${out_file}"
echo "Running: ${cp_cmd}."
eval $cp_cmd
cat_cmd="cat ${B_file} >> ${out_file}"
echo "Running: ${cat_cmd}."
eval $cat_cmd
done
done
module add cellranger
mkdir -p gex_vdj_ab_combined
cellranger multi --id control --csv cellranger_config/multi_config_try06_control.csv
mv control gex_vdj_ab_combined/
cellranger multi --id mock --csv cellranger_config/multi_config_try06_mock.csv
mv mock gex_vdj_ab_combined/
cellranger multi --id m --csv cellranger_config/multi_config_try06_m.csv
mv m gex_vdj_ab_combined/
cellranger multi --id n --csv cellranger_config/multi_config_try06_n.csv
mv n gex_vdj_ab_combined/
I wonder if I can put the gene annotations into the misc slot of the seurat data structure? And perhaps overload fData() to use it?
<- load_biomart_annotations()$annotation annotations
## The biomart annotations file already exists, loading from it.
<- unique(annotations[, c("hgnc_symbol", "description", "ensembl_gene_id")]) brief
#prefix <- "multi"
<- "gex_vdj_ab_combined" prefix
The following block is mostly a cut/paste of itself where I set the (over)simplified name of each sample. This then becomes the template for the path and parameters used to read the data, create a seurat object, and add the clonotype data from the vdj run.
I wrote a little function to make loading the Seurat data from a sample sheet easier. My intention is to have some of this code write back to that sample sheet.
In case I decide to play with the 4 samples separately, I immediately turned around and extracted all cells from each sample into separate data structures named control_scd, mock_scd, muscular_scd, and nasal_scd (I am using scd as a suffix to denote single-cell datastructure).
<- create_scd("sample_sheets/all_samples_202301.xlsx",
all_scd expression_column = "dataroot", types = c("gex", "ab"),
vdj_t_column = "vdjtcells")
## Did not find the batch column in the sample sheet.
## Filling it in as undefined.
## 10X data contains more than one type and is being returned as a list containing matrices of each type.
## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')
## 10X data contains more than one type and is being returned as a list containing matrices of each type.
## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')
## Warning in CheckDuplicateCellNames(object.list = objects): Some cell names are
## duplicated across objects provided. Renaming to enforce unique cell names.
## 10X data contains more than one type and is being returned as a list containing matrices of each type.
## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')
## 10X data contains more than one type and is being returned as a list containing matrices of each type.
## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')
## Warning: Some cell names are duplicated across objects provided. Renaming to
## enforce unique cell names.
table(as.factor(Idents(all_scd)))
##
## control m mock n
## 15362 14008 14088 8682
I want to take a couple minutes to add some annotations to the seurat object, notably I want to state the identity relationships with some sort of name.
Thus I will make a vector of the the sample IDs and for each one make a category of self/not-self. Note that Seurat comes with a function ‘FindAllMarkers()’ which compares each self to all other samples, so this may be redundant; but it is kind of nice to be able to see the categories as a set of binary indexes.
<- add_binary_states(all_scd) all_scd
I recorded into the xlsx sample sheet the number of cells with defined clonotypes according to cellranger. We should be able to immediately recapitulate those numbers by asking how many cells have a clonotype_id.
<- !is.na(all_scd[["raw_clonotype_id"]])
all_clono summary(all_clono)
## raw_clonotype_id
## Mode :logical
## FALSE:39782
## TRUE :12358
<- !is.na(all_scd[["raw_clonotype_id"]]) &
control_clono Idents(all_scd) == "control"
summary(control_clono)
## raw_clonotype_id
## Mode :logical
## FALSE:50165
## TRUE :1975
<- !is.na(all_scd[["raw_clonotype_id"]]) &
mock_clono Idents(all_scd) == "mock"
summary(mock_clono)
## raw_clonotype_id
## Mode :logical
## FALSE:49043
## TRUE :3097
<- !is.na(all_scd[["raw_clonotype_id"]]) &
m_clono Idents(all_scd) == "m"
summary(m_clono)
## raw_clonotype_id
## Mode :logical
## FALSE:47918
## TRUE :4222
<- !is.na(all_scd[["raw_clonotype_id"]]) &
n_clono Idents(all_scd) == "n"
summary(n_clono)
## raw_clonotype_id
## Mode :logical
## FALSE:49076
## TRUE :3064
The numbers above correspond to the column ‘vdjt_cells’ in the excel sample sheet.
We have some simple functions which can record some potentially useful simple metrics into the metadata of the single cell datastructure. The record_seurat_samples appends to our input samplesheet extra columns containing metrics which might be of interest. In the following example it adds a column of how many cells Seurat thinks there are. This should match what we observed from cellranger as well as the binary sorter above.
Following that, we count up the number of genes observed followed by mean values for:
For a little extra fun, if the verbose parameter is left TRUE, this function uses the new(to me) skim function, which provides a pretty summary of the category of interest. It therefore also makes it easy, if we want, to add SD or quartiles in addition to mean values.
<- is.na(all_scd[["raw_clonotype_id"]])
not_vdj_idx "tcell"]] <- "Yes"
all_scd[[@meta.data[not_vdj_idx, "tcell"] <- "No"
all_scd@meta.data[["tcell_sample"]] <- paste0(Idents(all_scd), "_",
all_scd@meta.data[["tcell"]]) all_scd
<- record_seurat_samples(all_scd, type = "num_cells") %>%
all_scd record_seurat_samples(type = "nFeature_RNA") %>%
record_seurat_samples(type = "nCount_RNA") %>%
record_seurat_samples(type = "reads", column_name = "clonotype_reads") %>%
record_seurat_samples(type = "pct_mito", pattern = "^mt-") %>%
record_seurat_samples(type = "pct_ribo", pattern = "^Rp[sl]") %>%
record_seurat_samples(type = "CD45", pattern = "^CD45-Total", assay = "ab") %>%
record_seurat_samples(type="CD45-2", pattern="^CD45-2-Total", assay="ab") %>%
record_seurat_samples(type = "CD4", pattern = "^CD4-Total", assay = "ab") %>%
record_seurat_samples(type = "CD8a", pattern = "^CD8a-Total", assay = "ab") %>%
record_seurat_samples(type = "CD44", pattern = "^CD44-Total", assay = "ab") %>%
record_seurat_samples(type = "CD62L", pattern = "^CD62L-Total", assay = "ab") %>%
record_seurat_samples(type = "CD69", pattern = "^CD69-Total", assay = "ab") %>%
record_seurat_samples(type = "CD103", pattern = "^CD103-Total", assay = "ab") %>%
record_seurat_samples(type = "CD19", pattern = "^CD19-Total", assay = "ab")
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## ── Data Summary ────────────────────────
## Values
## Name data
## Number of rows 52140
## Number of columns 50
## _______________________
## Column type frequency:
## numeric 1
## ________________________
## Group variables Idents
##
## ── Variable type: numeric ──────────────────────────────────────────────────────
## skim_variable Idents n_missing complete_rate mean sd p0 p25 p50 p75
## 1 nFeature_RNA control 0 1 2295. 1254. 22 1357 1978. 3034
## 2 nFeature_RNA m 0 1 2369. 1421. 26 1296 1924 3192.
## 3 nFeature_RNA mock 0 1 2352. 1349. 19 1320 1982. 3151
## 4 nFeature_RNA n 0 1 2445. 1531. 27 1192 1926. 3559.
## p100 hist
## 1 8273 ▇▇▃▁▁
## 2 8186 ▇▆▃▁▁
## 3 7971 ▇▇▃▂▁
## 4 8556 ▇▅▃▁▁
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## ── Data Summary ────────────────────────
## Values
## Name data
## Number of rows 52140
## Number of columns 50
## _______________________
## Column type frequency:
## numeric 1
## ________________________
## Group variables Idents
##
## ── Variable type: numeric ──────────────────────────────────────────────────────
## skim_variable Idents n_missing complete_rate mean sd p0 p25 p50
## 1 nCount_RNA control 0 1 6354. 5187. 500 2662 4502
## 2 nCount_RNA m 0 1 7151. 6753. 500 2688 4494.
## 3 nCount_RNA mock 0 1 6726. 5953. 502 2595 4448
## 4 nCount_RNA n 0 1 7601. 7117. 500 2445. 4582.
## p75 p100 hist
## 1 8792. 49546 ▇▂▁▁▁
## 2 9538. 66734 ▇▁▁▁▁
## 3 9150. 54589 ▇▂▁▁▁
## 4 11065 63171 ▇▂▁▁▁
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## ── Data Summary ────────────────────────
## Values
## Name data
## Number of rows 52140
## Number of columns 50
## _______________________
## Column type frequency:
## numeric 1
## ________________________
## Group variables Idents
##
## ── Variable type: numeric ──────────────────────────────────────────────────────
## skim_variable Idents n_missing complete_rate mean sd p0 p25 p50 p75
## 1 reads control 13387 0.129 7058. 6034. 41 3114. 5356 9334.
## 2 reads m 9786 0.301 3682. 4113. 24 1401. 2563 4467.
## 3 reads mock 10991 0.220 4702. 4083. 42 1996 3612 6195
## 4 reads n 5618 0.353 3459. 3603. 25 1377 2472. 4231.
## p100 hist
## 1 53188 ▇▂▁▁▁
## 2 48039 ▇▁▁▁▁
## 3 40634 ▇▁▁▁▁
## 4 36105 ▇▁▁▁▁
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## ── Data Summary ────────────────────────
## Values
## Name data
## Number of rows 52140
## Number of columns 50
## _______________________
## Column type frequency:
## numeric 1
## ________________________
## Group variables Idents
##
## ── Variable type: numeric ──────────────────────────────────────────────────────
## skim_variable Idents n_missing complete_rate mean sd p0 p25 p50 p75 p100
## 1 pct_mito control 0 1 2.52 4.20 0 1.35 1.98 2.92 98.5
## 2 pct_mito m 0 1 2.38 4.03 0 1.34 1.96 2.74 98.0
## 3 pct_mito mock 0 1 2.13 3.14 0 1.19 1.76 2.53 98.9
## 4 pct_mito n 0 1 2.09 3.81 0 1.20 1.75 2.43 97.7
## hist
## 1 ▇▁▁▁▁
## 2 ▇▁▁▁▁
## 3 ▇▁▁▁▁
## 4 ▇▁▁▁▁
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## ── Data Summary ────────────────────────
## Values
## Name data
## Number of rows 52140
## Number of columns 50
## _______________________
## Column type frequency:
## numeric 1
## ________________________
## Group variables Idents
##
## ── Variable type: numeric ──────────────────────────────────────────────────────
## skim_variable Idents n_missing complete_rate mean sd p0 p25 p50 p75
## 1 pct_ribo control 0 1 10.1 7.39 0 5.51 7.74 11.1
## 2 pct_ribo m 0 1 13.4 10.7 0 5.60 8.42 18.9
## 3 pct_ribo mock 0 1 9.23 7.46 0 4.55 6.24 10.9
## 4 pct_ribo n 0 1 13.7 10.5 0 5.84 8.38 20.7
## p100 hist
## 1 51.3 ▇▂▁▁▁
## 2 50.4 ▇▂▂▂▁
## 3 48.4 ▇▂▁▁▁
## 4 49.2 ▇▂▂▂▁
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## ── Data Summary ────────────────────────
## Values
## Name data
## Number of rows 52140
## Number of columns 51
## _______________________
## Column type frequency:
## numeric 1
## ________________________
## Group variables Idents
##
## ── Variable type: numeric ──────────────────────────────────────────────────────
## skim_variable Idents n_missing complete_rate mean sd p0 p25 p50 p75
## 1 CD45 control 0 1 0.0262 0.0695 0 0 0 0
## 2 CD45 m 0 1 0.0268 0.0694 0 0 0 0
## 3 CD45 mock 0 1 0.0262 0.0670 0 0 0 0
## 4 CD45 n 0 1 0.0246 0.0667 0 0 0 0
## p100 hist
## 1 0.781 ▇▁▁▁▁
## 2 0.873 ▇▁▁▁▁
## 3 0.813 ▇▁▁▁▁
## 4 1.92 ▇▁▁▁▁
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## ── Data Summary ────────────────────────
## Values
## Name data
## Number of rows 52140
## Number of columns 52
## _______________________
## Column type frequency:
## numeric 1
## ________________________
## Group variables Idents
##
## ── Variable type: numeric ──────────────────────────────────────────────────────
## skim_variable Idents n_missing complete_rate mean sd p0 p25 p50 p75
## 1 CD45-2 control 0 1 14.8 2.87 2.78 13.1 14.6 16.2
## 2 CD45-2 m 0 1 16.4 4.39 2.95 13.4 15.4 18.5
## 3 CD45-2 mock 0 1 16.6 3.88 0 14.0 16.0 18.6
## 4 CD45-2 n 0 1 16.9 4.56 3.17 13.8 15.6 19.4
## p100 hist
## 1 35.5 ▁▇▃▁▁
## 2 47.6 ▁▇▂▁▁
## 3 43.4 ▁▇▃▁▁
## 4 50.1 ▁▇▂▁▁
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## ── Data Summary ────────────────────────
## Values
## Name data
## Number of rows 52140
## Number of columns 53
## _______________________
## Column type frequency:
## numeric 1
## ________________________
## Group variables Idents
##
## ── Variable type: numeric ──────────────────────────────────────────────────────
## skim_variable Idents n_missing complete_rate mean sd p0 p25 p50 p75
## 1 CD4 control 0 1 7.48 2.84 0 6.08 7.16 8.35
## 2 CD4 m 0 1 8.24 4.94 0.307 5.51 6.71 8.64
## 3 CD4 mock 0 1 7.46 3.40 0 5.75 6.79 8.08
## 4 CD4 n 0 1 8.13 4.33 0.449 5.72 6.77 8.59
## p100 hist
## 1 33.3 ▅▇▁▁▁
## 2 34.5 ▇▃▁▁▁
## 3 69.9 ▇▁▁▁▁
## 4 29.2 ▆▇▂▁▁
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## ── Data Summary ────────────────────────
## Values
## Name data
## Number of rows 52140
## Number of columns 54
## _______________________
## Column type frequency:
## numeric 1
## ________________________
## Group variables Idents
##
## ── Variable type: numeric ──────────────────────────────────────────────────────
## skim_variable Idents n_missing complete_rate mean sd p0 p25 p50 p75
## 1 CD8a control 0 1 9.07 4.30 0.296 7.37 8.62 9.87
## 2 CD8a m 0 1 9.10 6.31 0 6.16 7.69 9.37
## 3 CD8a mock 0 1 9.75 5.17 0 7.22 8.70 10.4
## 4 CD8a n 0 1 9.62 6.15 0.687 6.68 8.13 9.64
## p100 hist
## 1 44.0 ▇▅▁▁▁
## 2 45.8 ▇▂▁▁▁
## 3 48.2 ▇▃▁▁▁
## 4 44.2 ▇▂▁▁▁
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## ── Data Summary ────────────────────────
## Values
## Name data
## Number of rows 52140
## Number of columns 55
## _______________________
## Column type frequency:
## numeric 1
## ________________________
## Group variables Idents
##
## ── Variable type: numeric ──────────────────────────────────────────────────────
## skim_variable Idents n_missing complete_rate mean sd p0 p25 p50 p75
## 1 CD44 control 0 1 48.7 8.77 16.7 44.3 47.8 51.7
## 2 CD44 m 0 1 47.0 10.6 14.2 41.3 48.0 53.3
## 3 CD44 mock 0 1 44.6 8.99 15.7 38.9 44.6 50.0
## 4 CD44 n 0 1 46.5 8.87 17.0 41.8 47.8 52.0
## p100 hist
## 1 92.9 ▁▇▇▁▁
## 2 94.9 ▁▆▇▁▁
## 3 94.2 ▁▇▅▁▁
## 4 92.5 ▁▆▇▁▁
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## ── Data Summary ────────────────────────
## Values
## Name data
## Number of rows 52140
## Number of columns 56
## _______________________
## Column type frequency:
## numeric 1
## ________________________
## Group variables Idents
##
## ── Variable type: numeric ──────────────────────────────────────────────────────
## skim_variable Idents n_missing complete_rate mean sd p0 p25 p50 p75
## 1 CD62L control 0 1 1.43 1.13 0 0.873 1.22 1.64
## 2 CD62L m 0 1 1.81 1.81 0 0.828 1.20 1.89
## 3 CD62L mock 0 1 1.68 1.26 0 1.02 1.39 1.89
## 4 CD62L n 0 1 1.57 1.70 0 0.726 1.03 1.56
## p100 hist
## 1 27.9 ▇▁▁▁▁
## 2 33.4 ▇▁▁▁▁
## 3 22.3 ▇▁▁▁▁
## 4 33.1 ▇▁▁▁▁
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## ── Data Summary ────────────────────────
## Values
## Name data
## Number of rows 52140
## Number of columns 57
## _______________________
## Column type frequency:
## numeric 1
## ________________________
## Group variables Idents
##
## ── Variable type: numeric ──────────────────────────────────────────────────────
## skim_variable Idents n_missing complete_rate mean sd p0 p25 p50 p75
## 1 CD69 control 0 1 8.45 2.27 0 7.30 8.70 9.88
## 2 CD69 m 0 1 7.84 2.42 0.337 6.23 7.72 9.35
## 3 CD69 mock 0 1 9.95 2.55 0.488 8.27 9.87 11.6
## 4 CD69 n 0 1 7.53 2.22 0.608 6.10 7.47 8.79
## p100 hist
## 1 19.4 ▁▃▇▁▁
## 2 31.5 ▃▇▁▁▁
## 3 28.6 ▁▇▂▁▁
## 4 23.8 ▂▇▁▁▁
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## ── Data Summary ────────────────────────
## Values
## Name data
## Number of rows 52140
## Number of columns 58
## _______________________
## Column type frequency:
## numeric 1
## ________________________
## Group variables Idents
##
## ── Variable type: numeric ──────────────────────────────────────────────────────
## skim_variable Idents n_missing complete_rate mean sd p0 p25 p50 p75 p100
## 1 CD103 control 0 1 3.30 1.20 0 2.60 3.31 3.97 23.2
## 2 CD103 m 0 1 3.36 1.48 0 2.38 3.33 4.10 17.9
## 3 CD103 mock 0 1 3.50 1.33 0 2.67 3.47 4.17 22.7
## 4 CD103 n 0 1 3.07 1.31 0 2.21 3.08 3.74 16.6
## hist
## 1 ▇▁▁▁▁
## 2 ▇▆▁▁▁
## 3 ▇▂▁▁▁
## 4 ▇▅▁▁▁
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## ── Data Summary ────────────────────────
## Values
## Name data
## Number of rows 52140
## Number of columns 59
## _______________________
## Column type frequency:
## numeric 1
## ________________________
## Group variables Idents
##
## ── Variable type: numeric ──────────────────────────────────────────────────────
## skim_variable Idents n_missing complete_rate mean sd p0 p25 p50 p75
## 1 CD19 control 0 1 6.73 1.92 0 5.73 6.80 7.89
## 2 CD19 m 0 1 6.22 1.99 0 4.83 6.39 7.54
## 3 CD19 mock 0 1 6.42 1.83 0.425 5.33 6.58 7.58
## 4 CD19 n 0 1 6.62 1.86 0.473 5.33 6.83 7.85
## p100 hist
## 1 40.9 ▇▂▁▁▁
## 2 36.0 ▇▃▁▁▁
## 3 40.4 ▇▁▁▁▁
## 4 15.7 ▁▆▇▁▁
Let us start filtering the data, leading off with a definition of the minimum number of RNAs, minimum amount of rRNA, and maximum mitochondrial. In addition, let us print how much of each are observed before filtering. Before we can print/filter these attributes, we must use the PercentageFeatureSet() to get the numbers…
<- 200
min_num_rna <- 5
min_pct_ribo <- 20 max_pct_mito
Here is what skim looks like when given the entire dataset.
Genes observed:
skim(all_scd[["nFeature_RNA"]])
Name | all_scd[[“nFeature_RNA”]] |
Number of rows | 52140 |
Number of columns | 1 |
_______________________ | |
Column type frequency: | |
numeric | 1 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
nFeature_RNA | 0 | 1 | 2355 | 1375 | 19 | 1303 | 1958 | 3182 | 8556 | ▇▆▃▁▁ |
Number counts:
skim(all_scd[["nCount_RNA"]])
Name | all_scd[[“nCount_RNA”]] |
Number of rows | 52140 |
Number of columns | 1 |
_______________________ | |
Column type frequency: | |
numeric | 1 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
nCount_RNA | 0 | 1 | 6876 | 6195 | 500 | 2617 | 4491 | 9379 | 66734 | ▇▁▁▁▁ |
Percent ribosomal proteins:
skim(all_scd[["pct_mito"]])
Name | all_scd[[“pct_mito”]] |
Number of rows | 52140 |
Number of columns | 1 |
_______________________ | |
Column type frequency: | |
numeric | 1 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
pct_mito | 0 | 1 | 2.31 | 3.83 | 0 | 1.27 | 1.87 | 2.68 | 98.92 | ▇▁▁▁▁ |
Percent mitochondrial RNA:
skim(all_scd[["pct_ribo"]])
Name | all_scd[[“pct_ribo”]] |
Number of rows | 52140 |
Number of columns | 1 |
_______________________ | |
Column type frequency: | |
numeric | 1 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
pct_ribo | 0 | 1 | 11.33 | 9.16 | 0 | 5.25 | 7.55 | 13.98 | 51.25 | ▇▂▁▁▁ |
Number of clonotypes:
## Length and reads are for only those cells with clonotypes.
skim(all_scd[["reads"]])
Name | all_scd[[“reads”]] |
Number of rows | 52140 |
Number of columns | 1 |
_______________________ | |
Column type frequency: | |
numeric | 1 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
reads | 39782 | 0.24 | 4422 | 4532 | 24 | 1636 | 3088 | 5500 | 53188 | ▇▁▁▁▁ |
And just for my curiosity, how many cells have some chain associated with them?
## How many cells have specific chains associated with them
sum(!is.na(all_scd$chain))
## [1] 12358
::kable(all_scd@misc[["sample_metadata"]]) knitr
sampleid | condition | dataroot | gexcells | gexmedianrpc | gexmediangpc | gextotalgenes | gexmedianupc | vdjtcells | vdjtvjspanning | mediantraupc | mediantrbpc | areads | breads | creads | mappedgenomepct | mappedtxpct | vdjreads | vdjrpc | vdjmeanupc | abcells | abmedianupc | abmeanreadsupc | abreads | batch | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
control | control | control | gex_vdj_ab_combined/control | 15362 | 12370 | 1979 | 22198 | 4502 | 1975 | 1515 | 4 | 10 | 88882712 | 118449275 | 102310661 | 92.15 | 66.9 | 61009543 | 30891 | 15004 | 15362 | 588 | 978 | 44209112 | undefined |
mock | mock | mock | gex_vdj_ab_combined/mock | 14088 | 11582 | 1982 | 21949 | 4448 | 3097 | 2468 | 4 | 10 | 99158139 | 100331735 | 82538789 | 93.3 | 68.4 | 56928487 | 18382 | 10200 | 14088 | 639 | 1084 | 43461345 | undefined |
m | m | muscular | gex_vdj_ab_combined/m | 14008 | 11130 | 1924 | 22431 | 4495 | 4222 | 3384 | 4 | 11 | 87946942 | 104613172 | 89191429 | 92.6 | 68.0 | 56660131 | 13420 | 8376 | 14008 | 625 | 1115 | 41110680 | undefined |
n | n | nasal | gex_vdj_ab_combined/n | 8682 | 10678 | 1927 | 21485 | 4582 | 3064 | 2489 | 41 | 11 | 53224250 | 40663298 | 81908022 | 92.5 | 68.3 | 38838833 | 12676 | 8050 | 8682 | 666 | 1289 | 37854793 | undefined |
The antibody data comprises ~ 9 ‘genes’ which have counts/cell in the counts slot of the associated slot in the seurat data structure. It seems to me that a simple way of incorporating that information is to do something like the percent_xxx calculation for each of the associated features and add the results to the primary metadata. Thus, when we do the various filters that information can be saved.
Seurat provides a violin plot for visualizing the distribution of some of these metrics easy. So lets look!
VlnPlot(all_scd, features = "nFeature_RNA", pt.size = 0)
VlnPlot(all_scd, features = "pct_mito", pt.size = 0)
VlnPlot(all_scd, features = "pct_ribo", pt.size = 0)
VlnPlot(all_scd, features = "nCount_RNA", pt.size = 0)
VlnPlot(all_scd, features = "reads", pt.size = 0)
## Warning: Removed 39782 rows containing non-finite values (`stat_ydensity()`).
## I am curious about the length of the clonotype sequences.
VlnPlot(all_scd, features = "length", pt.size = 0)
## Warning: Removed 39782 rows containing non-finite values (`stat_ydensity()`).
I wrote a little helper function which should make it easy to filter and observe what the data looks like before/after filtering.
<- filter_scd(all_scd, min_num_rna = 500, remerge = "ab") filt_scd
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## Warning in xtfrm.data.frame(x): cannot xtfrm data frames
## All filters removed 12163 (23.327580%) cells, 10162 (31.475918%) genes, 65236497 (18.195244%) counts.
## Filtering removed 12163 cells from the ab assay.
@misc[["pre_plots"]][["count_vs_genes"]] filt_scd
@misc[["post_plots"]][["count_vs_genes"]] filt_scd
@misc[["pre_plots"]][["count_vs_ribo"]] filt_scd
@misc[["post_plots"]][["count_vs_ribo"]] filt_scd
@misc[["pre_plots"]][["count_vs_mito"]] filt_scd
@misc[["post_plots"]][["count_vs_mito"]] filt_scd
@misc[["pre_plots"]][["ribo_vs_mito"]] filt_scd
@misc[["post_plots"]][["ribo_vs_mito"]] filt_scd
Now let us look at the various metrics following filtering as mean/sample.
::kable(filt_scd@misc[["sample_metadata"]]) knitr
sampleid | condition | dataroot | gexcells | gexmedianrpc | gexmediangpc | gextotalgenes | gexmedianupc | vdjtcells | vdjtvjspanning | mediantraupc | mediantrbpc | areads | breads | creads | mappedgenomepct | mappedtxpct | vdjreads | vdjrpc | vdjmeanupc | abcells | abmedianupc | abmeanreadsupc | abreads | batch | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
control | control | control | gex_vdj_ab_combined/control | 15362 | 12370 | 1979 | 22198 | 4502 | 1975 | 1515 | 4 | 10 | 88882712 | 118449275 | 102310661 | 92.15 | 66.9 | 61009543 | 30891 | 15004 | 15362 | 588 | 978 | 44209112 | undefined |
mock | mock | mock | gex_vdj_ab_combined/mock | 14088 | 11582 | 1982 | 21949 | 4448 | 3097 | 2468 | 4 | 10 | 99158139 | 100331735 | 82538789 | 93.3 | 68.4 | 56928487 | 18382 | 10200 | 14088 | 639 | 1084 | 43461345 | undefined |
m | m | muscular | gex_vdj_ab_combined/m | 14008 | 11130 | 1924 | 22431 | 4495 | 4222 | 3384 | 4 | 11 | 87946942 | 104613172 | 89191429 | 92.6 | 68.0 | 56660131 | 13420 | 8376 | 14008 | 625 | 1115 | 41110680 | undefined |
n | n | nasal | gex_vdj_ab_combined/n | 8682 | 10678 | 1927 | 21485 | 4582 | 3064 | 2489 | 41 | 11 | 53224250 | 40663298 | 81908022 | 92.5 | 68.3 | 38838833 | 12676 | 8050 | 8682 | 666 | 1289 | 37854793 | undefined |
<- NormalizeData(object=all_scd) %>%
all_norm FindVariableFeatures() %>%
ScaleData() %>%
RunPCA() %>%
FindNeighbors() %>%
FindClusters() %>%
RunTSNE(check_duplicates=FALSE) %>%
RunUMAP(reduction="pca", dims=1:10)
## Centering and scaling data matrix
## PC_ 1
## Positive: Sparc, Cd63, Ager, Emp2, Selenbp1, Selenbp2, Gsn, Cldn18, Limch1, Cystm1
## Alcam, Cryab, Npnt, Timp3, Clic3, Aqp5, Krt7, Igfbp7, Igfbp6, Fads3
## Scd2, Prnp, Scnn1a, Krt18, Myh14, Gpx3, Ces1d, Fam189a2, Krt8, Gprc5a
## Negative: Coro1a, Rac2, Arhgdib, Laptm5, Cd3e, Selplg, Cd52, Ms4a4b, Cd3g, Ptprc
## Lat, Cd3d, Gimap3, Lck, Ms4a6b, Thy1, Ptprcap, Ltb, Vps37b, Sept1
## Skap1, Itgb7, Cd53, Il7r, Lsp1, Cytip, Fyb, Cd2, Gramd3, Cd28
## PC_ 2
## Positive: Wfdc2, Atp1b1, Ctsh, Cldn3, Cbr2, Cldn7, Sftpd, Chchd10, Napsa, Sftpb
## Muc1, Cxcl15, Slc34a2, Epcam, Ppp1r14c, Ptprf, Sftpa1, Chil1, Irx1, Lamp3
## Krt18, Lgi3, Sdc1, Nkx2-1, Sftpc, Baiap2l1, Lyz2, Krt8, Car8, Scd1
## Negative: Sod3, Prelp, Serping1, Bgn, Olfml3, Plpp3, Loxl1, Spon1, Tcf21, Plxdc2
## Col1a2, Hsd11b1, Rarres2, Col1a1, Pdgfra, Ptgis, Itga8, Fn1, Cdo1, Dpep1
## Colec12, Clec3b, Lrp1, Atp1a2, C7, Col3a1, Mfap4, Mxra8, Fxyd1, Pcolce2
## PC_ 3
## Positive: Dynlrb2, Foxj1, Fam183b, Nupr1, Tmem212, Mgst1, 1700007K13Rik, Cfap126, Gm19935, Ccdc153
## Cxcl17, 1110017D15Rik, Cyp2s1, Spaca9, Cldn3, BC051019, 1700094D03Rik, Tctex1d4, Cbr2, Rsph1
## Nme5, Gm867, 1700016K19Rik, 1700001C02Rik, Tekt4, Dnali1, Dmkn, Fhad1, Ak7, Lrrc10b
## Negative: Rtkn2, Scnn1g, Spock2, Col4a3, Igfbp2, Pdpn, Col4a4, Scnn1b, Flrt3, Lamb3
## Cytl1, Itgb6, Tmem37, Clic5, Lama3, Ndnf, Sema3e, Bdnf, Hopx, Crlf1
## Tspan8, Sema3a, Ptpre, Cyp2b10, Cdkn2b, Krt7, Hck, Abca5, Gprc5a, Akap5
## PC_ 4
## Positive: Arhgdig, Fam183b, Tmem212, 1700007K13Rik, Foxj1, Gm19935, Ccdc153, Cyp2s1, 1110017D15Rik, Spaca9
## BC051019, Tctex1d4, 1700094D03Rik, Rsph1, Cfap126, Gm867, 1700001C02Rik, Nme5, Dnali1, 1700016K19Rik
## Ccdc113, Ak7, Sntn, Tekt4, Odf3b, 2410004P03Rik, Nme9, Mapk15, AU040972, Pifo
## Negative: Slc34a2, Lamp3, Sftpb, Napsa, Sftpa1, Cxcl15, Sftpc, Chil1, Dram1, Lyz2
## Lgi3, Sftpd, Hc, Scd1, Egfl6, S100g, Muc1, Pla2g1b, Etv5, Sfta2
## Slc26a9, Fabp5, Ppp1r14c, Mlc1, Ctsc, Chia1, Tspan11, Kcnj15, Fasn, Lpcat1
## PC_ 5
## Positive: Lgals1, Coro1a, Rac2, Laptm5, Cd52, Arhgdib, Cd3e, Selplg, Cd3g, Lsp1
## Ptprc, Ms4a4b, Ptprcap, Lat, Thy1, Cytip, Sept1, Cd3d, Lck, Itgb7
## Ltb, Ms4a6b, Ifi27l2a, Gimap3, Skap1, Prdx6, Il2rb, Cd53, Icos, Cd2
## Negative: Ly6a, Tm4sf1, Sema3c, Kdr, Tmem252, Car4, H2-Ab1, Plaur, Tmcc2, Efnb2
## Gja4, Cd74, Lyve1, Sox17, Emcn, Sema7a, Plk2, Scarb1, H2-Eb1, Nes
## Serpina3i, Prx, Ccnd1, Cyp4b1, Hspb1, H2-Aa, Igfbp4, Id3, Upp1, Adamts1
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 52140
## Number of edges: 1613410
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9157
## Number of communities: 29
## Elapsed time: 16 seconds
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 20:28:40 UMAP embedding parameters a = 0.9922 b = 1.112
## 20:28:40 Read 52140 rows and found 10 numeric columns
## 20:28:40 Using Annoy for neighbor search, n_neighbors = 30
## 20:28:40 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 20:28:50 Writing NN index file to temp file /tmp/RtmpdOLh2z/file1c77923e5f4c69
## 20:28:50 Searching Annoy index using 1 thread, search_k = 3000
## 20:29:18 Annoy recall = 100%
## 20:29:21 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 20:29:27 Initializing from normalized Laplacian + noise (using irlba)
## 20:29:38 Commencing optimization for 200 epochs, with 2153204 positive edges
## 20:30:08 Optimization finished
DimPlot(object=all_norm, reduction="tsne", group.by="tcell")
<- DimPlot(all_norm, reduction="umap", group.by="tcell_sample", label=TRUE)
plotted plotted
<- NormalizeData(object=filt_scd) %>%
fnorm_scd FindVariableFeatures() %>%
ScaleData() %>%
RunPCA() %>%
FindNeighbors() %>%
FindClusters() %>%
RunTSNE() %>%
RunUMAP(reduction="pca", dims=1:10)
## Centering and scaling data matrix
## PC_ 1
## Positive: Coro1a, Rac2, Arhgdib, Laptm5, Cd3e, Selplg, Cd52, Ms4a4b, Cd3g, Ptprc
## Lat, Cd3d, Lck, Ms4a6b, Thy1, Ptprcap, Ltb, Lsp1, Itgb7, Il7r
## Fyb, Gramd3, Cd2, Cd28, Bin2, Icos, Cd69, Itgb2, Il2rb, Cotl1
## Negative: Sparc, Emp2, Cd63, Igfbp7, Ager, Gsn, Cldn18, Timp3, Selenbp1, Limch1
## Cystm1, Cst3, Gpx3, Selenbp2, Cryab, Npnt, Bcam, Bgn, Pmp22, Ces1d
## Alcam, Clic3, Apoe, Aqp5, Anxa3, Krt7, Hspb1, Col4a1, Fhl1, Serping1
## PC_ 2
## Positive: Bgn, Serping1, Plpp3, Sod3, Prelp, Rarres2, Col1a2, Sparcl1, Vim, Hsd11b1
## Olfml3, Col1a1, Loxl1, Clec3b, Tcf21, Mxra8, Ptgis, Fxyd1, Plxdc2, Spon1
## Ltbp4, Ppp1r14a, Fhl1, Lrp1, Gpx3, Col3a1, Colec12, Cdo1, Pcolce2, Pcolce
## Negative: Sftpd, Napsa, Sftpb, Slc34a2, Wfdc2, Cxcl15, Cbr2, Cldn3, Atp1b1, Sftpa1
## Muc1, Lamp3, Chil1, Ppp1r14c, Lgi3, Sftpc, Lyz2, Dram1, Ctsh, Scd1
## Hc, Car8, Egfl6, S100g, Ank3, Pi4k2b, Irx1, Lpcat1, Pla2g1b, Abca3
## PC_ 3
## Positive: Mgst1, Col6a1, Nupr1, Loxl1, Col3a1, Col1a1, Dram1, Sod3, Col1a2, Col6a2
## Ces1d, C3, Apoe, Slc34a2, Cxcl15, Lamp3, Mmp2, Chil1, Sftpb, Cd302
## Plxdc2, Sftpa1, Prelp, Lyz2, Sftpd, Ppp1r14c, Serpine2, Pdgfra, Scd1, Muc1
## Negative: Rtkn2, Scnn1g, Spock2, Igfbp2, Col4a3, Sema3e, Hopx, Pdpn, Cyp2b10, Scnn1b
## Cyp4b1, Clic5, Lama3, Flrt3, Col4a4, Tacstd2, Krt7, Lamb3, Cytl1, Sec14l3
## Tspan8, Itgb6, Cdkn2b, Krt19, Mab21l4, Tmem37, Scnn1a, Bdnf, Lmo7, Ndnf
## PC_ 4
## Positive: Cldn5, H2-Ab1, Cd74, Hpgd, Cd93, H2-Aa, H2-Eb1, Tmem252, Sema3c, Pcdh1
## Mcam, Efnb2, Dll4, Ly6c1, Sox17, Plk2, Gja4, Kdr, Stmn2, Lyve1
## Tmcc2, Sftpc, Hspa1a, Emcn, Slc34a2, Scn7a, Car4, Lamp3, Hc, Tbx3
## Negative: Fam183b, Cyp2s1, Foxj1, Tmem212, Spaca9, 1700007K13Rik, 1110017D15Rik, Gm19935, Ccdc153, Tctex1d4
## Rsph1, Arhgdig, Cfap126, BC051019, Nme5, 1700001C02Rik, Gm867, Tekt4, 1700016K19Rik, 1700094D03Rik
## 2410004P03Rik, Dnali1, Ak7, Ccdc113, Sntn, Nme9, Mns1, Pifo, AU040972, Anxa8
## PC_ 5
## Positive: Lgals1, S100a6, Col4a4, Scnn1g, Pdpn, Ndnf, Rtkn2, Birc5, Spock2, Col4a3
## Mki67, Scnn1b, Igfbp6, Tmem37, Pclaf, Flrt3, Top2a, Aqp5, Ccna2, Coro1a
## Rac2, Igfbp2, Gprc5a, Prdx6, Lgals3, Crlf1, Ube2c, Lamb3, Fads3, Rab11fip1
## Negative: Cldn5, Tm4sf1, Odf3b, Fam183b, Hpgd, Foxj1, Cd93, Tmem212, 1700007K13Rik, 1110017D15Rik
## Ccdc153, Ly6a, Gm19935, BC051019, Tctex1d4, 1700094D03Rik, Cfap126, Spaca9, 1700001C02Rik, 1700016K19Rik
## Tekt4, Gm867, Sema3c, Ak7, Dnali1, 2410004P03Rik, Sntn, Pcdh1, Nme9, Tmem252
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 39977
## Number of edges: 1236173
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9099
## Number of communities: 27
## Elapsed time: 10 seconds
## 20:36:21 UMAP embedding parameters a = 0.9922 b = 1.112
## 20:36:21 Read 39977 rows and found 10 numeric columns
## 20:36:21 Using Annoy for neighbor search, n_neighbors = 30
## 20:36:21 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 20:36:28 Writing NN index file to temp file /tmp/RtmpdOLh2z/file1c77925a3f1996
## 20:36:28 Searching Annoy index using 1 thread, search_k = 3000
## 20:36:49 Annoy recall = 100%
## 20:36:51 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 20:36:57 Initializing from normalized Laplacian + noise (using irlba)
## 20:37:04 Commencing optimization for 200 epochs, with 1634470 positive edges
## 20:37:27 Optimization finished
DimPlot(object=fnorm_scd, reduction="tsne", group.by="tcell")
<- DimPlot(fnorm_scd, reduction="umap", group.by="tcell_sample", label=TRUE)
plotted plotted
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:GSEABase':
##
## intersect, setdiff, union
##
## The following object is masked from 'package:graph':
##
## union
##
## The following object is masked from 'package:AnnotationDbi':
##
## select
##
## The following object is masked from 'package:hpgltools':
##
## combine
##
## The following object is masked from 'package:testthat':
##
## matches
##
## The following objects are masked from 'package:GenomicRanges':
##
## intersect, setdiff, union
##
## The following object is masked from 'package:GenomeInfoDb':
##
## intersect
##
## The following objects are masked from 'package:IRanges':
##
## collapse, desc, intersect, setdiff, slice, union
##
## The following objects are masked from 'package:S4Vectors':
##
## first, intersect, rename, setdiff, setequal, union
##
## The following object is masked from 'package:matrixStats':
##
## count
##
## The following object is masked from 'package:Biobase':
##
## combine
##
## The following objects are masked from 'package:BiocGenerics':
##
## combine, intersect, setdiff, union
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
<- fnorm_scd
tt <- tt@meta.data %>%
tt_meta mutate(CD19quart = ntile(CD19, 4)) %>%
mutate(CD103quart = ntile(CD103, 4)) %>%
mutate(CD69quart = ntile(CD69, 4)) %>%
mutate(CD62Lquart = ntile(CD62L, 4)) %>%
mutate(CD44quart = ntile(CD44, 4)) %>%
mutate(CD8aquart = ntile(CD8a, 4)) %>%
mutate(CD4quart = ntile(CD4, 4)) %>%
mutate(CD452quart = ntile(CD45-2, 4)) %>%
mutate(CD45quart = ntile(CD45, 4))
@meta.data <- tt_meta
ttDimPlot(tt, reduction="umap", group.by="CD19quart", label=TRUE)
DimPlot(tt, reduction="umap", group.by="CD103quart", label=TRUE)
DimPlot(tt, reduction="umap", group.by="CD69quart", label=TRUE)
DimPlot(tt, reduction="umap", group.by="CD62Lquart", label=TRUE)
DimPlot(tt, reduction="umap", group.by="CD44quart", label=TRUE)
DimPlot(tt, reduction="umap", group.by="CD8aquart", label=TRUE)
DimPlot(tt, reduction="umap", group.by="CD4quart", label=TRUE)
DimPlot(tt, reduction="umap", group.by="CD452quart", label=TRUE)
DimPlot(tt, reduction="umap", group.by="CD45quart", label=TRUE)
<- JackStraw(fnorm_scd, num.replicate=10)
fnorm_scd <- ScoreJackStraw(fnorm_scd)
fnorm_scd JackStrawPlot(fnorm_scd)
## Warning: Removed 7000 rows containing missing values (`geom_point()`).
ElbowPlot(fnorm_scd)
## So I am thinking maybe 4-10?
<- 6
wanted_dims
<- FindNeighbors(fnorm_scd, dims=1:wanted_dims) %>%
fnorm_scd FindClusters(resolution=0.5) %>%
StashIdent(save.name="res0p5_clusters")
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 39977
## Number of edges: 1169562
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9268
## Number of communities: 19
## Elapsed time: 12 seconds
## With Seurat 3.X, stashing identity classes can be accomplished with the following:
## .[["res0p5_clusters"]] <- Idents(object = .)
RunUMAP(fnorm_scd, dims=1:9)
## 20:41:18 UMAP embedding parameters a = 0.9922 b = 1.112
## 20:41:18 Read 39977 rows and found 9 numeric columns
## 20:41:18 Using Annoy for neighbor search, n_neighbors = 30
## 20:41:18 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 20:41:25 Writing NN index file to temp file /tmp/RtmpdOLh2z/file1c779268dbe5
## 20:41:25 Searching Annoy index using 1 thread, search_k = 3000
## 20:41:48 Annoy recall = 100%
## 20:41:51 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 20:41:56 Initializing from normalized Laplacian + noise (using irlba)
## 20:42:04 Commencing optimization for 200 epochs, with 1631114 positive edges
## 20:42:28 Optimization finished
## An object of class Seurat
## 22132 features across 39977 samples within 2 assays
## Active assay: RNA (22123 features, 2000 variable features)
## 1 other assay present: ab
## 3 dimensional reductions calculated: pca, tsne, umap
DimPlot(fnorm_scd, label=TRUE)
<- FindClusters(fnorm_scd, resolution=0.1) %>%
fnorm_scd FindNeighbors(k.param=6) %>%
StashIdent(save.name="res0p1_clusters")
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 39977
## Number of edges: 1169562
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9715
## Number of communities: 8
## Elapsed time: 10 seconds
## Computing nearest neighbor graph
## Computing SNN
## With Seurat 3.X, stashing identity classes can be accomplished with the following:
## .[["res0p1_clusters"]] <- Idents(object = .)
RunUMAP(fnorm_scd, dims=1:9)
## 20:42:55 UMAP embedding parameters a = 0.9922 b = 1.112
## 20:42:55 Read 39977 rows and found 9 numeric columns
## 20:42:55 Using Annoy for neighbor search, n_neighbors = 30
## 20:42:55 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 20:43:02 Writing NN index file to temp file /tmp/RtmpdOLh2z/file1c7792405de78a
## 20:43:02 Searching Annoy index using 1 thread, search_k = 3000
## 20:43:25 Annoy recall = 100%
## 20:43:27 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 20:43:33 Initializing from normalized Laplacian + noise (using irlba)
## 20:43:42 Commencing optimization for 200 epochs, with 1631114 positive edges
## 20:44:05 Optimization finished
## An object of class Seurat
## 22132 features across 39977 samples within 2 assays
## Active assay: RNA (22123 features, 2000 variable features)
## 1 other assay present: ab
## 3 dimensional reductions calculated: pca, tsne, umap
DimPlot(fnorm_scd, label=TRUE)
<- FindClusters(fnorm_scd, resolution=0.2) %>%
fnorm_scd FindNeighbors(k.param=10) %>%
StashIdent(save.name="res0p2_clusters")
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 39977
## Number of edges: 517683
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9573
## Number of communities: 13
## Elapsed time: 7 seconds
## Computing nearest neighbor graph
## Computing SNN
## With Seurat 3.X, stashing identity classes can be accomplished with the following:
## .[["res0p2_clusters"]] <- Idents(object = .)
RunUMAP(fnorm_scd, dims=1:10)
## 20:44:30 UMAP embedding parameters a = 0.9922 b = 1.112
## 20:44:30 Read 39977 rows and found 10 numeric columns
## 20:44:30 Using Annoy for neighbor search, n_neighbors = 30
## 20:44:30 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 20:44:37 Writing NN index file to temp file /tmp/RtmpdOLh2z/file1c779255298ec1
## 20:44:37 Searching Annoy index using 1 thread, search_k = 3000
## 20:44:59 Annoy recall = 100%
## 20:45:02 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 20:45:07 Initializing from normalized Laplacian + noise (using irlba)
## 20:45:15 Commencing optimization for 200 epochs, with 1634470 positive edges
## 20:45:39 Optimization finished
## An object of class Seurat
## 22132 features across 39977 samples within 2 assays
## Active assay: RNA (22123 features, 2000 variable features)
## 1 other assay present: ab
## 3 dimensional reductions calculated: pca, tsne, umap
DimPlot(fnorm_scd, label=TRUE)
Add into the metadata a concatenation of the sample ID and the cluster ID
<- fnorm_scd@meta.data[["Idents"]]
identity_vector <- as.character(fnorm_scd@meta.data[["res0p1_clusters"]])
cluster_vector <- paste0(identity_vector, "_", cluster_vector)
concatenated_vector "cluster1_sample"]] <- concatenated_vector
fnorm_scd[[summary(as.factor(fnorm_scd@meta.data[["cluster1_sample"]]))
## control_0 control_1 control_2 control_3 control_4 control_5 control_6 control_7
## 1852 4675 2448 1586 939 487 196 48
## m_0 m_1 m_2 m_3 m_4 m_5 m_6 m_7
## 3822 2640 2006 1065 757 544 313 64
## mock_0 mock_1 mock_2 mock_3 mock_4 mock_5 mock_6 mock_7
## 2892 2671 1815 858 498 266 256 27
## n_0 n_1 n_2 n_3 n_4 n_5 n_6 n_7
## 2923 971 1615 682 674 262 65 60
<- as.character(fnorm_scd@meta.data[["res0p2_clusters"]])
cluster_vector2 <- paste0(identity_vector, "_", cluster_vector2)
concatenated_vector2 "cluster2_sample"]] <- concatenated_vector2
fnorm_scd[[summary(as.factor(fnorm_scd@meta.data[["cluster2_sample"]]))
## control_0 control_1 control_10 control_11 control_12 control_2 control_3
## 4074 971 101 451 48 862 1548
## control_4 control_5 control_6 control_7 control_8 control_9 m_0
## 858 885 1087 818 315 213 2150
## m_1 m_10 m_11 m_12 m_2 m_3 m_4
## 1784 428 193 64 1980 1209 736
## m_5 m_6 m_7 m_8 m_9 mock_0 mock_1
## 720 672 617 288 370 2239 2228
## mock_10 mock_11 mock_12 mock_2 mock_3 mock_4 mock_5
## 141 163 28 643 1126 647 484
## mock_6 mock_7 mock_8 mock_9 n_0 n_1 n_10
## 485 588 220 291 695 1707 228
## n_11 n_12 n_2 n_3 n_4 n_5 n_6
## 64 60 1200 740 809 664 373
## n_7 n_8 n_9
## 382 251 79
I am not yet certain of how Seurat handles (non)normalized data for the various FindMarkers functions. Thus, I am adding the clusters from the dimension reductions to the non-normalized data here.
"res0p1_clusters"]] <- fnorm_scd[["res0p1_clusters"]]
filt_scd[["res0p2_clusters"]] <- fnorm_scd[["res0p2_clusters"]]
filt_scd[["cluster1_sample"]] <- fnorm_scd[["cluster1_sample"]]
filt_scd[["cluster2_sample"]] <- fnorm_scd[["cluster2_sample"]] filt_scd[[
If we decide to limit the comparisons to the relatively small subset of genes with significant variance, these will be the winners; so we can decide now if that is a goal. Given that this dataset is not super-huge, I am not included to do it; but it is nice to see the set of highly-variable genes.
<- FindVariableFeatures(fnorm_scd)
var <- head(VariableFeatures(var), 30)
most_var <- VariableFeaturePlot(var)
variable_plot <- LabelPoints(plot=variable_plot, points=most_var, repel=TRUE) variable_plot
## When using repel, set xnudge and ynudge to 0 for optimal results
variable_plot
## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: ggrepel: 11 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
I assume that Dr. Park would like to play with these various results in a xlsx file, so I will dump them all to a list and use that to define the sheets of this putative output file.
<- list()
de_lst <- 0.6 fc_threshold
The Clonotype containing cells are pink, the rest are blue in the final plot in the following block.
<- filt_scd
vdj_scd Idents(vdj_scd) <- vdj_scd[["tcell"]]
<- NormalizeData(vdj_scd) %>%
vdj_scd_viz FindVariableFeatures() %>%
ScaleData() %>%
RunPCA() %>%
FindNeighbors() %>%
FindClusters() %>%
RunTSNE() %>%
RunUMAP(reduction="pca", dims=1:10)
## Centering and scaling data matrix
## PC_ 1
## Positive: Coro1a, Rac2, Arhgdib, Laptm5, Cd3e, Selplg, Cd52, Ms4a4b, Cd3g, Ptprc
## Lat, Cd3d, Lck, Ms4a6b, Thy1, Ptprcap, Ltb, Lsp1, Itgb7, Il7r
## Fyb, Gramd3, Cd2, Cd28, Bin2, Icos, Cd69, Itgb2, Il2rb, Cotl1
## Negative: Sparc, Emp2, Cd63, Igfbp7, Ager, Gsn, Cldn18, Timp3, Selenbp1, Limch1
## Cystm1, Cst3, Gpx3, Selenbp2, Cryab, Npnt, Bcam, Bgn, Pmp22, Ces1d
## Alcam, Clic3, Apoe, Aqp5, Anxa3, Krt7, Hspb1, Col4a1, Fhl1, Serping1
## PC_ 2
## Positive: Bgn, Serping1, Plpp3, Sod3, Prelp, Rarres2, Col1a2, Sparcl1, Vim, Hsd11b1
## Olfml3, Col1a1, Loxl1, Clec3b, Tcf21, Mxra8, Ptgis, Fxyd1, Plxdc2, Spon1
## Ltbp4, Ppp1r14a, Fhl1, Lrp1, Gpx3, Col3a1, Colec12, Cdo1, Pcolce2, Pcolce
## Negative: Sftpd, Napsa, Sftpb, Slc34a2, Wfdc2, Cxcl15, Cbr2, Cldn3, Atp1b1, Sftpa1
## Muc1, Lamp3, Chil1, Ppp1r14c, Lgi3, Sftpc, Lyz2, Dram1, Ctsh, Scd1
## Hc, Car8, Egfl6, S100g, Ank3, Pi4k2b, Irx1, Lpcat1, Pla2g1b, Abca3
## PC_ 3
## Positive: Mgst1, Col6a1, Nupr1, Loxl1, Col3a1, Col1a1, Dram1, Sod3, Col1a2, Col6a2
## Ces1d, C3, Apoe, Slc34a2, Cxcl15, Lamp3, Mmp2, Chil1, Sftpb, Cd302
## Plxdc2, Sftpa1, Prelp, Lyz2, Sftpd, Ppp1r14c, Serpine2, Pdgfra, Scd1, Muc1
## Negative: Rtkn2, Scnn1g, Spock2, Igfbp2, Col4a3, Sema3e, Hopx, Pdpn, Cyp2b10, Scnn1b
## Cyp4b1, Clic5, Lama3, Flrt3, Col4a4, Tacstd2, Krt7, Lamb3, Cytl1, Sec14l3
## Tspan8, Itgb6, Cdkn2b, Krt19, Mab21l4, Tmem37, Scnn1a, Bdnf, Lmo7, Ndnf
## PC_ 4
## Positive: Cldn5, H2-Ab1, Cd74, Hpgd, Cd93, H2-Aa, H2-Eb1, Tmem252, Sema3c, Pcdh1
## Mcam, Efnb2, Dll4, Ly6c1, Sox17, Plk2, Gja4, Kdr, Stmn2, Lyve1
## Tmcc2, Sftpc, Hspa1a, Emcn, Slc34a2, Scn7a, Car4, Lamp3, Hc, Tbx3
## Negative: Fam183b, Cyp2s1, Foxj1, Tmem212, Spaca9, 1700007K13Rik, 1110017D15Rik, Gm19935, Ccdc153, Tctex1d4
## Rsph1, Arhgdig, Cfap126, BC051019, Nme5, 1700001C02Rik, Gm867, Tekt4, 1700016K19Rik, 1700094D03Rik
## 2410004P03Rik, Dnali1, Ak7, Ccdc113, Sntn, Nme9, Mns1, Pifo, AU040972, Anxa8
## PC_ 5
## Positive: Lgals1, S100a6, Col4a4, Scnn1g, Pdpn, Ndnf, Rtkn2, Birc5, Spock2, Col4a3
## Mki67, Scnn1b, Igfbp6, Tmem37, Pclaf, Flrt3, Top2a, Aqp5, Ccna2, Coro1a
## Rac2, Igfbp2, Gprc5a, Prdx6, Lgals3, Crlf1, Ube2c, Lamb3, Fads3, Rab11fip1
## Negative: Cldn5, Tm4sf1, Odf3b, Fam183b, Hpgd, Foxj1, Cd93, Tmem212, 1700007K13Rik, 1110017D15Rik
## Ccdc153, Ly6a, Gm19935, BC051019, Tctex1d4, 1700094D03Rik, Cfap126, Spaca9, 1700001C02Rik, 1700016K19Rik
## Tekt4, Gm867, Sema3c, Ak7, Dnali1, 2410004P03Rik, Sntn, Pcdh1, Nme9, Tmem252
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 39977
## Number of edges: 1236173
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9099
## Number of communities: 27
## Elapsed time: 10 seconds
## 20:52:05 UMAP embedding parameters a = 0.9922 b = 1.112
## 20:52:05 Read 39977 rows and found 10 numeric columns
## 20:52:05 Using Annoy for neighbor search, n_neighbors = 30
## 20:52:05 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 20:52:12 Writing NN index file to temp file /tmp/RtmpdOLh2z/file1c77926f8cd66e
## 20:52:12 Searching Annoy index using 1 thread, search_k = 3000
## 20:52:34 Annoy recall = 100%
## 20:52:37 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 20:52:42 Initializing from normalized Laplacian + noise (using irlba)
## 20:52:49 Commencing optimization for 200 epochs, with 1634470 positive edges
## 20:53:13 Optimization finished
DimPlot(vdj_scd_viz, reduction="tsne")
Idents(vdj_scd_viz) <- vdj_scd_viz@meta.data[["tcell"]]
<- DimPlot(vdj_scd_viz, reduction="umap", group.by="ident", label=TRUE)
plotted plotted
<- FindAllMarkers(vdj_scd, logfc.threshold=fc_threshold) vdj_markers
## Calculating cluster Yes
## Calculating cluster No
head(vdj_markers)
## p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene
## Ms4a4b 0 2.440 0.907 0.056 0 Yes Ms4a4b
## Ccl5 0 2.346 0.282 0.036 0 Yes Ccl5
## Rac2 0 2.206 0.952 0.118 0 Yes Rac2
## Coro1a 0 2.186 0.955 0.126 0 Yes Coro1a
## Cd3e 0 2.050 0.954 0.093 0 Yes Cd3e
## Arhgdib 0 1.975 0.944 0.173 0 Yes Arhgdib
<- as.data.frame(vdj_markers)
combined rownames(combined) <- toupper(rownames(combined))
<- merge(combined, brief, by.x="row.names",
annotated_markers by.y="hgnc_symbol", all.x=TRUE)
"vdj_vs_all"]] <- annotated_markers
de_lst[[head(annotated_markers)
## Row.names p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene
## 1 4931406P16RIK 0 -0.6663 0.150 0.473 0 Yes 4931406P16Rik
## 2 4931406P16RIK.1 0 0.6663 0.473 0.150 0 No 4931406P16Rik
## 3 ABCA3 0 -0.9337 0.132 0.398 0 Yes Abca3
## 4 ABCA3.1 0 0.9337 0.398 0.132 0 No Abca3
## 5 ACE 0 -1.5775 0.136 0.663 0 Yes Ace
## 6 ACE.1 0 1.5775 0.663 0.136 0 No Ace
## description
## 1 <NA>
## 2 <NA>
## 3 ATP binding cassette subfamily A member 3 [Source:HGNC Symbol;Acc:HGNC:33]
## 4 <NA>
## 5 angiotensin I converting enzyme [Source:HGNC Symbol;Acc:HGNC:2707]
## 6 <NA>
## ensembl_gene_id
## 1 <NA>
## 2 <NA>
## 3 ENSG00000167972
## 4 <NA>
## 5 ENSG00000159640
## 6 <NA>
Question: Is it smart enough to use the raw data if I give FindAllMarkers the normalized data? For the moment I do not think I will risk it.
<- FindAllMarkers(filt_scd, logfc.threshold=fc_threshold) combined_markers
## Calculating cluster control
## Calculating cluster m
## Calculating cluster mock
## Calculating cluster n
head(combined_markers)
## p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene
## Tmem252 0 1.0736 0.515 0.331 0 control Tmem252
## Plvap 0 0.8446 0.616 0.428 0 control Plvap
## Ctla2a 0 0.6833 0.693 0.545 0 control Ctla2a
## Lyve1 0 0.6534 0.439 0.261 0 control Lyve1
## Eng 0 0.6347 0.648 0.470 0 control Eng
## Cd69 0 -0.6046 0.092 0.267 0 control Cd69
<- as.data.frame(combined_markers)
combined rownames(combined) <- toupper(rownames(combined))
<- merge(combined, brief, by.x="row.names",
annotated_markers by.y="hgnc_symbol", all.x=TRUE)
"samples_vs_all"]] <- annotated_markers de_lst[[
Since I am not using the fnorm_scd data structure, I will need to pull the cluster information from the normalized copy…
<- filt_scd
cluster_scd Idents(cluster_scd) <- cluster_scd[["res0p2_clusters"]]
<- FindAllMarkers(cluster_scd, only.pos=TRUE, logfc.threshold=fc_threshold) cluster_markers
## Calculating cluster 0
## Calculating cluster 1
## Calculating cluster 2
## Calculating cluster 3
## Calculating cluster 4
## Calculating cluster 5
## Calculating cluster 6
## Calculating cluster 7
## Calculating cluster 8
## Calculating cluster 9
## Calculating cluster 10
## Calculating cluster 11
## Calculating cluster 12
<- as.data.frame(cluster_markers)
cluster_genes rownames(cluster_genes) <- toupper(rownames(cluster_genes))
<- merge(cluster_genes, brief, by.x="row.names",
annotated_clusters by.y="hgnc_symbol", all.x=TRUE)
head(annotated_clusters)
## Row.names p_val avg_log2FC pct.1 pct.2 p_val_adj cluster
## 1 0610010K14RIK 2.151e-195 0.8222 0.682 0.284 4.758e-191 10
## 2 0610012G03RIK 0.000e+00 0.6270 0.695 0.340 0.000e+00 4
## 3 0610012G03RIK1 0.000e+00 0.6110 0.677 0.344 0.000e+00 5
## 4 1110004F10RIK 0.000e+00 0.7022 0.732 0.375 0.000e+00 5
## 5 1110008P14RIK 0.000e+00 0.8619 0.775 0.324 0.000e+00 4
## 6 1110008P14RIK1 0.000e+00 0.8493 0.761 0.329 0.000e+00 5
## gene description ensembl_gene_id
## 1 0610010K14Rik <NA> <NA>
## 2 0610012G03Rik <NA> <NA>
## 3 0610012G03Rik <NA> <NA>
## 4 1110004F10Rik <NA> <NA>
## 5 1110008P14Rik <NA> <NA>
## 6 1110008P14Rik <NA> <NA>
"samplecluster_vs_all"]] <- annotated_clusters
de_lst[[
%>%
annotated_clusters group_by(cluster) %>%
::top_n(n=10, wt=avg_log2FC) %>%
dplyras.data.frame()
## Row.names p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene
## 1 ACTA2 2.029e-212 2.1746 0.154 0.031 4.490e-208 7 Acta2
## 2 AGER1 0.000e+00 3.8973 1.000 0.402 0.000e+00 5 Ager
## 3 AQP51 0.000e+00 3.1869 0.993 0.245 0.000e+00 5 Aqp5
## 4 AU040972 0.000e+00 5.1629 0.915 0.002 0.000e+00 12 AU040972
## 5 BTG11 0.000e+00 0.9538 0.964 0.723 0.000e+00 2 Btg1
## 6 C33 2.912e-133 3.3287 0.514 0.259 6.442e-129 8 C3
## 7 CBR2 0.000e+00 3.3325 0.989 0.147 0.000e+00 3 Cbr2
## 8 CCL11 0.000e+00 3.1280 0.214 0.005 0.000e+00 8 Ccl11
## 9 CCL4 0.000e+00 3.2284 0.340 0.027 0.000e+00 11 Ccl4
## 10 CCL5 0.000e+00 2.7653 0.395 0.055 0.000e+00 1 Ccl5
## 11 CCL51 1.710e-60 1.1371 0.276 0.108 3.784e-56 9 Ccl5
## 12 CCL6 0.000e+00 4.2855 0.719 0.004 0.000e+00 11 Ccl6
## 13 CCR7 0.000e+00 1.8535 0.736 0.097 0.000e+00 2 Ccr7
## 14 CD24A2 7.556e-219 3.9311 0.990 0.239 1.672e-214 12 Cd24a
## 15 CD3E 0.000e+00 1.4657 0.928 0.241 0.000e+00 1 Cd3e
## 16 CD3E2 9.318e-228 0.8787 0.857 0.344 2.062e-223 9 Cd3e
## 17 CD3G 0.000e+00 1.4709 0.894 0.197 0.000e+00 1 Cd3g
## 18 CD3G1 7.876e-218 0.8150 0.790 0.302 1.742e-213 9 Cd3g
## 19 CD52 0.000e+00 1.4353 0.901 0.227 0.000e+00 1 Cd52
## 20 CDKN1C2 0.000e+00 2.0148 0.886 0.154 0.000e+00 7 Cdkn1c
## 21 CFB1 6.267e-280 2.8981 0.408 0.095 1.386e-275 8 Cfb
## 22 CHIL11 0.000e+00 2.3877 0.968 0.149 0.000e+00 4 Chil1
## 23 CHIL3 0.000e+00 3.1485 0.583 0.001 0.000e+00 11 Chil3
## 24 CLDN5 0.000e+00 0.9530 0.898 0.383 0.000e+00 0 Cldn5
## 25 CLEC14A 0.000e+00 0.9154 0.789 0.328 0.000e+00 0 Clec14a
## 26 CLIC31 0.000e+00 3.1025 0.997 0.282 0.000e+00 5 Clic3
## 27 COL1A12 0.000e+00 3.0879 0.736 0.184 0.000e+00 8 Col1a1
## 28 COL3A12 0.000e+00 3.5841 0.855 0.171 0.000e+00 8 Col3a1
## 29 CORO1A2 0.000e+00 3.1931 0.783 0.370 0.000e+00 10 Coro1a
## 30 COX4I22 0.000e+00 2.9163 0.965 0.151 0.000e+00 7 Cox4i2
## 31 CTSS 0.000e+00 2.9050 0.887 0.104 0.000e+00 11 Ctss
## 32 CXCL15 0.000e+00 3.4958 0.995 0.150 0.000e+00 3 Cxcl15
## 33 CXCL151 0.000e+00 2.3954 0.993 0.186 0.000e+00 4 Cxcl15
## 34 CXCL2 0.000e+00 3.2717 0.535 0.050 0.000e+00 11 Cxcl2
## 35 CYBB 0.000e+00 2.8848 0.828 0.008 0.000e+00 11 Cybb
## 36 CYP2B101 0.000e+00 3.0812 0.974 0.138 0.000e+00 5 Cyp2b10
## 37 CYP2F21 0.000e+00 4.8603 0.855 0.069 0.000e+00 12 Cyp2f2
## 38 DCN 0.000e+00 4.6497 0.738 0.015 0.000e+00 8 Dcn
## 39 DUSP10 0.000e+00 0.9657 0.636 0.188 0.000e+00 2 Dusp10
## 40 EGFL7 0.000e+00 0.9748 0.943 0.420 0.000e+00 0 Egfl7
## 41 EMP21 0.000e+00 3.0378 1.000 0.536 0.000e+00 5 Emp2
## 42 FAM183B 0.000e+00 4.1113 0.995 0.001 0.000e+00 12 Fam183b
## 43 FCER1G 0.000e+00 3.0825 0.892 0.030 0.000e+00 11 Fcer1g
## 44 FHL12 0.000e+00 2.3125 0.987 0.349 0.000e+00 6 Fhl1
## 45 FN12 0.000e+00 2.3677 0.970 0.231 0.000e+00 6 Fn1
## 46 FOXJ1 0.000e+00 3.7948 0.990 0.006 0.000e+00 12 Foxj1
## 47 GPIHBP1 0.000e+00 0.9780 0.882 0.381 0.000e+00 0 Gpihbp1
## 48 GPX32 0.000e+00 2.4391 0.997 0.458 0.000e+00 6 Gpx3
## 49 GRAMD31 0.000e+00 1.0519 0.752 0.229 0.000e+00 2 Gramd3
## 50 GSN2 0.000e+00 2.7170 0.999 0.463 0.000e+00 6 Gsn
## 51 GUCY1A12 0.000e+00 2.7680 0.961 0.145 0.000e+00 7 Gucy1a1
## 52 GUCY1B12 0.000e+00 2.3584 0.935 0.129 0.000e+00 7 Gucy1b1
## 53 GYG2 0.000e+00 2.2837 0.987 0.361 0.000e+00 6 Gyg
## 54 GZMA 0.000e+00 1.3468 0.126 0.015 0.000e+00 1 Gzma
## 55 H2AFZ1 0.000e+00 3.3885 0.997 0.797 0.000e+00 10 H2afz
## 56 HMGB2 0.000e+00 4.1915 0.993 0.436 0.000e+00 10 Hmgb2
## 57 HOPX1 0.000e+00 3.6044 1.000 0.553 0.000e+00 5 Hopx
## 58 HP2 8.587e-168 4.0893 0.930 0.197 1.900e-163 12 Hp
## 59 HPGD 0.000e+00 0.9508 0.791 0.359 0.000e+00 0 Hpgd
## 60 IGFBP21 0.000e+00 3.0475 0.863 0.095 0.000e+00 5 Igfbp2
## 61 IGFBP5 0.000e+00 3.6452 0.640 0.026 0.000e+00 8 Igfbp5
## 62 IGKC 5.959e-200 2.6524 0.111 0.008 1.318e-195 11 Igkc
## 63 IL1B 0.000e+00 2.9888 0.525 0.005 0.000e+00 11 Il1b
## 64 IL7R1 0.000e+00 1.0491 0.798 0.212 0.000e+00 2 Il7r
## 65 IL7R2 2.959e-227 0.9600 0.744 0.269 6.547e-223 9 Il7r
## 66 INMT2 0.000e+00 2.6961 0.982 0.349 0.000e+00 6 Inmt
## 67 KRT191 0.000e+00 3.0081 0.984 0.202 0.000e+00 5 Krt19
## 68 KRT71 0.000e+00 3.4074 0.999 0.295 0.000e+00 5 Krt7
## 69 LAPTM5 0.000e+00 1.3463 0.956 0.244 0.000e+00 1 Laptm5
## 70 LCN21 0.000e+00 2.4051 0.866 0.191 0.000e+00 4 Lcn2
## 71 LEF1 0.000e+00 1.0690 0.642 0.137 0.000e+00 2 Lef1
## 72 LGALS11 0.000e+00 3.6558 0.883 0.488 0.000e+00 10 Lgals1
## 73 LTB 0.000e+00 1.3748 0.800 0.199 0.000e+00 1 Ltb
## 74 LTB1 2.966e-242 0.9192 0.786 0.288 6.562e-238 9 Ltb
## 75 LYVE1 0.000e+00 0.9225 0.565 0.241 0.000e+00 0 Lyve1
## 76 LYZ1 0.000e+00 3.5086 0.586 0.105 0.000e+00 3 Lyz1
## 77 LYZ11 0.000e+00 2.4761 0.614 0.123 0.000e+00 4 Lyz1
## 78 LYZ2 0.000e+00 3.5789 0.961 0.211 0.000e+00 3 Lyz2
## 79 LYZ21 0.000e+00 2.4426 0.971 0.242 0.000e+00 4 Lyz2
## 80 MFAP42 0.000e+00 2.9087 0.994 0.354 0.000e+00 6 Mfap4
## 81 MFGE82 0.000e+00 2.0864 0.975 0.422 0.000e+00 7 Mfge8
## 82 MGP3 0.000e+00 3.8952 0.959 0.301 0.000e+00 8 Mgp
## 83 MKI67 0.000e+00 3.3311 0.898 0.014 0.000e+00 10 Mki67
## 84 MS4A4B 0.000e+00 1.5901 0.850 0.209 0.000e+00 1 Ms4a4b
## 85 MS4A4B1 0.000e+00 0.8762 0.852 0.245 0.000e+00 2 Ms4a4b
## 86 MS4A4B2 1.276e-194 0.8917 0.778 0.305 2.823e-190 9 Ms4a4b
## 87 NDUFA4L21 0.000e+00 2.0304 0.890 0.090 0.000e+00 7 Ndufa4l2
## 88 NKG7 0.000e+00 1.5389 0.522 0.081 0.000e+00 1 Nkg7
## 89 PCLAF 0.000e+00 3.1753 0.861 0.006 0.000e+00 10 Pclaf
## 90 PDE5A2 0.000e+00 2.0832 0.915 0.190 0.000e+00 7 Pde5a
## 91 PDZD22 0.000e+00 2.2914 0.923 0.231 0.000e+00 7 Pdzd2
## 92 PLTP 0.000e+00 0.9417 0.920 0.460 0.000e+00 0 Pltp
## 93 PLVAP 0.000e+00 1.1622 0.862 0.374 0.000e+00 0 Plvap
## 94 POSTN2 0.000e+00 2.9881 0.942 0.093 0.000e+00 7 Postn
## 95 RAC2 0.000e+00 1.4591 0.966 0.254 0.000e+00 1 Rac2
## 96 RAC21 9.508e-257 0.8291 0.927 0.359 2.103e-252 9 Rac2
## 97 RAMP2 0.000e+00 0.9862 0.958 0.430 0.000e+00 0 Ramp2
## 98 RARRES23 0.000e+00 3.2971 0.925 0.309 0.000e+00 8 Rarres2
## 99 RETNLA3 1.904e-121 3.9942 0.415 0.051 4.213e-117 12 Retnla
## 100 RPL12 0.000e+00 0.9420 0.983 0.884 0.000e+00 2 Rpl12
## 101 S100A101 0.000e+00 2.9991 0.979 0.738 0.000e+00 10 S100a10
## 102 SCGB1A1 0.000e+00 3.2126 0.750 0.482 0.000e+00 3 Scgb1a1
## 103 SCGB1A12 4.630e-83 6.0663 0.900 0.511 1.024e-78 12 Scgb1a1
## 104 SCGB3A2 8.439e-237 3.8681 0.275 0.012 1.867e-232 12 Scgb3a2
## 105 SELPLG1 1.002e-217 0.8022 0.830 0.325 2.217e-213 9 Selplg
## 106 SERPING12 0.000e+00 2.3104 0.999 0.326 0.000e+00 6 Serping1
## 107 SFTPA1 0.000e+00 3.5726 0.996 0.204 0.000e+00 3 Sftpa1
## 108 SFTPA11 0.000e+00 2.5243 0.994 0.238 0.000e+00 4 Sftpa1
## 109 SFTPB 0.000e+00 3.5783 0.997 0.154 0.000e+00 3 Sftpb
## 110 SFTPB1 0.000e+00 2.3461 0.994 0.190 0.000e+00 4 Sftpb
## 111 SFTPC 0.000e+00 3.6816 0.993 0.583 0.000e+00 3 Sftpc
## 112 SFTPC1 0.000e+00 2.5518 0.997 0.600 0.000e+00 4 Sftpc
## 113 SFTPD 0.000e+00 3.3058 0.995 0.131 0.000e+00 3 Sftpd
## 114 SFTPD1 0.000e+00 2.4078 0.997 0.168 0.000e+00 4 Sftpd
## 115 SLC34A2 0.000e+00 3.1787 0.976 0.109 0.000e+00 3 Slc34a2
## 116 SLC34A21 0.000e+00 2.2930 0.985 0.145 0.000e+00 4 Slc34a2
## 117 SOD32 0.000e+00 2.6091 0.997 0.271 0.000e+00 6 Sod3
## 118 SRGN1 1.499e-175 0.8936 0.973 0.706 3.316e-171 9 Srgn
## 119 STK17B1 0.000e+00 0.8377 0.799 0.318 0.000e+00 2 Stk17b
## 120 TCF212 0.000e+00 2.5133 0.966 0.240 0.000e+00 6 Tcf21
## 121 TIMP11 0.000e+00 3.6219 0.505 0.061 0.000e+00 8 Timp1
## 122 TOP2A 0.000e+00 3.2327 0.843 0.019 0.000e+00 10 Top2a
## 123 TPPP32 1.202e-218 3.8979 0.990 0.242 2.659e-214 12 Tppp3
## 124 TSPAN7 0.000e+00 1.1043 0.928 0.414 0.000e+00 0 Tspan7
## 125 TUBB52 0.000e+00 3.2251 0.993 0.761 0.000e+00 10 Tubb5
## 126 TYROBP 0.000e+00 3.8142 0.917 0.016 0.000e+00 11 Tyrobp
## 127 UBE2C 0.000e+00 3.3696 0.751 0.009 0.000e+00 10 Ube2c
## 128 VEGFA1 0.000e+00 2.9507 0.988 0.383 0.000e+00 5 Vegfa
## 129 VPS37B1 0.000e+00 1.0445 0.822 0.338 0.000e+00 2 Vps37b
## 130 VPS37B2 9.986e-169 0.8813 0.793 0.385 2.209e-164 9 Vps37b
## description
## 1 actin alpha 2, smooth muscle [Source:HGNC Symbol;Acc:HGNC:130]
## 2 <NA>
## 3 <NA>
## 4 <NA>
## 5 <NA>
## 6 <NA>
## 7 <NA>
## 8 C-C motif chemokine ligand 11 [Source:HGNC Symbol;Acc:HGNC:10610]
## 9 C-C motif chemokine ligand 4 [Source:HGNC Symbol;Acc:HGNC:10630]
## 10 C-C motif chemokine ligand 5 [Source:HGNC Symbol;Acc:HGNC:10632]
## 11 <NA>
## 12 <NA>
## 13 C-C motif chemokine receptor 7 [Source:HGNC Symbol;Acc:HGNC:1608]
## 14 <NA>
## 15 CD3e molecule [Source:HGNC Symbol;Acc:HGNC:1674]
## 16 <NA>
## 17 CD3g molecule [Source:HGNC Symbol;Acc:HGNC:1675]
## 18 <NA>
## 19 CD52 molecule [Source:HGNC Symbol;Acc:HGNC:1804]
## 20 <NA>
## 21 <NA>
## 22 <NA>
## 23 <NA>
## 24 claudin 5 [Source:HGNC Symbol;Acc:HGNC:2047]
## 25 C-type lectin domain containing 14A [Source:HGNC Symbol;Acc:HGNC:19832]
## 26 <NA>
## 27 <NA>
## 28 <NA>
## 29 <NA>
## 30 <NA>
## 31 cathepsin S [Source:HGNC Symbol;Acc:HGNC:2545]
## 32 <NA>
## 33 <NA>
## 34 C-X-C motif chemokine ligand 2 [Source:HGNC Symbol;Acc:HGNC:4603]
## 35 cytochrome b-245 beta chain [Source:HGNC Symbol;Acc:HGNC:2578]
## 36 <NA>
## 37 <NA>
## 38 decorin [Source:HGNC Symbol;Acc:HGNC:2705]
## 39 dual specificity phosphatase 10 [Source:HGNC Symbol;Acc:HGNC:3065]
## 40 EGF like domain multiple 7 [Source:HGNC Symbol;Acc:HGNC:20594]
## 41 <NA>
## 42 <NA>
## 43 Fc fragment of IgE receptor Ig [Source:HGNC Symbol;Acc:HGNC:3611]
## 44 <NA>
## 45 <NA>
## 46 forkhead box J1 [Source:HGNC Symbol;Acc:HGNC:3816]
## 47 glycosylphosphatidylinositol anchored high density lipoprotein binding protein 1 [Source:HGNC Symbol;Acc:HGNC:24945]
## 48 <NA>
## 49 <NA>
## 50 <NA>
## 51 <NA>
## 52 <NA>
## 53 glycogenin 2 [Source:HGNC Symbol;Acc:HGNC:4700]
## 54 granzyme A [Source:HGNC Symbol;Acc:HGNC:4708]
## 55 <NA>
## 56 high mobility group box 2 [Source:HGNC Symbol;Acc:HGNC:5000]
## 57 <NA>
## 58 <NA>
## 59 15-hydroxyprostaglandin dehydrogenase [Source:HGNC Symbol;Acc:HGNC:5154]
## 60 <NA>
## 61 insulin like growth factor binding protein 5 [Source:HGNC Symbol;Acc:HGNC:5474]
## 62 immunoglobulin kappa constant [Source:HGNC Symbol;Acc:HGNC:5716]
## 63 interleukin 1 beta [Source:HGNC Symbol;Acc:HGNC:5992]
## 64 <NA>
## 65 <NA>
## 66 <NA>
## 67 <NA>
## 68 keratin 71 [Source:HGNC Symbol;Acc:HGNC:28927]
## 69 lysosomal protein transmembrane 5 [Source:HGNC Symbol;Acc:HGNC:29612]
## 70 <NA>
## 71 lymphoid enhancer binding factor 1 [Source:HGNC Symbol;Acc:HGNC:6551]
## 72 <NA>
## 73 lymphotoxin beta [Source:HGNC Symbol;Acc:HGNC:6711]
## 74 <NA>
## 75 lymphatic vessel endothelial hyaluronan receptor 1 [Source:HGNC Symbol;Acc:HGNC:14687]
## 76 <NA>
## 77 <NA>
## 78 <NA>
## 79 <NA>
## 80 <NA>
## 81 <NA>
## 82 <NA>
## 83 marker of proliferation Ki-67 [Source:HGNC Symbol;Acc:HGNC:7107]
## 84 <NA>
## 85 <NA>
## 86 <NA>
## 87 <NA>
## 88 natural killer cell granule protein 7 [Source:HGNC Symbol;Acc:HGNC:7830]
## 89 PCNA clamp associated factor [Source:HGNC Symbol;Acc:HGNC:28961]
## 90 <NA>
## 91 <NA>
## 92 phospholipid transfer protein [Source:HGNC Symbol;Acc:HGNC:9093]
## 93 plasmalemma vesicle associated protein [Source:HGNC Symbol;Acc:HGNC:13635]
## 94 <NA>
## 95 Rac family small GTPase 2 [Source:HGNC Symbol;Acc:HGNC:9802]
## 96 <NA>
## 97 receptor activity modifying protein 2 [Source:HGNC Symbol;Acc:HGNC:9844]
## 98 <NA>
## 99 <NA>
## 100 ribosomal protein L12 [Source:HGNC Symbol;Acc:HGNC:10302]
## 101 <NA>
## 102 secretoglobin family 1A member 1 [Source:HGNC Symbol;Acc:HGNC:12523]
## 103 <NA>
## 104 secretoglobin family 3A member 2 [Source:HGNC Symbol;Acc:HGNC:18391]
## 105 <NA>
## 106 <NA>
## 107 surfactant protein A1 [Source:HGNC Symbol;Acc:HGNC:10798]
## 108 <NA>
## 109 surfactant protein B [Source:HGNC Symbol;Acc:HGNC:10801]
## 110 <NA>
## 111 surfactant protein C [Source:HGNC Symbol;Acc:HGNC:10802]
## 112 <NA>
## 113 surfactant protein D [Source:HGNC Symbol;Acc:HGNC:10803]
## 114 <NA>
## 115 solute carrier family 34 member 2 [Source:HGNC Symbol;Acc:HGNC:11020]
## 116 <NA>
## 117 <NA>
## 118 <NA>
## 119 <NA>
## 120 <NA>
## 121 <NA>
## 122 DNA topoisomerase II alpha [Source:HGNC Symbol;Acc:HGNC:11989]
## 123 <NA>
## 124 tetraspanin 7 [Source:HGNC Symbol;Acc:HGNC:11854]
## 125 <NA>
## 126 transmembrane immune signaling adaptor TYROBP [Source:HGNC Symbol;Acc:HGNC:12449]
## 127 ubiquitin conjugating enzyme E2 C [Source:HGNC Symbol;Acc:HGNC:15937]
## 128 <NA>
## 129 <NA>
## 130 <NA>
## ensembl_gene_id
## 1 ENSG00000107796
## 2 <NA>
## 3 <NA>
## 4 <NA>
## 5 <NA>
## 6 <NA>
## 7 <NA>
## 8 ENSG00000172156
## 9 ENSG00000275302
## 10 ENSG00000271503
## 11 <NA>
## 12 <NA>
## 13 ENSG00000126353
## 14 <NA>
## 15 ENSG00000198851
## 16 <NA>
## 17 ENSG00000160654
## 18 <NA>
## 19 ENSG00000169442
## 20 <NA>
## 21 <NA>
## 22 <NA>
## 23 <NA>
## 24 ENSG00000184113
## 25 ENSG00000176435
## 26 <NA>
## 27 <NA>
## 28 <NA>
## 29 <NA>
## 30 <NA>
## 31 ENSG00000163131
## 32 <NA>
## 33 <NA>
## 34 ENSG00000081041
## 35 ENSG00000165168
## 36 <NA>
## 37 <NA>
## 38 ENSG00000011465
## 39 ENSG00000143507
## 40 ENSG00000172889
## 41 <NA>
## 42 <NA>
## 43 ENSG00000158869
## 44 <NA>
## 45 <NA>
## 46 ENSG00000129654
## 47 ENSG00000277494
## 48 <NA>
## 49 <NA>
## 50 <NA>
## 51 <NA>
## 52 <NA>
## 53 ENSG00000056998
## 54 ENSG00000145649
## 55 <NA>
## 56 ENSG00000164104
## 57 <NA>
## 58 <NA>
## 59 ENSG00000164120
## 60 <NA>
## 61 ENSG00000115461
## 62 ENSG00000211592
## 63 ENSG00000125538
## 64 <NA>
## 65 <NA>
## 66 <NA>
## 67 <NA>
## 68 ENSG00000139648
## 69 ENSG00000162511
## 70 <NA>
## 71 ENSG00000138795
## 72 <NA>
## 73 ENSG00000227507
## 74 <NA>
## 75 ENSG00000133800
## 76 <NA>
## 77 <NA>
## 78 <NA>
## 79 <NA>
## 80 <NA>
## 81 <NA>
## 82 <NA>
## 83 ENSG00000148773
## 84 <NA>
## 85 <NA>
## 86 <NA>
## 87 <NA>
## 88 ENSG00000105374
## 89 ENSG00000166803
## 90 <NA>
## 91 <NA>
## 92 ENSG00000100979
## 93 ENSG00000130300
## 94 <NA>
## 95 ENSG00000128340
## 96 <NA>
## 97 ENSG00000131477
## 98 <NA>
## 99 <NA>
## 100 ENSG00000197958
## 101 <NA>
## 102 ENSG00000149021
## 103 <NA>
## 104 ENSG00000164265
## 105 <NA>
## 106 <NA>
## 107 ENSG00000122852
## 108 <NA>
## 109 ENSG00000168878
## 110 <NA>
## 111 ENSG00000168484
## 112 <NA>
## 113 ENSG00000133661
## 114 <NA>
## 115 ENSG00000157765
## 116 <NA>
## 117 <NA>
## 118 <NA>
## 119 <NA>
## 120 <NA>
## 121 <NA>
## 122 ENSG00000131747
## 123 <NA>
## 124 ENSG00000156298
## 125 <NA>
## 126 ENSG00000011600
## 127 ENSG00000175063
## 128 <NA>
## 129 <NA>
## 130 <NA>
<- count_clonotype_by_cluster(cluster_scd) cluster_scd
## This result may be found in the res0p2_clusters element of scd@misc.
@misc$res0p2_clusters cluster_scd
## cluster_name cells has_clono proportion
## 1 0 9158 115 0.0125573269272767
## 2 1 6690 5583 0.834529147982063
## 3 2 4685 3933 0.83948772678762
## 4 3 4623 246 0.0532121998702141
## 5 4 3050 178 0.0583606557377049
## 6 5 2753 261 0.0948056665455866
## 7 6 2617 221 0.084447841039358
## 8 7 2405 146 0.0607068607068607
## 9 8 1074 64 0.0595903165735568
## 10 9 953 734 0.770199370409234
## 11 10 898 624 0.694877505567929
## 12 11 871 94 0.107921928817451
## 13 12 200 25 0.125
## 14 cluster_sum 39977 12224 0.305775821097131
<- cluster_scd@misc$res0p2_clusters[["proportion"]] >= 0.75
high_vdj_cluster_idx <- cluster_scd@misc$res0p2_clusters[high_vdj_cluster_idx, "cluster_name"] high_vdj_clusters
Lets add a new annotation column ‘high_vdj_prop’; this is the set of all cells in clusters with a proportion of vdj’s >= 0.75.
<- cluster_scd@meta.data[["res0p2_clusters"]] %in% high_vdj_clusters
high_vdj_cell_idx summary(high_vdj_cell_idx)
## Mode FALSE TRUE
## logical 27649 12328
"high_vdj_cluster"]] <- "No"
cluster_scd[[@meta.data[high_vdj_cell_idx, "high_vdj_cluster"] <- "Yes"
cluster_scd
@meta.data[["sample_vdjcluster"]] <- paste0(cluster_scd@meta.data[["orig.ident"]], "_",
cluster_scd@meta.data[["high_vdj_cluster"]]) cluster_scd
This asks what is different from all cells in high VDJ clusters vs. all others. It is likely we want to concatenate this will sample ID in order to perform specific comparisons.
<- cluster_scd
highvdj_scd Idents(highvdj_scd) <- highvdj_scd[["high_vdj_cluster"]]
<- FindAllMarkers(highvdj_scd, only.pos=TRUE, logfc.threshold=fc_threshold) cluster_markers
## Calculating cluster Yes
## Calculating cluster No
<- as.data.frame(cluster_markers)
cluster_genes rownames(cluster_genes) <- toupper(rownames(cluster_genes))
<- merge(cluster_genes, brief, by.x="row.names", by.y="hgnc_symbol",
annotated_clusters all.x=TRUE)
head(annotated_clusters)
## Row.names p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene
## 1 1810037I17RIK 0 1.0613 0.744 0.433 0 No 1810037I17Rik
## 2 4931406P16RIK 0 0.8715 0.499 0.094 0 No 4931406P16Rik
## 3 9530068E07RIK 0 0.6489 0.493 0.162 0 No 9530068E07Rik
## 4 ABCA3 0 1.2911 0.426 0.070 0 No Abca3
## 5 ABCD3 0 0.6546 0.337 0.026 0 No Abcd3
## 6 ABHD17C 0 0.6044 0.354 0.074 0 No Abhd17c
## description
## 1 <NA>
## 2 <NA>
## 3 <NA>
## 4 ATP binding cassette subfamily A member 3 [Source:HGNC Symbol;Acc:HGNC:33]
## 5 ATP binding cassette subfamily D member 3 [Source:HGNC Symbol;Acc:HGNC:67]
## 6 abhydrolase domain containing 17C, depalmitoylase [Source:HGNC Symbol;Acc:HGNC:26925]
## ensembl_gene_id
## 1 <NA>
## 2 <NA>
## 3 <NA>
## 4 ENSG00000167972
## 5 ENSG00000117528
## 6 ENSG00000136379
"highvdjcluster_vs_all"]] <- annotated_clusters de_lst[[
A few clusters have a large majority of the VDJ information. Previously I recorded which they are, but because of the vagaries of UMAP, this appears to change from invocation to invocation.
Among the cluster2_samples, cluster 1 has 85% VDJ’s; so let us look at mock vs. nasal and mock vs. muscular in it.
I do an explicit query of my clonotype percentages in order to extract the cluster with the most VDJ markers. I will then perform an explicit pairwise comparison of what I perceive to be the most interesting categories therein: muscular/mock, nasal/mock, and muscular/nasal.
<- order(cluster_scd@misc[["res0p2_clusters"]][["proportion"]], decreasing=TRUE)
highest_cluster_idx <- cluster_scd@misc[["res0p2_clusters"]][highest_cluster_idx[1], "cluster_name"]
highest_cluster
<- paste0("control_", highest_cluster)
highest_control <- paste0("mock_", highest_cluster)
highest_mock <- paste0("n_", highest_cluster)
highest_nasal <- paste0("m_", highest_cluster)
highest_musc
<- "cluster2_sample"
category <- cluster_scd[["cluster2_sample"]] == highest_control
control_group sum(control_group)
## [1] 862
<- cluster_scd[["cluster2_sample"]] == highest_mock
mock_group sum(mock_group)
## [1] 643
<- cluster_scd[["cluster2_sample"]] == highest_musc
musc_group sum(musc_group)
## [1] 1980
<- cluster_scd[["cluster2_sample"]] == highest_nasal
nasal_group sum(nasal_group)
## [1] 1200
<- FindMarkers(
nasal_clust1 group.by=category,
cluster_scd, ident.1=highest_nasal, ident.2=highest_mock)
<- as.data.frame(nasal_clust1)
nasal_clust1 rownames(nasal_clust1) <- toupper(rownames(nasal_clust1))
<- merge(nasal_clust1, brief, by="row.names",
nasal_clust1 by.y="hgnc_symbol", all.x=TRUE)
head(nasal_clust1)
## Row.names p_val avg_log2FC pct.1 pct.2 p_val_adj
## 1 ALDOA 2.400e-07 -0.2567 0.580 0.658 5.309e-03
## 2 AY036118 1.218e-06 -0.3184 0.635 0.708 2.694e-02
## 3 CCL5 6.261e-05 0.7200 0.117 0.059 1.000e+00
## 4 CCR8 1.517e-05 -0.2943 0.054 0.109 3.356e-01
## 5 CD53 7.373e-11 -0.2841 0.601 0.701 1.631e-06
## 6 DDX5 7.279e-19 -0.3632 0.873 0.916 1.610e-14
## description
## 1 aldolase, fructose-bisphosphate A [Source:HGNC Symbol;Acc:HGNC:414]
## 2 <NA>
## 3 C-C motif chemokine ligand 5 [Source:HGNC Symbol;Acc:HGNC:10632]
## 4 C-C motif chemokine receptor 8 [Source:HGNC Symbol;Acc:HGNC:1609]
## 5 CD53 molecule [Source:HGNC Symbol;Acc:HGNC:1686]
## 6 DEAD-box helicase 5 [Source:HGNC Symbol;Acc:HGNC:2746]
## ensembl_gene_id
## 1 ENSG00000149925
## 2 <NA>
## 3 ENSG00000271503
## 4 ENSG00000179934
## 5 ENSG00000143119
## 6 ENSG00000108654
"highestvdj_nasal_vs_mock"]] <- nasal_clust1 de_lst[[
<- FindMarkers(
musc_clust1 group.by=category,
cluster_scd, ident.1=highest_musc, ident.2=highest_mock)
<- as.data.frame(musc_clust1)
musc_clust1 rownames(musc_clust1) <- toupper(rownames(musc_clust1))
<- merge(musc_clust1, brief, by="row.names",
musc_clust1 by.y="hgnc_symbol", all.x=TRUE)
head(musc_clust1)
## Row.names p_val avg_log2FC pct.1 pct.2 p_val_adj
## 1 ATP5E 1.515e-15 0.3046 0.660 0.530 3.353e-11
## 2 ATP5L 8.794e-15 0.2848 0.609 0.460 1.946e-10
## 3 BC004004 4.507e-17 0.2922 0.523 0.345 9.972e-13
## 4 CCR8 1.348e-06 -0.2786 0.054 0.109 2.982e-02
## 5 CD52 3.387e-24 0.4280 0.851 0.739 7.494e-20
## 6 COX8A 1.562e-12 0.2817 0.754 0.652 3.456e-08
## description
## 1 <NA>
## 2 <NA>
## 3 <NA>
## 4 C-C motif chemokine receptor 8 [Source:HGNC Symbol;Acc:HGNC:1609]
## 5 CD52 molecule [Source:HGNC Symbol;Acc:HGNC:1804]
## 6 cytochrome c oxidase subunit 8A [Source:HGNC Symbol;Acc:HGNC:2294]
## ensembl_gene_id
## 1 <NA>
## 2 <NA>
## 3 <NA>
## 4 ENSG00000179934
## 5 ENSG00000169442
## 6 ENSG00000176340
"highestvdj_muscular_vs_mock"]] <- musc_clust1 de_lst[[
<- FindMarkers(
nm_clust1 group.by=category,
cluster_scd, ident.1=highest_musc, ident.2=highest_nasal)
<- as.data.frame(nm_clust1)
nm_clust1 rownames(nm_clust1) <- toupper(rownames(nm_clust1))
<- merge(nm_clust1, brief, by="row.names",
nm_clust1 by.y="hgnc_symbol", all.x=TRUE)
head(nm_clust1)
## Row.names p_val avg_log2FC pct.1 pct.2 p_val_adj
## 1 ACTB 1.848e-15 0.2520 0.995 0.992 4.089e-11
## 2 AY036118 4.312e-21 0.4262 0.747 0.635 9.538e-17
## 3 CCL5 1.235e-10 -0.9465 0.055 0.117 2.731e-06
## 4 CCR7 3.328e-15 0.3414 0.798 0.688 7.363e-11
## 5 CD52 1.887e-16 0.2799 0.851 0.791 4.175e-12
## 6 DDX5 1.968e-36 0.3770 0.933 0.873 4.354e-32
## description
## 1 actin beta [Source:HGNC Symbol;Acc:HGNC:132]
## 2 <NA>
## 3 C-C motif chemokine ligand 5 [Source:HGNC Symbol;Acc:HGNC:10632]
## 4 C-C motif chemokine receptor 7 [Source:HGNC Symbol;Acc:HGNC:1608]
## 5 CD52 molecule [Source:HGNC Symbol;Acc:HGNC:1804]
## 6 DEAD-box helicase 5 [Source:HGNC Symbol;Acc:HGNC:2746]
## ensembl_gene_id
## 1 ENSG00000075624
## 2 <NA>
## 3 ENSG00000271503
## 4 ENSG00000126353
## 5 ENSG00000169442
## 6 ENSG00000108654
"highestvdj_muscular_vs_nasal"]] <- musc_clust1 de_lst[[
Conversely, we can just ask for differences among all vdj cells.
"vdj_sample"]] <- ""
cluster_scd[[<- paste0(cluster_scd@meta.data[["Idents"]], "_",
cluster_scd_vector @meta.data[["tcell"]])
cluster_scd"vdj_sample"]] <- cluster_scd_vector
cluster_scd[[
<- "mock_Yes"
v_mock <- "n_Yes"
v_nasal <- "m_Yes"
v_musc <- "vdj_sample"
category <- cluster_scd[[category]] == v_mock
mock_group sum(mock_group)
## [1] 3048
<- cluster_scd[[category]] == v_musc
musc_group sum(musc_group)
## [1] 4185
<- cluster_scd[[category]] == v_nasal
nasal_group sum(nasal_group)
## [1] 3035
<- cluster_scd
vdj_cluster Idents(vdj_cluster) <- cluster_scd[[category]]
<- NormalizeData(vdj_cluster) %>%
vdj_cluster FindVariableFeatures() %>%
ScaleData() %>%
RunPCA() %>%
FindNeighbors() %>%
FindClusters() %>%
RunTSNE() %>%
RunUMAP(reduction="pca", dims=1:10)
## Centering and scaling data matrix
## PC_ 1
## Positive: Coro1a, Rac2, Arhgdib, Laptm5, Cd3e, Selplg, Cd52, Ms4a4b, Cd3g, Ptprc
## Lat, Cd3d, Lck, Ms4a6b, Thy1, Ptprcap, Ltb, Lsp1, Itgb7, Il7r
## Fyb, Gramd3, Cd2, Cd28, Bin2, Icos, Cd69, Itgb2, Il2rb, Cotl1
## Negative: Sparc, Emp2, Cd63, Igfbp7, Ager, Gsn, Cldn18, Timp3, Selenbp1, Limch1
## Cystm1, Cst3, Gpx3, Selenbp2, Cryab, Npnt, Bcam, Bgn, Pmp22, Ces1d
## Alcam, Clic3, Apoe, Aqp5, Anxa3, Krt7, Hspb1, Col4a1, Fhl1, Serping1
## PC_ 2
## Positive: Bgn, Serping1, Plpp3, Sod3, Prelp, Rarres2, Col1a2, Sparcl1, Vim, Hsd11b1
## Olfml3, Col1a1, Loxl1, Clec3b, Tcf21, Mxra8, Ptgis, Fxyd1, Plxdc2, Spon1
## Ltbp4, Ppp1r14a, Fhl1, Lrp1, Gpx3, Col3a1, Colec12, Cdo1, Pcolce2, Pcolce
## Negative: Sftpd, Napsa, Sftpb, Slc34a2, Wfdc2, Cxcl15, Cbr2, Cldn3, Atp1b1, Sftpa1
## Muc1, Lamp3, Chil1, Ppp1r14c, Lgi3, Sftpc, Lyz2, Dram1, Ctsh, Scd1
## Hc, Car8, Egfl6, S100g, Ank3, Pi4k2b, Irx1, Lpcat1, Pla2g1b, Abca3
## PC_ 3
## Positive: Mgst1, Col6a1, Nupr1, Loxl1, Col3a1, Col1a1, Dram1, Sod3, Col1a2, Col6a2
## Ces1d, C3, Apoe, Slc34a2, Cxcl15, Lamp3, Mmp2, Chil1, Sftpb, Cd302
## Plxdc2, Sftpa1, Prelp, Lyz2, Sftpd, Ppp1r14c, Serpine2, Pdgfra, Scd1, Muc1
## Negative: Rtkn2, Scnn1g, Spock2, Igfbp2, Col4a3, Sema3e, Hopx, Pdpn, Cyp2b10, Scnn1b
## Cyp4b1, Clic5, Lama3, Flrt3, Col4a4, Tacstd2, Krt7, Lamb3, Cytl1, Sec14l3
## Tspan8, Itgb6, Cdkn2b, Krt19, Mab21l4, Tmem37, Scnn1a, Bdnf, Lmo7, Ndnf
## PC_ 4
## Positive: Cldn5, H2-Ab1, Cd74, Hpgd, Cd93, H2-Aa, H2-Eb1, Tmem252, Sema3c, Pcdh1
## Mcam, Efnb2, Dll4, Ly6c1, Sox17, Plk2, Gja4, Kdr, Stmn2, Lyve1
## Tmcc2, Sftpc, Hspa1a, Emcn, Slc34a2, Scn7a, Car4, Lamp3, Hc, Tbx3
## Negative: Fam183b, Cyp2s1, Foxj1, Tmem212, Spaca9, 1700007K13Rik, 1110017D15Rik, Gm19935, Ccdc153, Tctex1d4
## Rsph1, Arhgdig, Cfap126, BC051019, Nme5, 1700001C02Rik, Gm867, Tekt4, 1700016K19Rik, 1700094D03Rik
## 2410004P03Rik, Dnali1, Ak7, Ccdc113, Sntn, Nme9, Mns1, Pifo, AU040972, Anxa8
## PC_ 5
## Positive: Lgals1, S100a6, Col4a4, Scnn1g, Pdpn, Ndnf, Rtkn2, Birc5, Spock2, Col4a3
## Mki67, Scnn1b, Igfbp6, Tmem37, Pclaf, Flrt3, Top2a, Aqp5, Ccna2, Coro1a
## Rac2, Igfbp2, Gprc5a, Prdx6, Lgals3, Crlf1, Ube2c, Lamb3, Fads3, Rab11fip1
## Negative: Cldn5, Tm4sf1, Odf3b, Fam183b, Hpgd, Foxj1, Cd93, Tmem212, 1700007K13Rik, 1110017D15Rik
## Ccdc153, Ly6a, Gm19935, BC051019, Tctex1d4, 1700094D03Rik, Cfap126, Spaca9, 1700001C02Rik, 1700016K19Rik
## Tekt4, Gm867, Sema3c, Ak7, Dnali1, 2410004P03Rik, Sntn, Pcdh1, Nme9, Tmem252
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 39977
## Number of edges: 1236173
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9099
## Number of communities: 27
## Elapsed time: 10 seconds
## 22:39:36 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:39:36 Read 39977 rows and found 10 numeric columns
## 22:39:36 Using Annoy for neighbor search, n_neighbors = 30
## 22:39:36 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:39:43 Writing NN index file to temp file /tmp/RtmpdOLh2z/file1c779217ab51b6
## 22:39:43 Searching Annoy index using 1 thread, search_k = 3000
## 22:40:04 Annoy recall = 100%
## 22:40:07 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:40:12 Initializing from normalized Laplacian + noise (using irlba)
## 22:40:19 Commencing optimization for 200 epochs, with 1634470 positive edges
## 22:40:43 Optimization finished
<- DimPlot(vdj_cluster, reduction="umap", group.by=category, label=TRUE)
plotted plotted
DimPlot(vdj_cluster, reduction="tsne")
<- FindMarkers(
nasal_v group.by=category,
cluster_scd, ident.1=v_nasal, ident.2=v_mock)
<- as.data.frame(nasal_v)
nasal_v rownames(nasal_v) <- toupper(rownames(nasal_v))
<- merge(nasal_v, brief, by="row.names",
nasal_v by.y="hgnc_symbol", all.x=TRUE)
head(nasal_v)
## Row.names p_val avg_log2FC pct.1 pct.2 p_val_adj
## 1 ACVRL1 7.565e-12 -0.2994 0.086 0.142 1.674e-07
## 2 ATF3 7.323e-08 -0.3325 0.168 0.221 1.620e-03
## 3 ATP5E 9.263e-17 0.3377 0.679 0.609 2.049e-12
## 4 ATP5G1 2.458e-12 0.3140 0.569 0.496 5.437e-08
## 5 ATP5G2 7.744e-13 0.3284 0.844 0.811 1.713e-08
## 6 ATP5G3 5.355e-02 0.3318 0.771 0.758 1.000e+00
## description
## 1 activin A receptor like type 1 [Source:HGNC Symbol;Acc:HGNC:175]
## 2 activating transcription factor 3 [Source:HGNC Symbol;Acc:HGNC:785]
## 3 <NA>
## 4 <NA>
## 5 <NA>
## 6 <NA>
## ensembl_gene_id
## 1 ENSG00000139567
## 2 ENSG00000162772
## 3 <NA>
## 4 <NA>
## 5 <NA>
## 6 <NA>
"highvdj_nasal_vs_mock"]] <- nasal_v
de_lst[[
<- FindMarkers(
musc_v group.by=category,
cluster_scd, ident.1=v_musc, ident.2=v_mock)
<- as.data.frame(musc_v)
musc_v rownames(musc_v) <- toupper(rownames(musc_v))
<- merge(musc_v, brief, by="row.names",
musc_v by.y="hgnc_symbol", all.x=TRUE)
head(musc_v)
## Row.names p_val avg_log2FC pct.1 pct.2 p_val_adj
## 1 1810037I17RIK 5.835e-10 0.2987 0.533 0.486 1.291e-05
## 2 2410006H16RIK 1.700e-41 0.3219 0.615 0.477 3.761e-37
## 3 ACTB 7.160e-05 0.2939 0.999 1.000 1.000e+00
## 4 ACTG1 3.871e-02 0.3139 0.989 0.990 1.000e+00
## 5 AGER 6.193e-03 0.3321 0.148 0.126 1.000e+00
## 6 ALDH1A1 1.428e-04 0.2613 0.100 0.075 1.000e+00
## description
## 1 <NA>
## 2 <NA>
## 3 actin beta [Source:HGNC Symbol;Acc:HGNC:132]
## 4 actin gamma 1 [Source:HGNC Symbol;Acc:HGNC:144]
## 5 advanced glycosylation end-product specific receptor [Source:HGNC Symbol;Acc:HGNC:320]
## 6 aldehyde dehydrogenase 1 family member A1 [Source:HGNC Symbol;Acc:HGNC:402]
## ensembl_gene_id
## 1 <NA>
## 2 <NA>
## 3 ENSG00000075624
## 4 ENSG00000184009
## 5 ENSG00000204305
## 6 ENSG00000165092
"highvdj_muscular_vs_mock"]] <- musc_v
de_lst[[
<- FindMarkers(
nm_v group.by=category,
cluster_scd, ident.1=v_musc, ident.2=v_nasal)
<- as.data.frame(nm_v)
nm_v rownames(nm_v) <- toupper(rownames(nm_v))
<- merge(nm_v, brief, by="row.names",
nm_v by.y="hgnc_symbol", all.x=TRUE)
head(nm_v)
## Row.names p_val avg_log2FC pct.1 pct.2 p_val_adj
## 1 ACE 7.063e-14 0.3313 0.149 0.089 1.563e-09
## 2 ACVRL1 6.028e-15 0.2639 0.148 0.086 1.334e-10
## 3 ADAMTS1 8.105e-13 0.3077 0.116 0.067 1.793e-08
## 4 AGER 2.736e-03 0.2587 0.148 0.124 1.000e+00
## 5 ALDH2 7.980e-18 0.2702 0.218 0.137 1.765e-13
## 6 ANXA5 2.151e-09 0.3101 0.401 0.342 4.759e-05
## description
## 1 angiotensin I converting enzyme [Source:HGNC Symbol;Acc:HGNC:2707]
## 2 activin A receptor like type 1 [Source:HGNC Symbol;Acc:HGNC:175]
## 3 ADAM metallopeptidase with thrombospondin type 1 motif 1 [Source:HGNC Symbol;Acc:HGNC:217]
## 4 advanced glycosylation end-product specific receptor [Source:HGNC Symbol;Acc:HGNC:320]
## 5 aldehyde dehydrogenase 2 family member [Source:HGNC Symbol;Acc:HGNC:404]
## 6 annexin A5 [Source:HGNC Symbol;Acc:HGNC:543]
## ensembl_gene_id
## 1 ENSG00000159640
## 2 ENSG00000139567
## 3 ENSG00000154734
## 4 ENSG00000204305
## 5 ENSG00000111275
## 6 ENSG00000164111
"highvdj_muscular_vs_nasal"]] <- musc_v de_lst[[
<- write_xlsx(
written
de_lst,excel=glue::glue("excel/scd_de_results-v{ver}.xlsx"))
## Deleting the file excel/scd_de_results-v202302.xlsx before writing the tables.
## [1] "vdj_vs_all"
## [1] "vdj_vs_all"
## Deleting the file excel/scd_de_results-v202302.xlsx before writing the tables.
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## [1] "vdj_vs_all" "samples_vs_all"
## [1] "vdj_vs_all" "samples_vs_all"
## Deleting the file excel/scd_de_results-v202302.xlsx before writing the tables.
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## [1] "vdj_vs_all" "samples_vs_all" "samplecluster_vs_all"
## [1] "vdj_vs_all" "samples_vs_all" "samplecluster_vs_all"
## Deleting the file excel/scd_de_results-v202302.xlsx before writing the tables.
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## [1] "vdj_vs_all" "samples_vs_all" "samplecluster_vs_all"
## [4] "highvdjcluster_vs_all"
## [1] "vdj_vs_all" "samples_vs_all" "samplecluster_vs_all"
## [4] "highvdjcluster_vs_all"
## Deleting the file excel/scd_de_results-v202302.xlsx before writing the tables.
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## [1] "vdj_vs_all" "samples_vs_all"
## [3] "samplecluster_vs_all" "highvdjcluster_vs_all"
## [5] "highestvdj_nasal_vs_mock"
## [1] "vdj_vs_all" "samples_vs_all"
## [3] "samplecluster_vs_all" "highvdjcluster_vs_all"
## [5] "highestvdj_nasal_vs_mock"
## Deleting the file excel/scd_de_results-v202302.xlsx before writing the tables.
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## [1] "vdj_vs_all" "samples_vs_all"
## [3] "samplecluster_vs_all" "highvdjcluster_vs_all"
## [5] "highestvdj_nasal_vs_mock" "highestvdj_muscular_vs_mock"
## [1] "vdj_vs_all" "samples_vs_all"
## [3] "samplecluster_vs_all" "highvdjcluster_vs_all"
## [5] "highestvdj_nasal_vs_mock" "highestvdj_muscular_vs_mock"
## Deleting the file excel/scd_de_results-v202302.xlsx before writing the tables.
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## [1] "vdj_vs_all" "samples_vs_all"
## [3] "samplecluster_vs_all" "highvdjcluster_vs_all"
## [5] "highestvdj_nasal_vs_mock" "highestvdj_muscular_vs_mock"
## [7] "highestvdj_muscular_vs_nasal"
## [1] "vdj_vs_all" "samples_vs_all"
## [3] "samplecluster_vs_all" "highvdjcluster_vs_all"
## [5] "highestvdj_nasal_vs_mock" "highestvdj_muscular_vs_mock"
## [7] "highestvdj_muscular_vs_nasal"
## Deleting the file excel/scd_de_results-v202302.xlsx before writing the tables.
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## [1] "vdj_vs_all" "samples_vs_all"
## [3] "samplecluster_vs_all" "highvdjcluster_vs_all"
## [5] "highestvdj_nasal_vs_mock" "highestvdj_muscular_vs_mock"
## [7] "highestvdj_muscular_vs_nasal" "highvdj_nasal_vs_mock"
## [1] "vdj_vs_all" "samples_vs_all"
## [3] "samplecluster_vs_all" "highvdjcluster_vs_all"
## [5] "highestvdj_nasal_vs_mock" "highestvdj_muscular_vs_mock"
## [7] "highestvdj_muscular_vs_nasal" "highvdj_nasal_vs_mock"
## Deleting the file excel/scd_de_results-v202302.xlsx before writing the tables.
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## [1] "vdj_vs_all" "samples_vs_all"
## [3] "samplecluster_vs_all" "highvdjcluster_vs_all"
## [5] "highestvdj_nasal_vs_mock" "highestvdj_muscular_vs_mock"
## [7] "highestvdj_muscular_vs_nasal" "highvdj_nasal_vs_mock"
## [9] "highvdj_muscular_vs_mock"
## [1] "vdj_vs_all" "samples_vs_all"
## [3] "samplecluster_vs_all" "highvdjcluster_vs_all"
## [5] "highestvdj_nasal_vs_mock" "highestvdj_muscular_vs_mock"
## [7] "highestvdj_muscular_vs_nasal" "highvdj_nasal_vs_mock"
## [9] "highvdj_muscular_vs_mock"
## Deleting the file excel/scd_de_results-v202302.xlsx before writing the tables.
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## Warning in .self$setColWidths(i): NAs introduced by coercion
## [1] "vdj_vs_all" "samples_vs_all"
## [3] "samplecluster_vs_all" "highvdjcluster_vs_all"
## [5] "highestvdj_nasal_vs_mock" "highestvdj_muscular_vs_mock"
## [7] "highestvdj_muscular_vs_nasal" "highvdj_nasal_vs_mock"
## [9] "highvdj_muscular_vs_mock" "highvdj_muscular_vs_nasal"
Ok, I finally figured out its logic, I weirdly assumed it is doing something different than the various ways I was concatenating clusters/samples and using FindMarkers() to compare them. Instead, this logic of this function is very similar to my own all_pairwise(); except it is performing the pairwise comparison of subsets of the primary cell identities.
DefaultAssay(filt_scd) <- "RNA"
<- FindConservedMarkers(
conserved_markers ident.1="control",
filt_scd, grouping.var="res0p2_clusters", only.pos=TRUE,
verbose=TRUE)
## Testing group 12: (control) vs (mock, m, n)
## Testing group 11: (control) vs (mock, m, n)
## Testing group 10: (control) vs (mock, m, n)
## Testing group 9: (control) vs (mock, m, n)
## Testing group 8: (control) vs (mock, m, n)
## Testing group 7: (control) vs (mock, m, n)
## Testing group 6: (control) vs (mock, m, n)
## Testing group 5: (control) vs (mock, m, n)
## Testing group 4: (control) vs (mock, m, n)
## Testing group 3: (control) vs (mock, m, n)
## Testing group 2: (control) vs (mock, m, n)
## Testing group 1: (control) vs (mock, m, n)
## Testing group 0: (control) vs (mock, m, n)
<- FindMarkers(cluster_scd, group.by="Idents",
mock_vs_control ident.1="mock", ident.2="control")
head(mock_vs_control)
<- FindMarkers(cluster_scd, group.by="Idents",
muscular_vs_mock ident.1="m", ident.2="mock")
summary(muscular_vs_mock)
<- FindMarkers(cluster_scd, group.by="Idents",
nasal_vs_mock min.pct=0.25, ident.1="n",
ident.2="mock")
summary(nasal_vs_mock)
FeaturePlot(cluster_scd, features=c("Rgcc"),
split.by="orig.ident", max.cutoff=3,
cols=c("darkgreen", "darkred"))
First load the mSigDB data for mouse cell types.
<- load_gmt_signatures(signatures="reference/m8.all.v2022.1.Mm.symbols.gmt")
broad_types <- list()
broad_list for (i in names(broad_types)) {
<- geneIds(broad_types[[i]])
broad_list[[i]]
}
<- grepl(x=names(broad_list), pattern="^DESC")
descartes_idx <- broad_list[descartes_idx]
descartes_list names(descartes_list) <- gsub(x=names(descartes_list),
pattern="^DESCARTES_ORGANOGENESIS_",
replacement="")
<- grepl(x=names(broad_list), pattern="^TAB")
senis_idx <- broad_list[senis_idx]
senis_list names(senis_list) <- gsub(x=names(senis_list),
pattern = "^(TABULA_MURIS_SENIS)(_)*",
replacement = "")
<- suppressWarnings(AddModuleScore(object=cluster_scd, features=descartes_list,
cluster_scd name="descartes"))
<- suppressWarnings(AddModuleScore(object=cluster_scd, features=senis_list,
cluster_scd name="senis"))
<- summarize_scd_clusters(cluster_scd, real_column_names=names(descartes_list),
cluster_heats cluster_column="res0p2_clusters", abbreviate=FALSE,
column_prefix="descartes")
::pheatmap(cluster_heats$z_df) pheatmap
<- summarize_scd_clusters(cluster_scd, real_column_names=names(senis_list),
cluster_heats cluster_column="res0p2_clusters", abbreviate=TRUE,
column_prefix="senis")
::pheatmap(cluster_heats$z_df) pheatmap
DimPlot(fnorm_scd, label=TRUE)
<- c(151, 152, 153)
chosen_numbers <- names(senis_list)[chosen_numbers]
chosen_names chosen_names
## [1] "SPLEEN_B_CELL_AGEING"
## [2] "SPLEEN_CD4_POSITIVE_ALPHA_BETA_T_CELL_AGEING"
## [3] "SPLEEN_CD8_POSITIVE_ALPHA_BETA_T_CELL_AGEING"
<- paste0("senis", chosen_numbers)
chosen_columns VlnPlot(cluster_scd, features=chosen_columns, pt.size=0, group.by="res0p2_clusters")
<- c(154, 155, 156)
chosen_numbers <- names(senis_list)[chosen_numbers]
chosen_names chosen_names
## [1] "SPLEEN_NK_CELL_AGEING" "SPLEEN_T_CELL_AGEING"
## [3] "SPLEEN_GRANULOCYTE_AGEING"
<- paste0("senis", chosen_numbers)
chosen_columns VlnPlot(cluster_scd, features=chosen_columns, pt.size=0, group.by="res0p2_clusters")
<- c(70, 80)
chosen_numbers <- names(senis_list)[chosen_numbers]
chosen_names chosen_names
## [1] "LIMB_MUSCLE_MACROPHAGE_AGEING"
## [2] "LUNG_CD4_POSITIVE_ALPHA_BETA_T_CELL_AGEING"
<- paste0("senis", chosen_numbers)
chosen_columns VlnPlot(cluster_scd, features=chosen_columns, pt.size=0, group.by="res0p2_clusters")
<- c(81, 88)
chosen_numbers <- names(senis_list)[chosen_numbers]
chosen_names chosen_names
## [1] "LUNG_CD8_POSITIVE_ALPHA_BETA_T_CELL_AGEING"
## [2] "LUNG_ENDOTHELIAL_CELL_OF_LYMPHATIC_VESSEL_AGEING"
<- paste0("senis", chosen_numbers)
chosen_columns VlnPlot(cluster_scd, features=chosen_columns, pt.size=0, group.by="res0p2_clusters")
<- c(93)
chosen_numbers <- names(senis_list)[chosen_numbers]
chosen_names chosen_names
## [1] "LUNG_NEUTROPHIL_AGEING"
<- paste0("senis", chosen_numbers)
chosen_columns VlnPlot(cluster_scd, features=chosen_columns, pt.size=0, group.by="res0p2_clusters")
<- c(2,9,20)
chosen_numbers <- names(descartes_list)[chosen_numbers]
chosen_names chosen_names
## [1] "ENDOTHELIAL_CELLS" "WHITE_BLOOD_CELLS" "STROMAL_CELLS"
<- paste0("descartes", chosen_numbers)
chosen_columns VlnPlot(cluster_scd, features=chosen_columns, pt.size=0, group.by="res0p2_clusters")
<- c(31,36)
chosen_numbers <- names(descartes_list)[chosen_numbers]
chosen_names chosen_names
## [1] "INHIBITORY_INTERNEURONS" "NUETROPHILS"
<- paste0("descartes", chosen_numbers)
chosen_columns VlnPlot(cluster_scd, features=chosen_columns, pt.size=0, group.by="res0p2_clusters")
<- cluster_scd
cluster2_sample_scd
Idents(cluster2_sample_scd) <- cluster_scd[["cluster2_sample"]]
<- suppressWarnings(AddModuleScore(object=cluster2_sample_scd, features=descartes_list,
cluster2_sample_scd name="descartes"))
<- suppressWarnings(AddModuleScore(object=cluster2_sample_scd, features=senis_list,
cluster2_sample_scd name="senis"))
<- summarize_scd_clusters(cluster2_sample_scd, real_column_names=names(descartes_list),
cluster2_heats cluster_column="cluster2_sample", abbreviate=FALSE,
column_prefix="descartes")
::pheatmap(cluster2_heats$z_df) pheatmap
::pander(sessionInfo()) pander
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C
attached base packages: stats4, stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: dplyr(v.1.1.0), future(v.1.31.0), tibble(v.3.1.8), GSEABase(v.1.60.0), graph(v.1.76.0), annotate(v.1.76.0), XML(v.3.99-0.13), AnnotationDbi(v.1.60.0), skimr(v.2.1.5), purrr(v.1.0.1), ggplot2(v.3.4.1), SeuratObject(v.4.1.3), Seurat(v.4.3.0), hpgltools(v.1.0), testthat(v.3.1.6), reticulate(v.1.28), SummarizedExperiment(v.1.28.0), GenomicRanges(v.1.50.2), GenomeInfoDb(v.1.34.9), IRanges(v.2.32.0), S4Vectors(v.0.36.1), MatrixGenerics(v.1.10.0), matrixStats(v.0.63.0), Biobase(v.2.58.0) and BiocGenerics(v.0.44.0)
loaded via a namespace (and not attached): ica(v.1.0-3), ps(v.1.7.2), Rsamtools(v.2.14.0), foreach(v.1.5.2), lmtest(v.0.9-40), rprojroot(v.2.0.3), crayon(v.1.5.2), rbibutils(v.2.2.13), MASS(v.7.3-58.2), nlme(v.3.1-162), backports(v.1.4.1), sva(v.3.46.0), GOSemSim(v.2.24.0), rlang(v.1.0.6), XVector(v.0.38.0), HDO.db(v.0.99.1), ROCR(v.1.0-11), irlba(v.2.3.5.1), nloptr(v.2.0.3), callr(v.3.7.3), limma(v.3.54.1), filelock(v.1.0.2), BiocParallel(v.1.32.5), rjson(v.0.2.21), bit64(v.4.0.5), glue(v.1.6.2), pheatmap(v.1.0.12), sctransform(v.0.3.5), vipor(v.0.4.5), pbkrtest(v.0.5.2), parallel(v.4.2.0), processx(v.3.8.0), spatstat.sparse(v.3.0-0), DOSE(v.3.24.2), spatstat.geom(v.3.0-6), tidyselect(v.1.2.0), usethis(v.2.1.6), fitdistrplus(v.1.1-8), variancePartition(v.1.28.4), tidyr(v.1.3.0), zoo(v.1.8-11), qqconf(v.1.3.1), GenomicAlignments(v.1.34.0), xtable(v.1.8-4), magrittr(v.2.0.3), evaluate(v.0.20), Rdpack(v.2.4), cli(v.3.6.0), zlibbioc(v.1.44.0), sn(v.2.1.0), rstudioapi(v.0.14), miniUI(v.0.1.1.1), sp(v.1.6-0), bslib(v.0.4.2), mathjaxr(v.1.6-0), fastmatch(v.1.1-3), aod(v.1.3.2), treeio(v.1.22.0), shiny(v.1.7.4), xfun(v.0.37), multtest(v.2.54.0), pkgbuild(v.1.4.0), gson(v.0.0.9), cluster(v.2.1.4), caTools(v.1.18.2), tidygraph(v.1.2.3), KEGGREST(v.1.38.0), clusterGeneration(v.1.3.7), ggrepel(v.0.9.3), ape(v.5.6-2), listenv(v.0.9.0), Biostrings(v.2.66.0), png(v.0.1-8), withr(v.2.5.0), bitops(v.1.0-7), ggforce(v.0.4.1), plyr(v.1.8.8), pillar(v.1.8.1), gplots(v.3.1.3), cachem(v.1.0.6), GenomicFeatures(v.1.50.4), multcomp(v.1.4-22), fs(v.1.6.1), clusterProfiler(v.4.6.0), RUnit(v.0.4.32), vctrs(v.0.5.2), ellipsis(v.0.3.2), generics(v.0.1.3), devtools(v.2.4.5), metap(v.1.8), tools(v.4.2.0), remaCor(v.0.0.11), beeswarm(v.0.4.0), munsell(v.0.5.0), tweenr(v.2.0.2), fgsea(v.1.24.0), DelayedArray(v.0.24.0), fastmap(v.1.1.0), compiler(v.4.2.0), pkgload(v.1.3.2), abind(v.1.4-5), httpuv(v.1.6.8), rtracklayer(v.1.58.0), sessioninfo(v.1.2.2), plotly(v.4.10.1), GenomeInfoDbData(v.1.2.9), gridExtra(v.2.3), edgeR(v.3.40.2), lattice(v.0.20-45), deldir(v.1.0-6), mutoss(v.0.1-12), utf8(v.1.2.3), later(v.1.3.0), BiocFileCache(v.2.6.0), jsonlite(v.1.8.4), scales(v.1.2.1), tidytree(v.0.4.2), pbapply(v.1.7-0), genefilter(v.1.80.3), lazyeval(v.0.2.2), promises(v.1.2.0.1), doParallel(v.1.0.17), R.utils(v.2.12.2), goftest(v.1.2-3), spatstat.utils(v.3.0-1), sandwich(v.3.0-2), rmarkdown(v.2.20), openxlsx(v.4.2.5.2), cowplot(v.1.1.1), Rtsne(v.0.16), pander(v.0.6.5), downloader(v.0.4), uwot(v.0.1.14), igraph(v.1.4.0), plotrix(v.3.8-2), numDeriv(v.2016.8-1.1), survival(v.3.5-3), yaml(v.2.3.7), htmltools(v.0.5.4), memoise(v.2.0.1), profvis(v.0.3.7), BiocIO(v.1.8.0), locfit(v.1.5-9.7), graphlayouts(v.0.8.4), viridisLite(v.0.4.1), digest(v.0.6.31), assertthat(v.0.2.1), RhpcBLASctl(v.0.23-42), mime(v.0.12), rappdirs(v.0.3.3), repr(v.1.1.6), RSQLite(v.2.2.20), yulab.utils(v.0.0.6), future.apply(v.1.10.0), remotes(v.2.4.2), data.table(v.1.14.6), urlchecker(v.1.0.1), blob(v.1.2.3), R.oo(v.1.25.0), labeling(v.0.4.2), splines(v.4.2.0), RCurl(v.1.98-1.10), broom(v.1.0.3), hms(v.1.1.2), colorspace(v.2.1-0), base64enc(v.0.1-3), mnormt(v.2.1.1), ggbeeswarm(v.0.7.1), aplot(v.0.1.9), ggrastr(v.1.0.1), sass(v.0.4.5), Rcpp(v.1.0.10), RANN(v.2.6.1), mvtnorm(v.1.1-3), enrichplot(v.1.18.3), fansi(v.1.0.4), tzdb(v.0.3.0), brio(v.1.1.3), parallelly(v.1.34.0), R6(v.2.5.1), grid(v.4.2.0), ggridges(v.0.5.4), lifecycle(v.1.0.3), TFisher(v.0.2.0), zip(v.2.2.2), curl(v.5.0.0), minqa(v.1.2.5), leiden(v.0.4.3), jquerylib(v.0.1.4), PROPER(v.1.30.0), Matrix(v.1.5-3), qvalue(v.2.30.0), TH.data(v.1.1-1), desc(v.1.4.2), RcppAnnoy(v.0.0.20), RColorBrewer(v.1.1-3), iterators(v.1.0.14), spatstat.explore(v.3.0-6), stringr(v.1.5.0), htmlwidgets(v.1.6.1), polyclip(v.1.10-4), biomaRt(v.2.54.0), shadowtext(v.0.1.2), gridGraphics(v.0.5-1), mgcv(v.1.8-41), globals(v.0.16.2), patchwork(v.1.1.2), spatstat.random(v.3.1-3), progressr(v.0.13.0), codetools(v.0.2-19), GO.db(v.3.16.0), gtools(v.3.9.4), prettyunits(v.1.1.1), dbplyr(v.2.3.0), R.methodsS3(v.1.8.2), gtable(v.0.3.1), DBI(v.1.1.3), highr(v.0.10), ggfun(v.0.0.9), tensor(v.1.5), httr(v.1.4.4), KernSmooth(v.2.23-20), vroom(v.1.6.1), stringi(v.1.7.12), progress(v.1.2.2), reshape2(v.1.4.4), farver(v.2.1.1), viridis(v.0.6.2), ggtree(v.3.6.2), xml2(v.1.3.3), boot(v.1.3-28.1), lme4(v.1.1-31), restfulr(v.0.0.15), readr(v.2.1.4), ggplotify(v.0.1.0), scattermore(v.0.8), bit(v.4.0.5), scatterpie(v.0.1.8), spatstat.data(v.3.0-0), ggraph(v.2.1.0), pkgconfig(v.2.0.3) and knitr(v.1.42)
message(paste0("This is hpgltools commit: ", get_git_commit()))
## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset c100d5666f5032d24c933739015d267ef651c323
## This is hpgltools commit: Wed Mar 1 09:50:14 2023 -0500: c100d5666f5032d24c933739015d267ef651c323
<- paste0(gsub(pattern="\\.Rmd", replace="", x=rmd_file), "-v", ver, ".rda.xz")
this_save ##message(paste0("Saving to ", this_save))
##tmp <- sm(saveme(filename=this_save))