1 Preprocessing with cellranger

I downloaded a new version of cellranger along with the various reference files provided by 10x for the VD(J) references etc. I got a bit distracted by the pipeline language implemented by 10x called ‘martian’. I have the feeling that it might prove a good thing to play with.

Here are the commands I ran to separate the samples and perform the alignments. There are 4 sample names and each was done with one run of the ‘normal’ GEX scRNASeq method and one of the (new to me) V(D)J library.

The sample names are below, but the types are GEX

module add cellranger
names="A_Control A_IM_Mex09 A_IN_Mex09 A_Mock_Mex09"

for name in names; do
    echo $name
    cellranger count --fastqs=preprocessing \
               --transcriptome=/sw/local/cellranger/202212/reference/refdata-gex-mm10-2020-A \
               --id="${name}_GEX" --sample="${name}_GEX"
    cellranger vdj --fastqs preprocessing/ \
               --reference /sw/local/cellranger/202212/reference/refdata-cellranger-vdj-GRCm38-alts-ensembl-7.0.0/ \
               --id="${name}_VDJ" --sample="${name}_VDJ"
done

2 Rerun the pipeline with multi

April kindly sent some information from 10x which shows that I should have used the multi pipline when preprocessing the data.

Intra-Muscular vs. Nasal

I wrote 4 separate configuration csv files using the templates I downloaded and following a little reading. It seemed to me that I should be able to process them all as a single csv file, but when I attempted that, cellranger did not react well. It also took a few tries before I got the various reference/library options correct.

Note that once cellranger successfully ran for the samples I moved them to the multi/ directory so that I can compare the outputs to when I simply did the ‘count’ operation.

The following invocations of cellranger all appear to work without any problems. Ideally I would like them to be done in a single run, though.

module add cellranger
cellranger multi --id control --csv sample_sheets/multi_config_try03_control.csv
cellranger multi --id mock --csv sample_sheets/multi_config_try03_mock.csv
cellranger multi --id m_vaccine --csv sample_sheets/multi_config_try03_m.csv
cellranger multi --id n_vaccine --csv sample_sheets/multi_config_try03_n.csv
mv control mock m_vaccine n_vaccine multi/

The following command, which in theory should do the preprocessing of all samples together, fails. It seems my understanding of the multi csv input is incorrect.

module add cellranger
rm -rf run01 && cellranger multi --id run01 --csv sample_sheets/multi_config_try04_all.csv
## Error in running command bash
read.csv("sample_sheets/multi_config_try04_all.csv", comment.char="#")
##           X.gene.expression.
## 1                  reference
## 2       no-target-umi-filter
## 3                  chemistry
## 4               expect-cells
## 5            include-introns
## 6               no-secondary
## 7                     no-bam
## 8  min-assignment-confidence
## 9                           
## 10                          
## 11                          
## 12                          
## 13                          
## 14                          
## 15                          
## 16                 [feature]
## 17                          
## 18                          
## 19                          
## 20                          
## 21                     [vdj]
## 22                 reference
## 23                          
## 24                          
## 25                          
## 26                          
## 27                          
## 28                          
## 29                          
## 30                          
## 31               [libraries]
## 32                  fastq_id
## 33             feature_types
## 34                          
## 35                          
## 36             A_Control_GEX
## 37           gene expression
## 38                          
## 39                          
## 40          A_Mock_Mex09_GEX
## 41           gene expression
## 42                          
## 43                          
## 44            A_IM_Mex09_GEX
## 45           gene expression
## 46                          
## 47                          
## 48            A_IN_Mex09_GEX
## 49           gene expression
## 50                          
## 51                          
## 52             A_Control_VDJ
## 53                       vdj
## 54                          
## 55                          
## 56          A_Mock_Mex09_VDJ
## 57                       vdj
## 58                          
## 59                          
## 60            A_IM_Mex09_VDJ
## 61                       vdj
## 62                          
## 63                          
## 64            A_IN_Mex09_VDJ
## 65                       vdj
## 66                          
## 67                          
##                                                                                          X
## 1                            /sw/local/cellranger/202212/reference/refdata-gex-mm10-2020-A
## 2                                                                                    false
## 3                                                                                     auto
## 4                                                                                    10000
## 5                                                                                    false
## 6                                                                                    false
## 7                                                                                    false
## 8                                                                                      0.9
## 9                                                                                         
## 10                                                                                        
## 11                                                                                        
## 12                                                                                        
## 13                                                                                        
## 14                                                                                        
## 15                                                                                        
## 16                                                                                        
## 17                                                                                        
## 18                                                                                        
## 19                                                                                        
## 20                                                                                        
## 21                                                                                        
## 22 /sw/local/cellranger/202212/reference/refdata-cellranger-vdj-GRCm38-alts-ensembl-7.0.0/
## 23                                                                                        
## 24                                                                                        
## 25                                                                                        
## 26                                                                                        
## 27                                                                                        
## 28                                                                                        
## 29                                                                                        
## 30                                                                                        
## 31                                                                                        
## 32                                                                                  fastqs
## 33                                                                          subsample_rate
## 34                                                                                        
## 35                                                                                        
## 36                                                                           preprocessing
## 37                                                                                       1
## 38                                                                                        
## 39                                                                                        
## 40                                                                           preprocessing
## 41                                                                                       1
## 42                                                                                        
## 43                                                                                        
## 44                                                                           preprocessing
## 45                                                                                       1
## 46                                                                                        
## 47                                                                                        
## 48                                                                           preprocessing
## 49                                                                                       1
## 50                                                                                        
## 51                                                                                        
## 52                                                                           preprocessing
## 53                                                                                       1
## 54                                                                                        
## 55                                                                                        
## 56                                                                           preprocessing
## 57                                                                                       1
## 58                                                                                        
## 59                                                                                        
## 60                                                                           preprocessing
## 61                                                                                       1
## 62                                                                                        
## 63                                                                                        
## 64                                                                           preprocessing
## 65                                                                                       1
## 66                                                                                        
## 67                                                                                        
##                                                                                                                                                                               X.1
## 1                                                                                                                                                                                
## 2                                                                                                                                                                                
## 3                                                                                                                                                                                
## 4                                                                                                                                                                                
## 5                                                                                                                                                                                
## 6                                                                                                                                                                                
## 7                                                                                                                                                                                
## 8  # The minimum estimated likelihood to call a sample as tagged with a Cell Multiplexing Oligo instead of Unassigned". Optional. Default: 0.9. Introduced in Cell Ranger 6.0.2."
## 9                                                                                                                                                                                
## 10                                                                                                                                                                               
## 11                                                                                                                                                                               
## 12                                                                                                                                                                               
## 13                                                                                                                                                                               
## 14                                                                                                                                                                               
## 15                                                                                                                                                                               
## 16                                                                                                                                                                               
## 17                                                                                                                                                                               
## 18                                                                                                                                                                               
## 19                                                                                                                                                                               
## 20                                                                                                                                                                               
## 21                                                                                                                                                                               
## 22                                                                                                                                                                               
## 23                                                                                                                                                                               
## 24                                                                                                                                                                               
## 25                                                                                                                                                                               
## 26                                                                                                                                                                               
## 27                                                                                                                                                                               
## 28                                                                                                                                                                               
## 29                                                                                                                                                                               
## 30                                                                                                                                                                               
## 31                                                                                                                                                                               
## 32                                                                                                                                                            physical_library_id
## 33                                                                                                                                                                               
## 34                                                                                                                                                                               
## 35                                                                                                                                                                               
## 36                                                                                                                                                                            gex
## 37                                                                                                                                                                               
## 38                                                                                                                                                                               
## 39                                                                                                                                                                               
## 40                                                                                                                                                                            gex
## 41                                                                                                                                                                               
## 42                                                                                                                                                                               
## 43                                                                                                                                                                               
## 44                                                                                                                                                                            gex
## 45                                                                                                                                                                               
## 46                                                                                                                                                                               
## 47                                                                                                                                                                               
## 48                                                                                                                                                                            gex
## 49                                                                                                                                                                               
## 50                                                                                                                                                                               
## 51                                                                                                                                                                               
## 52                                                                                                                                                                            vdj
## 53                                                                                                                                                                               
## 54                                                                                                                                                                               
## 55                                                                                                                                                                               
## 56                                                                                                                                                                            vdj
## 57                                                                                                                                                                               
## 58                                                                                                                                                                               
## 59                                                                                                                                                                               
## 60                                                                                                                                                                            vdj
## 61                                                                                                                                                                               
## 62                                                                                                                                                                               
## 63                                                                                                                                                                               
## 64                                                                                                                                                                            vdj
## 65                                                                                                                                                                               
## 66                                                                                                                                                                               
## 67

3 Some stolen code

Here is a snippet of code copy-pasted from:

https://ucdavis-bioinformatics-training.github.io/2020-Advanced_Single_Cell_RNA_Seq/data_analysis/VDJ_Analysis_fixed

which, if I read it correctly, will read in the vdj output and add it as a set of annotations to the seurat object. Note, I made some changes to it because I simply cannot help myself.

add_clonotype <- function(seurat_obj, name = "control", type = "t") {
  vdj_directory <- glue::glue("multi/{name}/outs/per_sample_outs/{name}/vdj_t")
  vdj_csv <- glue::glue("{vdj_directory}/filtered_contig_annotations.csv")
  reference_csv <- glue::glue("{vdj_directory}/clonotypes.csv")
  tcr <- readr::read_csv(vdj_csv, show_col_types = FALSE)
  ref <- readr::read_csv(reference_csv, show_col_types = FALSE)

  tcr_duplicate_barcode_idx <- duplicated(tcr[["barcode"]])
  tcr_nodup <- tcr[!tcr_duplicate_barcode_idx, ]

  both <- as.data.frame(merge(tcr_nodup, ref, by.x = "raw_clonotype_id", by.y = "clonotype_id"))
  rownames(both) <- both[["barcode"]]
  both[["barcode"]] <- NULL

  new <- Seurat::AddMetaData(object = seurat_obj, metadata = both)
  return(new)
}

4 Load the data into Seurat and poke at it

The following block is mostly a cut/paste of itself where I set the (over)simplified name of each sample. This then becomes the template for the path and parameters used to read the data, create a seurat object, and add the clonotype data from the vdj run.

name <- "control"
path <- glue::glue("multi/{name}/outs/per_sample_outs/{name}/count/sample_filtered_feature_bc_matrix")
a_control <- Seurat::Read10X(path) %>%
  CreateSeuratObject(project = name) %>%
  add_clonotype(name = name)

name <- "mock"
path <- glue::glue("multi/{name}/outs/per_sample_outs/{name}/count/sample_filtered_feature_bc_matrix")
a_mock <- Seurat::Read10X(path) %>%
  CreateSeuratObject(project = name) %>%
  add_clonotype(name = name)

name <- "m"
path <- glue::glue("multi/{name}/outs/per_sample_outs/{name}/count/sample_filtered_feature_bc_matrix")
a_m <- Seurat::Read10X(path) %>%
  CreateSeuratObject(project = name) %>%
  add_clonotype(name = name)

name <- "n"
path <- glue::glue("multi/{name}/outs/per_sample_outs/{name}/count/sample_filtered_feature_bc_matrix")
a_n <- Seurat::Read10X(path) %>%
  CreateSeuratObject(project = name) %>%
  add_clonotype(name = name)

For the moment I want to be able to play with the individual samples as well as the aggregate so that I can better understand the data. So I guess it works out that I didn’t figure out how to run all the samples at the same time via ‘cellranger multi’.

all <- merge(a_control, a_mock)  %>%
  merge(a_m) %>%
  merge(a_n)
## Warning in CheckDuplicateCellNames(object.list = objects): Some cell names are
## duplicated across objects provided. Renaming to enforce unique cell names.

## Warning in CheckDuplicateCellNames(object.list = objects): Some cell names are
## duplicated across objects provided. Renaming to enforce unique cell names.

5 Query for clonotypes in the individual samples and the full dataset.

control_clono <- !is.na(a_control[["raw_clonotype_id"]])
summary(control_clono)
##  raw_clonotype_id
##  Mode :logical   
##  FALSE:12601     
##  TRUE :1937
mock_clono <- !is.na(a_mock[["raw_clonotype_id"]])
summary(mock_clono)
##  raw_clonotype_id
##  Mode :logical   
##  FALSE:10480     
##  TRUE :3014
m_clono <- !is.na(a_m[["raw_clonotype_id"]])
summary(m_clono)
##  raw_clonotype_id
##  Mode :logical   
##  FALSE:9256      
##  TRUE :4113
n_clono <- !is.na(a_n[["raw_clonotype_id"]])
summary(n_clono)
##  raw_clonotype_id
##  Mode :logical   
##  FALSE:5412      
##  TRUE :3033

6 Initial Clusters

cluster_letters <- as.factor(LETTERS[Idents(object=all)])
names(cluster_letters) <- colnames(x=all)
control <- as.character(cluster_letters)
mock <- control
m_vaccine <- control
n_vaccine <- control

control_idx <- control == "A"
control[control_idx] <- "Control"
control[!control_idx] <- "Stimulated"
mock_idx <- mock == "B"
mock[mock_idx] <- "Mock"
mock[!mock_idx] <- "Not mock"
m_vaccine_idx <- m_vaccine == "C"
m_vaccine[m_vaccine_idx] <- "M"
m_vaccine[!m_vaccine_idx] <- "Not M"
n_vaccine_idx <- n_vaccine == "D"
n_vaccine[n_vaccine_idx] <- "N"
n_vaccine[!n_vaccine_idx] <- "Not N"

all <- AddMetaData(
    object=all,
    metadata=cluster_letters,
    col.name="cluster_letters")
all <- AddMetaData(
    object=all,
    metadata=control,
    col.name="control")
all <- AddMetaData(
    object=all,
    metadata=mock,
    col.name="mock")
all <- AddMetaData(
    object=all,
    metadata=m_vaccine,
    col.name="M")
all <- AddMetaData(
    object=all,
    metadata=n_vaccine,
    col.name="N")

7 Filters and QC

all[["percent_mt"]] <- PercentageFeatureSet(all, pattern="^mt-")
all[["percent_ribo"]] <- PercentageFeatureSet(all, pattern="^Rp[sl]")
VlnPlot(all, features="nFeature_RNA", pt.size=0)

VlnPlot(all, features="percent_mt", pt.size=0)

VlnPlot(all, features="percent_ribo", pt.size=0)

VlnPlot(all, features="nCount_RNA", pt.size=0)

FeatureScatter(all, "percent_ribo", "percent_mt")

FeatureScatter(all, "nCount_RNA", "nFeature_RNA")

FeatureScatter(all, "nCount_RNA", "nFeature_RNA")

## How many cells have specific chains associated with them
sum(!is.na(all$chain))
## [1] 12097

8 Filter high mitochondrial and ribosomal cells

ncol(all)
## [1] 49846
## Get a summary of the counts / cell
summary(colSums(all@assays$RNA@counts))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     500    1301    2210    3421    4576   37440
dim(all)
## [1] 32285 49846
sufficient_rna_observed <- WhichCells(all, expression = nFeature_RNA > 200)
filt <- subset(all, cells = sufficient_rna_observed)
dim(filt)
## [1] 32285 49838
sufficiently_observed_idx <- rowSums(all) > 3
summary(sufficiently_observed_idx)
##    Mode   FALSE    TRUE 
## logical   12113   20172
filt <- subset(filt, features = rownames(all)[sufficiently_observed_idx])
dim(filt)
## [1] 20172 49838
## Keep cells with at least some ribosomal reads
## Note the Percent function above actually puts in a floating point
## number from 0-100, not (as I assumed from 0-1).
high_ribosomal <- WhichCells(filt, expression = percent_ribo > 5)
filt <- subset(filt, cells = high_ribosomal)
dim(filt)
## [1] 20172 37946
low_mitochondrial <- WhichCells(filt, expression = percent_mt < 20)
filt <- subset(filt, cells = low_mitochondrial)
dim(filt)
## [1] 20172 37929
## Now drop mitochondrial genes
mt_idx <- grepl("^mt-", rownames(filt))
summary(mt_idx)
##    Mode   FALSE    TRUE 
## logical   20159      13
filt <- filt[!mt_idx, ]
dim(filt)
## [1] 20159 37929

9 Distribution

9.1 Before filtering

all_norm <- NormalizeData(object=all) %>%
  FindVariableFeatures() %>%
  ScaleData() %>%
  RunPCA() %>%
  FindNeighbors() %>%
  FindClusters() %>%
  RunTSNE() %>%
  RunUMAP(reduction = "pca", dims = 1:10)
## Centering and scaling data matrix
## PC_ 1 
## Positive:  Ager, Sparc, Mettl7a1, Cldn18, Emp2, Cd63, Cystm1, Aqp5, Clic3, Krt7 
##     Gsn, Limch1, Cryab, Alcam, Selenbp1, Scnn1a, Selenbp2, Timp3, Krt18, Krt19 
##     Sec14l3, Gprc5a, Cyp2b10, Krt8, Igfbp6, Igfbp7, Mal2, Myh14, Npnt, Scd2 
## Negative:  Rac2, Coro1a, Arhgdib, Ms4a4b, Laptm5, Cd3e, Srgn, Selplg, Cd52, Cd3g 
##     Cd3d, Ptprc, Lat, Thy1, Ltb, Vps37b, Lck, Ms4a6b, Gimap3, Ptprcap 
##     Il7r, Sept1, Itgb7, Gramd3, Lsp1, Skap1, Cd53, Cytip, Cd2, Fyb 
## PC_ 2 
## Positive:  Wfdc2, Atp1b1, Cbr2, Ctsh, Sftpd, Napsa, Cldn3, Sftpb, Cxcl15, Slc34a2 
##     Sftpa1, Chil1, Muc1, Lyz2, Sftpc, Lamp3, Chchd10, Ppp1r14c, Lgi3, Krt18 
##     Scd1, Cldn7, Ptprf, Car8, Irx1, Baiap2l1, Hc, Epcam, Krt8, Dram1 
## Negative:  Sod3, Serping1, Bgn, Prelp, Plpp3, Olfml3, Tcf21, Spon1, Hsd11b1, Fn1 
##     Rarres2, Loxl1, Mfap4, Plxdc2, Gpx3, Cdo1, Itga8, Pdgfra, Clec3b, Dpep1 
##     Col1a2, Pcolce2, Ptgis, Gyg, Col1a1, Inmt, Ogn, Fhl1, Fmo2, Lrp1 
## PC_ 3 
## Positive:  Rtkn2, Spock2, Scnn1g, Igfbp2, Pdpn, Col4a3, Sema3e, Cyp2b10, Scnn1b, Col4a4 
##     Flrt3, Hopx, Cytl1, Lama3, Tacstd2, Ndnf, Tmem37, Itgb6, Krt7, Krt19 
##     Lamb3, Ptpre, Bdnf, Cyp4b1, Clic5, Crlf1, Cdkn2b, Mab21l4, Scnn1a, Hs2st1 
## Negative:  Cbr2, Cxcl15, Slc34a2, Sftpb, Sftpd, Chil1, Sftpa1, Cldn3, Napsa, Dram1 
##     Lyz2, Lamp3, Sftpc, Muc1, Ppp1r14c, Scd1, Mgst1, Hc, Lgi3, Car8 
##     Nupr1, Fabp5, S100g, Ctsc, Egfl6, Hp, Cpm, Atp1b1, Fasn, Chia1 
## PC_ 4 
## Positive:  Cd74, H2-Ab1, H2-Aa, Slc34a2, Sftpc, H2-Eb1, Napsa, Sftpa1, Sftpb, Lamp3 
##     Cxcl15, Lyz2, Hc, Chil1, Lgi3, Scd1, Sftpd, Lpcat1, Cd93, Dram1 
##     S100g, Ly6c1, Egfl6, Tmem252, Sema3c, Pla2g1b, Etv5, Car4, Chia1, Ctsc 
## Negative:  Fam183b, Foxj1, Tmem212, Cyp2s1, 1110017D15Rik, Gm19935, 1700007K13Rik, Ccdc153, Cfap126, Tctex1d4 
##     Spaca9, 1700094D03Rik, AU040972, BC051019, Rsph1, Dnali1, Dynlrb2, Nme5, Mns1, Gm867 
##     1700016K19Rik, Pifo, Arhgdig, 1700001C02Rik, Ccdc113, Tekt4, Mlf1, Odf3b, 2410004P03Rik, Sntn 
## PC_ 5 
## Positive:  Coro1a, Rac2, Lgals1, Cd3e, Arhgdib, Laptm5, Cd52, Ms4a4b, Selplg, Cd3g 
##     Ifi27l2a, Ptprc, Thy1, Ptprcap, Lsp1, Cd3d, Lat, Sept1, Lck, Ltb 
##     Cytip, Ms4a6b, Itgb7, Gimap3, Skap1, Icos, Il2rb, Cd2, Vps37b, Fyb 
## Negative:  Ly6a, Tm4sf1, Ly6c1, Ly6e, Cd93, Sema3c, Car4, Tmem252, Kdr, Stmn2 
##     Plaur, Id1, Tmcc2, H2-Ab1, Lyve1, Gja4, Efnb2, Fam183b, Hspa1b, Hspb1 
##     Cyp4b1, Foxj1, Odf3b, Tbx3, Cd74, Tmem212, Rgs12, 1110017D15Rik, Ccdc153, 1700007K13Rik
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 49846
## Number of edges: 1551538
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9142
## Number of communities: 27
## Elapsed time: 16 seconds
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 17:57:42 UMAP embedding parameters a = 0.9922 b = 1.112
## 17:57:42 Read 49846 rows and found 10 numeric columns
## 17:57:42 Using Annoy for neighbor search, n_neighbors = 30
## 17:57:42 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 17:57:51 Writing NN index file to temp file /tmp/RtmptPz7ea/file1761b2551e40fc
## 17:57:51 Searching Annoy index using 1 thread, search_k = 3000
## 17:58:17 Annoy recall = 100%
## 17:58:20 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 17:58:25 Initializing from normalized Laplacian + noise (using irlba)
## 17:58:35 Commencing optimization for 200 epochs, with 2065552 positive edges
## 17:59:02 Optimization finished
DimPlot(object=all_norm, reduction="tsne")

plotted <- DimPlot(all_norm, reduction="umap", group.by="cluster_letters", label=TRUE)
plotted

9.2 After filtering

filt_norm <- NormalizeData(object=filt) %>%
  FindVariableFeatures() %>%
  ScaleData() %>%
  RunPCA() %>%
  FindNeighbors() %>%
  FindClusters() %>%
  RunTSNE() %>%
  RunUMAP(reduction = "pca", dims = 1:10)
## Centering and scaling data matrix
## PC_ 1 
## Positive:  Rac2, Coro1a, Arhgdib, Cd3e, Ms4a4b, Laptm5, Selplg, Cd52, Cd3g, Cd3d 
##     Ptprc, Lat, Thy1, Ltb, Lck, Ms4a6b, Ptprcap, Il7r, Sept1, Gramd3 
##     Lsp1, Itgb7, Cd2, Fyb, Cd28, Icos, Cd69, Il2rb, Ccr7, AW112010 
## Negative:  Sparc, Emp2, Ager, Igfbp7, Cd63, Mettl7a1, Gsn, Cldn18, Tns1, Timp3 
##     Limch1, Gpx3, Cystm1, Cst3, Cryab, Bgn, Clic3, Aqp5, Bcam, Krt7 
##     Npnt, Selenbp1, Hspb1, Pmp22, Ifitm3, Selenbp2, Inmt, Apoe, Serping1, Ccn1 
## PC_ 2 
## Positive:  Sftpd, Sftpb, Napsa, Cxcl15, Slc34a2, Sftpa1, Wfdc2, Cbr2, Chil1, Cldn3 
##     Atp1b1, Sftpc, Lyz2, Lamp3, Muc1, Ppp1r14c, Dram1, Ctsh, Scd1, Lgi3 
##     Hc, Lpcat1, Car8, S100g, Ctsc, Egfl6, Pi4k2b, Fabp5, Abca3, Irx1 
## Negative:  Bgn, Serping1, Sod3, Plpp3, Rarres2, Prelp, Hsd11b1, Tcf21, Sparcl1, Clec3b 
##     Olfml3, Gpx3, Col1a2, Mfap4, Vim, Spon1, Fhl1, Loxl1, Col1a1, Cdo1 
##     Fn1, Pcolce2, Ptgis, Mxra8, Plxdc2, Ppp1r14a, Col3a1, Ogn, Gsn, Lrp1 
## PC_ 3 
## Positive:  Rtkn2, Spock2, Scnn1g, Igfbp2, Cyp2b10, Col4a3, Pdpn, Sema3e, Hopx, Scnn1b 
##     Krt7, Col4a4, Krt19, Sec14l3, Tacstd2, Flrt3, Cyp4b1, Lama3, Cytl1, Scnn1a 
##     Clic5, Tmem37, Clic3, Ndnf, Itgb6, Mab21l4, Lamb3, Cdkn2b, Tspan8, Lmo7 
## Negative:  Mgst1, Nupr1, Col6a1, Dram1, Slc34a2, Sod3, Cxcl15, Chil1, Sftpb, Loxl1 
##     Apoe, Ces1d, Col3a1, Sftpd, Sftpa1, Lyz2, Col1a1, Lamp3, Napsa, Col1a2 
##     Cbr2, C3, Sftpc, Scd1, Hc, Serping1, Muc1, Ppp1r14c, Lgi3, Prelp 
## PC_ 4 
## Positive:  Cldn5, Plvap, Icam2, Hpgd, Cd93, Tmem252, Tm4sf1, H2-Ab1, Sema3c, Cd74 
##     Lyve1, Stmn2, Pcdh1, Hilpda, Gja4, Mcam, Ier3, Efnb2, Ly6c1, Cebpd 
##     Ly6a, Hspa1a, Kdr, Ifitm3, Sox17, Dll4, Car4, H2-Eb1, Plk2, H2-Aa 
## Negative:  Birc5, Lgals1, Mki67, Pclaf, Top2a, Ube2c, Ccna2, Rrm2, Cdk1, S100a6 
##     Ccnb2, Coro1a, Rac2, Cdca8, Spc24, Tk1, Knl1, Cks1b, Aurkb, Tpx2 
##     Arhgdib, Hist1h1b, Lgals3, Cdca3, Nusap1, Laptm5, Cenpf, Ifi27l2a, Hmgb2, Kif11 
## PC_ 5 
## Positive:  Ager, Rtkn2, Aqp5, Scnn1g, Npnt, Col4a4, Col4a3, Rnase4, Cldn18, Gprc5a 
##     Ndnf, Spock2, Igfbp2, Scd2, Igfbp6, Slc39a8, Rgcc, Fam189a2, Limch1, Sftpc 
##     Slc34a2, Crlf1, Flrt3, Tmem37, Mal2, Scnn1b, Inmt, Napsa, Pdpn, Hc 
## Negative:  Fam183b, Foxj1, Tmem212, 1110017D15Rik, Ccdc153, 1700007K13Rik, Gm19935, Cfap126, 1700094D03Rik, Spaca9 
##     Cyp2s1, Tctex1d4, AU040972, Odf3b, Rsph1, Dnali1, Dynlrb2, BC051019, 1700016K19Rik, Nme5 
##     1700001C02Rik, Sntn, Tekt4, Pifo, Ccdc113, Gm867, Mns1, 1700088E04Rik, Cdhr4, Ak7
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 37929
## Number of edges: 1177655
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9064
## Number of communities: 25
## Elapsed time: 9 seconds
## 18:03:57 UMAP embedding parameters a = 0.9922 b = 1.112
## 18:03:57 Read 37929 rows and found 10 numeric columns
## 18:03:57 Using Annoy for neighbor search, n_neighbors = 30
## 18:03:57 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 18:04:03 Writing NN index file to temp file /tmp/RtmptPz7ea/file1761b231f15123
## 18:04:03 Searching Annoy index using 1 thread, search_k = 3000
## 18:04:23 Annoy recall = 100%
## 18:04:25 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 18:04:30 Initializing from normalized Laplacian + noise (using irlba)
## 18:04:36 Commencing optimization for 200 epochs, with 1557970 positive edges
## 18:04:57 Optimization finished
DimPlot(object=filt_norm, reduction="tsne")

plotted <- DimPlot(filt_norm, reduction="umap", group.by="cluster_letters", label=TRUE)
plotted

10 Scan for likely cell cycle genes

filt <- CellCycleScoring(
    object = filt,
    g2m.features = cc.genes$g2m.genes,
    s.features = cc.genes$s.genes)
## Warning: The following features are not present in the object: MCM5, PCNA, TYMS,
## FEN1, MCM2, MCM4, RRM1, UNG, GINS2, MCM6, CDCA7, DTL, PRIM1, UHRF1, MLF1IP,
## HELLS, RFC2, RPA2, NASP, RAD51AP1, GMNN, WDR76, SLBP, CCNE2, UBR7, POLD3, MSH2,
## ATAD2, RAD51, RRM2, CDC45, CDC6, EXO1, TIPIN, DSCC1, BLM, CASP8AP2, USP1, CLSPN,
## POLA1, CHAF1B, BRIP1, E2F8, not searching for symbol synonyms
## Warning: The following features are not present in the object: HMGB2, CDK1,
## NUSAP1, UBE2C, BIRC5, TPX2, TOP2A, NDC80, CKS2, NUF2, CKS1B, MKI67, TMPO, CENPF,
## TACC3, FAM64A, SMC4, CCNB2, CKAP2L, CKAP2, AURKB, BUB1, KIF11, ANP32E, TUBB4B,
## GTSE1, KIF20B, HJURP, CDCA3, HN1, CDC20, TTK, CDC25C, KIF2C, RANGAP1, NCAPD2,
## DLGAP5, CDCA2, CDCA8, ECT2, KIF23, HMMR, AURKA, PSRC1, ANLN, LBR, CKAP5, CENPE,
## CTCF, NEK2, G2E3, GAS2L3, CBX5, CENPA, not searching for symbol synonyms
## Warning in AddModuleScore(object = object, features = features, name = name, :
## Could not find enough features in the object from the following feature lists:
## S.Score Attempting to match case...Could not find enough features in the object
## from the following feature lists: G2M.Score Attempting to match case...
VlnPlot(filt, features = c("S.Score", "G2M.Score"),
        group.by = "orig.ident",
        ncol = 4, pt.size = 0)

pander::pander(sessionInfo())

R version 4.2.0 (2022-04-22)

Platform: x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C

attached base packages: stats4, stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: ggplot2(v.3.4.0), SeuratObject(v.4.1.3), Seurat(v.4.3.0), hpgltools(v.1.0), testthat(v.3.1.6), reticulate(v.1.26), SummarizedExperiment(v.1.28.0), GenomicRanges(v.1.50.2), GenomeInfoDb(v.1.34.6), IRanges(v.2.32.0), S4Vectors(v.0.36.1), MatrixGenerics(v.1.10.0), matrixStats(v.0.63.0), Biobase(v.2.58.0) and BiocGenerics(v.0.44.0)

loaded via a namespace (and not attached): ica(v.1.0-3), ps(v.1.7.2), Rsamtools(v.2.14.0), foreach(v.1.5.2), lmtest(v.0.9-40), rprojroot(v.2.0.3), crayon(v.1.5.2), rbibutils(v.2.2.11), MASS(v.7.3-58.1), nlme(v.3.1-161), backports(v.1.4.1), sva(v.3.46.0), GOSemSim(v.2.24.0), rlang(v.1.0.6), XVector(v.0.38.0), HDO.db(v.0.99.1), ROCR(v.1.0-11), irlba(v.2.3.5.1), nloptr(v.2.0.3), callr(v.3.7.3), limma(v.3.54.0), filelock(v.1.0.2), BiocParallel(v.1.32.5), rjson(v.0.2.21), bit64(v.4.0.5), glue(v.1.6.2), sctransform(v.0.3.5), pbkrtest(v.0.5.1), parallel(v.4.2.0), processx(v.3.8.0), vipor(v.0.4.5), spatstat.sparse(v.3.0-0), AnnotationDbi(v.1.60.0), DOSE(v.3.24.2), spatstat.geom(v.3.0-3), tidyselect(v.1.2.0), usethis(v.2.1.6), fitdistrplus(v.1.1-8), variancePartition(v.1.28.0), XML(v.3.99-0.13), tidyr(v.1.2.1), zoo(v.1.8-11), GenomicAlignments(v.1.34.0), xtable(v.1.8-4), magrittr(v.2.0.3), evaluate(v.0.19), Rdpack(v.2.4), cli(v.3.5.0), zlibbioc(v.1.44.0), rstudioapi(v.0.14), miniUI(v.0.1.1.1), sp(v.1.5-1), bslib(v.0.4.2), fastmatch(v.1.1-3), aod(v.1.3.2), treeio(v.1.22.0), shiny(v.1.7.4), xfun(v.0.36), pkgbuild(v.1.4.0), gson(v.0.0.9), cluster(v.2.1.4), caTools(v.1.18.2), tidygraph(v.1.2.2), KEGGREST(v.1.38.0), tibble(v.3.1.8), ggrepel(v.0.9.2), ape(v.5.6-2), listenv(v.0.9.0), Biostrings(v.2.66.0), png(v.0.1-8), future(v.1.30.0), withr(v.2.5.0), bitops(v.1.0-7), ggforce(v.0.4.1), plyr(v.1.8.8), GSEABase(v.1.60.0), pillar(v.1.8.1), gplots(v.3.1.3), cachem(v.1.0.6), GenomicFeatures(v.1.50.3), fs(v.1.5.2), clusterProfiler(v.4.6.0), vctrs(v.0.5.1), ellipsis(v.0.3.2), generics(v.0.1.3), devtools(v.2.4.5), tools(v.4.2.0), beeswarm(v.0.4.0), munsell(v.0.5.0), tweenr(v.2.0.2), fgsea(v.1.24.0), DelayedArray(v.0.24.0), fastmap(v.1.1.0), compiler(v.4.2.0), pkgload(v.1.3.2), abind(v.1.4-5), httpuv(v.1.6.7), rtracklayer(v.1.58.0), sessioninfo(v.1.2.2), plotly(v.4.10.1), GenomeInfoDbData(v.1.2.9), gridExtra(v.2.3), edgeR(v.3.40.1), lattice(v.0.20-45), deldir(v.1.0-6), utf8(v.1.2.2), later(v.1.3.0), dplyr(v.1.0.10), BiocFileCache(v.2.6.0), jsonlite(v.1.8.4), scales(v.1.2.1), graph(v.1.76.0), tidytree(v.0.4.2), pbapply(v.1.6-0), genefilter(v.1.80.2), lazyeval(v.0.2.2), promises(v.1.2.0.1), doParallel(v.1.0.17), R.utils(v.2.12.2), goftest(v.1.2-3), spatstat.utils(v.3.0-1), rmarkdown(v.2.19), cowplot(v.1.1.1), Rtsne(v.0.16), pander(v.0.6.5), downloader(v.0.4), uwot(v.0.1.14), igraph(v.1.3.5), survival(v.3.4-0), yaml(v.2.3.6), htmltools(v.0.5.4), memoise(v.2.0.1), profvis(v.0.3.7), BiocIO(v.1.8.0), locfit(v.1.5-9.7), graphlayouts(v.0.8.4), viridisLite(v.0.4.1), digest(v.0.6.31), assertthat(v.0.2.1), RhpcBLASctl(v.0.21-247.1), mime(v.0.12), rappdirs(v.0.3.3), RSQLite(v.2.2.20), yulab.utils(v.0.0.6), future.apply(v.1.10.0), remotes(v.2.4.2), data.table(v.1.14.6), urlchecker(v.1.0.1), blob(v.1.2.3), R.oo(v.1.25.0), splines(v.4.2.0), labeling(v.0.4.2), RCurl(v.1.98-1.9), broom(v.1.0.2), hms(v.1.1.2), colorspace(v.2.0-3), ggbeeswarm(v.0.7.1), aplot(v.0.1.9), ggrastr(v.1.0.1), sass(v.0.4.4), Rcpp(v.1.0.9), RANN(v.2.6.1), enrichplot(v.1.18.3), fansi(v.1.0.3), tzdb(v.0.3.0), brio(v.1.1.3), parallelly(v.1.33.0), R6(v.2.5.1), grid(v.4.2.0), ggridges(v.0.5.4), lifecycle(v.1.0.3), curl(v.4.3.3), minqa(v.1.2.5), leiden(v.0.4.3), jquerylib(v.0.1.4), PROPER(v.1.30.0), Matrix(v.1.5-3), qvalue(v.2.30.0), desc(v.1.4.2), RcppAnnoy(v.0.0.20), RColorBrewer(v.1.1-3), iterators(v.1.0.14), spatstat.explore(v.3.0-5), stringr(v.1.5.0), htmlwidgets(v.1.6.0), polyclip(v.1.10-4), biomaRt(v.2.54.0), purrr(v.1.0.0), shadowtext(v.0.1.2), gridGraphics(v.0.5-1), mgcv(v.1.8-41), globals(v.0.16.2), patchwork(v.1.1.2), spatstat.random(v.3.0-1), progressr(v.0.12.0), codetools(v.0.2-18), GO.db(v.3.16.0), gtools(v.3.9.4), prettyunits(v.1.1.1), dbplyr(v.2.2.1), R.methodsS3(v.1.8.2), gtable(v.0.3.1), DBI(v.1.1.3), highr(v.0.10), ggfun(v.0.0.9), tensor(v.1.5), httr(v.1.4.4), KernSmooth(v.2.23-20), stringi(v.1.7.8), vroom(v.1.6.0), progress(v.1.2.2), reshape2(v.1.4.4), farver(v.2.1.1), annotate(v.1.76.0), viridis(v.0.6.2), ggtree(v.3.6.2), xml2(v.1.3.3), boot(v.1.3-28.1), lme4(v.1.1-31), restfulr(v.0.0.15), readr(v.2.1.3), ggplotify(v.0.1.0), scattermore(v.0.8), bit(v.4.0.5), scatterpie(v.0.1.8), spatstat.data(v.3.0-0), ggraph(v.2.1.0), pkgconfig(v.2.0.3) and knitr(v.1.41)

message(paste0("This is hpgltools commit: ", get_git_commit()))
## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset 911e7d4beebdc73267ec4be631a305574289efd3
## This is hpgltools commit: Tue Jan 17 10:36:44 2023 -0500: 911e7d4beebdc73267ec4be631a305574289efd3
this_save <- paste0(gsub(pattern="\\.Rmd", replace="", x=rmd_file), "-v", ver, ".rda.xz")
message(paste0("Saving to ", this_save))
## Saving to index-v202301.rda.xz
tmp <- sm(saveme(filename=this_save))
---
title: "Playing with some new scRNASeq data: 202301."
author: "atb abelew@gmail.com"
date: "`r Sys.Date()`"
output:
  html_document:
    code_download: true
    code_folding: show
    fig_caption: true
    fig_height: 7
    fig_width: 7
    highlight: tango
    keep_md: false
    mode: selfcontained
    number_sections: true
    self_contained: true
    theme: readable
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
---

<style type="text/css">
body, td {
  font-size: 16px;
}
code.r{
  font-size: 16px;
}
pre {
 font-size: 16px
}
</style>

```{r options, include=FALSE}
library("hpgltools")
library("reticulate")
tt <- devtools::load_all("~/hpgltools")
knitr::opts_knit$set(width=120,
                     progress=TRUE,
                     verbose=TRUE,
                     echo=TRUE)
knitr::opts_chunk$set(error=TRUE,
                      dpi=96)
lua_filters <- rmarkdown::pandoc_lua_filter_args("pandoc-zotxt.lua")
old_options <- options(digits=4,
                       stringsAsFactors=FALSE,
                       knitr.duplicate.label="allow")
ggplot2::theme_set(ggplot2::theme_bw(base_size=10))
rundate <- format(Sys.Date(), format="%Y%m%d")
previous_file <- ""
ver <- "202301"

##tmp <- sm(loadme(filename=paste0(gsub(pattern="\\.Rmd", replace="", x=previous_file), "-v", ver, ".rda.xz")))
rmd_file <- "index.Rmd"
library(Seurat)
library(ggplot2)
```

# Preprocessing with cellranger

I downloaded a new version of cellranger along with the various
reference files provided by 10x for the VD(J) references etc.  I got a
bit distracted by the pipeline language implemented by 10x called
'martian'.   I have the feeling that it might prove a good thing to
play with.

Here are the commands I ran to separate the samples and perform the
alignments.  There are 4 sample names and each was done with one run
of the 'normal' GEX scRNASeq method and one of the (new to me) V(D)J
library.

The sample names are below, but the types are GEX

```{bash cellranger, eval=FALSE}
module add cellranger
names="A_Control A_IM_Mex09 A_IN_Mex09 A_Mock_Mex09"

for name in names; do
    echo $name
    cellranger count --fastqs=preprocessing \
               --transcriptome=/sw/local/cellranger/202212/reference/refdata-gex-mm10-2020-A \
               --id="${name}_GEX" --sample="${name}_GEX"
    cellranger vdj --fastqs preprocessing/ \
               --reference /sw/local/cellranger/202212/reference/refdata-cellranger-vdj-GRCm38-alts-ensembl-7.0.0/ \
               --id="${name}_VDJ" --sample="${name}_VDJ"
done
```

# Rerun the pipeline with multi

April kindly sent some information from 10x which shows that I should
have used the multi pipline when preprocessing the data.

Intra-Muscular vs. Nasal

I wrote 4 separate configuration csv files using the templates I
downloaded and following a little reading.  It seemed to me that I
should be able to process them all as a single csv file, but when I
attempted that, cellranger did not react well.  It also took a few
tries before I got the various reference/library options correct.

Note that once cellranger successfully ran for the samples I moved
them to the multi/ directory so that I can compare the outputs to when
I simply did the 'count' operation.

The following invocations of cellranger all appear to work without any
problems.  Ideally I would like them to be done in a single run, though.

```{bash cellrangerv2, eval=FALSE}
module add cellranger
cellranger multi --id control --csv sample_sheets/multi_config_try03_control.csv
cellranger multi --id mock --csv sample_sheets/multi_config_try03_mock.csv
cellranger multi --id m_vaccine --csv sample_sheets/multi_config_try03_m.csv
cellranger multi --id n_vaccine --csv sample_sheets/multi_config_try03_n.csv
mv control mock m_vaccine n_vaccine multi/
```

The following command, which in theory should do the preprocessing of
all samples together, fails.  It seems my understanding of the multi
csv input is incorrect.

```{bash cellrangerv3}
module add cellranger
rm -rf run01 && cellranger multi --id run01 --csv sample_sheets/multi_config_try04_all.csv
```

```{r print_multi_csv}
read.csv("sample_sheets/multi_config_try04_all.csv", comment.char="#")
```

# Some stolen code

Here is a snippet of code copy-pasted from:

https://ucdavis-bioinformatics-training.github.io/2020-Advanced_Single_Cell_RNA_Seq/data_analysis/VDJ_Analysis_fixed

which, if I read it correctly, will read in the vdj output and add it
as a set of annotations to the seurat object. Note, I made some
changes to it because I simply cannot help myself.

```{r stolen_code}
add_clonotype <- function(seurat_obj, name = "control", type = "t") {
  vdj_directory <- glue::glue("multi/{name}/outs/per_sample_outs/{name}/vdj_t")
  vdj_csv <- glue::glue("{vdj_directory}/filtered_contig_annotations.csv")
  reference_csv <- glue::glue("{vdj_directory}/clonotypes.csv")
  tcr <- readr::read_csv(vdj_csv, show_col_types = FALSE)
  ref <- readr::read_csv(reference_csv, show_col_types = FALSE)

  tcr_duplicate_barcode_idx <- duplicated(tcr[["barcode"]])
  tcr_nodup <- tcr[!tcr_duplicate_barcode_idx, ]

  both <- as.data.frame(merge(tcr_nodup, ref, by.x = "raw_clonotype_id", by.y = "clonotype_id"))
  rownames(both) <- both[["barcode"]]
  both[["barcode"]] <- NULL

  new <- Seurat::AddMetaData(object = seurat_obj, metadata = both)
  return(new)
}
```

# Load the data into Seurat and poke at it

The following block is mostly a cut/paste of itself where I set the
(over)simplified name of each sample.  This then becomes the template
for the path and parameters used to read the data, create a seurat
object, and add the clonotype data from the vdj run.

```{r seurat_loading}
name <- "control"
path <- glue::glue("multi/{name}/outs/per_sample_outs/{name}/count/sample_filtered_feature_bc_matrix")
a_control <- Seurat::Read10X(path) %>%
  CreateSeuratObject(project = name) %>%
  add_clonotype(name = name)

name <- "mock"
path <- glue::glue("multi/{name}/outs/per_sample_outs/{name}/count/sample_filtered_feature_bc_matrix")
a_mock <- Seurat::Read10X(path) %>%
  CreateSeuratObject(project = name) %>%
  add_clonotype(name = name)

name <- "m"
path <- glue::glue("multi/{name}/outs/per_sample_outs/{name}/count/sample_filtered_feature_bc_matrix")
a_m <- Seurat::Read10X(path) %>%
  CreateSeuratObject(project = name) %>%
  add_clonotype(name = name)

name <- "n"
path <- glue::glue("multi/{name}/outs/per_sample_outs/{name}/count/sample_filtered_feature_bc_matrix")
a_n <- Seurat::Read10X(path) %>%
  CreateSeuratObject(project = name) %>%
  add_clonotype(name = name)
```

For the moment I want to be able to play with the individual
samples as well as the aggregate so that I can better understand the
data.  So I guess it works out that I didn't figure out how to run all
the samples at the same time via 'cellranger multi'.

```{r merge_samples}
all <- merge(a_control, a_mock)  %>%
  merge(a_m) %>%
  merge(a_n)
```

# Query for clonotypes in the individual samples and the full dataset.

```{r query_clonotypes}
control_clono <- !is.na(a_control[["raw_clonotype_id"]])
summary(control_clono)

mock_clono <- !is.na(a_mock[["raw_clonotype_id"]])
summary(mock_clono)

m_clono <- !is.na(a_m[["raw_clonotype_id"]])
summary(m_clono)

n_clono <- !is.na(a_n[["raw_clonotype_id"]])
summary(n_clono)
```

# Initial Clusters

```{r initial_clusters}
cluster_letters <- as.factor(LETTERS[Idents(object=all)])
names(cluster_letters) <- colnames(x=all)
control <- as.character(cluster_letters)
mock <- control
m_vaccine <- control
n_vaccine <- control

control_idx <- control == "A"
control[control_idx] <- "Control"
control[!control_idx] <- "Stimulated"
mock_idx <- mock == "B"
mock[mock_idx] <- "Mock"
mock[!mock_idx] <- "Not mock"
m_vaccine_idx <- m_vaccine == "C"
m_vaccine[m_vaccine_idx] <- "M"
m_vaccine[!m_vaccine_idx] <- "Not M"
n_vaccine_idx <- n_vaccine == "D"
n_vaccine[n_vaccine_idx] <- "N"
n_vaccine[!n_vaccine_idx] <- "Not N"

all <- AddMetaData(
    object=all,
    metadata=cluster_letters,
    col.name="cluster_letters")
all <- AddMetaData(
    object=all,
    metadata=control,
    col.name="control")
all <- AddMetaData(
    object=all,
    metadata=mock,
    col.name="mock")
all <- AddMetaData(
    object=all,
    metadata=m_vaccine,
    col.name="M")
all <- AddMetaData(
    object=all,
    metadata=n_vaccine,
    col.name="N")
```

# Filters and QC

```{r filters_qc}

all[["percent_mt"]] <- PercentageFeatureSet(all, pattern="^mt-")
all[["percent_ribo"]] <- PercentageFeatureSet(all, pattern="^Rp[sl]")
VlnPlot(all, features="nFeature_RNA", pt.size=0)
VlnPlot(all, features="percent_mt", pt.size=0)
VlnPlot(all, features="percent_ribo", pt.size=0)
VlnPlot(all, features="nCount_RNA", pt.size=0)

FeatureScatter(all, "percent_ribo", "percent_mt")
FeatureScatter(all, "nCount_RNA", "nFeature_RNA")
FeatureScatter(all, "nCount_RNA", "nFeature_RNA")

## How many cells have specific chains associated with them
sum(!is.na(all$chain))
```

# Filter high mitochondrial and ribosomal cells

```{r filter_ribomt}
ncol(all)

## Get a summary of the counts / cell
summary(colSums(all@assays$RNA@counts))

dim(all)
sufficient_rna_observed <- WhichCells(all, expression = nFeature_RNA > 200)
filt <- subset(all, cells = sufficient_rna_observed)
dim(filt)
sufficiently_observed_idx <- rowSums(all) > 3
summary(sufficiently_observed_idx)
filt <- subset(filt, features = rownames(all)[sufficiently_observed_idx])
dim(filt)
## Keep cells with at least some ribosomal reads
## Note the Percent function above actually puts in a floating point
## number from 0-100, not (as I assumed from 0-1).
high_ribosomal <- WhichCells(filt, expression = percent_ribo > 5)
filt <- subset(filt, cells = high_ribosomal)
dim(filt)
low_mitochondrial <- WhichCells(filt, expression = percent_mt < 20)
filt <- subset(filt, cells = low_mitochondrial)
dim(filt)

## Now drop mitochondrial genes
mt_idx <- grepl("^mt-", rownames(filt))
summary(mt_idx)
filt <- filt[!mt_idx, ]
dim(filt)
```

# Distribution

## Before filtering

```{r distribution}
all_norm <- NormalizeData(object=all) %>%
  FindVariableFeatures() %>%
  ScaleData() %>%
  RunPCA() %>%
  FindNeighbors() %>%
  FindClusters() %>%
  RunTSNE() %>%
  RunUMAP(reduction = "pca", dims = 1:10)

DimPlot(object=all_norm, reduction="tsne")
plotted <- DimPlot(all_norm, reduction="umap", group.by="cluster_letters", label=TRUE)
plotted
```

## After filtering

```{r distribution_post}
filt_norm <- NormalizeData(object=filt) %>%
  FindVariableFeatures() %>%
  ScaleData() %>%
  RunPCA() %>%
  FindNeighbors() %>%
  FindClusters() %>%
  RunTSNE() %>%
  RunUMAP(reduction = "pca", dims = 1:10)

DimPlot(object=filt_norm, reduction="tsne")
plotted <- DimPlot(filt_norm, reduction="umap", group.by="cluster_letters", label=TRUE)
plotted
```

# Scan for likely cell cycle genes

```{r cell_cycle}
filt <- CellCycleScoring(
    object = filt,
    g2m.features = cc.genes$g2m.genes,
    s.features = cc.genes$s.genes)
VlnPlot(filt, features = c("S.Score", "G2M.Score"),
        group.by = "orig.ident",
        ncol = 4, pt.size = 0)
```

```{r saveme}
pander::pander(sessionInfo())
message(paste0("This is hpgltools commit: ", get_git_commit()))
this_save <- paste0(gsub(pattern="\\.Rmd", replace="", x=rmd_file), "-v", ver, ".rda.xz")
message(paste0("Saving to ", this_save))
tmp <- sm(saveme(filename=this_save))
```
