Examining some R.norvegicus scRNASeq data.

Table of Contents

1. Introduction

This data set is with Dr. Li's lab and was initially examined by April; my current goal is to poke around and recapitulate her observations.

I therefore downloaded new versions of the various cellranger suite tools and am now going to her home directory to see what was done and what I can observe.

export BASE=/scratch/atb/scrnaseq/rattus_norvegicus_202602

1.1. Indices

Being lazy, I copied April's reference material to /scratch/atb/scrnaseq/rattus_norvegicus_2026/reference and reorganized it slightly so that I do not get confused (e.g. genus_species_method).

Thus I now have $BASE/reference/rnorvegicus_atac and rnorvegicus_3p which were created by April for the ATAC and 'normal' libraries respectively. April has one important question: are these libraries the correct species for the animals used in Phoebe's experiments. These are specific to the mRatBN7.2 assembly: The BN/NHsdMcwi strain of the norway rat.

https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_015227675.2/

We should check in with Phoebe but I am betting this assembly is better than close enough.

Thus, let us take a moment and record April's commands just in case we need to regenerate these with a different assembly (we won't).

1.1.1. ATAC reference

  1. She wrote the rat.config with the following:

    {
      organism: "Norway Rat"
      genome: ["mRatBN7-2"]
      input_fasta: ["../Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa"]
      input_gtf: ["../Rattus_norvegicus.mRatBN7.2.113.gtf"]
      non_nuclear_contigs: ["MT"]
      localcores: "16"
    }
    

    and invoked

/opt/cellranger-atac-2.2.0/bin/cellranger-atac mkref --config=rat.config

an optional step is to filter the rnorvegicus gtf file via cellranger mkgtf; I am going to ignore this step for the moment.

1.1.2. ARC reference

We don't have an example right now because ARC requires the 3p data to run; so leave this aside for the moment. The caveat here is that we should only be using ARC/ATAC if the combined 3' and ATAC libraries do not work as well as we want.

1.1.3. 3p reference

It looks like this does not have a config file and just uses the 'normal' cellranger mkref command; no wait, April is saying this is potentially only a reference valid for normal cellranger, so let us take a moment to look for the arc.

/opt/cellranger-10.0.0/bin/cellranger mkref --ref-version="2024-A" --genome="mRatBN7-2" --fasta=Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa --genes=Rattus_norvegicus.mRatBN7.2.113.filtered.gtf

1.2. Running the ATAC counter

Given the references above, April ran the following:

/opt/cellranger-atac-2.2.0/bin/cellranger-atac count --id 1_saline --reference rattus_norvegicus/mRatBN7-2/ --fastqs ~/nas-read-only/260117_VL00136_113_AAHNVMMM5/Analysis/1/Data/fastq/ --sample 1_saline_ATAC --localmem 90 --localcores 30
/opt/cellranger-atac-2.2.0/bin/cellranger-atac count --id 2_oxycodone --reference rattus_norvegicus/mRatBN7-2/ --fastqs ~/nas-read-only/260117_VL00136_113_AAHNVMMM5/Analysis/1/Data/fastq/ --sample 2_oxycodone_ATAC --localmem 90 --localcores 30
/opt/cellranger-atac-2.2.0/bin/cellranger-atac count --id 3_saline --reference rattus_norvegicus/mRatBN7-2/ --fastqs ~/nas-read-only/260117_VL00136_113_AAHNVMMM5/Analysis/1/Data/fastq/ --sample 3_saline_ATAC --localmem 90 --localcores 30
/opt/cellranger-atac-2.2.0/bin/cellranger-atac count --id 4_oxycodone --reference rattus_norvegicus/mRatBN7-2/ --fastqs ~/nas-read-only/260117_VL00136_113_AAHNVMMM5/Analysis/1/Data/fastq/ --sample 4_oxycodone_ATAC --localmem 90 --localcores 30

Since I moved this around and installed via modules, I will be slightly different, I moved the raw files into separate directories and moved the indexes a little.

Note, I created a test sample using the first 100,000 sequences from the 1_saline sample; so let us try out arguments etc there.

Note: 10x provides different kits, this was done with the 'epi multiome kit' which does 3p+ATAC; other kits do each step separately. When using the multiome kit, one must specify –chemistry=ARC-v1 presumably if I go poking around at 10xgenomics I will figure this out…

module add cellranger_atac
ref="${BASE}/reference/rnorvegicus_atac/mRatBN7-2/"
pre="${BASE}/preprocessing"
ids="test"
chemistry="--chemistry=ARC-v1"
for id in $ids; do
    echo $id
    cmd="cellranger-atac count ${chemistry} --id ${id}_out --reference $ref --fastqs ${pre}/${id}/ --sample ${id}"
    ls -la $pre/$id
    echo "I would run $cmd"
    $cmd
done
test
total 20099
drwxrwxr-x 2 abelew hpgl        6 Feb  2 13:17 .
drwxrwxr-x 9 abelew hpgl        9 Feb  2 13:16 ..
-rw-rw-r-- 1 abelew hpgl 10256496 Feb  2 13:17 test_I1.fastq
-rw-rw-r-- 1 abelew hpgl 13456496 Feb  2 13:17 test_I2.fastq
-rw-rw-r-- 1 abelew hpgl 19256496 Feb  2 13:17 test_R1.fastq
-rw-rw-r-- 1 abelew hpgl 19256496 Feb  2 13:17 test_R2.fastq
I would run cellranger-atac count --chemistry=ARC-v1 --id test_out --reference /scratch/atb/scrnaseq/rattus_norvegicus_202602/reference/rnorvegicus_atac/mRatBN7-2/ --fastqs /scratch/atb/scrnaseq/rattus_norvegicus_202602/preprocessing/test/ --sample test
bash: cellranger-atac: command not found
module add cellranger_atac
cd $BASE
ref="${BASE}/reference/rnorvegicus_atac/mRatBN7-2/"
pre="${BASE}/preprocessing"
ids="1_saline 2_oxycodone 3_saline 4_oxycodone"
for id in $ids; do
    echo $id
    cmd="cellranger-atac count ${chemistry} --id ${id} --reference $ref --fastqs ${pre}/${id}/ --sample ${id}_ATAC"
    ls -la $pre/$id
    echo "I would run $cmd"
done
1_saline
total 4629096
drwxrwxr-x 2 abelew hpgl          6 Feb  2 13:08 .
drwxrwxr-x 9 abelew hpgl          9 Feb  2 13:16 ..
-rw-rw---- 1 abelew hpgl  741393405 Jan 18 04:50 1_saline_ATAC_S1_I1_001.fastq.gz
-rw-rw---- 1 abelew hpgl  897325116 Jan 18 04:50 1_saline_ATAC_S1_I2_001.fastq.gz
-rw-rw---- 1 abelew hpgl 1567315767 Jan 18 04:50 1_saline_ATAC_S1_R1_001.fastq.gz
-rw-rw---- 1 abelew hpgl 1533341528 Jan 18 04:50 1_saline_ATAC_S1_R2_001.fastq.gz
I would run cellranger-atac count --chemistry=ARC-v1 --id 1_saline --reference /scratch/atb/scrnaseq/rattus_norvegicus_202602/reference/rnorvegicus_atac/mRatBN7-2/ --fastqs /scratch/atb/scrnaseq/rattus_norvegicus_202602/preprocessing/1_saline/ --sample 1_saline_ATAC
2_oxycodone
total 7101971
drwxrwxr-x 2 abelew hpgl          6 Feb  2 13:08 .
drwxrwxr-x 9 abelew hpgl          9 Feb  2 13:16 ..
-rw-rw---- 1 abelew hpgl 1129819923 Jan 18 04:53 2_oxycodone_ATAC_S2_I1_001.fastq.gz
-rw-rw---- 1 abelew hpgl 1352714185 Jan 18 04:51 2_oxycodone_ATAC_S2_I2_001.fastq.gz
-rw-rw---- 1 abelew hpgl 2400197776 Jan 18 04:52 2_oxycodone_ATAC_S2_R1_001.fastq.gz
-rw-rw---- 1 abelew hpgl 2388428710 Jan 18 04:53 2_oxycodone_ATAC_S2_R2_001.fastq.gz
I would run cellranger-atac count --chemistry=ARC-v1 --id 2_oxycodone --reference /scratch/atb/scrnaseq/rattus_norvegicus_202602/reference/rnorvegicus_atac/mRatBN7-2/ --fastqs /scratch/atb/scrnaseq/rattus_norvegicus_202602/preprocessing/2_oxycodone/ --sample 2_oxycodone_ATAC
3_saline
total 7501859
drwxrwxr-x 2 abelew hpgl          6 Feb  2 13:08 .
drwxrwxr-x 9 abelew hpgl          9 Feb  2 13:16 ..
-rw-rw---- 1 abelew hpgl 1160021614 Jan 18 04:51 3_saline_ATAC_S3_I1_001.fastq.gz
-rw-rw---- 1 abelew hpgl 1425810317 Jan 18 04:53 3_saline_ATAC_S3_I2_001.fastq.gz
-rw-rw---- 1 abelew hpgl 2539689802 Jan 18 04:51 3_saline_ATAC_S3_R1_001.fastq.gz
-rw-rw---- 1 abelew hpgl 2555076313 Jan 18 04:51 3_saline_ATAC_S3_R2_001.fastq.gz
I would run cellranger-atac count --chemistry=ARC-v1 --id 3_saline --reference /scratch/atb/scrnaseq/rattus_norvegicus_202602/reference/rnorvegicus_atac/mRatBN7-2/ --fastqs /scratch/atb/scrnaseq/rattus_norvegicus_202602/preprocessing/3_saline/ --sample 3_saline_ATAC
4_oxycodone
total 7015235
drwxrwxr-x 2 abelew hpgl          6 Feb  2 13:08 .
drwxrwxr-x 9 abelew hpgl          9 Feb  2 13:16 ..
-rw-rw---- 1 abelew hpgl 1097052924 Jan 18 04:51 4_oxycodone_ATAC_S4_I1_001.fastq.gz
-rw-rw---- 1 abelew hpgl 1330497254 Jan 18 04:52 4_oxycodone_ATAC_S4_I2_001.fastq.gz
-rw-rw---- 1 abelew hpgl 2374261791 Jan 18 04:50 4_oxycodone_ATAC_S4_R1_001.fastq.gz
-rw-rw---- 1 abelew hpgl 2380564230 Jan 18 04:52 4_oxycodone_ATAC_S4_R2_001.fastq.gz
I would run cellranger-atac count --chemistry=ARC-v1 --id 4_oxycodone --reference /scratch/atb/scrnaseq/rattus_norvegicus_202602/reference/rnorvegicus_atac/mRatBN7-2/ --fastqs /scratch/atb/scrnaseq/rattus_norvegicus_202602/preprocessing/4_oxycodone/ --sample 4_oxycodone_ATAC
pwd
/z1/scratch/atb/scrnaseq/rattus_norvegicus_202602

2. Manual processing as if these were bulk samples

I am going to treat these samples in an identical fashion to how I am treating some L.donovani samples; my current understanding of this process is in /scratch/atb/atacseq/ldonovani_2021

Here is a copy/paste of the steps outlined in that paper: (1)

  1. Trim via trimomatic/trimgalore/fastp/cutadapt/etc
  2. Align against mtDNA
  3. Align against target species without mtDNA aligned reads a. Run kraken using the these reads
  4. Collect fragment sizes via picard/gatk (plot via R ATACseqQC)
  5. Mark duplicates via gatk
  6. Adjust Tn5 cut sites via alignmentSieve from deepTools (–ATACshift)
  7. Call peaks with MACS2
  8. FriP score calculation via plotEnrichment from deepTools (BPM normalized)
  9. Take fragments <100bp as nucleosome free; use fragments mapped to genes for TSS enrichment plots (Done via deepTools: computeMatrix and plotProfile)
  10. Annotate peaks via HOMER: annotatePeaks.pl
  11. Find consensus peaks across samples via multiintersect from deepTools
  12. Visual inspection via IGV

    module add cyoa/202410
    species=rnorvegicus_norway_grcr8
    

2.1. Organize reads

cd preprocessing
start=$(pwd)
samples=$(/bin/ls -d [0-9]*)
for i in $samples; do
    pushd $i
    mkdir -p unprocessed
    mv *.fastq.gz unprocessed/
    popd
done

2.2. Trim reads

The ATAC protocols I have been reading are for bulk reads and as a result do not take into account the cell barcodes; I may therefore need to use UMItools or similar in order to properly perform this step. I did poke around in the cellranger output directories and it looks to me like they did not, so I assume a default trimomatic run should work fine.

for i in ${samples}; do
    pushd ${start}/${i}
    inputs=$(/bin/ls unprocessed/*_R[1-2]_* | tr '\n' ':')
    cyoa --method trim --input ${inputs}
    popd
done

2.3. Remove mtDNA reads

There are a couple of other goals here: I would like a sense of the percentage of reads which are mitochondrial in origin. I also intend to query my bacterial kraken database and would to make sure the mitochondrial reads are not skewing the results.

for i in ${samples}; do
    pushd ${start}/${i}
    inputs=$(/bin/ls outputs/01trimomatic/*-trimmed.fastq.xz | tr '\n' ':')
    cyoa --method hisat --introns 0 --libtype mt --input ${inputs} --species rnorvegicus \
         --gff_type gene --gff_tag gene
    popd
done

2.4. Query the R.norvegicus genome

I am using the NCBI norway rat BN/NHsdMcwi assembly: GCA_036323735.1 and named it: rnorvegicus_norway_grcr8

echo "Making sure start is defined: ${start}."
echo "Making sure samples are defined: ${samples}."
for i in ${samples}; do
    pushd ${start}/${i}
    inputs=$(/bin/ls outputs/40hisat_rnorvegicus/unalcon*.fastq.xz | tr '\n' ':')
    cyoa --method hisat --introns 0 --input ${inputs} --species $species \
         --gff_type gene --gff_tag ID
    popd
done

2.5. Query unmapped reads via kraken

Note, I ran this initially while some jobs were still actively compressing the outputs from hisat2, as a result it is likely that one sample (#1 is wrong), I will rerun it later, but I am super curious to see what these unmapped reads are.

echo "Making sure start is defined: ${start}."
echo "Making sure samples are defined: ${samples}."
for i in ${samples}; do
    pushd ${start}/${i}
    inputs=$(/bin/ls outputs/40hisat_rnorvegicus_norway_grcr8/unalcon*.fastq.xz | tr '\n' ':')
    echo $inputs
    cyoa --method kraken --input ${inputs} --library bacteria
    popd
done

2.6. Deduplication

gatk may be used to pretty easily check for identical reads.

echo "Making sure start is defined: ${start}."
echo "Making sure samples are defined: ${samples}."
for i in ${samples}; do
    pushd ${start}/${i}
    input="outputs/40hisat_rnorvegicus_norway_grcr8/rnorvegicus_norway_grcr8_genome.bam"
    echo $input
    ls -ld $input
    cyoa --method gatkdedup --input ${input}
    popd
done

2.7. Peak Detection

echo "Checking variables:"
echo "$start"
echo "$samples"
echo "$species"
for i in ${samples}; do
    pushd ${start}/${i}
    input="outputs/50gatk_dedup/deduplicated.bam"
    echo $input
    cyoa --method atac --input ${input} --species ${species}
    popd
done

3. Bibliography

1.
D. g. E. Erdo{ g}an, et al., ATAC-seq in Emerging Model Organisms: Challenges and Strategies. Journal of experimental zoology part b: Molecular and developmental evolution 344, 394–414 (2025).

4. PostScript

Download the original document text.

Author: Ashton Belew

Created: 2026-02-11 Wed 11:59

Validate