---
title: "Notes on our persistence project."
author: "atb abelew@gmail.com"
bibliography: /home/trey/scratch/zotero_library/atb.bib
date: "`r Sys.Date()`"
output:
  html_document:
    code_download: true
    code_folding: show
    fig_caption: true
    fig_height: 7
    fig_width: 7
    highlight: zenburn
    keep_md: false
    mode: selfcontained
    number_sections: true
    self_contained: true
    theme: readable
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
  rmdformats::readthedown:
    code_download: true
    code_folding: show
    df_print: paged
    fig_caption: true
    fig_height: 7
    fig_width: 7
    highlight: zenburn
    width: 300
    keep_md: false
    mode: selfcontained
    toc_float: true
  BiocStyle::html_document:
    code_download: true
    code_folding: show
    fig_caption: true
    fig_height: 7
    fig_width: 7
    highlight: zenburn
    keep_md: false
    mode: selfcontained
    toc_float: true
---

<style type="text/css">
body, td {
  font-size: 16px;
}
code.r {
  font-size: 16px;
}
pre {
  font-size: 16px
}
body .main-container {
  max-width: 1600px;
}
</style>

```{r options, include=FALSE}
## Analysis helpers; prefer a local development copy of hpgltools when one is available.
library(hpgltools)
library(reticulate)
tt <- try(devtools::load_all("~/hpgltools"))
## Package-level knitr settings.
knitr::opts_knit$set(
  progress = TRUE, verbose = TRUE, width = 90)
## Default chunk options (echo is a chunk option, so it is set here).
knitr::opts_chunk$set(
  echo = TRUE, error = TRUE, fig.width = 8, fig.height = 8, fig.retina = 2,
  out.width = "100%", dev = "png",
  dev.args = list(png = list(type = "cairo-png")))
old_options <- options(digits = 4, stringsAsFactors = FALSE, knitr.duplicate.label = "allow")
ggplot2::theme_set(ggplot2::theme_bw(base_size = 12))
## Date stamp used to version saved intermediate files.
previous_file <- ""
ver <- format(Sys.Date(), "%Y%m%d")

##tmp <- sm(loadme(filename=paste0(gsub(pattern="\\.Rmd", replace="", x=previous_file), "-v", ver, ".rda.xz")))
rmd_file <- "template.Rmd"
```

# Introduction

Assuming we get to a place where we want to publish this work, this
will be an introduction to what happened here.  In the interim, it is
a place for me to take notes and lay out some ideas.

# SL-Seq

The spliced leader sequencing protocol used here follows
@cuypersMultiplexedSplicedLeaderSequencing2017 very closely.  I would
like to write down my understanding of the protocol here:

## Given: Capped mRNA with SL>5'>CDS>3'>pA

One way to imagine our canonical protist RNA:

<pre>
7mGCapnnnnnnnnnnnnnnnnnnnnnAGTTTCTGTACTTTATTGGxxxxxxxxxxxxxxxxxxxxxx5pUTRAUGnnnnnnnnnnnnnnnnnnnnnUAA3pUTRAAAAAAAAAAAAA
</pre>

Note: I did not type out the complete SL sequence here because it is
somewhat variable [@gibsonStructureSequenceVariation2000].

The full 'canonical' sequence looks like this:

<pre>
7mGCapAACTAACGCTATTATTGATACAGTTTCTGTACTATATTG
</pre>

with the caveat that the first few nucleotides at the 5' end (AACT
here) are methylated and modified as part of the cap 4 structure.
Also note that the n's at the 5' end of my SL sketch above all fall
within the variable region of the trypanosome SL.

Here is the relevant alignment from @gibsonStructureSequenceVariation2000:

![SL alignment; note the portion covered by the primer.](1-s2.0-S0166685100001936-gr2_lrg.gif)

With all that in mind, let us step through the protocol.  One aside
first: I am reasonably certain that *L. panamensis* has a substitution
at ~ position 17 in some but not all SL sequences (I think it has the
AGA like *T. brucei* but also ACA?).  Either way, the multiplexed
paper chose wisely and kept its primer clear of those shenanigans...
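A quick way to check which SL positions the 12-nt strand 2 primer (from step 3) covers is an ungapped mismatch scan against the canonical SL typed above. This is a minimal sketch: `best_match` is a hypothetical helper, and because the canonical sequence is not the *L. panamensis* one, the scan reports one mismatch at the primer's 5' base. On this sequence the primer starts at position 20, past the ~17 mentioned above.

```python
# Ungapped mismatch scan: which SL positions does the strand 2 primer cover?
# CANONICAL_SL is the sequence typed above; STRAND2 is the primer from step 3.
CANONICAL_SL = "AACTAACGCTATTATTGATACAGTTTCTGTACTATATTG"
STRAND2 = "TCAGTTTCTGTA"


def best_match(target: str, query: str) -> tuple[int, int]:
    """Return (0-based start, mismatches) of the window with the fewest mismatches."""
    best = (-1, len(query) + 1)
    for start in range(len(target) - len(query) + 1):
        window = target[start:start + len(query)]
        mismatches = sum(a != b for a, b in zip(window, query))
        if mismatches < best[1]:  # strict <, so the first best window wins ties
            best = (start, mismatches)
    return best


start, mismatches = best_match(CANONICAL_SL, STRAND2)
print(f"primer covers SL positions {start + 1}-{start + len(STRAND2)} "
      f"with {mismatches} mismatch(es)")
```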

## Step 1: 1st Strand cDNA synthesis

The first strand cDNA is primed with 5'GTATAAGACACAGNNNNNNNNN3'.  I am
reasonably certain the random 9-mer (the N's) anneals semi-randomly
along the RNA and leaves an unpaired 5' tail which happens to coincide
with the Nextera handle; I assume this will be important later.  I also
assume, without reading the next step, that an RNase (H or T1) will be
added to destroy the RNA strand of the nascent cDNA:RNA duplex.

So now we have an ssDNA which looks something like this:

<pre>
5'GTATAAGACACAGNNNNNNNNN....CAATATAGTACAGAAACTGTATCAATAATAGCGTTA3'
  ^nextera handle (trans2)  ^RC of the SL
</pre>

with the caveat that the .... is likely quite long (mean ~ 1700
nt) and those last ~ 18 nt are likely to have a few variants depending
on species.
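Writing reverse complements out by hand is error-prone, so here is a minimal sketch of the random-priming logic above, with sequences written as DNA (T for U). The toy transcript and the helper names (`revcomp`, `first_strand`) are hypothetical; `HANDLE` is the tail as typed in this step. If the N9 pairs with `rna[i:i+9]`, reverse transcriptase copies everything upstream, so the product is the handle followed by the reverse complement of `rna[:i+9]`, and its 3' end is the reverse complement of the SL. For the canonical SL that reverse complement ends in ...GCGTTAGTT; the drawing above stops three bases earlier, before the GTT that pairs with the AAC at the SL's 5' end.

```python
# Toy model of random-primed first-strand synthesis (step 1).
# Sequences are written as DNA (T for U). HANDLE is the 5' tail typed above;
# the toy transcript is hypothetical.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
HANDLE = "GTATAAGACACAG"
CANONICAL_SL = "AACTAACGCTATTATTGATACAGTTTCTGTACTATATTG"


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def first_strand(rna: str, anneal_start: int, n_len: int = 9) -> str:
    """First-strand cDNA (5'->3') when the random N-mer pairs with rna[anneal_start:anneal_start + n_len].

    The N-mer is the reverse complement of its site, and reverse transcriptase
    then copies everything upstream (toward the RNA 5' end), so the product is
    HANDLE + revcomp(rna[:anneal_start + n_len]).
    """
    return HANDLE + revcomp(rna[:anneal_start + n_len])


# Hypothetical mRNA: SL, a short 5'UTR, and a tiny ORF.
toy_rna = CANONICAL_SL + "ACGTTACA" + "ATGGCCAAAGGTTAA"
cdna = first_strand(toy_rna, anneal_start=40)
print(cdna)
print(revcomp(CANONICAL_SL))  # 3' end of any first strand that reaches the RNA 5' end
```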

## Step 2: Destroy the RNA strand and purify

We don't want a bunch of weirdo concatemers to form.

## Step 3: Second strand synthesis

Second strand cDNA synthesis.  (I probably would have just done PCR
here, which I am guessing would not have worked for reasons I cannot
remember?)  The noteworthy part: add 3 µl of 10 µM Strand 2
*Leishmania* primer.  I guessed that this primer would look like the
step 1 primer with another handle for indexing.  Nope, I found it in
the text: 5'TCAGTTTCTGTA3'

So, this will make the hybrid:

<pre>
  ...CAATATAGTACAGAAACTGTATCAATAATAGCGTTA3'
      .......ATGTCTTTGACT5'  : once again missing the variant bases.
</pre>
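The same scan applied to the hybrid drawn above: a primer pairs with its own reverse complement, so searching the first strand for `revcomp(STRAND2)` should land on the drawn position. Where a mismatch falls also matters, since the primer's 3' end is the one that has to pair for the polymerase to extend. A minimal sketch using the first-strand tail as typed in step 1; `binding_site` is a hypothetical helper. Here the single mismatch is 11 bases from the 3' end, i.e. the primer's 5'-terminal base, which should not get in the way of extension.

```python
# Where does the strand 2 primer (step 3) pair on the first strand drawn above?
COMPLEMENT = str.maketrans("ACGT", "TGCA")
FIRST_STRAND_TAIL = "CAATATAGTACAGAAACTGTATCAATAATAGCGTTA"  # 3' end of the first strand, as typed in step 1
STRAND2 = "TCAGTTTCTGTA"


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def binding_site(template: str, primer: str) -> tuple[int, list[int]]:
    """Best ungapped site for primer on template: (0-based start, mismatch offsets).

    A primer pairs with its reverse complement, so we scan the template for
    revcomp(primer). Index i of that site pairs with the primer base i positions
    from its 3' end, so the offsets count from the 3' end (0 = 3'-terminal base).
    """
    site = revcomp(primer)
    best_start, best_offsets = -1, list(range(len(site) + 1))
    for start in range(len(template) - len(site) + 1):
        window = template[start:start + len(site)]
        offsets = [i for i, (a, b) in enumerate(zip(window, site)) if a != b]
        if len(offsets) < len(best_offsets):
            best_start, best_offsets = start, offsets
    return best_start, best_offsets


start, offsets = binding_site(FIRST_STRAND_TAIL, STRAND2)
print(f"pairs at template position {start}; mismatches at 3'-offsets {offsets}")
```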

This must mean that the subsequent PCR steps add the overhangs for the
rest of the library adapters?

## Step 4: Adapter addition

I would note that the extension time is only 60 seconds; where does
this cap the dsDNA amplicon size given the HiFi HotStart mix?  (I used
to know this, but no longer do.)
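A back-of-the-envelope answer, under a stated assumption: if the mix is KAPA HiFi HotStart and I am remembering its guidance correctly (roughly 15 to 60 s of extension per kb), then a 60 s extension comfortably covers about 1 to 4 kb. `max_amplicon_kb` is a hypothetical helper that just does that division.

```python
# Back-of-the-envelope: longest amplicon a fixed extension time comfortably covers.
# ASSUMPTION: the mix is KAPA HiFi HotStart and its 15-60 s per kb guidance applies.
def max_amplicon_kb(extension_s: float, s_per_kb: float) -> float:
    """Amplicon length (kb) whose recommended extension time equals extension_s."""
    return extension_s / s_per_kb


for rate in (15, 30, 60):  # seconds per kb, fast to conservative
    print(f"{rate:>2} s/kb -> ~{max_amplicon_kb(60, rate):.1f} kb in a 60 s extension")
```

At the conservative end (60 s/kb) this would sit below the ~1700 nt mean mentioned in step 1, so the rate the manufacturer actually recommends is worth checking.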

In any event, the following shows the regions of identity and where
they will therefore anneal on the opposite strand.

<pre>
                               GTATAAGACACAG: I am the trans2 from step 1
Forward: 5'TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG3'
</pre>

<pre>
Reverse: 5'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATCAGTTTCTGTA3'
invariant SL region at 3' of the other strand :CAGTTTCTGTA: (the first time I wrote this out I used the whole SL and was confused about why they didn't match)
                                              TCAGTTTCTGTA3'

</pre>

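I typed these so that the shared sequence lines up.  Each primer's 3'
end is identical to a sequence already in the library, so it anneals
to that sequence's complement on the opposite strand: the forward
primer anneals to the second-strand copy of the step 1 handle, and the
reverse primer anneals where the strand 2 primer did, on the
first-strand copy of the SL, yay!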
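The regions of identity can also be checked mechanically by comparing the last bases of each PCR primer with the sequence it is supposed to copy. A minimal sketch using the sequences exactly as typed in these notes; `tail_mismatches` is a hypothetical helper. As typed, the reverse primer ends in an exact copy of the strand 2 primer, while the forward primer and the step 1 handle differ at one position (GAGACAG vs GACACAG). That may just be a transcription slip in these notes, so it is worth checking against the primer table in the paper.

```python
# Does each PCR primer's 3' end match the sequence it should copy?
STEP1_HANDLE = "GTATAAGACACAG"  # 5' tail of the step 1 primer, as typed above
STRAND2 = "TCAGTTTCTGTA"        # step 3 primer
FORWARD = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REVERSE = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATCAGTTTCTGTA"


def tail_mismatches(primer: str, expected_tail: str) -> list[int]:
    """0-based positions within expected_tail where the primer's 3' end differs from it."""
    tail = primer[-len(expected_tail):]
    return [i for i, (a, b) in enumerate(zip(tail, expected_tail)) if a != b]


print("forward vs step 1 handle:", tail_mismatches(FORWARD, STEP1_HANDLE))
print("reverse vs strand 2 primer:", tail_mismatches(REVERSE, STRAND2))
```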

## Step 5: Indexing

Finally, add the Nextera index primers and amplify again.  Those
primers carry yet another overhang containing the chosen index.
Depending on the chosen indexes/primers, this may be important for how
April sets up the sequencing runs.

The final amplicon should therefore look something like:

<pre>
                                                                     I feel like these SL bases might benefit from dark cycles?
                                                                                  vvvvvvvvvvvv
nextera_index_sequence_TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG...3'endofrna...5'endofrnaTACAGAAACTGATCTGTCTCTTATACACATCTCCGAGCCCACGAGAC_nextera_index_sequence
                                                                                  TACAGAAACTGA
</pre>
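A sketch that assembles the step 4 product (before indexing) from the primers and a hypothetical insert, and checks the junction drawn above: the reverse complement of the reverse primer should read TACAGAAACTGA, then T, then CTGTCTC... It also bears on the dark-cycle question, under the assumption that the read primed from the reverse-primer side starts right after the 34-nt adapter portion of that primer (GTCTC...AGAGACAG). The UTR snippets, `SL_REST` (taken from the canonical SL above), and the helper names are all hypothetical.

```python
# Sketch of the step 4 product and what the read from the SL side starts with.
# ASSUMPTION: that read begins right after the 34-nt adapter part of REVERSE.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
FORWARD = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REVERSE = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATCAGTTTCTGTA"
SL_REST = "CTATATTG"  # canonical SL bases 3' of the primer site (hypothetical choice)
ADAPTER_LEN = 34


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(sense_insert: str) -> str:
    """Top strand 5'->3': forward primer, antisense insert, revcomp(reverse primer)."""
    return FORWARD + revcomp(sense_insert) + revcomp(REVERSE)


def sl_side_read(amp: str, n: int = 25) -> str:
    """First n bases after the reverse-side adapter on the bottom (sense) strand."""
    bottom = revcomp(amp)
    return bottom[ADAPTER_LEN:ADAPTER_LEN + n]


def shared_prefix_len(seqs: list[str]) -> int:
    """How many leading cycles show the same base in every read."""
    n = 0
    while all(len(s) > n for s in seqs) and len({s[n] for s in seqs}) == 1:
        n += 1
    return n


utrs = ["ACGTTGCA", "GTCAAGTC", "TTAGCCAT", "CAGTACGA"]  # hypothetical 5'UTR starts
amps = [amplicon(SL_REST + utr + "ATGGCC") for utr in utrs]
reads = [sl_side_read(a) for a in amps]
print(revcomp(REVERSE))
print(reads[0])
print("identical leading cycles:", shared_prefix_len(reads))
```

With these made-up inserts, the first 21 cycles (the A, the 12 primer bases, and the 8 remaining SL bases) are identical in every read, which is exactly the low-diversity situation the dark-cycle question is about.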

# Bibliography
