https://tritrypdb.org/tritrypdb/app/record/gene/TcCLB.506551.10 I’d like to get the 5’ and 3’ UTR of this gene and the orthologs for CL Brener (12 genes total). Can you write for me a step by step script of how to do that? And also how to do an alignment of these regions to find similarities.
I have a little function which gathers UTRs from TriTrypDB by taking an arbitrary padding (300 nt in the run below) around every CDS, for species whose UTRs have not yet been identified. The best part about it is that I only need to type a portion of the species name, as long as that portion is unique to the species in question.
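The padding idea can be sketched in base R. This is not the function used here, only an illustration under stated assumptions: `match_species()` and `pad_cds()` are hypothetical names, the 300 nt padding and the coordinates are copied from the output in this document, and the species vector is typed in by hand rather than queried from TriTrypDB.

```r
## Hypothetical helper: pick a species from a partial, case-insensitive name.
match_species <- function(pattern, available) {
  hits <- grep(pattern, available, ignore.case = TRUE, value = TRUE)
  if (length(hits) == 0) {
    stop("No species matched: ", pattern)
  }
  if (length(hits) > 1) {
    message("Found the following hits: ", toString(hits), ", choosing the first.")
  }
  hits[1]
}

## Hypothetical helper: add a fixed padding on each side of every CDS and
## flag genes whose window would run off either end of the chromosome.
pad_cds <- function(genes, padding = 300) {
  genes$low_boundary <- genes$start - padding
  genes$high_boundary <- genes$end + padding
  genes$off_start <- genes$low_boundary < 1
  genes$off_end <- genes$high_boundary > genes$chr_length
  genes
}

## Species names as they appear in the log output (typed in by hand here).
species <- c("Trypanosoma cruzi CL Brener Esmeraldo-like",
             "Trypanosoma cruzi CL Brener Non-Esmeraldo-like")
chosen <- match_species("CL Brener", species)

## Two genes with the coordinates printed in this document.
genes <- data.frame(
  start = c(517675, 202017),
  end = c(518916, 203870),
  strand = c("+", "-"),
  chr_length = c(2036759, 850241),
  row.names = c("TcCLB.398345.10", "TcCLB.506551.10"))
padded <- pad_cds(genes)
```

For a gene on the minus strand, such as TcCLB.506551.10, the 5’ side is the window above `end` rather than below `start`, and the extracted sequence has to be reverse-complemented before it is read as a UTR.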
## Found the following hits: Trypanosoma cruzi CL Brener Esmeraldo-like, Trypanosoma cruzi CL Brener Non-Esmeraldo-like, choosing the first.
## Using: Trypanosoma cruzi CL Brener Esmeraldo-like.
## Unable to find CDSNAME, setting it to ANNOT_GENE_NAME.
## Unable to find CDSCHROM in the db, removing it.
## Unable to find CDSSTRAND in the db, removing it.
## Unable to find CDSSTART in the db, removing it.
## Unable to find CDSEND in the db, removing it.
## Extracted all gene ids.
## Attempting to select: ANNOT_GENE_NAME, GENE_TYPE, ANNOT_GENE_LOCATION_TEXT, ANNOT_GENE_NAME, ANNOT_GENE_PRODUCT, ANNOT_GENE_TYPE
## 'select()' returned 1:1 mapping between keys and columns
## Found 10 genes which are less than 300 nt. from the beginning of the chromosome.
## Found 5 genes which are less than 300 nt. from the end of the chromosome.
## gid annot_gene_name gene_type chromosome
## TcCLB.398345.10 TcCLB.398345.10 protein coding TcChr40-S
## TcCLB.401041.10 TcCLB.401041.10 protein coding TcChr14-S
## TcCLB.401473.9 TcCLB.401473.9 RPA1 protein coding TcChr25-S
## TcCLB.401569.10 TcCLB.401569.10 protein coding TcChr33-S
## TcCLB.401661.10 TcCLB.401661.10 protein coding TcChr35-S
## TcCLB.403789.9 TcCLB.403789.9 RPA1 protein coding TcChr25-S
## start end strand
## TcCLB.398345.10 517675 518916 +
## TcCLB.401041.10 586977 588116 +
## TcCLB.401473.9 173974 175273 +
## TcCLB.401569.10 203490 204662 +
## TcCLB.401661.10 1174114 1175710 +
## TcCLB.403789.9 172858 173873 +
## annot_gene_product
## TcCLB.398345.10 root hair defective 3 GTP-binding protein (RHD3), putative
## TcCLB.401041.10 retrotransposon hot spot protein (RHS, pseudogene), putative
## TcCLB.401473.9 DNA-directed RNA polymerase I largest subunit (fragment)
## TcCLB.401569.10 trans-sialidase, putative
## TcCLB.401661.10 trans-sialidase (pseudogene), putative
## TcCLB.403789.9 DNA-directed RNA polymerase I largest subunit (fragment)
## annot_gene_type length chr_length low_boundary high_boundary
## TcCLB.398345.10 protein coding 1241 2036759 517375 519216
## TcCLB.401041.10 protein coding 1139 598625 586677 588416
## TcCLB.401473.9 protein coding 1299 822374 173674 175573
## TcCLB.401569.10 protein coding 1172 1041172 203190 204962
## TcCLB.401661.10 protein coding 1596 1186946 1173814 1176010
## TcCLB.403789.9 protein coding 1015 822374 172558 174173
## fivep
## TcCLB.398345.10 TTATTTACGGAATTTTGCTCCAATTATACGGAACGTTTGCAGCGTGGTGAGCTTGTGACACATCTCACTTCTTTGCTTGAGCGCGATGTGGAAGATAAGCTGCGTGATTTTCACCAACAAACGAGGCTATACAGGGTAGACATTGTGCGGAAGACCGAGGCTGAACTTGAAGAGGAGCTCTTGAAGGTGGAGCTGAAACTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNA
## TcCLB.401041.10 AGAAAAACGTAAGGCGGAAGAAAAGCGGGAGGCAGAAGAAAGATTAAGGCGTGAGGAGGATGAAAGGCAAAGACGAGCGCAAGAAATGAAATTTACCATTTCCACTACGATCGAAGAAGTACTGTTTAAAGGAGGAGTCCGCGTCAAGGAAAAGAAGCTGAACGATTTTCTTTACGATGGATTGGACGGCAGGGGCGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTG
## TcCLB.401473.9 ATTCGGGGACTTATTCAGGATCATATTGCGGCTGGTGTCCTGTTAACATTGCGTGACAAGTTTCTTGAGCATGCCACCTTTGTTCAACTTACTTACTATGGGCTGGCACCCTACTTGCGCCAGCAACACGAAATCACACTTTCTGAGCTTATTCCTCTGCCGGCGATCTTATGGCCCCGGCCACTCTGGACAGGGAAACANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNT
## TcCLB.401569.10 TGCTGTGTGGAGCAGGAAGAATGCACCGGATACTACACATGGGGTGAGTTTTTTCTTGGCCGTGGGGATCGCGTGGCGCCGTTTGCCGATTTTTGTTTTTACGTTCTCATCTCCACATTTGCTGCGATGATAGCATCTTTTTTTTGTAAGGTGTATGCACCGTATGCAGCTGGTGGCGGCATCAATGAGGTGAAGACAATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNA
## TcCLB.401661.10 CGGCGGTGTTCCGCGACGCTGTTGGCGTGTTGGTTGTTGGGGGCGTGGCGCTGTCGTCGCGTGGTGCGCTGTACGTGGACGGGCTGTTGGTGCAGACGGCGCTGGGGCTGTGCGTGTCGGTGGAGGGCGGTGTTGCGGCCAGCGGCGGCTCCGTGGTGGCGTTTTTTGACAGAGACTTCCTGCTGTGCAAGCACGCGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGC
## TcCLB.403789.9 CGTCATACCCGACGTCGTGGACGGTGCGCTGGAGCAGCAAACGCTGCCTTCGTGGCTGCCACAGTTCGATTCTGTGAACTTCACACGAAATGCCGATGACGCGACGAGTGGCGAACTTCTTTTTCAGGGCGCCAACGCGACGATGCGTCATGTCTTGTCGTTTCTTTCTCTTTTTACTTTGGGGACGCGGGCAGGATTANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCC
## threep
## TcCLB.398345.10 ATTTCCTGAATTCATGTAATGGATTATTATTACACATGTACCTGTATGCATCCATTTGTGAGAGTTTGGAAGAAAAAGAAAGGACGCAATGAAAAGTTTTGGCTTGTACTTTGAAGTACCCCAATAATTCTACGCGCACAGGTTGTGCTGCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTCTGTTTTTGTCCTTGATTGTTGGTATTTTATTCGGCGTTGTTTTTCTG
## TcCLB.401041.10 CANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
## TcCLB.401473.9 TNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTCAGGGCGCCAACGCGACGATGCGTCATGTCTTGTCGTTTCTTTCTCTTTTTACTTTGGGGACGCGGGCGATTAAATTGCGGCAGGCCCGCTCCACAGACATTCGTGACATGGCGAACTGGTTTGGTGTCGAGTCGGCATACCGCACCCTTTACGACGAGCTGTCCAAATTATTCAAGCGGTACTCCGTTGACCATCGC
## TcCLB.401569.10 ATGTAGTGAGAGAGTCTCCTGACAAATGTAGATAAATTCATAATTGTGCTGTGAACCGTTTGGGTAAATGTGTGTGTGCGCTCTCATAACAAGGAAATGATTTCCAGTAATGTTTTTGTTTTTTGTTCTCGAACTTTTTGAACAAATCTGCGGACAGACGGTGATGAGTAATTTGAATTTGTTTTTCAGCGTGTTTTTGTCACTGACCCTTTGTTTAAGTGGAGACCGCGTTGGAATGCGGTGAGGGCATTTCTCTGTTTTGTTTTTCCCCTTTTTTTTTTTCCTTTGTGTTTCTTCAATT
## TcCLB.401661.10 TTTTTTTTTTTGAGAGTGTGCACAAAGAGCCGTCCACCGCCAACACGCTCGCCGGCGACAAACAACACAACGACCCTGAAGGGGAAACGCATGCCTGCACTGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTGCCGCCTGCGAGGGCCGGCGGCGGTGGTGGCGATACTGCGGAGGGCTGCGTGAGTGGCGTGACGCTGACGGAGTCGGTGACGGTTGGCGGCCGGCGG
## TcCLB.403789.9 ANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTTATACCGGGCCGTATGATGGTGCCATTTCCCCGCAACCATTTGCTCATGATGACCGCCTCAGGGGCAAAGGGAAGTAACGCTAATGCGACGCAGATGGCACTGGGACTCGGTCAGCAGCTTTTTGATGGACGACGCGTGAAACGAATGAATTCGGGAAAGACGCTGCCTGCCTTTTTTGCTCATGAACGCCGTGCCCG
## cds
## TcCLB.398345.10 ATGTCCCACTTGAAGAGTGTGGAAAGAGCGTCTTCCAATGATGTTTACATAACTGACAATGAAACGTGTAATGTTCTTGTGCATAGCTTTTGGAAGAGACTGTGCCGTGCTCTTCAGGCGGAGATAGAATTGCTTTATTGTGACTATAGCCAGCAGCATCAACGTCAACGGGAATCACAAACTTTAAATTTATATGATCGTTATGCTTCTTTGGTTGCAGAGGATCCGGCCTTGCAGGAGGCGATAGCACACGTTGTGTTGGATGCGGTATTTCAGAAGGTCAGTCGCCGTTTTGCCTCAATGGCTGAGAACGCGGCGGAAACAATTCATCAGGCCTTTGAGGGTGTTCTCAACCGCAACCAGGACGGCACAGTCCGTTTCTTTCATACAACAAAGGCACTACAACGCATTGAACCTCAGGCGCGCCAAGCTGGGCTTGTTCTCTTGGGCTGCCTGTTGTACTATCGCGTAAAGGTCGTTGCGGATCGGGTGGTTTACAAGCTAGAGGATACTGATGGACTCAGCCGTGCTGCTGTTCACCTCCTTGGCGAGCGTCGCAGGCTGATTGTGCGAGAAAACAGCGAGGAACAAAAATTTTTTCTTCACTATGCCACCATTTCGGAGGCTCCGCGGTACCCGATAGGTGCGCCTGTGGTGGAGACCGATTCTGGAGACACATCAGACAACGTTGTAGATAGGGACTGTGTGTTACTCAGTCAGCAGGCAGTGCAGCGGGCATTTGACCTGTATACACAAAAATGTGAATTCACCATGCAACTGCAACTTCGCTCCATTGAAGGCGAGAAACAGAATTTGCCTGCCTGGGTGCTGCCGGTGTTGTTGCTGTTGGGATGGAATGAAATATGGTATGTCCTTTCGTCCCCAGTTCTTTTTGTAGTTGTTGTTATTATCGCTGCGGTGTTTTTAAGGGGCTTTTTGTTGACTCAATGGGCAATATTTGAGGAGACAGGGCCCACCTGTGTCGTGGTGGGTGTTCGCGTCGTCGTGCGGCAAATTCGGAATATATACAAGGCCCTTGTTCCAATGATACCGGACGATGTTAAGAGTAACGTGGCACGGCACCGTGACCCAGGGAGTTTCTCTGATGTGACTGCGTCTGCTGTGGGAACATCATGGCCTTATGCTGCTGCCGAACCGACTGTGTTGCCGCCCTCTACAACGTCCGCCACTCTCACGCGGCGATTAAAGAAGGAAGAGGAGGTACCGACCCAGAAAGAATGA
## TcCLB.401041.10 GCGTACGTTATTGGGAGTCAATCATTCCTGTTGGACAGACCCACCAAAACCGTATCAACATACAGGGATAACCCCAGGATTGAGGATGTTGTAAACATTTTTTTTTTCCGTGGGGTTAAAGGGTATTGTATCTACGATGCGACATTGGCATGTCGTCAACCGTCTGCTGGTTTGCCTTGCAAGGGATGGGGCATGATTGTGGTGACACCACCAGACAAAAACGAATATGAACGGTGGACAAAAAAAATGGACGCTACTGCAATCGTAACGAATTGTCCCGAAGAAAACGATGTGAGGGCAATGTGCATTTGGATGAAGCGCAATCGACCCCTGCAGGAGCAAGCGGAATACTGGAAGGAGGTGAGGGGTCGCATGAATAACGTGGGACCAATTCTCCGCTCCATCTTTGATAAACAGGCATATGATGACCGCATTAAAGCGTGTCAGCAAGCCGTGGATGGGATGAACGCTTCGGAATTAAAGCGTAATTTGGGTATTGGCTGTTGTTATTCGTCCAATGACAATGACTTGTCTTGAAAGCTTCTGAAGGTTGTCCGAGTACGACGAGGAAACAACATTGAATCGCCTCTGAATTTGCTGGTATCTCCCCACCTTGAACGTGAAACTTTGTCCAGGTTGGAGAATGAAATGAAGCAGTCCGATTTTATTTTTTTTGTTTTGAGGTTCTGGGATTATGTCCCACCATATCTTATTGAAAAGTATGCCGTATCCGCATTTTTGAATGAGGATTTCCTGCGTGCGATAAGAATTAAAATCAAGGAACTGAGGCCACCAGGACGACGTGAGCCACACAGCTGTGCGCTGAAAGAGCACTCAGACACGAGCTTCACCAGAAAAGAGGTTCTACCGCCACCGGAACGTCTTTCCAATCCGGTTGCTATGGACCACTGGGTGCTGTATGAACCGAAGGTCCAACACTTTCCGCTGGTGGACGGCTTTTTCTTTGTGGACTCAAATCCAATGACGCTGGTTGGGCTGCGGATGACTACGGCGGGTGAGCACCGCACCACAACCAGCACTGTGAGGCAGTTCACTGAGTGCCTGGCGGCATATTTTAATGGTTGGGAGGAGTTATCCCGAGACATGTCGTGGGAGATTATTAATGTGCAGCACGCAGAC
## TcCLB.401473.9 TTTATACCGGGCCGTATGATGGTGCCATTTCCCCGCAACCATTTGCTCATGATGACCGCCTCAGGGGCAAAGGGAAGTAACGCTAATGCGACGCAGATGGCACTGGGACTCGGTCAGCAGCTTTTTGATGGACGACGCGTGAAACGAATGAATTCGGGAAAGACGCTGCCTGCCTTTTTTGCTCATGAACGCCGTGCCCGTTCTTTTGGCTACGCGATTGGACGCTTCACATCCGGTATCCGGCCACCAGAGTACACGATCCATGCCATGGCCGGTCGCGATGGTCTCATCGACACAGCTGTCAAAACTTCCCGCTCTGGCCATTTGCAGCGTTGCCTTATTAAAGGCCTTGAGAGTCTTGTGGTGCATTGGGACCACTCTGTTCGTGACTCAAACGGCAGCATCGTGCAGTTTACATACGGTGGCGATGGTCTTGACCCGTGCAAGGCCTCAACCCTTACGTCGTGGGAAACGTCGAAGGAAAACCTCGTGGATTTCGGAAAACGATTTGGAGTGGACACGGGTGAGGCGACCAGCGAGACGCGACGGCCAGAAAACTGGGAACAAGGAGTGAGGAACAGCAGCGTGAAGCGCGGTAAACGGCCACGCACAAGCATGAATAACGATAAGAAGAATAATACTAATAAAAATAATGACAACGACGACGAAGAGGAGGATGGCGACGATGAAAATAAAAGGAGCAGTAATTGTAATCACAATGAGAATGCACGGCAGCGGCACAAAGAGCAACAGTTACGTGAGAATCCACTCCCGCGGCACATGCATGACGGCCTTGAGGACTATTTACGCACCAAAGCAACATTTCCACTCTTCCAGCGGGTCTCACAGTTGGCACGCTGGAAGGCGCAGGGACAAGTGCAGGAGAAACTTGCAGAGAAGCGTAGAGAAAGTATTGCATATTATCGTGATGTTCTCTCGGAACTTGCCACAAGCCGACGTGTAAAGGCTTTCTGTGACCCCGGGGAACCTGTTGGTCTTCTTGCGGCACAGGCCGCCGGAGAGCCGTCTACGCAAATGACACTTAACACATTCCACAGTGCCGGTTCCACGGTGACCCACGTGACGGAAGGTATCCCACGTCTCCGTGAGCTGCTTATTCACGCCTCTGTTCAGAAAGCTGCAGTTATTGTGCCCGTGGAGAAGGCGACGCCGGTGGATGAATCAGCCATTGCACGCATACTACATGCGGGTGTGGCGATACGGCTGCGGGATTGTCTTGCGCGGGGCGGGACAAGTGGAAAAGGATATCATTACCACGTGGCACGACGTAAGGATGCGT
## TcCLB.401569.10 AAGAGGTACCACGTGGTTCTTACGATGGCGAATAAAATTGGCTCCGTGTACATTGATGGAGAACCTCTGGAGGGTTCAGGGCAGACCGTTGTGCCAGACGAGAGGACGCCTGACATCTCCCACTTCTACGTTGGCGGGTATAAAAGGAGTGATATGCCAACCATAAGCCACGTGACGGTGAATAATGTTCTTCTTTACAACCGTCAGCTGAATGCCGAGGAGATCAGGACCTTGTTCTTGAGCCAGGACCTGATTGGCACGGAAGCACACATGGGCAGCAGCAGCGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCGACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAGCTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCGACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCGACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAACGGTGCCCACAGTACGCCCTCAGCTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCGACTCCCGCTGACAACGGTGCCCACAGTACGCCCTCAACTCCCGGTGACAGCAGTGCCCACAGTACACCCTCAACTCCCGCTGACAACGGTGCCCACAGTACGCCCTCAGCTCCCGGTGACAACGGTGCCCACAGTACGCCCTTGACTCCCGGTGACAACGGTGCCCACAGTACGCCCTCAGCTCCCGGTGACAACGGTGCCCACAGTACGCCCTTGACTCCCGGTGACAACGGTGCCCACAGTACGCCCTCAACTCCCGGTGACAACGGTGCCCACAGTACGCCCTCAGCTCCCGCTGACAACGGTGCCCACAGTACGCCCTCGACTCCCGCTGGCCACGGTGCCAATGGTACGGTTTTGATTTTGCACGATGGCGCTGCATTTTCGGCCTTTTCGGGCGGAGGGCTTCTTTTGTGTGCGGGTGCTTTGCTGCTGCACGTGTTCGTTATGGCAGTTTTTTTCTGA
## TcCLB.401661.10 CAGGGTGGCTTTGTCAGCGCGACGATCGATGGACAGAACGTCGTCCATTTCAGCCAGCCGGTGTATTCCTGGAAGGAGGGAGAAGAAGCGGGTCGAGTAGACTTGCGGCTGACGGACATGCAGCGAATTTATGATGTTGGGCCGGTATCCGCTGAAAATGAGAAGTTTGCCGCCAGCACTCTGCTGTACGCCACAGACGAAGTGCGATCATCGTTGGTGGAGAAAGAGTGGATAAAACTGTACTGCTCGCACGAGGTCGCTGCTGCGGATGACGAATGCAACATTGCTTTTGTGGACTTGACGGAGAAGTTGAAGGGCGTGAAGAGGGTGTTGGTTGCCTGGAAGGAGAAGGACGCGTAGGTTGCGAATGAATACCGCTGCGTGGGTGAAAAGAGCCAGAAGCGCCGTGACTGTAATGGTTCTGCCCCCACTGAAGGGCTGTTTGGCTTTTTATCCAACACGTCAACTGACAGCACGTGGGCCGACGAGTACCTCTGCGTGAATGCAGCAGTGAACAATGGGAGGTCAGGGGTTGATGGAGGGATGACGTTCAAAGGGTCTGGAGCGGGGAGTTTAGTGGCCTGTTGGCAAGCTGGGGCAGAATGTGCCGTACTACTTCGCAAACAACAAGTTCGGCCTTTTTTGGCGACGGTGACCATCCATGAGGAGCCGCATAATGGCCCTGTTTTTTTGGTGGGTGTGGGGATGAATGACACTGACAGCACCGTGCTTTTGGGCTGTTCTACACGAGTGGAAGGAAGTGGGAGGCCACCTTCAACGGTGAGACTCAGAATTTGTCAGGGGATCCCGACTTAGTGCAGGGCAAAACACATCAGTTGGCACTGCAATAGGATGATGCGGGGCTGATTGTGTACGTGGATGGATCGACGATATGCGACGGAGAACTGGATTATGAGGAGCATGAGGACTATGAGAGCTTTTCCAAGTGCTAATAACGCTGTTGAGGCCCCCCATCGTTTCACACTTCTGCGTTGGCGGCGGCGGCAAGAGTGCTCGGTAACGCTCATGTGACGGTGAGCAACGTCCTTTTGCACAACCGCGTGTTGAAGGGTGACGAGCTCCAAGCGCTAATGAAGACGAAGCCGGATGCTTCAGAGGCGAGGGTGCCGGCCCCGAAAGGTGCGCCTCACAACAATCATGCGAGTGAAACTTTCCCTCAATCCGCCAGTGGACTTGTCGTCGTGGATGAAGCGCGGCAGGAAGACACATCAGCACCACAACGTCAACACTCACCAGCGCAACCATCAGGAAATAGGAAGGGCTCAGCAGTTCCCAGGCAAACATCTTCTTCTGATGCCATTGGCCCGTCCACCTCAGCTGATACGGGGAAATGGAAGAAGAGACACCCAGCAGTGGTGTATTGGCGCCTGCATTGTCTTCGACACCGAGCGTGGTCAGTCGTCAGGAAGTACTTGAAAGCAAGATACCTGTCAGTGGGGGTCGCCCTGAGGGTGGCCGGGAGCACTTGCCCTCCAACGCGGCAGCATTGATGATGGGGCAGGCGGGCAAGGCGAGTGAAGGCTCTTCGCATAACGGATACACCGACGGTTGGCCCCAGCGCAGCATTTCACGTGAT
## TcCLB.403789.9 CCAGATCCACAGACACGTGCTCTCTCTAGTGTCCTTGGATTTGTGGAGCAAATTGAGTGTTACCAGGTACTGCAGAACAACTCCACACCGGACCGAAACCTCCTCTCCACCGCGCAAGAAATTGCAAATGAGATCAACCTACGTAACCTGCAGGTGAAGGTGAATGAGGTGTACCAAGAAATCATGGGGACATTTGCTAAGAAGGAAGGTTTGTTCCGTATGAATATGATGGGGAAGCGTGTGAATCAGGCCTGTCGCTCCGTCATCTCCCCGGATTTAATGGTGGAGCCGAACGAAGTCTTGCTGCCCCGACCATTTGCCCGAAATTTGTCTTTTCCTGAACAAGTGACATTTTACGCCTCTGCCCGCATGAATCTTTTGAAGCGTTGCGTCATCAACGGGCCGCGTCGCTACCCCGGTGCCACTCATCTTGAGATTCGACAAACGAATGGAGAAATTCGATTTATTGAGCTTGACGTGCCTGAGCAAACACGGCGGCAACACGCTGCCAAGTACTTTGCCATGGCGCAAAGTGGCGTCACACTGATTGTGTACCGTCATATCTTGGATGGCGACCGGGTGGTGTTTAACCGTCAACCGACATTGCACAAGCCAAGTATGATGGGGTACCAAGTCAAGGTGCTTTCTGGACACAAGACAATTCGCTTTCATTACGTGAATGGCAACTCCTTTAATGCCGATTTTGACGGCGATGAGATGAATATTCACGTCCCGCAGAGCCTGGAGGCGAAGGTGGAATTAGACGTCCTGATGGATGCCAATCTGAACTACCTCGTCCCCACCTCCGGGAAACCGATTCGGGGACTTATTCAGGATCATATTGCGGCTGGTGTCCTGTTAACATTGCGTGACAAGTTTCTTGAGCATGCCACCTTTGTTCAACTTACTTACTATGGGCTGGCACCCTACTTGCGCCAGCAACACGAAATCACACTTTCTGAGCTTATTCCTCTGCCGGCGATCTTATGGCCCCGGCCACTCTGGACAGGGAAACA
## all
## TcCLB.398345.10 TTATTTACGGAATTTTGCTCCAATTATACGGAACGTTTGCAGCGTGGTGAGCTTGTGACACATCTCACTTCTTTGCTTGAGCGCGATGTGGAAGATAAGCTGCGTGATTTTCACCAACAAACGAGGCTATACAGGGTAGACATTGTGCGGAAGACCGAGGCTGAACTTGAAGAGGAGCTCTTGAAGGTGGAGCTGAAACTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGTCCCACTTGAAGAGTGTGGAAAGAGCGTCTTCCAATGATGTTTACATAACTGACAATGAAACGTGTAATGTTCTTGTGCATAGCTTTTGGAAGAGACTGTGCCGTGCTCTTCAGGCGGAGATAGAATTGCTTTATTGTGACTATAGCCAGCAGCATCAACGTCAACGGGAATCACAAACTTTAAATTTATATGATCGTTATGCTTCTTTGGTTGCAGAGGATCCGGCCTTGCAGGAGGCGATAGCACACGTTGTGTTGGATGCGGTATTTCAGAAGGTCAGTCGCCGTTTTGCCTCAATGGCTGAGAACGCGGCGGAAACAATTCATCAGGCCTTTGAGGGTGTTCTCAACCGCAACCAGGACGGCACAGTCCGTTTCTTTCATACAACAAAGGCACTACAACGCATTGAACCTCAGGCGCGCCAAGCTGGGCTTGTTCTCTTGGGCTGCCTGTTGTACTATCGCGTAAAGGTCGTTGCGGATCGGGTGGTTTACAAGCTAGAGGATACTGATGGACTCAGCCGTGCTGCTGTTCACCTCCTTGGCGAGCGTCGCAGGCTGATTGTGCGAGAAAACAGCGAGGAACAAAAATTTTTTCTTCACTATGCCACCATTTCGGAGGCTCCGCGGTACCCGATAGGTGCGCCTGTGGTGGAGACCGATTCTGGAGACACATCAGACAACGTTGTAGATAGGGACTGTGTGTTACTCAGTCAGCAGGCAGTGCAGCGGGCATTTGACCTGTATACACAAAAATGTGAATTCACCATGCAACTGCAACTTCGCTCCATTGAAGGCGAGAAACAGAATTTGCCTGCCTGGGTGCTGCCGGTGTTGTTGCTGTTGGGATGGAATGAAATATGGTATGTCCTTTCGTCCCCAGTTCTTTTTGTAGTTGTTGTTATTATCGCTGCGGTGTTTTTAAGGGGCTTTTTGTTGACTCAATGGGCAATATTTGAGGAGACAGGGCCCACCTGTGTCGTGGTGGGTGTTCGCGTCGTCGTGCGGCAAATTCGGAATATATACAAGGCCCTTGTTCCAATGATACCGGACGATGTTAAGAGTAACGTGGCACGGCACCGTGACCCAGGGAGTTTCTCTGATGTGACTGCGTCTGCTGTGGGAACATCATGGCCTTATGCTGCTGCCGAACCGACTGTGTTGCCGCCCTCTACAACGTCCGCCACTCTCACGCGGCGATTAAAGAAGGAAGAGGAGGTACCGACCCAGAAAGAATGATTTCCTGAATTCATGTAATGGATTATTATTACACATGTACCTGTATGCATCCATTTGTGAGAGTTTGGAAGAAAAAGAAAGGACGCAATGAAAAGTTTTGGCTTGTACTTTGAAGTACCCCAATAATTCTACGCGCACAGGTTGTGCTGCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTCTGTTTTTGTCCTTGATTGTTGGTATTTTATTCGGCGTTGTTTTTCTG
## TcCLB.401041.10 AGAAAAACGTAAGGCGGAAGAAAAGCGGGAGGCAGAAGAAAGATTAAGGCGTGAGGAGGATGAAAGGCAAAGACGAGCGCAAGAAATGAAATTTACCATTTCCACTACGATCGAAGAAGTACTGTTTAAAGGAGGAGTCCGCGTCAAGGAAAAGAAGCTGAACGATTTTCTTTACGATGGATTGGACGGCAGGGGCGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTGCGTACGTTATTGGGAGTCAATCATTCCTGTTGGACAGACCCACCAAAACCGTATCAACATACAGGGATAACCCCAGGATTGAGGATGTTGTAAACATTTTTTTTTTCCGTGGGGTTAAAGGGTATTGTATCTACGATGCGACATTGGCATGTCGTCAACCGTCTGCTGGTTTGCCTTGCAAGGGATGGGGCATGATTGTGGTGACACCACCAGACAAAAACGAATATGAACGGTGGACAAAAAAAATGGACGCTACTGCAATCGTAACGAATTGTCCCGAAGAAAACGATGTGAGGGCAATGTGCATTTGGATGAAGCGCAATCGACCCCTGCAGGAGCAAGCGGAATACTGGAAGGAGGTGAGGGGTCGCATGAATAACGTGGGACCAATTCTCCGCTCCATCTTTGATAAACAGGCATATGATGACCGCATTAAAGCGTGTCAGCAAGCCGTGGATGGGATGAACGCTTCGGAATTAAAGCGTAATTTGGGTATTGGCTGTTGTTATTCGTCCAATGACAATGACTTGTCTTGAAAGCTTCTGAAGGTTGTCCGAGTACGACGAGGAAACAACATTGAATCGCCTCTGAATTTGCTGGTATCTCCCCACCTTGAACGTGAAACTTTGTCCAGGTTGGAGAATGAAATGAAGCAGTCCGATTTTATTTTTTTTGTTTTGAGGTTCTGGGATTATGTCCCACCATATCTTATTGAAAAGTATGCCGTATCCGCATTTTTGAATGAGGATTTCCTGCGTGCGATAAGAATTAAAATCAAGGAACTGAGGCCACCAGGACGACGTGAGCCACACAGCTGTGCGCTGAAAGAGCACTCAGACACGAGCTTCACCAGAAAAGAGGTTCTACCGCCACCGGAACGTCTTTCCAATCCGGTTGCTATGGACCACTGGGTGCTGTATGAACCGAAGGTCCAACACTTTCCGCTGGTGGACGGCTTTTTCTTTGTGGACTCAAATCCAATGACGCTGGTTGGGCTGCGGATGACTACGGCGGGTGAGCACCGCACCACAACCAGCACTGTGAGGCAGTTCACTGAGTGCCTGGCGGCATATTTTAATGGTTGGGAGGAGTTATCCCGAGACATGTCGTGGGAGATTATTAATGTGCAGCACGCAGACANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
## TcCLB.401473.9 ATTCGGGGACTTATTCAGGATCATATTGCGGCTGGTGTCCTGTTAACATTGCGTGACAAGTTTCTTGAGCATGCCACCTTTGTTCAACTTACTTACTATGGGCTGGCACCCTACTTGCGCCAGCAACACGAAATCACACTTTCTGAGCTTATTCCTCTGCCGGCGATCTTATGGCCCCGGCCACTCTGGACAGGGAAACANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTTATACCGGGCCGTATGATGGTGCCATTTCCCCGCAACCATTTGCTCATGATGACCGCCTCAGGGGCAAAGGGAAGTAACGCTAATGCGACGCAGATGGCACTGGGACTCGGTCAGCAGCTTTTTGATGGACGACGCGTGAAACGAATGAATTCGGGAAAGACGCTGCCTGCCTTTTTTGCTCATGAACGCCGTGCCCGTTCTTTTGGCTACGCGATTGGACGCTTCACATCCGGTATCCGGCCACCAGAGTACACGATCCATGCCATGGCCGGTCGCGATGGTCTCATCGACACAGCTGTCAAAACTTCCCGCTCTGGCCATTTGCAGCGTTGCCTTATTAAAGGCCTTGAGAGTCTTGTGGTGCATTGGGACCACTCTGTTCGTGACTCAAACGGCAGCATCGTGCAGTTTACATACGGTGGCGATGGTCTTGACCCGTGCAAGGCCTCAACCCTTACGTCGTGGGAAACGTCGAAGGAAAACCTCGTGGATTTCGGAAAACGATTTGGAGTGGACACGGGTGAGGCGACCAGCGAGACGCGACGGCCAGAAAACTGGGAACAAGGAGTGAGGAACAGCAGCGTGAAGCGCGGTAAACGGCCACGCACAAGCATGAATAACGATAAGAAGAATAATACTAATAAAAATAATGACAACGACGACGAAGAGGAGGATGGCGACGATGAAAATAAAAGGAGCAGTAATTGTAATCACAATGAGAATGCACGGCAGCGGCACAAAGAGCAACAGTTACGTGAGAATCCACTCCCGCGGCACATGCATGACGGCCTTGAGGACTATTTACGCACCAAAGCAACATTTCCACTCTTCCAGCGGGTCTCACAGTTGGCACGCTGGAAGGCGCAGGGACAAGTGCAGGAGAAACTTGCAGAGAAGCGTAGAGAAAGTATTGCATATTATCGTGATGTTCTCTCGGAACTTGCCACAAGCCGACGTGTAAAGGCTTTCTGTGACCCCGGGGAACCTGTTGGTCTTCTTGCGGCACAGGCCGCCGGAGAGCCGTCTACGCAAATGACACTTAACACATTCCACAGTGCCGGTTCCACGGTGACCCACGTGACGGAAGGTATCCCACGTCTCCGTGAGCTGCTTATTCACGCCTCTGTTCAGAAAGCTGCAGTTATTGTGCCCGTGGAGAAGGCGACGCCGGTGGATGAATCAGCCATTGCACGCATACTACATGCGGGTGTGGCGATACGGCTGCGGGATTGTCTTGCGCGGGGCGGGACAAGTGGAAAAGGATATCATTACCACGTGGCACGACGTAAGGATGCGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTCAGGGCGCCAACGCGACGATGCGTCATGTCTTGTCGTTTCTTTCTCTTTTTACTTTGGGGACGCGGGCGATTAAATTGCGGCAGGCCCGCTCCACAGACATTCGTGACATGGCGAACTGGTTTGGTGTCGAGTCGGCATACCGCACCCTTTACGACGAGCTGTCCAAATTATTCAAGCGGTACTCCGTTGACCATCGC
## TcCLB.401569.10 TGCTGTGTGGAGCAGGAAGAATGCACCGGATACTACACATGGGGTGAGTTTTTTCTTGGCCGTGGGGATCGCGTGGCGCCGTTTGCCGATTTTTGTTTTTACGTTCTCATCTCCACATTTGCTGCGATGATAGCATCTTTTTTTTGTAAGGTGTATGCACCGTATGCAGCTGGTGGCGGCATCAATGAGGTGAAGACAATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAAGAGGTACCACGTGGTTCTTACGATGGCGAATAAAATTGGCTCCGTGTACATTGATGGAGAACCTCTGGAGGGTTCAGGGCAGACCGTTGTGCCAGACGAGAGGACGCCTGACATCTCCCACTTCTACGTTGGCGGGTATAAAAGGAGTGATATGCCAACCATAAGCCACGTGACGGTGAATAATGTTCTTCTTTACAACCGTCAGCTGAATGCCGAGGAGATCAGGACCTTGTTCTTGAGCCAGGACCTGATTGGCACGGAAGCACACATGGGCAGCAGCAGCGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCGACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAGCTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCGACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCGACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAACGGTGCCCACAGTACGCCCTCAGCTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCGACTCCCGCTGACAACGGTGCCCACAGTACGCCCTCAACTCCCGGTGACAGCAGTGCCCACAGTACACCCTCAACTCCCGCTGACAACGGTGCCCACAGTACGCCCTCAGCTCCCGGTGACAACGGTGCCCACAGTACGCCCTTGACTCCCGGTGACAACGGTGCCCACAGTACGCCCTCAGCTCCCGGTGACAACGGTGCCCACAGTACGCCCTTGACTCCCGGTGACAACGGTGCCCACAGTACGCCCTCAACTCCCGGTGACAACGGTGCCCACAGTACGCCCTCAGCTCCCGCTGACAACGGTGCCCACAGTACGCCCTCGACTCCCGCTGGCCACGGTGCCAATGGTACGGTTTTGATTTTGCACGATGGCGCTGCATTTTCGGCCTTTTCGGGCGGAGGGCTTCTTTTGTGTGCGGGTGCTTTGCTGCTGCACGTGTTCGTTATGGCAGTTTTTTTCTGATGTAGTGAGAGAGTCTCCTGACAAATGTAGATAAATTCATAATTGTGCTGTGAACCGTTTGGGTAAATGTGTGTGTGCGCTCTCATAACAAGGAAATGATTTCCAGTAATGTTTTTGTTTTTTGTTCTCGAACTTTTTGAACAAATCTGCGGACAGACGGTGATGAGTAATTTGAATTTGTTTTTCAGCGTGTTTTTGTCACTGACCCTTTGTTTAAGTGGAGACCGCGTTGGAATGCGGTGAGGGCATTTCTCTGTTTTGTTTTTCCCCTTTTTTTTTTTCCTTTGTGTTTCTTCAATT
## TcCLB.401661.10 CGGCGGTGTTCCGCGACGCTGTTGGCGTGTTGGTTGTTGGGGGCGTGGCGCTGTCGTCGCGTGGTGCGCTGTACGTGGACGGGCTGTTGGTGCAGACGGCGCTGGGGCTGTGCGTGTCGGTGGAGGGCGGTGTTGCGGCCAGCGGCGGCTCCGTGGTGGCGTTTTTTGACAGAGACTTCCTGCTGTGCAAGCACGCGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGCAGGGTGGCTTTGTCAGCGCGACGATCGATGGACAGAACGTCGTCCATTTCAGCCAGCCGGTGTATTCCTGGAAGGAGGGAGAAGAAGCGGGTCGAGTAGACTTGCGGCTGACGGACATGCAGCGAATTTATGATGTTGGGCCGGTATCCGCTGAAAATGAGAAGTTTGCCGCCAGCACTCTGCTGTACGCCACAGACGAAGTGCGATCATCGTTGGTGGAGAAAGAGTGGATAAAACTGTACTGCTCGCACGAGGTCGCTGCTGCGGATGACGAATGCAACATTGCTTTTGTGGACTTGACGGAGAAGTTGAAGGGCGTGAAGAGGGTGTTGGTTGCCTGGAAGGAGAAGGACGCGTAGGTTGCGAATGAATACCGCTGCGTGGGTGAAAAGAGCCAGAAGCGCCGTGACTGTAATGGTTCTGCCCCCACTGAAGGGCTGTTTGGCTTTTTATCCAACACGTCAACTGACAGCACGTGGGCCGACGAGTACCTCTGCGTGAATGCAGCAGTGAACAATGGGAGGTCAGGGGTTGATGGAGGGATGACGTTCAAAGGGTCTGGAGCGGGGAGTTTAGTGGCCTGTTGGCAAGCTGGGGCAGAATGTGCCGTACTACTTCGCAAACAACAAGTTCGGCCTTTTTTGGCGACGGTGACCATCCATGAGGAGCCGCATAATGGCCCTGTTTTTTTGGTGGGTGTGGGGATGAATGACACTGACAGCACCGTGCTTTTGGGCTGTTCTACACGAGTGGAAGGAAGTGGGAGGCCACCTTCAACGGTGAGACTCAGAATTTGTCAGGGGATCCCGACTTAGTGCAGGGCAAAACACATCAGTTGGCACTGCAATAGGATGATGCGGGGCTGATTGTGTACGTGGATGGATCGACGATATGCGACGGAGAACTGGATTATGAGGAGCATGAGGACTATGAGAGCTTTTCCAAGTGCTAATAACGCTGTTGAGGCCCCCCATCGTTTCACACTTCTGCGTTGGCGGCGGCGGCAAGAGTGCTCGGTAACGCTCATGTGACGGTGAGCAACGTCCTTTTGCACAACCGCGTGTTGAAGGGTGACGAGCTCCAAGCGCTAATGAAGACGAAGCCGGATGCTTCAGAGGCGAGGGTGCCGGCCCCGAAAGGTGCGCCTCACAACAATCATGCGAGTGAAACTTTCCCTCAATCCGCCAGTGGACTTGTCGTCGTGGATGAAGCGCGGCAGGAAGACACATCAGCACCACAACGTCAACACTCACCAGCGCAACCATCAGGAAATAGGAAGGGCTCAGCAGTTCCCAGGCAAACATCTTCTTCTGATGCCATTGGCCCGTCCACCTCAGCTGATACGGGGAAATGGAAGAAGAGACACCCAGCAGTGGTGTATTGGCGCCTGCATTGTCTTCGACACCGAGCGTGGTCAGTCGTCAGGAAGTACTTGAAAGCAAGATACCTGTCAGTGGGGGTCGCCCTGAGGGTGGCCGGGAGCACTTGCCCTCCAACGCGGCAGCATTGATGATGGGGCAGGCGGGCAAGGCGAGTGAAGGCTCTTCGCATAACGGATACACCGACGGTTGGCCCCAGCGCAGCATTTCACGTGATTTTTTTTTTTGAGAGTGTGCACAAAGAGCCGTCCACCGCCAACACGCTCGCCGGCGACAAACAACACAACGACCCTGAAGGGGA
AACGCATGCCTGCACTGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTGCCGCCTGCGAGGGCCGGCGGCGGTGGTGGCGATACTGCGGAGGGCTGCGTGAGTGGCGTGACGCTGACGGAGTCGGTGACGGTTGGCGGCCGGCGG
## TcCLB.403789.9 CGTCATACCCGACGTCGTGGACGGTGCGCTGGAGCAGCAAACGCTGCCTTCGTGGCTGCCACAGTTCGATTCTGTGAACTTCACACGAAATGCCGATGACGCGACGAGTGGCGAACTTCTTTTTCAGGGCGCCAACGCGACGATGCGTCATGTCTTGTCGTTTCTTTCTCTTTTTACTTTGGGGACGCGGGCAGGATTANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCCCAGATCCACAGACACGTGCTCTCTCTAGTGTCCTTGGATTTGTGGAGCAAATTGAGTGTTACCAGGTACTGCAGAACAACTCCACACCGGACCGAAACCTCCTCTCCACCGCGCAAGAAATTGCAAATGAGATCAACCTACGTAACCTGCAGGTGAAGGTGAATGAGGTGTACCAAGAAATCATGGGGACATTTGCTAAGAAGGAAGGTTTGTTCCGTATGAATATGATGGGGAAGCGTGTGAATCAGGCCTGTCGCTCCGTCATCTCCCCGGATTTAATGGTGGAGCCGAACGAAGTCTTGCTGCCCCGACCATTTGCCCGAAATTTGTCTTTTCCTGAACAAGTGACATTTTACGCCTCTGCCCGCATGAATCTTTTGAAGCGTTGCGTCATCAACGGGCCGCGTCGCTACCCCGGTGCCACTCATCTTGAGATTCGACAAACGAATGGAGAAATTCGATTTATTGAGCTTGACGTGCCTGAGCAAACACGGCGGCAACACGCTGCCAAGTACTTTGCCATGGCGCAAAGTGGCGTCACACTGATTGTGTACCGTCATATCTTGGATGGCGACCGGGTGGTGTTTAACCGTCAACCGACATTGCACAAGCCAAGTATGATGGGGTACCAAGTCAAGGTGCTTTCTGGACACAAGACAATTCGCTTTCATTACGTGAATGGCAACTCCTTTAATGCCGATTTTGACGGCGATGAGATGAATATTCACGTCCCGCAGAGCCTGGAGGCGAAGGTGGAATTAGACGTCCTGATGGATGCCAATCTGAACTACCTCGTCCCCACCTCCGGGAAACCGATTCGGGGACTTATTCAGGATCATATTGCGGCTGGTGTCCTGTTAACATTGCGTGACAAGTTTCTTGAGCATGCCACCTTTGTTCAACTTACTTACTATGGGCTGGCACCCTACTTGCGCCAGCAACACGAAATCACACTTTCTGAGCTTATTCCTCTGCCGGCGATCTTATGGCCCCGGCCACTCTGGACAGGGAAACANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTTATACCGGGCCGTATGATGGTGCCATTTCCCCGCAACCATTTGCTCATGATGACCGCCTCAGGGGCAAAGGGAAGTAACGCTAATGCGACGCAGATGGCACTGGGACTCGGTCAGCAGCTTTTTGATGGACGACGCGTGAAACGAATGAATTCGGGAAAGACGCTGCCTGCCTTTTTTGCTCATGAACGCCGTGCCCG
## gid annot_gene_name gene_type chromosome
## TcCLB.506551.10 TcCLB.506551.10 protein coding TcChr27-S
## start end strand
## TcCLB.506551.10 202017 203870 -
## annot_gene_product
## TcCLB.506551.10 protein associated with differentiation 8, putative
## annot_gene_type length chr_length low_boundary high_boundary
## TcCLB.506551.10 protein coding 1853 850241 201717 204170
## fivep
## TcCLB.506551.10 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTGGGTTTTAAGGGAGGGAAGCACACACGCGTGTCTATCCATATCTATATAACTATATATCTTAGACACAGATATAATAGGGGCCACTCTGTCTGCAACCATTATATCAATTGGAAGAGCACTCATCCAATCGGCTTGGCGTGAACCA
## threep
## TcCLB.506551.10 ATTTCTTTTTGTATTGCCACGTCGCTAATTTATTTATTTTTGATGATATGTTTGATTTATTTATATTTCATCGTATGCTGGTGGGTTGCGTGTGTGTTTTGACTTTTTCATTGTTGATTCTTTTTTTTTTTTTTTTTGGCGGGCGTCACTTTTTTCCGCGCGTGAGTCTCCGGGGACGGTGACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAGCGAAGGATGTGGTGATAATAAGGTAACTTATTTCATTCTGTTTTTGAGTGTTTTTCTTACTGTTTTTCCTTTACTTTTCTCCCT
## cds
## TcCLB.506551.10 ATGACCGGCGAACAATTTGTTGAAGTTCTCGCTACGGGGCCACAGAAGGCAATCTATGAGCCCCGTCGCTTTGCGATTTTGCTGCTTGGATCCTATGGCTGTATCTGCTCATCTCTGAGCTACGCCTTCAATCTCATTGCGCCGGAGATGCAGTCGCGTTACGACCTTACTGGGCGCGATATCTCGACCATCAGCACGGTGGGTTTGGTGGTTGGCTACTTTCTTATGCCATATGGCTTCATTTTTGATCACTTTGGTCCCAAACCGATTTTCATACTGAGCATGGTGCTGTTTCCTCTCGGGGCGCTGCTGTTTGCGCTGTCATTTCGCGGGACAATTGAGGGCTCTGTGGTGCGTCTAAGCTTTTTCAACGCCATTCTGACACTCGGATGCACGCTGTATGACGTAGTATACATGATGACGATCATGAGCCATTTCCCGATCAGCAGGGGCCCTGTCGTGGCCATTTTGAAGTCGTACATCGGACTGGGCTCCGCCATTGTGGGAAGCATCCAGCTGGCCTTTTTTGACGGGAGGCCGGACCACTACTTCTATTTTCTGATGGTGCTGTTTTTTGTGACTGGAGCTGCGGGTTTCTTCCTTGTGCCACTCCCGTCGTACCACCTGACTGGCTATGAGGAGAAACACCTTGGCATCGAGGAAAAGGAGAGACGACTGGCACGCAAATCCGTTTACCTCCGCCAGCAACCACCCACAATTCGCTTCGCGATCGGCATTGCGTTTGTTGTCCTGCTGGTTATATACTTGCCACTGCAGAGCGCACTGGTTGCGTATCTGGGGTGGGGGAGGACGCAGCGCATCATATTTGCGTCCATCTTGATTGCTGTCCTTGTGGCACTTCCGTTGATGGCATTGCCCGTTTCGTGCCTTGAGAGGAGGGAGACACAACGGGAGGAGGATGACTGCGGTGGGACGGAGAGACCGAGTGCGGGTGATGAGGTGGCGAAAGAGCCTGCGGCGGCTGGTGGTCCTCCGAAGAAGGTGGAGACGGACGTCGACTACATTGCGCCGCAGTACCAGACGACCTTTCTCCAGAACCTGAAAACGCTGAAGTTGTGGGCGCTTCTCTGGTGCTTTTTTACCTTGGGGGGCGCCGGGTTTGTGATCATCTACAACGCCAGCTTTGTCTACGCCGCGCTTGCTGACGAAGAGGTGGACAACGCCATCAAAACGCTTCTCACGGTGCTGAACGGGGTGGGAAGTGCGGCGGGTCGGCTACTGATGAGCTACTTCGAGGTCTGGTCGCAGAAACGCAAGGCCGAGGACAGGGTTTCGATTATCGTTTCTATCTATTTGGCTGATGTGTTCGTGATCCTGTCGCTGGTTTTATTCCTCGTGGTGCCCAGGGCTGCACTGCCGCTGCCGTACGTATTGGCTGCCCTTGGCAACGGTTTTGGCGCAGCCTCCCTTGTGTTGGTTTCTCGGACTGTTTTTGCAAAGGACCCCGCCAAGCACTACAACTTCTGCTTCCTTGCATCACTGTTTTCTACAATCTTCCTGAACCGCCTTCTGTACGGCGAGTGGTACACGCGGGAGGCTGAGAAGCAGGGCGGCAATGTTTGCCTTGGCCGGAATTGTGTGATGATGCCGCTGATATTTTTGATTGTTCTCAGCTTCACCGCGTTTCTTTCTACTGCCTATTTTGACTGGGAGTACCGCCGATTCAGTCGATTGGTGCTTGAGGAGCGGTGCCGTCTGAAGGAGAGGGCAGGGGAAGGGCTATTGGCGGTGGAGTCTCCCCCGCTTGTTGCAGCGGAGCGACAGCAAGAGGAAGAGGATGCCGGCAACCGAACAACGACGCCGGCCAACGACAGGAAGGTGGCACGTCCGTAA
## all
## TcCLB.506551.10 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTGGGTTTTAAGGGAGGGAAGCACACACGCGTGTCTATCCATATCTATATAACTATATATCTTAGACACAGATATAATAGGGGCCACTCTGTCTGCAACCATTATATCAATTGGAAGAGCACTCATCCAATCGGCTTGGCGTGAACCATGACCGGCGAACAATTTGTTGAAGTTCTCGCTACGGGGCCACAGAAGGCAATCTATGAGCCCCGTCGCTTTGCGATTTTGCTGCTTGGATCCTATGGCTGTATCTGCTCATCTCTGAGCTACGCCTTCAATCTCATTGCGCCGGAGATGCAGTCGCGTTACGACCTTACTGGGCGCGATATCTCGACCATCAGCACGGTGGGTTTGGTGGTTGGCTACTTTCTTATGCCATATGGCTTCATTTTTGATCACTTTGGTCCCAAACCGATTTTCATACTGAGCATGGTGCTGTTTCCTCTCGGGGCGCTGCTGTTTGCGCTGTCATTTCGCGGGACAATTGAGGGCTCTGTGGTGCGTCTAAGCTTTTTCAACGCCATTCTGACACTCGGATGCACGCTGTATGACGTAGTATACATGATGACGATCATGAGCCATTTCCCGATCAGCAGGGGCCCTGTCGTGGCCATTTTGAAGTCGTACATCGGACTGGGCTCCGCCATTGTGGGAAGCATCCAGCTGGCCTTTTTTGACGGGAGGCCGGACCACTACTTCTATTTTCTGATGGTGCTGTTTTTTGTGACTGGAGCTGCGGGTTTCTTCCTTGTGCCACTCCCGTCGTACCACCTGACTGGCTATGAGGAGAAACACCTTGGCATCGAGGAAAAGGAGAGACGACTGGCACGCAAATCCGTTTACCTCCGCCAGCAACCACCCACAATTCGCTTCGCGATCGGCATTGCGTTTGTTGTCCTGCTGGTTATATACTTGCCACTGCAGAGCGCACTGGTTGCGTATCTGGGGTGGGGGAGGACGCAGCGCATCATATTTGCGTCCATCTTGATTGCTGTCCTTGTGGCACTTCCGTTGATGGCATTGCCCGTTTCGTGCCTTGAGAGGAGGGAGACACAACGGGAGGAGGATGACTGCGGTGGGACGGAGAGACCGAGTGCGGGTGATGAGGTGGCGAAAGAGCCTGCGGCGGCTGGTGGTCCTCCGAAGAAGGTGGAGACGGACGTCGACTACATTGCGCCGCAGTACCAGACGACCTTTCTCCAGAACCTGAAAACGCTGAAGTTGTGGGCGCTTCTCTGGTGCTTTTTTACCTTGGGGGGCGCCGGGTTTGTGATCATCTACAACGCCAGCTTTGTCTACGCCGCGCTTGCTGACGAAGAGGTGGACAACGCCATCAAAACGCTTCTCACGGTGCTGAACGGGGTGGGAAGTGCGGCGGGTCGGCTACTGATGAGCTACTTCGAGGTCTGGTCGCAGAAACGCAAGGCCGAGGACAGGGTTTCGATTATCGTTTCTATCTATTTGGCTGATGTGTTCGTGATCCTGTCGCTGGTTTTATTCCTCGTGGTGCCCAGGGCTGCACTGCCGCTGCCGTACGTATTGGCTGCCCTTGGCAACGGTTTTGGCGCAGCCTCCCTTGTGTTGGTTTCTCGGACTGTTTTTGCAAAGGACCCCGCCAAGCACTACAACTTCTGCTTCCTTGCATCACTGTTTTCTACAATCTTCCTGAACCGCCTTCTGTACGGCGAGTGGTACACGCGGGAGGCTGAGAAGCAGGGCGGCAATGTTTGCCTTGGCCGGAATTGTGTGATGATGCCGCTGATATTTTTGATTGTTCTCAGCTTCACCGCGTTTCTTTCTACTGCCTATTTTGACTGGGAGTACCGCC
GATTCAGTCGATTGGTGCTTGAGGAGCGGTGCCGTCTGAAGGAGAGGGCAGGGGAAGGGCTATTGGCGGTGGAGTCTCCCCCGCTTGTTGCAGCGGAGCGACAGCAAGAGGAAGAGGATGCCGGCAACCGAACAACGACGCCGGCCAACGACAGGAAGGTGGCACGTCCGTAATTTCTTTTTGTATTGCCACGTCGCTAATTTATTTATTTTTGATGATATGTTTGATTTATTTATATTTCATCGTATGCTGGTGGGTTGCGTGTGTGTTTTGACTTTTTCATTGTTGATTCTTTTTTTTTTTTTTTTTGGCGGGCGTCACTTTTTTCCGCGCGTGAGTCTCCGGGGACGGTGACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAGCGAAGGATGTGGTGATAATAAGGTAACTTATTTCATTCTGTTTTTGAGTGTTTTTCTTACTGTTTTTCCTTTACTTTTCTCCCT
## Found: Trypanosoma cruzi CL Brener Non-Esmeraldo-like
## Unable to find CDSNAME, setting it to ANNOT_GENE_NAME.
## Unable to find CDSCHROM in the db, removing it.
## Unable to find CDSSTRAND in the db, removing it.
## Unable to find CDSSTART in the db, removing it.
## Unable to find CDSEND in the db, removing it.
## Extracted all gene ids.
## Attempting to select: ANNOT_GENE_NAME, GENE_TYPE, ANNOT_GENE_LOCATION_TEXT, ANNOT_GENE_NAME, ANNOT_GENE_PRODUCT, ANNOT_GENE_TYPE
## 'select()' returned 1:1 mapping between keys and columns
## Found 7 genes which are less than 300 nt. from the beginning of the chromosome.
## Found 9 genes which are less than 300 nt. from the end of the chromosome.
fivep <- gsub(pattern="^(N+)", replacement="", x=tc_utr["TcCLB.506551.10", "fivep"])
threep <- tc_utr["TcCLB.506551.10", "threep"]
It seems to me that we can assume that the actual 3’ UTR is everything before the run of Ts starting at around position 100, assuming that is the polypyrimidine tract.
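One way to make the "everything before the run of Ts" rule explicit is to search for the first long T run. The following is a minimal sketch in base R, not a validated UTR caller: `trim_at_t_run()` is a hypothetical helper, the minimum run length of 10 is an arbitrary assumption, and the sequence is a toy example rather than the real UTR.

```r
## Hypothetical helper: keep everything upstream of the first run of at
## least `min_run` consecutive Ts; return the input unchanged if none exists.
trim_at_t_run <- function(seq, min_run = 10) {
  pattern <- paste0("T{", min_run, ",}")
  hit <- regexpr(pattern, seq)
  if (hit == -1) {
    return(list(utr = seq, run_start = NA_integer_, run_length = NA_integer_))
  }
  list(utr = substr(seq, 1, hit - 1),
       run_start = as.integer(hit),
       run_length = attr(hit, "match.length"))
}

## Toy sequence: 12 nt, then 12 Ts, then 6 nt.
toy <- paste0("ACGTTTGACATC", strrep("T", 12), "GGCGGG")
trimmed <- trim_at_t_run(toy)
```

With the objects from above this would be called as `trim_at_t_run(threep)`. Since polypyrimidine tracts are usually a mix of C and T, a pattern such as `[CT]{10,}` may be a better fit than a pure T run; which threshold is appropriate is a judgment call.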
I am guessing that Fernanda got the number 12 by clicking the little OrthoMCL link provided by TriTrypDB. When I clicked it, the first thing I noticed was that the gene IDs for CL Brener are in the old style. It will therefore probably be simpler for me to get them manually, since the old and new IDs sometimes do not match up well.
Now that I think about it, I have a little toy which is supposed to provide all the orthologs, this provides an opportunity to make sure that it actually works.
## Unable to find species names for 1 species.
## Plasmodium vivax like
## Found the following hits: Trypanosoma cruzi CL Brener Esmeraldo-like, Trypanosoma cruzi CL Brener Non-Esmeraldo-like, choosing the first.
## Using: Trypanosoma cruzi CL Brener Esmeraldo-like.
## Loaded: org.Tcruzi.CL.Brener.Esmeraldo.like.v46.eg.db
## Some columns were missing: ORTHOLOGS_GROUP_ID, ORTHOLOGS_COUNT
## Removing them, which may end badly.
## 'select()' returned 1:many mapping between keys and columns
## There are 52 possible species in this group.
## Found species: Blechomonas ayalai B08-376
## Found species: Bodo saltans strain Lake Konstanz
## Found species: Crithidia fasciculata strain Cf-Cl
## Found species: Endotrypanum monterogeii strain LV88
## Found species: Leishmania aethiopica L147
## Found species: Leishmania amazonensis MHOM/BR/71973/M2269
## Found species: Leishmania arabica strain LEM1108
## Found species: Leishmania braziliensis MHOM/BR/75/M2903
## Found species: Leishmania braziliensis MHOM/BR/75/M2904
## Found species: Leishmania braziliensis MHOM/BR/75/M2904 2019
## Found species: Leishmania donovani BPK282A1
## Found species: Leishmania donovani CL-SL
## Found species: Leishmania donovani strain LV9
## Found species: Leishmania enriettii strain LEM3045
## Found species: Leishmania gerbilli strain LEM452
## Found species: Leishmania infantum JPCM5
## Found species: Leishmania major strain Friedlin
## Found species: Leishmania major strain LV39c5
## Found species: Leishmania major strain SD 75.1
## Found species: Leishmania mexicana MHOM/GT/2001/U1103
## Found species: Leishmania panamensis MHOM/COL/81/L13
## Found species: Leishmania panamensis strain MHOM/PA/94/PSC-1
## Found species: Leishmania sp. MAR LEM2494
## Found species: Leishmania tarentolae Parrot-TarII
## Found species: Leishmania tropica L590
## Found species: Leishmania turanica strain LEM423
## Found species: Leptomonas pyrrhocoris H10
## Found species: Leptomonas seymouri ATCC 30220
## Found species: Paratrypanosoma confusum CUL13
## Found species: Trypanosoma brucei brucei TREU927
## Found species: Trypanosoma brucei gambiense DAL972
## Found species: Trypanosoma brucei Lister strain 427
## Found species: Trypanosoma brucei Lister strain 427 2018
## Found species: Trypanosoma congolense IL3000
## Found species: Trypanosoma congolense IL3000 2019
## Found species: Trypanosoma cruzi Brazil A4
## Found species: Trypanosoma cruzi CL Brener Esmeraldo-like
## Found species: Trypanosoma cruzi CL Brener Non-Esmeraldo-like
## Found species: Trypanosoma cruzi Dm28c 2014
## Found species: Trypanosoma cruzi Dm28c 2017
## Found species: Trypanosoma cruzi Dm28c 2018
## Found species: Trypanosoma cruzi marinkellei strain B7
## Found species: Trypanosoma cruzi strain CL Brener
## Found species: Trypanosoma cruzi Sylvio X10/1
## Found species: Trypanosoma cruzi Sylvio X10/1-2012
## Found species: Trypanosoma cruzi TCC
## Found species: Trypanosoma cruzi Y C6
## Found species: Trypanosoma evansi strain STIB 805
## Found species: Trypanosoma grayi ANR4
## Found species: Trypanosoma rangeli SC58
## Found species: Trypanosoma theileri isolate Edinburgh
## Found species: Trypanosoma vivax Y486
gene_idx <- tc_orthos[["GID"]] == "TcCLB.506551.10"
chosen <- tc_orthos[gene_idx, ]
esmer_gene_idx <- chosen[["ORTHOLOGS_ORGANISM"]] == "Trypanosoma cruzi CL Brener Esmeraldo-like"
nonesmer_gene_idx <- chosen[["ORTHOLOGS_ORGANISM"]] == "Trypanosoma cruzi CL Brener Non-Esmeraldo-like"
esmer_chosen <- chosen[esmer_gene_idx, ]
esmer_chosen_genes <- esmer_chosen[["ORTHOLOGS_GID"]]
non_chosen <- chosen[nonesmer_gene_idx, ]
non_chosen_genes <- non_chosen[["ORTHOLOGS_GID"]]
esmer_utr_idx <- rownames(tc_utr) %in% esmer_chosen_genes
esmer_utrs <- tc_utr[esmer_utr_idx, ]
non_utr_idx <- rownames(nonesmer_utr) %in% non_chosen_genes
nonesmer_utrs <- nonesmer_utr[non_utr_idx, ]
e5p <- esmer_utrs[, c("gid", "fivep")]
readr::write_csv(x=e5p, path="esmer_5p.csv")
n5p <- nonesmer_utrs[, c("gid", "fivep")]
readr::write_csv(x=n5p, path="nonesmer_5p.csv")
e3p <- esmer_utrs[, c("gid", "threep")]
readr::write_csv(x=e3p, path="esmer_3p.csv")
n3p <- nonesmer_utrs[, c("gid", "threep")]
readr::write_csv(x=n3p, path="nonesmer_3p.csv")
Now I use a handy macro in my editor to convert the CSV files to FASTA.
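For anyone without such a macro, the same conversion can be done in R. The following is a minimal sketch, not what I actually ran: `write_utr_fasta` is a hypothetical helper, and it assumes a data frame with a `gid` column plus one sequence column, as in the CSV files written above. The example rows are made up.

```r
# Hypothetical helper: write one FASTA record per row of a UTR data frame.
# Assumes df has a "gid" column and a sequence column named by seq_column.
write_utr_fasta <- function(df, seq_column, file) {
  headers <- paste0(">", df[["gid"]])
  # rbind() interleaves header and sequence: header1, seq1, header2, seq2, ...
  lines <- as.vector(rbind(headers, df[[seq_column]]))
  writeLines(lines, con = file)
  invisible(lines)
}

# Made-up rows for illustration only.
example <- data.frame(
  gid = c("geneA", "geneB"),
  threep = c("ATTTCTTTTTG", "ATTTTTTTTGT"),
  stringsAsFactors = FALSE
)
out_file <- tempfile(fileext = ".fasta")
write_utr_fasta(example, "threep", out_file)
print(readLines(out_file))
```

With the real data this would be something like `write_utr_fasta(e3p, "threep", "esmer_3p.fasta")`, repeated for each of the four tables.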
Finally, I invoke an aligner (I chose fasta36 because I like it). It looks like there are no significant similarities for any of the 5’ UTRs, but a couple for the 3’ UTRs.
# /cbcb/sw/RedHat-7-x86_64/common/local/fasta/36.3.8e/bin/ggsearch36 query_threep.fasta threeps.fasta
GGSEARCH performs a global/global database searches
version 36.3.8e Sep, 2016(preload9)
Query: query_threep.fasta
1>>>query_threep - 301 nt
Library: threeps.fasta
3612 residues in 12 sequences
Statistics: (shuffled [500]) Unscaled normal statistics: mu= -188.4700 var=4262.2256 Ztrim: 0
statistics sampled from 12 (12) to 500 sequences
Algorithm: Global/Global affine Needleman-Wunsch (SSE2, Michael Farrar 2010) (6.0 April 2007)
Parameters: +5/-4 matrix (5:-4), open/ext: -12/-4
Scan time: 0.080
The best scores are: n-w bits E(12)
TcCLB.508799.270 ( 301) [f] 1336 60.5 8.1e-120
TcCLB.509713.10 ( 301) [f] 1243 57.9 8.8e-106
TcCLB.510811.20 ( 301) [f] 853 46.8 1.6e-56
TcCLB.510069.20 ( 301) [f] 380 33.4 1.9e-17
TcCLB.507383.10 ( 301) [f] 70 24.6 0.00045
TcCLB.510811.10 ( 301) [f] 70 24.6 0.00045
TcCLB.509713.20 ( 301) [f] -112 19.5 1.4
>>TcCLB.508799.270 (301 nt)
n-w opt: 1336 Z-score: 283.5 bits: 60.5 E(12): 8.1e-120
global/global (N-W) score: 1336; 94.4% identity (94.4% similar) in 305 nt overlap (1-301:1-301)
10 20 30 40 50 60
query_ ATTTCTTTTTGTATTGCCACGTCGCTAATTTATTTATTTTTGATGATATGTTTGATTTAT
:::: ::::::::::::: :: :: :: ::::::::::::::::::::::::::::::::
TcCLB. ATTT-TTTTTGTATTGCCGCGCCGTTATTTTATTTATTTTTGATGATATGTTTGATTTAT
10 20 30 40 50
70 80 90 100 110 120
query_ TTATATTTCATCGTATGCTGGTGGGTTGCGTGTGTGTTTTGACTTTTTCATTGTTGATTC
::::::::::::::::::::::::::::::::::::::::: ::::::::::::::::::
TcCLB. TTATATTTCATCGTATGCTGGTGGGTTGCGTGTGTGTTTTGGCTTTTTCATTGTTGATTC
60 70 80 90 100 110
130 140 150 160 170 180
query_ TTTTTTTTTTTTTTTTTGGCGGGCGTCACTTTTTTCCGCGCGTGAGTCTCCGGGGACGGT
::::::: :::: ::::::::::::::::::::::::::::::::::::::::::: :
TcCLB. TTTTTTTCTTTT---TTGGCGGGCGTCACTTTTTTCCGCGCGTGAGTCTCCGGGGACGCT
120 130 140 150 160 170
190 200 210 220 230 240
query_ GACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAGCGAAGGATGTGGTGATAATAAGG
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
TcCLB. GACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAGCGAAGGATGTGGTGATAATAAGG
180 190 200 210 220 230
250 260 270 280 290 300
query_ TAACTTATTTCATTCTGTTTTTGAGTGTTTTTCTTACTGTTTTTCCTTTACTTTTCTCCC
:: ::::::::::::::::::::::::::::::::::::::::::::::::::::::::
TcCLB. CAAATTATTTCATTCTGTTTTTGAGTGTTTTTCTTACTGTTTTTCCTTTACTTTTCTCCC
240 250 260 270 280 290
query_ T----
:
TcCLB. TCGGT
300
>>TcCLB.509713.10 (301 nt)
n-w opt: 1243 Z-score: 269.3 bits: 57.9 E(12): 8.8e-106
global/global (N-W) score: 1243; 91.5% identity (91.5% similar) in 305 nt overlap (1-301:1-301)
10 20 30 40 50 60
query_ ATTTCTTTTTGTATTGCCACGTCGCTAATTTATTTATTTTTGATGATATGTTTGATTTAT
:::: ::::::::::::: : :: :: :::::::::::::::: :::::::::::::::
TcCLB. ATTT-TTTTTGTATTGCCGTGCCGTTATTTTATTTATTTTTGATTATATGTTTGATTTAT
10 20 30 40 50
70 80 90 100 110 120
query_ TTATATTTCATCGTATGCTGGTGGGTTGCGTGTGTGTTTTGACTTTTTCATTGTTGATTC
::::::::::::::::::::: :::::::::::::::::: ::::::::::::::::::
TcCLB. GTATATTTCATCGTATGCTGGTTGGTTGCGTGTGTGTTTTGGCTTTTTCATTGTTGATTC
60 70 80 90 100 110
130 140 150 160 170 180
query_ TTTTTTTTTTTTTTTTTGGCGGGCGTCACTTTTTTCCGCGCGTGAGTCTCCGGGGACGGT
:::::::::: : :::::::::::::::::::::::::::::::::::: ::::::::
TcCLB. TTTTTTTTTTCT---TTGGCGGGCGTCACTTTTTTCCGCGCGTGAGTCTCCAGGGACGGT
120 130 140 150 160 170
190 200 210 220 230 240
query_ GACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAGCGAAGGATGTGGTGATAATAAGG
:::::::::::::::::::::::::::::::::::: :::::::::::::::::::::::
TcCLB. GACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAACGAAGGATGTGGTGATAATAAGG
180 190 200 210 220 230
250 260 270 280 290
query_ TAACTTATTTCATTCTGTTTTTGAGTGTTTTTCTTACTGTTTTTCCTTTACTTTTCT-CC
::::::::: :::::::::::: : ::::::::::: ::::::: :::::::::: : ::
TcCLB. TAACTTATTCCATTCTGTTTTTTAATGTTTTTCTTAGTGTTTTTTCTTTACTTTTTTTCC
240 250 260 270 280 290
300
query_ CT---
::
TcCLB. CTCGG
300
>>TcCLB.510811.20 (301 nt)
n-w opt: 853 Z-score: 209.5 bits: 46.8 E(12): 1.6e-56
global/global (N-W) score: 853; 77.5% identity (77.5% similar) in 311 nt overlap (1-301:1-301)
10 20 30 40 50 60
query_ ATTTCTTTTTGTATTGCCACGTCGCTAATTTATTTATTTTTGATGATATGTTTGATTTAT
:::: :: :::: :::: :: ::::: ::: :: :::::::::::: :: :: : :
TcCLB. ATTT-TTCTTGTGTTGCTGCGCCGCTACTTTGTTCATTTTTGATGATGTGCTTTACACAC
10 20 30 40 50
70 80 90 100 110
query_ TTATATTT-CATCGTATGCTGGTGGGTTGCGTGTGTGTTTTGACTTTTTCATTGTTGATT
::: ::: : : :: :::: : :::: :::::::::::::: ::::::::: ::::::
TcCLB. GTATTTTTTCGTTGTGTGCTAGCGGGTCGCGTGTGTGTTTTGGCTTTTTCATAGTTGATA
60 70 80 90 100 110
120 130 140 150 160 170
query_ CTTTTTTTTT--------TTTTTTTTGGCGGGCGTCACTTTTTTCCGCGCGTGAGTCTCC
:::: :::: ::::::::::: ::::::::::::::: :::::::: ::::
TcCLB. TTTTTATTTTGTTTCAATTTTTTTTTGGCAGGCGTCACTTTTTTCTGCGCGTGACCCTCC
120 130 140 150 160 170
180 190 200 210 220 230
query_ GGGGACGGTGACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAGCGAAGGATGTGGTG
:::::::::: ::::::::::::::::::::::::::: ::::: ::::::::::::::
TcCLB. GGGGACGGTGGCGGAGCATTGCGGGGCGGTGTGCACGCATACCGAACGAAGGATGTGGTG
180 190 200 210 220 230
240 250 260 270 280 290
query_ ATAATAAGGTAACTTATTTCATTCTGTTTTTGA-GTGTTTTTCTTACTGTTTTTCCTTTA
::::::::: ::::::::: :: :: ::::: : ::::: :: ::::: :::
TcCLB. ATAATAAGGCAACTTATTTAATCCTTTTTTTTTCGAATTTTTTTTTTTGTTT----CTTA
240 250 260 270 280 290
300
query_ CTTTTCTCCCT
: :: :
TcCLB. TTGTT-----T
300
>>TcCLB.510069.20 (301 nt)
n-w opt: 380 Z-score: 137.1 bits: 33.4 E(12): 1.9e-17
global/global (N-W) score: 380; 49.0% identity (49.0% similar) in 304 nt overlap (1-301:1-301)
10 20 30 40 50 60
query_ ATTTCTTTTTGTATTGCCACGTCGCTAATTTATTTATTTTTGATGATATGTTTGATTTAT
:: : : :::: :::: :: ::::: ::: :: :::::::::::: :: :: : :
TcCLB. ATAT-TGCTTGTGTTGCTGCGCCGCTACTTTGTTCATTTTTGATGATTTGCTTTACACAC
10 20 30 40 50
70 80 90 100 110
query_ TTATATTT-CATCGTATGCTGGTGGGTTGCGTGTGTGTTTTGACTTTTTCATTGTTGATT
::::::: ::: :: :::::: :::: :::::::::::::: ::::::::: ::::::
TcCLB. GTATATTTTCATTGTGTGCTGGCGGGTCGCGTGTGTGTTTTGGCTTTTTCATAGTTGATA
60 70 80 90 100 110
120 130 140 150 160 170
query_ CTTTTTTTTTT--TTTTTTTGGCGGGCGTCACTTTTTTCCGCGCGTGAGTCTCCGGGGAC
:::: ::::: :::::::::::::::::::::: ::: :::::::
TcCLB. TTTTTATTTTTATTTTTTTTGGCGGGCGTCACTTTCTTCTGCGCGTGGCCNNNNNNNNNN
120 130 140 150 160 170
180 190 200 210 220 230
query_ GGTGACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAGCGAAGGATGTGGTGATAATA
TcCLB. NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
180 190 200 210 220 230
240 250 260 270 280 290
query_ AGGTAACTTATTTCATTCTGTTTTTGAGTGTTTTTCTTACTGTTTTTCCTTTACTTTTCT
::: : : :::: : :
TcCLB. NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNATTTAAGGATACCATACCTT--CGGCACC
240 250 260 270 280 290
300
query_ CCCT
::::
TcCLB. CCCT
300
>>TcCLB.507383.10 (301 nt)
n-w opt: 70 Z-score: 89.6 bits: 24.6 E(12): 0.00045
global/global (N-W) score: 70; 50.8% identity (50.8% similar) in 311 nt overlap (1-301:1-301)
10 20 30 40 50 60
query_ ATTTCTTTTTGTATTGCCACGTCGCTAATTTATTTATTTTTGATGATATGTTTGATTTAT
:: : : :::: :::: :: ::::: ::: :: :::::::::::: :: :: : :
TcCLB. ATAT-TGCTTGTGTTGCTGCGCCGCTACTTTGTTCATTTTTGATGATGTGCTTTACACAC
10 20 30 40 50
70 80 90 100 110
query_ TTATATTT-CATCGTATGCTGGTGGG---TTGCGTGTGTGTTTTGACTTTTTCATTGTTG
::::::: ::: :: :::: : ::: :: : : ::: :: : : : :
TcCLB. GTATATTTTCATTGTGTGCTAGCGGGCTTTTAGGAGCTTGTGAAGAGGGTGTTCGCGAGG
60 70 80 90 100 110
120 130 140 150 160 170
query_ ATTCTTTTTTTTTTTTTTTTTGGCGGGCGTCACTTTTTTC---CGCGCGTGAGTCTCCGG
: : : : : :: : :: ::: : :: :: : :
TcCLB. AAAGCTAGCTATCTAACTT-----GATAAACATTTTAATAAAACGAATATGTATATTTTT
120 130 140 150 160 170
180 190 200 210 220 230
query_ GGACGGTGACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAGCGAAGGATGTGGTGAT
:: : : : : : :: :: :: : :: ::: ::: : : :
TcCLB. CGATTCTTCCTTATTAATTGTTGGAGGATCGCGGGTTGAG-GAGTCAAGAAACCGAGAAC
180 190 200 210 220 230
240 250 260 270 280 290
query_ AATAAG--GTAACTTATTTCATTCTGTTTTTGAGTGTTTTTCTTACTGTTTTTCCTTTAC
::: : :: :: ::: :: : ::::: ::::: ::::: : : :
TcCLB. TTCAAGACGGAATTTTTTTATTTTTATTTTTA---GTTTTCATTACTTCACTCACCCTCT
240 250 260 270 280 290
300
query_ TTTTCTCCCT-
::::: :
TcCLB. TTTTCCGATTT
300
>>TcCLB.510811.10 (301 nt)
n-w opt: 70 Z-score: 89.6 bits: 24.6 E(12): 0.00045
global/global (N-W) score: 70; 50.8% identity (50.8% similar) in 311 nt overlap (1-301:1-301)
10 20 30 40 50 60
query_ ATTTCTTTTTGTATTGCCACGTCGCTAATTTATTTATTTTTGATGATATGTTTGATTTAT
:::: :: :::: :::: :: ::::: ::: :: :::::::::::: :: :: : :
TcCLB. ATTT-TTCTTGTGTTGCTGCGCCGCTACTTTGTTCATTTTTGATGATGTGCTTTACACAC
10 20 30 40 50
70 80 90 100 110
query_ -TTATATTTCATCGTATGCTGGTGGG---TTGCGTGTGTGTTTTGACTTTTTCATTGTTG
:: : :::::: :: :::: : ::: :: : : ::: :: : : : :
TcCLB. GTTTTTTTTCATTGTGTGCTAGCGGGCTTTTAGGAGCCTGTGAAGAGGGTGTTCGCGAGG
60 70 80 90 100 110
120 130 140 150 160 170
query_ ATTCTTTTTTTTTTTTTTTTTGGCGGGCGTCACTTTTTTCCG-CGCGTGAG---TCTCCG
: : : : : :: : :: ::: : :: : : : :
TcCLB. AAAGCTAGCTATCTAACTT-----GATAAACATTTTAATAAAACGAATAATATATATTTT
120 130 140 150 160 170
180 190 200 210 220 230
query_ GGGACGGTGACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAGCGAAGGATGTGGTGA
:: : : : : : :: :: :: : :: ::: ::: : : :
TcCLB. TCGATTCTTCCTTATTAATTGTTGGAGGATCGCGGGTTGAG-GAGTCAAGAAACCGAGAA
180 190 200 210 220 230
240 250 260 270 280 290
query_ TAATAAG--GTAACTTATTTCATTCTGTTTTTGAGTGTTTTTCTTACTGTTTTTCCTTTA
::: : :: :: ::: :: : ::::: : : : :: : : : : :: :
TcCLB. CTTCAAGACGGAATTTTTTTATTTTTATTTTTAATTTTCATTACTTCACTCATCCCCT--
240 250 260 270 280 290
300
query_ CTTTTCTCCCT
:::: : :
TcCLB. -TTTTTTGATT
300
>>TcCLB.509713.20 (301 nt)
n-w opt: -112 Z-score: 61.7 bits: 19.5 E(12): 1.4
global/global (N-W) score: -112; 46.0% identity (46.0% similar) in 313 nt overlap (1-301:1-301)
10 20 30 40 50
query_ ATTTCTTTTTGTATT-GCCACGTCGCTAATTTATTTATTTTTGATGATATGTTTGATTTA
: : :: : : : : : : :: : : ::: : : : :: :: :
TcCLB. AGTGAATTGTTTGTGAGGGATGACG-TGGACTGTTTTTGTGAGGGGAGTGCACCGACTAC
10 20 30 40 50
60 70 80 90 100 110
query_ TTTATATTTCATCGTATGCT---GGTGGGTTGCGTGTGTGTTTTGACTTTTTCATTGTTG
::: : : :: ::: :: : ::: ::: : : : : :: :::
TcCLB. AGAATACT-CCTCTGATGACAACGGCTGCTTGAAGGTGGGGAGCGGATGATCTTTTCTTG
60 70 80 90 100 110
120 130 140 150 160 170
query_ A-TTCTTTTTTTTTTTTTTTTTGGCGGGCGTCACTTTTTTCCGCGC-GTGA--GTCTCC-
: :::: :: : : :: : : ::: ::: :: ::: :::: : ::
TcCLB. AGTTCTGGAGGGATTGTCTCGTGCCTGAAGTC--TTTGTTGTGCGATGTGAACGAAGCCA
120 130 140 150 160 170
180 190 200 210 220
query_ -GGGGACGGTGACG-GAGCATTGCGGGGCGGTGTGCAC-GCGGACCGAGCGAAGGATGTG
:: :: : : : : :: : : : : :: : :: ::: : : : :: ::
TcCLB. TGGAGAAGTTCATGCGACTACTTACATGAGTAGTTTATAGCTACCCGCGTAATGCATCTG
180 190 200 210 220 230
230 240 250 260 270 280
query_ GTGATAATAAGGTAACTTATTTCATTCTGTTTTTGAGTGTTTTTCTTACTGTTTTTCCTT
: : :: : : :: :: : :::: :: :: : : : :::: :
TcCLB. CGTGCACGAGGG------AGTACA--CTTTGTTTGCGTTGATTACGTGATTTTTTGTTAT
240 250 260 270 280
290 300
query_ TACTTTTCTCCCT
:: : :
TcCLB. TATGACGATGGCG
290 300
301 residues in 1 query sequences
3612 residues in 12 library sequences
Tcomplib [36.3.8e Sep, 2016(preload9)] (32 proc in memory [0G])
start: Fri Mar 27 11:12:08 2020 done: Fri Mar 27 11:12:08 2020
Total Scan time: 0.080 Total Display time: 0.000
Function used was GGSEARCH [36.3.8e Sep, 2016(preload9)]
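To pull the interesting hits out of a report like this without reading it by eye, the "best scores" table can be parsed in R. This is a rough sketch under assumptions: the regular expression is written against the seven lines printed above and may not cover other fasta36 output layouts, and the cutoff of 1e-10 is an arbitrary choice of mine.

```r
# The "best scores" lines, copied from the ggsearch36 report above.
hits_text <- c(
  "TcCLB.508799.270 ( 301) [f] 1336 60.5 8.1e-120",
  "TcCLB.509713.10 ( 301) [f] 1243 57.9 8.8e-106",
  "TcCLB.510811.20 ( 301) [f] 853 46.8 1.6e-56",
  "TcCLB.510069.20 ( 301) [f] 380 33.4 1.9e-17",
  "TcCLB.507383.10 ( 301) [f] 70 24.6 0.00045",
  "TcCLB.510811.10 ( 301) [f] 70 24.6 0.00045",
  "TcCLB.509713.20 ( 301) [f] -112 19.5 1.4"
)

# Fields: id, (length), [strand], n-w score, bits, E-value.
pattern <- "^(\\S+)\\s+\\(\\s*(\\d+)\\)\\s+\\[[fr]\\]\\s+(-?\\d+)\\s+([0-9.]+)\\s+(\\S+)$"
matches <- regmatches(hits_text, regexec(pattern, hits_text, perl = TRUE))

hits <- do.call(rbind, lapply(matches, function(x) {
  data.frame(
    id = x[2],
    length = as.integer(x[3]),
    nw = as.integer(x[4]),
    bits = as.numeric(x[5]),
    evalue = as.numeric(x[6]),
    stringsAsFactors = FALSE
  )
}))

# Assumed cutoff; adjust to taste.
significant <- hits[hits$evalue < 1e-10, ]
print(significant)
```

Note that a low E-value alone does not make a hit clean: the TcCLB.510069.20 alignment above contains a long run of Ns, so its identity figure reflects an assembly gap as much as real divergence.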