https://tritrypdb.org/tritrypdb/app/record/gene/TcCLB.506551.10 I’d like to get the 5’ and 3’ UTRs of this gene and of its orthologs in CL Brener (12 genes total). Can you write a step-by-step script for me showing how to do that, and also how to align these regions to find similarities?
I have a little function which gathers UTRs from TriTrypDB. For species whose UTRs have not yet been identified, it takes an arbitrary padding around every CDS instead (300 nt in the output below). The best part about it is that I only need to type a portion of the species name which is unique to the species in question.
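The partial-name lookup can be sketched in a few lines of base R. This is not the function that produced the log that follows; `pick_species()` is a hypothetical helper, and the species list is just a handful of names copied from the output in this document. It assumes a case-insensitive substring match and, like the log, falls back to the first hit when more than one name matches.

```r
# Hypothetical helper (an assumption, not the function used in this document):
# resolve a partial species name against a vector of known species names.
pick_species <- function(query, available) {
  # Case-insensitive, literal (non-regex) substring match.
  hits <- available[grepl(tolower(query), tolower(available), fixed = TRUE)]
  if (length(hits) == 0) {
    stop("No species matched: ", query)
  }
  if (length(hits) > 1) {
    message("Found the following hits: ", toString(hits), ", choosing the first.")
  }
  message("Using: ", hits[1], ".")
  hits[1]
}

# Species names taken from the output shown in this document.
species <- c(
  "Leishmania infantum JPCM5",
  "Trypanosoma cruzi CL Brener Esmeraldo-like",
  "Trypanosoma cruzi CL Brener Non-Esmeraldo-like",
  "Leishmania major strain Friedlin"
)

chosen <- pick_species("CL Brener", species)
```

A longer fragment such as "Non-Esmeraldo" would select the second haplotype instead of relying on the first-hit fallback.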
## Found the following hits: Trypanosoma cruzi CL Brener Esmeraldo-like, Trypanosoma cruzi CL Brener Non-Esmeraldo-like, choosing the first.
## Using: Trypanosoma cruzi CL Brener Esmeraldo-like.
## Unable to find CDSNAME, setting it to ANNOT_GENE_NAME.
## Unable to find CDSCHROM in the db, removing it.
## Unable to find CDSSTRAND in the db, removing it.
## Unable to find CDSSTART in the db, removing it.
## Unable to find CDSEND in the db, removing it.
## Extracted all gene ids.
## Attempting to select: ANNOT_GENE_NAME, GENE_TYPE, ANNOT_GENE_LOCATION_TEXT, ANNOT_GENE_NAME, ANNOT_GENE_PRODUCT, ANNOT_GENE_TYPE
## 'select()' returned 1:1 mapping between keys and columns
## Found 10 genes which are less than 300 nt. from the beginning of the chromosome.
## Found 5 genes which are less than 300 nt. from the end of the chromosome.
## gid annot_gene_name gene_type chromosome
## TcCLB.398345.10 TcCLB.398345.10 protein coding TcChr40-S
## TcCLB.401041.10 TcCLB.401041.10 protein coding TcChr14-S
## TcCLB.401473.9 TcCLB.401473.9 RPA1 protein coding TcChr25-S
## TcCLB.401569.10 TcCLB.401569.10 protein coding TcChr33-S
## TcCLB.401661.10 TcCLB.401661.10 protein coding TcChr35-S
## TcCLB.403789.9 TcCLB.403789.9 RPA1 protein coding TcChr25-S
## start end strand
## TcCLB.398345.10 517675 518916 +
## TcCLB.401041.10 586977 588116 +
## TcCLB.401473.9 173974 175273 +
## TcCLB.401569.10 203490 204662 +
## TcCLB.401661.10 1174114 1175710 +
## TcCLB.403789.9 172858 173873 +
## annot_gene_product
## TcCLB.398345.10 root hair defective 3 GTP-binding protein (RHD3), putative
## TcCLB.401041.10 retrotransposon hot spot protein (RHS, pseudogene), putative
## TcCLB.401473.9 DNA-directed RNA polymerase I largest subunit (fragment)
## TcCLB.401569.10 trans-sialidase, putative
## TcCLB.401661.10 trans-sialidase (pseudogene), putative
## TcCLB.403789.9 DNA-directed RNA polymerase I largest subunit (fragment)
## annot_gene_type length chr_length low_boundary high_boundary
## TcCLB.398345.10 protein coding 1241 2036759 517375 519216
## TcCLB.401041.10 protein coding 1139 598625 586677 588416
## TcCLB.401473.9 protein coding 1299 822374 173674 175573
## TcCLB.401569.10 protein coding 1172 1041172 203190 204962
## TcCLB.401661.10 protein coding 1596 1186946 1173814 1176010
## TcCLB.403789.9 protein coding 1015 822374 172558 174173
## fivep
## TcCLB.398345.10 TTATTTACGGAATTTTGCTCCAATTATACGGAACGTTTGCAGCGTGGTGAGCTTGTGACACATCTCACTTCTTTGCTTGAGCGCGATGTGGAAGATAAGCTGCGTGATTTTCACCAACAAACGAGGCTATACAGGGTAGACATTGTGCGGAAGACCGAGGCTGAACTTGAAGAGGAGCTCTTGAAGGTGGAGCTGAAACTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNA
## TcCLB.401041.10 AGAAAAACGTAAGGCGGAAGAAAAGCGGGAGGCAGAAGAAAGATTAAGGCGTGAGGAGGATGAAAGGCAAAGACGAGCGCAAGAAATGAAATTTACCATTTCCACTACGATCGAAGAAGTACTGTTTAAAGGAGGAGTCCGCGTCAAGGAAAAGAAGCTGAACGATTTTCTTTACGATGGATTGGACGGCAGGGGCGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTG
## TcCLB.401473.9 ATTCGGGGACTTATTCAGGATCATATTGCGGCTGGTGTCCTGTTAACATTGCGTGACAAGTTTCTTGAGCATGCCACCTTTGTTCAACTTACTTACTATGGGCTGGCACCCTACTTGCGCCAGCAACACGAAATCACACTTTCTGAGCTTATTCCTCTGCCGGCGATCTTATGGCCCCGGCCACTCTGGACAGGGAAACANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNT
## TcCLB.401569.10 TGCTGTGTGGAGCAGGAAGAATGCACCGGATACTACACATGGGGTGAGTTTTTTCTTGGCCGTGGGGATCGCGTGGCGCCGTTTGCCGATTTTTGTTTTTACGTTCTCATCTCCACATTTGCTGCGATGATAGCATCTTTTTTTTGTAAGGTGTATGCACCGTATGCAGCTGGTGGCGGCATCAATGAGGTGAAGACAATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNA
## TcCLB.401661.10 CGGCGGTGTTCCGCGACGCTGTTGGCGTGTTGGTTGTTGGGGGCGTGGCGCTGTCGTCGCGTGGTGCGCTGTACGTGGACGGGCTGTTGGTGCAGACGGCGCTGGGGCTGTGCGTGTCGGTGGAGGGCGGTGTTGCGGCCAGCGGCGGCTCCGTGGTGGCGTTTTTTGACAGAGACTTCCTGCTGTGCAAGCACGCGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGC
## TcCLB.403789.9 CGTCATACCCGACGTCGTGGACGGTGCGCTGGAGCAGCAAACGCTGCCTTCGTGGCTGCCACAGTTCGATTCTGTGAACTTCACACGAAATGCCGATGACGCGACGAGTGGCGAACTTCTTTTTCAGGGCGCCAACGCGACGATGCGTCATGTCTTGTCGTTTCTTTCTCTTTTTACTTTGGGGACGCGGGCAGGATTANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCC
## threep
## TcCLB.398345.10 ATTTCCTGAATTCATGTAATGGATTATTATTACACATGTACCTGTATGCATCCATTTGTGAGAGTTTGGAAGAAAAAGAAAGGACGCAATGAAAAGTTTTGGCTTGTACTTTGAAGTACCCCAATAATTCTACGCGCACAGGTTGTGCTGCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTCTGTTTTTGTCCTTGATTGTTGGTATTTTATTCGGCGTTGTTTTTCTG
## TcCLB.401041.10 CANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
## TcCLB.401473.9 TNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTCAGGGCGCCAACGCGACGATGCGTCATGTCTTGTCGTTTCTTTCTCTTTTTACTTTGGGGACGCGGGCGATTAAATTGCGGCAGGCCCGCTCCACAGACATTCGTGACATGGCGAACTGGTTTGGTGTCGAGTCGGCATACCGCACCCTTTACGACGAGCTGTCCAAATTATTCAAGCGGTACTCCGTTGACCATCGC
## TcCLB.401569.10 ATGTAGTGAGAGAGTCTCCTGACAAATGTAGATAAATTCATAATTGTGCTGTGAACCGTTTGGGTAAATGTGTGTGTGCGCTCTCATAACAAGGAAATGATTTCCAGTAATGTTTTTGTTTTTTGTTCTCGAACTTTTTGAACAAATCTGCGGACAGACGGTGATGAGTAATTTGAATTTGTTTTTCAGCGTGTTTTTGTCACTGACCCTTTGTTTAAGTGGAGACCGCGTTGGAATGCGGTGAGGGCATTTCTCTGTTTTGTTTTTCCCCTTTTTTTTTTTCCTTTGTGTTTCTTCAATT
## TcCLB.401661.10 TTTTTTTTTTTGAGAGTGTGCACAAAGAGCCGTCCACCGCCAACACGCTCGCCGGCGACAAACAACACAACGACCCTGAAGGGGAAACGCATGCCTGCACTGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTGCCGCCTGCGAGGGCCGGCGGCGGTGGTGGCGATACTGCGGAGGGCTGCGTGAGTGGCGTGACGCTGACGGAGTCGGTGACGGTTGGCGGCCGGCGG
## TcCLB.403789.9 ANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTTATACCGGGCCGTATGATGGTGCCATTTCCCCGCAACCATTTGCTCATGATGACCGCCTCAGGGGCAAAGGGAAGTAACGCTAATGCGACGCAGATGGCACTGGGACTCGGTCAGCAGCTTTTTGATGGACGACGCGTGAAACGAATGAATTCGGGAAAGACGCTGCCTGCCTTTTTTGCTCATGAACGCCGTGCCCG
## cds
## TcCLB.398345.10 ATGTCCCACTTGAAGAGTGTGGAAAGAGCGTCTTCCAATGATGTTTACATAACTGACAATGAAACGTGTAATGTTCTTGTGCATAGCTTTTGGAAGAGACTGTGCCGTGCTCTTCAGGCGGAGATAGAATTGCTTTATTGTGACTATAGCCAGCAGCATCAACGTCAACGGGAATCACAAACTTTAAATTTATATGATCGTTATGCTTCTTTGGTTGCAGAGGATCCGGCCTTGCAGGAGGCGATAGCACACGTTGTGTTGGATGCGGTATTTCAGAAGGTCAGTCGCCGTTTTGCCTCAATGGCTGAGAACGCGGCGGAAACAATTCATCAGGCCTTTGAGGGTGTTCTCAACCGCAACCAGGACGGCACAGTCCGTTTCTTTCATACAACAAAGGCACTACAACGCATTGAACCTCAGGCGCGCCAAGCTGGGCTTGTTCTCTTGGGCTGCCTGTTGTACTATCGCGTAAAGGTCGTTGCGGATCGGGTGGTTTACAAGCTAGAGGATACTGATGGACTCAGCCGTGCTGCTGTTCACCTCCTTGGCGAGCGTCGCAGGCTGATTGTGCGAGAAAACAGCGAGGAACAAAAATTTTTTCTTCACTATGCCACCATTTCGGAGGCTCCGCGGTACCCGATAGGTGCGCCTGTGGTGGAGACCGATTCTGGAGACACATCAGACAACGTTGTAGATAGGGACTGTGTGTTACTCAGTCAGCAGGCAGTGCAGCGGGCATTTGACCTGTATACACAAAAATGTGAATTCACCATGCAACTGCAACTTCGCTCCATTGAAGGCGAGAAACAGAATTTGCCTGCCTGGGTGCTGCCGGTGTTGTTGCTGTTGGGATGGAATGAAATATGGTATGTCCTTTCGTCCCCAGTTCTTTTTGTAGTTGTTGTTATTATCGCTGCGGTGTTTTTAAGGGGCTTTTTGTTGACTCAATGGGCAATATTTGAGGAGACAGGGCCCACCTGTGTCGTGGTGGGTGTTCGCGTCGTCGTGCGGCAAATTCGGAATATATACAAGGCCCTTGTTCCAATGATACCGGACGATGTTAAGAGTAACGTGGCACGGCACCGTGACCCAGGGAGTTTCTCTGATGTGACTGCGTCTGCTGTGGGAACATCATGGCCTTATGCTGCTGCCGAACCGACTGTGTTGCCGCCCTCTACAACGTCCGCCACTCTCACGCGGCGATTAAAGAAGGAAGAGGAGGTACCGACCCAGAAAGAATGA
## TcCLB.401041.10 GCGTACGTTATTGGGAGTCAATCATTCCTGTTGGACAGACCCACCAAAACCGTATCAACATACAGGGATAACCCCAGGATTGAGGATGTTGTAAACATTTTTTTTTTCCGTGGGGTTAAAGGGTATTGTATCTACGATGCGACATTGGCATGTCGTCAACCGTCTGCTGGTTTGCCTTGCAAGGGATGGGGCATGATTGTGGTGACACCACCAGACAAAAACGAATATGAACGGTGGACAAAAAAAATGGACGCTACTGCAATCGTAACGAATTGTCCCGAAGAAAACGATGTGAGGGCAATGTGCATTTGGATGAAGCGCAATCGACCCCTGCAGGAGCAAGCGGAATACTGGAAGGAGGTGAGGGGTCGCATGAATAACGTGGGACCAATTCTCCGCTCCATCTTTGATAAACAGGCATATGATGACCGCATTAAAGCGTGTCAGCAAGCCGTGGATGGGATGAACGCTTCGGAATTAAAGCGTAATTTGGGTATTGGCTGTTGTTATTCGTCCAATGACAATGACTTGTCTTGAAAGCTTCTGAAGGTTGTCCGAGTACGACGAGGAAACAACATTGAATCGCCTCTGAATTTGCTGGTATCTCCCCACCTTGAACGTGAAACTTTGTCCAGGTTGGAGAATGAAATGAAGCAGTCCGATTTTATTTTTTTTGTTTTGAGGTTCTGGGATTATGTCCCACCATATCTTATTGAAAAGTATGCCGTATCCGCATTTTTGAATGAGGATTTCCTGCGTGCGATAAGAATTAAAATCAAGGAACTGAGGCCACCAGGACGACGTGAGCCACACAGCTGTGCGCTGAAAGAGCACTCAGACACGAGCTTCACCAGAAAAGAGGTTCTACCGCCACCGGAACGTCTTTCCAATCCGGTTGCTATGGACCACTGGGTGCTGTATGAACCGAAGGTCCAACACTTTCCGCTGGTGGACGGCTTTTTCTTTGTGGACTCAAATCCAATGACGCTGGTTGGGCTGCGGATGACTACGGCGGGTGAGCACCGCACCACAACCAGCACTGTGAGGCAGTTCACTGAGTGCCTGGCGGCATATTTTAATGGTTGGGAGGAGTTATCCCGAGACATGTCGTGGGAGATTATTAATGTGCAGCACGCAGAC
## TcCLB.401473.9 TTTATACCGGGCCGTATGATGGTGCCATTTCCCCGCAACCATTTGCTCATGATGACCGCCTCAGGGGCAAAGGGAAGTAACGCTAATGCGACGCAGATGGCACTGGGACTCGGTCAGCAGCTTTTTGATGGACGACGCGTGAAACGAATGAATTCGGGAAAGACGCTGCCTGCCTTTTTTGCTCATGAACGCCGTGCCCGTTCTTTTGGCTACGCGATTGGACGCTTCACATCCGGTATCCGGCCACCAGAGTACACGATCCATGCCATGGCCGGTCGCGATGGTCTCATCGACACAGCTGTCAAAACTTCCCGCTCTGGCCATTTGCAGCGTTGCCTTATTAAAGGCCTTGAGAGTCTTGTGGTGCATTGGGACCACTCTGTTCGTGACTCAAACGGCAGCATCGTGCAGTTTACATACGGTGGCGATGGTCTTGACCCGTGCAAGGCCTCAACCCTTACGTCGTGGGAAACGTCGAAGGAAAACCTCGTGGATTTCGGAAAACGATTTGGAGTGGACACGGGTGAGGCGACCAGCGAGACGCGACGGCCAGAAAACTGGGAACAAGGAGTGAGGAACAGCAGCGTGAAGCGCGGTAAACGGCCACGCACAAGCATGAATAACGATAAGAAGAATAATACTAATAAAAATAATGACAACGACGACGAAGAGGAGGATGGCGACGATGAAAATAAAAGGAGCAGTAATTGTAATCACAATGAGAATGCACGGCAGCGGCACAAAGAGCAACAGTTACGTGAGAATCCACTCCCGCGGCACATGCATGACGGCCTTGAGGACTATTTACGCACCAAAGCAACATTTCCACTCTTCCAGCGGGTCTCACAGTTGGCACGCTGGAAGGCGCAGGGACAAGTGCAGGAGAAACTTGCAGAGAAGCGTAGAGAAAGTATTGCATATTATCGTGATGTTCTCTCGGAACTTGCCACAAGCCGACGTGTAAAGGCTTTCTGTGACCCCGGGGAACCTGTTGGTCTTCTTGCGGCACAGGCCGCCGGAGAGCCGTCTACGCAAATGACACTTAACACATTCCACAGTGCCGGTTCCACGGTGACCCACGTGACGGAAGGTATCCCACGTCTCCGTGAGCTGCTTATTCACGCCTCTGTTCAGAAAGCTGCAGTTATTGTGCCCGTGGAGAAGGCGACGCCGGTGGATGAATCAGCCATTGCACGCATACTACATGCGGGTGTGGCGATACGGCTGCGGGATTGTCTTGCGCGGGGCGGGACAAGTGGAAAAGGATATCATTACCACGTGGCACGACGTAAGGATGCGT
## TcCLB.401569.10 AAGAGGTACCACGTGGTTCTTACGATGGCGAATAAAATTGGCTCCGTGTACATTGATGGAGAACCTCTGGAGGGTTCAGGGCAGACCGTTGTGCCAGACGAGAGGACGCCTGACATCTCCCACTTCTACGTTGGCGGGTATAAAAGGAGTGATATGCCAACCATAAGCCACGTGACGGTGAATAATGTTCTTCTTTACAACCGTCAGCTGAATGCCGAGGAGATCAGGACCTTGTTCTTGAGCCAGGACCTGATTGGCACGGAAGCACACATGGGCAGCAGCAGCGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCGACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAGCTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCGACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCGACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAACGGTGCCCACAGTACGCCCTCAGCTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCGACTCCCGCTGACAACGGTGCCCACAGTACGCCCTCAACTCCCGGTGACAGCAGTGCCCACAGTACACCCTCAACTCCCGCTGACAACGGTGCCCACAGTACGCCCTCAGCTCCCGGTGACAACGGTGCCCACAGTACGCCCTTGACTCCCGGTGACAACGGTGCCCACAGTACGCCCTCAGCTCCCGGTGACAACGGTGCCCACAGTACGCCCTTGACTCCCGGTGACAACGGTGCCCACAGTACGCCCTCAACTCCCGGTGACAACGGTGCCCACAGTACGCCCTCAGCTCCCGCTGACAACGGTGCCCACAGTACGCCCTCGACTCCCGCTGGCCACGGTGCCAATGGTACGGTTTTGATTTTGCACGATGGCGCTGCATTTTCGGCCTTTTCGGGCGGAGGGCTTCTTTTGTGTGCGGGTGCTTTGCTGCTGCACGTGTTCGTTATGGCAGTTTTTTTCTGA
## TcCLB.401661.10 CAGGGTGGCTTTGTCAGCGCGACGATCGATGGACAGAACGTCGTCCATTTCAGCCAGCCGGTGTATTCCTGGAAGGAGGGAGAAGAAGCGGGTCGAGTAGACTTGCGGCTGACGGACATGCAGCGAATTTATGATGTTGGGCCGGTATCCGCTGAAAATGAGAAGTTTGCCGCCAGCACTCTGCTGTACGCCACAGACGAAGTGCGATCATCGTTGGTGGAGAAAGAGTGGATAAAACTGTACTGCTCGCACGAGGTCGCTGCTGCGGATGACGAATGCAACATTGCTTTTGTGGACTTGACGGAGAAGTTGAAGGGCGTGAAGAGGGTGTTGGTTGCCTGGAAGGAGAAGGACGCGTAGGTTGCGAATGAATACCGCTGCGTGGGTGAAAAGAGCCAGAAGCGCCGTGACTGTAATGGTTCTGCCCCCACTGAAGGGCTGTTTGGCTTTTTATCCAACACGTCAACTGACAGCACGTGGGCCGACGAGTACCTCTGCGTGAATGCAGCAGTGAACAATGGGAGGTCAGGGGTTGATGGAGGGATGACGTTCAAAGGGTCTGGAGCGGGGAGTTTAGTGGCCTGTTGGCAAGCTGGGGCAGAATGTGCCGTACTACTTCGCAAACAACAAGTTCGGCCTTTTTTGGCGACGGTGACCATCCATGAGGAGCCGCATAATGGCCCTGTTTTTTTGGTGGGTGTGGGGATGAATGACACTGACAGCACCGTGCTTTTGGGCTGTTCTACACGAGTGGAAGGAAGTGGGAGGCCACCTTCAACGGTGAGACTCAGAATTTGTCAGGGGATCCCGACTTAGTGCAGGGCAAAACACATCAGTTGGCACTGCAATAGGATGATGCGGGGCTGATTGTGTACGTGGATGGATCGACGATATGCGACGGAGAACTGGATTATGAGGAGCATGAGGACTATGAGAGCTTTTCCAAGTGCTAATAACGCTGTTGAGGCCCCCCATCGTTTCACACTTCTGCGTTGGCGGCGGCGGCAAGAGTGCTCGGTAACGCTCATGTGACGGTGAGCAACGTCCTTTTGCACAACCGCGTGTTGAAGGGTGACGAGCTCCAAGCGCTAATGAAGACGAAGCCGGATGCTTCAGAGGCGAGGGTGCCGGCCCCGAAAGGTGCGCCTCACAACAATCATGCGAGTGAAACTTTCCCTCAATCCGCCAGTGGACTTGTCGTCGTGGATGAAGCGCGGCAGGAAGACACATCAGCACCACAACGTCAACACTCACCAGCGCAACCATCAGGAAATAGGAAGGGCTCAGCAGTTCCCAGGCAAACATCTTCTTCTGATGCCATTGGCCCGTCCACCTCAGCTGATACGGGGAAATGGAAGAAGAGACACCCAGCAGTGGTGTATTGGCGCCTGCATTGTCTTCGACACCGAGCGTGGTCAGTCGTCAGGAAGTACTTGAAAGCAAGATACCTGTCAGTGGGGGTCGCCCTGAGGGTGGCCGGGAGCACTTGCCCTCCAACGCGGCAGCATTGATGATGGGGCAGGCGGGCAAGGCGAGTGAAGGCTCTTCGCATAACGGATACACCGACGGTTGGCCCCAGCGCAGCATTTCACGTGAT
## TcCLB.403789.9 CCAGATCCACAGACACGTGCTCTCTCTAGTGTCCTTGGATTTGTGGAGCAAATTGAGTGTTACCAGGTACTGCAGAACAACTCCACACCGGACCGAAACCTCCTCTCCACCGCGCAAGAAATTGCAAATGAGATCAACCTACGTAACCTGCAGGTGAAGGTGAATGAGGTGTACCAAGAAATCATGGGGACATTTGCTAAGAAGGAAGGTTTGTTCCGTATGAATATGATGGGGAAGCGTGTGAATCAGGCCTGTCGCTCCGTCATCTCCCCGGATTTAATGGTGGAGCCGAACGAAGTCTTGCTGCCCCGACCATTTGCCCGAAATTTGTCTTTTCCTGAACAAGTGACATTTTACGCCTCTGCCCGCATGAATCTTTTGAAGCGTTGCGTCATCAACGGGCCGCGTCGCTACCCCGGTGCCACTCATCTTGAGATTCGACAAACGAATGGAGAAATTCGATTTATTGAGCTTGACGTGCCTGAGCAAACACGGCGGCAACACGCTGCCAAGTACTTTGCCATGGCGCAAAGTGGCGTCACACTGATTGTGTACCGTCATATCTTGGATGGCGACCGGGTGGTGTTTAACCGTCAACCGACATTGCACAAGCCAAGTATGATGGGGTACCAAGTCAAGGTGCTTTCTGGACACAAGACAATTCGCTTTCATTACGTGAATGGCAACTCCTTTAATGCCGATTTTGACGGCGATGAGATGAATATTCACGTCCCGCAGAGCCTGGAGGCGAAGGTGGAATTAGACGTCCTGATGGATGCCAATCTGAACTACCTCGTCCCCACCTCCGGGAAACCGATTCGGGGACTTATTCAGGATCATATTGCGGCTGGTGTCCTGTTAACATTGCGTGACAAGTTTCTTGAGCATGCCACCTTTGTTCAACTTACTTACTATGGGCTGGCACCCTACTTGCGCCAGCAACACGAAATCACACTTTCTGAGCTTATTCCTCTGCCGGCGATCTTATGGCCCCGGCCACTCTGGACAGGGAAACA
## all
## TcCLB.398345.10 TTATTTACGGAATTTTGCTCCAATTATACGGAACGTTTGCAGCGTGGTGAGCTTGTGACACATCTCACTTCTTTGCTTGAGCGCGATGTGGAAGATAAGCTGCGTGATTTTCACCAACAAACGAGGCTATACAGGGTAGACATTGTGCGGAAGACCGAGGCTGAACTTGAAGAGGAGCTCTTGAAGGTGGAGCTGAAACTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGTCCCACTTGAAGAGTGTGGAAAGAGCGTCTTCCAATGATGTTTACATAACTGACAATGAAACGTGTAATGTTCTTGTGCATAGCTTTTGGAAGAGACTGTGCCGTGCTCTTCAGGCGGAGATAGAATTGCTTTATTGTGACTATAGCCAGCAGCATCAACGTCAACGGGAATCACAAACTTTAAATTTATATGATCGTTATGCTTCTTTGGTTGCAGAGGATCCGGCCTTGCAGGAGGCGATAGCACACGTTGTGTTGGATGCGGTATTTCAGAAGGTCAGTCGCCGTTTTGCCTCAATGGCTGAGAACGCGGCGGAAACAATTCATCAGGCCTTTGAGGGTGTTCTCAACCGCAACCAGGACGGCACAGTCCGTTTCTTTCATACAACAAAGGCACTACAACGCATTGAACCTCAGGCGCGCCAAGCTGGGCTTGTTCTCTTGGGCTGCCTGTTGTACTATCGCGTAAAGGTCGTTGCGGATCGGGTGGTTTACAAGCTAGAGGATACTGATGGACTCAGCCGTGCTGCTGTTCACCTCCTTGGCGAGCGTCGCAGGCTGATTGTGCGAGAAAACAGCGAGGAACAAAAATTTTTTCTTCACTATGCCACCATTTCGGAGGCTCCGCGGTACCCGATAGGTGCGCCTGTGGTGGAGACCGATTCTGGAGACACATCAGACAACGTTGTAGATAGGGACTGTGTGTTACTCAGTCAGCAGGCAGTGCAGCGGGCATTTGACCTGTATACACAAAAATGTGAATTCACCATGCAACTGCAACTTCGCTCCATTGAAGGCGAGAAACAGAATTTGCCTGCCTGGGTGCTGCCGGTGTTGTTGCTGTTGGGATGGAATGAAATATGGTATGTCCTTTCGTCCCCAGTTCTTTTTGTAGTTGTTGTTATTATCGCTGCGGTGTTTTTAAGGGGCTTTTTGTTGACTCAATGGGCAATATTTGAGGAGACAGGGCCCACCTGTGTCGTGGTGGGTGTTCGCGTCGTCGTGCGGCAAATTCGGAATATATACAAGGCCCTTGTTCCAATGATACCGGACGATGTTAAGAGTAACGTGGCACGGCACCGTGACCCAGGGAGTTTCTCTGATGTGACTGCGTCTGCTGTGGGAACATCATGGCCTTATGCTGCTGCCGAACCGACTGTGTTGCCGCCCTCTACAACGTCCGCCACTCTCACGCGGCGATTAAAGAAGGAAGAGGAGGTACCGACCCAGAAAGAATGATTTCCTGAATTCATGTAATGGATTATTATTACACATGTACCTGTATGCATCCATTTGTGAGAGTTTGGAAGAAAAAGAAAGGACGCAATGAAAAGTTTTGGCTTGTACTTTGAAGTACCCCAATAATTCTACGCGCACAGGTTGTGCTGCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTCTGTTTTTGTCCTTGATTGTTGGTATTTTATTCGGCGTTGTTTTTCTG
## TcCLB.401041.10 AGAAAAACGTAAGGCGGAAGAAAAGCGGGAGGCAGAAGAAAGATTAAGGCGTGAGGAGGATGAAAGGCAAAGACGAGCGCAAGAAATGAAATTTACCATTTCCACTACGATCGAAGAAGTACTGTTTAAAGGAGGAGTCCGCGTCAAGGAAAAGAAGCTGAACGATTTTCTTTACGATGGATTGGACGGCAGGGGCGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTGCGTACGTTATTGGGAGTCAATCATTCCTGTTGGACAGACCCACCAAAACCGTATCAACATACAGGGATAACCCCAGGATTGAGGATGTTGTAAACATTTTTTTTTTCCGTGGGGTTAAAGGGTATTGTATCTACGATGCGACATTGGCATGTCGTCAACCGTCTGCTGGTTTGCCTTGCAAGGGATGGGGCATGATTGTGGTGACACCACCAGACAAAAACGAATATGAACGGTGGACAAAAAAAATGGACGCTACTGCAATCGTAACGAATTGTCCCGAAGAAAACGATGTGAGGGCAATGTGCATTTGGATGAAGCGCAATCGACCCCTGCAGGAGCAAGCGGAATACTGGAAGGAGGTGAGGGGTCGCATGAATAACGTGGGACCAATTCTCCGCTCCATCTTTGATAAACAGGCATATGATGACCGCATTAAAGCGTGTCAGCAAGCCGTGGATGGGATGAACGCTTCGGAATTAAAGCGTAATTTGGGTATTGGCTGTTGTTATTCGTCCAATGACAATGACTTGTCTTGAAAGCTTCTGAAGGTTGTCCGAGTACGACGAGGAAACAACATTGAATCGCCTCTGAATTTGCTGGTATCTCCCCACCTTGAACGTGAAACTTTGTCCAGGTTGGAGAATGAAATGAAGCAGTCCGATTTTATTTTTTTTGTTTTGAGGTTCTGGGATTATGTCCCACCATATCTTATTGAAAAGTATGCCGTATCCGCATTTTTGAATGAGGATTTCCTGCGTGCGATAAGAATTAAAATCAAGGAACTGAGGCCACCAGGACGACGTGAGCCACACAGCTGTGCGCTGAAAGAGCACTCAGACACGAGCTTCACCAGAAAAGAGGTTCTACCGCCACCGGAACGTCTTTCCAATCCGGTTGCTATGGACCACTGGGTGCTGTATGAACCGAAGGTCCAACACTTTCCGCTGGTGGACGGCTTTTTCTTTGTGGACTCAAATCCAATGACGCTGGTTGGGCTGCGGATGACTACGGCGGGTGAGCACCGCACCACAACCAGCACTGTGAGGCAGTTCACTGAGTGCCTGGCGGCATATTTTAATGGTTGGGAGGAGTTATCCCGAGACATGTCGTGGGAGATTATTAATGTGCAGCACGCAGACANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
## TcCLB.401473.9 ATTCGGGGACTTATTCAGGATCATATTGCGGCTGGTGTCCTGTTAACATTGCGTGACAAGTTTCTTGAGCATGCCACCTTTGTTCAACTTACTTACTATGGGCTGGCACCCTACTTGCGCCAGCAACACGAAATCACACTTTCTGAGCTTATTCCTCTGCCGGCGATCTTATGGCCCCGGCCACTCTGGACAGGGAAACANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTTATACCGGGCCGTATGATGGTGCCATTTCCCCGCAACCATTTGCTCATGATGACCGCCTCAGGGGCAAAGGGAAGTAACGCTAATGCGACGCAGATGGCACTGGGACTCGGTCAGCAGCTTTTTGATGGACGACGCGTGAAACGAATGAATTCGGGAAAGACGCTGCCTGCCTTTTTTGCTCATGAACGCCGTGCCCGTTCTTTTGGCTACGCGATTGGACGCTTCACATCCGGTATCCGGCCACCAGAGTACACGATCCATGCCATGGCCGGTCGCGATGGTCTCATCGACACAGCTGTCAAAACTTCCCGCTCTGGCCATTTGCAGCGTTGCCTTATTAAAGGCCTTGAGAGTCTTGTGGTGCATTGGGACCACTCTGTTCGTGACTCAAACGGCAGCATCGTGCAGTTTACATACGGTGGCGATGGTCTTGACCCGTGCAAGGCCTCAACCCTTACGTCGTGGGAAACGTCGAAGGAAAACCTCGTGGATTTCGGAAAACGATTTGGAGTGGACACGGGTGAGGCGACCAGCGAGACGCGACGGCCAGAAAACTGGGAACAAGGAGTGAGGAACAGCAGCGTGAAGCGCGGTAAACGGCCACGCACAAGCATGAATAACGATAAGAAGAATAATACTAATAAAAATAATGACAACGACGACGAAGAGGAGGATGGCGACGATGAAAATAAAAGGAGCAGTAATTGTAATCACAATGAGAATGCACGGCAGCGGCACAAAGAGCAACAGTTACGTGAGAATCCACTCCCGCGGCACATGCATGACGGCCTTGAGGACTATTTACGCACCAAAGCAACATTTCCACTCTTCCAGCGGGTCTCACAGTTGGCACGCTGGAAGGCGCAGGGACAAGTGCAGGAGAAACTTGCAGAGAAGCGTAGAGAAAGTATTGCATATTATCGTGATGTTCTCTCGGAACTTGCCACAAGCCGACGTGTAAAGGCTTTCTGTGACCCCGGGGAACCTGTTGGTCTTCTTGCGGCACAGGCCGCCGGAGAGCCGTCTACGCAAATGACACTTAACACATTCCACAGTGCCGGTTCCACGGTGACCCACGTGACGGAAGGTATCCCACGTCTCCGTGAGCTGCTTATTCACGCCTCTGTTCAGAAAGCTGCAGTTATTGTGCCCGTGGAGAAGGCGACGCCGGTGGATGAATCAGCCATTGCACGCATACTACATGCGGGTGTGGCGATACGGCTGCGGGATTGTCTTGCGCGGGGCGGGACAAGTGGAAAAGGATATCATTACCACGTGGCACGACGTAAGGATGCGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTCAGGGCGCCAACGCGACGATGCGTCATGTCTTGTCGTTTCTTTCTCTTTTTACTTTGGGGACGCGGGCGATTAAATTGCGGCAGGCCCGCTCCACAGACATTCGTGACATGGCGAACTGGTTTGGTGTCGAGTCGGCATACCGCACCCTTTACGACGAGCTGTCCAAATTATTCAAGCGGTACTCCGTTGACCATCGC
## TcCLB.401569.10 TGCTGTGTGGAGCAGGAAGAATGCACCGGATACTACACATGGGGTGAGTTTTTTCTTGGCCGTGGGGATCGCGTGGCGCCGTTTGCCGATTTTTGTTTTTACGTTCTCATCTCCACATTTGCTGCGATGATAGCATCTTTTTTTTGTAAGGTGTATGCACCGTATGCAGCTGGTGGCGGCATCAATGAGGTGAAGACAATNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAAGAGGTACCACGTGGTTCTTACGATGGCGAATAAAATTGGCTCCGTGTACATTGATGGAGAACCTCTGGAGGGTTCAGGGCAGACCGTTGTGCCAGACGAGAGGACGCCTGACATCTCCCACTTCTACGTTGGCGGGTATAAAAGGAGTGATATGCCAACCATAAGCCACGTGACGGTGAATAATGTTCTTCTTTACAACCGTCAGCTGAATGCCGAGGAGATCAGGACCTTGTTCTTGAGCCAGGACCTGATTGGCACGGAAGCACACATGGGCAGCAGCAGCGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCGACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAGCTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCGACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCGACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAACGGTGCCCACAGTACGCCCTCAGCTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCAACTCCCGCTGACAGCAGTGCCCACAGTACGCCCTCGACTCCCGCTGACAACGGTGCCCACAGTACGCCCTCAACTCCCGGTGACAGCAGTGCCCACAGTACACCCTCAACTCCCGCTGACAACGGTGCCCACAGTACGCCCTCAGCTCCCGGTGACAACGGTGCCCACAGTACGCCCTTGACTCCCGGTGACAACGGTGCCCACAGTACGCCCTCAGCTCCCGGTGACAACGGTGCCCACAGTACGCCCTTGACTCCCGGTGACAACGGTGCCCACAGTACGCCCTCAACTCCCGGTGACAACGGTGCCCACAGTACGCCCTCAGCTCCCGCTGACAACGGTGCCCACAGTACGCCCTCGACTCCCGCTGGCCACGGTGCCAATGGTACGGTTTTGATTTTGCACGATGGCGCTGCATTTTCGGCCTTTTCGGGCGGAGGGCTTCTTTTGTGTGCGGGTGCTTTGCTGCTGCACGTGTTCGTTATGGCAGTTTTTTTCTGATGTAGTGAGAGAGTCTCCTGACAAATGTAGATAAATTCATAATTGTGCTGTGAACCGTTTGGGTAAATGTGTGTGTGCGCTCTCATAACAAGGAAATGATTTCCAGTAATGTTTTTGTTTTTTGTTCTCGAACTTTTTGAACAAATCTGCGGACAGACGGTGATGAGTAATTTGAATTTGTTTTTCAGCGTGTTTTTGTCACTGACCCTTTGTTTAAGTGGAGACCGCGTTGGAATGCGGTGAGGGCATTTCTCTGTTTTGTTTTTCCCCTTTTTTTTTTTCCTTTGTGTTTCTTCAATT
## TcCLB.401661.10 CGGCGGTGTTCCGCGACGCTGTTGGCGTGTTGGTTGTTGGGGGCGTGGCGCTGTCGTCGCGTGGTGCGCTGTACGTGGACGGGCTGTTGGTGCAGACGGCGCTGGGGCTGTGCGTGTCGGTGGAGGGCGGTGTTGCGGCCAGCGGCGGCTCCGTGGTGGCGTTTTTTGACAGAGACTTCCTGCTGTGCAAGCACGCGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGCAGGGTGGCTTTGTCAGCGCGACGATCGATGGACAGAACGTCGTCCATTTCAGCCAGCCGGTGTATTCCTGGAAGGAGGGAGAAGAAGCGGGTCGAGTAGACTTGCGGCTGACGGACATGCAGCGAATTTATGATGTTGGGCCGGTATCCGCTGAAAATGAGAAGTTTGCCGCCAGCACTCTGCTGTACGCCACAGACGAAGTGCGATCATCGTTGGTGGAGAAAGAGTGGATAAAACTGTACTGCTCGCACGAGGTCGCTGCTGCGGATGACGAATGCAACATTGCTTTTGTGGACTTGACGGAGAAGTTGAAGGGCGTGAAGAGGGTGTTGGTTGCCTGGAAGGAGAAGGACGCGTAGGTTGCGAATGAATACCGCTGCGTGGGTGAAAAGAGCCAGAAGCGCCGTGACTGTAATGGTTCTGCCCCCACTGAAGGGCTGTTTGGCTTTTTATCCAACACGTCAACTGACAGCACGTGGGCCGACGAGTACCTCTGCGTGAATGCAGCAGTGAACAATGGGAGGTCAGGGGTTGATGGAGGGATGACGTTCAAAGGGTCTGGAGCGGGGAGTTTAGTGGCCTGTTGGCAAGCTGGGGCAGAATGTGCCGTACTACTTCGCAAACAACAAGTTCGGCCTTTTTTGGCGACGGTGACCATCCATGAGGAGCCGCATAATGGCCCTGTTTTTTTGGTGGGTGTGGGGATGAATGACACTGACAGCACCGTGCTTTTGGGCTGTTCTACACGAGTGGAAGGAAGTGGGAGGCCACCTTCAACGGTGAGACTCAGAATTTGTCAGGGGATCCCGACTTAGTGCAGGGCAAAACACATCAGTTGGCACTGCAATAGGATGATGCGGGGCTGATTGTGTACGTGGATGGATCGACGATATGCGACGGAGAACTGGATTATGAGGAGCATGAGGACTATGAGAGCTTTTCCAAGTGCTAATAACGCTGTTGAGGCCCCCCATCGTTTCACACTTCTGCGTTGGCGGCGGCGGCAAGAGTGCTCGGTAACGCTCATGTGACGGTGAGCAACGTCCTTTTGCACAACCGCGTGTTGAAGGGTGACGAGCTCCAAGCGCTAATGAAGACGAAGCCGGATGCTTCAGAGGCGAGGGTGCCGGCCCCGAAAGGTGCGCCTCACAACAATCATGCGAGTGAAACTTTCCCTCAATCCGCCAGTGGACTTGTCGTCGTGGATGAAGCGCGGCAGGAAGACACATCAGCACCACAACGTCAACACTCACCAGCGCAACCATCAGGAAATAGGAAGGGCTCAGCAGTTCCCAGGCAAACATCTTCTTCTGATGCCATTGGCCCGTCCACCTCAGCTGATACGGGGAAATGGAAGAAGAGACACCCAGCAGTGGTGTATTGGCGCCTGCATTGTCTTCGACACCGAGCGTGGTCAGTCGTCAGGAAGTACTTGAAAGCAAGATACCTGTCAGTGGGGGTCGCCCTGAGGGTGGCCGGGAGCACTTGCCCTCCAACGCGGCAGCATTGATGATGGGGCAGGCGGGCAAGGCGAGTGAAGGCTCTTCGCATAACGGATACACCGACGGTTGGCCCCAGCGCAGCATTTCACGTGATTTTTTTTTTTGAGAGTGTGCACAAAGAGCCGTCCACCGCCAACACGCTCGCCGGCGACAAACAACACAACGACCCTGAAGGGGA
AACGCATGCCTGCACTGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTGCCGCCTGCGAGGGCCGGCGGCGGTGGTGGCGATACTGCGGAGGGCTGCGTGAGTGGCGTGACGCTGACGGAGTCGGTGACGGTTGGCGGCCGGCGG
## TcCLB.403789.9 CGTCATACCCGACGTCGTGGACGGTGCGCTGGAGCAGCAAACGCTGCCTTCGTGGCTGCCACAGTTCGATTCTGTGAACTTCACACGAAATGCCGATGACGCGACGAGTGGCGAACTTCTTTTTCAGGGCGCCAACGCGACGATGCGTCATGTCTTGTCGTTTCTTTCTCTTTTTACTTTGGGGACGCGGGCAGGATTANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCCCAGATCCACAGACACGTGCTCTCTCTAGTGTCCTTGGATTTGTGGAGCAAATTGAGTGTTACCAGGTACTGCAGAACAACTCCACACCGGACCGAAACCTCCTCTCCACCGCGCAAGAAATTGCAAATGAGATCAACCTACGTAACCTGCAGGTGAAGGTGAATGAGGTGTACCAAGAAATCATGGGGACATTTGCTAAGAAGGAAGGTTTGTTCCGTATGAATATGATGGGGAAGCGTGTGAATCAGGCCTGTCGCTCCGTCATCTCCCCGGATTTAATGGTGGAGCCGAACGAAGTCTTGCTGCCCCGACCATTTGCCCGAAATTTGTCTTTTCCTGAACAAGTGACATTTTACGCCTCTGCCCGCATGAATCTTTTGAAGCGTTGCGTCATCAACGGGCCGCGTCGCTACCCCGGTGCCACTCATCTTGAGATTCGACAAACGAATGGAGAAATTCGATTTATTGAGCTTGACGTGCCTGAGCAAACACGGCGGCAACACGCTGCCAAGTACTTTGCCATGGCGCAAAGTGGCGTCACACTGATTGTGTACCGTCATATCTTGGATGGCGACCGGGTGGTGTTTAACCGTCAACCGACATTGCACAAGCCAAGTATGATGGGGTACCAAGTCAAGGTGCTTTCTGGACACAAGACAATTCGCTTTCATTACGTGAATGGCAACTCCTTTAATGCCGATTTTGACGGCGATGAGATGAATATTCACGTCCCGCAGAGCCTGGAGGCGAAGGTGGAATTAGACGTCCTGATGGATGCCAATCTGAACTACCTCGTCCCCACCTCCGGGAAACCGATTCGGGGACTTATTCAGGATCATATTGCGGCTGGTGTCCTGTTAACATTGCGTGACAAGTTTCTTGAGCATGCCACCTTTGTTCAACTTACTTACTATGGGCTGGCACCCTACTTGCGCCAGCAACACGAAATCACACTTTCTGAGCTTATTCCTCTGCCGGCGATCTTATGGCCCCGGCCACTCTGGACAGGGAAACANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTTATACCGGGCCGTATGATGGTGCCATTTCCCCGCAACCATTTGCTCATGATGACCGCCTCAGGGGCAAAGGGAAGTAACGCTAATGCGACGCAGATGGCACTGGGACTCGGTCAGCAGCTTTTTGATGGACGACGCGTGAAACGAATGAATTCGGGAAAGACGCTGCCTGCCTTTTTTGCTCATGAACGCCGTGCCCG
## gid annot_gene_name gene_type chromosome
## TcCLB.506551.10 TcCLB.506551.10 protein coding TcChr27-S
## start end strand
## TcCLB.506551.10 202017 203870 -
## annot_gene_product
## TcCLB.506551.10 protein associated with differentiation 8, putative
## annot_gene_type length chr_length low_boundary high_boundary
## TcCLB.506551.10 protein coding 1853 850241 201717 204170
## fivep
## TcCLB.506551.10 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTGGGTTTTAAGGGAGGGAAGCACACACGCGTGTCTATCCATATCTATATAACTATATATCTTAGACACAGATATAATAGGGGCCACTCTGTCTGCAACCATTATATCAATTGGAAGAGCACTCATCCAATCGGCTTGGCGTGAACCA
## threep
## TcCLB.506551.10 ATTTCTTTTTGTATTGCCACGTCGCTAATTTATTTATTTTTGATGATATGTTTGATTTATTTATATTTCATCGTATGCTGGTGGGTTGCGTGTGTGTTTTGACTTTTTCATTGTTGATTCTTTTTTTTTTTTTTTTTGGCGGGCGTCACTTTTTTCCGCGCGTGAGTCTCCGGGGACGGTGACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAGCGAAGGATGTGGTGATAATAAGGTAACTTATTTCATTCTGTTTTTGAGTGTTTTTCTTACTGTTTTTCCTTTACTTTTCTCCCT
## cds
## TcCLB.506551.10 ATGACCGGCGAACAATTTGTTGAAGTTCTCGCTACGGGGCCACAGAAGGCAATCTATGAGCCCCGTCGCTTTGCGATTTTGCTGCTTGGATCCTATGGCTGTATCTGCTCATCTCTGAGCTACGCCTTCAATCTCATTGCGCCGGAGATGCAGTCGCGTTACGACCTTACTGGGCGCGATATCTCGACCATCAGCACGGTGGGTTTGGTGGTTGGCTACTTTCTTATGCCATATGGCTTCATTTTTGATCACTTTGGTCCCAAACCGATTTTCATACTGAGCATGGTGCTGTTTCCTCTCGGGGCGCTGCTGTTTGCGCTGTCATTTCGCGGGACAATTGAGGGCTCTGTGGTGCGTCTAAGCTTTTTCAACGCCATTCTGACACTCGGATGCACGCTGTATGACGTAGTATACATGATGACGATCATGAGCCATTTCCCGATCAGCAGGGGCCCTGTCGTGGCCATTTTGAAGTCGTACATCGGACTGGGCTCCGCCATTGTGGGAAGCATCCAGCTGGCCTTTTTTGACGGGAGGCCGGACCACTACTTCTATTTTCTGATGGTGCTGTTTTTTGTGACTGGAGCTGCGGGTTTCTTCCTTGTGCCACTCCCGTCGTACCACCTGACTGGCTATGAGGAGAAACACCTTGGCATCGAGGAAAAGGAGAGACGACTGGCACGCAAATCCGTTTACCTCCGCCAGCAACCACCCACAATTCGCTTCGCGATCGGCATTGCGTTTGTTGTCCTGCTGGTTATATACTTGCCACTGCAGAGCGCACTGGTTGCGTATCTGGGGTGGGGGAGGACGCAGCGCATCATATTTGCGTCCATCTTGATTGCTGTCCTTGTGGCACTTCCGTTGATGGCATTGCCCGTTTCGTGCCTTGAGAGGAGGGAGACACAACGGGAGGAGGATGACTGCGGTGGGACGGAGAGACCGAGTGCGGGTGATGAGGTGGCGAAAGAGCCTGCGGCGGCTGGTGGTCCTCCGAAGAAGGTGGAGACGGACGTCGACTACATTGCGCCGCAGTACCAGACGACCTTTCTCCAGAACCTGAAAACGCTGAAGTTGTGGGCGCTTCTCTGGTGCTTTTTTACCTTGGGGGGCGCCGGGTTTGTGATCATCTACAACGCCAGCTTTGTCTACGCCGCGCTTGCTGACGAAGAGGTGGACAACGCCATCAAAACGCTTCTCACGGTGCTGAACGGGGTGGGAAGTGCGGCGGGTCGGCTACTGATGAGCTACTTCGAGGTCTGGTCGCAGAAACGCAAGGCCGAGGACAGGGTTTCGATTATCGTTTCTATCTATTTGGCTGATGTGTTCGTGATCCTGTCGCTGGTTTTATTCCTCGTGGTGCCCAGGGCTGCACTGCCGCTGCCGTACGTATTGGCTGCCCTTGGCAACGGTTTTGGCGCAGCCTCCCTTGTGTTGGTTTCTCGGACTGTTTTTGCAAAGGACCCCGCCAAGCACTACAACTTCTGCTTCCTTGCATCACTGTTTTCTACAATCTTCCTGAACCGCCTTCTGTACGGCGAGTGGTACACGCGGGAGGCTGAGAAGCAGGGCGGCAATGTTTGCCTTGGCCGGAATTGTGTGATGATGCCGCTGATATTTTTGATTGTTCTCAGCTTCACCGCGTTTCTTTCTACTGCCTATTTTGACTGGGAGTACCGCCGATTCAGTCGATTGGTGCTTGAGGAGCGGTGCCGTCTGAAGGAGAGGGCAGGGGAAGGGCTATTGGCGGTGGAGTCTCCCCCGCTTGTTGCAGCGGAGCGACAGCAAGAGGAAGAGGATGCCGGCAACCGAACAACGACGCCGGCCAACGACAGGAAGGTGGCACGTCCGTAA
## all
## TcCLB.506551.10 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTGGGTTTTAAGGGAGGGAAGCACACACGCGTGTCTATCCATATCTATATAACTATATATCTTAGACACAGATATAATAGGGGCCACTCTGTCTGCAACCATTATATCAATTGGAAGAGCACTCATCCAATCGGCTTGGCGTGAACCATGACCGGCGAACAATTTGTTGAAGTTCTCGCTACGGGGCCACAGAAGGCAATCTATGAGCCCCGTCGCTTTGCGATTTTGCTGCTTGGATCCTATGGCTGTATCTGCTCATCTCTGAGCTACGCCTTCAATCTCATTGCGCCGGAGATGCAGTCGCGTTACGACCTTACTGGGCGCGATATCTCGACCATCAGCACGGTGGGTTTGGTGGTTGGCTACTTTCTTATGCCATATGGCTTCATTTTTGATCACTTTGGTCCCAAACCGATTTTCATACTGAGCATGGTGCTGTTTCCTCTCGGGGCGCTGCTGTTTGCGCTGTCATTTCGCGGGACAATTGAGGGCTCTGTGGTGCGTCTAAGCTTTTTCAACGCCATTCTGACACTCGGATGCACGCTGTATGACGTAGTATACATGATGACGATCATGAGCCATTTCCCGATCAGCAGGGGCCCTGTCGTGGCCATTTTGAAGTCGTACATCGGACTGGGCTCCGCCATTGTGGGAAGCATCCAGCTGGCCTTTTTTGACGGGAGGCCGGACCACTACTTCTATTTTCTGATGGTGCTGTTTTTTGTGACTGGAGCTGCGGGTTTCTTCCTTGTGCCACTCCCGTCGTACCACCTGACTGGCTATGAGGAGAAACACCTTGGCATCGAGGAAAAGGAGAGACGACTGGCACGCAAATCCGTTTACCTCCGCCAGCAACCACCCACAATTCGCTTCGCGATCGGCATTGCGTTTGTTGTCCTGCTGGTTATATACTTGCCACTGCAGAGCGCACTGGTTGCGTATCTGGGGTGGGGGAGGACGCAGCGCATCATATTTGCGTCCATCTTGATTGCTGTCCTTGTGGCACTTCCGTTGATGGCATTGCCCGTTTCGTGCCTTGAGAGGAGGGAGACACAACGGGAGGAGGATGACTGCGGTGGGACGGAGAGACCGAGTGCGGGTGATGAGGTGGCGAAAGAGCCTGCGGCGGCTGGTGGTCCTCCGAAGAAGGTGGAGACGGACGTCGACTACATTGCGCCGCAGTACCAGACGACCTTTCTCCAGAACCTGAAAACGCTGAAGTTGTGGGCGCTTCTCTGGTGCTTTTTTACCTTGGGGGGCGCCGGGTTTGTGATCATCTACAACGCCAGCTTTGTCTACGCCGCGCTTGCTGACGAAGAGGTGGACAACGCCATCAAAACGCTTCTCACGGTGCTGAACGGGGTGGGAAGTGCGGCGGGTCGGCTACTGATGAGCTACTTCGAGGTCTGGTCGCAGAAACGCAAGGCCGAGGACAGGGTTTCGATTATCGTTTCTATCTATTTGGCTGATGTGTTCGTGATCCTGTCGCTGGTTTTATTCCTCGTGGTGCCCAGGGCTGCACTGCCGCTGCCGTACGTATTGGCTGCCCTTGGCAACGGTTTTGGCGCAGCCTCCCTTGTGTTGGTTTCTCGGACTGTTTTTGCAAAGGACCCCGCCAAGCACTACAACTTCTGCTTCCTTGCATCACTGTTTTCTACAATCTTCCTGAACCGCCTTCTGTACGGCGAGTGGTACACGCGGGAGGCTGAGAAGCAGGGCGGCAATGTTTGCCTTGGCCGGAATTGTGTGATGATGCCGCTGATATTTTTGATTGTTCTCAGCTTCACCGCGTTTCTTTCTACTGCCTATTTTGACTGGGAGTACCGCC
GATTCAGTCGATTGGTGCTTGAGGAGCGGTGCCGTCTGAAGGAGAGGGCAGGGGAAGGGCTATTGGCGGTGGAGTCTCCCCCGCTTGTTGCAGCGGAGCGACAGCAAGAGGAAGAGGATGCCGGCAACCGAACAACGACGCCGGCCAACGACAGGAAGGTGGCACGTCCGTAATTTCTTTTTGTATTGCCACGTCGCTAATTTATTTATTTTTGATGATATGTTTGATTTATTTATATTTCATCGTATGCTGGTGGGTTGCGTGTGTGTTTTGACTTTTTCATTGTTGATTCTTTTTTTTTTTTTTTTTGGCGGGCGTCACTTTTTTCCGCGCGTGAGTCTCCGGGGACGGTGACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAGCGAAGGATGTGGTGATAATAAGGTAACTTATTTCATTCTGTTTTTGAGTGTTTTTCTTACTGTTTTTCCTTTACTTTTCTCCCT
## Found: Trypanosoma cruzi CL Brener Non-Esmeraldo-like
## Unable to find CDSNAME, setting it to ANNOT_GENE_NAME.
## Unable to find CDSCHROM in the db, removing it.
## Unable to find CDSSTRAND in the db, removing it.
## Unable to find CDSSTART in the db, removing it.
## Unable to find CDSEND in the db, removing it.
## Extracted all gene ids.
## Attempting to select: ANNOT_GENE_NAME, GENE_TYPE, ANNOT_GENE_LOCATION_TEXT, ANNOT_GENE_NAME, ANNOT_GENE_PRODUCT, ANNOT_GENE_TYPE
## 'select()' returned 1:1 mapping between keys and columns
## Found 7 genes which are less than 300 nt. from the beginning of the chromosome.
## Found 9 genes which are less than 300 nt. from the end of the chromosome.
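The padding step itself can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the function that produced the output above: `revcomp()` and `padded_flanks()` are hypothetical names, coordinates are assumed to be 1-based and inclusive on the forward strand, and sequences are assumed to be uppercase A/C/G/T/N. The boundaries are clamped to the chromosome, which is the situation the "less than 300 nt. from the beginning/end of the chromosome" messages describe.

```r
# Reverse-complement an uppercase DNA string (A/C/G/T/N only).
revcomp <- function(x) {
  paste(rev(strsplit(chartr("ACGTN", "TGCAN", x), "")[[1]]), collapse = "")
}

# Hypothetical helper: take up to `padding` nt on each side of a CDS.
# `start` and `end` are assumed to be 1-based, inclusive, forward-strand
# coordinates within `chrom_seq`.
padded_flanks <- function(chrom_seq, start, end, strand = "+", padding = 300) {
  chr_length <- nchar(chrom_seq)
  # Clamp the padded window so it never runs off the chromosome.
  low_boundary <- max(1, start - padding)
  high_boundary <- min(chr_length, end + padding)
  upstream <- substr(chrom_seq, low_boundary, start - 1)
  cds <- substr(chrom_seq, start, end)
  downstream <- substr(chrom_seq, end + 1, high_boundary)
  if (strand == "-") {
    # On the minus strand the 5' flank lies downstream on the forward strand.
    list(fivep = revcomp(downstream), cds = revcomp(cds), threep = revcomp(upstream))
  } else {
    list(fivep = upstream, cds = cds, threep = downstream)
  }
}

# Toy chromosome; real input would be a full chromosome sequence.
toy_chrom <- "AAACCCGGGTTT"
plus_gene <- padded_flanks(toy_chrom, start = 4, end = 6, strand = "+", padding = 2)
minus_gene <- padded_flanks(toy_chrom, start = 4, end = 6, strand = "-", padding = 2)
```

For a minus-strand gene such as TcCLB.506551.10, the flanks are swapped and reverse-complemented so that `fivep` and `threep` read in the direction of the transcript.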
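# Strip the leading run of Ns from the 5' flank of the gene of interest.
fivep <- gsub(pattern="^(N+)", replacement="", x=tc_utr["TcCLB.506551.10", "fivep"])
# Take the 3' flank as-is.
threep <- tc_utr["TcCLB.506551.10", "threep"]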
It seems reasonable to assume that the actual 3’ UTR is everything before the run of Ts starting at around position 100, provided that run really is the polypyrimidine tract.
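One crude way to put a number on "the run of Ts" is to look for the longest homopolymer T run and cut the sequence just before it. The sketch below does that in base R; `longest_run()` is a hypothetical helper, and a single run of Ts is only a stand-in for a polypyrimidine tract, which may also contain Cs. The toy sequence is made up for illustration; the `threep` string defined above could be passed in its place.

```r
# Hypothetical helper: start (1-based) and length of the longest run of
# `base` in `seq`. Returns NULL if `base` never occurs.
longest_run <- function(seq, base = "T") {
  runs <- rle(strsplit(seq, "")[[1]])
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  is_base <- runs$values == base
  if (!any(is_base)) {
    return(NULL)
  }
  # which.max() returns the first maximum, so ties go to the earliest run.
  best <- which(is_base)[which.max(runs$lengths[is_base])]
  list(start = starts[best], length = runs$lengths[best])
}

# Made-up example sequence, not taken from the gene above.
toy <- "ACGTTGCATTTTTTGGC"
run <- longest_run(toy)

# Everything before the run is the putative 3' UTR under this assumption.
putative_utr <- substr(toy, 1, run$start - 1)
```

A stricter version would score windows by pyrimidine content instead of requiring uninterrupted Ts, but the longest-run version is enough to check the position by eye.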
I am guessing that Fernanda got the number 12 by clicking the little OrthoMCL link provided by TriTrypDB. When I clicked it, the first thing I noticed was that the gene IDs for CL Brener are old-style. It will therefore probably be simpler for me to get them manually, since the old and new IDs sometimes do not match up well.
Now that I think about it, I have a little toy which is supposed to provide all the orthologs. This is a good opportunity to make sure that it actually works.
## Unable to find species names for 1 species.
## Plasmodium vivax like
## Found the following hits: Trypanosoma cruzi CL Brener Esmeraldo-like, Trypanosoma cruzi CL Brener Non-Esmeraldo-like, choosing the first.
## Using: Trypanosoma cruzi CL Brener Esmeraldo-like.
## Loaded: org.Tcruzi.CL.Brener.Esmeraldo.like.v46.eg.db
## Some columns were missing: ORTHOLOGS_GROUP_ID, ORTHOLOGS_COUNT
## Removing them, which may end badly.
## 'select()' returned 1:many mapping between keys and columns
## There are 52 possible species in this group.
## Found species: Blechomonas ayalai B08-376
## Found species: Bodo saltans strain Lake Konstanz
## Found species: Crithidia fasciculata strain Cf-Cl
## Found species: Endotrypanum monterogeii strain LV88
## Found species: Leishmania aethiopica L147
## Found species: Leishmania amazonensis MHOM/BR/71973/M2269
## Found species: Leishmania arabica strain LEM1108
## Found species: Leishmania braziliensis MHOM/BR/75/M2903
## Found species: Leishmania braziliensis MHOM/BR/75/M2904
## Found species: Leishmania braziliensis MHOM/BR/75/M2904 2019
## Found species: Leishmania donovani BPK282A1
## Found species: Leishmania donovani CL-SL
## Found species: Leishmania donovani strain LV9
## Found species: Leishmania enriettii strain LEM3045
## Found species: Leishmania gerbilli strain LEM452
## Found species: Leishmania infantum JPCM5
## Found species: Leishmania major strain Friedlin
## Found species: Leishmania major strain LV39c5
## Found species: Leishmania major strain SD 75.1
## Found species: Leishmania mexicana MHOM/GT/2001/U1103
## Found species: Leishmania panamensis MHOM/COL/81/L13
## Found species: Leishmania panamensis strain MHOM/PA/94/PSC-1
## Found species: Leishmania sp. MAR LEM2494
## Found species: Leishmania tarentolae Parrot-TarII
## Found species: Leishmania tropica L590
## Found species: Leishmania turanica strain LEM423
## Found species: Leptomonas pyrrhocoris H10
## Found species: Leptomonas seymouri ATCC 30220
## Found species: Paratrypanosoma confusum CUL13
## Found species: Trypanosoma brucei brucei TREU927
## Found species: Trypanosoma brucei gambiense DAL972
## Found species: Trypanosoma brucei Lister strain 427
## Found species: Trypanosoma brucei Lister strain 427 2018
## Found species: Trypanosoma congolense IL3000
## Found species: Trypanosoma congolense IL3000 2019
## Found species: Trypanosoma cruzi Brazil A4
## Found species: Trypanosoma cruzi CL Brener Esmeraldo-like
## Found species: Trypanosoma cruzi CL Brener Non-Esmeraldo-like
## Found species: Trypanosoma cruzi Dm28c 2014
## Found species: Trypanosoma cruzi Dm28c 2017
## Found species: Trypanosoma cruzi Dm28c 2018
## Found species: Trypanosoma cruzi marinkellei strain B7
## Found species: Trypanosoma cruzi strain CL Brener
## Found species: Trypanosoma cruzi Sylvio X10/1
## Found species: Trypanosoma cruzi Sylvio X10/1-2012
## Found species: Trypanosoma cruzi TCC
## Found species: Trypanosoma cruzi Y C6
## Found species: Trypanosoma evansi strain STIB 805
## Found species: Trypanosoma grayi ANR4
## Found species: Trypanosoma rangeli SC58
## Found species: Trypanosoma theileri isolate Edinburgh
## Found species: Trypanosoma vivax Y486
# Pull every ortholog row for the gene of interest.
gene_idx <- tc_orthos[["GID"]] == "TcCLB.506551.10"
chosen <- tc_orthos[gene_idx, ]
# Split the orthologs by CL Brener haplotype.
esmer_gene_idx <- chosen[["ORTHOLOGS_ORGANISM"]] == "Trypanosoma cruzi CL Brener Esmeraldo-like"
nonesmer_gene_idx <- chosen[["ORTHOLOGS_ORGANISM"]] == "Trypanosoma cruzi CL Brener Non-Esmeraldo-like"
esmer_chosen <- chosen[esmer_gene_idx, ]
esmer_chosen_genes <- esmer_chosen[["ORTHOLOGS_GID"]]
non_chosen <- chosen[nonesmer_gene_idx, ]
non_chosen_genes <- non_chosen[["ORTHOLOGS_GID"]]
# Keep only the UTR rows which belong to those orthologs.
esmer_utr_idx <- rownames(tc_utr) %in% esmer_chosen_genes
esmer_utrs <- tc_utr[esmer_utr_idx, ]
non_utr_idx <- rownames(nonesmer_utr) %in% non_chosen_genes
nonesmer_utrs <- nonesmer_utr[non_utr_idx, ]
# Write the 5' and 3' regions of each haplotype to separate csv files.
e5p <- esmer_utrs[, c("gid", "fivep")]
readr::write_csv(x=e5p, file="esmer_5p.csv")
n5p <- nonesmer_utrs[, c("gid", "fivep")]
readr::write_csv(x=n5p, file="nonesmer_5p.csv")
e3p <- esmer_utrs[, c("gid", "threep")]
readr::write_csv(x=e3p, file="esmer_3p.csv")
n3p <- nonesmer_utrs[, c("gid", "threep")]
readr::write_csv(x=n3p, file="nonesmer_3p.csv")
Now I use a handy macro in my editor to convert the csv files to fasta.
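For anyone without such a macro, the same conversion can be done directly in R. This is only a sketch: `write_utr_fasta()` is a hypothetical helper of my own, and it assumes a data frame shaped like the ones above, with an ID column and a sequence column. The toy data are made up.

```r
# Hypothetical helper: write one FASTA record per row of a data frame.
write_utr_fasta <- function(df, seq_column, file, id_column="gid") {
  ids <- as.character(df[[id_column]])
  seqs <- as.character(df[[seq_column]])
  # Skip rows which have no sequence.
  keep <- !is.na(seqs) & nchar(seqs) > 0
  records <- paste0(">", ids[keep], "\n", seqs[keep])
  writeLines(records, con=file)
  invisible(sum(keep))
}

# Toy data, not real UTRs.
toy_utrs <- data.frame(
  gid=c("geneA", "geneB"),
  threep=c("ACGTTT", "GGCATT"),
  stringsAsFactors=FALSE)
toy_fasta <- tempfile(fileext=".fasta")
write_utr_fasta(toy_utrs, seq_column="threep", file=toy_fasta)
toy_lines <- readLines(toy_fasta)
```

Something like `write_utr_fasta(rbind(e3p, n3p), "threep", "threeps.fasta")` would then produce a combined library file, assuming both data frames share the same columns.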
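Finally, I invoke an aligner (I chose ggsearch36 from the fasta36 package because I like it). It looks like there are no significant similarities for any of the 5’ UTRs, but a few for the 3’ UTRs: TcCLB.508799.270, TcCLB.509713.10, and TcCLB.510811.20 align to the query 3’ region at 94.4%, 91.5%, and 77.5% identity respectively.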
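The search can also be launched from within R instead of a shell. The wrapper below is a hypothetical sketch: `run_ggsearch()` is my own name, it assumes `ggsearch36` is on the PATH (on my machine it lives under /cbcb/sw), and it passes only the query and library files, exactly as in the command shown in the output below.

```r
# Hypothetical wrapper: run ggsearch36 on a query and a library FASTA file.
# Returns the program output as a character vector, or NULL if it cannot run.
run_ggsearch <- function(query, library, exe="ggsearch36") {
  exe_path <- Sys.which(exe)
  if (!nzchar(exe_path)) {
    message("Could not find ", exe, " on the PATH, skipping.")
    return(NULL)
  }
  if (!file.exists(query) || !file.exists(library)) {
    message("Missing an input file, skipping.")
    return(NULL)
  }
  system2(exe_path, args=c(shQuote(query), shQuote(library)), stdout=TRUE)
}

threep_hits <- run_ggsearch("query_threep.fasta", "threeps.fasta")
```

Capturing the output as a character vector makes it easy to write to a file or to grep for the score lines afterwards.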
# /cbcb/sw/RedHat-7-x86_64/common/local/fasta/36.3.8e/bin/ggsearch36 query_threep.fasta threeps.fasta
GGSEARCH performs a global/global database searches
 version 36.3.8e Sep, 2016(preload9)
Query: query_threep.fasta
  1>>>query_threep - 301 nt
Library: threeps.fasta
  3612 residues in 12 sequences
Statistics: (shuffled [500]) Unscaled normal statistics: mu= -188.4700 var=4262.2256 Ztrim: 0
 statistics sampled from 12 (12) to 500 sequences
Algorithm: Global/Global affine Needleman-Wunsch (SSE2, Michael Farrar 2010) (6.0 April 2007)
Parameters: +5/-4 matrix (5:-4), open/ext: -12/-4
 Scan time: 0.080

The best scores are:            n-w  bits  E(12)
TcCLB.508799.270 ( 301) [f]    1336  60.5  8.1e-120
TcCLB.509713.10  ( 301) [f]    1243  57.9  8.8e-106
TcCLB.510811.20  ( 301) [f]     853  46.8  1.6e-56
TcCLB.510069.20  ( 301) [f]     380  33.4  1.9e-17
TcCLB.507383.10  ( 301) [f]      70  24.6  0.00045
TcCLB.510811.10  ( 301) [f]      70  24.6  0.00045
TcCLB.509713.20  ( 301) [f]    -112  19.5  1.4

>>TcCLB.508799.270 (301 nt)
 n-w opt: 1336 Z-score: 283.5 bits: 60.5 E(12): 8.1e-120
global/global (N-W) score: 1336; 94.4% identity (94.4% similar) in 305 nt overlap (1-301:1-301)

query_ ATTTCTTTTTGTATTGCCACGTCGCTAATTTATTTATTTTTGATGATATGTTTGATTTAT
TcCLB. ATTT-TTTTTGTATTGCCGCGCCGTTATTTTATTTATTTTTGATGATATGTTTGATTTAT

query_ TTATATTTCATCGTATGCTGGTGGGTTGCGTGTGTGTTTTGACTTTTTCATTGTTGATTC
TcCLB. TTATATTTCATCGTATGCTGGTGGGTTGCGTGTGTGTTTTGGCTTTTTCATTGTTGATTC

query_ TTTTTTTTTTTTTTTTTGGCGGGCGTCACTTTTTTCCGCGCGTGAGTCTCCGGGGACGGT
TcCLB. TTTTTTTCTTTT---TTGGCGGGCGTCACTTTTTTCCGCGCGTGAGTCTCCGGGGACGCT

query_ GACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAGCGAAGGATGTGGTGATAATAAGG
TcCLB. GACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAGCGAAGGATGTGGTGATAATAAGG

query_ TAACTTATTTCATTCTGTTTTTGAGTGTTTTTCTTACTGTTTTTCCTTTACTTTTCTCCC
TcCLB. CAAATTATTTCATTCTGTTTTTGAGTGTTTTTCTTACTGTTTTTCCTTTACTTTTCTCCC

query_ T----
TcCLB. TCGGT

>>TcCLB.509713.10 (301 nt)
 n-w opt: 1243 Z-score: 269.3 bits: 57.9 E(12): 8.8e-106
global/global (N-W) score: 1243; 91.5% identity (91.5% similar) in 305 nt overlap (1-301:1-301)

query_ ATTTCTTTTTGTATTGCCACGTCGCTAATTTATTTATTTTTGATGATATGTTTGATTTAT
TcCLB. ATTT-TTTTTGTATTGCCGTGCCGTTATTTTATTTATTTTTGATTATATGTTTGATTTAT

query_ TTATATTTCATCGTATGCTGGTGGGTTGCGTGTGTGTTTTGACTTTTTCATTGTTGATTC
TcCLB. GTATATTTCATCGTATGCTGGTTGGTTGCGTGTGTGTTTTGGCTTTTTCATTGTTGATTC

query_ TTTTTTTTTTTTTTTTTGGCGGGCGTCACTTTTTTCCGCGCGTGAGTCTCCGGGGACGGT
TcCLB. TTTTTTTTTTCT---TTGGCGGGCGTCACTTTTTTCCGCGCGTGAGTCTCCAGGGACGGT

query_ GACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAGCGAAGGATGTGGTGATAATAAGG
TcCLB. GACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAACGAAGGATGTGGTGATAATAAGG

query_ TAACTTATTTCATTCTGTTTTTGAGTGTTTTTCTTACTGTTTTTCCTTTACTTTTCT-CC
TcCLB. TAACTTATTCCATTCTGTTTTTTAATGTTTTTCTTAGTGTTTTTTCTTTACTTTTTTTCC

query_ CT---
TcCLB. CTCGG

>>TcCLB.510811.20 (301 nt)
 n-w opt: 853 Z-score: 209.5 bits: 46.8 E(12): 1.6e-56
global/global (N-W) score: 853; 77.5% identity (77.5% similar) in 311 nt overlap (1-301:1-301)

query_ ATTTCTTTTTGTATTGCCACGTCGCTAATTTATTTATTTTTGATGATATGTTTGATTTAT
TcCLB. ATTT-TTCTTGTGTTGCTGCGCCGCTACTTTGTTCATTTTTGATGATGTGCTTTACACAC

query_ TTATATTT-CATCGTATGCTGGTGGGTTGCGTGTGTGTTTTGACTTTTTCATTGTTGATT
TcCLB. GTATTTTTTCGTTGTGTGCTAGCGGGTCGCGTGTGTGTTTTGGCTTTTTCATAGTTGATA

query_ CTTTTTTTTT--------TTTTTTTTGGCGGGCGTCACTTTTTTCCGCGCGTGAGTCTCC
TcCLB. TTTTTATTTTGTTTCAATTTTTTTTTGGCAGGCGTCACTTTTTTCTGCGCGTGACCCTCC

query_ GGGGACGGTGACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAGCGAAGGATGTGGTG
TcCLB. GGGGACGGTGGCGGAGCATTGCGGGGCGGTGTGCACGCATACCGAACGAAGGATGTGGTG

query_ ATAATAAGGTAACTTATTTCATTCTGTTTTTGA-GTGTTTTTCTTACTGTTTTTCCTTTA
TcCLB. ATAATAAGGCAACTTATTTAATCCTTTTTTTTTCGAATTTTTTTTTTTGTTT----CTTA

query_ CTTTTCTCCCT
TcCLB. TTGTT-----T

>>TcCLB.510069.20 (301 nt)
 n-w opt: 380 Z-score: 137.1 bits: 33.4 E(12): 1.9e-17
global/global (N-W) score: 380; 49.0% identity (49.0% similar) in 304 nt overlap (1-301:1-301)

query_ ATTTCTTTTTGTATTGCCACGTCGCTAATTTATTTATTTTTGATGATATGTTTGATTTAT
TcCLB. ATAT-TGCTTGTGTTGCTGCGCCGCTACTTTGTTCATTTTTGATGATTTGCTTTACACAC

query_ TTATATTT-CATCGTATGCTGGTGGGTTGCGTGTGTGTTTTGACTTTTTCATTGTTGATT
TcCLB. GTATATTTTCATTGTGTGCTGGCGGGTCGCGTGTGTGTTTTGGCTTTTTCATAGTTGATA

query_ CTTTTTTTTTT--TTTTTTTGGCGGGCGTCACTTTTTTCCGCGCGTGAGTCTCCGGGGAC
TcCLB. TTTTTATTTTTATTTTTTTTGGCGGGCGTCACTTTCTTCTGCGCGTGGCCNNNNNNNNNN

query_ GGTGACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAGCGAAGGATGTGGTGATAATA
TcCLB. NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

query_ AGGTAACTTATTTCATTCTGTTTTTGAGTGTTTTTCTTACTGTTTTTCCTTTACTTTTCT
TcCLB. NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNATTTAAGGATACCATACCTT--CGGCACC

query_ CCCT
TcCLB. CCCT

>>TcCLB.507383.10 (301 nt)
 n-w opt: 70 Z-score: 89.6 bits: 24.6 E(12): 0.00045
global/global (N-W) score: 70; 50.8% identity (50.8% similar) in 311 nt overlap (1-301:1-301)

query_ ATTTCTTTTTGTATTGCCACGTCGCTAATTTATTTATTTTTGATGATATGTTTGATTTAT
TcCLB. ATAT-TGCTTGTGTTGCTGCGCCGCTACTTTGTTCATTTTTGATGATGTGCTTTACACAC

query_ TTATATTT-CATCGTATGCTGGTGGG---TTGCGTGTGTGTTTTGACTTTTTCATTGTTG
TcCLB. GTATATTTTCATTGTGTGCTAGCGGGCTTTTAGGAGCTTGTGAAGAGGGTGTTCGCGAGG

query_ ATTCTTTTTTTTTTTTTTTTTGGCGGGCGTCACTTTTTTC---CGCGCGTGAGTCTCCGG
TcCLB. AAAGCTAGCTATCTAACTT-----GATAAACATTTTAATAAAACGAATATGTATATTTTT

query_ GGACGGTGACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAGCGAAGGATGTGGTGAT
TcCLB. CGATTCTTCCTTATTAATTGTTGGAGGATCGCGGGTTGAG-GAGTCAAGAAACCGAGAAC

query_ AATAAG--GTAACTTATTTCATTCTGTTTTTGAGTGTTTTTCTTACTGTTTTTCCTTTAC
TcCLB. TTCAAGACGGAATTTTTTTATTTTTATTTTTA---GTTTTCATTACTTCACTCACCCTCT

query_ TTTTCTCCCT-
TcCLB. TTTTCCGATTT

>>TcCLB.510811.10 (301 nt)
 n-w opt: 70 Z-score: 89.6 bits: 24.6 E(12): 0.00045
global/global (N-W) score: 70; 50.8% identity (50.8% similar) in 311 nt overlap (1-301:1-301)

query_ ATTTCTTTTTGTATTGCCACGTCGCTAATTTATTTATTTTTGATGATATGTTTGATTTAT
TcCLB. ATTT-TTCTTGTGTTGCTGCGCCGCTACTTTGTTCATTTTTGATGATGTGCTTTACACAC

query_ -TTATATTTCATCGTATGCTGGTGGG---TTGCGTGTGTGTTTTGACTTTTTCATTGTTG
TcCLB. GTTTTTTTTCATTGTGTGCTAGCGGGCTTTTAGGAGCCTGTGAAGAGGGTGTTCGCGAGG

query_ ATTCTTTTTTTTTTTTTTTTTGGCGGGCGTCACTTTTTTCCG-CGCGTGAG---TCTCCG
TcCLB. AAAGCTAGCTATCTAACTT-----GATAAACATTTTAATAAAACGAATAATATATATTTT

query_ GGGACGGTGACGGAGCATTGCGGGGCGGTGTGCACGCGGACCGAGCGAAGGATGTGGTGA
TcCLB. TCGATTCTTCCTTATTAATTGTTGGAGGATCGCGGGTTGAG-GAGTCAAGAAACCGAGAA

query_ TAATAAG--GTAACTTATTTCATTCTGTTTTTGAGTGTTTTTCTTACTGTTTTTCCTTTA
TcCLB. CTTCAAGACGGAATTTTTTTATTTTTATTTTTAATTTTCATTACTTCACTCATCCCCT--

query_ CTTTTCTCCCT
TcCLB. -TTTTTTGATT

>>TcCLB.509713.20 (301 nt)
 n-w opt: -112 Z-score: 61.7 bits: 19.5 E(12): 1.4
global/global (N-W) score: -112; 46.0% identity (46.0% similar) in 313 nt overlap (1-301:1-301)

query_ ATTTCTTTTTGTATT-GCCACGTCGCTAATTTATTTATTTTTGATGATATGTTTGATTTA
TcCLB. AGTGAATTGTTTGTGAGGGATGACG-TGGACTGTTTTTGTGAGGGGAGTGCACCGACTAC

query_ TTTATATTTCATCGTATGCT---GGTGGGTTGCGTGTGTGTTTTGACTTTTTCATTGTTG
TcCLB. AGAATACT-CCTCTGATGACAACGGCTGCTTGAAGGTGGGGAGCGGATGATCTTTTCTTG

query_ A-TTCTTTTTTTTTTTTTTTTTGGCGGGCGTCACTTTTTTCCGCGC-GTGA--GTCTCC-
TcCLB. AGTTCTGGAGGGATTGTCTCGTGCCTGAAGTC--TTTGTTGTGCGATGTGAACGAAGCCA

query_ -GGGGACGGTGACG-GAGCATTGCGGGGCGGTGTGCAC-GCGGACCGAGCGAAGGATGTG
TcCLB. TGGAGAAGTTCATGCGACTACTTACATGAGTAGTTTATAGCTACCCGCGTAATGCATCTG

query_ GTGATAATAAGGTAACTTATTTCATTCTGTTTTTGAGTGTTTTTCTTACTGTTTTTCCTT
TcCLB. CGTGCACGAGGG------AGTACA--CTTTGTTTGCGTTGATTACGTGATTTTTTGTTAT

query_ TACTTTTCTCCCT
TcCLB. TATGACGATGGCG

301 residues in 1 query sequences
3612 residues in 12 library sequences
 Tcomplib [36.3.8e Sep, 2016(preload9)] (32 proc in memory [0G])
 start: Fri Mar 27 11:12:08 2020 done: Fri Mar 27 11:12:08 2020
 Total Scan time: 0.080 Total Display time: 0.000
Function used was GGSEARCH [36.3.8e Sep, 2016(preload9)]
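The identity percentages reported by ggsearch36 can be sanity-checked from any pair of aligned rows. The sketch below assumes that identity means identical columns divided by total alignment columns, with gaps counted as mismatches. That reading is consistent with the "% identity in N nt overlap" wording, but it is my assumption rather than something taken from the FASTA documentation. `percent_identity()` is a hypothetical helper.

```r
# Hypothetical helper: percent identity between two aligned rows of equal width.
# A gap ("-") in either row counts as a mismatch.
percent_identity <- function(aligned_query, aligned_subject) {
  q <- strsplit(aligned_query, "")[[1]]
  s <- strsplit(aligned_subject, "")[[1]]
  stopifnot(length(q) == length(s))
  matches <- q == s & q != "-"
  100 * sum(matches) / length(q)
}

# The first 20 alignment columns of the query against TcCLB.508799.270.
first_columns <- percent_identity("ATTTCTTTTTGTATTGCCAC",
                                  "ATTT-TTTTTGTATTGCCGC")
```

Those 20 columns contain one gap and one substitution, so 18 of 20 columns match.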