tRNAscan-SE gene predictions
tRNA genes included in GtRNAdb were predicted by tRNAscan-SE 2.0, which performs similarity searches using covariance models that were custom built and trained on tRNA genes. For each predicted tRNA gene, the output includes its coordinates relative to the input sequence, the tRNA isotype and anticodon, the locations of introns if present, the covariance model bit score, the predicted secondary structure, and a flag for possible pseudogenes. In addition, the predicted genes were scanned with isotype-specific covariance models to better classify their function relative to the consensus tRNA isotypes. For further information about tRNAscan-SE, please visit the tRNAscan-SE Online Web Server.
The predicted tRNA genes of each genome in GtRNAdb are presented in the following ways:
- tRNA Gene List
- tRNA Alignments
- FASTA Sequences
- Run Options/Stats
- Download tRNAscan-SE Results
Links to genome information and genomic sequences are provided. If a UCSC Genome Browser is available for the genome, a link to it is also included.
The Summary page shows overview statistics for the predicted tRNA genes identified in the genome: the number of predicted tRNA genes, possible pseudogenes, and tRNA genes that contain introns. The genes are further grouped into two-box, four-box, six-box, and other tRNA sets, based on the genetic code of the amino acids decoded by the tRNAs. If codon usage information is available for a genome, a link is provided to show or hide it.
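As a rough illustration of the box grouping, the sketch below counts how many codons encode each amino acid and assigns the amino acid to a set by that count. It assumes the standard genetic code and assumes that "two-box", "four-box", and "six-box" correspond to amino acids with two, four, and six codons; the function name `box_groups` is hypothetical, and the grouping GtRNAdb actually displays (for example, for selenocysteine or suppressor tRNAs) may differ.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def box_groups():
    """Group amino acids (one-letter codes) by how many codons encode them."""
    codon_table = {
        "".join(codon): aa
        for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
    }
    # Count codons per amino acid, ignoring stop codons.
    counts = Counter(aa for aa in codon_table.values() if aa != "*")
    names = {2: "two-box", 4: "four-box", 6: "six-box"}
    groups = {}
    for aa, n_codons in sorted(counts.items()):
        # Anything that is not 2, 4, or 6 codons falls into "other" (assumption).
        groups.setdefault(names.get(n_codons, "other"), []).append(aa)
    return groups


for name, members in box_groups().items():
    print(name, "".join(members))
```

Under these assumptions, isoleucine, methionine, and tryptophan land in the "other" set because they are encoded by three, one, and one codons respectively.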
tRNA-derived repetitive elements, whose primary sequences are very similar to those of real tRNA genes, are common in many vertebrates, some worms, and some plants, and can therefore be predicted as tRNA genes. To address this problem, we applied a multi-step post-filtering process to the predictions in large eukaryotes using EukHighConfidenceFilter from tRNAscan-SE 2.0. The tool assesses the predictions with a combination of domain-specific, isotype-specific, and secondary structure scores in two filtering stages on top of the pseudogene classification, and determines the "high confidence" set of genes that are most likely to function in translation. A small number of predictions that have high scores but atypical features, such as unexpected anticodons, are marked separately for further investigation. Some vertebrates, such as zebrafish, have a large number of high-scoring identical tRNA genes. Since further studies are needed to understand the source of this high copy number, we categorize these genes as the "high scoring" tRNA set.
tRNA Gene List
Predicted tRNA gene annotations are summarized in a table shown in the figure below. They include:
- GtRNAdb Gene Symbol - gene ID in corresponding genome
- tRNAscan-SE ID - tRNA ID in tRNAscan-SE prediction results
- Locus - genomic coordinates of predicted gene
- Anticodon - anticodon of predicted tRNA gene
- Isotype (from Anticodon) - tRNA isotype determined by anticodon
- General tRNA Model Score - covariance model bit score from tRNAscan-SE results
- Best Isotype Model - best matching (highest scoring) isotype determined by isotype-specific covariance model classification
- Isotype Model Score - bit score of the best isotype model
- Anticodon and Isotype Model Agreement - consistency between anticodon from predicted gene sequence and best isotype model
- Features - special gene features that may include gene set categorization, number of introns, possible pseudogenes, possible truncation, or base-pair mismatches
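The "Anticodon and Isotype Model Agreement" column can be reproduced in spirit from the other columns. The sketch below is a minimal example that assumes the gene list has been saved as a tab-delimited file; the column names (`gene_symbol`, `isotype`, `best_isotype_model`, and so on), the sample rows, and the function name `flag_disagreements` are all invented for illustration and are not GtRNAdb's actual export format.

```python
import csv
import io

# Hypothetical tab-delimited gene list; column names and values are
# made up for this example and do not come from a real genome.
SAMPLE = (
    "gene_symbol\tanticodon\tisotype\tgeneral_score\tbest_isotype_model\tisotype_score\n"
    "example-tRNA-1\tAGC\tAla\t80.1\tAla\t110.5\n"
    "example-tRNA-2\tAAT\tIle\t65.3\tVal\t72.0\n"
)


def flag_disagreements(handle):
    """Return gene symbols whose anticodon-derived isotype differs from
    the best-scoring isotype-specific model."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row["gene_symbol"]
        for row in reader
        if row["isotype"] != row["best_isotype_model"]
    ]


print(flag_disagreements(io.StringIO(SAMPLE)))
```

Genes flagged this way are candidates for closer inspection rather than errors in themselves, since a mismatch only says that the anticodon and the best isotype model point to different isotypes.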
Clicking the GtRNAdb Gene Symbol link opens the detail page of the predicted gene, which contains additional information including flanking sequences, secondary structure, alignments with other predicted genes of the same isotype, and isotype-specific model scores. When available, links to other databases such as HGNC, MGI, FlyBase, WormBase, and RNAcentral are provided for further context.
Alignments of predicted tRNA genes are arranged by isotype and anticodon. The colored blocks on the alignments represent the base-pairing stems in the secondary structure. Sequences of tRNA transcripts (for example, RA7630_AGC_SACCHAROMYCES_CER) and tRNA genes (for example, DA7631_AGC_SACCHAROMYCES_CER) from the original Sprinzl tRNA database are included as references.
Sequences of predicted tRNA genes can be downloaded in FASTA format by clicking on the provided link. Predicted mature tRNA sequences are also available.
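Once downloaded, the FASTA file can be read with a few lines of code. The sketch below is a generic FASTA reader, not a GtRNAdb-specific parser: it makes no assumption about the header layout beyond the leading ">", and the sample records and the function name `read_fasta` are invented for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up records; real tRNA sequences are longer.
SAMPLE_FASTA = ">example-tRNA-1\nGGGGCTAT\nAGCTCAGC\n>example-tRNA-2\nGCCCGGAT\n"

for name, seq in read_fasta(io.StringIO(SAMPLE_FASTA)):
    print(name, len(seq))
```

For a real file, the same function can be applied to an open file handle in place of `io.StringIO`.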
The tRNAscan-SE run options and statistics for the genome are available at this link.
Download tRNAscan-SE Results
tRNAscan-SE results can be downloaded as a tarball that includes the standard output file, the secondary structure output, predictions in BED file format, sequences in FASTA format, and a map between GtRNAdb gene symbols and tRNAscan-SE IDs.
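When working with the BED file, keep in mind that the BED format uses 0-based, half-open coordinates, whereas genomic loci are usually written as 1-based, inclusive ranges. The sketch below converts one BED line into a locus string. It assumes only the standard leading BED columns (chrom, start, end, name, score, strand); the sample line, the output layout, and the function name `bed_to_locus` are hypothetical, and the BED file in the tarball may carry additional columns.

```python
def bed_to_locus(line):
    """Convert one BED line to (name, 'chrom:start-end (strand)').

    BED start is 0-based and end is exclusive, so the 1-based inclusive
    range is start + 1 through end.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
    # Strand is the sixth BED column; use "." when it is absent.
    strand = fields[5] if len(fields) > 5 else "."
    return name, f"{chrom}:{start + 1}-{end} ({strand})"


# Made-up BED record for illustration.
print(bed_to_locus("chr1\t99\t172\texample-tRNA-1\t1000\t+"))
```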