TopHat (bioinformatics)

TopHat is an open-source bioinformatics tool for the high-throughput alignment of shotgun cDNA sequencing reads generated by transcriptomics technologies (e.g. RNA-Seq). It aligns reads with Bowtie first and then maps them to a reference genome to discover RNA splice sites de novo.^[1] TopHat aligns RNA-Seq reads to mammalian-sized genomes.^[2]

History

TopHat was originally developed in 2009 by Cole Trapnell, Lior Pachter and Steven Salzberg at the Center for Bioinformatics and Computational Biology at the University of Maryland, College Park and at the Mathematics Department, UC Berkeley.^[1] TopHat2 was a collaborative effort of Daehwan Kim and Steven Salzberg, initially at the University of Maryland, College Park and later at the Center for Computational Biology at Johns Hopkins University. In collaboration with Cole Trapnell and others, Kim rewrote some of Trapnell's original TopHat code in C++ to make it much faster and added many heuristics to improve its accuracy. Kim and Salzberg also developed TopHat-fusion, which used transcriptome data to discover gene fusions in cancer tissues.^[3]

Uses

TopHat is a read-mapping algorithm that aligns reads from an RNA-Seq experiment to a reference genome. It is useful because it does not need to rely on known splice sites.^[1] TopHat can be used as part of the Tuxedo pipeline, and is frequently used with Bowtie.
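The idea of mapping reads without a list of known splice sites can be illustrated with a toy sketch. The Python below is not TopHat's algorithm and does not call Bowtie; it assumes exact string matching, a tiny made-up genome, and a hypothetical minimum anchor length (MIN_ANCHOR). It only shows the two-stage shape: try a contiguous alignment first, then look for a split of the read whose two halves match on either side of a gap.

```python
from typing import Optional, Tuple

MIN_ANCHOR = 4  # hypothetical: minimum bases required on each side of a junction

# Made-up toy genome: two exons separated by one intron.
EXON1 = "ATGGCCTTAC"
INTRON = "GTAAGTCTCTTTCAG"
EXON2 = "GGATCCGACT"
GENOME = EXON1 + INTRON + EXON2


def align_unspliced(genome: str, read: str) -> Optional[int]:
    """Return the 0-based start of an exact contiguous match, or None."""
    pos = genome.find(read)
    return pos if pos != -1 else None


def align_spliced(genome: str, read: str) -> Optional[Tuple[int, int, int]]:
    """Search for a split of the read that matches across a gap.

    Returns (left_start, intron_start, intron_end) for the first split whose
    left part matches the genome and whose right part matches further
    downstream. intron_end is exclusive. If several splits are consistent,
    only the first one found is reported; this sketch does not resolve
    that ambiguity.
    """
    for split in range(MIN_ANCHOR, len(read) - MIN_ANCHOR + 1):
        left, right = read[:split], read[split:]
        left_start = genome.find(left)
        while left_start != -1:
            intron_start = left_start + len(left)
            # The right part must begin at least one base after the left part ends.
            right_start = genome.find(right, intron_start + 1)
            if right_start != -1:
                return left_start, intron_start, right_start
            left_start = genome.find(left, left_start + 1)
    return None


def map_read(genome: str, read: str):
    """Two-stage mapping: contiguous alignment first, spliced alignment second."""
    pos = align_unspliced(genome, read)
    if pos is not None:
        return ("unspliced", pos)
    hit = align_spliced(genome, read)
    if hit is not None:
        return ("spliced", hit)
    return ("unmapped", None)


exonic_read = "GGCCTT"                    # lies entirely inside EXON1
junction_read = EXON1[-5:] + EXON2[:5]    # spans the exon-exon junction

print(map_read(GENOME, exonic_read))      # ('unspliced', 2)
print(map_read(GENOME, junction_read))    # ('spliced', (5, 10, 25))
```

In this toy case the junction read fails the contiguous stage and is recovered in the second stage, and the reported gap GENOME[10:25] is exactly the intron, even though no splice site annotation was supplied.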

Advantages/Disadvantages

Advantages

When TopHat was first released, it was faster than previous systems, mapping more than 2.2 million reads per CPU hour. That speed allowed a user to process an entire RNA-Seq experiment in less than a day, even on a standard desktop computer.^[1] TopHat first uses Bowtie to align the reads, and then performs further analysis of the reads that span exon-exon junctions. As a result, more RNA-Seq reads are aligned against the reference genome than would align with Bowtie alone.^[4]
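The quoted throughput can be turned into a rough run-time estimate. The short Python calculation below is a back-of-the-envelope sketch: the 2.2 million reads per CPU hour figure comes from the text above, while the experiment size of 40 million reads is a hypothetical value chosen only for illustration. Because the quoted rate is a lower bound ("more than"), the resulting time is an upper bound.

```python
READS_PER_CPU_HOUR = 2_200_000  # throughput figure quoted in the text


def cpu_hours(total_reads: int, reads_per_cpu_hour: int = READS_PER_CPU_HOUR) -> float:
    """CPU hours needed to map total_reads at the given rate."""
    return total_reads / reads_per_cpu_hour


def reads_mapped(hours: float, reads_per_cpu_hour: int = READS_PER_CPU_HOUR) -> int:
    """Number of reads mapped in the given number of CPU hours."""
    return int(hours * reads_per_cpu_hour)


example_reads = 40_000_000  # hypothetical experiment size, not from the source
print(f"{example_reads:,} reads take about {cpu_hours(example_reads):.1f} CPU hours")
print(f"One CPU-day maps about {reads_mapped(24):,} reads")
```

Under these assumptions, a 40-million-read experiment needs roughly 18 CPU hours, and a single CPU-day covers about 52.8 million reads, which is consistent with the "less than a day" claim for experiments of that scale.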

Another advantage of TopHat is that it does not need to rely on known splice sites when aligning reads to a reference genome.^[1]

Disadvantages

TopHat is in a low-maintenance, low-support stage, and contains software bugs that have prompted the development of third-party post-processing software to correct them.^[5] It has been superseded by HISAT2, which is more efficient and accurate and provides the same core functionality (spliced alignment of RNA-Seq reads).^[2]

References

[1] Trapnell C, Pachter L, Salzberg SL (May 2009). "TopHat: discovering splice junctions with RNA-Seq". Bioinformatics. 25 (9): 1105–11. doi:10.1093/bioinformatics/btp120. PMC 2672628. PMID 19289445.
[2] "TopHat". ccb.jhu.edu. Retrieved 2018-04-17.
[3] Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL (April 2013). "TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions". Genome Biology. 14 (4): R36. doi:10.1186/gb-2013-14-4-r36. PMC 4053844. PMID 23618408.
[4] "Bowtie & Tophat". www.biostars.org. Retrieved 2018-04-24.
[5] Brueffer C, Saal LH (May 2016). "TopHat-Recondition: a post-processor for TopHat unmapped reads". BMC Bioinformatics. 17 (1): 199. doi:10.1186/s12859-016-1058-x. PMC 4855331. PMID 27142976.

External links

TopHat page on Center for Computational Biology at JHU
