List of biological databases

From Wikipedia, the free encyclopedia

Biological databases are stores of biological information.[1] The journal Nucleic Acids Research regularly publishes special issues on biological databases and keeps a list of such databases. The 2018 issue lists about 180 such databases, along with updates to previously described databases.[2] The Omics Discovery Index can be used to browse and search several biological databases. Furthermore, the NIAID Data Ecosystem Discovery Portal, developed by the National Institute of Allergy and Infectious Diseases (NIAID), enables searching across databases.

Meta databases

Meta databases are databases of databases: they collect data about data and combine it to generate new data. They can merge information from different sources and make it available in a new, more convenient form, or with an emphasis on a particular disease or organism. Originally, "metadata" was a common term that referred simply to data about data, such as tags, keywords, and markup headers.
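
The merging step described above can be illustrated with a minimal Python sketch. The source names (`source_a`, `source_b`), the record fields, and the `merge_sources` helper are all assumptions made for illustration; they do not correspond to any particular meta database, which would typically also need identifier normalisation and conflict handling.

```python
from collections import defaultdict

# Illustrative records from two hypothetical source databases,
# both keyed by gene symbol.
SOURCE_A = {"TP53": {"chromosome": "17"}, "BRCA1": {"chromosome": "17"}}
SOURCE_B = {
    "TP53": {"disease": "Li-Fraumeni syndrome"},
    "CFTR": {"disease": "cystic fibrosis"},
}


def merge_sources(sources):
    """Combine per-source records into one record per key.

    Each merged record keeps the union of the fields and a list of the
    sources that contributed to it (simple provenance tracking).
    """
    merged = defaultdict(lambda: {"fields": {}, "sources": []})
    for source_name, records in sources.items():
        for key, fields in records.items():
            merged[key]["fields"].update(fields)
            merged[key]["sources"].append(source_name)
    return dict(merged)


merged = merge_sources({"source_a": SOURCE_A, "source_b": SOURCE_B})
print(merged["TP53"])
```

Keeping the list of contributing sources alongside the merged fields is one simple way to let users trace a combined record back to the original databases.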

Model organism databases

Model organism databases provide in-depth biological data for intensively studied organisms.

Nucleic acid databases

DNA databases

The primary databases make up the International Nucleotide Sequence Database (INSD). They include:

DDBJ (Japan), GenBank (USA) and the European Nucleotide Archive (Europe) are repositories for nucleotide sequence data from all organisms. All three accept nucleotide sequence submissions and then exchange new and updated data daily to keep their contents synchronised. These three databases are primary databases, as they house original sequence data. They collaborate with the Sequence Read Archive (SRA), which archives raw reads from high-throughput sequencing instruments.
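
The daily exchange can be pictured with a toy Python model. This is only a sketch under stated assumptions: the accessions (`DEMO0001` and so on), the version numbers, the sequences, and the `synchronise` function are hypothetical, and the real exchange mechanism between the three archives is not described here.

```python
def synchronise(*archives):
    """Return a merged view of several archives.

    Each archive maps accession -> (version, sequence). For every
    accession, the record with the highest version number is kept.
    """
    latest = {}
    for archive in archives:
        for accession, (version, sequence) in archive.items():
            if accession not in latest or version > latest[accession][0]:
                latest[accession] = (version, sequence)
    return latest


# Hypothetical holdings before the exchange (made-up accessions).
ddbj = {"DEMO0001": (1, "ACGT")}
genbank = {"DEMO0001": (2, "ACGTT"), "DEMO0002": (1, "GGCC")}
ena = {"DEMO0003": (1, "TTAA")}

shared = synchronise(ddbj, genbank, ena)

# After the exchange, every archive holds the same set of records.
ddbj, genbank, ena = dict(shared), dict(shared), dict(shared)
print(sorted(shared))
```

In this model a submission made to any one archive, or an update to an existing record, becomes visible in all three after the next exchange.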

Secondary databases are:

  • 23andMe's database
  • HapMap
  • OMIM (Online Mendelian Inheritance in Man): inherited diseases
  • RefSeq
  • 1000 Genomes Project: launched in January 2008. The genomes of more than a thousand anonymous participants from a number of different ethnic groups were analyzed and made publicly available.
  • EggNOG Database: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. It provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation.[6][7]

Other databases

Gene expression databases

Generic gene expression databases

Microarray gene expression databases

Genome databases

These databases collect genome sequences, annotate and analyze them, and provide public access. Some add curation of experimental literature to improve computed annotations. These databases may hold the genomes of many species or the genome of a single model organism.

Phenotype databases

RNA databases

Amino acid / protein databases

(See also: List of proteins in the human body)

Several publicly available data repositories and resources have been developed to support and manage protein-related information, biological knowledge discovery and data-driven hypothesis generation.[15] The databases in the table below are selected from the databases listed in the Nucleic Acids Research (NAR) database issues and database collection, and from the databases cross-referenced in UniProtKB. Most of these databases are cross-referenced with UniProt / UniProtKB so that identifiers can be mapped to each other.[15]
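
How cross-references allow identifiers to be mapped to each other can be sketched in Python. The accessions (`ACC_A`, `ACC_B`), the external identifiers, and both helper functions are hypothetical placeholders chosen for illustration; they are not real UniProtKB entries, and this is not the UniProt mapping service itself.

```python
# Hypothetical cross-reference table: primary accession -> identifiers
# of the same protein in other databases.
XREFS = {
    "ACC_A": {"structure_db": ["struct_1", "struct_2"], "sequence_db": ["seq_1"]},
    "ACC_B": {"structure_db": ["struct_3"], "sequence_db": ["seq_1"]},
}


def build_reverse_index(xrefs):
    """Map (database, external_id) -> set of primary accessions."""
    index = {}
    for accession, by_database in xrefs.items():
        for database, external_ids in by_database.items():
            for external_id in external_ids:
                index.setdefault((database, external_id), set()).add(accession)
    return index


def map_id(index, database, external_id):
    """Return the primary accessions linked to an external identifier."""
    return sorted(index.get((database, external_id), set()))


index = build_reverse_index(XREFS)
print(map_id(index, "structure_db", "struct_3"))
print(map_id(index, "sequence_db", "seq_1"))
```

The reverse index returns a list because a mapping is not always one-to-one: in this toy table, one external identifier is linked to two primary accessions.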

Proteins in human:

There are about 20,000 protein-coding genes in the standard human genome (roughly 1,200 of them already have Wikipedia articles, as part of the Gene Wiki). Including splice variants, there could be as many as 500,000 unique human proteins.[16]

Different types of protein databases

Signal transduction pathway databases

Metabolic pathway and protein function databases

Taxonomic databases

Numerous databases collect information about species and other taxonomic categories. The Catalogue of Life is a special case as it is a meta-database of about 150 specialized "global species databases" (GSDs) that have collected the names and other information on (almost) all described and thus "known" species.

  • BacDive: bacterial metadatabase that provides strain-linked information about bacterial and archaeal biodiversity, including taxonomy information
  • Catalogue of Life: a meta-database of all species on Earth
  • EzTaxon-e: database for the identification of prokaryotes based on 16S ribosomal RNA gene sequences
  • NCBI Taxonomy: a taxonomic database operated by NCBI and concentrating on all taxa for which DNA sequences are available (those sequences are stored by GenBank, another database operated by NCBI).

Image databases

Images play a critical role in biomedicine, and their subjects range from anthropological specimens to zoology. However, relatively few databases are dedicated to image collection, although some projects, such as iNaturalist, collect photos as a main part of their data. A special case of "images" is 3-dimensional images, such as protein structures or 3D reconstructions of anatomical structures. Image databases include, among others:[22]

Radiologic databases

Additional databases

Exosomal databases

  • ExoCarta
  • Extracellular RNA Atlas: a repository of small RNA-seq and qPCR-derived exRNA profiles from human and mouse biofluids

Mathematical model databases

Databases on antimicrobial resistance rates and antibiotic consumption

Databases on antimicrobial resistance mechanisms

Wiki-style databases

Specialized databases

References

  1. ^ Wren JD, Bateman A (October 2008). "Databases, data tombs and dust in the wind". Bioinformatics. 24 (19): 2127–2128. doi:10.1093/bioinformatics/btn464. PMID 18819940.
  2. ^ "Volume 46 Issue D1 | Nucleic Acids Research | Oxford Academic". academic.oup.com. Retrieved 2018-09-04.
  3. ^ Lock A, Rutherford K, Harris MA, Hayles J, Oliver SG, Bähler J, Wood V (January 2019). "PomBase 2018: user-driven reimplementation of the fission yeast database provides rapid and intuitive access to diverse, interconnected information". Nucleic Acids Research. 47 (D1): D821–D827. doi:10.1093/nar/gky961. PMC 6324063. PMID 30321395.
  4. ^ Zhu B, Stülke J (January 2018). "SubtiWiki in 2018: from genes and proteins to functional network annotation of the model organism Bacillus subtilis". Nucleic Acids Research. 46 (D1): D743–D748. doi:10.1093/nar/gkx908. PMC 5753275. PMID 29788229.
  5. ^ Margarita Garcia-Hernandez; Tanya Berardini; Guanghong Chen; Debbie Crist; Aisling Doyle; Eva Huala; Emma Knee; Mark Lambrecht; Neil Miller; Lukas A. Mueller; Suparna Mundodi; Leonore Reiser; Seung Y. Rhee; Randy Scholl; Julie Tacklind; Dan C. Weems; Yihe Wu; Iris Xu; Daniel Yoo; Jungwon Yoon; Peifen Zhang (November 2002). "TAIR: a resource for integrated Arabidopsis data". Functional & Integrative Genomics. 2 (6): 239–253. doi:10.1007/s10142-002-0077-z. PMID 12444417. S2CID 7827488.
  6. ^ Powell S, Forslund K, Szklarczyk D, Trachana K, Roth A, Huerta-Cepas J, et al. (January 2014). "eggNOG v4.0: nested orthology inference across 3686 organisms". Nucleic Acids Research. 42 (Database issue): D231–D239. doi:10.1093/nar/gkt1253. PMC 3964997. PMID 24297252.
  7. ^ Huerta-Cepas J, Szklarczyk D, Heller D, Hernández-Plaza A, Forslund SK, Cook H, et al. (January 2019). "eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses". Nucleic Acids Research. 47 (D1): D309–D314. doi:10.1093/nar/gky1085. PMC 6324079. PMID 30418610.
  8. ^ ArrayExpress
  9. ^ GEO
  10. ^ "The Human Protein Atlas". www.proteinatlas.org. Retrieved 2019-05-27.
  11. ^ Dash S, Campbell JD, Cannon EK, Cleary AM, Huang W, Kalberer SR, et al. (January 2016). "Legume information system (LegumeInfo.org): a key component of a set of federated data resources for the legume family". Nucleic Acids Research. 44 (D1): D1181–D1188. doi:10.1093/nar/gkv1159. PMC 4702835. PMID 26546515.
  12. ^ "Saccharomyces Genome Database | SGD". www.yeastgenome.org. Retrieved 2018-09-04.
  13. ^ Grant D, Nelson RT, Cannon SB, Shoemaker RC (January 2010). "SoyBase, the USDA-ARS soybean genetics and genomics database". Nucleic Acids Research. 38 (Database issue): D843–D846. doi:10.1093/nar/gkp798. PMC 2808871. PMID 20008513.
  14. ^ "IRESbase".
  15. ^ a b Chen C, Huang H, Wu CH (2017). "Protein Bioinformatics Databases and Resources". In Wu CH, Arighi CN, Ross KE (eds.). Protein Bioinformatics. Methods in Molecular Biology. Vol. 1558. New York, NY: Springer New York. pp. 3–39. doi:10.1007/978-1-4939-6783-4_1. ISBN 978-1-4939-6781-0. PMC 5506686. PMID 28150231.
  16. ^ Karnkowska, Anna; Treitli, Sebastian C.; Brzoň, Ondřej; Novák, Lukáš; Vacek, Vojtěch; Soukal, Petr; Barlow, Lael D.; Herman, Emily K.; Pipaliya, Shweta V.; Pánek, Tomáš; Žihala, David; Petrželková, Romana; Butenko, Anzhelika; Eme, Laura; Stairs, Courtney W.; Roger, Andrew J.; Eliáš, Marek; Dacks, Joel B.; Hampl, Vladimír (2019). "The Oxymonad Genome Displays Canonical Eukaryotic Complexity in the Absence of a Mitochondrion". Molecular Biology and Evolution. 36 (10): 2292–2312. doi:10.1093/molbev/msz147. PMC 6759080. PMID 31387118.
  17. ^ Keshava Prasad, T. S.; Goel, R.; Kandasamy, K.; Keerthikumar, S.; Kumar, S.; Mathivanan, S.; Telikicherla, D.; Raju, R.; Shafreen, B.; Venugopal, A.; Balakrishnan, L.; Marimuthu, A.; Banerjee, S.; Somanathan, D. S.; Sebastian, A.; Rani, S.; Ray, S.; Harrys Kishore, C. J.; Kanth, S.; Ahmed, M.; Kashyap, M. K.; Mohmood, R.; Ramachandra, Y. L.; Krishna, V.; Rahiman, B. A.; Mohan, S.; Ranganathan, P.; Ramabadran, S.; Chaerkady, R.; Pandey, A. (2008). "Human Protein Reference Database—2009 update". Nucleic Acids Research. 37 (Database issue): D767–D772. doi:10.1093/nar/gkn892. PMC 2686490. PMID 18988627.
  18. ^ Mir S, Alhroub Y, Anyango S, Armstrong DR, Berrisford JM, Clark AR, et al. (January 2018). "PDBe: towards reusable data delivery infrastructure at protein data bank in Europe". Nucleic Acids Research. 46 (D1): D486–D492. doi:10.1093/nar/gkx1070. PMC 5753225. PMID 29126160.
  19. ^ Kinjo AR, Bekker GJ, Suzuki H, Tsuchiya Y, Kawabata T, Ikegawa Y, Nakamura H (January 2017). "Protein Data Bank Japan (PDBj): updated user interfaces, resource description framework, analysis tools for large structures". Nucleic Acids Research. 45 (D1): D282–D288. doi:10.1093/nar/gkw962. PMC 5210648. PMID 27789697.
  20. ^ Rose PW, Prlić A, Altunkaya A, Bi C, Bradley AR, Christie CH, et al. (January 2017). "The RCSB protein data bank: integrative view of protein, gene and 3D structural information". Nucleic Acids Research. 45 (D1): D271–D281. doi:10.1093/nar/gkw1000. PMC 5210513. PMID 27794042.
  21. ^ Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, et al. (January 2004). "IntAct: an open source molecular interaction database". Nucleic Acids Research. 32 (Database issue): D452–D455. doi:10.1093/nar/gkh052. PMC 308786. PMID 14681455.
  22. ^ a b Ellenberg J, Swedlow JR, Barlow M, Cook CE, Sarkans U, Patwardhan A, et al. (November 2018). "A call for public archives for biological image data". Nature Methods. 15 (11): 849–854. doi:10.1038/s41592-018-0195-8. PMC 6884425. PMID 30377375.
  23. ^ Tendler BC, Hanayik T, Ansorge O, Bangerter-Christensen S, Berns GS, Bertelsen MF, et al. (March 2022). "The Digital Brain Bank, an open access platform for post-mortem imaging datasets". eLife. 11: e73153. doi:10.7554/eLife.73153. PMC 9042233. PMID 35297760.
  24. ^ Iudin A, Korir PK, Salavert-Torres J, Kleywegt GJ, Patwardhan A (May 2016). "EMPIAR: a public archive for raw electron microscopy image data". Nature Methods. 13 (5): 387–388. doi:10.1038/nmeth.3806. PMID 27067018. S2CID 38996040.
  25. ^ Crickmore, N.; Berry, C.; Panneerselvam, S.; Mishra, R.; Connor, T. R.; Bonning, B. C. (November 2021). "A structure-based nomenclature for Bacillus thuringiensis and other bacteria-derived pesticidal proteins". Journal of Invertebrate Pathology. 186 (D1): 107438. doi:10.1016/j.jip.2020.107438. PMID 32652083.
  26. ^ Panneerselvam S; Mishra R; Berry C; Crickmore N; Bonning BC (2022). "BPPRC database: a web-based tool to access and analyse bacterial pesticidal proteins". Database (Oxford). 186 (D1): 107438. doi:10.1093/database/baac022. PMC 9216523. PMID 35396594.
  27. ^ Hounkpe BW, Chenou F, de Lima F, De Paula EV (January 2021). "HRT Atlas v1.0 database: redefining human and mouse housekeeping genes and candidate reference transcripts by mining massive RNA-seq datasets". Nucleic Acids Research. 49 (D1): D947–D955. doi:10.1093/nar/gkaa609. PMC 7778946. PMID 32663312.
  28. ^ (IHEC) data portal
  29. ^ CEEHRC
  30. ^ Blueprint
  31. ^ EGA
  32. ^ DEEP
  33. ^ CREST
  34. ^ "Sharing epigenomes globally". Nature Methods. 15 (3): 151. 2018. doi:10.1038/nmeth.4630. ISSN 1548-7105.
  35. ^ Valverde H, Cantón FR, Aledo JC (November 2019). "MetOSite: an integrated resource for the study of methionine residues sulfoxidation". Bioinformatics. 35 (22): 4849–4850. doi:10.1093/bioinformatics/btz462. PMC 6853639. PMID 31197322.