EBI Search

EBI Search
Native name: EBI Search (formerly EB-eye)
Type of site: Data search engine
Available in: English
Owner: European Bioinformatics Institute
Services: Research and services in bioinformatics
URL: ebi.ac.uk
Registration: Optional
Launched: 2006
Current status: Online

EBI Search is a scalable text search engine that provides easy and uniform access to the biological data resources and services hosted at the European Bioinformatics Institute (EBI).[1] [2]

The original and primary purpose of EBI Search is to provide search and indexing capabilities for publicly available biological data. By making these data easily accessible and searchable, it supports research in bioinformatics and the life sciences, from basic research to the broader scientific community.[3]

In addition to the EBI Search website, a RESTful API is available for programmatic queries, so that its search and retrieval capabilities can be used in workflows and analytical pipelines.

History

The EBI Search project began in August 2006 at the European Bioinformatics Institute as EB-eye, software built on top of the open-source Apache Lucene search engine.[3] The project was soon expanded to cover more than 62 distinct datasets comprising about 400 million entries, and was renamed EBI Search.[2][1]

In 2017, EBI Search introduced "search as a service": a RESTful API that lets other websites integrate its search capabilities into their own platforms instead of building separate search systems. The service also gained features such as hierarchical taxonomy navigation and similar-entry suggestions, while scaling to handle over 300 million searches and 1.3 billion records that could be re-indexed in under 24 hours.[4]

In 2019, EBI Search was extended with a new HTTP cache mechanism that improved response times, unlimited cross-reference retrieval, support for Cross-Origin Resource Sharing (CORS), and new data resources such as Europe PMC, BioSamples, Rfam, and reviewed ChEMBL.[5]

During the COVID-19 pandemic, the project was updated to handle increased data needs.[6] At present, the EBI Search engine indexes more than 140 different data resources, making it one of the most comprehensive search tools for biological and biomedical data.

Data resources

EMBL-EBI hosts a vast amount of molecular data and other information, drawn from many data resources, that is indexed by EBI Search. All these resources are freely available and regularly updated through EMBL-EBI's data management pipeline.

EBI Search can search only information that has been indexed, so other search engines operating on the same biological data may return different results. As a rule of thumb, EBI Search indexes identifiers, names, descriptions, keywords and cross-references.

The indexed data includes nucleotide sequences and protein sequences, protein families, structural data, gene expression profiles, protein interactions, biological pathways, and small molecules. Additionally, EBI Search indexes academic literature, patents, and institutional information.

Search interface

When users enter text into an EBI Search interface, whether through a search box or through the query parameter of a RESTful API call, the input is converted into a standardized search query. This converted query is what actually retrieves the search results.

Searching using the website

Users can search across all data resources indexed by EBI Search by typing query terms into the search box and pressing the search button (or Enter). The system then displays a summary page listing the data sets and the number of matches found in each.

Any meaningful term can be entered in an EBI Search box, for example accession numbers or identifiers (such as VAV_HUMAN), gene symbols (such as tpi1), species names, or keywords.

Search results

The EBI Search website presents results in a three-column layout designed for efficient data exploration. The left column displays a summary of hits per category/domain with customizable facets for filtering results. The central column lists the primary search results with direct URLs to original data entries. The right column shows related data and alternative views. For gene and protein queries, specialized "Gene & protein summaries" appear above the main results, collating data from multiple EMBL-EBI resources according to molecular biology's central dogma.

Features and tools

Users can interact with search results in several ways:

  • Data Export: Results can be downloaded in multiple formats (XML, JSON, TSV, CSV) using the 'Save result' button, with a current limit of 100 entries per download
  • Analysis Tools: Direct launching of domain-specific tools (e.g., BLAST for sequence analysis, Clustal Omega for multiple sequence alignment) from selected search results
  • RSS Alerts: Users can create RSS feeds to monitor updates to their search queries, particularly useful for tracking new publications, protein entries, or structural data
  • Cross-References: Results include links to related entries across different EMBL-EBI databases, facilitating comprehensive data exploration
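
For result sets larger than a single download, one option is to retrieve the results in pages through the RESTful API. The sketch below only computes the page windows; it assumes the API accepts `start` and `size` paging parameters, which the text above does not state, and `page_windows` is a hypothetical helper name.

```python
from typing import Iterator, Tuple


def page_windows(total_hits: int, page_size: int = 100) -> Iterator[Tuple[int, int]]:
    """Yield (start, size) pairs that together cover total_hits results.

    The names 'start' and 'size' mirror paging parameters that the
    EBI Search API is ASSUMED to accept; verify them in the official docs.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, total_hits, page_size):
        # The last window may be shorter than page_size.
        yield start, min(page_size, total_hits - start)


print(list(page_windows(250)))  # [(0, 100), (100, 100), (200, 50)]
```

Each pair can then be passed as query parameters on successive requests until all hits have been retrieved.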
Result relevance

Search result ordering primarily follows Apache Lucene's scoring system, where closer matches receive higher relevance scores. Users can influence result ranking by appending the caret symbol (^) and a boost factor to a term; for example, "prostate^4 AND cancer" gives greater weight to entries matching "prostate". While EBI Search can be configured to boost specific domains or fields, runtime boosting is recommended for the most precise control over result ordering.
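
As an illustration, a boosted query string of this kind can be assembled programmatically. The following is a minimal sketch; `build_boosted_query` is a hypothetical helper, not part of EBI Search, and it relies only on the Lucene convention of appending `^factor` to a term or quoted phrase.

```python
from typing import Iterable, Optional, Tuple


def build_boosted_query(
    terms: Iterable[Tuple[str, Optional[int]]], operator: str = "AND"
) -> str:
    """Join terms with a boolean operator, appending ^factor where a boost is given."""
    parts = []
    for term, factor in terms:
        # Quote multi-word terms so the boost applies to the whole phrase.
        text = f'"{term}"' if " " in term else term
        parts.append(f"{text}^{factor}" if factor is not None else text)
    return f" {operator} ".join(parts)


query = build_boosted_query([("prostate", 4), ("cancer", None)])
print(query)  # prostate^4 AND cancer
```

The resulting string can be typed into a search box or passed as the query parameter of an API call.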

Searching using the API

EBI Search provides RESTful Web Services that allow programmatic access to biological data from the EBI Search data resources. This service is particularly useful for researchers and developers who wish to integrate EBI Search results into their pipelines or to use the service behind a custom interface.

Implementation details and a webinar are available on the official EMBL-EBI websites.

Users can interact with the API through various endpoints supporting different response formats including XML, JSON, RSS, and CSV. The service enables faceted searching, cross-reference searching, and auto-completion functionality across multiple databases.

The API currently follows Apache Lucene query syntax and returns appropriate HTTP status codes to indicate the success or failure of requests.
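
A request to the API might be sketched as follows, using only the Python standard library. The base URL, the `uniprot` domain name, and the `format` and `size` parameter names are assumptions not given in the text above and should be checked against the official documentation; only the `query` parameter is mentioned in this article.

```python
import urllib.error
import urllib.parse
import urllib.request

# ASSUMED base URL of the REST service; confirm in the official documentation.
BASE_URL = "https://www.ebi.ac.uk/ebisearch/ws/rest"


def build_search_url(domain: str, query: str, fmt: str = "json", size: int = 10) -> str:
    """Build a search URL; 'format' and 'size' are assumed parameter names."""
    params = urllib.parse.urlencode({"query": query, "format": fmt, "size": size})
    return f"{BASE_URL}/{urllib.parse.quote(domain)}?{params}"


def fetch(url: str, timeout: float = 30.0):
    """Return (HTTP status code, response body) for both successes and HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # The API signals failed requests through the HTTP status code.
        return err.code, err.read().decode("utf-8", errors="replace")


# 'uniprot' is a hypothetical domain name used for illustration.
print(build_search_url("uniprot", "VAV_HUMAN"))
# https://www.ebi.ac.uk/ebisearch/ws/rest/uniprot?query=VAV_HUMAN&format=json&size=10
```

Calling `fetch(build_search_url("uniprot", "VAV_HUMAN"))` would perform the request; the caller can then branch on the returned status code before parsing the body.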

References

  1. Squizzato, S.; Park, Y.M.; Buso, N.; Gur, T.; Cowley, A.; Li, W.; Uludag, M.; Pundir, S.; Cham, J.A.; McWilliam, H.; Lopez, R. (2015). "The EBI Search engine: providing search and retrieval functionality for biological data from EMBL-EBI". Nucleic Acids Research. 43 (W1): W585–W588. doi:10.1093/nar/gkv316. PMC 4489232. PMID 25855807.
  2. Valentin, F.; Squizzato, S.; Goujon, M.; McWilliam, H.; Paern, J.; Lopez, R. (2010). "Fast and efficient searching of biological data resources: using EB-eye". Briefings in Bioinformatics. 11 (4): 375–384. doi:10.1093/bib/bbp065. PMC 2905521. PMID 20150321.
  3. Goujon, M.; Valentin, F.; Miyar, T.; McWilliam, H.; Lopez, R. (December 2007). "The EB-eye". EMBnet.news. No. 13.4. pp. 18–21.
  4. Park, Y.M.; Squizzato, S.; Buso, N.; Gur, T.; Lopez, R. (May 2017). "The EBI search engine: EBI search as a service-making biological data accessible for all". Nucleic Acids Research. 45 (W1): W545–W549. doi:10.1093/nar/gkx359.
  5. Madeira, F.; Park, Y.M.; Lee, J.; Buso, N.; Gur, T.; Madhusoodanan, N.; Basutkar, P.; Tivey, A.R.N.; Potter, S.C.; Finn, R.D.; Lopez, R. (12 April 2019). "The EMBL-EBI search and sequence analysis tools APIs in 2019". Nucleic Acids Research. 47 (W1): W636–W641. doi:10.1093/nar/gkz268.
  6. Madeira, Fábio; Pearce, Matt; Basutkar, Prasad; Lee, Joon; Edbali, Ossama; Madhusoodanan, Nandana; Kolesnikov, Anton; Lopez, Rodrigo (July 2022). "Search and sequence analysis tools services from EMBL-EBI in 2022". Nucleic Acids Research. 50 (W1): W276–W279. doi:10.1093/nar/gkac240.