Talk:Full-text search
This article is rated C-class on Wikipedia's content assessment scale.
To do
- Create new articles so that the ugly red links go away.
- Create an information retrieval category and add relevant documents to it.
- Check list of querying techniques to make sure it's complete (it isn't).
- Explain that search engines use varying methods to express boolean operators, etc.
- Incorporate modern information; Alta Vista? All popular search engines today work off the concept of full-text searching. —Preceding unsigned comment added by 12.35.22.253 (talk) 15:45, 15 May 2008 (UTC)
Search algorithm
Google's PageRank algorithm is referred to as a search algorithm, but isn't it a relevance ranking algorithm, not a search algorithm? Nurg 01:58, 19 March 2006 (UTC)
- Agree that it is a relevance ranking algorithm, but it is used along with a basic vector-space model in the case of Google to provide a search algorithm. I would rather not make a direct reference though, it would be better to cite citation analysis, citation index, or Bibliometrics. Josh Froelich 14:15, 21 December 2006 (UTC)
MapReduce
AFAIK, MapReduce-based fulltext search as employed by e.g. Google doesn't use a classical fulltext index as described in the article. Rather, it uses massive parallelism to actually scan each document in real time as the search runs (rather than offline when an index is built), finding those that match ("map" stage), and then collects and ranks the results ("reduce" stage). The article makes it sound as if building a fulltext index were the only way to efficiently perform full-text searches over large numbers of documents. Multi io (talk) 02:30, 18 February 2010 (UTC)
An inverted index should be used even in a MapReduce-based text retrieval system if you want to achieve performance. Of course you can scan the whole text corpus for "matching" documents. But even in the case of "concept search", in practice, to return results in a timely fashion (in less than 1 second), even with a corpus of a few GB, you need an inverted index AND a map-reduce kind of algorithm. --i⋅am⋅amz3 (talk) 11:20, 22 December 2019 (UTC)
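To make the two positions in this thread concrete, here is a minimal sketch contrasting a query-time scan with an inverted-index lookup. It is a toy illustration only, not a description of how Google or any MapReduce system actually works; the corpus and the names `tokenize`, `scan_search`, `build_index`, and `index_search` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy corpus; document ids and texts are made up for illustration.
DOCS = {
    1: "full text search scans every document",
    2: "an inverted index maps each term to documents",
    3: "map and reduce stages can build an inverted index",
}

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def scan_search(docs, terms):
    """Brute-force 'scan' approach: read every document at query time."""
    wanted = set(terms)
    return sorted(doc_id for doc_id, text in docs.items()
                  if wanted <= set(tokenize(text)))

def build_index(docs):
    """Offline step: map each term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def index_search(index, terms):
    """Query-time step: intersect posting sets instead of rereading documents."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

INDEX = build_index(DOCS)
print(scan_search(DOCS, ["inverted", "index"]))
print(index_search(INDEX, ["inverted", "index"]))
```

Both functions return the same documents for a non-empty query; the difference is that `scan_search` touches every document on each query, while `index_search` only touches the posting sets for the query terms. Either step could in principle be spread across many machines.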
search vs. matching
The article does not make clear the distinction between boolean pattern matching in text (string matching) and information retrieval. -- JakobVoss (talk) 10:10, 11 December 2011 (UTC)
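One way to illustrate the distinction raised here is the sketch below, which assumes a toy corpus and a deliberately crude score. `boolean_match` answers only "does this document contain the term?", while `ranked_retrieval` scores and orders documents. The function names, the corpus, and the raw term-frequency score are assumptions made for illustration, not anything taken from the article.

```python
from collections import Counter

# Hypothetical toy corpus, keyed by made-up document names.
DOCS = {
    "a": "search search index",
    "b": "index of a book",
    "c": "string matching finds a pattern in text",
}

def boolean_match(docs, term):
    """String/boolean matching: a document either contains the term or not."""
    return sorted(d for d, text in docs.items() if term in text.split())

def ranked_retrieval(docs, query_terms):
    """IR-style retrieval: score each document, then order by score.

    The score is a raw term-frequency sum, chosen only for brevity;
    real systems use weighting schemes such as TF-IDF or BM25.
    """
    scores = {}
    for d, text in docs.items():
        counts = Counter(text.split())
        score = sum(counts[t] for t in query_terms)
        if score > 0:
            scores[d] = score
    # Highest score first; ties broken by document name.
    return sorted(scores, key=lambda d: (-scores[d], d))

print(boolean_match(DOCS, "index"))
print(ranked_retrieval(DOCS, ["search", "index"]))
```

Matching yields an unordered yes/no set, whereas retrieval yields an ordering that depends on the scoring function chosen.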
"Full" vs. "Free"
This article uses the terms "full text search" and "free text search" interchangeably. Is there a difference? — Preceding unsigned comment added by 76.14.24.156 (talk) 17:42, 5 November 2012 (UTC)
This article uses "full-text" and "full text" interchangeably as well. If there is a differentiation there it needs to be defined. — Preceding unsigned comment added by 24.67.190.178 (talk) 22:57, 4 August 2017 (UTC)
External links modified
Hello fellow Wikipedians,
I have just added archive links to one external link on Full text search. Please take a moment to review my edit. If necessary, add {{cbignore}} after the link to keep me from modifying it. Alternatively, you can add {{nobots|deny=InternetArchiveBot}} to keep me off the page altogether. I made the following changes:
- Added archive https://web.archive.org/20101223192214/http://www.lucidimagination.com:80/full-text-search to http://www.lucidimagination.com/full-text-search
When you have finished reviewing my changes, please set the checked parameter below to true to let others know.
This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).
- If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
- If you found an error with any archives or the URLs themselves, you can fix them with this tool.
Cheers.—cyberbot IITalk to my owner:Online 22:11, 9 January 2016 (UTC)
History section needed
This page needs a History section. Full-text search is a candidate for the algorithm with the most far-reaching effects on human activity so far in CS history.
Encyclopedant (talk) 04:38, 28 May 2017 (UTC)
- The history is at https://wiki.riteme.site/wiki/Information_retrieval#Timeline — Preceding unsigned comment added by Iamamz3 (talk • contribs) 17:59, 27 October 2020 (UTC)
Merge with Information retrieval
This should be merged with IR. Full-text search is merely an improvement over boolean keyword search that takes into account some meaning by using stemming or lemmatization. I think it sits on a continuum where at one end there is boolean keyword search and at the other end there is concept search. i⋅am⋅amz3 (talk) 00:46, 18 March 2018 (UTC)
- That sounds plausible, so I added merge tags in accordance with Wikipedia:Merging. --DavidCary (talk) 05:38, 7 October 2020 (UTC)
It sounds like IR is a more generalized description of retrieval; it might include voice, image, and other kinds of search. However, text is still the most dominant query method for now (until someday voice/image can be processed with the same ease and accuracy that keystrokes currently provide). For now, I think full-text search/text search is worth its own article given its dominance in everyday life. — Preceding unsigned comment added by Ben H Zhu (talk • contribs) 15:37, 23 January 2021 (UTC)
- Closing, with no merge, given the uncontested objection (based on the importance of full-text searching). Klbrain (talk) 12:18, 2 January 2022 (UTC)
Advertising
This edit adds the "AnyTXT Searcher" link which claims to be free and open source software but can apparently only be obtained as binary downloads for Windows platforms. No licensing information is given, even at the SourceForge project page. It looks like advertising to me. PaulBoddie (talk) 20:56, 22 July 2019 (UTC)
- I bet the person who added it saw it as “free software” and “OSS” and since they are free, but proprietary, they should be moved to “proprietary”. Kigelim (talk) 02:38, 23 July 2019 (UTC)
full text indexing
The phrase I actually searched on to get here is "full text indexing". "Information retrieval" has a musty 60s-era punch card feel to it.
That might seem silly, but you need to think about the way the audience thinks about the subject, it's not all about the way experts in the field approach it.