User:StanfordLinkPredictor
This user account is a bot that uses [1], operated by ashwinpp (talk). It is used to make repetitive automated or semi-automated edits that would be extremely tedious to do manually, in accordance with the bot policy. This bot does not yet have the approval of the community, or approval has been withdrawn or expired, and therefore shouldn't be making edits that appear to be unassisted except in the operator's or its own user and user talk space. Administrators: if this bot is making edits that appear to be unassisted to pages not in the operator's or its own userspace, please block it.
Emergency bot shutoff button
Administrators: Use this button if the bot is malfunctioning. (direct link)
Non-administrators can report a malfunctioning bot to Wikipedia:Administrators' noticeboard/Incidents.
This user is a bot | |
---|---|
(talk · contribs) | |
Operator | ashwinpp (talk · contribs) |
Flagged? | Yes |
Task(s) | Link a mention in a Wikipedia article to another Wikipedia article based on statistical inference from human navigational traces |
Edit rate | Max. 600 edits/month |
Edit period(s) | Requirement-based |
Automatic or manual? | Automatic |
Programming language(s) | Python |
Exclusion compliant? | No |
Source code published? | Yes |
Emergency shutoff-compliant? | Yes |
Task
This is a Wikipedia bot that inserts links between Wikipedia pages based on statistical inference from human navigational traces.
The job of this bot is to insert a link from a source page to a target page, given the mention in the source page that should link to the target page. The input is a tab-separated file. Because an article may change between the time a link is predicted and the time it is inserted, the bot is version-agnostic and works on a best-effort basis: it searches the current text of the source article for the mention. If the mention exists, the link is added at its first occurrence; otherwise no edit is made. The bot does not support specifying the location of the mention (for example, as the number of words preceding it), because that location is subject to change due to edits.
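The best-effort behaviour described above can be sketched in Python as follows. This is a minimal illustration, not the bot's published source code: the column order of the tab-separated input (source, mention, target), the function names, and the rule of skipping mentions that already sit inside a link or template are all assumptions made for the example.

```python
import csv
import io
import re

# Spans that should not be edited: existing wikilinks and templates.
# (Assumption for this sketch; the real bot may protect more constructs.)
PROTECTED = re.compile(r"\[\[.*?\]\]|\{\{.*?\}\}", re.DOTALL)


def parse_link_requests(tsv_text):
    """Yield (source, mention, target) tuples from tab-separated text.

    The three-column layout is hypothetical; malformed rows are skipped.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) == 3:
            yield tuple(field.strip() for field in row)


def add_link_at_first_mention(wikitext, mention, target):
    """Link the first unprotected occurrence of `mention` to `target`.

    Returns (new_wikitext, changed). If the mention is not found,
    the text is returned unchanged and `changed` is False.
    """
    protected = [m.span() for m in PROTECTED.finditer(wikitext)]
    pattern = re.compile(r"\b" + re.escape(mention) + r"\b")
    for m in pattern.finditer(wikitext):
        # Skip occurrences that overlap an existing link or template.
        if any(start < m.end() and m.start() < end
               for start, end in protected):
            continue
        if mention == target:
            link = "[[" + target + "]]"
        else:
            link = "[[" + target + "|" + mention + "]]"
        return wikitext[:m.start()] + link + wikitext[m.end():], True
    return wikitext, False
```

For example, calling `add_link_at_first_mention(text, "mill", "Watermill")` on a text containing the word "mill" twice wraps only the first occurrence in `[[Watermill|mill]]`. Searching the current text rather than a stored offset is what keeps the sketch independent of the article revision.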
The link prediction algorithm was developed in a research project that is part of a collaboration between Stanford University and the Wikimedia Foundation. The project page can be found here. A paper describing the algorithm and results is under submission to the World Wide Web Conference; if you would like a confidential preprint, please get in touch with Bob West.
Link Prediction Method
We propose a novel approach to identifying missing links on Wikipedia. We build on the fact that the ultimate purpose of Wikipedia links is to aid navigation. Rather than merely suggesting new links that are in tune with the structure of existing links, our method finds missing links that would immediately enhance Wikipedia’s navigability. We leverage a data set of navigation paths collected through a Wikipedia-based human-computation game called The Wiki Game, in which users must find a short path from a start article to a target article by clicking only links encountered along the way. We use these human navigational traces to identify a set of candidates for missing links and then rank the candidates according to various metrics. We further validate our predictions by recruiting human raters from Amazon Mechanical Turk and setting up an evaluation task that asks them to guess which links should exist in Wikipedia, based on the Linking Guidelines. Our evaluation (see above for how to obtain a preprint of the paper) shows that the links predicted by our method are of higher quality than those suggested by alternative methods.
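To make the candidate-and-rank idea concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the algorithm from the paper: it assumes a candidate is any ordered pair (source, target) where the target is reached later on the same navigation path and no direct link exists yet, and it ranks candidates by a single simple metric, the number of paths in which the pair occurs. The function name and the data layout are hypothetical.

```python
from collections import Counter


def rank_missing_link_candidates(paths, existing_links):
    """Rank candidate (source, target) pairs by path frequency.

    paths          -- list of navigation paths, each a list of page titles
    existing_links -- set of (source, target) pairs already linked
    Returns a list of ((source, target), count), most frequent first.
    """
    counts = Counter()
    for path in paths:
        pairs_in_path = set()
        for i, source in enumerate(path):
            for target in path[i + 1:]:
                if source == target:
                    continue
                if (source, target) in existing_links:
                    continue
                pairs_in_path.add((source, target))
        # Count each candidate at most once per path.
        counts.update(pairs_in_path)
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

A pair that many players traverse indirectly, such as reaching C from A through different intermediate pages, rises to the top of the list. Path frequency is only one possible ranking metric, chosen here because it is the simplest to state.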