Wikipedia:Bots/Requests for approval/William Avery Bot 3
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard. The result of the discussion was Approved.
Operator: William Avery
Time filed: 08:11, Wednesday, June 2, 2021 (UTC)
Function overview: Remove dead URLs and the associated {{Dead link}} tagging from CS1 templates if a free alternative is available via an identifier.
Automatic, Supervised, or Manual: Automatic
Programming language(s): Python (pywikibot and mwparserfromhell)
Source code available: unneededDeadlinksBot.py unneededDeadlinks_medicine.sh unneededDeadlinks.sh wikipythonics_util.py
Links to relevant discussions (where appropriate): WP:BOTREQ#Remove dead links from book and journal citation templates with identifiers - It was agreed to proceed only where free access is indicated. Other access levels and situations may need further discussion, and would have to be the subject of a further BRFA.
Edit period(s): one time run, with possible ad hoc repeats
Estimated number of pages affected: 111 pages for WikiProject Medicine, c. 1000 pages overall
Namespace(s): Mainspace/Articles
Exclusion compliant (Yes/No): Yes
Function details:
- Query the database for pages in Category:Articles with permanently dead external links (See SQL query in unneededDeadlinks.sh). I expect initial runs to be confined to WikiProject Medicine articles. (See unneededDeadlinks_medicine.sh)
- Each {{Deadlink}} present will be processed, but processing only proceeds if the fix-attempted parameter has a value of 'yes'.
- Using mwparserfromhell, the deadlink template's ancestor elements are examined to find a tag or other element likely to contain the affected citation.
- The affected citation is the sibling template element that precedes the deadlink tag within the identified ancestor. This and the preceding step have details that depend closely on the mwparserfromhell parse tree, and have been refined by scanning large samples of pages. For example, if there is a plain external link after the preceding template, then the dead link being tagged is that external link, and not a link in any preceding template, so no fix is possible.
- The candidate template is checked for a value in the url parameter. Processing only proceeds if there is a url, because editors sometimes mark a broken doi etc. with {{Deadlink}} rather than using the doi-broken-date parameter.
- The candidate template is checked for a value in the archive-url parameter. Processing only proceeds if there is not an archive-url. Presence of an archive-url should indicate that the link is fixed.
- A check is made to see if there are identifier parameters present that indicate free access. For details see WP:CS1#Access indicator for named identifiers. Under the scope of this request, processing will only proceed if free access is indicated and the access is unaffected by presence of a doi-broken-date or a pmc-embargo-date.
- If all the conditions are fulfilled, the url parameter is removed from the template along with any access-date parameter, and the {{Deadlink}} tagging is removed.
- Apply a general fix to remove template redirects using the rules at Wikipedia:AutoWikiBrowser/Template redirects. (I thought I could apply this fix prior to the above processing to simplify it, but some of the fixed templates were then removed, making the edit summary misleading.)
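The eligibility checks described in the steps above can be sketched roughly as follows. This is an illustration only, not the code in unneededDeadlinksBot.py: it models each template as a plain dict of parameter names to values instead of a mwparserfromhell node, the function names has_free_identifier and fix_citation are hypothetical, and it recognises only two free-access cases (a doi with doi-access=free, and a pmc) as an assumed subset of the indicators listed at WP:CS1. It also conservatively treats any doi-broken-date or pmc-embargo-date as disqualifying that identifier.

```python
def has_free_identifier(params):
    """Return True if an identifier in params indicates free access.

    Assumption: only doi (with doi-access=free) and pmc are considered here.
    """
    doi_free = (
        params.get("doi")
        and params.get("doi-access", "").strip().lower() == "free"
        and not params.get("doi-broken-date")
    )
    pmc_free = params.get("pmc") and not params.get("pmc-embargo-date")
    return bool(doi_free or pmc_free)


def fix_citation(citation, deadlink):
    """Return a copy of citation without url/access-date, or None if no fix applies.

    citation and deadlink are dicts of template parameters (hypothetical model).
    """
    # The dead link tag must record that a fix was already attempted.
    if deadlink.get("fix-attempted", "").strip().lower() != "yes":
        return None
    # Without a url, the tag probably refers to a broken identifier instead.
    if not citation.get("url", "").strip():
        return None
    # An archive-url should mean the link is already fixed.
    if citation.get("archive-url", "").strip():
        return None
    # Only proceed when a free alternative is indicated.
    if not has_free_identifier(citation):
        return None
    return {k: v for k, v in citation.items() if k not in ("url", "access-date")}
```

When fix_citation returns a dict, the caller would also remove the {{Deadlink}} tag itself; when it returns None, both templates are left untouched. Parameter aliases such as accessdate are not handled in this sketch.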
Test outputs:
- User:William Avery Bot/testsample medicine - list of all 111 citations under WikiProject Medicine identified as fixable.
- User:William Avery Bot/testsample all e - Sample list of 29 fixable citations from general articles beginning with the letter 'E'
- https://wiki.riteme.site/w/index.php?title=User%3AWilliam_Avery_Bot%2Fdeadlinkstest&type=revision&diff=1026339187&oldid=1026333752 - Test edit in userspace. Once a suitable case has been identified, the actual fix is rather simple.
Discussion
Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Primefac (talk) 23:13, 29 June 2021 (UTC)
- Trial complete. I have checked the edits made, and the free sources indicated by the reference template parameters are indeed available in all fifty cases, which was my main worry. Edits here.
- @Velayinosu, Ajpolino, and GreenC: Courtesy ping to original requesters and GreenC, who gave helpful advice. William Avery (talk) 11:34, 22 July 2021 (UTC)
- This bot is very helpful since tracking down these articles and making these edits manually would be quite time consuming. It's more limited in scope than I personally prefer but that's understandable since it's new. Maybe its scope can be broadened over time. In any case, thank you for making this bot. Velayinosu (talk) 01:17, 24 July 2021 (UTC)
Approved. Primefac (talk) 21:05, 22 August 2021 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at Wikipedia:Bots/Noticeboard.