Wikipedia:Bots/Requests for approval/GreenC bot 17
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Withdrawn by operator.
Operator: GreenC (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 16:08, Thursday, May 30, 2019 (UTC)
Automatic, Supervised, or Manual: supervised
Programming language(s): GNU Awk and WP:BOTWIKIAWK
Source code available: TBU
Function overview: If an article is already tagged with {{unreferenced}} (currently about 180k) and the bot determines a tagged article contains references/links (i.e. the tag might no longer be needed), add a new argument to the template, |status=hasaref. This will include the article in a tracking category. (It can also be set to |status=nobot, in which case the bot will leave it alone.)
It will use the tagbot system (User:GreenC bot/Job 16) so users can receive fresh intelligence on demand and act on it right away, without being overwhelmed by a giant, outdated list of articles from an old bot run. It will run perpetually, i.e. once all templated articles are checked, it will generate a new list and start over from the beginning.
Links to relevant discussions (where appropriate):
- User_talk:GreenC#List_of_unreffed_articles_with_ELs?
- Template_talk:Unreferenced#Adding_|status=_argument
Edit period(s): on-demand
Estimated number of pages affected: 1-20 per run
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): Yes
Function details: The tagbot system at Template:Cleanup bare URLs/bot documents how it works.
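As a rough illustration of the tagging step described in the function overview, the following Python sketch adds |status=hasaref to an {{unreferenced}} template when the page contains an inline <ref> tag, and skips templates that already carry a |status= value such as nobot. It is not the bot's source (which is GNU Awk and is not published in this request); the function name tag_if_referenced, the regular expressions, and the choice of <ref> tags as the only signal are assumptions made for the example.

```python
import re

# Hypothetical patterns for illustration; the bot's real detection rules
# are not given in this request.
UNREF_RE = re.compile(r"\{\{\s*unreferenced\s*(\|[^{}]*)?\}\}", re.IGNORECASE)
REF_TAG_RE = re.compile(r"<ref[\s>/]", re.IGNORECASE)
STATUS_RE = re.compile(r"\|\s*status\s*=", re.IGNORECASE)


def tag_if_referenced(wikitext: str) -> str:
    """Add |status=hasaref to {{unreferenced}} if the text has a <ref> tag."""
    match = UNREF_RE.search(wikitext)
    if match is None:
        return wikitext  # page is not tagged with {{unreferenced}}
    params = match.group(1) or ""
    if STATUS_RE.search(params):
        return wikitext  # status already set (e.g. nobot or hasaref)
    if REF_TAG_RE.search(wikitext) is None:
        return wikitext  # no inline reference found; leave the tag as is
    # Re-emit the template with the new argument appended before "}}".
    new_template = match.group(0)[:-2] + "|status=hasaref}}"
    return wikitext[:match.start()] + new_template + wikitext[match.end():]


if __name__ == "__main__":
    page = "{{unreferenced|date=May 2019}}\nSome text.<ref>A book</ref>"
    print(tag_if_referenced(page))
```

A page without any <ref> tag, or one whose template is set to |status=nobot, is returned unchanged by this sketch.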
Discussion
@Ajpolino: to notify of new BRFA. -- GreenC 16:19, 30 May 2019 (UTC)
- {{OperatorAssistanceNeeded}} at least until the Template:Unreferenced discussion is completed and this new parameter is accepted. Reason why: such edits are not currently appropriate if made by a human editor, so they can't yet be made by a bot. -- xaosflux Talk 20:30, 30 May 2019 (UTC)
- Once this is resolved, please deactivate the OperatorAssistanceNeeded tag and link to the result. -- xaosflux Talk 20:30, 30 May 2019 (UTC)
- Discussion. No one responded, so I added the tracking category after determining the scheme. The page has been notified about the BRFA. -- GreenC 14:34, 1 June 2019 (UTC)
- Approved for trial (60 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. (3 runs). Primefac (talk) 13:25, 15 June 2019 (UTC)
The bot is ready for testing. @Ajpolino: Would you be willing to run the bot through the trial period? You can run the bot from the control page User:GreenC bot/Job 17/bot. Set BOT=RUN 1 to tag 1 article (or up to 20). The article will appear in the tracking category. Then determine what should be done, including removing or replacing the template, or setting it to |status=nobot. Once it has completed 60 edits, it will stop working and we can notify Primefac that the trial is done. -- GreenC 18:52, 25 June 2019 (UTC)
- @GreenC: Ok, went through the first batch of 20. It looks like the bot mostly works as intended. For a couple of them, I'm a little unclear on what bot logic led to their inclusion. For 1, 2, and 3 is the empty "References" section sufficient to trigger the bot? For 4, I can't find a link on the page; did the "Notes" section trigger its inclusion? Will get to fixing articles/changing template parameters and running more sometime in the next couple of days. Thanks!! Ajpolino (talk) 20:33, 25 June 2019 (UTC)
- @Ajpolino: It seems like most of the 20 are false positives or marginal cases. We probably need to narrow the criteria to the existence of inline refs; those are for certain 'hasaref' (e.g. Kurjey Lhakhang). Let me know what you think. This is an experiment; we didn't know what the data would reveal. I'm guessing the noref bot had a lot of false negatives by design (the 'conservative' approach of silently skipping articles it wasn't sure of), so reversing the algo ends up with a lot of false positives. So maybe just a simple algo that flags articles that have inline refs would be accurate. -- GreenC 21:48, 25 June 2019 (UTC)
- @GreenC: That would be great! Since the goal here is to find the low-hanging fruit, that should suffice. Is there a simple way to just search for all articles in Category:All articles lacking sources that also have at least one <ref> tag? If lots of articles meet that criterion that may be a good place to start, and we can revisit the bot when/if we need more sophisticated filtering. Thanks for the time you've put into this! Ajpolino (talk) 21:52, 25 June 2019 (UTC)
- The algo can be changed while retaining the tagbot framework. I estimate there might be about 10k to 20k cases. That would be too much to list on a single page and probably demoralizing after fixing a few hundred. Tagbot doles it out in bite-size pieces to whoever wants to contribute whenever they have time or desire. It is also perpetual: once it reaches the end it starts over and rechecks. We can modify the parameters, like the limit of 20, or the requirement for 5 in the tracking category, etc., whatever you like. It can even have no limit, so that it tags everything at once, but the idea is to encourage fixing the problem before taking on more. -- GreenC 00:19, 26 June 2019 (UTC)
- I'm sorry, I meant: for a first pass, could the bot just look for <ref> tags, and once that's exhausted maybe we could revisit the algorithm to find more easy references (external links sections, et al.)? But there might be a few thousand that have obvious references flanked by <ref> tags, which would be the lowest-hanging fruit. I'm no search expert, but I think(?) this search may be giving me that set of articles? If that's true, we can work on this list, and come back to you when/if we ever make it through this... Ajpolino (talk) 14:31, 26 June 2019 (UTC)
- Yes, that looks roughly right, close to 10k results. The tagbot framework was meant to make cleanup more manageable for large numbers of articles that contain <ref> tags, but if you prefer to work off a search result instead that is fine, in which case this BRFA is not needed. Or if you want a plain list of article titles for importing into AWB, let me know. -- GreenC 15:24, 26 June 2019 (UTC)
- Super! Thank you very much for your time. If I ever make it through that search result list, I'll let you know! Thanks again! Ajpolino (talk) 16:15, 26 June 2019 (UTC)
Withdrawn by operator. -- GreenC 17:45, 26 June 2019 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.