Wikipedia:Bots/Requests for approval/HersfoldCiteBot
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Withdrawn by operator.
Operator: Hersfold (talk · contribs)
Automatic or Manually assisted: Manual login, Automatic editing with supervision. In case of problems, a log is saved locally on my computer during each run.
Programming language(s): Java, with User:MER-C/Wiki.java
Source code available: Yes, at User:HersfoldCiteBot/Source. Details of changes between versions available at User:HersfoldCiteBot/Version.
Function overview: Correcting basic but common errors in {{cite web}} templates.
Links to relevant discussions (where appropriate): Request on WP:BOTREQ (permalink). Task should be non-controversial.
Edit period(s): No more than daily, more likely once every week or so.
Estimated number of pages affected: As of the time I write this there are 28 articles in Category:Articles with broken citations; I'd guess the bot would make edits to about 25 articles per run at the most.
Exclusion compliant (Y/N): No; however, it seems unnecessary given that only 14 articles contain the no bots template.
Already has a bot flag (Y/N): No.
Function details: This bot will go through Category:Articles with broken citations and correct common errors in {{cite web}} templates; currently, missing |title= parameters, missing |archivedate= parameters when an |archiveurl= parameter is present, and missing |accessdate= parameters. If the bot attempts to correct one of these errors and finds it is not able to for some reason, it will report areas needing manual attention to User:HersfoldCiteBot/Citation errors needing manual review.
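As a rough illustration of the checks listed in the function details, the following Java sketch parses a single {{cite web}} call and reports which of the three problems apply. It is not taken from the bot's published source; the class name, method names, and problem labels are hypothetical, and the parser is deliberately naive (it splits on every pipe, so nested templates or piped wikilinks inside a parameter would confuse it).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CiteWebCheckSketch {
    /** Naively parses "{{cite web|a=b|c=d}}" into a name-to-value map. */
    static Map<String, String> parseParams(String template) {
        Map<String, String> params = new LinkedHashMap<>();
        String t = template.trim();
        if (!t.startsWith("{{") || !t.endsWith("}}") || t.length() < 4) {
            return params; // not a template call; nothing to parse
        }
        String[] parts = t.substring(2, t.length() - 2).split("\\|");
        for (int i = 1; i < parts.length; i++) { // parts[0] is the template name
            int eq = parts[i].indexOf('=');
            if (eq < 0) {
                continue; // unnamed parameter, ignored in this sketch
            }
            params.put(parts[i].substring(0, eq).trim(),
                       parts[i].substring(eq + 1).trim());
        }
        return params;
    }

    /** True when the parameter is absent or has an empty value. */
    static boolean isBlank(Map<String, String> params, String name) {
        String value = params.get(name);
        return value == null || value.isEmpty();
    }

    /** Lists which of the three problems from the function details apply. */
    static List<String> findProblems(String template) {
        Map<String, String> p = parseParams(template);
        List<String> problems = new ArrayList<>();
        if (isBlank(p, "title")) {
            problems.add("missing title");
        }
        if (!isBlank(p, "archiveurl") && isBlank(p, "archivedate")) {
            problems.add("missing archivedate");
        }
        if (isBlank(p, "accessdate")) {
            problems.add("missing accessdate");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(findProblems("{{cite web|url=http://example.com}}"));
    }
}
```

A real implementation would also need to locate the templates within the article text and handle parameter values that themselves contain pipes.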
Discussion
A sample "Citation errors needing manual review" page is available here. Hersfold (t/a/c) 06:11, 10 September 2010 (UTC)
- It would be interesting to know how many citations you encounter with |archiveurl= or |archivedate= missing. — HELLKNOWZ ▎TALK 08:51, 10 September 2010 (UTC)
- Also, I think "<!-- Title generated by HersfoldCiteBot, please report errors to [[User talk:Hersfold]] -->" is a bit too long for an auto-generated message if you will use it for live edits. Can you use "<!-- Bot generated title -->" for greater compatibility with tools that may look for this? — HELLKNOWZ ▎TALK 08:59, 10 September 2010 (UTC)
- For your first question/statement, are you asking for the bot to keep statistics? Just so you know, it doesn't currently look for a missing archiveurl parameter, only a missing archivedate if archiveurl is present.
- I can make the change to the shorter comments before trial runs. Hersfold (t/a/c) 17:12, 10 September 2010 (UTC)
- I only mentioned keeping statistics since it would be interesting to see. You don't actually need to do so unless you want to. As for shortening the comment: great. — HELLKNOWZ ▎TALK 18:58, 10 September 2010 (UTC)
- As |accessdate= is not mandatory and does not cause an article to enter Category:Articles with broken citations, what exactly does the bot edit this for? Or is this just an additional bonus? --Muhandes (talk) 23:08, 11 September 2010 (UTC)
- Bonus, I guess? I put that in as a "while I'm at it" sort of thing. In the event the link does go dead, this may help provide a rough guideline of when the site was last up (especially so if the bot adds a title) to make finding an archive a little easier. Hersfold (t/a/c) 04:08, 12 September 2010 (UTC)
- If the url is dead, what (if anything) will the bot put for an accessdate assuming that parameter is missing? And if it does add an access date in this situation, how will it determine when the url was last not dead? --Rockfang (talk) 15:21, 13 September 2010 (UTC)
- Correct me if I'm wrong, but I thought the bot does not archive the dead links or provide accessdates? In any case there has not been (VP 1, VP 2) consensus on bots filling accessdates. May I ask the operator to briefly point out what the "common citation errors" tasks are? Are these only the ones that generate errors? — HELLKNOWZ ▎TALK 15:29, 13 September 2010 (UTC)
- @H3llkn0wz - Archiving dead links would be pointless. The archived links would still be dead links. Also, you were incorrect when you typed "...but I thought the bot does not...provide accessdates." The operator lists what the errors are in the "Function details" section. Missing accessdate is one of them. I can't think of any reason why a cite template with a url parameter being used shouldn't have an accessdate. --Rockfang (talk) 15:51, 13 September 2010 (UTC)
- By "archiving dead links" I meant "providing an URL/link to an archived copy for a given URL/link before it was dead (usually based on accessdate)". Regarding |accessdate= mentioned in the function details: my bad, I thought Hersfold used the same description as in BOTREQ. In that case, same question as Rockfang: how does the bot determine the access date? — HELLKNOWZ ▎TALK 16:11, 13 September 2010 (UTC)
- The bot uses the current date, although now reading the discussion here, I'm going to add some code to verify that the site is in fact up before it attempts to do so. If the link is dead (or the bot is unable to access it for whatever reason) I'll have it flag the link for manual review and avoid adding an accessdate. Note also that for the archivedate, the bot does not attempt to access web.archive.org - it simply pulls the date out from the timestamp embedded in the archiveurl. In that case, I'm assuming that archive.org is going to hang around for a good long while. Hersfold (t/a/c) 16:37, 13 September 2010 (UTC)
- Oh, and the bot does not generate archive links. If a site is dead, it will simply say "Hey, this is dead" and leave it at that. The only reason the bot does anything with the archive parameters is if there is an archiveurl parameter and there isn't an archivedate parameter. Hersfold (t/a/c) 16:38, 13 September 2010 (UTC)
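The archivedate step described above (reading the date from the timestamp embedded in the archiveurl, without contacting web.archive.org) might look roughly like the sketch below. This is an illustration, not the bot's code: the class and method names are hypothetical, the URL layout assumed is the usual Wayback Machine form web.archive.org/web/YYYYMMDDhhmmss/original-url, and the "10 September 2010" output style is an assumption about how the date would be written into the template.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ArchiveDateSketch {
    // Captures the YYYYMMDD prefix of the timestamp path segment;
    // any time-of-day digits or trailing modifiers are skipped.
    private static final Pattern STAMP =
            Pattern.compile("web\\.archive\\.org/web/(\\d{8})\\d{0,6}[^/]*/");

    // Output style is an assumption, e.g. "10 September 2010".
    private static final DateTimeFormatter OUTPUT =
            DateTimeFormatter.ofPattern("d MMMM uuuu", Locale.ENGLISH);

    /** Returns the archive date, or empty if no valid timestamp is found. */
    static Optional<String> archiveDate(String archiveUrl) {
        Matcher m = STAMP.matcher(archiveUrl);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            LocalDate date = LocalDate.parse(m.group(1), DateTimeFormatter.BASIC_ISO_DATE);
            return Optional.of(date.format(OUTPUT));
        } catch (DateTimeParseException e) {
            return Optional.empty(); // eight digits that are not a real date
        }
    }

    public static void main(String[] args) {
        // prints "10 September 2010"
        System.out.println(archiveDate(
                "http://web.archive.org/web/20100910061100/http://example.com/").orElse("(none)"));
    }
}
```

Returning an empty result rather than guessing matches the stated behaviour of flagging anything the bot cannot fix for manual review.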
- Current date for accessdate is certainly wrong; it should be the date when the citation/reference was added (see the two VP links above). This requires looking through revisions with some special cases in mind. In what cases do you want to add accessdate: when retrieving the |title= value? — HELLKNOWZ ▎TALK 16:45, 13 September 2010 (UTC)
- Looking through revisions may be impractical given this bot's intended task; I could still add it if I have to access the site anyway for the title parameter, as presumably if the link still works, the content we're looking for is still there or can be found within a few clicks. Would that work, or is it better to just remove that entirely? Hersfold (t/a/c) 17:08, 13 September 2010 (UTC)
- Looking through revisions with binary search is quite fast in most cases; browsing other sites is generally slower. Also, even if you access the site, you cannot be certain that the content hasn't changed since the original access; that is the whole purpose of the access date. I agree that in most cases there is only a small chance that it has, but this is not for the bot to decide. — HELLKNOWZ ▎TALK 18:53, 13 September 2010 (UTC)
- Hmm. We'll see. For now I'll just disable that module, but I'll look into adding the binary search code later. Hersfold (t/a/c) 23:56, 13 September 2010 (UTC)
- Do note that it would still need consensus, which has not been reached in the previous attempt. — HELLKNOWZ ▎TALK 00:00, 14 September 2010 (UTC)
- So noted; I'll make sure the code remains disabled in the live bot until it's well tested and approved. Are there any other comments you had? Hersfold (t/a/c) 16:14, 14 September 2010 (UTC)
- Nope, have fun :) — HELLKNOWZ ▎TALK 16:16, 14 September 2010 (UTC)
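The binary search over revisions mentioned in this thread could be sketched as follows. This is only an illustration under a strong assumption: once the cited URL appears in a revision it stays in every later revision. Reverts, vandalism, and removed-then-restored citations (the "special cases" raised above) break that assumption and would need separate handling. The class and method names are hypothetical, and revisions are represented as an in-memory list of page texts, oldest first, rather than being fetched through any wiki API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

class FirstRevisionSketch {
    /**
     * Returns the lowest index in [0, count) where the predicate holds, or -1
     * if it does not hold for the newest revision. Assumes the predicate is
     * monotonic: false for all older revisions, true from some point onward.
     */
    static int firstIndexWhere(int count, IntPredicate present) {
        if (count == 0 || !present.test(count - 1)) {
            return -1;
        }
        int lo = 0;
        int hi = count - 1; // invariant: the predicate holds at hi
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (present.test(mid)) {
                hi = mid;     // first occurrence is at mid or earlier
            } else {
                lo = mid + 1; // first occurrence is after mid
            }
        }
        return lo;
    }

    /** Index of the oldest revision whose text contains the URL, or -1. */
    static int firstRevisionContaining(List<String> revisionTexts, String url) {
        return firstIndexWhere(revisionTexts.size(),
                i -> revisionTexts.get(i).contains(url));
    }

    public static void main(String[] args) {
        List<String> history = Arrays.asList(
                "stub", "stub, expanded", "more text",
                "text {{cite web|url=http://example.com}}",
                "text {{cite web|url=http://example.com}} and more");
        // prints 3
        System.out.println(firstRevisionContaining(history, "http://example.com"));
    }
}
```

With n revisions this inspects about log2(n) of them, which is why the approach is described as fast; the timestamp of the revision found would then be the candidate access date, subject to the consensus question noted above.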
I've made the changes in the code as requested in version 1.1.0b, specifically:
- The |accessdate= code is commented out and no longer functions.
- The comments the bot leaves are now <!-- Bot generated title --> for the default "go-to-website-and-grab-page-title" case, and <!-- Bot generated title --><!-- HCB assumed title --> for the case where people mess up the parameter and leave it as {{cite web|url=http://www.link.com page title}} (similar to the [link title] syntax). I'm keeping these separate because it seems more likely that I'll have significant bugs with the second case; people are more-or-less used to the first by now, and by leaving the "Bot generated title" comment entirely intact, it shouldn't interfere with other tools.
Any other comments or suggestions (especially from BAG members, who don't seem to have commented yet) are welcome. Hersfold (t/a/c) 23:39, 16 September 2010 (UTC)
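For the "assumed title" case described above, where the url parameter holds both a link and a title in the style of [link title], a minimal sketch of the split might look like this. The class and method names are hypothetical and the bot's actual heuristics are not shown in this discussion; the sketch simply treats everything after the first run of whitespace as the title and refuses to guess when the value does not start with an http or https URL.

```java
import java.util.Optional;

class AssumedTitleSketch {
    /**
     * Splits a url parameter value such as "http://www.link.com page title"
     * into { url, title }. Returns empty when there is no trailing text or
     * the first token does not look like an http(s) URL.
     */
    static Optional<String[]> splitUrlAndTitle(String urlValue) {
        String[] parts = urlValue.trim().split("\\s+", 2);
        if (parts.length < 2 || !parts[0].matches("(?i)https?://\\S+")) {
            return Optional.empty();
        }
        return Optional.of(new String[] { parts[0], parts[1].trim() });
    }

    public static void main(String[] args) {
        splitUrlAndTitle("http://www.link.com page title").ifPresent(
                r -> System.out.println("url=" + r[0] + " | title=" + r[1]));
    }
}
```

Because this guess is more error-prone than fetching the page title, marking such edits with the extra <!-- HCB assumed title --> comment makes them easy to find and review later.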
- Just a side note; I'd be keen to incorporate these functionalities (when operational) into Citation bot, if this is feasible or beneficial. Martin (Smith609 – Talk) 16:27, 20 September 2010 (UTC)
- You're welcome to the source code - it's not in PHP, but it should still help. Hersfold (t/a/c) 18:01, 20 September 2010 (UTC)
{{BAG assistance needed}} - it's been nine days now, wondering if I can start trials? Hersfold (t/a/c) 15:20, 19 September 2010 (UTC)
- Approved for trial (75 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Sorry for the delay. Feel free to proceed with a trial (without the controversial |accessdate= code) whenever you're ready. - EdoDodo talk 17:41, 23 September 2010 (UTC)
- I've done one trial, but the bot didn't make any edits; there's an issue, likely with the bot framework, that prevents the bot from noticing the error messages the citation templates generate, and thus from realizing something needs to be fixed. I've put the log up here, not that it's much to look at. Since the bot failed to make any edits, I'm going to try to fix this and then run it again. Hersfold (t/a/c) 23:18, 23 September 2010 (UTC)
- It would be nice if your edit summary linked to the task description. — HELLKNOWZ ▎TALK 00:37, 24 September 2010 (UTC)
- I hope you don't think of me as picking on your bot; I support your work. I'm just noting that most bots leave a link to the task description unless it is self-explanatory or straightforward. — HELLKNOWZ ▎TALK 00:49, 24 September 2010 (UTC)
- Not at all, I appreciate the feedback; I just feel as though this is one of the more straightforward ones. But I can add a link in the next revision. Which, on a side note, will probably take a while to come out as there are a lot of little problems to be fixed. Hersfold (t/a/c) 00:58, 24 September 2010 (UTC)
- I'll be attempting another trial run shortly; the bot has been updated to (hopefully) fix the problems noted in the previous run. Hersfold (t/a/c) 00:11, 29 September 2010 (UTC)
- Just as an update, I haven't forgotten about this; the past week has been extremely busy for me and I haven't had the time to focus on this at all. I'll post back here once things lighten up and I'm able to fix the errors noticed in the last run. Hersfold (t/a/c) 21:28, 7 October 2010 (UTC)
{{OperatorAssistanceNeeded|D}}
Any progress? Anomie⚔ 01:05, 11 November 2010 (UTC)
- I've not put any effort into this since my last post. This semester is extremely busy for me. If this can be put on hold or even declined for now, the earliest I can say with any certainty that I'll be able to dedicate a significant amount of time to it is January. Hersfold (t/a/c) 03:53, 17 November 2010 (UTC)
- Withdrawn by operator. Ok. If you come back to it in January, you can reopen this request by undoing this edit, or you can start a new one if you think things have changed sufficiently. Anomie⚔ 03:56, 17 November 2010 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.