Wikipedia:Bots/Requests for approval/BattyBot 25
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: GoingBatty
Time filed: 19:00, Sunday December 1, 2013 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): AutoWikiBrowser
Source code available: AWB + User:BattyBot/CS1 errors-dates
Function overview: Fix incorrect date formats in citation templates to remove articles from Category:CS1 errors: dates.
Links to relevant discussions (where appropriate):
Edit period(s): Frequent runs
Estimated number of pages affected: Thousands
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): Yes
Function details: There are over 100,000 articles currently in Category:CS1 errors: dates. This bot task will use regex to find and replace incorrect date formats in citation templates to remove red errors displayed to readers and/or remove articles from Category:CS1 errors: dates. It will also perform any additional AWB general fixes. Examples:
- Remove {{Start date}} from citation templates - example
- Remove extraneous parentheses - example
- Remove extraneous commas - example
- Convert yyyy-mm to Mmmmm yyyy - example
- Convert periods to commas - example
- Comment out times - example
- Change months in foreign languages to English - example
- Change date format from "23rd" to "23" - example
- Remove day of the week - example
- Expand abbreviated month names - example
- Remove &nbsp; - example
- Remove extraneous text - example
- Add missing dashes in dates - example
- Consolidate |date=, |month=, |year= into |date= - example
This bot will not be able to fix all potential errors, but it should resolve the common issues so that editors can focus on manually fixing the remaining articles. Additional regexes may be added later to fix other issues.
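For readers less familiar with this kind of rule, here is a minimal Python sketch of a few of the listed fixes: removing &nbsp;, removing the day of the week, changing "23rd" to "23", and converting yyyy-mm to Mmmmm yyyy. It is an illustration under stated assumptions, not the bot's code: the actual rules are AWB find-and-replace settings posted at User:BattyBot/CS1 errors-dates, and the names RULES, fix_date_value and yyyy_mm_to_month_year are hypothetical. Each rule is applied to the value of a single date parameter, not to raw wikitext.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hypothetical (pattern, replacement) pairs applied to one date parameter value.
RULES = [
    # "&nbsp;" inside a date becomes an ordinary space
    (re.compile(r"&nbsp;"), " "),
    # Leading day of the week: "Sunday, 1 December 2013" -> "1 December 2013"
    (re.compile(r"^(?:Mon|Tues?|Wed(?:nes)?|Thu(?:rs)?|Fri|Sat(?:ur)?|Sun)(?:day)?\.?,?\s+"), ""),
    # Ordinal suffix after a day number: "23rd" -> "23"
    (re.compile(r"\b(\d{1,2})(?:st|nd|rd|th)\b"), r"\1"),
]


def yyyy_mm_to_month_year(value: str) -> str:
    """Convert a bare 'yyyy-mm' value to 'Month yyyy'; leave anything else alone."""
    m = re.fullmatch(r"(\d{4})-(0[1-9]|1[0-2])", value)
    if m:
        return f"{MONTHS[int(m.group(2)) - 1]} {m.group(1)}"
    return value


def fix_date_value(value: str) -> str:
    """Apply every rule in order, then the yyyy-mm conversion."""
    for pattern, replacement in RULES:
        value = pattern.sub(replacement, value)
    return yyyy_mm_to_month_year(value.strip())
```

A value that matches none of the rules, such as "2013-13", comes back unchanged and stays in the error category for a human.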
Discussion
I picked one "random" entry out of the cat. What would AWB do to (48639) 1995 TL8? Josh Parris 09:51, 2 December 2013 (UTC)
- @Josh Parris: - It would remove "last obs" from the |date= parameter - see my "Remove extraneous text" example above. GoingBatty (talk) 06:37, 4 December 2013 (UTC)
- I don't think that would improve the article. Josh Parris 06:39, 4 December 2013 (UTC)
- @Josh Parris: - I can remove that rule from the bot, if you like. I've started a conversation at Help_talk:Citation Style 1#Additional text in date field to discuss it further. GoingBatty (talk) 00:54, 6 December 2013 (UTC)
- I asked the author of that edit, and didn't get a definitive answer. I believe Kheider said that the last obs was a clarifying qualifier (necessary for precision).
- If you're willing to remove that aspect of the task, I'd be happier. Josh Parris 01:47, 6 December 2013 (UTC)
This and other CS1-fixing bots are very much needed. GoingBatty, I would very much like to work with you to refine the operation of this bot and similar bots that may be able to fix other categories. I will be happy to proofread this bot's trial edits.
I will post here some suggestions for additional patterns that this bot may be able to fix:
1. Fix valid date (e.g. February 2001) in |year=, if |date= is not already present; change |year= to |date=. See example
2. Add missing zero in YYYY-MM-D or YYYY-M-DD or YYYY-M-D date. See example
3. Fix unambiguous dates in MM/DD/YYYY or DD/MM/YYYY format (i.e. one and only one of the first two numbers is greater than 12).
4. Fix unambiguous dates in MM-DD-YYYY or DD-MM-YYYY format. See example
5. Fix erroneous dates in YYYY-DD-MM format, converting to YYYY-MM-DD. See example
6. Replace all manner of dashes in YYYY-MM-DD dates with hyphens. See example
7. Get rid of {{date}} used in |date=. See example
8. Move "reprint" or "(reprint)" or similar text (maybe "last obs" and similar?) to |type= if |type= does not already exist. See example
9. Remove extraneous zeroes from YYYY-MM-DD format. See example
10. Remove "by XXXX" from |archivedate=. See example
11. Add missing comma to MMM DD, YYYY format. See example
12. Convert YYYY MMM DD to valid format. See example
13. Convert YYYY MMM to valid format. See example
I should be able to come up with more. Will you be creating a Talk page where people can post bug reports and questions about the bot's edits?
Will this bot operate on all date-holding parameters that are checked by the CS1 module?
Will this bot operate only in Article space? I recommend that it not operate in Template space, for various reasons. – Jonesey95 (talk) 05:29, 6 December 2013 (UTC)
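The unambiguous-date suggestions above rest on one test: exactly one of the first two numbers is greater than 12. A minimal Python sketch of that test follows; the function name is hypothetical and nothing here comes from the bot. It deliberately stops at returning (year, month, day), since choosing between MDY, DMY and YYYY-MM-DD output is a separate decision.

```python
import datetime
import re


def unambiguous_numeric_date(value):
    """Return (year, month, day) for an NN/NN/YYYY or NN-NN-YYYY value only when
    exactly one of the first two numbers is greater than 12; otherwise None."""
    # \2 forces both separators to be the same character ("/" or "-")
    m = re.fullmatch(r"(\d{1,2})([/-])(\d{1,2})\2(\d{4})", value.strip())
    if not m:
        return None
    first, second, year = int(m.group(1)), int(m.group(3)), int(m.group(4))
    if (first > 12) == (second > 12):
        return None  # both <= 12 is ambiguous; both > 12 cannot be a date
    month, day = (second, first) if first > 12 else (first, second)
    try:
        datetime.date(year, month, day)  # rejects month 0, day 0, 31 June, etc.
    except ValueError:
        return None
    return year, month, day
```

For instance, 6/13/1975 yields (1975, 6, 13), while 6/11/1975 yields None and would stay for a human, which is the situation Jc3s5h raises further down.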
- @Jonesey95: - I converted your bullets above to numbers for ease of conversation.
- #8 does not seem appropriate, based on my interpretation of Template:Cite book#Title. #9 would not be appropriate for a bot per Trappist's comment below. Some of the others are covered by AWB's general fixes, which this bot would also use. Therefore, I'd prefer to run the bot through the category once, analyze what's left, and then determine which rules I should add.
- I already have User talk:GoingBatty for questions and suggestions, and User talk:BattyBot for bug reports.
- The rules are currently running on |date=, |accessdate=, |archivedate=, and |year=.
- For now, the bot will only operate in Article space. I suggest that templates should not be included in Category:CS1 errors: dates. Any other namespaces included in the category would be considered on a case by case basis. GoingBatty (talk) 18:27, 6 December 2013 (UTC)
- Re Editor Jonesey95's item 8, I think that using |type= for reprint is not much different from the default use of |type= by {{cite thesis}}, {{cite speech}}, {{cite techreport}}. If it is important to note that the cited work is a reprint (I'm not persuaded that it is, but some editors apparently think so) then the best place to note that is in |type= where it is harmlessly displayed. If such text remains in |date= then the resulting COinS metadata for the citation is corrupted.
- Module:Citation/CS1 does not categorize errors of any kind found in User, Talk, User talk, Wikipedia talk, File talk, Template talk, Help talk, Category talk, Portal talk, Book talk, Education Program talk, Module talk, or MediaWiki talk.
- Thanks for the numbering. I am perfectly OK with skipping or postponing any of the above suggestions. I agree with the idea of having the bot make a pass through the category with as many unobjectionable fixes as possible, after which we'll see how many oddball errors are left. Let me know how I can help this bot be successful. As for the Template space, that discussion should happen elsewhere. – Jonesey95 (talk) 20:51, 6 December 2013 (UTC)
- @Trappist the monk: - Thanks for your comments about |type= - I'll consider adding this in the future. - GoingBatty (talk) 22:22, 6 December 2013 (UTC) (signature added by Jonesey95)
The deprecation of the month parameter is questionable. Why fix something that isn't broken? Boghog (talk) 07:30, 6 December 2013 (UTC)
- @Boghog: - I only want to fix things that are broken. Specifically, if a reference has |date=, |month=, and |year= specified, only the value in the |date= field is displayed. (See Old revision of 102d Intelligence Wing references 12 and 25 for an example.) GoingBatty (talk) 18:36, 6 December 2013 (UTC)
- Per GoingBatty, I am also not proposing that this bot modify month/year pairs that display properly. – Jonesey95 (talk) 20:51, 6 December 2013 (UTC)
- Thanks GoingBatty and Jonesey95 for the clarification (and I apologize for not looking at the example more carefully). I now support the proposed bot edit. Boghog (talk) 00:15, 7 December 2013 (UTC)
- @Boghog: - Per Jc3s5h's comments below, the bot won't change this, because it won't know whether to change |date=15|month=April|year=2000 to "15 April 2000" or "April 15, 2000" or "2000-04-15" or something else. GoingBatty (talk) 00:23, 7 December 2013 (UTC)
Removing extraneous digits from a date is likely something best left to human eyes because which of the several digits is the wrong digit can't always be determined by simple inspection. I too am interested in this bot both for what it will be doing and because I have just started using AWB, the documentation for which leaves much to be desired. I would be interested in seeing the code for this bot, is it available for viewing?
- —Trappist the monk (talk) 11:32, 6 December 2013 (UTC)
- @Trappist the monk: - I agree that removing extraneous digits is not a good bot task. I've posted the AWB settings at User:BattyBot/CS1 errors-dates. GoingBatty (talk) 18:48, 6 December 2013 (UTC)
- Excellent, thank you.
- Upon re-reading this thread, it occurs to me that the bot still might be able to remove some extraneous digits. I propose that any instance of '00[1-9]' found in the MM or DD portion of a date should always be changed to remove the first zero. I can't think of a counter-example and am OK with being wrong if there is one. I agree that my example linked above, where DD='016', could possibly have been meant as DD='01', '06', or '16', so we shouldn't just go removing all leading zeros. – Jonesey95 (talk) 06:24, 13 December 2013 (UTC)
- @Jonesey95: - Thanks for the suggestion - I'll work on adding this to the bot. GoingBatty (talk) 18:11, 13 December 2013 (UTC)
- @Jonesey95: - I've added a rule for fixing this - thanks! GoingBatty (talk) 17:15, 14 December 2013 (UTC)
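The '00[1-9]' rule can be sketched in Python as below. This is an assumed, stand-alone version; the pattern and function names are hypothetical, not taken from the bot's module. Only an MM or DD written as 00 followed by 1-9 loses its first zero, and any other three-digit part (such as the '016' example) is left untouched.

```python
import re

# A value shaped like YYYY-MM-DD, allowing one extra digit in MM or DD.
ISO_LIKE = re.compile(r"\d{4}-\d{2,3}-\d{2,3}")
# "00N" (N = 1-9) in the MM or DD position; the lookarounds pin it to those slots.
EXTRA_ZERO = re.compile(r"(?<=-)0(0[1-9])(?=-|$)")


def drop_extra_leading_zero(value):
    """Turn 2010-008-15 into 2010-08-15, but leave 2010-08-016 alone,
    because 016 might have been meant as 01, 06 or 16."""
    if not ISO_LIKE.fullmatch(value):
        return value
    return EXTRA_ZERO.sub(r"\1", value)
```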
I object to fixing unambiguous date-order forbidden formats or errors, for example, changing 6/13/1975 to June 13, 1975. One problem is that {{use dmy dates}} and related templates only apply to the article body, not the citations, so there is no way for the bot to know which format to change to. A more serious problem is that if there is one incorrect/forbidden format date, there are probably more that the bot can't fix (example: 6/11/1975). By leaving the errors/forbidden format, we alert human editors to the problem, and leave clues for human editors as to what the correct value of the ambiguous dates might be. Jc3s5h (talk) 18:20, 6 December 2013 (UTC)
- @Jc3s5h: - Sorry for taking so long to reply. I've removed any of my regex rules that require a decision on which date format to use. For example, it won't combine |date=, |month=, and |year= because a human would need to decide whether to use MDY or DMY or YYYY-MM-DD. Thanks! GoingBatty (talk) 14:12, 11 December 2013 (UTC)
Will this bot run only on AWB code, or are you using a personal module or something similar? -(t) Josve05a (c) 00:08, 7 December 2013 (UTC)
- @Josve05a: - It will use lots of find and replace rules in addition to AWB's general fixes. GoingBatty (talk) 00:25, 7 December 2013 (UTC)
Restatement
So, where's this at, GoingBatty? Josh Parris 09:08, 11 December 2013 (UTC)
- @Josh Parris: - I'm ready to go - awaiting approval for testing. GoingBatty (talk) 14:05, 11 December 2013 (UTC)
Bugs
I just loaded up the 6 December 2013 version of the BattyBot 25 code and made 50 edits with it. BattyBot 25 wants to replace:
1. |date=Nov 30 2013 with |date=November 30 2013 (no comma)
2. |date=2011 09 21 (arc=719 days) with |date=2011-09 21 (arc=719 days) (only one hyphen)
3. |date=1980, 2006 with |date=1980-2006 (year ranges not currently supported as valid dates; this one is a cite book so year range is very likely an inappropriate date)
4. |date=Mar. 1 2012 with |date=Mar 1 2012 (no comma)
I skipped these proposed edits.
Should the replacement values for numbers 1 and 4 above be different?
—Trappist the monk (talk) 14:24, 11 December 2013 (UTC)
- The WP:DATESNO advice, which is adopted by HELP:Citation Style 1, contradicts WP:MOS (search on "Sep."). The restriction against using periods was added on 15 August 2012 without any discussion about creating a contradiction. WP:MOS has specifically allowed periods since 13 August 2011. Jc3s5h (talk) 14:42, 11 December 2013 (UTC)
- The discussion re WP:DATESNO vs. WP:MOS is at Wikipedia talk:Manual of Style/Dates and numbers#Abbreviated months in citations, so I won't address it here. I will add to my question above: item 1 above replaces a month abbreviation with the whole month name; item 4 simply removes a period. Why are these two abbreviations handled differently?
- —Trappist the monk (talk) 14:55, 11 December 2013 (UTC)
- @Trappist the monk: - Thanks for doing my testing for me. Did you have AWB's general fixes turned on? They automatically insert a missing comma between day and year for American-format dates. I'll look at this in more detail later today. GoingBatty (talk) 15:08, 11 December 2013 (UTC)
- @Trappist the monk: - Did you use the most recent version of the BattyBot 25 code? I removed the fix that changes "Nov" to "November" on Dec 6. GoingBatty (talk) 15:16, 11 December 2013 (UTC)
- I started off with the original version and then caught myself and updated to 6 December 2013. I think the November error was caught by that version.
- General fixes was off. Perhaps we have philosophical differences about that. I believe that single purpose robots should be just that: single purpose. I wanted to see what BattyBot 25 was doing with CS1 citations only. If a robot is going to go to the trouble of detecting a CS1 date error, the correction should be complete and not rely on code maintained elsewhere and not under the robot author's control.
- General fixes encompasses a broad variety of things that aren't necessarily germane to CS1 date errors and which would clutter up the diff window. If there isn't one already, perhaps a general fixes bot should be created.
I noticed that there were never multiple rule matches – all of the detected errors were the same. Is that how AWB works? When different types of date errors exist in a page, only one of the error types is fixed?
- During the second 50-edit test, there were occasions where multiple rules were applied.
- @Trappist the monk: - Yes, we have philosophical differences about that. Most (if not all) of my bot tasks also have general fixes running, which seems to be the standard with AWB bots. There are editors who voice concerns about bots flooding/clogging their watchlists with minor fixes, so I feel it's best to get as many fixes done at once as possible. Having said that, it shouldn't be a big deal for me to duplicate the comma-fixing functionality. GoingBatty (talk) 17:57, 11 December 2013 (UTC)
Second set of 50 edits with the 6 December 2013 version:
1. |date={{Start date|1906|4|18}} with |date=1906-4-18 (incorrect month format)
2. |date=April 1920] |title=Transcontinental Motor Convoy with |date=|$3title=Transcontinental Motor Convoy (deletes date and creates unknown |$3title= parameter)
3. |accessdate=9 February 2011 with |accessdate=9 February 2011 (misses second &nbsp;)
4. |date=Aug./Sept. 2006 with |date=Aug /Sept. 2006 (misses second month in range)
—Trappist the monk (talk) 17:44, 11 December 2013 (UTC)
Third set of 50 edits with the 6 December 2013 version:
1. |date=April 4, 1968 (1968-04-04) – April 8, 1968 (1968-04-08) with |date=1968-4-4 – April 8, 1968 (1968-04-08) (this kind of peculiar date range is not supported)
2. | date = Sat. Sep. 11 with | date = Sep. 11 (since there is no year, probably best to skip)
3. |accessdate=2010=09=24 with |accessdate=2010-09=24 (missed second '=')
4. |accessdate=Jan. 26 2012 with |accessdate=Jan 26 2012 (missing comma)
5. | date=November12, 2006 with | date=November12 2006 (the error is the missing space between month and day)
Enough for now. Let me know if you'd like me to continue this.
—Trappist the monk (talk) 19:22, 11 December 2013 (UTC)
A couple of the edits that I made have been reverted because |date= was found in templates that are not CS1 templates. See this edit.
—Trappist the monk (talk) 22:57, 11 December 2013 (UTC)
- @Trappist the monk: - I'm rewriting all of the rules to ensure they only impact CS1 templates, and doing it as an AWB module. I will address each issue you brought up, test the rules using my non-bot account, and then repost the code. Thanks! GoingBatty (talk) 05:20, 12 December 2013 (UTC)
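One way to keep replacements out of non-CS1 templates is to find each citation template first and run the date rules only inside it. The Python sketch below assumes citations look like {{cite ...}} with no nested templates; that simplification is mine, not the bot's (a citation containing {{Start date}}, for example, would be skipped), and all names are hypothetical.

```python
import re

# Hypothetical matcher: a {{cite ...}} template with no nested templates inside.
CITE = re.compile(r"\{\{\s*[Cc]ite[ _][^{}]*\}\}")
# Example rule: an ordinal suffix on the day in |date=
ORDINAL_IN_DATE = re.compile(r"(\|\s*date\s*=\s*)(\d{1,2})(?:st|nd|rd|th)\b")


def strip_ordinal(template):
    """Example fix: '|date=23rd May 2010' -> '|date=23 May 2010'."""
    return ORDINAL_IN_DATE.sub(r"\1\2", template)


def apply_in_citations(wikitext, fix):
    """Run `fix` on each matched citation template and nowhere else."""
    return CITE.sub(lambda m: fix(m.group(0)), wikitext)
```

Run on "{{Infobox person |date=1st}} {{cite web |date=23rd May 2010 |title=X}}", only the {{cite web}} date changes; the infobox is left as it was.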
Complete rewrite as AWB module
I have rewritten the bot's rules as an AWB module, and reposted the code at User:BattyBot/CS1 errors-dates. I ran it against some tests at User:GoingBatty/CS1 errors dates and then ran it supervised on my non-bot account against 100 articles. As I reviewed each article, I corrected one coding error and improved the rules so they correct more CS1 errors. Feel free to review my contributions, test, and offer further suggestions. Thanks! GoingBatty (talk) 05:28, 13 December 2013 (UTC)
- I checked the most recent 50 edits, from 04:34 to 05:11 UTC. All of them were perfect. I recommend a batch of unsupervised test edits, unless someone else wants to test the code independently. – Jonesey95 (talk) 06:06, 13 December 2013 (UTC)
- I'd feel more comfortable if the regexes demanded dates in the first two decades of the 21st century - so if I mistype 10th of October 2011 as 1010-11 rather than 10-10-11 then the bot will skip. We're not going to see access or archive dates outside of these decades. Josh Parris 06:44, 13 December 2013 (UTC)
- To be more specific, date errors in accessdate and archivedate should be skipped if the year value starts with something other than '200' or '201'. Agree. – Jonesey95 (talk) 14:20, 13 December 2013 (UTC)
- @Josh Parris:, @Jonesey95: - I will make this change - thanks for the suggestion! GoingBatty (talk) 18:06, 13 December 2013 (UTC)
- @Josh Parris:, @Jonesey95: - I've updated the code as you suggested. I'll post the revised code after addressing Trappist's issues below. Thanks! GoingBatty (talk) 16:10, 14 December 2013 (UTC)
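The skip rule agreed above can be expressed as a guard that runs before any fix. A hypothetical Python sketch (function names are illustrative only): access and archive dates are touched only when every four-digit year in the value starts with 200 or 201; other parameters go straight to the fix.

```python
import re


def year_is_plausible_for_access(value):
    """True only if the value has a four-digit year and every one starts with 200 or 201."""
    years = re.findall(r"\b\d{4}\b", value)
    return bool(years) and all(y.startswith(("200", "201")) for y in years)


def maybe_fix(param, value, fix):
    """Apply `fix`, except to access/archive dates whose year looks mistyped."""
    if param in ("accessdate", "archivedate") and not year_is_plausible_for_access(value):
        return value  # leave it for a human
    return fix(value)
```

With this guard, the mistyped "1010-11" from Josh Parris's example is skipped in |accessdate=, while "2010=09=24" is still repaired.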
- {{Start-date}} ought to be supported as well as {{Start date}}. The conversion should be sensitive to the df= flag to this template. You might instead subst the {{Start date}} template, rather than trying to interpret it. Josh Parris 08:16, 13 December 2013 (UTC)
- Josh Parris: Can you point to an edit made by the bot code that deals with this template? Almost all templates in citation date fields will cause error messages, even if they render a valid date. – Jonesey95 (talk) 14:20, 13 December 2013 (UTC)
- @Josh Parris: - Using subst would be better. I was playing around with that last night and couldn't get it to work, but I'll keep trying. If that works, then I can add other templates, such as {{Start-date}} and {{Date}}.
- @Jonesey95: - The intent of the bot is to remove {{Start date}}, since using templates in citation date fields will cause error messages. Feel free to try the module against User:GoingBatty/CS1 errors dates. GoingBatty (talk) 18:06, 13 December 2013 (UTC)
- @Josh Parris: - Per Help:Substitution#Limitation, "Substitution is not available inside <ref>...</ref> tags." :-( GoingBatty (talk) 15:56, 14 December 2013 (UTC)
- @Josh Parris:, @Jonesey95: - The code now supports {{Start date}}, {{Start-date}}, {{startdate}}, and {{start-date}} - with or without |df=yes. This edit removed {{Start date}} from a citation on 1946 Nankai earthquake. The new code is posted at User:BattyBot/CS1 errors-dates. Please let me know if you find any more bugs, but let's hold off on any feature requests until after the bot has been approved. Thanks! GoingBatty (talk) 17:00, 15 December 2013 (UTC)
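A hypothetical Python sketch of replacing the four {{Start date}} spellings with plain text. It assumes the template's usual rendering, month-day-year by default and day-month-year with |df=yes (or |df=y), and it handles only the year|month|day form; the regex and function names are illustrative, not the bot's.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# {{Start date}}, {{Start-date}}, {{startdate}}, {{start-date}} with year|month|day,
# optionally followed by |df=y or |df=yes (captured in group 4).
START_DATE = re.compile(
    r"\{\{\s*[Ss]tart[ -]?date\s*\|\s*(\d{4})\s*\|\s*(\d{1,2})\s*\|\s*(\d{1,2})\s*"
    r"(\|\s*df\s*=\s*y(?:es)?\s*)?\}\}"
)


def _render(match):
    year, month, day = (int(match.group(i)) for i in (1, 2, 3))
    if not 1 <= month <= 12:
        return match.group(0)  # malformed; leave it for a human
    name = MONTHS[month - 1]
    # df=yes -> day-month-year; otherwise month-day-year
    return f"{day} {name} {year}" if match.group(4) else f"{name} {day}, {year}"


def replace_start_date(value):
    return START_DATE.sub(_render, value)
```

Writing the month name out, rather than producing a numeric date, also avoids the "1906-4-18" result Trappist reported from the earlier code.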
- Is there a way to run this code manually? I would have thought that using Tools > Make module would do the trick. Apparently not. AWB documentation doesn't seem to be too helpful in this matter.
- —Trappist the monk (talk) 13:44, 13 December 2013 (UTC)
- @Trappist the monk: - Yes, it is possible to run the code manually. I've updated the documentation at Wikipedia:AutoWikiBrowser/User manual#Tools for you. When copying the code from User:BattyBot/CS1 errors-dates, be sure you don't copy the <source>...</source> tags. If you are still having issues, could you please provide specific details on the steps you're taking and the results you're seeing? Thanks! GoingBatty (talk) 18:06, 13 December 2013 (UTC)
- Thank you. Not clear to me what I didn't do right before, but it's working now. Repeatedly clicking the Skip button on the Start tab gets very tedious very quickly. Be sure that you check No changes were made on the Skip tab because Find and replace and Skip if no replacement are ignored when using a module.
- Some bugs:
1. | date =(retrieved September 8, 2007) with | date =retrieved September 8, 2007
2. | accessdate=2010-08-0 with | accessdate=2010-08-00
3. |date=20 Marc 2010 |accessdate=11 December 2013 with |date=20 March Marcaccessdate=11 December 2013
- I wonder about fixes like #4 above. In the case of single digit day numbers, any number 1, 2, or 3 could be the first digit of a two digit number: 11, 25, 30. So, it would seem ok to correct by the addition of a leading 0 when the day number is 4-9.
- [Bugs above and Trappist's text renumbered for clarity.]
- #1 and #6 are the same. They do no harm, but perhaps the bot shouldn't take any action at all in this circumstance.
- #2 and #3 are bugs. Trappist's proposal would fix #2.
- Would #1, #4, #5, and #6 be resolved by the bot doing multiple passes through the same article until no potential fixes are found? The bot as written might only make one fix per citation per pass (just guessing here).
- Agree with Trappist that 4-9 is the right range for automatically adding zeros to days, and propose 2-9 for months. That would be a conservative approach. We could also just decide to add zeros to 1-9 (but not 0) on the assumption that the editor entered the value correctly, just in the wrong format. – Jonesey95 (talk) 02:38, 14 December 2013 (UTC)
- I've changed the find and replace rule with regard to #4. If I did it right, the code now adds the leading zero only when the day digit is 4-9.
- —Trappist the monk (talk) 11:48, 14 December 2013 (UTC)
- @Trappist the monk:
- Yes, on the Skip tab I checked "No changes are made". I also checked "Page is in use" and "Only genfixes".
- Find and replace rules are NOT ignored when using a module. Skip if no replacement only pertains to the Find and replace rules.
- #1-2 will now be ignored
- #3-6 are now fixed
- Thanks! GoingBatty (talk) 17:25, 14 December 2013 (UTC)
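The leading-zero ranges discussed above can be sketched in Python as follows. The defaults (days 4-9; months 3-9, the range the operator describes later in this discussion) follow this thread, while the function name and parameters are hypothetical. Any single digit outside the allowed range makes the whole value stay as-is for a human to check.

```python
import re


def pad_iso_date(value, day_digits="456789", month_digits="3456789"):
    """Add a leading zero to a one-digit month or day in a YYYY-M-D style value,
    but only for the digits listed; anything else is returned unchanged."""
    m = re.fullmatch(r"(\d{4})-(\d{1,2})-(\d{1,2})", value)
    if not m:
        return value
    year, month, day = m.groups()
    if len(month) == 1:
        if month not in month_digits:
            return value  # e.g. 1 could have been meant as 01, 10, 11 or 12
        month = "0" + month
    if len(day) == 1:
        if day not in day_digits:
            return value  # e.g. 2 could have been meant as 02 or 20-29
        day = "0" + day
    return f"{year}-{month}-{day}"
```

Widening the ranges, as Jonesey95 suggests as an alternative, is then just a matter of passing different digit strings.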
- BracketBot caught this one:
7. |date=March 18, 2013|accessdate=18 de março de 2013}} to |date=March 18, 2013|accessdate=18 March de março de}
- Re #7: Maybe "$4" should be "$5" in the relevant regex string? – Jonesey95 (talk) 15:27, 14 December 2013 (UTC)
- #3 and #7 are clearly related. Re: "Maybe "$4" should be "$5"", I concur. I tested that in the Regex tester and got correct results, so I've changed the code. Good catch.
- —Trappist the monk (talk) 16:17, 14 December 2013 (UTC)
- @Trappist the monk:, @Jonesey95: Fixed by adding "?:" instead. GoingBatty (talk) 17:28, 14 December 2013 (UTC)
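The $4/$5 slip and the "?:" fix both come down to group numbering. AWB's find and replace uses $n references; the Python sketch below uses \n references instead, but the numbering works the same way. The Portuguese pattern is a simplified, hypothetical stand-in for the bot's rule.

```python
import re

text = "|accessdate=18 de março de 2013"

# The optional " de" is a capturing group, so it becomes group 3 and the year
# becomes group 4; a replacement that still says \3 picks up " de" instead.
buggy = re.sub(r"(\|accessdate=)(\d{1,2}) de março( de)? (\d{4})", r"\1\2 March \3", text)

# Marking the optional part with ?: stops it from being numbered, so \3 is the year again.
fixed = re.sub(r"(\|accessdate=)(\d{1,2}) de março(?: de)? (\d{4})", r"\1\2 March \3", text)

print(buggy)  # |accessdate=18 March  de
print(fixed)  # |accessdate=18 March 2013
```

Renumbering the reference (Jonesey95's suggestion) and making the group non-capturing (the operator's fix) both work; the second keeps existing references stable if more optional parts are added later.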
- I notice that in the same article the code did not catch these dates:
8. |accessdate=fevereiro de 2013
9. |date=24 de janeiro de 2013 (the regex tester can find this date so ...)
- Re #8: The code appears to require a day to be present. Re #9: Did the code fix a |date= in the same citation? If so, it might need to make another pass through the article. Just guessing on this one. – Jonesey95 (talk) 15:27, 14 December 2013 (UTC)
- For #7, as an experiment, I also changed the day capture from \d{1,2} to \d{0,2} for the March translation. Doing that, the regex will match days that contain 0 to 2 digits. When there are zero digits, the non-English month is replaced with March YYYY. If this works then that is likely the fix for #8. Yes, the code did do another replacement in the same citation, so that explains #9. These can probably be split apart into separate |accessdate=, |archivedate=, and |date= replacements, as has been done with others.
- —Trappist the monk (talk) 16:17, 14 December 2013 (UTC)
- @Trappist the monk:, @Jonesey95: I would not have thought of using \d{0,2}, so thanks for the suggestion. I've fixed these rules and split them out. I've added the updated code to User:BattyBot/CS1 errors-dates and updated the set of tests at User:GoingBatty/CS1 errors dates. Thanks to both of you for all your help! Off to do more testing! GoingBatty (talk) 17:31, 14 December 2013 (UTC)
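A hypothetical Python sketch of the \d{0,2} idea applied to a Portuguese month table. Because the day group may match zero digits, "fevereiro de 2013" is handled as well as "24 de janeiro de 2013". The table, pattern and function name are illustrative; the sketch works on one parameter value at a time, in the spirit of splitting the |accessdate=, |archivedate= and |date= rules.

```python
import re

PT_MONTHS = {
    "janeiro": "January", "fevereiro": "February", "março": "March",
    "abril": "April", "maio": "May", "junho": "June", "julho": "July",
    "agosto": "August", "setembro": "September", "outubro": "October",
    "novembro": "November", "dezembro": "December",
}

# \d{0,2}: the day is optional, so a month-and-year value also matches.
PT_DATE = re.compile(
    r"(\d{0,2})(?: de )?(" + "|".join(PT_MONTHS) + r") de (\d{4})",
    re.IGNORECASE,
)


def translate_pt_date(value):
    """Translate 'D de <mês> de YYYY' or '<mês> de YYYY' into English; else unchanged."""
    m = PT_DATE.fullmatch(value.strip())
    if not m:
        return value
    day, month, year = m.groups()
    if day and int(day) == 0:
        return value  # a zero day needs a human
    english = PT_MONTHS[month.lower()]
    return f"{int(day)} {english} {year}" if day else f"{english} {year}"
```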
I've just finished 150 edits with the 2013-12-14T17:12 version without finding any anomalous replacements. And now there's a new version. I'll play with that later.
—Trappist the monk (talk) 19:16, 14 December 2013 (UTC)
- I just finished 100+ edits without finding any anomalous replacements. I did find more things to replace, and added another new version. GoingBatty (talk) 22:37, 14 December 2013 (UTC)
Using 2013-12-15T06:18, 200 edits and only four issues to show for it (of which three seem to be related):
1. | accessdate = 05 September 2010 to | accessdate = 05 September 2010
2. |accessdate=03-Sep-2012 to |accessdate=03 Sep 2012
3. | accessdate=08September 2012 to | accessdate=08 September 2012
4. |date=1999.25.3 to |date=1999-25-3
—Trappist the monk (talk) 14:48, 15 December 2013 (UTC)
- @Trappist the monk: The leading zero is not reported as an error (yet?) - see Module talk:Citation/CS1/Archive 8#Another date check enhancement. GoingBatty (talk) 14:59, 15 December 2013 (UTC)
- Leading zeros in mdy dates aren't reported as an error yet either, but the robot code is correcting them, so why not fix leading zeros in dmy dates? Similarly, missing spaces aren't reported as errors but they too are repaired.
- —Trappist the monk (talk) 15:20, 15 December 2013 (UTC)
- @Trappist the monk: - Updated code posted at User:BattyBot/CS1 errors-dates - thanks! GoingBatty (talk) 16:20, 15 December 2013 (UTC)
Another 100 using 2013-12-15T16:63:
1. |date=2011‐1‐31 to |date=2011-1-31
The original value contains Unicode hyphen characters (‐); also consider detecting and fixing non-breaking hyphens (Unicode ‑)?
—Trappist the monk (talk) 19:03, 15 December 2013 (UTC)
- @Trappist the monk: Working as designed per the suggestions above - the code only changes the month when it is 3-9. In the example you provided, one would have to manually look at the reference to see if it should be changed to 01, 10, 11, or 12. I added ‑ to the code, and posted the updated version. Thanks! GoingBatty (talk) 19:52, 15 December 2013 (UTC)
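A hypothetical Python sketch of normalizing dash-like separators in YYYY-MM-DD style values to ASCII hyphens. The exact set of characters the bot handles is not stated here; the set below (U+2010 through U+2014 plus U+2212) is an assumption based on the "all manner of dashes" suggestion earlier in this discussion. Padding the month is left to a separate step, so 2011‐1‐31 becomes 2011-1-31.

```python
import re

# Assumed set: U+2010 hyphen, U+2011 non-breaking hyphen, U+2012 figure dash,
# U+2013 en dash, U+2014 em dash, U+2212 minus sign, plus the ASCII hyphen
# (last, so it is literal inside the character class).
DASHES = "\u2010\u2011\u2012\u2013\u2014\u2212-"
ISO_WITH_ODD_DASHES = re.compile(rf"\b(\d{{4}})[{DASHES}](\d{{1,2}})[{DASHES}](\d{{1,2}})\b")


def normalize_iso_dashes(value):
    """Rewrite dash-like separators in a YYYY-MM-DD style value as ASCII hyphens."""
    return ISO_WITH_ODD_DASHES.sub(r"\1-\2-\3", value)
```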
Another 100 using 2013-12-15T19:47:
1. |date=7July 2008 to |date=7 July 2008
—Trappist the monk (talk) 01:16, 16 December 2013 (UTC)
- @Trappist the monk: - Added new rules - updated code posted at User:BattyBot/CS1 errors-dates - thanks! GoingBatty (talk) 05:01, 16 December 2013 (UTC)
Another 250 using 2013-12-16T06:00:
1. | date =12 October, , 1985 to | date =12 October , 1985
—Trappist the monk (talk) 20:18, 16 December 2013 (UTC)
- @Trappist the monk: - Tweaked rules - updated code posted at User:BattyBot/CS1 errors-dates - thanks! GoingBatty (talk) 03:05, 17 December 2013 (UTC)
Another 150 using 2013-12-17T05:14:
1. |date=Mat 2010 to |date=May 2010 – editor might have meant Mar?
—Trappist the monk (talk) 11:08, 17 December 2013 (UTC)
- @Trappist the monk: - Good point - I tweaked the rules at User:BattyBot/CS1 errors-dates - thanks! GoingBatty (talk) 13:01, 17 December 2013 (UTC)
Another 50 using 2013-12-17T13:00:
1. |date=Aug, 2010, to |date=Aug , 2010
—Trappist the monk (talk) 13:41, 17 December 2013 (UTC)
Another 50 using 2013-12-17T14:50:
1. |date=12 Sept. 2011 to |date=12 Sept 2011
—Trappist the monk (talk) 16:54, 17 December 2013 (UTC)
- @Trappist the monk: Added/expanded the rules at User:BattyBot/CS1 errors-dates to cover both of these - thanks! GoingBatty (talk) 23:49, 17 December 2013 (UTC)
Another 50 using 2013-12-18T00:39:
|date=09 Sept 2013 to |date=9 Sept 2013 (interestingly, earlier on the same page: |date=01 Sept 2013 to |date=1 Sep 2013)
|date=2-21-2012 to |date=2012-2-21: this may not be fixable, right?
—Trappist the monk (talk) 01:50, 18 December 2013 (UTC)
- @Trappist the monk: The first works for me; what page had this issue? The second isn't bot-fixable by design, because the month could be 02 or 12. GoingBatty (talk) 02:03, 18 December 2013 (UTC)
- I was thinking that I should be listing article names as well ... I can't find the 09 Sept 2013 article.
- Could be Approaching Midnight or Jana Kramer. – Jonesey95 (talk) 14:12, 18 December 2013 (UTC)
- Excellent! Approaching Midnight. 2013-12-18T01:48 has the same issue.
- —Trappist the monk (talk) 14:22, 18 December 2013 (UTC)
- @Trappist the monk:, @Jonesey95: Updated the code to fix more than one "Sept" in the same citation and posted the code at User:BattyBot/CS1 errors-dates. Time for the bot test! GoingBatty (talk) 02:35, 19 December 2013 (UTC)
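The two cases in this thread can be sketched as separate checks: dropping a leading zero from the day of a day-month-year value, and converting m-d-yyyy to yyyy-mm-dd only when the result is unambiguous. The refusal for a one-digit month of 1 or 2 follows GoingBatty's reasoning about "2-21-2012"; the extra refusal when the day is 12 or less (which could also be read as d-m-y) is an added assumption, and both function names are hypothetical.

```python
import re
from datetime import date
from typing import Optional


def strip_leading_zero_day(value: str) -> str:
    """'09 Sep 2013' -> '9 Sep 2013' (day-month-year values only)."""
    return re.sub(r"^0([1-9]) ", r"\1 ", value)


def mdy_to_iso(value: str) -> Optional[str]:
    """Convert 'm-d-yyyy' to 'yyyy-mm-dd', or return None if a human should decide."""
    match = re.fullmatch(r"(\d{1,2})-(\d{1,2})-(\d{4})", value)
    if match is None:
        return None
    month_text, day_text, year_text = match.groups()
    month, day, year = int(month_text), int(day_text), int(year_text)
    # A one-digit 1 or 2 could be a truncated 10, 11 or 12 (as in "2-21-2012").
    if len(month_text) == 1 and month <= 2:
        return None
    # Assumption: a day of 12 or less could also be read as d-m-y, so skip it.
    if day <= 12:
        return None
    try:
        date(year, month, day)  # rejects impossible dates such as 4-31-2012
    except ValueError:
        return None
    return f"{year:04d}-{month:02d}-{day:02d}"
```

Returning None for "2-21-2012" leaves the citation in the error category for a human, instead of writing an unpadded "2012-2-21" that is still invalid.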
Ready for trial?
So, 250 reviewed edits of the AWB module and no errors. Any objection to a trial? Josh Parris 07:55, 15 December 2013 (UTC)
- @Josh Parris: No objection from me! GoingBatty (talk) 13:52, 15 December 2013 (UTC)
- No objections at this point. – Jonesey95 (talk) 16:00, 15 December 2013 (UTC)
As you no doubt have guessed, I'm waiting for the errors to die down. Does it work in the new draft namespace? Josh Parris 04:56, 18 December 2013 (UTC)
- As far as I can see, the notes above, or at least the ones dating from 2013-12-14T17:12, have all pointed out proposed "fixes" by the bot which would leave the article in question in the error category, to be caught and fixed by a human. I think that's an acceptable outcome, even if it is a little strange. I do not see any bot-proposed fixes above that would remove the article from the error category (i.e. a "false fix", something to be avoided). It seems to me that the bot is being quite conservative.
- After reviewing the results more carefully, I see one false fix starting with the 2013-12-14T17:12 code, changing "Mat" to "May". That's one false fix in 1200+ test edits. The bot code will never be perfect. I would like to see it do a few hundred edits using the latest code; I will be happy to check the diffs and report problems.
- Maybe those with more bot experience will have a different view. That's OK with me.
- Re Draft space: The bot owner said above that "For now, the bot will only operate in Article space." I think that extending its scope into Draft space before we know how Draft space works would be premature. Articles that move from Draft into Article space will be cleaned up by the bot at that point. – Jonesey95 (talk) 05:53, 18 December 2013 (UTC)
- @Jonesey95: The 1200+ test edits you mentioned above don't include my 600+ test edits made on my non-bot account. While nothing would have to be changed to run this code in Draft space, I agree it's too early to do that. However, it would be interesting to consider testing bots in Draft space before deploying them in article space. GoingBatty (talk) 07:50, 19 December 2013 (UTC)
- Approved for trial (250 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Josh Parris 06:02, 18 December 2013 (UTC)
- Trial complete. - A list of the edits can be found here. Results as follows:
- This edit is incorrect. I've reverted the bot edit and will recode to make the bot more conservative to avoid changing messes like this.
- This edit and this edit are partial fixes that nonetheless removed the article from the error category. I'll recode the bot to handle these, and will request that the error checking catch these types of errors.
- This edit is a partial fix that still needs a human to fix it.
- There were 10 edits such as this one where the bot didn't catch all the typos in the month names. While the bot will never fix all the creative ways to spell months, I'll update the bot code for these.
- There were 31 edits such as this one where there are remaining errors that I might be able to get the bot to catch with some recoding. However, the bot will never be able to fix every parameter.
- Stay tuned! GoingBatty (talk) 07:59, 19 December 2013 (UTC)
This citation from ANDi caused my deprecated-parameter script to do the wrong thing:
{{cite journal|last=Chiang|first=Mona|title=Monkey See, Monkey Glow|journal=Science World|date=12|year=2001|month=Feb.|pages=pg. 7|url=http://go.galegroup.com/ps/i.do?&id=GALE{{!}}A70872765&v=2.1&u=sunysuffolk&it=r&p=ITOF&sw=w|accessdate=12/5/11}}
It failed because the {{!}} prematurely terminated the match. If I understand the BattyBot 25 script, that same template will also prematurely terminate the match, so the malformed |accessdate= will not be repaired. I don't see this as a problem because nothing will change here, so nothing gets more broken than it already was.
—Trappist the monk (talk) 16:55, 18 December 2013 (UTC)
- @Trappist the monk: Your logic seems correct that BattyBot 25 would ignore the parameters after the |url=, but I haven't tested it. Even if the URL were simple, BattyBot 25 would still ignore |accessdate=12/5/11, because it's not obvious whether this should be December 5, 2011, 12 May 2011, 2012 May 11, or something else. GoingBatty (talk) 02:19, 19 December 2013 (UTC)
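A minimal Python sketch of why the {{!}} (which renders a literal pipe inside a template) stops the match, using the citation quoted above. The two patterns are illustrative assumptions, not the ones in the deprecated-parameter script or in BattyBot 25. A pattern that ends the template at the first "}" stops inside {{!}}, so |accessdate= falls outside the match; a pattern that allows one level of nested {{...}} captures the whole citation.

```python
import re

# The ANDi citation from the discussion, split only for line length.
CITATION = (
    "{{cite journal|last=Chiang|first=Mona|title=Monkey See, Monkey Glow"
    "|journal=Science World|date=12|year=2001|month=Feb.|pages=pg. 7"
    "|url=http://go.galegroup.com/ps/i.do?&id=GALE{{!}}A70872765&v=2.1"
    "&u=sunysuffolk&it=r&p=ITOF&sw=w|accessdate=12/5/11}}"
)

# Naive: everything up to the first "}" is treated as part of the citation,
# so the match ends at the "}}" that closes {{!}}.
NAIVE_CITE = re.compile(r"\{\{cite [^}]*\}\}", re.IGNORECASE)

# Allows one level of nested templates such as {{!}} inside the citation.
NESTED_CITE = re.compile(r"\{\{cite (?:[^{}]|\{\{[^{}]*\}\})*\}\}", re.IGNORECASE)

naive_match = NAIVE_CITE.search(CITATION).group(0)
nested_match = NESTED_CITE.search(CITATION).group(0)
```

Here `naive_match` ends with "GALE{{!}}" and does not contain the access date, while `nested_match` is the full citation. Even with the nested pattern, a conservative rule would still leave |accessdate=12/5/11 alone for the reason GoingBatty gives.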
Operator is experienced with these tasks and is intent on continuously improving the accuracy of the bot. I encourage these fixes to be added into AWB's genfixes. Extensive edit testing has been undertaken by several interested parties. Approved. Josh Parris 12:12, 19 December 2013 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.