Wikipedia:Bots/Requests for approval/Merge bot 2
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Wbm1058 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 01:08, Saturday, January 28, 2017 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): PHP
Source code available:
Function overview: History-merge categories which were moved by User:Cydebot between April 2006 and March 2015
Links to relevant discussions (where appropriate): Wikipedia:Bot requests/Archive 74#Bot for category history merges
Edit period(s): One time run, but the process will be run 3 times or more as needed to clear the work queue
Estimated number of pages affected: over 87,000 – about 67,000 "easy" cases for this BRFA, another ~20,000 "edge cases" deferred for later processing
Exclusion compliant (Yes/No): No
Adminbot (Yes/No): Yes (needs admin flag set)
Already has a bot flag (Yes/No): Yes
Function details: Use API:Usercontribs to select all Cydebot category-space new page creations from March 2015 back to 2006. This is the beginning of the selection set. In developing and testing the bot, I've already processed the first four items on the list. Check whether the page shown in Cydebot's edit history has mergeable, deleted revisions and is a different page than the page Cydebot created. If so: (1) undelete the page, (2) merge the appropriate history to the current category title, (3) delete the page, and (4) revision-delete the edit summary of the destination page of Cydebot's move, as it's no longer needed for attribution, but prevents listed users from vanishing.
Function details expanded by 103.6.159.90 (talk) 08:29, 29 January 2017 (UTC): The bot processes pages in category namespace in which the first edit is by Cydebot. Cydebot's edit summary will include the text "Moved from <CATEGORYOLDNAME>" and identifies the editors of the old category to account for the attribution. (example). This bot will undelete the category mentioned as such (if it has any mergeable edits), and history-merge it into the new category, using Special:MergeHistory. The MergeHistory extension cannot be used for merging parallel histories, so there's no chance of any history mess-up. The bot will then delete the leftover redirect. (MergeHistory produces a redirect when all edits on the source page are merged away; this cannot be suppressed.) As an additional step, the bot shall revdelete Cydebot's edit summary that lists the authors - this is because attribution is no longer needed and it prevents users from vanishing.
Going by the botop's notes at the BOTREQ discussion, the bot is equipped to handle a variety of special situations - categories that underwent multiple moves or back-and-forth moves.
Further details: History merges are done using API:Mergehistory, which I believe is functionally equivalent to Special:MergeHistory.
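The selection step above depends on reading the source category out of Cydebot's edit summary. A minimal sketch of that parsing, assuming the summary always has the form "Robot: Moved from Category:Foo. Authors: X, Y" and that the author list may be cut off; the function name `parse_move_summary` and the regular expression are illustrative, not taken from the bot's PHP source:

```python
import re

# Matches summaries such as
#   "Robot: Moved from Category:Foo. Authors: Author1, Author2"
# The "Authors:" part is optional because some summaries are truncated.
MOVE_SUMMARY = re.compile(
    r"^Robot: Moved from (?P<source>Category:.+?)\.(?: Authors: (?P<authors>.*))?$"
)


def parse_move_summary(summary):
    """Return (source_title, [authors]) or None if the summary is not a move."""
    match = MOVE_SUMMARY.match(summary.strip())
    if match is None:
        return None
    authors_raw = match.group("authors") or ""
    authors = [name.strip() for name in authors_raw.split(",") if name.strip()]
    return match.group("source"), authors


if __name__ == "__main__":
    print(parse_move_summary(
        "Robot: Moved from Category:Foo. Authors: Author1, Author2"))
```

Usernames that themselves contain a comma would be split incorrectly by this sketch, so the author list should be treated as informational only.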
The bot algorithm sequentially steps through the selected Cydebot contribution history, from March 2015 back to 2006, which is 89,893 or 89,894 pages (I've gotten both results on different test runs, and don't have an explanation for the difference). These pages are then grouped as follows:
- 89893 user contributions
- Found count: 87524 (the request to retrieve the timestamp of the oldest deleted revision was successful)
- Mergeable: 86974 (the timestamp of the oldest deleted revision is earlier than the timestamp of Cydebot's edit, i.e. the timestamp of the selection-set item currently being processed)
- Hist-mergeable: 86213 (report showing the first 1,000) – the page title in Cydebot's edit summary is different than the page title of the selection-set item
- Self-mergeable: 761 (report) – the page title in Cydebot's edit summary is the same as the page title of the selection-set item
- Not mergeable: 550 (report) – the timestamp of the oldest deleted revision is later than the timestamp of Cydebot's edit
- Self-mergeable, no deleted revisions: 15 (report) – I don't believe there is anything that needs to be done with these
- Other no deleted revisions: 2354 (report)
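The grouping above can be read as a small decision function. The following is a sketch under stated assumptions, not the bot's actual code: timestamps are ISO 8601 UTC strings in identical format (so they compare correctly as text), `oldest_deleted_ts` is `None` when the source page has no deleted revisions, and a tie between the two timestamps is treated as not mergeable, which the description above does not specify.

```python
def classify(dest_title, source_title, cydebot_ts, oldest_deleted_ts):
    """Return the report bucket for one selection-set item.

    dest_title        -- page Cydebot created (the selection-set item)
    source_title      -- page named in Cydebot's edit summary
    cydebot_ts        -- timestamp of Cydebot's page-creating edit
    oldest_deleted_ts -- oldest deleted revision of the source, or None
    """
    same_title = dest_title == source_title
    if oldest_deleted_ts is None:
        if same_title:
            return "self-mergeable, no deleted revisions"
        return "other no deleted revisions"
    if oldest_deleted_ts < cydebot_ts:
        # Deleted history predates Cydebot's edit: there is something to merge.
        return "self-mergeable" if same_title else "hist-mergeable"
    # Assumption: equal timestamps fall here as well.
    return "not mergeable"


if __name__ == "__main__":
    print(classify("Category:Bar", "Category:Foo",
                   "2015-03-21T18:01:00Z", "2012-05-01T00:00:00Z"))
```

The counts are consistent with this reading: 86213 + 761 = 86974 mergeable, 86974 + 550 = 87524 found, and 87524 + 15 + 2354 = 89893 contributions.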
I am prepared to begin processing the 86213 Hist-mergeable items. After these are processed, they will become Not mergeable items (the first four items on the list of 550 are pages I hist-merged during development and testing). However, when I run the bot through these 89,893 pages a second time, more pages will appear on the Hist-mergeable items list (they will be un-deleted by the first run). The second run will process these. Eventually (maybe after 3 or 4 runs) there will be no more of these left and the Hist-mergeable item count will be zero, while the Not mergeable list will have grown from 550 to over 87,000.
I can likely process the 761 "Self-mergeable" items. Rather than hist-merging, they will just need to have some deleted history restored. I haven't coded or tested this piece yet, and would prefer to process these separately, so in the event of problems, items of this type are isolated and consolidated. But I can work on this next if it's preferred that I do everything at once.
As the IP has pointed out, many of the 2354 "Other no deleted revisions" can likely be hist-merged also, but I haven't tested any of these yet, and would prefer to leave them for later processing as well.
These other two shorter work queues aren't going anywhere, and we can come back to them later. I'm ready to do a test run of a small subset of the 86213 Hist-mergeable items as soon as BAG is comfortable with giving me the go-ahead, and my bot has administrator privileges.
Community notifications
Community notifications for a new admin bot have been placed:
- Wikipedia:Bots/Requests for approval/Adminbots
- Wikipedia:Village pump (proposals)/Archive 136#New Adminbot proposal - History Merge cleanup of another bot
- Wikipedia:Administrators' noticeboard/Archive286#New Adminbot proposal - History Merge cleanup of another bot
- Wikipedia talk:Categories for discussion/Working
- Wikipedia talk:Categories for discussion/Archive 16
Discussion
Please note that this proposed bot does nothing but stuff I've done in the past, and would probably have gotten back to if not for this bot. עוד מישהו Od Mishehu 04:37, 29 January 2017 (UTC)
- The vast majority of the bot's edits should be trivially easy, but with such a massive number of pages to merge, I'm wondering if we'll encounter some situations where the best action (the action that any human admin would perform) would require this to be a CONTEXTBOT. For example, what happens if we move Cat:Foo to Cat:Bar, delete Cat:Bar some time later, and then someone creates a different Cat:Bar (so it's not G4-eligible) afterward? A human would see that a merge isn't needed. What would the bot do? Would it totally ignore the deleted category because Cydebot didn't create the extant one? Log the deleted category for human review? Also, what would it do when the Cydebot edit summary has already been revdeleted: would it just skip that step, or would this issue potentially cause a crash? Meanwhile, Od Mishehu said at WP:BOTR: "Some of the pages with no deleted revisions are the result of a category rename where the source category was changed into something else (a category redirect or disambiguation), and a history merge in those cases should be done." When the last deleted revision is something other than a normal category (it has something more than just text, parent categories, and an XFD or speedy deletion tag), what will the bot do? Not trying to derail anything; I just want to make sure you've addressed everything in your coding. Nyttend (talk) 00:14, 31 January 2017 (UTC)
- (1) I'm looking at a list of Cydebot's page creations, so I might run into something like this:
- 18:01, 21 March 2015 . . N Category:Bar (Robot: Moved from Category:Foo. Authors: Author1, Author2)
- This item will only remain in Cydebot's edit history as long as Category:Bar is not deleted. If someone deletes Cat:Bar some time later, then that item will disappear from Cydebot's edit history. If then someone creates a different Cat:Bar then presumably the history of the old Cat:Bar will remain deleted, and my bot doesn't look at deleted edit histories. If someone else later restores that unrelated history, then that's on them, not my bot.
- (2) When the Cydebot edit summary has already been revdeleted, a warning is returned: "warnings": "code": "revdelete-no-change" – try this in the sandbox (click Make request). I'm not checking for warnings, but I am checking for unexpected "error" returns from my undelete, merge_history, and delete function calls. If any of these return errors, the bot will immediately stop processing, so I can investigate.
- (3) Right, "no deleted revisions" generally means that a human editor has intervened somehow, and humans are less predictable than bots. I've noticed that there are some different scenarios there including category splits and disambiguations. There may be some extra special handling needed for those 2354 "no deleted revisions" items, that's one reason why I'm suggesting we defer processing those until later. I'm not comfortable with those until I take some time for more analysis. wbm1058 (talk) 02:04, 31 January 2017 (UTC)[reply]
- (3) Maybe I misunderstood, then, because it didn't occur to me that all of these ones would necessarily fall into the "no deleted revisions" camp. (1) Again, I wasn't clear. Will it just ignore the page entirely, or will it have some way of logging it? I'm guessing that it will ignore it (because it won't show up on the list of pages to check, in the first place), but maybe I just don't understand the algorithm properly. (2) Great. And finally, thanks for the helpful responses. Nyttend (talk) 02:52, 31 January 2017 (UTC)[reply]
- re (1) On the bot requests page, I briefly considered whether using the Deletedrevs API to retrieve and examine deleted revisions would be necessary, and decided that it was not. So perhaps, "ignore" isn't quite the right word to use, but rather, it's not going out of its way to look for it. If a currently active category wasn't created by Cydebot, that category is not going to be part of the work queue. The only way to make that category appear as if it was created by Cydebot is to restore the deleted history, and my bot isn't going to do that. I'd have to go to extra trouble to look for deleted Cydebot page creations in order to log them.
- A general response regarding whether there are any lurking unusual category "gotchas" that I haven't accounted for: THIS is the list of the first 1,000 pages the bot will do hist-merges on. Scan that list and see whether you can spot any "unusual types" of categories on it. Those more active in category administration may be better at spotting any weird ones than I am. wbm1058 (talk) 03:51, 31 January 2017 (UTC)[reply]
- Now I recall that when I used the sandbox to try Deletedrevs, the API returned a notice: "warnings": "deletedrevs": "*": "\"list=deletedrevs\" has been deprecated. Please use \"prop=deletedrevisions\" or \"list=alldeletedrevisions\" instead." Though the documentation doesn't say that it's deprecated. – wbm1058 (talk) 15:18, 31 January 2017 (UTC)[reply]
- So, I'll use Deletedrevisions instead of Deletedrevs. wbm1058 (talk) 16:26, 31 January 2017 (UTC)[reply]
- Or should I use Alldeletedrevisions? Or do I need to use both? "List all deleted revisions by a user or in a namespace." How about "List all deleted revisions by a user (Cydebot) and in a namespace (Category)"? Not the most straightforward thing to figure out, this API. It might be interesting to see if I can generate a list of all of Cydebot's currently deleted category-space revisions, for the time-window of interest, just for kicks. I don't think I would be spilling any beans if I published such a list. Some of the items on this list would be hist-merged by my bot on its second or later runs. wbm1058 (talk) 17:02, 31 January 2017 (UTC)
- Alldeletedrevisions it is: sandbox Hmm. "Note: Due to miser mode, using adruser and adrnamespace together may result in fewer than adrlimit results returned before continuing; in extreme cases, zero results may be returned." Hopefully I won't be running an "extreme case"; I wouldn't want to get no results. – wbm1058 (talk) 18:10, 31 January 2017 (UTC)[reply]
- Here's a link for Cydebot's Deleted user contributions from March 25, 2015 back. We're only interested in the deleted contributions with edit summaries in the form "(Robot: Moved from Category:Foo. Authors: X, Y)". The first one is:
- 07:01, 19 March 2015 . . Category:Calendar dates (Robot: Moved from Category:Dates.
- This one is a dead end. Category:Calendar dates was deleted per Wikipedia:Categories for discussion/Log/2016 March 25. I won't restore both categories in order to hist-merge Category:Dates into Category:Calendar dates, only to then re-delete both categories, unless there is a consensus to do that.
- Most of these are red links. The first blue link I see is:
- 23:32, 1 March 2015 . . Category:Royal Order of the Seraphim (Robot: Moved from Category:Order of the Seraphim.
- Category:Royal Order of the Seraphim was originally created by Cydebot on 1 March 2015 (edit summary above), but then:
- 06:17, 27 August 2015 Cydebot deleted page Category:Royal Order of the Seraphim (Robot - Removing category Royal Order of the Seraphim per CFD at Wikipedia:Categories for discussion/Log/2015 June 22#A few more award categories.)
- (collapsed under more awards: Propose upmerging Category:Royal Order of the Seraphim to Category:Orders of knighthood of Sweden.)
- But, more recently:
- 13:31, 19 December 2015 Chicbyaccident . . (←Created page with 'Seraphim Category:Orders of knighthood awarded to heads of state, consorts and sovereign family members|Seraphim, Royal Order o...')
- This is an example of a dead-end that was covered up by a new page creation. Again, my bot will not see this; it will not restore Category:Order of the Seraphim and the deleted history of Category:Royal Order of the Seraphim in order to perform a history-merge, only to re-delete Category:Order of the Seraphim and the part of Category:Royal Order of the Seraphim that was temporarily restored to do a history-merge. – wbm1058 (talk) 17:27, 2 February 2017 (UTC)[reply]
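The miser-mode note quoted in the thread above means a batch from Alldeletedrevisions can come back empty while more results still exist, so a client has to keep following the continuation token rather than stop at the first empty batch. A minimal sketch, assuming a hypothetical `fetch(params)` callable that performs the HTTP request and returns the decoded JSON, and assuming the response groups revisions per page under `query.alldeletedrevisions`:

```python
def iter_deleted_revisions(fetch, user="Cydebot", namespace=14, limit=500):
    """Yield (page_title, revision) for a user's deleted revisions.

    fetch     -- hypothetical callable: dict of API parameters -> decoded JSON
    namespace -- 14 is the Category namespace
    """
    params = {
        "action": "query",
        "format": "json",
        "list": "alldeletedrevisions",
        "adruser": user,
        "adrnamespace": namespace,
        "adrlimit": limit,
        "adrprop": "timestamp|comment",
    }
    cont = {}
    while True:
        data = fetch({**params, **cont})
        # A batch may legitimately be empty under miser mode; keep going.
        for page in data.get("query", {}).get("alldeletedrevisions", []):
            for rev in page.get("revisions", []):
                yield page["title"], rev
        if "continue" not in data:
            break
        cont = data["continue"]
```

Passing the request function in as an argument keeps the loop testable without network access; viewing deleted revisions requires an account with the appropriate rights.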
Hmm... I think one of the problems with not checking deleted revisions and comparing deletion logs might be that we could end up with stuff that got separately deleted (who knows; maybe the content on the category page was deletion worthy at one point(?)). If a page was deleted multiple times—and especially if CydeBot wasn't the first or last thing to delete it—its entire history probably shouldn't be restored (the histmerge api has date parameters, probably for this reason). I think you alluded to covering the not-the-last condition, but I'm just trying to make sure we cover the not-the-first portion too in order to avoid something awful being accidentally dredged up. --slakr\ talk / 09:45, 4 February 2017 (UTC)[reply]
- Facepalm & self-trout Right, so after adding another check to my algorithm, with today's latest test we have:
- 89888 user contributions (5 less than the 89893 reported above, because 5 categories in the list were deleted recently?)
- Found count: 87519 (the request to retrieve the timestamp of the oldest deleted revision was successful)
- Destination has deleted history: 4056 (report)
- Mergeable: 82913 (the timestamp of the oldest deleted revision is earlier than the timestamp of Cydebot's edit, i.e. the timestamp of the selection-set item currently being processed, and the merge destination has no deleted history)
- Hist-mergeable: 82913 (report showing the first 1,000) – the page title in Cydebot's edit summary is different than the page title of the selection-set item
- Self-mergeable: 0 – the page title in Cydebot's edit summary is the same as the page title of the selection-set item. These by definition all have deleted history, so the 761 found by the last test are now a subset of the 4056 above [trapped there so they don't fall through to here]
- Not mergeable: 550 (report) – the timestamp of the oldest deleted revision is later than the timestamp of Cydebot's edit
- Self-mergeable, no deleted revisions: 15 (report) – I don't believe there is anything that needs to be done with these [no change since last test]
- Other no deleted revisions: 2354 (report) [just one change since last test]
- So now instead of 761 "Self-mergeable" items to be deferred for later processing, we have 4056 merge destinations with deleted history (which includes all of the 761) to be deferred for later processing.
- The first example on the list – something I would have missed:
- Deletion log
- 05:40, 24 March 2015 Cydebot deleted page Category:Shipyard Associates of The Wire (Robot - Speedily moving category Shipyard Associates of The Wire to Category:Shipyard associates of The Wire per CFDS.)
- 21:16, 16 September 2010 Ex*** deleted page Category:Shipyard associates of The Wire (Merged to Category:The Wire (TV series) characters per Wikipedia:Categories for discussion/Log/2010 September 8#The Wire characters.)
- 16:22, 8 September 2010 Cydebot deleted page Category:Shipyard Associates of The Wire (Robot - Speedily moving category Shipyard Associates of The Wire to Shipyard associates of The Wire per CFDS.)
- Deleted history of Category:Shipyard Associates of The Wire (9 deleted edits)
- 03:04, 22 March 2015 . . Ko*** (554 bytes) (Nominated for speedy renaming; see Categories for discussion/Speedy. (TW))
- 03:04, 22 March 2015 . . Ko*** (94 bytes)
- 14:56, 14 April 2013 . . Gr*** (32 bytes) (added Category:The Wire characters using HotCat)
- 11:57, 6 September 2010 . . Ta*** (1,896 bytes) (cfr-speedy)
- 18:36, 1 July 2008 . . Io*** m (333 bytes)
- 18:36, 1 July 2008 . . Io*** m (335 bytes)
- 14:17, 22 February 2007 . . Cydebot m (337 bytes) (Robot - Speedily moving category The Wire (TV series characters) to The Wire (TV series) characters per CFD.)
- 21:32, 23 January 2007 . . Co*** (338 bytes)
- 21:28, 23 January 2007 . . Co*** (44 bytes) (←Created page with 'Category:The Wire (TV series characters)')
- Live history of Category:Shipyard associates of The Wire
- 05:39, 24 March 2015 Cydebot . . (94 bytes) (+94) . . (Robot: Moved from Category:Shipyard Associates of The Wire. Authors: Gr***, Ko***)
- Deleted history of Category:Shipyard associates of The Wire (4 deleted edits)
- 09:07, 12 September 2010 . . He*** (703 bytes) (Category:The Wire)
- 18:40, 8 September 2010 . . Mi*** m (681 bytes)
- 18:31, 8 September 2010 . . Mi*** (681 bytes) (cfm)
- 16:22, 8 September 2010 . . Cydebot m (333 bytes) (Robot: Moved from Category:Shipyard Associates of The Wire. Authors: Ta***, Io***, Co***, Cydebot)
- Obviously some more to think about here. My bot would undelete the 9 deleted edits of Category:Shipyard Associates of The Wire and hist-merge them to the single live edit of Category:Shipyard associates of The Wire.
- But it should only hist-merge the three deleted edits from 14 April 2013 – 22 March 2015. The downside of working backwards. We can back-burner these 4056 pages with deleted history.
- Now down to 82,913 on my "good to go" list. Have I filtered out all the trouble yet, or is there another "gotcha" scenario still lurking in these? $64K question. – wbm1058 (talk) 00:29, 5 February 2017 (UTC)[reply]
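The Shipyard Associates example suggests a rule for the deferred cases: merge only the deleted revisions that fall after the previous deletion of the source page and no later than the deletion tied to the move being processed. A sketch of that window selection, with the function names being illustrative and the seconds in the sample timestamps set to 00 because the listings above only show minutes:

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%SZ"


def parse_ts(text):
    return datetime.strptime(text, FMT)


def revisions_to_merge(deleted_revs, deletion_times, move_deletion):
    """Pick the deleted revisions belonging to one move.

    deleted_revs   -- timestamps of the source page's deleted revisions
    deletion_times -- timestamps of every deletion in the source page's log
    move_deletion  -- timestamp of the deletion tied to the move of interest
    """
    upper = parse_ts(move_deletion)
    earlier = [parse_ts(t) for t in deletion_times if parse_ts(t) < upper]
    lower = max(earlier) if earlier else None
    keep = []
    for ts in deleted_revs:
        when = parse_ts(ts)
        if when <= upper and (lower is None or when > lower):
            keep.append(ts)
    return keep


if __name__ == "__main__":
    wire_revs = [
        "2015-03-22T03:04:00Z", "2015-03-22T03:04:00Z", "2013-04-14T14:56:00Z",
        "2010-09-06T11:57:00Z", "2008-07-01T18:36:00Z", "2008-07-01T18:36:00Z",
        "2007-02-22T14:17:00Z", "2007-01-23T21:32:00Z", "2007-01-23T21:28:00Z",
    ]
    wire_deletions = ["2015-03-24T05:40:00Z", "2010-09-08T16:22:00Z"]
    print(revisions_to_merge(wire_revs, wire_deletions, "2015-03-24T05:40:00Z"))
```

For the nine deleted edits listed above this keeps the three from 14 April 2013 to 22 March 2015 and leaves the six older ones deleted.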
— — — — — — — — — — —
There is more work to be done before this is ready. Look at this example:
- Category:Olympic track and field athletes of Puerto Rico <-- Category:Olympic athletes of Puerto Rico
The deletion log of Category:Olympic athletes of Puerto Rico:
- 23:10, 12 March 2015 G*** O*** deleted page Category:Olympic athletes of Puerto Rico (R3: Recently created, implausible redirect: not convinced that this is plausible, since "athletes" in this country refers to all sportspeople)
- 08:21, 11 March 2015 Cydebot deleted page Category:Olympic athletes of Puerto Rico (Robot - Moving category Olympic athletes of Puerto Rico to Category:Olympic track and field athletes of Puerto Rico per CFD at Wikipedia:Categories for discussion/Log/2015 February 26.)
- 01:41, 5 May 2005 Re*** deleted page Category:Olympic athletes of Puerto Rico (deleted (renamed) as per cfd discussion/vote)
We do not want to restore all 15 deleted edits. No need to restore the 7 edits that were deleted on 5 May 2005, nor the one edit created after 08:21, 11 March 2015.
We just want to restore the 7 deleted edits in between those. This is a more complicated scenario, so I'll add another bucket of "items to skip on the first pass" to dump this into.
I need to check the deletion log. For the first pass at this, I will filter out all the items where there are multiple deletions listed in the log, and process the items where there is only one deletion in the log (Cydebot's). Finding the API for this wasn't easy. I was wondering whether I'd need to use Special:Log directly, and parse the results. But, I finally located API:Logevents, so I'll use that. – wbm1058 (talk) 15:20, 9 February 2017 (UTC)[reply]
This is really cool (and likely difficult to fix all of the edge cases). If only Wikipedia's software had allowed actual category page content moves a full decade before they were finally implemented. Good luck! --Cyde Weys 15:52, 10 February 2017 (UTC)[reply]
- Thanks, I appreciate the support of Cydebot's operator. This API sandbox log query returns 3 deletions, so I'll skip it for now. This log query returns just one item, with "user": "Cydebot", so we're good to go. Most on the list should be simple cases like this. I'll code this up and do another test run. – wbm1058 (talk) 19:35, 10 February 2017 (UTC)[reply]
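The first-pass filter described above (process only sources whose log shows a single deletion, made by Cydebot) can be sketched as a check over the entries returned by API:Logevents. This assumes each entry is a dict carrying `type`, `action` and `user` keys, which is my understanding of the JSON output rather than something stated in the discussion; the function name is illustrative.

```python
def only_deleted_by(logevents, bot="Cydebot"):
    """True when the log holds exactly one deletion and the bot performed it.

    logevents -- list of log entries for one page, e.g. the value of
                 response["query"]["logevents"] (assumed shape)
    """
    deletions = [
        entry for entry in logevents
        if entry.get("type") == "delete" and entry.get("action") == "delete"
    ]
    return len(deletions) == 1 and deletions[0].get("user") == bot


if __name__ == "__main__":
    simple = [{"type": "delete", "action": "delete", "user": "Cydebot"}]
    print(only_deleted_by(simple))  # a single Cydebot deletion: good to go
```

Restores and revision-deletions share the `delete` log type, so the sketch counts only entries whose action is also `delete`.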
My sincere apologies, but I have more work for you. It wasn't just Cydebot, but its counterpart ArmbrustBot (that comes alive when Cydebot is down) has also made several cut-and-paste moves. 103.6.159.82 (talk) 13:39, 13 February 2017 (UTC)[reply]
- There are 1055 pages in all. 103.6.159.82 (talk) 14:07, 13 February 2017 (UTC)[reply]
- And also the CrimsonBot; less than 100 pages to deal with. 103.6.159.66 (talk) 18:45, 13 February 2017 (UTC)[reply]
- And also the AvicBot; number is within 500, but many edit summaries show AvicBot to be the first author of the pages cut-and-pasted from, so there'd be some deleted history to deal with too. 103.6.159.72 (talk) 06:45, 14 February 2017 (UTC)[reply]
- Thanks. More work queues for deferred processing. wbm1058 (talk) 19:00, 14 February 2017 (UTC)[reply]
— — — — — — — — — — —
Results of my latest test run:
- 89879 User:Cydebot contributions
- Found count: 87509 (the request to retrieve the timestamp of the oldest deleted revision was successful)
- Destination has deleted history: 4057 (report, these will be deferred for later processing) – change since last report:
- Category:Football clubs in Kent <-- Category:Kent football clubs removed (see below)
- Category:Southern Utah Thunderbirds men's basketball players and Category:Southern Utah Thunderbirds men's basketball coaches added due to 13 February 2017 page moves
- Mergeable: 82901 (the timestamp of the oldest deleted revision is earlier than the timestamp of Cydebot's edit, i.e. the timestamp of the selection-set item currently being processed, and the merge destination has no deleted history)
- Source has multiple deletions: 1703 (report) . . . these will be deferred for later processing
- Source was not deleted by Cydebot: 14146 – I think these are good to go, as there's only one deletion in the log, but they can be deferred as the first on deck after the initial ~67,000 – details below
- Hist-mergeable: 67052 (report showing the first 5,000 and the last 52) – the page title in Cydebot's edit summary is different than the page title of the selection-set item, and there is only one deletion in the log, and that deletion was done by Cydebot
- Not mergeable: 551 (report) – the timestamp of the oldest deleted revision is later than the timestamp of Cydebot's edit. Category:Football clubs in Kent was added to this report since the last test run
- Self-mergeable, no deleted revisions: 16 (report) – I don't believe there is anything that needs to be done with these. One added since the last test run:
- 05:14, 13 February 2017 Jenks24 restored page Category:Football clubs in Kent (3 revisions restored: histmerge) . . . glad to see Jenks back on the histmerge beat
- Other no deleted revisions: 2354 (report) . . . these will be deferred for later processing
Cydebot got its bot flag on 27 April 2006 but did not become an administrator-bot until 30 September 2008, per Wikipedia:Bots/Requests for approval/Cydebot 4. Thus the oldest item in the list of 67K for initial processing only dates to 1 October 2008. Prior to that, Cyde did many of these deletions: List of 6615 deletions done by Cyde between 28 April 2006 – 28 September 2008. These should be safe to process as well. Other admins have also deleted category pages that Cydebot did not; some after Cydebot became an admin: B (1300), C (177), D (306), E (1229), F (131), all others (4388), total 14146 not deleted by Cydebot. I believe these are all safe to process. So I believe we have ~ 67,052 + 14,146 = 81,198 which are ready to process.
As I've added more checks, these tests are now taking over 7 hours to run through the entire set. Once we start writing to the database, the bot will take even longer to work through the set.
Unless someone sees something else needing to be checked that I'm still missing, I'd appreciate authorization for a trial run, and admin-bot privileges for Merge bot if the trial is not to be done using my personal account. wbm1058 (talk) 19:00, 14 February 2017 (UTC)
- I had no idea there was a bot in the works for this, what a great concept. I've only been doing the odd category histmerge when I stumble across one in my regular editing/browsing, but I'll leave off now that I've seen a bot will (hopefully) do the whole lot shortly. No comment on the technical stuff, that's not in my wheelhouse, but the intent has my full support. Best of luck with the trial run. Jenks24 (talk) 19:37, 14 February 2017 (UTC)[reply]
- You have to use {{BAGAssistanceNeeded}} for that. But before that, for cases in which the source wasn't deleted by Cydebot or Cyde, an additional check needs to be done. The deletion of the source should postdate the creation of the category, and that too by a short period (max 3 days?). My concern is that there could be cases where the source was never deleted as part of the cat rename, but was legitimately deleted sometime before that, to be recreated later. 103.6.159.71 (talk) 13:21, 15 February 2017 (UTC)
- Additionally, I would like to register my opposition on the whole of the revdeletion business. The bot may revdelete Cydebot's edit summary if it senses that one of the editors mentioned therein is now a vanished user. But to do it for all edit summaries is ridiculous. Pre-emptive usage to cover the possibility of one of the users ever vanishing in the future is against the WP:REVDEL policy. Very few users vanish, after all. And even if they are vanished, their identities can anyway be found out by trivially easy methods. Revdeletions are a hindrance to transparency. And should the bot malfunction or function in a confusing manner, it would be easier for non-admins to detect the malfunctioning or understand its functioning if the edit summaries are visible. 103.6.159.71 (talk) 13:21, 15 February 2017 (UTC)
{{BAGAssistanceNeeded}}
- 103.6.159.xx raises a good point: Revision deletion should only be used in accordance with one of the seven criteria for redaction, and hiding usernames doesn't appear to fall under any of those. I'm personally ambivalent about the matter. Hide or not? A typical edit history will also say something like "Moved from Category:Eastern Catholic churches in the United States" and hiding the summary also hides that as an unintended side effect. We can't selectively hide only part of an edit summary. The list of authors in the edit summary is redundant to the names below the edit in the revision history, but seeing them offers another level of reassurance that the history-merge was correctly performed. I suppose in the future a bot could walk through the same Cydebot edits, look for a specific name in the edit histories, and only selectively delete the summaries with that specific username. If someone is available and willing to do that. If not, then I suppose the user wanting to vanish will not be able to vanish with regard to these edit summaries. I don't know exactly how I would sense that an editor was vanished, but parsing the list of contributors in each edit summary and looking up each one to determine their status would add more work for me to look them up, and likely add new database user-space lookups. Would like a ruling on whether I should or should not do this piece, and if an RfC on the matter is required before automated history-merging can proceed, let us know so we can get that started ASAP.
- The sources for all of the ~67,000 pages I'm asking for approval to process on the first pass were deleted by Cydebot. I can compare the dates before I process the 7531 pages not deleted by Cydebot or Cyde to see whether there are any sharp needles in that haystack. In my latest edition of my reports such as this report, I'm listing the dates, e.g.:
- Category:LIN Media <-- Category:LIN TV 2015-01-17T08:39:33Z <-- 2015-01-17T11:49:02Z
- 08:39, 17 January 2015 is the date in the Category:LIN Media revision history
- 11:49, 17 January 2015 is the date in the deletion log for Category:LIN TV
- A scan of these reports shows most of these dates are within a few hours, or at most, a few days, of each other. User:Wbm1058/Category history merges: mergeable is the report showing a subset of the 67K pages ready for the first run. The timestamps are also in this report, and the deletion time generally follows the merge by several seconds. – wbm1058 (talk) 16:05, 15 February 2017 (UTC)[reply]
- Regarding the revdels, I made a crude request for further input over at WP:AN. 103.6.159.65 (talk) 04:54, 16 February 2017 (UTC)[reply]
- I definitely don't support revdeling every edit summary that mentions editors' usernames, that would be ridiculous. If the bot were able to specifically revdel edit summaries that mention vanished users' usernames then I suppose I would be OK with that, although even then I'm not sure it's a good use of time/resources. Jenks24 (talk) 11:52, 16 February 2017 (UTC)[reply]
I don't feel REVDEL is necessary (as others have noted), as it's not even applied to normal edit summaries elsewhere when a user vanishes, and many would lose access to history information about what happened needlessly. Plus, it might make things even more confusing if the bot screws up; histmerges can already be a pain to revert, and this would just make it more difficult for non-admins to detect botched ones. Anyway, I'd like to see the discussion settle a little bit before trialling, as it still seems details are being fleshed out via input from others (ideally more BAG if possible; BRFA is a bit flooded right now, though). --slakr\ talk / 08:14, 17 February 2017 (UTC)[reply]
- OK, then it seems the consensus is to not revdel. I commented out the line in my code that did that. I ran another test to look for the outliers among the 67K items proposed for hist-merging on the first pass:
- Hist-mergeable: 67049 -- max. time difference = 665752 -- min. time difference = 3
- Hist-mergeable max. time difference: Category:Danish military aircraft 1910–1919 <-- Category:Danish military aircraft 1910-1919 2011-09-27T13:50:33Z <-- 2011-10-05T06:46:25Z (665752)
- Hist-mergeable min. time difference: Category:Creative Commons Attribution-ShareAlike 2.0 Belgium files <-- Category:Creative Commons Attribution-ShareAlike 2.0 Belgian files 2014-07-21T10:28:23Z <-- 2014-07-21T10:28:26Z (3)
So the fastest any source item was deleted was 3 seconds after the merge was performed. There are likely many with a 3-second time gap; I just reported the first one found. It's reassuring not to find any that were deleted before they were merged.
At the other extreme, Category:Danish military aircraft 1910-1919 wasn't deleted until 665752 seconds (11095 minutes, 52 seconds) (184 hours, 55 minutes) (7 days, 16 hours) after it was merged into Category:Danish military aircraft 1910–1919:
- Deletion log: 06:46, 5 October 2011 Cydebot deleted page Category:Danish military aircraft 1910-1919 (Robot - Removing category Danish military aircraft 1910-1919 per CFD at Wikipedia:Categories for discussion/Log/2011 September 20.)
- (recall that I'm skipping items when there is more than one deletion in the deletion log)
Deleted history of Category:Danish military aircraft 1910-1919 (6 deleted edits)
- 13:50, 27 September 2011 . . Cydebot m (665 bytes) (Robot - Moving category Danish aircraft 1910-1919 to Category:Danish aircraft 1910–1919 per CFD at Wikipedia:Categories for discussion/Log/2011 September 20.)
- 13:12, 27 September 2011 . . Cydebot m (663 bytes) (Robot - Moving category Military aircraft 1910-1919 to Category:Military aircraft 1910–1919 per CFD at Wikipedia:Categories for discussion/Log/2011 September 20.)
- 04:10, 15 September 2011 . . Jenks24 (661 bytes) (tag for speedy renaming)
- 17:23, 17 August 2010 . . TXiKiBoT m (196 bytes) (robot Modifying: sr:Категорија:Дански војни авиони 1910—1919.)
- 23:55, 14 August 2010 . . Xqbot m (194 bytes) (robot Adding: sr:Категорија:Дански војни авиони 1910-1919.)
- 14:54, 22 August 2008 . . Rlandmann (117 bytes) (← Created page with 'Category:Danish military aircraft Category:Danish aircraft 1910-1919 Category:Military aircraft 1910-1919')
Revision history of Category:Danish military aircraft 1910–1919 (3 edits; there's no deleted history; the bot would have skipped it if there was... no items in the log for this page, either)
- 08:42, 22 March 2013 Addbot m . . (121 bytes) (-79) . . (Bot: Migrating 1 interwiki links, now provided by Wikidata on d:q7030703) (rollback: 1 edit | undo)
- 13:50, 27 September 2011 Cydebot m . . (200 bytes) (+2) . . (Robot - Moving category Danish aircraft 1910-1919 to Category:Danish aircraft 1910–1919 per CFD at Wikipedia:Categories for discussion/Log/2011 September 20.) (undo)
- 13:50, 27 September 2011 Cydebot m . . (198 bytes) (+198) . . (Robot: Moved from Category:Danish military aircraft 1910-1919. Authors: Jenks24, TXiKiBoT, Xqbot, Rlandmann, Cydebot)
Is there an explanation for the week-long delay between the merge and the deletion here? Is it because there was previous Cydebot action on the page, less than an hour before the history-merge was done? In any event, this item still looks safe to process, despite the week-long gap. wbm1058 (talk) 14:38, 20 February 2017 (UTC)[reply]
The discussion for this one is at Wikipedia:Categories for discussion/Log/2011 September 20#Hyphenated aircraft categories. There are two similar categories here, and it's easy to get them confused:
- Category:Danish military aircraft 1910-1919 to Category:Danish military aircraft 1910–1919
- Category:Danish aircraft 1910-1919 to Category:Danish aircraft 1910–1919
It appears that two Cydebot category moves happened within the same minute, @ 13:50, 27 September 2011:
- Category:Danish aircraft 1910–1919: Revision history
- 13:50, 27 September 2011 Cydebot m . . (77 bytes) (+77) . . (Robot: Moved from Category:Danish aircraft 1910-1919. Authors: Jenks24, Rlandmann)
Here's Cydebot's new page creation log for that time
Category:Danish military aircraft 1910–1919 moved first, immediately followed by Category:Danish aircraft 1910–1919.
– wbm1058 (talk) 16:12, 20 February 2017 (UTC)[reply]
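The "Robot: Moved from ... Authors: ..." summaries quoted above carry both the old category title and the attribution list, so a bot can recover the history-merge source from them. The following is a minimal sketch in Python (not the bot's PHP); the pattern is inferred only from the summaries quoted in this discussion, and `parse_move_summary` is a hypothetical helper name.

```python
import re

# Pattern assumed from the Cydebot summaries quoted in this discussion.
SUMMARY_RE = re.compile(
    r"^Robot: Moved from (?P<source>Category:.+?)\. Authors: (?P<authors>.+)$"
)

def parse_move_summary(summary: str):
    """Return (source_title, [authors]), or None if the summary does not match."""
    match = SUMMARY_RE.match(summary)
    if match is None:
        return None
    # Naive split: a username containing a comma would be split incorrectly.
    authors = [name.strip() for name in match.group("authors").split(",")]
    return match.group("source"), authors

example = (
    "Robot: Moved from Category:Danish military aircraft 1910-1919. "
    "Authors: Jenks24, TXiKiBoT, Xqbot, Rlandmann, Cydebot"
)
print(parse_move_summary(example))
```

Summaries that do not match, such as "tag for speedy renaming", return `None` and would simply not be treated as Cydebot moves.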
- Approved for trial (10 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. (well, 10 pages, I guess :P) @Wbm1058: I don't see any major objections related specifically to this bot, and I think most of the kinks, as far as general logic flow is concerned, have likely been fleshed out (e.g., avoiding revdelete, being cognizant of previously-deleted categories). It'll probably be easiest to just do a handful of pages first, under your sysop account, as in case something goes wrong, the revert is a little more involved than just "undo." We can then expand the run a bit more to catch any edge cases and/or do another run if tweaks need to be made. --slakr\ talk / 00:05, 22 March 2017 (UTC)[reply]
- Trial complete. Actually, 30 logged actions, the only edits were to my user space.
- I didn't walk through the whole set this time, only a subset sufficient to get 10 to process.
- Skipped the first four as these were previously processed during development testing.
- These were the ten items processed on this run.
- One item was skipped because the destination has deleted history.
- One item was skipped because the source has multiple deletions.
- Again, this is the beginning of the selection set. Also given in function details above.
- Noting that I logged 30 actions within a single minute, let me know whether I should employ a speed limiter to keep this truck from driving unsafely :o) wbm1058 (talk) 22:17, 23 March 2017 (UTC)[reply]
- I just updated my code so that on the next run, the merge-log edit summaries will say "per [[Wikipedia:Bots/Requests for approval/Merge bot 2]]". Given that Wikipedia:Bot requests#Bot for category history merges will eventually be archived. wbm1058 (talk) 23:09, 23 March 2017 (UTC) yep, it's archived now – wbm1058 (talk) 19:48, 4 April 2017 (UTC)[reply]
- Note: I just did the 2 skipped examples mentioned above. The second was clearly too complicated for a bot, the first was borderline. עוד מישהו Od Mishehu 18:24, 27 March 2017 (UTC)[reply]
- @Wbm1058: 30apm is getting up there, are you using maxlag or any other limiter currently? — xaosflux Talk 17:58, 3 June 2017 (UTC)[reply]
- No, I realized that editing at this speed would likely cause problems if it persisted for more than a few minutes, so that's why I asked. I'm not aware of what "maxlag" is, or any other limiters. Nor am I aware of any policies or guidance on the matter. I am aware that you can control the time that AWB waits between edits. I can add a PHP sleep command after doing each item on the list, just tell me how long to sleep between items and I'll add that to my code. Thanks, wbm1058 (talk) 18:07, 3 June 2017 (UTC)[reply]
- Oh, I see. mw:Manual:Maxlag parameter. If that's better, I can look into figuring out how to use that, but just sleeping for a bit would be easier to code. wbm1058 (talk) 18:27, 3 June 2017 (UTC)[reply]
- @Wbm1058: Maxlag is ideal, but if you'd rather not figure that out, a five second sleep would be fine. ~ Rob13Talk 05:22, 13 June 2017 (UTC)[reply]
- Right, maxlag would be handled in the framework (which to me seems just like a library); I inherited the "Chris G framework" when I took on the bots supporting requested moves and proposed merges. I now maintain a version of this framework which only supports maxlag in function edit to edit a page, but my bots don't specify a maxlag value when making edits. This task doesn't repetitively edit pages, it repetitively undeletes a page, merges pages, and deletes a page. I'm using the framework's undelete and delete functions, but had to write my own merge_history function, as there was not a function for that in the framework. At the lower level the framework uses API:Undelete, API:Mergehistory and API:Delete, none of whose documentation mentions that they support maxlag. I think it's probably safe not to need to check for maxlag with every bot action, but just once every three bot actions should be enough to ensure we're not unduly stressing the system. I note that when maxlag fails it throws an error. I'm checking for errors returned by each of these 3 API calls, and am stopping execution with any error, regardless of what the specific error is. So, assuming that Mergehistory supports maxlag, I'll arbitrarily choose that one for the once-every-3-actions check:
return $this->query('?action=mergehistory&format=json',$params);
→ return $this->query('?action=mergehistory&format=json&maxlag=5',$params);
- Since this is evidently a low-priority maintenance task, however, I think adding the 5-second sleep after every completed 3-actions set, in addition to the maxlag check, would be a good thing to do. If this can wait ~6 months to do, then slowing it down a bit shouldn't be a big deal. Maxlag is more for making sure that other editing activity (bot or human) isn't stressing the system, while the objective of the 5-second sleeps is to ensure that this bot task doesn't overload the system.
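The combination proposed here (a maxlag value on one call per item, a fixed pause after each three-action set, and stopping on any error) might look roughly like the sketch below. It is written in Python rather than the bot's PHP and makes no network calls: `process_item` is a hypothetical callable standing in for the undelete, merge and delete sequence, and is assumed to return the decoded JSON of the merge call. The `maxlag` error code is the one the MediaWiki API reports when replication lag exceeds the requested value.

```python
import time

MAXLAG_SECONDS = 5   # value sent as maxlag= on the chosen API call
PAUSE_SECONDS = 5    # pause after each undelete/merge/delete set

def with_maxlag(params: dict, maxlag: int = MAXLAG_SECONDS) -> dict:
    """Return a copy of the request parameters with maxlag added."""
    return {**params, "maxlag": maxlag}

def is_maxlag_error(response: dict) -> bool:
    """True if a decoded API response reports the 'maxlag' error code."""
    return response.get("error", {}).get("code") == "maxlag"

def run(items, process_item, pause=PAUSE_SECONDS, sleep=time.sleep):
    """Process items in order, pausing after each one; stop on any API error.

    Returns (number of items completed, error code or None).
    """
    done = 0
    for item in items:
        response = process_item(item)
        if "error" in response:
            # Stop on any error, whatever its cause, as described above.
            if is_maxlag_error(response):
                return done, "maxlag"
            return done, response["error"].get("code")
        done += 1
        sleep(pause)
    return done, None
```

A gentler variant would wait and retry on a maxlag error instead of stopping, but stopping matches the behaviour described in this thread.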
- Can I get approval for an extended trial, or full approval for the task? Thanks, wbm1058 (talk) 22:23, 13 June 2017 (UTC)[reply]
Approved for extended trial. Please provide a link to the relevant contributions and/or diffs when the trial is complete. Please process 250 pages (which means 750 logged actions). You can run this trial from your main account again. For now, please use a 10 second pause between processing pages, given that you'll be doing this without the bot flag. This can later be relaxed to five seconds once you're using a bot account. ~ Rob13Talk 03:14, 14 June 2017 (UTC)[reply]
- Trial complete. 750 logged actions — 250 items processed.
- Handled 311 items from the selection set in order to find 250 to process. Summary of the 61 not processed:
- 14 items previously processed (4 done during development testing and 10 done by first trial)
- 29 items skipped because the destination has deleted history.
- 14 items skipped because the source has multiple deletions.
- 4 items were skipped because they were not deleted by Cydebot: 2 deleted by Good Olfactory and 2 deleted by Black Falcon.
- Also 2 items skipped because there were no deleted revisions to restore & merge.
- Hist-mergeable max. time difference: Category:Ancient Greek archaeological sites in Turkey <-- Category:Ancient Greek sites in Turkey 2015-03-17T20:18:49Z <-- 2015-03-17T20:27:26Z (517)
- Hist-mergeable min. time difference: Category:Faculty by university or college in Eritrea <-- Category:Faculty by university or college in in Eritrea 2015-03-09T00:57:42Z <-- 2015-03-09T00:57:47Z (5)
- I don't see any problems; the trial appears to have run as expected. – wbm1058 (talk) 15:31, 14 June 2017 (UTC)[reply]
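The skip categories in the trial report above amount to a short eligibility check per item. The sketch below restates them in Python (not the bot's PHP) under stated assumptions: the `Candidate` field names are hypothetical, and the order in which the real bot applies these checks is not given in this discussion.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    """Facts about one Cydebot category move; field names are hypothetical."""
    already_processed: bool
    destination_has_deleted_history: bool
    source_deletion_count: int
    source_deleted_by: str
    source_deleted_revisions: int

def skip_reason(c: Candidate) -> Optional[str]:
    """Return why an item is skipped, or None if it can be history-merged."""
    if c.already_processed:
        return "previously processed"
    if c.destination_has_deleted_history:
        return "destination has deleted history"
    if c.source_deletion_count > 1:
        return "source has multiple deletions"
    if c.source_deleted_by != "Cydebot":
        return "not deleted by Cydebot"
    if c.source_deleted_revisions == 0:
        return "no deleted revisions to restore and merge"
    return None

easy = Candidate(False, False, 1, "Cydebot", 6)
print(skip_reason(easy))
```

Items that return a reason are left for the deferred "edge case" pass or for manual handling by an administrator.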
- I got some feedback on some of the actions of this trial on my talk page. Doesn't appear to be any technical problem with my edits; rather this seems to point to the potentially controversial nature of some category titles. – wbm1058 (talk) 14:04, 16 June 2017 (UTC)[reply]
Approved. Not at all concerned with the comment on your talk page, as it represents disagreement with the consensus at a CfD, not disagreement with the bot's actions. Please use a 5 second or more pause between processing pages or maxlag. ~ Rob13Talk 15:32, 5 July 2017 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.