Wikipedia:Bots/Requests for approval/MPUploadBot
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.
Operator: Xclamation point
Automatic or Manually Assisted: Automatic
Programming Language(s): PHP, using a modified version of ClueBot classes
Function Summary: Daily, uploading and protecting images from Commons which are on the Main Page
Edit period(s) (e.g. Continuous, daily, one time run): Continuous
Already has a bot flag (Y/N): Has a bot flag, but no admin flag.
Function Details: This bot will first search for all images currently included on the Main Page. It then checks which ones are on Commons and which ones are uploaded locally. It looks at the ones from Commons and determines whether each is already protected. If an image is not protected, the bot uploads it to the English Wikipedia. After that, it puts {{C-uploaded}} onto the image. Every day, it deletes the images that it uploaded for the previous day, bringing the Commons version back.
Source code is available at User:X!/Main Page bot Source. The results of 1 day of a dry run are at User:X!/MPBot Dry Run.
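The selection step in the function details can be sketched roughly as follows. This is a minimal Python illustration, not the bot's actual PHP source; the `Image` record, its field names, and `plan_daily_run` are hypothetical stand-ins for whatever the bot learns from the wiki.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    # Hypothetical record; the field names are assumptions for illustration.
    title: str
    on_commons: bool            # served from Commons rather than a local upload
    protected_on_commons: bool  # already protected on Commons


def plan_daily_run(main_page_images, uploaded_yesterday):
    """Return (titles to upload and tag locally, titles to delete locally)."""
    # Only unprotected Commons images need a protected local copy.
    to_upload = [
        img.title
        for img in main_page_images
        if img.on_commons and not img.protected_on_commons
    ]
    # The previous day's local copies are deleted so the Commons version returns.
    to_delete = list(uploaded_yesterday)
    return to_upload, to_delete
```

Each title in the first list would then be uploaded, protected, and tagged with {{C-uploaded}}.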
Discussion
- You'll create a separate account, no? Security through segregation seems reasonable here. --MZMcBride (talk) 02:35, 30 September 2008 (UTC)
- Yes, it will run under SoxBot. The current SoxBot tasks will be moved to another bot account. Xclamation point 02:36, 30 September 2008 (UTC)
So the bot runs one time per day? How does it synchronize with the change of TFA? That is, how do you guarantee there is no window between the TFA change and the bot running? — Carl (CBM · talk) 02:40, 30 September 2008 (UTC)
I will run it 5 minutes after midnight (UTC) each day, and just to make sure, purge the cache before it starts. Xclamation point 02:44, 30 September 2008 (UTC)
How are you planning on preventing timed vandalism attacks over on Commons from being replicated and protected here? east718 // talk // email // 03:14, 30 September 2008 (UTC)
- Good question. I suppose a mechanism similar to FA Template Protection Bot (e.g. if it has been edited in the past 24 hours (originally 10 minutes), it doesn't protect and instead posts a message to an IRC channel). Xclamation point 03:18, 30 September 2008 (UTC)
- That's acceptable for the FA template protection bot because it runs on a five-minute crontab (if my memory is serving me correctly). Yours will run only once a day, so a ten-minute window will still allow vandals to hit the Commons images, say, eleven minutes before launch. A different approach is needed here. Also, Commons receives significantly lower traffic than enwiki, and it's not unreasonable to think vandalism could stick around for 15 minutes or so there. Even here, we've had Avril-/Zodiac-style vandalism linger on the FA for fifteen minutes at a time. east718 // talk // email // 03:25, 30 September 2008 (UTC)
So how about if it has been edited in the past 23 hours then? Xclamation point 03:36, 30 September 2008 (UTC)
- Focusing on Carl's concerns a bit, at times, the FA picture changes (due to finding a better one or finding one at all that's not fair use). Will the bot respond accordingly? --MZMcBride (talk) 16:39, 30 September 2008 (UTC)
- Maybe if it runs every hour? I can't see any other way. Xclamation point 20:33, 30 September 2008 (UTC)
I've modified it to run every 5 minutes. Xclamation point 16:19, 1 October 2008 (UTC)
It might be better if the bot operates on tomorrow's featured article. If some vandalism happens to slip by due to the 10 minute gap, it will probably be noticed by the TFA team before the article is made live the next day. The bot would still undo its actions on yesterday's featured article as originally planned. — Carl (CBM · talk) 04:08, 30 September 2008 (UTC)
- That's actually a good idea. I've modified it so while it deletes the images that it uploaded after they are on the Main Page, it also protects the articles on Tomorrow's Main Page. Xclamation point 10:52, 30 September 2008 (UTC)
- "Protects the articles" → "Protects the images" ? --MZMcBride (talk) 16:39, 30 September 2008 (UTC)
- *facepalm* Xclamation point 20:31, 30 September 2008 (UTC)
- Support the need for a bot to protect such images, assuming all issues above are ironed out. -- how do you turn this on 20:10, 3 October 2008 (UTC)
Comment - I am running a dry run currently, saving the results to a text file (which will be released soon). Xclamation point 22:01, 3 October 2008 (UTC)
- I'm pretty concerned at the culture of responsibility-shirking that this bot will create. This isn't just empty speculation on my part; I've witnessed it firsthand with my own featured article protection robot. Right after I started running it, it would rarely protect any pages because administrators were diligent in making sure the featured article was safe from pagemove vandalism. As more people became aware of its existence, fewer administrators proactively protected the TFA. Now, the bot protects pages daily because no administrator wants to do it anymore - they slack off because they know that there's a helpful robot that will do it for them. Given this, a reasonable person would have to expect that administrators will collectively react the same with regard to image protections once your bot goes into service. I don't think group human behavior is something we can influence, so the next best thing to do is avoid half-measures and make sure this robot has a bulletproof algorithm and code to match. This business of running every five minutes to detect new images isn't going to cut it, as it will leave a large and predictable gap for mischief.
If I were coding this, this is how I'd do it. The bot would have two components:
- The first part would run at 23:59 UTC and load the list of images used in the mainpage blurbs for tomorrow's featured article, featured picture, and selected anniversaries. If any image has been changed in the past day, it would report these changes in #wikipedia-en-admins, #wikimedia-commons, and #wikimedia-admin, using the !admin@commons flag. All other images would be uploaded locally and protected.
- The second part would stay online 24/7 and hang out in the IRC RC feed. It would detect changes in the image for today's featured article, featured picture and selected anniversaries, along with DYK and ITN. Again, it would see if the image that's just been inserted has been edited recently on Commons and alert online administrators if so. If not, it would immediately upload and protect it here.
This would put my fears of timed vandalism attacks to rest, and ensure a completely bulletproof bot. The coding for this shouldn't be difficult; I had a similar short-lived bot that would detect 4chan-style vandalism on user talk pages and block when necessary. Creating that wasn't really difficult, and I'm a pretty unskilled coder. Let me know if you're willing to go through with this. east718 // talk // email // 06:12, 4 October 2008 (UTC)
- That's a good idea. I'll see if I can write that into my bot. Xclamation point 13:43, 4 October 2008 (UTC)
- Or at least I will once the toolserver comes back up... Xclamation point 18:55, 4 October 2008 (UTC)
- Comment - I have updated it according to east718's suggestion above. The new source is at User:X!/Main Page bot Source. Xclamation point 21:09, 4 October 2008 (UTC)
After giving this more and more consideration, this seems inappropriate for a bot. If all of this time is going to be spent writing a bot in PHP, why not instead implement cascading protection from the various projects to Commons inside MediaWiki itself? Likely not a very simple task, but it's the best solution to this particular problem that I see. --MZMcBride (talk) 06:32, 4 October 2008 (UTC)
- I'll go see if I can write the code for that then, after I fix the bot to what east718 said. Xclamation point 13:43, 4 October 2008 (UTC)
- I'm not sure this is feasible. The connection between a wiki and the commons is only 1 link to the image directory, so it can't even find out if it's protected or not. Xclamation point 15:31, 4 October 2008 (UTC)
- Feasibility is entirely relative. While I'm not a developer, I think putting the functionality into mw:Extension:GlobalUsage would probably be the right move. --MZMcBride (talk) 16:08, 4 October 2008 (UTC)
- Also, what degree of fault tolerance are you going to have? The toolserver is down right now and is notoriously unreliable (to the extent that river jokes about it in the MOTD). This will need to be addressed too. east718 // talk // email // 19:10, 4 October 2008 (UTC)
- I guess it'd be the same as any other bot, unless you have a 100% always up, always reliable, perfect server to run it on. As far as I know, another adminbot, User:FA Template Protection Bot also runs on the toolserver. Xclamation point 19:15, 4 October 2008 (UTC)
- It might be reasonable to have multiple copies running on different servers, so that one going down will not disable the bot. If nothing else, you might run a second copy on the stable toolserver. —Ilmari Karonen (talk) 20:01, 4 October 2008 (UTC)
- One problem: It has to have more than 1 person working on it to go on stable. I'll see if I can get a copy to run on ClueNet (as that's the only other server I can run it on 24/7). Xclamation point 20:02, 4 October 2008 (UTC)
- Commenting on the task, not on how it is achieved, I think this is a good idea - removing monotonous and repetitive tasks that require no human thought is an excellent application for a bot. Tim Vickers (talk) 19:30, 4 October 2008 (UTC)
Section break
Ok, so after reading this over a few times, I think the current status of this bot needs to be clarified:
- The bot will run in 2 parts:
- One part runs at 00:01 UTC, and that gets images on tomorrow's Main Page. If they are not protected on Commons, nor uploaded locally, nor edited in the last 24 hours, it uploads the image.
- Another part hangs out in the RC feed, and if an edit is made to today's TFA, POTD, DYK, ITN, or Selected Anniversaries, it performs the same checks.
- For both of these, it sends notices to #wikipedia-en-admins, #wikipedia-en-alerts and #wikimedia-commons if the image was edited in the last 24 hours.
- The source code has been reviewed by both User:Chris G and User:Cobi.
- A dry run is currently running silently, and results are being saved to User:X!/MPBot Dry Run.
- Should it be renamed to something more descriptive? (e.g. "Main Page Image Bot" or something)
- Would a MediaWiki feature work? I don't think so, as there is no connection between the actual PHP files on enwiki and commons. I'd like to see what other developers think.
- Is the toolserver reliable enough? I know that FA Template Protection Bot runs on it, what do others think?
Xclamation point 04:06, 5 October 2008 (UTC)
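The per-image checks summarized in the status list above might look something like the sketch below. It is written in Python purely for illustration (the bot itself is PHP); the function name and its return values are hypothetical, and treating "already protected or already local" as a silent skip is an assumption, since the summary does not say whether a notice is still sent in that case.

```python
from datetime import datetime, timedelta, timezone

# The 24-hour window comes from the summary above.
RECENT_EDIT_WINDOW = timedelta(hours=24)


def decide(protected_on_commons, uploaded_locally, last_commons_edit, now):
    """Return 'skip', 'alert', or 'upload' for one Main Page image."""
    if protected_on_commons or uploaded_locally:
        # Assumption: nothing further to do when the image is already safe.
        return "skip"
    if now - last_commons_edit < RECENT_EDIT_WINDOW:
        # Recently edited on Commons: notify the IRC channels, do not upload.
        return "alert"
    return "upload"
```

Both the 00:01 UTC run and the RC-feed listener could call the same function, so the two parts cannot drift apart in what they consider safe to upload.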
- I think a MediaWiki feature or extension to apply cascading protection to shared repos like Commons, even if technically feasible, is likely to be opposed by Commons and would probably be socially unfeasible. That said, building something else into MediaWiki that would protect our images but leave Commons untouched might certainly be possible (perhaps some sort of local image cache for cascade protected non-local images); I'd have to think about it some more. I think ^demon's been doing a bit of work with file repos, you may want to ask him.
- Also, I don't think so many IRC notices are necessary. -en-alerts should be fine, maybe -commons (#wikipedia-admins hasn't been used since 2007). Mr.Z-man 04:49, 5 October 2008 (UTC)
- Removed #wikipedia-admins. Xclamation point 04:50, 5 October 2008 (UTC)
- The name is OK.
- I don't see any way of cross-project uploads and stuff on the mediawiki side. If we're going to do this, it should be a bot, not an extension.
- Toolserver is fine. Perhaps you could supply users with a 'watcherbot', if it's that critical, to raise hell on IRC if the bot doesn't run. --uǝʌǝsʎʇɹoɟʇs(st47) 00:59, 6 October 2008 (UTC)
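One way to picture the 'watcherbot' idea is a staleness check on the main bot's last recorded run. The sketch below is Python and entirely hypothetical: the function name, the 15-minute default threshold, and the use of epoch seconds are all assumptions, not anything specified in this discussion.

```python
def bot_is_stale(last_run_epoch, now_epoch, max_silence_seconds=15 * 60):
    """Return True when the monitored bot has been silent for too long."""
    # Assumption: the main bot records a timestamp (epoch seconds) on each run,
    # and the watcher compares it against the current time.
    return now_epoch - last_run_epoch > max_silence_seconds
```

A watcher running on a different server would call this periodically and post to IRC whenever it returns True.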
Since no one has raised any objections to the actual task and the code seems to be stable, Approved for trial (7 days). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Run it under your main account and place Putt9567 at the front of your edit/delete/upload summaries so we can filter them out. Also, please include a link to this BRFA and make it clear that it is a bot making the edit. --Chris 03:58, 6 October 2008 (UTC)
- SQL query for later:
SELECT CONCAT('* ',log_action,' on [[:Image:',log_title,']] at ',log_timestamp) FROM logging WHERE log_comment LIKE '{{Putt9567}}%' AND log_user = 3010110 AND log_title NOT LIKE 'Image-%';
Sorry, I apologise if I did not make a comment earlier before the trial run because I do not monitor WP:RBA on a regular basis. For almost a year now, I have been semi-automatically protecting the Commons images for TFA, OTD, and POTD using a cascading protected page on Commons:User:Zzyzx11/En main page. Much of its details are explained on Commons:User talk:Zzyzx11/En main page. Some of the disadvantages I mention there may be relevant to SoxBot11, namely:
- ITN and DYK can change at anytime, not just at 0:00 UTC.
- The images on TFA, OTD, and POTD can also change midway through the day, especially if they have to be suddenly tagged as a copyvio.
- At the last minute before they fall onto the Main Page, images stored locally on Wikipedia can be deleted under Wikipedia:CSD#I8 and be replaced by the version on Commons.
- User:Raul654, our featured article director, sometimes does not schedule a featured article of the day until the very last minute. iirc, the latest I have seen him schedule something was about 23:00 UTC, one hour before it was supposed to go live on the en.wikipedia main page.
Therefore, if the bot functions the way it currently does, do not be surprised if it frequently has to make posts on the IRC channels. On the other hand, IMO, based on my observations here and for reasons on Commons I will not go into for security reasons, it is a little harder to vandalise Commons images than ones directly stored here. Cheers. Zzyzx11 (Talk) 04:29, 7 October 2008 (UTC)
- Which is why I run 2 bots, one that runs at 00:01 UTC, and one that runs whenever DYK, TFA, ITN, Selected Anniversaries, or POTD is modified. Xclamation point 04:34, 7 October 2008 (UTC)
- Trial complete. 7 days completed, edits are here. Xclamation point 02:35, 13 October 2008 (UTC)
- Approved. Trial is good, operator reports no pending issues, I see no standing complaints. +sysop and +bot are requested. --uǝʌǝsʎʇɹoɟʇs(st47) 03:00, 13 October 2008 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.