Wikipedia:Bots/Requests for approval/SoxBot 20
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Withdrawn by operator.
Operator: X!
Automatic or Manually assisted: Automatic
Programming language(s): PHP
Source code available: Not yet, will release at some point
Function overview: Tag articles with {{orfud}} per WP:NFCC#7.
Links to relevant discussions (where appropriate): This was previously approved for BJBot (link).
Edit period(s): Daily
Estimated number of pages affected: Probably a couple hundred a day. (Rough estimate)
Exclusion compliant (Y/N): Y
Already has a bot flag (Y/N): Y
Function details: This bot will go through all the fair-use images uploaded to the wiki. For each one, it will:
- Check if it is indeed tagged with a fair use template
- Check if it is not used in any articles
- Check if it is not already tagged as orfud
If it satisfies those three requirements, it will put {{subst:orfud}} at the top of the image page.
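The three checks above can be sketched as a single predicate. This is only an illustrative sketch in Python, not the operator's PHP code, which was never released. The `ImagePage` class, the `needs_orfud` and `tag` helpers, and the template names in the two sets are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical template names, assumed purely for illustration.
FAIR_USE_TEMPLATES = {"non-free use rationale", "non-free logo"}
ORPHAN_TAGS = {"orfud", "di-orphaned fair use"}


@dataclass
class ImagePage:
    title: str
    templates: set[str] = field(default_factory=set)  # lowercase template names found on the page
    article_uses: int = 0  # number of articles that embed the image


def needs_orfud(page: ImagePage) -> bool:
    """Return True only if all three requirements from the function details hold."""
    has_fair_use = bool(page.templates & FAIR_USE_TEMPLATES)  # tagged with a fair use template
    is_orphaned = page.article_uses == 0                      # not used in any article
    already_tagged = bool(page.templates & ORPHAN_TAGS)       # already tagged as orfud
    return has_fair_use and is_orphaned and not already_tagged


def tag(wikitext: str) -> str:
    """Put the substituted tag at the top of the image page text."""
    return "{{subst:orfud}}\n" + wikitext
```

Keeping the decision in a pure function like this would let it be checked against sample pages before any edit is made.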
Discussion
This is an appropriate task for a bot; the nature of the images being tagged and their not being in articles indicates community consensus. My only concern is the case where someone is writing an article and has just uploaded the image. Is there some time constraint on when the tagging will be done relative to the uploading? Competent and communicative bot operator, no other concerns with this bot or operator. --IP69.226.103.13 (talk) 20:23, 28 December 2009 (UTC)
- One question and one answer. IP69.226.103.13, ORFU has a mandatory waiting period between tagging and deletion, so that will not be an issue. But does the bot check for image redirects and how the redirects are used? βcommand 01:01, 29 December 2009 (UTC)
- Thanks Betacommand. Looks good to me, then. --IP69.226.103.13 (talk) 01:20, 29 December 2009 (UTC)
- {{OperatorAssistanceNeeded}} For Beta's questions. Also, could you advertise this somewhere appropriate, since these tasks tend to be contentious? MBisanz talk 10:16, 30 December 2009 (UTC)
- Really? Good catch, then. One more reason to get a little input before moving forward, as it would not have occurred to me that getting rid of orphaned fair use images would be contentious. Please link back to discussion here, as I am curious to follow. --IP69.226.103.13 19:20, 31 December 2009 (UTC)
- Yep, anything to do with fair-use images tends to be controversial and gets strict scrutiny from BAG. MBisanz talk 04:44, 1 January 2010 (UTC)
I've spammed WP:AN, WP:VPR, and WT:NFCC. Hopefully, that will suffice. (X! · talk) · @979 · 22:29, 1 January 2010 (UTC)
- The task is good, but there are some tweaks that would keep the ire down. There needs to be a lag between an image's creation and the check for whether it is orphaned; otherwise people who press the upload button may see the orange talk bar while editing the article to add the image shortly after. Almost every image is orphaned when just uploaded, so say 30-60 minutes? Has anyone got thoughts on how to manage 10→7000 messages on one talkpage during a run? Avoiding this is going to involve more programming, but I think it is worth some time. Apart from those two points, this all looks good. - Peripitus (Talk) 23:25, 1 January 2010 (UTC)
- I'm going to have it wait a week after being uploaded to prevent this. (X! · talk) · @046 · 00:06, 2 January 2010 (UTC)
- Then I have no objections - just be prepared for rankled comments if the bot drops repeated messages on one user's page - Peripitus (Talk) 00:57, 2 January 2010 (UTC)
- A week is probably overkill, maybe 48 hours? But a week is fine. Let's wait for response from the pages you spammed, though. I have no problems with this bot in general, as I think the community has been fairly alerted and BAG members seem in the know and on top of the type of community concerns that may arise with this bot. --IP69.226.103.13 | Talk about me. 07:36, 2 January 2010 (UTC)
- Do you know roughly how many FU images are currently orphaned, and would be caught in the first run of the bot? Pushing the whole backlog into one day of Category:Orphaned non-free use Wikipedia files might be a bit overwhelming, and the backlog should maybe be chipped away more gradually. Not sure what's preferable though, maybe having the backlog where people can see it on-wiki is actually better. Amalthea 11:53, 2 January 2010 (UTC)
- If there are editors who routinely work from the category to clean it up, then their preferences as a group for how it is done should be considered, imo. --IP69.226.103.13 | Talk about me. 18:25, 2 January 2010 (UTC)
- MBisanz can probably comment on that; I regularly see him cleaning out the category. He uses the Twinkle mass deletion tool, but I assume he manually checks the history and usage of each image first. If the bot floods the category and we trust the bot to categorize them correctly, it wouldn't be hard to whip up a tool to go through the category and delete all those that were tagged by the bot, still have no incoming links (properly interpreting redirects), and have no changes in history or log of file and file description page since the tagging. Of course, such a tool would also need BAG approval. :) So again, a rough estimate how big the flood will be would be helpful. Amalthea 18:55, 2 January 2010 (UTC)
- Are you going to notify the initial uploader, i.e. not the uploader of the latest or last remaining revision (fair use reduce!), but the original uploader who created the file description page? ({{Di-orphaned fair use-notice}}) Amalthea 11:53, 2 January 2010 (UTC)
- At the moment, it notifies the original uploader. (X! · talk) · @257 · 05:10, 7 January 2010 (UTC)
- Got code? Amalthea 11:53, 2 January 2010 (UTC)
- I'll have to sanitize it first. (X! · talk) · @257 · 05:10, 7 January 2010 (UTC)
- Beta, I seem to remember that you previously ran a bot that notified article talk pages about pending deletion of unused images with FUR pointing to their article. What was the issue with that, simply the volume of the spamming and the comparatively few images that actually required action? Amalthea 11:53, 2 January 2010 (UTC)
- Amalthea, I usually look at the logs of User:ImageRemovalBot after doing CSD#F5 cleanouts to see which ones were wrongly deleted; this bot would be paired with the already approved User:Orphaned image deletion bot admin bot, which would clear out the category, and that does check to make sure the image is orphaned. I think BJweeks ran a bot similar to this that did go back every day and make sure the image was still orphaned, but I'm not sure. Also, I don't remember how Beta's old program worked. MBisanz talk 01:18, 3 January 2010 (UTC)
- Ah, darn, I've read about User:Orphaned image deletion bot, backlog is not an issue then. Thanks, Amalthea 01:47, 3 January 2010 (UTC)
- I find it concerning if there is no human-in-the-loop at all in the deletion process. --Apoc2400 (talk) 15:56, 5 January 2010 (UTC)
- What role do you envisage a human playing? Josh Parris 16:08, 5 January 2010 (UTC)
- For example, checking that the image wasn't mistakenly removed from an article or removed by vandalism. --Apoc2400 (talk) 18:44, 5 January 2010 (UTC)
- Are there tools available to discover that? Josh Parris 21:56, 5 January 2010 (UTC)
- No, tools won't work on this, although from historical practice, admins at most check if an image is in current use, not if it was removed mistakenly, due to the sheer number of images that are orphaned at any given time. MBisanz talk 22:26, 5 January 2010 (UTC)
We need to estimate how many pages will be tagged overall. Depending on that number, we should think about staggering the bot run so that only so many per day are tagged. — Carl (CBM · talk) 14:04, 6 January 2010 (UTC)
P.S. As I was looking into this I ran into a more serious issue with image redirects, see below. — Carl (CBM · talk) 14:19, 6 January 2010 (UTC)
- To answer my own question, there are just under 1000 unused non-free images at the moment, and another ~700 that are only used via redirects. I would think that limiting the tagging to 100/day would keep the daily workload manageable and still get rid of any backlog pretty quickly. — Carl (CBM · talk) 14:33, 6 January 2010 (UTC)
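The week-long wait after upload and the 100/day cap raised in this discussion could be combined into one daily selection step, roughly as follows. This is a sketch under stated assumptions: the function name `daily_batch`, the `(title, uploaded_at)` input shape, and the oldest-first ordering are illustrative choices, not anything the operator described.

```python
from datetime import datetime, timedelta

MIN_AGE = timedelta(days=7)  # waiting period the operator said the bot would use
DAILY_LIMIT = 100            # per-day cap suggested in the discussion


def daily_batch(candidates, now, min_age=MIN_AGE, limit=DAILY_LIMIT):
    """Pick the titles to tag today.

    candidates: iterable of (title, uploaded_at) pairs for orphaned non-free images.
    Images younger than min_age are skipped; the rest are taken oldest first
    (an assumed ordering) until the daily limit is reached.
    """
    eligible = [(title, ts) for title, ts in candidates if now - ts >= min_age]
    eligible.sort(key=lambda item: item[1])  # oldest uploads first
    return [title for title, _ in eligible[:limit]]
```

With just under 1000 unused images and a cap of 100 per day, the existing backlog would be tagged in about ten daily runs, ignoring images that become orphaned in the meantime.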
Image redirects
Does the bot properly recognize image redirects? For example, the non-free image File:"A" stamp of the US, 1978.jpg is unused according to the regular user interface and according to the API [1], but it has a redirect from File:1743.jpg and that redirect is included as an image in Non-denominated postage [2]. So the bot needs to handle this sort of thing. I see it mentioned above but without any response about whether the code actually handles it. Since the code is not released, I can't just look it up. — Carl (CBM · talk) 14:19, 6 January 2010 (UTC)
- Hmm... image redirects were not in my consideration. I guess I have two options: 1) check for all redirects and get their usage, or 2) ignore images with redirects. The second is easier to program, yet the first is probably the better option. I think I'll end up programming the first one, but it won't be done until late next week due to IRL stuff. (X! · talk) · @258 · 05:12, 7 January 2010 (UTC)
- If you have a toolserver account, here is the code I used to get the counts. It's much easier to let the database make the subqueries for you. — Carl (CBM · talk) 13:20, 8 January 2010 (UTC)
-- Step 1: collect non-free files that have no direct entry in imagelinks
create temporary table u_cbm.foo ( f_title varchar(255) );
insert into u_cbm.foo
  select ip.page_title
  from page as ip
  join categorylinks on ip.page_id = cl_from and cl_to = 'All_non-free_media'
  left join imagelinks on il_to = ip.page_title
  where page_namespace = 6 and isnull(il_from);
-- Step 2: of those, keep only files that are not used through a redirect either
select f_title as y from u_cbm.foo as xx
where not exists
  (select 1
   from page as rp
   join redirect on rp.page_id = rd_from and rd_namespace = 6
   join imagelinks on il_to = rp.page_title
   where rd_title = xx.f_title);
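For a bot that does not have database access, the same redirect-aware check (the first option mentioned above: look up every redirect and its usage) could be sketched in Python as below. The function name `is_orphaned` and the two in-memory inputs are assumptions for illustration; a real bot would have to fetch redirects and usage from the wiki, and this sketch only looks one redirect level deep.

```python
def is_orphaned(title, redirects, used_titles):
    """Return True if neither the file nor any redirect to it is embedded anywhere.

    title:       file title without the "File:" prefix
    redirects:   dict mapping a redirect's title to the title it points at
    used_titles: set of file titles that at least one page embeds
    """
    # The file's own name plus every redirect that points directly at it.
    names = {title} | {src for src, target in redirects.items() if target == title}
    # The file is orphaned only if none of those names is in use.
    return names.isdisjoint(used_titles)


# The example from this thread: the file itself is unused, but its redirect is used.
redirects = {"1743.jpg": '"A" stamp of the US, 1978.jpg'}
used_titles = {"1743.jpg"}
print(is_orphaned('"A" stamp of the US, 1978.jpg', redirects, used_titles))  # False
```

The result is `False` here because the image is used through its redirect, so it should not be tagged.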
- Just an idea. Would it not be nice if a bot changed "links" to images so it used the actual image and not the redirect (redirects could then be deleted)? I can imagine it would be a help to more than just this one bot here? --MGA73 (talk) 19:33, 21 January 2010 (UTC)
Withdrawn by operator. I simply don't have enough time in real life to dedicate the necessary hard work to this bot. (X! · talk) · @299 · 06:11, 25 January 2010 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.