Wikipedia:Bots/Requests for approval/DPL bot 2
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: JaGa (talk · contribs)
Time filed: 18:50, Thursday November 10, 2011 (UTC)
Automatic or Manual: Automatic
Programming language(s): PHP
Source code available: Not currently
Function overview: Notifies editors of disambiguation links they have recently added to articles.
Links to relevant discussions (where appropriate): Wikipedia talk:Disambiguation pages with links#User dablink notification, Wikipedia talk:Disambiguation pages with links#Update and Request for Comment - User dablink notification, User talk:JaGa#Links to disambig notifier
Edit period(s): Once daily
Estimated number of pages affected: 300-500 edits/day
Exclusion compliant (Y/N): Y
Already has a bot flag (Y/N): Y
Function details: This bot is going to take some explaining, so please bear with me.
There are many different ways to create links to disambiguation pages, but they can be divided into two categories: Maintenance and Content Creation. Maintenance dablinks occur in the course of organizing or cleaning up the wiki; for instance, changing a redirect to point to a disambig because there is no primary topic, or converting a short list article into a disambiguation page. Content Creation dablinks, on the other hand, are much more straightforward: adding "he was born in Georgia" to an article and not realizing the link goes to a disambig instead of the intended article.
I have noticed a couple of differences between Maintainers and Content Creators:
- Content Creators have no easy way to tell they've created a dablink, short of testing each and every wikilink
- Content Creators are more likely to be glad to find out about dablinks they've inadvertently created
Regarding the second point, Maintainers often focus on specific tasks, not unlike an assembly line. They look at the wiki, and work on a certain problem. Once their task is complete, even if that task happens to create a bunch of disambig links (like deciding there's no primary topic), that can be taken care of by another worker further down the line; it was a side effect of their own contribution and not really their concern. This is not the attitude of all Maintainers, but it is certainly common.
For a Content Creator, however, the dablink is not a side effect; it's part of their contribution to the wiki. The Content Creator is more likely to feel ownership of the dablinks they've brought into existence; this ownership makes them want to fix the dablinks, to make their own contribution better.
So, IMO, there's a difference in attitudes based on feelings of ownership. With this in mind, I realized we have an unexplored opportunity; there are hundreds of editors who want to know about and fix dablinks they've created, but there's no easy way for them to find out.
So I set out to create a bot that notifies Content Creators of dablinks they've recently created. I wanted to focus on the Content Creators, since I think many Maintainers would be irritated by these messages, and that isn't what I want; this is intended as a service, not a stern call to duty.
The first step was to devise a way to categorize the various newly created dablinks. I did that with this report: Dab Dashboard, and I don't mind telling you, it was not a quick job.
Once I had my "Type 5" disambig links, I could compose messages solely for Content Creators. I've been testing it for some time now (you can see the most recent results at User:JaGa/Sandbox). Several questions have come up already, so I'll go ahead and deal with them:
Anonymous editors: IP editors would not receive messages.
New users: Users with fewer than 100 edits would not receive messages. This cuts the number of messages to send by 25-35%.
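The two recipient filters above (no IP editors, no users under 100 edits) could be sketched roughly as follows. This is only an illustration in Python, not the bot's actual PHP code; the function names and the use of the `ipaddress` module to recognize anonymous editors are assumptions.

```python
import ipaddress

# Threshold taken from the request: users with fewer than 100 edits are skipped.
MIN_EDITS = 100


def is_ip_editor(username: str) -> bool:
    """Return True when the username parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(username)
        return True
    except ValueError:
        return False


def should_notify(username: str, edit_count: int) -> bool:
    """Hypothetical helper: apply both filters before queuing a message."""
    if is_ip_editor(username):
        return False
    return edit_count >= MIN_EDITS
```

For example, `should_notify("192.0.2.17", 5000)` is rejected because the name is an IP address, while a registered account with exactly 100 edits passes.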
Vandalism reverts: Happily, the vast majority of reverters (namely quick-response vandal fighters) would not be bothered by DPL bot. This is because the messages are only intended for new dablinks, and I only check for new dablinks twice daily. Take, for instance, The Feynman Lectures on Physics, which has linked to the disambig Magnetic resonance since February 15. Now, say some vandal pageblanks The Feynman Lectures on Physics and ClueBot quickly reverts it. ClueBot would not receive a message. This is because when the update script runs some time later, it will see that The Feynman Lectures on Physics still links to Magnetic resonance, and since that is not a new dablink, it would send no message.
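One way to picture why quick reverts generate no message is to treat each update as a snapshot of (article, disambiguation page) pairs and report only the pairs missing from the previous snapshot. The sketch below assumes that snapshot model; the function name and data shapes are hypothetical and are not taken from the bot's code.

```python
from typing import Set, Tuple

# A dablink is modelled as (linking article, disambiguation page).
Dablink = Tuple[str, str]


def new_dablinks(previous: Set[Dablink], current: Set[Dablink]) -> Set[Dablink]:
    """Return dablinks present in the current snapshot but not the previous one."""
    return current - previous


# The Feynman example: the link exists before and after the blank-and-revert,
# so it never shows up as new, and the reverter is never considered.
previous_run = {("The Feynman Lectures on Physics", "Magnetic resonance")}
current_run = {
    ("The Feynman Lectures on Physics", "Magnetic resonance"),
    ("Example biography", "Georgia"),  # illustrative new dablink
}
fresh = new_dablinks(previous_run, current_run)
```

Here `fresh` contains only the illustrative "Example biography" link; the long-standing Feynman link is filtered out regardless of what happened between the two runs.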
Article creation requests (or as I like to think of it, the Alpha Quadrant Problem): Early in my testing, I noticed Alpha Quadrant's name coming up again and again. I soon realized this was because users were creating dablink-containing articles outside the article namespace, Alpha Quadrant was moving them into article space, and then getting "blamed" for the dablink. To solve this, I changed the logic of the scripts. Originally, the tool only examined article edits that occurred since the last update. Now, if the tool detects that the article didn't exist as of the last update, and it finds that the article was moved in from a different namespace, it takes the entire editing history into account, regardless of namespace. This solved the problem.
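The changed logic can be sketched as a choice of which revisions to examine when attributing a dablink. The `Revision` type, its fields, and the two boolean inputs below are illustrative assumptions; how the real scripts detect a move is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Revision:
    editor: str
    timestamp: int  # illustrative: seconds since some epoch


def revisions_to_examine(
    history: List[Revision],
    last_update: int,
    existed_at_last_update: bool,
    moved_from_other_namespace: bool,
) -> List[Revision]:
    """Pick the revisions considered when deciding who added a dablink."""
    if not existed_at_last_update and moved_from_other_namespace:
        # Article arrived by a move: consider the entire editing history,
        # so the original drafter is credited rather than the page mover.
        return list(history)
    # Otherwise only edits made since the last update matter.
    return [rev for rev in history if rev.timestamp > last_update]
```

With a draft written before the last update and a move performed after it, the first branch returns the drafter's revisions as well, which is what keeps the mover from being "blamed".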
Discussion
Looks good. Approved for trial. Please provide a link to the relevant contributions and/or diffs when the trial is complete. However, please try to limit the number of edits at first (e.g. maybe only 50 for the first run?), just in case something goes wrong or people react badly; we don't want to have spammed 300-500 talk pages. --Chris 11:50, 14 November 2011 (UTC)
- Trial complete. 50 messages sent, diffs on talk. --JaGatalk 18:47, 14 November 2011 (UTC)
- One bug - I'd placed the user's name in the section header for when I was sending the output to my Sandbox, and forgot to remove it before this trial run. It has now been removed. --JaGatalk 18:50, 14 November 2011 (UTC)
- Second bug - if an editor's talk page redirects to another talk page, DPL bot was writing to the redirect. I've updated the code to detect the redirect and write to the redirect target (assuming that redirect target has rd_namespace = 3; otherwise it writes no message). --JaGatalk 21:18, 14 November 2011 (UTC)
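The redirect fix described in the second bug report can be sketched like this. The dictionary standing in for a redirect lookup and the function name are assumptions made for illustration; only the rule itself (follow the redirect when the target has rd_namespace = 3, otherwise send nothing) comes from the comment above.

```python
from typing import Dict, Optional, Tuple

USER_TALK_NS = 3  # rd_namespace value for the User talk namespace


def resolve_talk_target(
    talk_title: str, redirects: Dict[str, Tuple[int, str]]
) -> Optional[str]:
    """Return the page to write to, or None when no message should be sent.

    `redirects` maps a redirecting talk page title to the
    (rd_namespace, rd_title) of its target; it is a hypothetical stand-in
    for a database lookup.
    """
    target = redirects.get(talk_title)
    if target is None:
        return talk_title  # not a redirect: write to the page itself
    rd_namespace, rd_title = target
    if rd_namespace == USER_TALK_NS:
        return rd_title  # redirect to another user talk page: follow it
    return None  # redirect leaves the User talk namespace: write nothing
```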
After sending out 49 messages (one was lost due to the second bug), I sent 50 more messages to my Sandbox, to have a control group to compare results with. Here's what happened after 24 hours:
| Group | Message sent? | Recipients who fixed dablinks | Dablinks before trial | Dablinks after trial | Dablinks fixed by recipients | Dablinks fixed by others |
|---|---|---|---|---|---|---|
| Trial group (49 dablink creators) | Yes | 27 | 72 | 29 | 40 | 3 |
| Control group (50 dablink creators) | No | 1 | 72 | 65 | 1 | 6 |
So, a pretty successful trial, where more than half of the message recipients (27 of 49) fixed the dablinks they'd created. What's more, I received no complaints. --JaGatalk 21:01, 15 November 2011 (UTC)
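The trial figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers reported in the results; the function and key names are made up for illustration.

```python
def summarize(recipients, fixers, before, fixed_by_recipients, fixed_by_others):
    """Derive simple ratios and the expected remaining dablink count."""
    return {
        # Share of the group that fixed at least one of their dablinks.
        "fixer_share": fixers / recipients,
        # Dablinks left over once both kinds of fixes are subtracted.
        "remaining": before - fixed_by_recipients - fixed_by_others,
        # Share of the starting dablinks fixed by the creators themselves.
        "fixed_by_recipients_share": fixed_by_recipients / before,
    }


trial = summarize(recipients=49, fixers=27, before=72,
                  fixed_by_recipients=40, fixed_by_others=3)
control = summarize(recipients=50, fixers=1, before=72,
                    fixed_by_recipients=1, fixed_by_others=6)
```

The remaining counts come out to 29 and 65, matching the reported "after trial" figures, and the trial group's fixer share is 27/49, a little over half, against 1/50 in the control group.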
OK, let's go for a somewhat larger trial first to make sure, and assuming that goes well, I'm happy to approve. Approved for extended trial. Please provide a link to the relevant contributions and/or diffs when the trial is complete. Let's go for about 300-500 edits, so we get an accurate idea of a daily run. --Chris 01:31, 16 November 2011 (UTC)
- Trial complete. 349 messages sent, diffs on talk. I actually ran the bot twice today, which is not the way it will be going forward, but I wanted to get a full day's sample size.
- Possibly unnecessary technical explanation: Why I had to do that is somewhat complicated. I check for new dablinks twice daily: at 8:30AM and 8:30PM UTC. But I only want to send messages once per day, after the morning run. Therefore each morning run should include all dablinks created since 8:30AM the day before. When I was approved for a second trial, I marked all dablinks recorded in my user database as "message sent" to get fresh data the next morning. But the 8:30PM run had already completed. So this morning, I sent messages for all dablinks created since 8:30PM yesterday, instead of since 8:30AM. To compensate, I ran the bot again after today's 8:30PM run, getting a good daily sample. I made sure not to message the same person twice in a single day. --JaGatalk 00:26, 17 November 2011 (UTC)
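The scheduling described above amounts to a 24-hour window ending at the morning run, plus a guard against messaging anyone twice in a day. A rough sketch, with hypothetical function names and data shapes:

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Set, Tuple


def morning_window(morning_run: datetime) -> Tuple[datetime, datetime]:
    """Window covering everything since the previous day's morning run."""
    return morning_run - timedelta(days=1), morning_run


def recipients_for_run(
    dablinks: Iterable[Tuple[str, datetime]],
    window_start: datetime,
    window_end: datetime,
    already_messaged: Set[str],
) -> List[str]:
    """Editors with a dablink in the window who were not yet messaged today."""
    recipients = set()
    for editor, created_at in dablinks:
        if window_start < created_at <= window_end and editor not in already_messaged:
            recipients.add(editor)  # a set yields one message per editor
    return sorted(recipients)
```

Calling `morning_window` with the 8:30AM UTC run time gives the span the morning message batch is meant to cover; the second trial run corresponds to calling `recipients_for_run` again with the first batch's recipients passed as `already_messaged`.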
Approved. Please make sure that you add very clear and simple instructions on the bot's userpage on how to opt out of these notifications (via {{bots}}). --Chris 03:45, 17 November 2011 (UTC)
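For readers unfamiliar with the {{bots}} opt-out, a simplified check might look like the sketch below. It is an assumption-laden illustration, not the bot's implementation: it handles only {{nobots}} and the `allow=` / `deny=` parameters of {{bots}}, ignores other parameters, and the function name is hypothetical.

```python
import re


def allows_bot(wikitext: str, bot_name: str) -> bool:
    """Simplified exclusion check for a talk page's wikitext."""
    name = bot_name.lower()
    # {{nobots}} excludes every bot.
    if re.search(r"\{\{\s*nobots\s*\}\}", wikitext, re.IGNORECASE):
        return False
    # {{bots|deny=...}} excludes the listed bots, or all of them.
    deny = re.search(r"\{\{\s*bots\s*\|\s*deny\s*=\s*([^}|]*)", wikitext, re.IGNORECASE)
    if deny:
        denied = [entry.strip().lower() for entry in deny.group(1).split(",")]
        if "all" in denied or name in denied:
            return False
    # {{bots|allow=...}} admits only the listed bots, or all of them.
    allow = re.search(r"\{\{\s*bots\s*\|\s*allow\s*=\s*([^}|]*)", wikitext, re.IGNORECASE)
    if allow:
        allowed = [entry.strip().lower() for entry in allow.group(1).split(",")]
        return "all" in allowed or name in allowed
    return True
```

Under this sketch, an editor who adds `{{bots|deny=DPL bot}}` to their talk page would no longer receive notifications.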
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.