Wikipedia:Bots/Requests for approval/AVBOT
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Withdrawn by operator.
Operator: Emijrp (talk · contribs)
Automatic or Manually assisted: Automatic
Programming language(s): Python (pywikipediabot and irclib)
Source code available: Google Code
Function overview: anti-vandalism
Links to relevant discussions (where appropriate):
Edit period(s): Continuous
Estimated number of pages affected: Depends on vandals
Exclusion compliant (Y/N):
Already has a bot flag (Y/N): No, and I think that it is not needed. Its edits must be seen in Recent Changes, right?
Function details: Anti-vandalism, anti-blanking, and anti-test edits. Also, it leaves a message to users.
Discussion
This bot has been tested on the Spanish Wikipedia for about 2 years, and it has reverted about 200,000 acts of vandalism. emijrp (talk) 16:09, 21 February 2010 (UTC)
Some more features:
- Different messages for different kinds of vandalism (blatant vandalism, blanking, test edits);
- The regular expressions list can be edited by admins, in real-time;
- Reporting to Wikipedia:Administrator intervention against vandalism;
Regards. emijrp (talk) 17:09, 21 February 2010 (UTC)[reply]
- Needs wider discussion. As this bot has the potential to modify a large number of pages anywhere in the encyclopedia in a short amount of time, please announce this BRFA at Wikipedia:Village pump (proposals), Wikipedia:Village pump (technical) and Wikipedia:Bot owners' noticeboard; while there, please invite code reviews at Wikipedia:Village pump (technical) and Wikipedia:Bot owners' noticeboard, with the reviews to be published here.
- What computer system does this bot operate from (if it is blocked by an admin, will that affect other users)? Josh Parris 22:28, 21 February 2010 (UTC)[reply]
- As long as the admins have a clue when blocking and don't enable autoblock (a very stupid thing to do with bots, as it also blocks the operator), it won't be a problem. βcommand 01:08, 22 February 2010 (UTC)
- Bot runs on meta:Toolserver, so, if blocking, don't enable autoblock, please. emijrp (talk) 19:44, 22 February 2010 (UTC)[reply]
- Hello Josh. I will post some messages in those places, thanks. emijrp (talk) 19:43, 22 February 2010 (UTC)
- Also, the bot operator is pretty well established. The idea of having a bot correct vandalism is not new, and already has community support. However, community input never hurt anybody. Tim1357 (talk) 01:34, 22 February 2010 (UTC)
- Update: I created a pretty extensive list, and got some admins to add it to the page. Take a look and tell me what you think. However, because all the regexes are case-sensitive, I could not add anything that included shouting in articles. Tim1357 (talk) 05:36, 22 February 2010 (UTC)[reply]
- Hello Tim, thanks for working on the list. Regular expressions are not case-sensitive. emijrp (talk) 19:41, 22 February 2010 (UTC)[reply]
I have announced this BRFA at village pump[1][2] and at bot owner's noticeboard[3]. Regards. emijrp (talk) 21:50, 22 February 2010 (UTC)[reply]
- This bot will be working only in the article namespace, correct? Tim1357 (talk) 22:28, 22 February 2010 (UTC)[reply]
- It works in the user, wikipedia, and category ones too... but not in talk pages. emijrp (talk) 23:13, 22 February 2010 (UTC)[reply]
- I would like to suggest that regex comments be mandatory, so that one can quickly scan the list of regexes and see the intended purpose of each. DES (talk) 23:26, 22 February 2010 (UTC)[reply]
- Mandatory regexp comments is a good idea. emijrp (talk) 20:21, 23 February 2010 (UTC)[reply]
- Some Wikipedia: pages work like talk pages (for example, this one); is there a mechanism to exclude those pages, or would it be easier to exclude the WP: namespace? Josh Parris 02:09, 23 February 2010 (UTC)[reply]
- Good point Josh, I vote we exclude the Wikipedia namespace. Also, I agree with DES, I'll go through and start adding comments. Tim1357 (talk) 03:54, 23 February 2010 (UTC)
- Yes, there is a feature to exclude pages (individually or using regexps). You can add pages to User:Emijrp/Exclusions.css. emijrp (talk) 20:21, 23 February 2010 (UTC)
- Can we also exclude edits to one's own userspace please? Also, I'd like to see the warnings being used by this bot. And I see at the Spanish Wikipedia the bot marks pages for deletion if they are too short, could you please say whether or not this function will be used here? (Probably not a good idea to use it here). And there's no need to not have the bot flag, since users can choose if bot edits show up in RC (I think they do by default). - Kingpin13 (talk) 09:39, 24 February 2010 (UTC)
- This bot doesn't revert own userpage edits. Newpages watch function can be disabled if you want. emijrp (talk) 18:34, 26 February 2010 (UTC)[reply]
There are not enough details. Does this bot use a scoring system? Is it 1RR compliant? Sole Soul (talk) 12:28, 27 February 2010 (UTC)
- Yes, it uses a scoring system (see Regexp list for details). It is 1RR compliant for the same vandalism, but, if the user inserts a different bad word, it is reverted (it is a new vandalism). emijrp (talk) 21:38, 27 February 2010 (UTC)[reply]
- I'd like it if you did some work on the bot's user-page. Many new users will navigate to the bot's user-page, so let's make sure that they can understand what the bot is and why it reverted them. Also, I'd like to get started with part one of the trial. For the first part, I'd like you to log all the edits that would be reverted. That way, you can tweak the score list. How about 1 week? Later, we can move to a real-world trial. Good luck! Tim1357 (talk) 23:31, 27 February 2010 (UTC)
Trial
- Approved for trial (~ 7 days, userspace only). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Tim1357 (talk) 23:31, 27 February 2010 (UTC)
- Thanks, I have started the trial. You can see the log at User:AVBOT/Trial. emijrp (talk) 12:07, 28 February 2010 (UTC)[reply]
- I've been watching the bot in action, and it seems to be doing work that Cluebot isn't (I'm shocked).
- While I haven't seen much in the way of questionable calls, I think a week-long trial is too long. Analysis and verification of the data produced is heavy going work; Tim1357, do you have a reason to run this bot for 7 days? Can we cut it off at one, and re-start the trial once everyone has had a look at the performance of the bot in that window? Josh Parris 14:22, 28 February 2010 (UTC)[reply]
- Yeah, now that I see how much the bot has already done, I suggest 3 days. It's entirely up to you, however. P.S. I made some changes to the regexp list. I was hoping to get you to sign off on them before I get an admin to update the live one. Tim1357 (talk) 22:49, 28 February 2010 (UTC)
- Tim, all the regexps, when loaded, are inserted into this context: [ \@\º\ª\·\#\~\$\<\>\/\(\)\'\-\_\:\;\,\.\r\n\?\!\¡\¿\"\=\[\]\|\{\}\+\&]. For example (ass(es)?):
- [ \@\º\ª\·\#\~\$\<\>\/\(\)\'\-\_\:\;\,\.\r\n\?\!\¡\¿\"\=\[\]\|\{\}\+\&]ass(es)?[ \@\º\ª\·\#\~\$\<\>\/\(\)\'\-\_\:\;\,\.\r\n\?\!\¡\¿\"\=\[\]\|\{\}\+\&]
- So, it is not necessary to put \s like this \sass(es)?\s. Furthermore, \sass(es)?\s doesn't match asses!!!!!, but with the context above yes. Regards. emijrp (talk) 09:37, 1 March 2010 (UTC)[reply]
- Ok, cool. I took out all the \s's. See what I've done now (note I alphabetized them) Tim1357 (talk) 15:09, 2 March 2010 (UTC)[reply]
- Thanks for sorting the list. I have to develop a feature to sort the list automatically. emijrp (talk) 16:33, 2 March 2010 (UTC)
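The context wrapping emijrp describes above can be sketched in Python as follows. This is only an illustration, not AVBOT's actual loader: the helper name `wrap` is hypothetical, and the character class is rebuilt with `re.escape` from the characters quoted in the thread. Case-insensitive matching follows emijrp's earlier statement that the regular expressions are not case-sensitive.

```python
import re

# Characters treated as "context", copied from the class quoted in the thread.
CONTEXT_CHARS = " @ºª·#~$<>/()'-_:;,.\r\n?!¡¿\"=[]|{}+&"
CONTEXT = "[" + re.escape(CONTEXT_CHARS) + "]"


def wrap(pattern):
    """Hypothetical helper: put the context class on both sides of a list entry."""
    return re.compile(CONTEXT + pattern + CONTEXT, re.IGNORECASE)


wrapped = wrap(r"ass(es)?")
naive = re.compile(r"\sass(es)?\s", re.IGNORECASE)

sample = " asses!!!!! "
print(bool(wrapped.search(sample)))  # True: "!" counts as context
print(bool(naive.search(sample)))    # False: "!" is not whitespace
```

Because one context character is required on each side, a word such as "class" is not matched, but a bad word at the very start or end of the text would also be missed unless the text is padded first.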
(\w[a-z\s]{3,})\1{5,};;-2;;t;
What it does, essentially, is match any 4+ character string (letters and spaces only), that has been repeated 6 or more times. I think this avoids the problem you encountered with template parameters, as this regex requires that any string starts with a letter. Tim1357 (talk) 17:19, 2 March 2010 (UTC)[reply]
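As a rough check of the behaviour Tim1357 describes, the pattern can be tried on its own in Python. This sketch tests the bare regex only; it assumes case-insensitive matching and leaves out the surrounding context class and the `;;-2;;t;` scoring fields, and `is_repetitive` is a hypothetical name.

```python
import re

# The pattern from the list entry above, without the score and class fields.
REPEAT = re.compile(r"(\w[a-z\s]{3,})\1{5,}", re.IGNORECASE)


def is_repetitive(text):
    """Return True if a 4+ character chunk occurs 6 or more times in a row."""
    return REPEAT.search(text) is not None


print(is_repetitive("haha" * 6))   # True: "haha" repeated six times
print(is_repetitive("haha" * 5))   # False: only five repetitions
print(is_repetitive("|" * 30))     # False: the chunk must start with a word character
```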
I notice ClueBot is 1RR-compliant with minimal exceptions. AVBOT made four reports concerning the same page in about ten minutes. Will this bot honor 3RR or 1RR? — The Earwig (talk) 03:57, 3 March 2010 (UTC)[reply]
- Quoting emijrp above "It is 1RR compliant for the same vandalism, but, if the user inserts a different bad word, it is reverted (it is a new vandalism)." Sole Soul (talk) 04:23, 3 March 2010 (UTC)[reply]
- Ah, didn't notice that for some reason. Thanks. — The Earwig (talk) 04:24, 3 March 2010 (UTC)[reply]
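One possible reading of the 1RR behaviour quoted above is sketched below. How AVBOT actually tracks earlier reverts is not described in this discussion, so the key used here (page, user, matched term) and the name `should_revert` are assumptions made purely for illustration.

```python
# Triples (page, user, matched term) that have already been reverted once.
# This bookkeeping is an assumption, not AVBOT's documented behaviour.
reverted = set()


def should_revert(page, user, matched_term):
    """Revert the same vandalism only once; a different bad word counts as new."""
    key = (page, user, matched_term.lower())
    if key in reverted:
        return False
    reverted.add(key)
    return True


print(should_revert("Internet slang", "81.204.228.34", "word1"))  # True
print(should_revert("Internet slang", "81.204.228.34", "word1"))  # False
print(should_revert("Internet slang", "81.204.228.34", "word2"))  # True
```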
A preliminary analysis
[edit]The proposed edits are listed at
In total there were 8 reverts that were considered inappropriate; Emijrp has apparently addressed a few of them already. It would be helpful if everyone had a look at the inappropriate reverts and offered suggestions. Once the 8 reverts are addressed to everyone's satisfaction, perhaps another (longer) trial run would be appropriate. What's an acceptable false-positive rate? Josh Parris 05:09, 4 March 2010 (UTC)[reply]
- Yes, I agree, we should continue the trial. Josh, I wanted to point out that 4 of those eight were my fault (weighting things too heavily, etc.). Also, as an incentive to continue the trial, I added a substantial number of regexes. Some of them may come with problems that I have not foreseen. Despite all the hiccups, I must say I am immensely pleased with the accuracy of your bot. This kind of open-sourced bot is exactly what we here need, so I thank you. Tim1357 (talk) 05:14, 4 March 2010 (UTC)
- Hi all. First, thanks for your work in both trial subpages: Josh Parris, Tim1357 and The Earwig. Yes, we have to do some more tests, so I will run the bot again when you approve a second trial. emijrp (talk) 16:42, 4 March 2010 (UTC)
- Approved for extended trial. Please provide a link to the relevant contributions and/or diffs when the trial is complete. You can continue your trial until you feel confident in the bot's abilities. No controversy in user-space edits anyways. When you want, we can get on with having the bot work in the real world, performing reverts and such. In the meantime, I can help with writing the templates. Can you give me some instructions on how to make them usable for the bot? Tim1357 (talk) 23:49, 4 March 2010 (UTC)
While that trial's progressing, perhaps we could examine the unaddressed failures in the last trial:
- 2010-02-28 14:48:49.999603: Possible Vandalism in List of The Secret Life of the American Teenager characters by 96.245.119.123, reverting to 346426327 edit by 71.194.239.18
- This was reverted, among other reasons, because the edit included 'I love...'. Since the trial, we have reduced the score given for that particular regular expression. Tim1357 (talk) 17:28, 6 March 2010 (UTC)[reply]
- 2010-02-28 16:10:20.223026: Possible Test in Newport News, Virginia by 70.174.63.127, reverting to 346251432 edit by Lucasbfrbot
- Umm, this is a test. We define any form of 'hi' as a test. Tim1357 (talk) 17:28, 6 March 2010 (UTC)[reply]
- 2010-02-28 15:55:05.238836: Possible Vandalism in Internet slang by 81.204.228.34, reverting to 346792617 edit by Betsythedevine
- This one is tricky. While it is vandalism, it is really close to the edge. I think what I'll do is add good points to an edit if there appear to be full sentences. Perhaps
[\w].*?(\s.*?\w){3,}.*?[\.\?\!];;+1;;g;; #Full sentences are good, must start with a letter, have 3 spaces, have 3 other letters, and end in a period, a question mark, or an exclamation point.
The only problem is that it will be put into the context: [ \@\º\ª\·\#\~\$\<\>\/\(\)\'\-\_\:\;\,\.\r\n\?\!\¡\¿\"\=\[\]\|\{\}\+\&]
which means it will screw with our regex a bit. In the end, slapping a ? on the beginning will make the first context optional, and a .*? on the end of it will make it work with the second context. Finally: ?[\w].*?(\s.*?\w){3,}.*?[\.\?\!].*?
Tim1357 (talk) 17:28, 6 March 2010 (UTC)[reply]
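The effect of the leading ? can be checked with a short Python sketch. It assumes the entry is simply concatenated between two copies of the context class, as emijrp described earlier; the variable names are hypothetical and the scoring fields are omitted.

```python
import re

# Context class rebuilt from the characters quoted in the thread.
CONTEXT_CHARS = " @ºª·#~$<>/()'-_:;,.\r\n?!¡¿\"=[]|{}+&"
CONTEXT = "[" + re.escape(CONTEXT_CHARS) + "]"

STRICT = r"[\w].*?(\s.*?\w){3,}.*?[\.\?\!]"        # first proposal
RELAXED = r"?[\w].*?(\s.*?\w){3,}.*?[\.\?\!].*?"   # final version from the thread

strict = re.compile(CONTEXT + STRICT + CONTEXT)
# The leading "?" attaches to the first context class and makes it optional.
relaxed = re.compile(CONTEXT + RELAXED + CONTEXT)

sample = "One two three four.\n"
print(bool(strict.search(sample)))   # False: no context character before "One"
print(bool(relaxed.search(sample)))  # True: the sentence may start the text
```

Note that the second context class still has to match one character, so a sentence that ends the text with nothing after its final punctuation is not counted.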
What can be done about these? Josh Parris 13:23, 6 March 2010 (UTC)[reply]
- Sorry about the delay, I'm going to run a second trial in the next few days. emijrp (talk) 20:28, 11 March 2010 (UTC)
- Any updates? Are you going to start the trial soon? — The Earwig (talk) 22:58, 19 March 2010 (UTC)[reply]
- I am running the second trial. Thanks for your patience. The log is in User:AVBOT/Trial. Regards. emijrp (talk) 13:01, 20 March 2010 (UTC)[reply]
- Can you add the score beside each suspected edit? Sole Soul (talk) 20:58, 20 March 2010 (UTC)[reply]
Complex regexp
Please be careful when adding complex regular expressions. This one: f+(.?)a+\1?g+\1?(g+\1)?(o+\1?t+\1?)?s*\1? has produced some false positives (it matches the word "flag", detecting this as vandalism). A lazy bot is better than a hungry one. ; ) Regards. emijrp (talk) 18:34, 21 March 2010 (UTC)
- Yep, sorry about that. It wasn't me being lazy, it was me not thinking. The new way of doing things makes more sense (using [^a-z] instead of .) Thanks! Tim1357 (talk) 19:24, 21 March 2010 (UTC)
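The false positive and the effect of the change can be reproduced in Python. The corrected pattern below is an assumption: the thread only says that . was replaced by [^a-z], so the exact live entry may differ. The context class and scoring fields are left out.

```python
import re

# Pattern quoted by emijrp: the optional separator "." can swallow a letter.
ORIGINAL = re.compile(r"f+(.?)a+\1?g+\1?(g+\1)?(o+\1?t+\1?)?s*\1?", re.IGNORECASE)

# Assumed fix: the separator may be anything except a letter.
ASSUMED_FIX = re.compile(r"f+([^a-z]?)a+\1?g+\1?(g+\1)?(o+\1?t+\1?)?s*\1?", re.IGNORECASE)

print(bool(ORIGINAL.search("flag")))      # True: "l" is taken as the separator
print(bool(ASSUMED_FIX.search("flag")))   # False: a letter cannot be a separator
print(bool(ASSUMED_FIX.search("f.a.g")))  # True: punctuation separators still match
```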
Messages
I started working on a list of message templates. See here for a list. I'm not sure how you wanted them to be set up so that they could work with the bot, but I guess that can be changed later. Tim1357 (talk) 01:40, 22 March 2010 (UTC)
- I'm going to check the templates in the next hours, please wait. emijrp (talk) 20:02, 28 March 2010 (UTC)[reply]
- I have checked the warning templates, they are very good. Tim, can you create the test warning too? Thanks. emijrp (talk) 13:07, 30 March 2010 (UTC)[reply]
- Erm, he can't create that page; it's admin-only. — The Earwig (talk) 15:23, 30 March 2010 (UTC)[reply]
- Oops, in User:Tim1357/uw-test1. emijrp (talk) 16:09, 30 March 2010 (UTC)[reply]
- Done. emijrp (talk) 21:32, 4 April 2010 (UTC)[reply]
What is your opinion about the link that AVBOT leaves on the user talk page? Message, link to restore. It is a very easy way to restore a good-faith edit after a false positive, but it is equally available in genuine vandalism cases. emijrp (talk) 21:32, 4 April 2010 (UTC)
- Message headers do not seem to follow "standard" form for other manual and 'bot warnings. Should be == Month Year == so that (at a minimum) cluebot and others can escalate properly. For example, at User talk:99.225.199.97, if the 04:29, 5 April 2010 (UTC) message had been in a section titled "April 2010" instead of "Possible vandalism (Warning #1)", the subsequent warning would have been at level2 instead of another level1. DMacks (talk) 04:37, 5 April 2010 (UTC)[reply]
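The section title DMacks asks for is easy to generate. A minimal sketch, assuming the bot has the warning's timestamp available as a `datetime`; the function name is hypothetical, and the month names are spelled out so the result does not depend on the system locale.

```python
from datetime import datetime

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def warning_section_header(when):
    """Build the '== Month Year ==' heading used for user talk warnings."""
    return "== {} {} ==".format(MONTHS[when.month - 1], when.year)


print(warning_section_header(datetime(2010, 4, 5, 4, 29)))  # == April 2010 ==
```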
Moving forward
I'd like to move forward with this request. According to Tim, all (or nearly all) of the errors in the trial were caused by faulty regexes or scoring and have been corrected. If so, I'd like to:
- Stop editing the regex list as often, so we can get it stabilized.
- Approve this for a real-world trial, possibly week-long, so we can see how it fares when actually editing the wiki.
What do we think? — The Earwig (talk) 19:50, 28 March 2010 (UTC)[reply]
- Yep. Sounds good to me! Sorry about making all the faulty regexes. :_) Tim1357 (talk) 19:53, 28 March 2010 (UTC)[reply]
- No joke. Approved for trial (7 days). Please provide a link to the relevant contributions and/or diffs when the trial is complete. (X! · talk) · @911 · 20:52, 1 April 2010 (UTC)[reply]
- To clarify: Mainspace trial. (X! · talk) · @636 · 14:16, 2 April 2010 (UTC)[reply]
- OK, I will run this 7 days real-world trial in a few days. Thanks. emijrp (talk) 11:22, 4 April 2010 (UTC)[reply]
- 2 Things:
- You need to have <!-- Template:uw-huggle1 --> somewhere on the vandal template, so that WP:Huggle can tell what level the warning is. Please put <!-- Template:uw-huggle{{{3}}} --> right before the signature.
- You also need to fix the diff part of the template. For example: The bot lists this as the diff on User talk:J4j93. Click on the link and see what's wrong; basically you need to change &diff=next to &diff=prev. Thanks Tim1357 (talk) 23:11, 4 April 2010 (UTC)
- Second issue resolved. emijrp (talk) 16:40, 6 April 2010 (UTC)[reply]
- About the first issue, ClueBot puts <!-- Template:uw-cluebotwarningX --><!-- Template:uw-vandalismX --> [4]. I will change uw-cluebotwarningX to uw-avbotwarningX. emijrp (talk) 15:27, 10 April 2010 (UTC)
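The diff-link fix Tim1357 asks for above can be illustrated with a small Python helper. With `diff=prev`, MediaWiki compares the revision given in `oldid` with the one before it, so the link shows the change made by that edit; `diff=next` compares it with the revision after it instead. The function name and the revision ID in the usage line are placeholders, not values from the trial.

```python
from urllib.parse import urlencode


def diff_link(title, revid, base="https://en.wikipedia.org/w/index.php"):
    """Link to the change introduced by revision `revid` (hypothetical helper)."""
    query = urlencode({
        "title": title.replace(" ", "_"),
        "diff": "prev",   # compare `oldid` with the revision before it
        "oldid": revid,
    })
    return base + "?" + query


print(diff_link("Internet slang", 123456))
```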
If a user made a good edit, then a second good edit, then a third test edit, the bot would revert all three edits. We expect a vandal to repeat his vandalism. This is not necessarily the case with test edits. Only the last edit should be reverted if it is a test edit. See [5], [6]. Sole Soul (talk) 04:15, 5 April 2010 (UTC)
- I'm not sure about that. Those two examples are really false positives. Does ClueBot work as you say? Please give more feedback about reverting only the last edit in test cases. emijrp (talk) 18:38, 5 April 2010 (UTC)
- I don't think that ClueBot reverts test edits. Sole Soul (talk) 19:19, 5 April 2010 (UTC)
I'm working on this, please wait. emijrp (talk) 09:14, 25 April 2010 (UTC)[reply]
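Sole Soul's proposal above can be expressed as a choice of revert target. The sketch below is hypothetical: the page history is modelled as a simple list of (revision ID, user) pairs, oldest first, and the function name and the "test"/"vandalism" labels are assumptions rather than AVBOT's own.

```python
def revert_target(history, kind):
    """Pick the revision to restore after the newest edit is flagged.

    history: list of (revid, user) tuples, oldest first; the last entry
    is the flagged edit. Returns a revid, or None if nothing is left.
    """
    flagged_user = history[-1][1]
    if kind == "test":
        # Proposed behaviour: undo only the last edit.
        return history[-2][0] if len(history) > 1 else None
    # Behaviour described in the thread: undo the user's whole run of edits.
    for revid, user in reversed(history):
        if user != flagged_user:
            return revid
    return None


history = [(1, "A"), (2, "B"), (3, "B"), (4, "B")]
print(revert_target(history, "vandalism"))  # 1
print(revert_target(history, "test"))       # 3
```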
False positives
Hi, I'm working hard on the false positive cases, adding new regexps using this list. As for the messages above, I will reply to them in the next hours/days and I will fix all the bugs. Thanks for your feedback. Regards. emijrp (talk) 15:50, 5 April 2010 (UTC)
- Do you think you are up to go again? Honestly we can't approve you for another go until the bot gets a slightly better track record. Nonetheless, this seems like an awesome bot, and I look forward to when I can finally approve it. Whenever you are ready, go again. Cheers. Tim1357 talk 06:15, 29 April 2010 (UTC)[reply]
- Any updates? Tim1357 talk 02:48, 13 May 2010 (UTC)
- Echoing Tim here. What's going on with the bot? — The Earwig (talk) 20:17, 19 May 2010 (UTC)[reply]
- Maybe it is better to contact emijrp on the Spanish Wikipedia. Sole Soul (talk) 10:50, 20 May 2010 (UTC)
- If emijrp is not so active on the English Wikipedia, I could run the bot for him. If you want that, Emijrp, I'd be glad to help. Tim1357 talk 02:24, 21 May 2010 (UTC)
Hi all. I'm so sorry to say that I do not have enough time now. Can this BRFA be frozen? Of course, the code is GPL, so you can run it, although it is ugly and dirty, and the fix for the bug about sorting user talk warnings is not yet coded. Regards, and thanks again for your effort in this BRFA. emijrp (talk) 10:46, 22 May 2010 (UTC)
- Withdrawn by operator. Feel free to re-open at any time. I may do something with this idea later on, if you don't mind. Tim1357 talk 20:09, 22 May 2010 (UTC)[reply]
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.