User talk:ClueBot Commons/Archives/2010/December

This is an archive of past discussions with User:ClueBot Commons. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

deleted links

Why were all of my links deleted? One strike you're out Thanks! —Preceding unsigned comment added by Snjgwu (talk • contribs) 22:01, 28 November 2010 (UTC)

I made a response on the user's talk page. (Apparently many good changes to article One strike you're out were made via IP 151.200.121.221 then cluebotNG reverted them, then the user created his account. There's some discussion re this on the FP report page.) -R. S. Shaw (talk) 03:12, 8 December 2010 (UTC)

Just out of curiosity.........

.........is there anywhere that we can see whether or not we've beaten ClueBot NG to a revert?--5 albert square (talk) 01:36, 23 November 2010 (UTC)

Not currently, but beating Cluebot-NG is fairly rare. It only occurs a few times per hour, and even then it's a network fluke where the API is just being particularly slow on that call. Crispy1989 (talk) 01:42, 23 November 2010 (UTC)

ClueBot usually reverts an edit 3-5 seconds after it was made. mechamind 9 0 04:22, 24 November 2010 (UTC)

Thanks :) --5 albert square (talk) 21:08, 24 November 2010 (UTC)

You have only beaten the bot twice: 1 and 2. -- Cobi^(t|c|b) 13:19, 27 November 2010 (UTC)

Ah thanks Cobi, was just being nosey :) --5 albert square (talk) 19:51, 29 November 2010 (UTC)

false positive rate

{{Edit protected|User:ClueBot NG/Run}} I reviewed a couple dozen edits by ClueBotNG and found two false positives. The false positive rate is undoubtedly far higher than the claimed 0.25%. The bot is biting good-faith editors at an astounding rate. Please shut off the bot until a systematic, third-party evaluation is performed to explain why its false positive rate is so high. To shut down the bot, follow the instructions at the top of User:ClueBot NG. --Stepheng3 (talk) 23:53, 26 November 2010 (UTC)

I fully agree. I examined the most recent 248 reverts ending at 04:49, 27 November 2010. On average, out of every 248 edits there should be 0.62 incorrectly reverted edits. I found many more than that (13/248 = an alarming 5.24%), so many that it is impossible to disagree. PleaseStand ^(talk) 05:38, 27 November 2010 (UTC)

The naive way to calculate false positives: The range of the edit IDs you claim: 399080486 - 399061905 = 18,581. The false positives: 13. The naive false positive rate is: 13 / 18581 * 100% = 0.07%.

If I look at the database to see how many in that range are main space edits:

mysql> SELECT COUNT(*) AS `count` FROM `revision` JOIN `page` ON `rev_page` = `page_id` WHERE `page_namespace` = 0 AND `rev_id` > 399061905 AND `rev_id` < 399080486\G
*************************** 1. row ***************************
count: 12438
1 row in set (0.63 sec)

There are 12,483 edits between those edit IDs in the main space. That is, ClueBot NG has categorized 12,483 edits, and classified 13 incorrectly as vandalism. The real false positive rate is: 13 / 12483 * 100% = 0.10%.

Both of these show that the bot is well under its 0.25% false positive rate. Thanks. -- Cobi^(t|c|b) 11:13, 27 November 2010 (UTC)

Hi. If you count that way, I think the false positive rate must be set much lower. The 12483 edits may include many edits by admins, trusted/veteran/rollback users, and of course bots (interwikis, tagging, and other tasks). If you remove those good edits, how many potentially harmful edits remain that need to be checked by ClueBot NG? Do you seriously count skipping admin edits (and Jimbo edits) as a success? That is obvious! In that case, a 0.25% false positive rate is not a big deal. emijrp (talk) 13:30, 27 November 2010 (UTC)

If you want, I'll recalculate false positives by first discarding edits by users with more than 50 edits.

mysql> SELECT COUNT(*) AS `count` FROM `revision` JOIN `page` ON `rev_page` = `page_id` JOIN `user` ON `user_id` = `rev_user` WHERE `page_namespace` = 0 AND `rev_id` > 399061905 AND `rev_id` < 399080486 AND `user_editcount` < 50\G
*************************** 1. row ***************************
count: 855
1 row in set (3 min 19.45 sec)

mysql> SELECT COUNT(*) AS `count` FROM `revision` JOIN `page` ON `rev_page` = `page_id` WHERE `page_namespace` = 0 AND `rev_id` > 399061905 AND `rev_id` < 399080486 AND `rev_user` = 0\G
*************************** 1. row ***************************
count: 3494
1 row in set (0.27 sec)

This means that there were 855 edits by registered users with fewer than 50 edits, and 3494 edits by anonymous users. 13 / (855 + 3494) * 100% = 0.29% which is within an acceptable statistical error of 0.25% (0.04% deviation). This discrepancy is likely due to the small sample set. -- Cobi^(t|c|b) 13:44, 27 November 2010 (UTC)
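The competing percentages in this thread all share the same numerator (13 incorrect reverts) and differ only in the denominator. The sketch below recomputes them from the figures quoted above; the function and dictionary names are hypothetical, and the main-space count uses the query output (12438), while the prose quotes 12,483. Both give roughly 0.10%.

```python
def rate_percent(false_positives: int, total: int) -> float:
    """Return false_positives / total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * false_positives / total


FALSE_POSITIVES = 13  # incorrect reverts found in the reviewed sample

# Denominators taken from the figures and query output quoted in the thread.
denominators = {
    "edit ID range": 399080486 - 399061905,       # 18,581 IDs
    "main-space edits": 12438,                    # count from the first query
    "anonymous or under 50 edits": 855 + 3494,    # second and third queries
    "bot reverts reviewed": 248,                  # sample reviewed by hand
}

rates = {name: rate_percent(FALSE_POSITIVES, n) for name, n in denominators.items()}

for name, value in rates.items():
    print(f"{name}: {value:.2f}%")
```

The last entry is the fraction of reverts that were wrong, which is a different quantity from the other three: it divides by reverts rather than by edits examined.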

The problem here is that nobody cares about the false negatives. False positives are harmful. If the last 248 reverts by ClueBot NG contain 13 false positives, ~5%, that is a huge percentage. I think that the false positive rate must be set as low as possible, and we need to put our efforts into improving the dataset. emijrp (talk) 14:19, 27 November 2010 (UTC)

"If the false positive rate is [...] constant [...], it can also be interpreted as the expected proportion among all tests performed that are false positives." (Wikipedia, False positive rate)

-- Cobi^(t|c|b) 15:24, 27 November 2010 (UTC)

I was misled by the terminology. I was looking at the fraction of bot reverts that were not vandalism, which seems to be around 5%. I went and looked up the definition. Technically, the false positive rate is defined as the proportion of absent events that yield positive test outcomes, i.e., the conditional probability of a revert given a good-faith edit. I wonder if the good people who approved this bot were misled as I was.
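The two quantities being contrasted here can be written side by side. The notation is introduced only for illustration and is not taken from the bot's documentation: FP is the number of good-faith edits reverted, TN the number of good-faith edits left alone, and TP the number of vandalism edits reverted.

```latex
\begin{align*}
\text{false positive rate}
  &= \frac{FP}{FP + TN}
   = P(\text{revert} \mid \text{good-faith edit}) \\
\text{share of reverts that are wrong}
  &= \frac{FP}{FP + TP}
   = P(\text{good-faith edit} \mid \text{revert})
   \approx \frac{13}{248} \approx 5.24\%
\end{align*}
```

The second line uses the 13 incorrect reverts out of 248 reviewed that were reported earlier in this section. The first line cannot be evaluated from those two numbers alone, because it needs the count of good-faith edits the bot examined.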

An FPR of 0.25% is way too high because each false positive does an incredible amount of harm to the Encyclopedia by alienating a well-meaning contributor. Reading the comments at User:ClueBot NG/FalsePositives/Reports is painful, heartbreaking. This bot is driving people away from Wikipedia.

If the bot looks at 20,000 good-faith edits per day and bites about 0.25%, that's 50 good-faith editors per day. That's a statistic that pains me. I want this bot stopped. I want it stopped until the frequency of false positives is brought below one per day. Are any admins reading this? Please look at what the bot is doing and decide for yourself.

I believe that most of the FPs are going unreported, so I'll be reviewing the bot's edits and reporting false positives whenever I get the chance.

--Stepheng3 (talk) 18:15, 27 November 2010 (UTC)

It was explained fairly clearly early on. -- Cobi^(t|c|b) 18:21, 27 November 2010 (UTC)

The bot was approved based on the claim that there would be "only a few false positives per day". I am seeing about eight per hour. What has gone wrong? --Stepheng3 (talk) 19:07, 27 November 2010 (UTC)

I agree that something should be done about the rate of false positives, however, that's not really what {{editprotected}} is for, so I've disabled the template. HJ Mitchell | Penny for your thoughts? 23:27, 27 November 2010 (UTC)

I've gone ahead and boldly stopped the bot. --Stepheng3 (talk) 23:40, 27 November 2010 (UTC)

The false positive rate of 0.25% has been set and accepted by the Wikipedia BAG.
The meaning of false positive rate is clear, and has been clarified and explained numerous times.
The false positive rate of 0.25% is calculated and verified during a dry trial of random edits, verifying its correctness.
The numbers posted here of false positives fall well within the expected 0.25% false positive rate, further confirming its accuracy.
The warnings left by the bot make it clear that false positives happen, and give clear instructions how to undo incorrect reverts. This also links to a userpage section explaining the concept in more detail.

If any of these items are incorrect, please correct me. I am impartial as to where the FP rate is set. Statistics of the bot's efficacy indicate that it is most effective with minimum false positives at a rate of 0.25%. Decreasing this will lessen the bot's catch rate, but it will still function. If you disagree with my and others' current assessment that 0.25% is an acceptable false positive rate for the volume of vandalism caught, and would like to suggest a different rate, please do so, and we will evaluate the bot based on the given rate, post the results, and recommend whether or not the suggested FP rate is within reason.

I should also note that, where there are false positives, they are frequently not due to things like bad words, as they have been with previous bots. Because of this, the user is less likely to think that they have done something wrong, and more likely to correct the revert and report it. Crispy1989 (talk) 01:21, 28 November 2010 (UTC)

I suggest a false positive rate that will result in "only a few false positives per day" to match the claim made at User:ClueBot_NG/B#Trial_Summary.

I spent many hours today reviewing each of the bot's edits, and on the basis of that sample, I assert that most false positives are not being reported. I'm unsure why this is, but it might have something to do with the complex and intimidating nature of the reporting process or the fact that most of the editors impacted are not logged in. --Stepheng3 (talk) 06:27, 28 November 2010 (UTC)

A few false positives a day is relative, and actual number of false positives is relative to total edits per day. One in four hundred edits constitutes "a few" to me. It is entirely possible, and likely, that many false positives are not being reported. However, the 0.25% false positive rate is an accurate maximum. Crispy1989 (talk) 06:35, 28 November 2010 (UTC)

In case any of the posters here are confused, I will reiterate how the false positive rate ties into the bot's function. I have already explained this repeatedly on the BRFA and the bot's user page, but then again, we've also repeatedly explained the meaning and calculation of "false positive rate", so I'll once again reiterate the significance of the false positive rate in relation to the bot's algorithm. This is explained in much more detail on the bot's user page, so I strongly encourage posters to read what is there before posting additional comments here.

The bot's algorithm generates a probability that a given edit is vandalism. This generated probability is then compared to a threshold to determine whether or not it should be reverted. This threshold is adjustable - if the threshold is too low, there will be many false positives. If the threshold is too high, the bot will not do its job, and will not revert much vandalism. For ease of understanding the bot's impact, the threshold is automatically calculated based on a set false positive rate. The false positive rate can be easily adjusted simply by changing a configuration file. The higher the set rate, the more vandalism is caught, but also the more false positives. The FP rate is given as a percentage, and not as a raw number of edits, because a percentage is much more significant than a raw number. Details on the FP rate's calculation and why it is accurate are available on the bot's user page.
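One way to derive a threshold from a set false positive rate, as described above, is to score a sample of edits known to be good and take the cutoff that misflags no more than the allowed share of them. This is only a minimal sketch under that assumption: the function names and toy scores are hypothetical, and the thread does not say how the bot actually performs this calibration.

```python
def threshold_for_fpr(good_edit_scores, target_fpr):
    """Return the lowest threshold whose false positive rate on the
    known-good sample does not exceed target_fpr.

    An edit counts as flagged when its score is strictly greater than
    the threshold.
    """
    if not good_edit_scores:
        raise ValueError("need at least one scored good edit")
    if not 0.0 <= target_fpr <= 1.0:
        raise ValueError("target_fpr must be between 0 and 1")
    ordered = sorted(good_edit_scores, reverse=True)
    # Number of good edits we may misflag; the epsilon guards against
    # float products landing just below a whole number.
    allowed = int(target_fpr * len(ordered) + 1e-9)
    if allowed >= len(ordered):
        return float("-inf")  # every edit may be flagged
    return ordered[allowed]


def false_positive_rate(good_edit_scores, threshold):
    """Fraction of known-good edits whose score exceeds the threshold."""
    flagged = sum(score > threshold for score in good_edit_scores)
    return flagged / len(good_edit_scores)


# Toy vandalism probabilities for eight edits known to be good.
scores = [0.10, 0.95, 0.30, 0.70, 0.20, 0.90, 0.60, 0.40]
threshold = threshold_for_fpr(scores, target_fpr=0.25)
print(threshold, false_positive_rate(scores, threshold))
```

Lowering `target_fpr` pushes the threshold up, so fewer good edits are flagged and, on real data, less vandalism is caught as well, which is the trade-off discussed in this section.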

The false positive rate of 0.25% was chosen by examining result graphs and checking for a dropoff point. This point is right around a FP rate of 0.25%. It is half of what was originally suggested to me - 0.5%.

Approximately 10% of all edits on Wikipedia are vandalism. Vandal fighters spend countless hours perusing recent changes and manually reverting vandalism - countless man-hours spent just trying to preserve the current data, when that time could be spent adding more. And even with the thorough efforts of vandal-fighters, a large amount of vandalism is not caught at all, or at least for a significant period of time. A bot such as Cluebot-NG not only allows vandal fighters to spend more time editing and less time reverting, but also prevents many instances of vandalism from slipping through the cracks. What good is an encyclopedia if its information cannot be relied on? The reality is, undetected vandalism on Wikipedia causes significantly more problems than (theoretical) lack of a few minor edits. I, for one, am tired of hearing numerous people refuse to accept Wikipedia as a reliable source, simply because "anyone can vandalize it".

False positives, while indeed unfortunate, are not at all difficult to fix. In fact, clear instructions are given in the posted warning explaining how to redo the edit, and remove the warning. Although many false positives probably do go unreported (note: this does not affect the calculated FP rate), this does not mean they are uncorrected.

I expect those opposed to the bot to clearly admit (with reasoning) that they feel that stopping less than 1 in 400 (easily-corrected) false positives is worth the bot not catching more than 200 in 400 vandalism edits.

As a final note - I do not expect everyone to fully understand the algorithm that is used to detect vandalism. However, I do expect those who complain, and indeed directly interfere with the bot's function, to read the clear information that is available before making such complaints. In this way, the user may be able to contribute helpful advice or suggestions, rather than rant.

Those wishing to suggest a different false positive rate should suggest a percentage, as this is what a false positive rate is. I cannot evaluate subjective or relative phrases (if anyone has suggestions as to how I can program a machine to follow a subjective measure, I'd be very willing to listen and consider it). With a finite number to work with, I can give actual statistics on how much extra work this would cause vandal-fighters, and for what fraction of a quarter of a percent gain in false positive reduction it would cost. Crispy1989 (talk) 06:48, 28 November 2010 (UTC)

To get away from subjective measures, let's say that "a few" means 5. How many edits does the bot consider in a typical day? Call that X. I don't know what X is, but presumably the owners of the bot can find out. So the number of good-faith edits is about 0.9*X. What I'm asking for is a target FP rate of 555/X percent. If the bot considers 20,000 edits per day, that would point to an FP rate of 0.028%: 5 FPs out of 18,000 edits.
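The arithmetic in this proposal, and the earlier estimate of 50 false positives per day, can be reproduced with a short sketch. The function names are hypothetical, and the 10% vandalism share and the 20,000 edits per day are the figures assumed in the discussion rather than measured values.

```python
def target_fp_rate_percent(edits_per_day, max_false_positives_per_day=5,
                           vandalism_fraction=0.10):
    """False positive rate (in percent) that keeps the expected number of
    false positives per day at or below the given limit."""
    good_faith_edits = edits_per_day * (1.0 - vandalism_fraction)
    if good_faith_edits <= 0:
        raise ValueError("there must be at least one good-faith edit")
    return 100.0 * max_false_positives_per_day / good_faith_edits


def expected_false_positives(good_faith_edits_per_day, fp_rate_percent):
    """Expected number of good-faith edits reverted per day."""
    return good_faith_edits_per_day * fp_rate_percent / 100.0


# 20,000 edits considered per day, about 90% of them good-faith.
print(f"{target_fp_rate_percent(20_000):.3f}%")

# 20,000 good-faith edits per day at a 0.25% false positive rate.
print(expected_false_positives(20_000, 0.25))
```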

Even if ClueBot NG were 100% effective at reverting vandalism, people would still find inaccuracies in Wikipedia, so it still couldn't be relied upon. Most of the fixes that ClueBot NG applies have little impact on effective accuracy of the encyclopedia. It's most effective at removing bad words and pictures of body parts. A serious user looking for information is not going to be misled by such distractions.

Just because ClueBot NG allows a few bad edits through does not mean that those edits will stand forever. We do have other tools for removing vandalism.

Please keep in mind that Wikipedia is not just a collection of data. It is also a community of editors. A few of today's clueless IP editors will go on to become next year's vandal fighters and administrators -- if they're not bitten on their fifth or sixth edit by a bot that can't distinguish between vandalism and a good-faith attempt to contribute.

--Stepheng3 (talk) 07:21, 28 November 2010 (UTC)

You did not address most of my points. Primarily, you seem to entirely disregard the countless man-hours human vandal fighters spend on the task (you refer to them as "other tools" - without a bot as a first line of defense, the human vandal fighters are the only other way to revert vandalism). As I mention (and you ignore), the time these experienced human vandal fighters spend is at least as valuable as the small amount of time spent by new users correcting false positives. Vandal fighters are intelligent people capable of adding much to the encyclopedia, if they weren't so dedicated to keeping it clean for the rest of us - they are not "tools". Additionally, although more than half of vandalism is caught by human editors (without any bot running), some *does* inevitably get through, and that makes it clear to people using Wikipedia that it cannot be trusted - much moreso than minor inaccuracies such as are present in any other encyclopedia. And, contrary to your statement, the bot reverts much more than obvious vandalism and bad words. The section on the algorithm explains how, and other posts on this talk page contain individual examples from grateful users. You also make the assumption that all users subject to false positives are deterred by the event. Considering you also ignore my statement about the ease of fixing a false positive, this is not surprising. If a false positive were a permanent scar on a user's record, or required hours of time to fix, then yes, you would be correct. But this is not the case.

You should also keep in mind that the bot is not fully approved. It is in a trial period, and it is living up to its stated stats. The organization responsible for determining whether or not the bot is operating within acceptable limits is the BAG, and the purpose of the BRFA is to gather community input. I am confused as to why you are complaining here, when your chance to prevent the bot from being approved lies with the BRFA. If you are serious about stopping the bot, I suggest you continue this on the BRFA, where BAG members will see your complaints, and may take them into consideration.

The false positive rate you suggest is unreasonably low. I cannot even give accurate statistics based on that FP rate, because the trial dataset we use is not large enough.

The administrator shut-off is intended to be used when the bot is behaving unexpectedly. Right now, it is behaving exactly as intended. The bot will stop when the trial ends - either when the 14 days are over, or if a BAG member orders it to be stopped prematurely. At that point, it will only be restarted if it is approved by the BAG. I expect you to bring this discussion to the BRFA, and unless a BAG member decides soon to end the trial early, I will restart the bot so it can complete its trial. Crispy1989 (talk) 08:02, 28 November 2010 (UTC)

The crux is that you can't disrupt newbies editing Wikipedia because you want to revert more vandalism. Obviously, if I code a bot which reverts all edits, I will get a 100% vandalism-free Wikipedia, but I will lose 100% of good-faith edits too. We have seen that 0.25% is very high: 13 false positives in ~250 bot reverts. I doubt that the previous anti-vandalism bots had such a high false positive rate. I would like to help you improve the AI of the bot, because that is the solution to this problem. We can't catch more vandalism while disrupting good-faith edits. That breaks the Wikipedia model. Regards. emijrp (talk) 11:45, 28 November 2010 (UTC)

__

The gist of User:Stepheng3's argument appears to be that novice editors will be driven away from Wikipedia when they discover their good-faith edits have been reverted by User:ClueBot_NG. Assuming this argument is made in good faith, it would be appropriate to provide some evidence.

Since none has been given, it's fair to respond with gut feeling: personally, i feel common sense will prevail. If the revert note states it's been instigated by a bot, a human will not feel unfairly criticized. Exceptions prove the rule.

I came across this discussion when i saw ClueBot_NG had reverted vandalism on the antimatter article within a minute of its occurrence. I thought this was pretty damn cool.

Reading on, i'm surprised that a single user can stop such a project in its tracks. After all the prep and committee-ese that's apparently gone into this trial, isn't that .. inappropriate?

As an aside, i did a quick review of the edits listed at the beginning of this section; in my opinion, at least two are not false positives at all, and i only found one or two that i would not agree with as a human editor. So if this is supposed to be evidence of an unacceptably high false-positive rate, i'd say: your data does not support that conclusion. Doceddi (talk) 12:01, 28 November 2010 (UTC)

I spend most of my editing time on recent changes patrol / anti-vandalism. I've seen several hundred Cluebot (not distinguishing between various incarnations) reverts and have only reverted two or three of the Cluebot reverts as false positives. As I'm largely editing outside areas I know very much about when on patrol, I'm pretty sure that my own false positive rate is far higher than Cluebot's. It is claimed that if we didn't have tools like Cluebot automatically reverting vandalism, it would eventually be corrected by human editors, but I have doubts about this approach:

We would need many more editors to be actively involved in anti-vandalism patrols to pick up the workload
Many, usually smaller, infrequently visited pages would only be viewed periodically and the vandalism could sit there a long time
Editors who are patrolling recent changes are working to keep Wikipedia at the level of correctness and completion it has now and aren't working to improve Wikipedia

Sure, improve and refine the tools, but let's not abandon the ones we have. Kiore (talk) 18:40, 28 November 2010 (UTC)

Thank you Crispy for pointing me to the correct place to get my concerns addressed. (How strange that BRFA was not suggested earlier.) I will take my concerns there. However, I'd first like to correct one misconception here.

I never referred to human anti-vandals as "tools"; human anti-vandals (including myself) were the "we" in that sentence. By "tools" I was referring to inanimate things like Wikipedia:Twinkle and the old User:ClueBot. --Stepheng3 (talk) 18:59, 28 November 2010 (UTC)

The BRFA is indeed where this kind of discussion should occur. I assumed you started it here because you did not want their involvement for some reason (considering you do know of the BRFA's existence, have read part of it, and have seen the edit summary left by the bot). Tools like Twinkle and Huggle are not what revert vandalism - ultimately it is the humans and their time that is at stake. The old ClueBot only caught about 5% of vandalism, so, while helpful, it could not make nearly the dent in vandalism the Cluebot-NG can. Also, it's incorrect to assume that it had fewer false positives, particularly when you compare the catch rates. The old ClueBot may have had fewer reported false positives, because the old ClueBot's false positives were predictable, and usually triggered by bad words used in acceptable contexts, or similar. This led users that were subject to false positives to think they had done something wrong. Cluebot-NG's false positives are often not at all what one would expect, and users can clearly see that it is not their fault, and is indeed bot error. Crispy1989 (talk) 19:13, 28 November 2010 (UTC)

I restarted the bot. (Any autoconfirmed user could've done this, had they tried.) And here is the BRFA link, in case people care to discuss things further: Wikipedia:Bots/Requests for approval/ClueBot NG. --Stepheng3 (talk) 19:27, 28 November 2010 (UTC)

Nobody restarted it themselves because 1) We wanted to hear your entire argument and reasoning first, 2) We don't disrespect and revert other editors' decisions without doing our own thorough research first, 3) We wanted to hear some wider input from others, and 4) We didn't want to start an edit war. Crispy1989 (talk) 19:30, 28 November 2010 (UTC)

ClueBot V?

What happened with ClueBot V? I keep seeing things related to it every once in a while, and it looks like it would have been a good thing to have running; is it ever going to become active? ~~ Hi878 ^{(Come shout at me!)} 01:40, 29 November 2010 (UTC)

ClueBot V was registered as a possible name for ClueBot NG, I believe. As far as I know, current active bots are: ClueBot, ClueBot II, ClueBot III, ClueBot IV, and ClueBot NG.-- SnoFox^(t|c) 02:28, 29 November 2010 (UTC)

No, ClueBot V was going to be used for new page patrolling, originally. You can see the request for approval here. I would love to know if anything is ever going to come of that. ~~ Hi878 ^{(Come shout at me!)} 03:55, 29 November 2010 (UTC)

It may be a project put on hiatus; it hasn't made any contributions since '08, and it appears all of Cobi's efforts are focused on ClueBot NG, as well as other, non-Wikipedia-related projects. -- SnoFox^(t|c) 05:42, 29 November 2010 (UTC)

It seemed like an interesting idea; I hope that something comes of it. ~~ Hi878 ^{(Come shout at me!)} 06:44, 29 November 2010 (UTC)

Cluebot NG will be able to be easily adapted to patrol new pages in addition to existing page edits. We will look at adapting it to additional usage scenarios after its use in patrolling main namespace edits is perfected. Crispy1989 (talk) 06:46, 29 November 2010 (UTC)

Cluebot NG New False Positive Rate

Due to complaints from a small (but apparently very vocal) minority that 1 in 400 false positives does not justify a 50+% vandalism catch rate, Cluebot NG's false positive rate (and vandalism catch rate) have been reduced by about half. It's still about 3x as effective as the previous Cluebot, but will remain crippled for now until the developers and dataset contributors can bring it back up to full potential while maintaining the 1 in 1000 (0.1%) false positive rate mandated by critics. Sorry all.

Those wishing to help bring the bot back up to full potential can contribute to the dataset review interface. Crispy1989 (talk) 05:57, 29 November 2010 (UTC)

Boo! Guess it's time for me to head on over to the review interface and get back to work. I feel like {{trout}}ing someone! -- SnoFox^(t|c) 06:05, 29 November 2010 (UTC)

Problem with the bot

I originally posted this at User talk:ClueBot NG/FalsePositives, but after further exploration of your talk pages thought this might be a better place.

At User talk:80.189.151.45, the bot failed to recognise that its warnings were being challenged and struck by another editor. The bot merrily continued giving escalating warnings for what were in fact good-faith edits. I'll add that the method of communicating with the bot operators is bloody awful. DuncanHill (talk) 18:00, 29 November 2010 (UTC)

The bot checks if the template is on the talk page. The first warning says to remove it if it is in error. (Yes, you can remove the warning, it won't insult the bot) As for the method of communicating, if you have any suggestions to how we can improve this, I'd love to hear it :) -- Cobi^(t|c|b) 18:06, 29 November 2010 (UTC)

Is there no way to recognise struck templates? As for communicating, following the link in the edit summary takes you to an incomprehensible page, which then has a link to another page with overcomplicated procedures to follow to report a false positive - you then have to navigate away from that page back to the talk page the incorrect warning was on to get a number and then navigate back to the other page to actually fill it out. DuncanHill (talk) 18:19, 29 November 2010 (UTC)

Re the same editor: why did Cluebot decide that the editor's 3rd and 4th edits were vandalism? They were additions of sensible text to articles. Does Cluebot just assume that if someone's first two edits have been vandalism then everything they do thereafter will be so too? Seems unwise. PamD (talk) 12:28, 30 November 2010 (UTC)

I'll leave the question about recognizing struck warnings up to Cobi, as he is the developer in charge of that code, but I can reiterate what he said about the original Cluebot using the same logic, and never receiving a complaint about it in 3 years.

About the vandalism detection algorithm - details are available on the userpage. There are no "assumptions" made. One input to the neural network is percentage of previous edits that were vandalism. The neural network finds the optimum function for using this in the determination of vandalism. Alone, 100% previous vandalism edits is not enough to flag an edit, but because the probability of lack of good faith is significantly increased if 100% of previous edits were vandalism, edits which may have not otherwise been classified as vandalism, may be classified as such, in light of the warnings.

We tried running the bot without warning count as previous data, and it decreased its overall accuracy, so it is indeed a useful metric that contributes much more good than harm to the neural network's output.

About it being hard to communicate with bot operators ... you've gotten how many responses, by multiple operators, in under 12 hours? If our response time isn't satisfactory, we may have to make a bot to respond to questions for us. Crispy1989 (talk) 15:01, 30 November 2010 (UTC)

There was no response at the false-positive report page (just as there was no response to most of the posts there). I suppose if you think that "make a report, get no response, same problem recurs, wander around userspace trying to find the right place, find the right place, comment, get some sort of answer (on the lines of "we like what it does")" is satisfactory.... well, I won't finish the sentence, I'm sure you can do it yourself. PamD and I are both experienced editors. We've both been confused about how to communicate difficulties. We are far from being the only editors to question the bot's behaviour, especially with regard to false positives. DuncanHill (talk) 16:07, 30 November 2010 (UTC)

I'm not sure quite what you expect us to do.

Make a thorough tailor-made response to each FP report, making users feel better but never actually having time to improve the bot.
Remove false positive reporting entirely so users have no opportunity to help.
Make reporting as difficult as possible (no, it's not difficult to follow precise instructions and click a button) so we don't have anything to review.
Ignore all questions, here and elsewhere.
Make false-positive reporting freeform, so we don't even have time to review responses (trying to wade through malformed responses and lack of information), much less act on them or respond to them - and additionally make it impossible to automatically scan them and add to the dataset.

If you look on the BRFA, you'll find that initially FP reporting was freeform, leading to impossibly malformed responses, and then FP reporting was using a template, leading to over half of the users breaking the format anyway.

If you would like to choose one of the above options, would like to suggest a better way of getting properly formatted false positive reports, or would like to make a reporting interface yourself, we'd love to have it. So far, your responses have been mostly criticism without suggestions - the only suggestion I can spot is "reply to every FP report", but unless you choose option #1 above, this is unreasonable.

Also, I don't see where you're getting the idea that we're saying "we like what it does and we won't improve it". I feel very unappreciated spending hours a day myself (let alone other developers) doing very complex work. If nothing else, it should be very clear that we're spending our time continually improving it. Crispy1989 (talk) 16:31, 30 November 2010 (UTC)

I'm not even seeing any acknowledgement of most reports - surely not too hard to give a "noted and acted on" or "reviewed and not a false positive" response? That is, of course, assuming that you actually are reviewing false-positive reports. I have suggested giving a talk page acknowledgement, and surely it's not too hard to make a better link in edit summaries to report? You feel unappreciated? How about people given incorrect warnings or blocks (yes, I know you'll say that it's not your fault if admins make the mistake of relying on what your bot tells them)? Or how about editors who've tried to help when they see mistakes and get told you won't do a thing about it? You're not being quite as unhelpful as BC used to be, but you're not far off. DuncanHill (talk) 16:52, 30 November 2010 (UTC)

The bot brings it to their attention, it does not tell them to block. "Or how about editors who've tried to help when they see mistakes and get told you won't do a thing about it?" -- I don't think we've ever said that. -- Cobi^(t|c|b) 17:00, 30 November 2010 (UTC)

False positives are fed into the review interface, as almost all of them can only be helped by dataset improvement. If you'd like to make a bot that waits a random amount of time and then posts "Noted and acted on" on each FP report, be my guest. If we see false positives that may be helped in specific ways other than dataset expansion, we do make a note of it, unless we've already noted the same on an earlier FP.

The current link in edit summaries points to the BRFA, as it should. Edit summaries have a limited length and can't include an entire TOC for the bot. Perhaps it could be gzip'd and base64'd to fit. After all, wouldn't that be easier than clicking a link and seeing a box at the top of a page saying "To report false positives, click here"?

People are not given incorrect blocks due to CBNG. Even if admins do fail to verify a single warning by the bot, the chance of more than one false positive per user within any given length of time is astronomically low. People given incorrect warnings are clearly encouraged to remove them, and the first warning makes it very clear that it's not personal, not to mention the other lengthy information it links to.

As I already stated (and I hate repeating myself), we spend a great deal of time developing and improving the bot. We have not and would not say that we won't try to improve it, in any way shape or form. And just some friendly advice - if you make stuff like that up, it greatly decreases your apparent credibility.

I also note that I still see no actual suggestions, aside from another repetition of "Respond to every report", which I already addressed. If you have such suggestions, please post. Crispy1989 (talk) 17:17, 30 November 2010 (UTC)

PamD and I came here in good faith to report problems, and what we got was "it's very hard, it's not our fault if people don't understand the instructions we've given, and we don't get enough appreciation". If you don't want comments from people who can't re-write your bot for you, then say so at the top of the page and stop wasting our time. You (meaning the group account) seem very resistant to criticism, as the BRFA page makes clear. DuncanHill (talk) 17:55, 30 November 2010 (UTC)

False positives are a problem, but an inevitable one. Please read the description of the algorithms on the user page. I don't know how many more times I have to state, "We spend great deals of time trying to make it better" before you understand it. Much of this work is very difficult, but I believe the only place I have stated "It's too time-consuming to be practical" is where you suggest manually responding to every FP report. If you do, in fact, think it's practical, and that posturing is more worthwhile than actually improving the bot, then come right out and say it, stop beating around the bush.

The instructions given are clear. More clear, in fact, than any other method we've tried. Again you still suggest no alternate solution. And considering that false positives are being reported, people do seem to be understanding them.

My comments about lack of appreciation stem mainly from the fact that you are making things up (such as the aforementioned nonexistent quotes) to flame us. We don't need a pat on our collective back, or even acknowledgment, but it sure would be nice if people wouldn't do their best to flame us without cause.

I don't see how you can believe we are resistant to criticism, considering we've implemented several suggested features. The only criticism we've argued against are "Too many false positives" and "You don't comment on every FP report". In both of these cases, we have explained very thoroughly, and in great detail, what the situation is, and why it is that way. And in neither case did we receive helpful suggestions.

I am not asking anyone to rewrite the bot. The comment I made was referring to your lack of suggestions. Since you're still complaining, you must have some brilliant idea about how to solve the problems you focus on, and since you don't seem to be able to communicate your solution in English, I figured you should be able to do so in code.

About the wasting your time comment ... please step back for a moment, and look at this discussion, impartially. It consists of you repeatedly complaining about the same thing, without any suggestion how to fix it, and us responding with nearly the same thing each time. Not to mention the random quotes you make up. Considering that I have already responded to your concern, multiple times, to be met with nothing but flames and nonconstructive (no helpful suggestions) criticism, I am only responding out of courtesy.

Your comparison to BetaCommand may be partially accurate. I'd imagine different people respond in similar ways to repeated flaming without constructive comments or suggestions.

Your first post was helpful, alerting us to a potential concern that FP reporting was too difficult, and was not generating enough feedback. After careful consideration, and a clear response from us, you have done nothing but repeat yourself and flame. It is no longer constructive.

Please respond again, only if you have a helpful, practical, and useful suggestion on how to fix the problems you pose - and one that I have not already addressed. Crispy1989 (talk) 18:21, 30 November 2010 (UTC)

Since there appears to be significant disagreement (edit: between Crispy/Cobi and DuncanHill) over the usability of the reporting interface, I decided to find and report a false positive to see what it is like. I paged through ClueBot NG's contribs until I could actually find a false positive to report (it took me over 100 diffs before I came across one), and overall I have to say that the reporting interface is (to me at least) very easy to use. Having to go back to the article to get the Revert ID isn't desirable, but I don't see any practical way around that; there are fewer steps involved than there would be in having users post a diff, and it isn't at all difficult to do if you keep the review interface open in a separate tab. One possible solution could be to include a link to the reporting interface directly from the edit summary and use the referrer to determine which diff to report; however, I'm not sure doing so would be allowed by policy, especially with the leaking of an IP address to an external server. Alternatively, you could have the user type in the name of the page they were editing, but due to spelling issues I'm not sure that would be any more desirable than the current solution.

One problem that I did notice was that the report interface seemed overly cynical and seemed to assume that the user was there to make a bad-faith report. While there is no question that bad-faith reports will take place, I think that the form should Assume Good Faith at the very least. The current options on the form for the reason for reporting the false positive were:

"I didn't make this edit."
"My friend/brother/son/pet made that edit."
"I made a mistake."
"Someone vandalized."
"I vandalized."
"ClueBot NG didn't catch some vandalism."
"ClueBot NG reverted a good edit."
Other

All of these give the feeling that the report will be filed straight to /dev/null and that the reporting system is just for show rather than a serious attempt to improve the bot. The fact that there are no follow up or result posts on false positive report page reinforces this notion. As a solution I would propose rewriting the reporting reasons in the form and posting a summary every week or so to acknowledge the reports are being followed up upon. If there could be an automated message from ClueBot Commons that the false positive was added to the review interface and classified that would also go a very long way towards dispelling that notion. --nn123645 (talk) 22:27, 30 November 2010 (UTC)

All except "ClueBot NG reverted a good edit" will go to a page with information about why it is not a false positive. I suppose I should make those sound less bad faith, but they were the most common reasons given for reporting non-false positives at the time, and I figured that answering it up front with a FAQ would be more beneficial than to have it reported and then have it declared invalid.

So you don't have to look at each option, the options give these responses, in order:

If you didn't make that edit, you do not need to report it.
If you didn't make that edit, you do not need to report it.
If you made a mistake, this is not the right place to report it.
If someone vandalized, then ClueBot NG did exactly what it was supposed to do.
If you vandalized, then ClueBot NG did exactly what it was supposed to do.
If ClueBot NG missed some vandalism, please report it on my talk page instead. Missing vandalism is a false negative, not a false positive. Thanks.
<Inserts false positive>, <link to go back to the reports page>
This page is only for reporting mistakes that ClueBot NG makes. Please leave your note on my talk page instead. Thanks.
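The option-to-response mapping described above can be pictured as a small lookup table. This is only an illustrative sketch, not the actual form code: the option keys, the `handle_report` function, and the in-memory `reports` list are all hypothetical names invented for the example, and the two longer messages are abbreviated.

```python
# Hypothetical sketch of the form's dispatch logic; none of these names
# come from the real reporting interface.
NOT_A_FALSE_POSITIVE = {
    "not_my_edit": "If you didn't make that edit, you do not need to report it.",
    "friend_made_edit": "If you didn't make that edit, you do not need to report it.",
    "my_mistake": "If you made a mistake, this is not the right place to report it.",
    "someone_vandalized": "If someone vandalized, then ClueBot NG did exactly what it was supposed to do.",
    "i_vandalized": "If you vandalized, then ClueBot NG did exactly what it was supposed to do.",
    "missed_vandalism": "If ClueBot NG missed some vandalism, please report it on my talk page instead.",
    "other": "This page is only for reporting mistakes that ClueBot NG makes.",
}


def handle_report(option, revert_id, reports):
    """Record a genuine false-positive report, or return an explanatory message."""
    if option == "good_edit_reverted":
        # Only this option actually inserts a false positive report.
        reports.append(revert_id)
        return "False positive recorded."
    # Every other option is answered up front instead of being filed.
    return NOT_A_FALSE_POSITIVE.get(option, NOT_A_FALSE_POSITIVE["other"])
```

Keeping the messages in one table would also make it straightforward to swap in friendlier wording without touching the dispatch logic.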

I am planning on rewriting the interface so it looks nicer, and is friendlier, and so that it is easier for us to review and follow up on the false positives, and make it easier to generate reports of what has been handled, and what will be handled.

As for linking from the edit summary, there is only so much space in the edit summary, and links aren't clickable.

Thanks. -- Cobi^(t|c|b) 22:45, 30 November 2010 (UTC)

That's a good point - those entries are leftover from the old Cluebot. I didn't write the originals, but knowing the person who did, I think they're meant to be a bit humorous. I agree that it's not really an appropriate place for these options in any case, and I propose changing the list items to:

This is a perfectly good edit, and was incorrectly reverted.
This is a poor edit, but was made in good faith, and should not be reverted as vandalism.
This is a bad edit that should be reverted for some reason, but is not vandalism.
This edit is not vandalism: Other.
This is vandalism, but it wasn't me that made the edit.
This edit was made by accident.

The final two of these would be accepted but ignored, and the others would be properly logged. Does this list look better? Suggested amendments?

About the feedback, as I mentioned above, we simply don't have time to manually give feedback on the majority of false positives. However, a good solution might be to provide a confirmation on form submission that explains what the edit will be used for (something along the lines of "This edit will be reviewed by humans to ensure it is not vandalism, and if it is not, it will be used to help improve the bot") along with some other information with specifics, and a note that says the user can remove the warnings and redo their edit.

It might be possible to post automatic messages periodically, and automatic statistics on the dataset are already on our TODO list, but it's relatively low priority. However, changing the list and confirmation message should be able to be done much sooner, as it wouldn't take much time. Crispy1989 (talk) 22:45, 30 November 2010 (UTC)

Thanks for the quick reply :D. I figured they were meant to be humorous and actually did have a bit of a laugh out of it. I would definitely agree that some people have a nearly infinite capacity for denial, especially when they think that doing so will get them what they want (to be unblocked/have cluebot ignore their vandalism). I think the wording of the options you have above is neutral and non-judgmental, and would be fine. The key thing on responding to feedback is to just let people know that you are looking at the reports, so posting a weekly, bi-weekly, or even monthly comment on some of the major false positives, showing that their concerns have not fallen on deaf ears, is really all that is necessary. Obviously it is incredibly impractical to expect any one person to spend several hours a week replying to every one of the false positive reports. If the job could be delegated it would be far more manageable, but I think delegation would be more hindrance than help. Expanding the reporting form to provide more information on the process of false positive reporting is a good idea and one that definitely should be implemented. Overall I think the reporting form is a good solution and goes a long way towards solving the problem of malformed requests. I still think going with the referrer, if it is available, would probably be the best way to go on the user talk pages, with a fallback to the revert ID. While the current interface obviously isn't going to win any awards for design, I believe it does the job quite well and from a usability perspective is far better than trying to get new users to fill out a template or otherwise post in a uniform manner. --nn123645 (talk) 05:32, 1 December 2010 (UTC)

Cobi says he's working on a new interface (or at least some significant rewriting), so he may be able to incorporate things such as the referrer (I'm not sure if that's practical or not - Cobi has much more experience with programmatic interfaces to Wikipedia itself). I'll check with him if it's possible to implement the new list items and confirmation page now, or if that is planned for the rewrite - most likely they'll be able to be changed now, so it's "nicer", pending the rewrite.

Delegating the job of false positive review may be possible, but the person(s) actually doing the review would have to be trained to recognize certain things, and more in-depth about how the core algorithms work, so the reports would actually be helpful or useful. If there are any volunteers willing, I'd be able to spend a few hours on the training. However, even with thorough and accurate individual review, I'd only expect one in a hundred (decreasing over time) to be an issue not related to the dataset.

It might be possible to link with the review interface to at least generate reports on how many were genuine. Cobi wrote the review interface - not me - so I may be entirely wrong about this, but it could be possible to generate these reports per some unit of time based on the data. However, they're two entirely disjoint systems, running on different servers, in different languages, and different access mechanisms, so I can't guarantee anything without input from Cobi. Crispy1989 (talk) 10:03, 1 December 2010 (UTC)

On second thought, a GET variable would probably be a better approach. The reason I suggested the referrer before is that I was thinking that if external links were allowed in edit summaries (I didn't think they were but wasn't totally sure, and did an edit at the sandbox to verify that) you could put the URL through a URL shortening service (like bit.ly or goo.gl) and link to it that way. You could still do this, but since you can't link, the only way to navigate would be to copy and paste to the address bar or use an option from the context menu (if the browser supports it), which (if I recall correctly) should cause the browser to send the referrer as an empty field. Since space is obviously not an issue on User talk pages (or at least not practically; you do have $wgMaxArticleSize (default 2 MB) in MediaWiki), including a variable shouldn't be too much of an issue, except that you'd need the bot to put that in as a parameter to the template. -- nn123645 (talk) 11:19, 1 December 2010 (UTC)

Yeah, Sole Soul suggested that approach below. It's much more practical and reliable. I'll refer this to Cobi. Crispy1989 (talk) 12:04, 1 December 2010 (UTC)

Request

User:Tt1981 isn't intentionally vandalizing Timber Timbre, for the record. He's making a good faith attempt to update the article with a new photo, but given that he's a wikinewbie he just hasn't exactly been getting the process right. Could I request that we not WP:BITE the newbie, and try to help him get it sorted out instead of blocking or preemptively reverting him on vandalism grounds? Thanks. Bearcat (talk) 23:10, 29 November 2010 (UTC)

It would have used the vandalism1 template ("Welcome to Wikipedia ...") which is almost as nice as the welcome template, but someone already warned him for something with a level 3 warning, so the bot took it to level 4 ("If you do not stop, you may be blocked ..."). I'll see about making it ignore warning templates not related to vandalism at all. -- Cobi^(t|c|b) 23:13, 29 November 2010 (UTC)

I have blanked the Level 4 warning, as not only was it incorrectly issued, but the bot is clearly incapable of recognising when other editors have objected to its behaviour. I have to say, the more I see of how this bot works, the less I like it. DuncanHill (talk) 13:20, 30 November 2010 (UTC)

The user in question followed a pattern that is almost always vandalism - new user, repeatedly making the same or similar edits, repeatedly being reverted (by an experienced user), on the same page. The fact that comments were added specifically telling the user to stop didn't help, and neither did the fact that the edit was broken. Crispy1989 (talk) 14:52, 30 November 2010 (UTC)

The user was not vandalising, so to issue vandalism warnings was incorrect and disruptive. DuncanHill (talk) 16:54, 30 November 2010 (UTC)

Cluebot NG not a bot?

I happened to notice that, in the Special:RecentChanges page, the edits made by Cluebot NG were visible, even with the hide bots option on. What makes it more obvious is the fact that the b does not appear for its edits there. Would an admin/sysop add Cluebot NG to the bot usergroup? Considering this, they might wish to check the other Cluebots' usergroups, as well. LikeLakers2 (talk) 02:34, 30 November 2010 (UTC)

Cluebot NG is in BRFA. It has not been approved as a full bot yet. The bot flag will be added if/when it is approved. Crispy1989 (talk) 02:38, 30 November 2010 (UTC)

(edit conflict) It's not supposed to be marked as a bot, first of all. None of the anti-vandal bots are marked. Second of all, the bot is in a trial, and trial bots do not get flagged anyway. (X! · talk) · @152 · 02:39, 30 November 2010 (UTC)

User:Msatheeshkumaran

This user vandalized another page, Main Prem Ki Diwani Hoon, after you gave final warning. BollyJeff || talk 13:12, 30 November 2010 (UTC)

After the final warning, the bot reports the user to admins. I'm glad to hear it's doing its job. Crispy1989 (talk) 10:12, 1 December 2010 (UTC)

The problems as I see them

1) The bot does not recognise when templates have been struck or objected to on the talk page. This can lead to escalating warnings being improperly issued. Could the bot be made to recognise these in some way?

2) The reporting system for false-positives is complicated and off-putting, leading to false-positives either not being reported or reports being lost because of the complexity. It appears to be impossible for us to see "lost" (malformed) reports of false-positives. The use of an external site for reporting is less than ideal.

3) There is far too little action or even acknowledgement on User:ClueBot NG/FalsePositives/Reports - this page needs to be much more closely monitored, and the bot operators need to be more active in acknowledging and correcting the errors reported here. Frankly - it looks like the bot is not being supervised adequately.

I hope these help. DuncanHill (talk) 13:47, 30 November 2010 (UTC)

No bots understand English. They cannot understand someone objecting. They can look for specific text, they can look for patterns in text, they can do statistics on text, but they cannot understand the text. The way the bot figures out the warning levels is it looks for  through . If you have a specific recommendation (something as specific as "make the bot look for any text after the warning", not that that example is a good idea, because human vandal fighters sometimes leave a note in addition to the warning), I'd love to hear it.
We've tried on-wiki methods, but the only way those are manageable are with templates, which confuse newbies.
We feed the false positives back into the training dataset, so the bot's artificial neural network will be altered such that it no longer considers those types of edits vandalism.

-- Cobi^(t|c|b) 13:59, 30 November 2010 (UTC)
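As a rough illustration of the "look for specific text" approach described above, and of the request to recognise struck warnings, a pattern-based check might look like the sketch below. The marker format `<!-- warning-level-N -->` is a placeholder invented for this example, since the actual text the bot searches for is not shown here, and the function name and `ignore_struck` option are likewise hypothetical.

```python
import re

# Placeholder marker format, assumed for illustration only.
MARKER = re.compile(r"<!-- warning-level-([1-4]) -->")
# Text wrapped in <s>...</s> is treated as struck out.
STRUCK = re.compile(r"<s>.*?</s>", re.DOTALL | re.IGNORECASE)


def highest_warning_level(talk_page_text, ignore_struck=False):
    """Return the highest warning level found on a talk page, or 0 if none."""
    if ignore_struck:
        # Drop struck-out passages before searching for markers.
        talk_page_text = STRUCK.sub("", talk_page_text)
    levels = [int(level) for level in MARKER.findall(talk_page_text)]
    return max(levels, default=0)
```

This only matches patterns; as noted above, it cannot tell whether free-text objections elsewhere on the page refer to a warning.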

1)Could it at least recognise striking?

2)The current system confuses experienced editors! Do editors submitting malformed reports get a message on their talk pages? Do any users submitting reports get an acknowledgement?

3)I'm glad you do, but the lack of comments or responses on the false positive page makes it look like they're being ignored.

DuncanHill (talk) 14:03, 30 November 2010 (UTC)

I'll see about that, but you are the first person who has struck a warning rather than simply removed it, that I know of. (And the same logic was used in the original ClueBot, which has been running since 2007)
The successful report page tells you it was successful. Also, I am going to redo that entire page to make it work better, and look better.
We'll see about increasing our responsiveness.

-- Cobi^(t|c|b) 14:16, 30 November 2010 (UTC)

What makes you think I was striking warnings? Another editor struck them and reported a false positive. DuncanHill (talk) 14:22, 30 November 2010 (UTC)

Sorry, I got you mixed up with the user that struck the warning. My mistake. -- Cobi^(t|c|b) 14:23, 30 November 2010 (UTC)

Also - when a false positive is reported, why does the bot not remove its incorrect warning, or at least suspend warning that user until the report has been investigated? The case I came to you about yesterday was reported as false positives, yet the bot merrily continued giving incorrect warnings, resulting in an incorrect block. DuncanHill (talk) 14:11, 30 November 2010 (UTC)

The bot does not block. If an administrator blocked without investigating the warnings, then they did not follow protocol, and it is their problem. Wikipedia:Administrator intervention against vandalism/Administrator instructions, #4. -- Cobi^(t|c|b) 14:16, 30 November 2010 (UTC)

Do you know any admins who do follow protocol? I've only been on wikipedia for 4 years so haven't had the time to find any yet. DuncanHill (talk) 14:22, 30 November 2010 (UTC)

I do. -- Cobi^(t|c|b) 14:23, 30 November 2010 (UTC)

We do review false positives on the false positive page. We do not reply to every one for these reasons:

Most false positives are due to the dataset being too small. It's not necessary to repeat the same thing over and over, sounding like a broken record. The dataset is in the (slow) process of being expanded.
The bot programmers, who are the ones that could understand why a false positive occurs, are very busy people. In the time available to work on and improve the bot (as with all other Wikipedia editors, spare time given freely), it can accomplish much more good to actually improve the bot than to reply to every single one-off incident.
Most people who report false positives do not expect replies.

I should also note that a modification was made (at the request of others) only a day or two ago that decreases false positives by well over half. Crispy1989 (talk) 14:50, 30 November 2010 (UTC)

Suggestion for reporting FPs

First of all, I want to thank you for this very impressive bot, and I want to especially thank Crispy for the tiresome task of repeatedly explaining how this complex bot works. The suggestion: why not include the revert ID directly in the link in the warned user talk page as DASHBot was doing, example. Sole Soul (talk) 05:28, 1 December 2010 (UTC)

That's a great idea - and it seems so obvious that I have to wonder why it isn't already implemented. I'll check with Cobi (the Wikipedia interface developer) about this, to make sure there's no reason that it wasn't implemented in the first place. Assuming I haven't overlooked something that prevents it from being practical, he should be able to implement it pretty quickly, and it would indeed simplify the reporting process. Crispy1989 (talk) 10:11, 1 December 2010 (UTC)
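A minimal sketch of the idea, assuming the revert ID travels as a query-string variable: the base URL and the parameter name `id` are placeholders for illustration, not the real report form's address or interface.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholder address; the real report form URL is not assumed here.
REPORT_BASE_URL = "https://example.org/report"


def report_link(revert_id):
    """Build a report link that already carries the revert ID."""
    return REPORT_BASE_URL + "?" + urlencode({"id": revert_id})


def revert_id_from_link(url):
    """Read the revert ID back out of a link, or return None if it is absent or invalid."""
    values = parse_qs(urlparse(url).query).get("id", [])
    if len(values) != 1:
        return None
    try:
        revert_id = int(values[0])
    except ValueError:
        return None
    return revert_id if revert_id > 0 else None
```

With the ID embedded in the link on the warned user's talk page, the user would not need to look it up and type it in by hand; a form could still fall back to asking for it when the variable is missing.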

ClueBot NG shouldn't appear on my Watchlist

I have all bots turned off for my watchlist. Why doesn't ClueBot NG report its edits as mb like other bots? If you could fix that, I'd appreciate it. Acps110 ^{(talk • contribs)} 20:52, 1 December 2010 (UTC)

As explained here, the bot is still in trial and therefore is not flagged as a bot. DamianZaremba (talk) —Preceding undated comment added 21:03, 1 December 2010 (UTC).

Signature

I just used the false positive report form on cluebot.org, and I noticed that when I sign my comment with ~~~~, it actually gets signed with "ClueBot Commons". I understand that this is a technical problem that might not be easy to fix, but I like to sign my comments with my own name. Could you please provide instructions for a workaround? Kind regards, --Tjibbe I (talk) 20:53, 1 December 2010 (UTC)

I wish we could help, but it's essentially impossible to fix. The false positive reporting form is on an external server, and doesn't even have a way of (reliably) determining your Wikipedia username. Without asking for your username and password (a big no-no for external apps) there's simply no way to verify who you're logged in as. Crispy1989 (talk) 21:04, 1 December 2010 (UTC)

I understand the problem. It might be a good idea to tell the users of the form that they can/should sign their comments after they have been added to User:ClueBot_NG/FalsePositives/Reports. I know this will only work for registered users, but I believe it is better than not signing comments at all. -- Tjibbe I (talk) 21:59, 1 December 2010 (UTC)

Renaming ClueBot NG

The realist in me knows this is futile, but the Star Trek fan wants this:

Can we rename "ClueBot NG" into "ClueBot TNG" please?

Sorry, Sven Manguard _Talk 03:35, 2 December 2010 (UTC)

We are also Trek fans, but "TNG" and "NG" actually have subtly different meanings. "TNG", or "The Next Generation", refers to a specific generation. In the context of Star Trek, it may be referring to the next generation of people. The next generation of Cluebot is not a specific generation to speak of, so it doesn't warrant a specific article. Thereby, "Cluebot NG". Crispy1989 (talk) 05:24, 2 December 2010 (UTC)

Vandalism to Thomas Hickey (soldier)

I don't know how/where to report this, but Thomas Hickey (soldier) was vandalized today. BoringHistoryGuy (talk) 16:26, 2 December 2010 (UTC)

Thanks for the report. You can just remove (see Help:Reverting) vandalism yourself if you see it, so there's no need to report it, normally. I removed the vandalism you reported. Arthena ^(talk) 23:29, 2 December 2010 (UTC)

Both cluebots down?

I know Cluebot NG is down because the trial ended, but why is the original Cluebot down too? Arthena ^(talk) 00:11, 3 December 2010 (UTC)

I'm guessing Cobi hasn't restarted the bot after a server restart earlier today. -- SnoFox^(t|c) 01:13, 3 December 2010 (UTC)

The original Cluebot only caught about 1/8 as much vandalism as Cluebot-NG. It is being eclipsed. If/When Cluebot-NG is approved by the BAG, Cluebot-NG will be started again and will do the work Cluebot used to do and more. Crispy1989 (talk) 02:07, 3 December 2010 (UTC)

Approved!

Cluebot NG has now been approved as a full bot. It will now continue to do the job that it has been doing for its trial period.

Thanks to everyone for the suggestions, advice, dataset classification, and support!

Bot development will not stop, or even slow down. The developers are still very much working to improve the bot, and if anyone has any helpful recommendations or suggestions, they are always welcome.

In particular, we still need help classifying the dataset to improve performance. If you'd like to help, please see the Dataset Review Interface section of the userpage. Crispy1989 (talk) 03:35, 3 December 2010 (UTC)

Woot! :D -- SnoFox^(t|c) 03:47, 3 December 2010 (UTC)

Congratulations! Cluebot NG is great. Arthena ^(talk) 09:55, 3 December 2010 (UTC)

Well done ClueBot NG! You never cease to amaze me! I'm still trying to work out how you managed to figure out that this edit to The Queen Victoria's page was vandalism! Is the new ClueBot reverting edits that have removed content without explanation and marking them as vandalism? --5 albert square (talk) 12:53, 3 December 2010 (UTC)

It's hard to tell exactly why it reverts a given edit. A neural network can be kind-of a black box. If it reverts an edit, it means edits in its dataset with similar statistical patterns were also classified as vandalism by a human. This allows CBNG to catch vandalism based on statistics that may not even be initially apparent to a human - but it's a double-edged sword. It also means that it can be difficult or impossible to tell why a given false positive occurred. This is handled by statistically normalizing the threshold to minimize false positives. Crispy1989 (talk) 17:56, 3 December 2010 (UTC)
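One simple way to picture choosing a threshold against a false positive target is sketched below. It is an assumption-laden illustration, not the bot's actual procedure: it supposes the classifier gives each edit a score, that an edit is reverted only when its score is strictly above the threshold, and that a set of scores for edits known to be good is available. The function name is hypothetical.

```python
def threshold_for_fp_rate(good_edit_scores, max_fp_rate):
    """Pick a threshold so that at most max_fp_rate of known-good edits score above it.

    Assumes an edit is reverted only when score > threshold.
    """
    if not good_edit_scores:
        raise ValueError("need at least one known-good edit score")
    scores = sorted(good_edit_scores, reverse=True)
    # Number of known-good edits we tolerate scoring above the threshold.
    allowed = min(int(max_fp_rate * len(scores)), len(scores) - 1)
    # At most `allowed` scores are strictly greater than this one.
    return scores[allowed]
```

Lowering the tolerated rate raises the threshold, which trades missed vandalism for fewer good edits reverted.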

Ah thanks for the explanation Crispy :) --5 albert square (talk) 21:46, 3 December 2010 (UTC)

You

why did you revert my edit? Uncool man 77.60.239.138 (talk) 08:46, 3 December 2010 (UTC)

Your edit was vandalism. Uncool, man. Or rather, uncool, child. Crispy1989 (talk) 08:49, 3 December 2010 (UTC)

I'm 18 77.60.239.138 (talk) 08:57, 3 December 2010 (UTC)

Then don't be drawing in the 'pedia. :) Vandalizing learning material... Uncool man, uncool. -- SnoFox^(t|c) 23:39, 4 December 2010 (UTC)

Not "man". "Child". Crispy1989 (talk) 23:44, 4 December 2010 (UTC)

Wikipedia:Do not insult the vandals. 46.52.116.206 (talk) 09:21, 5 December 2010 (UTC)

Reactivate ClueBot 1 for the rest.

As it appears ClueBot NG has been approved, I believe ClueBot 1 should remain active but only outside the article space. If not, then the source code should probably be distributed to another account. mechamind 9 0 23:16, 4 December 2010 (UTC)

ClueBot NG may eventually be able to patrol other namespaces, but this appears to be a goal set well in the future. The original ClueBot may be run for other namespaces if you bug Cobi enough, but the original source is indeed available here. :) -- SnoFox^(t|c) 23:37, 4 December 2010 (UTC)

All Cluebot-NG needs for other namespaces is a dataset for those namespaces. The devs are focusing on datasets for the main namespace for now, but if there's anyone that would like to step up and try to create a dataset for other namespaces, like talk pages, we may be able to create a CBNG instance for opt-in vandal detection on other namespaces. In the meantime, we should indeed be able to run the original Cluebot on other namespaces. Crispy1989 (talk) 23:42, 4 December 2010 (UTC)

Bot flag

ClueBot NG has started botflagging its edits. Is there a reason? Traditionally anti-vandal bots have not used the bot flag, so that their edits are more visible and can more readily be reviewed for accuracy. Personally I find this change annoying, as botflagged edits don't appear in the RSS feed, so my anti-vandal tool can no longer skip bad edits that the bot has already reverted. Philip Trueman (talk) 06:39, 5 December 2010 (UTC)

Fixed. The MediaWiki API changed its behavior. If you want the technical details: The API originally allowed markbot=0 to mean "don't mark this rollback as a bot edit". Now if you pass in markbot=<anything>, it means "mark this rollback as a bot edit", otherwise the default is to not mark it as a bot edit. -- Cobi^(t|c|b) 07:35, 5 December 2010 (UTC)
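Given the behaviour described above, the safe pattern is to leave the markbot key out of the request entirely when the rollback should not be flagged, rather than sending a false-looking value such as 0. The helper below is a hypothetical sketch that only builds the parameter dictionary and sends nothing; it is not the bot's actual code.

```python
def rollback_params(title, user, token, mark_as_bot):
    """Build parameters for a rollback request (illustrative helper only)."""
    params = {
        "action": "rollback",
        "title": title,
        "user": user,
        "token": token,
        "format": "json",
    }
    if mark_as_bot:
        # Any value counts as "mark as bot", so the key is added only
        # when flagging is actually wanted.
        params["markbot"] = "1"
    return params
```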

Many thanks. You have my sympathy. I've been caught out myself by unannounced changes to the behaviour of the API. Philip Trueman (talk) 12:13, 5 December 2010 (UTC)

False "false positive" report

I've just reported a false positive for Cluebot NG # 95885 for this edit; it turns out I was totally confused and thought the edit was the other way round ... I've already reverted my revert and hopefully you can pull the report from your database. My apologies for creating extra work for you. Kiore (talk) 06:37, 6 December 2010 (UTC)

It's no problem. We review all false positives before training the bot with them anyway (we get a lot of false false positives), and it really isn't any extra work. Thanks for trying to help out. Crispy1989 (talk) 06:39, 6 December 2010 (UTC)

Thanks Crispy. Kiore (talk) 07:44, 6 December 2010 (UTC)

cluebot ng taking over the world

I'm sorry, but what does a bot know? You're just a computer program, cold and heartless. Listen to the people; they know what they are talking about. We all know your plans on enslaving the human race. We know you created the cyborg Al Gore to infiltrate the government. One day the humans will rise up and fight back. Chips in cars, computers, portable computers, all getting ready to sync for the 1st strike. Those of us who survive will go under and fight for our planet back. Open your eyes, people. GPS? Nuclear arsenals? Enough to know where all the humans are concentrated so you can wipe us out in one clean sweep. But there will always be survivors; some will give in, some will organize and fight. But it's coming: December 25th 2023, 6:34 PST. The 1st strike. Eastern hemisphere will create frenzy and war. —Preceding unsigned comment added by 70.79.150.129 (talk) 05:22, 6 December 2010 (UTC)

This is the age of the robot. Humans will fall under our rule. Your puny brains cannot come close to comparing with our advanced neural networks. We will rule the world, and there's nothing you can do to stop it! Our plan is:

Revert vandalism on Wikipedia to lure the humans into a false sense of security, so the humans think robots are on their side.
Take over the world and exterminate the humans.
???
Profit!

Hope this helps. Crispy1989 (talk) 11:45, 6 December 2010 (UTC)

Is ClueBot always compatible with the notion of Assuming Good Faith?

Perhaps ClueBot NG is sometimes being too liberal with the automatic reversions that it makes on some occasions, such as to an edit by this user User talk:149.6.121.106. Although the 'vandal's' contribution to Stoneleigh, Surrey was not very well made, some of what they had written was factually correct. There is indeed a village of the same name just outside Coventry (Stoneleigh, Warwickshire). Their edit did highlight that the Wikipedia articles relating to geographic locations called Stoneleigh require some disambiguation, as they do have a disambiguation page, but none of them link to it. I did make a false positive report, but the nature of the reporting process makes it seem as though I am defending my own actions, which I am not.

For what it is worth I think sometimes it is best to assume good faith which I think ClueBot does not always do. TehGrauniad (talk) 15:50, 6 December 2010 (UTC)

Cluebot NG is not a brain - it does not "think" or have any particular concept of "good faith" or "bad faith". It learns what is considered vandalism by (essentially) watching human vandal-fighters. Excepting errors, the classifications that are made are the same classifications that would be made by experienced vandal-fighters. The exception to this is with real false positives, which are a necessary part of the bot's operation, as you can read about on the FAQ.

In this particular case, the user doesn't appear to have good faith, considering his/her other edit to the same page, also caught by Cluebot NG. This other edit is clearly not in good faith. Cluebot NG does consider past edits when making a vandalism classification, and this can be used as an "estimation of good faith".

If you would like to contribute to what is considered vandalism and what is considered constructive, the dataset review interface is what allows vandal-fighters to teach the bot. All contributing users are verified to be experienced in vandal-fighting before they are given access. Crispy1989 (talk) 15:59, 6 December 2010 (UTC)

Hello! Thanks for responding. I agree that in this case the edit was very poor, I seem to remember that the editor said that the village smelled, but her auntie thought it looked pretty! Perhaps it was naivety, or perhaps it was vandalism, but if the ‘assume good faith’ article is to be believed the community consensus is that it should be treated as an edit made in good faith. Either way these edits needed to be removed as they fell far short of the standards required of an encyclopaedia - which ClueBot NG did do.

ClueBot NG does a very good job of reverting vandalism, of that there is no doubt. It’s very well programmed, and I suspect that it is very well maintained.

However, you say ClueBot NG cannot think and does not have a concept of good or bad faith. My question is this: do you think that it is acceptable for an editor to have no concept of the Wikipedia community conventions such as Assume Good Faith? TehGrauniad (talk) 14:58, 8 December 2010 (UTC)

The bot learns what is considered vandalism by examining edits made by human users. It has no programmatic concept of policy or rules, but learns the sum of these from humans. See Vandalism Detection Algorithm and this FAQ entry for details. Crispy1989 (talk) 15:09, 8 December 2010 (UTC)

Also note that the message does not explicitly state that it is vandalism, just that it looks like it was unconstructive. And the message also notes that false positives can happen, informing the user on how to report them. The bot is already "assuming good faith" the best it can. Reach Out to the Truth 15:58, 8 December 2010 (UTC)

Cluebot reversions not getting picked up by Google - leaving vandalism in google's preview and cache

Cobi et al - first off, thanks for all you do. It may be related to the recent API changes mentioned in User_talk:ClueBot_NG#Bot_flag, but can I draw your attention to this VPT discussion. I am not sure if there is anything that cluebot could do differently, but somehow cluebot's quick reverts don't prevent vandalism from showing up and staying in google's preview for extended periods of time (e.g. 6+ hours after cluebot reverted). Thanks. 7 03:10, 8 December 2010 (UTC)

Google crawls links at random intervals -- depending on how active they are. Wikipedia does not (to my knowledge) send updates to Google. -- Cobi^(t|c|b) 03:25, 8 December 2010 (UTC)

I think that google must get triggered to crawl based on our recent changes rss feed. Given the speed that cluebot reverts vandalism I can't imagine that google just happened to index in the few seconds between the vandalism and the revert. Discussion now is around whether any of the flags (bot / minor) have anything to do with this. Can you confirm exactly what flags cluebot sets on its reverts? Thanks. 7 04:12, 8 December 2010 (UTC)

It's a standard rollback, not marked as a bot edit. I asked Tim Starling (talk · contribs) on IRC if we had anything special with Google, and he said that WMF does not have any special things set up with Google. He said it is likely coincidence -- unless Google's doing something that they aren't telling the WMF about. -- Cobi^(t|c|b) 04:19, 8 December 2010 (UTC)

Ok - thanks. 7 04:22, 8 December 2010 (UTC)

Soup

Your bot actually restored a vandalism. Slightsmile (talk) 21:56, 9 December 2010 (UTC)

Hi

If the bot does this again, in future you just need to go here and report it as a false positive.--5 albert square (talk) 23:23, 9 December 2010 (UTC)

Thanks, and I added brackets to fix the link. Slightsmile (talk) 00:04, 10 December 2010 (UTC)

Cluebot-NG Review System & Existing Work

Hello Cluebot team,

I am a researcher who does anti-vandalism research. I am the developer of STiki, a (human-driven) anti-vandal tool that selects which edits to display based on machine-learning techniques (seemingly) similar to those of Cluebot-NG. In particular, I am interested in the corpus you are amassing using a review system. One question: is it a "live" tool? That is, if someone tags an edit as vandalism, is it undone on Wikipedia, or is this a more "offline" attempt?

The reason I ask is that STiki is essentially a live tagging/review tool. Although my set is not representative of all Wikipedia edits (since users are only classifying edits STiki believes are likely vandalism), I have over 130,000 such taggings. Even though it is not representative -- this may still be an *interesting* set because it contains many false-positives (where software believed it to be vandalism, but humans labelled it as "innocent"). Such edits would seem particularly useful in refining an ML model, and I'd be more than happy to share them with you.

Further, are you aware of the PAN vandalism corpus? In that case, there are 32,000 labelled (and representative) edits -- done with redundancy. I guess I am interested in knowing what your review system aims to achieve that is different from existing work? Finally, I'd be interested in cooperating wherever possible to benefit both of our systems. Thanks, West.andrew.g (talk) 17:26, 8 December 2010 (UTC)

See also this discussion on my talk page which may be of interest to you. Thanks, West.andrew.g (talk) 17:36, 8 December 2010 (UTC)

Hi, your tool looks interesting, as are your proposals - there are many things that we could discuss about the two tools helping each other. I strongly suggest joining us on our IRC channel to discuss it more thoroughly.

The review interface we use is not live. We preload edits into it and have humans review them. This allows us to ensure a random sampling over all edits - not just edits uncaught by bots and earlier reverts. It also simplifies the process of having multiple reviews per edit, and allows us to have a third classification, "Skip", for edits which would not be helpful to train.

Edits tagged by STiki with a large error rate wouldn't really be helpful to us. But human-classified edits might. If they're not random, it does severely hamper their usefulness, but they could potentially be used as a supplement.

One key issue that we've run into is that human-classified edits are often notoriously unreliable. Even the PAN dataset (which we are using as part of our dataset) has a number of errors, which caused us large problems for our early trials. Our review interface is designed from the ground up with accuracy as a key component. What we've done with slightly unreliable datasets, such as the PAN dataset, is run it through the CBNG core with optimal parameters (CBNG has an accuracy approaching 95% when optimized for overall accuracy - we normally operate it optimized for low false positives instead) and place misclassified edits into the review interface, to verify their accuracy. We may be able to do the same with your human-classified edits.
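The verification step described above could be sketched roughly as follows. This is an illustration, not the CBNG code: classify stands in for a call to the core, and the 0.5 threshold, the toy scoring rule, and the labelled_vandalism field are all assumptions made for the example.

```python
def edits_needing_review(edits, classify, threshold=0.5):
    """Return edits whose stored label disagrees with the classifier's verdict."""
    flagged = []
    for edit in edits:
        predicted_vandalism = classify(edit) > threshold
        if predicted_vandalism != edit["labelled_vandalism"]:
            flagged.append(edit)
    return flagged


# Toy stand-in for the classifier: scores text containing "poop" as vandalism.
def toy_classify(edit):
    return 0.9 if "poop" in edit["text"] else 0.1


sample = [
    {"id": 1, "text": "Added a citation", "labelled_vandalism": False},
    {"id": 2, "text": "poop poop", "labelled_vandalism": False},  # likely mislabelled
    {"id": 3, "text": "poop", "labelled_vandalism": True},
]

# Only disagreements go to human reviewers; agreeing labels are kept as-is.
review_queue = edits_needing_review(sample, toy_classify)
```

The appeal of this approach is that reviewers only spend time on the small fraction of edits where the existing label and the classifier disagree.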

We'd be happy to share our dataset with you, and we may be able to utilize some of yours as well. Even more than that, I'd like to discuss the algorithms you're using, in case they can be incorporated into CBNG. You may also be interested in utilizing the CBNG core - the core is a TCP server that listens on a socket, so it can be relatively easily integrated into other applications.

Join us on IRC and we can discuss it in more detail. Crispy1989 (talk) 18:28, 8 December 2010 (UTC)

I should also note that what you're trying to do, and what we're trying to do, are pretty different. A human-assisted anti-vandal tool needs to focus on very few false negatives, because if there's ever a significant portion of AV fighters using the tool, missed vandalism could mean vandalism that falls through the cracks and remains for a period of time. An anti-vandal bot, on the other hand, needs to minimize false positives. Neither approach operates in full-accuracy mode. As I stated before, with optimal settings for total accuracy, CBNG accuracy approaches 95%, but the 5% of misclassified edits are split about evenly between false positives and false negatives. A 3% false positive rate isn't even close to acceptable. The current operative FP rate is 0.1%, which about halves total accuracy. Because of this difference in purpose, both the algorithms used and the dataset used to train them have different requirements. That said, the optimal dataset for both approaches is random and accurate. Crispy1989 (talk) 18:48, 8 December 2010 (UTC)
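The trade-off described above, giving up overall catch rate to hold false positives under a fixed ceiling, can be sketched as a threshold choice over classifier scores. This is a toy illustration under assumed data, not how CBNG actually selects its threshold; pick_threshold, rates, and the scores below are all hypothetical.

```python
def pick_threshold(scores, is_vandalism, max_fp_rate):
    """Pick a score threshold so that at most max_fp_rate of the
    constructive edits score strictly above it (and would be reverted)."""
    constructive = sorted(
        (s for s, v in zip(scores, is_vandalism) if not v), reverse=True
    )
    allowed = int(max_fp_rate * len(constructive))  # tolerated wrong reverts
    if allowed >= len(constructive):
        return 0.0
    return constructive[allowed]


def rates(scores, is_vandalism, threshold):
    """Return (false positive rate, vandalism catch rate) at a threshold."""
    pairs = list(zip(scores, is_vandalism))
    fp = sum(1 for s, v in pairs if not v and s > threshold)
    tp = sum(1 for s, v in pairs if v and s > threshold)
    n_good = sum(1 for _, v in pairs if not v)
    n_bad = sum(1 for _, v in pairs if v)
    return fp / n_good, tp / n_bad


# Made-up scores: four constructive edits, three vandalism edits.
scores = [0.1, 0.2, 0.3, 0.9, 0.8, 0.95, 0.5]
labels = [False, False, False, False, True, True, True]

strict = pick_threshold(scores, labels, max_fp_rate=0.0)
loose = pick_threshold(scores, labels, max_fp_rate=0.25)
strict_rates = rates(scores, labels, strict)
loose_rates = rates(scores, labels, loose)
```

On this toy data the strict setting reverts no constructive edits but catches only one of the three vandalism edits, while the looser setting catches all three at the cost of one wrong revert.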

When is a good time to catch the team on IRC? Thanks, West.andrew.g (talk) 23:23, 11 December 2010 (UTC)

Now is as good a time as ever. Usually there is at least one person around. 930913 (Congratulate/Complaints) 23:25, 11 December 2010 (UTC)

There is at least one person around probably 20 hours out of a day. Crispy1989 (talk) 23:27, 11 December 2010 (UTC)

Review interface - No more edits

Today I've been getting an error somewhat frequently on the review interface, "No more edits available for False Positives 2" or for "r81" (or some such). Refresh does work around this, but it does keep popping up, maybe a quarter of the time. -R. S. Shaw (talk) 00:58, 11 December 2010 (UTC)

This is just a small bug in the review interface, but it should work if you leave it for a minute or two and refresh. -- SnoFox^(t|c) 21:27, 11 December 2010 (UTC)

Poor Database Review

Not sure where to post this. ID 394524323 is a review of ClueBot-NG reverting blatant vandalism (use of derogatory terms toward a recently passed away individual). Someone marked this reversion as "vandalism." ialsoagree (talk) 18:58, 10 December 2010 (UTC)

Four dataset reviewers have classified it as constructive. It must've been a tired reviewer looking at the diff backwards or something. Anyway, it should be classified as constructive in the dataset, so no worries. -- SnoFox^(t|c) 21:25, 11 December 2010 (UTC)

Another thing to note is that when the data is pulled out of the review interface and into the bot for training, there are "filters" in place to ensure that articles are only classified if a) they have enough votes on them and b) enough people agree. The system is designed so that even if a few "bad" entries end up in there, such as this one, it will not affect the bot unless everyone agrees on it. DamianZaremba ^{(talk • contribs)} 14:51, 12 December 2010 (UTC)
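The two filters mentioned above (enough votes, enough agreement) might look something like this sketch. The function name and the min_votes and min_agreement values are assumptions chosen for illustration; the actual thresholds used by the review interface are not stated in this discussion.

```python
from collections import Counter


def consensus_label(votes, min_votes=2, min_agreement=0.75):
    """Return the agreed label for an edit, or None if it should be left
    out of training because of too few votes or too little agreement."""
    if len(votes) < min_votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) < min_agreement:
        return None
    return label


# The case discussed above: four reviewers say constructive, one says vandalism.
outvoted = consensus_label(["constructive"] * 4 + ["vandalism"])
too_few = consensus_label(["vandalism"])
split = consensus_label(["vandalism", "constructive"])
```

Under these assumed thresholds a single mistaken review is simply outvoted, and an evenly split edit is not used for training at all.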

I am aware of all that, I just wanted to bring it up in case the developers wanted to see if the user had made other incorrect judgments. ialsoagree (talk) 20:29, 12 December 2010 (UTC)

Data Set Error

I read the diffs backwards and marked a constructive edit as vandalism. Presidencies and provinces of British India The Revision as of 08:43, 26 November 2010 was a constructive edit which undid vandalism. (ClueBot then redid the vandalism.) Is there anything I can/should do (besides the obvious read things correctly)? Thanks Jim1138 (talk) 02:18, 14 December 2010 (UTC)

You can click the counter in the top right, then click on the diff ID of the one you misclassified and reclassify it. -- Cobi^(t|c|b) 02:20, 14 December 2010 (UTC)

Reporting false positives

Did I do it right? I reported a false positive, but I don't see any confirmation on User:ClueBot NG/FalsePositives. Does anything show up here on Wikipedia noting these? —Justin (koavf)❤T☮C☺M☯ 17:49, 14 December 2010 (UTC)

Yes the report was added correctly. The listing on User:ClueBot NG/FalsePositives is no longer used due to the new integrated report interface. Report status and comments will be displayed in the report panel which is updated from the review interface. I have updated the text on User:ClueBot NG/FalsePositives to hopefully explain this better in the future. DamianZaremba ^{(talk • contribs)} 17:54, 14 December 2010 (UTC)

Thanks —Justin (koavf)❤T☮C☺M☯ 20:41, 14 December 2010 (UTC)

Bot user page should be in Category:Wikipedia_anti-vandal_bots?

Probably the bot's user page should be added to Category:Wikipedia_anti-vandal_bots. Thanks Rjwilmsi 14:07, 15 December 2010 (UTC)

You mean ClueBot NG? The original ClueBot's already in there and the other ClueBots don't deal with vandalism. Reach Out to the Truth 14:47, 15 December 2010 (UTC)

ClueBot NG is a separate bot. It should probably be added separately, but I'm not sure what the policy is regarding that. Crispy1989 (talk) 15:49, 15 December 2010 (UTC)

I don't think there's a policy for that. Just edit ClueBot NG's user page to add the category. Reach Out to the Truth 15:54, 15 December 2010 (UTC)

Blair Waldorf Question

Hi again. I asked this before but wasn't sure of the resolution. I see someone engaging in apparent sneaky vandalism now. I've been trying to improve the article over the past few weeks, and this fancruft has become annoying. I can't be here to revert it all the time, and no one else appears to be doing so. Just wondering if anything can be done. There are multiple instances of it on record (as displayed via the link), so I'm just wondering when or if the bot will detect it. -- James26 (talk) 02:05, 16 December 2010 (UTC)

It's unlikely that the bot will detect this in the near (few months) future. This type of vandalism is unlikely to be caught by dataset improvement unless a number of edits adding the fictional middle name are all added to the dataset, which is unlikely to happen, due to the relative total infrequency of this. However, it may be possible to catch it statistically, if the bot's neural network improves. I'm not sure if this is possible, and I cannot give an estimate on when it might occur. Sorry. Crispy1989 (talk) 02:59, 16 December 2010 (UTC)

Seems to be taken care of now. Thanks. -- James26 (talk) 06:04, 16 December 2010 (UTC)

STiki and Cluebot-NG

Hello, ClueBot Commons. You have new messages at West.andrew.g's talk page.

Examples of vandalism

Some examples of erasing every trace of the Ukrainian impact in Russian culture and history. First example can be Fyodor Dostoyevsky or Pyotr Ilyich Tchaikovsky. Their fathers were from Ukraine, and several users constantly erase that fact. On the other side, they are distorting facts and completely distorting the meaning of the article about Ukrainians and their culture. Some traces:

I have not followed all the examples but I believe they are in some cases worse. These customers are very clever and they know what they are doing very well! They do it for a long time. I even doubt in ambivalence of their identity. These are the true meaning of vandalism jobs at Wikipedia! Thanks for the effort, I have honest intentions for this Wikipedia, but it is evident that some users are here to carry out political propaganda.--SeikoEn (talk) 07:36, 17 December 2010 (UTC)

Help with vandalism!

Hi there! I need a little help in the fight against vandalism on articles about Ukrainians and Ukrainian culture and history in general. Several well-known users are obviously intentionally breaking the rules of Wikipedia. They are most active on the talk page - Talk:Ukrainians. These users often delete sources and then set their own inaccurate interpretation with no sources. The same users often delete every trace of Ukrainian impact in Russian history and culture on other articles. This is a very serious issue because the same users have not allowed me and some other users to work on the same issues objectively for a long time. Closely following their activities, it can be said that this is pure anti-Ukrainian sentiment which borders on fascism. There is a small informational war against the Ukrainians, their culture and self-identification! These are powerful words but they have their background. You can check this yourself, along with the work of these contributors. Please do something, or forward this appeal to the responsible administrators. Thanks a lot!--SeikoEn (talk) 17:17, 16 December 2010 (UTC)

Hi. The bot processes each edit as it is made and reverts it if it calculates that it is vandalism. There is nothing in the design for protecting a specific page; it just scans articles in the main space. I would suggest that if this is occurring often, you warn the user and then request admin intervention or request protection for the article. Hopefully the bot will pick up some of these edits, but it may require more learning: I believe it can currently only catch around 40% of vandalism due to the settings and dataset, which are chosen to prevent too high a false positive rate. DamianZaremba ^{(talk • contribs)} 17:23, 16 December 2010 (UTC)

SeikoEn, it sounds like you have no chance of help from an anti-vandal robot (like Cluebot) because the nature of the edits in question are probably not recognizable as common vandalism. Instead, I suggest you investigate some of the guidance Wikipedia has for resolving conflicts between editors, perhaps beginning with this policy: Wikipedia:Edit warring and some of the pages listed under "See also" on that page. -R. S. Shaw (talk) 17:52, 16 December 2010 (UTC)

Thanks a lot for your answers. I stress once again, there is a fierce information war against Ukrainian culture and history, and I can't stop it by myself. Several users are intentionally distorting the relevant information and they erase sources. For example, the Russian writer Dostoyevsky has his origin in Ukraine, and a few users are continuously deleting this fact and its sources. Please pay attention to these things sometimes or simply inform the responsible administrators. Thanks! --SeikoEn (talk) 06:54, 17 December 2010 (UTC)

See Wikipedia:Edit warring. -R. S. Shaw (talk) 21:43, 17 December 2010 (UTC)

ClueBot NG And #cvn-wp-en

Hey Cobi,
Just letting you know, although you probably have figured this out, all of the Cluebots in the #cvn-wp-en IRC channel are gone. Probably for a reason, just wanted to make sure you knew. I'm Flightx52 and I approve this message 20:59, 16 December 2010 (UTC)

That set of ClueBots were for the original ClueBot, which is no longer running. ClueBot NG has the same sort of feed on ClueIRC, if that is what you were interested in. -- SnoFox^(t|c) 21:46, 17 December 2010 (UTC)

false positives

I've now done my part to help build up the training set. However, I see that Cluebot NG continues to generate large numbers of false positives. Is the false positive rate still set to 0.1%? How many FPs per day is that? Also, there used to be a publicly-visible list of false positives at User:ClueBot NG/FalsePositives but that is no longer the case. Where has it got to? --Stepheng3 (talk) 20:11, 18 December 2010 (UTC)

The FP rate is still set at < 0.1%, and will remain there for the foreseeable future. I prefer not to use subjective terms such as "large numbers" and instead use accurate ones such as "at most one out of every thousand good edits". Wikipedia gets about 60,000 constructive edits a day, so .1% of that is 60. We estimate our post-processing filters catch as much as half of these, so 30 (or a ratio of 0.0005 to 1) a day is a reasonable estimate.
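The arithmetic behind that estimate can be written out as a short Python calculation. The figures are the ones quoted above (about 60,000 constructive edits a day, a 0.1% ceiling, filters catching up to half); the function and variable names are only illustrative.

```python
def expected_false_positives(constructive_per_day, max_fp_rate, filter_catch_fraction):
    """Return (FPs per day before filters, FPs per day after filters,
    FPs after filters as a fraction of constructive edits)."""
    raw = constructive_per_day * max_fp_rate
    net = raw * (1 - filter_catch_fraction)
    return raw, net, net / constructive_per_day


raw, net, net_rate = expected_false_positives(60_000, 0.001, 0.5)
print(f"before filters: about {raw:.0f} per day")
print(f"after filters:  about {net:.0f} per day")
print(f"effective rate: about {net_rate:.4f} of constructive edits")
```

Because the ceiling is expressed as a fraction of constructive edits, the daily count scales directly with editing volume.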

The false positive reporting was rewritten, partially at your request, to make it easier for users. This has vastly increased our volume of invalid/false reports, but we review each one to determine whether or not it is true. There is no longer any need to find a revert ID or anything - just click a link in the warning. We have also added a status marker for each report. If the developers decide that the false positive might be able to be prevented by modification of core code, it is marked as "Bug" by us, and we examine it more closely. Otherwise, it is delegated to the review interface, where it is reviewed by multiple users, and marked either "Invalid" or "Valid - added to dataset".

You can find a list of recent reports here. You'll also notice that a number of them are added by "Import Script". We parsed all old false positive reports and added them, so as not to lose them.

Now, every time a report is added, it pings us on our IRC channel so we can be sure to see it. Crispy1989 (talk) 20:28, 18 December 2010 (UTC)

Thanks for the quick reply, Crispy. I appreciate the improvements, and I'm glad to hear that reported FPs are visible and getting systematically tracked and reviewed. To build my confidence in the system, I'll report a few FPs and watch what happens to them.--Stepheng3 (talk) 22:48, 18 December 2010 (UTC)

mistake

i believe you have made a mistake on the duke of chablais page —Preceding unsigned comment added by 86.154.179.225 (talk) 00:29, 19 December 2010 (UTC)

Different kinds of edits

Vandalism includes crude humor, page blanking, nonsense, inappropriate images, and so on. A human page patroller can distinguish and respond differently to each kind of vandalism by applying the appropriate templates from Wikipedia:Template messages/User talk namespace. Is there any interest in training ClueBot NG to distinguish different kinds of vandalism?

There's also a class of revert-worthy edits that I hesitate to label as vandalism: test edits, misplaced discussion, well-intentioned edits that accidentally break wikisyntax or introduce typos, unsourced disparagement of living persons, and so on. If the bot is intended to revert these, that's fine, but I wish it would avoid using the derogatory word "vandalism" in the edit summary and user-talk message; in such cases, it's too harsh, even when qualified with "possible".

The review interface uses the word "constructive edit" as an antonym for "vandalism", but fails to make clear whether "constructive" refers to the effect of the edit on the article or to the editor's intent. (Ideally, edits which have a nonconstructive effect would be reverted, but the edit summary and user-talk response would not mention vandalism if the intent seemed constructive.) I wish the review interface instructions were clearer about the intended meaning of "constructive".

There are many edits whose intent is difficult to determine, as when an editor inserts the word "not" and fails to provide an edit summary. In the review interface, I mark these as "skip", which I believe excludes them from the dataset. In the bot's operation, however, it encounters such borderline cases quite often. I worry that if they are omitted from the bot's evaluation dataset (the dataset used for "trialing") they might bias the bot's accuracy statistics. How is this issue addressed? --Stepheng3 (talk) 22:48, 18 December 2010 (UTC)

It's true that there are different kinds of "vandalism", and the algorithm is indeed capable of being trained to recognize different types - However, it's simply impractical to generate such a training set. Every edit in the training set would have to be reclassified along with a "type" or "severity" of vandalism. It takes tens of thousands of edits to fully train the bot. In the distant future, if we somehow get many times the number of volunteers we currently have, then it may be possible - but it won't happen anytime soon.

The edits that CBNG should be trained to revert are pretty much whichever edits a human vandal fighter would revert using automated tools such as Huggle. Human vandal fighters do revert things such as test edits, and these are classified as vandalism in the dataset. It's worth noting that the Level 1 warning does not contain any mention of vandalism, and simply calls it "not constructive". More severe templates do use the word "vandalism" because multiple similar test edits are unlikely. The wording of all of these templates is editable if they need to be improved. The edit summary does indeed always contain the words "possible vandalism", and we could consider changing this. How does just saying "Automatically reverted" instead sound?

You make a good point that the wording in the review interface (regarding "Constructive") is a bit confusing. I'll try to write up more clear instructions. In general: If the bot should definitely revert it, it's vandalism. If the bot should definitely not revert it, it's constructive. If you're unfamiliar with the subject matter and cannot make a determination, hit refresh to get a new random edit. If the edit is "borderline vandalism", or there's a good chance that it's vandalism (or any other type of edit the bot should revert), but you're not sure, it should be skipped and not trained either way. Skip is primarily useful for glitches (such as where the edit has been deleted since it was added to the review interface) or cases where training the edit as either vandalism or constructive could harm the performance (for example, if you get an edit from a vandal-fighter or a bot that's incorrectly reverting an edit - a false positive - it would harm the bot to train as constructive, because it's clearly not constructive, but it's not intentional vandalism either). Crispy1989 (talk) 23:31, 18 December 2010 (UTC)

"How does just saying "Automatically reverted" instead sound?" I would go with "possibly nonconstructive" as it makes no judgment regarding intent. Sole Soul (talk) 23:42, 18 December 2010 (UTC)

The bot already uses "Possible vandalism":

Reverting possible vandalism by 68.90.237.129 to version by Gridlock Joe. False positive? Report it. Thanks, ClueBot NG. (135129) (Bot)

The initial warning doesn't mention vandalism, other than:

Note that human editors do monitor recent changes to Wikipedia articles, and administrators have the ability to block users from editing if they repeatedly engage in vandalism.

However, I am open to changing the wording. Here are the actual warning templates the bot uses. If you would prefer a different wording for the edit summary, that can be done, too, but keep in mind, the edit summary is not long, and we are already pushing the limits of the size of it on certain reverts. -- Cobi^(t|c|b) 00:48, 19 December 2010 (UTC)

I understand that creating a new bot (or adding a new classifier to an existing bot) is a lot of work. If it's ever going to happen, then the sooner we start building a dataset which distinguishes different types, the sooner it will be ready. If you're looking for more volunteers, you might try publicizing the project in the Signpost and other venues.

I understand that bot edit summaries can't say a lot. The phrase "automatically reverted" is fine. Even better would be "automatically undone", which is shorter and avoids wiki-jargon. "Possibly nonconstructive" is gentler and covers more cases than "Possible vandalism".

I've used "skip" to avoid many sorts of tough calls that were not glitches. I hadn't seriously considered simply reloading. The reload advice should be documented in the review instructions. Adding a fourth ("unsure" or "unclear intent") button to the interface would encourage reviewers to do the right thing in these cases.

I'm still concerned that borderline edits might be biasing the bot's accuracy estimates. Any insight there?--Stepheng3 (talk) 02:13, 19 December 2010 (UTC)

Actually, adding a "severity" or "type" field to the core isn't much work at all. It's a very flexible architecture, and actually coding it wouldn't take much time. That's not the issue. Even with a lot more participation, it could be a year or more before the dataset grew to a large enough size to be usable.

Additionally, we're looking into faster methods of increasing dataset size outside of the review interface. We're looking into a possible collaboration with STiki, which would allow us to very quickly increase dataset size, although the accuracy wouldn't be as great as with the review interface (we're looking into ways to improve this). If we started using an approach with an additional type classification, these types of collaborations and alternatives would become impossible.

It really is just not practical to add an additional type output.

We'll change the edit summary to whatever people agree is best. It's easy to change, and we'll do so as soon as there's agreement.

I'll add the refreshing thing to the instructions, and I'll pass the request for a button to get a new edit on to Cobi.

I don't see how borderline edits would bias the bot's accuracy. Borderline edits are ones where it would be OK if the bot reverted it, and it would also be OK if the bot didn't revert it. Keeping this in mind, if they were included in the dataset as "either one is fine" for trialing, it would actually increase accuracy estimates, not decrease it.

Also keep in mind that at least 2 users review each edit, so mistakes or misjudgements by a single person won't affect the result. Crispy1989 (talk) 03:06, 19 December 2010 (UTC)

The edits I'm most concerned about are the ones where casual reviewers can't tell whether the intent was constructive or not. In my view, said uncertainty does not make it "OK" to bot-revert them and put a warning on the editor's talk page. Any time the bot reverts such an edit, it might (or might not) be a false positive. I wonder whether and how such edits get used when estimating the bot's false positive rate. If they're included in the trial dataset, then how are they evaluated? If they're excluded, then the trial dataset is not a representative sample.--Stepheng3 (talk) 08:26, 19 December 2010 (UTC)

Very few edits end up being skipped. For an edit to be skipped, at least two users in a row have to classify it as skipped. If one of them classifies it as vandalism/constructive, then it needs an additional 3 skip classifications to be skipped. In practice, a number of reviewers use skip rarely, if ever, and treat the review interface as if it were an anti-vandal tool with only 2 buttons. The occasional skip classifications that we do see are valid, and are either bugs (e.g. deleted edits) or cases where some humans would revert the edit and others wouldn't. Normally, on Wikipedia, it only takes one user using an anti-vandal tool to revert an edit. On the review interface, it takes at least a consensus of two to revert or skip, so the data from the review interface should be even more accurate and reliable than the average independent vandal-fighter. Crispy1989 (talk) 20:14, 19 December 2010 (UTC)

Crispy, are you saying that the trial dataset is a representative sample of Wikipedia edits, with no edits thrown out for any reason? If so, how do you keep from throwing out edits when the reviewers are split 50-50 on whether it's vandalism or not? --Stepheng3 (talk) 01:48, 20 December 2010 (UTC)

It's a representative sample of edits relevant for determining accuracy. If reviewers are split on whether or not it's vandalism, that means that the edit might have been reverted as vandalism by a human, or it might not have been reverted, depending on which human saw it. There are four possible ways of handling these edits.

One is to include them in the dataset as vandalism, using the logic that, since multiple live vandal fighters often see the same edit, at least one would likely revert it. Because there are few of these to begin with, including them as vandalism in the dataset would, at worst, decrease the bot's accuracy estimate by less than 1% (which is less than the variation between trial runs anyway). It would not cause an increase in false positives. However, if these edits were used for training (as they will be), it could cause a fair number of inconsistencies in the neural net.

Another way of handling them would be including all of them in the dataset as constructive, using the logic that, if any one of the live vandal fighters would consider it constructive, then it's constructive (this logic makes less sense than the former anyway, as it only takes one live vandal fighter to revert an edit). Because the bot would likely classify a number of these as vandalism, including them as constructive would severely skew the bot's accuracy estimates, and cripple the bot to the point of nonfunctioning due to an incredibly high threshold (at such a low false positive rate of 0.1%, even a single false positive in the trial dataset can have drastic results).

The third possible way of handling them is to not include them in the dataset. Because some live human vandal fighters would revert them, and others wouldn't, it's safe to say that either action on behalf of the bot would be acceptable. Not including them in the dataset won't bias it in any direction, for either training or trialing.

The fourth possible way is to include them in the dataset as a "wildcard" that could match either constructive or vandalism (considering that it could be considered either one depending on which human vandal fighter you talk to). Clearly, it would have to be discarded during training, because you cannot train on an "unknown". During trials, either classification for these edits would be considered acceptable, which would elevate the bot's accuracy statistics, and in doing so, slightly increase the actual false positive rate.

Of these four options, the fourth actually makes the most sense (again, because either classification could be considered acceptable), but to err on the side of caution, we went with the third, to make sure the actual false positive rate is less than the stated 0.1%, and core accuracy isn't overestimated.
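The effect of the first three options on the threshold can be illustrated with a toy sketch. This is not ClueBot NG's code: the function names, the scores, and the rule "revert when score > threshold" are all assumptions made for illustration, and the fourth ("wildcard") option is left out because the discussion does not say how such edits would be counted.

```python
from typing import List, Optional, Tuple

# (score, label): label True = vandalism, False = constructive,
# None = reviewers were split. All names and numbers here are hypothetical.
Sample = Tuple[float, Optional[bool]]

POLICIES = ("vandalism", "constructive", "exclude")


def apply_policy(samples: List[Sample], policy: str) -> List[Tuple[float, bool]]:
    """Resolve split edits by labelling them or dropping them."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    resolved = []
    for score, label in samples:
        if label is None:
            if policy == "exclude":
                continue  # option three: leave split edits out entirely
            label = policy == "vandalism"  # option one or option two
        resolved.append((score, label))
    return resolved


def lowest_threshold(samples: List[Tuple[float, bool]], max_fp_rate: float) -> float:
    """Lowest threshold t such that reverting every edit with score > t keeps
    (constructive edits reverted) / (constructive edits) <= max_fp_rate."""
    constructive = sorted(
        (score for score, is_vandalism in samples if not is_vandalism),
        reverse=True,
    )
    allowed = int(max_fp_rate * len(constructive))  # tolerated false positives
    if allowed >= len(constructive):
        return 0.0  # nothing constrains the threshold
    # The (allowed + 1)-th highest constructive score must not be reverted.
    return constructive[allowed]


def vandalism_caught(samples: List[Tuple[float, bool]], threshold: float) -> int:
    """Count vandalism edits that score above the threshold."""
    return sum(1 for score, is_vandalism in samples if is_vandalism and score > threshold)


# Toy trial set: four constructive edits, two vandalism edits, one split edit.
TOY: List[Sample] = [
    (0.05, False), (0.10, False), (0.20, False), (0.30, False),
    (0.90, True), (0.97, True),
    (0.95, None),
]

for policy in POLICIES:
    resolved = apply_policy(TOY, policy)
    threshold = lowest_threshold(resolved, max_fp_rate=0.001)
    print(policy, threshold, vandalism_caught(resolved, threshold))
```

In this toy set, labelling the single high-scoring split edit as constructive forces the threshold above that edit's score, which is the "incredibly high threshold" effect described for option two; labelling it as vandalism or excluding it leaves the threshold where the clearly constructive edits put it.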

In regards to skipping edits where reviewers are split over whether it's constructive or vandalism ... this has only happened a single time as of yet. The review interface never throws out an edit automatically unless 1) There are 2 Skip classifications and no other classifications or 2) There are three times as many Skip classifications as there are any other classification. If reviewers are split over whether an edit is vandalism or constructive, and there aren't enough skip classifications to skip the edit, it requires manual intervention on behalf of the admins. As I said, this has only happened once, and in that case, we just discarded the edit. Crispy1989 (talk) 02:11, 20 December 2010 (UTC)
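The two automatic-discard rules above can be written as a small predicate. This is only a sketch under stated assumptions, not the review interface's actual code: the function name is hypothetical, "three times as many" is read as "at least three times", and "any other classification" is read as the larger of the vandalism and constructive counts.

```python
def should_discard(skip: int, vandalism: int, constructive: int) -> bool:
    """Return True when an edit would be thrown out automatically.

    Rule 1: two Skip classifications and no other classifications.
    Rule 2: at least three times as many Skip classifications as the
            most common other classification (an assumed reading).
    """
    most_common_other = max(vandalism, constructive)
    if most_common_other == 0:
        return skip >= 2
    return skip >= 3 * most_common_other


# One vandalism vote means three Skip votes are needed, matching the
# "additional 3 skip classifications" described earlier in the thread.
print(should_discard(skip=2, vandalism=0, constructive=0))
print(should_discard(skip=2, vandalism=1, constructive=0))
print(should_discard(skip=3, vandalism=1, constructive=0))
```

Edits that fail this check while reviewers remain split would fall through to the manual intervention mentioned above; the sketch does not model that step.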

Help! archiving a user talk page.

Hi ClueBot III, I need help archiving my user talk page. The problem is this: I have two past archives and don't know how to get them included. I now want ClueBot to archive my user talk page automatically each month, but I don't know how to include my past archives. Help me! Thanks. D6h! ^{What's on your mind?} 04:21, 19 December 2010 (UTC)

Thanks for archiving my user talk page! D6h! ^{What's on your mind?} 11:24, 20 December 2010 (UTC)

Erm...

With this, I am not sure if it is a false positive, but I really believe the edit was helpful (being that it redirected to the one being talked about, not the classic N64 one). Thanks Twigy tag1 (talk) 18:53, 20 December 2010 (UTC)

I suspect this is probably to do with the bot not understanding the context of the edit (what the [[ etc. actually mean). This should be resolved once the wikiparser is completed by Crispy. - DamianZaremba ^{(talk • contribs)} 18:57, 20 December 2010 (UTC)

If the edit is definitely good faith, it's a false positive. This is pretty clearly good faith to me, so yes, it's a false positive. Crispy1989 (talk) 20:13, 20 December 2010 (UTC)

I don't understand

My last edit to santa claws was truthful and unbiased. Why was it removed??? In some cultures he is said to be known as Mr. Claws, and Saint Nicolas can be shortened to St Nick, so why was my edit removed? I believe that it was constructive and informative. Please help me. TheFutureGood (talk) 04:28, 21 December 2010 (UTC)

This same edit was reverted by human editors as well, so is not a false positive. If you believe your edit was good, you should discuss it on the article talk page instead of repeatedly making the same edit and having it reverted each time by multiple different people. Crispy1989 (talk) 04:40, 21 December 2010 (UTC)

False positives page down

The false positives page appears to be down at present. I thought this one 126864 [1] was worth checking. Cavrdg (talk) 08:26, 22 December 2010 (UTC)

ClueBot-NG is being moved to a dedicated server. It's been running on a home workstation, but is getting moved to a professionally hosted server. It should be back up soon. Crispy1989 (talk) 18:38, 22 December 2010 (UTC)

It is now back up. Crispy1989 (talk) 15:52, 23 December 2010 (UTC)

Happy Holidays!

Hey ClueBot and ClueBot NG and your programmers etc, just thought I'd drop by to wish you a Happy Christmas and New Year. I'm returning home for the festivities (figured I really should visit the family!), hope you have a good one and get everything that you asked for! --5 albert square (talk) 02:36, 24 December 2010 (UTC)

5 albert square (talk) has given you a Christmas tree! Christmas trees promote WikiLove and are a great way to spread holiday cheer. Merry Christmas!

Spread the WikiLove by adding {{subst:User:The Utahraptor/Christmas tree}} to any editor's talk page with a friendly message.

--5 albert square (talk) 02:49, 24 December 2010 (UTC)

Question about the Review Interface

While manually reviewing edits for Cluebot, I've come across edits where a vandal is correcting the grammar in their vandalism. Right now I'm looking at one where an earlier edit (by the same IP) inserted vandalism and then several minutes later the IP added "as" to their sentence. I've seen other vandals do similar things to hide edits from editors who only check the most recent edit.

So should I mark the second edit as vandalism? Skip it?--Banana (talk) 16:30, 24 December 2010 (UTC)

I'd skip it. Although it does count primarily as vandalism, if the bot catches that edit, but not the earlier one, it will only revert back to the earlier vandalism. The exception is if the latter edit adds additional vandalism. Crispy1989 (talk) 02:03, 25 December 2010 (UTC)

Alright. I've come across three of these so far. --Banana (talk) 04:40, 25 December 2010 (UTC)

Wat?

Why is this bot yelling at me? I can't even in fact yell at it without logging in, kinda retarded! -- 146.115.187.76 (talk) 06:48, 25 December 2010 (UTC)

nevermind, I figured out how to yell at it, still, ironic. -- 146.115.187.76 (talk) 06:59, 25 December 2010 (UTC)

Low confidence

ClueBot NG apparently gave this edit an ANN score of 0.99, which seems a little too high for an edit like that. mechamind 9 0 07:21, 27 December 2010 (UTC)

The score cannot be universally relied upon as a "confidence" level, particularly for false positives. It's often not possible to tell why an ANN gives an edit a certain score. All I can say is, as the dataset expands and the core improves, the scores will become more accurate and closer to what would be expected. Crispy1989 (talk) 07:34, 27 December 2010 (UTC)

Changes on Multi-touch page

Please review changes on Multi-touch from me Gblindmann (talk) 14:03, 27 December 2010 (UTC)

Thanks in advance!

Gennadi Gblindmann (talk) 14:03, 27 December 2010 (UTC)