User talk:Shirik/CollabRC

This page is for discussion regarding the CollabRC project. To open a new discussion thread, please create a new section at the bottom of this page. All feedback is welcome and will be responded to as time permits.

Change: 3 options -> 2

The "Learning reversion patterns" section used to have a segment written as follows:

To accommodate learning, every time a user reviews a change, one of three results must be chosen (though they may be implicit): vandalism, not obvious vandalism, and certainly not vandalism. These results are fed back to the collaboration bot; the vandalism and certainly not vandalism results are taken into account for adapting the working knowledge of the AI of the bot. A not obvious vandalism result informs the bot that it may have been correct in its analysis, but the change should not be rolled back because it is not obvious vandalism. In such a situation, the AI system will not adapt because inconclusive evidence is given as to whether the result was correct or not.

After discussing with The Thing That Should Not Be, it was determined that this was over-aggressive and might result in a slowdown of recent change patrollers. Furthermore, the AI bot does not necessarily need to know the difference between not obvious vandalism and certainly not vandalism. Accordingly, this section will be changed to be more similar to Huggle's tactics -- either a given change is vandalism or it is not. Since the bot should only be triggering on high-priority threats, anything that would have fit under not obvious vandalism should not be a trigger, and thus must be fed back to the bot anyway. Any feedback regarding this change is appreciated. --Shirik (talk) 06:32, 16 December 2009 (UTC)

The "I'm not sure" option, while it might slow down some patrollers, is necessary, but for a different reason - if an edit is "suspicious" it deserves a further review by someone less rushed than front-line RC patrollers to determine whether it was vandalism or not. Triona (talk) 06:44, 15 September 2010 (UTC)
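
A minimal sketch of how the two positions in this thread could coexist, assuming a hypothetical `FeedbackCollector` class (none of these names come from the CollabRC design): the two definite verdicts are fed back as training examples, while an "unsure" verdict teaches the bot nothing and instead queues the edit for a slower second review.

```python
from collections import deque

# Hypothetical verdict labels; the actual CollabRC protocol is not specified here.
VANDALISM = "vandalism"
NOT_VANDALISM = "not vandalism"
UNSURE = "unsure"


class FeedbackCollector:
    """Collects patroller verdicts for the bot (illustrative only)."""

    def __init__(self):
        # (edit_id, is_vandalism) pairs the bot may learn from.
        self.training_examples = []
        # Edits awaiting review by someone less rushed.
        self.review_queue = deque()

    def record(self, edit_id, verdict):
        if verdict == VANDALISM:
            self.training_examples.append((edit_id, True))
        elif verdict == NOT_VANDALISM:
            self.training_examples.append((edit_id, False))
        elif verdict == UNSURE:
            # Inconclusive: do not adapt the model, only escalate.
            self.review_queue.append(edit_id)
        else:
            raise ValueError(f"unknown verdict: {verdict!r}")


collector = FeedbackCollector()
collector.record(101, VANDALISM)
collector.record(102, NOT_VANDALISM)
collector.record(103, UNSURE)
```

With this split, front-line patrollers still only need one click per edit, and the training data contains only conclusive verdicts.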

Sample sets

Tim1357 was kind enough to point me to some sets of known vandalism and non-vandalism which should be used to initially train the bot. These datasets are courtesy of Crispy1989 and were originally intended for ClueBot.

User:Crispy1989/Dataset/Vandalism - Definite vandalism
User:Crispy1989/Dataset/Constructive - Definite constructive posts

Thanks to those who have worked to build these lists. The information will be truly helpful. Shirik (talk) 04:00, 17 December 2009 (UTC)

User:Tim1357/dataset - Additional dataset courtesy of Tim1357 by VoABot

I don't think the above edits should be included; there are too many errors. Tim1357 (talk) 03:42, 18 December 2009 (UTC)

That's fine, but something's been bugging me. We have the reversions done by bots, but these reversions are typically "obvious vandalism". Can you think of any way we can get a dataset of less typical vandalism, such as that which was reverted manually? I can't think of an easy way to do that. --Shirik (talk) 18:18, 18 December 2009 (UTC)

That was what Cobi was trying to do. He gave the bot's neural network a base training using what the bot had already seen as vandalism, then he fine-tuned it using human-selected 'sneaky' vandalism. He placed more weight on the human-selected edits than on the bot's previous ones. Tim1357 (talk) 00:06, 19 December 2009 (UTC)

P.S. I'm going to get the human dataset from him. Tim1357 (talk) 03:43, 21 December 2009 (UTC)
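
The weighting idea described above can be sketched as follows. This is not Cobi's code or ClueBot's network: it swaps in a tiny logistic-regression trainer, and the function names, the single feature, and the weight of 5.0 are all assumptions chosen for illustration. Each example carries a sample weight that scales its gradient, so human-selected edits pull the model harder than bot-reverted ones.

```python
import math


def train_weighted(examples, epochs=200, lr=0.1):
    """Fit logistic regression by SGD.

    examples: list of (features, label, sample_weight) tuples,
    where label is 1 for vandalism and 0 for constructive.
    """
    n_features = len(examples[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y, sample_weight in examples:
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            # The sample weight scales how far this example moves the model.
            g = sample_weight * (p - y)
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict(model, x):
    w, b = model
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical conflict: the bot-derived data calls this pattern constructive,
# while a human flagged the same pattern as sneaky vandalism.
bot_example = ([1.0], 0, 1.0)
human_equal = ([1.0], 1, 1.0)
human_heavy = ([1.0], 1, 5.0)  # assumed weight, not a value from the thread

p_equal = predict(train_weighted([bot_example, human_equal]), [1.0])
p_heavy = predict(train_weighted([bot_example, human_heavy]), [1.0])
```

When the two sources disagree, the heavier human weight moves the predicted vandalism probability toward the human judgment.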

".. huggle or twinkle .."

Is the bit about blacklisting editors who have been reverted using Huggle or Twinkle meant literally? I don't, and won't, use either. How about "reverted by a sysop or rollbacker"? Philip Trueman (talk) 18:51, 12 April 2010 (UTC)

Important pages and new pages

Pages linked to from the main page, highly-used templates, and bios of famous people are important; pages about rivers and little-known books are less important. New pages sometimes get created for pure vandalism, or are really messed up when they first get made. If there were some way to prioritize the importance of a page to the project, wouldn't it be possible to keep a closer watch on important pages? Keep the current system in place, and have the same settings for "absolutely vandalism", but make it easier for new/important pages to trip the "might be vandalism" part. After a week or so, take new pages off the watchlist. Math321 (talk) 21:00, 23 February 2012 (UTC)
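
A minimal sketch of this proposal, assuming the bot produces a vandalism score between 0 and 1. The threshold values, the `Page` fields, and the function names are hypothetical. The "absolutely vandalism" cutoff is the same for every page, while the "might be vandalism" cutoff is lower for important pages and for pages less than a week old.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# All numeric values below are illustrative assumptions.
DEFINITE_THRESHOLD = 0.95              # same for every page
BASE_SUSPICIOUS_THRESHOLD = 0.70       # ordinary pages
SENSITIVE_SUSPICIOUS_THRESHOLD = 0.50  # important or new pages
NEW_PAGE_WINDOW = timedelta(days=7)    # "after a week or so"


@dataclass
class Page:
    title: str
    created: datetime
    important: bool = False


def suspicious_threshold(page, now):
    """Return the score at which an edit to this page is flagged for review."""
    is_new = now - page.created < NEW_PAGE_WINDOW
    if page.important or is_new:
        return SENSITIVE_SUSPICIOUS_THRESHOLD
    return BASE_SUSPICIOUS_THRESHOLD


def classify(score, page, now):
    if score >= DEFINITE_THRESHOLD:
        return "absolutely vandalism"
    if score >= suspicious_threshold(page, now):
        return "might be vandalism"
    return "no action"


now = datetime(2012, 2, 23)
old_page = Page("Little-known book", created=datetime(2010, 1, 1))
new_page = Page("Brand new article", created=datetime(2012, 2, 20))
key_page = Page("Highly-used template", created=datetime(2008, 1, 1), important=True)
```

Because newness is computed from the creation date on every call, a page drops back to the ordinary threshold on its own once the window has passed, with no separate watchlist cleanup step.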