Wikipedia:Bots/Requests for approval/ProcseeBot

The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was

Approved. MBisanz talk 23:15, 27 January 2009 (UTC)

ProcseeBot

Operator: slakr\ talk /

Automatic or Manually Assisted: automatic

Programming Language(s): level 60 magic, dragon tears (jellybeans), and toadstools.

Function Summary: It harvests open proxies listed on public proxy lists, verifies that each proxy is world-usable, and then blocks it here pre-emptively.

Edit period(s) (e.g. Continuous, daily, one time run): continuous

Already has a bot flag (Y/N): N

Function Details: First and foremost, my apologies to those who think that adminbots are evil/automation is evil/zOMG adminbots/whatever, but this bot truly is needed. I created it as an emergency response to an influx of quick-fire spammers from a specific website (e.g., [1] [2]). Once it was clear that it had squashed them, I disabled my script, but actual (failed?) spambots started emerging in early January (e.g., [3], [4]). The common trend was that all of the edits came from open proxies listed in public proxy lists. I therefore revised my script into a more streamlined, multi-site version that crawls more sites, does better checks to make certain that a proxy is open, and blocks the exit IP instead of the entrance IP in instances where the two differ.

As a result, the measurable incidence of new vandalism/spam from these spambots has decreased significantly (about 1 address now gets through per week, as opposed to 12 or more per day). Presumably, the amount of sockpuppetry and vandalism from proxies has decreased as well, but there's no real way to test it. However, judging from some of the unblock requests on the proxies ([5] [6] for admins) that claim that I rape+cannibalize babies, as well as some threatening to kill me (not to mention the increase in attack accounts created that mention me and other proxy blockers), the blocks seem to be doing a decent job of preventing vandalism with almost non-existent collateral damage: there has been only one now-closed proxy unblock request the entire time, and I quickly unblocked that proxy.

The bot has been running, and continues to run, on my main account. Again, this is because I felt it was urgent and simply didn't have the time to come here first. To give you some perspective: at one point there were 14 single edits made from 14 different proxies in a 2-minute period, and I felt that the situation clearly demanded an automated solution to prevent recurrence.

I don't intend to release either the source code or any intimate details of the bot's operation. I apologize in advance, but I hope you understand that I don't want to give vandals/spammers/socks an easy-to-use, automated method for harvesting and using proxies, OCR-reading images, and all sorts of other things.

Cheers. --slakr\ talk / 02:27, 19 January 2009 (UTC)

Discussion

This bot is definitely needed --Chris 02:38, 19 January 2009 (UTC)

Care to say what language the bot is really written in? I doubt "level 60 magic, dragon tears (jellybeans), and toadstools" is a language. :P Also, is the source code available? Foxy Loxy Pounce! 02:55, 19 January 2009 (UTC)

"I don't intend to release either the source code or any intimate details of the bot's operation." I thought the language question was trivial, since I'm capable of writing bots, and I knew I had to use humor or I'd never get around to filing this thing; but it's a combination of PHP, C and C++, and yes, level 60 magic (it keeps the magic smoke inside the bot) ;) --slakr\ talk / 03:30, 19 January 2009 (UTC)

This bot is definitely needed, but the lack of source code for an admin bot is not so cool. Would you be willing to provide it to BAG members and other trusted users?

Also, is there any real difference between this and Wikipedia:Requests for adminship/TawkerbotTorA? NuclearWarfare (Talk) 03:03, 19 January 2009 (UTC)

Again, I apologize. I'm paranoid about things I create being used for evil. I don't trust many people, and I don't yet know everyone well enough to trust them to keep the stuff confidential. Plus, even if it weren't used to do bad things on Wikipedia, it could still be used to do bad things on other sites.

In response to your question about the difference between this bot and Wikipedia:Requests for adminship/TawkerbotTorA, offhand I can think of a few differences:

The community is more open toward the idea of admin bots, and the discussion of admin bots is significantly more rational (not the case in 2006).
This bot is already running per Wikipedia:IAR and has yet to cause damage, whereas people thought the world was going to come to an end should the other bot come online.
While this bot seeds its to-check list from external sites, it does not block IPs merely for being listed there: it checks for itself that the proxy is open and usable, and only blocks if it can actually use it as a proxy.
Last and most significantly, there is a clear and demonstrated need for it right now. Between the anontalk spammers and the spambots, proxy abuse is a very real problem, and it's becoming increasingly obvious and difficult to control, as the proliferation of open proxy lists and amateur scripting has made it extremely easy to evade our blocks. Anecdotally, the 4chan vandal streak a few days ago would almost certainly have been a hundred times worse had this bot not been in operation, as /b/ users complained multiple times that their proxies were already blocked when they tried to vandalize articles.

--slakr\ talk / 03:30, 19 January 2009 (UTC)

From seeing how disruptive open proxy abusers can be (I just blocked over 20 anontalk spammer IPs a couple days ago), I support this bot. Mr.Z-man 04:40, 19 January 2009 (UTC)

This looks fine to me. Though perhaps it should run at Meta and do this globally? That's my only real thought. We have lots of people taking en.wiki's block log and doing blocks manually on their home wiki, which just seems like needless duplication of work. /me shrugs. --MZMcBride (talk) 19:52, 19 January 2009 (UTC)

Oh, as to the bot op's competency, I consider him fully trusted. He's probably smarter than all of BAG combined. --MZMcBride (talk) 19:55, 19 January 2009 (UTC)

I support this idea. I think this bot could have a bigger impact on Meta as a (gasp) steward-bot... LegoKontribs TalkM 00:45, 20 January 2009 (UTC)

Well, if that's done, technically the stewards could just create a new global group with only the "globalblock" right and put the bot in that. Mr.Z-man 00:48, 20 January 2009 (UTC)

Assuming anyone can comment here...

Endorse the idea. Would accept closed code on the basis that if it acts up, it's blocked like any other admin account making questionable edits. Some questions that I'd consider, though I understand you won't be giving too much detail:

Is it rate limited in any way so that if it did mess up, we wouldn't have 3000 blocks in 2 seconds flat :) Details not needed, trust given that the rate is a good balance.
Does it block for a fixed time? If so, what happens when the fixed time ends? Is the proxy rechecked to the block log? Or does it keep an internal table of proxies and expected blocks?
What happens if an admin manually blocks or unblocks a proxy?
What happens if an IP it decides needs blocking (or unblocking) is already blocked (eg for a different period or indef)?
What happens if the bot has to go down?
I'm assuming its basic logic is to repeatedly scan proxy lists and feeds, check validity, update its database, and for each update it finds, check the project's block list entry. Roughly right?
What if it finds that an IP it thought was a proxy is no longer on a proxy list? Does it unblock, let the original block stand, or what?

Some questions. FT2 (Talk | email) 04:47, 20 January 2009 (UTC)

Sure:

Is it rate limited in any way so that if it did mess up, we wouldn't have 3000 blocks in 2 seconds flat :) Yes, it'll only block 1500 in 2 seconds flat. :D ... just kidding. Yes, the blocking portion runs in a single thread and cuts itself off if it ever thinks it will go over 100 blocks in a single run, at which point it panics and dies.
Does it block for a fixed time? Yes. Currently it's set at 2 years for proxies on all ports except 9090. Since port 9090 proxies are almost always due to a Facebook worm, and Facebook users are almost always on dynamic IPs, I made those blocks only 2 months. If so, what happens when the fixed time ends? Nothing yet. I'm planning on adding a second scanning queue to re-check proxies that were checked 'x' time ago. Is the proxy rechecked to the block log? Or does it keep an internal table of proxies and expected blocks? Not sure what you mean by checked to the block log. The list of IP addresses is stored in a table in a database, with their processing statuses, flags, and other data placed into columns. The blocking side will only handle a given row once. If the IP was verified as open but has never been processed by the blocking side, it will try to block it. If the block goes through (or the IP has already been blocked by someone else, or something goes wrong), it will update the status of the IP. If an IP has already been processed, regardless of the action taken on it, it won't try to do anything more with it. It doesn't query the block log or anything like that.
What happens if an admin manually blocks or unblocks a proxy? If a proxy is already blocked, it just moves on. There are no checks for whether an admin unblocks a proxy, mainly because the time between confirming the proxy as open and blocking it is around 10 seconds max; so even if someone unblocked the proxy (or a block expired) right as the proxy was verified as open, it would still be blocked as an open proxy. After all, it's highly unlikely that the proxy would magically be secured/closed in that 10-second window.
What happens if an IP it decides needs blocking (or unblocking) is already blocked (eg for a different period or indef)? It just moves on. The API throws an error if you try to push a block through when the target is already blocked and the block request doesn't have the reblock parameter set, and the bot never sets the reblock parameter. The bot catches that error and simply moves on.
What happens if the bot has to go down? Then it goes down? :P My apologies, but I'm not really sure what the question is asking. If the bot goes down it simply stops blocking proxies.
I'm assuming its basic logic is to repeatedly scan proxy lists and feeds, check validity, update its database, and for each update it finds, check the project's block list entry. Roughly right? Hmm... I think you have the right idea. It basically goes like this: depending on the site, the harvester portion crawls for new IPs at varying intervals. The harvester then adds the "suspect" IPs to the database. The separate scanner portion checks the database for unscanned entries, scans them, and updates their statuses depending on whether each entry was open and usable. Then the separate block portion checks for entries that have been scanned and found open but haven't yet been processed by the block portion. Regardless of whether the block happens or not, the block portion marks the entry as processed and won't revisit it.
I made it a point to split all of the major sections into their own daemons so that if any part in the chain fails, there's no catastrophic failure. E.g., if the harvester stops working, then there's just nothing for the scanner to do. If there's nothing for the scanner to do, then there's nothing for the blocker to do. Similarly, if the scanner has problems, it'll die and the blocker will have nothing to do.
What if it finds that an IP it thought was a proxy is no longer on a proxy list? Does it unblock, let the original block stand, or what? The only way it discovers proxies is when they appear on a proxy list or when I have scanned them manually. It also doesn't automatically unblock anything or worry if an IP disappears from a proxy list; its primary concern is that it doesn't re-process anything it has already encountered.
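Since the source is closed, what follows is only a hypothetical sketch of the harvester, scanner, and blocker chain described above, not the bot's actual code. It assumes a single SQLite table (`suspects`) with one row per ip:port and per-stage status columns; the names `harvest`, `scan`, `block_run`, `check_proxy`, and `api_block` are invented for illustration, and the proxy check and the MediaWiki block call are left as caller-supplied functions. The behaviours it mirrors come from the answers above: the blocker handles each row only once, it aborts if a run would exceed 100 blocks, it blocks the exit IP rather than the listed entrance IP, port 9090 gets 2 months and everything else 2 years, and any API error (such as the already-blocked error when reblock is not set) simply marks the row as processed.

```python
import sqlite3
from typing import Callable, Iterable, Optional, Tuple

# From the thread: the blocker "panics and dies" if a run would exceed 100 blocks.
MAX_BLOCKS_PER_RUN = 100


def block_expiry(port: int) -> str:
    # From the thread: port 9090 (Facebook worm, dynamic IPs) -> 2 months, else 2 years.
    return "2 months" if port == 9090 else "2 years"


def make_db() -> sqlite3.Connection:
    # Hypothetical schema: one row per ip:port with a status column per stage.
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE suspects (
               ip TEXT NOT NULL,
               port INTEGER NOT NULL,
               scanned INTEGER NOT NULL DEFAULT 0,
               is_open INTEGER NOT NULL DEFAULT 0,
               exit_ip TEXT,
               block_processed INTEGER NOT NULL DEFAULT 0,
               block_result TEXT,
               PRIMARY KEY (ip, port))"""
    )
    return db


def harvest(db: sqlite3.Connection, candidates: Iterable[Tuple[str, int]]) -> None:
    # Harvester: queue "suspect" ip:port pairs. INSERT OR IGNORE means an
    # entry that has already been seen is never queued a second time.
    db.executemany("INSERT OR IGNORE INTO suspects (ip, port) VALUES (?, ?)", candidates)
    db.commit()


def scan(db: sqlite3.Connection, check_proxy: Callable[[str, int], Optional[str]]) -> None:
    # Scanner: check_proxy (hypothetical) returns the exit IP observed when
    # relaying through ip:port, or None if the proxy is not open and usable.
    rows = db.execute("SELECT ip, port FROM suspects WHERE scanned = 0").fetchall()
    for ip, port in rows:
        exit_ip = check_proxy(ip, port)
        db.execute(
            "UPDATE suspects SET scanned = 1, is_open = ?, exit_ip = ? WHERE ip = ? AND port = ?",
            (exit_ip is not None, exit_ip, ip, port),
        )
    db.commit()


def block_run(db: sqlite3.Connection, api_block: Callable[[str, str], str]) -> int:
    # Blocker: only rows that were scanned, found open, and never processed.
    rows = db.execute(
        "SELECT ip, port, exit_ip FROM suspects "
        "WHERE scanned = 1 AND is_open = 1 AND block_processed = 0 "
        "ORDER BY ip, port"
    ).fetchall()
    if len(rows) > MAX_BLOCKS_PER_RUN:
        # Safety cutoff: refuse the whole run rather than mass-block.
        raise RuntimeError(f"{len(rows)} pending blocks exceeds cap of {MAX_BLOCKS_PER_RUN}")
    for ip, port, exit_ip in rows:
        try:
            # Block the exit IP, which may differ from the listed entrance IP.
            outcome = api_block(exit_ip, block_expiry(port))
        except Exception as exc:  # e.g. an already-blocked error: still "processed"
            outcome = f"error: {exc}"
        db.execute(
            "UPDATE suspects SET block_processed = 1, block_result = ? WHERE ip = ? AND port = ?",
            (outcome, ip, port),
        )
        db.commit()
    return len(rows)
```

Keeping the stages as separate functions over a shared table reflects the failure mode described above: if `harvest` stops, `scan` finds nothing new, and if `scan` stops, `block_run` has nothing to process.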

Phew. Hopefully that covers everything. --slakr\ talk / 09:50, 20 January 2009 (UTC)

I'm a bit wary of not being able to glance at the source (if you're afraid of bad people looking at the source, do what Chris G did for AntiAbuseBot's code). But due to the trust the community seems to have in you, I'll weakly support this bot for now. Foxy Loxy Pounce! 00:39, 21 January 2009 (UTC)

I'm looking to approve this bot tomorrow as it seems the concerns have been addressed. MBisanz talk 03:00, 27 January 2009 (UTC)

The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.