Computing desk

Welcome to the Wikipedia Computing Reference Desk Archives

August 16

I'm not a robot

How do the "I'm not a robot" checkboxes on login pages work? All the user sees is a checkbox, so how does that prove anything? Thanks, †dismas†|(talk) 12:26, 16 August 2018 (UTC)

@Dismas: reCAPTCHA looks at your browser's identifying information (user agent, OS, window size, Google-related cookies, ...). If it thinks you're not likely to be a robot, it lets you bypass the questioning. If you deliberately make yourself difficult to identify or track, such as by going through a VPN or Tor, it is very likely to make you go through at least one image identification question before allowing you to proceed. Jc86035 (talk) 12:49, 16 August 2018 (UTC)

There is also a time limit. If your browser hits too many pages that have reCAPTCHA installed in too short a time, it will assume you are a robot and ask you to identify trucks or street signs or bridges. I feel that they do this because they are looking for someone who has written a program that accurately identifies those things so they can purchase the program to use in Google Maps. 71.12.10.227 (talk) 13:45, 16 August 2018 (UTC)
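The "too many pages in too short a time" check described above can be illustrated with a sliding-window counter. This is only a minimal sketch of the general technique; it is not based on anything known about how reCAPTCHA is implemented, and the class name, method name, and thresholds are all hypothetical.

```python
from collections import deque


class SlidingWindowCounter:
    """Hypothetical per-client hit counter over a rolling time window."""

    def __init__(self, max_hits: int, window_seconds: float):
        self.max_hits = max_hits              # assumed threshold, not a real reCAPTCHA value
        self.window_seconds = window_seconds  # assumed window length
        self._hits = {}                       # client id -> deque of hit timestamps

    def needs_challenge(self, client_id: str, now: float) -> bool:
        """Record a hit at time `now` and report whether the client is over the limit."""
        hits = self._hits.setdefault(client_id, deque())
        # Forget hits that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        return len(hits) > self.max_hits
```

With `SlidingWindowCounter(max_hits=3, window_seconds=60)`, a client's fourth hit inside one minute would trigger a challenge, and the count resets once the earlier hits are more than a minute old.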

I had assumed (perhaps incorrectly) that the image identification was a clever way to feed a database held by Google that can be used as training data for their machine-learning-based image identification algorithms. Egglz (talk) 13:40, 17 August 2018 (UTC)

To function, the image identification must be solved already. Otherwise, it can't tell if you are correct or not. So, they can't use your response to train their algorithms any more than they could use their own response when they identified what was correct. What they are looking for is a computer, somewhere, that hits their service thousands of times a minute and correctly solves the images. Then, they know someone has a better algorithm than they have and, hopefully, they can find the developer and purchase the algorithm. 71.12.10.227 (talk) 15:29, 18 August 2018 (UTC)

With this being a reference desk, can you provide a reference for that? Because this seems to suggest that I was correct. Perhaps they compare your answers to the answers of people that have previously solved the same image and just every now and again throw in a fresh one. I believe they also track mouse movements within the reCAPTCHA box, giving them further evidence of whether you are a human. Egglz (talk) 11:03, 19 August 2018 (UTC)

The reason the original ReCaptcha had two words was, AFAIK, mentioned by Google as being because one was 'solved' and one was not. So your solution to the unsolved word counted little towards whether you were judged human. I imagine the image system is similar. Most of the images are 'solved' and known to either be or not be what they say they are, but one or two are not. It may be a little more complicated than that, and they also collect statistics on some images where there's dispute etc. (and also on whether the agent appears to eventually be recognised as human etc.). But the larger point is that there's no requirement that all the images are 'solved'. Nil Einne (talk) 11:13, 21 August 2018 (UTC)
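The two-word scheme described above can be sketched roughly as follows: the response is graded only on the already-solved "control" word, and the guess for the unsolved word is stored as a vote once the control word is right. This is an illustrative toy, not Google's implementation; the function names, the `votes` structure, and the `min_votes` threshold are all assumptions.

```python
from collections import Counter


def check_response(control_answer, control_guess, unknown_id, unknown_guess, votes):
    """Grade on the control word only; record the unknown-word guess as a vote if it passes."""
    passed = control_guess.strip().lower() == control_answer.strip().lower()
    if passed:
        # The unknown word cannot be graded, so its guess is only tallied.
        votes.setdefault(unknown_id, Counter())[unknown_guess.strip().lower()] += 1
    return passed


def consensus(unknown_id, votes, min_votes=3):
    """Return the leading guess for an unknown word once enough users agree, else None."""
    counts = votes.get(unknown_id)
    if not counts:
        return None
    answer, n = counts.most_common(1)[0]
    return answer if n >= min_votes else None
```

Once `consensus` returns an answer for a word, that word could in principle be promoted to a control word, which would match the "every now and again throw in a fresh one" idea raised earlier in the thread.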

Yes, that makes sense when there are multiple images to choose from. I was thinking of the type where you have to pick all the squares that contain a certain feature from a grid overlain on a photo. Egglz (talk) 12:42, 21 August 2018 (UTC)

Ah yes, you're right. In those cases either they are all solved and not used, or sometimes a training image is thrown in. Nil Einne (talk) 07:34, 22 August 2018 (UTC)

Thank you both. I had a feeling that it was something along those lines but they were just theories. Thanks again, †dismas†|(talk) 16:20, 16 August 2018 (UTC)

For those situations where the Wikimedia software gives you an "I am not a robot" checkbox instead of the image ID page, it still has advantages: it forces the robot operator to create a custom version for Wikipedia. A lot of these bot operators simply run their standard bot on thousands of different sites hoping to hit the tiny percentage that will accept and display the spam. --Guy Macon (talk) 18:52, 16 August 2018 (UTC)

I have a related question about those bloody annoying tools. Why are the images used so often of such very poor quality, making the job frustratingly difficult for genuine human respondents? It discourages me from using those sites. I doubt that's what the site owners want. HiLo48 (talk) 23:07, 18 August 2018 (UTC)

As I mentioned above, the responses are used to train AI, so it is beneficial to have humans process poor-quality images where a machine is more likely to struggle. Egglz (talk) 11:06, 19 August 2018 (UTC)

How can me getting it wrong multiple times help an AI to learn? HiLo48 (talk) 11:44, 19 August 2018 (UTC)

Well, good point. Presumably you believe you are doing it correct, so maybe it is other people/bots getting it wrong so that when your answer is compared to the 'wisdom of crowds' answer, the system marks it as incorrect but probably still adds it to their database. Egglz (talk) 17:36, 19 August 2018 (UTC)

I'm not convinced. I've never seen strong evidence, nor do your sources suggest, that Google is intentionally making their images difficult to solve because they want the work. I mean, they are obviously using the work, but this seems to be a case of 'why piss off people unnecessarily?' The reason Captchas are hard is basically explained by your source: as AIs improve, it's getting harder and harder to make ones that aren't relatively simple to solve with AI. While there are still benefits, as Guy Macon mentioned, in cutting off the dumber bots, ultimately the easier they are, the less effective they are. Of course, on the flip side, it's well recognised that they are also getting more annoying and difficult for humans, and this is off-putting, so it's a balance. One thing was sort of mentioned above, but not very well: when it comes to ReCaptchas, it really helps if you have a Google account and log in to it in any browser where you're getting them (before you get one). This means you're far more likely to get the box. If you're concerned about the privacy implications, consider that in a lot of the world, particularly outside the EU, there's a good chance it doesn't actually change what they know about you, especially if your account is not in your real name and is used for things like email etc. Nil Einne (talk) 11:13, 21 August 2018 (UTC)

"Presumably you believe you are doing it correct". Nope. Quite often I have no idea. That's how confusing I find those things. HiLo48 (talk) 11:19, 21 August 2018 (UTC)[reply]

To clarify, I wasn't trying to suggest that they are deliberately difficult, just that it is beneficial to have the poor-quality images solved as well, to help the AI identify features in poor-quality images rather than filter out anything of low quality. Egglz (talk) 13:02, 21 August 2018 (UTC)

Ah, I see, although that wasn't really an answer to HiLo48's question, which was why the images can be so hard to solve. As I said, I strongly suspect that has much more to do with the fact that they need to be hard for AIs to solve. Of course, when you're training AI, you're going to end up with both purposes being met anyway. Nil Einne (talk) 04:53, 22 August 2018 (UTC)

Simple solution to defeating CAPTCHAs:

Set up a web site with free porn, free MP3 downloads, etc.

Grab the CAPTCHA from a Wikipedia session, give it to someone wanting to access your site to solve.

Feed the solution they give you back to Wikipedia.

If Wikipedia lets you post, let the user access his pirated song or porn clip.

--Guy Macon (talk) 06:10, 22 August 2018 (UTC)