Wikipedia:Bots/Requests for approval/Bender the Bot 7
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Bender235
Time filed: 01:06, Friday, January 20, 2017 (UTC)
Automatic, Supervised, or Manual: Automatic
Programming language(s): AWB
Source code available: upon request
Function overview: replace http:// with https:// for the New York Times domain.
Links to relevant discussions (where appropriate): WPR: Why we should convert external links to HTTPS wherever possible and WPR: Should we convert existing Google and Internet Archive links to HTTPS?
Edit period(s): one-time run
Estimated number of pages affected: about 100,000
Exclusion compliant (Yes/No): Yes
Already has a bot flag (Yes/No): Yes
Function details: Bgwhite recently pointed me to Secure The News, a project of the Freedom of the Press Foundation, which conveniently lists all major news outlets that enable HTTPS access. Having already converted The Guardian links earlier, I want to work through that list one by one, starting with The New York Times, which proudly announced its activation of HTTPS a week ago.
We have a lot of NYT links (my conservative guess is 100k pages), and while the NYT announcement says that so far only "articles published in 2014 and later" are accessible over HTTPS, I want to convert them all right now for two reasons: (1) it does not break older links (for example); they are only redirected to HTTP again, and if NYT does that on their site, at least they keep the HTTP referrer information; and (2) they announced that they "intend to bring the rest of our site under the HTTPS umbrella", so it's only a matter of time.
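The bot itself runs in AWB and its rules are not published here, so the following is only a rough Python sketch of the kind of find-and-replace described above. The pattern, the function name `upgrade_nyt_links`, the inclusion of subdomains, and the choice to skip URLs embedded in another URL (such as archive links) are all assumptions for illustration, not the operator's actual settings.

```python
import re

# Hypothetical pattern (not the bot's real rule): an http:// link whose host is
# nytimes.com or one of its subdomains.
# - (?<!/) skips URLs embedded inside another URL, e.g. archive.org snapshots.
# - The lookahead requires the host to end there, so "nytimes.com.example.org"
#   is left alone.
NYT_HTTP = re.compile(
    r"(?<!/)http://((?:[A-Za-z0-9-]+\.)*nytimes\.com)(?=[/?#:\s\]|}<]|$)",
    re.IGNORECASE,
)


def upgrade_nyt_links(wikitext: str) -> str:
    """Return wikitext with matching NYT links switched from http:// to https://."""
    return NYT_HTTP.sub(r"https://\1", wikitext)


if __name__ == "__main__":
    sample = "[http://www.nytimes.com/2017/01/10/example.html Example article]"
    print(upgrade_nyt_links(sample))
```

Only the scheme changes; the path and query string are left untouched, which is why older articles that NYT still redirects to HTTP keep working.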
Discussion
- Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Here is a trial approval to test your code. Do you have any statistics on how many of the links you change will currently be useless because the remote server changes them back to HTTP? — xaosflux Talk 05:09, 21 January 2017 (UTC)
- I don't have an accurate number, but I would guess as of today about 70-80% of the links would be re-routed to HTTP on the NYT server. This number will gradually go to zero over the next couple of months. (By the way, the example link above already works with HTTPS on mobile; desktop will follow soon.) --bender235 (talk) 12:07, 21 January 2017 (UTC)
- Trial complete. Edit history obviously here. --bender235 (talk) 16:21, 21 January 2017 (UTC)
- Due to the massive size of this request, I have posted a link from WP:VPR. Placing on hold for any initial community comments. — xaosflux Talk 17:20, 26 January 2017 (UTC)
- I have no problem with this; it's a great idea, and thanks to BB for doing this important work. My question is why not run the bot for multiple sites? It would reduce the number of edits if more than one site could be processed at once. -- GreenC 17:57, 26 January 2017 (UTC)
- Support I support this proposal. In addition to the 50-edit trial, I'll throw out another possibility, but ignore it if other bot experts think this is overkill. Given the large number of affected references (approximately 100,000), I think it would be wise to run it for 1000 or so and then pause for a couple of days just to see if something odd has happened.--S Philbrick(Talk) 20:58, 26 January 2017 (UTC)
- @Sphilbrick: Generally when approving massive jobs like this I do it with a ramp-up throttle (e.g. 1000 edits, 24 hr wait, 2000 edits, 24 hr wait) with increasingly large steps depending on the max size (last step is "open"). — xaosflux Talk 00:00, 27 January 2017 (UTC)
- Sounds good. Thanks.--S Philbrick(Talk) 01:05, 27 January 2017 (UTC)
- Support Clear-cut and helpful. My only feedback would be to change this from a one-time run to a large initial run, and then periodically after, as there will certainly be non-https links re-introduced by editors, either due to bad copy/pastes or restoring stuff from page history, etc. Avicennasis @ 22:26, 28 Tevet 5777 / 22:26, 26 January 2017 (UTC)
- I don't think that will be necessary, since nytimes.com is now HTTPS-by-default, so all copy-pasted URLs will be HTTPS from now on. --bender235 (talk) 23:37, 26 January 2017 (UTC)
- All "good" 'copy-pasted URL will be HTTPS from now on', but that wasn't my example. An editor who copied from "://www.foo.com" by accident might just type in http at the beginning of that URL rather than going back and re-copy/pasting. And that still doesn't address stuff pulled from history and the like. But that's your call to make. I still support it either way. Avicennasis @ 02:57, 29 Tevet 5777 / 02:57, 27 January 2017 (UTC)
- I don't think there will be more than a few cases. I'll keep an eye on it. --bender235 (talk) 19:48, 28 January 2017 (UTC)
- Support I can't find any issues with this proposal, overall net positive. Iazyges Consermonor Opus meum 04:19, 27 January 2017 (UTC)
- Support I don't see any problems. HTTPS is important for the privacy and security of our readers. Thank you for the good work bender235! Tony Tan · talk 05:31, 29 January 2017 (UTC)
- Approved for extended trial (1000 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Please update with results. — xaosflux Talk 20:11, 30 January 2017 (UTC)
- Trial complete. Edit history again here. --bender235 (talk) 06:44, 31 January 2017 (UTC)
- Approved. Task approved with ramp up schedule:
- 1000 edits, 24hr hold (already completed above)
- 2000 edits, 24hr hold
- 3000 edits, 24hr hold
- 5000 edits, 24hr hold
- Open editing.
- Should there be any minor issues brought up during the ramp up that are easily correctable, make corrections and restart the ramp up schedule. You may certainly use a slower ramp up schedule at your discretion. — xaosflux Talk 16:32, 3 February 2017 (UTC)
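For anyone scripting a similar job, the ramp-up schedule above could be expressed roughly as follows. This is a minimal Python sketch, not part of the approved task: the names `RAMP_UP`, `HOLD_HOURS` and `plan_batches` are hypothetical, and the 100,000-page total is only the estimate given in the request.

```python
from typing import Iterator, Tuple

# Batch sizes taken from the approved schedule; None stands for "open editing".
RAMP_UP = [1000, 2000, 3000, 5000, None]
HOLD_HOURS = 24  # hold after each throttled batch


def plan_batches(total_pages: int) -> Iterator[Tuple[int, int]]:
    """Yield (batch_size, hold_hours_after_batch) until total_pages are covered."""
    remaining = total_pages
    for step in RAMP_UP:
        if remaining <= 0:
            return
        # The open step takes everything left; throttled steps are capped.
        size = remaining if step is None else min(step, remaining)
        remaining -= size
        # No hold is needed after the open step or once nothing is left.
        hold = 0 if step is None or remaining == 0 else HOLD_HOURS
        yield size, hold


if __name__ == "__main__":
    for size, hold in plan_batches(100_000):
        print(f"{size} edits, then hold {hold}h")
```

If a correctable issue turns up mid-run, restarting the schedule amounts to calling `plan_batches` again with the number of pages still to be edited.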
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.