Wikipedia:Link rot/URL change requests/Archives/2024/September

From Wikipedia, the free encyclopedia


cbsnews.com/stories

Hello. CBS News links with /stories/ in the URL don't work. www.cbsnews.com/stories/ is now www.cbsnews.com/news/name-of-the-article/. Some of these can be converted over while others don't fit the format: For example, this is now here for Pedro Carmona.

In this case, I think this changeover would need 3 stages: match by article title, match by article title and date, then archive any that remain broken.

Thanks! MrLinkinPark333 (talk) 23:28, 26 August 2024 (UTC)

Building a new URL from |title= data is difficult. In the above examples:
  1. "CBS: Venezuelan Coup Leader Exits" --> "venezuelan-coup-leader-exits" (drop leading "CBS:")
  2. "Cashing In For Profit?" --> "cashing-in-for-profit" (drop ?)
  3. "'Lackawanna 6' Link To Yemen Killings?" --> "lackawanna-6-link-to-yemen-killings-04-11-2002" (drop single-quote and ?, add a date string parsed from original URL)
  4. "U.S. Plants: Open To Terrorists" --> "us-plants-open-to-terrorists-13-11-2003" (drop periods and colon, add a date string)
In #1 and #4 they each have colons but are done differently. I suspect there will be a lot of edge cases. I can try some generic rules like this and see how many it can get. If you find any more rules, that will help. -- GreenC 03:20, 27 August 2024 (UTC)
It might be easier to do ones without punctuation marks first. However, I can't predict which would need dates and which don't. MrLinkinPark333 (talk) 03:59, 27 August 2024 (UTC)
Other cases:
  1. "We're Watching: How Chicago Authorities Keep An Eye On The City" --> "were-watching" (drop punct and split : to left side)
  2. "$10 Million? NYC Says No Thanks" --> "10-million-nyc-says-no-thanks" (drop punct including $)
  3. "Iceland Says Bye to the Big Mac" --> "iceland-says-bye-to-the-big-mac" (square-link title)
-- GreenC 15:40, 27 August 2024 (UTC)
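
The generic rules collected in this thread can be sketched roughly as follows. This is only an illustration, not the code used for the run: the function names `slugify` and `candidate_slugs` are made up here, and the choice to try the whole title plus each side of a colon is an assumption based on the examples above. Every candidate would still need to be checked against the live site.

```python
import re

def slugify(title: str) -> str:
    """Lowercase, drop punctuation (? ' . : $ and so on), join words with hyphens."""
    text = title.lower()
    text = re.sub(r"[^a-z0-9\s-]", "", text)
    return re.sub(r"[\s-]+", "-", text).strip("-")

def candidate_slugs(title: str, date_suffix: str = "") -> list:
    """Build possible URL slugs for a citation title.

    date_suffix is an optional dd-mm-yyyy string parsed from the old URL
    (assumed format, taken from examples 3 and 4 above).
    """
    variants = [title]
    if ":" in title:
        left, right = title.split(":", 1)
        # Covers 'drop leading "CBS:"' (right side) and 'split : to left side'.
        variants += [left, right]
    slugs = []
    for variant in variants:
        slug = slugify(variant)
        if slug and slug not in slugs:
            slugs.append(slug)
    if date_suffix:
        slugs += [f"{slug}-{date_suffix}" for slug in list(slugs)]
    return slugs
```

Because there is no way to predict which titles need a date or which side of a colon survives, the sketch returns several candidates rather than a single answer.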
4,393 pages

Results

  1. URLs with a match: 3,984 (converted via above method)
  2. URLs not matched: 1,293 (unable to convert)
  3. Title unspecified: 78 (bare and square links without a title)

User:MrLinkinPark333: This turned out better than expected with a 74% success rate. Though the 80/20 Rule is expected. It was fiddly getting all the transforms right and building a table of possible URLs. Some of the #2's probably have a match but the title is too complicated to parse. Many of the titles in #2 are straightforward but no URL exists. If you want the list of #2 let me know. -- GreenC 17:26, 29 August 2024 (UTC)

If you mean the ones that were too complicated to convert, sure. Perhaps I can find more conversion rules from them. I'm also interested in the bare links of #3. I don't need the 404s of easy conversion. MrLinkinPark333 (talk) 17:44, 29 August 2024 (UTC)
Set #2 and #3: Wikipedia:Link_rot/Cases/cbsnews.com-stories -- GreenC 05:47, 31 August 2024 (UTC)
Of the 10 I tested in case #2, I found a handful that worked.
I would like the list for #2 updated at that link rot cases page if any more links are resolved. However, not all of these links will be fixed. For example, the links at Columbus Blue Jackets and Concerns and controversies at the 2010 Commonwealth Games don't have working links. If I find any more, I'll let you know. Otherwise, if we run out of ones to replace, the rest could be replaced with archived links. MrLinkinPark333 (talk) 20:07, 31 August 2024 (UTC)
They all had archives added already, so there's no loss to verifiability if nothing further is done. The ones that might be made to work require special edge-case rules that I don't want to deal with, sorry; it's too messy and time-consuming, and there is too much variability. For example, how many URLs are fixed by removing "The Early Show - CBS News"? The answer is 4. So that's 4 out of 1000. Ctrl-F search on "/" in that list: there is no general rule for "everything after slash is removed". It goes on like that; the data is extremely messy and variable. In a situation like this, the 80/20 Rule rules: you can often get the first "easy" 80%, and the remaining "hard" 20% is dealt with or not, but getting 80% is better than nothing. It's just the nature of this particular problem, trying to create a URL from free-form text. -- GreenC 23:23, 31 August 2024 (UTC)
Fair enough. Hopefully there'd be more luck with the other cbs ones below. --MrLinkinPark333 (talk) 23:26, 31 August 2024 (UTC)
Honestly, 74% is much better than I expected, considering. And since probably at least half of those in #2 are legitimately dead links with no page available, the real conversion rate might be closer to 90% after the dead links are factored out. -- GreenC 00:22, 1 September 2024 (UTC)

 Done

cbsnews.com/numeric

Hello. While looking at CBS News, I found many URLs with numeric IDs that don't work. I found 2 that redirect but the rest don't:

URL replacements are the same as the above section with some exceptions:

  • For Jihobbyist, this is now here. - Political Hotshot needs to be removed from the reference as it does not exist in the new URL's article title.
  • ~590 URLs that start with 2 (any non-mainspace can be ignored).
  • ~4500 URLs that start with 8 (any non-mainspace can also be ignored)

Thanks again! MrLinkinPark333 (talk) 23:47, 26 August 2024 (UTC)

2,413 pages

Results

  • URLs with a match: 2,279 (converted with above method)
  • URLs unable to match: 631
  • URLs no title available: 36

Successful matches: 77.4%. Of those 631, roughly half are not a matching problem; rather, the page no longer exists. Assuming 50% is true, and also removing the URLs with no title available, the real match rate is 88%, i.e. a further 12% might be matched but it is not practical due to the variability of the data. -- GreenC 00:34, 1 September 2024 (UTC)
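
The arithmetic behind those two percentages can be reproduced as below. This is one reading of the estimate, not a confirmed formula: the function name `match_rates` and the `dead_fraction` parameter are made up for illustration, and the adjusted rate assumes the dead pages and the untitled URLs are simply excluded from the denominator.

```python
def match_rates(matched: int, unmatched: int, no_title: int, dead_fraction: float = 0.5):
    """Return (raw_rate, adjusted_rate) for a title-to-URL conversion run."""
    raw = matched / (matched + unmatched + no_title)
    # Assume dead_fraction of the unmatched URLs have no live page at all,
    # and leave untitled URLs out, since neither could ever be matched.
    still_matchable = unmatched * (1 - dead_fraction)
    adjusted = matched / (matched + still_matchable)
    return raw, adjusted

raw, adjusted = match_rates(matched=2279, unmatched=631, no_title=36)
print(f"raw: {raw:.1%}, adjusted: {adjusted:.1%}")  # raw: 77.4%, adjusted: 87.8%
```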

 Done -- GreenC 00:34, 1 September 2024 (UTC)

Not too bad! MrLinkinPark333 (talk) 03:44, 1 September 2024 (UTC)

cbc.ca/story

Hello. There are links to cbc.ca using /story/ that are broken. While there are new working URLs, I can't predict them. For example, this has a working archived link that redirects here for Scouting controversy and conflict. For these ones, I request looking for archived redirects first, then adding archives to the rest.

  • /story/ 41 articles
  • /news/story/ 415 articles.

Thanks! MrLinkinPark333 (talk) 22:08, 28 August 2024 (UTC)

This is a weird site because the ghost redirects are.. ghostly. In the above example, there are different redirects depending on timestamp. Sometimes it goes here and other times here. They are also somewhat chronologically buried in the list, normally I only get the most recent redirect (because there is no way of knowing which is correct without looking), and the last redirect goes here, which is not a ghost redirect. Thus unable to determine redirect URLs with automation. -- GreenC 21:06, 1 September 2024 (UTC)

456 pages

  • Checked 457 pages and edited 376 pages. Added 2 {{dead link}}. Switched 45 |url-status=live to dead. Added 384 archive URLs (337 Wayback). Changed 28 citation metadata.

 Done -- GreenC 17:17, 3 September 2024 (UTC)

MrLinkinPark333: The cbc.ca/story and /news/story appear to have been parsed and fixed during the below section. Example Special:Diff/1243671672/1243757966 -- GreenC 17:17, 3 September 2024 (UTC)

www.cbc.ca/redirects

Hello. Cbc.ca has redirects to working URLs. Some of them require URL changes while others can be fixed quickly.

  • No Changes
    • This automatically goes here without any URL changes for Serena Williams. Not sure how many cases don't require /news/ to make working redirects.
  • ~2000 2 folders
  • ~3000 insource:/http?:\/\/www\.cbc\.ca\/[a-z]+\/[a-z]+\/[story]+\//
  • ~1300 insource:/http?:\/\/www\.cbc\.ca\/[a-z]+\/[a-z]+\/[a-z]+\/[story]+\//
  • ~50 insource:/http?:\/\/www\.cbc\.ca\/[a-z]+\/[a-z]+\/[a-z]+\/[a-z]+\/[story]+\//
  • 3 insource:/http?:\/\/www\.cbc\.ca\/[a-z]+\/[a-z]+\/[a-z]+\/[a-z]+\/[a-z]+\/[story]+\//
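
As a rough offline equivalent of the searches above, the folder depth before /story/ can be computed with a single pattern. This is a sketch under assumptions: the name `folders_before_story` is made up and the URLs in the usage lines are invented examples. In Python's `re`, `[story]+` is a character class (any run of the letters s, t, o, r, y) and `http?` makes only the `p` optional, so the sketch uses a literal `story` and `https?` instead.

```python
import re
from typing import Optional

# One or more lowercase folders, then a literal "story/" path segment.
CBC_STORY = re.compile(r"^https?://www\.cbc\.ca/((?:[a-z]+/)+)story/")

def folders_before_story(url: str) -> Optional[int]:
    """Return how many folders precede /story/, or None if the URL doesn't fit."""
    match = CBC_STORY.match(url)
    if match is None:
        return None
    return match.group(1).count("/")

# Hypothetical URLs, for illustration only.
print(folders_before_story("http://www.cbc.ca/news/canada/story/2010/01/01/example.html"))
print(folders_before_story("https://www.cbc.ca/news/canada/toronto/story/2010/01/01/example.html"))
```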

Since this is a big request, I suggest focusing on ones that already redirect without changing the URLs first, then the ~180 /m/ ones. Thank you very much! MrLinkinPark333 (talk) 22:44, 28 August 2024 (UTC)

As the House of Commons of Canada example has both an /amp/ and full link, could those /amp/ ones be archived in case they break? Not sure why there's two links to the same article, but that helps! MrLinkinPark333 (talk) 16:09, 2 September 2024 (UTC)
For House of Commons-like URLs I can't automatically determine the desktop URL only the mobile version. "AMP" is for pages optimized for mobile users, a parallel version of the site. Some sites have an API (a URL) that allows translation between the mobile and desktop URL ie. give it the AMP URL and it will return the desktop URL. Ideally all URLs on Wikipedia are the desktop version. But I don't know if they have an API, that would be nice to have. Either way anything added to Wikipedia will get archived into the Wayback Machine automatically. If the link later dies the bots or my tool will add an archive. -- GreenC 16:20, 2 September 2024 (UTC)

Results

  • 10,368 links are live. All (but 31) are new, created per above rules.
  • 95 links are not working. Of those, 12 had a {{dead link}} added. The rest have archives.
  • Checked ____ pages and edited 7,133 pages. Moved 10,368 links to a new URL. Removed 45 {{dead link}}. Added 12 {{dead link}}. Switched 1,460 |url-status=dead to live. Switched 2 |url-status=live to dead. Added 556 archive URLs (402 Wayback).

 Done -- GreenC 02:08, 3 September 2024 (UTC)

articles.latimes.com

Hello. This is a big request. URLs with articles.latimes.com either redirect to the new URL or don't work:

60000 with HTTP/HTTPS. Any non articlespace links can be filtered out. Thank you very much! MrLinkinPark333 (talk) 23:58, 12 August 2024 (UTC)

MrLinkinPark333: It looks like *.latimes.com is 96,000 pages and articles.latimes.com is 37,000. I could focus on articles.latimes.com (which is a significant project) but I wonder about the other 2/3rds. Are they redirecting also? Maybe I should do articles.* right now to keep the size manageable. -- GreenC 02:00, 13 August 2024 (UTC)
From the sample checks at Summer Olympic Games, Tampa Bay Buccaneers and 2020 Summer Olympics for latimes.com, these look fine and don't need new URLs. If that changes in the future, I could file a separate request later. MrLinkinPark333 (talk) 22:40, 13 August 2024 (UTC)
OK great. This job is going fast because the LAT has an exceptionally clean site, rapid response, few dead links. It's mostly just finding the redirect and replacing. I'm happy with how well ghost redirect discovery is working, now noted in the statistics (along with soft-404 stats) starting with this run. -- GreenC 16:05, 14 August 2024 (UTC)

Enwiki in multiple batches:

  • Batch 1: Checked 3,000 pages and edited 2,935 pages. Moved 4,152 links to a new URL. Resolved 68 ghost redirects. Resolved 25 soft-404s. Removed 2 {{dead link}} templates. Added 8 {{dead link}}. Switched 149 |url-status=dead to live. Switched 14 |url-status=live to dead. Added 166 archive URLs (142 Wayback). Changed 13 citation metadata fields.
  • Batch 2: Checked 7,000 pages and edited 6,859 pages. Moved 9,663 links to a new URL. Resolved 143 ghost redirects. Resolved 53 soft-404s. Removed 1 {{dead link}}. Added 21 {{dead link}}. Switched 314 |url-status=dead to live. Switched 34 |url-status=live to dead. Added 372 archive URLs (276 Wayback). Changed 36 citation metadata.
  • Batch 3: Checked 26,845 pages and edited 26,251 pages. Moved 36,856 links to a new URL. Resolved 481 ghost redirects. Resolved 195 soft-404s. Removed 5 {{dead link}}. Added 89 {{dead link}}. Switched 1,289 |url-status=dead to live. Switched 128 |url-status=live to dead. Added 1,329 archive URLs (1,106 Wayback). Changed 140 citation metadata.

IABot: does not support URL moves; since the redirects are working, the bot will consider the links live.

 Done -- GreenC 20:43, 14 August 2024 (UTC)

Pass 2

  • Checked 36,845 pages and edited 658 pages. Moved 379 links to a new URL. Resolved 216 ghost redirects. Resolved 438 soft-404s. Removed 2 {{dead link}}. Added 106 {{dead link}}. Switched 199 |url-status=dead to live. Added 217 archive URLs (138 Wayback).

-- GreenC 02:29, 4 September 2024 (UTC)

ehdenfamilytree.com

This 'ehdenfamilytree.com' is dead and the new one is 'ehdenfamilytree.org'. Saroufim1 (talk) 01:49, 31 August 2024 (UTC)

80 pages

  • Checked 79 pages and edited 79 pages. Moved 120 links to a new URL. Removed 3 {{dead link}}. Switched 1 |url-status=dead to live.

 Done -- GreenC 19:26, 3 September 2024 (UTC)

AnandTech shuts down

Amazing website/technews site AnandTech has shut down (https://www.anandtech.com/)

If an archive bot could preemptively archive the entirety of that website, that would be mint, as people are unsure what will happen to the content.

Thanks.

Headbomb {t · c · p · b} 04:32, 31 August 2024 (UTC)

1,158 pages — Preceding unsigned comment added by GreenC (talkcontribs)

@GreenC: those are just what's used on Wikipedia. Which, I agree should be a priority. But if archiving the entirety of AnandTech is possible... either by talking to IA or through your bot or whatever that would be an amazing service to the tech community/tech historians. Headbomb {t · c · p · b} 14:45, 31 August 2024 (UTC)
I believe the domain is already crawled by the Wayback Machine as part of the GDELT Collection ("NO404-GDELT"). For example given this archive the "About this capture" tab says GDELT Collection. The crawl was started in 2014, though it might be the whole site. If you can find some older URLs (older the better) and check if they exist in the Wayback. They should be there, but worth checking to see if the crawl missed them. If there are blank spots then I'll need to go through the URLs on Wikipedia one by one and capture any that are missing which is a bit of a job. -- GreenC 16:24, 31 August 2024 (UTC)
Here's an article from 1998. You can tell it's a very early article by the URL: /161/ .. recent articles are at around /21000/. It's a pretty good bet the site is well archived. -- GreenC 05:29, 2 September 2024 (UTC)
Sounds like anandtech.com will be staying stable and keeping all its articles up. [1]. And while the AnandTech staff is riding off into the sunset, I am happy to report that the site itself won’t be going anywhere for a while. Our publisher, Future PLC, will be keeping the AnandTech website and its many articles live indefinitely. So that all of the content we’ve created over the years remains accessible and citable. Just FYI to help with making the decision. –Novem Linguae (talk) 17:41, 1 September 2024 (UTC)
There is a dedicated team of volunteers and staff (I suppose) of the Internet Archive archiving dead or dying websites. And Anandtech is listed on their wiki. If anyone here wants to speed up the process of the site getting archived, I suggest volunteering some time or resources there as well. – robertsky (talk) 05:52, 2 September 2024 (UTC)
ω Awaiting to see if the site goes offline. -- GreenC 18:15, 3 September 2024 (UTC)

google.com/search?q=cache:

Practically all Google Search links with this string are redirects to Google cache, which has shut down. (technically, not every link starting with this string necessarily redirects to cache, but all links I've found are redirects). Helpful Raccoon (talk) 18:45, 1 September 2024 (UTC)

Note: while the vast majority of URLs I found are followed by 12 characters and another colon before the original website URL (e.g. http://google.com/search?q=cache:EdF1mH2UVF8J:www.maurinet.com/allform/pportnew.pdf+mauritius+national+card&hl=en&ct=clnk&cd=5&gl=nz in Identity document), a few of them are not followed by 12 characters (e.g. http://www.google.com/search?q=cache:www.melafoundation.org/theatre.pdf in Drone music). Helpful Raccoon (talk) 18:58, 1 September 2024 (UTC)
OK. I wrote/use Google Cache Parser (GitHub). It correctly parses both those URLs. -- GreenC 21:53, 4 September 2024 (UTC)
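
For readers who want to see the two URL shapes handled side by side, here is a minimal sketch. It is not the Google Cache Parser mentioned above; the function name `original_from_cache_url` is made up, and it assumes the cache ID is exactly 12 characters from `A-Z a-z 0-9 _ -` and that anything after the first `+` is search terms. It does not handle percent-encoded queries.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumed shape of the optional cache ID: 12 characters followed by a colon.
CACHE_ID = re.compile(r"^[A-Za-z0-9_-]{12}:")

def original_from_cache_url(url: str) -> Optional[str]:
    """Extract the cached page's URL from a google.com/search?q=cache: link."""
    for param in urlsplit(url).query.split("&"):
        if param.startswith("q=cache:"):
            value = param[len("q=cache:"):]
            break
    else:
        return None
    value = CACHE_ID.sub("", value, count=1)  # drop the cache ID when present
    value = value.split("+", 1)[0]            # drop trailing search terms
    if not value:
        return None
    if not re.match(r"^https?://", value):
        value = "http://" + value             # scheme is a guess
    return value
```

A target URL that itself contains a literal `+` would be truncated by this approach, which is one reason a dedicated parser is preferable.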

User:Helpful Raccoon: I cleared Google Cache in February: Wikipedia:Link_rot/URL_change_requests/Archives/2024/February#Google_cache targeting webcache.googleusercontent.com but was not aware of google.com/search?q=cache: .. thanks for bringing this to attention. 776 pages-- GreenC 19:08, 1 September 2024 (UTC)

Results

 Done -- GreenC 23:40, 4 September 2024 (UTC)

time-blog.com

Site appears to be dead. All links redirect to the time.com homepage. There are 54 pages. Thank you! Helpful Raccoon (talk) 23:13, 1 September 2024 (UTC)

Enwiki

  • Checked 54 pages and edited 41 pages. Added 2 {{dead link}}. Switched 4 |url-status=live to dead. Added 44 archive URLs (43 Wayback).

IABot DB

  • Checked and updated 84 unique URLs which will propagate across 300+ wikis.

 Done -- GreenC 00:29, 5 September 2024 (UTC)

nola.com/politics

This subpage appears to be dead. All links currently redirect to https://www.theadvocate.com/baton_rouge/news/politics/ (and it doesn't show the original article). I could not find the original articles by searching in theadvocate.com. There are 343 pages. Helpful Raccoon (talk) 23:23, 1 September 2024 (UTC)

Enwiki

  • Checked 342 pages and edited 320 pages. Added 30 {{dead link}}. Switched 36 |url-status=live to dead. Added 563 archive URLs (419 Wayback). Changed 22 citation metadata.

IABot DB

  • Checked and updated 677 unique links which will propagate to 300+ wikis

 Done -- GreenC 05:03, 5 September 2024 (UTC)

voices.washingtonpost.com

Articles appear to be unavailable. All links currently redirect to the Washington Post landing page; many of them already have archive URLs but a significant minority do not. 1874 pages. Helpful Raccoon (talk) 23:36, 1 September 2024 (UTC)

Enwiki

  • Checked 1,876 pages and edited 1,758 pages. Added 26 {{dead link}}. Switched 294 |url-status=live to dead. Added 2,050 archive URLs (1,750 Wayback). Changed 56 citation metadata.

IABot DB

  • Checked and updated about 2,500 unique links which will propagate to 300+ wikis

 Done -- GreenC 01:36, 6 September 2024 (UTC)

Bug: square archives

There was a bug in the core code, introduced 22 projects ago. All archive URLs with a square link were skipped. Thus something like [https://web.archive.org/web/20240101/https://example.com Example.com] was not processed. Following is the list of projects (internal code). I may or may not redo them as time allows.

  • urlchanger_www03ibmcom.nim
  • urlchanger_tsfi.nim
  • urlchanger_ieee.nim
  • urlchanger_ftpibmcom.nim
  • urlchanger_fhwadotgov.nim
  • urlchanger_wileycomstore.nim
  • urlchanger_msnbcmsncom.nim
  • urlchanger_hpvectorcojp.nim
  • urlchanger_bcsportshalloffame.nim
  • urlchanger_nbcnewscom.nim
  • urlchanger_gameinformer.nim
  • urlchanger_cbccastory.nim
  • urlchanger_ukbusinessinsidercom.nim
  • urlchanger_cbsnewsnumeric.nim
  • urlchanger_slatemsncom.nim
  • urlchanger_articlescnncom.nim
  • urlchanger_prwebcom.nim
  • urlchanger_articleslatimescom.nim
  • urlchanger_emporis3.nim
  • urlchanger_cbsnewsstories.nim
  • urlchanger_cartoonnetwork.nim
  • urlchanger_businessinsidercomau.nim

-- GreenC 22:04, 2 September 2024 (UTC)

These are not big numbers. In the 10,333 URLs for cbc.ca/redirects above, there were 60 instances of square archives. I'll re-run a couple of the larger projects. -- GreenC 02:40, 3 September 2024 (UTC)
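
To make the skipped case concrete, the following sketch finds labelled square links whose target is a Wayback Machine URL. It is an illustration only and unrelated to the internal Nim code; the name `find_square_archives` and the exact pattern are assumptions, and bare square links without a label are deliberately not matched.

```python
import re

# [https://web.archive.org/web/<timestamp>/<original URL> <label>]
SQUARE_ARCHIVE = re.compile(
    r"\[https?://web\.archive\.org/web/(\d{4,14})[a-z_]*/([^\s\]]+)\s+([^\]]*)\]"
)

def find_square_archives(wikitext: str) -> list:
    """Return (original_url, timestamp, label) for each archived square link."""
    return [
        (m.group(2), m.group(1), m.group(3))
        for m in SQUARE_ARCHIVE.finditer(wikitext)
    ]

sample = (
    "See [https://web.archive.org/web/20240101/https://example.com Example.com] "
    "and [https://example.org Other]."
)
print(find_square_archives(sample))
```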

arxiv.org mirror shut down

arxiv.org has several mirrors but these will be shut down on 2024-09-15

Only one mirror has a domain name that arxiv.org does not control x x x.lanl.gov from US Los Alamos National Laboratory (remove spaces between x's)

Can we get all links of the format https://x x x .lanl.gov/{path} changed to https://arxiv.org/{path} ?

All links from cn.arxiv.org, de.arxiv.org, lanl.arxiv.org and in.arxiv.org will be rerouted via DNS changes to arxiv.org and continue to work correctly. URLs with those hostnames could also be updated but it is unnecessary. Brian Caruso (talk) 14:37, 5 September 2024 (UTC)
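
The requested change is a plain host swap that keeps the path. A minimal sketch, assuming the mirror hostnames listed in this request and a made-up function name `to_arxiv` (the paths in the usage lines are placeholders):

```python
from urllib.parse import urlsplit, urlunsplit

# Hostnames taken from the request above.
MIRROR_HOSTS = {
    "xxx.lanl.gov",
    "cn.arxiv.org",
    "de.arxiv.org",
    "lanl.arxiv.org",
    "in.arxiv.org",
}

def to_arxiv(url: str) -> str:
    """Rewrite a mirror URL to https://arxiv.org/{path}; leave other URLs alone."""
    parts = urlsplit(url)
    if parts.hostname not in MIRROR_HOSTS:
        return url
    return urlunsplit(("https", "arxiv.org", parts.path, parts.query, parts.fragment))

print(to_arxiv("http://xxx.lanl.gov/abs/hep-th/9901001"))
print(to_arxiv("https://example.org/abs/hep-th/9901001"))
```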

There's no xxx.lanl.gov link anywhere on Wikipedia ([2]). For cn. de. lanl. and in.arxiv, there's about 27 links ([3]), which I will shortly update. Headbomb {t · c · p · b} 15:01, 5 September 2024 (UTC)
 Done Headbomb {t · c · p · b} 15:09, 5 September 2024 (UTC)

articles.nydailynews.com

This site is down, but links can be converted to live subpages of nydailynews.com if the article title is known: Currently, links are of the form articles.nydailynews.com/[yyyy]-[mm]-[dd]/[section]/[junk]. These articles are available at URLs of the form www.nydailynews.com/[yyyy]/[mm]/[dd]/[title]/, where the title is in all lowercase, punctuation is stripped, and spaces are replaced by hyphens. Not sure about edge cases. Note that the hyphens in the dates must be replaced with slashes.

The title can be extracted if an archived version of the articles.nydailynews.com article is available; alternatively, |title= data can be used, but it does not always correspond to the actual article title due to human error. Here is an example of an archived page. Thank you! 1381 pages. Helpful Raccoon (talk) 00:30, 2 September 2024 (UTC)

Turns out that the article dates sometimes change around too... (8/13 vs 8/14 in the following example)
Old article example: http://articles.nydailynews.com/2012-08-14/news/33187461_1_giants-weatherford-giants-and-jets-metlife-stadium (from New York Giants)
New article example: https://www.nydailynews.com/2012/08/13/weatherford-beating-jets-is-pretty-sweet/ Helpful Raccoon (talk) 02:09, 2 September 2024 (UTC)
Some articles also might just be lost; e.g. I couldn't find a live version of https://web.archive.org/web/20121028212158/http://articles.nydailynews.com/2012-05-15/news/31714265_1_john-mayer-spotlight-interviews. Helpful Raccoon (talk) 02:13, 2 September 2024 (UTC)
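
The conversion described in this request, including the one-day date drift, might be sketched like this. The names `slugify` and `candidate_urls` are made up, the one-day window in both directions is an assumption (only a one-day-earlier case is shown above), and the title in the usage line is a guess chosen to reproduce the slug of the live example, not the verified headline. Each candidate would still need a live check.

```python
import re
from datetime import date, timedelta

OLD_URL = re.compile(r"^https?://articles\.nydailynews\.com/(\d{4})-(\d{2})-(\d{2})/")

def slugify(title: str) -> str:
    """All lowercase, punctuation stripped, spaces replaced by hyphens."""
    text = re.sub(r"[^a-z0-9\s-]", "", title.lower())
    return re.sub(r"[\s-]+", "-", text).strip("-")

def candidate_urls(old_url: str, title: str) -> list:
    """Return possible live URLs for the same day, the day before and the day after."""
    match = OLD_URL.match(old_url)
    if match is None:
        return []
    day = date(int(match[1]), int(match[2]), int(match[3]))
    slug = slugify(title)
    days = (day, day - timedelta(days=1), day + timedelta(days=1))
    return [f"https://www.nydailynews.com/{d:%Y/%m/%d}/{slug}/" for d in days]

old = ("http://articles.nydailynews.com/2012-08-14/news/"
       "33187461_1_giants-weatherford-giants-and-jets-metlife-stadium")
for url in candidate_urls(old, "Weatherford: Beating Jets Is Pretty Sweet"):
    print(url)
```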

User:Helpful Raccoon: I can do this (with a ~10 to 20% miss rate), but, adding archives may be better than converting to live because the live appears to be paywalled (example). Granted the paywall is "low" ie. one can view source to read the content; Or save the page at Wayback which removes the paywall (example) .. let me know what you think. My estimate is treat them all as dead and add archives. -- GreenC 05:15, 2 September 2024 (UTC)

I don't have a preference honestly. Archiving is at least a simpler solution than converting to live URLs. Helpful Raccoon (talk) 05:26, 2 September 2024 (UTC)
I can do it, only what is best for Wikipedia. The URLs are almost identical to Wikipedia:Link_rot/URL_change_requests#articles.cnn.com and the conversion of title to URL is basically the same as Wikipedia:Link_rot/URL_change_requests#cbsnews.com/stories. In the past, I have usually leaned towards archives over live when there is a paywall, to make verification easier. Sometimes a website will deny archive access, at which point the URLs become inaccessible (dead at the site and no archives), until they are converted to the live version. Pros and cons, reactive and proactive. -- GreenC 05:59, 2 September 2024 (UTC)
Due to the paywall I simply converted them to archives. If the situation changes I can redo to the live link method. -- GreenC 16:45, 6 September 2024 (UTC)

Enwiki

  • Checked 1,380 pages and edited 1,072 pages. Added 250 {{dead link}}. Switched 146 |url-status=live to dead. Added 920 archive URLs (368 Wayback).

IABot DB

  • Checked and updated about 2,000 unique links which will propagate across 300+ wikis

 Done -- GreenC 23:05, 6 September 2024 (UTC)

weeklystandard.com

Dead website for a defunct magazine, The Weekly Standard. 989 pages. Helpful Raccoon (talk) 04:48, 2 September 2024 (UTC)

Enwiki

  • Checked 990 pages and edited 744 pages. Added 31 {{dead link}}. Switched 141 |url-status=live to dead. Added 707 archive URLs (618 Wayback). Changed 40 citation metadata.

IABot DB

  • Checked and updated 1,591 links which will propagate to 300+ wikis

 Done --GreenC 02:42, 7 September 2024 (UTC)

archive.fortune.com

This site is dead and no articles are available on fortune.com, but they are currently available on CNN for some reason. "archive.fortune.com" just needs to be replaced with "money.cnn.com" in all URLs. Example dead URL from Apple Inc.: http://archive.fortune.com/magazines/fortune/fortune_archive/2007/03/19/8402321/index.htm. Equivalent live URL: https://money.cnn.com/magazines/fortune/fortune_archive/2007/03/19/8402321/index.htm. 569 pages. Helpful Raccoon (talk) 05:22, 2 September 2024 (UTC)

Enwiki

  • Checked 569 pages and edited 561 pages. Moved 616 links to a new URL. Removed 3 {{dead link}}. Added 2 {{dead link}}. Switched 49 |url-status=dead to live. Switched 2 |url-status=live to dead. Added 16 archive URLs (10 Wayback). Changed 1 citation metadata.

IABot DB

  • Checked and updated 799 links which will propagate to 300+ wikis

 Done -- GreenC 02:45, 7 September 2024 (UTC)

GreenC (talk · contribs), this edit changed the URL to https://www.cnn.com/business which seems to be incorrect. Would you take a look? Thank you. Cunard (talk) 08:28, 7 September 2024 (UTC)
I found 6 other edits that added the cnn.com/business URL (plus one that was made by a different user): [4] [5] [6] [7] [8] [9]. Helpful Raccoon (talk) 09:40, 7 September 2024 (UTC)
That's a soft-404, and I can clearly see it in the logs. I was trying to do too much and skipping steps. I'll roll back those edits and redo the pages, with the soft-404 trap enabled. There are no others (that I can see in the logs). Thanks for the notification. -- GreenC 15:56, 7 September 2024 (UTC)
Fixed eg. [10] -- GreenC 19:20, 7 September 2024 (UTC)
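
The host swap from this request and the kind of check a soft-404 trap performs can be sketched as follows. This is an illustration, not the tool's actual logic: `swap_host`, `looks_like_soft_404` and the `LANDING_PATHS` set are made up here, and the final URL is assumed to come from following redirects elsewhere.

```python
from urllib.parse import urlsplit

# Assumed landing pages that signal the article itself was not served.
LANDING_PATHS = {"", "/business"}

def swap_host(url: str) -> str:
    """Replace archive.fortune.com with money.cnn.com, keeping the path."""
    parts = urlsplit(url)
    if parts.hostname != "archive.fortune.com":
        return url
    query = f"?{parts.query}" if parts.query else ""
    return f"https://money.cnn.com{parts.path}{query}"

def looks_like_soft_404(requested: str, final: str) -> bool:
    """True when a request for an article ended up on a generic landing page."""
    requested_path = urlsplit(requested).path.rstrip("/")
    final_path = urlsplit(final).path.rstrip("/")
    return requested_path != final_path and final_path in LANDING_PATHS

new_url = swap_host(
    "http://archive.fortune.com/magazines/fortune/fortune_archive/2007/03/19/8402321/index.htm"
)
print(new_url)
print(looks_like_soft_404(new_url, "https://www.cnn.com/business"))
```

A link flagged this way would be treated as dead and given an archive URL rather than being moved to the landing page.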

sportsillustrated.cnn.com

Dead domain, along with subdomains such as vault.sportsillustrated.cnn.com. I could not find any live versions of the articles. 9,555 pages. Helpful Raccoon (talk) 19:16, 2 September 2024 (UTC)

Wait, some articles are live on si.com. I'm working on possible conversion rules. Helpful Raccoon (talk) 04:51, 7 September 2024 (UTC)
What I have so far: Articles may be live if they are in the vault (vault.sportsillustrated.cnn.com or sportsillustrated.cnn.com/vault) or were published after 2008 or so.
New vault URLs are of the form https://vault.si.com/vault/YYYY/MM/DD/name-of-article. Unfortunately the original URL does not contain the date of publication, but this can be extracted from the reference template or an archived version of the original article. Original article example from Baseball: http://sportsillustrated.cnn.com/vault/article/magazine/MAG1188950/index.htm. Converted: https://vault.si.com/vault/2011/08/08/its-all-about-anticipation.
Articles published around 2013-2014 are typically of the form sportsillustrated.cnn.com/<section>/news/YYYYMMDD/name-of-article/.... The converted article is of the form si.com/<new section>/YYYY/MM/DD/name-of-article. ".ap" and any other junk should be stripped from the end of the original URL. The date sometimes changes by 1 day. Old URL example from Condoleezza Rice: http://sportsillustrated.cnn.com/college-football/news/20131016/condoleezza-rice-college-football-playoff/index.html. New URL: https://www.si.com/college/2013/10/17/condoleezza-rice-college-football-playoff.
In many cases the section is unchanged during the conversion, but there are some special cases. Section conversion rules that I've found: college-football to college, college-basketball to basketball, -olympics to olympics.
Articles published between 2009-2013 are usually live, but the conversion rules can be difficult. I will get to those later. Helpful Raccoon (talk) 05:29, 7 September 2024 (UTC)
Hi, I appreciate the discoveries you made. The above is a programmer's nightmare: "around 2013-2014 are typically" etc.. etc.. I have limits, this is one. The above is probably 30-50 hours (3-5 days) given the number of links, and the likely amount of novel code and testing involved. It seems easy, but is not. It's not in my budget, sorry. In the meantime I can convert to archives, and if someone wants to do these conversions, send me the table of old and new and I will add them to wiki with appropriate template support. -- GreenC 06:28, 7 September 2024 (UTC)
Thanks for the feedback. I will try to be more precise when requesting conversions and take into account potential difficulties. Helpful Raccoon (talk) 09:04, 7 September 2024 (UTC)
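
For anyone who wants to build the table of old and new URLs, the 2013-2014 rule described above could start from a sketch like this. It was not run as part of this request. The name `si_candidates` is made up, only the two unambiguous section rules from the discussion are included, only a trailing ".ap" is stripped, and the one-day window in both directions is an assumption. Every candidate would need a live check.

```python
import re
from datetime import date, timedelta

# Section renames reported in the discussion; other sections are kept as-is.
SECTION_MAP = {"college-football": "college", "college-basketball": "basketball"}

OLD_URL = re.compile(
    r"^https?://sportsillustrated\.cnn\.com/([a-z0-9-]+)/news/(\d{4})(\d{2})(\d{2})/([^/]+)"
)

def si_candidates(old_url: str) -> list:
    """Return possible si.com URLs for the published day and one day either side."""
    match = OLD_URL.match(old_url)
    if match is None:
        return []
    section = SECTION_MAP.get(match[1], match[1])
    slug = re.sub(r"\.ap$", "", match[5])
    day = date(int(match[2]), int(match[3]), int(match[4]))
    days = (day, day + timedelta(days=1), day - timedelta(days=1))
    return [f"https://www.si.com/{section}/{d:%Y/%m/%d}/{slug}" for d in days]

old = ("http://sportsillustrated.cnn.com/college-football/news/20131016/"
       "condoleezza-rice-college-football-playoff/index.html")
for url in si_candidates(old):
    print(url)
```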

Enwiki

  • Checked 9,562 pages and edited 7,024 pages. Removed 3 {{dead link}}. Added 237 {{dead link}}. Switched 1,079 |url-status=live to dead. Added 8,136 archive URLs (6,462 Wayback). Changed 5 citation metadata.

IABot DB

  • Checked and updated about 15,000 unique links which will propagate to 300+ wikis

 Done -- GreenC 15:03, 8 September 2024 (UTC)

articles.chicagotribune.com

Articles in this domain currently redirect to a 404 page. However, most can be converted into live URLs using the same method for articles.nydailynews.com. Example dead URL from Barack Obama: http://articles.chicagotribune.com/2009-03-22/features/0903200725_1_barack-obama-story-chicago-school-harvard-law. Converted live URL: https://www.chicagotribune.com/2009/03/22/ivory-tower-of-power/. 12,469 pages. Helpful Raccoon (talk) 18:46, 2 September 2024 (UTC)

When doing free-form title string conversions such as this, expect that 10% to 20% won't convert for various reasons, mainly because variable title strings don't match or the page is legitimately no longer available at the site. -- GreenC 17:16, 8 September 2024 (UTC)

Enwiki in two batches:

  • Batch 1: Checked 3,000 pages and edited 2,937 pages. Moved 3,558 links to a new URL. Resolved 336 ghost redirects. Resolved 44 soft-404s. Removed 2 {{dead link}}. Added 11 {{dead link}}. Switched 202 |url-status=dead to live. Switched 27 |url-status=live to dead. Added 446 archive URLs (383 Wayback). Changed 36 citation metadata.
  • Batch 2: Checked 9,500 pages and edited 9,269 pages. Moved 11,141 links to a new URL. Resolved 818 ghost redirects. Resolved 115 soft-404s. Removed 3 {{dead link}}. Added 63 {{dead link}}. Switched 639 |url-status=dead to live. Switched 122 |url-status=live to dead. Added 1,418 archive URLs (1,301 Wayback). Changed 122 citation metadata.

IABot DB:

  • Checked and updated 18,045 unique links which will propagate to 300+ wikis

 Done -- GreenC 23:43, 10 September 2024 (UTC)

blogs.cnn.com

Defunct domain that used various subdomains, such as news.blogs.cnn.com and thechart.blogs.cnn.com. These subdomains all give 410 errors. Most articles do not appear to be live at the main cnn.com domain, although I did find one: http://geekout.blogs.cnn.com/2012/04/11/stan-lee-launches-his-own-comic-convention/ from Stan Lee is live at https://www.cnn.com/2012/04/11/living/stan-lee-launches-his-own-comic-convention. It might be best to just mark all as dead. 2,524 pages. Helpful Raccoon (talk) 19:32, 2 September 2024 (UTC)

Enwiki

  • Checked 2,527 pages and edited 1,665 pages. Added 47 {{dead link}}. Switched 442 |url-status=live to dead. Added 1,660 archive URLs (1,353 Wayback). Changed 5 citation metadata.

IABot DB

  • Checked and updated about 4,000 unique URLs which will propagate to 300+ wikis

 Done -- GreenC 22:51, 11 September 2024 (UTC)

xyz.reuters.com

uk.reuters.com, ca.reuters.com: Some (but not all) subpages in these domains are soft redirects which can be converted to live URLs by replacing the domain with just "reuters.com", no subdomain. E.g. http://uk.reuters.com/article/wtMostRead/idUKTRE50318U20090104 in Matt Smith can be converted to http://www.reuters.com/article/wtMostRead/idUKTRE50318U20090104.

However, this conversion often leads to an unrelated article for some reason. For example, the URL http://uk.reuters.com/article/idUKN1420378520061215 from Korn would get converted to http://reuters.com/article/idUKN1420378520061215, which is a different article. The original article in this case appears to be completely dead. Either the title at the converted URL needs to be extracted to see if the article is correct, or else no conversion should happen at all.

in.reuters.com: Some links are soft redirects which can be converted like the above. This occurs when the URL contains keywords before "idINIndia", e.g. http://in.reuters.com/article/film-treysongz-idINDEE9010B720130102 in Trey Songz. Other links are either completely dead or soft redirects with unpredictable conversion rules, e.g. http://in.reuters.com/article/idINIndia-54075420110111 in Ricky Ponting. In this case, there are live URLs, e.g. https://www.reuters.com/article/sports/ponting-should-focus-on-batting-wessels-idUSTRE70A2EU/, but I can't find a way to get the correct "id" at the end of the URL.

uk.reuters.com: 6,893 pages.

ca.reuters.com: 572 pages.

in.reuters.com: 2,588 pages.

Helpful Raccoon (talk) 01:29, 3 September 2024 (UTC)
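
For illustration, the host swap described above could be sketched as follows. This is a minimal sketch, not the bot's actual code: the function name `candidate_reuters_url` is hypothetical, and using `www.reuters.com` as the target host is an assumption taken from the Matt Smith example. Because the swap can land on an unrelated article (as with the Korn example), the result is only a candidate that would still need its title checked against the citation.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Regional hosts named in the request above.
REGIONAL_HOSTS = {"uk.reuters.com", "ca.reuters.com", "in.reuters.com"}


def candidate_reuters_url(url: str) -> Optional[str]:
    """Return a candidate www.reuters.com URL, or None for other hosts.

    Hypothetical helper: it only rewrites the host. It does not fetch the
    page, so it cannot tell whether the target is the same article.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in REGIONAL_HOSTS:
        return None
    # Keep scheme, path, query and fragment; replace only the host.
    return urlunsplit(
        (parts.scheme, "www.reuters.com", parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(candidate_reuters_url(
        "http://uk.reuters.com/article/wtMostRead/idUKTRE50318U20090104"
    ))
```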

Due to the unreliability of the above simple conversion rules, I'd say archiving everything is best unless there's a feasible workaround. Helpful Raccoon (talk) 09:18, 7 September 2024 (UTC)
I agree. It's also probably dangerous to convert these because in the future they might recycle IDs as they appear to have done already. Considering how long those IDs are, you'd think they would remain unique until the end of time, but they seem to be reused for different articles based on the host name. It's a yellow flag about their system. Might be a consequence of how the site grew geographically over time. -- GreenC 05:32, 8 September 2024 (UTC)

Enwiki

  • Checked 9,575 pages and edited 8,473 pages. Converted 1 template. Added 748 {{dead link}}. Switched 2,124 |url-status=live to dead. Added 7,922 archive URLs (6,815 Wayback). Changed 272 citation metadata.

IABot DB

  • Checked 22,154 links and updated 15,124 which will propagate to 300+ wikis

 Done -- GreenC 15:29, 13 September 2024 (UTC)

xroads.virginia.edu

A defunct project where all subpages return 404 errors; https://xroads.virginia.edu/ recommends using Wayback as one option. 562 pages. Helpful Raccoon (talk) 18:38, 3 September 2024 (UTC)

Enwiki

  • Checked 576 pages and edited 385 pages. Added 5 {{dead link}}. Switched 24 |url-status=live to dead. Added 421 archive URLs (385 Wayback).

IABot DB

  • Checked and updated 789 links which will propagate to 300+ wikis.

 Done -- GreenC 18:01, 13 September 2024 (UTC)

au.af.mil

This domain and all subdomains are dead. Unable to find live versions of the sources. 692 pages. Helpful Raccoon (talk) 18:47, 3 September 2024 (UTC)

Enwiki

  • Checked 693 pages and edited 437 pages. Added 26 {{dead link}}. Switched 38 |url-status=live to dead. Added 444 archive URLs (424 Wayback).

IABot DB

  • Checked and updated 819 links which propagate to 300+ wikis

 Done -- GreenC 00:51, 14 September 2024 (UTC)

theweeklystandard.com

Dead. 9 pages. -- GreenC 17:59, 7 September 2024 (UTC)

 Already done -- GreenC 00:53, 14 September 2024 (UTC)

nhl.com/gamecenter

NHL Gamecenter changed their URLs sometime in 2019–2020. The old addresses all have the format:
http://www.nhl.com/gamecenter/en/recap?id=
followed by a number.
If you use:
https://www.nhl.com/gamecenter/
followed by the number you get redirected to the new page.
As an example if the old URL is:
http://www.nhl.com/gamecenter/en/recap?id=2003030411
and you use the URL:
http://www.nhl.com/gamecenter/2003030411
you will be redirected to:
https://www.nhl.com/gamecenter/cgy-vs-tbl/2004/05/25/2003030411.
There appear to be a few hundred URLs that need updating. -- LCU ActivelyDisinterested «@» °∆t° 19:31, 7 September 2024 (UTC)
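
The first step of the rewrite described above (old recap URL to the short redirecting URL) could be sketched like this. It is a minimal sketch under stated assumptions: the function name `nhl_redirect_url` and the strict pattern are hypothetical, and the code does not follow the redirect, so the final team-and-date URL would still have to be obtained by requesting the returned address.

```python
import re
from typing import Optional

# Old format from the request: .../gamecenter/en/recap?id=<number>
_OLD_RECAP = re.compile(
    r"^https?://www\.nhl\.com/gamecenter/en/recap\?id=(\d+)$"
)


def nhl_redirect_url(old_url: str) -> Optional[str]:
    """Build the short gamecenter URL that redirects to the new page.

    Hypothetical helper: returns None when the URL does not match the
    old recap format exactly.
    """
    match = _OLD_RECAP.match(old_url.strip())
    if match is None:
        return None
    return "https://www.nhl.com/gamecenter/" + match.group(1)


if __name__ == "__main__":
    print(nhl_redirect_url(
        "http://www.nhl.com/gamecenter/en/recap?id=2003030411"
    ))
```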

I would expect that they will delete the redirects at some point, so it would be good to update to the new URL if that's possible. I don't know how difficult that will be given you have to call the redirect to get the final URL. -- LCU ActivelyDisinterested «@» °∆t° 19:37, 7 September 2024 (UTC)
Not difficult! -- GreenC 05:04, 8 September 2024 (UTC)
Brilliant thanks GreenC. -- LCU ActivelyDisinterested «@» °∆t° 09:17, 8 September 2024 (UTC)
User:ActivelyDisinterested, here you go: over 20,000 URLs changed in 682 pages (beginning upload now). -- GreenC 15:08, 14 September 2024 (UTC)
I had not realised the scale, fantastic work! -- LCU ActivelyDisinterested «@» °∆t° 16:35, 14 September 2024 (UTC)
The "season" pages have most of it. Like Special:Diff/1235871408/1245695508, where it changed "ott-vs-buf" -> "buf-vs-ott" - they redid in alphabetical order and thankfully created redirects. I'm also doing the IABot database, but IABot does not support URL moves, so unfortunately all these (without working redirects) will be considered dead links with archives added. It will only affect the non-Enwiki wikis. -- GreenC 18:07, 14 September 2024 (UTC)

Enwiki

  • Checked 980 pages and edited 682 pages. Moved 20,149 links to a new URL. Resolved 58 ghost redirects. Resolved 344 soft-404s. Removed 2 {{dead link}}. Added 33 {{dead link}}. Switched 9 |url-status=dead to live. Added 254 archive URLs (211 Wayback).

IABot DB

  • Checked 21,348 links and updated 9,298 which propagate to 300+ wikis

 Done -- GreenC 02:12, 15 September 2024 (UTC)

pubmedcentral.nih.gov

These ([11]) links all seem dead or redirecting to a 404. Replacing them with the form below, however, seems to make them work:

http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1380757
to
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1380757/
Jonatan Svensson Glad (talk) 19:46, 8 September 2024 (UTC)
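
The mapping in the example above could be sketched as follows. This is only an illustration: the function name `pmc_url` is hypothetical, and the sketch assumes every old link carries a numeric `artid` query parameter on `articlerender.fcgi`, which is all the example shows. Anything else is returned as None rather than guessed.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hosts accepted by this sketch; the bare host is an assumption.
_OLD_HOSTS = {"www.pubmedcentral.nih.gov", "pubmedcentral.nih.gov"}


def pmc_url(old_url: str) -> Optional[str]:
    """Map an old articlerender.fcgi?artid=N link to the PMC article URL.

    Hypothetical helper: returns None for any URL it does not recognise.
    """
    parts = urlsplit(old_url.strip())
    if (parts.hostname or "").lower() not in _OLD_HOSTS:
        return None
    if not parts.path.endswith("/articlerender.fcgi"):
        return None
    # parse_qs maps each key to a list of values; take the first artid.
    artid = parse_qs(parts.query).get("artid", [""])[0]
    if not (artid.isascii() and artid.isdigit()):
        return None
    return f"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC{artid}/"


if __name__ == "__main__":
    print(pmc_url(
        "http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1380757"
    ))
```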

Enwiki

  • Checked 180 pages and edited 180 pages. Moved 193 links to a new URL. Removed 2 {{dead link}}. Added 3 {{dead link}}. Added 3 archive URLs (3 Wayback).

IABot DB

  • Checked and updated 1,204 links which propagate to 300+ wikis

 Done -- GreenC 14:24, 15 September 2024 (UTC)

magxone.com

This website is dead. The current website gives a virus warning on my computer. Kaltenmeyer (talk) 03:50, 29 August 2024 (UTC)

12 pages

 Done: WP:JUDI batch #17 -- GreenC 04:31, 19 September 2024 (UTC)

www.lindsaygibsonpsyd.com

The official website listed for Lindsay Gibson redirects to watermillrestaurant.com, which appears to be the website for an Indonesian casino. I couldn't find an official dedicated website for Lindsay Gibson, and I'm not sure one currently exists. The best I could find were author pages on various other websites, none of which seem to serve as an official site. BlueEditorials (talk) 03:18, 31 August 2024 (UTC)

 Done: WP:JUDI batch #17 -- GreenC 04:31, 19 September 2024 (UTC)

yemenileopard.org

Has been usurped by an advert for a mobile game. Big Blue Cray(fish) Twins (talk) 22:38, 7 September 2024 (UTC)

 Done: WP:JUDI batch #17 -- GreenC 04:30, 19 September 2024 (UTC)

Various articles.<domain>.com subdomains for Tribune Publishing sites

Other publications owned by Tribune Publishing besides New York Daily News and Chicago Tribune have defunct subdomains of the form articles.<domain>.com that can be transformed to live links in the same way as articles.nydailynews.com and articles.chicagotribune.com. Paywalled sites can be archived instead of converted.

articles.courant.com (No paywall, 1,594 pages)

articles.sun-sentinel.com (Paywall, 3,369 pages)

articles.dailypress.com (No paywall, 565 pages)

articles.orlandosentinel.com (No paywall, 3,544 pages)

articles.mcall.com (No paywall, 1,170 pages)

The Virginian-Pilot does not have any links of the form "articles.pilotonline.com", presumably because Tribune Publishing acquired it very recently. Helpful Raccoon (talk) 16:51, 10 September 2024 (UTC)

User:Helpful_Raccoon, nice find. May I ask, would you redo the request so each has its own section? Thus 5 sections. It's because the edit summary links to a domain name, which I need to run one at a time. You can delete this comment. -- GreenC 19:46, 10 September 2024 (UTC)
Will do! Helpful Raccoon (talk) 21:38, 10 September 2024 (UTC)

articles.courant.com

Should be converted in the same way as articles.nydailynews.com and articles.chicagotribune.com.

(No paywall, 1,594 pages) Helpful Raccoon (talk) 21:40, 10 September 2024 (UTC)

Enwiki

  • Checked 1,596 pages and edited 1,547 pages. Moved 1,855 links to a new URL. Resolved 28 ghost redirects. Resolved 7 soft-404s. Removed 5 {{dead link}}. Added 11 {{dead link}}. Switched 239 |url-status=dead to live. Switched 17 |url-status=live to dead. Added 251 archive URLs (205 Wayback). Changed 19 citation metadata.

IABot DB

  • Checked and updated 2,360 unique links which propagate to 300+ wikis

 Done -- GreenC 00:10, 16 September 2024 (UTC)

articles.sun-sentinel.com

Can be converted in the same way as articles.nydailynews.com and articles.chicagotribune.com, but has a paywall, so archiving is probably best.

(Paywall, 3,369 pages) Helpful Raccoon (talk) 21:41, 10 September 2024 (UTC)

Enwiki

  • Checked 3,370 pages and edited 2,292 pages. Added 59 {{dead link}}. Switched 392 |url-status=live to dead. Added 2,705 archive URLs (2,482 Wayback). Changed 24 citation metadata.

IABot DB

  • Checked and updated 5,100 links which will propagate to 300+ wikis

 Done -- GreenC 18:30, 16 September 2024 (UTC)

articles.dailypress.com

Should be converted in the same way as articles.nydailynews.com and articles.chicagotribune.com.

(No paywall, 565 pages) Helpful Raccoon (talk) 21:42, 10 September 2024 (UTC)

Enwiki

  • Checked 564 pages and edited 538 pages. Moved 627 links to a new URL. Resolved 16 ghost redirects. Resolved 6 soft-404s. Removed 3 {{dead link}}. Added 6 {{dead link}}. Switched 172 |url-status=dead to live. Switched 2 |url-status=live to dead. Added 76 archive URLs (60 Wayback). Changed 8 citation metadata.

IABot DB:

  • Checked and updated 783 unique links which will propagate to 300+ wikis

 Done -- GreenC 20:36, 23 September 2024 (UTC)

articles.orlandosentinel.com

Should be converted in the same way as articles.nydailynews.com and articles.chicagotribune.com.

(No paywall, 3,544 pages) Helpful Raccoon (talk) 21:43, 10 September 2024 (UTC)

Enwiki

  • Checked 3,541 pages and edited 3,431 pages. Moved 4,156 links to a new URL. Resolved 80 ghost redirects. Resolved 24 soft-404s. Removed 1 {{dead link}}. Added 67 {{dead link}}. Switched 503 |url-status=dead to live. Switched 49 |url-status=live to dead. Added 598 archive URLs (505 Wayback). Changed 24 citation metadata.

IABot DB

  • Checked and updated about 6,000 unique links which will propagate to 300+ wikis

 Done -- GreenC 16:04, 24 September 2024 (UTC)

articles.mcall.com

Should be converted in the same way as articles.nydailynews.com and articles.chicagotribune.com.

(No paywall, 1,170 pages) Helpful Raccoon (talk) 21:43, 10 September 2024 (UTC)

Enwiki

  • Checked 1,170 pages and edited 1,148 pages. Moved 1,316 links to a new URL. Resolved 14 ghost redirects. Resolved 7 soft-404s. Added 7 {{dead link}}. Switched 176 |url-status=dead to live. Switched 73 |url-status=live to dead. Added 180 archive URLs (121 Wayback). Changed 28 citation metadata.

IABot DB

  • Checked and updated 1,722 unique links which propagate to 300+ wikis

 Done -- GreenC 20:42, 24 September 2024 (UTC)

pubs.acs.org

These ~80 articles have links which are dead. There may be more with a better search query. I believe they can be replaced in the following way:

Example

From: http://pubs.acs.org/cgi-bin/abstract.cgi/jafcau/1999/47/i05/abs/jf981170m.html
To: https://pubs.acs.org/doi/10.1021/jf981170m
--Jonatan Svensson Glad (talk) 04:50, 11 September 2024 (UTC)
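
A rough sketch of the pattern in the example above, for the one URL shape shown. The function names `acs_doi` and `acs_doi_url` are hypothetical, and the sketch assumes the last path segment (minus `.html`) is the DOI suffix under the 10.1021 prefix, as in the example. Other variations of the old URLs are not covered and return None.

```python
import re
from typing import Optional

# Matches only the shape in the example:
# /cgi-bin/abstract.cgi/<journal>/<year>/<volume>/i<issue>/abs/<id>.html
_OLD_ACS = re.compile(
    r"^https?://pubs\.acs\.org/cgi-bin/abstract\.cgi/"
    r"[^/]+/\d{4}/\d+/i\d+/abs/([a-z0-9]+)\.html$",
    re.IGNORECASE,
)


def acs_doi(old_url: str) -> Optional[str]:
    """Return the DOI implied by an old abstract.cgi URL, or None."""
    match = _OLD_ACS.match(old_url.strip())
    if match is None:
        return None
    return "10.1021/" + match.group(1)


def acs_doi_url(old_url: str) -> Optional[str]:
    """Return the pubs.acs.org/doi/ URL for an old abstract.cgi URL."""
    doi = acs_doi(old_url)
    return None if doi is None else "https://pubs.acs.org/doi/" + doi


if __name__ == "__main__":
    old = ("http://pubs.acs.org/cgi-bin/abstract.cgi/"
           "jafcau/1999/47/i05/abs/jf981170m.html")
    print(acs_doi(old))
    print(acs_doi_url(old))
```

Returning the bare DOI separately makes it usable as a |doi= value instead of a publisher URL.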

They should use (1) |doi= or (2) {{doi}} rather than a URL to the publisher's website, or simply be (3) removed altogether if there already is a proper doi link. I just took care of about half of the (3) cases. DMacks (talk) 05:12, 11 September 2024 (UTC)
There are a bunch of variations...working on it manually... DMacks (talk) 13:45, 11 September 2024 (UTC)
Mainspace all done. Zero of them were worthy of remaining as a URL at all:) DMacks (talk) 18:50, 11 September 2024 (UTC)
 Already done thank you DMacks -- GreenC 00:34, 25 September 2024 (UTC)

Usurpation: HuskerJ.com

Reporting Wikipedia:Link rot/Usurpations

Site: https://www.huskerj.com/

Linked to / cited from various Nebraska football articles, such as: Chicago Tribune Fans' Poll

PK-WIKI (talk) 17:40, 17 September 2024 (UTC)

 Done in WP:JUDI batch #18 -- GreenC 00:40, 25 September 2024 (UTC)