Wikipedia:Link rot/URL change requests/Archives/2022/April
This is an archive of past discussions about Wikipedia:Link rot. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current main page.
FCC ECFS system overhaul
URLs beginning in
need to be changed to begin in
https://www.fcc.gov/ecfs/file/download/
e.g. https://www.fcc.gov/ecfs/file/download/6513193001.pdf replaces https://ecfsapi.fcc.gov/file/6513193001.pdf
Links with a file name more than a link will still be broken and may need to be retrieved by searching ECFS again. Sammi Brie (she/her • t • c) 04:39, 5 April 2022 (UTC)
- @Sammi Brie: There are only 62 pages containing these links, but they are turning out to be very hard to automate. The site has bot detection and requires JavaScript to be enabled to view the PDF, which means a bot would need a headless browser. I'm going to recommend these be done manually, which will be more accurate. If it were thousands, I would try to develop a custom system to automate around the blocks with a headless browser and VPN IP skipping, but for 62 pages I am going to pass on that level of work; it is easier done manually. Below is the list of pages. -- GreenC 16:12, 11 April 2022 (UTC)
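For anyone fixing these by hand, the prefix swap requested above can be sketched as a small helper. This is only an illustration: the old prefix `https://ecfsapi.fcc.gov/file/` is an assumption inferred from the single example in the request, and the function name `rewrite_ecfs_url` is hypothetical. It does not check that the new URL actually resolves, and it does not help with the links that need to be looked up in ECFS again.

```python
# Assumption: old prefix inferred from the one example given in the request.
OLD_PREFIX = "https://ecfsapi.fcc.gov/file/"
# New prefix as stated in the request.
NEW_PREFIX = "https://www.fcc.gov/ecfs/file/download/"


def rewrite_ecfs_url(url: str) -> str:
    """Swap the old ECFS prefix for the new one; leave other URLs untouched."""
    if url.startswith(OLD_PREFIX):
        return NEW_PREFIX + url[len(OLD_PREFIX):]
    return url


if __name__ == "__main__":
    # The example pair from the request.
    print(rewrite_ecfs_url("https://ecfsapi.fcc.gov/file/6513193001.pdf"))
```

Only the string is rewritten, so each result still has to be opened in a browser to confirm that the PDF loads.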
1000-2000 dead links to astronautix.com
2100 articles reference astronautix.com, which changed its whole website structure at some point (2016-2018). Now all old links redirect to a 404 page. The articles are still online, but there is no fixed URL replacement pattern. The articles are generally archived at web.archive.org if the access date is no later than March 2016, and about 1000 articles already reference an archived version from there. Can a bot add archive links in the remaining articles? Basically this type of edit over 1000 times. Links that follow the pattern "astronautix.com/[a-z]/" were added after the restructuring and should still be available. If someone has a way to find the current page (example for the Moon landing), that's even better, but we still need an archived version as the website isn't maintained any more and might disappear at any moment. --mfb (talk) 14:03, 11 April 2022 (UTC)
- @Mfb: This is done. All the links were treated as dead, because we still need an archived version as the website isn't maintained any more and might disappear at any moment. The site is technically live but functionally dead. There are 25 citations that need to be deleted entirely because they all use the same bogus URL, which was caused by ReFill years ago. This will need to be done manually. Can you help? -- GreenC 13:24, 22 April 2022 (UTC)
- Thank you for the bot run. The 404page references are all salvageable - if the archived version is good it's easy, otherwise I can get the original URL from the reFill edit destroying it. Just needs to be done URL by URL. 1/3 done, will take care of the rest later. --mfb (talk) 03:26, 23 April 2022 (UTC)
- Great. I see what you mean. Are you finding any with no archive available? Archive.today is the second-largest, and there is also timetravel.mementoweb.org. If still none, it might be non-verifiable since the source is on-line only, unless possibly available somewhere else. Too bad Mark Wade doesn't open-source the website and dump it to GitHub. -- GreenC 13:56, 23 April 2022 (UTC)
- I found all archived at web.archive.org. In two articles direct references to 404page were added manually - in one I could find which article was meant, in the other I removed the references. Done with the 25 articles. I also found ~20 articles which used "Encyclopedia Astronautica Index: 1" as title and fixed that. Done --mfb (talk) 09:09, 24 April 2022 (UTC)
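The link test described in the original request can be sketched as follows. This is a rough illustration, not the bot's actual logic: the function names, the sample URLs, and the default timestamp are all hypothetical. The regular expression is the pattern quoted in the request, and the archive address uses the usual Wayback Machine form `https://web.archive.org/web/<timestamp>/<url>`. The bot run itself treated all links as dead, including the new-style ones.

```python
import re

# Pattern quoted in the request: links added after the restructuring.
NEW_STYLE = re.compile(r"astronautix\.com/[a-z]/")


def is_old_style(url: str) -> bool:
    """True for astronautix.com links that do not match the new-style pattern."""
    return "astronautix.com" in url and NEW_STYLE.search(url) is None


def wayback_url(url: str, timestamp: str = "20160301000000") -> str:
    """Build a Wayback Machine address for a snapshot near `timestamp`.

    The default timestamp (March 2016) is an assumption based on the
    cutoff mentioned in the request, not a known-good snapshot date.
    """
    return f"https://web.archive.org/web/{timestamp}/{url}"


if __name__ == "__main__":
    # Hypothetical sample URLs, for illustration only.
    for link in ("http://www.astronautix.com/craft/example.htm",
                 "http://www.astronautix.com/e/example.html"):
        if is_old_style(link):
            print(link, "->", wayback_url(link))
        else:
            print(link, "-> new-style link")
```

A real run would still have to confirm that a snapshot exists and shows the article rather than the 404 page, which is why the leftover citations were checked URL by URL.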