Help talk:Using the Wayback Machine/Archive 1

Proof Of Concept

Check this sandbox history page to see. --TIB (talk) 20:16, Aug 22, 2004 (UTC)

"Improperly formatted links"?

The "improperly formatted links" given in the article seem to work just fine! Brianjd | Why restrict HTML? | 09:52, 2005 Mar 20 (UTC)

They work fine in my system too. My browser is Opera 8.01 running on a WinXP system. Could this be browser-dependent? --The Merciful 15:13, 24 July 2005 (UTC)
They used to break; the wiki software has since been fixed so that they work properly. The link would break at a semicolon and the rest of the text was cut off. I suppose this article should be deleted, now that the software works. --TIB (talk) 00:03, August 8, 2005 (UTC)
I take that back: for some reason the first one (single version) works, but the all-versions one doesn't, because of the asterisk. Why did you confuse me? --TIB (talk) 00:06, August 8, 2005 (UTC)

I agree with the previous comment. In my browser too the so-called "incorrect" code works fine! - anon.

Templates

The "broken" links work fine for me, but to make things hopefully easier I have made a few quick templates, {{wayback}} (which links to a site's archive), and {{waybackdate}}, which links to specific instances of a site. I hope they are helpful to someone! --Fastfission 22:46, 23 August 2005 (UTC)

Colons are no problem

As you can see, this link: [1], which doesn't use any silly replacement technique for its colon (:), works just fine. Maybe this was fixed with a later release of MediaWiki? --Michiel Sikma 14:20, 31 December 2005 (UTC)

[http://web.archive.org/web/19980112230112/http://www.planet.nl/]

I wonder

I wonder if anyone is reading this talk page, but should Wikipedia even allow Wayback people to archive our site? I realize Wikipedia is licensed under the GFDL, etc., but there are many copyright-violating pages that subsequently become archived on their site, and that's not something we want to encourage. As for most search engines, they re-cache after a while (the Supplemental Index of Google being the exception), but the Wayback Machine keeps a permanent copy. -- WB 07:59, 22 January 2006 (UTC)

You said it yourself: Wikipedia is licensed under GFDL, so anybody can archive it. Period.--Eysen 02:21, 11 January 2007 (UTC)

automatic caption

I used the first template in Josiah Royce#References and encountered two bugs. First, it links to a specific version, not to the list. Second, it creates automatic anchor text which is both noninformative and not related to the page title. I could easily add text, but couldn't think of a way to delete the automatic part ("Josiah Royce", specifically). How and why does this happen? trespassers william 21:35, 7 January 2007 (UTC)

Other archiving systems

I wonder why Wikipedia dwells on using the Wayback Machine to recover broken links without even mentioning other archiving systems such as WebCite. As Archive.org is far from complete, there should be crosslinks to other articles describing how other services like WebCite can be used. Note that WebCite, in contrast to the Wayback Machine (which takes a shotgun approach using a crawler), allows editor/author-initiated prospective archiving (taking a snapshot before the website disappears). If this were done consistently by authors, made part of the Wikipedia "citing sources" policy, or handled by bots, it would avoid the problem of link rot on Wikipedia in the first place. Countless editor hours are wasted just eliminating broken links and trying to recover cited sources. --Eysen 02:21, 11 January 2007 (UTC)

Why not 'Help Using Internet Archives' ?

Indeed, it's now 13 years since this comment was made, and governments and many other institutions now host their own archives. This article ought to be more general and help users understand adding links to archives of all kinds, not just focus on the Wayback Machine. Help should be given in less specific ways, perhaps by giving examples from archives other than the WBM. The topic ought to be renamed to reflect this new direction, e.g. 'Help Using Internet Archives'. Yoga Mat (talk) 11:21, 20 July 2020 (UTC)

I've proposed a minor change in wording at Template talk:Wayback. Comments appreciated. John Broughton | Talk 20:31, 12 January 2007 (UTC)

Use with {{cite web}}

Where the {{cite web}} template is already in use, we should just add archiveurl and archivedate arguments, right? 82.36.30.34 (talk) 06:07, 10 December 2007 (UTC)
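
A minimal sketch of what that looks like, using a placeholder URL, title and dates:

   <ref>{{cite web |url=http://www.example.com/page.html |title=Example page |accessdate=2007-11-01 |archiveurl=https://web.archive.org/web/20071015000000/http://www.example.com/page.html |archivedate=2007-10-15}}</ref>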

Site parameter not being picked up

I could not get {{webarchive |url=https://web.archive.org/web/*/site= |date=* |title=Guinness Book of World Records 2005 - SCIENCE AND TECHNOLOGY << BUILDINGS }} to work correctly. The site parameter was not picked up. I had to use {{waybackdate}} instead, with an additional date parameter, which worked fine. Was using the template in a sub-section of the CN Tower article. papageno (talk) 02:01, 10 April 2008 (UTC)
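
For reference, the workaround described would have looked roughly like this, with a placeholder URL and timestamp (the parameter names are assumptions based on the description above):

   {{waybackdate |site=http://www.example.com/records2005.html |date=20050101000000}}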

Outdated

Attempted killing of this useful site

Some editors are trying to kill off this template

DG and a few other exclusionists/deletionists have basically taken over WP:EL for months now (they just wear out anyone who tries to discuss things with them). We could really use some Wikipedians with more common-sense over there..... frustrated (talk) 21:34, 6 September 2008 (UTC)

I have no idea what you're talking about - can you be more specific? --Ludwigs2 02:11, 7 September 2008 (UTC)
See this diff, which is based on the 2nd thread I linked above. (DreamGuy has multiple RfCs and an ArbCom case, all about civility, but he treads the line enough to not get blocked - see his usual rude reply to Jmabel in the 1st thread I linked above. He's the cause, but I'm just trying to get people involved in fixing this particular symptom.) frustrated (talk) 00:09, 8 September 2008 (UTC)


archivedate/accessdate

I believe the example archivedate and accessdate values given in Wikipedia:Using_the_Wayback_Machine#Cite_templates are the wrong way round. Am I right? Open4D (talk) 13:59, 13 December 2011 (UTC)

In general, people should be adding |archiveurl= and |archivedate= parameters and not adding |accessdate= as that is much less useful. When a date-stamped article also has a date-stamped archive copy, the accessdate is completely irrelevant. -- 86.144.190.83 (talk) 18:10, 9 October 2013 (UTC)
What about explaining this on the page? Timofonic (talk) 12:42, 19 July 2017 (UTC)
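
Illustrating the advice above with placeholder values:

   <ref>{{cite web |url=http://www.example.com/report.html |title=Example report |date=2013-01-15 |archiveurl=https://web.archive.org/web/20130220000000/http://www.example.com/report.html |archivedate=2013-02-20}}</ref>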

Use of Wikipedia as the example site

I believe using wikipedia.org as the example of a website archived by the Wayback Machine risks confusing readers unnecessarily, and I would propose that another site be used - e.g. un.org, disney.com, guardian.co.uk. Thoughts? Open4D (talk) 14:00, 13 December 2011 (UTC)

I agree. Any of the three websites you mentioned would be fine. Or you may choose http://www.example.com/, but it is a worse choice because it's not a commonly-known website. Feel free to make the change. —Unforgettableid (talk) 01:40, 10 October 2013 (UTC)
Why not change it? Why did this discussion end with nothing being done? Timofonic (talk) 12:44, 19 July 2017 (UTC)

The server web.archive.org doesn't work

The server http://web.archive.org/ hasn't been accessible for more than a month. Does anybody know what's the problem and when it will be accessible again? --Лъчезар 09:37, 21 September 2012 (UTC)

It's working fine for me - both that direct link (which redirects to http://archive.org/web/web.php) as well as instances of Template:Wayback and Template:cite web's archiveurl. Can you give example(s) of the exact error and location that you're encountering? —Quiddity (talk) 19:47, 21 September 2012 (UTC)
It's a DNS resolver problem. I'm in Bulgaria and my ISP is the Bulgarian Telecommunications Company. The DNS address "web.archive.org" just can't be resolved here. When I resolve it from abroad and try to access that IP address by HTTP, it gives "HTTP/1.1 302 Unknown Error" and somehow redirects back to "web.archive.org", which fails to resolve again. --Лъчезар共产主义万岁 17:50, 24 September 2012 (UTC)
They just fixed the problem, after a complaint I made yesterday. --13:47, 27 September 2012 (UTC)

For subscription only material

Should the tool be used at all for subscription only material like billboard.biz charts (now moved to billboard.com/biz)? It seems like a copyright violation of some sorts. There should probably be a word of caution if so. ⊾maine12329⊿ talkswiki 02:16, 15 June 2013 (UTC)

http://www.billboard.com/robots.txt does not disallow archiving. --  Gadget850 talk 11:29, 21 July 2013 (UTC)

Wayback API

I just learned of this: mw:Archived Pages: "The Internet Archive wants to help fix broken outlinks on Wikipedia and make citations more reliable. Are there members of the community who can help build tools to get archived pages in appropriate places? If you would like to help, please discuss, annotate this page, and/or email alexis@archive.org." - leaving the link here in case anyone else is interested or can help. –Quiddity (talk) 16:38, 31 October 2013 (UTC)

What happened to this? Any news? Timofonic (talk) 12:53, 19 July 2017 (UTC)
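
For anyone following up on this, the Internet Archive exposes a simple availability endpoint that such tools can query; to the best of my knowledge a lookup looks like this (the url and timestamp values are placeholders):

   https://archive.org/wayback/available?url=example.com/page.html&timestamp=20140101

It returns JSON describing the closest archived snapshot of the given URL, if one exists.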

HTTP Secure

Since October 24, 2013, the Internet Archive uses HTTP Secure by default. Should we reflect that by recommending the use of https:// links on this manual, or not? Please discuss at WP:VPM. --bender235 (talk) 13:14, 9 November 2013 (UTC)

I had hoped somebody would comment on this within half a year. --bender235 (talk) 09:31, 5 May 2014 (UTC)
You did direct people to WP:VPM where it looks like it was discussed (before the time that I was active here). Personally, I have no preference. --Otus scops (talk) 10:02, 5 May 2014 (UTC)

Archive problem - redirect loop possibly?

I've no idea if this is a reasonable place to ask this, or if there even is a place to ask. I've been fixing links to PrimeraHora.com and adding updated urls to the internet archive. However, I've just realised that the archived copies aren't working properly - the page initially loads OK, but then loads a second copy in a box on the right of the page and, presumably, another to the right of that, etc, until the page goes blank after about 20 seconds. I'm not sure if this is a redirect and hitting the 5 redirect limit or something else. (Firefox gives the behaviour above - IE doesn't (for me) show the extra versions of the page loading but goes straight from showing the page briefly to a blank screen.)

Some examples - an original page with an archive created today:
"A punto de revivir a los Senadores de San Juan en el béisbol invernal" (in Spanish). Primera Hora. August 5, 2010. Archived from the original on 2014-04-21. {{cite web}}: Italic or bold markup not allowed in: |publisher= (help); Unknown parameter |deadurl= ignored (|url-status= suggested) (help)

It seems to have something to do with the box on the right of the page called "Las +".

I tried using the "id_", "js_", "cs_", and "im_" options mentioned at Help:Using_the_Wayback_Machine#Specific archive copy.

https://web.archive.org/web/20140421193730id_/http://www.primerahora.com/deportes/beisbol/nota/apuntodereviviralossenadoresdesanjuanenelbeisbolinvernal-406044/

https://web.archive.org/web/20140421193730js_/http://www.primerahora.com/deportes/beisbol/nota/apuntodereviviralossenadoresdesanjuanenelbeisbolinvernal-406044/

https://web.archive.org/web/20140421193730cs_/http://www.primerahora.com/deportes/beisbol/nota/apuntodereviviralossenadoresdesanjuanenelbeisbolinvernal-406044/

https://web.archive.org/web/20140421193730im_/http://www.primerahora.com/deportes/beisbol/nota/apuntodereviviralossenadoresdesanjuanenelbeisbolinvernal-406044/

In Firefox, the "js_" option actually looks OK and seems to work (though it adds a bar at the top). In IE, it works but doesn't render very well. Is that my best option or can anyone suggest something else I can do to prevent the infinite loop / blank screen?

Alternatively, should I ask this somewhere else?

Thanks for any suggestions.--Otus scops (talk) 23:18, 21 April 2014 (UTC)

I've commented out the links because I get the impression that someone at the IA is working on it. My example link had disappeared from the IA (until I accidentally added it again). I'll reinstate them in a few days if I don't hear anything. --Otus scops (talk) 21:42, 22 April 2014 (UTC)
I guess it was an IA blip - links now point to the original archived version again. --Otus scops (talk) 20:40, 23 April 2014 (UTC)

I'm ignorant of JavaScript and I can't seem to find an answer to this by searching. I wonder if anyone knows of a way to modify these scripts so that they open in a new browser window? I would like to go to the Wayback Machine in a new window and leave the page I'm viewing open. Is this possible?

  • javascript:void(location.href='http://web.archive.org/web/*/'+document.location.href) (Search)
  • javascript:void(location.href='http://web.archive.org/save/'+document.location.href) (Save)

Thanks for any assistance anyone can provide.—D'Ranged 1 talk 16:56, 8 May 2014 (UTC)

Have you tried it? It opens in a new tab (rather than window) for me in Firefox. I don't think I changed anything to get it to work like this.--Otus scops (talk) 17:34, 8 May 2014 (UTC)
@D'Ranged 1: I've just double-checked (now that I'm on my computer) and my bookmarklet points to
  • javascript:void(window.open('http://web.archive.org/web/*/'+location.href))
The window.open might make all the difference.--Otus scops (talk) 21:36, 8 May 2014 (UTC)
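
Presumably the same change works for the Save bookmarklet too (untested sketch):

  • javascript:void(window.open('http://web.archive.org/save/'+location.href))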

FWIW, I was having a hard time using the bookmarklet on network/HTTP-related errors, so I came up with a Chrome-specific solution. I'm not sure if it's worth adding to the main page, so I'll let you decide. What it does is load the URL from Chrome's error page if it's there, otherwise fall back to location.href:

   javascript:void(location.href='http://web.archive.org/*/'+(location.href==="data:text/html,chromewebdata"&&loadTimeData.data_.summary.failedUrl||location.href))

216.162.78.214 (talk) 00:28, 18 August 2016 (UTC)

Sources

Does citing a webpage using the Wayback Machine always constitute a secondary source, even if the existing original or current version of the archived page would be considered a primary source? Technically, the Wayback Machine is secondary, as it is one party copying or quoting the material of another party. But does this create a loophole, where instead of citing the original material, one cites the Wayback Machine's version of this other party's material, in order to claim it's a secondary source? Sorry if this has been explained already, but I haven't seen it.--Wasp14 (talk) 21:37, 2 June 2014 (UTC+1)

@Wasp14: The creator of a typical secondary source would be thinking about the material, assessing it and reworking it. Examples would be newspaper journalists or book editors. The Wayback Machine is just preserving the original, without any editorial steps. So when it copies a primary source I'd say that the copy counts as a primary source too. -- John of Reading (talk) 21:01, 2 June 2014 (UTC)
(edit conflict)@Wasp14: No, if it was originally a primary source, it's still a primary source from the Wayback Machine. And if it was an unreliable source, it's still unreliable. The Wayback Machine archives anything and everything that someone has requested (unless the site blocks archiving) - there's no mechanism by which it decides the VALUE of a site or comments on it. It's just a (very) useful source of no-longer-available webpages.--Otus scops (talk) 21:23, 2 June 2014 (UTC)

Citing the Wayback Machine

Why cite the Wayback Machine as template {{Wayback}} does?

If it's important, why does this page cite Internet Archive instead (ref 2)?

--P64 (talk) 01:38, 22 July 2014 (UTC)

That template got deleted and replaced by {{Webarchive}}. I just mention it to avoid confusion, like the confusion I had while trying to add an archived version of a site that is no longer available. Timofonic (talk) 13:00, 19 July 2017 (UTC)
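
For anyone hitting the same confusion, a minimal {{webarchive}} call looks roughly like this, using a placeholder archive URL and date (parameters as used elsewhere on this page):

   {{webarchive |url=https://web.archive.org/web/20160101000000/http://www.example.com/ |date=2016-01-01 |title=Example page}}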

This article needs to talk about references which are bare URLs

This article should talk about converting a reference which is a bare url (Example: <ref>[http://www.deadlink.example.com/some_dead_link.html Some dead link]</ref>) in to a “proper” web archive link. I ended up making the bare URL a link to the wayback version of the URL in question: diff. Samboy (talk) 15:42, 7 March 2015 (UTC)
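
A sketch of that conversion, reusing the example above with a placeholder archive date (the title has to be supplied by hand when starting from a bare link):

Before:
   <ref>[http://www.deadlink.example.com/some_dead_link.html Some dead link]</ref>
After:
   <ref>{{cite web |url=http://www.deadlink.example.com/some_dead_link.html |title=Some dead link |archive-url=https://web.archive.org/web/20150101000000/http://www.deadlink.example.com/some_dead_link.html |archive-date=2015-01-01 |url-status=dead}}</ref>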

So I'm not sure if there are still bots checking for offline links - I made a post about that at the idea lab: A bot that checks for offline-links. And I'm also not sure if this is the right place to post this, rather than, say, somewhere on the Village Pump or Phabricator. But anyway, here's my idea, which is also relevant if there are no more bots checking for offline links:

What about a bot that scans wiki pages for references marked as dead links (the syntax for that is {{dead link}}), automatically retrieves the archived page from the Wayback Machine that is closest to the accessdate (where one is specified), and then either makes the edit right away or proposes it, via a comment behind the dead link, an extra dead-link-template parameter, or an external tool?

What do you think? --Fixuture (talk) 22:39, 18 March 2015 (UTC)

One problem to think about: when a web page is taken down, it's often replaced with either an error message ("Sorry, this story is no longer available") or by generic advertising. The Wayback Machine's archiving bot faithfully copies these useless pages into its archive. I think this rules out any fully-automated use of these Wayback Machine links, as only a human can decide whether the archive page is a useful reference. -- John of Reading (talk) 07:42, 19 March 2015 (UTC)
Yes, that's the reason why it would just make a comment or add another entry in a new dead-link-template parameter: it wouldn't show up on the page, but it would prompt the watchers of an article to check the link; if it's OK, they'd just need to uncomment the part or otherwise (for example via an external tool) confirm that the archive link is correct and fine. --Fixuture (talk) 18:24, 19 March 2015 (UTC)
Checking only for archives taken as close as possible to, but before, the access date should pretty much eliminate this problem.--greenrd (talk) 08:37, 3 May 2015 (UTC)
I even suggest the bot go through live links, create an archive copy if the Wayback Machine doesn't have one, and add archive-url, archive-date and url-status=live to the reference, so we're ready when the live link rots. This seems to be what InternetArchiveBot does. Numbersinstitute (talk) 19:37, 12 April 2020 (UTC)
URLs added to Wikipedia (across all language editions) are saved at the Wayback Machine within 24 hours of being added. -- GreenC 20:06, 12 April 2020 (UTC)
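
To make the proposal above concrete, here is a sketch of the kind of edit such a bot would make, with placeholder URL and dates and the archive date chosen just before the access date:

Before:
   <ref>{{cite web |url=http://www.example.com/story.html |title=Example story |access-date=2014-06-01}}</ref>{{dead link|date=March 2015}}
After:
   <ref>{{cite web |url=http://www.example.com/story.html |title=Example story |access-date=2014-06-01 |archive-url=https://web.archive.org/web/20140530000000/http://www.example.com/story.html |archive-date=2014-05-30 |url-status=dead}}</ref>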

The 14th month

This isn't right: "20131404315600" doesn't fit "YYYYMMDDhhmmss" (it would put us in the 14th month). —User 000 name 23:58, 2 May 2015 (UTC)
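
(For comparison, a well-formed value in that format would look like 20131104153600, i.e. 2013-11-04 15:36:00 UTC.)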

Why can't I save this page?

I'm trying to archive this page but when I go to https://web.archive.org/save/https://forum.quantifiedself.com/thread-my-phone-and-me I get an error that "This url is not available on the live web or can not be archived.". That's just bogus. I'm the admin of that site, and the robots.txt doesn't disallow that URL. Thousands of different URLs from the site have been archived successfully. What's up? -- Dandv 03:07, 3 October 2015 (UTC)

I'm not sure what happened, but the site just got archived as what seems to be a generic (spam) advertising site. Timofonic (talk) 13:06, 19 July 2017 (UTC)

Faulty markup in an example

The current page contains the following example:

<ref>{{<!-EXISTING REFERENCE->|archive-url=https://web.archive.org/web/20021128120000/http://www.originalurl.com|archive-date=2002-11-28|access-date={{subst:YYYYMMDD|d}}|dead-url=yes}}</ref>

That should surely be (with the proper XML comment markup):

<ref>{{<!--EXISTING REFERENCE-->|archive-url=https://web.archive.org/web/20021128120000/http://www.originalurl.com|archive-date=2002-11-28|access-date={{subst:YYYYMMDD|d}}|dead-url=yes}}</ref>

Can someone sort this out please? Best wishes. RobbieIanMorrison (talk) 17:54, 29 September 2016 (UTC)

 Done @RobbieIanMorrison: Not sure why it took so long, or why you did not do it yourself. – Allen4names (contributions) 23:58, 26 February 2017 (UTC)

nytimes.com robots exclusion standard

Hi, I've noticed that the Wayback Machine archives pages from the domain nytimes.com regularly and that I can access those pages on the Wayback Machine without getting an error message about robots.txt. While some pages archived required a login, the site overall doesn't seem to be using its robots exclusion standard. In fact, nytimes.com/robots.txt has /archives/ listed but not /archive/ which would presumably disallow the Wayback Machine from viewing. Is this standard still applicable? Thanks, Icebob99 (talk) 14:16, 5 November 2016 (UTC)