Wikipedia:List of web archives on Wikipedia

List of known web archive services in-use on English Wikipedia. Sorted roughly by number of uses from most to least. The Wayback Machine is about 80% of the total. Data initially compiled by User:GreenC as of March 2017, updates and corrections welcome.

Archive servicesEdit

Internet Archive Wayback MachineEdit

  • Article: Wayback Machine
  • Domain: archive.org, waybackmachine.org
  • Launched: 2001
  • Date range: 1996-
  • Hostname: <none>, web, wayback, liveweb, www, www.web, classic-web, web-beta, replay, replay.web, web.wayback
  • Path: <none>, web
  • Timestamp: Number 1 digit; 4–14 digits. Or "*". Or "?". Or combination. May also contain trailing chars like "re_" for (?), "if_" for frames and "im_" for images. If timestamp missing returns best available page.
  • Examples:
  • Oldest:
  • Newest:
  • Submit page:
  • Interval between manual captures: 45 minutes.

Some popular reference websites such as Tom's Hardware are excluded from archival, for which alternatives have to be used.

Archive.TodayEdit

  • Article: archive.today
  • Domain: .today, .is, .fo, .li, .vn, .md, .ph
  • Launched: 2012
  • Hostname: <none>, www
  • Path: <none>
  • Timestamp: 4–14 digits; or digits + characters (see example)
  • Examples:

Archived pages are initially served through their short URL format, an identifier with five case-sensitive alphanumerical characters and four characters on early captures from 2012. To obtain the long URL format with time stamp and the source URL, click "share" in the top menu or append "/share" to the URL. The full URL is listed in the window.

Additional restrictions

As of 2023, copies of Archive.org pages can only be saved once. This restriction applies as well to the digital library (archive.org/details/) which is subject to change, not only the Wayback Machine where pages are, besides infrequent exclusions, not subject to change anyway.

Web Citation (WebCite)Edit

  • Article: WebCite
  • Domain: webcitation.org
Deprecatedno longer accepting new archive requests; old archives no longer available
  • Hostname: <none>, www
  • Path: base62ID, query, cache, getfile.php, <number>
  • Timestamp: None. Uses &date=2012-06-01+21:40:03 in query?url ; the short ID is base62 which converts to unix time
  • Examples:

National Archives UKEdit

  • Domain: nationalarchives.gov.uk
  • Hostname: webarchive, yourarchives
  • Path: <none>
  • Timestamp: 4–14 digits
  • Examples:

Australian Web ArchiveEdit

  • Article: Australian Web Archive (Trove)
  • Domain: nla.gov.au
  • Hostname: webarchive
  • Path: see examples. First-level path can contain "awa" or "wayback". Second-level path might contain a pandora.nla.gov.au URL with a third-level destination URL which may or may not have a scheme (http://). Or the second-level path might be the final destination URL. A special archive index page URL is also available as "/tep/"
  • Timestamp: Two types (20120727-0512, 20120326012340)
  • Examples:
Note: The Australian Web Archive incorporates the Pandora archive as well as the Australian Government Web Archive and the National Library of Australia's archive of the .au domain.
Note: No memento access

NLA Australia (deprecated)Edit

Deprecated. Integrated into Australian Web Archive (Trove) above.
  • Domain: nla.gov.au
  • Hostname: pandora, trove, tep, webarchive, content.webarchive
  • Path: see examples. The /pan/ regex should be /pan/[0-9]{4,7}/
  • Timestamp: Three types (20120727-0512, S2000-Dec-5, 20120326012340)
  • Examples:
  • Note: Not to be confused with non-webarchive URLs that appear similar:
Note: No memento access

Ghost ArchiveEdit

  • Domain: ghostarchive.org
  • Launched: ~2021
  • Hostname: [none]
  • Path: archive, varchive/<YouTube_video_ID>, iarchive/<Instagram_username>[/<Instagram_post_ID (optional)>]
  • Timestamp: 4 to 14 digits
  • Examples:
Note: to convert short-form to long:
For regular web pages:
For video pages, e.g. YouTube:
To find the earliest and latest archive available use a timestamp of "1990" or "3000" e.g.
The /archive/ path long form will work for all types of archives, for example https://ghostarchive.org/archive/20210728022510/https://youtube.com/watch?v=UhCiGY75wVw will redirect to the video, and https://ghostarchive.org/archive/20210728022510/https://www.instagram.com/p/BMUccRPg4Ow/ will redirect to the image.

Megalodon.jpEdit

Using https://megalodon.jp/(full URL) (example: https://megalodon.jp/https://gstreamer.freedesktop.org:443/download/ ) can check if Megalodon archived any copy of a particular URL. http and https are treated separately.

FreezePageEdit

  • Domain: freezepage.com
  • Hostname: <none>, www
  • Path: <none>
  • Timestamp: <none> (only available via web scrape)
  • Examples:
Note: If the account ID which created the snapshot expires for lack of activity (no login to freezepage), the snapshot is deleted from freezepage.com
Note: No memento access

Library of CongressEdit

  • Domain: loc.gov
  • Hostname: webarchive
  • Path: all, lcwa####
  • Timestamp: 4–14 digits
  • Examples:

Arquivo.pt (Portugal)Edit

  • Domain: arquivo.pt
  • Hostname: <none>
  • Path: wayback, wayback/wayback, noFrame/replay
  • Timestamp: 4–14 digits ... might contain "mp_" see example
  • Examples:

Stanford University web archiveEdit

  • Domain: stanford.edu
  • Hostname: swap, sul-swap-prod
  • Path: <none>
  • Timestamp: 4–14 digits
  • Examples:

Archive-ItEdit

  • Domain: archive-it.org
  • Hostname: wayback
  • Path: all, a 4 digit number
  • Timestamp: 4–14 digits
  • Examples:

BibAlexEdit

  • Domain: bibalex.org:80
  • Hostname: web.archive, web.petabox
  • Path: web
  • Timestamp: 4–14 digits
  • Examples:

WikiWixEdit

  • Domain: wikiwix.com
  • Hostname: archive
  • Path: cache
  • Timestamp: 4–14 digits
  • Examples:
Note: Does not support https. Does not support Memento
Note: API access added in March 2018. By appending &apiresponse=1 to the end of the URL. (http://archive.wikiwix.com/cache/?url=http://www.linterweb.fr&apiresponse=1). This may require encoding of any other & in the url= section
Note: Supports &title argument at end of URL not part of the source URL (similar to &apiresponse). Gives the name of the Wikipedia article the link is being used in (optional).

National Archives USEdit

  • Domain: webharvest.gov
  • Hostname: <none>
  • Path: <variable>
  • Timestamp: 4–14 digits
  • Examples:

National Archives IcelandEdit

  • Domain: vefsafn.is
  • Hostname: wayback
  • Path: wayback
  • Timestamp: 4–14 digits
  • Examples:

Europa Archives, Ireland (deprecated)Edit

Deceased. In May 2018 all archives were moved to collections.internetmemory.org then as of September 2018, all archives were moved again to Archive-It [1]
  • Domain: europarchive.org
  • Hostname: collection
  • Path: nli
  • Timestamp: 4–14 digits
  • Move example:

Perma CCEdit

  • Domain: perma-archives.org, perma.cc
  • Hostname: <none>
  • Path: <none>, warc
  • Timestamp: 4–14 digits for perma-archives.org, or snapshot ID
  • Examples:

Proni Web Archives (deprecated)Edit

Deceased. As of October 2018, all archives moved to Archive-It [2]
  • Domain: proni.gov.uk
  • Hostname: webarchive
  • Path: <none>
  • Timestamp: 4–14 digits
  • Examples:
  • http://webarchive.proni.gov.uk/20111213123846/http
  • Move example:

Parliament UKEdit

  • Domain: parliament.uk
  • Hostname: webarchive
  • Path: <none>
  • Timestamp: 4–14 digits
  • Examples:

UK Web Archive (British Library)Edit

  • Domain: webarchive.org.uk
  • Hostname: www
  • Path: wayback/archive
  • Timestamp: 4–14 digits with possibility of "mp_" at end
  • Examples:

Libraries and Archives Canada (deprecated)Edit

Deceased. As of May 2018 all archives moved to webarchive.bac-lac.gc.ca [3]
  • Domain: collectionscanada.gc.ca
  • Hostname: www
  • Path: archivesweb, webarchives
  • Timestamp: 4–14 digits
  • Examples:
  • http://www.collectionscanada.gc.ca/webarchives/20061104084225/http://broadband.gc.ca/maps/province.html?prov=48
  • http://www.collectionscanada.gc.ca/archivesweb/20060209004933/http
  • Note: Not to be confused with other close URL variants. Only capture "/webarchives/" or "/archivesweb/"
  • Move example:

Libraries and Archives Canada (www.bac-lac.gc.ca)Edit

As of September 2022 the new website is https://library-archives.canada.ca/eng - most of the www.bac-lac.gc.ca may no longer be available but some links still work see: [4]
  • Domain: bac-lac.gc.ca:8080
  • Hostname: webarchive, www
  • Path: wayback
  • Timestamp: 4–14 digits
  • Examples:
  • Note: Formerly collectionscanada.gc.ca see above. As of September 2022, links from collectioncanada.gc.ca were entirely removed by LAC.[5]

Catalonian ArchiveEdit

  • Domain: padi.cat(:8080)?
  • Hostname: www, (none)
  • Path: wayback
  • Timestamp: 4–14 digits
  • Examples:

Web Archives SingaporeEdit

  • Domain: nlb.gov.sg
  • Hostname: eresources
  • Path: webarchives/wayback
  • Timestamp: 4–14 digits
  • Examples:
  • Note: Not to be confused with other close URL variants. Only capture "/webarchives/wayback/"

Slovenian Archives (Spletni)Edit

  • Domain: nuk.uni-lj.si:8080
  • Hostname: nukrobi2 (may change)
  • Path: wayback
  • Timestamp: 4–14 digits
  • Examples:

Estonia ArchivesEdit

  • Domain: digar.ee
  • Hostname: veebiarhiiv
  • Path: a
  • Timestamp: 4–14 digits
  • Examples:

Bavarian ArchivesEdit

  • Domain: bib-bvb.de
  • Hostname: langzeitarchivierung
  • Path: wayback
  • Timestamp: 4–14 digits
  • Examples:

York University Digital LibraryEdit

  • Domain: yorku.ca
  • Hostname: digital.library
  • Path: wayback
  • Timestamp: 4–14 digits
  • Examples:

National Library of IsraelEdit

Appears to have poor coverage.

OtherEdit

Memento Web
Note: Redirects to an external archive service based on cached data in the Memento database which can fluctuate and/or be inaccurate due to the cache going out of sync with the client service.
Google Cache (ephemeral)
Note: Links quickly expire.
Note: Cannot be accessed with Memento.
Bing Cache (ephemeral)

Only accessible through search results, not manually through an URL or search prefix.

UndocumentedEdit

See alsoEdit

ReferencesEdit

  1. ^ a b c "Wayback Archives". Tracetools. Retrieved 27 June 2022.
  2. ^ a b Bederov, Igor S. (8 May 2022). "Web Archive as an OSINT Tool". Medium. Retrieved 27 June 2022.
  3. ^ "Examples using Common Crawl Data". Common Crawl. Retrieved 27 June 2022.
  4. ^ "RightsLab". rightslab.org. Retrieved 27 June 2022.
  5. ^ "rightslab". Twitter. Retrieved 27 June 2022.
  6. ^ "Webrecorder". webrecorder.net. Retrieved 27 June 2022.
  7. ^ "wiki.Archiveteam". archiveteam.org. Retrieved 27 June 2022.