Remove Article from the Wayback Machine (2026)

Key Takeaways

Archive.org is an independent nonprofit -- publisher removal has no automatic effect on it. The Internet Archive has been crawling the web since 1996 and is not affiliated with any publisher. Removing an article from its source does not touch Archive.org's snapshots.
The robots.txt exclusion route removes all snapshots -- but requires the publisher to act. If a publisher adds the Wayback Machine crawl exclusion to their robots.txt file, the Internet Archive will honor it and remove existing snapshots of that domain. This requires the publisher to make a technical configuration change.
The Internet Archive accepts direct removal requests from individuals -- no publisher needed. If the archived page contains personal information, sensitive content, or was removed via legal order, you can submit a removal request directly to the Archive. This path is available even when the publisher cannot or will not act.
Google indexes Archive.org URLs independently -- a separate de-indexing step is often required. Even after Archive.org removes a snapshot, you may need to submit a separate Google de-indexing request for the Archive.org URL to remove it from search results.

Why Archive.org Has Your Article After the Publisher Removed It

The Wayback Machine is operated by the Internet Archive, an independent nonprofit organization that has been crawling and archiving the web since 1996. It operates on its own infrastructure, its own crawl schedule, and its own policies -- all entirely separate from any publisher or media outlet. When a publisher removes an article from their website, that decision affects only their own servers. The Internet Archive is not notified when an article is removed and has no automatic mechanism to respond to publisher deletions.

By the time a news article is removed from its original source, the Wayback Machine may have captured it dozens or even hundreds of times. Major news sites are crawled frequently -- some weekly, some monthly -- meaning an article that was published years ago and lived on a publisher's site for that entire period could have scores of separate timestamped snapshots in Archive.org's database. Each snapshot is stored at its own unique URL in the format web.archive.org/web/[timestamp]/[original-url], and each of those URLs is independently accessible and potentially indexed by search engines.
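The snapshot URL format described above can be taken apart programmatically, which helps when assembling the list of snapshot URLs to cite in a removal request. The following is a minimal Python sketch, assuming the 14-digit YYYYMMDDhhmmss timestamp form; the function names are illustrative and are not part of any Archive.org library.

```python
import re
from datetime import datetime

# Matches web.archive.org/web/<timestamp>/<original-url>, with or without a scheme.
# Assumption: the timestamp is the full 14-digit YYYYMMDDhhmmss form.
SNAPSHOT_RE = re.compile(r"^(?:https?://)?web\.archive\.org/web/(\d{14})/(.+)$")


def parse_snapshot_url(snapshot_url):
    """Split a snapshot URL into (capture datetime, original URL)."""
    match = SNAPSHOT_RE.match(snapshot_url.strip())
    if match is None:
        raise ValueError(f"not a recognized snapshot URL: {snapshot_url!r}")
    timestamp, original_url = match.groups()
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S"), original_url


def build_snapshot_url(captured_at, original_url):
    """Rebuild the snapshot URL from a capture time and the original URL."""
    return f"https://web.archive.org/web/{captured_at:%Y%m%d%H%M%S}/{original_url}"
```

Parsing each snapshot URL this way makes it easy to sort captures by date or to confirm that every URL in a request points at the same original article.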

This is the core of the problem: publisher removal eliminates one URL (the live article), but it does nothing about the archive of that URL across potentially years of snapshots. From Archive.org's perspective, the article still exists -- it was publicly available when they captured it, and the Archive's mission is to preserve public web content. They are not obligated to delete their records simply because the original publisher later changed their mind. Understanding this independence is the starting point for knowing how to address the problem.

For a broader look at how Google's removal policies work, including how de-indexing differs from actual deletion, that guide covers the full landscape of what search engines can and cannot do. If the original article also needs to be de-indexed from Google, that should run in parallel with the Archive.org request. For criminal record articles specifically, see our guide on.

Does Google Index Archive.org URLs?

Yes -- and this is precisely why the Wayback Machine problem matters so much for search results. Archive.org is one of the most authoritative domains on the internet. It is consistently ranked among the top 500 most-visited websites globally and has accumulated enormous link authority over decades of operation. Google treats Archive.org as a high-quality, trustworthy source and actively indexes its pages.

When the original publisher article is removed, Google will eventually notice that the publisher URL returns a 404 or 410 error and de-index it from search results. This can take weeks to months depending on how frequently Google crawls that domain. But while Google is processing the removal of the original URL, the Archive.org copies of that article remain live at their own URLs -- and those Archive.org URLs are independently crawlable, independently indexable, and can rank on their own merits.
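Before requesting de-indexing, it can be worth confirming that the publisher URL really does return a 404 or 410. The sketch below is one way to check this in Python using only the standard library. The function names and the User-Agent string are illustrative assumptions, and some servers reject HEAD requests, in which case a GET would be needed instead.

```python
import urllib.error
import urllib.request

# Status codes that signal the page is gone, as described in the text above.
GONE_STATUSES = {404, 410}


def is_gone(status_code):
    """Return True if the status code indicates a removed page."""
    return status_code in GONE_STATUSES


def fetch_status(url, timeout=10.0):
    """Return the final HTTP status for url (redirects are followed)."""
    request = urllib.request.Request(
        url,
        method="HEAD",
        headers={"User-Agent": "removal-check/0.1"},  # hypothetical identifier
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the status is on the exception.
        return error.code
```

A call such as is_gone(fetch_status(publisher_url)) returning True suggests the publisher side is done. A 200 response may mean the article is still live or that the site serves a "soft 404" page.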

The result is something many people find deeply frustrating: the article that was removed from the publisher can be replaced in Google search results by an Archive.org URL. Someone searching your name may now see a result like web.archive.org/web/2019/publisher-site.com/article-about-you appearing where the original publisher result used to show. In some cases the Archive.org version ranks higher than the original did, because Archive.org's overall domain authority can boost individual page rankings beyond what the publisher site itself achieved.

This dynamic is not a bug or an accident -- it is simply how Google's indexing works. Archive.org is a live website hosting live content. Google does not distinguish between "this is an archive of something that was removed" and "this is original content." If the URL is accessible and the page has content, Google can index it and rank it.

Path 1 -- The robots.txt / Exclusion Route (Requires Publisher Cooperation)

The most comprehensive removal path involves the publisher adding a crawl exclusion directive to their website's robots.txt file. Archive.org has long honored the robots.txt standard, and specifically recognizes directives targeting its crawler, which identifies itself as ia_archiver. When a publisher adds the following to their robots.txt file, the Internet Archive will stop crawling that domain and will remove existing snapshots of that domain from the Wayback Machine:

User-agent: ia_archiver
Disallow: /

This is the most complete removal option available -- it eliminates all snapshots of all pages on the publisher's domain from the Wayback Machine's public interface. The downside is the one built into the name of this section: it requires the publisher to act. You cannot add this directive to someone else's website.

However, if you have already worked with the publisher to remove the article itself, you have an established point of contact and a demonstrated willingness on their part to engage with your request. Asking them -- as a follow-up step -- to add the Wayback Machine exclusion is often easier than the original article removal request. Here is why: the robots.txt change is a pure technical configuration decision, not an editorial one. The publisher is not being asked to make a judgment about the content of the article, to admit wrongdoing, or to revisit a past editorial decision. They are simply updating a text file on their server. Many publishers are receptive to this framing.

If a full-domain exclusion is too broad for the publisher (for example, if they do not want to block Archive.org from crawling their entire site, only the specific article), a targeted exclusion for just the article's URL path is also effective:

User-agent: ia_archiver
Disallow: /specific-path/

Once the publisher adds either directive, Archive.org's systems will detect it during their next crawl of the domain and process the removal. Existing snapshots of the covered URLs are removed from the Wayback Machine's public interface, typically within 2-6 weeks. This is the fastest and most thorough resolution available when publisher cooperation can be secured.
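Once the publisher says the change is live, the directive can be sanity-checked before waiting on Archive.org. The sketch below uses Python's standard urllib.robotparser to test whether a given robots.txt body blocks the ia_archiver user agent for the article URL. The function name and the example domain are assumptions for illustration, and this only shows how a standard robots.txt parser reads the file, not how Archive.org's own systems will process it.

```python
from urllib.robotparser import RobotFileParser


def wayback_crawler_allowed(robots_txt, url, user_agent="ia_archiver"):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt with a targeted exclusion for one article path.
example_robots = "User-agent: ia_archiver\nDisallow: /news/story\n"
blocked = not wayback_crawler_allowed(
    example_robots, "https://publisher.example/news/story"
)
```

To check a live site, download the publisher's /robots.txt first and pass its text to the function. If it still returns True for the article URL, the exclusion was not added as intended.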

Path 2 -- The Direct Internet Archive Removal Request

This path does not require publisher cooperation. The Internet Archive accepts direct removal requests from individuals when the archived content meets certain qualifying circumstances. The Archive is a nonprofit with a genuine public-service mission, and while they take their preservation work seriously, they also recognize legitimate claims from individuals whose personal information or sensitive content appears in archived pages.

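The circumstances under which the Internet Archive will typically honor direct removal requests include:

- The publisher has already removed the original content.
- The archived page contains your personal information.
- The archived page contains sensitive content.
- The content was taken down as the result of a legal order.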

To submit a direct removal request, use the Internet Archive contact form at archive.org/about/contact.php, select "Exclusion / URL removal" as the subject, and explain your situation clearly. The Internet Archive terms of service outline the circumstances under which removal requests are honored, so review them before submitting. Include the original article URL (the publisher's URL, even if it now returns a 404) and, if you know them, the specific Archive.org snapshot URLs you want removed. You can find all snapshots by visiting web.archive.org and entering the original article URL -- the calendar view will show every capture date. State the basis for your request: whether the publisher removed the content, whether it contains your personal information, whether it was subject to a legal order, and any supporting details that establish your standing to request removal.
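For articles with many captures, copying snapshot URLs out of the calendar view by hand is tedious. The Wayback Machine also exposes a CDX query endpoint that lists captures for a URL. The sketch below builds such a query and turns its JSON output (a header row followed by one row per capture) into snapshot URLs. The helper names are illustrative assumptions, and only the url, output, and fl parameters are used here.

```python
import json
import urllib.parse
import urllib.request

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(original_url):
    """Build a CDX query that returns timestamp and original URL as JSON."""
    params = {"url": original_url, "output": "json", "fl": "timestamp,original"}
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def snapshot_urls_from_cdx(rows):
    """Convert CDX JSON rows (first row is the header) into snapshot URLs."""
    if not rows:
        return []
    header, *records = rows
    ts_idx = header.index("timestamp")
    orig_idx = header.index("original")
    return [
        f"https://web.archive.org/web/{record[ts_idx]}/{record[orig_idx]}"
        for record in records
    ]


def list_snapshots(original_url, timeout=30.0):
    """Fetch the capture list over the network and return snapshot URLs."""
    with urllib.request.urlopen(cdx_query_url(original_url), timeout=timeout) as response:
        return snapshot_urls_from_cdx(json.load(response))
```

The resulting list can be pasted into the removal request so the Archive's staff see exactly which captures are in scope.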

On timelines: direct requests to the Internet Archive typically take 4 to 8 weeks to process. The Archive handles requests in the order they are received and does not provide status updates or acknowledgment during the review period. You will not receive a confirmation email until the removal is complete -- or until the request is declined. Silence during the review period is normal and does not indicate rejection.
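Because the Archive may stay silent while a request is under review, a periodic check of whether a snapshot is still being served can stand in for a status update. The sketch below queries the Wayback Machine availability endpoint, which returns JSON containing an archived_snapshots object with a closest entry when a capture is available. The function names are illustrative assumptions, and a result of None should be confirmed in the browser rather than taken as proof of removal.

```python
import json
import urllib.parse
import urllib.request

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def closest_snapshot(payload):
    """Return the closest available snapshot URL from a response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check_still_archived(original_url, timeout=30.0):
    """Query the availability endpoint over the network for original_url."""
    query = urllib.parse.urlencode({"url": original_url})
    with urllib.request.urlopen(
        f"{AVAILABILITY_ENDPOINT}?{query}", timeout=timeout
    ) as response:
        return closest_snapshot(json.load(response))
```

Running check_still_archived on the original article URL once a week during the review window is usually enough to notice when the snapshots stop being served.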

Path 3 -- Google De-Indexing of the Archive.org URL

Even before Archive.org processes a removal request -- which can take 4 to 8 weeks -- you can act immediately to address the search results problem by requesting that Google de-index the specific Archive.org URL. This does not remove the content from Archive.org itself, but it removes the URL from Google's search index, meaning it will no longer appear in Google search results for your name. For most people, search result visibility is the primary practical concern, so this step can provide significant relief while the Archive.org removal is pending.

Google provides its Outdated Content removal tool specifically for situations where a URL contains content that has been removed or changed at its source. For Bing, use Bing's removal tool in parallel. Because the original article has already been removed from the publisher's site, you can legitimately describe the Archive.org URL as content whose original source has been removed.

An important distinction: Google de-indexing removes the URL from Google's search results only. The Archive.org page itself remains accessible. Anyone who knows the direct URL can still navigate to it. The de-indexing simply means Google will not serve it in response to search queries. For the vast majority of people whose concern is what appears when someone searches their name, this is a meaningful and useful result -- but it is not equivalent to Archive.org actually removing the content.

Understanding how long the full removal process takes, including both publisher removal and the downstream archive and de-indexing steps, helps set realistic expectations for the complete timeline from start to resolution. If you also need to deindex the article on Google in parallel with the Archive.org request, that guide covers the specific tools and timelines. For the full multi-platform removal checklist including Wayback Machine, search engines, and syndicated copies, see our guide on how to remove negative articles from the internet.

Approach Comparison: Which Path Is Right for Your Situation?

Approach	What It Removes	Requires Publisher	Timeframe	Difficulty
Publisher adds robots.txt exclusion	All snapshots of the domain from Archive.org	Yes	2-6 weeks after publisher acts	Moderate (depends on publisher)
Direct Archive.org removal request	Specific snapshots at requested URLs	No	4-8 weeks	Moderate
Google de-indexing (Outdated Content tool)	Archive.org URL from Google search only	No	2-4 weeks	Easy
DMCA takedown to Archive.org	Specific copyrighted content	No (requires copyright claim)	2-4 weeks	Moderate
Legal order / court order	All snapshots; Archive.org must comply	No (requires court order)	Varies	Hard

The 5-Step Archive.org Removal Plan

Other Archive Sites That Cache Removed Articles

Archive.org is the most prominent archiving service, but it is not the only one that may have captured your article. Several other platforms maintain cached or archived copies of web content, each with its own removal policies and processes. If the article was distributed through a wire service and appears on multiple publisher sites in addition to Archive.org, see our guide on syndicated news article removal for how to address copies across multiple outlets simultaneously.

Google Cache -- As of 2024, Google has deprecated its traditional cached pages feature. Google no longer maintains easily accessible cached copies of web pages in the way it once did, and the "Cached" link that used to appear alongside search results has been removed. For most practical purposes, Google Cache is no longer a significant source of persisted article content.

Bing Cache -- Microsoft's Bing search engine still maintains cached copies of web pages. If a removed article is appearing via a Bing Cache URL, you can submit it through Bing's Content Removal tool, available through the Bing Webmaster Tools portal. The process is similar to Google's Outdated Content Removal tool -- submit the cached URL with a description of why the content should be removed.

archive.today (formerly archive.is) -- This is a separate archiving service that is entirely independent of Archive.org and notably more resistant to removal requests. archive.today has no formal published removal policy, and the service's operators evaluate requests on a case-by-case basis with significant discretion to decline them. The most reliable path for archive.today removal is a legal order directed at the service. In the meantime, Google de-indexing of the specific archive.today URL is typically more practical and achievable than direct removal.

CourtListener and Justia -- These platforms index legal documents including court filings, decisions, and public records. If the article about you involved legal proceedings, CourtListener and Justia may have separately indexed those court documents, creating additional search results beyond the article itself. Both platforms have their own removal processes -- CourtListener considers removal requests in cases of personal safety risk or legal orders; Justia has similar policies. These are separate removal tracks from article content and require individual attention.

Frequently Asked Questions

Common Questions About Archive.org and Wayback Machine Removal

If I get the article removed from the publisher, will Archive.org automatically remove it too?

No. Archive.org (the Wayback Machine) is an independent nonprofit that operates entirely separately from any publisher. When a publisher removes an article from their website, that action has no automatic effect on the Internet Archive. Archive.org may have captured the article dozens or hundreds of times over the years it was live, and each of those snapshots remains accessible at its own URL until Archive.org is specifically asked to remove it -- through the robots.txt exclusion route or a direct removal request.

How do I submit a removal request to the Internet Archive?

Go to archive.org/about/contact.php, select Exclusion or URL removal as the subject, and explain your situation. Include the original article URL and, if known, the specific Archive.org snapshot URLs (in the format web.archive.org/web/[timestamp]/[original-url]). Explain the basis for your request: whether the content was removed by the publisher, contains personal information, was taken down due to a legal order, or involves other qualifying circumstances. Be factual and specific.

How long does it take for Archive.org to remove a page?

Direct removal requests to the Internet Archive typically take 4 to 8 weeks to process. The Archive handles requests in order and does not provide status updates during processing. You will not receive confirmation until the removal is complete. Requests based on publisher removal, personal information, or legal orders tend to be processed successfully; requests based solely on reputational preference may be declined.

Can Google still show an Archive.org URL after I get the original article removed?

Yes. Google indexes Archive.org URLs independently of the original publisher URLs. When a publisher removes an article, Google may eventually de-index the original URL -- but the Archive.org copy is a different, live URL that Google can rank separately. Many people are surprised to find that Google now shows a web.archive.org/web/[timestamp]/[original-url] result where the publisher URL used to appear. You can address this by submitting the Archive.org URL to Google's Outdated Content Removal tool, which typically takes 2 to 4 weeks to process.

What is the robots.txt Wayback Machine exclusion and does it work?

The robots.txt Wayback Machine exclusion is a technical directive that website owners can add to their robots.txt file to instruct the Internet Archive to stop crawling their domain and to remove existing snapshots. The directive is: User-agent: ia_archiver followed by Disallow: / for the full domain, or Disallow: /specific-path/ for a specific URL path. The Internet Archive honors this exclusion and, when it detects the directive, removes existing snapshots of the affected domain or path from the Wayback Machine's public interface. The catch is that this requires the publisher to add the directive -- you cannot add it to someone else's website.

Is there a way to remove an article from archive.today (archive.is)?

archive.today (formerly archive.is) is a separate archiving service not affiliated with Archive.org, and it is generally more resistant to removal requests than the Internet Archive. archive.today does have a contact form where you can submit removal requests, but the service has no formal published removal policy and requests are evaluated on a case-by-case basis. Legal orders are the most reliable path to removal from archive.today. Google de-indexing of the specific archive.today URL is also available and is often the more practical approach while pursuing direct removal.
