Getting a news article removed from the original publisher's website feels like a victory, and it is. But for many people the relief is short-lived. They search their name days later and find that the Wayback Machine at Archive.org still hosts a full copy of the article, often with multiple snapshots going back years, and that Google has indexed the Archive.org URL and is serving it in search results. The archived copy problem is solvable, but the process differs from publisher removal and is rarely explained clearly.
Archive.org is an independent nonprofit -- publisher removal has no automatic effect on it. The Internet Archive has been crawling the web since 1996 and is not affiliated with any publisher. Removing an article from its source does not touch Archive.org's snapshots.
The robots.txt exclusion route removes all snapshots -- but requires the publisher to act. If a publisher adds the Wayback Machine crawl exclusion to their robots.txt file, the Internet Archive will honor it and remove existing snapshots of that domain. This requires the publisher to make a technical configuration change.
The Internet Archive accepts direct removal requests from individuals -- no publisher needed. If the archived page contains personal information, sensitive content, or was removed via legal order, you can submit a removal request directly to the Archive. This path is available even when the publisher cannot or will not act.
Google indexes Archive.org URLs independently -- a separate de-indexing step is often required. Even after Archive.org removes a snapshot, you may need to submit a separate Google de-indexing request for the Archive.org URL to remove it from search results.
The Internet Archive's Wayback Machine is an independent nonprofit organization that has been crawling and archiving the web since 1996. It operates on its own infrastructure, its own crawl schedule, and its own policies -- all entirely separate from any publisher or media outlet. When a publisher removes an article from their website, that decision affects only their own servers. The Internet Archive has no way to know the article was removed and no automatic mechanism to respond to publisher deletions.
By the time a news article is removed from its original source, the Wayback Machine may have captured it dozens or even hundreds of times. Major news sites are crawled frequently -- some weekly, some monthly -- meaning an article that was published years ago and lived on a publisher's site for that entire period could have scores of separate timestamped snapshots in Archive.org's database. Each snapshot is stored at its own unique URL in the format web.archive.org/web/[timestamp]/[original-url], and each of those URLs is independently accessible and potentially indexed by search engines.
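The URL pattern above can be sketched in a few lines of Python. This is a minimal illustration, assuming the 14-digit `YYYYMMDDhhmmss` timestamp form used in Wayback snapshot URLs; the function name, the example article URL, and the capture time are hypothetical.

```python
from datetime import datetime, timezone

WAYBACK_PREFIX = "https://web.archive.org/web"


def snapshot_url(captured_at: datetime, original_url: str) -> str:
    """Build a Wayback Machine snapshot URL for one capture time."""
    # Snapshot timestamps are written as YYYYMMDDhhmmss in UTC.
    timestamp = captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


if __name__ == "__main__":
    # Hypothetical article URL and capture time, for illustration only.
    captured = datetime(2019, 3, 14, 8, 30, 0, tzinfo=timezone.utc)
    print(snapshot_url(captured, "https://publisher-site.example/article-about-you"))
```

Because every capture produces a distinct URL of this shape, an article captured fifty times has fifty separate addresses that may each need to be listed in a removal or de-indexing request.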
This is the core of the problem: publisher removal eliminates one URL (the live article), but it does nothing about the archive of that URL across potentially years of snapshots. From Archive.org's perspective, the article still exists -- it was publicly available when they captured it, and the Archive's mission is to preserve public web content. They are not obligated to delete their records simply because the original publisher later changed their mind. Understanding this independence is the starting point for knowing how to address the problem.
For a broader look at what search engines can and cannot do, see the guide on how Google handles negative article removal requests, which also explains how de-indexing differs from actual deletion.
Yes -- and this is precisely why the Wayback Machine problem matters so much for search results. Archive.org is one of the most authoritative domains on the internet. It is consistently ranked among the top 500 most-visited websites globally and has accumulated enormous link authority over decades of operation. Google treats Archive.org as a high-quality, trustworthy source and actively indexes its pages.
When the original publisher article is removed, Google will eventually notice that the publisher URL returns a 404 or 410 error and de-index it from search results. This can take weeks to months depending on how frequently Google crawls that domain. But while Google is processing the removal of the original URL, the Archive.org copies of that article remain live at their own URLs -- and those Archive.org URLs are independently crawlable, independently indexable, and can rank on their own merits.
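Whether the publisher URL is actually returning one of those status codes can be checked directly. The sketch below uses only the Python standard library; the helper names are hypothetical, and it reports only what the publisher's server returns, not what Google has done with that information.

```python
import urllib.error
import urllib.request

# Status codes that tell crawlers the page no longer exists.
GONE_STATUSES = {404, 410}


def is_removed(status_code: int) -> bool:
    """True when the status code signals that the page is gone."""
    return status_code in GONE_STATUSES


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for `url`."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is on the exception.
        return err.code
```

A call such as `is_removed(fetch_status("https://publisher-site.example/article-about-you"))` (a hypothetical URL) would indicate whether the live article is gone. Some servers reject `HEAD` requests, in which case a normal `GET` may be needed.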
The result is something many people find deeply frustrating: the article that was removed from the publisher can be replaced in Google search results by an Archive.org URL. Someone searching your name may now see a result like web.archive.org/web/2019/publisher-site.com/article-about-you appearing where the original publisher result used to show. In some cases the Archive.org version ranks higher than the original did, because Archive.org's overall domain authority can boost individual page rankings beyond what the publisher site itself achieved.
This dynamic is not a bug or an accident -- it is simply how Google's indexing works. Archive.org is a live website hosting live content. Google does not distinguish between "this is an archive of something that was removed" and "this is original content." If the URL is accessible and the page has content, Google can index it and rank it.
"The Wayback Machine problem is the most common post-removal surprise we see. A client gets an article removed, celebrates, and then emails us two weeks later to say Google is now showing archive.org/web/2019/[original URL] in the results. The good news: Archive.org has a removal process, and it works."
The most comprehensive removal path involves the publisher adding a crawl exclusion directive to their website's robots.txt file. Archive.org has long honored the robots.txt standard and specifically recognizes directives targeting its crawler, which identifies itself as ia_archiver. When a publisher adds a rule that names ia_archiver as the user agent and disallows the entire site (Disallow: /), the Internet Archive will stop crawling that domain and will remove existing snapshots of that domain from the Wayback Machine.
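A minimal sketch of the full-domain exclusion, assuming the conventional two-line rule (`User-agent: ia_archiver`, `Disallow: /`), checked here with Python's standard robots.txt parser. The parser only shows how a compliant crawler would read the rule; it says nothing about how the Internet Archive processes it. The domain and helper name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed full-domain exclusion aimed at the ia_archiver user agent.
FULL_DOMAIN_EXCLUSION = """\
User-agent: ia_archiver
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether `user_agent` may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://publisher-site.example/article-about-you"  # hypothetical
    print("ia_archiver allowed:", allowed(FULL_DOMAIN_EXCLUSION, "ia_archiver", page))
    print("Googlebot allowed:", allowed(FULL_DOMAIN_EXCLUSION, "Googlebot", page))
```

The rule is scoped to one user agent, so other crawlers, including search engines, are unaffected. That is part of why the request is a small one for a publisher.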
This is the most complete removal option available -- it eliminates all snapshots of all pages on the publisher's domain from the Wayback Machine's public interface. The downside is the one built into the name of this section: it requires the publisher to act. You cannot add this directive to someone else's website.
However, if you have already worked with the publisher to remove the article itself, you have an established point of contact and a demonstrated willingness on their part to engage with your request. Asking them -- as a follow-up step -- to add the Wayback Machine exclusion is often easier than the original article removal request. Here is why: the robots.txt change is a pure technical configuration decision, not an editorial one. The publisher is not being asked to make a judgment about the content of the article, to admit wrongdoing, or to revisit a past editorial decision. They are simply updating a text file on their server. Many publishers are receptive to this framing.
If a full-domain exclusion is too broad for the publisher (for example, if they do not want to block Archive.org from crawling their entire site, only the specific article), a targeted exclusion that disallows only the article's URL path for ia_archiver is also effective.
Once the publisher adds either directive, Archive.org's systems will detect it during their next crawl of the domain and process the removal. Existing snapshots of the covered URLs are removed from the Wayback Machine's public interface, typically within 2-6 weeks. When publisher cooperation can be secured, this is the most thorough resolution available and the quicker of the two Archive.org removal routes.
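A targeted rule can be sketched the same way. The example below assumes the standard `Disallow: <path>` form and derives the path from the article URL; the helper names and URLs are hypothetical, and the check uses Python's standard robots.txt parser rather than anything specific to the Internet Archive.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def article_exclusion(article_url: str) -> str:
    """Return robots.txt lines that block ia_archiver from one article path."""
    path = urlsplit(article_url).path or "/"
    return f"User-agent: ia_archiver\nDisallow: {path}\n"


def ia_archiver_allowed(robots_txt: str, url: str) -> bool:
    """Report whether ia_archiver may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("ia_archiver", url)


if __name__ == "__main__":
    article = "https://publisher-site.example/news/2019/article-about-you"
    rules = article_exclusion(article)
    print(rules)
    print("article allowed:", ia_archiver_allowed(rules, article))
    print(
        "other story allowed:",
        ia_archiver_allowed(rules, "https://publisher-site.example/news/other-story"),
    )
```

Robots.txt `Disallow` values are prefix matches, so a rule for one article path also covers any URL that begins with that path.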
For situations where the publisher will not engage at all, the guide on what to do when an editor won't remove a news article covers alternative strategies for moving forward without publisher cooperation.
This path does not require publisher cooperation. The Internet Archive accepts direct removal requests from individuals when the archived content meets certain qualifying circumstances. The Archive is a nonprofit with a genuine public-service mission, and while they take their preservation work seriously, they also recognize legitimate claims from individuals whose personal information or sensitive content appears in archived pages.
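The circumstances under which the Internet Archive will typically honor direct removal requests include archived pages that contain personal information or sensitive content, content the publisher has already removed, and content that was removed under a legal order.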
To submit a direct removal request, go to archive.org/about/contact.php, select "Exclusion / URL removal" as the subject, and explain your situation clearly. Include the original article URL (the publisher's URL, even if it now returns a 404) and, if you know them, the specific Archive.org snapshot URLs you want removed. You can find all snapshots by visiting web.archive.org and entering the original article URL -- the calendar view will show every capture date. Include the basis for your request: whether the publisher removed the content, whether it contains your personal information, whether it was subject to a legal order, and any supporting details that establish your standing to request removal.
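For articles with many captures, collecting the snapshot URLs by hand from the calendar view is tedious. The sketch below assumes the Wayback Machine's public CDX query endpoint and its JSON output, in which the first row holds the field names; the helper names and sample data are hypothetical. Fetching the query URL is left out so the example stays offline.

```python
import json
from urllib.parse import urlencode

# Assumed public CDX endpoint for listing captures of a URL.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(original_url: str) -> str:
    """Build a CDX query that lists captures of `original_url` as JSON."""
    params = {"url": original_url, "output": "json", "fl": "timestamp,original"}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_urls(cdx_json: str) -> list[str]:
    """Turn a CDX JSON response body into Wayback snapshot URLs."""
    rows = json.loads(cdx_json)
    # The first row of JSON output is the header (field names); skip it.
    return [
        f"https://web.archive.org/web/{timestamp}/{original}"
        for timestamp, original in rows[1:]
    ]


if __name__ == "__main__":
    print(cdx_query_url("publisher-site.example/article-about-you"))
    # Hypothetical response body, shaped like CDX JSON output.
    sample = (
        '[["timestamp", "original"],'
        ' ["20190314083000", "https://publisher-site.example/article-about-you"]]'
    )
    for url in snapshot_urls(sample):
        print(url)
```

The resulting list can be pasted into the removal request so that every capture is named explicitly.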
On timelines: direct requests to the Internet Archive typically take 4 to 8 weeks to process. The Archive handles requests in the order they are received and does not provide status updates or acknowledgment during the review period. You will not receive a confirmation email until the removal is complete -- or until the request is declined. Silence during the review period is normal and does not indicate rejection.
The Internet Archive is a nonprofit with a mission to preserve the open web. They take removal requests seriously but they are not obligated to remove content simply because the subject finds it embarrassing or damaging. Requests based on factual inaccuracy, legal orders, or sensitive personal information are the most successful. Requests based solely on reputational preference -- "I don't like how this article makes me look" -- may be declined. Frame your request around the qualifying circumstances that apply to your situation, not around the impact the content has on your reputation.
Even before Archive.org processes a removal request -- which can take 4 to 8 weeks -- you can act immediately to address the search results problem by requesting that Google de-index the specific Archive.org URL. This does not remove the content from Archive.org itself, but it removes the URL from Google's search index, meaning it will no longer appear in Google search results for your name. For most people, search result visibility is the primary practical concern, so this step can provide significant relief while the Archive.org removal is pending.
Google provides a tool called the Remove Outdated Content tool specifically for situations where a URL contains content that has been removed or changed at its source. Because the original article has already been removed from the publisher's site, you can legitimately represent the Archive.org URL as content whose original source has been removed.
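In short, you submit each Archive.org snapshot URL through the tool along with a description of why the content should be removed, and Google then reviews the request.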
An important distinction: Google de-indexing removes the URL from Google's search results only. The Archive.org page itself remains accessible. Anyone who knows the direct URL can still navigate to it. The de-indexing simply means Google will not serve it in response to search queries. For the vast majority of people whose concern is what appears when someone searches their name, this is a meaningful and useful result -- but it is not equivalent to Archive.org actually removing the content.
Understanding how long the full removal process takes, including both publisher removal and the downstream archive and de-indexing steps, helps set realistic expectations for the complete timeline from start to resolution.
| Approach | What It Removes | Requires Publisher | Timeframe | Difficulty |
|---|---|---|---|---|
| Publisher adds robots.txt exclusion | All snapshots of the domain from Archive.org | Yes | 2-6 weeks after publisher acts | Moderate (depends on publisher) |
| Direct Archive.org removal request | Specific snapshots at requested URLs | No | 4-8 weeks | Moderate |
| Google de-indexing (Outdated Content tool) | Archive.org URL from Google search only | No | 2-4 weeks | Easy |
| DMCA takedown to Archive.org | Specific copyrighted content | No (requires copyright claim) | 2-4 weeks | Moderate |
| Legal order / court order | All snapshots; Archive.org must comply | No (requires court order) | Varies | Hard |
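The timeframes in the table translate into calendar windows with simple date arithmetic. The sketch below copies the week ranges from the table; the labels and function name are hypothetical, and the output is only a planning estimate based on those ranges.

```python
from datetime import date, timedelta

# (earliest, latest) in weeks, copied from the table above.
TIMEFRAMES_WEEKS = {
    "robots.txt exclusion": (2, 6),
    "direct Archive.org request": (4, 8),
    "Google de-indexing": (2, 4),
    "DMCA takedown": (2, 4),
}


def expected_window(approach: str, started: date) -> tuple[date, date]:
    """Return the earliest and latest expected completion dates."""
    earliest, latest = TIMEFRAMES_WEEKS[approach]
    return started + timedelta(weeks=earliest), started + timedelta(weeks=latest)


if __name__ == "__main__":
    start = date(2024, 1, 1)  # hypothetical submission date
    for approach in TIMEFRAMES_WEEKS:
        first, last = expected_window(approach, start)
        print(f"{approach}: {first.isoformat()} to {last.isoformat()}")
```

Running the approaches in parallel rather than in sequence keeps the overall wait close to the longest single window.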
Archive.org is the most prominent archiving service, but it is not the only one that may have captured your article. Several other platforms maintain cached or archived copies of web content, each with its own removal policies and processes. If the article was distributed through a wire service and appears on multiple publisher sites in addition to Archive.org, see our guide on syndicated news article removal for how to address copies across multiple outlets simultaneously.
Google Cache -- As of 2024, Google has deprecated its traditional cached pages feature. Google no longer maintains easily accessible cached copies of web pages in the way it once did, and the "Cached" link that used to appear alongside search results has been removed. For most practical purposes, Google Cache is no longer a significant source of persisted article content.
Bing Cache -- Microsoft's Bing search engine still maintains cached copies of web pages. If a removed article is appearing via a Bing Cache URL, you can submit it through Bing's Content Removal tool, available through the Bing Webmaster Tools portal. The process is similar to Google's Remove Outdated Content tool: submit the cached URL with a description of why the content should be removed.
archive.today (formerly archive.is) -- This is a separate archiving service that is entirely independent of Archive.org and notably more resistant to removal requests. archive.today has no formal published removal policy, and the service's operators evaluate requests on a case-by-case basis with significant discretion to decline them. The most reliable path for archive.today removal is a legal order directed at the service. In the meantime, Google de-indexing of the specific archive.today URL is typically more practical and achievable than direct removal.
CourtListener and Justia -- These platforms index legal documents including court filings, decisions, and public records. If the article about you involved legal proceedings, CourtListener and Justia may have separately indexed those court documents, creating additional search results beyond the article itself. Both platforms have their own removal processes -- CourtListener considers removal requests in cases of personal safety risk or legal orders; Justia has similar policies. These are separate removal tracks from article content and require individual attention.
Getting the publisher to remove an article is step one. We handle the full path -- Archive.org removal requests, Google de-indexing, and every downstream cache that needs to be addressed.