archive.org_bot User Agent - Internet Archive's Archiver Documentation

archive.org_bot is the Internet Archive's web crawler for the Wayback Machine, systematically crawling and preserving publicly accessible web pages for historical record and research. You can use Agent Analytics to see how often archive.org_bot visits your website.

Agent Type

Archiver

Captures and stores historical website snapshots for long-term digital preservation

Expected Behavior

Archivers crawl websites to create historical snapshots for preservation purposes. They typically visit on a regular cadence to build a chronological record of how content changes over time. Crawl frequency varies based on site popularity and content update patterns. Unlike search crawlers, archivers aim to capture and store complete page states rather than extract information for indexing.

Detail

Operated By	Internet Archive
Last Updated	9 hours ago

Global Insights


Top Website Robots.txts

2% of top websites are blocking archive.org_bot


Country of Origin

United States

archive.org_bot normally visits from the United States

Top Website Blocking Trend Over Time

The percentage of the world's top 1000 websites that block archive.org_bot

Overall Archiver Traffic

The percentage of all internet traffic coming from archivers

Top Visited Website Categories

News

Business and Industrial

Food and Drink

People and Society

Hobbies and Leisure

How Do I Get These Insights for My Website?

Use the WordPress plugin, Node.js package, or API to get started in seconds.


User Agent String

Example: Mozilla/5.0 (compatible; archive.org_bot +http://archive.org/details/archive.org_bot) Zeno/139ea40 warc/v0.8.96

Access other known user agent strings and recent IP addresses using the API.

Robots.txt

In this example, all pages are blocked. You can customize which pages are off-limits by swapping out / for a different disallowed path.

User-agent: archive.org_bot # https://darkvisitors.com/agents/archive-org-bot
Disallow: /
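
Before deploying a rule like this, it can help to check how a standards-following parser reads it. The sketch below uses Python's standard urllib.robotparser to evaluate the rule above, plus a hypothetical variant that blocks only /private/. The example.com URLs and the /private/ path are illustrative assumptions, and this shows how Python's parser interprets the file, not necessarily how archive.org_bot itself does.

```python
from urllib.robotparser import RobotFileParser

# The rule shown on this page (inline comment included).
BLOCK_ALL = """\
User-agent: archive.org_bot # https://darkvisitors.com/agents/archive-org-bot
Disallow: /
"""

# Hypothetical variant: only /private/ is off-limits.
BLOCK_PRIVATE = """\
User-agent: archive.org_bot
Disallow: /private/
"""


def is_allowed(robots_txt: str, agent_token: str, url: str) -> bool:
    """Return True if robots_txt lets agent_token fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Pass the bare token, not the full user agent string: robotparser
    # only looks at the part of the name before the first "/".
    return parser.can_fetch(agent_token, url)


if __name__ == "__main__":
    for label, rules in (("block all", BLOCK_ALL), ("block /private/", BLOCK_PRIVATE)):
        for url in ("https://example.com/blog", "https://example.com/private/notes"):
            print(label, url, is_allowed(rules, "archive.org_bot", url))
```

Agents not named in the file, and paths outside the disallowed prefix, remain allowed under these rules.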

How Do I Block All Archivers?

⚠️ Manually copying and pasting this rule is not scalable, because new archivers are discovered every day. Instead, serve a robots.txt that updates automatically.


Frequently Asked Questions About archive.org_bot

Should I Block archive.org_bot?

It depends on your goals. Digital archiving preserves cultural and historical records for future generations. Most website owners appreciate being included in archives like the Wayback Machine. However, if you handle sensitive content or prefer not to have historical snapshots, you can block archivers.

How Do I Block archive.org_bot?

You can block or limit archive.org_bot's access by adding rules for its user agent token to your robots.txt file. The best way to do this is with Automatic Robots.txt, which updates automatically as new agents are discovered. The vast majority of agents operated by reputable companies honor robots.txt directives, but bad actors may ignore them entirely. In that case, you'll need alternative blocking methods such as firewall rules or server-level restrictions. To verify that archive.org_bot is respecting your rules, set up Agent Analytics to monitor its visits to your website.
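
As one illustration of a server-level restriction, the sketch below wraps a Python WSGI application so that any request whose User-Agent header contains a blocked token receives a 403 response. The names block_agents, hello_app, and BLOCKED_TOKENS are hypothetical, and the approach assumes your site is served through WSGI. It only stops clients that identify themselves honestly; a spoofed user agent will pass straight through, so treat it as a complement to firewall rules rather than a replacement.

```python
# Hypothetical list of user agent tokens to refuse.
BLOCKED_TOKENS = ("archive.org_bot",)


def block_agents(app, blocked_tokens=BLOCKED_TOKENS):
    """Wrap a WSGI app; requests from blocked user agents get 403."""
    lowered = tuple(token.lower() for token in blocked_tokens)

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in lowered):
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        # Not a blocked agent: hand the request to the wrapped app.
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in for your real application."""
    body = b"Hello\n"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"),
         ("Content-Length", str(len(body)))],
    )
    return [body]


application = block_agents(hello_app)

if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Serves on localhost:8000 until interrupted.
    with make_server("127.0.0.1", 8000, application) as server:
        server.serve_forever()
```

Matching on a substring of the header, rather than the whole string, keeps the check working when the version suffix of the user agent string changes.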

Will Blocking archive.org_bot Hurt My SEO?

Blocking archivers has no direct SEO impact since they don't influence search engine rankings. However, archived content can provide historical context and backlink opportunities. Some SEO tools also reference archived data for analysis, so blocking might limit certain SEO insights.

Does archive.org_bot Access Private Content?

Archivers typically focus on publicly accessible content to create historical records. They generally don't attempt to access password-protected or private content, as their goal is to preserve public web history. However, they may archive content that's publicly accessible but not intended for long-term preservation, such as temporary pages or draft content.

How Can I Tell if archive.org_bot Is Visiting My Website?

Setting up Agent Analytics will give you real-time visibility into archive.org_bot's visits to your website, along with those of hundreds of other AI agents, crawlers, and scrapers. It also lets you measure human traffic to your website coming from AI search and chat LLM platforms like ChatGPT, Perplexity, and Gemini.
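
For a rough manual check, you can also scan your web server's access log for the user agent token. The sketch below assumes the log is in the common "combined" format used by default in many Apache and nginx setups. The function name count_bot_hits and the sample log lines (with documentation-range IP addresses) are made up for illustration. It counts claimed visits only, since it cannot tell a genuine crawler from a spoofed user agent.

```python
import re
from collections import Counter

# Combined Log Format: ip, identity, user, [time], "request", status,
# size, "referer", "user agent".
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_bot_hits(lines, token="archive.org_bot"):
    """Count requests per path whose user agent contains token."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or token not in match.group("agent"):
            continue
        # The request field looks like: GET /path HTTP/1.1
        parts = match.group("request").split()
        path = parts[1] if len(parts) >= 2 else "-"
        hits[path] += 1
    return hits


# Made-up sample lines for illustration only.
SAMPLE_LINES = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /about HTTP/1.1" 200 2326 "-" '
    '"Mozilla/5.0 (compatible; archive.org_bot +http://archive.org/details/archive.org_bot) Zeno/139ea40 warc/v0.8.96"',
    '203.0.113.5 - - [10/Oct/2024:13:55:40 +0000] "GET /about HTTP/1.1" 200 2326 "-" '
    '"Mozilla/5.0 (compatible; archive.org_bot +http://archive.org/details/archive.org_bot) Zeno/139ea40 warc/v0.8.96"',
    '198.51.100.7 - - [10/Oct/2024:13:56:01 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    for path, count in count_bot_hits(SAMPLE_LINES).most_common():
        print(path, count)
```

To run it against a real log, pass an open file object instead of SAMPLE_LINES; the function reads it line by line.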

Why Is archive.org_bot Visiting My Website?

archive.org_bot most likely reached your site in one of two ways: it followed links to your pages during a broad crawl of the public web, or someone submitted your site to the Internet Archive for archiving. Either way, your content was selected for preservation, whether as part of a general web archiving effort or because it was specifically nominated.

How Can I Authenticate Visits From archive.org_bot?

Agent Analytics authenticates visits from many agents, telling you whether each visit really came from the agent it claims to be or was spoofed by a bad actor. This helps you identify suspicious traffic patterns and make informed decisions about blocking or allowing specific user agents.

References

https://archive.org/help/
