ArchiveBot User Agent - Wikimedia's Intelligence Gatherer: Block or Allow?

ArchiveBot is an intelligence gatherer operated by Wikimedia. If you think this is incorrect or can provide additional detail about its purpose, please let us know. You can use Agent Analytics to see how often ArchiveBot visits your website.

Agent Type

Intelligence Gatherer

Analyzes web content for brand safety, competitive insights, and ad targeting

Expected Behavior

Intelligence gatherers crawl websites to collect business intelligence, competitive data, and market insights on behalf of their clients. These tools may use artificial intelligence to identify and extract information like pricing changes, product listings, brand mentions, or trademark usage. Crawl patterns are highly variable. Sites relevant to a client's monitoring goals may be visited frequently (daily or hourly), while others may never be crawled. They typically focus on specific pages or data points rather than comprehensive site crawls.

Detail

Operated By	Wikimedia

Global Insights


Top Website Robots.txts

0% of top websites are blocking ArchiveBot


Country of Origin

United States

ArchiveBot normally visits from the United States

Top Website Blocking Trend Over Time

The percentage of the world's top 1000 websites that are blocking ArchiveBot

Overall Intelligence Gatherer Traffic

The percentage of all internet traffic coming from intelligence gatherers

Top Visited Website Categories

People and Society

Business and Industrial

News

Books and Literature

Science

How Do I Get These Insights for My Website?

Use the WordPress plugin, Node.js package, or API to get started in seconds.


User Agent String

Example: ArchiveTeam ArchiveBot/20250806.050c783 (wpull 2.0.3) and not Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36
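The example string contains Mozilla and Chrome tokens after the words "and not", so a naive browser detector that keys on "Chrome/" would count ArchiveBot as a browser. The minimal Python sketch below matches the ArchiveBot product token instead and extracts the build and wpull versions. The regular expression is an assumption based only on the single example string above; other ArchiveBot builds may format their user agent differently.

```python
import re
from typing import Dict, Optional

# Assumed pattern, derived from the one example user agent on this page.
ARCHIVEBOT_RE = re.compile(
    r"ArchiveBot/(?P<version>\S+)"                # build identifier, e.g. 20250806.050c783
    r"(?:\s+\(wpull\s+(?P<wpull>[^)\s]+)\))?"     # optional "(wpull X.Y.Z)" suffix
)


def parse_archivebot_ua(user_agent: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the ArchiveBot build and wpull version, or None if the token is absent."""
    match = ARCHIVEBOT_RE.search(user_agent)
    if match is None:
        return None
    return {"version": match.group("version"), "wpull": match.group("wpull")}


EXAMPLE_UA = (
    "ArchiveTeam ArchiveBot/20250806.050c783 (wpull 2.0.3) and not Mozilla/5.0 "
    "(Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/42.0.2311.90 Safari/537.36"
)

print(parse_archivebot_ua(EXAMPLE_UA))
# → {'version': '20250806.050c783', 'wpull': '2.0.3'}
```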

Access other known user agent strings and recent IP addresses using the API.

Robots.txt

In this example, all pages are blocked. You can customize which pages are off-limits by replacing / with a different path prefix to disallow.

User-agent: ArchiveBot # https://darkvisitors.com/agents/archivebot
Disallow: /
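To check that a rule does what you expect before deploying it, you can feed it to Python's standard-library `urllib.robotparser`. This is a sketch: `robotparser` applies its own matching logic, which may differ in edge cases from how any particular crawler interprets robots.txt, and the `/private/` path in the second rule is a hypothetical example of a narrower disallowed path.

```python
from urllib.robotparser import RobotFileParser

# The rule from the example above (robotparser strips the "#" comment).
ROBOTS_TXT = """\
User-agent: ArchiveBot # https://darkvisitors.com/agents/archivebot
Disallow: /
"""

# Hypothetical narrower rule: only paths starting with /private/ are off-limits.
PARTIAL_ROBOTS_TXT = """\
User-agent: ArchiveBot
Disallow: /private/
"""

ARCHIVEBOT_UA = (
    "ArchiveTeam ArchiveBot/20250806.050c783 (wpull 2.0.3) and not Mozilla/5.0 "
    "(Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/42.0.2311.90 Safari/537.36"
)


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)
print(rules.can_fetch(ARCHIVEBOT_UA, "https://example.com/any/page"))  # → False
print(rules.can_fetch("Googlebot", "https://example.com/any/page"))    # → True

partial = build_parser(PARTIAL_ROBOTS_TXT)
print(partial.can_fetch(ARCHIVEBOT_UA, "https://example.com/private/report"))  # → False
print(partial.can_fetch(ARCHIVEBOT_UA, "https://example.com/blog/post"))       # → True
```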

How Do I Block All Intelligence Gatherers?

⚠️ Manually copying and pasting this rule is not scalable, because new intelligence gatherers are discovered every day. Instead, serve a robots.txt that updates automatically.
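If you do maintain the rules yourself, one way to keep them manageable is to generate a single robots.txt group from a list of user agent tokens, since one group may contain several User-agent lines that share the same rules. In this sketch, `build_robots_txt` and the `ExampleGathererBot` token are hypothetical, and keeping the token list current is left to whatever source you trust.

```python
from typing import List
from urllib.robotparser import RobotFileParser


def build_robots_txt(agent_tokens: List[str], disallow_path: str = "/") -> str:
    """Return one robots.txt group applying a single Disallow rule to every token."""
    if not agent_tokens:
        raise ValueError("at least one user agent token is required")
    lines = [f"User-agent: {token}" for token in agent_tokens]
    lines.append(f"Disallow: {disallow_path}")
    return "\n".join(lines) + "\n"


# Hypothetical token list; "ExampleGathererBot" is a placeholder, not a real agent.
robots_txt = build_robots_txt(["ArchiveBot", "ExampleGathererBot"])
print(robots_txt)

# Sanity check with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("ExampleGathererBot/1.0", "https://example.com/"))  # → False
```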


Frequently Asked Questions About ArchiveBot

Should I Block ArchiveBot?

It depends on the use case. Intelligence gathering can range from legitimate market research to competitive data harvesting. If you benefit from similar services or the gathering seems reasonable, allow access. Block it if the activity appears excessive or solely benefits competitors.

How Do I Block ArchiveBot?

You can block or limit ArchiveBot's access by adding rules for its user agent token to your robots.txt file. The best way to do this is with Automatic Robots.txt, which updates automatically as new agents are discovered. While the vast majority of agents operated by reputable companies honor these robots.txt directives, bad actors may choose to ignore them entirely. In that case, you'll need to implement alternative blocking methods such as firewall rules or server-level restrictions. You can verify whether ArchiveBot is respecting your rules by setting up Agent Analytics to monitor its visits to your website.
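As an example of a server-level restriction, here is a minimal Python WSGI middleware sketch that returns 403 Forbidden to any request whose User-Agent header contains a blocked token. The `BLOCKED_TOKENS` list and the `block_user_agents` and `hello_app` names are hypothetical. Because the User-Agent header is supplied by the client, this only stops agents that identify themselves honestly; a scraper that sends a browser user agent will pass through, which is where IP-based firewall rules come in.

```python
from typing import Callable, Iterable

# Hypothetical deny list; tokens are matched case-insensitively as substrings.
BLOCKED_TOKENS = ("archivebot",)


def block_user_agents(app: Callable) -> Callable:
    """Wrap a WSGI app so requests whose User-Agent contains a blocked token get 403."""
    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in BLOCKED_TOKENS):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Not blocked: hand the request to the wrapped application unchanged.
        return app(environ, start_response)
    return middleware


def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in for your real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


protected_app = block_user_agents(hello_app)
```

To try it locally, you could serve `protected_app` with `wsgiref.simple_server.make_server("", 8000, protected_app).serve_forever()` and send requests with different User-Agent headers.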

Will Blocking ArchiveBot Hurt My SEO?

Blocking intelligence gatherers has minimal direct SEO impact since they don't control search indexing. However, if competitors use these tools to monitor your SEO strategy, blocking them might actually provide competitive advantages by limiting their access to your optimization tactics and performance data.

Does ArchiveBot Access Private Content?

Intelligence gatherers typically focus on publicly accessible business information, but their scope can vary significantly. Some limit themselves to public websites and social media, while others may attempt to access restricted databases, employee directories, or other sensitive information sources. The scope depends on the operator's objectives and ethical boundaries.

How Can I Tell if ArchiveBot Is Visiting My Website?

Setting up Agent Analytics will give you real-time visibility into ArchiveBot's visits to your website, along with those of hundreds of other AI agents, crawlers, and scrapers. It will also let you measure human traffic to your website coming from AI search and LLM chat platforms like ChatGPT, Perplexity, and Gemini.
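If you would rather start from your existing server logs, a rough self-hosted check is to count requests whose user agent contains "ArchiveBot". The Python sketch below assumes your access log uses the Apache/nginx "combined" format; the sample lines use documentation IP addresses and are invented for illustration. It counts claimed ArchiveBot visits only, since it cannot tell a genuine visit from a spoofed one.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed log layout: the "combined" format (user agent is the last quoted field).
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_archivebot_hits(lines: Iterable[str]) -> Counter:
    """Count requests per path whose user agent claims to be ArchiveBot."""
    hits: Counter = Counter()
    for line in lines:
        match = COMBINED_RE.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        if "archivebot" not in match.group("agent").lower():
            continue
        parts = match.group("request").split()
        path = parts[1] if len(parts) >= 2 else "-"
        hits[path] += 1
    return hits


# Invented sample lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.7 - - [06/Aug/2025:10:00:00 +0000] "GET /about HTTP/1.1" 200 512 "-" '
    '"ArchiveTeam ArchiveBot/20250806.050c783 (wpull 2.0.3) and not Mozilla/5.0 '
    '(Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"',
    '203.0.113.7 - - [06/Aug/2025:10:00:01 +0000] "GET /about HTTP/1.1" 200 512 "-" '
    '"ArchiveTeam ArchiveBot/20250806.050c783 (wpull 2.0.3)"',
    '198.51.100.2 - - [06/Aug/2025:10:00:02 +0000] "GET /blog HTTP/1.1" 200 900 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64)"',
]

print(count_archivebot_hits(SAMPLE_LOG))  # → Counter({'/about': 2})
```

In practice you would pass an open log file (for example `open("access.log")`) instead of `SAMPLE_LOG`, since the function accepts any iterable of lines.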

Why Is ArchiveBot Visiting My Website?

ArchiveBot's operator likely identified your site as relevant to its clients' business intelligence needs. Your site may contain information about competitors, market data, pricing, or other business insights that its monitoring system was configured to track and analyze.

How Can I Authenticate Visits From ArchiveBot?

Agent Analytics authenticates visits from many agents, telling you whether each visit actually came from that agent or was spoofed by a bad actor. This helps you identify suspicious traffic patterns and make informed decisions about blocking or allowing specific user agents.

References

https://meta.wikimedia.org/wiki/InternetArchiveBot/FAQ_for_sysadmins

What Is ArchiveBot?