Dark Visitors

A List of Known AI Agents on the Internet

Insight into the hidden ecosystem of autonomous chatbots and data scrapers crawling across the web. Protect your website from unwanted AI agent access.

Sign up to get notified when new AI agents are added, so you can update your website's robots.txt.
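
Before updating a robots.txt, it can help to see which agents it already blocks. The following is a minimal sketch using Python's standard `urllib.robotparser`; the function name `blocked_agents`, the sample rules, and the `https://example.com/` URL are illustrative assumptions, not part of this site. It only checks access to the site root, and it reflects how Python's parser interprets the rules, which may differ from how a particular agent actually behaves.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt, agents, url="https://example.com/"):
    """Return the agents from `agents` that `robots_txt` disallows from fetching `url`.

    `robots_txt` is the file's content as a string. The default URL is a
    placeholder (an assumption for this sketch); only its path matters.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Hypothetical robots.txt content that blocks two of the listed agents.
    sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: CCBot\nDisallow: /\n"
    print(blocked_agents(sample, ["GPTBot", "CCBot", "Applebot"]))
```

Note that robots.txt is advisory: it tells an agent what it may fetch, but does not enforce it.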
ChatGPT-User
10% blocked

ChatGPT-User is dispatched by OpenAI's ChatGPT in response to user prompts. ChatGPT's answers will usually contain a summary of the website's content rather than relaying it to the user directly.

AI Assistant
cohere-ai
1% blocked

cohere-ai is an unconfirmed agent possibly dispatched by Cohere's AI chat products in response to user prompts when it needs to retrieve content on the internet.

AI Assistant
anthropic-ai
2% blocked

anthropic-ai is an unconfirmed agent possibly used by Anthropic to download training data for its LLMs (Large Language Models) that power AI products like Claude.

AI Data Scraper
Bytespider
2% blocked

Bytespider is a web crawler operated by ByteDance, the Chinese owner of TikTok. It's allegedly used to download training data for its LLMs (Large Language Models), including those powering ChatGPT competitor Doubao.

AI Data Scraper
CCBot
13% blocked

CCBot is a web crawler used by Common Crawl to maintain an open source repository of web crawl data that is available for anyone to use. This repository has been used to train many LLMs (Large Language Models), including OpenAI's GPTs.

AI Data Scraper
FacebookBot
0% blocked

FacebookBot is a web crawler used by Meta to download training data for its AI speech recognition technology.

AI Data Scraper
Google-Extended
11% blocked

Google-Extended is a web crawler used by Google to download AI training content for its AI products like Bard and Vertex AI generative APIs.

AI Data Scraper
GPTBot
30% blocked

GPTBot is a web crawler used by OpenAI to download training data for its LLMs (Large Language Models) that power AI products like ChatGPT.

AI Data Scraper
omgili
0% blocked

Omgili is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models.

AI Data Scraper
Amazonbot
0% blocked

Amazonbot is a web crawler used by Amazon to index search results that allow the Alexa AI Assistant to answer user questions. Alexa's answers normally contain references to the website.

AI Search Crawler
Applebot
2% blocked

Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website.

AI Search Crawler
PerplexityBot
0% blocked

PerplexityBot is a web crawler used by Perplexity to index search results that allow their AI Assistant to answer user questions. The assistant's answers normally contain references to the website as inline sources.

AI Search Crawler
YouBot
0% blocked

YouBot is a web crawler used by You.com to index search results that allow their AI Assistant to answer user questions. The assistant's answers normally contain references to the website as inline sources.

AI Search Crawler
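
The agent names above can be turned into robots.txt rules. The sketch below groups the listed agents by the category shown on this page and emits one `User-agent` / `Disallow: /` group per agent for the categories you choose to block. The names `AGENTS_BY_CATEGORY` and `build_robots_txt` are hypothetical helpers introduced for illustration, and blocking the whole site with `Disallow: /` is an assumption; narrower paths may suit some sites better.

```python
# Agent names and categories are copied from the list on this page.
AGENTS_BY_CATEGORY = {
    "AI Assistant": ["ChatGPT-User", "cohere-ai"],
    "AI Data Scraper": [
        "anthropic-ai",
        "Bytespider",
        "CCBot",
        "FacebookBot",
        "Google-Extended",
        "GPTBot",
        "omgili",
    ],
    "AI Search Crawler": ["Amazonbot", "Applebot", "PerplexityBot", "YouBot"],
}


def build_robots_txt(blocked_categories):
    """Return robots.txt text disallowing every agent in the given categories."""
    lines = []
    for category in blocked_categories:
        lines.append(f"# {category}")
        for agent in AGENTS_BY_CATEGORY[category]:
            # One group per agent: the agent may not fetch any path.
            lines.append(f"User-agent: {agent}")
            lines.append("Disallow: /")
            lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example: block data scrapers but leave assistants and search crawlers alone.
    print(build_robots_txt(["AI Data Scraper"]))
```

Keeping the categories separate makes it possible to block training-data scrapers while still allowing search crawlers and assistants that reference the website in their answers.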