Dark Visitors
A List of Known AI Agents on the Internet
Insight into the hidden ecosystem of autonomous chatbots and data scrapers crawling across the web. Protect your website from unwanted AI agent access.
ChatGPT-User is dispatched by OpenAI's ChatGPT in response to user prompts. ChatGPT's answers will usually contain a summary of the content on the website, along with a reference link.
Bytespider is a web crawler operated by ByteDance, the Chinese owner of TikTok. It's allegedly used to download training data for its LLMs (Large Language Models), including those powering ChatGPT competitor Doubao.
CCBot is a web crawler used by Common Crawl to maintain an open source repository of web crawl data that is available for anyone to use. This repository has been used to train many LLMs (Large Language Models), including OpenAI's GPTs.
Diffbot is an intelligent web crawler used to understand, aggregate, and ultimately sell structured website data for real-time monitoring and training other AI models.
FacebookBot is a web crawler used by Meta to download training data for its AI speech recognition technology.
Google-Extended is a web crawler used by Google to download AI training content for its AI products like the Gemini assistant and its Vertex AI generative APIs.
GPTBot is a web crawler used by OpenAI to download training data for its LLMs (Large Language Models) that power AI products like ChatGPT.
Omgili is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models.
Amazonbot is a web crawler used by Amazon to index search results that allow the Alexa AI Assistant to answer user questions. Alexa's answers normally contain references to the website.
Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website.
PerplexityBot is a web crawler used by Perplexity to index search results that allow their AI Assistant to answer user questions. The assistant's answers normally contain references to the website as inline sources.
YouBot is a web crawler used by You.com to index search results that allow their AI Assistant to answer user questions. The assistant's answers normally contain references to the website as inline sources.
anthropic-ai is an unconfirmed agent possibly used by Anthropic to download training data for its LLMs (Large Language Models) that power AI products like Claude.
Claude-Web is an AI-related agent operated by Anthropic. It's currently unclear exactly what it's used for, since there's no official documentation. If you can provide more detail, please contact us.
ClaudeBot is an AI-related agent operated by Anthropic. It's currently unclear exactly what it's used for, since there's no official documentation. If you can provide more detail, please contact us.
cohere-ai is an unconfirmed agent possibly dispatched by Cohere's AI chat products in response to user prompts when it needs to retrieve content on the internet.
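As a rough illustration of how a site owner might act on the list above, the sketch below generates robots.txt rules that disallow each listed agent, then checks the result with Python's standard-library parser. The agent tokens are copied from the names in this list. The helper names (build_robots_txt, is_blocked) and the example.com URL are hypothetical. It also assumes each agent identifies itself with that token and honors robots.txt, which is advisory and not guaranteed for every agent.

```python
from urllib.robotparser import RobotFileParser

# Agent names as they appear in the list above (assumed to be the
# tokens each agent matches against in robots.txt).
AI_AGENT_TOKENS = [
    "ChatGPT-User", "Bytespider", "CCBot", "Diffbot", "FacebookBot",
    "Google-Extended", "GPTBot", "Omgili", "Amazonbot", "Applebot",
    "PerplexityBot", "YouBot", "anthropic-ai", "Claude-Web",
    "ClaudeBot", "cohere-ai",
]


def build_robots_txt(tokens):
    """Return robots.txt text with one 'Disallow: /' group per token."""
    groups = [f"User-agent: {token}\nDisallow: /" for token in tokens]
    return "\n\n".join(groups) + "\n"


def is_blocked(robots_txt, user_agent, url="https://example.com/"):
    """Report whether the given rules disallow user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_robots_txt(AI_AGENT_TOKENS)
    print(rules)
    print("GPTBot blocked:", is_blocked(rules, "GPTBot"))
    print("Ordinary browser blocked:", is_blocked(rules, "Mozilla/5.0"))
```

To allow some agents, such as the assistants that cite the website in their answers, leave their tokens out of the list before generating the file. Agents that ignore robots.txt would need to be blocked at the server or firewall level instead.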