Agents
AI Data Scrapers
Every known artificial agent (bot) on the internet. You can track their activity on your website with agent analytics, or control their behavior with automatic robots.txt.
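As a rough illustration of what tracking agent activity can involve, the sketch below checks a request's User-Agent header against a few names from this directory. It assumes each name appears verbatim in the header the crawler sends, which is not true for every entry (some names are robots.txt tokens only). `KNOWN_AGENT_TOKENS` and `identify_agent` are hypothetical names chosen for this example.

```python
# Hypothetical token list: a few names from this directory, assumed to
# appear verbatim in the User-Agent header the crawler sends.
KNOWN_AGENT_TOKENS = ["GPTBot", "ClaudeBot", "CCBot", "Bytespider"]


def identify_agent(user_agent_header):
    """Return the first known token found in the header, or None."""
    lowered = user_agent_header.lower()
    for token in KNOWN_AGENT_TOKENS:
        # Case-insensitive substring match; a real system would likely
        # also verify source IP ranges, since headers can be spoofed.
        if token.lower() in lowered:
            return token
    return None


if __name__ == "__main__":
    # Illustrative header values, not captured traffic.
    print(identify_agent("Mozilla/5.0 (compatible; GPTBot/1.2)"))
    print(identify_agent("Mozilla/5.0 (Windows NT 10.0) Firefox/120.0"))
```

Substring matching is only a first pass: it identifies what a client claims to be, not what it is.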
Ai2Bot-Dolma
Ai2Bot-Dolma is operated by Ai2, a non-profit AI research institute. It's used to download data to train open source AI models.
AI Data Scraper
Applebot-Extended
Applebot-Extended is used to train Apple’s foundation models, which power generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools.
AI Data Scraper
Bytespider
Bytespider is a web crawler operated by ByteDance, the Chinese owner of TikTok. It's allegedly used to download training data for its LLMs (Large Language Models), including those powering the ChatGPT competitor Doubao.
AI Data Scraper
CCBot
CCBot is Common Crawl's web crawler that creates an open repository of web data, making crawled content universally accessible for research, analysis, and AI training purposes.
AI Data Scraper
ChatGLM-Spider
ChatGLM-Spider is a web crawler operated by Zhipu AI, the Chinese company behind ChatGLM. It is used for collecting data to train and evaluate the company's large language models.
AI Data Scraper
ClaudeBot
ClaudeBot is a web crawler operated by Anthropic to download training data for its LLMs (Large Language Models) that power AI products like Claude.
AI Data Scraper
CloudVertexBot
CloudVertexBot is a Google-operated crawler available to site owners to request targeted crawls of their own sites for AI training purposes on the Vertex AI platform.
AI Data Scraper
cohere-training-data-crawler
cohere-training-data-crawler is a web crawler operated by Cohere to download training data for its LLMs (Large Language Models) that power its enterprise AI products.
AI Data Scraper
Cotoyogi
Cotoyogi is a research crawler operated by Japan's Research Organization of Information and Systems that collects web content to build AI training datasets for research purposes.
AI Data Scraper
Datenbank Crawler
Datenbank Crawler is a web crawler operated by German company netEstate used for collecting and selling international website data.
AI Data Scraper
Diffbot
Diffbot is an intelligent web crawler used to understand, aggregate, and ultimately sell structured website data for real-time monitoring and training other AI models.
AI Data Scraper
FacebookBot
FacebookBot is a web crawler used by Meta to download training data for its AI speech recognition technology.
AI Data Scraper
Google-Extended
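Google-Extended is a robots.txt product token rather than a separate crawler. Site owners use it to control whether content fetched by Google's existing crawlers may be used to train its AI products, like the Gemini assistant and its Vertex AI generative APIs.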
AI Data Scraper
GoogleOther
GoogleOther is Google's generic crawler used by various product teams for fetching publicly accessible content, including one-off crawls for internal research and development.
AI Data Scraper
GPTBot
GPTBot is OpenAI's web crawler that collects data from publicly accessible web pages to improve AI models like ChatGPT, while respecting robots.txt and opt-out preferences.
AI Data Scraper
ICC-Crawler
ICC-Crawler is a research crawler operated by Japan's National Institute of Information and Communications Technology (NICT). It automatically collects web pages from the Internet for academic research.
AI Data Scraper
Kangaroo Bot
Kangaroo Bot is used by the company Kangaroo LLM to download data to train open source AI models tailored to Australian language and culture.
AI Data Scraper
meta-externalagent
meta-externalagent is a Meta crawler that collects web content to train AI models and improve Meta's products by indexing content directly.
AI Data Scraper
netEstate Imprint Crawler
netEstate Imprint Crawler is an AI data scraper operated by netEstate. If you think this is incorrect or can provide additional detail about its purpose, please let us know.
AI Data Scraper
omgili
omgili is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models.
AI Data Scraper
PanguBot
PanguBot is a web crawler operated by the Chinese company Huawei. It's used to download training data for its multimodal LLM (Large Language Model) called PanGu.
AI Data Scraper
Timpibot
Timpibot is used by Timpi's decentralized network of independent node operators. The index they build can be used to train LLMs (Large Language Models).
AI Data Scraper
VelenPublicWebCrawler
VelenPublicWebCrawler is an AI data scraper operated by Hunter. If you think this is incorrect or can provide additional detail about its purpose, please let us know.
AI Data Scraper
webzio-extended
webzio-extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models.
AI Data Scraper
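The names above can be collected into robots.txt rules. The following is a minimal sketch, assuming each directory name is also the token the crawler matches in robots.txt (this may not hold for entries whose names contain spaces) and that the crawler honors robots.txt at all. `AI_DATA_SCRAPERS` and `build_robots_txt` are hypothetical names used only for this example.

```python
# A subset of the names listed in this directory, assumed here to be
# valid robots.txt user-agent tokens.
AI_DATA_SCRAPERS = [
    "Ai2Bot-Dolma",
    "Applebot-Extended",
    "Bytespider",
    "CCBot",
    "ClaudeBot",
    "Google-Extended",
    "GPTBot",
    "meta-externalagent",
]


def build_robots_txt(agents, disallow_path="/"):
    """Return robots.txt text with one rule group covering every agent."""
    # Several User-agent lines may share a single group of rules.
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append(f"Disallow: {disallow_path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(AI_DATA_SCRAPERS), end="")
```

Grouping all agents under one `Disallow` rule keeps the file short; separate groups would be needed to give individual crawlers different rules.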