Agents
AI Data Scrapers
Every known artificial agent (bot) on the internet. You can track their activity on your website with Agent Analytics, or control their behavior with Robots.txt Categories.
Ai2Bot-Dolma
Ai2Bot-Dolma is operated by Ai2, a non-profit AI research institute. It's used to download data to train open source AI models.
AI Data Scraper
Applebot-Extended
Applebot-Extended is used to train Apple’s foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools.
AI Data Scraper
Bytespider
Bytespider is a web crawler operated by ByteDance, the Chinese owner of TikTok. It's allegedly used to download training data for its LLMs (Large Language Models), including those powering the ChatGPT competitor Doubao.
AI Data Scraper
CCBot
CCBot is Common Crawl's web crawler that creates an open repository of web data, making crawled content universally accessible for research, analysis, and AI training purposes.
AI Data Scraper
ChatGLM-Spider
ChatGLM-Spider is a web crawler operated by Zhipu AI, the Chinese company behind ChatGLM. It is used for collecting data to train and evaluate the company's large language models.
AI Data Scraper
ClaudeBot
ClaudeBot is a web crawler operated by Anthropic to download training data for its LLMs (Large Language Models) that power AI products like Claude.
AI Data Scraper
CloudVertexBot
CloudVertexBot is a Google-operated crawler available to site owners to request targeted crawls of their own sites for AI training purposes on the Vertex AI platform.
AI Data Scraper
cohere-training-data-crawler
cohere-training-data-crawler is a web crawler operated by Cohere to download training data for its LLMs (Large Language Models) that power its enterprise AI products.
AI Data Scraper
Cotoyogi
Cotoyogi is a research crawler operated by Japan's Research Organization of Information and Systems that collects web content to build AI training datasets for research purposes.
AI Data Scraper
Datenbank Crawler
Datenbank Crawler is a web crawler operated by German company netEstate used for collecting and selling international website data.
AI Data Scraper
Diffbot
Diffbot is an intelligent web crawler used to understand, aggregate, and ultimately sell structured website data for real-time monitoring and training other AI models.
AI Data Scraper
FacebookBot
FacebookBot is a web crawler used by Meta to download training data for its AI speech recognition technology.
AI Data Scraper
Google-Extended
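Google-Extended is a robots.txt product token that lets site owners control whether content crawled by Google is used as AI training content for its AI products like the Gemini assistant and its Vertex AI generative APIs. It has no separate user agent string of its own; the crawling is done by Google's existing crawlers.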
AI Data Scraper
GoogleOther
GoogleOther is Google's generic crawler used by various product teams for fetching publicly accessible content, including one-off crawls for internal research and development.
AI Data Scraper
GPTBot
GPTBot is OpenAI's web crawler that collects data from publicly accessible web pages to improve AI models like ChatGPT, while respecting robots.txt and opt-out preferences.
AI Data Scraper
ICC-Crawler
ICC-Crawler is a research crawler operated by Japan's National Institute of Information and Communications Technology (NICT). It automatically collects web pages from the Internet for academic research.
AI Data Scraper
imageSpider
imageSpider is a web crawler operated by ByteDance, the company behind TikTok, Douyin, and other content platforms. The bot collects images from websites across the internet, likely to support ByteDance's various AI products.
AI Data Scraper
Kangaroo Bot
Kangaroo Bot is used by the company Kangaroo LLM to download data to train open source AI models tailored to Australian language and culture.
AI Data Scraper
laion-huggingface-processor
laion-huggingface-processor is a web crawler operated by LAION (Large-scale Artificial Intelligence Open Network), a non-profit organization that creates open datasets for AI research. This bot collects images and associated metadata from websites to build large-scale datasets like LAION-5B, which are used to train AI models including text-to-image generators.
AI Data Scraper
LCC
LCC is a web crawler operated by the University of Leipzig that collects text data from websites to build large-scale linguistic corpora for research purposes. The bot gathers multilingual text content to support the Wortschatz project, which creates comprehensive language resources and dictionaries for natural language processing and computational linguistics research.
AI Data Scraper
meta-externalagent
meta-externalagent is a web crawler operated by Meta. It crawls web content for training AI models and for improving Meta's products by indexing content directly.
AI Data Scraper
netEstate Imprint Crawler
netEstate Imprint Crawler is an AI data scraper operated by netEstate.
AI Data Scraper
omgili
omgili is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models.
AI Data Scraper
PanguBot
PanguBot is a web crawler operated by the Chinese company Huawei. It's used to download training data for its multimodal LLM (Large Language Model) called PanGu.
AI Data Scraper
SBIntuitionsBot
SBIntuitionsBot is a web crawler operated by SB Intuitions, a Japanese company that develops generative language models optimized for the Japanese language and culture. This bot collects data from websites to train and improve their language models, with all collected data stored and managed within Japan.
AI Data Scraper
Spider
Spider is a web crawler designed for AI projects, including AI agents, LLMs, RAG systems, and data analysis. It collects and converts web data into multiple formats including markdown, HTML, and text for AI training and fine-tuning purposes.
AI Data Scraper
Timpibot
Timpibot is used by Timpi's decentralized network of independent node operators. The index they build can be used to train LLMs (Large Language Models).
AI Data Scraper
VelenPublicWebCrawler
VelenPublicWebCrawler is a web crawler developed by Velen for Hunter that analyzes millions of publicly accessible internet pages every month. The bot builds business datasets and machine learning models while crawling respectfully with a minimum 2-second delay between requests.
AI Data Scraper
webzio-extended
webzio-extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models.
AI Data Scraper
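The agent names listed on this page can be used as user agent tokens in a robots.txt file. As a minimal sketch of that idea (not a description of how Robots.txt Categories works), the Python below builds one robots.txt group that disallows a handful of these agents and checks the result with the standard library's `urllib.robotparser`. The helper names `build_robots_txt` and `is_allowed`, the subset of agents, and the `example.com` URL are all illustrative assumptions. Note that robots.txt is advisory: it only affects crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# A subset of the agent names from this page, chosen for illustration only.
AI_DATA_SCRAPERS = [
    "Ai2Bot-Dolma",
    "Bytespider",
    "CCBot",
    "ClaudeBot",
    "GPTBot",
    "meta-externalagent",
]


def build_robots_txt(agents, disallow="/"):
    """Return robots.txt text with a single group covering every listed agent."""
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append(f"Disallow: {disallow}")
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, agent, url):
    """Check whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


robots_txt = build_robots_txt(AI_DATA_SCRAPERS)
print(robots_txt)

# Hypothetical URL; any path on the site is covered by "Disallow: /".
print(is_allowed(robots_txt, "GPTBot", "https://example.com/blog/post"))
print(is_allowed(robots_txt, "Googlebot", "https://example.com/blog/post"))
```

Listing several `User-agent` lines before a single `Disallow` rule keeps the file short; agents that are not named in the group, such as a regular search crawler, remain unaffected.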