Set Up Your Robots.txt
Overview
Keep your robots.txt up to date with the agent list automatically. The agent list is updated continuously, so generating your robots.txt with the WordPress plugin or the API is more effective than maintaining one manually.
1. Create a New Project
Sign up and create a new project for your website if you haven't already.
- Navigate to the Projects page
- Click the New Project button
- Enter your website details
- Click Create
2. Copy Your Access Token
- Click on your project
- Click Settings
- Copy your access token
3. Generate and Serve the Robots.txt
There are three ways to generate and serve a robots.txt from your website.
Option 1: Using the WordPress Plugin
Use this method for WordPress websites. Adding the plugin is quick and easy.
- Download the plugin
- Log in to your website's WordPress dashboard
- Click Plugins in the sidebar
- Click Add New Plugin
- Click Upload Plugin
- Choose the downloaded plugin .zip file
- Click Install Now
- Click Activate Plugin. Dark Visitors should now appear in the sidebar.
- Click Dark Visitors in the sidebar
- Paste your access token
- Select the agent types you want to block
- Click Save Changes
Option 2: Using the API
Make a request to the Robots.txts endpoint to generate a new robots.txt. Do this periodically (e.g. once per day), then cache and serve the result.
The Request
| Endpoint | |
|---|---|
| URL | https://api.darkvisitors.com/robots-txts |
| HTTP Method | POST |

| Headers | |
|---|---|
| Authorization | A bearer token with your project's access token (e.g. Bearer 48d7dcbd-fc44-4b30-916b-2a5955c8ee42). |

| Body | |
|---|---|
| agent_types | An array of agent types. Agent types include AI Assistant, AI Data Scraper, AI Search Crawler, and Undocumented AI Agent. |
| disallow | A string specifying which URLs are disallowed. Defaults to / to disallow all URLs. |
The Response
The response body is a robots.txt in text/plain format. You can use it as is, or append additional lines to include things like sitemap directives, as in the snippet below. Serve this as your website's robots.txt.
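Because the response is plain text, appending a directive is simple string concatenation. A minimal sketch, assuming the generated text is already in a string (generatedRobotsTXT is a hypothetical variable, and the sitemap URL is a placeholder for your own):
// generatedRobotsTXT holds the text/plain response body from the API
// The sitemap URL below is a placeholder for your own
const finalRobotsTXT = generatedRobotsTXT + "\nSitemap: https://www.example.com/sitemap.xml\n"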
Example
This JavaScript example generates a robots.txt that blocks all known AI data scrapers and undocumented AI agents from all URLs.
const response = await fetch("https://api.darkvisitors.com/robots-txts", {
    method: "POST",
    headers: {
        "Authorization": "Bearer " + ACCESS_TOKEN,
        "Content-Type": "application/json"
    },
    body: JSON.stringify({
        agent_types: [
            "AI Data Scraper",
            "Undocumented AI Agent"
        ],
        disallow: "/"
    })
})

// Cache and serve this from your website's /robots.txt path
const robotsTXT = await response.text()
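Here's a minimal sketch of the "generate periodically, then cache and serve" pattern, assuming a Node.js 18+ server (for the global fetch) using Express. The environment variable name and port are placeholders, not part of the API.
const express = require("express")

const ACCESS_TOKEN = process.env.DARK_VISITORS_ACCESS_TOKEN // Placeholder variable name
const ONE_DAY_MS = 24 * 60 * 60 * 1000

let cachedRobotsTXT = "" // Empty until the first successful refresh

async function refreshRobotsTXT() {
    try {
        const response = await fetch("https://api.darkvisitors.com/robots-txts", {
            method: "POST",
            headers: {
                "Authorization": "Bearer " + ACCESS_TOKEN,
                "Content-Type": "application/json"
            },
            body: JSON.stringify({
                agent_types: ["AI Data Scraper", "Undocumented AI Agent"],
                disallow: "/"
            })
        })

        if (response.ok) {
            cachedRobotsTXT = await response.text()
        }
    } catch (error) {
        // Keep serving the last cached copy if a refresh fails
        console.error("Failed to refresh robots.txt:", error)
    }
}

// Refresh once at startup, then once per day
refreshRobotsTXT()
setInterval(refreshRobotsTXT, ONE_DAY_MS)

const app = express()

app.get("/robots.txt", (request, response) => {
    response.type("text/plain").send(cachedRobotsTXT)
})

app.listen(3000)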
Option 3: Using the Boilerplate
The boilerplate below blocks all currently known AI data scrapers and undocumented AI agents. You can use it as a starting point and customize it manually using the agent list. If you sign up, you'll get notified when new agents are added. However, using the WordPress plugin or the API is a more effective strategy, since the agent list is updated continuously.
# Dark Visitors Robots.txt
# AI Data Scraper
# https://darkvisitors.com/agents/bytespider
User-agent: Bytespider
Disallow: /
# AI Data Scraper
# https://darkvisitors.com/agents/ccbot
User-agent: CCBot
Disallow: /
# AI Data Scraper
# https://darkvisitors.com/agents/diffbot
User-agent: Diffbot
Disallow: /
# AI Data Scraper
# https://darkvisitors.com/agents/facebookbot
User-agent: FacebookBot
Disallow: /
# AI Data Scraper
# https://darkvisitors.com/agents/google-extended
User-agent: Google-Extended
Disallow: /
# AI Data Scraper
# https://darkvisitors.com/agents/gptbot
User-agent: GPTBot
Disallow: /
# AI Data Scraper
# https://darkvisitors.com/agents/omgili
User-agent: omgili
Disallow: /
# Undocumented AI Agent
# https://darkvisitors.com/agents/anthropic-ai
User-agent: anthropic-ai
Disallow: /
# Undocumented AI Agent
# https://darkvisitors.com/agents/claude-web
User-agent: Claude-Web
Disallow: /
# Undocumented AI Agent
# https://darkvisitors.com/agents/claudebot
User-agent: ClaudeBot
Disallow: /
# Undocumented AI Agent
# https://darkvisitors.com/agents/cohere-ai
User-agent: cohere-ai
Disallow: /