Set Up Automatic Robots.txt
Overview
Keep your robots.txt updated with the latest known AI scrapers, crawlers, and assistants automatically.
New agents are added frequently, so setting up an automatic robots.txt that continuously updates is more effective than maintaining one manually. You can do this using the API or the WordPress plugin.
We also recommend setting up agent analytics to check whether agents are actually following your robots.txt rules.
1. Create a New Project
Sign up and create a new project for your website if you haven't already.
- Navigate to the Projects page
- Click the New Project button
- Enter your website details
- Click Create
2. Copy Your Access Token
- Click on your project
- Click Settings
- Copy your access token
3. Generate and Serve the Robots.txt
There are two ways to generate and serve a robots.txt from your website. If you're looking for a different way, please let us know.
Option 1: Using the WordPress Plugin
Use this method for WordPress websites. Adding the plugin is quick and easy.
- Log in to your website's WordPress dashboard
- Click Plugins in the sidebar
- Search for "Dark Visitors" or download the plugin directly
- Click Install Now
- Click Activate
- Click Dark Visitors in the sidebar
- Paste your access token
- Select the agent types you want to block
- Click Save Changes
Option 2: Using the API
Make a request to the Robots.txts endpoint to generate a new robots.txt. Do this periodically (e.g. once per day), then cache and serve the result.
The Request
Endpoint | |
---|---|
URL | https://api.darkvisitors.com/robots-txts |
HTTP Method | POST |
Headers | |
Authorization | A bearer token with your project's access token (e.g. Bearer 48d7dcbd-fc44-4b30-916b-2a5955c8ee42). |
Content-Type | This needs to be set to application/json. |
Body | |
agent_types | An array of agent types. Agent types include AI Assistant, AI Data Scraper, AI Search Crawler, and Undocumented AI Agent. |
disallow | A string specifying which URLs are disallowed. Defaults to / to disallow all URLs. |
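To keep the fields from the table in one place, the request can be assembled by a small helper before it is sent. This is only a sketch: buildRobotsTxtRequest and ROBOTS_TXTS_URL are hypothetical names chosen for illustration, not part of any official SDK, and the array check is an assumption about what counts as sensible input rather than documented API behavior.

```javascript
// Hypothetical helper: buildRobotsTxtRequest is an illustrative name, not an official API.
const ROBOTS_TXTS_URL = "https://api.darkvisitors.com/robots-txts"

function buildRobotsTxtRequest(accessToken, agentTypes, disallow = "/") {
    // The table describes agent_types as an array, so reject anything else early
    if (!Array.isArray(agentTypes)) {
        throw new TypeError("agentTypes must be an array of agent type strings")
    }

    return {
        url: ROBOTS_TXTS_URL,
        options: {
            method: "POST",
            headers: {
                "Authorization": "Bearer " + accessToken,
                "Content-Type": "application/json"
            },
            // disallow falls back to "/" here, mirroring the documented default
            body: JSON.stringify({ agent_types: agentTypes, disallow: disallow })
        }
    }
}
```

The returned url and options can be passed straight to fetch(request.url, request.options).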
The Response
The response body is a robots.txt in text/plain format. You can use this as is, or append additional lines to include things like sitemap directives. Cache and serve this as your website's robots.txt.
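Appending sitemap directives to the generated text might look like the sketch below. The helper name appendSitemaps and the example.com URL are placeholders invented for illustration; the only thing relied on is the standard "Sitemap: <url>" line format used in robots.txt files.

```javascript
// Hypothetical helper: appendSitemaps is an illustrative name, not part of the API.
function appendSitemaps(robotsTxt, sitemapUrls) {
    // Nothing to add, so return the generated robots.txt unchanged
    if (sitemapUrls.length === 0) {
        return robotsTxt
    }

    const sitemapLines = sitemapUrls.map((url) => "Sitemap: " + url)

    // Trim trailing whitespace so exactly one blank line separates the rules from the directives
    return [robotsTxt.trimEnd(), "", ...sitemapLines].join("\n") + "\n"
}
```

For example, appendSitemaps(robotsTxt, ["https://example.com/sitemap.xml"]) returns the generated rules followed by a blank line and a single Sitemap line.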
Example
This cURL example generates a robots.txt that blocks all known AI data scrapers and undocumented AI agents from all URLs.
curl -X POST https://api.darkvisitors.com/robots-txts \
    -H "Authorization: Bearer ${ACCESS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{
        "agent_types": [
            "AI Data Scraper",
            "Undocumented AI Agent"
        ],
        "disallow": "/"
    }'
Here's an example of how to use this in practice for a Node.js backend:
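// ACCESS_TOKEN is your project's access token from step 2
const response = await fetch("https://api.darkvisitors.com/robots-txts", {
    method: "POST",
    headers: {
        "Authorization": "Bearer " + ACCESS_TOKEN,
        "Content-Type": "application/json"
    },
    body: JSON.stringify({
        agent_types: [
            "AI Data Scraper",
            "Undocumented AI Agent"
        ],
        disallow: "/"
    })
})

// Stop on an error status so an error body is never served as robots.txt
if (!response.ok) {
    throw new Error("Robots.txt request failed with status " + response.status)
}

// Cache and serve this string as your robots.txt
const robotsTxt = await response.text()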
You can follow these examples to call the API in any language.
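The "request periodically, then cache and serve" advice could be implemented with a small in-memory cache like the one sketched here. Everything in it is an assumption made for illustration: createRobotsTxtCache, ONE_DAY_MS, and the injected fetchRobotsTxt function (which would wrap the fetch call shown above and resolve to the robots.txt text) are hypothetical names, and falling back to a stale copy on failure is one possible design choice, not documented behavior.

```javascript
// Hypothetical helper: none of these names come from the Dark Visitors API.
const ONE_DAY_MS = 24 * 60 * 60 * 1000

function createRobotsTxtCache(fetchRobotsTxt, maxAgeMs = ONE_DAY_MS, now = Date.now) {
    let cached = null   // last successfully fetched robots.txt text
    let fetchedAt = 0   // time of the last successful fetch, in milliseconds

    return async function getRobotsTxt() {
        const isFresh = cached !== null && now() - fetchedAt < maxAgeMs
        if (isFresh) {
            return cached
        }

        try {
            cached = await fetchRobotsTxt()
            fetchedAt = now()
        } catch (error) {
            // Keep serving the stale copy if one exists; otherwise surface the error
            if (cached === null) {
                throw error
            }
        }

        return cached
    }
}
```

A route handler for /robots.txt would then await getRobotsTxt() and send the result with a text/plain content type. Because the cache lives in process memory, it is rebuilt after a restart; a file or shared store would be needed to persist it.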