Set Up Automatic Robots.txt
Overview
Protect sensitive content from unwanted access and scraping. Generate a robots.txt that automatically stays up to date with all current and future bots in the categories you specify.
1. Create a New Project
- Sign up for Dark Visitors
- Navigate to the Projects page
- Click the New Project button
- Enter your website details
- Click Create
2. Copy Your Access Token
- Navigate to the Projects page
- Click on your project
- Click Settings
- Copy your access token
3. Serve the Robots.txt
- Log in to your website's WordPress dashboard
- Click Plugins in the sidebar
- Search for "Dark Visitors" or download the plugin directly
- Click Install Now
- Click Activate
- Click Dark Visitors in the sidebar
- Paste your access token
- Select the agent types you want to block
- Click Save Changes
Install the Package
Install the package from npm using the command line.
npm install @darkvisitors/sdk
Initialize the Client
Create a new instance of DarkVisitors with your access token.
import { AgentType, DarkVisitors } from "@darkvisitors/sdk"
const darkVisitors = new DarkVisitors(YOUR_ACCESS_TOKEN)
Generate a Robots.txt
Select which AgentTypes you want to block, along with a string specifying which URLs are disallowed (e.g. "/" to disallow all paths).
const robotsTxt = await darkVisitors.generateRobotsTxt([
AgentType.AIDataScraper,
AgentType.Scraper,
AgentType.IntelligenceGatherer,
AgentType.SEOCrawler
], "/")
The return value is a plain text robots.txt. You can use this as is, or append additional lines to include things like sitemap links. Do this periodically (e.g. once per day), then cache and serve robotsTxt from your website's /robots.txt endpoint.
Make an HTTP request to the REST API from any codebase or programming language.
Generate a Robots.txt
Call the API to generate a new robots.txt periodically (e.g. once per day). Cache and serve this text from your website's /robots.txt path.
| Field | Value |
|---|---|
| URL | https://api.darkvisitors.com/robots-txts |
| HTTP Method | POST |

Headers

| Header | Description |
|---|---|
| Authorization | A bearer token with your project's access token (e.g. Bearer 48d7-fc44-4b30-916b-2a59). |
| Content-Type | This needs to be set to application/json. |

Body

| Field | Description |
|---|---|
| agent_types | An array of agent types you want to block or set a rule for. Allowed agent types include: AI Agent, AI Assistant, AI Data Scraper, AI Search Crawler, Archiver, Developer Helper, Fetcher, Headless Agent, Intelligence Gatherer, Scraper, SEO Crawler, Search Engine Crawler, Security Scanner, Undocumented AI Agent, Uncategorized. |
| disallow | A string specifying which URLs are disallowed. Defaults to / to disallow all URLs. |
Example
curl -X POST https://api.darkvisitors.com/robots-txts \
-H "Authorization: Bearer ${YOUR_ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"agent_types": [
"AI Data Scraper",
"Scraper",
"Intelligence Gatherer",
        "SEO Crawler"
    ],
"disallow": "/"
}'
Serve the Response
The response body is a robots.txt in text/plain format. You can use this as is, or append additional lines to include things like sitemap links.
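As a sketch of that append step, a small helper can add a sitemap link to the response text before you cache and serve it (the sitemap URL here is a placeholder):

```python
def append_sitemap(robots_txt: str, sitemap_url: str) -> str:
    """Append a Sitemap directive to a generated robots.txt."""
    # Make sure the generated text ends with a newline before appending
    if not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + f"\nSitemap: {sitemap_url}\n"

# Example: extend the API response before caching and serving it
robots_txt = append_sitemap(
    "User-agent: GPTBot\nDisallow: /",
    "https://example.com/sitemap.xml",  # placeholder sitemap URL
)
```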
The Shopify integration is in the works. If you want early access, please contact us.
Generate a Robots.txt
Define a function that makes an HTTP request to the REST API. Select which agent types you want to block, along with a string specifying which URLs are disallowed (e.g. "/" to disallow all paths). Allowed agent types include:
AI Agent
AI Assistant
AI Data Scraper
AI Search Crawler
Archiver
Developer Helper
Fetcher
Headless Agent
Intelligence Gatherer
Scraper
SEO Crawler
Search Engine Crawler
Security Scanner
Undocumented AI Agent
Uncategorized
import asyncio

import aiohttp

async def generate_dark_visitors_robots_txt() -> str:
async with aiohttp.ClientSession() as session:
try:
async with session.post(
"https://api.darkvisitors.com/robots-txts",
headers={
"Authorization": f"Bearer {YOUR_ACCESS_TOKEN}",
"Content-Type": "application/json",
},
json={
"agent_types": [
"AI Data Scraper",
"Scraper",
"Intelligence Gatherer",
"SEO Crawler",
],
"disallow": "/",
},
) as response:
response.raise_for_status()
return await response.text()
except aiohttp.ClientResponseError as error:
raise RuntimeError(f"Invalid response code fetching robots.txt: {error.status}") from error
except aiohttp.ClientError as error:
raise RuntimeError(f"Error fetching robots.txt: {error}") from error
Serve the Response
The return value is a plain text robots.txt.
robots_txt = asyncio.run(generate_dark_visitors_robots_txt())
You can use this as is, or append additional lines to include things like sitemap links. Do this periodically (e.g. once per day), then cache and serve this text from your website's /robots.txt path.
Generate a Robots.txt
Define a function that makes an HTTP request to the REST API. Select which agent types you want to block, along with a string specifying which URLs are disallowed (e.g. "/" to disallow all paths). Allowed agent types include:
AI Agent
AI Assistant
AI Data Scraper
AI Search Crawler
Archiver
Developer Helper
Fetcher
Headless Agent
Intelligence Gatherer
Scraper
SEO Crawler
Search Engine Crawler
Security Scanner
Undocumented AI Agent
Uncategorized
function generate_dark_visitors_robots_txt() {
$curl = curl_init('https://api.darkvisitors.com/robots-txts');
curl_setopt_array($curl, [
CURLOPT_POST => true,
CURLOPT_HTTPHEADER => [
'Authorization: Bearer ' . $YOUR_ACCESS_TOKEN,
'Content-Type: application/json',
],
CURLOPT_POSTFIELDS => json_encode([
'agent_types' => [
'AI Data Scraper',
'Scraper',
'Intelligence Gatherer',
'SEO Crawler',
],
'disallow' => '/'
], JSON_UNESCAPED_SLASHES),
CURLOPT_RETURNTRANSFER => true,
]);
$response = curl_exec($curl);
if ($response === false) {
$error = curl_error($curl);
curl_close($curl);
throw new RuntimeException('Error fetching robots.txt: ' . $error);
}
$status = curl_getinfo($curl, CURLINFO_RESPONSE_CODE);
curl_close($curl);
if ($status < 200 || $status >= 300) {
throw new RuntimeException('Invalid response code fetching robots.txt: ' . $status);
}
return $response;
}
Serve the Response
The return value is a plain text robots.txt.
$robots_txt = generate_dark_visitors_robots_txt();
You can use this as is, or append additional lines to include things like sitemap links. Do this periodically (e.g. once per day), then cache and serve this text from your website's /robots.txt path.