Set Up Robots.txt Categories for Your AWS Website
Overview
Use a simple Lambda@Edge function to append Robots.txt Categories to your existing robots.txt file. If you don't have that capability or need a custom solution, you can alternatively use the REST API to generate and append the robots.txt rules periodically using some other method, like a cron job (a sketch of this approach follows below). Please contact us if you need help getting set up.
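For reference, here is a minimal sketch of that cron-driven alternative, assuming Node.js 18+ (for the built-in fetch) and that your access token is available in a DARK_VISITORS_ACCESS_TOKEN environment variable. The file paths and agent types below are placeholders; adjust them to your own setup.

// generate-robots-txt.mjs - regenerate robots.txt on a schedule (e.g. via cron)
import { readFile, writeFile } from "node:fs/promises"

const DARK_VISITORS_ACCESS_TOKEN = process.env.DARK_VISITORS_ACCESS_TOKEN

// Ask the Dark Visitors REST API for rules covering the chosen agent types
const response = await fetch("https://api.darkvisitors.com/robots-txts", {
    method: "POST",
    headers: {
        "Authorization": `Bearer ${DARK_VISITORS_ACCESS_TOKEN}`,
        "Content-Type": "application/json"
    },
    body: JSON.stringify({
        agent_types: ["AI Data Scraper", "Undocumented AI Agent"], // Placeholder selection
        disallow: "/"
    })
})

if (!response.ok) {
    throw new Error(`Dark Visitors API returned ${response.status}`)
}

// Append the generated rules to your hand-maintained rules and write the result
// ("robots.base.txt" and "public/robots.txt" are hypothetical paths)
const baseRobotsTXT = await readFile("robots.base.txt", "utf8")
const generatedRobotsTXT = await response.text()

await writeFile("public/robots.txt", [
    baseRobotsTXT.trim(),
    "# BEGIN Dark Visitors Managed Content",
    generatedRobotsTXT.trim(),
    "# END Dark Visitors Managed Content"
].join("\n\n"))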
Step 1: Create a Lambda Edge Function
- Open your AWS dashboard
- In the search bar at the top of the screen, type in and select Lambda
- Click the Create a function button
- Enter something like dark-visitors-robots-txt for the function name
- Expand the Change default execution role section and select Create a new role from AWS policy templates
- Create a new role with the Basic Lambda@Edge permissions template
- Click the Create function button
- Paste this code into the index.mjs file:
const DARK_VISITORS_ACCESS_TOKEN = "YOUR_ACCESS_TOKEN" // TODO: Swap in your access token
const ROBOTS_TXT_DISALLOW_PATH = "/"
const ROBOTS_TXT_AGENT_TYPES = [
    // TODO: Add blocked agent types
]

export const handler = async (event) => {
    // The origin's response to the /robots.txt request
    const thisResponse = event.Records[0].cf.response

    // The generated rules from the Dark Visitors API
    const thatResponse = await fetchRobotsTXT()

    const thisRobotsTXT = thisResponse.body || ""
    const thatRobotsTXT = thatResponse.ok ? await thatResponse.text() : ""

    // Append the managed rules to the existing robots.txt content
    const robotsTXT = [
        thisRobotsTXT.trim(),
        "# BEGIN Dark Visitors Managed Content",
        thatRobotsTXT.trim(),
        "# END Dark Visitors Managed Content"
    ].join("\n\n")

    return {
        ...thisResponse,
        status: "200",
        statusDescription: "OK",
        headers: {
            ...thisResponse.headers,
            "content-type": [{ value: "text/plain" }]
        },
        body: robotsTXT
    }
}

async function fetchRobotsTXT() {
    return fetch("https://api.darkvisitors.com/robots-txts", {
        method: "POST",
        headers: {
            "Authorization": `Bearer ${DARK_VISITORS_ACCESS_TOKEN}`,
            "Content-Type": "application/json"
        },
        body: JSON.stringify({
            agent_types: ROBOTS_TXT_AGENT_TYPES,
            disallow: ROBOTS_TXT_DISALLOW_PATH
        })
    })
}
- Navigate to the Dark Visitors Projects page and open your project
- Copy your access token from the Settings page
- Back in AWS, swap in your access token where it says YOUR_ACCESS_TOKEN
- Where it says // TODO: Add blocked agent types, add the agent types you want to block, and set ROBOTS_TXT_DISALLOW_PATH to a string specifying which URLs are disallowed (e.g. "/" to disallow all paths); see the example after this list. Allowed agent types include: "AI Agent", "AI Assistant", "AI Data Scraper", "AI Search Crawler", "Archiver", "Developer Helper", "Fetcher", "Automated Agent", "Intelligence Gatherer", "Scraper", "SEO Crawler", "Search Engine Crawler", "Security Scanner", "Undocumented AI Agent", and "Uncategorized"
- Click Deploy
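For example, to block AI data scrapers, undocumented AI agents, and generic scrapers from every path, the constants at the top of index.mjs might look like this. The selection below is only an illustration; choose the agent types that fit your site.

const DARK_VISITORS_ACCESS_TOKEN = "YOUR_ACCESS_TOKEN" // Swap in the token copied from your project's Settings page
const ROBOTS_TXT_DISALLOW_PATH = "/" // Disallow all paths for the blocked agent types
const ROBOTS_TXT_AGENT_TYPES = [
    "AI Data Scraper",
    "Undocumented AI Agent",
    "Scraper"
]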
Step 2: Create a Distribution Behavior
- Click the Actions dropdown menu and select Publish a new version
- Select the Versions tab, and click your latest version
- Copy the Function ARN
- Navigate to CloudFront in your AWS dashboard and select your distribution
- Select the Behaviors tab, and click the Create behavior button
- For Path pattern, enter /robots.txt
- In the Function associations section, select Lambda@Edge for Origin response, and paste the Function ARN
- Click Create behavior
Step 3: Test Your Integration
If your website is correctly connected, you should see the new rules appended between the Dark Visitors comment markers in your website's robots.txt.
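One quick way to check, assuming Node.js 18+ and swapping your own domain in for example.com:

// check-robots-txt.mjs - confirm the managed rules are being appended
const response = await fetch("https://example.com/robots.txt") // Replace with your domain
const robotsTXT = await response.text()

if (robotsTXT.includes("# BEGIN Dark Visitors Managed Content")) {
    console.log("Managed robots.txt rules are present")
} else {
    console.error("Managed robots.txt rules are missing")
}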