Nutch
What is Nutch?
About
Nutch is a scraper. If you think this is incorrect or can provide additional detail about its purpose, please contact us. You can see how often Nutch visits your website by setting up Dark Visitors agent analytics.
Expected Behavior
Due to the wide variety of use cases, there's no way to accurately predict visitation behavior. Scrapers are notorious for ignoring robots.txt rules and accessing disallowed content. This is especially true if they're dispatched to achieve a specific goal rather than for some general purpose.
Type | Detail |
Last Updated | 13 minutes ago |
Robots.txt
Should I Block Nutch?
Probably. Scrapers usually download publicly available internet content, which is freely accessible by default. However, you might want to block them if you don't want your content to be used for unauthorized purposes.
How Do I Block Nutch?
You can block Nutch or limit its access by setting user agent token rules in your website's robots.txt. Set up Dark Visitors agent analytics to check whether it's actually following them.
User Agent String | MaxPointCrawler/Nutch-1.19 (valassis.crawler at valassis dot com) |
# robots.txt
# This should block Nutch
User-agent: Nutch
Disallow: /
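
Before deploying the rule, it can help to confirm how a standards-following parser reads it. The sketch below is one way to do that with Python's standard-library `urllib.robotparser`. The `is_allowed` helper and the `example.com` URL are hypothetical names used only for illustration, and the check says nothing about whether a given Nutch deployment actually honors robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The rules from the snippet above, inlined so the check runs offline.
ROBOTS_TXT = """\
# robots.txt
# This should block Nutch
User-agent: Nutch
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Hypothetical helper for illustration; it wraps the standard-library parser.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder; substitute a URL on your own site.
    print(is_allowed(ROBOTS_TXT, "Nutch", "https://example.com/page"))      # False
    print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/page"))  # True
```

The check passes the agent token (`Nutch`), not the full user agent string, because robots.txt rules are matched against the token. Agents with no matching group and no `*` group remain allowed, which is why the second call returns `True`.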