Nutch

What is Nutch?

About

Nutch is a scraper. Its user agent string indicates that it is built on Apache Nutch, an open-source web crawler. If you think this is incorrect or can provide additional detail about its purpose, please contact us. You can see how often Nutch visits your website by setting up Dark Visitors agent analytics.

Expected Behavior

Due to the wide variety of use cases, there's no way to accurately predict visitation behavior. Scrapers are notorious for ignoring robots.txt rules and accessing disallowed content. This is especially true if they're dispatched to achieve a specific goal rather than for some general purpose.

Type

Scraper
Downloads web content for possibly malicious purposes

Detail

Last Updated 13 minutes ago

Insights

Top Website Robots.txts

3%
3% of top websites are blocking Nutch

Country of Origin

United States
Nutch normally visits from the United States

Global Traffic

The percentage of all internet traffic coming from Scrapers

Top Visited Website Categories

News
Sports
Food and Drink
Science
Law and Government

Robots.txt

Should I Block Nutch?

Probably. Scrapers usually download publicly available internet content, which is freely accessible by default. However, you might want to block them if you don't want your content to be used for unauthorized purposes.

How Do I Block Nutch?

⚠️ Manual Robots.txt Edits Are Not Scalable
New agents are created every day, so a hand-maintained list goes stale quickly. Consider serving a continuously updating robots.txt that blocks new agents automatically.

You can block Nutch or limit its access by setting user agent token rules in your website's robots.txt. Set up Dark Visitors agent analytics to check whether it's actually following them.

User Agent String: MaxPointCrawler/Nutch-1.19 (valassis.crawler at valassis dot com)
# robots.txt
# This should block Nutch

User-agent: Nutch
Disallow: /
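
Before deploying, it can help to confirm that the rules parse the way you expect. The following is a minimal sketch using Python's standard `urllib.robotparser`; the `is_allowed` helper and the `example.com` URLs are illustrative assumptions, not part of any Dark Visitors tooling.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the robots.txt shown above.
ROBOTS_TXT = """\
# robots.txt
# This should block Nutch

User-agent: Nutch
Disallow: /
"""


def is_allowed(robots_txt: str, agent_token: str, url: str) -> bool:
    """Return True if robots_txt permits agent_token to fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent_token, url)


if __name__ == "__main__":
    # example.com is a placeholder domain.
    print(is_allowed(ROBOTS_TXT, "Nutch", "https://example.com/articles/1"))
    print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/articles/1"))
```

The check uses the bare token `Nutch` rather than the full user agent string, because parsers differ in how they match full strings against a `User-agent` line. A passing check only shows what the rules say. Whether the agent obeys them still has to be confirmed from your own traffic data.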
