Google has quietly updated its list of user-triggered fetchers with new documentation for Google NotebookLM. The significance of this seemingly minor change is that it makes clear that Google NotebookLM will not obey robots.txt.
Google NotebookLM
NotebookLM is an AI research and writing tool that lets users add a web page URL. It then processes the content and lets them ask a range of questions and generate summaries based on that content.
Google’s tool can automatically create an interactive mind map that organizes topics from a website and extracts takeaways from it.
User-Triggered Fetchers Ignore Robots.txt
Google User-Triggered Fetchers are web agents that are triggered by users and by default ignore the robots.txt protocol.
According to Google’s User-Triggered Fetchers documentation:
“Because the fetch was requested by a user, these fetchers generally ignore robots.txt rules.”
Google-NotebookLM Ignores Robots.txt
The purpose of robots.txt is to give publishers control over bots that index web pages. But agents like the Google-NotebookLM fetcher aren’t indexing web content. They are acting on behalf of users who are interacting with the website content through Google’s NotebookLM.
How To Block NotebookLM
Google uses the Google-NotebookLM user agent when extracting website content. So, publishers who wish to block users from accessing their content through NotebookLM can create rules that automatically block that user agent. For example, a simple solution for WordPress publishers is to use Wordfence to create a custom rule that blocks all website visitors using the Google-NotebookLM user agent.
Another way to do it is with .htaccess using the following rule:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Google-NotebookLM [NC]
RewriteRule .* - [F,L]
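For sites that are not served by Apache, the same idea can be applied in application code. The following is a minimal sketch, not an official method: it assumes a Python site served through WSGI, and it assumes the fetcher's User-Agent header contains the token Google-NotebookLM (matched case-insensitively, like the [NC] flag above). The names block_notebooklm and hello_app are hypothetical and used only for illustration.

```python
# Assumption: the fetcher's User-Agent header contains this token.
# It is lowercased here so the comparison is case-insensitive.
BLOCKED_UA_TOKEN = "google-notebooklm"


def block_notebooklm(app):
    """Wrap a WSGI app and answer 403 Forbidden to the blocked user agent."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_UA_TOKEN in user_agent.lower():
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Any other visitor is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Hypothetical placeholder standing in for the real site."""
    body = b"Hello"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Object a WSGI server would be pointed at.
application = block_notebooklm(hello_app)
```

Like the .htaccess rule, this relies only on the user agent string, so it affects requests that identify themselves as Google-NotebookLM and nothing else.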
