New Old Web home

Open Web

Analyzing 5,818 Publishers’ robots.txt Files: Most Non-profit News Organizations Allow AI Bots, OpenAI Most Commonly Blocked

Robots.txt is a widely used plain-text file format that lets website owners give instructions to crawlers, scrapers, spiders, and other automated systems that identify themselves with a distinct user agent. Once used mainly to allow or block search engines from accessing a site’s content, robots.txt is now being relied on by publishers for something completely new: Managing web…
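As a rough illustration of how per-user-agent rules work, the sketch below parses a small robots.txt with Python's standard `urllib.robotparser` module and checks two crawlers against it. The sample file, the `is_allowed` helper, the `ExampleSearchBot` name, and the example.org URL are all hypothetical and are not taken from the analysis; `GPTBot` is the user agent OpenAI publishes for its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block OpenAI's GPTBot, allow every other crawler.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical article URL used only for the check.
ARTICLE_URL = "https://example.org/news/story"

gptbot_allowed = is_allowed(SAMPLE_ROBOTS_TXT, "GPTBot", ARTICLE_URL)
other_allowed = is_allowed(SAMPLE_ROBOTS_TXT, "ExampleSearchBot", ARTICLE_URL)

print(f"GPTBot allowed: {gptbot_allowed}")
print(f"ExampleSearchBot allowed: {other_allowed}")
```

Note that robots.txt is advisory: a check like this only reports what the file requests, and it depends on the crawler choosing to honor those rules.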

October 15, 2025