Thursday, February 27, 2014

Planet Atlantides grows up and gets its own user-agent string

So, sobered by recent spelunking and bad-bot-chasing in various server logs, and convicted by sage advice in the UniversalFeedParser documentation that everyone ought to follow, I have customized the bot that Planet Atlantides uses to fetch web feeds so that it identifies itself unambiguously to the web servers from which it requests those feeds.

Here's the explanatory text I just posted to the Planet Atlantides home page. Please let me know if you have suggestions or critiques.

Feed reading, bots, and user agents

As implied above, Planet Atlantides uses Sam Ruby's "Venus" branch of the Planet "river of news" feed reader. That code is written in the Python language and uses an earlier version of the Universal Feed Parser library for fetching web feeds (RSS and Atom formats). Out of the box, its HTTP requests use the feed parser's default user agent string, so your server logs will only have recorded "UniversalFeedParser/4.2-pre-274-svn +http://feedparser.org/" when our copy of the software pulled your feed in the past.
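For readers who want to see what overriding a default user agent looks like in Python, here is a minimal sketch. It uses only the standard library rather than the feed parser or Venus, so it is not the code that Planet Atlantides actually runs; the function name and the example.org feed URL are placeholders of my own.

```python
import urllib.request

# The string a server would see in its logs for this request.
USER_AGENT = "PlanetAtlantidesFeedBot/0.2 +http://planet.atlantides.org/"


def build_feed_request(feed_url, user_agent=USER_AGENT):
    """Return a Request that announces user_agent instead of the library default."""
    return urllib.request.Request(feed_url, headers={"User-Agent": user_agent})


# Placeholder URL: nothing is fetched until the request is passed to urlopen().
request = build_feed_request("http://example.org/feed.atom")
```

Passing `request` to `urllib.request.urlopen()` would send the header along with the fetch. The "name/version +URL" shape of the string gives a server operator both a label to search for and a page to visit for an explanation.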

Effective 27 February 2014, the Planet Atlantides production version of the code now identifies itself with the following user agent string: "PlanetAtlantidesFeedBot/0.2 +http://planet.atlantides.org/". Production code runs on a machine with the IP address 66.35.62.81, and never runs more than once per hour. Apart from a one-time set of test episodes on 27 February 2014 itself, log entries recording our user agent string and a different IP address represent spoofing by a potential bad actor other than me and my automagical bot. You should nuke them from orbit; it's the only way to be sure. Note that from time to time I may run test code from other IP addresses, but in future I will use a user agent string beginning with "PlanetAtlantidesTestBot" for such runs. You can expect them to be infrequent and irregular.
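If you want to apply the rule above to your own logs, something like the following sketch might help. It assumes you have already pulled the client IP address and user agent string out of each log entry; the function name and the category labels are my own inventions, and it ignores the one-time exception for the test episodes on 27 February 2014.

```python
PRODUCTION_IP = "66.35.62.81"
FEED_BOT_PREFIX = "PlanetAtlantidesFeedBot"
TEST_BOT_PREFIX = "PlanetAtlantidesTestBot"


def classify_request(ip, user_agent):
    """Label one log entry according to the rules described above."""
    if user_agent.startswith(TEST_BOT_PREFIX):
        # Test runs may come from any IP address.
        return "test"
    if user_agent.startswith(FEED_BOT_PREFIX):
        # The production bot only ever runs from one machine.
        return "production" if ip == PRODUCTION_IP else "suspect"
    return "other"


label = classify_request(
    "192.0.2.10",  # documentation-range address standing in for a stranger
    "PlanetAtlantidesFeedBot/0.2 +http://planet.atlantides.org/",
)
```

Here `label` comes out as "suspect", since the production user agent string arrived from an address other than 66.35.62.81. A "test" label only means the request claims to be a test run; that string can be spoofed as easily as any other.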

Please email me if you have any questions about Planet Atlantides, its bot, or these user agent strings. In particular, if you put something like "PlanetAtlantidesBot is messing up my site" in your subject line, I'll look at it and respond as quickly as I can.
