Friday, January 8, 2010

News International starts blocking search engine spiders....

.... but only the ones it can afford to set its lawyers on.

Today we have had a further development in the ongoing battle between news aggregators and Rupert Murdoch's News International. In fact, it's one that aggregator NewsNow has felt strongly enough to issue a press release about today.

For those who aren't familiar with NewsNow's service, it spiders news sites in a similar way to other search sites such as Yahoo and Google News. It then builds links to these articles from its own site, thereby sending traffic off to the content it finds (helping those sites build visitor traffic, which sells more adverts, which makes the sites more money).

News International has previously stated that it will delist itself from search engines in 2010. We're all watching for them to make themselves invisible to the search spiders of Google and others (though presumably not Microsoft's, if News International and Microsoft get into bed together as rumoured).

So as I've stated before, the easy way to do this is to make changes to a simple file called robots.txt, which sits on the server hosting the content. If you use this file to tell a spider not to index the site, it won't; it's as simple as that. You can even go so far as to tell a specific spider (e.g. Google's Googlebot) that you don't want it to look through all or part of your site.
I have done this in the past when I've had a particularly fragile site that couldn't take all its pages being indexed at once, and some site owners do it when they don't want their content read and stolen... which is fair enough.
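For illustration, a robots.txt covering both cases might look something like this (the /admin/ path is just a made-up example):

# Keep every spider out of one section, but let them index the rest
User-agent: *
Disallow: /admin/

# Tell Google's spider specifically to stay away from the whole site
User-agent: Googlebot
Disallow: /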

So what have The Times Online done? Well, if you look at their robots.txt file you will see that they have entered the lines:
#Agent Specific Disallowed Sections
User-agent: NewsNow
Disallow: /

Now for those who don't speak fluent search engine spider, I'll translate...
"Dear NewsNow spider, go away"
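If you want to check for yourself what a robots.txt actually blocks, Python's built-in urllib.robotparser can do it. Here's a quick sketch that parses the lines above and asks whether each spider is allowed to fetch the front page (the module names are standard; the snippet itself is just my illustration, not anything The Times or NewsNow publish):

```python
import urllib.robotparser

# The lines currently served by The Times Online, as quoted above
ROBOTS_TXT = """\
#Agent Specific Disallowed Sections
User-agent: NewsNow
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# NewsNow's spider is barred from the whole site...
print(parser.can_fetch("NewsNow", "/"))
# ...while Googlebot, with no rule naming it, is still welcome
print(parser.can_fetch("Googlebot", "/"))
```

Run it and you'll see False for NewsNow and True for Googlebot, which is exactly the point of the next paragraph.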

I'll leave you to decide if The Times are playing fair in this matter. However, NewsNow's boss Struan Bartlett is fairly clear on it:
“The question remains whether News International, in arbitrarily blocking individual search engines, is trying to use its muscle to gain unreasonable control over the public’s freedom to choose the way they access information and news online."

I personally want to know why The Times has not done the same to block the Googlebot. Could it be that it brings in far too much revenue for them right now?

The battle continues!