Then we realized that a hit did not necessarily mean a unique visitor. In fact, an unscrupulous webmaster could generate thousands of hits on his own site to fool advertisers into thinking his site was much busier than it was.
Then we built some technology to examine these hits and weed out the ones that were from the same IP.
Then we realized that was flawed because returning visitors are actually valuable and worth counting, so we started to speak of page views.
We then came to realize that the HTTP protocol is stateless and was never meant to produce meaningful stats. So we looked outward and started to depend on third parties like Google and Alexa to tell us how important a web site was. In the end, we've come to an uneasy truce with web stats. It's accepted that there is no single way to assess the amount of traffic a website receives, nor is there any way to concretely rank a site's importance on the web. We've accepted that we're 'close enough' and moved on.
This same scenario is playing out in podcasting.
First we decided that the number of hits on a feed was a good indicator of listenership. Then we realized that RSS readers can hit a site every ten minutes to see if there's new content, so a single subscriber can account for well over a hundred hits a day.
Then we tried to apply the concept of unique RSS hits. That not only has the same problem as unique web hits with respect to returning listeners, but routers are so much more commonplace now that hundreds and perhaps thousands of listeners can be represented by a single corporate IP.
Then we started mucking around with time intervals. If the same IP hits the feed again within 10 minutes, it's the same guy. If it's 30 minutes later, we'll count that as a new guy.
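To make that interval rule concrete, here is a rough sketch in Python. It is only an illustration, not how any particular stats package does it: the function name `count_listeners`, the `(ip, timestamp)` input format, and the 10-minute default window are all assumptions made for this example. A hit counts as a new listener only if the same IP has not been counted within the window.

```python
from datetime import datetime, timedelta


def count_listeners(hits, window=timedelta(minutes=10)):
    """Count feed hits, collapsing repeat hits from one IP inside `window`.

    `hits` is an iterable of (ip, timestamp) tuples (a hypothetical format).
    """
    last_counted = {}  # ip -> timestamp of the last hit we counted
    total = 0
    # Process hits in chronological order so the window logic is meaningful.
    for ip, ts in sorted(hits, key=lambda hit: hit[1]):
        previous = last_counted.get(ip)
        if previous is None or ts - previous > window:
            total += 1
            last_counted[ip] = ts
    return total


sample_hits = [
    ("10.0.0.1", datetime(2006, 4, 3, 12, 0)),
    ("10.0.0.2", datetime(2006, 4, 3, 12, 1)),
    ("10.0.0.1", datetime(2006, 4, 3, 12, 5)),   # same guy, inside the window
    ("10.0.0.1", datetime(2006, 4, 3, 12, 30)),  # counted again as a new guy
]
print(count_listeners(sample_hits))
```

Notice that the sketch inherits every weakness described above: a whole office behind one router still looks like one listener, and a returning listener looks like a stranger.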
Finally, there is enough podcast data now to see that in many cases fewer than half of a show's listeners even use the feed. They prefer to download directly.
Sooner or later we'll have to come to the same uneasy truce with podcasting stats that we did with web site stats. The technology that we use to transfer podcast media (hello, HTTP!) is the same technology from 10 years ago and has the same inherent problems.
Places like Liberated Syndication and Feedburner have put a tremendous amount of resources into normalizing feed statistics, but at the end of the day, it's all guesswork.
Personally, I feel that the correct number is the number of times the media file is downloaded, regardless of how it was done. I have recently launched a podcast hosting site named The Purple Podcaster (blatant plug!) and I'm using Loudblog software for users' show blogs. Loudblog has a stats package that counts the number of times each file is downloaded via the feed, downloaded directly from the website, and played on the site via the built-in Flash player.
Is it perfect? Nope. Can it be gamed? Yup.
But it's close enough. I want to make podcasts. If I wanted to count beans I would have become an accountant.
You get the perfect count when you take the amount of downstream traffic that a single media file causes.
Then you divide it by the size of this media file. Now you get a good impression of how often your file is downloaded. Loudblog only counts how often a file has been "touched" by the user, even if he does not complete the download.
Posted by: Gerrit | April 3, 2006 8:52 AM | Permalink to Comment
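The calculation Gerrit describes can be sketched in a few lines of Python. This is a sketch under stated assumptions: the function name `estimate_downloads` and the byte figures are made up for illustration, and in practice the traffic total would have to come from your own server logs or hosting bandwidth report.

```python
def estimate_downloads(bytes_served, file_size_bytes):
    """Estimate complete downloads as total traffic divided by file size.

    Both arguments are in bytes; the values used below are hypothetical.
    """
    if file_size_bytes <= 0:
        raise ValueError("file_size_bytes must be positive")
    return bytes_served / file_size_bytes


# Hypothetical numbers: a 20 MB episode that caused 250 MB of traffic.
episode_size = 20_000_000
traffic_for_episode = 250_000_000
print(estimate_downloads(traffic_for_episode, episode_size))
```

A fractional result reflects partial downloads: every aborted transfer adds some traffic without adding a full copy of the file, which is exactly the case a simple "touch" counter overstates.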