While doing some backend maintenance for Iowa Blogs today, I noticed something peculiar: some of the bloggers have duplicate posts. Drilling down further, I found that all of them were on the Typepad platform. After analyzing the RSS feeds that Typepad produces in combination with FeedBurner, I found that this combination is producing two different links for the same post: FeedBurner's version and Typepad's version.
Here are both links for Mike Sansone's recent post, as they appear in his RSS feed:
1. http://feeds.feedburner.com/~r/Converstations/~3/153408787/discovery-along.html
2. http://www.converstations.com/2007/09/discovery-along.html
Both of these links go to the same content. So what's the problem? Well, I use the link to determine whether a post is unique before it's processed. Because these links differ, the engine thinks they are two different posts when in fact they are one.
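The failure can be reproduced with a minimal link-only uniqueness check. This is a hypothetical stand-in, not the actual Iowa Blogs engine. The `Post` type and the `dedupe_by_link` function are illustrative. The two links are the ones from Mike's feed, and the date and body are made-up placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Post:
    link: str
    pub_date: str  # placeholder format, assumed for illustration
    content: str


def dedupe_by_link(posts: List[Post]) -> List[Post]:
    """Keep the first post seen for each link (hypothetical link-only check)."""
    seen = set()
    unique = []
    for post in posts:
        if post.link in seen:
            continue
        seen.add(post.link)
        unique.append(post)
    return unique


# Placeholder date and body; only the two links come from the real feed.
feedburner = Post(
    "http://feeds.feedburner.com/~r/Converstations/~3/153408787/discovery-along.html",
    "2007-09-01", "Same post body")
typepad = Post(
    "http://www.converstations.com/2007/09/discovery-along.html",
    "2007-09-01", "Same post body")

# Both entries survive, because the links differ even though the content is identical.
print(len(dedupe_by_link([feedburner, typepad])))  # 2
```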
Now I am checking the link in combination with the publication date and content to determine uniqueness, so entries that share a publication date and content count as one post even when their links differ. I will push the new version this weekend and clean up all the duplicate entries, which account for 177 of the 4,681 posts currently archived, or about 3.8% of the archived content.
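Here is one way to read "the link in combination with the publication date and content": treat an entry as a duplicate if its link was already seen, or if another entry with the same publication date and the same content was. The sketch below assumes that reading. The `Post` type, the SHA-256 content fingerprint and the `dedupe` function are illustrative assumptions, not the engine's actual code.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Post:
    link: str
    pub_date: str  # placeholder format, assumed for illustration
    content: str


def content_fingerprint(post: Post) -> Tuple[str, str]:
    """Pair the publication date with a hash of the body (assumed approach)."""
    digest = hashlib.sha256(post.content.strip().encode("utf-8")).hexdigest()
    return (post.pub_date, digest)


def dedupe(posts: List[Post]) -> List[Post]:
    """Drop a post if its link OR its (pub_date, content hash) was already seen."""
    seen_links: Set[str] = set()
    seen_prints: Set[Tuple[str, str]] = set()
    unique: List[Post] = []
    for post in posts:
        fingerprint = content_fingerprint(post)
        if post.link in seen_links or fingerprint in seen_prints:
            continue
        seen_links.add(post.link)
        seen_prints.add(fingerprint)
        unique.append(post)
    return unique


# Same placeholder date and body, two different links: counted once.
feedburner = Post(
    "http://feeds.feedburner.com/~r/Converstations/~3/153408787/discovery-along.html",
    "2007-09-01", "Same post body")
typepad = Post(
    "http://www.converstations.com/2007/09/discovery-along.html",
    "2007-09-01", "Same post body")
print(len(dedupe([feedburner, typepad])))  # 1
```

Hashing the body keeps the stored key small. One caveat is that any edit to the body changes the fingerprint. An edited post that also arrives under a new link would therefore still be counted twice.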
Tags: Iowa Blogs, RSS, Typepad, Feedburner


5 comments:
I would love to know how other RSS developers get around this non-standardized behavior in what's supposed to be a standardized technology.
A call for help to RSS hackers!
Andy,
We encountered this kind of pain in the ROME project as well. Aggregation sucks without reliable uniqueness. In response to your post, I wrote up an article about the problems caused by the lack of a good GUID here:
http://techbrew.net/articles/200709/rss-the-guid-problem/
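When a feed does supply a `<guid>`, an aggregator can key on it. The `<guid>` element is optional in RSS 2.0 items, so the aggregator falls back to `<link>` only when it is missing. The sketch below uses Python's standard `xml.etree.ElementTree`. The `item_key` function, the sample feed and its `tag:` GUID are made up for illustration. As Mark's comment points out, this only helps when the feed actually provides a reliable GUID.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def item_key(item: ET.Element) -> Optional[str]:
    """Prefer the item's <guid>; fall back to <link> when no guid is given."""
    guid = item.findtext("guid")
    if guid and guid.strip():
        return guid.strip()
    link = item.findtext("link")
    return link.strip() if link else None


# Hypothetical feed: the first item has a GUID, the second does not.
rss = """<rss version="2.0"><channel>
  <item>
    <link>http://feeds.feedburner.com/~r/Converstations/~3/153408787/discovery-along.html</link>
    <guid isPermaLink="false">tag:example,2007:post-1</guid>
  </item>
  <item>
    <link>http://www.converstations.com/2007/09/discovery-along.html</link>
  </item>
</channel></rss>"""

root = ET.fromstring(rss)
for item in root.iter("item"):
    print(item_key(item))
```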
Well, if you curl the FeedBurner URL, you just get a 302 redirect to the second URL (I figure it's there just for click-through stats). So why not just follow the redirects and use the final URL as the content's URL?
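Tony's suggestion could be sketched as follows:

1. Send HEAD requests that do not follow redirects automatically.
2. Read the `Location` header of each 3xx response.
3. Stop at the first non-redirect response.

The sketch uses only Python's standard `http.client` and `urllib.parse`. The function names and the five-hop limit are assumptions. The lookup is passed in as a parameter so the logic can be exercised offline with a fake.

```python
import http.client
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# A lookup returns (status code, Location header or None) for one request.
HeadFn = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def http_head(url: str) -> Tuple[int, Optional[str]]:
    """Send a single HEAD request without following any redirect."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def resolve_final_url(url: str, head: HeadFn = http_head, max_hops: int = 5) -> str:
    """Follow redirects until a non-redirect response and return that URL."""
    current = url
    for _ in range(max_hops):
        status, location = head(current)
        if status not in REDIRECT_CODES or not location:
            return current
        # Location may be relative, so resolve it against the current URL.
        current = urljoin(current, location)
    return current  # hop limit reached: use the last URL seen


# Offline demo with a fake lookup that mimics the FeedBurner 302.
fb = "http://feeds.feedburner.com/~r/Converstations/~3/153408787/discovery-along.html"
tp = "http://www.converstations.com/2007/09/discovery-along.html"
fake = {fb: (302, tp)}
print(resolve_final_url(fb, head=lambda u: fake.get(u, (200, None))))
```

This costs one or more network round trips per new item, so an indexer would probably cache resolved URLs. Not every server answers HEAD exactly as it answers GET, so a production version might need a GET fallback.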
Should I change my code so the feed only shows links from feeds.feedburner.com?
Mark – thanks for your insights…
Tony – thanks, I'll look into that and see whether it could be a quick fix.
Mike – it's a problem I should definitely fix in the engine, so there's nothing to change on your side. I was just using your feed as an example (plus you got three links outta the deal :) As long as your feed is valid, which yours is, my indexer should be able to process it.
Plus, I wouldn’t want everybody to have to change their feed settings for me :)
I should have a fix up this weekend!
Thanks again everyone