While doing some backend maintenance for Iowa Blogs today I noticed something very peculiar: some of the bloggers have duplicate posts. After drilling down further I noticed all of them were on the Typepad platform. After analyzing the RSS feeds produced by Typepad in combination with FeedBurner, I found that the same post shows up under two different links - FeedBurner’s version and Typepad’s version.
Example from Mike Sansone’s recent post within his RSS Feed…
1. http://feeds.feedburner.com/~r/Converstations/~3/153408787/discovery-along.html
2. http://www.converstations.com/2007/09/discovery-along.html
Both of these links go to the same content. So what’s the problem? Well, I use the link to determine the uniqueness of a post before it’s processed. Since these links are different the engine thinks they are different posts - when in fact they are not.
Now I am checking the link in combination with the publication date and content to determine uniqueness. I will push the new version this weekend and clean up all the duplicate entries, which account for 177 of the 4681 posts currently archived, or about 3.8% of the archived content.
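A rough sketch of that kind of check, in Python. This is not the actual Iowa Blogs engine code: the `SeenPosts` class, the `content_fingerprint` helper, the rule "same link, or same date plus same content, means duplicate", and the date and body values are all assumptions made for illustration.

```python
import hashlib


def content_fingerprint(published, content):
    """Build a key from the publication date and a hash of the post body.

    Whitespace is collapsed first so trivial formatting differences
    between two copies of a feed do not change the hash.
    """
    normalized = " ".join(content.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return (published, digest)


class SeenPosts:
    """Hypothetical in-memory index of posts that were already processed."""

    def __init__(self):
        self._links = set()
        self._fingerprints = set()

    def is_duplicate(self, link, published, content):
        """Return True if the link OR the (date, content) pair was seen before."""
        fingerprint = content_fingerprint(published, content)
        duplicate = link in self._links or fingerprint in self._fingerprints
        # Remember both keys so either one can catch a later copy.
        self._links.add(link)
        self._fingerprints.add(fingerprint)
        return duplicate


# The two links from the example above; date and body are placeholders.
seen = SeenPosts()
placeholder_date = "2007-09-01T00:00:00Z"
placeholder_body = "Example post body."

first_seen = seen.is_duplicate(
    "http://feeds.feedburner.com/~r/Converstations/~3/153408787/discovery-along.html",
    placeholder_date,
    placeholder_body,
)
second_seen = seen.is_duplicate(
    "http://www.converstations.com/2007/09/discovery-along.html",
    placeholder_date,
    placeholder_body,
)
print(first_seen, second_seen)
```

With a link-only check the second call would be treated as a new post. Adding the date and content hash lets the second link be recognized as a copy of the first.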
Tags: Iowa Blogs, RSS, Typepad, Feedburner
5 Responses
Mark Woodman
September 7th, 2007 at 12:33 pm
Andy,
We encountered this kind of pain in the ROME project as well. Aggregation sucks without reliable uniqueness. In response to your post, I wrote up an article talking about the problem with the lack of a good GUID here:
http://techbrew.net/articles/200709/rss-the-guid-problem/
Andy Brudtkuhl
September 7th, 2007 at 12:54 pm
I would love to know how other RSS developers hurdle this non-standardized feature of what’s supposed to be a standardized technology.
A call for help to RSS Hackers!
Tony
September 7th, 2007 at 1:27 pm
Well, if you curl the FeedBurner URL, you just get a 302 redirect to the second URL (I figure it’s there just for click-through stats). So why not just follow the redirects and use the end result as the content’s URL?
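A minimal Python sketch of that idea, using only the standard library. The `head` and `resolve_final_url` helpers and the `max_hops` limit are hypothetical names made up for this example, and the demo at the bottom uses a fake lookup table instead of contacting FeedBurner.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def head(url):
    """Send one HEAD request without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def resolve_final_url(url, head=head, max_hops=5):
    """Follow redirects (up to max_hops) and return the last URL reached."""
    for _ in range(max_hops):
        status, location = head(url)
        if status not in REDIRECT_CODES or not location:
            return url
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return url


# Offline demo: a fake head() that mimics the 302 described above.
FEEDBURNER_LINK = "http://feeds.feedburner.com/~r/Converstations/~3/153408787/discovery-along.html"
CANONICAL_LINK = "http://www.converstations.com/2007/09/discovery-along.html"


def fake_head(url):
    redirects = {FEEDBURNER_LINK: (302, CANONICAL_LINK)}
    return redirects.get(url, (200, None))


resolved = resolve_final_url(FEEDBURNER_LINK, head=fake_head)
print(resolved == CANONICAL_LINK)
```

The tradeoff is one extra HTTP request per feed item, so an indexer would probably want to cache the resolved URL rather than look it up on every crawl.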
Mike Sansone
September 7th, 2007 at 3:32 pm
Should I overwrite the code to just show feeds feed?.feedburner.com
Andy Brudtkuhl
September 7th, 2007 at 3:36 pm
Mark - thanks for your insights…
Tony - thanks I’ll look into that and see if that can be a quick fix.
Mike - It’s a problem I should definitely fix in the engine so no problem on your side. I was just using your feed as an example (plus you got three links outta the deal :) As long as your feed is valid (which yours is) my indexer should definitely be able to process it.
Plus, I wouldn’t want everybody to have to change their feed settings for me :)
I should have a fix up this weekend!
Thanks again everyone
GANB is written by Andy Brudtkuhl of 48Web