Interesting Bug

by Andy Brudtkuhl on September 7, 2007

While doing some backend maintenance for Iowa Blogs today, I noticed something peculiar: some of the bloggers have duplicate posts. After drilling down further, I found that all of them were on the Typepad platform. After analyzing the RSS feeds that Typepad produces in combination with FeedBurner, I found that the feeds contain two different links for the same post: FeedBurner’s version and Typepad’s version.

Here is an example from a recent post in Mike Sansone’s RSS feed:

1. http://feeds.feedburner.com/~r/Converstations/~3/153408787/discovery-along.html

2. http://www.converstations.com/2007/09/discovery-along.html

Both of these links lead to the same content. So what’s the problem? Well, I use the link to determine whether a post is unique before it’s processed. Since these links are different, the engine thinks they are two different posts when in fact they are the same one.
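To make the failure concrete, here is a minimal sketch of a link-only uniqueness check. It is not the actual Iowa Blogs engine; the function name `dedupe_by_link` and the item layout (a dict with a `link` key) are hypothetical. Fed the two links from Mike’s feed, it keeps both items because the strings differ.

```python
def dedupe_by_link(items):
    """Keep the first item seen for each link (a link-only uniqueness check)."""
    seen = set()
    unique = []
    for item in items:
        if item["link"] in seen:
            continue  # exact same link already processed: treat as a duplicate
        seen.add(item["link"])
        unique.append(item)
    return unique


# The two links from the example feed point at the same post...
items = [
    {"link": "http://feeds.feedburner.com/~r/Converstations/~3/153408787/discovery-along.html"},
    {"link": "http://www.converstations.com/2007/09/discovery-along.html"},
]
# ...but the strings differ, so both are kept as separate posts.
print(len(dedupe_by_link(items)))  # 2
```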

Now I am checking the link in combination with the publication date and content to determine uniqueness. I will push the new version this weekend and clean up all the duplicate entries, which account for 177 of the 4,681 posts currently archived, or about 3.8% of the archived content.
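The post doesn’t spell out exactly how the link, publication date and content are combined, so the following is only one plausible reading, written as a sketch: an item counts as a duplicate if its link was already seen, or if an item with the same publication date and the same (whitespace-trimmed) body was already seen. The field names `link`, `pub_date` and `content`, the function names, and the sample date and body are all hypothetical.

```python
import hashlib


def content_key(item):
    """Hypothetical key: publication date plus a SHA-1 hash of the trimmed body."""
    digest = hashlib.sha1(item["content"].strip().encode("utf-8")).hexdigest()
    return (item["pub_date"], digest)


def dedupe(items):
    """Drop an item if its link OR its (date, content) key was already seen."""
    seen_links = set()
    seen_content = set()
    unique = []
    for item in items:
        key = content_key(item)
        if item["link"] in seen_links or key in seen_content:
            continue
        seen_links.add(item["link"])
        seen_content.add(key)
        unique.append(item)
    return unique


# Same post reached through FeedBurner and through Typepad (sample date/body made up).
posts = [
    {"link": "http://feeds.feedburner.com/~r/Converstations/~3/153408787/discovery-along.html",
     "pub_date": "2007-09-07", "content": "Post body"},
    {"link": "http://www.converstations.com/2007/09/discovery-along.html",
     "pub_date": "2007-09-07", "content": "Post body"},
]
print(len(dedupe(posts)))  # 1
```

One caveat with hashing the body: if one copy of the feed injects extra markup into the content, the hashes will differ and the duplicate slips through, so in practice the content may need normalizing before it is hashed.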


5 comments

Andy Brudtkuhl September 7, 2007 at 6:54 am

I would love to know how other RSS developers get around this non-standardized aspect of what’s supposed to be a standardized technology.

A call for help to RSS hackers!


Mark Woodman September 7, 2007 at 12:33 pm

Andy,

We encountered this kind of pain in the ROME project as well. Aggregation sucks without reliable uniqueness. In response to your post, I wrote up an article talking about the problem with the lack of a good GUID here:

http://techbrew.net/articles/200709/rss-the-guid-problem/
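Mark’s point is that a reliable GUID would make this easy. RSS 2.0 lets an item carry a `<guid>` element, so one common approach is to prefer the guid as the item’s identity and fall back to the link only when no guid is present. The sketch below assumes a plain RSS 2.0 feed without namespaced elements; `item_identity` and `item_ids` are hypothetical helper names, and the sample guid is made up.

```python
import xml.etree.ElementTree as ET
from typing import List


def item_identity(item: ET.Element) -> str:
    """Prefer the RSS 2.0 <guid>; fall back to <link> when no guid is present."""
    guid = item.findtext("guid")
    if guid and guid.strip():
        return guid.strip()
    return (item.findtext("link") or "").strip()


def item_ids(rss_xml: str) -> List[str]:
    """Return one identity string per <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [item_identity(item) for item in root.iter("item")]


sample = """<rss version="2.0"><channel>
<item><guid isPermaLink="false">tag:example.com,2007:post-1</guid><link>http://example.com/1</link></item>
<item><link>http://example.com/2</link></item>
</channel></rss>"""
print(item_ids(sample))  # ['tag:example.com,2007:post-1', 'http://example.com/2']
```

This only helps when the publisher emits a guid that stays stable no matter which copy of the feed an aggregator reads, which is exactly the gap the article describes.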


Tony September 7, 2007 at 1:27 pm

Well, if you curl the FeedBurner URL, you just get a 302 redirect to the second URL (I figure it’s there just for click-through stats). So why not just follow the redirects and use the final URL as the content’s URL?
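Tony’s suggestion can be sketched with Python’s standard library: `urllib.request.urlopen` follows 301, 302, 303 and 307 redirects by default, and the response’s `geturl()` reports the URL the chain finally landed on. The function name `resolve_final_url` and the User-Agent string are hypothetical; this is a sketch, not a drop-in fix for the indexer.

```python
import urllib.request


def resolve_final_url(url, timeout=10.0):
    """Follow HTTP redirects (such as FeedBurner's 302) and return the final URL.

    urllib's default opener handles the redirects itself; geturl() on the
    response gives the last URL in the chain. The body is never read.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "feed-indexer-sketch"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.geturl()
```

The trade-off is one extra HTTP request per item, so resolved URLs are worth caching by original link; requests made this way could also show up in the publisher’s click-through stats.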


Mike Sansone September 7, 2007 at 3:32 pm

Should I change the code so my feed only shows feeds.feedburner.com links?


Andy Brudtkuhl September 7, 2007 at 3:36 pm

Mark – thanks for your insights…

Tony – thanks, I’ll look into that and see if it can be a quick fix.

Mike – It’s a problem I should definitely fix in the engine, so there’s nothing wrong on your side. I was just using your feed as an example (plus you got three links outta the deal :). As long as your feed is valid (which yours is), my indexer should be able to process it.

Plus, I wouldn’t want everybody to have to change their feed settings for me :)

I should have a fix up this weekend!

Thanks again, everyone!
