Business Content

Intersecting enterprise content management and open source to improve business results

Alfresco Blog Integration

Posted by Sumer Jabri on April 11, 2008

A little while back I was tasked with solving a problem that went something like this:

Create content in a blog engine and have that content show up in Alfresco transformed to a predefined canonical form.

Motivation
Blog users, not necessarily inside the corporate walls, could author content with familiar tools of their choice (such as WordPress), and something would pull that content into Alfresco. If that content were then run through an approval process and pushed from Alfresco to a content delivery infrastructure, say the corporate web site, any external blogger could effectively contribute content to the site without needing a VPN setup or a corporate account.

Assumptions
Assume a setup where Alfresco is already used to store enterprise content and already has a model for content representation. Furthermore, a publishing mechanism already exists to push that content to the edge for serving.

Solution
Obviously, the first thing to do was to look at what Alfresco offers out of the box for blog integration. A quick look at the code turns up some blog-related functionality, which the wiki explains:
http://wiki.alfresco.com/wiki/Blog_Publishing_User_Guide

This basically allows one to take a piece of content within Alfresco, add some blog-specific metadata to it, and publish it to Typepad or WordPress. That is the reverse of what I was trying to do, so I had to find another way.

Basically, the problem can be distilled to: pull new blog entries from one or more blogs, transform the content to the designated canonical form, then store it in Alfresco based on rules (more on that later).

The first thing that came to mind was to check whether Mule had an RSS or Atom transport, and indeed it did. Mule has a community transport for RSS that pulls an RSS feed down into ROME feed objects; the transport can be found on MuleForge here: http://mule.mulesource.org/display/RSS/Home

All that was needed then was to pull down the feed, split it into messages (one message per post), run each message through an XSLT, which is easily done in Mule, and drop the transformed blog entries into Alfresco over CIFS.
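The XSLT step could look something like the following minimal Java sketch, which uses the JDK's standard `javax.xml.transform` API rather than Mule's own XSLT transformer. The canonical form (an `<article>` with `<title>` and `<body>`), the stylesheet, and the `PostTransformer` class are hypothetical placeholders, since the real canonical model is not shown here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: applies one XSLT stylesheet to a single RSS <item> snippet. */
final class PostTransformer {
    // Hypothetical canonical form: <article><title/><body/></article>.
    static final String CANONICAL_XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/item\">"
      + "<article><title><xsl:value-of select=\"title\"/></title>"
      + "<body><xsl:value-of select=\"description\"/></body></article>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    private final Templates templates; // compiled once; thread-safe, reused for every post

    PostTransformer(String xslt) {
        try {
            templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xslt)));
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Invalid stylesheet", e);
        }
    }

    /** Transforms one post; a new Transformer is created per call because Transformer is not thread-safe. */
    String transform(String itemXml) {
        try {
            Transformer t = templates.newTransformer();
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(itemXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Compiling the stylesheet once into a `Templates` object and creating a fresh `Transformer` per post is the usual JAXP pattern, since `Templates` is thread-safe while `Transformer` is not.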

However, that left me with two problems: (i) the blog poller needs to be idempotent (it must not pull down the same blog entry twice); and (ii) it needs to handle custom namespaces and custom fields in the feed.

The first problem was addressed by writing an idempotent receiver inbound router. The router simply remembers the date and time of the last blog post it received and uses that to pull down only newer posts.
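Here is a minimal plain-Java sketch of the watermark idea behind such a router; it is not the actual Mule router, and `NewerThanFilter` and its methods are hypothetical. It compares every post in a poll against the timestamp recorded before that poll and only advances the watermark afterwards, because feeds usually list the newest posts first, and advancing per post would wrongly reject older but still unseen entries in the same batch.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical watermark filter modelled on the idempotent receiver described above. */
final class NewerThanFilter {
    private Instant watermark; // publication time of the newest post delivered so far

    NewerThanFilter(Instant initialWatermark) {
        this.watermark = initialWatermark;
    }

    /**
     * Returns the posts published strictly after the current watermark, in input order,
     * then advances the watermark to the newest of them. Every post in the batch is
     * compared against the watermark as it stood before the batch.
     */
    synchronized <T> List<T> newPosts(List<T> posts, Function<T, Instant> publishedAt) {
        List<T> fresh = new ArrayList<>();
        Instant newest = watermark;
        for (T post : posts) {
            Instant ts = publishedAt.apply(post);
            if (ts.isAfter(watermark)) {
                fresh.add(post);
                if (ts.isAfter(newest)) {
                    newest = ts;
                }
            }
        }
        watermark = newest;
        return fresh;
    }

    synchronized Instant watermark() {
        return watermark;
    }
}
```

Two assumptions are worth flagging: a post whose timestamp equals the watermark is treated as already seen, and the watermark lives only in memory, so a restart would re-deliver old posts unless it is persisted. Tracking each entry's `<guid>` as well would be a more robust variant.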

The second problem was a bit trickier to solve. Extending ROME with custom modules is certainly possible, and although it would solve the problem of pulling in custom fields, it is a bit cumbersome, and I would have to update those modules every time the fields in the RSS feed source changed.

What I was really after is segmentation of the RSS feed into individual blog posts, and the transformation of those individual snippets of XML into a predefined canonical form.

So all I really needed was a simple XML feed splitter. Another simple outbound router that splits the RSS XML feed into individual posts, plus a couple of transformers that convert messages from an XML byte array to a JDOM Document and back, was all it took to make it happen.
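A splitter along these lines can be sketched with the JDK's built-in DOM and identity-transformer APIs. This is a stand-in for the JDOM-based router described above, written under assumptions (RSS 2.0 `<item>` elements without a namespace), and `RssItemSplitter` is a hypothetical name. Because it copies raw XML instead of mapping fields into ROME objects, elements in custom namespaces pass through untouched; the only extra work is re-declaring namespaces declared on `<rss>` or `<channel>`, so that each snippet is well-formed on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical feed splitter: one standalone XML string per RSS <item>. */
final class RssItemSplitter {
    static List<String> split(byte[] feedXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // keep prefixed custom fields (e.g. dc:creator) intact
            dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document feed = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(feedXml));

            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            List<String> posts = new ArrayList<>();
            NodeList items = feed.getElementsByTagName("item"); // RSS 2.0 items are un-namespaced
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                Element copy = (Element) item.cloneNode(true); // leave the parsed feed unmodified
                copyInScopeNamespaces(item, copy);
                StringWriter out = new StringWriter();
                identity.transform(new DOMSource(copy), new StreamResult(out));
                posts.add(out.toString());
            }
            return posts;
        } catch (Exception e) {
            throw new IllegalStateException("Could not split feed", e);
        }
    }

    /** Re-declares xmlns attributes from the item's ancestors; the closest declaration wins. */
    private static void copyInScopeNamespaces(Element item, Element copy) {
        for (Node n = item.getParentNode(); n instanceof Element; n = n.getParentNode()) {
            NamedNodeMap attrs = n.getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                Node a = attrs.item(j);
                boolean isNsDecl = XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(a.getNamespaceURI());
                if (isNsDecl && !copy.hasAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, a.getLocalName())) {
                    copy.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, a.getNodeName(), a.getNodeValue());
                }
            }
        }
    }
}
```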

Mule pulled everything together quite nicely: an HTTP connector polled periodically for posts, an XML splitter segmented the RSS feed, and an idempotent router ensured that only new posts made it through. Next, an XML transformation converted the blog posts to the canonical representation, and finally a file transport dropped each blog post into Alfresco.
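The shape of that flow can be sketched in plain Java. This is a hypothetical analogue for illustration, not Mule configuration, and `BlogPipeline` and its stage names are invented. Each stage (feed fetcher, splitter, idempotent filter, canonical transformer, and Alfresco drop step) is a function the caller supplies, so the class only fixes the order in which they run on every polling cycle.

```java
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

/** Hypothetical plain-Java analogue of the Mule flow: poll, split, filter, transform, drop. */
final class BlogPipeline {
    private final Supplier<byte[]> fetchFeed;                // stands in for the polling HTTP connector
    private final Function<byte[], List<String>> splitPosts;  // stands in for the XML splitter
    private final UnaryOperator<List<String>> keepNewOnly;    // stands in for the idempotent router
    private final UnaryOperator<String> toCanonical;          // stands in for the XSLT transformer
    private final Consumer<String> dropIntoAlfresco;          // stands in for the file transport

    BlogPipeline(Supplier<byte[]> fetchFeed,
                 Function<byte[], List<String>> splitPosts,
                 UnaryOperator<List<String>> keepNewOnly,
                 UnaryOperator<String> toCanonical,
                 Consumer<String> dropIntoAlfresco) {
        this.fetchFeed = fetchFeed;
        this.splitPosts = splitPosts;
        this.keepNewOnly = keepNewOnly;
        this.toCanonical = toCanonical;
        this.dropIntoAlfresco = dropIntoAlfresco;
    }

    /** Runs one polling cycle and returns the number of posts delivered. */
    int pollOnce() {
        List<String> fresh = keepNewOnly.apply(splitPosts.apply(fetchFeed.get()));
        fresh.stream().map(toCanonical).forEach(dropIntoAlfresco);
        return fresh.size();
    }

    /** Schedules pollOnce with a fixed delay between the end of one cycle and the next. */
    ScheduledFuture<?> start(ScheduledExecutorService scheduler, long delaySeconds) {
        return scheduler.scheduleWithFixedDelay(() -> {
            try {
                pollOnce();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so report it and keep polling.
                System.err.println("Poll failed: " + e);
            }
        }, 0, delaySeconds, TimeUnit.SECONDS);
    }
}
```

In a test, the stages can be simple lambdas; in a real deployment they might wrap an HTTP GET, a splitter and filter like the ones described above, an XSLT, and a write into a CIFS-mounted Alfresco space.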

5 Responses to “Alfresco Blog Integration”

  1. Ross Mason said

    Great post. I like to see projects get utilised from MuleForge; we’re seeing huge and growing community value from MuleForge. The application you describe would make a great example app for the RSS connector. Any chance of donating the code to the RSS project? (Any proprietary code/info will be removed.) The same pattern would be very useful with our Apache Abdera connector too: http://mule.mulesource.org/display/ABDERA/Cookbook

  2. Hi Ross, We’ll check with the client about contributing the code we developed as is. Shouldn’t be a problem. This contribution will be of significant value to the Alfresco community as well — Alfresco integration with blog platforms is a hot topic and Mule has proven to be a perfect fit.

  3. Ross Mason said

    Hi Mike,
    I think there would be a lot of interest from both communities. Feel free to contact me directly if I can help get this out there.

  4. Malcolm Ong said

    Hi,
    Great post indeed. I’m trying out your method of using the Idempotent Receiver and seem to have trouble getting Mule to pass any payload to it. What version of Mule was this on? Please email me if you don’t mind sharing the snippet of configuration code related to this inbound router alone. Thanks.

  5. Josh K said

    I am amazed by it. It is a good thing for my research. Thanks. ^_^
