Converting WordPress export to BlogML
If you noticed, I have re-platformed my blog (again - yes, I know, and sorry to anyone who got the full list of posts in their RSS feed again too). This time I decided to move away from WordPress and use FunnelWeb. There is a bunch of stuff I like about FW in general, but that's not what this post is about. This post is about the tool I wrote to assist with the migration: WXRtoBlogML. I have published the source code for anyone who is interested in grabbing it.
Basically, the tool reads an XML export from WordPress (which is in an extended form of RSS called WXR, short for WordPress eXtended RSS) and converts it into BlogML. It uses LINQ to XML to read the source XML file and the BlogML library from NuGet to prepare and save the new file, so there really isn't much code involved. I also grabbed NDesk.Options to parse the command line arguments, as I decided a command line tool was all that was needed for this one.
Source code is on BitBucket at http://bitbucket.brianfarnhill.com/wxrtoblogml/overview. Feel free to fork it and make changes if you like. There is more that can be done with it: at the moment it ignores pages and attachments, so there is room for improvement.
Some quick highlights of the code in the tool - the command line parsing works like this:
// Requires: using System; using NDesk.Options;
var showHelp = false;
var source = string.Empty;
var output = string.Empty;

// Each entry is: option prototype, description, action to run on the parsed value.
var p = new OptionSet
{
    { "s|source=", "the path of the source XML file",
        v => source = v },
    { "o|output=", "the path to save the resulting XML file",
        v => output = v },
    { "h|help", "show this message and exit",
        v => showHelp = v != null },
};

try
{
    p.Parse(args);
}
catch (OptionException e)
{
    Console.Write("WXRtoBlogML: ");
    Console.WriteLine(e.Message);
    Console.WriteLine("Try `WXRtoBlogML --help' for more information.");
    return; // stop here rather than carrying on with bad arguments
}

if (showHelp)
{
    ShowHelp(p);
    return;
}
NDesk.Options gives us a really simple syntax for specifying the parameters and their descriptions, with a lambda expression that assigns each parsed value to a local variable. It makes working with command line arguments a piece of cake.
Next was doing the actual parsing - here's an example of how I start the conversion:
// Requires: using System; using System.Linq; using System.Xml.Linq; plus the BlogML library.
var sourceFile = XDocument.Load(source);
if (sourceFile.Root == null) throw new Exception("Unable to locate root node in XML input file");
var channelNode = sourceFile.Root.Element("channel");
if (channelNode == null) throw new Exception("Unable to locate channel node in XML input file");

// Namespaces used by the WXR export
XNamespace content = "http://purl.org/rss/1.0/modules/content/";
XNamespace excerpt = "http://wordpress.org/export/1.1/excerpt/";
XNamespace wp = "http://wordpress.org/export/1.1/";

var blog = new BlogMLBlog
{
    Title = channelNode.Elements().First(node => node.Name == "title").Value,
    SubTitle = channelNode.Elements().First(node => node.Name == "description").Value
};

// One BlogML author per wp:author element in the channel
foreach (var author in channelNode.Elements().Where(node => node.Name == wp + "author"))
{
    blog.Authors.Add(new BlogMLAuthor
    {
        Approved = true,
        DateModified = DateTime.Now,
        Email = GetFirstElementValue(author.Elements(wp + "author_email")),
        ID = GetFirstElementValue(author.Elements(wp + "author_id")),
        Title = GetFirstElementValue(author.Elements(wp + "author_display_name"))
    });
}
So it starts by loading the document and creating some namespace references to use later, then sets values on the new blog object by querying the XML document for specific values. Where there can be more than one value, it drops into a foreach loop, like the authors one there, to create the necessary child objects.
I'll write a full post that describes the whole process of how I got the new platform up and running, so stay tuned for that. Feel free to have a look at the code in the meantime if you are interested!
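The author loop calls a GetFirstElementValue helper that isn't shown in this post. As a rough idea of what such a helper could look like, here is a minimal sketch. The class name WxrHelpers and the choice to return an empty string when nothing matches are my assumptions for illustration, not necessarily what the tool's source does:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class WxrHelpers
{
    // Hypothetical sketch of the helper used in the author loop.
    // Returns the text of the first element in the sequence, or an
    // empty string (an assumed default) when the sequence is empty.
    public static string GetFirstElementValue(IEnumerable<XElement> elements)
    {
        var first = elements.FirstOrDefault();
        return first == null ? string.Empty : first.Value;
    }
}
```

Returning a default instead of throwing means an author with, say, no email address in the export still converts, which seems like the friendlier behaviour for a migration tool.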