This is something I have come across on site with a client I am working with at the moment. Basically the scenario was this – I am programmatically creating a CSV file, and I am downloading a word document template from a SharePoint site, and I want to set the copy of the word document up with my CSV file so when it opens it is ready to be merged with the data I generated. After hunting around Google for a little while I couldn’t find a good answer that got me what I needed and explained things well enough, so I figured I would put this together to explain it all.
The first thing you need to grab is the OpenXML SDK. I got version 2 from the Microsoft download site at http://www.microsoft.com/downloads/details.aspx?FamilyID=c6e744e5-36e9-45f5-8d8c-331df206e0d0&displaylang=en. When you install it there is a tool called the “OpenXML SDK 2.0 Productivity Tool” that we are going to use to make this process a whole lot easier. Basically what it does is open an OpenXML document (any of the office document formats that end with the ‘x’, like docx, pptx, xlsx, etc) and will show you all of the XML files and structures within the document itself (or the ‘package’ as it is referred to).
As a starting point, I manually did the mail merge and saved the word document and then opened it up with this tool to see what I could see. The main part that grabbed my attention was under /word/document.xml –> /word.settings.xml –> w:settings –> w:mailMerge. If you right click on this node and choose “Reflect Code” you will see two windows pop up for you on the right of the window – the first is the actual XML from that node, which is somewhat interesting and confusing at the same time, but the cooler part for me was the C# code it generated below that would create that exact XML node for me – that’s a win right there folks!
So before we go any further, I wanted to explain what some of this stuff was before we blindly started working with it – The first few tags (like query, connectionString and dataSource are all used to define the source for the data – brilliant because that’s what we need to set. There are many tags below it though that are called fieldMapData – and it wasn’t clear to me at first what there were – I did figure it out though. If you go back into your word document and have a look under the ‘Mailings’ tab in the ribbon, you will see a button called “Match Fields”. What this does is it lets you map fields in your data source to a set of known fields that word can work with – so when you do things like insert and address block or a greeting line, it will know what field in your datasource maps to things like first name, last name, business address etc. So what you should do before we go any further is go and map all of these fields in your word document, then open it back up in the productivity tool and go back to looking at that code.
Now we can grab all of that C# and put it into our visual studio project – but first we need to give it some context and create the objects that represent the word document itself. Start by opening VS and adding a reference to “DocumentFormat.OpenXml.dll” (its in the directory you installed the SDK in). Now add a reference to DocumentFormat.OpenXml.Packaging and DocumentFormat.OpenXml.Wordprocessing to your class. Now lets add this code to open our document and get ready for the code from the productivity tool.
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(WordDocmentPath, true))
{
var settingsPart = wordDocument.MainDocumentPart.GetPartsOfType<DocumentSettingsPart>().First();
if (settingsPart.Settings.OfType<MailMerge>().Count() > 0)
settingsPart.Settings.OfType<MailMerge>().First().Remove();
// more code will go here later
}
Basically the “WordDocumentPath” variable there is a string representing the location of your word document file. This will open the file and find that settings XML file for us, and then if the MailMerge node already exists it will delete it. Now if you grab the C# inside the GenerateMailMerge() method of the productivity tools output and paste it in where the comment is in the above code, then add this line directly below to add your newly formed section into the document:
settingsPart.Settings.InsertAt(mailMerge1, settingsPart.Settings.Count());
So now that all that was done I tested it and when I opened the output word document I get an error about it being corrupted – dang. I had a look through the rest of the document XML files to see if anything else looked important, and there was one thing I did find that will stop this whole thing working if you don’t include it. If you grab a copy of your word document before the mail merge and the one after the mail merge and use the “Compare Files” function, you will see a list of all XML files in the package that have changed – the one that matters here is “/word/_rels/settings.xml.rels”. This defines relationships with other documents that are declared in the settings.xml file we were looking at earlier. If you have a look in the file by double clicking on it, you will see that there is an extra relationship tag that is defined there that points to your CSV file. Now here’s the trick – note the Id property of your relationship tag – now go and have a look back at the XML in the mail merge element, specifically at the <w:dataSource> element – the Id here is the Id of the relationship tag! This is what is telling the package the location of your CSV file, despite the fact that it looks like the query tag above it does it, and without adding the relationship your merge wont work. Here is a code chunk to create a new relationship in that file:
foreach (var relationship in wordDocument.ExternalRelationships.Where(Rel => Rel.RelationshipType == "http://schemas.openxmlformats.org/officeDocument/2006/relationships/mailMergeSource"))
{
wordDocument.DeleteExternalRelationship(relationship);
}
var dsRelationship = settingsPart.AddExternalRelationship("http://schemas.openxmlformats.org/officeDocument/2006/relationships/mailMergeSource", new Uri(string.Format("file:///{0}", CsvPath)));
Put that in directly below where the old MailMerge element is removed (so before the rest of the code you copied from the tool). Last but not least if you then look for the line of code you copied from the tool where it creates the DataSourceReference element and replace it with this, it will ensure the correct relationship is referenced as the documents data source.
DataSourceReference dataSourceReference1 = new DataSourceReference(){ Id = dsRelationship.Id };
So I saved and compiled all that and ran it – and it worked! I then ran it with a few different word document templates and they all worked with the same code too which was awesome. So there you have it kids, hopefully now you understand a bit about how the mail merge stuff works under the hood and you can now know that you can write this stuff up to automate more things for your users if you have to.