OpenXml Office format: open a document and substitute text

Quite often, customers ask us to generate reports in Word format. The main advantage of this approach is that people feel comfortable working with Word, and they love being able to modify the report once it has been generated. Over the past years I have used many techniques to reach this goal, but in the end, a long time ago, I resorted to writing a simple RTF generator that suits my needs.

Recently I reached the point where the documents became too complex to manage with the RTF generator. Moreover, we need software that lets customers create a master Word document, so that the software only has to substitute real data into some predefined parts. The obvious solution seems to be Office automation, but we need this code in a web application, and Office automation is not supported by Microsoft in non-interactive environments. Moreover, I used this technique in the past, and it is terribly slow for big documents.

We decided to create Word 2007 documents, so I moved to the OpenXml format. Microsoft has an SDK that lets you manage this new format, but it only works with the overall structure of the document: it handles unzipping the document, adding parts and zipping it again. It is up to you to manage the XML to modify the document, but thanks to LINQ to XML we can do this in a really simple way. Let's start with a little example. I've created a simple document with this content:

[image: the sample Word document]

My goal is to find the text $$$substituteme(30) in the document, replace it with another text and save the new document under another name. I created a helper class that does this for me; here is the constructor.

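public Document(String originalDocumentPath, String destinationDocumentPath)
{
    // Work on a copy so that the master document is never modified.
    File.Copy(originalDocumentPath, destinationDocumentPath, true);
    // Open the copy for editing (second argument: isEditable).
    Doc = WordprocessingDocument.Open(destinationDocumentPath, true);
}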

The SDK does not permit me to save a document under another name, so I simply copy the master file to the name I want, then open the copy; the overall effect is the same: the master stays unchanged and I get a file with the given name. The WordprocessingDocument class is the root class that you should use to manage docx files. It is true that a docx file is simply a zip file, but this class shields you from that and lets you browse the file content. My class assigns the document to a private property called Doc that does some basic management.

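// Fields used by the helper class.
private WordprocessingDocument doc;
private XElement mainDocument;

private WordprocessingDocument Doc
{
    get { return doc; }
    set
    {
        // Release the previously opened document, if any.
        if (doc != null) doc.Dispose();
        doc = value;
        GrabDocumentParts();
    }
}

private void GrabDocumentParts()
{
    if (doc != null)
    {
        // Load the XML of the main document part into memory.
        using (StreamReader sr = new StreamReader(doc.MainDocumentPart.GetStream()))
        {
            using (XmlReader xmlr = XmlReader.Create(sr))
            {
                mainDocument = XElement.Load(xmlr);
            }
        }
    }
}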

First of all I dispose of the previous document, if present (actually this is not needed, because the document can only be set in the constructor), then I read all the content of the MainDocumentPart into an XElement variable. Now I can change the XElement to substitute text.

// Namespace bound to the w: prefix in WordprocessingML.
public static XNamespace namespace_w = XNamespace.Get("http://schemas.openxmlformats.org/wordprocessingml/2006/main");

public Document Substitute(String tagToSearch, String textToSubstitute)
{
    XElement node = FindNodeByTag(tagToSearch);
    node.Value = node.Value.Replace(tagToSearch, textToSubstitute);
    return this; // allows chained calls
}

private XElement FindNodeByTag(String tagToSearch)
{
    // Single() throws if the tag is missing or appears in more than one w:t node.
    return (from b in mainDocument.Descendants(namespace_w + "t")
            where b.Value.Trim() == tagToSearch
            select b).Single();
}
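
To try the query in isolation, without the SDK or a real docx file, here is a minimal sketch that runs the same kind of LINQ to XML search on a small in-memory fragment. The class name PlaceholderQuerySketch, its two methods and the sample text "Report for customer:" are hypothetical, made up for this illustration; only the namespace URI and the w:t lookup come from the code above.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class PlaceholderQuerySketch
{
    // Same namespace URI that the article binds to the w: prefix.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical stand-in for the main document part:
    // two paragraphs, each with one run holding one w:t element.
    public static XElement BuildSample()
    {
        return new XElement(W + "document",
            new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
            new XElement(W + "body",
                new XElement(W + "p", new XElement(W + "r",
                    new XElement(W + "t", "Report for customer:"))),
                new XElement(W + "p", new XElement(W + "r",
                    new XElement(W + "t", "$$$substituteme(30)")))));
    }

    // Replaces the text of the single w:t element whose trimmed value equals the tag.
    // Single() throws InvalidOperationException on zero or multiple matches.
    public static void Substitute(XElement root, string tag, string text)
    {
        XElement node = root.Descendants(W + "t")
                            .Single(t => t.Value.Trim() == tag);
        node.Value = node.Value.Replace(tag, text);
    }
}
```

Calling PlaceholderQuerySketch.Substitute(root, "$$$substituteme(30)", "First Substitution!!!") on the element returned by BuildSample() changes only the second w:t node. Keep in mind that this approach assumes the whole placeholder sits inside a single w:t element; Word can split a piece of text across several runs, and in that case the query finds nothing.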

The Substitute code is really simple: to find the placeholder I search for a node named w:t; remember that you need to use XML namespaces to make queries. When I find the node I simply replace the original text with the one I want, and since I love fluent interfaces, the Substitute method returns the Document object itself so that calls can be chained. Finally I need a Save method that saves all the modified content to the file and closes the document.

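public void Save()
{
    // FileMode.Create replaces the existing content of the part.
    using (Stream s = doc.MainDocumentPart.GetStream(FileMode.Create, FileAccess.Write))
    {
        using (XmlWriter xmlw = XmlWriter.Create(s))
        {
            mainDocument.WriteTo(xmlw);
        }
    }
    // Close the package so that the changes are written to the file.
    doc.Close();
}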

The process is really simple: I call GetStream on the MainDocumentPart object, but with FileMode.Create, which requests the creation of a new stream; then I simply write all the content of the modified XElement to the stream, and the job is done. Here is a typical use.

Document doc = new Document(@"samples\doc2.docx", @"samples\doc1saved.docx");
doc.Substitute("$$$substituteme(30)", "First Substitution!!!")
   .Substitute("$$$substituteme(25)", "Second Substitution!!!")
   .Save();

Thanks to the fluent interface I can use a simple syntax to change various parts of the document. Working with OpenXml makes it really easy to manage Word documents.
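
As a side note, the reason Save asks for FileMode.Create can be shown without the SDK. What follows is only a sketch under assumptions: a MemoryStream stands in for the stream of the document part, PartStreamSketch and its methods are hypothetical names, and SetLength(0) plays the role of FileMode.Create by discarding the old bytes before writing. Without that step, if the new XML were shorter than the old one, stale bytes would be left after the end of the new document.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class PartStreamSketch
{
    // Reads the whole stream into an XElement, as GrabDocumentParts does
    // with the stream of the main document part.
    public static XElement Load(Stream s)
    {
        s.Position = 0;
        using (XmlReader reader = XmlReader.Create(s))
        {
            return XElement.Load(reader);
        }
    }

    // Overwrites the stream with the given element.
    // SetLength(0) empties the stream first, so no old content survives.
    public static void Save(XElement root, Stream s)
    {
        s.SetLength(0);
        using (XmlWriter writer = XmlWriter.Create(s))
        {
            root.WriteTo(writer);
        }
    }
}
```

With the default settings, disposing the XmlReader or the XmlWriter does not close the underlying stream, so the same MemoryStream can be saved to and loaded from several times.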

alk.