Open Xml Project

I’ve received some comments on an old post about manipulating docx documents in order to replace parts of the original document with images or pieces of text. I’ve also blogged about how to use Excel to create reports and populate data programmatically with the OpenXml standard.

The original code was developed for ActValue, an Italian company I collaborate with, and some people asked me to publish the full code. I cannot publish the exact version of the current library but, thanks to the courtesy of ActValue, I can now publish an old version of the code that contains all the techniques I described in my old posts about the OpenXml format.

Please use the code only as a reference to better understand the techniques I explained in my blog; it is not production-ready code. You can do whatever you want with it under the following license and disclaimer.

Copyright (c) 2009 Ricci Gian Maria
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the author (Ricci Gian Maria) nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Please remember that neither I nor ActValue give any warranty on this code, as described in the license above. Feel free to report any malfunction to me.

The code can be downloaded here.

Alk.


OpenXml excel and formulas

In an old post I described a simple way to create Excel reports using the OpenXml format. The trick is a simple manipulation of the document with LINQ to XML.

Now I need to add another feature: opening an Excel document that contains formulas and filling some cells while leaving the formulas intact. My first version did not work as expected. I created an Excel file with simple formulas and ran my function, but when I opened the resulting file I saw all zeros in the formula column. The formula was still there, though, and if I changed any cell referenced by the formula I got the right value.

This problem arises because formulas are stored in the original sheet with XML like this:

<c r="C2">
  <f t="shared" ref="C2:C10" si="0">A2+B2</f>
  <v>0</v>
</c>

This means that cell C2 contains the formula A2+B2, but the <v>0</v> node tells Excel that the current value is zero. So when you open the resulting file, Excel finds that the content of the cell is 0 and shows this value until some related cell changes. To solve this problem I simply added a bit of code that removes the <v> element.

originalElement.Descendants(ExcelFiller.ns_s + "v").Remove();
row.Add(originalElement);

Now when Excel opens the processed document it finds no <v> (value) element, so it recalculates the value from the formula.
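To make the idea concrete, here is a minimal, self-contained sketch of the same trick. It is not the code of the library: the class and method names (FormulaCacheSketch, StripCachedFormulaValues) are hypothetical, and I assume that ExcelFiller.ns_s is the SpreadsheetML main namespace. Unlike the one-liner above, this variant removes <v> only from cells that also contain an <f> element, so plain values in the same row are kept.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class FormulaCacheSketch
{
    // SpreadsheetML main namespace (assumed to be what ExcelFiller.ns_s holds).
    public static readonly XNamespace S =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Removes the cached <v> value from every cell of the row that has a formula,
    // so that the value is computed again from the formula.
    public static XElement StripCachedFormulaValues(XElement row)
    {
        row.Descendants(S + "c")
           .Where(c => c.Element(S + "f") != null)
           .Elements(S + "v")
           .Remove();
        return row;
    }
}
```

Cells without a formula keep their <v> node, which matters when the same row also holds data you have just written.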

alk.


Manage image in openXml format part2

In a previous post I dealt with inserting an image into an OpenXml document. Now it is time to show how to change the image dimensions: I want to be able to define new dimensions and to choose whether the image should be stretched or not. The w:drawing element has two distinct parts that manage image dimensions; the first is the wp:extent node, a child of the wp:inline node.

[Image: the wp:extent node inside wp:inline]

This node determines the extent of the area of the document that will contain the image, but to really change the dimensions of the image we must operate on a different tag:

[Image: the spPr node with its xfrm child]

The spPr node is used to determine the shape properties, as described in section 4.4.1.41 of the OpenXml specification. It contains the xfrm node, which is used to apply 2D transformations to an object. This node specifies the offset of the picture within the area, and its ext child sets the real image dimensions. The code to change the image width is the following:

String nodeContent = Properties.Resources.XmlContentForEmbedImage
    .Replace("###imageid###", document.WordProcessingDocument.MainDocumentPart.GetIdOfPart(newImage))
    .Replace("###width###", (Width * 9525).ToString())
    .Replace("###height###", (Height * 9525).ToString());
nodeContent = SetImageDimension(nodeContent);

I store a sample of the image XML in the resource file of the project, then I change the image id as described in the previous post, and finally I change the width and height of the wp:extent node. Since the values of these attributes must be expressed in EMUs (English Metric Units, not the famous animal), I multiply by 9525, a constant that converts pixels to EMUs at 96 DPI (914400 EMUs per inch divided by 96 pixels per inch). The SetImageDimension function sets the a:xfrm node.

private string SetImageDimension(string nodeContent)
{
    if (StretchImage)
    {
        // Stretch: the picture fills the requested area exactly.
        nodeContent = nodeContent
            .Replace("###widthr###", (Width * 9525).ToString())
            .Replace("###heightr###", (Height * 9525).ToString());
    }
    else
    {
        // Keep the aspect ratio: dividing both sides by the larger ratio
        // makes the picture fit inside the requested area.
        Double widthRatio = OriginalWidth / (Double) Width;
        Double heightRatio = OriginalHeight / (Double) Height;
        Double realRatio = Math.Max(widthRatio, heightRatio);
        // EMU values must be integers, so the result is rounded.
        nodeContent = nodeContent
            .Replace("###widthr###", ((long) Math.Round(OriginalWidth / realRatio * 9525)).ToString())
            .Replace("###heightr###", ((long) Math.Round(OriginalHeight / realRatio * 9525)).ToString());
    }
    return nodeContent;
}

In the sample image XML I wrote the ###widthr### and ###heightr### placeholders for the real image dimensions. If StretchImage is true, I simply set the width and height specified by the user; if it is false, I do a simple calculation that scales the original dimensions with a single ratio, so the image is not stretched.
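The non-stretched case is easier to check as a small pure function. The following is only a sketch under my own assumptions: the goal is to fit the original picture inside the requested area while keeping its aspect ratio, all sizes are in pixels at 96 DPI, and the names ImageFitSketch and FitInside are hypothetical, not part of the library.

```csharp
using System;

public static class ImageFitSketch
{
    // 914400 EMUs per inch / 96 pixels per inch.
    public const int EmuPerPixel = 9525;

    // Returns the picture size in EMUs that fits inside width x height (pixels)
    // without changing the aspect ratio of the original picture.
    public static (long Cx, long Cy) FitInside(
        int originalWidth, int originalHeight, int width, int height)
    {
        // The larger ratio identifies the side that limits the size.
        double ratio = Math.Max(
            originalWidth / (double)width,
            originalHeight / (double)height);

        long cx = (long)Math.Round(originalWidth / ratio * EmuPerPixel);
        long cy = (long)Math.Round(originalHeight / ratio * EmuPerPixel);
        return (cx, cy);
    }
}
```

For example, a 200x100 picture placed in a 100x100 area has ratios 2 and 1, so both sides are divided by 2: the result is 100x50 pixels, that is 952500 x 476250 EMUs.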

Now I’m able to resize the image as desired.

alk.


OpenXml format, insert an image into a document

In a previous post I showed how to open a docx file, search for a specific text, and replace it with another string. The reason for doing this is to create a master report file in docx format and let an application insert data into specific parts of the document.

The next step is to replace text with an image. This is a more complex process, because we first need to insert the image into the package and then reference it in the main document. The first part is really simple:

ImagePart newImage = document.MainDocumentPart.AddImagePart(imageType);
using (Stream image = new FileStream(imageFileName, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    newImage.FeedData(image);
}

This code is part of a little library I’m developing. In the first line I use the AddImagePart method of the MainDocumentPart, which creates another part of the document that will contain the image; then I open a FileStream to read the image data and pass it to the FeedData method of the ImagePart object. Now the image is included in the document.

The next step is to add a reference to the image in the document. To accomplish this I store an XML fragment in the project resources:

<root xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006"
      xmlns:o="urn:schemas-microsoft-com:office:office"
      xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
      xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
      xmlns:v="urn:schemas-microsoft-com:vml"
      xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
      xmlns:w10="urn:schemas-microsoft-com:office:word"
      xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
      xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml">
  <w:drawing>
    <wp:inline distT="0" distB="0" distL="0" distR="0">
      <wp:extent cx="955040" cy="955040"/>
      <wp:effectExtent l="19050" t="0" r="0" b="0"/>
      <wp:docPr id="1" name="Immagine 1" descr=""/>
      <wp:cNvGraphicFramePr>
        <a:graphicFrameLocks xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" noChangeAspect="1"/>
      </wp:cNvGraphicFramePr>
      <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
        <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
          <pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
            <pic:nvPicPr>
              <pic:cNvPr id="0" name="Picture 1" descr=""/>
              <pic:cNvPicPr>
                <a:picLocks noChangeAspect="1" noChangeArrowheads="1"/>
              </pic:cNvPicPr>
            </pic:nvPicPr>
            <pic:blipFill>
              <a:blip r:embed="###imageid###"/>
              <a:srcRect/>
              <a:stretch>
                <a:fillRect/>
              </a:stretch>
              ...
</root>

This is the XML that Word generates for an image. Despite the complexity of the fragment, the only thing you need to change to include the image is to replace ###imageid### with the real id of the embedded object.

String nodeContent = Properties.Resources.XmlContentForEmbedImage.Replace(
    "###imageid###", document.MainDocumentPart.GetIdOfPart(newImage));

Thanks to the GetIdOfPart() method of MainDocumentPart we can get the id of the previously embedded image; now we can simply insert this piece of XML into the MainDocumentPart to include the image in the Word document.
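The last step is only described in words, so here is a minimal sketch of one possible way to do it with LINQ to XML. It is an assumption on my part, not the code of the library: I suppose the main document is already loaded into an XElement, that the placeholder text sits alone in a single w:t node, and that the fragment is the complete resource XML with ###imageid### already replaced. The names DrawingInsertSketch and ReplaceTextWithDrawing are hypothetical.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class DrawingInsertSketch
{
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces the w:t node that holds the placeholder with the w:drawing
    // element taken from the fragment stored in the resources.
    public static void ReplaceTextWithDrawing(
        XElement mainDocument, string tag, string nodeContent)
    {
        // The fragment is wrapped in a <root> element that only declares namespaces.
        XElement drawing = XElement.Parse(nodeContent).Element(W + "drawing");

        // Single() throws if the placeholder is missing or appears more than once.
        XElement textNode = mainDocument.Descendants(W + "t")
            .Single(t => t.Value.Trim() == tag);

        // The drawing still belongs to the fragment, so LINQ to XML adds a copy.
        textNode.ReplaceWith(drawing);
    }
}
```

The drawing takes the place of the text inside the same run (w:r), so the image appears where the placeholder was.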

alk.


OpenXml Office format, open and substitute text

Quite often customers ask us to generate reports in Word format. The main advantage of this approach is that people feel comfortable working with Word, and they love being able to modify the report once it is generated. Over the years I used many techniques to reach this goal, but in the end, a long time ago, I resorted to writing a simple RTF generator that suits my needs.

Recently I reached the point where the complexity of the documents became really difficult to manage with the RTF generator. Moreover, we need software that lets customers create a master Word document, so that the software only has to substitute the real data into some predefined parts. The obvious solution seems to be Office automation, but we need this code in a web application, and Office automation is not supported by Microsoft in non-interactive environments. Moreover, I used this technique in the past, and it is terribly slow for big documents.

We decided to create Word 2007 documents, so I moved to the OpenXml format. Microsoft has an SDK for this new format, but it only lets you work with the overall structure of the document: it manages unzipping, adding parts, and zipping the document again. It is up to you to manipulate the XML to modify the document, but thanks to LINQ to XML we can do this in a simple way. Let’s start with a little example. I’ve created a simple document with this content:

[Image: the sample document containing the $$$substituteme(30) placeholder]

My goal is to find the $$$substituteme(30) text in the document, replace it with another text, and save the new document under another name. I created a helper class that does this for me; here is the constructor.

public Document(String originalDocumentPath, String destinationDocumentPath)
{
    File.Copy(originalDocumentPath, destinationDocumentPath, true);
    Doc = WordprocessingDocument.Open(destinationDocumentPath, true);
}

The SDK does not permit me to save the document under another name, so I simply copy the master file to the name I want and then open the copy. The overall effect is the same: the master remains unchanged and I get a file with the given name. WordprocessingDocument is the root class that you should use to manage docx files. It is true that a docx file is simply a zip file, but this class shields you from that and lets you browse the file content. My class assigns the document to a private property called Doc that does some basic management.

private WordprocessingDocument Doc
{
    get { return doc; }
    set
    {
        if (doc != null) doc.Dispose();
        doc = value;
        GrabDocumentParts();
    }
}

private void GrabDocumentParts()
{
    if (doc != null)
    {
        using (StreamReader sr = new StreamReader(doc.MainDocumentPart.GetStream()))
        {
            using (XmlReader xmlr = XmlReader.Create(sr))
            {
                mainDocument = XElement.Load(xmlr);
            }
        }
    }
}

First of all I dispose the previous document if present (actually this is not needed, because the document can be set only in the constructor), then I read all the content of the MainDocumentPart into an XElement variable. Now I can change the XElement to substitute text.

public static XNamespace namespace_w = XNamespace.Get("http://schemas.openxmlformats.org/wordprocessingml/2006/main");

public Document Substitute(String tagToSearch, String textToSubstitute)
{
    XElement node = FindNodeByTag(tagToSearch);
    node.Value = node.Value.Replace(tagToSearch, textToSubstitute);
    return this;
}

private XElement FindNodeByTag(String tagToSearch)
{
    // Single() throws if the tag is missing or appears in more than one w:t node.
    return (from b in mainDocument.Descendants(namespace_w + "t")
            where b.Value.Trim() == tagToSearch
            select b).Single();
}

The code is really simple: to find the text I search for a node named w:t; remember that you need to use XML namespaces to make queries. When I find the node I replace the original text with the one I want and, since I love fluent interfaces, the Substitute method returns the Document object so that calls can be chained. Finally I need a Save method that saves all the modified content to the file and closes the document.
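The search throws when the tag is missing or appears more than once, because it relies on Single(). If that is too strict for your reports, a more tolerant variant is easy to sketch. The following is only an illustration under my own assumptions (the names SubstituteSketch and SubstituteAll are hypothetical): it works on an XElement that is already loaded, replaces the tag in every w:t node that contains it, and returns how many nodes were changed.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SubstituteSketch
{
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces the tag in every w:t node that contains it and returns the
    // number of changed nodes (0 when the tag is not in the document).
    public static int SubstituteAll(XElement mainDocument, string tag, string text)
    {
        // ToList() takes a snapshot, so nodes are not modified while the query runs.
        List<XElement> nodes = mainDocument.Descendants(W + "t")
            .Where(t => t.Value.Contains(tag))
            .ToList();

        foreach (XElement node in nodes)
        {
            node.Value = node.Value.Replace(tag, text);
        }
        return nodes.Count;
    }
}
```

Like the original method, this only finds a tag that Word stored inside a single w:t node; if Word has split the placeholder across several runs, neither version will match it.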

public void Save()
{
    using (Stream s = doc.MainDocumentPart.GetStream(FileMode.Create, FileAccess.Write))
    {
        using (XmlWriter xmlw = XmlWriter.Create(s))
        {
            mainDocument.WriteTo(xmlw);
        }
    }
    doc.Close();
}

The process is really simple: I call GetStream on the MainDocumentPart object, but with FileMode.Create, which requests the creation of a new stream; then I write the content of the modified XElement to the stream, and the job is done. Here is a typical use.

Document doc = new Document(@"samples\doc2.docx", @"samples\doc1saved.docx");
doc.Substitute("$$$substituteme(30)", "First Substitution!!!")
   .Substitute("$$$substituteme(25)", "Second Substitution!!!")
   .Save();

Thanks to the fluent interface I can use a simple syntax to change various parts of the document. Working with OpenXml makes it really easy to manage Word documents.

alk.
