Thursday, February 12, 2009

OBA Part 3 - Storing and Reading Data in Word Documents


OBA Part 3 - Storing and Reading Data in Word Documents

Published 12 February 09 05:55 PM

In my last few posts we've been building a simple Office Business Application (OBA) for the new Northwind Traders. If you missed them:

In this post I'm going to talk about how we can create a purchase order in Word 2007 that contains data about the items being purchased and how we can query that data and place it into our database. We'll use this code as a basis for our SharePoint Workflow which we will build out in the next post.


If you recall our architecture diagram of our Northwind Traders OBA involved our Sales Reps submitting purchase orders as Word 2007 documents up to SharePoint which kicked off a workflow to parse the document and update the database with the order data through our data service.

This allows us to store the unstructured document on SharePoint and the structured order data in our database.

However before we build out the SharePoint Workflow we need a clean way to store and then retrieve the structured order data inside the Word 2007 document. Since Word 2007 documents are Open XML we can use the Open XML SDK and LINQ to XML to easily parse the document. (I've talked about how to manipulate documents with Open XML SDK before here.)

Content Controls

One way to store data in an Word 2007 document is by using content controls. These allow you to define specific data areas/fields in the document which are then bound to XML that is placed inside the document. When users enter data into these areas of the document the data is stored as a CustomXML Part inside the document. You can use Visual Studio to create content controls and map them to XML or you can use Word itself. There's also a nifty tool called the Word 2007 Content Control Toolkit that makes the mapping more visual. I'd also highly recommend installing the VSTO Power Tools as well which includes a VS Add-In for manipulating Open XML documents. This allows you to look inside the document easily to inspect all the parts directly within Visual Studio.

So the first thing to do is to create a purchase order template and lay out the content controls on the document. We'll create something very simple using Microsoft Word 2007. On the Developer tab you will see the Controls section. There you can choose which types of controls to lay out on the document. Click the properties button to assign a friendly title and tag to the control. Here I've laid out the minimum information we'll need to submit an order to the system:


Users can write anything else around the content controls but the system only cares about capturing the data we've specified. This gives us the ability to store structured and unstructured data completely inside the .docx file.

Creating and Mapping the XML

Now we are ready to map the values of the content controls to some custom XML. The XML document for our order looks like this. (Note that there are 10 <OrderDetail> elements I just snipped them for brevity):

<OrderEntry xmlns="urn:microsoft:examples:oba">
<CustomerID />
<OrderDate />
<Shipper />
<ProductName />
<Quantity />

Now open up the Word 2007 Content Control Toolkit an open the OrderEntryTemplate.docx. Under Actions select "Create a new Custom XML Part", switch to edit view, and then paste in the XML:


Next switch to Bind View and then drag the elements onto the content controls on the left. Make sure you select the element first and then drag it.


Once you're done, save the document and then you can open it in Visual Studio if you've loaded the VSTO Power Tools. This will show the Open XML parts of the document and you can expand the customXml folder and see that our XML has been added to the document.


If you back into Word and fill out the content controls and then view the document in Visual Studio again, you will see that the item1.xml custom XML part will contain the data we entered.

Now that we have a purchase order template we can give this to our sales reps who can collaborate with our high volume customers via email to fill it out. They can then submit the purchase orders to a SharePoint list that can run a workflow to extract the order data and update the database through the data service.

Using the Open XML SDK to Retrieve the Order Data

The easiest way to programmatically manipulate Office 2007 Open XML documents is by using the Open XML SDK. Once you install it you can then Add Reference to the DocumentFormat.OpenXML assembly. In order to use LINQ to XML you'll also need a reference to System.Core and System.Xml.Linq. These are imported automatically when you create a new project in Visual Studio 2008. You'll also need to add a Service Reference to the ADO.NET Data Service like I've shown before. So let's start simple and just create a console application for now called NorthwindOrderDocParser. Later we'll talk about moving this to a SharePoint workflow.

Before we start parsing the document let's create a couple simple classes that store the data we're extracting from our document.

''' <summary>
''' These classes represents the order data that is inside the Word Document.
''' </summary>
''' <remarks></remarks>
Public Class DocumentOrderData

Sub New(ByVal customerID As String, ByVal orderDate As Date, ByVal shipperName As String)
_CustomerID = customerID
_OrderDate = orderDate
_Shipper = shipperName
End Sub

Private _CustomerID As String
Public Property CustomerID() As String
Return _CustomerID
End Get
Set(ByVal value As String)
_CustomerID = value
End Set
End Property

Private _OrderDate As Date
Public Property OrderDate() As Date
Return _OrderDate
End Get
Set(ByVal value As Date)
_OrderDate = value
End Set
End Property

Private _Shipper As String
Public Property Shipper() As String
Return _Shipper
End Get
Set(ByVal value As String)
If value Is Nothing OrElse value.Trim = "" Then
value = "Speedy Express"
End If
_Shipper = value
End Set
End Property

Private _details As New List(Of Detail)
Public ReadOnly Property Details() As List(Of Detail)
Return _details
End Get
End Property

Public Class Detail
Sub New(ByVal productName As String, ByVal quantity As Short)
_ProductName = productName
_Quantity = quantity
End Sub

Private _ProductName As String
Public Property ProductName() As String
Return _ProductName
End Get
Set(ByVal value As String)
_ProductName = value
End Set
End Property

Private _Quantity As Short
Public Property Quantity() As Short
Return _Quantity
End Get
Set(ByVal value As Short)
_Quantity = value
End Set
End Property
End Class
End Class

Next, let's add a schema for the OrderEntry XML data that is contained in the document. This will give us IntelliSense on our XML when we're using LINQ to XML. We can just open the document in Visual Studio like before and copy the OrderEntry XML data into the clipboard. Then we can Add a new XML to Schema Item and paste into the Wizard's dialog box. This will infer the schema and place the XSD file into the project automatically for us. Notice that I specified a namespace on our OrderEntry XML data. We now can import this namespace into our main program along with a few other .NET namespaces we'll need:

'Reference to our data service and data entities:
Imports NorthwindOrderDocParser.NorthwindService
'Open XML SDK:
Imports DocumentFormat.OpenXml.Packaging
Imports System.IO
'Default XML Namespace:
Imports <xmlns="urn:microsoft:examples:oba">

We are almost ready to start writing our main program to parse the purchase order. First we need a test document. For this test I filled out the following information in a document called MyTestOrder.docx.


Now we can write our main program:

Module Module1

Sub Main()
Dim docFile = My.Computer.FileSystem.GetFileInfo("MyTestOrder.docx")

Dim docData As DocumentOrderData
Using sr = docFile.OpenRead()
'Attempt to parse the document for order data
docData = ParseOrderDocument(sr)
End Using

If docData IsNot Nothing Then
Dim employeeEmail = ""

'Attempt to add the order data through the service
AddNewOrder(docData, employeeEmail)
Console.WriteLine("Order saved successfully.")
Console.WriteLine("No order data was found in the document.")
End If

Catch ex As Exception
Console.WriteLine("Order could not be processed." & vbCrLf & ex.ToString())
End Try

End Sub

The ParseOrderDocument method is going to need to grab the XML data from our Custom XML parts as we iterate over the part collection. It's a collection because there can actually be many Custom XML definitions in our document. In order to make grabbing the XML data from the parts easier let's create an Extension method that extends the OpenXMLPart type. I like to place Extension methods in a separate file called Extensions.vb:

Imports DocumentFormat.OpenXml.Packaging
Imports System.IO
Imports System.Xml

Module Extensions

' Create an extension method so we can easily access the part XML
<System.Runtime.CompilerServices.Extension()> _
Function GetXDocument(ByVal part As OpenXmlPart) As XDocument
Dim xdoc As XDocument

Using sr As New StreamReader(part.GetStream())
xdoc = XDocument.Load(XmlReader.Create(sr))
End Using

Return xdoc
End Function
End Module

Now we can go back to our main Module1 and add a the ParseOrderDocument method. Notice that I'm using the Extension method we created in the For Each part... loop to return the custom XML as an XDocument. Then I use the child axis property <OrderEntry> (displayed in IntelliSense as I type the query) to see if the element exists. Also notice that since I imported our XML namespace at the top of the file it will only return an <OrderEntry> element in that namespace. So we're safe not to clash with other custom XML that may be added to the document by other processes.

''' <summary>
''' Attempts to parse the word document for order data and returns an order
''' object with all the required. The document must have a customXML part
''' that adheres to the OrderEntry.xsd
''' </summary>
''' <param name="docStream">The document to parse</param>
''' <returns>The order data contained in the document</returns>
''' <remarks></remarks>
Function ParseOrderDocument(ByVal docStream As Stream) As DocumentOrderData
Dim orderData As DocumentOrderData = Nothing

'Use the Open XML SDK to open the document and access parts easily
Dim wordDoc = WordprocessingDocument.Open(docStream, False)

Using wordDoc
'Get the main document part (document.xml)
Dim mainPart = wordDoc.MainDocumentPart
Dim docXML As XElement = Nothing

'Find the order data custom XML part
For Each part In mainPart.CustomXmlParts
docXML = part.GetXDocument.<OrderEntry>.FirstOrDefault()
If docXML IsNot Nothing Then
Exit For
End If

If docXML Is Nothing Then
Throw New InvalidOperationException("This document does not contain order entry data.")
End If

'Grab the order data fields from the XML
Dim customerID = docXML.<CustomerID>.Value.Trim()
Dim orderDate = docXML.<OrderDate>.Value.Trim()
Dim shipper = docXML.<Shipper>.Value.Trim()

If customerID <> "" AndAlso IsDate(orderDate) Then
'Create and fill the DocumentOrderData
orderData = New DocumentOrderData(customerID, CDate(orderDate), shipper)

For Each item In docXML.<OrderDetails>.<OrderDetail>
'Grab order details data fields
Dim product = item.<ProductName>.Value.Trim()
Dim quantity = item.<Quantity>.Value.Trim()

If product <> "" AndAlso IsNumeric(quantity) Then
'Add a new DocumentOrderData.Detail for each product found
orderData.Details.Add(New DocumentOrderData.Detail(product, CShort(quantity)))
End If

End If
End Using

Catch ex As Exception
Throw New InvalidOperationException("Could not process this document.", ex)
End Try

Return orderData
End Function

Updating the Database through the Data Service

Now that we have our document parsed we're just left with adding the data through our data service.  What we need to do is query the reference data (entities) that we'll need to properly associate on our Order. For instance Order will need a reference to the Customer, the Employee and the Shipper. Then each Order_Detail will need a reference to the Product entity. Notice that we're passing the employee email address into this method so that we can associate the sales rep with the order. If you recall we had to add this field to the Customer and Employee tables in Northwind. (For this test program I'm hard-coding the value but later we'll get this information from the Outlook client when it submits the order to SharePoint.)

Once we have these entities queried and returned from the service we can link them up properly and add our new Order and Order_Details to the data service. For more information on updating data and setting proper linkage to entities returned from an ADO.NET data service read this post and this one.

    ''' <summary>
''' Adds a new order through the ADO.NET Data service and sets up all the required
''' associations to related entities.
''' </summary>
''' <param name="docData">The order data</param>
''' <param name="employeeEmail">EmailAddress of sales representitve</param>
''' <remarks></remarks>
Private Sub AddNewOrder(ByVal docData As DocumentOrderData, ByVal employeeEmail As String)

Dim ctx As New NorthwindEntities(New Uri("http://localhost:1234/Northwind.svc/"))
Dim cust As Customer
'Try to retrieve the customer
cust = (From c In ctx.Customers _
Where c.CustomerID = docData.CustomerID).FirstOrDefault()

Catch ex As Exception
Throw New InvalidOperationException("Invalid customer ID.")
End Try

If cust IsNot Nothing Then

Dim ship = (From s In ctx.Shippers _
Where s.CompanyName = docData.Shipper).FirstOrDefault()

'Email Address will come from our Outlook client/sales person
Dim emp = (From e In ctx.Employees _
Where e.EmailAddress = employeeEmail).FirstOrDefault()

Dim o As New Order()

o.OrderDate = docData.OrderDate
o.RequiredDate = Now.AddDays(2)
o.ShipAddress = cust.Address
o.ShipCity = cust.City
o.ShipCountry = cust.Country
o.ShipName = cust.ContactName
o.ShipPostalCode = cust.PostalCode
o.ShipRegion = cust.Region
o.Freight = 25


o.Customer = cust
ctx.SetLink(o, "Customer", cust)

If ship IsNot Nothing Then
o.Shipper = ship
ctx.SetLink(o, "Shipper", ship)
End If

If emp IsNot Nothing Then
o.Employee = emp
ctx.SetLink(o, "Employee", emp)
End If

o.Order_Details = New System.Collections.ObjectModel.Collection(Of Order_Detail)

For Each item In docData.Details
Dim productName = item.ProductName.ToLower()
Dim product = (From p In ctx.Products _
Where p.ProductName.ToLower() = productName).FirstOrDefault()

If product IsNot Nothing Then
'Create a detail for each product being ordered
Dim detail As New Order_Detail()

detail.Quantity = item.Quantity
detail.UnitPrice = If(product.UnitPrice.HasValue, _
product.UnitPrice.Value, 1D)


detail.Product = product
ctx.SetLink(detail, "Product", product)

detail.Order = o
ctx.SetLink(detail, "Order", o)

ctx.AddLink(o, "Order_Details", detail)
End If
'Saving in Batch mode will update the data inside a database transaction
'This will throw an exception if the service can't save the Order
End If
End Sub
End Module

When we run this program we will see that Customer ALFKI now has a new Order and 4 Order Details entered into the database. Since we're sending the updates in Batch mode this will cause our order data to be properly wrapped in a database transaction.


Next post we'll talk about how we can create a SharePoint workflow to run this code when order documents are added to a SharePoint list. However, if SharePoint is not a requirement of your system (maybe you have no need to collaborate on documents or store this unstructured data) you could easily add this code directly to the Outlook client we built in the previous post.

I updated the sample on Code Gallery with this project so have a look.


by Beth Massi

Beth Massi - Sharing the goodness that is VB : OBA Part 3 - Storing and Reading Data in Word Documents

1 comment:

Blogger said...

Are you trying to make money from your visitors by popunder advertisments?
If so, did you take a look at Clickadu?

Blog Archive