Tuesday, March 8, 2011

C++ Console Application to Get Comments from a Microsoft Word File

Update 2019-02-19:  Many thanks to Brian Thomas @ Kutana Software for helping to get this running. The C++ project is on GitHub at https://github.com/travelmarx/travelmarx-blog/tree/master/ExtractComments.

Update 2019-02-11:  Based on a comment received (see below), we reviewed this code and found a number of problems.

  1. The code below was escaped poorly when it was inserted into the post, which garbled its presentation.
  2. The code itself, once corrected for the escaping problems, gave a number of syntax errors in VS 2017.
  3. The status of OPC has changed, so getting the C++ project to build in VS wasn't obvious. We think our trouble was with getting the right System.IO.Packaging and WindowsBase references. Some of the original SDK Samples for Windows 7 are on GitHub here.
What to do? First, here is the code on GitHub that was intended. This addresses problems 1 and 2. We made some slight changes to the code to remove syntax errors (using SysAllocString in several spots) and changed stdafx.h to pch.h. The name of the precompiled header isn't fixed; pch.h is simply what VS 2017 used for C++ console apps.
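
The escaping problem in item 1 comes from the blog editor treating the characters <, > and & in the listing as markup. As a minimal sketch, assuming the listing is available as a plain string, the characters can be replaced with HTML entities before pasting. EscapeForHtml is a hypothetical helper name and is not part of the ExtractComments project.

```cpp
#include <string>

// Hypothetical helper: replace the three characters that HTML treats as
// markup so that a code listing survives being pasted into a blog post.
std::string EscapeForHtml(const std::string& code)
{
    std::string out;
    out.reserve(code.size());
    for (char c : code)
    {
        switch (c)
        {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;";  break;
        case '>': out += "&gt;";  break;
        default:  out += c;       break;
        }
    }
    return out;
}
```

Passing a line such as a for loop header or an address-of expression through this helper keeps the comparison operators and ampersands intact when the page is rendered.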

For problem 3, we are working on a possible solution (see update above). If you are interested in extracting comments using .NET (C# and VB), see How to: Retrieve comments from a word processing document (Open XML SDK).

Begin Initial Post

Output from Comment Extraction

The goal of this post is to show how to construct a C++ console application that extracts comments from a Word document. This post builds on a previous post which showed how to extract comments from a Microsoft Word document (2007 or later). In the previous post, Getting Comments from a Microsoft Word File: Leveraging the OPC Format, we did the extraction by changing the extension of the Word document and accessing the files directly in the ZIP structure. In this post, we take the Word document as is and use a console application written in C++/COM that uses the OPC API to access the comments directly. The code shown here was run in Visual Studio 2010 on Windows 7.

The key to the console application logic is understanding the document parts of the Word XML format. When we cracked open the Word ZIP file, we could get the comments file directly. With the API, we have to follow the pattern the API sets out. The pattern for a Word document is discussed here on MSDN and here. The main document part (../word/document.xml) is the main part in the package, and the comments part (../word/comments.xml) has a relationship to the main document part that can be used to obtain the comments. On our first try, we kept trying to get the comments part directly from the package relationships, which didn't work. However, once we got the document part from the package (see the FindPartByRelationshipType function in the program below), we could use the same logic to get the comments part from the document part.
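
The two-step lookup described above can be illustrated without the OPC API. The sketch below models a package as a map from a source (an empty string for the package itself) to its relationships. The names Relationship, Package, FindTarget and MakeSamplePackage are hypothetical stand-ins for illustration, not OPC interfaces.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not OPC API types.
struct Relationship
{
    std::string type;    // relationship type URI
    std::string target;  // name of the target part
};

// Key: name of the source part, or "" for the package itself.
using Package = std::map<std::string, std::vector<Relationship>>;

const std::string kOfficeDocument =
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
const std::string kComments =
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments";

// Return the target of the first relationship of the given type that
// belongs to 'source', or "" if there is none.
std::string FindTarget(const Package& package,
                       const std::string& source,
                       const std::string& type)
{
    auto it = package.find(source);
    if (it == package.end()) return "";
    for (const Relationship& rel : it->second)
    {
        if (rel.type == type) return rel.target;
    }
    return "";
}

// Build a toy package shaped like a Word file that has comments.
Package MakeSamplePackage()
{
    Package p;
    p[""].push_back({kOfficeDocument, "/word/document.xml"});
    p["/word/document.xml"].push_back({kComments, "/word/comments.xml"});
    return p;
}
```

Asking the package itself for the comments relationship finds nothing in this model, which matches what we saw on our first try. Asking the document part for the same relationship type returns the comments part.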

A crucial part of the console application is the set of definitions of content types and relationship types between parts. These are defined in the header file (ExtractComments.h) for this application. For example, the content type of the comments part is:

application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml

The relationship type that links the comments part to the document part is:

http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments
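
A relationship of this type typically stores its target relative to the source part (for example, comments.xml relative to /word/document.xml), so the target has to be resolved to a full part name before the part can be looked up. The program below leaves that to CombinePartUri. As a rough sketch of what the resolution involves, assuming simple slash-separated part names, the hypothetical helper ResolveTarget below does the same job for plain strings. It is not an OPC API and skips the validation the real API performs.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not an OPC API: resolve a relationship target against
// the name of its source part ("/" for the package root).
std::string ResolveTarget(const std::string& sourcePart, const std::string& target)
{
    std::string path;
    if (!target.empty() && target[0] == '/')
    {
        path = target;  // already absolute
    }
    else
    {
        // Drop the file name of the source part and keep its folder.
        path = sourcePart.substr(0, sourcePart.rfind('/') + 1) + target;
    }

    // Normalize "." and ".." segments.
    std::vector<std::string> segments;
    std::istringstream stream(path);
    std::string segment;
    while (std::getline(stream, segment, '/'))
    {
        if (segment.empty() || segment == ".") continue;
        if (segment == "..")
        {
            if (!segments.empty()) segments.pop_back();
        }
        else
        {
            segments.push_back(segment);
        }
    }

    std::string resolved;
    for (const std::string& s : segments) resolved += "/" + s;
    return resolved.empty() ? "/" : resolved;
}
```

With this helper, the target comments.xml resolved against /word/document.xml gives /word/comments.xml, and the target word/document.xml resolved against the package root gives /word/document.xml.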

Note: In this console application we did not deal with the fact that comments in a Word document can contain more than just text. In the previous post we dealt with hyperlinks as an example of non-text content in comments; the same improvements would need to be added to this code. Specifically, Part 1 of ECMA-376, which covers the docx format, details what a comment can contain: charts, diagrams, hyperlinks, images, video, and embedded content.
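
One concrete consequence: the program below selects runs with the XPath w:r relative to each paragraph, which matches only runs that are direct children of w:p. Hyperlink text usually sits in a run wrapped by a w:hyperlink element, so it would be skipped. A descendant query such as .//w:r would pick it up. The sketch below shows the difference on a hypothetical miniature element tree (the Node type and the three functions are assumptions for illustration, not the MSXML DOM).

```cpp
#include <string>
#include <vector>

// Hypothetical miniature element tree, not the MSXML DOM.
struct Node
{
    std::string name;            // e.g. "w:p", "w:r", "w:hyperlink"
    std::string text;            // text carried by a run, if any
    std::vector<Node> children;
};

// Equivalent in spirit to the XPath "w:r": direct children only.
std::string TextOfChildRuns(const Node& paragraph)
{
    std::string out;
    for (const Node& child : paragraph.children)
    {
        if (child.name == "w:r") out += child.text;
    }
    return out;
}

// Equivalent in spirit to the XPath ".//w:r": runs at any depth.
std::string TextOfDescendantRuns(const Node& node)
{
    std::string out;
    for (const Node& child : node.children)
    {
        if (child.name == "w:r") out += child.text;
        else out += TextOfDescendantRuns(child);
    }
    return out;
}

// A paragraph shaped like: a plain run, then a hyperlink wrapping a run.
Node MakeSampleParagraph()
{
    Node link{"w:hyperlink", "", {Node{"w:r", "our site", {}}}};
    return Node{"w:p", "", {Node{"w:r", "See ", {}}, link}};
}
```

On the sample paragraph, the child-only version returns just the plain run, while the descendant version also returns the text inside the hyperlink.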

The code shown here was built starting from the OPC SDK Samples for Windows 7. In particular, we started from the SetAuthor project inside AllOPCSamples.zip and changed it to suit our purpose here. The console application takes a file name as an argument. In Visual Studio, set the file name under the configuration properties of the project as shown below.

Visual Studio Console App Configuration

The code is shown below, along with links for downloading it. Before getting to the code, here is a sketch of its logic in pseudo-code. We use the syntax (x,y) -> z to mean x and y are used to return z. A bit simplistic, but it helps clarify what is coming in and what is going out.
//pseudo-code
wmain
    COM Initialization of Thread
    CoCreateInstance of Opc Factory : () -> factory
    Load Package : (factory, fileName) -> package
    Find Document Part in Package : (package) -> documentPart
    Find Comments Part in Package : (package, documentPart) -> commentsPart
    Print Core Properties : (package) -> output
    Print Comments : (commentsPart) -> output

Load Package
(factory, fileName) -> package
    Create Stream on File : (factory, fileName, options) -> sourceFileStream
    Read Package from Stream : (factory, sourceFileStream, options) -> package

Find Document In Package
(package) -> documentPart
    relationshipType = http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument
    contentType = application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml
    Find Part by Relationship Type : (package, NULL, relationshipType, contentType) -> documentPart

Find Core Properties Part
(package) -> corePropertiesPart
    relationshipType = http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties
    contentType = application/vnd.openxmlformats-package.core-properties+xml
    Find Part by Relationship Type : (package, NULL, relationshipType, contentType) -> corePropertiesPart

Find Comments in Package
(package, documentPart) -> commentsPart
    relationshipType = http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments
    contentType = application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml
    Find Part by Relationship Type* : (package, documentPart, relationshipType, contentType) -> commentsPart

Find Part By Relationship Type
(package, parentPart, relationshipType, contentType) -> part
    Get Part Set : (package) -> partSet
    Get Relationship Set
    if (parentPart == NULL) then (package) -> packageRels
    else (parentPart) -> packageRels
    Get Enumerator for Type : (packageRels, relationshipType) -> packageRelsEnum
    Get Current : (packageRelsEnum) -> currentRel
    Resolve Target Uri to Part : (currentRel) -> partUri
    Part Exists : (partSet, partUri) -> partExists
    if (partExists) {
        Get Current Part : (partSet, partUri) -> currentPart
        Get Current Part Content Type : (currentPart) -> currentContentType
        if (currentContentType equals contentType)
        { // found the part }
    }

Resolve Target URI to Part
(relationship) -> resolvedUri

Print Comments
(commentsPart) -> output
    Get DOM from Part : (commentsPart, namespace) -> commentsDom
    Select Nodes : (commentsDom) -> commentsNodeList
    for each {
        Get Attributes of Comment Node
        Get Text of Comment Node
    }

Get Text of Comment Node
(node) -> output

Get Attributes of Comment Node
(node) -> output

Print Core Properties
(package) -> output
    Find Core Properties : (package) -> corePropertiesPart
    Get DOM from Part : (corePropertiesPart, namespace) -> corePropertiesDom
    Select Single Node : (corePropertiesDom, nodeName) -> nodeFound
    // work with nodeFound

Get DOM from Part
(part, namespace) -> XmlDocument
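
The wmain steps in the pseudo-code above are chained so that each one runs only if the previous one succeeded, and cleanup runs no matter where the chain stopped. The sketch below shows that shape in portable C++. Hr, kOk, kFail and Ok are hypothetical stand-ins for HRESULT, S_OK, E_FAIL and SUCCEEDED so that it builds without the Windows headers, and Step and RunPipeline are illustrative names only.

```cpp
#include <string>
#include <vector>

// Stand-ins for HRESULT, S_OK, E_FAIL and SUCCEEDED so the sketch builds
// without the Windows headers. All names here are hypothetical.
typedef int Hr;
const Hr kOk = 0;
const Hr kFail = -1;
inline bool Ok(Hr hr) { return hr >= 0; }

// Each step records that it ran, then reports success or failure.
Hr Step(const std::string& name, bool succeed, std::vector<std::string>& log)
{
    log.push_back(name);
    return succeed ? kOk : kFail;
}

// Mirrors the shape of wmain: every step is guarded by the result of the
// previous one, and cleanup runs regardless of where the chain stopped.
Hr RunPipeline(bool commentsPartExists, std::vector<std::string>& log)
{
    Hr hr = Step("LoadPackage", true, log);
    if (Ok(hr)) hr = Step("FindDocumentInPackage", true, log);
    if (Ok(hr)) hr = Step("FindCommentsInPackage", commentsPartExists, log);
    if (Ok(hr)) hr = Step("PrintCoreProperties", true, log);
    if (Ok(hr)) hr = Step("PrintComments", true, log);
    log.push_back("Release");  // cleanup always runs
    return hr;
}
```

One thing the sketch makes visible: when the comments part is not found, the chain stops there, so the core properties are not printed either. The program below behaves the same way for a document without comments.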


The header file for the console application can be downloaded here and is shown below.
#include "msopc.h"
#include "msxml6.h"
#include "stdafx.h"

HRESULT LoadPackage(IOpcFactory *factory, LPCWSTR packageName, IOpcPackage **outPackage);
HRESULT FindDocumentInPackage(IOpcPackage *package, IOpcPart  **documentPart);
HRESULT FindCommentsInPackage(IOpcPackage *package, IOpcPart  *parentPart, IOpcPart  **documentPart);
HRESULT FindPartByRelationshipType(IOpcPackage *package, IOpcPart *parentPart, LPCWSTR relationshipType, LPCWSTR contentType, IOpcPart **part);
HRESULT ResolveTargetUriToPart(IOpcRelationship *relativeUri, IOpcPartUri **resolvedUri);
HRESULT PrintCoreProperties(IOpcPackage *package);
HRESULT PrintComments(IOpcPart *part);
HRESULT GetAttributesOfCommentNode(IXMLDOMNode *node);
HRESULT GetTextofCommentNode(IXMLDOMNode *node);
HRESULT FindCorePropertiesPart(IOpcPackage *package, IOpcPart **part);
HRESULT DOMFromPart(IOpcPart *part, LPCWSTR selectionNamespaces, IXMLDOMDocument2 **document);

static const WCHAR g_officeDocumentRelationshipType[] =
    L"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
static const WCHAR g_wordProcessingContentType[] =
    L"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml";
static const WCHAR g_corePropertiesRelationshipType[] =
    L"http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties";
static const WCHAR g_corePropertiesContentType[] =
    L"application/vnd.openxmlformats-package.core-properties+xml";
static const WCHAR g_commentsRelationshipType[] =
    L"http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments";
static const WCHAR g_commentsContentType[] =
    L"application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml";
static const WCHAR g_corePropertiesSelectionNamespaces[] =
    L"xmlns:cp='http://schemas.openxmlformats.org/package/2006/metadata/core-properties' "
    L"xmlns:dc='http://purl.org/dc/elements/1.1/' "
    L"xmlns:dcterms='http://purl.org/dc/terms/' "
    L"xmlns:dcmitype='http://purl.org/dc/dcmitype/' "
    L"xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'";
static const WCHAR g_commentsSelectionNamespaces[] =
    L"xmlns:wpc='http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas' "
    L"xmlns:mc='http://schemas.openxmlformats.org/markup-compatibility/2006' "
    L"xmlns:o='urn:schemas-microsoft-com:office:office' "
    L"xmlns:r='http://schemas.openxmlformats.org/officeDocument/2006/relationships' "
    L"xmlns:m='http://schemas.openxmlformats.org/officeDocument/2006/math' "
    L"xmlns:v='urn:schemas-microsoft-com:vml' "
    L"xmlns:wp14='http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing' "
    L"xmlns:wp='http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing' "
    L"xmlns:w10='urn:schemas-microsoft-com:office:word' "
    L"xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' "
    L"xmlns:w14='http://schemas.microsoft.com/office/word/2010/wordml' "
    L"xmlns:wpg='http://schemas.microsoft.com/office/word/2010/wordprocessingGroup' "
    L"xmlns:wpi='http://schemas.microsoft.com/office/word/2010/wordprocessingInk' "
    L"xmlns:wne='http://schemas.microsoft.com/office/word/2006/wordml' "
    L"xmlns:wps='http://schemas.microsoft.com/office/word/2010/wordprocessingShape' ";


The main code file for the console application can be downloaded here and is shown below.
// ExtractComments.cpp : Defines the entry point for the console application.

#include "ExtractComments.h"
#include "stdio.h"
#include "windows.h"
#include "shlobj.h"
#include 
#include "util.h"
using namespace std;

int wmain(int argc, wchar_t* argv[])
{
 if (argc != 2)
 {
  wprintf(L"Usage: ExtractComments.exe <filename>\n");
  exit(0);
 }
 wprintf(L"Starting.\n");
 LPCWSTR pFileName = argv[1];
 HRESULT hr = CoInitializeEx(NULL, COINIT_MULTITHREADED);

 if (SUCCEEDED(hr))
 {
  IOpcPackage * package = NULL;
  IOpcPart * documentPart = NULL;
  IOpcFactory * factory = NULL;
  hr = CoCreateInstance(
      __uuidof(OpcFactory),
      NULL,
      CLSCTX_INPROC_SERVER,
      __uuidof(IOpcFactory),
      (LPVOID*)&factory
      );
  if (SUCCEEDED(hr))
  {
   wprintf(L"Created factory.\n");
   hr = ::LoadPackage(factory, pFileName, &package);
   // See command arguments in project properties for specification of file to read.
  }
  if (SUCCEEDED(hr))
  {
   wprintf(L"Loaded package.\n");
   hr = ::FindDocumentInPackage(package, &documentPart);

  }
  IOpcPart *commentsPart = NULL;
  if (SUCCEEDED(hr))
  {
   wprintf(L"Found document in package.\n");
   hr = ::FindCommentsInPackage(package, documentPart, &commentsPart);
  }
  if (SUCCEEDED(hr))
  {
   wprintf(L"Found comments in package.\n");
   hr = ::PrintCoreProperties(package);
  }
  if (SUCCEEDED(hr))
  {
   wprintf(L"Found core properties in package.\n");
   hr = ::PrintComments(commentsPart);
  }
  if (SUCCEEDED(hr))
  {
   wprintf(L"Printed comments.\n");
  }

  // Release resources
  if (package)
  {
   package->Release();
   package = NULL;
  }

  if (documentPart)
  {
   documentPart->Release();
   documentPart = NULL;
  }

  if (factory)
  {
   factory->Release();
   factory = NULL;
  }
  CoUninitialize();
 }
 return 0;
}

HRESULT LoadPackage(
 IOpcFactory *factory,
 LPCWSTR packageName,
 IOpcPackage **outPackage)
{
 IStream * sourceFileStream = NULL;
 HRESULT hr = factory->CreateStreamOnFile(
     packageName,
     OPC_STREAM_IO_READ,
     NULL,
     0,
     &sourceFileStream);
 if (SUCCEEDED(hr))
 {
  hr = factory->ReadPackageFromStream(
     sourceFileStream,
     OPC_CACHE_ON_ACCESS,
     outPackage);
 }
 if (sourceFileStream)
 {
  sourceFileStream ->Release();
  sourceFileStream = NULL;
 }
 return hr;
}
HRESULT FindDocumentInPackage(
 IOpcPackage *package,
 IOpcPart   **documentPart)
{
  return ::FindPartByRelationshipType(
  package,
  NULL,
  g_officeDocumentRelationshipType,
  g_wordProcessingContentType,
  documentPart);

}
HRESULT FindCommentsInPackage(
 IOpcPackage *package,
 IOpcPart   *documentPart,
 IOpcPart   **commentsPart)
{
  return ::FindPartByRelationshipType(
  package,
  documentPart,
  g_commentsRelationshipType,
  g_commentsContentType,
  commentsPart);

}
HRESULT FindCorePropertiesPart(
  IOpcPackage * package,
  IOpcPart **part)
{
 return ::FindPartByRelationshipType(
    package,
    NULL,
    g_corePropertiesRelationshipType,
          g_corePropertiesContentType,
    part);
}
HRESULT FindPartByRelationshipType(
 IOpcPackage *package,
 IOpcPart *parentPart,
 LPCWSTR relationshipType,
 LPCWSTR contentType,
 IOpcPart **part)
{
 *part = NULL;
 IOpcRelationshipSet * packageRels = NULL;
 IOpcRelationshipEnumerator * packageRelsEnum = NULL;
 IOpcPartSet * partSet = NULL;
 BOOL hasNext = false;

 HRESULT hr = package->GetPartSet(&partSet);

 if (SUCCEEDED(hr))
 {
  if (parentPart == NULL)
  {
   hr = package->GetRelationshipSet(&packageRels);
  }
  else
  {
   hr = parentPart->GetRelationshipSet(&packageRels);
  }
 }
 if (SUCCEEDED(hr))
 {
  hr = packageRels->GetEnumeratorForType(
    relationshipType,
    &packageRelsEnum);
 }
 if (SUCCEEDED(hr))
 {
  hr = packageRelsEnum->MoveNext(&hasNext);
 }
 while (SUCCEEDED(hr) && hasNext && *part == NULL)
 {
  IOpcPartUri * partUri = NULL;
  IOpcRelationship * currentRel = NULL;
  BOOL partExists = FALSE;

  hr = packageRelsEnum->GetCurrent(&currentRel);
  if (SUCCEEDED(hr))
  {
   hr = ::ResolveTargetUriToPart(currentRel, &partUri);
  }
  if (SUCCEEDED(hr))
  {
   hr = partSet->PartExists(partUri, &partExists);
  }
  if (SUCCEEDED(hr) && partExists)
  {
   LPWSTR currentContentType = NULL;
   IOpcPart * currentPart = NULL;
   hr = partSet->GetPart(partUri, &currentPart);
   if (SUCCEEDED(hr))
   {
    // Print the part name for diagnostics; a failure here is not fatal.
    IOpcPartUri * name = NULL;
    BSTR displayUri = NULL;
    if (SUCCEEDED(currentPart->GetName(&name)) &&
        SUCCEEDED(name->GetDisplayUri(&displayUri)))
    {
     wprintf(L"currentPart: %s\n", displayUri);
    }
    SysFreeString(displayUri);
    if (name)
    {
     name->Release();
    }
   }
   if (SUCCEEDED(hr) && contentType != NULL)
   {
    hr = currentPart->GetContentType(&currentContentType);
    if (SUCCEEDED(hr))
    {
     wprintf(L"contentType: %s\n", currentContentType);
    }
    if (SUCCEEDED(hr) && 0 == wcscmp(contentType, currentContentType))
    {
     *part = currentPart;  // found what we are looking for
     currentPart = NULL;
    }
   }
   if (SUCCEEDED(hr) && contentType == NULL)
   {
    *part = currentPart;
    currentPart = NULL;
   }
   CoTaskMemFree(static_cast<LPVOID>(currentContentType));
   if (currentPart)
   {
    currentPart->Release();
    currentPart = NULL;
   }
  }
  if (SUCCEEDED(hr))
  {
   hr = packageRelsEnum->MoveNext(&hasNext);
  }
  if (partUri)
        {
            partUri->Release();
            partUri = NULL;
        }

        if (currentRel)
        {
            currentRel->Release();
            currentRel = NULL;
        }
 }
     if (SUCCEEDED(hr) && *part == NULL)
    {
        // Loop complete without errors and no part found.
        hr = E_FAIL;
    }

    // Release resources
    if (packageRels)
    {
        packageRels->Release();
        packageRels = NULL;
    }

    if (packageRelsEnum)
    {
        packageRelsEnum->Release();
        packageRelsEnum = NULL;
    }

    if (partSet)
    {
        partSet->Release();
        partSet = NULL;
    }
 return hr;
}
HRESULT ResolveTargetUriToPart(
 IOpcRelationship *relationship,
 IOpcPartUri **resolvedUri
 )
{
 IOpcUri * sourceUri = NULL;
 IUri * targetUri = NULL;
 OPC_URI_TARGET_MODE targetMode;
 HRESULT hr = relationship->GetTargetMode(&targetMode);
 if (SUCCEEDED(hr) && targetMode != OPC_URI_TARGET_MODE_INTERNAL)
 {
  return E_FAIL;
 }
 if (SUCCEEDED(hr))
 {
  hr = relationship->GetTargetUri(&targetUri);
 }
 if (SUCCEEDED(hr))
 {
  hr = relationship->GetSourceUri(&sourceUri);
 }
 if (SUCCEEDED(hr))
 {
  hr = sourceUri->CombinePartUri(targetUri, resolvedUri);
 }
 if (sourceUri)
 {
  sourceUri->Release();
  sourceUri = NULL;
 }
 if (targetUri)
 {
  targetUri->Release();
  targetUri = NULL;
 }
 return hr;
}
HRESULT PrintComments(
 IOpcPart *commentsPart)
{
 IXMLDOMDocument2 * commentsDom = NULL;

 HRESULT hr = ::DOMFromPart(
    commentsPart,
    g_commentsSelectionNamespaces,
    &commentsDom);
 if (SUCCEEDED(hr))
 {
  IXMLDOMNodeList * commentsNodeList = NULL;
  BSTR query = ::SysAllocString(L"//w:comment");
  hr = commentsDom->selectNodes(query, &commentsNodeList);
  ::SysFreeString(query);
  if (SUCCEEDED(hr) && commentsNodeList != NULL)
  {
   // Iterate through comment nodes
   // http://msdn.microsoft.com/en-us/library/ms757073(VS.85).aspx
   long nodeListLength = 0;
   hr = commentsNodeList->get_length(&nodeListLength);

   for (long i = 0; SUCCEEDED(hr) && i < nodeListLength; ++i)
   {
    IXMLDOMNode *item = NULL;
    hr = commentsNodeList->get_item(i, &item);
    if (SUCCEEDED(hr) && item)
    {
     ::GetAttributesOfCommentNode(item);
     ::GetTextofCommentNode(item);
     item->Release();
    }
   }

  }
  // Release resources
        if (commentsNodeList)
        {
            commentsNodeList->Release();
            commentsNodeList = NULL;
        }
 }
 // Release resources
    if (commentsPart)
    {
        commentsPart->Release();
        commentsPart = NULL;
    }

    if (commentsDom)
    {
        commentsDom->Release();
        commentsDom  = NULL;
    }

 return hr;
}
HRESULT GetTextofCommentNode(
 IXMLDOMNode *node
 )
{
 // A comment holds paragraphs (w:p), which hold runs (w:r) that carry the text.
 BSTR bstrQueryString1 = ::SysAllocString(L"w:p");
 BSTR bstrQueryString2 = ::SysAllocString(L"w:r");
 IXMLDOMNodeList *resultList1 = NULL;
 long resultLength1 = 0;

 HRESULT hr = node->selectNodes(bstrQueryString1, &resultList1);
 if (SUCCEEDED(hr) && resultList1)
 {
  hr = resultList1->get_length(&resultLength1);
 }
 for (long i = 0; SUCCEEDED(hr) && i < resultLength1; i++)
 {
  IXMLDOMNode *pNode = NULL;
  IXMLDOMNodeList *resultList2 = NULL;
  long resultLength2 = 0;

  hr = resultList1->get_item(i, &pNode);
  if (SUCCEEDED(hr) && pNode)
  {
   //wprintf(L"--Found a w:p node.\n");
   wprintf(L"\n");
   hr = pNode->selectNodes(bstrQueryString2, &resultList2);
  }
  if (SUCCEEDED(hr) && resultList2)
  {
   hr = resultList2->get_length(&resultLength2);
  }
  for (long j = 0; SUCCEEDED(hr) && j < resultLength2; j++)
  {
   IXMLDOMNode *rNode = NULL;
   BSTR commentText = NULL;
   hr = resultList2->get_item(j, &rNode);
   if (SUCCEEDED(hr) && rNode && SUCCEEDED(rNode->get_text(&commentText)))
   {
    //wprintf(L"----Found a w:r node. \n");
    wprintf(L"%s", commentText);
   }
   ::SysFreeString(commentText);
   if (rNode) { rNode->Release(); }
  }
  // Release the per-paragraph objects before moving to the next paragraph.
  if (resultList2) { resultList2->Release(); }
  if (pNode) { pNode->Release(); }
 }

 ::SysFreeString(bstrQueryString1);
 ::SysFreeString(bstrQueryString2);
 if (resultList1) { resultList1->Release(); }
 return hr;
}
HRESULT GetAttributesOfCommentNode(
 IXMLDOMNode *node
 )
{
 VARIANT commentAuthorStr, commentDateStr;
 VariantInit(&commentAuthorStr);
 VariantInit(&commentDateStr);
 BSTR bstrAttributeAuthor = ::SysAllocString(L"w:author");
 BSTR bstrAttributeDate = ::SysAllocString(L"w:date");

 // Get author and date attribute of the item.
 //http://msdn.microsoft.com/en-us/library/ms767592(VS.85).aspx
 IXMLDOMNamedNodeMap *attribs = NULL;
 IXMLDOMNode *AttrNode = NULL;
 HRESULT hr = node->get_attributes(&attribs);
 if (SUCCEEDED(hr) && attribs)
 {
  if (SUCCEEDED(attribs->getNamedItem(bstrAttributeAuthor, &AttrNode)) && AttrNode)
  {
   AttrNode->get_nodeValue(&commentAuthorStr);
   AttrNode->Release();
   AttrNode = NULL;
  }
  if (SUCCEEDED(attribs->getNamedItem(bstrAttributeDate, &AttrNode)) && AttrNode)
  {
   AttrNode->get_nodeValue(&commentDateStr);
   AttrNode->Release();
   AttrNode = NULL;
  }
  attribs->Release();
  attribs = NULL;
 }

 // Fall back to a placeholder when an attribute is absent.
 wprintf(L"\n-------------------------------------------------");
 wprintf(L"\nComment::\nAuthor: %s, Date: %s\n",
  (commentAuthorStr.vt == VT_BSTR && commentAuthorStr.bstrVal) ? commentAuthorStr.bstrVal : L"[missing]",
  (commentDateStr.vt == VT_BSTR && commentDateStr.bstrVal) ? commentDateStr.bstrVal : L"[missing]");

 VariantClear(&commentAuthorStr);
 VariantClear(&commentDateStr);
 ::SysFreeString(bstrAttributeAuthor); ::SysFreeString(bstrAttributeDate);
 bstrAttributeAuthor = NULL;    bstrAttributeDate = NULL;

 return hr;
}
HRESULT PrintCoreProperties(
 IOpcPackage *package)
{
 IOpcPart * corePropertiesPart = NULL;
 IXMLDOMDocument2 * corePropertiesDom = NULL;

 HRESULT hr = ::FindCorePropertiesPart(
     package,
     &corePropertiesPart);
 if (SUCCEEDED(hr))
 {
  hr = ::DOMFromPart(
    corePropertiesPart,
    g_corePropertiesSelectionNamespaces,
    &corePropertiesDom);
 }
 if (SUCCEEDED(hr))
 {
  IXMLDOMNode * creatorNode = NULL;
  BSTR text = NULL;
  BSTR query = ::SysAllocString(L"//dc:creator");
  hr = corePropertiesDom->selectSingleNode(query, &creatorNode);
  ::SysFreeString(query);
  if (SUCCEEDED(hr) && creatorNode != NULL)
  {
   hr = creatorNode->get_text(&text);
  }
  if (SUCCEEDED(hr))
  {
   wprintf(L"Author: %s\n", (text != NULL) ? text : L"[missing author info]");
  }
  // Release resources
        if (creatorNode)
        {
            creatorNode->Release();
            creatorNode = NULL;
        }

        SysFreeString(text);

  // put other code here to read other properties
 }
 // Release resources
    if (corePropertiesPart)
    {
        corePropertiesPart->Release();
        corePropertiesPart = NULL;
    }

    if (corePropertiesDom)
    {
        corePropertiesDom->Release();
        corePropertiesDom  = NULL;
    }
 return hr;
}

HRESULT DOMFromPart(
 IOpcPart * part,
 LPCWSTR selectionNamespaces,
 IXMLDOMDocument2 **document)
{
 IXMLDOMDocument2 * partContentXmlDocument = NULL;
 IStream * partContentStream = NULL;

 HRESULT hr = CoCreateInstance(
     __uuidof(DOMDocument60),
     NULL,
     CLSCTX_INPROC_SERVER,
     __uuidof(IXMLDOMDocument2),
     (LPVOID*)&partContentXmlDocument);
 if (SUCCEEDED(hr) && selectionNamespaces)
 {
  AutoVariant v;
  hr = v.SetBSTRValue(L"XPath");
  if (SUCCEEDED(hr))
  {
   hr = partContentXmlDocument->setProperty(L"SelectionLanguage", v);
  }
  if (SUCCEEDED(hr))
  {
   AutoVariant v;
   hr = v.SetBSTRValue(selectionNamespaces);
   if (SUCCEEDED(hr))
   {
    hr = partContentXmlDocument->setProperty(L"SelectionNamespaces", v);
   }
  }
 }
 if (SUCCEEDED(hr))
 {
  hr = part->GetContentStream(&partContentStream);
 }
 if (SUCCEEDED(hr))
 {
  VARIANT_BOOL isSuccessful = VARIANT_FALSE;
  AutoVariant vStream;
  vStream.SetObjectValue(partContentStream);
  hr = partContentXmlDocument->load(vStream, &isSuccessful);
  if (SUCCEEDED(hr) && isSuccessful == VARIANT_FALSE)
  {
   hr = E_FAIL;
  }
 }
 if (SUCCEEDED(hr))
 {
  *document = partContentXmlDocument;
  partContentXmlDocument = NULL;
 }
 // Release resources
    if (partContentXmlDocument)
    {
        partContentXmlDocument->Release();
        partContentXmlDocument = NULL;
    }

    if (partContentStream)
    {
        partContentStream->Release();
        partContentStream = NULL;
    }
 return hr;
}
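
Much of the listing above is the repeated "if pointer, Release, set to NULL" cleanup. A small owning wrapper can take over that job, so that Release runs automatically when the wrapper goes out of scope. The sketch below is a minimal, hypothetical version (ScopedRelease, FakePart, GetFakePart and DemoReleaseCount are illustrative names, and FakePart stands in for a COM interface so the example builds off Windows). In a real project, Microsoft::WRL::ComPtr or ATL's CComPtr would be the usual choice.

```cpp
// Minimal owning pointer for any type with a COM-style Release() method.
// Hypothetical sketch; it does not call AddRef and is not copyable.
template <typename T>
class ScopedRelease
{
public:
    ScopedRelease() : ptr_(nullptr) {}
    ~ScopedRelease() { if (ptr_) ptr_->Release(); }

    // Not copyable: exactly one owner per reference.
    ScopedRelease(const ScopedRelease&) = delete;
    ScopedRelease& operator=(const ScopedRelease&) = delete;

    T** Receive() { return &ptr_; }  // for out-parameters such as IOpcPart**
    T* operator->() const { return ptr_; }
    T* Get() const { return ptr_; }

private:
    T* ptr_;
};

// Fake interface used only to demonstrate the wrapper off Windows.
struct FakePart
{
    explicit FakePart(int* releaseCount) : releaseCount_(releaseCount) {}
    void Release() { ++*releaseCount_; }
    int* releaseCount_;
};

// Stand-in for a function with a COM-style out-parameter.
void GetFakePart(FakePart* storage, FakePart** out) { *out = storage; }

// Returns how many times Release() ran after the wrapper left scope.
int DemoReleaseCount()
{
    int count = 0;
    FakePart part(&count);
    {
        ScopedRelease<FakePart> holder;
        GetFakePart(&part, holder.Receive());
    }  // holder's destructor calls Release() here
    return count;
}
```

With a wrapper like this, early returns and failed steps no longer need their own cleanup blocks, which also removes the risk of calling Release on a pointer that was never set.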

6 comments:

  1. Thanks for the sample code. I had to copy what was on screen as the download link doesn't work any more. I got loads of compilation errors. In particular, at line 337 there's code that looks like this (which makes no sense):

    for (int i = 0; i < item =" NULL;" hr =" commentsNodeList-">get_item(i, &item);
    SUCCEEDED(hr) ? 0 : throw hr;

    ::GetAttributesOfCommentNode(item);
    ::GetTextofCommentNode(item);
    }

    But I think it should look like this instead:

    for (int i = 0; i < nodeListLength; ++i)
    {
    IXMLDOMNode *item = NULL;
    hr = commentsNodeList -> get_item(i, &item);
    SUCCEEDED(hr) ? 0 : throw hr;

    ::GetAttributesOfCommentNode(item);
    ::GetTextofCommentNode(item);
    }

    What's happened is that when the correct code was pasted in everything between matching pairs of < and > characters was treated as html, and got removed. Does that sound likely?

  2. You are right, something messed up the code. I found the original, put it into Visual Studio, and with a few changes have no syntax errors. Now just trying to work through a build issue - we'll update this post soon.

  3. See update added at start of post.

  4. Is issue 3 still a problem? I got the original code to build and run in VS2017 - after I'd fixed the escaping issue - I can supply a source code zip if you like...

  5. VS2017 and Windows 10? I had problems building that stumped me; I'm missing something. You can send the zip to travelmarx at live dot com. Thanks!
