Monday, February 28, 2011

Indian Meal Moth – Plodia interpunctella

Illustration of Plodia interpunctella - Indian Meal Moth
Illustration of Plodia interpunctella from Agricultural Entomology for Students, Farmers, Fruit-Growers and Gardeners by Herbert Osborn, published in 1916.

Travelmarx headquarters is under a relentless siege from the Indian Meal Moth (Plodia interpunctella), also called the Pantry Moth, Flour Moth, or North American High-Flyer. Every morning we find one or two of the moths in our kitchen. In January we discovered some cornmeal contaminated with larvae, cleaned everything up, and threw out the contaminated items. When we first saw the larvae we thought they were weevils, but now that we have seen the moths we know they were Plodia interpunctella. We hope we are now seeing the last of these pests, but another sweep through all our food stores is in order.

The Encyclopedia of Life lists the conservation status of this insect as “of no concern”, so we don’t feel so bad using the electric bug zapper racket to usher them into their next life and break the cycle of infestation. One characteristic of interpunctella is that when you touch them they drop straight down and stay motionless for a few moments, making them easy to catch.

The genus Plodia was first described in 1845 by Achille Guenée, a French lawyer and entomologist. The species interpunctella was first described in 1813 by Jacob Hübner, a German entomologist. But what do the names mean? The best we could come up with is the following: the genus name, Plodia, is without etymology according to An accentuated list of the British Lepidoptera, 1858 (p. 58) and interpunctella means “well-divided, pointed”. What part of the larva or moth interpunctella applies to is not specified.

Plodia interpunctella, Moth - Alive

Plodia interpunctella, Moth - Dead


Wednesday, February 23, 2011

Getting Comments from a Microsoft Word File: Leveraging the OPC Format

Example of Extracting Comments From a Microsoft Word Document


Overview


This post shows one way to grab all the comments from a Microsoft Word document and display them as HTML. We last tested this in August 2021 with the latest version of Microsoft Word.

The method shown here leverages the fact that, starting with Word 2007, documents are zipped packages of XML files and associated resources that can be "cracked" open simply by changing the extension from .docx to .zip. This capability is part of the Office Open XML standard, ECMA-376. In other words, a Word document is really a ZIP package containing a virtual directory structure of XML files and resources (like images) that make up the document. The package concept as described on MSDN is analogous to a filing cabinet. There is a good tutorial on the open format on this Office training page. The ZIP-like behavior is not limited to the Word format we are dealing with here. It also applies to Excel spreadsheets (.xlsx), PowerPoint presentations (.pptx) and XPS documents (.xps).

We thought this was kind of interesting when we first learned about it and looked for a way to exploit the format. What we came up with is a scenario where you want to get the comments out of a document, which turns out to be a common need. With that in mind, let's begin.

The steps


Suppose you have the document "Software Spec.docx" whose comments contain text and, at most, hyperlinks, and you want to extract all the comments and hyperlinks. First we'll need to get at the comments, which are stored as an XML file inside the package. To do this, add the extension .zip to the file name and then unzip the file so that you end up with a directory looking like this:

OPC - Word Comments Screen4

If you go into the unzipped folder you are at the top level folder:

OPC - Word Comments Screen1

If you go into the word directory, you should see something that looks like the following image:

OPC - Word Comments Screen2

This contains the file comments.xml that has the comments in it. But we are also considering that the comments have hyperlinks in them, so we need to go even further and go into the _rels folder. 

OPC - Word Comments Screen3

In the _rels folder there is a comments.xml.rels file that contains the hyperlinks that are used in the comments. Together, comments.xml and comments.xml.rels can be used to get what we want. (If you don't have a comments.xml.rels file, it means your comments don't have any hyperlinks in them. See step 2 below for more information.)
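If you would rather not rename and unzip the document by hand, the two files can also be read straight out of the package. The following is only a minimal sketch using Python's standard zipfile module; the function name read_comment_parts is our own invention for illustration, and the part names are the ones shown in the folder listings above.

```python
import zipfile

# Part names as they appear inside the unzipped package.
COMMENTS_PART = "word/comments.xml"
RELS_PART = "word/_rels/comments.xml.rels"


def read_comment_parts(docx):
    """Return (comments_xml, rels_xml) as bytes.

    `docx` may be a file path or a binary file-like object.
    rels_xml is None when the package has no comments.xml.rels,
    which is expected when the comments contain no hyperlinks.
    """
    with zipfile.ZipFile(docx) as package:
        names = set(package.namelist())
        if COMMENTS_PART not in names:
            raise KeyError(f"{COMMENTS_PART} not found in the package")
        comments_xml = package.read(COMMENTS_PART)
        rels_xml = package.read(RELS_PART) if RELS_PART in names else None
    return comments_xml, rels_xml


# Hypothetical usage with the example document from this post:
# comments_xml, rels_xml = read_comment_parts("Software Spec.docx")
```

No renaming is needed here because the ZIP reader looks at the file contents and ignores the extension.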

To get the comments out we'll use an XSL transform on the two XML comment files to transform XML to HTML. So the basic strategy is this:

1. Take the comments.xml as is and place it in a directory (we'll call it the transform directory) where we'll do the transformation. A different transform directory than the unzipped folder is best to avoid confusion.

2. In the transform directory, also put the comments.xml.rels file.

If you don't have a comments.xml.rels file, it means the comments in your Word doc don't have any hyperlinks. If this is the case, create an empty comments.xml.rels file. Technically, this isn't really necessary: you can leave out the file and the transform will fail gracefully.

3. In the transform directory, create a transform.xslt file and put the content shown below in the code section in it. 

4. In the transform directory, create a stylesheet.css file and put the content shown below in it. This is optional, but makes the output look nicer. The transform uses speech bubbles to display comments, an idea that comes from http://nicolasgallagher.com/pure-css-speech-bubbles/.

5. Finally, in the comments.xml file, add the line that references the transform.xslt file. This additional line will be the second line in your file. (See full file example in code section below.)

<?xml-stylesheet type="text/xsl" href="transform.xslt"?>

6. See the comments with transformation and stylesheet applied. Once upon a time, you could just open the comments.xml file in a browser and the comments would be transformed. Over time, security made this trickier, and rightly so. If you are a developer, you probably have your favorite way to transform the XML to HTML, like using Visual Studio. The easiest way we found to do this is as follows:

a. Be sure all Edge browser windows are closed and start a new instance with:

Windows Key + R
msedge --allow-file-access-from-files

b. Or, if you work in Chrome, close all Chrome browser windows and start a new instance with:

Windows Key + R
chrome --allow-file-access-from-files

c. Drag the comments.xml into the new browser window and the browser should transform the XML for you.

d. Close the browser instance you started when you are done. Because it runs with relaxed file-access security, you should only use it for this test.

Run dialog box opening Edge to allow file access from files
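Step 5 above (adding the stylesheet reference as the second line of comments.xml) can also be scripted. Here is a small sketch in Python; the function name add_stylesheet_reference and the commented-out file location are hypothetical, and the sketch assumes the file starts with the usual XML declaration, as the comments.xml files written by Word that we looked at do.

```python
STYLESHEET_PI = '<?xml-stylesheet type="text/xsl" href="transform.xslt"?>'


def add_stylesheet_reference(xml_text, pi=STYLESHEET_PI):
    """Insert the xml-stylesheet instruction right after the XML declaration."""
    if "<?xml-stylesheet" in xml_text:
        return xml_text  # already references a stylesheet; leave it alone
    if xml_text.startswith("<?xml "):
        end = xml_text.index("?>") + 2  # end of the XML declaration
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    # No XML declaration: the instruction simply goes first.
    return pi + "\n" + xml_text


# Hypothetical usage, assuming comments.xml sits in a folder named "transform":
# from pathlib import Path
# path = Path("transform") / "comments.xml"
# path.write_text(add_stylesheet_reference(path.read_text(encoding="utf-8")),
#                 encoding="utf-8")
```

Running it twice is harmless, since a file that already has a stylesheet reference is returned unchanged.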


Extending the transform


Above we specified that we were dealing with comments with hyperlinks. But, more generally, comments in a Word file can have images, SmartArt and a lot more. To make the comment extraction method given here more robust you would need to modify the XSLT to take all this into account. For example, if you inserted a SmartArt Graphic into a comment, the word\comments.xml file would reference a relationship in the word\_rels\comments.xml.rels file which would reference word\diagrams\data1.xml (for example) that might in turn reference another file \word\diagrams\drawing1.xml (for example). The point is, it can get quite complicated and all the paths need to be followed to reconstruct the comments as they appear in the document.
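To give a feel for what "following the paths" involves, here is a rough sketch of the two naming rules the package relies on, as we understand the Open Packaging Conventions: the relationships for a part live in a _rels folder next to it, and an internal Target is resolved relative to the folder of the part that owns the relationship. The helper names rels_part_for and resolve_target are ours, and the media file name in the usage lines is made up. Relationships marked TargetMode="External" (hyperlinks, for example) point outside the package and should not be resolved this way.

```python
import posixpath


def rels_part_for(part_name):
    """Return the relationships part for `part_name`.

    Relationships of "dir/name" are stored in "dir/_rels/name.rels".
    """
    directory, filename = posixpath.split(part_name)
    return posixpath.join(directory, "_rels", filename + ".rels")


def resolve_target(source_part, target):
    """Resolve an internal relationship Target to a part name in the package."""
    if target.startswith("/"):
        # A leading slash means the target is given from the package root.
        return posixpath.normpath(target.lstrip("/"))
    base = posixpath.dirname(source_part)
    return posixpath.normpath(posixpath.join(base, target))


# Illustrative calls using the part names mentioned above:
# rels_part_for("word/comments.xml")
# resolve_target("word/comments.xml", "diagrams/data1.xml")
# resolve_target("word/diagrams/data1.xml", "../media/image1.png")
```

A more robust extractor would apply these two functions repeatedly, starting from word/comments.xml, until every referenced part has been visited.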

Note that we have to deal with two XML files, comments.xml and comments.xml.rels, with one transform. We do this by using the XSLT document function. Notice that in transform.xslt there is this line:

<xsl:variable name="rels" select="document('comments.xml.rels')"/>
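For readers who prefer to see the same lookup outside of XSLT, here is a rough Python sketch that joins the two files the way the transform does: each hyperlink's r:id is looked up in the relationships file to find its Target. This is not the transform itself, only an approximation using the standard library; the function names are ours, and unlike the XSLT it only collects text from w:t elements.

```python
import xml.etree.ElementTree as ET
from html import escape

# Namespaces used by comments.xml and comments.xml.rels.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG = "http://schemas.openxmlformats.org/package/2006/relationships"


def load_targets(rels_xml):
    """Map relationship Id -> Target; no rels content gives an empty map."""
    if not rels_xml:
        return {}
    root = ET.fromstring(rels_xml)
    return {rel.get("Id"): rel.get("Target")
            for rel in root.iter(f"{{{PKG}}}Relationship")}


def run_text(element):
    """Join the text of all w:t descendants of a run or hyperlink."""
    return "".join(t.text or "" for t in element.iter(f"{{{W}}}t"))


def comments_to_html(comments_xml, rels_xml=None):
    """Render each comment as an author/date line followed by its paragraphs."""
    targets = load_targets(rels_xml)
    parts = []
    for comment in ET.fromstring(comments_xml).iter(f"{{{W}}}comment"):
        author = escape(comment.get(f"{{{W}}}author", ""))
        date = escape(comment.get(f"{{{W}}}date", ""))
        parts.append(f"<div><i>{author}</i> on <i>{date}</i> said</div>")
        for para in comment.findall(f"{{{W}}}p"):
            pieces = []
            for child in para:
                if child.tag == f"{{{W}}}hyperlink":
                    # Same lookup as $rels//*[@Id=$cId]/@Target in the XSLT.
                    href = targets.get(child.get(f"{{{R}}}id"), "")
                    pieces.append(
                        f'<a href="{escape(href)}">{escape(run_text(child))}</a>')
                elif child.tag == f"{{{W}}}r":
                    pieces.append(escape(run_text(child)))
            parts.append("<p>" + "".join(pieces) + "</p>")
    return "\n".join(parts)
```

The function takes the raw contents of comments.xml and, optionally, comments.xml.rels. When the second argument is left out, hyperlinks are still rendered but with an empty href.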

Full code examples


Example comments.xml file (a snippet):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet type="text/xsl" href="transform.xslt"?>
<w:comments xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
           xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
           xmlns:o="urn:schemas-microsoft-com:office:office"
           xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
           xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
           xmlns:v="urn:schemas-microsoft-com:vml"
           xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing"
           xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
           xmlns:w10="urn:schemas-microsoft-com:office:word"
           xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
           xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
           xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
           xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk"
           xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
           xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 wp14">
   <w:comment w:id="8" w:author="Arthur Smash" w:date="2011-02-21T13:10:00Z" w:initials="AS">
       <w:p w14:paraId="522F5EC4" w14:textId="77777777" w:rsidR="00C60769" w:rsidRDefault="00C60769">
           <w:pPr>
               <w:pStyle w:val="CommentText"/>
           </w:pPr>
           <w:r>
               <w:rPr>
                   <w:rStyle w:val="CommentReference"/>
               </w:rPr>
               <w:annotationRef/>
           </w:r>
           <w:r>
               <w:t>We need to add more to this section so readers know what is going on here.</w:t>
           </w:r>
       </w:p>
       <w:p w14:paraId="5E190A2F" w14:textId="77777777" w:rsidR="00C60769" w:rsidRDefault="00C60769">
           <w:pPr>
               <w:pStyle w:val="CommentText"/>
           </w:pPr>
       </w:p>
       <w:p w14:paraId="632FBBEE" w14:textId="01A8A8C9" w:rsidR="00C60769" w:rsidRDefault="00C60769">
           <w:pPr>
               <w:pStyle w:val="CommentText"/>
           </w:pPr>
           <w:r>
               <w:t xml:space="preserve"> For background reading see the one page spec I put together last week.</w:t>
           </w:r>
       </w:p>
   </w:comment>
</w:comments>

The transform.xslt file (downloadable version is here):

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
               xmlns:msxsl="urn:schemas-microsoft-com:xslt"
               xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
               xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
               exclude-result-prefixes="msxsl">
   <xsl:variable name="rels" select="document('comments.xml.rels')"/>
   <xsl:output method="html" doctype-public="html"/>
   <xsl:template match="w:comments">
       <html lang="en">
           <head>
               <title>Comments from Word</title>
               <link href="stylesheet.css" rel="stylesheet" type="text/css" />
           </head>
           <body>
               <xsl:for-each select="w:comment">
                   <div>
                       <i>
                           <xsl:value-of select="@w:author"/>
                       </i> on
                       <i>
                           <xsl:value-of select="@w:date"/>
                       </i> said
                   </div>
                   <div class="triangle-isosceles top">
                       <xsl:for-each select="w:p">
                           <p>
                               <xsl:for-each select="child::node()">
                                   <xsl:choose>
                                       <xsl:when test="self::w:hyperlink">
                                           <xsl:variable name="cId" select="@r:id"/>
                                           <xsl:variable name="link" select="$rels//*[@Id=$cId]/@Target"/>
                                           <a href="{$link}">
                                               <xsl:value-of select="./w:r"/>
                                           </a>
                                       </xsl:when>
                                       <xsl:when test="self::w:r">
                                           <xsl:value-of select="."/>
                                       </xsl:when>
                                   </xsl:choose>
                               </xsl:for-each>
                           </p>
                       </xsl:for-each>
                   </div>
                   <xsl:if test="position() != last()">
                   </xsl:if>
               </xsl:for-each>
           </body>
       </html>
   </xsl:template>
</xsl:stylesheet>


The stylesheet.css file (optional, downloadable version is here):

body
{
   font-family: Verdana;
}

.triangle-isosceles
{
   position: relative;
   padding: 15px;
   margin: 1em 0 3em;
   color: #000;
   background: #f3961c; /* default background for browsers without gradient support */ /* css3 */
   -moz-border-radius: 10px;
   -webkit-border-radius: 10px;
   border-radius: 10px; /* NOTE: webkit gradient implementation is not as per spec */
   background: -webkit-gradient(linear, left top, left bottom, from(#f9d835), to(#f3961c));
   background: -moz-linear-gradient(top, #f9d835, #f3961c);
   background: -o-linear-gradient(top, #f9d835, #f3961c);
}
.triangle-isosceles.top
{
   /* NOTE: webkit gradient implementation is not as per spec */
   background: -webkit-gradient(linear, left top, left bottom, from(#f3961c), to(#f9d835));
   background: -moz-linear-gradient(top, #f3961c, #f9d835);
   background: -o-linear-gradient(top, #f3961c, #f9d835);
}
.triangle-isosceles:after
{
   content: "";
   display: block; /* reduce the damage in FF3.0 */
   position: absolute;
   bottom: -15px; /* value = - border-top-width - border-bottom-width */
   left: 50px; /* controls horizontal position */
   width: 0;
   height: 0;
   border-width: 15px 15px 0; /* vary these values to change the angle of the vertex */
   border-style: solid;
   border-color: #f3961c transparent;
}

.triangle-isosceles.top:after
{
   top: -15px; /* value = - border-top-width - border-bottom-width */
   left: 50px; /* controls horizontal position */
   bottom: auto;
   border-width: 0 15px 15px; /* vary these values to change the angle of the vertex */
   border-color: #f3961c transparent;
}


A subsequent post describes a C++ program to get comments.

Saturday, February 12, 2011

Tête-à-tête + Picturing a Passion at the Frye Art Museum

Salon-style presentation, Tête-à-tête Exhibit at the Frye Museum

We have Seattle patrons of the arts Charles Frye (1858 – 1940) and his wife Emma Frye ( – 1934) to thank for today’s Frye Art Museum, which was built in 1952 to house their art collection. The Fryes made their fortune in the meat-packing business, which boomed during the Klondike Gold Rush.

From their first painting purchased in 1893 to the death of Emma, the couple collected more than 230 works. Most of the work is from the Munich School, reflecting their German background and interest in European realism. One hundred fifty of those paintings are on display at the Frye from February 6, 2010 – January 15, 2012. 

In the exhibit Tête-à-tête, the curators have recreated the “sumptuous viewing experience enjoyed by visitors to the art gallery in Charles and Emma Frye’s Seattle home in the first decades of the twentieth century.” The paintings are hung salon-style, in a dense mass of paintings. Thematically, they all mesh (to our modern, distant eye) and feature either artists from the Munich Secession or artists from the preceding Artists’ Association, the Munich Künstlergenossenschaft. 

In the adjoining exhibit, Picturing a Passion, you get a little background on the Fryes and a view of what their collection looked like through historical photographs of their art collection as it was presented in their home and offices. What we didn’t know was that in 1943, a B-29 bomber crashed into the Frye headquarters. Thirty-two people and a number of animals were killed. In addition, art work displayed at the headquarters and art records were lost. For more information on the history of this free(!) museum, see HistoryLink.org.

Charles and Emma Frye portraits at the Frye Art Museum

Friday, February 4, 2011

Getting Our Italian Passports

The Passports
The Passports as Photographed at Cafe Zuni

Today we flew to San Francisco to apply for our Italian passports. (A previous post details the quest for citizenship, which we happily have!) Our appointments were for the late morning so we had plenty of time to get an early morning flight from Seattle, take the BART from the airport to the Civic Center stop, and walk the few miles to the Italian Consulate at 2590 Webster St.

We arrive at the consulate, hand over our paperwork at the sportello, and prepare to wait. Note: if you bring cash to pay the passport fee, bring exact change. We didn’t think to and were 20 cents short. (The cost of each passport this day was $109.10.) Our thinking was, here’s a one dollar bill, keep the change. But that does not work here. We end up borrowing 20 cents from someone in the waiting room.

A few minutes go by and the small waiting room starts filling up with people with all sorts of requests: someone collecting a pension, someone needing to have a legal document reviewed, someone with general questions, and others like us getting a passport. It starts to get a little tense. A form not filled out correctly by the pensioner starts some grumbling among those waiting. Then the person who handles passports comes out with his date book and informs us that we are there one month too early! I guess in our excitement we heard February 4th and not March 4th. Oops. He is very accommodating since we travelled to get there and asks us if we could come back in one hour, which is fine by us. So off we go for a coffee and some hand-wringing.

We arrive back at the consulate later and the waiting room has cleared out and we are the only ones there. By the way, did we have the 20 cents? Somehow in the time we were gone, the 20 cents disappeared. Cue a quick trip several blocks away to buy a pack of gum and get 50 cents in dimes. Once inside the consulate (finally past the sportello!) everything goes smoothly and it takes about 25-30 minutes per person to process the passport information and create the passport on the spot! We walked out of the consulate with the passports!

We have to say that the folks there were great: from the staff working the sportello, to an intern who answered questions patiently, to the man who processed our passports. They made our day.

The Italian Consulate, San Francisco (entrance is on the side to the right)
Italian Consulate San Francisco


View from Coit Tower
San Francisco Skyline from Coit Tower

Thursday, February 3, 2011

A Do-It-Yourself (DIY) Sonos Controller (Web Page) on iPad

iPad Web Page Displaying Sonos Content


(The code on this page was last checked and verified in June 2014.)

Warning: this is not an elegant solution. However, it may interest those curious about Sonos and ways to get more interesting information displays about what Sonos is playing. Start from the post, A Simple Sonos JavaScript and Java Application. Use the HTML page (with JavaScript code) that is given in that post or here on GitHub, exactly as is.

The trick to this whole process is to host the page (with the JavaScript functionality that does all the magic) locally on the iPad. The way I did that was to put the page in Dropbox and then use the Atomic Web Browser to save the page locally. When the page is rendered locally, the URL looks like this (in Atomic): “file:///var/mobile/Applications....”.

The following screenshots were taken on an iPad (to take one, press the Home button and power button simultaneously).

1. Open up the Atomic Web Browser.
Navigate to www.dropbox.com and sign in. Make sure you have uploaded the file (WhatSonosIsPlaying.html) to Dropbox.



2. Select file in Dropbox and Get the File Locally


3. Select the Local File in Atomic Web Browser

The page should show metadata about what's playing. Clicking the Google or Bing search buttons should open the search in a new window.