Taoffi's blog

prisonniers du temps

xsl witness!

Transforming xml content through xsl stylesheets is a useful and relatively common feature in the development process. I talked about this in the previous post about OneNote pages html preview.
Searching in my personal code toolbox, I just found this ‘iXslWitness’, a tool I wrote a couple of years ago to check the effectiveness of a stylesheet in transforming xml to html. Its usage is quite simple: you select an xml file and the xsl stylesheet to use. And you get the html transformed content.
I hope that can be useful for anyone involved in such tasks!
A screenshot of transforming a OneNote page xml content (a list of ‘The World If’ publications of The Economist newspaper):

worldif2017-witness

You can download the tool Here.
The source code is Here.

OneNote page xsl transformation

As we saw in a previous post, OneNote API is xml-based. You call the OneNote Application object to obtain almost all needed information as xml strings.

One of that information is the page content. Whose schema is defined as explained in the previous post.

Once you get the page content’s xml string, it is a little bit of work to transform that into a useful html page

To do this, I used an xslt style sheet which is the subject of the current post.

How does it work?

Assume you have an xsl sheet string and the xml string of a page. You can then process both in a way similar to the following code:

 

using System.Xml.Xsl;

 

public static string XmlToHtml(string xmlString, string xslString)
{
    string            html;
    XslCompiledTransform    transform  = null;
    XmlReader         xslReader        = null,
                      xmlReader        = null;
    MemoryStream      memStream        = null;
    StreamReader      sr               = null;
    MemoryStream      xslStream        = null,
                      xmlStream        = null;
    byte[]            xslBytes         = Encoding.UTF8.GetBytes( xslString),
                      xmlBytes         = Encoding.UTF8.GetBytes( xmlString);

    xslStream    = new MemoryStream( xslBytes);
    xmlStream    = new MemoryStream( xmlBytes);

    transform    = new XslCompiledTransform();
    xslReader    = XmlReader.Create( xslStream);
    xmlReader    = XmlReader.Create( xmlStream);
    memStream    = new MemoryStream();

    transform.Load( xslReader);
    transform.Transform( xmlReader, null, memStream);

    memStream.Position    = 0;

    sr      = new StreamReader( memStream);
    html    = sr.ReadToEnd();
               
    xslStream.Close();
    xmlStream.Close();
    memStream.Close();
    sr.Close();
    xslReader.Close();

    return html;
}

 

 

 

How to get the page xml content?

Assuming you have the page ID, here is a sample code to query the page content through OneNote API:

 

public static string GetPageContentXmlString(string pageId)
{
    var         onenoteApp  = new Application();
    string      pageXml     = null;
    PageInfo    pageInfo    = PageInfo.piAll;

    onenoteApp.GetPageContent(pageId, out pageXml, pageInfo);
    return pageXml;
}

 

Reminder: The page content xsd schema

 

The xsl style sheet general structure

Let us use the iXml explorer to navigate through the xsl stylesheet used to transform the page’s xml into html.

(I searched for ‘match’ to locate templates defined in the stylesheet).

As you may notice, we have an xsl template to handle each OneNote page defined xsd type:

  • A template to process the page root information
  • A template to process the page’s title
  • A template to process outline elements
  • A template to process OEChildren collection
  • … and so forth

Sample xsl code

Process OneNote Element (one:OE)

  <!--
  ************************************************
  one:OE
  ************************************************
  -->
  <xsl:template match="one:OE">
    <xsl:param name="nest_level" select="0" />

    <xsl:variable name="listNode" select="./one:List" />
    <xsl:variable name="quickStyleIndex" select="./@quickStyleIndex" />
    <xsl:variable name ="styleNode" select="msxsl:node-set($quickStyleList)/quickStyle[@index=$quickStyleIndex]" />

    <xsl:variable name="quickStyle">
      <xsl:choose>
        <xsl:when test="$styleNode">
          <xsl:value-of select="$styleNode/@style"/>
        </xsl:when>
      </xsl:choose>
    </xsl:variable>

    <!-- is there any list here? -->
    <xsl:choose>
      <xsl:when test="$listNode">
        <xsl:variable name="number"    select="$listNode/one:Number" />
        <xsl:variable name="txt"      select="./one:T" />
        <xsl:variable name="listItemTag">
          <xsl:text>li</xsl:text>
        </xsl:variable>

        <!-- list tag: either <ol> or <ul> -->
        <xsl:variable name="listTag">
          <xsl:choose>
            <xsl:when test="$number">
              <xsl:text>ol</xsl:text>
            </xsl:when>
            <xsl:otherwise>
              <xsl:text>ul</xsl:text>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:variable>

        <xsl:choose>
          <!-- numbered list? output <ol> -->
          <xsl:when test="$number">
            <xsl:variable name="txtNum"        select="number($number/@text)" />
            <xsl:variable name="fontNum"      select="$number/@font" />
            <xsl:variable name="tag" select="concat('<', $listTag, '>')" />
            <xsl:value-of disable-output-escaping="yes" select="$tag"/>

            <!-- *********** output <li value="xx"> ' style="font-family:', $fontNum, ';"',************* -->
            <xsl:value-of disable-output-escaping="yes" select="concat('<', $listItemTag, ' value=', '"', $txtNum, '">')" />

            <xsl:call-template name="outputListItem">
              <xsl:with-param name="listNode" select="$listNode" />
              <xsl:with-param name="itemText" select="$txt" />
            </xsl:call-template>
          </xsl:when>

          <!-- bullet ? output <ul> -->
          <xsl:otherwise>
            <xsl:variable name="tag" select="concat('<', $listTag, '>')" />
            <xsl:value-of disable-output-escaping="yes" select="$tag"/>
            <!-- *********** output <li> ************* -->
            <xsl:value-of disable-output-escaping="yes" select="concat('<', $listItemTag,'>')" />

            <xsl:call-template name="outputListItem">
              <xsl:with-param name="listNode" select="$listNode" />
              <xsl:with-param name="itemText" select="$txt" />
            </xsl:call-template>
          </xsl:otherwise>

        </xsl:choose>

        <!-- process list's sub items -->
        <xsl:apply-templates select="./one:OEChildren">
          <xsl:with-param name="nest_level" select="1 + $nest_level" />
        </xsl:apply-templates>

        <xsl:apply-templates select="./one:Table" />

        <!-- close the list item tag -->
        <xsl:value-of disable-output-escaping="yes" select="concat('</', $listItemTag,'>')" />
        <!-- close the list tag -->
        <xsl:value-of disable-output-escaping="yes" select="concat('</', $listTag, '>')"/>
      </xsl:when>

      <!-- no list: process all sub items -->
      <xsl:otherwise>
        <xsl:variable name="style0" >
          <xsl:call-template name="string-replace-all">
            <xsl:with-param name="text" select="./@style" />
            <xsl:with-param name="replace" select="''" />
            <xsl:with-param name="by" select="''" />
          </xsl:call-template>
        </xsl:variable>

        <xsl:variable name="alignment" select="./@alignment" />

        <xsl:variable name="style">
          <xsl:choose>
            <xsl:when test="$alignment='right'">
              <xsl:value-of select="concat('width:100%; float:right; display:inline; text-align:right;', $style0)"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="$style0"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:variable>

        <xsl:choose>
          <xsl:when test="string-length($style)>0">
            <span style="{$style}">
              <xsl:apply-templates />
            </span>
          </xsl:when>

          <!-- ****** no style defined -->
          <xsl:otherwise>
            <xsl:choose>
              <xsl:when test="$quickStyle">
                <span style="{$quickStyle}">
                  <xsl:apply-templates>
                    <xsl:with-param name="nest_level" select="1 + $nest_level" />
                  </xsl:apply-templates>
                </span>
              </xsl:when>

              <xsl:otherwise>
                <span>
                  <xsl:apply-templates>
                    <xsl:with-param name="nest_level" select="1 + $nest_level" />
                  </xsl:apply-templates>
                </span>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:otherwise>
          <!-- ****** end of : no style defined -->
        </xsl:choose>

      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

Output a list (either numbered or bulleted)

<!--
  ************************************************
  output list item (numbered / bulleted)
  ************************************************
  -->
  <xsl:template name="outputListItem" match="one:List">
    <xsl:param name="listNode" />
    <xsl:param name="itemText" />

    <xsl:variable name="number"        select="$listNode/one:Number" />
    <xsl:variable name="bullet"        select="$listNode/one:Bullet" />
    <xsl:variable name="txtNum"        select="number($number/@text)" />

    <xsl:choose>
      <xsl:when test="$number">
        <xsl:value-of select="normalize-space($itemText)" disable-output-escaping="yes"/>
      </xsl:when>

      <xsl:otherwise>
        <xsl:value-of select="normalize-space($itemText)" disable-output-escaping="yes"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

 

Process page images

In the page’s xml, images are returned as base64 strings.

To process an image into html, the xsl template looks like the following:

  <!--
  ************************************************
  one:Image
  ************************************************
  -->
  <xsl:template match="one:Image">
    <xsl:variable name="imgWidth"select="substring-before( number(./one:Size/@width) * 1.33, '.')"/>
    <xsl:variable name="imgHeight" select="substring-before( number(./one:Size/@height) * 1.33, '.')"/>
    <xsl:variable name="imgData"    select="./one:Data" />
    <xsl:variable name="oneFormat"  select="./@format" />
    <xsl:variable name="htmlformat">
      <xsl:choose>
        <xsl:when test="$oneFormat='png'">
          <xsl:text>data:image/png;base64</xsl:text>
        </xsl:when>
        <xsl:otherwise>
          <xsl:text>data:image/jpg;base64</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <xsl:variable name="htmlWidth">
      <xsl:choose>
        <xsl:when test="$imgWidth">
          <xsl:value-of select="$imgWidth"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:text>96%</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <xsl:variable name="htmlHeight">
      <xsl:choose>
        <xsl:when test="$imgHeight">
          <xsl:value-of select="$imgHeight"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:text>auto</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>


    <img width="{$htmlWidth}" height="{$htmlHeight}" src="{$htmlformat}, {$imgData}"/>
  </xsl:template>

 

Download the xsl stylesheet

You may download the entire xsl stylesheet (for OneNote 2013 and 2016) here

Yet another [rather simple] OpenXML package explorer!

As its name implies, OpenXML is XML-centric technology (i.e. which extensively uses XML to describe the meta-model and contents of each and every element of a document).

OpenXML documents themselves (MS Word, Excel, PowerPoint…) are compressed packages that include numerous xml files which describe the document's meta-models and contents.

To explore these items, you can simply rename the document file to .zip and then navigate through this zip archive to view the elements that may be of interest.

After proceeding this way, you of course must not forget to rename the file back to its original extension (.docx, .xlsx… or whatever). When working on an OpenXML project, you obviously have to do this so many times a day, which becomes a somehow daunting task!

Some very good tools exist to explore OpenXML documents' structures… but most of them expose a logical view of the document rather than an xml-centric view. Which is of course very helpful… but does not apply when you just need to see the physical structure and the xml content of meta-models and elements.

Moreover, most of these tools are somehow out dated (2007-2009…) and many simply crash on most of our 'modern era' OSs!

Add some other good reasons to write another explorer:

It is august, it is raining and I don't know what to do this week end (while listening to Maria Joao Pires playing Mozart Sonatas!)

 

The purpose and tools?

What I need is:

  • Open an OpenXML document as a zip archive (without having to rename it to .zip!)
  • Explore the content of the archive (folders / files)
  • Explore / search the xml tree of xml files that may be found there

 

Adding a reference to System.IO.Compression and System.IO.Compression.FileSystem, you can (relatively easily) present the archive tree structure like in the following image:

I created a view model layer to map my needs to the zip archive objects of the System.IO.Compression namespace:

iZipEntryVM: a view model for zip archive entry which offers some useful properties related to my purpose.

iZipEntryVmList: a collection of the above object which offers static methods to create the tree of a zip archive file as well as a 'selected item' property and selection change event.

iOfficeVM: (singleton) entry point ('main view model' in a way) that orchestrates the basic commands and selection change events.

 

XML property bags explorer, again, to the rescue!

I wrote in a previous post about using PropertyBags to explore xml nodes.

The xml explorer objects, methods and controls written during this research are now packaged into an independent library that can be referenced in the current project.

What I have to do is just wire the selection event of an item in the archive tree to the xml explorer objects.

 

That is, when I click an xml file on the zip archive tree, the selection change event is fired. If the selected item is an xml content, it is opened and its stream is sent to the xml explorer object to display the node tree.

 

In practice:

  • On the TreeView control's selection changed event, the SelectedItem of the view model collection (iZipEntryVmList) is set.
  • The collection fires its own Selection changed event.
  • iOfficeVM subscribes to that event and send the entry stream to the xml explorer:

 

_myArchiveList.SelectedItemChanged += _list_SelectedItemChanged

private async void _list_SelectedItemChanged(object sender, iZipEntryVM selectedItem)
{
    ZipArchiveEntry    entry    = selectedItem.ZipEntry;

     if(entry == null)
       return;
 
    if(selectedItem.IsXmlFile)
        await Task.Run(() => OpenXmlFile(selectedItem));
}


 

async void OpenXmlFile(iZipEntryVM entryVm)
{
    ZipArchiveEntry    zentry    = entryVm.ZipEntry;
    var        bagVm    = XmlExplorer.Instance.PropertyBagVM;

    if(bagVm != null)
    {
        Stream            stream;
        await Task.Run(() =>
        {
            stream        = zentry.Open();
            // this will build the xml nodes' property bag tree
            bagVm.SourceStream = stream;
        });
    }
}



 

Apart from presenting the xml nodes' tree, XML explorer offers some other useful features:

  • Search for content values and/or node names
  • View / copy the xml string of a selected node
  • Export (save) the current OpenXML item to a file.

 

And, not less important, I no more have to rename a file to do that…!

Some screenshots

Explorer General View

Sample search results

 

Binaries and source code

You may download the binaries here

You may also download the source code here.