Taoffi's blog

prisonniers du temps

covid-5ync – the fight against the crowned virus!

I continue posting about covid-5ync app (a helper app contribution in the fight against the latest coronavirus epidemic)

A major step now done: serializing application's information into xml files!

Hopefully this will allow transmitting research session work among members of our target community: biotechnology engineers.

I also downloaded (@NCBI site) a recent version of the virus RNA sequence (referred to as: MT163719 29903bp RNA linear VRL 10-MAR-2020). A daily work is done by the application to retrieve significant regions on the sequence. The updated version of the (xml) file is then uploaded to the project's web site which will allow gaining time in some tedious analysis tasks.

Serialization / deserialization

Reminder: Technically speaking, the sequence's data structures can be summarized as:

  • Sequence: (that is a collection of nodes, each referring to an item of collection of bases)
    • Identification and info of the sequence (Id, Name, Summary…)
    • A collection of named-regions (identified by start / end index + name and summary)
    • A collection of sub-sequences of 'repeats' occurrences
    • A collection of sub-sequences of hairpins (or 'pair-repeats') occurrences

 

To serialize these information… (in regards of the urgency matters) I decided to simply use a DataContractSerializer (which, as I said in other posts, is not really the best solution for extensibility).

The encountered difficulty was on several aspects:

  • Some information is redundant, notably for sub-sequences (collections of repeats and hairpins)
  • The sequence object being itself a collection of nodes, the serializer did not really allow an easy way to handle its sub-objects. Here we fall into a known problem of cycling!

 

The solution used was:

  • Transform sub-sequences to region lists (start/end indexes + name + occurrences) and deserialize back to sub-sequences
  • Create a serialization-specialized object (using DataContract / DataMember attributes) and process the serialization through that object to avoid the 'cycling' errors.

 

The second difficulty was errors encountered on deserialization while locating and assigning various regions start/end indexes to the sequence's nodes.

Actually, the deserialization submits the sequence's node as a string to be parsed asynchrony. And as you may have guessed, the collection of sequence's nodes was being altered while locating the regions' nodes!

The solution:

  • The Parse method raises Parse Complete event
  • Subscribe to that event and proceed to these operations.

 

That is the short story… will keep you posted about more details later this week!

KEEP SAFE: wash your hands + do not touch your face + keep hope: Humanity will prevail!

covid-5ync – dna search using Linq

This post, follows in the series about technical details of covid-5ync (an open source application contribution in the fight against the new covid-19 virus).

This time, our talk is about searching nucleotides within a dna sequence.

Search is an important task for analyzing a sequence. Actually, to understand the structure of a genome, one task is to locate and classify identical regions and complementary regions (complementary regions = where nucleotides are paired: aót, góc). From the IT point of view, such a task is obviously related to 'search'… and should particularly be optimized.

Using Linq

To locate a sequence of nucleotides (i.e. a string) within a (global) sequence, we may simply iterate into the global sequence nodes (each of which is a char) until we find the searched string. To find all occurrences, we can repeat the process starting at the next location and so on.

Despite the artisanal aspect of the process, that works well for searching a predefined string.

Another task that looks related to search, is to find 'repeats' (Repeats are identical regions on a sequence).

It is somehow different from searching a predefined string. The application actually has to iterate through the sequence, and at each position take a number of nucleotides to compose a string to be searched all over the sequence… (and keep coordinates of the found occurrences).

Proposed solution

In covid-5ync, a dna sequence is a List<dna node> where each node contains an Index (int). Using Linq, we can define a method that returns the string of a given length at a given location (node Index) of the sequence:

public string StringAtIndex(int index, int len)
{
    string str = "";
    var nodes = this.SkipWhile(i => i.Index < index).Take(len);

    foreach(var n in nodes)
        str += n.Code.ToString();

    return str;
}

 

With this method in place, we can efficiently locate all starting nodes of occurrences of a string on the sequence:

IEnumerable<iDnaNode> AllStartoccurrencesOfString(string str)
{
    int len    = str.Length;
    return this.Where(i => StringAtIndex(i.Index, len) == str);
}

 

Sample usage: locate and select occurrences of a string

// get all starting nodes of the string
Var allStarts = AllStartoccurrencesOfString(str).ToList();

// visit the found starting nodes and select (str.Length) consecutive nodes…
foreach( var item in allStarts)
{
    var subSeq = this.SkipWhile(n => n.Index <  item.Index).Take(str.Length);
    subSeq.SelectAllNodes();
}

 

covid-5ync – dna analysis helper application: progress!

A quick post to talk about the progress in this project with a few technical details.

First, a screen capture of what it looks like today (just a little better than before!):

 

The first version of the application is already online: click-once installation. And Yes, we don't have money to have a certificate for the deployment… install only if you trust!... any suggestions about sponsors are also greatly welcome!

 

A few features implemented:

  • Search for string and search for the 'complementary' of string.
  • Paging
  • Zoom-in / out on the sequence

Next steps:

  • Search enhancement: visually locate occurrences
  • Search for unique fragments according to user-provided settings
  • Open file / save selection

Long-term:

  • … endless

 

Now, let us take a quick dive into the mechanics

The first class diagram (updated source code with such documentation is @github)

That seems quite basic(!), actually, as the reality of DNA, and that is one reason it is also fascinating!

  • A sequence (iDnaSequence) is composed of 'nodes' (iDnaNode)
  • A node is noted as a char. It refers to a 'base'.
  • Commonly a set of 4 bases is used ('a', 't', 'g' and 'c'). There may be more, but we can easily handle this when needed.
  • Each individual base has a complementary 'Pair'. A is pairable with T, C with G
  • The 'standard' set of bases (iDnaBaseNucleotides) is a (singleton) list of the 4 bases. It sits as the main reference for nodes. It provides answers to important questions like: is 'c' a valid base?... what the complementary of 'X'?... what is the complementary sequence of the string "atgccca"? and so on.

 

Visual presentation: a start point

There are many ways to present a DNA sequence. To start with something, let us assign a color to each base. The user can later change this to obtain a view according to his or her work area. Technically speaking, we have some constraints:

  • The number of nucleotides of a sequence can be important. For coronavirus, that is roughly 29000. We therefor need 'paging' to display and interact with a sequence.
  • Using identification colors for nucleotides can also help to visually identify meaningful regions of the sequence on hand. For this to be useful, we need to implement zoom-in/out on the displayed sequence.

 

Paging

I simply used the solution exposed in a previous post about doc5ync.

 

Zoom-in / out

I found a good solution through a discussion on Stack Overflow
(credit: https://stackoverflow.com/users/967254/almulo).

  • Find the scroll viewer of the ListView.
  • Handle zoom events as required
var presenter        = UiHelpers.FindChild<ScrollContentPresenter>(listItems, null);
var mouseWheelZoom = new MouseWheelZoom(presenter);
PreviewMouseWheel += mouseWheelZoom.Zoom;

 

Sample screen shots of a zoom-out / in

More details later in a future post.

Please send your remarks, suggestions and contributions on github.

Hope all that will be useful in some way… Time is running… Humanity will prevail

covid-5ync – tackling covid-19

NCBI (National Center for Biotechnology Information) offers a vast database of DNA sequences.

I visited their site to see if I can find information about the last version of the menacing coronavirus.

Yes, that is available. It is precisely named: coronavirus 2 (SARS-CoV-2), and a long list of its DNA sequences is there.

I had worked, long years ago, on DNA sequences analysis and felt like giving a try to see how that sequence can be presented… just to see!

My old app (MFC, C++ app) could open and analyze the sequence. DNA sequence analysis is a long story. As far as I could learn: locate repeats (fragments that are repeated on the sequence), locate 'hairpins' (fragments of complementary nucleotides: a<=>T and g<=>c)… etc.

The old app did not seem quite handy to manipulate the downloaded sequence, so I started writing a new WPF one.

A few hours later, the app could display the sequence in a somehow 'visual appealing' UI, which invited to go ahead for some more significant work.

Covid-19 is not for fun!

Yes, it is not really for fun! I am not yet sure how such work can be useful, but whatever effort everyone can provide might be of help in defeating this new danger. Let us start and see!

For now, what I intend to do is:

  • Port the biotechnology features of the old app to a new handy UI;
  • Publish the app online for biotechnology engineers working on the subject: and get their feedback
  • Upload the source code to github for IT community feedback and contributions

It is a very small step in a long way to defeat that epidemic.

More on this in the next few days / weeks.

Be safe!

doc5ync – word index web page presentation!

Objectives:

  • Display a cloud of index words, each dimensioned relative to its occurrences in e-book information (e-book title, description, author, editor… etc.)
  • On selection of a given word: display the list of e-book references related to the selected word
  • On selection of an e-book: display its information details and all words linked to it

Context:

doc5ync web interface is based on a meta-model engine (simpleSite, currently being renamed to web5ync!).

I talked about meta-models in a past post Here, with some posts about its potential applications here.

The basic concept of meta-models is to describe an object by its set of properties and enable the user to act on these properties by modifying their values in the meta-model database. On runtime, those property values are assigned to each defined object.

In our case, for instance, we have a meta-model describing web page elements, and a meta-model describing the dataset of word index and their related e-books.

For web page elements, the approach considers a web page as a set of html tags (i.e. <div>, <table><tr><td>…, <img… etc.). Where each tag has a set of properties (style, and other attributes) for which you can define the desired values. On runtime, your meta-model-defined web page comes to life by loading its html tags, assigning to each the defined values and injecting the output of the process to the web response.

A dataset is similarly considered as a set of rows (obtained through a data source), each composed of data cells containing values. Data cells can then be either presented and manipulated through web elements (html tags, above) or otherwise manipulated through web services.

Data storage and relationships

As we mentioned in the previous post, index words and their related e-books are stored in database tables as illustrated in the following figure:

Each word of the index provides us with its number of occurrences in e-book text sequences (known on Trie scan).

Html formatting using a SQL view

To reflect this information into a presentation, we used a view to format a html div element for each word relative to its number of occurrences. The query looks like the following code

select
  w.id            as word_id
 , w.n_occurs
 , N'<div style="BASIC STYLE STRING HERE…; display:inline;'

-- add the font-size style relative to number of occurrences
+
case
    when w.n_occurs between 0 and 2    then N' font-size:10pt;'
    when w.n_occurs between 3 and 8    then N' font-size:14pt;'
    when w.n_occurs between 9 and 15    then N' font-size:16pt;'
    when w.n_occurs between 16 and 24    then N' font-size:22pt;'
    when w.n_occurs between 25 and 2147483647    then N' font-size:26pt; '
end

+ N'"'
-- add whatever html attributes we need (hover/click…)
+
N' id="div' + convert(nvarchar(32), w.id)
+ N'" onclick="select_data_cell(''' + convert(nvarchar(32), w.id) + N''');" '
     as word_string_html

-- add other columns if needed
from dbo.doc5_trie_words w
order by w.word_string

 

The above view code provides us with html-preformatted string for each word index in the data row.

Tweaking the data rows into a cloud of words

On a web page, a dataset is commonly displayed as a grid (table / columns / rows), and web5ync knows how to read a data source, and output its rows into that form. But that did not seem to be convenient in our case, because it simply displays index words each on a row which is not really the presentation we are looking for!

 

To resolve this, we simply need to change the dataset web container from a <table> (and its containing rows / cells) into <div> tags (with style=display: inline).

Here a sample of html code of the above presentation:

<table>
  <tr>
    <td>
    <div style="font-size:16pt;" onclick="select_data_cell('29008');">After</div>
  </td>
</tr>
<tr>
  <td>
    <div style="font-size:26pt; onclick="select_data_cell('28526');">after</div>
  </td>
</tr>

<!-- the table rows go on... --> 


And here is a sample html code of the presentation we are looking for:

<div style="display:inline;">
    <div id="td_word_string_html82" style="display:inline;">
       <div style="font-size:26pt;" onclick="select_data_cell('28526');">after</div>
      </div>
</div>

<div style="display:inline;">
    <div id="td_word_string_html83">
      <div style="font-size:16pt;" onclick="select_data_cell('29008');">After</div>
    </div>
</div>

<div style="display:inline;">
   <div style="display:inline;">
      <div style="font-size:10pt;" onclick="select_data_cell('17657');">AFTER</div>
    </div>
</div>

 

Which looks closer to what we want:

Interacting with index words

The second part of our task is to allow the user to interact with the index words: clicking a word = display its related e-books, clicking an e-book = display the e-book details + display index words specifically linked to that e-book.

For this, we are going to use a few of the convenient features of web5ync, namely: Master/details data binding, and Tabs. (I will write more about these features in a future post)

Web5ync master/details binding allows linking a subset of data to a selected item in the master section. Basically, each data section is an iframe. The event of selecting a data row in one iframe can update the document source of one or more iframes. All what we need is: 1. define a column that will be used as the row's id, and 2. define how the value of that id should be passed to the target iframe (typically: url parameter name).

Tabs are convenient in our case as they will allow distributing the information in several areas while optimizing web page space usage.

In the figure above, we have 3 main data tabs: n Explore by words, n Document info and n Selected document words.

On the first tab:

  • clicking a word (in the upper iframe) should display the list of its related e-books (in lower iframe of that same tab)
  • clicking an e-book row on the lower iframe should: first displays its details (an iframe on the second tab), and display all words directly linked to the selected e-book (an iframe in the 3rd tab). (figures below)

In that last tab, we can play once again with the displayed words, to show other documents sharing one of them:

doc5ync Trie index integration tool

That is a maintenance WPF client application for indexing words found in e-book titles and descriptions for the doc5ync project. (http://doc5.5ync.net/)

Before talking about technical details, let us start by some significant screenshots of the app.

1. scanning All languages’ words for 10000 data records with minimum words length of 4 chars.

trie-with-data-window1

After the scan, words are displayed (on the left side of the above figure) highlighting the occurrences of each word (greater font size = more occurrences). This done using a user control itself using a converter.

trie-with-data-word-control

A simple Border enclosing a TextBlock

<UserControl.Resources>
	<conv:TrieWordFontSizeConverter x:Key="fontSizeConverter" />
</UserControl.Resources>
<Grid x:Name="grid_main">
	<Border BorderBrush="DarkGray" CornerRadius="2" Background="#FFEDF0ED" Height="auto" Margin="2" BorderThickness="1">
		<TextBlock Text="{Binding Word}"
					Padding="4px"
					VerticalAlignment="Center"
					HorizontalAlignment="Center"
					FontSize="{Binding ., Converter={StaticResource fontSizeConverter}, FallbackValue=12}"
					>
		</TextBlock>
	</Border>

	</Grid>
 
 
The converter emphasizes the font size relative to the word’s occurrences:
 
public class TrieWordFontSizeConverter : IValueConverter
{
    public object Convert(object value, Type targetType, object parameter, CultureInfo culture)
    {
        double minFontSize = 11.0,
              defaultFontSize = 12.0,
              maxFontSize = 32.0,
              size;
         if (System.ComponentModel.DesignerProperties.GetIsInDesignMode(new DependencyObject()))
            return defaultFontSize;

         double min = (double) iWordsCentral.Instance.MinOccurrences,
                max = (double) iWordsCentral.Instance.MaxOccurrences;
         iTrieWord word = value as iTrieWord;

         if( word == null)
            return defaultFontSize;

         max = Math.Min(9, max);
        size = (word.Occurrences / max) * maxFontSize;

        if(size > maxFontSize)
            return maxFontSize;
        if(size < minFontSize)
            return minFontSize;
        return size;
}

Load, Scan and link words to data items

The View Model objects and processing flow

trie-with-data-view-model

iWordsCentral is the ‘main’ view model (singleton) which provide word scanning and data object assignment through its ScanWordsData (iData object)

ItemList (iDataItemList) is iData’s member responsible for building the Trie (its member) and assigning Trie’s words to its data items.

On Load button click, the MainWindow calls its LoadData() method.

 
async void ReloadData()
{
	await Task.Run(() => iWordsCentral.Instance.LoadData());
}
 

The method loads data records into (the desired number of records is a parameter… see main figure) and assign it to the ItemsList of the scan object (iData), then calls the iData’s method to build the Trie and assign data items to each of the Trie’s nodes.

 

_scanWordsData.ItemList	= rootList;

bool scanWordsResult = await _scanWordsData.ScanDataWordsAsync(_minWordLength, _includeDocAreaWords, _cancelSource.Token);

 

The iData object calls its ItemList to do the job… its method proceeds as in the following code

public async Task<bool> ScanDataWordsAsyn(int minWordLength, bool scanRootItems, CancellationToken cancelToken)
{
    if(_trie == null)
        _trie = new iTrie();

    // build a single string with all textual items and parse its words
	iTrie		trie			= _trie;
	string		global_string	= "";

        foreach( iDataItem item in this)
        global_string	+= item.StringToParse;
        await Task.Run(() => _trie.LoadFromStringAsync(global_string, minWordLength, notifyChanges: false));

        _trie.Sort();
        List<iTrieWord>	trieWordList	= trie.AllStrings;

        // copy the Trie words (strings) to a DataTrieWord list
	CopyDataWords(trieWordList);

        // assign words to data items
        bool result = await AssignTrieWordsDataAsync(scanRootItems, cancelToken);
	return result;
}


 

The data Item List loops through all its words and data items, calling each data item to assign itself to the given word if it is contained in its data

foreach (var word in _dataWordList)
{
   foreach (var ditem in this)
      await ditem.AssignChildrenTrieWordAsyn(scanRootItems, word, cancelToken);
}

The data item looks for any of its data where a match of the given word is found and assign those items to the word:

var wordItems	= this.Children.Where( i => i.Description != null 

				&& i.Description.IndexOfWholeWord( word.Word) >= 0);

IndexOfWoleWord note

That is (an efficient) string extension which is important to ensure that one whole word is present in data. I struggled to find a solution for this question, and finally found an awesome solution proposed by https://stackoverflow.com/users/337327/palota

 

// credit: https://stackoverflow.com/users/337327/palota
public static int IndexOfWholeWord(this string str, string word)
{
    for (int j = 0; j < str.Length && 
        (j = str.IndexOf(word, j, StringComparison.Ordinal)) >= 0; j++)
        if ((j == 0 || !char.IsLetterOrDigit(str, j - 1)) && 
            (j + word.Length == str.Length || !char.IsLetterOrDigit(str, j + word.Length)))
            return j;
    return -1;
}
 

Finally, as you may have noticed, for performance measurement, a simple StopWatch is embedded into the main view model to notify elapsed time during the process. For this to have sense, all methods are of course async notifying changes through the UI thread (Dispatcher). You might ignore all the async artifacts in the above code to better concentrate on the processing steps themselves.

Presentation

Once all the processing is done, there is still the presentation UI work to do in order to display the document list of a selected word.

This will be the subject of a next post.

 

Jet.oledb.4.0 and utf8 bom story

As you may know, a text file may specify its encoding by a ‘bom’ (byte order mark) using several bytes at its beginning. For utf8 encoding the signature bytes are: 0xef, 0xbb, 0xbf.

I came across an issue while manipulating csv files, for which I decided to use utf8 (thinking that was a good choice for a multi-cultural environment!). The process involved reading and writing back to the same file after some insertions and updates. All using microsoft.jet.oledb.4.0 provider (with a schema.ini specifying CharacterSet=65001 (65001 being utf8 code page)).

My csv files had a header row of which the first column was, ironically, named ‘Match ID’. A few manipulations revealed a somehow strange behavior. Although the debugger showed that the first column’s name is ‘Match ID’, I could no more access ‘Match ID’ column by its name. Using the watch window, I asked the debugger:
myColumn.ColumnName == "Match ID"… it replied false… weird!

Viewing the column name's CharArray in hexa offers a more significant information:

 column name issue

That is evidently endless! With the time going, as you manipulate your columns and rewrite back to the csv file, you end up by having your 'Match ID' column prefixed by bytes from the utf8 bom code as many times you rewrite the csv file. And if you are a nice guy who lets the users reorder the columns as they need, you may end up by having all your columns affected by that issue!

column name bytes

Changing files’ encoding to unicode (whose bom signature is 0xff 0xfe 0xff 0xfe) does not reproduce the issue. Which makes it clear that the source of annoyance is not utf8 but rather the jet.oledb.4 data provider with utf8. Still, identifying the source is half way of solving the issue :).

How to solve this?

Well, you may think of ‘sanitizing’ your column names at every load! Which, in my view, does not seem quite practical.
In my case I just switched to unicode (despite more bytes waste!) to preserve multi-cultural data requirements.

xsl witness!

Transforming xml content through xsl stylesheets is a useful and relatively common feature in the development process. I talked about this in the previous post about OneNote pages html preview.
Searching in my personal code toolbox, I just found this ‘iXslWitness’, a tool I wrote a couple of years ago to check the effectiveness of a stylesheet in transforming xml to html. Its usage is quite simple: you select an xml file and the xsl stylesheet to use. And you get the html transformed content.
I hope that can be useful for anyone involved in such tasks!
A screenshot of transforming a OneNote page xml content (a list of ‘The World If’ publications of The Economist newspaper):

worldif2017-witness

You can download the tool Here.
The source code is Here.

OneNote page xsl transformation

As we saw in a previous post, OneNote API is xml-based. You call the OneNote Application object to obtain almost all needed information as xml strings.

One of that information is the page content. Whose schema is defined as explained in the previous post.

Once you get the page content’s xml string, it is a little bit of work to transform that into a useful html page

To do this, I used an xslt style sheet which is the subject of the current post.

How does it work?

Assume you have an xsl sheet string and the xml string of a page. You can then process both in a way similar to the following code:

 

using System.Xml.Xsl;

 

public static string XmlToHtml(string xmlString, string xslString)
{
    string            html;
    XslCompiledTransform    transform  = null;
    XmlReader         xslReader        = null,
                      xmlReader        = null;
    MemoryStream      memStream        = null;
    StreamReader      sr               = null;
    MemoryStream      xslStream        = null,
                      xmlStream        = null;
    byte[]            xslBytes         = Encoding.UTF8.GetBytes( xslString),
                      xmlBytes         = Encoding.UTF8.GetBytes( xmlString);

    xslStream    = new MemoryStream( xslBytes);
    xmlStream    = new MemoryStream( xmlBytes);

    transform    = new XslCompiledTransform();
    xslReader    = XmlReader.Create( xslStream);
    xmlReader    = XmlReader.Create( xmlStream);
    memStream    = new MemoryStream();

    transform.Load( xslReader);
    transform.Transform( xmlReader, null, memStream);

    memStream.Position    = 0;

    sr      = new StreamReader( memStream);
    html    = sr.ReadToEnd();
               
    xslStream.Close();
    xmlStream.Close();
    memStream.Close();
    sr.Close();
    xslReader.Close();

    return html;
}

 

 

 

How to get the page xml content?

Assuming you have the page ID, here is a sample code to query the page content through OneNote API:

 

public static string GetPageContentXmlString(string pageId)
{
    var         onenoteApp  = new Application();
    string      pageXml     = null;
    PageInfo    pageInfo    = PageInfo.piAll;

    onenoteApp.GetPageContent(pageId, out pageXml, pageInfo);
    return pageXml;
}

 

Reminder: The page content xsd schema

 

The xsl style sheet general structure

Let us use the iXml explorer to navigate through the xsl stylesheet used to transform the page’s xml into html.

(I searched for ‘match’ to locate templates defined in the stylesheet).

As you may notice, we have an xsl template to handle each OneNote page defined xsd type:

  • A template to process the page root information
  • A template to process the page’s title
  • A template to process outline elements
  • A template to process OEChildren collection
  • … and so forth

Sample xsl code

Process OneNote Element (one:OE)

  <!--
  ************************************************
  one:OE
  ************************************************
  -->
  <xsl:template match="one:OE">
    <xsl:param name="nest_level" select="0" />

    <xsl:variable name="listNode" select="./one:List" />
    <xsl:variable name="quickStyleIndex" select="./@quickStyleIndex" />
    <xsl:variable name ="styleNode" select="msxsl:node-set($quickStyleList)/quickStyle[@index=$quickStyleIndex]" />

    <xsl:variable name="quickStyle">
      <xsl:choose>
        <xsl:when test="$styleNode">
          <xsl:value-of select="$styleNode/@style"/>
        </xsl:when>
      </xsl:choose>
    </xsl:variable>

    <!-- is there any list here? -->
    <xsl:choose>
      <xsl:when test="$listNode">
        <xsl:variable name="number"    select="$listNode/one:Number" />
        <xsl:variable name="txt"      select="./one:T" />
        <xsl:variable name="listItemTag">
          <xsl:text>li</xsl:text>
        </xsl:variable>

        <!-- list tag: either <ol> or <ul> -->
        <xsl:variable name="listTag">
          <xsl:choose>
            <xsl:when test="$number">
              <xsl:text>ol</xsl:text>
            </xsl:when>
            <xsl:otherwise>
              <xsl:text>ul</xsl:text>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:variable>

        <xsl:choose>
          <!-- numbered list? output <ol> -->
          <xsl:when test="$number">
            <xsl:variable name="txtNum"        select="number($number/@text)" />
            <xsl:variable name="fontNum"      select="$number/@font" />
            <xsl:variable name="tag" select="concat('<', $listTag, '>')" />
            <xsl:value-of disable-output-escaping="yes" select="$tag"/>

            <!-- *********** output <li value="xx"> ' style="font-family:', $fontNum, ';"',************* -->
            <xsl:value-of disable-output-escaping="yes" select="concat('<', $listItemTag, ' value=', '"', $txtNum, '">')" />

            <xsl:call-template name="outputListItem">
              <xsl:with-param name="listNode" select="$listNode" />
              <xsl:with-param name="itemText" select="$txt" />
            </xsl:call-template>
          </xsl:when>

          <!-- bullet ? output <ul> -->
          <xsl:otherwise>
            <xsl:variable name="tag" select="concat('<', $listTag, '>')" />
            <xsl:value-of disable-output-escaping="yes" select="$tag"/>
            <!-- *********** output <li> ************* -->
            <xsl:value-of disable-output-escaping="yes" select="concat('<', $listItemTag,'>')" />

            <xsl:call-template name="outputListItem">
              <xsl:with-param name="listNode" select="$listNode" />
              <xsl:with-param name="itemText" select="$txt" />
            </xsl:call-template>
          </xsl:otherwise>

        </xsl:choose>

        <!-- process list's sub items -->
        <xsl:apply-templates select="./one:OEChildren">
          <xsl:with-param name="nest_level" select="1 + $nest_level" />
        </xsl:apply-templates>

        <xsl:apply-templates select="./one:Table" />

        <!-- close the list item tag -->
        <xsl:value-of disable-output-escaping="yes" select="concat('</', $listItemTag,'>')" />
        <!-- close the list tag -->
        <xsl:value-of disable-output-escaping="yes" select="concat('</', $listTag, '>')"/>
      </xsl:when>

      <!-- no list: process all sub items -->
      <xsl:otherwise>
        <xsl:variable name="style0" >
          <xsl:call-template name="string-replace-all">
            <xsl:with-param name="text" select="./@style" />
            <xsl:with-param name="replace" select="''" />
            <xsl:with-param name="by" select="''" />
          </xsl:call-template>
        </xsl:variable>

        <xsl:variable name="alignment" select="./@alignment" />

        <xsl:variable name="style">
          <xsl:choose>
            <xsl:when test="$alignment='right'">
              <xsl:value-of select="concat('width:100%; float:right; display:inline; text-align:right;', $style0)"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="$style0"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:variable>

        <xsl:choose>
          <xsl:when test="string-length($style)>0">
            <span style="{$style}">
              <xsl:apply-templates />
            </span>
          </xsl:when>

          <!-- ****** no style defined -->
          <xsl:otherwise>
            <xsl:choose>
              <xsl:when test="$quickStyle">
                <span style="{$quickStyle}">
                  <xsl:apply-templates>
                    <xsl:with-param name="nest_level" select="1 + $nest_level" />
                  </xsl:apply-templates>
                </span>
              </xsl:when>

              <xsl:otherwise>
                <span>
                  <xsl:apply-templates>
                    <xsl:with-param name="nest_level" select="1 + $nest_level" />
                  </xsl:apply-templates>
                </span>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:otherwise>
          <!-- ****** end of : no style defined -->
        </xsl:choose>

      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

Output a list (either numbered or bulleted)

<!--
  ************************************************
  output list item (numbered / bulleted)
  ************************************************
  -->
  <xsl:template name="outputListItem" match="one:List">
    <xsl:param name="listNode" />
    <xsl:param name="itemText" />

    <xsl:variable name="number"        select="$listNode/one:Number" />
    <xsl:variable name="bullet"        select="$listNode/one:Bullet" />
    <xsl:variable name="txtNum"        select="number($number/@text)" />

    <xsl:choose>
      <xsl:when test="$number">
        <xsl:value-of select="normalize-space($itemText)" disable-output-escaping="yes"/>
      </xsl:when>

      <xsl:otherwise>
        <xsl:value-of select="normalize-space($itemText)" disable-output-escaping="yes"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

 

Process page images

In the page’s xml, images are returned as base64 strings.

To process an image into html, the xsl template looks like the following:

  <!--
  ************************************************
  one:Image
  ************************************************
  -->
  <xsl:template match="one:Image">
    <xsl:variable name="imgWidth"select="substring-before( number(./one:Size/@width) * 1.33, '.')"/>
    <xsl:variable name="imgHeight" select="substring-before( number(./one:Size/@height) * 1.33, '.')"/>
    <xsl:variable name="imgData"    select="./one:Data" />
    <xsl:variable name="oneFormat"  select="./@format" />
    <xsl:variable name="htmlformat">
      <xsl:choose>
        <xsl:when test="$oneFormat='png'">
          <xsl:text>data:image/png;base64</xsl:text>
        </xsl:when>
        <xsl:otherwise>
          <xsl:text>data:image/jpg;base64</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <xsl:variable name="htmlWidth">
      <xsl:choose>
        <xsl:when test="$imgWidth">
          <xsl:value-of select="$imgWidth"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:text>96%</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <xsl:variable name="htmlHeight">
      <xsl:choose>
        <xsl:when test="$imgHeight">
          <xsl:value-of select="$imgHeight"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:text>auto</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>


    <img width="{$htmlWidth}" height="{$htmlHeight}" src="{$htmlformat}, {$imgData}"/>
  </xsl:template>

 

Download the xsl stylesheet

You may download the entire xsl stylesheet (for OneNote 2013 and 2016) here

Json object explorer

[json objects =>to property bags =>to objects]

Reducing dependency between clients and services is a major common question in software solutions.

One important area of client/server dependencies lies in the structure of objects involved in exchanged messages (requests / responses). For instance: a new property inserted to an object on service side, often crashes the other side (client) until the new property is introduced on the involved object.

I previously posted about loose coupling through property bags abstractions.

My first approach was based on creating a common convention between service and client which implies transforming involved objects into property bags whose values would be assigned as needed to business objects at each side on runtime. That still seems to be a 'best solution' in my point of view.

Another approach is to transform the received objects (at either side: server/client) into property bags before assigning their values to the related objects.

This second approach is better suited for situations where creating a common convention would be difficult to put in place.

While working on some projects based on soap-xml messages, I wrote a simple transformer: [xml => property bags => objects].The transformer then helped write an xml explorer (which actually views xml content as its property bag tree. You can read about this in a previous post).

Another project presented a new challenge in that area, as the service (JEE) was using Json format for its messages. In collaboration with the Java colleagues, we could implement the property bag approach which helped ease client / server versioning issues.

A visual tool, similar to xml explorer, was needed for developers to explore json messages' structures. And that was time for me to write a new json <==>-property bag parser.

The goal was to:

  • Transform json content to property bags
  • Display the transformed property bag tree

Using Newtonsoft's Json library – notably its Linq extensions – was essential.

Hereafter the global dependency diagram of the Json explorer app:

 

The main method in the transformation is ParseJsonString to which you provide the string to be parsed.

Its logic is rather simple:

  • A json string is the representation of either:
    • An object:
      • Read its properties (which may contain arrays… see below)
    • Or an array of:
      • Objects: read the array's objects
      • Arrays: read the array's arrays

 

Code snippets

The parse json string method

public static PropertyBag ParseJsonString(string jsonString)
{
    JObject jObj = null;
    PropertyBag bag = new PropertyBag("Json");
    ObjProperty bagRootNode;
    JArray jArray = null;
    string exceptionString = "";

    /// the json string is either:
    /// * a json object
    /// * a json array
    /// * or an invalid string

    // try to parse the string as a JsonObject
    try
    {
        jObj = JObject.Parse(jsonString);
    }
    catch (Exception ex)
    {
        jObj = null;
        exceptionString = ex.Message;
    }

    // try to parse the string as a JArray
    if(jObj == null)
    {
        try
        {
            jArray = JArray.Parse(jsonString);
        }
        catch (Exception ex2)
        {
            jArray = null;
            exceptionString = ex2.Message;
        }

        if(jArray == null)
        {
            bag.Add(new ObjProperty(_exceptionString, null, false) { ValueAsString = exceptionString });
            return bag;
        }
    }

    bagRootNode = new ObjProperty("JsonRoot", null, false);

    if(bagRootNode.Children == null)
        bagRootNode.Children = new PropertyBag();

    bag.Add(bagRootNode);

    if(jObj != null)
    {
        bagRootNode.SourceDataType = typeof(JObject);
        bagRootNode.Children = ParseJsonObject(bagRootNode, jObj);
    }
    else if(jArray != null)
    {
        bagRootNode.SourceDataType = typeof(JArray);
        bagRootNode.Children = ParseJsonArray(bagRootNode, jArray);
    }

    return bag;
}

 

Parse json array code snippet

 

private static PropertyBag ParseJsonArray(ObjProperty parentItem, JArray jArray)
{
    if(parentItem == null || jArray == null)
        return null;

    ObjProperty childItem;

    if(parentItem.Children == null)
        parentItem.Children = new PropertyBag();

    PropertyBag curBag        = parentItem.Children;

    foreach(var item in jArray.Children())
    {
        JObject jo     = item as JObject;
        JArray subArray = item as JArray;
        PropertyBag childBag;
        Type nodeType = subArray != null ? typeof(JArray) : typeof(JObject);

        childItem    = new ObjProperty("item", parentItem, false) { SourceDataType = nodeType };

        if (jo != null)
            childBag = ParseJsonObject(childItem, jo);
        else if(subArray != null)
            childBag    = ParseJsonArray(childItem, subArray);
        else
            continue;

        curBag.Add(childItem);
    }

    return curBag;
}

 

Json to Xml

As, now, we have the json content in property bags, we can almost directly get the xml equivalent (see screenshot below).

The used sample json string: for the following screenshot:

 

{
    "web-app": {
    "servlet": [
    {
        "servlet-name": "cofaxCDS",
        "servlet-class": "org.cofax.cds.CDSServlet",
        "init-param": {
            "configGlossary:installationAt": "Philadelphia, PA",
            "configGlossary:adminEmail": "ksm@pobox.com",
            "configGlossary:poweredBy": "Cofax",
            …
            …
            "maxUrlLength": 500
            }
    },
    {
        "servlet-name": "cofaxEmail",
        "servlet-class": "org.cofax.cds.EmailServlet",
        "init-param": {
            "mailHost": "mail1",
            "mailHostOverride": "mail2"
            }
    },
    {
        "servlet-name": "cofaxAdmin",
        "servlet-class": "org.cofax.cds.AdminServlet"
    },

    {
        "servlet-name": "fileServlet",
        "servlet-class": "org.cofax.cds.FileServlet"
    },
    {
        "servlet-name": "cofaxTools",
        "servlet-class": "org.cofax.cms.CofaxToolsServlet",
        "init-param": {
            "templatePath": "toolstemplates/",
            "log": 1,
            …
            …
            "adminGroupID": 4,
            "betaServer": true
            }
        }
    ],
    "servlet-mapping": {
        "cofaxCDS": "/",
        "cofaxEmail": "/cofaxutil/aemail/*",
        "cofaxAdmin": "/admin/*",
        "fileServlet": "/static/*",
        "cofaxTools": "/tools/*"
        },

    "taglib": {
        "taglib-uri": "cofax.tld",
        "taglib-location": "/WEB-INF/tlds/cofax.tld"
        }
    }
}



Screenshot

 

You may download the binaries here!

The source code is available here!