Taoffi's blog

prisonniers du temps

covid-5ync – dna search using Linq

This post, follows in the series about technical details of covid-5ync (an open source application contribution in the fight against the new covid-19 virus).

This time, our talk is about searching nucleotides within a dna sequence.

Search is an important task for analyzing a sequence. Actually, to understand the structure of a genome, one task is to locate and classify identical regions and complementary regions (complementary regions = where nucleotides are paired: aót, góc). From the IT point of view, such a task is obviously related to 'search'… and should particularly be optimized.

Using Linq

To locate a sequence of nucleotides (i.e. a string) within a (global) sequence, we may simply iterate into the global sequence nodes (each of which is a char) until we find the searched string. To find all occurrences, we can repeat the process starting at the next location and so on.

Despite the artisanal aspect of the process, that works well for searching a predefined string.

Another task that looks related to search, is to find 'repeats' (Repeats are identical regions on a sequence).

It is somehow different from searching a predefined string. The application actually has to iterate through the sequence, and at each position take a number of nucleotides to compose a string to be searched all over the sequence… (and keep coordinates of the found occurrences).

Proposed solution

In covid-5ync, a dna sequence is a List<dna node> where each node contains an Index (int). Using Linq, we can define a method that returns the string of a given length at a given location (node Index) of the sequence:

public string StringAtIndex(int index, int len)
{
    string str = "";
    var nodes = this.SkipWhile(i => i.Index < index).Take(len);

    foreach(var n in nodes)
        str += n.Code.ToString();

    return str;
}

 

With this method in place, we can efficiently locate all starting nodes of occurrences of a string on the sequence:

IEnumerable<iDnaNode> AllStartoccurrencesOfString(string str)
{
    int len    = str.Length;
    return this.Where(i => StringAtIndex(i.Index, len) == str);
}

 

Sample usage: locate and select occurrences of a string

// get all starting nodes of the string
Var allStarts = AllStartoccurrencesOfString(str).ToList();

// visit the found starting nodes and select (str.Length) consecutive nodes…
foreach( var item in allStarts)
{
    var subSeq = this.SkipWhile(n => n.Index <  item.Index).Take(str.Length);
    subSeq.SelectAllNodes();
}

 

covid-5ync – dna analysis helper application: progress!

A quick post to talk about the progress in this project with a few technical details.

First, a screen capture of what it looks like today (just a little better than before!):

 

The first version of the application is already online: click-once installation. And Yes, we don't have money to have a certificate for the deployment… install only if you trust!... any suggestions about sponsors are also greatly welcome!

 

A few features implemented:

  • Search for string and search for the 'complementary' of string.
  • Paging
  • Zoom-in / out on the sequence

Next steps:

  • Search enhancement: visually locate occurrences
  • Search for unique fragments according to user-provided settings
  • Open file / save selection

Long-term:

  • … endless

 

Now, let us take a quick dive into the mechanics

The first class diagram (updated source code with such documentation is @github)

That seems quite basic(!), actually, as the reality of DNA, and that is one reason it is also fascinating!

  • A sequence (iDnaSequence) is composed of 'nodes' (iDnaNode)
  • A node is noted as a char. It refers to a 'base'.
  • Commonly a set of 4 bases is used ('a', 't', 'g' and 'c'). There may be more, but we can easily handle this when needed.
  • Each individual base has a complementary 'Pair'. A is pairable with T, C with G
  • The 'standard' set of bases (iDnaBaseNucleotides) is a (singleton) list of the 4 bases. It sits as the main reference for nodes. It provides answers to important questions like: is 'c' a valid base?... what the complementary of 'X'?... what is the complementary sequence of the string "atgccca"? and so on.

 

Visual presentation: a start point

There are many ways to present a DNA sequence. To start with something, let us assign a color to each base. The user can later change this to obtain a view according to his or her work area. Technically speaking, we have some constraints:

  • The number of nucleotides of a sequence can be important. For coronavirus, that is roughly 29000. We therefor need 'paging' to display and interact with a sequence.
  • Using identification colors for nucleotides can also help to visually identify meaningful regions of the sequence on hand. For this to be useful, we need to implement zoom-in/out on the displayed sequence.

 

Paging

I simply used the solution exposed in a previous post about doc5ync.

 

Zoom-in / out

I found a good solution through a discussion on Stack Overflow
(credit: https://stackoverflow.com/users/967254/almulo).

  • Find the scroll viewer of the ListView.
  • Handle zoom events as required
var presenter        = UiHelpers.FindChild<ScrollContentPresenter>(listItems, null);
var mouseWheelZoom = new MouseWheelZoom(presenter);
PreviewMouseWheel += mouseWheelZoom.Zoom;

 

Sample screen shots of a zoom-out / in

More details later in a future post.

Please send your remarks, suggestions and contributions on github.

Hope all that will be useful in some way… Time is running… Humanity will prevail

covid-5ync – tackling covid-19

NCBI (National Center for Biotechnology Information) offers a vast database of DNA sequences.

I visited their site to see if I can find information about the last version of the menacing coronavirus.

Yes, that is available. It is precisely named: coronavirus 2 (SARS-CoV-2), and a long list of its DNA sequences is there.

I had worked, long years ago, on DNA sequences analysis and felt like giving a try to see how that sequence can be presented… just to see!

My old app (MFC, C++ app) could open and analyze the sequence. DNA sequence analysis is a long story. As far as I could learn: locate repeats (fragments that are repeated on the sequence), locate 'hairpins' (fragments of complementary nucleotides: a<=>T and g<=>c)… etc.

The old app did not seem quite handy to manipulate the downloaded sequence, so I started writing a new WPF one.

A few hours later, the app could display the sequence in a somehow 'visual appealing' UI, which invited to go ahead for some more significant work.

Covid-19 is not for fun!

Yes, it is not really for fun! I am not yet sure how such work can be useful, but whatever effort everyone can provide might be of help in defeating this new danger. Let us start and see!

For now, what I intend to do is:

  • Port the biotechnology features of the old app to a new handy UI;
  • Publish the app online for biotechnology engineers working on the subject: and get their feedback
  • Upload the source code to github for IT community feedback and contributions

It is a very small step in a long way to defeat that epidemic.

More on this in the next few days / weeks.

Be safe!

doc5ync Trie integration tool - UI Tag cloud, paging and navigation

The integration client tool I talked about in the previous post, we need to display the list of words in a way similar to tag clouds in blog post.

For this we will use a ListView with some customization (see Xaml code below).

Paging

A more important question is the number of items to show. As we, in most cases, have thousands of items to display, we need a paging mechanism.

A solution – a Linq extension - proposed by https://stackoverflow.com/users/69313/wim-haanstra was a good base for a generic paging module:

 

// credit: https://stackoverflow.com/users/69313/wim-haanstra
// usage: MyQuery.Page(pageNumber, pageSize)
public static class LinqPaging
{
    // used by LINQ to SQL
    public static IQueryable<TSource> Page<TSource>(this IQueryable<TSource> source, int page, int pageSize)
    {
        return source.Skip((page - 1) * pageSize).Take(pageSize);
    }

    // used by LINQ
    public static IEnumerable<TSource> Page<TSource>(this IEnumerable<TSource> source, int page, int pageSize)
    {
        return source.Skip((page - 1) * pageSize).Take(pageSize);
    }

}

trie-with-data-paging-base

iObjectPaging is now a generic class accepting any collection of data to be paged through calls to the Linq extension.

Let us derive, from this base, a two specific paging classes: one for our words and another for our documents (DataItems):

trie-with-data-paging-2

That is all what we need for paging. Each list will simply assign its data to the corresponding paging object and the UI elements will be bound to the CurrentPageData collection. Next / Previous buttons will allow navigating through the collection pages.

The main view model, for instance, declares a paging member:

protected iWordPaging	_wordPaging	= new iWordPaging(200);

And assigns its Words collection to this paging member whenever the collection changes:

_wordPaging.SourceCollection	= ItemList?.AllWords;

 

Words as Tag Cloud

A ListView control should be customized for this.

We need to customize its ItemsPanel and ItemTemplate:

<ListView x:Name="listItems" 
			Grid.Row="1" 
			ItemsSource="{Binding WordPaging.CurrentPageData, IsAsync=True}" 
			BorderBrush="#FFA3A3A4" BorderThickness="1"
			SelectedItem="{Binding SelectedWord, Mode=TwoWay, IsAsync=True}" Background="{x:Null}"
			Padding="12" ScrollViewer.HorizontalScrollBarVisibility="Disabled"
			>
   <ListBox.ItemsPanel>
       <ItemsPanelTemplate>
          <WrapPanel MaxWidth="{Binding ElementName=listItems, Path=ActualWidth, Converter={StaticResource widthConverter}}" HorizontalAlignment="Left" Height="auto" Margin="12,0,12,0" />
       </ItemsPanelTemplate>
   </ListBox.ItemsPanel>

   <ListView.ItemTemplate>
       <DataTemplate>
          <local:iWordCtrl DataContext="{Binding }" Width="120" Height="40" />
       </DataTemplate>
    </ListView.ItemTemplate>
  </ListView>
 

Data items grid

A DataGrid bound to the selected Word data items will display its (paged) data items:

<DataGrid ItemsSource="{Binding CurrentPageData, IsAsync=True}">
   <DataGrid.Columns>
    …
    …


 

With this in place, we are now able to:

  • Navigate though word pages
  • When the selected word changes, its related data (paged) items (e-books) are displayed in the DataGrid…
  • Next / Previous buttons can be used to navigate, and will be enabled or disabled according to the paging context (see paging base class in the diagram above)
  • A list of pages (combo box) can also allow to go to a specific page

 

trie-with-data-paged-word-cloud

 

Sample paged datagrid of e-books containing the selected word

trie-with-data-paged-datagrid

In a next post, we will see the database integration process

doc5ync Trie index integration tool

That is a maintenance WPF client application for indexing words found in e-book titles and descriptions for the doc5ync project. (http://doc5.5ync.net/)

Before talking about technical details, let us start by some significant screenshots of the app.

1. scanning All languages’ words for 10000 data records with minimum words length of 4 chars.

trie-with-data-window1

After the scan, words are displayed (on the left side of the above figure) highlighting the occurrences of each word (greater font size = more occurrences). This done using a user control itself using a converter.

trie-with-data-word-control

A simple Border enclosing a TextBlock

<UserControl.Resources>
	<conv:TrieWordFontSizeConverter x:Key="fontSizeConverter" />
</UserControl.Resources>
<Grid x:Name="grid_main">
	<Border BorderBrush="DarkGray" CornerRadius="2" Background="#FFEDF0ED" Height="auto" Margin="2" BorderThickness="1">
		<TextBlock Text="{Binding Word}"
					Padding="4px"
					VerticalAlignment="Center"
					HorizontalAlignment="Center"
					FontSize="{Binding ., Converter={StaticResource fontSizeConverter}, FallbackValue=12}"
					>
		</TextBlock>
	</Border>

	</Grid>
 
 
The converter emphasizes the font size relative to the word’s occurrences:
 
public class TrieWordFontSizeConverter : IValueConverter
{
    public object Convert(object value, Type targetType, object parameter, CultureInfo culture)
    {
        double minFontSize = 11.0,
              defaultFontSize = 12.0,
              maxFontSize = 32.0,
              size;
         if (System.ComponentModel.DesignerProperties.GetIsInDesignMode(new DependencyObject()))
            return defaultFontSize;

         double min = (double) iWordsCentral.Instance.MinOccurrences,
                max = (double) iWordsCentral.Instance.MaxOccurrences;
         iTrieWord word = value as iTrieWord;

         if( word == null)
            return defaultFontSize;

         max = Math.Min(9, max);
        size = (word.Occurrences / max) * maxFontSize;

        if(size > maxFontSize)
            return maxFontSize;
        if(size < minFontSize)
            return minFontSize;
        return size;
}

Load, Scan and link words to data items

The View Model objects and processing flow

trie-with-data-view-model

iWordsCentral is the ‘main’ view model (singleton) which provide word scanning and data object assignment through its ScanWordsData (iData object)

ItemList (iDataItemList) is iData’s member responsible for building the Trie (its member) and assigning Trie’s words to its data items.

On Load button click, the MainWindow calls its LoadData() method.

 
async void ReloadData()
{
	await Task.Run(() => iWordsCentral.Instance.LoadData());
}
 

The method loads data records into (the desired number of records is a parameter… see main figure) and assign it to the ItemsList of the scan object (iData), then calls the iData’s method to build the Trie and assign data items to each of the Trie’s nodes.

 

_scanWordsData.ItemList	= rootList;

bool scanWordsResult = await _scanWordsData.ScanDataWordsAsync(_minWordLength, _includeDocAreaWords, _cancelSource.Token);

 

The iData object calls its ItemList to do the job… its method proceeds as in the following code

public async Task<bool> ScanDataWordsAsyn(int minWordLength, bool scanRootItems, CancellationToken cancelToken)
{
    if(_trie == null)
        _trie = new iTrie();

    // build a single string with all textual items and parse its words
	iTrie		trie			= _trie;
	string		global_string	= "";

        foreach( iDataItem item in this)
        global_string	+= item.StringToParse;
        await Task.Run(() => _trie.LoadFromStringAsync(global_string, minWordLength, notifyChanges: false));

        _trie.Sort();
        List<iTrieWord>	trieWordList	= trie.AllStrings;

        // copy the Trie words (strings) to a DataTrieWord list
	CopyDataWords(trieWordList);

        // assign words to data items
        bool result = await AssignTrieWordsDataAsync(scanRootItems, cancelToken);
	return result;
}


 

The data Item List loops through all its words and data items, calling each data item to assign itself to the given word if it is contained in its data

foreach (var word in _dataWordList)
{
   foreach (var ditem in this)
      await ditem.AssignChildrenTrieWordAsyn(scanRootItems, word, cancelToken);
}

The data item looks for any of its data where a match of the given word is found and assign those items to the word:

var wordItems	= this.Children.Where( i => i.Description != null 

				&& i.Description.IndexOfWholeWord( word.Word) >= 0);

IndexOfWoleWord note

That is (an efficient) string extension which is important to ensure that one whole word is present in data. I struggled to find a solution for this question, and finally found an awesome solution proposed by https://stackoverflow.com/users/337327/palota

 

// credit: https://stackoverflow.com/users/337327/palota
public static int IndexOfWholeWord(this string str, string word)
{
    for (int j = 0; j < str.Length && 
        (j = str.IndexOf(word, j, StringComparison.Ordinal)) >= 0; j++)
        if ((j == 0 || !char.IsLetterOrDigit(str, j - 1)) && 
            (j + word.Length == str.Length || !char.IsLetterOrDigit(str, j + word.Length)))
            return j;
    return -1;
}
 

Finally, as you may have noticed, for performance measurement, a simple StopWatch is embedded into the main view model to notify elapsed time during the process. For this to have sense, all methods are of course async notifying changes through the UI thread (Dispatcher). You might ignore all the async artifacts in the above code to better concentrate on the processing steps themselves.

Presentation

Once all the processing is done, there is still the presentation UI work to do in order to display the document list of a selected word.

This will be the subject of a next post.

 

xsl witness!

Transforming xml content through xsl stylesheets is a useful and relatively common feature in the development process. I talked about this in the previous post about OneNote pages html preview.
Searching in my personal code toolbox, I just found this ‘iXslWitness’, a tool I wrote a couple of years ago to check the effectiveness of a stylesheet in transforming xml to html. Its usage is quite simple: you select an xml file and the xsl stylesheet to use. And you get the html transformed content.
I hope that can be useful for anyone involved in such tasks!
A screenshot of transforming a OneNote page xml content (a list of ‘The World If’ publications of The Economist newspaper):

worldif2017-witness

You can download the tool Here.
The source code is Here.

Revisiting the Trie –applied to text search and indexing [.net]

A Trie is an efficient tree structure for storing and searching hierarchically-structured information.

For more detailed information you may have a look Here, Here or Here… to mention a few.

In this article, I take 'Text' as the information to be handled.

Text can be viewed as a tree of 'character' nodes. Each set of these nodes compose a 'Word' building block. Several 'Words' building block may share a same set of character nodes.

Let us look at words like: trie, tree, try, trying. All of them share the 't' and 'r' root nodes. 'try' and 'trying' have their 't', 'r', 'y' as common nodes. This can be presented in the following simple diagram:

Or, in a more graphical presentation, like the figure below (green nodes represent end of words):

 

In the case of Text, the efficiency of a Trie is that any word of a given text will start with one of the known alphabetical characters of the language used. Which represents a relatively limited number of nodes.

To enhance our tree structure, we may choose to compose our root nodes with only characters used in the manipulated text (i.e. dynamically compose our root character dictionary).

Trie classes

The following class diagram presents the main Trie model used in the application code:

 

  • CharDictionaryItem: stores one character information (its 'char' value… from which we can get its int value)
  • CharDictionary: the collection or CharDictionaryItem used as the root dictionary for all nodes
  • TrieNode: a trie node item. It refers to one of the above dictionary's nodes. Contains a IsEndOfWord flag that tells whether the node is the end of a word building bloc. And provides information related to its status and position in the tree (Children, Neighbors, IsExpanded…). Through its tree position and flags, a TrieNode object can provide us with useful information such as 'words' (Strings) found at its specific position.
  • TrieNodeList: a collection of TrieNode items.
  • Trie: is the central object. It contains a CharDictionary and a TrieNodeList. This class is implemented as a singleton in the current sample application.

Collection indexers

Collections (CharDictionary, TrieNodeList) expose a useful indexer that retrieves an item by its character. Those indexers will be used through the code for retrieving items.

TrieNodeList indexer:

 

public TrieNode this[char c]
{
get { return this.FirstOrDefault(item => item.Character == c); }
}

CharDictionary indexer:

 

public CharDictionaryItem this[char c]
{
get { return this.FirstOrDefault( item => item.Character == c); }
}

The sample scenario

The sample application proposes the following usage scenario:

  • User selects a text file
  • Parse the file's contents: get its words blocks
  • Compose the root character dictionary
  • Build the words tree
  • Display the presentation tree + additional search functionalities

Parsing text – sample code

Parse text words (having a minimum length: (= 3 chars in the sample app))

 

int ParseStringTask(string str, int minWordLength)
{
// split text words
string[] words = str.Split(new string[]
{
" ", "\r\n", "(", ")", ",", ".", "'", "\"",
":", "/", "'", "'", "«", "»", "-", "+", "=",
"*", "[", "]", "{", "}", ";", "&", "<",
">", "|", "\\", "`", "^", "…", "\t"    
}, StringSplitOptions.RemoveEmptyEntries);

string word;

foreach (string w in words)
{
word = w.Trim();

if (word.Length >= minWordLength)
{
this.AddString(word); // add this word to the tree
}
}

return this._nodes.Count;
}

AddString method: creates the new dictionary items / adds the tree nodes of a string block (word)

 

public void AddString(string str)
{
char c = str[0];
CharDictionaryItem dicoItem;
TrieNode firstNode = _nodes[c],
node;
int ndx,
len = str.Length;

if(firstNode == null)
{
// find the dictionary item. add it if none found
dicoItem = GetDictionaryItem(c);
firstNode = _nodes.AddTail(dicoItem);
}

for(ndx = 1; ndx < len; ndx++)
{
c = str[ndx];
node = firstNode.Children[c];
// find the dictionary item. add it if none found
dicoItem = GetDictionaryItem(c);

if(node == null)
{
node = firstNode.Children.AddTail(dicoItem);
}

if(ndx >= len -1)
{
// set the End of word flag
node.IsEndOfWord = true;
}

firstNode = node;
}
}

Reading back the trie words

How to find back our parsed words through the trie?

Here is a sample code to find words located at a given node:

TrieNode.Strings property:

 

public List<string> Strings
{
get
{
List<string> list = new List<string>();
var childWords = from c in _children where(c._isEndofWord == true) select c;
List<string> neigbors = Next == null ? null : Next.Strings;
string strThis = this.Character.ToString();

// add words found in immediate neighbors
if(childWords != null)
{
string str = strThis;

foreach(TrieNode n in childWords)
{
str += n.Character.ToString();

if(n.IsEndOfWord)
list.Add( str);
}
}

// add words that may be found in child nodes
foreach(TrieNode childNode in _children)
{
foreach(string str in childNode.Strings)
list.Add(strThis + str);
}

return list;
}
}

 

User interface (WPF)

Trie nodes can easily be presented in a TreeView. According to the trie node flag, its corresponding tree view node should be able to indicate this (through different image in our example, using a Converter)

The Tree view item template may be:

 

<TreeView.ItemTemplate>
<HierarchicalDataTemplate ItemsSource="{Binding Children}">
<StackPanel Orientation="Horizontal" Margin="0" >
<Image Width="14"
Source="{Binding Converter={StaticResource nodeImageConverter}}"
Margin="0,0" />
<TextBlock Text="{Binding Character}" Padding="4,0"
Margin="4,0" VerticalAlignment="Center" />
</StackPanel>
</HierarchicalDataTemplate>
</TreeView.ItemTemplate>

 

Searching trie words

Fins words starting with a given string (Trie class)

 

public List<string> StartsWith(string str)
{
List<String> list = new List<string>();

char c = str[0];
TrieNode node0 = this._nodes[c];

if(node0 == null)
return list;

List<string> strings = node0.Strings;

foreach(string s in strings)
{
if(s.StartsWith(str, true, CultureInfo.CurrentCulture))
list.Add(s);
}
return list;
}

 

Fins words containing a given string (Trie class)

 

public List<string> Containing(string str)
{
List<String> list = new List<string>();

if(string.IsNullOrEmpty(str))
return list;

foreach(TrieNode node in this._nodes)
{
var strings = from sx in node.Strings where( sx.Contains(str)) select sx;

foreach(string s in strings)
{
list.Add(s);
}
}

return list;
}

 

Sample app screenshots

 

The sample code

You may download the source code Here.

Have fun optimizing the code and adding new features!

A MMXVI side walk: roman numerals

Numeral systems are fascinating!

Presenting values in various numeral systems reveals some hidden aspects of these values and sometimes reveals more about our Human History and knowledge evolution.

Roman is one of these systems. (you may have a look here, here, or here)

A few years ago, I wrote a method to convert decimal numbers into roman. That worked well. The customer wanted a converter for numbering paragraphs. Up to 50 would be largely enough, he said. The developer in my head pushed me up to 9999 (sadly, I abandoned here by lack of time)

A few days ago, I saw someone wearing a T-shirt with 'XCIV' logo. And that reminded me that I never wrote the reverse conversion (from roman to decimal).

That was a good occasion to write this. And as I also have a friend who wants to practice C#, that may be a good exercise.

I found back my old code (to discover it was all to be rewritten!)… and I started working on a new version!

Roman numerals. A brief presentation

As you may know, Roman numerals building blocks are:

Roman

Decimal

I

1

V

5

X

10

L

50

C

100

D

500

M

1000

 

Intermediate values (like in the table below, between 1 and 10) are additions and subtractions of these basic building blocks

Roman

Decimal

 

I

1

 

II

2

1 + 1

III

3

1 + 1 + 1

IV

4

5 – 1

V

5

 

VI

6

5 + 1

VII

7

5 + 1 + 1

VIII

8

5 + 1 + 1 + 1

IX

9

10 – 1

X

10

 

 

Going beyond the M (1000) [presentation and entries]

Roman numerals were, at their era, (an evidence) hand written (probably more often engraved on hard stones!).

That surely does not mean the people did not know or need to count beyond the 1000. We better never forget that numbers and mathematics are much older than our own era… and that most of the great advances in this area had been achieved in other older civilizations!.

My problem here is just a presentation question: how can I write roman numbers beyond the M (1000)?

Old traditionalists use quirky figures that do not seem easily writable using a 'keyboard'.

Like here:

Or here:

To make presentation and entries a little easier with our era's keyboards, I decided to combine other units to create building blocks beyond the 1000. Example: To present the 1000 – 9000 sequence, I used 'XM' and 'VXM':

 

"XM",

// 10000

"XMXM",

// 20000

"XMXMXM",

// 30000

"XMVXM",

// 40000

"VXM",

// 50000

"VXMXM",

// 60000

"VXMXMXM",

// 70000

"VXMXMXMXM",

// 80000

"CMXM",

// 90000

 

For the sequence 1m – 9m, I used characters that are not in the traditional roman building blocks: The 'U', 'W' and 'Y':

"U",

// 1000000

"UU",

// 2000000

"UUU",

// 3000000

"UW",

// 4000000

"W",

// 5000000

"WU",

// 6000000

"WUU",

// 7000000

"WUUU",

// 8000000

"UY",

// 9000000

 

Conversion processing units

The conversion manager object (iRomanNumberDictionary in the above figure) stores a list of roman building blocks.

Each building block corresponds to a decimal sequence factor (1, 10, 100, 1000… etc.) and stores the roman elements of this sequence (see the sample tables above). It also stores the roman group level (a vital info as you will see in the code!)

Decimal to roman

Now, to convert a decimal number into its roman presentation, we proceed this way (here, using 28 as an example):

  • Take the leftmost digit of the number (leftmost of 28 = 2)
  • Set the index of the sequence to look in = count of number's digits - 1 (for 28 that is 2 – 1 = 1)
    • Note: this target sequence is composed of: "X", "XX", "XXX", … etc.
  • Find the value of this sequence at the element index = the leftmost digit – 1 (2 – 1 = 1)
  • In our case, that would be "XX" (which is decimal 20)
  • Recurse call with the remaining digits of the number (remaining of "28" = "8")
    • Note: that call should return "VIII" (first sequence at the 7th position)
  • Add the string to the "XX" (that would then be "XXVIII")

 

Roman to decimal

Roman to decimal is a bit trickier!

Let us take the reverse conversion of the example above ("XXVIII") for which the conversion result should be 28.

  • The conversion step (a parameter) should be set to the highest index of our roman sequences (in my app, I used 7 sequence groups whose factors are: 1, 10, 100, 1000, 10000… 1000000)
  • We have to look for the string in all sequences whose group order is <= the current step
  • Do until we find something:
    • In our example: we look for "XXVIII" in all sequences whose group order is <= 7
    • As the string is not found in any sequence, we continue the search using the string minus 1 char "XXVII"… "XXVI"… "XXV"… "XX"
    • "XX" is found in the sequence whose decimal factor = 10 at number zero-index = 1
    • We store the following value in a List<int>: sequence's decimal factor * (number zero-index + 1). Which results in 10 * 2 = 20
  • Recurse call using the remaining string "VIII" and using the sequence group order – 1 as the conversion step. Add the returned number to our List<int>
  • Return the sum of integers of our List<int>

 

To better understand what goes on in the conversion process, I added a conversion history that explains the steps of each conversion.

Here is the processing history for XXVIII (28):

 

Another processing history for a greater roman number MMMXCVIII (3098):

 

You may download the code (WPF) HERE. Have fun extending and enhancing for the better!

Swagger – Browsing REST service definitions

You probably know about Swagger?

Swagger:

OpenAPI Specification (originally known as the Swagger specification) is a specification for machine-readable interface files for describing, producing, consuming, and visualizing RESTful web services.

 

In a way, Swagger does what soap does for wsdl generated files.

It produces a .json file containing the given service information (data types, operations, parameters… etc.).

I came to know Swagger some days ago on a client site and that was quite useful to document defined services. The UI provided to access those information was quite poor though.

I looked for a more convenient UI for reading the provided .json, but could not find something available. I thus decided to write my own…. Which is the subject of this post.

 

I first thought it would be quite straightforward: just define objects that match the json structure / parse the json code into my objects… et voilà. I admit I was too optimistic!

The swagger.json schema

Let us first try to understand the structures defined in the generated .json file. I read many articles (some rather obscure!) about that, but through the work craft my understanding got betterJ. Here are the conclusions seen through a .net developer view:

Swagger json file is composed of:

 

A global (root) service definition, itself composed of (most significant elements for clarity)

  • A Dictionary of service paths:
      • Key = path url
      • Value = Dictionary of operations available at that address. Itself composed of:
        • Key = operation name
        • Value = the operation object. Composed of:
          • General info (name, description, summary… etc.)
          • Operation’s Verb (example: get, post, set, put…)
          • An array of Parameter objects:
            • Description
            • (In) = Where should it be located (example: in the request’s path, argument or body…)
            • Is required (bool flag)
            • Parameter’s data type
          • Operation Responses object = Dictionary of:
            • Key = response code (example: 200, default…)
            • Value = response object:
              • Description
              • Response data type

 

Remark: you may notice that operation parameters are defined as an array. I think they would better be defined as a dictionary as parameter names should be unique within the same operation.

After the paths node, the schema continues with another interesting node:

    • A Dictionary of service’s data types:
      • Key = data type name
      • Value = Dictionary of data type members:
        • Key = element name
        • Value = a data type object:
          • Type (example: object, string, array… may refer to a defined data type object)
          • Element data type (for objects of type array). Refers to a defined data type.

 

Here is a screen capture of significant nodes in a sample swagger json file. You may find more samples here.

 

 

The objects of the schema can be presented in the following class diagram (note that the ServicePaths dictionary Value is a Dictionary of string, iSvcOperation… I could not find a way to represent this relation in the diagram):

 

To parse the swagger json data, we will use the now famous NewtonSoft.Json library.

That needs us to add some attributes for the parse process to go right.

Example, in our root class iSvcDefinition:

The service domain names array should parsed, so we tell the library about its model name:

[JsonProperty("tags")]
public iSvcDomain[] ServiceDomainNames { get; set; }

 

The class contains a property that should not be parsed… so, we tell the parser to ignore it:

[JsonIgnore]
public List<iSvcDomain> ServiceDomainNamesList
{
    get { return ServiceDomainNames == null ? null : ServiceDomainNames.ToList(); }
}

So far, so good… we have objects that correctly represent the swagger json model (complemented by some view model properties in this example!).

We still have a problem: resolving data type references!

Resolving json references

Data types of elements in our swagger json file are specified as references to the related data-type-definition. For example, a Response that returns a ‘Product’ object is serialized as:

 

"responses": {

"200": {

   "description": "An array of products",

   "schema": { "type": "array",

     "items": { "$ref": "#/definitions/Product" }

   }

},

 

The full definition of the ‘Product’ data type is defined elsewhere, as:

"Product": {
"properties": {
   "product_id": {
     "type": "string",
     "description": "Unique identifier of the product."
   },
   "description": {
     "type": "string",
     "description": "Description of product."
   },
   "display_name": {
     "type": "string",
     "description": "Display name of product."
   },
   "image": {
     "type": "string",
     "description": "Image URL representing the product."
   }
}
}

So each item of ‘Product’ data type will tell the reference of that type (instead of duplicating definitions).

Resolving json references while parsing needs some craftingJ

In fact, the json parser does not resolve references automatically. That is part 1 of the problem. Part 2 is the fact that to find a referenced item, it should exists. That is: it should have already been parsed. Which requires the data type definitions to be at the beginning of the parsed file. A requirement that is, at least, not realistic.

 

As always, I searched the web for a sustainable solution. You will find many about this question, including “how to Ignore $ref”… which is the exactly the opposite of what we are looking for in this contextJ

The solution I finally used is:

  • Use the JObject (namespace Newtonsoft.Json.Linq) to navigate as need through the swagger json nodes
  • Start the parse process by:
    • Deserializing the swagger "definitions" node which contains the data types definitions
    • Store data types into a Dictionary: key = type id (its path), Value = the data type object
  • Use a custom resolver (a class which implements the IReferenceResolver (namespace Newtonsoft.Json.Serialization) to assign the related data type object each time its reference is encountered.

 

Here is the essential code-snippets for resolving references:

// define a dictionary of json JToken for data types
internal static IDictionary<string, JToken> JsonDico { get; set; }

// a dictionary of data types
IDictionary<string, iSvcTypeBase> _types = new Dictionary<string, iSvcTypeBase>();

// create a JObject of the file’s json string
JObject jo = JObject.Parse(jsonString);

// navigate to the definitions node
Var typesRoot = jo.Properties().Where( i => i.Name == "definitions").FirstOrDefault();

// store dictionary of type_path / type_token
if (typesRoot != null)
    JsonDico = typesRoot.Values().ToDictionary( i => { return i.Path; });

 

 

Now, we can build our data-type-dictionary using the JTokens:

foreach(var item in JsonDico)
{
   // deserialze the data type of the JToken
   iSvcTypeBase svcType = JsonConvert.DeserializeObject<iSvcTypeBase>(item.Value.First.ToString());

   // add the data type to our dictionary (reformat the the item’s key)
  _types.Add(new KeyValuePair<string, iSvcTypeBase>("#/" + item.Key.Replace(".", "/"), svcType));
}

 

 

Our resolver can now return the referenced item to the parser when needed:

public object ResolveReference(object context, string reference)
{
    string                    id = reference;
    iSvcTypeBase      item;
    _types.TryGetValue(id, out item);
    return item;
}

 

Some screen captures:

 

 

You may download the binaries here.

Will post the code later (some cleanup requiredJ)

XML – the ASCII of our era

At the dawn of XML, Tim Bray once said "XML is the ASCII of the future"… which is now clearly the case!

 

To ease reading and handling XML files, I started writing an XML browser last year. The 'Open file' menu of the app first proposed '*.xml' file extension.

As you may imagine, quite quickly that expanded to so many other file extensions: *.xml; *.xaml; *.xsd; *.wsdl; *.csproj; *.vbproj; *.dwproj; *.rptproj; *.config; *.dim; *.ds?; *.cube; *.partitions; *.manifest; *.database; *.rsd; *.rds; *.rdl the list is actually endless!

Why write a XML browser / editor?

Most XML tools I found look more like good (some excellent) editors / syntax visualizers. What I needed more was to have a logical image of the tree structure of an XML file: which node is parent of which…

Xml Note Pad did part of this but was more text-editor-oriented than logical structure visualizer. Its most recent version also dates 2007… time is running!

Of course there is not one sole way to handle such question. Visualizing the tree structure is one aspect. Editors with syntax highlighting, nodes expand/collapse… are still of course very useful.

Let us take a simple user control xaml file with the code similar to this:

 

Viewing the above code's node tree helps better understand its structure:

 

How can we handle XML tree presentation?

XML code is composed of some foundation elements:

  • The element name (the xml tag… which is part of a namespace).
  • The element property (or properties). Which may be either:
    • A value (string / primitive type)
    • A composite element:
      • A set of attributes
      • And / or a set of child elements

 

A simple example:

<person gender="female">

Element = person, gender = an attribute whose value is female

  <firstname>Anna</firstname>

firstname (property of person) whose value is Anna

  <lastname>Smith</lastname>

lastname (property of person) whose value is Smith

</person>

End of the composite element person

 

The above code can also be written like this (1st attribute shifted to a property level):

<person>
  <gender>female</gender>
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

 

You may look here for more information about the XML model.

 

At our .net level, we have objects defined in System.Xml.Linq that allow us to (more or less easily) explore, traverse and query a XML tree. Some of these:

  • XDocument (derives from à XContainer à XNode à XObject)
  • XElement (derives from à XContainer à XNode à XObject)

 

These xml-specialized objects are indispensable to safely communicate with the xml model, but are not good candidates for a view to represent the xml tree.

In some way, in a MMVM pattern, they represent the model (and business logic). We need an intermediate component that can be integrated into a view model and a view.

Property Bags to the rescue

I briefly mentioned property bags in a previous post about MSBuild browser. Let me elaborate a little more about this.

A property bag is a set of abstracted properties. Where a property is presented as:

  • A name
  • A data type
  • A value

And where the property 'value' can be either:

  • An object or primitive value (string for instance)
  • Or a set of properties (i.e. a property bag)

 

This model proved quite useful in many contexts. You may for instance transform any object into a property bag representing its properties (in a strongly-typed form). This can be quite useful in situations where loose coupling is a concern (applications ßà services for instance). I will talk about this in details later in a following post.

 

For our current question about presenting xml trees, property bag model seems to be in sync with the xml model:

Xml element:

name / value / attributes / sub-elements (children)

Property bag element:

name / value / sub-elements (children)

 

To clarify by the practice, here is an example for parsing a XElement attributes into a property bag:

 

static PropertyBag ParseXmlNodeAttributes(XElement xnode, ObjProperty parent)
{
    // create a property bag as a child of the parent property
    PropertyBag targetBag    = new PropertyBag(parent);

    // add the xml element attributes to the bag
    foreach (XAttribute attrib in xnode.Attributes())
    {
        targetBag.Add(attrib.Name.LocalName, attrib.Value);
    }

    return targetBag;
}

 

Parsing a XElement can be done in a few steps:

  • Create an ObjProperty with name = xml element's name
  • If the element's value is composite (attributes and/or child items): add element's attributes and children to the ObjProperty Children
  • Otherwise set the ObjProperty value to the element's value

 

The following code illustrates this (simplified version of the code for clarity)

 

public static PropertyBag ParseXml(XElement xnode, ObjProperty parent)
{
    string            propertyName    = xnode.Name.LocalName;
    PropertyBag    bagObject        = new PropertyBag(propertyName);
    ObjProperty    property            = new ObjProperty(propertyName);
    PropertyBag    bagAttributes    = new PropertyBag();
    PropertyBag    bagSubItems    = new PropertyBag();

    // get the XEelement attributes' bag
    bagAttributes    = ParseXmlNodeAttributes(xnode, property);

    // add attributes (if any) to Children
    if (bagAttributes.Count > 0)
    {
        bagObject.AddRange(bagAttributes);
    }

    // add sub elements if any
    var        subNodes        = xnode.Descendants();

    foreach(var element in subNodes)
    {
        string            nodeName        = element.Name.LocalName;
        PropertyBag    elementBag    = ParseXml(element, property);

        ExtractPropertyValue(bagSubItems, elementBag, nodeName, property);
    }

    bagObject.AddRange(bagSubItems);

    return bagObject;
}

 

The Xml Property Bags' view model

Presenting a property bag in a view requires some additional properties that are specific for a view. For instance, an 'Expanded' property for a tree node, a 'Selected' property for an item, as well as useful commands and search/filter features… etc.

Our property bag view model will embed the bag item and adds such properties.

 

The Xml Property Bags' Views

We need to present our xml property bags in either a tree view or a grid view.

As I said earlier, a property bag item contains either a primitive value (which may be presented as a string) or a set of child items.

In a WPF tree view that can be presented like this (code simplified for clarity):

    <!-- import view model namespace -->
    xmlns:vm="clr-namespace:iSoapExplorer2015.viewmodel"


<TreeView x:Name="treeView1" ItemsSource="{Binding Containers}">

    <TreeView.ItemTemplate>
        <!-- hierarchial data template for containers items -->
        <HierarchicalDataTemplate
                    DataType="{x:Type vm:PropertyBagItemVM}"
                    ItemsSource="{Binding Containers}" > <!-- Containers = property bags -->
            <StackPanel Orientation="Horizontal">
                <!—note: 'Node' is the property bag item -->
                <TextBlock Text="{Binding Node.Name}"/>    
            </StackPanel>
        </HierarchicalDataTemplate>

    </TreeView.ItemTemplate>

 

That results is a view that looks like the one at the beginning of this post (with an image added to the tree view item template)

A data grid view may look like this (again: code simplified for clarity):

 

<DataGrid ItemsSource="{Binding }">
    <DataGrid.Columns>
        <DataGridTextColumn Header="Property" Binding="{Binding Node.Name}" />
        <DataGridTemplateColumn Header="Value" Binding="{Node.ValueAsString}" />
    </DataGrid.Columns>
</DataGrid>

For a xaml Stack Panel snippet element, our view may look like this (an image column added to the above simplified code):

 

With some more work, you may implement more useful features and visual effects.

The current version offers: Xml nodes Tree and Grid views, Search nodes (by name and / or value), get the selected node XML, Export the entire tree to an xml file… etc.

 

 

I stopped short from editing the nodes (names / values)… but you may add such feature (before I do… probably in a near futureJ)

 

Here is the current version's binaries.

Have fun adding new features to the source code if you would like to.