Taoffi's blog

prisonniers du temps

doc5ync – the Trie in practice for online e-books

I spent the past few months working on a new web project referencing online e-books (http://doc5.5ync.net/)

The goal of the project was not to build a new online library (many good libraries are already out there) but rather to offer a central reference for all what exists, adding some features for these references to provide a new analytical view of e-books.

Most of online libraries offer access to books that are now in the ‘public domain’ (I.e. no more copyright protected) and thus available for free download.

For an analytical approach, I started to use the Trie structure (I talked about this in a previous post) for analyzing textual elements of the referenced e-books to provide relational aspects among them.

Just a reminder, explained in the previous post: a Trie is a tree-like structure where a node has a parent, neighbors and descendants. The structure is particularly interesting for text indexing because, whatever the language, any textual unit (word) is forcibly composed of a set of that language’s alphabet (whose number is quite limited). Adding a flag to end-of-word nodes, we can build a Trie whose root is composed of the few units of the alphabet with branches to text words.

trie-word-nodes

This compact structure enables fast and efficient search and retrieve elements into large text sequences. Which seems to be a good base for our e-book text indexing and analysis.

Using the trie structure to index e-book details (titles, description, author…) of the relatively large number of referenced e-books (approx. 9000 as of writing) was straightforward and efficient.

Now, a given unit (word) in this trie might be related to one or more of our e-books. How to link our trie nodes each to its set of ‘data’? That is the subject of this brief post.

We are going to build upon the elements mentioned in in the previous post:

  • We will use our Trie with its (char) Dictionary and Nodes.
  • Our trie provides us with its words presented as a collection of iTrieWord objects
  • Let us create a new object iTrieDataWord (deriving from iTrieWord)
  • This last object will contain a collection of ‘Data items’ (in our concrete case, this will be a collection of e-books)

trie-with-data

How to proceed?

After some experimentations Smile, I ended up using the following steps which seemed to be good in regards of efficiency and performance:

  • Load all e-books’ textual sequences (titles, descriptions, author information… for the time being)
  • Build the Trie of this text sequences (more about this later)… which provides us with its Words (iTrieWord) collection
  • Now, in the loaded collection of e-book records (the iDataItem(s)). (Each record contains the e-book title, description and author information)… each record (iDataItem) can assign itself to any of the Trie words whenever that word is part of its own data.

Some additional considerations in the process are quite important:

    • One important point is to define “What is a ‘Word’”?  in terms of minimum number of characters to consider a sequence as a ‘word’. As the referenced e-books are multilingual, it was somehow clear that this threshold is language-dependent. In Arabic, for instance, words tend to be short in terms of number of characters (Arabic vowels are often part of the character). After some research, I found that considering 4 chars as a minimum is an acceptable compromise as it allows searching the e-books by year (author’s or book’s) which may be quite useful.
    • It is also important to define what are ‘word-delimiters’ (spaces are not the only ones to consider!). Actually, that is also language-dependent in some ways… and as such requires experimentations with all languages to be used in the given project.
    • Finally: what are we going to do for all this to b useful?... I.e. Are we going to persist this Trie? Or rather proceed as a (runtime queryable) indexing service?… etc. For doc5 project, we decided to persist the results in data tables / running the scan process periodically

Some performance numbers

Some numbers to justify using the above steps:

  • Reading data records + Building a Trie of 40365 words (min = 4 chars): 17s
  • Processing 9000 e-book information (I.e. building the Trie + creating 358000 links to its words): 8min30s

Will post some sample code in the coming weeks. You may have a look at http://doc5.5ync.net/ (The current version for presenting the results).

A bit late!: Wish you all a happy 2020 year, with many useful projects and much fun!

Silverlight database to DataGrid, yet another approach- Part II

This is the second part on how to format/adapt data to be displayed in a Silverlight DataGrid in a way that allows the preservation of business logic.

 

Server side objects

As we have seen, in part I, the server will prepare our data into the designed classes before transmitting it to the Silverlight client application.

On the server side, we have the following classes:

 

Data level classes

SilverDataTable

Represents the table (or view, or function…) data.

Contains:

A meta-data table

A list of data rows

SilverDataRow

Represents one record of data.

Contains:

List of data cells

Linked to:

A parent table

SilverDataCell

Represents one record’s data cell.

Linked to:

A parent row

A parent meta-data column

Meta-data level classes

SilverMetaTable

Represents the table’s meta-data (schema)

Contains:

A list of meta-data columns

Linked to:

A parent table

SilverMetaColumn

Represents one column’s schema information.

Note: This is the ‘Key Class’ where you can insert all your required business logic.

Linked to:

A parent meta-data table

 

The server uses these classes’ methods to expose a data service that returns a set of requested data:

 

[AspNetCompatibilityRequirements(RequirementsMode =

                    AspNetCompatibilityRequirementsMode.Allowed)]

public class DataService

{

       [OperationContract]

       public SilverDataTable GetData(string str_connect_string,

                                  string str_sql_command)

       {

             SilverDataTable sl_table = new SilverDataTable();

 

             sl_table.UserDefinedSqlCommand = str_sql_command;

             sl_table.GetData(str_connect_string);

 

             return sl_table;

       }

}

 

The GetData method of the SilverDataTable logic is as follows:

§  Open the database connection.

§  Read the meta-data structure of the SQL command.

§  Read the data rows of the SQL command.

 

Reading table’s meta-data (table schema)

For reading the table schema, SilverDataTable asks an OleDbDataAdapter to fill a DataTable (System.Data) schema and passes this DataTable it to its SilverMetaTable for reading its information:

 

OleDbDataAdapter    adapter      = new OleDbDataAdapter(str_sql_cmd, connection);

DataTable           table  = new DataTable();

 

table  = adapter.FillSchema(table, SchemaType.Mapped);

MetaDataTable.ReadDbTableColumns( table);

 

The Meta-data table (SilverMetaTable) offers a method to read a DataTable (System.Data) schema and create its own meta-data columns accordingly:

 

public bool ReadDbTableColumns(DataTable db_table)

{

       /// start with a ‘traditional’ checking!

       if( db_table == null || db_table.Columns == null)

                    return false;

 

       Columns.Clear();

 

       foreach (DataColumn col in db_table.Columns)

       {

             Add( col);

       }

 

       return true;

}

 

The Add method of this same class proceeds as in the following code:

 

public void Add(DataColumn data_column)

{

       /// start with a ‘traditional’ checking!

       if( data_column == null

                    || string.IsNullOrEmpty( data_column.ColumnName)

                    || data_column.DataType == null)

             return;

 

       /// does this column already exist?: if so, only update its information

       /// otherwise add a new column

       SilverMetaColumn    col    = this[data_column.ColumnName];

 

       if( col == null)

             Columns.Add(new SilverMetaColumn(data_column));

       else

             col.ReadColumnInfo( data_column);

}

 

As you may have already guessed through the above code, our meta-data table offers an indexer which returns the meta-data column by name:

 

public SilverMetaColumn this[string column_name]

{

       get

       {

             if( string.IsNullOrEmpty( column_name) || Count <= 0)

                    return null;

 

             foreach (SilverMetaColumn col in m_columns)

             {

                    if( string.Compare( col.Name, column_name, true) == 0)

                           return col;

             }

             return null;

       }

}

 

And the meta-data column offers a constructor using a DataColumn (System.Data) object:

 

public SilverMetaColumn(DataColumn data_column)

{

       ReadColumnInfo( data_column);

}

 

public bool ReadColumnInfo(DataColumn data_column)

{

       if( data_column == null)

             return false;

 

       m_name       = data_column.ColumnName;

       m_caption    = data_column.Caption;

       m_data_type  = data_column.DataType;

 

       return true;

}

 

Reading data records

Inside the SilverDataTable, reading the data rows (records) is straightforward:

 

OleDbCommand        cmd    = new OleDbCommand( str_cmd, conn);

OleDbDataReader     dr     = null;

SilverDataRow       row;

 

dr = cmd.ExecuteReader(CommandBehavior.CloseConnection);

while (dr.Read())

{

       row          = new SilverDataRow(this);

       row.ReadDataReader( dr);

       m_rows.Add( row);

}

 

The ReadDataReader method of the SilverDataRow class looks like the following pseudo-code:

 

int                 n_fields     = data_reader.FieldCount;

string              field_name,

                    field_value;

SilverMetaColumn    meta_column;

 

for( int ndx = 0; ndx < n_fields; ndx++)

{

       field_name   = data_reader.GetName( ndx);

       meta_column  = meta_table[field_name];

       field_value  = data_reader.GetValue(ndx).ToString();

 

       m_cells.Add(new SilverDataCell(this, meta_column, field_value));

}

 

The Silverlight client side

At the client side, our Silverlight application will uses (references) the server’s web service in order to obtain the data.

As we have seen above, the data will be received as a SilverDataTable object (containing the data rows + the meta-data table)

 

Binding the Silverlight DataGrid to the data

To bind the received data to the DataGrid, we will, basically, proceed according to the following steps:

§  Set the DataGrid to NOT auto generate its columns (we will do this ourselves)

§  Bind the DataGrid to our received SilverDataTable’s Rows (List<> of rows, often presented in Silverlight as an ObservableCollection<SilverDataRow> when referencing the wcf service)

§  For each SilverMetaColumn in our received meta-data table’s meta-columns:

§  Create a DataGridColumn according to the meta-column data type (and business logic constraints)

§  Bind the created data grid column using a converter that will be in charge of interpreting the related data cell’s data for all column’s rows

§  Add this DataGridColumn to the DataGrid

 

Here is a sample code (where some artifacts have been removed for better readability):

 

foreach (SilverMetaColumn met_col in meta_table.Columns)

{

       DataGridBoundColumn col    = CreateDataGridColumn( meta_col);

       Binding                    binding      = new Binding();

 

       col.Header                 = cell.Caption;

       binding.Path               = new PropertyPath("Cells");

       binding.Mode               = BindingMode.OneWay;

       binding.Converter          = (SilverRowConverter) this.Resources["row_converter"];

       binding.ConverterParameter = col_index;

 

       col.Binding         = binding;

 

       data_grid.Columns.Add(col);

}

 

Note: The converter is defined inside the Xaml code, like the following:

 

<UserControl.Resources>

       <local:SilverRowConverter x:Key="row_converter" />

       ...

       ...

</UserControl.Resources>

 

As you may have guessed, each DataGrid row will receive the corresponding data row’s Cells as its DataContext. And to present any row cells, it will call our designated converter.

To interpret one cell’s data, our converter takes one parameter: the cell index.

Using he cell index, the converter will be able to identify the related cell, its data and its meta-data column information (data type, or any other business logic constraints)

 

Here is a sample code for the converter:

 

public object Convert(object            value,

                    Type         targetType,

                    object              parameter,

                    CultureInfo culture)

{

       ObservableCollection<SilverDataCell>    row    =

                    (ObservableCollection<SilverDataCell>) value;

       int    col_index    = (int) parameter;

 

       SilverDataCell      cell         = row[col_index];

 

       return cell.ValueAsString;

}

 

Sample user interface to test the solution

In the joined sample code, to test our solution, the user interface proposes:

§  A TextBox control where you can enter the desired connection string to your database;

§  Another TextBox where you can type your SQL command;

§  A data grid where the received data will be displayed.

 

Any comments are welcome: you can write me at tnassar[at]isosoft.org

 

Download the sample application (1.32 mb).

 

Html Content Viewer for Silverlight

In a previous post, I talked about a solution to manipulate (animate for example) the Silverlight control inside the hosting html page.

 

The reverse side of the problem (displaying html content inside a Silverlight control), is a requirement which comes out in the context of numerous web projects.

 

Some of the proposed solutions suggest html translators in order to obtain the final text into a sort of ‘Rich Edit’ control.

Although it is very good to have a rich edit Silverlight control, the solution seems too tedious to solve a relatively simple problem: displaying html content into an html page (i.e. the page hosting our Silverlight control).

 

Building on the previous sample of animating the Silverlight control into the hosting page, I tried to make a user control that dynamically creates an iFrame to display the desired html content inside the main Silverlight page.

 

You can have a look at the proof of concept Here!

 

 

 

Dynamically create an html control

Suppose we want to display the html page: http://mycompany.com/page.html inside our Silverlight control.

 

The solution can be:

§  Use the HtmlPage to locate the hosting HtmlElement (where our Silverlight control lives: that can be straightforward:  HtmlPage.Plugin.Parent;)

§  Create a new html container (for example, a DIV)

§  Place an iframe html element inside this container

§  Set the iframe source to the desired page

§  Insert the container into the hosting HtmlElement

 

 

Here is a sample code illustrating these steps

 

private void CreateHtmlContent()

{

       HtmlElement  plugin_div   = HtmlPage.Plugin.Parent,

                    new_div      = HtmlPage.Document.CreateElement("div"),

                    iframe       = HtmlPage.Document.CreateElement("iframe");

 

       /// set the new div position to absolute

       new_div.SetStyleAttribute("position", "absolute");

       new_div.SetStyleAttribute("z-order", "5");

 

       new_div.SetStyleAttribute("left", "200px");

       new_div.SetStyleAttribute("top", "40px");

       new_div.SetStyleAttribute("width", "400px");

       new_div.SetStyleAttribute("height", "400px");

 

       /// setup the iframe style, attributes and target page

       iframe.SetStyleAttribute("width", "100%");

       iframe.SetStyleAttribute("height", "100%");

       iframe.SetAttribute("src", "http://www.isosoft.org/taoffi/");

 

       /// add the iframe to the new div

       new_div.AppendChild(iframe);

 

       /// add the new div to the hosting html page

       plugin_div.AppendChild(new_div);

}

 

The solution exposed here can be a suitable foundation to solve many situations where html content can be composed and/or displayed inline within a Silverlight control.

Let us begin by making a custom control, a HtmlContentViewer say!

 

The sample code exposes some more!

 

SilverHtmlContent.zip (49.34 kb)