Taoffi's blog

prisonniers du temps

A dive into the undocumented TFS meta-model – Part II

 

In a previous post, I started exploring the TFS configuration database. Let's continue here by having a look at some more configuration objects and relationships.

TFS Config security objects

 

Referenced table

Primary column

Table

Foreign column

tbl_security_identity_cache

tf_id

tbl_gss_group_membership

member_id

tbl_security_identity_cache

tf_id

tbl_security_domain_groups

group_id

tbl_security_identity_cache

tf_id

tbl_security_membership_cache

container_id

tbl_security_identity_cache

tf_id

tbl_security_membership_cache

member_id

tbl_gss_groups

tf_id

tbl_gss_group_membership

parent_group_id

tbl_security_domain

domain_number

tbl_security_domain_groups

domain_number

tbl_security_domain

domain_number

tbl_security_projects

domain_number

tbl_security_identity_type

type_id (int)

tbl_security_identity_cache

type (tinyint)

The relationship diagram above suggests that we can query user logins, group-membership for specific projects by a query like the following:


SELECT TOP (100) PERCENT
    login.display_name  AS user_name,
    proj.scope_name     AS project,
    login_grp.display_name AS user_group

FROM dbo.tbl_security_membership_cache AS membership INNER JOIN
  dbo.tbl_security_identity_cache AS
login ON membership.member_id = login.tf_id INNER JOIN
  dbo.tbl_security_projects AS proj INNER JOIN
     dbo.tbl_security_domain AS domain INNER JOIN
       dbo.tbl_security_domain_groups AS grp ON domain.domain_number = grp.domain_number
       ON proj.domain_number = domain.domain_number
     ON
membership.container_id = grp.group_id INNER JOIN
  dbo.tbl_security_identity_cache AS
login_grp ON grp.group_id = login_grp.tf_id

ORDER
BY user_name, project

 

The above query may give us results similar to the following

user_name

project

user_group

Administrator

Research education "agile

Administrators

Administrator

Research education "agile

[TEAM FOUNDATION]\Team Foundation Service Accounts

Administrator

SourceSafeProjects

[TEAM FOUNDATION]\Team Foundation Service Accounts

Administrator

SourceSafeProjects

Administrators

Administrator

DefaultCollection

[TEAM FOUNDATION]\Team Foundation Service Accounts

Administrator

DefaultCollection

Administrators

 

TFS database pause: Change set quantitative statistics sample

Where to find change-sets information?


TFS source control maintains change-sets for projects in the corresponding collection database. Each change-set is of course composed of the modified project files at the time of the Check-in.

Change-set information is stored in the tbl_ChangeSet table of the project collection database. That is: if you have a project collection 'Default Collection' stored in a database named tfs_default_collection, you will find change-sets information in the table tfs_default_collection.dbo.tbl_ChangeSet.

Files information is stored in tbl_files table, and file-versions information in tbl_Version table.

So, to obtain the list of files of a change-set number 861, for instance, you can execute a query similar to the following:


SELECT TOP (100) percent
    chg_set.CreationDate,
    chg_set.ChangeSetId,
    v.FullPath

FROM
    dbo.tbl_ChangeSet (nolock)AS chg_set INNER JOIN
    
dbo.
tbl_Version (nolock)AS v ON chg_set.ChangeSetId = v.VersionFrom LEFT OUTER JOIN
    
dbo.
tbl_File (nolock) AS f ON v.FileId = f.FileId


WHERE (chg_set.ChangeSetId = 861)

ORDER BY chg_set.CreationDate, v.FullPath

 

 

Which may produce results like the following:

CreationDate

ChangeSetId

FullPath

2012-05-05 20:22:31.800

861

$\Research education "agile\SimpleUniConverter\commands\UniConvertCommandModel.cs\

2012-05-05 20:22:31.800

861

$\Research education "agile\SimpleUniConverter\Convert.xaml.cs\

2012-05-05 20:22:31.800

861

$\Research education "agile\SimpleUniConverter\Convert.xaml\

2012-05-05 20:22:31.800

861

$\Research education "agile\SimpleUniConverter\MainWindow.xaml.cs\

2012-05-05 20:22:31.800

861

$\Research education "agile\SimpleUniConverter\MainWindow.xaml\

 

How many bytes changed?


As you may have noticed from the figure above, change-set file information contains a FileLength column. This suggests that we can obtain the number of changed bytes of a given file between two change-sets… through the following query results we can know that the file grew 696-551 = 145 bytes:

CreationDate

ChangeSetId

FullPath

FileLength

2012-04-30 22:33:10.040

843

$\Research education "agile\SimpleUniConverter\Convert.xaml.cs\

551

2012-05-05 20:22:31.800

861

$\Research education "agile\SimpleUniConverter\Convert.xaml.cs\

696

 

Who changed what?


The change-set table provides an CommitterId column through which we can know who made a given check-in. As this column contains an ID (integer refering tbl_identity.IdentityId), this is not straight forward information. We have to join other tables to obtain something useful.

In fact, the committer user name is stored in the Constants table. So, we have to join the Change-set's CommitterId column to tbl_identity.IdentityId, and then join tbl_identity.TeamFoundationId to Constants.TeamFoundationId… we can then get the user name in the Constants.DisplayPart column (or Constants.String column)

 

 

This can be presented by the following change in our initial view diagram:

 

CreationDate

ChangeSetId

FullPath

FileLength

DisplayPart

2012-04-30 22:33:10.040

843

$\Research education "agile\SimpleUniConverter\Convert.xaml.cs\

551

Taoffi

2012-05-05 20:22:31.800

861

$\Research education "agile\SimpleUniConverter\Convert.xaml.cs\

696

Taoffi

 

Changed bytes tracking generalization


It would, ideally, be beneficial to integrate such information into the TFS analytical database.

We still can do some turnaround by directly querying the database tables.

The idea is to list all change-set files in the order of their respective check-in date. To handle the probability of identical dates / files, we can use a row number to distinguish changes. This may be done like in the following view:


CREATE VIEW [dbo].[xtfs_changeset_files_qry]
AS
SELECT
TOP (2147483647)
    ROW_NUMBER() over( ORDER BY chg_set.CreationDate) as row_number,
    chg_set.CreationDate,

    const.DisplayPart        as user_name,
    chg_set.ChangeSetId,

    v.FullPath,
    f.FileLength

FROM dbo.tbl_ChangeSet (nolock)    AS chg_set INNER JOIN
dbo.tbl_Identity (nolock)    AS login ON chg_set.CommitterId = login.IdentityId INNER JOIN
dbo.tbl_Version (nolock)    AS v ON chg_set.ChangeSetId = v.VersionFrom LEFT OUTER JOIN
dbo.tbl_File (nolock)        AS f ON v.FileId = f.FileId inner join
Constants (nolock)        AS const ON const.TeamFoundationId=login.TeamFoundationId

ORDER
BY chg_set.
CreationDate, v.FullPath


GO

 

We can then refer the view to itself (current version / previous version) obtain row-relative quantitative changes:

SELECT v1.row_number,
       v1.CreationDate,
       v1.user_name,
       v1.FullPath AS v1FilePath,
       v1.FileLength AS v1_length,
       v_next.FileLength AS v2_length,
       v_next.FullPath AS v2FilePath,
       ISNULL(v_next.FileLength, 0)
          - ISNULL(v1.FileLength, 0) AS changed_bytes

FROM dbo.xtfs_changeset_files_qry AS v_next RIGHT OUTER JOIN
     dbo.xtfs_changeset_files_qry AS v1 ON v_next.FullPath = v1.FullPath
        AND v_next.row_number = v1.row_number + 1
        OR v_next.FullPath IS NULL
        AND v_next.row_number IS NULL

 

A dive into the undocumented TFS meta-model – Part I

TFS… introduction to this series


Microsoft TFS (Team Foundation Server… recently renamed to ALM: Application Lifecycle Management) is a great project management platform.

TFS is built on meta-model abstractions (that were first seen, in Microsoft's products, in SharePoint).

TFS, like SharePoint, is composed of several software components which operate around several databases: the configuration database and one or more content databases.

Configuration and content databases' elements (tables, views, functions, stored procedures… etc.) represent the meta-models (schemas and business logic for each related feature) which govern and control much of the software components' behaviors.

TFS is often presented as a software project management solution. In my view, this seems a little reductive of its value. Because TFS features truly span a much larger project management scope.

Many useful literatures have been written about TFS: its features, usage, guidance, troubleshooting… etc. Few have enlightened its meta-model structures and, as of my knowledge, none has put some clear light about its (undocumented) databases' structures. That is what I will try to do in this series, more to value TFS abstractions than to nude its internal mechanics. And also to enforce, again, the meta-model approach as a rational basis for software solutions.

 

TFS logical architecture

 

For its contents, TFS maintains a database per project-collection.

Each collection may contain one or more projects. And each project can be managed according to a selected methodology (Agile / Scrum / CMMI… etc.). In fact, TFS applies and operates the selected methodology through the provided methodology project management template. Ultimately, this means you can build your own methodology and use it to manage your project with TFS. All what you need is to provide the desired methodology template.

Managing a project with TFS consists of recording and following-up specific Work items (defined by the methodology template)… those can be items like: Tasks, Bugs, Change requests, User story… etc.

As TFS was initially brewed to manage software projects, a common item in each project is the Source control repository which helps developers to maintain their source code versions and branches in a very efficient architecture composed of 'change sets'.

 

 

To discover and figure out its components' data and their required behaviors, TFS uses a configuration database. A first look at the configuration database will be the subject of this first article in the series.

 

The configuration database


Like SharePoint, TFS databases are delivered without documentation or relational diagrams. They simply don't contain any explicit Foreign-keys or relationships. Whatever the reason behind this choice, that seems a little surprising of a product whose main aim is projects' transparency and documentationJ.

That said, to discover, understand and correctly interpret the role of each TFS database item and their inter-relationships you should do a deep dive and be armed with some patience and empiric methods!

 

Hosts and processes


Let's have a look at three important tables:

  • tbl_ServiceHost contains the list of services' host information. This includes the item's name, the database connection string (when relevant), the Virtual directory and Resource directory…
  • tbl_ServiceHostProcess contains information about current processes. Machine name, Process name, Identity under which the process is running, start time…
  • tbl_ServiceHostInstance contains the list of processes' instances per host (Process Id / Host Id / Start time).

 

In my case, the table tbl_ServiceHost contained 3 entries for:

  • Team Foundation: root item (parentless… see diagram ParentHostId column). Connected to the configuration database.
  • Two child entries each for one of my TFS Project collections:
    • Default collection.
    • Training collection.

 

The above relational schema allows us to query the tables and obtain results like the following about current running services:


SELECT TOP (100) PERCENT
svcHost.Name         AS host,
svcHost.VirtualDirectory,
svcHost.ConnectionString,
process.MachineName    AS machine,
process.ProcessName,
process.StartTime


FROM dbo.tbl_ServiceHostInstance AS instance INNER JOIN
dbo.tbl_ServiceHost AS svcHost ON instance.HostId = svcHost.HostId LEFT OUTER JOIN
dbo.tbl_ServiceHostProcess AS process ON instance.ProcessId = process.ProcessId LEFT OUTER JOIN
dbo.tbl_ServiceHost AS hostParent ON svcHost.ParentHostId = hostParent.HostId

ORDER BY process.StartTime DESC, hostParent.Name, host

 

Sample results

host 

VirtualDirectory 

ConnectionString 

machine

ProcessName 

StartTime 

TEAM FOUNDATION 

NULL 

Data Source= hp10\SqlTfs;Initial Catalog=Tfs_Configuration;Integrated Security=True

HP10 

w3wp.exe 

2012-05-21 20:18:09.997 

DefaultCollection 

~/DefaultCollection/ 

Data Source=hp10\sqltfs;Initial Catalog=tfs_defaultcollection;Integrated Security=True 

HP10 

w3wp.exe 

2012-05-21 20:18:09.997 

Training collection 

~/Training collection/ 

Data Source=hp10\SqlTfs;Initial Catalog="Tfs_Training collection";Integrated Security=True

HP10 

w3wp.exe 

2012-05-21 20:18:09.997 

TEAM FOUNDATION 

NULL 

Data Source=hp10\SqlTfs;Initial Catalog=Tfs_Configuration;Integrated Security=True 

HP10 

TfsJobAgent.exe 

2012-05-21 06:51:37.870 

DefaultCollection 

~/DefaultCollection/ 

Data Source=hp10\sqltfs;Initial Catalog=tfs_defaultcollection;Integrated Security=True 

HP10 

TfsJobAgent.exe 

2012-05-21 06:51:37.870 

Training collection 

~/Training collection/ 

Data Source=hp10\SqlTfs;Initial Catalog="Tfs_Training collection";Integrated Security=True

HP10 

TfsJobAgent.exe 

2012-05-21 06:51:37.870 

 

Ressources and services

Again, let's follow up the following tables:

  • tbl_CatalogResourceType stores a list of resource types. Like: 'Machine', 'Infrastructure Root', 'Project Server', 'Team Foundation Project Collection Database'… etc. Here is some sample entries:

Identifier 

DisplayName 

Description 

9FB288AE-9D94-40CB-B5E7-0EFC3FE3599F 

Default location for team project portals 

By default, each team project Web site will be a sub-site of this location.

0584A4A2-475B-460E-A7AC-10C28951518F 

Machine 

A machine that exists as part of the TFS deployment. 

14F04669-6779-42D5-8975-184B93650C83 

Infrastructure Root 

The root of the catalog tree that describes the physical makeup of the TFS deployment.

1B6B5931-69F6-4C53-90A0-220B177353B7 

Team Foundation Project Collection Database 

The database that houses the information for a Team Foundation Project Collection. 

526301DE-F821-48C8-ABBD-3430DC7946D3 

Team Foundation Configuration Database

The database that houses the information for a Team Foundation Application Instance. 

289DD275-CECA-4698-8042-38D2E86FC682 

Project Server 

Project Server Information 

3DADD190-40E6-4FC1-A306-4906713C87CE 

SharePoint Web Application 

A SharePoint Web application that has been configured to support team project portals.

3C856555-8737-48B6-8B61-4B24DB7FEB15 

Test Controller 

This is a test controller type in Catalog Service. 

26338D9E-D437-44AA-91F2-55880A328B54 

Team Project Collection 

A Team Project Collection that exists within the TFS deployment.

450901B6-B528-4863-9876-5BD3927DF467 

Project Portal 

A Web site or SharePoint site that provides a portal for a team project. 

15DA1594-45F5-47D4-AE52-78F16E67EB1E 

Process Guidance 

A Web site or SharePoint library that provides process guidance for a team project.

47FA57A4-8157-4FB5-9A64-A7A4954BD284 

Team Web Access 

Team Web Access Location 

48577A4A-801E-412C-B8AE-CF7EF3529616 

Team Project 

A Team Project that exists within the TFS deployment. 

 

  • tbl_CatalogResource stores resources (each referring to one of the types in the above tbl_CatalogResourceType table. Sample entries:

 

Identifier 

ResourceType 

DisplayName 

Description 

7462A882-741B-4F91-885F-1405A22256DF 

1B6B5931-69F6-4C53-90A0-220B177353B7 

Tfs_Training collection

 

C5CD36C5-8373-4BF7-B3FA-16199DAFF106 

48577A4A-801E-412C-B8AE-CF7EF3529616 

cmmi project 

cmmi training project 

16480162-C502-4197-872A-4DDDFE49BA90 

1B6B5931-69F6-4C53-90A0-220B177353B7 

tfs_defaultcollection 

 

DCC270E7-6EC7-45BF-A827-53BBB5386606

14F04669-6779-42D5-8975-184B93650C83 

Infrastructure Root 

The root of the catalog tree that describes the physical makeup of the TFS deployment. 

2F8C7D1C-EB1F-4058-B34D-60FA2F6CA7DC 

48577A4A-801E-412C-B8AE-CF7EF3529616 

Agile project 

 

D7121DA7-3E32-49C3-BCB1-68C83C4383B7

47FA57A4-8157-4FB5-9A64-A7A4954BD284 

Team Web Access 

 

4B812279-A7A6-4A8A-92A4-7AA208589FEB 

EB1E0B3B-FAA1-49D2-931A-FDC373682BA5 

SQLTFS 

 

F69EFD2B-51C6-461B-9E49-8089DF86C9A3 

0584A4A2-475B-460E-A7AC-10C28951518F 

hp10 

 

5242DBFC-06C8-4CAD-A27B-99212BEEE999

526301DE-F821-48C8-ABBD-3430DC7946D3 

Tfs_Configuration 

 

713FF18A-E0E8-4192-A7B9-A0900F50A5BC 

26338D9E-D437-44AA-91F2-55880A328B54 

DefaultCollection 

 

 

  • tbl_ServiceDefinition stores a list of predefined services (classes) and the web access page for each. Sample entries (among other things, you may note curly braces which are place holders for some url parameters in the RelativePath column):

 

ServiceType 

Identifier 

DisplayName 

RelativePath 

AdministrationService 

C18D6E34-68E8-40D2-A619-E7477558976E

Administration Service 

/TeamFoundation/Administration/v3.0/AdministrationService.asmx 

Annotate 

74B15E02-0AC2-414F-A9B9-30268659D3B5 

Team Web Access (Annotate) 

/web/ann.aspx 

AnnotateSourceControlItem

D271E722-C261-4BC2-B0F7-1C8A9E13F907 

Team Web Access (AnnotateSourceControlItem) 

/web/ann.aspx?pcguid={projectCollectionGuid}&path={itemPath}&cs={itemChangeset} 

CatalogService 

C2F9106F-127A-45B7-B0A3-E0AD8239A2A7 

Catalog Service 

/TeamFoundation/Administration/v3.0/CatalogService.asmx

ChangesetDetail 

D40EF625-CCA7-4E73-B9EC-86CBE1534CE0 

Team Web Access (ChangesetDetail) 

/web/cs.aspx 

CreateWorkItem 

14CD69C6-88F9-4C8C-A259-D2441D77D1AF

Team Web Access (CreateWorkItem) 

/web/wi.aspx?puri={projectUri}&wit={workItemType} 

Difference 

2B84D900-1F08-486C-9C47-0E6AF371D03C 

Team Web Access (Difference) 

/web/diff.aspx 

Eventing 

C424AE04-8C6F-4516-8B2D-238FFFCA3081 

Event Service 

/TeamFoundation/Administration/v3.0/EventService.asmx

IdentityManagementService 

3DE26348-00BE-4B82-8E4A-E5AD004CFECD 

Identity Management Service 

/TeamFoundation/Administration/v3.0/IdentityManagementService.asmx 

JobService 

DA1C0184-14FE-4E13-B7FC-6EAA07D84BE8 

TFS Background Job Service 

/TeamFoundation/Administration/v3.0/JobService.asmx 

 

  • tbl_CatalogServiceReference stores per-resource services' entries. (Resource Identifier / Service Identifier…).

 

ResourceIdentifier 

AssociationKey 

ServiceIdentifier 

ServiceType 

D7121DA7-3E32-49C3-BCB1-68C83C4383B7 

ChangesetDetail 

D40EF625-CCA7-4E73-B9EC-86CBE1534CE0

ChangesetDetail 

D7121DA7-3E32-49C3-BCB1-68C83C4383B7 

CreateWorkItem 

14CD69C6-88F9-4C8C-A259-D2441D77D1AF 

CreateWorkItem 

D7121DA7-3E32-49C3-BCB1-68C83C4383B7 

Difference 

2B84D900-1F08-486C-9C47-0E6AF371D03C 

Difference 

D7121DA7-3E32-49C3-BCB1-68C83C4383B7

DiffSourceControlItems 

5E91C4DA-0013-4EBB-943D-CC77F5ADB82D 

DiffSourceControlItems 

D7121DA7-3E32-49C3-BCB1-68C83C4383B7 

DiffSourceControlShelvedItem 

4C81A44D-67AB-4D23-9CBE-339C9102993B 

DiffSourceControlShelvedItem 

D7121DA7-3E32-49C3-BCB1-68C83C4383B7 

ExploreSourceControlPath

AC0770BC-1DD6-4B8E-A811-5A03690DF44F 

ExploreSourceControlPath 

D7121DA7-3E32-49C3-BCB1-68C83C4383B7 

Home 

0F9CED5D-89F9-4743-BAB8-FA511FF09A8C 

TSWAHome 

 

This allows us to query the services in a way similar to the following:

SELECT resType.DisplayName AS ResourceType,
svc.DisplayName AS Service,
svc.Description AS serviceDescription
FROM dbo.tbl_ServiceDefinition AS svc INNER JOIN
dbo.tbl_CatalogServiceReference AS svcRef ON svc.Identifier = svcRef.ServiceIdentifier
INNER JOIN
dbo.tbl_CatalogResource AS Res ON svcRef.ResourceIdentifier = Res.Identifier
INNER JOIN
dbo.tbl_CatalogResourceType AS resType ON Res.ResourceType = resType.Identifier

 

 

Sample results:

ResourceType

Service

serviceDescription

Team Web Access

Team Web Access (Annotate)

 

Team Web Access

Team Web Access (AnnotateSourceControlItem)

 

Team Web Access

Team Web Access (ChangesetDetail)

 

Team Web Access

Team Web Access (CreateWorkItem)

 

Team Web Access

Team Web Access (Difference)

 

Team Web Access

Team Web Access (DiffSourceControlItems)

 

Team Web Access

Team Web Access (DiffSourceControlShelvedItem)

 

Team Web Access

Team Web Access (DiffSourceControlShelvedItem)

 

Team Web Access

Team Web Access (ExploreSourceControlPath)

 

Team Project Collection

Location Service

Location Service for Visual Studio Team Foundation Server.

Team Project Collection

Location Service

Location Service for Visual Studio Team Foundation Server.

Team Foundation Server Instance

Location Service

Location Service for Visual Studio Team Foundation Server.

Team Web Access

Team Web Access (OpenWorkItem)

 

 

More about TFS meta-models in following posts!

Sorting dates: Ascending, Descending OR…

This is a small story, yet with something special:

Once upon a time, there was a list of events stored into a database table. Each event had (evidenceJ) a date.

A friend of mine asked me to sort these event items.

I asked, as you may imagine: Ascending or Descending?

 

But this, apparently, seemed too simplistic for him…

- I want the most recent to appear first. He said.

- Well, that means you want to sort in descending order.

I gave him a demo of data sorted in descending order…

- No, he said, that is not what I want!

 

After a short discussion, I understood that what he wanted was:

·         To first have the incoming events (i.e. those located after “today’s” date… sorted in an ascending order)

·         Followed by older dates (sorted in ascending order)

 

I first thought: can ORDER BY do something else than sorting in ascending / descending?...

As the answer was, evidently, No… I said: Sorry, but that is not possible!

The question remained in my mind however. And I started looking for a solution (for myself!)

The experimentations didn’t in fact last long, the solution was quickly found:

 

All what we need is to:

§  Categorize dates according to their location relative to Today (or any other logic);

§  Give each category a ‘sortable’ value (integer for example)

§  Sort by the category’s value, followed by the date value.

 

Example:

SELECT TOP (100) PERCENT

      event_date,

      description,

      CASE WHEN event_date >= getutcdate() THEN 1

            ELSE 2

      END               AS date_category  -- categorize date

FROM         dbo.sight_conferences

ORDER BY date_category, event_date

 

 

Screenshot of the output data sample

 

 

 

The key part of this code is the logic of categorizing dates. In this sample, categorization was quite simple (set a category according to date location relative to Today). But, of course, we can use another logic scheme to obtain data sorted accordingly.

Managing databases periodic membership (or: unique time span indexing) Part II

Ho to view periodic relationship data

In the previous post, I exposed a method for uniquely indexing time spans of periodic-memberships.

 

Here, we will discuss the next task: How to view historical data in such structures?

To illustrate this, we will talk about a portfolio sample data, simply because this is a more elaborate case (than the hotel/room reservation case).

 

Suppose we have:

§  A portfolio named ‘Sample portfolio’

§  A set of companies: company 1, company 2 and company 3

§  A set of daily historical data for each company for the period from 2010/01/01 to 2010/01/10

 

Now suppose that:

§  Company 1 and company 3 are ‘permanent members’ of the portfolio starting at 2010/01/01

§  Company 2 is a member of the portfolio for the following periods:

o   2010/01/01 to 2010/01/05

o   2010/01/08 to à endless

 

To query the daily historical data for company 2 related to its parent portfolio, we should thus obtain the data for the periods where it is effectively a member of the portfolio:

§  2010/01/01, 2010/01/02, 2010/01/03, 2010/01/04, 2010/01/05

§  2010/01/08, 2010/01/09 and 2010/01/10

 

(Note that data for 2010/01/06 and 2010/01/07 should be ignored)

 

This can be done using a view similar to the following (TSQL followed by the view design diagram)

 

SELECT     p.name AS portfolio,

                c.name    AS company,

                history.date,

                history.data

 

FROM dbo.portfolio_companies AS pc INNER JOIN

                      dbo.portfolios AS p ON pc.portfolio_id = p.id INNER JOIN

                      dbo.company_data AS history

                ON pc.date_start <= history.date

                               AND ISNULL(pc.date_end, CONVERT(datetime, '9999/01/01')) >= history.date INNER JOIN

                      dbo.companies AS c ON history.company_id = c.id AND pc.company_id = c.id

 

As said above, company 2 is a portfolio member during the following periods:

§  2010/01/01 to 2010/01/05

§  2010/01/08 to à endless

 

 

 

With our sample history data, this view produces the following data for company 2 (8 entries):

 

 

Note: the history data table contains the following data for company 2 (10 entries):

 

 

Managing databases periodic membership (or: unique time span indexing)

Introduction

The relations between some ‘real world’ objects can sometimes be in the form of ‘periodic membership’. By ‘periodic membership’, I mean the context when one object can be member of another during one or more periods of time.

Examples:

·         A person who occupies a hotel room during a period of time;

·         A company which is member of a portfolio during a period of time;

·         … etc.

 

The context also assumes that we want to keep the membership history and be able to build data aggregations that take into account these periodic memberships. (For example: be able to calculate portfolio performance during a period of time that spans several ‘periodic memberships’).

 

A concrete example may better explain the case:

§  We have a portfolio for which we want to maintain historical performance during time;

§  At 2009/01/01, the portfolio contained : company1, company2 and company3;

§  At 2009/03/01, we changed the portfolio components and excluded company2. (Now the portfolio contains only company1 and company3);

§  At 2009/06/01, to calculate the portfolio performance for the period 2009/01/01 to 2009/06/01, we should take company2 into account for the period 2009/01/01 to 2009/03/01 (although it is no more part of the portfolio).

 

Knowing that database engines do not offer a ‘time-overlap’ unique indexing features, resolving this situation needs a structure that allows us to assign and manage periodic membership indexes.

Note: I am surely not the first to confront this context, but I wanted to share my experience to resolve this problem and will be glad to receive any thoughts about it.

 

Periodic membership multiplicity models

In my experience, I encountered two possible types of multiplicities related to periodic relationship:

§  One-to-one periodic membership: a good example for this would be a hotel room that can be occupied only by one person during the same period (i.e. period is unique per room);

§  One-to-many periodic membership: a good example for this would be a portfolio which can have more than one member for the same period (i.e. period is unique per portfolio-member).

In this article, I will explain how to handle both of these types.

 

Hotel room reservation, the basic structure

To demonstrate a hotel reservation periodic membership (uniqueness of room/reservation period), we can create the following table and relationships:

 

§  Hotel clients: contains client list items

§  Hotel rooms: contains room list

§  Hotel room reservations: contains client/room reservation periods

 

Note: the date_end field of the reservations table can be null, to allow ‘permanent’ (or endless) reservations.

 

To ensure uniqueness of reservation periods, we will create a scalar function (returning a Boolean value) that can be used as Check Constraint on the hotel_room_reservations.

 

CREATE FUNCTION [dbo].[is_valid_hotel_reservation_period]

(

       @item_id     int,

       @room_id     int,

       @date_start  datetime,

       @date_end    datetime

)

RETURNS bit

AS

BEGIN

       -- validate the provided parameters    

       if @date_start is null or @room_id is null

             return 0

 

       -- be sure start date < end date 

       if not @date_end is null

       begin

             if @date_start >= @date_end

                    return 0

       end

      

       -- is this the first reservation period for this room?

       if ( select count(*)

                    from dbo.hotel_room_reservations

                    where( room_id = @room_id) ) <= 0

             return 1

      

       -- check overlapped periods

       if @date_end is null              -- is this a permanent (endless) reservattion?

       begin

             if ( select  count(*)

                    from dbo.hotel_room_reservations

                    where( ( room_id = @room_id)

                                  and( id != @item_id)

                                  and(( date_end is null) or (date_end >= @date_start))

                           )) > 0

                    return 0

       end

       else

       begin

             if ( select  count(*)

                    from dbo.hotel_room_reservations

                    where(       (room_id = @room_id)

                                  and( id != @item_id)

                                  and( date_start <= @date_end)

                                  and(( date_end is null) or (date_end >= @date_start))

                           )) > 0

                    return 0

       end

      

 

       -- return OK

       return 1

END

 

We can now use this function as check constraint on the reservation table:

 

 

Overlapping entries in the reservation table would now display an error message like:

The Portfolio/company periodic membership case

In portfolio/company case, membership should allow multiple membership instances for different periods. That is: the same company can be member of the same portfolio several times as long as the membership periods do NOT overlap.

 

Our tables and relationships may look like the following:

 

All what we need is to create a new version of the above function, and use it as a check constraint condition for the portfolio companies table:

 

CREATE FUNCTION [dbo].[is_valid_portfolio_company_period]

(

       @item_id     int,

       @portfolio_id int,

       @company_id  int,

       @date_start  datetime,

       @date_end    datetime

)

RETURNS bit

AS

BEGIN

       -- validate the provided parameters    

       if @company_id is null or @date_start is null or @portfolio_id is null

             return 0

 

       -- be sure start date < end date 

       if not @date_end is null

       begin

             if @date_start >= @date_end

                    return 0

       end

      

       -- is this the first membership period for the portfolio/company?

       if ( select count(*)

                    from dbo.portfolio_companies

                    where( portfolio_id = @portfolio_id)

                           and( company_id = @company_id) ) <= 0

             return 1

      

       -- check overlapped periods

       if @date_end is null              -- is this a permanent (endless) membership?

       begin

             if ( select  count(*)

                    from dbo.portfolio_companies

                    where( company_id = @company_id

                           and( portfolio_id = @portfolio_id)

                           and( id != @item_id)

                           and(( date_end is null) or (date_end >= @date_start))

                           )) > 0

                    return 0

       end

       else

       begin

             if ( select  count(*)

                    from dbo.portfolio_companies

                    where( company_id = @company_id

                           and( portfolio_id = @portfolio_id)

                           and( id != @item_id)

                           and( date_start <= @date_end)

                           and(( date_end is null) or (date_end >= @date_start))

                           )) > 0

                    return 0

       end

      

       return 1

END

 

We can, now, use the function as a check constraint condition for the membership table:

 

 

Possible extensions

It seems that, with some more efforts, this can evolve to a generic Time-Span unique indexer.

Waiting to hear some thoughts about thisJ!

SQL recursive schemas

SQL recursive schemas

 

One of the interesting features of relational databases is the possibility for a table to reference itself.

Suppose we want to represent the file folder system in a database. We can define a table having, for example, 3 fields: id, parent_id and name:

 

We can then create a self referencing relationship between the id and parent_id field.

Simple (and elegant) solution!

Yes, but that also let us inherit all specific constraints of recursive patterns.

 

The figure above shows the simple (and somehow elegant) of the table schema or pattern.

What it doesn’t show is how the ‘real world’ records will look like when stored in the table:

 

 

Two important constraints of such recursive schema are:

·         Path readability: i.e. as the schema allows an infinite nest levels, reading an item path may cause a stack overflow;

·         Cyclic references: i.e. a folder that references itself (or one of its children) as a parent (which may end up by creating infinite path readability loop;

 

The Nest level problem

A function or stored procedure for reading an item path may look like this:

 

CREATE FUNCTION dbo.Read_path

(

       @item_id int

)

RETURNS nvarchar(max)

AS

BEGIN

       declare             @parent_id   int,

                    @path        nvarchar(max),

                    @name        nvarchar(50)

      

       select @parent_id   = parent_id,

             @name        = name

       from dbo.sample_folders

       where(id = @item_id)

      

       if @parent_id is null

             return @name

 

       -- recursive call  

       return dbo.Read_path ( @parent_id) + N'\' + @name

END

GO

 

 

Such a function will work nicely as long as the folder’s nest level remains in a ‘reasonable’ value. For example, in MS SQL Server, nest level (recursive call stack whose value can be returned at any time by the @@nestlevel system function) has a maximum of 32.

 

So, it is more precautious to test current @@nestlevel value before proceeding:

 

if @@nestlevel >= 32

       return N'Path nest level exceeds allowed value'

 

However, doing so breaks the efficiency and usability of our pattern. The best solution would be to get rid of the recursive call inside our function. Here is an example using a goto label:

 

       DECLARE @folder_name       nvarchar(255),

              @parent_id   int,

              @cur_item_id int,

              @path        nvarchar(max)

 

       SELECT @path        = N'',

              @cur_item_id = @item_id

 

-- recursion label

get_nested_path:

       SELECT @folder_name = f.name,

             @parent_id   = f.parent_id

       FROM   dbo.sample_folders AS f

       WHERE( f.id = @cur_item_id)

 

 

       if @parent_id is null

       begin

              -- We are on the root

              return @folder_name + @path

       end

 

       select @cur_item_id = @parent_id,

              @path        = N’\’ + @folder_name + @path

 

       -- goto recursion label

       goto get_nested_path

 

 

Cyclic references problem

Our read path function can still be trapped into an infinite loop that can be caused by cyclic references… i.e. a folder referencing itself or one of its children as a parent:

 

 

One way to prevent cycling references is to create a constraint on the folders table that tests for cycling references. The constraint may call a function that returns true (1) if the item is not valid:

 

CREATE FUNCTION [dbo].[Is_cyclic_folder]

(

       @item_id            int,

       @parent_id          int

)

RETURNS bit

AS

BEGIN

       -- if this is a root item: OK

       if @parent_id is null

             return 0

      

       -- self referencing?

       if @parent_id = @item_id

             return 1

      

       declare             @parent_folder_id   int,

                    @folder_id          int

      

       -- initial parent folder

       select @parent_folder_id = @parent_id

      

       while NOT @parent_folder_id is null

       begin

             -- if the current parent folder is one of children folders:

             -- then this item is cyclic (NOT VALID)

             if @parent_folder_id = @item_id

                    return 1

            

             select @folder_id   = @parent_folder_id

            

             select @parent_folder_id   = parent_id

             from dbo.sample_folders

             where( id = @folder_id)

       end

      

       return 0

END

 

The constraint may look like this:

 

ALTER TABLE dbo.sample_folders ADD CONSTRAINT

       CK_sample_folders_cyclic CHECK ([dbo].[Is_cyclic_folder](id, parent_id) = 0)