jeudi 11 décembre 2008


2008/12/02

DATA MODELING

“Database Reverse engineering”: a reactive documentation

Robin Ouellet

The Data Model

Have you already heard the following statement: “show me your data model and I will understand your system “? is it true that a data model speaks to us? Based on my several years of experience in system development, I noticed that the data model is the document the most used in information technologies. It is used in all stages of development cycle.

The data model is an essential tool to illustrate the comprehension of a business domain during the architecture phase, to evaluate the space capacity required by the physical database or to analyze the impact of a modification in production. Unfortunately, the data model is the most neglected project deliverable.

Delivering a functional system is more important than delivering a well documented system. There are many reasons: lack of resources, delays in one or many stages of the development cycle, the supplier did not provide the data model at the software implementation, etc.

In the situation of an application have been implemented in production and well functioning, the need to have a data model is still required by the project manager, the analysts or the developers sustaining the application. The solution is “Database Reverse Engineering” at a relatively lower cost.

The Reverse Engineering

An appropriate data modeling tool will allow to connect into the physical database and convert information into a data model representation. This diagram will show the entities, the attributes, the primary keys, the relationships as well as the relationships cardinalities.

Reverse engineering cannot convert everything. Relationships between the entities will appear only if the “foreign keys” constraints were defined in the physical database.

Assuming that the physical database contains the minimal information (definition of the tables, the attributes, the primary and the foreign keys), here are the manual operations to perform in the data modeling tool after the reverse engineering operation has been completed:

Step 1. Manually reorganized the presentation of entities by module

The reverse engineering functionality cannot gather in a logical way and by page, the entities of a same module. E.g. entities related to the file customer, entities related to the invoicing module.

If the database contains five hundred tables, then five hundred entities will be presented on the same page with all the relationships that connect those entities. The data model will be very difficult to read.

Step 2. Withdraw relationships which become redundant

Sometimes, in order to reduce and to facilitate the reading of the data model, it is necessary to remove relationships which are found at many places. E.g. the “customer number” and “product number” foreign keys can be found in the majority of the entities of the data model.

Only the graphical representation must be removed. The background information entities relationships must always remain intact.

Step 3. Seize short and long entities descriptions

Step 4. Seize short and long attributes descriptions

The efforts

Efforts of reverse engineering is not more than one or two weeks without including the documentation of the attributes (in reference to step 4 above).

Let us consider a database containing approximately 150 tables, 350 relationships and 20 columns on average per table:

· The execution of the reverse engineering in the data modeling tool (MS Visio Professionnel) will take approximately 30 minutes;

· Steps 1,2 and 3 can take from 1 to 2 weeks according to both the application and the business domain knowledge;

· Step 4 can last approximately 8 weeks. Considering that 3000 columns (attributes) in a database represent approximately 2000 attributes to be documented because the same attribute can be found in several entities. By allocating a 10 minutes of effort per attribute, step 4 could last approximately 8 weeks. Once again the duration varies according to the application and the business domain knowledge;.

The Data modeling tools

The majority of the data modeling tools offer the reverse engineering functionality: Microsoft Visio Professional 2007, PowerDesigner by Sybase, Oracle Designer, Oracle SqlDeveloper and ERWin.

These tools detect in any time differences between the physical database and the data model. This functionality proves very useful to preserve an up to date version of the data model.

Cost of its products is approximately 700 Canadian dollars for “Microsoft Office Visio Professionnel 2007” and approximately 7500 American dollars for the PowerDesigner product.

Oracle provides a free data model tool named “SqlDeveloper”. You will find it at this address: http://www.oracle.com/technology/products/database/sql_developer/files/Modeling.html. However that is a beta version.

The standard version of Microsoft Visio does not contain the reverse engineering functionality. It is thus necessary to buy Microsoft Visio with the mention “Professional”.

Conclusion

The documentation of a data model using a reverse engineering remains a reactive response. Ideally and normally, this activity occurs in the architecture stage of the system development cycle.

Robin Ouellet

ouelletrobin@aqiii.org


Data model
: A data model is an abstract and a graphic representation of information related to a business domain or to the company as a whole. It presents the entities (or tables), the attributes (or columns), the relationships between its entities as well as the cardinality which characterizes the relationships. For example: The entity “ORDERS” with the relation “CONTAINS” one to many “ORDER ITEMS”. The mention “one to many” represents the cardinality of the relationship.