Following my User Group Presentation I was asked by fellow MySQLer Kim about Logical Data Modelling (LDM), in relation to Physical Data Modelling.
Well, first the brain had to work overtime to remember when was the last time I worked on a Logical Data Model. The answer to that is 1996 doing R&D work for Oracle Corporation with their CASE repository tool, Oracle Designer, about version 1.3/1.3.2. I’ve learnt in the past 10 years to purge technical stuff from my brain, leading from the capacity in be able to remember in detail data models, data migration and data cleansing issues of projects even after leaving them 3 years eariler.
As Kim pointed out, he thinks physically, actually directly at the SQL level, then working backwards to produce an appropiate physical model. To think logically is to consider the entities and attributes and relations before considering the physical tables, columns and relationships. So how do you program somebody to think logically? In the case with Kim, he is undertaking formal studies after already grasping the concepts of software development. Generally today more people don’t undertake the formal education and we end up with The Hobbist and the Professional Syndrome.
I guess in summary I’d argue why bother. Does anybody still do traditional logical models? Feedback/comments are welcome. (Professionally I believe there is a place for LDM).
However for the purpose of the exercise lets start with a Physical Representation and present a Logical Representation of that to see the differences.
From this you can see a classic Library example, showing a table of Books and a Table of Authors, and an intersection table (3NF) to indicate that a Book can have one to many Authors, and likewise an Author can have one to many Books.
So how would this have been represented in a Logical Model.
Some things to consider in Logical Modelling.
- Attributes really only require a name, and perhaps just a datatype using number, string, date (Jim Starkey would be happy), but nothing specific like SMALLINT, or VARCHAR(50).
- An Attributes Mandatory/Optional state may or may not be known
- Information is not in 3NF. i.e. relationships can be many to many (in our example, there is no concept of the intersection table
I’m going from memory here, so there is probably more points to consider.
So how do you teach this when you are trying to work backwards, when I learnt this back in 1988 (some 18 years ago), I’d never created a database or used SQL so I didn’t have the history. Arjen had a brilliant idea to consider Logical Modelling after the fact. Views. Consider an end user requirement for Reporting, and how you would represent your model better to an end user (effectively de-normalising these views so users don’t have to know about joins).
In this case you would create a view of Books and a View of Authors. Details such as mandatory/optional isn’t important to the end user report (ie, it’s not like it’s needed to be enforced), and specific datatype details again are not that important. The basics to know how to format a number or a date works.
An interesting approach that worked well in our explaining.
The Text Book
So it was interesting to go back to the text book using C.J. Date’s An Introduction to Database Systems to review definitions.
Logically isn’t really referenced, the term used is conceptual. A conceptual definition is “The right way to do database design is to get the logical design right first, without paying any attention whatsoever at that stage to physical – that is, performance – considerations”
Also for example Third Normal Form we get from Section 12.3 First, Second and Third Normal Forms.
Third normal form (very informal definition): A relvar is in 3NF if and only if the monkey attributes (if any) are both:
- a. Mutually independent
- b. Irreducibly dependent on the primary key
We explain the terms nonkey attribute and mutually independend (loosely) as follows:
- A nonkey attribute is any attribute that does not participate in the primary key of the relvar concerned.
- Two or more attributes are mutually independent if none of them is functionally dependent on any combination of the others. Such independence implies that each attributes can be updated independently of the rest.
Man, no wonder many years of experience and having generally seen most cases, enables me to forget this and not feel like I’ve fogotten something.
Jay Pipes says
Hi!
Great stuff! Care to add this content to the MySQL Forge for Getting Started in Database Design:
http://forge.mysql.com/wiki/GettingStartedRelationalDatabaseConcepts
Thanks!
Jay
Roland Bouman says
Hi Ronald,
Great to read see you bring this stuff to the attention.
“I guess in summary I’d argue why bother. Does anybody still too traditional logical models?”
Well, I do. I mean, in my mind at least. I almost never get to the stage of drawing an actually full blown diagram, but when I’m analyzing a problem, I think about the problem in these terms.
To me, the essential part about logical modelling is that it is the phase where you make a (mental) inventory of what things the system is about. That inventory needs to do two things primarily:
1) identify which classes of objects (entities) there are – at least principal types need to be established here.
2) identify which kinds of relationships tie the entities together.
It might include an account of which attributes there are, but at this point, you need not be complete. There are lot’s of things you don’t know -or care- about in this stage. Keys are another thing I don’t care about in this stage, although you will probably have some kind of idea of the properties that could be used to identify instances of the entity.
Now one of the things I remember from the tool you describe, Oracle Designer, is that it was/is capable of transforming a logical datamodel into a physical one. That is IMO where things could go terribly wrong.
It is sort of possible to do transformation from a logical model to a physical model, and most people will intuitively feel how this is implemented. Entities become tables, attributes become columns (and are assigned a datatype), the putative notion of identifying properties are transformed into (primary) keys, one to many relationships are translated to foreign keys and corresponding constraints and finally many to many relationships are used to generate intersection tables.
Although such a generated physical model might give you a kind of headstart, you should probably not use it as is for the actual implementation. Entities might not map to tables: you might want to split up a table (vertical partitioning) for performance reasons, or because your analysis of the problem has changed while you’re developing (you might discover you need subtypes for example). Attributes don’t need to map one-to-one to columns: you might think of an entity with a timeperiod attribute, without already knowing the best way to implement it (date + int? from date to date? int and int?).
Of course, a failure of such a translation does not devaluate logical modelling. IMO, you should use logical modelling primarily as a means to get a grasp on the problem, and also to communicate your information analysis to other people involved in the project. A physical model might take too much time to prepare in the early stages of development, or it might not serve the pupose of communication well enough. Of course, you migh even decide to let the choice for a particular rdbms be guided by the initial information analysis, in which case you simply can’t design the physical model at this stage. So, lot’s of reasons why to bother with logical models.
So, just my thoughts on this interesting subject.
Sheeri says
I make entity-relationship diagrams. I’m a big ol’ database geek. I find it’s much easier to start thinking of the real-world things I want to store, and their relationships to each other, and then work out how the db will do it. It keeps things much truer to reality.
Ronald says
Some interesting info at Logical Data Modeling: A Key to Successful Enterprise Data Warehouse Implementations.
Nigel says
“A relvar is in 3NF if and only if the **monkey** attributes (if any) are both…”
I’ve worked on a lot of systems in the last 25+ years and I must say a whole lot of them had MONKEY attributes. Also monkey project managers, come to that….