No, it’s not a bedtime story. It’s a serious systems design concept, and I’m amazed that people don’t know about it.
As I mentioned in If you don’t know your data, you don’t know your application, I was doing a Java code review and found a clear case where a much simpler solution existed. How simple, you ask?
Well, without completing the task 100%, I achieved in less than 1 day (let’s say, for the sake of argument, 1 more day of work) what somebody else had been working on for a week, with an estimate of 2 more weeks to complete. Add 50% to my estimate and that’s a total of 3 days versus 15 days. You do the math. And yes, that was last week, and that task is still being worked on the same way, even with a reference to my working code. Not to mention my code is of a similar magnitude of simplicity, and simplicity means cost savings in support; people so quickly forget that.
So what are RAT and CAT?
- RAT – Row At a Time
- CAT – Chunk At a Time
This concept is really rather simple, however it’s amazing how implementations never consider this. Let me give you a simple example.
You’re importing some external data (a flat file of 5 columns, which will in turn be stored in 2 tables in normalised form). The RAT way of doing things would entail:
- Using the language in question, open the file and read it line by line, converting each line into an object for reference.
- Now, for each row read:
- You get the first values (A and B), then do a select to see if this row already exists in one of the normalised tables (call it X). If it does, you have the surrogate key; otherwise you insert the row and get the surrogate key.
- You do the same with the next set of values (C, D, E), which reference the normalised values (A and B), inserting into Y.
- And so on and so on.
- Report exceptions line by line as they are found.
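The RAT loop above can be sketched as follows. This is a minimal illustration, not the code from the review: the table and column names (x, y, a through e) are hypothetical, SQLite stands in for MySQL, and file parsing is reduced to splitting in-memory strings.

```python
import sqlite3

# Hypothetical schema: x holds the normalised (a, b) pairs, y the detail rows.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE x (id INTEGER PRIMARY KEY, a TEXT, b TEXT)")
cur.execute("CREATE TABLE y (x_id INTEGER, c TEXT, d TEXT, e TEXT)")

# Stands in for reading the 5-column flat file line by line.
lines = ["a1,b1,c1,d1,e1", "a1,b1,c2,d2,e2"]

for line in lines:  # one or more round trips to the database per row: the RAT way
    a, b, c, d, e = line.split(",")
    # Select to see if the (a, b) pair already exists in x.
    row = cur.execute("SELECT id FROM x WHERE a = ? AND b = ?", (a, b)).fetchone()
    if row:
        x_id = row[0]  # surrogate key already known
    else:
        cur.execute("INSERT INTO x (a, b) VALUES (?, ?)", (a, b))
        x_id = cur.lastrowid  # surrogate key from the insert just done
    # Insert the detail row, referencing the surrogate key.
    cur.execute("INSERT INTO y (x_id, c, d, e) VALUES (?, ?, ?, ?)", (x_id, c, d, e))
conn.commit()
```

Note that every imported row costs at least two statements (a select plus an insert), so the database round trips grow linearly with the file size.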
For those now laughing: it’s not really funny, it’s sad that programs are written this way. For those who say that’s the only way, well, welcome to a different, radical and hard-to-learn approach.
The CAT way of doing things would entail:
- Create a temporary table
- Bulk load the data into the temporary table via the appropriate command, e.g. mysqlimport or LOAD DATA.
- Using one select, insert into X the rows from the temporary table that are not already present.
- Using one select, insert into Y the rows from the temporary table that are not already present, joining to X to get the appropriate surrogate key.
- Report exceptions via one select of the data that wasn’t inserted due to some rule.
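The CAT steps above can be sketched the same way, for comparison with the RAT loop. Again the names (x, y, staging, a through e) are hypothetical, and since SQLite stands in for MySQL here, an executemany takes the place of LOAD DATA / mysqlimport for the bulk-load step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE x (id INTEGER PRIMARY KEY, a TEXT, b TEXT, UNIQUE (a, b));
    CREATE TABLE y (x_id INTEGER REFERENCES x(id), c TEXT, d TEXT, e TEXT);
    CREATE TEMPORARY TABLE staging (a TEXT, b TEXT, c TEXT, d TEXT, e TEXT);
""")

# Step 1: bulk load the flat file into the temporary table. In MySQL this
# would be LOAD DATA INFILE or mysqlimport; executemany stands in for it.
rows = [("a1", "b1", "c1", "d1", "e1"),
        ("a1", "b1", "c2", "d2", "e2"),
        ("a2", "b2", "c3", "d3", "e3")]
cur.executemany("INSERT INTO staging VALUES (?, ?, ?, ?, ?)", rows)

# Step 2: one statement inserts the (a, b) pairs not already present in x.
cur.execute("""
    INSERT INTO x (a, b)
    SELECT DISTINCT s.a, s.b FROM staging s
    WHERE NOT EXISTS (SELECT 1 FROM x WHERE x.a = s.a AND x.b = s.b)
""")

# Step 3: one statement inserts into y, joining to x for the surrogate key.
cur.execute("""
    INSERT INTO y (x_id, c, d, e)
    SELECT x.id, s.c, s.d, s.e
    FROM staging s JOIN x ON x.a = s.a AND x.b = s.b
""")
conn.commit()
```

Three statements total, regardless of how many rows the file holds; the per-row work moves from the application into set-based SQL, which is the whole point of CAT.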
It’s not rocket science people.
I should mention this is a simple example, and it’s not always possible to do CAT processing for all operations, but generally some portion of batch work can be done this way, and the remainder must be done in a RAT way.
Some more recent articles include We need more CATs (2009) and The Art of Elimination (2010).
This is brilliant — I never thought about it this way, although I often do the “cat” way. I figure “let me get the stuff in the database first, and then I can figure out where and what I want with it, like referential integrity…
Justin Swanhart says
CAT is AKA “bulk SQL” or “batch processing”.