The RAT and the CAT

No, it’s not a bedtime story, is a serious system’s design concept and I’m amazing that people don’t know about this.
As I mentioned in If you don’t know your data, you don’t know your application I was doing a Java Code Review, and I found a clear case of a much simplier solution. How simple you ask?

Well, without completing the task 100%, I achieved in less then 1 day (and lets say for the argument 1 more day of work), what is being worked on by somebody else for a week, with an estimate of 2 more weeks to complete. So let’s add 50% to my estimate, that’s a total of 3 days verses 15 days. You do the math. and yes that was last week and that task is still being worked on the same way, even with reference to my working code. Not to mention the code is a similiar magnitude of simplicity, and simplicity means cost savings in support, people so quickly forget that.

So what is this RAT and CAT:

  • RAT – Row At a Time
  • CAT – Chunk At a Time

This concept is really rather simple, however it’s amazing how implementations never consider this. Let me give you a simple example.

Your importing some external data (a flat file of 5 columns, and is inturn will be stored in 2 tables in a normalised form). So the RAT way of doing things would entail:

  • Using the language in question, open the file, read line by line, converting into object for reference.
  • Now for each row read you
    • You get the values (A and B), and then do a select to see if this row already exists in one of the normalised tables (calling in X). If it does, you have the surrogate key, else you insert the row and get the surrogate key
    • Now you do the same with the next set of values (C, D, E) which reference the normalised values (A and B), inserting into Y)
    • And so on and so on.
    • Report exceptions line by line if found

For those now laughing, it’s not really funny, it’s sad that programs are written this way. For those that say, but that’s the only way, well welcome to a different, radical and hard to learn approach.

The Cat way of doing things would entail:

  • Create a temporary table
  • Bulk load the data into the temporary table via appropiate command, e.g. mysqlimport or LOAD DATA.
  • Using one select, insert into X rows from temporary table that are not already present
  • Using one select, insert into Y rows from the temporart table that are not present, joining to X to get the appropiate surrogate key
  • Report exceptions via one select of data that wasn’t inserted due to some rules

It’s not rocket science people.

I should mention this is a simple example, and it’s not always possible to do CAT processing for all operations, but generally some portion of batch work can be, and the remaining must be done in a RAT way.

Updated
Some more recent articles including We need more CATs (2009) and The Art of Elimination (2010)

Tagged with: Databases General MySQL

Related Posts

Why Being Proactive Is Always a Winning Approach

Many companies manage production infrastructure using a reactive model rather than a proactive one. Organizations typically react to warnings and alerts, then implement corrective actions in response. While some companies have well-designed architectural patterns—such as feature flags and rate limiting—that can quickly mitigate the impact of issues, these are merely temporary solutions, not resolutions.

Read more

AWS CLI support for Aurora DSQL and S3 Tables

If you were following the AWS Re:invent keynote yesterday there were several data specific announcements including Aurora DSQL and S3 Tables . Wanting to check them out, I downloaded the latest AWS CLI 2.

Read more

Migrating off of WordPress - A Simplified Stack

The ongoing drama between Wordpress v WP Engine continues to cross my reading list, but I have permanently removed WordPress from my website. I have finally transitioned away from the complex Linux/Apache/MySQL/PHP (LAMP) stack required for self-hosting WordPress on my professional website.

Read more