Looking just at the data

There are many areas you need to review when addressing MySQL performance such as current database load, executed SQL statements, connections, configuration parameters, memory usage, disk to memory ratio, hardware performance & bottlenecks just to name a few.

If you were to just look at the data that is held in the database, what would you consider?
Here are my tips, when looking just at the data.

  1. What is the current database size?
  2. What is the growth of data over time, say daily, weekly?
  3. Which are the 2 largest tables now?
  4. What 2 tables are growing the fastest?
  5. What tables have greatest churn, specifically DELETE’s?
  6. How often do you optimize your tables?
  7. What is your archiving/purging strategy? Do you even have one?
  8. Review data types? I average 25% reduction in footprints, just by choosing optimal data types, generally with zero code changes.
  9. What further data simplification can occur to reduce size, eg. INT for IP’s, enums, removing repeating text etc?
  10. What normalization of data can occur?
  11. What storage engines are in use?
  12. What data is write once data?
  13. Can data be stored in other forms, e.g. outside a relational database?

Even without looking at the SQL statements or the MySQL configuration you can generally deduce a lot of information about the application by just looking at the data.