InnoDB I_S.tables.table_rows out by a factor of 100x

I’ve always believed that the MySQL Information_schema.tables.table_rows figure for Innodb tables to be while approximate, approximately accurate.

Today I found that the figures varied on one table from 10x to 100x wrong.

Before performing an ALTER I always verify sizes for reference.

| table_s | table_name | engine | row_format | table_rows | avg_row_length | total_mb   | data_mb    | index_mb  | today      |
| XXXXX   | s_a        | InnoDB | Compact    |     208993 |           7475 |  1491.5312 |  1490.0156 |    1.5156 | 2009-09-09 |

mysql> alter table s_a modify col VARCHAR(255);
Query OK, 23471 rows affected (4 min 26.23 sec)


| XXXXX   | s_a        | InnoDB | Compact    |    2407063 |            629 |  1447.0312 |  1445.5156 |    1.5156 | 2009-09-09 |

I was so caught out by this. The table reported 200k rows, but the alter returned 23k, that’s like 10x out.
I ran my query again, and the second time I got.

| XXXXX   | s_a        | InnoDB | Compact    |      21813 |          69487 |  1447.0312 |  1445.5156 |    1.5156 | 2009-09-09 |

This is closer to what I’d expected, 22k verses 23k.

I have to say, while I have always treated the data and index size as accurate, I now how little confidence in the table_rows any more.

MySQL DML stats per table

MySQL provides a level of statistics for your INSERT, UPDATE, DELETE, REPLACE Data Manipulation Language (DML) commands using the STATUS output of various Com_ variables, however it is per server stats. I would like per table stats.

You can achieve this with tools such as MySQL Proxy and mk-query-digest, however there is actually a very simple solution that requires no additional tools.
The following 1 line Linux command (reformatted for ease of reading) gave me exactly what I wanted, and it had ZERO impact on the database.

$ mysqlbinlog /path/to/mysql-bin.000999 |  
   grep -i -e "^update" -e "^insert" -e "^delete" -e "^replace" -e "^alter"  | 
   cut -c1-100 | tr '[A-Z]' '[a-z]' |  
   sed -e "s/t/ /g;s/`//g;s/(.*$//;s/ set .*$//;s/ as .*$//" | sed -e "s/ where .*$//" |  
   sort | uniq -c | sort -nr

  33389 update e_acc
  17680 insert into r_b
  17680 insert into e_rec
  14332 insert into rcv_c
  13543 update e_rec
  10805 update loc
   3339 insert into r_att
   2781 insert into o_att

Granted the syntax could do with some regex improvements, but in 2 minutes I was able to deduce some approximate load. The mysqlbinlog command also gives option to retrieve data for a given time period, so it is very easy to get these statistics on a per hour basis.

Sometimes the most simple options are right in front of you, just just need to strive to find the simplest solution.