The value of multi insert values

Baron received a great response to his 50 things to know before migrating Oracle to MySQL. I’m glad I invited him, as a fellow MySQL colleague, to my presentation to the Federal Government on Best Practices for Migrating to MySQL from Oracle and SQL Server, which provided his inspiration.

Oracle will always be a more feature-rich product than MySQL. There are, however, features that MySQL has that Oracle does not. While I have a draft of such a list of my own, it sits among several hundred incomplete drafts.

One of these features, which I was able to demonstrate to a client, is the ability to specify multiple rows of values in a single INSERT statement. For example:

INSERT INTO t1(c1) VALUES (1), (2), (3), (4), (5);

Instead of

INSERT INTO t1(c1) VALUES(1);
INSERT INTO t1(c1) VALUES(2);
INSERT INTO t1(c1) VALUES(3);
INSERT INTO t1(c1) VALUES(4);
INSERT INTO t1(c1) VALUES(5);

Does it make a difference? What is the performance improvement?

The number one reason for the improvement in performance is the elimination of the network round trip incurred by each individual statement. We ran a number of tests for a specific client example, taking many single-row INSERT statements and combining them into multi-row statements.

We ran tests both against localhost and across the network.

We found that taking several thousand INSERT statements and combining them into statements of at most 1MB each made sense. Overall, this single test showed an 88% improvement, from 11.4 seconds to 1.4 seconds.

Single-row INSERT statements:

real        0m11.403s
user        0m0.175s
sys         0m0.157s

Multi-row INSERT statements:

real        0m1.415s
user        0m0.019s
sys         0m0.012s
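
The 1MB ceiling presumably relates to the server's max_allowed_packet limit, which historically defaulted to 1MB (an assumption about the test configuration rather than something measured here). Before settling on a batch size you can check the effective limit:

SHOW VARIABLES LIKE 'max_allowed_packet';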

In the real-world example, with differences in the volume of queries to combine and in system load, the reduction was 72%.

Comments

  1. Mark Callaghan says

    MySQL has a few things that seem simple at first glance but make it very efficient. This is one. Having the client-server protocol push many rows at a time back to the client (rather than the client pulling one row at a time) is another.

  2. says

    Yes – very useful feature.

    There are a few popular myths around this feature – some people still think it is MySQL-specific (it is not – Postgres supports it in v >= 8.2; MS SQL Server supports it in v 2008), and some people think it is non-standard (it is in the 2003 version of the SQL standard).

    Anyway, for Oracle, I sometimes generate scripts like this:

    INSERT INTO t (col1,…,colX)
    SELECT v1,…,vX FROM DUAL UNION ALL
    SELECT v1,…,vX FROM DUAL UNION ALL
    …
    SELECT v1,…,vX FROM DUAL
    ;

    clunky, but it does allow you to throw the data in there in one statement.
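
    For instance, a filled-in version of that template against the t1 table used earlier in the post (values chosen purely for illustration) would be:

    INSERT INTO t1 (c1)
    SELECT 1 FROM DUAL UNION ALL
    SELECT 2 FROM DUAL UNION ALL
    SELECT 3 FROM DUAL;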

  3. says

    On the other hand, the way Oracle does this with the array interface is even more performant (although it requires a bit more programming). The Oracle C APIs (Pro*C or OCI) are not the easiest to get along with, but once you know them, they can kick butt. I know, I used to teach people how to use them.

  4. says

    If you’re using InnoDB in autocommit mode, you will also see a very large improvement from the multi/batched insert statement due to fewer commits. Even more so if you run with innodb_flush_log_at_trx_commit = 1.
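
    For comparison, grouping the single-row statements into one explicit transaction achieves a similar reduction in commits; a minimal sketch against the t1 table from the post:

    SET autocommit = 0;
    INSERT INTO t1 (c1) VALUES (1);
    INSERT INTO t1 (c1) VALUES (2);
    INSERT INTO t1 (c1) VALUES (3);
    COMMIT;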

  5. Brooks Johnson says

    The test conditions aren’t exactly clear to me. Were you sending the single inserts as a batch or one by one? If as a batch, was autocommit on or were you committing all the inserts as one?

  6. says

    Combined with INSERT IGNORE, the multi-insert feature allows for a really nice property:
    You can insert 100 rows together, and those that fail a constraint (FOREIGN KEY, UNIQUE) don’t get inserted, with no errors thrown, while all the rest get inserted just fine.
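
    A small sketch of that behaviour, assuming a hypothetical table t2 with a UNIQUE key:

    CREATE TABLE t2 (c1 INT NOT NULL, UNIQUE KEY (c1));
    INSERT INTO t2 (c1) VALUES (1), (2);
    -- The duplicate value 2 is skipped with a warning rather than an error,
    -- while 3, 4 and 5 are still inserted.
    INSERT IGNORE INTO t2 (c1) VALUES (2), (3), (4), (5);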

  7. says

    I made extensive use of these bulk inserts, and combined them with “ON DUPLICATE KEY UPDATE `columnname` = VALUES(columnname)”.
    You get really huge performance improvements from that.
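
    A brief sketch of that pattern, using a hypothetical table t3 keyed on id:

    CREATE TABLE t3 (id INT PRIMARY KEY, qty INT);
    -- Existing ids are updated, new ids are inserted, all in one statement.
    INSERT INTO t3 (id, qty) VALUES (1, 10), (2, 20), (3, 30)
    ON DUPLICATE KEY UPDATE qty = VALUES(qty);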