Beginner CSV Engine issues

I’ve just started using the CSV engine for a practical application and I’ve come across a few subtle and unexpected issues/limitations.

First, you can’t make any column in a CSV table nullable.

mysql> create table t1(i INT) ENGINE=CSV;
ERROR 1178 (42000): The storage engine for the table doesn't support nullable columns
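
Declaring the column NOT NULL avoids the error (same table as above, output may vary slightly by version):

mysql> create table t1(i INT NOT NULL) ENGINE=CSV;
Query OK, 0 rows affected (0.00 sec)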

RTFM shows this restriction was introduced in MySQL 5.1.23. See CSV Limitations.

The second, and more annoying, issue came from creating a CSV table, inserting a sample row (just to check the syntax), and then replacing the underlying file ([datadir]/[schema]/[table].CSV) with one in the same format but containing additional generated rows. A subsequent SELECT from the table did not see the new data.

The solution was to run REPAIR TABLE [table] in order to see the newly loaded data.
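
For illustration, the sequence looked roughly like this (the schema name, table name, and paths here are hypothetical; substitute your own [datadir]):

$ cp /tmp/generated_rows.CSV /var/lib/mysql/test/t1.CSV
mysql> SELECT COUNT(*) FROM t1; -- still reports only the original sample row
mysql> REPAIR TABLE t1;
mysql> SELECT COUNT(*) FROM t1; -- now reflects the copied file
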
Furthermore, some more RTFM shows, in Repairing and Checking CSV Tables, an important caveat to using the CSV engine and loading data directly by file copy.

Warning: Note that during repair, only the rows from the CSV file up to the first damaged row are copied to the new table. All other rows from the first damaged row to the end of the table are removed, even valid rows.
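
To make that warning concrete, consider a hypothetical t1.CSV where the third row has been corrupted with an unterminated quote:

1
2
"oops
3
4

Assuming this content, REPAIR TABLE t1 would keep only rows 1 and 2; rows 3 and 4 are discarded along with the damaged row, even though they are valid.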

Buyer beware.
