2022 April

If you have worked with an RDBMS for some time, you will likely have come across the statement SELECT 1.

However, rarely is it correctly explained to engineers what the origin of SELECT 1 is, and why it’s useless and wasteful? A google search is not going to give you the response you would hope, these ranked responses are just as useless as the statement itself.

Bloat

Seeing a SELECT 1 confirms two things. First you are using a generic ORM framework, quote, and second, you have never optimized your SQL traffic patterns.

“Frameworks generally suck.
They CLAIM to improve the speed of development and abstract the need to know SQL.
The REALITY is the undocumented cost to sub-optimal performance, especially with data persistence.”

Connection Pooling

SELECT 1 comes from early implementations of connection pooling.

What is a connection pool? Rather than a new request or call getting a new database connection each time you wanted to return some data, programming languages implemented a cache with a pre-loaded pool of pre-established database connections. The intended goal is to reduce the execution time of an initial expensive operation of getting a new database connection if you were retrieving data from a simple SELECT statement. If intelligent enough (many are not), these pools would include features such as a low watermark, a high watermark, a pruning backoff of idle connections, and an ability to flush all connections.

When your code wanted to access the database to retrieve data, it would first ask the connection pool for an available connection from its pool, mark the connection as in-use and provide that for subsequent consumption.

Here is a simple example of the two queries that would actually be necessary to retrieve one piece of information.

SELECT 1
SELECT email_address, phone, position, active FROM employee where employee_id = ?

Staleness

SELECT 1 was implemented as the most light-weight SQL statement (i.e., minimal parsing, privilege checking, execution) that would validate that your connection was still active and usable. If SELECT 1 failed, i.e. a protocol communication across your network, the connection could be dropped from the connection pool, and a new connection from the pool could be requested. While this may appear harmless, it leads to multiple code in-efficiencies, a topic for a subsequent discussion.

Failed error handling

SELECT 1 was a lazy and flawed means to perform error handling. In reality, every single SQL statement requires adequate error handling, any statement can fail at any time to complete. In the prior example, what happens if the SELECT 1 succeeds but a simple indexed SELECT statement fails? This anti-pattern also generally shows that error handling is inconsistent and highly duplicated rather than at the correct position in the data access path.

By definition, error handling is needed in an abstraction function for all SQL statements, and it needs to handle all types of error handling including the connection no longer valid, connection terminated, timed out, etc.

If you had the right error handling SELECT 1 would then be redundant, and as I stated useless. You simply run the actual SELECT statement and handle any failure accordingly.

High availability

In today’s cloud-first architectures where high availability consists of multiple availability zones and multiple regions where application A can communicate with database B, every unneeded network round-trip in a well-tuned system is wasteful, i.e. it is costing you time to render a result quicker. We all know studies have shown that slow page loads drive users away from your site.

The cost of the cloud

This AWS Latency Monitoring grid by Matt Adorjan really shows you the impact that physics has on your resiliency testing strategy when application A and database B are geographically separated and you just want one piece of information.

Conclusion

The continued appearance of SELECT 1 is a re-enforcement that optimizing for performance is a missing skill for the much larger engineering code-writing workforce that have lost the ability for efficiency. It is also another easy win that becomes an unnecessary battle for Data Architects to ensure your organization provides a better customer experience.