WeSQL Introduction – MySQL running on S3

I recently became aware of WeSQL, a MySQL-compatible database that separates compute and storage, using S3 as the storage layer. The product uses a columnar format by default, which is significantly more space-efficient than InnoDB.

WeSQL introduces a new storage engine called SmartEngine, which uses an LSM-tree-based structure well suited to an object-storage bucket implementation, and the documentation describes Raft replication to combat latency concerns. There is a lot more information to review, including the serverless architecture and WeScale, a database proxy and resource manager.

It was very easy to take it for an initial spin using a Docker container and an AWS S3 bucket. I would really like to try Cloudflare R2, which implements the S3 API.

Under the covers there are over 180 new variables: 83 for SmartEngine, 57 for Raft replication, 22 for object storage, and more. This implies a lot of tunable options, and a lot of complexity in optimizing for a variety of workloads using the 79 new status variables.
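A quick way to get a feel for this surface area from the client is to pattern-match on the new variable prefixes; a minimal sketch using standard SHOW syntax:

mysql> SHOW GLOBAL VARIABLES LIKE 'smartengine%';
mysql> SHOW GLOBAL VARIABLES LIKE 'raft_replication%';
mysql> SHOW GLOBAL VARIABLES LIKE '%objectstore%';
mysql> SHOW GLOBAL STATUS LIKE 'Smartengine%';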

I was able to launch a demo and confirm the version:

mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 8.0.35    |
+-----------+
1 row in set (0.01 sec)

mysql> SELECT @@wesql_version;
+-----------------+
| @@wesql_version |
+-----------------+
| 0.1.0           |
+-----------------+
1 row in set (0.00 sec)

One of my early tests showed that it does not support FOREIGN KEY constraints, which is not a major concern.

ERROR 1235 (42000) at line 10: SE currently doesn't support foreign key constraints
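As a minimal illustration (these table definitions are my own, not from the WeSQL docs), any attempt at a referential constraint fails:

mysql> CREATE TABLE parent (id INT PRIMARY KEY);
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE child (
    ->   id INT PRIMARY KEY,
    ->   parent_id INT,
    ->   FOREIGN KEY (parent_id) REFERENCES parent (id)
    -> );
ERROR 1235 (42000): SE currently doesn't support foreign key constraints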

I did have some subsequent issues with the version currently in the docs, 8.0.35-0.1.0_beta1.37, and reverted to the prior documented version from earlier this week, 8.0.35-0.1.0_beta1.gedaf338.36. Given it’s a very new product, I am sure there is a lot of ongoing development.

This is just a quick introduction, but it is definitely a different architecture in the MySQL-compatible RDBMS landscape. I hope to run some more tests using the provided sysbench use cases and my own workloads to delve further under the covers.

New Variables

branch_objectstore_id
clone_autotune_concurrency
clone_block_ddl
clone_buffer_size
clone_ddl_timeout
clone_delay_after_data_drop
clone_donor_timeout_after_network_failure
clone_enable_compression
clone_max_concurrency
clone_max_data_bandwidth
clone_max_network_bandwidth
clone_ssl_ca
clone_ssl_cert
clone_ssl_key
clone_valid_donor_list
initialize_branch_objectstore_id
initialize_from_objectstore
initialize_objectstore_bucket
initialize_objectstore_endpoint
initialize_objectstore_provider
initialize_objectstore_region
initialize_objectstore_use_https
initialize_repo_objectstore_id
initialize_smartengine_objectstore_data
objectstore_bucket
objectstore_endpoint
objectstore_mtr_test_bucket_dir
objectstore_provider
objectstore_region
objectstore_use_https
raft_replication_allow_no_valid_entry
raft_replication_appliedindex_force_delay
raft_replication_archive_log_bin_index
raft_replication_archive_recovery
raft_replication_archive_recovery_stop_datetime
raft_replication_auto_leader_transfer
raft_replication_auto_leader_transfer_check_seconds
raft_replication_auto_reset_match_index
raft_replication_check_commit_index_interval
raft_replication_checksum
raft_replication_cluseter_info_on_objectstore
raft_replication_cluster_id
raft_replication_cluster_info
raft_replication_configure_change_timeout
raft_replication_current_term
raft_replication_disable_election
raft_replication_disable_fifo_cache
raft_replication_dynamic_easyindex
raft_replication_election_timeout
raft_replication_flow_control
raft_replication_force_change_meta
raft_replication_force_recover_index
raft_replication_force_reset_meta
raft_replication_force_single_mode
raft_replication_force_sync_epoch_diff
raft_replication_heartbeat_thread_cnt
raft_replication_io_thread_cnt
raft_replication_large_batch_ratio
raft_replication_large_event_split_size
raft_replication_large_trx
raft_replication_learner_heartbeat
raft_replication_learner_node
raft_replication_learner_pipelining
raft_replication_learner_timeout
raft_replication_log_cache_size
raft_replication_log_level
raft_replication_log_type_node
raft_replication_max_delay_index
raft_replication_max_log_size
raft_replication_max_packet_size
raft_replication_min_delay_index
raft_replication_mts_recover_use_index
raft_replication_new_follower_threshold
raft_replication_optimistic_heartbeat
raft_replication_pipelining_timeout
raft_replication_prefetch_cache_size
raft_replication_prefetch_wakeup_ratio
raft_replication_prefetch_window_size
raft_replication_purged_gtid
raft_replication_recover_backup
raft_replication_recover_new_cluster
raft_replication_reset_prefetch_cache
raft_replication_send_timeout
raft_replication_start_index
raft_replication_sync_follower_meta_interva
raft_replication_with_cache_log
raft_replication_worker_thread_cnt
recovery_snapshot_from_objectstore
recovery_snapshot_only
recovery_snapshot_timestamp
recovery_snapshot_tmpdir
repo_objectstore_id
server_id_on_objectstore
serverless
smartengine_auto_shrink_enabled
smartengine_auto_shrink_schedule_interval
smartengine_batch_group_max_group_size
smartengine_batch_group_max_leader_wait_time_us
smartengine_batch_group_slot_array_size
smartengine_block_cache_size
smartengine_block_size
smartengine_bottommost_level
smartengine_bulk_load_size
smartengine_compact
smartengine_compaction_delete_percent
smartengine_compaction_task_extents_limit
smartengine_compaction_threads
smartengine_compression_options
smartengine_compression_per_level
smartengine_concurrent_writable_file_buffer_num
smartengine_concurrent_writable_file_buffer_switch_limit
smartengine_concurrent_writable_file_single_buffer_size
smartengine_data_dir
smartengine_deadlock_detect
smartengine_disable_auto_compactions
smartengine_disable_instant_ddl
smartengine_disable_online_ddl
smartengine_disable_parallel_ddl
smartengine_dump_memtable_limit_size
smartengine_enable_2pc
smartengine_estimate_cost_depth
smartengine_flush_delete_percent
smartengine_flush_delete_percent_trigger
smartengine_flush_delete_record_trigger
smartengine_flush_log_at_trx_commit
smartengine_flush_memtable
smartengine_flush_threads
smartengine_hotbackup
smartengine_idle_tasks_schedule_time
smartengine_level0_file_num_compaction_trigger
smartengine_level0_layer_num_compaction_trigger
smartengine_level1_extents_major_compaction_trigger
smartengine_level2_usage_percent
smartengine_level_compaction_dynamic_level_bytes
smartengine_lock_scanned_rows
smartengine_lock_wait_timeout
smartengine_master_thread_compaction_enabled
smartengine_master_thread_monitor_interval_ms
smartengine_max_background_dumps
smartengine_max_free_extent_percent
smartengine_max_row_locks
smartengine_max_shrink_extent_count
smartengine_max_write_buffer_number_to_maintain
smartengine_memtable_size
smartengine_min_write_buffer_number_to_merge
smartengine_mutex_backtrace_threshold_ns
smartengine_parallel_flush_log
smartengine_parallel_read_threads
smartengine_parallel_recovery_thread_num
smartengine_parallel_wal_recovery
smartengine_pause_background_work
smartengine_persistent_cache_dir
smartengine_persistent_cache_mode
smartengine_persistent_cache_size
smartengine_purge_invalid_subtable_bg
smartengine_query_trace_print_slow
smartengine_query_trace_sum
smartengine_query_trace_threshold_time
smartengine_rate_limiter_bytes_per_sec
smartengine_reset_pending_shrink
smartengine_row_cache_size
smartengine_scan_add_blocks_limit
smartengine_shrink_allocate_interval
smartengine_shrink_table_space
smartengine_sort_buffer_size
smartengine_stats_dump_period_sec
smartengine_strict_collation_check
smartengine_strict_collation_exceptions
smartengine_table_cache_numshardbits
smartengine_table_cache_size
smartengine_total_max_shrink_extent_count
smartengine_total_memtable_size
smartengine_total_wal_size
smartengine_unsafe_for_binlog
smartengine_wal_dir
smartengine_wal_recovery_mode
smartengine_write_disable_wal
snapshot_archive
snapshot_archive_dir
snapshot_archive_expire_auto_purge
snapshot_archive_expire_seconds
snapshot_archive_innodb_tar_mode
snapshot_archive_on_objectstore
snapshot_archive_period
snapshot_archive_smartengine_backup_checkpoint
snapshot_archive_smartengine_tar_mode
table_on_objectstore
wesql_version

New Status Variables

Com_show_consensuslogs
Com_raft_replication_start
Com_raft_replication_stop
Com_native_admin_proc
Com_native_trans_proc
Com_show_consensuslog_events
Smartengine_block_cache_miss
Smartengine_block_cache_hit
Smartengine_block_cache_add
Smartengine_block_cache_index_miss
Smartengine_block_cache_index_hit
Smartengine_block_cache_filter_miss
Smartengine_block_cache_filter_hit
Smartengine_block_cache_data_miss
Smartengine_block_cache_data_hit
Smartengine_row_cache_add
Smartengine_row_cache_hit
Smartengine_row_cache_miss
Smartengine_memtable_hit
Smartengine_memtable_miss
Smartengine_number_keys_written
Smartengine_number_keys_read
Smartengine_number_keys_updated
Smartengine_bytes_written
Smartengine_bytes_read
Smartengine_block_cachecompressed_miss
Smartengine_block_cachecompressed_hit
Smartengine_wal_synced
Smartengine_wal_bytes
Smartengine_write_self
Smartengine_write_other
Smartengine_write_wal
Smartengine_number_superversion_acquires
Smartengine_number_superversion_releases
Smartengine_number_superversion_cleanups
Smartengine_number_block_not_compressed
Smartengine_snapshot_conflict_errors
Smartengine_wal_group_syncs
Smartengine_rows_deleted
Smartengine_rows_inserted
Smartengine_rows_updated
Smartengine_rows_read
Smartengine_system_rows_deleted
Smartengine_system_rows_inserted
Smartengine_system_rows_updated
Smartengine_system_rows_read
Smartengine_max_level0_layers
Smartengine_max_imm_numbers
Smartengine_max_level0_fragmentation_rate
Smartengine_max_level1_fragmentation_rate
Smartengine_max_level2_fragmentation_rate
Smartengine_max_level0_delete_percent
Smartengine_max_level1_delete_percent
Smartengine_max_level2_delete_percent
Smartengine_all_flush_megabytes
Smartengine_all_compaction_megabytes
Smartengine_top1_subtable_size
Smartengine_top2_subtable_size
Smartengine_top3_subtable_size
Smartengine_top1_mod_mem_info
Smartengine_top2_mod_mem_info
Smartengine_top3_mod_mem_info
Smartengine_global_external_fragmentation_rate
Smartengine_write_transaction_count
Smartengine_pipeline_group_count
Smartengine_pipeline_group_wait_timeout_count
Smartengine_pipeline_copy_log_size
Smartengine_pipeline_copy_log_count
Smartengine_pipeline_flush_log_size
Smartengine_pipeline_flush_log_count
Smartengine_pipeline_flush_log_sync_count
Smartengine_pipeline_flush_log_not_sync_count

Managing SQL Drift: Ensuring Stability in Database Transitions

SQL drift is a significant challenge that occurs when SQL statements from an existing system produce unexpected results after migration to a new environment or system. These issues manifest in several critical ways: SQL statements may generate new execution errors, suffer significant performance degradation, or return different results, undermining data integrity. Such challenges extend beyond simple compatibility issues, stemming from variations in database engines, optimization strategies, and SQL implementations. SQL drift represents a fundamental shift in how SQL behaves across platforms and versions: whether during on-premises to cloud migrations, transitions to managed services, database vendor switches, or routine version upgrades, it is a critical consideration for data-driven applications.

SQL drift frequently occurs during:

  • On-premises to cloud migrations
  • Cloud to managed service transitions
  • Cross-product migrations (e.g., switching database vendors)
  • Database version upgrades
  • Platform modernization efforts

The implications of SQL drift can be significant, leading to application instability, increased operational costs, and delayed migration timelines. The impact often extends to compromised data quality and results in a degraded user experience as systems become less reliable and responsive. Successfully managing SQL drift involves four key stages:

  1. Identification
  2. Prioritization
  3. Correction
  4. Validation

Identification is the critical first step in managing SQL drift, focusing on systematically discovering potential issues. This phase involves detecting SQL statements that may behave differently in the new environment, analyzing syntax compatibility across platforms, establishing performance baselines, and validating data outputs to ensure consistency.
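As one concrete (and deliberately simple) identification technique, the running workload can be captured with the MySQL general query log for later replay and comparison; a minimal sketch, with an illustrative file path:

SET GLOBAL general_log_file = '/var/log/mysql/baseline-capture.log';
SET GLOBAL general_log = 'ON';
-- run representative application traffic, then:
SET GLOBAL general_log = 'OFF';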

Prioritization involves evaluating SQL drift issues based on business impact, risk assessment, resource allocation, and migration scheduling to determine the optimal order for resolution.

Correction addresses SQL drift through code remediation, performance optimization, syntax updates, and developing alternative solutions when necessary.

Validation confirms SQL drift corrections through comprehensive testing, performance verification against established baselines, and data integrity checks to ensure the corrected SQL maintains its intended functionality.
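As a simple sketch of one such data integrity check, the same checksum can be computed in the source and target environments and compared (the table name is illustrative):

-- run in both environments; the results should match
CHECKSUM TABLE wp_posts EXTENDED;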

An effective way to demonstrate the impact of SQL drift is with a sample collection of SQL statements executed across different versions of MySQL. The End of Life (EOL) of MySQL 5.7, coupled with AWS RDS and Aurora beginning paid extended support in 2024, has increased costs for organizations that are not proactive in managing database migrations. This situation is particularly common in development-focused teams that lack dedicated architecture and operations resources.

A MySQL demonstration of SQL Drift

Using a subset of SQL statements executed in MySQL 5.7 and subsequent MySQL versions 8.0, 8.4, 9.0, and 9.1, Next BaseLine can examine the impact of SQL drift. The output below shows the changing state of errors, deprecations, warnings, and notices for the 42 example SQL statements.

Example Output from Next BaseLine

In MySQL 5.7, the use of the keyword SQL_NO_CACHE in an SQL statement presents as a deprecation warning.

17 Deprecations
ID: 5, Hash: f31f2e99b2
  SQL: "SELECT SQL_NO_CACHE 1;"
  Deprecation: (1681) 'SQL_NO_CACHE' is deprecated and will be removed in a future release.

In MySQL 8.0, the MySQL Query Cache was removed; however, the use of SQL_NO_CACHE in SQL statements is still valid. Even in the next GA version, 8.4, this SQL keyword remains on the deprecated list, and it continues to be deprecated in the current 9.1 innovation release.
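This is easy to reproduce from the client, where SHOW WARNINGS surfaces the same 1681 deprecation:

mysql> SELECT SQL_NO_CACHE 1;
mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Warning
   Code: 1681
Message: 'SQL_NO_CACHE' is deprecated and will be removed in a future release.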

Different examples of deprecated functions are ENCRYPT and DES_ENCRYPT.

ID: 17, Hash: 947fcef53a
  SQL: "SELECT ENCRYPT('BaseLine',1);"
  Deprecation: (1287) 'ENCRYPT' is deprecated and will be removed in a future release. Please use AES_ENCRYPT instead
ID: 18, Hash: 364c0ffbf4
  SQL: "SELECT DES_ENCRYPT('BaseLine');"
  Deprecation: (1287) 'DES_ENCRYPT' is deprecated and will be removed in a future release. Please use AES_ENCRYPT instead

In MySQL 8.0, these SQL statements produce a hard error. Notably, they present as routines not found in the current schema rather than as a “FUNCTION does not exist” error. (More on this later.)

ID: 17, Hash: 947fcef53a
  SQL: "SELECT ENCRYPT('BaseLine',1);"
  Error 1370 (42000): execute command denied to user 'nextbaseline'@'%' for routine 'airport.ENCRYPT'
ID: 18, Hash: 364c0ffbf4
  SQL: "SELECT DES_ENCRYPT('BaseLine');"
  Error 1370 (42000): execute command denied to user 'nextbaseline'@'%' for routine 'airport.DES_ENCRYPT'

Some example GIS SQL statements also present as deprecated in MySQL 5.7; however, each carries a different error number.

ID: 19, Hash: f319748e0c
  SQL: "SELECT CONTAINS(ST_GeomFromText('POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))'), ST_GeomFromText('POINT(5 5)'));"
  Deprecation: (1287) 'CONTAINS' is deprecated and will be removed in a future release. Please use MBRCONTAINS instead
ID: 20, Hash: d686267b19
  SQL: "SELECT ST_GeomFromWKB(Point(0, 0));"
  Deprecation: (3195) st_geometryfromwkb(geometry) is deprecated and will be replaced by st_srid(geometry, 0) in a future version. Use st_geometryfromwkb(st_aswkb(geometry), 0) instead.

In MySQL 8.0+, these two deprecated statements produce different error messages.

ID: 19, Hash: f319748e0c
  SQL: "SELECT CONTAINS(ST_GeomFromText('POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))'), ST_GeomFromText('POINT(5 5)'));"
  Error 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(ST_GeomFromText('POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))'), ST_GeomFromText('POI' at line 1
ID: 20, Hash: d686267b19
  SQL: "SELECT ST_GeomFromWKB(Point(0, 0));"
  Error 3037 (22023): Invalid GIS data provided to function st_geomfromwkb.

Migrating a WordPress site

A more realistic example involves taking a production workload, such as WordPress running on a self-hosted MySQL 5.7 server, and assessing the potential impact of switching to MySQL 8.0 without upgrading the application (not a recommended approach). We have collected representative production SQL statements for this WordPress setup, referred to as a BaseLine.

After collecting SQL traffic and testing this workload against a MySQL 5.7 environment, previously unnoticed SQL warnings were highlighted for the team.

When executed against an upgraded MySQL 8.0 instance, problematic SQL statements were immediately identified. For a larger, more complex product, this process would help prioritize where resources are most needed.

A modern cloud database implementation

Finally, let’s consider TiDB from PingCAP as an example of validating your application against a cloud implementation. Using the same small set of 42 SQL statements, TiDB has taken a proactive approach by entirely eliminating warnings in its MySQL protocol implementation. In TiDB, SQL statements are either valid syntax or produce a hard error.

What was a deprecation for ENCRYPT is now a hard error. A more accurate error message is also provided: ‘FUNCTION does not exist’.

ID: 17, Hash: 947fcef53a
  SQL: "SELECT ENCRYPT('BaseLine',1);"
  Error 1305 (42000): FUNCTION ENCRYPT does not exist
ID: 18, Hash: 364c0ffbf4
  SQL: "SELECT DES_ENCRYPT('BaseLine');"
  Error 1305 (42000): FUNCTION DES_ENCRYPT does not exist

In MySQL 5.7, ENCODE was deprecated, and in MySQL 8.0+ it was removed. In TiDB, it remains a valid function.
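A minimal sketch of this divergence (annotations are mine; as noted earlier, the exact MySQL 8.0 error presentation depends on the environment):

SELECT ENCODE('BaseLine', 'salt');
-- MySQL 5.7: succeeds, with deprecation warning 1287
-- MySQL 8.0+: fails with a hard error (the function was removed)
-- TiDB: succeeds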

TiDB also produces some interesting error-message artifacts not seen with MySQL. An example is Error 1235 ... has only noop implementation in tidb now .... This message, however, shows that a setting can change the status of these SQL statements.

ID: 14, Hash: dbcb4b05a2
  SQL: "SELECT table_name, count(*) FROM information_schema.tables GROUP BY table_name ASC;"
  Error 1235 (42000): function GROUP BY expr ASC|DESC has only noop implementation in tidb now, use tidb_enable_noop_functions to enable these functions
...
ID: 26, Hash: 7369c77d51
  SQL: "SELECT SQL_CALC_FOUND_ROWS * FROM information_schema.schemata;"
  Error 1235 (42000): function SQL_CALC_FOUND_ROWS has only noop implementation in tidb now, use tidb_enable_noop_functions to enable these functions
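As the error text suggests, these statements can be accepted as no-ops by enabling the named variable; a minimal sketch:

SET SESSION tidb_enable_noop_functions = 'ON';
SELECT SQL_CALC_FOUND_ROWS * FROM information_schema.schemata;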

Even during development, an unintended bug in early testing resulted in an interesting error using TiDB.

ID: 31, Hash: 9cae50cbfc
  SQL: "SELECT DATE('2024-01-01   10:00:00'); /* Example of bad data causing warning */SELECT 'abc' AS full;"
  Error 8130 (HY000): client has multi-statement capability disabled. Run SET GLOBAL tidb_multi_statement_mode='ON' after you understand the security risk
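As the error text itself advises, this behavior is gated by a system variable that can be enabled once the security risk is understood:

SET GLOBAL tidb_multi_statement_mode = 'ON';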

Conclusion

Next BaseLine is now available in limited beta. Eliminate the uncertainty around “Will the migration work?” by performing an independent risk assessment of your product in a migrated database environment before committing to ad-hoc engineering efforts. If you’re interested in seeing a demo with your own SQL workload, you can register here.

Next BaseLine currently supports MySQL, PostgreSQL, Oracle, and SQL Server RDBMS products, covering both self-hosted and cloud-managed implementations across AWS, GCP, Azure, and Alibaba. It supports multiple MySQL- and PostgreSQL-compatible databases, including TiDB, SingleStore, Neon Serverless, Nile, ElephantSQL, TimeScale, and more. Additional compatibility is available for Snowflake, ClickHouse, and DuckDB.