Binary Log Replayer

When using the replication slave stream, or the mysql command line client with mysqlbinlog output from a binary/relay log, all statements are executed in a single thread as quickly as possible.

I am seeking a tool to simulate the replay of the binary/relay log for a benchmark at a pace that is more representative of the original statements. For a simple example, if the binary log has 3 transactions in the first second, 2 transactions in the second second, and 5 transactions in the third second, I want the replay to take roughly 3 seconds, not run as fast as possible (which would be sub-second). The tool should wait out the remainder of each second before processing the SQL statements in the incoming stream.

Does anybody know of a tool that currently provides this type of functionality? Any input is appreciated before I create my own.

Speaking in Denver

Following a heavy schedule in the last month speaking in Tokyo, Beijing, Manila and Auckland, it is nice to be on home soil for my upcoming speaking engagements. I will be in Denver, Colorado for RMOUG 2012 from February 14-16, 2012, where I will be speaking about the essentials of MySQL security.

I hope to also organize another presentation in the area for the local MySQL users group. More to follow.

And a friendly reminder: the annual MySQL conference is on again, same place (Santa Clara), same dates (April 10-12), same great speakers (too many to name), just a new primary sponsor. Why not submit a paper to present? The call for papers is still open, closing on December 5. More information at Percona Live MySQL Conference & Expo 2012.

Do you have a MySQL horror story to share?

I am looking for a few more unique examples to add to the final chapter of my upcoming book on MySQL Backup & Recovery. If you would like to share your fun experience and receive a mention and a free copy, please let me know via a comment. If you would like to share but not have your comment published, please note this at the top of your feedback.

Thanks for helping to contribute to a detailed list of what could go wrong and how to be prepared for a MySQL disaster.

NoSQL from a RDBMS company

Oracle has announced an open source product for the NoSQL space, the Oracle NoSQL Database. Unlike other popular products including Redis, MongoDB, Cassandra, Voldemort and many others, Oracle has set a benchmark on the features that are truly necessary for highly available data systems.

Many products in the NoSQL space have told you that consistency is not needed, that eventual consistency is good enough, and that transactions are not performant enough to include as a feature. No standards exist, there is no common interface for communication, and there are no key features that products aim to meet or better. With this product, built-in features including transactions, replicated data and failover set a bar that other open source NoSQL products will need to match.

Oracle NoSQL Database is a key-value store, supporting a major/minor key for co-locating regularly accessed information for more consistent data retrieval. The API (built in Java) supports GET, PUT and DEL operations. The system is designed to have no single point of failure and to support a node failure without impact. The replication factor is reported to enable up to 7 copies of information, which would be a feature to support cross data center management. The database driver is latency aware, so it can support load balancing operations for optimal performance.

I am excited to hear about this and am looking forward to evaluating the software. I will be watching more closely how the integration of MySQL and Oracle NoSQL can become an offering for startups and Web 2.0.

The Effective MySQL Book Series

Announced on Sunday at Oracle Open World 2011 is the release of the Effective MySQL book series, starting with the “Optimizing SQL Statements” title. The goal of the Effective MySQL series is a highly practical, concise and topic-specific reference providing applicable knowledge on each page. A feedback comment provided today was “no fluff”, which is a great comment to reinforce the practical nature of the series.

Details on the Effective MySQL Optimizing SQL Statements page include a sample chapter, code downloads and purchase links for print and e-books at Amazon, McGraw-Hill and Barnes & Noble.

Visualizing reqstat

The reqstat tool was written to provide a vmstat-like output of total web requests happening in real time. This really lightweight monitoring leverages memcached and has a trivial impact for immediate benefit.

  $ ./reqstat 5 5
  epoch,time,rps,avg_req,last,%comp,---,threshold,exceed,last_excess
  1307723122,162522,25,125.92,75.25,48,---,150,9,175.55
  1307723127,162527,24,107.33,6.97,48,---,150,6,188.45
  1307723132,162532,25,118.39,97.63,50,---,150,8,151.37
  1307723137,162537,22,120.51,88.62,42,---,150,5,168.56
  1307723142,162542,26,106.62,6.12,51,---,150,6,167.81

While this is useful, and I can see 22-26 requests per second averaging 106-120 ms, visualizing the data makes more information immediately apparent.

It is easy to look at an average and lose sight of the larger picture. What are the outliers, and how many are there? Visualizing larger samples (a later example will show 10,000 rps across multiple servers) shows that granularity is also critical.

This graphic is produced with Flot, a very simple JavaScript library. You can also use gnuplot; an example script is included in github.

This output is the result of benchmarking, generated from reqstat output with a script in my monitor git repository.

Visualizing crowd sourcing data

At the closing keynote of the recent Strata Summit in New York, O’Reilly Media founder Tim O’Reilly showed a representation of crowd sourced data on Wikipedia of the 2011 Tōhoku earthquake and tsunami, showing a before and after picture of the page. While interesting, it did not represent what could be shown with the data.

Using the Wikipedia APIs and some features of my VisMarks startup, I was able to create a better representation showing an animation of the article over time. While this Wikipedia Earthquake Animation (on a different page for loading) shows a representation of only the first 1,000 revisions, it highlights one cool way to visualize crowd sourced data. Pay particular attention to the new language articles introduced, the images and the table of contents as different types of data being added.

While the likes of Twitter and Facebook can provide a stream of information on an emerging event, Wikipedia is unique in that individuals contribute to a single source of combined information. This removes all the noise of duplication. It does not remove the crud; however, as seen in this article, this is quickly removed by others in the community.

It is also cool to see the size of the article grow over time. Below is a graph of the first 24 hours.

These are simple examples of using public APIs and simple tools (in this case ImageMagick, gnuplot and some shell scripts) to tell a story with data visualization.

Reasons to use MySQL 5.5 Presentation

I recently gave a presentation at the New York Effective MySQL Meetup on the new features of MySQL 5.5 and some of the compelling reasons to upgrade. There are also a number of new MySQL variables that can have a dramatic effect on performance in a highly transactional environment; innodb_buffer_pool_instances and innodb_purge_threads are just two to consider.
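
As a minimal sketch, you can confirm your current settings with the statements below. Note that neither variable is dynamic in MySQL 5.5; both must be set in my.cnf before server startup (the values in the comment are illustrative only).

-- Verify the current values of the two variables discussed above.
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_instances';
SHOW GLOBAL VARIABLES LIKE 'innodb_purge_threads';
-- Set at server startup in my.cnf, for example:
--   [mysqld]
--   innodb_buffer_pool_instances = 4
--   innodb_purge_threads = 1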

For more information on all the new variables, status, reserved words and benchmarks of new features you can Download Presentation Slides.

Utilizing multiple indexes per MySQL table join

Historically it was considered that MySQL would generally use only one index per referenced table in a SQL query. In MySQL 5.0 the introduction of index merge enabled, under certain conditions, the use of two indexes; however this could result in worse performance than creating a better combined index. In MySQL 5.1 it became possible to control these optimizations with the optimizer_switch system variable.

However, in explaining how to utilize the intersection, union and sort union algorithms in queries, I discovered that MySQL could use three indexes for one given table.

        Extra: Using union(name,intersect(founded,type)); Using where

I was not aware of this.
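
As a hedged sketch only, the table and column names below are inferred from the Extra output above and the data is hypothetical, but this is the shape of query that can produce such a plan:

-- Assuming single column indexes on name, founded and type, the OR of one
-- index with the AND of the other two can drive a three index merge plan.
EXPLAIN SELECT * FROM company
WHERE name = 'Acme'
OR (founded = 2000 AND type = 'LLC');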

Extra: Using Index

Many people consider this information in the MySQL Query Execution Plan (QEP) to indicate that the referenced table is using an index. It actually means that ONLY the index is used. For larger and more frequent queries this can provide a significant performance boost.
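
For example, here is a minimal sketch with purely illustrative table and index names:

-- With an index containing every column the query references, the QEP
-- Extra column reports "Using index" and no data row lookup is required.
ALTER TABLE orders ADD INDEX covering (user_id, status, created);
EXPLAIN SELECT user_id, status, created
FROM orders
WHERE user_id = 42;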

In a recent example, adding an index dropped a query from 190ms to 6ms. However, adding a better index dropped that 6ms query to 1.2ms. When executed 100s/1000s of times per second, these millisecond improvements can have a huge benefit in greater scalability. While people often tune slow running queries, in a well tuned system shaving milliseconds off queries, in this example making a 6ms query 80% faster, is a far greater improvement.

You can get a detailed explanation of how to identify, create and verify covering indexes from my Percona Live presentation Improving performance with better indexes where I also include another great 10 table join example, reducing a query running 20,000+ times per second from 175ms to 10ms.

Why SQL_MODE is important

Today was another example of where a correct SQL_MODE saved customer data from being corrupted. By default, MySQL does not enforce data integrity. It allows what is called silent truncation, where the result of what you INSERT or UPDATE does not represent truth. NOTE: I see very few customers ever have this correctly configured; those that do have actually listened to my advice.

If you do not read any further, your production MySQL environments should be running with, at the bare minimum, SQL_MODE=STRICT_ALL_TABLES; however I would also advocate additional SQL_MODE settings.
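
For example, a minimal sketch for a running server (persist the setting in my.cnf so it survives a restart):

-- Enable strict mode globally and verify; new connections inherit it.
SET GLOBAL sql_mode = 'STRICT_ALL_TABLES';
SELECT @@GLOBAL.sql_mode;
-- In my.cnf:
--   [mysqld]
--   sql_mode = STRICT_ALL_TABLES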

For this example, some modified undesirable code attempted to reduce a counter by 1, however because of an UNSIGNED data type and a correctly set SQL_MODE, the application produced an error and data was not corrupted.

This is what should happen with your SQL.

mysql> update stats set loss_count=loss_count - 1 where user_id=42;
ERROR 1690 (22003): BIGINT UNSIGNED value is out of range in '(`db`.`stats`.`loss_count` - 1)'

It is interesting to note that the error message is actually misleading. The data type for the column is SMALLINT, however the error message prompted an unnecessary schema verification. Even with the calculation, I would have assumed the inherited data type would be the column definition before the subtraction. If you try to set the value directly to a negative number you get a better message.

mysql> update stats set loss_count=-1 where user_id=42;
ERROR 1264 (22003): Out of range value for column 'loss_count' at row 1

So why is this a big deal? If your MySQL instance is not using a strict SQL_MODE, the result would have been a value of 0, and this would never highlight the bug in the code.

I should point out that only Drizzle of the MySQL forks and variants has tightened up this data integrity; however I have to agree to disagree with their decision to drop UNSIGNED. While I understand the internal code simplification behind the decision, without check constraints in MySQL, UNSIGNED is a saving grace. In this example, Drizzle would not have reported an error.

Percona Live New York is underway

Today we have a dedicated MySQL conference in New York with Percona Live. It is great to see an overflowing room in the opening keynote. With over 20 speakers and 4 dedicated tracks there is a lot of content for attendees.

With all the confusion over conference ownership since the Oracle acquisition I applaud Percona for taking an initiative, first in San Francisco and now here in New York. Also announced today is the next Percona Live in London which is great for the MySQL ecosystem in Europe.

Another reason to avoid RDS

My list of reasons for never using or recommending Amazon’s MySQL RDS service grows every time I experience problems with customers. This was an interesting and still unresolved issue.

ERROR 126 (HY000): Incorrect key file for table '/rdsdbdata/tmp/#sql_5b7_1.MYI'; try to repair it

You may notice this error references a MyISAM table. The MySQL database is version 5.5, uses all InnoDB tables and is very small, 100MB in total size.
What is happening is that MySQL is generating a temporary table, and this table is being written to disk. I am unable to change the code to improve the query causing this disk I/O.
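
On a self-managed server, this behavior would normally be confirmed with the standard status counters and variables below; with RDS this is about all the visibility you get at the SQL level.

-- How many implicit temporary tables have been created, and how many on disk.
SHOW GLOBAL STATUS LIKE 'Created_tmp%';
-- An internal temporary table is written to disk when it exceeds the smaller
-- of these two values, or when it contains TEXT/BLOB columns.
SHOW GLOBAL VARIABLES LIKE 'tmp_table_size';
SHOW GLOBAL VARIABLES LIKE 'max_heap_table_size';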

What I cannot understand and have no ability to diagnose is why this error occurs only sometimes, and generally when the database is under additional system load. With RDS you have no visibility of the server running the production database. While you have SQL access, an API for managing MySQL configuration options (not all MySQL variables, I might add), and limited system statistics via a graphical interface, all information about the system performance, disk configuration etc. is hidden and not accessible. This is a frustrating limitation of using RDS.

NOTE: While I cannot recommend RDS, I am very happy with AWS EC2 services when correctly configured. For a cloud based MySQL solution I would definitely recommend greater control over your MySQL database using EC2 and EBS.

query_cache_size=0 is not enough

Last week at the OUG Harmony conference, thanks to Dimitri Kravtchuk, I learned that setting query_cache_size=0 does not disable the Query Cache and remove its locking. You actually need to also set query_cache_type=0. This appears to be a bug, seen in the presently still open MySQL bugs database entry #38511.

My recommendation to customers now is to set both variables on all existing MySQL versions if you are not using the MySQL Query Cache.
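
That is, as a sketch using SET statements on a running server (place both settings in my.cnf to apply them from startup):

-- Disable the Query Cache and avoid its mutex overhead.
SET GLOBAL query_cache_type = 0;
SET GLOBAL query_cache_size = 0;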

Thanks to the Performance Schema in MySQL 5.5 for uncovering this. More information is in Dimitri’s detailed post, MySQL Performance: Using Performance Schema.

Details of all MySQL presentations at OUG Harmony 2011 in Helsinki, Finland.

Free does not mean cheap

Many organizations consider MySQL as a database because the initial license cost is free (*). Larger organizations that use Oracle and SQL Server also consider implementing MySQL as a means to lower the total cost of software infrastructure, due to the initial cost of new software licenses or expensive upgrades triggered by new hardware.

However, free software does not mean that services to support MySQL should also be free or even cheap. Recently a large multi-national customer wanted professional consulting and training for MySQL resources, and they seemed shocked that I wanted to charge a reasonable rate for professional services. My MySQL consulting rates are cheaper than those of industry MySQL peers, and also than those of similarly skilled resources providing Oracle consulting.

With over 10 years of experience in MySQL, many of them in consulting, and over 10 years of professional experience before MySQL, I am more than qualified to provide the best possible consulting available for architecture design, performance analysis and tuning, high availability, training and education. My significant contributions to the MySQL community, including blogging, speaking and presenting, are also not an indicator that companies should expect a significantly different or cheaper price for professional consulting.

Speaking at Percona Live New York

As the top MySQL expert in New York, it is great to join the team at Percona for the upcoming Percona Live in New York City on May 26th. As an invited speaker I am joining a select list of expert speakers including Harrison Fisk from Facebook, Kurt von Finck from Monty Program and Monty Taylor from the core Drizzle team.

My presentation will be on Improving Performance with Better Indexes where I will not only show how to apply indexes to improve query performance, but how to apply better indexes and provide even greater performance gains via a great technique known as a covering index.

Upcoming MySQL presentation in New York

On Tuesday I will be speaking in New York at the Effective MySQL Meetup group where I will be giving the presentation “MySQL Idiosyncrasies That Bite”. For more information and to register, check out the Meetup Event. There are just 10 seats left.

To promote the upcoming Percona Live event in New York, there will be a draw at the Meetup for a FREE ticket to the May 26th event.

Effective MySQL New York is the only MySQL group now operating in New York. Please join our group for the latest information and events for the MySQL community.

Basic scalability principles to avert downtime

In the press over the last two days has been the reported outage of Amazon Web Services Elastic Compute Cloud (EC2) in just one North Virginia data center. This has affected many large websites including FourSquare, Hootsuite, Reddit and Quora. A detailed list can be found at ec2disabled.com.

For these popular websites was this avoidable? Absolutely.

Basic scalability principles, if deployed in these systems' architecture, would have averted the significant downtime regardless of the development stack. While I work primarily in MySQL, these principles are not new, nor are they complicated; they are fundamental concepts in scalability that apply to any technology, including the popular MongoDB that is being used by a number of affected sites.

Scalability 101 involves some simple basic rules. Here are just two that seem to have been ignored by many affected by this recent AWS EC2 outage.

  1. Never put all your eggs in one basket. If you rely on AWS completely, or you rely on just one availability zone, that is putting all your eggs in one basket.
  2. Always keep your important data close to home. When it comes to what is most critical to your business, you need access and control of your information. At 5am when the CEO asks how long the business will be unavailable and what is needed to resolve the problem, the answer “We have no control over this and have no ETA” is not acceptable.

With a successful implementation and appropriate data redundancy you may not have an environment immediately available, however you have access to your important information and the ability to create one quickly. Many large hosting companies can provide additional hardware on near demand, especially if you have an initial minimal footprint. Indeed, using Amazon Web Services (AWS) as a means to avert a data center disaster is an ideal implementation of Infrastructure as a Service (IaaS). Even with this issue, organizations that had planned for this type of outage could have easily migrated to another AWS availability zone that was unaffected.

Furthermore, a system architecture that supports various levels of data availability and scalability ensures you can handle many more types of unavailability without the significant system downtime recently seen. There are many different types of availability and unavailability; know what your definition of downtime is. Supporting disasters should be a primary focus of scalability, not an afterthought.

As an expert in performance and scalability, I can help your organization in the design of a suitable architecture to support successful scalability and disaster recovery. This is not rocket science, however many organizations gamble without the expertise of a professional to ensure business viability.

Use Replication for backups? Are you schemas consistent?

Many people have a master/slave MySQL environment in various different topologies, and many use the slave as a backup.
Is your slave schema identical to your production schema? As long as an SQL statement completes without an error, your slave schema can differ. Common examples are different indexes or storage engines. However, if you use the slave as a backup, you want to ensure that when you recover, you are recovering a production environment, not a slave environment.

While the changes may be small, they can lead to different results. For a client, I found that the default value of a price field was 10.00 in one schema and 0.00 in another. Notwithstanding that using defaults for important fields and not defining them in an INSERT is a different issue, it highlighted that different schemas can easily exist.
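
Even before running a dedicated tool, a quick check (the schema name below is illustrative) is to run the following on both master and slave and diff the output:

-- Capture column definitions and defaults for comparison between servers.
SELECT table_name, column_name, column_type, column_default
FROM information_schema.columns
WHERE table_schema = 'mydb'
ORDER BY table_name, ordinal_position;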

My tool of choice, however, is SchemaSync. The reason I like this tool is that it provides migration scripts to upgrade or downgrade your schemas.

I vote for Planet MySQL moderation

How this happens or who does it is obviously a larger and more complex conversation, however it is better than involving innocent animals.

How it is that trivial $#*! gets voted up and attracts a string of comments I cannot explain; however, Planet MySQL should have practical MySQL related content. I would vote for the front page and RSS feed to show only moderated content, and content that did not pass the cut (and there is a lot of this) can be available on a less important page if necessary.

Rather than complaining like many open source communities, let us propose a way to improve the system we use.

Checked that MySQL backup log lately?

Running a MySQL backup and ensuring it completed successfully, with backup files in place, is not enough. In my B&R quiz from Checked your MySQL recovery process recently?, one important step is “Do you review your backup logs EVERY SINGLE day or have tested backup monitoring in place?”

This is what I found when reviewing a backup log for a client today.

mysqldump: Got error: 1142: SELECT,LOCK TABL command denied to user 'root'@'localhost' for table 'cond_instances' when using LOCK TABLES
mysqldump: Got error: 1142: SELECT,LOCK TABL command denied to user 'root'@'localhost' for table 'cond_instances' when using LOCK TABLES

The backup script was completing and backup files were in place (and were listed in the log file), however these errors were occurring.

Further investigation took less than one minute of actual work. This server runs multiple MySQL instances, and recently one instance was upgraded from MySQL 5.1 to MySQL 5.5, however the call to mysqldump was not. This error was the result of running the 5.1 mysqldump against a 5.5 database. Changing the hard coded path made the error message go away.

The greater problem is teaching people to understand the importance of backups.

Why are we standing still?

I wrote an email a week ago to several close friends titled, “[w]hy are we standing still?” I opened with “[y]ou are all good friends and you are all smart people. We need to work together more … I deal with startups all the time and I rarely find a team of smart articulated people, so why can’t we just do this?”

What was the motivation? I had just read online that startup A had 100 million users and startup B had just raised $42 million. I came to the realization that I am wasting my time trying to develop significantly better tools in my chosen profession because I would never achieve these types of numbers. Those tools would never need the performance and scalability expertise for which I am widely recognized. I am, as CNN Money recently wrote in “Tech companies desperate for ‘rockstar ninja’ engineers,” a “Rockstar Ninja” in the tech field.

I work with startups daily and most abuse the technology being used. Technology is not even the problem. In fact, for me and my group of close tech friends, it is almost trivial at times. What is complex, however, is the people and the process, combined with the one thing that nobody can escape, time. Ideas are also not the problem; I have plenty of those. I even have several ideas at various stages of actual implementation, including VisMarks – Visual Bookmarking and Mooify – The social barometer for moods, thoughts and emotions. These are each at different levels of initial completion. Are they world changers? Probably not; however, they are iterations of the process of what Eric Ries calls the Lean Startup. (Side note: the next Lean Startup Machine weekend event is in New York starting on April 1st.)

So back to my band of smart friends: why are we still standing still? Not being content with just talking, I took action and organized a 24 hour weekend collaboration at my own home for next weekend, code named JFDI/Bliss (there is an interesting story behind the name). I set aside not one project to tackle but three. And by tackle, I mean create, deploy, iterate and even complete an MVP, keeping in mind the technology is not the hard part. The first project is from my good friend Graham, the founder of Ultra Light Startups. We have had many conversations about our respective ideas in past years, discussing different potential projects. This project is actually referenced by our second project, from another friend, John (uBlanket is his current startup), which we discussed just last week over dinner. The third, my own VisMarks project, needs only one technical hurdle solved and some user interface design to complete initial functionality. We will also be reviewing and integrating technology from the Lean Startup Bundle for SXSW for the Lean Startup Challenge.

I can easily provide the food and drink, internet, power, a large finished basement area, a back yard if the weather is good, bedding (it’s a 24 hr event) and even some of my world class beer collection, so there are no physical hurdles or expenses to start working together. That is our lean, ultralight startup. We have two goals: first, to be able to collaborate in person, because thought can be multiplied exponentially when discussing any idea or problem; and second, to have fun.

So it all sounds easy, right? Wrong. What’s missing? Just as the capability of being able to code is only a small portion of being a qualified and competent developer, creating the technology is only one portion of a successful startup. At a recent Ultra Light Startups panel discussion, Managing Director of TechStars New York, David Tisch (@davetisch), made some key points that highlighted the resource components of a successful startup. The ideals of this NYC accelerator are mentorship, network and exposure. We need all three. When asked what is the ideal skill set for a team, his response was three people: one business, one technical and one product person. We need those as well. However, when David was asked what is most important when choosing startups for their program, the answer was the people on the team. Important considerations are how you work together, what you have done together as a team, and where you most need help. For the last question the answer is easy. We need help and mentoring in areas including marketing, business development, legal, accounting, PR, promotion, funding, sales, leadership, management, vision, etc. All of these cost money we do not presently have and are necessary to get the traction of millions of users.

The goal should not be just to become rich, but to be happy, have a lot of fun and make a difference in the world in the small way we know best, technology.

There are many innovators in our industry; however, the following are a few I follow very closely. These include Dave McClure (@davemcclure), whom I followed for a long time before we first met at the inaugural Rethink Hawaii event in 2009. Dave’s AARRR startup metrics for pirates approach is something I share with many clients that do not see the critical need of tracking what their leads do from initial acquisition. This is necessary to help answer questions including what is your total cost of acquisition of a paying client and what is the best return on investment.

Eric Ries (@ericries) is a person I have followed for over a year now, reading many great posts including What is a startup?, Four (not five) myths about the Lean Startup and Revisiting the Software Design Manifesto, to name a few.

Finally, the Kauffman Foundation (@kauffmanfdn) provides good resources and opportunities, including the recent 2011 State of Entrepreneurship Address in Washington DC. Entrepreneurs create new businesses, new businesses are the greatest source of jobs, and this creates a better economy. Entrepreneurship is also not just about creating something new; it is also about finding better ways of doing things we presently do. Their work with immigration reform and involvement with the Startup Visa and Startup America are issues close to my own heart.

I have easily been distracted from my day job and upcoming speaking presentations researching work for our initial kickoff. Excitement is a great motivator for achieving something great.

MySQL conference schedule

I am one of the crazy individuals(*) who will be speaking at both the regular O’Reilly MySQL Conference and the IOUG Collaborate conference, both being held in the second week of April. My 4 presentations are:

Upcoming NY Presentation – How Better Indexes Save You Money

For all those in New York, this is an upcoming MySQL presentation held in conjunction with our colleagues at General Assembly on March 22nd, 2011.

This presentation, “How Better Indexes Save You Money”, will discuss how one simple technique can result in huge MySQL performance improvements with zero code changes necessary. Many people think they know indexes; however MySQL and the MySQL storage engines have some specifics that differ from more traditional RDBMS products. Learn some of the key analysis and verification techniques and be able to see immediate potential results in performance.

You can find more details at Meetup.com EffectiveMySQL. This new group is all about highly technical MySQL related content: “no fluff, just stuff”.

Your PHP installation appears to be missing the MySQL extension which is required by WordPress.

I recently deployed a new WordPress installation to my existing production webserver running Apache, MySQL and PHP for other websites, yet I was presented with the following message.

“Your PHP installation appears to be missing the MySQL extension which is required by WordPress.”

This thread at wordpress.org did not help me, however I was able to solve the problem. The thread is now marked as closed, which is poor form because I can’t share the solution I found.

My PHP configuration file did not have the following.

; php.ini
[PHP]
extension=mysql.so

Adding this and restarting Apache did not fix the problem.

The problem was more fundamental and required PHP to be recompiled. Originally PHP was configured with the ‘--with-mysqli’ option. The mysql extension requires the ‘--with-mysql’ option, and it is rather stupid that this dependency exists.

Recompiling PHP and adding the necessary extension were both necessary to get my new WordPress installation operational.

Part 2 – Simple lessons in improving scalability

Given the popular response from my first lesson in improving scalability where I detailed simple ways to eliminate unnecessary SQL, let me share another common bottleneck with MySQL scalability that can be instantly overcome.

Analyzing the writes that occur on a system can expose obvious potential bottlenecks. The MySQL Binary Log is a wealth of information that can be mined. Simple DML counts per table can be achieved with a single command.

Let’s look at the following example output of a production system:

mysqlbinlog /path/to/mysql-bin.000999 |
   grep -i -e "^update" -e "^insert" -e "^delete" -e "^replace" -e "^alter" |
   cut -c1-100 | tr '[A-Z]' '[a-z]' |
   sed -e "s/\t/ /g;s/\`//g;s/(.*$//;s/ set .*$//;s/ as .*$//" | sed -e "s/ where .*$//" |
   sort | uniq -c | sort -nr

Of the approx 100,000 DML statements we get the following breakdown.

  55283 update sessions
  25204 insert into sessions
  12610 update items
  10536 insert into item_categories
   7532 update users
   5168 delete from item_categories
 

More than 50% of the statements that are written to the binary log, and therefore replicated, are UPDATEs of the sessions table. A further 25% are INSERTs into the same table. This represents 75% of DML statements in just the two most frequent statements.

What is disappointing is that these statements do not belong in MySQL. This is an example of MySQL being abused for a purpose where other products are more suited. While there is an argument for using MySQL to store this data, the impact on MySQL memory management, backup/recovery, and slave replication throughput and lag can significantly affect the scalability of your important MySQL data.

What is observed here is session management, where a key-value store product should be used as an alternative. In most circumstances it is likely this information is not even required to be persisted. The obvious replacement here is memcached. If you do wish to persist this data, there is an ever increasing list of products including Redis, Tokyo Cabinet/Kyoto Cabinet, Membrain, Voldemort etc. that are specifically designed as key-value stores. Even the popular NoSQL MongoDB can easily be substituted to perform as a key-value session manager, with the added benefit of being a more fully functional product for other purposes.

This is a common mistake when you use a framework such as Ruby on Rails (RoR), PHP CodeIgniter and many others.

Optimizing UPDATE and DELETE statements

Updated Nov 2011. Check out my latest book on Optimizing SQL Statements for more information. MySQL 5.6.3 also now provides an EXPLAIN syntax for UPDATE and DELETE statements natively.

While most people look at performance optimizations for SELECT statements, UPDATE and DELETE statements are often overlooked. These can benefit from the same principles of analyzing the Query Execution Plan (QEP). You can only run EXPLAIN on a SELECT statement; however, it is possible to rewrite an UPDATE or DELETE statement to perform like a SELECT statement.

To optimize an UPDATE, look at the WHERE clause. If you are using the PRIMARY KEY, no further analysis is necessary. If you are not, it is of benefit to rewrite your UPDATE statement as a SELECT statement and obtain a QEP as previously detailed to ensure optimal indexes are used. For example:

UPDATE t
SET    c1 = 'x', c2 = 'y', c3 = 100
WHERE  c1 = 'x'
AND    d = CURDATE()

You can rewrite this UPDATE statement as a SELECT statement for using EXPLAIN:

EXPLAIN SELECT c1, c2, c3 FROM t WHERE c1 = 'x' AND d = CURDATE()

You should now apply the same principles as you would when optimizing SELECT statements.
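
The same rewrite applies to a DELETE statement. A hypothetical example:

-- DELETE FROM t WHERE c1 = 'x' AND d < CURDATE()
-- becomes, for analysis purposes:
EXPLAIN SELECT * FROM t WHERE c1 = 'x' AND d < CURDATE()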

Simple lessons in improving scalability

It can be very easy to improve scalability with a MySQL server by a few simple rules. Here is one of them.

“The most efficient way to improve an SQL statement is to eliminate it”

There are numerous ways to eliminate SQL statements; however, before I give a classic example that I’ve observed again with a client, let me explain the basic premise of why this improves scalability.

The MySQL kernel can only physically process a certain number of SQL statements in a given time period (e.g. per second). Regardless of the type of machine you have, there is a physical limit. If you eliminate SQL statements that are unwarranted and unnecessary, you automatically enable more important SQL statements to run. There are numerous other downstream effects, however this is the simple math: to run more SQL, reduce the number of SQL statements you need to run.
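
As a rough sketch of quantifying that limit for your own server, sample the statement counter twice and divide by the elapsed time:

-- Total statements executed since startup; sample twice, N seconds apart,
-- and divide the difference by N to approximate statements per second.
SHOW GLOBAL STATUS LIKE 'Questions';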

Here is the output of a small sample of analyzed TCP/IP packets via mk-query-digest.

# Rank Query ID           Response time Calls R/Call Apdx V/M   Item
# ==== ================== ============= ===== ====== ==== ===== ==========
#    1 0xD631CB919867DB50  0.0436 47.3%    92 0.0005 1.00  0.00 SELECT TTDOD
#    2 0x04FE01C5B31FD305  0.0258 27.9%   329 0.0001 1.00  0.00 ADMIN PING
#    3 0x93321857BCD8E771  0.0229 24.8%    36 0.0006 1.00  0.00 SELECT TTD

There are many problems here, including the Row At a Time (RAT) nature of the SQL and the excessive pings, however that’s a topic for another time. Let us look at the first statement.

SELECT `Date` FROM TTDOD WHERE ID = 9999;

That seems a simple enough query; however, let’s look at the table.

mysql> select count(*) from TTDOD;
+----------+
| count(*) |
+----------+
|        0 |
+----------+

In this case, the query will NEVER return any rows because the table is currently empty. Sure, this may change in the future; however, as this is more an exception processing situation, the simple act of managing the knowledge that this table rarely has any rows, and building a solution to inform the application of this, can completely eliminate the need for this query to ever be executed.

FYI, the above sample is from less than 2 seconds of sampling. Removing the first query reduces the number of queries executed in this time slice by 20%. Regardless of whether this is typical load or load during a batch job, the principle stands. We have not even started to look at what we can do with the next query.