MongoDB Experience: History

My first exposure to MongoDB was in July 2008 when I was a panelist on “A Panel on Cloud Computing” at the Entrepreneurs Round Table in New York. The panel included a representative from 10gen the company behind the open source database product and at the time Mongo was described as a full stack solution with the database being only one future component.

While I mentioned Mongo again in a blog in Nov 2008, it was not until Oct 6 2009 at the NoSQL event in New York where I saw a more stable product and a revised focus of development just on the database component.

As the moderator for the closing keynote “SQL v NOSQL” panel at Open SQL Camp 2009 in Portland, Oregon I had the chance to discuss MongoDB with the other products in the NoSQL space. Watch Video

In just the past few weeks, 3 people independently have mentioned MongoDB and asked for my input. I was disappointed to just miss the MongoNYC 2010 event.

While I have evaluated various new products in the key/value store and the schemaless space, my curiosity has been initially more with Cassandra and CouchDB.

Follow my journey as I explore in more detail the usage of mongoDB {name: “mongo”, type:”db”} via the mongodb tag on my blog.

Oracle resources for the MySQL Community

While I have spent a lot of time recently helping the MySQL community interact with and integrate with various Oracle User Groups including ODTUG, IOUG, NoCOUG, NYOUG, DAOG I thought I’d share some resources for the MySQL Community that wanted to know more about Oracle.

The Oracle family of products is huge. You only have to look at the acquisitions via Wikipedia to get an idea. The first thing is to narrow your search, e.g. Database, APEX, Middleware, BI, Hyperion, Financials, development via Java, PHP or Oracle Forms etc.

While Oracle is a commercial product you can download all software for FREE via Oracle Technology Network. There is also documentation, forums, blogs and events.

Some Oracle bloggers I have already been reading however I’m expanding my list. People you may want to consider include:

Cary Millsap,Lewis Cunningham, Debra Lilley, Dimitri Gielis,Duncan Mills, Edward Roske, Mark Rittman, Scott Spendolini, Tim Tow, Tom Kyte

If you want a comparison of the Oracle and MySQL community, be sure to also check out Sheeri Cabral’s keynote address at the 2010 MySQL User Conference for reference.

I'll have a MySQL shot to go!

Wednesday night of the MySQL track of ODTUG Kaleidoscope will include an evening with Last Comic Standing comedian, John Heffron. It should be great way to unwind after day 3 of the conference. Black vodka anybody.

Check out the MySQL Schedule for more information of presentations for the 4 days. More details is also available here.

When SET GLOBAL affects SESSION scope

We have all been caught out with using SET and not realizing that the default GLOBAL Scope (since 5.0.2) does not change the current SESSION scope.

I was not aware until today that changing GLOBAL scope has some exceptions that also automatically affect SESSION scope.

What I expected with a change in the GLOBAL scope is no affect SESSION scope. For example.

mysql> SHOW GLOBAL VARIABLES LIKE 'read_buffer_size';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| read_buffer_size | 131072 |
+------------------+--------+
1 row in set (0.00 sec)

mysql> SHOW SESSION VARIABLES LIKE 'read_buffer_size';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| read_buffer_size | 131072 |
+------------------+--------+
1 row in set (0.00 sec)

mysql> SET GLOBAL read_buffer_size=1024*256;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW GLOBAL VARIABLES LIKE 'read_buffer_size';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| read_buffer_size | 262144 |
+------------------+--------+
1 row in set (0.00 sec)

mysql> SHOW SESSION VARIABLES LIKE 'read_buffer_size';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| read_buffer_size | 131072 |
+------------------+--------+
1 row in set (0.00 sec)

However I was no prepared for this when changing an important variable for transaction management.

mysql> SHOW GLOBAL VARIABLES LIKE 'autocommit';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | ON    |
+---------------+-------+
1 row in set (0.00 sec)

mysql> SHOW SESSION VARIABLES LIKE 'autocommit';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | ON    |
+---------------+-------+
1 row in set (0.00 sec)

mysql> SET autocommit=0;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW GLOBAL VARIABLES LIKE 'autocommit';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | OFF   |
+---------------+-------+
1 row in set (0.00 sec)

mysql> SHOW SESSION VARIABLES LIKE 'autocommit';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | OFF   |
+---------------+-------+
1 row in set (0.00 sec)

However even more perplexing was the following message:

mysql> SET GLOBAL autocommit=0;
ERROR 1228 (HY000): Variable 'autocommit' is a SESSION variable and can't be used with SET GLOBAL

So this is another case were the definition of variables is not applicable in a GLOBAL level, yet the tools of the trade represent in some manner misleading information.
To prove my point, here is another new concurrent session started after the above.

mysql> SHOW GLOBAL VARIABLES LIKE 'autocommit';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | ON    |
+---------------+-------+
1 row in set (0.01 sec)

mysql> SHOW SESSION VARIABLES LIKE 'autocommit';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| autocommit    | ON    |
+---------------+-------+
1 row in set (0.00 sec)

The MySQL Documentation also had an incorrect specification with description regarding this scope, Bug #54215

Best Practices: Additional User Security

By default MySQL allows you to create user accounts and privileges with no password. In my earlier MySQL Best Practices: User Security I describe how to address the default installation empty passwords.

For new user accounts, you can improve this default behavior using the SQL_MODE variable, with a value of NO_AUTO_CREATE_USER. As detailed via the 5.1 Reference Manual

NO_AUTO_CREATE_USER

Prevent the GRANT statement from automatically creating new users if it would otherwise do so, unless a nonempty password also is specified.

Having set this variable I attempted to show the error of operation to demonstrate in my upcoming “MySQL Idiosyncrasies that bite” presentation.

Confirm Settings

mysql> show global variables like 'sql_mode';
+---------------+---------------------+
| Variable_name | Value               |
+---------------+---------------------+
| sql_mode      | NO_AUTO_CREATE_USER |
+---------------+---------------------+
1 row in set (0.00 sec)

mysql> show session variables like 'sql_mode';
+---------------+---------------------+
| Variable_name | Value               |
+---------------+---------------------+
| sql_mode      | NO_AUTO_CREATE_USER |
+---------------+---------------------+
1 row in set (0.00 sec)

Create error condition

mysql> CREATE USER superuser@localhost;
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT ALL ON *.* TO superuser@localhost;
Query OK, 0 rows affected (0.00 sec)
mysql> exit

What the? Surely this isn’t right.

$ mysql -usuperuser

mysql> SHOW GRANTS;
+--------------------------------------------------------+
| Grants for superuser@localhost                         |
+--------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'superuser'@'localhost' |
+--------------------------------------------------------+

mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 5.1.39    |
+-----------+

Well that’s broken functionality.

What should happen as described in Bug #43938 is a cryptic message as reproduced below.

mysql> GRANT SELECT ON foo.* TO 'geert12'@'localhost';
ERROR 1133 (42000): Can't find any matching row in the user table
mysql> GRANT SELECT ON *.* TO geert12@localhost IDENTIFIED BY 'foobar';
Query OK, 0 rows affected (0.00 sec)

It seems however that the user of CREATE USER first nullifies this expected behavior, hence new Bug #54208.

Eventually consistent Group Commit

Having just written an interview response about NoSQL concepts for a RDBMS audience it was poetic that an inconspicuous title “(4 of 3)” highlights that both a MySQL read scalable implementation via replication and a NoSQL solution can share a common lack of timely consistency of data. For the sake of Group Commit I hope my data is always consistent at some location at some point in time as soon as possible.

In attempting to comment to Kristian Nielsen’s Fixing MySQL group commit (part 4 of 3) I was forced to watch an ad before I could even add a comment. Go jump Live Journal, it’s quicker to write my own blog post.

And if anybody is still reading, I had just written the following.

“There is clearly a place for NoSQL solutions. The two primary types of products are a key/value store and a schema-less solution. You need to learn the strengths, benefits and weaknesses of both. For a RDBMS resource the lack of transactions, the lack of joins and the concept of eventually consistent can take some time to accept.”

mk-query-digest Tips – Showing all hosts & users

The Maatkit tools provide a suite of additional MySQL commands. There is one command I use constantly and that is mk-query-digest.

Unfortunately the documentation does leave a lot to be desired for usability. While throughout, it is a man page and not a user guide. Several of us have discussed writing better documentation however it’s always a matter of time. I have however learned a number of tips and I’d like to share them in smaller digests.

The first is showing additional display. Maatkit works on truncating per line output to a reasonable length of 73 characters?

One of those lines is the list of hosts that connected to MySQL for a query, for example.

# Hosts                  4 192.168.40... (2), 192.168.40... (2)... 2 more
# Hosts                  3 99.99.245.14 (12), 999.106.206.167 (6)... 1 more

The problem is I want to know what that 1 more is so I can gather a complete list of IP addresses that connect to this server. You do that with the –show-all=host argument.

Without

$ cat external.tcpdump | ./mk-query-digest --type tcpdump | grep Hosts | uniq -c
#
      1 # Hosts                  3 99.99.245.14 (12), 999.106.206.167 (6)... 1 more
      1 # Hosts                  1 99.99.139.140

With

$ cat external.tcpdump | ./mk-query-digest --type tcpdump --show-all=host | grep Hosts | uniq -c
      1 # Hosts                  3 99.99.245.14 (12), 999.106.206.167 (6), 99.99.139.140 (2)
      1 # Hosts                  1 99.99.139.140

You can apply the same principle to the Users as well with –show-all=user

$ cat external.tcpdump | ./mk-query-digest --type tcpdump  --show-all=user | grep Users | uniq -c
      1 # Users                  2 xxx (13), phpmysqlmo... (5)
     49 # Users                  1  xxx

The problem is a still gett a truncation of the name ‘phpmysqlmo…’ That’s the one thing I’m trying to uncover, because that IP and usernme are not valid permissions for this system.

ImageMagick on Mac OS X

Wanting to do some image manipulation I realized my Linux scripts don’t run under Mac OS X, as ImageMagick is not installed via my MacPorts.

However installation failed:

$ sudo port install imagemagick
--->  Computing dependencies for ImageMagick
--->  Verifying checksum(s) for xorg-libX11
Error: Checksum (md5) mismatch for libX11-1.3.3.tar.bz2
Error: Checksum (sha1) mismatch for libX11-1.3.3.tar.bz2
Error: Checksum (rmd160) mismatch for libX11-1.3.3.tar.bz2
Error: Target org.macports.checksum returned: Unable to verify file checksums
Error: The following dependencies failed to build: xorg-libXext xorg-libX11 xorg-libXt xorg-libsm xorg-libice
Error: Status 1 encountered during processing.
Before reporting a bug, first run the command again with the -d flag to get complete output.

Figuring that some of my packages may require upgrade:

$ sudo port selfupdate
sudo port -d upgrade outdated

The problem is this all failed. Turning to the FAQ it seemed what I needed to do was remove and re-install the offending package receiving the checksum error via the following syntax.

$ sudo port clean --all 
$ sudo port install 

It seemed I had to do this for several packages manually however in the end removing and installing a number of packages addressed the problem and now ImageMagick is happily running on Mac OS X

bash-3.2$ sudo port clean --all xorg-libX11
--->  Cleaning xorg-libX11
bash-3.2$ sudo port install xorg-libX11
---->  Computing dependencies for ImageMagick
--->  Verifying checksum(s) for xorg-libX11
Error: Checksum (md5) mismatch for libX11-1.3.3.tar.bz2
Error: Checksum (sha1) mismatch for libX11-1.3.3.tar.bz2
Error: Checksum (rmd160) mismatch for libX11-1.3.3.tar.bz2
Error: Target org.macports.checksum returned: Unable to verify file checksums
Error: The following dependencies failed to build: xorg-libXext xorg-libX11 xorg-libXt xorg-libsm xorg-libice
Error: Status 1 encountered during processing.
Before reporting a bug, first run the command again with the -d flag to get complete output.
bash-3.2$ sudo port clean --all libX11
Error: Port libX11 not found
Before reporting a bug, first run the command again with the -d flag to get complete output.
bash-3.2$ sudo port clean --all xorg-libX11
--->  Cleaning xorg-libX11
bash-3.2$ sudo port install xorg-libX11

tcpdump errors on FreeBSD for mk-query-digest

While I use this tcpdump command for MySQL query analysis with mk-query-digest, I found recently that it didn’t work on FreeBSD

$ tcpdump -i bge0 port 3306 -s 65535 -x -n -q -tttt -c 5
tcpdump: syntax error

It left me perplexed and reading the man page seemed to indicate my options were valid. I tried a few variances just to be sure without success.

$ tcpdump -i bge0 -c 5 port 3306 -x
tcpdump: syntax error
$ tcpdump -i bge0 -c 5 port 3306 -q
tcpdump: syntax error
$ tcpdump -i bge0 -c 5 port 3306 -tttt
tcpdump: syntax error

The solution was actually quite simple in the end, it had nothing to do with the commands, it had everything to do with the order of them. Placing port as the last option solved the problem.

$ tcpdump -i bge0 -s 65535 -x -n -q -tttt -c 5  port 3306
$ uname -a
FreeBSD db4.example.com 6.3-RELEASE-p3 FreeBSD 6.3-RELEASE-p3 #0: Wed Jul 16 05:13:50 EDT 200

MySQL Best Practices: User Security

It is critical that you do not use the default MySQL installation security, it’s simply insecure.

Default Installation

When installed, MySQL enables any user with physical permissions to the server to connect to the MySQL via unauthenticated users. MySQL also provides complete access to all super user privileges via the ‘root’ user with no default password.

$ mysql -uroot
mysql> SELECT host,user,password FROM mysql.user;
+--------------+------+-------------------------------------------+
| host         | user | password                                  |
+--------------+------+-------------------------------------------+
| localhost    | root |                                           |
| server.local | root |                                           |
| 127.0.0.1    | root |                                           |
| localhost    |      |                                           |
| server.local |      |                                           |
+--------------+------+-------------------------------------------+

What you see here are two types of users.

  • The ‘root’ user which has MySQL super user privileges for your server or ‘localhost’ connections with no password.
  • Unauthenticated users indicated by the blank ‘user’ column

The absolute minimum you should do, is run the provided optional command for immediate improvements mysql_secure_installation. When running this command, you’re prompted for the following
options — the output has been trimmed for presentations purposes.

$ mysql_secure_installation
Enter current password for root (enter for none):
Set root password? [Y/n] y
New password:
Re-enter new password:
Remove anonymous users? [Y/n] Y
Disallow root login remotely? [Y/n] Y
Remove test database and access to it? [Y/n] Y
Reload privilege tables now? [Y/n] Y

If you revisit permissions now, you’ll see what you would expect from a more initially secure installation.

mysql> SELECT host,user,password FROM mysql.user;
+-----------+------+-------------------------------------------+
| host      | user | password                                  |
+-----------+------+-------------------------------------------+
| localhost | root | *FDAF706717E70DB8DDAD0C5214B13770E1A80B0E |
+-----------+------+-------------------------------------------+

This is only the first step to hardening your MySQL instance and server.

Recommendations

The following are my recommendations for the minimum MySQL security permissions:

  • Always set a MySQL ‘root’ user password
  • Change the MySQL ‘root’ user id to a different name, e.g. ‘dba’
  • Only enable SUPER privileges to dba accounts, and only ever for ‘localhost’.
  • Application user permissions should be as restrictive as possible.
  • Never use ‘%’ for a hostname
  • Never use ALL TO *.*
  • Ideally the application should have at least two types of users, a read/write user and a read user.

There is a lot more information about physical Operating System security and the MySQL permission/privilege model to be discussed. One product I know of that help is SecuRich – The MySQL Security Package featuring roles, password history and many other cool functionalities.

References

A recent post by Lance Miller quoted the following.


I cant tell you how many times in the past 18 months that I’ve found real enterprises running vulnerable databases with default passwords, weak passwords and no real permissions management. It’s bad enough that the stats right now are this (so I guess I can tell you):
– 9 out of 10 organizations have a Microsoft SQL Database with a blank “sa” password (or an sa password of “sa”, “sql” or “password”)
– 9 out of 10 organizations have a Postgres Database with a default password
– 9 out of 10 organizations have a Sybase Database with a default password

The article didn’t include MySQL however some organizations don’t change the default password, probably not 9 of 10 in my experience.

More Information

MySQL Monitoring – What's really needed

The implementation of MySQL Monitoring is critical for any organization that uses a database and wants to avoid the inevitable disaster. There are 3 important components that all serve a key purpose to “MySQL Monitoring” in general:

  • Monitoring – Historical and graphical information
  • Alerting – Tell me when something is wrong
  • Dashboard – The State of NOW

Monitoring

There is no one option for Monitoring that is significantly better then another. A short list of what’s on offer can be found at http://monitoring-mysql.com/monitoring-products. What’s important is you have monitoring in place so historically you can review situations and compare across your servers and enabling the better identification of physical or database bottlenecks. My recommendations for products are Cacti which is packaged with most popular Linux distrubtions, so can be installed via a single apt-get/yum/rpm command, and the MySQL Cacti Templates. This is not the best product solution or combination, it’s in my opinion the most common and covers all the essential bases.

It’s best to define a different web server (i.e. publically accessible) to be the monitoring server rather then installing the web interface on any single DB server.

Alerting

Alerting is key to notify you of a problem without the need for somebody to be viewing a screen and see it happening via your monitoring. The identification of high CPU load, a disk nearing capacity, database locking etc often helps avoid a current problem before it becomes some level of disaster. Almost all companies use Nagios or a derivative such as the main fork Icigna or products that include Nagios like Opsview or Groundworks.

Dashboard

While both Monitoring and Alerting are necessary, they both however lack a key component necessary for successful administration. That is timing. Both of these earlier options sample, e.g. 1 minute or generally 5 minutes (by default), but problems can happen quickly. This is why each organization needs to have a Dashboard. I don’t know of any products here unless you try and adjust a monitoring product, but want’s needed is a very lightweight and very business centric single page of Green/Yellow/Red status’s of your environment including databases, webservers, response time and traffic etc. This is for the state of NOW. A Dashboard should sample every 5 to 10 seconds. I have seen larger and more successful companies have various home grown implementations. I developed a product for one company and it included the following on a single page. You can see the screen output in a presentation at http://www.slideshare.net/ronaldbradford/10x-performance-improvements. This included

  • 5 DB servers monitoring load average, ping time, database connections, active,free,locked, replication availability and lag
  • 5 web servers monitoring load average, ping time, apache connections
  • Application metrics monitoring 3 different page load times, and page size

It is often important to be able to identify a key problem and then drill down to this more quickly rather then the usual “the website is slow” question and having to investigate the same repetitive tasks. You need to automate, and be more pro-active in response especially to load and locking issues.

Advanced Monitoring

Information is only have the requirement, it is what you do with this information that determines how to be proactive rather then reactive. If I was the DBA of a company I’d do even more then these initial 3 steps which are a necessary base. I would also monitor for example:

  • Database size and growth. This is important to be preemptive about your capacity. Example SQL at http://ronaldbradford.com/mysql-dba/#allschemas
  • Error Log changes
  • Backup timing. This is important as your DB grows as it affects recovery.
  • Recovery timing
  • Gather raw MySQL status information because monitoring tools only capture what you ask it to do, not everything. While you may not analyze all now, you may want to in the future look back in time. Example scripts at http://ronaldbradford.com/mysql-dba/#log-stats
  • Hourly/Daily text reports. Producing a easy readable SHOW GLOBAL status report such as with statpack will for example enable me to know network throughtput in the DB, transaction throughput and key indicators of locking, disk access etc. While you may have a graphic interface, it’s a lot easier to automate and grep text reports.
  • Proactive restrictions. The Twitter failed whale is a great example of when the system moves closer to known limits, but before those limits they start limiting load. This includes for example to disable less critical but resource intensive functionality, e.g. people search. The also start rejecting connections so they do not reach a crash state. This could be proactively changing timeout values so the DB fails queries, and the webservers respond accordingly with a try again approach.

While these are important if you have only limited resources, too much information can be just as much of a burden then people just start ignoring the information and miss what’s important.

Finalized speakers list for Kaleidoscope conference

We have secured approval for our final two speakers and now have a full schedule for the 4 day MySQL track at ODTUG Kaleidoscope conference. The conference is in Washington DC from Monday June 28th to Thursday July 1st. Welcome to Josh Sled and Craig Sylvester that will be joining our existing list of speakers.

This conference will include 19 sessions of dedicated MySQL content from Monday thru Thursday by well qualified MySQL community members, as well a forums discussion and reception on Monday night. You don’t need to be an Oracle developer to get the benefit of this conference. We will offering a discount code for MySQL attendees in the upcoming days.

If you are in the DC area, the Monday night forum (known as the sundown sessions) as well as the reception are FREE for the MySQL community. This was a great jesture of the Oracle Developer Tools Users Group to openly invite the MySQL community to meet and interact. We ask that you register your name and email for confirmation of numbers.

Speakers List

  • Philip Antoniades, Oracle/MySQL
  • Ronald Bradford, 42SQL
  • Sheeri K. Cabral, The Pythian Group
  • Laine Campbell, PalominoDB
  • Patrick Galbraith, Northscale
  • Sarah Novotny, Blue Gecko
  • Padraig O’Sullivan, Akiba Technologies Inc.
  • Jay Pipes, Rackspace Cloud
  • Dossy Shiobara, Panoptic.com
  • Josh Sled, Oracle/MySQL
  • Craig Sylvester, Oracle/MySQL
  • Matt Yonkovit, Percona

References

Why is my database slow?

Not part of my Don’t Assume series, but when a client states “Why is my database slow””, you need to determine if indeed the database is slow.

Some simple tools come to the rescue here, one is Firebug. If a web page takes 5 seconds to load, but the .htm file takes 400ms, and the 100+ assets being downloaded from one base url, then is the database actually slow? Tuning the database will only improve the 400ms portion of 5,000ms download.

There some very simple tips here. MySQL is my domain expertise and I will not profess to improving the entire stack however perception is everything to a user and you can often do a lot. Some simple points include:

  • Know about blocking assets in your <head> element, e.g. .js files.
  • Streamline .js, .css and images to what’s needed. .e.g. download a 100k image only to resize to a thumbnail via style elements.
  • Sprites. Like many efficient but simple SQL statements, network overhead is your greatest expense.
  • Splitting images to a different domain.
  • Splitting images to multiple domains (e.g. 3 via CNAME only needed.) — Hint: Learn about the protocol
  • Cookieless domains for static assets
  • Lighter web container for static assets (e.g. nginx, lighttpd)
  • Know about caching, expires and etags
  • Stripping out http://ww.domain.com from all your internal links (that one alone saved 12% of HTML page size for a client). You may ask is that really a big deal, well in a high volume site the sooner you can release the socket on your webserver, the sooner you can start serving a different request.

Like tuning a database, some things work better then others, some require more testing then others, and consultants never tell you all the tricks.

References

As with everything in tuning, do your research and also determine what works in your environment and what doesn’t. Two excellent resources to start with are Steve Souders and Best Practices for Speeding Up Your Web Site by Yahoo.

More on understanding sort_buffer_size

There have been a few posts by Sheeri and Baron today on the MySQL sort_buffer_size variable. I wanted to add some more information about this buffer, what is impacted when it is changed, and what to do about it?

The first thing you need to know is the sort_buffer_size is a per session buffer. That is this memory is assigned per connection/thread. I’ve seen clients that set this assuming it’s a global buffer Don’t Assume – Per Session Buffers.

Second, internally in the OS usage independently of MySQL, there is a threshold > 256K. From Monty Taylor “if buffer is set to over 256K, it uses mmap() instead of malloc() for memory allocation. Actually – this is a libc malloc thing and is tunable, but defaults to 256k. From the manual:” . He goes on in a further to shows that impact > 256K for a buffer is 37x slower. This applies to all per session buffers, not just sort buffer. Now I have heard recently about this limit being 512K. I wasn’t able to nail down the specific speaker to see if this was a newer library or kernel or OS.

With MySQL instrumentation and the sort_buffer_size we are lucky, there is the Sort_merge_passes status variable. While it’s not perfect, it does indicate if the size of the buffer is in-sufficient, however even if we use a sort_buffer_size of say 256K, and you see Sort_merge_passes increasing slowly, does not indicate you have to increase the buffer.

So, all this does not tell you how to tune the buffer? Unfortunately with MySQL there is no actual easy answer. You do need to monitor the mysqld memory usage overall, especially if you are using persistent connections. A connection/thread will not release the memory assigned until it is closed, so it’s important to monitor for memory creep of the PGA, knowing what your initial SGA is. Morgan Tocker wrote a patch in Bug #33540 to create a RESET CONNECTION type command.

You do need to look for memory as a bottleneck. You need to learn how MySQL use memory, not just the sort_buffer_size. I actually started many years ago to write global/session variables to indicate when buffers were used, and how much and I started with the sort_buffer_size which was buried down in some very old filesort code. When I sought the input of an expert C coder around this, they wondered how the code, especially a loop handler even actually worked.

Nobody knows what the optimal setting is, and that’s the problem. In certain areas especially memory usage the MySQL instrumentation is simply non-existent, and I’d like to see this as something that is fixed.

In conclusion, if I ever see a sort_buffer_size above 256K, e.g. 1M or 2M, I always reset it to 256K. My reasoning is simple. Until you have evidence in your specific environment increasing the buffer makes performance better, it’s better to use a smaller value. There are bigger wins, like not using sorting, or better design, or even better simplifying or eliminating SQL.

References

sort_buffer_size and Knowing Why
How to tune MySQL’s sort_buffer_size
Read Buffer performance hit
more on malloc() speed

Free MySQL Event in Washington DC

As the program chair for the recently announced MySQL Track at the ODTUG Kaleidoscope conference located in Washington DC we are also looking into an associated free community event for MySQL locals in addition to a dedicated track for 4 days.

Please let us know your name and email via the form at http://ronaldbradford.com/ODTUG/free-event/ so we can provide more details in the coming week as we try to finalize logistics.

Registration will be necessary for attendance however for now we just want to know who is local so we can provide more details soon!

Updated. Full details of the free Monday night sundown sessions and reception can be found at MySQL track with free event at Kaleidoscope 2010

The MySQL community impacting the Oracle community

I’m happy to announce that the MySQL community has been given the opportunity to speak at the upcoming Oracle Developer Tools User Group (ODTUG) Kaleidoscope conference in Washington DC. We will be releasing more details this week of the MySQL presentations and topics and we are finalizing details of possible options to include the local MySQL community during the event.

The various independent Oracle User Groups in North America that embody “by the community and for the community” have been very positive with including the MySQL community. With the Sun/MySQL now Oracle community team of Giuseppe Maxia, Lenz Grimmer, Kaj Arnö and Oracle ACE Directors Sheeri K Cabral and myself we have been happy with the openness and willingness to include us in the larger Oracle ecosystem.

We’ll announce the schedule when we finalize it, but we have had a great response from an impressive list of speakers.

Additional References

The MySQL documentation is not always right

Let me premise this post with the statement I think the MySQL documentation is an excellent and highly accurate resource. I think the MySQL docs team do a great job, however like software and people, documentation is not perfect.

As members of the MySQL community you can always contribute to improve the process by reading the documentation and logging any issues as Documentation Bugs.

Some time ago in a discussion with a friend and colleague, we were talking about changes in historical defaults that had been improved finally in MySQL 5.4 The specific discussion was on the new default innodb_buffer_pool_size and we both agreed it increased significantly. One said 1GB, the other said 128MB. Who was right? Well we both were, and we were both inaccurate depending on versions.

Referencing the 5.4 Manual in InnoDB Startup Options and System Variables the current value for Linux is 128M, but for Windows it’s 1GB.

However I was confident I was told in a presentation, perhaps even the keynote the value was 1GB. Firing up my server and seeing the original version I used of 5.4.0 (which was not available on Windows) we find that the default for Linux was 1GB at some time, i.e. the first release.

mysql> show global variables like 'innodb_buffer_pool_size';
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| innodb_buffer_pool_size | 1073741824 |
+-------------------------+------------+
1 row in set (0.00 sec)

mysql> select 1073741824/1024/1024 as MB;
+---------------+
| MB            |
+---------------+
| 1024.00000000 |
+---------------+
1 row in set (0.00 sec)

mysql> show global variables like 'version%';
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| version                 | 5.4.0-beta                   |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | x86_64                       |
| version_compile_os      | unknown-linux-gnu            |
+-------------------------+------------------------------+
4 rows in set (0.00 sec)

I’m not trying to nit pick here, I’m highlighting that MySQL is a evolving product with many different versions and architectures. It’s our job of the MySQL community to help make the documentation the best for all readers. In this above case I’ve not logged the issue, because 5.4 is a defunct product, however if you want an example of how I identified a problem, provided a test case, and saw that my contribution was reviewed, verified and implemented check out Bug #51739 –core-file is not default TRUE (incorrect docs).

In conclusion, always read the documentation but pay special attention to the current version that matches the documentation, and the version you are actually running. Defaults change between versions, e.g. innodb_thread_concurrency is a complex example, and I’ve been caught with a large enterprise client with assuming the default of a Connector/J options as true, when it was in 5.0.6, but in 5.0.5 the version the client was running it was false.

An old saying, “trust by verify” is a good motto to consider.

The Drizzle Census

One thing I have often wondered is just how many MySQL instances exist in the world and what MySQL versions and architectures are in use. We hear of 50,000 windows downloads per day but this is misleading because MySQL is basically bundled with Linux by default or installed from various repositories. Linux servers powers many websites.

In Drizzle we have a proposed plan, the Drizzle Census. From the productive Drizzle Developers Day recently at the 2010 MySQL conference we sat down and created a blueprint, and subsequent high level spec of what we considered this optional plugin should do. We didn’t get as far as I would have liked in a code skeleton to at least gather and store a sample result, but the hope is that with the community we will in the near future.

Here is the list of information we decided was appropriate for anonymous information of value.

  • Kernel Version/Architecture
  • CPU type
  • SID – HASH(processor_id,listener address,first listener port)
  • Drizzle Version
  • Drizzle Uptime
  • Drizzled process memory usage

Installation issues with MySQL 5.5.4 and resolveip

I was installing the latest MySQL 5.5.4 on a new machine and I came across the following issues during installation, steps I generally perform on other versions without any incidents.

$ sudo su -
$ cd /opt
$ wget http://dev.mysql.com/get/Downloads/MySQL-5.5/mysql-5.5.4-m3-linux2.6-x86_64.tar.gz/from/http://mysql.mirrors.hoobly.com/
$ tar xvfz mysql-5.5.4-m3-linux2.6-x86_64.tar.gz
$ cd mysql-5.5.4-m3-linux2.6-x86_64
$ scripts/mysql_install_db
Neither host 'barney' nor 'localhost' could be looked up with /usr/bin/resolveip
Please configure the 'hostname' command to return a correct hostname.
If you want to solve this at a later stage, restart this script with the --force option

Perform some checks

$ /usr/bin/resolveip
-su: /usr/bin/resolveip: No such file or directory

$ hostname
barney

$ head -2 /etc/hosts
127.0.0.1	localhost
127.0.1.1	barney

I was surprised to find that my Ubuntu 10.04 Lucid Lynx box had a different IP for hostname, rather then 127.0.0.1. That was a little weird. I changed this just for comfort.

I was not familiar with this command so I tried to install it.

$ apt-get install resolveip
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Couldn't find package resolveip

$  apt-cache search resolveip

No luck there, off to Google to do a search for resolveip and ubuntu you find the following online manual page which surprising shows the following as the owner of the command.
COPYRIGHT
Copyright 2007-2008 MySQL AB, 2009 Sun Microsystems, Inc.

Thinking it’s now actually packaged with MySQL we find this in the untar directory.

$ bin/resolveip `hostname`
IP address of barney is 127.0.0.1

I’m now still none the wiser regarding this command. I’ve decided to take the –force option for now and we will see!

2010 MySQL Conference Presentations

I have uploaded my three presentations from the 2010 MySQL Users Conference in Santa Clara, California which was my 5th consecutive year appearing as a speaker.

A full history of my MySQL presentations can be found on the Presenting page.

My acceptance with Oracle as ACE Director

I hinted last week of my acceptance with Oracle before the formal announcement this week at the MySQL Users Conference, not for a job but as Oracle ACE Director. In today’s State of the MySQL Community keynote by Kaj Arnö I was one of the first three MySQL nominees that are now part of this program.

What exactly is an ACE Director? Using the description from the Oracle website.

Oracle ACEs and Oracle ACE Directors are known for their strong credentials as Oracle community enthusiasts and advocates, with candidates nominated by anyone in the Oracle Technology and Applications communities. The baseline requirements are the same for both designations; however, Oracle ACE Directors work more closely and formally with Oracle in terms of their community activity.

What does this mean to me?

As a significant contributor to the community I now have the opportunity to continue as well as to contribute to how Oracle continues to interact, promote and involve the MySQL community. As stewards our role as an Oracle ACE Director is to be actively involved. I look forward to the challenge to help shape and improve our State of the MySQL Community.

News and References
Welcome, Oracle ACE Directors for MySQL

State of the Dolphin – Opening keynote

Edward Screven – Chief Corporate Architect of Oracle provided the opening keynote at the 2010 MySQL Users Conference.

Overall I was disappointed. The first half was more an Oracle Sales pitch, we had some product announcements, we had some 5.5 performance buzz. While a few numbers and features were indeed great to hear, there was a clear lack of information to the MySQL ecosystem including employees, alumni and various support services. I hope more is unveiled this week.

Some notes of the session.

  • Oracle’s Strategy covers storage, servers, virtual machines, operating system, database, middleware, applications
  • We build a complete technology stack that is “open” and “integrated” based on “open standards”
  • products talk via open standards with the intention for customers to not feel locked in to any technology
  • Examples include apache, java, linux, xen, eclipse, and innodb
  • Unbreakable linux has now over 4,500 customers

After the sales pitch we got down to more about MySQL.

What MySQL means to Oracle? We make the Oracle solution more complete as a stack for customers.

What is the investment in MySQL?

  • Make MySQL a better MySQL
  • Develop, promote and support MySQL
  • MySQL community edition

Integration with Oracle Enterprise Manager, Oracle Secure Backup and Oracle Audit Vault infrastructure. *This I expected and have blogged about, so I’m glad to see this commitment.

MySQL 5.5 is now in Alpha, some features are

  • InnoDB will be default engine
  • Semi sync replication
  • Replication heartbeat
  • Signal
  • Performance Schema

MySQL 5.5 is planned on being faster with Innodb Performance Improvements & MySQL Performance Improvements.
MySQL 5.5 sysbench claims, read 200% faster, write 364% faster.

MySQL Workbench 5.2 announcement

  • SQL Development
  • Database Administration
  • Data Modelling

MySQL Cluster 7.1 GA announcement

  • Improved Administration
  • Higher Performance
  • Carrier Grade Availability & Performance

MySQL Enterprise Backup announcement

  • Online backup for InnoDB only
  • Formally InnoDB hot backup with additional features including incremental backups

MySQL Enterprise Monitor 2.2 Beta announcment

In closing the statement was “MySQL lets Oracle be more complete at the database layer”. Is that good for the MySQL Community or better for the Oracle revenue model?

My acceptance with Oracle

There have been a number of April fools jokes today so I thought I’d add my own to the list. While this sounds unexpected it’s actually no joke.

I just accepted a position with Oracle yesterday but I can’t say any more about the details until the MySQL users conference in a few weeks.

A special thanks to Lenz, Kaj & Giuseppe that championed everything to make it all happen, I really didn’t have to do anything other then accept.

Using ROLLBACK with MyISAM

Using ROLLBACK with MyISAM is useless. A ROLLBACK command is used to undo any DML that occurs during a transaction (i.e. START TRANSACTION and COMMIT). The MySQL default storage engine MyISAM does not support transactions.

It is easy with the SHOW GLOBAL STATUS command to see if your application code uses ROLLBACK. By performing two samples you can look at the delta over time. The statpack utility is one product that provides a human friendly display of this delta. As seen below, the use of ROLLBACK in combination with the read/write ratio and the my.cnf –skip-innodb indicate unnecessary database work.

====================================================================================================
                    Variable    Delta/Percentage         Per Second              Total
====================================================================================================

                                         Statement Activity
====================================================================================================

                     SELECT:        1,135,589                 1,309.79              189,279,510 (49.62%)
                     INSERT:            6,171                     7.12                  431,987 (0.11%)
                     UPDATE:            4,800                     5.54                  334,620 (0.09%)
                     DELETE:              312                     0.36                   17,910 (0.00%)
                    REPLACE:                0                     0.00                        0 (0.00%)
          INSERT ... SELECT:              121                     0.14                    4,042 (0.00%)
         REPLACE ... SELECT:               11                     0.01                      109 (0.00%)
               Multi UPDATE:                0                     0.00                       30 (0.00%)
               Multi DELETE:                0                     0.00                       28 (0.00%)
                     COMMIT:                0                     0.00                        0 (0.00%)
                   ROLLBACK:        1,154,987                 1,332.16              191,382,775 (50.17%)

If the ROLLBACK command doesn’t do anything you may be tempted to consider this doesn’t do much harm, think again. In the following example of statements analyzed via TCP packets, the ROLLBACK attributed to 21% of the execution time of all SQL in this sample.

# Profile
# Rank Query ID           Response time    Calls R/Call   Item
# ==== ================== ================ ===== ======== ================
#    1 0x4ED092EFA577DAB7     0.0106 24.8%     1   0.0106 SELECT p
#    2 0xC9ECBBF2C88C2336     0.0102 23.8%    52   0.0002 SELECT r_c
#    3 0x19C8068B5C1997CD     0.0092 21.6%   138   0.0001 ROLLBACK
#    4 0x448E4AEB7E02AF72     0.0091 21.3%    52   0.0002 SELECT r_t
#    5 0x56438040F4B2B894     0.0015  3.6%     2   0.0008 SELECT h_c
#    6 0x164962ED9B451586     0.0012  2.9%     9   0.0001 SELECT r
#    7 0x8FDE1484818AAACE     0.0008  1.9%     8   0.0001 SELECT p_c

In a well tuned system, the greatest time to execute an SQL statement is not the running of the SQL inside the MySQL kernel, it is the network latency of making the call, and the time taken to return the resultset requested.

In this extreme case on a production system, 1/2 the statements executed where unnecessary.

Uncovering this issue was three commands and less then 5 minutes of my time. The statpack report uncovered 4 additional red flags at the same time.

If you are not monitoring your production system, start now. For assistance on what to monitor and analysis please contact me for more information.

Installing Ubuntu Desktop 10.04 with LVM

With a new quad core desktop with 8GB RAM & 1TB HDD I wanted to install the Ubuntu desktop version using LVM. This is not possible with the “Desktop CD”. You need to use the “alternative CD” which will easily allow you to configure LVM via a text/cursors installation and also give you a deskop GNOME environment. The “Server CD” also gives you LVM options, but no GUI.

While there are complicated instructions on how to configure/setup LVM with various versions of Ubuntu, all you need with Ubuntu 10.04 is the right CD.

While installing I also read up on two tips that I found of benefit.

  • During installation of the base system, package unpacking and setup messages are redirected to tty4. You can access this terminal by pressing Left Alt+F4; get back to the main installer process with Left Alt+F1.
  • You can get a terminal window easily during installation by switching to the second virtual console by pressing Left Alt+F2

Installation was rather seamless, the only annoyance the cursors interface not displaying clearly on my Dell 2407WFP monitor during the installation process. Monitor works fine with installed GUI at 1920×1200.

New linux desktop configuration

My purchase yesterday was a HP Pavilion p6340f Desktop PC with the following specs.

  • Intel Core 2 Quad Q8400 2.66GHz Processor
  • 4MB L2 Cache, 1333MHz FSB
  • 8GB PC3-8500 DDR3 SDRAM (4 x 2GB)
  • 1TB Serial ATA Hard Drive
  • Intel Graphics Media Accelerator X4500 with 32MB Integrated shared graphics memory
  • Lightscribe SuperMulti DVD±R/RW with Double Layer
  • 10/100/1000 Base-T Network interface
  • Wireless LAN 802.11 a/b/g/n

The purchase price $749+tax which was more then B&H at $699 but not being open Friday nights, B&H it’s your loss. There is also a P6320 model with AMD Phenom II X4 820 2.80GHz processor and NVIDIA GeForce 9100 Graphics for the same price, it was a tough decision.

I’m not trilled with the HP part having not enjoyed experiences with HP servers and Compaq desktops, however time will tell.

How to find MySQL developers?

Brian wrote recently Where did all of the MySQL Developers Go?, while over in Drizzle land they have been accepted for the Google Summer of code along with many other open source projects. MySQL from my observation a noticeable absentee.

Historically, the lack of opportunity to enable community contributions and see them implemented in say under 5 years, has really hurt MySQL in recent times. There is plenty of history here so that’s not worth repeating. The current landscape of patches, forks and custom MySQL binaries for storage engine provider has provided a boom of innovation that sadly is now lost from the core MySQL product.

In Drizzle, community contribution is actively sought and a good portion of committed code is not from the core Drizzle developers (wherever they work). As a Drizzle GSoC project contributor last year Padraig for example this year is helping to mentor. The Drizzle project contribution philisophy, GSoC and other activities such as the Drizzle Developer Day all enable the next generation of developers to be part of ongoing project developement.

Oracle, what are you going to do to foster an active community and new long term developers for MySQL?

Understanding Drizzle user authentication options – Part 2

A key differentiator in Drizzle from it’s original MySQL roots is user based authentication. Gone is the host/user and schema/table/column model that was stored in the MyISAM based mysql.user table.

Authentication is now completely pluggable, leveraging existing systems such as PAM, LDAP via PAM and Http authentication.

In this post I’ll talk about HTTP authentication which requires an external http server to implement successfully. You can look at Part 1 for PAM authentication.

Compiling for http auth support

By default during compilation you may find.

checking for libcurl... no
configure: WARNING: libcurl development lib not found: not building auth_http plugin. On Debian this is found in libcurl4-gnutls-dev. On RedHat it's in libcurl-devel.

In my case I needed:

$ sudo yum install curl-devel

NOTE: Bug #527255 talks about issues of the message being incorrect for libcurl-devel however this appears it may be valid in Fedora Installs

After successfully installing the necessary pre-requisite you should see.

checking for libcurl... yes
checking how to link with libcurl... -lcurl
checking if libcurl has CURLOPT_USERNAME... no

HTTP Authentication

We need to enable the plugin at server startup.

$ sbin/drizzled --mysql-protocol-port=3399 --plugin_add=auth_http &

You need to ensure the auth_http plugin is active by checking the data dictionary plugin table.

drizzle> select * from data_dictionary.plugins where plugin_name='auth_http';
+-------------+----------------+-----------+-------------+
| PLUGIN_NAME | PLUGIN_TYPE    | IS_ACTIVE | MODULE_NAME |
+-------------+----------------+-----------+-------------+
| auth_http   | Authentication | TRUE      |             |
+-------------+----------------+-----------+-------------+

The auth_http plugin also has the following system variables.

drizzle> SHOW GLOBAL VARIABLES LIKE '%http%';
+------------------+-------------------+
| Variable_name    | Value             |
+------------------+-------------------+
| auth_http_enable | OFF               |
| auth_http_url    | http://localhost/ |
+------------------+-------------------+
2 rows in set (0 sec)

In order to configure Http authentication, you need to have the following settings added to your drizzled.cnf file. For example:

$ cat etc/drizzled.cnf
[drizzled]
auth_http_enable=TRUE
auth_http_url=http://thedrizzler.com/auth

NOTE: Replace the domain name with something you have, even localhost.

A Drizzle restart gives us

$ bin/drizzle -e "SHOW GLOBAL VARIABLES LIKE 'auth_http%'"
+------------------+-----------------------------+
| Variable_name    | Value                       |
+------------------+-----------------------------+
| auth_http_enable | ON                          |
| auth_http_url    | http://thedrizzler.com/auth |
+------------------+-----------------------------+

By default, currently if the settings result in an invalid url, then account validation does not fail and you can still login. It is recommended that you always configure pam authentication as well as a fall back.

$ wget -O tmp http://thedrizzler.com/auth
--17:32:32--  http://thedrizzler.com/auth
Resolving thedrizzler.com... 208.43.73.220
Connecting to thedrizzler.com|208.43.73.220|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
17:32:32 ERROR 404: Not Found.

$ bin/drizzle
drizzle > exit

Configuring passwords

To correctly configured your web server to perform the HTTP auth, you can use this Apache syntax as an example.

The following is added to the VirtualHost entry in your web browser.

<Directory /var/www/drizzle/auth>
AllowOverride FileInfo All AuthConfig
AuthType Basic
AuthName "Drizzle Access Only"
AuthUserFile /home/drizzle/.authentication
Require valid-user
</Directory>
$ sudo su -
$ mkdir /var/www/drizzle/auth
$ touch /var/www/drizzle/auth/index.htm
$ apachectl graceful

We check we now need permissions for the URL.

$ wget -O tmp http://thedrizzler.com/auth
--17:35:48--  http://thedrizzler.com/auth
Resolving thedrizzler.com... 208.43.73.220
Connecting to thedrizzler.com|208.43.73.220|:80... connected.
HTTP request sent, awaiting response... 401 Authorization Required
Authorization failed.

You need to create the username/password for access.

$ htpasswd -cb /home/drizzle/.authentication testuser sakila
$ cat /home/drizzle/.authentication
testuser:85/7CbdeVql4E

Confirm that the http auth with correct user/password works.

$ wget -O tmp http://thedrizzler.com/auth --user=testuser --password=sakila
--17:37:45--  http://thedrizzler.com/auth
Resolving thedrizzler.com... 208.43.73.220
Connecting to thedrizzler.com|208.43.73.220|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently

Drizzle HTTP Authentication in action

By default we now can’t login

$ bin/drizzle
ERROR 1045 (28000): Access denied for user ''@'127.0.0.1' (using password: NO)
$ bin/drizzle --user=testuser --password=sakila999
ERROR 1045 (28000): Access denied for user 'testuser'@'127.0.0.1' (using password: YES)

$ bin/drizzle --user=testuser --password=sakila
Welcome to the Drizzle client..  Commands end with ; or g.
Your Drizzle connection id is 6
Server version: 7 Source distribution (trunk)

Type 'help;' or 'h' for help. Type 'c' to clear the buffer.

drizzle>