MySQL Roadmap

Here are some notes from the MySQL Server Roadmap session at the MySQL Conference 2007.

MySQL: Past and Future

  • 2001: 3.23
  • 2003: 4.0 UNION, Query Cache, Embedded
  • 2004: 4.1 Subqueries
  • 2005: 5.0 Stored Procedures, Triggers, Views
  • Now: 5.1.17 Partitioning, Events, Row-based replication
  • 2007?: 6.0 Falcon, Performance, Conflict detection
  • 2008?: 6.1 Online Backup, FK Constraints

2007 Timeline

  • Q1: 5.1 Beta, 5.1 Telco Production Ready, Monitoring Service 1.1, MySQL 6.0 Alpha, Community GA
  • Q2: MySQL 6.0 Beta, New Connectors GA
  • Q3: 5.1 RC, 6.0 Beta, MS 2.0, Enterprise Dashboard beta
  • Q4: 5.1 GA, 6.0 Beta

Where are we today?

  • We are by far the most popular open source database
  • The Enterprise world is moving online and MySQL is well-positioned for that trend, but:
    • Transactional scalability
    • Manageability
    • Specific online features

MySQL Server Vision – The Future

  • Always Online — 24×7, online backup, online analytics, online schema changes
  • Dynamic Scale-out — online partitioning, add node, replication aids
  • Reliable — fault-tolerant, easy diagnosis, stable memory, ultimately self-healing
  • High-performance — Interactive web, real-time response, apps, 10,000-100,000 clients
  • Ease of use — Portable, Best for development, multiple connectors, easy tuning
  • Modularity and Ubiquity — Storage engines, plug ins

How can you help?

  • Bug finding and fixing — Community Quality Contributor
  • Feature/patch contribution
  • But, to expedite your patch

The goal: “Be the Best Online Database for Modern Applications”

MySQL Conference – For Oracle DBAs and Developers


I have just completed my presentation at the MySQL Conference 2007 on MySQL for Oracle DBAs and Developers.

Not mentioned in my slides, but referenced during the presentation was what I consider the most important page to document from the MySQL Manual — 5.2.1. Option and Variable Reference

You can download a PDF copy of my presentation here.

MySQL Conference – Building a Vertical Search Engine in a Day

Moving into the user sessions on the first day at MySQL Conference 2007, I attended Building a Vertical Search Engine in a Day.

Some of my notes for reference.

Web Crawling 101

  • Injection list – the seed URLs you are starting from
  • Fetching the pages
  • Parsing the content – words and links
  • Updating the crawl DB
  • Whitelist
  • Blacklist
  • Convergence — avoiding the honey pots
  • Index
  • Map-reduce — split a large problem into little pieces, process in parallel, then combine results
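
The map-reduce bullet can be illustrated with a small Python sketch. It only shows the split, process in parallel, combine pattern; it is not how Nutch or Hadoop implement it. The function names and the use of a thread pool are assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(page_text):
    """Map step: count the words of one page independently of the others."""
    return Counter(page_text.lower().split())


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def word_count(pages, workers=4):
    # Split: each page is one piece of the problem.
    # Process in parallel: the pool applies count_words to every page.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, pages))
    # Combine: fold the partial results into a single total.
    return reduce(merge_counts, partials, Counter())


totals = word_count(["MySQL is fast", "mysql is open source"])
print(totals.most_common(2))
```

A real crawl would distribute the pieces across machines rather than threads, but the shape of the computation is the same.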

Focused content == vertical crawl

  • 20 Billion Pages out there, a lot of junk
  • Breadth-first would take years and cost millions of lives

OPIC + Term Vectors = Depth-first

  • OPIC is “On-line Page Importance Calculation”. Fixing OPIC Scoring Paper
  • Pure OPIC means “Fetch well-linked pages first”
  • We modify it to “fetch pages about MySQL first”
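
To make the difference concrete, here is a heavily simplified sketch of an OPIC-style crawl order in Python. It is not the presenters' implementation: the link graph, the relevance scores (stand-ins for a term-vector similarity to the topic) and the way relevance scales the "cash" passed to outlinks are all assumptions for illustration.

```python
def crawl_order(links, relevance, seeds, max_fetches):
    """Return the order in which pages would be fetched.

    links:     page -> list of outlinks (a toy, in-memory link graph)
    relevance: page -> 0..1 topical score, known once a page is fetched
    """
    cash = {page: 1.0 / len(seeds) for page in seeds}
    fetched = []
    while cash and len(fetched) < max_fetches:
        # Greedy step: fetch the page currently holding the most cash.
        # max() returns the first maximum, so ties go to the page queued first.
        page = max(cash, key=cash.get)
        amount = cash.pop(page)
        fetched.append(page)
        outlinks = [p for p in links.get(page, []) if p not in fetched]
        if not outlinks:
            continue  # toy simplification: the cash of a dead end is dropped
        # Pure OPIC would pass `amount` on unchanged; this focused variant
        # scales it by how on-topic the page just fetched turned out to be.
        share = amount * relevance.get(page, 0.0) / len(outlinks)
        for p in outlinks:
            cash[p] = cash.get(p, 0.0) + share
    return fetched


links = {
    "www.mysql.com": ["dev.mysql.com", "ads.example.com"],
    "dev.mysql.com": ["dev.mysql.com/doc"],
    "ads.example.com": ["ads.example.com/more"],
}
relevance = {"www.mysql.com": 1.0, "dev.mysql.com": 0.9, "ads.example.com": 0.1}
order = crawl_order(links, relevance, ["www.mysql.com"], max_fetches=4)
print(order)
```

In this toy graph the page linked from the on-topic site is fetched before the page linked from the off-topic site, which is the behaviour "fetch pages about MySQL first" is after.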

Nutch & Hadoop are the technologies, running on a 4-server cluster. A sample crawl starting with www.mysql.com: in 23 loops, 150k pages fetched and 2M URLs found.

Serving up the results

MySQL Conference – RedHat Keynote – One Laptop Per Child

Our third keynote at MySQL Conference 2007 was titled Building the Ultimate Database Container with RHEL, MySQL, and Virtualization by Michael Evans.

The presentation was on Red Hat & One Laptop Per Child. His opening quote was “Thinking Past Platforms: The Next Challenge for Linux”, by Doc Searls, 2007-04-16 http://www.linuxjournal.com/node/1000210

OLPC

  • A non-profit idea from Nicholas Negroponte.
  • Aim is to build & distribute inexpensive laptop systems to primary & secondary school students worldwide.
  • Sell to young children in developing countries.

In summary at presentation to Red Hat — “Non-profit, run by a professor, we make hardware and sell to governments.”

The overall dynamics have attracted a lot of interesting people in the world.

The ability and goal is to make the device together, bringing all H/W and S/W people together.

The people that get behind this project have the ethos — “I’m willing to jump into this to change the world.”

This is the first time for a new opportunity in the last 10 years.

The sugar user interface is a completely new experience.

When the UI designer was presenting to a room of head executives: “Whatever advice you’ve got, keep it to yourself, you’re not the target market.”

One key point — No backward compatibility needs.

More information at www.laptop.org. Wikipedia Reference. Some videos at YouTube: Inside One Laptop per Child: Episode one and Slightly better demo of the OLPC User Interface.

MySQL Conference – The next keynote with Guy Kawasaki

Without missing a beat at MySQL Conference 2007, we moved from Marten’s keynote to The Art of Innovation by Guy Kawasaki.

Extremely fun and entertaining. His 10 points.

1. Make Meaning

  • “To change the world”
  • To a VC, do not say “you want to make money”, that is understood. You will attract the wrong team.

2. Make Mantra

  • Not a Mission statement (50-60 words long), but 2 or 3 words.
    • Wendy’s – “Healthy fast food”
    • Nike – “Authentic Athletic Performance”
    • FedEx – “Peace of Mind”
    • eBay – “Democratize commerce”
  • Create a mantra — Why do you exist?

If you get stuck try the Dilbert mission statement generator.

3. Jump to the next curve

  • Not 50% or 100% better, but “Do things 10x better”

4. Roll the DICEE

  • “Create great stuff”
    • Deep: Fanning (Reef) sandals that open beer bottles
    • Intelligent: BF-104 Flashlight (Panasonic) (takes 3 sizes of batteries)
    • Complete: Lexus
    • Elegant: Nano (Apple)
    • Emotive: Harley Davidson (They generate strong emotions)

5. Don’t worry, be crappy

Get it out there.

6. Polarize people

People love it or hate it.

7. Let a hundred flowers blossom

  • People that are not your target market are using it.
  • Take the money, ask the people why they are buying, ask what you can do better.

8. Churn baby, churn.

It’s OK to ship something with crappy stuff in it, but it’s important to continually revise and improve.

9. Niche thyself
With a nice graph.

  • Vertical — Ability to provide unique product or service
  • Horizontal – Value to customer
  • bottom right — Price
  • top left — Stupid
  • bottom left — Dotcom
  • top right — X You need to be High and to the right.
  • Fandango — It’s either Fandango, or Clubbin.
  • Breitling Emergency – watch
  • Smart car – park perpendicular
  • LG Kimchi refrigerator

You need to be like the President of the United States – you need to be high and to the right. Got a great laugh from the crowd.

10. Follow the 10/20/30 rule

As an innovator, you need to pitch for what you want.

  • The optimum number of slides is 10.
  • Give the slides in 20 minutes.
  • Use 30 point font

11. Don’t let the bozos grind you down

A bonus to our friends in the community.

  • “I think there is a world market for five computers”
  • “This telephone has too many shortcomings to be seriously considered as a means of communication. The device is inherently of no value to us.” –Western Union 1876
  • “There is no reason why anyone would want a computer in their home.” — Digital Equipment Corp 1977
  • “It’s too far to drive, and I don’t see how it can be a business.” – Guy Kawasaki – Bozo (The company was Yahoo)

Guy commenting on his lost opportunity with Yahoo — “It only covers the first billion, it’s the second billion that pisses me off.”

Read more about Guy at his website Guy Kawasaki.

The Art of Innovation. If you want a copy of the slides, send an email to [email protected]

MySQL Conference – Opening Keynote with Mårten Mickos

It’s an early start this morning at 8:20am at MySQL Conference 2007 with CEO Mårten Mickos’s keynote talk Welcome and State of MySQL AB.

Here are some of the key points that made an impression on me.


“The Participatory & Disruptive spirit of the Dolphin.”

Open Source disrupts inefficient models and produces new wealth.

Architecture of Participation.

  • GPL and FOSS licenses
  • Google
  • del.icio.us

MySQL Architecture of Participation
You have the forge, the planet, the community contributor program and many more.

Production of Amateurs

  • YouTube
  • Second Life
  • Wikipedia

Some really great Quotes:

“Fail often to succeed soon.” IDEO company slogan

“Noah’s Ark was built by amateurs, but the Titanic by professionals.”

Innovation Happens Here

MySQL Monitoring & Advisory Service

In his presentation of MySQL Network Enterprise Dashboard, if you were quick you would have noticed the MySQL version 6.0.0-alpha-pb518.log

Leading Applications

  • Open Source ISVs
  • Web
  • Web 2.0
  • On-Demand
  • Hosting
  • Telco & VOIP
  • Security
  • Enterprise 2.0
  • Hub & Spoke

We want to be known as The Online Database.

Drawn to the Lamp

  • Microsoft
  • Oracle
  • IBM
  • Sun
  • HP
  • Dell
  • Unisys

They all have an Open Source strategy, they develop open source products, they use and partner with MySQL.

He also mentioned MySQL Enterprise Connection Alliance – MECA.

Global Distributed Organization

  • 70% work from home
  • 100 major locations in nearly 30 different countries.
  • There is a great focus on culture – global culture.

Disruptive Businesses

  • A smarter way to produce the goods
  • A smarter way to distribute them
  • Serve the underserved
  • Keep it simple
  • Innovate with technology to give customers more convenience
  • Make money
  • Gartner: 66% of enterprises are deploying MySQL or planning to
  • 750,000 newsletter subscribers
  • MySQL installed in over 11 million installations

Going Forward

  • Scalability
  • Availability
  • Manageability
MySQL – The best online database in the world.

MySQL Conference – Rewarding the Community

At MySQL Conference 2007, CEO Mårten Mickos in his opening keynote Welcome and State of MySQL AB rewarded the community. Those that contributed to “The best database in the world”.

2007 MySQL Applications of the Year
#1 in Online Video
#1 in 3G Mobile Entertainment
#1 in Creative Software

And the Winners- YouTube, Amp’d mobile, and Adobe

2007 MySQL Partners of the Year
#1 reseller of MySQL Enterprise to govt
#1 in MySQL Enterprise integration
#1 in Open Source

And the Winners – Carahsoft, HP, and Red Hat

2007 Community Members of the Year
Quality Contributor
Community Code Contributor
Community Advocate

And the Winners

Martin Friebe
Paul McCullagh
Sheeri Kritzer

MySQL Conference – Scaling and High Availability Architectures Tutorial

My first tutorial today at MySQL Conference 2007 is Scaling and High Availability Architectures by Jeremy Cole and Eric Bergen of Proven Scaling.

Basic Tenets

While not discussed, the premise is to Cache Everything. MemCache is a key component to any scalable system.

Lifetime of a scalable system

Using the analogy from a newborn child Jeremy stepped us through the categories Newborn, Toddler, Teenager, Late teens to 20s, Adult.

Late teens to 20s is where most systems die a slow death, a period he termed “the awkward stage”. This is where scalability is critical, and a meltdown, for example, can ruin you. Downtime is also just not acceptable for your user community.

When you’re an adult, you need to perfect the ability to deploy incremental changes to your application seamlessly.

As the system grows, optimizations that once worked may now be hurting your system. It’s important to revisit them at each stage.

Partitioning

Most applications mainly implement a horizontal partitioning model. Different components of your system can be scaled by a “partition key”. The different models include fixed “hash key” partitioning, dynamic “directory” partitioning, partition by “group” and partition by “user”.

The Dynamic “directory” is a lot harder to implement, but is ultimately more scalable.

One of the difficulties of partitioning is inter-partition interaction. Ultimately the solution is duplicating meta-data or duplicating data. Overall reporting is also more difficult: what if we want the average number of users per location when we partition by user? Most systems are user driven and partition by user. A newer strategy is to partition by group.

To implement fixed hash key partitioning:

  • Divide data into B buckets
  • Divide the B buckets over M machines

You define 1024 physical buckets (a number that is easily divisible), numbered 0-1023 (user_id % 1024). These are then mapped by range to physical machines: 0-255, 256-511, 512-767, 768-1023. The plus side is that it is very easy to implement, and you can always derive where something is. The biggest problem is scalability, e.g. going from 4 machines to 5. You also don’t have any fine-grained control over buckets.
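
A minimal Python sketch of the fixed hash key scheme, using the 1024 buckets and range mapping described above. The function names are made up for the example, and the way buckets are re-split for 5 machines is only one possible choice.

```python
BUCKETS = 1024


def bucket_for(user_id):
    """Derive the bucket directly from the key, with no lookup needed."""
    return user_id % BUCKETS


def machine_for(bucket, machine_count):
    """Map a bucket to a machine by contiguous range.

    With 4 machines this gives 0-255, 256-511, 512-767 and 768-1023.
    """
    return bucket * machine_count // BUCKETS


# Where does user 1000 live on a 4 machine setup?
print(machine_for(bucket_for(1000), 4))

# The scalability problem: count the buckets that would change machine
# when the ranges are recomputed for 5 machines instead of 4.
moved = sum(
    1 for b in range(BUCKETS) if machine_for(b, 4) != machine_for(b, 5)
)
print(f"{moved} of {BUCKETS} buckets move")
```

With this particular re-split about half of the buckets end up on a different machine, which is why growing a fixed scheme is painful.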

For dynamic directory partitioning you maintain a database of mappings to partitions. A user can easily be moved at a later date, at a much finer grain. MySQL Cluster is designed for this type of application. It is not necessary however; a well configured Innodb hardware solution with memcache can easily provide the same functionality. The only writes are new users or updates to partition keys, with a lot of reads.
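
The directory approach can be sketched in Python as follows. An in-memory dict stands in for the mapping database (and any cache in front of it); the class and method names and the "least loaded partition" placement rule are assumptions for illustration only.

```python
from collections import Counter


class PartitionDirectory:
    """Toy directory: user_id -> partition name."""

    def __init__(self, partitions):
        self.partitions = list(partitions)
        self.user_partition = {}

    def add_user(self, user_id):
        # Write path (rare): place a new user on the least loaded partition.
        load = Counter(self.user_partition.values())
        target = min(self.partitions, key=lambda p: load[p])
        self.user_partition[user_id] = target
        return target

    def lookup(self, user_id):
        # Read path (frequent): a single key lookup, easy to cache.
        return self.user_partition[user_id]

    def move_user(self, user_id, partition):
        # Fine-grained rebalancing: one user moves, nobody else is affected.
        if partition not in self.partitions:
            raise ValueError(f"unknown partition: {partition}")
        self.user_partition[user_id] = partition


directory = PartitionDirectory(["db1", "db2"])
directory.add_user(101)
directory.add_user(102)
directory.partitions.append("db3")  # adding a machine touches no existing user
directory.move_user(101, "db3")
print(directory.lookup(101), directory.lookup(102))
```

The extra lookup is the cost; the benefit is that adding a machine or moving a hot user does not require recomputing where everyone else lives.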

HiveDB

This open source product implements a “standard” partition-by-key MySQL system written in Java.
Many organizations have a somewhat similar built system, but this is an example of something that’s been open sourced.

More information at www.hivedb.org.

The Hive API should be the only code that needs to be re-written in the application development language (e.g. PHP, Ruby) when needed.

High Availability

The obvious goals.

  • Avoid downtime due to failures.
  • No single point of failure.
  • Extremely fast failover.
  • No dependency on DNS changes.
  • No dependency on code changes.
  • Painless and seamless failover.
  • Fail-back must be just as painless.

The overall objective is speed.

Understanding MySQL Replication is important to understanding HA options.

MySQL Replication is Master-Slave One Way asynchronous replication.

  • Slave requests binary logs from last position.
  • Master sends binary logs up to current time.
  • Master keeps sending binary logs in real-time.

Dual Master provides an easy configuration to fail over, but it doesn’t provide benefits in throughput. It can help with online schema changes without downtime, assuming existing queries will perform against both the pre and post schema (SET SQL_LOG_BIN=0 for the session is the tip). There are a number of caveats.

Ultimately for High Availability you have a trade-off: data loss (minute) versus scalability.

SHOW PROFILE

I’ve been playing more with the SHOW PROFILE command available as part of MySQL Community 5.0.37. Thanks to the author Jeremy Cole. This command can provide some interesting insight into the workings of MySQL. It does however, like most new releases of software, enable users to make suggestions for new features and functionality. Here is my wish list. Some I figure are practical, some are wishful, but if you never ask you never know.

  1. The Unit of measure for duration is Second. It would be great if it could be tuned for display, say millisecond. If you look at my first Example, all figures are effectively represented in milli-second or even micro-second granularity.
  2. I would like to see a total for the query. Again in Example 1, you have to add up all the figures to determine this query took 8ms.
  3. Again in Example 1, perhaps a percentage of total time for each line duration may be helpful.
  4. More descriptive status descriptions (this is part of the MySQL internal code and not the patch)
  5. SET PROFILING=1; can only be set on the current session, making it impossible to easily monitor a JDBC multi-connection test. There needs to be a way to enable it for a session other than the current interactive one you are viewing, but also be able to see the results. You can’t do a SHOW PROFILE via a JDBC connection!
  6. I’d like to see a threshold, so queries running under threshold are discarded, much like a long-query-time option. This enables you to run a large number of SQL Statements and only profiles for longer running ones are logged
  7. I’d like to see a level of logging to file, again like a slow query log, so you can simply gather information on a working system and review at some later time. Combined with the previous point, you now have microsecond slow query log with explicit details.

One major benefit of the SHOW PROFILE command is I can accurately get a figure for how long a query is taking (time in milliseconds). You just have to sum all figures (See wish list point 2).

By default, the source details are not provided; you need to specify the SOURCE operand, which helps in both comparing with any debugging output and also trawling through the code. As in Example 1, I needed to find why 95% of time was in a step with the most descriptive line of ‘end’.

Example 1

mysql> show profile SOURCE,MEMORY for query 4;
+--------------------+------------+-----------------------+---------------+-------------+
| Status             | Duration   | Source_function       | Source_file   | Source_line |
+--------------------+------------+-----------------------+---------------+-------------+
| Opening tables     | 0.00013200 | open_tables           | sql_base.cc   |        2106 |
| System lock        | 0.00001800 | mysql_lock_tables     | lock.cc       |         153 |
| Table lock         | 0.00000600 | mysql_lock_tables     | lock.cc       |         162 |
| init               | 0.00001300 | mysql_select          | sql_select.cc |        2073 |
| optimizing         | 0.00004800 | optimize              | sql_select.cc |         617 |
| statistics         | 0.00002500 | optimize              | sql_select.cc |         773 |
| preparing          | 0.00005200 | optimize              | sql_select.cc |         783 |
| executing          | 0.00002200 | exec                  | sql_select.cc |        1407 |
| Sending data       | 0.00000500 | exec                  | sql_select.cc |        1925 |
| end                | 0.00786600 | mysql_select          | sql_select.cc |        2118 |
| query end          | 0.00001400 | mysql_execute_command | sql_parse.cc  |        5085 |
| freeing items      | 0.00000700 | mysql_parse           | sql_parse.cc  |        5973 |
| closing tables     | 0.00001900 | dispatch_command      | sql_parse.cc  |        2120 |
| logging slow query | 0.00001000 | log_slow_statement    | sql_parse.cc  |        2178 |
| cleaning up        | 0.00000500 | dispatch_command      | sql_parse.cc  |        2143 |
+--------------------+------------+-----------------------+---------------+-------------+
15 rows in set (0.01 sec)
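
Until the server offers a total and percentages (wish list points 2 and 3), they can be computed from the output. The Python sketch below is a client-side workaround, not part of SHOW PROFILE; it parses the Status and Duration columns of Example 1, and the function names are made up for the example.

```python
# Status and Duration columns copied from Example 1.
EXAMPLE_1 = """\
| Opening tables     | 0.00013200 |
| System lock        | 0.00001800 |
| Table lock         | 0.00000600 |
| init               | 0.00001300 |
| optimizing         | 0.00004800 |
| statistics         | 0.00002500 |
| preparing          | 0.00005200 |
| executing          | 0.00002200 |
| Sending data       | 0.00000500 |
| end                | 0.00786600 |
| query end          | 0.00001400 |
| freeing items      | 0.00000700 |
| closing tables     | 0.00001900 |
| logging slow query | 0.00001000 |
| cleaning up        | 0.00000500 |
"""


def parse_profile(text):
    """Return (status, seconds) pairs from pipe-delimited profile rows."""
    rows = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) < 2:
            continue  # border or blank line
        try:
            rows.append((cells[0], float(cells[1])))
        except ValueError:
            continue  # header row: the Duration cell is not a number
    return rows


def report(rows):
    """Return the total in seconds and one formatted line per step."""
    total = sum(seconds for _, seconds in rows)
    lines = [
        f"{status:<20} {seconds * 1000:8.3f} ms {seconds / total:6.1%}"
        for status, seconds in rows
    ]
    lines.append(f"{'Total':<20} {total * 1000:8.3f} ms")
    return total, lines


rows = parse_profile(EXAMPLE_1)
total, lines = report(rows)
print("\n".join(lines))
```

For Example 1 this gives a total of a little over 8 ms, with the ‘end’ step accounting for roughly 95% of it.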

Example 2

We were experiencing increased latency in JDBC with a particular query. On a recommendation from MySQL Support we tried SET SQL_BUFFER_RESULT=1;

mysql> show profile source for query 14;
+------------------------------+------------+-------------------------+---------------+-------------+
| Status                       | Duration   | Source_function         | Source_file   | Source_line |
+------------------------------+------------+-------------------------+---------------+-------------+
| Opening tables               | 0.00006025 | open_tables             | sql_base.cc   |        2106 |
| System lock                  | 0.00004875 | mysql_lock_tables       | lock.cc       |         153 |
| Table lock                   | 0.00000400 | mysql_lock_tables       | lock.cc       |         162 |
| init                         | 0.00001600 | mysql_select            | sql_select.cc |        2073 |
| optimizing                   | 0.00005675 | optimize                | sql_select.cc |         617 |
| statistics                   | 0.00001250 | optimize                | sql_select.cc |         773 |
| preparing                    | 0.00005175 | optimize                | sql_select.cc |         783 |
| Creating tmp table           | 0.00001275 | optimize                | sql_select.cc |        1206 |
| executing                    | 0.00006025 | exec                    | sql_select.cc |        1407 |
| Copying to tmp table         | 0.00000400 | exec                    | sql_select.cc |        1547 |
| converting HEAP to MyISAM    | 0.04820900 | create_myisam_from_heap | sql_select.cc |        9914 |
| Copying to tmp table on disk | 0.04049075 | create_myisam_from_heap | sql_select.cc |        9968 |
| Sending data                 | 1.29302000 | exec                    | sql_select.cc |        1925 |
| end                          | 0.09398425 | mysql_select            | sql_select.cc |        2118 |
| removing tmp table           | 0.00004975 | free_tmp_table          | sql_select.cc |        9856 |
| end                          | 0.00089125 | free_tmp_table          | sql_select.cc |        9884 |
| query end                    | 0.00001850 | mysql_execute_command   | sql_parse.cc  |        5085 |
| freeing items                | 0.00000825 | mysql_parse             | sql_parse.cc  |        5973 |
| closing tables               | 0.00003425 | dispatch_command        | sql_parse.cc  |        2120 |
| logging slow query           | 0.00001325 | log_slow_statement      | sql_parse.cc  |        2178 |
| cleaning up                  | 0.00000675 | dispatch_command        | sql_parse.cc  |        2143 |
+------------------------------+------------+-------------------------+---------------+-------------+
21 rows in set (0.00 sec)

Looking at the lines helped to indicate that the temporary table was being flushed to disk, indicating we needed to add SET SESSION tmp_table_size=20*1024*1024;

mysql> show profile source for query 18;
+----------------------+------------+-----------------------+---------------+-------------+
| Status               | Duration   | Source_function       | Source_file   | Source_line |
+----------------------+------------+-----------------------+---------------+-------------+
| Opening tables       | 0.00006050 | open_tables           | sql_base.cc   |        2106 |
| System lock          | 0.00001250 | mysql_lock_tables     | lock.cc       |         153 |
| Table lock           | 0.00000400 | mysql_lock_tables     | lock.cc       |         162 |
| init                 | 0.00000775 | mysql_select          | sql_select.cc |        2073 |
| optimizing           | 0.00005475 | optimize              | sql_select.cc |         617 |
| statistics           | 0.00001225 | optimize              | sql_select.cc |         773 |
| preparing            | 0.00005075 | optimize              | sql_select.cc |         783 |
| Creating tmp table   | 0.00001350 | optimize              | sql_select.cc |        1206 |
| executing            | 0.00006125 | exec                  | sql_select.cc |        1407 |
| Copying to tmp table | 0.00000375 | exec                  | sql_select.cc |        1547 |
| Sending data         | 0.29110925 | exec                  | sql_select.cc |        1925 |
| end                  | 0.08023800 | mysql_select          | sql_select.cc |        2118 |
| removing tmp table   | 0.00001525 | free_tmp_table        | sql_select.cc |        9856 |
| end                  | 0.05971400 | free_tmp_table        | sql_select.cc |        9884 |
| query end            | 0.00001925 | mysql_execute_command | sql_parse.cc  |        5085 |
| freeing items        | 0.00000425 | mysql_parse           | sql_parse.cc  |        5973 |
| closing tables       | 0.00004625 | dispatch_command      | sql_parse.cc  |        2120 |
| logging slow query   | 0.00000800 | log_slow_statement    | sql_parse.cc  |        2178 |
| cleaning up          | 0.00000300 | dispatch_command      | sql_parse.cc  |        2143 |
+----------------------+------------+-----------------------+---------------+-------------+
19 rows in set (0.00 sec)

Top 10 Things for IT Professionals

These IT related lists are really quite accurate. I sound like a broken record sometimes when I repeat these things. The articles provide very good detailed descriptions; I’ve included the bullet points just to tempt you to read more.

Top ten things ten years of professional software development has taught me.

  1. Object orientation is much harder than you think
  2. The difficult part of software development is communication
  3. Learn to say no
  4. If everything is equally important, then nothing is important
  5. Don’t over-think a problem
  6. Dive really deep into something, but don’t get hung up
  7. Learn about the other parts of the software development machine
  8. Your colleagues are your best teachers
  9. It all comes down to working software
  10. Some people are assholes

The Top 10 Things They Never Taught Me in Design School.

  1. Talent is one-third of the success equation.
  2. 95 percent of any creative profession is shit work
  3. If everything is equally important, then nothing is very important.
  4. Don’t over-think a problem.
  5. Start with what you know; then remove the unknowns.
  6. Don’t forget your goal.
  7. When you throw your weight around, you usually fall off balance.
  8. The road to hell is paved with good intentions; or, no good deed goes unpunished.
  9. It all comes down to output.
  10. The rest of the world counts.

That missing INNODB STATUS

On Thursday I saw something I’d not seen before. An Empty Innodb Status. Now given the amount of output normally shown it was certainly a first. And it looked like:

mysql> SHOW ENGINE INNODB STATUS;
+--------+------+--------+
| Type   | Name | Status |
+--------+------+--------+
| InnoDB |      |        |
+--------+------+--------+
1 row in set (0.03 sec)

To answer some of the most obvious questions.

  • Yes it was a working existing MySQL instance, with InnoDB correctly configured. Indeed we had been benchmarking for several hours.
  • MySQL Server was running, indeed a command selecting data from the mysql schema worked just fine after seeing this (All other tables were Innodb).
  • Absolutely nothing in the host MySQL error log. (This was the second most disappointing aspect)
  • The Process List showed two queries that had been running for some time, everything was taking  ; 1 second. (This was the most disappointing)

So the problem is, MySQL seemed to effectively hang when dealing with queries solely on InnoDB tables. Closer investigation found that another application process had filled the /tmp file system. Reclaiming space didn’t cause MySQL and InnoDB to start operating. Even a shutdown of MySQL failed, with mysqld having to be killed manually.

For those super inquisitive the version was 5.1.16-ndb-6.2.0-log, and yes it is a Cluster release. I’ve yet to test the problem on a normal 5.1 version and log a bug appropriately if it exists.

I suspect in our benchmark we definitely need to include some timeout handling, so the queries would fail (they were both UPDATES), but it did have the customer asking why, to which there was no answer.

Watching Replication in action

For all those instant GUI people out there, there is an easy way to watch the present status of your MySQL Slaves using the watch command.

$ watch -n 1 -d "mysql -uroot -pxxxx mysql -e 'SHOW SLAVE STATUS\G'"

The watch command provides a view of a file or command, and shows interval updates to this output (the -n <seconds> option). You can also specify a granularity finer than one second, for example 0.5. The -d option also highlights the differences for you. So while you see the following output with your SHOW SLAVE STATUS, on a loaded system you will also see bin-log and relay-log changes, and perhaps Seconds_Behind_Master.

The question is, Why is Seconds_Behind_Master the last column in this display?


*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: localhost
Master_User: repl
Master_Port: 10002
Connect_Retry: 60
Master_Log_File: master-bin.000006
Read_Master_Log_Pos: 102
Relay_Log_File: newyork-relay-bin.000055
Relay_Log_Pos: 244
Relay_Master_Log_File: master-bin.000006
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 102
Relay_Log_Space: 539
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
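
If the ordering bothers you, one workaround is to parse the vertical output and print the interesting fields first. The Python sketch below works on captured text only; the choice of key fields and the function names are my own assumptions, and the sample is a shortened copy of the output above.

```python
SAMPLE = """\
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Log_File: master-bin.000006
Read_Master_Log_Pos: 102
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Error:
Exec_Master_Log_Pos: 102
Seconds_Behind_Master: 0
"""

# Fields to show first; an arbitrary selection for this example.
KEY_FIELDS = (
    "Seconds_Behind_Master",
    "Slave_IO_Running",
    "Slave_SQL_Running",
    "Last_Error",
)


def parse_vertical(output):
    """Turn 'Name: value' lines into a dict, skipping the row separator."""
    status = {}
    for line in output.splitlines():
        if line.startswith("*") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        status[name.strip()] = value.strip()
    return status


def summary(status):
    """Return the key fields in the order we actually care about."""
    return [(name, status.get(name, "?")) for name in KEY_FIELDS]


status = parse_vertical(SAMPLE)
for name, value in summary(status):
    print(f"{name}: {value}")
```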

Smarter indexing for column LIKE '%string%'

With my very heavy travel load and skilling load I’ve not had time to scratch myself. It hasn’t stopped the brain working overtime on various issues including the classic find a pattern in a string starting with a wildcard character. On a recent gig I saw the true classic.


SELECT columns
FROM users
WHERE username LIKE '%str%'
OR firstname LIKE '%str%'
OR lastname LIKE '%str%'

I went through the various options and comments on leading ‘%’, OR’s, combined columns, FULLTEXT (which doesn’t work in this case), merge indexing etc, however it perplexed me that nobody has really solved this problem, or at least shared their solutions.

I have an idea, a theory and while I’d love to prove/disprove it, I simply just don’t have the time. So here are my notes, hopefully somebody can comment positively/negatively, do some research, or encourage me to pursue it more.

The Problem

The problem is quite simply the leading wildcard, Der!. So how do you eliminate the wildcard from the search?

My Idea

Working with the earlier example, and having a concatenated field of the three components (username,firstname,lastname), my idea is to encode into 27 bits, a bit for each alphabetic character (A-Z) found, and any non-alphabetic character except whitespace in bit 27.

This column is then indexed and searched for bitwise matches. Here is a pictorial description.

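String            a b c d e f g h i j k l m n o p q r s t u v w x y z
Ronald Bradford   1 1 . 1 . 1 . . . . . 1 . 1 1 . . 1 . . . . . . . .
Search: ‘brad’    1 1 . 1 . . . . . . . . . . . . . 1 . . . . . . . .   *** MATCH ***
Search: ‘fred’    . . . 1 1 1 . . . . . . . . . . . 1 . . . . . . . .   NO MATCH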

My idea is very simple, the question is, will it actually work.
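
To make the encoding concrete, here is a Python sketch of what a bitize function could look like. This is an illustration of the idea, not a MySQL stored function: bits 1 to 26 are a to z, and bit 27 is set for any other non-whitespace character. A row is a candidate only if every bit of the search mask is also set in the row mask.

```python
def bitize(text):
    """Encode which characters occur in text as a 27-bit integer."""
    mask = 0
    for ch in text.lower():
        if "a" <= ch <= "z":
            mask |= 1 << (ord(ch) - ord("a"))  # bits 1-26: letters a-z
        elif not ch.isspace():
            mask |= 1 << 26  # bit 27: any other non-whitespace character
    return mask


def is_candidate(search, row_mask):
    """True if the row contains every character class the search needs.

    This is only a pre-filter: 'darb' has the same mask as 'brad', so the
    real LIKE comparison is still required on the rows that pass.
    """
    needle = bitize(search)
    return (needle & row_mask) == needle


row_mask = bitize("Ronald Bradford")
print(is_candidate("brad", row_mask))
print(is_candidate("fred", row_mask))
```

Note that the test has to be "all search bits present" rather than "any bit in common", otherwise ‘fred’ would pass against ‘Ronald Bradford’ because they share d, f and r.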

The Questions

The goal is quite obviously to get an Index utilization, to a maximum say of 10%-25% of matching rows. Factors that would affect this include:

  • If the search string is one character, like a vowel, I could see this not being useful, so an implied criterion for optimal work is at least 2 or 3 characters.
  • I see the effectiveness lost on large column values, it could be that only up to say 30 characters is optimal, I see strings of 100+ characters would be ineffective.
  • It doesn’t support case-sensitive searching or non ASCII searching

The Tests

  • Create a good sized table with a distribution of first/last names (an IMDB Actors table would work)
  • Create a concatenated column of searchable fields (e.g. fulldetails = CONCAT_WS(' ', firstname, lastname))
  • Create function to return bitwised search string (e.g. bitize(str))
  • Create Indexed bitwise column and pre-populate accordingly (e.g. bitfulldetails)
  • Create BEFORE INSERT|BEFORE UPDATE Triggers to populate Indexed bitwised column
  • Test the sucker (of course you will need to include the actual LIKE command as well in the WHERE clause)

A sample query would then be:


SELECT columns
FROM users
WHERE (bitize('str') & bitfulldetails) = bitize('str')
AND fulldetails LIKE '%str%'

So the challenge to all those budding MySQL Gurus, does it seem plausible?

What is the maximum number of colons ':' that may appear in a valid URL?

In idle conversation I was asked by MM.

Question: What is the maximum number of colons ‘:’ that may appear in a valid URL?

* If you said zero to one, then you are victim of browsers, and you have never used anything but a browser.

* If you said one, then you’re a novice.

* If you said two, then you have probably seen http://host:port at some time.

* If you said three, then you would be correct, the elite.

http://user:pass@host:port/location
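
As a quick check, Python's standard urllib.parse splits such a URL into its parts. The port 8080 below is a stand-in for "port", since the parser expects a number there.

```python
from urllib.parse import urlsplit

# 8080 is a placeholder for "port" in the URL form shown above.
url = "http://user:pass@host:8080/location"
parts = urlsplit(url)

print(url.count(":"))  # one after the scheme, one in user:pass, one in host:port
print(parts.username, parts.password, parts.hostname, parts.port, parts.path)
```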

For the record my initial answer was 2.

CU@UC07


I’ll be speaking at the upcoming 2007 MySQL Conference & Expo (Why they dropped the word User, who knows), this time with Guy Harrison (Author of many books including MySQL Stored Procedures). We will be talking on MySQL for Oracle DBAs and Developers.

Anyway, good friend Paul McCullagh, creator of PBXT, will also be speaking on PrimeBase XT: Design and Implementation of a Transactional Storage Engine. In an email to me he coined “CU at the UC”. I’ve done a further level of refactoring, and added marketing. You can buy the shirt online here. (More colors including black and products coming, if you want it now, please ask).

Using Innodb Primary Keys wisely

At a customer site recently I came across the following table definition for an InnoDB table: 140 columns, a 3-part primary key totaling 44 bytes, and 2 million rows giving over 900MB in data size. It also had 15 indexes, totaling over 2.3GB in size. Add into the mix a Windows Server 2003 OS, a 640MB innodb_buffer_pool_size setting and table scans out the wazoo. This all leads to very poor performance.

It is generally considered best practice for InnoDB to use the shortest primary key possible, and there is a clear reason for this: InnoDB stores the full primary key with every secondary index entry. So if an indexed column is 4 bytes in length, in this example each index row would be 48 bytes (before overhead). Fortunately an easy solution presented itself. Because of this index storage requirement, InnoDB will create an internal 6 byte primary key if none exists for a table. I of course had known about this but had never tried it in a production situation. I come from the old school where every table is defined with a primary key.
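
The arithmetic is worth spelling out. Below is a rough Python sketch that counts only the payload of one secondary index (indexed column plus primary key) and ignores page and record overhead, so the figures are illustrative rather than what InnoDB would actually report. The function names are my own.

```python
def secondary_index_entry_bytes(indexed_bytes: int, primary_key_bytes: int) -> int:
    """Payload of one secondary index entry: indexed column(s) plus the primary key."""
    return indexed_bytes + primary_key_bytes


def index_payload_mb(rows: int, indexed_bytes: int, primary_key_bytes: int) -> float:
    """Rough payload of one index across all rows, in MB, ignoring overhead."""
    entry = secondary_index_entry_bytes(indexed_bytes, primary_key_bytes)
    return rows * entry / (1024 * 1024)


ROWS = 2_000_000
wide = index_payload_mb(ROWS, 4, 44)   # the 44-byte primary key described above
narrow = index_payload_mb(ROWS, 4, 6)  # the 6-byte internally generated key
print(f"{wide:.1f} MB vs {narrow:.1f} MB for one 4-byte index")
```

Multiply that difference across 15 indexes and it is easy to see where the space goes.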

So an ALTER TABLE [name] DROP PRIMARY KEY results in a long wait, and a reduction in the index size to 900MB. WOOT! Now, on closer analysis, the Primary Key is the Primary Key because it’s the unique requirement for the table. No problem, I just add a Unique Key to replace the previously dropped Primary Key. A check of the index size then showed 2.3GB. What the!

It seems that if you read the fine print of the MySQL documentation for InnoDB Table Structures, there is an exception clause: “If you do not define a PRIMARY KEY for your table, MySQL picks the first UNIQUE index that has only NOT NULL columns as the primary key and InnoDB uses it as the clustered index.”

In my situation, the Unique Key I added was in turn converted internally to the new Primary Key. Drat! So, to the InnoDB developers out there: I’d like to see a way for the internally generated key to remain in this situation, or at least the ability for the designer to choose this capability.

The only solution is to physically add a column defined as INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY. This may not seem like a big thing, but the customer did not want to make any schema changes. However, it was a necessity simply as the first step to get reasonable performance.

This was only the first part of the problem. Solving the full table scans via creative indexing was not possible, and code changes were out of the question for an immediate fix. The real problem was the innodb_buffer_pool_size setting: it was just too small to hold both the table data and index pages in memory. Perfmon analysis showed the machine simply went into intensive disk I/O every time any queries were run. On a Linux system running just a database with InnoDB tables, the usual recommendation is to allocate 70-80% of available memory to innodb_buffer_pool_size.

Alas this is 32-bit Windows, and there is an implied 2GB memory limit for any process, so the best one could manage in this situation was 1600MB.
The long and the short was that even with poor database design, an immediate performance improvement occurred with an optimal Primary Key and sufficient memory allocated to the correct MySQL buffers. This is only a temporary fix for a greater problem.
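
The sizing rule of thumb can be written down as a tiny Python helper. This is only a sketch: the function name, the 75% default and the 4 GB host are my own assumptions for illustration, while the 1600MB ceiling is the figure from this case.

```python
from typing import Optional


def suggested_buffer_pool_mb(total_ram_mb: int,
                             fraction: float = 0.75,
                             process_cap_mb: Optional[int] = None) -> int:
    """Suggest an innodb_buffer_pool_size in MB for a dedicated database host.

    fraction: share of RAM for the buffer pool (the 70-80% rule of thumb).
    process_cap_mb: optional practical ceiling, e.g. for a 32-bit process.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    suggestion = int(total_ram_mb * fraction)
    if process_cap_mb is not None:
        suggestion = min(suggestion, process_cap_mb)
    return suggestion


# A hypothetical 4 GB host: uncapped, then with the 1600MB ceiling.
print(suggested_buffer_pool_mb(4096))
print(suggested_buffer_pool_mb(4096, process_cap_mb=1600))
```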

MySQL Camp T-Shirts


For those that attended the MySQL Camp at Google HQ late last year, you may have seen me with my own T-Shirt designs. A number of people inquired about getting them. I’ve finally got around to making them available online, so anybody that would like one can order online.

There are two different shirts. If you want your name on the shirt, you need to make sure you choose the correct one.

  • Early Adopters – For those that were the first 48 that signed up, your name as well as position and company are on the shirt.
  • The Herd – For everybody that registered on the website, your name is on the shirt.

Ok. I’ve already been asked why 48. This was the number of registrants when I got the shirt made back in Australia a week or so before the Camp.

There are also plenty more of my MySQL designs at my MySQL Wear Store.

For those that also liked the runner up pin “A mouse is a waste of a perfectly good hand”, you can also get this in its original graphical shirt design at Geek Cool – CLI.

MySQL Predictions for 2007

I’m interested to know what people think 2007 will hold for MySQL.

The announcement of “You” as Time person of the year can only be considered a huge boost to the opportunities in 2007. So, for 2007 here are my 7 (in no significant order).

  1. 2007 will be the year of the storage engine. We will see 5 offerings for transactional storage engines, 20+ available and practical engines for management of some form of data.
  2. 2007 will see MySQL 5.1 GA (finally).
  3. 2007 will see MySQL release its own Falcon Storage Engine (GA not until Q4 :-().
  4. The MySQL Winter of Code will enable the contributions of the community to change feature development. I foresee a Bounty system from an external party or parties for MySQL Features emerging.
  5. MySQL will make major press inroads to the RDBMS Big 3 of Oracle, SQL Server and IBM DB2.
  6. Despite the efforts of MySQL AB, major installations of MySQL 4.0 and 4.1, including large ISPs, will hamper the uptake of 5.0 and 5.1 and the de-commissioning of 4.x.
  7. A major country government will make an announcement to move to Open Source across servers and desktops, and MySQL will contribute to being an enterprise database offering in systems replacements as part of a longer term strategy.

NY Tech Meetup

Tonight I headed to the NY Tech Meetup organized by the CEO of Meetup and co-founder of Fotolog, the company my friend Frank works for.

This forum provided quick presentations by new NY high tech ventures and other interesting discussions, followed by further networking between people.

A Perfect Thing


The first speaker was Steven Levy, mentioned on the site as Newsweek’s tech editor & all-around geek writer extraordinaire. He is the author of “The Perfect Thing”, a story of the Apple iPod. He shared a funny story of a dinner where he was seated with Bill Gates at a Microsoft XP launch in late 2001, having just that week got his first iPod following its launch. When he gave it to Bill Gates, he observed what he described as a mind meld, a vortex between Bill’s brain and the iPod, while Bill checked it out, exploring all the menu options and buttons. 45 seconds later came a comment of something like: looks great, and it works with a Macintosh.

Urbis

Our second speaker was Steve Spurgat from www.urbis.com. The blurb: Urbis is a creative community with three types of users: creative people, those who love and support creative people, and those who have opportunities for creative people. It’s very creative. Some of the interesting features of this site included:

  • Can pre-define the people that can review your submissions, by various criteria, meaning that your feedback can be restricted.
  • You can specify your specific goals for your submission.
  • You can select the present opportunities for your submission.
  • There is an economy system: to see reviews of your own work, you must review others’ work.

Presently only writing is available, but Music, Art and Film are planned for the next few months. With some 12,000+ members and 13% active, it’s a good start.

There was also discussion of copyright, Urbis being a registered copyright agent complying with government guidelines, and of revenue models including the option for fees from publishers, and the potential of ad copy. A competitor site Trigger Street was also mentioned, started by Kevin Spacey.

One Web 2.0 thing I liked about this site, and the next was that the website was the presentation (no powerpoint), and while talking the home page of the website was displayed and the content was dynamically changing, in this case, reviews being submitted online. A good selling point.

LinkStorms

Scott Kolber of LinkStorms was our next presenter. LinkStorms is described as the next generation of links for the web, providing context-specific fast links and specific navigation from a button, image or banner ad.

The revenue model is CPM plus a publisher’s setup, maintenance and support fee structure. Apparently it achieves up to a 40% click thru rate, which is extraordinary compared to the current stats of < 1% for general banners.

When asked what was different with this model, the answer was "the results. It's a better user experience looking at ads".

You can see it in action at Premiere Magazine – The Departed.

CogMap

Brent Halliburton and his approach to a wikipedia of Organization charts with CogMap certainly got the best response from the crowd. A good comedian, Brent made the mistake, on a slow Internet connection, of demonstrating interactively with an example from the audience rather than his own prepared content. It ended up not rendering, then crashing, but he managed to turn it around into a plus and the best applause of the night.

His idea provoked a wide range of comment and feedback. When asked why, he answered: “Because if you’re an entrepreneur you do things”. “In the big scheme of things I don’t have all the answers. I just put it out there.”

uPlayMe

David Fishman provided the last presentation, on uPlayMe, a Windoze program that provides a slant on community social networking via entertainment, specifically what people are actually playing at the time via, for example, Windows Media Player. It’s designed to help people discover other people with the same interests, or weird interests. Some other sites mentioned in the discussion included Last.fm, Pandora and MOG.

2007 Predictions

We ended with audience-contributed 2007 predictions. They included:

  • No Prediction – (The first person from the Board of Advisors I believe that was specifically asked)
  • IP TV market and integration with the TV
  • Will see a Billion $ organization from the NY community
  • The buzz of radios that can do multiple gigabits of transfer between neighbours (yes it sounded weird)
  • Era of the connected home, Computer, TV, Stereo
  • Some political thing at change.net
  • Another political thing, an organic style camp debrief
  • The Term 2.0 will cease being used in 2007
  • Skype will be a source of major innovation
  • NY will produce a billion dollar Internet company

Pluggable Storage Engines – What is the potential?

I started this post a month ago, but after Kaj’s discussion on the same topic at the MySQL Camp I figured it was time to post.

I had dinner with a friend recently (a very smart friend), and our conversation led him to ask “What’s different with MySQL?”. One of the things I tried to describe was the “Pluggable Storage Engine Architecture” (PSE) potential for the future that I expect will set MySQL apart from all other Open Source and even commercial databases.

Here are some details of the example I tried to provide, pitched at somebody who understands enough of the general principles of RDBMSs.

Consider that information (intelligent data) could be available within a relational database via the appropriate tools and language (e.g. SQL), without being physically constrained to tables, columns and rows of data plus an application to manage that data, which is the present traditional approach. Let’s use images that you take with your digital camera as an example.

In a typical RDBMS application you would create a schema to manage the content of your data, with a number of tables, links to the images etc. Of course you would need an application as well to both view and manage this information.

What if you simply pointed your database to a directory of images and were then able to query information such as photos by date, by size, by album, from a certain location, with a given keyword etc.? Most of this information about digital photographs is already there, encoded in the Exif format that is embedded within JPEG images.
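
To give a feel for the idea, here is a minimal Python sketch. It is not a MySQL storage engine: as a stand-in it copies file system metadata for a directory of JPEGs into an in-memory SQLite table (from Python's standard library) and then queries it with SQL. The table name, columns and placeholder files are all my own assumptions, and it reads only file name, size and modification time rather than Exif.

```python
import os
import sqlite3
import tempfile
from datetime import datetime, timezone


def load_directory(conn: sqlite3.Connection, directory: str) -> None:
    """Expose the JPEG files in directory as rows of a 'photos' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS photos "
        "(name TEXT, size_bytes INTEGER, modified TEXT)"
    )
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.lower().endswith((".jpg", ".jpeg")):
            info = entry.stat()
            modified = datetime.fromtimestamp(
                info.st_mtime, tz=timezone.utc
            ).isoformat()
            conn.execute(
                "INSERT INTO photos VALUES (?, ?, ?)",
                (entry.name, info.st_size, modified),
            )


# Demo with placeholder files standing in for real photos.
with tempfile.TemporaryDirectory() as directory:
    for name, size in [("IMG_0001.JPG", 2048), ("IMG_0002.JPG", 512), ("notes.txt", 10)]:
        with open(os.path.join(directory, name), "wb") as handle:
            handle.write(b"\0" * size)

    conn = sqlite3.connect(":memory:")
    load_directory(conn, directory)
    large = conn.execute(
        "SELECT name FROM photos WHERE size_bytes > 1000 ORDER BY name"
    ).fetchall()
    print(large)
    conn.close()
```

A real pluggable engine would answer the query directly from the files instead of copying the metadata first, but the querying experience is the point here.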

So what’s missing from this information? Tags and comments are the most obvious, because this information can’t be determined electronically; this is something that humans do. If you could also embed this information into an image with a suitable tool, then you would be ready to manage your photos.

A further extension would be to have Image Analysis capabilities that enabled you to search for photos that contained the sky, or people, or something that was the color red.

What if, in the future, your camera had a built-in GPS and this information was recorded within Exif? You would then have the ability to link your output to popular online mapping software such as Google Maps. You could use your digital camera to track your moves, taking photos that could then plot your path over a holiday, and also enabling location based queries.

It was interesting to postulate what ideas may be possible in the future. I suspect that it won’t be long before we actually see this. So what are the other potentials that you may not have considered? Another example may be an MP3 Jukebox style PSE, managing all the information held in the ID3 tags of MP3 files, allowing you to do with music what could be done with images.

Exif Example

Here is some example Exif content, extracted using ExifTool:

 ./exiftool ~/Desktop/2006_02_23_AirShow/IMG_5966.JPG
ExifTool Version Number         : 6.50
File Name                       : IMG_5966.JPG
Directory                       : /home/rbradfor/Desktop/2006_02_23_AirShow
File Size                       : 2 MB
File Modification Date/Time     : 2006:09:24 17:44:32
File Type                       : JPEG
MIME Type                       : image/jpeg
Make                            : Canon
Camera Model Name               : Canon EOS 300D DIGITAL
Orientation                     : Horizontal (normal)
X Resolution                    : 180
Y Resolution                    : 180
Resolution Unit                 : inches
Modify Date                     : 2006:02:23 16:01:56
Y Cb Cr Positioning             : Centered
Exposure Time                   : 1/320
F Number                        : 10.0
ISO                             : 200
Exif Version                    : 0221
Date/Time Original              : 2006:02:23 16:01:56
Create Date                     : 2006:02:23 16:01:56
Components Configuration        : YCbCr
Compressed Bits Per Pixel       : 3
Shutter Speed Value             : 1/320
Aperture Value                  : 10.0
Max Aperture Value              : 3.5
Flash                           : No Flash
Focal Length                    : 18.0mm
Macro Mode                      : Unknown (0)
Self-timer                      : 0
Quality                         : Fine
Canon Flash Mode                : Off
Continuous Drive                : Single
Focus Mode                      : AI Focus AF
Canon Image Size                : Large
Easy Mode                       : Manual
Digital Zoom                    : Unknown (-1)
Contrast                        : +1
Saturation                      : +1
Sharpness                       : +1
Camera ISO                      : n/a
Metering Mode                   : Evaluative
Focus Range                     : Not Known
AF Point                        : Manual AF point selection
Canon Exposure Mode             : Program AE
Lens Type                       : Unknown (-1)
Long Focal                      : 55
Short Focal                     : 18
Focal Units                     : 1
Max Aperture                    : 3.6
Min Aperture                    : 22
Flash Activity                  : 0
Flash Bits                      : (none)
Zoom Source Width               : 3072
Zoom Target Width               : 3072
Color Tone                      : Normal
Focal Plane X Size              : 23.22mm
Focal Plane Y Size              : 15.49mm
Auto ISO                        : 100
Base ISO                        : 200
Measured EV                     : 9.00
Target Aperture                 : 10
Target Exposure Time            : 1/318
Exposure Compensation           : 0
White Balance                   : Auto
Slow Shutter                    : None
Shot Number In Continuous Burst : 0
Flash Guide Number              : 0
Flash Exposure Compensation     : 0
Auto Exposure Bracketing        : Off
AEB Bracket Value               : 0
Focus Distance Upper            : -0.01
Focus Distance Lower            : 5.46
Bulb Duration                   : 0
Camera Type                     : EOS Mid-range
Auto Rotate                     : None
ND Filter                       : Unknown (-1)
Self-timer 2                    : 0
Bracket Mode                    : Off
Bracket Value                   : 0
Bracket Shot Number             : 0
Canon Image Type                : IMG:EOS 300D DIGITAL JPEG
Canon Firmware Version          : Firmware Version 1.1.1
Camera Body No.                 : 0930402471
Serial Number Format            : Format 1
File Number                     : 159-5966
Owner's Name                    :
Canon Model ID                  : EOS Digital Rebel / 300D / Kiss Digital
Canon File Length               : 2387078
WB RGGB Levels Auto             : 1726 832 831 948
WB RGGB Levels Daylight         : 0 0 0 0
WB RGGB Levels Shade            : 0 0 0 0
WB RGGB Levels Cloudy           : 0 0 0 0
WB RGGB Levels Tungsten         : 0 0 0 0
WB RGGB Levels Fluorescent      : 0 0 0 0
WB RGGB Levels Flash            : 0 0 0 0
WB RGGB Levels Custom           : 0 0 0 0
WB RGGB Levels Kelvin           : 0 0 0 0
Color Temperature               : 5200
Num AF Points                   : 7
Canon Image Width               : 3072
Canon Image Height              : 2048
Canon Image Width As Shot       : 3072
Canon Image Height As Shot      : 2048
AF Points Used                  : Mid-left
Preview Quality                 : Normal
Preview Image Length            : 278318
Preview Image Width             : 1536
Preview Image Height            : 1024
Preview Image Start             : 2108760
Preview Focal Plane X Resolution: 3443.9
Preview Focal Plane Y Resolution: 3442.0
User Comment                    :
Flashpix Version                : 0100
Color Space                     : sRGB
Exif Image Width                : 3072
Exif Image Length               : 2048
Interoperability Index          : R98 - DCF basic file (sRGB)
Interoperability Version        : 0100
Related Image Width             : 3072
Related Image Length            : 2048
Focal Plane X Resolution        : 3443.946
Focal Plane Y Resolution        : 3442.017
Focal Plane Resolution Unit     : inches
Sensing Method                  : One-chip color area
File Source                     : Digital Camera
Custom Rendered                 : Normal
Exposure Mode                   : Auto
Scene Capture Type              : Standard
Compression                     : JPEG (old-style)
Thumbnail Offset                : 2560
Thumbnail Length                : 7680
Image Width                     : 3072
Image Height                    : 2048
Aperture                        : 10.0
Drive Mode                      : Single-frame shooting
Flash                           : Off
Image Size                      : 3072x2048
Lens                            : 18.0 - 55.0mm
Preview Image                   : (Binary data 278318 bytes, use -b option to extract)
Preview Image Size              : 1536x1024
Scale Factor To 35mm Equivalent : 1.6
Shooting Mode                   : Program AE
Shutter Speed                   : 1/320
Thumbnail Image                 : (Binary data 7680 bytes, use -b option to extract)
WB RGGB Levels                  : 1726 832 831 948
Blue Balance                    : 1.140108
Circle Of Confusion             : 0.019 mm
Focal Length                    : 18.0mm (35mm equivalent: 27.9mm)
Hyperfocal Distance             : 1.67 m
LV                              : 14.0
Lens                            : 18.0 - 55.0mm (35mm equivalent: 27.9 - 85.3mm)
Red Balance                     : 2.075767
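
Output in this shape is easy to turn into something queryable. Here is a small Python sketch that parses "Tag : value" lines like those above into a dictionary. The function name is my own, the sample is a handful of lines copied from the output, and it assumes the first colon on each line is the separator. Tags that appear twice (Flash, Focal Length and Lens in the output above) would simply keep the last value.

```python
def parse_exiftool_output(text: str) -> dict:
    """Turn 'Tag Name : value' lines into a dict; later duplicates overwrite earlier ones."""
    tags = {}
    for line in text.splitlines():
        # Tag names contain no colon, so the first colon is the separator.
        name, separator, value = line.partition(":")
        if separator and name.strip():
            tags[name.strip()] = value.strip()
    return tags


sample = """\
File Name                       : IMG_5966.JPG
Camera Model Name               : Canon EOS 300D DIGITAL
Date/Time Original              : 2006:02:23 16:01:56
Owner's Name                    :
Image Size                      : 3072x2048
"""

tags = parse_exiftool_output(sample)
width, height = (int(n) for n in tags["Image Size"].split("x"))
print(tags["Camera Model Name"], tags["Date/Time Original"], width, height)
```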

Zune or zzz?

An interesting article in the local New Jersey paper this week: “Will Microsoft iPod-rival Zune be a ‘zoom’ or more of a ‘zzz’”. This is Microsoft’s attempt to finally cash in on the 1.5 billion songs that have been sold online by the Apple iTunes store. A comment from the article:

The Zune’s wireless function also taxes battery life. And Zune users – Zunies? – are in for some surprises when they actually try zapping songs to each other.
Those songs will deactivate in three days, or after three plays – whichever comes first.

I had a friend also tell me that the new Zune is not compatible with the upcoming Windows Vista. Now if that’s actually true, that’s amazing.

The desire for Performance SQL Tips

It seems people are clamoring for a more consolidated help guide for SQL Performance tips.

Jay Pipes at the MySQL Camp ran a session, Interactive Top 10 SQL Performance Tips. There was plenty of input and discussion, and at the time Sheeri simply typed them into a wiki page for later work.

Well, it seems even that rough list is popular at Del.icio.us, ranking near the top of the Hot List on the front page. I saw it earlier when it was second or third, but didn’t think of taking a screen shot until now; it’s still high.

I’d say that we could get the Top 10 for up to 10 different categories rather easily. Good luck Jay.

The Falcon!

Some early notes by Brian Aker on Falcon as discussed at the MySQL Camp.

Falcon is a transactional engine MySQL will be introducing. The first discussions were held about 3 years ago with Ann Harrison, and about 1 1/2 years ago MySQL started taking the possibilities seriously.

Falcon is not an InnoDB replacement. It takes a different approach to how transactions are viewed and managed, and to how the engine is designed. It flips around the way data is stored. Some points:

  • It uses as much memory as possible, like the Oracle SGA or the InnoDB buffer pool.
  • It has a row cache, not a page cache, for more optimal memory use.
  • No locking at all. Jim doesn’t believe in it for concurrency control. It has total versioning.
  • Falcon has to keep all changes in memory, so it is not great for user transactions that may take longer.
  • Characteristics – well optimised for short, fast web transactions; designed for environments with lots of memory.

In general discussion, a fear was raised from the floor that there will be so many storage engine options that you will need a matrix of what is good for what.

In conclusion, Brian mentioned it will be alpha before the end of the year.

MyISAM++

Monty gave us a quick overview of the next generation of MyISAM. It is set to include:

  • New data disk format
  • Transaction support
  • multi-versioning
  • row level locking and escalation to table level locks. (interesting)
  • bitmap indexes and new table scanning optimizing indexes with up to 1000x performance.

No details of time frame were given for delivery, however development is well underway.