Archive for the ‘MySQL User Conferences’ Category

What’s in a new name

Wednesday, April 16th, 2008

Also in the MySQL Press Releases today but dated for tomorrow is Sun Microsystems Announces MySQL 5.1.

I find the wording a clear departure from my previous understanding: “pending general availability of MySQL™ 5.1”.

We now see the trademark notice, obviously a Sun influence.
We now have a “pending” GA version. MySQL is obviously very keen to release MySQL 5.1 for GA. This was expected at last year’s MySQL conference. Many in the past year have expected this prior to this year’s conference (we are now being informed late Q2 2008). There was an anticipation there would be two RC versions; this is now the third. So what exactly does “pending” mean? Will 5.1.24 be renamed Production if it passes community acceptance? (I say community because it’s not an Enterprise release.) This would be a change from previous naming policy. What’s most likely, hopefully, is that they release 5.1.25 as production. It comes back then to why the words are “pending general availability”, and not “next release candidate”, which is what it is.

Previously MySQL also made news with the initial RC status of 5.1, moving away from the previous policy.

Standing room only

Wednesday, April 16th, 2008

At Day 1 of the 2008 MySQL Conference and Expo today, our high numbers of attendees (reported at 2,000) have resulted in Standing Room only in a lot of talks. This has got to be excellent PR.

I got to sit in on the Memcached and MySQL session by Brian Aker and at the end I stuck my head in to Best Practices for Database Administrators by and Explain demystified by Baron Schwartz both 2008 MySQL Award winners.

All presentations were full to overflowing.

PrimeBase XT (PBXT) in the news

Wednesday, April 16th, 2008

In today’s Official MySQL Press Release, PrimeBase XT (PBXT) has been named along with three other storage engine partners in “Sun Celebrates Third-Party MySQL Storage Engines“.

This is a great achievement for a small company, to be recognized in the certified storage engine partner program alongside other companies that are much larger on the balance sheet. This continues the news from last week, when you may have read PrimeBase Technologies a MySQL Platinum Level Partner.

A noted absence from the list is Nitro; an expected absence was Solid.

Off to a flying start

Wednesday, April 16th, 2008

Marten has opened the 2008 MySQL Conference & Expo. This time he started in his opening comments “I have more to say to more people, and given less time to say it”.

His answer to why Sun bought MySQL included slides showing “Alignment in Culture and Vision” and “What’s in it for you – Performance & Scale, Support, Marketplace”.

This year the MySQL Conference has over 2,000 people and 55 exhibitors.

What was funny was the photo showing the burning of the IPO Prospectus. Marten mentioned that now, with many Sun lawyers, he has to be more careful what to say. I actually have an interesting extension to this at Watching what you say.

Some points of note for me:

  • The Web Economy continues to have exponential growth and the need for new technology but a goal of linear growth.
  • Continuing to mention becoming “Disruptive Online Innovations”
  • The Online landscape consists of the Software Development Model, Business Model, Software Deployment Model and Organisational Model.
  • He reiterated the Design Priorities at MySQL: reliability, performance, ease of use.
  • MySQL Workbench is now GA. Congratulations Michael Zinner.
  • On the Storage Engine Update, PrimeBase, PBXT and Paul McCullagh got a mention, in addition to the usual suspects. ScaleDB & Tokutek are newly mentioned and are Exhibitors this year.

A few words from Jonathan Schwartz

Wednesday, April 16th, 2008

Following Marten Mickos, the second opening keynote at the 2008 MySQL Conference and Expo was by Jonathan Schwartz CEO and President of Sun Microsystems. Blog

His opening joke was about dinner with Marten, to which Marten said “You’re not going to get a keynote, unless you buy the company.”

So what was striking for me in his presentation “What is Sun’s Agenda?”

  • There is no open-source phone yet, but that’s an industry that needs disrupting.
  • Like the need for water or electricity, The Network Has Become A Social Utility.
  • We want to work with the community, create greater innovation.
  • The future, the price tag of Free, the philosophy of Freedom

I had a chance to meet Jonathan and Rich Green on Sunday night, and it was great to see Jonathan learning about, and getting behind the product PBXT – The Community Engine a MySQL 5.1 open source engine.

MySQL Awards at the MySQL Conference & Expo

Wednesday, April 16th, 2008

Announced this morning are this year’s MySQL award winners.

2008 MySQL Application of the Year

  • Facebook – Social Network
  • Virgin Mobile France – Mobile Operator
  • eBay – ECommerce Site

2008 MySQL Partner of the Year

  • Zmanda
  • Microsoft
  • Computer center

2008 MySQL Community Member of the Year

Watching what you say?

Wednesday, April 16th, 2008

Marten has opened the 2008 MySQL Conference and Expo this morning in Santa Clara.

What was funny in the early slides was the photo showing the burning of the IPO Prospectus. Marten mentioned now with many Sun lawyers in the audience he has to be more careful what to say.

This morning while coming down for breakfast, a Sun employee entered the lift. I introduced myself, and he indicated he knew me by name. When I asked what department he was in, he said “legal”. Being intrigued as to how he knew me, I discovered he had read my emails and blog posts.

I must admit I’ve met a number of new people this week and the first word has been “Oh!”, as in they have heard of my name previously.

These have been unexpected responses and information for me; I’m not normally surprised like this. I’ll not be changing what I say or how I say it. Professionally, my writing and publishing will continue to embody “Opinions, Expertise, Passion”.

Trying out Google App Engine

Saturday, April 12th, 2008

I got my registration for Google App Engine this morning after being Waitlisted previously.

Between flying for about 15 hrs tomorrow and then the 2008 MySQL Conference & Expo where I’m presenting and running an Exhibitors booth, fat chance I’ll get to look into this much over the next week.

Pity!

MySQL Community Photo Day Prizes

Friday, April 11th, 2008

I forgot to mention in Support the MySQL Community Photo Day that PrimeBase Technologies is providing 3 prizes for the best photos uploaded.

First Prize $150 Amazon Gift Voucher
Second Prize $100 Amazon Gift Voucher
Third Prize $50 Amazon Gift Voucher

So, take your photo with other community supporters — they’ll also be wearing their open source t-shirts. You can upload photos to www.flickr.com/groups/mysqlcommunityphotoday.

The pursuit of a synchronous world

Friday, April 11th, 2008

Well at least your MySQL database world.

As Paul alluded to, PrimeBase Technologies has a project to provide synchronous replication for MySQL in a High Availability environment. It is more than an idea; there is a plan.

Is it possible?
What are the use cases?
How can you use it?
Would you use it?

Some input to date. We need these questions and more, and we are seeking more input for discussion.

Unfortunately the opportunity to hear any input during a presentation is left to the last day of the conference, so a BoF session has been created on Tuesday night for a round table discussion if necessary. People are encouraged to bring specific cases and situations for feedback, the reasons why MySQL Replication, MySQL Cluster, DRBD/Heartbeat or any other solution does not satisfy your needs, and what would.

If you can’t wait you may need to seek out Booth #518, and make a time to seek out the Technology Expert.

Just today, Peter Zaitsev of MySQL Performance Blog also writes in State of MySQL Market and will Replication live ?, “Customers are constantly asking me if there is something which would help them to scale MySQL and get some HA out of the box even on the medium level. Seriously – MySQL Cluster, Continuent, Master-Master Replication, DRBD or SAN based HA architectures all have their limits which makes neither of them used for very wide class of applications.” to confirm this pursuit.

Support the MySQL Community Photo Day

Thursday, April 10th, 2008

Updated On good advice from Sheeri I made a few comments clearer.

It has been proposed that the inaugural “MySQL Community Photo Day” be on Thursday April 17 2008, the final day of the MySQL Users Conference.

Wear a t-shirt from an open source community project on Thursday, whether a PrimeBase PBXT one or your favorite open source project. Get your photo taken with the masses of community supporters. If you’re not attending this year, this doesn’t mean you can’t also contribute a photo yourself from whatever location you are in. Start a savings fund for next year but get us a photo.

It doesn’t have to be just Thursday; photos will be accepted at any time before then. Upload a photo and win a prize. (Baron you definitely get chocolate)
There is a Flickr group called “MySQL Community Photo Day” at http://www.flickr.com/groups/mysqlcommunityphotoday

This is a chance for you to support the community. There was already a small band of supporters before my post, including Colin C, Paul M, Lenz G, Baron S, Sheeri K.

MySQL Speakers and Presenters at LinkedIn

Thursday, April 10th, 2008

There is a Linked In group I created some time ago but forgot to advertise that is for MySQL Speakers and Presenters.

If you are a speaker or presenter of MySQL content, confirm your registration here.
You will need to have a reference to a website confirming you have been a speaker at a MySQL Event such as a User Conference, MySQL Camp or Local MySQL Users Group.

Hopefully over time we can build a consolidated index at MySQL Forge. Presently some pages exist, including MySQL Conference & Expo and User Group Presentations, but I’m accepting input for a model to have a central page, with a link to or upload of each presentation. Any input welcome.

Storage Engines at the MySQL Conference

Tuesday, April 8th, 2008

I’ll be following closely the progression of Storage Engines available in the MySQL Database server, well soon to be available when 5.1 gets to GA (hopefully by end of Q2 which is what we have been told). Tick, Tick, time is running out.

PrimeBase XT (PBXT) and Blob Streaming is a focus at PrimeBase Technologies, a company which I want to note for people is an Open Source company, committed to providing an open source alternative to the other commercial players. You also have at the MySQL Conference talks on the existing InnoDB from Innobase (a subsidiary of market RDBMS leader Oracle). There is a Nitro presentation, an Infobright presentation, and surprisingly no Solid presentation (the IBM news happening after submissions closed). We also have from MySQL presentations on the internally developed storage engines Falcon and Maria, both products that won’t even be in 5.1 but 6.0; however, Maria is presently a different branch of 5.1 so I don’t know how that works. Will it be in 5.1?

But what I want to seek is more news of KickFire, a Diamond Sponsor, an engine with embedded H/W, something that’s been obviously worked on in reasonable stealth. For me it’s not just interesting, it’s a competitor in our technology space, so I’ve been researching Joseph Chamdani and some of his patents.

Plenty of news in the past few weeks on Kickfire including Kickfire Update by Keith Murphy on April 3, Kickfire: stream-processing SQL queries by Baron Schwartz on April 4, Kickfire looking to push MySQL limits by Farhan Mashraqi on April 4, and Kickfire Kickfire Kickfire by Peter Zaitsev on April 4, and myself back on March 23.

So what can I make from the lack of company information and posted information to date?

  • Hardware based acceleration.
  • No Solid State Drive (SSD) Technology, at least not yet but C2App mentions SSD.
  • Data Warehousing, lending to thinking it’s not a transactional storage engine
  • A new storage engine and a new approach to data storage. I find this surprising, as it takes years to develop a feature complete storage engine, and most new 5.1 storage engines are indeed existing products; take Nitro, Solid, Infobright and Falcon. Only PBXT has been written from the ground up for MySQL 5.1, so I’m looking to know more about its development.
  • Expensive, it’s dedicated H/W + (assuming) MySQL Enterprise + Storage Engine

Come and get a t-shirt at UC2008

Sunday, April 6th, 2008

Here I am at my desk sporting the PrimeBase supporters t-shirt that will be available at the exhibitors booth at the 2008 MySQL Conference. The front is rather uneventful with the official logo, but the back will be worth the experience. So everybody interested in supporting PBXT as the transactional storage engine for MySQL developed by the community and for the community, please come and see us and mention the secret password.

We have been placed way back in the right hand side of the exhibitors hall at booth 518, in front of the Open Source and OEM providers.

Learning SEO the painful way

Thursday, March 27th, 2008

Indeed I have a goal of launching a consolidated site of my online presence at ronaldbradford.com at some time soon, and even now I have found I’ve made some SEO 101 mistakes, just in my testing site, and my temporary placeholder.

As a database expert I see plenty of database 101 mistakes with most clients, so part of why my site is going nowhere is I don’t want to make SEO 101 mistakes, especially as I’m not launching a new site, but a migration of existing content to one site.

I see nobody at O’Reilly has made improvements to the redirection mess of the MySQL Conference website as described by Farhan Mashraqi in Someone please change mysqlconf.com redirection, and so rather than linking to www.mysqlconf.com, which I have done previously, I’ve linked to the direct page, which I’m sure will probably change after the conference, making this a broken link.

I am concerned that a larger organization can’t get this right. Is SEO/SEM not important to them? It will also be of interest to see what happens here with Sun acquiring MySQL. Sun did a rather detailed job of MySQL content on www.sun.com. Time will tell I guess.

Companies speaking at UC2008

Monday, March 24th, 2008

The Conference Speakers of the 2008 MySQL Conference provides some common and interesting names of companies not common in MySQL circles such as eBay, Microsoft Corporation, HP, Symantec. I see speakers outside of MySQL from countries including USA, Canada, Brazil, Germany, Japan and Australia.

I did some data analysis of the speakers list. There are 150 speakers, of whom 45 are from MySQL. Other companies with multiple speakers include Sun Microsystems, Kickfire, Linbit, Cafepress, Open Query, Proven Scaling, Stanford Linear Accelerator Center, UC Berkeley, Six Apart, The Hive, Zmanda, MySQL Performance Blog, Infobright, Digg, Grazr and of course PrimeBase Technologies.

Only two MySQL speakers have listed “MySQL/Sun” the rest are “MySQL”. I wonder what the policy is here? You have “Oracle / Innobase” and “Innobase / Oracle Corp.” some identity crisis there, the guys from “MySQL Performance Blog” prefer this name over the company name “Percona”, obviously for brand exposure. You have “Grazr Corporation” and “Grazr Inc”. It’s only trivia but interesting.

Upcoming 2008 MySQL Conference

Sunday, March 23rd, 2008

It’s just three weeks now before the 2008 MySQL Conference. Good to see my mug shot on the front page (see screen shot below).

I will still be presenting my session Top 20 DB Design Tips Every Architect Needs to Know, however as a departing MySQL Employee I’ve had to give up the chance to present the “MySQL for Oracle DBA’s Bootcamp” tutorial, content that I developed for MySQL specifically and have already presented three one day seminars in New York, San Francisco and Washington DC.
Update March 26 2008. I should clarify that I notified MySQL as part of my exit items that I would not be able to present the Tutorial. I would very much like to, and being the author of the content I am well qualified, however as this was developed for MySQL and will be again used by MySQL in the future I felt it was inappropriate that a non MySQL employee and a recent departed employee was presenting this content. I did not want for any attendees to be confused or see a potential conflict of interest and I wanted to ensure I kept my distance from the strict Sun Intellectual Policy procedures.

Kickfire formally C2 App a Diamond sponsor will be something I’m very interested in seeing, MySQL being written specifically in hardware. A few talks I’ll be interested in seeing include Securing MySQL for a Security Audit by Brian Miezejewski, Disaster is Inevitable—Are You Prepared? by Farhan Mashraqi, Developing Information Schema plugins by Mark Leith, Astronomy, Petabytes and MySQL and System Techniques to Remove I/O Bottlenecks in Large Query Intensive Applications.

What impact will Sun Microsystems, a Platinum sponsor and acquirer of MySQL, have at the conference? CEO Jonathan Schwartz will be following former CEO Marten Mickos’ opening keynote with Open Source: The Heart of the Network Economy. There are a number of Sun related talks and Sun employees speaking. I’m sure attendance numbers of Sun employees will be up also.

PrimeBase Technologies will be out in force at the Exhibitors hall, so be sure to stop by and say hello, and get a free t-shirt from them.

Everything fails, Monitor Everything

Sunday, May 20th, 2007


From the recent MySQL Conference a number of things resonate strongly almost daily with me. These included:

  1. Guy Kawasaki – Don’t let the bozos grind you down.
    • Boy, the bozos have ground me down this week. I slept for 16 hrs today, the first day of solid rest in 3 weeks.
  2. Paul Tuckfield – YouTube and his various caching tip insights.
    • I’ve seen the promising results of Paul Tuckfield’s comment of pre-fetching for Replication written recently by Farhan.
  3. Rasmus – SSL is not secure. This still really scares me.
    • How do I tell rather computer illiterate friends about running multiple browsers, clearing caches, never visiting SSL sites after other sites that are insecure etc.?
  4. Everything fails, Monitor Everything – Google

What I’ve been working on most lately, if only briefly, and really want to be far better prepared for everywhere I go, is Monitor Everything.

It’s so easy on site to just do a vmstat 1 in one session and a mysqladmin -r -i 1 extended-status | grep -v " | 0 " in another, and you may observe a trend, make some notes, say 25% CPU, 3000 Selects, 4000 Insert/Updates per second etc, but the problem is, the next day you don’t have actual figures to compare. What were the table_lock_waits yesterday? They seem high today.

I also only found a problem on a site when I graphed the results. I’ll give you a specific example. The average CPU for the system was 55%, the target was 50%. When graphing the CPU, it was plainly obvious something was not right. I could see with extreme regularity (I counted 12 in one hour) a huge CPU spike for a second or two. It was so regular in the graph that it could not be random. So, after further investigation and testing, a job run every 5 minutes on this production server (and not on previous testing servers) took 25% CPU for a second or two, and caused a huge amount of Page Faults. Did it affect the overall performance of the system? I don’t know, but it was a significant anomaly that required investigation.

So, quite simply, always monitor and record so you can later reference, even if you don’t process the raw figures at the time. The question is then, “What do I monitor”. Answer, monitor everything.

The problem with most monitoring tools, e.g. vmstat and mysqladmin, is the lack of a timestamp for easy comparison. It’s really, really annoying that you can’t add this to the line output. The simple solution is to segment your data into both manageable chunks and consistent chunks.

Manageable chunks can be as easy as creating log files hourly, ensuring they start exactly at the top of the hour. Use a YYYYMMDD.HHMMSS naming standard for all files and you can never go wrong.
Consistent chunks means ensuring you start all manageable monitors (e.g. hourly) at the exact same time, so you can compare.
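As a minimal sketch of the naming standard and the missing timestamp, assuming Python is available on the monitored host: the helper names hourly_log_name and stamp_line are hypothetical, and the sample vmstat line is made up for illustration.

```python
from datetime import datetime


def hourly_log_name(now: datetime, source: str) -> str:
    """Return a YYYYMMDD.HHMMSS name for the hour that contains `now`."""
    # Truncate to the top of the hour so every monitor shares the same chunk start.
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    return f"{top_of_hour:%Y%m%d.%H%M%S}.{source}.log"


def stamp_line(now: datetime, line: str) -> str:
    """Prefix one line of monitor output with the time it was read."""
    return f"{now:%Y-%m-%d %H:%M:%S} {line.rstrip()}"


if __name__ == "__main__":
    sample_time = datetime(2007, 5, 19, 16, 23, 45)
    # Hypothetical vmstat output line, used only to show the prefixing.
    print(hourly_log_name(sample_time, "os.vmstat"))
    print(stamp_line(sample_time, " 1  0  0 123456 7890 456789 0 0 5 12 250 400 25 3 72 0\n"))
```

Piping each monitor through something like stamp_line means yesterday's and today's figures can be lined up by time rather than by guesswork.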

You need to monitor at least the following:

  • vmstat
  • mysqladmin extended-status
  • mysqladmin processlist
  • mysqladmin variables
  • mysqladmin -r -i [n] extended-status | grep -v " | 0 "

I haven’t found an appropriate network monitoring tool, but you should also look at that.

The issue here is frequency. Here are some guidelines. vmstat every 5 seconds. extended-status and processlist every 30 seconds, variables every hour, and extended-status differences is difficult, but it saves a lot of number crunching later for quick analysis. I do it every second, but not all the time, you need to work out a trigger to enable, or to say run it for 30 seconds every 15-30 minutes.

So in one hour I could have:

  • 20070519.160000.os.vmstat.log
  • 20070519.160000.mysql.variables.log
  • 20070519.160000.mysql.status.log
  • 20070519.160000.mysql.processlist.log
  • 20070519.160500.mysql.status.increment.log, then 1610, 1620, 1630 etc

I have my own scripts for monitoring under development, and I’ve been revising slowly, particularly to be able to load data into a MySQL database so I can easily use SQL for analysis. One thing I actually do is parse files into CSV for easy loading.
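A rough sketch of the CSV step, not the actual scripts mentioned above: it assumes the usual boxed table layout that mysqladmin extended-status prints, and the function names and three-column layout (sample time, variable, value) are illustrative choices only.

```python
import csv
import io


def parse_status(text: str) -> dict:
    """Extract name/value pairs from boxed mysqladmin extended-status output."""
    values = {}
    for line in text.splitlines():
        if not line.startswith("|"):
            continue  # skip the +-----+ border lines
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Variable_name":
            continue  # skip the header row and anything unexpected
        values[cells[0]] = cells[1]
    return values


def to_csv(sample_time: str, values: dict) -> str:
    """Render one sample as CSV rows: sample time, variable name, value."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for name, value in values.items():
        writer.writerow([sample_time, name, value])
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "+--------------------+-------+\n"
        "| Variable_name      | Value |\n"
        "+--------------------+-------+\n"
        "| Com_select         | 3000  |\n"
        "| Table_locks_waited | 12    |\n"
        "+--------------------+-------+\n"
    )
    print(to_csv("2007-05-19 16:00:00", parse_status(sample)), end="")
```

Rows in this shape load directly into a three-column table, after which comparing one day with the next is a plain SQL query.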

There are two tools out there that I’m reviewing and you should look at. Mark Leith has written an Aggregated SHOW STATUS stat pack, and there is also a tool called mysqlreport. Both go some way towards what I ultimately want.

I haven’t used it yet, but I’ve seen and been very impressed with the simplicity of Munin for graphing. I really need to get some free time to get this operational.

So Monitor Everything and Graph Everything. Plenty of work to do here.

The MySQL Conference recap

Tuesday, May 15th, 2007

I recently had the opportunity to return and speak at the Brisbane MySQL Users Group. I spent some time talking about MySQL User Conference 2007 Summary and Life as a Consultant. My summary included:

  • Overview
  • Keynotes
  • Marten Mickos – MySQL
  • Guy Kawasaki
  • Michael Evans – OLPC
  • Rasmus Lerdorf – PHP
  • Paul Tuckfield – YouTube
  • Community Awards
  • Product Road Map
  • Google
  • Storage Engines
  • Dorsal Source
  • What’s Next

One question was posed to me: “What new did MySQL do this year?”, that is, since the last User Conference. MySQL did not seem to make a great impact at the conference compared with the successes of the previous year. I had to think for some time to come up with the following list.

And most recently:

  • Open Source Database Vendor Partners with LINBIT to Jointly Promote & Support DRBD for MySQL Enterprise
  • IBM DB2 as a Certified Storage Engine for MySQL on System i

It’s hard to say if these are big ticket items or not, but it is definitely disappointing that 5.1 GA is still MIA. We stay tuned.

I also managed a much better response than from my Conference Presentation opening Slide.


“How can you tell an Oracle DBA has touched your MySQL Installation?”

MYSQL_HOME=/home/oracle/products/mysql-version
mysqld_safe --user=oracle &

MySQL Cluster Certified

Saturday, April 28th, 2007

Jonathon Coombes recently blogged in MySQL Cluster Certified that he passed the MySQL Cluster DBA Certification and was the first Australian. Lucky for him I passed the exam after my presentation on the second day of the conference. I guess us Australians are leading the world!

As Jonathon said it was rather hard, certainly more difficult than the other DBA exams, but nothing for an experienced Cluster DBA.

MySQL Conference – YouTube

Friday, April 27th, 2007

MySQL Conference 2007 Day 4 rolled quickly into the second keynote Scaling MySQL at YouTube by Paul Tuckfield.

The introduction by Paul Tuckfield was: “What do I know about anything, I was just the DBA at PayPal, now I’m just the DBA at YouTube. There are only 3 DBAs at YouTube.”

This talk had a number of great performance points, with various caching situations. Very interesting.

Scaling MySQL at YouTube

Top Reasons for YouTube Scalability

The technology stack:

  • Python
  • Memcache
  • MySQL Replication

Caching outside the database is huge.

In a display of the number of hits per day it was said “I can neither confirm nor deny the interpretation will work here (using an Alexa graph)”. This is not the first time I’ve heard this standard “Google” response. They must get lessons from lawyers in what you can say.

Standardizing on DB boxes (but they crash almost daily)

  • 4x2ghz opteron core
  • 16G RAM
  • 12x10k scsi
  • LSI hardware raid 10
  • Replication played a big part in fixing
  • Get a reliable H/W supplier

Replication Lessons

  • You don’t worry about it when a replica fails.
  • One thing that sucks, InnoDB doesn’t recover very fast. It does that durability thing, but it takes hours to finish recovering (was it going to finish?)
  • How many backups can you restore? When you switch to a replica, are you sure it’s right?
  • Did you test recovery, did you test your backups.
  • replication was key to trying different H/W permutations to identify incompatible H/W (combinations of controllers/disks)
  • we got good at re-parenting/promoting replicas, really fast
  • we built up ways to clone databases as fast as possible
  • Excellent way to test tuning changes or fixes (powerful place to test things)
  • Keep “intentional lag”/Stemcell replicas – Stop SQL thread, keeps a server a few hours or a day behind. Say if you drop a table you have a online backup.
  • When upgrading, always mysqldump then reload, rather than upgrade the database.
  • Don’t care about CPU’s. I want as much memory as possible, I want as many spindles as possible.
  • For YouTube 2-3 second lag is acceptable.

If your db fits in RAM, great; otherwise:

  • Cache is king
  • Writes should be cached by raid controller (buffered really) not the OS
  • Only the db should cache reads (not raid, not Linux buffer cache)

Only DB should cache reads

  • Hit in db cache means lower caches went unused.
  • Miss in db cache can only miss in other caches since they’re smaller.
  • Caching reads is worse than useless. It serializes writes.
  • Avoiding serialization in reads reaps compound benefits under high concurrency

An important lesson learned. Do not cache reads in the F/S and RAID controller.

Caching Lessons
Overcoming Mystery Serialization

  • Use O_DIRECT
  • vm.swappiness=1-5
  • if you’re >80% busy, you’re not doing I/O concurrently; look at other figures, e.g. 80% busy with 8 I/Os, next configuration 80% with only 4 I/Os
  • Mirror in H/W, stripe in S/W

Scale Out

  • Writes are parallel to master, but serialized to replicas. We need true horizontal partitioning.
  • We want true independent masters
  • EMD – Even More Databases — Extreme Makeover Database
  • Slave transactions must serialize to preserve commit order (this is why replication is always way slower)
  • The oracle caching algorithm (that’s a small o) — predicting the future
  • Replication lags: one IO bound thread. You do know the future, commands are coming up serially.
  • Write a script to do reads, before updates coming up (because they are cache hits).
  • The diamond. For go-live, play the shards’ binlogs back to the original master for fallback.
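The pre-fetch idea in the list above (do the reads before the updates that are coming up, so the replica's single thread hits cache) can be sketched as follows. This is only an illustration, not the script used at YouTube: prefetch_query is a hypothetical helper, the regular expression handles only simple single-table UPDATE ... WHERE statements, and the table and column names in the usage lines are invented.

```python
import re
from typing import Optional

# Matches simple single-table statements such as
#   UPDATE videos SET views = views + 1 WHERE id = 42
_UPDATE = re.compile(
    r"^\s*UPDATE\s+(?P<table>\S+)\s+SET\s+.+?\s+WHERE\s+(?P<where>.+?);?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def prefetch_query(statement: str) -> Optional[str]:
    """Turn an upcoming UPDATE into a SELECT that touches the same rows.

    Returns None for anything the simple pattern does not recognise.
    """
    match = _UPDATE.match(statement)
    if match is None:
        return None
    return f"SELECT * FROM {match.group('table')} WHERE {match.group('where')}"


if __name__ == "__main__":
    # Hypothetical statements read ahead from the relay log.
    print(prefetch_query("UPDATE videos SET views = views + 1 WHERE id = 42;"))
    print(prefetch_query("INSERT INTO videos (id) VALUES (43)"))
```

Running the generated SELECT a little ahead of the SQL thread pulls the needed pages into the buffer pool, so the serialized update that follows does not wait on disk.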

MySQL Conference – Get Behind Dorsal Source

Friday, April 27th, 2007


In a community session yesterday at MySQL Conference 2007, I first heard about Dorsal Source, a collaboration between Solid DB and Proven Scaling that allows community people to upload patches to MySQL, get them compiled across multiple platforms, and have a downloadable distribution available on H/W that individual contributors will never have access to.

That’s a great idea. I also hope we get the opportunity to compile patches into multiple versions, as well as to get builds of a lot of patches together. Personally, I’m running 3 versions just to diagnose one problem: 5.0.36 with a custom binary change, 5.0.37 so I have SHOW PROFILE, and 5.0.33 so I have the microslow patch.

With new patches becoming available from the community, I hope I can see builds that combine all known patches that Dorsal Source may have.

I think this is going to be a great project.

MySQL Conference – PHP on Hormones

Friday, April 27th, 2007

MySQL Conference 2007 Day 4 started early again at 8:20 am with PHP on Hormones by the father of PHP, Rasmus Lerdorf.

A very funny man, and one of the most insightful talks of the conference (rather scary actually). Here are some opening comments.

  • In his own words as Keynote speaker. “I’m here because I’m old”.
  • PHP 1 from 1994 started after seeing Mosaic in 1993. Because it was just me using it, I could change the language any time.
  • In 2005 the code looks like this (in comparison to 1995). I’m not sure if this is worth 10 years of development
  • I wrote PHP to avoid programming
  • It’s changed to be more OO because people expect that. Universities teach this.
  • Hey, I was fixing bugs in my sleep. I would wake up, and in my mail box there would be bug fixes to bugs I didn’t even know I had.

Why do people contribute?

  • Self-interest
  • self expression
  • hormones
  • Improve the world

The slide included a great Chemical equation of “The Neuropeptide oxytocin” — Nature’s trust hormone

People need to attract other people, it makes you feel good, it comes out when you interact with people.

It’s not what people think about you, but rather what they think about themselves.

  • PHP was my baby, giving up control, just because I started it, doesn’t mean I have a bigger say in it.
  • Systems that harness network effects and get better the more people use them in a way that caters to their own self-interest. — Web 2.0
  • Once you build a framework you’re done, the users build the site, they drive the content.
  • The same people that work on open source projects, are the same people that use websites.
    • Self-interest
    • self expression
    • hormones
    • Improve the world

1. Performance
If your site falls apart you’re done.

  • Benchmark
    • http_load
    • Callgrind inside valgrind
    • XDebug

valgrind --tool=callgrind

  • Excellent tool to see where time is spent in the code. You have to run a profiler.
  • Example of using Drupal. It turns out 50% of time was spent in the theme, it had 47 SQL queries, 46 Selects.
  • Went from 4 per second to 80 per second, without any code changes. Some performance options, and some caching.
  • Guaranteed you can double the speed of your website by using a profiler.

2. Security
Critical problem areas.

  • 404 pages
  • Search page
  • PHP_SELF
  • $_GET, $_POST, $_COOKIE
  • $_SERVER
  • Lots of stupidity in IE (e.g. Always send a charset)

The web is broken you can all go home now.

People are vulnerable because people run older versions of browsers, and their data is not secure, and you can’t secure their data.

What can happen?
9 out of 10 of you have a cross-site scripting hole on your site

Remote Greasemonkey
Profile Hacks
JS Trojans

Added a PHP logo to the MySQL User Website, it’s really the PHP website
IBM webpage, on article about security.

Tool to find holes, banks, insurance companies, CIA, even Yahoo where I work.

You know if they have been to bankofamerica.com, you can tell if they are a customer, you can tell if they are logged in, you can then see their cookie credentials.

You don’t know if any sites have these problems.

JS trojan, iframe that captures
reconfigures your wireless router, moves it outside your DMZ, then uses traditional techniques to attack your machine (that you thought was secure inside a firewall)

You should never ever click on a link. It sort of defeats the purpose of the web.

Never use the same browser instance to do personal stuff and browsing.

So what are we doing about this?
There isn’t much we (PHP) can do to secure sites developed.
Built a filter extension in 5.2, back in 5.1.

http://php.net/filter *** YOU MUST IMPLEMENT THIS
filter.default=special_chars
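
To show what a "special chars" default filter buys you, here is a rough sketch in Python rather than PHP. It uses the standard html.escape function, which is not the PHP filter extension and may differ from it in which characters it encodes. The render_search_page function and the attack string are made up for illustration.

```python
import html


def render_search_page(query):
    """Echo the user's search term back into the page, escaped first."""
    safe_query = html.escape(query, quote=True)   # encodes & < > " '
    return "<p>No results for " + safe_query + "</p>"


attack = "<script>alert(document.cookie)</script>"
page = render_search_page(attack)
print(page)
```

Because the angle brackets are encoded before the value reaches the page, the browser displays the text instead of running it. Search pages and 404 pages echo user input in exactly this way.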

3. APIs are Cool!

Two lines to grab the Atom feed from Flickr of photos just uploaded.
That’s all I have to add to my code.
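
The slides did this in PHP. As a rough equivalent, here is a Python sketch that pulls entry titles out of an Atom document with the standard library. SAMPLE_FEED is a hand-written stand-in so the example runs offline; it is not the real Flickr feed, and the URLs in it are placeholders.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed content, shaped like a minimal Atom document.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Recent uploads</title>
  <entry><title>Sunset</title><link rel="alternate" href="http://example.com/photos/1"/></entry>
  <entry><title>Harbour</title><link rel="alternate" href="http://example.com/photos/2"/></entry>
</feed>"""


def entry_titles(feed_xml):
    """Return the title of every entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    return [
        entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        for entry in root.findall("atom:entry", ATOM_NS)
    ]


titles = entry_titles(SAMPLE_FEED)
print(titles)
```

Against a live feed, the only change would be to fetch the XML over HTTP first and pass the response body to entry_titles.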

They really make you want to use the servers. It’s so easy.

APIs drive passion and drive people to use your site.
You can add a lot of cool things to your sites.

What to do

  • Avoid Participation Gimmicks
  • Get their Oxytocin flowing
  • Solve One Problem
  • Clean and Intuitive UI
  • APIs
  • Make it work

A full copy of the slides can be found at http://talks.php.net/show/mysql07key

MySQL Conference – Google

Friday, April 27th, 2007

MySQL: The Real Grid Database

Introduction

  • Can’t work on performance problems until we solve availability
  • We want MySQL to fix our problems first.

The problem

  • Deploy a DBMS for a workload with
    • too many queries
    • too many transactions
    • too much data

A well known solution

  • deploy a grid database
    • use many replicas to scale read performance
    • shard your data over many masters to scale write performance
    • sharding is easy, resharding is hard
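
A small sketch of why resharding hurts, assuming the simplest possible scheme (hash of the key modulo the number of masters). The talk did not say how Google assigns rows to shards, so shard_for and the key format are purely illustrative.

```python
import zlib


def shard_for(key, num_shards):
    """Pick a master for a key using a stable hash modulo the shard count."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def fraction_moved(keys, old_shards, new_shards):
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for key in keys
        if shard_for(key, old_shards) != shard_for(key, new_shards)
    )
    return moved / len(keys)


user_keys = ["user:%d" % i for i in range(10000)]
print(fraction_moved(user_keys, 4, 5))   # most keys change shard
```

Routing a key to a shard is one line. Adding a fifth master, however, moves most of the existing keys, and all of that data has to be copied while the site stays up.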

  • availability and manageability trump performance
    • make it easy to run many servers
    • unbeatable aggregate performance

  • we describe problems that matter to us.
  • The grid database approach

    • Deploy a large number of small servers
    • use highly redundant commodity components
    • added capacity has a low incremental cost
    • not much capacity lost when a server fails
    • support many servers with a few DBAs

    Manageability
    Make it easy to do the tasks that must be done. Reduce, Reduce.
    Make all tasks scriptable
    Why does it matter? You can support hundreds of servers and spend time solving more interesting problems. You generally have lots of problems to solve.

    Underutilize your servers
    Require less maintenance
    Require less tuning
    tolerate load spikes better
    tolerate bad query plans better

    In a Perfect World
    Short running queries
    users kill mistaken and runaway queries
    accounts never use too many connections
    query plans are good
    new apps increase database workload by a small amount
    only appropriate data is stored in the database

    Reality

    • Long running transactions create replication delays everywhere
    • servers with round robin DNS aliases make queries hard to find
    • applications create more connections when the database is slow
    • some storage engines use sampling to get query plan statistics
    • new applications create new database performance problems
    • applications use the database as a log, and rows are never deleted
    • many long running queries on replicas

    Solutions

    • Improve your ability to respond because prevention is impossible
    • Need tools to make monitoring easier
    • determine what is happening across servers
    • determine what happened in the past

    Mantra

    • Monitor everything you can, and archive as long as possible. (vmstat 5 secs, iostat, mysql error logs)
    • You will need these to reconstruct failures
    • save as much as possible
    • script as much as possible

    Monitoring Matters

    • Display what is happening
      • which table, account or statements caused most of the load
      • many fast queries can be as much of a problem as one slow query

    • Record what is happening
      • archive SHOW STATUS counters somewhere
      • query data from the archive
      • visualise data from the archive

    • record queries that have been run
      • archive SHOW PROCESSLIST output (do this every 30 seconds)
      • support queries on this archive

    • All of this must scale to an environment with many servers

    Monitoring Tools

    • Display counters and rate change for counters
    • aggregate values over many servers
    • visualize and rank results
    • display results over time
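
A minimal sketch of the "rate of change, aggregated over servers" idea, assuming you already archive SHOW STATUS output as dictionaries per server. The host names, counter values and the 30 second interval are made up. Questions and Com_select are real MySQL status counters, but nothing here is Google's tooling.

```python
def counter_rates(previous, current, interval_seconds):
    """Per-second rate of change for every counter present in both snapshots."""
    return {
        name: (value - previous[name]) / interval_seconds
        for name, value in current.items()
        if name in previous
    }


def aggregate(per_server_rates):
    """Sum each counter's rate across all servers."""
    totals = {}
    for rates in per_server_rates.values():
        for name, rate in rates.items():
            totals[name] = totals.get(name, 0.0) + rate
    return totals


# Hypothetical archived snapshots: (earlier, later) per server.
snapshots = {
    "db1": ({"Questions": 1000, "Com_select": 800},
            {"Questions": 1600, "Com_select": 1100}),
    "db2": ({"Questions": 500, "Com_select": 450},
            {"Questions": 800, "Com_select": 690}),
}
INTERVAL = 30

per_server = {
    host: counter_rates(before, after, INTERVAL)
    for host, (before, after) in snapshots.items()
}
totals = aggregate(per_server)
ranked = sorted(per_server, key=lambda h: per_server[h]["Questions"], reverse=True)
print(totals, ranked)
```

A real version would also need to cope with counters resetting when a server restarts, which this sketch ignores.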

    Google mpgrep tools

    New Commands
    We changed mysql, three new commands
    SHOW USER_STATISTICS
    SHOW TABLE_STATISTICS
    SHOW INDEX_STATISTICS

    Per Account Activity
    USER_STATISTICS
    seconds executing commands
    number of rows fetched and changed
    total connections
    number of select/updates/other/commits/rollback/binlog bytes written.

    TABLE STATISTICS
    number of rows fetched/changed

    INDEX STATISTICS
    display number of rows fetched per index
    helps find indexes that are never used

    Available on code.google.com for 4.0; being ported to 5.0.

    MySQL High Availability

    • Great options
      • Cluster
      • Replication
      • Middleware, e.g. Continuent
      • DRBD
    • We need some features right now
    • we are committed to innodb and mysql replication
      • a lot of application code works on this
      • our tools and processes support this

    • We favor commodity hardware

    These are all great features, but we are much more limited in what we can use.
    Management wants to know that we don’t lose transactions, not that we lose only some transactions.

    Desired HA Functionality

    • Zero transaction loss on failures of a master
    • minimal downtime on failures of a master
    • reasonable cost in performance and dollars
    • fast and automatic failover to local or remote server
    • no changes to our programming model
      • does it support MVCC
      • does it support long running transactions (5 mins – populate temp table then use to update another table, changing rows), 5 mins on master, causes 5 mins on slave, causes code to failover from slaves to master

    • replication and reporting are concurrent on a slave

    MVCC must have update concurrent with query.

    Failures happen everywhere
    OS – kernel OOM or panic (older 2.4 32 bit systems)
    mysqld – caused also by code we added
    disk, misdirected write, corrupt write (love innodb checksums)
    file system – inconsistent after unplanned hardware reboot (use ext2)
    server – bad RAM
    lan, switch – lose
    Rack – reboot
    Data center – power loss, overheating, lightning, fire
    People – things get killed or rebooted by mistake ( a typo can take out the wrong server, when names differ by a character or a digit)

    ext2 and 4.0 are great, they are the same generation.
    Trying not to use RAID, not battery backed RAID etc.; we try to work around it with software solutions. We do use RAID 0, but we also try software solutions.
    When we have the right HA solution, we won’t need RAID.

    Mark. “Yes, Google programmers have bugs. Not me personally, it was my predecessor.”

    HA Features we want in MySQL
    Synchronous replication as an option
    a product that watches a master and initiates a failover
    archives of the master binlogs stored elsewhere
    state stored in the filesystem to be consistent after a crash
      • innodb and mysql dictionaries can get out of sync
      • replication state on a slave can get out of sync

    We could not wait
    Features we added to MySQL 4.0.26
    We can do things a lot faster
      • we have more developers lying around
      • our needs are specific, not a general product solution

    Transactional replication for slaves
    semi-synchronous replication
    mirrored binlogs
    fast and automated failover

    Transactional Replication
    Replication state on a slave is stored in files
    slave sql thread commits to storage engines and then updates a file
    a crash between the two can make replication state inconsistent
    transactional replication
    MySQL can solve this in the future by storing replication state in tables

    Semi-synchronous replication
    Block return from commit on a master until at least one slave has acknowledged receipt of
    slave io thread acknowledges receipt after buffering the changes
    modified mysql replication protocol to support acknowledgments
    configuration options
      • whether the master uses it
      • whether a slave uses it
      • how long the master waits for an acknowledgement

    can run a server with some semi-sync replication slaves and some regular replication slaves
    this works with any storage engine that supports commit, but we only use innodb
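
The blocking behaviour can be mimicked with threads. This is a toy model only: the Master and Slave classes, the delays and the timeout values are invented, and what the real master does when the wait times out was not covered in my notes, so commit simply reports whether an acknowledgement arrived.

```python
import threading
import time


class Slave:
    """Toy replica: buffers events and, if semi-sync, acknowledges receipt."""

    def __init__(self, semi_sync, delay_seconds=0.0):
        self.semi_sync = semi_sync
        self.delay_seconds = delay_seconds
        self.relay_log = []

    def receive(self, event, acked):
        time.sleep(self.delay_seconds)   # simulated network latency
        self.relay_log.append(event)     # buffer the change first
        if self.semi_sync:
            acked.set()                  # then acknowledge receipt


class Master:
    def __init__(self, slaves, ack_timeout_seconds):
        self.slaves = slaves
        self.ack_timeout_seconds = ack_timeout_seconds

    def commit(self, event):
        """Send the event to every slave; block until one semi-sync ack or timeout."""
        acked = threading.Event()
        for slave in self.slaves:
            threading.Thread(
                target=slave.receive, args=(event, acked), daemon=True
            ).start()
        return acked.wait(self.ack_timeout_seconds)


mixed = Master(
    [Slave(semi_sync=True, delay_seconds=0.01), Slave(semi_sync=False)],
    ack_timeout_seconds=1.0,
)
async_only = Master([Slave(semi_sync=False)], ack_timeout_seconds=0.05)

acked_mixed = mixed.commit("txn-1")
acked_async = async_only.commit("txn-2")
print(acked_mixed, acked_async)
```

The first master returns as soon as its semi-sync slave has buffered the event. The second has only a regular slave, so nobody acknowledges and the wait runs out.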

    * This is how we guarantee zero transaction loss to management.

    Latency single stream 1ms, multi-stream 10ms. This is acceptable for us.

    The MySQL Replication Protocol

    • The current replication protocol is efficient
    • a slave makes one request

    Replication Acknowledgment

      Slaves register as semi-sync or async at connect time
      prepend flag bytes to all replication events sent to semi-sync clients
      the master sends the flag bytes to request acknowledgment for replication events that represent the end of a transaction
      the slave uses the existing connection for acknowledgments

    Mirrored Binlogs
    mysql does not provide a way to maintain a copy of a master’s binlog on a replica. By copy we mean a file with the same name and byte-for-byte equivalent contents.
    Hierarchical replication works much better when a slave can disconnect from one replication proxy and reconnect to another without adjusting binlog offsets.
    Hot backups taken before a failover are difficult to use after a failover

    Mirrored Binlog Implementation
    Slave IO threads write their own relay log and a copy of the bin log
    all events but the rotate log event are written

    After failover, start a new binlog on new master

    Fast Failover

    Slaves use a hostname, rather than an IP
    You can’t enable the binlog dynamically (in 4.0)
    Added new SQL statements that do the following:
    disconnect users with SUPER privilege
    disable new connections
    enable the bin log
    enable connections from all users

    Automatic failover
    Something must decide that a master has failed
    Something must choose the new master

    Q: What keeps you from moving to 5.0?
    A: Queries don’t parse (Joins)

    Data sets, 8GB servers, 50-100GB’s

    Quote – 26 April 2007

    Friday, April 27th, 2007
    “The web is broken you can all go home now.”

    Rasmus Lerdorf — Father of PHP — MySQL Conference 2007

    Quote – 25 April 2007

    Thursday, April 26th, 2007
    “Don’t complain, do something about it”

    Baron Schwartz – Creator of MySQL Toolkit — MySQL Conference 2007

    MySQL Roadmap

    Thursday, April 26th, 2007

    Here are some notes from the MySQL Server Roadmap session at the MySQL Conference 2007.

    MySQL: Past and Future

    • 2001: 3.23
    • 2003: 4.0 UNION, Query Cache, Embedded
    • 2004: 4.1 Subqueries
    • 2005: 5.0 Stored Procedures, Triggers, Views
    • Now: 5.1.17 Partitioning, Events, Row-based replication
    • 2007?: 6.0 Falcon, Performance, Conflict detection
    • 2008?: 6.1 Online Backup, FK Constraints

    2007 Timeline

    • Q1: 5.1 Beta, 5.1 Telco Production Ready, Monitoring Service 1.1, MySQL 6.0 Alpha, Community GA
    • Q2: MySQL 6.0 Beta, New Connectors GA
    • Q3: 5.1 RC, 6.0 Beta, MS 2.0, Enterprise Dashboard beta
    • Q4: 5.1 GA, 6.0 Beta

    Where are we today?

    • We are by far the most popular open source database
    • The Enterprise world is moving online and MySQL is well-positioned for that trend, But:
      • Transactional scalability
      • Manageability
      • Specific online features

    MySQL Server Vision – The Future

    • Always Online — 24×7, Online backup,online analytics, online schema changes
    • Dynamic Scale-out — online partitioning, add node, replication aides,
    • Reliable — fault-tolerant, easy diagnosis, stable memory, ultimately self-healing
    • High-performance — Interactive web, real-time response, apps, 10,000-100,000 clients
    • Ease of use — Portable, Best for development, multiple connectors, easy tuning
    • Modularity and Ubiquity — Storage engines, plug ins

    How can you help?

    • Bug finding and fixing — Community Quality Contributor
    • Feature/patch contribution
    • But, to expedite your patch

    The goal: “Be the Best Online Database for Modern Applications”

    Quote – 25 April 2007

    Wednesday, April 25th, 2007
    “Whatever advice you got, keep it to yourself, you’re not the target market.”

    Red Hat & One Laptop Per Child UI Designer to bunch of suits – MySQL Conference 2007

    MySQL Conference – For Oracle DBAs and Developers

    Wednesday, April 25th, 2007


    I have just completed my presentation at the MySQL Conference 2007 on MySQL for Oracle DBAs and Developers.

    Not mentioned in my slides, but referenced during the presentation was what I consider the most important page to document from the MySQL Manual — 5.2.1. Option and Variable Reference

    You can download a PDF copy of my presentation here.

    MySQL Conference – Building a Vertical Search Engine in a Day

    Wednesday, April 25th, 2007

    Moving into the user sessions on the first day at MySQL Conference 2007, I attended Building a Vertical Search Engine in a Day.

    Some of my notes for reference.

    Web Crawling 101

    • Injection List – the seed URLs you are starting from
    • Fetching the pages
    • Parsing the content – words and links
    • Updating the crawl DB
    • Whitelist
    • Blacklist
    • Convergence — avoiding the honey pots
    • Index
    • Map-reduce — split a large problem into little pieces, process in parallel, then combine results
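
The map-reduce bullet in miniature, as a Python sketch: count words across pages by mapping each page to partial counts in parallel and then reducing the partials into one result. The pages list is made up, and a thread pool stands in for the cluster; this is not how Hadoop is invoked.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_page(text):
    """Map step: word counts for a single page."""
    return Counter(text.lower().split())


def reduce_counts(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


def word_count(pages, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_page, pages))   # little pieces, in parallel
    return reduce(reduce_counts, partials, Counter())   # combine results


pages = ["MySQL replication", "mysql cluster replication", "falcon"]
counts = word_count(pages)
print(counts)
```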

    Focused content == vertical crawl

    • 20 Billion Pages out there, a lot of junk
    • Breadth-first would take years and cost millions of lives

    OPIC + Term Vectors = Depth-first

    • OPIC is “On-line Page Importance Calculation”. Fixing OPIC Scoring Paper
    • Pure OPIC means “Fetch well-linked pages first”
    • We modify it to “fetch pages about MySQL first”
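
A very loose sketch of a focused crawl frontier: each page carries a score, the highest-scoring page is fetched next, and a fetched page passes its score on to its outlinks scaled by how on-topic its text is. This is only inspired by the bullets above; it is not the OPIC algorithm from the paper and not the Nutch implementation. The link graph, page texts and topic terms are invented.

```python
def relevance(text, topic_terms):
    """Fraction of words on the page that are topic terms."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(1 for word in words if word in topic_terms) / len(words)


def focused_crawl(links, texts, seeds, topic_terms, max_fetches):
    """Fetch the highest-scoring page first; on-topic pages pass on more score."""
    score = {url: 1.0 for url in seeds}
    fetched = []
    while score and len(fetched) < max_fetches:
        url = max(score, key=score.get)
        amount = score.pop(url)
        fetched.append(url)
        outlinks = [t for t in links.get(url, []) if t not in fetched]
        if not outlinks:
            continue
        share = amount * relevance(texts.get(url, ""), topic_terms) / len(outlinks)
        for target in outlinks:
            score[target] = score.get(target, 0.0) + share
    return fetched


# Hypothetical link graph and page texts.
links = {
    "seed": ["mysql_hub", "cooking_hub"],
    "mysql_hub": ["mysql_deep"],
    "cooking_hub": ["cooking_deep"],
}
texts = {
    "seed": "mysql database news",
    "mysql_hub": "mysql replication mysql",
    "cooking_hub": "pasta recipes",
    "mysql_deep": "mysql",
    "cooking_deep": "pasta",
}
order = focused_crawl(links, texts, ["seed"], {"mysql", "replication"}, max_fetches=4)
print(order)
```

With a budget of four fetches, the page reached through the off-topic hub never makes the cut, which is the whole point of a vertical crawl.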

    Nutch & Hadoop are the technologies, running on a 4 server cluster. A sample crawl starting with www.mysql.com ran 23 loops, fetched 150k pages and found 2M URLs.

    Serving up the results