Digital Tech Trek Digest [#Issue 2024.10]

Google advances with vector search in MySQL, leapfrogging Oracle in LLM support

As the title states, GCP is the first managed MySQL service to offer “vector” support. “Vector” is clearly the buzzword of 2024, along with RAG; genAI and LLM are so 2023.

IMO, Oracle should just rename MySQL Heatwave to Heatwave. It would distinguish the product as unique, which it is.

4 Signs You Belong In a Startup Accelerator for SaaS Founders

YouTube

Accelerators help early-stage startups and enable businesses to grow faster, get mentorship and obtain funding/equity, and are generally industry specific. In this example TinySeed focusses on B2B.

Some tips for considering an accelerator.

1. You know your numbers and you're focussing on the right ones
– MRR
– Churn
– LTV or ACV – Lifetime Value, Annual Contract Value
– Beware vanity metrics/candy metrics such as Email Subscribers, Free Trial Users and Unique Visitors. They mean little without context on how they are growing paid users.

2. You are experimenting, especially in marketing and sales
– You should always be tweaking and finding the right market.

3. You're coachable
– You are going to use the resources provided and you are vulnerable (receiving a lot of feedback, lower your defenses)
– You're accountable to groups, e.g. masterminds
– Be open to criticism

4. You are fast and furious
– You're driven and need to be moving forward, never satisfied with the status quo
– You try new things
– Nothing is perfect at the beginning; experimentation is key

The Founder’s Guide to Stealth Startups

EverydaySpy

In this episode of the podcast “The Diary of a CEO”, Steven Bartlett interviews Andrew Bustamante. Andrew is a former covert CIA intelligence officer and US Air Force combat veteran. He is the founder of EverydaySpy, an online education platform that teaches real-world international espionage techniques that can be used in everyday life.

Messaging builds narrative. Don’t use mass marketing via social media; believe in your brand.
Marketing: present a message crafted with an emotion, so that the response shows motivation.

Competition is for Losers with Peter Thiel (How to Start a Startup 2014: 5)

This is a very old presentation that was recently re-shared with me.

Thiel starts off his presentation with the concept of “Avoid competition”.

Creating value is a very simple formula of two things. You create $X of value for the world, and you capture Y% of X, where X and Y are independent variables.

A big piece of a small pie can drastically affect profit margin. Compare all United States airline carriers combined with Google: much smaller, much higher value.
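As a rough worked example of the formula, here is a minimal shell sketch. The function name captured_value and the figures are hypothetical, chosen only to show that X and Y vary independently; they are not numbers from the talk.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of "create $X of value, capture Y% of X".
# Integer arithmetic only; amounts are arbitrary units, not real company data.
captured_value() {
  local created="$1"      # X: total value created
  local capture_pct="$2"  # Y: percentage of X the business keeps
  echo $(( created * capture_pct / 100 ))
}

# A big market with a tiny capture rate versus a small market with a large one.
echo "large X, small Y: $(captured_value 1000 2)"
echo "small X, large Y: $(captured_value 100 30)"
```

In this made-up case the smaller pie captures more, because the share kept matters as much as the size of the pie.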

He goes on to describe effectively two types of business, a competitive business or a monopoly; there is no in-between.

Are you talking about data the WRONG WAY?

Scott Taylor, a colleague who I discovered lived in a neighboring town and whom I could meet in person after attending a virtual conference event, asks a very valid question about the importance of data management. He reiterates the “3Vs” of effective storytelling: Vocabulary, Voice, and Vision. You can discover a lot more information in his book Telling Your Data Story: Data Storytelling for Data Management. The art of Effective Data Storytelling is something I practice daily. It is easy for a data specialist to have the data facts and visualize the data, but the art is being able to drive change by combining data, visualizations, and what I consider the most important component, narrative. I highly recommend Brent Dykes’ book of the same name, and when combined with Be Data Driven by Jordan Morrow you have a cradle of strategy when discussing data management with organizations that want to become data-driven. Becoming data-driven is way more difficult to implement than any plan or strategy you may read and prepare for.

About “Digital Tech Trek Digest”

Most days, I take some time early in the morning to scan my inbox newsletters, the news, LinkedIn, or other sources to read something new about professional and personal topics of interest. I turn what I read into actionable notes in a short, committed time window, summarizing what I learned, what I should learn and use, or what is of random interest. And thus my Digital Tech Trek.

Some of my regular sources include TLDR, Forbes Daily, Thoughtworks Podcasts, Daily Dose of Data Science and BoringCashCow. Also Scientific American Technology, Fareed’s Global Briefing, Software Design: Tidy First? by Kent Beck, Last Week in AWS, Micro Newsletter to name a few.

New Additions to my reading

SaaS Developer Community

TDD for Infrastructure

Test Driven Development (TDD) is an important principle for producing quality software. This is not a new concept. The Extreme Programming (XP) agile methodology (1999) outlined the concept before the acronym became widely accepted: “Another requirement is testability. You must be able to create automated unit and functional tests… You may need to change your system design to be easier to test. Just remember, where there is a will there is a way to test.” Another clear way to describe the hurdles this common sense approach has encountered is “This is opposed to software development that allows code to be added that is not proven to meet requirements.”

Infrastructure setup is still software. All setup should have adequate testing to ensure that at any time (not just during installation or configuration) any system is in a known state. While Configuration Management (CM) works with the goal of convergence, i.e. ensuring a system is in a known state, testing should validate and identify any non-conformance and not attempt to correct it.
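A minimal sketch of validate-only checking in plain shell, assuming a hypothetical helper named check. It reports each result in an ok / not ok style and never attempts a fix; a test framework provides the same idea with far less effort.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a check and report the result TAP-style.
# It only reports non-conformance; it never tries to repair the system.
check_count=0
failures=0

check() {
  local description="$1"
  shift
  check_count=$((check_count + 1))
  if "$@" >/dev/null 2>&1; then
    echo "ok ${check_count} ${description}"
  else
    echo "not ok ${check_count} ${description}"
    failures=$((failures + 1))
  fi
}

check "sh command present" command -v sh
check "example command present" command -v no-such-command-example
echo "${failures} check(s) failed"
```

A provisioning script could end with a non-zero exit status whenever failures is greater than zero, so that a failed validation cannot pass unnoticed.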

The Bash Automated Testing System (BATS) is a well-known testing framework for shell scripts. It is very easy to use.

Good habits come from always doing them. Even for a quick test of a running MySQL server via vagrant for a blog post, the automated installation during setup includes validating a simple infrastructure setup via a bats test.

$ tail install.sh

...
sudo mysql -NBe "SHOW GRANTS"
systemctl status mysqld.service
ps -ef | grep mysqld
pidof mysqld
bats /vagrant/mysql8.bats

Rather than having some output that requires a human to read and interpret each line and make a determination, automate it. A good result is:

$ vagrant up
...
    mysql8: ok 1 bats present
    mysql8: ok 2 rpm present
    mysql8: ok 3 openssl present
    mysql8: ok 4 mysql rpm install
    mysql8: ok 5 mysql server command present
    mysql8: ok 6 mysql client command present
    mysql8: ok 7 mysqld running
    mysql8: ok 8 automated mysql access 

An unsuccessful installation is:

$ vagrant provision
...
    mysql8: not ok 8 automated mysql access
    mysql8: # (in test file /vagrant/mysql8.bats, line 60)
    mysql8: #   `[ "${status}" -eq 0 ]' failed
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.

$ echo $?
1

This amount of very simple testing, and re-execution of testing either via ssh or a re-provision, highlighted a bug in the installation script. Anybody who wishes to identify it, please reach out directly!

...
# Because openssl does not always give you a special character
NEWPASSWD="$(openssl rand -base64 24)+"
mysql -uroot -p${PASSWD} -e "ALTER USER USER() IDENTIFIED BY '${NEWPASSWD}'" --connect-expired-password
# TODO: create mylogin.cnf which is more obfuscated
echo "[mysql]
user=root
password='$NEWPASSWD'" | sudo tee -a /root/.my.cnf
sudo mysql -NBe "SHOW GRANTS"
systemctl status mysqld.service
ps -ef | grep mysqld
pidof mysqld
bats /vagrant/mysql8.bats

A simple trick with a BATS test is to echo any output that will help debug a failing test. If the test succeeds no output is given; if it fails you get the information for free. For example, let's say your test is:

# Note: additional security to both access the server via ssh
#       and accessing sudo should be in place for production systems
@test "automated mysql access" {
  local EXPECTED="${USER}@localhost"
  run sudo mysql -NBe "SELECT USER()"
  [ "${status}" -eq 0 ]
  [ "${output}" = "${EXPECTED}" ]
}

Execution will only provide:

 ✗ automated mysql access
   (in test file /vagrant/mysql8.bats, line 62)
     `[ "${output}" = "${EXPECTED}" ]' failed

What you want to see to more easily identify the problem is:

 ✗ automated mysql access
   (in test file /vagrant/mysql8.bats, line 62)
     `[ "${output}" = "${EXPECTED}" ]' failed
   root@localhost != vagrant@localhost

This echo makes it quicker and easier to correct the failing test.

...
  [ "${status}" -eq 0 ]
  echo "${output} != ${EXPECTED}"
  [ "${output}" = "${EXPECTED}" ]
...

Testing is only as good as the boundary conditions put in place. Here is an example where your code used a number of environment variables and your testing process performed checks that these variables existed.

@test "EXAMPLE_VAR is defined ${EXAMPLE_VAR}" {
  [ -n "${EXAMPLE_VAR}" ]
}

The code was subsequently refactored and the environment variable was removed. Do you remove the test that checks for its existence? No. You should now ensure the variable is not set, so that any code now or in the future acts as desired.

@test "EXAMPLE_VAR is NOT defined" {
  [ -z "${EXAMPLE_VAR}" ]
}

References:
[1] https://en.wikipedia.org/wiki/Test-driven_development
[2] http://www.extremeprogramming.org/when.html
[3] https://github.com/sstephenson/bats
[4] https://github.com/bats-core/bats-core

Understanding OpenStack developer dependencies

While reviewing the OpenStack keystone codebase on an existing VM used with devstack I came across a dependency problem with Python pbr. Python Build Reasonableness (pbr) is actually a result of work on OpenStack. Additional info can be found at OpenStack pbr.

On one server machine I had this package installed. At this time I do not know what process actually installed the pbr package.

$ sudo dpkg -l | grep pbr
ii  python-pbr        0.7.0-0ubuntu2      all          inject useful and sensible default behaviors into setuptools - Python 2.x

This is incompatible with current code from several OpenStack projects, keystone and python-openstackclient being the two I am working with, as seen when reviewing a project's requirements in requirements.txt.

$ grep pbr requirements.txt
pbr>=0.6,!=0.7,<1.0

As seen here, 0.7 is specifically excluded. When updating this machine with the required versions to run the checked out code I ran into the following problem.

$ sudo -H pip install -r requirements.txt

...
  Found existing installation: pbr 0.7.0
    Uninstalling pbr-0.7.0:
...
     OSError: [Errno 13] Permission denied: '/usr/lib/python2.7/dist-packages/pbr/version.py'

This led me to determine I need to run multiple separate VMs: dedicated VMs for devstack installations when I'm testing things, and a dedicated VM for source development. I later determined the best action was to do development on my host machine installing these developer dependencies and always running any deployed versions in VMs.
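Before running pip it can help to check whether an installed version satisfies a specifier such as pbr>=0.6,!=0.7,<1.0. The following is a minimal sketch only: the function names are hypothetical, it relies on GNU sort -V for ordering, and it handles simple numeric release versions rather than the full version specifier rules.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; a rough approximation of the specifier
# pbr>=0.6,!=0.7,<1.0 for plain numeric versions only.

# Drop trailing ".0" segments so that 0.7.0 and 0.7 compare as equal.
normalize_version() {
  local v="$1"
  while [ "${v%.0}" != "$v" ]; do
    v="${v%.0}"
  done
  echo "$v"
}

# True when version $1 >= version $2 (relies on GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# True when the given pbr version satisfies >=0.6,!=0.7,<1.0.
satisfies_pbr_spec() {
  local v
  v="$(normalize_version "$1")"
  version_ge "$v" 0.6 && [ "$v" != "0.7" ] && ! version_ge "$v" 1
}

for candidate in 0.6 0.7.0 0.8.2 1.0; do
  if satisfies_pbr_spec "$candidate"; then
    echo "pbr ${candidate}: acceptable"
  else
    echo "pbr ${candidate}: rejected"
  fi
done
```

With this sketch the packaged 0.7.0 is rejected, which matches the exclusion in requirements.txt.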

Minimum requirements

Using a stock Ubuntu 14.04 LTS server installation I took the time to iteratively check the needed dependencies:

# Git needed to retrieve OpenStack code
sudo apt-get install -y git-core

# Python is installed by default on an Ubuntu Server

# install easy_install
sudo apt-get install python-setuptools

# Install pip - package management system that uses the Python Package Index (PyPI)
sudo easy_install pip

# Install tox - Python automated and standardized testing
sudo -H pip install tox

# Python Developer Libraries
sudo apt-get install -y python-dev

# Openstack developer dependencies
sudo apt-get install -y libffi-dev libssl-dev libldap2-dev libsasl2-dev libxslt1-dev libxml2-dev

With the necessary dependencies met, the following builds a working keystone developer virtual environment.

git clone git://git.openstack.org/openstack/keystone
cd keystone
tox -e py27 --notest

Required Dependencies

Certain projects, such as keystone, do a good job of defining the required OS dependencies.

To validate these requirements, I iterated through each compilation error message to determine the needed package dependency.

  • For missing #include <ffi.h> install libffi-dev
  • For missing #include <openssl/aes.h> install libssl-dev
  • For missing #include "lber.h" install libldap2-dev
  • For missing #include <sasl.h> install libsasl2-dev
  • For missing #include "libxml/xmlversion.h" install libxslt1-dev which requires libxml2-dev
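The mapping above can be sketched as a small lookup, which is handy when repeating the exercise on a fresh machine. The function name package_for_header is hypothetical; the header and package names are taken from the list.

```shell
#!/usr/bin/env bash
# Map a missing C header reported by a compile error to the Ubuntu
# package(s) that provide it. The function name is hypothetical.
package_for_header() {
  case "$1" in
    ffi.h)               echo "libffi-dev" ;;
    openssl/aes.h)       echo "libssl-dev" ;;
    lber.h)              echo "libldap2-dev" ;;
    sasl.h)              echo "libsasl2-dev" ;;
    libxml/xmlversion.h) echo "libxslt1-dev libxml2-dev" ;;
    *)                   echo "unknown header: $1" >&2; return 1 ;;
  esac
}

package_for_header "openssl/aes.h"
```

An unknown header returns a non-zero status, so a new compile error is a prompt to extend the list rather than a silent miss.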

For setting up a development environment libsqlite3-dev was not initially needed. This does not mean it is not needed later for testing purposes.

A testimony to Linux resilience

A client released a new version of their website onto 20 AWS m1.medium instances (the current site at peak load runs approximately 60 m1.medium webservers).
It was clearly an unsuccessful release, but what was surprising was that the systems did not actually crash. It was effectively a meltdown, yet the servers were still operational with load averages > 100. I was impressed with the ability of Linux to still (just) function.

parallel-ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5 -i -h   uptime
 18:01:00 up 18:44,  0 users,  load average: 104.26, 110.03, 113.12
 18:01:00 up 18:56,  1 user,  load average: 62.33, 87.75, 90.40
 18:01:03 up 18:44,  0 users,  load average: 105.28, 115.33, 115.61
 18:01:03 up 18:44,  0 users,  load average: 149.35, 155.74, 133.68
 18:01:03 up 18:51,  0 users,  load average: 124.63, 121.31, 115.91
 18:01:03 up 18:44,  0 users,  load average: 118.99, 109.92, 110.60
 18:01:04 up 18:44,  0 users,  load average: 121.73, 118.40, 113.50
 18:01:04 up 18:44,  0 users,  load average: 113.89, 120.56, 114.64
 18:01:05 up 18:44,  0 users,  load average: 119.30, 119.71, 115.65
 18:01:05 up 18:44,  0 users,  load average: 126.33, 120.33, 119.02
 18:01:05 up 18:44,  0 users,  load average: 117.47, 113.01, 112.84
 18:01:05 up 18:44,  0 users,  load average: 172.21, 158.62, 135.19
 18:01:05 up 18:44,  0 users,  load average: 115.81, 114.96, 116.18
 18:01:05 up 18:44,  0 users,  load average: 122.25, 115.32, 115.27
 18:01:05 up 18:44,  0 users,  load average: 164.13, 168.04, 153.03
 18:01:05 up 18:44,  0 users,  load average: 123.80, 114.94, 110.29
 18:01:06 up 18:44,  0 users,  load average: 173.64, 173.80, 158.76
 18:01:06 up 18:44,  0 users,  load average: 132.52, 140.94, 135.43
 18:01:06 up 18:44,  0 users,  load average: 166.17, 151.68, 135.23
 18:01:06 up 18:44,  0 users,  load average: 170.14, 164.03, 145.31

The AWS m1.medium is a single CPU instance.

$ cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 45
model name	: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
stepping	: 7
cpu MHz		: 1800.000
cache size	: 20480 KB
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu de tsc msr pae cx8 cmov pat clflush mmx fxsr sse sse2 ss ht syscall nx lm up rep_good aperfmperf unfair_spinlock pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm arat epb xsaveopt pln pts dts
bogomips	: 3600.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:
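To put these numbers in context, load average is best read relative to the number of CPUs. A minimal sketch, assuming a hypothetical helper named load_per_cpu that parses a line of uptime output:

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the 1-minute load average from an uptime
# line and divide it by the number of CPUs.
load_per_cpu() {
  local uptime_line="$1"
  local cpus="$2"
  echo "$uptime_line" | awk -v cpus="$cpus" -F'load average: ' '{
    split($2, loads, ", ")
    printf "%.2f\n", loads[1] / cpus
  }'
}

# One of the lines from the output above, on a single-CPU instance.
load_per_cpu " 18:01:06 up 18:44,  0 users,  load average: 173.64, 173.80, 158.76" 1
```

On a live host the CPU count could come from nproc. With a single CPU, every unit of load above 1 represents work waiting for the processor.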

The lack of good Internet access in the US

The state of high speed internet providers in the “Capital of the World” is rather woeful. Being located in Queens, only a few miles from Manhattan, leaves you few choices. There are always plenty of ads, but options like Verizon FiOS are not available.

There is basically a monopoly with Time Warner Cable, and while the service is generally reliable, the falseness of pricing and options is criminal.

First of all, I could not raise an individual via Chat Online. I was forced to call, wait and provide my details, only to be told by an individual that he could not do anything. I was then transferred and had to provide all my details again. That’s the *HUGE* failure in customer service. You already know my phone number and account details, so why do I have to give my phone number, name, address and account number multiple times?

I wanted to save money, but they only wanted me to pay more. In fact, I was offered a package at double what I am paying now. If I stated I wanted to save money, why would I be dumb enough to pay more? In the end I was offered an upgrade at no charge for 6 months, but of course after that my bill will go up another $10 per month. Did I gain anything or was I, the consumer, screwed over?

Why does this service suck so much? In Australia,

The heavy handed LinkedIn approach to your contacts

I recently wanted to add two individuals to my list of professional contacts at LinkedIn. I was extremely disappointed at the modified user interface (UI) experience that made it difficult to do so. In the past, you just entered a list of emails.

Many companies these days pressure you into opening up your entire network of contacts for their benefit of knowing your social graph. This is unacceptable.

You have to go through the following complexity just to send an email request for connection in LinkedIn now.

  • Add Connections
  • Select any email (last button of options)
  • Click Invite by individual email (hidden at bottom of page)

Why SQL_MODE is essential even when not perfect

In response to a recent rant, Why I think SQL_MODE is useless…, I want to counter this statement with why we MUST all use SQL_MODE, even with its inherent flaws.

The fundamental principle of a database is to store and retrieve data. When I insert data into the database and the data I then select is different, this is fundamentally wrong. This is a loss of essential data integrity, something a database should NEVER do.

SQL_MODE solves the problem of “silent truncation” in most instances, and produces an all-important error. As pointed out, SQL_MODE has several limitations; however, the benefits do outweigh the risks. Quality control on source code can reduce the limitations, but no amount of coding can stop the CRUD that comes out of the database without some SQL_MODE settings.

I would ask two more important questions.

  1. How in the first place can such a critical feature of silent data truncation ever be permitted in MySQL? Who made that decision and why?
  2. When will the owner of the MySQL codebase realize this is rather ridiculous and enforce the essential minimal data integrity that can be obtained with options including STRICT_ALL_TABLES, NO_ZERO_DATE, NO_ZERO_IN_DATE and NO_ENGINE_SUBSTITUTION?
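As a minimal sketch, the options named above can be combined into a single statement. The function name sql_mode_statement is hypothetical; the mode names come from the text, and which modes suit a given application remains a judgment call.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join the given mode names with commas and print
# the statement that would enable them for new connections.
sql_mode_statement() {
  local joined
  joined="$(IFS=,; echo "$*")"
  echo "SET GLOBAL sql_mode='${joined}';"
}

sql_mode_statement STRICT_ALL_TABLES NO_ZERO_DATE NO_ZERO_IN_DATE NO_ENGINE_SUBSTITUTION
```

The output could be piped to the mysql client. A SET GLOBAL change is lost on a server restart, so the same value would also belong in the server configuration file.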

Determining consulting rates

It can be hard sometimes, particularly with startups, to determine what to charge. I have tried various models over the years from nothing, to greatly reduced, to full-price. Nothing works well.

As one of the top consultants in MySQL, I kept my rates down as an individual to compete with the 3 or 4 other companies worldwide that provide related services. In the end this hurt my bottom line.

I now charge a premium rate that matches my skills, expertise and competitors. I charge that for all customers, large, small, old and new. When the value of my work in performance tuning, disaster management, scalability and architecture is weighed against a client’s loss of potential or future business, it is not difficult to justify a reasonable rate. I also continue to speak extensively, and to write and publish materials that provide detailed practical knowledge for organizations and individuals that can invest the time, but not the money.

I am still shocked when large established companies want a discount. Just last week, for a few hours’ work, a company wanted 33% off.

An extract from “3 Things Entrepreneurs Should Never Depend On When Starting A Company” provides a great reinforcement of what is appropriate pricing.

Fearful Pricing

When I started my business, I undercharged for my service. I didn’t have the confidence to ask for a decent price, and I thought I had to have the lowest price in order to get business.

What did these practices get me? Low profits and poor cash flow.

In order to survive as a startup—both financially and mentally—it’s crucial that you make sure you’re receiving maximum reward for your maxed out efforts. If you don’t see the true value in your business, how do you expect your clients to do so? Your work is worth it; adjust your prices accordingly.

Read more: Business Insider

Why are we standing still?

I wrote an email a week ago to several close friends titled, “[w]hy are we standing still?” I opened with “[y]ou are all good friends and you are all smart people. We need to work together more … I deal with startups all the time and I rarely find a team of smart articulated people, so why can’t we just do this?”

What was the motivation? I had just read online that startup A has 100 million users and startup B had just raised $42 million. I came to the realization that I am wasting my time trying to develop significantly better tools in my chosen profession because I would never achieve these types of numbers. Those tools would never need the performance and scalability expertise for which I am widely recognized. I am, as CNN Money recently wrote in “Tech companies desperate for ‘rockstarninja’ engineers,” a “Rockstar Ninja” in the tech field.

I work with startups daily and most abuse the technology being used. Technology is not even the problem. In fact, for me and my group of close tech friends, it is almost trivial at times. What is complex, however, is the people and the process combined with the one thing that nobody can escape, time. Ideas are also not the problem, I have plenty of those. I even have several ideas at various stages of actual implementation including VisMarks – Visual Bookmarking and Mooify – The social barometer for moods, thoughts and emotions. These are each at different levels of initial completion. Are they world changers? Probably not, however, they are iterations of the process of what Eric Ries calls The Lean Startup Machine. (Side Note: The next Lean startup machine weekend event is in New York starting on April 1st.)

So back to my band of smart friends, why are we still standing still? Not being content with just talking, I took action and have organized a 24 hour weekend collaboration at my own home for next weekend, code named JFDI/Bliss (there is an interesting story behind the name). I set aside not one project to tackle but three. And by tackle, I mean create, deploy, iterate and even complete an MVP, keeping in mind the technology is not the hard part. The first project is from my good friend Graham, the founder of Ultra Light Startups. We have had many conversations about our respective ideas in the past years, discussing different potential projects. This project is actually referenced by our second project from another friend, John (uBlanket is his current startup), that we discussed just last week over dinner. The third, my own VisMarks project, needs only one technical hurdle solved and some user interface design to complete initial functionality. We will also be reviewing and integrating technology from the Lean Startup Bundle for SXSW for The Lean Startup Challenge.

I can easily provide the food and drink, internet, power, a large finished basement area, a back yard if the weather is good, bedding (it’s a 24 hr event) and even some of my world class beer collection so there are no physical hurdles or expenses to start working together. That is our lean ultralight startup. We have two goals: first, to be able to collaborate in person because thought can be multiplied exponentially when discussing any idea or problem and second, to have fun.

So it all sounds easy, right? Wrong. What’s missing? Just as the capability of being able to code is only a small portion of being a qualified and competent developer, creating the technology is only one portion of a successful startup. Managing Director of TechStars New York, David Tisch (@davetisch) stated at a recent Ultra Light Startups panel discussion some key points that highlighted the resource components of a successful startup. The ideals of this NYC accelerator are mentorship, network and exposure. We need all three. When asked what is the ideal skill set for a team, his response was three people: one business, one technical and one product person. We need those as well. However, when David was asked what is most important when choosing startups for their program, the answer was the people on the team. Important considerations are how you work together, what have you done together as a team, where do you most need help. For the last question that answer is easy. We need help and mentoring in areas including marketing, business development, legal, accounting, PR, promotion, funding, sales, leadership, management, vision, etc. All of these cost money we do not presently have and are necessary to get the traction of millions of users.

The goal should not be just to become rich but to be happy, have a lot of fun and make a difference in the world in the small way we know best, technology.

There are many innovators in our industry; the following are a few I follow very closely. These include Dave McClure (@davemcclure), whom I followed for a long time before we first met at the inaugural Rethink Hawaii event in 2009. Dave’s AARRR startup metrics for pirates approach is something I share with many clients that do not see the critical need of tracking what their leads do from initial acquisition. This is necessary to help answer questions including what is your total cost of acquisition of a paying client and what is the best return on investment.

Eric Ries (@ericries) is a person I have followed now for over a year, reading many great posts including What is a startup?, Four (not five) myths about the Lean Startup and Revisiting the Software Design Manifesto to name a few.

Finally, the Kauffman Foundation (@kauffmanfdn) provides good resources and opportunities including the recent 2011 State of Entrepreneurship Address in Washington DC. Entrepreneurs create new businesses and new businesses are the greatest source of jobs and this creates a better economy. Entrepreneurship is also not just about creating something new, it is also about finding better ways of doing things we presently do. Their work with immigration reform, and involvement with the Startup Visa and Startup America are issues close to my own heart.

I have easily been distracted from my day job and upcoming speaking presentations researching work for our initial kickoff. Excitement is a great motivator to achieving something great.

Your PHP installation appears to be missing the MySQL extension which is required by WordPress.

I recently deployed a new WordPress installation to my existing production webserver running Apache, MySQL and PHP for other websites, yet I was presented with the following message.

“Your PHP installation appears to be missing the MySQL extension which is required by WordPress.”

This thread at wordpress.org did not help me. I was able to solve the problem, but the thread is now marked as closed. That’s poor form because I can’t share the solution I found.

My PHP configuration file did not have the following.

; php.ini
[PHP]
extension=mysql.so

Adding this and restarting Apache did not fix the problem.

The problem was more fundamental and required PHP to be recompiled. Originally PHP was configured with the ‘--with-mysqli’ option. WordPress requires PHP built with ‘--with-mysql’, and it is rather stupid that they have this dependency.

Recompiling PHP and adding the necessary extension were both necessary to get my new WordPress installation operational.
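A quick way to confirm which extensions a PHP build provides is to inspect the module list printed by php -m. A minimal sketch, assuming a hypothetical helper named has_php_module that reads that list on standard input; note that the command line PHP may not be the same build that Apache loads.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a module list (one name per line, as printed
# by `php -m`) on standard input and succeed when the named module is
# present as a whole line, ignoring case.
has_php_module() {
  grep -qix -- "$1"
}

# Example with a captured module list.
# On a real server use: php -m | has_php_module mysql
if printf 'Core\nmysqli\npcre\n' | has_php_module mysql; then
  echo "mysql extension loaded"
else
  echo "mysql extension missing (mysqli alone is not enough here)"
fi
```

Matching the whole line is deliberate, so that mysqli is not mistaken for mysql.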

Eyes Only for Recruiters

Dear Recruiters.

I am always open to hearing about exceptional opportunities that will be a challenging role with hard problems to solve and a great team to work with.

You have been directed to this link because you have contacted me. Please do not consider this as impersonal (after you cut and paste this spiel so many times); it is simply easier to publish my response.

My standard recruiter spiel

This request for additional information will help me determine if I am interested in discussing your opportunity in further detail. Please answer all questions. Please do not be an annoying recruiter like the 2 or 3 a week I have to deal with that refuse to provide details. I’m not going to call you unless I am interested; my time is very valuable. If you cannot provide information via email to determine my interest level I cannot help you, nor am I motivated to share this with my network.

The details
I get contacted 5 times a week by recruiters. In 90% of cases I would be lucky if I am provided with more than one sentence regarding a position, hence why you have received a concise response. If you’re the 10% exception then please also continue reading; you just have less work to do towards getting a meaningful response.

I am not just going to pick up the phone and call you, especially when you provide a single sentence and ask me to call you. My time is extremely valuable, I bill at $250 per hour. If you value my time, then please respect it. To better determine if I am the right person for your inquiry please provide more information including but not limited to:

  • What are the required skills sought?
  • What is the industry involved?
  • What is the remuneration?
  • What is the time frame, i.e. what is “short term” if applicable?
  • What is the existing team infrastructure that I would call peers?
  • What is the associated technology stack?
  • What is the detailed job description?

Simply sending me a full job description is not going to win you points if it does not address these questions.

NOTE: I’m not a junior DBA or even a senior DBA. I’m in the top 1% in the field. I have easily proven my output is 2x-3x that of MySQL DBAs working for major Fortune 500 companies where these resources are effectively holding companies to ransom. If you are approaching me for a DBA role please ensure it is exceptional. Very few organizations even need a full-time DBA. What they require is a skilled resource such as myself to create an infrastructure that leads to a dispensable role, not an indispensable role for a DBA. I can assist organizations in this transition, which includes working with partners for 24×7 DBA support.

You can find my skills and experience on my website at http://ronaldbradford.com and on various pages. If you have further questions then please ask. I do not give out my resume initially and I will never provide this via a Word document.

On a closing note, those that have started this conversation with “connect with me” via LinkedIn, you are one step from the trash or being reported to LinkedIn for spam (i.e. if you have never worked with me, then do not make that claim). I do not accept invitations from people I do not know. I do not accept invitations even from people I do know if I do not want them in my network. I will not open up my professional contacts to recruiters. If you are unable to send a proper email from LinkedIn, it is rather trivial to track me down. There are no points for being lazy.

Successful MySQL Scalability Presentation

Last night I was the invited guest at the SF MySQL Meetup. In my presentation “Successful MySQL Scalability” I talked about a set of principles to ensure appropriate system architecture, data availability and best practices to build an ideal solution for your business. The presentation was also live streamed and is available online.

MySQL South America tour

DISCLAIMER: This post contains no technical MySQL content however it is good news for the MySQL Community.

MySQL content will be included for the first time with the LAOUC (Latin American Oracle Usergroups Council) Oracle tour that is being organized in conjunction with OTN (Oracle Technology Network).

I do not know which MySQL user communities exist in South America; however, if you live in any of the following cities, please feel free to contact me. I am happy to have additional discussions regarding MySQL or help in some way if there is interest in any of the cities.

This seven country tour includes:

  • Oct 12 – Lima, Peru
  • Oct 14 – Santiago, Chile
  • Oct 16 – Montevideo, Uruguay
  • Oct 18 – São Paulo, Brazil
  • Oct 20 – Bogota, Colombia
  • Oct 22 – Quito, Ecuador
  • Oct 25 – San Jose, Costa Rica

More details on the specific locations in each city will be available when finalized.

I would be very happy if anybody wants to translate this to Spanish or Portuguese for readers in South America.



First thoughts of Augen Android Internet Tablet

In the last few days there has been some press about the Augen gentouch 7″ Tablet, a new cheap tablet that is running Android 2.1.

Image from http://android-devices.net

There were a few primary motivations for getting one. The first is price: at $150 I consider it cheap for a small tablet. The second is that it runs Android, something I’m wanting to play more with. I have an original Google G1, and I also purchased one recently for my fiance.

So what are my first impressions?

  1. First, it was difficult to get. It seems it is only available at KMart. My local KMart in New York City didn’t have any. Apparently they sold out very quickly after receiving stock on Wednesday. 10 KMart stores later (via phoning them) I found just 1 more that had even received stock, and they had 1 left. Lucky last. It was only because we had a car on Saturday that it was even possible to get to this store. (-1 for ease of acquisition via company distribution)
  2. KMart claimed the price was $165, but if you checked the online magazine it was $149.99. I had to prove that first, using the web on my phone, to get the price. (-1 for Kmart customer service)
  3. I wanted to buy an additional memory card, as I knew from online reviews this was supported; however the box (which was still unopened before purchase) only mentions Expansion Memory Card Slot, it doesn’t mention the actual size format. Even the manual, read after purchase, talks about a SD card or a TF card, but is not specific. I ended up purchasing a Sandisk Mobile microSDHC 16GB card. Even that packaging was confusing as it clearly states “For Mobile Phones” twice on the package. (-1 for ease of information/packaging)
  4. Turning on the first time was a pleasant surprise because the item was fully charged. (+2 there). With my MiFi I was immediately able to connect to the web while still in the car (as a passenger). (+1 for access to primary use)
  5. The touch screen is clearly not as responsive as an Apple or smart phone. It’s mixed: sometimes it only requires a light touch, sometimes a heavy touch. Early reports mentioned this, so it was not unexpected. Again for the price and proposed uses I have for it, it wasn’t a deal breaker. I saw one online video that included a stylus, and another mention online, however the manual gives no indication, and I don’t seem to have one.
  6. Downloading the 2012 HD trailer via YouTube worked promptly and without issues and the picture was good (+1), but there are no external controls for audio. (-1 for that).
  7. There is a headphone jack, however I found out later that this is a 2.5mm jack, and standard headphones used by everybody on the planet are 3.5mm. (-1 for that). The website claims they will supply people at no cost a set of 2.5mm headphones. Again useless as I don’t want to have to carry those around as well. What I want is a 2.5mm to 3.5mm converter, actually two because I’m sure I’ll lose it easily. (-1 for that)
  8. The main buttons for “back, menu, home” are actually on the back of the device. Once you know that it’s not that bad, but it is a little odd. Of course, that is for a right handed person. If they were on the side it’s possible any type of “death grip” may accidentally press them.
  9. Unit comes with a handy and practical leather case (+1) with adequate access for side controls, however it’s only good for holding not using because the buttons are on the back (-1).
  10. After purchasing a 16GB card at $99, the price is now $250, so it’s not as impressively priced a product.
  11. The purchased SD card was not easy to install. There is no clear instruction on right side up and certainly nothing in the docs, and there are reports online of people misplacing the card in the slot and losing it inside the unit. I didn’t have that problem and I don’t really see that happening for me. The manual indicates you will get an icon about it, however that wasn’t the case so I really didn’t know if it worked or not. (-1) Using the AndExplorer indicates a /sdcard so I assume it’s operational.
  12. There is reference to a U Disk (who knows what that is). With the supplied cable I assume it’s a means of adding a USB thumbdrive to the mini USB slot. Trying this however didn’t seem to work, so I don’t know if that’s the intended use.
  13. The power supply is yet another plug I have to now carry (along with one for laptop, phone, mifi). I was really hoping the power adapter would have been a mini or micro USB. (-1)
  14. The first real use after web, video, google maps, email is to download other stuff, however it seems the Android Market is broken. The official website (which I now can’t find, -1 for poor SEO and google searchability) makes a note this is broken and is expecting a patch (-2 for poor testing there).
  15. The manual was not proof read by an English person. The title cover states “Table” not “Tablet”, however the funniest part was reading some of the manual. I will not type what I found, but you can see the image below. It so made me laugh.
  16. There is no video output, which is really annoying because the side panel actually states HDMI in printing, but nothing physical.
  17. The screen res is 800×480. Ok, so it’s small but of the sites I’ve initially visited the horizontal bar has not been an issue. I suspect it may be in the future, but this is not my primary development machine.
  18. No webcam
  19. I wanted to download some of my ebooks, however this seems to be broken and related to the market problem. Will see after patch.

I’ve yet to really test it out; these are just my first impressions. For the price I consider it a worthwhile investment for the purposes I want it for: some browsing (ideal for the bedroom), reading email or an ebook, and even use as a large digital frame, especially for my photos. I can also see it as a wicked 7″ GPS unit with turn by turn controls for driving, but without GPS it will be a bit manual.

This is a gen 1 product, so you have to accept the shortcomings. You are either an alpha adopter that is willing to accept limitations and accept the benefits it does have or you’re not.

Other References:

  • Augen’s $150 Android tablet hits Kmart circular, coming to stores later this week (what first caught my eye)
  • KMart update
  • First impressions of the Augen GenTouch78 Android tablet (with second mention of a stylus)

Still room at Kaleidoscope for MySQL attendees

Today I received notice that next week’s Velocity conference is at maximum capacity. With just under 2 weeks before the start of ODTUG Kaleidoscope in Washington DC we still have room for late registrations. There are 4 days of MySQL content, free events and also a Sunday Symposium that includes talks on performance and high availability.

Contact any of the MySQL speakers directly and you can receive a special 50% discount code. This is only for MySQL attendees.

If you live in the DC area and only want the FREE option then come along and join us on Monday night for a free session and reception.

ODTUG Kaleidoscope 2010
June 27 – July 1
Marriott Wardman Park Hotel
2660 Woodley Road NW
Washington, District Of Columbia 20008
www.odtugkaleidoscope.com

Conference highlights include

Community Service Day – Saturday, June 26, 8:00 a.m. – 1:00 p.m.
Join ODTUG volunteers and help refurbish a school in D.C.  Under the guidance of Greater DC Cares (GDCC), the leading and largest nonprofit coordinator of volunteerism in the D.C. region, ODTUGgers will: Sort books, beautify school grounds, and paint games on blacktop outside of the school.

There is still time to sign up!  

Four Full-day Symposia – Sunday, June 27, 8:30 a.m. – 4:00 p.m.
Application Express; Oracle EPM and Essbase; Security, Scalability, and Performance; SOA and BPM. One-day registration available.

Welcome Reception/Battle of the Rock Bands – Sunday, June 27, 6:15 – 8:00 p.m.
Meet the exhibitors and compete in the “Battle of the Rock Bands.” Sign up to play.


Opening General Session – Monday, June 28, 8:30 – 10:00 a.m.
Awards for Best Technical Paper and Best 2009 Presentations
Keynote – “Future of the Internet and its Social Impact” by Lee Rainie, Director of the PEW Research Center’s Internet & American Life Project.
Sundown Sessions with Oracle ACE Directors – Monday, June 28, 5:45 – 6:45 p.m.
Reception to meet the Oracle ACE Directors immediately follows – 6:45 – 7:45 p.m.

Special Event – Wednesday, June 30, 6:30 – 10:00 p.m.
Featuring comedian John Heffron, 2nd season champion of the hit TV show, Last Comic Standing.
Music by live cover band, Right Foot Red

MongoDB Experience: Replication 101

After successfully installing and testing mongoDB it’s very easy to create a replication environment.

$ mkdir -p data/{master,slave}
$ mongod --dbpath=`pwd`/data/master --master --port 28011 > master.log 2>&1 &
# Always check your log file
$ cat master.log
$ mongod --dbpath=`pwd`/data/slave --slave --source localhost:28011 --port 28022 > slave.log 2>&1 &
$ cat slave.log

The options are relatively descriptive and straightforward.

  • --dbpath – The directory for data (we set this because we are running master/slave on the same server)
  • --port – Likewise, we are running multiple instances on the same machine
  • --master – I’m the master
  • --slave – I’m a slave
  • --source – For slaves, tell them where the source (i.e. master) is

What I found under the covers was a difference from the single instance version. There is a series of ‘local’ files for the namespace, where in the single instance version there were ‘test’ files.

$ ls -ltR data
total 0
drwxr-xr-x  6 rbradfor  staff  204 Jun 10 10:24 slave
drwxr-xr-x  5 rbradfor  staff  170 Jun 10 10:22 master

data/slave:
total 163848
drwxr-xr-x  2 rbradfor  staff        68 Jun 10 10:24 _tmp
-rw-------  1 rbradfor  staff  67108864 Jun 10 10:24 local.0
-rw-------  1 rbradfor  staff  16777216 Jun 10 10:24 local.ns
-rwxr-xr-x  1 rbradfor  staff         6 Jun 10 10:24 mongod.lock

data/slave/_tmp:

data/master:
total 163848
-rw-------  1 rbradfor  staff  67108864 Jun 10 10:22 local.0
-rw-------  1 rbradfor  staff  16777216 Jun 10 10:22 local.ns
-rwxr-xr-x  1 rbradfor  staff         6 Jun 10 10:22 mongod.lock

A quick replication test.

$ mongo --port 28011
MongoDB shell version: 1.4.3
url: test
connecting to: 127.0.0.1:28011/test
type "help" for help
> db.foo.save({s:"Hello world"});
> db.foo.find();
{ "_id" : ObjectId("4c10f7904a30c35548b0af06"), "s" : "Hello world" }
> exit
bye

$ mongo --port 28022
MongoDB shell version: 1.4.3
url: test
connecting to: 127.0.0.1:28022/test
type "help" for help
> db.foo.find();
{ "_id" : ObjectId("4c10f7904a30c35548b0af06"), "s" : "Hello world" }
> exit

A look now at the underlying data shows a ‘test’ namespace which confirms the lazy instantiation approach. The ‘local’ namespace files are obviously a reflection of the --master/--slave operation.

$ ls -ltR data
total 0
drwxr-xr-x  9 rbradfor  staff  306 Jun 10 10:32 slave
drwxr-xr-x  8 rbradfor  staff  272 Jun 10 10:32 master

data/slave:
total 589832
-rw-------  1 rbradfor  staff  134217728 Jun 10 10:33 test.1
drwxr-xr-x  2 rbradfor  staff         68 Jun 10 10:32 _tmp
-rw-------  1 rbradfor  staff   67108864 Jun 10 10:32 test.0
-rw-------  1 rbradfor  staff   16777216 Jun 10 10:32 test.ns
-rw-------  1 rbradfor  staff   67108864 Jun 10 10:24 local.0
-rw-------  1 rbradfor  staff   16777216 Jun 10 10:24 local.ns
-rwxr-xr-x  1 rbradfor  staff          6 Jun 10 10:24 mongod.lock

data/master:
total 327688
drwxr-xr-x  2 rbradfor  staff        68 Jun 10 10:32 _tmp
-rw-------  1 rbradfor  staff  67108864 Jun 10 10:32 test.0
-rw-------  1 rbradfor  staff  16777216 Jun 10 10:32 test.ns
-rw-------  1 rbradfor  staff  67108864 Jun 10 10:22 local.0
-rw-------  1 rbradfor  staff  16777216 Jun 10 10:22 local.ns
-rwxr-xr-x  1 rbradfor  staff         6 Jun 10 10:22 mongod.lock

There appears to be no read-only default state for a slave. I was able to add new data to the slave.

$ mongo --port 28022
MongoDB shell version: 1.4.3
url: test
connecting to: 127.0.0.1:28022/test
type "help" for help
> db.foo.save({s:"Hello New York"});
> db.foo.find();
{ "_id" : ObjectId("4c10f7904a30c35548b0af06"), "s" : "Hello world" }
{ "_id" : ObjectId("4c10f864d8e80f1a1ad305cf"), "s" : "Hello New York" }
>

A closer look at this ‘local’ namespace and a check via the docs gives us details of the slave configuration.

$ mongo --port 28022
MongoDB shell version: 1.4.3
url: test
connecting to: 127.0.0.1:28022/test
type "help" for help
> show dbs;
admin
local
test
> use local;
switched to db local
> show collections;
oplog.$main
pair.sync
sources
system.indexes
> db.sources.find();
{ "_id" : ObjectId("4c10f5b633308f7c3d7afc45"), "host" : "localhost:28011", "source" : "main", "syncedTo" : { "t" : 1276180895000, "i" : 1 }, "localLogTs" : { "t" : 1276180898000, "i" : 1 } }
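The `syncedTo` and `localLogTs` values above are easier to read as dates. The following is a minimal sketch, assuming the `t` field holds milliseconds since the Unix epoch. That is an assumption based on the magnitude of the values and the date of this test, not something the output itself confirms. The helper name `ms_to_utc` is made up for illustration.

```python
from datetime import datetime, timezone

def ms_to_utc(ms):
    """Convert a millisecond epoch value to a timezone-aware UTC datetime.

    Assumes the value is milliseconds since 1970-01-01 UTC.
    """
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

# Values copied from the db.sources.find() output above
synced_to = ms_to_utc(1276180895000)
local_log_ts = ms_to_utc(1276180898000)

print(synced_to.isoformat())
print(local_log_ts.isoformat())
print((local_log_ts - synced_to).total_seconds(), "seconds apart")
```

Under that assumption both values fall on the morning of June 10 2010 in EDT, which matches the timestamps on the data files.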

You can also connect directly to a database with the mongo client via the command line.

$ mongo localhost:28022/local
MongoDB shell version: 1.4.3
url: localhost:28022/local
connecting to: localhost:28022/local
type "help" for help
> db.sources.find();
{ "_id" : ObjectId("4c10f5b633308f7c3d7afc45"), "host" : "localhost:28011", "source" : "main", "syncedTo" : { "t" : 1276180775000, "i" : 1 }, "localLogTs" : { "t" : 1276180778000, "i" : 1 } }
> exit
bye

The shell gives 3 convenience commands for showing replication state.

On the Slave

$ mongo --port 28022
> db.getReplicationInfo();
{
	"logSizeMB" : 50,
	"timeDiff" : 1444,
	"timeDiffHours" : 0.4,
	"tFirst" : "Thu Jun 10 2010 10:24:54 GMT-0400 (EDT)",
	"tLast" : "Thu Jun 10 2010 10:48:58 GMT-0400 (EDT)",
	"now" : "Thu Jun 10 2010 10:48:59 GMT-0400 (EDT)"
}
> db.printReplicationInfo();
configured oplog size:   50MB
log length start to end: 1444secs (0.4hrs)
oplog first event time:  Thu Jun 10 2010 10:24:54 GMT-0400 (EDT)
oplog last event time:   Thu Jun 10 2010 10:48:58 GMT-0400 (EDT)
now:                     Thu Jun 10 2010 10:49:07 GMT-0400 (EDT)
> db.printSlaveReplicationInfo();
source:   localhost:28011
syncedTo: Thu Jun 10 2010 10:49:25 GMT-0400 (EDT)
          = 1secs ago (0hrs)

On the master, the same commands are applicable and the output is basically the same.

$ mongo --port 28011
> db.getReplicationInfo();
{
	"logSizeMB" : 50,
	"timeDiff" : 1714,
	"timeDiffHours" : 0.48,
	"tFirst" : "Thu Jun 10 2010 10:22:01 GMT-0400 (EDT)",
	"tLast" : "Thu Jun 10 2010 10:50:35 GMT-0400 (EDT)",
	"now" : "Thu Jun 10 2010 10:50:40 GMT-0400 (EDT)"
}
> db.printReplicationInfo();
configured oplog size:   50MB
log length start to end: 1714secs (0.48hrs)
oplog first event time:  Thu Jun 10 2010 10:22:01 GMT-0400 (EDT)
oplog last event time:   Thu Jun 10 2010 10:50:35 GMT-0400 (EDT)
now:                     Thu Jun 10 2010 10:50:45 GMT-0400 (EDT)
> db.printSlaveReplicationInfo();
local.sources is empty; is this db a --slave?
>

From these commands there seems no obvious way to easily identify if an instance is a master or not.
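The `timeDiff` and `timeDiffHours` figures reported above can be reproduced from the first and last oplog event times. This is a small sketch of that arithmetic only, not of how the shell computes it internally. The function name `oplog_window` is hypothetical, and the timezone suffix is dropped because both timestamps are in the same zone.

```python
from datetime import datetime

# Matches strings such as "Thu Jun 10 2010 10:24:54" (timezone suffix removed)
FMT = "%a %b %d %Y %H:%M:%S"

def oplog_window(t_first, t_last):
    """Return (seconds, hours rounded to 2 places) between two event times."""
    first = datetime.strptime(t_first, FMT)
    last = datetime.strptime(t_last, FMT)
    secs = int((last - first).total_seconds())
    return secs, round(secs / 3600, 2)

# tFirst / tLast values copied from the slave and master output above
slave = oplog_window("Thu Jun 10 2010 10:24:54", "Thu Jun 10 2010 10:48:58")
master = oplog_window("Thu Jun 10 2010 10:22:01", "Thu Jun 10 2010 10:50:35")

print("slave: ", slave)
print("master:", master)
```

The results agree with the 1444secs (0.4hrs) and 1714secs (0.48hrs) values printed by `db.printReplicationInfo()`.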

References

DBA operations from shell
Replication
Master/Slave Replication

Free MySQL Event in Washington DC

As the program chair for the recently announced MySQL Track at the ODTUG Kaleidoscope conference in Washington DC, I can share that, in addition to a dedicated track for 4 days, we are also looking into an associated free community event for MySQL locals.

Please let us know your name and email via the form at http://ronaldbradford.com/ODTUG/free-event/ so we can provide more details in the coming week as we try to finalize logistics.

Registration will be necessary for attendance however for now we just want to know who is local so we can provide more details soon!

Updated. Full details of the free Monday night sundown sessions and reception can be found at MySQL track with free event at Kaleidoscope 2010

State of the Dolphin – Opening keynote

Edward Screven – Chief Corporate Architect of Oracle provided the opening keynote at the 2010 MySQL Users Conference.

Overall I was disappointed. The first half was more an Oracle Sales pitch, we had some product announcements, we had some 5.5 performance buzz. While a few numbers and features were indeed great to hear, there was a clear lack of information to the MySQL ecosystem including employees, alumni and various support services. I hope more is unveiled this week.

Some notes of the session.

  • Oracle’s Strategy covers storage, servers, virtual machines, operating system, database, middleware, applications
  • We build a complete technology stack that is “open” and “integrated” based on “open standards”
  • products talk via open standards with the intention for customers to not feel locked in to any technology
  • Examples include apache, java, linux, xen, eclipse, and innodb
  • Unbreakable linux has now over 4,500 customers

After the sales pitch we got down to more about MySQL.

What MySQL means to Oracle? We make the Oracle solution more complete as a stack for customers.

What is the investment in MySQL?

  • Make MySQL a better MySQL
  • Develop, promote and support MySQL
  • MySQL community edition

Integration with Oracle Enterprise Manager, Oracle Secure Backup and Oracle Audit Vault infrastructure. This I expected and have blogged about, so I’m glad to see this commitment.

MySQL 5.5 is now in Alpha, some features are

  • InnoDB will be default engine
  • Semi sync replication
  • Replication heartbeat
  • Signal
  • Performance Schema

MySQL 5.5 is planned to be faster, with InnoDB Performance Improvements & MySQL Performance Improvements.
MySQL 5.5 sysbench claims, read 200% faster, write 364% faster.

MySQL Workbench 5.2 announcement

  • SQL Development
  • Database Administration
  • Data Modelling

MySQL Cluster 7.1 GA announcement

  • Improved Administration
  • Higher Performance
  • Carrier Grade Availability & Performance

MySQL Enterprise Backup announcement

  • Online backup for InnoDB only
  • Formerly InnoDB Hot Backup, with additional features including incremental backups

MySQL Enterprise Monitor 2.2 Beta announcement

In closing the statement was “MySQL lets Oracle be more complete at the database layer”. Is that good for the MySQL Community or better for the Oracle revenue model?

New linux desktop configuration

My purchase yesterday was a HP Pavilion p6340f Desktop PC with the following specs.

  • Intel Core 2 Quad Q8400 2.66GHz Processor
  • 4MB L2 Cache, 1333MHz FSB
  • 8GB PC3-8500 DDR3 SDRAM (4 x 2GB)
  • 1TB Serial ATA Hard Drive
  • Intel Graphics Media Accelerator X4500 with 32MB Integrated shared graphics memory
  • Lightscribe SuperMulti DVD±R/RW with Double Layer
  • 10/100/1000 Base-T Network interface
  • Wireless LAN 802.11 a/b/g/n

The purchase price was $749+tax, which was more than B&H at $699, but as they are not open Friday nights, B&H it’s your loss. There is also a P6320 model with AMD Phenom II X4 820 2.80GHz processor and NVIDIA GeForce 9100 Graphics for the same price; it was a tough decision.

I’m not thrilled with the HP part, having not enjoyed experiences with HP servers and Compaq desktops, however time will tell.

Upgrading my Google G1 dev phone to Android 1.6

To update my Google G1 phone (an Android developer unlocked phone) to Android 1.6 (Donut), I did the following.

  • Download and unpack the Android SDK for Mac OS X from http://developer.android.com/sdk/index.html
  • Download the Android 1.6 Radio and System Images from http://developer.htc.com/adp.html
  • Reboot phone with USB connected
  • Update the Device Radio Firmware
    • Confirm devices with $ adb devices. This step drove me crazy because it would list no devices. It ended up being a faulty (and new) USB cable. When your phone is connected to USB, it will give you a notification and a USB icon in the phone’s top menu.
    • Copy Radio image
    • Reboot in recovery mode and follow instructions
  • Download the fastboot for Mac OS X at http://developer.htc.com/adp.html
  • Flash the System Image Package to the Device as per instructions

The instructions say to reboot, but in my case it rebooted automatically after the fastboot update.

The problem after reboot was that I was unable to sign in to Google servers the first time. Following “G1 Dev Phone won’t connect to Google servers with valid SIM card”, I added the necessary AT&T/Cingular APN via details at http://modmyi.com/wiki/index.php/Carrier_APN_Settings.

I could then go Settings | Data synchronization and continue the Google registration process.

Getting started with Cassandra

Motivated by today’s public news of Twitter’s move from MySQL to Cassandra, by my own desire to learn Cassandra following in-depth discussions at last November’s Open SQL Camp, and by yesterday’s discussion with a new client on persistent key-value store products, today I downloaded, installed and configured Cassandra for the first time. Today’s news was not unexpected: if you follow the Twitter Engineering Open Source projects you would have seen Cassandra, as well as other products, being used or evaluated by Twitter.

So I went from nothing to a working Cassandra node in under 5 minutes. This is what I did.

  1. While I knew this was an Apache project, a Google search yields as the 3rd link The Apache Cassandra Project at http://incubator.apache.org/cassandra/. Congrats to Cassandra, now a top level Apache Project. This URL will update soon.
  2. Download Cassandra. Hard to miss with a big green button on home page. Current version is 0.5
  3. I read Getting Started, which is the 3rd top level link on menu after Home and Download. Step 1 is picking a version which I’ve already done, Step 2 is Running a single node.
  4. The Getting Started indicated a problem on Mac OS X for the required minimum Java version. I was installing on Mac OS X 10.5 and CentOS 5.4. I’ve experienced this Java 6 default path issue before. Set my JAVA_HOME and PATH accordingly (after I updated the wiki with correct value)
  5. I extracted the tar file, changed to the directory and took a look at the README.txt file. Yes, I always check this first with any software, and it was relevant here because it includes valuable instructions on creating the default data and log directories.
  6. Start with bin/cassandra -f. No problems!
  7. I then followed the instructions from the link in Step 2 with the CassandraCli. This tests and confirms the installation is operational.

Ok, a working environment. I’ve now installed on a second machine and tested however I now need to configure the cluster, and the documentation is not as straightforward. Time to try out Google again.

On a side note, this is one reason why I love Open Source. I followed the instructions online and found a mistake in the Mac OS X path, I simply registered and corrected providing the benefit of my experience for the next reader(s).


What's your MySQL version?

I’ve heard that the mechanic’s wife always has a car that needs repair or tuneup, the painter’s wife always has walls of peeling paint, you get the picture. What about MySQL DBAs and their own databases? While I have many versions of MySQL for testing, including for example the latest 5.1.44 which I was using for my previous post, what is running on my production server? Let’s see:

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.1.25-rc |
+-----------+

That’s really old. And yes, to prove my point that we can be our own worst enemy, the previous version before 5.1.25 was 5.1.6. Yes, .6, which worked just fine and never crashed once for my 20+ websites. While I have downloaded onto my production server several versions ready for upgrade, including versions 5.1.30, 5.1.38, and 5.4.1, I’ve never actually gone through the upgrade process.

Migrating MySQL latin1 to utf8 – Character Set Options

Continuing on from preparation in our MySQL latin1 to utf8 migration let us first understand where MySQL uses character sets. MySQL defines the character set at 4 different levels for the structure of data.

  • Instance
  • Schema
  • Table
  • Column

In MySQL 5.1, the default character set is latin1. If not specified, this is what you will get. For example.

mysql> create table test1(c1 varchar(10) not null);
mysql> show create table test1\G
Create Table: CREATE TABLE `test1` (
  `c1` varchar(10) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

If you want all tables in your instance to always default to utf8, you can change the server variable character_set_server. This can be set dynamically.

mysql> set global character_set_server=utf8;
mysql> set session character_set_server=utf8;
mysql> create table test2(c1 varchar(10) not null);
mysql> show create table test2\G
Create Table: CREATE TABLE `test2` (
  `c1` varchar(10) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

If you change this dynamically be sure to include the option in your my.cnf to ensure this option is persisted for a mysqld restart.

You can define the default character set for all new tables in a given schema. You specify this when you create the schema.

mysql> set global character_set_server=latin1;
mysql> set session character_set_server=latin1;
mysql> create schema test_ucs2 default character set ucs2;
mysql> use test_ucs2;
mysql> create table test3(c1 varchar(10) not null);
mysql> show create table test3\G
Create Table: CREATE TABLE `test3` (
  `c1` varchar(10) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=ucs2

Even though we have a schema default, you can always specify the default character set for a given table which overrides any defaults.

mysql> use test_ucs2;
mysql> create table test4_utf8 (c varchar(10) not null) default charset utf8;
mysql> show create table test4_utf8\G
Create Table: CREATE TABLE `test4_utf8` (
  `c` varchar(10) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

And finally, if you really wanted to be specific you can define the character set on a per column level.

mysql> create table test4_utf8_latin1 (c varchar(10) not null, c2 varchar(20) charset latin1) default charset utf8;
mysql> show create table test4_utf8_latin1\G
*************************** 1. row ***************************
       Table: test4_utf8_latin1
Create Table: CREATE TABLE `test4_utf8_latin1` (
  `c` varchar(10) NOT NULL,
  `c2` varchar(20) CHARACTER SET latin1 DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

With great flexibility comes great responsibility. You should have a defined standard for your application that is simple and easy to understand. I am not a proponent of using utf8 for everything, and the primary reason is memory. As part of my consulting I spend a lot of time with clients that have limited resources, e.g. database servers with 2GB or 4GB of RAM. MySQL stores utf8 efficiently on disk, but when this data is held in memory for internal usage, it automatically uses 3 bytes per character, even when on disk it may only be 1 byte. You can test this by creating a MEMORY table with latin1 and utf8 examples and comparing the difference in size. Is this a serious problem? Well, that depends on many factors such as the number of database connections, persistent or non-persistent connections, the size of the results etc. While it’s difficult in MySQL to instrument the memory precisely on a per connection basis, prudence should be a consideration for any physical resources, especially RAM.
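To put rough numbers on the memory point, here is a back-of-the-envelope sketch. It assumes every value is padded to its full declared width at the maximum bytes per character, and it ignores per-row and index overhead, so treat the result as an illustration rather than a measurement. The function name and the one million row count are made up for the example.

```python
# Maximum bytes per character for the two character sets discussed here
BYTES_PER_CHAR = {"latin1": 1, "utf8": 3}

def fixed_width_bytes(char_length, charset, rows):
    """Worst-case bytes reserved when every value is padded to full width.

    Ignores row headers, length bytes and index overhead (an assumption
    made to keep the comparison simple).
    """
    return char_length * BYTES_PER_CHAR[charset] * rows

# Hypothetical table: one varchar(100) column, one million rows
for charset in ("latin1", "utf8"):
    total = fixed_width_bytes(100, charset, 1_000_000)
    print(f"{charset}: {total / 1024 / 1024:.1f} MB")
```

Whatever the actual overheads are, the column data alone is three times larger for utf8 under these assumptions, which is why it matters on a server with 2GB of RAM.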

Now that we understand what’s possible, how can we change our existing latin1 tables in our preparation example?

We could try a simple ALTER TABLE command.

mysql> alter table test_latin1 default charset utf8;
mysql> show create table test_latin1\G
Create Table: CREATE TABLE `test_latin1` (
  `c` varchar(100) CHARACTER SET latin1 NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

This does not work, because we are only changing the default character set of the table. The underlying columns remain the same. If we were to add a new column, it would default to utf8. We can however achieve what we expected with the CONVERT option.

mysql> alter table test_latin1 convert to character set utf8;
mysql> show create table test_latin1\G
Create Table: CREATE TABLE `test_latin1` (
  `c` varchar(100) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8

We look at our data, and it looks great? Are we done with our conversion?

mysql> select * from test_latin1;
+---------------+
| c             |
+---------------+
| a             |
| abc           |
| ☺           |
| abc ☺☹☻ |
+---------------+

The answer is no. While it may look like the data is correct, MySQL also manages character sets for the communication channel. In this case, we are still communicating in latin1. Moving forward we must always communicate in utf8 to ensure we correctly pass utf8 to the database. We can test this with the mysql client, and as you will see our data is still corrupt.

mysql> set names utf8;
mysql> select * from test_latin1;
+------------------------+
| c                      |
+------------------------+
| a                      |
| abc                    |
| ☺                 |
| abc ☺☹☻ |
+------------------------+

mysql> show session variables like 'character%';
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value                                                          |
+--------------------------+----------------------------------------------------------------+
| character_set_client     | utf8                                                           |
| character_set_connection | utf8                                                           |
| character_set_database   | latin1                                                         |
| character_set_filesystem | binary                                                         |
| character_set_results    | utf8                                                           |
| character_set_server     | latin1                                                         |
| character_set_system     | utf8                                                           |
+--------------------------+----------------------------------------------------------------+
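One way to picture why converted data can still be wrong is double encoding. The sketch below assumes the original utf8 bytes were stored through a latin1 connection, so each byte was treated as one latin1 character, and that a later conversion to utf8 then encoded each of those characters again. This is an illustration of the mechanism, not a trace of what MySQL did to the table above. Python's `latin-1` codec is ISO-8859-1, while MySQL's latin1 is closer to cp1252, so individual garbled characters may differ.

```python
original = "abc ☺"

# What the application actually sent: utf8 bytes
utf8_bytes = original.encode("utf-8")

# A latin1 connection treats each byte as one single-byte character
misread = utf8_bytes.decode("latin-1")

# Converting that "latin1" text to utf8 encodes every byte-character again
double_encoded = misread.encode("utf-8")

print(len(original), "characters")
print(len(utf8_bytes), "bytes as correct utf8")
print(len(double_encoded), "bytes after double encoding")

# Undoing the mistake: decode utf8, map back to raw bytes, decode utf8 again
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired == original)
```

The round trip at the end only works when every intermediate character survives unchanged, which is one reason a data migration needs more care than a schema change.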

While you can see how we could migrate the schema definition, this does not complete our migration. In my next post, I will discuss the various different ways to correctly perform a data migration between latin1 and utf8.

Checked your MySQL recovery process recently?

I sound like a broken record when I talk to every client about the resilience of their production environments. It’s very simple in theory, however in practice many organizations fail.

Ask yourself these checklist questions for your MySQL backup and recovery process:

  1. Do you have MySQL backups in place?
  2. Do you backup ALL your MySQL data?
  3. Do you have consistent MySQL backups?
  4. Do you have backups that include both static snapshot and point in time transactions?
  5. Do you review your backup logs EVERY SINGLE day or have tested backup monitoring in place?
  6. Do you perform a test recovery of your static backup?
  7. Do you perform a test recovery to point in time?
  8. Do you time your backup and recovery process and review over time?
  9. Do you have off-site copies of your backups?
  10. Do you backup your primary binary logs?

In the past month I’ve discovered clients that have an online only business (i.e. their MySQL data is their only tangible asset) that perform daily backups but don’t have binary logging enabled. I’ve also discovered an example of backup logs not being checked, where an underlying mysqldump error was resulting in an incomplete backup, yet the backup script apparently completed successfully.
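A simple guard against the silent failure described above is to check the dump file itself, not just the backup script's exit status. The following is a minimal sketch that assumes the dump was produced by mysqldump with comments enabled, in which case the file ends with a `-- Dump completed` line. The function name and the sample file contents are made up for illustration, and this check complements rather than replaces a real test recovery.

```python
import os
import tempfile

def dump_looks_complete(path, marker=b"-- Dump completed"):
    """Return True if the last non-empty line of the file starts with marker.

    Only the final 4KB is read, so this is cheap even for large dumps.
    """
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        size = fh.tell()
        fh.seek(max(0, size - 4096))
        tail = fh.read()
    lines = [line for line in tail.splitlines() if line.strip()]
    return bool(lines) and lines[-1].startswith(marker)

# Demonstration with two stand-in files (not real mysqldump output)
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.sql")
    bad = os.path.join(tmp, "bad.sql")
    with open(good, "wb") as fh:
        fh.write(b"CREATE TABLE t (id INT);\n-- Dump completed on 2010-03-01 10:00:00\n")
    with open(bad, "wb") as fh:
        # Simulates a dump that stopped part way through a statement
        fh.write(b"CREATE TABLE t (id INT);\nINSERT INTO t VAL")
    results = (dump_looks_complete(good), dump_looks_complete(bad))

print(results)
```

A backup script could call a check like this after the dump and exit non-zero when it fails, so that monitoring notices the problem the same day.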

Disaster is inevitable.

If you don’t score 8 or better in the above checklist in your business, you are at higher risk. If you are an owner/founder/executive, this should keep you awake at night if you’re not sure of your business viability. If your organization needs help, please contact me for assistance.

  • Can you recover from a small or large disaster?
  • Do you have confidence in your DR plan?
  • Do you know how long your DR plan will take?
  • What does your online business look like or operate during your DR time?

Migrating MySQL latin1 to utf8 – Preparation

This article is Part 1 of a series of articles regarding MySQL character set conversion.
Be sure to also check out character set options and the process for more information.

Before undertaking such a migration, the first step is to understand more about how latin1 and utf8 work and interact in MySQL. latin1 is a common and historical character set used in MySQL. utf8 (first available in MySQL Version 4.1) is an encoding supporting multiple bytes and is the system default in MySQL 5.0.

  • latin1 is a single byte character set.
  • utf8 is a 1-3 byte character set depending on the size of the character. NOTE: MySQL utf8 does not support the RFC 3629 4 byte sequences. (Updated: MySQL 5.5 now provides full Unicode support with the “utf8mb4” charset.)
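To make the byte widths concrete, here is a small Python sketch (not MySQL itself) that reports how many bytes a character needs in UTF-8 and whether a string stays within the 3 byte limit described above. The helper names utf8_width and fits_mysql_utf8 are hypothetical.

```python
def utf8_width(ch):
    """Number of bytes needed to encode a single character in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_mysql_utf8(text):
    """True if every character needs at most 3 bytes in UTF-8."""
    return all(utf8_width(ch) <= 3 for ch in text)


if __name__ == "__main__":
    # ASCII letter, accented Latin letter, the smiley used in this article,
    # and an emoji outside the Basic Multilingual Plane.
    for ch in ["a", "\u00e9", "\u263a", "\U0001F600"]:
        print(f"U+{ord(ch):04X} needs {utf8_width(ch)} byte(s) in UTF-8")

    print(fits_mysql_utf8("abc \u263a"))     # only 1-3 byte characters
    print(fits_mysql_utf8("\U0001F600"))     # needs a 4 byte sequence
```

A check of this kind can flag data that would need the 4 byte capable charset before a migration is attempted.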

MySQL variables

MySQL has a number of different system variables to consider. The following shows the default values in MySQL 5.1.

mysql> show global variables like '%char%';
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value                                                          |
+--------------------------+----------------------------------------------------------------+
| character_set_client     | latin1                                                         |
| character_set_connection | latin1                                                         |
| character_set_database   | latin1                                                         |
| character_set_filesystem | binary                                                         |
| character_set_results    | latin1                                                         |
| character_set_server     | latin1                                                         |
| character_set_system     | utf8                                                           |
+--------------------------+----------------------------------------------------------------+

MySQL enables you to specify the character set for data at the Instance, Schema, Table and Column level. This is important because this complexity, especially between the Table and Column levels, can catch you out.

It is also important to ensure that not only is data stored in the appropriate format, the communication between client and server also needs to support the character set you wish to use.

latin1 example

We start by creating a simple table, inserting some data, and reviewing the data.

mysql> create table test_latin1(c varchar(100) not null) default charset latin1;
mysql> insert into test_latin1(c) values ('a'),('abc'),('☺'),('abc ☺☹☻');

mysql> select c,length(c),char_length(c),charset(c),hex(c) from test_latin1;
+---------------+-----------+----------------+------------+----------------------------+
| c             | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------------+-----------+----------------+------------+----------------------------+
| a             |         1 |              1 | latin1     | 61                         |
| abc           |         3 |              3 | latin1     | 616263                     |
| ☺             |         3 |              3 | latin1     | E298BA                     |
| abc ☺☹☻       |        13 |             13 | latin1     | 61626320E298BAE298B9E298BB |
+---------------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)

We use the LENGTH() and CHAR_LENGTH() functions to compare the length of each string in bytes with its number of characters.

In the above examples, the smiley characters can be reproduced in a web page with the following.

&#9786;&#9785;&#9787;

utf8 example

In a separate session (because we re-use these later) we repeat for utf8.

mysql> set  names utf8;
mysql> show session variables like '%char%';
+--------------------------+----------------------------------------------------------------+
| Variable_name            | Value                                                          |
+--------------------------+----------------------------------------------------------------+
| character_set_client     | utf8                                                           |
| character_set_connection | utf8                                                           |
| character_set_database   | latin1                                                         |
| character_set_filesystem | binary                                                         |
| character_set_results    | utf8                                                           |
| character_set_server     | latin1                                                         |
| character_set_system     | utf8                                                           |
+--------------------------+----------------------------------------------------------------+

mysql> create table test_utf8(c varchar(100) not null) default charset utf8;
mysql> insert into test_utf8(c) values ('a'),('abc'),('☺'),('abc ☺☹☻');

mysql> select c,length(c),char_length(c),charset(c), hex(c) from test_utf8;
+---------------+-----------+----------------+------------+----------------------------+
| c             | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------------+-----------+----------------+------------+----------------------------+
| a             |         1 |              1 | utf8       | 61                         |
| abc           |         3 |              3 | utf8       | 616263                     |
| ☺            |         3 |              1 | utf8       | E298BA                     |
| abc ☺☹☻       |        13 |              7 | utf8       | 61626320E298BAE298B9E298BB |
+---------------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)

As you can see, the key difference here is that the CHAR_LENGTH() of the utf8 strings differs from the LENGTH().
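The two result sets can be reproduced outside MySQL with a short Python sketch. It assumes, consistent with the HEX() output in both tables, that the client sent the same UTF-8 bytes in both sessions and that only the interpretation of those bytes differs. The describe function and its dictionary keys are hypothetical names.

```python
# The four values inserted into both test tables.
ROWS = ["a", "abc", "\u263a", "abc \u263a\u2639\u263b"]


def describe(text):
    """Mimic LENGTH(), CHAR_LENGTH() and HEX() for one inserted value."""
    raw = text.encode("utf-8")  # the bytes the client actually sent
    return {
        "hex": raw.hex().upper(),
        "length": len(raw),  # LENGTH(): always counts bytes
        # latin1 column: every byte is treated as one character
        "char_length_latin1": len(raw.decode("latin-1")),
        # utf8 column: multi-byte sequences count as one character
        "char_length_utf8": len(raw.decode("utf-8")),
    }


if __name__ == "__main__":
    for row in ROWS:
        print(describe(row))
```

For the last row this gives 13 bytes, 13 characters when read as latin1 and 7 characters when read as utf8, matching the two tables above.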

Mixing latin1 with utf8

If we now look at the latin1 data in the utf8 session we see that while the underlying data via LENGTH(), CHAR_LENGTH() and HEX() remains unchanged (because this is the database representation of the data), the actual output presentation is garbled due to the mismatch in the client communication.

mysql> select c,length(c),char_length(c),charset(c), hex(c) from test_latin1;
+------------------------+-----------+----------------+------------+----------------------------+
| c                      | length(c) | char_length(c) | charset(c) | hex(c)                     |
+------------------------+-----------+----------------+------------+----------------------------+
| a                      |         1 |              1 | latin1     | 61                         |
| abc                    |         3 |              3 | latin1     | 616263                     |
| ☺                    |         3 |              3 | latin1     | E298BA                     |
| abc ☺☹☻          |        13 |             13 | latin1     | 61626320E298BAE298B9E298BB |
+------------------------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)

If we look at the utf8 data in the latin1 session, we see a different garbled representation.

mysql> select c,length(c),char_length(c),charset(c),hex(c) from test_utf8;
+---------+-----------+----------------+------------+----------------------------+
| c       | length(c) | char_length(c) | charset(c) | hex(c)                     |
+---------+-----------+----------------+------------+----------------------------+
| a       |         1 |              1 | utf8       | 61                         |
| abc     |         3 |              3 | utf8       | 616263                     |
| ?       |         3 |              1 | utf8       | E298BA                     |
| abc ??? |        13 |              7 | utf8       | 61626320E298BAE298B9E298BB |
+---------+-----------+----------------+------------+----------------------------+
4 rows in set (0.00 sec)
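The two kinds of garbling can be imitated in Python. This is only an analogy for the conversion between the column and the client connection, not MySQL's actual code path. It uses Python's cp1252 codec as a stand-in for MySQL latin1, and the function names are hypothetical.

```python
def latin1_column_in_utf8_session(stored_bytes):
    """Each stored byte is taken as one latin1 character and shown as such.

    Assumption: cp1252 stands in for MySQL's latin1.
    """
    return stored_bytes.decode("cp1252")


def utf8_column_in_latin1_session(stored_bytes):
    """Characters with no latin1 equivalent are replaced by '?'."""
    text = stored_bytes.decode("utf-8")
    return text.encode("latin-1", errors="replace").decode("latin-1")


if __name__ == "__main__":
    stored = "abc \u263a\u2639\u263b".encode("utf-8")
    # Three separate characters appear for every smiley byte sequence.
    print(latin1_column_in_utf8_session(stored))
    # Each smiley collapses to a single question mark.
    print(utf8_column_in_latin1_session(stored))
```

In the first case the bytes are intact and only misread, so the data is recoverable. In the second case the substitution happens in the output sent to the client, while the stored bytes shown by HEX() are still unchanged.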

Conclusion

Armed now with a basic understanding, we are ready to consider what approaches we may undertake to migrate this example table, and how we may be able to verify our data.

References

Character Set Support
Connection Character Sets and Collations

Beyond MySQL GA: patches, storage engines, forks, and pre-releases – FOSDEM 2010

Kristian Nielsen presented “Beyond MySQL GA: patches, storage engines, forks, and pre-releases”.
This included a history of current products:

Google Patches (5.0 & 5.1) included improvements in:

  • statistics/monitoring
  • lock contention
  • binlog
  • malloc()
  • filesorts
  • innodb I/O and wait statistics
  • SHOW …STATISTICS statements
  • smp scalability
  • I/O scalability
  • semisync replication
  • many more

Percona Patches (5.0) focus on

  • statistics/monitoring
  • performance/scalability
  • buffer pool content/mutexes
  • microslow patch

These have been ported to 5.1 and mainly integrated into XtraDB.

EBay Patches (5.0) have included:

  • variable length memory storage engine
  • pool of threads
  • Virtual columns

XtraDB storage engine (5.1) includes

  • Percona patches
  • Google patches
  • Innodb patches
  • Has XtraBackup for backup

Other engines/patches discussed included:

  • PBXT storage engine – community contribution
  • FederatedX – replacement to Federated
  • Sphinx storage engine
  • Pinba storage engine – Collects PHP statistics
  • Others OQGraph/Spider
  • Galera – Synchronous replication
  • Drizzle

Alternative packaging options for MySQL 5.0 and MySQL 5.1 include OurDelta, Percona and MariaDB.

FOSDEM 2010 MySQL Developer Room Schedule
FOSDEM 2010 Website
Brussels, Belgium
February 7, 2010

10x Performance Improvements in MySQL – A Case Study

The slides for my presentation at FOSDEM 2010 are now available online at slideshare. In this presentation I describe a successful client implementation with the result of 10x performance improvements. My presentation covers monitoring, reviewing and analyzing SQL, the art of indexes, improving SQL, storage engines and caching.

The end result was a page load improvement from 700+ms load time to a consistent 60ms.

Be sure to know your my.cnf [sections]

The MySQL configuration file, e.g. /etc/my.cnf has a number of different section headings including [mysql], [mysqld], [mysqld_safe]. It is important that you ensure you put the right variables into the right section. For example, the following my.cnf configuration file will not operate as the user probably expects.

[mysqld]
...
log-bin=mysql-bin
server-id=1
query_cache_size = 100M
query_cache_type = 1

...

[mysqld_safe]
...
key_buffer_size=600M
skip-innodb
...

In this example, this configuration does not give you a MyISAM key buffer of 600M, it’s actually the default of 8M.

mysql> show global variables like 'key_buffer_size';
+-----------------+---------+
| Variable_name   | Value   |
+-----------------+---------+
| key_buffer_size | 8388600 |
+-----------------+---------+

Be sure to add the right options to the [mysqld] section.

What I didn’t know until yesterday was that some programs read from multiple groups. In helping to describe the problem for the benefit of readers I actually learned something new myself. From the 5.1.2. Server Command Options page of the MySQL reference manual:


mysqld reads options from the [mysqld] and [server] groups. mysqld_safe reads options from the [mysqld], [server], [mysqld_safe], and [safe_mysqld] groups. mysql.server reads options from the [mysqld] and [mysql.server] groups.
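The group rules quoted above can be turned into a quick check of which options a given program would read from a configuration file. This is a rough sketch using Python's configparser, not how MySQL parses the file: it does not handle !include directives, and the GROUPS table, SAMPLE text and effective_options function are hypothetical names. The sample mirrors the earlier example with the elided "..." lines left out.

```python
import configparser

# Groups each program reads, taken from the manual excerpt above.
GROUPS = {
    "mysqld": ["mysqld", "server"],
    "mysqld_safe": ["mysqld", "server", "mysqld_safe", "safe_mysqld"],
    "mysql.server": ["mysqld", "mysql.server"],
}

SAMPLE = """\
[mysqld]
log-bin=mysql-bin
server-id=1
query_cache_size = 100M
query_cache_type = 1

[mysqld_safe]
key_buffer_size=600M
skip-innodb
"""


def effective_options(cnf_text, program):
    """Return the options `program` would read, as a dict of name -> value."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # options such as skip-innodb have no value
        interpolation=None,    # treat '%' literally
        strict=False,          # tolerate repeated groups and options
    )
    parser.read_string(cnf_text)
    options = {}
    for group in GROUPS[program]:
        if parser.has_section(group):
            options.update(parser.items(group))
    return options


if __name__ == "__main__":
    # mysqld never sees key_buffer_size because it sits under [mysqld_safe].
    print("key_buffer_size" in effective_options(SAMPLE, "mysqld"))
    print(effective_options(SAMPLE, "mysqld_safe").get("key_buffer_size"))
```

Running a check like this against a real my.cnf is a cheap way to spot options sitting in a group the server itself never reads.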

I have for example always put log-error in both the [mysqld_safe] and [mysqld] sections because both of these write different errors. It seems that is unnecessary.

Problem of the day, DESC gives error.

I hit this interesting problem last night on 5.0.51a.

mysql> use information_schema;
mysql> desc routines;
ERROR 1 (HY000): Can't create/write to file '/home/tmp/#sql_fea_1.MYD' (Errcode: 24)
mysql> show create table routines\G
*************************** 1. row ***************************
       Table: ROUTINES
Create Table: CREATE TEMPORARY TABLE `ROUTINES` (
  `SPECIFIC_NAME` varchar(64) NOT NULL default '',
  `ROUTINE_CATALOG` varchar(512) default NULL,
  `ROUTINE_SCHEMA` varchar(64) NOT NULL default '',
  `ROUTINE_NAME` varchar(64) NOT NULL default '',
  `ROUTINE_TYPE` varchar(9) NOT NULL default '',
  `DTD_IDENTIFIER` varchar(64) default NULL,
  `ROUTINE_BODY` varchar(8) NOT NULL default '',
  `ROUTINE_DEFINITION` longtext,
  `EXTERNAL_NAME` varchar(64) default NULL,
  `EXTERNAL_LANGUAGE` varchar(64) default NULL,
  `PARAMETER_STYLE` varchar(8) NOT NULL default '',
  `IS_DETERMINISTIC` varchar(3) NOT NULL default '',
  `SQL_DATA_ACCESS` varchar(64) NOT NULL default '',
  `SQL_PATH` varchar(64) default NULL,
  `SECURITY_TYPE` varchar(7) NOT NULL default '',
  `CREATED` datetime NOT NULL default '0000-00-00 00:00:00',
  `LAST_ALTERED` datetime NOT NULL default '0000-00-00 00:00:00',
  `SQL_MODE` longtext NOT NULL,
  `ROUTINE_COMMENT` varchar(64) NOT NULL default '',
  `DEFINER` varchar(77) NOT NULL default ''
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

I tried it on my MySQL 5.1 server without incident. In a few minutes of thinking about it, I was unable to determine the problem. Anybody care to guess?
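One possible starting point for a guess: the Errcode in the message is an operating system error number, so it can be looked up. The perror utility shipped with MySQL does this, and the Python sketch below does the same with the standard library. The describe_errcode function is a hypothetical helper, and the message text returned depends on the platform.

```python
import errno
import os


def describe_errcode(code):
    """Return (symbolic name, OS message) for an operating system errno."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 24 is the Errcode reported by the failing DESC statement above.
    print(describe_errcode(24))
```

Knowing the symbolic name of the error narrows the search to operating system limits rather than the table definition itself.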