Digital Tech Trek Digest [#Issue 2024.09]

As an entrepreneur, pricing is an important consideration in any evaluation, development, and customer testing. In How To Price A SaaS Product, we see different pricing strategies, cost-based pricing, competitor-based pricing, penetration pricing, value-based pricing, freemium pricing. None of these match what I am ultimately considering: consumption-based pricing. Pricing is critical to define the value proposition statement and determine the range of the total lifetime value (TLV). It can vary greatly for B2C, B2B, and B2B enterprise offerings. If we look at YCombinator https://www.ycombinator.com/library/6h-startup-pricing-101 a basic principle is determining the gap between price and cost. That is your margin and your incentive to sell, and you work with either cost-plus or value-based pricing. Starting with founder-led sales is difficult as you do not have the luxury of a dedicated and experienced head of sales to work on different models and guide a technical founder, even before you enter the minefield of enterprise sales with applicable bids, contract, and compliance complexities. I am drawn back to “Consumption-based pricing is a pricing model that charges customers based on their product or service usage. Consumption-based pricing calculates pricing based on usage volume rather than the number of users and is a popular pricing model for IT services, SaaS, and cloud computing and storage” Cite: Consumption-Based Pricing.

Moving a Billion Postgres Rows on a $100 Budget

I wrote recently about the 1 Billion Row Challenge (1BR). This week, I found this article on the same number with a different title. The objective was not performance; it was cost. PeerDB enables the efficient extraction of data from PostgreSQL into a data warehouse, such as Big Query, ClickHouse or Snowflake. It was interesting to see Arvo as a format used over, for example, Parquet. The product also offers different streaming modes, including log-based (CDC), cursor-based (timestamp or integer), and XMIN-based. I will need to do further research on this new term XMIN-Based.

Test queries against your production database (responsibly)

This post links off to a YouTube video of The Safest Way to Test Postgres Destructive Queries, which provides a basic introduction to branching of the Neon PostgreSQL DBaaS. While the title originally interested me, the example showing the mechanics is like many other product examples in which it is extremely simplistic and not a true representation of “production” size or workload. I see this as a similar concept to AWS RDS Aurora cloning. However, any example should modify the structure of a table, measure the impact of that structure against production queries (note plural), and provide additional metadata rather than just a response time. These are important considerations in my own evaluation of test coverage of data access and the gathering of configuration, data, and infrastructure when running experiments to determine a more optimal data access path or a new functionality requirement. More documentation can be found here on Neon Branching

About “Digital Tech Trek Digest”

Most days, I take some time early in the morning to scan my inbox newsletters, the news, LinkedIn, or other sources to read something new about professional and personal topics of interest. I turn what I read into actionable notes in a short, committed time window, summarizing what I learned, what I should learn and use, or what is of random interest.

Some of my regular sources include TLDR, Forbes Daily, ThoughWorks Podcasts, Daily Dose of Data Science and BoringCashCow. Also Scientific American Technology, Fareed’s Global Briefing, Software Design: Tidy First? by Kent Beck, Last Week in AWS, Micro Newsletter to name a few.

Random Wisdom

This week, I was reminded via a very interesting statement that work-life balance and joy in what you do are critically important. You will not find on a tombstone the statement:

“I never worked enough hours.”

Digital Tech Trek Digest [#Issue 2024.08]

The One Billion Row Challenge Shows That Java Can Process a One Billion Rows File in Two Seconds

Well, it’s way under 2 seconds for the 1brc.The published results are in and if you’re good you can read 1 billion data points of weather data and analyze it. The final best number as per the article release is “00:00.323″. Yes, that answer is in milliseconds “Result (m:s.ms)”. Mind-blowing.

ScyllaDB Summit 2024

Last week I attended this virtual event. All the presentations can be found online. I had never used the product before so while some new features like Tablets were not as applicable in understanding the full impact, the comparisons of DynamoDB performance and cost were very applicable.

So what is ScyllaDB? It is a distributed NoSQL DBaaS that speaks Cassandra protocol (do large companies still use this?), and it speaks AWS DynamoDB protocol. That is really interesting to me. You can choose a Cloud Hosted offering, or if you’re into managing your setup, you can use the Open Source ScyllaDB version available from GitHub. I started at ScyllaDB University to get a grip on the basics. I have yet to try the local Docker Compose setup.

Thanks also to the team for the swag which I received.

Playing a game with your CI/CD pipeline

My friend Sergey has created a game in GitLab called GitTerra. Drop a few lines into your .gitlab-ci.yml and each build will give you a generated 3D map of a city based on your commit. I really look forward to some of his next steps, leveraging potentially different colors for languages, or different building structures for artifacts found in your commit.

We raised 11.6M to build Serverless Postgres for Modern SaaS

Congrats to Gwen and her co-founder for getting seed funding for Nile Serverless Postgres for Modern SaaS. Awesome news for an entrepreneur and I’m very hopeful for the success of Nile.

The Safest Way to Test Postgres Destructive Queries

While I am a user of ElephantSQL serverless PostgreSQL and Neon, Nile and Xata are just a few that are competing in the space. With multiple other products also speaking PostgreSQL protocol, you can easily trial a small product in an RDBMS in the cloud at no cost. PostgreSQL is definitely outdoing MySQL in this space. You have the extensive set of NoSQL Cloud offerings, SycllaDB I just mentioned, and D1 by CloudFlare I have yet to try this branching feature for your database, sounds interesting and I’ve added to my just as long list of products to try, as books to read. Nit: It’s PostgreSQL, not Postgres.

About “Digital Tech Trek Digest”

Most days I take some time early in the morning to scan my inbox newsletters, the news, LinkedIn, or other sources to read something new covering professional and personal topics of interest. Turning what I read into some actionable notes in a short committed time window is a summary of what I learned, what I should learn and use, or what is of random interest.

Some of my regular sources include TLDR, Forbes Daily, ThoughWorks Podcasts, Daily Dose of Data Science and BoringCashCow. Also Scientific American Technology, Fareed’s Global Briefing, Software Design: Tidy First? by Kent Beck, Last Week in AWS, Micro Newsletter to name a few.

Digital Tech Trek Digest [#Issue 2024.03]

Lessons from going freemium: a decision that broke our business

As an entrepreneur always considering how to produce a sustaining passive revenue, what licensing model to use, and how to acquire and retain customers, the allure of a freemium model is ever present in so many offerings. You may wish to read this article and look at the visualizations provided with the narrative. I found this a useful data storytelling example.

The allure of seeing a new product is the strongest motivator new users have to complete setup. If you make onboarding too easy, they’ll never come back to do the hard task you let them skip.

Read more at Lessons from going freemium: a decision that broke our business Source: TLDR

Newsletters and online content creators

Lenny’s Newsletter from the prior article, listed with over 574,000 subscribers is one of several Substack newsletters I subscribe to. Substack is described as “The subscription network for independent writers and creators”. I have been collecting the number of subscribers from several newsletters I follow, however, there is no way to see that growth over time. Also missing are the price rates over time and the ratio of free to paying subscribers. Random Idea: What is missing is a history of this information. Other stats I’ve noted previously include 66,000 subscribers for Kent Becks Newsletter with 3 subscription plan offerings, 1,250,000 subscribers for the free TLDR (I can remember this years ago being much less) and 65,000 subscribers for the Seattle Data Guy newsletter.

FWIW this post from Lenny’s Newsletter This newsletter is growing up is from 2020.

Golden Kitty Awards 2022

I came across the Golden Kitty Awards which unfortunately are only current to 2022 (fail on being current). It was interesting to scan the list for innovative ideas. I’ve yet to visit any sites, but I’m always encouraged by what people think of and commit to building regardless of the motivation or incentive. What counts is an entrepreneur takes an idea and releases a product. 

Source: Random

Streamer JS – Video stream layout manager for OBS Studio and other streaming applications.

I am a new user of Twitch streaming for personal projects.  My good friend Sergey Chernyshev organizer of the large New York Web Performance Group has created Streamer JS as a means to drive more dynamic content in the browser and with common languages of HTML/CSS/Javascript and using OBS more as the streaming only component.  One objective is to have better version control management of assets/scenes/sources/filters/etc.  It’s interesting that  PouchDB is an eventually consistent distributed datastore in Javascript. Yet another simple data store to review for suitability. 

Source: Word of mouth

The 37th Chaos Communication Congress (37C3) by the Chaos Computer Club

Last month I was introduced to the Chaos Computer Club. This is a large German-based annual tech conference with a focus on security & infrastructure/hacking.  Over 100 talks from the most recent event last week have been posted here.

Source: Word of mouth

About ‘Digital Tech Trek Digest’

Most days I take some time early in the morning to scan my inbox newsletters, the news, LinkedIn, or other sources to read something new covering professional and personal topics of interest. Turning what I read into some actionable notes in a short committed time window is a summary of what I learned today, what I should learn and use, or what is of random interest.

Some of my regular sources include TLDR, Forbes Daily, ThoughWorks Podcasts, Daily Dose of Data Science and BoringCashCow. Also Scientific American Technology, Fareed’s Global Briefing, Software Design: Tidy First? by Kent Beck to name a few.

Digital Tech Trek Digest [#Issue 2024.02]

Indie Newsletter Tool Generates $15,000 a Month

There are so many different email newsletter sites you could wonder if there is market saturation. MailChimp, Mailgun, ConvertKit, Sendgrid (now part of Twilio it seems), Moosend and Mailersend come to mind.

It seems the space still has plenty of revenue-producing options including buttondown.email reportedly a side gig generating $15k per month. Source: BoringCashCow

When I asked a good friend and author of the Technical SEO Weekly his use of ConvertKit directed me to this Baremetrics Dashboard which is another product to look at sometime.

LLMs and Programming in the first days of 2024

How do use an LLM? If you are still on the fence start getting into the habit of using it more frequently then start. I now use ChatGPT and Claude AI daily, and with a crowded market there are many other emerging technologies to also consider.

I use ChatGPT for coding and image generation with DALL.E. I use Claude more for reviewing large documents that seem to be ideal for producing a summary, or to generate a fictitious movie script from those documents.

I do not like Javascript nor do I wish to actually learn this language however I write it daily via ChatGPT. Javascript is the ever-changing technology of web development and it’s impossible to keep up with the next product, or version of a product you may know. ChatGPT helps me navigate this combined with asking for HTML and TailwindCSS.  However, it’s not perfect, you need to be an experienced engineer who has learned how to write code for many years to ask the right questions and to correct the LLM when it does not produce what you expect. Let’s look at CSS. Now there is flex and grid and it’s hard to keep up with changing features that browsers support. This is where ChatGPT has helped me. I have been using Tailwindcss but it still took an expert friend 30 minutes to help me debug a CSS formatting issue of a future OBS Twitch streaming project to correctly size the content all in a 1920×1080 box. I learned a lot of new features of Google Chrome Developer Tools Inspector I did not know and are probably just the start of expert debugging features.

Until a few months ago I never knew it’s now much easier to read JSON in Javascript.

async function fetchData() {
  try {
    const response = await fetch('data.json');
    const data = await response.json();
    console.log(data);
    return data
  } catch (error) {
    console.error('Error fetching data:', error);
  }
}

let data = await fetchData();

I’d like to remind users that  ChatGPT can make mistakes. Consider checking important information.. Source:  TLDR

ParadeDB (GitHub Repo)

Every day there is another PostgreSQL product to review.  I am a current user of ElephantSQL which I didn’t know existed two months ago. Neon and Tembo are two more PostgreSQL serverless-related products on my product review list.  Now adding ParadeDB as well as reading Thoughts on PostgreSQL in 2024.

About ‘Digital Tech Trek Digest’

Most days I take some time early in the morning to scan my inbox newsletters, the news, LinkedIn, or other sources to read something new covering professional and personal topics of interest. Turning what I read into some actionable notes in a short committed time window is a summary of what I learned today, what I should learn and use, or what is of random interest.

Some of my regular sources include TLDR, Forbes Daily, ThoughWorks Podcasts, Daily Dose of Data Science and BoringCashCow to name a few.

Mastering MySQL 5.7 EOL migrations

In a recent podcast on Mastering EOL Migrations: Lessons learned from MySQL 5.7 to 8.0 I discuss with my colleague Adam North not only the technical issues that become a major migration but also key business and management requirements with having a well-articulated strategy that covers:

  • Planning
  • Testing
  • Be Prepared
  • Proactiveness

Having a plan is key to any significant task including data migrations. You should heed the warnings and the deprecations and consider all potential downstream product impacts such as connector upgrades. The plan includes a timeline but also needs to define all the stakeholders both technical and business, the definition of a successful migration, and most importantly the decision tree for a non-successful migration that would include any outage, failback, rollback, or fix-forward requirements.

Test, Test, Test. Leveraging the simple design pattern of read-write splitting (hint: if your application does not support this, it should) you get to test with minimal risk all of your application reads and with real load from 1% to 100%. You can validate all writes but this does not match concurrency, however, you can emulate load testing and using this two-way door strategy, verify and prevent many common problems before the decision point of failover.

Being prepared is assuming your migration will fail, rather than assuming it will succeed. Rehearsal of all steps that are documented and reproducible. Validating that your backup and recovery strategy is still optimal and operational with the new version, preparing supporting staff for availability before, during, and after the migration. There are probably not many technologists that can say, “Well that was a boring, uneventful successful migration”. The question is why not?

Being proactive is just as important. Leaving a large migration to the last minute is procrastination and a cause of unneeded stress during a non-successful migration. The Meltdown/Spectre vulnerabilities are one example of a highly impactful event outside of your control that sidelined entire teams in many companies for months. Does an outage of your cloud provider impact your uptime requirements and force you to delay a last-minute migration due to customer SLA obligations? While being prepared is for the reasons you could think of, being proactive and prepared is for the situations you have not thought of.

Having solid architectural design practices will aid greatly in many critical business requirements of uptime, read-only mode, scale-out, scale-up, and sharding. These design patterns also greatly enhance the likelihood of a successful database migration.

We have also created a Checklist to cover the planning and execution of a migration. Any input is welcome.

You can check out the video podcast on YouTube or listen with your favorite podcast tool.

AWS RDS Aurora wish list

I’ve had this list on a post-it note on my monitor for all of 2022. I figured it was time to write it down, and reuse the space.

In summary, AWS suffers from the same problem that almost every other product does. It sacrifices improved security for backward compatibility of functionality. IMO this is not in the best practices of a data ecosystem that is under constant attack.

  • Storage should be encrypted by default. When you launch an RDS cluster its storage is not encrypted. This goes against their own AWS Well-Architected Framework Section 2 – Security.
  • Plain text passwords. To launch a cluster you must specify a password in plain text on the command line, again not security best practice. At least change this to using a known secret from AWS secrets manager.
  • TLS for administrative accounts should be the only option. The root user should only be REQUIRE SSL (MySQL syntax).
  • Expanding on the AWS secrets manager usage for passwords, there should not need to be lambda code and cloudwatch cron event for rotation, it should just be automatically built in.
  • The awscli has this neat wait command that will block until you can execute the next statement in a series of sequential events to prepare and launch a cluster, but it doesn’t work for create-db-cluster. You have to build in your own manual “wait” until “available” process.
  • In my last position, I was unable to enforce TLS communications to the database from the application. This insecure practice is a more touchy situation, however, there needs to be some way to ensure security best practices over application developer laziness in the future.
  • AWS has internal special flags that only AWS support can set when say you have a bug in a version. Call it a per-client feature flag. However, there is no visibility into what is set, which account, which cluster, etc. Transparency is of value so that the customer knows to get that special flag unset after minor upgrades.
  • When you launch a new RDS Cluster, for example, MySQL 2.x, you get the oldest version, back earlier in the year it was like 2.7.2, even when 2.10.1 was released. AWS should be using a default version when only an engine is specified as a more current version. I would advocate the latest version is not the automatic choice, but it’s better to be more current.
  • the ALTER SYSTEM CRASH functionality is great, but it’s incomplete. You cannot for example crash a global cluster, forcing a region-specific failover. If you have a disaster resiliency plan that is multi-region it’s impossible to actually test it. You can emulate a controlled failover, but this is a different use case to a real failover (aka Dec 2021)
  • Use arn when it’s required not id. This goes back to my earlier point over maximum compatibility over usability, but when a --db-instance-identifier, or --db-instance-identifier requires the value to be the ARN, then the parameter should be specific. IMO –identifier is what you use for that argument, e.g. --db-cluster-identifier. When you specify for example --replication-source-identifier this must be (as per docs) “The Amazon Resource Name (ARN) of the source DB instance or DB cluster if this DB cluster is created as a read replica.” It should then be --replication-source-arn. There are a number of different occurrences of this situation.

SELECT 1

If you have worked with an RDBMS for some time, you will likely have come across the statement SELECT 1.

However, rarely is it correctly explained to engineers what the origin of SELECT 1 is, and why it’s useless and wasteful? A google search is not going to give you the response you would hope, these ranked responses are just as useless as the statement itself.

Bloat

Seeing a SELECT 1 confirms two things. First you are using a generic ORM framework, quote, and second, you have never optimized your SQL traffic patterns.

“Frameworks generally suck.
They CLAIM to improve the speed of development and abstract the need to know SQL.
The REALITY is the undocumented cost to sub-optimal performance, especially with data persistence.”

Connection Pooling

SELECT 1 comes from early implementations of connection pooling.

What is a connection pool? Rather than a new request or call getting a new database connection each time you wanted to return some data, programming languages implemented a cache with a pre-loaded pool of pre-established database connections. The intended goal is to reduce the execution time of an initial expensive operation of getting a new database connection if you were retrieving data from a simple SELECT statement. If intelligent enough (many are not), these pools would include features such as a low watermark, a high watermark, a pruning backoff of idle connections, and an ability to flush all connections.

When your code wanted to access the database to retrieve data, it would first ask the connection pool for an available connection from its pool, mark the connection as in-use and provide that for subsequent consumption.

Here is a simple example of the two queries that would actually be necessary to retrieve one piece of information.

SELECT 1
SELECT email_address, phone, position, active FROM employee where employee_id = ?

Staleness

SELECT 1 was implemented as the most light-weight SQL statement (i.e., minimal parsing, privilege checking, execution) that would validate that your connection was still active and usable. If SELECT 1 failed, i.e. a protocol communication across your network, the connection could be dropped from the connection pool, and a new connection from the pool could be requested. While this may appear harmless, it leads to multiple code in-efficiencies, a topic for a subsequent discussion.

Failed error handling

SELECT 1 was a lazy and flawed means to perform error handling. In reality, every single SQL statement requires adequate error handling, any statement can fail at any time to complete. In the prior example, what happens if the SELECT 1 succeeds but a simple indexed SELECT statement fails? This anti-pattern also generally shows that error handling is inconsistent and highly duplicated rather than at the correct position in the data access path.

By definition, error handling is needed in an abstraction function for all SQL statements, and it needs to handle all types of error handling including the connection no longer valid, connection terminated, timed out, etc.

If you had the right error handling SELECT 1 would then be redundant, and as I stated useless. You simply run the actual SELECT statement and handle any failure accordingly.

High availability

In today’s cloud-first architectures where high availability consists of multiple availability zones and multiple regions where application A can communicate with database B, every unneeded network round-trip in a well-tuned system is wasteful, i.e. it is costing you time to render a result quicker. We all know studies have shown that slow page loads drive users away from your site.

The cost of the cloud

This AWS Latency Monitoring grid by Matt Adorjan really shows you the impact that physics has on your resiliency testing strategy when application A and database B are geographically separated and you just want one piece of information.

Conclusion

The continued appearance of SELECT 1 is a re-enforcement that optimizing for performance is a missing skill for the much larger engineering code-writing workforce that have lost the ability for efficiency. It is also another easy win that becomes an unnecessary battle for Data Architects to ensure your organization provides a better customer experience.

Upcoming Percona Live 2021 Presentations

I am pleased to have been selected to present at Percona Live 2021 May 12-13. My presentations include talks on AWS RDS Aurora and QLDB managed services.

Understanding AWS RDS Aurora Capabilities

The RDS Aurora MySQL/PostgreSQL capabilities of AWS extend the HA capabilities of RDS read replicas and Multi-AZ.

In this presentation we will discuss the different capabilities and HA configurations with RDS Aurora including:

* RDS Cluster single instance
* RDS Cluster multiple instances (writer + 1 or more readers)
* RDS Cluster multi-master
* RDS Global Cluster
* RDS Cluster options for multi-regions

Each option has its relative merits and limitations. Each will depend on your business requirements, global needs and budget.

This presentation will include setup, monitoring and failover evaluations for the attendee with the goal to provide a feature matrix of when/how to consider each option as well as provide some details of the subtle differences Aurora provides.

This presentation is not going to go into the technical details of RDS Aurora’s underlying infrastructure or a feature by feature comparison of AWS RDS to AWS RDS Aurora.

A QLDB Cheatsheet for MySQL Users

Amazons new ledger database (QLDB) is an auditors best friend and lives up to the stated description of “Amazon QLDB can be used to track each and every application data change and maintains a complete and verifiable history of changes over time.”

This presentation will go over what was done to take a MySQL application that provided auditing activity changes for key data, and how it is being migrated to QLDB.

While QLDB does use a SQL-format for DML, and you can perform the traditional INSERT/UPDATE/DELETE/SELECT. The ability to extend these statements to manipulate Amazon Ion data (a superset of JSON) gives you improved data manipulation, and for example the FROM SQL statement.

Get a blow by blow comparison of MySQL structures (multiple tables and lots of columns) and SQL converted into a single QLDB table, with immutable, and cryptographically verifiable transaction log. No more triggers, duplicated tables, extra auditing for abuse of binary log activity.

We also cover the simplicity of using X Protocol and JSON output for data migration, and the complexity of AWS RDS not supporting X Protocol

#WDILTW – What can I run from my AWS Aurora database

When you work with AWS Aurora you have limited admin privileges. There are some different grants for MySQL including SELECT INTO S3 and LOAD FROM S3 that replace the loss of functionality to SELECT INTO OUTFILE and mysqldump/mysqlimport using a delimited format. While I know and use lambda capabilities, I have never executed anything with INVOKE LAMDBA directly from the database.

This week I found out about INVOKE COMPREHEND (had to look that product up), and INVOKE SAGEMAKER (which I used independently). These are machine learning capabilities that enable you to build custom integrations using Comprehend and SageMaker. I did not have any chance to evaluate these capabilities so I am unable to share any use cases or experiences. There are two built-in comprehend functions aws_comprehend_detect_sentiment() and aws_comprehend_detect_sentiment_confidence(), a likely future starting place. Sagemaker is invoked as an extension of a CREATE FUNCTION that provides the ALIAS AWS_SAGEMAKER_INVOKE_ENDPOINT syntax.

Also available are some MySQL status variables including Aurora_ml_logical_response_cnt, Aurora_ml_actual_request_cnt, Aurora_ml_actual_response_cnt, Aurora_ml_cache_hit_cnt, Aurora_ml_single_request_cnt.

Some googling found an interesting simple example, calculating the positive/negative sentiment and confidence of sentences of text. I could see this as useful for analyzing comments. I’ve included the example from this site here to encourage my readers to take a look as I plan to do. Post IAM configuration I will be really curious to evaluate the responsiveness of this example. Is this truly a batch only operation or could you return some meaningful response timely?

This also lead to bookmarking for reading https://awsauroralabsmy.com/, https://github.com/aws-samples/amazon-aurora-labs-for-mysql/ and https://squidfunk.github.io/mkdocs-material/ all from this one page.

#WDILTW – To use a RDBMS is to use a transaction

I learned this week that 30+ years of Relational Database Management System (RDBMS) experience still does not prepare yourself for the disappointment of working with organizations that use a RDBMS; MySQL specifically; have a released production product, have dozens to hundreds of developers, team leaders and architects, but do not know the importance of, nor use transactions. If I was to ask this when interviewing somebody that would work with a database and the response was it is not important, or not used these days it would be a hard fail.

To use a RDBMS is to understand a very simple principle, a foundation of a transactional system. It is called ACID. Atomicity, Consistency, Isolation, Durability.

In a simplistic description of this 50 year principle.

  • A – It’s all or nothing
  • C – Your work meets all existing constraints
  • I – Your work is independent and not affected by other concurrent work
  • D – Your work is reproducible in the event of a hard failure

A – Atomicity means a transaction. It’s not rocket science, it’s actually in the description of a RDBMS. What is a transaction? It is a means in Structure Query Language (SQL) notation that can ensure your unit of work is all or nothing.

The most basic of examples is very simple. You want to deposit a check that was given to you. For the bank to accept this deposit, it needs to ensure that the account holder that issued the check has sufficient funds, that is Step 1: subtract amount X from account A, and Step 2: deposit in account B. You cannot do Step 1 or Step 2 independently, you must do 1 and 2 together, otherwise real money could be lost or created from thin air. The lack of transactions also can be affected by not handling I – Isolation.

We have to ask our educators and responsible professionals why? Why can such a bedrock principle not be used? Why in a team of individuals in a company, one single individual from the years of development and support not say, well you have to use a transaction to ensure the integrity of the data you are working with? It’s called a database for a reason.

The unfortunately reality is there are several reasons for this utter failure, and the frustration of professionals like myself.

  1. RDBMS is not the bedrock of data management it was 20,30 even 50 years ago. Today with a proliferation of different products, transactions are simply not taught. Even SQL is not taught, yet most products that want to enable users to query data ultimately provide a SQL-like interface. Many users today want a drag & drop GUI interface but not realize that is not how you manipulate data. Hadoop was a life change with map-reduce approach to large datasets. iIt did not start with SQL, but it has a widely used SQL interface now. You look at the newest hot products like snowflake. What is it’s method of data access, SQL.
  2. MySQL, the most popular open source database does not enforce transactions, it’s optional. What you say? MySQL has some unique features including the capability for multiple storage engines, that provide different features and capabilities, such as index approaches, consistency and support for transactions. An entire generation of open source products were released and do not use transactions, but use a transactional storage engine within a transactional product.
  3. Still on the topic of MySQL, it also does not enforce strict handling like other RDBMS products. The default for decades was to enable an SQL statement to corrupt the data integrity, i.e. C – Consistency. Because MySQL supported warnings and they were always not checked for by developers, data in did not guarantee data out.
  4. Every mini-generation (e.g. 5 years) of new developers think they know more, than seasoned professionals.

As it was for me in the 80s, it should be for use of a relational database; and even a non-relational database; to read and understand the seminal works of An introduction to Database Systems by C.J. Date. It has been required learning at universities for decades, but it seems to have lost is favor with multiple iterations of new software developers.

#3 – What Did I learn this week. WDILTW.

Defensive Data Techniques

As a data architect I always ensure that for any database schema change there a fully recoverable execution path.
I have generally advised to create a patch/revert process for every change.  For example, if a change adds a new column or index to a table, a revert script would remove the respective column or index.
The goal is to always have a defensive position for any changes. The concept is that simple, it is not complex.

In its simplest form I use the following directory and file structure.

/schema
    schema.sql
    /patch
        YYYYMMDDXX.sql     where XX,ZZ are sequential 2 digit numbers, e.g. 01,02
        YYYYMMDDZZ.sql
   /revert
       YYYYMMDDXX.sql   This is the same file name in the revert sub-directory.
       YYYYMMDDZZ.sql

At any commit or tag in configuration management it is possible to create a current copy of the schema, i.e. use schema.sql.
It is also possible to take the first version of schema.sql and apply chronologically all the patch scripts to arrive at the same consistent structure of the schema that is in schema.sql. You can also run a validation process to confirm these are equivalent.
For each tagged version or commit of this directory structure and files in version control, this should always hold true.
While not the desired execution path, every revert script can be applied in a reverse chronological order and return to the first version of the schema.
If you want to maintain a first_schema.sql file within the directory structure, you can always create any version of the schema from a given commit in a roll-forward or roll-back scenario.

In reality however this is rarely implemented. There is always divergence or drift. Drift occurs for several primary reasons. The first is non-adherence to the defined process. The second and more critical is the lack of adequate testing and verification at each and every step.  A Test Driven Design (TDD) that validates the given approach would enable a verification of end state of the schema and enable the verification at each accumulated

In addition to each patch/revert there needs to be a state that is maintained of what has been applied.  Generally for RDBMS storing this metadata within a table is recommended.

The above example shows files of .sql extension. Any schema management process needs to cater for .sh, .py or other extensions to cater for more complex operations.
 
What about data changes?  I would recommend that for all configuration information you follow the same management principles as for schema objects, that is you have a patch to insert/update/delete data, and you have a revert script that can restore that data.  Generally the complexity of the rollback process is a hurdle for developers/engineers. Having a framework is important to manage how data consistency is maintained. This framework could generate a statement to restore the data (e.g. a selective mysqldump), require a hand-crafted statement, or leverage the benefit of the RDBMS by storing the data into intermediate shallow tables.

Using a least privileged model complicates an applicable framework approach. Does the user applying the change now require the FILE privilege, or CREATE/DROP privilege to create tables for the ability to restore data.

If there is strict referential integrity at the database level, those protections will defend against unintended consequences. For example, deleting a row that is dependent on a foreign key relationship.  In a normal operating system accommodations are made generally for the sake of performance, but also for supporting poor data cleansing requirements. If the application maintains a level of referential integrity, the schema management process also needs to support this, adding a further complexity.  Ensuring data integrity is an important separate topic. If there is a dangling row, what is the impact? The data still exists, it is just not presented in a user interface or included in calculations. This generally leads to greater unintended consequences that are generally never obvious at the time of execution, but rather days, weeks or months later.

When it comes to objects within the structure of an RDBMS the situation is more complex.  A classic example in MySQL is a user.  A user in MySQL is actually the user definition which is just the username, password and host.  A user contains one or more grants. The user may be the owner of additional objects. Using default and legacy MySQL, it is simply not possible to determine if a user is actually being used. Percona and other variances support INFORMATION_SCHEMA.USER_STATISTICS which is a better method of evaluating the use of a user.  This does however require the intervention of time-based data collection, as this table is the accumulative statistics since an instance restart or flush.

With this type of object, or meta object several defensive techniques exist.  

If you had the user `blargie` and that user had grants to read data from several schemas, is the user used?  I don’t think so, let’s just delete it is not a fact-based approach to avoiding a subsequent problem.
Is the user used? Let’s revoke the users privileges and monitor for errors or user feedback? Or let’s change the user’s password?  With each of these strategies it is important to always have a defensive process to rollback.
A different approach is to use a common data technique of marking information as deleted before it’s physically deleted (think trash can before you empty the trash).  For MySQL users there is no default functionality (in the most recent versions of MySQL you can DISABLE a user).  One implementation to apply this pattern is to rename the user, which has the benefit of keeping the user’s password and privileges intack, therefore reducing the amount of complexity in restoring.

Regardless of the technique, it is important there is always a recovery path.  In a subsequent post I will discuss this approach towards cloud metadata, for example an AWS KMS policy, IAM Rule or ASG setting and the impact of  Infrastructure as a Service (IaaS) such as Terraform.

More reading https://en.wikipedia.org/wiki/Test-driven_development, https://en.wikipedia.org/wiki/Defensive_programming

MySQL Data Security Risk Assessment presentation

Securing your data is only as good as your weakest link. A clear-text password in a file or history file, shared privileges between test and production or open sudo access when you can connect as an unprivileged user all are security flaws. This talk discusses how to navigate the poor defaults MySQL has in place, how to strengthen processes and how to audit your environment. It also covers the complexity of deploying changes in an always available production environment.

Presented at the Data.Ops Conference in Barcelona, Spain.
Download slides

Identifying MySQL SSL communication using ngrep

Prior to MySQL 5.7 client communications with a MySQL instance were unencrypted by default. This plaintext capability allowed for various tools including pt-query-digest to analyze TCP/IP traffic. Starting with MySQL 5.7 SSL is enabled by default and will be used in client communications if SSL is detected on the server.

We can detect and confirm this easily with the ngrep command.

Default client connection traffic (5.6)

On a MySQL 5.6 server we monitor the default MySQL port traffic.

mysql56$ sudo ngrep -d eth1 -wi -P ' ' -W single -l port 3306
interface: eth1 (192.168.42.0/255.255.255.0)
filter: ( port 3306 ) and ((ip || ip6) || (vlan && (ip || ip6)))
...

We connect to this server using TCP/IP.

host$ mysql -uexternal -p -h192.168.42.16
mysql56> select 'unencrypted';

We can observe the communication to and from the server (in this example 192.168.42.16) is plaintext.

...
#
T 192.168.42.1:47634 -> 192.168.42.16:3306 [AP]      select 'unencrypted'
#
T 192.168.42.16:3306 -> 192.168.42.1:47634 [AP]      !    def    unencrypted  ! !                       unencrypted
#

SSL System Variables (5.6 default)

A default 5.6 installation does not have SSL enabled as verified by system variables.

mysql56 >SHOW  VARIABLES  LIKE '%ssl%';
+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| have_openssl  | DISABLED |
| have_ssl      | DISABLED |
| ssl_ca        |          |
| ssl_capath    |          |
| ssl_cert      |          |
| ssl_cipher    |          |
| ssl_crl       |          |
| ssl_crlpath   |          |
| ssl_key       |          |
+---------------+----------+
9 rows in set (0.02 sec)

Default client connection traffic (5.7)

Running the same example client connection with MySQL 5.7 you will observe that communications to and from the server (in this example 192.168.42.17) are not in plaintext.

mysql57$ sudo ngrep -d eth1 -wi -P ' ' -W single -l port 3306
interface: eth1 (192.168.42.0/255.255.255.0)
filter: ( port 3306 ) and ((ip || ip6) || (vlan && (ip || ip6)))

host$ mysql -uexternal -p -h192.168.42.17
mysql57> select 'encrypted';


T 192.168.42.1:36781 -> 192.168.42.17:3306 [AP]     @    F   l   d iVr  H   b ^    s t Z      ( 2d   " ?  |   )
#
T 192.168.42.17:3306 -> 192.168.42.1:36781 [AP]     p%  s`   3u5!%P]   v=  r # x   E   a y  '!    )Z    8   Js  z.  \t   (r H@     0 2 5k\    <   M  @)E& b q|q@    h

SSL System Variables (5.7 default)

A new MySQL 5.7 installation will have SSL enabled by default as seen in the MySQL variables.

mysql57 > SHOW GLOBAL VARIABLES LIKE '%ssl%';
+---------------+-----------------+
| Variable_name | Value           |
+---------------+-----------------+
| have_openssl  | YES             |
| have_ssl      | YES             |
| ssl_ca        | ca.pem          |
| ssl_capath    |                 |
| ssl_cert      | server-cert.pem |
| ssl_cipher    |                 |
| ssl_crl       |                 |
| ssl_crlpath   |                 |
| ssl_key       | server-key.pem  |
+---------------+-----------------+
9 rows in set (0.00 sec)

-no-ssl Client connection traffic (5.7)

If you want to emulate the unencrypted nature of MySQL 5.6 within any SSL enabled MySQL version (e.g. MySQL 5.7) you specify the --ssl option for mysql client connections. In MySQL 5.7 this option is also deprecated and --ssl-mode=disabled should be used>.

host$ > mysql -uexternal -p -h192.168.42.17 --ssl=0

host >select '-ssl=0 unencrypted';

T 192.168.42.1:36785 -> 192.168.42.17:3306 [AP]      select '-ssl=0 unencrypted'
#
T 192.168.42.17:3306 -> 192.168.42.1:36785 [AP]      '    def    -ssl=0 unencrypted  ! 3              -ssl=0 unencrypted

References

https://wiki.christophchamp.com/index.php?title=Ngrep
http://infoheap.com/ngrep-quick-start-guide/
Encrypted Connections - MySQL 5.7 Reference Manual
Implementing MySQL Security Features - Tutorial at Percona Live Europe 2017.

Testing MySQL/MariaDB/Percona versions with Docker

Giuseppe Maxia has provided some great MySQL docker images. Percona and MariaDB also provide version via Docker Hub. In an attempt to have a consistent means of launching these different images I created the following convenience functions.

  1. Install docker for your OS. See Official Docker installation instructions.
  2. Get dockerhelper.sh
  3. Run your desired variant and version.

$ wget https://raw.githubusercontent.com/ronaldbradford/mysql-docker-minimal/master/dockerhelper.sh
$ . ./dockerhelper.sh
Docker Registered functions are:  docker_mysql, docker_percona, docker_mariadb

$ docker_mysql
ERROR: Specify a MySQL version to launch. Valid versions are 5.0 5.1 5.5 5.6 5.7 8.0
$ docker_percona
ERROR: Specify a Percona version to launch. Valid versions are 5.5 5.6 5.7
$ docker_mariadb
ERROR: Specify a MariaDB version to launch. Valid versions are 5.5 10.0 10.1

Q: Does MySQL support ACID? A: Yes

I was recently asked this question by an experienced academic at the NY Oracle Users Group event I presented at.

Does MySQL support ACID? (ACID is a set of properties essential for a relational database to perform transactions, i.e. a discrete unit of work.)

Yes, MySQL fully supports ACID, that is Atomicity, Consistency, Isolation and Duration. (*)

This is contrary to the first Google response found searching this question which for reference states “The standard table handler for MySQL is not ACID compliant because it doesn’t support consistency, isolation, or durability”.

The question is however not a simple Yes/No because it depends on timing within the MySQL product’s lifecycle and the version/configuration used in deployment. What is also *painfully* necessary is to understand why this question would even be asked of the most popular open source relational database.

MySQL has a unique characteristic of supporting multiple storage engines. These engines enabling varying ways of storing and retrieving data via the SQL interface in MySQL and have varying features for supporting transactions, locking, index strategies, compression etc. The problem is that the default storage engine from version 3.23 (1999) to 5.1 (2010) was MyISAM, a non-transactional engine, and hence the first point of confusion.

The InnoDB storage engine has been included and supported from MySQL 3.23. This is a transactional engine supporting ACID properties. However, not all of the default settings in the various MySQL versions have fully meet all ACID needs, specifically the durability of data. This is the second point of confusion. Overtime other transactional storage engines in MySQL have come and gone. InnoDB has been there since the start so there is no excuse to not write applications to fully support transactions. The custodianship of Oracle Corporation starting in 2010 quickly corrected this *flaw* by ensuring the default storage engine in MySQL 5.5 is InnoDB. But the damage to the ecosystem that uses MySQL, that is many thousands of open source projects, and the resources that work with MySQL has been done. Recently working on a MySQL 5.5 production system in 2016, the default engine was specifically defined in the configuration defined as MyISAM, and some (but not all tables) were defined using MyISAM. This is a further conversation as to why, is this a upgrade problem? Are there legacy dependencies with applications? Are the decision makers and developers simply not aware of the configuration? Or, are developers simply not comfortable with transactions?

Like other anti-reasonable MySQL defaults the unaware administrator or developer could consider MySQL as supporting ACID properties, however until detailed testing with concurrency and error conditions not realize the impact of poor configuration settings.

The damage of having a non-transactional storage engine as the default for over a decade has created a generation of professionals and applications that abuses one of the primary usages of a relational database, that is a transaction, i.e. to product a unit for work that is all or nothing. Popular open source projects such as WordPress, Drupal and hundreds more have for a long time not supported transactions or used InnoDB. Mediawiki was at least one popular open source project that was proactive towards InnoDB and transaction usage. The millions of plugins, products and startups that build on these technologies have the same flaws.

Further confusion arises when an application uses InnoDB tables but does not use transactions, or the application abuses transactions, for example 3 different transactions that should really be 1.

While newer versions of MySQL 5.6 and 5.7 improve default configurations, until these versions a more commonly implemented non-transactional use in a relational database will remain. A recent Effective MySQL NYC Meetup survey showed that installations of version 5.0 still exist, and that few have a policy for a regular upgrade cadence.

Do you control your database outages?

Working with a client last week I noted in my analysis, “The mysql server was restarted on Thursday and so the [updated] my.cnf settings seems current”. This occurred between starting my analysis on Wednesday and delivering my findings on Friday.

# more /var/lib/mysql/ip-104-238-102-213.secureserver.net.err
160609 17:04:43 [Note] /usr/sbin/mysqld: Normal shutdown

The client however stated they did not restart MySQL and would not do that at 5pm which is still a high usage time of the production system. This is unfortunately not an uncommon finding, that a production system had an outage and that the client did not know about it and did not instigate this.

There are several common causes and the “DevOps” mindset for current production systems has made this worse.

  • You have managed hosting and they perform software updates with/without your knowledge. I have for example worked with several Rackspace customers and there would be an outage because Rackspace engineers decided to apply an upgrade at a time that suited them, not their customers.
  • You have chosen automate updates for your Operating System.
  • Your developers update the software when they like.
  • You are using a 3rd party product that is making an arbitrary decision.

In this case the breadcrumbs lead to the last option, that cPanel is performing this operation as hinted by the cPanel specific installed MySQL binaries.

$ rpm -qa | grep -i mysql
cpanel-perl-522-MySQL-Diff-0.43-1.cp1156.x86_64
MySQL55-devel-5.5.50-1.cp1156.x86_64
MySQL55-client-5.5.50-1.cp1156.x86_64
cpanel-perl-522-DBD-mysql-4.033-1.cp1156.x86_64
compat-MySQL50-shared-5.0.96-4.cp1136.x86_64
MySQL55-server-5.5.50-1.cp1156.x86_64
cpanel-mysql-libs-5.1.73-1.cp1156.x86_64
MySQL55-shared-5.5.50-1.cp1156.x86_64
MySQL55-test-5.5.50-1.cp1156.x86_64
compat-MySQL51-shared-5.1.73-1.cp1150.x86_64
cpanel-mysql-5.1.73-1.cp1156.x86_64

Also note that cPanel still uses MySQL 5.1 shared libraries.

So why did cPanel perform not one shutdown, but two in immediate succession? The first was 17 seconds, the second was 2 seconds. Not being experienced with cPanel I cannot offer an answer for this shutdown occurance. I can for others which I will detail later.

160609 17:04:24 [Note] /usr/sbin/mysqld: Normal shutdown
...
160609 17:04:28 [Note] /usr/sbin/mysqld: Shutdown complete
...
160609 17:04:41 [Note] /usr/sbin/mysqld: ready for connections.
...
160609 17:04:43 [Note] /usr/sbin/mysqld: Normal shutdown
...
160609 17:04:45 [Note] /usr/sbin/mysqld: ready for connections.
...

And why did the customer not know about the outage? If you use popular SaaS monitoring solutions such as New Relic and Pingdom you would not have been informed because these products have a sampling time of 60 seconds. I use these products along with Nagios on my personal blog site as they provide adequate instrumentation based on the frequency of usage. I would not recommend these tools as the only tools used in a high volume production system simply because of this one reason. In high volume system you need sampling are much finer granularity.

So just when you were going to justify that 17 seconds while unexpected is tolerable, I want to point out that this subsequently occurred and the outage was over 4 minutes.

160619 11:58:07 [Note] /usr/sbin/mysqld: Normal shutdown
...
160619 12:02:26 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
...

An analysis of the MySQL error log which is correctly not rolled as I always recommend showed a pattern of regular MySQL updates, from 5.5.37 thru 5.5.50. This is the most likely reason a 3rd party product has performed a database outage, to perform a software update at their choosing, not yours.

150316  3:54:11 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.37-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

150316  3:54:22 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.37-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

150316 19:07:31 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.37-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

150317  2:05:45 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.40-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

150317  2:05:54 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.40-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

150319  1:17:26 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.42-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

150319  1:17:34 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.42-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

150616  1:39:44 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.42-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

150616  1:39:52 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.42-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

151006  1:01:45 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.45-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

151006  1:01:54 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.45-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

151027  1:21:12 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.46-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

160105  1:31:35 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.47-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

160211  1:52:47 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.48-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

160211  1:52:55 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.48-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

160503  1:14:59 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.49-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

160503  1:15:03 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.49-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

160521 18:46:24 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.49-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

160522 11:51:45 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.49-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

160529 15:26:41 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.49-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

160529 15:30:12 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.49-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

160604 23:29:15 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.49-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

160609 17:04:41 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.50-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

160609 17:04:45 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.50-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)

160619 12:21:58 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.50-cll'  socket: '/var/lib/mysql/mysql.sock'  port: 0  MySQL Community Server (GPL)

What is intriguing from this analysis is that several versions were skipped including .38, .39, .41, .43, .44. One may ask the question why?

For Clients

This leads to several questions of the strategy used in your organization for controlling outages of your MySQL infrastructure for upgrades or for other reasons.

  • What is an acceptable outage time?
  • What is the acceptable maintenance window to perform outages?
  • What is your release cadence for MySQL upgrades?
  • Who or what performs updates?
  • Can your monitoring detect small outages?

You should also consider in your business strategy having a highly available (HA) MySQL infrastructure to avoid any outage, or application intelligence to support varying levels of data access as I describe in Successful MySQL Scalability Principles.

Utilizing OpenStack Trove DBaaS for deployment management

Trove is used for self service provisioning and lifecycle management for relational and non-relational databases in an OpenStack cloud. Trove provides a RESTful API interface that is same regardless of the type of database. CLI tools and a web UI via Horizon are also provided wrapping Trove API requests.

In simple terms. You are a MySQL shop. You run a replication environment with daily backups and failover capabilities which you test and verify regularly. You have defined DBA and user credentials ACL’s across dev, test and prod environments. Now there is a request for using MongoDB or Cassandra, the engineering department has not decided but they want to evaluate the capabilities. How long as a operator does it take to acquire the software, install, configure, setup replication, backups, ACLs and enable the engineering department to evaluate the products?

With Trove DBaaS this complexity is eliminated due to a consistent interface to perform the provisioning, configuration, HA, B&R, ACL across other products the exact same way you perform these tasks for MySQL. This enables operations to be very proactive to changing technology requests supporting digital transformation strategies.

Enabling this capability is not an automatic approval of a new technology stack. It is important that strategic planning, support and management is included in the business strategy to understanding the DBaaS capability for your organization. Examples of operations due diligence would include how to integrate these products into your monitoring, logging and alerting systems. Determine what additional disk storage requirements may be needed. Test, verify and time recovery strategies.

Trove specifically leverages several other OpenStack services for source image and instance management. Each trove guest image includes a base operating system, the applicable database software and a database technology specific trove guest agent. This agent is the intelligence that knows the specific syntax and version needs to perform the tasks. The agent is also the communication mechanism between Trove and the running nova instance.

Trove is a total solution manager for the instance running your chosen database. Instances have no ssh, telnet or other general access. The only access is via the SQL communication via the defined ports, e.g. 3306 for MySQL.

The Trove lifecycle management covers the provisioning, management, security, configuration and tuning of your database. Amrith Kumar in a recent presentation at the NYC Postgres meetup provides a good description of the specifics.

Trove is capable of describing and supporting clustering and replication topologies for the various data stores. It can support backup and restore, failover and resizing of clusters without the operator needing to know the specific syntax of complexities of a database product they are unfamiliar with.

A great example is the subtle difference in MySQL replication management using GTID’s between MySQL and MariaDb. To the developer, the interaction between MySQL and MariaDB via SQL is the same, the management of a replication topology is not identical, but is managed by the Trove guest agent. To the operator, the management is the same.

Also in his presentation, Kumar described Tesora, an enterprise class Trove service provided with a number of important additional features. Tesora supports additional database products including Oracle and DB2Express as well as commercial versions for Oracle MySQL, EnterpriseDB, Couchbase, Datastax, and mongoDB. Using the Horizon UI customizations with pre-defined trove instances greatly reduces the work needed for operators and deployers to build there own.

Managing MySQL Version Upgrades Presentation

The following presentation was given at the Oracle Technology Network (OTN) Latin America 2015 tour events in Uruguay, Argentina, Chile and Peru.

In this presentation I talk about the various versions and means of installing and upgrading MySQL including:

  • MySQL version history from 3.23 to 5.7.8
  • Historical installation options
  • Recommended use of Oracle yum repository for current version
  • The installation and upgrade process, and errors that occur
  • Compatibility changes between MySQL 5.5 and MySQL 5.6 including
    • Reserved words (and their true impact)
    • Legacy TIMESTAMP usage
    • FULLTEXT indexes
    • The query optimizer
    • Clear text password warnings and security improvements
  • Important configuration differences
  • Other recommendations
  • The use of replication

Testing and Verifying your MySQL Backup Strategy Presentation

This past week I have been the sole MySQL representative on the Oracle Technology Network (OTN) Latin America 2015 tour events in Uruguay, Argentina, Chile and Peru.

In this presentation I talk about the important steps for testing and verifying your MySQL backup strategy to ensure your business continuity in any disaster recovery situation. This includes:

  • Overview of the primary product options
  • Backup and recovery strategy considerations
  • Technical requirements
  • Common problems observed
  • What about a failover strategy

Updating MySQL on Ubuntu 12.04 LTS to MySQL 5.6

The Ubuntu 12.04.3 LTS release only provides MySQL 5.1 and MySQL 5.5 using the default Ubuntu package manager.

Oracle (owners of the MySQL(tm)) now provide Debian/Ubuntu APT repositories for all GA and DMR versions of MySQL including supporting Ubuntu 12.04.

The following steps demonstrate upgrading from the Ubuntu 5.5 server package to the Oracle 5.6 server package.

Verify MySQL Packages

$ apt-cache search mysql-server
mysql-server - MySQL database server (metapackage depending on the latest version)
mysql-server-5.5 - MySQL database server binaries and system database setup
mysql-server-core-5.5 - MySQL database server binaries
auth2db - Powerful and eye-candy IDS logger, log viewer and alert generator
cacti - Frontend to rrdtool for monitoring systems and services
torrentflux - web based, feature-rich BitTorrent download manager

Verify MySQL on Server

$  dpkg -l | grep mysql
ii  libdbd-mysql-perl                      4.020-1build2                                       Perl5 database interface to the MySQL database
ii  libmysqlclient-dev                     5.5.34-0ubuntu0.12.04.1                             MySQL database development files
ii  libmysqlclient18                       5.5.34-0ubuntu0.12.04.1                             MySQL database client library
ii  mysql-client-5.5                       5.5.31-0ubuntu0.12.04.1                             MySQL database client binaries
ii  mysql-client-core-5.5                  5.5.34-0ubuntu0.12.04.1                             MySQL database core client binaries
ii  mysql-common                           5.5.34-0ubuntu0.12.04.1                             MySQL database common files, e.g. /etc/mysql/my.cnf
ii  mysql-server-5.5                       5.5.31-0ubuntu0.12.04.1                             MySQL database server binaries and system database setup
ii  mysql-server-core-5.5                  5.5.31-0ubuntu0.12.04.1                             MySQL database server binaries
ii  php5-mysqlnd                           5.3.10-1ubuntu3.8                                   MySQL module for php5 (Native Driver)

Results may vary based on dependencies.

Checking the MySQL error log (as it’s the right good practice to always do)

$ sudo tail -50 /var/log/mysql/error.log
150402 16:02:49 [Note] Plugin 'FEDERATED' is disabled.
150402 16:02:49 InnoDB: The InnoDB memory heap is disabled
150402 16:02:49 InnoDB: Mutexes and rw_locks use GCC atomic builtins
150402 16:02:49 InnoDB: Compressed tables use zlib 1.2.3.4
150402 16:02:49 InnoDB: Initializing buffer pool, size = 1.0G
150402 16:02:49 InnoDB: Completed initialization of buffer pool
150402 16:02:49 InnoDB: highest supported file format is Barracuda.
150402 16:02:49  InnoDB: Waiting for the background threads to start
150402 16:02:50 InnoDB: 5.5.31 started; log sequence number 20079278867
150402 16:02:50 [Note] Server hostname (bind-address): '127.0.0.1'; port: 3306
150402 16:02:50 [Note]   - '127.0.0.1' resolves to '127.0.0.1';
150402 16:02:50 [Note] Server socket created on IP: '127.0.0.1'.
150402 16:02:50 [Note] Event Scheduler: Loaded 0 events
150402 16:02:50 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.31-0ubuntu0.12.04.1'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  (Ubuntu)

A check shows that this is not the most current version available of 5.5 using the Ubuntu packages.

$ apt-cache show mysql-server-5.5
Package: mysql-server-5.5
Priority: optional
Section: database
Installed-Size: 31947
Maintainer: Ubuntu Developers 
Original-Maintainer: Debian MySQL Maintainers 
Architecture: amd64
Source: mysql-5.5
Version: 5.5.41-0ubuntu0.12.04.1
...

Package: mysql-server-5.5
Status: install ok installed
Priority: optional
Section: database
Installed-Size: 31950
Maintainer: Ubuntu Developers 
Architecture: amd64
Source: mysql-5.5
Version: 5.5.31-0ubuntu0.12.04.1
...

Just to be consistent with keeping current versions, you may choose to update MySQL 5.5 to the current available version.

$ sudo apt-get install mysql-server-5.5
...

Installing Oracle APT Packaging

The recommended documented way to move to using the Oracle repo is:

cd /tmp
# See https://dev.mysql.com/downloads/repo/apt/ for your right distro version
wget https://dev.mysql.com/get/mysql-apt-config_0.3.3-2ubuntu12.04_all.deb
sudo dpkg -i mysql-apt-config*.deb

This unfortunately uses a cursors based interface which is not something you automate for production systems and not the approach I would suggest.

So doing what this does

echo "deb http://repo.mysql.com/apt/ubuntu/ precise mysql-apt-config
deb http://repo.mysql.com/apt/ubuntu/ precise mysql-5.6" | sudo tee /etc/apt/sources.list.d/mysql.list
curl -s http://ronaldbradford.com/mysql/mysql.gpg | sudo apt-key add -
sudo apt-get update

Now we can look at available versions.


$ apt-cache search mysql-server
mysql-server-5.5 - MySQL database server binaries and system database setup
mysql-server-core-5.5 - MySQL database server binaries
auth2db - Powerful and eye-candy IDS logger, log viewer and alert generator
cacti - Frontend to rrdtool for monitoring systems and services
torrentflux - web based, feature-rich BitTorrent download manager
mysql-community-server - MySQL Server
mysql-server - MySQL Server meta package depending on latest version

This is where life gets a little confusing. Because Ubuntu supported MySQL 5.1 (as mysql-server) and MySQL 5.5 (as mysql-server-5.5) it can be misleading.

$ apt-cache show mysql-server
$ apt-cache show mysql-server
Package: mysql-server
Source: mysql-community
Version: 5.6.23-1ubuntu12.04
Architecture: amd64
Maintainer: MySQL Release Engineering 
Installed-Size: 46
Depends: mysql-community-server (= 5.6.23-1ubuntu12.04)
Homepage: http://www.mysql.com/
Priority: optional
Section: database
Filename: pool/mysql-5.6/m/mysql-community/mysql-server_5.6.23-1ubuntu12.04_amd64.deb
Size: 11644
SHA256: 1cb166cd230d2a4daca761ea80f2f34ee1fc0c92aaae972c914d81746f235d63
SHA1: 63548c852d5faeda751fbf038c0799fbbeac9905
MD5sum: da2f709a29a7cac97c834e6e69929891
Description: MySQL Server meta package depending on latest version
 The MySQL(TM) software delivers a very fast, multi-threaded, multi-user,
 and robust SQL (Structured Query Language) database server. MySQL Server
 is intended for mission-critical, heavy-load production systems as well
 as for embedding into mass-deployed software. MySQL is a trademark of
 Oracle. This is a meta package that depends on the latest mysql server
 package available in the repository.

Package: mysql-server
Priority: optional
Section: database
Installed-Size: 114
Maintainer: Ubuntu Developers 
Original-Maintainer: Debian MySQL Maintainers 
Architecture: all
Source: mysql-5.5
Version: 5.5.41-0ubuntu0.12.04.1
...

We are looking to ensure the Maintainer is the Official Release.

Upgrading to MySQL 5.6

sudo service mysql stop
ps -ef | grep mysql
sudo apt-get install -y mysql-server
$ sudo apt-get install -y mysql-server
...
The following extra packages will be installed:
  mysql-client mysql-common mysql-community-client mysql-community-server
The following packages will be REMOVED:
  mysql-client-5.5 mysql-client-core-5.5 mysql-server-5.5 mysql-server-core-5.5
The following NEW packages will be installed:
  mysql-client mysql-community-client mysql-community-server mysql-server
The following packages will be upgraded:
  mysql-common
...
Configuration file `/etc/mysql/my.cnf'
 ==> Modified (by you or by a script) since installation.
 ==> Package distributor has shipped an updated version.
   What would you like to do about it ?  Your options are:
    Y or I  : install the package maintainer's version
    N or O  : keep your currently-installed version
      D     : show the differences between the versions
      Z     : start a shell to examine the situation
 The default action is to keep your current version.
*** my.cnf (Y/I/N/O/D/Z) [default=N] ? N
...
Installing new version of config file /etc/apparmor.d/usr.sbin.mysqld ...
2015-04-02 16:53:07 0 [Warning] Using unique option prefix key_buffer instead of key_buffer_size is deprecated and will be removed in a future release. Please use the full name instead.
2015-04-02 16:53:07 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
150402 16:53:14 mysqld_safe Can't log to error log and syslog at the same time.  Remove all --log-error configuration options for --syslog to take effect
...

You may think the process is completed, but it is not. Always, Always check the error log. Have you checked your MySQL error log today?

$ sudo tail -300 /var/log/mysql/error.log
...
2015-04-02 16:53:14 20429 [Warning] Buffered warning: Changed limits: table_cache: 431 (requested 2000)

2015-04-02 16:53:14 20429 [Warning] Using unique option prefix myisam-recover instead of myisam-recover-options is deprecated and will be removed in a future release. Please use the full name instead.
2015-04-02 16:53:14 20429 [Note] Plugin 'FEDERATED' is disabled.
2015-04-02 16:53:14 20429 [Note] InnoDB: Using atomics to ref count buffer pool pages
2015-04-02 16:53:14 20429 [Note] InnoDB: The InnoDB memory heap is disabled
2015-04-02 16:53:14 20429 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2015-04-02 16:53:14 20429 [Note] InnoDB: Memory barrier is not used
2015-04-02 16:53:14 20429 [Note] InnoDB: Compressed tables use zlib 1.2.3.4
2015-04-02 16:53:14 20429 [Note] InnoDB: Using Linux native AIO
2015-04-02 16:53:14 20429 [Note] InnoDB: Not using CPU crc32 instructions
2015-04-02 16:53:14 20429 [Note] InnoDB: Initializing buffer pool, size = 1.0G
2015-04-02 16:53:15 20429 [Note] InnoDB: Completed initialization of buffer pool
2015-04-02 16:53:15 20429 [Note] InnoDB: Highest supported file format is Barracuda.
2015-04-02 16:53:15 20429 [Note] InnoDB: 128 rollback segment(s) are active.
2015-04-02 16:53:15 20429 [Note] InnoDB: Waiting for purge to start
2015-04-02 16:53:15 20429 [Note] InnoDB: 5.6.23 started; log sequence number 20079286519
2015-04-02 16:53:15 20429 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 487fda28-d97a-11e4-9254-e0cb4e3feb73.
2015-04-02 16:53:15 20429 [Note] Server hostname (bind-address): '127.0.0.1'; port: 3306
2015-04-02 16:53:15 20429 [Note]   - '127.0.0.1' resolves to '127.0.0.1';
2015-04-02 16:53:15 20429 [Note] Server socket created on IP: '127.0.0.1'.
2015-04-02 16:53:15 20429 [ERROR] Column count of mysql.events_waits_current is wrong. Expected 19, found 16. Created with MySQL 50541, now running 50623. Please use mysql_upgrade to fix this error.
2015-04-02 16:53:15 20429 [ERROR] Column count of mysql.events_waits_history is wrong. Expected 19, found 16. Created with MySQL 50541, now running 50623. Please use mysql_upgrade to fix this error.
2015-04-02 16:53:15 20429 [ERROR] Column count of mysql.events_waits_history_long is wrong. Expected 19, found 16. Created with MySQL 50541, now running 50623. Please use mysql_upgrade to fix this error.
2015-04-02 16:53:15 20429 [ERROR] Native table 'performance_schema'.'events_waits_summary_by_host_by_event_name' has the wrong structure
2015-04-02 16:53:15 20429 [ERROR] Incorrect definition of table performance_schema.events_waits_summary_by_thread_by_event_name: expected column 'THREAD_ID' at position 0 to have type bigint(20), found type int(11).
2015-04-02 16:53:15 20429 [ERROR] Native table 'performance_schema'.'events_waits_summary_by_user_by_event_name' has the wrong structure
2015-04-02 16:53:15 20429 [ERROR] Native table 'performance_schema'.'events_waits_summary_by_account_by_event_name' has the wrong structure
2015-04-02 16:53:15 20429 [ERROR] Column count of mysql.file_summary_by_event_name is wrong. Expected 23, found 5. Created with MySQL 50541, now running 50623. Please use mysql_upgrade to fix this error.
2015-04-02 16:53:15 20429 [ERROR] Column count of mysql.file_summary_by_instance is wrong. Expected 25, found 6. Created with MySQL 50541, now running 50623. Please use mysql_upgrade to fix this error.
2015-04-02 16:53:15 20429 [ERROR] Native table 'performance_schema'.'host_cache' has the wrong structure
2015-04-02 16:53:15 20429 [ERROR] Incorrect definition of table performance_schema.mutex_instances: expected column 'LOCKED_BY_THREAD_ID' at position 2 to have type bigint(20), found type int(11).
2015-04-02 16:53:15 20429 [ERROR] Native table 'performance_schema'.'objects_summary_global_by_type' has the wrong structure
2015-04-02 16:53:15 20429 [ERROR] Incorrect definition of table performance_schema.rwlock_instances: expected column 'WRITE_LOCKED_BY_THREAD_ID' at position 2 to have type bigint(20), found type int(11).
2015-04-02 16:53:15 20429 [ERROR] Native table 'performance_schema'.'setup_actors' has the wrong structure
2015-04-02 16:53:15 20429 [ERROR] Native table 'performance_schema'.'setup_objects' has the wrong structure
2015-04-02 16:53:15 20429 [ERROR] Native table 'performance_schema'.'table_io_waits_summary_by_index_usage' has the wrong structure
2015-04-02 16:53:15 20429 [ERROR] Native table 'performance_schema'.'table_io_waits_summary_by_table' has the wrong structure
2015-04-02 16:53:15 20429 [ERROR] Native table 'performance_schema'.'table_lock_waits_summary_by_table' has the wrong structure
2015-04-02 16:53:15 20429 [ERROR] Column count of mysql.threads is wrong. Expected 14, found 3. Created with MySQL 50541, now running 50623. Please use mysql_upgrade to fix this error.
2015-04-02 16:53:15 20429 [ERROR] Native table 'performance_schema'.'events_stages_current' has the wrong structure
2015-04-02 16:53:15 20429 [ERROR] Native table 'performance_schema'.'events_stages_history' has the wrong structure
2015-04-02 16:53:15 20429 [ERROR] Native table 'performance_schema'.'events_stages_history_long' has the wrong structure
...

Completing the MySQL 5.6 Upgrade

A MySQL upgrade of the meta schema is necessary.

$ sudo mysql_upgrade -uroot -p
Enter password:
Looking for 'mysql' as: mysql
Looking for 'mysqlcheck' as: mysqlcheck
Running 'mysqlcheck' with connection arguments: '--port=3306' '--socket=/var/run/mysqld/mysqld.sock'
Warning: Using a password on the command line interface can be insecure.
Running 'mysqlcheck' with connection arguments: '--port=3306' '--socket=/var/run/mysqld/mysqld.sock'
Warning: Using a password on the command line interface can be insecure.
mysql.columns_priv                                 OK
mysql.db                                           OK
mysql.event                                        OK
mysql.func                                         OK
mysql.general_log                                  OK
mysql.help_category                                OK
mysql.help_keyword                                 OK
mysql.help_relation                                OK
mysql.help_topic                                   OK
mysql.host                                         OK
mysql.ndb_binlog_index                             OK
mysql.plugin                                       OK
mysql.proc                                         OK
mysql.procs_priv                                   OK
mysql.proxies_priv                                 OK
mysql.servers                                      OK
mysql.slow_log                                     OK
mysql.tables_priv                                  OK
mysql.time_zone                                    OK
mysql.time_zone_leap_second                        OK
mysql.time_zone_name                               OK
mysql.time_zone_transition                         OK
mysql.time_zone_transition_type                    OK
mysql.user                                         OK
Running 'mysql_fix_privilege_tables'...
Warning: Using a password on the command line interface can be insecure.
Running 'mysqlcheck' with connection arguments: '--port=3306' '--socket=/var/run/mysqld/mysqld.sock'
Warning: Using a password on the command line interface can be insecure.
Running 'mysqlcheck' with connection arguments: '--port=3306' '--socket=/var/run/mysqld/mysqld.sock'
Warning: Using a password on the command line interface can be insecure.
...
OK
$ sudo service mysql restart
$ sudo service mysql restart
 * Stopping MySQL Community Server 5.6.23
....
 * MySQL Community Server 5.6.23 is stopped
 * Re-starting MySQL Community Server 5.6.23
150402 17:06:17 mysqld_safe Can't log to error log and syslog at the same time.  Remove all --log-error configuration options for --syslog to take effect.
......
 * MySQL Community Server 5.6.23 is started
$ sudo tail -300 /var/log/mysql/error.log
...
2015-04-02 17:06:15 20429 [Note] /usr/sbin/mysqld: Shutdown complete

150402 17:06:15 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
150402 17:06:17 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
2015-04-02 17:06:17 0 [Warning] Using unique option prefix key_buffer instead of key_buffer_size is deprecated and will be removed in a future release. Please use the full name instead.
2015-04-02 17:06:17 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2015-04-02 17:06:17 20994 [Warning] Buffered warning: Changed limits: max_open_files: 1024 (requested 5000)

2015-04-02 17:06:17 20994 [Warning] Buffered warning: Changed limits: table_cache: 431 (requested 2000)

2015-04-02 17:06:17 20994 [Warning] Using unique option prefix myisam-recover instead of myisam-recover-options is deprecated and will be removed in a future release. Please use the full name instead.
2015-04-02 17:06:17 20994 [Note] Plugin 'FEDERATED' is disabled.
2015-04-02 17:06:17 20994 [Note] InnoDB: Using atomics to ref count buffer pool pages
2015-04-02 17:06:17 20994 [Note] InnoDB: The InnoDB memory heap is disabled
2015-04-02 17:06:17 20994 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2015-04-02 17:06:17 20994 [Note] InnoDB: Memory barrier is not used
2015-04-02 17:06:17 20994 [Note] InnoDB: Compressed tables use zlib 1.2.3.4
2015-04-02 17:06:17 20994 [Note] InnoDB: Using Linux native AIO
2015-04-02 17:06:17 20994 [Note] InnoDB: Not using CPU crc32 instructions
2015-04-02 17:06:17 20994 [Note] InnoDB: Initializing buffer pool, size = 1.0G
2015-04-02 17:06:17 20994 [Note] InnoDB: Completed initialization of buffer pool
2015-04-02 17:06:17 20994 [Note] InnoDB: Highest supported file format is Barracuda.
2015-04-02 17:06:17 20994 [Note] InnoDB: 128 rollback segment(s) are active.
2015-04-02 17:06:17 20994 [Note] InnoDB: Waiting for purge to start
2015-04-02 17:06:17 20994 [Note] InnoDB: 5.6.23 started; log sequence number 20081020877
2015-04-02 17:06:17 20994 [Note] Server hostname (bind-address): '127.0.0.1'; port: 3306
2015-04-02 17:06:17 20994 [Note]   - '127.0.0.1' resolves to '127.0.0.1';
2015-04-02 17:06:17 20994 [Note] Server socket created on IP: '127.0.0.1'.
2015-04-02 17:06:17 20994 [Note] Event Scheduler: Loaded 0 events
2015-04-02 17:06:17 20994 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.23'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  MySQL Community Server (GPL)

Correcting errors

As you can see there are several warnings/errors when starting MySQL.

The first is

 mysqld_safe Can't log to error log and syslog at the same time.  Remove all --log-error configuration options for --syslog to take effect.

We solve this with

$ sudo rm -f /etc/mysql/conf.d/mysqld_safe_syslog.cnf
$ sudo service mysql restart
 * Stopping MySQL Community Server 5.6.23
....
 * MySQL Community Server 5.6.23 is stopped
 * Re-starting MySQL Community Server 5.6.23
......
 * MySQL Community Server 5.6.23 is started

This is an Ubuntu default that conflicts with the my.cnf log_error were are familiar with in monitoring the MySQL error log. You can read my opinion on this in The correct approach to rolling MySQL logs

The second is

2015-04-02 17:06:17 0 [Warning] Using unique option prefix key_buffer instead of key_buffer_size is deprecated and will be removed in a future release. Please use the full name instead.

We solve this with

sudo sed -ie "s/^key_buffer[^_]/key_buffer_size/" /etc/mysql/my.cnf

Next

2015-04-02 17:18:06 22123 [Warning] Using unique option prefix myisam-recover instead of myisam-recover-options is deprecated and will be removed in a future release. Please use the full name instead.

We solve this with

sudo sed -ie "s/^myisam-recover[^-]/myisam-recover-options/" /etc/mysql/my.cnf

The warnings are interesting, and will part of the following post on MySQL 5.6 configuration changes discussed in the next point.

2015-04-02 17:22:08 22626 [Warning] Buffered warning: Changed limits: max_open_files: 1024 (requested 5000)
2015-04-02 17:22:08 22626 [Warning] Buffered warning: Changed limits: table_cache: 431 (requested 2000)

Leveraging MySQL 5.6 benefits

We may now have a MySQL 5.6 installation however we are far from utilizing the benefits of MySQL 5.6 fully. In a subsequent post I will talk about the configuration options we need to now consider, both new options such as innodb_purge_threads and important improvements such as sync_binlog. There are far greater complex changes including innodb_file_per_table, master_info_repository and relay_log_info_repository and then changes in defaults such as performance_schema

Validating MySQL version numbers

As part of a MySQL 5.5 to MySQL 5.6 upgrade across several Ubuntu servers of varying distros an audit highlighted a trivial but interesting versioning identification error in Ubuntu’s packaging of MySQL.

Ubuntu 12.04 LTS

$ sudo dpkg -l | grep mysql-server-5.5
ii  mysql-server-5.5   5.5.41-0ubuntu0.12.04.1  ...
$ mysql -uroot -p -e "SELECT VERSION()"
+-------------------------+
| VERSION()               |
+-------------------------+
| 5.5.41-0ubuntu0.12.04.1 |
+-------------------------+

But when you look at the mysql --version it does NOT say 5.5.41.

$ mysql --version
mysql  Ver 14.14 Distrib 5.5.34, for debian-linux-gnu (x86_64) using readline 6.2

Ubuntu 14.04 LTS

On 14.04 I get expected results.

$ sudo dpkg -l | grep mysql-server-5.5
ii  mysql-server-5.5       5.5.41-0ubuntu0.14.04.1   ...
rbradfor@rubble:~$ mysql -uroot -p -e "SELECT VERSION()"
+-------------------------+
| VERSION()               |
+-------------------------+
| 5.5.41-0ubuntu0.14.04.1 |
+-------------------------+
$ mysql --version
mysql  Ver 14.14 Distrib 5.5.41, for debian-linux-gnu (x86_64) using readline 6.3

Dynamic recreation of InnoDB redo logs

MySQL 5.6 will now automatically recreate the InnoDB redo log files during a MySQL restart if the size (or number) of these logs changes, i.e. a change to innodb_log_file_size. See Changing the Number or Size of InnoDB Log Files which states “If InnoDB detects that the innodb_log_file_size differs from the redo log file size, it will write a log checkpoint, close and remove the old log files, create new log files at the requested size, and open the new log files.”

Before MySQL 5.6 it was necessary to stop MySQL and remove the InnoDB log files manually before restarting MySQL.

The error log shows:

tail -f /mysql/log/error.log
...
2015-03-28 21:51:25 3767 [Warning] InnoDB: Resizing redo log from 2*3072 to 2*65536 pages, LSN=1626017
2015-03-28 21:51:25 3767 [Warning] InnoDB: Starting to delete and rewrite log files.
2015-03-28 21:51:25 3767 [Note] InnoDB: Setting log file ./ib_logfile101 size to 1024 MB
InnoDB: Progress in MB: 100 200 300 400 500 600 700 800 900 1000
2015-03-28 21:51:28 3767 [Note] InnoDB: Setting log file ./ib_logfile1 size to 1024 MB
InnoDB: Progress in MB: 100 200 300 400 500 600 700 800 900 1000
2015-03-28 21:51:31 3767 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
2015-03-28 21:51:31 3767 [Warning] InnoDB: New log files created, LSN=1626017
...

It was however odd that MySQL had indicated it had successfully started but the underlying Redo log files were not complete and in-place as seen by the following directory listings.

$ sudo service mysql start
......
 * MySQL Community Server 5.6.23 is started
$
$ ls -lh /var/lib/mysql/
total 1.9G
-rw-rw---- 1 mysql mysql   56 Mar 28 19:42 auto.cnf
-rw-rw---- 1 mysql mysql  12M Mar 28 21:51 ibdata1
-rw-rw---- 1 mysql mysql 902M Mar 28 21:51 ib_logfile1
-rw-rw---- 1 mysql mysql 1.0G Mar 28 21:51 ib_logfile101
drwxr-x--- 2 mysql mysql 4.0K Mar 28 19:42 mysql
drwx------ 2 mysql mysql 4.0K Mar 28 19:42 performance_schema
$ ls -lh /var/lib/mysql/
total 2.1G
-rw-rw---- 1 mysql mysql   56 Mar 28 19:42 auto.cnf
-rw-rw---- 1 mysql mysql  12M Mar 28 21:51 ibdata1
    -rw-rw---- 1 mysql mysql 1.0G Mar 28 21:51 ib_logfile0 -rw-rw---- 1 mysql mysql 1.0G Mar 28 21:51 ib_logfile1 drwxr-x--- 2 mysql mysql 4.0K Mar 28 19:42 mysql drwx------ 2 mysql mysql 4.0K Mar 28 19:42 performance_schema

SQL, ANSI Standards, PostgreSQL and MySQL

I have recently been working with the Donors Choose Open Data Set which happens to be in PostgreSQL. Easy enough to install and load the data in PostgreSQL, however as I live and breath MySQL, lets load the data into MySQL.

And here is where start our discussion, first some history.

SQL History

SQL – Structure Query Language is a well known common language for communicating with Relational Databases (RDBMS). It is not the only language I might add, having both used many years ago and just mentioned QUEL at a Looker Look and Tell event in New York. It has also been around since the 1970s making it; along with C; one of oldest in general use programming languages today.

SQL became an ANSI standard in 1986, and an ISO standard in 1987. The purpose of a standard is to provide commonality when communicating or exchanging information; in our case; a programming language communicating with a RDBMS. There have been several iterations of the standard as functionality and syntax improves. These are commonly referred to as SQL-86, SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2006, SQL:2008 and SQL:2011.

And so, with SQL being a standard it means that what we can do in PostgreSQL should translate to what we can do in MySQL.

SQL Communication

Both products provide a Command Line Interface (CLI) client tool for SQL communication, mysql for MySQL and psql for PostgreSQL. No surprises there. Both use by default the semicolon ; as a SQL statement terminator, and both CLI tools use \q as a means to quit and exit the tool. Certainly not a standard but great for syntax compatibility.

DDL Specification

Our journey begins with defining tables.

DROP TABLE

Both products SQL syntax support DROP TABLE. Infact, both support the DROP TABLE [IF EXISTS] syntax.

DROP TABLE donorschoose_projects;
DROP TABLE IF EXISTS donorschoose_projects;

CREATE TABLE

Both support CREATE TABLE.

Both support defining columns in the typical format <column_name> <datatype>, and both support the NOT NULL attribute. Talking about specific datatypes for columns is a topic on its own and so I discuss this later.

The PostgreSQL syntax was a table option WITHOUT OIDS which is not valid in MySQL. It is also obsolescent syntax in PostgreSQL 9.3. From the PostgreSQL manual “This optional clause specifies whether rows of the new table should have OIDs (object identifiers) assigned to them. The default is to have OIDs. Specifying WITHOUT OIDS allows the user to suppress generation of OIDs for rows of a table. This may be worthwhile for large tables … Specifying WITHOUT OIDS also reduces the space required to store the table on disk by 4 bytes per row of the table, thereby improving performance.”

In this example as this is just for testing, dropping the WITHOUT OIDS syntax creates a mutually compatible syntax.

Comments

Both MySQL and PostgreSQL support -- as an inline comment in an SQL statement. No need to strip those out.

ALTER TABLE

Both support ALTER TABLE ADD CONSTRAINT syntax which in our example is used to define the PRIMARY KEY, however while the syntax remains the same, the choice of datatype affects the outcome.

The following works in both products when the datatype is CHARACTER(32). More about CHARACTER() later.

ALTER TABLE donorschoose_projects ADD CONSTRAINT pk_donorschoose_projects PRIMARY KEY(_projectid);

In our example dataset, the primary key is defined with a TEXT datatype, and in MySQL this fails.

ERROR 1170 (42000): BLOB/TEXT column '_projectid' used in key specification without a key length

As the data in the dataset for primary keys by further analysis is indeed a 32 byte hexadecimal value, this is changed to CHARACTER(32) to be compatible for this data loading need. This however is an important key difference in any migration process with other data sets.

Side Note

Both products support the definition of the PRIMARY KEY in the CREATE TABLE syntax two different ways.

CREATE TABLE demo_pk1 (id character(32) NOT NULL PRIMARY KEY);
CREATE TABLE demo_pk2 (id character(32) NOT NULL, PRIMARY KEY(id));

CREATE INDEX

Both use CREATE INDEX syntax however with our sample dataset, this is the first observed difference in syntax with provided sample SQL statements.

PostgresSQL

CREATE INDEX projects_schoolid ON projects USING btree (_schoolid);

MySQL
The USING <type> qualifier must appear before the ON <table>.

CREATE INDEX USING btree projects_schoolid ON projects (_schoolid);

In both products USING btree is an optional syntax (for minimum compatibility) purposes so removing this provides a consistency.

Data Types

The following data types are defined in the PostgreSQL example data set. Each is discussed to identify a best fit in MySQL. For reference:

character

This data type is for a fixed width character field and requires a length attribute. MySQL supports CHARACTER(n) syntax for compatibility, however generally CHAR(n) is the preferred syntax. Indeed, PostgreSQL also supports CHAR(n).

The following showing both variants is valid in both products.

CREATE TABLE demo_character(c1 CHARACTER(1), c2 CHAR(1));

varchar/character varying

While this dataset does not use these datatypes, they are critical in the general conservations of character (aka string) types. This refers to a variable length string.

While character varying is not a valid MySQL syntax, varchar is compatible with both products.

CREATE TABLE demo_varchar(vc1 VARCHAR(10));

text

In PostgresSQL, text is used for variables of undefined length. The maximum length of a field is 1GB as stated in the FAQ.

In MySQL however TEXT only stores 2^16 characters (64K). The use of LONGTEXT is needed to support the full length capacity in PostgeSQL. This store 2^32 characters (~4GB).

Of all the complexity of this example dataset, the general use of text will be the most difficult to modify to a more applicable VARCHAR or TEXT datatype when optimizing in MySQL.

integer

PostgreSQL uses the integer datatype for a signed 4 byte integer value. MySQL supports the same syntax, however generally prefers to refer to the shorter INT syntax. Both products support both overall.

mysql> CREATE TABLE demo_integer(i1 INTEGER, i2 INT);
Query OK, 0 rows affected (0.11 sec)

mysql> INSERT INTO demo_integer VALUES (1,-1);
Query OK, 1 row affected (0.05 sec)

mysql> SELECT * FROM demo_integer;
+------+------+
| i1   | i2   |
+------+------+
|    1 |   -1 |
+------+------+
1 row in set (0.00 sec)
demo=# CREATE TABLE demo_integer(i1 INTEGER, i2 INT);
CREATE TABLE
demo=# INSERT INTO demo_integer VALUES (1,-1);
INSERT 0 1
demo=# SELECT * FROM demo_integer;
 i1 | i2
----+----
  1 | -1
(1 row)

And just to note the boundary of this data type.

mysql> TRUNCATE TABLE demo_integer;
Query OK, 0 rows affected (0.04 sec)

mysql> INSERT INTO demo_integer VALUES (2147483647, -2147483648);
Query OK, 1 row affected (0.04 sec)

mysql> SELECT * FROM demo_integer;
+------------+-------------+
| i1         | i2          |
+------------+-------------+
| 2147483647 | -2147483648 |
+------------+-------------+
1 row in set (0.00 sec)
demo=# TRUNCATE TABLE demo_integer;
TRUNCATE TABLE

demo=# INSERT INTO demo_integer VALUES (2147483647, -2147483648);
INSERT 0 1
demo=# SELECT * FROM demo_integer;
     i1     |     i2
------------+-------------
 2147483647 | -2147483648
(1 row)

The difference is in out-of-bounds value management, and here MySQL defaults suck. You can read my views at DP#4 The importance of using sql_mode.

demo=# TRUNCATE TABLE demo_integer;
TRUNCATE TABLE
demo=# INSERT INTO demo_integer VALUES (2147483647 + 1, -2147483648 - 1);
ERROR:  integer out of range
demo=# SELECT * FROM demo_integer;
 i1 | i2
----+----
(0 rows)
mysql> TRUNCATE TABLE demo_integer;
Query OK, 0 rows affected (0.04 sec)

mysql> INSERT INTO demo_integer VALUES (2147483647 + 1, -2147483648 - 1);
Query OK, 1 row affected, 2 warnings (0.07 sec)

mysql> SELECT * from demo_integer;
+------------+-------------+
| i1         | i2          |
+------------+-------------+
| 2147483647 | -2147483648 |
+------------+-------------+
1 row in set (0.00 sec)

While not in this dataset, both support the bigint data type. While the PostgreSQL docs indicate bigint is 8 bytes, testing with PostgresSQL 9.3 failed. Something to investigate more later.

demo=# CREATE TABLE demo_bigint(i1 BIGINT);
CREATE TABLE
demo=# INSERT INTO demo_bigint VALUES (2147483647 + 1), (-2147483648 - 1);
ERROR:  integer out of range
mysql> CREATE TABLE demo_bigint(i1 BIGINT);
Query OK, 0 rows affected (0.12 sec)

mysql> INSERT INTO demo_bigint VALUES (2147483647 + 1), (-2147483648 - 1);
Query OK, 2 rows affected (0.04 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> SELECT * from demo_bigint;
+-------------+
| i1          |
+-------------+
|  2147483648 |
| -2147483649 |
+-------------+
2 rows in set (0.01 sec)

And for reference, both products support smallint, a 2-byte integer.

Each product has additional integer data types.

numeric

For a fixed-precision number, PostgreSQL uses numeric but supports decimal.It would not be surprising to know that MySQL uses DECIMAL and for compatibility supports NUMERIC.

This leads to a side-bar discussion on knowing your data-types for your product. In a recent interview for a MySQL Engineer, a candidate (with SQL Server experience) provided a code example defining the NUMERIC datatype. I knew it was technically valid in MySQL syntax, but never actually seen this in use. When I asked the candidate for what was the syntax commonly used for a fixed-precision datatype they were unable to answer.

real/double precision

This dataset does not include these data types, however for reference, PostgresSQL uses real for 4 bytes, and double precision for 8 bytes. MySQL uses float for 4 bytes, and double for 8 bytes. MySQL however supports both PostgreSQL syntax options, however PostgreSQL supports float, but not double.

demo=# CREATE TABLE demo_floatingpoint(f1 FLOAT, f2 REAL, d1 DOUBLE, d2 DOUBLE PRECISION);
ERROR:  type "double" does not exist
LINE 1: ...TE TABLE demo_floatingpoint(f1 FLOAT, f2 REAL, d1 DOUBLE, d2...

demo=# CREATE TABLE demo_floatingpoint(f1 FLOAT, f2 REAL, d2 DOUBLE PRECISION);
CREATE TABLE
mysql> CREATE TABLE demo_floatingpoint(f1 FLOAT, f2 REAL, d1 DOUBLE, d2 DOUBLE PRECISION);
Query OK, 0 rows affected (0.07 sec)

date

Both PostgreSQL and MySQL use the date data type.

timestamp

Both PostgreSQL and MySQL use the timestamp data type to store date/time values. However, there is a difference in both precision and implementation here.

In PostgresSQL, timestamp supports a date before EPOCH, while in MySQL it does not. MySQL uses the DATETIME datatype.

Using PostgresSQL timestamp and MySQL DATETIME, both support microsecond precision. MySQL however only started to provide this in MySQL 5.6.

A key difference in column definition is the PostgreSQL timestamp without time zone syntax, used in our example dataset. Analysis of data loading will determine the impact here.

boolean

SQL:1999 calls for a Boolean datatype, and both PostgreSQL and MySQL support defining a column as BOOLEAN. MySQL however implicitly converts this to a SIGNED TINYINT, and any future DDL viewing shows this reference.

When referencing boolean, in PostgreSQL WHERE column_name = TRUE or WHERE column_name = t retrieves a true value. In MySQL WHERE column_name = TRUE or WHERE column_name = 1. When you SELECT a boolean, in PostgresSQL the answer is ‘t’, in MySQL, the answer is 1.

demo=# CREATE TABLE demo_boolean (b1 boolean);
CREATE TABLE
demo=# INSERT INTO demo_boolean VALUES (TRUE),(FALSE);
INSERT 0 2
demo=# SELECT * FROM demo_boolean;
 b1
----
 t
 f
(2 rows)
mysql> CREATE TABLE demo_boolean (b1 boolean);
Query OK, 0 rows affected (0.11 sec)

mysql> INSERT INTO demo_boolean VALUES (TRUE),(FALSE);
Query OK, 2 rows affected (0.03 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM demo_boolean;
+------+
| b1   |
+------+
|    1 |
|    0 |
+------+
2 rows in set (0.00 sec)

Other Data Types

Only the data types in this example have been reviewed.

Other syntax

In our sample SQL script, there is psql specific syntax to show a debugging line with \qecho .... For compatibility these are removed.

The loading of data with the \COPY <table_name> FROM PSTDIN WITH CSV HEADER is PostgreSQL specific and so loading the data is a future topic.

Finally, the VACUUM ANALYZE <table_name> command is also PostgreSQL specific and removed. This is a means effectively of optimizing and analyzing the table.

Both PostgreSQL and MySQL have an ANALYZE command, however the syntax is different, with the required TABLE keyword in MySQL.

PostgresSQL

ANALYZE donorschoose_projects;

ANALYZE TABLE donorschoose_projects;
ERROR:  syntax error at or near "table"

MySQL

ANALYZE donorschoose_projects;
ERROR 1064 (42000): You have an error in your SQL syntax;...

ANALYZE TABLE donorschoose_projects;

MySQL has an OPTIMIZE TABLE syntax, however while technically valid syntax this is not compatible with the default storage table InnoDB.

mysql> OPTIMIZE TABLE donorschoose_projects;
+----------------------------+----------+----------+-------------------------------------------------------------------+
| Table                      | Op       | Msg_type | Msg_text                                                          |
+----------------------------+----------+----------+-------------------------------------------------------------------+
| test.donorschoose_projects | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| test.donorschoose_projects | optimize | status   | OK                                                                |
+----------------------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (0.32 sec)

Improving performance – A full stack problem

Improving the performance of a web system involves knowledge of how the entire technology stack operates and interacts. There are many simple and common tips that can provide immediate improvements for a website. Some examples include:

  • Using a CDN for assets
  • Compressing content
  • Making fewer requests (web, cache, database)
  • Asynchronous management
  • Optimizing your SQL statements
  • Have more memory
  • Using SSD’s for database servers
  • Updating your software versions
  • Adding more servers
  • Configuring your software correctly
  • … And the general checklist goes on

Understanding where to invest your energy first, knowing what the return on investment can be, and most importantly the measurement and verification of every change made is the difference between blind trial and error and a solid plan and process. Here is a great example for the varied range of outcome to the point about “Updating your software versions”.

On one project the MySQL database was reaching saturation, both the maximum number of database connections and maximum number of concurrent InnoDB transactions. The first is a configurable limit, the second was a hard limit of the very old version of the software. Changing the first configurable limit can have dire consequences, there is a tipping point, however that is a different discussion. A simple software upgrade of MySQL which had many possible improvement benefits, combined with corrected configuration specific for this new version made an immediate improvement. The result moved a production system from crashing consistently under load, to at least barely surviving under load. This is an important first step in improving the customer experience.

In the PHP application stack for the same project the upgrading of several commonly used frameworks including Slim and Twig by the engineering department seemed like a good idea. However applicable load testing and profiling (after it was deployed, yet another discussion point) found the impact was a 30-40% increase in response time for the application layer. This made the system worse, and cancelled out prior work to improve the system.

How to tune a system to support 100x load increase with no impact in performance takes knowledge, experience, planning, testing and verification.

The following summarized graphs; using New Relic monitoring as a means of representative comparison; shows three snapshots of the average response time during various stages of full stack tuning and optimization. This is a very simplified graphical view that is supported by more detailed instrumentation using different products, specifically with much finer granularity of hundreds of metrics.

These graphs represent the work undertaken for a system under peak load showing an average 2,000ms response time, to the same workload under 50ms average response time. That is a 40x improvement!

If your organization can benefit from these types of improvements feel free to Contact Me.

There are numerous steps to achieving this. A few highlights to show the scope of work you need to consider includes:

  • Knowing server CPU saturation verses single core CPU saturation.
  • Network latency detection and mitigation.
  • What are the virtualization mode options of virtual cloud instances?
  • Knowing the network stack benefits of different host operating systems.
  • Simulating production load is much harder than it sounds.
  • Profiling, Profiling, Profiling.
  • Instrumentation can be misleading. Knowing how different monitoring works with sampling and averaging.
  • Tuning the stack is an iterative process.
  • The simple greatest knowledge is to know your code, your libraries, your dependencies and how to optimize each specific area of your technology stack.
  • Not everything works, some expected wins provided no overall or observed benefits.
  • There is always more that can be done. Knowing when to pause and prioritize process optimizations over system optimizations.

These graphs show the improvement work in the application tier (1500ms to 35ms to 25ms) and the database tier (500ms to 125ms to 10ms) at various stages. These graphs do not show for example improvements made in DNS resolution, different CDNs, managing static content, different types and ways of compression, remove unwanted software components and configuration, standardized and consistent stack deployments using chef, and even a reduction in overall servers. All of these successes contributed to a better and more consistent user experience.

40x performance improvements in LAMP stack

Correctly setting your mysql prompt using sudo

If you run multiple MySQL environments on multiple servers it’s a good habit to set your MySQL prompt to double check which server you are on.
however, using the MYSQL_PS1 environment variable I found this does not work under sudo (the normal way people run sudo).

I.e., the following syntax’s work.

$ mysql
$ sudo su - -c mysql
$ sudo su - ; mysql

but the following does not.

$ sudo mysql

The trick is actually to ensure via /etc/sudoers you inherit the MySQL_PS1 environment variable.

echo "export MYSQL_PS1="`hostname` [d]> "" | sudo tee /etc/profile.d/mysql.sh
echo 'Defaults    env_keep += "MYSQL_PS1"' | sudo tee /tmp/mysql
sudo chmod 400 /tmp/mysql
sudo mv /tmp/mysql /etc/sudoers.d

What is FTS_BEING_DELETED.ibd

I currently have on a MySQL 5.6 database using innodb_file_per_table the following individual tablespace file.

schema/FTS_00000000000001bb_BEING_DELETED.ibd

The schema is all InnoDB tables, and there ARE NO Full Text Indexes. I cannot comment on if a developer has tried to create one previously.
I am none the wiser in explaining the ongoing use of these files, or if it can be/should be deleted.

On closer inspection there are infact a number of FTS files.

$ ls -al FTS*
-rw-r----- 1 mysql mysql 98304 Jan 29 16:21 FTS_00000000000001bb_BEING_DELETED_CACHE.ibd
-rw-r----- 1 mysql mysql 98304 Jan 29 16:20 FTS_00000000000001bb_BEING_DELETED.ibd
-rw-r----- 1 mysql mysql 98304 Jan 29 16:26 FTS_00000000000001bb_CONFIG.ibd
-rw-r----- 1 mysql mysql 98304 Jan 29 16:21 FTS_00000000000001bb_DELETED_CACHE.ibd
-rw-r----- 1 mysql mysql 98304 Jan 29 16:00 FTS_00000000000001bb_DELETED.ibd
-rw-r----- 1 mysql mysql 98304 Jan 29 16:20 FTS_00000000000001c7_BEING_DELETED_CACHE.ibd
-rw-r----- 1 mysql mysql 98304 Jan 29 16:26 FTS_00000000000001c7_BEING_DELETED.ibd
-rw-r----- 1 mysql mysql 98304 Jan 29 16:21 FTS_00000000000001c7_CONFIG.ibd
-rw-r----- 1 mysql mysql 98304 Jan 29 16:20 FTS_00000000000001c7_DELETED_CACHE.ibd
-rw-r----- 1 mysql mysql 98304 Jan 29 16:20 FTS_00000000000001c7_DELETED.ibd

Any MySQL gurus with knowledge to share, and for the benefit of others that Internet search at a later time.

Related articles included Overview and Getting Started with InnoDB FTS and Difference between InnoDB FTS and MyISAM FTS but do not mention file specifics.

The article InnoDB Full-text Search in MySQL 5.6 (part 1) provides more insight that these files remain even if a full text index was created and has since being removed. It is not clear from the filename which tables these files relate to.

Good Test Data

Over the years you collect datasets you have created for various types of testing, seeding databases etc. I have always thought one needs to better manage this for future re-use. Recently I wanted to do some “Big Data” playing and again that question of what datasets can I use let me to review the past collated list at Seeking public data for benchmarks.

The types of things I was wanting to do lead me to realize a lot of content is “public domain” and Project Gutenberg is just one great source of text in multiple languages. This was just one aspect of my wish list but text based data is used from blogs, comments, articles, microblogs etc, and multiple languages was important from some text analysis.

With a bit of thinking about the building blocks, I created Good Test Data. A way for me to have core data, IP’s, people’s names, User Agents strings, text for articles, comments and a lot more. And importantly the ability to generate large randomized amounts of this data quickly and easily.

Now I can build a list of 1 million random names with unique usernames and emails with ease. I can generate millions of varying articles, from a short microblog, a comment, a blog to a multi page article. Then be able to produce HTML/PDF/PNG versions giving me file attachments. I’ve been playing more with image generation, creating banner images with varying text, and now I’m generating MP4 video to simulate the various standard sizes for advertising and just to see what people need.

I’m not sure of the potential use and benefit for others and that wasn’t the primary goal, however I would like to know how these building blocks could be used. The data is relatively agnostic, being able to easily load into MySQL tables. Depending on demand, being able to create pre-configured open source product data for e-commence products, CRM or blogging are all possible options.

The GRANT/REVOKE dilemma

It is common practice to grant your application the privileges of “GRANT SELECT, INSERT, UPDATE, DELETE ON yourdb.* TO user@host”.

But what if you want to ensure you cannot DELETE data from just one table?

Ideally I want to be able to “REVOKE DELETE ON yourdb.important_table FROM user@host”. You cannot do currently this with the MySQL privilege system.

If your schema has 100 tables, and you want to remove DELETE from one, you have to define DELETE for the 99 others, and remember that for each new table, you need to remember to also modify user privileges.

Unexplained halts using mysql command line client

I recently came across an issue trying to connect to a MySQL server using the mysql client. It appeared as through the connection was hanging.

A subsequent connection using the -A option highlighted the problem with the previous connection stuck in the state “Waiting for table metadata lock”.

mysql> SHOW PROCESSLISTG
*************************** 1. row ***************************
     Id: 37
   User: root
   Host: localhost
     db: tmp
Command: Query
   Time: 90
  State: preparing
   Info: create table missing as select id from AK where id not in (select id ..
*************************** 2. row ***************************
     Id: 38
   User: root
   Host: localhost
     db: tmp
Command: Field List
   Time: 50
  State: Waiting for table metadata lock
   Info:
*************************** 3. row ***************************
     Id: 39
   User: root
   Host: localhost
     db: tmp
Command: Query
   Time: 0
  State: init
   Info: show processlist

In this example you can see a long running CREATE TABLE … SELECT statement as the cause of the problem. The -A or –no-auto-rehash argument is a means of disabling the tab completion in the mysql command line client.

Giving thanks to MySQL authors challenge

Next week the US celebrates Thanksgiving Day. For those that are American or live here, this is a significant event. Three different experiences recently have lead me to write this request for ALL MySQL community members to give thanks to those that have contributed to the MySQL ecosystem. I have made a commitment to myself, and I would like to challenge others to write one book review per week in December, that’s 4 book reviews to the MySQL books that I have on my bookshelf that have made an impact in some way. I ask others to give it a go too.

It only takes a few minutes to pen a comment on Amazon, or a publishers site, but to authors it means so much more. I can only speak for myself, but any comment; good, bad or ugly; helps to know you are out there and you took the time to acknowledge somebody’s work of art (in this case a publication).

I only have to look at my bookshelf and I find the following MySQL books (in order they currently are placed which is no specific order), MySQL Crash Course by Ben Forta, MySQL Clustering by Alex Davies and Harrison Fisk, MySQL Cookbook by Paul DuBois, MySQL Stored Procedure Programming by Guy Harrison, Developing Web Applications with Apache, MySQL, memcached, and Perl by Patrick Galbraith, Pro MySQL (The Expert’s Voice in Open Source) by Mike Kruckenberg and Jay Pipes, MySQL Administrator’s Bible by Sheeri Cabral, MySQL (Third Edition) by Paul DuBois, High Performance MySQL: Optimization, Backups, Replication, and More (Second Edition) by Baron Schwartz, Peter Zaitsev, Vadim Tkachenko, Jeremy Zawodny, Arjen Lentz and Derek Balling, High Performance MySQL: Optimization, Backups, and Replication (Third Edition) by Baron Schwartz, Peter Zaitsev, Vadim Tkachenkoo, MySQL High Availability: Tools for Building Robust Data Centers by Charles Bell, Mats Kindal and Lars Thalman, Expert PHP and MySQLby Andrew Curiso, Ronald Bradford and Patrick Galbraith, Effective MySQL Backup and Recovery (Oracle Press) by Ronald Bradford, Effective MySQL Replication Techniques in Depth by Ronald Bradford and Chris Schneider, Effective MySQL Optimizing SQL Statements (Oracle Press) by Ronald Bradford, Database in Depth: Relational Theory for Practitioners by Chris Date, Pentaho Solutions: Business Intelligence and Data Warehousing with Pentaho and MySQL by Roland Bouman and Josh an Dongen, MySQL Administrator’s Guide and Language Reference (2nd Edition) and MySQL 5.0 Certification Study Guide by Paul Dubois, Stefan Hinz and Cartsten Pedersen.

And they are just the physical books, I have several PDF and Kindle only versions, and other MySQL Books I know about and have not purchased.

I would also like to give a special shout out to Sheeri Cabral and MySQL Marinate. A program using the O’Reilly Learning MySQL book to help anybody that wants to learn. At my most recent Effective MySQL Meetup a beginner question was asked by an audience member, and it was another audience member (not even myself) that piped up and recommended MySQL Marinate.

Finally, I learned MySQL by reading the online reference manual from cover to cover, and I did it again several years later, and probably should do it again some day. I am unable to find the names of the authors present or past, nor the right place you could leave a comment, but thanks to those I do know about, Jon Stephens, Mike Hillyer, Stefan Hinz, MC Brown and Paul DuBois.