Digital Tech Trek Digest [#Issue 2024.05]

Because the world needs better dashboards

While my professional interest in Building Better Data Insights Faster relies on using visuals and narratives to show data-driven results, "starting from first principles" is where you have to begin. Identifying the quality of data sources, the time to delivery, and the confidence in accuracy are critical aspects of any dashboard.

Source: WrapText by Equals

This is the second article I’ve read about Equals in a week, and while I’m not ready to go back to a spreadsheet, this company has some great previous posts with excellent content, such as the 2023 summary and How to ship fast. An appropriate statement would be:

What a year. We embraced AI. We reimagined BI. We waved freemium goodbye. And as the cliché goes, we’re only just getting started.

[Last Week in AWS] Issue #352: New Year, New You, Here’s December in Review

Damn right, and I think you are giving too much credit by saying “a year”. More than once I had to rewrite code because AWS was years behind standard Python releases. AWS Lambda adds support for Python 3.12.

Whatever was going on with the delays in getting new language runtimes out a year or more after the language version itself was released seems to have been resolved. I wonder how long it’ll take that unpleasant chapter to fade from the collective awareness around Lambda.

Source: Last Week in AWS

Latency is the new outage

While technically this came from a video I listened to, Getting Started with ElastiCache for Redis Performance & Cost Optimization, this phrase needs to be a slogan used more frequently. It is so true. In the opening minutes the speaker also describes some compelling reasons why our proliferation of data can have a negative impact.

Source: Random AWS reading.

About “Digital Tech Trek Digest”

Most days I take some time early in the morning to scan my inbox newsletters, the news, LinkedIn, or other sources to read something new covering professional and personal topics of interest. Turning what I read into some actionable notes in a short, committed time window produces a summary of what I learned, what I should really learn and use, or what is of random interest.

Some of my regular sources include TLDR, Forbes Daily, ThoughtWorks Podcasts, Daily Dose of Data Science, and BoringCashCow. Also Scientific American Technology, Fareed’s Global Briefing, Software Design: Tidy First? by Kent Beck, and Last Week in AWS, to name a few.

Digital Tech Trek Digest [#Issue 2024.01]

The Tiny Stack (Astro, SQLite, Litestream)

I spent many years in the LAMP stack, and there are many more acronyms for technology stacks in our evolving programming ecosystem. New today is “The Tiny Stack”, consisting of Astro, “a modern meta-framework for JavaScript” (not my words), and Litestream, which continuously streams SQLite changes. I’ve never been a fan of JavaScript, a necessary evil in modern stacks, but it changes so rapidly that there is a constant stream of new products and never the time to learn any of them. Litestream is interesting. Replication of SQL operations to another database is nothing new; it is the Change Data Capture (CDC) of your data. However, I’d not thought of SQLite, which is embedded everywhere, as offering this type of capability.
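
As a hedged aside: Litestream works by replicating SQLite’s write-ahead log, so the database needs WAL mode enabled. A minimal sketch using Python’s built-in sqlite3 module (the file name is a placeholder):

# Minimal sketch: Litestream streams SQLite's write-ahead log (WAL),
# so the database must run in WAL journal mode. "app.db" is a placeholder.
import sqlite3

conn = sqlite3.connect("app.db")
conn.execute("PRAGMA journal_mode=WAL;")  # the WAL is what Litestream replicates to object storage
conn.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
conn.commit()
conn.close()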

Amazon Aurora Introduces Long-Awaited RDS Data API to Simplify Serverless Workloads

AWS Aurora Serverless version 2 has been out for at least a year (actually 20 months – Apr 21, 2022), but a feature of version 1 that was not available in version 2 is the Data API. This gives developers a RESTful interface to the database without managing database drivers and connections; however, it only works in AppSync, only for recent versions of PostgreSQL, and only in certain regions. I’ve never used it myself, but it is news.
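
For context, here is a minimal, hedged sketch of what a Data API call looks like with boto3; the cluster ARN, secret ARN, database, and table names below are placeholders, not from any real deployment.

# Hedged sketch of an RDS Data API call via boto3; all ARNs and names are
# placeholders. The point is that this is an HTTP call, not a driver connection.
import boto3

client = boto3.client("rds-data", region_name="us-east-1")

response = client.execute_statement(
    resourceArn="arn:aws:rds:us-east-1:123456789012:cluster:example-cluster",
    secretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:example-secret",
    database="example",
    sql="SELECT email_address FROM employee WHERE employee_id = :id",
    parameters=[{"name": "id", "value": {"longValue": 42}}],
)
print(response["records"])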

Speaking of what is available in which AWS region, the recently released InstanceHunt allows you to identify the instance families/types available in different regions across various AWS database services. I developed this in just a few days and released it only last week as a working MVP. Future goals are to include other clouds and other categories of services, such as Compute. The prior announcement may facilitate a future version that shows the features of services per region.
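
For illustration only (this is not InstanceHunt’s actual code), a sketch of how such a per-region matrix could be collected with boto3 and the RDS DescribeOrderableDBInstanceOptions API; the engine and region list are example values.

# Illustrative sketch (not InstanceHunt's code): collect the DB instance
# classes orderable in each region for a given engine; values are examples.
import boto3

def instance_classes(region, engine="aurora-mysql"):
    rds = boto3.client("rds", region_name=region)
    classes = set()
    paginator = rds.get_paginator("describe_orderable_db_instance_options")
    for page in paginator.paginate(Engine=engine):
        for option in page["OrderableDBInstanceOptions"]:
            classes.add(option["DBInstanceClass"])
    return classes

for region in ("us-east-1", "eu-west-1", "ap-southeast-2"):
    print(region, sorted(instance_classes(region)))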

Stop Stalling And Start Your Dream Side Business In 2024

The title kinda says it all. As an aspiring entrepreneur, my pursuits have offered only limited success over the decades and never a passive revenue stream. While the article did not provide valuable nuggets, the title did. One of my goals for 2024 is to elevate my creation and release of side projects, regardless of whether each project becomes a source of revenue. I consider refining my design, development, testing, and implementation skills, and providing information of value, as forms of soft income that showcase some of my diverse skills.

About “Digital Tech Trek Digest”

Most days I take some time early in the morning to scan my inbox newsletters, the news, LinkedIn, or other sources to read something new covering professional and personal topics of interest. Turning what I read into some actionable notes in a short, committed time window produces a summary of what I learned today, what I should learn and use, or what is of random interest.

Announcing InstanceHunt

InstanceHunt identifies the instance (families/types/classes) available for a cloud service across all the regions of that cloud.

The initial version is a working example covering several AWS database services. Future releases will enable advanced filtering, cover other service categories (e.g. compute) and the GCP and Azure cloud platforms, and provide the full list of instance types within families in the service matrix.

For a few days’ investment this MVP is a usable service, complete with new regions added the same day; for example, ca-central-1 data was available on the day of release. It is interesting and can answer questions like: In which regions are the new generation 7 instance families available? What consistent instance types can I use across European regions? Where is MemoryDB not available?

Feature requests are welcome. From today’s reading, being able to show a feature of a service may also be a useful future matrix, e.g. the AWS Aurora Serverless Data API is now available in Serverless v2, but only for one of the two engines and only in a few regions.

China regions and AWS Gov Cloud regions are coming soon.

InstanceHunt - Find what instances you can use for your cloud services

AWS RDS Aurora wish list

I’ve had this list on a post-it note on my monitor for all of 2022. I figured it was time to write it down, and reuse the space.

In summary, AWS suffers from the same problem that almost every other product does. It sacrifices improved security for backward compatibility of functionality. IMO this is not best practice for a data ecosystem that is under constant attack.

  • Storage should be encrypted by default. When you launch an RDS cluster its storage is not encrypted. This goes against their own AWS Well-Architected Framework Section 2 – Security.
  • Plain text passwords. To launch a cluster you must specify a password in plain text on the command line, again not security best practice. At least change this to using a known secret from AWS Secrets Manager.
  • TLS for administrative accounts should be the only option. The root user should only be REQUIRE SSL (MySQL syntax).
  • Expanding on the AWS Secrets Manager usage for passwords, there should not need to be Lambda code and a CloudWatch cron event for rotation; it should just be built in automatically.
  • The awscli has this neat wait command that will block until you can execute the next statement in a series of sequential events to prepare and launch a cluster, but it doesn’t work for create-db-cluster. You have to build your own manual “wait” until “available” process (see the sketch after this list).
  • In my last position, I was unable to enforce TLS communications to the database from the application. This insecure practice is a more touchy situation, however, there needs to be some way to ensure security best practices over application developer laziness in the future.
  • AWS has internal special flags that only AWS support can set when say you have a bug in a version. Call it a per-client feature flag. However, there is no visibility into what is set, which account, which cluster, etc. Transparency is of value so that the customer knows to get that special flag unset after minor upgrades.
  • When you launch a new RDS cluster, for example Aurora MySQL 2.x, you get the oldest version; earlier in the year it was something like 2.7.2, even when 2.10.1 had been released. AWS should use a more current version as the default when only an engine is specified. I would not advocate that the latest version be the automatic choice, but it’s better to be more current.
  • the ALTER SYSTEM CRASH functionality is great, but it’s incomplete. You cannot for example crash a global cluster, forcing a region-specific failover. If you have a disaster resiliency plan that is multi-region it’s impossible to actually test it. You can emulate a controlled failover, but this is a different use case to a real failover (aka Dec 2021)
  • Use ARN when it’s required, not ID. This goes back to my earlier point of maximum compatibility over usability, but when a --db-instance-identifier or --db-cluster-identifier parameter requires the value to be the ARN, then the parameter name should be specific. IMO a parameter ending in -identifier is what you use for an identifier argument, e.g. --db-cluster-identifier. When you specify, for example, --replication-source-identifier, this must be (as per the docs) “The Amazon Resource Name (ARN) of the source DB instance or DB cluster if this DB cluster is created as a read replica.” It should then be --replication-source-arn. There are a number of occurrences of this situation.
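
On the missing waiter mentioned in the list above, this is a hedged sketch of the manual poll-until-available loop I mean, using boto3; the cluster identifier and timings are placeholders.

# Hedged sketch of a manual "wait until available" loop for an Aurora/RDS
# cluster; the cluster identifier and timings are placeholders.
import time
import boto3

rds = boto3.client("rds")

def wait_for_cluster(cluster_id, delay=30, max_attempts=60):
    for _ in range(max_attempts):
        status = rds.describe_db_clusters(DBClusterIdentifier=cluster_id)["DBClusters"][0]["Status"]
        if status == "available":
            return
        time.sleep(delay)
    raise TimeoutError(f"{cluster_id} not available after {delay * max_attempts} seconds")

wait_for_cluster("my-aurora-cluster")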

Spoiler – Owning your data isn’t good enough

While this is a catchy title, if you use Software as a Service (SaaS) or an online cloud provider, do you actually own and have total control of your business data and its infrastructure? For all the free and paid services your business uses, what happens if, one day, a portion of that is no longer available?

When you have data in a CRM, an analytics platform, a marketing platform, a payments platform, if one of those service providers locks you out of your data, you have lost control and access to a part of your business. Can you still operate unaffected? What is the actual impact? What is your contingency? You could be lucky and the impact is temporary, such as a day or a week, but it could also be longer or even indefinite.

Let me give you a simple but concrete example. Fellow woodworker Eric of Spencley Design posted recently on YouTube “I just lost half of my business”. If you listen to just 2 1/2 minutes, from 12:00 to 14:30, of his YouTube video explanation you will understand that this business relies on several online SaaS services. Many are free, but for an unexplained reason, whether bad code, bad ML/AI, or several other plausible reasons, one of his income streams was shut down without notice. This was not of his doing, nor the result of any of his actions, but for unrelated reasons. Online attempts to appeal this situation caused a permanent suspension. Talking to a human to understand what happened, why it happened, and how this could be resolved was also impossible because there is no ability to actually speak to a human.

This problem is not limited to online services. A great example from just a decade ago: your business credit card stops working and transactions are declined. If you were lucky you could physically call your bank manager, or go to your bank manager, to get to the bottom of the situation. You knew your bank account contained sufficient funds because you maintained on-premise accounting practices and you could provide evidence of such facts. If you run a small business today, do you think you can talk to a human that would have the ability to correct this problem, or would you have to talk to 5 humans and multiple automated (and annoying) systems, costing countless hours of time and frustration?

If you rely on Acme George Inc workspaces product for your small business email and shared documents, what if that becomes blocked? How do you communicate with your customers? What if you use Acme Archie Inc for your customer support ticketing system, and for a week it is unavailable to use? Not only can your customers not report issues, but you have no access to see what issues were already outstanding and work on them independently.

At times there are widespread outages of online services that have a wide effect across industries, lasting from hours to weeks: Cloudflare on Jun 21, 2022, Fastly on June 8, 2021, Amazon Web Services on Dec 7, 2021, and then again on Dec 15 and Dec 22. A blog post called it AWS’s December Outagepalooza. The Atlassian April 2022 outage for paying customers lasted up to 2 weeks. Even a free social media company and its related entities incurred widespread impact on Oct 4, 2021 (Facebook) that affected many gig economy businesses. These outages can have far-ranging effects. Actual examples include: you cannot pay your employees, your staff at a hospital cannot authenticate to access patient records, the transportation and logistics of your shipping business are halted.

I am referring here to loss of access to your data in a SaaS environment, and loss of cloud infrastructure that supported your SaaS services or even your internally developed and maintained systems running on cloud infrastructure. If you are not convinced of the larger ramifications of an extreme loss of infrastructure services, consider what the impact was to Parler in 2021.

My point here is you cannot simply stop using these services or your cloud provider(s) infrastructure. You need to be prepared. In a traditional system, you back up your data for some degree of disaster, you support the capability to recover both infrastructure and data from this, and if you are smart you actually test this. Sidebar: a colleague recently shared that even with massive investment in infrastructure and global redundancy, a scheduled test at one large bank took down services for 12 hours.

Large SaaS organizations may offer multi-region or multi-cloud capabilities, but they are also at the mercy of the SaaS providers they use. Do you know all the interdependencies? Look no further than the wipeout of Okta’s stock (down 30%) in one day. Okta CEO Todd McKinnon cited several factors, including a security impact at text message provider Twilio. Read more about that at Twilio Employeee, Customer Accounts Breached Through Texts. And yes, the headline here has an incorrect spelling. I tried to add a comment to offer feedback, but the MarketWatch paywall of 4 articles would not let me create an account to log in to leave a comment!

The solution is not to host all of your own infrastructure either. Facebook’s very long outage was self-inflicted, and they controlled all of their own infrastructure. It not only had an impact on their websites; their internal staff were unable to use security badges to access critical infrastructure to correct the problem because they were physically locked out of the buildings holding that infrastructure.

Returning to the small business owner who uses a marketing platform, an analytics platform, a CRM, a payment platform, or even a social media platform: do you keep current copies of your data from these systems so that if there were a loss, you would know whom to communicate with? In the first cited case, did Eric have a list of all of his subscribers, a copy of all his online content, and all comments made by subscribers? Was there a way to communicate with them via other means, or was access to sufficient PII not even possible for what was his original content?

In future posts I will share some of my techniques for ensuring you have a data acquisition strategy.

Can a picture replace a text description?

Data visualization, data storytelling, and data lineage are all ways to better describe and visualize a specific situation for a set of data. Generally, I find these techniques are used as a means to uncover or identify information that ultimately pertains to individuals. For example, how many sales have we made across time/location/business unit? How many customers do we have? How many social media photos has a person provided over a period of time? Unfortunately, this is not the kind of data that I feel has real-world meaning to me. It doesn’t describe the advancements made in the biomedical field to help fight disease, it doesn’t tell us the amount of energy that we have saved or the amount of energy that we failed to collect and the impact that has locally and globally on our world, and it doesn’t describe valuable human experiences in history about people and places. I find this kind of value in the data visualizations of others.

During a recent vacation, I thought about the impact of the visualization of my experiences and just how much information was not collected, how much information was collected but is of average or poor value, and how much is extremely valuable. How hard was it to collate even what I had collected, and to whom or what is this information of value? While these are personal experiences and not those of a commercial organization, Google certainly informed me of how many people were viewing a public image I uploaded and a comment I made about an iconic Australian location and food.

Inaccessible value in a text description

I recently caught up with a very dear friend. She had lost her husband 10 years ago after more than 50 years of marriage. He kept a written diary every year from 1946, starting a year after the end of World War II, in which he served.

So from 1946 to 2012, that is 66 years, there is a wealth of information that includes personal feelings, expectations, and perhaps thoughts about what was going well and what was not going well. It would also include valuable information about the world view of a very intelligent and influential individual. These diaries are still located today on the same shelf in the same office they have been in for the 25 years I have been aware of them.

To draw a conclusion to the question with a data analogy: a single copy, in a single location, of un-indexed information, which first-hand sources know has unlocked potential. It is also an immutable and finite time capsule. It would also, IMHO, contain great value in feelings, emotions, family, and history that is important to a small community. Could a picture represent this data?

A picture

Whilst traveling I used my camera as a means to record the experiences that I was having with my family. I am a photographer and not a videographer so my expression is a picture rather than a video and optionally audio. Sidebar, we did give our child a diary for this vacation to write in, however the attempt to build this new habit only lasted 2 days. Forming new habits can be hard.

But can a picture relay adequate information to describe when and where this photo was taken, why it was taken, how I felt, who I was with, what it inspired, and how it made me think about related experiences in the past?

Today there is technology that can take a picture and describe its contents, effectively creating a summarized description of the picture. With additional metadata such as Exif data, you can extract more details such as time and location. With machine learning you can do picture comparison to identify locations even if a location was not recorded with the photo.
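
As a small, hedged example of the metadata point, assuming the Pillow library and a placeholder file name, you can pull the capture time and the GPS section out of a JPEG’s Exif data like this:

# Hedged sketch: read capture time and GPS details from a photo's Exif data
# using Pillow. "photo.jpg" is a placeholder file name.
from PIL import Image, ExifTags

image = Image.open("photo.jpg")
exif = image.getexif()

# Map numeric Exif tag ids to readable names.
tags = {ExifTags.TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}
print(tags.get("DateTime"))   # when the photo was taken, if recorded
print(exif.get_ifd(0x8825))   # 0x8825 is the GPSInfo IFD; empty if GPS was not recorded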

You can now have AI create an image from a description; if only my DALL-E 2 account would not keep crashing, I would try it out.

A picture on its own contains only some value. But collect all this information and combine it with other sources; for example, when I used my phone and not my camera, the photos were stored using Google Photos. That company can use this information to create a timeline of where you were, when you were there, and perhaps who you were with, and combined with all the other sources the company has, such as your Google Calendar and Gmail, it can and does create a timeline much like the one you see in social media platforms such as Facebook, if you are a regular user of such a platform.

So we have not a picture, but a collection of pictures, including those not taken or owned by yourself, combined with other structured and unstructured data that can provide an improved timeline.

In comparison, in data visualization there is usually a time component for most data. Animated data visualizations, which can be awesome, usually represent data across time.

Example pictures

Let me give you some simple images as an example, and I’ll add some information that is not included in the photos, specifically what is available in today’s modern technology such as GPS location. First, my existing EOS 5D camera does not provide that information, and second, I do not enable it on my phone because I want to keep that information private, and Google does not provide a capability that would let me store personal information without sharing it for consumption by, for example, other machine learning capabilities.

Emu

I had never seen an emu roaming in the wild in Australia before this trip. It was from a conversation with a friend on a different topic that she mentioned that driving between Jerilderie and Narrandera you will find emus in the wild. Indeed we did, multiple times, on this specific highway, without having to randomly go to some isolated place hoping for the same outcome.

This is a truly unique animal, with only an ostrich looking similar.
Emu in nature, NSW Australia

The AWS Rekognition output with values >90% categorized this photo as Antelope, Animal, Mammal, Wildlife, Bird, Sheep, with parent categories of Wildlife, Mammal, Animal. Well, that is horribly wrong. An antelope has 4 legs; this clearly has 2. An emu is not a mammal.

AWS Rekognition Response – August 2022. Larger version
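
For reference, a hedged sketch of the kind of DetectLabels call that produces output like the above, via boto3; the file name and confidence threshold are placeholders.

# Hedged sketch of an AWS Rekognition DetectLabels call; file name and
# threshold are placeholders, and the output here is just label + confidence.
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

with open("emu.jpg", "rb") as image_file:
    response = rekognition.detect_labels(
        Image={"Bytes": image_file.read()},
        MinConfidence=80,
    )

for label in response["Labels"]:
    print(f'{label["Name"]}: {label["Confidence"]:.1f}%')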

That was so bad (and unexpected) I wanted to give the technology another chance with a different emu pic.
Emu in nature, NSW Australia
This time the AWS Rekognition output with values >90% described this photo with Bird, Animal, Emu, Sheep, Mammal, and at 88% Antelope and Wildlife. So if you get Emu (which has two legs), why would you say Sheep, which has 4 legs and is not a bird? And if you said Emu and Bird, why would you then select Antelope, also with 4 legs and also not a bird?

AWS Rekognition Response – August 2022. Larger version.

Feeling a bit duped by technology, I tried Google Image search next. The first image was recognized as “Tasmanian Emu”. I didn’t know there was such a thing, but it did say Emu and all other related visual matches were Emu. I was surprised it only picked 3 of the 4 animals in the first pic.

The second image was recognized as “Tasmanian emu”, “Emu” and “Common ostrich”. Doh!

Platypus

For this next image I was confident AWS Rekognition would be spot on. It’s an even more unique animal, and there is no grass or other obstacle to obscure the animal. Boy, was I wrong.

Platypus in nature, Eungella QLD Australia

The AWS Rekognition output with values >90% described this photo with Wildlife, Animal, Mammal, and at 87% Hippo. It is true that a hippo is a mammal, and you do find them in water, but?

Platypus in nature, Eungella QLD Australia
I found an even more evident picture that anybody would recognize; well, the software could not. Wildlife, Animal, Mammal. At 85%: Lizard, Reptile, and Otter. A lizard is not a mammal?

This post is starting to prove its own proposition even more than I thought.

Had I provided GPS, it would have said Eungella, QLD, and any additional searching would show that it’s a popular destination for finding a platypus in nature. In fact, a precise GPS location would literally give the name as “Platypus Deck”, here. A human would quickly articulate this by reading text from basic online searches.
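
To illustrate the GPS point, a hedged sketch using the geopy library and OpenStreetMap’s Nominatim service (my assumption, not something used in this post); the coordinates are only approximate for the area near Eungella.

# Hedged sketch: turn approximate GPS coordinates into a place name using
# geopy's Nominatim geocoder (an assumption; coordinates are approximate).
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="photo-location-example")
location = geolocator.reverse("-21.16, 148.50")  # roughly the Broken River / Eungella area, QLD
print(location.address)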

Would AWS Rekognition discount certain responses if it knew the precise location or even the country? Somehow I feel not.

For what it’s worth, a Google Image search of “platypus” yields tons of pictures AWS should use to validate and train its recognition ML model.

What this image does not say is whom I was traveling with, a local of the area, and his comment that in 20 years he had not seen platypus as easily and as playful as this. It would not describe that, on social media, many locals were surprised by the quality of the images and videos. It would also not describe what the video shows: how they dive and then burrow into the muddy water using their bill, then rise to the surface and dive again.

Trying Google Image search again, the first image yielded platypus, which it is. For the second image, Google found no results. Again, I was rather shocked, in comparison to the images of platypus a Google Image search shows, and given that this second image showed more detail IMO.

If I correctly label this image with alt text as a platypus, will Google Image search in future show it within search results, or will the output of correct recognition improve at a later time?

Other animals

At this time I decided to give up on animals. An echidna is a unique animal unlike almost any other; a quokka is also, though it does resemble a small kangaroo. I do have a hippo photo; I’ll have to try that out.

Other images

I decided to try some easier images, and I was again overall disappointed, so I decided to stop adding content here after these two images.

Surfers during summer waves in Oahu

AWS Rekognition above 90% confidence gave Sea, Ocean, Water, Nature, Outdoors, Person, Human, Surfing, Sea Waves, Sport, Sports.
Google Image search gave the first visual response as Viewing summer Swells on Oahu, which is spot on. It was Oahu, not Maui (I wonder if I have a surfer image from Maui), and more importantly it was in summer and not winter, which has much larger waves.

Finally I tried to pick one of the most uniquely observable images that you will only find in one location and this image also included readable text.
U.S.S Arizona Memorial at Pearl Harbor Hawaii

AWS Rekognition above 90% gave Flag, Symbol, Watercraft, Vessel, Vehicle, Transportation, Waterfront, Water, with Ferry and Boat at 88%. I was rather shocked it could not pull out not 1 but 2 specific and clear areas of text and use them. The visible text is “U.S.S ARIZONA MEMORIAL” and “USS ARIZONA BB 39”.

Google Image search described this image as the Pearl Harbor National Memorial which is exactly what it is (in summary).

Even a picture of the Sydney Opera House, a truly unique building, did not yield the result of Sydney via AWS Rekognition.

I look forward to this post being indexed to see if Google can give the source of the image as my site. I am also going to have to look into the Google APIs for image recognition rather than the very slow and painful web browser option.

In conclusion

To return to the question of this post: can a picture replace a text description? In short, no. A picture can summarize and quickly convey meaning simply because of how our mind processes visuals, but it cannot replace a full text description.

In these examples I would expect almost every reader to be able to summarize these images more accurately, whereas (accessible) machine learning has a very long way to go. Even with location information, only some people would be able to add additional value or better infer an initial description; however, all the information I could convey that is of applicable value is not within the picture itself. You cannot interpret additional meaning and value without more context or without intrinsic knowledge. Using the surfers example, you can find waves used by surfers all around the world. In Oahu specifically, surfable waves are infrequent and small in summer, whereas in winter apparently they are very different.

With data storytelling, a data visualization is going to provide a similar outcome. Better visualizations will contain color and legends that visually describe information more clearly. They will also offer additional insights in creative ways, such as magnifications or clear differentiators. Should photographs such as those shown here automatically contain further context beyond what is shown just in the image? Should image recognition automatically suggest titles that can be further edited by the author? Should an audio summary of the image be able to be recorded with the image? Should any single picture use the context of other pictures around locality, time, or similarities in the view to build a better picture? It is interesting to consider how technology could improve to provide greater value to the consumer.

How much text is needed is an entirely different question. Is the saying “a picture is worth a thousand words” approximately accurate?

Even a complex picture that takes months to review in all its detail does not replace a full text description. If you would like an example for comparison, check out the “L. Tellier Kitchen Poster”, a piece of art that I own.

SELECT 1

If you have worked with an RDBMS for some time, you will likely have come across the statement SELECT 1.

However, rarely is it correctly explained to engineers what the origin of SELECT 1 is, and why it’s useless and wasteful. A Google search is not going to give you the response you would hope for; the top-ranked responses are just as useless as the statement itself.

Bloat

Seeing a SELECT 1 confirms two things. First, you are using a generic ORM framework (see the quote below), and second, you have never optimized your SQL traffic patterns.

“Frameworks generally suck.
They CLAIM to improve the speed of development and abstract the need to know SQL.
The REALITY is the undocumented cost to sub-optimal performance, especially with data persistence.”

Connection Pooling

SELECT 1 comes from early implementations of connection pooling.

What is a connection pool? Rather than a new request or call getting a new database connection each time you want to return some data, programming languages implemented a cache with a pre-loaded pool of pre-established database connections. The intended goal is to remove the execution time of the initial expensive operation of getting a new database connection when you are only retrieving data with a simple SELECT statement. If intelligent enough (many are not), these pools would include features such as a low watermark, a high watermark, a pruning backoff of idle connections, and an ability to flush all connections.

When your code wanted to access the database to retrieve data, it would first ask the connection pool for an available connection, mark the connection as in-use, and provide it for subsequent consumption.
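
As a minimal sketch (not any particular framework), assuming MySQL Connector/Python as the driver, an early-style pool checkout with a SELECT 1 validation might look like this:

# Minimal sketch (no particular framework) of a pool that validates a cached
# connection with SELECT 1 before handing it out; the driver is an assumption.
import queue
import mysql.connector

class SimplePool:
    def __init__(self, size, **conn_args):
        self._conn_args = conn_args
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(mysql.connector.connect(**conn_args))

    def get(self):
        conn = self._pool.get()
        try:
            cursor = conn.cursor()
            cursor.execute("SELECT 1")   # the staleness check this post argues against
            cursor.fetchall()
            cursor.close()
        except mysql.connector.Error:
            conn = mysql.connector.connect(**self._conn_args)  # replace a dead connection
        return conn

    def put(self, conn):
        self._pool.put(conn)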

Here is a simple example of the two queries that would actually be necessary to retrieve one piece of information.

SELECT 1
SELECT email_address, phone, position, active FROM employee where employee_id = ?

Staleness

SELECT 1 was implemented as the most lightweight SQL statement (i.e., minimal parsing, privilege checking, and execution) that would validate that your connection was still active and usable. If SELECT 1 failed, i.e. the protocol communication across your network failed, the connection could be dropped from the connection pool and a new connection requested from the pool. While this may appear harmless, it leads to multiple code inefficiencies, a topic for a subsequent discussion.

Failed error handling

SELECT 1 was a lazy and flawed means to perform error handling. In reality, every single SQL statement requires adequate error handling; any statement can fail to complete at any time. In the prior example, what happens if the SELECT 1 succeeds but the simple indexed SELECT statement fails? This anti-pattern also generally shows that error handling is inconsistent and highly duplicated rather than placed at the correct position in the data access path.

By definition, error handling is needed in an abstraction function for all SQL statements, and it needs to handle all types of errors, including the connection no longer being valid, the connection being terminated, timeouts, etc.

If you had the right error handling, SELECT 1 would then be redundant and, as I stated, useless. You simply run the actual SELECT statement and handle any failure accordingly.
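
A hedged sketch of that alternative: skip the SELECT 1 round-trip and let error handling on the real statement deal with a stale connection (this assumes a pool whose get() simply hands back a cached connection without any validation query; names are placeholders).

# Hedged sketch: no SELECT 1 pre-check; run the real statement and retry once
# on a driver error. Pool and names are placeholders, driver is an assumption.
import mysql.connector

def fetch_employee(pool, employee_id, retries=1):
    for attempt in range(retries + 1):
        conn = pool.get()
        try:
            cursor = conn.cursor()
            cursor.execute(
                "SELECT email_address, phone, position, active "
                "FROM employee WHERE employee_id = %s",
                (employee_id,),
            )
            row = cursor.fetchone()
            cursor.close()
            pool.put(conn)
            return row
        except mysql.connector.Error:
            # Connection was stale or the statement failed; discard it and retry.
            if attempt == retries:
                raise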

High availability

In today’s cloud-first architectures, where high availability consists of multiple availability zones and multiple regions and application A can communicate with database B, every unneeded network round-trip in a well-tuned system is wasteful; i.e., it costs you time in rendering a result quickly. We all know studies have shown that slow page loads drive users away from your site.
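
A rough, hedged way to see that cost is to time the SELECT 1 + SELECT pattern against the single SELECT over the network; the connection details below are placeholders.

# Rough, hedged timing of one round-trip vs two; host, credentials, and table
# are placeholders. The gap grows with the network distance between A and B.
import time
import mysql.connector

conn = mysql.connector.connect(
    host="db.example.com", user="app", password="secret", database="example"
)
cursor = conn.cursor()

def timed(statements):
    start = time.perf_counter()
    for sql, params in statements:
        cursor.execute(sql, params)
        cursor.fetchall()
    return time.perf_counter() - start

query = ("SELECT email_address, phone FROM employee WHERE employee_id = %s", (1,))
single = timed([query])
double = timed([("SELECT 1", ()), query])
print(f"single statement: {single * 1000:.1f} ms, with SELECT 1 first: {double * 1000:.1f} ms")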

The cost of the cloud

This AWS Latency Monitoring grid by Matt Adorjan really shows you the impact that physics has on your resiliency testing strategy when application A and database B are geographically separated and you just want one piece of information.

Conclusion

The continued appearance of SELECT 1 reinforces that optimizing for performance is a missing skill in the much larger engineering code-writing workforce that has lost the focus on efficiency. It is also another easy win that becomes an unnecessary battle for data architects working to ensure your organization provides a better customer experience.

What does the MySQL mysqlsh util.checkForServerUpgrade() execute

During a recent Aurora MySQL 8 upgrade process, a number of validation checks failed. This is an analysis of the error message “present in INFORMATION_SCHEMA’s INNODB_SYS_TABLES table but missing from TABLES table”.

Some background

During a major upgrade from Aurora MySQL 5.7 to Aurora MySQL 8.0, the cluster instances were left in an incompatible-parameters state. The upgrade-prechecks.log shed some more light on the situation with:

{
            "id": "schemaInconsistencyCheck",
            "title": "Schema inconsistencies resulting from file removal or corruption",
            "status": "OK",
            "description": "Error: Following tables show signs that either table datadir directory or frm file was removed/corrupted. Please check server logs, examine datadir to detect the issue and fix it before upgrade",
            "detectedProblems": [
                {
                    "level": "Error",
                    "dbObject": "flinestones.fred",
                    "description": "present in INFORMATION_SCHEMA's INNODB_SYS_TABLES table but missing from TABLES table"
                }
            ]
        }, 

For anonymity, the troublesome table here is played by flinstones.fred.

This error could be reproduced more quickly with the util.checkForServerUpgrade() check, which saves the path of creating a snapshot of your cluster, restoring a new cluster from the snapshot, and then launching a cluster instance.

18) Schema inconsistencies resulting from file removal or corruption
  Error: Following tables show signs that either table datadir directory or frm
    file was removed/corrupted. Please check server logs, examine datadir to
    detect the issue and fix it before upgrade

  mysql.rds_heartbeat2 - present in INFORMATION_SCHEMA's INNODB_SYS_TABLES
    table but missing from TABLES table
  flinstones.fred -
    present in INFORMATION_SCHEMA's INNODB_SYS_TABLES table but missing from
    TABLES table 

As I am using the MySQL community mysqlsh tool with a managed AWS RDS MySQL cluster, I have discounted any RDS-specific messages.

Back to investigating the cause. Some basic spot checks within the Cluster confirmed this mismatch.

mysql > desc flinstones.fred;
ERROR 1146 (42S02): Table 'flinstones.fred' doesn't exist

mysql > select * from information_schema.innodb_sys_tables where name = 'flinstones/fred';

*results*
(1 row)

A closer inspection of the Aurora MySQL error log reiterated that there was some issue.

$ aws rds download-db-log-file-portion --db-instance-identifier ${INSTANCE_ID} --log-file-name error/mysql-error-running.log --output text

... 
[Warning] InnoDB: Tablespace 'flinstones/fred' exists in the cache with id 5233285 != 4954605
...

What is this check

It is easy enough to look at the SQL behind this check using open-source software: you go to the source at https://github.com/mysql/mysql-shell and look at the SQL in upgrade_check.cc. As the message is near identical to what AWS provides, I am making an educated assumption that the check is the same.

// clang-format off
std::unique_ptr<Sql_upgrade_check>
Sql_upgrade_check::get_schema_inconsistency_check() {
  return std::make_unique<Sql_upgrade_check>(
      "schemaInconsistencyCheck",
      "Schema inconsistencies resulting from file removal or corruption",
      std::vector<std::string>{
       "select A.schema_name, A.table_name, 'present in INFORMATION_SCHEMA''s "
       "INNODB_SYS_TABLES table but missing from TABLES table' from (select "
       "distinct "
       replace_in_SQL("substring_index(NAME, '/',1)")
       " as schema_name, "
       replace_in_SQL("substring_index(substring_index(NAME, '/',-1),'#',1)")
       " as table_name from "
       "information_schema.innodb_sys_tables where NAME like '%/%') A left "
       "join information_schema.tables I on A.table_name = I.table_name and "
       "A.schema_name = I.table_schema where A.table_name not like 'FTS_0%' "
       "and (I.table_name IS NULL or I.table_schema IS NULL) and A.table_name "
       "not REGEXP '@[0-9]' and A.schema_name not REGEXP '@[0-9]';"},
      Upgrade_issue::ERROR,
      "Following tables show signs that either table datadir directory or frm "
      "file was removed/corrupted. Please check server logs, examine datadir "
      "to detect the issue and fix it before upgrade");
}

OK, that’s a little more difficult to read than plain text, and if I wanted to review other SQL statements this could become tedious.

Gather the SQL statements executed by util.checkForServerUpgrade()

Let’s use a more straightforward means of capturing SQL statements, the MySQL general log.

# Generate a throwaway root password and start a MySQL 5.7 container with the general log enabled
MYSQL_PASSWD=$(date | md5sum - | cut -c1-20)

docker network create -d bridge mynetwork
docker run --name mysql57 -e MYSQL_ROOT_PASSWORD="${MYSQL_PASSWD}" -d mysql:5.7
docker network connect mynetwork mysql57
docker inspect mysql57 | grep "IPAddress"
IP=$(docker inspect mysql57 | grep '"IPAddress":' | head -1 | cut -d'"' -f4)
docker exec -it mysql57 mysql -uroot -p${MYSQL_PASSWD} -e "SET GLOBAL general_log=1"
docker exec -it mysql57 mysql -uroot -p${MYSQL_PASSWD} -e "SHOW GLOBAL VARIABLES LIKE 'general_log_file'"
GENERAL_LOG_FILE=$(docker exec -it mysql57 mysql -uroot -p${MYSQL_PASSWD} -e "SHOW GLOBAL VARIABLES LIKE 'general_log_file'" | grep general_log_file | cut -d'|' -f3)

# Run util.checkForServerUpgrade() from a MySQL 8.0 container against the 5.7 server
docker run --name mysql8 -e "MYSQL_ALLOW_EMPTY_PASSWORD=yes" -d mysql/mysql-server
docker exec -it mysql8 mysqlsh -h${IP} -uroot -p${MYSQL_PASSWD} --js -- util checkForServerUpgrade | tee check.txt

# Extract the SQL statements the check executed from the general log
docker exec -it mysql57 grep Query ${GENERAL_LOG_FILE} | cut -c41- | tee check.sql

# Cleanup
docker stop mysql8 && docker rm mysql8
docker stop mysql57 && docker rm mysql57
docker network rm mynetwork

And we are left with the output of util.checkForServerUpgrade() and the SQL of all the checks, including the said statement:

check.sql

SET NAMES 'utf8mb4'
select current_user()
SELECT PRIVILEGE_TYPE, IS_GRANTABLE FROM INFORMATION_SCHEMA.USER_PRIVILEGES WHERE GRANTEE = '\'root\'@\'%\''
SELECT PRIVILEGE_TYPE, IS_GRANTABLE, TABLE_SCHEMA FROM INFORMATION_SCHEMA.SCHEMA_PRIVILEGES WHERE GRANTEE = '\'root\'@\'%\'' ORDER BY TABLE_SCHEMA
SELECT PRIVILEGE_TYPE, IS_GRANTABLE, TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLE_PRIVILEGES WHERE GRANTEE = '\'root\'@\'%\'' ORDER BY TABLE_SCHEMA, TABLE_NAME
select @@version, @@version_comment, UPPER(@@version_compile_os)
SET show_old_temporals = ON
SELECT table_schema, table_name,column_name,column_type FROM information_schema.columns WHERE column_type LIKE 'timestamp /* 5.5 binary format */'
SET show_old_temporals = OFF
select SCHEMA_NAME, 'Schema name' as WARNING from INFORMATION_SCHEMA.SCHEMATA where SCHEMA_NAME in ('ADMIN', 'CUBE', 'CUME_DIST', 'DENSE_RANK', 'EMPTY', 'EXCEPT', 'FIRST_VALUE', 'FUNCTION', 'GROUPING', 'GROUPS', 'JSON_TABLE', 'LAG', 'LAST_VALUE', 'LEAD', 'NTH_VALUE', 'NTILE', 'OF', 'OVER', 'PERCENT_RANK', 'PERSIST', 'PERSIST_ONLY', 'RANK', 'RECURSIVE', 'ROW', 'ROWS', 'ROW_NUMBER', 'SYSTEM', 'WINDOW', 'LATERAL', 'ARRAY' ,'MEMBER' )
SELECT TABLE_SCHEMA, TABLE_NAME, 'Table name' as WARNING FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE != 'VIEW' and TABLE_NAME in ('ADMIN', 'CUBE', 'CUME_DIST', 'DENSE_RANK', 'EMPTY', 'EXCEPT', 'FIRST_VALUE', 'FUNCTION', 'GROUPING', 'GROUPS', 'JSON_TABLE', 'LAG', 'LAST_VALUE', 'LEAD', 'NTH_VALUE', 'NTILE', 'OF', 'OVER', 'PERCENT_RANK', 'PERSIST', 'PERSIST_ONLY', 'RANK', 'RECURSIVE', 'ROW', 'ROWS', 'ROW_NUMBER', 'SYSTEM', 'WINDOW', 'LATERAL', 'ARRAY' ,'MEMBER' )
select TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, 'Column name' as WARNING FROM information_schema.columns WHERE TABLE_SCHEMA not in ('information_schema', 'performance_schema') and COLUMN_NAME in ('ADMIN', 'CUBE', 'CUME_DIST', 'DENSE_RANK', 'EMPTY', 'EXCEPT', 'FIRST_VALUE', 'FUNCTION', 'GROUPING', 'GROUPS', 'JSON_TABLE', 'LAG', 'LAST_VALUE', 'LEAD', 'NTH_VALUE', 'NTILE', 'OF', 'OVER', 'PERCENT_RANK', 'PERSIST', 'PERSIST_ONLY', 'RANK', 'RECURSIVE', 'ROW', 'ROWS', 'ROW_NUMBER', 'SYSTEM', 'WINDOW', 'LATERAL', 'ARRAY' ,'MEMBER' )
SELECT TRIGGER_SCHEMA, TRIGGER_NAME, 'Trigger name' as WARNING FROM INFORMATION_SCHEMA.TRIGGERS WHERE TRIGGER_NAME in ('ADMIN', 'CUBE', 'CUME_DIST', 'DENSE_RANK', 'EMPTY', 'EXCEPT', 'FIRST_VALUE', 'FUNCTION', 'GROUPING', 'GROUPS', 'JSON_TABLE', 'LAG', 'LAST_VALUE', 'LEAD', 'NTH_VALUE', 'NTILE', 'OF', 'OVER', 'PERCENT_RANK', 'PERSIST', 'PERSIST_ONLY', 'RANK', 'RECURSIVE', 'ROW', 'ROWS', 'ROW_NUMBER', 'SYSTEM', 'WINDOW', 'LATERAL', 'ARRAY' ,'MEMBER' )
SELECT TABLE_SCHEMA, TABLE_NAME, 'View name' as WARNING FROM INFORMATION_SCHEMA.VIEWS WHERE TABLE_NAME in ('ADMIN', 'CUBE', 'CUME_DIST', 'DENSE_RANK', 'EMPTY', 'EXCEPT', 'FIRST_VALUE', 'FUNCTION', 'GROUPING', 'GROUPS', 'JSON_TABLE', 'LAG', 'LAST_VALUE', 'LEAD', 'NTH_VALUE', 'NTILE', 'OF', 'OVER', 'PERCENT_RANK', 'PERSIST', 'PERSIST_ONLY', 'RANK', 'RECURSIVE', 'ROW', 'ROWS', 'ROW_NUMBER', 'SYSTEM', 'WINDOW', 'LATERAL', 'ARRAY' ,'MEMBER' )
SELECT ROUTINE_SCHEMA, ROUTINE_NAME, 'Routine name' as WARNING FROM INFORMATION_SCHEMA.ROUTINES WHERE ROUTINE_NAME in ('ADMIN', 'CUBE', 'CUME_DIST', 'DENSE_RANK', 'EMPTY', 'EXCEPT', 'FIRST_VALUE', 'FUNCTION', 'GROUPING', 'GROUPS', 'JSON_TABLE', 'LAG', 'LAST_VALUE', 'LEAD', 'NTH_VALUE', 'NTILE', 'OF', 'OVER', 'PERCENT_RANK', 'PERSIST', 'PERSIST_ONLY', 'RANK', 'RECURSIVE', 'ROW', 'ROWS', 'ROW_NUMBER', 'SYSTEM', 'WINDOW', 'LATERAL', 'ARRAY' ,'MEMBER' )
SELECT EVENT_SCHEMA, EVENT_NAME, 'Event name' as WARNING FROM INFORMATION_SCHEMA.EVENTS WHERE EVENT_NAME in ('ADMIN', 'CUBE', 'CUME_DIST', 'DENSE_RANK', 'EMPTY', 'EXCEPT', 'FIRST_VALUE', 'FUNCTION', 'GROUPING', 'GROUPS', 'JSON_TABLE', 'LAG', 'LAST_VALUE', 'LEAD', 'NTH_VALUE', 'NTILE', 'OF', 'OVER', 'PERCENT_RANK', 'PERSIST', 'PERSIST_ONLY', 'RANK', 'RECURSIVE', 'ROW', 'ROWS', 'ROW_NUMBER', 'SYSTEM', 'WINDOW', 'LATERAL', 'ARRAY' ,'MEMBER' )
select SCHEMA_NAME, concat('schema''s default character set: ',  DEFAULT_CHARACTER_SET_NAME) from INFORMATION_SCHEMA.schemata where SCHEMA_NAME not in ('information_schema', 'performance_schema', 'sys') and DEFAULT_CHARACTER_SET_NAME in ('utf8', 'utf8mb3')
select TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, concat('column''s default character set: ',CHARACTER_SET_NAME) from information_schema.columns where CHARACTER_SET_NAME in ('utf8', 'utf8mb3') and TABLE_SCHEMA not in ('sys', 'performance_schema', 'information_schema', 'mysql')
SELECT TABLE_SCHEMA, TABLE_NAME, 'Table name used in mysql schema in 8.0' as WARNING FROM INFORMATION_SCHEMA.TABLES WHERE LOWER(TABLE_SCHEMA) = 'mysql' and LOWER(TABLE_NAME) IN ('catalogs', 'character_sets', 'collations', 'column_type_elements', 'columns', 'dd_properties', 'events', 'foreign_key_column_usage', 'foreign_keys', 'index_column_usage', 'index_partitions', 'index_stats', 'indexes', 'parameter_type_elements', 'parameters', 'routines', 'schemata', 'st_spatial_reference_systems', 'table_partition_values', 'table_partitions', 'table_stats', 'tables', 'tablespace_files', 'tablespaces', 'triggers', 'view_routine_usage', 'view_table_usage', 'component', 'default_roles', 'global_grants', 'innodb_ddl_log', 'innodb_dynamic_metadata', 'password_history', 'role_edges')
select table_schema, table_name, concat(engine, ' engine does not support native partitioning') from information_schema.Tables where create_options like '%partitioned%' and upper(engine) not in ('INNODB', 'NDB', 'NDBCLUSTER')
select table_schema, table_name, 'Foreign key longer than 64 characters' as description from information_schema.tables where table_name in (select left(substr(id,instr(id,'/')+1), instr(substr(id,instr(id,'/')+1),'_ibfk_')-1) from information_schema.innodb_sys_foreign where length(substr(id,instr(id,'/')+1))>64)
select routine_schema, routine_name, concat(routine_type, ' uses obsolete MAXDB sql_mode') from information_schema.routines where find_in_set('MAXDB', sql_mode)
select event_schema, event_name, 'EVENT uses obsolete MAXDB sql_mode' from information_schema.EVENTS where find_in_set('MAXDB', sql_mode)
select trigger_schema, trigger_name, 'TRIGGER uses obsolete MAXDB sql_mode' from information_schema.TRIGGERS where find_in_set('MAXDB', sql_mode)
select concat('global system variable ', variable_name), 'defined using obsolete MAXDB option' as reason from performance_schema.global_variables where variable_name = 'sql_mode' and find_in_set('MAXDB', variable_value)
select routine_schema, routine_name, concat(routine_type, ' uses obsolete DB2 sql_mode') from information_schema.routines where find_in_set('DB2', sql_mode)
select event_schema, event_name, 'EVENT uses obsolete DB2 sql_mode' from information_schema.EVENTS where find_in_set('DB2', sql_mode)
select trigger_schema, trigger_name, 'TRIGGER uses obsolete DB2 sql_mode' from information_schema.TRIGGERS where find_in_set('DB2', sql_mode)
select concat('global system variable ', variable_name), 'defined using obsolete DB2 option' as reason from performance_schema.global_variables where variable_name = 'sql_mode' and find_in_set('DB2', variable_value)
select routine_schema, routine_name, concat(routine_type, ' uses obsolete MSSQL sql_mode') from information_schema.routines where find_in_set('MSSQL', sql_mode)
select event_schema, event_name, 'EVENT uses obsolete MSSQL sql_mode' from information_schema.EVENTS where find_in_set('MSSQL', sql_mode)
select trigger_schema, trigger_name, 'TRIGGER uses obsolete MSSQL sql_mode' from information_schema.TRIGGERS where find_in_set('MSSQL', sql_mode)
select concat('global system variable ', variable_name), 'defined using obsolete MSSQL option' as reason from performance_schema.global_variables where variable_name = 'sql_mode' and find_in_set('MSSQL', variable_value)
select routine_schema, routine_name, concat(routine_type, ' uses obsolete MYSQL323 sql_mode') from information_schema.routines where find_in_set('MYSQL323', sql_mode)
select event_schema, event_name, 'EVENT uses obsolete MYSQL323 sql_mode' from information_schema.EVENTS where find_in_set('MYSQL323', sql_mode)
select trigger_schema, trigger_name, 'TRIGGER uses obsolete MYSQL323 sql_mode' from information_schema.TRIGGERS where find_in_set('MYSQL323', sql_mode)
select concat('global system variable ', variable_name), 'defined using obsolete MYSQL323 option' as reason from performance_schema.global_variables where variable_name = 'sql_mode' and find_in_set('MYSQL323', variable_value)
select routine_schema, routine_name, concat(routine_type, ' uses obsolete MYSQL40 sql_mode') from information_schema.routines where find_in_set('MYSQL40', sql_mode)
select event_schema, event_name, 'EVENT uses obsolete MYSQL40 sql_mode' from information_schema.EVENTS where find_in_set('MYSQL40', sql_mode)
select trigger_schema, trigger_name, 'TRIGGER uses obsolete MYSQL40 sql_mode' from information_schema.TRIGGERS where find_in_set('MYSQL40', sql_mode)
select concat('global system variable ', variable_name), 'defined using obsolete MYSQL40 option' as reason from performance_schema.global_variables where variable_name = 'sql_mode' and find_in_set('MYSQL40', variable_value)
select routine_schema, routine_name, concat(routine_type, ' uses obsolete NO_AUTO_CREATE_USER sql_mode') from information_schema.routines where find_in_set('NO_AUTO_CREATE_USER', sql_mode)
select event_schema, event_name, 'EVENT uses obsolete NO_AUTO_CREATE_USER sql_mode' from information_schema.EVENTS where find_in_set('NO_AUTO_CREATE_USER', sql_mode)
select trigger_schema, trigger_name, 'TRIGGER uses obsolete NO_AUTO_CREATE_USER sql_mode' from information_schema.TRIGGERS where find_in_set('NO_AUTO_CREATE_USER', sql_mode)
select concat('global system variable ', variable_name), 'defined using obsolete NO_AUTO_CREATE_USER option' as reason from performance_schema.global_variables where variable_name = 'sql_mode' and find_in_set('NO_AUTO_CREATE_USER', variable_value)
select routine_schema, routine_name, concat(routine_type, ' uses obsolete NO_FIELD_OPTIONS sql_mode') from information_schema.routines where find_in_set('NO_FIELD_OPTIONS', sql_mode)
select event_schema, event_name, 'EVENT uses obsolete NO_FIELD_OPTIONS sql_mode' from information_schema.EVENTS where find_in_set('NO_FIELD_OPTIONS', sql_mode)
select trigger_schema, trigger_name, 'TRIGGER uses obsolete NO_FIELD_OPTIONS sql_mode' from information_schema.TRIGGERS where find_in_set('NO_FIELD_OPTIONS', sql_mode)
select concat('global system variable ', variable_name), 'defined using obsolete NO_FIELD_OPTIONS option' as reason from performance_schema.global_variables where variable_name = 'sql_mode' and find_in_set('NO_FIELD_OPTIONS', variable_value)
select routine_schema, routine_name, concat(routine_type, ' uses obsolete NO_KEY_OPTIONS sql_mode') from information_schema.routines where find_in_set('NO_KEY_OPTIONS', sql_mode)
select event_schema, event_name, 'EVENT uses obsolete NO_KEY_OPTIONS sql_mode' from information_schema.EVENTS where find_in_set('NO_KEY_OPTIONS', sql_mode)
select trigger_schema, trigger_name, 'TRIGGER uses obsolete NO_KEY_OPTIONS sql_mode' from information_schema.TRIGGERS where find_in_set('NO_KEY_OPTIONS', sql_mode)
select concat('global system variable ', variable_name), 'defined using obsolete NO_KEY_OPTIONS option' as reason from performance_schema.global_variables where variable_name = 'sql_mode' and find_in_set('NO_KEY_OPTIONS', variable_value)
select routine_schema, routine_name, concat(routine_type, ' uses obsolete NO_TABLE_OPTIONS sql_mode') from information_schema.routines where find_in_set('NO_TABLE_OPTIONS', sql_mode)
select event_schema, event_name, 'EVENT uses obsolete NO_TABLE_OPTIONS sql_mode' from information_schema.EVENTS where find_in_set('NO_TABLE_OPTIONS', sql_mode)
select trigger_schema, trigger_name, 'TRIGGER uses obsolete NO_TABLE_OPTIONS sql_mode' from information_schema.TRIGGERS where find_in_set('NO_TABLE_OPTIONS', sql_mode)
select concat('global system variable ', variable_name), 'defined using obsolete NO_TABLE_OPTIONS option' as reason from performance_schema.global_variables where variable_name = 'sql_mode' and find_in_set('NO_TABLE_OPTIONS', variable_value)
select routine_schema, routine_name, concat(routine_type, ' uses obsolete ORACLE sql_mode') from information_schema.routines where find_in_set('ORACLE', sql_mode)
select event_schema, event_name, 'EVENT uses obsolete ORACLE sql_mode' from information_schema.EVENTS where find_in_set('ORACLE', sql_mode)
select trigger_schema, trigger_name, 'TRIGGER uses obsolete ORACLE sql_mode' from information_schema.TRIGGERS where find_in_set('ORACLE', sql_mode)
select concat('global system variable ', variable_name), 'defined using obsolete ORACLE option' as reason from performance_schema.global_variables where variable_name = 'sql_mode' and find_in_set('ORACLE', variable_value)
select routine_schema, routine_name, concat(routine_type, ' uses obsolete POSTGRESQL sql_mode') from information_schema.routines where find_in_set('POSTGRESQL', sql_mode)
select event_schema, event_name, 'EVENT uses obsolete POSTGRESQL sql_mode' from information_schema.EVENTS where find_in_set('POSTGRESQL', sql_mode)
select trigger_schema, trigger_name, 'TRIGGER uses obsolete POSTGRESQL sql_mode' from information_schema.TRIGGERS where find_in_set('POSTGRESQL', sql_mode)
select concat('global system variable ', variable_name), 'defined using obsolete POSTGRESQL option' as reason from performance_schema.global_variables where variable_name = 'sql_mode' and find_in_set('POSTGRESQL', variable_value)
select TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, UPPER(DATA_TYPE), COLUMN_TYPE, CHARACTER_MAXIMUM_LENGTH from information_schema.columns where data_type in ('enum','set') and CHARACTER_MAXIMUM_LENGTH > 255 and table_schema not in ('information_schema')
SELECT TABLE_SCHEMA, TABLE_NAME, concat('Partition ', PARTITION_NAME, ' is in shared tablespace ', TABLESPACE_NAME) as description FROM information_schema.PARTITIONS WHERE PARTITION_NAME IS NOT NULL AND (TABLESPACE_NAME IS NOT NULL AND TABLESPACE_NAME!='innodb_file_per_table')
SELECT tablespace_name, concat('circular reference in datafile path: \'', file_name, '\'') FROM INFORMATION_SCHEMA.FILES where file_type='TABLESPACE' and (file_name rlike '[^\\.]/\\.\\./' or file_name rlike '[^\\.]\\\\\\.\\.\\\\')
select table_schema, table_name, '', 'VIEW', UPPER(view_definition) from information_schema.views where table_schema not in ('performance_schema','information_schema','sys','mysql')
select routine_schema, routine_name, '', routine_type, UPPER(routine_definition) from information_schema.routines where routine_schema not in ('performance_schema','information_schema','sys','mysql')
select TABLE_SCHEMA,TABLE_NAME,COLUMN_NAME, 'COLUMN', UPPER(GENERATION_EXPRESSION) from information_schema.columns where extra regexp 'generated' and table_schema not in ('performance_schema','information_schema','sys','mysql')
select TRIGGER_SCHEMA, TRIGGER_NAME, '', 'TRIGGER', UPPER(ACTION_STATEMENT) from information_schema.triggers where TRIGGER_SCHEMA not in ('performance_schema','information_schema','sys','mysql')
select event_schema, event_name, '', 'EVENT', UPPER(EVENT_DEFINITION) from information_schema.events where event_schema not in ('performance_schema','information_schema','sys','mysql')
select table_schema, table_name, 'VIEW', UPPER(view_definition) from information_schema.views where table_schema not in ('performance_schema','information_schema','sys','mysql') and (UPPER(view_definition) like '%ASC%' or UPPER(view_definition) like '%DESC%')
select routine_schema, routine_name, routine_type, UPPER(routine_definition) from information_schema.routines where routine_schema not in ('performance_schema','information_schema','sys','mysql') and (UPPER(routine_definition) like '%ASC%' or UPPER(routine_definition) like '%DESC%')
select TRIGGER_SCHEMA, TRIGGER_NAME, 'TRIGGER', UPPER(ACTION_STATEMENT) from information_schema.triggers where TRIGGER_SCHEMA not in ('performance_schema','information_schema','sys','mysql') and (UPPER(ACTION_STATEMENT) like '%ASC%' or UPPER(ACTION_STATEMENT) like '%DESC%')
select event_schema, event_name, 'EVENT', UPPER(EVENT_DEFINITION) from information_schema.events where event_schema not in ('performance_schema','information_schema','sys','mysql') and (UPPER(event_definition) like '%ASC%' or UPPER(event_definition) like '%DESC%')
select 'global.sql_mode', 'does not contain either NO_ZERO_DATE or NO_ZERO_IN_DATE which allows insertion of zero dates' from (SELECT @@global.sql_mode like '%NO_ZERO_IN_DATE%' and @@global.sql_mode like '%NO_ZERO_DATE%' as zeroes_enabled) as q where q.zeroes_enabled = 0
select 'session.sql_mode', concat(' of ', q.thread_count, ' session(s) does not contain either NO_ZERO_DATE or NO_ZERO_IN_DATE which allows insertion of zero dates') FROM (select count(thread_id) as thread_count from performance_schema.variables_by_thread WHERE variable_name = 'sql_mode' and (variable_value not like '%NO_ZERO_IN_DATE%' or variable_value not like '%NO_ZERO_DATE%')) as q where q.thread_count > 0
select TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, concat('column has zero default value: ', COLUMN_DEFAULT) from information_schema.columns where TABLE_SCHEMA not in ('performance_schema','information_schema','sys','mysql') and DATA_TYPE in ('timestamp', 'datetime', 'date') and COLUMN_DEFAULT like '0000-00-00%'
select A.schema_name, A.table_name, 'present in INFORMATION_SCHEMA''s INNODB_SYS_TABLES table but missing from TABLES table' from (select distinct replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(substring_index(NAME, '/',1), '@002d', '-'), '@003a', ':'), '@002e', '.'), '@0024', '$'), '@0021', '!'), '@003f', '?'), '@0025', '%'), '@0023', '#'), '@0026', '&'), '@002a', '*'), '@0040', '@')  as schema_name, replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(substring_index(substring_index(NAME, '/',-1),'#',1), '@002d', '-'), '@003a', ':'), '@002e', '.'), '@0024', '$'), '@0021', '!'), '@003f', '?'), '@0025', '%'), '@0023', '#'), '@0026', '&'), '@002a', '*'), '@0040', '@')  as table_name from information_schema.innodb_sys_tables where NAME like '%/%') A left join information_schema.tables I on A.table_name = I.table_name and A.schema_name = I.table_schema where A.table_name not like 'FTS_0%' and (I.table_name IS NULL or I.table_schema IS NULL) and A.table_name not REGEXP '@[0-9]' and A.schema_name not REGEXP '@[0-9]'
select a.table_schema, a.table_name, concat('recognized by the InnoDB engine but belongs to ', a.engine) from information_schema.tables a join (select replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(substring_index(NAME, '/',1), '@002d', '-'), '@003a', ':'), '@002e', '.'), '@0024', '$'), '@0021', '!'), '@003f', '?'), '@0025', '%'), '@0023', '#'), '@0026', '&'), '@002a', '*'), '@0040', '@')  as table_schema, replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(substring_index(substring_index(NAME, '/',-1),'#',1), '@002d', '-'), '@003a', ':'), '@002e', '.'), '@0024', '$'), '@0021', '!'), '@003f', '?'), '@0025', '%'), '@0023', '#'), '@0026', '&'), '@002a', '*'), '@0040', '@')  as table_name from information_schema.innodb_sys_tables where NAME like '%/%') b on a.table_schema = b.table_schema and a.table_name = b.table_name where a.engine != 'Innodb'
FLUSH LOCAL TABLES
SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA not in ('information_schema', 'performance_schema', 'sys')
CHECK TABLE `mysql`.`columns_priv` FOR UPGRADE
CHECK TABLE `mysql`.`db` FOR UPGRADE
CHECK TABLE `mysql`.`engine_cost` FOR UPGRADE
CHECK TABLE `mysql`.`event` FOR UPGRADE
CHECK TABLE `mysql`.`func` FOR UPGRADE
CHECK TABLE `mysql`.`general_log` FOR UPGRADE
CHECK TABLE `mysql`.`gtid_executed` FOR UPGRADE
CHECK TABLE `mysql`.`help_category` FOR UPGRADE
CHECK TABLE `mysql`.`help_keyword` FOR UPGRADE
CHECK TABLE `mysql`.`help_relation` FOR UPGRADE
CHECK TABLE `mysql`.`help_topic` FOR UPGRADE
CHECK TABLE `mysql`.`innodb_index_stats` FOR UPGRADE
CHECK TABLE `mysql`.`innodb_table_stats` FOR UPGRADE
CHECK TABLE `mysql`.`ndb_binlog_index` FOR UPGRADE
CHECK TABLE `mysql`.`plugin` FOR UPGRADE
CHECK TABLE `mysql`.`proc` FOR UPGRADE
CHECK TABLE `mysql`.`procs_priv` FOR UPGRADE
CHECK TABLE `mysql`.`proxies_priv` FOR UPGRADE
CHECK TABLE `mysql`.`server_cost` FOR UPGRADE
CHECK TABLE `mysql`.`servers` FOR UPGRADE
CHECK TABLE `mysql`.`slave_master_info` FOR UPGRADE
CHECK TABLE `mysql`.`slave_relay_log_info` FOR UPGRADE
CHECK TABLE `mysql`.`slave_worker_info` FOR UPGRADE
CHECK TABLE `mysql`.`slow_log` FOR UPGRADE
CHECK TABLE `mysql`.`tables_priv` FOR UPGRADE
CHECK TABLE `mysql`.`time_zone` FOR UPGRADE
CHECK TABLE `mysql`.`time_zone_leap_second` FOR UPGRADE
CHECK TABLE `mysql`.`time_zone_name` FOR UPGRADE
CHECK TABLE `mysql`.`time_zone_transition` FOR UPGRADE
CHECK TABLE `mysql`.`time_zone_transition_type` FOR UPGRADE
CHECK TABLE `mysql`.`user` FOR UPGRADE

check.txt

Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
WARNING: Using a password on the command line interface can be insecure.
The MySQL server at 172.17.0.3:3306, version 5.7.33 - MySQL Community Server
(GPL), will now be checked for compatibility issues for upgrade to MySQL
8.0.24...

1) Usage of old temporal type
  No issues found

2) Usage of db objects with names conflicting with new reserved keywords
  No issues found

3) Usage of utf8mb3 charset
  No issues found

4) Table names in the mysql schema conflicting with new tables in 8.0
  No issues found

5) Partitioned tables using engines with non native partitioning
  No issues found

6) Foreign key constraint names longer than 64 characters
  No issues found

7) Usage of obsolete MAXDB sql_mode flag
  No issues found

8) Usage of obsolete sql_mode flags
  Notice: The following DB objects have obsolete options persisted for
    sql_mode, which will be cleared during upgrade to 8.0.
  More information:

https://dev.mysql.com/doc/refman/8.0/en/mysql-nutshell.html#mysql-nutshell-removals

  global system variable sql_mode - defined using obsolete NO_AUTO_CREATE_USER
    option

9) ENUM/SET column definitions containing elements longer than 255 characters
  No issues found

10) Usage of partitioned tables in shared tablespaces
  No issues found

11) Circular directory references in tablespace data file paths
  No issues found

12) Usage of removed functions
  No issues found

13) Usage of removed GROUP BY ASC/DESC syntax
  No issues found

14) Removed system variables for error logging to the system log configuration
  To run this check requires full path to MySQL server configuration file to be specified at 'configPath' key of options dictionary
  More information:

https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-13.html#mysqld-8-0-13-logging

15) Removed system variables
  To run this check requires full path to MySQL server configuration file to be specified at 'configPath' key of options dictionary
  More information:

https://dev.mysql.com/doc/refman/8.0/en/added-deprecated-removed.html#optvars-removed

16) System variables with new default values
  To run this check requires full path to MySQL server configuration file to be specified at 'configPath' key of options dictionary
  More information:

https://mysqlserverteam.com/new-defaults-in-mysql-8-0/

17) Zero Date, Datetime, and Timestamp values
  No issues found

18) Schema inconsistencies resulting from file removal or corruption
  No issues found

19) Tables recognized by InnoDB that belong to a different engine
  No issues found

20) Issues reported by 'check table x for upgrade' command
  No issues found

21) New default authentication plugin considerations
  Warning: The new default authentication plugin 'caching_sha2_password' offers
    more secure password hashing than previously used 'mysql_native_password'
    (and consequent improved client connection authentication). However, it also
    has compatibility implications that may affect existing MySQL installations.
    If your MySQL installation must serve pre-8.0 clients and you encounter
    compatibility issues after upgrading, the simplest way to address those
    issues is to reconfigure the server to revert to the previous default
    authentication plugin (mysql_native_password). For example, use these lines
    in the server option file:

    [mysqld]
    default_authentication_plugin=mysql_native_password

    However, the setting should be viewed as temporary, not as a long term or
    permanent solution, because it causes new accounts created with the setting
    in effect to forego the improved authentication security.
    If you are using replication please take time to understand how the
    authentication plugin changes may impact you.
  More information:

https://dev.mysql.com/doc/refman/8.0/en/upgrading-from-previous-series.html#upgrade-caching-sha2-password-compatibility-issues


https://dev.mysql.com/doc/refman/8.0/en/upgrading-from-previous-series.html#upgrade-caching-sha2-password-replication

Errors:   0
Warnings: 1
Notices:  1

No fatal errors were found that would prevent an upgrade, but some potential issues were detected. Please ensure that the reported issues are not significant before upgrading.

The pre-pre SQL check

I am now armed with a simplified single SQL statement. It does, of course, take a long time to run in a cluster with thousands of tables.

select A.schema_name, A.table_name, 
       'present in INFORMATION_SCHEMA''s INNODB_SYS_TABLES table but missing from TABLES table' 
from (select distinct replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(substring_index(NAME, '/',1), '@002d', '-'), '@003a', ':'), '@002e', '.'), '@0024', '$'), '@0021', '!'), '@003f', '?'), '@0025', '%'), '@0023', '#'), '@0026', '&'), '@002a', '*'), '@0040', '@')  as schema_name, 
replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(substring_index(substring_index(NAME, '/',-1),'#',1), '@002d', '-'), '@003a', ':'), '@002e', '.'), '@0024', '$'), '@0021', '!'), '@003f', '?'), '@0025', '%'), '@0023', '#'), '@0026', '&'), '@002a', '*'), '@0040', '@')  as table_name
 from information_schema.innodb_sys_tables 
where NAME like '%/%') A 
left join information_schema.tables I on A.table_name = I.table_name and A.schema_name = I.table_schema 
where A.table_name not like 'FTS_0%' 
and (I.table_name IS NULL or I.table_schema IS NULL) 
and A.table_name not REGEXP '@[0-9]' 
and A.schema_name not REGEXP '@[0-9]'

I then performed a number of drop/remove/restart/re-create/discard tablespace steps with no success. As RDS is a managed service, the only course of action now is to open an AWS Support ticket for help with this specific internal corruption.

Upgrading to AWS Aurora MySQL 8

With Aurora MySQL 8 now generally available to all, you may want to plan an upgrade path if you would like to take advantage of the new features for your application, for example Common Table Expressions (CTEs). This new major release has a much improved and streamlined upgrade process from Aurora MySQL 5.7.

This tutorial provides all the steps to let you try out setting up an Aurora cluster and performing an upgrade without impacting your existing AWS environment. The two pre-requisites to getting started are:

You can find all the CLI cut/paste commands in my AWS Tutorials repo. This will lead you through all of the various AWS dependencies for a successful RDS Aurora cluster including IAM, KMS, VPC and EC2 requirements.

Create an RDS Aurora MySQL Cluster and Aurora MySQL Major upgrade – Aurora 2.x to Aurora 3.x can provide you with a POC of the primary operations path to achieving the goal of this post in under 30 minutes.

While this example will produce an upgraded cluster with some warnings, in real life a more detailed upgrade assessment is needed for any new version of software. The MySQL and Aurora pre-checks can be performed to minimize surprises during the final process of your data migration.

The mysqlcheck --check-upgrade and mysqlsh util.checkForServerUpgrade() pre-checks can help you be prepared and avoid ending up with cluster instances in the incompatible-parameters status. If you do reach that point, download the upgrade-prechecks.log Aurora log and then trash your cluster and instance; they are unusable. The upgrade-prechecks.log can contain more information than the mysqlsh util.checkForServerUpgrade() output.
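
As a minimal sketch (the endpoint and user are placeholders, not from my actual environment), both pre-checks can be run against the Aurora MySQL 5.7 cluster endpoint before attempting the upgrade:

# Client-side check of all schemas for upgrade blockers
mysqlcheck --check-upgrade --all-databases -h aurora2.cluster-xxxx.us-east-1.rds.amazonaws.com -u admin -p

# MySQL Shell upgrade checker (JavaScript mode is the default)
mysqlsh admin@aurora2.cluster-xxxx.us-east-1.rds.amazonaws.com -e "util.checkForServerUpgrade()"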

With an Aurora cluster configured with an instance parameter group enabling MySQL binary log replication, it is easy to have a functioning Aurora 5.7 cluster with real-time replication to an Aurora 8 cluster to minimize any downtime in your production environment, and then benefit from an atomic data dictionary, roles, descending indexes, improved internal temporary tables, additional JSON functions, window functions, CTEs and more!
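
For the cut-over itself, a rough sketch of wiring up that replication (endpoints, credentials and binary log coordinates are placeholders; the stored procedures are the ones documented for RDS/Aurora external replication, not commands taken from this post):

# On the new Aurora 3 cluster, point replication at the existing Aurora 2 cluster
mysql -h aurora3.cluster-xxxx.us-east-1.rds.amazonaws.com -u admin -p -e "
  CALL mysql.rds_set_external_master('aurora2.cluster-xxxx.us-east-1.rds.amazonaws.com', 3306,
       'repl_user', '********', 'mysql-bin-changelog.000002', 154, 0);
  CALL mysql.rds_start_replication;"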

More Reading

AWS Aurora MySQL 8 is now generally available

AWS has just announced the general availability of Aurora MySQL 8 compatibility (known as Aurora Version 3). This is a long-awaited addition to RDS MySQL 8 and provides many of the new features that can be found in the open-source MySQL 8 community version.

For those unfamiliar with Amazon Aurora, my Understanding AWS RDS Aurora Capabilities presentation from Percona Live 2021 provides a great introduction to the benefits of this managed service.

There is a lot to digest, and the Aurora User Guide provides details of the new features from the MySQL 8 community version, of the Aurora 3 new features, and of feature differences or unsupported features. This AWS blog post also provides a general introduction.

It is very easy to spin up a new Aurora MySQL 3.01.0 cluster in an environment containing existing Aurora clusters. After defining new cluster and instance parameter groups for the aurora-mysql8.0 family, or starting with the available default.aurora-mysql8.0 parameter groups, there are no other differences in the aws rds create-db-cluster syntax, or in using the AWS Console or Terraform, for example.
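
As a minimal sketch (identifiers, credentials and parameter group names are placeholders, and networking options are omitted), creating the cluster and a single instance from the CLI looks something like:

aws rds create-db-cluster \
  --db-cluster-identifier demo-aurora-3 \
  --engine aurora-mysql \
  --engine-version 8.0.mysql_aurora.3.01.0 \
  --db-cluster-parameter-group-name demo-aurora-mysql8 \
  --master-username admin \
  --master-user-password '********'

aws rds create-db-instance \
  --db-instance-identifier demo-aurora-3-instance-1 \
  --db-cluster-identifier demo-aurora-3 \
  --engine aurora-mysql \
  --db-instance-class db.r5.large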

Before considering a migration of an existing Aurora cluster, there is a lot of information around parameter changes (including inclusive language functionality), and related status and CloudWatch Metrics changes. Yes, looking at the 29 parameters 'Removed from Aurora MySQL version 3', the 30 marked 'This parameter applies to Aurora MySQL version 3 and higher', and the presently 'Currently not available in Aurora MySQL version 3' LOAD|SELECT S3 capabilities is important. There are new reserved words to be aware of, and you will need to note how to take advantage of roles within the Aurora permissions model.

Migrating an existing Aurora MySQL 2 cluster to Aurora 3 is a little more involved than specifying the snapshot-id. Seeing your restored Aurora 2 snapshot in an Aurora 3 cluster but with a status of incompatible-parameters is a good indication that more work is needed. While I will detail some of my experiences in a subsequent post, one helpful tip: in the additional pages of logs for your new cluster (shown 5 rows at a time), after all the error.log files, you will find an upgrade-prechecks.log file. This contains an extensive list of the checks and warnings performed for the upgrade. Skipping to the end of the JSON will give you an idea of your errorCount, warningCount and noticeCount.

Searching then for objects with "status": "ERROR" will find entries matching the errorCount. Several other checks provide a "detectedProblems" section with a "level": "Error", which would also seem to need correcting. There are a lot of checks between the INFORMATION_SCHEMA, the InnoDB internal data dictionary, and the actual data/files on disk. You will also be presented with a nice long list of tables/columns using reserved words, as well as character set deprecations.
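
A small sketch of pulling that log and summarizing it from the command line (the instance identifier is a placeholder, and this assumes the log body is the checker's JSON document):

aws rds download-db-log-file-portion \
  --db-instance-identifier demo-aurora-3-instance-1 \
  --log-file-name upgrade-prechecks.log \
  --starting-token 0 --output text > upgrade-prechecks.log

# Summary counts at the end of the JSON, then the individual failing checks
grep -E '"(errorCount|warningCount|noticeCount)"' upgrade-prechecks.log
grep -c '"status": "ERROR"' upgrade-prechecks.log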

At a more technical glance of the documentation, there is a key change in how internal temporary tables are created, and how this differs between writer and reader instances. Benchmarking your application in different configurations is definitely recommended.

Restoring an Aurora 2 cluster into Aurora 3 also took significantly more time (many hours) than the simple restore-db-cluster-from-snapshot you may be used to. While Terraform cluster creation timeouts already need to be increased for global clusters, this time the default 1h30 timeout for an instance was also exceeded.

While different features will benefit different consumers of your Aurora database, one of the most anticipated is CTEs. From the operations perspective, as a managed service Aurora offers a subset of community features. One great feature that is now available in Aurora 3 is binary log filtering, a simple long-lived option in MySQL land that will help replace more complex functionality.

This is a great and long-awaited version release for Aurora.

#WDILTW – RTFM, then RTFM again, then improve it

This week I learned two valuable aspects of Terraform I did not know.

The first is Terraform state import. While I use terraform state to list and show state, and even remove state, I was unaware you could import from an already created AWS resource. It's not actually a subcommand of the "terraform state" syntax; instead it's "terraform import", which is likely why I do not see it when I look at the terraform state syntax.

% terraform state
Usage: terraform [global options] state  [options] [args]

  This command has subcommands for advanced state management.

  These subcommands can be used to slice and dice the Terraform state.
  This is sometimes necessary in advanced cases. For your safety, all
  state management commands that modify the state create a timestamped
  backup of the state prior to making modifications.

  The structure and output of the commands is specifically tailored to work
  well with the common Unix utilities such as grep, awk, etc. We recommend
  using those tools to perform more advanced state tasks.

Subcommands:
    list                List resources in the state
    mv                  Move an item in the state
    pull                Pull current state and output to stdout
    push                Update remote state from a local state file
    replace-provider    Replace provider in the state
    rm                  Remove instances from the state

I am not an expert in Terraform, and looking at the command help output shown above did not give me a reference to look elsewhere, but just reading the manual can help you learn a new feature. If you do not know a product, reading documentation and examples can be an ideal way to get started in a self-paced way.
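
As a minimal sketch (the resource address and cluster identifier are placeholders, not from my actual configuration), importing an already created Aurora cluster into state looks like:

# Bring an existing, manually created cluster under Terraform management
terraform import aws_rds_cluster.demo my-existing-cluster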

The second is Meta-Arguments. I use lifecycle, and to be honest I had learned and forgotten about count. Count was something I was able to use to solve a very nasty cross-region Kinesis stream issue, reminding me of a syntax I had since forgotten. Using coalesce and conditional expressions (aka the ternary operator) can help in modules, for example:

resource "aws_rds_cluster" "demo" {
  ...
  global_cluster_identifier       = var.has_global_cluster ? local.global_cluster_identifier : ""
  master_username                 = var.has_global_cluster ? "" : var.master_username
  db_cluster_parameter_group_name = coalesce(var.db_cluster_parameter_group_name , local.db_cluster_parameter_group_name)
  ...
}

However, to stop the creation of the object completely, use count:

resource "aws_???" "demo_???" {
  count = var.filter_condition ? 1 : 0
  ...
}

And just when I thought I'd read enough about Meta-Arguments, I hit a new, never before seen problem. Now if I'd read the summary resources page about Meta-Arguments and looked at the very next section, I would likely have been able to solve this new error without having to RTFM a second time.

module.?.?.aws_rds_cluster.default: Still creating... [1h59m53s elapsed]

Error: Error waiting for RDS Cluster state to be "available": timeout while waiting for state to become 'available' (last state: 'creating', timeout: 2h0m0s)

on .terraform/modules/?/main.tf line 306, in resource "aws_rds_cluster" "default":

306: resource "aws_rds_cluster" "default" {

I did not know there was a 2 hour timeout, and I did not know you could change that with:

resource "aws_rds_cluster" "default" {
  ...
  timeouts {
    create = "4h"
    delete = "4h"
  }
}

On a number of occasions I have found online documentation to not be complete or accurate. If you find this, then submit a request to get it fixed; most sources include a link at the bottom to recommend improvements. I have had good success with submitting improvements to the AWS documentation.

A QLDB Cheat Sheet for MySQL Users

The AWS ledger database (QLDB) is an auditor's best friend and lives up to the stated description of "Amazon QLDB can be used to track each and every application data change and maintains a complete and verifiable history of changes over time."

This presentation goes over what was done to take a MySQL application that provided auditing of activity changes for key data, and how it is being migrated to QLDB.

While QLDB does use a SQL-format for DML (PartiQL), and you can perform the traditional INSERT/UPDATE/DELETE/SELECT, the ability to extend these statements to manipulate Amazon Ion data (a superset of JSON) gives you improved capabilities and statements.

Get a comparison of how to map a MySQL structure (multiple tables and lots of columns) into a single QLDB table and then benefit from an immutable and cryptographically verifiable transaction log. No more triggers, duplicated tables, or extra auditing for abuse of binary log activity.

We also cover the simplicity of using X Protocol and JSON output for data migration, and the complexity of AWS RDS not supporting X Protocol.

Understanding AWS RDS Aurora Capabilities

The RDS Aurora MySQL/PostgreSQL capabilities of AWS extend the High Availability (HA) capabilities of RDS read replicas and Multi-AZ. In this presentation I discuss the different capabilities and HA configurations with RDS Aurora including:

  • RDS Aurora Cluster single instance
  • RDS Aurora Cluster multiple instances (writer + 1 or more readers)
  • RDS Aurora Cluster multi-master
  • RDS Aurora Global Cluster
  • RDS Aurora Cluster options for multi-regions

Each option has its relative merits and limitations. Each will depend on your business requirements, global needs and budget.

Upcoming Percona Live 2021 Presentations

I am pleased to have been selected to present at Percona Live 2021 May 12-13. My presentations include talks on AWS RDS Aurora and QLDB managed services.

Understanding AWS RDS Aurora Capabilities

The RDS Aurora MySQL/PostgreSQL capabilities of AWS extend the HA capabilities of RDS read replicas and Multi-AZ.

In this presentation we will discuss the different capabilities and HA configurations with RDS Aurora including:

* RDS Cluster single instance
* RDS Cluster multiple instances (writer + 1 or more readers)
* RDS Cluster multi-master
* RDS Global Cluster
* RDS Cluster options for multi-regions

Each option has its relative merits and limitations. Each will depend on your business requirements, global needs and budget.

This presentation will include setup, monitoring and failover evaluations for the attendee, with the goal of providing a feature matrix of when/how to consider each option, as well as some details of the subtle differences Aurora provides.

This presentation is not going to go into the technical details of RDS Aurora’s underlying infrastructure or a feature by feature comparison of AWS RDS to AWS RDS Aurora.

A QLDB Cheatsheet for MySQL Users

Amazon's new ledger database (QLDB) is an auditor's best friend and lives up to the stated description of "Amazon QLDB can be used to track each and every application data change and maintains a complete and verifiable history of changes over time."

This presentation goes over what was done to take a MySQL application that provided auditing of activity changes for key data, and how it is being migrated to QLDB.

While QLDB does use a SQL format for DML, and you can perform the traditional INSERT/UPDATE/DELETE/SELECT, the ability to extend these statements to manipulate Amazon Ion data (a superset of JSON) gives you improved data manipulation, for example the FROM SQL statement.

Get a blow-by-blow comparison of MySQL structures (multiple tables and lots of columns) and SQL converted into a single QLDB table, with an immutable and cryptographically verifiable transaction log. No more triggers, duplicated tables, or extra auditing for abuse of binary log activity.

We also cover the simplicity of using X Protocol and JSON output for data migration, and the complexity of AWS RDS not supporting X Protocol.

#WDILTW – Creating examples can be hard

This week I was evaluating AWS QLDB, specifically the verifiable history of changes, to determine how to simplify present processes that perform auditing via CDC. This is not the first time I have looked at QLDB, so there was nothing that new to learn.

What I found was that creating a workable solution with an existing application is hard. Even harder is creating an example to publish in this blog (which is the purpose of this post).

First some background.

Using MySQL as the source of information, how can you leverage QLDB? It's easy to stream data from MySQL Aurora, and it's easy to stream data from QLDB, but it is not that easy to place real-time data into QLDB. AWS DMS is a good way to move data from a source to a target; previously my work has included MySQL to MySQL, MySQL to Redshift, and MySQL to Kinesis, however there is no QLDB target.

Turning the problem upside down, using QLDB as the source of information and streaming to MySQL for compatibility seemed a way forward.

After setting up the QLDB ledger and an example table, it was time to populate it with existing data. The documented reference example looked very JSON compatible. Side bar: it is actually Amazon Ion, a superset of JSON.

INSERT INTO Person
<< {
    'FirstName' : 'Raul',
    'LastName' : 'Lewis',
    'DOB' : `1963-08-19T`,
    'GovId' : 'LEWISR261LL',
    'GovIdType' : 'Driver License',
    'Address' : '1719 University Street, Seattle, WA, 98109'
},
{
    'FirstName' : 'Brent',
    'LastName' : 'Logan',
    'DOB' : `1967-07-03T`,
    'GovId' : 'LOGANB486CG',
    'GovIdType' : 'Driver License',
    'Address' : '43 Stockert Hollow Road, Everett, WA, 98203'
} >>

Now, MySQL offers the X Protocol. This is something that lefred has evangelized for many years, and that I have seen presented many times, but I finally had a chance to use it. The MySQL Shell JSON output looked ideal.

{
    "ID": 1523,
    "Name": "Wien",
    "CountryCode": "AUT",
    "District": "Wien",
    "Info": {
        "Population": 1608144
    }
}
{
    "ID": 1524,
    "Name": "Graz",
    "CountryCode": "AUT",
    "District": "Steiermark",
    "Info": {
        "Population": 240967
    }
}

And now, onto some of the things I learned this week.
Using AWS RDS Aurora MySQL is the first stumbling block: X Protocol is not supported. As this was an example, keep it simple: mysqldump some reference data, load it into a MySQL 8 instance, and extract it into JSON, so as to potentially emulate a pipeline.
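
A rough sketch of that workaround (endpoints and schema names are placeholders, not the real source):

# Dump the reference data from the Aurora (5.7) source, where X Protocol is unavailable
mysqldump -h aurora.cluster-xxxx.us-east-1.rds.amazonaws.com -u admin -p demo sample > sample.sql

# Load it into a standalone MySQL 8 instance where MySQL Shell / X Protocol can be used
mysql -h 127.0.0.1 -u root -p demo < sample.sql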

Here are my experiences of trying to refactor this into a demo to write up.

Launch a MySQL Docker container as per my standard notes. Harmless, right?

MYSQL_ROOT_PASSWORD="$(date | md5sum | cut -c1-20)#"
echo $MYSQL_ROOT_PASSWORD
docker run --name=qldb-mysql -p3306:3306 -v mysql-volume:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=$MYSQL_ROOT_PASSWORD -d mysql/mysql-server:latest
docker logs qldb-mysql
docker exec -it qldb-mysql /bin/bash

As it's a quick demo, I shortcut credentials to make using the mysql client easier. NOTE: as I always generate a new password for each container, it's included here.

# echo "[mysql]
user=root
password='ab6ea7b0436cbc0c0d49#'" > .my.cnf

# mysql 
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)

What the? Did I make a mistake, I test manually and check

# mysql -u root -p

# cat .my.cnf

Nothing wrong there. Next check

# pwd
/
bash-4.2# grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

And there is the first Dockerism. I don't live in Docker, so these 101 learnings would be well known to those who do. First, I really think using "root" by default is a horrible idea. And when you shell in, you are not dropped into the home directory? Solved, we move on.

# mv /.my.cnf /root/.my.cnf

Mock an example as quickly as I can think.

# mysql

mysql> create schema if not exists demo;
Query OK, 1 row affected (0.00 sec)

mysql> use demo;
Database changed
mysql> create table sample(id int unsigned not null auto_increment, name varchar(30) not null, location varchar(30) not null, domain varchar(50) null, primary key(id));
Query OK, 0 rows affected (0.03 sec)
mysql> show create table sample;

mysql> insert into sample values (null,'Demo Row','USA',null), (null,'Row 2','AUS','news.com.au'), (null,'Kiwi','NZ', null);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from sample;
+----+----------+----------+-------------+
| id | name     | location | domain      |
+----+----------+----------+-------------+
|  1 | Demo Row | USA      | NULL        |
|  2 | Row 2    | AUS      | news.com.au |
|  3 | Kiwi     | NZ       | NULL        |
+----+----------+----------+-------------+
3 rows in set (0.00 sec)

Cool, now to look at it in Javascript using MySQL Shell. Hurdle 2.

# mysqlsh
MySQL Shell 8.0.22

Copyright (c) 2016, 2020, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

 MySQL  JS > var session=mysqlx.getSession('root:ab6ea7b0436cbc0c0d49#@localhost')
mysqlx.getSession: Argument #1: Invalid URI: Illegal character [#] found at position 25 (ArgumentError)


What the? It doesn't like the password format. I'm not a JavaScript person, and well, this is an example for blogging, which is not what was actually set up, so do it the right way and create a user.

# mysql

mysql> create user demo@localhost identified by 'qldb';
Query OK, 0 rows affected (0.01 sec)

mysql> grant ALL ON sample.* to demo@localhost;
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> SHOW GRANTS FOR demo@localhost;
+----------------------------------------------------------+
| Grants for demo@localhost                                |
+----------------------------------------------------------+
| GRANT USAGE ON *.* TO `demo`@`localhost`                 |
| GRANT ALL PRIVILEGES ON `sample`.* TO `demo`@`localhost` |
+----------------------------------------------------------+
2 rows in set (0.00 sec)

Back into the MySQL Shell, and hurdle 3.

MySQL  JS > var session=mysqlx.getSession('demo:qldb@localhost')
mysqlx.getSession: Access denied for user 'demo'@'127.0.0.1' (using password: YES) (MySQL Error 1045)

Did I create the creds wrong? Verify. No, my password is correct.

#  mysql -udemo -pqldb -e "SELECT NOW()"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------------+
| NOW()               |
+---------------------+
| 2021-03-06 23:15:26 |
+---------------------+

I don't have time to debug this. User, take 2.

mysql> drop user demo@localhost;
Query OK, 0 rows affected (0.00 sec)

mysql> create user demo@'%' identified by 'qldb';
Query OK, 0 rows affected (0.01 sec)

mysql> grant all on demo.* to demo@'%'
    -> ;
Query OK, 0 rows affected (0.00 sec)

mysql> show grants;
+--
| Grants for root@localhost                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
+---
| GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, SHUTDOWN, PROCESS, FILE, REFERENCES, INDEX, ALTER, SHOW DATABASES, SUPER, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER, CREATE TABLESPACE, CREATE ROLE, DROP ROLE ON *.* TO `root`@`localhost` WITH GRANT OPTION                                                                                                                                                                                                                    |
| GRANT APPLICATION_PASSWORD_ADMIN,AUDIT_ADMIN,BACKUP_ADMIN,BINLOG_ADMIN,BINLOG_ENCRYPTION_ADMIN,CLONE_ADMIN,CONNECTION_ADMIN,ENCRYPTION_KEY_ADMIN,FLUSH_OPTIMIZER_COSTS,FLUSH_STATUS,FLUSH_TABLES,FLUSH_USER_RESOURCES,GROUP_REPLICATION_ADMIN,INNODB_REDO_LOG_ARCHIVE,INNODB_REDO_LOG_ENABLE,PERSIST_RO_VARIABLES_ADMIN,REPLICATION_APPLIER,REPLICATION_SLAVE_ADMIN,RESOURCE_GROUP_ADMIN,RESOURCE_GROUP_USER,ROLE_ADMIN,SERVICE_CONNECTION_ADMIN,SESSION_VARIABLES_ADMIN,SET_USER_ID,SHOW_ROUTINE,SYSTEM_USER,SYSTEM_VARIABLES_ADMIN,TABLE_ENCRYPTION_ADMIN,XA_RECOVER_ADMIN ON *.* TO `root`@`localhost` WITH GRANT OPTION |
| GRANT PROXY ON ''@'' TO 'root'@'localhost' WITH GRANT OPTION                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+---
3 rows in set (0.00 sec)

mysql> show grants for demo@'%';
+--------------------------------------------------+
| Grants for demo@%                                |
+--------------------------------------------------+
| GRANT USAGE ON *.* TO `demo`@`%`                 |
| GRANT ALL PRIVILEGES ON `demo`.* TO `demo`@`%`   |
+--------------------------------------------------+
2 rows in set (0.00 sec)

Right, initially I showed the grants of the wrong (not the new) user, but note to self, I should check out the MySQL 8 improved grants. I wonder how RDS MySQL 8 handles these, and how Aurora MySQL 8 will (whenever it drops, but that's another story).

Third try is a charm, so nice to also see queries with 0.0000 execution granularity.

 MySQL  JS > var session=mysqlx.getSession('demo:qldb@localhost')
 MySQL  JS > var sql='SELECT * FROM demo.sample'
 MySQL  JS > session.sql(sql)
+----+----------+----------+-------------+
| id | name     | location | domain      |
+----+----------+----------+-------------+
|  1 | Demo Row | USA      | NULL        |
|  2 | Row 2    | AUS      | news.com.au |
|  3 | Kiwi     | NZ       | NULL        |
+----+----------+----------+-------------+
3 rows in set (0.0006 sec)

Get that now in JSON output. NOTE: There are 3 different JSON formats; this one matched what I needed.

bash-4.2# mysqlsh
MySQL Shell 8.0.22

Copyright (c) 2016, 2020, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.
 MySQL  JS > var session=mysqlx.getSession('demo:qldb@localhost')
 MySQL  JS > var sql='SELECT * FROM demo.sample'
 MySQL  JS > shell.options.set('resultFormat','json/array')
 MySQL  JS > session.sql(sql)
[
{"id":1,"name":"Demo Row","location":"USA","domain":null},
{"id":2,"name":"Row 2","location":"AUS","domain":"news.com.au"},
{"id":3,"name":"Kiwi","location":"NZ","domain":null}
]
3 rows in set (0.0006 sec)

OK, that works in the interactive interface; I need it scripted.

# vi
bash: vi: command not found
# yum install vi
Loaded plugins: ovl
http://repo.mysql.com/yum/mysql-connectors-community/el/7/x86_64/repodata/repomd.xml: [Errno 14] HTTP Error 403 - Forbidden
Trying other mirror.
...

And another downer of Docker containers: no other tools, or easy ways to install them. Again, I want to focus on the actual example and not all this preamble, so:

# echo "var session=mysqlx.getSession('demo:qldb@localhost')
var sql='SELECT * FROM demo.sample'
shell.options.set('resultFormat','json/array')
session.sql(sql)" > dump.js


# mysqlsh < dump.js

What the? Hurdle 4. Did I typo this as well? I check the file, and cut/paste it, and get what I expect.

# cat dump.js
var session=mysqlx.getSession('demo:qldb@localhost')
var sql='SELECT * FROM demo.sample'
shell.options.set('resultFormat','json/array')
session.sql(sql)
# mysqlsh
MySQL Shell 8.0.22

Copyright (c) 2016, 2020, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.
 MySQL  JS > var session=mysqlx.getSession('demo:qldb@localhost')
 MySQL  JS > var sql='SELECT * FROM demo.sample'
 MySQL  JS > shell.options.set('resultFormat','json/array')
 MySQL  JS > session.sql(sql)
[
{"id":1,"name":"Demo Row","location":"USA","domain":null},
{"id":2,"name":"Row 2","location":"AUS","domain":"news.com.au"},
{"id":3,"name":"Kiwi","location":"NZ","domain":null}
]
3 rows in set (0.0022 sec)

This is getting crazy.
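
In hindsight, my guess is that outside the interactive prompt MySQL Shell does not auto-print statement results, so a possible fix (untested here) would be to force interactive behaviour for the redirected input:

# Run the same script, but with full interactive output
mysqlsh --interactive=full < dump.js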

# echo '[
> {"id":1,"name":"Demo Row","location":"USA","domain":null},
> {"id":2,"name":"Row 2","location":"AUS","domain":"news.com.au"},
> {"id":3,"name":"Kiwi","location":"NZ","domain":null}
> ]' > sample.json
bash-4.2# jq . sample.json
bash: jq: command not found

Oh, the Docker!!! Switching back to my EC2 instance now.

$ echo '[
> {"id":1,"name":"Demo Row","location":"USA","domain":null},
> {"id":2,"name":"Row 2","location":"AUS","domain":"news.com.au"},
> {"id":3,"name":"Kiwi","location":"NZ","domain":null}
> ]' > sample.json
$ jq . sample.json
[
  {
    "id": 1,
    "name": "Demo Row",
    "location": "USA",
    "domain": null
  },
  {
    "id": 2,
    "name": "Row 2",
    "location": "AUS",
    "domain": "news.com.au"
  },
  {
    "id": 3,
    "name": "Kiwi",
    "location": "NZ",
    "domain": null
  }
]

I am now way over the time I would like to spend on this weekly post, it's getting way too long, and I'm nowhere near showing what I actually want. Still, we trek on.

Boy, this stock EC2 image uses AWS CLI version 1; I'm sure we need v2, and well, the command does not work!!!!

$  aws qldb list-ledgers
ERROR:
$ aws --version

$ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
$ unzip awscliv2.zip
$ sudo ./aws/install
$ export PATH=/usr/local/bin:$PATH
$ aws --version

Can I finally get a ledger now?

$ aws qldb create-ledger --name demo --tags JIRA=DEMO-5826,Owner=RonaldBradford --permissions-mode ALLOW_ALL --no-deletion-protection
 
{
    "Name": "demo",
    "Arn": "arn:aws:qldb:us-east-1:999:ledger/demo",
    "State": "CREATING",
    "CreationDateTime": "2021-03-06T22:46:41.760000+00:00",
    "DeletionProtection": false
}

$  aws qldb list-ledgers

{
    "Ledgers": [
        {
            "Name": "xx",
            "State": "ACTIVE",
            "CreationDateTime": "2021-03-05T20:12:44.611000+00:00"
        },
        {
            "Name": "demo",
            "State": "ACTIVE",
            "CreationDateTime": "2021-03-06T22:46:41.760000+00:00"
        }
    ]
}

$ aws qldb describe-ledger --name demo
{
    "Name": "demo",
    "Arn": "arn:aws:qldb:us-east-1:999:ledger/demo",
    "State": "ACTIVE",
    "CreationDateTime": "2021-03-06T22:46:41.760000+00:00",
    "DeletionProtection": false
}

Oh, the Python 2, and the lack of user packaging; more crud in the way of getting an example.

$ pip install pyqldb==3.1.0
ERROR

$ echo "alias python=python3
alias pip=pip3" >> ~/.bash_profile
source ~/.bash_profile
$ pip --version
pip 9.0.3 from /usr/lib/python3.6/site-packages (python 3.6)

$ python --version
Python 3.6.8

$ pip install pyqldb==3.1.0

ERROR

$ sudo pip install pyqldb==3.1.0

Yeah! After all that, my example code works and data is inserted.

$ cat demo.py
from pyqldb.config.retry_config import RetryConfig
from pyqldb.driver.qldb_driver import QldbDriver

# Configure retry limit to 3
retry_config = RetryConfig(retry_limit=3)

# Initialize the driver
print("Initializing the driver")
qldb_driver = QldbDriver("demo", retry_config=retry_config)


def create_table(transaction_executor, table):

    print("Creating table {}".format(table))
    transaction_executor.execute_statement("Create TABLE {}".format(table))

def create_index(transaction_executor, table, column):
    print("Creating index {}.{}".format(table, column))
    transaction_executor.execute_statement("CREATE INDEX ON {}({})".format(table,column))


def insert_record(transaction_executor, table, values):
    print("Inserting into {}".format(table))
    transaction_executor.execute_statement("INSERT INTO {} ?".format(table),  values)


table="sample"
column="id"
qldb_driver.execute_lambda(lambda executor: create_table(executor, table))
qldb_driver.execute_lambda(lambda executor: create_index(executor, table, column))


record1 = { 'id': "1",
            'name': "Demo Row",
            'location': "USA",
            'domain':  ""
        }

qldb_driver.execute_lambda(lambda x: insert_record(x, table, record1))
$ python demo.py
Initializing the driver
Creating table sample
Creating index sample.id
Inserting into sample

One verifies this in the AWS Console, but you cannot show that as text in this blog, so I go to find a simple client, and there is qldbshell.

What the? I installed it and it complains about pyqldb.driver.pooled_qldb_driver. I literally used that in the last example.

$ pip3 install qldbshell
Collecting qldbshell
  Downloading PermissionError: [Errno 13] Permission denied: '/usr/local/lib/python3.6/site-packages/amazon.ion-0.7.0-py3.6-nspkg.pth' -> '/tmp/pip-p8j4d45d-uninstall/usr/local/lib/python3.6/site-packages/amazon.ion-0.7.0-py3.6-nspkg.pth'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/lib/python3.6/site-packages/pip/commands/install.py", line 365, in run
    strip_file_prefix=options.strip_file_prefix,
  File "/usr/lib/python3.6/site-packages/pip/req/req_set.py", line 783, in install
    requirement.uninstall(auto_confirm=True)
  File "/usr/lib/python3.6/site-packages/pip/req/req_install.py", line 754, in uninstall
    paths_to_remove.remove(auto_confirm)
  File "/usr/lib/python3.6/site-packages/pip/req/req_uninstall.py", line 115, in remove
    renames(path, new_path)
  File "/usr/lib/python3.6/site-packages/pip/utils/__init__.py", line 267, in renames
    shutil.move(old, new)
  File "/usr/lib64/python3.6/shutil.py", line 565, in move
    os.unlink(src)
PermissionError: [Errno 13] Permission denied: '/usr/local/lib/python3.6/site-packages/amazon.ion-0.7.0-py3.6-nspkg.pth'
[centos@ip-10-204-101-224] ~
$ sudo pip3 install qldbshell
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Collecting qldbshell
Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/local/lib/python3.6/site-packages (from botocore<1.21.0,>=1.20.21->boto3>=1.9.237->qldbshell)
Installing collected packages: amazon.ion, qldbshell
  Found existing installation: amazon.ion 0.7.0
    Uninstalling amazon.ion-0.7.0:
      Successfully uninstalled amazon.ion-0.7.0
  Running setup.py install for amazon.ion ... done
  Running setup.py install for qldbshell ... done
Successfully installed amazon.ion-0.5.0 qldbshell-1.2.0


$ sudo pip3 install qldbshell

$ qldbshell
Traceback (most recent call last):
  File "/usr/local/bin/qldbshell", line 11, in 
    load_entry_point('qldbshell==1.2.0', 'console_scripts', 'qldbshell')()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 476, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2700, in load_entry_point
    return ep.load()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2318, in load
    return self.resolve()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2324, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/local/lib/python3.6/site-packages/qldbshell/__main__.py", line 25, in 
    from pyqldb.driver.pooled_qldb_driver import PooledQldbDriver
ModuleNotFoundError: No module named 'pyqldb.driver.pooled_qldb_driver'
$ pip list qldbshell
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
amazon.ion (0.5.0)
boto3 (1.17.21)
botocore (1.20.21)
ionhash (1.1.0)
jmespath (0.10.0)
pip (9.0.3)
prompt-toolkit (3.0.16)
pyqldb (3.1.0)
python-dateutil (2.8.1)
qldbshell (1.2.0)
s3transfer (0.3.4)
setuptools (39.2.0)
six (1.15.0)
urllib3 (1.26.3)

So, uninstalled and re-installed and voila, my data.

$ qldbshell
usage: qldbshell [-h] [-v] [-s QLDB_SESSION_ENDPOINT] [-r REGION] [-p PROFILE]
                 -l LEDGER
qldbshell: error: the following arguments are required: -l/--ledger
$ qldbshell -l demo

Welcome to the Amazon QLDB Shell version 1.2.0
Use 'start' to initiate and interact with a transaction. 'commit' and 'abort' to commit or abort a transaction.
Use 'start; statement 1; statement 2; commit; start; statement 3; commit' to create transactions non-interactively.
Use 'help' for the help section.
All other commands will be interpreted as PartiQL statements until the 'exit' or 'quit' command is issued.

qldbshell >

qldbshell > SELECT * FROM sample;                                                                                                                           
INFO:
{
 id: "1",
 name: "Demo Row",
 location: "USA",
 domain: ""
}
INFO: (0.1718s)

qldbshell > \q                                                                                                                                              
WARNING: Error while executing query: An error occurred (BadRequestException) when calling the SendCommand operation: Lexer Error: at line 1, column 1: invalid character at, '\' [U+5c];
INFO: (0.1134s)
qldbshell > exit                                                                                                                                            
Exiting QLDB Shell

Right, \q is a mysqlism of the client; I need to rewire myself.

Now I have a ledger, I created an example table, mocked a row of data, and verified it. Now I can just load the sample JSON data I created earlier, right? Wrong!!!

$ cat load.py
import json
from pyqldb.config.retry_config import RetryConfig
from pyqldb.driver.qldb_driver import QldbDriver

# Configure retry limit to 3
retry_config = RetryConfig(retry_limit=3)

# Initialize the driver
print("Initializing the driver")
qldb_driver = QldbDriver("demo", retry_config=retry_config)

def insert_record(transaction_executor, table, values):
  print("Inserting into {}".format(table))
  transaction_executor.execute_statement("INSERT INTO {} ?".format(table),  values)


table="sample"

with open('sample.json') as f:
  data=json.load(f)

qldb_driver.execute_lambda(lambda x: insert_record(x, table, data))

$ python load.py
Traceback (most recent call last):
  File "load.py", line 2, in 
    from pyqldb.config.retry_config import RetryConfig
ModuleNotFoundError: No module named 'pyqldb'
[centos@ip-10-204-101-224] ~

Oh sweet, I'd installed that, and used it, and re-installed it.

$ pip list | grep pyqldb
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
[centos@ip-10-204-101-224] ~
$ sudo pip3 install pyqldb
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Collecting pyqldb
Requirement already satisfied: boto3<2,>=1.16.56 in /usr/local/lib/python3.6/site-packages (from pyqldb)
Requirement already satisfied: botocore<2,>=1.19.56 in /usr/local/lib/python3.6/site-packages (from pyqldb)
Requirement already satisfied: ionhash<2,>=1.1.0 in /usr/local/lib/python3.6/site-packages (from pyqldb)
Requirement already satisfied: six in /usr/local/lib/python3.6/site-packages (from amazon.ion<1,>=0.7.0->pyqldb)
Requirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /usr/local/lib/python3.6/site-packages (from boto3<2,>=1.16.56->pyqldb)
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.6/site-packages (from boto3<2,>=1.16.56->pyqldb)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.6/site-packages (from botocore<2,>=1.19.56->pyqldb)
Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/local/lib/python3.6/site-packages (from botocore<2,>=1.19.56->pyqldb)
Installing collected packages: amazon.ion, pyqldb
  Found existing installation: amazon.ion 0.5.0
    Uninstalling amazon.ion-0.5.0:
      Successfully uninstalled amazon.ion-0.5.0
  Running setup.py install for amazon.ion ... done
  Running setup.py install for pyqldb ... done
Successfully installed amazon.ion-0.7.0 pyqldb-3.1.0

Load one more time.


$ cat load.py
import json
from pyqldb.config.retry_config import RetryConfig
from pyqldb.driver.qldb_driver import QldbDriver

# Configure retry limit to 3
retry_config = RetryConfig(retry_limit=3)

# Initialize the driver
print("Initializing the driver")
qldb_driver = QldbDriver("demo", retry_config=retry_config)

def insert_record(transaction_executor, table, values):
  print("Inserting into {}".format(table))
  transaction_executor.execute_statement("INSERT INTO {} ?".format(table),  values)


table="sample"

with open('sample.json') as f:
  data=json.load(f)

qldb_driver.execute_lambda(lambda x: insert_record(x, table, data))

$ python load.py
Initializing the driver
Inserting into sample

And done, I've got my JSON-extracted MySQL 8 data in QLDB. I go to vet it in the client, and boy, I didn't expect yet another package screw-up. Clearly, these 2 AWS Python packages are incompatible. That's a venv need, but I'm now at double my desired time to show this.
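
What I should have done from the start, as a sketch: keep each tool in its own virtual environment so pyqldb and qldbshell cannot clobber each other's amazon.ion dependency.

# One venv for the driver-based scripts...
python3 -m venv ~/venv/pyqldb && ~/venv/pyqldb/bin/pip install pyqldb==3.1.0

# ...and a separate one for the interactive shell
python3 -m venv ~/venv/qldbshell && ~/venv/qldbshell/bin/pip install qldbshell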

$ qldbshell -l demo
Traceback (most recent call last):
  File "/usr/local/bin/qldbshell", line 11, in 
    load_entry_point('qldbshell==1.2.0', 'console_scripts', 'qldbshell')()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 476, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2700, in load_entry_point
    return ep.load()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2318, in load
    return self.resolve()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2324, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/local/lib/python3.6/site-packages/qldbshell/__main__.py", line 25, in 
    from pyqldb.driver.pooled_qldb_driver import PooledQldbDriver
ModuleNotFoundError: No module named 'pyqldb.driver.pooled_qldb_driver'
[centos@ip-10-204-101-224] ~
$ pip list | grep qldbshell
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
qldbshell (1.2.0)


$ sudo pip uninstall qldbshell pyqldb

$ sudo pip install qldbshell
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Collecting qldbshell
  Downloading Requirement already satisfied: boto3>=1.9.237 in /usr/local/lib/python3.6/site-packages (from qldbshell)
Requirement already satisfied: amazon.ion<0.6.0,>=0.5.0 in /usr/local/lib/python3.6/site-packages (from qldbshell)
Requirement already satisfied: prompt_toolkit<3.1.0,>=3.0.5 in /usr/local/lib/python3.6/site-packages (from qldbshell)
Requirement already satisfied: ionhash~=1.1.0 in /usr/local/lib/python3.6/site-packages (from qldbshell)
Requirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /usr/local/lib/python3.6/site-packages (from boto3>=1.9.237->qldbshell)
Requirement already satisfied: botocore<1.21.0,>=1.20.21 in /usr/local/lib/python3.6/site-packages (from boto3>=1.9.237->qldbshell)
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.6/site-packages (from boto3>=1.9.237->qldbshell)
Requirement already satisfied: six in /usr/local/lib/python3.6/site-packages (from amazon.ion<0.6.0,>=0.5.0->qldbshell)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.6/site-packages (from prompt_toolkit<3.1.0,>=3.0.5->qldbshell)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.6/site-packages (from botocore<1.21.0,>=1.20.21->boto3>=1.9.237->qldbshell)
Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/local/lib/python3.6/site-packages (from botocore<1.21.0,>=1.20.21->boto3>=1.9.237->qldbshell)
Installing collected packages: qldbshell
  Running setup.py install for qldbshell ... done
Successfully installed qldbshell-1.2.0

Can I see my data now?


$ qldbshell -l demo

Welcome to the Amazon QLDB Shell version 1.2.0
Use 'start' to initiate and interact with a transaction. 'commit' and 'abort' to commit or abort a transaction.
Use 'start; statement 1; statement 2; commit; start; statement 3; commit' to create transactions non-interactively.
Use 'help' for the help section.
All other commands will be interpreted as PartiQL statements until the 'exit' or 'quit' command is issued.

qldbshell > select * from sample;                                                                                                                           
INFO:
{
 id: 1,
 name: "Demo Row",
 location: "USA",
 domain: null
},
{
 id: 1,
 name: "Demo Row",
 location: "USA",
 domain: null
},
{
 id: "1",
 name: "Demo Row",
 location: "USA",
 domain: ""
},
{
 id: 3,
 name: "Kiwi",
 location: "NZ",
 domain: null
},
{
 id: 2,
 name: "Row 2",
 location: "AUS",
 domain: "news.com.au"
},
{
 id: 3,
 name: "Kiwi",
 location: "NZ",
 domain: null
},
{
 id: 2,
 name: "Row 2",
 location: "AUS",
 domain: "news.com.au"
}
INFO: (0.0815s)

And yes, data. I see it's duplicated, so somewhere in between the 10 steps I must have run the load twice. This does highlight a known limitation of QLDB: no unique constraints.

But wait, that data is not really correct; I don't want null. Going back to the JSON, I see the MySQL Shell gives me that.

$ jq . sample.json
[
  {
    "id": 1,
    "name": "Demo Row",
    "location": "USA",
    "domain": null
  },
...

At some point I also got this load error, but by now I've given up on documenting how to do something in order to demonstrate something.

NameError: name 'null' is not defined

One has to wrap the only nullable column with IFNULL(domain,'') AS domain and redo all those steps again. This is not going to be practical, having to wrap every column in a wider table with IFNULL.
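
As a sketch of that extra step against the demo schema used above (the same wrapping would have to be repeated for every nullable column):

# Revised extract script: wrap the nullable column so the JSON carries '' rather than null
echo "var session=mysqlx.getSession('demo:qldb@localhost')
shell.options.set('resultFormat','json/array')
session.sql(\"SELECT id, name, location, IFNULL(domain,'') AS domain FROM demo.sample\")" > dump.js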

However, having exhausted all this time on what was supposed to be a quiet few weekend hours, my post is way too long, and I've learned that "Creating examples can be hard".

#WDILTW – What can I run from my AWS Aurora database

When you work with AWS Aurora you have limited admin privileges. There are some different grants for MySQL, including SELECT INTO S3 and LOAD FROM S3, that replace the lost SELECT INTO OUTFILE and mysqldump/mysqlimport delimited-format functionality. While I know and use Lambda capabilities, I have never executed anything with INVOKE LAMBDA directly from the database.
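
A minimal sketch of those Aurora-specific replacements (bucket, table names and endpoint are placeholders, and the S3 IAM role attachment is assumed to already be in place):

mysql -h aurora.cluster-xxxx.us-east-1.rds.amazonaws.com -u admin -p -e "
  SELECT * FROM demo.sample
    INTO OUTFILE S3 's3://my-bucket/exports/sample'
    FORMAT CSV HEADER;
  LOAD DATA FROM S3 PREFIX 's3://my-bucket/exports/sample'
    INTO TABLE demo.sample_copy
    FIELDS TERMINATED BY ',' IGNORE 1 LINES;"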

This week I found out about INVOKE COMPREHEND (I had to look that product up) and INVOKE SAGEMAKER (which I have used independently). These are machine learning capabilities that enable you to build custom integrations using Comprehend and SageMaker. I did not have any chance to evaluate these capabilities, so I am unable to share any use cases or experiences. There are two built-in Comprehend functions, aws_comprehend_detect_sentiment() and aws_comprehend_detect_sentiment_confidence(), a likely future starting place. SageMaker is invoked as an extension of CREATE FUNCTION that provides the ALIAS AWS_SAGEMAKER_INVOKE_ENDPOINT syntax.
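
As a sketch of that syntax (the function signature, endpoint name and feature columns are placeholders following the documented Aurora ML form, not something I have run):

mysql -h aurora.cluster-xxxx.us-east-1.rds.amazonaws.com -u admin -p -e "
  CREATE FUNCTION predict_churn (feature1 DOUBLE, feature2 DOUBLE)
    RETURNS DOUBLE
    ALIAS AWS_SAGEMAKER_INVOKE_ENDPOINT
    ENDPOINT NAME 'my-churn-endpoint';
  SELECT customer_id, predict_churn(feature1, feature2) FROM demo.customers LIMIT 10;"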

Also available are some MySQL status variables including Aurora_ml_logical_response_cnt, Aurora_ml_actual_request_cnt, Aurora_ml_actual_response_cnt, Aurora_ml_cache_hit_cnt, Aurora_ml_single_request_cnt.

Some googling found an interesting, simple example: calculating the positive/negative sentiment and confidence of sentences of text. I could see this as useful for analyzing comments. I've included a version of the example below to encourage my readers to take a look, as I plan to do. After the IAM configuration I will be really curious to evaluate the responsiveness of this example. Is this truly a batch-only operation, or could you return a meaningful response in a timely way?
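
A representative sketch (not the original site's code, which did not survive in this copy; the table and column names are placeholders) of the two built-in Comprehend functions:

mysql -h aurora.cluster-xxxx.us-east-1.rds.amazonaws.com -u admin -p -e "
  SELECT comment_text,
         aws_comprehend_detect_sentiment(comment_text, 'en') AS sentiment,
         aws_comprehend_detect_sentiment_confidence(comment_text, 'en') AS confidence
  FROM demo.comments
  LIMIT 10;"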

This also led to bookmarking for later reading https://awsauroralabsmy.com/, https://github.com/aws-samples/amazon-aurora-labs-for-mysql/ and https://squidfunk.github.io/mkdocs-material/, all from this one page.

#WDILTW – Debugging failed http requests thru the web of redirects

There are reports that your website is down. You pull up the login page without incident. What’s next?

Monitoring is critical. How detailed is this? How frequently are you sampling? The resolution to any issue is only as good as the response to a paged alert. Who is looking into the issue? What escalation exists?

In today's complex interconnected infrastructure, is it ever that simple? When speaking about an AWS-hosted solution, is it an AWS issue? Does status.aws.amazon.com give you a clue? Does the inability to access other services/sites you may be using at this moment indicate a larger problem? Is it AWS related: a service, an availability zone, or even an entire region? Having experienced all of those before, sometimes it's obvious and sometimes it is not. Or does a Twitter search report other shared experiences of regional outages; was it that severed Verizon underwater cable?

I learned two things this week in the triage of this situation. The first is that the old CLI tools you have been using for 20+ years still help you triage quickly. Do not discount them or the detail they provide. I was able to identify and reproduce an underlying cause with just nslookup and curl. For many reviewing the outage, the problem did not manifest as an error. It turned out there were two distinct paths from two separate domains to the ultimate target page. This was not immediately obvious or known, and there was no definitive network diagram to describe it.

When this was determined, nslookup showed that the two domains resolved to two different AWS ELBs. dig is also a useful command to master, for example to determine whether a name is an A record or a CNAME (see the dig sketch after the following output).

$ nslookup demo.internal-example.com

demo.internal-example.com	canonical name = internal.us-east-1.elb.amazonaws.com.
Name:	 internal.us-east-1.elb.amazonaws.com
Address: 10.10.1.2
Name:	 internal.us-east-1.elb.amazonaws.com
Address: 10.10.0.3
Name:	 internal.us-east-1.elb.amazonaws.com
Address: 10.10.2.4
$ nslookup demo.public-example.com

Non-authoritative answer:
demo.public-example.com	         canonical name = external.us-east-1.elb.amazonaws.com.
Name:	 external.us-east-1.elb.amazonaws.com
Address: 23.123.111.222
Name:	 external.us-east-1.elb.amazonaws.com
Address: 50.200.211.222
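
As a quick illustration of the dig usage mentioned above, the answer section shows whether a name is a CNAME to an ELB or a direct A record. The output below is illustrative, using the same example hostnames:

$ dig +noall +answer demo.public-example.com

demo.public-example.com.                60  IN  CNAME  external.us-east-1.elb.amazonaws.com.
external.us-east-1.elb.amazonaws.com.   60  IN  A      23.123.111.222
external.us-east-1.elb.amazonaws.com.   60  IN  A      50.200.211.222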

The first indication was actually to find that one of the ELBs was not in the AWS account containing all other resources, and that AWS account was not viewable. Why is a separate discussion. curl then helped to traverse the various redirects of each ELB using these options:

  • -i/--include – Include the headers
  • -k/--insecure – Allow insecure SSL connections
  • -L/--location – Follow redirects
$ curl -ikL external.us-east-1.elb.amazonaws.com
HTTP/1.1 301 Moved Permanently
Server: awselb/2.0
Date: Thu, 11 Feb 2021 20:34:47 GMT
Content-Type: text/html
Content-Length: 134
Location: https://external.us-east-1.elb.amazonaws.com:443/
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
Age: 0

HTTP/1.1 200 Connection established

HTTP/2 302
date: Thu, 11 Feb 2021 20:34:48 GMT
content-length: 0
location: http://demo.unavailable.com
cache-control: no-cache

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 2071
Date: Thu, 11 Feb 2021 19:09:29 GMT
Last-Modified: Tue, 18 Dec 2018 05:32:31 GMT
Accept-Ranges: bytes
Server: AmazonS3
X-Cache: Hit from cloudfront
Via: 1.1 44914fa6421b789193cec8998428f8bd.cloudfront.net (CloudFront)
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
Age: 1071

<html

Using these commands was nothing new; however, identifying this single line provided a way to isolate where in the chain of redirects to focus.

content-length: 0

Ultimately the issue was not ELB related, but the internal infrastructure behind this one ELB. When corrected, the result was (trimmed for readability):

$ curl -ikL external.us-east-1.elb.amazonaws.com
HTTP/1.1 301 Moved Permanently
Server: awselb/2.0
Date: Thu, 11 Feb 2021 20:37:18 GMT
Content-Type: text/html
Content-Length: 134
Location: https://external.us-east-1.elb.amazonaws.com:443/
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
Age: 0

HTTP/1.1 200 Connection established

HTTP/2 302
date: Thu, 11 Feb 2021 20:37:18 GMT
content-type: text/plain; charset=utf-8
content-length: 27
x-powered-by: 
location: /redirect
vary: Accept

HTTP/2 301
date: Thu, 11 Feb 2021 20:37:18 GMT
content-type: text/html
content-length: 162
location: /redirect/

HTTP/2 200
date: Thu, 11 Feb 2021 20:37:18 GMT
content-type: text/html
content-length: 2007
last-modified: Tue, 02 Feb 2021 03:27:13 GMT
vary: Accept-Encoding

<html>
  <head>

In summary, and as a means to triage a future problem or to monitor, compare just the status and content-length headers of the failure and success cases.

Failure:

$ egrep -i "^HTTP|^Content-Length"

HTTP/1.1 301 Moved Permanently
Content-Length: 134
HTTP/1.1 200 Connection established
HTTP/2 302
content-length: 0
HTTP/1.1 200 OK
Content-Length: 2071


Success:

$ egrep -i "^HTTP|^Content-Length"

HTTP/1.1 301 Moved Permanently
Content-Length: 134
HTTP/1.1 200 Connection established
HTTP/2 302
content-length: 27
HTTP/2 301
content-length: 162
HTTP/2 200
content-length: 2007

With the proliferation of GUI-based monitoring products it is likely that many organizations have multiple different monitors available, but are they triggered, and do they enable you to pinpoint the underlying issue? Long gone are the days of a Pingdom-type ping of a URL from multiple locations every minute, a report of latency or errors, and then you start digging. This week I learned about DataDog Synthetic Monitoring. DataDog is a well-established monitoring solution that I have only just started to understand; I wish I had a year to master it.

In later review this monitoring showed an already configured browser test for this top-level URL that was failing; it was simply not alerting correctly. The Synthetic Monitoring is far more advanced, providing an if-this-then-that style workflow, and even provides images of the rendered pages.

This experience highlighted the need to have detailed and redundant monitoring but also the right process to triage and drill down.

I looked into trying to provide an example of this DataDog feature, however the free tier monitoring solution does not provide all the advanced features for the evaluation I’d like. You can look at some product examples.

Observability is a key tool in any operations management. It should be one of the pillars where a continued investment of time, resources and skills development can add significant value for business continuity.

#WDILTW – AWS RDS Proxy

This week I was evaluating AWS RDS Proxy. If you are familiar with the Relational Database Service (RDS) and use MySQL or Postgres, this is an additional option to consider.

Proxies in general, as the name implies, accept incoming requests and perform some management before those requests are forwarded to the ultimate target.

RDS Proxy takes incoming database connections and provides several capabilities, including connection pooling and capping the total database connections, with each configured proxy holding a percentage of the total connections for the target cluster. The proxy can handle routing only for writer instances (at this time) to minimize the impact of a planned or unplanned failover. The RDS Proxy however does not address the underlying problem of too many connections to the database; it just adds another layer, one that is, or may be, more configurable or tunable than an application requesting connections.

The RDS Proxy is automatically Highly Available (HA). You can determine this by looking at the host IPs in the MySQL processlist. I have yet to identify any other means of seeing at the database level whether a connection is a proxy connection if you are using the same credentials. RDS Proxy does give you the ability via Secrets Manager to connect as a different user. You can also specify a connection initialization query. I used a SET variable so that the application could determine whether it was using a proxy, however that is of little benefit in server connection management.
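
As a sketch of the two techniques just mentioned, with my own illustrative endpoint names and marker variable:

# Connection initialization query configured on the proxy (a hypothetical marker variable):
#   SET @via_rds_proxy = 1;

# Compare the connecting host IPs the database sees for direct versus proxy connections
mysql -h demo-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com -u admin -p \
      -e "SELECT host FROM information_schema.processlist"
mysql -h demo-proxy.proxy-xxxx.us-east-1.rds.amazonaws.com -u admin -p \
      -e "SELECT host FROM information_schema.processlist"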

The RDS Proxy can enforce TLS, something which in my opinion should always be used for application-to-data-store communications, but historically has been overlooked at practically every company I have worked for or consulted to. Just because you are communicating within a VPC does not protect your communications from actors within your VPC. I can remember, at a prior employer, the disappointment when encrypted cross-region replication was dropped because it was too hard to migrate or manage. That shows the all too common problem of laziness winning over security.

If you are new to a particular technology, the age of the Internet gives you the search capability to find numerous articles. If you search for anything AWS you will generally get the official pages as the top results; it takes some digging to find other articles. Prior to this lesson I had only read about RDS Proxy, I had never actually set one up.

When anybody is learning something new, I like to say your value-add is not just to read an article, but to reproduce and then adapt or enhance it. This Amazon example is no different. Repeating each step showed multiple errors in syntax, which I can contribute back as comments. If this were open source code, you could contribute a pull request (PR). The good news is the first example of configuring a proxy includes both GUI and CLI commands. I always like to do my work on the command line, even the first iteration. You cannot scale a human moving a mouse around and clicking. What I found however was that the official AWS CLI lacked a key component of the proxy setup around target groups. The UI provides a capability that the CLI did not. Another discrepancy was that when I was making modifications to the proxy in the GUI I would get an error, but I could make the same change via the CLI. These discrepancies are an annoyance for consistency and a first evaluation.

So what was the outcome of my evaluation? First, I was able to demonstrate I could add a proxy to an existing cluster in one of our test environments and direct traffic from a mysql client thru the proxy to the target database. I was able to use AWS Secrets Manager to enforce credentials for authorization. I did not look into Identity and Access Management (IAM) role support. I was able to benchmark a simulated load with sysbench to compare the latency of proxy traffic versus direct traffic. I have simplified my examples so that anybody can run these tests themselves for simple validation.
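
A simplified version of that latency comparison might look like the following, running the identical sysbench workload against the direct cluster endpoint and the proxy endpoint. The endpoints, credentials and sizing are illustrative placeholders:

CLUSTER=demo-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com
PROXY=demo-proxy.proxy-xxxx.us-east-1.rds.amazonaws.com
OPTS="--mysql-user=admin --mysql-password=xxxx --mysql-db=sbtest --tables=4 --table-size=10000"

# Create the test tables once, via the direct cluster endpoint
sysbench oltp_read_write ${OPTS} --mysql-host=${CLUSTER} prepare

# Run the same workload against each endpoint and compare the reported latency percentiles
for HOST in ${CLUSTER} ${PROXY}; do
  echo "=== ${HOST} ==="
  sysbench oltp_read_write ${OPTS} --mysql-host=${HOST} --threads=8 --time=60 run
done

# Remove the test tables
sysbench oltp_read_write ${OPTS} --mysql-host=${CLUSTER} cleanup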

I could enforce TLS communications for the mysql client testing, however our company's internal HTTP proxy caused the usual self-signed certificate issues with sysbench, something I really need to master. Surprisingly, when I looked at what options sysbench gave me for SSL (side bar: we should always refer to this as TLS instead of SSL), the defined options in the installed recent version still use the ssl name. The scope of options also differed from the source code online, so a question as to why? That’s the great thing about open source, you can read the code. You may have even met the author at a conference presentation.

Where the evaluation hit a business impact was in comparative performance. I am still awaiting an AWS support response to my evaluation.

What’s next is to get an application team to evaluate end to end database operations, easily done as Route 53 DNS is used for endpoint communications.
Where I got stuck was incorporating the setup of RDS Proxy within Terraform. We currently use version 12. While there is an aws_db_proxy resource, I needed a newer version of the aws provider than our environment uses. The official HashiCorp documentation of the resource really does not highlight the complexity necessary to create a proxy. While you will have already configured a VPC, subnets, even ingress security groups and secrets, which are all parts necessary for an RDS cluster, you need a number of integrated pieces.

You will need an IAM role for your proxy, but that role requires a policy to use KMS to get the secrets you wish to use for authorization. This interdependency of KMS and secret ARNs makes it difficult to launch an RDS Proxy as easily as you would an RDS Aurora cluster. Still, it's a challenge for something else to do. The added complexity is that the RDS Proxy also needs an authorization argument, for example the --auth argument in the AWS CLI. I see this as a complexity for the management of RDS users that you also wish to configure for use in the proxy.
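
For reference, the CLI flow is roughly as follows; the names, ARNs and subnets are placeholders, and as noted above the target group handling was where the CLI fell short for me at the time:

# Create the proxy, referencing an IAM role that can read the Secrets Manager secret via KMS
aws rds create-db-proxy \
  --db-proxy-name demo-proxy \
  --engine-family MYSQL \
  --role-arn arn:aws:iam::123456789012:role/demo-rds-proxy-role \
  --auth AuthScheme=SECRETS,SecretArn=arn:aws:secretsmanager:us-east-1:123456789012:secret:demo-db-credentials,IAMAuth=DISABLED \
  --vpc-subnet-ids subnet-aaaa1111 subnet-bbbb2222 \
  --require-tls

# Point the proxy's default target group at the existing Aurora cluster
aws rds register-db-proxy-targets \
  --db-proxy-name demo-proxy \
  --db-cluster-identifiers demo-cluster

# Confirm the proxy and its endpoint
aws rds describe-db-proxies --db-proxy-name demo-proxy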

As with any evaluation or proof of concept (POC), the devil is in the details. How do you monitor your new resources, what logging is important to know, what types of errors can happen, and how do you address these?

Another issue I had was that RDS Proxy required a newer version of the AWS CLI in order to run RDS commands such as describe-db-proxies. That adds an additional administrative dependency to be rolled out.

Proxies for MySQL have been around for decades; I can remember personally working on the earliest version of MySQL Proxy at MySQL Inc back in 2007. The gold standard if you use MySQL is ProxySQL by Sysown’s René Cannaò. This is a topic for a different discussion.

Check out my code for this work.

Reading

Enforcing a least privileged security model can be hard

In a greenfield environment you generally have the luxury to right the wrongs of past tech debt. It can be more difficult to apply this to an existing environment. For example, my setup is configured to just work with the AWS CLI, with various litmus tests to validate that. Generally instructions would include: validate your AWS access. This can be as simple as:

$ aws ec2 describe-regions
$ aws ec2 describe-availability-zones --profile oh

As part of documenting some upcoming Athena/Hadoop/Pig/RDBMS posts I decided it was important to separate out the AWS IAM privileges with a new user and permission policies. This introduced a number of steps that simply do not work. Creating a new AWS IAM user is not complex. Validating console and API access of that user required some revised setup.

$ aws ec2 describe-regions

An error occurred (AuthFailure) when calling the DescribeRegions operation: AWS was not able to validate the provided access credentials

In order to use the CLI you require your aws_access_key_id and aws_secret_access_key information, as well as aws_session_token if used. In order for a new individual user to obtain this information, you also need a number of policy rules, including the ability to ListAccessKeys, CreateAccessKey and potentially DeleteAccessKey.

 
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "iam:DeleteAccessKey",
                "iam:CreateAccessKey",
                "iam:ListAccessKeys"
            ],
            "Resource": "arn:aws:iam::[account]:user/[username]"
        }
    ]
}

In this example, we also restrict the least privileged model with a specific user resource ARN. For a single user account that is workable; for a large organization it would not be.
This gives the ability to configure your AWS CLI via the typical ~/.aws/credentials and/or ~/.aws/config settings. Performing the litmus test now gives:

$ aws ec2 describe-regions

An error occurred (UnauthorizedOperation) when calling the DescribeRegions operation: You are not authorized to perform this operation.

This requires a policy of:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeAvailabilityZones",
                "ec2:DescribeRegions"
            ],
            "Resource": "*"
        }
    ]
}
$ aws ec2 describe-regions | jq '.Regions[0]'
{
  "Endpoint": "ec2.eu-north-1.amazonaws.com",
  "RegionName": "eu-north-1",
  "OptInStatus": "opt-in-not-required"
}


$ aws ec2 describe-availability-zones --filter "Name=region-name,Values=us-east-1" | jq -r '.AvailabilityZones[].ZoneName'

us-east-1a
us-east-1b
us-east-1c
us-east-1d
us-east-1e
us-east-1f

However, this may be too restrictive for a larger organization. The EC2 access level for ‘list’ currently includes over 120 individual permissions. A more open policy could be:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:Describe*"
            ],
            "Resource": "*"
        }
    ]
}

However this does not provide all of the EC2 ‘list’ actions, e.g. ExportClientVpnClientConfiguration, and it includes several ‘read’ actions, e.g. DescribeVolumesModifications.
Selecting the ‘list’ tickbox via the GUI will name all actions individually in the policy action list (currently 117), however this is not forward compatible with any future actions added to the ‘list’ access level.

This is all before the exercise of starting to grant access to a new AWS service – Athena, and its data source S3.

Defensive Data Techniques

As a data architect I always ensure that for any database schema change there is a fully recoverable execution path.
I have generally advised creating a patch/revert process for every change. For example, if a change adds a new column or index to a table, a revert script would remove the respective column or index.
The goal is to always have a defensive position for any change. The concept is that simple; it is not complex.

In its simplest form I use the following directory and file structure.

/schema
    schema.sql
    /patch
        YYYYMMDDXX.sql     where XX,ZZ are sequential 2 digit numbers, e.g. 01,02
        YYYYMMDDZZ.sql
    /revert
        YYYYMMDDXX.sql     This is the same file name in the revert sub-directory.
        YYYYMMDDZZ.sql

At any commit or tag in configuration management it is possible to create a current copy of the schema, i.e. use schema.sql.
It is also possible to take the first version of schema.sql and apply chronologically all the patch scripts to arrive at the same consistent structure of the schema that is in schema.sql. You can also run a validation process to confirm these are equivalent.
For each tagged version or commit of this directory structure and files in version control, this should always hold true.
While not the desired execution path, every revert script can be applied in a reverse chronological order and return to the first version of the schema.
If you want to maintain a first_schema.sql file within the directory structure, you can always create any version of the schema from a given commit in a roll-forward or roll-back scenario.
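
One simple way to implement the validation step mentioned above is to compare a structure-only dump of the patched schema against schema.sql. A sketch, with illustrative database names and assuming client credentials are already configured:

# Build one database from first_schema.sql plus all patches, another from the current schema.sql,
# then compare their structures.
mysql -e "CREATE DATABASE scratch_patched; CREATE DATABASE scratch_current;"
mysql scratch_patched < schema/first_schema.sql
for PATCH in schema/patch/*.sql; do mysql scratch_patched < "${PATCH}"; done
mysql scratch_current < schema/schema.sql

mysqldump --no-data --skip-comments scratch_patched > /tmp/patched.sql
mysqldump --no-data --skip-comments scratch_current > /tmp/current.sql
diff /tmp/patched.sql /tmp/current.sql && echo "schemas are equivalent"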

In reality however this is rarely implemented. There is always divergence or drift. Drift occurs for several primary reasons. The first is non-adherence to the defined process. The second and more critical is the lack of adequate testing and verification at each and every step. A Test Driven Development (TDD) approach that validates the given process would enable verification of the end state of the schema and verification at each accumulated patch step.

In addition to each patch/revert, a record needs to be maintained of what has been applied. Generally, for an RDBMS, storing this metadata within a table is recommended.
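
A minimal sketch of such a metadata table and an apply step, assuming the directory layout above (the table and column names are my own convention):

# Record each applied patch in a schema_version table within the target database
mysql demo -e "CREATE TABLE IF NOT EXISTS schema_version (
                 patch_file VARCHAR(100) NOT NULL PRIMARY KEY,
                 applied_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP)"

# Apply any patch that has not yet been recorded, in chronological (file name) order
for PATCH in schema/patch/*.sql; do
  NAME=$(basename "${PATCH}")
  APPLIED=$(mysql -N demo -e "SELECT COUNT(*) FROM schema_version WHERE patch_file='${NAME}'")
  if [ "${APPLIED}" -eq 0 ]; then
    mysql demo < "${PATCH}" &&
    mysql demo -e "INSERT INTO schema_version (patch_file) VALUES ('${NAME}')"
  fi
done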

The above example shows files with a .sql extension. Any schema management process also needs to cater for .sh, .py or other extensions to support more complex operations.
 
What about data changes?  I would recommend that for all configuration information you follow the same management principles as for schema objects, that is you have a patch to insert/update/delete data, and you have a revert script that can restore that data.  Generally the complexity of the rollback process is a hurdle for developers/engineers. Having a framework is important to manage how data consistency is maintained. This framework could generate a statement to restore the data (e.g. a selective mysqldump), require a hand-crafted statement, or leverage the benefit of the RDBMS by storing the data into intermediate shallow tables.
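
As an example of the generated-statement approach, the revert for a targeted data change can be captured up front with a selective mysqldump. The patch number, table and key values are illustrative:

# Before applying patch 2021021101 (which updates these rows), capture the current rows
# so the revert script can restore them with REPLACE statements.
mysqldump demo customers \
  --no-create-info --replace --where="customer_id IN (101,102,103)" \
  > schema/revert/2021021101.sql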

Using a least privileged model complicates an applicable framework approach. Does the user applying the change now require the FILE privilege, or the CREATE/DROP privilege to create tables, in order to be able to restore data?

If there is strict referential integrity at the database level, those protections will defend against unintended consequences, for example deleting a row that is depended on by a foreign key relationship. In normal operation, accommodations are generally made for the sake of performance, but also to support poor data cleansing requirements. If the application maintains its own level of referential integrity, the schema management process also needs to support this, adding further complexity. Ensuring data integrity is an important, separate topic. If there is a dangling row, what is the impact? The data still exists, it is just not presented in a user interface or included in calculations. This leads to greater unintended consequences that are generally never obvious at the time of execution, but rather days, weeks or months later.
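
Where referential integrity is enforced only in the application, a periodic check for dangling rows is a cheap defensive measure. A sketch with hypothetical parent/child tables:

# Report child rows whose parent no longer exists (i.e. dangling rows)
mysql demo -e "SELECT o.order_id
               FROM orders o
               LEFT JOIN customers c ON c.customer_id = o.customer_id
               WHERE c.customer_id IS NULL"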

When it comes to objects within the structure of an RDBMS the situation is more complex. A classic example in MySQL is a user. A user in MySQL is actually just the user definition, that is the username, password and host. A user has one or more grants, and may be the owner of additional objects. Using default and legacy MySQL, it is simply not possible to determine if a user is actually being used. Percona and other variants support INFORMATION_SCHEMA.USER_STATISTICS, which is a better method of evaluating the use of a user. This does however require time-based data collection, as this table holds the cumulative statistics since an instance restart or flush.

With this type of object, or meta object, several defensive techniques exist.

If you had the user `blargie` and that user had grants to read data from several schemas, is the user used? “I don’t think so, let’s just delete it” is not a fact-based approach to avoiding a subsequent problem.
Is the user used? Let’s revoke the user’s privileges and monitor for errors or user feedback? Or let’s change the user’s password? With each of these strategies it is important to always have a defensive process to roll back.
A different approach is to use a common data technique of marking information as deleted before it’s physically deleted (think trash can before you empty the trash). For MySQL users there is no default functionality (in the most recent versions of MySQL you can lock a user account). One implementation of this pattern is to rename the user, which has the benefit of keeping the user’s password and privileges intact, therefore reducing the complexity of restoring.
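
A sketch of both options against a recent MySQL version; the user and host are from the example above, the rename suffix is my own convention:

# Option 1: lock the account (MySQL 5.7.6+) so logins fail but grants and objects remain
mysql -e "ALTER USER 'blargie'@'%' ACCOUNT LOCK"

# Option 2: 'trash can' the user by renaming it, keeping password and privileges intact
mysql -e "RENAME USER 'blargie'@'%' TO 'deleted_20210211_blargie'@'%'"

# The defensive rollback path for either option
mysql -e "ALTER USER 'blargie'@'%' ACCOUNT UNLOCK"
mysql -e "RENAME USER 'deleted_20210211_blargie'@'%' TO 'blargie'@'%'"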

Regardless of the technique, it is important there is always a recovery path. In a subsequent post I will discuss this approach towards cloud metadata, for example an AWS KMS policy, IAM Role or ASG setting, and the impact of Infrastructure as Code (IaC) tools such as Terraform.

More reading https://en.wikipedia.org/wiki/Test-driven_development, https://en.wikipedia.org/wiki/Defensive_programming

AWS cost saving tips – EBS Volumes

A trivial cost saving tip is to check whether you are spending money in your AWS environment on unused resources. This is especially appropriate when using provisioned IOPS EBS volumes.

$ ec2-describe-volumes | grep available

VOLUME	vol-44dff904	8	snap-d86d0884	us-east-1b	available	2014-08-01T14:11:24+0000	standard
VOLUME	vol-62dff922	100		us-east-1b	available	2014-08-01T14:11:24+0000	io1	1000
VOLUME	vol-15dff955	8	snap-d86d0884	us-east-1b	available	2014-08-01T14:11:24+0000	standard
VOLUME	vol-80a88ec0	8	snap-d86d0884	us-east-1b	available	2014-08-01T15:12:54+0000	standard
VOLUME	vol-ca82a48a	100		us-east-1b	available	2014-08-01T16:13:49+0000	standard
VOLUME	vol-5d79581d	8	snap-d86d0884	us-east-1b	available	2014-08-01T18:27:01+0000	standard
VOLUME	vol-baf9dbfa	8	snap-d86d0884	us-east-1b	available	2014-08-03T18:20:59+0000	standard
VOLUME	vol-53ffdd13	8	snap-d86d0884	us-east-1b	available	2014-08-03T18:25:52+0000	standard
VOLUME	vol-ade7daed	8	snap-d86d0884	us-east-1b	available	2014-08-13T20:10:46+0000	standard
VOLUME	vol-34e2df74	8	snap-065a2e52	us-east-1b	available	2014-08-13T20:26:17+0000	standard
VOLUME	vol-cacef38a	100	snap-280ffb7f	us-east-1b	available	2014-08-13T21:19:18+0000	standard
VOLUME	vol-41350a01	8	snap-f23ccba5	us-east-1b	available	2014-08-14T16:54:27+0000	standard
VOLUME	vol-51350a11	100	snap-fc3ccbab	us-east-1b	available	2014-08-14T16:54:27+0000	standard
VOLUME	vol-912f10d1	8	snap-96ee24c1	us-east-1b	available	2014-08-14T17:15:06+0000	standard
VOLUME	vol-a82f10e8	100	snap-9dee24ca	us-east-1b	available	2014-08-14T17:15:06+0000	standard

These are available and unused EBS volumes which you should consider deleting.
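
The ec2-describe-volumes output above is from the old EC2 API tools; with the current AWS CLI an equivalent check might be:

$ aws ec2 describe-volumes \
    --filters Name=status,Values=available \
    --query 'Volumes[].[VolumeId,Size,VolumeType,CreateTime]' \
    --output table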

Another reason to avoid RDS

My list of reasons for never using or recommending Amazon’s MySQL RDS service grows every time I experience problems with customers. This was an interesting and still unresolved issue.

ERROR 126 (HY000): Incorrect key file for table '/rdsdbdata/tmp/#sql_5b7_1.MYI'; try to repair it

You may think this is a MyISAM table; however, the MySQL database is version 5.5, uses all InnoDB tables and is very small, 100MB in total size.
What is happening is that MySQL is generating an internal temporary table, and this table is being written to disk. I am unable to change the code to improve the query causing this disk I/O.

What I cannot understand, and have no ability to diagnose, is why this error occurs sometimes, and generally when the database is under additional system load. With RDS you have no visibility of the server running the production database. While you have SQL access, an API for managing MySQL configuration options (and I add, not all MySQL variables), and limited system statistics via a graphical interface, all information about system performance, disk configuration etc. is hidden and not accessible. This is a frustrating limitation of using RDS.
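
What little diagnosis is possible from SQL alone is to watch how often implicit temporary tables spill to disk, and what the in-memory limits are. A sketch, with an illustrative endpoint:

# Implicit temporary tables created versus those written to disk, and the size limits
mysql -h demo-instance.xxxx.us-east-1.rds.amazonaws.com -u admin -p \
  -e "SHOW GLOBAL STATUS LIKE 'Created_tmp%';
      SHOW GLOBAL VARIABLES LIKE 'tmp_table_size';
      SHOW GLOBAL VARIABLES LIKE 'max_heap_table_size'"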

NOTE: While I cannot recommend RDS, I am very happy with AWS EC2 services when correctly configured. For a cloud based MySQL solution I would definitely recommend greater control over your MySQL database using EC2 and EBS.

Basic scalability principles to avert downtime

In the press in the last two days has been the reported outage of Amazon Web Services Elastic Compute Cloud (EC2) in just one North Virginia data center. This has affected many large websites including FourSquare, Hootsuite, Reddit and Quora. A detailed list can be found at ec2disabled.com.

For these popular websites was this avoidable? Absolutely.

Basic scalability principles, if deployed in these systems' architecture, would have averted the significant downtime regardless of the development stack. While I work primarily in MySQL these principles are not new, nor are they complicated; they are fundamental concepts in scalability that apply to any technology, including the popular MongoDB that is being used by a number of affected sites.

Scalability 101 involves some simple basic rules. Here are just two that seem to have been ignored by many affected by this recent AWS EC2 outage.

  1. Never put all your eggs in one basket. If you rely on AWS completely, or you rely on just one availability zone, you are putting all your eggs in one basket.
  2. Always keep your important data close to home. When it comes to what is most critical to your business you need access to and control over your information. At 5am when the CEO asks how long the business will be unavailable and what is needed to resolve the problem, the answer “We have no control over this and have no ETA” is not acceptable.

With a successful implementation and appropriate data redundancy you may not have an environment immediately available, however you have access to your important information and the ability to create one quickly. Many large hosting companies can provide additional hardware on near demand, especially if you have an initial minimal footprint. Indeed, using Amazon Web Services (AWS) as a means to avert a data center disaster is an ideal implementation of Infrastructure as a Service (IaaS). Even with this issue, organizations that had planned for this type of outage could have easily migrated to another AWS availability zone that was unaffected.

Furthermore, a system architecture that supports various levels of data availability and scalability ensures you can handle many other types of unavailability without the significant system downtime recently seen. There are many different types of availability and unavailability; know what your definition of downtime is, and make surviving disasters a primary focus of scalability, not an afterthought.

As an expert in performance and scalability I can help your organization in the design of a suitable architecture to support successful scalability and disaster recovery. This is not rocket science, however many organizations gamble without the expertise of a professional to ensure business viability.

Problems compiling MySQL 5.4

It seems that in the year Sun had for improving MySQL, and with an entirely new 5.4 branch, the development team could not fix the autoconf and compile dependencies that have been in MySQL for all the years I’ve been compiling it. Drizzle has got it right, thanks to the great work of Monty Taylor.

I’m working on the Wafflegrid AWS EC2 AMIs for Matt Yonkovit, and while compiling 5.1 was straightforward under Ubuntu 8.10 Intrepid, compiling 5.4 was more complicated.

For MySQL 5.1 I needed only to do the following:

apt-get install -y build-essential
apt-get install libncurses5-dev
./configure
make
make install

For MySQL 5.4, I elected to use the BUILD scripts (based on Wafflegrid recommendations). That didn’t go far before I needed:

apt-get install -y automake libtool

You then have to let MySQL 5.4 compile for 10+ minutes to get an obscure error, and then you need to consider what dependencies may be missing.
I don’t like to do a blanket apt-get of a long list of proposed packages unless I know they are actually needed.

The error was:

make[1]: Entering directory `/src/mysql-5.4.0-beta/sql'
make[1]: warning: -jN forced in submake: disabling jobserver mode.
/bin/bash ../ylwrap sql_yacc.yy y.tab.c sql_yacc.cc y.tab.h sql_yacc.h y.output sql_yacc.output -- -d --verbose
make -j 6 gen_lex_hash
make[2]: Entering directory `/src/mysql-5.4.0-beta/sql'
rm -f mini_client_errors.c
/bin/ln -s ../libmysql/errmsg.c mini_client_errors.c
make[2]: warning: -jN forced in submake: disabling jobserver mode.
rm -f pack.c
../ylwrap: line 111: -d: command not found
/bin/ln -s ../sql-common/pack.c pack.c
....
make[1]: Leaving directory `/src/mysql-5.4.0-beta/sql'
make: *** [all-recursive] Error 1

What a lovely error: ../ylwrap: line 111: -d: command not found

ylwrap is part of the yacc tooling, and by default in this instance it is not even an installed package. I’ve compiled MySQL long enough to know that it requires yacc (and actually bison), but do you think it would hurt if configure told the user this?
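
In this instance the likely missing piece was the parser generator itself; on Ubuntu, installing bison (which provides yacc) is the probable fix:

apt-get install -y bison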

It’s also been some time since I’ve compiled MySQL source, having focused on Drizzle instead. I had forgotten just how many compile warnings MySQL throws. Granted, a warning is not an error, but you should not just ignore them when building a quality product.

Announcing Drizzle on EC2

I have published the very first sharable Drizzle Amazon Machine Image (AMI) for AWS EC2, based on the good feedback from my discussion at the Drizzle Developer Day on what options we should try.

This first version is a 32bit Developer instance, showcasing Drizzle and all necessary developer tools to build Drizzle from source.

What you will find on drizzle-ami/intrepid-dev32 – ami-b858bfd1

Ubuntu 8.10 Intrepid 32 bit base server installation:

  • build tools
  • drizzle dependencies
  • bzr 1.31.1

From the respective source trees the following software is available:

  • drizzle 2009.04.997
  • libdrizzle 0.0.2
  • gearman 0.0.4
  • memcached 1.2.8
  • libmemcached 0.28

Drizzle has been configured with necessary dependencies for PAM authentication, http_auth, libgearman and MD5 but these don’t seem to be available in the binary distribution.

I will be creating additional AMIs, including 64-bit and LAMP-ready binary-only images.

The following example shows using drizzle on this AMI. Some further work is necessary for full automation, parameters and logging. I’ve raised a number of issues the Drizzle Developers are now hard at work on.

1. Starting Drizzle

ssh [email protected]
sudo /etc/init.d/drizzle-server.init start &

2. Testing Drizzle (the sakila database has been installed)

$ drizzle
Welcome to the Drizzle client..  Commands end with ; or \g.
Your Drizzle connection id is 4
Server version: 2009.04.997 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle> select version();
+-------------+
| version()   |
+-------------+
| 2009.04.997 |
+-------------+
1 row in set (0 sec)

drizzle> select count(*) from sakila.film;
+----------+
| count(*) |
+----------+
|     1000 |
+----------+
1 row in set (0 sec)

3. Compiling Drizzle

sudo su - drizzle
ls
deploy  drizzle  libdrizzle  sakila-drizzle
cd drizzle
./configure --help
Description of plugins:

   === HTTP Authentication Plugin ===
  Plugin Name:      auth_http
  Description:      HTTP based authentications
  Supports build:   static and dynamic

   === PAM Authenication Plugin ===
  Plugin Name:      auth_pam
  Description:      PAM based authenication.
  Supports build:   dynamic

   === compression UDFs ===
  Plugin Name:      compression
  Description:      UDF Plugin for compression
  Supports build:   static and dynamic
  Status:           mandatory

   === crc32 UDF ===
  Plugin Name:      crc32
  Description:      UDF Plugin for crc32
  Supports build:   static and dynamic
  Status:           mandatory

   === Error Message Plugin ===
  Plugin Name:      errmsg_stderr
  Description:      Errmsg Plugin that sends messages to stderr.
  Supports build:   dynamic

   === Daemon Example Plugin ===
  Plugin Name:      hello_world
  Description:      UDF Plugin for Hello World.
  Supports build:   dynamic

   === Gearman Logging Plugin ===
  Plugin Name:      logging_gearman
  Description:      Logging Plugin that logs to Gearman.
  Supports build:   dynamic

   === Query Logging Plugin ===
  Plugin Name:      logging_query
  Description:      Logging Plugin that logs all queries.
  Supports build:   static and dynamic
  Status:           mandatory

   === Syslog Logging Plugin ===
  Plugin Name:      logging_syslog
  Description:      Logging Plugin that writes to syslog.
  Supports build:   static and dynamic
  Status:           mandatory

   === MD5 UDF ===
  Plugin Name:      md5
  Description:      UDF Plugin for MD5
  Supports build:   static and dynamic

   === One Thread Per Connection Scheduler ===
  Plugin Name:      multi_thread
  Description:      plugin for multi_thread
  Supports build:   static
  Status:           mandatory

   === Old libdrizzle Protocol ===
  Plugin Name:      oldlibdrizzle
  Description:      plugin for oldlibdrizzle
  Supports build:   static
  Status:           mandatory

   === Pool of Threads Scheduler ===
  Plugin Name:      pool_of_threads
  Description:      plugin for pool_of_threads
  Supports build:   static
  Status:           mandatory

   === Default Signal Handler ===
  Plugin Name:      signal_handler
  Description:      plugin for signal_handler
  Supports build:   static
  Status:           mandatory

   === Single Thread Scheduler ===
  Plugin Name:      single_thread
  Description:      plugin for single_thread
  Supports build:   static
  Status:           mandatory

   === Archive Storage Engine ===
  Plugin Name:      archive
  Description:      Archive Storage Engine
  Supports build:   static
  Status:           mandatory

   === Blackhole Storage Engine ===
  Plugin Name:      blackhole
  Description:      Basic Write-only Read-never tables
  Supports build:   static and dynamic
  Configurations:   max, max-no-ndb

   === CSV Storage Engine ===
  Plugin Name:      csv
  Description:      Stores tables in text CSV format
  Supports build:   static
  Status:           mandatory

   === Memory Storage Engine ===
  Plugin Name:      heap
  Description:      Volatile memory based tables
  Supports build:   static
  Status:           mandatory

   === InnoDB Storage Engine ===
  Plugin Name:      innobase
  Description:      Transactional Tables using InnoDB
  Supports build:   static and dynamic
  Configurations:   max, max-no-ndb
  Status:           mandatory

   === MyISAM Storage Engine ===
  Plugin Name:      myisam
  Description:      Traditional non-transactional MySQL tables
  Supports build:   static
  Status:           mandatory


Report bugs to <http://bugs.launchpad.net/drizzle>.

Setting up MySQL on Amazon Web Services (AWS) Presentation

On Tuesday at the MySQL Camp 2009 in Santa Clara I presented Setting up MySQL on Amazon Web Services (AWS).

This presentation assumed you know nothing about AWS, and have no account. With Internet access via a Browser and a valid Credit Card, you can have your own running Web Server on the Internet in under 10 minutes, just point and click.

We also stepped into more detail, both online point-and-click and with the supplied command line tools, to demonstrate some more advanced usage.

Extending application data to the cloud

I was one of the invited panel speakers at A Panel on Cloud Computing this week in New York. As one of two non-vendor presenters, it was a great experience to be invited and be involved with the vendors.

While I never got to use my slides (available here), I did get to present certain content, and indeed questions and discussions on the night touched on other points of my content.

Cloud computing is here, it’s early days, and new players will continue to emerge. For example, from the panel there was AppNexus, reviewed favorably at InfoWorld in comparison with EC2 and Google App Engine; 10gen, an open source stack solution; and Kaavo, which from an initial 60 seconds of playing provides a management service on top of AWS similar to what ElasticFox provides. I need to investigate further how much the feature set extends and would compete with others like RightScale, for example.

The greatest mystery came from Hank Williams and his stealth Kloudshare. He did elaborate more on where they aim to provide services. A new term discussed was “Tools as a Service”, akin to moving, metaphorically, from writing in assembly language to the advanced frameworks of today’s generation of languages such as Java and Ruby.

Thanks to Murat Aktihanoglu of Unype who chaired the event.

Your data and the cloud

I will be speaking on July 29th in New York at an Entrepreneurs Forum on A Free Panel on Cloud Computing. With a number of experts including Hank Williams of KloudShare, Mike Nolet of AppNexus, and Hans Zaunere of New York PHP fame, it should be a great event.

The focus of my presentation will be on “Extending existing applications to leverage the cloud” where I will be discussing both the advantages of the cloud, and the complexities and issues that you will encounter such as data management, data consistency, loss of control, security and latency for example.

Using traditional MySQL-based applications, I’ll be providing an approach that can lead to your application leveraging the greater power of cloud computing.


About the Author

Ronald Bradford provides Consulting and Advisory Services in Data Architecture, Performance and Scalability for MySQL Solutions. An IT industry professional for two decades with extensive database experience in MySQL, Oracle and Ingres, his expertise covers data architecture, software development, migration, performance analysis and production system implementations. His knowledge from 10 years of consulting across many industry sectors, technologies and countries has provided unique insight into providing solutions to problems. For more information Contact Ronald.

Setting up on EC2

Thanks to my friend Dustin and his EC2 demo using the Elasticfox Firefox Extension for Amazon EC2, I got an EC2 image set up. With other references (Link 1, Link 2, Link 3) I was also able to create my own AMI.

Some notes specific for my configuration.

Pre-config ElasticFox key for launching directly from ElasticFox SSH connections.

mkdir ~/ec2-keys
mv ~/Downloads/elasticfox.pem ~/ec2-keys/id_elasticfox
chmod 600 ~/ec2-keys/id_elasticfox
chmod 700 ~/ec2-keys/
ssh -i /Users/rbradfor/ec2-keys/id_elasticfox [email protected]

Installed Software.

apt-get update
apt-get -y autoremove
apt-get -y install apache2
apt-get -y install mysql-server
# Prompts for password (very annoying)
apt-get -y install php5
apache2ctl graceful
echo "Hello World" > /var/www/index.html
echo "< ? phpinfo() ?>" > /var/www/phpinfo.php

Configuration to save AMI.

scp -i ~/ec2-keys/id_elasticfox ~/ec2-keys/id_elasticfox pk-CHK7DP4475BWUKIUF4WFDIW3VMYDYOHQ.pem cert-CHK7DP4475BWUKIUF4WFDIW3VMYDYOHQ.pem [email protected]:/mnt
ec2-bundle-vol -d /mnt -c cert-CHK7DP4475BWUKIUF4WFDIW3VMYDYOHQ.pem -k pk-CHK7DP4475BWUKIUF4WFDIW3VMYDYOHQ.pem -u AccountNumber -r i386 -p ubuntu804_lamp
ec2-upload-bundle -b rbradford_804_lamp_ami -m /mnt/ubuntu804_lamp.manifest.xml -a AccessID -s SecretKey