#WDILTW – What can I run from my AWS Aurora database

When you work with AWS Aurora you have limited admin privileges. There are some different grants for MySQL including SELECT INTO S3 and LOAD FROM S3 that replace the loss of functionality to SELECT INTO OUTFILE and mysqldump/mysqlimport using a delimited format. While I know and use lambda capabilities, I have never executed anything with INVOKE LAMDBA directly from the database.

This week I found out about INVOKE COMPREHEND (had to look that product up), and INVOKE SAGEMAKER (which I used independently). These are machine learning capabilities that enable you to build custom integrations using Comprehend and SageMaker. I did not have any chance to evaluate these capabilities so I am unable to share any use cases or experiences. There are two built-in comprehend functions aws_comprehend_detect_sentiment() and aws_comprehend_detect_sentiment_confidence(), a likely future starting place. Sagemaker is invoked as an extension of a CREATE FUNCTION that provides the ALIAS AWS_SAGEMAKER_INVOKE_ENDPOINT syntax.

Also available are some MySQL status variables including Aurora_ml_logical_response_cnt, Aurora_ml_actual_request_cnt, Aurora_ml_actual_response_cnt, Aurora_ml_cache_hit_cnt, Aurora_ml_single_request_cnt.

Some googling found an interesting simple example, calculating the positive/negative sentiment and confidence of sentences of text. I could see this as useful for analyzing comments. I’ve included the example from this site here to encourage my readers to take a look as I plan to do. Post IAM configuration I will be really curious to evaluate the responsiveness of this example. Is this truly a batch only operation or could you return some meaningful response timely?

This also lead to bookmarking for reading https://awsauroralabsmy.com/, https://github.com/aws-samples/amazon-aurora-labs-for-mysql/ and https://squidfunk.github.io/mkdocs-material/ all from this one page.

#WDILTW – To use a RDBMS is to use a transaction

I learned this week that 30+ years of Relational Database Management System (RDBMS) experience still does not prepare yourself for the disappointment of working with organizations that use a RDBMS; MySQL specifically; have a released production product, have dozens to hundreds of developers, team leaders and architects, but do not know the importance of, nor use transactions. If I was to ask this when interviewing somebody that would work with a database and the response was it is not important, or not used these days it would be a hard fail.

To use a RDBMS is to understand a very simple principle, a foundation of a transactional system. It is called ACID. Atomicity, Consistency, Isolation, Durability.

In a simplistic description of this 50 year principle.

  • A – It’s all or nothing
  • C – Your work meets all existing constraints
  • I – Your work is independent and not affected by other concurrent work
  • D – Your work is reproducible in the event of a hard failure

A – Atomicity means a transaction. It’s not rocket science, it’s actually in the description of a RDBMS. What is a transaction? It is a means in Structure Query Language (SQL) notation that can ensure your unit of work is all or nothing.

The most basic of examples is very simple. You want to deposit a check that was given to you. For the bank to accept this deposit, it needs to ensure that the account holder that issued the check has sufficient funds, that is Step 1: subtract amount X from account A, and Step 2: deposit in account B. You cannot do Step 1 or Step 2 independently, you must do 1 and 2 together, otherwise real money could be lost or created from thin air. The lack of transactions also can be affected by not handling I – Isolation.

We have to ask our educators and responsible professionals why? Why can such a bedrock principle not be used? Why in a team of individuals in a company, one single individual from the years of development and support not say, well you have to use a transaction to ensure the integrity of the data you are working with? It’s called a database for a reason.

The unfortunately reality is there are several reasons for this utter failure, and the frustration of professionals like myself.

  1. RDBMS is not the bedrock of data management it was 20,30 even 50 years ago. Today with a proliferation of different products, transactions are simply not taught. Even SQL is not taught, yet most products that want to enable users to query data ultimately provide a SQL-like interface. Many users today want a drag & drop GUI interface but not realize that is not how you manipulate data. Hadoop was a life change with map-reduce approach to large datasets. iIt did not start with SQL, but it has a widely used SQL interface now. You look at the newest hot products like snowflake. What is it’s method of data access, SQL.
  2. MySQL, the most popular open source database does not enforce transactions, it’s optional. What you say? MySQL has some unique features including the capability for multiple storage engines, that provide different features and capabilities, such as index approaches, consistency and support for transactions. An entire generation of open source products were released and do not use transactions, but use a transactional storage engine within a transactional product.
  3. Still on the topic of MySQL, it also does not enforce strict handling like other RDBMS products. The default for decades was to enable an SQL statement to corrupt the data integrity, i.e. C – Consistency. Because MySQL supported warnings and they were always not checked for by developers, data in did not guarantee data out.
  4. Every mini-generation (e.g. 5 years) of new developers think they know more, than seasoned professionals.

As it was for me in the 80s, it should be for use of a relational database; and even a non-relational database; to read and understand the seminal works of An introduction to Database Systems by C.J. Date. It has been required learning at universities for decades, but it seems to have lost is favor with multiple iterations of new software developers.

#3 – What Did I learn this week. WDILTW.

#WDILTW – Debugging failed http requests thru the web of redirects

There are reports that your website is down. You pull up the login page without incident. What’s next?

Monitoring is critical. How detailed is this? How frequently are you sampling? The resolution to any issue is only as good as the response to a paged alert. Who is looking into the issue? What escalation exists?

In today’s complex interconnected infrastructure is it ever that simple? When speaking about an AWS hosted solution, is it an AWS Issue? Does status.aws.amazon.com give you a clue? Does the inability to access other services/sites you may be using at this moment give an indicator of a larger problem? Is it AWS related for a service, an availability zone, or even an entire region? Having experienced all of those before sometimes its obvious, sometimes it is not. Or does a Twitter Search report other shared experiences of regional outages, was it that severed Verizon underwater cable?

I learned two things this week in triage of this situation. The first is that the old CLI tools you have been using for 20+ years still help in triage quickly. D not discount them or the detail they provide. I was able to identify and reproduce an underlying cause with just nslookup and curl. For many reviewing the outage the problem did not manifest as an error. It turned out there were two distinct paths from two separate domains to the ultimate target page. This was not immediately obvious and known, and there was no definitive network diagram to describe this.

When this was determined nslookup provided that there were two different resolved AWS ELBs. dig is also a useful command to master, for example to determine if an A record or CNAME for example.

$ nslookup demo.internal-example.com

demo.internal-example.com	canonical name = internal.us-east-1.elb.amazonaws.com.
Name:	 internal.us-east-1.elb.amazonaws.com
Address: 10.10.1.2
Name:	 internal.us-east-1.elb.amazonaws.com
Address: 10.10.0.3
Name:	 internal.us-east-1.elb.amazonaws.com
Address: 10.10.2.4
$ ▶ nslookup demo.public-example.com

Non-authoritative answer:
demo.public-example.com	         canonical name = external.us-east-1.elb.amazonaws.com.
Name:	 external.us-east-1.elb.amazonaws.com
Address: 23.123.111.222
Name:	 external.us-east-1.elb.amazonaws.com
Address: 50.200.211.222

The first indication was actually to find that one of the ELBs was not in the AWS account with all other resources, and this AWS account was not viewable. That is a separate discussion for why? curl then helped to traverse the various redirects of each ELB using these options

  • -i/–include – Include the headers
  • -k/–insecure – Allow insecure SSL connections
  • -L/–location – Follow redirects
$ curl -ikL external.us-east-1.elb.amazonaws.com
HTTP/1.1 301 Moved Permanently
Server: awselb/2.0
Date: Thu, 11 Feb 2021 20:34:47 GMT
Content-Type: text/html
Content-Length: 134
Location: https://external.us-east-1.elb.amazonaws.com:443/
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
Age: 0

HTTP/1.1 200 Connection established

HTTP/2 302
date: Thu, 11 Feb 2021 20:34:48 GMT
content-length: 0
location: http://demo.unavailable.com
cache-control: no-cache

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 2071
Date: Thu, 11 Feb 2021 19:09:29 GMT
Last-Modified: Tue, 18 Dec 2018 05:32:31 GMT
Accept-Ranges: bytes
Server: AmazonS3
X-Cache: Hit from cloudfront
Via: 1.1 44914fa6421b789193cec8998428f8bd.cloudfront.net (CloudFront)
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
Age: 1071

<html

Using these commands was nothing new, however identifying this single line provided a way to isolate within the chain of redirects where to focus.

content-length: 0

Ultimately the issue was not ELB related, but internal infrastructure behind this one ELB. When corrected the result was (trimmed for readability)

$ curl -ikL external.us-east-1.elb.amazonaws.com
HTTP/1.1 301 Moved Permanently
Server: awselb/2.0
Date: Thu, 11 Feb 2021 20:37:18 GMT
Content-Type: text/html
Content-Length: 134
Location: https://external.us-east-1.elb.amazonaws.com:443/
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
Age: 0

HTTP/1.1 200 Connection established

HTTP/2 302
date: Thu, 11 Feb 2021 20:37:18 GMT
content-type: text/plain; charset=utf-8
content-length: 27
x-powered-by: 
location: /redirect
vary: Accept

HTTP/2 301
date: Thu, 11 Feb 2021 20:37:18 GMT
content-type: text/html
content-length: 162
location: /redirect/

HTTP/2 200
date: Thu, 11 Feb 2021 20:37:18 GMT
content-type: text/html
content-length: 2007
last-modified: Tue, 02 Feb 2021 03:27:13 GMT
vary: Accept-Encoding

<html>
  <head>

In summary, and a means to triage a future problem, or to monitor:

Failure success
$ egrep -i "^HTTP|^Content-Length" 

HTTP/1.1 301 Moved Permanently
Content-Length: 134
HTTP/1.1 200 Connection established
HTTP/2 302
content-length: 0
HTTP/1.1 200 OK
Content-Length: 2071


$ egrep -i "^HTTP|^Content-Length"

HTTP/1.1 301 Moved Permanently
Content-Length: 134
HTTP/1.1 200 Connection established
HTTP/2 302
content-length: 27
HTTP/2 301
content-length: 162
HTTP/2 200
content-length: 2007

With the proliferation of GUI based monitoring products it is likely for many organizations that multiple different monitors are available, but are they triggered, and do they enable you to pinpoint the underlying issue? Long gone are the days of a Pingdom type ping of a URL from multiple locations every minute and a report of latency or errors then you start digging. This week I learned about DataDog Synthetic Monitoring. DataDog is a well established monitoring solution that I have only just started to understand, I wish I had a year to master to delving into.

In later review this monitoring showed an already configured browser test for this top level URL that was failing, it was simply not alerting correctly. The Synthetic monitoring is far more advanced, providing an ITTT workflow, and even provides physical images of the rendered pages.

This experience highlighted the need to have detailed and redundant monitoring but also the right process to triage and drill down.

I looked into trying to provide an example of this DataDog feature, however the free tier monitoring solution does not provide all the advanced features for the evaluation I’d like. You can look at some product examples.

Observability is a key tool in any operations management. It should be one of the pillars where a continued investment of time, resources and skills development can add significant value for business continuity.

#WDILTW – AWS RDS Proxy

This week I was evaluating AWS RDS Proxy. If you are familiar with the Relational Database Service (RDS) and use MySQL or Postgres, this is an additional option to consider.

Proxies in general by the name accept incoming requests and perform some management before those requests are forwarded to the ultimate target.

RDS proxy takes incoming database connections and can perform several capabilities including collection pooling and capping the total database connections with each configured proxy holding a percentage of the total connections for the target cluster. The proxy can handle routing only for writer instances (at this time) to minimize a planned or unplanned failover. The RDS proxy however does not address the underlying problem of too many connections to the database, it just adds another layer, that is or may be more configurable or tunable than an application requesting connections.

The RDS Proxy is automatically Highly Available (HA). You can determine this by looking at the host IPs of the MySQL processlist. I have yet to identify any other means of seeing if a connection is a proxy connection at the database level if you are using the same credentials. RDS Proxy does give you the ability via Secrets Manager to connect as a different user. You can specify a connection initialization query. I used a SET variable so that application could determine if it was using a Proxy however that is of little benefit in server connection management.

The RDS proxy can enforce TLS, something which in my opinion should always be used for application to data store communications, but historically has been overlooked at practically every company I have worked for or consulted to. Just because you are communicating within a VPC does not protect your communications from actors within your VPC. I can remember at a prior employment the disappointment of cross-region replication that was encrypted being dropped because it was too hard to migrate or manage. That shows an all too common problem of laziness over security.

If you are new to a particular technology the age of the Internet gives you search capabilities to find numerous articles. If you search for anything AWS you will generally always get as the top results the official pages, it takes some digging to find other articles. Prior to this lesson I had only read about RDS Proxy, I had never actually setup one.

When anybody is learning something new, I like to say your value add is not to just read an article, but reproduce and then adapt or enhance. This Amazon example is no different. Repeating each step showed multiple errors in syntax which I can contribute back as comments. If this was open source code, you could contribute a pull request (PR). The good news is the first example of configuring a proxy includes by GUI and CLI commands. I always like to do my work on the command line, even the first iteration. You cannot scale a human moving a mouse around and clicking. What I found however was that the official AWS CLI lacked a key component of the proxy setup around group targets. The UI provides a capability that the CLI did not. Another discrepancy was when I was making modifications to the proxy in the GUI I would get an error, but I could make that change via the CLI. These discrepancies are an annoyance for consistency and first evaluation.

So what was the outcome of my evaluation? First I was able to demonstrate I could add a proxy to an existing cluster in one of our test environments and direct traffic from a mysql client thru the proxy to the target database. I was able to use Secrets Manager (SSM) to enforce credentials for authorization. I did not look into Identity Access Management (IAM) roles support. I was able to benchmark with sysbench simulated load to compare latency of the proxy traffic versus direct traffic. I have simplified my examples so that anybody can run these tests themselves for simple validation.

I could enforce TLS communications for the mysql client testing, however our company internal http proxy caused the usual self signed certificate issues with sysbench, something I really need to master. Surprisingly I looked at what options sysbench gave me for SSL options (side bar we should always refer to this as TLS instead of SSL), but the defined options for the installed recent version are still using the ssl name. The scope of options differed from the source code online so a question as to why? That’s the great thing about open source, you can read the code. You may have even met the author at a conference presentation.

Where the evaluation hit a business impact was in comparative performance. I am still awaiting an AWS support response to my evaluation.

What’s next is to get an application team to evaluate end to end database operations, easily done as Route 53 DNS is used for endpoint communications.
Where I got stuck was incorporating the setup of RDS proxy within Terraform We currently use version 12. While there was the aws_db_proxy module, I needed an updated version of the aws provider to our environment. The official Hashicorp documentation of the resource really does not highlight the complexity necessary to create a proxy. While you will have already configured a VPC, and subnets, even Ingres security groups and secrets which all parts necessary for RDS cluster, you need a number of integrated pieces.

You will need an IAM role for your proxy, but that role requires a policy to use KMS to get the secrets you wish to use for authorization. This interdependency of KMS and secret ARNs make is difficult to easily launch a RDS proxy as you would an RDS aurora cluster. Still it’s a challenge for something else to do. The added complexity is the RDS proxy also needs an authorization argument, for example the –auth argument in the AWS CLI. I see this as a complexity for management of RDS users that you wish to also be configured for use in the proxy.

As with any evaluation or proof of concept (POC) the devil is in the details. How do you monitor your new resources, what logging is important to know, what types of errors can happen, and how do you address these.

Another issue I had was the RDS proxy required a new version of the AWS client in order to run RDS commands such as describe-db-proxies. That adds an additional administrative dependency to be rolled out.

Proxies for MySQL have been around for decades, I can remember personally working on the earliest version of MySQL Proxy at MySQL Inc back in 2007. The gold standard if you use MySQL, is ProxySQL by Sysown’s René Cannaò. This is a topic for a different discussion.

Checkout my code for this work.

Reading