#WDILTW – Creating examples can be hard

This week I was evaluating AWS QLDB. Specifically the verifiable history of changes to determine how to simplify present processes that perform auditing via CDC. This is not the first time I have looked at QLDB so there was nothing that new to learn.

What I found was that creating a workable solution with an existing application is hard. Even harder is creating an example to publish in this blog (and the purpose of this post).

First some background.

Using MySQL as the source of information, how can you leverage QLDB? It’s easy to stream data from MySQL Aurora, and it’s easy to stream data from QLDB, but it not that easy to place real-time data into QLDB. AWS DMS is a good way to move data from a source to a target, previously my work has included MySQL to MySQL, MySQL to Redshift, and MySQL to Kinesis, however there is no QLDB target.

Turning the problem upside down, and using QLDB as the source of information, and streaming to MySQL for compatibility seemed a way forward.

After setting up the QLDB Ledger and an example table, it was time to populate with existing data. The documented reference example looked very JSON compatible. Side bar, it is actually Amazon Ion a superset of JSON.

INSERT INTO Person
<< {
    'FirstName' : 'Raul',
    'LastName' : 'Lewis',
    'DOB' : `1963-08-19T`,
    'GovId' : 'LEWISR261LL',
    'GovIdType' : 'Driver License',
    'Address' : '1719 University Street, Seattle, WA, 98109'
},
{
    'FirstName' : 'Brent',
    'LastName' : 'Logan',
    'DOB' : `1967-07-03T`,
    'GovId' : 'LOGANB486CG',
    'GovIdType' : 'Driver License',
    'Address' : '43 Stockert Hollow Road, Everett, WA, 98203'
}

Now, MySQL offers with the X Protocol. This is something that lefred has evangelized for many years, I have seen presented many times, but finally I had a chance to use. The MySQL Shell JSON output looked ideal.

{
    "ID": 1523,
    "Name": "Wien",
    "CountryCode": "AUT",
    "District": "Wien",
    "Info": {
        "Population": 1608144
    }
}
{
    "ID": 1524,
    "Name": "Graz",
    "CountryCode": "AUT",
    "District": "Steiermark",
    "Info": {
        "Population": 240967
    }
}

And now, onto some of the things I learned this week.
Using AWS RDS Aurora MySQL is the first stumbling block, X Protocol is not supported. As this was a example, simple, mysqldump some reference data and load it into a MySQL 8 instance, and extract into JSON, so as to potentially emulate a pipeline.

Here is my experiences of trying to refactor into a demo to write up.

Launch a MySQL Docker container as per my standard notes. Harmless, right?

MYSQL_ROOT_PASSWORD="$(date | md5sum | cut -c1-20)#"
echo $MYSQL_ROOT_PASSWORD
docker run --name=qldb-mysql -p3306:3306 -v mysql-volume:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=$MYSQL_ROOT_PASSWORD -d mysql/mysql-server:latest
docker logs qldb-mysql
docker exec -it qldb-mysql /bin/bash

As it's a quick demo, I shortcut credentials to make using the mysql client easier. NOTE: as I always generate a new password each container, it's included here.

# echo "[mysql]
user=root
password='ab6ea7b0436cbc0c0d49#' > .my.cnf

# mysql 
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)

What the? Did I make a mistake, I test manually and check

# mysql -u root -p

# cat .my.cnf

Nothing wrong there. Next check

# pwd
/
bash-4.2# grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

And there is the first Dockerism. I don't live in Docker, so these 101 learnings would be known. First I really thing using "root" by default is a horrible idea. And when you shell in, you are not dropped into the home directory? Solved, we move on.

# mv /.my.cnf /root/.my.cnf

Mock and example as quickly as I can think.

# mysql

mysql> create schema if not exists demo;
Query OK, 1 row affected (0.00 sec)

mysql> use demo;
Database changed
mysql> create table sample(id int unsigned not null auto_increment, name varchar(30) not null, location varchar(30) not null, domain varchar(50) null, primary key(id));
Query OK, 0 rows affected (0.03 sec)
mysql> show create table sample;

mysql> insert into sample values (null,'Demo Row','USA',null), (null,'Row 2','AUS','news.com.au'), (null,'Kiwi','NZ', null);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from sample;
+----+----------+----------+-------------+
| id | name     | location | domain      |
+----+----------+----------+-------------+
|  1 | Demo Row | USA      | NULL        |
|  2 | Row 2    | AUS      | news.com.au |
|  3 | Kiwi     | NZ       | NULL        |
+----+----------+----------+-------------+
3 rows in set (0.00 sec)

Cool, now to look at it in Javascript using MySQL Shell. Hurdle 2.

# mysqlsh
MySQL Shell 8.0.22

Copyright (c) 2016, 2020, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

 MySQL  JS > var session=mysqlx.getSession('root:ab6ea7b0436cbc0c0d49#@localhost')
mysqlx.getSession: Argument #1: Invalid URI: Illegal character [#] found at position 25 (ArgumentError)


What the, it doesn't like the password format. I'm not a Javascript person, and well this is an example for blogging, which is not what was actually setup, so do it the right way, create a user.

# mysql

mysql> create user demo@localhost identified by 'qldb';
Query OK, 0 rows affected (0.01 sec)

mysql> grant ALL ON sample.* to demo@localhost;
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> SHOW GRANTS FOR demo@localhost;
+----------------------------------------------------------+
| Grants for demo@localhost                                |
+----------------------------------------------------------+
| GRANT USAGE ON *.* TO `demo`@`localhost`                 |
| GRANT ALL PRIVILEGES ON `sample`.* TO `demo`@`localhost` |
+----------------------------------------------------------+
2 rows in set (0.00 sec)

Back into the MySQL Shell, and hurdle 3.

MySQL  JS > var session=mysqlx.getSession('demo:qldb@localhost')
mysqlx.getSession: Access denied for user 'demo'@'127.0.0.1' (using password: YES) (MySQL Error 1045)

Did I create the creds wrong, verify. No my password is correct.

#  mysql -udemo -pqldb -e "SELECT NOW()"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------------+
| NOW()               |
+---------------------+
| 2021-03-06 23:15:26 |
+---------------------+

I don't have time to debug this, User take 2.

mysql> drop user demo@localhost;
Query OK, 0 rows affected (0.00 sec)

mysql> create user demo@'%' identified by 'qldb';
Query OK, 0 rows affected (0.01 sec)

mysql> grant all on demo.* to demo@'%'
    -> ;
Query OK, 0 rows affected (0.00 sec)

mysql> show grants;
+--
| Grants for root@localhost                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
+---
| GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, SHUTDOWN, PROCESS, FILE, REFERENCES, INDEX, ALTER, SHOW DATABASES, SUPER, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER, CREATE TABLESPACE, CREATE ROLE, DROP ROLE ON *.* TO `root`@`localhost` WITH GRANT OPTION                                                                                                                                                                                                                    |
| GRANT APPLICATION_PASSWORD_ADMIN,AUDIT_ADMIN,BACKUP_ADMIN,BINLOG_ADMIN,BINLOG_ENCRYPTION_ADMIN,CLONE_ADMIN,CONNECTION_ADMIN,ENCRYPTION_KEY_ADMIN,FLUSH_OPTIMIZER_COSTS,FLUSH_STATUS,FLUSH_TABLES,FLUSH_USER_RESOURCES,GROUP_REPLICATION_ADMIN,INNODB_REDO_LOG_ARCHIVE,INNODB_REDO_LOG_ENABLE,PERSIST_RO_VARIABLES_ADMIN,REPLICATION_APPLIER,REPLICATION_SLAVE_ADMIN,RESOURCE_GROUP_ADMIN,RESOURCE_GROUP_USER,ROLE_ADMIN,SERVICE_CONNECTION_ADMIN,SESSION_VARIABLES_ADMIN,SET_USER_ID,SHOW_ROUTINE,SYSTEM_USER,SYSTEM_VARIABLES_ADMIN,TABLE_ENCRYPTION_ADMIN,XA_RECOVER_ADMIN ON *.* TO `root`@`localhost` WITH GRANT OPTION |
| GRANT PROXY ON ''@'' TO 'root'@'localhost' WITH GRANT OPTION                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+---
3 rows in set (0.00 sec)

mysql> show grants for demo@'%';
+--------------------------------------------------+
| Grants for demo@%                                |
+--------------------------------------------------+
| GRANT USAGE ON *.* TO `demo`@`%`                 |
| GRANT ALL PRIVILEGES ON `demo`.* TO `demo`@`%`   |
+--------------------------------------------------+
2 rows in set (0.00 sec)

Right, initially I showed grants of not new user, but note to self, I should checkout the MySQL 8 Improved grants. I wonder how RDS MySQL 8 handles these, and how Aurora MySQL 8 will (when it ever drops, that's another story).

Third try is a charm, so nice to also see queries with 0.0000 execution granularity.

 MySQL  JS > var session=mysqlx.getSession('demo:qldb@localhost')
 MySQL  JS > var sql='SELECT * FROM demo.sample'
 MySQL  JS > session.sql(sql)
+----+----------+----------+-------------+
| id | name     | location | domain      |
+----+----------+----------+-------------+
|  1 | Demo Row | USA      | NULL        |
|  2 | Row 2    | AUS      | news.com.au |
|  3 | Kiwi     | NZ       | NULL        |
+----+----------+----------+-------------+
3 rows in set (0.0006 sec)

Get that now in JSON output. NOTE: There are 3 different JSON formats, this matched what I needed.

bash-4.2# mysqlsh
MySQL Shell 8.0.22

Copyright (c) 2016, 2020, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.
 MySQL  JS > var session=mysqlx.getSession('demo:qldb@localhost')
 MySQL  JS > var sql='SELECT * FROM demo.sample'
 MySQL  JS > shell.options.set('resultFormat','json/array')
 MySQL  JS > session.sql(sql)
[
{"id":1,"name":"Demo Row","location":"USA","domain":null},
{"id":2,"name":"Row 2","location":"AUS","domain":"news.com.au"},
{"id":3,"name":"Kiwi","location":"NZ","domain":null}
]
3 rows in set (0.0006 sec)

Ok, that works in interactive interface, I need it scripted.

# vi
bash: vi: command not found
# yum install vi
Loaded plugins: ovl
http://repo.mysql.com/yum/mysql-connectors-community/el/7/x86_64/repodata/repomd.xml: [Errno 14] HTTP Error 403 - Forbidden
Trying other mirror.
...

And another downer of Docker containers, other tools or easy ways to install them, again I want to focus on the actual example, and not all this preamble, so

# echo "var session=mysqlx.getSession('demo:qldb@localhost')
var sql='SELECT * FROM demo.sample'
shell.options.set('resultFormat','json/array')
session.sql(sql)" > dump.js


# mysqlsh < dump.js

What the? Hurdle 4. Did I typo this as well, I check the file, and cut/paste it and get what I expect.

# cat dump.js
var session=mysqlx.getSession('demo:qldb@localhost')
var sql='SELECT * FROM demo.sample'
shell.options.set('resultFormat','json/array')
session.sql(sql)
# mysqlsh
MySQL Shell 8.0.22

Copyright (c) 2016, 2020, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.
 MySQL  JS > var session=mysqlx.getSession('demo:qldb@localhost')
 MySQL  JS > var sql='SELECT * FROM demo.sample'
 MySQL  JS > shell.options.set('resultFormat','json/array')
 MySQL  JS > session.sql(sql)
[
{"id":1,"name":"Demo Row","location":"USA","domain":null},
{"id":2,"name":"Row 2","location":"AUS","domain":"news.com.au"},
{"id":3,"name":"Kiwi","location":"NZ","domain":null}
]
3 rows in set (0.0022 sec)

This is getting crazy.

# echo '[
> {"id":1,"name":"Demo Row","location":"USA","domain":null},
> {"id":2,"name":"Row 2","location":"AUS","domain":"news.com.au"},
> {"id":3,"name":"Kiwi","location":"NZ","domain":null}
> ]' > sample.json
bash-4.2# jq . sample.json
bash: jq: command not found

Oh the docker!!!!. Switching back to my EC2 instance now.

$ echo '[
> {"id":1,"name":"Demo Row","location":"USA","domain":null},
> {"id":2,"name":"Row 2","location":"AUS","domain":"news.com.au"},
> {"id":3,"name":"Kiwi","location":"NZ","domain":null}
> ]' > sample.json
$ jq . sample.json
[
  {
    "id": 1,
    "name": "Demo Row",
    "location": "USA",
    "domain": null
  },
  {
    "id": 2,
    "name": "Row 2",
    "location": "AUS",
    "domain": "news.com.au"
  },
  {
    "id": 3,
    "name": "Kiwi",
    "location": "NZ",
    "domain": null
  }
]

I am now way of the time I would like to spend on this weekly post, and it's getting way to long, and I'm nowhere near showing what I actually want. Still we trek on.

Boy, this stock EC2 image uses version 1, we need I'm sure V2, and well command does not work!!!!

$  aws qldb list-ledgers
ERROR:
$ aws --version

$ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
$ unzip awscliv2.zip
$ sudo ./aws/install
$ export PATH=/usr/local/bin:$PATH
$ aws --version

Can I finally get a ledger now.

$ aws qldb create-ledger --name demo --tags JIRA=DEMO-5826,Owner=RonaldBradford --permissions-mode ALLOW_ALL --no-deletion-protection
 
{
    "Name": "demo",
    "Arn": "arn:aws:qldb:us-east-1:999:ledger/demo",
    "State": "CREATING",
    "CreationDateTime": "2021-03-06T22:46:41.760000+00:00",
    "DeletionProtection": false
}

$  aws qldb list-ledgers

{
    "Ledgers": [
        {
            "Name": "xx",
            "State": "ACTIVE",
            "CreationDateTime": "2021-03-05T20:12:44.611000+00:00"
        },
        {
            "Name": "demo",
            "State": "ACTIVE",
            "CreationDateTime": "2021-03-06T22:46:41.760000+00:00"
        }
    ]
}

$ aws qldb describe-ledger --name demo
{
    "Name": "demo",
    "Arn": "arn:aws:qldb:us-east-1:999:ledger/demo",
    "State": "ACTIVE",
    "CreationDateTime": "2021-03-06T22:46:41.760000+00:00",
    "DeletionProtection": false
}

Oh the Python 2, and the lack of user packaging, more crud of getting an example.

$ pip install pyqldb==3.1.0
ERROR

$ echo "alias python=python3
alias pip=pip3" >> ~/.bash_profile
source ~/.bash_profile
$ pip --version
pip 9.0.3 from /usr/lib/python3.6/site-packages (python 3.6)

$ python --version
Python 3.6.8

$ pip install pyqldb==3.1.0

ERROR

$ sudo pip install pyqldb==3.1.0

Yeah!, after all that, my example code works and data is inserted.

$ cat demo.py
from pyqldb.config.retry_config import RetryConfig
from pyqldb.driver.qldb_driver import QldbDriver

# Configure retry limit to 3
retry_config = RetryConfig(retry_limit=3)

# Initialize the driver
print("Initializing the driver")
qldb_driver = QldbDriver("demo", retry_config=retry_config)


def create_table(transaction_executor, table):

    print("Creating table {}".format(table))
    transaction_executor.execute_statement("Create TABLE {}".format(table))

def create_index(transaction_executor, table, column):
    print("Creating index {}.{}".format(table, column))
    transaction_executor.execute_statement("CREATE INDEX ON {}({})".format(table,column))


def insert_record(transaction_executor, table, values):
    print("Inserting into {}".format(table))
    transaction_executor.execute_statement("INSERT INTO {} ?".format(table),  values)


table="sample"
column="id"
qldb_driver.execute_lambda(lambda executor: create_table(executor, table))
qldb_driver.execute_lambda(lambda executor: create_index(executor, table, column))


record1 = { 'id': "1",
            'name': "Demo Row",
            'location': "USA",
            'domain':  ""
        }

qldb_driver.execute_lambda(lambda x: insert_record(x, table, record1))
$ python demo.py
Initializing the driver
Creating table sample
Creating index sample.id
Inserting into sample

One vets in the AWS Console, but you cannot show that in text in this blog, so goes to find a simple client and there is qldbshell

What the? I installed it and it complains about pyqldb.driver.pooled_qldb_driver. I literally used that in the last example.

$ pip3 install qldbshell
Collecting qldbshell
  Downloading PermissionError: [Errno 13] Permission denied: '/usr/local/lib/python3.6/site-packages/amazon.ion-0.7.0-py3.6-nspkg.pth' -> '/tmp/pip-p8j4d45d-uninstall/usr/local/lib/python3.6/site-packages/amazon.ion-0.7.0-py3.6-nspkg.pth'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/lib/python3.6/site-packages/pip/commands/install.py", line 365, in run
    strip_file_prefix=options.strip_file_prefix,
  File "/usr/lib/python3.6/site-packages/pip/req/req_set.py", line 783, in install
    requirement.uninstall(auto_confirm=True)
  File "/usr/lib/python3.6/site-packages/pip/req/req_install.py", line 754, in uninstall
    paths_to_remove.remove(auto_confirm)
  File "/usr/lib/python3.6/site-packages/pip/req/req_uninstall.py", line 115, in remove
    renames(path, new_path)
  File "/usr/lib/python3.6/site-packages/pip/utils/__init__.py", line 267, in renames
    shutil.move(old, new)
  File "/usr/lib64/python3.6/shutil.py", line 565, in move
    os.unlink(src)
PermissionError: [Errno 13] Permission denied: '/usr/local/lib/python3.6/site-packages/amazon.ion-0.7.0-py3.6-nspkg.pth'
[centos@ip-10-204-101-224] ~
$ sudo pip3 install qldbshell
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Collecting qldbshell
Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/local/lib/python3.6/site-packages (from botocore<1.21.0,>=1.20.21->boto3>=1.9.237->qldbshell)
Installing collected packages: amazon.ion, qldbshell
  Found existing installation: amazon.ion 0.7.0
    Uninstalling amazon.ion-0.7.0:
      Successfully uninstalled amazon.ion-0.7.0
  Running setup.py install for amazon.ion ... done
  Running setup.py install for qldbshell ... done
Successfully installed amazon.ion-0.5.0 qldbshell-1.2.0


$ sudo pip3 install qldbshell

$ qldbshell
Traceback (most recent call last):
  File "/usr/local/bin/qldbshell", line 11, in 
    load_entry_point('qldbshell==1.2.0', 'console_scripts', 'qldbshell')()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 476, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2700, in load_entry_point
    return ep.load()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2318, in load
    return self.resolve()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2324, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/local/lib/python3.6/site-packages/qldbshell/__main__.py", line 25, in 
    from pyqldb.driver.pooled_qldb_driver import PooledQldbDriver
ModuleNotFoundError: No module named 'pyqldb.driver.pooled_qldb_driver'
$ pip list qldbshell
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
amazon.ion (0.5.0)
boto3 (1.17.21)
botocore (1.20.21)
ionhash (1.1.0)
jmespath (0.10.0)
pip (9.0.3)
prompt-toolkit (3.0.16)
pyqldb (3.1.0)
python-dateutil (2.8.1)
qldbshell (1.2.0)
s3transfer (0.3.4)
setuptools (39.2.0)
six (1.15.0)
urllib3 (1.26.3)

So, uninstalled and re-installed and voila, my data.

$ qldbshell
usage: qldbshell [-h] [-v] [-s QLDB_SESSION_ENDPOINT] [-r REGION] [-p PROFILE]
                 -l LEDGER
qldbshell: error: the following arguments are required: -l/--ledger
$ qldbshell -l demo

Welcome to the Amazon QLDB Shell version 1.2.0
Use 'start' to initiate and interact with a transaction. 'commit' and 'abort' to commit or abort a transaction.
Use 'start; statement 1; statement 2; commit; start; statement 3; commit' to create transactions non-interactively.
Use 'help' for the help section.
All other commands will be interpreted as PartiQL statements until the 'exit' or 'quit' command is issued.

qldbshell >

qldbshell > SELECT * FROM sample;                                                                                                                           
INFO:
{
 id: "1",
 name: "Demo Row",
 location: "USA",
 domain: ""
}
INFO: (0.1718s)

qldbshell > \q                                                                                                                                              
WARNING: Error while executing query: An error occurred (BadRequestException) when calling the SendCommand operation: Lexer Error: at line 1, column 1: invalid character at, '\' [U+5c];
INFO: (0.1134s)
qldbshell > exit                                                                                                                                            
Exiting QLDB Shell

Right \q is a mysqlism of the client, need to rewire myself.

Now, I have a ledger, I created an example table, mocked a row of data and verified. Now I can just load my sample data in JSON I created earlier right? Wrong!!!

$ cat load.py
import json
from pyqldb.config.retry_config import RetryConfig
from pyqldb.driver.qldb_driver import QldbDriver

# Configure retry limit to 3
retry_config = RetryConfig(retry_limit=3)

# Initialize the driver
print("Initializing the driver")
qldb_driver = QldbDriver("demo", retry_config=retry_config)

def insert_record(transaction_executor, table, values):
  print("Inserting into {}".format(table))
  transaction_executor.execute_statement("INSERT INTO {} ?".format(table),  values)


table="sample"

with open('sample.json') as f:
  data=json.load(f)

qldb_driver.execute_lambda(lambda x: insert_record(x, table, data))

$ python load.py
Traceback (most recent call last):
  File "load.py", line 2, in 
    from pyqldb.config.retry_config import RetryConfig
ModuleNotFoundError: No module named 'pyqldb'
[centos@ip-10-204-101-224] ~

Oh sweet, I'd installed that, and used it, and re-installed it.

$ pip list | grep pyqldb
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
[centos@ip-10-204-101-224] ~
$ sudo pip3 install pyqldb
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Collecting pyqldb
Requirement already satisfied: boto3<2,>=1.16.56 in /usr/local/lib/python3.6/site-packages (from pyqldb)
Requirement already satisfied: botocore<2,>=1.19.56 in /usr/local/lib/python3.6/site-packages (from pyqldb)
Requirement already satisfied: ionhash<2,>=1.1.0 in /usr/local/lib/python3.6/site-packages (from pyqldb)
Requirement already satisfied: six in /usr/local/lib/python3.6/site-packages (from amazon.ion<1,>=0.7.0->pyqldb)
Requirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /usr/local/lib/python3.6/site-packages (from boto3<2,>=1.16.56->pyqldb)
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.6/site-packages (from boto3<2,>=1.16.56->pyqldb)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.6/site-packages (from botocore<2,>=1.19.56->pyqldb)
Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/local/lib/python3.6/site-packages (from botocore<2,>=1.19.56->pyqldb)
Installing collected packages: amazon.ion, pyqldb
  Found existing installation: amazon.ion 0.5.0
    Uninstalling amazon.ion-0.5.0:
      Successfully uninstalled amazon.ion-0.5.0
  Running setup.py install for amazon.ion ... done
  Running setup.py install for pyqldb ... done
Successfully installed amazon.ion-0.7.0 pyqldb-3.1.0

Load one more time.


$ cat load.py
import json
from pyqldb.config.retry_config import RetryConfig
from pyqldb.driver.qldb_driver import QldbDriver

# Configure retry limit to 3
retry_config = RetryConfig(retry_limit=3)

# Initialize the driver
print("Initializing the driver")
qldb_driver = QldbDriver("demo", retry_config=retry_config)

def insert_record(transaction_executor, table, values):
  print("Inserting into {}".format(table))
  transaction_executor.execute_statement("INSERT INTO {} ?".format(table),  values)


table="sample"

with open('sample.json') as f:
  data=json.load(f)

qldb_driver.execute_lambda(lambda x: insert_record(x, table, data))

$ python load.py
Initializing the driver
Inserting into sample

And done, I've got my JSON extracted MySQL 8 data in QLDB. I go to vett it in the client, and boy, didn't expect yet another package screw up. Clearly, these 2 AWS python packages are incompatible. That's a venv need, but I'm now at double my desired time to show this.

$ qldbshell -l demo
Traceback (most recent call last):
  File "/usr/local/bin/qldbshell", line 11, in 
    load_entry_point('qldbshell==1.2.0', 'console_scripts', 'qldbshell')()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 476, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2700, in load_entry_point
    return ep.load()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2318, in load
    return self.resolve()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2324, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/local/lib/python3.6/site-packages/qldbshell/__main__.py", line 25, in 
    from pyqldb.driver.pooled_qldb_driver import PooledQldbDriver
ModuleNotFoundError: No module named 'pyqldb.driver.pooled_qldb_driver'
[centos@ip-10-204-101-224] ~
$ pip list | grep qldbshell
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
qldbshell (1.2.0)


$ sudo pip uninstall qldbshell pyqldb

$ sudo pip install qldbshell
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Collecting qldbshell
  Downloading Requirement already satisfied: boto3>=1.9.237 in /usr/local/lib/python3.6/site-packages (from qldbshell)
Requirement already satisfied: amazon.ion<0.6.0,>=0.5.0 in /usr/local/lib/python3.6/site-packages (from qldbshell)
Requirement already satisfied: prompt_toolkit<3.1.0,>=3.0.5 in /usr/local/lib/python3.6/site-packages (from qldbshell)
Requirement already satisfied: ionhash~=1.1.0 in /usr/local/lib/python3.6/site-packages (from qldbshell)
Requirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /usr/local/lib/python3.6/site-packages (from boto3>=1.9.237->qldbshell)
Requirement already satisfied: botocore<1.21.0,>=1.20.21 in /usr/local/lib/python3.6/site-packages (from boto3>=1.9.237->qldbshell)
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.6/site-packages (from boto3>=1.9.237->qldbshell)
Requirement already satisfied: six in /usr/local/lib/python3.6/site-packages (from amazon.ion<0.6.0,>=0.5.0->qldbshell)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.6/site-packages (from prompt_toolkit<3.1.0,>=3.0.5->qldbshell)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.6/site-packages (from botocore<1.21.0,>=1.20.21->boto3>=1.9.237->qldbshell)
Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/local/lib/python3.6/site-packages (from botocore<1.21.0,>=1.20.21->boto3>=1.9.237->qldbshell)
Installing collected packages: qldbshell
  Running setup.py install for qldbshell ... done
Successfully installed qldbshell-1.2.0

Can I see my data now


$ qldbshell -l demo

Welcome to the Amazon QLDB Shell version 1.2.0
Use 'start' to initiate and interact with a transaction. 'commit' and 'abort' to commit or abort a transaction.
Use 'start; statement 1; statement 2; commit; start; statement 3; commit' to create transactions non-interactively.
Use 'help' for the help section.
All other commands will be interpreted as PartiQL statements until the 'exit' or 'quit' command is issued.

qldbshell > select * from sample;                                                                                                                           
INFO:
{
 id: 1,
 name: "Demo Row",
 location: "USA",
 domain: null
},
{
 id: 1,
 name: "Demo Row",
 location: "USA",
 domain: null
},
{
 id: "1",
 name: "Demo Row",
 location: "USA",
 domain: ""
},
{
 id: 3,
 name: "Kiwi",
 location: "NZ",
 domain: null
},
{
 id: 2,
 name: "Row 2",
 location: "AUS",
 domain: "news.com.au"
},
{
 id: 3,
 name: "Kiwi",
 location: "NZ",
 domain: null
},
{
 id: 2,
 name: "Row 2",
 location: "AUS",
 domain: "news.com.au"
}
INFO: (0.0815s)

And yes, data, I see it's duplicated, so I must have in between the 10 steps run twice. This does highlight a known limitation of QLDB, no unique constraints.

But wait, that data is not really correct, I don't want null. Goes back to the JSON to see the MySQL shell gives that.

$ jq . sample.json
[
  {
    "id": 1,
    "name": "Demo Row",
    "location": "USA",
    "domain": null
  },
...

At some point I also got this load error, but by now I've given up documenting how to do something, in order to demonstrate something.

NameError: name 'null' is not defined

One has to wrap the only nullable column with IFNULL(subdomain,'') as subdomain and redo all those steps again. This is not going to be practical having to wrap all columns in a wider table with IFNULL.

However, having exhausted all this time for what was supposed to be a quiet weekend few hours, my post is way to long, and I've learned "Creating examples can be hard".

#WDILTW – What can I run from my AWS Aurora database

When you work with AWS Aurora you have limited admin privileges. There are some different grants for MySQL including SELECT INTO S3 and LOAD FROM S3 that replace the loss of functionality to SELECT INTO OUTFILE and mysqldump/mysqlimport using a delimited format. While I know and use lambda capabilities, I have never executed anything with INVOKE LAMDBA directly from the database.

This week I found out about INVOKE COMPREHEND (had to look that product up), and INVOKE SAGEMAKER (which I used independently). These are machine learning capabilities that enable you to build custom integrations using Comprehend and SageMaker. I did not have any chance to evaluate these capabilities so I am unable to share any use cases or experiences. There are two built-in comprehend functions aws_comprehend_detect_sentiment() and aws_comprehend_detect_sentiment_confidence(), a likely future starting place. Sagemaker is invoked as an extension of a CREATE FUNCTION that provides the ALIAS AWS_SAGEMAKER_INVOKE_ENDPOINT syntax.

Also available are some MySQL status variables including Aurora_ml_logical_response_cnt, Aurora_ml_actual_request_cnt, Aurora_ml_actual_response_cnt, Aurora_ml_cache_hit_cnt, Aurora_ml_single_request_cnt.

Some googling found an interesting simple example, calculating the positive/negative sentiment and confidence of sentences of text. I could see this as useful for analyzing comments. I’ve included the example from this site here to encourage my readers to take a look as I plan to do. Post IAM configuration I will be really curious to evaluate the responsiveness of this example. Is this truly a batch only operation or could you return some meaningful response timely?

This also lead to bookmarking for reading https://awsauroralabsmy.com/, https://github.com/aws-samples/amazon-aurora-labs-for-mysql/ and https://squidfunk.github.io/mkdocs-material/ all from this one page.

#WDILTW – To use a RDBMS is to use a transaction

I learned this week that 30+ years of Relational Database Management System (RDBMS) experience still does not prepare yourself for the disappointment of working with organizations that use a RDBMS; MySQL specifically; have a released production product, have dozens to hundreds of developers, team leaders and architects, but do not know the importance of, nor use transactions. If I was to ask this when interviewing somebody that would work with a database and the response was it is not important, or not used these days it would be a hard fail.

To use a RDBMS is to understand a very simple principle, a foundation of a transactional system. It is called ACID. Atomicity, Consistency, Isolation, Durability.

In a simplistic description of this 50 year principle.

  • A – It’s all or nothing
  • C – Your work meets all existing constraints
  • I – Your work is independent and not affected by other concurrent work
  • D – Your work is reproducible in the event of a hard failure

A – Atomicity means a transaction. It’s not rocket science, it’s actually in the description of a RDBMS. What is a transaction? It is a means in Structure Query Language (SQL) notation that can ensure your unit of work is all or nothing.

The most basic of examples is very simple. You want to deposit a check that was given to you. For the bank to accept this deposit, it needs to ensure that the account holder that issued the check has sufficient funds, that is Step 1: subtract amount X from account A, and Step 2: deposit in account B. You cannot do Step 1 or Step 2 independently, you must do 1 and 2 together, otherwise real money could be lost or created from thin air. The lack of transactions also can be affected by not handling I – Isolation.

We have to ask our educators and responsible professionals why? Why can such a bedrock principle not be used? Why in a team of individuals in a company, one single individual from the years of development and support not say, well you have to use a transaction to ensure the integrity of the data you are working with? It’s called a database for a reason.

The unfortunately reality is there are several reasons for this utter failure, and the frustration of professionals like myself.

  1. RDBMS is not the bedrock of data management it was 20,30 even 50 years ago. Today with a proliferation of different products, transactions are simply not taught. Even SQL is not taught, yet most products that want to enable users to query data ultimately provide a SQL-like interface. Many users today want a drag & drop GUI interface but not realize that is not how you manipulate data. Hadoop was a life change with map-reduce approach to large datasets. iIt did not start with SQL, but it has a widely used SQL interface now. You look at the newest hot products like snowflake. What is it’s method of data access, SQL.
  2. MySQL, the most popular open source database does not enforce transactions, it’s optional. What you say? MySQL has some unique features including the capability for multiple storage engines, that provide different features and capabilities, such as index approaches, consistency and support for transactions. An entire generation of open source products were released and do not use transactions, but use a transactional storage engine within a transactional product.
  3. Still on the topic of MySQL, it also does not enforce strict handling like other RDBMS products. The default for decades was to enable an SQL statement to corrupt the data integrity, i.e. C – Consistency. Because MySQL supported warnings and they were always not checked for by developers, data in did not guarantee data out.
  4. Every mini-generation (e.g. 5 years) of new developers think they know more, than seasoned professionals.

As it was for me in the 80s, it should be for use of a relational database; and even a non-relational database; to read and understand the seminal works of An introduction to Database Systems by C.J. Date. It has been required learning at universities for decades, but it seems to have lost is favor with multiple iterations of new software developers.

#3 – What Did I learn this week. WDILTW.

#WDILTW – Debugging failed http requests thru the web of redirects

There are reports that your website is down. You pull up the login page without incident. What’s next?

Monitoring is critical. How detailed is this? How frequently are you sampling? The resolution to any issue is only as good as the response to a paged alert. Who is looking into the issue? What escalation exists?

In today’s complex interconnected infrastructure is it ever that simple? When speaking about an AWS hosted solution, is it an AWS Issue? Does status.aws.amazon.com give you a clue? Does the inability to access other services/sites you may be using at this moment give an indicator of a larger problem? Is it AWS related for a service, an availability zone, or even an entire region? Having experienced all of those before sometimes its obvious, sometimes it is not. Or does a Twitter Search report other shared experiences of regional outages, was it that severed Verizon underwater cable?

I learned two things this week in triage of this situation. The first is that the old CLI tools you have been using for 20+ years still help in triage quickly. D not discount them or the detail they provide. I was able to identify and reproduce an underlying cause with just nslookup and curl. For many reviewing the outage the problem did not manifest as an error. It turned out there were two distinct paths from two separate domains to the ultimate target page. This was not immediately obvious and known, and there was no definitive network diagram to describe this.

When this was determined nslookup provided that there were two different resolved AWS ELBs. dig is also a useful command to master, for example to determine if an A record or CNAME for example.

$ nslookup demo.internal-example.com

demo.internal-example.com	canonical name = internal.us-east-1.elb.amazonaws.com.
Name:	 internal.us-east-1.elb.amazonaws.com
Address: 10.10.1.2
Name:	 internal.us-east-1.elb.amazonaws.com
Address: 10.10.0.3
Name:	 internal.us-east-1.elb.amazonaws.com
Address: 10.10.2.4
$ ▶ nslookup demo.public-example.com

Non-authoritative answer:
demo.public-example.com	         canonical name = external.us-east-1.elb.amazonaws.com.
Name:	 external.us-east-1.elb.amazonaws.com
Address: 23.123.111.222
Name:	 external.us-east-1.elb.amazonaws.com
Address: 50.200.211.222

The first indication was actually to find that one of the ELBs was not in the AWS account with all other resources, and this AWS account was not viewable. That is a separate discussion for why? curl then helped to traverse the various redirects of each ELB using these options

  • -i/–include – Include the headers
  • -k/–insecure – Allow insecure SSL connections
  • -L/–location – Follow redirects
$ curl -ikL external.us-east-1.elb.amazonaws.com
HTTP/1.1 301 Moved Permanently
Server: awselb/2.0
Date: Thu, 11 Feb 2021 20:34:47 GMT
Content-Type: text/html
Content-Length: 134
Location: https://external.us-east-1.elb.amazonaws.com:443/
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
Age: 0

HTTP/1.1 200 Connection established

HTTP/2 302
date: Thu, 11 Feb 2021 20:34:48 GMT
content-length: 0
location: http://demo.unavailable.com
cache-control: no-cache

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 2071
Date: Thu, 11 Feb 2021 19:09:29 GMT
Last-Modified: Tue, 18 Dec 2018 05:32:31 GMT
Accept-Ranges: bytes
Server: AmazonS3
X-Cache: Hit from cloudfront
Via: 1.1 44914fa6421b789193cec8998428f8bd.cloudfront.net (CloudFront)
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
Age: 1071

<html

Using these commands was nothing new, however identifying this single line provided a way to isolate within the chain of redirects where to focus.

content-length: 0

Ultimately the issue was not ELB related, but internal infrastructure behind this one ELB. When corrected the result was (trimmed for readability)

$ curl -ikL external.us-east-1.elb.amazonaws.com
HTTP/1.1 301 Moved Permanently
Server: awselb/2.0
Date: Thu, 11 Feb 2021 20:37:18 GMT
Content-Type: text/html
Content-Length: 134
Location: https://external.us-east-1.elb.amazonaws.com:443/
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
Age: 0

HTTP/1.1 200 Connection established

HTTP/2 302
date: Thu, 11 Feb 2021 20:37:18 GMT
content-type: text/plain; charset=utf-8
content-length: 27
x-powered-by: 
location: /redirect
vary: Accept

HTTP/2 301
date: Thu, 11 Feb 2021 20:37:18 GMT
content-type: text/html
content-length: 162
location: /redirect/

HTTP/2 200
date: Thu, 11 Feb 2021 20:37:18 GMT
content-type: text/html
content-length: 2007
last-modified: Tue, 02 Feb 2021 03:27:13 GMT
vary: Accept-Encoding

<html>
  <head>

In summary, and a means to triage a future problem, or to monitor:

Failure success
$ egrep -i "^HTTP|^Content-Length" 

HTTP/1.1 301 Moved Permanently
Content-Length: 134
HTTP/1.1 200 Connection established
HTTP/2 302
content-length: 0
HTTP/1.1 200 OK
Content-Length: 2071


$ egrep -i "^HTTP|^Content-Length"

HTTP/1.1 301 Moved Permanently
Content-Length: 134
HTTP/1.1 200 Connection established
HTTP/2 302
content-length: 27
HTTP/2 301
content-length: 162
HTTP/2 200
content-length: 2007

With the proliferation of GUI based monitoring products it is likely for many organizations that multiple different monitors are available, but are they triggered, and do they enable you to pinpoint the underlying issue? Long gone are the days of a Pingdom type ping of a URL from multiple locations every minute and a report of latency or errors then you start digging. This week I learned about DataDog Synthetic Monitoring. DataDog is a well established monitoring solution that I have only just started to understand, I wish I had a year to master to delving into.

In later review this monitoring showed an already configured browser test for this top level URL that was failing, it was simply not alerting correctly. The Synthetic monitoring is far more advanced, providing an ITTT workflow, and even provides physical images of the rendered pages.

This experience highlighted the need to have detailed and redundant monitoring but also the right process to triage and drill down.

I looked into trying to provide an example of this DataDog feature, however the free tier monitoring solution does not provide all the advanced features for the evaluation I’d like. You can look at some product examples.

Observability is a key tool in any operations management. It should be one of the pillars where a continued investment of time, resources and skills development can add significant value for business continuity.

#WDILTW – AWS RDS Proxy

This week I was evaluating AWS RDS Proxy. If you are familiar with the Relational Database Service (RDS) and use MySQL or Postgres, this is an additional option to consider.

Proxies in general by the name accept incoming requests and perform some management before those requests are forwarded to the ultimate target.

RDS proxy takes incoming database connections and can perform several capabilities including collection pooling and capping the total database connections with each configured proxy holding a percentage of the total connections for the target cluster. The proxy can handle routing only for writer instances (at this time) to minimize a planned or unplanned failover. The RDS proxy however does not address the underlying problem of too many connections to the database, it just adds another layer, that is or may be more configurable or tunable than an application requesting connections.

The RDS Proxy is automatically Highly Available (HA). You can determine this by looking at the host IPs of the MySQL processlist. I have yet to identify any other means of seeing if a connection is a proxy connection at the database level if you are using the same credentials. RDS Proxy does give you the ability via Secrets Manager to connect as a different user. You can specify a connection initialization query. I used a SET variable so that application could determine if it was using a Proxy however that is of little benefit in server connection management.

The RDS proxy can enforce TLS, something which in my opinion should always be used for application to data store communications, but historically has been overlooked at practically every company I have worked for or consulted to. Just because you are communicating within a VPC does not protect your communications from actors within your VPC. I can remember at a prior employment the disappointment of cross-region replication that was encrypted being dropped because it was too hard to migrate or manage. That shows an all too common problem of laziness over security.

If you are new to a particular technology the age of the Internet gives you search capabilities to find numerous articles. If you search for anything AWS you will generally always get as the top results the official pages, it takes some digging to find other articles. Prior to this lesson I had only read about RDS Proxy, I had never actually setup one.

When anybody is learning something new, I like to say your value add is not to just read an article, but reproduce and then adapt or enhance. This Amazon example is no different. Repeating each step showed multiple errors in syntax which I can contribute back as comments. If this was open source code, you could contribute a pull request (PR). The good news is the first example of configuring a proxy includes by GUI and CLI commands. I always like to do my work on the command line, even the first iteration. You cannot scale a human moving a mouse around and clicking. What I found however was that the official AWS CLI lacked a key component of the proxy setup around group targets. The UI provides a capability that the CLI did not. Another discrepancy was when I was making modifications to the proxy in the GUI I would get an error, but I could make that change via the CLI. These discrepancies are an annoyance for consistency and first evaluation.

So what was the outcome of my evaluation? First I was able to demonstrate I could add a proxy to an existing cluster in one of our test environments and direct traffic from a mysql client thru the proxy to the target database. I was able to use Secrets Manager (SSM) to enforce credentials for authorization. I did not look into Identity Access Management (IAM) roles support. I was able to benchmark with sysbench simulated load to compare latency of the proxy traffic versus direct traffic. I have simplified my examples so that anybody can run these tests themselves for simple validation.

I could enforce TLS communications for the mysql client testing, however our company internal http proxy caused the usual self signed certificate issues with sysbench, something I really need to master. Surprisingly I looked at what options sysbench gave me for SSL options (side bar we should always refer to this as TLS instead of SSL), but the defined options for the installed recent version are still using the ssl name. The scope of options differed from the source code online so a question as to why? That’s the great thing about open source, you can read the code. You may have even met the author at a conference presentation.

Where the evaluation hit a business impact was in comparative performance. I am still awaiting an AWS support response to my evaluation.

What’s next is to get an application team to evaluate end to end database operations, easily done as Route 53 DNS is used for endpoint communications.
Where I got stuck was incorporating the setup of RDS proxy within Terraform We currently use version 12. While there was the aws_db_proxy module, I needed an updated version of the aws provider to our environment. The official Hashicorp documentation of the resource really does not highlight the complexity necessary to create a proxy. While you will have already configured a VPC, and subnets, even Ingres security groups and secrets which all parts necessary for RDS cluster, you need a number of integrated pieces.

You will need an IAM role for your proxy, but that role requires a policy to use KMS to get the secrets you wish to use for authorization. This interdependency of KMS and secret ARNs make is difficult to easily launch a RDS proxy as you would an RDS aurora cluster. Still it’s a challenge for something else to do. The added complexity is the RDS proxy also needs an authorization argument, for example the –auth argument in the AWS CLI. I see this as a complexity for management of RDS users that you wish to also be configured for use in the proxy.

As with any evaluation or proof of concept (POC) the devil is in the details. How do you monitor your new resources, what logging is important to know, what types of errors can happen, and how do you address these.

Another issue I had was the RDS proxy required a new version of the AWS client in order to run RDS commands such as describe-db-proxies. That adds an additional administrative dependency to be rolled out.

Proxies for MySQL have been around for decades, I can remember personally working on the earliest version of MySQL Proxy at MySQL Inc back in 2007. The gold standard if you use MySQL, is ProxySQL by Sysown’s René Cannaò. This is a topic for a different discussion.

Checkout my code for this work.

Reading