#WDILTW – RTFM, then RTFM again, then improve it

This week I learned two valuable aspects of Terraform I did not know.

The first is Terraform State Import. While I use terraform state to list and show state and even remove state, I was unaware you could import from a created AWS resource. It’s not actually an argument to the “terraform state” syntax, instead its “terraform import” and likely why I do not see it when I look at terraform state syntax.

% terraform state
Usage: terraform [global options] state  [options] [args]

  This command has subcommands for advanced state management.

  These subcommands can be used to slice and dice the Terraform state.
  This is sometimes necessary in advanced cases. For your safety, all
  state management commands that modify the state create a timestamped
  backup of the state prior to making modifications.

  The structure and output of the commands is specifically tailored to work
  well with the common Unix utilities such as grep, awk, etc. We recommend
  using those tools to perform more advanced state tasks.

Subcommands:
    list                List resources in the state
    mv                  Move an item in the state
    pull                Pull current state and output to stdout
    push                Update remote state from a local state file
    replace-provider    Replace provider in the state
    rm                  Remove instances from the state

I am not an expert in Terraform, and looking at the command help output shown above did not give me reference to look elsewhere, but just reading the manual can help you to learn a new feature. If you do not know a product, reading documentation and examples can be an ideal way to get started in a self-paced way.

The second is Meta-Arguments. I use lifecycle, and to be honest I have learned and forgotten about count. Count was something I was able to use to solve a very nasty cross-region kinesis stream issue, reminding me of a syntax I had since forgotten. Using coalesce and conditional expressions (aka ternary operator) can help in modules, for example.

resource "aws_rds_cluster" "demo" {
  ...
  global_cluster_identifier       = var.has_global_cluster ? local.global_cluster_identifier : ""
  master_username                 = var.has_global_cluster ? "" : var.master_username
  db_cluster_parameter_group_name = coalesce(var.db_cluster_parameter_group_name , local.db_cluster_parameter_group_name)
  ...      

However to stop the creation of the object completely, use count.

resource "aws_???" "demo_???" {
  count = var.filter_condition ? 1 : 0
  ...

And just when I thought I’d read about Meta-Arguments, I hit a new never before seen problem. Now if I’d read the summary resources page about Meta-Arguments, and looked the very next section I would have been able to likely solve this new error without having to RTFM a second time.

module.?.?.aws_rds_cluster.default: Still creating... [1h59m53s elapsed]

Error: Error waiting for RDS Cluster state to be "available": timeout while waiting for state to become 'available' (last state: 'creating', timeout: 2h0m0s)

on .terraform/modules/?/main.tf line 306, in resource "aws_rds_cluster" "default":

306: resource "aws_rds_cluster" "default" {

I did not know there was a 2 hour timeout, and I did not know you can change that with

timeouts {
    create = "4h"
    delete = "4h"
  }
}

On a number of occasions I have found documentation to not be complete or accurate online. If you find this, then submit a request to get it fixed, must sources include a link at the bottom to recommend improvements. I have had good success with submitting improvements to the AWS documentation.

A QLDB Cheat Sheet for MySQL Users

The AWS ledger database (QLDB) is an auditors best friend and lives up to the stated description of “Amazon QLDB can be used to track each and every application data change and maintains a complete and verifiable history of changes over time.”

This presentation will go over what was done to take a MySQL application that provided auditing activity changes for key data, and how it is being migrated to QLDB.

While QLDB does use a SQL-format for DML (PartiQL), and you can perform the traditional INSERT/UPDATE/DELETE/SELECT, the ability to extend these statements to manipulate Amazon Ion data (a superset of JSON) gives you improved capabilities and statements.

Get a comparison of how to map a MySQL structure multiple tables and lots of columns into a single QLDB table and then benefit with an immutable and cryptographically verifiable transaction log. No more triggers, duplicated tables, extra auditing for abuse of binary log activity.

We also cover the simplicity of using X Protocol and JSON output for data migration, and the complexity of AWS RDS not supporting X Protocol.

Understanding AWS RDS Aurora Capabilities

The RDS Aurora MySQL/PostgreSQL capabilities of AWS extend the High Availability (HA) capabilities of RDS read replicas and Multi-AZ. In this presentation I discuss the different capabilities and HA configurations with RDS Aurora including:

  • RDS Aurora Cluster single instance
  • RDS Aurora Cluster multiple instances (writer + 1 or more readers)
  • RDS Aurora Cluster multi-master
  • RDS Aurora Global Cluster
  • RDS Aurora Cluster options for multi-regions

Each option has its relative merits and limitations. Each will depend on your business requirements, global needs and budget.

Upcoming Percona Live 2021 Presentations

I am pleased to have been selected to present at Percona Live 2021 May 12-13. My presentations include talks on AWS RDS Aurora and QLDB managed services.

Understanding AWS RDS Aurora Capabilities

The RDS Aurora MySQL/PostgreSQL capabilities of AWS extend the HA capabilities of RDS read replicas and Multi-AZ.

In this presentation we will discuss the different capabilities and HA configurations with RDS Aurora including:

* RDS Cluster single instance
* RDS Cluster multiple instances (writer + 1 or more readers)
* RDS Cluster multi-master
* RDS Global Cluster
* RDS Cluster options for multi-regions

Each option has its relative merits and limitations. Each will depend on your business requirements, global needs and budget.

This presentation will include setup, monitoring and failover evaluations for the attendee with the goal to provide a feature matrix of when/how to consider each option as well as provide some details of the subtle differences Aurora provides.

This presentation is not going to go into the technical details of RDS Aurora’s underlying infrastructure or a feature by feature comparison of AWS RDS to AWS RDS Aurora.

A QLDB Cheatsheet for MySQL Users

Amazons new ledger database (QLDB) is an auditors best friend and lives up to the stated description of “Amazon QLDB can be used to track each and every application data change and maintains a complete and verifiable history of changes over time.”

This presentation will go over what was done to take a MySQL application that provided auditing activity changes for key data, and how it is being migrated to QLDB.

While QLDB does use a SQL-format for DML, and you can perform the traditional INSERT/UPDATE/DELETE/SELECT. The ability to extend these statements to manipulate Amazon Ion data (a superset of JSON) gives you improved data manipulation, and for example the FROM SQL statement.

Get a blow by blow comparison of MySQL structures (multiple tables and lots of columns) and SQL converted into a single QLDB table, with immutable, and cryptographically verifiable transaction log. No more triggers, duplicated tables, extra auditing for abuse of binary log activity.

We also cover the simplicity of using X Protocol and JSON output for data migration, and the complexity of AWS RDS not supporting X Protocol

#WDILTW – Creating examples can be hard

This week I was evaluating AWS QLDB. Specifically the verifiable history of changes to determine how to simplify present processes that perform auditing via CDC. This is not the first time I have looked at QLDB so there was nothing that new to learn.

What I found was that creating a workable solution with an existing application is hard. Even harder is creating an example to publish in this blog (and the purpose of this post).

First some background.

Using MySQL as the source of information, how can you leverage QLDB? It’s easy to stream data from MySQL Aurora, and it’s easy to stream data from QLDB, but it not that easy to place real-time data into QLDB. AWS DMS is a good way to move data from a source to a target, previously my work has included MySQL to MySQL, MySQL to Redshift, and MySQL to Kinesis, however there is no QLDB target.

Turning the problem upside down, and using QLDB as the source of information, and streaming to MySQL for compatibility seemed a way forward.

After setting up the QLDB Ledger and an example table, it was time to populate with existing data. The documented reference example looked very JSON compatible. Side bar, it is actually Amazon Ion a superset of JSON.

INSERT INTO Person
<< {
    'FirstName' : 'Raul',
    'LastName' : 'Lewis',
    'DOB' : `1963-08-19T`,
    'GovId' : 'LEWISR261LL',
    'GovIdType' : 'Driver License',
    'Address' : '1719 University Street, Seattle, WA, 98109'
},
{
    'FirstName' : 'Brent',
    'LastName' : 'Logan',
    'DOB' : `1967-07-03T`,
    'GovId' : 'LOGANB486CG',
    'GovIdType' : 'Driver License',
    'Address' : '43 Stockert Hollow Road, Everett, WA, 98203'
}

Now, MySQL offers with the X Protocol. This is something that lefred has evangelized for many years, I have seen presented many times, but finally I had a chance to use. The MySQL Shell JSON output looked ideal.

{
    "ID": 1523,
    "Name": "Wien",
    "CountryCode": "AUT",
    "District": "Wien",
    "Info": {
        "Population": 1608144
    }
}
{
    "ID": 1524,
    "Name": "Graz",
    "CountryCode": "AUT",
    "District": "Steiermark",
    "Info": {
        "Population": 240967
    }
}

And now, onto some of the things I learned this week.
Using AWS RDS Aurora MySQL is the first stumbling block, X Protocol is not supported. As this was a example, simple, mysqldump some reference data and load it into a MySQL 8 instance, and extract into JSON, so as to potentially emulate a pipeline.

Here is my experiences of trying to refactor into a demo to write up.

Launch a MySQL Docker container as per my standard notes. Harmless, right?

MYSQL_ROOT_PASSWORD="$(date | md5sum | cut -c1-20)#"
echo $MYSQL_ROOT_PASSWORD
docker run --name=qldb-mysql -p3306:3306 -v mysql-volume:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=$MYSQL_ROOT_PASSWORD -d mysql/mysql-server:latest
docker logs qldb-mysql
docker exec -it qldb-mysql /bin/bash

As it's a quick demo, I shortcut credentials to make using the mysql client easier. NOTE: as I always generate a new password each container, it's included here.

# echo "[mysql]
user=root
password='ab6ea7b0436cbc0c0d49#' > .my.cnf

# mysql 
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)

What the? Did I make a mistake, I test manually and check

# mysql -u root -p

# cat .my.cnf

Nothing wrong there. Next check

# pwd
/
bash-4.2# grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

And there is the first Dockerism. I don't live in Docker, so these 101 learnings would be known. First I really thing using "root" by default is a horrible idea. And when you shell in, you are not dropped into the home directory? Solved, we move on.

# mv /.my.cnf /root/.my.cnf

Mock and example as quickly as I can think.

# mysql

mysql> create schema if not exists demo;
Query OK, 1 row affected (0.00 sec)

mysql> use demo;
Database changed
mysql> create table sample(id int unsigned not null auto_increment, name varchar(30) not null, location varchar(30) not null, domain varchar(50) null, primary key(id));
Query OK, 0 rows affected (0.03 sec)
mysql> show create table sample;

mysql> insert into sample values (null,'Demo Row','USA',null), (null,'Row 2','AUS','news.com.au'), (null,'Kiwi','NZ', null);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from sample;
+----+----------+----------+-------------+
| id | name     | location | domain      |
+----+----------+----------+-------------+
|  1 | Demo Row | USA      | NULL        |
|  2 | Row 2    | AUS      | news.com.au |
|  3 | Kiwi     | NZ       | NULL        |
+----+----------+----------+-------------+
3 rows in set (0.00 sec)

Cool, now to look at it in Javascript using MySQL Shell. Hurdle 2.

# mysqlsh
MySQL Shell 8.0.22

Copyright (c) 2016, 2020, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

 MySQL  JS > var session=mysqlx.getSession('root:ab6ea7b0436cbc0c0d49#@localhost')
mysqlx.getSession: Argument #1: Invalid URI: Illegal character [#] found at position 25 (ArgumentError)


What the, it doesn't like the password format. I'm not a Javascript person, and well this is an example for blogging, which is not what was actually setup, so do it the right way, create a user.

# mysql

mysql> create user [email protected] identified by 'qldb';
Query OK, 0 rows affected (0.01 sec)

mysql> grant ALL ON sample.* to [email protected];
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> SHOW GRANTS FOR [email protected];
+----------------------------------------------------------+
| Grants for [email protected]                                |
+----------------------------------------------------------+
| GRANT USAGE ON *.* TO `demo`@`localhost`                 |
| GRANT ALL PRIVILEGES ON `sample`.* TO `demo`@`localhost` |
+----------------------------------------------------------+
2 rows in set (0.00 sec)

Back into the MySQL Shell, and hurdle 3.

MySQL  JS > var session=mysqlx.getSession('demo:[email protected]')
mysqlx.getSession: Access denied for user 'demo'@'127.0.0.1' (using password: YES) (MySQL Error 1045)

Did I create the creds wrong, verify. No my password is correct.

#  mysql -udemo -pqldb -e "SELECT NOW()"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------------+
| NOW()               |
+---------------------+
| 2021-03-06 23:15:26 |
+---------------------+

I don't have time to debug this, User take 2.

mysql> drop user [email protected];
Query OK, 0 rows affected (0.00 sec)

mysql> create user [email protected]'%' identified by 'qldb';
Query OK, 0 rows affected (0.01 sec)

mysql> grant all on demo.* to [email protected]'%'
    -> ;
Query OK, 0 rows affected (0.00 sec)

mysql> show grants;
+--
| Grants for [email protected]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
+---
| GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, SHUTDOWN, PROCESS, FILE, REFERENCES, INDEX, ALTER, SHOW DATABASES, SUPER, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER, CREATE TABLESPACE, CREATE ROLE, DROP ROLE ON *.* TO `root`@`localhost` WITH GRANT OPTION                                                                                                                                                                                                                    |
| GRANT APPLICATION_PASSWORD_ADMIN,AUDIT_ADMIN,BACKUP_ADMIN,BINLOG_ADMIN,BINLOG_ENCRYPTION_ADMIN,CLONE_ADMIN,CONNECTION_ADMIN,ENCRYPTION_KEY_ADMIN,FLUSH_OPTIMIZER_COSTS,FLUSH_STATUS,FLUSH_TABLES,FLUSH_USER_RESOURCES,GROUP_REPLICATION_ADMIN,INNODB_REDO_LOG_ARCHIVE,INNODB_REDO_LOG_ENABLE,PERSIST_RO_VARIABLES_ADMIN,REPLICATION_APPLIER,REPLICATION_SLAVE_ADMIN,RESOURCE_GROUP_ADMIN,RESOURCE_GROUP_USER,ROLE_ADMIN,SERVICE_CONNECTION_ADMIN,SESSION_VARIABLES_ADMIN,SET_USER_ID,SHOW_ROUTINE,SYSTEM_USER,SYSTEM_VARIABLES_ADMIN,TABLE_ENCRYPTION_ADMIN,XA_RECOVER_ADMIN ON *.* TO `root`@`localhost` WITH GRANT OPTION |
| GRANT PROXY ON ''@'' TO 'root'@'localhost' WITH GRANT OPTION                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+---
3 rows in set (0.00 sec)

mysql> show grants for [email protected]'%';
+--------------------------------------------------+
| Grants for [email protected]%                                |
+--------------------------------------------------+
| GRANT USAGE ON *.* TO `demo`@`%`                 |
| GRANT ALL PRIVILEGES ON `demo`.* TO `demo`@`%`   |
+--------------------------------------------------+
2 rows in set (0.00 sec)

Right, initially I showed grants of not new user, but note to self, I should checkout the MySQL 8 Improved grants. I wonder how RDS MySQL 8 handles these, and how Aurora MySQL 8 will (when it ever drops, that's another story).

Third try is a charm, so nice to also see queries with 0.0000 execution granularity.

 MySQL  JS > var session=mysqlx.getSession('demo:[email protected]')
 MySQL  JS > var sql='SELECT * FROM demo.sample'
 MySQL  JS > session.sql(sql)
+----+----------+----------+-------------+
| id | name     | location | domain      |
+----+----------+----------+-------------+
|  1 | Demo Row | USA      | NULL        |
|  2 | Row 2    | AUS      | news.com.au |
|  3 | Kiwi     | NZ       | NULL        |
+----+----------+----------+-------------+
3 rows in set (0.0006 sec)

Get that now in JSON output. NOTE: There are 3 different JSON formats, this matched what I needed.

bash-4.2# mysqlsh
MySQL Shell 8.0.22

Copyright (c) 2016, 2020, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.
 MySQL  JS > var session=mysqlx.getSession('demo:[email protected]')
 MySQL  JS > var sql='SELECT * FROM demo.sample'
 MySQL  JS > shell.options.set('resultFormat','json/array')
 MySQL  JS > session.sql(sql)
[
{"id":1,"name":"Demo Row","location":"USA","domain":null},
{"id":2,"name":"Row 2","location":"AUS","domain":"news.com.au"},
{"id":3,"name":"Kiwi","location":"NZ","domain":null}
]
3 rows in set (0.0006 sec)

Ok, that works in interactive interface, I need it scripted.

# vi
bash: vi: command not found
# yum install vi
Loaded plugins: ovl
http://repo.mysql.com/yum/mysql-connectors-community/el/7/x86_64/repodata/repomd.xml: [Errno 14] HTTP Error 403 - Forbidden
Trying other mirror.
...

And another downer of Docker containers, other tools or easy ways to install them, again I want to focus on the actual example, and not all this preamble, so

# echo "var session=mysqlx.getSession('demo:[email protected]')
var sql='SELECT * FROM demo.sample'
shell.options.set('resultFormat','json/array')
session.sql(sql)" > dump.js


# mysqlsh < dump.js

What the? Hurdle 4. Did I typo this as well, I check the file, and cut/paste it and get what I expect.

# cat dump.js
var session=mysqlx.getSession('demo:[email protected]')
var sql='SELECT * FROM demo.sample'
shell.options.set('resultFormat','json/array')
session.sql(sql)
# mysqlsh
MySQL Shell 8.0.22

Copyright (c) 2016, 2020, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.
 MySQL  JS > var session=mysqlx.getSession('demo:[email protected]')
 MySQL  JS > var sql='SELECT * FROM demo.sample'
 MySQL  JS > shell.options.set('resultFormat','json/array')
 MySQL  JS > session.sql(sql)
[
{"id":1,"name":"Demo Row","location":"USA","domain":null},
{"id":2,"name":"Row 2","location":"AUS","domain":"news.com.au"},
{"id":3,"name":"Kiwi","location":"NZ","domain":null}
]
3 rows in set (0.0022 sec)

This is getting crazy.

# echo '[
> {"id":1,"name":"Demo Row","location":"USA","domain":null},
> {"id":2,"name":"Row 2","location":"AUS","domain":"news.com.au"},
> {"id":3,"name":"Kiwi","location":"NZ","domain":null}
> ]' > sample.json
bash-4.2# jq . sample.json
bash: jq: command not found

Oh the docker!!!!. Switching back to my EC2 instance now.

$ echo '[
> {"id":1,"name":"Demo Row","location":"USA","domain":null},
> {"id":2,"name":"Row 2","location":"AUS","domain":"news.com.au"},
> {"id":3,"name":"Kiwi","location":"NZ","domain":null}
> ]' > sample.json
$ jq . sample.json
[
  {
    "id": 1,
    "name": "Demo Row",
    "location": "USA",
    "domain": null
  },
  {
    "id": 2,
    "name": "Row 2",
    "location": "AUS",
    "domain": "news.com.au"
  },
  {
    "id": 3,
    "name": "Kiwi",
    "location": "NZ",
    "domain": null
  }
]

I am now way of the time I would like to spend on this weekly post, and it's getting way to long, and I'm nowhere near showing what I actually want. Still we trek on.

Boy, this stock EC2 image uses version 1, we need I'm sure V2, and well command does not work!!!!

$  aws qldb list-ledgers
ERROR:
$ aws --version

$ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
$ unzip awscliv2.zip
$ sudo ./aws/install
$ export PATH=/usr/local/bin:$PATH
$ aws --version

Can I finally get a ledger now.

$ aws qldb create-ledger --name demo --tags JIRA=DEMO-5826,Owner=RonaldBradford --permissions-mode ALLOW_ALL --no-deletion-protection
 
{
    "Name": "demo",
    "Arn": "arn:aws:qldb:us-east-1:999:ledger/demo",
    "State": "CREATING",
    "CreationDateTime": "2021-03-06T22:46:41.760000+00:00",
    "DeletionProtection": false
}

$  aws qldb list-ledgers

{
    "Ledgers": [
        {
            "Name": "xx",
            "State": "ACTIVE",
            "CreationDateTime": "2021-03-05T20:12:44.611000+00:00"
        },
        {
            "Name": "demo",
            "State": "ACTIVE",
            "CreationDateTime": "2021-03-06T22:46:41.760000+00:00"
        }
    ]
}

$ aws qldb describe-ledger --name demo
{
    "Name": "demo",
    "Arn": "arn:aws:qldb:us-east-1:999:ledger/demo",
    "State": "ACTIVE",
    "CreationDateTime": "2021-03-06T22:46:41.760000+00:00",
    "DeletionProtection": false
}

Oh the Python 2, and the lack of user packaging, more crud of getting an example.

$ pip install pyqldb==3.1.0
ERROR

$ echo "alias python=python3
alias pip=pip3" >> ~/.bash_profile
source ~/.bash_profile
$ pip --version
pip 9.0.3 from /usr/lib/python3.6/site-packages (python 3.6)

$ python --version
Python 3.6.8

$ pip install pyqldb==3.1.0

ERROR

$ sudo pip install pyqldb==3.1.0

Yeah!, after all that, my example code works and data is inserted.

$ cat demo.py
from pyqldb.config.retry_config import RetryConfig
from pyqldb.driver.qldb_driver import QldbDriver

# Configure retry limit to 3
retry_config = RetryConfig(retry_limit=3)

# Initialize the driver
print("Initializing the driver")
qldb_driver = QldbDriver("demo", retry_config=retry_config)


def create_table(transaction_executor, table):

    print("Creating table {}".format(table))
    transaction_executor.execute_statement("Create TABLE {}".format(table))

def create_index(transaction_executor, table, column):
    print("Creating index {}.{}".format(table, column))
    transaction_executor.execute_statement("CREATE INDEX ON {}({})".format(table,column))


def insert_record(transaction_executor, table, values):
    print("Inserting into {}".format(table))
    transaction_executor.execute_statement("INSERT INTO {} ?".format(table),  values)


table="sample"
column="id"
qldb_driver.execute_lambda(lambda executor: create_table(executor, table))
qldb_driver.execute_lambda(lambda executor: create_index(executor, table, column))


record1 = { 'id': "1",
            'name': "Demo Row",
            'location': "USA",
            'domain':  ""
        }

qldb_driver.execute_lambda(lambda x: insert_record(x, table, record1))
$ python demo.py
Initializing the driver
Creating table sample
Creating index sample.id
Inserting into sample

One vets in the AWS Console, but you cannot show that in text in this blog, so goes to find a simple client and there is qldbshell

What the? I installed it and it complains about pyqldb.driver.pooled_qldb_driver. I literally used that in the last example.

$ pip3 install qldbshell
Collecting qldbshell
  Downloading PermissionError: [Errno 13] Permission denied: '/usr/local/lib/python3.6/site-packages/amazon.ion-0.7.0-py3.6-nspkg.pth' -> '/tmp/pip-p8j4d45d-uninstall/usr/local/lib/python3.6/site-packages/amazon.ion-0.7.0-py3.6-nspkg.pth'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/lib/python3.6/site-packages/pip/commands/install.py", line 365, in run
    strip_file_prefix=options.strip_file_prefix,
  File "/usr/lib/python3.6/site-packages/pip/req/req_set.py", line 783, in install
    requirement.uninstall(auto_confirm=True)
  File "/usr/lib/python3.6/site-packages/pip/req/req_install.py", line 754, in uninstall
    paths_to_remove.remove(auto_confirm)
  File "/usr/lib/python3.6/site-packages/pip/req/req_uninstall.py", line 115, in remove
    renames(path, new_path)
  File "/usr/lib/python3.6/site-packages/pip/utils/__init__.py", line 267, in renames
    shutil.move(old, new)
  File "/usr/lib64/python3.6/shutil.py", line 565, in move
    os.unlink(src)
PermissionError: [Errno 13] Permission denied: '/usr/local/lib/python3.6/site-packages/amazon.ion-0.7.0-py3.6-nspkg.pth'
[[email protected]] ~
$ sudo pip3 install qldbshell
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Collecting qldbshell
Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/local/lib/python3.6/site-packages (from botocore<1.21.0,>=1.20.21->boto3>=1.9.237->qldbshell)
Installing collected packages: amazon.ion, qldbshell
  Found existing installation: amazon.ion 0.7.0
    Uninstalling amazon.ion-0.7.0:
      Successfully uninstalled amazon.ion-0.7.0
  Running setup.py install for amazon.ion ... done
  Running setup.py install for qldbshell ... done
Successfully installed amazon.ion-0.5.0 qldbshell-1.2.0


$ sudo pip3 install qldbshell

$ qldbshell
Traceback (most recent call last):
  File "/usr/local/bin/qldbshell", line 11, in 
    load_entry_point('qldbshell==1.2.0', 'console_scripts', 'qldbshell')()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 476, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2700, in load_entry_point
    return ep.load()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2318, in load
    return self.resolve()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2324, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/local/lib/python3.6/site-packages/qldbshell/__main__.py", line 25, in 
    from pyqldb.driver.pooled_qldb_driver import PooledQldbDriver
ModuleNotFoundError: No module named 'pyqldb.driver.pooled_qldb_driver'
$ pip list qldbshell
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
amazon.ion (0.5.0)
boto3 (1.17.21)
botocore (1.20.21)
ionhash (1.1.0)
jmespath (0.10.0)
pip (9.0.3)
prompt-toolkit (3.0.16)
pyqldb (3.1.0)
python-dateutil (2.8.1)
qldbshell (1.2.0)
s3transfer (0.3.4)
setuptools (39.2.0)
six (1.15.0)
urllib3 (1.26.3)

So, uninstalled and re-installed and voila, my data.

$ qldbshell
usage: qldbshell [-h] [-v] [-s QLDB_SESSION_ENDPOINT] [-r REGION] [-p PROFILE]
                 -l LEDGER
qldbshell: error: the following arguments are required: -l/--ledger
$ qldbshell -l demo

Welcome to the Amazon QLDB Shell version 1.2.0
Use 'start' to initiate and interact with a transaction. 'commit' and 'abort' to commit or abort a transaction.
Use 'start; statement 1; statement 2; commit; start; statement 3; commit' to create transactions non-interactively.
Use 'help' for the help section.
All other commands will be interpreted as PartiQL statements until the 'exit' or 'quit' command is issued.

qldbshell >

qldbshell > SELECT * FROM sample;                                                                                                                           
INFO:
{
 id: "1",
 name: "Demo Row",
 location: "USA",
 domain: ""
}
INFO: (0.1718s)

qldbshell > \q                                                                                                                                              
WARNING: Error while executing query: An error occurred (BadRequestException) when calling the SendCommand operation: Lexer Error: at line 1, column 1: invalid character at, '\' [U+5c];
INFO: (0.1134s)
qldbshell > exit                                                                                                                                            
Exiting QLDB Shell

Right \q is a mysqlism of the client, need to rewire myself.

Now, I have a ledger, I created an example table, mocked a row of data and verified. Now I can just load my sample data in JSON I created earlier right? Wrong!!!

$ cat load.py
import json
from pyqldb.config.retry_config import RetryConfig
from pyqldb.driver.qldb_driver import QldbDriver

# Configure retry limit to 3
retry_config = RetryConfig(retry_limit=3)

# Initialize the driver
print("Initializing the driver")
qldb_driver = QldbDriver("demo", retry_config=retry_config)

def insert_record(transaction_executor, table, values):
  print("Inserting into {}".format(table))
  transaction_executor.execute_statement("INSERT INTO {} ?".format(table),  values)


table="sample"

with open('sample.json') as f:
  data=json.load(f)

qldb_driver.execute_lambda(lambda x: insert_record(x, table, data))

$ python load.py
Traceback (most recent call last):
  File "load.py", line 2, in 
    from pyqldb.config.retry_config import RetryConfig
ModuleNotFoundError: No module named 'pyqldb'
[[email protected]] ~

Oh sweet, I'd installed that, and used it, and re-installed it.

$ pip list | grep pyqldb
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
[[email protected]] ~
$ sudo pip3 install pyqldb
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Collecting pyqldb
Requirement already satisfied: boto3<2,>=1.16.56 in /usr/local/lib/python3.6/site-packages (from pyqldb)
Requirement already satisfied: botocore<2,>=1.19.56 in /usr/local/lib/python3.6/site-packages (from pyqldb)
Requirement already satisfied: ionhash<2,>=1.1.0 in /usr/local/lib/python3.6/site-packages (from pyqldb)
Requirement already satisfied: six in /usr/local/lib/python3.6/site-packages (from amazon.ion<1,>=0.7.0->pyqldb)
Requirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /usr/local/lib/python3.6/site-packages (from boto3<2,>=1.16.56->pyqldb)
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.6/site-packages (from boto3<2,>=1.16.56->pyqldb)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.6/site-packages (from botocore<2,>=1.19.56->pyqldb)
Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/local/lib/python3.6/site-packages (from botocore<2,>=1.19.56->pyqldb)
Installing collected packages: amazon.ion, pyqldb
  Found existing installation: amazon.ion 0.5.0
    Uninstalling amazon.ion-0.5.0:
      Successfully uninstalled amazon.ion-0.5.0
  Running setup.py install for amazon.ion ... done
  Running setup.py install for pyqldb ... done
Successfully installed amazon.ion-0.7.0 pyqldb-3.1.0

Load one more time.

$ cat load.py
import json
from pyqldb.config.retry_config import RetryConfig
from pyqldb.driver.qldb_driver import QldbDriver

# Configure retry limit to 3
retry_config = RetryConfig(retry_limit=3)

# Initialize the driver
print("Initializing the driver")
qldb_driver = QldbDriver("demo", retry_config=retry_config)

def insert_record(transaction_executor, table, values):
  print("Inserting into {}".format(table))
  transaction_executor.execute_statement("INSERT INTO {} ?".format(table),  values)


table="sample"

with open('sample.json') as f:
  data=json.load(f)

qldb_driver.execute_lambda(lambda x: insert_record(x, table, data))

$ python load.py
Initializing the driver
Inserting into sample

And done, I've got my JSON extracted MySQL 8 data in QLDB. I go to vett it in the client, and boy, didn't expect yet another package screw up. Clearly, these 2 AWS python packages are incompatible. That's a venv need, but I'm now at double my desired time to show this.

$ qldbshell -l demo
Traceback (most recent call last):
  File "/usr/local/bin/qldbshell", line 11, in 
    load_entry_point('qldbshell==1.2.0', 'console_scripts', 'qldbshell')()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 476, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2700, in load_entry_point
    return ep.load()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2318, in load
    return self.resolve()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2324, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/local/lib/python3.6/site-packages/qldbshell/__main__.py", line 25, in 
    from pyqldb.driver.pooled_qldb_driver import PooledQldbDriver
ModuleNotFoundError: No module named 'pyqldb.driver.pooled_qldb_driver'
[[email protected]] ~
$ pip list | grep qldbshell
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
qldbshell (1.2.0)


$ sudo pip uninstall qldbshell pyqldb

$ sudo pip install qldbshell
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Collecting qldbshell
  Downloading Requirement already satisfied: boto3>=1.9.237 in /usr/local/lib/python3.6/site-packages (from qldbshell)
Requirement already satisfied: amazon.ion<0.6.0,>=0.5.0 in /usr/local/lib/python3.6/site-packages (from qldbshell)
Requirement already satisfied: prompt_toolkit<3.1.0,>=3.0.5 in /usr/local/lib/python3.6/site-packages (from qldbshell)
Requirement already satisfied: ionhash~=1.1.0 in /usr/local/lib/python3.6/site-packages (from qldbshell)
Requirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /usr/local/lib/python3.6/site-packages (from boto3>=1.9.237->qldbshell)
Requirement already satisfied: botocore<1.21.0,>=1.20.21 in /usr/local/lib/python3.6/site-packages (from boto3>=1.9.237->qldbshell)
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.6/site-packages (from boto3>=1.9.237->qldbshell)
Requirement already satisfied: six in /usr/local/lib/python3.6/site-packages (from amazon.ion<0.6.0,>=0.5.0->qldbshell)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.6/site-packages (from prompt_toolkit<3.1.0,>=3.0.5->qldbshell)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.6/site-packages (from botocore<1.21.0,>=1.20.21->boto3>=1.9.237->qldbshell)
Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/local/lib/python3.6/site-packages (from botocore<1.21.0,>=1.20.21->boto3>=1.9.237->qldbshell)
Installing collected packages: qldbshell
  Running setup.py install for qldbshell ... done
Successfully installed qldbshell-1.2.0

Can I see my data now

$ qldbshell -l demo

Welcome to the Amazon QLDB Shell version 1.2.0
Use 'start' to initiate and interact with a transaction. 'commit' and 'abort' to commit or abort a transaction.
Use 'start; statement 1; statement 2; commit; start; statement 3; commit' to create transactions non-interactively.
Use 'help' for the help section.
All other commands will be interpreted as PartiQL statements until the 'exit' or 'quit' command is issued.

qldbshell > select * from sample;                                                                                                                           
INFO:
{
 id: 1,
 name: "Demo Row",
 location: "USA",
 domain: null
},
{
 id: 1,
 name: "Demo Row",
 location: "USA",
 domain: null
},
{
 id: "1",
 name: "Demo Row",
 location: "USA",
 domain: ""
},
{
 id: 3,
 name: "Kiwi",
 location: "NZ",
 domain: null
},
{
 id: 2,
 name: "Row 2",
 location: "AUS",
 domain: "news.com.au"
},
{
 id: 3,
 name: "Kiwi",
 location: "NZ",
 domain: null
},
{
 id: 2,
 name: "Row 2",
 location: "AUS",
 domain: "news.com.au"
}
INFO: (0.0815s)

And yes, data, I see it's duplicated, so I must have in between the 10 steps run twice. This does highlight a known limitation of QLDB, no unique constraints.

But wait, that data is not really correct, I don't want null. Goes back to the JSON to see the MySQL shell gives that.

$ jq . sample.json
[
  {
    "id": 1,
    "name": "Demo Row",
    "location": "USA",
    "domain": null
  },
...

At some point I also got this load error, but by now I've given up documenting how to do something, in order to demonstrate something.

NameError: name 'null' is not defined

One has to wrap the only nullable column with IFNULL(subdomain,'') as subdomain and redo all those steps again. This is not going to be practical having to wrap all columns in a wider table with IFNULL.

However, having exhausted all this time for what was supposed to be a quiet weekend few hours, my post is way to long, and I've learned "Creating examples can be hard".

#WDILTW – What can I run from my AWS Aurora database

When you work with AWS Aurora you have limited admin privileges. There are some different grants for MySQL including SELECT INTO S3 and LOAD FROM S3 that replace the loss of functionality to SELECT INTO OUTFILE and mysqldump/mysqlimport using a delimited format. While I know and use lambda capabilities, I have never executed anything with INVOKE LAMDBA directly from the database.

This week I found out about INVOKE COMPREHEND (had to look that product up), and INVOKE SAGEMAKER (which I used independently). These are machine learning capabilities that enable you to build custom integrations using Comprehend and SageMaker. I did not have any chance to evaluate these capabilities so I am unable to share any use cases or experiences. There are two built-in comprehend functions aws_comprehend_detect_sentiment() and aws_comprehend_detect_sentiment_confidence(), a likely future starting place. Sagemaker is invoked as an extension of a CREATE FUNCTION that provides the ALIAS AWS_SAGEMAKER_INVOKE_ENDPOINT syntax.

Also available are some MySQL status variables including Aurora_ml_logical_response_cnt, Aurora_ml_actual_request_cnt, Aurora_ml_actual_response_cnt, Aurora_ml_cache_hit_cnt, Aurora_ml_single_request_cnt.

Some googling found an interesting simple example, calculating the positive/negative sentiment and confidence of sentences of text. I could see this as useful for analyzing comments. I’ve included the example from this site here to encourage my readers to take a look as I plan to do. Post IAM configuration I will be really curious to evaluate the responsiveness of this example. Is this truly a batch only operation or could you return some meaningful response timely?

This also lead to bookmarking for reading https://awsauroralabsmy.com/, https://github.com/aws-samples/amazon-aurora-labs-for-mysql/ and https://squidfunk.github.io/mkdocs-material/ all from this one page.

#WDILTW – Debugging failed http requests thru the web of redirects

There are reports that your website is down. You pull up the login page without incident. What’s next?

Monitoring is critical. How detailed is this? How frequently are you sampling? The resolution to any issue is only as good as the response to a paged alert. Who is looking into the issue? What escalation exists?

In today’s complex interconnected infrastructure is it ever that simple? When speaking about an AWS hosted solution, is it an AWS Issue? Does status.aws.amazon.com give you a clue? Does the inability to access other services/sites you may be using at this moment give an indicator of a larger problem? Is it AWS related for a service, an availability zone, or even an entire region? Having experienced all of those before sometimes its obvious, sometimes it is not. Or does a Twitter Search report other shared experiences of regional outages, was it that severed Verizon underwater cable?

I learned two things this week in triage of this situation. The first is that the old CLI tools you have been using for 20+ years still help in triage quickly. D not discount them or the detail they provide. I was able to identify and reproduce an underlying cause with just nslookup and curl. For many reviewing the outage the problem did not manifest as an error. It turned out there were two distinct paths from two separate domains to the ultimate target page. This was not immediately obvious and known, and there was no definitive network diagram to describe this.

When this was determined nslookup provided that there were two different resolved AWS ELBs. dig is also a useful command to master, for example to determine if an A record or CNAME for example.

$ nslookup demo.internal-example.com

demo.internal-example.com	canonical name = internal.us-east-1.elb.amazonaws.com.
Name:	 internal.us-east-1.elb.amazonaws.com
Address: 10.10.1.2
Name:	 internal.us-east-1.elb.amazonaws.com
Address: 10.10.0.3
Name:	 internal.us-east-1.elb.amazonaws.com
Address: 10.10.2.4
$ ▶ nslookup demo.public-example.com

Non-authoritative answer:
demo.public-example.com	         canonical name = external.us-east-1.elb.amazonaws.com.
Name:	 external.us-east-1.elb.amazonaws.com
Address: 23.123.111.222
Name:	 external.us-east-1.elb.amazonaws.com
Address: 50.200.211.222

The first indication was actually to find that one of the ELBs was not in the AWS account with all other resources, and this AWS account was not viewable. That is a separate discussion for why? curl then helped to traverse the various redirects of each ELB using these options

  • -i/–include – Include the headers
  • -k/–insecure – Allow insecure SSL connections
  • -L/–location – Follow redirects
$ curl -ikL external.us-east-1.elb.amazonaws.com
HTTP/1.1 301 Moved Permanently
Server: awselb/2.0
Date: Thu, 11 Feb 2021 20:34:47 GMT
Content-Type: text/html
Content-Length: 134
Location: https://external.us-east-1.elb.amazonaws.com:443/
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
Age: 0

HTTP/1.1 200 Connection established

HTTP/2 302
date: Thu, 11 Feb 2021 20:34:48 GMT
content-length: 0
location: http://demo.unavailable.com
cache-control: no-cache

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 2071
Date: Thu, 11 Feb 2021 19:09:29 GMT
Last-Modified: Tue, 18 Dec 2018 05:32:31 GMT
Accept-Ranges: bytes
Server: AmazonS3
X-Cache: Hit from cloudfront
Via: 1.1 44914fa6421b789193cec8998428f8bd.cloudfront.net (CloudFront)
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
Age: 1071

<html

Using these commands was nothing new, however identifying this single line provided a way to isolate within the chain of redirects where to focus.

content-length: 0

Ultimately the issue was not ELB related, but internal infrastructure behind this one ELB. When corrected the result was (trimmed for readability)

$ curl -ikL external.us-east-1.elb.amazonaws.com
HTTP/1.1 301 Moved Permanently
Server: awselb/2.0
Date: Thu, 11 Feb 2021 20:37:18 GMT
Content-Type: text/html
Content-Length: 134
Location: https://external.us-east-1.elb.amazonaws.com:443/
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
Age: 0

HTTP/1.1 200 Connection established

HTTP/2 302
date: Thu, 11 Feb 2021 20:37:18 GMT
content-type: text/plain; charset=utf-8
content-length: 27
x-powered-by: 
location: /redirect
vary: Accept

HTTP/2 301
date: Thu, 11 Feb 2021 20:37:18 GMT
content-type: text/html
content-length: 162
location: /redirect/

HTTP/2 200
date: Thu, 11 Feb 2021 20:37:18 GMT
content-type: text/html
content-length: 2007
last-modified: Tue, 02 Feb 2021 03:27:13 GMT
vary: Accept-Encoding

<html>
  <head>

In summary, and a means to triage a future problem, or to monitor:

Failure success
$ egrep -i "^HTTP|^Content-Length" 

HTTP/1.1 301 Moved Permanently
Content-Length: 134
HTTP/1.1 200 Connection established
HTTP/2 302
content-length: 0
HTTP/1.1 200 OK
Content-Length: 2071


$ egrep -i "^HTTP|^Content-Length"

HTTP/1.1 301 Moved Permanently
Content-Length: 134
HTTP/1.1 200 Connection established
HTTP/2 302
content-length: 27
HTTP/2 301
content-length: 162
HTTP/2 200
content-length: 2007

With the proliferation of GUI based monitoring products it is likely for many organizations that multiple different monitors are available, but are they triggered, and do they enable you to pinpoint the underlying issue? Long gone are the days of a Pingdom type ping of a URL from multiple locations every minute and a report of latency or errors then you start digging. This week I learned about DataDog Synthetic Monitoring. DataDog is a well established monitoring solution that I have only just started to understand, I wish I had a year to master to delving into.

In later review this monitoring showed an already configured browser test for this top level URL that was failing, it was simply not alerting correctly. The Synthetic monitoring is far more advanced, providing an ITTT workflow, and even provides physical images of the rendered pages.

This experience highlighted the need to have detailed and redundant monitoring but also the right process to triage and drill down.

I looked into trying to provide an example of this DataDog feature, however the free tier monitoring solution does not provide all the advanced features for the evaluation I’d like. You can look at some product examples.

Observability is a key tool in any operations management. It should be one of the pillars where a continued investment of time, resources and skills development can add significant value for business continuity.

#WDILTW – AWS RDS Proxy

This week I was evaluating AWS RDS Proxy. If you are familiar with the Relational Database Service (RDS) and use MySQL or Postgres, this is an additional option to consider.

Proxies in general by the name accept incoming requests and perform some management before those requests are forwarded to the ultimate target.

RDS proxy takes incoming database connections and can perform several capabilities including collection pooling and capping the total database connections with each configured proxy holding a percentage of the total connections for the target cluster. The proxy can handle routing only for writer instances (at this time) to minimize a planned or unplanned failover. The RDS proxy however does not address the underlying problem of too many connections to the database, it just adds another layer, that is or may be more configurable or tunable than an application requesting connections.

The RDS Proxy is automatically Highly Available (HA). You can determine this by looking at the host IPs of the MySQL processlist. I have yet to identify any other means of seeing if a connection is a proxy connection at the database level if you are using the same credentials. RDS Proxy does give you the ability via Secrets Manager to connect as a different user. You can specify a connection initialization query. I used a SET variable so that application could determine if it was using a Proxy however that is of little benefit in server connection management.

The RDS proxy can enforce TLS, something which in my opinion should always be used for application to data store communications, but historically has been overlooked at practically every company I have worked for or consulted to. Just because you are communicating within a VPC does not protect your communications from actors within your VPC. I can remember at a prior employment the disappointment of cross-region replication that was encrypted being dropped because it was too hard to migrate or manage. That shows an all too common problem of laziness over security.

If you are new to a particular technology the age of the Internet gives you search capabilities to find numerous articles. If you search for anything AWS you will generally always get as the top results the official pages, it takes some digging to find other articles. Prior to this lesson I had only read about RDS Proxy, I had never actually setup one.

When anybody is learning something new, I like to say your value add is not to just read an article, but reproduce and then adapt or enhance. This Amazon example is no different. Repeating each step showed multiple errors in syntax which I can contribute back as comments. If this was open source code, you could contribute a pull request (PR). The good news is the first example of configuring a proxy includes by GUI and CLI commands. I always like to do my work on the command line, even the first iteration. You cannot scale a human moving a mouse around and clicking. What I found however was that the official AWS CLI lacked a key component of the proxy setup around group targets. The UI provides a capability that the CLI did not. Another discrepancy was when I was making modifications to the proxy in the GUI I would get an error, but I could make that change via the CLI. These discrepancies are an annoyance for consistency and first evaluation.

So what was the outcome of my evaluation? First I was able to demonstrate I could add a proxy to an existing cluster in one of our test environments and direct traffic from a mysql client thru the proxy to the target database. I was able to use Secrets Manager (SSM) to enforce credentials for authorization. I did not look into Identity Access Management (IAM) roles support. I was able to benchmark with sysbench simulated load to compare latency of the proxy traffic versus direct traffic. I have simplified my examples so that anybody can run these tests themselves for simple validation.

I could enforce TLS communications for the mysql client testing, however our company internal http proxy caused the usual self signed certificate issues with sysbench, something I really need to master. Surprisingly I looked at what options sysbench gave me for SSL options (side bar we should always refer to this as TLS instead of SSL), but the defined options for the installed recent version are still using the ssl name. The scope of options differed from the source code online so a question as to why? That’s the great thing about open source, you can read the code. You may have even met the author at a conference presentation.

Where the evaluation hit a business impact was in comparative performance. I am still awaiting an AWS support response to my evaluation.

What’s next is to get an application team to evaluate end to end database operations, easily done as Route 53 DNS is used for endpoint communications.
Where I got stuck was incorporating the setup of RDS proxy within Terraform We currently use version 12. While there was the aws_db_proxy module, I needed an updated version of the aws provider to our environment. The official Hashicorp documentation of the resource really does not highlight the complexity necessary to create a proxy. While you will have already configured a VPC, and subnets, even Ingres security groups and secrets which all parts necessary for RDS cluster, you need a number of integrated pieces.

You will need an IAM role for your proxy, but that role requires a policy to use KMS to get the secrets you wish to use for authorization. This interdependency of KMS and secret ARNs make is difficult to easily launch a RDS proxy as you would an RDS aurora cluster. Still it’s a challenge for something else to do. The added complexity is the RDS proxy also needs an authorization argument, for example the –auth argument in the AWS CLI. I see this as a complexity for management of RDS users that you wish to also be configured for use in the proxy.

As with any evaluation or proof of concept (POC) the devil is in the details. How do you monitor your new resources, what logging is important to know, what types of errors can happen, and how do you address these.

Another issue I had was the RDS proxy required a new version of the AWS client in order to run RDS commands such as describe-db-proxies. That adds an additional administrative dependency to be rolled out.

Proxies for MySQL have been around for decades, I can remember personally working on the earliest version of MySQL Proxy at MySQL Inc back in 2007. The gold standard if you use MySQL, is ProxySQL by Sysown’s René Cannaò. This is a topic for a different discussion.

Checkout my code for this work.

Reading

Enforcing a least privileged security model can be hard

In a greenfield environment you generally have the luxury to right any wrongs of any past tech debt. It can be more difficult to apply this to an existing environment? For example, my setup is configured to just work with the AWS CLI and various litmus tests to validate that. Generally instructions would include, valid your AWS access.  This can be as simple as: 

$ aws ec2 describe-regions
$ aws ec2 describe-availability-zones --profile oh

As part of documenting some upcoming Athena/Hadoop/Pig/RDBMS posts I decided it was important to separate out the AWS IAM privileges with a new user and permission policies.This introduced a number of steps that simply do not work.  Creating a new AWS IAM user is not complex. Validating console and API access of that user required some revised setup.

$ aws ec2 describe-regions

An error occurred (AuthFailure) when calling the DescribeRegions operation: AWS was not able to validate the provided access credentials

In order to be able to use the CLI you require your aws_access_key_id and aws_secret_access_key information as well as aws_session_token if used. In order for a new individual user to gain this information, you also need a number of policy rules including the ability to ListAccessKeys, CreateAccessKey and potentially DeleteAccessKey.

 
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "iam:DeleteAccessKey",
                "iam:CreateAccessKey",
                "iam:ListAccessKeys"
            ],
            "Resource": "arn:aws:iam::[account]:user/[username]"
        }
    ]
}

In this example, we also restrict the least privileged model with a specific user resource ARN. For a single user account that is workable, for a large organization it would not.
This gives the ability to configure your AWS CLI via typical ~/.aws/credentials and/or ~/aws/config settings. Performing  the litmus test now gives:

$ aws ec2 describe-regions

An error occurred (UnauthorizedOperation) when calling the DescribeRegions operation: You are not authorized to perform this operation.

This requires a policy of:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeAvailabilityZones",
                "ec2:DescribeRegions"
            ],
            "Resource": "*"
        }
    ]
}
$ aws ec2 describe-regions | jq '.Regions[0]'
{
  "Endpoint": "ec2.eu-north-1.amazonaws.com",
  "RegionName": "eu-north-1",
  "OptInStatus": "opt-in-not-required"
}


$ aws ec2 describe-availability-zones --filter "Name=region-name,Values=us-east-1" | jq -r '.AvailabilityZones[].ZoneName'

us-east-1a
us-east-1b
us-east-1c
us-east-1d
us-east-1e
us-east-1f

However, this may be too restrictive for a larger organization.  The EC2 Access level for ‘list’ includes currently over 120 individual permissions. A more open policy could be:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:Describe*"
            ],
            "Resource": "*"
        }
    ]
}

However this does not provide all of the EC2 ‘list’ actions, e.g. ExportClientVpnClientConfiguration, and it includes several ‘read’ actions, e.g. DescribeVolumesModifications.
Selecting the ‘list’ tickbox via the GUI will provide all actions by name individually in the policy action list, currently 117, however this is not forward compatible for any future list defined access level actions.

This is before the exercise to starting granting access to a new AWS service – Athena, and its data source S3.

Defensive Data Techniques

As a data architect I always ensure that for any database schema change there a fully recoverable execution path.
I have generally advised to create a patch/revert process for every change.  For example, if a change adds a new column or index to a table, a revert script would remove the respective column or index.
The goal is to always have a defensive position for any changes. The concept is that simple, it is not complex.

In its simplest form I use the following directory and file structure.

/schema
    schema.sql
    /patch
        YYYYMMDDXX.sql     where XX,ZZ are sequential 2 digit numbers, e.g. 01,02
        YYYYMMDDZZ.sql
   /revert
       YYYYMMDDXX.sql   This is the same file name in the revert sub-directory.
       YYYYMMDDZZ.sql

At any commit or tag in configuration management it is possible to create a current copy of the schema, i.e. use schema.sql.
It is also possible to take the first version of schema.sql and apply chronologically all the patch scripts to arrive at the same consistent structure of the schema that is in schema.sql. You can also run a validation process to confirm these are equivalent.
For each tagged version or commit of this directory structure and files in version control, this should always hold true.
While not the desired execution path, every revert script can be applied in a reverse chronological order and return to the first version of the schema.
If you want to maintain a first_schema.sql file within the directory structure, you can always create any version of the schema from a given commit in a roll-forward or roll-back scenario.

In reality however this is rarely implemented. There is always divergence or drift. Drift occurs for several primary reasons. The first is non-adherence to the defined process. The second and more critical is the lack of adequate testing and verification at each and every step.  A Test Driven Design (TDD) that validates the given approach would enable a verification of end state of the schema and enable the verification at each accumulated

In addition to each patch/revert there needs to be a state that is maintained of what has been applied.  Generally for RDBMS storing this metadata within a table is recommended.

The above example shows files of .sql extension. Any schema management process needs to cater for .sh, .py or other extensions to cater for more complex operations.
 
What about data changes?  I would recommend that for all configuration information you follow the same management principles as for schema objects, that is you have a patch to insert/update/delete data, and you have a revert script that can restore that data.  Generally the complexity of the rollback process is a hurdle for developers/engineers. Having a framework is important to manage how data consistency is maintained. This framework could generate a statement to restore the data (e.g. a selective mysqldump), require a hand-crafted statement, or leverage the benefit of the RDBMS by storing the data into intermediate shallow tables.

Using a least privileged model complicates an applicable framework approach. Does the user applying the change now require the FILE privilege, or CREATE/DROP privilege to create tables for the ability to restore data.

If there is strict referential integrity at the database level, those protections will defend against unintended consequences. For example, deleting a row that is dependent on a foreign key relationship.  In a normal operating system accommodations are made generally for the sake of performance, but also for supporting poor data cleansing requirements. If the application maintains a level of referential integrity, the schema management process also needs to support this, adding a further complexity.  Ensuring data integrity is an important separate topic. If there is a dangling row, what is the impact? The data still exists, it is just not presented in a user interface or included in calculations. This generally leads to greater unintended consequences that are generally never obvious at the time of execution, but rather days, weeks or months later.

When it comes to objects within the structure of an RDBMS the situation is more complex.  A classic example in MySQL is a user.  A user in MySQL is actually the user definition which is just the username, password and host.  A user contains one or more grants. The user may be the owner of additional objects. Using default and legacy MySQL, it is simply not possible to determine if a user is actually being used. Percona and other variances support INFORMATION_SCHEMA.USER_STATISTICS which is a better method of evaluating the use of a user.  This does however require the intervention of time-based data collection, as this table is the accumulative statistics since an instance restart or flush.

With this type of object, or meta object several defensive techniques exist.  

If you had the user `blargie` and that user had grants to read data from several schemas, is the user used?  I don’t think so, let’s just delete it is not a fact-based approach to avoiding a subsequent problem.
Is the user used? Let’s revoke the users privileges and monitor for errors or user feedback? Or let’s change the user’s password?  With each of these strategies it is important to always have a defensive process to rollback.
A different approach is to use a common data technique of marking information as deleted before it’s physically deleted (think trash can before you empty the trash).  For MySQL users there is no default functionality (in the most recent versions of MySQL you can DISABLE a user).  One implementation to apply this pattern is to rename the user, which has the benefit of keeping the user’s password and privileges intack, therefore reducing the amount of complexity in restoring.

Regardless of the technique, it is important there is always a recovery path.  In a subsequent post I will discuss this approach towards cloud metadata, for example an AWS KMS policy, IAM Rule or ASG setting and the impact of  Infrastructure as a Service (IaaS) such as Terraform.

More reading https://en.wikipedia.org/wiki/Test-driven_development, https://en.wikipedia.org/wiki/Defensive_programming

AWS cost saving tips – EBS Volumes

A trivial cost saving tip for checking if you are spending money in your AWS environment on unused resources. This is especially appropriate when using provisioned IOPS EBS volumes.

$ ec2-describe-volumes | grep available

VOLUME	vol-44dff904	8	snap-d86d0884	us-east-1b	available	2014-08-01T14:11:24+0000	standard
VOLUME	vol-62dff922	100		us-east-1b	available	2014-08-01T14:11:24+0000	io1	1000
VOLUME	vol-15dff955	8	snap-d86d0884	us-east-1b	available	2014-08-01T14:11:24+0000	standard
VOLUME	vol-80a88ec0	8	snap-d86d0884	us-east-1b	available	2014-08-01T15:12:54+0000	standard
VOLUME	vol-ca82a48a	100		us-east-1b	available	2014-08-01T16:13:49+0000	standard
VOLUME	vol-5d79581d	8	snap-d86d0884	us-east-1b	available	2014-08-01T18:27:01+0000	standard
VOLUME	vol-baf9dbfa	8	snap-d86d0884	us-east-1b	available	2014-08-03T18:20:59+0000	standard
VOLUME	vol-53ffdd13	8	snap-d86d0884	us-east-1b	available	2014-08-03T18:25:52+0000	standard
VOLUME	vol-ade7daed	8	snap-d86d0884	us-east-1b	available	2014-08-13T20:10:46+0000	standard
VOLUME	vol-34e2df74	8	snap-065a2e52	us-east-1b	available	2014-08-13T20:26:17+0000	standard
VOLUME	vol-cacef38a	100	snap-280ffb7f	us-east-1b	available	2014-08-13T21:19:18+0000	standard
VOLUME	vol-41350a01	8	snap-f23ccba5	us-east-1b	available	2014-08-14T16:54:27+0000	standard
VOLUME	vol-51350a11	100	snap-fc3ccbab	us-east-1b	available	2014-08-14T16:54:27+0000	standard
VOLUME	vol-912f10d1	8	snap-96ee24c1	us-east-1b	available	2014-08-14T17:15:06+0000	standard
VOLUME	vol-a82f10e8	100	snap-9dee24ca	us-east-1b	available	2014-08-14T17:15:06+0000	standard

These are available and unused EBS volumes which you should consider deleting.

Another reason to avoid RDS

My list of reasons for never using or recommending Amazon’s MySQL RDS service grows every time I experience problems with customers. This was an interesting and still unresolved issue.

ERROR 126 (HY000): Incorrect key file for table '/rdsdbdata/tmp/#sql_5b7_1.MYI'; try to repair it

You may see this is a MyISAM table. The MySQL database is version 5.5, all InnoDB tables and is very small 100MB in total size.
What is happening is that MySQL is generating a temporary table, and this table is being written to disk. I am unable to change the code to improve the query causing this disk I/O.

What I can not understand and have no ability to diagnose is why this error occurs sometimes and generally when the database is under additional system load. With RDS you have no visibility of the server running the production database. While you have SQL access, an API for managing MySQL configuration options (I also add not all MySQL variables), and limited system statistics via a graphical interface, all information about the system performance, disk configuration etc is hidden and not accessible. This is a frustrating limitation of using RDS.

NOTE: While I cannot recommend RDS, I am very happy with AWS EC2 services when correctly configured. For a cloud based MySQL solution I would definitely recommend greater control over your MySQL database using EC2 and EBS.

Basic scalability principles to avert downtime

In the press in the last two days has been the reported outage of Amazon Web Services Elastic Compute Cloud (EC2) in just one North Virginia data center. This has affected many large website includes FourSquare, Hootsuite, Reddit and Quora. A detailed list can be found at ec2disabled.com.

For these popular websites was this avoidable? Absolutely.

Basic scalability principles if deployed in these systems architecture would have averted the significant downtime regardless of your development stack. While I work primarily in MySQL these principles are not new, nor are they complicated, however they are fundamental concepts in scalability that apply to any technology including the popular MongoDB that is being used by a number of affected sites.

Scalability 101 involves some simple basic rules. Here are just two that seem to have been ignored by many affected by this recent AWS EC2 outage.

  1. Never put all your eggs in one basket. If you rely on AWS completely, or you rely on just one availability zone that is putting all your eggs in one basket.
  2. Always keep your important data close to home. When it comes to what is most critical to your business you need access and control to your information. At 5am in the morning when the CEO asks how long will our business be unavailabla and what is needed to resolve the problem, the answer “We have no control over this and have no ETA” is not an acceptable answer.

With a successful implementation and appropriate data redundancy you may not have an environment immediately available however you have access to your important information and the ability to create one quickly. Many large hosting companies can provide additional H/W on near demand, especially if you have an initial minimal footprint. Indeed using Amazon Web Services (AWS) as a means to avert a data center disaster is an ideal implementation of Infrastructure As A Service (IAAS). Even with this issue, organizations that had planned for this type of outage could have easily migrated to another AWS availability zone that was unaffected.

Furthermore, system architecture to support various levels of data availability and scalability ensure you can handle many more various types of unavailability without significant system down time as recently seen. There are many different types of availability and unavailability, know what your definition of downtime is and supporting disasters should be your primary focus of scalability, not an after thought.

As an expert in performance and scalability I can help your organization in the design of a suitable architecture to support successful scalability and disaster. This is not rocket science however many organizations gamble without the expertise of a professional to ensure business viability.

Problems compiling MySQL 5.4

Seem’s the year Sun had for improving MySQL, and with an entire new 5.4 branch the development team could not fix the autoconf and compile dependencies that has been in MySQL for all the years I’ve been compiling MySQL. Drizzle has got it right, thanks to the great work of Monty Taylor.

I’m working on the Wafflegrid AWS EC2 AMI’s for Matt Yonkovit and while compiling 5.1 was straight forward under Ubuntu 8.10 Intrepid, compiling 5.4 was more complicated.

For MySQL 5.1 I needed only to do the following:

apt-get install -y build-essential
apt-get install libncurses5-dev
./configure
make
make install

For MySQL 5.4, I elected to use the BUILD scripts (based on Wafflegrid recommendations). That didn’t go far before I needed.

apt-get install -y automake libtool

You then have to go compiling MySQL 5.4 for 10+ minutes to get an abstract error, then you need to consider what dependencies may be missing.
I don’t like to do a blanket apt-get of a long list of proposed packages unless I know they are actually needed.

The error was:

make[1]: Entering directory `/src/mysql-5.4.0-beta/sql'
make[1]: warning: -jN forced in submake: disabling jobserver mode.
/bin/bash ../ylwrap sql_yacc.yy y.tab.c sql_yacc.cc y.tab.h sql_yacc.h y.output sql_yacc.output -- -d --verbose
make -j 6 gen_lex_hash
make[2]: Entering directory `/src/mysql-5.4.0-beta/sql'
rm -f mini_client_errors.c
/bin/ln -s ../libmysql/errmsg.c mini_client_errors.c
make[2]: warning: -jN forced in submake: disabling jobserver mode.
rm -f pack.c
../ylwrap: line 111: -d: command not found
/bin/ln -s ../sql-common/pack.c pack.c
....
make[1]: Leaving directory `/src/mysql-5.4.0-beta/sql'
make: *** [all-recursive] Error 1

What a lovely error ../ylwrap: line 111: -d: command not found

ylwrap is part of yacc, and by default in this instance it’s not even an installed package. I’ve compiled MySQL long enough that it requires yacc, and actually bison but to you think it would hurt if the configure told the user this.

It’s also been some time since I’ve compiled MySQL source, rather focusing on Drizzle. I had forgotten just how many compile warnings MySQL throws. Granted a warning is not an error, but you should not just ignore them in building a quality product.

Announcing Drizzle on EC2

I have published the very first sharable Drizzle Amazon Machine Image (AMI) for AWS EC2, based on the good feedback from my discussion at the Drizzle Developer Day on what options we should try.

This first version is a 32bit Developer instance, showcasing Drizzle and all necessary developer tools to build Drizzle from source.

What you will find on drizzle-ami/intrepid-dev32 – ami-b858bfd1

Ubuntu 8.10 Intrepid 32 bit base server installation:

  • build tools
  • drizzle dependencies
  • bzr 1.31.1

From the respective source trees the following software is available:

  • drizzle 2009.04.997
  • libdrizzle 0.0.2
  • gearman 0.0.4
  • memcached 1.2.8
  • libmemcached 0.28

Drizzle has been configured with necessary dependencies for PAM authentication, http_auth, libgearman and MD5 but these don’t seem to be available in the binary distribution.

I will be creating additional AMI’s including 64bit and LAMP ready binary only images.

The following example shows using drizzle on this AMI. Some further work is necessary for full automation, parameters and logging. I’ve raised a number of issues the Drizzle Developers are now hard at work on.

1. Starting Drizzle

ssh [email protected]
sudo /etc/init.d/drizzle-server.init start &

2. Testing Drizzle (the sakila database has been installed)

$ drizzle
Welcome to the Drizzle client..  Commands end with ; or g.
Your Drizzle connection id is 4
Server version: 2009.04.997 Source distribution

Type 'help;' or 'h' for help. Type 'c' to clear the buffer.

drizzle> select version();
+-------------+
| version()   |
+-------------+
| 2009.04.997 |
+-------------+
1 row in set (0 sec)

drizzle> select count(*) from sakila.film;
+----------+
| count(*) |
+----------+
|     1000 |
+----------+
1 row in set (0 sec)

3. Compiling Drizzle

sudo su - drizzle
ls
deploy  drizzle  libdrizzle  sakila-drizzle
cd drizzle
./configure --help
Description of plugins:

   === HTTP Authentication Plugin ===
  Plugin Name:      auth_http
  Description:      HTTP based authentications
  Supports build:   static and dynamic

   === PAM Authenication Plugin ===
  Plugin Name:      auth_pam
  Description:      PAM based authenication.
  Supports build:   dynamic

   === compression UDFs ===
  Plugin Name:      compression
  Description:      UDF Plugin for compression
  Supports build:   static and dynamic
  Status:           mandatory

   === crc32 UDF ===
  Plugin Name:      crc32
  Description:      UDF Plugin for crc32
  Supports build:   static and dynamic
  Status:           mandatory

   === Error Message Plugin ===
  Plugin Name:      errmsg_stderr
  Description:      Errmsg Plugin that sends messages to stderr.
  Supports build:   dynamic

   === Daemon Example Plugin ===
  Plugin Name:      hello_world
  Description:      UDF Plugin for Hello World.
  Supports build:   dynamic

   === Gearman Logging Plugin ===
  Plugin Name:      logging_gearman
  Description:      Logging Plugin that logs to Gearman.
  Supports build:   dynamic

   === Query Logging Plugin ===
  Plugin Name:      logging_query
  Description:      Logging Plugin that logs all queries.
  Supports build:   static and dynamic
  Status:           mandatory

   === Syslog Logging Plugin ===
  Plugin Name:      logging_syslog
  Description:      Logging Plugin that writes to syslog.
  Supports build:   static and dynamic
  Status:           mandatory

   === MD5 UDF ===
  Plugin Name:      md5
  Description:      UDF Plugin for MD5
  Supports build:   static and dynamic

   === One Thread Per Connection Scheduler ===
  Plugin Name:      multi_thread
  Description:      plugin for multi_thread
  Supports build:   static
  Status:           mandatory

   === Old libdrizzle Protocol ===
  Plugin Name:      oldlibdrizzle
  Description:      plugin for oldlibdrizzle
  Supports build:   static
  Status:           mandatory

   === Pool of Threads Scheduler ===
  Plugin Name:      pool_of_threads
  Description:      plugin for pool_of_threads
  Supports build:   static
  Status:           mandatory

   === Default Signal Handler ===
  Plugin Name:      signal_handler
  Description:      plugin for signal_handler
  Supports build:   static
  Status:           mandatory

   === Single Thread Scheduler ===
  Plugin Name:      single_thread
  Description:      plugin for single_thread
  Supports build:   static
  Status:           mandatory

   === Archive Storage Engine ===
  Plugin Name:      archive
  Description:      Archive Storage Engine
  Supports build:   static
  Status:           mandatory

   === Blackhole Storage Engine ===
  Plugin Name:      blackhole
  Description:      Basic Write-only Read-never tables
  Supports build:   static and dynamic
  Configurations:   max, max-no-ndb

   === CSV Storage Engine ===
  Plugin Name:      csv
  Description:      Stores tables in text CSV format
  Supports build:   static
  Status:           mandatory

   === Memory Storage Engine ===
  Plugin Name:      heap
  Description:      Volatile memory based tables
  Supports build:   static
  Status:           mandatory

   === InnoDB Storage Engine ===
  Plugin Name:      innobase
  Description:      Transactional Tables using InnoDB
  Supports build:   static and dynamic
  Configurations:   max, max-no-ndb
  Status:           mandatory

   === MyISAM Storage Engine ===
  Plugin Name:      myisam
  Description:      Traditional non-transactional MySQL tables
  Supports build:   static
  Status:           mandatory


Report bugs to <http://bugs.launchpad.net/drizzle>.

Setting up MySQL on Amazon Web Services (AWS) Presentation

On Tuesday at the MySQL Camp 2009 in Santa Clara I presented Setting up MySQL on Amazon Web Services (AWS).

This presentation assumed you know nothing about AWS, and have no account. With Internet access via a Browser and a valid Credit Card, you can have your own running Web Server on the Internet in under 10 minutes, just point and click.

We also step into some more detail online click and point and supplied command line tools to demonstrate some more advanced usage.

Extending application data to the cloud

I was one of the invited panel speakers to A panel on Cloud Computing this week in New York. As one of 2 non vendor presenters, it was a great experience to be invited and be involved with vendors.

While I never got to use my slides available here, I did get to both present certain content, and indeed questions and discussions on the night were on other points of my content.

Cloud computing is here, it’s early days and new players will continue to emerge. For example, from the panel there was AppNexus, reviewed favorably at Info World in comparison with EC2 and Google App Engine, 10gen, an open source stack solution and Kaavo which from an initial 60 seconds of playing provide a management service on top of AWS similar to what ElasticFox provides. I need to investigate further how much the feature set extends and would compete with others like RightScale for example.

The greatest mystery came from Hank Williams and his stealth Kloudshare. He did elaborate more on where they aim to provide services. A new term discussed was “Tools as a service”, akin to moving use metaphorically from writing in Assembly language to the advanced frameworks of today’s generation of languages such as Java and Ruby.

Thanks to Murat Aktihanoglu of Unype who chaired the event.

Your data and the cloud

I will be speaking on July 29th in New York at an Entrepreneurs Forum on A Free Panel on Cloud Computing. With a number of experts including Hank Williams of KloudShare, Mike Nolet of AppNexus, and Hans Zaunere of New York PHP fame is should be a great event.

The focus of my presentation will be on “Extending existing applications to leverage the cloud” where I will be discussing both the advantages of the cloud, and the complexities and issues that you will encounter such as data management, data consistency, loss of control, security and latency for example.

Using traditional MySQL based applications I’ll be providing an approach that can lead to your application gaining greater power of cloud computing.


About the Author

Ronald Bradford provides Consulting and Advisory Services in Data Architecture, Performance and Scalability for MySQL Solutions. An IT industry professional for two decades with extensive database experience in MySQL, Oracle and Ingres his expertise covers data architecture, software development, migration, performance analysis and production system implementations. His knowledge from 10 years of consulting across many industry sectors, technologies and countries has provided unique insight into being able to provide solutions to problems. For more information Contact Ronald.

Setting up on EC2

Thanks to my friend Dustin, and his EC2 demo using Elasticfox Firefox Extension for Amazon EC2 I got an EC2 image setup. With other references Link 1,Link 2,Link 3 I was also able to create my own AMI.

Some notes specific for my configuration.

Pre-config ElasticFox key for launching directly from ElasticFox SSH connections.

mkdir ~/ec2-keys
mv ~/Downloads/elasticfox.pem ~/ec2-keys/id_elasticfox
chmod 600 ~/ec2-keys/id_elasticfox
chmod 700 ~/ec2-keys/
ssh -i /Users/rbradfor/ec2-keys/id_elasticfox [email protected]

Installed Software.

apt-get update
apt-get -y autoremove
apt-get -y install apache2
apt-get -y install mysql-server
# Prompts for password (very annoying)
apt-get -y install php5
apache2ctl graceful
echo "Hello World" > /var/www/index.html
echo "< ? phpinfo() ?>" > /var/www/phpinfo.php

Configuration to save AMI.

scp -i ~/ec2-keys/id_elasticfox ~/ec2-keys/id_elasticfox pk-CHK7DP4475BWUKIUF4WFDIW3VMYDYOHQ.pem cert-CHK7DP4475BWUKIUF4WFDIW3VMYDYOHQ.pem [email protected]:/mnt
ec2-bundle-vol -d /mnt -c cert-CHK7DP4475BWUKIUF4WFDIW3VMYDYOHQ.pem -k pk-CHK7DP4475BWUKIUF4WFDIW3VMYDYOHQ.pem -u AccountNumber -r i386 -p ubuntu804_lamp
ec2-upload-bundle -b rbradford_804_lamp_ami -m /mnt/ubuntu804_lamp.manifest.xml -a AccessID -s SecretKey