I’ve had this list on a post-it note on my monitor for all of 2022. I figured it was time to write it down, and reuse the space.
In summary, AWS suffers from the same problem as almost every other product: it sacrifices improved security for backward compatibility of functionality. IMO this is not best practice for a data ecosystem that is under constant attack.
- Storage should be encrypted by default. When you launch an RDS cluster its storage is not encrypted. This goes against their own AWS Well-Architected Framework Section 2 – Security.
- Plain text passwords. To launch a cluster you must specify a password in plain text on the command line, which again is not security best practice. At the very least, this should accept a known secret from AWS Secrets Manager.
- TLS for administrative accounts should be the only option. The root user should only be allowed to connect with `REQUIRE SSL` (MySQL syntax).
- Expanding on the AWS Secrets Manager usage for passwords, there should be no need for Lambda code and a CloudWatch cron event for rotation; it should just be built in.
- The awscli has this neat `wait` command that will block until you can execute the next statement in a series of sequential events to prepare and launch a cluster, but it doesn’t work for `create-db-cluster`. You have to build in your own manual “wait” until “available” process.
- In my last position, I was unable to enforce TLS communications to the database from the application. This insecure practice is a more touchy situation, however, there needs to be some way to ensure security best practices over application developer laziness in the future.
- AWS has internal special flags that only AWS support can set when, say, you hit a bug in a version. Call it a per-client feature flag. However, there is no visibility into what is set, on which account, on which cluster, etc. Transparency is of value so that the customer knows to get that special flag unset after minor upgrades.
- When you launch a new RDS cluster, for example MySQL 2.x, you get the oldest version. Earlier in the year it was something like 2.7.2, even though 2.10.1 had been released. When only an engine is specified, AWS should default to a more current version. I would not advocate that the latest version be the automatic choice, but it’s better to be more current.
- The `ALTER SYSTEM CRASH` functionality is great, but it’s incomplete. You cannot, for example, crash a global cluster, forcing a region-specific failover. If you have a disaster resiliency plan that is multi-region, it’s impossible to actually test it. You can emulate a controlled failover, but this is a different use case to a real failover (aka Dec 2021).
- Use ARN when it’s required, not id. This goes back to my earlier point over maximum compatibility over usability, but when a parameter such as `--db-instance-identifier` requires the value to be the ARN, then the parameter name should be specific. IMO an identifier is what you use for an `--identifier` argument, e.g. `--db-cluster-identifier`. When you specify, for example, `--replication-source-identifier`, this must be (as per docs) “The Amazon Resource Name (ARN) of the source DB instance or DB cluster if this DB cluster is created as a read replica.” It should then be `--replication-source-arn`. There are a number of different occurrences of this situation.
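
For the manual “wait” until “available” process mentioned in the list, here is a minimal sketch of what such a loop could look like. It is not an official waiter. The function name `wait_for_cluster_available`, the retry count, and the delay are my own assumptions for illustration; the only AWS call used is `aws rds describe-db-clusters` with a `--query` on the cluster's `Status` field.

```shell
# Poll an RDS/Aurora cluster until its status is "available".
# Usage: wait_for_cluster_available <cluster-id> [max-attempts] [delay-seconds]
# Hypothetical helper, not part of the awscli.
# The body runs in a subshell ( ... ) so its variables do not leak to the caller.
wait_for_cluster_available() (
  cluster_id="$1"
  max_attempts="${2:-60}"   # assumed default: give up after 60 polls
  delay="${3:-30}"          # assumed default: 30 seconds between polls
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Ask for the cluster status only; treat any CLI error as "unknown" and retry.
    status=$(aws rds describe-db-clusters \
      --db-cluster-identifier "$cluster_id" \
      --query 'DBClusters[0].Status' \
      --output text 2>/dev/null) || status="unknown"
    if [ "$status" = "available" ]; then
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  # Never reached "available" within the allowed attempts.
  return 1
)
```

You would call it right after `create-db-cluster`, for example `wait_for_cluster_available my-cluster || exit 1`, where `my-cluster` is a placeholder identifier. Returning non-zero on timeout lets the rest of a launch script stop instead of running the next step against a cluster that is not ready.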