Things I Read About Recently [#Issue 2024.02]

Indie Newsletter Tool Generates $15,000 a Month

There are so many different email newsletter sites you could wonder if there is market saturation. MailChimp, Mailgun, ConvertKit, Sendgrid (now part of Twilio it seems), Moosend and Mailersend come to mind.

It seems the space still has plenty of revenue-producing options including buttondown.email reportedly a side gig generating $15k per month. Source: BoringCashCow

When I asked a good friend and author of the Technical SEO Weekly his use of ConvertKit directed me to this Baremetrics Dashboard which is another product to look at sometime.

LLMs and Programming in the first days of 2024

How do use an LLM? If you are still on the fence start getting into the habit of using it more frequently then start. I now use ChatGPT and Claude AI daily, and with a crowded market there are many other emerging technologies to also consider.

I use ChatGPT for coding and image generation with DALL.E. I use Claude more for reviewing large documents that seems to be ideal for producing a summary, or to generate a fictitious movie script from those documents.

I do not like Javascript nor do I wish to actually learn this language however I write it daily via ChatGPT. Javascript is the ever changing technology of web development and it’s impossible to keep up with the next product, or version of a product you may know. ChatGPT helps me navigate this combined with asking for HTML and TailwindCSS.  However, it’s not perfect, you need to be an experienced engineer that has learned how to write code for many years to ask the right questions and to correct the LLM when it does not produce what you expect. Lets look at CSS. Now there is flex and grid and its hard to keep up with changing features that browsers support. This is where ChatGPT has helped me. I have been using Tailwindcss but it still took an expert friend 30 minutes to help me debug a CSS formatting issue of a future OBS twitch streaming project to correctly size the content all in a 1920×1080 box. I actually learned a lot of new features of Google Chrome Developer Tools inspector I did not know and are probably just the start of expert debugging features.

Until a few months ago I never knew it’s now much easier to read JSON in Javascript.

async function fetchData() {
  try {
    const response = await fetch('data.json');
    const data = await response.json();
    console.log(data);
    return data
  } catch (error) {
    console.error('Error fetching data:', error);
  }
}

let data = await fetchData();

I’d like to remind users that  ChatGPT can make mistakes. Consider checking important information.. Source:  TLDR

ParadeDB (GitHub Repo)

Every day there is another PostgreSQL product to review.  I am a current user of ElephantSQL which I didn’t know existed two months ago. Neon and Tembo are two more postgreSQL serverless-related products on my product review list.  Now adding ParadeDB as well as reading Thoughts on PostgreSQL in 2024.

About ‘Things I Read About Recently’

Most days I take some time early in the morning to scan my inbox newsletters, the news, LinkedIn or other sources to read something new covering the professional and personal topics of interest. Turning what I read into some actionable notes in a short committed time window is a summary of what I learned today, what I should really learn and use, or what is of random interest.

Some of my regular sources include TLDR, Forbes Daily, ThoughWorks Podcasts, Daily Dose of Data Science and BoringCashCow to name a few.

AWS RDS Aurora wish list

I’ve had this list on a post-it note on my monitor for all of 2022. I figured it was time to write it down, and reuse the space.

In summary, AWS suffers from the same problem that almost every other product does. It sacrifices improved security for backward compatibility of functionality. IMO this is not in the best practices of a data ecosystem that is under constant attack.

  • Storage should be encrypted by default. When you launch an RDS cluster its storage is not encrypted. This goes against their own AWS Well-Architected Framework Section 2 – Security.
  • Plain text passwords. To launch a cluster you must specify a password in plain text on the command line, again not security best practice. At least change this to using a known secret from AWS secrets manager.
  • TLS for administrative accounts should be the only option. The root user should only be REQUIRE SSL (MySQL syntax).
  • Expanding on the AWS secrets manager usage for passwords, there should not need to be lambda code and cloudwatch cron event for rotation, it should just be automatically built in.
  • The awscli has this neat wait command that will block until you can execute the next statement in a series of sequential events to prepare and launch a cluster, but it doesn’t work for create-db-cluster. You have to build in your own manual “wait” until “available” process.
  • In my last position, I was unable to enforce TLS communications to the database from the application. This insecure practice is a more touchy situation, however, there needs to be some way to ensure security best practices over application developer laziness in the future.
  • AWS has internal special flags that only AWS support can set when say you have a bug in a version. Call it a per-client feature flag. However, there is no visibility into what is set, which account, which cluster, etc. Transparency is of value so that the customer knows to get that special flag unset after minor upgrades.
  • When you launch a new RDS Cluster, for example, MySQL 2.x, you get the oldest version, back earlier in the year it was like 2.7.2, even when 2.10.1 was released. AWS should be using a default version when only an engine is specified as a more current version. I would advocate the latest version is not the automatic choice, but it’s better to be more current.
  • the ALTER SYSTEM CRASH functionality is great, but it’s incomplete. You cannot for example crash a global cluster, forcing a region-specific failover. If you have a disaster resiliency plan that is multi-region it’s impossible to actually test it. You can emulate a controlled failover, but this is a different use case to a real failover (aka Dec 2021)
  • Use arn when it’s required not id. This goes back to my earlier point over maximum compatibility over usability, but when a --db-instance-identifier, or --db-instance-identifier requires the value to be the ARN, then the parameter should be specific. IMO –identifier is what you use for that argument, e.g. --db-cluster-identifier. When you specify for example --replication-source-identifier this must be (as per docs) “The Amazon Resource Name (ARN) of the source DB instance or DB cluster if this DB cluster is created as a read replica.” It should then be --replication-source-arn. There are a number of different occurrences of this situation.

SQL, ANSI Standards, PostgreSQL and MySQL

I have recently been working with the Donors Choose Open Data Set which happens to be in PostgreSQL. Easy enough to install and load the data in PostgreSQL, however as I live and breath MySQL, lets load the data into MySQL.

And here is where start our discussion, first some history.

SQL History

SQL – Structure Query Language is a well known common language for communicating with Relational Databases (RDBMS). It is not the only language I might add, having both used many years ago and just mentioned QUEL at a Looker Look and Tell event in New York. It has also been around since the 1970s making it; along with C; one of oldest in general use programming languages today.

SQL became an ANSI standard in 1986, and an ISO standard in 1987. The purpose of a standard is to provide commonality when communicating or exchanging information; in our case; a programming language communicating with a RDBMS. There have been several iterations of the standard as functionality and syntax improves. These are commonly referred to as SQL-86, SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2006, SQL:2008 and SQL:2011.

And so, with SQL being a standard it means that what we can do in PostgreSQL should translate to what we can do in MySQL.

SQL Communication

Both products provide a Command Line Interface (CLI) client tool for SQL communication, mysql for MySQL and psql for PostgreSQL. No surprises there. Both use by default the semicolon ; as a SQL statement terminator, and both CLI tools use \q as a means to quit and exit the tool. Certainly not a standard but great for syntax compatibility.

DDL Specification

Our journey begins with defining tables.

DROP TABLE

Both products SQL syntax support DROP TABLE. Infact, both support the DROP TABLE [IF EXISTS] syntax.

DROP TABLE donorschoose_projects;
DROP TABLE IF EXISTS donorschoose_projects;

CREATE TABLE

Both support CREATE TABLE.

Both support defining columns in the typical format <column_name> <datatype>, and both support the NOT NULL attribute. Talking about specific datatypes for columns is a topic on its own and so I discuss this later.

The PostgreSQL syntax was a table option WITHOUT OIDS which is not valid in MySQL. It is also obsolescent syntax in PostgreSQL 9.3. From the PostgreSQL manual “This optional clause specifies whether rows of the new table should have OIDs (object identifiers) assigned to them. The default is to have OIDs. Specifying WITHOUT OIDS allows the user to suppress generation of OIDs for rows of a table. This may be worthwhile for large tables … Specifying WITHOUT OIDS also reduces the space required to store the table on disk by 4 bytes per row of the table, thereby improving performance.”

In this example as this is just for testing, dropping the WITHOUT OIDS syntax creates a mutually compatible syntax.

Comments

Both MySQL and PostgreSQL support -- as an inline comment in an SQL statement. No need to strip those out.

ALTER TABLE

Both support ALTER TABLE ADD CONSTRAINT syntax which in our example is used to define the PRIMARY KEY, however while the syntax remains the same, the choice of datatype affects the outcome.

The following works in both products when the datatype is CHARACTER(32). More about CHARACTER() later.

ALTER TABLE donorschoose_projects ADD CONSTRAINT pk_donorschoose_projects PRIMARY KEY(_projectid);

In our example dataset, the primary key is defined with a TEXT datatype, and in MySQL this fails.

ERROR 1170 (42000): BLOB/TEXT column '_projectid' used in key specification without a key length

As the data in the dataset for primary keys by further analysis is indeed a 32 byte hexadecimal value, this is changed to CHARACTER(32) to be compatible for this data loading need. This however is an important key difference in any migration process with other data sets.

Side Note

Both products support the definition of the PRIMARY KEY in the CREATE TABLE syntax two different ways.

CREATE TABLE demo_pk1 (id character(32) NOT NULL PRIMARY KEY);
CREATE TABLE demo_pk2 (id character(32) NOT NULL, PRIMARY KEY(id));

CREATE INDEX

Both use CREATE INDEX syntax however with our sample dataset, this is the first observed difference in syntax with provided sample SQL statements.

PostgresSQL

CREATE INDEX projects_schoolid ON projects USING btree (_schoolid);

MySQL
The USING <type> qualifier must appear before the ON <table>.

CREATE INDEX USING btree projects_schoolid ON projects (_schoolid);

In both products USING btree is an optional syntax (for minimum compatibility) purposes so removing this provides a consistency.

Data Types

The following data types are defined in the PostgreSQL example data set. Each is discussed to identify a best fit in MySQL. For reference:

character

This data type is for a fixed width character field and requires a length attribute. MySQL supports CHARACTER(n) syntax for compatibility, however generally CHAR(n) is the preferred syntax. Indeed, PostgreSQL also supports CHAR(n).

The following showing both variants is valid in both products.

CREATE TABLE demo_character(c1 CHARACTER(1), c2 CHAR(1));

varchar/character varying

While this dataset does not use these datatypes, they are critical in the general conservations of character (aka string) types. This refers to a variable length string.

While character varying is not a valid MySQL syntax, varchar is compatible with both products.

CREATE TABLE demo_varchar(vc1 VARCHAR(10));

text

In PostgresSQL, text is used for variables of undefined length. The maximum length of a field is 1GB as stated in the FAQ.

In MySQL however TEXT only stores 2^16 characters (64K). The use of LONGTEXT is needed to support the full length capacity in PostgeSQL. This store 2^32 characters (~4GB).

Of all the complexity of this example dataset, the general use of text will be the most difficult to modify to a more applicable VARCHAR or TEXT datatype when optimizing in MySQL.

integer

PostgreSQL uses the integer datatype for a signed 4 byte integer value. MySQL supports the same syntax, however generally prefers to refer to the shorter INT syntax. Both products support both overall.

mysql> CREATE TABLE demo_integer(i1 INTEGER, i2 INT);
Query OK, 0 rows affected (0.11 sec)

mysql> INSERT INTO demo_integer VALUES (1,-1);
Query OK, 1 row affected (0.05 sec)

mysql> SELECT * FROM demo_integer;
+------+------+
| i1   | i2   |
+------+------+
|    1 |   -1 |
+------+------+
1 row in set (0.00 sec)
demo=# CREATE TABLE demo_integer(i1 INTEGER, i2 INT);
CREATE TABLE
demo=# INSERT INTO demo_integer VALUES (1,-1);
INSERT 0 1
demo=# SELECT * FROM demo_integer;
 i1 | i2
----+----
  1 | -1
(1 row)

And just to note the boundary of this data type.

mysql> TRUNCATE TABLE demo_integer;
Query OK, 0 rows affected (0.04 sec)

mysql> INSERT INTO demo_integer VALUES (2147483647, -2147483648);
Query OK, 1 row affected (0.04 sec)

mysql> SELECT * FROM demo_integer;
+------------+-------------+
| i1         | i2          |
+------------+-------------+
| 2147483647 | -2147483648 |
+------------+-------------+
1 row in set (0.00 sec)
demo=# TRUNCATE TABLE demo_integer;
TRUNCATE TABLE

demo=# INSERT INTO demo_integer VALUES (2147483647, -2147483648);
INSERT 0 1
demo=# SELECT * FROM demo_integer;
     i1     |     i2
------------+-------------
 2147483647 | -2147483648
(1 row)

The difference is in out-of-bounds value management, and here MySQL defaults suck. You can read my views at DP#4 The importance of using sql_mode.

demo=# TRUNCATE TABLE demo_integer;
TRUNCATE TABLE
demo=# INSERT INTO demo_integer VALUES (2147483647 + 1, -2147483648 - 1);
ERROR:  integer out of range
demo=# SELECT * FROM demo_integer;
 i1 | i2
----+----
(0 rows)
mysql> TRUNCATE TABLE demo_integer;
Query OK, 0 rows affected (0.04 sec)

mysql> INSERT INTO demo_integer VALUES (2147483647 + 1, -2147483648 - 1);
Query OK, 1 row affected, 2 warnings (0.07 sec)

mysql> SELECT * from demo_integer;
+------------+-------------+
| i1         | i2          |
+------------+-------------+
| 2147483647 | -2147483648 |
+------------+-------------+
1 row in set (0.00 sec)

While not in this dataset, both support the bigint data type. While the PostgreSQL docs indicate bigint is 8 bytes, testing with PostgresSQL 9.3 failed. Something to investigate more later.

demo=# CREATE TABLE demo_bigint(i1 BIGINT);
CREATE TABLE
demo=# INSERT INTO demo_bigint VALUES (2147483647 + 1), (-2147483648 - 1);
ERROR:  integer out of range
mysql> CREATE TABLE demo_bigint(i1 BIGINT);
Query OK, 0 rows affected (0.12 sec)

mysql> INSERT INTO demo_bigint VALUES (2147483647 + 1), (-2147483648 - 1);
Query OK, 2 rows affected (0.04 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> SELECT * from demo_bigint;
+-------------+
| i1          |
+-------------+
|  2147483648 |
| -2147483649 |
+-------------+
2 rows in set (0.01 sec)

And for reference, both products support smallint, a 2-byte integer.

Each product has additional integer data types.

numeric

For a fixed-precision number, PostgreSQL uses numeric but supports decimal.It would not be surprising to know that MySQL uses DECIMAL and for compatibility supports NUMERIC.

This leads to a side-bar discussion on knowing your data-types for your product. In a recent interview for a MySQL Engineer, a candidate (with SQL Server experience) provided a code example defining the NUMERIC datatype. I knew it was technically valid in MySQL syntax, but never actually seen this in use. When I asked the candidate for what was the syntax commonly used for a fixed-precision datatype they were unable to answer.

real/double precision

This dataset does not include these data types, however for reference, PostgresSQL uses real for 4 bytes, and double precision for 8 bytes. MySQL uses float for 4 bytes, and double for 8 bytes. MySQL however supports both PostgreSQL syntax options, however PostgreSQL supports float, but not double.

demo=# CREATE TABLE demo_floatingpoint(f1 FLOAT, f2 REAL, d1 DOUBLE, d2 DOUBLE PRECISION);
ERROR:  type "double" does not exist
LINE 1: ...TE TABLE demo_floatingpoint(f1 FLOAT, f2 REAL, d1 DOUBLE, d2...

demo=# CREATE TABLE demo_floatingpoint(f1 FLOAT, f2 REAL, d2 DOUBLE PRECISION);
CREATE TABLE
mysql> CREATE TABLE demo_floatingpoint(f1 FLOAT, f2 REAL, d1 DOUBLE, d2 DOUBLE PRECISION);
Query OK, 0 rows affected (0.07 sec)

date

Both PostgreSQL and MySQL use the date data type.

timestamp

Both PostgreSQL and MySQL use the timestamp data type to store date/time values. However, there is a difference in both precision and implementation here.

In PostgresSQL, timestamp supports a date before EPOCH, while in MySQL it does not. MySQL uses the DATETIME datatype.

Using PostgresSQL timestamp and MySQL DATETIME, both support microsecond precision. MySQL however only started to provide this in MySQL 5.6.

A key difference in column definition is the PostgreSQL timestamp without time zone syntax, used in our example dataset. Analysis of data loading will determine the impact here.

boolean

SQL:1999 calls for a Boolean datatype, and both PostgreSQL and MySQL support defining a column as BOOLEAN. MySQL however implicitly converts this to a SIGNED TINYINT, and any future DDL viewing shows this reference.

When referencing boolean, in PostgreSQL WHERE column_name = TRUE or WHERE column_name = t retrieves a true value. In MySQL WHERE column_name = TRUE or WHERE column_name = 1. When you SELECT a boolean, in PostgresSQL the answer is ‘t’, in MySQL, the answer is 1.

demo=# CREATE TABLE demo_boolean (b1 boolean);
CREATE TABLE
demo=# INSERT INTO demo_boolean VALUES (TRUE),(FALSE);
INSERT 0 2
demo=# SELECT * FROM demo_boolean;
 b1
----
 t
 f
(2 rows)
mysql> CREATE TABLE demo_boolean (b1 boolean);
Query OK, 0 rows affected (0.11 sec)

mysql> INSERT INTO demo_boolean VALUES (TRUE),(FALSE);
Query OK, 2 rows affected (0.03 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM demo_boolean;
+------+
| b1   |
+------+
|    1 |
|    0 |
+------+
2 rows in set (0.00 sec)

Other Data Types

Only the data types in this example have been reviewed.

Other syntax

In our sample SQL script, there is psql specific syntax to show a debugging line with \qecho .... For compatibility these are removed.

The loading of data with the \COPY <table_name> FROM PSTDIN WITH CSV HEADER is PostgreSQL specific and so loading the data is a future topic.

Finally, the VACUUM ANALYZE <table_name> command is also PostgreSQL specific and removed. This is a means effectively of optimizing and analyzing the table.

Both PostgreSQL and MySQL have an ANALYZE command, however the syntax is different, with the required TABLE keyword in MySQL.

PostgresSQL

ANALYZE donorschoose_projects;

ANALYZE TABLE donorschoose_projects;
ERROR:  syntax error at or near "table"

MySQL

ANALYZE donorschoose_projects;
ERROR 1064 (42000): You have an error in your SQL syntax;...

ANALYZE TABLE donorschoose_projects;

MySQL has an OPTIMIZE TABLE syntax, however while technically valid syntax this is not compatible with the default storage table InnoDB.

mysql> OPTIMIZE TABLE donorschoose_projects;
+----------------------------+----------+----------+-------------------------------------------------------------------+
| Table                      | Op       | Msg_type | Msg_text                                                          |
+----------------------------+----------+----------+-------------------------------------------------------------------+
| test.donorschoose_projects | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| test.donorschoose_projects | optimize | status   | OK                                                                |
+----------------------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (0.32 sec)