Using GenAI directly in the database. A practical example using MySQL 8.0

If you have a typical MySQL installation setup of MySQL 8.0 (EOL) with replication, you can take advantage of VillageSQL extensions to generate AI responses directly with your source data with no impact on your production setup or application software. You can achieve results using SQL.

In this tutorial I will demonstate how you can produce GenAI summary information of your existing setup.

Existing Setup

  • MySQL 8.0
  • 1 or more replicas

To emulate this, we will use dbdeployer. See here for a full script to run locally.

dbdeployer example

$ dbdeployer deploy replication --gtid --sandbox-directory demo${MAJOR_VERSION_CLEAN} ${MINOR_VERSION}
$ cd ~/sandboxes/demo80
$ master/use

mysql> source support-ticket-tables.sql
mysql> source support-ticket-data.sql
erDiagram
    support_ticket {
        BIGINT ticket_id PK
        VARCHAR subject
        VARCHAR requester_name
        VARCHAR requester_email
        ENUM status
        ENUM priority
        VARCHAR assignee
        TIMESTAMP created_at
        TIMESTAMP updated_at
    }

    support_ticket_summary {
        BIGINT ticket_id PK,FK
        TEXT summary
        TIMESTAMP created_at
        TIMESTAMP updated_at
    }

    support_ticket_message {
        BIGINT message_id PK
        BIGINT ticket_id FK
        ENUM sender_type
        VARCHAR sender_name
        TEXT body
        TIMESTAMP created_at
    }

    support_ticket ||--|| support_ticket_summary : "has"
    support_ticket ||--o{ support_ticket_message : "has"

Setup of VillageSQL

On a separate server install:

For this demo I am using Gemini , however you can use Claude , OpenAI , or a local LLM such as Ollama .

Configure VillageSQL are replica

Running a default VillageSQL setup, and with dbdeployer installed locally can configure MySQL replication with:

SET GLOBAL GTID_MODE=OFF_PERMISSIVE;
SET GLOBAL GTID_MODE=ON_PERMISSIVE;
SET GLOBAL ENFORCE_GTID_CONSISTENCY=ON;
SET GLOBAL GTID_MODE=ON;
CHANGE REPLICATION SOURCE TO SOURCE_HOST='127.0.0.1', SOURCE_PORT=22435, SOURCE_USER="rsandbox", SOURCE_PASSWORD="rsandbox",  SOURCE_AUTO_POSITION=1;
SHOW WARNINGS;
START REPLICA;
SHOW REPLICA STATUS\G

As you have a replica, you may just wish to perform a database restore to get an current and updating copy.

o

USE aidemo; SET @GEMINI_API_KEY=’’; SET SESSION group_concat_max_len = 1000000; INSERT INTO support_ticket_summary (ticket_id, summary) SELECT ticket_id, ai_prompt(‘google’, ‘gemini-3.5-flash’, @GEMINI_API_KEY, CONCAT(‘Summarize support ticket information of ‘,GROUP_CONCAT(body))) FROM support_ticket_message WHERE ticket_id NOT IN (select ticket_id FROM support_ticket_summary) GROUP BY ticket_id;

SELECT * FROM support_ticket_summary LIMIT 1\G

villagesql> SELECT * FROM support_ticket_summary LIMIT 1\G
*************************** 1. row ***************************
 ticket_id: 1
   summary: Here is a summary of the support ticket regarding the installation, configuration, and port customization of Ollama on macOS and Linux.

---

### **Ticket Summary**

*   **Customer Goal:** Install Ollama to run local LLMs on macOS and a Linux server, and resolve a port conflict by changing Ollama's default port (11434).
*   **Status:** Resolved successfully.

...
Tagged with: MySQL VillageSQL Extensions AI

Producing Skewness statistics with SQL

Skewness measures the asymmetry of a distribution. A perfectly symmetric distribution has a skewness of zero. A positive skew (right-skewed) means the tail extends to the right — a small number of high values pull the mean above the median.

Exploring the vsql-ai extension

The vsql-ai extension adds AI prompt capabilities and text embeddings directly in SQL queries, with support for Anthropic Claude , Google Gemini , OpenAI ChatGPT , or a local LLM such as $aOllama .

Producing Chi-Squared statistics with SQL

The Chi-Squared test is one of the most widely used statistical tests for categorical data. It comes in two flavors: the goodness-of-fit test asks whether an observed frequency distribution matches an expected one, while the test of independence asks whether two categorical variables are associated with each other.