Seems I have over time, thought of many ideas, jotted some notes on some, and even done some work, but everybody knows that “home projects” can take a long time.
Here are a few that have resurfaced over the past month, and I doubt I’ll ever get to them, or perhaps some other enterprising person has already done a similar thing. Of course most are for my own personal/professional gratification, but input from others of “great idea, when do we see it” could sway my interests.
INFORMATION_SCHEMA for MySQL Version 4
Well, quering the INFORMATION_SCHEMA is very cool, and long overdue for information gathering including statistics, schema definitions, schema version comparision tools etc. Of course there are concerns regarding the performance of using the INFORMATION_SCHEMA, and any design should significantly consider this limitation. The INFORMATION_SCHEMA is not physically represented as a normal MySQL database with tables and real data files, and then can be optimised appropiately, queries reference various sources of information and return the results at query time.
How would I achieve this? Here is an approach. The biggest issue would be what programming technology and non-Linux support.
- Generate a physical INFORMATION_SCHEMA create SQL script from the most current database version. This would allow you to create a real database with real tables.
- Population. Well at least some tables will be empty, but some work would be required to the population of other information. A first pass with some easy shell scripting and SHOW commands would easily populate information for SCHEMATA (SHOW DATABASES), TABLES (SHOW TABLES), COLUMNS (DESC [table]). Some are covered by the mysql schema. You get the picture. Of course some information will be more more difficult to populate easily.
- Syncing. Populating is one thing, but if the information is not current then it becomes useless. Depending on your organisations management, in a controlled environment, simply re-running the population process when schema information is changed, or run nightly may work. A more brut force approach may be to look at filesystem timestamps of the database table definitions and trigger when changes occur, but I suspect this would a poor approach.
Storage Engines in use
I’ve often wondered, do people use X storage engine.
I would love to see plastered on the MySQL Home Page stats of 99,999,999 Downloads, 99,999,999 Installations reporting in, and then some breakdowns of database versions, storage engine usage, even demographics. This type of statistical information can be great, but can also be dangerous to competitors, or can be too detailed to become useful. Consider the objective it a general census.
I recently was introduced to Fotolog, and in the top left corner is Members, Photos, and Photos today. Of course now they have volume the numbers are impressive. You blink, do a refresh and they have changed.
Having a more general information goal approach, rather then detailed would be my guidelines. Here are some points.
- Each MySQL installation is registered providing it a unique id. This is used for all collection and communication, and there is never any real reference back to organisation/name etc. How this works correctly is a little difficult, but would need to involve some checksuming of information like hostname,os,mysql version,maybe ip.
- A script runs to generate basic statisical information, including unique installation id, present GMT date/time, MySQL version, master/slave, installation date, number of databases, breakdown of storage engines used per database/installation, total number of tables per database/installation.
- The Information needs to be recorded, so it can be transmitted, and also used for historical comparision, best option would a flatfile on the filesystem.
- Having an automated approach to then say email this information to a known server. There would need to be some means to have some authentication, and feedback to confirm success. Using email is not an ideal approach, but is a more readily available medium. Another easy means would be a webservices approach, but would require a more direct internet connection, while mail could overcome this means.
- More advanced ideas would be to record information in and XML format, have a XML storage engine, and enable reports to be run against this collated information, so the System Adminstrator has at there finger tips historical details that are gathered. The benefit of an XML approach is this can be more easily collated with clever approaches with XSLT for example. The fallback would could be a CSV file, but the information being sent, may well need to be more structured for various reasons.
- Of course if it was limited to Version 5 up, the benefits of stored procedures, events and UDF’s a lot of this could be implemented in MySQL specific technologies, but what about the bulk of installations pre Version 5, you would definitely what to include these.
- My concerns would definitely be SPAM abuse, or false figures, so these points alone may nullify the process even being worth it.
This is just a broad stroke overview, I’ve got more detailed analysis of pro’s, con’s, example file layouts, and a means of collating the information and providing dynamic reporting on the server.
Of course there are a few ideas I have implemented, and I’m still passionate about and working on in that “spare time” we all never have.
A showcase of MySQL (via existing popular Open Source apps). I’ve done this via a VMWare Appliance called UltimateLAMP.
I did this for two reasons, first of course I wanted to easily show applications that could be used without any technical requirements (indeed the hardest part is installing the free VMware player), and second, if there was more penetration of possible MySQL options in my present home city of Brisbane, it may provide some MySQL work opportunities. Of course, this an application approach to demonstrating MySQL, rather then a MySQL features approach, and is dependent then on the community and these applications may be less then ideal in a large scale deployment or may not be of a quality to be placed with MySQL, but it’s one way to get leverage.
I would like some way to poll usage of MySQL within organsiations my home town, but it’s a very difficult approach, and I’m sure some organisations would not even know they are using MySQL. The process may also provide a means of providing a service, say quarterly updates of software (MySQL and open source products), customisations, but I think it would be a difficult sell. Having an army of people applying these ideals across a number of cities may provide a more marketable approach.
ACID Performance Benchmark
Leveraging the Apache project JMeter, which has all the infrastructure for handling theads, reporting, graphing, and volume load testing, I’ve written to enhancements for JMeter JDBC Sampler, the first was CALL support, enabling calls to stored procedures. The second is TRANSACTION support enabling a number of SQL statements to be execute in a true transaction, all pass or all fail.
These allow me to provide an application specific performance approach. Now MySQL has various means of testing, the mysql-test that comes with installations (I’ve got more on this for a later time), as well as mysqlslap (again, some more points on this at a later time). Both of these tools have benefits, but I wanted a more application specific approach. Using JMeter provides an overhead, but it provides a more realistic application approach, my goal is not to get the maximum thoughtput possible, being as close to the screws, but to provide a more realistic approach.
Of course, my objective is to use application specific SQL, the SQL you run, this would need to be gathered via various means. Information gathering including what types of transactions, the interactions, the volumes of use would be needed to create realistic approaches. The benefits of JMeter for example, is I can simulate various transactions, enable them to interact, enable them to have implied delays to simulate user use. These are some of the advantages over the current MySQL performance approaches.
Providing this level of simulation however is only half the requirement. This provides for a benchmark for performance, but what about validation. Sure the system can be used with 100 threads with X Insert Orders, and Y Select Orders, and Z Invoice Prints etc per thread, but what if the there were issues and errors.
Data Quality is a key interest and this level of benchmarking also has to have the extent of validating the results, and taking appropiate actions for failures. I’m planning also that this can be used by enabling Referential Integrity (via an appropiate storage engine) on an existing application and then confirming transactional state of queries within applications that are not presently transaction safe.
Presently, I using this approach to assist in the testing of the MySQL Community Storage Engine PBXT. I’m hoping in conjunction with the Sakila Sample Database and various data sets, that I can also publish comparision results with other Storage Engines.
Schema Designer Analyser
My background is database modelling and large systems design, and I’m writing a A MySQL Guide to Database Modelling, Design and Support (still under development). This is an online and very simple approach for open source development to improve it’s image in database modelling. Keeping things simple will help all those non-professionals out there.
To back up this article, which has a checklist for organisations to follow and then incoporate into standards, is a tool to do the analysis that the checklists requires.
I had hoped to initial develop this using the GRT portion of the MySQL Workbench, but I’ve decided to make that a secondary project for now, focusing on a tehnology of greater understanding, possibly Java. Of course the biggest issues is accurate gathering of schema information, so you can see why a INFORMATION_SCHEMA for MySQL version 4 would come in handy.
I’ve got a lot more ideas, and discussions with other MySQL peers rekind ideas I’ve thought of previously. I really should get to putting them all out there to see how many people think I’m crazy, or to find other crazy people to contribute, or even fund! One can dream.