How many installations, and just what are they doing?

Would it not be great if on the MySQL website there was a page of stats (updated daily) that provided statistics like number of installations, a breakdown of versions registered (not certain I like that exact word) , OS’s, countries etc. More specifically, some useful stats on the engine types in practical use, avg number of tables per database etc. Of course the types of stats could be limitless, but with the success of MySQL as well as other open source projects, more imperial figures on installations other then just downloads I think would definitely benefit given the current momentum. (Availability of information to competitors could be both a good and bad thing.) Perhaps figures can be shown in percentages, not actual numbers.

Anyway, nice idea you say, we can all come up with ideas, but how could you implement something like this.

I made a post yesterday and mentioned the thought of an XML storage engine. It seemed to stir some feedback. Those notes were actually taken from more detailed notes that I’ve never published so I thought it would be good to explain some of the background.

Many months ago at the Brisbane MySQL Users Group our informal discussions following the evening presentation turned to a more detailed analysis of the different engine types available within MySQL. A throw away comment by somebody like ‘We have no idea if BDB is really used?’ prompted me to consider the possibility of why can’t this question be answered in the future.

How would it work? Well for each installation of MySQL on a server (given you can have multiple installations of varying versions per machine) the option to provide a feedback loop is made available post installation, it could even be part of the install processs (probably not in .rpm). The feedback loop configures the following basic settings.

  • Frequency (Monthly, Weekly, Daily, Once Off)
  • Granularity (Summary, Verbose)

What would I have this feedback loop submit:

  • MySQL Version
  • OS Version
  • Country/Locale
  • Number of User Databases (excluding mysql & test)
  • Number of tables per storage Engine per database

More detailed information, not necessary for stats, but perhaps useful for MySQL internals would be output from the SHOW set of commands. (variables, status, table status,engines, innodb status). This alone is another topic, allowing collection of information that can be used to evaluate usage, possible tunings etc). Additional information such as number of triggers, stored procedures etc, would identify if this level of functionality is actually really being used out there.

By registering with MySQL to provide a feedback loop from your server installation to MySQL, you are provided with a unique Id, which enables clear tracking for MySQL, but provides clear anonymity for the customer to ensure we are not taking any propriety information from them.

We never want to hide the data of an installation that is provided to MySQL, it needs to be in the open. Why not make it an open standard of data exchange (using XML), and why not then enable the installation itself to use these statistics internally for some level of reporting.

XML allows for the files to be reused outside of MySQL directly. Using XSLT you could take these XML files for better presentation of statistical information. This would probably go against the ethos of MySQL with now information stored in XML instead of the database. So why not create an XML Storage Engine.

It would for lack of a better word be grossly inefficient, however it was more a Read/Only type structure, more you just write out a chunk of data in one go, and you always read the entire file. You don’t do partial updates (you could, but you rewrite the entire file), it leads itself to statistically information, written once only. Of course you would have to satisfy the search

What other possible extensions for an XML storage engine, you could store and read RSS feeds, even OpenDoc standards as implemented by Open Office for Documents.

Now, regardless of all the associated complexities like, how do you post add this functionality to older versions, how do you automate the scheduling of the feedback look across different OS’s, blah, blah, blah, the purpose is just to throw the idea around for a minute and see what falls out.

Additional statistics more useful for internal MySQL usage could pinpoint numbers of upgrades, from which version to which versions (more specifically if not current GA version). The biggest downside is as part of Open Source you can’t enforce gathering any of this information, so when the website says 1 million installations (it’s 1 million of people that have taken the time to notify MySQL, and that could be 10% or 50%, you just don’t know).

Could there perhaps be space here for a commercial product? I doubt it as this is very tightly coupled with MySQL, and unless there was seamless integration I doubt very few people would go to any trouble after installing MySQL.

It’s just an idea, but it’s nice to have ideas at least some of the time.