The March NY MySQL Meetup featured a presentation from Infobright, a data warehousing solution built on the MySQL Product.
With a pitch of “Simplicity, Scalability and low TCO” I became more impressed with the capability to delivery on these as the presentation proceeded. Here are some highlights.
- The company and product has been around for a few years. Infobright started as a compression engine to sit beside Teradata, providing a significant cost saving to clients, and allowing a two way data transfer between Teradata.
- In September 2008, a open source community edition was released, called ICE. (Which I didn’t know)
- The technology is based on a Rough Set theory, a mathematical approach
- Using a column oriented approach, compression generally starts at 10:1, different applications can get 30:1 or better
- There is basically no tuning, there are no indexes. Knowledge is gleaned at data loading and each data pack node holds key information per column, such as range of values (min,max).
- Some interesting results are, there is a constant load time, it doesn’t degrade over time as the size of your data increases. Also, Query performance scales with data volume.
- Depending on queries, the knowledge grid can retrieve results without having to uncompress the data, i.e. introspection of the meta data is all that is needed
- Infobright is not a pluggable storage engine, rather a custom binary of MySQL. This is due to the restrictions of the API and the lack of optimizer push down conditions for example.
The product is not without some limitations, but you have to realize the product is for a data warehousing implementation, not an OLTP web app. It’s not great with SELECT *, and large text strings for example.
Functionality continues to be added, with a recent release adding many more MySQL Functions, but again, Infobright does not claim to be a solution to everybody, there is not UDF support or SP support at this time, however I’d warrant this is really not needed.
While the presentation went into some detail regarding the knowledge grid, data packs, data pack nodes, and pack to pack integration from a slide perspective, the presentation lacked the technical here is how you use the loader to get data out of MySQL and into Infobright. Here is the throughput, etc. As a marketing presentation it had the right content, but I’d like to now see the companion technical presentation.
Having previously been part of the MySQL Consulting team, and having worked also in the Storage Engine API with the Nitro Storage engine I have a distinct advantage of knowing the complexities of integration with MySQL. We can only hope this continues to improve with future releases of MySQL enabling Infobright and other products to integrate better and keep up to date with the MySQL Release cycle.