One nice feature of MySQL is its table data compression. It comes with the InnoDB engine, as long as you have "innodb_file_per_table" enabled. The ideal situation would be to decide what and how to compress during the design phase of your project, but reality is seldom ideal.
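On the MySQL 5.5 series we were running, two server settings are prerequisites for compressed InnoDB tables (these are the real variable names; setting them at runtime is shown only as a sketch, they can also go in my.cnf):

```sql
-- Each table gets its own .ibd tablespace file (required for compression)
SET GLOBAL innodb_file_per_table = ON;
-- The Barracuda file format is what enables ROW_FORMAT=COMPRESSED
SET GLOBAL innodb_file_format = Barracuda;
```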
At CAPSiDE, we got a warning alert from one of the thousands of systems we manage for our customers: it was about free space in a filesystem that hosted MySQL data.
But this time we were on a more traditional infrastructure where we could not cast our cloud wizardry spells. In this case, a disk expansion meant some downtime and the customer was in peak season, so downtime was out of the question.
Fortunately, one of the biggest tables consisted mainly of records with one big text field. We had a 240GB filesystem with about 30GB free, and that table alone used 50GB, mostly text.
A gzip of a mysqldump of the table was a mere 2.25GB, or 4.5% of the original size, so there seemed to be no problem with doing a simple ALTER TABLE to a compressed row format.
This was the first idea: convert the table from its uncompressed format straight to ROW_FORMAT=COMPRESSED.
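In sketch form, the naive plan looked like this (the table name is illustrative):

```sql
-- Naive plan: compress the whole table in one go, default settings
ALTER TABLE big_table ROW_FORMAT=COMPRESSED;
```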
What could possibly go wrong?
It turns out the answer is “plenty”.
How did we find out the pitfalls? Did someone really clever and experienced know the answer to our question beforehand?
It would be nice to say yes, but the truth is more straightforward. We do have a lot of clever and experienced people, mind you, but most importantly we know that in a production system you'd better test your changes before applying them. In this case, with no cloud wizardry, our test environment was the production one. Not the ideal situation, but better than going with the first idea without any testing.
So we created the "not_so_big_test_table" with the same structure and about 5% of the contents of the big one. That gave us a 2.5GB MySQL table to test with, and still 27.5GB of free disk space to go.
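Building such a sample can be done along these lines ("not_so_big_test_table" is the name from the text; the sampling condition is illustrative and assumes an integer primary key "id"):

```sql
-- Same structure as the original table
CREATE TABLE not_so_big_test_table LIKE big_table;
-- Copy roughly 5% of the rows
INSERT INTO not_so_big_test_table
SELECT * FROM big_table WHERE id % 20 = 0;
```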
We ran the test, expecting to finish the ALTER TABLE with close to 30GB free again.
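The test conversion used the 16KB block size we started with (table name as above):

```sql
ALTER TABLE not_so_big_test_table ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16;
```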
But to our surprise, we were greeted with less than 25GB free. Even worse, while monitoring free disk space during the process, we saw it drop below 22.5GB! (And we fired a CRITICAL alert on a production system, to top it off.)
In other words, had we gone with the first idea on the whole 50GB, we would have needed around 150GB of free disk space, where we barely had 30GB.
Compressing a MySQL table – pitfalls
Let’s see the different reasons why our space requirements were tripled during the process instead of being reduced.
1) ALTER TABLE usually makes a copy of the table.
Well, we expected that one, and it did not sound that bad. We had 27.5GB of free disk space, and the gzip of the mysqldump of the whole table used just 4.5% of the original size. That ratio would give us a result of only about 115MB. Right? Well… not exactly…
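The arithmetic behind that hope (just a calculator check, runnable in any MySQL client):

```sql
-- 2.5GB sample * 4.5% gzip ratio: the naive expectation in MB
SELECT 2.5 * 1024 * 0.045 AS expected_mb;  -- 115.2
```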
2) ROW_FORMAT=COMPRESSED is nowhere near as efficient as gzipping the entire table.
Although it is true that both use the same compression algorithm (zlib), InnoDB does not apply it to the whole MySQL table at once. It compresses data page by page, and long fields individually, so your space gains are way smaller. It all boils down to how many rows you have and how long the fields are.
Also, the way long columns or fields are stored in InnoDB depends on how big each row is and on the KEY_BLOCK_SIZE parameter for the table. Too big of a KEY_BLOCK_SIZE can be suboptimal in some cases.
In our case, the average “big_text” field was only slightly bigger than 4KB. With the default KEY_BLOCK_SIZE of 16KB, our table still used 93% of the original space. Ouch!
The recommendation is to test with some real data to see what you will actually achieve. Do it; you will not regret it.
3) Binary Logs
If you really care about your data, you have safety mechanisms to avoid losing any of it, right?
One nice and simple enough mechanism of MySQL is Master/Slave replication, where a separate system runs a MySQL Slave receiving all the updates done on the Master. If you have this going, then you have Binary-Logs. These are the files where all the changes on the master are written before they are sent to the slave.
Even if you do not have Binary-Logs for Master/Slave replication, you may still have them to allow for point-in-time recovery.
The thing to bear in mind is that you have to be careful about them and how you maintain them. And, what matters most in our case: they are not compressed. At all.
So here you have it. Three different ways the process consumed disk space that we either did not foresee or did not expect to have such a big impact.
Fixes for these pitfalls
1) Alter table space
The creation of a new table done by ALTER TABLE can seldom be prevented, and in this case, it was not possible. We were on MySQL 5.5, so the "ALGORITHM=INPLACE" option was not available.
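For reference, on MySQL 5.6 and later one could at least request the in-place algorithm, though a row-format change still rebuilds the table and still needs the extra disk space (sketch; table name illustrative):

```sql
-- Rebuilds the table without blocking concurrent DML,
-- but still consumes space for the rebuilt copy
ALTER TABLE big_table ROW_FORMAT=COMPRESSED, ALGORITHM=INPLACE, LOCK=NONE;
```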
2) Compression space
To achieve better compression, we tested different values of KEY_BLOCK_SIZE with our 5% sample. These were our results:
So we settled on a 2KB KEY_BLOCK_SIZE, which provided the maximum disk-space savings, and that was our goal.
Note: selecting 8KB could have saved roughly the same amount of disk space and perhaps given better performance.
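One way to run this kind of comparison is to build one compressed copy of the sample per candidate block size and compare the resulting .ibd file sizes on disk (sketch; table names illustrative):

```sql
-- One compressed copy of the sample per candidate KEY_BLOCK_SIZE (valid
-- values are 1, 2, 4, 8 and 16)
CREATE TABLE test_kbs_2 LIKE not_so_big_test_table;
ALTER TABLE test_kbs_2 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=2;
INSERT INTO test_kbs_2 SELECT * FROM not_so_big_test_table;

CREATE TABLE test_kbs_8 LIKE not_so_big_test_table;
ALTER TABLE test_kbs_8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
INSERT INTO test_kbs_8 SELECT * FROM not_so_big_test_table;
-- ...then compare the sizes of the .ibd files with, e.g., ls -lh
```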
3) Binary-Logs space
To avoid problems with Binary-Logs, we kept deleting them manually. To know how far we could safely go, we checked on the slave which Binary-Log file from the master was being processed, shown in the Relay_Master_Log_File value.
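On the slave, that value comes from the slave status output:

```sql
-- Relay_Master_Log_File shows which master Binary-Log file the
-- slave's SQL thread is currently executing
SHOW SLAVE STATUS\G
```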
Once the slave had jumped to a new Binary-Log file, we purged the older ones on the master. We made sure to keep the Binary-Logs on the slave, just in case disaster struck at the worst moment.
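The purge itself uses PURGE BINARY LOGS on the master; it deletes every log file older than the one named (the file name here is illustrative):

```sql
-- Remove all master binary logs up to, but not including, this file
PURGE BINARY LOGS TO 'mysql-bin.000123';
```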
With all this, we proceeded to compress the MySQL table. We were able to complete the whole process with 8GB still to spare at the most critical point.