Friday, November 19, 2010

MySQL: enable innodb_file_per_table with zero downtime

I thought that while my wife is preoccupied with the lemon pie, I might tell you this story.

InnoDB is a very good storage engine for MySQL that combines reasonable performance with wide popularity and, as a consequence, a good set of tools for diagnostics and fine-tuning. One of its downsides is inefficient disk space management: once disk space has been added to the shared tablespace, InnoDB never returns it to the operating system, even when you delete tables or whole databases. To get some flexibility back, you should use the innodb_file_per_table option. Unfortunately, you cannot simply enable this option on a running database, because it only affects newly created tables. You have to dump the database and restore it on a new MySQL instance that has the option enabled from the very beginning. This scenario means the database is inaccessible from the moment you start mysqldump to the moment the restore finishes on the new instance. Is there a way to minimize the downtime?
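For reference, enabling the option itself is a one-line change in the server configuration (the path shown is typical for Debian/Ubuntu and is an assumption):

```ini
# /etc/mysql/my.cnf -- typical location on Debian/Ubuntu
[mysqld]
innodb_file_per_table
# Note: this only affects tables created (or rebuilt) after the restart;
# tables that already exist stay in the shared ibdata1 tablespace.
```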

Yes, you can run mysqldump on a backup of your database. But then you lose the data written between the moment the backup was taken and the moment the new instance is ready. Still, that's a step closer to the solution: you can also set up replication between the original database and the new one, and once the new instance catches up with the old one, your task is complete. And the backup itself can be done online, without stopping MySQL, using the Xtrabackup tool by Percona.

So, the basic steps you have to follow are:

  • Configure your original database as a master. Unless it is already writing binary logs (e.g., for point-in-time recovery), this is the only step that requires restarting MySQL.
  • Make a backup of the original database using Xtrabackup.
  • Restore the backup and run a second instance of MySQL.
  • Run mysqldump on the second instance.
  • Stop the second instance, but do not delete it yet.
  • Initialize a new data directory and start a third MySQL instance with the innodb_file_per_table option enabled.
  • Restore the dump by feeding it into the third instance of MySQL.
  • Configure the third instance as slave and run the replication.
  • When the initial replication finishes and the slave catches up with the master, reconfigure your clients to use the new instance.
  • That's it. You can stop the first instance now and delete it.
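The steps above boil down to a handful of commands. Here is a minimal sketch, assuming Percona XtraBackup is installed; host names, ports, paths, timestamps, and credentials are all placeholders:

```shell
# Step 1: make the original server a master (the only step needing a restart).
# Add to my.cnf:   [mysqld]  server-id=1  log_bin=/var/log/mysql/mysql-bin.log

# Step 2: online backup of the running master (innobackupex is the wrapper
# script shipped with Percona XtraBackup)
innobackupex /backups/
innobackupex --apply-log /backups/2010-11-19_12-00-00/

# Steps 3-4: start a second mysqld on the restored datadir (say, port 3307)
# and dump it; --master-data records the master's binlog position in the dump
mysqldump --port=3307 --all-databases --master-data > dump.sql

# Steps 6-7: feed the dump into a third instance started with
# innodb_file_per_table enabled (say, port 3308)
mysql --port=3308 < dump.sql

# Step 8: point the new instance at the master and start replicating
# (the binlog file and position come from the dump's --master-data header)
mysql --port=3308 -e "CHANGE MASTER TO MASTER_HOST='master.example.com',
    MASTER_USER='repl', MASTER_PASSWORD='...'; START SLAVE"

# Step 9: wait until the slave catches up before switching the clients over
mysql --port=3308 -e "SHOW SLAVE STATUS\G"    # watch Seconds_Behind_Master
```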

I wrote an even more detailed guide illustrated with example commands. It was published recently: HOWTO: Reconfigure MySQL to use innodb_file_per_table with zero downtime

Friday, November 12, 2010

How not to write scripts

Today, I saw a PHP script that used 41.5 GB of virtual memory, 26 GB of which had been carefully tucked away in swap. And I recalled the programs we were writing only 15-20 years ago. Like a text editor that could process gigabyte text files (in theory, because there were no disk drives large enough to store such files). Or an image viewer that could display pictures ten times larger than the available RAM.

As a friend of mine put it: in the USSR, when engineers were sent to a kolkhoz to help harvest potatoes, they knew what would happen to them if they didn't work hard. I wish some modern developers knew that, too.

Wednesday, November 3, 2010

Hardware RAID? Software RAID? Both!

I have received an HP server with Ubuntu installed by someone else. It has a Smart Array RAID controller driven by the cciss module, and at the same time the device mapper is configured:

$ ls -l /dev/mapper/
total 0
crw-rw---- 1 root root  10, 59 2010-08-25 19:01 control
brw-rw---- 1 root disk 251,  0 2010-08-25 19:01 okd-root
brw-rw---- 1 root disk 251,  1 2010-08-25 19:01 okd-swap_1
$ ls -l /dev/cciss/
total 0
brw-rw---- 1 root disk 104, 0 2010-08-25 19:01 c0d0
brw-rw---- 1 root disk 104, 1 2010-08-25 19:01 c0d0p1
brw-rw---- 1 root disk 104, 2 2010-08-25 19:01 c0d0p2
brw-rw---- 1 root disk 104, 5 2010-08-25 19:01 c0d0p5
$ mount
/dev/mapper/okd-root on / type ext4 (rw,errors=remount-ro)
/dev/cciss/c0d0p1 on /boot type ext2 (rw)

What I'm certain about is that hardware RAID is a must on my servers: its battery-backed write cache delivers higher performance than software-only RAID can. But should I get rid of the device mapper and the LVM setup built on top of it?

From what I found in various sources, I concluded I'd better leave it as it is.

Firstly, the performance penalty is negligible. Secondly, LVM allows for more flexible volume management (for example, removing a PV after moving its PEs to free space elsewhere in the VG). Thirdly, snapshots. And finally, portability of LVM volumes between incompatible hardware (though that won't work for me, I'm afraid, because of the underlying hardware RAID1).
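To illustrate the second and third points, here is a sketch of the corresponding LVM commands. The VG name okd comes from the listing above; the snapshot size and the presence of a second PV with free space are assumptions:

```shell
# Snapshot the root LV before risky maintenance, then drop it afterwards
lvcreate --snapshot --size 5G --name root-snap /dev/okd/root
lvremove /dev/okd/root-snap

# Evacuate a PV: move its extents to free space elsewhere in the VG,
# then detach it (assumes the VG has another PV with enough free PEs)
pvmove /dev/cciss/c0d0p5
vgreduce okd /dev/cciss/c0d0p5
```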

Besides, this LVM over RAID approach seems to be a common thing now and nobody seems to have complained about it... :)

Well, anyway, I realized that Linux software RAID is one more area I should learn more about.