Friday, December 25, 2009
Google Chromium
Tuesday, December 8, 2009
Problems with Lotus Notes 8.5 under Ubuntu Karmic Koala
The problem is deep inside Notes (and Ubuntu). Go to the "Ubuntu 9.10 Lotus Notes 8.5" page, grab the TGZ archive with four shared libraries, copy the libraries to /opt/ibm/lotus/notes, then start Notes and do something useless there.
Sunday, November 29, 2009
Desktop search engines compared
Intro
I have a large electronic library (over 15,000 books) and I was looking for a way to cope with this mass of information. I didn't like the idea of a special catalog, since it would take a lot of manual work to enter the metadata. Besides, my books are in various formats, from HTML to RTF, to DOC, to PDF, to DjVu. These files lack metadata way too often, so I thought a local indexing service with full-text search might solve my problems. I knew there were more options to choose from than just Google, but I could not find a good, up-to-date comparison, so I had to compare them myself. Even the table in Wikinfo's Comparison of desktop search software contained too many errors, as I found.
My task made some requirements essential and others irrelevant. I was especially interested in a wide range of supported file types, in the ability to add new ones (EPUB, fb2, html.zip) and in an extensive query language. All software, except for GDS and DocFetcher, was installed from the Ubuntu 9.10 repositories.
I have no special preferences regarding the backend: it may be a Xapian- or Lucene-based tool, or even a custom backend. On the other hand, Xapian usually requires more disk space, and there is never too much space on desktops.
Beagle
http://beagle-project.org
The list of supported file types is quite large. It includes typical office files, source code, LaTeX source, images, audio and video files, RPM and DEB packages, e-mail from Evolution, Thunderbird and KMail, IM and IRC logs, RSS feeds and many more (see here: http://beagle-project.org/Supported_Filetypes), and you are free to extend it. I could add new file types by editing one file, /etc/beagle/external-filters.xml.
The indexing process can run in two modes: CPU-lenient and CPU-intensive (the latter is enabled with the BEAGLE_EXERCISE_THE_DOG environment variable). The search engine is based on Lucene.Net. I have no idea why the developers chose this exotic platform to implement Beagle, but Beagle works, and it works well.
Beagle understands limited (very limited, actually) regexps: only the * wildcard. You can search for phrases, exclude words (-word), use the OR operator, specify when the file was created (on, before, after and between!), limit the search to a file type and restrict it to a single directory. Unfortunately, you cannot restrict it to a whole directory tree, that is, name a directory under which Beagle should search recursively.
You can even use the metadata of audio and image files, as in the examples from the manual:
artist:Beatles ext:mp3 OR ext:ogg -album:"Abbey Road"
You can also search in mail attachments, or search by music genre, mailing list, IM correspondent and much, MUCH more.
Beagle tends to create huge log files in ~/.beagle/Logs.
Beagle has a web interface. It's very easy to start using it, but not so easy to make use of it, since the alleged links to the results are not exactly links.
The Beagle web site includes information on the query syntax and on extending Beagle, but finding that information is next to impossible unless you use Google.
The query syntax is described here: http://beagle-project.org/Searching_Data
The index for a 45 GB home partition was only about 700 MB.
Google Desktop Search
http://desktop.google.com/linux/
It supports OpenOffice and MS Office files, PDF, HTML, TXT, audio and image files and email from Thunderbird. Strangely enough, it does not index zipped archives.
I could not add new file types, not even plain text with a different extension. I had been pretty sure that GDS supported stemming, but not regexps. To my surprise, stemming did not work in GDS, and neither did regexps. It does not even support the AND and OR keywords.
Otherwise the query syntax is acceptable. You can point at the directory where the file you are looking for is located, or the directory, under which the file is supposed to be. You can search for phrases or exclude words.
I used GDS for some years and it works great as long as you use it the way Google intended. While suitable for an average office cubicle, it was next to useless for my purposes.
The index size was about 1.7 GB for 50 GB of data.
Recoll
http://www.lesbonscomptes.com/recoll/
A large number of file types is supported natively, including plain text, HTML, maildir and mailbox files, OpenOffice, MS Office 2007, AbiWord, LyX, KWord and Scribus files and GAIM logs. Many more are supported with external helpers: DOC, XLS, PDF, DjVu, MP3, image files and so on. Feel free to add to the list, it's easy: one file associates filename extensions with MIME types, another one specifies how the data is extracted from a file of a certain MIME type, and the third one defines the applications used to open each MIME type.
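The three-file scheme can be pictured as a chain of lookups. The sketch below is only an illustration in Python, not Recoll's code or its configuration syntax: the table contents and the names EXT_TO_MIME, MIME_TO_EXTRACTOR, MIME_TO_VIEWER and describe() are made up for this example.

```python
import os

# Hypothetical tables mirroring the three configuration files described above.
# The entries are illustrative only, not copied from a real Recoll setup.
EXT_TO_MIME = {
    ".pdf": "application/pdf",
    ".djvu": "image/vnd.djvu",
    ".epub": "application/epub+zip",
}
MIME_TO_EXTRACTOR = {
    "application/pdf": "pdftotext",
    "image/vnd.djvu": "djvutxt",
}
MIME_TO_VIEWER = {
    "application/pdf": "evince",
    "image/vnd.djvu": "djview",
}


def describe(path):
    """Return (MIME type, extractor, viewer) for a file name; None marks a missing link."""
    ext = os.path.splitext(path)[1].lower()
    mime = EXT_TO_MIME.get(ext)
    return mime, MIME_TO_EXTRACTOR.get(mime), MIME_TO_VIEWER.get(mime)
```

In this toy setup an EPUB file gets a MIME type but no extractor, which is the kind of gap that one more entry in the second table would close.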
Recoll is built around the Xapian engine.
I had the impression that indexing takes much longer with Recoll than with the other tools. When indexing RTF with unrtf, Recoll created a heap of WMF files in my home directory. Recoll has no indexing daemon that would run in the background all the time. Instead, recollindex has to be launched from time to time (with cron, for example).
The manual mentions stemming support, but also points out that it is done the other way round. The stems are not stored in the database, as in other indexing engines; the query is stemmed instead. Unfortunately, my version gave different results when searching for the plural 'notebooks' and the singular 'notebook', so I assume stemming does not work in my installation of Recoll. Recoll understands regexps pretty well, which to a certain degree compensates for the problems with stemming.
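To make the "other way round" idea concrete, here is a toy Python sketch of query-time stemming. It is not Recoll's or Xapian's implementation: the stemmer just strips a trailing "s", and the names naive_stem, expand_query and search are invented for the illustration. The index keeps the words exactly as written, and the query term is expanded to every indexed word that shares its stem.

```python
def naive_stem(word):
    """Toy stemmer: lower-case and strip a trailing 's'. Real stemmers are far smarter."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def expand_query(term, index_terms):
    """Query-time stemming: every indexed term that shares the query term's stem."""
    stem = naive_stem(term)
    return sorted(t for t in index_terms if naive_stem(t) == stem)


# The index stores the words unstemmed, exactly as they appear in the documents.
index = {"notebook": {"a.txt"}, "notebooks": {"b.txt"}, "note": {"c.txt"}}


def search(term):
    """Union of the documents matching any expansion of the term."""
    hits = set()
    for expanded in expand_query(term, index):
        hits |= index[expanded]
    return hits
```

When this works as intended, search("notebook") and search("notebooks") return the same set of files, which is exactly what my installation failed to do.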
The query language is rich and modeled after the Xesam End User Search Language (see here). As with Beagle, you can use the dir: prefix to limit the search path to one directory, but you cannot specify a directory tree. Alas! Other useful prefixes include title, author, ext (for file type), etc.
The search client, recoll, is a GUI program, but with the -t option it runs in text mode. It means that instead of specifying a directory tree, I can just grep the results for a string, like this:
recoll -t -q \"jack london\"|grep /library/fiction/adventure
Note that for the command line client, you have to escape quotation marks to denote a phrase search.
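The grep trick matches the string anywhere in the line, including the title. A slightly stricter filter can be sketched in Python. It assumes that each result line printed by recoll -t contains the document location as [file:///path/...]; that format is an assumption taken from my own output, and under_tree is a name made up here.

```python
import re

# Assumed shape of a result line: "<mime type> [file:///path/to/file] [title] ..."
URL_RE = re.compile(r"\[file://([^\]]+)\]")


def under_tree(lines, root):
    """Return the paths from result lines that lie under the directory tree `root`."""
    root = root.rstrip("/") + "/"
    paths = []
    for line in lines:
        match = URL_RE.search(line)
        if match and match.group(1).startswith(root):
            paths.append(match.group(1))
    return paths
```

The lines can come from the output of the recoll -t command shown above, split at line breaks. Lines without a file URL, such as the result count, are skipped.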
Recoll, unlike some other tools, has a decent user manual, containing information on query syntax and adding support for new file types.
The index size put a damper on my enthusiasm. For a 50 GB home directory it was more than 5 GB.
Strigi
http://strigi.sourceforge.net/
Strigi supports regular expressions. Theoretically, Strigi should support plain text files, PDF, DEB and RPM packages, OpenOffice documents and zipped files. Besides, Strigi was the only program that successfully indexed EPUB files without customization, interpreting them as just plain ZIP archives with HTML, NCX, etc. inside.
There's little I can say about this program. The daemon kept crashing when I tested it so I could not even finish building the index for my home directory. The client erroneously classified a lot of hits as being "email".
The incomplete (?) index size was about 750 MB.
Tracker
http://projects.gnome.org/tracker
Tracker is part of the GNOME project and it tries to adhere to various useless technologies, like D-Bus. Tracker introduces the concept of file tags, thus overcomplicating the task of file management. I admit that the notion of file tags might be reasonable, but only if it is supported universally, if tags are freely backed up, copied, etc. Fortunately, tags are not obligatory in Tracker.
The full list of supported file types is unavailable, but the web site talks about image, audio, video and text files, source code, applications, playlists, IM conversations and so on. No email, bookmarks or contacts as yet, though. The indexing daemon would segfault occasionally and I could not finish indexing.
As a matter of fact, Tracker was designed as a metadata search tool (its full name is MetaTracker), but the normal use case is just full-text search. Tracker was written to work well even on machines with 128 or 256 MB of RAM. Judging by the slowness of indexing, this statement could be true. I was wrong about Recoll: the slowest indexer was not Recoll but Tracker.
I could not find a good user manual.
DocFetcher
http://docfetcher.sourceforge.net/en/index.html
Supported file types: HTML, plain text, PDF, Microsoft Office (doc, xls, ppt), Microsoft Office 2007 (docx, xlsx, pptx), OpenOffice.org Writer, Calc, Draw and Impress, RTF, AbiWord (abw, abw.gz, zabw), CHM, Visio, SVG.
Written in Java. Fast and CPU-sparing indexing. DocFetcher comes in two flavors: a binary installable package and a "portable" version, which you can run right from your home directory.
DocFetcher supports regular expressions (at least * and ?). Phrase search, AND and OR keywords, search in content or in metadata: author and title fields are supported. Does not index zipped files. It is easy to add new filename extensions that are treated as yet another text file or HTML, but I could not add a new file type which is to be treated in a special way. For me this means that I cannot process custom XML to convert the content to the proper charset. It's a problem.
An interesting query feature is boosting terms: "You can assign custom weights to words, thus increasing or decreasing the level of matching for a particular document if the weighted word occurs in it. This allows you to influence the relevance sorting of the result page. Example: dog^4 cat will bring up the documents with "dog" in it on the top of the result page."
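The effect of a boost is easy to see on a toy model. The Python sketch below is not DocFetcher's or Lucene's real scoring formula; it simply multiplies the number of occurrences of each term by its weight, and the names parse_query, score and rank are invented for the example.

```python
def parse_query(query):
    """Split 'dog^4 cat' into {'dog': 4.0, 'cat': 1.0}; unboosted terms weigh 1."""
    weights = {}
    for token in query.split():
        term, _, boost = token.partition("^")
        weights[term.lower()] = float(boost) if boost else 1.0
    return weights


def score(text, weights):
    """Toy relevance: sum over terms of (occurrences of the term * its weight)."""
    words = text.lower().split()
    return sum(words.count(term) * weight for term, weight in weights.items())


def rank(docs, query):
    """Return the document names, best match first."""
    weights = parse_query(query)
    return sorted(docs, key=lambda name: score(docs[name], weights), reverse=True)
```

With one document that says "cat" three times and another that says "dog" once, the plain query dog cat ranks the cat document first, while dog^4 cat moves the dog document to the top.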
The manual can be found in the downloaded archive, but it is very brief.
Pinot
http://pinot.berlios.de/
Like Tracker and Strigi, Pinot is built for D-Bus. Its indexer uses the same Xapian engine as Recoll, so I could use Pinot's text-mode client to query the database built by the Recoll indexer. Pinot can use other databases, but I was not interested in this option. The crawler takes a huge share of RAM and CPU. It ate up 70% of the RAM on my PC, causing some other programs to crash, so I had to leave it running overnight to complete the indexing.
The documentation consists of one Readme file and a couple of web-pages. Quoting these web-pages, "The following document types are supported internally :
- plain text
- HTML
- XML
- mbox, including attachments and embedded documents
- MP3, Ogg Vorbis, FLAC
- JPEG
- common archive formats (tar, Z, gz, bzip2, deb)
- ISO 9660 images
- PDF (pdftotext required)
- RTF (unrtf required)
- OpenDocument/StarOffice files (unzip required)
- MS Word (antiword required)
- PowerPoint (catppt required)
- Excel (xls2csv required)
- DVI (catdvi required)
- DjVu (djvutext required)
- RPM (rpm required)"
Indeed, new file types are defined in the file external-filters.xml, which is very similar (but not identical, the Pinot developers warn) to the file of the same name used by Beagle.
I have to say that these external programs made indexing PDF, RTF and other files a slow and difficult task. Indexing a single PDF document took up to two minutes.
Conclusion
Recoll and Pinot may be considered good alternatives to Beagle, but the size of the Xapian index database leaves just one choice for me, Beagle.
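The sizes quoted above are easier to compare as a share of the indexed data. A small Python calculation, using only the figures from this post and treating 700 MB as 0.7 GB; Strigi is left out because its index was incomplete, and Recoll's 5 GB is a lower bound:

```python
# (index size in GB, indexed data in GB), as reported in the sections above.
sizes = {
    "Beagle": (0.7, 45.0),
    "Google Desktop Search": (1.7, 50.0),
    "Recoll": (5.0, 50.0),  # "more than 5 GB", so this is a lower bound
}


def overhead_percent(index_gb, data_gb):
    """Index size as a percentage of the indexed data."""
    return 100.0 * index_gb / data_gb


ratios = {name: round(overhead_percent(*pair), 1) for name, pair in sizes.items()}
for name, pct in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {pct}%")
```

That gives about 1.6% for Beagle, 3.4% for GDS and at least 10% for Recoll.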
Tuesday, November 24, 2009
The account has been restricted. Please, contact your system administrator
So, the boss tries to log on, gets the message "The account has been restricted. Please, contact your system administrator", contacts me, and what am I supposed to do? I can log on as the local administrator, I can check the registry, launch McAfee, but to no avail. Then I find a solution offered by Dizzy, decide it's nonsense and try to fix the problem by playing with permissions, domain accounts and so on. Then, finally, I remove the computer from the domain, join it back to the domain, and it works! Why? What happened? No idea.
And then this wishy-washy rascal, who happens to be my boss, insists that I grant him administrator privileges on his notebook? Ha-ha!
Tuesday, October 13, 2009
Readability+Opera+StumpWM
Last time I wrote about using the Readability bookmarklet in Opera running under StumpWM without a mouse. The straightforward solution had a lot of deficiencies. One of the worst was that it didn't work in full-screen mode. Here's a much better way. Opera has a number of command-line switches (listed here). One of them, openURL, allows us to open a URL in the current window. So, if we put the bookmarklet code into the URL field, it will do the trick:
(defcommand readability () ()
  (run-shell-command "opera -remote \"openURL(javascript:
(function(){readStyle='style-novel'; readSize='size-medium'; readMargin='margin-medium';
_readability_script=document.createElement('SCRIPT');
_readability_script.type='text/javascript';
_readability_script.src='http://lab.arc90.com/experiments/readability/js/readability.js?x='+(Math.random());
document.getElementsByTagName('head')[0].appendChild(_readability_script);
_readability_css=document.createElement('LINK');
_readability_css.rel='stylesheet';
_readability_css.href='http://lab.arc90.com/experiments/readability/css/readability.css';
_readability_css.type='text/css';
_readability_css.media='screen';
document.getElementsByTagName('head')[0].appendChild(_readability_css);
_readability_print_css=document.createElement('LINK');
_readability_print_css.rel='stylesheet';
_readability_print_css.href='http://lab.arc90.com/experiments/readability/css/readability-print.css';
_readability_print_css.media='print';
_readability_print_css.type='text/css';
document.getElementsByTagName('head')[0].appendChild(_readability_print_css);
})();)\""))
Now, this Stump command will work even in full screen mode. It will even work when the Opera window does not have the input focus.
Of course, the code for any bookmarklet may be supplied instead of Readability.
NB: Just like before, the lines containing the bookmarklet code should be merged in one line before cutting and pasting the above snippet!
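The merging step from the note above can be done mechanically. Here is a minimal Python sketch; merge_lines is a name made up here, and it assumes that any line breaks fall between JavaScript statements, so that replacing each break with a single space does not change the code.

```python
def merge_lines(snippet):
    """Join a multi-line snippet into one line, dropping indentation and blank lines."""
    parts = [line.strip() for line in snippet.splitlines()]
    return " ".join(part for part in parts if part)
```

A break that falls inside a quoted JavaScript string would gain an extra space this way, so check the result before pasting it into .stumpwmrc.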
Monday, October 5, 2009
Readability bookmarklet, browsers and StumpWM
Recently, I was looking for a way to get rid of web design while surfing the Net. Some of the most obvious ways, like banning custom fonts in Firefox, work nicely, while banning custom colors makes reading Gmail and Google Reader extremely unpleasant.
In Firefox I use the TidyRead extension regularly. In the rare cases when TidyRead fails to do its job the way I expect, I use the Readability bookmarklet. Readability, though, has the huge deficiency of being a bookmarklet. I mean, to use it you have to grab the mouse and click it. Of course, one could use the keyboard to select the Bookmarks menu and press the arrow keys twenty-four times to get to the bookmarklet, but this is not what I would like to see. Especially because I love the StumpWM window manager, which is known for its disregard of mouse-like pointers. I needed a way to call the bookmarklet from the keyboard.
Here's what I came up with. Below is a piece of my .stumpwmrc file:
(defcommand readability () ()
  (send-meta-key (current-screen) (kbd "C-l"))
  (window-send-string "javascript:(function(){readStyle='style-novel';readSize='size-medium'; readMargin='margin-medium';
_readability_script=document.createElement('SCRIPT');
_readability_script.type='text/javascript';
_readability_script.src='http://lab.arc90.com/experiments/readability/js/readability.js?x='+(Math.random());
document.getElementsByTagName('head')[0].appendChild(_readability_script);
_readability_css=document.createElement('LINK');
_readability_css.rel='stylesheet';
_readability_css.href='http://lab.arc90.com/experiments/readability/css/readability.css';
_readability_css.type='text/css';
_readability_css.media='screen';
document.getElementsByTagName('head')[0].appendChild(_readability_css);
_readability_print_css=document.createElement('LINK');
_readability_print_css.rel='stylesheet';
_readability_print_css.href='http://lab.arc90.com/experiments/readability/css/readability-print.css';
_readability_print_css.media='print';
_readability_print_css.type='text/css';
document.getElementsByTagName('head')[0].appendChild(_readability_print_css);
})(); ")
  (send-meta-key (current-screen) (kbd "RET")))

(define-key *top-map* (kbd "s-p") "readability")
NB! I had to split the long line into a number of shorter ones. Merge them together to make the command work!
The first lines define a new command which behaves just like you would: it "presses" C-l to move the focus to the address bar of the browser, types in the full text of the bookmarklet and presses Return. That's it! The last line maps the new command to the keyboard shortcut super-p.
Monday, September 28, 2009
Group Policies Gone
Today, a number of my LAN users lost their O: and P: drives, which were mapped to the shares on my old Windows 2000 Server. The drives are mapped in a logon script, defined in a group policy object. When I checked the GPOs, they were not loaded.
Next, I checked the event log on the domain controller and there were numerous Event ID 1000 messages. Some said that "Windows cannot query for the list of Group Policy objects", while others insisted that "Windows cannot access the file gpt.ini for GPO The file must be present at the location <>. (). Group Policy processing aborted."
What I did next was to open the GPO editor and check the GPOs, of course. They were there, all four of them, including the default domain policy, but I found I couldn't edit them. All I got was a dialog box saying "Failed to open the Group Policy Object. You may not have appropriate rights."
The GPO permissions were OK. The next step was running gpotool. To my surprise, the only response I got from the tool was "DC list is empty". I had the impression that AD was down, but it wasn't. From what I could find with Google, DNS could be the point of failure, but DNS was working well, like everything else. Another tool I tried at that moment was netdiag, but it found nothing wrong: things were up and running.
I checked whether SYSVOL was accessible from the workstations and it was. The permissions on directories and files in SYSVOL were OK, but some files were missing: the domain\Policies directory was completely empty. I tried to create a new policy in the GPO editor and the corresponding directory appeared in sysvol\Policies.
So, I made sure I could recreate the GPOs from scratch, but I didn't have the Default Domain Policy. I found a tool to recreate it, Windows 2000 Default Group Policy Restore Tool. I didn't run it, though. Instead, I decided to compare the contents of my SYSVOL and that in the backed up system. Of course, I found the old policies in the domain\Policies and simply copied them into the corresponding directory. The immediate result was that gpotool could run and produce some meaningful results. So, it complained about a missing policy, but the old ones were there. Altogether, gpotool found seven policies instead of five actually present (the default one, three GPOs I had defined and one more which was created five minutes ago), but it was more or less OK.
So, now I have the default policy and I can use the GPO editor to recreate my old policies. There were only a few of them and it shouldn't take long.
Thursday, September 10, 2009
Keyboard navigation addons for Firefox
As I wrote before, Conkeror doesn't support many useful Firefox extensions. I thought I should go back to Firefox and check a number of Firefox addons for surfing with the keyboard: mozless, Hit-A-Hint, NumberFox, nomouse. Much to my surprise, none of them worked with Firefox 3; all of them were outdated. The only one that worked was Mouseless Browsing, but it lacks some features. To begin with, the link numbering cannot be turned off.
So, if four extensions out of five are abandoned, does it mean that the mouseless browsing in Firefox turned out to be too uncomfortable or even completely impossible?
Tuesday, September 8, 2009
Conkeror once more
I've learned to move focus away from the input fields and I found a replacement for TidyRead. See the new configuration file below (unfocus and readability).
// new webjumps:
define_webjump("gread", "http://google.com/reader");
define_webjump("gmail", "https://mail.google.com");
define_webjump("youtube", "http://www.youtube.com/results?search_query=%s&search=Search");
define_webjump("del", "http://delicious.com/search?p=%s&lc=0&atags=&rtags=&context=userposts%7cminaev%7c");

// show history in the url bar:
url_completion_use_bookmarks = false;
url_completion_use_history = true;

// my login at delicious:
add_delicious_webjumps ("minaev");

// install addons from anywhere:
session_pref("xpinstall.whitelist.required", false);

// use tabs:
user_pref("conkeror.load.tab-bar", 1);
require("tab-bar.js");

// auto-save and auto-load session:
require("session.js");
//session_auto_save_file = "/home/minaev/.conkeror.session";
session_auto_save_auto_load = true;
session_auto_save_auto_load_fn = session_auto_save_load_window_current;

//Open Middle-Clicked Links in New Buffers:
require("clicks-in-new-buffer.js");
clicks_in_new_buffer_target = OPEN_NEW_BUFFER;

// Borrowed from David Kettler (http://www.mozdev.org/pipermail/conkeror/2008-September/001129.html)
interactive("toggle-stylesheets",
    "Toggle whether conkeror uses style sheets (CSS) for the " +
    "current buffer. It is sometimes useful to turn off style " +
    "sheets when the web site makes obnoxious choices.",
    function(I) {
        var s = I.buffer.document.styleSheets;
        for (var i = 0; i < s.length; i++)
            s[i].disabled = !s[i].disabled;
    });

// http://lab.arc90.com/experiments/readability/
interactive("readability_arc90",
    "Readability is a simple tool that makes reading on the web more enjoyable by removing the clutter around what you are reading",
    function readability_arc90(I) {
        var document = I.window.buffers.current.document;

        // The three settings are injected as script text, so the values
        // have to be quoted string literals inside that text.
        _readability_readStyle=document.createElement('SCRIPT');
        _readability_readStyle.text = 'var readStyle = "style-novel";';
        document.getElementsByTagName('head')[0].appendChild(_readability_readStyle);

        _readability_readSize=document.createElement('SCRIPT');
        _readability_readSize.text = 'var readSize = "size-large";';
        document.getElementsByTagName('head')[0].appendChild(_readability_readSize);

        _readability_readMargin=document.createElement('SCRIPT');
        _readability_readMargin.text = 'var readMargin = "margin-narrow";';
        document.getElementsByTagName('head')[0].appendChild(_readability_readMargin);

        // load the Readability script itself
        _readability_script=document.createElement('SCRIPT');
        _readability_script.type='text/javascript';
        _readability_script.src='http://lab.arc90.com/experiments/readability/js/readability.js?x='+(Math.random());
        document.getElementsByTagName('head')[0].appendChild(_readability_script);

        // screen stylesheet
        _readability_css=document.createElement('LINK');
        _readability_css.rel='stylesheet';
        _readability_css.href='http://lab.arc90.com/experiments/readability/css/readability.css';
        _readability_css.type='text/css';
        _readability_css.media='screen';
        document.getElementsByTagName('head')[0].appendChild(_readability_css);

        // print stylesheet
        _readability_print_css=document.createElement('LINK');
        _readability_print_css.rel='stylesheet';
        _readability_print_css.href='http://lab.arc90.com/experiments/readability/css/readability-print.css';
        _readability_print_css.media='print';
        _readability_print_css.type='text/css';
        document.getElementsByTagName('head')[0].appendChild(_readability_print_css);
    });

//interactive("new-tab",
//            "Open new tab",
//            alternates(follow_new_buffer, follow_new_window);
//            $browser_object="http://google.com");

define_key(content_buffer_normal_keymap, "M-q", "unfocus");
define_key(content_buffer_normal_keymap, "M-s", "toggle-stylesheets");
define_key(content_buffer_normal_keymap, "C-x C-s", "session-save");
define_key(content_buffer_normal_keymap, "F", "follow-new-buffer-background");
define_key(content_buffer_normal_keymap, "C-f", "follow-new-buffer");
define_key(content_buffer_normal_keymap, "d", "follow-new-window");
define_key(content_buffer_normal_keymap, "z", "readability_arc90");

// some Firefox keybindings:
define_key(content_buffer_normal_keymap, "C-page_up", "buffer-previous");
define_key(content_buffer_normal_keymap, "C-page_down", "buffer-next");
//define_key(content_buffer_normal_keymap, "C-t", "new-tab");
define_key(content_buffer_normal_keymap, "C-w", "kill-current-buffer");
Monday, September 7, 2009
What is wrong with Conkeror?
- You have to change focus from input fields to something else if you want to run a command from the keyboard.
- I couldn't find a way to do what I used to do with the right click in Firefox, like copy the url, save image, etc.
- Too many extensions do not work in Conkeror. Most of all I miss Google Notebook, TidyRead and Flashblock.
Conkeror: Emacs-like browser
While a long-time user of Firefox, I'm always ready to have a look at the alternatives. Firefox works well, but it's too heavy. Some time ago I found a new Firefox extension called Conkeror, which added Emacs-like keybindings to Firefox. It was too unusual for me at that time and I ditched it. Since then I have switched to tiling window managers (first Ion, now StumpWM) and I thought that Conkeror might fit my new style of work.
It turns out that the new Conkeror is not an extension anymore, but a standalone browser packed with features. It's fully customizable (in JavaScript), extensible (in Python, AFAIU) and, of course, keybound all the way through.
Below is the initialization file I compiled on the very first day with Conkeror:
// shortcuts for Google Reader and Gmail:
define_webjump("gread", "http://google.com/reader");
define_webjump("gmail", "https://mail.google.com");

// keybindings to open links in new buffer and new window
define_key(content_buffer_normal_keymap, "F", "follow-new-buffer-background");
define_key(content_buffer_normal_keymap, "C-f", "follow-new-buffer");
define_key(content_buffer_normal_keymap, "d", "follow-new-window");

// show history items when typing URL:
url_completion_use_bookmarks = false;
url_completion_use_history = true;

// use my delicious:
add_delicious_webjumps ("minaev");

// allow installation of addons from any URL:
session_pref("xpinstall.whitelist.required", false);

// show Firefox-like tabs:
user_pref("conkeror.load.tab-bar", 1);
require("tab-bar.js");
There's a lot of things to borrow from other RC files, but the first steps were made.
I started by installing Conkeror packaged with Ubuntu 9.04, version 080629. This version could not find the location of the Conkeror init file till I explicitly pointed at the file in .conkeror.mozdev.org/conkeror/<profile>/prefs.js, where I added a new line:
user_pref('conkeror.rcfile', '/home/minaev/.conkerorrc');
The Ubuntu version did not support tabs, so I had to grab a Debian package from the Conkeror web site, version 090624.