Monday, August 18, 2008

A Vista on Desktop Search

Back in May 2006 I wrote an article in my Notes Tone Unturned blog about the promise of better search being offered in Windows Vista, which was then only at the Beta 2 testing stage. The original post is at http://notestoneunturned.blogspot.com/2006_05_01_archive.html and I've decided to reproduce it below for completeness of this blog, dedicated as it is to search.

I reckon that this improved (over Windows XP) search feature has indeed turned out to be one of the best enhancements in Vista: change for the sake of value, rather than the change for the sake of change that quite a few of the other things in Vista have turned out to be.

What do you think, now that Vista has been released for a year and a half?

- - - - - - - - - - - - - - - ORIGINAL POST - - - - - - - - - - - - - - - - - - - -

It's the end of May 2006, and Microsoft have recently released the new Beta 2 test version of the Vista operating system (a.k.a. "Windows NT 6.0"). If you're short of reading matter, here's a link to the Windows Vista Beta 2 Product Guide, available either as a mere 60MB Word document or in Microsoft's new XPS format, viewable in Vista or the free XPS reader (which I couldn't locate).

Good luck to Microsoft, they'll need it since it seems Vista still needs significant clean-up work done to address various issues like the ones mentioned at longhornblogs.com (notebook PC running hot, low battery life, driver problems, large disk space consumption, etc). They have four or five months to clean it all up before they get to the Release To Manufacturing stage late this year (in order for product launch early in 2007). It looks like they'll need every day of beta testing to sort out the myriad driver problems, and others. I'm glad that it's not me!

The Windows NT 4.0 operating system suffered badly from failures (the infamous "Blue Screen Of Death", or BSOD) caused by poorly-written device drivers. With its successor Windows 2000 ("Windows NT 5.0") Microsoft introduced a device driver validation program, and that seemed to markedly improve the reliability. My experience is that Windows XP ("Windows NT 5.1") improved on that again, to the point where a BSOD happens only very occasionally (a few times per year), so well done Microsoft!

At this late stage in the Vista beta cycle, it seems curious if not inexcusable that device manufacturers are not rushing to update their device drivers and ensure that they're all included in the betas. (See Vista Beta 2: The Return of Driver Hell and Driver Hell Avoided... For Now )

For me, with the sorts of things I mostly do, Windows XP works quite well and Office 2003 has far, far more functionality than I need. While I rushed to install Windows XP at the earliest opportunity, I'm not yet sure if there's a convincing argument to do the same with Vista (and the same for Office 2007, even though it has some nice usability improvements). I expect that corporates will be even more reticent. Interesting times ahead for Microsoft to convince the vast masses to upgrade. It will be a hard sell. I wish them well.

If Vista (and Office 2007) prove to be compelling enough to justify the upgrade costs (money, time, effort, learning curve, frustration) then we may all have benefited. Time will tell. From what I've seen and read so far, I'm happy to stick with Windows XP for quite a while because it's very reliable and "good enough" for what I want to do.

Here's a review by the CRN Test Center Windows Vista Beta 2: An Improvement?

And yet another one, by Dr Dobb's Journal: Windows Vista Beta 2: Great Search, Improved Security, Hardware Snags This review has a glowing report on the new built-in search capabilities of Vista, and if it's as good as they say then this alone might convince me to do an early switch from XP to Vista. And here's the reason ...

MY QUEST FOR DESKTOP SEARCH:
Over the past year or so, I've been testing or researching (via demo or reading) a number of "desktop search" products for personal "power user" activities. These have included Copernic Desktop Search, Blinkx (no longer available for download, it seems), X1, Intellext Watson, dtSearch, ISYS, and Verity Ultraseek. I wanted to discover the "best" one that:
  • Is free, or quite inexpensive.
  • Allows its index to be stored other than on the C: drive (to keep the C: drive as small as possible, for backup/recovery purposes). Since I have well over 10GB of files to index (more than 5GB of IBM Redbooks PDF files alone), this means an index size of considerably more than 1GB (perhaps even in excess of 2GB), which is definitely something to keep well clear of the C: drive.
  • Does not crash when indexing files of all types (Copernic and dtSearch failed this test badly, and I was unhappy to waste time debugging their indexer engines)
  • Makes it easy to select the drives/folders/filetypes that are to be indexed (no tiny, inconvenient, fixed-size windows that have to be opened once for each choice).
  • Allows you to set indexing to run at low priority so as not to interfere with other desktop activities.
I was a bit surprised by it myself, but for me the winner out of all these was Windows Desktop Search (MSN version is here, Enterprise version is here) in conjunction with a range of free IFilter add-ins from Citeknet to handle PDF files, ZIP files, etc (see also: More MSN Search Toolbar Search Add-ins ). You could do far worse than this combination. Feel free to contact me if you want to find out more details about my experiences with the various desktop search products that I tested.
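
For anyone wanting to do the same back-of-envelope sum as in the index-location requirement above, here is a minimal Python sketch. It totals the size of the files under a folder and applies an index-to-content ratio. The 10% to 20% range is purely an assumption read off my own figures (well over 10GB of files, an index of somewhere between 1GB and 2GB); it is not a measured property of any of the products named, and the function names are just illustrative.

```python
import os

GB = 1024 ** 3


def total_bytes(root):
    """Sum the sizes of all regular files found under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.path.getsize(path)
            except OSError:
                # Skip files that vanish or cannot be read during the scan.
                continue
    return total


def estimate_index_bytes(content_bytes, low_ratio=0.10, high_ratio=0.20):
    """Return a (low, high) index-size range in bytes.

    The default ratios are assumptions, not vendor figures.
    """
    return content_bytes * low_ratio, content_bytes * high_ratio


if __name__ == "__main__":
    # Assumed example: 10GB of content, as in the requirement above.
    low, high = estimate_index_bytes(10 * GB)
    print(f"Estimated index size: {low / GB:.1f}GB to {high / GB:.1f}GB")
```

To check a real folder, pass the result of `total_bytes(...)` for that folder to `estimate_index_bytes`. The actual ratio will vary a good deal with the mix of file types, so treat the result as a rough guide for choosing which drive should hold the index.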

If you're interested in desktop search, then take a look at this recent Dr. Dobb's Portal article: True Desktop Search - "Finding what you want when you want it is often easier said than done. Luckily the lines between the desktop and the Web are blurring—and the race is on for the best desktop search tool."

Monday, May 19, 2008

Slow progress in this blog?

Folks, I started this blog off well over a month ago with an initial posting, and it would appear that I haven't been doing anything since!

Well, on the surface of it that's true. But I can assure you that I have indeed been working on things in the background. Progress is slow because (a) I have other blogs to work on, such as itWire, and (b) I have hundreds of gigabytes of files on my desktop system. It takes days to re-index them and this alone is slowing down my progress.

One of the products that I've been testing is Copernic Desktop Search, or CDS for short. Late in 2007 they released a corporate edition (they still have a free edition, which is what I've been using for years). As a commercial business -- Asia/Pacific Computer Services -- I decided to be honest, and so purchased a license for the CDS corporate edition which is what I've been testing since late 2007.

Unfortunately, the corporate edition has, in my case at least, so far proved to be no more stable or reliable than the free edition. The problem that I originally encountered, starting several years ago, is that the indexing engine proved incapable of scanning every type of file and would "stall" on certain files: that is, it would come across a certain file (such as a Eudora mailbox file, or some PDF document or other) and just sit there chewing up CPU cycles and not moving on to other files. The one advantage that I've found as a paying subscriber to the corporate edition is that (at last) the Copernic technical support team has been prompt in trying to assist me (for which I thank them).

Earlier in 2008 an update to CDS was released which seems to have resolved the indexer "stalling" issue, and I was able (after about four days of indexing activity) to use the search function knowing that all the underlying files had been indexed -- at least, I presume so!

Then a month or two ago, quite out of the blue, something happened and the entire CDS index got reset to zero indexed documents (when it should have remained well over 300,000 documents). I don't know what it was that caused this catastrophic failure: I suspect that it occurred when coming out of Windows XP standby mode, but am not at all sure.

After having spent at least three or four days at a time rebuilding the index during the last year or two, for similar reasons -- quite a few days in total, I'm sure you will agree -- I got so discouraged/frustrated with CDS that I gave up on it for a couple of months and put some time into studying both X1 Desktop Search and dtSearch Desktop search (more about them in other postings).

Then last week along came a newsletter announcing the release of CDS version 2.3 (build 23). After some procrastination, I installed this new version late last week and started it off building an index from scratch. Am I a glutton for punishment or am I not?

After running for 10-12 hours per day for over three and a half days, the indexer seems to have nearly reached the final stages. What a relief! Will there be another catastrophic index failure? I suppose only time will tell. Whatever happens one way or the other, I'll report it in this blog.

During the indexing process you can put up an Indexing Status panel like the following:


(Screenshot: the Copernic Desktop Search "Index Status" panel)

Compared with all the other desktop search products, this is a well-presented way to follow the progress of indexing. Some people will not be in the least interested in doing so, but others certainly will, especially when a problem document is encountered or the indexing progress takes days and days. A nice feature of this panel is that you can drag the bottom right corner in order to see more file names (vertically) at one time and/or longer file names (horizontally). From what I remember, none of the other products allow you to do this.

What's wrong with this progress display? Several things, the first of which is the fact that the title bar simply says "Index Status" (circled in red) when surely it should say "Copernic Desktop Search - Index Status" to differentiate it from all the other windows that you have open. What a silly omission! And in my case it becomes even sillier, since I use the excellent free WinRoll tool to collapse windows like this when I want to free up valuable screen real estate, like so:
(Screenshot: the "Index Status" window collapsed to its title bar with WinRoll)

Patently, this collapsed window "Index Status" might be referring to other running tasks, not just this one from Copernic.

Then there's the fact that indexed file names annoyingly roll off the top of the panel, and you have to keep sliding the vertical scroll bar downwards (circled in green) to see the names of the most recently indexed files. For some reason that's beyond me, after you have done this for a while it seems to lock into position and you don't have to keep scrolling down any more. I have recommended to Copernic that they alter this behavior so that the most recently indexed files are kept in view at all times. (The dtSearch indexer does this, so why not CDS too?)

A much more important deficiency in the CDS indexer is the fact that you are totally in the dark about how long the indexing job will take. You wonder if the job will take minutes, hours, or days; then you wait, and wait, and wait... As I said, in my case it was over 3 days! In comparison, the dtSearch indexer gives a rough estimate of the hours and minutes it will take, and the X1 indexer indicates the percentage of files indexed so far. Either or both of these should be added to the CDS window, say in the area at bottom circled in brown.
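
To show how little is needed for the kind of progress figures suggested above, here is a minimal Python sketch. It is not how dtSearch or X1 actually work out their numbers (I have no knowledge of their internals); it simply assumes the indexer knows how many files it has processed, how many there are in total (which implies a quick pre-scan to count them), and how long it has been running, and that the remaining files will go at the same average rate. The function names are made up for illustration.

```python
def progress_report(files_done, files_total, elapsed_seconds):
    """Return (percent_complete, estimated_seconds_remaining).

    The estimate assumes the remaining files take the same average
    time per file as those indexed so far. It is None until at least
    one file has been processed.
    """
    if files_total <= 0:
        raise ValueError("files_total must be positive")
    percent = 100.0 * files_done / files_total
    if files_done == 0:
        return percent, None  # no rate to extrapolate from yet
    remaining = elapsed_seconds * (files_total - files_done) / files_done
    return percent, remaining


def format_duration(seconds):
    """Format a number of seconds as hours and minutes, e.g. '72h 00m'."""
    hours, rem = divmod(int(round(seconds)), 3600)
    return f"{hours}h {rem // 60:02d}m"


if __name__ == "__main__":
    # Assumed example: 75,000 of 300,000 files indexed after 24 hours.
    percent, remaining = progress_report(75_000, 300_000, 24 * 3600)
    print(f"{percent:.0f}% complete, about {format_duration(remaining)} remaining")
```

With those assumed example figures the report comes out at 25% complete with about 72 hours to go. The constant-rate assumption is crude, since one large PDF can take far longer than a thousand small text files, but even a rough figure like this beats being left totally in the dark.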

Finally, when you have the window open it would be extremely useful to have a Pause/Resume toggle button, for example in the area at bottom circled in blue. As it stands, you have to shift focus and right-click on the CDS icon in the system tray then select the "Pause Indexing" option, which is not very convenient.

Anyhow, it seems that the CDS indexer has finished its main run, and it's now showing a status of "Idle" -- until a file is added, deleted, renamed, etc, whereupon the indexer will update the index to take account of this within about 15 seconds, because I have configured the extremely nice On the fly indexing feature. (The others force you to run batch indexing tasks, which is not anything like as convenient as Copernic's on the fly indexing.)
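
For the curious, the general idea behind keeping an index up to date as files come and go can be sketched in a few lines of Python. I must stress that this is not Copernic's mechanism, which I know nothing about, and a real indexer would more likely rely on change notifications from the operating system. The sketch below is the simplest stand-in I can think of: take a snapshot of the folder tree every so often and compare it with the previous one. All the names are hypothetical, and the 15-second interval merely echoes the delay mentioned above.

```python
import os
import time


def snapshot(root):
    """Map each file path under root to its (modified time, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            state[path] = (info.st_mtime, info.st_size)
    return state


def diff_snapshots(old, new):
    """Return sorted lists of (added, deleted, modified) paths.

    A renamed file shows up as one deletion plus one addition.
    """
    added = sorted(set(new) - set(old))
    deleted = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, deleted, modified


def watch(root, interval_seconds=15, on_change=print):
    """Poll root forever, passing any detected changes to on_change."""
    previous = snapshot(root)
    while True:
        time.sleep(interval_seconds)
        current = snapshot(root)
        changes = diff_snapshots(previous, current)
        if any(changes):
            on_change(changes)  # an indexer would update its index here
        previous = current
```

Calling `watch` on a folder prints the added, deleted and modified paths each time something changes. Walking hundreds of gigabytes of files every 15 seconds would be far too heavy in practice, which is one reason to expect a real product to do something smarter than polling.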

Thursday, April 3, 2008

Concerning Desktop Search ... a Preamble

This is a new blog, started in April 2008, in which I hope to informally discuss my experiences with desktop search products.

While I have extensive experience with search products on various enterprise platforms going back as far as the 1970s, I'll be restricting this blog to those desktop products available for the Microsoft Windows platforms.

I'm still using Windows XP Professional, although I've had Vista Business available to me for a year or more, and why I haven't switched over from XP is sufficient for a series of articles in itself! The search built into Vista is rather nice, indeed it's one of the few features that would encourage me to upgrade from XP sooner rather than later.

Over the past five years or so, I have investigated a range of desktop products: those built into Windows itself (a.k.a. Windows Desktop Search), Copernic Desktop Search (since its earliest releases), Blinkx Desktop Search (now, only an online search is available from Blinkx), Exalead, X1 Desktop Search, dtSearch, IBM OmniFind Yahoo! Edition, Ultraseek, Intellext, and others.

It has cost me tons of effort to try out these products, not the least of which is the hours and days spent building search indexes as I've moved from one product to another -- not to forget the painful rebuilding of indexes when some of them failed for one reason or another. I now have many tens of gigabytes of files, and building a new index takes days to finish (even with my quite fast dual-core system, which has no lack of main storage, and with the index being placed on a dedicated disk drive to minimize disk arm contention).

Some of these products are free, some come in both free and retail versions (the latter usually having more features), while some of them have been withdrawn.

I have four other blogs to populate, a website to maintain, and all sorts of other distractions, so can only afford the time to make relatively informal posts in this blog: not extensively detailed comparisons of the various products. Yet I intend that what I do have time to report will be accurate and useful enough that it will assist readers who are seeking a desktop search solution.