<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Baydin Blog &#124; Email, Startups, and Search &#187; Technical</title>
	<atom:link href="http://baydin.com/blog/category/technical/feed/" rel="self" type="application/rss+xml" />
	<link>http://baydin.com/blog</link>
	<description>Baydin takes the work out of email.</description>
	<lastBuildDate>Thu, 24 Jun 2010 16:22:07 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>5 Sikuli Pitfalls (and how to avoid them!)</title>
		<link>http://baydin.com/blog/2010/06/5-sikuli-pitfalls-and-how-to-avoid-them/</link>
		<comments>http://baydin.com/blog/2010/06/5-sikuli-pitfalls-and-how-to-avoid-them/#comments</comments>
		<pubDate>Thu, 24 Jun 2010 16:21:11 +0000</pubDate>
		<dc:creator>Mario Maldonado</dc:creator>
				<category><![CDATA[Technical]]></category>
		<category><![CDATA[machine vision]]></category>
		<category><![CDATA[MIT]]></category>
		<category><![CDATA[script]]></category>
		<category><![CDATA[sikuli]]></category>

		<guid isPermaLink="false">http://baydin.com/blog/?p=207</guid>
		<description><![CDATA[Introduction
Project Sikuli, a machine-vision research project from MIT, allows users to write scripts that automate UI-driven tasks. This tool, while very powerful and simple, can cause a number of headaches if you&#8217;re not careful. In this post, I&#8217;ll talk about some issues you might encounter, as well as how to avoid them. If you&#8217;ve never [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p><a href="http://groups.csail.mit.edu/uid/sikuli/" onclick="pageTracker._trackPageview('/outgoing/groups.csail.mit.edu/uid/sikuli/?referer=');">Project Sikuli</a>, a machine-vision research project from MIT, allows users to write scripts that automate UI-driven tasks. This tool, while very powerful and simple, can cause a number of headaches if you&#8217;re not careful. In this post, I&#8217;ll talk about some issues you might encounter, as well as how to avoid them. If you&#8217;ve never used Sikuli before, you can see demos <a href="http://groups.csail.mit.edu/uid/sikuli/demo.shtml" onclick="pageTracker._trackPageview('/outgoing/groups.csail.mit.edu/uid/sikuli/demo.shtml?referer=');">here</a>, or download the IDE <a href="http://groups.csail.mit.edu/uid/sikuli/download.shtml" onclick="pageTracker._trackPageview('/outgoing/groups.csail.mit.edu/uid/sikuli/download.shtml?referer=');">here</a>.</p>
<h3>5. Not <a href="http://sikuli.org/trac/wiki/reference-0.10#wait" onclick="pageTracker._trackPageview('/outgoing/sikuli.org/trac/wiki/reference-0.10_wait?referer=');">wait()</a>ing</h3>
<p>If you&#8217;re a long-time user of a particular program or operating system, you can probably describe many common tasks from memory. For example, I might describe how to access the &#8220;Uninstall or change a program&#8221; dialog in Windows 7 as follows:</p>
<ol>
<li>Click on the start menu</li>
<li>Click on &#8220;Control Panel&#8221;</li>
<li>Click on &#8220;Uninstall a Program&#8221; under the &#8220;Programs&#8221; heading</li>
</ol>
<p>If you&#8217;re new to Sikuli, you might be tempted to script this simple process like this:</p>
<p><a href="http://baydin.com/blog/wp-content/uploads/2010/06/UninstallWithoutWaits.png"><img class="alignnone size-full wp-image-208" src="http://baydin.com/blog/wp-content/uploads/2010/06/UninstallWithoutWaits.png" alt="How not to do it" width="230" height="128" /></a></p>
<p>You could then run your script and not notice a single issue for a long time. Then, one day, your computer is busy running multiple background tasks and you run your script again. The start menu takes a few seconds to display its contents, and your script raises an exception. The problem is that Sikuli doesn&#8217;t automatically wait for an image to be visible on-screen before trying to <a href="http://sikuli.org/trac/wiki/reference-0.10#click" onclick="pageTracker._trackPageview('/outgoing/sikuli.org/trac/wiki/reference-0.10_click?referer=');">click()</a> it, so if it doesn&#8217;t see &#8220;Control Panel&#8221; on your screen very soon after clicking on the start menu, it will raise an exception. In order to force it to behave properly, you&#8217;ll need to insert wait() statements:</p>
<p><a href="http://baydin.com/blog/wp-content/uploads/2010/06/UninstallWithWaits.png"><img class="alignnone size-full wp-image-209" src="http://baydin.com/blog/wp-content/uploads/2010/06/UninstallWithWaits.png" alt="A better way to do it" width="221" height="185" /></a></p>
<p>If you&#8217;ve added wait()s and you&#8217;re still having problems, try <a href="http://sikuli.org/trac/wiki/reference-0.10#wait" onclick="pageTracker._trackPageview('/outgoing/sikuli.org/trac/wiki/reference-0.10_wait?referer=');">setting a longer wait time</a>.</p>
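<p>To make the idea concrete, here is a minimal sketch in plain Python of what waiting with a timeout amounts to. It is not Sikuli's implementation: <code>wait_until</code>, <code>is_visible</code> and the fake menu below are hypothetical names made up for illustration, and in a real script Sikuli's own wait() does this job for an on-screen image.</p>

```python
import time

def wait_until(is_visible, timeout=3.0, interval=0.1):
    """Poll is_visible() until it returns True or `timeout` seconds pass."""
    deadline = time.time() + timeout
    while True:
        if is_visible():
            return True   # the target showed up, safe to act on it
        if time.time() >= deadline:
            return False  # gave up: the caller decides how to fail
        time.sleep(interval)

# Hypothetical stand-in for a slow start menu: visible only after 0.3 seconds.
appears_at = time.time() + 0.3

def menu_visible():
    return time.time() >= appears_at

if wait_until(menu_visible, timeout=2.0):
    print("menu is up, safe to click")
```

<p>A longer timeout only costs time when something is actually slow, because the loop returns as soon as the target appears.</p>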
<h3>4. Having trouble with context-sensitive/popup menus</h3>
<p>If you tried to follow along with the previous example, you likely ran into a problem when you tried to capture the &#8220;Control Panel&#8221; option in the start menu. After opening the start menu, switching focus back to the Sikuli IDE will cause the start menu to close, thwarting your effort to capture an image. Now, you could use PrintScreen while the start menu is open, paste the image into an image processor, and then use Sikuli to capture the image from the image processor, but thankfully, there&#8217;s a better way.</p>
<p>Sikuli installs hotkeys for common tasks like capturing an image (CTRL + SHIFT + 2 by default), and they don&#8217;t cause the current program to lose focus. So you can simply open the start menu/context-sensitive menu of your choice and use the hotkey to capture the screen. That way the menu won&#8217;t disappear in the process.</p>
<h3>3. click()ing when you should be <a href="http://sikuli.org/trac/wiki/reference-0.10#type" onclick="pageTracker._trackPageview('/outgoing/sikuli.org/trac/wiki/reference-0.10_type?referer=');">type()</a>ing</h3>
<p>Sikuli&#8217;s pretty good at finding stuff on-screen, but it&#8217;s still a costly and error-prone process. If you can navigate a user interface by emulating keystrokes rather than clicks, you&#8217;ll save yourself a lot of trouble. Typing also has the benefit of sending events sequentially to the current program&#8217;s <a href="http://en.wikipedia.org/wiki/Event-driven_programming" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/Event-driven_programming?referer=');">event queue</a>, so if your program is stalling on something, your type() commands will wait for that task to be done, meaning you&#8217;ll run into problem #5 a lot less often. If you were to instead use click()s, you would have to tell your script in advance how long to wait() before it can take its next action, giving you inconsistent results if your machine is running slower than expected. Keep in mind that Sikuli can emulate <a href="http://sikuli.org/trac/wiki/reference-0.10#ModifierKeys" onclick="pageTracker._trackPageview('/outgoing/sikuli.org/trac/wiki/reference-0.10_ModifierKeys?referer=');">key modifiers</a>, <a href="http://sikuli.org/trac/wiki/reference-0.10#SpecialKeys" onclick="pageTracker._trackPageview('/outgoing/sikuli.org/trac/wiki/reference-0.10_SpecialKeys?referer=');">function keys, arrow keys, etc.</a>, so you can specify some pretty complex interactions using only type() commands.</p>
<h3>2. Forgetting that you&#8217;re using Python</h3>
<p>Sikuli is a powerful and flexible tool, but remember that Sikuli scripts are written in an incredibly powerful and flexible language. Before writing that fancy UI-driven script to do something simple like change the system date/time, consider that the same task can be done in about three lines in Python or a shell script, saving you a great deal of time and headache. Whenever a task seems unnecessarily complex in Sikuli, ask yourself if it might be better solved programmatically, rather than visually.</p>
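<p>As a rough sketch of the programmatic route, the snippet below builds a command that sets the system clock. It assumes a Linux machine with GNU <code>date</code> and root privileges, which is not the Windows setup used elsewhere in this post; <code>set_system_time</code> is a hypothetical helper, and it only runs the command when <code>dry_run</code> is turned off.</p>

```python
import subprocess
from datetime import datetime

def set_system_time(when, dry_run=True):
    """Build (and optionally run) a GNU `date -s` command for `when`."""
    command = ["date", "-s", when.strftime("%Y-%m-%d %H:%M:%S")]
    if not dry_run:
        subprocess.check_call(command)  # assumption: GNU date, run as root
    return command

# Dry run: shows the command without touching the clock.
print(set_system_time(datetime(2010, 6, 24, 16, 21, 11)))
```

<p>Compare that with scripting the clicks through a date/time dialog, where every step depends on an image match.</p>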
<h3>1. Not knowing where to find help</h3>
<p>Since Sikuli is still in the early stages of development, finding online resources to help you can be very difficult, to say the least. Here are some of the pages that I find most useful when I&#8217;m writing Sikuli scripts:</p>
<ul>
<li><a href="http://sikuli.org/trac/wiki/reference-0.10" onclick="pageTracker._trackPageview('/outgoing/sikuli.org/trac/wiki/reference-0.10?referer=');">Documentation / Guide</a> &#8211; Perhaps the best Sikuli reference out there. Complete documentation of Sikuli, along with some example code.</li>
<li><a href="https://bugs.launchpad.net/sikuli" onclick="pageTracker._trackPageview('/outgoing/bugs.launchpad.net/sikuli?referer=');">Bug Reporting / Tracking</a> &#8211; Sikuli is still in beta. See if that problem you&#8217;re having is really a bug in Sikuli, not your script.</li>
<li><a href="http://blog.sikuli.org/" onclick="pageTracker._trackPageview('/outgoing/blog.sikuli.org/?referer=');">Blog</a> &#8211; Contains useful code examples and news</li>
<li><a href="https://answers.launchpad.net/sikuli" onclick="pageTracker._trackPageview('/outgoing/answers.launchpad.net/sikuli?referer=');">Q&amp;A</a> &#8211; If you&#8217;re having an issue, there might be someone else who&#8217;s been there before.</li>
</ul>
<p><em>[Mario is a summer intern with Baydin, and he is spending part of the summer automating functional tests for </em><a href="http://www.baydin.com/boomerang" onclick="pageTracker._trackPageview('/outgoing/www.baydin.com/boomerang?referer=');"><span style="font-weight: normal;"><em>Boomerang</em></span></a><span style="font-weight: normal;"><em> using Project Sikuli]</em></span></p>
]]></content:encoded>
			<wfw:commentRss>http://baydin.com/blog/2010/06/5-sikuli-pitfalls-and-how-to-avoid-them/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Search SharePoint in 3 Easy Steps</title>
		<link>http://baydin.com/blog/2009/08/search-sharepoint-in-3-easy-steps/</link>
		<comments>http://baydin.com/blog/2009/08/search-sharepoint-in-3-easy-steps/#comments</comments>
		<pubDate>Mon, 10 Aug 2009 19:51:18 +0000</pubDate>
		<dc:creator>stever</dc:creator>
				<category><![CDATA[Technical]]></category>
		<category><![CDATA[SharePoint]]></category>

		<guid isPermaLink="false">http://www.baydin.com/blog/?p=107</guid>
		<description><![CDATA[We’ve been seeing an increasing number of corporate customers look into SharePoint. The New York Times has a nice summary on the growth of Microsoft SharePoint that echoes our observation. One of the challenges that users run into is that they expect SharePoint to be the end-all collaboration solution. In reality it’s a bunch of pieces [...]]]></description>
			<content:encoded><![CDATA[<p>We’ve been seeing an increasing number of corporate customers look into SharePoint. The New York Times has a nice summary on the <a title="NYTimes Sharepoint" href="http://bits.blogs.nytimes.com/2009/08/07/microsofts-sharepoint-thrives-in-the-recession/?hp" target="_blank" onclick="pageTracker._trackPageview('/outgoing/bits.blogs.nytimes.com/2009/08/07/microsofts-sharepoint-thrives-in-the-recession/?hp&amp;referer=');">growth of Microsoft SharePoint</a> that echoes our observation. One of the challenges that users run into is that they expect SharePoint to be the end-all collaboration solution. In reality it’s a bunch of pieces that let you build a collaborative solution.  In this blog post, I’m going to show you how you can find documents in SharePoint quickly without having to launch your browser and manually hunt for the file.</p>
<p>Back in our previous jobs, we had begun to put all of our team files on SharePoint. As the site filled up with project schedules, business plans and customer reports, it became difficult to find the correct file. We used it as a collaborative dumping ground. Team members were alerted with an email describing the new file and its location. This created a problem: we filled each other’s inboxes with “junk mail” so that we could keep track of our team documents. The second problem was that it was still hard to find the right file if we ignored the notification and did not know the title of the document or its location.</p>
<p>Microsoft is solving this problem with Microsoft Office SharePoint Server 2007. They have incorporated fast, effective search. The problem is that many companies have not yet upgraded to MOSS 2007. So we figured we could show you a way to still find documents while using an older version of SharePoint, which is what most corporate users have. To do this you will need Outlook 2007 and Windows Desktop Search installed. You may know it as <a title="Instant Search in Microsoft Outlook" href="http://office.microsoft.com/en-us/outlook/HA012305851033.aspx" target="_blank" onclick="pageTracker._trackPageview('/outgoing/office.microsoft.com/en-us/outlook/HA012305851033.aspx?referer=');">Instant Search</a> in Outlook. This is the search bar that lets you find emails and calendar items as you type.</p>
<p>What we are going to do is synchronize SharePoint with Outlook 2007. The files that are in your SharePoint site will be searchable through the “Instant Search” bar. This will save you a lot of time if you want to search for files using key words instead of having to navigate through the SharePoint site to locate a file.</p>
<p><span style="text-decoration: underline;">Instructions to Search SharePoint without MOSS 2007 in 3 Easy Steps</span></p>
<p>Pre-requisite: We assume you have Outlook 2007 and <a title="Windows Desktop Search" href="http://www.microsoft.com/windows/products/winfamily/desktopsearch/getitnow.mspx" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.microsoft.com/windows/products/winfamily/desktopsearch/getitnow.mspx?referer=');">Windows Desktop Search</a> running.</p>
<p>Step 1: Go to your SharePoint Site and go to the location where all the files are kept. In this example, we have it in “Shared Documents.” So we go to that location.</p>
<p><img class="alignnone size-full wp-image-121" title="sharepoint1-home" src="http://baydin.com/blog/wp-content/uploads/2009/08/sharepoint1-home5.png" alt="sharepoint1-home" width="644" height="398" /></p>
<p>Step 2: Click the “Action” button and select the menu option that says “Connect to Outlook.”</p>
<p><img class="alignnone size-full wp-image-123" title="sharepoint2-shared-docs" src="http://baydin.com/blog/wp-content/uploads/2009/08/sharepoint2-shared-docs.png" alt="sharepoint2-shared-docs" width="641" height="398" /></p>
<p>Step 3: You will see a listing of the files from SharePoint in your Outlook panel. These are synchronized with Outlook and can be searched through the Instant Search bar.</p>
<p><img class="alignnone size-full wp-image-124" title="sharepoint3-outlook" src="http://baydin.com/blog/wp-content/uploads/2009/08/sharepoint3-outlook.png" alt="sharepoint3-outlook" width="641" height="386" /></p>
<p>Great! You’re done. Now you can search files in SharePoint from your desktop without having to launch a web browser, find the bookmark and navigate through SharePoint.</p>
<p>If you want to be even more productive, you can run <a title="Baydin" href="http://www.baydin.com" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.baydin.com?referer=');">Baydin</a>. It works within Outlook 2007 and will display relevant files from SharePoint, shared network drives, your local computer and inbox by analyzing the context of your email. Think of it like a really intelligent recommendation engine that shows you related documents in a side panel. Our early users have found files they didn’t know they had. Others have sped up product development because they discovered someone else in their company with relevant experience had published documents related to their project. Knowing is half the battle and Baydin shows you what you don’t know.</p>
]]></content:encoded>
			<wfw:commentRss>http://baydin.com/blog/2009/08/search-sharepoint-in-3-easy-steps/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Desktop Search: What hasn&#8217;t changed</title>
		<link>http://baydin.com/blog/2009/05/desktop-search-what-hasn%e2%80%99t-changed/</link>
		<comments>http://baydin.com/blog/2009/05/desktop-search-what-hasn%e2%80%99t-changed/#comments</comments>
		<pubDate>Wed, 20 May 2009 16:29:39 +0000</pubDate>
		<dc:creator>alex</dc:creator>
				<category><![CDATA[Technical]]></category>
		<category><![CDATA[desktop search]]></category>

		<guid isPermaLink="false">http://www.baydin.com/blog/?p=28</guid>
		<description><![CDATA[A few posts back, we talked about why search on the desktop works a lot better than it did just a few years ago.&#160; In this post, we’ll talk about how desktop search hasn’t kept up as the way we find and consume content on our computers has changed.
As recently as 2000, the deluge of [...]]]></description>
			<content:encoded><![CDATA[<p>A <a href="http://www.baydin.com/blog/2009/05/a-brief-history-of-search-on-the-desktop/" onclick="pageTracker._trackPageview('/outgoing/www.baydin.com/blog/2009/05/a-brief-history-of-search-on-the-desktop/?referer=');">few posts back</a>, we talked about why search on the desktop works a lot better than it did just a few years ago.&#160; In this post, we’ll talk about how desktop search hasn’t kept up as the way we find and consume content on our computers has changed.</p>
<p>As recently as 2000, the deluge of emails, files, podcasts, blog posts and everything else that we have to keep track of was more like a drizzle.&#160; The average hard drive held about 8 GB of data and we averaged about 7 non-junk emails per day.</p>
<p>As of 2009, those numbers look pretty different.&#160; My laptop’s hard drive is a relatively tiny 160 GB; most computers come with at least 320 GB.&#160; The way we work with email has changed too.&#160; We now average 25 emails per day (almost a whopping 10,000 per year!) thanks to a lot of mailing lists and a lot of CCing.</p>
<p>Of course, we’re not suddenly 60 times more productive than we used to be.&#160; Instead, we just get more of other people’s content.&#160; Before Gmail made email quotas obsolete, CCing large files to everyone who might want a document wasn’t practical.&#160; In 2000, blogs didn’t really exist, and the number of pages that interested each of us on the Internet was orders of magnitude smaller.</p>
<p>The problem only intensifies if we think about it from a corporate perspective.&#160; How many gigabytes of data does your entire company have?&#160; Where does it live?&#160; At our former company, many groups had internal wikis, all of them had internal SharePoint sites (at least three, and as many as fifteen per group!), we had a document management library, and we had personal websites with documents attached.&#160; Everyone cared more about getting the job done than about making it easy for other people to find what they had created.</p>
<p>So there are now a lot more fragments of information in our brains and a lot more places that the rest of that information could be.&#160; We spend a lot more time asking ourselves “where did I see that again?”&#160; That translates into a lot of time and money. Bill Gates <a href="http://news.zdnet.co.uk/software/0,1000000121,39269529,00.htm" onclick="pageTracker._trackPageview('/outgoing/news.zdnet.co.uk/software/0_1000000121_39269529_00.htm?referer=');">says</a> that the average knowledge worker spends 11 hours a week looking for information, costing his/her company $18,000 per year in lost time.</p>
<p>The future looks like it is going to be even more chaotic – we will not only access more information in more places, but on more devices as well.&#160; We will see some content on our computers, some on our $200 Netbooks, more on our iPhones or BlackBerries, and even more on our Kindles or Sony Readers.&#160; And as we see more content on more devices, remembering where we saw the content we need NOW is going to get even harder.</p>
<p>A lot of productivity gurus are challenging us to “take charge of our Inboxes!” and implement a regimen that will help us manage the information.&#160; But technology caused this problem.&#160; Why isn’t it fixing it?</p>
<p>Fundamentally, the way we look for information hasn’t changed a lick since 2000.&#160; Whether searching our computers or the Internet, we try to figure out what we want to find and we type it into a search box.&#160; We get results that we hope are good enough – they often are.&#160; When programmers have tried to improve on the search box, they’ve come up with some terrifying things.</p>
<p>I’ve attached a screenshot of the <a href="http://simile.mit.edu/seek/" onclick="pageTracker._trackPageview('/outgoing/simile.mit.edu/seek/?referer=');">MIT Simile Seek</a> project’s implementation of what is called faceted search below.&#160; It’s a programmer’s dream.&#160; I think I am wired to love driving tools like this.&#160; It feels like piloting a starship.&#160; If I know I want the 2nd top level domain to be .mit.edu because I know it came from someone at MIT, but I don’t know which lab, faceted search puts that power right at my fingertips.</p>
<p><a href="http://baydin.com/blog/wp-content/uploads/2009/05/simile-seek.png"><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="simile_seek" border="0" alt="simile_seek" src="http://baydin.com/blog/wp-content/uploads/2009/05/simile-seek-thumb.png" width="595" height="226" /></a></p>
<p>But when I showed faceted search to anyone who doesn’t program computers for a living (like Electrical Engineers), they did not share my enthusiasm.&#160; Other search improvements yielded similar gnashing of teeth.&#160; The search box remains the search box.</p>
<p>So we’ve got a lot more content than we’ve ever had before, located in a lot more places than it’s ever been before, and we access it on more devices than we’ve ever used before.&#160; And we still do pretty much the same things to find it that we did in 2000, when we had a lot less content, all on one hard drive, all on one computer.</p>
<p>So there’s a <strong>lot</strong> to fix.&#160; And we’d love to fix all of it!&#160; But for now, we’re trying to siphon off just one aspect of the problem where we think our technology can make a big difference.&#160; In a few days, we’ll talk more about how we’re going to do it.</p>
]]></content:encoded>
			<wfw:commentRss>http://baydin.com/blog/2009/05/desktop-search-what-hasn%e2%80%99t-changed/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A Brief History of Search on the Desktop</title>
		<link>http://baydin.com/blog/2009/05/a-brief-history-of-search-on-the-desktop/</link>
		<comments>http://baydin.com/blog/2009/05/a-brief-history-of-search-on-the-desktop/#comments</comments>
		<pubDate>Thu, 07 May 2009 04:03:45 +0000</pubDate>
		<dc:creator>alex</dc:creator>
				<category><![CDATA[Technical]]></category>
		<category><![CDATA[desktop search]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[indexing]]></category>

		<guid isPermaLink="false">http://www.baydin.com/blog/?p=12</guid>
		<description><![CDATA[Desktop search has come a long way in the past few years.&#160; In this post, we’ll explore how the technology behind all of the major desktop search options has changed based on web search innovations.&#160; In the follow-up posts, we’ll talk a little bit about how desktop search is different from web search and how [...]]]></description>
			<content:encoded><![CDATA[<p>Desktop search has come a long way in the past few years.&#160; In this post, we’ll explore how the technology behind all of the major desktop search options has changed based on web search innovations.&#160; In the follow-up posts, we’ll talk a little bit about how desktop search is different from web search and how it has both succeeded and failed at making interacting with our computers better.&#160; We’ll share a few tricks for getting more out of Desktop Search and a few things we wish it could do.&#160; We’ll also share a little bit about how Baydin plans to fill in the gaps. </p>
<p>There are two major advantages to a modern desktop search experience: the first is that searching for a document is a lot faster than it used to be, and the second is that in virtually all file types, the text inside the document is searchable, instead of just the filename. </p>
<p>Think back to the file search in Windows 95.&#160; It was pretty terrible.&#160; All it could do was search for filenames, and it took the better part of eternity to find anything.&#160; Here’s why: when someone searched for a word, Windows opened the file system and looked at every single file it had.&#160; It compared the search query with the filename for each file, and as it found matches, it added the files to the results listing.&#160; Every time a new search started, Windows had to look at every single file, which is why the results trickled in over a period of a few minutes.&#160; If the search term were somewhere in a document or in an email rather than in the filename of a physical file, we were pretty much out of luck.</p>
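<p>The filename-only scan described above can be sketched in a few lines of Python. This is an illustration of the approach, not how Windows 95 was actually implemented; <code>search_filenames</code> and the demo files are made up for the example.</p>

```python
import os
import tempfile

def search_filenames(root, query):
    """Yield paths under `root` whose filename contains `query`.

    Every search walks every file again, and file contents are never opened.
    """
    query = query.lower()
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            if query in name.lower():
                yield os.path.join(folder, name)  # results trickle in one by one

# Demo on a throwaway folder with three empty files.
demo_root = tempfile.mkdtemp()
for name in ("chicken florentine.doc", "cashflow.xls", "chicken.jpg"):
    open(os.path.join(demo_root, name), "w").close()

for path in search_filenames(demo_root, "chicken"):
    print(path)
```

<p>The cost of each search grows with the number of files on disk, no matter how many of them match.</p>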
<p><a href="http://baydin.com/blog/wp-content/uploads/2009/05/win95search.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="win95search" border="0" alt="win95search" src="http://baydin.com/blog/wp-content/uploads/2009/05/win95search-thumb.png" width="443" height="239" /></a> </p>
<p>Searching the full text of documents was beyond the pale.&#160; To do that, Windows would need to open <em>every single file</em> as it came across them and extract the text.&#160; It would have been slower than slow, it would have required every piece of software that saved any kind of document to provide hooks for Windows to extract the text, and it probably would have made the computer rottenly unstable.&#160; </p>
<p>Searching through email in Office (up until 2003) used the same method, but since every email had a known structure, Outlook could search through the full text of messages.&#160; When a user started searching for something, Outlook opened the most recent email and compared the search terms against each word in that email.&#160; If there was a match, it would add the email to the result list in real-time.&#160; When it finished with the most recent email, it would move on to the next, then to the next, then to the next.&#160; Searching through email was a slow process, but it would eventually yield results where the terms were found only in the text of emails. </p>
<p>A real innovation happened, though, when software developers realized that the same technology that powers web search engines could be applied to the desktop.</p>
<p>When someone clicks the search button on a web search engine, the search engine responds in a totally different way from Windows 95-style search.&#160; Google does not crawl every page on the web, word for word, comparing the search terms for a match.&#160; Instead, Google just looks in a previously-generated database where they already have prepared a list of all the web pages that contain the search term (and a bunch of other information that helps them order the results!) </p>
<p>Instead of sifting through every word ever written on the Internet in real time, Google crawls each page on the web only every few hours, days, or weeks depending on how important a site is and how frequently its content changes.&#160; When Google crawls a site, its crawler looks through every page, processes every term, and updates the database.&#160; </p>
<p>Very crudely, that index looks like this:</p>
<table border="0" cellspacing="0" cellpadding="2" width="465">
<tbody>
<tr>
<td valign="top" width="115"><strong>Term</strong></td>
<td valign="top" width="348"><strong>Results</strong></td>
</tr>
<tr>
<td valign="top" width="115">baydin</td>
<td valign="top" width="348"><a href="http://www.baydin.com" onclick="pageTracker._trackPageview('/outgoing/www.baydin.com?referer=');">http://www.baydin.com</a>          <br /> <a href="http://burmadigest.info/2008/03/20/set-ka-lay-baydin-burmese" onclick="pageTracker._trackPageview('/outgoing/burmadigest.info/2008/03/20/set-ka-lay-baydin-burmese?referer=');">http://burmadigest.info/2008/03/20/set-ka-lay-baydin-burmese</a>          <br /><a href="http://www.baydin.com/blog" onclick="pageTracker._trackPageview('/outgoing/www.baydin.com/blog?referer=');">http://www.baydin.com/blog</a>          <br />etc.</td>
</tr>
<tr>
<td valign="top" width="115">chicken</td>
<td valign="top" width="348"><a href="http://en.wikipedia.org/wiki/Chicken" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/Chicken?referer=');">http://en.wikipedia.org/wiki/Chicken</a>           <br /><a href="http://allrecipes.com/Recipes/Chicken" onclick="pageTracker._trackPageview('/outgoing/allrecipes.com/Recipes/Chicken?referer=');">http://allrecipes.com/Recipes/Chicken</a>           <br />etc.</td>
</tr>
<tr>
<td valign="top" width="115">outlook</td>
<td valign="top" width="348"><a href="http://www.microsoft.com/outlook" onclick="pageTracker._trackPageview('/outgoing/www.microsoft.com/outlook?referer=');">http://www.microsoft.com/outlook</a>           <br /><a href="http://en.wikipedia.org/wiki/Microsoft_Outlook" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/Microsoft_Outlook?referer=');">http://en.wikipedia.org/wiki/Microsoft_Outlook</a>           <br />etc…</td>
</tr>
</tbody>
</table>
<p>All Google has to do when you search for “chicken” is find that index and list the results.</p>
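<p>A toy version of such an index, often called an inverted index, might look like the sketch below. The page text is invented for the example and <code>build_index</code> is a hypothetical helper; a real search engine stores far more than a set of URLs per term.</p>

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each lowercase term to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(url)
    return index

# Made-up page text standing in for crawled pages.
pages = {
    "http://www.baydin.com": "Baydin takes the work out of email",
    "http://en.wikipedia.org/wiki/Chicken": "The chicken is a domesticated bird",
    "http://allrecipes.com/Recipes/Chicken": "Chicken recipes for dinner",
}

index = build_index(pages)        # done ahead of time, at crawl time
print(sorted(index["chicken"]))   # at query time: a single dictionary lookup
```

<p>The expensive part happens once when the index is built, so answering a query no longer depends on rereading every page.</p>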
<p>Of course, that’s a sweeping simplification – it doesn’t address multiple-term searches, result order, or the fact that the index is HUGE and difficult to maintain.&#160; There are dozens of fantastic papers from Google engineers that explain a lot of the details; try <a href="http://labs.google.com/papers" onclick="pageTracker._trackPageview('/outgoing/labs.google.com/papers?referer=');">http://labs.google.com/papers</a> for a listing, or start <a href="http://infolab.stanford.edu/~backrub/google.html" onclick="pageTracker._trackPageview('/outgoing/infolab.stanford.edu/_backrub/google.html?referer=');">here</a> for an overview from when Sergey and Larry were still at Stanford.&#160; But for the purposes of this post, that’s all we need to worry about.&#160; </p>
<p>Creating and maintaining a mapping from search terms to web pages is the critical innovation for desktop search.&#160; The idea extends quite well to our individual computers.&#160; Instead of a mapping from terms to web pages, though, we need to make a mapping from terms to documents. So the problem is a little bit harder in that we have to be able to index a whale of a lot of document types instead of just HTML, but it is a lot easier in that the index size is nowhere near as large as the index for the web.&#160; It can be generated relatively fast (probably under an hour for the average computer) and does not require a lot of space.</p>
<p>Google Desktop Search, Windows Desktop Search, and all the competitors do exactly this.&#160; Their indexer runs in the background, opens every file on the computer, and creates a database in the same format as the web databases above:</p>
<table border="0" cellspacing="0" cellpadding="2" width="465">
<tbody>
<tr>
<td valign="top" width="106"><strong>Term</strong></td>
<td valign="top" width="357"><strong>Results</strong></td>
</tr>
<tr>
<td valign="top" width="106">baydin</td>
<td valign="top" width="357">C:\Alex\Documents\baydin_biz_plan.doc         <br />C:\Alex\Desktop\blog\post1.html          <br />C:\Alex\Documents\cashflow.xls          <br />etc.</td>
</tr>
<tr>
<td valign="top" width="106">chicken</td>
<td valign="top" width="357">C:\Alex\Documents\Recipes\chicken florentine.doc         <br />C:\Alex\Desktop\chicken.jpg           <br />etc.</td>
</tr>
<tr>
<td valign="top" width="106">outlook</td>
<td valign="top" width="357">C:\Program Files\Microsoft\Outlook.exe         <br />C:\Alex\Documents\problems with outlook.doc          <br />etc…</td>
</tr>
</tbody>
</table>
<p>When I search on my computer for a word, like the web search engines, all my computer now has to do is look in that index and find the already-generated list of files that match my term.&#160; </p>
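<p>Here is a small sketch of a desktop indexer along those lines. It is a simplification under stated assumptions: it only reads plain <code>.txt</code> files, whereas a real indexer needs a text extractor for each document type, and <code>index_folder</code> and the demo documents are hypothetical.</p>

```python
import os
import re
import tempfile
from collections import defaultdict

def index_folder(root):
    """Read every .txt file under `root` and map each term to the files containing it."""
    index = defaultdict(set)
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".txt"):
                continue  # a real indexer would plug in an extractor per file type
            path = os.path.join(folder, name)
            with open(path) as handle:
                for term in re.findall(r"[a-z0-9]+", handle.read().lower()):
                    index[term].add(path)
    return index

# Demo on a throwaway folder with three small documents.
demo_root = tempfile.mkdtemp()
documents = {
    "biz_plan.txt": "Baydin business plan",
    "recipe.txt": "Chicken florentine with spinach",
    "notes.txt": "Problems with Outlook and a chicken joke",
}
for name, text in documents.items():
    with open(os.path.join(demo_root, name), "w") as handle:
        handle.write(text)

index = index_folder(demo_root)  # the slow pass, run in the background
print(sorted(os.path.basename(p) for p in index["chicken"]))
```

<p>Note that the match for "chicken" comes from the text inside the files, not from their names, which is exactly what the filename-only search could not do.</p>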
<p>The key takeaway is that thanks to these indexes, searching through the full text of every file on a computer is now thousands of times faster than just searching the filenames used to be.&#160; </p>
]]></content:encoded>
			<wfw:commentRss>http://baydin.com/blog/2009/05/a-brief-history-of-search-on-the-desktop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
