Archive for the ‘Technical’ Category

Search SharePoint in 3 Easy Steps

We’ve been seeing an increasing number of corporate customers look into SharePoint. The New York Times has a nice summary on the growth of Microsoft SharePoint that echoes our observation. One of the challenges users run into is that they expect SharePoint to be the be-all and end-all collaboration solution. In reality, it’s a set of pieces that let you build a collaborative solution. In this blog post, I’m going to show you how to find documents in SharePoint quickly without having to launch your browser and manually hunt for the file.

At our previous jobs, we had begun to put all of our team files on SharePoint. As the site filled up with project schedules, business plans and customer reports, it became difficult to find the correct file. We used it as a collaborative dumping ground. Team members were alerted with an email describing each new file and its location. This created a problem: we filled each other’s inboxes with “junk mail” just to keep track of our team documents. The second problem was that it was still hard to find the right file if we had ignored the notification and did not know the document’s title or location.

Microsoft is solving this problem with Microsoft Office SharePoint Server (MOSS) 2007, which incorporates fast, effective search. The problem is that many companies have not yet upgraded to MOSS 2007. So we figured we could show you a way to find documents while still using an older version of SharePoint, which is what most corporate users have. To do this, you will need Outlook 2007 and Windows Desktop Search installed. You may know it as Instant Search in Outlook: the search bar that lets you find emails and calendar items as you type.

What we are going to do is synchronize SharePoint with Outlook 2007. The files that are in your SharePoint site will be searchable through the “Instant Search” bar. This will save you a lot of time if you want to search for files using key words instead of having to navigate through the SharePoint site to locate a file.

Instructions to Search SharePoint without MOSS 2007 in 3 Easy Steps

Prerequisite: We assume you have Outlook 2007 and Windows Desktop Search running.

Step 1: Open your SharePoint site and navigate to the location where the files are kept. In this example, that is “Shared Documents.”

sharepoint1-home

Step 2: Click the “Action” button and select the menu option that says “Connect to Outlook.”

sharepoint2-shared-docs

Step 3: You will see a listing of the files from SharePoint in your Outlook panel. These are synchronized with Outlook and can be searched through the Instant Search bar.

sharepoint3-outlook

Great! You’re done. Now you can search files in SharePoint from your desktop without having to launch a web browser, find the bookmark and navigate through SharePoint.

If you want to be even more productive, you can run Baydin. It works within Outlook 2007 and will display relevant files from SharePoint, shared network drives, your local computer and inbox by analyzing the context of your email. Think of it like a really intelligent recommendation engine that shows you related documents in a side panel. Our early users have found files they didn’t know they had. Others have sped up product development because they discovered someone else in their company with relevant experience had published documents related to their project. Knowing is half the battle and Baydin shows you what you don’t know.

Technical

Desktop Search: What hasn’t changed

A few posts back, we talked about why search on the desktop works a lot better than it did just a few years ago.  In this post, we’ll talk about how desktop search hasn’t kept up as the way we find and consume content on our computers has changed.

As recently as 2000, the deluge of emails, files, podcasts, blog posts and everything else that we have to keep track of was more like a drizzle.  The average hard drive held about 8 GB of data and we averaged about 7 non-junk emails per day.

As of 2009, those numbers look pretty different.  My laptop’s hard drive is a relatively tiny 160 GB; most computers come with at least 320 GB.  The way we work with email has changed too.  We now average 25 emails per day (almost a whopping 10,000 per year!) thanks to a lot of mailing lists and a lot of CCing.

Of course, we’re not suddenly 60 times more productive than we used to be.  Instead, we just get more of other people’s content.  Before Gmail made email quotas obsolete, CCing large files to everyone who might want a document wasn’t practical.  In 2000, blogs didn’t really exist, and the number of pages that interested each of us on the Internet was orders of magnitude smaller.

The problem only intensifies if we think about it from a corporate perspective.  How many gigabytes of data does your entire company have?  Where does it live?  At our former company, many groups had internal wikis, and all of them had internal SharePoint sites (at least three, and as many as fifteen per group!).  We had a document management library, and we had personal websites with documents attached.  Everyone cared more about getting the job done than about making their work easy for other people to find.

So there are now a lot more fragments of information in our brains and a lot more places that the rest of that information could be.  We spend a lot more time asking ourselves “where did I see that again?”  That translates into a lot of time and money. Bill Gates says that the average knowledge worker spends 11 hours a week looking for information, costing his/her company $18,000 per year in lost time.

The future looks like it is going to be even more chaotic – we will not only access more information in more places, but on more devices as well.  We will see some content on our computers, some on our $200 Netbooks, more on our iPhones or BlackBerries, and even more on our Kindles or Sony Readers.  And as we see more content on more devices, remembering where we saw the content we need NOW is going to get even harder.

A lot of productivity gurus are challenging us to “take charge of our Inboxes!” and implement a regimen that will help us manage the information.  But technology caused this problem.  Why isn’t it fixing it?

Fundamentally, the way we look for information hasn’t changed a lick since 2000.  Whether searching our computers or the Internet, we try to figure out what we want to find and we type it into a search box.  We get results that we hope are good enough – they often are.  When programmers have tried to improve on the search box, they’ve come up with some terrifying things.

I’ve attached a screenshot of the MIT Simile Seek project’s implementation of what is called faceted search below.  It’s a programmer’s dream.  I think I am wired to love driving tools like this.  It feels like piloting a starship.  If I know I want the domain to end in .mit.edu because I know it came from someone at MIT, but I don’t know which lab, faceted search puts that power right at my fingertips.

simile_seek
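For the curious, here is a rough Python sketch of the idea behind a facet. It is not how Seek is implemented; the item list, the host names, and the function names (`domain_facet`, `facet_counts`, `filter_by_facet`) are all made up for illustration. Each item gets a facet value (here, the last two labels of its host name), the interface shows how many items fall under each value, and clicking a value narrows the list.

```python
from collections import Counter

# Hypothetical items: (title, host the item came from).
ITEMS = [
    ("Robot arm notes", "csail.mit.edu"),
    ("Display prototype", "media.mit.edu"),
    ("Newsletter", "mail.example.com"),
]

def domain_facet(host, levels=2):
    """Return the last `levels` labels of a host name, e.g. 'mit.edu'."""
    return ".".join(host.lower().split(".")[-levels:])

def facet_counts(items):
    """Count how many items fall under each facet value."""
    return Counter(domain_facet(host) for _title, host in items)

def filter_by_facet(items, domain):
    """Keep only the items whose facet value equals `domain`."""
    return [(title, host) for title, host in items if domain_facet(host) == domain]

# Narrow to everything from MIT without knowing which lab it came from.
from_mit = filter_by_facet(ITEMS, "mit.edu")
```

A real faceted browser would offer many facets at once (date, sender, folder and so on) and let us combine them, which is exactly where the starship feeling comes from.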

But when I showed faceted search to people who don’t program computers for a living (Electrical Engineers, for example), they did not share my enthusiasm.  Other search improvements have met with similar gnashing of teeth.  The search box remains the search box.

So we’ve got a lot more content than we’ve ever had before, located in a lot more places than it’s ever been before, and we access it on more devices than we’ve ever used before.  And we still do pretty much the same things to find it that we did in 2000, when we had a lot less content, all on one hard drive, all on one computer.

So there’s a lot to fix.  And we’d love to fix all of it!  But for now, we’re trying to siphon off just one aspect of the problem where we think our technology can make a big difference.  In a few days, we’ll talk more about how we’re going to do it.

Technical

A Brief History of Search on the Desktop

Desktop search has come a long way in the past few years.  In this post, we’ll explore how the technology behind all of the major desktop search options has changed based on web search innovations.  In the follow-up posts, we’ll talk a little bit about how desktop search is different from web search and how it has both succeeded and failed at making interacting with our computers better.  We’ll share a few tricks for getting more out of Desktop Search and a few things we wish it could do.  We’ll also share a little bit about how Baydin plans to fill in the gaps.

There are two major advantages to a modern desktop search experience: the first is that searching for a document is a lot faster than it used to be, and the second is that in virtually all file types, the text inside the document is searchable, instead of just the filename.

Think back to the file search in Windows 95.  It was pretty terrible.  All it could do was search for filenames, and it took the better part of eternity to find anything.  Here’s why: when someone searched for a word, Windows opened the file system and looked at every single file it had.  It compared the search query with the filename for each file, and as it found matches, it added the files to the results listing.  Every time a new search started, Windows had to look at every single file, which is why the results trickled in over a period of a few minutes.  If the search term were somewhere in a document or in an email rather than in the filename of a physical file, we were pretty much out of luck.

win95search
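To make that concrete, here is a minimal Python sketch of a filename-only search in the style described above. It is our own illustration, not Windows 95's actual code, and the function name `filename_search` is invented. The point is that every query walks the whole directory tree from scratch.

```python
import os

def filename_search(root, query):
    """Walk every directory under `root` and yield each path whose
    filename contains `query` (case-insensitive).

    Nothing is remembered between searches, so every new query pays
    the full cost of visiting every file again.
    """
    query = query.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if query in name.lower():
                # Matches are produced one at a time, which is why
                # results trickle in as the walk proceeds.
                yield os.path.join(dirpath, name)
```

Calling `list(filename_search("C:\\", "chicken"))` would touch every file on the drive, and a file named `recipes.doc` that mentions chicken only in its text would never show up.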

Searching the full text of documents was beyond the pale.  To do that, Windows would have needed to open every single file it came across and extract the text.  It would have been slower than slow, it would have required every piece of software that saved any kind of document to provide hooks for Windows to extract the text, and it probably would have made the computer rottenly unstable.

Searching through email in Office (up until 2003) used the same method, but since every email had a known structure, Outlook could search through the full text of messages.  When a user started searching for something, Outlook opened the most recent email and compared the search terms against each word in that email.  If there was a match, it would add the email to the result list in real-time.  When it finished with the most recent email, it would move on to the next, then to the next, then to the next.  Searching through email was a slow process, but it would eventually yield results where the terms were found only in the text of emails.
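Here is a small Python sketch of that message-by-message scan. It is only an illustration of the approach, not Outlook's code; the mailbox is assumed to be a list of `(subject, body)` pairs with the most recent message first, and `scan_mailbox` is a made-up name.

```python
import re

def scan_mailbox(messages, query):
    """Scan messages one by one, most recent first, and yield the subject
    of each message that contains every search term.

    `messages` is assumed to be a list of (subject, body) tuples.
    """
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return
    for subject, body in messages:
        # Break this one message into words, then compare the terms.
        words = set(re.findall(r"\w+", (subject + " " + body).lower()))
        if all(term in words for term in terms):
            yield subject  # handed back immediately, like a live result list
```

The work grows with the size of the mailbox: a search over ten thousand messages re-reads all ten thousand, every time.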

A real innovation happened, though, when software developers realized that the same technology that powers web search engines could be applied to the desktop.

When someone clicks the search button on a web search engine, the search engine responds in a totally different way from Windows 95-style search.  Google does not crawl every page on the web, word for word, comparing the search terms for a match.  Instead, Google just looks in a previously generated database in which it has already prepared a list of all the web pages that contain the search term (along with a bunch of other information that helps it order the results!).

Instead of sifting through every word ever written on the Internet in real time, Google crawls each page on the web only every few hours, days, or weeks depending on how important a site is and how frequently its content changes.  When Google crawls a site, its crawler looks through every page, processes every term, and updates the database. 

Very crudely, that index looks like this:

Term      Results
baydin    http://www.baydin.com
          http://burmadigest.info/2008/03/20/set-ka-lay-baydin-burmese
          http://www.baydin.com/blog
          etc.
chicken   http://en.wikipedia.org/wiki/Chicken
          http://allrecipes.com/Recipes/Chicken
          etc.
outlook   http://www.microsoft.com/outlook
          http://en.wikipedia.org/wiki/Microsoft_Outlook
          etc.

All Google has to do when you search for “chicken” is find that term in the index and list the results.
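In code, this kind of term-to-pages mapping is usually called an inverted index. Below is a toy Python version, assuming the crawler has already fetched each page's text. The page texts are invented for the example, and `build_index` and `search` are our own names, not anything from Google.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Build a mapping from each term to the URLs whose text contains it.

    `pages` maps URL -> page text (assumed to be already crawled).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(url)
    return {term: sorted(urls) for term, urls in index.items()}

def search(index, term):
    """Answer a one-word query with a single dictionary lookup."""
    return index.get(term.lower(), [])

# Made-up page text, standing in for what a crawler would have fetched.
pages = {
    "http://www.baydin.com": "Baydin makes Outlook smarter",
    "http://en.wikipedia.org/wiki/Chicken": "The chicken is a domesticated bird",
    "http://allrecipes.com/Recipes/Chicken": "Chicken recipes for every night",
}
index = build_index(pages)
chicken_pages = search(index, "chicken")
```

All of the slow work happens once, inside `build_index`. Answering a query is a single lookup, no matter how many pages were crawled.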

Of course, that’s a sweeping simplification – it doesn’t address multiple-term searches, result order, or the fact that the index is HUGE and difficult to maintain.  There are dozens of fantastic papers from Google engineers that explain a lot of the details; try http://labs.google.com/papers for a listing, or start here for an overview from when Sergey and Larry were still at Stanford.  But for the purposes of this post, that’s all we need to worry about.

Creating and maintaining a mapping from search terms to web pages is the critical innovation for desktop search.  The idea extends quite well to our individual computers.  Instead of a mapping from terms to web pages, though, we need to make a mapping from terms to documents. So the problem is a little bit harder in that we have to be able to index a whale of a lot of document types instead of just HTML, but it is a lot easier in that the index size is nowhere near as large as the index for the web.  It can be generated relatively fast (probably under an hour for the average computer) and does not require a lot of space.

Google Desktop Search, Windows Desktop Search, and all the competitors do exactly this.  Their indexer runs in the background, opens every file on the computer, and creates a database in the same format as the web databases above:

Term      Results
baydin    C:\Alex\Documents\baydin_biz_plan.doc
          C:\Alex\Desktop\blog\post1.html
          C:\Alex\Documents\cashflow.xls
          etc.
chicken   C:\Alex\Documents\Recipes\chicken florentine.doc
          C:\Alex\Desktop\chicken.jpg
          etc.
outlook   C:\Program Files\Microsoft\Outlook.exe
          C:\Alex\Documents\problems with outlook.doc
          etc.

When I search my computer for a word, all it now has to do, just like the web search engines, is look in that index and find the already-generated list of files that match my term.
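A bare-bones desktop indexer along these lines might look like the Python sketch below. It is only an illustration under some loud assumptions: it reads just a handful of plain-text extensions (the `TEXT_EXTENSIONS` set is our own choice), whereas real products rely on a separate text extractor for each document format, and the names `extract_text`, `build_file_index` and `lookup` are hypothetical.

```python
import os
import re
from collections import defaultdict

# Assumption: only these plain-text formats are read. Real desktop search
# tools need a dedicated extractor for .doc, .xls, email stores and so on.
TEXT_EXTENSIONS = {".txt", ".html", ".csv"}

def extract_text(path):
    """Return a file's text if we know how to read it, else an empty string."""
    if os.path.splitext(path)[1].lower() not in TEXT_EXTENSIONS:
        return ""
    with open(path, encoding="utf-8", errors="ignore") as handle:
        return handle.read()

def build_file_index(root):
    """Open every file under `root` once and map each term to file paths."""
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Index the filename as well as the contents, so chicken.jpg
            # is still found under "chicken".
            text = name + " " + extract_text(path)
            for term in re.findall(r"[a-z0-9]+", text.lower()):
                index[term].add(path)
    return index

def lookup(index, term):
    """Find the already-generated list of files for one term."""
    return sorted(index.get(term.lower(), set()))
```

A real indexer would also save the index to disk and update it in the background as files change, rather than rebuilding it from scratch.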

The key takeaway is that thanks to these indexes, searching through the full text of every file on a computer is now thousands of times faster than just searching the filenames used to be. 

Technical