Archive for June, 2010

5 Sikuli Pitfalls (and how to avoid them!)

June 24th, 2010

Introduction

Project Sikuli, a machine-vision research project from MIT, allows users to write scripts that automate UI-driven tasks. This tool, while very powerful and simple, can cause a number of headaches if you’re not careful. In this post, I’ll talk about some issues you might encounter, as well as how to avoid them. If you’ve never used Sikuli before, you can see demos here, or download the IDE here.

5. Not wait()ing

If you’re a long-time user of a particular program or operating system, you can probably describe many common tasks from memory. For example, I might describe how to access the “Uninstall or change a program” dialog in Windows 7 as follows:

  1. Click on the start menu
  2. Click on “Control Panel”
  3. Click on “Uninstall a Program” under the “Programs” heading

If you’re new to Sikuli, you might be tempted to script this simple process like this:

How not to do it

You could then run your script and not notice a single issue for a long time. Then, one day, your computer is busy running multiple background tasks and you run your script again. The start menu takes a few seconds to display its contents, and your script raises an exception. The problem is that Sikuli doesn’t automatically wait for an image to be visible on-screen before trying to click() it, so if it doesn’t see “Control Panel” on your screen very soon after clicking on the start menu, it will raise an exception. In order to force it to behave properly, you’ll need to insert wait() statements:

A better way to do it

If you’ve added wait()s and you’re still having problems, try setting a longer wait time.
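
The script screenshots aren't reproduced here, so here is a rough sketch of what a wait() does conceptually: keep checking for something until it shows up or a timeout runs out, and only then act on it. This is plain Python, not Sikuli's own API; wait_for and menu_is_visible are names made up for the illustration, and the "menu" is simulated by a counter rather than a real screen. In an actual Sikuli script you would call wait() on the captured image before click()ing it.

```python
import time


def wait_for(is_visible, timeout=10.0, interval=0.5):
    """Poll is_visible() until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False if we gave up.
    """
    deadline = time.time() + timeout
    while True:
        if is_visible():
            return True
        if time.time() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical stand-in for "the Control Panel entry is on screen":
# the slow menu only shows up on the third check.
checks = {"count": 0}


def menu_is_visible():
    checks["count"] += 1
    return checks["count"] >= 3


if wait_for(menu_is_visible, timeout=5.0, interval=0.01):
    result = "clicked after %d checks" % checks["count"]
else:
    result = "gave up"
print(result)
```

A longer timeout costs nothing when the machine is fast, because the loop returns as soon as the check succeeds. It only matters on the slow days.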

4. Having trouble with context-sensitive/popup menus

If you tried to follow along with the previous example, you likely ran into a problem when you tried to capture the “Control Panel” option in the start menu. After opening the start menu, switching focus back to the Sikuli IDE will cause the start menu to close, thwarting your effort to capture an image. Now, you could use PrintScreen while the start menu is open, paste the image into an image processor, and then use Sikuli to capture the image from the image processor, but thankfully, there’s a better way.

Sikuli installs hotkeys for common tasks like capturing an image (CTRL + SHIFT + 2 by default), and they don’t cause the current program to lose focus. So you can simply open the start menu/context-sensitive menu of your choice and use the hotkey to capture the screen. That way the menu won’t disappear in the process.

3. click()ing when you should be type()ing

Sikuli’s pretty good at finding stuff on-screen, but it’s still a costly and error-prone process. If you can navigate a user interface by emulating keystrokes rather than clicks, you’ll save yourself a lot of trouble. Typing also has the benefit of sending events sequentially to the current program’s event queue, so if your program is stalling on something, your type() commands will wait for that task to be done, meaning you’ll run into problem #5 a lot less often. If you were to instead use click()s, you would have to tell your script in advance how long to wait() before it can take its next action, giving you inconsistent results if your machine is running slower than expected. Keep in mind that Sikuli can emulate key modifiers, function keys, arrow keys, etc., so you can specify some pretty complex interactions using only type() commands.

2. Forgetting that you’re using Python

Sikuli is a powerful and flexible tool, but remember that Sikuli scripts are written in Python, an incredibly powerful and flexible language in its own right. Before writing that fancy UI-driven script to do something simple like change the system date/time, consider that the same task can be done in about three lines in Python or a shell script, saving you a great deal of time and headache. Whenever a task seems unnecessarily complex in Sikuli, ask yourself if it might be better solved programmatically, rather than visually.
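
As a rough illustration of the date/time example, here is one way it might look in plain Python. It assumes a Unix-like machine with GNU date, whose -s option sets the clock, and it needs administrator rights to actually change anything. The function names and the dry_run flag are made up for this sketch; on Windows the command would be different.

```python
import subprocess
from datetime import datetime


def build_set_time_command(when):
    """Return the argument list for GNU date's -s (set) option."""
    return ["date", "-s", when.strftime("%Y-%m-%d %H:%M:%S")]


def set_system_time(when, dry_run=True):
    """Set the clock, or only return the command when dry_run is True."""
    command = build_set_time_command(when)
    if not dry_run:
        # Assumption: GNU date is on the PATH and we have admin rights.
        # check_call raises CalledProcessError if the command fails.
        subprocess.check_call(command)
    return command


# Dry run: builds the command without touching the system clock.
command = set_system_time(datetime(2010, 6, 24, 9, 30, 0))
print(" ".join(command))
```

Compare that with opening the clock dialog, finding each field on screen, and clicking through it image by image.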

1. Not knowing where to find help

Since Sikuli is still in the early stages of development, finding online resources to help you can be very difficult, to say the least. Here are some of the pages that I find most useful when I’m writing Sikuli scripts:

  • Documentation / Guide – Perhaps the best Sikuli reference out there. Complete documentation of Sikuli, along with some example code.
  • Bug Reporting / Tracking – Sikuli is still in beta. See if that problem you’re having is really a bug in Sikuli, not your script.
  • Blog – Contains useful code examples and news
  • Q&A – If you’re having an issue, there might be someone else who’s been there before.

[Mario is a summer intern with Baydin, and he is spending part of the summer automating functional tests for Boomerang using Project Sikuli]


Thoughts from Enterprise 2.0 Boston 2010

June 22nd, 2010

Baydin participated in the Enterprise 2.0 LaunchPad this year, announcing availability of our email-based automatic knowledge discovery tool, Unsearch.

I felt like we were preaching to the choir, because the keynote speakers who talked before us, including Tony Zingale from Jive and Jamie Whitmoyer, who implemented Sony’s E2.0 infrastructure in SharePoint, showed data about how much room there is for email and search to improve inside the organization. We were thrilled to be able to announce a product that makes email collaborative, for everyone in the enterprise, just a few minutes later.

We had several large/medium companies approach us about setting up pilot programs, so I would definitely encourage other startups in this space to apply to be part of the LaunchPad at E2.0 San Francisco in November. If you weren’t able to be at E2.0, but are curious how Unsearch’s email integration can get 100% of your coworkers involved and collaborating using the systems you already have, like SharePoint, please email us about setting up a demo.

Below are a few thoughts about the conference.

Major Themes

Collaboration and social software are back in 2010. After a couple of rough years because of the economy, companies are again making major pushes to find and deploy software that gets people talking, collaborating, and connecting with their coworkers. Several major themes appeared throughout the keynotes, across the Expo, and in the panels.

The Rise of Feeds
Everyone has a “News Feed” view now. It’s clear that vendors have discovered value in bite-size pieces of information, delivered in chronological order, from people you already know (or groups you are already part of). This is a major part of practically all the new E2.0 products. The big value in these feeds is that they are public and somewhat customized, but unlike email, they are not directed specifically to you, so you can read just part of the stream and not worry about missing something.

There’s a lot of value in being able to filter information this way, even if the filter is “I don’t have time to look at this today.” The software demos looked like they would have a lot of irrelevant information in the feeds, though, and most of the demos showed a large number of unread items. I tend to feel stressed by unread counts, so I’m not sure the ability to come and go through the feed without worrying about missing things is as strong as in the consumer News Feeds.

I worry that these systems will continue to create more information overload, but I think there’s a lot of potential here, especially since most of the feed systems allow comments and discussions to form around news entries as they catch people’s interest.

Innovation in Search
Between the “Search is Not Enough” panel, the keynote speeches describing information overload, and the cool techniques presented for incorporating more serendipitous information browsing (like DarwinEco), I think the big technological shift over the next couple years will be in changing the way search works. Of course, I’m biased. But the panelists made it clear, over and over again, that searching through these new systems is going to require more intelligence on the part of the system.

The amount of information getting shared, through microblogging and social collaboration systems, continues to increase. Files are still important, but most of the new shared information is shown as searchable HTML rather than in proprietary file formats, which means that all the Web 2.0 tools for navigating this information can be brought to the enterprise. That’s exciting.

Millennials
Barely a panel went by without a mention of millennials (people who graduated from college after 2000 seemed like the general definition) – a bit strange, since I only saw a handful of fellow millennials in attendance. There were two major ways that millennials came up in discussion. The first is that we were described as being more comfortable sharing information digitally and more willing to become contributors using collaboration software. In my experience as a millennial at a big company, this was absolutely true. There were plenty of older (even 50+) people who were heavy users of our wikis at ADI, but virtually ALL of the millennials used them a lot.

The second context where millennials received frequent mention was in a sort of reverse-Luddite way. Essentially, said some panelists, millennials grew up with Facebook and are incapable of learning to operate in an environment without it. We are apparently too young to understand how to communicate via email or in person, and without Facebook for Business, we are unmanageable. Selling social software based on these premises struck me as asinine – selling software on the basis of old people being too dumb to adapt to a world where the telephone isn’t the dominant form of communication would be clearly offensive, so why is it OK to generalize about millennials in such an obviously wrong way?

New Infrastructure Ideas
My favorite product at the expo was an infrastructure product.  Cisco’s Pulse has some very bright minds working on it. Pulse is based on physical boxes that sit in front of Exchange servers, or Wiki systems, or video sharing systems, with all the network traffic itself running through them. Like, with a physical wire. The Pulse systems pull information out of the physical packets on the network and identify the appearance of a set of pre-specified keywords as they go over the wire, connecting people with the experts who regularly communicate about that keyword.

They’ve also got some incredible tech baked in for detecting phonetic appearances of those keywords in online video, making the video searchable. This reverse-keyword technique is different, because instead of building an index of every word mentioned (and dealing with associated transcription issues), they instead look specifically for a dictionary of words that are known to have meaning.

This approach is technically very interesting to me, and I am looking forward to seeing how the product develops, and potentially integrating Unsearch with it. I am very impressed with Cisco’s ability to innovate as a HUGE company – and a system this complex needs a lot of resources and a lot of different expertise, so it almost requires a big company to build it. Very impressive.

The LaunchPad

Baydin was incredibly excited to be named the winner of the Enterprise 2.0 LaunchPad. Companies from around the globe, including some pretty big names, competed to launch products at the LaunchPad. The four companies on stage came from Cambridge (us), Portugal, Switzerland, and Germany.

Our fellow finalists put together some amazing demos. They did a great job presenting, and all of them are working on stuff that could really have an impact. We were honored that the Enterprise 2.0 attendees selected us as the winner out of a group with this much potential.

The other finalists were:

Doodle – A very easy way to schedule a meeting. Suggest times that work for you, and let the other invitees vote for which times they’d prefer. They announced a new feature that lets a meeting organizer see free/busy status for fellow Doodle-users when first selecting possible times for the meeting.  Think MS Exchange-style scheduling, but cross-platform.

InnovationCast – Leonardo described InnovationCast as a tool for managing innovation as a company develops new products and services. Built on top of Telligent, they provide some really neat analytics on how innovation grows and spreads through an organization.

MindQuilt – Enterprise Q&A tool, think StackOverflow/Yahoo Answers for inside a company. I expected MindQuilt to be derivative, but it turns out they add some slick autotagging features and integrate well with email and IM clients, making it easy for users to get started. I expect them to have a lot of success in large companies.

The E2Conf folks haven’t uploaded the video of our presentations yet, as far as I can tell, but you can see some photos from the LaunchPad and the other Wednesday keynotes here: http://www.flickr.com/photos/adunne/sets/72157624252543430/.  The videos that we created to be selected as finalists are here: http://launchpad.e2conf.com/final-four-2/.

Thanks very much to all of you who voted for our video and made it possible for us to be part of the conference this year.
