Investing A Million Bucks Into Performance and Stability
(This blog entry is based on a long comment I posted recently on Hacker News.)
Last week Xobni announced that we’ve added the Blackberry Fund to our series B financing.
We also released a new version of our software alongside that announcement. The release got less attention because there aren’t any user-visible features.
Truth be told, though, we invested about a million dollars into that release! And the investment will be critical to the success of Xobni.
So how is it possible to invest so much, on something so important, without adding new features?
If you’ve read the title of this post then you already know the answer. Other than consuming ungodly amounts of Reese's Peanut Butter Cups, we worked on two things -- performance and stability.
Outlook is a hostile environment for a product like Xobni. It has several different APIs, each with different quirky interfaces, side effects, and threading models.
There's no way around having complex APIs for Outlook. They expose programmatic access to the most complicated email application ever built. Just like Excel and MS Word, it's really hard to overestimate how feature-rich this program is.
As evidence, I’d suggest you check out this unusual peek into the depths of Outlook's complexity.
Scroll down to the section titled "Individual bugs that are fixed." Wow!
Let's run through my favorite example of how complex these APIs are.
Imagine you have an ID for a message, and you want to open a draft reply to that message so the user can type in their reply and press send. It should work just as if the user hit the Reply button, or pressed Ctrl-R.
Easy, right? That's what I thought, too...
The first API I tried seemed to work, but when the user pressed send the icon for the original message didn't change to the purple arrow to indicate that it had been replied to.
I found a second approach that didn't have that bug, except it turned out to save the draft of the message (if it has been open for more than five minutes, or the user explicitly presses Save) STRAIGHT INTO THE INBOX, instead of the Drafts folder. It looked funky -- no sender name, no sent time, etc. Ouch.
I found another API to use. It set the right icon -- good -- and seemed to save drafts into the Drafts folder. Double check. Unfortunately as soon as the user started typing it showed up in Times New Roman, 12pt, as the default font. Doh.
One of these three approaches also wouldn't pre-populate the user's email signature.
Fourth API was a charm!
…not to mention that the ID for a message is allowed to change under certain circumstances, such as when it gets moved to a different folder.
I could talk for days about the complexity of Outlook and the scenarios we've run into. Below are some example bugs I pulled from the Outlook February 2009 cumulative update document linked above. Each of these can hide weird race conditions, thread starvation, or just plain old corner cases that only show up when the moon is in the seventh house.
* Inefficient processing occurs in a loop during intermittent network connectivity.
* If the store providers are disconnected early, the Outlook.exe process becomes unresponsive for a very long time.
* When you right-click an item, the whole item is loaded into memory more frequently than necessary.
* Unnecessary disk reads are performed for every time that a custom form icon is rendered.
One of the fun side effects of doing this work is that you end up seeing all of the bugs you've come across in OTHER Outlook addins. I was using TechSmith's Snag-It the other day and smiled when I saw a draft it had opened save to the Inbox before I pressed send. They were using API #2. :-)
Not to mention that all of these other addins are accessing the same APIs, sometimes with "interesting interactions."
And on top of these challenges there are several users who have giant mailboxes. We've seen users with almost a million emails loaded into Outlook at the same time.
Usually these are the people with 12" laptops, 2 GB of RAM, and 5400 RPM hard drives.
When this happens, all bets are off. Outlook takes a long time to load their twelve PST files. If an addin is trying to load its stuff at the same time you get heavy disk contention and "sequential" read throughput plummets to 1 MB per second. And god help you if Outlook needs to "repair" any of these PSTs.
...the list goes on and on. We've been on a four-month odyssey.
The sad truth is that there are still issues with our performance and stability. We fixed all of the reproducible issues, but there are still computers out there with all kinds of weird registry permission issues, combinations of other addins that can conflict with Xobni, and so on. It’s not the wild west out there, but bandits do pop up from time to time.
It's been both tough and fun. I love my team. We’ve hit some real high notes together.
I think we do it for our users. Not to be trite, but I think it’s true. Everyone on the engineering team feels a pinch of pain when users have problems. [1]
But the other side of the coin is that we have a lot of fun with the code. There is some pretty awesome stuff going on under the covers. I'll give three quick examples:
1) Xobni's data store sits strictly underneath the sidebar code in the stack. It was originally built to support Xobni Analytics, our first product from 2006 that bombed. Fortunately we got to leverage the same data store when creating the Xobni sidebar. It's very cool. When someone is building software leveraging our backend, say the sidebar, or the Invite Your Friends feature, the code ends up looking like this:
foreach (var mail in new MailIterator()) {
    Console.WriteLine(mail.Subject);
}
This code will print 10k subjects per second, from disk! And it's from a key-value store, so it's easy to add new data fields and types.
Our data store is darn useful. Just two days ago I wrote some code against it to get some important data for a new project we’re working on. It looks like pseudocode!
2) Not only that, but the data store is built to be client agnostic above "layer 1" where we interface with the mail client. So when we wanted to integrate Yahoo Mail all we had to do was build the adapter piece that knew how to speak Yahoo's language, and suddenly the mail floats all the way up the stack and appears in the sidebar right next to Outlook emails. :-)
3) The areas where we display information from Facebook, LinkedIn, etc. are all little embedded instances of Internet Explorer. The code for those extensions is all just HTML and JavaScript. When the user changes the current email we invoke a specific JS function called updatePerson(), and there's a callback object the JS can use to make HTTP calls and write lines to the log file. This architecture, which was invented by someone smarter than me, allows us to pump these babies out quickly and without much QA risk to the other parts of the program.
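To give a feel for the extension model in example 3, here is a minimal sketch of what one of those scripts might look like. Only the name updatePerson() comes from our code. The host object, its httpGet and log methods, the render callback, the profile fields, and the URL are all made up for illustration, so treat this as the shape of the idea rather than the real interface.

```javascript
// Hypothetical extension script. Only updatePerson() is a real name;
// the host object (httpGet, log), the render callback, and the profile
// fields are assumptions made for this sketch.

// Escape text before putting it into markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the HTML for one person. Pure function, so it is easy to test.
function renderProfile(profile) {
  if (!profile) {
    return "<p>No profile found.</p>";
  }
  return "<h3>" + escapeHtml(profile.name) + "</h3>" +
         "<p>" + escapeHtml(profile.headline) + "</p>";
}

// Called by the host whenever the selected email changes.
function updatePerson(email, host, render) {
  host.log("updatePerson: " + email);
  var url = "https://example.invalid/profile?email=" + encodeURIComponent(email);
  host.httpGet(url, function (err, body) {
    if (err) {
      host.log("lookup failed: " + err);
      render(renderProfile(null));
      return;
    }
    render(renderProfile(JSON.parse(body)));
  });
}

// Usage with a fake host, standing in for the callback object.
var logLines = [];
var fakeHost = {
  log: function (line) { logLines.push(line); },
  httpGet: function (url, callback) {
    callback(null, JSON.stringify({ name: "Ada <Lovelace>", headline: "Analyst" }));
  }
};
var rendered = "";
updatePerson("ada@example.com", fakeHost, function (html) { rendered = html; });
```

Because the script only talks to the host through that one object, a broken extension can fail on its own without touching the rest of the sidebar, which is where the low QA risk comes from.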
It just doesn't get any cooler than this!
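For the curious, the adapter idea from example 2 can be sketched in a few lines. This is written in JavaScript purely for brevity and is not our actual code: the adapter functions, the common record shape, and the Yahoo field names (mid, sender) are invented for illustration. EntryID, Subject, and SenderEmailAddress are borrowed from Outlook's MailItem just to make the Outlook side look familiar.

```javascript
// Hypothetical "layer 1" adapters. Each one translates a mail client's
// native message shape into one common record. All names are illustrative.
function outlookAdapter(rawItems) {
  return {
    source: "outlook",
    messages: function () {
      return rawItems.map(function (item) {
        return { id: item.EntryID, subject: item.Subject, from: item.SenderEmailAddress };
      });
    }
  };
}

function yahooAdapter(rawMessages) {
  return {
    source: "yahoo",
    messages: function () {
      return rawMessages.map(function (msg) {
        return { id: msg.mid, subject: msg.subject, from: msg.sender };
      });
    }
  };
}

// Everything above layer 1 only ever sees the common record,
// so adding a new mail client means writing one more adapter.
function collectMail(adapters) {
  var all = [];
  adapters.forEach(function (adapter) {
    adapter.messages().forEach(function (mail) {
      all.push({ source: adapter.source, id: mail.id, subject: mail.subject, from: mail.from });
    });
  });
  return all;
}

var combined = collectMail([
  outlookAdapter([{ EntryID: "A1", Subject: "Lunch?", SenderEmailAddress: "bob@example.com" }]),
  yahooAdapter([{ mid: "y-7", subject: "Re: Lunch?", sender: "carol@example.com" }])
]);
```

Here combined holds both messages in the same shape, which is roughly why Yahoo mail could show up in the sidebar next to Outlook mail without changes higher in the stack.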
Anyway, back to the main subject of the blog post: performance and stability and the road ahead.
I think we’re all excited about what we've done, but mostly we're already starting to look forward to the next generation of features for our customers. It's going to be an exciting rest-of-2009 ahead!
Stay tuned!
(Commercial: if you're a developer and are interested in being part of the team, send your resume to [email protected]!)