What politicians teach us about email

August 16, 2007 / Robert Pease

Politicians as a group are generally not on the cutting edge of either technology or clear communication, and that holds true for their attitudes toward, and struggles with, what to do with email at the federal, state, and municipal levels. This podcast from NPR (via Boing Boing) discusses the topic as only NPR can, but it highlights many of the same challenges the private sector faces in deciding what the heck to do with the mountain of email created every day.

Many politicians, like NJ Governor Jon Corzine, have chosen not to use email at all so that they leave no trail and run no risk that something typed in a hurry could come back to bite them later (a true politician’s stance). As the NPR piece points out, deciding to systematically delete emails after a defined retention period can create a "perception problem": people assume the records are being destroyed to hide something. The story goes on to compare and contrast the governors’ offices in California and Virginia. California deletes emails after two weeks, while Virginia saves them and burns them to disk.

One thing is for sure: there is no agreement on the best approach or on whether these emails should be treated as historical documents. Contrary to the PR flack from the Virginia governor’s office, who compared them to the writings of Thomas Jefferson, I would say Jefferson’s writings had more methodical thinking behind them, given both the effort involved in and the importance of written correspondence in the 1700s. The ease of creating and sending email has dumbed the content down considerably, so I’m not sure this is even a worthwhile comparison.

Learn about email classification

May 18, 2007 / Robert Pease

I was recently interviewed by Dan Keldsen, formerly of the Delphi Group/Perot Systems and the guy behind BizTechTalk, about how you can apply intelligence to archiving with email classification. Here’s the link to the audio, and here is the link to his blog and a short write-up on our chat.

Digital landfills

April 27, 2007 / Robert Pease

I was on a call this morning, and one of the folks on the call referred to the mass of electronic records that companies are accumulating, and are petrified to delete, as a "digital landfill."

Here’s the visual that came to my mind:

A big, massive heap of debris (you know, the ones we have all seen as we descend in an airplane into the city of the day) compacted together without rhyme or reason. Mixed in with all that garbage are some valuable items, but who wants to look for them?

Got me thinking – is there a way to apply recycling concepts to digital content?

At the very least, separating this debris before storing it off makes sense and gives you a better shot at identifying the more valuable content you and your company create on any given day.

Learning what your products can do from your customers

April 26, 2007 / Robert Pease

Funny thing about customers – they will show you how your product(s) can be used in ways you never imagined.

This is a great stage for a company to be in: our technology is finding new problems to solve, and our customers are getting huge value from our ability to solve them without having to shop for something new.

We had a recent experience where one of our customers came to us with the need to control duplicate emails before archival and wanted to use our classification product to do so.

De-duplication has long been the turf of pure storage and archival companies, and it is not a problem we had ever considered ourselves a solution for. We are no more focused on single-instance storage and optimization than we want to be an archiving company, but the opportunity came up to serve as one piece of the solution to this problem, and we were able to address it.

We were asked whether we could help eliminate duplicate emails before they entered the archive. The customer runs multiple MS Exchange journals, and the same message could end up in several of them depending on its recipient list; by their estimate, as much as 30% of the email in the long-term archive was duplicates.

They are a pretty large company, and that volume of duplicates translates to terabytes of extra storage, not to mention the retrieval headache of seeing the same message over and over again. They looked internally at what they had that could address this, then contacted us and another vendor for proposals.

Sorry for the MessageGate commercial here, but we were able to get this problem under control with a pretty simple classification rule designed to detect duplicate message IDs, and we did so with limited additional hardware cost. The other solution *could* have done this but was disqualified because of the amount of iron it needed for processing.
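A minimal Python sketch of the duplicate-Message-ID idea, for illustration only and not MessageGate’s actual rule. It assumes each journal item has already been unwrapped to the original RFC 5322 message (journal reports typically wrap the original), and the content-hash fallback for messages with no Message-ID is an illustrative assumption, not something described in the post.

```python
import hashlib
from email import message_from_bytes
from typing import Iterable, Iterator, Set


def dedup_key(raw: bytes) -> str:
    """Return the key used to spot duplicates: the Message-ID header if present.

    The SHA-256 fallback is an assumption for messages without a Message-ID;
    it only catches byte-identical copies.
    """
    message_id = message_from_bytes(raw).get("Message-ID")
    if message_id:
        return str(message_id).strip()
    return "sha256:" + hashlib.sha256(raw).hexdigest()


def unique_for_archive(journal_items: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each message once, skipping copies already seen via another journal."""
    seen: Set[str] = set()  # in memory here; an archive at this scale would need a persistent store
    for raw in journal_items:
        key = dedup_key(raw)
        if key in seen:
            continue
        seen.add(key)
        yield raw


copy_from_journal_a = b"Message-ID: <abc@example.com>\r\nSubject: Q2 numbers\r\n\r\nbody\r\n"
copy_from_journal_b = b"Message-ID: <abc@example.com>\r\nSubject: Q2 numbers\r\n\r\nbody\r\n"
other_message = b"Message-ID: <xyz@example.com>\r\nSubject: Lunch\r\n\r\nbody\r\n"
archived = list(unique_for_archive([copy_from_journal_a, copy_from_journal_b, other_message]))
print(len(archived))  # 2
```

Keying on a single header is cheap to compute, which is the operational point about not chasing performance with servers; the trade-off is that two genuinely different messages carrying the same ID (rare, usually a misbehaving client) would be collapsed into one.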

The reason I point this out is that scalability is touted by every enterprise software company out there. No one is going to tell you they can’t scale. Where the rubber meets the road is how, operationally, you scale. Chasing performance with server count and claiming infinite software scalability without including the associated hardware and processing costs is not a way to endear yourself to an enterprise customer or win business.

As you acquire customers and grow a business, be prepared to be shown how your product(s) can be used to solve their pressing needs in many ways you never even considered and to be able to meet these new requirements with an attractive total cost of operation.

Separating high-value from low-value email

April 25, 2007 / Robert Pease

I recently did a post on this topic based on a pilot I am running on my own Inbox with our classification software. Here’s an update:

Number of days: 41 including weekends
Number of emails: 816
Total size: 21.9MB

Man, I get a lot of crap. Good thing I have this set up to route to a specific sub-folder.

I need to do some spring cleaning and remove myself from these various lists, or at least identify the ones that are of value (very few seem to be). Also keep in mind that I am eating for two after picking up a departed co-worker’s mail.

My results are not atypical of what we have seen from our customers before they head down the road of intelligent email classification. This enormous volume of easily identified non-biz stuff makes its way not only to the Inbox but also to the archive.

How big a deal is this? Go ask the person responsible for email retrieval/e-discovery how much they enjoy sorting through mountains of useless stuff like Joke of the Day, FedEx delivery alerts, or even Out of Office replies as they hunt for requested emails.
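For the Out of Office and alert flavors of this clutter, one common heuristic (an assumption here, not a description of the classification product discussed in these posts) is to check headers that well-behaved senders set: `Auto-Submitted` from RFC 3834, the widely used but non-standard `Precedence: bulk`, and the `List-Unsubscribe` header from RFC 2369. A minimal Python sketch:

```python
from email import message_from_bytes
from typing import Optional


def auto_generated_reason(raw: bytes) -> Optional[str]:
    """Return why a message looks machine-generated, or None if no header says so.

    Only headers are checked; a missing header proves nothing, since many
    senders of alerts and newsletters never set them.
    """
    msg = message_from_bytes(raw)
    # RFC 3834: "auto-replied" (e.g. out-of-office) or "auto-generated"; "no" means a person sent it.
    auto_submitted = str(msg.get("Auto-Submitted", "")).split(";")[0].strip().lower()
    if auto_submitted and auto_submitted != "no":
        return "Auto-Submitted: " + auto_submitted
    # Non-standard, but commonly set by list software and bulk senders.
    precedence = str(msg.get("Precedence", "")).strip().lower()
    if precedence in {"bulk", "list", "junk"}:
        return "Precedence: " + precedence
    # RFC 2369 list header, typical of newsletters and mailing lists.
    if msg.get("List-Unsubscribe"):
        return "List-Unsubscribe present"
    return None


out_of_office = b"Auto-Submitted: auto-replied\r\nSubject: Out of Office\r\n\r\nI am away.\r\n"
real_mail = b"Subject: Contract redlines\r\n\r\nSee attached.\r\n"
print(auto_generated_reason(out_of_office))  # Auto-Submitted: auto-replied
print(auto_generated_reason(real_mail))      # None
```

Because these headers are optional, a check like this is best treated as one cheap signal that catches the obvious cases, with sender- and content-based rules handling the rest.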

Is an IM chat a business record?

April 11, 2007 / Robert Pease

As messaging expands beyond email to IM, texting, etc., companies are struggling with what to save and for how long. The explosion of unstructured ways to interact, both in and out of the corporate network, presents quite a dilemma for those in the legal, records, and/or compliance groups.

One question has gotten a pretty consistent answer as we have talked to customers and prospects: they (and, more importantly, their legal departments) feel they don’t have to archive IM conversations because IM is "like a phone call." They don’t record calls, so they don’t archive IM chat sessions.

The reinforcing point here in any retention program is consistency. Do what you have always done and if you are going to start doing something new make sure you are well documented/justified and that the timing is not suspect.

Clear as mud, right? The discussion over the burden that US companies feel to be compliant and prove their innocence is a topic for another day. Also, please don’t take this as gospel: I am not a lawyer, and you should take inventory of your own situation.

Of course, if you are one of the unlucky firms regulated by NASD/SEC Rule 17a-4, you really have no choice here, so plan on continuing to roll out storage infrastructure and to chase down how folks interact electronically. You’re welcome.

Defining & measuring “non-business” email

April 6, 2007 / Robert Pease

Something we help our customers measure and address is the flood of non-business/commercial email that corporate email systems and their users receive every day. This is the daily load of wanted email people receive, everything from Google alerts to joke-of-the-day emails.

Maybe this stuff is wanted, but it is hardly business-related and not worth the significant cost of both archiving it and wading through it when trying to find an email during a review, investigation, or legal discovery exercise.

So rather than this becoming a commercial for our intelligent classification product, I wanted to provide some actual data based on my own inbox.

I started this on 3/16 with both my inbox and incoming email from our former alliance manager’s address (I picked up his email when he left the company).

The results:

Number of days: 22 including weekends
Number of emails: 419
Total size: 12.5MB

Wow. That’s a lot of "informational" email filling up my box, clogging our network, and being archived off. Wouldn’t it make more sense to get (most of) this information via RSS feed? At a minimum this stuff should be tagged, routed, and saved for what it is: non-business email.

Maybe this isn’t a huge number to you, but if you have thousands of employees and extrapolate these numbers over a year you quickly need to begin measuring in terabytes.
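A rough back-of-the-envelope version of that extrapolation in Python, using the 22-day sample above (12.5MB of non-business mail). The 5,000-employee headcount is hypothetical, and the sketch assumes every employee receives about the same volume and that 1 TB = 1,000,000 MB.

```python
def annual_volume_tb(sample_mb: float, sample_days: int, employees: int) -> float:
    """Scale one mailbox's sample to a company-wide annual figure in terabytes.

    Assumes every employee receives the same volume and 1 TB = 1,000,000 MB.
    """
    per_user_per_year_mb = sample_mb / sample_days * 365
    return per_user_per_year_mb * employees / 1_000_000


print(round(annual_volume_tb(12.5, 22, 5_000), 2))  # 1.04
```

That is roughly 207MB per person per year, or about a terabyte a year at 5,000 employees, from this one category of mail alone.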

This is far from the most sophisticated scenario our email classification product can handle, but it demonstrates the importance of differentiating between high-value and low-value correspondence, especially when legal review can cost over $2 per email.
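To put the $2 figure in context, here is a similar Python sketch: if one custodian’s mailbox looks like the 22-day sample above (419 non-business emails) and a full year of it gets swept into a review, the extra cost adds up quickly. The one-year collection window is a hypothetical; the $2 per email is the figure quoted above.

```python
def review_cost(sample_count: int, sample_days: int, days_collected: int, cost_per_email: float) -> float:
    """Estimate what it costs to review non-business mail swept into a collection.

    Scales the sample's daily rate to the collection window, then prices each message.
    """
    per_day = sample_count / sample_days
    return per_day * days_collected * cost_per_email


print(round(review_cost(419, 22, 365, 2.00)))  # 13903
```

That works out to about 6,950 messages and roughly $13,900 per custodian-year spent reviewing alerts and jokes.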

So, how does this work? Artificial intelligence, Bayesian analysis, proprietary algorithms? Truth be told, a very practical methodology:

1. Identify known sending addresses that distribute this kind of stuff (examples include googlealerts-noreply@google.com). We have 100% confidence in these and keep a master list of hundreds that can be deployed immediately or reviewed by the legal folks if need be.
2. Identify known words/phrases in the sender’s address that are indicative of this kind of stuff (examples include alert, news, etc.). Confidence is still high, but these need to be validated with an activity profile.
3. Identify known words/phrases in the body text (an example is boilerplate unsubscribe language). This is prone to false positives if the list of words/phrases is too broad, so we recommend less is more for starters.
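The three steps above can be sketched in Python as an ordered series of checks, from most to least certain. The lists are tiny hypothetical stand-ins (only the Google Alerts address and the words alert, news, and unsubscribe come from the text), plain substring matching is an assumption, and the activity-profile validation in step 2 is only flagged, not implemented. This illustrates the methodology; it is not the product’s code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical starter lists; a real deployment would review and tune these.
KNOWN_SENDERS = {"googlealerts-noreply@google.com"}  # step 1: exact addresses
SENDER_WORDS = ("alert", "news")                      # step 2: words in the sender's address
BODY_PHRASES = ("unsubscribe",)                       # step 3: keep short to limit false positives


@dataclass
class Verdict:
    non_business: bool
    confidence: str  # "certain", "high", "lower", or "none"
    reason: Optional[str]
    needs_validation: bool = False  # step 2 hits should be checked against an activity profile


def classify(sender: str, body: str) -> Verdict:
    address = sender.strip().lower()
    # Step 1: known distribution addresses are treated as 100% confidence.
    if address in KNOWN_SENDERS:
        return Verdict(True, "certain", "known sender " + address)
    # Step 2: tell-tale words anywhere in the address (plain substring match).
    for word in SENDER_WORDS:
        if word in address:
            return Verdict(True, "high", "sender contains '" + word + "'", needs_validation=True)
    # Step 3: boilerplate phrases in the body; the most false-positive-prone check.
    text = body.lower()
    for phrase in BODY_PHRASES:
        if phrase in text:
            return Verdict(True, "lower", "body contains '" + phrase + "'")
    return Verdict(False, "none", None)


print(classify("googlealerts-noreply@google.com", "").confidence)            # certain
print(classify("daily-news@example.com", "").confidence)                     # high
print(classify("pal@example.com", "To unsubscribe, click here").confidence)  # lower
print(classify("pal@example.com", "Draft contract attached").non_business)   # False
```

Running the checks in this order means each message is labeled by the most reliable rule that fires, and keeping the step 3 phrase list short stops the noisiest check from swamping the others.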

Can you accomplish this without email classification software? Yes, on an individual level if you want to experiment. Just set up a mailbox rule in Outlook using the framework above and you can see this in action. Email classification software deployed on a company-wide basis can do this automatically for all users.

Venture Vice

Stories from the front lines of early stage investing and the people crazy enough to play there
