Defining & measuring “non-business” email

Something we help our customers measure and address is the flood of non-business/commercial email that corporate email systems and their users receive every day. This is the wanted email that people receive daily: everything from Google alerts to joke-of-the-day emails.

This stuff may be wanted, but it is hardly business-related, and it is not worth the significant cost of archiving it or of wading through it when trying to find an email during a review, investigation, or legal discovery exercise.

So rather than this becoming a commercial for our intelligent classification product, I wanted to provide some actual data based on my own inbox.

I started this on 3/16 with both my inbox and incoming email from our former alliance manager’s address (I picked up his email when he left the company).

The results:

Number of days:  22 including weekends
Number of emails: 419
Total size:  12.5MB

Wow. That’s a lot of "informational" email filling up my inbox, clogging our network, and being archived. Wouldn’t it make more sense to get most of this information via RSS feed? At a minimum this stuff should be tagged, routed, and saved for what it is: non-business email.

Maybe this isn’t a huge number to you, but if you have thousands of employees and extrapolate these numbers over a year, you quickly need to begin measuring in terabytes.
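To make that extrapolation concrete, here is a rough back-of-the-envelope sketch in Python. The measured figures come from the results above; the headcount of 5,000 and the idea that every employee receives a similar volume are purely illustrative assumptions (my sample also combined two addresses, so a typical mailbox may well see less).

```python
def extrapolate(emails, total_mb, days, employees):
    """Scale a short measurement window to a full year across a company."""
    mb_per_mailbox_year = total_mb / days * 365
    return {
        "emails_per_day": emails / days,
        "mb_per_mailbox_year": mb_per_mailbox_year,
        # Decimal terabytes: 1 TB = 1,000,000 MB.
        "tb_per_company_year": mb_per_mailbox_year * employees / 1_000_000,
    }


# Measured above: 419 emails, 12.5 MB, 22 days.
# ASSUMPTION: 5,000 employees, each receiving a similar volume.
result = extrapolate(emails=419, total_mb=12.5, days=22, employees=5_000)

print(f"{result['emails_per_day']:.1f} emails per day")
print(f"{result['mb_per_mailbox_year']:.1f} MB per mailbox per year")
print(f"{result['tb_per_company_year']:.2f} TB per year company-wide")
```

Under those assumptions this works out to about 207 MB per mailbox per year and roughly 1 TB per year across 5,000 mailboxes, before any backup or archive copies are counted.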

This is far from the most sophisticated scenario our email classification product can handle, but it demonstrates the importance of differentiating between high-value and low-value correspondence, especially when the cost per email of legal review can be over $2.

So, how does this work?  Artificial intelligence, Bayesian analysis, proprietary algorithms?  Truth be told, a very practical methodology:

1.  Identify known sending addresses that distribute this kind of stuff (examples include googlealerts-noreply@google.com) – 100% confidence on these, and we have a master list of hundreds that can be deployed immediately or reviewed by the legal folks first if need be
2.  Identify known words/phrases in the sender’s address that are indicative of this kind of stuff (examples include alert, news, etc.) – still high confidence but needs to be validated with an activity profile
3.  Identify known words/phrases in the body text (an example is boilerplate unsubscribe language) – prone to false positives if the list of words/phrases is too broad, so we recommend starting with a short list
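For readers who want to experiment, the three steps above can be sketched in a few lines of Python. This is only an illustration of the tiered approach, not our product's implementation: the lists are tiny samples built from the examples in the steps, and the function name, labels, and rule names are all made up for this sketch.

```python
from email.utils import parseaddr

# ASSUMPTION: tiny sample lists for illustration; a real deployment would
# use a reviewed master list of hundreds of addresses.
KNOWN_SENDERS = {"googlealerts-noreply@google.com"}
SENDER_KEYWORDS = ("alert", "news")
BODY_PHRASES = ("unsubscribe",)  # kept deliberately short to limit false positives


def classify(sender, body):
    """Return (label, rule), checking the highest-confidence rule first."""
    # parseaddr strips any display name, e.g. "Google Alerts <...>".
    address = parseaddr(sender)[1].lower()

    # Step 1: exact match on a known sending address.
    if address in KNOWN_SENDERS:
        return ("non-business", "known-sender")

    # Step 2: indicative word anywhere in the sender's address.
    if any(word in address for word in SENDER_KEYWORDS):
        return ("likely non-business", "sender-keyword")

    # Step 3: boilerplate phrase in the body text.
    if any(phrase in body.lower() for phrase in BODY_PHRASES):
        return ("possibly non-business", "body-phrase")

    return ("unclassified", "no-match")


print(classify("Google Alerts <googlealerts-noreply@google.com>", ""))
print(classify("newsletter@example.com", "This week's headlines"))
print(classify("bob@example.com", "Click here to unsubscribe."))
print(classify("bob@example.com", "Lunch on Thursday?"))
```

Returning the rule name alongside the label makes it easy to see which step fired, which helps when the step 2 and step 3 matches need to be validated before anyone trusts them.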

Can you accomplish this without email classification software?  Yes, on an individual level if you want to experiment.  Just set up a mailbox rule in Outlook using the framework above and you can see this in action.  Email classification software deployed on a company-wide basis can do this automatically for all users.
