
Business Case: Finding the needle in the e-haystack

Written by Vawn Himmelsbach

With companies creating electronic documents at an alarming rate, corporate counsel may be faced with the prospect of sifting through millions of documents in search of a smoking gun. New technology tools like text mining can help sift this data, including e-mail and text messages, to find relevant documents faster.

In the past, corporate counsel simply searched through filing cabinets or boxes to find relevant documents, such as contracts or letters, when searching for that proverbial needle in a haystack. But in today’s electronic world, they often have to search through employee e-mail, servers, and back-up tapes as well.

“They can’t just browse through 100 people’s e-mail boxes looking for an e-mail that might be relevant,” says Martin Felsky, CEO of Commonwealth Legal.

In one case, he says, a client had collected 40 terabytes of data, which is twice the size of the U.S. Library of Congress. “Corporate counsel definitely need tools to find documents that might be relevant to a case.”

Most companies don’t have those tools, although they may have basic building blocks in place. If a company is using Microsoft Outlook as its e-mail program, for example, corporate counsel can use Outlook’s built-in search capabilities. But it only searches the body of an e-mail, not attachments.

“It’s not designed for litigation purposes where I need to make sure that I search everything,” says Felsky.

Another problem is something he refers to as the “moral hazard.” If a manager is accused of sexual harassment in the workplace, for example, it’s possible he or she will delete or conveniently forget to forward any relevant e-mail to corporate counsel. “You have to take control of the search for these documents,” says Felsky. “You can’t let employees of the company determine what’s relevant and what’s not relevant.”

In 2006, the U.S. introduced new rules of civil procedure to deal with e-mail, and although Canada doesn’t have similar legislation at this point, the implications are global, says Ross Armstrong, senior research analyst with Info-Tech Research Group. The rules are intended to reduce the length of civil cases, which are often drawn out for months when e-mail is subpoenaed as evidence.

If the defendant is unable to produce these subpoenaed documents in a certain time frame, judges will often rule against them. So, while e-mail evidence is all-important, says Armstrong, being able to find those e-mails quickly is just as critical.

Data mining includes numeric mining, which involves aggregating and extrapolating statistics and trends. The other, slightly newer, form is text mining, which uses words or concepts to find documents that might be useful, says Mike Savage, a partner in fraud investigation and dispute resolution at Ernst & Young.

Text mining is much more efficient than manual methods, he says, because you’re using computer horsepower rather than intellectual horsepower. “You’re using the ability of the computer to work all night hunting through search strings and come back with this word close to that word,” he says.
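The “this word close to that word” search Savage describes is often called a proximity search. What follows is a minimal sketch of the idea in Python, not the method used by any particular product: the function name `words_near`, the `max_gap` parameter, and the sample e-mails are all invented for illustration.

```python
import re

def words_near(text, first, second, max_gap=5):
    """Return True if `first` and `second` occur within `max_gap` words of each other."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    # A hit is any pair of positions no more than max_gap words apart.
    return any(abs(i - j) <= max_gap
               for i in first_positions
               for j in second_positions)

# Hypothetical e-mail bodies, keyed by an invented message ID.
emails = {
    "msg-001": "Please shred the contract before the audit starts on Monday.",
    "msg-002": ("The contract is attached. Lunch is at noon, and afterwards "
                "we can walk over to see the new office shred bins."),
    "msg-003": "See you at the quarterly meeting.",
}

hits = sorted(name for name, body in emails.items()
              if words_near(body, "shred", "contract", max_gap=3))
print(hits)
```

Both of the first two messages contain “shred” and “contract,” but only in the first do the words sit close together, which is the kind of distinction a plain keyword search would miss.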

Human beings are more fallible and might gloss over the keyword they’re looking for. Computers, on the other hand, are 100-per-cent accurate at matching the exact search strings they are given. And today’s software is good at finding deleted files; Microsoft Windows may delete the cross-reference to a file, but it hasn’t wiped the file itself off the hard drive.

And this can help corporate counsel find that smoking gun e-mail. Or, in some cases, they’re able to clear a client using e-mail correspondence. “Sometimes the defence lies in understanding the context more than just the little sound bite,” says Savage.

KPMG uses data clustering software tools to extract words or concepts from electronic files, including Word documents, Excel spreadsheets, PowerPoint presentations, and e-mail.

“If documents are clustered into large groups of files that are talking about the same things, then you can make decisions on whether those documents are potentially relevant or not relevant much more quickly,” says Brian Reny, national director of electronic discovery with KPMG’s Forensic practice. “We literally get involved in cases that have millions of documents.”
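To give a rough sense of what clustering means here, the sketch below groups documents whose vocabularies overlap. It is an illustration only and makes no claim about how KPMG’s tools work: the stop-word list, the 0.25 similarity threshold, the greedy one-pass grouping, and the file names are all assumptions, and commercial software is likely to use far richer techniques.

```python
import re

# Common words that say little about a document's subject (an assumed, tiny list).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "for", "is", "in", "on", "at"}

def keywords(text):
    """Return the set of non-trivial lowercase words in a text."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def jaccard(a, b):
    """Share of words two sets have in common (0.0 = none, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(docs, threshold=0.25):
    """Greedily place each document in the most similar existing cluster,
    or start a new cluster if none reaches the threshold."""
    clusters = []  # each entry: {"terms": set of words, "members": [names]}
    for name, text in docs.items():
        terms = keywords(text)
        best, best_score = None, threshold
        for c in clusters:
            score = jaccard(terms, c["terms"])
            if score >= best_score:
                best, best_score = c, score
        if best is None:
            clusters.append({"terms": set(terms), "members": [name]})
        else:
            best["members"].append(name)
            best["terms"] |= terms
    return [c["members"] for c in clusters]

# Hypothetical files of mixed types.
docs = {
    "a.doc": "Pricing schedule for the supply contract renewal",
    "b.xls": "Supply contract pricing and renewal terms",
    "c.msg": "Team lunch menu for Friday",
    "d.ppt": "Friday team lunch venue and menu",
}

groups = cluster(docs)
print(groups)
```

A reviewer could then look at one or two files from each group and decide whether the whole group is worth a closer read, which is the time saving Reny describes.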

Using manual methods, a lawyer can review anywhere from 40 to 75 documents per hour, he says. By using a tool, such as data clustering, that lawyer can increase the rate of review to between 800 and 1,000 documents per hour.
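The difference is easier to appreciate as hours of work. The short calculation below applies the review rates quoted above to a collection of one million documents; that collection size is a hypothetical round number chosen for illustration, not a figure from any case.

```python
def review_hours(doc_count, docs_per_hour):
    """Hours one reviewer needs to get through doc_count documents."""
    return doc_count / docs_per_hour

DOCS = 1_000_000  # hypothetical collection size

# Rates from the article: 40 to 75 per hour manually, 800 to 1,000 with a tool.
manual = (review_hours(DOCS, 75), review_hours(DOCS, 40))        # fastest, slowest
assisted = (review_hours(DOCS, 1000), review_hours(DOCS, 800))   # fastest, slowest

print(f"Manual:   {manual[0]:,.0f} to {manual[1]:,.0f} hours")
print(f"Assisted: {assisted[0]:,.0f} to {assisted[1]:,.0f} hours")
```

On these assumptions, manual review takes roughly 13,000 to 25,000 reviewer-hours, against 1,000 to 1,250 with tool assistance.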

But these tools can be costly. If a case involves a single incident or a small number of people, relying on hard documents may suffice. But if the case takes place over a long period of time, where there’s a vast number of documents or a large group of employees, it makes sense to search through electronic records.

For a company faced with litigation on a daily basis, this may be something they want to invest in (most, however, haven’t, including large pharmaceutical companies that regularly deal with lawsuits). Others may choose to work with a third-party provider.

Some companies are rarely, if ever, sued and don’t want to make a large investment in data-mining tools, so it’s a question of what the courts will find reasonable, says Info-Tech’s Armstrong.

“The judges and prosecutors are recognizing not only the importance of electronic messaging for civil procedures, but also that there are going to be some limitations, particularly for smaller organizations that have fewer resources to throw at this problem.”

Some may already have a document management system in place, which provides a foundation for data mining. Corporate counsel should ask their IT department if the company is using some type of document management system, such as Hummingbird or iManage, which allows users to save documents that can later be retrieved in an efficient manner, says Commonwealth Legal’s Felsky.

The problem, however, is that some employees might continue to save documents on their hard drive. “Some companies will have document-management systems,” says Felsky. “But it’s important for corporate counsel to understand those were put in for business purposes, not for litigation purposes.”

If the company doesn’t have a system like this in place, corporate counsel can ask the IT department if they’re capable of installing a program and running an indexing process that will allow them to do this kind of search.

There are a number of these tools on the market, says Felsky. For small to mid-sized organizations, a tool called dtSearch works on desktops or company servers to index all documents and e-mail. Google offers an enterprise search appliance that searches data on web servers, file servers, content management systems, relational databases, and business applications through its search box.
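For readers wondering what an “indexing process” involves, the sketch below builds a simple inverted index: a lookup table from each word to the documents that contain it, so that later searches do not have to reread every file. It is a toy version under stated assumptions and does not represent how dtSearch or Google’s appliance works internally; the function names and sample files are invented.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map every word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index

def search(index, *terms):
    """Return the sorted names of documents that contain every search term."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*sets)) if sets else []

# Hypothetical documents and e-mail.
docs = {
    "memo.txt": "Merger talks resume next week",
    "mail-17.eml": "Do not forward the merger memo to anyone",
    "notes.doc": "Forward the agenda before the meeting",
}

index = build_index(docs)
print(search(index, "merger"))
print(search(index, "merger", "forward"))
```

The index is built once, which is the slow step; each search afterwards is a quick lookup. A real tool would also need to open attachments and proprietary file formats, which this sketch ignores.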

Once corporate counsel find relevant documents, they must be able to produce the originals in court. “We freeze the evidence in the form of an image so the documents that are relevant are produced as an image file rather than a Word document or PowerPoint or Excel spreadsheet,” says Felsky.

The last thing you want to do is lose that data once you’ve identified it, says Peter Vakof, a partner with PricewaterhouseCoopers’ forensic technology solutions. If you don’t isolate a backup tape, for example, and that tape is due for rotation by the IT department, the data could be overwritten.

Still, it’s important not to rely solely on technology, says Vakof. Test the tool and make sure it’s searching properly, and ensure that you use a range of possible search terms. If you rely blindly on the software and it doesn’t find anything, yet the other side tested it and found a flaw or used a different software program that caught something, you’re in trouble. “Use the technology to facilitate the review and find the issues,” he says.

“But, nonetheless, the key responsibility still lies with the reviewer.”
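Vakof’s two pieces of advice, testing the tool and using a range of search terms, can be illustrated with a small sketch. The idea of planting a known “seed” document to confirm that a search actually finds it is an assumption added here for illustration, not a practice described by anyone quoted in this article; the function name, search terms, and messages are likewise invented.

```python
def term_hits(docs, terms):
    """Map each search term to the sorted names of documents containing it
    (case-insensitive substring match)."""
    return {
        term: sorted(name for name, body in docs.items()
                     if term.lower() in body.lower())
        for term in terms
    }

# Hypothetical messages, plus a planted control document that contains every term.
docs = {
    "mail-01": "Attached is the signed agreement.",
    "mail-02": "The contract was terminated in March.",
    "mail-03": "Our deal with the supplier is off.",
    "seed": "CONTROL DOCUMENT: contract agreement deal",
}

terms = ["contract", "agreement", "deal"]
report = term_hits(docs, terms)

# If any term fails to find the seed, the search itself is suspect.
missed_seed = [t for t in terms if "seed" not in report[t]]

for term, names in report.items():
    print(term, names)
print("Terms that missed the seed:", missed_seed)
```

Each of the three real messages is found by a different term, so a search on “contract” alone would have missed two of them. The output still has to be read and judged by a person.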