Getting rid of déjà vu

The bad news about electronically stored information is that there’s so much of it.  The good news is that it can easily be deduped.  And the really good news is that full-scale deduping can get rid of a lot more than you might have guessed.

In the August 2009 issue of Law Technology News, Anne Kershaw and Joe Howie report on a study they conducted in May by surveying 18 e-discovery vendors. Confining the scope strictly to pure deduping (as opposed to near-duplicate detection, e-mail threading, etc.), they found that deduping within a single custodian reduced the number of documents by an average of 21.4 percent; when performed across multiple custodians, the average reduction nearly doubled to 38.1 percent.

Yet the vendors indicated that while they all offered cross-custodian deduping, it was applied in only 52 percent of projects; in the remainder, their clients opted for either single-custodian deduping (41 percent) or none at all (7 percent).
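To make the difference between the two scopes concrete, here is a minimal sketch. It assumes exact duplicates are identified by a SHA-256 hash of the document's content; the article does not say how any vendor actually does it. The custodian names, sample documents, and function names are all hypothetical, and the reduction rates from this toy data say nothing about the survey's averages.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Fingerprint a document by its exact bytes (assumed matching rule)."""
    return hashlib.sha256(data).hexdigest()


def dedupe_within_custodian(collections):
    """Keep one copy of each document per custodian."""
    kept = []
    for custodian, docs in collections.items():
        seen = set()  # reset for every custodian
        for doc in docs:
            h = content_hash(doc)
            if h not in seen:
                seen.add(h)
                kept.append((custodian, doc))
    return kept


def dedupe_across_custodians(collections):
    """Keep one copy of each document across the whole collection."""
    seen = set()  # shared by all custodians
    kept = []
    for custodian, docs in collections.items():
        for doc in docs:
            h = content_hash(doc)
            if h not in seen:
                seen.add(h)
                kept.append((custodian, doc))
    return kept


# Hypothetical sample data: 8 documents held by 3 custodians.
collections = {
    "Al": [b"memo", b"memo", b"budget"],
    "Barbara": [b"memo", b"budget", b"contract"],
    "Charlie": [b"memo", b"contract"],
}

total = sum(len(docs) for docs in collections.values())
within = dedupe_within_custodian(collections)
across = dedupe_across_custodians(collections)

print(f"original: {total}, within custodian: {len(within)}, across custodians: {len(across)}")
```

The only difference between the two functions is where the `seen` set lives: resetting it per custodian leaves the same memo in Al's, Barbara's, and Charlie's review sets, while sharing it leaves one copy for the whole matter.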

Until a few years ago, for many e-discovery vendors, deduping across custodians imposed a much greater machine burden than deduping within one custodian’s collection. Some vendors charged nothing for deduping within a custodian but charged extra for deduping across custodians, to compensate for the additional machine time and effort.

Also, in the then-common linear review paradigm (each custodian’s data kept together and reviewed as a unit), deduping within custodian only was supported by the prima facie plausible argument that “it’s a more accurate picture” of the data to know who had what, even if it did mean that the same document was going to show up multiple times in different custodians’ collections. The mere fact of it being in Al’s collection as well as Barbara’s and Charlie’s was somehow considered sufficient differentiation to justify keeping all three.

[Image: Without de-duping across all custodians, you need a huge number of reviewers]

Deduping technology is now much better, so cross-custodian deduping no longer grinds the system to a near halt. What’s more, as the article points out, if you need a report showing which other custodians also had a particular document, just about any vendor or hosting platform can generate one.
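A report of this kind might be built along the following lines. This is only a sketch under the same assumption as before, that duplicates are matched by a SHA-256 content hash; it does not reflect any particular vendor's or platform's implementation, and the function name and sample data are hypothetical.

```python
import hashlib
from collections import defaultdict


def build_custodian_report(collections):
    """Map each unique document hash to the sorted list of custodians who held it."""
    holders = defaultdict(set)
    for custodian, docs in collections.items():
        for doc in docs:
            holders[hashlib.sha256(doc).hexdigest()].add(custodian)
    return {doc_hash: sorted(names) for doc_hash, names in holders.items()}


# Hypothetical sample data.
collections = {
    "Al": [b"memo", b"memo", b"budget"],
    "Barbara": [b"memo", b"budget", b"contract"],
    "Charlie": [b"memo", b"contract"],
}

report = build_custodian_report(collections)
memo_hash = hashlib.sha256(b"memo").hexdigest()
print("memo held by:", ", ".join(report[memo_hash]))
```

Because the report records every holder of a document, only one copy needs to be reviewed while the "who had what" information stays available on request.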

Articles such as this one by Anne and Joe, and those by other consultants, should reassure lawyers that deduping across the entire database is not just all right, it’s practically incumbent upon them. As these authors state, with the concurrence of several judges they consulted: “Lawyers who fail to check for duplicates across multiple custodians, instead removing only duplicates from within the records of individual custodians, end up reviewing at least 20% more records on average. Whether or not their document review bills are ever audited, these lawyers are not meeting their ethical obligations to both clients and the justice system.”

