In a post dated April 27, 2009, the Technolawyer blog tells a document review horror story that should never have happened, but not for the reasons the players think.
A big West Coast law firm defending a medical devices case found itself overwhelmed in a large document review that mushroomed into something much larger than anticipated. The firm assigned more reviewers, including an inexperienced younger associate named Marc. Sadly, Marc failed to flag as privileged a document that clearly was. Even worse, Marc was undersupervised because of ridiculous internal firm politics. The cartoon below might be Marc arriving at work. Get the picture?
The document Marc failed to flag as privileged was, of course, produced. (The documents in this part of the review appear to have been paper-source, because they are described as having been OCR’d, and some had marginal handwritten notes.)
“The document in question was a chart of notable events in the history of the litigation prepared by in-house counsel. In addition to its fundamentally privileged content, it contained the attorney’s marginalia — the sort of thing that most of us scrawl on a document when we are certain that it will never fall into the hands of, say, the plaintiff’s attorney.
“The document was so clearly privileged… that each of the eight other reviewers assigned to the case had recognized and tagged its duplicates as such. Marc, however, decided that the document should be produced.” [STOP RIGHT HERE. HOW DID NINE COPIES OF THE SAME DOCUMENT MAKE IT INTO THE REVIEW STREAM SEPARATELY?] “And so it made its way, unnoticed, into the batch of documents (which numbered in the tens of thousands) produced for opposing counsel….”
The blog quotes a firm partner explaining how the reviewer missed this:
“ ‘An experienced reviewer would have recognized that the document was, without a doubt, privileged,’ the partner said. ‘But there was no name on it, and Marc didn’t know to look at the OCR coding[i], which would have told him that it was authored by an in-house attorney. Moreover, he didn’t realize that it was a duplicate of documents that had been tagged as privileged by other people. Maybe the OCR coding failed because of the marginalia; maybe he just didn’t have the experience to de-duplicate [INTERRUPTING AGAIN: IT SHOULD NOT BE THE REVIEWER’S RESPONSIBILITY TO DE-DUPLICATE!]. Either way, he made a bad call.’ ”
According to the Technolawyer posting, the firm partner said the lessons to be learned from this are:
- supervise the reviewers,
- immediately claw back privileged documents (and if necessary fight about it later), rather than pretend nothing went wrong, and
- “not only be aware of duplicates, but remain mindful of the limitations of even the best eDiscovery tools. OCR is not a perfect technology.”
Here’s where I have a big problem — not with Technolawyer, but with the Big Law Firm. Unless the variation in OCR quality was right off the Richter scale, there is no excuse for nine versions of the same document, even those with handwritten marginal notes, to have gone into review separately. None.
Any e-discovery consultant or vendor with even moderate sophistication knows about software that performs near-duplicate detection. One of the best-known is Equivio.
Near-duplicate detection software will catch different revisions of what is essentially the same e-mail or electronic document. It will catch the same document in both its Word and PDF formats, a case that exact hash-based deduplication misses entirely, because the two files produce completely different hash values. And it is very commonly used to catch multiple copies of the same paper document, which inevitably come out slightly different when OCR’d. I’ve known litigation support vendors who have used it for this purpose for several years now, and their clients appreciate its benefits.
Near-dupe detection software can be calibrated to group documents together based on a percentage degree of similarity. If you have a batch with wide variability in OCR quality, you’d set the percentage lower than if you’re confident the OCR quality is consistently high.
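To make the idea concrete, here is a toy sketch of threshold-based near-duplicate grouping. This is emphatically not Equivio’s algorithm or any vendor’s actual implementation; it just uses Python’s standard-library `difflib` to score text similarity as a percentage and greedily groups documents that score at or above a configurable threshold. The document IDs and text are invented for illustration. Notice how lowering the threshold would tolerate noisier OCR, exactly the calibration trade-off described above.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity percentage between two texts."""
    return SequenceMatcher(None, a, b).ratio() * 100

def group_near_duplicates(docs, threshold=80.0):
    """Greedy grouping: each document joins the first group whose
    representative (first member) it matches at or above the threshold;
    otherwise it starts a new group. A lower threshold tolerates
    noisier OCR; a higher one demands near-identical text."""
    groups = []  # each group is a list of (doc_id, text) tuples
    for doc_id, text in docs:
        for group in groups:
            rep_text = group[0][1]
            if similarity(text, rep_text) >= threshold:
                group.append((doc_id, text))
                break
        else:
            groups.append([(doc_id, text)])
    return groups

# Hypothetical example: three OCR'd copies of the same memo with
# typical OCR noise, plus one unrelated document.
docs = [
    ("doc1", "Chart of notable events in the litigation history."),
    ("doc2", "Chart of notab1e events in the litigation history."),  # OCR: "1" for "l"
    ("doc3", "Chart of notable events in the litigatlon history."),  # OCR: "l" for "i"
    ("doc9", "Quarterly sales figures for the Midwest region."),
]
groups = group_near_duplicates(docs, threshold=80.0)
for group in groups:
    print([doc_id for doc_id, _ in group])
```

Run as written, the three noisy copies land in one group and the unrelated document in its own, so a single senior reviewer could clear the whole bundle in one pass. Real products use far more scalable techniques (shingling, minhashing, and the like) than this pairwise comparison, but the calibration knob works the same way.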
I don’t know if near-duplicate detection was used in this case, or whether it was considered but a good reason existed not to use it. From the way this story is told, it does not sound like it was used.
So, Big Law Firm, you shouldn’t be so quick to blame the smart-ass young associate. This document shouldn’t have gotten to him in the first place. It should have been bundled together with its other eight near-duplicates and reviewed by someone with more seniority. The cost of near-dupe detection is a lot less than the cost of reviewing the same document nine times. Even without the error, your client should have fired you for that alone. (This paragraph assumes near-duplicate detection was not used or considered. If it was, never mind.)
[i] As written in the Technolawyer blog, which in turn is a direct quote. I am not certain of the meaning of “OCR coding”. In my lexicon, something is either OCR’d or it is coded.