Predictive Coding In eDiscovery: The Game Of Convenience
Back in 2012, Magistrate Judge Andrew Peck’s decision in Da Silva Moore v. Publicis Groupe & MSL Group, 287 F.R.D. 182 (S.D.N.Y. 2012), officially gave the green light to the use of technology-assisted review (TAR) in eDiscovery. The same judge recently issued an opinion in Rio Tinto PLC v. Vale S.A., 14 Civ. 3042, 2015 WL 872294 (S.D.N.Y. March 2, 2015), titled “Da Silva Moore Revisited”, approving the parties’ stipulated TAR protocol, which provided for the sharing of “seed sets” between the parties.
The opinion also reiterates that “courts leave it to the parties to decide how best to respond to discovery requests” and that courts are “not normally in the business of dictating to parties the process that they should use”.
Importantly, Judge Peck noted that requesting parties can use other means to help ensure that TAR training is adequate, even without production of seed sets. For instance, he suggested statistical estimation of recall toward the end of the review to identify potential gaps in the document production.
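To make the idea of estimating recall concrete, here is a minimal sketch of one common approach, an elusion test. This is an assumption for illustration: the opinion, as summarized above, does not prescribe a particular method. The sketch assumes every produced document was reviewed (so the number of relevant produced documents is a count, not an estimate) and that a simple random sample was drawn from the documents withheld from production. The function name, parameters, and figures are all hypothetical.

```python
def estimate_recall(relevant_in_production: int,
                    discard_set_size: int,
                    sample_size: int,
                    relevant_in_sample: int) -> float:
    """Hypothetical point estimate of recall from an elusion sample.

    Assumes relevant_in_production is an exact count (all produced documents
    were reviewed) and that the sample was drawn uniformly at random from
    the discard set, i.e. the documents the review did NOT produce.
    """
    if sample_size <= 0 or sample_size > discard_set_size:
        raise ValueError("sample_size must be between 1 and discard_set_size")
    if not 0 <= relevant_in_sample <= sample_size:
        raise ValueError("relevant_in_sample must be between 0 and sample_size")

    # Elusion rate: the share of the discard set that is actually relevant.
    elusion_rate = relevant_in_sample / sample_size
    # Scale the sampled rate up to estimate relevant documents that were missed.
    estimated_missed = elusion_rate * discard_set_size
    total_relevant = relevant_in_production + estimated_missed
    if total_relevant == 0:
        raise ValueError("no relevant documents found or estimated")
    # Recall = relevant documents produced / all relevant documents.
    return relevant_in_production / total_relevant


# Hypothetical figures: 9,000 relevant documents produced, 100,000 withheld,
# and 15 relevant documents found in a random sample of 1,500 withheld ones.
recall = estimate_recall(9_000, 100_000, 1_500, 15)
print(f"Estimated recall: {recall:.1%}")  # prints "Estimated recall: 90.0%"
```

A real protocol would typically report a confidence interval around this point estimate rather than a single number; the sketch omits that to keep the arithmetic visible.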
By contrast, in In re Biomet M2a Magnum Hip Implant Prods. Liab. Litig., No. 3:12-MD-2391, 2013 WL 6405156 (N.D. Ind. Aug. 21, 2013), the court declined to compel identification of the seed set but encouraged cooperation between the parties.
So where does this leave TAR and predictive coding in eDiscovery?
According to The Grossman-Cormack Glossary of Technology-Assisted Review, with a foreword by U.S. Magistrate Judge John M. Facciola, a seed set is “The initial Training Set provided to the learning Algorithm in an Active Learning process. The Documents in the Seed Set may be selected based on Random Sampling or Judgmental Sampling. Some commentators use the term more restrictively to refer only to Documents chosen using Judgmental Sampling. Other commentators use the term generally to mean any Training Set, including the final Training Set in Iterative Training, or the only Training Set in non-Iterative Training”. The key point is that the seed set is what the algorithm first learns from, so it is critical that the seed set be representative of the collection and reflect expert relevance determinations.
With this in mind, in an article I wrote in April 2014 titled “E-Discovery Costs vs. Disseminating Justice – What’s Important?” I concluded that technology must be used strictly as a tool in aid of due process of law.