Predictive Coding In E-Discovery: The Game Of Convenience

Back in 2012, Magistrate Judge Andrew Peck’s decision in Da Silva Moore v. Publicis Groupe & MSL Group, 287 F.R.D. 182 (S.D.N.Y. 2012), officially gave the green light to using TAR in e-Discovery. The same judge recently issued an opinion in Rio Tinto PLC v. Vale S.A., 14 Civ. 3042, 2015 WL 872294 (S.D.N.Y. Mar. 2, 2015), titled “Da Silva Moore Revisited”, approving a protocol under which the parties agreed to share their “seed sets”.

Importantly, the opinion reiterates that “courts leave it to the parties to decide how best to respond to discovery requests” and that courts are “not normally in the business of dictating to parties the process that they should use”.

Notably, Judge Peck observed that requesting parties can use means other than production of seed sets to help ensure adequate TAR training. For instance, he suggested statistical estimation of recall toward the end of the review to identify potential gaps in the production of documents.
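That suggestion can be illustrated with a back-of-the-envelope calculation: sample the unproduced (discarded) documents at the end of the review, estimate how many relevant documents were missed, and derive recall from that. A minimal sketch, with all figures purely hypothetical:

```python
def estimate_recall(produced_relevant, discard_sample_size,
                    discard_sample_relevant, discard_total):
    """Estimate recall from a random sample of the unproduced (discard) set."""
    # Estimated relevant documents remaining in the discard pile
    est_missed = (discard_sample_relevant / discard_sample_size) * discard_total
    return produced_relevant / (produced_relevant + est_missed)

# Hypothetical example: 8,000 relevant documents were produced; a random
# sample of 500 discards turns up 10 relevant ones among 50,000 unreviewed.
recall = estimate_recall(8000, 500, 10, 50000)
print(f"Estimated recall: {recall:.1%}")  # → Estimated recall: 88.9%
```

A requesting party could then flag a potential gap if the estimated recall falls below an agreed threshold.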

Yet, in cases such as In re Biomet M2a Magnum Hip Implant Prods. Liab. Litig., No. 3:12-MD-2391, 2013 WL 6405156 (N.D. Ind. Aug. 21, 2013), the court declined to compel identification of the seed set, although it encouraged cooperation between the parties.

So, where are we going with TAR?

According to the Grossman-Cormack glossary of technology-assisted review, with a foreword by U.S. Magistrate Judge John M. Facciola, a seed set is “The initial Training Set provided to the learning Algorithm in an Active Learning process. The Documents in the Seed Set may be selected based on Random Sampling or Judgmental Sampling. Some commentators use the term more restrictively to refer only to Documents chosen using Judgmental Sampling. Other commentators use the term generally to mean any Training Set, including the final Training Set in Iterative Training, or the only Training Set in non-Iterative Training”. The important thing to know about seed sets is that they are how the computer learns. It is critical that a seed set be representative and reflect expert determinations.

With this in mind, in one of my articles back in April 2014, titled “E-Discovery Costs vs. Disseminating Justice – What’s Important?”, I concluded that technology must be used strictly as a tool in aid of the due process of law.

As an attorney, I love a good argument corroborated and substantiated by solid precedent. The use of TAR in e-Discovery is increasingly becoming a matter of “convenience” between the parties in resolving issues. Well, we have arbitration laws for that!


e-Discovery | cloud computing
New Jersey, USA | Lahore, PAK | Dubai, UAE
(855) 833-7775
(703) 646-3043


When Should E-Discovery Vendors Be Disqualified? Gordon V. Kaleida Health Case

Generally speaking, courts have inherent authority to disqualify parties, representatives, and consultants from participating in litigation.  Attorneys, expert witnesses, and litigation consultants may face disqualification motions in the event of a conflict of interest. With the rapid expansion of the eDiscovery industry, however, a new question has arisen: If an eDiscovery vendor has a potential conflict of interest, when should it be disqualified?  What standard should apply?

To put the problem in perspective, imagine that you manage discovery at a law firm representing the defendant in a contentious wage and hour dispute, and you recently hired an eDiscovery vendor to assist you in scanning and coding your client’s documents, at a cost of $50,000.  Two months later, you receive notice from your vendor that the plaintiff’s counsel has requested its services in connection with the same case.  How would you react?  Would you expect a court to disqualify the vendor if it accepted the engagement?  This scenario occurred in Gordon v. Kaleida Health, resulting in the first judicial order squarely addressing vendor disqualification.  The Kaleida Health court ultimately denied the defendant’s motion to disqualify, allowing the vendor to continue participating in the case.

Discussion of Gordon v. Kaleida Health

Kaleida Health arose out of a now commonplace dispute between a hospital and its hourly employees under the Fair Labor Standards Act (“FLSA”). The plaintiffs, a group of hourly employees, sued the defendant, Kaleida Health, a regional hospital system, claiming they were not paid for work time during meal breaks, shift preparation, and required training, in violation of the FLSA.

Kaleida Health’s attorneys, Nixon Peabody, LLP (“Nixon”), hired D4 Discovery (“D4”), an eDiscovery vendor, to scan and code documents for use in the litigation. In connection with the work, Nixon and D4 executed a confidentiality agreement. D4 was to “objectively code” the documents using categories based on characteristics of the document, such as the author and the type of document. The coded documents would then be used by Nixon in preparing for upcoming depositions.

Two months later, plaintiffs’ counsel, Thomas & Solomon, LLP (“Thomas”), requested D4 to provide ESI consulting services to it in connection with the same case. D4 notified Nixon, who promptly objected based on the scanning and coding services D4 provided the defendant during the litigation. D4 then provided assurances that Kaleida Health’s documents would not be used in consulting the plaintiffs and that an entirely different group of employees would work with the plaintiffs’ counsel. Nixon, on behalf of Kaleida Health, persisted in its objection to D4 working for the plaintiffs and ultimately filed a motion to disqualify the vendor.

Magistrate Judge Foschio’s analysis began by outlining the standard governing the disqualification of experts and consultants.  According to the court, the entity sought to be disqualified must be an expert or a consultant, defined as a “‘source of information and opinions in technical, scientific, medical or other fields of knowledge’” or “one who gives professional advice or services” in that field. After the moving party makes this initial showing, it must meet two further requirements.  First, the party’s counsel must have had an “‘objectively reasonable’ belief that a confidential relationship existed with the expert or consultant.” Second, the moving party must also show “that . . . confidential information was ‘actually disclosed’ to the expert or consultant.”

Applying this standard, Judge Foschio ultimately found that because the scanning and objective coding services performed by D4 did not require specialized knowledge or skill and were of a “clerical nature,” D4 was not an “expert” or “consultant.” Further, the court determined that the defendant failed to prove that it provided confidential information to D4 because it did not show “any direct connection between the scanning and coding work . . . and Defendants’ production of [its] ESI.”

Rejecting Kaleida Health’s argument, the court declined to apply to D4 and other eDiscovery vendors the presumption of confidential communications, imputation of shared confidences, and vicarious disqualification applicable in the context of attorney disqualification when a party “switches sides.” As an alternative basis to its finding that D4 did not act as an expert or consultant, the court held that disqualification was improper because no “prior confidential relationship” existed between Kaleida Health and D4.

Because Kaleida Health represents the first significant attempt at exploring the issues surrounding vendor disqualification, whether later courts should follow Kaleida Health’s lead in exclusively applying the disqualification rules for experts and consultants to vendors becomes the main issue in its wake.  To come to a conclusion on this point, one must first explore the different schemes that courts may apply when considering disqualification.

The above excerpt is part of an article originally written by Michael A. Cottone, a candidate for Doctor of Jurisprudence, The University of Tennessee College of Law, May 2014.


The trade-off between ‘Recall’ and ‘Precision’ in predictive coding (part 2 of 2)

This is the second part of a two-part series of posts on information retrieval using predictive coding analysis, detailing the trade-off between Recall and Precision. For part 1 of 2, click here.

To clarify further:

Precision (P) is the fraction of retrieved documents that are relevant: Precision = (number of relevant items retrieved / number of retrieved items) = P(relevant | retrieved)

Recall (R) is the fraction of relevant documents that are retrieved: Recall = (number of relevant items retrieved / number of relevant items) = P(retrieved | relevant)

Recall and Precision are inversely related. A common criticism of these two metrics is their subjectivity: a record that is relevant to one reviewer may not be relevant to another.

So how do you achieve optimal values for Recall and Precision on a TAR platform?

Let’s consider a simple scenario:

• A database contains 80 records on a particular topic

• A search was conducted on that topic and 60 records were retrieved.

• Of the 60 records retrieved, 45 were relevant.

Calculate the precision and recall.


Define the following quantities:

• A = Number of relevant records retrieved,

• B = Number of relevant records not retrieved, and

• C = Number of irrelevant records retrieved.

In this example A = 45, B = 35 (80-45) and C = 15 (60-45).

Recall = 45 / (45 + 35) = 45 / 80 ≈ 56%

Precision = 45 / (45 + 15) = 45 / 60 = 75%
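The arithmetic above can be checked in a few lines of code:

```python
relevant_in_db = 80      # records on the topic in the database
retrieved = 60           # records returned by the search
relevant_retrieved = 45  # records that were both retrieved and relevant

recall = relevant_retrieved / relevant_in_db  # A / (A + B)
precision = relevant_retrieved / retrieved    # A / (A + C)

print(f"Recall:    {recall:.0%}")     # → Recall:    56%
print(f"Precision: {precision:.0%}")  # → Precision: 75%
```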

So, essentially, the optimal result – high Recall with high Precision – is difficult to achieve.

According to Introduction to Information Retrieval (Cambridge University Press):

“The advantage of having the two numbers for precision and recall is that one is more important than the other in many circumstances. Typical web surfers would like every result on the first page to be relevant (high precision) but have not the slightest interest in knowing let alone looking at every document that is relevant. In contrast, various professional searchers such as paralegals and intelligence analysts are very concerned with trying to get as high recall as possible, and will tolerate fairly low precision results in order to get it. Individuals searching their hard disks are also often interested in high recall searches. Nevertheless, the two quantities clearly trade off against one another: you can always get a recall of 1 (but very low precision) by retrieving all documents for all queries! Recall is a non-decreasing function of the number of documents retrieved. On the other hand, in a good system, precision usually decreases as the number of documents retrieved is increased”


The trade-off between ‘Recall’ and ‘Precision’ in predictive coding (part 1 of 2)

This is the first part of a two-part series of posts on information retrieval using predictive coding analysis, detailing the trade-off between Recall and Precision.

Predictive Coding – sometimes referred to as ‘Technology-Assisted Review’ (TAR) – is basically the integration of technology into the human document review process. The benefit of using TAR is two-fold: speeding up the review process and reducing costs. Sophisticated algorithms are used to produce a relevant set of documents. The underlying process in TAR is grounded in statistical concepts.

In TAR, a sample set of documents (the ‘seed set’) is coded by subject matter experts, serving as the primary reference data that teaches the TAR engine to recognize relevant patterns in the larger data set. In simple terms, a ‘data sample’ is created based on a chosen sampling strategy such as random, stratified, or systematic sampling.

Remember, it is critical that seed sets be prepared by subject matter experts. Based on the seed set, the algorithm in the TAR platform begins assigning predictions to the documents in the database. Through an iterative process, adjustments can be made on the fly to reach the desired objectives. The two important metrics used to measure the efficacy of TAR are:

  1. Recall
  2. Precision
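To make the training idea concrete, here is a deliberately oversimplified, pure-Python sketch of how expert coding decisions on a seed set can train a model that then scores unseen documents. Real TAR platforms use far more sophisticated statistical algorithms; the documents and the word-weight scoring scheme below are invented purely for illustration.

```python
from collections import Counter

def train(seed_set):
    """Learn word weights from expert-coded (text, is_relevant) pairs."""
    weights = Counter()
    for text, is_relevant in seed_set:
        for word in set(text.lower().split()):
            weights[word] += 1 if is_relevant else -1
    return weights

def predict(weights, text):
    """Score a document; a positive score means 'predicted relevant'."""
    return sum(weights[w] for w in set(text.lower().split()))

# Hypothetical seed set coded by a subject matter expert
seed = [("overtime pay during meal breaks", True),
        ("unpaid shift preparation time", True),
        ("cafeteria lunch menu for friday", False)]
weights = train(seed)
print(predict(weights, "unpaid overtime during breaks") > 0)  # → True
```

In a real iterative workflow, the expert would review the model’s predictions, correct the mistakes, add those corrections to the training set, and retrain until the metrics below reach acceptable levels.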

Recall is the fraction of the relevant documents that are successfully retrieved, whereas Precision is the fraction of retrieved documents that are relevant. If the computer, in trying to identify relevant documents, identifies a set of 100,000 documents, and human review finds 75,000 of the 100,000 to be relevant, the precision of that set is 75%.

In a given population of 200,000 documents, assume 30,000 documents are selected for review as the result of TAR. If 20,000 of the 30,000 are ultimately found to be responsive, the selected set has a precision of roughly 67% (20,000 / 30,000). And if another 5,000 relevant documents are found in the remaining 170,000 that were not selected for review, the set selected for review has a recall of 80% (20,000 / 25,000).
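The same arithmetic, sketched in code:

```python
population = 200_000
selected = 30_000            # documents selected for review by TAR
responsive_selected = 20_000
responsive_missed = 5_000    # relevant documents left in the other 170,000

precision = responsive_selected / selected
recall = responsive_selected / (responsive_selected + responsive_missed)

print(f"Precision: {precision:.1%}")  # → Precision: 66.7%
print(f"Recall:    {recall:.0%}")     # → Recall:    80%
```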

Click here to read part 2 of 2.


Syed Raza, CEO ClayDesk
