For many years keyword searches have been, if not the gold standard, then at least the normal standard for identifying potentially relevant documents for legal review. With the advances made in Technology Assisted Review (TAR), however, questions are being raised as to whether keywords still have a place in eDiscovery. So, will 2020 see the end of keywords?
It is not surprising that keyword searches have been adopted as a standard by the legal industry. Lawyers are taught the basics of Boolean searching at university, and we all rely on keyword searching when we surf the net. It is reasonable, therefore, to expect these skills to be foremost in lawyers' minds when they are faced with the need to find relevant documents. The question is whether keywords should still be seen as the standard.
In his decision in Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012), Magistrate Judge Andrew Peck observed that:
"In too many cases... the way lawyers choose keywords is the equivalent of the child's game of 'Go Fish'", and that "keyword searches usually are not very effective".
Judge Peck went on to reference the article by David C. Blair and M. E. Maron, An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System, 28 Comm. ACM 289 (1985), which studied the effectiveness of experienced lawyers retrieving relevant documents using keywords and other review techniques, and found that on average recall using these methods was only 20% (that is, only about one in five of the relevant documents was found).
This finding was supported by Jason R. Baron, then Director of Litigation at the US National Archives, who, in an interview with ESI Bytes (http://esibytes.com/beyond-key-word-searching-in-electronic-discovery/), suggested that as much as 78% of relevant documents may be left behind if Boolean searches are used alone.
Keyword searches have certainly developed since Blair & Maron's article in 1985, and most review tools now include a variety of methods for improving their effectiveness, including the use of dictionaries, keyword expansion and fuzzy searches. However, even with these advanced methods, it should be recognised that in most cases keywords will still not find all potentially relevant documents in a matter, but only a sample of them.
If keywords are not to be relied upon then what are the alternatives?
The EDRM guidelines define Technology Assisted Review as "a review process in which humans work with software to train it to identify relevant documents". Whilst many TAR reviews are carried out across sets already culled using keywords, there is now a movement towards running TAR on larger data sets in order to improve recall and precision in finding relevant data.
As a hypothetical, if we took a 100,000-document set in which we knew 10,000 documents were relevant and applied keyword searches to it, then using the 20% recall rate identified by Blair & Maron we could expect 2,000 relevant documents to be returned for review. Let's then say that improvements in keyword search technology double this, so that we return 40% of the relevant material, totalling 4,000 documents.
A well-run Technology Assisted Review, on the other hand, may see around 12,000 to 15,000 documents reviewed. At first glance this looks like a bad result, as the time and cost of reviewing an additional 8,000 or more documents may be seen as a negative. However, even if we assume that the TAR workflow finds only 95% of the relevant documents (9,500 of the 10,000), the results of this approach are far more complete and far more defensible than finding only 40% of them. To reach the same level of recall, a traditional linear review would statistically require review of around 95,000 of the 100,000 documents.
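To make the arithmetic in this hypothetical explicit, the short Python sketch below recomputes it. The figures (a 100,000-document set containing 10,000 relevant documents, and recall of 20%, 40% and 95%) are the illustrative numbers from the scenario, not measured results. The names `relevant_found` and `linear_review_volume` are hypothetical helpers, and the linear-review figure assumes that relevant documents are spread evenly through the set.

```python
"""Worked arithmetic for the hypothetical review scenario (illustrative only)."""

TOTAL_DOCS = 100_000    # size of the hypothetical collection
RELEVANT_DOCS = 10_000  # relevant documents assumed to exist in it


def relevant_found(recall_pct: int) -> int:
    """Relevant documents located at a given recall (a whole percentage).

    Recall = relevant documents found / all relevant documents.
    """
    return RELEVANT_DOCS * recall_pct // 100


def linear_review_volume(recall_pct: int) -> int:
    """Documents a linear review must read to reach a target recall.

    Assumption: relevant documents are spread evenly through the collection,
    so reviewing k% of the documents finds about k% of the relevant ones.
    """
    return TOTAL_DOCS * recall_pct // 100


scenarios = {
    "keywords (Blair & Maron, 20%)": 20,
    "improved keywords (40%)": 40,
    "TAR (95%)": 95,
}

for name, pct in scenarios.items():
    found = relevant_found(pct)
    missed = RELEVANT_DOCS - found
    print(f"{name}: {found:,} found, {missed:,} missed")

print(f"linear review for 95% recall: {linear_review_volume(95):,} documents")
```

Running it shows that the improved keyword search still leaves 6,000 relevant documents behind, against 500 for the TAR workflow, while a linear review would have to read 95,000 documents to match TAR's recall.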
These hypothetical numbers are supported by the findings of John Tredennick and Andrew Bye in their 2017 article How Good is That Keyword Search? Maybe Not As Good As You Think, in which they reported that keywords located only 39% of the potentially responsive documents in their matter.
Whilst TAR is also not perfect, and some relevant documents will be left behind, its results will clearly be better than those provided by keyword searches alone.
Remember that your goal should be to find all relevant documents within the boundaries of what it is proportionate for a party to spend, in time and cost, trying to find them.
Whilst keywords appear to be a good starting point and a quick way to cull data, you should also remember that they are generally based on your current understanding of a matter. If your keywords are built on imperfect knowledge, it is no surprise that the results will be imperfect too.
The risk of developing imperfect keywords was evidenced in the US decision in Abbott Laboratories, et al. v. Adelphia Supply USA, et al., No. 15 CV 5826 (CBA) (LB) (E.D.N.Y. May 2, 2019), a case which Judge Lois Bloom described as "...a cautionary tale about how not to conduct discovery in federal court." In addition to a number of other discovery failings that led to a ruling against the Defendants, Judge Bloom placed emphasis on the Plaintiffs' argument that "...defendants purposely designed and ran the 'extremely limited search' which they knew would fail to capture responsive documents."
If you are using keywords, care should therefore be taken to ensure that they are properly designed and tested so that their effectiveness can be defended.
Since the decision in Da Silva Moore, there has been an explosion in the use of TAR. A recent report by Research and Markets, LegalTech Artificial Intelligence Market by Application and by End-User: Global Industry Perspective, Comprehensive Analysis, and Forecast, 2018 – 2026, predicted that spend on TAR would grow from $3.2 billion in 2019 to $37.8 billion in 2026.
Despite the acceptance and expanding use of TAR, it still may not yet be seen by all as a replacement for keywords. There is however no denying that there is a shift away from the use of keywords as the standard.
In the recent US case NuVasive, Inc. v. Alphatec Holdings, Inc., No. 18-CV-0347 (S.D. Cal.) (10/7/2019), the court observed that:
"...electronic discovery has moved well beyond search terms. While search terms have their place, they may not be suited to all productions. Technology has advanced and software tools have developed to the point where search terms are disfavored in many cases...."
As history has shown us, however, change is sometimes slow to come, and whilst 2020 may not see the death of keyword searches, the advances seen in the use of TAR may mean that we see a decline in our reliance upon them.