The use of Artificial Intelligence (AI) techniques to speed up legal processes is quickly gaining acceptance. Because defensibility is key in business-critical processes such as eDiscovery and other complex investigations, acceptance rests on trust, and trust starts with understanding. Below are the two most important principles for the defensible use of AI (the full article can be found here).
Computers need to be trained
Machines do not learn by themselves. The artificial intelligence algorithms used by ZyLAB apply so-called “supervised machine learning”, meaning that they learn to recognize a certain category of document from a large number of examples (both positive and negative). Before training starts, the computer analyzes the documents to determine the most distinctive words and syntactic or semantic structures within the document set. The user then searches for a number of relevant documents for a certain category. From this collection of training documents, ten (10) random sets are generated, each consisting of 90% training documents and 10% test documents. The 90% are used to train the algorithms and the 10% (which the computer has not seen previously) are used to test the learning process.
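The ten random 90%/10% sets described above can be sketched in a few lines of Python. This is only an illustration of the idea, not ZyLAB's implementation: the function name `make_splits`, the document identifiers and the fixed random seed are all assumptions made for the example.

```python
import random

def make_splits(doc_ids, n_splits=10, test_fraction=0.10, seed=0):
    """Return n_splits random (training, test) partitions of doc_ids.

    Hypothetical helper: each partition holds back test_fraction of the
    reviewed documents so the classifier is tested on unseen documents.
    """
    rng = random.Random(seed)  # fixed seed so the splits can be reproduced
    n_test = max(1, round(len(doc_ids) * test_fraction))
    splits = []
    for _ in range(n_splits):
        shuffled = list(doc_ids)
        rng.shuffle(shuffled)
        test_docs = shuffled[:n_test]    # 10%: never shown during training
        train_docs = shuffled[n_test:]   # 90%: used to train the algorithm
        splits.append((train_docs, test_docs))
    return splits

# Example with 200 made-up document identifiers.
reviewed = [f"doc-{i}" for i in range(200)]
splits = make_splits(reviewed)
for train_docs, test_docs in splits:
    print(len(train_docs), len(test_docs))
```

Recording the seed alongside the splits is one way to make a training run repeatable, which fits the emphasis on defensibility.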
A number of new documents are then classified by the computer and presented to the user, who indicates which classifications are correct and which are incorrect. In this way the number of training documents grows, and the next training cycle can start, provided there is a sufficient inflow of newly reviewed documents. This process continues until the classifier is good enough, which is often the case when both precision and recall are above 80%.
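For readers unfamiliar with the two measures: precision is the share of documents the classifier marked as relevant that really are relevant, and recall is the share of truly relevant documents that the classifier found. The sketch below computes both from the user's corrections and applies the 80% rule of thumb. The function names and the sample data are assumptions for illustration only.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for one category.

    predicted: classifier decisions (True = assigned to the category)
    actual:    the user's verdicts for the same documents
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall

def good_enough(precision, recall, threshold=0.80):
    """Stopping rule from the text: both measures above the threshold."""
    return precision > threshold and recall > threshold

# Made-up review round: 20 documents, 10 of them assigned to the category.
predicted = [True] * 10 + [False] * 10
actual = [True] * 9 + [False] + [True] + [False] * 9

precision, recall = precision_recall(predicted, actual)
print(precision, recall, good_enough(precision, recall))
```

In this made-up round the classifier makes one wrong assignment and misses one relevant document, so both measures come out at 90% and training could stop.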
Quality control and independent testing are an integral and continuous part of the process
Properly establishing the defensibility of the entire training process is crucial. This is done by drawing up a so-called “defensibility report”. All the details of the training process are recorded in this report: which training documents were used, which users reviewed them, where, when and for how long, how many training cycles there were, how the quality of the classifiers developed, etc.
Independent and continual validation of the results and of the defensibility of the automated process is a key element of the entire process. By taking a random sample of the classified documents and having specialists or senior lawyers check it, the quality of the process can be monitored continually.
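The sampling check can be pictured as follows. This is a minimal sketch that assumes the classifier output is available as a mapping from document identifier to decision. The names `draw_qc_sample` and `agreement_rate`, the sample size of 20 and the simulated reviewer verdicts are all hypothetical.

```python
import random

def draw_qc_sample(classified, sample_size, seed=None):
    """Pick a random subset of classified documents for expert review."""
    rng = random.Random(seed)
    doc_ids = sorted(classified)  # stable order before sampling
    return rng.sample(doc_ids, min(sample_size, len(doc_ids)))

def agreement_rate(classified, reviewer_labels):
    """Share of sampled documents where reviewer and classifier agree."""
    checked = [d for d in reviewer_labels if d in classified]
    if not checked:
        return 0.0
    agree = sum(1 for d in checked if classified[d] == reviewer_labels[d])
    return agree / len(checked)

# Made-up classifier output for 100 documents.
classified = {f"doc-{i}": (i % 2 == 0) for i in range(100)}
sample = draw_qc_sample(classified, sample_size=20, seed=1)

# Simulated expert verdicts: the reviewer disagrees on one sampled document.
reviewer_labels = {d: classified[d] for d in sample}
reviewer_labels[sample[0]] = not classified[sample[0]]

print(agreement_rate(classified, reviewer_labels))
```

Repeating such a check at intervals, and logging the sample and the outcome, gives a running record of quality that can be included in the defensibility report.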
The algorithms used by ZyLAB have been used for years and have been extensively tested in diverse circumstances in a great many scientific studies. In every case, these algorithms performed better than humans.
In the webinar "Out of the Black Box: Advanced Analytics for eDiscovery", Johannes Scholtes will explain the different data analytic and machine learning techniques that are used to boost performance in an eDiscovery process.