The first phase of research began by leveraging the document delay coding from five prior construction matters. The coding was performed by construction attorneys and industry experts. The documents were primarily emails and attachments, with a subset of electronic documents from project drives (PDFs, Word documents, PowerPoints, Excel workbooks, etc.) and industry applications (e.g., Primavera). This coding and the corresponding documents were ingested into a supervised machine learning process, which uses human review coding to teach the machine how to find more documents of the same type. In this first phase, we provided the algorithm with 5,000 delay and 25,000 non-delay documents from these five matters.
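To make the supervised learning step concrete, here is a minimal sketch of how human-coded documents can train a text classifier. It uses a simple Naive Bayes model as a stand-in for the production model (which is not described in detail here); the function names and sample documents are hypothetical illustrations, and the real training set contained 5,000 delay and 25,000 non-delay documents.

```python
import math
from collections import Counter

def train(docs, labels):
    """Learn word frequencies per class from human-coded documents.

    docs: list of token lists; labels: "delay" or "non_delay" per doc.
    """
    counts = {"delay": Counter(), "non_delay": Counter()}
    priors = Counter(labels)
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)
    vocab = set(counts["delay"]) | set(counts["non_delay"])
    return counts, priors, vocab

def delay_probability(tokens, model):
    """Return P(delay | document) from the learned frequencies."""
    counts, priors, vocab = model
    total_docs = sum(priors.values())
    log_probs = {}
    for label, word_counts in counts.items():
        n_words = sum(word_counts.values())
        lp = math.log(priors[label] / total_docs)
        for t in tokens:
            # Laplace smoothing so unseen words don't zero out the score
            lp += math.log((word_counts[t] + 1) / (n_words + len(vocab)))
        log_probs[label] = lp
    m = max(log_probs.values())
    odds = {k: math.exp(v - m) for k, v in log_probs.items()}
    return odds["delay"] / sum(odds.values())

# Tiny illustrative training set (hypothetical content)
train_docs = [
    "crane breakdown delayed steel erection two weeks".split(),
    "schedule slip due to late rfi response".split(),
    "lunch menu for friday attached".split(),
    "parking passes available at front desk".split(),
]
train_labels = ["delay", "delay", "non_delay", "non_delay"]
model = train(train_docs, train_labels)
```

Once trained, the model can score any unreviewed document by the same word statistics it learned from the coded examples.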
The resulting predictive model, designed to find key delay documents, analyzes the text of each unreviewed document and assigns a probability score on a 0-100 scale indicating how likely the document is to be a key delay document. We used this score to target the population of key delay documents. For our experiments, we withheld a random, representative sample of the coded training documents from each matter to test the model's performance. These test documents let us evaluate the model as if it were applied to new delay claim data.
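The holdout step above can be sketched as a per-matter random split, so each of the five matters contributes to both the training and the test sets. This is an illustrative implementation under assumed names (`holdout_split`, a `matter` key on each document record); the article does not specify the exact sampling procedure or test fraction.

```python
import random

def holdout_split(docs, test_fraction=0.2, seed=42):
    """Withhold a random sample of coded documents from each matter.

    docs: list of dicts, each with a "matter" key identifying its source case.
    Returns (train, test) lists; test simulates unseen delay claim data.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    by_matter = {}
    for d in docs:
        by_matter.setdefault(d["matter"], []).append(d)
    train, test = [], []
    for matter_docs in by_matter.values():
        rng.shuffle(matter_docs)
        k = max(1, int(len(matter_docs) * test_fraction))
        test.extend(matter_docs[:k])   # withheld for evaluation
        train.extend(matter_docs[k:])  # used to fit the model
    return train, test
```

Because the test documents already carry attorney coding, the model's 0-100 scores on them can be checked directly against the human judgments.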
We used two common statistical measures to examine performance: recall and precision. Recall is the percentage of the “commercially pertinent” (e.g., delay, impact) documents the model found, while precision is the percentage of the documents the model flagged that are actually pertinent. No model is perfect, and precision indicates how much extra review work the false positives will create on the way to finding the required number of key delay documents. The goal is to obtain both high recall and high precision.
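The two measures reduce to simple counts over the held-out test set. This sketch assumes parallel lists of booleans, one from the attorney coding and one from the model's predictions:

```python
def recall_precision(actual, predicted):
    """actual/predicted: parallel lists of booleans (True = key delay doc)."""
    tp = sum(a and p for a, p in zip(actual, predicted))          # true positives
    fp = sum((not a) and p for a, p in zip(actual, predicted))    # false positives
    fn = sum(a and (not p) for a, p in zip(actual, predicted))    # missed documents
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # share of pertinent docs found
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # share of flagged docs that are pertinent
    return recall, precision
```

Low precision means reviewers wade through many false positives to reach the key documents; low recall means key documents are never surfaced at all.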
Although the initial results of the modeling process were promising, we realized that the training data was filled with noisy words that impaired the model’s ability to identify key delay communications comprehensively and precisely. Examples of this noise included headers, footers, signature blocks, project names, and participants, among others. We then applied a cleansing process to the text of the documents so that the machine learning algorithm could learn from the information that highlights and describes the material delay issues.
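A cleansing pass of the kind described can be sketched with regular expressions that strip the recurring noise before training. The patterns below are illustrative assumptions, not the actual cleansing rules used, which would be tuned to each matter's headers, signature blocks, and project vocabulary:

```python
import re

# Hypothetical noise patterns: email headers, signature blocks, disclaimers
NOISE_PATTERNS = [
    r"(?im)^from:.*$",
    r"(?im)^sent:.*$",
    r"(?im)^to:.*$",
    r"(?im)^subject:.*$",
    r"(?is)this e-?mail .*confidential.*$",   # boilerplate disclaimer footer
    r"(?im)^best regards,[\s\S]*$",           # signature block to end of message
]

def cleanse(text):
    """Remove headers, footers, and signatures so the model learns
    from the language that actually describes the delay issues."""
    for pattern in NOISE_PATTERNS:
        text = re.sub(pattern, " ", text)
    return re.sub(r"\s+", " ", text).strip()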
In the second phase of our research, we tested different machine learning algorithms and text preprocessing settings to further improve performance. We compared three machine learning algorithms, along with preprocessing settings that control how the algorithm interprets phrases and how many distinct words the model can learn from the training data. We created dozens of models from different combinations of these settings to determine which performed best, using our test set of documents from each matter to compare the models.
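That experimentation can be sketched as a grid over preprocessing settings. The two settings shown, maximum phrase length (n-grams) and vocabulary cap, are plausible examples of the kinds of knobs described; the actual settings and values tested are not specified in the article:

```python
from itertools import product

def ngrams(tokens, n_max):
    """Expand a token list with phrases up to n_max words long."""
    out = list(tokens)
    for n in range(2, n_max + 1):
        out += ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return out

# Hypothetical settings grid: phrase length and vocabulary size
grid = {
    "ngram_max": [1, 2, 3],        # single words, pairs, triples
    "max_vocab": [10_000, 50_000], # how many distinct terms the model keeps
}
combos = [dict(zip(grid, values)) for values in product(*grid.values())]

# Each combination would be used to retrain a model and score it on
# the held-out test set, keeping the best-performing configuration.
```

Crossing three n-gram settings with two vocabulary caps already yields six models per algorithm, which is how "dozens of models" accumulate quickly.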
Through these experiments, we learned that the delay event or project issue was typically discussed in only a subset of a document’s text, and that focusing the model on those subsets improves results. Incorporating this into our experiments produced a marked improvement and had a significant impact on the model’s ability to identify how project participants discuss significant delays.
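One common way to focus on subsets of text, offered here as an illustrative assumption rather than the authors' exact method, is to split each document into overlapping passages, score each passage, and let the strongest passage represent the document:

```python
def passages(text, size=150, overlap=25):
    """Split a document into overlapping word windows."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words), 1), step)]

def document_score(text, score_passage):
    """Score each passage and keep the maximum, so one strong delay
    discussion is enough to surface the whole document."""
    return max(score_passage(p) for p in passages(text))
```

This way a long email thread that buries two key paragraphs about a delay among routine correspondence is not diluted by the surrounding text.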
It was important to us not only to find delay documents that describe the issues that occurred, but also to organize those documents by their “delay event,” or the reason each document is delay related. To do this, we developed a taxonomy that categorizes the high-scoring (likely delay-related) documents into specific groups, making the dataset easier to examine. For example, wouldn’t it be easier to understand a group of documents that all describe a strike and the impact it had on a specific component of the project? We focused on creating an approach that automatically finds the facts related to the delay and then proactively tells a legal practitioner why those documents were impactful to the delay on that project.
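A simplified version of such a taxonomy might map delay events to characteristic language and route each high-scoring document into one or more buckets. The three categories and cue phrases below are invented for illustration; the real taxonomy was developed by construction attorneys and experts and is presumably far more granular:

```python
# Illustrative taxonomy only; the actual taxonomy is not published here.
DELAY_TAXONOMY = {
    "labor": ["strike", "walkout", "labor shortage"],
    "weather": ["hurricane", "flooding", "heavy rain"],
    "design": ["design change", "drawing revision", "late rfi"],
}

def categorize(text):
    """Assign a high-scoring document to one or more delay events."""
    lowered = text.lower()
    matched = [event for event, cues in DELAY_TAXONOMY.items()
               if any(cue in lowered for cue in cues)]
    return matched or ["unclassified"]
```

Grouping documents by delay event in this way lets a reviewer open, say, the "labor" bucket and immediately see every document tied to the strike and its impact.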
This research has the potential to redefine how legal professionals structure their review of documents during a construction dispute. Because the resulting methodology uses a pretrained model and a predefined taxonomy, it can be applied as soon as data is loaded into a document review platform, saving time and reducing costs. It may also support proactive monitoring of an active construction project. Consider a few examples below.
Early Case Assessment – Quickly finding key delay events and their supporting documentation is critical to the success of a delay dispute. This new approach to examining delay can assist counsel and their experts in establishing the scope of a dispute, the extent of compliance with contractual notices, and the sufficiency and fitness of the existing data to successfully pursue a claim or settlement. Project teams and management can also benefit from early access to a narrowly tailored set of documents that focuses only on the delay(s) at issue.
Reduce the Time and Cost of Review – This AI-powered approach finds material delay-related documents and enables counsel and experts to exclude the likely irrelevant documents from review. The result is significant time and cost savings. Counsel and experts can then focus on the most meaningful documents without the distraction of other documents that simply “hit” on the keyword “delay.”
Quick Construction Dispute Resolution – During conventional construction disputes, claimants generally have a reasonable time to file a claim once they provide a notice outlining the affected work. Under some dispute resolution clauses, however, respondents have only 30 to 45 days to prepare a response to a claim. Using this new delay analytics methodology, respondents can perform a more comprehensive assessment of the data necessary to respond to the claim. Respondents can also focus on just the material delay-related documents and review them by their purported cause, maximizing the use of the limited review time.
Proactive Project Monitoring – Too often, project teams rely on email and phone conversations to communicate schedule changes rather than best-practice daily reporting platforms (e.g., Procore, Constructware, Timber, Primavera). Transitions of work and documentation between construction managers and owners often leave schedule plans and progress tracking disorganized. This team of collaborators is starting to use this approach to find chatter within email and bubble up potential delay issues in near real time, helping prevent or minimize delay-related challenges on the project.
Compared to other fields, construction legal practitioners have typically been underserved by technology innovators. Legal technology is often designed to broadly support many industries, which has led counsel and experts to use outdated approaches to examine an important and regularly occurring construction issue: delay. Examples include keyword searches and predictive coding to find relevant delay documents. While these methods may be helpful under the right circumstances, they were not designed for this task and ultimately cannot reveal the key delay issues on a project, regardless of the phase of construction.
With that in mind, we focused our efforts on creating a methodology to revolutionize how the construction industry reacts to and manages delay, both in near real time as a project progresses and after a claim is filed. While we are encouraged by our research, there is more work to be done to enhance the current process. In the meantime, Lendlease and Ankura will continue to rely upon this AI-powered approach to help create cost savings, efficiencies, and a better understanding of the data in delay disputes.