ChatGPT Can’t Analyze Your Contract Yet, But There’s Potential

April 19, 2023, 8:00 AM UTC

Imagine using ChatGPT to automatically retrieve the contract information you care about—terms like auto-renewal, payment terms, termination rights—instead of asking contract managers or lawyers to flip through contracts to find the same information. The potential for cost savings is dramatic.

So I tried it. Using GPT-4, I fed ChatGPT a couple of publicly available vendor contracts and asked it a series of questions that I would normally ask a contract manager or lawyer as part of a vendor contract review process.

  • Does the contract auto-renew?
  • Does the contract have late fees?
  • What are the payment terms (e.g., net 30)?
  • Does the contract have termination for convenience?
  • Does the contract give any product warranties?
  • Does the contract transfer IP? Are liability limits mutual?
  • Do the liability limits carve out IP indemnity?
  • What is the governing law?
  • Does the vendor process personal data?
  • What does the vendor use customer data for?
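A review pass like this could be scripted rather than typed into a chat window. The sketch below is hypothetical: the prompt wording and the commented-out API call are my assumptions about one way to set it up, not a description of the author's actual workflow.

```python
# Hypothetical sketch of automating a contract-review Q&A pass.
# The question list mirrors the article; the system prompt and the
# commented-out client/model names are illustrative assumptions.

REVIEW_QUESTIONS = [
    "Does the contract auto-renew?",
    "Does the contract have late fees?",
    "What are the payment terms (e.g., net 30)?",
    "What is the governing law?",
]

def build_review_messages(contract_text: str, question: str) -> list:
    """Assemble a chat payload that grounds the model in the contract text."""
    system = (
        "You are a contract reviewer. Answer only from the contract text "
        "provided. If the contract is silent on the question, say so."
    )
    user = f"Contract:\n{contract_text}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

# Each payload would then go to a chat-completion endpoint, e.g.:
# response = client.chat.completions.create(model="gpt-4", messages=msgs)
```

Grounding the model with an instruction to answer only from the supplied text does not eliminate the failure modes described below, but it narrows the room for invention.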

Here’s what I found.

Hallucinations

The model sometimes finds things in the contract that aren’t there.

For example, when asked what the contract says about use restrictions, GPT-4 found an “Exhibit B - Acceptable Use Policy” that was nowhere to be found in the actual contract I provided.

When prompted, ChatGPT recognized its mistake.

Lapses

The model sometimes can’t find things in the contract when they are there.

For example, when asked about governing law, GPT-4 claimed that the contract didn’t include information about governing law, even though the words “governing law” appeared verbatim in the contract.

When prompted, ChatGPT again recognized its mistake. What follows is our exchange.

Lack of Precision

The model sometimes confuses similar, but different phrases or concepts.

For example, where a contract liability limit excludes direct IP claims between the parties but not third-party IP claims (i.e., IP indemnity obligations), the model cannot separate the two. Its analysis resorts to “general” and “typical” definitions rather than the actual text.

When prompted, ChatGPT recognized its error and refined its analysis.

Potential

Despite its shortcomings, the model shows potential: when it does answer correctly, it is impressive and useful.

For example, when asked about the vendor’s collection and use of data, the model gave accurate and comprehensive answers, pulling from and citing different sections of the contract.

This is valuable because even for experienced contract reviewers, it's often not obvious where to look to find information about data collection and use. Often, several different sections address the various types of collected data along with the different purposes and uses. ChatGPT can cull and synthesize information from different sections much faster than manual review.

Takeaways

GPT-4 is not reliable for contract review yet. It is like a bad or beginner contract reviewer who sometimes misses things, gets things wrong, or just flat out makes things up.

But when it got things right, it was able to gather information buried in multiple places faster than a human reviewer, usually within one minute.

Part of the difficulty in using ChatGPT for contract review today is that it is unclear what types of questions ChatGPT is good or bad at answering. I didn’t find any patterns to the types of questions that ChatGPT reliably answered correctly or incorrectly. ChatGPT is unpredictable.

You might wonder whether better prompt engineering would yield better answers. It's possible, but when I tested variations in the word choice and phrasing of my questions, the outcomes didn't change meaningfully.

Two strategies, though neither foolproof, improved GPT-4’s accuracy.

First, you can prompt ChatGPT to double-check its answers by asking, "Are you sure?" When asked, ChatGPT often corrected its mistakes. This tactic worked a few times, though not every time.
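In an automated pipeline, this double-check amounts to appending the model's first answer plus a follow-up turn before requesting a second completion. The function name and the follow-up wording below are illustrative assumptions, not a real library API.

```python
# Hedged sketch of the "Are you sure?" double-check: extend the chat
# history with the model's first answer and a challenge turn, so the
# next completion re-examines the claim against the contract text.
# Names here are illustrative, not part of any real API.

def add_double_check(messages: list, first_answer: str) -> list:
    """Return a new history ending with an 'Are you sure?' follow-up."""
    return messages + [
        {"role": "assistant", "content": first_answer},
        {"role": "user",
         "content": "Are you sure? Re-check the contract text and cite "
                    "the section that supports your answer."},
    ]
```

Asking the model to cite a supporting section also makes it easier to spot answers that have no anchor in the contract at all.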

Second, you can use an AI application that is specifically designed to analyze contracts. My early experimentation using a program like this showed better results than ChatGPT, but it still had hallucinations and lack of precision.

This article does not necessarily reflect the opinion of Bloomberg Industry Group, Inc., the publisher of Bloomberg Law and Bloomberg Tax, or its owners.

Author Information

Tammy Zhu is a tech lawyer who helps companies build and use AI products and scale commercial functions. She is the VP of Legal at Sourcegraph, Inc.
