This post isn't a comprehensive blow-by-blow of GPT-5.2; I'm just looking at how it performs in a typical business use case compared to some alternative models we're using today in various projects.

Other Resources

For a more complete understanding of the broad spectrum of what's new in gpt-5.2, and what it means, I suggest check out these two sources:

Matthew Berman posted a YouTube video first take on how 5.2 advances the OpenAI features and capabilities. Deep dive into stats and examples of advanced prompting features.
Nate Jones is my go-to on what it means when these new advances enter the market. Check out this newsletter edition. He also has a YouTube channel that's worth checking out.

My Quick Tire-Kicking Test

In this post I'm sharing some "tire kicking" I did with GPT 5.2, and comparing it apples-to-apples in one of the application domains I've done a lot of work in – contract analysis using LLMs.

Test Setup

The test framework is a codebase available here in my Github Repo. The test is a simple command-line python app that uses OpenAI's Structured Model Outputs API to extract various terms and content from business contracts.

In this post I used the sample contract MEDIWOUNDLTD_01_15_2014-EX-10.6-SUPPLY AGREEMENT from the CUAD_v1 example contracts dataset.

For the head-to-head test I focused on the "Summary" data element – which is a prompt to extract a brief summary of the entire contract. This is an interesting field to focus on since it's the one that asks the LLM to read the entire contract but summarize it briefly. I like to use it because I can see which details various models pay attention to and how deeply the model reasons within the contract before creating the summary.

class Contract(BaseModel):
    summary: str = Field(
        ...,
        description=(
            "A factual, objective summary of the contract in 2-4 sentences. "
            "Include: what the agreement is about, the main obligations, and key terms. "
            "Write in third person without pronouns (they/it/this). "
            "Example: 'The agreement establishes a partnership between Company A and Company B for software development services. "
            "Company A will provide development resources while Company B provides project management. "
            "The contract runs for 2 years with a total value of $500,000.'"
        ),
    )
    # other extraction definitions follow...

Test Results

Overall, I like GPT 5.2's summary--a lot. Below is the output of each model I tested, and a quick hot-take on how each compares to the others.

Llama-3.1-8B-Instruct-FP8

Llama 3.1 provided a good summary – definitely great considering the size and low cost to run the model.

This response is a high-level, surface summary that correctly identifies the parties and purpose but lacks the specificity, structure, and legal nuance shown by all the larger models.

💡

This contract is a supply agreement between MediWound Ltd. and Challenge Bioproducts Corporation Ltd. for the supply of Bromelain SP, a raw material used in the production of a pharmaceutical product called Debridase. The agreement was made on January 11, 2001, and amended on February 28, 2010. The main terms of the agreement include the supply of Bromelain SP by CBC to MediWound, the purchase of Bromelain SP by MediWound from CBC, and the payment terms for the supply of Bromelain SP.

gpt-4.1

Next I tried gpt-4.1, which is still a go-to model for the teams I work with. This sits in the middle ground—clearer and more legally framed than Llama-3.1, but less detailed and less contract-aware than gpt-oss-20b and the GPT-5 series.

💡

The agreement is a supply contract between MediWound Ltd. and Challenge Bioproducts Corporation Ltd. (CBC) for the exclusive supply of Bromelain SP, a raw material used in the production of a pharmaceutical product for burn treatment. MediWound will purchase Bromelain SP in bulk from CBC, who will manufacture and deliver it according to specified quality and regulatory standards. The contract includes exclusivity, quality assurance, payment terms, confidentiality, and termination provisions, and is governed by the laws of the State of New York.

gpt-oss-20b

Next up is Open AI's open weight model, which also produced a nice output. Compared to Llama-3.1 and GPT-4.1, this model goes deeper into commercial and operational details (pricing, termination windows, obligations), but is less precise than GPT-5.x in legal structure and conditions precedent.

💡

This Supply Agreement between MediWound Ltd. (Israel) and Challenge Bioproducts Corporation Ltd. (Taiwan) governs the exclusive supply of Bromelain SP, a raw material used to produce the pharmaceutical product Debridase. MediWound pays a $1,234 upfront fee and purchases Bromelain SP at a price of $200 per kilogram, subject to annual quantity‑based pricing, with payment terms, quality acceptance, and inventory obligations. The agreement remains in effect until terminated by either party with six‑month (MediWound) or twenty‑four‑month (CBC) notice, and includes detailed manufacturing, quality, liability, confidentiality, and dispute‑resolution provisions.

gpt-5.1

The premium OpenAI model until today. This response provides a well-balanced, lawyer-style summary with strong operational and regulatory detail, clearly outperforming GPT-4.1 and gpt-oss-20b in completeness while remaining concise.

💡

The agreement is a long-term exclusive supply contract under which Challenge Bioproducts Corporation Ltd. manufactures and supplies Bromelain SP to MediWound Ltd. for use in MediWound’s bromelain-based pharmaceutical product Debridase for burn treatment. MediWound must provide annual forecasts, place purchase orders and pay agreed prices, while CBC must maintain production capacity, comply with cGMP and other regulatory standards, and ensure quality, storage and recall support. The contract includes detailed pricing and adjustment mechanisms, quality assurance and inspection rights, intellectual property and confidentiality protections, and insurance and liability allocations. The agreement was originally executed on January 11, 2001 and amended on February 28, 2010, with New York law and arbitration in New York City governing disputes.

gpt-5.2

Now we finally get to the new frontier model. Compared to gpt-5.1, this version adds the most contract-specific nuance (conditions precedent, exclusivity mechanics, termination triggers), making it the most faithful and legally precise representation of the agreement overall.

💡

The agreement is a supply arrangement under which Challenge Bioproducts Corporation Ltd. (CBC) will manufacture and supply Bromelain SP (a pineapple-stem-derived raw material) to MediWound Ltd. for use in developing and producing the bromelain-based pharmaceutical product Debridase for burn treatment. The agreement becomes effective only after specified conditions precedent are met (execution of a know-how license with LR and execution of a technology transfer agreement), with a termination trigger if unmet by January 31, 2001. CBC grants MediWound an exclusivity restriction preventing CBC and its affiliates from supplying Bromelain SP for competing products, and the parties set pricing, forecasting, acceptance/rejection, quality/regulatory compliance, recall responsibility, insurance, confidentiality, and termination notice rights. The agreement was originally entered into on January 11, 2001 and was amended effective February 28, 2010.

💡

I work at Microsoft, but the views expressed here are my own and do not represent Microsoft's official positions, guidance, or content.

Hot Take on how GPT-5.2 compares in document summarization

Building a GraphRAG App: Concepts, Code, and Demo

Microsoft Ignite 2025 Keynote - Notes From the Field

NVIDIA DGX Spark is the data-center AI lab that fits on your desk, but is it right for you?

Other Resources

My Quick Tire-Kicking Test

Test Setup

Test Results

Llama-3.1-8B-Instruct-FP8

gpt-4.1

gpt-oss-20b

gpt-5.1

gpt-5.2

Vectorizing Images with LLMs

Running OpenAI’s GPT-OSS-20B Locally with Open WebUI (Full Setup Guide)

Unlocking Hidden Insights: Azure OpenAI Meets Fabric in Unstructured Data Analysis

How to Create a Google Gemini Chatbot in 20 lines of Code

Build a Generative AI Copilot for Microsoft Teams using Copilot Studio