Why Copilot isn't the answer for legal teams

Written by Andrew Mellett | 26/05/2026 3:29:45 AM

Microsoft Copilot has had more than two years on the market, the backing of over $80 billion in AI infrastructure spend, and access to 450 million Microsoft 365 seats. It is the most aggressively distributed AI product in enterprise software history. And only 3.3% of those seats are paying for it.

That is not a distribution problem. Microsoft has the best enterprise distribution in the world. It is a value problem, and for in-house legal teams specifically, it is a structural one.

The data on Copilot performance makes uncomfortable reading for anyone who has already rolled it out, or is being asked to. Accuracy NPS has fallen to negative 19.8, down from negative 3.5 in just six months. Distrust is the primary reason for churn, cited by 44.2% of lapsed users. Paid market share in the US dropped 39% between July 2025 and January 2026. And perhaps most strikingly, Microsoft's own terms of use describe Copilot as being for entertainment purposes only, with an explicit warning not to rely on it for important advice.

For consumer use, that disclaimer is acceptable. For a general counsel advising on a commercial contract, a regulatory filing, or a board matter, it is not. And yet many legal teams are treating Copilot as their AI strategy.

The core problem: horizontal tools cannot solve vertical problems

Why generalist AI fails at legal work

What it is

Copilot is a horizontal tool. It is designed to function across the full surface area of Microsoft's product suite: Word, Outlook, Teams, Excel, SharePoint. Its architecture reflects a product decision to be broadly useful rather than deeply accurate in any particular domain. That breadth is its commercial selling point. It is also why it fails legal work.

Legal work is not horizontal. It is deeply vertical, with domain-specific requirements that a general-purpose assistant is structurally unsuited to meet. A contract review is not a summarisation task. It requires understanding of clause hierarchy, risk allocation, governing law, standard market positions, and the specific commercial context of the relationship. An advice memo requires knowledge of relevant legislation, regulatory precedent, and the organisation's risk tolerance. A promotional compliance check requires accurate interpretation of 180-plus laws across multiple jurisdictions.

Why it stalls teams

The failure mode for horizontal AI in legal is not dramatic. Lawyers use the tool, find it occasionally useful for low-stakes tasks, and quietly stop relying on it for anything that matters. The tool remains in the technology stack, is counted in the budget, and is pointed to as evidence of AI adoption. But it is not changing how legal work gets done.

The more dangerous failure mode is lawyers using horizontal AI for substantive legal work and not recognising the outputs as unreliable. A contract clause flagged as standard by Copilot may not be standard for your industry, your counterparty, or your risk posture. The AI does not know. It has no reference point for what acceptable looks like in your context. And the output looks authoritative even when it is not.

This creates a specific risk that is worse than not using AI at all: the illusion of having been checked.

What high-performing teams do

The legal teams generating real, measurable value from AI have made a clear architectural decision: they use horizontal tools for horizontal tasks and purpose-built tools for legal work.

Separate the use cases explicitly. Copilot or similar tools are appropriate for drafting internal communications, summarising meeting notes, and managing personal productivity. They are not appropriate for contract review, legal advice, regulatory compliance assessment, or any output that will be acted on without expert review.
Set accuracy requirements before selecting tools. Define, for each use case, what accuracy rate is required for the output to be usable. Legal advice outputs typically require very high accuracy thresholds. Horizontal tools cannot reliably meet those thresholds against your organisation's specific legal context.
Evaluate AI against your actual work, not vendor benchmarks. The question is not how the tool performs on synthetic test cases. It is how it performs on the contracts, queries, and compliance questions your team handles every day.

The context gap: why your organisation's knowledge matters

What purpose-built legal AI does differently

What it is

The fundamental difference between horizontal AI and purpose-built legal AI is context. Copilot knows a great deal about language. It knows nothing specific about your organisation. It does not know your standard positions on limitation of liability. It does not know the risk appetite your board has established. It does not know that you always reject indemnification clauses structured in a particular way. It cannot apply your playbooks, because it has never seen them.

Purpose-built legal AI is designed around the premise that your organisation's knowledge is the most valuable input. It accumulates context across every matter, every clause position, every escalation decision, and applies that accumulated knowledge to the next piece of work. After twelve months of use, it knows your organisation's legal work in a way that no horizontal tool ever can.

Why it stalls teams

Teams that have invested in horizontal AI often resist the transition to purpose-built tools because the sunk cost of the Microsoft licence makes the investment feel covered. The legal function has already been counted internally as part of the Copilot rollout, so admitting that the tool is not meeting legal's needs requires a difficult internal conversation about the distinction between productivity AI and legal AI.

What high-performing teams do

Make the distinction between productivity AI and legal AI explicit and formal. Document it in the AI strategy. Copilot has a role. That role is not legal work.
Evaluate purpose-built platforms on context accumulation as a primary criterion. Ask vendors: after twelve months of use, what does the system know about our organisation that it did not know on day one? How is that knowledge applied to new matters?
Audit the accuracy of AI-assisted outputs over time. Teams that track accuracy rates build evidence-based confidence in purpose-built AI and evidence-based caution about horizontal tools.
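
For teams that want to make the accuracy audit concrete, the following is a minimal sketch of how reviewed outputs could be tallied per tool and use case and compared against a required accuracy rate. Everything in it is an assumption for illustration: the record fields, the tool labels, the threshold values, and the sample data are hypothetical and do not come from any survey or vendor.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedOutput:
    """One AI-assisted output that a lawyer has checked (hypothetical record)."""
    tool: str       # illustrative label, e.g. "tool_a"
    use_case: str   # e.g. "contract_review"
    correct: bool   # the reviewer's verdict


# Hypothetical thresholds; each team would define its own per use case.
REQUIRED_ACCURACY = {"contract_review": 0.95, "meeting_summary": 0.80}


def accuracy_report(outputs):
    """Return {(tool, use_case): (accuracy, meets_threshold)}."""
    totals = defaultdict(lambda: [0, 0])  # key -> [correct, reviewed]
    for o in outputs:
        counts = totals[(o.tool, o.use_case)]
        counts[0] += int(o.correct)
        counts[1] += 1

    report = {}
    for (tool, use_case), (correct, reviewed) in totals.items():
        accuracy = correct / reviewed
        threshold = REQUIRED_ACCURACY.get(use_case)
        # A use case with no defined threshold is treated as not meeting it.
        meets = threshold is not None and accuracy >= threshold
        report[(tool, use_case)] = (accuracy, meets)
    return report


# Made-up sample data, purely to show the shape of the report.
sample = (
    [ReviewedOutput("tool_a", "contract_review", True)] * 7
    + [ReviewedOutput("tool_a", "contract_review", False)] * 3
    + [ReviewedOutput("tool_b", "contract_review", True)] * 10
)

for key, (accuracy, meets) in accuracy_report(sample).items():
    status = "meets threshold" if meets else "below threshold"
    print(key, f"{accuracy:.0%}", status)
```

The design choice worth noting is that thresholds are set per use case before any tool is measured, so the audit tests each tool against the team's own requirement rather than against another tool.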

The risk of doing nothing: AI bush lawyers

What happens when legal does not provide a governed alternative

What it is

The alternative to deploying a governed legal AI platform is not the absence of AI in legal work. Business teams are already using whatever AI is available to them to answer legal questions, review contract language, and generate compliance assessments. The question is not whether AI will be used for legal work in your organisation. It is whether that AI will be governed.

Consumer AI tools are being used right now, by people across your business, to get answers to questions that should be routed through legal. When legal does not provide a sanctioned, accurate, governed alternative, the vacuum is filled by ChatGPT, Copilot, or whatever the individual has access to. The advice generated may be wrong. It will be acted on anyway.

Why it stalls teams

The AI bush lawyer problem is difficult to address through prohibition. Telling the business not to use AI without providing an alternative creates resentment and drives the behaviour underground. The prohibition is unenforceable at scale. The only effective response is to provide a better alternative.

What high-performing teams do

Treat the deployment of governed legal AI as a risk management intervention, not just an efficiency play. The business case includes the risk reduction from replacing ungoverned AI use with governed AI use.
Deploy self-service legal AI for the most common business queries. When a marketing manager can get an accurate, governed answer to a standard promotional compliance question in two minutes, they stop finding their own answer in ChatGPT.
Communicate the why to the business. Teams that explain the distinction between governed and ungoverned AI, and provide the governed alternative, see significantly higher uptake than teams that simply announce a new tool.

Source: Plexus Future-Ready General Counsel 2026 Survey, n=150 General Counsels, January 2026. External citations: Thomson Reuters Generative AI in Professional Services Report 2025; ACC/Everlaw GenAI Survey 2025, n=657; Gartner Legal and Compliance Leader research 2025.

Ready to find out where your team sits on the maturity spectrum? Take the AI maturity assessment or explore the Plexus platform.
