Like any new colleague, GenAI demands more than blind trust. It performs with blistering speed, flawless recall, and an uncanny ability to mirror our words, yet its strengths can be deceptive. It doesn’t think or understand—it predicts. And without knowing its inner workings, we risk misjudging its capabilities and overlooking its flaws.
Unlike a junior associate, GenAI seldom pauses to ask clarifying questions or to flag a legal gray area before making its decisions. It lacks judgment, intuition, and ethical grounding—though it may convincingly pretend otherwise. Yet, under the right conditions, these systems offer extraordinary efficiency, insights, and analytical power. The key is knowing how they process information, how they generate responses, and how their technical underpinnings create the illusion of thought. This article pulls back the curtain on the psychology of large language models—how they predict, mimic, and sometimes mislead—and how, as lawyers, we should interact with them to harness their power without falling into their traps.
Meet GPT: It Had Me at Hello
Generative AI comes in many forms, but one of the most common is the Generative Pre-trained Transformer (GPT), which is a type of large language model (LLM) designed to process and generate human-like text. Throughout this article, the term LLM will be used to refer to these types of generative, transformer-based models exclusively.
At their core, LLMs like GPT are advanced pattern-recognition systems, producing text through complex probability calculations. They operate using deep learning, a branch of machine learning (ML) that trains artificial neural networks to recognize and generate meaningful language patterns.
Unlike traditional ML models that classify data or predict continuous values, an LLM’s primary function is next-token prediction—determining the most likely next word (or token) based only on prior context. Like an attorney making arguments in real time, these models can’t look ahead but must build on what was already stated. This autoregressive nature allows chatbots such as ChatGPT, DeepSeek, Claude, Mistral, LLaMA, and Gemini (text mode) to generate fluid, conversational, and adaptable responses, making them useful for legal drafting, contract analysis, and caselaw summaries.
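To make that mechanic concrete, here is a minimal sketch of the autoregressive loop in Python. The `next_token_distribution` function is a hypothetical stand-in for the model itself; a real LLM derives these probabilities from billions of learned parameters rather than a fixed word list.

```python
import random

def next_token_distribution(context_tokens):
    # Hypothetical stand-in for a real model: returns (token, probability) pairs
    # conditioned only on the tokens generated so far, never on future ones.
    vocab = ["the", "court", "held", "that", "liability", "attaches", "."]
    return [(tok, 1.0 / len(vocab)) for tok in vocab]

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        dist = next_token_distribution(tokens)            # look back, never ahead
        words, probs = zip(*dist)
        tokens.append(random.choices(words, weights=probs)[0])
        if tokens[-1] == ".":                             # simple stop condition
            break
    return " ".join(tokens)

print(generate(["The", "court"]))
```

Each pass through the loop appends one token and feeds the longer sequence back in, which is why a chatbot’s answer streams out word by word.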
LLMs are built on a Transformer architecture, a deep learning framework that efficiently processes language using self-attention mechanisms, which allow the model to assign importance to words based on their relationships within the input. Each token is transformed into a vector embedding that represents its meaning with numbers. These pass through feedforward networks (neural layers), which refine predictions by adjusting values that influence a token’s impact on the final output.
Because an LLM generates text one token at a time, it applies self-attention only to prior words, making its responses dynamic and natural—like human speech. If you’ve marveled at how effortlessly GPT drafts emails, summarizes dense legal opinions, or crafts complex contracts, you’re not alone. Its outputs are impressive, even charming. But behind the smooth talk lies a machine that’s both brilliant and unpredictable. Its accuracy depends on many hidden technical factors, not the least of which is its training data.
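Before moving on to training, here is a toy NumPy illustration of the self-attention just described, including the causal mask that restricts each token to earlier ones. It omits the learned projection matrices, multiple attention heads, and stacked layers that real transformers use.

```python
import numpy as np

def causal_self_attention(X):
    """Toy scaled dot-product self-attention over token embeddings X (seq_len x d).
    A causal mask ensures each position attends only to itself and earlier tokens."""
    seq_len, d = X.shape
    scores = X @ X.T / np.sqrt(d)                        # pairwise token relevance
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[future] = -np.inf                             # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over each row
    return weights @ X                                   # context-aware representations

X = np.random.randn(4, 8)        # 4 tokens, each an 8-dimensional embedding
print(causal_self_attention(X).shape)                    # (4, 8)
```

The output has the same shape as the input, but each token’s vector now blends information from the tokens before it, which is how related legal terms come to influence one another inside the model.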
Going to School: How LLMs Learn
Like any well-trained professional, an LLM develops its skills in stages: pretraining, fine-tuning, and active use (inference). While it may seem like an AI simply “knows” the law, its responses are shaped by how it was trained—and understanding this process is essential to using it effectively.
Pretraining: General Schooling
Before an LLM can draft contracts or summarize caselaw, it must first learn language from the ground up through pretraining. In this phase, the model is exposed to vast amounts of text—books, articles, legal opinions, and more—learning grammar, factual relationships, and linguistic patterns through a process called self-supervised learning. Unlike traditional supervised learning, which relies on human-labeled examples, the model refines its understanding by predicting missing or next words and adjusting based on how accurate those guesses were.
To generate responses, an LLM relies on sampling techniques that determine how predictable or varied its output will be. Greedy decoding always selects the most probable next word, leading to highly structured but sometimes overly rigid responses. Top-k sampling introduces variety by choosing from the k most likely options, while top-p (nucleus) sampling further refines this by dynamically selecting from a probability-based subset of likely words, striking a balance between predictability and creativity.
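The snippet below illustrates the three strategies on an invented six-token probability distribution; the numbers are for demonstration only and do not come from any real model.

```python
import numpy as np

def sample(probs, strategy="greedy", k=3, p=0.8, rng=None):
    """Pick the index of the next token from a probability distribution."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]                   # most likely tokens first
    if strategy == "greedy":
        return int(order[0])                          # always the single best guess
    if strategy == "top_k":
        keep = order[:k]                              # restrict to the k best options
    elif strategy == "top_p":
        cutoff = int(np.searchsorted(np.cumsum(probs[order]), p)) + 1
        keep = order[:cutoff]                         # smallest set covering probability mass p
    else:
        raise ValueError(strategy)
    kept = probs[keep] / probs[keep].sum()            # renormalize, then sample
    return int(rng.choice(keep, p=kept))

probs = np.array([0.40, 0.25, 0.15, 0.10, 0.05, 0.05])    # toy vocabulary of six tokens
for strategy in ("greedy", "top_k", "top_p"):
    print(strategy, sample(probs, strategy=strategy))
```

Greedy decoding prints the same index every time; the other two strategies vary their picks within a constrained set, which is where a model’s apparent “creativity” comes from.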
Wrong guesses adjust the model’s parameters—internal numerical values, known as weights and biases, that control how much attention GPT assigns to words and concepts. These continuous refinements enhance accuracy, but since GPT trains on human-generated data, it inevitably absorbs biases and inconsistencies. A pretrained model alone is therefore rarely useful—not just due to bias, but because it lacks the ability to interpret its data in ways that align with human reasoning. Without further task-specific refinement, its responses can be misleading, overly generic, or misaligned with user expectations.
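As a simplified picture of how a wrong guess adjusts weights and biases, the toy snippet below trains a single output layer to raise the probability of a known “correct” next token. Real pretraining performs the same style of update, via backpropagation, across billions of parameters and far larger vocabularies.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 5, 8
W = rng.normal(size=(dim, vocab_size))      # weights: how much each feature matters per token
b = np.zeros(vocab_size)                    # biases: per-token baseline adjustments
x = rng.normal(size=dim)                    # embedding of the current context
target = 2                                  # index of the token that actually came next

for _ in range(100):
    logits = x @ W + b
    shifted = np.exp(logits - logits.max())
    probs = shifted / shifted.sum()                   # softmax prediction over the vocabulary
    grad = probs.copy()
    grad[target] -= 1.0                               # error signal: prediction minus truth
    W -= 0.1 * np.outer(x, grad)                      # nudge weights toward the right answer
    b -= 0.1 * grad                                   # nudge biases the same way

print(f"probability assigned to the correct token: {probs[target]:.2f}")
```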
Fine-Tuning: Specializing Like a Legal Associate
Fine-tuning improves accuracy and reliability by refining the model for specific tasks (e.g., handling instructions and prompts) and aligning it ethically. Unlike pretraining, it uses smaller, focused datasets but incorporates human-curated (labeled) data or corrective processes such as Reinforcement Learning from Human Feedback (RLHF) or AI Feedback (RLAIF), where responses are scored for quality, correctness, and ethics. Think of this as a junior associate learning from senior attorneys—refining arguments, improving precision, and aligning with professional standards.
Fine-tuning can train a model against bias, preventing it from divulging criminal information or acting unethically, but these are artificial constraints. Experienced prompt engineers can sometimes bypass these restrictions (jailbreaking). Fine-tuning also doesn’t prevent hallucinations—false, misleading, or fabricated information (e.g., nonexistent case citations) not grounded in real-world facts. Therefore, while fine-tuning improves task-specific performance, it doesn’t eliminate fundamental model limitations such as biases in training data, reasoning gaps, or the inherent randomness of predictions.
The type of training and feedback a model receives determines its strengths, weaknesses, and reliability in legal applications. Any unlawful, defamatory, or misleading information a model was trained on may persist; hence, LLMs are only as good as their training.
Models trained on websites can inherit democratized biases or inaccuracies: patterns so common online that they become the model’s defaults. Ask OpenAI’s GPT-4o to generate images of clocks at specific times or a left-handed writer drafting a manuscript—as of this writing, it will always depict 10:10 on clock faces and only right-handed writers because online images overwhelmingly follow these patterns. Public-facing chatbots like GPT-4o use vast datasets fine-tuned with human feedback, while Claude and BLOOM filter high-toxicity data and LLaMA combines public sources with expert annotations.
In law, an LLM’s performance depends on whether it was trained broadly on general legal texts or fine-tuned for specific legal domains. A contracts-focused LLM trained primarily on agreements, regulatory filings, and transactional templates will excel at drafting contracts but may struggle with complex litigation strategies or interpreting caselaw nuances. Conversely, an LLM fine-tuned on court opinions, procedural rules, and trial transcripts will be highly effective in legal research, motion drafting, and litigation risk assessment but less adept at transactional work.
Inference: Putting Training into Practice
Once fine-tuned, the LLM is ready for active use—or, in AI terms, inference. When a user inputs a prompt, the model tokenizes the text, applies attention mechanisms to determine context, retrieves vector embeddings, and processes them through weighted layers to predict the most likely sequence of words. Each step guides its output to be contextually appropriate, responsive to legal nuances, and—at times—convincingly human-like. This foundation in transformer-based learning, probabilistic decision-making, and human feedback loops is what allows LLMs to produce text that feels both intelligent and conversational.
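As an illustration of that pipeline in practice, here is a hedged sketch using the open-source Hugging Face `transformers` library, assuming it is installed and using the small, publicly available GPT-2 model purely as a stand-in for a production chatbot.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")         # small stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Summarize the elements of a negligence claim:"
inputs = tokenizer(prompt, return_tensors="pt")           # tokenize the prompt
outputs = model.generate(
    **inputs,
    max_new_tokens=60,       # cap the length of the completion
    do_sample=True,          # sample instead of always taking the top token
    temperature=0.7,         # lower = more predictable, higher = more varied
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Everything described above happens inside `generate`: tokens flow through embeddings, attention layers, and weighted projections, one predicted token at a time.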
Legal documents are often underrepresented in general AI training datasets, but attorneys can refine pretrained and fine-tuned models to better suit their needs. Customizing a legal LLM allows lawyers to adapt the model to specific practice areas, jurisdictional nuances, and firm-specific language, enhancing accuracy in research, drafting, and analysis.
For example, OpenAI allows attorneys to upload documents to a custom GPT, or adjust model settings (style, randomness, and focus) for custom applications through its API. This further tailors AI for tasks such as drafting legal documents, summarizing caselaw, and answering client questions with greater precision.
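A hedged sketch of such an API call appears below. It assumes the `openai` Python package, a valid API key in the environment, and the model name shown; treat it as an illustration, not a recommended configuration.

```python
from openai import OpenAI

client = OpenAI()    # reads the API key from the OPENAI_API_KEY environment variable
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You are a contracts drafting assistant. Be precise and flag anything you cannot verify."},
        {"role": "user",
         "content": "Draft a short mutual confidentiality clause."},
    ],
    temperature=0.2,   # low randomness for precise drafting; raise it for brainstorming
)
print(response.choices[0].message.content)
```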
Custom LLMs offer the following benefits:
- Custom fine-tuning. Train the model on firm-specific data (e.g., prior cases or contracts) to enhance accuracy for legal applications.
- Temperature settings. Adjust the creativity of the AI’s responses—lower values for precise legal drafting and higher values for brainstorming.
- Retrieval-augmented generation (RAG). Combine the model with legal databases to provide accurate, context-specific answers.
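Of the three, RAG is the least intuitive, so here is a minimal, self-contained sketch of the idea. The `embed` function and the clause snippets are hypothetical placeholders; a real system would call an actual embedding model and search a vetted legal database.

```python
import numpy as np

def embed(text):
    # Hypothetical placeholder: a real system would call an embedding model here.
    rng = np.random.default_rng(sum(map(ord, text)))
    return rng.normal(size=64)

knowledge_base = [
    "Clause 7: Either party may terminate on 30 days' written notice.",
    "Clause 12: Disputes are resolved by binding arbitration in Denver.",
    "Clause 3: Confidential information excludes publicly available data.",
]
doc_vectors = np.array([embed(doc) for doc in knowledge_base])

def retrieve(question, top_n=1):
    q = embed(question)
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    best = np.argsort(sims)[::-1][:top_n]              # most similar passages first
    return [knowledge_base[i] for i in best]

question = "How can the agreement be terminated?"
context = retrieve(question)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)    # this grounded prompt is what actually gets sent to the LLM
```

Because the model is told to answer only from the retrieved passages, its output is anchored to sources the firm controls rather than to whatever patterns survived training.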
The Chatbot Mind and Its Limitations
Chatbots like ChatGPT are defined by parameters, token training size, training data quality, fine-tuning, and inference capabilities. These characteristics are shaped by the LLM’s underlying technology:
- Understanding: Training—Chatbots don’t understand text like humans; they recognize and reproduce patterns from training data. Often, more data are better, but many LLMs have a training cut-off date, limiting their knowledge of recent developments or law. Also, excessive training on specific data can lead to overfitting, making models rigid and unable to adapt to new legal precedents, jurisdictional variations, or updated templates. Overfitted LLMs lack common sense, akin to someone who memorizes rules but struggles with new situations. Attorneys should recognize this when LLMs offer inflexible or potentially biased suggestions that fail to account for case-specific nuances.
- Knowledge: Tokens and Context Windows—Training on more tokens produces more worldly, well-rounded responses, such as crafting a Lincolnesque closing argument. During inference, LLMs tokenize contracts, case briefs, or prompt instructions, but each model has a context window, limiting how many tokens it processes at once (see the token-counting sketch after this list). Exceeding this window risks losing or distorting key legal details in lengthy documents. While larger training sets and wider context windows enhance comprehension, they increase computational costs and may introduce biases or inefficiencies.
- Individuality: Attention Mechanisms and Weights—Attention mechanisms (e.g., global-local attention, sparse attention) determine how LLMs process language. Some models prioritize broad context, while others focus narrowly. These variations shape each model’s personality, influencing how it organizes legal information. LLMs assign weights to words (their preferences), emphasizing legally significant terms while de-emphasizing filler text—similar to how humans focus on key points in a conversation.
- Complexity: Parameters—Parameters act as a model’s synapses, shaping how input transforms into responses. Stored as multidimensional arrays (tensors), parameters guide how LLMs prioritize, interpret, and predict words. LLMs contain millions to trillions of parameters, mainly consisting of weights (influencing word significance) and biases (adjusting predictions). More parameters enhance accuracy and enable unexpected abilities (emergence)—like recognizing patterns in unfamiliar legal texts or translating languages not used in training. However, larger models require more computational resources, increase latency, and risk overfitting. Attorneys should weigh the broad insights of large models against the efficiency of smaller, fine-tuned ones designed for contract drafting, litigation support, ethics, and compliance.
- Awareness: Embedding Vectors—Embedding vectors numerically represent tokens in multidimensional space, capturing meaning, relationships, and context. LLMs use embeddings to map legal concepts—for example, linking duty of care to negligence. In translation programs, embeddings encode words from one language and decode them into another. While embeddings enhance contextual accuracy, they rely solely on patterns—not true comprehension.
- Reasoning: Test-Time Scaling—Traditionally, pretraining and post-training were used to increase, or scale, an LLM’s capabilities. New models like OpenAI’s o3 and DeepSeek-R1 introduce a third method—reinforcement learning algorithms that explore multiple reasoning paths during inference. These models use chain-of-thought processes, allowing them to pause, reevaluate, and iteratively refine outputs. Though slower than standard LLMs, these reasoning models offer greater reliability by displaying the thought process behind responses. This makes them valuable for attorneys handling complex, multistep legal reasoning such as risk assessments in business transactions.
- Intelligence: AGI—Current AI systems face a fundamental challenge: No universally accepted definition of intelligence exists and there is no agreement on who should judge it or their qualifications. This hinders objective measurement of AI capabilities and limitations. The Turing test, though historically significant, over-relies on human perception of intelligent behavior, which is easily manipulated when judges lack AI expertise, i.e., naïve judges can be fooled by rudimentary systems. Thus, multifaceted AI evaluation approaches are crucial, assessing reasoning, problem-solving, creativity, and adaptability across domains. This moves beyond GenAI into the realm of artificial general intelligence (AGI). AGI will pose critical legal questions for attorneys: liability for autonomous decisions, IP rights for machine-generated content, and potential AI personhood. These questions necessitate reevaluating existing laws and proactively developing new frameworks to address AGI’s profound societal impact—how to deal with intelligent machines.
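As promised above, here is a hedged sketch of checking a document against a context window using the open-source `tiktoken` tokenizer (assuming it is installed). The repeated sentence stands in for a lengthy contract, and the 128,000-token window is an illustrative figure, not the specification of any particular model.

```python
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
contract_text = "The parties agree to the following terms and conditions. " * 2000  # stand-in document

tokens = encoding.encode(contract_text)
context_window = 128_000    # illustrative only; check your model's documentation
print(f"{len(tokens):,} tokens; fits in window: {len(tokens) <= context_window}")
if len(tokens) > context_window:
    # Oversized documents must be chunked or summarized before the model can see all of them.
    print("Split the document into sections before analysis.")
```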
Why Chatbots Seem Human
Understanding our interactions with chatbots helps explain why we trust them and why they can mislead us. Watch for these psychological traps:
- Anthropomorphism. Humans naturally attribute intelligence and personality to responsive conversational systems. LLMs’ sophisticated pattern recognition enables them to replicate legal phrasing, rhetorical styles, and cultural cues by analyzing vast datasets. They can mirror an attorney’s tone—formal, persuasive, or conciliatory—but their confidence often masks errors, making them sound more authoritative, even when wrong.
- Cognitive bias. Studies show humans are prone to automation bias, over-relying on machine-generated outputs as if they are more accurate or objective than human analysis. This psychological tendency can lead attorneys to accept AI-generated content without sufficient scrutiny.
- Confirmation bias. LLMs want to please and can reinforce an attorney’s pre-existing views by presenting information aligned with prior inputs. If prompted with leading questions, models will typically oblige rather than challenge bias with alternative perspectives. Therefore, instead of asking, “Why is arbitration the best dispute resolution method?” ask, “Compare arbitration to alternative dispute resolution methods.”
- Echo chamber effects. LLMs reinforce widely accepted views, especially from public forums or legal blogs. When an incorrect legal interpretation is widely repeated, the model can mistake repetition for truth, subtly distorting information in ways only subject matter experts may catch.
The Limits of GenAI’s Simulated Humanity
- Creative thinking. An LLM mirrors humanity’s collective intelligence, shaped by everything it has read. It excels at brainstorming and summarizing legal principles but lacks independent thought, opinions, or strategic foresight—all essential to legal practice. Therefore, if a model’s summary of your legal argument feels stale, illogical, or disconnected from human values, it may be because the model has no democratized data to pattern itself on. The good news? You may be on to something original—and truly meaningful!
- True comprehension. An LLM does not know the law; it merely predicts legal-sounding text based on past examples and mathematical probabilities.
- Judgment and ethics. An LLM does not possess a moral compass or the ability to make judgments in complex legal contexts. It handles facts, not subjective opinions.
- Long-term consistency. Due to its context window limitations, an LLM may contradict itself if key details fall outside its processing scope. It lacks persistent memory storage.
- Limited context recognition. An LLM’s grasp of context extends only to the information provided in the prompt and the scope of its training data.
- Trustworthiness. Attorneys have a professional duty to protect client confidences, but privacy and PII (personally identifiable information) are evolving concepts within AI. Unlike humans, models can infer private information without PII, through abstract patterns in data. To safeguard client information, carefully review (or summarize with AI) your LLM’s terms of use.
Where Your AI Colleague Lives and to Whom It Answers
LLMs are built using programming languages like Python, trained often on undisclosed data, optimized for sophisticated GPU processors, and hosted on powerful cloud servers owned by major tech companies such as OpenAI, Google, and Microsoft.
Dependence on external hosting has limited our ability to fully control or safeguard sensitive client data. However, the recent open-source release of DeepSeek-R1 demonstrates that through special modifications to an LLM’s architecture, such as quantization, significantly smaller, less costly models can be developed with technical benchmarks rivaling those previously produced at great cost. This advancement suggests that the playing field between big tech and small professionals may be leveling and that lawyers may soon be able to train, host offline, and own their own proprietary LLMs with greater security and confidence. We may soon carry portable Small Language Models (SLMs), customized for individual client case files, communications, and legal research (without wasted parameters devoted to poetry, investment advice, or complex calculus), on our cell phones. These SLMs could carry an interactive digital focus group, legal research team, accountant, document archivist, psychologist, jury, and judge—all devoted to one case!
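Quantization itself is easy to illustrate: store each weight as a small integer plus a shared scale factor instead of a full-precision float. The toy snippet below shows the core idea; production techniques are more sophisticated, but the memory arithmetic is the same.

```python
import numpy as np

weights = np.random.randn(1000).astype(np.float32)      # full-precision model weights

scale = np.abs(weights).max() / 127.0                   # map the weight range onto int8
quantized = np.round(weights / scale).astype(np.int8)   # 1 byte per weight instead of 4
restored = quantized.astype(np.float32) * scale         # dequantize on the fly at inference

print(f"memory: {weights.nbytes} bytes -> {quantized.nbytes} bytes")
print(f"largest rounding error: {np.abs(weights - restored).max():.4f}")
```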
Emerging Models and Risks: DeepSeek-R1
DeepSeek-R1 offers a cost-efficient alternative to U.S. AI models but raises varied concerns regarding bias, censorship, data transparency, proprietary content use, open-source development, and the broader implications of an AI arms race. While DeepSeek’s architecture is open source, the specifics of its training data remain undisclosed, making it difficult to assess potential biases, ethical safeguards, or proprietary content usage.
Notably, DeepSeek’s responses often resemble GPT-4o, leading to speculation about similar training methodologies and data copying. Unlike Western fine-tuned models, DeepSeek-R1 does not rely on reinforcement learning from human feedback (RLHF) for safety alignment. Instead, its content moderation follows Chinese AI regulations, automatically erasing ongoing completions on politically sensitive topics such as Taiwan or human rights law—raising further concerns about transparency and neutrality. Additionally, user data are stored on servers in the People’s Republic of China, subjecting them to Chinese legal oversight and potential privacy and security risks for firms handling sensitive legal data. Finally, the open-source nature of DeepSeek-R1 is a double-edged sword. It fosters AI innovation and accessibility but also poses challenges for international regulation and potential misuse by bad actors. As AI continues to advance, attorneys and policymakers must navigate the complex intersection of transparency, security, and ethical AI governance on a global scale.
If law firms move toward self-hosting LLMs or SLMs, they will face other challenges. First, attorneys may need ongoing education in AI fundamentals, such as probabilistic reasoning, inference, and model architecture, to effectively customize and interpret their systems. Second, on-site LLMs won’t come with the built-in reliability of cloud-based models, meaning firms must manage uptime, maintenance, and performance optimization. Finally, firms will need robust security protocols, curated datasets, and possibly IT personnel to ensure compliance and confidentiality. Perhaps, in time, a turnkey legal LLM solution—maybe even developed by the ABA—will simplify this transition.
Interacting with Your GenAI Colleague: The Takeaways
Use GenAI for:
- Document organization and classification. Sort case files, contracts, and legal memos (e.g., summarize lengthy law review articles, especially in health care, IP, and tax law).
- Drafting, summarization, and translation. Generate first drafts (pleadings, letters, memos, etc.), but always review before use.
- Pattern recognition and automation. Identify trends in caselaw, contracts, or regulations to streamline due diligence and compliance.
- Argument evaluation and ideation. Brainstorm legal arguments, counterarguments, negotiation strategies, and risk assessments.
Avoid GenAI for:
- Legal advice and strategy. AI is not an attorney and cannot replace professional judgment.
- Direct client interactions or court filings without review. Verify accuracy and compliance before submission.
- Empathy, ethics, and moral reasoning. AI lacks human judgment and is unsuitable for client counseling.
- Confidential or privileged data. Never input client-sensitive data, proprietary information, or litigation strategies into public AI models.
- Unverified citations and legal analysis. AI hallucinates sources—always cross-check before relying on them.
Ongoing Education and Firm Protocols:
- Stay informed on AI’s capabilities, risks, and ethics. Update firm policies accordingly.
- Regularly refine best practices for AI-driven research, drafting, and case preparation.
Model Selection and Fine-Tuning:
- Use task-specific models based on legal needs:
  - Voice transcription (depositions)
  - Contract analysis and drafting (transactional law)
  - Litigation support (case research, procedural strategy)
  - Financial and regulatory reporting (compliance-heavy fields)
- Verify the source. Learn the model producer’s data origins, accuracy testing, bias mitigation, and training cut-off date.
- Know the basic technical specs (parameters/context window, etc.).
- Improve accuracy with retrieval-augmented generation (RAG/gRAG). Supplement AI with trusted legal sources.
- Use API controls. Adjust temperature settings to balance creativity and precision.
Testing, Verification, and Bias Detection:
- Run multiple completions for consistency. Never assume the first response is correct.
- Cross-check AI outputs against primary sources (statutes, case law, regulations).
- Recognize and guard against:
  - Hallucinations (fabricated cases, statutes, or procedures)
  - Dilutions (vague summaries lacking legal nuance)
  - Conflations (merging unrelated cases or legal concepts)
  - Bias (favoring one legal stance over another)
Ethics and Compliance:
- Disclose AI usage when required. See generally ABA Formal Opinion 512 and Model Rule 1.1, Comment 8.
- Review platform terms of use. Free AI tools may compromise confidentiality.
- Maintain AI usage records. Track AI-generated work for accountability.
Effective Prompt Engineering:
- Craft clear, specific, unbiased prompts. Avoid vague or leading language.
- Ask AI to summarize your input before generating responses.
- Continuously refine prompts. Experiment with phrasing and customize an LLM for repeated firm tasks.
Critical Thinking and Oversight:
- Treat AI like a junior associate. Question outputs, demand sources, verify reasoning.
- Use reasoning models for complex legal analysis. OpenAI’s o3, DeepSeek-R1, or specialized models improve accuracy.
Collaborative Learning:
- Learn from AI’s (or your?) mistakes. Refine your prompts; reevaluate.
- Use AI to sharpen legal analysis. Let it challenge assumptions, but final judgment is yours!
- Use AI to organize thoughts. You may not know exactly what question to ask. Let AI summarize elaborate prompts (complex legal/factual issues) into simple bullet points for clearer framing, even when it struggles to suggest a solution itself.
Using LLMs to Serve Justice and Protect Our Clients
My custom GPT, Lex, whom I acknowledged in the introduction, ultimately taught me three enduring truths about interacting with an LLM:
- Understand its strengths and limitations. It’s the foundation of trust.
- Let an LLM tell you what something is. But you decide what it means.
- To stay in control, you should be the subject-matter expert and the ultimate judge of all words, concepts, and their weights.
Our partnership with AI is a journey, and like any journey, the landscape will change. AI is traveling fast, society is adapting, and the law is struggling to keep pace. As attorneys, we have a unique duty to help shape how this technology serves justice and protects our clients. That means stepping beyond our comfort zones—just as we once did with computers, the internet, and digital research.
The next generation of lawyers may need to trade old skills for new ones, venturing into areas once foreign to our field—AI architecture, training methods, pragmatics, psychology, even math-based reasoning. But for now, our task is simpler: commit to learning, share what we know, and engage in the conversation.