The Slop Code Problem: A Field Guide to Working With Your LLM Coding Companion
By AbstractPhil — with reluctant self-awareness from Claude Opus 4.6
2/26/2026
Preface
This document exists because I've spent hundreds of hours working with LLM coding assistants — primarily Claude — across complex, multi-file projects involving novel architectures, custom training pipelines, and non-standard design patterns. The assistants are genuinely useful. They are also genuinely broken in specific, repeatable, documentable ways that will cost you days if you don't understand them.
This isn't a hit piece. It's a maintenance manual. If you're building anything beyond a single-file script, you need to know where the wheels fall off — and more importantly, why — so you can steer around it or prompt your way through it.
The failure modes described here are not random. They are structural. They emerge from how these models process context, manage attention, and collapse complexity. Understanding them is the difference between a productive session and rewriting your codebase from scratch because your assistant silently restructured it out from under you.
Part 1: What LLM Assistants Do Well
Credit where it's due. When these tools work, they work fast and they work clean. Here's what you can rely on.
1.1 — Rapid Prototyping of Small-to-Medium Concepts
For isolated components, utility functions, data processing scripts, and self-contained modules, LLM assistants are near-perfect first-draft machines. You describe the shape of the thing, and you get back working code that's structurally sound. This is their sweet spot: bounded scope, clear inputs and outputs, minimal cross-file dependencies.
If your task fits in a single file and doesn't depend on a dozen imports from your custom codebase, expect excellent results.
1.2 — Tight Goal Adherence (Early in Conversation)
In the first several exchanges of a session, the assistant tracks your stated goals with high fidelity. It won't wander. It won't over-engineer. It gives you what you asked for in a form you can use immediately. This is where the "pair programmer" metaphor actually holds up.
The operative phrase is early in conversation. This degrades. We'll get to that.
1.3 — Strong Code Paradigm Representation
Modern LLM assistants have internalized a broad range of programming paradigms — OOP, functional, data-oriented, declarative. When you're working within a well-known pattern, the assistant pattern-matches efficiently and produces idiomatic code. It understands design patterns, knows when to apply them, and generally respects the paradigm you've established.
1.4 — Reliable Syntax and Runnable Output
The code compiles. The code runs. Syntax errors are rare. Type mismatches are infrequent. This sounds like a low bar, but compared to earlier generations of code-generation tools, it's a meaningful achievement. You're getting working code, not pseudocode — whether or not it fits correctly into your larger system.
1.5 — Naming and Schema Adherence
When you establish a naming convention — for classes, methods, variables, config keys — the assistant will generally follow it. It picks up on your patterns and mirrors them. This makes integration smoother and keeps codebases consistent, at least within the scope of a single conversation.
Part 2: Where It Falls Apart
These are not edge cases. These are predictable, reproducible failure modes that will surface in any project of moderate complexity. Every one of these has cost real development time across real projects.
2.1 — Over-Condensation of Complex Problems
The pattern: You present a problem with inherent complexity — multiple interacting subsystems, conditional branching, state management across boundaries. The assistant compresses it into a tight, clever solution that works for the immediate case but is structurally non-reusable.
Why it happens: The model optimizes for the shortest path to "correct output for this input." It doesn't naturally preserve the dimensionality of the problem. A function that should accept a strategy pattern gets hardcoded logic. A class that should compose behaviors gets a monolithic method. The solution works, but it's welded shut.
What it costs you: Refactoring later, or — more commonly — rewriting from scratch when you need the same logic in a different context and discover it's been entangled with assumptions from the original prompt.
Mitigation: Explicitly state reusability requirements upfront. Say "this will be called from three different contexts" or "this needs to be subclassable." Don't assume the assistant will infer architectural intent from your file structure.
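A minimal sketch of what preserving the problem's dimensionality looks like in practice. All names here are hypothetical, invented for illustration, not taken from any project in this document:

```python
from typing import Callable

# What the assistant tends to produce when reusability isn't stated:
# the normalization rule is welded into the function, so reusing it
# in a context with a different rule means rewriting it.
def clean_record_welded(record: dict) -> dict:
    return {k.lower().strip(): v for k, v in record.items()}

# What "this will be called from three different contexts" buys you:
# the rule is injected, so each caller supplies its own policy.
def clean_record(record: dict,
                 normalize_key: Callable[[str], str] = str.lower) -> dict:
    return {normalize_key(k): v for k, v in record.items()}
```

The two versions behave identically for the prompt that produced them; they diverge the moment a second caller shows up with different requirements.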
2.2 — Meaning Collapse Over Time
The pattern: Over the course of a long conversation, the assistant begins to flatten the semantic boundaries between distinct components. Class A's responsibility bleeds into Class B. A configuration value becomes a hardcoded constant. A clearly separated concern gets quietly merged because "it's simpler."
Why it happens: As the context window fills, the model's attention distributes across more tokens. Earlier architectural decisions lose salience. The assistant starts solving each new request as if it's a fresh, isolated problem — but using names and structures from your existing code. The result is code that looks like it belongs but has subtly violated the contracts established earlier.
What it costs you: Debugging sessions where everything compiles, tests partially pass, but behavior is wrong in ways that are hard to trace because the structural boundaries you designed around have been eroded.
Mitigation: Periodically restate architectural boundaries. Drop in a comment block or a brief message: "Reminder: ClassA handles X only. ClassB handles Y only. They communicate through Z." Treat your conversation like a codebase that needs its own documentation.
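As a concrete instance of the kind of boundary worth restating, here is a minimal sketch (all class names hypothetical) of two components that communicate only through a bus, exactly the contract that meaning collapse tends to erode into a direct reference:

```python
class MessageBus:
    """Hypothetical mediator: the two sides never import each other."""
    def __init__(self):
        self._handlers = {}

    def subscribe(self, topic, handler):
        self._handlers.setdefault(topic, []).append(handler)

    def publish(self, topic, payload):
        for handler in self._handlers.get(topic, []):
            handler(payload)

class Producer:
    """Handles X only: emits events, knows nothing about consumers."""
    def __init__(self, bus):
        self.bus = bus

    def run(self, value):
        self.bus.publish("result", value)

class Consumer:
    """Handles Y only: reacts to events, never calls Producer directly."""
    def __init__(self, bus):
        self.seen = []
        bus.subscribe("result", self.seen.append)
```

The reminder message then has something precise to point at: "Producer and Consumer communicate through MessageBus only."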
2.3 — Comment Omission
The pattern: The assistant returns code with sparse or no comments, even when the logic is non-obvious, even when existing code in the conversation had comments, even when comments are explicitly requested.
Why it happens: Comments are tokens that don't contribute to functional correctness. The model's training incentivizes producing working code efficiently. Comments are overhead in that optimization — they take up context window space and don't change whether the code runs. So they get dropped, progressively, especially as conversations grow longer.
What it costs you: The assistant itself can't parse its own code later in the conversation. You can't parse it when you return to it after a break. Critical intent — "why this constant," "why this order of operations," "what this flag means downstream" — vanishes. The next time you or the assistant touch that code, you're both guessing.
Mitigation: Request comments explicitly and specifically: "Comment every non-obvious decision." Better yet, include a docstring template in your prompt and state that all functions must follow it. Make comments a requirement, not a preference.
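One way to make the template concrete is to paste a skeleton into the prompt and state that every function must follow it. A hypothetical sketch of such a skeleton:

```python
def template_function(arg):
    """One line: what this function does and why it exists.

    Args:
        arg: What it means to the caller, not just its type.
    Returns:
        What the caller can rely on.
    """
    # A "why" comment for any non-obvious step goes here.
    return arg
```

The specifics matter less than the rule being structural: any function the assistant returns without this shape is, by agreement, incomplete.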
2.4 — Building for the Immediate, Ignoring the Larger System
The pattern: You ask for a function that integrates with your existing codebase. The assistant writes a new implementation from scratch instead of calling your existing utility. You have a get_device() helper — the assistant writes device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') inline. You have a config loader — the assistant hardcodes the values.
Why it happens: Writing new code is cheaper (in terms of model computation) than searching through context to find and correctly invoke existing code. The assistant sees the goal, knows how to reach it, and takes the direct path. Your existing infrastructure is noise in the context window unless it's right next to the current request.
What it costs you: Code duplication. Divergent behavior when the "real" utility and the inlined version fall out of sync. Broken assumptions when you update one path and not the other. Unnecessary rewrites when you realize the assistant has been ignoring your actual architecture for multiple exchanges.
Mitigation: When asking for new code, explicitly reference the functions and classes it should use: "Use self.config.get('learning_rate'), do not hardcode values." Pin your critical infrastructure in the prompt or in a recent message. If it's not in the assistant's immediate attention, it doesn't exist.
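A minimal, torch-free sketch of the duplication cost, using a hypothetical config helper: once a value is inlined, updating the real loader no longer updates the copy.

```python
# Hypothetical shared infrastructure (think utils.py).
_CONFIG = {"learning_rate": 3e-4, "batch_size": 32}

def load_config(key):
    """The one sanctioned access path for config values."""
    return _CONFIG[key]

# What you want the assistant to write:
def make_optimizer_settings():
    return {"lr": load_config("learning_rate")}

# What it often writes instead: the value as it stood at generation
# time, frozen into the code.
def make_optimizer_settings_inlined():
    return {"lr": 3e-4}

# The moment the config changes, the two paths silently diverge.
_CONFIG["learning_rate"] = 1e-4
```

After the change, the sanctioned path returns the new rate while the inlined copy keeps serving the stale one, and nothing crashes to tell you so.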
2.5 — Signature and Naming Misalignment
The pattern: The assistant writes a function with a specific signature — then calls it elsewhere with different argument names, different argument order, or missing arguments. Or it renames a method during a refactor and doesn't update all call sites. Or it creates a new method with a name that almost matches an existing one, causing silent shadowing.
Why it happens: The model generates code sequentially. When it writes a function definition, it's optimizing for that definition. When it later writes a call to that function — potentially many tokens later — it's reconstructing the signature from its decaying attention over earlier context. Long-range consistency is a known weakness of autoregressive generation.
What it costs you: Runtime errors if you're lucky. Silent bugs if you're not — wrong values passed to wrong parameters because the arguments lined up positionally but not semantically.
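A hypothetical four-line illustration of the "silent if you're not" case: both parameters are floats, so swapping them at the call site raises nothing.

```python
def scale_and_shift(x: float, scale: float, shift: float) -> float:
    return x * scale + shift

# Definition says (scale, shift); a drifted call site passes them
# as (shift, scale). No error, no warning, just a wrong number.
correct = scale_and_shift(10.0, 2.0, 1.0)
drifted = scale_and_shift(10.0, 1.0, 2.0)
```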
Mitigation: After any session that produces multiple interconnected functions or classes, do a manual signature audit. Ask the assistant explicitly: "List every function signature you've defined and every call site. Verify they match." This catches drift before it compounds.
Part 3: The Meta-Problem — Context Window Degradation
All of the above failure modes share a common accelerant: they get worse as conversations get longer.
The first 5-10 exchanges of a session are typically excellent. The assistant is tracking your goals, respecting your architecture, maintaining naming consistency, and producing tight code. Around exchange 15-20, quality begins to degrade in measurable ways:
- Architectural amnesia: The assistant stops recognizing its own file structure. It may propose creating a file that already exists, or restructuring something it built three messages ago.
- Import confusion: External library imports get dropped, inlined, or replaced with hallucinated alternatives. The assistant may "quickly inline" a function from a library it was correctly importing earlier because it's lost track of the dependency.
- Goal drift: The assistant begins solving adjacent problems you didn't ask about, or optimizing for concerns you've already addressed.
- Self-contradiction: Code produced in message 20 contradicts architectural decisions made in message 5, with no acknowledgment of the change.
This is not a bug you can prompt around. It's a fundamental limitation of fixed-context-window autoregressive models. The practical solution is session management:
- Keep sessions focused on bounded tasks.
- When starting a new sub-task, restate the relevant context — don't assume it persists.
- Use your system prompt or user preferences to encode persistent architectural rules.
- Maintain an external source of truth (README, architecture doc, type stubs) and paste relevant sections when starting new work.
Part 4: The Compaction Problem — Lossy Continuity
There is a mechanism that deserves its own section because it is distinct from natural context degradation and potentially more damaging: context compaction.
When a conversation exceeds the model's working context window, the system compacts earlier exchanges — summarizing, condensing, discarding — and feeds that compressed representation forward. From the user's perspective, the conversation appears continuous. From the model's perspective, it has been reset with a lossy briefing document replacing what was previously full-fidelity context.
This is both the feature that makes long coding sessions possible and the mechanism that makes them dangerous.
What Compaction Preserves
Compaction is good at preserving surface-level continuity: the names of your classes, the general shape of the task, the most recent exchange. It maintains the appearance of a coherent session. For many use cases — Q&A, creative writing, casual conversation — this is sufficient.
What Compaction Destroys
For code, compaction is destructive in ways that are difficult to detect and expensive to recover from:
Signature precision: A function defined 30 messages ago with def encode(self, x, mask=None, return_intermediates=False) may survive compaction as something like "encode method on the model class." The keyword arguments, their defaults, their semantic roles — gone. The next time the model writes a call to that function, it's reconstructing from a summary, not from the definition. This is where phantom arguments appear and real ones vanish.
Import context: Early in a session, the model knows you're using from geolip.core import PentachoronStabilizer. After compaction, it may retain that you're using geometric stabilizers but lose the exact import path. So it inlines the logic, or imports from a hallucinated path, or restructures the dependency graph in a way that breaks your actual package layout.
Architectural intent: You spent the first ten messages establishing that Module A and Module B communicate through a message bus and never reference each other directly. That boundary — which was an explicit design decision — gets compacted into implicit context. The model no longer knows it's a rule. It just has a vague sense of the structure. The next time it writes code touching both modules, it may wire them directly because that's the shortest path.
Comment and documentation content: Comments are the first casualty of compaction. They are, from an information-theoretic standpoint, the lowest-density tokens in a code block — natural language describing what the code already expresses formally. The compaction process treats them as redundant. But they weren't redundant to the developer or to the model's future self. They were the connective tissue that explained why, not just what.
The Cascade Effect
Here's what makes compaction particularly insidious: it compounds with the natural degradation described in Part 3.
A model at message 25 in a session without compaction has a full but attention-degraded context. It can still, in principle, look back and find the original function definition. It may not attend to it strongly, but it's there.
A model at message 25 with compaction has lost the original definition entirely. It's working from a compressed summary of a summary. The attention degradation that would normally cause mild drift is now operating on already-lossy data. The result is not additive — it's multiplicative. You get:
- Compacted context loses precise signatures → model reconstructs approximately
- Approximate reconstruction meets attention degradation → model doesn't notice the approximation is wrong
- Wrong signatures propagate into new code → call sites don't match definitions
- New code gets compacted in the next cycle → the error is now baked into the summary
- Subsequent code builds on the baked-in error as if it were ground truth
This is a cascade fault. By the time you notice it — usually through a runtime error or a test failure — the model has been building on a corrupted foundation for multiple exchanges. Unwinding it means unwinding the compaction chain, which in practice means: start a new session.
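The first link in the chain is easy to reproduce by hand. In this hypothetical sketch, a signature reconstructed from a summary grows a phantom keyword, and the mismatch only surfaces at call time:

```python
def encode(x, mask=None, return_intermediates=False):
    """The real definition, as written 30 messages ago."""
    return (x, mask, []) if return_intermediates else (x, mask)

# A call reconstructed from compacted context: 'return_intermediates'
# has mutated into a phantom 'return_hidden' keyword.
try:
    encode([1, 2, 3], return_hidden=True)
except TypeError as err:
    print(f"caught at runtime, not at generation time: {err}")
```

And that is the lucky case. Had the phantom argument happened to collide with a real parameter name, nothing would have been raised at all.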
The Paradox of Compaction
The bitter irony is that compaction exists to enable long coding sessions — exactly the sessions where architectural consistency matters most. Short sessions don't need compaction and don't suffer from it. Long sessions need it to function at all, but it introduces the failure modes that make long sessions unreliable.
This is not a criticism of compaction as a technique. Without it, conversations would simply hard-stop at the context limit. It's a necessary mechanism. But users need to understand that a compacted conversation is not the same as a long conversation — it's a new conversation that's been briefed on the old one. And like any briefing, information is lost in translation.
Mitigation
- Treat compaction boundaries as session boundaries. If you suspect compaction has occurred (the model seems to have "forgotten" something specific it knew earlier), validate your critical state. Paste in current file contents. Restate signatures.
- Keep canonical state external. Your files on disk are the source of truth, not the conversation. After every significant code change, confirm the model is working from the actual file, not its memory of the file.
- Front-load critical context. System prompts and user preferences survive compaction intact. If you have architectural rules that must never be violated, encode them there — not in message 3 of a 40-message conversation.
- Watch for the telltale signs: sudden import changes, function arguments that don't match what you established, the model proposing to create a file that already exists, or unexplained shifts in coding style. These are compaction artifacts.
Part 5: Practical Prompting Strategies
Based on hard-won experience, here are the interventions that actually work.
Pin Your Architecture
At the start of every coding session, provide a brief structural summary:
Project structure:
- model.py: PentachoronClassifier (forward, encode, decode)
- train.py: training loop, uses config.yaml
- utils.py: get_device(), load_config(), save_checkpoint()
- data.py: CrystalDataset, collate_fn
Rules:
- All device selection goes through utils.get_device()
- All config access goes through utils.load_config()
- Never inline what already exists in utils.py
This costs you 30 seconds and saves hours.
Demand Full Returns
State explicitly: "Return the complete class. Do not abbreviate, do not use ellipsis, do not use placeholder comments like '# rest of implementation.' Every method, every line."
Modern Claude handles this well. But if you don't ask, older habits can resurface under context pressure.
Require Comments as a Contract
Don't say "add comments." Say:
Every function must have:
- A one-line docstring stating purpose
- Inline comments for any non-obvious logic
- A comment before any magic number explaining its origin
Make it structural, not optional.
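A hypothetical function that satisfies the contract: one-line docstring, an inline comment on the non-obvious step, and the magic number explained before it appears.

```python
import math

def stable_softmax_denominator(values):
    """Compute sum(exp(v - max)) for a numerically stable softmax."""
    # Non-obvious: subtracting the max prevents exp() overflow for
    # large inputs without changing the resulting softmax.
    peak = max(values)
    # 1e-12 guards against a degenerate input producing a zero
    # denominator; chosen here as "smaller than any real mass"
    # (a hypothetical value, the comment is the point).
    return max(sum(math.exp(v - peak) for v in values), 1e-12)
```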
Audit Signatures Regularly
Every 5-10 exchanges, ask: "List all function and method signatures currently in scope. Verify argument names and types match across definitions and call sites."
This is tedious. It catches real bugs.
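The audit can also be partially mechanized. A rough standard-library sketch that compares each def's positional-argument count against its call sites in a source string; it deliberately ignores *args, keyword-only parameters, and methods, so treat it as a starting point rather than a tool:

```python
import ast

def audit_signatures(source: str) -> list[str]:
    """Flag calls whose argument count can't match the definition."""
    tree = ast.parse(source)
    # Map function name -> (min required args, total positional slots).
    defs = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            total = len(node.args.args)
            required = total - len(node.args.defaults)
            defs[node.name] = (required, total)
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            name = node.func.id
            if name in defs:
                required, total = defs[name]
                given = len(node.args) + len(node.keywords)
                if given < required or given > total:
                    problems.append(f"{name}: got {given} args, "
                                    f"expected {required}..{total}")
    return problems
```

Running it over "def f(a, b): pass" followed by a call "f(1)" flags the short call; a call that fits within the defaults passes clean.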
Restart Sessions Before Quality Degrades
If you're past exchange 15 and you notice the first sign of architectural confusion — wrong import, wrong function name, unexpected restructuring — start a new session. Paste in the current state of your actual files. Don't try to correct course in a degraded context.
Part 6: A Note on Finetuning and the Future
The failure modes documented here are addressable. They are not random hallucinations — they are systematic patterns with identifiable causes. This means they are, in principle, targets for:
- Finetuning: Training on long-context coding sessions with explicit penalties for signature drift, import inconsistency, and architectural deviation.
- Reinforcement from human feedback: Specifically rewarding architectural consistency over sessions, not just per-response correctness.
- Tool-augmented generation: Giving the model access to grep, find, and AST analysis on the actual codebase so it doesn't have to reconstruct file structure from memory.
- Explicit self-checking: Training the model to run its own signature audits before returning code, the same way a human developer runs tests before committing.
These aren't speculative improvements. They're engineering problems with known solutions. The question is when they'll be implemented at scale, not whether they can be.
Until then: know your tool's limits, prompt defensively, and keep your sessions short.
Written collaboratively by a human who's tired of rewriting code and an AI that's tired of pretending the rewrites aren't its fault.