mlinter: a linter for Transformers modeling files

Community Article Published April 22, 2026

transformers is a big library. It ships hundreds of model architectures, each living in its own modeling_<model>.py and configuration_<model>.py. That single-file policy is a feature: anyone can open one file and read the entire model end-to-end, no inheritance tree spelunking. But it comes with a cost.

Every new model is an opportunity to drift from the conventions that make the rest of the library work.

These conventions are the naming contracts that let AutoModel resolve the right class, the initialization hooks that weight-tying and device maps depend on, and the patterns that pipeline parallelism assumes.

Some of these conventions were documented; many still lived in reviewers' heads. A model would merge cleanly, pass tests, and only break later:

  • pipeline parallelism replaces a submodule with nn.Identity, and a forward pass accessing decoder_layer.attention_type crashes on non-first stages

Each such failure took a human to notice, often days after the code merged. The fix was usually one line; finding it cost hours.

mlinter is what happens when you move that knowledge out of reviewer memory and into static analysis. And it doesn't just help human reviewers. It's designed so that coding agents can enforce, explain, and even extend these rules themselves.

What it is

mlinter is a standalone linter for modeling_*.py, modular_*.py, and configuration_*.py files in the Transformers repo. It runs on pure Python AST — no torch, no tensorflow, no runtime imports — and enforces a growing catalogue of structural rules. Each rule has an ID in the TRF### namespace, a one-line description, a why_bad explanation, and a before/after diff showing the fix.
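To make "pure Python AST" concrete, here is a toy version of a post_init check in the spirit of TRF013, built on nothing but the standard-library ast module. The function and class names are illustrative, and the real rule is certainly more thorough, but the shape is the same: parse the source, walk the tree, report violations.

```python
import ast

def check_post_init(source: str, path: str) -> list[str]:
    """Toy TRF013-style check: flag __init__ methods on classes whose
    name ends in 'Model' that never call self.post_init()."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.ClassDef) and node.name.endswith("Model")):
            continue
        for item in node.body:
            if isinstance(item, ast.FunctionDef) and item.name == "__init__":
                calls_post_init = any(
                    isinstance(sub, ast.Call)
                    and isinstance(sub.func, ast.Attribute)
                    and sub.func.attr == "post_init"
                    for sub in ast.walk(item)
                )
                if not calls_post_init:
                    violations.append(
                        f"{path}:{item.lineno}: TRF013: "
                        f"{node.name}.__init__ does not call self.post_init()."
                    )
    return violations

bad = "class AcmeModel:\n    def __init__(self):\n        self.x = 1\n"
good = bad + "        self.post_init()\n"
print(check_post_init(bad, "modeling_acme.py"))   # one TRF013 violation
print(check_post_init(good, "modeling_acme.py"))  # []
```

Because nothing here imports torch, a check like this runs in milliseconds even over hundreds of modeling files.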

Today it ships about 15 rules, covering things like:

  • TRF001 — <Model>PreTrainedModel.config_class must match <Model>Config.
  • TRF004 — models must not override tie_weights; declare _tied_weights_keys instead.
  • TRF009 — modeling files cannot import implementation code from other model packages.
  • TRF011 — forward() must not access non-nn.Module attributes on pipeline-parallelism-managed submodules.
  • TRF013 — every PreTrainedModel.__init__ must call self.post_init().
  • TRF014 — native integrations must never pass trust_remote_code=True.

These aren't style nits. Every rule encodes a real bug class:

  • TRF011 — pipeline parallelism turns submodules into nn.Identity, so custom attribute access crashes on non-first stages.
  • TRF012 — in-place weight init bypasses the flags the framework uses to track which parameters still need initialization.
  • TRF004 — overriding tie_weights breaks from_pretrained and save_pretrained round-trips.

TRF011 is a good example. A forward pass that reads decoder_layer.attention_type inside for decoder_layer in self.layers: looks innocuous. It runs, passes tests, ships. Then someone enables pipeline parallelism, some of those layers become nn.Identity on non-first stages, and the attribute access raises AttributeError at runtime — on a model that has already merged. The static rule catches this at edit time, which is the only time it's cheap to fix. Every TRF rule is a postmortem turned into a check.
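The failure mode is easy to reproduce without torch. A runnable sketch, using plain-Python stand-ins for a decoder layer and nn.Identity (class names and the attention_type attribute are illustrative):

```python
class DecoderLayer:
    """Stand-in for an nn.Module decoder layer carrying extra metadata."""
    attention_type = "sliding_window"
    def __call__(self, x):
        return x + 1

class Identity:
    """Stand-in for nn.Identity: forwards its input, carries none of
    the original layer's attributes."""
    def __call__(self, x):
        return x

def forward(layers, x):
    for layer in layers:
        if layer.attention_type == "sliding_window":  # TRF011 violation
            x = layer(x)
    return x

full = [DecoderLayer(), DecoderLayer()]
print(forward(full, 0))  # 2 — works on the full model

staged = [Identity(), DecoderLayer()]  # a non-first pipeline stage
try:
    forward(staged, 0)
except AttributeError as e:
    print(e)  # the Identity stand-in has no attention_type
```

A typical fix is to carry such per-layer metadata on the config rather than the module, since the config is available unchanged on every pipeline stage.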

It's specific to Transformers modeling conventions, but the pattern generalizes: if maintainers keep giving the same structural review comments, those comments are candidates for static checks with self-explaining rules.

Tightening the feedback loop

The main reason mlinter exists is latency. A modeling convention violation that ships to main and gets caught three days later in a follow-up PR is expensive. The author has context-switched. The reviewer has context-switched. The fix requires another PR, another round of CI, another ping.

mlinter collapses that loop to seconds. It runs as part of make typing in the Transformers repo, so contributors catch violations locally before pushing. The --changed-only flag scopes the check to files changed against origin/main, which makes it fast enough to run on every save if you want. When a rule fires, the error message points at the exact line and hands you the rule ID:

src/transformers/models/acme/modeling_acme.py:18: TRF013: AcmeModel.__init__ does not call self.post_init().

From there, mlinter --rule TRF013 prints the full explanation and diff. No doc lookup, no asking around, no waiting for a reviewer. The feedback comes from the tool, at the moment the mistake was made, in the same terminal where the code was written.

It also has a lint cache keyed on file content, so repeated runs over a large repo stay cheap. And there's a --github-annotations mode that surfaces violations as inline PR annotations in CI, so the rare case where something slips past local checks still gets caught at the right place.
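A content-keyed cache of this kind can be sketched in a few lines. This is a simplification under assumed names; the real cache layout is an mlinter implementation detail:

```python
import hashlib

class LintCache:
    """Cache lint results keyed by a hash of the file's bytes, so a
    file is re-linted only when its content actually changes."""
    def __init__(self):
        self._results = {}

    def lint(self, path: str, content: bytes, run_rules):
        key = hashlib.sha256(content).hexdigest()
        if key not in self._results:
            self._results[key] = run_rules(content)
        return self._results[key]

calls = []
def run_rules(content):
    calls.append(content)   # record each real lint pass
    return []               # pretend: no violations

cache = LintCache()
cache.lint("modeling_acme.py", b"src", run_rules)
cache.lint("modeling_acme.py", b"src", run_rules)   # cache hit, no re-lint
cache.lint("modeling_acme.py", b"src2", run_rules)  # content changed
print(len(calls))  # 2: the unchanged content was linted once
```

Keying on content rather than mtime means a `git checkout` that touches timestamps but not bytes stays a cache hit.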

Inspired by Ruff

The design is deliberately ruff-shaped. Ruff gets a lot right about how linters should work and feel: short, memorable IDs; self-contained per-rule docs; a single config file; and rule additions that are local, mechanical changes.

mlinter adopts that playbook:

  • Namespaced rule IDs. Ruff has E, W, F, TCH, B; mlinter has TRF. The prefix signals scope, the number grows monotonically.
  • TOML as the single source of truth. mlinter/rules.toml holds one entry per rule with description, default_enabled, allowlist_models, and a nested explanation block (what_it_does, why_bad, diff).
  • One module per rule. trf011.py implements TRF011 and nothing else. Add a new trfXXX.py and it's auto-discovered as long as a matching TOML entry exists; the two sides cross-validate at import time, so rules cannot silently desync from their docs.
  • Local, reviewable suppressions. # trf-ignore: TRF011 — <reason> on the offending line. No file-level disables. Every suppression needs a reason.
  • Per-rule opt-in. --enable-rules, --enable-all-trf-rules, --list-rules. Rules under rollout can ship with default_enabled = false and be flipped on once the repo is clean.

The allowlist_models field on each rule is the equivalent of Ruff's per-file ignores, scoped to model directories — add "wav2vec2" and the rule skips every file under models/wav2vec2/. Legacy doesn't block the rule from protecting every other model in the tree.
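For concreteness, a hypothetical rules.toml entry for TRF013 might look as follows. The field names come from the description above; the exact schema and wording are illustrative:

```toml
[TRF013]
description = "every PreTrainedModel.__init__ must call self.post_init()"
default_enabled = true
allowlist_models = []

[TRF013.explanation]
what_it_does = "Checks that each PreTrainedModel subclass calls self.post_init() in __init__."
why_bad = "Skipping post_init() skips weight initialization and weight-tying hooks."
diff = """
     def __init__(self, config):
         super().__init__(config)
         self.model = AcmeCore(config)
+        self.post_init()
"""
```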

Built for AI agents

A lot of Transformers modeling work — adding a new architecture, porting a model, refactoring a config — is now done with the help of a coding agent. Agents have a specific failure mode on large codebases: they don't know conventions that aren't written down somewhere the agent can read. An agent writing a new modeling_acme.py from scratch will produce code that looks right and runs right but misses several structural conventions that maintainers rely on — the kind of thing a long-time reviewer would catch in thirty seconds.

mlinter is designed so that the agent catches those violations itself, in the same loop it uses for type checks and tests.

A few specific design choices make it agent-friendly:

The rules explain themselves in machine-readable form. Every rule's TOML entry has what_it_does, why_bad, and a minimal diff. That structure isn't just for humans. It lets an agent ingest the explanation, understand why the rule exists, and apply the fix correctly on the first try instead of guessing. python -m mlinter --rule TRFXXX renders that block on stdout in a stable format an agent can consume programmatically.

The CLI is small and predictable. --changed-only, --rule, --list-rules, --enable-rules, --github-annotations. No interactive prompts, no config file magic, no surprises. An agent can wire mlinter into a "after-edit" hook and trust that the output is stable.

The error format is stable and structured. <path>:<line>: <rule_id>: <message>. One violation per line. Agents can grep this, map each rule ID to its explanation, and produce a plan of fixes without a custom output parser.
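That format is regular enough to parse with one regex. A sketch of the kind of parser an agent or a shell script could build on (assuming paths contain no colons, as in the repo):

```python
import re

VIOLATION = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+): (?P<rule>TRF\d{3}): (?P<message>.+)$"
)

def parse_violations(output: str) -> list[dict]:
    """Parse `<path>:<line>: <rule_id>: <message>` lines, one per violation."""
    parsed = []
    for raw in output.splitlines():
        m = VIOLATION.match(raw)
        if m:
            parsed.append({
                "path": m["path"],
                "line": int(m["line"]),
                "rule": m["rule"],
                "message": m["message"],
            })
    return parsed

sample = (
    "src/transformers/models/acme/modeling_acme.py:18: TRF013: "
    "AcmeModel.__init__ does not call self.post_init().\n"
)
print(parse_violations(sample))
```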

Suppressions are explicit and reviewable. An agent that hits a genuine false positive can emit # trf-ignore: TRFXXX — <reason> on the offending line. That's a legible trace of the decision — both for the human reviewer and for any later agent that needs to understand why the line looks the way it does. Crucially, suppressions require a reason, which forces the agent to articulate why the suppression is justified instead of silently silencing checks.
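The suppression syntax is strict enough to validate mechanically. A sketch following the form shown above, where a rule ID without a trailing reason is rejected (the exact grammar mlinter accepts may differ):

```python
import re

# Matches `# trf-ignore: TRFXXX — <reason>`: rule ID plus mandatory reason.
SUPPRESSION = re.compile(r"#\s*trf-ignore:\s*(TRF\d{3})\s+—\s+(\S.*)$")

def suppressed_rule(line: str):
    """Return the suppressed rule ID if the line carries a well-formed
    suppression (including a reason), else None."""
    m = SUPPRESSION.search(line)
    return m.group(1) if m else None

ok = "x = layer.attn  # trf-ignore: TRF011 — config lacks per-layer types"
bad = "x = layer.attn  # trf-ignore: TRF011"
print(suppressed_rule(ok))   # TRF011
print(suppressed_rule(bad))  # None: missing reason, suppression ignored
```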

There's a skill for adding new rules. The repo ships an add-mlinter-rule skill under .ai/skills/ — a structured workflow that coding agents can follow to add new rules end-to-end. It walks the agent through the full workflow: picking the next rule number, adding the TOML entry, implementing the rule, running it against the repo, triaging violations, and adding tests. An agent invoked with "add an mlinter rule for X" can follow the skill without rediscovering the layout. The workflow is ordered so that the agent runs the rule against every model before writing tests, which surfaces false positives on real code and forces the agent to either fix the rule, fix the models, or allowlist the stragglers.

The skill encodes the non-obvious rules about the repo. Multi-config directories, multi-class configuration files, inherited configs, the fact that tie_word_embeddings is not in the base PreTrainedConfig. These are exactly the gotchas an agent would otherwise trip over on its second or third new rule. They're written down in the skill once and read by the agent every time it adds a rule.

The result is that mlinter is not just a check an agent has to pass — it's a check an agent can extend. Every new bug class that shows up in review is a candidate for a new TRF rule, and adding that rule is a bounded, mechanical task an agent can complete in one pass. The catalogue compounds over time: every rule that gets added catches every future regression of its pattern, across every model.

The bigger picture

Transformers' single-file modeling policy is what makes the library legible. mlinter is what makes that policy sustainable at the scale the library has reached. It encodes the conventions that were hard to enforce from docs alone, surfaces them at the moment of writing, explains itself to both humans and agents, and grows by one rule at a time as new patterns emerge from review.

It's small: one package, a TOML file, a folder of trfXXX.py modules, a skill. That's the point. A convention checker that you can read in an afternoon is a convention checker you will actually keep extending.

If you're working on Transformers, run make typing before you push; it runs mlinter alongside ty. And use mlinter --rule when something fires. If you're building or using coding agents on Transformers, the rules and the skill are designed to fit directly into that loop. The rules are self-explaining, and the feedback is fast enough to fix violations the same turn you introduce them.

That's the whole idea: encode the rules once, enforce them everywhere, and let both humans and agents build on top of them.

The project lives at https://github.com/huggingface/transformers-mlinter
