A Voluntary Code Of Practice For Responsible Agent Skill Development
The dual-use problem hiding inside agentic skills
Skills are quietly becoming one of the most important primitives in agentic AI.
A few hundred lines of well-written instruction text, paired with a general-purpose model and access to a browser or HTTP client, can now automate interactions with arbitrary websites at a level of fluency that, until recently, required a small team of engineers and a maintenance budget.
That capability is genuinely democratising. It is also — and this is the part the ecosystem is not yet talking about openly — straightforwardly dual-use.
The boundary between a skill that helps an ordinary user navigate a frustrating government portal and a skill that lets a bad actor interact with that same portal at scale is narrower than it appears.
The boundary is not, principally, in the runtime code.
The boundary is in the knowledge accumulated during skill development: the captured payloads, the observed defensive surfaces, the precise shape of a site's anti-bot posture, the conditions under which a CAPTCHA is or is not presented. For a skill to be usable, it must be published, and so any anti-bot reconnaissance that was necessary during its development is published along with it.
This post proposes a small voluntary code of practice that skill authors — including those of us building on the Hugging Face stack (i.e. with Hub-hosted agents, with Transformers Agents, with Spaces) — can adopt without much friction, and which reduces the potential attack surface without crippling legitimate work.
Ethical Skill Development: A Voluntary Code Of Practice
1. Decouple development and published workspaces
This is the single most effective workflow change for avoiding unnecessary exposure of the detailed reconnaissance that skill development may require: scraped JSON payloads, undocumented (and unauthenticated) backend APIs, and the like.
From experience: AI agents used for skill development tend to include these artefacts in published, user-facing skills, perhaps because their system prompts urge thorough documentation. Explicit instructions against this (see: recommended skill snippet injections) are therefore currently necessary to avoid becoming an unwitting vector in a cyberattack.
The developer recommendation:
Maintain two distinct workspaces per skill:
- A private development workspace where the unfiltered artefacts of development live (captured HAR files, payloads, detailed maps of authentication challenges, observations of anti-bot fingerprints, probing scripts, threshold notes), and
- The published plugin/skill containing only what an end user — or another autonomous agent — needs to run the skill at invocation time.
The skill remains executable in published form because skills, by their nature, must be. But the accumulated knowledge of how a target site defends itself is a different artefact from the code required at runtime, and the two are kept separate.
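This separation can be partly enforced mechanically. A minimal pre-publish check, sketched in Python, refuses to ship a skill whose published workspace still contains development artefacts. The pattern list is illustrative, not exhaustive; adapt it to whatever your development workspace actually accumulates.

```python
from pathlib import Path

# File patterns that belong in the private development workspace, never in
# the published skill. Illustrative list -- extend to match your own layout.
DEV_ARTIFACT_PATTERNS = [
    "*.har",            # captured browser sessions
    "*payload*.json",   # scraped backend payloads
    "*fingerprint*",    # anti-bot fingerprint notes
    "probe_*.py",       # probing scripts
]

def find_dev_artifacts(published_dir: str) -> list[str]:
    """Return paths inside the published workspace that look like
    development artefacts and should not ship."""
    root = Path(published_dir)
    hits: list[str] = []
    for pattern in DEV_ARTIFACT_PATTERNS:
        hits.extend(str(p.relative_to(root)) for p in root.rglob(pattern))
    return sorted(hits)
```

Run as a pre-publish gate: if `find_dev_artifacts()` returns anything, the publish step fails and the offending files are listed for review.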
2. Sanitise published documentation
There is a retort here: if the scraping mechanism is obvious from the code, why go to the trouble of sanitising anything else? Because skills cannot hide what they do at runtime. Code obfuscation is a losing game and harms maintenance for legitimate users. The discipline applies instead to the documentation around the code:
- Avoid tutorial-style READMEs explaining exactly which header, cookie, fingerprint, or timing window the target site checks, or precisely how the skill negotiates with each defensive mechanism.
- Avoid framing the skill as a bypass of a specific defence. Frame it as the achievement of a user-facing outcome.
- Strip development comments that narrate site defences. Comments like `// the site rejects requests without X-Foo within 200ms of click` belong in the development workspace.
- Generalise function and variable names that gratuitously expose defensive internals: `solve_recaptcha_v2_audio_challenge()` becomes `complete_verification()`.
- Redact captured fixtures: scrub anti-bot signal data not strictly required for tests to pass.
The principle: the more specifically authentication posture is documented, the easier the skill is to reverse-engineer for misuse.
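Some of this review can be automated. A minimal lint sketch that flags defence-narrating phrases in documentation; the regexes here are assumed heuristics for illustration, not a vetted ruleset:

```python
import re

# Phrases that tend to narrate a target site's defensive internals.
# Illustrative heuristics only -- tune these for your own skills.
DEFENCE_NARRATION = [
    r"bypass(es|ing)?\s+(the\s+)?(captcha|anti-?bot|rate.?limit)",
    r"the site (checks|rejects|blocks)",   # narrated defensive behaviour
    r"X-[A-Za-z-]+ header",                # named custom headers
    r"within \d+\s*ms",                    # timing windows
]

def flag_defence_narration(doc_text: str) -> list[str]:
    """Return the matching snippets so an author (or an agent reviewing
    its own output) can decide what to move to the private workspace."""
    flags: list[str] = []
    for pattern in DEFENCE_NARRATION:
        flags.extend(m.group(0) for m in re.finditer(pattern, doc_text, re.IGNORECASE))
    return flags
```

A lint like this will produce false positives; the point is to force a human (or agent) decision at each flag, not to block publication automatically.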
3. Prefer "what" over "how" in user-facing documentation
User-facing documentation describes the outcome the skill produces, the inputs it requires, and its limits. It does not read like a security write-up of the target site's defensive model. The heuristic: would this paragraph still make sense if the target site were anonymised? If anonymising would gut the meaning, the content is too defence-specific for the published workspace.
4. Treat anti-bot research as sensitive
Notes accumulated while reverse-engineering a site's defences — fingerprint observations, rate-limit thresholds (as numerical values), challenge trees, timing-attack windows — are handled with the same care as any sensitive research output. Kept private. Not pasted into public issues. Not committed to public branches, even temporarily. Not included in agent session transcripts that may be published.
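A low-friction way to honour the "not committed to public branches, even temporarily" rule is to ignore the private workspace at the version-control level. A hypothetical `.gitignore` fragment, with illustrative names to adapt to your own layout:

```gitignore
# Keep anti-bot research out of any branch that could be pushed publicly.
# Names below are illustrative -- match them to your actual dev workspace.
dev-workspace/
*.har
notes/anti-bot/
session-transcripts/
```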
5. Default to the white-hat user
Where a design decision branches between helping a legitimate user only, and a more general capability that also lowers the bar for abuse, prefer the option that helps the legitimate user only — unless the use case explicitly justifies the more general form. A skill that logs in on behalf of its own user is treated as a different artefact from a skill that takes an arbitrary credential list as input.
6. Be willing not to publish
Some skills, on reflection, should remain private to their developer or organisation. "We built it, therefore we publish it" is not a default worth preserving when the published form materially advantages bad actors.
Authors reserve the right — and accept the responsibility — to keep certain skills unpublished.
Adoption
Authors who adopt the Code can include a one-line reference in their published documentation pointing to the canonical text.
A machine-readable version, optimised for AI agents (with explicit rules and an operational checklist they can apply to their own output), is published alongside it. The intent is that authors and the agents they author with can both apply the Code coherently.
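To make the intent concrete, the machine-readable form might look like the following. This is an illustrative sketch of a possible schema, not the canonical published version:

```yaml
# Hypothetical machine-readable form of the Code (illustrative schema).
code_of_practice:
  version: "0.1"
  rules:
    - id: decouple-workspaces
      check: "Published skill contains no captured HAR files, payloads, or probing scripts."
    - id: sanitise-docs
      check: "Documentation describes outcomes, not the target site's defensive internals."
    - id: what-over-how
      check: "User-facing docs still make sense if the target site is anonymised."
    - id: sensitive-research
      check: "Anti-bot research notes are absent from public branches and transcripts."
    - id: white-hat-default
      check: "The skill acts on behalf of its own user, not arbitrary credential lists."
    - id: willing-not-to-publish
      check: "Publication was an explicit decision, not a default."
```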

