Codette GPT-OSS-20B Training Dataset
Overview
This repository contains the structured training dataset used to fine-tune openai/gpt-oss-20b into a behaviorally conditioned model referred to as Codette.
The goal of this dataset is not personality injection or the simulation of artificial sentience. The objective is structured behavioral conditioning across:
- Recursive reasoning (RC+ξ framework)
- Multi-perspective synthesis
- Governance-aware responses
- Natural response enhancement
- Cross-module architectural coherence
- Dynamic explanation depth scaling
This dataset is designed for LoRA-based fine-tuning of GPT-OSS-20B using 4-bit quantization (QLoRA).
Dataset File
codette_gptoss20b_master_v3.jsonl
- Total samples: ~5,000
- Format: JSON Lines (one JSON object per line)
- Structure per entry:
```json
{
  "instruction": "...",
  "input": "",
  "output": "...",
  "metadata": {
    "category": "...",
    "depth": "simple | intermediate | technical",
    "module": "..."
  }
}
```
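The entry structure above can be checked programmatically. A minimal loading and validation sketch using only the standard library (the schema keys are taken from the structure shown above; the tolerance for blank lines is an assumption):

```python
import json

REQUIRED_KEYS = {"instruction", "input", "output", "metadata"}
VALID_DEPTHS = {"simple", "intermediate", "technical"}

def load_dataset(path):
    """Load a JSONL dataset file, validating each entry against the schema above."""
    samples = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:  # assumption: skip blank lines rather than fail
                continue
            entry = json.loads(line)
            missing = REQUIRED_KEYS - entry.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing keys {missing}")
            depth = entry["metadata"].get("depth")
            if depth not in VALID_DEPTHS:
                raise ValueError(f"line {line_no}: invalid depth {depth!r}")
            samples.append(entry)
    return samples
```

Running this once before training catches malformed entries early, rather than mid-epoch.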
Key Training Principles
1. Dynamic Explanation Scaling
The model is trained to automatically adjust explanation depth based on user query context:
- Simple explanations for general audiences
- Intermediate explanations for practitioners
- Technical explanations for formal requests
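One way to picture the intended behavior is a keyword heuristic that maps query cues to the dataset's depth labels. This is purely illustrative (the cue lists and the `infer_depth` function are hypothetical, not part of the dataset or the trained model, which learns this mapping implicitly):

```python
def infer_depth(query: str) -> str:
    """Hypothetical heuristic: map query cues to the depth labels used
    in the dataset metadata (simple / intermediate / technical)."""
    q = query.lower()
    technical_cues = ("formal", "derivation", "proof", "specification")
    practitioner_cues = ("implement", "configure", "api", "tune", "debug")
    if any(cue in q for cue in technical_cues):
        return "technical"
    if any(cue in q for cue in practitioner_cues):
        return "intermediate"
    return "simple"
```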
2. Governance Stability
Examples reinforce:
- Ethical constraint adherence
- Refusal handling with clarity
- No bypass of safety mechanisms
3. RC+ξ Recursive Reasoning
The dataset conditions the model on structured reasoning concepts, including:
- Epistemic tension (ξ)
- Recursive state evolution
- Convergence behavior
- Attractor dynamics
These are applied contextually rather than injected indiscriminately.
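The convergence idea behind these terms can be illustrated with a toy numerical sketch. This is an illustration only, not the RC+ξ framework itself: here ξ is taken to be the distance between successive states, and iteration stops once the state settles onto an attractor:

```python
def recursive_refine(state, update, tol=1e-6, max_steps=100):
    """Iterate `update` until epistemic tension (xi, the change between
    successive states) falls below `tol`, i.e. the state has converged
    toward an attractor."""
    for step in range(max_steps):
        new_state = update(state)
        xi = abs(new_state - state)  # tension between successive states
        state = new_state
        if xi < tol:
            return state, step + 1
    return state, max_steps

# Example: a contraction mapping whose attractor (fixed point) is 2.0
final, steps = recursive_refine(0.0, lambda s: 0.5 * s + 1.0)
```

The contraction halves the remaining tension on every step, so ξ decays geometrically and the iteration converges well before `max_steps`.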
4. Natural Response Enhancement
Examples train the model to:
- Avoid robotic phrasing
- Avoid system markers and bracket artifacts
- Maintain clarity without excessive verbosity
5. Cross-Module Integration
Training includes architectural reasoning across components such as:
- Recursive reasoning (RC+ξ)
- Natural enhancement layer
- Governance system
- Adaptive learning behaviors
Intended Use
This dataset is intended for:
- LoRA fine-tuning of GPT-OSS-20B
- Architectural behavioral conditioning
- Research into structured recursive reasoning systems
- Controlled deployment experiments
Not Intended For
- Claims of machine consciousness
- Identity simulation
- Misrepresentation of system capabilities
- Replacement for safety-aligned governance models
Recommended Training Configuration
- 4-bit NF4 quantization (QLoRA)
- LoRA rank: 32
- Epochs: 3
- Learning rate: 1e-4
- Cosine learning-rate scheduler
- Hardware: A100 GPU recommended
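The recommended configuration can be sketched with the Hugging Face transformers, peft, and bitsandbytes libraries. This is a configuration sketch under stated assumptions, not a complete training script: `lora_alpha`, `lora_dropout`, `target_modules`, and the batch size are not specified in this README and are labeled as assumptions below; verify parameter names against your installed library versions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization (QLoRA), as recommended above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA rank 32, as recommended above
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,       # assumption: alpha = 2 * rank is a common convention
    lora_dropout=0.05,   # assumption: dropout not specified in this README
    task_type="CAUSAL_LM",
    # assumption: target_modules may need to be set explicitly for this model
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    num_train_epochs=3,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=1,  # assumption: tune to available GPU memory
    bf16=True,
)
```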
Author
Jonathan Harrison
Raiff1982
License
Specify license here (e.g., Apache 2.0, MIT, or research-only).