Built with Axolotl

See axolotl config

axolotl version: 0.13.0.dev0

base_model: axolotl-ai-co/gpt-oss-120b-dequantized

use_kernels: false

dp_shard_size: 4  # Number of GPUs

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin

experimental_skip_move_to_device: true

adapter: lora

lora_r: 16
lora_alpha: 32
lora_target_modules: "all-linear"
# lora_target_parameters:
#   - "mlp.experts.gate_up_proj"
#   - "mlp.experts.down_proj"
lora_bias: "none"
lora_task_type: "CAUSAL_LM"


# Your combined training dataset
datasets:
  - path: ./data/train_combined_with_stem.jsonl
    type: chat_template
    field_thinking: thinking
    template_thinking_key: thinking

dataset_prepared_path: last_run_prepared
val_set_size: 0
output_dir: ./outputs/gpt-oss-domain-llm-lora/

# Checkpoint settings
save_strategy: steps
save_total_limit: 2        # Keep last 2 checkpoints for safety (LoRA adapters are ~38MB each)
save_steps: 160
save_safetensors: false    # Disable safetensors to work around an FSDP checkpoint saving issue

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 2  # Adjust to 1 if OOM occurs
micro_batch_size: 2
num_epochs: 1

optimizer: adamw_torch_fused     # 8bit optimizer not compatible with FSDP2 offload
lr_scheduler: constant_with_warmup
learning_rate: 2e-5

bf16: true
tf32: true

flash_attention: true
attn_implementation: kernels-community/vllm-flash-attn3  # Not needed if flash_attn >= 2.8.3

gradient_checkpointing: true
activation_offloading: true

logging_steps: 1
# saves_per_epoch: 1  # Commented out: conflicts with step-based saving

warmup_ratio: 0.03

special_tokens:
eot_tokens:
  - "<|end|>"

fsdp_version: 2
fsdp_config:
  offload_params: true
  state_dict_type: SHARDED_STATE_DICT
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: GptOssDecoderLayer
  reshard_after_forward: true
  cpu_ram_efficient_loading: true
  save_optimizer_state: false

outputs/gpt-oss-domain-llm-lora/

This model is a fine-tuned version of axolotl-ai-co/gpt-oss-120b-dequantized on the ./data/train_combined_with_stem.jsonl dataset.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 8
  • optimizer: adamw_torch_fused (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_steps: 37
  • training_steps: 1238
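
The derived values in the list above follow from the per-device settings. A minimal sketch of that arithmetic, assuming the usual convention that the total batch size is per-device batch size × number of devices × gradient accumulation steps. The token figure is an upper bound that assumes every packed sequence is filled to the configured `sequence_len` of 8192. The checkpoint list covers only step-based saves and does not model any final save at the end of training. All helper names are hypothetical.

```python
# Values copied from the hyperparameter list and the axolotl config.
MICRO_BATCH_SIZE = 2
EVAL_BATCH_SIZE = 2
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 2
SEQUENCE_LEN = 8192
TRAINING_STEPS = 1238
SAVE_STEPS = 160


def total_train_batch_size(micro_batch: int, devices: int, grad_accum: int) -> int:
    """Sequences consumed per optimizer step across all devices."""
    return micro_batch * devices * grad_accum


def total_eval_batch_size(eval_batch: int, devices: int) -> int:
    """Sequences evaluated per step across all devices."""
    return eval_batch * devices


def max_tokens_per_optimizer_step(total_batch: int, seq_len: int) -> int:
    """Upper bound on tokens per optimizer step with fully packed sequences."""
    return total_batch * seq_len


def step_checkpoints(training_steps: int, save_steps: int) -> list[int]:
    """Steps at which a step-based checkpoint would be written."""
    return list(range(save_steps, training_steps + 1, save_steps))


if __name__ == "__main__":
    train_bs = total_train_batch_size(MICRO_BATCH_SIZE, NUM_DEVICES, GRAD_ACCUM_STEPS)
    print("total_train_batch_size:", train_bs)
    print("total_eval_batch_size:", total_eval_batch_size(EVAL_BATCH_SIZE, NUM_DEVICES))
    print("max tokens per step:", max_tokens_per_optimizer_step(train_bs, SEQUENCE_LEN))
    print("checkpoint steps:", step_checkpoints(TRAINING_STEPS, SAVE_STEPS))
```

The first two results match the reported `total_train_batch_size: 16` and `total_eval_batch_size: 8`. With `save_total_limit: 2`, only the two most recent of the listed step checkpoints would remain on disk at any time.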

Training results

Framework versions

  • PEFT 0.17.0
  • Transformers 4.55.2
  • Pytorch 2.9.0+cu128
  • Datasets 4.3.0
  • Tokenizers 0.21.4