ahan2000/Qwen2.5-FT

This model was fine-tuned using PPO (Proximal Policy Optimization) reinforcement learning.

Model Details

  • Model Type: Language Model
  • Model Size: 8B parameters
  • Tensor Type: BF16
  • Training Method: PPO (Proximal Policy Optimization)
  • Framework: PyTorch
  • Model Format: SafeTensors

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "ahan2000/Qwen2.5-FT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# The published weights are BF16; load them in that dtype and let
# Accelerate place the model on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example usage
text = "Your input text here"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# max_new_tokens bounds only the generated continuation, unlike
# max_length, which counts the prompt tokens as well.
outputs = model.generate(**inputs, max_new_tokens=100)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
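
Qwen2.5 derivatives usually ship a chat template in tokenizer_config.json. If this checkpoint includes one, prompting through the template generally works better than raw text. A minimal sketch, assuming the template is present (the message content is illustrative):

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "ahan2000/Qwen2.5-FT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

# Render the conversation with the model's chat template;
# add_generation_prompt opens the assistant turn so the model replies as the assistant.
messages = [{"role": "user", "content": "Explain PPO in one sentence."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))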

Model Files

This repository contains the following files:

  • config.json: Model configuration
  • generation_config.json: Generation configuration
  • tokenizer_config.json: Tokenizer configuration
  • tokenizer.json: Tokenizer data
  • vocab.json: Vocabulary file
  • model-*.safetensors: Model weights in SafeTensors format
  • model.safetensors.index.json: Model index file
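
For sharded checkpoints, model.safetensors.index.json maps every tensor name to the shard file that stores it. A small sketch that downloads only the index (not the weight shards) and prints the shard layout; it assumes the huggingface_hub package is installed and uses the repo id from this card:

import json
from collections import Counter
from huggingface_hub import hf_hub_download

# Fetch just the index file; the weight shards stay on the Hub.
index_path = hf_hub_download("ahan2000/Qwen2.5-FT", "model.safetensors.index.json")
with open(index_path) as f:
    index = json.load(f)

# weight_map is a dict of tensor name -> shard filename.
shards = Counter(index["weight_map"].values())
for shard, n_tensors in sorted(shards.items()):
    print(f"{shard}: {n_tensors} tensors")
print("total checkpoint size (bytes):", index["metadata"]["total_size"])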

Training

This model was trained with PPO, a policy-gradient reinforcement learning method: the policy is updated to maximize a reward signal while a clipped surrogate objective keeps each update close to the previous policy. The specific reward objective used for this checkpoint is not documented.
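
For reference, PPO fine-tuning of causal language models is commonly implemented with the TRL library. The sketch below uses TRL's classic PPOTrainer API (versions before 0.8); the base model name, reward function, and hyperparameters are illustrative assumptions, not the recipe actually used for this checkpoint:

import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

base = "Qwen/Qwen2.5-7B-Instruct"  # assumed base; the real starting checkpoint is undocumented
config = PPOConfig(model_name=base, learning_rate=1.41e-5, batch_size=8, mini_batch_size=2)

model = AutoModelForCausalLMWithValueHead.from_pretrained(base)      # trainable policy + value head
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(base)  # frozen reference for the KL penalty
tokenizer = AutoTokenizer.from_pretrained(base)

ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

def reward_fn(response: str) -> torch.Tensor:
    # Hypothetical stand-in reward; a real run would score responses with a reward model.
    return torch.tensor(min(len(response.split()), 100) / 100.0)

queries = ["Summarize PPO in one sentence."] * config.batch_size
query_tensors = [tokenizer(q, return_tensors="pt").input_ids.squeeze(0) for q in queries]

# One PPO iteration: roll out the policy, score the responses, update the policy.
response_tensors = ppo_trainer.generate(query_tensors, return_prompt=False, max_new_tokens=48)
responses = [tokenizer.decode(r, skip_special_tokens=True) for r in response_tensors]
rewards = [reward_fn(r) for r in responses]
stats = ppo_trainer.step(query_tensors, response_tensors, rewards)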

Safety and Limitations

Please use this model responsibly and in accordance with applicable laws and regulations.

Citation

If you use this model, please cite:

@misc{ahan2000_Qwen2.5-FT,
  title={ahan2000/Qwen2.5-FT},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/ahan2000/Qwen2.5-FT}}
}