# ahan2000/Qwen2.5-FT
This model was fine-tuned with PPO (Proximal Policy Optimization) reinforcement learning.
## Model Details
- Model Type: Language Model
- Training Method: PPO (Proximal Policy Optimization)
- Framework: PyTorch
- Model Format: SafeTensors
## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "ahan2000/Qwen2.5-FT"

# Load the tokenizer and model; float16 halves memory use and is best run on a GPU
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Example usage
text = "Your input text here"
inputs = tokenizer(text, return_tensors="pt").to(device)
# max_new_tokens bounds only the generated continuation, not prompt + output
outputs = model.generate(**inputs, max_new_tokens=100)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
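By default `generate` decodes greedily; for more varied output you can pass sampling parameters such as `do_sample=True`, `temperature`, and `top_p`.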
## Model Files

This model contains the following files:

- `config.json`: Model configuration
- `generation_config.json`: Generation configuration
- `tokenizer_config.json`: Tokenizer configuration
- `tokenizer.json`: Tokenizer data
- `vocab.json`: Vocabulary file
- `model-*.safetensors`: Model weights in SafeTensors format
- `model.safetensors.index.json`: Model index file
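For sharded checkpoints, `model.safetensors.index.json` maps each weight tensor to the shard file that stores it. The snippet below is a minimal sketch of how to inspect that mapping with `huggingface_hub`; the exact shard names it prints depend on this repository's actual contents.

```python
import json
from huggingface_hub import hf_hub_download

# Download only the small index file, not the full weights
index_path = hf_hub_download("ahan2000/Qwen2.5-FT", "model.safetensors.index.json")

with open(index_path) as f:
    index = json.load(f)

# "weight_map" maps parameter names to the shard file containing them
shards = sorted(set(index["weight_map"].values()))
print(f"{len(index['weight_map'])} tensors across {len(shards)} shard(s):")
for shard in shards:
    print(" -", shard)
```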
## Training

This model was trained with PPO reinforcement learning, which optimizes the policy (the language model) to maximize a reward signal while a KL penalty keeps its outputs close to the reference model, as sketched below.
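The following is a minimal sketch of a PPO fine-tuning loop, assuming the `trl` library's classic `PPOTrainer` API (trl ≤ 0.11; later versions changed the interface). The base checkpoint and the constant reward here are hypothetical placeholders; the actual base model, data, and reward objective used for this repository are not documented.

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Hypothetical base checkpoint; the real one used for this model is not documented
base_model = "Qwen/Qwen2.5-0.5B"

config = PPOConfig(model_name=base_model, learning_rate=1.41e-5,
                   batch_size=1, mini_batch_size=1)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# Policy model with a value head; ref_model=None lets trl create a frozen KL reference
model = AutoModelForCausalLMWithValueHead.from_pretrained(base_model)
ppo_trainer = PPOTrainer(config, model, ref_model=None, tokenizer=tokenizer)

# One PPO step: generate a response, score it, then update the policy
query_tensor = tokenizer.encode("Your prompt here", return_tensors="pt")[0]
response_tensor = ppo_trainer.generate([query_tensor], return_prompt=False,
                                       max_new_tokens=32)[0]

# Placeholder reward; a real run would score responses with a reward model
reward = [torch.tensor(1.0)]
stats = ppo_trainer.step([query_tensor], [response_tensor], reward)
```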
## Safety and Limitations

As with any generative language model, outputs may be inaccurate or biased. Please use this model responsibly and in accordance with applicable laws and regulations.
## Citation

If you use this model, please cite:

```bibtex
@misc{ahan2000_Qwen2.5-FT,
  title={ahan2000/Qwen2.5-FT},
  author={ahan2000},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/ahan2000/Qwen2.5-FT}}
}
```