ahan2000/Qwen2.5-FT

This model was fine-tuned using PPO (Proximal Policy Optimization) reinforcement learning.

Model Details

  • Model Type: Language Model
  • Model Size: 8B parameters
  • Tensor Type: BF16
  • Training Method: PPO (Proximal Policy Optimization)
  • Framework: PyTorch
  • Model Format: SafeTensors

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "ahan2000/Qwen2.5-FT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# The published weights are BF16; load them in that dtype and let
# Accelerate place the model on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example usage
text = "Your input text here"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# max_new_tokens bounds only the generated continuation, unlike
# max_length, which counts the prompt tokens as well.
outputs = model.generate(**inputs, max_new_tokens=100)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
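
Qwen2.5 derivatives usually ship a chat template in tokenizer_config.json. If this checkpoint includes one, prompting through the template generally works better than raw text. A minimal sketch, assuming the template is present (the message content is illustrative):

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "ahan2000/Qwen2.5-FT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

# Render the conversation with the model's chat template;
# add_generation_prompt opens the assistant turn so the model replies as the assistant.
messages = [{"role": "user", "content": "Explain PPO in one sentence."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))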

Model Files

This repository contains the following files:

  • config.json: Model configuration
  • generation_config.json: Generation configuration
  • tokenizer_config.json: Tokenizer configuration
  • tokenizer.json: Tokenizer data
  • vocab.json: Vocabulary file
  • model-*.safetensors: Model weights in SafeTensors format
  • model.safetensors.index.json: Model index file
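
For sharded checkpoints, model.safetensors.index.json maps every tensor name to the shard file that stores it. A small sketch that downloads only the index (not the weight shards) and prints the shard layout; it assumes the huggingface_hub package is installed and uses the repo id from this card:

import json
from collections import Counter
from huggingface_hub import hf_hub_download

# Fetch just the index file; the weight shards stay on the Hub.
index_path = hf_hub_download("ahan2000/Qwen2.5-FT", "model.safetensors.index.json")
with open(index_path) as f:
    index = json.load(f)

# weight_map is a dict of tensor name -> shard filename.
shards = Counter(index["weight_map"].values())
for shard, n_tensors in sorted(shards.items()):
    print(f"{shard}: {n_tensors} tensors")
print("total checkpoint size (bytes):", index["metadata"]["total_size"])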

Training

This model was trained with PPO, a policy-gradient reinforcement learning method: the policy is updated to maximize a reward signal while a clipped surrogate objective keeps each update close to the previous policy. The specific reward objective used for this checkpoint is not documented.
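
For reference, PPO fine-tuning of causal language models is commonly implemented with the TRL library. The sketch below uses TRL's classic PPOTrainer API (versions before 0.8); the base model name, reward function, and hyperparameters are illustrative assumptions, not the recipe actually used for this checkpoint:

import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

base = "Qwen/Qwen2.5-7B-Instruct"  # assumed base; the real starting checkpoint is undocumented
config = PPOConfig(model_name=base, learning_rate=1.41e-5, batch_size=8, mini_batch_size=2)

model = AutoModelForCausalLMWithValueHead.from_pretrained(base)      # trainable policy + value head
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(base)  # frozen reference for the KL penalty
tokenizer = AutoTokenizer.from_pretrained(base)

ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

def reward_fn(response: str) -> torch.Tensor:
    # Hypothetical stand-in reward; a real run would score responses with a reward model.
    return torch.tensor(min(len(response.split()), 100) / 100.0)

queries = ["Summarize PPO in one sentence."] * config.batch_size
query_tensors = [tokenizer(q, return_tensors="pt").input_ids.squeeze(0) for q in queries]

# One PPO iteration: roll out the policy, score the responses, update the policy.
response_tensors = ppo_trainer.generate(query_tensors, return_prompt=False, max_new_tokens=48)
responses = [tokenizer.decode(r, skip_special_tokens=True) for r in response_tensors]
rewards = [reward_fn(r) for r in responses]
stats = ppo_trainer.step(query_tensors, response_tensors, rewards)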

Safety and Limitations

Please use this model responsibly and in accordance with applicable laws and regulations.

Citation

If you use this model, please cite:

@misc{ahan2000_Qwen2.5-FT,
  title={ahan2000/Qwen2.5-FT},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/ahan2000/Qwen2.5-FT}}
}