---
license: apache-2.0
language: en
tags:
- text-generation
- causal-lm
- fine-tuning
- unsupervised
---

# olabs-ai/reflection_model

## Model Description

`olabs-ai/reflection_model` is a LoRA (Low-Rank Adaptation) fine-tune of [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct). It is designed for text generation and can be used in applications such as conversational agents and content creation.

## Model Details

- **Base Model**: Meta-Llama-3.1-8B-Instruct
- **Fine-Tuning Method**: LoRA (see the sketch below)
- **Architecture**: LlamaForCausalLM
- **Number of Parameters**: 8 billion (base model)
- **Training Data**: [Details about the training data used for fine-tuning, if available]
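
The training setup for this adapter has not been published. For illustration only, the following is a minimal sketch of how a LoRA adapter is typically attached to a Llama 3.1 base model with `unsloth`; the rank, alpha, target modules, sequence length, and quantization settings shown are assumptions, not the values used for this model.

```python
from unsloth import FastLanguageModel

# Load the base model (illustrative settings, not the actual training config)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Meta-Llama-3.1-8B-Instruct",
    max_seq_length=2048,  # assumed training context length
    load_in_4bit=True,    # QLoRA-style 4-bit base weights to save memory
)

# Attach LoRA adapters to the attention and MLP projections.
# r, lora_alpha, and target_modules are illustrative defaults, not the
# hyperparameters used to train olabs-ai/reflection_model.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    use_gradient_checkpointing=True,
)
```

Only the adapter weights produced by such a run need to be distributed; at inference time they are applied on top of the frozen base model, as in the Usage section below.
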
## Usage

Using this model requires the `transformers` and `unsloth` libraries (`pip install transformers unsloth`). The model and tokenizer can then be loaded as follows:

```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Load the model and tokenizer. FastLanguageModel.from_pretrained returns both;
# pointing model_name at this adapter repository loads the base model referenced
# in the adapter configuration and applies the LoRA weights on top of it.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="olabs-ai/reflection_model",
    max_seq_length=2048,
    dtype=None,          # auto-detect (bfloat16 on supported GPUs)
    load_in_4bit=True,   # optional 4-bit quantization to reduce memory use
)

# Enable optimized inference mode for the LoRA model
FastLanguageModel.for_inference(model)

# Prepare inputs
custom_prompt = "What is a famous tall tower in Paris?"
inputs = tokenizer([custom_prompt], return_tensors="pt").to("cuda")

# Stream generated tokens to stdout as they are produced
text_streamer = TextStreamer(tokenizer)

# Generate outputs
outputs = model.generate(**inputs, streamer=text_streamer, max_new_tokens=1000)
```

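Because the base model is instruction-tuned, prompts will generally work better when rendered with the tokenizer's chat template instead of being passed as raw text. Continuing from the snippet above (the system prompt is only an example, not one prescribed by this model):

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a famous tall tower in Paris?"},
]

# Render the Llama 3.1 chat template and append the assistant header
# so the model continues with its reply.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

outputs = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```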