---
license: apache-2.0
language: en
tags:
  - text-generation
  - causal-lm
  - fine-tuning
  - unsupervised
---

# Model Name: olabs-ai/reflection_model

## Model Description

`olabs-ai/reflection_model` is a language model based on [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/Meta-Llama-3.1-8B-Instruct), fine-tuned with LoRA (Low-Rank Adaptation) to improve performance on specific tasks. It is intended for text generation and can be used in applications such as conversational agents and content creation.

## Model Details

- **Base Model**: Meta-Llama-3.1-8B-Instruct
- **Fine-Tuning Method**: LoRA (Low-Rank Adaptation); a merge example follows this list
- **Architecture**: LlamaForCausalLM
- **Number of Parameters**: 8 Billion (Base Model)
- **Training Data**: [Details about the training data used for fine-tuning, if available]
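
Because the fine-tuning method is LoRA, the adapter can also be merged into the base weights to produce a standalone `transformers` checkpoint. The sketch below uses the `peft` library and assumes the adapter is published in standard PEFT format; the output directory name is illustrative:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# AutoPeftModelForCausalLM reads adapter_config.json, loads the referenced
# base checkpoint, and attaches the LoRA weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    "olabs-ai/reflection_model",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("olabs-ai/reflection_model")

# Fold the low-rank update into the base weights and save a plain
# transformers checkpoint (directory name is illustrative).
merged = model.merge_and_unload()
merged.save_pretrained("reflection_model_merged")
tokenizer.save_pretrained("reflection_model_merged")
```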

## Usage

To use this model, you need to have the `transformers` and `unsloth` libraries installed. You can load the model and tokenizer as follows:

```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Load the base model together with the LoRA adapter in one step.
# When model_name points at a repo or local directory containing an
# adapter_config.json, unsloth resolves the base checkpoint automatically.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="olabs-ai/reflection_model",
    max_seq_length=2048,
    dtype=None,          # auto-select (bfloat16 on supported GPUs)
    load_in_4bit=True,   # optional 4-bit quantization to reduce memory use
)

# Switch the model into unsloth's optimized inference mode
FastLanguageModel.for_inference(model)

# Prepare inputs
custom_prompt = "What is a famous tall tower in Paris?"
inputs = tokenizer([custom_prompt], return_tensors="pt").to("cuda")

# Stream tokens to stdout as they are generated
text_streamer = TextStreamer(tokenizer)
outputs = model.generate(**inputs, streamer=text_streamer, max_new_tokens=1000)
```
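
Because the base checkpoint is instruction-tuned, prompts generally behave better when wrapped in the Llama 3.1 chat template. The sketch below continues from the snippet above and assumes the tokenizer shipped with this repository includes that chat template:

```python
# Continues from the snippet above (model and tokenizer already loaded).
# Assumes the tokenizer carries the Llama 3.1 instruct chat template.
messages = [{"role": "user", "content": "What is a famous tall tower in Paris?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

outputs = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```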