# palmyra-mini-thinking-a GGUF Model Import Guide for Ollama

This guide provides step-by-step instructions for importing the palmyra-mini-thinking-a GGUF model files into Ollama for local inference.

## 📁 Available Model Files

This directory contains two quantized versions of the palmyra-mini-thinking-a model:

- `palmyra-mini-thinking-a-BF16.gguf` - BFloat16 precision (highest quality, largest size)
- `palmyra-mini-thinking-a-Q8_0.gguf` - 8-bit quantization (high quality, medium size)

## 🔧 Prerequisites

Before getting started, ensure you have:

- **Ollama installed** on your system ([Download from ollama.com](https://ollama.com/))
- **Sufficient RAM/VRAM** for your chosen model (a quick way to check is shown below):
  - BF16: ~16GB+ RAM recommended
  - Q8_0: ~8GB+ RAM recommended
- **Terminal/Command Line access**
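
If you're unsure how much memory your machine has, you can check from the terminal. A minimal sketch (the commands differ by platform):

```bash
# macOS: total physical memory in bytes
sysctl -n hw.memsize

# Linux: human-readable total and available memory
free -h
```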

## 🚀 Quick Start Guide

### Method 1: Import Local GGUF File (Recommended)

#### Step 1: Navigate to Model Directory
```bash
cd "/Users/[user]/Documents/Model Weights/SPW2 Mini Launch/palmyra-mini-thinking-a/GGUF/palmyra-mini-thinking-a FIXED GGUF-BF16"
```

#### Step 2: Create a Modelfile
Create a new file named `Modelfile` (no extension) with the following content:

**For BF16 version (highest quality):**
```
FROM ./palmyra-mini-thinking-a-BF16.gguf
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
PARAMETER top_k 40
PARAMETER top_p 0.95
SYSTEM "You are Palmyra, an advanced AI assistant created by Writer. You are helpful and honest. You provide accurate and detailed responses while being concise and clear."
```

**For Q8_0 version (balanced):**
```
FROM ./palmyra-mini-thinking-a-Q8_0.gguf
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
PARAMETER top_k 40
PARAMETER top_p 0.95
SYSTEM "You are Palmyra, an advanced AI assistant created by Writer. You are helpful and honest. You provide accurate and detailed responses while being concise and clear."
```
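
If you'd rather not open an editor, one way to write the file is a shell heredoc (shown for the BF16 variant; adjust the `FROM` line for Q8_0):

```bash
# Write the Modelfile in one shot; quoting 'EOF' prevents
# the shell from expanding anything inside the heredoc
cat > Modelfile <<'EOF'
FROM ./palmyra-mini-thinking-a-BF16.gguf
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
PARAMETER top_k 40
PARAMETER top_p 0.95
SYSTEM "You are Palmyra, an advanced AI assistant created by Writer. You are helpful and honest. You provide accurate and detailed responses while being concise and clear."
EOF
```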

#### Step 3: Import the Model
```bash
ollama create palmyra-mini-thinking-a -f Modelfile
```
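
To confirm the import succeeded before moving on, list the installed models and look for the new name:

```bash
ollama list | grep palmyra-mini-thinking-a
```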

#### Step 4: Run the Model
```bash
ollama run palmyra-mini-thinking-a
```
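
Besides the interactive session, `ollama run` also accepts a prompt as an argument or on stdin, which is handy for scripting. The prompts below are only illustrations:

```bash
# One-shot prompt: prints the response and exits
ollama run palmyra-mini-thinking-a "Summarize GGUF quantization in two sentences."

# Piping input works too
echo "List three uses for a local LLM." | ollama run palmyra-mini-thinking-a
```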

### Method 2: Using Absolute Paths

If you prefer to create the Modelfile elsewhere, use absolute paths:

```
FROM "/Users/[user]/Documents/Model Weights/SPW2 Mini Launch/palmyra-mini-thinking-a/GGUF/palmyra-mini-thinking-a FIXED GGUF-BF16/palmyra-mini-thinking-a-BF16.gguf"
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
SYSTEM "You are Palmyra, an advanced AI assistant created by Writer."
```

Then create and run:
```bash
ollama create palmyra-mini-thinking-a -f /path/to/your/Modelfile
ollama run palmyra-mini-thinking-a
```
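
Once the model is imported by either method, it is also reachable through Ollama's local HTTP API (port 11434 by default), so any script can call it. A minimal sketch with an illustrative prompt:

```bash
# Request a single, non-streamed completion from the local server
curl http://localhost:11434/api/generate -d '{
  "model": "palmyra-mini-thinking-a",
  "prompt": "Explain what a context window is in one paragraph.",
  "stream": false
}'
```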

## ⚙️ Advanced Configuration

### Custom Modelfile Parameters

You can customize the model behavior by modifying these parameters in your Modelfile:

```
FROM ./palmyra-mini-thinking-a-BF16.gguf

# Sampling parameters
PARAMETER temperature 0.3      # Creativity (0.1-2.0)
PARAMETER top_k 40             # Top-k sampling (1-100)
PARAMETER top_p 0.95           # Top-p sampling (0.1-1.0)
PARAMETER repeat_penalty 1.1   # Repetition penalty (0.8-1.5)
PARAMETER num_ctx 4096         # Context window size
PARAMETER num_predict 512      # Max tokens to generate

# Stop sequences
PARAMETER stop "<|end|>"
PARAMETER stop "<|endoftext|>"

# System message
SYSTEM """You are Palmyra, an advanced AI assistant created by Writer.
You are helpful, harmless, and honest. You provide accurate and detailed
responses while being concise and clear. You can assist with a wide range
of tasks including writing, analysis, coding, and general questions."""
```
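
After editing the Modelfile, re-running `ollama create` under the same name should rebuild the model in place, so you generally don't need to remove it first:

```bash
# Rebuild the model from the updated Modelfile
ollama create palmyra-mini-thinking-a -f Modelfile
```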

### Parameter Explanations

- **temperature**: Controls randomness (lower = more focused, higher = more creative)
- **top_k**: Limits vocabulary to the top K tokens
- **top_p**: Nucleus sampling threshold
- **repeat_penalty**: Reduces repetitive text
- **num_ctx**: Context window size (how much text the model remembers)
- **num_predict**: Maximum tokens to generate per response
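
You can also experiment with these values for a single session, without editing the Modelfile, using the `/set` command inside an interactive `ollama run` session (a sketch; the values are arbitrary):

```
>>> /set parameter temperature 0.7
>>> /set parameter num_predict 256
>>> /set system "You are a terse assistant."
```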

## 🛠️ Useful Commands

### List Available Models
```bash
ollama list
```

### View Model Information
```bash
ollama show palmyra-mini-thinking-a
```

### View Modelfile of Existing Model
```bash
ollama show --modelfile palmyra-mini-thinking-a
```

### Remove Model
```bash
ollama rm palmyra-mini-thinking-a
```
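
If you plan to experiment with parameter changes, it can be worth keeping a working configuration under another tag first; `ollama cp` duplicates a model (the `-backup` suffix here is just a suggestion):

```bash
ollama cp palmyra-mini-thinking-a palmyra-mini-thinking-a-backup
```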

### Pull Model from Hugging Face (Alternative Method)
If the model is published on Hugging Face as a GGUF repository, you can also run it directly:
```bash
ollama run hf.co/username/repository-name
```

## 🔍 Choosing the Right Quantization

| Version | File Size | Quality | Speed | RAM Usage | Best For |
|---------|-----------|---------|-------|-----------|----------|
| BF16 | Largest | Highest | Slower | ~16GB+ | Production, highest accuracy |
| Q8_0 | Medium | High | Faster | ~8GB+ | Balanced performance |

## 🐛 Troubleshooting

### Common Issues

**1. "File not found" error:**
- Verify the file path in your Modelfile
- Use absolute paths if relative paths don't work
- Ensure the GGUF file exists in the specified location (a quick check is shown below)
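
A fast way to rule out path problems is to test the exact path your `FROM` line references:

```bash
# Prints file details if the path is correct, an error if not
ls -lh "./palmyra-mini-thinking-a-BF16.gguf"
```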

**2. "Out of memory" error:**
- Try the Q8_0 quantization instead of BF16
- Reduce the `num_ctx` parameter
- Close other applications to free up RAM

**3. Model runs but gives poor responses:**
- Adjust temperature and sampling parameters
- Modify the system message
- Try a higher-quality quantization

**4. Slow performance:**
- Use Q8_0 quantization for faster inference
- Reduce `num_ctx` if you don't need long context
- Ensure you have sufficient RAM/VRAM
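
For slow performance in particular, it helps to know whether the model is actually running on the GPU; while a model is loaded, `ollama ps` reports how it is split between CPU and GPU:

```bash
ollama ps
```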

### Getting Help

- Check Ollama documentation: [https://github.com/ollama/ollama](https://github.com/ollama/ollama)
- Ollama Discord community
- Hugging Face GGUF documentation: [https://huggingface.co/docs/hub/en/gguf](https://huggingface.co/docs/hub/en/gguf)

## 📚 Additional Resources

- [Ollama Official Documentation](https://github.com/ollama/ollama/blob/main/docs/README.md)
- [Hugging Face Ollama Integration Guide](https://huggingface.co/docs/hub/en/ollama)
- [GGUF Format Documentation](https://huggingface.co/docs/hub/en/gguf)
- [Modelfile Syntax Reference](https://github.com/ollama/ollama/blob/main/docs/modelfile.md)

## 🎯 Example Usage

Once your model is running, you can interact with it:

```
>>> Hello! Can you tell me about yourself?

Hello! I'm Palmyra, an AI assistant created by Writer. I'm designed to be helpful,
harmless, and honest in my interactions. I can assist you with a wide variety of
tasks including writing, analysis, answering questions, coding help, and general
conversation. I aim to provide accurate and detailed responses while being concise
and clear. How can I help you today?

>>> What's the significance of rabbits to Fibonacci?

Rabbits played a significant role in the development of the Fibonacci sequence...
```
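
To leave the interactive session, use the built-in `/bye` command (or press Ctrl+D):

```
>>> /bye
```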

## 📄 License

Please refer to the original model license and terms of use from Writer/palmyra-mini-thinking-a.

---

**Note**: This guide is based on Ollama's official documentation and community best practices. For the most up-to-date information, always refer to the [official Ollama documentation](https://github.com/ollama/ollama).