# palmyra-mini-thinking-a GGUF Model Import Guide for Ollama

This guide provides step-by-step instructions for importing the palmyra-mini-thinking-a GGUF model files into Ollama for local inference.

## 📁 Available Model Files

This directory contains two quantized versions of the palmyra-mini-thinking-a model:

- `palmyra-mini-thinking-a-BF16.gguf` - BFloat16 precision (highest quality, largest size)
- `palmyra-mini-thinking-a-Q8_0.gguf` - 8-bit quantization (high quality, medium size)

## 🔧 Prerequisites

Before getting started, ensure you have:

- **Ollama installed** on your system ([Download from ollama.com](https://ollama.com/))
- **Sufficient RAM/VRAM** for your chosen model (a quick way to check is shown below):
  - BF16: ~16GB+ RAM recommended
  - Q8_0: ~8GB+ RAM recommended
- **Terminal/Command Line access**
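
If you're unsure how much memory your machine has, you can check from the terminal. A minimal sketch (the commands differ by platform):

```bash
# macOS: total physical memory in bytes
sysctl -n hw.memsize

# Linux: human-readable total and available memory
free -h
```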

## 🚀 Quick Start Guide

### Method 1: Import Local GGUF File (Recommended)

#### Step 1: Navigate to Model Directory
```bash
cd "/Users/[user]/Documents/Model Weights/SPW2 Mini Launch/palmyra-mini-thinking-a/GGUF/palmyra-mini-thinking-a FIXED GGUF-BF16"
```

#### Step 2: Create a Modelfile
Create a new file named `Modelfile` (no extension) with the following content:

**For BF16 version (highest quality):**
```
FROM ./palmyra-mini-thinking-a-BF16.gguf
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
PARAMETER top_k 40
PARAMETER top_p 0.95
SYSTEM "You are Palmyra, an advanced AI assistant created by Writer. You are helpful and honest. You provide accurate and detailed responses while being concise and clear."
```

**For Q8_0 version (balanced):**
```
FROM ./palmyra-mini-thinking-a-Q8_0.gguf
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
PARAMETER top_k 40
PARAMETER top_p 0.95
SYSTEM "You are Palmyra, an advanced AI assistant created by Writer. You are helpful and honest. You provide accurate and detailed responses while being concise and clear."
```
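
If you'd rather not open an editor, one way to write the file is a shell heredoc (shown for the BF16 variant; adjust the `FROM` line for Q8_0):

```bash
# Write the Modelfile in one shot; quoting 'EOF' prevents
# the shell from expanding anything inside the heredoc
cat > Modelfile <<'EOF'
FROM ./palmyra-mini-thinking-a-BF16.gguf
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
PARAMETER top_k 40
PARAMETER top_p 0.95
SYSTEM "You are Palmyra, an advanced AI assistant created by Writer. You are helpful and honest. You provide accurate and detailed responses while being concise and clear."
EOF
```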

#### Step 3: Import the Model
```bash
ollama create palmyra-mini-thinking-a -f Modelfile
```
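
To confirm the import succeeded before moving on, list the installed models and look for the new name:

```bash
ollama list | grep palmyra-mini-thinking-a
```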

#### Step 4: Run the Model
```bash
ollama run palmyra-mini-thinking-a
```
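
Besides the interactive session, `ollama run` also accepts a prompt as an argument or on stdin, which is handy for scripting. The prompts below are only illustrations:

```bash
# One-shot prompt: prints the response and exits
ollama run palmyra-mini-thinking-a "Summarize GGUF quantization in two sentences."

# Piping input works too
echo "List three uses for a local LLM." | ollama run palmyra-mini-thinking-a
```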

### Method 2: Using Absolute Paths

If you prefer to create the Modelfile elsewhere, use absolute paths:

```
FROM "/Users/[user]/Documents/Model Weights/SPW2 Mini Launch/palmyra-mini-thinking-a/GGUF/palmyra-mini-thinking-a FIXED GGUF-BF16/palmyra-mini-thinking-a-BF16.gguf"
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
SYSTEM "You are Palmyra, an advanced AI assistant created by Writer."
```

Then create and run:
```bash
ollama create palmyra-mini-thinking-a -f /path/to/your/Modelfile
ollama run palmyra-mini-thinking-a
```
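
Once the model is imported by either method, it is also reachable through Ollama's local HTTP API (port 11434 by default), so any script can call it. A minimal sketch with an illustrative prompt:

```bash
# Request a single, non-streamed completion from the local server
curl http://localhost:11434/api/generate -d '{
  "model": "palmyra-mini-thinking-a",
  "prompt": "Explain what a context window is in one paragraph.",
  "stream": false
}'
```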

## ⚙️ Advanced Configuration

### Custom Modelfile Parameters

You can customize the model behavior by modifying these parameters in your Modelfile:

```
FROM ./palmyra-mini-thinking-a-BF16.gguf

# Sampling parameters
PARAMETER temperature 0.3      # Creativity (0.1-2.0)
PARAMETER top_k 40             # Top-k sampling (1-100)
PARAMETER top_p 0.95           # Top-p sampling (0.1-1.0)
PARAMETER repeat_penalty 1.1   # Repetition penalty (0.8-1.5)
PARAMETER num_ctx 4096         # Context window size
PARAMETER num_predict 512      # Max tokens to generate

# Stop sequences
PARAMETER stop "<|end|>"
PARAMETER stop "<|endoftext|>"

# System message
SYSTEM """You are Palmyra, an advanced AI assistant created by Writer.
You are helpful, harmless, and honest. You provide accurate and detailed
responses while being concise and clear. You can assist with a wide range
of tasks including writing, analysis, coding, and general questions."""
```
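
After editing the Modelfile, re-running `ollama create` under the same name should rebuild the model in place, so you generally don't need to remove it first:

```bash
# Rebuild the model from the updated Modelfile
ollama create palmyra-mini-thinking-a -f Modelfile
```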

### Parameter Explanations

- **temperature**: Controls randomness (lower = more focused, higher = more creative)
- **top_k**: Limits vocabulary to the top K tokens
- **top_p**: Nucleus sampling threshold
- **repeat_penalty**: Reduces repetitive text
- **num_ctx**: Context window size (how much text the model remembers)
- **num_predict**: Maximum tokens to generate per response
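
You can also experiment with these values for a single session, without editing the Modelfile, using the `/set` command inside an interactive `ollama run` session (a sketch; the values are arbitrary):

```
>>> /set parameter temperature 0.7
>>> /set parameter num_predict 256
>>> /set system "You are a terse assistant."
```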

## 🛠️ Useful Commands

### List Available Models
```bash
ollama list
```

### View Model Information
```bash
ollama show palmyra-mini-thinking-a
```

### View Modelfile of Existing Model
```bash
ollama show --modelfile palmyra-mini-thinking-a
```

### Remove Model
```bash
ollama rm palmyra-mini-thinking-a
```
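
If you plan to experiment with parameter changes, it can be worth keeping a working configuration under another tag first; `ollama cp` duplicates a model (the `-backup` suffix here is just a suggestion):

```bash
ollama cp palmyra-mini-thinking-a palmyra-mini-thinking-a-backup
```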

### Pull Model from Hugging Face (Alternative Method)
If the model is published on Hugging Face as a GGUF repository, you can also run it directly:
```bash
ollama run hf.co/username/repository-name
```

## 🔍 Choosing the Right Quantization

| Version | File Size | Quality | Speed | RAM Usage | Best For |
|---------|-----------|---------|-------|-----------|----------|
| BF16 | Largest | Highest | Slower | ~16GB+ | Production, highest accuracy |
| Q8_0 | Medium | High | Faster | ~8GB+ | Balanced performance |

## 🐛 Troubleshooting

### Common Issues

**1. "File not found" error:**
- Verify the file path in your Modelfile
- Use absolute paths if relative paths don't work
- Ensure the GGUF file exists in the specified location (a quick check is shown below)
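
A fast way to rule out path problems is to test the exact path your `FROM` line references:

```bash
# Prints file details if the path is correct, an error if not
ls -lh "./palmyra-mini-thinking-a-BF16.gguf"
```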

**2. "Out of memory" error:**
- Try the Q8_0 quantization instead of BF16
- Reduce the `num_ctx` parameter
- Close other applications to free up RAM

**3. Model runs but gives poor responses:**
- Adjust temperature and sampling parameters
- Modify the system message
- Try a higher-quality quantization

**4. Slow performance:**
- Use Q8_0 quantization for faster inference
- Reduce `num_ctx` if you don't need long context
- Ensure you have sufficient RAM/VRAM
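
For slow performance in particular, it helps to know whether the model is actually running on the GPU; while a model is loaded, `ollama ps` reports how it is split between CPU and GPU:

```bash
ollama ps
```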

### Getting Help

- Check Ollama documentation: [https://github.com/ollama/ollama](https://github.com/ollama/ollama)
- Ollama Discord community
- Hugging Face GGUF documentation: [https://huggingface.co/docs/hub/en/gguf](https://huggingface.co/docs/hub/en/gguf)

## 📚 Additional Resources

- [Ollama Official Documentation](https://github.com/ollama/ollama/blob/main/docs/README.md)
- [Hugging Face Ollama Integration Guide](https://huggingface.co/docs/hub/en/ollama)
- [GGUF Format Documentation](https://huggingface.co/docs/hub/en/gguf)
- [Modelfile Syntax Reference](https://github.com/ollama/ollama/blob/main/docs/modelfile.md)

## 🎯 Example Usage

Once your model is running, you can interact with it:

```
>>> Hello! Can you tell me about yourself?

Hello! I'm Palmyra, an AI assistant created by Writer. I'm designed to be helpful,
harmless, and honest in my interactions. I can assist you with a wide variety of
tasks including writing, analysis, answering questions, coding help, and general
conversation. I aim to provide accurate and detailed responses while being concise
and clear. How can I help you today?

>>> What's the significance of rabbits to Fibonacci?

Rabbits played a significant role in the development of the Fibonacci sequence...
```
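
To leave the interactive session, use the built-in `/bye` command (or press Ctrl+D):

```
>>> /bye
```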

## 📄 License

Please refer to the original model license and terms of use from Writer/palmyra-mini-thinking-a.

---

**Note**: This guide is based on Ollama's official documentation and community best practices. For the most up-to-date information, always refer to the [official Ollama documentation](https://github.com/ollama/ollama).