palmyra-mini-thinking-a GGUF Model Import Guide for Ollama

This guide provides step-by-step instructions for importing the palmyra-mini-thinking-a GGUF model files into Ollama for local inference.

📁 Available Model Files

This directory contains two GGUF builds of the palmyra-mini-thinking-a model:

  • palmyra-mini-thinking-a-BF16.gguf - BFloat16 precision (highest quality, largest size)
  • palmyra-mini-thinking-a-Q8_0.gguf - 8-bit quantization (high quality, medium size)

🔧 Prerequisites

Before getting started, ensure you have:

  • Ollama installed on your system (Download from ollama.com)
  • Sufficient RAM/VRAM for your chosen model:
    • BF16: ~16GB+ RAM recommended
    • Q8_0: ~8GB+ RAM recommended
  • Terminal/Command Line access
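
To confirm the installation before going further, check that the Ollama CLI responds; the command below simply prints the installed version:

ollama --version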

🚀 Quick Start Guide

Method 1: Import Local GGUF File (Recommended)

Step 1: Navigate to Model Directory

cd "/Users/[user]/Documents/Model Weights/SPW2 Mini Launch/palmyra-mini-thinking-a/GGUF/palmyra-mini-thinking-a FIXED GGUF-BF16"

Step 2: Create a Modelfile

Create a new file named Modelfile (no extension) with the following content:

For BF16 version (highest quality):

FROM ./palmyra-mini-thinking-a-BF16.gguf
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
PARAMETER top_k 40
PARAMETER top_p 0.95
SYSTEM "You are Palmyra, an advanced AI assistant created by Writer. You are helpful and honest. You provide accurate and detailed responses while being concise and clear."

For Q8_0 version (balanced):

FROM ./palmyra-mini-thinking-a-Q8_0.gguf
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
PARAMETER top_k 40
PARAMETER top_p 0.95
SYSTEM "You are Palmyra, an advanced AI assistant created by Writer. You are helpful and honest. You provide accurate and detailed responses while being concise and clear."

Step 3: Import the Model

ollama create palmyra-mini-thinking-a -f Modelfile
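
If the import succeeds, the new model will appear in your local model list:

ollama list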

Step 4: Run the Model

ollama run palmyra-mini-thinking-a
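
ollama run also accepts a prompt directly as an argument, which is convenient for a quick one-shot test without entering the interactive session:

ollama run palmyra-mini-thinking-a "Summarize the GGUF format in one sentence."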

Method 2: Using Absolute Paths

If you prefer to create the Modelfile elsewhere, use absolute paths:

FROM "/Users/thomas/Documents/Model Weights/SPW2 Mini Launch/palmyra-mini-thinking-a/GGUF/palmyra-mini-thinking-a FIXED GGUF-BF16/palmyra-mini-thinking-a-BF16.gguf"
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
SYSTEM "You are Palmyra, an advanced AI assistant created by Writer."

Then create and run:

ollama create palmyra-mini-thinking-a -f /path/to/your/Modelfile
ollama run palmyra-mini-thinking-a

⚙️ Advanced Configuration

Custom Modelfile Parameters

You can customize the model behavior by modifying these parameters in your Modelfile:

FROM ./palmyra-mini-thinking-a-BF16.gguf

# Sampling parameters (see explanations below; Modelfile comments
# must be on their own lines, not trailing a PARAMETER value)
PARAMETER temperature 0.3
PARAMETER top_k 40
PARAMETER top_p 0.95
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 4096
PARAMETER num_predict 512

# Stop sequences
PARAMETER stop "<|end|>"
PARAMETER stop "<|endoftext|>"

# System message
SYSTEM """You are Palmyra, an advanced AI assistant created by Writer. 
You are helpful, harmless, and honest. You provide accurate and detailed 
responses while being concise and clear. You can assist with a wide range 
of tasks including writing, analysis, coding, and general questions."""

Parameter Explanations

  • temperature: Controls randomness; lower values are more focused, higher values more creative (typical range 0.1-2.0)
  • top_k: Limits sampling to the K most likely tokens (1-100)
  • top_p: Nucleus sampling threshold (0.1-1.0)
  • repeat_penalty: Penalizes repetitive text (0.8-1.5)
  • num_ctx: Context window size (how much text the model remembers)
  • num_predict: Maximum tokens to generate per response
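
Most of these can also be adjusted on the fly inside an interactive ollama run session, without editing and re-importing the Modelfile:

/set parameter temperature 0.7
/set parameter num_ctx 8192
/show parameters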

🛠️ Useful Commands

List Available Models

ollama list

View Model Information

ollama show palmyra-mini-thinking-a

View Modelfile of Existing Model

ollama show --modelfile palmyra-mini-thinking-a

Remove Model

ollama rm palmyra-mini-thinking-a

Pull Model from Hugging Face (Alternative Method)

If the model is published in a GGUF repository on Hugging Face, you can also pull it directly instead of importing a local file:

ollama run hf.co/username/repository-name
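
Beyond the CLI, Ollama serves a local REST API (port 11434 by default), so an imported model can also be queried programmatically. A minimal curl sketch:

curl http://localhost:11434/api/generate -d '{
  "model": "palmyra-mini-thinking-a",
  "prompt": "Why is the sky blue?",
  "stream": false
}'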

🔍 Choosing the Right Quantization

Version   File Size   Quality   Speed    RAM Usage   Best For
BF16      Largest     Highest   Slower   ~16GB+      Production, highest accuracy
Q8_0      Medium      High      Faster   ~8GB+       Balanced performance
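
As a rule of thumb, BF16 stores weights at 2 bytes per parameter while Q8_0 uses roughly 1 byte per parameter, so the Q8_0 file should come in at about half the BF16 size. You can confirm the actual sizes on disk:

du -h *.gguf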

🐛 Troubleshooting

Common Issues

1. "File not found" error:

  • Verify the file path in your Modelfile
  • Use absolute paths if relative paths don't work
  • Ensure the GGUF file exists in the specified location

2. "Out of memory" error:

  • Try the Q8_0 quantization instead of BF16
  • Reduce num_ctx parameter
  • Close other applications to free up RAM
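
Combining the first two suggestions, a minimal low-memory Modelfile might look like this (num_ctx 2048 is an illustrative value):

FROM ./palmyra-mini-thinking-a-Q8_0.gguf
PARAMETER num_ctx 2048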

3. Model runs but gives poor responses:

  • Adjust temperature and sampling parameters
  • Modify the system message
  • Try a higher quality quantization

4. Slow performance:

  • Use Q8_0 quantization for faster inference
  • Reduce num_ctx if you don't need long context
  • Ensure you have sufficient RAM/VRAM
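
To see which models are currently loaded, how much memory each one is using, and whether it landed on CPU or GPU, use:

ollama ps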

Getting Help

If an issue isn't covered above, the official Ollama documentation at ollama.com and the issue tracker at github.com/ollama/ollama are good starting points.

📚 Additional Resources

  • Ollama documentation: https://ollama.com
  • Ollama GitHub repository: https://github.com/ollama/ollama
  • GGUF format documentation: https://github.com/ggerganov/ggml

🎯 Example Usage

Once your model is running, you can interact with it:

>>> Hello! Can you tell me about yourself?

Hello! I'm Palmyra, an AI assistant created by Writer. I'm designed to be helpful, 
harmless, and honest in my interactions. I can assist you with a wide variety of 
tasks including writing, analysis, answering questions, coding help, and general 
conversation. I aim to provide accurate and detailed responses while being concise 
and clear. How can I help you today?

>>> What's the significance of rabbits to Fibonacci?

Rabbits played a significant role in the development of the Fibonacci sequence...

📄 License

Please refer to the original model license and terms of use from Writer/palmyra-mini-thinking-a.


Note: This guide is based on Ollama's official documentation and community best practices. For the most up-to-date information, always refer to the official Ollama documentation.