LIMI-Air-qx86-hi-mlx

I thought I'd apply the Deckard Formula (qx), a mixed-precision scheme that quantizes different layer groups at different bit widths and group sizes, which has the effect of increasing the model's creativity.

The GLM Air is known for its lack of a sense of humour, so I thought it would be the best fit.

I created this formula inspired by the Nikon Noct Z 58mm F/0.95, a lens remarkable for its ability to render a scene with a human-like feeling: it provides a thin depth of field to isolate the subject from the background and pleasantly blurs out the rest, creating memorable images.

As a photographer I know how lenses work; I applied the same principles to cognition.

Here are some observations on how this formula performs, from initial tests on a q5 quant series.

The q5-hi quant

The q5-hi quant was quantized with group size 32, meaning high precision.
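For context, the group size is the number of weights that share a single scale and offset during quantization; smaller groups follow the weights more closely at a small memory cost. A plain q5 conversion at group size 32 might look like the sketch below, using mlx-lm's convert API (my reconstruction, not the exact command used for this model):

from mlx_lm import convert

# Uniform 5-bit quantization. Group size 32 means every 32 weights
# share one scale/offset, i.e. higher precision than the default of 64.
convert(
    hf_path="GAIR/LIMI-Air",
    mlx_path="LIMI-Air-q5-hi-mlx",
    quantize=True,
    q_bits=5,
    q_group_size=32,
)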

Given a standard task, the LIMI-Air-q5-hi finished quickly.

Thinks for 42s, drafts a few things in a hurry

29.96 tok/sec
4002 tokens
4.74s to first token

Very basic output, boring.

The qx5 quant

In the LIMI-Air-qx5 quant, I use 5 bits for content and context in high precision, 6 bits for attention, and 8 bits for the head.
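In mlx-lm, a per-layer recipe like this can be expressed through the quant_predicate hook of convert(), which may return different bit/group-size settings for each module. The sketch below is my reading of the qx5 recipe; the mapping of "content", "context", "attention", and "head" onto module paths is an assumption, not the published formula:

from mlx_lm import convert

# Assumed mapping of the qx5 layer classes onto module paths:
#   head            -> lm_head / embeddings, 8 bits
#   attention       -> self_attn projections, 6 bits
#   content/context -> everything else, 5 bits at group size 32
def qx5_predicate(path, module, config):
    if "lm_head" in path or "embed_tokens" in path:
        return {"bits": 8, "group_size": 64}
    if "self_attn" in path:
        return {"bits": 6, "group_size": 64}
    return {"bits": 5, "group_size": 32}  # high precision for content/context

convert(
    hf_path="GAIR/LIMI-Air",
    mlx_path="LIMI-Air-qx5-mlx",
    quantize=True,
    quant_predicate=qx5_predicate,
)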

Thinks for 2min 28s, more engaged in the discovery process in the think tag

29.22 tok/sec
7067 tokens
6.92s to first token

The qx5 provided a bit more detail and seems happier.

The qx65-hi quant

In the LIMI-Air-qx65-hi quant, I use 5 bits for content, 6 bits for attention and context, and 8 bits for the head, all in high precision.
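Under the same assumed mapping, qx65-hi would only change the per-class settings: every class drops to group size 32 (the "hi" part), and context joins attention at 6 bits. Since the exact split between content and context modules isn't published, the comment records the intent rather than a verified mapping:

def qx65_hi_predicate(path, module, config):
    # All classes at group size 32 ("hi"); head at 8 bits,
    # attention and context at 6, remaining content layers at 5.
    if "lm_head" in path or "embed_tokens" in path:
        return {"bits": 8, "group_size": 32}
    if "self_attn" in path:
        return {"bits": 6, "group_size": 32}
    return {"bits": 5, "group_size": 32}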

Thinks for 5min 30s:

Given the complexity, we might not be able to complete the entire code in one response. We'll provide a skeleton for each module and then fill in key functions. /think

...writes quite complete code, tests and all.

25.21 tok/sec
15111 tokens
7.04s to first token

Summary of Deckard Formula performance

The qx65-hi delivered twice as many tokens, in the form of very solid reasoning in the think tag and a competent assembly of the requested software, within the limits of a single response. Subsequent queries showed the model being very cooperative and enjoying the process of debugging and refinement.

The qx86-hi is the same formula, one step up: 6 bits instead of 5 for content, 8 bits otherwise, all in high precision.
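In the same hypothetical quant_predicate terms (with the same caveats about the layer mapping), qx86-hi shifts every class up one step, which would be the recipe behind this model:

def qx86_hi_predicate(path, module, config):
    # 8 bits for head, attention, and context; 6 bits for the
    # remaining content layers; all at group size 32 ("hi").
    if "lm_head" in path or "embed_tokens" in path or "self_attn" in path:
        return {"bits": 8, "group_size": 32}
    return {"bits": 6, "group_size": 32}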

This quant formula makes the model happier and more eager to explore and innovate.

-G

This model LIMI-Air-qx86-hi-mlx was converted to MLX format from GAIR/LIMI-Air using mlx-lm version 0.27.1.

Use with mlx

pip install mlx-lm

from mlx_lm import load, generate

# Load the quantized model and its tokenizer
model, tokenizer = load("LIMI-Air-qx86-hi-mlx")

prompt = "hello"

# Apply the model's chat template, if it defines one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)