---
base_model:
- deepseek-ai/DeepSeek-V3.1
pipeline_tag: text-generation
---

## Model Details

This is a mixed int4 model with group_size 128 and symmetric quantization of [deepseek-ai/DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1), generated by [intel/auto-round](https://github.com/intel/auto-round) **via RTN (no algorithm tuning)**.
Non-expert layers fall back to 8 bits; see the section "Generate the model" below for details.
Please follow the license of the original model.
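
For intuition, the sketch below shows what symmetric group-wise round-to-nearest (RTN) quantization does: each group of 128 weights shares one scale, and weights are rounded to the nearest point on the integer grid. This is illustrative only; the helper name `rtn_quantize_sym` is hypothetical and this is not AutoRound's internal kernel.

```python
import torch

def rtn_quantize_sym(w: torch.Tensor, bits: int = 4, group_size: int = 128):
    # Hypothetical helper for illustration; not AutoRound's actual implementation.
    # Assumes w.numel() is divisible by group_size.
    qmax = 2 ** (bits - 1) - 1                                   # 7 for int4
    wg = w.reshape(-1, group_size)                               # one scale per group
    scale = wg.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(wg / scale), -qmax - 1, qmax)   # round to nearest
    return (q * scale).reshape(w.shape)                          # dequantized weights
```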

## How To Use

### INT4 Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
quantized_model_dir = "Intel/DeepSeek-V3.1-int4-mixed-AutoRound"

model = AutoModelForCausalLM.from_pretrained(
        quantized_model_dir,
        torch_dtype=torch.bfloat16,
        device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)
prompts = [
        "9.11和9.8哪个数字大",
        "strawberry中有几个r?",
        "There is a girl who likes adventure,",
        "Please give a brief introduction of DeepSeek company.",
        ]

texts=[]
for prompt in prompts:
    messages = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
            )
    texts.append(text)
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

outputs = model.generate(
        input_ids=inputs["input_ids"].to(model.device),
        attention_mask=inputs["attention_mask"].to(model.device),
        max_length=200,  # change this to align with the official usage
        num_return_sequences=1,
        do_sample=False  # change this to align with the official usage
        )
generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs["input_ids"], outputs)
        ]
decoded_outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

for i, prompt in enumerate(prompts):
    print(f"Prompt: {prompt}")
    print(f"Generated: {decoded_outputs[i]}")

"""
Prompt: 9.11和9.8哪个数字大
Generated: 9.11 和 9.8 比较时,9.11 更大。
- 因为 9.11 相当于 9 + 0.11,而 9.8 相当于 9 + 0.8,但注意这里 0.11 实际上小于 0.8(0.11 < 0.8),所以 9.8 更大。
- 重新确认:9.11 是 9.11,9.8 是 9.80,因此 9.80 > 9.11。

**答案:9.8 更大。**
--------------------------------------------------
Prompt: strawberry中有几个r?
Generated: 在英文单词 "strawberry" 中,字母 "r" 出现了 **3 次**- 位置:第 3 个字母(s**t**r**a**w**b**e**r**r**y,注意:第 1 个 "r" 是第 3 字符,第 2 个 "r" 是第 6 字符,第 3 个 "r" 是第 7 字符)。

如果需要进一步解释或其他问题,请随时告知! 😊
--------------------------------------------------
Prompt: There is a girl who likes adventure,
Generated: Of course! A girl who likes adventure is a fantastic starting point for a story, a character, or a real-life inspiration. Here are a few ways to explore that idea:

### As a Character Profile:

**Name:** Let's call her **Elara**.

**Traits:**
*   **Curious:** She asks "why" and "what if" more than anyone else. She sees a hidden path in the woods and has to know where it leads.
*   **Resourceful:** She's the one with a multi-tool in her pocket, who knows how to read a map (and the stars), and can build a fire.
*   **Brave, not fearless:** She feels the fear of climbing the tall cliff or exploring the dark cave, but her curiosity and determination are stronger.
*   **Resilient:** She doesn't see a wrong turn
--------------------------------------------------
Prompt: Please give a brief introduction of DeepSeek company.
Generated: Of course. Here is a brief introduction to DeepSeek:

**DeepSeek** is a leading Chinese AI research company focused on developing powerful artificial general intelligence (AGI). The company is best known for creating state-of-the-art large language models (LLMs).

**Key Highlights:**

*   **Core Product:** Their flagship product is the **DeepSeek-V2** language model, a powerful and efficient AI known for its strong performance in coding, mathematics, and general reasoning.
*   **Open-Source Commitment:** DeepSeek has gained significant recognition for open-sourcing its earlier models (like DeepSeek-Coder and DeepSeek-LLM 67B), making them freely available for research and commercial use. This has helped foster innovation and build a strong developer community.
*   **Specialization in Coding:** They are particularly renowned for their models' exceptional capabilities
--------------------------------------------------

"""
```
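
Note: for batched generation with a decoder-only model, left padding is generally recommended so the model does not continue from padding tokens; whether the tokenizer shipped with this checkpoint already defaults to left padding is not guaranteed. If needed, set it before tokenizing:

```python
tokenizer.padding_side = "left"  # recommended for batched decoder-only generation
```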

### Generate the model
The main branch of [auto-round](https://github.com/intel/auto-round) is required if the model is fp8 and the device supports fp8.
```python
import torch
from transformers import AutoModelForCausalLM
from auto_round import AutoRound

model_name = "deepseek-ai/DeepSeek-V3.1"

# Load the model once so its Linear layers can be enumerated for the mixed-bit config
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

layer_config = {}
for n, m in model.named_modules():
    if isinstance(m, torch.nn.Linear):
        if "expert" in n and "shared_experts" not in n:
            layer_config[n] = {"bits": 4}  # routed expert layers: int4
            print(n, 4)
        elif n != "lm_head":
            layer_config[n] = {"bits": 8}  # all other linear layers: int8
            print(n, 8)

autoround = AutoRound(model_name, iters=0, layer_config=layer_config)
autoround.quantize_and_save(format="auto_round", output_dir="tmp_autoround")
```
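
As a quick sanity check before quantizing, you can count how many layers land in each bit width. This small sketch uses only the `layer_config` dictionary built above:

```python
from collections import Counter

# Expect most linears at 4 bits (routed experts) and the remainder at 8 bits
print(Counter(cfg["bits"] for cfg in layer_config.values()))
```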


## Ethical Considerations and Limitations

The model can produce factually incorrect output and should not be relied on for factually accurate information. Because of the limitations of the pretrained model and the fine-tuning datasets, it is possible that this model could generate lewd, biased, or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

## Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here is a useful link to learn more about Intel's AI software:

- Intel Neural Compressor [link](https://github.com/intel/neural-compressor)

## Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

## Cite

```bibtex
@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}
```

[arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)