---
tags:
- deepsparse
---

## Usage

```python
from deepsparse import TextGeneration

prompt = "How to get in a good university?"
formatted_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"

model = TextGeneration(model="hf:neuralmagic/TinyLlama-1.1B-Chat-v0.3-pruned50-quant-ds")
print(model(formatted_prompt, max_new_tokens=200).generations[0].text)

"""
Getting into a good university is a complex process that involves factors such as academic performance, financial aid, and personal qualifications. Here are some steps you can follow to get in a good university:

1. Academic performance:
- Look for a university that has a strong academic program, including a well-rounded curriculum that covers a wide range of subjects.
- Check if the university offers a clear curriculum that includes a clear sequence of courses.
- Check if the university offers a clear pathway to graduation, including clear dates and deadlines.

2. Financial aid:
- Look for a university that offers financial aid, such as scholarships, grants, or loans.
- Check if the university offers financial aid that fits your budget.
- Consider the university's financial aid package, including the cost of tuition, room and board, and other expenses.
"""
```
|
|
|
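For multi-turn conversations, the same ChatML-style template can be produced by a small helper. This is only a sketch: `format_chatml` is a hypothetical name and not part of the deepsparse API.

```python
def format_chatml(messages):
    # Render a list of {"role": ..., "content": ...} dicts into the
    # ChatML-style prompt used above. Hypothetical helper, not part of deepsparse.
    rendered = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    # Leave the assistant turn open so the model completes it.
    return rendered + "<|im_start|>assistant\n"


formatted_prompt = format_chatml(
    [{"role": "user", "content": "How to get in a good university?"}]
)
```

The result can be passed to `model(...)` exactly as in the single-turn example above.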
## One-shot and Export

```
git clone https://github.com/neuralmagic/sparseml
pip install -e "sparseml[transformers]" "torch<2"
python sparseml/src/sparseml/transformers/sparsification/obcq/obcq.py PY007/TinyLlama-1.1B-Chat-v0.3 open_platypus --recipe recipe.yaml --save True
python sparseml/src/sparseml/transformers/sparsification/obcq/export.py --task text-generation --model_path obcq_deployment --sequence_length 512
cp deployment/model.onnx deployment/model-orig.onnx
python onnx_kv_inject.py --input-file deployment/model-orig.onnx --output-file deployment/model.onnx
```
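After the KV-cache injection step, the exported graph should expose `past_key_values.*` inputs alongside the original model inputs. A minimal sketch of such a sanity check is below; the input names are illustrative (inspect the real ones by loading the exported file with `onnx.load`), and `kv_cache_inputs` is a hypothetical helper.

```python
def kv_cache_inputs(input_names):
    # Return the cache inputs that the KV injection step is expected to add.
    return [name for name in input_names if name.startswith("past_key_values")]


# Illustrative input names; a real model has one key/value pair per decoder layer.
example_inputs = [
    "input_ids",
    "attention_mask",
    "past_key_values.0.key",
    "past_key_values.0.value",
]
print(kv_cache_inputs(example_inputs))
```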

`recipe.yaml`

```
test_stage:
  obcq_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      block_size: 128
      sequential_update: false
      quantize:
        QuantizationModifier:
          ignore:
            - LlamaRotaryEmbedding
            - LlamaRMSNorm
            - SiLUActivation
            - model.layers.21.mlp.down_proj
            - model.layers.7.mlp.down_proj
            - model.layers.2.mlp.down_proj
            - model.layers.20.mlp.down_proj
            - model.layers.19.mlp.down_proj
          post_oneshot_calibration: false
          scheme_overrides:
            Embedding:
              input_activations: null
              weights:
                num_bits: 8
                symmetric: false
      percdamp: 0.01
      prunen: 0
      prunem: 0
      targets:
        - model.layers.0
        - model.layers.1
        - model.layers.2
        - model.layers.3
        - model.layers.4
        - model.layers.5
        - model.layers.6
        - model.layers.7
        - model.layers.8
        - model.layers.9
        - model.layers.10
        - model.layers.11
        - model.layers.12
        - model.layers.13
        - model.layers.14
        - model.layers.15
        - model.layers.16
        - model.layers.17
        - model.layers.18
        - model.layers.19
        - model.layers.20
        - model.layers.21
      target_ids:
        - attention_mask
        - position_ids
```
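As a rough back-of-the-envelope for what this recipe buys, 50% unstructured sparsity plus INT8 weights shrink raw weight storage considerably relative to a dense FP32 baseline. This sketch ignores sparse-index overhead, the ignored layers, and activation memory, and uses an approximate parameter count.

```python
params = 1.1e9                      # TinyLlama-1.1B parameter count (approximate)
dense_fp32_gb = params * 4 / 1e9    # 4 bytes per FP32 weight
int8_gb = params * 1 / 1e9          # 1 byte per INT8 weight
sparse_int8_gb = int8_gb * 0.5      # 50% of weights pruned to zero

print(f"dense fp32:      {dense_fp32_gb:.2f} GB")
print(f"int8:            {int8_gb:.2f} GB")
print(f"50% sparse int8: {sparse_int8_gb:.2f} GB")
```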