mmnga/Mixtral-Extraction-4x7B-Instruct-v0.1



mmnga committed on Dec 19, 2023

Commit 8997250 · 1 Parent(s): ba76463

Create README.md

Files changed (1)

README.md +49 -0

README.md ADDED

	@@ -0,0 +1,49 @@

+---
+license: apache-2.0
+language:
+- fr
+- it
+- de
+- es
+- en
+inference: false
+---
+# Model Card for Mixtral-Extraction-4x7B-Instruct-v0.1
+This is an experimental model created by extracting a subset of the experts from [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).
+# How we extracted experts
+A subset of the original model's 8 experts is selected and extracted into a smaller model.
+This model keeps 4 experts.
+# How To Convert
+Use a Colab high-memory CPU runtime.
+You can extract anywhere from 1 to 7 experts by specifying the selection as a bit string.
+~~~python
+experts_extract_bit = "11110000"
+~~~
+[convert_mixtral_8x7b_to_4x7b_extract.ipynb](https://huggingface.co/mmnga/Mixtral-Extraction-4x7B-Instruct-v0.1/new/main/?filename=README.md)
+# Usage
+~~~bash
+pip install git+https://github.com/huggingface/transformers --upgrade
+pip install torch accelerate bitsandbytes flash_attn
+~~~
+~~~python
+from transformers import AutoTokenizer, AutoModelForCausalLM, MixtralForCausalLM
+import torch
+model_name_or_path = "mmnga/Mixtral-Extraction-4x7B-Instruct-v0.1"
+tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
+# 8-bit loading relies on bitsandbytes and a CUDA GPU
+model = MixtralForCausalLM.from_pretrained(model_name_or_path, load_in_8bit=True)
+text = "[INST] What was John Holt's vision on education? [/INST] "
+# text = "[INST] What is the best anime? [/INST] "
+# By default the tokenizer prepends <s> (BOS) itself, so it is not added by hand here
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=128)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+~~~
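
Review note: the bit-string selection described under "How To Convert" can be sketched roughly as below. This is a minimal illustration and not the notebook's actual code. The function name `parse_expert_bits` is hypothetical, and the left-to-right mapping of bit position to expert index (first character = expert 0) is an assumption that the README does not confirm.

```python
def parse_expert_bits(bits: str, num_experts: int = 8) -> list[int]:
    """Return the indices of experts marked "1" in the bit string.

    Assumption (not confirmed by the README): the first character
    corresponds to expert 0, the last to expert num_experts - 1.
    """
    if len(bits) != num_experts or set(bits) - {"0", "1"}:
        raise ValueError(f"expected {num_experts} characters of '0'/'1', got {bits!r}")
    return [index for index, bit in enumerate(bits) if bit == "1"]


# The value shown in the README selects four experts under this assumption.
experts_extract_bit = "11110000"
kept = parse_expert_bits(experts_extract_bit)
print(kept)       # [0, 1, 2, 3]
print(len(kept))  # 4
```

Under this reading, a string such as `"10100101"` would also yield a 4-expert model, just with a different set of experts.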