---
license: apache-2.0
language:
- en
- zh
- ja
- ko
- fr
- ar
- es
- pt
metrics:
- accuracy
base_model:
- BlinkDL/rwkv7-g1
pipeline_tag: text-generation
---

# rwkv7-2.9B-g1 GGUF Models

## Model Generation Details

This model was generated using [llama.cpp](https://github.com/ggerganov/llama.cpp) at commit [`4807e8f9`](https://github.com/ggerganov/llama.cpp/commit/4807e8f96a61b2adccebd5e57444c94d18de7264).

---

# rwkv7-2.9B-g1

This is the RWKV-7 model in the flash-linear-attention format.

## Model Details

### Model Description

- **Developed by:** Bo Peng, Yu Zhang, Songlin Yang, Ruichong Zhang, Zhiyuan Li
- **Funded by:** RWKV Project (under the LF AI & Data Foundation)
- **Model type:** RWKV7
- **Language(s) (NLP):** Multilingual
- **License:** Apache-2.0
- **Parameter count:** 2.9B
- **Tokenizer:** RWKV World tokenizer
- **Vocabulary size:** 65,536

### Model Sources

- **Repository:** https://github.com/fla-org/flash-linear-attention ; https://github.com/BlinkDL/RWKV-LM
- **Paper:** https://arxiv.org/abs/2503.14456
- **Model:** https://huggingface.co/BlinkDL/rwkv7-g1/resolve/main/rwkv7-g1-2.9b-20250519-ctx4096.pth

## Uses

Install `flash-linear-attention` and the latest version of `transformers` before using this model:

```bash
pip install git+https://github.com/fla-org/flash-linear-attention
pip install 'transformers>=4.48.0'
```

### Direct Use

You can use this model just as any other Hugging Face model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('fla-hub/rwkv7-2.9B-g1', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('fla-hub/rwkv7-2.9B-g1', trust_remote_code=True)

model = model.cuda()  # Nvidia, AMD, and Intel GPUs are supported, e.g. model.xpu() for Intel

prompt = "What is a large language model?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Default is True; set to False to disable thinking
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=1.0,
    top_p=0.3,
    repetition_penalty=1.2
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=False)[0]
print(response)
```

## FAQ

Q: The safetensors metadata is none.

A: Upgrade `transformers` to >=4.48.0: `pip install 'transformers>=4.48.0'`

---

# 🚀 If you find these models useful

Help me test my **AI-Powered Quantum Network Monitor Assistant** with **quantum-ready security checks**:

👉 [Quantum Network Monitor](https://readyforquantum.com/?assistant=open&utm_source=huggingface&utm_medium=referral&utm_campaign=huggingface_repo_readme)

The full open-source code for the Quantum Network Monitor service is available in my GitHub repos (repos with NetworkMonitor in the name): [Source Code Quantum Network Monitor](https://github.com/Mungert69).
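If you would rather reproduce a quantized GGUF with llama.cpp directly, the outline below is a minimal sketch, not the exact recipe used for these files: it assumes a llama.cpp checkout at (or after) the commit listed under Model Generation Details, and the input path, output file names, and `Q4_K_M` quant type are illustrative.

```bash
# Build llama.cpp and install the Python conversion requirements
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp
cmake -B build && cmake --build build --config Release
pip install -r requirements.txt

# 1. Convert the Hugging Face checkpoint to an f16 GGUF
python convert_hf_to_gguf.py /path/to/rwkv7-2.9B-g1 \
    --outfile rwkv7-2.9B-g1-f16.gguf --outtype f16

# 2. Quantize it (Q4_K_M chosen here only as an example)
./build/bin/llama-quantize rwkv7-2.9B-g1-f16.gguf rwkv7-2.9B-g1-Q4_K_M.gguf Q4_K_M

# 3. Smoke-test the quantized model
./build/bin/llama-cli -m rwkv7-2.9B-g1-Q4_K_M.gguf \
    -p "What is a large language model?" -n 128
```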
You will also find the code I use to quantize the models, if you want to do it yourself: [GGUFModelBuilder](https://github.com/Mungert69/GGUFModelBuilder)

💬 **How to test**:
Choose an **AI assistant type**:

- `TurboLLM` (GPT-4.1-mini)
- `HugLLM` (Hugging Face open-source models)
- `TestLLM` (Experimental CPU-only)

### **What I’m Testing**

I’m pushing the limits of **small open-source models for AI network monitoring**, specifically:

- **Function calling** against live network services
- **How small can a model go** while still handling:
  - Automated **Nmap security scans**
  - **Quantum-readiness checks**
  - **Network monitoring tasks**

🟡 **TestLLM** – Current experimental model (llama.cpp on 2 CPU threads in a Hugging Face Docker space):

- ✅ **Zero-configuration setup**
- ⏳ ~30s load time (slow inference but **no API costs**). There is no token limit, as the cost is low.
- 🔧 **Help wanted!** If you’re into **edge-device AI**, let’s collaborate!

### **Other Assistants**

🟢 **TurboLLM** – Uses **gpt-4.1-mini**:

- It performs very well, but unfortunately OpenAI charges per token, so token usage is limited.
- **Create custom cmd processors to run .NET code on Quantum Network Monitor Agents**
- **Real-time network diagnostics and monitoring**
- **Security audits**
- **Penetration testing** (Nmap/Metasploit)

🔵 **HugLLM** – Latest open-source models:

- 🌐 Runs on the Hugging Face Inference API. Performs pretty well using the latest models hosted on Novita.

### 💡 **Example commands you could test**:

1. `"Give me info on my website's SSL certificate"`
2. `"Check if my server is using quantum-safe encryption for communication"`
3. `"Run a comprehensive security audit on my server"`
4. `"Create a cmd processor to .. (whatever you want)"` Note: you need to install a [Quantum Network Monitor Agent](https://readyforquantum.com/Download/?utm_source=huggingface&utm_medium=referral&utm_campaign=huggingface_repo_readme) to run the .NET code on. This is a very flexible and powerful feature. Use with caution!

### Final Word

I fund the servers used to create these model files, run the Quantum Network Monitor service, and pay for inference from Novita and OpenAI, all out of my own pocket. All the code behind the model creation and the Quantum Network Monitor project is [open source](https://github.com/Mungert69). Feel free to use whatever you find helpful.

If you appreciate the work, please consider [buying me a coffee](https://www.buymeacoffee.com/mahadeva) ☕. Your support helps cover service costs and allows me to raise token limits for everyone.

I'm also open to job opportunities or sponsorship.

Thank you! 😊