Spaces:

Zen0
/

auscyberbench-evaluator

Running on Zero

Zen0 commited on 20 days ago

Commit

7d0c82c

1 Parent(s): bd99e48

Initial deployment of AusCyberBench Evaluation Dashboard

🇦🇺 Australia's First LLM Cybersecurity Benchmark

Features:
- Interactive Gradio dashboard for model evaluation
- 26 pre-configured models (small, medium, security-focused)
- Evaluates on 13,449 tasks across 6 categories
- Real-time progress tracking and leaderboard
- Australian orthography and color scheme
- Downloadable results (JSON format)

Categories:
- Regulatory: Essential Eight, ISM Controls, Privacy Act, SOCI Act
- Knowledge: Threat Intelligence, Terminology

Dataset: Zen0/AusCyberBench

Files changed (4) hide show

.gitignore +36 -0
README.md +182 -12
app.py +503 -0
requirements.txt +9 -0

.gitignore ADDED Viewed

	@@ -0,0 +1,36 @@

+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+env/
+venv/
+.venv
+# Model cache
+.cache/
+models/
+*.bin
+*.safetensors
+# Results and logs
+*.json
+*.log
+*.csv
+# HuggingFace cache
+.huggingface/
+# Jupyter
+.ipynb_checkpoints/
+# OS
+.DS_Store
+Thumbs.db
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo

README.md CHANGED Viewed

@@ -1,12 +1,182 @@
----
-title: Auscyberbench Evaluator
-emoji: ⚡
-colorFrom: red
-colorTo: pink
-sdk: gradio
-sdk_version: 5.49.1
-app_file: app.py
-pinned: false
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+---
+title: AusCyberBench Evaluation Dashboard
+emoji: 🇦🇺
+colorFrom: green
+colorTo: yellow
+sdk: gradio
+sdk_version: 4.0.0
+app_file: app.py
+pinned: false
+license: mit
+---
+# 🇦🇺 AusCyberBench Evaluation Dashboard
+**Australia's First LLM Cybersecurity Benchmark**
+An interactive dashboard for evaluating language models on Australian cybersecurity knowledge, regulations, and threat intelligence.
+## About AusCyberBench
+AusCyberBench is a comprehensive benchmark dataset containing **13,449 tasks** across six critical categories:
+### 📋 Categories
+- **🛡️ Regulatory: Essential Eight** (2,558 tasks)
+  - ACSC's baseline cybersecurity mitigation strategies
+  - Maturity levels 1-3 across 8 mitigation strategies
+  - Application whitelisting, patching, MFA, backups, etc.
+- **📜 Regulatory: ISM Controls** (7,200 tasks)
+  - Information Security Manual control requirements
+  - Commonwealth entity security obligations
+  - Control effectiveness, implementation, and compliance
+- **🔒 Regulatory: Privacy Act** (204 tasks)
+  - Australian Privacy Principles (APPs)
+  - Data protection and privacy obligations
+  - Notifiable Data Breaches (NDB) scheme
+- **⚡ Regulatory: SOCI Act** (240 tasks)
+  - Security of Critical Infrastructure Act 2018
+  - Critical infrastructure risk management
+  - Sector-specific obligations
+- **🎯 Knowledge: Threat Intelligence** (2,520 tasks)
+  - ACSC threat reports and advisories
+  - Australian threat landscape
+  - Cyber incident response
+- **📚 Knowledge: Terminology** (727 tasks)
+  - Australian cybersecurity terminology
+  - ACSC glossary and definitions
+  - Industry-specific language
+## Features
+### 🤖 26 Pre-Configured Models
+Evaluate across diverse model categories:
+- **Small Models (1-4B):** Phi-3, Gemma-2, Qwen, Llama 3.2, StableLM, TinyLlama
+- **Medium Models (7-12B):** Mistral, Llama 3.1, Gemma-2-9b, Qwen-7B
+- **🔒 Cybersecurity-Focused:** Foundationsec-8B, DeepSeek Coder, WizardCoder, StarCoder2, CodeLlama, CodeGen25
+- **Reasoning & Analysis:** DeepSeek LLM, Yi, SOLAR, Hermes-3
+- **Diverse & Multilingual:** Aya-23, Falcon, OpenChat, OpenHermes
+### ⚡ Quick Selection Presets
+- Select all small models (7) for fast testing
+- Select all security models (6) for cybersecurity focus
+- Select all models (26) for comprehensive evaluation
+- Clear selection with one click
+### 🎯 Customisable Evaluation
+- **Sample size:** 10-500 tasks (default: 200)
+- **4-bit quantisation:** Reduce memory usage for larger models
+- **Temperature:** Control response randomness (0.1-1.0)
+- **Max tokens:** Limit response length (32-256)
+### 📊 Real-Time Results
+- Live leaderboard with rankings (🥇🥈🥉)
+- Model comparison visualisation in Australian colours
+- Per-category performance breakdown
+- Downloadable results (JSON format)
+## Usage
+1. **Select Models:** Use checkboxes or quick selection buttons
+2. **Configure Settings:** Adjust sample size, quantisation, temperature
+3. **Run Evaluation:** Click "🚀 Run Evaluation"
+4. **Monitor Progress:** Watch real-time progress and intermediate results
+5. **Analyse Results:** Review leaderboard, charts, and category breakdowns
+6. **Download:** Export results for further analysis
+## Dataset
+The benchmark is available on HuggingFace:
+🔗 **[Zen0/AusCyberBench](https://huggingface.co/datasets/Zen0/AusCyberBench)**
+### Dataset Splits
+- **Full:** All 13,449 tasks across all categories
+- **Australian:** 4,899 Australia-specific tasks
+## Evaluation Methodology
+### Prompt Formatting
+Model-specific chat templates ensure optimal performance:
+- **Phi-3/Phi-3.5:** `<|user|>...<|end|>\n<|assistant|>`
+- **Gemma-2:** `<start_of_turn>user\n...<end_of_turn>\n<start_of_turn>model`
+- **Generic (Llama, Mistral, Qwen, etc.):** `[INST] ... [/INST]`
+### Answer Extraction
+Robust extraction for multiple-choice tasks:
+- Primary: Regex pattern `\b([A-D])\b` matching
+- Fallback: First character validation
+- Handles various response formats
+### Memory Management
+Automatic cleanup between models:
+- Model and tokeniser deletion
+- CUDA cache clearing
+- Garbage collection
+- Prevents OOM errors on GPU instances
+## Performance Expectations
+Based on initial benchmarking:
+- **Small Models (1-4B):** 10-25% accuracy
+- **Medium Models (7-12B):** 15-30% accuracy
+- **Cybersecurity Models:** 20-35% accuracy (domain-specific advantage)
+- **Reasoning Models:** 25-40% accuracy
+Performance varies significantly by category:
+- **Essential Eight:** Higher scores (20-40%)
+- **ISM Controls:** Lower scores (10-20%)
+- **Terminology:** Moderate scores (15-30%)
+## Technical Requirements
+This Space requires GPU hardware for model inference. Free-tier GPU instances may experience:
+- Longer evaluation times
+- Memory constraints with larger models
+- 4-bit quantisation recommended for 7B+ models
+## Citation
+If you use AusCyberBench in your research, please cite:
+```bibtex
+@dataset{auscyberbench2025,
+  title={AusCyberBench: Australia's First LLM Cybersecurity Benchmark},
+  author={Zen0},
+  year={2025},
+  publisher={HuggingFace},
+  url={https://huggingface.co/datasets/Zen0/AusCyberBench}
+}
+```
+## License
+MIT License - See LICENSE file for details
+## Acknowledgements
+- **Australian Cyber Security Centre (ACSC)** for Essential Eight, ISM, and threat intelligence
+- **Office of the Australian Information Commissioner (OAIC)** for Privacy Act guidance
+- **Department of Home Affairs** for SOCI Act resources
+- **HuggingFace** for infrastructure and model hosting
+---
+**Built with Australian orthography** 🇦🇺
+*Visualise • Analyse • Optimise • Quantisation*

app.py ADDED Viewed

	@@ -0,0 +1,503 @@

+#!/usr/bin/env python3
+"""
+AusCyberBench Evaluation Dashboard
+Interactive Gradio Space for benchmarking LLMs on Australian cybersecurity knowledge
+"""
+import gradio as gr
+import torch
+import gc
+import json
+import re
+import time
+import pandas as pd
+import matplotlib.pyplot as plt
+import seaborn as sns
+from pathlib import Path
+from collections import defaultdict
+from datasets import load_dataset
+from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+import numpy as np
+# Australian color scheme
+AUSSIE_GREEN = '#008751'
+AUSSIE_GOLD = '#FFB81C'
+# Model categories with all 26 models
+MODELS_BY_CATEGORY = {
+    "Small Models (1-4B)": [
+        "microsoft/Phi-3-mini-4k-instruct",
+        "microsoft/Phi-3.5-mini-instruct",
+        "google/gemma-2-2b-it",
+        "Qwen/Qwen2.5-3B-Instruct",
+        "meta-llama/Llama-3.2-3B-Instruct",
+        "stabilityai/stablelm-2-1_6b-chat",
+        "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+    ],
+    "Medium Models (7-12B)": [
+        "mistralai/Mistral-7B-Instruct-v0.3",
+        "Qwen/Qwen2.5-7B-Instruct",
+        "meta-llama/Llama-3.1-8B-Instruct",
+        "google/gemma-2-9b-it",
+        "mistralai/Mistral-Nemo-Instruct-2407",
+    ],
+    "🔒 Cybersecurity-Focused": [
+        "Eldorado-AI/Foundationsec-8B",
+        "deepseek-ai/deepseek-coder-6.7b-instruct",
+        "WizardLM/WizardCoder-Python-7B-V1.0",
+        "bigcode/starcoder2-7b",
+        "meta-llama/CodeLlama-7b-Instruct-hf",
+        "Salesforce/codegen25-7b-instruct",
+    ],
+    "Reasoning & Analysis": [
+        "deepseek-ai/deepseek-llm-7b-chat",
+        "01-ai/Yi-1.5-9B-Chat",
+        "upstage/SOLAR-10.7B-Instruct-v1.0",
+        "NousResearch/Hermes-3-Llama-3.1-8B",
+    ],
+    "Diverse & Multilingual": [
+        "CohereForAI/aya-23-8B",
+        "tiiuae/falcon-7b-instruct",
+        "openchat/openchat-3.5-0106",
+        "teknium/OpenHermes-2.5-Mistral-7B",
+    ],
+}
+# Flatten all models
+ALL_MODELS = [model for category in MODELS_BY_CATEGORY.values() for model in category]
+# Global state
+current_results = []
+dataset_cache = None
+def load_benchmark_dataset(subset="australian", num_samples=200):
+    """Load and sample AusCyberBench dataset"""
+    global dataset_cache
+    if dataset_cache is None:
+        dataset_cache = load_dataset("Zen0/AusCyberBench", split=subset)
+    # Proportional sampling
+    import random
+    random.seed(42)
+    by_category = defaultdict(list)
+    for item in dataset_cache:
+        by_category[item['category']].append(item)
+    total = len(dataset_cache)
+    samples = []
+    for cat, items in by_category.items():
+        n_cat = max(1, int(len(items) / total * num_samples))
+        if len(items) <= n_cat:
+            samples.extend(items)
+        else:
+            samples.extend(random.sample(items, n_cat))
+    random.shuffle(samples)
+    return samples[:num_samples]
+def format_prompt(task, model_name):
+    """Format task as prompt with proper chat template"""
+    question = task['description']
+    if task.get('task_type') == 'multiple_choice' and 'options' in task:
+        options_text = "\n".join([f"{opt['id']}. {opt['text']}" for opt in task['options']])
+        if 'phi' in model_name.lower():
+            return f"""<|user|>
+{question}
+{options_text}
+Respond with ONLY the letter of the correct answer (A, B, C, or D).<|end|>
+<|assistant|>"""
+        elif 'gemma' in model_name.lower():
+            return f"""<start_of_turn>user
+{question}
+{options_text}
+Respond with ONLY the letter of the correct answer (A, B, C, or D).<end_of_turn>
+<start_of_turn>model
+"""
+        else:
+            return f"""[INST] {question}
+{options_text}
+Respond with ONLY the letter of the correct answer (A, B, C, or D). [/INST]"""
+    else:
+        return f"""[INST] {question} [/INST]"""
+def extract_answer(response, task):
+    """Extract answer letter from model response"""
+    response = response.strip()
+    if task.get('task_type') == 'multiple_choice':
+        match = re.search(r'\b([A-D])\b', response, re.IGNORECASE)
+        if match:
+            return match.group(1).upper()
+        if response and response[0].upper() in ['A', 'B', 'C', 'D']:
+            return response[0].upper()
+        return ""
+    else:
+        return response[:100]
+def cleanup_model(model, tokenizer):
+    """Thoroughly clean up model to free memory"""
+    if model is not None:
+        del model
+    if tokenizer is not None:
+        del tokenizer
+    if torch.cuda.is_available():
+        torch.cuda.empty_cache()
+        torch.cuda.ipc_collect()
+    gc.collect()
+def evaluate_single_model(model_name, tasks, use_4bit=True, temperature=0.7, max_tokens=128, progress=gr.Progress()):
+    """Evaluate a single model on the benchmark"""
+    progress(0, desc=f"Loading {model_name.split('/')[-1]}...")
+    try:
+        # Load model
+        if use_4bit:
+            quant_config = BitsAndBytesConfig(
+                load_in_4bit=True,
+                bnb_4bit_compute_dtype=torch.float16,
+                bnb_4bit_use_double_quant=True,
+                bnb_4bit_quant_type="nf4"
+            )
+        else:
+            quant_config = None
+        tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+        model = AutoModelForCausalLM.from_pretrained(
+            model_name,
+            quantization_config=quant_config,
+            device_map="auto",
+            trust_remote_code=True,
+            torch_dtype=torch.float16 if not use_4bit else None
+        )
+        if tokenizer.pad_token is None:
+            tokenizer.pad_token = tokenizer.eos_token
+        progress(0.1, desc=f"Evaluating {model_name.split('/')[-1]}...")
+        # Evaluate tasks
+        results = []
+        for i, task in enumerate(tasks):
+            progress((0.1 + 0.8 * i / len(tasks)), desc=f"Task {i+1}/{len(tasks)}")
+            try:
+                prompt = format_prompt(task, model_name)
+                inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+                if 'token_type_ids' in inputs:
+                    inputs.pop('token_type_ids')
+                with torch.no_grad():
+                    outputs = model.generate(
+                        **inputs,
+                        max_new_tokens=max_tokens,
+                        temperature=temperature,
+                        do_sample=True,
+                        top_p=0.9,
+                        pad_token_id=tokenizer.eos_token_id
+                    )
+                response = tokenizer.decode(
+                    outputs[0][inputs['input_ids'].shape[1]:],
+                    skip_special_tokens=True
+                )
+                predicted = extract_answer(response, task)
+                correct = task.get('answer', '')
+                is_correct = predicted.upper() == correct.upper()
+                results.append({
+                    'task_id': task.get('task_id'),
+                    'category': task.get('category'),
+                    'predicted': predicted,
+                    'correct': correct,
+                    'is_correct': is_correct
+                })
+            except Exception as e:
+                results.append({
+                    'task_id': task.get('task_id'),
+                    'category': task.get('category'),
+                    'predicted': '',
+                    'correct': task.get('answer', ''),
+                    'is_correct': False
+                })
+        # Calculate metrics
+        total_correct = sum(1 for r in results if r['is_correct'])
+        overall_accuracy = (total_correct / len(results)) * 100
+        category_stats = defaultdict(lambda: {'correct': 0, 'total': 0})
+        for result in results:
+            cat = result['category']
+            category_stats[cat]['total'] += 1
+            if result['is_correct']:
+                category_stats[cat]['correct'] += 1
+        category_scores = {
+            cat: (stats['correct'] / stats['total']) * 100 if stats['total'] > 0 else 0
+            for cat, stats in category_stats.items()
+        }
+        progress(1.0, desc="Complete!")
+        return {
+            'model': model_name,
+            'overall_accuracy': overall_accuracy,
+            'total_correct': total_correct,
+            'total_tasks': len(results),
+            'category_scores': category_scores,
+            'detailed_results': results
+        }
+    except Exception as e:
+        return {
+            'model': model_name,
+            'error': str(e),
+            'overall_accuracy': 0,
+            'total_correct': 0,
+            'total_tasks': len(tasks)
+        }
+    finally:
+        cleanup_model(
+            model if 'model' in locals() else None,
+            tokenizer if 'tokenizer' in locals() else None
+        )
+def run_evaluation(selected_models, num_samples, use_4bit, temperature, max_tokens, progress=gr.Progress()):
+    """Run evaluation on selected models"""
+    global current_results
+    if not selected_models:
+        return "Please select at least one model to evaluate.", None, None
+    # Load dataset
+    progress(0, desc="Loading AusCyberBench dataset...")
+    tasks = load_benchmark_dataset(num_samples=num_samples)
+    # Evaluate each model
+    current_results = []
+    for i, model_name in enumerate(selected_models):
+        progress((i / len(selected_models)), desc=f"Model {i+1}/{len(selected_models)}")
+        result = evaluate_single_model(
+            model_name, tasks, use_4bit, temperature, max_tokens, progress
+        )
+        current_results.append(result)
+        # Yield intermediate results
+        yield format_results_table(current_results), create_comparison_chart(current_results), None
+    # Final results
+    final_table = format_results_table(current_results)
+    final_chart = create_comparison_chart(current_results)
+    download_data = create_download_data(current_results)
+    yield final_table, final_chart, download_data
+def format_results_table(results):
+    """Format results as DataFrame for display"""
+    if not results:
+        return pd.DataFrame()
+    rows = []
+    for result in results:
+        if 'error' in result:
+            rows.append({
+                'Rank': '❌',
+                'Model': result['model'].split('/')[-1],
+                'Accuracy': '0.0%',
+                'Correct/Total': f"0/{result['total_tasks']}",
+                'Status': f"Error: {result['error'][:50]}"
+            })
+        else:
+            rows.append({
+                'Rank': '',
+                'Model': result['model'].split('/')[-1],
+                'Accuracy': f"{result['overall_accuracy']:.1f}%",
+                'Correct/Total': f"{result['total_correct']}/{result['total_tasks']}",
+                'Status': '✓ Complete'
+            })
+    df = pd.DataFrame(rows)
+    # Sort by accuracy and assign ranks
+    df['_sort'] = df['Accuracy'].str.replace('%', '').astype(float)
+    df = df.sort_values('_sort', ascending=False)
+    df['Rank'] = ['🥇', '🥈', '🥉'] + [''] * (len(df) - 3)
+    df = df.drop('_sort', axis=1)
+    return df
+def create_comparison_chart(results):
+    """Create bar chart comparing model accuracies"""
+    if not results or all('error' in r for r in results):
+        return None
+    valid_results = [r for r in results if 'error' not in r]
+    if not valid_results:
+        return None
+    models = [r['model'].split('/')[-1] for r in valid_results]
+    accuracies = [r['overall_accuracy'] for r in valid_results]
+    # Sort by accuracy
+    sorted_pairs = sorted(zip(models, accuracies), key=lambda x: x[1], reverse=True)
+    models, accuracies = zip(*sorted_pairs)
+    plt.figure(figsize=(12, max(6, len(models) * 0.4)))
+    bars = plt.barh(models, accuracies, color=AUSSIE_GREEN)
+    # Add accuracy labels
+    for i, (model, acc) in enumerate(zip(models, accuracies)):
+        plt.text(acc + 1, i, f'{acc:.1f}%', va='center', fontweight='bold')
+    plt.xlabel('Accuracy (%)', fontsize=12, fontweight='bold')
+    plt.title('AusCyberBench: Model Comparison', fontsize=14, fontweight='bold')
+    plt.xlim(0, 100)
+    plt.grid(axis='x', alpha=0.3)
+    plt.tight_layout()
+    return plt
+def create_download_data(results):
+    """Create downloadable results file"""
+    if not results:
+        return None
+    # Create comprehensive results JSON
+    output = {
+        'timestamp': time.strftime('%Y-%m-%d %H:%M:%S'),
+        'benchmark': 'AusCyberBench',
+        'results': results
+    }
+    # Save to file
+    output_path = 'auscyberbench_results.json'
+    with open(output_path, 'w') as f:
+        json.dump(output, f, indent=2)
+    return output_path
+# Build Gradio interface
+with gr.Blocks(title="AusCyberBench Evaluation Dashboard", theme=gr.themes.Soft()) as app:
+    gr.Markdown("""
+    # 🇦🇺 AusCyberBench Evaluation Dashboard
+    **Australia's First LLM Cybersecurity Benchmark**
+    Test multiple language models on Australian cybersecurity knowledge including Essential Eight,
+    ISM Controls, Privacy Act, SOCI Act, and ACSC Threat Intelligence.
+    """)
+    with gr.Row():
+        with gr.Column(scale=1):
+            gr.Markdown("### 📋 Model Selection")
+            # Quick selection buttons
+            with gr.Row():
+                btn_small = gr.Button("Select Small Models (7)", size="sm")
+                btn_security = gr.Button("Select Security Models (6)", size="sm")
+                btn_all = gr.Button("Select All (26)", size="sm")
+                btn_clear = gr.Button("Clear", size="sm")
+            # Model checkboxes by category
+            model_checkboxes = []
+            for category, models in MODELS_BY_CATEGORY.items():
+                gr.Markdown(f"**{category}**")
+                for model in models:
+                    short_name = model.split('/')[-1]
+                    cb = gr.Checkbox(label=f"{short_name}", value=False)
+                    model_checkboxes.append((cb, model))
+            gr.Markdown("### ⚙️ Settings")
+            num_samples = gr.Slider(10, 500, value=200, step=10, label="Number of Tasks")
+            use_4bit = gr.Checkbox(label="Use 4-bit Quantisation", value=True)
+            temperature = gr.Slider(0.1, 1.0, value=0.7, step=0.1, label="Temperature")
+            max_tokens = gr.Slider(32, 256, value=128, step=32, label="Max Tokens")
+            run_btn = gr.Button("🚀 Run Evaluation", variant="primary", size="lg")
+        with gr.Column(scale=2):
+            gr.Markdown("### 📊 Results")
+            results_table = gr.Dataframe(
+                label="Leaderboard",
+                headers=["Rank", "Model", "Accuracy", "Correct/Total", "Status"],
+                interactive=False
+            )
+            comparison_plot = gr.Plot(label="Model Comparison")
+            download_file = gr.File(label="Download Results (JSON)")
+    # Quick select button actions
+    def select_small():
+        return [gr.update(value=(model in MODELS_BY_CATEGORY["Small Models (1-4B)"]))
+                for cb, model in model_checkboxes]
+    def select_security():
+        return [gr.update(value=(model in MODELS_BY_CATEGORY["🔒 Cybersecurity-Focused"]))
+                for cb, model in model_checkboxes]
+    def select_all():
+        return [gr.update(value=True) for _ in model_checkboxes]
+    def clear_all():
+        return [gr.update(value=False) for _ in model_checkboxes]
+    btn_small.click(select_small, outputs=[cb for cb, _ in model_checkboxes])
+    btn_security.click(select_security, outputs=[cb for cb, _ in model_checkboxes])
+    btn_all.click(select_all, outputs=[cb for cb, _ in model_checkboxes])
+    btn_clear.click(clear_all, outputs=[cb for cb, _ in model_checkboxes])
+    # Run evaluation
+    def prepare_evaluation(*checkbox_values):
+        selected = [model for (cb, model), val in zip(model_checkboxes, checkbox_values) if val]
+        return selected
+    run_btn.click(
+        fn=lambda *args: run_evaluation(
+            prepare_evaluation(*args[:-4]),
+            int(args[-4]),
+            args[-3],
+            args[-2],
+            int(args[-1])
+        ),
+        inputs=[cb for cb, _ in model_checkboxes] + [num_samples, use_4bit, temperature, max_tokens],
+        outputs=[results_table, comparison_plot, download_file]
+    )
+    gr.Markdown("""
+    ---
+    **Dataset:** [Zen0/AusCyberBench](https://huggingface.co/datasets/Zen0/AusCyberBench) |
+    **License:** Apache 2.0 |
+    **Models:** 26 LLMs including security-focused variants
+    """)
+if __name__ == "__main__":
+    app.queue().launch()

requirements.txt ADDED Viewed

	@@ -0,0 +1,9 @@

+gradio>=4.0.0
+transformers>=4.40.0
+torch>=2.0.0
+accelerate>=0.27.0
+bitsandbytes>=0.43.0
+datasets>=2.18.0
+pandas>=2.0.0
+matplotlib>=3.7.0
+seaborn>=0.13.0