Cybersecurity NER Model v8
Named Entity Recognition (NER) model for cybersecurity-domain text, trained with spaCy v3.8 on custom training data.
Model Description
Fine-tuned NER model for extracting 13 cybersecurity entity types from technical documentation, CVs, job descriptions, threat reports, and compliance documents.
Performance
Test Results (v8):
- Pass Rate: 94% (62/66 tests)
- Dev F1 Score: 98.58%
- Precision: 98.71%
- Recall: 98.46%
- Training Steps: 11,500 (early stopping)
- Training Data: 2,223 examples
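The dev F1 score is the harmonic mean of the precision and recall listed above. A minimal check, using only those two figures (the function name is illustrative), reproduces the reported value to two decimals:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the v8 dev set.
dev_f1 = f1_score(98.71, 98.46)
print(f"{dev_f1:.2f}")  # 98.58
```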
Entity Type Performance:
| Entity Type | Test Pass Rate | Dev Set F1 |
|---|---|---|
| CVE | 100% (3/3) | 100.00% |
| AUDIT_TERM | 75% (3/4) | 100.00% |
| SECURITY_TOOL | 100% (4/4) | 100.00% |
| CERTIFICATION | 100% (4/4) | 98.73% |
| SECURITY_ROLE | 100% (4/4) | 98.11% |
| FRAMEWORK | 100% (4/4) | 93.88% |
| TECHNICAL_SKILL | 100% (4/4) | 100.00% |
| ACRONYM | 100% (4/4) | 100.00% |
| SECURITY_DOMAIN | 100% (4/4) | 100.00% |
| ATTACK_TECHNIQUE | 75% (3/4) | 98.70% |
| THREAT_TYPE | 75% (3/4) | 95.24% |
| REGULATION | 75% (3/4) | 96.55% |
| CONTROL_ID | 100% (4/4) | - |
Entity Types
- CVE - CVE identifiers (e.g., CVE-2024-1234)
- CERTIFICATION - Security certifications (CISSP, OSCP, CEH, CISM, Security+)
- FRAMEWORK - Security frameworks (NIST CSF, ISO 27001, MITRE ATT&CK, CIS Controls)
- ATTACK_TECHNIQUE - Attack methods (SQL injection, XSS, CSRF, buffer overflow)
- TECHNICAL_SKILL - Technical skills (Incident Response, Forensics, Penetration Testing)
- AUDIT_TERM - Audit/compliance terms (Risk assessment, Compliance audit, Security review)
- SECURITY_ROLE - Job roles (CISO, SOC Analyst, Security Engineer, Pentester)
- THREAT_TYPE - Threat types (APT, ransomware, phishing, DDoS, malware)
- ACRONYM - Security acronyms (SIEM, EDR, SOAR, IDS/IPS, WAF, DLP)
- SECURITY_DOMAIN - Security domains (Cloud Security, Network Security, Application Security)
- REGULATION - Regulations (GDPR, HIPAA, PCI-DSS, SOX, CCPA)
- SECURITY_TOOL - Security tools (Splunk, Metasploit, Burp Suite, Nmap, Wireshark)
- CONTROL_ID - Control identifiers (ISO 27001 A.5.1, NIST CSF PR.AC-1, CIS Control 1.1)
Usage
import spacy
# Load model
nlp = spacy.load("path/to/model")
# Extract entities
text = "CISSP certified professional with experience in Splunk and Metasploit"
doc = nlp(text)
for ent in doc.ents:
    print(f"{ent.text} -> {ent.label_}")
Output:
CISSP -> CERTIFICATION
Splunk -> SECURITY_TOOL
Metasploit -> SECURITY_TOOL
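For use cases such as CV skill extraction, it is often convenient to group the extracted entities by label. The following is a minimal sketch, not part of the model itself: `group_by_label` is a hypothetical helper, and the pairs are copied from the example output above so the snippet runs without the model installed.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group (text, label) pairs into {label: [texts]}, keeping order and dropping repeats."""
    grouped = defaultdict(list)
    for text, label in entities:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# With a loaded pipeline, the pairs would come from the Doc:
#   pairs = [(ent.text, ent.label_) for ent in nlp(text).ents]
# Here they are copied from the example output above.
pairs = [
    ("CISSP", "CERTIFICATION"),
    ("Splunk", "SECURITY_TOOL"),
    ("Metasploit", "SECURITY_TOOL"),
]

skills = group_by_label(pairs)
print(skills)
```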
Training Data
Sources:
- v7 merged data: 1,448 examples
- v8 generated: 1,347 examples with multi-entity patterns, case variants
- Manual curated: 100 examples
- Final dataset: 2,223 unique examples (after validation and deduplication)
v8 Improvements:
- Multi-entity "X and Y" patterns (50 examples per entity type)
- Case variants: upper, lower, and title case (CISSP, cissp, Cissp)
- Comma-separated list patterns
- AUDIT_TERM edge cases (Compliance audit)
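The generator used for the v8 data is not described here. As a rough illustration of what a multi-entity "X and Y" example might look like, the sketch below builds one annotated example with character offsets in the `(text, {"entities": [...]})` form that spaCy can convert into training examples. The template sentence and the function name are assumptions.

```python
def make_pair_example(first, second, label, template="Experience with {a} and {b}."):
    """Build one (text, annotations) tuple with character offsets for both entities."""
    text = template.format(a=first, b=second)
    # Locate the second entity only after the end of the first one.
    start_a = text.index(first)
    start_b = text.index(second, start_a + len(first))
    entities = [
        (start_a, start_a + len(first), label),
        (start_b, start_b + len(second), label),
    ]
    return text, {"entities": entities}


text, annotations = make_pair_example("HIPAA", "PCI-DSS", "REGULATION")
for start, end, label in annotations["entities"]:
    print(text[start:end], label)
```

Computing offsets from the generated string, rather than hard-coding them, keeps the spans aligned when the template changes.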
Entity Distribution:
- AUDIT_TERM: 326 (12.4%)
- CERTIFICATION: 295 (11.2%)
- SECURITY_TOOL: 293 (11.1%)
- ATTACK_TECHNIQUE: 282 (10.7%)
- THREAT_TYPE: 263 (10.0%)
- TECHNICAL_SKILL: 228 (8.6%)
- REGULATION: 222 (8.4%)
- CVE: 182 (6.9%)
- FRAMEWORK: 165 (6.3%)
- SECURITY_ROLE: 153 (5.8%)
- ACRONYM: 142 (5.4%)
- SECURITY_DOMAIN: 85 (3.2%)
Training Configuration
- Framework: spaCy 3.8
- Architecture: tok2vec + TransitionBasedParser
- GPU: NVIDIA RTX 4090
- Training steps: 11,500 (early stopping)
- Patience: 5,000 steps
- Learning rate: 3e-05
- Dropout: 0.25
- Batch size: 1,000
- Train/dev split: 85/15
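An 85/15 split can be reproduced with a seeded shuffle. This is a sketch only: the seed, the rounding rule, and the function name are assumptions, and the exact train and dev counts used for v8 are not stated above.

```python
import random


def split_train_dev(examples, dev_fraction=0.15, seed=0):
    """Shuffle deterministically, then hold out `dev_fraction` of the examples as the dev set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_dev = round(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]


# Integers stand in for the 2,223 annotated examples.
train, dev = split_train_dev(range(2223))
print(len(train), len(dev))  # 1890 333
```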
Version History
v8 (Current):
- 94% pass rate (62/66)
- Multi-entity extraction improved
- Title case support added
- AUDIT_TERM edge cases fixed
v7:
- 86% pass rate (57/66)
- CVE detection restored
- SECURITY_ROLE improved to 100%
- IDS/IPS and DDoS fixed
v6:
- 74% pass rate (49/66)
- CVE regression (CVE entities no longer detected)
- AUDIT_TERM and SECURITY_ROLE issues
Known Limitations
v8 has 4 remaining test failures:
- Multi-entity extraction in specific contexts ("APT group using ransomware")
- Span boundary issues with conjunctions ("XSS and CSRF mitigated")
- Specific "X and Y" patterns ("HIPAA and PCI-DSS standards")
- "Gap analysis" edge case
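Cases like these can be kept as regression checks when evaluating a new version. The test harness behind the reported pass rates is not described here, so the following is a hypothetical sketch: a case passes when every expected (text, label) pair appears among the predictions. The expected labels for the second case are inferred from the entity type list, and `stub_predict` returns made-up output in place of the real pipeline.

```python
def pass_rate(cases, predict):
    """Return (passed, total); a case passes if all expected (text, label) pairs are predicted."""
    passed = 0
    for text, expected in cases:
        if set(expected) <= set(predict(text)):
            passed += 1
    return passed, len(cases)


cases = [
    (
        "CISSP certified professional with experience in Splunk and Metasploit",
        [("CISSP", "CERTIFICATION"), ("Splunk", "SECURITY_TOOL"), ("Metasploit", "SECURITY_TOOL")],
    ),
    (
        "XSS and CSRF mitigated",
        [("XSS", "ATTACK_TECHNIQUE"), ("CSRF", "ATTACK_TECHNIQUE")],
    ),
]


def stub_predict(text):
    """Stand-in for [(e.text, e.label_) for e in nlp(text).ents]; the outputs are illustrative."""
    canned = {
        cases[0][0]: list(cases[0][1]),
        # Illustrates a span boundary error: the conjunction is swallowed into one span.
        "XSS and CSRF mitigated": [("XSS and CSRF", "ATTACK_TECHNIQUE")],
    }
    return canned.get(text, [])


passed, total = pass_rate(cases, stub_predict)
print(f"{passed}/{total}")
```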
Use Cases
- CV/resume skill extraction
- Job description analysis
- Threat intelligence reports
- Compliance documentation
- Security audit reports
- Technical documentation
- Security training materials
License
MIT
Citation
@misc{cybersecurity-ner,
title={Cybersecurity NER Model},
author={PKI},
year={2026},
url={https://huggingface.co/pki/cybersecurity-ner}
}
Contact
For issues or questions, please open an issue on GitHub.