Cybersecurity NER Model v8

Named Entity Recognition (NER) model for cybersecurity-domain text, trained with spaCy v3.8 on custom training data.

Model Description

Fine-tuned NER model for extracting 13 cybersecurity entity types from technical documentation, CVs, job descriptions, threat reports, and compliance documents.

Performance

Test Results (v8):

  • Pass Rate: 94% (62/66 tests)
  • Dev F1 Score: 98.58%
  • Precision: 98.71%
  • Recall: 98.46%
  • Training Steps: 11,500 (early stopping)
  • Training Data: 2,223 examples
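
As a quick consistency check, the reported F1 can be recomputed from the precision and recall above as their harmonic mean, and the pass rate from the raw test counts. The sketch below uses only the figures in this list; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)

# Figures reported in this model card (percentages and test counts).
precision, recall = 98.71, 98.46
passed, total = 62, 66

f1 = f1_score(precision, recall)
pass_rate = 100 * passed / total

print(f"F1: {f1:.2f}%")                # F1: 98.58%
print(f"Pass rate: {pass_rate:.1f}%")  # Pass rate: 93.9%
```

The pass rate of 93.9% is what the card rounds to 94%.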

Entity Type Performance:

Entity Type        Test Pass Rate   Dev Set F1
CVE                100% (3/3)       100.00%
AUDIT_TERM         75% (3/4)        100.00%
SECURITY_TOOL      100% (4/4)       100.00%
CERTIFICATION      100% (4/4)       98.73%
SECURITY_ROLE      100% (4/4)       98.11%
FRAMEWORK          100% (4/4)       93.88%
TECHNICAL_SKILL    100% (4/4)       100.00%
ACRONYM            100% (4/4)       100.00%
SECURITY_DOMAIN    100% (4/4)       100.00%
ATTACK_TECHNIQUE   75% (3/4)        98.70%
THREAT_TYPE        75% (3/4)        95.24%
REGULATION         75% (3/4)        96.55%
CONTROL_ID         100% (4/4)       -

Entity Types

  1. CVE - CVE identifiers (e.g., CVE-2024-1234)
  2. CERTIFICATION - Security certifications (CISSP, OSCP, CEH, CISM, Security+)
  3. FRAMEWORK - Security frameworks (NIST CSF, ISO 27001, MITRE ATT&CK, CIS Controls)
  4. ATTACK_TECHNIQUE - Attack methods (SQL injection, XSS, CSRF, buffer overflow)
  5. TECHNICAL_SKILL - Technical skills (Incident Response, Forensics, Penetration Testing)
  6. AUDIT_TERM - Audit/compliance terms (Risk assessment, Compliance audit, Security review)
  7. SECURITY_ROLE - Job roles (CISO, SOC Analyst, Security Engineer, Pentester)
  8. THREAT_TYPE - Threat types (APT, ransomware, phishing, DDoS, malware)
  9. ACRONYM - Security acronyms (SIEM, EDR, SOAR, IDS/IPS, WAF, DLP)
  10. SECURITY_DOMAIN - Security domains (Cloud Security, Network Security, Application Security)
  11. REGULATION - Regulations (GDPR, HIPAA, PCI-DSS, SOX, CCPA)
  12. SECURITY_TOOL - Security tools (Splunk, Metasploit, Burp Suite, Nmap, Wireshark)
  13. CONTROL_ID - Control identifiers (ISO 27001 A.5.1, NIST CSF PR.AC-1, CIS Control 1.1)
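
CVE identifiers follow a fixed pattern (the prefix CVE, a four-digit year, then a sequence number of at least four digits), so a regular expression can serve as an independent cross-check on the model's CVE output. The following sketch is not part of the model; the pattern and function name are assumptions for illustration.

```python
import re

# CVE-YYYY-NNNN, where the sequence number has four or more digits.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)

def find_cve_ids(text):
    """Return all CVE-style identifiers found in text, upper-cased."""
    return [match.group(0).upper() for match in CVE_PATTERN.finditer(text)]

print(find_cve_ids("Patch CVE-2024-1234 before cve-2021-44228 is exploited."))
# ['CVE-2024-1234', 'CVE-2021-44228']
```

Comparing this list against the spans the model labels as CVE is one way to spot a regression in CVE detection between versions.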

Usage

import spacy

# Load model
nlp = spacy.load("path/to/model")

# Extract entities
text = "CISSP certified professional with experience in Splunk and Metasploit"
doc = nlp(text)

for ent in doc.ents:
    print(f"{ent.text} -> {ent.label_}")

Output:

CISSP -> CERTIFICATION
Splunk -> SECURITY_TOOL
Metasploit -> SECURITY_TOOL
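
For downstream use such as CV skill extraction, it is often handier to group the extracted entities by label. A minimal sketch, assuming only that each entity exposes `.text` and `.label_` as spaCy spans do; `group_entities` is a hypothetical helper, not something shipped with the model.

```python
from collections import defaultdict

def group_entities(ents):
    """Map each label to the list of entity texts, in document order.

    `ents` is any iterable of objects with `.text` and `.label_`
    attributes, such as `doc.ents` from the usage example.
    """
    grouped = defaultdict(list)
    for ent in ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)
```

Calling `group_entities(doc.ents)` on the example above would give one CERTIFICATION entry and two SECURITY_TOOL entries, provided the model returns the output shown.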

Training Data

Sources:

  • v7 merged data: 1,448 examples
  • v8 generated: 1,347 examples with multi-entity patterns and case variants
  • Manually curated: 100 examples
  • Final dataset: 2,223 unique examples (after validation and deduplication)
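
The card does not describe how validation and deduplication were carried out. The sketch below shows one plausible approach, assuming examples are stored as spaCy-style `(text, {"entities": [(start, end, label)]})` tuples; the validity rules (spans in bounds, non-empty, non-overlapping) and the choice to deduplicate on exact text are assumptions.

```python
def is_valid(example):
    """Check that entity spans are in bounds, non-empty, and non-overlapping."""
    text, annotations = example
    previous_end = 0
    for start, end, _label in sorted(annotations["entities"]):
        if not (previous_end <= start < end <= len(text)):
            return False
        previous_end = end
    return True

def validate_and_deduplicate(examples):
    """Keep the first valid example for each distinct text."""
    seen = set()
    unique = []
    for example in examples:
        text = example[0]
        if text in seen or not is_valid(example):
            continue
        seen.add(text)
        unique.append(example)
    return unique
```

Under this scheme the three sources (1,448 + 1,347 + 100 = 2,895 examples) would shrink to the unique, valid subset, which the card reports as 2,223.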

v8 Improvements:

  • Multi-entity "X and Y" patterns (50 examples per entity type)
  • Case variants: upper, lower, and title case (CISSP, cissp, Cissp)
  • Comma-separated list patterns
  • AUDIT_TERM edge cases (Compliance audit)
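
To illustrate the kind of augmentation listed above, here is a sketch of how "X and Y" examples and case variants could be generated with correct character offsets. The helper names, the default template, and the spaCy-style tuple format are assumptions; the actual generation script is not part of this card.

```python
def case_variants(term):
    """Return upper-, lower-, and title-case forms of term, without duplicates."""
    return list(dict.fromkeys([term.upper(), term.lower(), term.title()]))

def make_pair_example(first, second, label, template="{a} and {b}"):
    """Build one "X and Y" example with character offsets for both entities."""
    text = template.format(a=first, b=second)
    start_a = text.index(first)
    # Search for the second term only after the first one ends.
    start_b = text.index(second, start_a + len(first))
    entities = [
        (start_a, start_a + len(first), label),
        (start_b, start_b + len(second), label),
    ]
    return text, {"entities": entities}

print(case_variants("CISSP"))
# ['CISSP', 'cissp', 'Cissp']
print(make_pair_example("HIPAA", "PCI-DSS", "REGULATION"))
# ('HIPAA and PCI-DSS', {'entities': [(0, 5, 'REGULATION'), (10, 17, 'REGULATION')]})
```

Computing offsets from the generated string, rather than writing them by hand, keeps the annotations aligned when templates change.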

Entity Distribution (share of the 2,636 annotated entities):

  • AUDIT_TERM: 326 (12.4%)
  • CERTIFICATION: 295 (11.2%)
  • SECURITY_TOOL: 293 (11.1%)
  • ATTACK_TECHNIQUE: 282 (10.7%)
  • THREAT_TYPE: 263 (10.0%)
  • TECHNICAL_SKILL: 228 (8.6%)
  • REGULATION: 222 (8.4%)
  • CVE: 182 (6.9%)
  • FRAMEWORK: 165 (6.3%)
  • SECURITY_ROLE: 153 (5.8%)
  • ACRONYM: 142 (5.4%)
  • SECURITY_DOMAIN: 85 (3.2%)
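
The percentages above are shares of the total number of annotated entities (the counts sum to 2,636), not of the 2,223 examples. A short sketch that recomputes them from the listed counts:

```python
# Entity counts as listed in this model card.
counts = {
    "AUDIT_TERM": 326, "CERTIFICATION": 295, "SECURITY_TOOL": 293,
    "ATTACK_TECHNIQUE": 282, "THREAT_TYPE": 263, "TECHNICAL_SKILL": 228,
    "REGULATION": 222, "CVE": 182, "FRAMEWORK": 165,
    "SECURITY_ROLE": 153, "ACRONYM": 142, "SECURITY_DOMAIN": 85,
}

total = sum(counts.values())
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]} ({share}%)")
```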

Training Configuration

  • Framework: spaCy 3.8
  • Architecture: tok2vec + TransitionBasedParser
  • GPU: NVIDIA RTX 4090
  • Training steps: 11,500 (early stopping)
  • Patience: 5,000 steps
  • Learning rate: 3e-05
  • Dropout: 0.25
  • Batch size: 1,000
  • Train/dev split: 85/15
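
The relationship between the patience setting and the final step count can be sketched as follows. This is a simplified illustration of patience-based early stopping, not spaCy's training loop; the evaluation interval and scores in the example are hypothetical.

```python
def early_stop_step(history, patience):
    """Return (stop_step, best_step) for a sequence of (step, score) pairs.

    Training stops at the first evaluation where `patience` or more steps
    have passed since the best score; otherwise it runs to the last step.
    """
    best_score = float("-inf")
    best_step = 0
    last_step = 0
    for step, score in history:
        last_step = step
        if score > best_score:
            best_score, best_step = score, step
        elif step - best_step >= patience:
            return step, best_step
    return last_step, best_step

# Hypothetical run: evaluated every 500 steps, score plateaus after step 6,500.
history = [(step, min(step, 6500) / 6500) for step in range(500, 20001, 500)]
print(early_stop_step(history, patience=5000))
# (11500, 6500)
```

Under these assumptions, stopping at step 11,500 with a patience of 5,000 would imply that the best dev score was reached at about step 6,500.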

Version History

v8 (Current):

  • 94% pass rate (62/66)
  • Multi-entity extraction improved
  • Title case support added
  • AUDIT_TERM edge cases fixed ("Compliance audit")

v7:

  • 86% pass rate (57/66)
  • CVE detection restored
  • SECURITY_ROLE improved to 100%
  • IDS/IPS and DDoS fixed

v6:

  • 74% pass rate (49/66)
  • CVE regression (CVE entities not detected)
  • AUDIT_TERM and SECURITY_ROLE issues

Known Limitations

v8 has 4 remaining test failures:

  1. Multi-entity extraction in specific contexts ("APT group using ransomware")
  2. Span boundary issues with conjunctions ("XSS and CSRF mitigated")
  3. Specific "X and Y" patterns ("HIPAA and PCI-DSS standards")
  4. "Gap analysis" edge case
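
These cases can be tracked with a small regression harness so that future versions are checked against them. The sketch below is an assumption about how such a harness might look: the expected annotations are inferred from the Entity Types list, the AUDIT_TERM label for "Gap analysis" is a guess, and the quoted phrases may be fragments of the real test inputs.

```python
def run_cases(predict, cases):
    """Return (passed, total, failures) for a list of (text, expected) cases.

    `predict` is any callable mapping text to a set of (entity_text, label)
    pairs; a case passes only if the prediction equals the expected set.
    """
    failures = []
    for text, expected in cases:
        predicted = predict(text)
        if predicted != expected:
            failures.append((text, expected, predicted))
    return len(cases) - len(failures), len(cases), failures

# Assumed expectations for the four known failures (not the official tests).
KNOWN_FAILURES = [
    ("APT group using ransomware",
     {("APT", "THREAT_TYPE"), ("ransomware", "THREAT_TYPE")}),
    ("XSS and CSRF mitigated",
     {("XSS", "ATTACK_TECHNIQUE"), ("CSRF", "ATTACK_TECHNIQUE")}),
    ("HIPAA and PCI-DSS standards",
     {("HIPAA", "REGULATION"), ("PCI-DSS", "REGULATION")}),
    ("Gap analysis", {("Gap analysis", "AUDIT_TERM")}),
]
```

With a loaded model, `predict` could be `lambda text: {(ent.text, ent.label_) for ent in nlp(text).ents}`, and `run_cases(predict, KNOWN_FAILURES)` would report which of the four cases still fail.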

Use Cases

  • CV/resume skill extraction
  • Job description analysis
  • Threat intelligence reports
  • Compliance documentation
  • Security audit reports
  • Technical documentation
  • Security training materials

License

MIT

Citation

@misc{cybersecurity-ner,
  title={Cybersecurity NER Model},
  author={PKI},
  year={2026},
  url={https://huggingface.co/pki/cybersecurity-ner}
}

Contact

For issues or questions, please open an issue on GitHub.
