Korean PII Masking

Author: 임성준 (Sungjun Im) [LinkedIn]
Role: Project Lead & Primary Researcher

Overview

A BERT-based token classification model specialized for masking Korean personally identifiable information (PII). It detects and masks the 15 common types of Korean PII entities listed below.

For best accuracy in production, a hybrid pipeline is strongly recommended
(pre-processing → model inference → regex- and rule-based post-processing) rather than relying on the model alone.
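The recommended hybrid pipeline can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: `model_fn` is a hypothetical callable that wraps the token classifier and returns character-offset spans, and the regex patterns (resident registration number, mobile phone number, email) are simplified placeholders rather than production-grade rules.

```python
import re
import unicodedata
from typing import Callable, List, Tuple

# (start, end, label): character offsets into the pre-processed text
Span = Tuple[int, int, str]

# Deliberately simplified patterns for illustration only (assumptions, not the
# author's rules). Lookarounds replace \b, because Korean particles often attach
# directly to digits and Hangul counts as a word character.
REGEX_RULES = [
    ("주민등록번호", re.compile(r"(?<!\d)\d{6}-[1-8]\d{6}(?!\d)")),
    ("휴대전화번호", re.compile(r"(?<!\d)01[016789]-?\d{3,4}-?\d{4}(?!\d)")),
    ("전자메일", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
]


def preprocess(text: str) -> str:
    """Pre-processing: NFC-normalize (composes decomposed Hangul) and collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def regex_spans(text: str) -> List[Span]:
    """Post-processing rules: find pattern-shaped PII the model may have missed."""
    return [(m.start(), m.end(), label)
            for label, pattern in REGEX_RULES
            for m in pattern.finditer(text)]


def merge_spans(spans: List[Span]) -> List[Span]:
    """Scan by start offset (longest first on ties); drop spans overlapping a kept one."""
    kept: List[Span] = []
    for span in sorted(spans, key=lambda s: (s[0], -(s[1] - s[0]))):
        if kept and span[0] < kept[-1][1]:
            continue
        kept.append(span)
    return kept


def hybrid_detect(text: str, model_fn: Callable[[str], List[Span]]) -> Tuple[str, List[Span]]:
    """Pre-process, run the model, add regex hits, and merge into non-overlapping spans."""
    clean = preprocess(text)
    return clean, merge_spans(model_fn(clean) + regex_spans(clean))


def fake_model(text: str) -> List[Span]:
    # Hypothetical stub standing in for the token classifier: tags only the name.
    return [(0, 3, "이름")]


if __name__ == "__main__":
    clean, spans = hybrid_detect("양철용 고객님,  연락처는 010-1234-5678 입니다.", fake_model)
    for start, end, label in spans:
        print(clean[start:end], "->", f"[{label}]")
```

Here the stub model finds only the name, and the regex rule adds the mobile number. Overlaps are resolved greedily in this sketch; a real deployment might instead give priority to high-precision regex hits over model predictions.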

Base Model & Architecture

  • Base pretrained model: KcBERT-Large
  • Model type: BertForTokenClassification
  • Architecture highlights:
    • Hidden size : 1024
    • Layers : 24
    • Attention heads : 16
    • Intermediate size : 4096
    • Max position embeddings : 300
    • Vocab size : 30,000
    • Activation : GELU
    • Dropout : 0.1 (hidden & attention)
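As a rough sanity check on model size, the hyperparameters above determine the parameter count. The sketch below assumes the standard BERT layer layout used by Hugging Face Transformers (where `BertForTokenClassification` has no pooler), a token-type vocabulary of 2, and 31 output labels (B- and I- for each of the 15 entity types, plus O). The last two values are assumptions not stated in this card, and `bert_token_classifier_params` is a hypothetical helper.

```python
def bert_token_classifier_params(
    vocab_size: int,
    max_positions: int,
    hidden: int,
    layers: int,
    intermediate: int,
    num_labels: int,
    type_vocab_size: int = 2,  # assumption: standard BERT segment embeddings
) -> int:
    """Count trainable parameters of a BERT encoder with a linear token-classification head."""
    # Embeddings: word, position, and token-type tables plus one LayerNorm (gamma + beta)
    embeddings = (vocab_size + max_positions + type_vocab_size) * hidden + 2 * hidden
    # Self-attention: Q, K, V, and output projections (weight + bias) plus LayerNorm.
    # The number of heads (16) only splits these matrices, so it does not change the count.
    attention = 4 * (hidden * hidden + hidden) + 2 * hidden
    # Feed-forward: hidden -> intermediate -> hidden (weights + biases) plus LayerNorm
    ffn = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden) + 2 * hidden
    # Token-classification head: one linear layer applied to every token's final hidden state
    classifier = hidden * num_labels + num_labels
    return embeddings + layers * (attention + ffn) + classifier


total = bert_token_classifier_params(
    vocab_size=30_000, max_positions=300, hidden=1024,
    layers=24, intermediate=4096, num_labels=31,
)
print(f"{total:,}")  # 333,372,447 (about 333M parameters)
```

Dropout and the GELU activation add no parameters, so almost all of the size comes from the 24 encoder layers (about 12.6M each) and the 30,000-token embedding table.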

Supported PII Types (BIO tagging)

  1. ๊ฐ€๋งน์ ๋ช… (Business Name)
  2. ๊ฒฐ์ œ๊ธˆ์•ก (Payment Amount)
  3. ๊ณ„์ขŒ๋ฒˆํ˜ธ (Account Number)
  4. ๋กœ๊ทธ์ธID (Login ID)
  5. ์ƒ์„ธ์ฃผ์†Œ (Detailed Address)
  6. ์‹ ์šฉ์ ์ˆ˜ (Credit Score)
  7. ์—ฌ๊ถŒ๋ฒˆํ˜ธ (Passport Number)
  8. ์šฐํŽธ๋ฒˆํ˜ธ (Postal Code)
  9. ์šด์ „๋ฉดํ—ˆ๋ฒˆํ˜ธ (Driver's License Number)
  10. ์ด๋ฆ„ (Name)
  11. ์ „์ž๋ฉ”์ผ (Email)
  12. ์ „ํ™”๋ฒˆํ˜ธ (Phone Number)
  13. ์ฃผ๋ฏผ๋“ฑ๋ก๋ฒˆํ˜ธ (Resident Registration Number)
  14. ์นด๋“œ๋ฒˆํ˜ธ (Card Number)
  15. ํœด๋Œ€์ „ํ™”๋ฒˆํ˜ธ (Mobile Phone Number)

Example

์ž…๋ ฅ: "์–‘์ฒ ์šฉ ๊ณ ๊ฐ๋‹˜, 8์›” 10์ผ 14:32์— ๋ฐฑ๋‹ค๋ฐฉ ์ฝ”์—‘์Šค์ ์—์„œ 9,910์› ๊ฒฐ์ œ ๋‚ด์—ญ ํ™•์ธ๋ฉ๋‹ˆ๋‹ค."

์ถœ๋ ฅ:
- ๋ฐœ๊ฒฌ๋œ PII:
  - ์–‘์ฒ ์šฉ -> [์ด๋ฆ„]
  - ๋ฐฑ๋‹ค๋ฐฉ ์ฝ”์—‘์Šค์  -> [๊ฐ€๋งน์ ๋ช…]
  - 9,910์› -> [๊ฒฐ์ œ๊ธˆ์•ก]
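The masking step in this example can be sketched as replacing each detected span with a `[label]` placeholder. This assumes the detected entities are available as character offsets; the placeholder format simply mirrors the labels shown in the example and may differ from the author's actual output, and `mask_spans` is a hypothetical helper.

```python
from typing import List, Tuple


def mask_spans(text: str, spans: List[Tuple[int, int, str]]) -> str:
    """Replace each (start, end, label) character span with a "[label]" placeholder."""
    # Work right-to-left so replacing one span never shifts the offsets of earlier spans
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


text = "양철용 고객님, 8월 10일 14:32에 백다방 코엑스점에서 9,910원 결제 내역 확인됩니다."
detected = [("양철용", "이름"), ("백다방 코엑스점", "가맹점명"), ("9,910원", "결제금액")]
# Locate each entity's first occurrence to get character offsets (illustration only)
spans = [(text.find(s), text.find(s) + len(s), label) for s, label in detected]
print(mask_spans(text, spans))
# [이름] 고객님, 8월 10일 14:32에 [가맹점명]에서 [결제금액] 결제 내역 확인됩니다.
```

Masking by offsets rather than by string replacement avoids masking unrelated repeats of the same text that the model did not flag.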

The supported PII types listed above focus on the most frequently occurring and most sensitive personal data found in Korean text and documents.
