Korean PII Masking
Author: 임성준 (Sungjun Im) [LinkedIn]
Role: Project Lead & Primary Researcher
Overview
BERT-based token classification model specialized for Korean Personally Identifiable Information (PII) masking. It detects and masks 14 common types of Korean PII entities.
For best accuracy in production, a hybrid pipeline is strongly recommended over the model alone: pre-processing → model inference → post-processing (regex and rule-based).
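A minimal sketch of such a hybrid pipeline, using only the Python standard library. The regex patterns, the function names, and the `model_spans` stub are illustrative assumptions; the card does not publish the author's actual rules, and a real pipeline would call the token classifier where the stub is.

```python
import re
import unicodedata

# Hypothetical rules for two of the supported types (Email, Mobile Phone Number).
RULES = {
    "전자메일": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "휴대전화번호": re.compile(r"01[016789]-?\d{3,4}-?\d{4}"),
}

def preprocess(text):
    """Fold full-width characters to ASCII and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()

def model_spans(text):
    """Stand-in for model inference; returns (start, end, label) tuples."""
    return []

def regex_spans(text):
    """Rule-based pass that catches well-formatted entities."""
    return [(m.start(), m.end(), label)
            for label, pattern in RULES.items()
            for m in pattern.finditer(text)]

def merge(primary, secondary):
    """Keep every primary span; add secondary spans that overlap nothing kept."""
    kept = list(primary)
    for span in secondary:
        if all(span[1] <= k[0] or span[0] >= k[1] for k in kept):
            kept.append(span)
    return sorted(kept)

def mask(text, spans):
    """Replace spans with bracketed labels, right to left so offsets stay valid."""
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text

def mask_pii(text):
    text = preprocess(text)
    return mask(text, merge(model_spans(text), regex_spans(text)))
```

With the stub returning no spans, `mask_pii("문의: test@example.com, 010-1234-5678")` gives `문의: [전자메일], [휴대전화번호]`. Treating model spans as primary is one possible policy; for strictly formatted fields a pipeline might prefer the regex match instead.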
Base Model & Architecture
- Base pretrained model: KcBERT-Large
- Model type: BertForTokenClassification
- Architecture highlights:
  - Hidden size: 1024
  - Layers: 24
  - Attention heads: 16
  - Intermediate size: 4096
  - Max position embeddings: 300
  - Vocab size: 30,000
  - Activation: GELU
  - Dropout: 0.1 (hidden & attention)
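One practical consequence of the 300-position limit is that longer inputs have to be truncated or split before inference. The sketch below shows one way to split a token sequence into overlapping windows. It assumes each window is wrapped in one [CLS] and one [SEP] token, and the `overlap` default is an arbitrary illustrative value, not a setting from this model.

```python
MAX_POSITIONS = 300   # "Max position embeddings" from the list above
SPECIAL_TOKENS = 2    # assumption: one [CLS] and one [SEP] per window

def split_into_windows(tokens, overlap=32):
    """Return (start_index, window) pairs that each fit the position limit.

    Neighbouring windows share `overlap` tokens, so an entity shorter than
    the overlap that is cut at one boundary appears whole in the next window.
    """
    window = MAX_POSITIONS - SPECIAL_TOKENS
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    step = window - overlap
    windows, start = [], 0
    while True:
        windows.append((start, tokens[start:start + window]))
        if start + window >= len(tokens):
            break
        start += step
    return windows
```

For a 700-token input, `split_into_windows(list(range(700)))` yields three windows starting at offsets 0, 266, and 532. Predictions in the overlapping regions then need a tie-breaking rule, which is left out here.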
Supported PII Types (BIO tagging)
- ๊ฐ๋งน์ ๋ช (Business Name)
- ๊ฒฐ์ ๊ธ์ก (Payment Amount)
- ๊ณ์ข๋ฒํธ (Account Number)
- ๋ก๊ทธ์ธID (Login ID)
- ์์ธ์ฃผ์ (Detailed Address)
- ์ ์ฉ์ ์ (Credit Score)
- ์ฌ๊ถ๋ฒํธ (Passport Number)
- ์ฐํธ๋ฒํธ (Postal Code)
- ์ด์ ๋ฉดํ๋ฒํธ (Driver's License Number)
- ์ด๋ฆ (Name)
- ์ ์๋ฉ์ผ (Email)
- ์ ํ๋ฒํธ (Phone Number)
- ์ฃผ๋ฏผ๋ฑ๋ก๋ฒํธ (Resident Registration Number)
- ์นด๋๋ฒํธ (Card Number)
- ํด๋์ ํ๋ฒํธ (Mobile Phone Number)
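Under BIO tagging, each entity type contributes a B- (begin) and an I- (inside) label, plus one shared O (outside) label, so N types give 2N + 1 classes. The sketch below builds such a label set for three of the types: 이름 (Name), 가맹점명 (Business Name), and 결제금액 (Payment Amount). The exact label strings and their ordering are assumptions; the authoritative mapping would be the `id2label` entry in the model's own configuration.

```python
# Three of the supported types, used only to keep the example short.
PII_TYPES = ["이름", "가맹점명", "결제금액"]

def build_label_maps(types):
    """Build label<->id mappings for BIO tagging over the given entity types."""
    labels = ["O"]                            # tokens outside any entity
    for t in types:
        labels.extend([f"B-{t}", f"I-{t}"])   # first token / continuation tokens
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label

label2id, id2label = build_label_maps(PII_TYPES)
print(len(label2id))   # 2 * 3 + 1 labels
print(id2label[1])
```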
Example
Input: "?철용 고객님, 8월 10일 14:32에 백다방 코엑스점에서 9,910원 결제내역 확인됩니다."
(Roughly: "Dear customer [name], we have confirmed your payment of 9,910 won at Baekdabang COEX branch at 14:32 on August 10.")
Output:
- Detected PII:
  - ?철용 -> [이름]
  - 백다방 코엑스점 -> [가맹점명]
  - 9,910원 -> [결제금액]
The supported types listed earlier focus on the most frequent and most sensitive categories of personal data in Korean text and documents.
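The "detected PII" listing in the example above can be derived from per-token BIO tags. The sketch below groups tagged tokens into entities. It is simplified to word-level tokens joined by spaces, whereas the real model predicts on WordPiece tokens and would need offset mapping. The name 홍길동 is a placeholder, and the tags are written by hand rather than produced by the model.

```python
def decode_bio(tokens, tags):
    """Group tokens into (text, label) entities according to their BIO tags."""
    entities, current = [], None              # current = [label, [tokens]]
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        # A stray I- tag with no matching open entity is treated as a new start.
        starts_new = prefix == "B" or (
            prefix == "I" and (current is None or current[0] != label))
        if starts_new:
            if current:
                entities.append(current)
            current = [label, [token]]
        elif prefix == "I":
            current[1].append(token)
        else:                                 # "O" closes any open entity
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return [(" ".join(toks), label) for label, toks in entities]

tokens = ["홍길동", "고객님", "백다방", "코엑스점", "9,910원"]
tags = ["B-이름", "O", "B-가맹점명", "I-가맹점명", "B-결제금액"]
for text, label in decode_bio(tokens, tags):
    print(f"- {text} -> [{label}]")
```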