---
license: gpl-3.0
language:
- en
base_model:
- google-bert/bert-base-uncased
---
# PatentBERT - PyTorch
A BERT model fine-tuned for patent classification with the **CPC (Cooperative Patent Classification) system**. This is a PyTorch conversion of the original [PatentBERT](https://github.com/jiehsheng/PatentBERT/) model.
## Specifications
- **Output classes**: 656 (CPC subclass labels)
- **Classification system**: CPC (Cooperative Patent Classification)
- **Architecture**: BERT-base (768 hidden, 12 layers, 12 attention heads)
- **Vocabulary**: 30,522 tokens
- **Format**: SafeTensors
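These details can be checked without downloading the weights by loading only the configuration; a minimal sketch (the field names are the standard `transformers` `BertConfig` attributes):

```python
from transformers import AutoConfig

# Load only the configuration (no weights are downloaded)
config = AutoConfig.from_pretrained('ZoeYou/patentbert-pytorch')

print(config.num_labels)           # 656
print(config.hidden_size)          # 768
print(config.num_hidden_layers)    # 12
print(config.num_attention_heads)  # 12
print(config.vocab_size)           # 30522
```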
## CPC Classes (Real Distribution)
The model predicts the same CPC subclass labels that were used to train the original PatentBERT:
### Main Sections (Subclass Counts)
- **A (84 classes)**: Human Necessities - Agriculture, Food, Health, Sports
- **B (171 classes)**: Performing Operations; Transporting - Manufacturing, Transport
- **C (88 classes)**: Chemistry; Metallurgy - Chemical processes, Materials
- **D (40 classes)**: Textiles; Paper - Fibers, Fabrics, Paper-making
- **E (31 classes)**: Fixed Constructions - Building, Mining, Roads
- **F (101 classes)**: Mechanical Engineering; Lighting; Heating; Weapons; Blasting
- **G (81 classes)**: Physics - Optics, Acoustics, Computing, Measuring
- **H (51 classes)**: Electricity - Electronics, Power generation, Communication
- **Y (9 classes)**: General Tagging of New Technological Developments
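The per-section counts above can be reproduced from the label mapping shipped in the config, since the first character of every CPC subclass code is its section letter; a minimal sketch:

```python
from collections import Counter
from transformers import AutoConfig

config = AutoConfig.from_pretrained('ZoeYou/patentbert-pytorch')

# The first character of a CPC subclass code (e.g. 'G06F') is its section letter
sections = Counter(label[0] for label in config.id2label.values())
for section, count in sorted(sections.items()):
    print(f"{section}: {count} subclasses")
```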
### Example CPC Subclasses
- `A01B`: SOIL WORKING IN AGRICULTURE OR FORESTRY
- `B25J`: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- `C07D`: HETEROCYCLIC COMPOUNDS
- `G06F`: ELECTRIC DIGITAL DATA PROCESSING
- `H04L`: TRANSMISSION OF DIGITAL INFORMATION
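Whether a given subclass code is covered by the model can be checked via the reverse mapping; a minimal sketch:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained('ZoeYou/patentbert-pytorch')

# label2id is the reverse of id2label; returns the class id, or None if not covered
print(config.label2id.get('G06F'))
```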
## Usage
```python
from transformers import BertForSequenceClassification, BertTokenizer
import torch

# Load model and tokenizer
model = BertForSequenceClassification.from_pretrained('ZoeYou/patentbert-pytorch')
tokenizer = BertTokenizer.from_pretrained('ZoeYou/patentbert-pytorch')
model.eval()

# Inference example
text = "A method for producing synthetic materials with enhanced thermal properties..."
inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits.softmax(dim=-1)

# Get the top prediction and its probability
predicted_class_id = predictions.argmax().item()
confidence = predictions.max().item()

# Map the class id to its CPC code; `id2label` keys are ints after loading
predicted_label = model.config.id2label[predicted_class_id]

print(f"Predicted CPC class: {predicted_label} (ID: {predicted_class_id})")
print(f"Confidence: {confidence:.2%}")
```
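Patents often span several technology areas, so a top-k view of the distribution can be more informative than a single argmax; a minimal sketch that reuses `model` and `predictions` from the example above:

```python
# Top-5 CPC subclasses with probabilities (reuses `predictions` and `model`)
top = torch.topk(predictions, k=5, dim=-1)
for prob, idx in zip(top.values[0], top.indices[0]):
    print(f"{model.config.id2label[idx.item()]}: {prob.item():.2%}")
```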
## Included Files
- `model.safetensors`: Model weights (420 MB)
- `config.json`: Configuration with integrated CPC labels
- `vocab.txt`: Tokenizer vocabulary
- `tokenizer_config.json`: Tokenizer configuration
- `labels.json`: Complete CPC label mapping (all 656 labels)
- `README.md`: This documentation
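For use outside `transformers`, the label mapping can also be fetched directly; a minimal sketch, assuming `labels.json` holds a plain id-to-code mapping (check the file for its exact schema):

```python
import json
from huggingface_hub import hf_hub_download

# Download labels.json from the model repo
path = hf_hub_download('ZoeYou/patentbert-pytorch', 'labels.json')
with open(path) as f:
    labels = json.load(f)

print(len(labels))  # 656 CPC subclass codes expected
```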
## Performance
The model was trained on a large patent corpus to classify documents into the CPC system, using the same 656 CPC subclass codes as the original PatentBERT training data.
## References
- [Cooperative Patent Classification (CPC)](https://www.cooperativepatentclassification.org/)
- [Original PatentBERT Paper](https://arxiv.org/abs/2103.02557)
## Citation
If you use this model, please cite the original PatentBERT work and mention this PyTorch conversion.
```bibtex
@article{patent_bert,
  author  = "Jieh-Sheng Lee and Jieh Hsiang",
  title   = "{PatentBERT: Patent classification with fine-tuning a pre-trained BERT model}",
  journal = "World Patent Information",
  volume  = "61",
  number  = "101965",
  year    = "2020",
}
```