Upload AI text detector model
- README.md +162 -0
- benchmark_results.json +7 -0
- config.json +30 -0
- demo.html +676 -0
- deployment_config.json +39 -0
- fixed_optimized_detector.onnx +3 -0
- merges.txt +0 -0
- pytorch_optimized_detector.pt +3 -0
- special_tokens_map.json +43 -0
- tokenizer.json +0 -0
- tokenizer_config.json +167 -0
- vocab.json +0 -0
README.md
ADDED
@@ -0,0 +1,162 @@
---
license: apache-2.0
base_model: HuggingFaceTB/SmolLM-135M
tags:
- text-classification
- ai-detection
- pytorch
- onnx
- transformers
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-classification
---

# Joshfcooper/ai-text-detector-optimized

## Model Description

This is a compact AI text detector built on SmolLM-135M. It classifies text as human-written or AI-generated, with a focus on high accuracy and fast inference.

## Key Features

- **High Accuracy**: 96.7% accuracy on test data
- **Fast Inference**: 103.1 ms average inference time
- **Optimized Architecture**: Uses only 12 of the base model's 30 transformer layers (60% fewer layers)
- **Multiple Formats**: Available in both PyTorch (.pt) and ONNX (.onnx) formats
- **Production Ready**: Optimized for real-world deployment

## Model Architecture

- **Base Model**: HuggingFaceTB/SmolLM-135M
- **Compression**: 30 layers → 12 layers (selected layers: 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22)
- **Feature Extraction**: 24 layer outputs → 13,824 features
- **Classifier**: Linear probe with sigmoid activation
- **Parameters**: ~60% reduction from base model

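As a quick sanity check on the figures in the architecture list: SmolLM-135M has a hidden size of 576, so 24 pooled outputs of that width give 24 × 576 = 13,824 features, and keeping 12 of 30 layers drops 60% of them. The sketch below only verifies this arithmetic. How the 24 outputs relate to the 12 selected layers is not stated here, and the constant names are illustrative.

```python
HIDDEN_SIZE = 576                        # hidden dimension of SmolLM-135M
NUM_LAYER_OUTPUTS = 24                   # "24 layer outputs" from the list above
TOTAL_LAYERS = 30                        # transformer layers in the base model
SELECTED_LAYERS = list(range(0, 24, 2))  # 0, 2, 4, ..., 22

# One pooled vector of width HIDDEN_SIZE per layer output, concatenated.
feature_size = NUM_LAYER_OUTPUTS * HIDDEN_SIZE

# Fraction of transformer layers that are kept / dropped.
layers_kept_fraction = len(SELECTED_LAYERS) / TOTAL_LAYERS
layers_dropped_fraction = 1 - layers_kept_fraction

print(feature_size)          # 13824
print(len(SELECTED_LAYERS))  # 12
```
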
## Usage

### ONNX Model (Recommended for Web/Production)

```python
import onnxruntime as ort
from transformers import AutoTokenizer
import numpy as np

# Load tokenizer and ONNX model
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")
# Padding needs a pad token; fall back to EOS if the tokenizer defines none
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
session = ort.InferenceSession("fixed_optimized_detector.onnx")

def predict(text):
    # Tokenize to a fixed length of 256 tokens
    tokens = tokenizer(text, truncation=True, padding='max_length',
                       max_length=256, return_tensors="np")

    # Convert to int64 for ONNX
    feeds = {
        'input_ids': tokens['input_ids'].astype(np.int64),
        'attention_mask': tokens['attention_mask'].astype(np.int64)
    }

    # Run inference; flatten so a (1,) or (1, 1) output both yield a scalar
    result = session.run(None, feeds)
    probability = float(np.ravel(result[0])[0])

    # Interpret (model outputs inverted probabilities)
    human_prob = 1 - probability
    is_human = human_prob > 0.5

    return {
        'prediction': 'human' if is_human else 'ai',
        'human_probability': human_prob,
        'confidence': abs(human_prob - 0.5) * 2
    }

# Example usage
result = predict("Your text here...")
print(result)
```

### PyTorch Model

```python
import torch
from transformers import AutoTokenizer

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")
# Padding needs a pad token; fall back to EOS if the tokenizer defines none
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
# Assumes the .pt file holds a full pickled module rather than a state_dict.
# PyTorch >= 2.6 needs weights_only=False for that; only load files you trust.
model = torch.load("pytorch_optimized_detector.pt", map_location='cpu',
                   weights_only=False)
model.eval()

def predict_pytorch(text):
    tokens = tokenizer(text, truncation=True, padding='max_length',
                       max_length=256, return_tensors="pt")

    with torch.no_grad():
        probability = model(tokens['input_ids'], tokens['attention_mask']).item()

    human_prob = 1 - probability  # Invert output
    is_human = human_prob > 0.5

    return {
        'prediction': 'human' if is_human else 'ai',
        'human_probability': human_prob,
        'confidence': abs(human_prob - 0.5) * 2
    }
```

## Performance Metrics

- **Accuracy**: 96.7%
- **Inference Time**: 103.1 ms (average)
- **Model Size**: ~60% smaller than base model
- **Throughput**: ~10 predictions/second

## Training Details

The model was trained using a feature extraction approach:
1. Extract hidden states from 12 selected layers of SmolLM-135M
2. Mean pooling across sequence length with attention masking
3. Concatenate features from all layers (13,824 total features)
4. Train linear classifier with standardization
5. Export to ONNX for optimized inference

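The steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the training code: the hidden states and classifier parameters are random stand-ins, the function names (`masked_mean_pool`, `extract_features`, `predict_proba`) are hypothetical, and the example assumes 24 pooled outputs of width 576 so that the feature count matches 13,824.

```python
import numpy as np

def masked_mean_pool(hidden, mask):
    """Average token vectors over the sequence, ignoring padded positions.

    hidden: (batch, seq_len, dim) hidden states from one layer output
    mask:   (batch, seq_len) attention mask, 1 for real tokens, 0 for padding
    """
    mask = mask[:, :, None].astype(hidden.dtype)
    summed = (hidden * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # avoid division by zero
    return summed / counts

def extract_features(layer_outputs, mask):
    """Pool each layer output and concatenate along the feature axis."""
    pooled = [masked_mean_pool(h, mask) for h in layer_outputs]
    return np.concatenate(pooled, axis=1)

def predict_proba(features, mean, scale, weights, bias):
    """Standardize, apply the linear probe, and squash with a sigmoid."""
    z = ((features - mean) / scale) @ weights + bias
    return 1.0 / (1.0 + np.exp(-z))

# Toy run: random stand-ins for real hidden states and trained parameters.
rng = np.random.default_rng(0)
batch, seq_len, dim, n_outputs = 2, 8, 576, 24
mask = np.ones((batch, seq_len), dtype=np.int64)
mask[1, 5:] = 0  # the second sequence has three padded positions
layer_outputs = [rng.normal(size=(batch, seq_len, dim)) for _ in range(n_outputs)]

features = extract_features(layer_outputs, mask)
probs = predict_proba(
    features,
    mean=np.zeros(features.shape[1]),
    scale=np.ones(features.shape[1]),
    weights=rng.normal(scale=0.01, size=features.shape[1]),
    bias=0.0,
)
print(features.shape)  # (2, 13824)
print(probs.shape)     # (2,)
```

Clipping the token count to at least 1 keeps an all-padding row from dividing by zero; the standardization parameters would come from the training set in a real pipeline.
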
## Important Notes

⚠️ **Output Inversion**: This model outputs inverted probabilities. Use `1 - model_output` for human probability.

## Files Included

- `fixed_optimized_detector.onnx`: ONNX model for web/production deployment
- `pytorch_optimized_detector.pt`: PyTorch model for development
- `config.json`: Model configuration
- `deployment_config.json`: Deployment configuration with layer selection
- `scaler_params.json`: Feature standardization parameters

## License

Apache 2.0

## Citation

```bibtex
@misc{ai-text-detector-optimized,
  title={Ultra-Optimized AI Text Detector},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/Joshfcooper/ai-text-detector-optimized}
}
```

## Ethical Considerations

This model is designed to detect AI-generated text. Please use responsibly and be aware that:
- No detector is 100% accurate
- Results should be used as guidance, not definitive proof
- Consider privacy and consent when analyzing text
- Be aware of potential biases in training data
benchmark_results.json
ADDED
@@ -0,0 +1,7 @@
{
  "total_time_seconds": 3.6133105754852295,
  "avg_time_per_prediction_ms": 72.26621150970459,
  "predictions_per_second": 13.837725530495128,
  "model_type": "ONNX",
  "num_tests": 50
}
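The three timing fields in this file are related to each other. The sketch below recomputes the average latency and the throughput from `total_time_seconds` and `num_tests`, assuming both averages were taken over all 50 runs; the variable names are illustrative.

```python
import json

benchmark = json.loads("""
{
  "total_time_seconds": 3.6133105754852295,
  "avg_time_per_prediction_ms": 72.26621150970459,
  "predictions_per_second": 13.837725530495128,
  "model_type": "ONNX",
  "num_tests": 50
}
""")

# Average latency: total wall time divided by the number of predictions.
avg_ms = benchmark["total_time_seconds"] / benchmark["num_tests"] * 1000

# Throughput: predictions completed per second of wall time.
per_second = benchmark["num_tests"] / benchmark["total_time_seconds"]

print(round(avg_ms, 2))      # 72.27
print(round(per_second, 2))  # 13.84
```
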
config.json
ADDED
@@ -0,0 +1,30 @@
{
  "model_type": "ai_text_detector",
  "base_model": "HuggingFaceTB/SmolLM-135M",
  "architecture": "feature_extraction_classifier",
  "num_layers_used": 12,
  "total_layers": 30,
  "selected_layers": [
    0,
    2,
    4,
    6,
    8,
    10,
    12,
    14,
    16,
    18,
    20,
    22
  ],
  "feature_size": 13824,
  "sequence_length": 256,
  "compression_ratio": 0.6,
  "output_inverted": true,
  "task": "binary_classification",
  "labels": [
    "ai",
    "human"
  ]
}
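One way a consumer might honor the `output_inverted` flag is sketched below. It assumes, as the README's `1 - model_output` rule implies, that the raw score is high for AI-generated text when the flag is true. The `interpret` helper and its `threshold` parameter are hypothetical, not part of the uploaded files.

```python
import json

# Only the two fields this sketch needs, copied from config.json.
config = json.loads("""
{
  "output_inverted": true,
  "labels": ["ai", "human"]
}
""")

def interpret(raw_output, config, threshold=0.5):
    """Map a raw model score in [0, 1] to a label, honoring output_inverted."""
    human_prob = 1.0 - raw_output if config["output_inverted"] else raw_output
    label = "human" if human_prob > threshold else "ai"
    if label not in config["labels"]:
        raise ValueError(f"label {label!r} is not declared in the config")
    return {
        "prediction": label,
        "human_probability": human_prob,
        # 0.0 at the decision boundary, 1.0 at either extreme
        "confidence": abs(human_prob - 0.5) * 2,
    }

print(interpret(0.2, config)["prediction"])  # human
print(interpret(0.9, config)["prediction"])  # ai
```
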
demo.html
ADDED
@@ -0,0 +1,676 @@
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>AI Text Detector</title>
    <script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.min.js"></script>
    <script type="module" src="https://cdn.jsdelivr.net/npm/@xenova/transformers@2.17.2/dist/transformers.min.js"></script>
    <style>
        * {
            margin: 0;
            padding: 0;
            box-sizing: border-box;
        }

        body {
            font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            min-height: 100vh;
            display: flex;
            align-items: center;
            justify-content: center;
            padding: 20px;
        }

        .container {
            background: white;
            border-radius: 20px;
            box-shadow: 0 20px 40px rgba(0, 0, 0, 0.1);
            padding: 40px;
            max-width: 800px;
            width: 100%;
            position: relative;
            overflow: hidden;
        }

        .container::before {
            content: '';
            position: absolute;
            top: 0;
            left: 0;
            right: 0;
            height: 5px;
            background: linear-gradient(90deg, #667eea, #764ba2, #f093fb, #f5576c);
        }

        h1 {
            text-align: center;
            color: #333;
            margin-bottom: 10px;
            font-size: 2.5em;
            font-weight: 700;
        }

        .subtitle {
            text-align: center;
            color: #666;
            margin-bottom: 30px;
            font-size: 1.1em;
        }

        .input-section {
            margin-bottom: 30px;
        }

        label {
            display: block;
            margin-bottom: 10px;
            color: #333;
            font-weight: 600;
            font-size: 1.1em;
        }

        textarea {
            width: 100%;
            height: 200px;
            padding: 20px;
            border: 2px solid #e1e5e9;
            border-radius: 15px;
            font-size: 16px;
            line-height: 1.6;
            resize: vertical;
            transition: all 0.3s ease;
            font-family: inherit;
        }

        textarea:focus {
            outline: none;
            border-color: #667eea;
            box-shadow: 0 0 0 3px rgba(102, 126, 234, 0.1);
        }

        .button-container {
            text-align: center;
            margin: 30px 0;
        }

        button {
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            color: white;
            border: none;
            padding: 15px 40px;
            border-radius: 50px;
            font-size: 18px;
            font-weight: 600;
            cursor: pointer;
            transition: all 0.3s ease;
            box-shadow: 0 5px 15px rgba(102, 126, 234, 0.3);
        }

        button:hover {
            transform: translateY(-2px);
            box-shadow: 0 8px 25px rgba(102, 126, 234, 0.4);
        }

        button:active {
            transform: translateY(0);
        }

        button:disabled {
            background: #ccc;
            cursor: not-allowed;
            transform: none;
            box-shadow: none;
        }

        .result {
            margin-top: 30px;
            padding: 25px;
            border-radius: 15px;
            text-align: center;
            transition: all 0.3s ease;
        }

        .result.human {
            background: linear-gradient(135deg, #4facfe 0%, #00f2fe 100%);
            color: white;
        }

        .result.ai {
            background: linear-gradient(135deg, #fa709a 0%, #fee140 100%);
            color: white;
        }

        .result.loading {
            background: linear-gradient(135deg, #ffecd2 0%, #fcb69f 100%);
            color: #333;
        }

        .result.error {
            background: linear-gradient(135deg, #ff9a9e 0%, #fecfef 100%);
            color: #333;
        }

        .prediction {
            font-size: 2em;
            font-weight: 700;
            margin-bottom: 10px;
            text-transform: uppercase;
        }

        .confidence {
            font-size: 1.2em;
            margin-bottom: 10px;
        }

        .probability {
            font-size: 1em;
            opacity: 0.9;
        }

        .stats {
            display: grid;
            grid-template-columns: repeat(auto-fit, minmax(150px, 1fr));
            gap: 15px;
            margin-top: 20px;
        }

        .stat {
            text-align: center;
            padding: 15px;
            background: rgba(255, 255, 255, 0.1);
            border-radius: 10px;
            backdrop-filter: blur(10px);
        }

        .stat-value {
            font-size: 1.5em;
            font-weight: 700;
            display: block;
        }

        .stat-label {
            font-size: 0.9em;
            opacity: 0.8;
        }

        .loading-spinner {
            display: inline-block;
            width: 20px;
            height: 20px;
            border: 2px solid #f3f3f3;
            border-top: 2px solid #333;
            border-radius: 50%;
            animation: spin 1s linear infinite;
        }

        @keyframes spin {
            0% { transform: rotate(0deg); }
            100% { transform: rotate(360deg); }
        }

        .model-info {
            background: #f8f9fa;
            padding: 20px;
            border-radius: 15px;
            margin-bottom: 30px;
            border-left: 5px solid #667eea;
        }

        .model-info h3 {
            color: #333;
            margin-bottom: 10px;
        }

        .model-info p {
            color: #666;
            line-height: 1.6;
        }

        .examples {
            margin-top: 30px;
            display: grid;
            grid-template-columns: 1fr 1fr;
            gap: 20px;
        }

        .example {
            background: #f8f9fa;
            padding: 15px;
            border-radius: 10px;
            cursor: pointer;
            transition: all 0.3s ease;
            border: 2px solid transparent;
        }

        .example:hover {
            background: #e9ecef;
            border-color: #667eea;
        }

        .example h4 {
            color: #333;
            margin-bottom: 10px;
            font-size: 1em;
        }

        .example p {
            color: #666;
            font-size: 0.9em;
            line-height: 1.4;
        }

        .status {
            margin-top: 10px;
            padding: 8px 12px;
            border-radius: 8px;
            font-size: 0.9em;
            font-weight: 500;
        }

        .status.loading {
            background: #fff3cd;
            color: #856404;
            border: 1px solid #ffeaa7;
        }

        .status.ready {
            background: #d4edda;
            color: #155724;
            border: 1px solid #c3e6cb;
        }

        .status.processing {
            background: #cce8ff;
            color: #004085;
            border: 1px solid #b3d9ff;
        }

        .status.error {
            background: #f8d7da;
            color: #721c24;
            border: 1px solid #f5c6cb;
        }

        .status.complete {
            background: #d1ecf1;
            color: #0c5460;
            border: 1px solid #bee5eb;
        }

        code {
            background: rgba(0,0,0,0.1);
            padding: 2px 4px;
            border-radius: 3px;
            font-family: monospace;
            font-size: 0.85em;
        }
        .container {
            padding: 20px;
            margin: 10px;
        }

        h1 {
            font-size: 2em;
        }

        .stats {
            grid-template-columns: repeat(2, 1fr);
        }
    </style>
</head>
<body>
    <div class="container">
        <h1>🤖 AI Text Detector</h1>
        <p class="subtitle">Powered by Ultra-Optimized Neural Networks</p>

        <div class="model-info">
            <h3>📊 Model Status</h3>
            <div id="status" class="status loading">🔄 Loading model and tokenizer...</div>
        </div>

        <div class="input-section">
            <label for="textInput">📝 Enter text to analyze:</label>
            <textarea
                id="textInput"
                placeholder="Paste your text here... (minimum 100 characters required for accurate analysis)"
                spellcheck="false"
            ></textarea>
        </div>

        <div class="button-container">
            <button id="analyzeBtn" onclick="analyzeText()">
                <span id="btnText">🚀 Analyze Text</span>
                <span id="btnSpinner" class="loading-spinner" style="display: none;"></span>
            </button>
        </div>

        <div id="result" class="result" style="display: none;"></div>
    </div>

    <script type="module">
        import { AutoTokenizer } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.17.2/dist/transformers.min.js';

        let session = null;
        let tokenizer = null;

        // Initialize ONNX Runtime and load model + tokenizer
        async function initializeModel() {
            try {
                console.log('Loading tokenizer and ONNX model...');

                // Load the actual tokenizer from HuggingFace Hub
                tokenizer = await AutoTokenizer.from_pretrained('HuggingFaceTB/SmolLM-135M', {
                    progress_callback: (progress) => {
                        if (progress.status === 'downloading') {
                            updateStatus('loading', `📥 Downloading tokenizer: ${progress.name}`);
                        }
                    }
                });
                console.log('Tokenizer loaded successfully!');
                updateStatus('loading', '🤖 Loading ONNX model...');

                // Load ONNX model - try multiple possible filenames
                const possibleModelNames = [
                    './fixed_optimized_detector.onnx',
                    './ultra_optimized_detector.onnx',
                    './optimized_detector.onnx',
                    './model.onnx'
                ];

                let modelLoaded = false;
                for (const modelPath of possibleModelNames) {
                    try {
                        session = await ort.InferenceSession.create(modelPath);
                        console.log(`ONNX model loaded successfully from: ${modelPath}`);
                        modelLoaded = true;
                        break;
                    } catch (error) {
                        // Only log if it's not a 404 error to reduce console spam
                        if (!error.message.includes('failed to load external data file')) {
                            console.log(`Failed to load from ${modelPath}:`, error.message);
                        }
                    }
                }

                if (!modelLoaded) {
                    throw new Error('ONNX model file not found. Please ensure your .onnx file is in the same directory as this HTML file.');
                }

                console.log('Model inputs:', session.inputNames);
                console.log('Model outputs:', session.outputNames);

                // Enable the analyze button
                document.getElementById('analyzeBtn').disabled = false;
                updateStatus('ready', '✅ Model loaded and ready!');

            } catch (error) {
                console.error('Failed to load model:', error);
                updateStatus('error', `❌ Failed to load: ${error.message}`);

                // Show helpful error message based on the type of error
                if (error.message.includes('tokenizer')) {
                    showResult('error', '❌ Failed to load tokenizer. Please check your internet connection.');
                } else if (error.message.includes('ONNX') || error.message.includes('external data')) {
                    showResult('error', `❌ ONNX model file not found. Please place your .onnx model file in the same directory as this HTML file. Expected names: ultra_optimized_detector.onnx, fixed_optimized_detector.onnx, optimized_detector.onnx, or model.onnx`);
                } else {
                    showResult('error', `❌ Failed to initialize: ${error.message}`);
                }
            }
        }

        // Tokenize text using the proper tokenizer
        async function tokenizeText(text, maxLength = 256) {
            try {
                // Use the actual tokenizer with proper settings
                const encoded = await tokenizer(text, {
                    truncation: true,
                    padding: 'max_length',
                    max_length: maxLength,
                    return_tensors: false // We'll handle tensor creation manually
                });

                console.log('Encoded result:', encoded);

                // Handle different possible return formats
                let inputIds, attentionMask;

                if (encoded.input_ids && Array.isArray(encoded.input_ids)) {
                    // Direct array format
                    inputIds = encoded.input_ids;
                    attentionMask = encoded.attention_mask;
                } else if (encoded.input_ids && encoded.input_ids.data) {
                    // Tensor-like format
                    inputIds = Array.from(encoded.input_ids.data);
                    attentionMask = Array.from(encoded.attention_mask.data);
                } else if (Array.isArray(encoded)) {
                    // Sometimes returns just the token IDs
                    inputIds = encoded;
                    attentionMask = encoded.map(token => token === tokenizer.pad_token_id ? 0 : 1);
                } else {
                    throw new Error('Unexpected tokenizer output format');
                }

                // Ensure we have the right length
                if (inputIds.length !== maxLength) {
                    console.warn(`Expected length ${maxLength}, got ${inputIds.length}`);
                    // Pad or truncate as needed
                    if (inputIds.length < maxLength) {
                        const padToken = tokenizer.pad_token_id || 0;
                        while (inputIds.length < maxLength) {
                            inputIds.push(padToken);
                            attentionMask.push(0);
                        }
                    } else {
                        inputIds = inputIds.slice(0, maxLength);
                        attentionMask = attentionMask.slice(0, maxLength);
                    }
                }

                return {
                    input_ids: inputIds,
                    attention_mask: attentionMask
                };
            } catch (error) {
                console.error('Tokenization error:', error);
                throw new Error(`Failed to tokenize text: ${error.message}`);
            }
        }

        async function analyzeText() {
            const text = document.getElementById('textInput').value.trim();

            if (!text) {
                showResult('error', 'Please enter some text to analyze.');
                return;
            }

            if (text.length < 100) {
                showResult('error', 'Please enter at least 100 characters for accurate analysis.');
                return;
            }

            if (!session || !tokenizer) {
                showResult('error', 'Model or tokenizer not loaded yet. Please wait...');
                return;
            }

            // Show loading state
            setLoading(true);
            showResult('loading', 'Tokenizing and analyzing text...');

            try {
                // Tokenize the text using the proper tokenizer
                console.log('Tokenizing text...');
                const tokenized = await tokenizeText(text, 256);

                console.log('Input IDs length:', tokenized.input_ids.length);
                console.log('Attention mask length:', tokenized.attention_mask.length);
                console.log('Sample tokens:', tokenized.input_ids.slice(0, 10));
                console.log('Sample attention:', tokenized.attention_mask.slice(0, 10));

                // Validate tokenization
                if (!tokenized.input_ids || !Array.isArray(tokenized.input_ids)) {
                    throw new Error('Invalid tokenization: input_ids is not an array');
                }

                if (!tokenized.attention_mask || !Array.isArray(tokenized.attention_mask)) {
                    throw new Error('Invalid tokenization: attention_mask is not an array');
                }

                if (tokenized.input_ids.length !== 256 || tokenized.attention_mask.length !== 256) {
                    throw new Error(`Invalid tokenization: expected length 256, got input_ids: ${tokenized.input_ids.length}, attention_mask: ${tokenized.attention_mask.length}`);
                }

                // Convert to the correct format for ONNX
                const inputIds = new BigInt64Array(tokenized.input_ids.map(id => BigInt(id)));
                const attentionMask = new BigInt64Array(tokenized.attention_mask.map(mask => BigInt(mask)));

                // Create ONNX tensors with correct shapes
                const feeds = {
                    'input_ids': new ort.Tensor('int64', inputIds, [1, 256]),
                    'attention_mask': new ort.Tensor('int64', attentionMask, [1, 256])
                };

                console.log('Running inference...');
                updateStatus('processing', '🧠 Running neural network inference...');

                // Run inference
                const startTime = performance.now();
                const results = await session.run(feeds);
                const inferenceTime = performance.now() - startTime;

                console.log('Inference completed in', inferenceTime.toFixed(2), 'ms');
                console.log('Raw output:', results.probability_human.data[0]);

                const probability = results.probability_human.data[0];

                // The model output is inverted (see output_inverted in config.json):
                // despite its name, probability_human is high for AI-generated text,
                // so the human score is 1 - probability.
                const humanProbability = 1 - probability;
                const isHuman = humanProbability > 0.5;
                const confidence = Math.abs(humanProbability - 0.5) * 2;

                updateStatus('complete', `✅ Analysis complete (${inferenceTime.toFixed(0)}ms)`);
                displayResults(humanProbability, isHuman, confidence, text.length, inferenceTime);

            } catch (error) {
                console.error('Analysis error:', error);
                updateStatus('error', `❌ Analysis failed: ${error.message}`);
                showResult('error', `Error analyzing text: ${error.message}`);
            } finally {
                setLoading(false);
            }
        }

        function displayResults(probability, isHuman, confidence, textLength, inferenceTime) {
            const resultDiv = document.getElementById('result');
            const className = isHuman ? 'human' : 'ai';
            const prediction = isHuman ? 'Human Written' : 'AI Generated';
            const icon = isHuman ? '👤' : '🤖';

            // Calculate token count (approximate)
            const estimatedTokens = Math.ceil(textLength / 4); // Rough estimate

            resultDiv.className = `result ${className}`;
            resultDiv.style.display = 'block';

            resultDiv.innerHTML = `
                <div class="prediction">${icon} ${prediction}</div>
                <div class="confidence">Confidence: ${(confidence * 100).toFixed(1)}%</div>
                <div class="probability">Human Probability: ${(probability * 100).toFixed(1)}%</div>

                <div class="stats">
                    <div class="stat">
                        <span class="stat-value">${textLength}</span>
                        <span class="stat-label">Characters</span>
                    </div>
                    <div class="stat">
                        <span class="stat-value">${estimatedTokens}</span>
                        <span class="stat-label">Est. Tokens</span>
                    </div>
                    <div class="stat">
                        <span class="stat-value">${inferenceTime.toFixed(0)}ms</span>
                        <span class="stat-label">Inference Time</span>
                    </div>
                    <div class="stat">
                        <span class="stat-value">${(probability * 100).toFixed(0)}%</span>
                        <span class="stat-label">Human Score</span>
                    </div>
                </div>

                <div style="margin-top: 15px; padding: 15px; background: rgba(255,255,255,0.1); border-radius: 10px; font-size: 0.9em;">
                    <strong>Performance:</strong> ${inferenceTime.toFixed(0)}ms inference time
                </div>
            `;

            // Scroll to results
            resultDiv.scrollIntoView({ behavior: 'smooth', block: 'nearest' });
        }

function showResult(type, message) {
|
614 |
+
const resultDiv = document.getElementById('result');
|
615 |
+
resultDiv.className = `result ${type}`;
|
616 |
+
resultDiv.style.display = 'block';
|
617 |
+
|
618 |
+
if (type === 'loading') {
|
619 |
+
resultDiv.innerHTML = `
|
620 |
+
<div style="display: flex; align-items: center; justify-content: center; gap: 10px;">
|
621 |
+
<div class="loading-spinner"></div>
|
622 |
+
${message}
|
623 |
+
</div>
|
624 |
+
`;
|
625 |
+
} else {
|
626 |
+
resultDiv.innerHTML = `<div>${message}</div>`;
|
627 |
+
}
|
628 |
+
}
+
+        function setLoading(isLoading) {
+            const btn = document.getElementById('analyzeBtn');
+            const btnText = document.getElementById('btnText');
+            const btnSpinner = document.getElementById('btnSpinner');
+
+            btn.disabled = isLoading;
+            btnText.style.display = isLoading ? 'none' : 'inline';
+            btnSpinner.style.display = isLoading ? 'inline-block' : 'none';
+        }
+
+        function updateStatus(type, message) {
+            const statusDiv = document.getElementById('status');
+            if (statusDiv) {
+                statusDiv.textContent = message;
+                statusDiv.className = `status ${type}`;
+            }
+        }
+
+        function loadExample(type) {
+            const textarea = document.getElementById('textInput');
+
+            if (type === 'human') {
+                textarea.value = "I've been thinking a lot about creativity lately, especially after visiting the local art museum last weekend. There's something deeply moving about standing in front of a painting that someone poured their heart into decades or even centuries ago. The way light hits the canvas, the subtle imperfections in the brushstrokes, the stories hidden in every corner of the composition. It makes me wonder about the artist's life, their struggles, their moments of doubt and breakthrough. Art has this incredible power to transcend time and connect us with people we'll never meet, yet somehow understand on a profound level.";
+            } else {
+                textarea.value = "Here are the key steps to improve your writing skills: 1) Read extensively across different genres and styles to expand your vocabulary and understanding of various writing techniques. 2) Practice writing regularly, setting aside dedicated time each day for writing exercises or projects. 3) Seek feedback from peers, mentors, or writing groups to identify areas for improvement. 4) Study grammar and style guides to ensure technical accuracy. 5) Revise and edit your work multiple times, focusing on clarity, coherence, and flow. 6) Experiment with different writing formats and styles to find your unique voice. Following these steps consistently will help you develop stronger writing abilities over time.";
+            }
+
+            // Auto-focus the textarea
+            textarea.focus();
+        }
+
+        // Handle Enter key in textarea (Shift+Enter for new line, Enter to analyze)
+        document.getElementById('textInput').addEventListener('keydown', function(e) {
+            if (e.key === 'Enter' && !e.shiftKey) {
+                e.preventDefault();
+                analyzeText();
+            }
+        });
+
+        // Make functions globally available
+        window.analyzeText = analyzeText;
+
+        // Initialize the model when page loads
+        window.addEventListener('load', initializeModel);
+    </script>
+</body>
+</html>
deployment_config.json
ADDED
@@ -0,0 +1,39 @@
+{
+  "model_files": {
+    "onnx_model": "fixed_optimized_detector.onnx",
+    "pytorch_model": "pytorch_optimized_detector.pt",
+    "tokenizer": "."
+  },
+  "model_config": {
+    "base_model_name": "HuggingFaceTB/SmolLM-135M",
+    "optimal_layers": [
+      0,
+      2,
+      4,
+      6,
+      8,
+      10,
+      12,
+      14,
+      16,
+      18,
+      20,
+      22
+    ],
+    "max_length": 256,
+    "feature_dim": 13824,
+    "layers_loaded": 23,
+    "layers_used": 12
+  },
+  "performance": {
+    "accuracy": 0.9667,
+    "auc": 0.9934,
+    "original_accuracy": 0.997
+  },
+  "optimization_info": {
+    "strategy": "hook_based_truncated",
+    "layers_reduction": "30 \u2192 23",
+    "features_reduction": "34560 \u2192 13824",
+    "onnx_available": true
+  }
+}
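Reviewer note, not part of the commit: the numbers in `deployment_config.json` can be cross-checked against each other. The sketch below is a minimal consistency check in Python. The function name `check_config` and the reading of `features_reduction` as "full feature width → reduced feature width" are assumptions made for illustration; the field values are copied from the file above.

```python
import json

# Only the fields needed for the check, copied from deployment_config.json above.
CONFIG_JSON = """
{
  "model_config": {
    "optimal_layers": [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22],
    "max_length": 256,
    "feature_dim": 13824,
    "layers_loaded": 23,
    "layers_used": 12
  },
  "optimization_info": {
    "layers_reduction": "30 \\u2192 23",
    "features_reduction": "34560 \\u2192 13824"
  }
}
"""


def check_config(cfg: dict) -> dict:
    """Hypothetical helper: verify that the layer and feature counts agree."""
    mc = cfg["model_config"]
    layers = mc["optimal_layers"]

    if len(layers) != mc["layers_used"]:
        raise ValueError("optimal_layers length does not match layers_used")
    if max(layers) >= mc["layers_loaded"]:
        raise ValueError("optimal_layers refers to a layer that is not loaded")

    # Feature width contributed by each selected layer.
    per_layer_dim, remainder = divmod(mc["feature_dim"], mc["layers_used"])
    if remainder:
        raise ValueError("feature_dim is not a multiple of layers_used")

    # Assumption: the string reads "<full width> -> <reduced width>".
    before, after = (
        int(part)
        for part in cfg["optimization_info"]["features_reduction"].split("\u2192")
    )
    if after != mc["feature_dim"]:
        raise ValueError("features_reduction does not end at feature_dim")

    return {
        "per_layer_dim": per_layer_dim,
        "full_layer_count": before // per_layer_dim,
    }


summary = check_config(json.loads(CONFIG_JSON))
print(summary)
```

Under that reading, the twelve selected layers each contribute the same feature width, and the "before" figure corresponds to all 30 layers named in `layers_reduction`.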
fixed_optimized_detector.onnx
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e638c87ae11cc6f475e08dc0ed1c821a8433c504855d7d497d164aab49d7cf0f
+size 441430808
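Reviewer note, not part of the commit: the three lines above are a Git LFS pointer, not the model weights themselves. A minimal Python sketch for reading such a pointer and checking a downloaded file against it is shown below. The helper names `parse_lfs_pointer` and `stream_matches_pointer` are hypothetical, and the check assumes the `oid` uses SHA-256, as it does here.

```python
import hashlib
import io

# Pointer text copied from fixed_optimized_detector.onnx above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e638c87ae11cc6f475e08dc0ed1c821a8433c504855d7d497d164aab49d7cf0f
size 441430808
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line and unpack the oid into algorithm and digest."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def stream_matches_pointer(stream, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Hash a binary stream in chunks and compare size and digest to the pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError("only sha256 pointers are handled in this sketch")
    hasher = hashlib.sha256()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)
        total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])

# Small in-memory demonstration with a synthetic pointer (not the real model file).
demo_bytes = b"hello"
demo_pointer = {
    "algo": "sha256",
    "digest": hashlib.sha256(demo_bytes).hexdigest(),
    "size": len(demo_bytes),
}
print(stream_matches_pointer(io.BytesIO(demo_bytes), demo_pointer))
```

For the real file, the stream would be something like `open("fixed_optimized_detector.onnx", "rb")`; reading in chunks avoids holding roughly 440 MB in memory at once.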
merges.txt
ADDED
The diff for this file is too large to render.
See raw diff
pytorch_optimized_detector.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8fa69204a7b227bfbc11b356e92bda566f3127cdf2fd2d96814cf4d1cfd19070
+size 439506922
special_tokens_map.json
ADDED
@@ -0,0 +1,43 @@
+{
+  "additional_special_tokens": [
+    "<|endoftext|>",
+    "<|im_start|>",
+    "<|im_end|>",
+    "<repo_name>",
+    "<reponame>",
+    "<file_sep>",
+    "<filename>",
+    "<gh_stars>",
+    "<issue_start>",
+    "<issue_comment>",
+    "<issue_closed>",
+    "<jupyter_start>",
+    "<jupyter_text>",
+    "<jupyter_code>",
+    "<jupyter_output>",
+    "<jupyter_script>",
+    "<empty_output>"
+  ],
+  "bos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<|endoftext|>",
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
tokenizer.json
ADDED
The diff for this file is too large to render.
See raw diff
tokenizer_config.json
ADDED
@@ -0,0 +1,167 @@
+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "10": {
+      "content": "<issue_closed>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "11": {
+      "content": "<jupyter_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "12": {
+      "content": "<jupyter_text>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "13": {
+      "content": "<jupyter_code>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "14": {
+      "content": "<jupyter_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "15": {
+      "content": "<jupyter_script>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "16": {
+      "content": "<empty_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<repo_name>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "<reponame>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "5": {
+      "content": "<file_sep>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "6": {
+      "content": "<filename>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "7": {
+      "content": "<gh_stars>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "8": {
+      "content": "<issue_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "9": {
+      "content": "<issue_comment>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|endoftext|>",
+    "<|im_start|>",
+    "<|im_end|>",
+    "<repo_name>",
+    "<reponame>",
+    "<file_sep>",
+    "<filename>",
+    "<gh_stars>",
+    "<issue_start>",
+    "<issue_comment>",
+    "<issue_closed>",
+    "<jupyter_start>",
+    "<jupyter_text>",
+    "<jupyter_code>",
+    "<jupyter_output>",
+    "<jupyter_script>",
+    "<empty_output>"
+  ],
+  "bos_token": "<|endoftext|>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|endoftext|>",
+  "model_max_length": 1000000000000000019884624838656,
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>",
+  "vocab_size": 49152
+}
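Reviewer note, not part of the commit: the keys of `added_tokens_decoder` are strings, so the file lists them in lexicographic order ("0", "1", "10", ..., "16", "2", ...). The Python sketch below sorts them numerically and checks that the result lines up with `additional_special_tokens`. The excerpt is condensed to the `content` field of each entry, and the helper name `tokens_by_id` is hypothetical.

```python
# Condensed excerpt of tokenizer_config.json above: token id -> "content" only,
# kept in the same key order as the file.
ADDED_TOKENS_DECODER = {
    "0": "<|endoftext|>",
    "1": "<|im_start|>",
    "10": "<issue_closed>",
    "11": "<jupyter_start>",
    "12": "<jupyter_text>",
    "13": "<jupyter_code>",
    "14": "<jupyter_output>",
    "15": "<jupyter_script>",
    "16": "<empty_output>",
    "2": "<|im_end|>",
    "3": "<repo_name>",
    "4": "<reponame>",
    "5": "<file_sep>",
    "6": "<filename>",
    "7": "<gh_stars>",
    "8": "<issue_start>",
    "9": "<issue_comment>",
}

ADDITIONAL_SPECIAL_TOKENS = [
    "<|endoftext|>",
    "<|im_start|>",
    "<|im_end|>",
    "<repo_name>",
    "<reponame>",
    "<file_sep>",
    "<filename>",
    "<gh_stars>",
    "<issue_start>",
    "<issue_comment>",
    "<issue_closed>",
    "<jupyter_start>",
    "<jupyter_text>",
    "<jupyter_code>",
    "<jupyter_output>",
    "<jupyter_script>",
    "<empty_output>",
]


def tokens_by_id(decoder: dict) -> list:
    """Return token strings ordered by numeric id rather than by string key."""
    return [decoder[key] for key in sorted(decoder, key=int)]


ordered = tokens_by_id(ADDED_TOKENS_DECODER)
ids_are_contiguous = sorted(int(k) for k in ADDED_TOKENS_DECODER) == list(range(len(ordered)))
print(ordered == ADDITIONAL_SPECIAL_TOKENS, ids_are_contiguous)
```

Sorting with `key=int` matters here: a plain string sort would place "10" before "2" and the two lists would appear to disagree.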
vocab.json
ADDED
The diff for this file is too large to render.
See raw diff