Joshfcooper committed
Commit e7678d5 · verified · 1 Parent(s): 5ee3ee9

Upload AI text detector model

README.md ADDED
@@ -0,0 +1,162 @@
+ ---
+ license: apache-2.0
+ base_model: HuggingFaceTB/SmolLM-135M
+ tags:
+ - text-classification
+ - ai-detection
+ - pytorch
+ - onnx
+ - transformers
+ language:
+ - en
+ metrics:
+ - accuracy
+ library_name: transformers
+ pipeline_tag: text-classification
+ ---
+
+ # Joshfcooper/ai-text-detector-optimized
+
+ ## Model Description
+
+ An optimized AI text detector built on SmolLM-135M, designed to distinguish human-written from AI-generated text with high accuracy and low-latency inference.
+
+ ## Key Features
+
+ - **High Accuracy**: 96.7% accuracy on test data
+ - **Fast Inference**: 103.1 ms average per prediction
+ - **Optimized Architecture**: uses only 12 of the 30 transformer layers (60% compression)
+ - **Multiple Formats**: available in both PyTorch (.pt) and ONNX (.onnx) formats
+ - **Production Ready**: optimized for real-world deployment
+
+ ## Model Architecture
+
+ - **Base Model**: HuggingFaceTB/SmolLM-135M
+ - **Compression**: features are taken from 12 of the 30 layers (0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22); only the first 23 layers need to be loaded at inference time (see `deployment_config.json`)
+ - **Feature Extraction**: 24 layer outputs → 13,824 features
+ - **Classifier**: linear probe with sigmoid activation (sketched below)
+ - **Parameters**: roughly 60% reduction from the base model
+
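+ The pieces above are small enough to sketch. The snippet below is a hypothetical reconstruction of the inference path implied by `config.json` (layer indices, 13,824-dimensional feature vector): it assumes masked mean and max pooling per tapped layer (to account for the 24 pooled outputs), loads the full 30-layer backbone instead of the truncated 23-layer export, and omits the feature standardization stored in `scaler_params.json`. It is an illustration, not the shipped implementation.
+
+ ```python
+ # Hypothetical sketch of the feature-extraction classifier (not the shipped code).
+ import torch
+ import torch.nn as nn
+ from transformers import AutoModel
+
+ SELECTED_LAYERS = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]  # from config.json
+
+ class LayerProbeDetector(nn.Module):
+     def __init__(self, base="HuggingFaceTB/SmolLM-135M", feature_dim=13824):
+         super().__init__()
+         self.backbone = AutoModel.from_pretrained(base, output_hidden_states=True)
+         self.probe = nn.Linear(feature_dim, 1)  # linear probe; sigmoid applied below
+
+     def forward(self, input_ids, attention_mask):
+         out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
+         mask = attention_mask.unsqueeze(-1).float()            # (batch, seq, 1)
+         feats = []
+         for idx in SELECTED_LAYERS:
+             hidden = out.hidden_states[idx]                    # (batch, seq, 576)
+             mean_pooled = (hidden * mask).sum(1) / mask.sum(1).clamp(min=1)
+             max_pooled = hidden.masked_fill(mask == 0, float("-inf")).max(1).values
+             feats += [mean_pooled, max_pooled]                 # assumed 2 vectors per layer -> 24 outputs
+         features = torch.cat(feats, dim=-1)                    # (batch, 13824)
+         return torch.sigmoid(self.probe(features)).squeeze(-1)
+ ```
+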
+ ## Usage
+
+ ### ONNX Model (Recommended for Web/Production)
+
+ ```python
+ import onnxruntime as ort
+ from transformers import AutoTokenizer
+ import numpy as np
+
+ # Load tokenizer and ONNX model (the ONNX file in this repo is fixed_optimized_detector.onnx)
+ tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")
+ session = ort.InferenceSession("fixed_optimized_detector.onnx")
+
+ def predict(text):
+     # Tokenize to a fixed length of 256 tokens
+     tokens = tokenizer(text, truncation=True, padding='max_length',
+                        max_length=256, return_tensors="np")
+
+     # Convert to int64 for ONNX
+     feeds = {
+         'input_ids': tokens['input_ids'].astype(np.int64),
+         'attention_mask': tokens['attention_mask'].astype(np.int64)
+     }
+
+     # Run inference
+     result = session.run(None, feeds)
+     probability = float(result[0][0])
+
+     # Interpret (the model outputs inverted probabilities; see Important Notes)
+     human_prob = 1 - probability
+     is_human = human_prob > 0.5
+
+     return {
+         'prediction': 'human' if is_human else 'ai',
+         'human_probability': human_prob,
+         'confidence': abs(human_prob - 0.5) * 2
+     }
+
+ # Example usage
+ result = predict("Your text here...")
+ print(result)
+ ```
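+
+ If you are not running from a local clone of this repository, the ONNX file can be fetched from the Hub first. A minimal sketch using `huggingface_hub` (the filename matches the file uploaded in this commit):
+
+ ```python
+ # Optional: download the ONNX export from the Hub before creating the session.
+ from huggingface_hub import hf_hub_download
+ import onnxruntime as ort
+
+ onnx_path = hf_hub_download(
+     repo_id="Joshfcooper/ai-text-detector-optimized",
+     filename="fixed_optimized_detector.onnx",
+ )
+ session = ort.InferenceSession(onnx_path)
+ ```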
+
+ ### PyTorch Model
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer
+
+ # Load model and tokenizer (the PyTorch file in this repo is pytorch_optimized_detector.pt).
+ # The .pt file stores the full detector module; on newer PyTorch you may need
+ # torch.load(..., weights_only=False).
+ tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")
+ model = torch.load("pytorch_optimized_detector.pt", map_location='cpu')
+ model.eval()
+
+ def predict_pytorch(text):
+     tokens = tokenizer(text, truncation=True, padding='max_length',
+                        max_length=256, return_tensors="pt")
+
+     with torch.no_grad():
+         probability = model(tokens['input_ids'], tokens['attention_mask']).item()
+
+     human_prob = 1 - probability  # invert the raw output (see Important Notes)
+     is_human = human_prob > 0.5
+
+     return {
+         'prediction': 'human' if is_human else 'ai',
+         'human_probability': human_prob,
+         'confidence': abs(human_prob - 0.5) * 2
+     }
+ ```
+
+ ## Performance Metrics
+
+ - **Accuracy**: 96.7% (AUC 0.993; the un-truncated feature set reaches 99.7%, per `deployment_config.json`)
+ - **Inference Time**: 103.1 ms average (`benchmark_results.json` reports 72.3 ms for the ONNX model over 50 runs)
+ - **Model Size**: ~60% reduction relative to the base model (`compression_ratio: 0.6` in `config.json`)
+ - **Throughput**: ~10 predictions/second (~13.8/s in the bundled ONNX benchmark; a timing-loop sketch follows below)
+
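+ The numbers in `benchmark_results.json` were presumably produced by a simple timing loop over the ONNX `predict()` function shown above. A hypothetical version of such a loop (the dictionary keys match the JSON; everything else is illustrative):
+
+ ```python
+ # Hypothetical benchmark loop; reuses predict() from the ONNX example above.
+ import json
+ import time
+
+ def benchmark(predict_fn, texts, num_tests=50):
+     start = time.perf_counter()
+     for i in range(num_tests):
+         predict_fn(texts[i % len(texts)])
+     total = time.perf_counter() - start
+     return {
+         "total_time_seconds": total,
+         "avg_time_per_prediction_ms": total / num_tests * 1000,
+         "predictions_per_second": num_tests / total,
+         "model_type": "ONNX",
+         "num_tests": num_tests,
+     }
+
+ # with open("benchmark_results.json", "w") as f:
+ #     json.dump(benchmark(predict, sample_texts), f, indent=2)
+ ```
+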
+ ## Training Details
+
+ The model was trained with a feature-extraction approach (a minimal sketch of the classifier-training steps follows the list below):
+ 1. Extract hidden states from the 12 selected layers of SmolLM-135M
+ 2. Mean-pool across the sequence length with attention masking
+ 3. Concatenate the per-layer features (13,824 features in total)
+ 4. Train a linear classifier on standardized features
+ 5. Export to ONNX for optimized inference
+
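+ A minimal, hypothetical sketch of steps 3–5 follows. The exact training script is not included in this repo; `LogisticRegression` stands in for whatever linear classifier was actually used, and the export comment only illustrates the general shape of the ONNX step (the output name matches the one the demo reads).
+
+ ```python
+ # Hypothetical training sketch (not the original script).
+ import numpy as np
+ from sklearn.linear_model import LogisticRegression
+ from sklearn.preprocessing import StandardScaler
+
+ def train_probe(features: np.ndarray, labels: np.ndarray):
+     """features: (n_samples, 13824) pooled hidden states; labels: binary (ai vs. human)."""
+     scaler = StandardScaler().fit(features)        # step 4: standardization -> scaler_params.json
+     clf = LogisticRegression(max_iter=1000)        # linear probe
+     clf.fit(scaler.transform(features), labels)
+     return scaler, clf
+
+ # Step 5 (sketch): fuse backbone + scaler + probe into one module and export, e.g.
+ # torch.onnx.export(detector, (input_ids, attention_mask), "fixed_optimized_detector.onnx",
+ #                   input_names=["input_ids", "attention_mask"],
+ #                   output_names=["probability_human"])
+ ```
+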
+ ## Important Notes
+
+ ⚠️ **Output Inversion**: this model outputs inverted probabilities (`output_inverted: true` in `config.json`). Use `1 - model_output` to get the probability that a text is human-written.
+
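+ Downstream code can read this flag from `config.json` instead of hard-coding the inversion; a small example (field names taken from the bundled `config.json`):
+
+ ```python
+ # Read model metadata from config.json and normalize the raw output accordingly.
+ import json
+
+ with open("config.json") as f:
+     cfg = json.load(f)
+
+ max_length = cfg["sequence_length"]        # 256
+ selected_layers = cfg["selected_layers"]   # [0, 2, ..., 22]
+
+ def to_human_probability(raw_output: float) -> float:
+     """Convert the raw model output into P(human-written)."""
+     return 1.0 - raw_output if cfg["output_inverted"] else raw_output
+ ```
+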
+ ## Files Included
+
+ - `fixed_optimized_detector.onnx`: ONNX model for web/production deployment
+ - `pytorch_optimized_detector.pt`: PyTorch model for development
+ - `config.json`: model configuration
+ - `deployment_config.json`: deployment configuration with the layer selection
+ - `scaler_params.json`: feature standardization parameters
+ - `demo.html`: in-browser demo built on onnxruntime-web and the SmolLM tokenizer
+ - `benchmark_results.json`: ONNX latency/throughput measurements
+ - Tokenizer files: `tokenizer.json`, `tokenizer_config.json`, `vocab.json`, `merges.txt`, `special_tokens_map.json`
+
+ ## License
+
+ Apache 2.0
+
+ ## Citation
+
+ ```bibtex
+ @misc{ai-text-detector-optimized,
+   title={Ultra-Optimized AI Text Detector},
+   author={Your Name},
+   year={2024},
+   publisher={Hugging Face},
+   url={https://huggingface.co/Joshfcooper/ai-text-detector-optimized}
+ }
+ ```
+
+ ## Ethical Considerations
+
+ This model is designed to detect AI-generated text. Please use it responsibly and be aware that:
+ - No detector is 100% accurate
+ - Results should be treated as guidance, not definitive proof
+ - Consider privacy and consent when analyzing someone else's text
+ - The training data may carry biases that affect predictions
benchmark_results.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "total_time_seconds": 3.6133105754852295,
+   "avg_time_per_prediction_ms": 72.26621150970459,
+   "predictions_per_second": 13.837725530495128,
+   "model_type": "ONNX",
+   "num_tests": 50
+ }
config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "model_type": "ai_text_detector",
+   "base_model": "HuggingFaceTB/SmolLM-135M",
+   "architecture": "feature_extraction_classifier",
+   "num_layers_used": 12,
+   "total_layers": 30,
+   "selected_layers": [
+     0,
+     2,
+     4,
+     6,
+     8,
+     10,
+     12,
+     14,
+     16,
+     18,
+     20,
+     22
+   ],
+   "feature_size": 13824,
+   "sequence_length": 256,
+   "compression_ratio": 0.6,
+   "output_inverted": true,
+   "task": "binary_classification",
+   "labels": [
+     "ai",
+     "human"
+   ]
+ }
demo.html ADDED
@@ -0,0 +1,676 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>AI Text Detector</title>
7
+ <script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.min.js"></script>
8
+ <script type="module" src="https://cdn.jsdelivr.net/npm/@xenova/transformers@2.17.2/dist/transformers.min.js"></script>
9
+ <style>
10
+ * {
11
+ margin: 0;
12
+ padding: 0;
13
+ box-sizing: border-box;
14
+ }
15
+
16
+ body {
17
+ font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
18
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
19
+ min-height: 100vh;
20
+ display: flex;
21
+ align-items: center;
22
+ justify-content: center;
23
+ padding: 20px;
24
+ }
25
+
26
+ .container {
27
+ background: white;
28
+ border-radius: 20px;
29
+ box-shadow: 0 20px 40px rgba(0, 0, 0, 0.1);
30
+ padding: 40px;
31
+ max-width: 800px;
32
+ width: 100%;
33
+ position: relative;
34
+ overflow: hidden;
35
+ }
36
+
37
+ .container::before {
38
+ content: '';
39
+ position: absolute;
40
+ top: 0;
41
+ left: 0;
42
+ right: 0;
43
+ height: 5px;
44
+ background: linear-gradient(90deg, #667eea, #764ba2, #f093fb, #f5576c);
45
+ }
46
+
47
+ h1 {
48
+ text-align: center;
49
+ color: #333;
50
+ margin-bottom: 10px;
51
+ font-size: 2.5em;
52
+ font-weight: 700;
53
+ }
54
+
55
+ .subtitle {
56
+ text-align: center;
57
+ color: #666;
58
+ margin-bottom: 30px;
59
+ font-size: 1.1em;
60
+ }
61
+
62
+ .input-section {
63
+ margin-bottom: 30px;
64
+ }
65
+
66
+ label {
67
+ display: block;
68
+ margin-bottom: 10px;
69
+ color: #333;
70
+ font-weight: 600;
71
+ font-size: 1.1em;
72
+ }
73
+
74
+ textarea {
75
+ width: 100%;
76
+ height: 200px;
77
+ padding: 20px;
78
+ border: 2px solid #e1e5e9;
79
+ border-radius: 15px;
80
+ font-size: 16px;
81
+ line-height: 1.6;
82
+ resize: vertical;
83
+ transition: all 0.3s ease;
84
+ font-family: inherit;
85
+ }
86
+
87
+ textarea:focus {
88
+ outline: none;
89
+ border-color: #667eea;
90
+ box-shadow: 0 0 0 3px rgba(102, 126, 234, 0.1);
91
+ }
92
+
93
+ .button-container {
94
+ text-align: center;
95
+ margin: 30px 0;
96
+ }
97
+
98
+ button {
99
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
100
+ color: white;
101
+ border: none;
102
+ padding: 15px 40px;
103
+ border-radius: 50px;
104
+ font-size: 18px;
105
+ font-weight: 600;
106
+ cursor: pointer;
107
+ transition: all 0.3s ease;
108
+ box-shadow: 0 5px 15px rgba(102, 126, 234, 0.3);
109
+ }
110
+
111
+ button:hover {
112
+ transform: translateY(-2px);
113
+ box-shadow: 0 8px 25px rgba(102, 126, 234, 0.4);
114
+ }
115
+
116
+ button:active {
117
+ transform: translateY(0);
118
+ }
119
+
120
+ button:disabled {
121
+ background: #ccc;
122
+ cursor: not-allowed;
123
+ transform: none;
124
+ box-shadow: none;
125
+ }
126
+
127
+ .result {
128
+ margin-top: 30px;
129
+ padding: 25px;
130
+ border-radius: 15px;
131
+ text-align: center;
132
+ transition: all 0.3s ease;
133
+ }
134
+
135
+ .result.human {
136
+ background: linear-gradient(135deg, #4facfe 0%, #00f2fe 100%);
137
+ color: white;
138
+ }
139
+
140
+ .result.ai {
141
+ background: linear-gradient(135deg, #fa709a 0%, #fee140 100%);
142
+ color: white;
143
+ }
144
+
145
+ .result.loading {
146
+ background: linear-gradient(135deg, #ffecd2 0%, #fcb69f 100%);
147
+ color: #333;
148
+ }
149
+
150
+ .result.error {
151
+ background: linear-gradient(135deg, #ff9a9e 0%, #fecfef 100%);
152
+ color: #333;
153
+ }
154
+
155
+ .prediction {
156
+ font-size: 2em;
157
+ font-weight: 700;
158
+ margin-bottom: 10px;
159
+ text-transform: uppercase;
160
+ }
161
+
162
+ .confidence {
163
+ font-size: 1.2em;
164
+ margin-bottom: 10px;
165
+ }
166
+
167
+ .probability {
168
+ font-size: 1em;
169
+ opacity: 0.9;
170
+ }
171
+
172
+ .stats {
173
+ display: grid;
174
+ grid-template-columns: repeat(auto-fit, minmax(150px, 1fr));
175
+ gap: 15px;
176
+ margin-top: 20px;
177
+ }
178
+
179
+ .stat {
180
+ text-align: center;
181
+ padding: 15px;
182
+ background: rgba(255, 255, 255, 0.1);
183
+ border-radius: 10px;
184
+ backdrop-filter: blur(10px);
185
+ }
186
+
187
+ .stat-value {
188
+ font-size: 1.5em;
189
+ font-weight: 700;
190
+ display: block;
191
+ }
192
+
193
+ .stat-label {
194
+ font-size: 0.9em;
195
+ opacity: 0.8;
196
+ }
197
+
198
+ .loading-spinner {
199
+ display: inline-block;
200
+ width: 20px;
201
+ height: 20px;
202
+ border: 2px solid #f3f3f3;
203
+ border-top: 2px solid #333;
204
+ border-radius: 50%;
205
+ animation: spin 1s linear infinite;
206
+ }
207
+
208
+ @keyframes spin {
209
+ 0% { transform: rotate(0deg); }
210
+ 100% { transform: rotate(360deg); }
211
+ }
212
+
213
+ .model-info {
214
+ background: #f8f9fa;
215
+ padding: 20px;
216
+ border-radius: 15px;
217
+ margin-bottom: 30px;
218
+ border-left: 5px solid #667eea;
219
+ }
220
+
221
+ .model-info h3 {
222
+ color: #333;
223
+ margin-bottom: 10px;
224
+ }
225
+
226
+ .model-info p {
227
+ color: #666;
228
+ line-height: 1.6;
229
+ }
230
+
231
+ .examples {
232
+ margin-top: 30px;
233
+ display: grid;
234
+ grid-template-columns: 1fr 1fr;
235
+ gap: 20px;
236
+ }
237
+
238
+ .example {
239
+ background: #f8f9fa;
240
+ padding: 15px;
241
+ border-radius: 10px;
242
+ cursor: pointer;
243
+ transition: all 0.3s ease;
244
+ border: 2px solid transparent;
245
+ }
246
+
247
+ .example:hover {
248
+ background: #e9ecef;
249
+ border-color: #667eea;
250
+ }
251
+
252
+ .example h4 {
253
+ color: #333;
254
+ margin-bottom: 10px;
255
+ font-size: 1em;
256
+ }
257
+
258
+ .example p {
259
+ color: #666;
260
+ font-size: 0.9em;
261
+ line-height: 1.4;
262
+ }
263
+
264
+ .status {
265
+ margin-top: 10px;
266
+ padding: 8px 12px;
267
+ border-radius: 8px;
268
+ font-size: 0.9em;
269
+ font-weight: 500;
270
+ }
271
+
272
+ .status.loading {
273
+ background: #fff3cd;
274
+ color: #856404;
275
+ border: 1px solid #ffeaa7;
276
+ }
277
+
278
+ .status.ready {
279
+ background: #d4edda;
280
+ color: #155724;
281
+ border: 1px solid #c3e6cb;
282
+ }
283
+
284
+ .status.processing {
285
+ background: #cce8ff;
286
+ color: #004085;
287
+ border: 1px solid #b3d9ff;
288
+ }
289
+
290
+ .status.error {
291
+ background: #f8d7da;
292
+ color: #721c24;
293
+ border: 1px solid #f5c6cb;
294
+ }
295
+
296
+ .status.complete {
297
+ background: #d1ecf1;
298
+ color: #0c5460;
299
+ border: 1px solid #bee5eb;
300
+ }
301
+
302
+ code {
303
+ background: rgba(0,0,0,0.1);
304
+ padding: 2px 4px;
305
+ border-radius: 3px;
306
+ font-family: monospace;
307
+ font-size: 0.85em;
308
+ }
309
+ .container {
310
+ padding: 20px;
311
+ margin: 10px;
312
+ }
313
+
314
+ h1 {
315
+ font-size: 2em;
316
+ }
317
+
318
+ .stats {
319
+ grid-template-columns: repeat(2, 1fr);
320
+ }
321
+ </style>
322
+ </head>
323
+ <body>
324
+ <div class="container">
325
+ <h1>🤖 AI Text Detector</h1>
326
+ <p class="subtitle">Powered by Ultra-Optimized Neural Networks</p>
327
+
328
+ <div class="model-info">
329
+ <h3>📊 Model Status</h3>
330
+ <div id="status" class="status loading">🔄 Loading model and tokenizer...</div>
331
+ </div>
332
+
333
+ <div class="input-section">
334
+ <label for="textInput">📝 Enter text to analyze:</label>
335
+ <textarea
336
+ id="textInput"
337
+ placeholder="Paste your text here... (minimum 100 characters required for accurate analysis)"
338
+ spellcheck="false"
339
+ ></textarea>
340
+ </div>
341
+
342
+ <div class="button-container">
343
+ <button id="analyzeBtn" onclick="analyzeText()">
344
+ <span id="btnText">🚀 Analyze Text</span>
345
+ <span id="btnSpinner" class="loading-spinner" style="display: none;"></span>
346
+ </button>
347
+ </div>
348
+
349
+ <div id="result" class="result" style="display: none;"></div>
350
+ </div>
351
+
352
+ <script type="module">
353
+ import { AutoTokenizer } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.17.2/dist/transformers.min.js';
354
+
355
+ let session = null;
356
+ let tokenizer = null;
357
+
358
+ // Initialize ONNX Runtime and load model + tokenizer
359
+ async function initializeModel() {
360
+ try {
361
+ console.log('Loading tokenizer and ONNX model...');
362
+
363
+ // Load the actual tokenizer from HuggingFace Hub
364
+ tokenizer = await AutoTokenizer.from_pretrained('HuggingFaceTB/SmolLM-135M', {
365
+ progress_callback: (progress) => {
366
+ if (progress.status === 'downloading') {
367
+ updateStatus('loading', `📥 Downloading tokenizer: ${progress.name}`);
368
+ }
369
+ }
370
+ });
371
+ console.log('Tokenizer loaded successfully!');
372
+ updateStatus('loading', '🤖 Loading ONNX model...');
373
+
374
+ // Load ONNX model - try multiple possible filenames
375
+ const possibleModelNames = [
376
+ './fixed_optimized_detector.onnx',
377
+ './ultra_optimized_detector.onnx',
378
+ './optimized_detector.onnx',
379
+ './model.onnx'
380
+ ];
381
+
382
+ let modelLoaded = false;
383
+ for (const modelPath of possibleModelNames) {
384
+ try {
385
+ session = await ort.InferenceSession.create(modelPath);
386
+ console.log(`ONNX model loaded successfully from: ${modelPath}`);
387
+ modelLoaded = true;
388
+ break;
389
+ } catch (error) {
390
+ // Only log if it's not a 404 error to reduce console spam
391
+ if (!error.message.includes('failed to load external data file')) {
392
+ console.log(`Failed to load from ${modelPath}:`, error.message);
393
+ }
394
+ }
395
+ }
396
+
397
+ if (!modelLoaded) {
398
+ throw new Error('ONNX model file not found. Please ensure your .onnx file is in the same directory as this HTML file.');
399
+ }
400
+
401
+ console.log('Model inputs:', session.inputNames);
402
+ console.log('Model outputs:', session.outputNames);
403
+
404
+ // Enable the analyze button
405
+ document.getElementById('analyzeBtn').disabled = false;
406
+ updateStatus('ready', '✅ Model loaded and ready!');
407
+
408
+ } catch (error) {
409
+ console.error('Failed to load model:', error);
410
+ updateStatus('error', `❌ Failed to load: ${error.message}`);
411
+
412
+ // Show helpful error message based on the type of error
413
+ if (error.message.includes('tokenizer')) {
414
+ showResult('error', '❌ Failed to load tokenizer. Please check your internet connection.');
415
+ } else if (error.message.includes('ONNX') || error.message.includes('external data')) {
416
+ showResult('error', `❌ ONNX model file not found. Please place your .onnx model file in the same directory as this HTML file. Expected names: ultra_optimized_detector.onnx, fixed_optimized_detector.onnx, optimized_detector.onnx, or model.onnx`);
417
+ } else {
418
+ showResult('error', `❌ Failed to initialize: ${error.message}`);
419
+ }
420
+ }
421
+ }
422
+
423
+ // Tokenize text using the proper tokenizer
424
+ async function tokenizeText(text, maxLength = 256) {
425
+ try {
426
+ // Use the actual tokenizer with proper settings
427
+ const encoded = await tokenizer(text, {
428
+ truncation: true,
429
+ padding: 'max_length',
430
+ max_length: maxLength,
431
+ return_tensors: false // We'll handle tensor creation manually
432
+ });
433
+
434
+ console.log('Encoded result:', encoded);
435
+
436
+ // Handle different possible return formats
437
+ let inputIds, attentionMask;
438
+
439
+ if (encoded.input_ids && Array.isArray(encoded.input_ids)) {
440
+ // Direct array format
441
+ inputIds = encoded.input_ids;
442
+ attentionMask = encoded.attention_mask;
443
+ } else if (encoded.input_ids && encoded.input_ids.data) {
444
+ // Tensor-like format
445
+ inputIds = Array.from(encoded.input_ids.data);
446
+ attentionMask = Array.from(encoded.attention_mask.data);
447
+ } else if (Array.isArray(encoded)) {
448
+ // Sometimes returns just the token IDs
449
+ inputIds = encoded;
450
+ attentionMask = encoded.map(token => token === tokenizer.pad_token_id ? 0 : 1);
451
+ } else {
452
+ throw new Error('Unexpected tokenizer output format');
453
+ }
454
+
455
+ // Ensure we have the right length
456
+ if (inputIds.length !== maxLength) {
457
+ console.warn(`Expected length ${maxLength}, got ${inputIds.length}`);
458
+ // Pad or truncate as needed
459
+ if (inputIds.length < maxLength) {
460
+ const padToken = tokenizer.pad_token_id || 0;
461
+ while (inputIds.length < maxLength) {
462
+ inputIds.push(padToken);
463
+ attentionMask.push(0);
464
+ }
465
+ } else {
466
+ inputIds = inputIds.slice(0, maxLength);
467
+ attentionMask = attentionMask.slice(0, maxLength);
468
+ }
469
+ }
470
+
471
+ return {
472
+ input_ids: inputIds,
473
+ attention_mask: attentionMask
474
+ };
475
+ } catch (error) {
476
+ console.error('Tokenization error:', error);
477
+ throw new Error(`Failed to tokenize text: ${error.message}`);
478
+ }
479
+ }
480
+
481
+ async function analyzeText() {
482
+ const text = document.getElementById('textInput').value.trim();
483
+
484
+ if (!text) {
485
+ showResult('error', 'Please enter some text to analyze.');
486
+ return;
487
+ }
488
+
489
+ if (text.length < 100) {
490
+ showResult('error', 'Please enter at least 100 characters for accurate analysis.');
491
+ return;
492
+ }
493
+
494
+ if (!session || !tokenizer) {
495
+ showResult('error', 'Model or tokenizer not loaded yet. Please wait...');
496
+ return;
497
+ }
498
+
499
+ // Show loading state
500
+ setLoading(true);
501
+ showResult('loading', 'Tokenizing and analyzing text...');
502
+
503
+ try {
504
+ // Tokenize the text using the proper tokenizer
505
+ console.log('Tokenizing text...');
506
+ const tokenized = await tokenizeText(text, 256);
507
+
508
+ console.log('Input IDs length:', tokenized.input_ids.length);
509
+ console.log('Attention mask length:', tokenized.attention_mask.length);
510
+ console.log('Sample tokens:', tokenized.input_ids.slice(0, 10));
511
+ console.log('Sample attention:', tokenized.attention_mask.slice(0, 10));
512
+
513
+ // Validate tokenization
514
+ if (!tokenized.input_ids || !Array.isArray(tokenized.input_ids)) {
515
+ throw new Error('Invalid tokenization: input_ids is not an array');
516
+ }
517
+
518
+ if (!tokenized.attention_mask || !Array.isArray(tokenized.attention_mask)) {
519
+ throw new Error('Invalid tokenization: attention_mask is not an array');
520
+ }
521
+
522
+ if (tokenized.input_ids.length !== 256 || tokenized.attention_mask.length !== 256) {
523
+ throw new Error(`Invalid tokenization: expected length 256, got input_ids: ${tokenized.input_ids.length}, attention_mask: ${tokenized.attention_mask.length}`);
524
+ }
525
+
526
+ // Convert to the correct format for ONNX
527
+ const inputIds = new BigInt64Array(tokenized.input_ids.map(id => BigInt(id)));
528
+ const attentionMask = new BigInt64Array(tokenized.attention_mask.map(mask => BigInt(mask)));
529
+
530
+ // Create ONNX tensors with correct shapes
531
+ const feeds = {
532
+ 'input_ids': new ort.Tensor('int64', inputIds, [1, 256]),
533
+ 'attention_mask': new ort.Tensor('int64', attentionMask, [1, 256])
534
+ };
535
+
536
+ console.log('Running inference...');
537
+ updateStatus('processing', '🧠 Running neural network inference...');
538
+
539
+ // Run inference
540
+ const startTime = performance.now();
541
+ const results = await session.run(feeds);
542
+ const inferenceTime = performance.now() - startTime;
543
+
544
+ console.log('Inference completed in', inferenceTime.toFixed(2), 'ms');
545
+ console.log('Raw output:', results.probability_human.data[0]);
546
+
547
+ const probability = results.probability_human.data[0];
548
+
549
+ // Interpret results - flip the logic since it seems backwards
550
+ const isHuman = probability < 0.5; // Changed from > to <
551
+ const confidence = Math.abs(probability - 0.5) * 2;
552
+
553
+ // Display the corrected probability (1 - probability for human score)
554
+ const humanProbability = 1 - probability;
555
+
556
+ updateStatus('complete', `✅ Analysis complete (${inferenceTime.toFixed(0)}ms)`);
557
+ displayResults(humanProbability, isHuman, confidence, text.length, inferenceTime);
558
+
559
+ } catch (error) {
560
+ console.error('Analysis error:', error);
561
+ updateStatus('error', `❌ Analysis failed: ${error.message}`);
562
+ showResult('error', `Error analyzing text: ${error.message}`);
563
+ } finally {
564
+ setLoading(false);
565
+ }
566
+ }
567
+
568
+ function displayResults(probability, isHuman, confidence, textLength, inferenceTime) {
569
+ const resultDiv = document.getElementById('result');
570
+ const className = isHuman ? 'human' : 'ai';
571
+ const prediction = isHuman ? 'Human Written' : 'AI Generated';
572
+ const icon = isHuman ? '👤' : '🤖';
573
+
574
+ // Calculate token count (approximate)
575
+ const estimatedTokens = Math.ceil(textLength / 4); // Rough estimate
576
+
577
+ resultDiv.className = `result ${className}`;
578
+ resultDiv.style.display = 'block';
579
+
580
+ resultDiv.innerHTML = `
581
+ <div class="prediction">${icon} ${prediction}</div>
582
+ <div class="confidence">Confidence: ${(confidence * 100).toFixed(1)}%</div>
583
+ <div class="probability">Human Probability: ${(probability * 100).toFixed(1)}%</div>
584
+
585
+ <div class="stats">
586
+ <div class="stat">
587
+ <span class="stat-value">${textLength}</span>
588
+ <span class="stat-label">Characters</span>
589
+ </div>
590
+ <div class="stat">
591
+ <span class="stat-value">${estimatedTokens}</span>
592
+ <span class="stat-label">Est. Tokens</span>
593
+ </div>
594
+ <div class="stat">
595
+ <span class="stat-value">${inferenceTime.toFixed(0)}ms</span>
596
+ <span class="stat-label">Inference Time</span>
597
+ </div>
598
+ <div class="stat">
599
+ <span class="stat-value">${(probability * 100).toFixed(0)}%</span>
600
+ <span class="stat-label">Human Score</span>
601
+ </div>
602
+ </div>
603
+
604
+ <div style="margin-top: 15px; padding: 15px; background: rgba(255,255,255,0.1); border-radius: 10px; font-size: 0.9em;">
605
+ <strong>Performance:</strong> ${inferenceTime.toFixed(0)}ms inference time
606
+ </div>
607
+ `;
608
+
609
+ // Scroll to results
610
+ resultDiv.scrollIntoView({ behavior: 'smooth', block: 'nearest' });
611
+ }
612
+
613
+ function showResult(type, message) {
614
+ const resultDiv = document.getElementById('result');
615
+ resultDiv.className = `result ${type}`;
616
+ resultDiv.style.display = 'block';
617
+
618
+ if (type === 'loading') {
619
+ resultDiv.innerHTML = `
620
+ <div style="display: flex; align-items: center; justify-content: center; gap: 10px;">
621
+ <div class="loading-spinner"></div>
622
+ ${message}
623
+ </div>
624
+ `;
625
+ } else {
626
+ resultDiv.innerHTML = `<div>${message}</div>`;
627
+ }
628
+ }
629
+
630
+ function setLoading(isLoading) {
631
+ const btn = document.getElementById('analyzeBtn');
632
+ const btnText = document.getElementById('btnText');
633
+ const btnSpinner = document.getElementById('btnSpinner');
634
+
635
+ btn.disabled = isLoading;
636
+ btnText.style.display = isLoading ? 'none' : 'inline';
637
+ btnSpinner.style.display = isLoading ? 'inline-block' : 'none';
638
+ }
639
+
640
+ function updateStatus(type, message) {
641
+ const statusDiv = document.getElementById('status');
642
+ if (statusDiv) {
643
+ statusDiv.textContent = message;
644
+ statusDiv.className = `status ${type}`;
645
+ }
646
+ }
647
+
648
+ function loadExample(type) {
649
+ const textarea = document.getElementById('textInput');
650
+
651
+ if (type === 'human') {
652
+ textarea.value = "I've been thinking a lot about creativity lately, especially after visiting the local art museum last weekend. There's something deeply moving about standing in front of a painting that someone poured their heart into decades or even centuries ago. The way light hits the canvas, the subtle imperfections in the brushstrokes, the stories hidden in every corner of the composition. It makes me wonder about the artist's life, their struggles, their moments of doubt and breakthrough. Art has this incredible power to transcend time and connect us with people we'll never meet, yet somehow understand on a profound level.";
653
+ } else {
654
+ textarea.value = "Here are the key steps to improve your writing skills: 1) Read extensively across different genres and styles to expand your vocabulary and understanding of various writing techniques. 2) Practice writing regularly, setting aside dedicated time each day for writing exercises or projects. 3) Seek feedback from peers, mentors, or writing groups to identify areas for improvement. 4) Study grammar and style guides to ensure technical accuracy. 5) Revise and edit your work multiple times, focusing on clarity, coherence, and flow. 6) Experiment with different writing formats and styles to find your unique voice. Following these steps consistently will help you develop stronger writing abilities over time.";
655
+ }
656
+
657
+ // Auto-focus the textarea
658
+ textarea.focus();
659
+ }
660
+
661
+ // Handle Enter key in textarea (Shift+Enter for new line, Enter to analyze)
662
+ document.getElementById('textInput').addEventListener('keydown', function(e) {
663
+ if (e.key === 'Enter' && !e.shiftKey) {
664
+ e.preventDefault();
665
+ analyzeText();
666
+ }
667
+ });
668
+
669
+ // Make functions globally available
670
+ window.analyzeText = analyzeText;
671
+
672
+ // Initialize the model when page loads
673
+ window.addEventListener('load', initializeModel);
674
+ </script>
675
+ </body>
676
+ </html>
deployment_config.json ADDED
@@ -0,0 +1,39 @@
+ {
+   "model_files": {
+     "onnx_model": "fixed_optimized_detector.onnx",
+     "pytorch_model": "pytorch_optimized_detector.pt",
+     "tokenizer": "."
+   },
+   "model_config": {
+     "base_model_name": "HuggingFaceTB/SmolLM-135M",
+     "optimal_layers": [
+       0,
+       2,
+       4,
+       6,
+       8,
+       10,
+       12,
+       14,
+       16,
+       18,
+       20,
+       22
+     ],
+     "max_length": 256,
+     "feature_dim": 13824,
+     "layers_loaded": 23,
+     "layers_used": 12
+   },
+   "performance": {
+     "accuracy": 0.9667,
+     "auc": 0.9934,
+     "original_accuracy": 0.997
+   },
+   "optimization_info": {
+     "strategy": "hook_based_truncated",
+     "layers_reduction": "30 \u2192 23",
+     "features_reduction": "34560 \u2192 13824",
+     "onnx_available": true
+   }
+ }
fixed_optimized_detector.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e638c87ae11cc6f475e08dc0ed1c821a8433c504855d7d497d164aab49d7cf0f
+ size 441430808
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
pytorch_optimized_detector.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8fa69204a7b227bfbc11b356e92bda566f3127cdf2fd2d96814cf4d1cfd19070
+ size 439506922
special_tokens_map.json ADDED
@@ -0,0 +1,43 @@
+ {
+   "additional_special_tokens": [
+     "<|endoftext|>",
+     "<|im_start|>",
+     "<|im_end|>",
+     "<repo_name>",
+     "<reponame>",
+     "<file_sep>",
+     "<filename>",
+     "<gh_stars>",
+     "<issue_start>",
+     "<issue_comment>",
+     "<issue_closed>",
+     "<jupyter_start>",
+     "<jupyter_text>",
+     "<jupyter_code>",
+     "<jupyter_output>",
+     "<jupyter_script>",
+     "<empty_output>"
+   ],
+   "bos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<|endoftext|>",
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,167 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "<|endoftext|>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "<|im_start|>",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "10": {
21
+ "content": "<issue_closed>",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "11": {
29
+ "content": "<jupyter_start>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "12": {
37
+ "content": "<jupyter_text>",
38
+ "lstrip": false,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ },
44
+ "13": {
45
+ "content": "<jupyter_code>",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false,
50
+ "special": true
51
+ },
52
+ "14": {
53
+ "content": "<jupyter_output>",
54
+ "lstrip": false,
55
+ "normalized": false,
56
+ "rstrip": false,
57
+ "single_word": false,
58
+ "special": true
59
+ },
60
+ "15": {
61
+ "content": "<jupyter_script>",
62
+ "lstrip": false,
63
+ "normalized": false,
64
+ "rstrip": false,
65
+ "single_word": false,
66
+ "special": true
67
+ },
68
+ "16": {
69
+ "content": "<empty_output>",
70
+ "lstrip": false,
71
+ "normalized": false,
72
+ "rstrip": false,
73
+ "single_word": false,
74
+ "special": true
75
+ },
76
+ "2": {
77
+ "content": "<|im_end|>",
78
+ "lstrip": false,
79
+ "normalized": false,
80
+ "rstrip": false,
81
+ "single_word": false,
82
+ "special": true
83
+ },
84
+ "3": {
85
+ "content": "<repo_name>",
86
+ "lstrip": false,
87
+ "normalized": false,
88
+ "rstrip": false,
89
+ "single_word": false,
90
+ "special": true
91
+ },
92
+ "4": {
93
+ "content": "<reponame>",
94
+ "lstrip": false,
95
+ "normalized": false,
96
+ "rstrip": false,
97
+ "single_word": false,
98
+ "special": true
99
+ },
100
+ "5": {
101
+ "content": "<file_sep>",
102
+ "lstrip": false,
103
+ "normalized": false,
104
+ "rstrip": false,
105
+ "single_word": false,
106
+ "special": true
107
+ },
108
+ "6": {
109
+ "content": "<filename>",
110
+ "lstrip": false,
111
+ "normalized": false,
112
+ "rstrip": false,
113
+ "single_word": false,
114
+ "special": true
115
+ },
116
+ "7": {
117
+ "content": "<gh_stars>",
118
+ "lstrip": false,
119
+ "normalized": false,
120
+ "rstrip": false,
121
+ "single_word": false,
122
+ "special": true
123
+ },
124
+ "8": {
125
+ "content": "<issue_start>",
126
+ "lstrip": false,
127
+ "normalized": false,
128
+ "rstrip": false,
129
+ "single_word": false,
130
+ "special": true
131
+ },
132
+ "9": {
133
+ "content": "<issue_comment>",
134
+ "lstrip": false,
135
+ "normalized": false,
136
+ "rstrip": false,
137
+ "single_word": false,
138
+ "special": true
139
+ }
140
+ },
141
+ "additional_special_tokens": [
142
+ "<|endoftext|>",
143
+ "<|im_start|>",
144
+ "<|im_end|>",
145
+ "<repo_name>",
146
+ "<reponame>",
147
+ "<file_sep>",
148
+ "<filename>",
149
+ "<gh_stars>",
150
+ "<issue_start>",
151
+ "<issue_comment>",
152
+ "<issue_closed>",
153
+ "<jupyter_start>",
154
+ "<jupyter_text>",
155
+ "<jupyter_code>",
156
+ "<jupyter_output>",
157
+ "<jupyter_script>",
158
+ "<empty_output>"
159
+ ],
160
+ "bos_token": "<|endoftext|>",
161
+ "clean_up_tokenization_spaces": false,
162
+ "eos_token": "<|endoftext|>",
163
+ "model_max_length": 1000000000000000019884624838656,
164
+ "tokenizer_class": "GPT2Tokenizer",
165
+ "unk_token": "<|endoftext|>",
166
+ "vocab_size": 49152
167
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff