mali90 committed on
Commit 42c2c6f · verified · 1 Parent(s): edb7a23

Update index.html

Files changed (1)
  1. index.html +46 -0
index.html CHANGED
@@ -37,6 +37,52 @@
  </div>
  </section>
 
+ <section class="section">
+ <div class="container">
+ <h2 class="title is-3">📊 Results</h2>
+ <div class="highlight-box">
+ <p><strong>✔️ Accuracy</strong></p>
+ <ul>
+ <li>Spearman’s ρ > 0.87 with human ground truth</li>
+ </ul>
+ </div>
+ <div class="highlight-box">
+ <p><strong>📈 Downstream LLM Training Impact</strong></p>
+ <ul>
+ <li>+7.2% benchmark performance improvement</li>
+ <li>+4.8% token retention compared to FineWeb2 heuristic filter</li>
+ <li>Reliable thresholding with 0.6 and 0.7 quantiles</li>
+ </ul>
+ </div>
+ <div class="highlight-box">
+ <p><strong>⚡ Annotation Speed</strong></p>
+ <ul>
+ <li>~11,000 docs/min (on A100 GPU, avg. 690 tokens per doc)</li>
+ </ul>
+ </div>
+ </div>
+ </section>
+
+ <section class="section">
+ <div class="container">
+ <h2 class="title is-3">📁 Available Artifacts</h2>
+ <div class="highlight-box">
+ <ul>
+ <li>📄 Ground truth annotations in <strong>35 languages</strong></li>
+ <li>🧠 Synthetic LLM-annotated dataset (<strong>14M+ documents</strong>)</li>
+ <li>🪶 Lightweight annotation models:
+ <ul>
+ <li>JQL-Gemma</li>
+ <li>JQL-Mistral</li>
+ <li>JQL-Llama</li>
+ </ul>
+ </li>
+ <li>🛠️ Training & inference scripts <em>(coming soon)</em></li>
+ </ul>
+ </div>
+ </div>
+ </section>
+
  <section class="section">
  <div class="container content">
  <h2 class="title is-3">🧩 Main Pipeline Steps</h2>