<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Vietnamese NLP: POS Tagging Benchmarks</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
body { font-family: 'Segoe UI', Arial, sans-serif; margin: 0; background: #f6faff; color: #222; }
.container { max-width: 980px; margin: 40px auto; padding: 20px 28px; background: #fff; border-radius: 16px; box-shadow: 0 2px 12px #0001;}
h1 { color: #154e9e; font-size: 2.2rem; margin-bottom: 0.25em;}
h2 { color: #198754; border-left: 5px solid #b3d1ff; padding-left: 10px;}
h3 { color: #212529; margin-top: 2em;}
table { width: 100%; border-collapse: collapse; margin-top: 16px; margin-bottom: 20px; }
th, td { padding: 8px 12px; text-align: left; border-bottom: 1px solid #eee; }
th { background: #eaf1fb; font-weight: bold; }
tr:hover { background: #f5faff;}
a { color: #2766cc; text-decoration: none; }
a:hover { text-decoration: underline; }
.note { color: #444; background: #f3f8ff; border-left: 4px solid #85b9ff; padding: 7px 18px; margin: 15px 0;}
.icon { font-size: 1.1em; margin-right: 6px; }
.section { margin-bottom: 2.2em; }
.papers-list, .tools-list { margin: 0 0 1.5em 0; padding: 0; list-style: none;}
.papers-list li, .tools-list li { margin: 0.3em 0;}
.tools-list code { background: #e0e6ed; border-radius: 4px; padding: 1px 4px; }
@media (max-width: 700px) {
.container { padding: 8px;}
table, th, td { font-size: 15px;}
}
</style>
</head>
<body>
<div class="container">
<h1>📝 Vietnamese NLP – POS Tagging Benchmarks & Resources</h1>
<div class="section">
<h2>1. VLSP 2013 POS Tagging</h2>
<div class="note">
<span class="icon">📊</span>
<b>Dataset:</b> 27,000+ training sentences, 870 development sentences, and 2,120 test sentences (from the VLSP 2013 shared task)
</div>
<table>
<tr>
<th>Model</th>
<th>Accuracy (%)</th>
<th>Method / Reference</th>
<th>Code</th>
</tr>
<tr>
<td>PhoBERT-large</td>
<td>96.8</td>
<td><a href="https://arxiv.org/abs/2003.00744">Nguyen et al. ArXiv'20</a></td>
<td><a href="https://github.com/VinAIResearch/PhoBERT">Official</a></td>
</tr>
<tr>
<td>vELECTRA</td>
<td>96.77</td>
<td><a href="https://arxiv.org/abs/2006.15994">Bui et al. ArXiv'20</a></td>
<td><a href="https://github.com/fpt-corp/viBERT">Official</a></td>
</tr>
<tr>
<td>PhoBERT-base</td>
<td>96.7</td>
<td><a href="https://arxiv.org/abs/2003.00744">Nguyen et al. ArXiv'20</a></td>
<td><a href="https://github.com/VinAIResearch/PhoBERT">Official</a></td>
</tr>
<tr>
<td>VnMarMoT</td>
<td>95.88</td>
<td><a href="http://aclweb.org/anthology/N18-5012">Nguyen et al. NAACL'18</a></td>
<td><a href="https://github.com/vncorenlp/VnCoreNLP">Official</a></td>
</tr>
<tr>
<td>BiLSTM-CRFs + CNN-char</td>
<td>95.40</td>
<td><a href="http://www.aclweb.org/anthology/P16-1101">Ma and Hovy ACL'16</a></td>
<td><a href="https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf/">Link</a></td>
</tr>
<tr>
<td>BiLSTM-CRF + LSTM-char</td>
<td>95.31</td>
<td><a href="http://www.aclweb.org/anthology/N16-1030">Lample et al. NAACL'16</a></td>
<td><a href="https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf/">Link</a></td>
</tr>
<tr>
<td>BiLSTM-CRF</td>
<td>95.31</td>
<td><a href="https://arxiv.org/abs/1508.01991">Huang et al. ArXiv'15</a></td>
<td><a href="https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf/">Link</a></td>
</tr>
<tr>
<td>RDRPOSTagger</td>
<td>95.11</td>
<td><a href="https://www.researchgate.net/publication/279916333_RDRPOSTagger_A_Ripple_Down_Rules-based_Part-Of-Speech_Tagger">Nguyen et al. EACL'14</a></td>
<td><a href="https://github.com/datquocnguyen/rdrpostagger">Official</a></td>
</tr>
<tr>
<td>JointWPD</td>
<td>94.03</td>
<td><a href="https://arxiv.org/pdf/1812.11459.pdf">Nguyen et al. '18</a></td>
<td></td>
</tr>
</table>
</div>
<div class="section">
<h2>2. VietTreeBank</h2>
<div class="note">
<span class="icon">📝</span>
<b>Paper:</b> <a href="https://hal.inria.fr/inria-00421103v2/document">VietTreeBank Paper</a> <br>
<b>Dataset:</b> train: 7,268 | dev: 1,038 | test: 2,077 sentences
</div>
<table>
<tr>
<th>Model</th>
<th>Accuracy (%)</th>
<th>Method / Reference</th>
<th>Code</th>
<th>Note</th>
</tr>
<tr>
<td>BiLSTM-CRFs</td>
<td>93.52</td>
<td><a href="https://arxiv.org/pdf/1811.03754.pdf">Nguyen et al. '18</a></td>
<td><a href="https://github.com/duongna21/VNsequencelabeling">Official</a></td>
<td>10-fold CV</td>
</tr>
<tr>
<td>VNTagger</td>
<td>93.40</td>
<td><a href="https://hal.inria.fr/inria-00526139/document">Le et al. TALN'10</a></td>
<td><a href="http://mim.hus.vnu.edu.vn/dsl/tools/tagger">Official</a></td>
<td>10-fold CV</td>
</tr>
<tr>
<td>RDRPOSTagger</td>
<td>91.96</td>
<td><a href="http://aclweb.org/anthology/I17-3010">Pham et al. IJCNLP'17</a></td>
<td><a href="https://github.com/datquocnguyen/RDRPOSTagger">Official</a></td>
<td>5-fold CV</td>
</tr>
<tr>
<td>NNVLP</td>
<td>91.92</td>
<td><a href="http://aclweb.org/anthology/I17-3010">Pham et al. IJCNLP'17</a></td>
<td><a href="https://github.com/pth1993/NNVLP">Official</a></td>
<td>5-fold CV</td>
</tr>
<tr>
<td>vTools</td>
<td>90.73</td>
<td><a href="https://drive.google.com/file/d/1V06YfENrguQk2SRJFbpwWzapxpgPPaPS/view?usp=sharing">Tran et al. VLSP'13</a></td>
<td><a href="https://github.com/lupanh/vTools">Official</a></td>
<td></td>
</tr>
<tr>
<td>Vitk</td>
<td>88.41</td>
<td></td>
<td><a href="https://github.com/phuonglh/vn.vitk">Official</a></td>
<td></td>
</tr>
</table>
</div>
<div class="section">
<h2>3. Social Media POS Tagging</h2>
<ul class="papers-list">
<li>📄 <a href="https://www.researchgate.net/publication/309176280_Vietnamese_POS_Tagging_for_Social_Media_Text">Vietnamese POS Tagging for Social Media Text - Ngo et al. 2016</a></li>
<li>📄 <a href="https://www.researchgate.net/publication/335361630_A_POS_Tagging_Model_for_Vietnamese_Social_Media_Text_Using_BiLSTM-CRF_with_Rich_Features">A POS Tagging Model for Vietnamese Social Media Text Using BiLSTM-CRF with Rich Features - Ngo et al. 2019</a></li>
<li>📄 <a href="https://www.researchgate.net/publication/321940724_An_Empirical_Study_on_POS_Tagging_for_Vietnamese_Social_Media_Text">An Empirical Study on POS Tagging for Vietnamese Social Media Text - Ngo et al. 2017</a></li>
</ul>
</div>
<div class="section">
<h2>4. Miscellaneous Papers & Datasets</h2>
<ul class="papers-list">
<li>📄 <a href="https://drive.google.com/file/d/1V6zFx7p-tLV6ZRiyLhVvbjI12PKyQnmF/view?usp=sharing">Nguyen et al. NICS'18 – Building Vietnamese Linguistic Resources for Social Network Text Analysis</a></li>
<li>📄 <a href="https://arxiv.org/pdf/1711.04951.pdf">Nguyen et al. ALTA'17</a></li>
<li>📄 <a href="https://arxiv.org/pdf/1412.4021.pdf">Nguyen et al. 2015</a></li>
<li>📄 <a href="http://www.aclweb.org/anthology/E14-2005">Nguyen et al. 2014</a></li>
<li>📄 <a href="https://link.springer.com/chapter/10.1007/978-3-642-19400-9_15">Nguyen et al. 2011</a></li>
<li>📄 <a href="http://ieeexplore.ieee.org/document/6063458/?reload=true">Nguyen et al. 2011</a></li>
<li>📄 <a href="http://www.aclweb.org/anthology/I11-1035">Nguyen et al. 2010</a></li>
<li>📄 <a href="https://www.researchgate.net/publication/309176280_Vietnamese_POS_Tagging_for_Social_Media_Text">Ngo et al. 2016</a></li>
<li>📄 <a href="http://www.jaist.ac.jp/~bao/VLSP-text/ICTrda08/ICT08-VLSP-SP83.pdf">Phan et al. 2008</a></li>
<li>📄 <a href="http://www.vnulib.edu.vn:8000/dspace/bitstream/123456789/1801/1/sedev0206-02.pdf">Nguyen et al. 2006</a></li>
<li>📄 <a href="http://www.vietlex.com/xu-li-ngon-ngu/50-A_Case_Study_in_POS_Tagging_of_Vietnamese_Texts">Nguyen et al. 2003</a></li>
</ul>
</div>
<div class="section">
<h2>5. Tools, Demos & Open Source Code</h2>
<ul class="tools-list">
<li>🔗 <a href="http://doc.openfpt.vn/#vietnamese-accentizer">OpenFPT: Vietnamese Accentizer</a></li>
<li>🔗 <a href="https://github.com/vncorenlp/VnCoreNLP">vncorenlp/VnCoreNLP</a> <code>java</code></li>
<li>🔗 <a href="https://github.com/pth1993/NNVLP">pth1993/NNVLP</a> <code>python,bash</code></li>
<li>🔗 <a href="https://pypi.python.org/pypi/pyvi">pyvi</a> <code>python</code></li>
<li>🔗 <a href="https://github.com/phuonglh/vn.vitk">Vitk</a> <code>java</code></li>
<li>🔗 <a href="https://github.com/kanjirz50/viet-morphological-analysis-crf">viet-morphological-analysis-crf</a> <code>python</code> (<a href="http://160.16.58.116/vietnamese/morph_crf">demo</a>)</li>
<li>🔗 <a href="https://github.com/lupanh/vTools">lupanh/vTools</a> <code>python</code></li>
<li>🔗 <a href="https://github.com/truongdo/vita">truongdo/vita</a> <code>c++</code></li>
<li>🔗 <a href="http://rdrpostagger.sourceforge.net/">RDRPOSTagger</a> <code>python</code></li>
<li>🔗 <a href="http://vlsp.hpda.vn:8080/demo/?page=resources">vnTagger</a> <code>java</code></li>
</ul>
</div>
</div>
</body>
</html>