<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Vietnamese NLP: POS Tagging Benchmarks</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
body { font-family: 'Segoe UI', Arial, sans-serif; margin: 0; background: #f6faff; color: #222; }
.container { max-width: 980px; margin: 40px auto; padding: 20px 28px; background: #fff; border-radius: 16px; box-shadow: 0 2px 12px #0001;}
h1 { color: #154e9e; font-size: 2.2rem; margin-bottom: 0.25em;}
h2 { color: #198754; border-left: 5px solid #b3d1ff; padding-left: 10px;}
h3 { color: #212529; margin-top: 2em;}
table { width: 100%; border-collapse: collapse; margin-top: 16px; margin-bottom: 20px; }
th, td { padding: 8px 12px; text-align: left; border-bottom: 1px solid #eee; }
th { background: #eaf1fb; font-weight: bold; }
tr:hover { background: #f5faff;}
a { color: #2766cc; text-decoration: none; }
a:hover { text-decoration: underline; }
.note { color: #444; background: #f3f8ff; border-left: 4px solid #85b9ff; padding: 7px 18px; margin: 15px 0;}
.icon { font-size: 1.1em; margin-right: 6px; }
.section { margin-bottom: 2.2em; }
.papers-list, .tools-list { margin: 0 0 1.5em 0; padding: 0; list-style: none;}
.papers-list li, .tools-list li { margin: 0.3em 0;}
.tools-list code { background: #e0e6ed; border-radius: 4px; padding: 1px 4px; }
@media (max-width: 700px) {
.container { padding: 8px;}
table, th, td { font-size: 15px;}
}
</style>
</head>
<body>
<div class="container">
<h1>Vietnamese NLP: POS Tagging Benchmarks &amp; Resources</h1>

<div class="section">
<h2>1. VLSP 2013 POS Tagging</h2>
<div class="note">
<b>Dataset:</b> VLSP 2013 POS tagging shared task; 27,000+ training, 870 development and 2,120 test sentences.
</div>
<table>
<tr>
<th>Model</th>
<th>Accuracy (%)</th>
<th>Reference</th>
<th>Code</th>
</tr>
<tr>
<td>PhoBERT-large</td>
<td>96.8</td>
<td><a href="https://arxiv.org/abs/2003.00744">Nguyen and Nguyen arXiv'20</a></td>
<td><a href="https://github.com/VinAIResearch/PhoBERT">Official</a></td>
</tr>
<tr>
<td>vELECTRA</td>
<td>96.77</td>
<td><a href="https://arxiv.org/abs/2006.15994">Bui et al. arXiv'20</a></td>
<td><a href="https://github.com/fpt-corp/viBERT">Official</a></td>
</tr>
<tr>
<td>PhoBERT-base</td>
<td>96.7</td>
<td><a href="https://arxiv.org/abs/2003.00744">Nguyen and Nguyen arXiv'20</a></td>
<td><a href="https://github.com/VinAIResearch/PhoBERT">Official</a></td>
</tr>
<tr>
<td>VnMarMoT</td>
<td>95.88</td>
<td><a href="http://aclweb.org/anthology/N18-5012">Vu et al. NAACL'18</a></td>
<td><a href="https://github.com/vncorenlp/VnCoreNLP">Official</a></td>
</tr>
<tr>
<td>BiLSTM-CRF + CNN-char</td>
<td>95.40</td>
<td><a href="http://www.aclweb.org/anthology/P16-1101">Ma and Hovy ACL'16</a></td>
<td><a href="https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf/">Link</a></td>
</tr>
<tr>
<td>BiLSTM-CRF + LSTM-char</td>
<td>95.31</td>
<td><a href="http://www.aclweb.org/anthology/N16-1030">Lample et al. NAACL'16</a></td>
<td><a href="https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf/">Link</a></td>
</tr>
<tr>
<td>BiLSTM-CRF</td>
<td>95.31</td>
<td><a href="https://arxiv.org/abs/1508.01991">Huang et al. arXiv'15</a></td>
<td><a href="https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf/">Link</a></td>
</tr>
<tr>
<td>RDRPOSTagger</td>
<td>95.11</td>
<td><a href="https://www.researchgate.net/publication/279916333_RDRPOSTagger_A_Ripple_Down_Rules-based_Part-Of-Speech_Tagger">Nguyen et al. EACL'14</a></td>
<td><a href="https://github.com/datquocnguyen/rdrpostagger">Official</a></td>
</tr>
<tr>
<td>JointWPD</td>
<td>94.03</td>
<td><a href="https://arxiv.org/pdf/1812.11459.pdf">Nguyen et al. '18</a></td>
<td></td>
</tr>
</table>
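<p>By the usual convention for POS tagging, the accuracy above is token-level: it is the share of word tokens whose predicted tag matches the gold tag. The Python sketch below shows that computation. It is not the official VLSP scorer; the function name <code>token_accuracy</code> and the toy tags are hypothetical. The sketch assumes gold word segmentation, so each predicted sentence has exactly as many tags as its gold counterpart.</p>
<pre><code class="language-python">
```python
def token_accuracy(gold_sents, pred_sents):
    """Return token-level tagging accuracy as a percentage.

    Each argument is a list of sentences, and each sentence is a list of
    POS tags aligned one-to-one with the word-segmented tokens.
    """
    if len(gold_sents) != len(pred_sents):
        raise ValueError("gold and predicted corpora differ in sentence count")
    correct = 0
    total = 0
    for gold, pred in zip(gold_sents, pred_sents):
        if len(gold) != len(pred):
            raise ValueError("a predicted sentence has the wrong number of tags")
        # Count positions where the predicted tag matches the gold tag.
        correct += sum(1 for g, p in zip(gold, pred) if g == p)
        total += len(gold)
    if total == 0:
        raise ValueError("no tokens to score")
    return 100.0 * correct / total


# Toy, hypothetical tag sequences: 4 of 5 tokens are tagged correctly.
gold = [["N", "V", "N"], ["P", "V"]]
pred = [["N", "V", "A"], ["P", "V"]]
print(token_accuracy(gold, pred))  # 80.0
```
</code></pre>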
</div>

<div class="section">
<h2>2. VietTreeBank</h2>
<div class="note">
<b>Paper:</b> <a href="https://hal.inria.fr/inria-00421103v2/document">VietTreeBank paper</a><br>
<b>Dataset:</b> 7,268 training, 1,038 development and 2,077 test sentences. Scores evaluated with k-fold cross-validation (see the Evaluation column) are not directly comparable with scores on this fixed split.
</div>
<table>
<tr>
<th>Model</th>
<th>Accuracy (%)</th>
<th>Reference</th>
<th>Code</th>
<th>Evaluation</th>
</tr>
<tr>
<td>BiLSTM-CRF</td>
<td>93.52</td>
<td><a href="https://arxiv.org/pdf/1811.03754.pdf">Nguyen et al. '18</a></td>
<td><a href="https://github.com/duongna21/VNsequencelabeling">Official</a></td>
<td>10-fold CV</td>
</tr>
<tr>
<td>VNTagger</td>
<td>93.40</td>
<td><a href="https://hal.inria.fr/inria-00526139/document">Le et al. TALN'10</a></td>
<td><a href="http://mim.hus.vnu.edu.vn/dsl/tools/tagger">Official</a></td>
<td>10-fold CV</td>
</tr>
<tr>
<td>RDRPOSTagger</td>
<td>91.96</td>
<td><a href="http://aclweb.org/anthology/I17-3010">Pham et al. IJCNLP'17</a></td>
<td><a href="https://github.com/datquocnguyen/RDRPOSTagger">Official</a></td>
<td>5-fold CV</td>
</tr>
<tr>
<td>NNVLP</td>
<td>91.92</td>
<td><a href="http://aclweb.org/anthology/I17-3010">Pham et al. IJCNLP'17</a></td>
<td><a href="https://github.com/pth1993/NNVLP">Official</a></td>
<td>5-fold CV</td>
</tr>
<tr>
<td>vTools</td>
<td>90.73</td>
<td><a href="https://drive.google.com/file/d/1V06YfENrguQk2SRJFbpwWzapxpgPPaPS/view?usp=sharing">Tran et al. VLSP'13</a></td>
<td><a href="https://github.com/lupanh/vTools">Official</a></td>
<td></td>
</tr>
<tr>
<td>Vitk</td>
<td>88.41</td>
<td></td>
<td><a href="https://github.com/phuonglh/vn.vitk">Official</a></td>
<td></td>
</tr>
</table>
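<p>Several VietTreeBank scores above are reported under 5-fold or 10-fold cross-validation. The sketch below illustrates the general protocol and assumes that the reported figure is the mean over folds. The steps are: shuffle the sentences, split them into k folds, train on k-1 folds, and score the held-out fold. The helper names, the fixed shuffling seed and the <code>train_fn</code>/<code>eval_fn</code> callables are hypothetical placeholders. The cited papers may assign folds differently.</p>
<pre><code class="language-python">
```python
import random


def k_fold_splits(sentences, k, seed=0):
    """Yield (train, test) sentence lists, one pair per fold."""
    # k must be between 2 and the number of sentences, inclusive.
    if k not in range(2, len(sentences) + 1):
        raise ValueError("k must be between 2 and len(sentences)")
    order = list(range(len(sentences)))
    random.Random(seed).shuffle(order)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [order[i::k] for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        train = [sentences[j] for j in order if j not in held]
        test = [sentences[j] for j in held_out]
        yield train, test


def cross_validated_accuracy(sentences, k, train_fn, eval_fn, seed=0):
    """Average the held-out accuracy of a tagger over k folds.

    train_fn(train_sentences) returns a model; eval_fn(model, test_sentences)
    returns an accuracy. Both are supplied by the caller.
    """
    scores = []
    for train, test in k_fold_splits(sentences, k, seed):
        model = train_fn(train)
        scores.append(eval_fn(model, test))
    return sum(scores) / len(scores)


# With 10 toy sentences and k=5, every fold trains on 8 and tests on 2.
toy = [f"sentence {i}" for i in range(10)]
for train, test in k_fold_splits(toy, k=5):
    print(len(train), len(test))  # 8 2 on every fold
```
</code></pre>
<p>Dealing every k-th shuffled index into a fold keeps fold sizes within one sentence of each other. It also guarantees that each sentence is held out exactly once.</p>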
</div>

<div class="section">
<h2>3. Social Media POS Tagging</h2>
<ul class="papers-list">
<li><a href="https://www.researchgate.net/publication/309176280_Vietnamese_POS_Tagging_for_Social_Media_Text">Vietnamese POS Tagging for Social Media Text - Ngo et al. 2016</a></li>
<li><a href="https://www.researchgate.net/publication/335361630_A_POS_Tagging_Model_for_Vietnamese_Social_Media_Text_Using_BiLSTM-CRF_with_Rich_Features">A POS Tagging Model for Vietnamese Social Media Text Using BiLSTM-CRF with Rich Features - Ngo et al. 2019</a></li>
<li><a href="https://www.researchgate.net/publication/321940724_An_Empirical_Study_on_POS_Tagging_for_Vietnamese_Social_Media_Text">An Empirical Study on POS Tagging for Vietnamese Social Media Text - Ngo et al. 2017</a></li>
</ul>
</div>

<div class="section">
<h2>4. Miscellaneous Papers &amp; Datasets</h2>
<ul class="papers-list">
<li><a href="https://drive.google.com/file/d/1V6zFx7p-tLV6ZRiyLhVvbjI12PKyQnmF/view?usp=sharing">Nguyen et al. NICS'18: Building Vietnamese Linguistic Resources for Social Network Text Analysis</a></li>
<li><a href="https://arxiv.org/pdf/1711.04951.pdf">Nguyen et al. ALTA'17</a></li>
<li><a href="https://arxiv.org/pdf/1412.4021.pdf">Nguyen et al. 2015</a></li>
<li><a href="http://www.aclweb.org/anthology/E14-2005">Nguyen et al. 2014</a></li>
<li><a href="https://link.springer.com/chapter/10.1007/978-3-642-19400-9_15">Nguyen et al. 2011</a></li>
<li><a href="http://ieeexplore.ieee.org/document/6063458/?reload=true">Nguyen et al. 2011</a></li>
<li><a href="http://www.aclweb.org/anthology/I11-1035">Nguyen et al. 2010</a></li>
<li><a href="http://www.jaist.ac.jp/~bao/VLSP-text/ICTrda08/ICT08-VLSP-SP83.pdf">Phan et al. 2008</a></li>
<li><a href="http://www.vnulib.edu.vn:8000/dspace/bitstream/123456789/1801/1/sedev0206-02.pdf">Nguyen et al. 2006</a></li>
<li><a href="http://www.vietlex.com/xu-li-ngon-ngu/50-A_Case_Study_in_POS_Tagging_of_Vietnamese_Texts">Nguyen et al. 2003</a></li>
</ul>
</div>

<div class="section">
<h2>5. Tools, Demos &amp; Open Source Code</h2>
<ul class="tools-list">
<li><a href="http://doc.openfpt.vn/#vietnamese-accentizer">OpenFPT: Vietnamese Accentizer</a></li>
<li><a href="https://github.com/vncorenlp/VnCoreNLP">vncorenlp/VnCoreNLP</a> <code>java</code></li>
<li><a href="https://github.com/pth1993/NNVLP">pth1993/NNVLP</a> <code>python, bash</code></li>
<li><a href="https://pypi.python.org/pypi/pyvi">pyvi</a> <code>python</code></li>
<li><a href="https://github.com/phuonglh/vn.vitk">Vitk</a> <code>java</code></li>
<li><a href="https://github.com/kanjirz50/viet-morphological-analysis-crf">viet-morphological-analysis-crf</a> <code>python</code> (<a href="http://160.16.58.116/vietnamese/morph_crf">demo</a>)</li>
<li><a href="https://github.com/lupanh/vTools">lupanh/vTools</a> <code>python</code></li>
<li><a href="https://github.com/truongdo/vita">truongdo/vita</a> <code>c++</code></li>
<li><a href="http://rdrpostagger.sourceforge.net/">RDRPOSTagger</a> <code>python</code></li>
<li><a href="http://vlsp.hpda.vn:8080/demo/?page=resources">vnTagger</a> <code>java</code></li>
</ul>
</div>
</div>
</body>
</html>