POSTAGGER BASELINE (using LTP)

We use the LTP POS tagger results as the baseline.
### Accuracy
dataset | accuracy | sentences | tokens | time cost (s) |
---|---|---|---|---|
pku-weibo-holdout | 96.7452% | 8,000 | 172,054 | 1.48 |
pku-weibo-test | 96.7364% | 12,500 | - | 2.34 |
pku-holdout | 98.3586% | 5,000 | 114,293 | 1.05 |
pku-test | 98.3456% | 7,500 | - | 1.58 |
weibo-holdout | 93.5527% | 3,000 | 57,761 | 0.57 |
weibo-test | 93.8329% | 5,000 | - | 0.91 |

Evaluated with the existing LTP model on the fixed gold data.

Updated: results use LTP model 3.3.1 with `otpos` (single process), run on node GPU05 (CPU: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz, 48 processors; total memory: 251.64 GB).
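The accuracy figures above are token-level. A minimal Python sketch of that metric, assuming gold and predicted tags are available as aligned per-sentence tag lists; `token_accuracy` and `pooled_accuracy` are hypothetical helpers, not part of LTP. The second helper shows that the pku-weibo-holdout row is consistent with the token-weighted average of the pku-holdout and weibo-holdout rows (114,293 + 57,761 = 172,054 tokens).

```python
from typing import Sequence, Tuple


def token_accuracy(gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]) -> float:
    """Token-level accuracy in percent over aligned per-sentence tag lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted data have different sentence counts")
    correct = total = 0
    for gold_tags, pred_tags in zip(gold, pred):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("a gold and a predicted sentence differ in length")
        # Count positions where the predicted tag equals the gold tag.
        correct += sum(g == p for g, p in zip(gold_tags, pred_tags))
        total += len(gold_tags)
    if total == 0:
        raise ValueError("no tokens to evaluate")
    return 100.0 * correct / total


def pooled_accuracy(parts: Sequence[Tuple[float, int]]) -> float:
    """Token-weighted accuracy (percent) from (accuracy_percent, token_count) pairs."""
    correct = sum(acc / 100.0 * tokens for acc, tokens in parts)
    total = sum(tokens for _, tokens in parts)
    return 100.0 * correct / total


# pku-holdout and weibo-holdout rows from the table above.
merged = pooled_accuracy([(98.3586, 114_293), (93.5527, 57_761)])
print(f"{merged:.4f}%")  # prints 96.7452%, matching the pku-weibo-holdout row
```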
### Speed
About 118.53 K tokens/s (updated).

Tested on wiki data with 50k sentences and 996,862 tokens in total (same environment as above). The running time was 8.41 s, so the speed is 996,862 / 1000 / 8.41 = 118.53 K tokens/s.
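The speed figure is a token count divided by wall-clock time. A minimal Python sketch of the same arithmetic plus a timing wrapper; `measure_throughput` and its `tag` callable are hypothetical stand-ins, since the figure above was measured with `otpos`, not with this code.

```python
import time
from typing import Callable, List, Sequence


def k_tokens_per_second(num_tokens: int, seconds: float) -> float:
    """Throughput in thousands of tokens per second."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_tokens / 1000 / seconds


def measure_throughput(tag: Callable[[Sequence[List[str]]], object],
                       sentences: Sequence[List[str]]) -> float:
    """Time one call of `tag` over pre-segmented sentences; return K tokens/s."""
    num_tokens = sum(len(words) for words in sentences)
    start = time.perf_counter()
    tag(sentences)  # hypothetical tagger call; its output is discarded
    elapsed = time.perf_counter() - start
    return k_tokens_per_second(num_tokens, elapsed)


# Numbers from the wiki test above: 996,862 tokens in 8.41 s.
print(f"{k_tokens_per_second(996_862, 8.41):.2f} K tokens/s")  # prints 118.53 K tokens/s
```

Timing only the tagging call keeps model loading out of the measurement, unless the callable loads the model itself.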
### Error statistics
#### PKU-holdout
Top 10 tag pairs (true_tag => predicted_tag) among prediction errors:

true_tag => predicted_tag | errors |
---|---|
v=>n | 314 |
n=>v | 219 |
v=>p | 122 |
v=>a | 115 |
p=>v | 95 |
a=>v | 76 |
d=>c | 57 |
c=>p | 56 |
p=>c | 56 |
n=>a | 55 |

Top 10 most frequently mistagged words:

word | errors |
---|---|
与 | 38 |
为 | 35 |
在 | 33 |
又 | 30 |
多 | 26 |
到 | 24 |
将 | 23 |
由于 | 21 |
以 | 19 |
作为 | 19 |

Top 10 word and tag pairs (word:true_tag => predicted_tag) among prediction errors:

word:true_tag => predicted_tag | errors |
---|---|
在:v=>p | 21 |
与:p=>c | 20 |
为:v=>p | 20 |
又:d=>c | 20 |
到:p=>v | 19 |
由于:c=>p | 17 |
与:c=>p | 16 |
为:p=>v | 15 |
以:p=>c | 13 |
和:p=>c | 13 |

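All three lists in each subsection can be derived from one pass over the aligned holdout output. A minimal Python sketch, assuming the tagged output is available as (word, true_tag, predicted_tag) triples; `error_stats` is a hypothetical helper, not part of LTP, and the scripts that produced the numbers on this page are not shown here.

```python
from collections import Counter
from typing import Iterable, Tuple


def error_stats(tokens: Iterable[Tuple[str, str, str]]) -> Tuple[Counter, Counter, Counter]:
    """Count tagging errors from (word, true_tag, predicted_tag) triples.

    Returns three counters keyed like the lists above:
    "true=>pred", "word", and "word:true=>pred".
    """
    tag_pairs: Counter = Counter()
    words: Counter = Counter()
    word_tag_pairs: Counter = Counter()
    for word, true_tag, pred_tag in tokens:
        if true_tag == pred_tag:
            continue  # only mistagged tokens are counted
        tag_pairs[f"{true_tag}=>{pred_tag}"] += 1
        words[word] += 1
        word_tag_pairs[f"{word}:{true_tag}=>{pred_tag}"] += 1
    return tag_pairs, words, word_tag_pairs


sample = [("在", "v", "p"), ("在", "v", "p"), ("与", "p", "c"), ("生活", "n", "n")]
tag_pairs, words, word_tag_pairs = error_stats(sample)
print(tag_pairs.most_common(10))
```

Calling `.most_common(10)` on each returned counter yields lists in the format used above.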
#### Weibo-holdout
Top 10 tag pairs (true_tag => predicted_tag) among prediction errors:

true_tag => predicted_tag | errors |
---|---|
n=>v | 826 |
v=>n | 247 |
d=>a | 220 |
n=>a | 180 |
nz=>n | 95 |
v=>a | 87 |
a=>v | 77 |
nh=>n | 72 |
a=>n | 71 |
d=>v | 71 |

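Top 10 most frequently mistagged words (three words tie at 16, giving 11 entries):

word | errors |
---|---|
生活 | 31 |
因为 | 23 |
哦 | 22 |
工作 | 22 |
多 | 21 |
给 | 21 |
在 | 19 |
好 | 17 |
成功 | 16 |
为 | 16 |
与 | 16 |
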
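Top 10 word and tag pairs (word:true_tag => predicted_tag) among prediction errors (four pairs tie at 9, giving 13 entries):

word:true_tag => predicted_tag | errors |
---|---|
生活:n=>v | 30 |
工作:n=>v | 22 |
因为:c=>p | 18 |
哦:e=>u | 14 |
为:v=>p | 13 |
给:v=>p | 12 |
爱:n=>v | 12 |
用:v=>p | 10 |
评论:n=>v | 10 |
好:d=>a | 9 |
正式:d=>a | 9 |
给:p=>v | 9 |
服务:n=>v | 9 |
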
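The Weibo-holdout word lists contain 11 and 13 entries rather than 10, apparently because every item tied with the 10th count (16 and 9, respectively) is kept. A minimal Python sketch of such a tie-inclusive cutoff, assuming that is how the lists were produced; `top_k_with_ties` is a hypothetical helper. Note that `Counter.most_common(10)` on its own would silently drop tied items.

```python
from collections import Counter
from typing import List, Tuple


def top_k_with_ties(counts: Counter, k: int = 10) -> List[Tuple[str, int]]:
    """Return the k most common items plus any items tied with the k-th count."""
    if k <= 0:
        return []
    ranked = counts.most_common()  # sorted by count, descending
    if len(ranked) <= k:
        return ranked
    cutoff = ranked[k - 1][1]  # count of the k-th item
    return [(item, n) for item, n in ranked if n >= cutoff]


# Word counts from the Weibo-holdout table, plus one hypothetical lower entry "x".
weibo_words = Counter({"生活": 31, "因为": 23, "哦": 22, "工作": 22, "多": 21, "给": 21,
                       "在": 19, "好": 17, "成功": 16, "为": 16, "与": 16, "x": 5})
print(len(top_k_with_ties(weibo_words, 10)))  # prints 11
```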
#### PKU-Weibo-holdout (merged)
Top 10 tag pairs (true_tag => predicted_tag) among prediction errors:

true_tag => predicted_tag | errors |
---|---|
n=>v | 1045 |
v=>n | 561 |
n=>a | 235 |
d=>a | 234 |
v=>a | 202 |
v=>p | 191 |
a=>v | 153 |
p=>v | 135 |
a=>n | 122 |
d=>v | 116 |

Top 10 most frequently mistagged words:

word | errors |
---|---|
与 | 54 |
在 | 52 |
为 | 51 |
多 | 47 |
给 | 39 |
因为 | 37 |
又 | 37 |
将 | 34 |
到 | 33 |
生活 | 31 |

Top 10 word and tag pairs (word:true_tag => predicted_tag) among prediction errors:

word:true_tag => predicted_tag | errors |
---|---|
为:v=>p | 33 |
生活:n=>v | 30 |
与:p=>c | 28 |
因为:c=>p | 28 |
又:d=>c | 27 |
在:v=>p | 24 |
工作:n=>v | 24 |
与:c=>p | 23 |
给:v=>p | 22 |
到:p=>v | 22 |

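Where an item appears in both per-set top-10 lists, the merged count is the sum of the two (for example n=>v: 219 + 826 = 1045; 与: 38 + 16 = 54). This is consistent with the merged holdout being the concatenation of the PKU and Weibo holdout sets, so its error counters can be obtained by adding the per-set counters. A minimal Python sketch using `collections.Counter` addition; only entries visible in both per-set tables are included, since the full counters are not on this page.

```python
from collections import Counter

# Entries that appear in both the PKU-holdout and Weibo-holdout tables above.
pku_tag_pairs = Counter({"v=>n": 314, "n=>v": 219, "v=>a": 115, "a=>v": 76, "n=>a": 55})
weibo_tag_pairs = Counter({"n=>v": 826, "v=>n": 247, "n=>a": 180, "v=>a": 87, "a=>v": 77})
pku_words = Counter({"与": 38, "为": 35, "在": 33, "多": 26})
weibo_words = Counter({"与": 16, "为": 16, "在": 19, "多": 21})

# Counter addition sums counts key by key.
merged_tag_pairs = pku_tag_pairs + weibo_tag_pairs
merged_words = pku_words + weibo_words

for pair, count in merged_tag_pairs.most_common():
    print(pair, count)
```

The resulting counts match the corresponding rows of the merged tables.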
- Not yet updated [TODO]
Neural-network-based sequence labeling tasks - WIKI (for wiki syntax, see gollum)