# Experiments on Postagger
Detailed experiment results for the Postagger are recorded below:
## Dataset
We use PKU-WEIBO as the training, development, and test data. Details are as follows:
Dataset | Sentences | Word-tag pairs (tokens) |
---|---|---|
Training | 337,422 | 7,720,621 |
Development | 8,000 | 172,054 |
Test | 12,500 | 271,786 |
## Model Structure Info
KEY | VALUE | Notes |
---|---|---|
word dict size | 176,047 | built from training data |
word embedding dimension | 50 | varies across experiments |
BI-LSTM hidden layer dimension | 100 | - |
BI-LSTM stacked layers number | 1 | - |
tag hidden dimension | 32 | - |
tag output dimension (tag number) | 28 | built from training data |
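For reference, here is a minimal PyTorch sketch of a tagger with the dimensions above. It is a hypothetical reconstruction from the table, not the project's actual code; in particular, treating the 100-dim BI-LSTM hidden size as per-direction is an assumption.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Sketch matching the table: 50-dim word embeddings, one BiLSTM layer
    (hidden size 100 per direction, assumed), a 32-dim tag hidden layer,
    and 28 output tags."""

    def __init__(self, vocab_size=176047, emb_dim=50,
                 lstm_hidden=100, tag_hidden=32, n_tags=28):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, lstm_hidden, num_layers=1,
                              bidirectional=True, batch_first=True)
        self.tag_hidden = nn.Linear(2 * lstm_hidden, tag_hidden)
        self.tag_out = nn.Linear(tag_hidden, n_tags)

    def forward(self, word_ids):
        # word_ids: (batch, seq_len) indices into the word dict
        emb = self.embedding(word_ids)        # (batch, seq_len, 50)
        lstm_out, _ = self.bilstm(emb)        # (batch, seq_len, 200)
        hidden = torch.tanh(self.tag_hidden(lstm_out))
        return self.tag_out(hidden)           # (batch, seq_len, 28) tag scores
```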
## Experiments
### Word Embedding with Randomized Initialization
#### Train
Training Epoch | Training Acc | Validation Acc | Speed (seconds / 10k samples) | Memory cost |
---|---|---|---|---|
1 | 94.07% | 94.75% | 120.71* | - |
2 | 96.78% | 95.44% | 119.58 | - |
3 | 97.32% | 95.62% | 117.39 | - |
4 | 97.62% | 95.96% | 115.05 | - |
5 | 97.82% | 96.04% | 116.06 | - |
6 | 97.95% | 96.02% | 113.92 | - |
7 | 98.06% | 96.12% | 114.01 | - |
* Running on Node01
#### Test
ACC = 96.0925 %
Using the model that scored best on the development set, we also evaluated accuracy on the related datasets:
@PKU
develop acc : 97.8118 %
test acc : 97.8933 %
@WEIBO
develop acc : 92.7685 %
test acc : 92.8434 %
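These per-dataset numbers come from one token-accuracy pass per dataset; a minimal sketch of such an evaluation (the `token_accuracy` helper is illustrative, not the project's actual code):

```python
import torch

def token_accuracy(model, dataset):
    """Token-level tagging accuracy: correct word-tag pairs / total tokens."""
    correct = total = 0
    model.eval()
    with torch.no_grad():
        for word_ids, gold_tags in dataset:   # one (sentence, tags) pair per item
            pred = model(word_ids.unsqueeze(0)).argmax(dim=-1).squeeze(0)
            correct += (pred == gold_tags).sum().item()
            total += gold_tags.numel()
    return 100.0 * correct / total
```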
### Word Embedding Loaded from gigawords
#### Train
Hit Rate
gigawords embedding number | model word dict size | hit rate (loaded successfully) |
---|---|---|
335,696(0.34 M) | 176,047(0.18M) | 95,359/176,047(54.17%) |
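Loading with hit-rate counting typically works along these lines. This is a sketch under assumptions, not the project's loader: the embedding file is assumed to hold one word plus 50 floats per line, and un-hit words keep a small uniform random initialization; `load_pretrained` and `word2id` are illustrative names.

```python
import numpy as np

def load_pretrained(path, word2id, emb_dim=50, scale=0.1):
    """Fill an embedding matrix from a pre-trained file; rows for words
    missing from the file ("un-hit") keep random initialization."""
    rng = np.random.RandomState(1234)
    emb = rng.uniform(-scale, scale, (len(word2id), emb_dim)).astype("float32")
    hits = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split()
            if len(parts) != emb_dim + 1:
                continue                      # skip header / malformed lines
            word = parts[0]
            if word in word2id:
                emb[word2id[word]] = np.asarray(parts[1:], dtype="float32")
                hits += 1
    print(f"hit rate: {hits}/{len(word2id)} ({100.0 * hits / len(word2id):.2f}%)")
    return emb
```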
Result
Training Epoch | Training Acc | Validation Acc | Speed (seconds / 10k samples) | Memory cost |
---|---|---|---|---|
1 | 93.97% | 93.60% | 94.01* | - |
2 | 95.65% | 94.45% | 98.13 | - |
3 | 96.23% | 94.67% | 111.09 | - |
4 | 95.56% | 95.04% | 102.25 | - |
5 | 96.80% | 95.20% | 102.04 | - |
6 | 96.95% | 95.16% | 101.56 | - |
7 | 97.07% | 95.28% | 100.76 | - |
* Running on Node05
#### Test
ACC = 95.3254 %
#### Additional Tests
@PKU
devel acc : 97.1792 %
test acc : 97.2476 %
@WEIBO
devel acc : 91.5375 %
test acc : 91.8570 %
#### Analysis
Accuracy is consistently lower than with randomized initialization.
Perhaps the roughly 54% embedding hit rate causes this: combining pre-trained vectors for hit words with randomly initialized vectors for the rest may put the two groups on inconsistent scales, and this conflict could hurt training.
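One way to test this hypothesis would be to compare the scale of loaded versus randomly initialized vectors. A sketch, assuming a boolean `hit_mask` over the embedding rows (the loader sketched above could be extended to return one):

```python
import numpy as np

def compare_scales(emb, hit_mask):
    """If pre-trained and randomly initialized vectors live on very different
    scales, the BiLSTM sees inconsistent inputs for hit vs. un-hit words."""
    hit_norm = np.linalg.norm(emb[hit_mask], axis=1).mean()
    miss_norm = np.linalg.norm(emb[~hit_mask], axis=1).mean()
    print(f"mean L2 norm  hit: {hit_norm:.4f}  un-hit: {miss_norm:.4f}")
```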
### Word Embedding Loaded from sogou-news
#### Train
Hit Rate
sogou-news embedding number | model word dict size | hit rate (loaded successfully) |
---|---|---|
1,354,247(1.35 M) | 176,047(0.18M) | 119,245/176,047(67.73%) |
Result
Training Epoch | Training Acc | Validation Acc | Speed (seconds / 10k samples) | Memory cost |
---|---|---|---|---|
1 | 93.70% | 93.02% | 177.61* | - |
2 | 95.14% | 93.81% | 202.71 | - |
3 | 95.68% | 94.17% | 213.92 | - |
4 | 96.03% | 94.41% | 215.28 | - |
5 | 96.27% | 94.52% | 220.29 | - |
6 | 96.46% | 94.55% | 220.76 | - |
7 | 96.61% | **94.68%** | 215.25 | - |
* Running on Node06
#### Test
ACC = 94.6491 %
#### Additional Tests
@PKU
devel acc : 96.7085 %
test acc : 96.7084 %
@WEIBO
devel acc : 90.6736 %
test acc : 90.9336 %
#### Analysis
Training on Node06 is very slow, and the accuracy is the worst of the three runs.
## Result Summary
word embedding initialization method | hit rate | train acc* | best devel acc (@PKU-WEIBO) | test acc (@PKU-WEIBO) | devel acc (@PKU) | test acc (@PKU) | devel acc (@WEIBO) | test acc (@WEIBO) | speed (s/10k) |
---|---|---|---|---|---|---|---|---|---|
random | - | 98.06% | 96.12% | 96.10% | 97.8118% | 97.8933% | 92.7685% | 92.8434% | 114.01 (CPU@node01) |
gigawords | 54.17% | 97.07% | 95.28% | 95.33% | 97.1792% | 97.2476% | 91.5375% | 91.8570% | 100.76 (CPU@node05) |
sogou-news | 67.73% | 96.61% | 94.68% | 94.65% | 96.7085% | 96.7084% | 90.6736% | 90.9336% | 215.25 (CPU@node06) |
* The `train acc` is taken from the epoch where validation achieved its best result, and so are `test acc` and `speed`.
Results on devel and test at PKU and WEIBO keep 4 digits after the decimal point, because LTP uses this precision.
### Next
- More epochs should be run.
- Finer-grained validation (for example, not only at every epoch, but after every 50,000 training samples; see the sketch after this list).
- Collect the un-hit words and decide how to handle them, to reduce the accuracy drop when loading outer word embeddings.
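A sketch of the finer-grained validation idea from the second item above, assuming the `token_accuracy` helper sketched earlier plus hypothetical `train_step` and `save_checkpoint` helpers:

```python
def train(model, train_data, dev_data, epochs=7, eval_every=50_000):
    """Validate every `eval_every` training samples instead of only once per
    epoch, keeping the checkpoint with the best development accuracy."""
    best_dev, seen = 0.0, 0
    for epoch in range(epochs):
        for batch in train_data:
            train_step(model, batch)        # hypothetical: one optimizer step
            seen += len(batch)
            if seen >= eval_every:
                seen = 0
                dev_acc = token_accuracy(model, dev_data)
                if dev_acc > best_dev:
                    best_dev = dev_acc
                    save_checkpoint(model)  # hypothetical: persist best model
    return best_dev
```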