add decoding method comparison
ChenglongChen committed Jul 16, 2015
1 parent 634802d commit efebfeb
Showing 53 changed files with 168,405 additions and 71 deletions.
46 changes: 25 additions & 21 deletions Doc/Kaggle_CrowdFlower_ChenglongChen.aux
@@ -64,33 +64,37 @@
\@writefile{toc}{\contentsline {subsubsection}{\numberline {4.2.1}Classification}{13}{subsubsection.4.2.1}}
\newlabel{subsubsec:Classification}{{4.2.1}{13}{Classification\relax }{subsubsection.4.2.1}{}}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {4.2.2}Regression}{13}{subsubsection.4.2.2}}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {4.2.3}Pairwise Ranking}{13}{subsubsection.4.2.3}}
\citation{ebc}
\citation{cocr}
\@writefile{lof}{\contentsline {figure}{\numberline {2}{\ignorespaces Histograms of raw prediction and predictions using various decoding methods grouped by true relevance.}}{14}{figure.2}}
\newlabel{fig:MSE_decoding}{{2}{14}{Histograms of raw prediction and predictions using various decoding methods grouped by true relevance}{figure.2}{}}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {4.2.3}Pairwise Ranking}{14}{subsubsection.4.2.3}}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {4.2.4}Ordinal Regression}{14}{subsubsection.4.2.4}}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {4.2.5}Softkappa}{14}{subsubsection.4.2.5}}
\citation{cocr}
\@writefile{lot}{\contentsline {table}{\numberline {6}{\ignorespaces Performance of various decoding methods for MSE objective.}}{15}{table.6}}
\newlabel{tab:MSE_decoding}{{6}{15}{Performance of various decoding methods for MSE objective}{table.6}{}}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {4.2.5}Softkappa}{15}{subsubsection.4.2.5}}
\citation{ensemble_selection}
\citation{hyperopt}
\citation{hyperopt_url}
\@writefile{toc}{\contentsline {subsection}{\numberline {4.3}Sample Weighting}{15}{subsection.4.3}}
\@writefile{toc}{\contentsline {subsection}{\numberline {4.4}Ensemble Selection}{15}{subsection.4.4}}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {4.4.1}Model Library Building via Guided Parameter Searching}{15}{subsubsection.4.4.1}}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {4.4.2}Model Weight Optimization}{15}{subsubsection.4.4.2}}
\@writefile{lot}{\contentsline {table}{\numberline {6}{\ignorespaces Model Library}}{16}{table.6}}
\newlabel{tab:Model_Library}{{6}{16}{Model Library\relax }{table.6}{}}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {4.4.3}Randomized Ensemble Selection}{16}{subsubsection.4.4.3}}
\@writefile{toc}{\contentsline {section}{\numberline {5}Code Description}{16}{section.5}}
\@writefile{toc}{\contentsline {subsection}{\numberline {4.3}Sample Weighting}{16}{subsection.4.3}}
\@writefile{toc}{\contentsline {subsection}{\numberline {4.4}Ensemble Selection}{16}{subsection.4.4}}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {4.4.1}Model Library Building via Guided Parameter Searching}{16}{subsubsection.4.4.1}}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {4.4.2}Model Weight Optimization}{16}{subsubsection.4.4.2}}
\@writefile{lot}{\contentsline {table}{\numberline {7}{\ignorespaces Model Library}}{17}{table.7}}
\newlabel{tab:Model_Library}{{7}{17}{Model Library\relax }{table.7}{}}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {4.4.3}Randomized Ensemble Selection}{17}{subsubsection.4.4.3}}
\@writefile{toc}{\contentsline {section}{\numberline {5}Code Description}{17}{section.5}}
\citation{NLTK_Cookbook}
\@writefile{lof}{\contentsline {figure}{\numberline {2}{\ignorespaces CV mean, Public LB, and Private LB scores of our 35 best Public LB submissions generated with randomized ensemble selection. One standard deviation of the CV score is plotted via error bar.}}{17}{figure.2}}
\newlabel{fig:CV_Public_Private}{{2}{17}{CV mean, Public LB, and Private LB scores of our 35 best Public LB submissions generated with randomized ensemble selection. One standard deviation of the CV score is plotted via error bar}{figure.2}{}}
\@writefile{toc}{\contentsline {subsection}{\numberline {5.1}Setting}{17}{subsection.5.1}}
\@writefile{toc}{\contentsline {subsection}{\numberline {5.2}Feature}{17}{subsection.5.2}}
\@writefile{toc}{\contentsline {subsection}{\numberline {5.3}Model}{18}{subsection.5.3}}
\@writefile{toc}{\contentsline {section}{\numberline {6}Dependencies}{19}{section.6}}
\@writefile{lof}{\contentsline {figure}{\numberline {3}{\ignorespaces CV mean, Public LB, and Private LB scores of our 35 best Public LB submissions generated with randomized ensemble selection. One standard deviation of the CV score is plotted via error bar.}}{18}{figure.3}}
\newlabel{fig:CV_Public_Private}{{3}{18}{CV mean, Public LB, and Private LB scores of our 35 best Public LB submissions generated with randomized ensemble selection. One standard deviation of the CV score is plotted via error bar}{figure.3}{}}
\@writefile{toc}{\contentsline {subsection}{\numberline {5.1}Setting}{18}{subsection.5.1}}
\@writefile{toc}{\contentsline {subsection}{\numberline {5.2}Feature}{18}{subsection.5.2}}
\@writefile{toc}{\contentsline {subsection}{\numberline {5.3}Model}{19}{subsection.5.3}}
\@writefile{toc}{\contentsline {section}{\numberline {6}Dependencies}{20}{section.6}}
\citation{wmd}
\@writefile{toc}{\contentsline {section}{\numberline {7}How To Generate the Solution (aka README file)}{20}{section.7}}
\@writefile{toc}{\contentsline {section}{\numberline {8}Additional Comments and Observations}{20}{section.8}}
\@writefile{toc}{\contentsline {section}{\numberline {9}Simple Features and Methods}{20}{section.9}}
\@writefile{toc}{\contentsline {section}{\numberline {7}How To Generate the Solution (aka README file)}{21}{section.7}}
\@writefile{toc}{\contentsline {section}{\numberline {8}Additional Comments and Observations}{21}{section.8}}
\@writefile{toc}{\contentsline {section}{\numberline {9}Simple Features and Methods}{21}{section.9}}
\bibstyle{plain}
\bibdata{reference}
\bibcite{owen}{1}
@@ -103,4 +107,4 @@
\bibcite{ensemble_selection}{8}
\bibcite{NLTK_Cookbook}{9}
\bibcite{cocr}{10}
\@writefile{toc}{\contentsline {section}{\numberline {10}Acknowledgement}{21}{section.10}}
\@writefile{toc}{\contentsline {section}{\numberline {10}Acknowledgement}{22}{section.10}}
58 changes: 32 additions & 26 deletions Doc/Kaggle_CrowdFlower_ChenglongChen.log
@@ -1,4 +1,4 @@
This is pdfTeX, Version 3.1415926-2.3-1.40.12 (MiKTeX 2.9) (preloaded format=pdflatex 2013.11.4) 13 JUL 2015 07:58
This is pdfTeX, Version 3.1415926-2.3-1.40.12 (MiKTeX 2.9) (preloaded format=pdflatex 2013.11.4) 17 JUL 2015 01:37
entering extended mode
**Kaggle_CrowdFlower_ChenglongChen.tex
(F:\CrowdFolwer\cleanup\Doc\Kaggle_CrowdFlower_ChenglongChen.tex
@@ -813,51 +813,57 @@ Overfull \hbox (5.09244pt too wide) in paragraph at lines 401--402
-tion, e.g., \OT1/cmtt/m/n/10.95 LogisticRegression
[]

[13] [14] [15]
<./compare_MSE_Decoding.pdf, id=359, 666.69077pt x 574.54652pt>
File: ./compare_MSE_Decoding.pdf Graphic file (type pdf)

<use ./compare_MSE_Decoding.pdf>
Package pdftex.def Info: ./compare_MSE_Decoding.pdf used on input line 414.
(pdftex.def) Requested size: 422.77664pt x 364.34204pt.
[13] [14 <F:/CrowdFolwer/cleanup/Doc/compare_MSE_Decoding.pdf>] [15] [16]
PGFPlots: reading {35lb_subs.txt}
[16]
Overfull \hbox (10.74371pt too wide) in paragraph at lines 555--556
[17]
Overfull \hbox (10.74371pt too wide) in paragraph at lines 583--584
\OT1/cmtt/m/n/10.95 ./Data\OT1/cmr/m/n/10.95 , i.e., \OT1/cmtt/m/n/10.95 strati
fiedKFold.query.pkl \OT1/cmr/m/n/10.95 and \OT1/cmtt/m/n/10.95 stratifiedKFold.
relevance.pkl\OT1/cmr/m/n/10.95 .
[]

[17]
Overfull \hbox (0.73491pt too wide) in paragraph at lines 565--566
[18]
Overfull \hbox (0.73491pt too wide) in paragraph at lines 593--594
[]\OT1/cmr/bx/n/10.95 combine[]feat[][LSA[]and[]stats[]feat[]Jun09][][Low].py\O
T1/cmr/m/n/10.95 : This file gen-er-ates one
[]

[18]
Overfull \hbox (1.45724pt too wide) in paragraph at lines 592--593
[19]
Overfull \hbox (1.45724pt too wide) in paragraph at lines 620--621
\OT1/cmr/m/n/10.95 pa. It is adopt-ed from []$\OT1/cmtt/m/n/10.95 https : / / g
ithub . com / benhamner / Metrics / tree / master /$
[]


Overfull \hbox (32.26485pt too wide) in paragraph at lines 613--614
Overfull \hbox (32.26485pt too wide) in paragraph at lines 641--642
[]\OT1/cmr/m/n/10.95 XGBoost-0.4.0 (Win-dows Ex-e-cutable, []$\OT1/cmtt/m/n/10.
95 https : / / github . com / dmlc / XGBoost / releases /$
[]


Overfull \hbox (11.5833pt too wide) in paragraph at lines 614--615
Overfull \hbox (11.5833pt too wide) in paragraph at lines 642--643
[]\OT1/cmr/m/n/10.95 ml[]metrics ([]$\OT1/cmtt/m/n/10.95 https : / / github . c
om / benhamner / Metrics / tree / master / Python / ml _$
[]


Overfull \hbox (12.11351pt too wide) in paragraph at lines 618--619
Overfull \hbox (12.11351pt too wide) in paragraph at lines 646--647
[]\OT1/cmr/m/n/10.95 rgf1.2 (Win-dows Ex-e-cutable, []$\OT1/cmtt/m/n/10.95 http
: / / stat . rutgers . edu / home / tzhang / software /$
[]


Overfull \hbox (2.8642pt too wide) in paragraph at lines 622--622
Overfull \hbox (2.8642pt too wide) in paragraph at lines 650--650
[]\OT1/cmr/bx/n/17.28 How To Gen-er-ate the So-lu-tion (a-ka README
[]

[19] [20] (F:\CrowdFolwer\cleanup\Doc\Kaggle_CrowdFlower_ChenglongChen.bbl
[20] [21] (F:\CrowdFolwer\cleanup\Doc\Kaggle_CrowdFlower_ChenglongChen.bbl
Overfull \hbox (49.59592pt too wide) in paragraph at lines 4--5
[][]$\OT1/cmtt/m/n/10.95 http : / / nycdatascience . com / featured-[]talk-[]1-
[]kaggle-[]data-[]scientist-[]owen-[]zhang/$[]\OT1/cmr/m/n/10.95 .
Expand All @@ -882,12 +888,12 @@ ication / forums / t / 13863 /$
[]

)
Package atveryend Info: Empty hook `BeforeClearDocument' on input line 659.
[21]
Package atveryend Info: Empty hook `AfterLastShipout' on input line 659.
Package atveryend Info: Empty hook `BeforeClearDocument' on input line 687.
[22]
Package atveryend Info: Empty hook `AfterLastShipout' on input line 687.
(F:\CrowdFolwer\cleanup\Doc\Kaggle_CrowdFlower_ChenglongChen.aux)
Package atveryend Info: Executing hook `AtVeryEndDocument' on input line 659.
Package atveryend Info: Executing hook `AtEndAfterFileList' on input line 659.
Package atveryend Info: Executing hook `AtVeryEndDocument' on input line 687.
Package atveryend Info: Executing hook `AtEndAfterFileList' on input line 687.


Package rerunfilecheck Warning: File `Kaggle_CrowdFlower_ChenglongChen.out' has
Expand All @@ -901,10 +907,10 @@ t':
(rerunfilecheck) After: 5835D4BCA4B1BF337073CA56FA26B04F;3336.
)
Here is how much of TeX's memory you used:
21915 strings out of 495354
471649 string characters out of 3183859
685259 words of memory out of 3000000
24515 multiletter control sequences out of 15000+200000
21925 strings out of 495354
471854 string characters out of 3183859
685519 words of memory out of 3000000
24521 multiletter control sequences out of 15000+200000
24220 words of font info for 93 fonts, out of 3000000 for 9000
14 hyphenation exceptions out of 8191
63i,19n,114p,722b,1949s stack positions out of 5000i,500n,10000p,200000b,50000s
Expand All @@ -922,10 +928,10 @@ m/cmsy10.pfb><D:/CTEX/MiKTeX/fonts/type1/public/amsfonts/cm/cmti10.pfb><D:/CTEX
/MiKTeX/fonts/type1/public/amsfonts/cm/cmti8.pfb><D:/CTEX/MiKTeX/fonts/type1/pu
blic/amsfonts/cm/cmtt10.pfb><D:/CTEX/MiKTeX/fonts/type1/public/amsfonts/cm/cmtt
12.pfb>
Output written on Kaggle_CrowdFlower_ChenglongChen.pdf (21 pages, 393463 bytes)
Output written on Kaggle_CrowdFlower_ChenglongChen.pdf (22 pages, 417452 bytes)
.
PDF statistics:
545 PDF objects out of 1000 (max. 8388607)
124 named destinations out of 1000 (max. 500000)
386 words of extra memory for PDF output out of 10000 (max. 10000000)
590 PDF objects out of 1000 (max. 8388607)
127 named destinations out of 1000 (max. 500000)
391 words of extra memory for PDF output out of 10000 (max. 10000000)

Binary file modified Doc/Kaggle_CrowdFlower_ChenglongChen.pdf
Binary file not shown.
Binary file modified Doc/Kaggle_CrowdFlower_ChenglongChen.synctex.gz
Binary file not shown.
36 changes: 32 additions & 4 deletions Doc/Kaggle_CrowdFlower_ChenglongChen.tex
@@ -297,10 +297,10 @@ \subsubsection{Basic TF-IDF Features}
\item \textbf{Individual SVD}\\
We fit an SVD transformer for the TF-IDF vectors of $\{q_i, t_i, d_i\}$, separately.
\end{itemize}
\item \textbf{Cosine Similarity Based on SVD Reduced Features}\\
\item \textbf{Basic Cosine Similarity Based on SVD Reduced Features}\\
We computed cosine similarity based on SVD reduced features (using common SVD); a minimal Python sketch follows this list.
\item \textbf{Statistical Cosine Similarity Based on SVD Reduced Features}\\
We computed statistical cosine similarity based on SVD reduced features.
We computed statistical cosine similarity based on SVD reduced features as in Sec. \ref{subsubsec:Statistical_Distance_Features}.
\end{itemize}
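The following is a minimal Python sketch of the basic cosine similarity feature, assuming hypothetical lists of strings queries, titles, and descriptions (aligned per sample); the vectorizer settings and the number of SVD components are illustrative, not the exact values used in our solution.

    import numpy as np
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer

    # queries, titles, descriptions: hypothetical lists of raw text strings,
    # aligned so that row i of each list belongs to the same sample.
    corpus = queries + titles + descriptions

    tfidf = TfidfVectorizer(ngram_range=(1, 3))              # common vocabulary
    tfidf.fit(corpus)

    svd = TruncatedSVD(n_components=100, random_state=2015)  # common SVD
    svd.fit(tfidf.transform(corpus))

    Qr = svd.transform(tfidf.transform(queries))
    Tr = svd.transform(tfidf.transform(titles))

    def rowwise_cosine(a, b):
        # Cosine similarity between corresponding rows of a and b.
        num = np.sum(a * b, axis=1)
        den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12
        return num / den

    query_title_sim = rowwise_cosine(Qr, Tr)  # one feature per sample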
\subsubsection{Cooccurrence TF-IDF Features}
\label{subsubsec:Cooccurrence_TFIDF_Features}
@@ -403,7 +403,35 @@ \subsubsection{Classification}
\subsubsection{Regression}
Classification takes into account neither the weight $w_{i,j}$ in $\kappa$ nor the magnitude of the rating. Given the form of $w_{i,j}$, it is natural to apply regression (with mean-squared-error, MSE) to predict the relevance score. In the prediction phase, we can convert the raw prediction score to $\{1,2,3,4\}$ following steps 2-4 in Sec. \ref{subsubsec:Classification}.

It turns out that MSE is the best objective among all the alternatives we have tried during the competition. For this reason, we mostly used regression to predict \texttt{median\_relevance}.
Figure \ref{fig:MSE_decoding} shows histograms from our reproduced best single model for one run of CV (only one validation fold is used). Specifically, we plot histograms of 1) the raw prediction, 2) rounding decoding, 3) ceiling decoding, and 4) the CDF decoding described above, grouped by the true relevance. It is evident that both the rounding and ceiling decoding methods have difficulty predicting relevance 4.

Table \ref{tab:MSE_decoding} shows the kappa scores for each decoding method (using all 3 runs of 3-fold CV). The CDF decoding method exhibits the best performance among the three methods considered.

It turns out that MSE (with the above decoding method) is the best objective among all the alternatives we have tried during the competition. For this reason, we mostly used regression to predict \texttt{median\_relevance}.
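For concreteness, here is a minimal Python sketch of the three decoding rules, assuming a NumPy array pred of raw regression outputs and an integer array train_relevance of training labels in {1,2,3,4} (both names hypothetical).

    import numpy as np

    def rounding_decode(pred):
        # Round to the nearest integer rating, clipped to the valid range.
        return np.clip(np.round(pred), 1, 4).astype(int)

    def ceiling_decode(pred):
        # Take the ceiling instead of the nearest integer.
        return np.clip(np.ceil(pred), 1, 4).astype(int)

    def cdf_decode(pred, train_relevance):
        # Cut the sorted raw predictions at the empirical CDF of the training
        # labels, so the decoded ratings match the training distribution.
        counts = np.bincount(train_relevance, minlength=5)[1:]
        cdf = np.cumsum(counts) / counts.sum()
        cutoffs = (cdf * len(pred)).round().astype(int)
        order = np.argsort(pred)
        decoded = np.empty(len(pred), dtype=int)
        prev = 0
        for rating, cut in enumerate(cutoffs, start=1):
            decoded[order[prev:cut]] = rating
            prev = cut
        return decoded

Note that CDF decoding assumes the test ratings follow roughly the same distribution as the training ratings; when that holds, it avoids the collapse of high ratings visible in the rounding and ceiling histograms.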

\begin{figure}[t]
\centering
\includegraphics[width=0.9\textwidth]{./compare_MSE_Decoding.pdf}
\caption{Histograms of raw prediction and predictions using various decoding methods grouped by true relevance.}
\label{fig:MSE_decoding}
\end{figure}

\begin{table}[t]
\centering
\caption{Performance of various decoding methods for MSE objective.}
\label{tab:MSE_decoding}
\begin{tabular}{|c|c|c|}
\hline
Method & CV Mean & CV Std \\
\hline
Rounding & 0.404277 & 0.005069\\
\hline
Ceiling & 0.513138 & 0.006485\\
\hline
CDF & \textcolor{red}{0.681876} & 0.005259\\
\hline
\end{tabular}
\end{table}

\subsubsection{Pairwise Ranking}
We tried pairwise ranking (LambdaMART) within XGBoost, but did not obtain acceptable performance (it was worse than softmax).
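For reference, a minimal sketch of such an experiment with the XGBoost Python API, assuming a feature matrix X, relevance labels y, and a query id array qid with rows sorted by query (all names hypothetical):

    import numpy as np
    import xgboost as xgb

    dtrain = xgb.DMatrix(X, label=y)
    # Pairwise ranking needs group sizes: the number of consecutive
    # rows belonging to each query (rows must be sorted by qid).
    _, group_sizes = np.unique(qid, return_counts=True)
    dtrain.set_group(group_sizes.tolist())

    params = {"objective": "rank:pairwise", "eta": 0.1, "max_depth": 6}
    bst = xgb.train(params, dtrain, num_boost_round=200)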
@@ -491,7 +519,7 @@ \subsubsection{Model Weight Optimization}
In the original ensemble selection algorithm, a model is added to the ensemble with a hard weight of 1. However, this does not guarantee the best performance. We modified the algorithm to optimize the weight of each model as it is added to the ensemble. The weight is optimized with Hyperopt as well. This gives better performance than the hard weight of 1 in our preliminary comparison.
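A minimal sketch of the per-model weight search with Hyperopt, assuming arrays ens_pred (current ensemble prediction) and cand_pred (candidate model prediction), labels y, and decode/kappa functions defined elsewhere (all names hypothetical):

    from hyperopt import fmin, tpe, hp

    def objective(w):
        # Blend the candidate into the ensemble with weight w and score it;
        # fmin minimizes, so we negate the kappa score.
        blended = (1.0 - w) * ens_pred + w * cand_pred
        return -kappa(y, decode(blended))

    best = fmin(fn=objective,
                space=hp.uniform("w", 0.0, 1.0),
                algo=tpe.suggest,
                max_evals=100)  # best is a dict, e.g. {"w": 0.37}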

\subsubsection{Randomized Ensemble Selection}
The final method we used to generate the winning solution actually forgoes model weight optimization. Instead, we replaced weight optimization with a \textbf{random weight}. This is inspired by \texttt{ExtraTreesRegressor} and aims to reduce the model variance (or the risk of overfitting).
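A minimal sketch of one greedy pass with random weights, assuming a list preds of out-of-fold prediction arrays from the model library, labels y, and the decode/kappa functions above (bagging of the library, used in the full algorithm, is omitted for brevity):

    import numpy as np

    rng = np.random.RandomState(2015)

    def randomized_ensemble_selection(preds, y, n_iter=100):
        ensemble, total_w = np.zeros_like(preds[0]), 0.0
        for _ in range(n_iter):
            best_score, best_pick = -np.inf, None
            for p in preds:
                w = rng.uniform(0.0, 1.0)  # random weight, not optimized
                cand = (total_w * ensemble + w * p) / (total_w + w)
                score = kappa(y, decode(cand))
                if score > best_score:
                    best_score, best_pick = score, (p, w)
            p, w = best_pick
            ensemble = (total_w * ensemble + w * p) / (total_w + w)
            total_w += w
        return ensemble

Randomizing the weights trades a little greedy optimality for lower variance across runs, in the same spirit as the randomized splits in ExtraTreesRegressor.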

Figure \ref{fig:CV_Public_Private} shows the CV mean, Public LB, and Private LB scores of our 35 best Public LB submissions generated with this method. As shown, the CV score is correlated with both the Public LB and Private LB scores, and more strongly with the latter. As the competition progressed, we trained more and more diverse models, which proved helpful for ensemble selection in both CV and Private LB (as shown in Figure \ref{fig:CV_Public_Private}).

15 changes: 11 additions & 4 deletions Doc/Kaggle_CrowdFlower_ChenglongChen.tex.bak
@@ -297,10 +297,10 @@ We first concatenated the TF-IDF vectors of $\{q_i, t_i, d_i\}$ (using common vo
\item \textbf{Individual SVD}\\
We fit an SVD transformer for the TF-IDF vectors of $\{q_i, t_i, d_i\}$, separately.
\end{itemize}
\item \textbf{Cosine Similarity Based on SVD Reduced Features}\\
\item \textbf{Basic Cosine Similarity Based on SVD Reduced Features}\\
We computed cosine similarity based on SVD reduced features (using common SVD).
\item \textbf{Statistical Cosine Similarity Based on SVD Reduced Features}\\
We computed statistical cosine similarity based on SVD reduced features.
We computed statistical cosine similarity based on SVD reduced features as in Sec. \ref{subsubsec:Statistical_Distance_Features}.
\end{itemize}
\subsubsection{Cooccurrence TF-IDF Features}
\label{subsubsec:Cooccurrence_TFIDF_Features}
@@ -405,6 +405,13 @@ Classification doesn't take into account the weight $w_{i,j}$ in $\kappa$, and t

It turns out that MSE is the best objective among all the alternatives we have tried during the competition. For this reason, we mostly used regression to predict \texttt{median\_relevance}.

\begin{figure}[!htb]
\centering
\includegraphics[width=0.9\textwidth]{./FlowChart.pdf}
\caption{The flowchart of our method.}
\label{fig:Flowchart}
\end{figure}

\subsubsection{Pairwise Ranking}
We have tried pairwise ranking (LambdaMart) within XGBoost, but didn't obtain acceptable performance (it was worse than softmax).

@@ -491,11 +498,11 @@ RGF & \multicolumn{2}{c|}{Regression} & Low & No
In the original ensemble selection algorithm, a model is added to the ensemble with a hard weight of 1. However, this does not guarantee the best performance. We modified the algorithm to optimize the weight of each model as it is added to the ensemble. The weight is optimized with Hyperopt as well. This gives better performance than the hard weight of 1 in our preliminary comparison.

\subsubsection{Randomized Ensemble Selection}
The final method we used to generate the winning solution actually forgoes model weight optimization. Instead, we replaced weight optimization with a \textbf{random weight}. This is inspired by \texttt{ExtraTreesRegressor} and aims to reduce the model variance (or the risk of overfitting).

Figure \ref{fig:CV_Public_Private} shows the CV mean, Public LB, and Private LB scores of our 35 best Public LB submissions generated with this method. As shown, the CV score is correlated with both the Public LB and Private LB scores, and more strongly with the latter. As the competition progressed, we trained more and more diverse models, which proved helpful for ensemble selection in both CV and Private LB (as shown in Figure \ref{fig:CV_Public_Private}).

The winning solution that scored \textbf{0.70807} on Public LB and \textbf{0.72189} on Private LB is just a median ensemble of these 35 best Public LB submissions.
The winning submission that scored \textbf{0.70807} on Public LB and \textbf{0.72189} on Private LB is just a median ensemble of these 35 best Public LB submissions.

\begin{figure}[t]
\centering