Conditioned Language Model experimental system.

=== Please see the following document to run the minimal experiments.

===

Memo on running multiple instances (for now)

Most of the code is safe to run from multiple instances. The only exceptions for now are:
- call_splitta() uses a fixed file name, so it cannot be called from multiple instances at the same time.
- The two caches (USE_CACHE_ON_SPLITTA, USE_CACHE_ON_COLL_MODEL) are not safe for access from multiple instances.
So what to do? Simple: fill the caches first. The cache_runner_* scripts (cache_runner_x.pl) are there to do this for you. Once the two caches are filled, all of the code is safe for multiple-instance access (which is why the code was never updated to be instance-safe with locking, etc.).
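
The fixed-file-name problem noted above is the kind of collision that per-process temporary files avoid. The sketch below only illustrates that idea with Perl's core File::Temp module. It is not how call_splitta() is written; the helper name write_unique_input and the file name template are made up for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of this code base): write $text to a
# uniquely named temp file and return its path, so two concurrent
# processes never end up sharing one file name.
sub write_unique_input {
    my ($text) = @_;
    # tempfile() replaces the trailing X's with random characters.
    # TMPDIR => 1 places the file in the system temp directory;
    # UNLINK => 1 removes it when the program exits.
    my ($fh, $path) = tempfile('splitta_in_XXXXXX', TMPDIR => 1, UNLINK => 1);
    print {$fh} $text;
    close $fh or die "close failed: $!";
    return $path;
}

my $path1 = write_unique_input("First sentence. Second sentence.\n");
my $path2 = write_unique_input("Another input.\n");
print "distinct files\n" if $path1 ne $path2;
```

Filling the caches first, as described above, remains the supported way to run multiple instances.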

===

List of access scripts

condprob.pm: big, bloated and ugly main module
octave_call.pm: utility code for summing log probabilities. (It no longer calls Octave for the underlying log-probability calculation; the name is kept for historical reasons.)
srilm_call.pm: utility code that interfaces with SRILM
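
Summing probabilities that are stored as logs is usually done with the log-sum-exp trick. The following is a minimal Perl sketch of that idea, not the actual code in octave_call.pm. The function name log_sum and the use of natural logs are assumptions for illustration.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper (not taken from octave_call.pm): given a list of
# natural-log probabilities, return log(sum(exp(x_i))) without underflow.
sub log_sum {
    my @logs = @_;
    die "log_sum: need at least one value\n" unless @logs;
    my $max = max(@logs);
    # Factor out the largest term so every exp() argument is <= 0.
    my $sum = 0;
    $sum += exp($_ - $max) for @logs;
    return $max + log($sum);
}

# Two very small probabilities; calling exp(-1000) directly would underflow to 0.
my $total = log_sum(-1000, -1000);
printf "%.4f\n", $total;   # prints -999.3069, i.e. -1000 + log(2)
```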

msrpc_runner.pl
msrpc_baseline_runner.pl
rte3_runner.pl
rte3_baseline_runner.pl
(The above runners output probabilities and measures to STDOUT in CSV format; you can load them into Weka, etc.)

eval/simple_eval_msr_pp.pl
eval/simple_eval_rte3.pl
(Scripts that report threshold-based accuracy, along with precision and recall.)
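
Threshold-based evaluation of the kind described above can be sketched as follows. This is an illustration, not the code in eval/. The input layout (pairs of score and 0/1 gold label), the function name eval_at_threshold, and the rule "predict positive when score >= threshold" are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: $pairs is an array ref of [score, gold] pairs,
# where gold is 1 (positive) or 0 (negative). An item is predicted
# positive when its score is >= $threshold (an assumed convention).
sub eval_at_threshold {
    my ($pairs, $threshold) = @_;
    my ($tp, $fp, $tn, $fn) = (0, 0, 0, 0);
    for my $pair (@$pairs) {
        my ($score, $gold) = @$pair;
        my $pred = $score >= $threshold ? 1 : 0;
        if    ($pred && $gold)  { $tp++ }
        elsif ($pred && !$gold) { $fp++ }
        elsif (!$pred && $gold) { $fn++ }
        else                    { $tn++ }
    }
    my $total = $tp + $fp + $tn + $fn;
    # Report 0 instead of dividing by zero when a denominator is empty.
    return {
        accuracy  => $total      ? ($tp + $tn) / $total : 0,
        precision => ($tp + $fp) ? $tp / ($tp + $fp)    : 0,
        recall    => ($tp + $fn) ? $tp / ($tp + $fn)    : 0,
    };
}

# Made-up scores and labels, for illustration only.
my @data = ([0.9, 1], [0.7, 0], [0.4, 1], [0.2, 0]);
my $r = eval_at_threshold(\@data, 0.5);
printf "acc=%.2f prec=%.2f rec=%.2f\n", @$r{qw(accuracy precision recall)};
```

Sweeping the threshold over a range of values and keeping the best accuracy is one common way to use such a function.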

Model building scripts (see making_models.txt for detailed steps to build models)
