clj-duckling.util.learn
corpus->dataset
(corpus->dataset {:keys [context tests], :as corpus} rules feature-extractor logger)
Takes a corpus and a feature extractor and builds a dataset (phase 1.a in clj-duckling.md).
judge-ml
(judge-ml stash classifiers)
Chooses the winning token using the classifiers.
Computes the probability of each rule according to its route.
print-dataset
(print-dataset dataset)
route-prob
(route-prob route classifiers)
Computes the _log_ probability of a route.
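The point of working in log space can be sketched as follows: per-rule scores are summed instead of multiplying many small probabilities. This is a self-contained illustration under assumptions, not the library's implementation. The name `route-log-prob`, the route shape (a sequence of rule names), and the `rule-probs` map (rule name to probability) are all hypothetical stand-ins for the real route and classifier structures.

```clojure
;; Hypothetical sketch: the route and classifier shapes below are assumptions,
;; not the actual clj-duckling data structures.
(defn route-log-prob
  "Sums the log probability of every rule on the route.
  `route` is a seq of rule names; `rule-probs` maps rule name -> probability.
  Rules missing from `rule-probs` are treated as probability 1.0 (log 0.0)."
  [route rule-probs]
  (reduce + 0.0
          (map (fn [rule-name]
                 (Math/log (double (get rule-probs rule-name 1.0))))
               route)))

;; Made-up probabilities, for illustration only.
(def example-probs {"intersect" 0.5, "named-day" 0.25})

;; Equals log(0.5 * 0.25), computed as a sum of logs.
(route-log-prob ["intersect" "named-day"] example-probs)
```

Summing logs keeps long routes numerically stable, and comparing two routes by log probability gives the same ordering as comparing their raw probabilities.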
sentence->dataset
(sentence->dataset s context check rules feature-extractor dataset logger)
Enriches the dataset using the sentence s.
Args:
  s (string): a sentence
  context (map): the context
  check (func): fn that determines whether a winner is valid
  rules (map):
  feature-extractor (func):
  dataset (vector): the existing dataset
Returns:
  vector: an enriched dataset [{<rule-name> [features, output]}]
          Output is true if the rule contributed successfully,
          false otherwise
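The documented return shape can be pictured with a small sketch. It reads the shape literally, as a vector of single-entry maps from rule name to a `[features output]` pair. The helpers `add-example` and `examples-for` and the feature maps are hypothetical and not part of clj-duckling.

```clojure
;; Hypothetical sketch of the documented shape [{<rule-name> [features, output]}].
(defn add-example
  "Appends one labelled example for `rule-name` to the dataset vector."
  [dataset rule-name features output]
  (conj dataset {rule-name [features output]}))

(defn examples-for
  "Collects every [features output] pair recorded for `rule-name`."
  [dataset rule-name]
  (vec (keep #(get % rule-name) dataset)))

;; Feature maps here are made up for illustration.
(def example-dataset
  (-> []
      (add-example "named-day" {:word-count 1} true)
      (add-example "intersect" {:word-count 2} false)
      (add-example "named-day" {:word-count 3} false)))

;; The two examples recorded for "named-day", in insertion order.
(examples-for example-dataset "named-day")
```

Grouping examples by rule name in this way is what makes it possible to train one classifier per rule.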
subtokens
(subtokens token)
Gets the set of all tokens in the tree that eventually produced the given token
(including the token itself).
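A recursive tree walk of this kind might look like the sketch below. It assumes, hypothetically, that a token is a map whose optional `:route` key holds the child tokens it was built from. The name `subtokens-sketch` and the example tokens are illustrative only.

```clojure
;; Hypothetical sketch: assumes each token is a map whose optional :route key
;; holds the child tokens that produced it.
(defn subtokens-sketch
  "Returns the set containing `token` and every token below it in the tree."
  [token]
  (into #{token} (mapcat subtokens-sketch (:route token))))

;; Made-up tokens for illustration.
(def leaf-a {:text "next"})
(def leaf-b {:text "monday"})
(def parent {:text "next monday" :route [leaf-a leaf-b]})

;; A set of three tokens: parent, leaf-a and leaf-b.
(subtokens-sketch parent)
```

Returning a set rather than a sequence means a token reached through more than one path is counted only once.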
train-classifiers
(train-classifiers corpus rules fextractor logger)
Given a corpus and a set of rules, trains one classifier per rule.