clj-duckling.util.learn

corpus->dataset

(corpus->dataset {:keys [context tests], :as corpus} rules feature-extractor logger)

Takes a corpus and a feature extractor and builds a dataset (phase 1.a in clj-duckling.md).

extract-route-features

(extract-route-features token)

Extracts the names of the previous routes used to produce this route token.
This is the feature extractor we use.
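
As a rough illustration of what "names of previous routes" could look like, the sketch below assumes that a token is a map whose :rule key holds a map with a :name, and whose :route key holds the child tokens. Both the token shape and the function name route-feature-names are assumptions made for this example, not the library's definitions.

```clojure
;; Hypothetical token shape: {:rule {:name "..."} :route [child-token ...]}
(defn route-feature-names
  "Returns the rule names of the direct children of `token` (illustrative only)."
  [token]
  (mapv #(get-in % [:rule :name]) (:route token)))

(def example-token
  {:rule  {:name "intersect"}
   :route [{:rule {:name "integer (numeric)"} :route []}
           {:rule {:name "named-month"} :route []}]})

(route-feature-names example-token)
;; => ["integer (numeric)" "named-month"]
```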

judge-ml

(judge-ml stash classifiers)

Chooses the winning token using a classifier.
Computes the probability of each rule according to its routes.
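
One plausible way to pick a winner is to score every candidate and keep the highest-scoring one. The sketch below assumes the stash is a sequence of token maps and takes the scoring function as an argument; sketch-pick-winner and the :log-prob key are hypothetical names used only for illustration.

```clojure
;; Assumption: `stash` is a sequence of tokens and `score-fn` returns a number
;; for each token (for example, a log probability). Higher is better.
(defn sketch-pick-winner
  [stash score-fn]
  (when (seq stash)
    (apply max-key score-fn stash)))

(def toy-stash
  [{:text "a" :log-prob -2.0}
   {:text "b" :log-prob -0.5}])

(sketch-pick-winner toy-stash :log-prob)
;; => {:text "b", :log-prob -0.5}
```

Returning nil for an empty stash is a design choice of this sketch, so that callers can tell "no candidates" apart from a real winner.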

print-dataset

(print-dataset dataset)

Prints the dataset to STDOUT.

route-prob

(route-prob route classifiers)

Computes the _log_ probability of a route.
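
Working with log probabilities means that the scores of the rules along a route can be added instead of multiplied. The sketch below shows that idea under stated assumptions: classifiers is taken to be a map from rule name to a function that returns a probability in (0, 1] for a feature vector, and a token is a map with :rule and :route keys. None of these shapes, nor the name sketch-route-log-prob, come from the library.

```clojure
;; Assumptions (not from the library): `classifiers` maps a rule name to a
;; function returning a probability in (0, 1] for a feature vector, and a
;; token looks like {:rule {:name "..."} :route [child-token ...]}.
(defn sketch-route-log-prob
  [token classifiers]
  (let [rule-name (get-in token [:rule :name])
        features  (mapv #(get-in % [:rule :name]) (:route token))
        classify  (get classifiers rule-name)
        ;; A rule without a classifier contributes log(1) = 0.
        own       (if classify
                    (Math/log (double (classify features)))
                    0.0)]
    ;; Sum this rule's log probability with those of all child routes.
    (reduce + own (map #(sketch-route-log-prob % classifiers) (:route token)))))

(def toy-classifiers
  {"intersect"   (fn [_features] 0.5)
   "named-month" (fn [_features] 0.25)})

(def toy-token
  {:rule  {:name "intersect"}
   :route [{:rule {:name "integer (numeric)"} :route []}
           {:rule {:name "named-month"} :route []}]})

;; Equals log(0.5) + log(0.25) = log(0.125), up to floating-point rounding.
(sketch-route-log-prob toy-token toy-classifiers)
```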

sentence->dataset

(sentence->dataset s context check rules feature-extractor dataset logger)

Enriches the dataset with the examples produced by parsing one sentence.

Args:
  s (string): a sentence
  context (map): the context
  check (func): fn that determines if a winner is valid
  rules (map):
  feature-extractor (func):
  dataset (vector): the existing dataset

Returns:
  vector: an enriched dataset [{<rule-name> [features, output]}].
        Output is true if the rule contributed
        successfully, false otherwise.
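
To make the documented shape concrete, the sketch below follows a literal reading of the notation: a vector of maps, each mapping a rule name to a [features output] pair. The helper add-example and the sample rule names are hypothetical, and the real function may organize the data differently.

```clojure
;; Literal reading of the documented shape: a vector of maps, each mapping a
;; rule name to a [features output] pair. `add-example` is a hypothetical helper.
(defn add-example
  [dataset rule-name features output]
  (conj dataset {rule-name [features output]}))

(def enriched
  (-> []
      (add-example "named-month" [] true)
      (add-example "intersect" ["integer (numeric)" "named-month"] false)))

enriched
;; => [{"named-month" [[] true]}
;;     {"intersect" [["integer (numeric)" "named-month"] false]}]
```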

simple-feature-extractor

(simple-feature-extractor token)

A very simple feature extractor, meant to show that the pipeline works. Not used for now.
Takes a token and returns a vector of features
(can be anything as long as the model understands it).

subtokens

(subtokens token)

Gets the set of all tokens in the tree that eventually produced the given token
(including the token itself).
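
Collecting every token in such a tree is a plain tree walk. The sketch below uses clojure.core/tree-seq and assumes that each token is a map whose :route key holds its child tokens; that shape and the name sketch-subtokens are assumptions for illustration.

```clojure
;; Assumption: a token is a map and its children live under :route.
(defn sketch-subtokens
  "Returns the set of `token` and all of its descendants (illustrative only)."
  [token]
  (set (tree-seq map? :route token)))

(def leaf-a {:rule {:name "integer (numeric)"} :route []})
(def leaf-b {:rule {:name "named-month"} :route []})
(def root   {:rule {:name "intersect"} :route [leaf-a leaf-b]})

(count (sketch-subtokens root))
;; => 3
```

Because the result is a set, structurally equal tokens that appear more than once in the tree are counted only once.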

train-classifiers

(train-classifiers corpus rules fextractor logger)

Given a corpus and a set of rules, trains one classifier per rule.

Generated by Codox

Clj-duckling 0.8.1
