Representations

A representation is exactly what it sounds like -- a way of representing your data. For many NLP tasks, a model must be learned before the data can be mapped into a new representation; this applies to parsing, POS tagging, chunking, alignment, and so on.

It often makes more sense to compute representations offline -- i.e., not while we are learning our models -- because computing a representation at feature extraction time can make things much slower. For this reason, we introduce the concept of representation generators. A representation generator adds data to your context objects; this data will be used to compute feature values later in the experiment pipeline.

Internal representation

Representation generators work with the internal representation of the data. The internal representation is a Python dictionary whose keys are names of representations ("target", "source", "tags", etc.) and whose values are lists of representations, one per sentence. For the target and source keys, each sentence representation is a list of target or source words, respectively, but other representation formats can exist.
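As a concrete illustration, the internal representation might look like the following. The exact keys and contents are assumptions for the sake of the example, not output from any particular generator:

```python
# A sketch of the internal representation format described above.
# Keys name representations; each value holds one entry per sentence.
internal = {
    "target": [["ein", "kleines", "Haus"], ["das", "Buch"]],
    "source": [["a", "small", "house"], ["the", "book"]],
    "tags":   [["OK", "OK", "BAD"], ["OK", "OK"]],
}

# For target and source, each sentence is a list of words,
# and all representations cover the same number of sentences.
assert len(internal["target"]) == len(internal["source"]) == len(internal["tags"])
```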

All representation generators return internal representations. Some generators also take representations as input and return extended versions of them; others take data files as input.

Interface

The only method that all representation generators share is generate. It takes an internal representation (or nothing) and returns the same internal representation with one additional field -- a new representation of the data.

List of the available representation generators: