Active Learning

Use active learning as a sampling framework. Possible strategies for choosing which sequences to annotate next:
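- greedily annotate each sequence with the current model, then sort by increasing probability that the predicted labeling is correct (least confident first) -> this favors longer sequences, which are harder to predict: the chance of getting every label right shrinks with each added word
- same as above, but normalize the score by the sequence length n to remove this length bias
- given several models, the fraction of them that disagree on the label of each word, summed over the sequence and normalized by n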
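
The three scores above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation. It assumes the model exposes one probability per token for its predicted label and that the sequence probability is the product of those values. It also assumes that "disagree" means differing from the majority label among the models. The function names and the toy `pool` are hypothetical.

```python
import math
from collections import Counter


def sequence_log_prob(token_probs):
    """Log-probability of the greedy labeling (assumes per-token independence)."""
    return sum(math.log(p) for p in token_probs)


def normalized_log_prob(token_probs):
    """Same score divided by the sequence length n."""
    return sequence_log_prob(token_probs) / len(token_probs)


def disagreement(predictions):
    """Fraction of models that differ from the majority label, averaged over words.

    predictions: one label sequence per model, all of the same length n.
    """
    num_models = len(predictions)
    n = len(predictions[0])
    total = 0.0
    for labels_at_word in zip(*predictions):
        majority_count = Counter(labels_at_word).most_common(1)[0][1]
        total += 1.0 - majority_count / num_models
    return total / n


def rank(pool, score, reverse=False):
    """Return sequence ids ordered by score (ascending unless reverse=True)."""
    return sorted(pool, key=lambda seq_id: score(pool[seq_id]), reverse=reverse)


# Hypothetical per-token probabilities of the predicted labels.
pool = {"short": [0.9, 0.9], "long": [0.9] * 10, "hard": [0.7, 0.6]}
by_raw = rank(pool, sequence_log_prob)           # least confident sequence first
by_normalized = rank(pool, normalized_log_prob)  # least confident per token first
```

With these made-up numbers, the raw score puts the ten-word sequence first even though each of its tokens is predicted at 0.9, while the normalized score puts the genuinely uncertain two-word sequence first. For the disagreement score, sequences would be ranked with `reverse=True` so that the most contested ones come first.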

Label propagation can additionally be used to decide which samples to pick next.
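
The notes do not say how label propagation feeds into the selection, so the following is only one possible reading. It propagates the labels of already annotated samples over a similarity graph, then picks the unlabeled samples whose propagated label distribution has the highest entropy. The graph `weights`, the `seeds` mapping, and all function names are hypothetical. The propagation is a simplified iterative scheme with clamped seeds.

```python
import math


def propagate_labels(weights, seeds, num_classes, iterations=100):
    """Spread seed labels over a similarity graph.

    weights: n x n list of non-negative similarities between samples.
    seeds: dict mapping an annotated sample index to its class index.
    Returns one class distribution per sample.
    """
    n = len(weights)
    dist = [[1.0 / num_classes] * num_classes for _ in range(n)]
    for node, cls in seeds.items():
        dist[node] = [1.0 if c == cls else 0.0 for c in range(num_classes)]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total = sum(weights[i])
            if i in seeds or total == 0:
                updated.append(dist[i])  # clamp seeds, skip isolated nodes
                continue
            # Weighted average of the neighbours' current distributions.
            updated.append([
                sum(weights[i][j] * dist[j][c] for j in range(n)) / total
                for c in range(num_classes)
            ])
        dist = updated
    return dist


def entropy(p):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def most_uncertain(dist, seeds, k=1):
    """Indices of the k unlabeled samples with the highest entropy."""
    unlabeled = [i for i in range(len(dist)) if i not in seeds]
    return sorted(unlabeled, key=lambda i: entropy(dist[i]), reverse=True)[:k]


# Toy chain graph 0-1-2-3-4 with the two end points annotated.
weights = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
seeds = {0: 0, 4: 1}
dist = propagate_labels(weights, seeds, num_classes=2)
next_to_annotate = most_uncertain(dist, seeds, k=1)
```

In this toy chain the middle sample is equally close to both annotated end points, so it is the one selected. A score like this could be combined with the model-based scores rather than replacing them.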