Medical tasks which do not have a fixed label set. Evaluation is typically done with token-f1 or other semantic similarity metrics.