title: python-crfsuite
type: resource
url: https://github.com/scrapinghub/python-crfsuite

Overview

python-crfsuite provides Python bindings for the CRFsuite conditional random field toolkit, enabling efficient sequence labeling in Python.

Key Information

| Field | Value |
|---|---|
| GitHub | https://github.com/scrapinghub/python-crfsuite |
| PyPI | https://pypi.org/project/python-crfsuite/ |
| Documentation | https://python-crfsuite.readthedocs.io/ |
| License | MIT (python-crfsuite), BSD (CRFsuite) |
| Latest Version | 0.9.12 (December 2025) |
| Stars | 771 |

Features

  • Fast Performance: Faster than the official SWIG-based wrapper
  • No External Dependencies: CRFsuite bundled; NumPy/SciPy not required
  • Python 2 & 3 Support: Works with both Python versions
  • Cython-based: High-performance C++ bindings

Installation

# Using pip
pip install python-crfsuite

# Using conda
conda install -c conda-forge python-crfsuite

Usage

Training

import pycrfsuite

# Create trainer
trainer = pycrfsuite.Trainer(verbose=True)

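# Add training data. X_train is assumed to be a list of feature sequences
# and y_train the matching list of label sequences (same lengths per pair).
for xseq, yseq in zip(X_train, y_train):
    trainer.append(xseq, yseq)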

# Set parameters
trainer.set_params({
    'c1': 1.0,           # L1 regularization
    'c2': 0.001,         # L2 regularization
    'max_iterations': 100,
    'feature.possible_transitions': True
})

# Train model
trainer.train('model.crfsuite')

Inference

import pycrfsuite

# Load model
tagger = pycrfsuite.Tagger()
tagger.open('model.crfsuite')

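# Predict one label per item; x_seq is assumed to be a single feature
# sequence in the same format used for training
y_pred = tagger.tag(x_seq)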
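
Since `tagger.tag` returns one label per item, predictions can be compared directly against gold labels. The following is a minimal sketch of a token-level accuracy check; the helper name `token_accuracy` and the sample labels are hypothetical, and with a loaded tagger `y_pred` would instead be collected by calling `tagger.tag` on each test sequence.

```python
def token_accuracy(y_true, y_pred):
    """Fraction of tokens whose predicted label matches the gold label."""
    correct = total = 0
    for gold_seq, pred_seq in zip(y_true, y_pred):
        if len(gold_seq) != len(pred_seq):
            raise ValueError("gold and predicted sequences differ in length")
        correct += sum(g == p for g, p in zip(gold_seq, pred_seq))
        total += len(gold_seq)
    return correct / total if total else 0.0


# Hypothetical gold labels and predictions for two short sequences.
y_true = [["B-PER", "O"], ["O", "B-LOC", "O"]]
y_pred = [["B-PER", "O"], ["O", "O", "O"]]

accuracy = token_accuracy(y_true, y_pred)  # 4 of the 5 tokens match
```

Token accuracy can look high when most labels are `O`, so per-label precision and recall are usually a better fit for tasks such as named entity recognition.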

Feature Format

Each sequence is a list of items (for example, tokens), and each item is a list of feature strings, conventionally in name=value format:

features = [
    ['word=hello', 'pos=NN', 'is_capitalized=True'],
    ['word=world', 'pos=NN', 'is_capitalized=False'],
]
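
Feature lists like the one above are usually generated by a small extraction function. The sketch below shows one possible way to do that; the function names, the `prev_word` feature, and the `BOS` marker are illustrative choices and not part of python-crfsuite.

```python
def token_features(tokens, pos_tags, i):
    """Return name=value feature strings for the token at position i."""
    word = tokens[i]
    features = [
        f"word={word.lower()}",
        f"pos={pos_tags[i]}",
        f"is_capitalized={word[:1].isupper()}",
    ]
    # Context feature: the previous word, or a marker at the sequence start.
    if i > 0:
        features.append(f"prev_word={tokens[i - 1].lower()}")
    else:
        features.append("BOS")
    return features


def sequence_features(tokens, pos_tags):
    """Build the per-token feature lists for one whole sequence."""
    return [token_features(tokens, pos_tags, i) for i in range(len(tokens))]


# Hypothetical two-token input with made-up part-of-speech tags.
x_seq = sequence_features(["Hello", "world"], ["NN", "NN"])
```

A sequence built this way can be passed to `trainer.append` together with its label sequence, or to `tagger.tag` at prediction time.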

Training Algorithms

| Algorithm | Description |
|---|---|
| lbfgs | Limited-memory BFGS (default) |
| l2sgd | SGD with L2 regularization |
| ap | Averaged Perceptron |
| pa | Passive Aggressive |
| arow | Adaptive Regularization of Weights |
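
The algorithm is chosen when the trainer is created. The sketch below wraps that choice in a hypothetical helper, `make_trainer`, which checks the name against the algorithms listed above before passing it to the `algorithm` argument of `pycrfsuite.Trainer`. The helper and the lazy import are illustrative choices, not part of the library.

```python
# Algorithm names as listed in the table above.
ALGORITHMS = ("lbfgs", "l2sgd", "ap", "pa", "arow")


def make_trainer(algorithm="lbfgs", params=None):
    """Create a pycrfsuite.Trainer for one of the listed algorithms."""
    if algorithm not in ALGORITHMS:
        raise ValueError(
            f"unknown algorithm {algorithm!r}; expected one of {ALGORITHMS}"
        )
    # Imported lazily so the name check works even without the package installed.
    import pycrfsuite

    trainer = pycrfsuite.Trainer(algorithm=algorithm, verbose=False)
    if params:
        # Parameter names differ between algorithms, so pass only those
        # that the selected algorithm supports.
        trainer.set_params(params)
    return trainer
```

For example, `make_trainer("l2sgd", {"c2": 0.1})` would request SGD training with an L2 coefficient of 0.1. Calling `trainer.params()` on the result should list the parameter names the selected algorithm accepts.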

Parameters (L-BFGS)

| Parameter | Default | Description |
|---|---|---|
| c1 | 0 | L1 regularization coefficient |
| c2 | 1.0 | L2 regularization coefficient |
| max_iterations | unlimited | Maximum iterations |
| num_memories | 6 | Number of memories for L-BFGS |
| epsilon | 1e-5 | Convergence threshold |
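
To make the roles of `c1` and `c2` concrete, the sketch below computes the penalty that the two coefficients add to the training objective for a given weight vector. It assumes the common form `c1 * sum(|w|) + c2 * sum(w^2)`; the exact scaling CRFsuite applies to the L2 term is an assumption here, and the function name and sample weights are hypothetical.

```python
def regularization_penalty(weights, c1=0.0, c2=1.0):
    """Penalty added to the loss: c1 * sum(|w|) + c2 * sum(w^2) (assumed form)."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return c1 * l1 + c2 * l2


# Hypothetical feature weights, for illustration only.
weights = [0.5, -2.0, 0.0]

# Defaults from the table (c1=0, c2=1.0): only the L2 term contributes.
default_penalty = regularization_penalty(weights)

# Values from the training example (c1=1.0, c2=0.001): the L1 term dominates.
example_penalty = regularization_penalty(weights, c1=1.0, c2=0.001)
```

A nonzero `c1` tends to push many weights to exactly zero, which yields a sparser model, while `c2` shrinks all weights smoothly without removing features.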

Related Projects

  • sklearn-crfsuite: Scikit-learn compatible wrapper
  • CRFsuite: The underlying C/C++ library that python-crfsuite wraps

Citation

@misc{python-crfsuite,
  author = {Scrapinghub},
  title = {python-crfsuite: Python binding to CRFsuite},
  year = {2014},
  publisher = {GitHub},
  url = {https://github.com/scrapinghub/python-crfsuite}
}

@misc{crfsuite,
  author = {Okazaki, Naoaki},
  title = {CRFsuite: A fast implementation of Conditional Random Fields},
  year = {2007},
  url = {http://www.chokkan.org/software/crfsuite/}
}