---
title: "python-crfsuite"
type: "resource"
url: "https://github.com/scrapinghub/python-crfsuite"
---
## Overview
python-crfsuite provides Python bindings for the CRFsuite conditional random field toolkit, enabling efficient sequence labeling in Python.
## Key Information
| Field | Value |
|-------|-------|
| **GitHub** | https://github.com/scrapinghub/python-crfsuite |
| **PyPI** | https://pypi.org/project/python-crfsuite/ |
| **Documentation** | https://python-crfsuite.readthedocs.io/ |
| **License** | MIT (python-crfsuite), BSD (CRFsuite) |
| **Latest Version** | 0.9.12 (December 2025) |
| **Stars** | 771 |
## Features
- **Fast Performance**: Faster than the official SWIG-based CRFsuite wrapper
- **No External Dependencies**: CRFsuite bundled; NumPy/SciPy not required
- **Python 2 & 3 Support**: Works with both Python versions
- **Cython-based**: High-performance C++ bindings
## Installation
```bash
# Using pip
pip install python-crfsuite
# Using conda
conda install -c conda-forge python-crfsuite
```
## Usage
### Training
```python
import pycrfsuite
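# X_train: list of sequences, each a list of per-token feature lists
# y_train: list of label sequences aligned with X_train
# Create trainer
trainer = pycrfsuite.Trainer(verbose=True)
# Add training data, one (features, labels) sequence pair at a time
for xseq, yseq in zip(X_train, y_train):
    trainer.append(xseq, yseq)
# Set parameters
trainer.set_params({
    'c1': 1.0,  # L1 regularization
    'c2': 0.001,  # L2 regularization
    'max_iterations': 100,
    'feature.possible_transitions': True
})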
# Train model
trainer.train('model.crfsuite')
```
### Inference
```python
import pycrfsuite
# Load model
tagger = pycrfsuite.Tagger()
tagger.open('model.crfsuite')
# Predict labels for one sequence (x_seq: list of per-token feature lists)
y_pred = tagger.tag(x_seq)
```
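Since `tagger.tag` returns one label per token, predictions can be compared against gold labels position by position. The following is a minimal pure-Python sketch of token-level accuracy; the helper name `token_accuracy` and the toy labels are illustrative assumptions, not part of python-crfsuite.

```python
def token_accuracy(y_true, y_pred):
    """Fraction of tokens whose predicted label equals the gold label.

    y_true and y_pred are lists of label sequences, for example the gold
    labels and the outputs of tagger.tag() for each test sequence.
    """
    correct = 0
    total = 0
    for gold_seq, pred_seq in zip(y_true, y_pred):
        if len(gold_seq) != len(pred_seq):
            raise ValueError("gold and predicted sequences differ in length")
        for gold, pred in zip(gold_seq, pred_seq):
            total += 1
            if gold == pred:
                correct += 1
    return correct / total if total else 0.0


# Toy labels (hypothetical), standing in for real tagger output
y_true = [["B-PER", "O"], ["O", "O", "B-LOC"]]
y_pred = [["B-PER", "O"], ["O", "B-LOC", "B-LOC"]]
acc = token_accuracy(y_true, y_pred)
```

Token accuracy can be misleading when one label (such as `O`) dominates, so per-label precision and recall are usually worth reporting as well.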
### Feature Format
Each sequence is a list with one entry per token, and each entry is a list of feature strings, conventionally written in `name=value` form:
```python
features = [
    ['word=hello', 'pos=NN', 'is_capitalized=True'],
    ['word=world', 'pos=NN', 'is_capitalized=False'],
]
```
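Feature lists like the one above are normally generated from tokens rather than written by hand. The following is a minimal sketch of such an extractor; the function names `token_features` and `sentence_features` and the chosen features (suffix, neighbouring words, `BOS`/`EOS` markers) are illustrative assumptions, not something python-crfsuite prescribes.

```python
def token_features(tokens, i):
    """Build name=value feature strings for the token at position i."""
    word = tokens[i]
    feats = [
        "bias",
        "word=" + word.lower(),
        "is_capitalized=" + str(word[:1].isupper()),
        "suffix3=" + word[-3:],
    ]
    # Context features: previous and next word, or sentence boundary markers
    if i == 0:
        feats.append("BOS")
    else:
        feats.append("prev_word=" + tokens[i - 1].lower())
    if i == len(tokens) - 1:
        feats.append("EOS")
    else:
        feats.append("next_word=" + tokens[i + 1].lower())
    return feats


def sentence_features(tokens):
    """Return one feature list per token, in the shape Trainer.append expects."""
    return [token_features(tokens, i) for i in range(len(tokens))]


xseq = sentence_features(["Hello", "world"])
```

The resulting `xseq` can be passed to `trainer.append(xseq, yseq)` during training and to `tagger.tag(xseq)` at inference time, provided the same extractor is used in both places.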
## Training Algorithms
| Algorithm | Description |
|-----------|-------------|
| `lbfgs` | Limited-memory BFGS (default) |
| `l2sgd` | SGD with L2 regularization |
| `ap` | Averaged Perceptron |
| `pa` | Passive Aggressive |
| `arow` | Adaptive Regularization of Weights |
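
The algorithm is chosen when the trainer is created. The following is a small sketch that validates the name against the table above before constructing a `pycrfsuite.Trainer`; the helper `make_trainer` and the `ALGORITHMS` set are hypothetical conveniences, and `pycrfsuite` is imported lazily so that the validation itself runs without the library installed.

```python
# Algorithm names taken from the table above
ALGORITHMS = {"lbfgs", "l2sgd", "ap", "pa", "arow"}


def make_trainer(algorithm="lbfgs", params=None):
    """Create a pycrfsuite.Trainer for a known algorithm name."""
    if algorithm not in ALGORITHMS:
        raise ValueError(
            f"unknown algorithm {algorithm!r}; expected one of {sorted(ALGORITHMS)}"
        )
    # Imported here so that name validation works even without the library
    import pycrfsuite

    return pycrfsuite.Trainer(algorithm=algorithm, params=params, verbose=False)
```

For example, `make_trainer("pa")` would return a Passive Aggressive trainer. Each algorithm accepts its own set of parameters, so `c1` and `c2` do not apply to all of them; calling `trainer.params()` lists the parameter names available for the selected algorithm.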
## Parameters (L-BFGS)
| Parameter | Default | Description |
|-----------|---------|-------------|
| `c1` | 0 | L1 regularization coefficient |
| `c2` | 1.0 | L2 regularization coefficient |
| `max_iterations` | unlimited | Maximum iterations |
| `num_memories` | 6 | Number of memories for L-BFGS |
| `epsilon` | 1e-5 | Convergence threshold |
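
As a reading aid, `c1` and `c2` can be thought of as the weights of the L1 and L2 penalty terms added to the negative log-likelihood. The following is a sketch of that objective, assuming the usual regularized form; the notation ($w$ for feature weights, $N$ for the number of training sequences, $x^{(n)}$ and $y^{(n)}$ for the n-th feature and label sequence) is introduced here and the exact scaling of each term should be checked against the CRFsuite documentation.

```latex
\min_{w} \; -\sum_{n=1}^{N} \log p\left(y^{(n)} \mid x^{(n)}; w\right)
  \; + \; c_1 \lVert w \rVert_1 \; + \; c_2 \lVert w \rVert_2^2
```

With the default `c1 = 0` the penalty is purely L2. A non-zero `c1` encourages sparse weights, and CRFsuite then switches to the Orthant-Wise Limited-memory Quasi-Newton (OWL-QN) variant of L-BFGS.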
## Related Projects
- **sklearn-crfsuite**: Scikit-learn compatible wrapper
- **CRFsuite**: Original C/C++ implementation that python-crfsuite wraps
## Citation
```bibtex
@misc{python-crfsuite,
  author = {Scrapinghub},
  title = {python-crfsuite: Python binding to CRFsuite},
  year = {2014},
  publisher = {GitHub},
  url = {https://github.com/scrapinghub/python-crfsuite}
}
@misc{crfsuite,
  author = {Okazaki, Naoaki},
  title = {CRFsuite: A fast implementation of Conditional Random Fields},
  year = {2007},
  url = {http://www.chokkan.org/software/crfsuite/}
}
```