---
title: "python-crfsuite"
type: "resource"
url: "https://github.com/scrapinghub/python-crfsuite"
---

## Overview

python-crfsuite provides Python bindings for the CRFsuite conditional random field toolkit, enabling efficient sequence labeling in Python.

## Key Information

| Field | Value |
|-------|-------|
| **GitHub** | https://github.com/scrapinghub/python-crfsuite |
| **PyPI** | https://pypi.org/project/python-crfsuite/ |
| **Documentation** | https://python-crfsuite.readthedocs.io/ |
| **License** | MIT (python-crfsuite), BSD (CRFsuite) |
| **Latest Version** | 0.9.12 (December 2025) |
| **Stars** | 771 |

## Features

- **Fast Performance**: Faster than the official SWIG-based wrapper
- **No External Dependencies**: CRFsuite is bundled; NumPy/SciPy are not required
- **Python 3 Support**: Recent releases target Python 3; older releases also supported Python 2
- **Cython-based**: A Cython wrapper around the CRFsuite C++ API

## Installation

```bash
# Using pip
pip install python-crfsuite

# Using conda
conda install -c conda-forge python-crfsuite
```

## Usage

### Training

```python
import pycrfsuite

# X_train: list of sequences, each a list of per-token feature lists
# y_train: list of label sequences, aligned with X_train

# Create trainer
trainer = pycrfsuite.Trainer(verbose=True)

# Add training data, one (features, labels) pair per sequence
for xseq, yseq in zip(X_train, y_train):
    trainer.append(xseq, yseq)

# Set parameters
trainer.set_params({
    'c1': 1.0,  # L1 regularization coefficient
    'c2': 0.001,  # L2 regularization coefficient
    'max_iterations': 100,
    # Also generate transition features not seen in the training data
    'feature.possible_transitions': True
})

# Train the model and write it to disk
trainer.train('model.crfsuite')
```
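
The training snippet assumes `X_train` and `y_train` already exist. A minimal sketch of the expected shape, using a made-up two-sentence corpus and a hypothetical helper `to_xy` (neither is part of python-crfsuite):

```python
# Toy labeled corpus, invented for illustration:
# one list of (token, label) pairs per sentence.
corpus = [
    [("Alice", "B-PER"), ("visited", "O"), ("Paris", "B-LOC")],
    [("Bob", "B-PER"), ("sleeps", "O")],
]


def to_xy(sentence):
    """Split one labeled sentence into a feature sequence and a label sequence."""
    xseq = [
        [f"word={token.lower()}", f"is_capitalized={token[0].isupper()}"]
        for token, _ in sentence
    ]
    yseq = [label for _, label in sentence]
    return xseq, yseq


# One feature sequence and one label sequence per sentence, index-aligned.
X_train, y_train = map(list, zip(*(to_xy(s) for s in corpus)))

print(X_train[1])
print(y_train[0])
```

Each `xseq` needs the same length as its `yseq`, since `trainer.append` pairs items with labels position by position.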

### Inference

```python
import pycrfsuite

# Load model
tagger = pycrfsuite.Tagger()
tagger.open('model.crfsuite')

# Predict: x_seq is one sequence of per-token feature lists
y_pred = tagger.tag(x_seq)  # one label per token
```
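
`tagger.tag` labels one sequence at a time. The sketch below runs it over a held-out set and scores the result by token-level accuracy. `tag_all` and `token_accuracy` are hypothetical helpers, not part of python-crfsuite, and `tagger` is assumed to be any object with a `tag(xseq)` method, such as an opened `pycrfsuite.Tagger`.

```python
def tag_all(tagger, X):
    """Tag every feature sequence in X with the given tagger."""
    return [list(tagger.tag(xseq)) for xseq in X]


def token_accuracy(y_true, y_pred):
    """Fraction of tokens whose predicted label equals the gold label."""
    pairs = [
        (gold, pred)
        for gold_seq, pred_seq in zip(y_true, y_pred)
        for gold, pred in zip(gold_seq, pred_seq)
    ]
    if not pairs:
        raise ValueError("no tokens to score")
    return sum(gold == pred for gold, pred in pairs) / len(pairs)
```

With a trained model, `token_accuracy(y_test, tag_all(tagger, X_test))` gives a quick sanity check, where `X_test` and `y_test` are assumed to be held-out data in the same format as the training data.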

### Feature Format

A sequence is a list of items, one per token. Each item is a list of feature strings, conventionally written as `name=value`:

```python
features = [
    ['word=hello', 'pos=NN', 'is_capitalized=True'],
    ['word=world', 'pos=NN', 'is_capitalized=False'],
]
```
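
Feature lists like these are usually generated from raw tokens rather than written by hand. A minimal sketch, assuming tokens arrive as plain strings; the helper names `token_features` and `sentence_features` and the particular feature names are illustrative choices, not something python-crfsuite prescribes:

```python
def token_features(tokens, i):
    """Build the feature strings for the token at position i."""
    word = tokens[i]
    feats = [
        "bias",
        f"word={word.lower()}",
        f"is_capitalized={word[:1].isupper()}",
        f"suffix3={word[-3:].lower()}",
    ]
    # Neighbouring words give the model local context.
    if i > 0:
        feats.append(f"prev_word={tokens[i - 1].lower()}")
    else:
        feats.append("BOS")  # beginning of sequence
    if i < len(tokens) - 1:
        feats.append(f"next_word={tokens[i + 1].lower()}")
    else:
        feats.append("EOS")  # end of sequence
    return feats


def sentence_features(tokens):
    """Convert a tokenized sentence into one feature list per token."""
    return [token_features(tokens, i) for i in range(len(tokens))]


x_seq = sentence_features(["Hello", "world"])
```

The resulting `x_seq` has one feature list per token and can be passed to `trainer.append` (together with a label sequence) or to `tagger.tag`.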

## Training Algorithms

| Algorithm | Description |
|-----------|-------------|
| `lbfgs` | Limited-memory BFGS (default) |
| `l2sgd` | Stochastic gradient descent with L2 regularization |
| `ap` | Averaged Perceptron |
| `pa` | Passive Aggressive |
| `arow` | Adaptive Regularization of Weights |
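
An algorithm from the table is chosen either with the `algorithm` argument of `pycrfsuite.Trainer` or with `trainer.select(...)`. Because the accepted parameter names differ between algorithms, it is safest to select first and set parameters afterwards. A sketch of that ordering, where `configure_trainer` is a hypothetical helper and `trainer` is assumed to expose `select` and `set_params` as `pycrfsuite.Trainer` does:

```python
ALGORITHMS = ("lbfgs", "l2sgd", "ap", "pa", "arow")


def configure_trainer(trainer, algorithm="lbfgs", params=None):
    """Select a training algorithm, then apply its parameters."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown algorithm: {algorithm!r}")
    # Select first: the set of valid parameter names depends on the algorithm.
    trainer.select(algorithm)
    if params:
        trainer.set_params(params)
    return trainer
```

For example, `configure_trainer(pycrfsuite.Trainer(verbose=False), "l2sgd", {"c2": 0.1})` would set up SGD training. `trainer.params()` lists the parameter names the selected algorithm accepts.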

## Parameters (L-BFGS)

| Parameter | Default | Description |
|-----------|---------|-------------|
| `c1` | 0 | L1 regularization coefficient |
| `c2` | 1.0 | L2 regularization coefficient |
| `max_iterations` | unlimited | Maximum number of iterations |
| `num_memories` | 6 | Number of limited memories used to approximate the inverse Hessian |
| `epsilon` | 1e-5 | Convergence threshold |
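
Taken together, `c1` and `c2` act as an elastic-net style penalty on the feature weights $w$. A sketch of the objective that L-BFGS training minimizes, assuming the two penalties are added to the negative log-likelihood with no extra scaling factor (worth checking against the CRFsuite manual):

$$
\min_{w} \; -\sum_{n=1}^{N} \log p\left(y^{(n)} \mid x^{(n)}; w\right) + c_1 \lVert w \rVert_1 + c_2 \lVert w \rVert_2^2
$$

With the defaults (`c1 = 0`, `c2 = 1.0`) only the L2 term is active. A larger `c1` pushes more weights to exactly zero and yields a sparser model.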

## Related Projects

- **sklearn-crfsuite**: scikit-learn compatible wrapper around python-crfsuite
- **CRFsuite**: Original C++ implementation

## Citation

```bibtex
@misc{python-crfsuite,
  author = {Scrapinghub},
  title = {python-crfsuite: Python binding to CRFsuite},
  year = {2014},
  publisher = {GitHub},
  url = {https://github.com/scrapinghub/python-crfsuite}
}

@misc{crfsuite,
  author = {Okazaki, Naoaki},
  title = {CRFsuite: A fast implementation of Conditional Random Fields},
  year = {2007},
  url = {http://www.chokkan.org/software/crfsuite/}
}
```