---
title: "python-crfsuite"
type: "resource"
url: "https://github.com/scrapinghub/python-crfsuite"
---
## Overview
python-crfsuite provides Python bindings for the CRFsuite conditional random field toolkit, enabling efficient sequence labeling in Python.
## Key Information
| Field | Value |
|-------|-------|
| **GitHub** | https://github.com/scrapinghub/python-crfsuite |
| **PyPI** | https://pypi.org/project/python-crfsuite/ |
| **Documentation** | https://python-crfsuite.readthedocs.io/ |
| **License** | MIT (python-crfsuite), BSD (CRFsuite) |
| **Latest Version** | 0.9.12 (December 2025) |
| **Stars** | 771 |
## Features
- **Fast Performance**: Faster than the official SWIG-based wrapper
- **No External Dependencies**: CRFsuite bundled; NumPy/SciPy not required
- **Python 2 & 3 Support**: Works with both Python versions
- **Cython-based**: High-performance C++ bindings
## Installation
```bash
# Using pip
pip install python-crfsuite
# Using conda
conda install -c conda-forge python-crfsuite
```
## Usage
### Training
```python
import pycrfsuite
# Create trainer
trainer = pycrfsuite.Trainer(verbose=True)
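# Add training data: X_train is a list of feature sequences and
# y_train is the matching list of label sequences (one label per item)
for xseq, yseq in zip(X_train, y_train):
    trainer.append(xseq, yseq)
# Set parameters
trainer.set_params({
    'c1': 1.0,    # L1 regularization coefficient
    'c2': 0.001,  # L2 regularization coefficient
    'max_iterations': 100,
    'feature.possible_transitions': True
})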
# Train model
trainer.train('model.crfsuite')
```
### Inference
```python
import pycrfsuite
# Load model
tagger = pycrfsuite.Tagger()
tagger.open('model.crfsuite')
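# Predict: x_seq is a single feature sequence (one feature list per item);
# the result is a list with one label per item
y_pred = tagger.tag(x_seq)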
```
### Feature Format
A sequence is a list of items, and each item is a list of feature strings, conventionally written in `name=value` format:
```python
features = [
    ['word=hello', 'pos=NN', 'is_capitalized=True'],
    ['word=world', 'pos=NN', 'is_capitalized=False'],
]
```
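Feature lists like the one above are usually generated from tokens rather than written by hand. The following is a minimal sketch of one way to do that; the helper names `token_features` and `sentence_features` and the particular features chosen are illustrative assumptions, not part of python-crfsuite.

```python
def token_features(tokens, i):
    """Return name=value feature strings for the token at position i.

    Hypothetical helper: the feature set is only an example.
    """
    word = tokens[i]
    feats = [
        "bias",
        "word.lower=" + word.lower(),
        "is_capitalized=" + str(word[:1].isupper()),
        "is_digit=" + str(word.isdigit()),
    ]
    # Context features: neighbouring words, or boundary markers
    if i == 0:
        feats.append("BOS")
    else:
        feats.append("prev.lower=" + tokens[i - 1].lower())
    if i == len(tokens) - 1:
        feats.append("EOS")
    else:
        feats.append("next.lower=" + tokens[i + 1].lower())
    return feats


def sentence_features(tokens):
    """Build one feature list per token for a whole sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


x_seq = sentence_features(["Hello", "world"])
```

The resulting `x_seq` has the same shape as `features` above, so it could be passed to `trainer.append` together with a label sequence, or to `tagger.tag`.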
## Training Algorithms
| Algorithm | Description |
|-----------|-------------|
| `lbfgs` | Limited-memory BFGS (default) |
| `l2sgd` | SGD with L2 regularization |
| `ap` | Averaged Perceptron |
| `pa` | Passive Aggressive |
| `arow` | Adaptive Regularization of Weights |
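One way to pick an algorithm from the table is to pass its name to the `Trainer` constructor through the `algorithm` argument. The sketch below wraps that call in a hypothetical helper, `make_trainer`, which only checks the name against the table first; the helper and the `ALGORITHMS` mapping are assumptions made for illustration.

```python
# Algorithm names copied from the table above
ALGORITHMS = {
    "lbfgs": "Limited-memory BFGS (default)",
    "l2sgd": "SGD with L2 regularization",
    "ap": "Averaged Perceptron",
    "pa": "Passive Aggressive",
    "arow": "Adaptive Regularization of Weights",
}


def make_trainer(algorithm="lbfgs", verbose=False):
    """Create a trainer for one of the algorithms listed above.

    Hypothetical helper: it rejects unknown names early and then
    delegates to pycrfsuite.Trainer.
    """
    if algorithm not in ALGORITHMS:
        raise ValueError(
            "unknown algorithm %r; expected one of %s"
            % (algorithm, sorted(ALGORITHMS))
        )
    # Imported here so the name check works without the package installed
    import pycrfsuite

    return pycrfsuite.Trainer(algorithm=algorithm, verbose=verbose)
```

Each algorithm accepts its own set of parameters, so a parameter dictionary written for `lbfgs` may not be valid for, say, `ap`.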
## Parameters (L-BFGS)
| Parameter | Default | Description |
|-----------|---------|-------------|
| `c1` | 0 | L1 regularization coefficient |
| `c2` | 1.0 | L2 regularization coefficient |
| `max_iterations` | unlimited | Maximum iterations |
| `num_memories` | 6 | Number of memories for L-BFGS |
| `epsilon` | 1e-5 | Convergence threshold |
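The defaults in this table can be kept in one place and overridden per experiment. This is a small sketch under that assumption; `LBFGS_DEFAULTS` and `lbfgs_params` are hypothetical names, and the helper only knows the parameters listed in the table.

```python
# Defaults from the table above; max_iterations is left out because
# its default is "unlimited"
LBFGS_DEFAULTS = {
    "c1": 0.0,
    "c2": 1.0,
    "num_memories": 6,
    "epsilon": 1e-5,
}
KNOWN_PARAMS = set(LBFGS_DEFAULTS) | {"max_iterations"}


def lbfgs_params(**overrides):
    """Return the table defaults updated with the given overrides.

    Hypothetical helper: unknown names raise ValueError to catch typos.
    """
    unknown = set(overrides) - KNOWN_PARAMS
    if unknown:
        raise ValueError("unknown L-BFGS parameter(s): %s" % sorted(unknown))
    params = dict(LBFGS_DEFAULTS)
    params.update(overrides)
    return params


# Example: add some L1 regularization and cap the number of iterations
params = lbfgs_params(c1=0.1, max_iterations=200)
```

The returned dictionary has the form expected by `trainer.set_params`. Parameters outside the table, such as `feature.possible_transitions`, would need to be added to it separately.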
## Related Projects
- **sklearn-crfsuite**: Scikit-learn compatible wrapper
- **CRFsuite**: Original C++ implementation
## Citation
```bibtex
@misc{python-crfsuite,
author = {Scrapinghub},
title = {python-crfsuite: Python binding to CRFsuite},
year = {2014},
publisher = {GitHub},
url = {https://github.com/scrapinghub/python-crfsuite}
}
@misc{crfsuite,
author = {Okazaki, Naoaki},
title = {CRFsuite: A fast implementation of Conditional Random Fields},
year = {2007},
url = {http://www.chokkan.org/software/crfsuite/}
}
```