---
title: "python-crfsuite"
type: "resource"
url: "https://github.com/scrapinghub/python-crfsuite"
---

## Overview

python-crfsuite provides Python bindings for the CRFsuite conditional random field (CRF) toolkit. It enables fast sequence labeling (for example, part-of-speech tagging or named-entity recognition) from Python.

## Key Information

| Field | Value |
|-------|-------|
| **GitHub** | https://github.com/scrapinghub/python-crfsuite |
| **PyPI** | https://pypi.org/project/python-crfsuite/ |
| **Documentation** | https://python-crfsuite.readthedocs.io/ |
| **License** | MIT (python-crfsuite), BSD (CRFsuite) |
| **Latest Version** | 0.9.12 (December 2025) |
| **Stars** | 771 |

## Features

- **Fast performance**: faster than the official SWIG wrapper shipped with CRFsuite
- **No external dependencies**: CRFsuite is bundled, and NumPy/SciPy are not required
- **Python 2 & 3 support**: works with both Python versions
- **Cython-based**: a compiled Cython extension wrapping CRFsuite's C++ API

## Installation

```bash
# Using pip
pip install python-crfsuite

# Using conda
conda install -c conda-forge python-crfsuite
```

## Usage

### Training

```python
import pycrfsuite

# X_train: list of feature sequences (one list of feature strings per token)
# y_train: list of label sequences; each must be as long as its feature sequence
trainer = pycrfsuite.Trainer(verbose=True)

# Add training data, one sequence at a time
for xseq, yseq in zip(X_train, y_train):
    trainer.append(xseq, yseq)

# Set training parameters (the default algorithm is L-BFGS)
trainer.set_params({
    'c1': 1.0,                             # L1 regularization coefficient
    'c2': 0.001,                           # L2 regularization coefficient
    'max_iterations': 100,                 # stop after at most 100 iterations
    'feature.possible_transitions': True,  # add transition features for all label pairs
})

# Train and save the model to disk
trainer.train('model.crfsuite')
```

### Inference

```python
import pycrfsuite

# Load a trained model
tagger = pycrfsuite.Tagger()
tagger.open('model.crfsuite')

# Predict labels for one sequence (same feature format as in training)
y_pred = tagger.tag(x_seq)  # list of label strings, one per item in x_seq
```

### Feature Format

A sequence is a list of items (tokens), and each item is a list of feature strings. Every string is treated as a binary feature name, so `name=value` is a naming convention rather than parsed syntax. For example, `'is_capitalized=False'` is an active feature in its own right. Items can also be given as dicts that map feature names to float weights.

```python
features = [
    ['word=Hello', 'pos=NN', 'is_capitalized=True'],   # features of token 1
    ['word=world', 'pos=NN', 'is_capitalized=False'],  # features of token 2
]
```

## Training Algorithms

Select an algorithm with `pycrfsuite.Trainer(algorithm=...)`.

| Algorithm | Description |
|-----------|-------------|
| `lbfgs` | Limited-memory BFGS (default) |
| `l2sgd` | Stochastic gradient descent with L2 regularization |
| `ap` | Averaged Perceptron |
| `pa` | Passive Aggressive |
| `arow` | Adaptive Regularization of Weight Vector (AROW) |

## Parameters (L-BFGS)

| Parameter | Default | Description |
|-----------|---------|-------------|
| `c1` | 0 | L1 regularization coefficient |
| `c2` | 1.0 | L2 regularization coefficient |
| `max_iterations` | unlimited | Maximum number of iterations |
| `num_memories` | 6 | Number of memories (limited-memory history size) for L-BFGS |
| `epsilon` | 1e-5 | Convergence threshold |

## Related Projects

- **sklearn-crfsuite**: scikit-learn-compatible wrapper around python-crfsuite
- **CRFsuite**: the original C implementation by Naoaki Okazaki

## Citation

```bibtex
@misc{python-crfsuite,
  author    = {Scrapinghub},
  title     = {python-crfsuite: Python binding to CRFsuite},
  year      = {2014},
  publisher = {GitHub},
  url       = {https://github.com/scrapinghub/python-crfsuite}
}

@misc{crfsuite,
  author = {Okazaki, Naoaki},
  title  = {CRFsuite: A fast implementation of Conditional Random Fields},
  year   = {2007},
  url    = {http://www.chokkan.org/software/crfsuite/}
}
```
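## Example: Feature Extraction and Training

The following sketch connects the feature format and the Trainer/Tagger calls into one pipeline. It first turns tokenized sentences into lists of `name=value` feature strings, then trains and tags with them. The feature set, the helper names (`word2features`, `sent2features`, `train_and_tag`), and the parameter values are hypothetical illustrations, not part of the library. Only `Trainer`, `append`, `set_params`, `train`, `Tagger`, `open`, `tag`, and `close` come from pycrfsuite. `pycrfsuite` is imported inside `train_and_tag` so the feature helpers can be used and tested without it installed.

```python
import os
import tempfile
from typing import List


def word2features(sent: List[str], i: int) -> List[str]:
    """Feature strings for token i (a hypothetical, minimal feature set)."""
    word = sent[i]
    features = [
        "bias",
        "word.lower=" + word.lower(),
        "word.istitle=%s" % word.istitle(),
        "word.isdigit=%s" % word.isdigit(),
        "suffix3=" + word[-3:],
    ]
    # Context features from neighbouring tokens, or sentence-boundary markers
    if i > 0:
        features.append("-1:word.lower=" + sent[i - 1].lower())
    else:
        features.append("BOS")
    if i < len(sent) - 1:
        features.append("+1:word.lower=" + sent[i + 1].lower())
    else:
        features.append("EOS")
    return features


def sent2features(sent: List[str]) -> List[List[str]]:
    """One feature list per token: the sequence format accepted by Trainer.append."""
    return [word2features(sent, i) for i in range(len(sent))]


def train_and_tag(sents: List[List[str]], labels: List[List[str]],
                  test_sent: List[str]) -> List[str]:
    """Train a CRF on (sents, labels), then tag test_sent. Needs python-crfsuite."""
    import pycrfsuite  # lazy import: the helpers above work without it

    trainer = pycrfsuite.Trainer(verbose=False)
    for sent, yseq in zip(sents, labels):
        trainer.append(sent2features(sent), yseq)
    trainer.set_params({"c1": 1.0, "c2": 0.001, "max_iterations": 50})

    with tempfile.TemporaryDirectory() as tmp:
        model_path = os.path.join(tmp, "model.crfsuite")
        trainer.train(model_path)
        tagger = pycrfsuite.Tagger()
        tagger.open(model_path)
        try:
            return tagger.tag(sent2features(test_sent))
        finally:
            tagger.close()  # release the model before the temp directory is removed
```

A call such as `train_and_tag([["John", "lives", "in", "Paris"]], [["B-PER", "O", "O", "B-LOC"]], ["Mary", "lives", "in", "London"])` returns one label string per token. With a single training sentence, the predictions are not meaningful, so use a real labeled corpus in practice. Writing the model to a temporary directory keeps the sketch self-contained. A real project would instead keep `model.crfsuite` on disk and reopen it for inference.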