## Use with the Transformers library

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="LemiSt/code-segmentor-distilbert")
```
```python
# Load the model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("LemiSt/code-segmentor-distilbert")
model = AutoModelForTokenClassification.from_pretrained("LemiSt/code-segmentor-distilbert")
```
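Once the model is loaded, the per-token predictions can be merged into contiguous code/text segments. A minimal post-processing sketch, assuming the token-classification pipeline emits dicts with `word` and `entity` keys and label names like `CODE` and `TEXT` (the actual label names come from the model's config and may differ):

```python
def merge_segments(token_results):
    """Group consecutive tokens with the same predicted label into one segment.

    token_results: a list of dicts shaped like token-classification pipeline
    output, each with "word" and "entity" keys (label names are assumptions).
    """
    segments = []
    for tok in token_results:
        if segments and segments[-1]["label"] == tok["entity"]:
            # Same label as the previous token: extend the current segment.
            segments[-1]["words"].append(tok["word"])
        else:
            # Label changed: start a new segment.
            segments.append({"label": tok["entity"], "words": [tok["word"]]})
    return [(s["label"], " ".join(s["words"])) for s in segments]

# Hypothetical pipeline output for an input mixing prose and code.
sample = [
    {"word": "run", "entity": "TEXT"},
    {"word": "print", "entity": "CODE"},
    {"word": "(1)", "entity": "CODE"},
    {"word": "now", "entity": "TEXT"},
]
print(merge_segments(sample))
# [('TEXT', 'run'), ('CODE', 'print (1)'), ('TEXT', 'now')]
```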

This is a distilbert-base-multilingual-cased model fine-tuned with a token-classification (NER-style) objective to tag each token as belonging either to a code block or to natural-language text. The training dataset of 78,210 examples was generated by randomly combining code and text blocks from other permissively licensed datasets; some examples contain only code and some only regular text.
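The mixing procedure described above can be sketched as follows. This is an illustrative reconstruction, not the actual generation script; the block pools, labels, and mixing probabilities are all assumptions:

```python
import random

def make_example(code_blocks, text_blocks, rng=random):
    """Randomly interleave code and text blocks into one labeled example.

    Returns a list of (label, block) pairs; labels "CODE"/"TEXT" and the
    1-4 block count are illustrative assumptions.
    """
    blocks = []
    for _ in range(rng.randint(1, 4)):
        if rng.random() < 0.5:
            blocks.append(("CODE", rng.choice(code_blocks)))
        else:
            blocks.append(("TEXT", rng.choice(text_blocks)))
    return blocks

rng = random.Random(0)  # seeded for reproducibility
example = make_example(["print(1)"], ["hello world"], rng)
print(example)
```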

The model achieves the following stats on the validation set:

| Metric    | Value  |
|-----------|--------|
| Loss      | 0.0788 |
| F1 score  | 0.8619 |
| Precision | 0.8362 |
| Recall    | 0.8893 |
| Accuracy  | 0.9792 |
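As a quick sanity check, the reported F1 score is consistent with the precision and recall via the standard harmonic-mean formula:

```python
precision, recall = 0.8362, 0.8893

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.8619, matching the reported value
```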