Documents in fineweb dataset may exceed max context length of this classifier
#6
by ZefanW - opened
How are these pieces dealt with in fineweb-edu curation?
Samples are truncated to the model's context length. You can find the inference code here: https://github.com/huggingface/cosmopedia/blob/main/classification/run_edu_bert.py
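For illustration, here is a minimal sketch of what truncating to the context length can look like with Transformers. It is not copied from the linked script. The function names `truncate_for_bert` and `score_documents` are hypothetical, and it assumes the classifier exposes a single regression logit per document and that the tokenizer's configured maximum length matches the model's context length.

```python
from typing import List

MODEL_ID = "HuggingFaceFW/fineweb-edu-classifier"


def truncate_for_bert(
    token_ids: List[int], max_length: int, cls_id: int, sep_id: int
) -> List[int]:
    """Hypothetical helper showing head truncation for a single sequence.

    Keeps the first (max_length - 2) content tokens and wraps them in the
    [CLS] and [SEP] special tokens, so the result never exceeds max_length.
    Everything past that point in the document is simply dropped.
    """
    if max_length < 2:
        raise ValueError("max_length must leave room for [CLS] and [SEP]")
    body = token_ids[: max_length - 2]
    return [cls_id] + body + [sep_id]


def score_documents(texts: List[str]) -> List[float]:
    """Hypothetical scoring function: long inputs are truncated, not chunked.

    Imports are done lazily so the pure-Python helper above can be used
    without torch or transformers installed.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    # truncation=True cuts each text to the tokenizer's configured maximum
    # length (assumed here to equal the model's context length).
    inputs = tokenizer(
        texts, return_tensors="pt", padding="longest", truncation=True
    )
    with torch.no_grad():
        # Assumption: one regression logit per document, shape (batch, 1).
        logits = model(**inputs).logits.squeeze(-1)
    return logits.float().tolist()
```

Under this approach only the beginning of an over-long document influences its score. Calling `score_documents(["some very long web page text ..."])` would return one float per input, whatever the input's length.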