	---
	tags:
	- lora
	- text-classification
	- cefr
	- en
	base_model: microsoft/deberta-v3-large
	license: cc-by-nc-sa-4.0
	language:
	- en
	pipeline_tag: text-classification
	datasets:
	- dksysd/cefr-classification
	---

	# CEFR Classifier

	A text classification model that predicts CEFR (Common European Framework of Reference for Languages) levels (A1-C2) for English texts.

	Fine-tuned from `microsoft/deberta-v3-large`.

	## Model Performance

	Confusion matrix on the parallel corpus dataset:
	![confusion_matrix_parallel](https://cdn-uploads.huggingface.co/production/uploads/67c124daa19ae7b9efa277a1/yWEuGel3zHSH4wf_a5uZt.png)

	Confusion matrix on the instruction dataset:
	![confusion_matrix_instruction](https://cdn-uploads.huggingface.co/production/uploads/67c124daa19ae7b9efa277a1/RRQdVcwyuo3Y9NZO9aBXN.png)

	## Quick Start

	### Simple Usage (Recommended)
	```python
	from transformers import pipeline

	# Load the classifier
	classifier = pipeline("text-classification", model="dksysd/cefr-classifier")

	# Classify a text
	text = "This is a sample sentence to classify."
	result = classifier(text)

	print(result)
	# [{'label': 'A1', 'score': 0.535}]
	```

	### Get All Class Probabilities
	```python
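	from transformers import pipeline

	classifier = pipeline("text-classification", model="dksysd/cefr-classifier")
	text = "This is a sample sentence to classify."

	# top_k=None returns a score for every label (it replaces the deprecated return_all_scores=True).
	# Passing a list gives one list of label scores per input text.
	result = classifier([text], top_k=None)[0]

	for item in result:
	    print(f"{item['label']}: {item['score']:.4f}")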
	```

	### Batch Processing
	```python
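	from transformers import pipeline

	# Default pipeline: returns only the top label for each text
	classifier = pipeline("text-classification", model="dksysd/cefr-classifier")

	texts = [
	    "The cat sat on the mat.",
	    "Quantum entanglement represents a fundamental phenomenon in physics.",
	    "I like pizza.",
	]

	results = classifier(texts)

	for text, result in zip(texts, results):
	    print(f"{text} -> {result['label']} ({result['score']:.3f})")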
	```

	## Advanced Usage

	### Manual Loading with PyTorch

	For more control over the inference process:
	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	# Load model and tokenizer
	model_name = "dksysd/cefr-classifier"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name)

	# Setup device
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	model.to(device)
	model.eval()

	# Label mapping
	id2label = {0: 'A1', 1: 'A2', 2: 'B1', 3: 'B2', 4: 'C1', 5: 'C2'}

	# Inference
	text = "Your text here"
	inputs = tokenizer(text, padding="max_length", truncation=True,
	                   max_length=1024, return_tensors="pt").to(device)

	with torch.no_grad():
	    outputs = model(**inputs)

	probs = torch.softmax(outputs.logits, dim=-1)[0]
	pred_idx = torch.argmax(probs).item()

	print(f"Predicted: {id2label[pred_idx]} (confidence: {probs[pred_idx]:.4f})")
	```
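
The softmax and argmax step at the end of the script can be checked in isolation, without downloading the model. The following is a minimal sketch in plain Python. The function name `decode_logits` and the example logits are made up for illustration and are not real model outputs.

```python
import math

# Same label order as the id2label mapping in the script above.
ID2LABEL = {0: "A1", 1: "A2", 2: "B1", 3: "B2", 4: "C1", 5: "C2"}


def decode_logits(logits):
    """Return (label, confidence) for one row of six logits.

    Hypothetical helper for illustration; not part of the model or of transformers.
    """
    # Subtract the max before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the highest probability (first one wins on ties).
    pred_idx = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[pred_idx], probs[pred_idx]


# Made-up logits, chosen so that index 2 (B1) is the largest.
label, confidence = decode_logits([0.1, 0.3, 2.0, 1.2, -0.5, -1.0])
print(f"Predicted: {label} (confidence: {confidence:.4f})")
```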

	## CEFR Levels

	- A1: Beginner
	- A2: Elementary
	- B1: Intermediate
	- B2: Upper Intermediate
	- C1: Advanced
	- C2: Proficient
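
The level names above can be attached to the classifier output with a small lookup. This is a minimal sketch: the helper name `describe_prediction` is hypothetical, and the example prediction reuses the output format shown in Quick Start instead of calling the model.

```python
# Mapping from CEFR label to the descriptions listed above.
CEFR_DESCRIPTIONS = {
    "A1": "Beginner",
    "A2": "Elementary",
    "B1": "Intermediate",
    "B2": "Upper Intermediate",
    "C1": "Advanced",
    "C2": "Proficient",
}


def describe_prediction(prediction):
    """Format one pipeline result dict, e.g. {'label': 'A1', 'score': 0.535}.

    Hypothetical helper for illustration; not part of the model or of transformers.
    """
    label = prediction["label"]
    description = CEFR_DESCRIPTIONS.get(label, "Unknown level")
    return f"{label} ({description}), score {prediction['score']:.3f}"


print(describe_prediction({"label": "A1", "score": 0.535}))
```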


	## License

	This model is released under the CC-BY-NC-SA-4.0 license.