# DistilBERT French Multilingual Sequence Classification
This is a distilled version of BERT fine-tuned for multilingual sequence classification, with a focus on French text processing. The model demonstrates strong performance across French, Hindi, and English.
## Model Details
- Model Type: DistilBERT for Sequence Classification
- Base Architecture: 6-layer transformer with 12 attention heads
- Hidden Size: 768
- Vocabulary Size: 119,547 tokens
- Max Sequence Length: 512 tokens
- Languages: Multilingual (French, Hindi, English)
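These values can be checked directly against the published configuration. A minimal sketch, assuming the placeholder repo id used in the usage examples below:

```python
from transformers import DistilBertConfig

# Load the configuration from the Hub (repo id is a placeholder)
config = DistilBertConfig.from_pretrained("your-username/distilled_bert_french_12")

# Should match the values listed above
print(config.n_layers)                 # 6
print(config.n_heads)                  # 12
print(config.dim)                      # 768
print(config.vocab_size)               # 119547
print(config.max_position_embeddings)  # 512
```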
## Performance Metrics
The model achieved the following performance on evaluation datasets:
### Validation Set (Overall)
- Accuracy: 96.75%
- F1 Score: 96.78%
- Precision: 95.77%
- Recall: 97.82%
### Language-Specific Performance
#### French
- Accuracy: 77.32%
- F1 Score: 80.88%
- Precision: 69.46%
- Recall: 96.81%
- Samples: 4,267
#### Hindi
- Accuracy: 80.17%
- F1 Score: 83.17%
- Precision: 72.14%
- Recall: 98.17%
- Samples: 2,500
#### English
- Accuracy: 97.22%
- Samples: 3,233
### External Dataset Performance
- TURNS2K Dataset: 90.25% accuracy, 90.96% F1 score
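For reference, metrics of this kind can be computed with scikit-learn. The label arrays below are hypothetical placeholders, and `average="binary"` is an assumption based on the single precision/recall figures reported above:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical gold labels and predictions for one language-specific split
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(f"Accuracy: {accuracy:.2%}, Precision: {precision:.2%}, Recall: {recall:.2%}, F1: {f1:.2%}")
```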
## Model Configuration
```json
{
  "model_type": "distilbert",
  "architectures": ["DistilBertForSequenceClassification"],
  "n_layers": 6,
  "n_heads": 12,
  "dim": 768,
  "hidden_dim": 3072,
  "max_position_embeddings": 512,
  "vocab_size": 119547,
  "activation": "gelu",
  "attention_dropout": 0.1,
  "dropout": 0.1,
  "seq_classif_dropout": 0.2
}
```
## Files Included
- `config.json`: Model configuration
- `tokenizer_config.json`: Tokenizer configuration
- `tokenizer.json`: Fast tokenizer file
- `vocab.txt`: Vocabulary file
- `special_tokens_map.json`: Special tokens mapping
- `bert_model.onnx`: ONNX model for inference
- `bert_model_optimized.onnx`: Optimized ONNX model
- `bert_model_optimized_dynamic_int8.onnx`: INT8 quantized ONNX model
- `metrics.yaml`: Detailed performance metrics
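Note that `from_pretrained` only fetches the standard Transformers files; the ONNX variants and `metrics.yaml` can be downloaded individually, for example with `huggingface_hub` (repo id is a placeholder):

```python
from huggingface_hub import hf_hub_download

# Download one of the ONNX variants from the Hub (repo id is a placeholder)
onnx_path = hf_hub_download(
    repo_id="your-username/distilled_bert_french_12",
    filename="bert_model_optimized.onnx",
)
print(onnx_path)  # local cache path of the downloaded file
```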
## Usage
### With Transformers
```python
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
import torch

# Load model and tokenizer
model = DistilBertForSequenceClassification.from_pretrained("your-username/distilled_bert_french_12")
tokenizer = DistilBertTokenizer.from_pretrained("your-username/distilled_bert_french_12")

# Example inference
text = "Votre texte français ici"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

print(f"Predictions: {predictions}")
```
### With ONNX Runtime
```python
import onnxruntime as ort
from transformers import DistilBertTokenizer
import numpy as np

# Load tokenizer and ONNX model
tokenizer = DistilBertTokenizer.from_pretrained("your-username/distilled_bert_french_12")
session = ort.InferenceSession("bert_model_optimized.onnx")

# Prepare input (ONNX exports typically expect int64 tensors)
text = "Votre texte français ici"
inputs = tokenizer(text, return_tensors="np", truncation=True, padding=True, max_length=512)

# Run inference
outputs = session.run(None, {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64),
})
predictions = outputs[0]

print(f"Predictions: {predictions}")
```
## Training Details
- Training Steps: 8,000
- Epochs: 2
- Framework: PyTorch/Transformers
- Optimizer: AdamW (inferred)
- Learning Rate Schedule: Cosine with warmup (inferred)
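Since the optimizer and schedule are inferred rather than confirmed, the following is only a sketch of how such a setup is typically wired in PyTorch/Transformers, reusing `model` from the usage example above; the learning rate and warmup steps are assumptions:

```python
from torch.optim import AdamW
from transformers import get_cosine_schedule_with_warmup

# Hyperparameters below are assumptions, not confirmed training values
optimizer = AdamW(model.parameters(), lr=5e-5)  # assumed learning rate
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=500,     # assumed warmup
    num_training_steps=8000,  # matches the reported training steps
)
```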
## Optimization
The model includes three ONNX variants for different deployment scenarios:
- Standard ONNX (`bert_model.onnx`): Full-precision model
- Optimized ONNX (`bert_model_optimized.onnx`): Graph optimizations applied
- INT8 Quantized (`bert_model_optimized_dynamic_int8.onnx`): Dynamically quantized for faster inference
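As one illustration, the INT8 variant can be produced from the optimized model with ONNX Runtime's dynamic quantization API (a sketch, assuming the file names listed above):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamically quantize weights to INT8; activations remain in floating point
quantize_dynamic(
    model_input="bert_model_optimized.onnx",
    model_output="bert_model_optimized_dynamic_int8.onnx",
    weight_type=QuantType.QInt8,
)
```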
## License
Please ensure you comply with the original BERT license and any dataset licenses used during training.
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{distilled_bert_french_12,
  title={DistilBERT French Multilingual Sequence Classification},
  author={Your Name},
  year={2024},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/your-username/distilled_bert_french_12}
}
```
## Contact
For questions or issues, please open an issue in the model repository or contact [your-email@example.com].