DeBERTa-v3-base-Zyda-2
Model Description
This model is a fine-tuned version of microsoft/deberta-v3-base on a subset of the Zyphra/Zyda-2 dataset. It was trained using the Masked Language Modeling (MLM) objective to enhance its understanding of the English language.
Performance
The model achieves the following results on the evaluation set:
- Loss: 2.1833
- Accuracy: 0.6191
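For intuition, the evaluation loss can be converted to a perplexity. The sketch below assumes the reported loss is the mean cross-entropy over masked tokens, measured in nats, which is the usual convention for MLM training in Transformers; the model card does not state this explicitly.

```python
import math

# Evaluation loss reported in the model card
eval_loss = 2.1833

# Assumption: the loss is mean cross-entropy in nats, so perplexity = exp(loss)
perplexity = math.exp(eval_loss)
print(f"Perplexity: {perplexity:.2f}")
```

Under that assumption the perplexity comes out at roughly 8.9, meaning the model is on average about as uncertain as if it were choosing uniformly among nine candidate tokens for each masked position.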
Intended Uses & Limitations
This model is designed to be used and fine-tuned for the following tasks:
- Text embedding
- Text classification
- Fill-in-the-blank tasks
Limitations:
- English language only
- May be inaccurate for specialized jargon, dialects, slang, code, and LaTeX
Training Data
The training data consists of the first 300 000 rows of the Zyphra/Zyda-2 dataset, with 5% of those rows held out for validation.
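The split sizes implied by these figures can be worked out directly. This is only arithmetic on the numbers above; the model card does not say which rows were held out or how the split was drawn.

```python
# Figures from the model card
total_rows = 300_000
validation_percent = 5

# Integer arithmetic keeps the row counts exact
validation_rows = total_rows * validation_percent // 100
train_rows = total_rows - validation_rows

print(f"train: {train_rows:,}  validation: {validation_rows:,}")
```

This gives 285,000 rows for training and 15,000 for validation.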
Training Procedure
Hyperparameters
The following hyperparameters were used during training:
- Learning rate: 5e-05
- Train batch size: 8
- Eval batch size: 8
- Seed: 42
- Optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- Learning rate scheduler: Linear
- Number of epochs: 1.0
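A linear scheduler decays the learning rate from its initial value to zero over the course of training. The sketch below is plain Python written for illustration, not the Transformers implementation. `linear_lr`, `total_steps`, and `warmup_steps` are hypothetical names and values, since the model card reports neither the number of optimizer steps nor any warmup.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 5e-5, warmup_steps: int = 0) -> float:
    """Learning rate at `step` under optional linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run length, chosen only for illustration
total_steps = 1_000
for step in (0, 250, 500, 750, 1_000):
    print(step, linear_lr(step, total_steps))
```

With no warmup, the rate starts at 5e-05, reaches half that value at the midpoint, and hits zero on the final step.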
Framework versions
- Transformers: 4.46.3
- PyTorch: 2.5.1+cu124
- Datasets: 3.1.0
- Tokenizers: 0.20.3
Usage Examples
Masked Language Modeling
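from transformers import pipeline

# Build a fill-mask pipeline backed by this checkpoint
unmasker = pipeline('fill-mask', model='agentlans/deberta-v3-base-zyda-2')
# Returns a list of candidate fills, each a dict with 'score', 'token_str', and 'sequence'
result = unmasker("[MASK] is the capital of France.")
print(result)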
Text Embedding
from transformers import AutoTokenizer, AutoModel
import torch

model_name = "agentlans/deberta-v3-base-zyda-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

text = "Example sentence for embedding."
inputs = tokenizer(text, return_tensors='pt')

# Run the encoder without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token vectors into one embedding per input text
embeddings = outputs.last_hidden_state.mean(dim=1)
print(embeddings)
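The example above averages over every token position, which is fine for a single unpadded sentence. When several texts are batched with padding, the padded positions would be averaged in as well. Below is a minimal sketch of mask-aware mean pooling, using small synthetic tensors in place of real model output. `masked_mean_pool` is a hypothetical helper, not part of Transformers.

```python
import torch


def masked_mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors per sequence, ignoring positions where the mask is 0."""
    # (batch, seq_len) -> (batch, seq_len, 1) so the mask broadcasts over the hidden dimension
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    # Guard against division by zero for an all-padding row
    counts = mask.sum(dim=1).clamp(min=1.0)
    return summed / counts


# Synthetic stand-in for model output: 2 sequences, 3 token positions, hidden size 2
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],
                       [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]])
# The third position of the first sequence is padding
mask = torch.tensor([[1, 1, 0],
                     [1, 1, 1]])

pooled = masked_mean_pool(hidden, mask)
print(pooled)
```

With real inputs, the same function would be called as `masked_mean_pool(outputs.last_hidden_state, inputs["attention_mask"])` after tokenizing a list of texts with `padding=True`.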
Ethical Considerations and Bias
As this model is trained on a subset of the Zyda-2 dataset, it may inherit biases present in that data. Users should be aware of potential biases and evaluate the model's output critically, especially for sensitive applications.
Additional Information
For more details about the base model, please refer to microsoft/deberta-v3-base.