
Model Details

Model Description

This is a Sentence Transformer model fine-tuned from facebook/drama-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for:

βœ… Semantic Textual Similarity
βœ… Semantic Search
βœ… Paraphrase Mining
βœ… Text Classification
βœ… Clustering

Model Type: Sentence Transformer
Base Model: facebook/drama-base
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Model Size: ~0.2B parameters (FP16, safetensors)

πŸ“š Model Sources

Documentation: Sentence Transformers Documentation (https://www.sbert.net)
Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
Model Card: Hugging Face Model Card
πŸ›  Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
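
The settings above can be verified programmatically once the model is loaded (installation is covered under Usage below). A minimal sketch, with "your_model_name" standing in for this model's repository ID:

from sentence_transformers import SentenceTransformer

# "your_model_name" is a placeholder for this model's repository ID
model = SentenceTransformer("your_model_name")

print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 768
print(model)                                     # prints the module stack shown above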

πŸ’‘ Usage

Direct Usage (Sentence Transformers)
First, install the required libraries:

pip install -U sentence-transformers torch
Then, load the model and run inference:

from sentence_transformers import SentenceTransformer
import torch

# Load the model (replace "your_model_name" with this model's repository ID);
# weights are stored in FP16, and the model is moved to GPU when available
model = SentenceTransformer("your_model_name").to("cuda" if torch.cuda.is_available() else "cpu")

# Encode Sentences
sentences = [
    "Artificial Intelligence is evolving rapidly.",
    "Machine Learning is a subset of AI.",
    "This is a random sentence."
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # Output: (3, 768)

# Compute cosine similarity between two embeddings
# (the model ends in a Normalize() layer, so this equals their dot product)
def get_similarity(emb1, emb2):
    return torch.nn.functional.cosine_similarity(torch.tensor(emb1), torch.tensor(emb2), dim=0).item()

similarity_score = get_similarity(embeddings[0], embeddings[1])
print(f"Similarity Score: {similarity_score:.4f}")

πŸ“Š Training Details

Training Dataset

Dataset: STS-B (Semantic Textual Similarity Benchmark)
Size: 5,749 training samples
Columns: sentence_0, sentence_1, label

Samples

sentence_0 | sentence_1 | label
Biostatistics in Public Health | Statistics | 1
Vital Signs: Understanding What the Body Is Telling Us | Data Science | 0
Camino a la Excelencia en Gestión de Proyectos | Cybersecurity | 0

Loss: a contrastive loss over cosine distance, configured as follows (see the training sketch after the hyperparameters table):

{
    "distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
    "margin": 0.5,
    "size_average": true
}

πŸ”§ Training Hyperparameters

Hyperparameter	Value
per_device_train_batch_size	16
per_device_eval_batch_size	16
learning_rate	2e-5
epochs	1
optimizer	AdamW
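
The card does not include the training script itself. Below is a minimal sketch, assuming the Sentence Transformers 3.x trainer, the ContrastiveLoss implied by the configuration above (cosine distance, margin 0.5), and the hyperparameters from the table. The three pairs are the sample rows shown earlier (the real run used 5,749 pairs), and loading facebook/drama-base may require extra options (e.g. trust_remote_code) not shown here:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import ContrastiveLoss

# Base model named in this card
model = SentenceTransformer("facebook/drama-base")

# Tiny illustrative dataset built from the sample rows above
train_dataset = Dataset.from_dict({
    "sentence_0": [
        "Biostatistics in Public Health",
        "Vital Signs: Understanding What the Body Is Telling Us",
        "Camino a la Excelencia en Gestión de Proyectos",
    ],
    "sentence_1": ["Statistics", "Data Science", "Cybersecurity"],
    "label": [1, 0, 0],
})

# Contrastive loss over cosine distance with margin 0.5, matching the config above
loss = ContrastiveLoss(model, margin=0.5)

args = SentenceTransformerTrainingArguments(
    output_dir="drama-base-contrastive",   # hypothetical output path
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
model.save_pretrained("drama-base-contrastive/final")
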
βš™ Framework Versions
Library	Version
Python	3.12.7
Sentence Transformers	3.4.1
Transformers	4.49.0
PyTorch	2.5.1+cu124
Accelerate	1.3.0
Datasets	3.2.0
Tokenizers	0.21.0