johnnyboycurtis/ModernBERT-small

Tags: Sentence Similarity, sentence-transformers, feature-extraction, Generated from Trainer, dataset_size:5749, loss:CosineSimilarityLoss, Eval Results (legacy), text-embeddings-inference


johnnyboycurtis committed on Feb 18

Commit e0d0fe2 · verified · 1 Parent(s): c8b219e

Update README.md

Files changed (1)

README.md +12 -2


@@ -76,9 +76,19 @@ model-index:
 license: mit
 ---
-# SentenceTransformer
-This is a [sentence-transformers](https://www.SBERT.net) model based on a custom ModernBERT-Small architecture, trained from scratch using a multi-stage pipeline. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 ## Model Details

 license: mit
 ---
+# SentenceTransformer (Legacy)
+This is a [sentence-transformers](https://www.SBERT.net) model based on an initial custom ModernBERT-Small architecture, trained from scratch using a multi-stage pipeline including MLM pre-training and semantic fine-tuning. It maps sentences & paragraphs to a **384-dimensional dense vector space**.
+## Warning
+This model was an early exploration into creating a Wide model.
+**⚠️ Legacy Status: NOT RECOMMENDED.**
+This initial implementation suffered from suboptimal architectural scaling decisions made during the initialization phase, particularly concerning the feed-forward network capacity relative to the depth.
+**👉 Recommended Successor:** For superior performance, speed, and architectural coherence, please use the improved version: [**`johnnyboycurtis/ModernBERT-small-v2`**](https://huggingface.co/johnnyboycurtis/ModernBERT-small-v2). The successor model addresses these limitations via a more sophisticated Guided Weight Initialization (GUIDE) technique and specialized Knowledge Distillation tuning.
 ## Model Details