electroglyph committed on
Commit 68154e2 · verified · 1 Parent(s): f9e0a39

Update README.md

Files changed (1): README.md (+3 −3)
README.md CHANGED
@@ -20,15 +20,15 @@ This model was finetuned with [Unsloth](https://github.com/unslothai/unsloth).
 [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
 based on Alibaba-NLP/gte-modernbert-base
 
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base) on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 
 This model is finetuned specifically for fiction retrieval. It's been trained on sci-fi, fantasy, mystery, and other fiction genres.
 
 Dataset size: 800k rows based on 100% manually cleaned data.
 
-This model surpasses the Qwen3 4B embedding model on my test set (40k examples with hard negatives) by 0.5%.
+This model surpasses the Qwen3 4B embedding model on my test split benchmark (40k examples with hard negatives) by 0.5%.
 
-Model accuracy increased from 90.8% to 95.7% on the test set.
+Model accuracy increased from 90.8% to 95.7% on the test split.
 
 Some MTEB benchmarks saw some pretty big losses; they're detailed below.
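The README describes mapping sentences to a 768-dimensional dense vector space for semantic search. A minimal sketch of the retrieval step over such embeddings — using random vectors as stand-ins, since in practice the vectors would come from the model's `encode` call in the sentence-transformers library:

```python
import numpy as np

# Toy stand-ins for 768-dimensional sentence embeddings.
# Real embeddings would come from SentenceTransformer.encode;
# random vectors are used here purely for illustration.
rng = np.random.default_rng(0)
query = rng.normal(size=768)        # embedding of the search query
docs = rng.normal(size=(3, 768))    # embeddings of three candidate passages

def cosine_sim(q, d):
    # Cosine similarity: dot product of L2-normalized vectors.
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=-1, keepdims=True)
    return d @ q

scores = cosine_sim(query, docs)    # one similarity score per document
best = int(np.argmax(scores))       # index of the most similar document
```

Ranking documents by these cosine scores is the standard way dense retrieval is done with sentence-transformers models like this one.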