handwoven8588/CodeRankEmbed-flash-attn

sentence-transformers

flash-attention



Commit History

README: drop pipeline_tag (feature-extraction)

e361c6f
verified

handwoven8588 committed 4 days ago

README: reframe bf16 weights — reason is flash_attn half-precision + runtime-bf16 (not download size)

aa2548a
verified

handwoven8588 committed 4 days ago

README pass: base_model_relation=quantized, fix CPU-eager wording (bf16, not bit-identical), drop internal refs + first-person plural

7961156
verified

handwoven8588 committed 4 days ago

README: fill 3090 Ti perf table (bf16 flash 2.1GB/162k tok/s vs fp32 eager 6.7GB/52k tok/s; cosine 0.9986)

b283422
verified

handwoven8588 committed 4 days ago

v2: bf16 weights (547->274MB) + from_pretrained torch_dtype fix (loads bf16 natively) + corrected model tree (base_model=nomic-ai/CodeRankEmbed only) + bf16-derivative README

581207a
verified

handwoven8588 committed 4 days ago

CodeRankEmbed with native flash-attn varlen forward (derivative of nomic-ai/CodeRankEmbed; identical weights; flash-vs-fp32 parity cosine 0.99999, eager fallback bit-identical)

22d2b3c
verified

handwoven8588 committed 4 days ago

initial commit

4cce5bd
verified

handwoven8588 committed 4 days ago