ondayex/jina-embed-base-dense-retriever

Sentence Similarity
sentence-transformers
Safetensors
English
qwen2
feature-extraction
dense
Generated from Trainer
dataset_size:900
loss:MatryoshkaLoss
loss:MultipleNegativesRankingLoss
text-embeddings-inference

Instructions to use ondayex/jina-embed-base-dense-retriever with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

  • Libraries
  • sentence-transformers

    How to use ondayex/jina-embed-base-dense-retriever with sentence-transformers:

    from sentence_transformers import SentenceTransformer
    
    model = SentenceTransformer("ondayex/jina-embed-base-dense-retriever")
    
    sentences = [
        "Best practices for async test_similarity_search_with_relevance_score_with_threshold_and_filter",
        "def test_tool_retry_custom_failure_formatter() -> None:\n    \"\"\"Test ToolRetryMiddlewarewith custom failure message formatter.\"\"\"\n\n    def custom_formatter(exc: Exception) -> str:\n        return f\"Custom error: {type(exc).__name__}\"\n\n    model = FakeToolCallingModel(\n        tool_calls=[\n            [ToolCall(name=\"failing_tool\", args={\"value\": \"test\"}, id=\"1\")],\n            [],\n        ]\n    )\n\n    retry = ToolRetryMiddleware(\n        max_retries=1,\n        initial_delay=0.01,\n        jitter=False,\n        on_failure=custom_formatter,\n    )\n\n    agent = create_agent(\n        model=model,\n        tools=[failing_tool],\n        middleware=[retry],\n        checkpointer=InMemorySaver(),\n    )\n\n    result = agent.invoke(\n        {\"messages\": [HumanMessage(\"Use failing tool\")]},\n        {\"configurable\": {\"thread_id\": \"test\"}},\n    )\n\n    tool_messages = [m for m in result[\"messages\"] if isinstance(m, ToolMessage)]\n    assert len(tool_messages) == 1\n    assert \"Custom error: ValueError\" in tool_messages[0].content",
        "def test_parse_scores(answer: str) -> None:\n    result = output_parser.parse(answer)\n\n    assert result[\"answer\"] == \"foo bar answer.\"\n\n    score = int(result[\"score\"])\n    assert score == 80",
        "async def test_similarity_search_with_relevance_score_with_threshold_and_filter(\n    vector_name: str | None,\n    qdrant_location: str,\n) -> None:\n    \"\"\"Test end to end construction and search.\"\"\"\n    texts = [\"foo\", \"bar\", \"baz\"]\n    metadatas = [\n        {\"page\": i, \"metadata\": {\"page\": i + 1, \"pages\": [i + 2, -1]}}\n        for i in range(len(texts))\n    ]\n    docsearch = Qdrant.from_texts(\n        texts,\n        ConsistentFakeEmbeddings(),\n        metadatas=metadatas,\n        vector_name=vector_name,\n        location=qdrant_location,\n    )\n    score_threshold = 0.99  # for almost exact match\n    # test negative filter condition\n    negative_filter = {\"page\": 1, \"metadata\": {\"page\": 2, \"pages\": [3]}}\n    kwargs = {\"filter\": negative_filter, \"score_threshold\": score_threshold}\n    output = docsearch.similarity_search_with_relevance_scores(\"foo\", k=3, **kwargs)\n    assert len(output) == 0\n    # test positive filter condition\n    positive_filter = {\"page\": 0, \"metadata\": {\"page\": 1, \"pages\": [2]}}\n    kwargs = {\"filter\": positive_filter, \"score_threshold\": score_threshold}\n    output = await docsearch.asimilarity_search_with_relevance_scores(\n        \"foo\", k=3, **kwargs\n    )\n    assert len(output) == 1\n    assert all(score >= score_threshold for _, score in output)"
    ]
    embeddings = model.encode(sentences)
    
    similarities = model.similarity(embeddings, embeddings)
    print(similarities.shape)
    # torch.Size([4, 4])
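The snippet above computes a full 4 x 4 similarity matrix. For retrieval, the more common step is to rank candidate documents against a single query. The sketch below illustrates that step in plain Python. The vectors are made-up stand-ins for `model.encode(...)` output, and `truncate_and_normalize` and `rank_documents` are hypothetical helper names, not part of sentence-transformers. The optional `dim` argument mimics Matryoshka-style truncation, which the `loss:MatryoshkaLoss` tag hints at. The page does not list the trained dimensions, so any particular truncation size is an assumption.

```python
import math


def truncate_and_normalize(vec, dim=None):
    """Keep the first `dim` components (all if None) and scale to unit length."""
    head = list(vec if dim is None else vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def rank_documents(query_vec, doc_vecs, dim=None):
    """Return (doc_index, cosine_similarity) pairs, best match first."""
    q = truncate_and_normalize(query_vec, dim)
    scored = []
    for i, doc in enumerate(doc_vecs):
        d = truncate_and_normalize(doc, dim)
        # Both vectors are unit length, so the dot product is the cosine.
        scored.append((i, sum(a * b for a, b in zip(q, d))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 4-dimensional stand-ins for real embeddings (invented numbers).
query = [0.9, 0.1, 0.0, 0.4]
docs = [
    [0.1, 0.9, 0.2, 0.0],
    [0.0, 0.2, 0.9, 0.1],
    [0.8, 0.2, 0.1, 0.5],
]

full = rank_documents(query, docs)          # use every dimension
short = rank_documents(query, docs, dim=2)  # keep only the first two

print([i for i, _ in full])
print([i for i, _ in short])
```

With real embeddings, the first entry of `sentences` would play the role of `query` and the remaining entries the role of `docs`. sentence-transformers also accepts a `truncate_dim` argument on `SentenceTransformer`. Whether truncated embeddings from this particular model stay accurate is not stated on the page.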
jina-embed-base-dense-retriever
1.99 GB
  • 1 contributor
History: 2 commits
Latest commit: ondayex, "Add new SentenceTransformer model" (3d6d02b, verified), 4 months ago
Files (each last changed by "Add new SentenceTransformer model", 4 months ago):
  • 1_Pooling
  • .gitattributes (1.57 kB)
  • README.md (27.8 kB)
  • added_tokens.json (30 Bytes)
  • config.json (1.49 kB)
  • config_sentence_transformers.json (1.01 kB)
  • merges.txt (1.67 MB)
  • model.safetensors (1.98 GB, xet)
  • modules.json (349 Bytes)
  • sentence_bert_config.json (57 Bytes)
  • special_tokens_map.json (441 Bytes)
  • tokenizer.json (11.4 MB, xet)
  • tokenizer_config.json (553 Bytes)
  • vocab.json (2.78 MB)