Shuu12121/Owl-ph2-len2048 🦉

 ██████╗ ██╗    ██╗██╗                ██████╗ ██╗      ██╗
██╔═══██╗██║    ██║██║               ██╔════╝ ██║      ██║      ,______,
██║   ██║██║ █╗ ██║██║      ██████╗  ██║      ██║      ██║     ( O v O )
██║   ██║██║███╗██║██║      ╚═════╝  ██║      ██║      ██║      /  V  \
╚██████╔╝╚███╔███╔╝███████╗          ╚██████╗ ███████╗ ██║     /(     )\
 ╚═════╝  ╚══╝╚══╝ ╚══════╝           ╚═════╝ ╚══════╝ ╚═╝      ^^   ^^

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Shuu12121/Owl-ph2-base-len2048
  • Maximum Sequence Length: 1024 tokens (2048 tokens during pretraining)
  • Output Dimensionality: 768
  • Similarity Function: Cosine Similarity

This model is a Sentence Transformer fine-tuned from Shuu12121/Owl-ph2-base-len2048 on the Owl corpus for code search and code-text retrieval. The training data consists of roughly 100,000 samples per language (800,640 pairs in total), and the model was trained for 1 epoch with a learning rate of 1e-5.

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
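The Pooling module above uses CLS-token pooling (`pooling_mode_cls_token: True`), so a text's embedding is the final hidden state of its first token, and the card lists cosine similarity as the similarity function. The minimal pure-Python sketch below shows those two steps. It assumes the per-token hidden states are already available as lists of floats (toy 3-dimensional values instead of the real 768), and `cls_pool` and `cosine_similarity` are illustrative helper names, not sentence-transformers APIs.

```python
import math
from typing import List

Vector = List[float]


def cls_pool(token_embeddings: List[Vector]) -> Vector:
    """CLS pooling: the sentence embedding is the first token's hidden state."""
    return list(token_embeddings[0])


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Dot product of the two vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy hidden states for two inputs (3 tokens each, 3 dims instead of 768).
query_tokens = [[1.0, 0.0, 1.0], [0.2, 0.3, 0.1], [0.5, 0.5, 0.5]]
code_tokens = [[1.0, 0.0, 0.9], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]

query_vec = cls_pool(query_tokens)  # only the first token's vector is kept
code_vec = cls_pool(code_tokens)
print(round(cosine_similarity(query_vec, code_vec), 4))
```

The architecture lists no normalization module after pooling, so the raw CLS vectors are not guaranteed to be unit length. Cosine similarity divides by the norms at comparison time, so ranking does not depend on vector magnitude.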

Intended Uses

This model is intended for:

  • code search
  • code-text retrieval
  • semantic similarity
  • dense embedding generation for source code and natural language

Usage

Direct Usage (Sentence Transformers)

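from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Shuu12121/Owl-ph2-len2048")
# Encode a natural-language query and a code snippet, then score them with the model's similarity function (cosine)
embeddings = model.encode(["parse config file", "def load_config(path): return open(path).read()"])
print(model.similarity(embeddings[:1], embeddings[1:]))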

Training Details

Training Dataset

This model was trained on the Owl corpus, a dataset constructed for code search and code-text retrieval. The training set contains approximately 100,000 samples per language, resulting in 800,640 training pairs in total.

Training Hyperparameters

  • Learning rate: 1e-5
  • Epochs: 1
  • Loss: MultipleNegativesRankingLoss
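MultipleNegativesRankingLoss treats each (query, code) pair in a batch as a positive and every other code snippet in the same batch as a negative, then applies cross-entropy over the scaled similarity scores. The pure-Python sketch below illustrates that computation on toy embeddings. It assumes cosine similarity and a scale of 20, which are the sentence-transformers defaults; the card does not state the scale or batch size actually used. `mnrl_loss` is an illustrative name, not the library's implementation.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnrl_loss(anchors: List[Vector], positives: List[Vector], scale: float = 20.0) -> float:
    """Mean cross-entropy in which the correct candidate for anchor i is positive i.

    Every other positive in the batch acts as an in-batch negative for anchor i.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Scaled similarity of anchor i to every candidate in the batch.
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-softmax of the correct candidate.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy batch of 3 (query, code) pairs in which each query points toward its own code.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
codes = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
print(mnrl_loss(queries, codes))
# Mismatching the pairs gives a much larger loss.
print(mnrl_loss(queries, [codes[1], codes[0], codes[2]]))
```

Because the negatives come from the batch itself, this loss only needs positive pairs, which is what the Owl corpus provides.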

Integrations

🦉 Owl-CLI: Semantic Code Search in Your Terminal

Repository: https://github.com/Shun0212/Owl-CLI

Owl-ph2-len2048 is the embedding backbone of Owl-CLI, a command-line tool for semantic code search powered by dense retrieval.

Owl-CLI indexes your codebase at the function level, encodes each function using this model, and performs vector similarity search to find relevant code for natural-language queries, directly from your terminal.
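Owl-CLI's internal indexer is not described here. The sketch below only illustrates the general idea of function-level indexing followed by similarity search, for Python source files. It assumes Python's `ast` module for extracting functions with their line ranges. It also uses a toy keyword-count embedder (`toy_embed` and `VOCAB` are hypothetical) in place of this model's `encode`, so that it runs without downloading anything.

```python
import ast
import math
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = List[float]

# Hypothetical toy vocabulary; the real model produces 768-dim learned vectors.
VOCAB = ["parse", "config", "file", "read", "authenticate", "password", "user", "login"]


@dataclass
class FunctionEntry:
    path: str
    name: str
    start_line: int
    end_line: int
    source: str


def index_functions(path: str, code: str) -> List[FunctionEntry]:
    """Extract every (async) function definition with its 1-based line range."""
    lines = code.splitlines()
    entries = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on AST nodes from Python 3.8 onward.
            source = "\n".join(lines[node.lineno - 1 : node.end_lineno])
            entries.append(FunctionEntry(path, node.name, node.lineno, node.end_lineno, source))
    return entries


def toy_embed(text: str) -> Vector:
    """Stand-in for the model: counts of each VOCAB word in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [float(tokens.count(word)) for word in VOCAB]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: str, entries: List[FunctionEntry], embed: Callable[[str], Vector],
           top_k: int = 3) -> List[Tuple[float, FunctionEntry]]:
    """Rank indexed functions by cosine similarity to the query embedding."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(e.source)), e) for e in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


SAMPLE = '''
def load_config(path):
    """Parse a config file and return its settings."""
    with open(path) as f:
        return f.read()


def check_password(user, password):
    """Authenticate a user by password."""
    return user.password == password
'''

entries = index_functions("app.py", SAMPLE)
for score, entry in search("parse config file", entries, toy_embed, top_k=1):
    print(f"{entry.path}:{entry.start_line}-{entry.end_line} {entry.name} ({score:.2f})")
```

Keeping the file path and line range next to each function's embedding lets a search result point straight back to the matching code. This corresponds to the "indexed with file paths and line numbers" feature listed below.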

Key Features

Feature                  Description
Semantic search          Natural language → relevant functions via dense embeddings
Function-level indexing  Indexed with file paths and line numbers
Differential cache       Only re-embeds changed files
JSON output              Easy integration with other tools and scripts
MCP server support       Plug into AI coding agents (e.g., Claude Code, Cursor)
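The table states only that the differential cache re-embeds changed files. It does not say how Owl-CLI detects those changes. The sketch below shows one common way to do it, keyed on a SHA-256 hash of each file's content. `EmbeddingCache` and `length_embed` are hypothetical names, and `length_embed` is a dummy in place of the model.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Vector = List[float]


class EmbeddingCache:
    """Hypothetical per-file cache keyed by a SHA-256 hash of the file content."""

    def __init__(self, embed: Callable[[str], Vector]) -> None:
        self.embed = embed
        # path -> (content hash, embedding)
        self.entries: Dict[str, Tuple[str, Vector]] = {}
        self.embed_calls = 0

    def update(self, files: Dict[str, str]) -> List[str]:
        """Re-embed only files whose content changed; return the re-embedded paths."""
        changed = []
        for path, content in files.items():
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            cached = self.entries.get(path)
            if cached is not None and cached[0] == digest:
                continue  # unchanged: keep the cached embedding
            self.entries[path] = (digest, self.embed(content))
            self.embed_calls += 1
            changed.append(path)
        # Drop entries for files that are no longer present.
        for path in list(self.entries):
            if path not in files:
                del self.entries[path]
        return changed


def length_embed(text: str) -> Vector:
    """Dummy embedder for the demo; a real index would call the model here."""
    return [float(len(text))]


cache = EmbeddingCache(length_embed)
first = cache.update({"a.py": "def f(): pass", "b.py": "def g(): pass"})
second = cache.update({"a.py": "def f(): return 1", "b.py": "def g(): pass"})
print(first, second)
```

Hashing the content, rather than checking modification times, means a file that is touched but left unchanged is not re-embedded. Only the edited file (`a.py` in the second call) triggers another model call.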

Example: Query Routing

[Image: example of query routing in Owl-CLI]

Example: Interactive Session

[Image: example of an interactive Owl-CLI session]

Quick Start

# Get the source (see the repository README for installation steps)
git clone https://github.com/Shun0212/Owl-CLI.git

# Index your codebase and search
owl search "function that handles authentication"

# JSON output for tool integration
owl search "parse config file" --json

# Start MCP server for AI agent integration
owl mcp

For full documentation and installation instructions, see the Owl-CLI repository.

Model size: 0.1B params (Safetensors, tensor type F32)