Shuu12121/Owl-ph2-len2048 🦉
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Shuu12121/Owl-ph2-base-len2048
- Maximum Sequence Length: 1024 tokens (2048 tokens during pretraining)
- Output Dimensionality: 768
- Similarity Function: Cosine Similarity
This model is a SentenceTransformer variant of Shuu12121/Owl-ph2-base-len2048. It was trained on the Owl corpus for code search and code-text retrieval. The training data consists of roughly 100,000 samples per language (800,640 pairs in total), and the model was trained for 1 epoch with a learning rate of 1e-5.
Model Sources
- Base model: Shuu12121/Owl-ph2-base-len2048
- Sentence Transformers: Sentence Transformers Documentation
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
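The Pooling layer uses CLS pooling (`pooling_mode_cls_token: True`): the sentence embedding is simply the hidden state of the first ([CLS]) token of each sequence. A minimal NumPy sketch of that step, using random values as stand-ins for real transformer activations:

```python
import numpy as np

# Toy stand-in for the transformer's output: 768-dim vectors for a
# batch of 2 sequences of 10 tokens each (random, not real activations).
token_embeddings = np.random.rand(2, 10, 768)

# CLS pooling: keep only the first token's vector per sequence.
sentence_embeddings = token_embeddings[:, 0, :]

print(sentence_embeddings.shape)  # (2, 768)
```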
Intended Uses
This model is intended for:
- code search
- code-text retrieval
- semantic similarity
- dense embedding generation for source code and natural language
Usage
Direct Usage (Sentence Transformers)
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Shuu12121/Owl-ph2-len2048")
```
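`model.encode(...)` returns one 768-dimensional vector per input, and retrieval ranks candidates by cosine similarity. A minimal NumPy sketch of that ranking step (the vectors below are random stand-ins for real embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
query_emb = rng.standard_normal(768)       # stand-in for an encoded query
code_embs = rng.standard_normal((5, 768))  # stand-ins for encoded functions

# Cosine similarity = dot product of L2-normalized vectors.
q = query_emb / np.linalg.norm(query_emb)
c = code_embs / np.linalg.norm(code_embs, axis=1, keepdims=True)
scores = c @ q

# Candidates ranked from most to least similar to the query.
ranking = np.argsort(-scores)
print(ranking)
```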
Training Details
Training Dataset
This model was trained on the Owl corpus, a dataset constructed for code search and code-text retrieval. The training set contains approximately 100,000 samples per language, resulting in 800,640 training pairs in total.
Training Hyperparameters
- Learning rate: 1e-5
- Epochs: 1
- Loss: MultipleNegativesRankingLoss
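MultipleNegativesRankingLoss treats each (query, positive) pair in a batch as a classification problem: the paired passage must outscore every other passage in the batch, which serves as an in-batch negative. A minimal NumPy sketch with random embeddings (the cosine scores are scaled by 20.0, the sentence-transformers default):

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 4
queries = rng.standard_normal((batch, 768))
passages = rng.standard_normal((batch, 768))  # passages[i] is the positive for queries[i]

# Scaled cosine similarity matrix between every query and every passage.
q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
p = passages / np.linalg.norm(passages, axis=1, keepdims=True)
scores = 20.0 * (q @ p.T)

# Cross-entropy with the diagonal (the true pairs) as targets:
# every off-diagonal passage acts as a negative.
log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))
print(loss)
```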
Integrations
🦉 Owl-CLI: Semantic Code Search in Your Terminal
Repository: https://github.com/Shun0212/Owl-CLI
Owl-ph2-len2048 is the embedding backbone of Owl-CLI, a command-line tool for semantic code search powered by dense retrieval.
Owl-CLI indexes your codebase at the function level, encodes each function using this model, and performs vector similarity search to find relevant code for natural language queries, directly from your terminal.
Key Features
| Feature | Description |
|---|---|
| Semantic search | Natural language → relevant functions via dense embeddings |
| Function-level indexing | Indexed with file paths and line numbers |
| Differential cache | Only re-embeds changed files |
| JSON output | Easy integration with other tools and scripts |
| MCP server support | Plug into AI coding agents (e.g., Claude Code, Cursor) |
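The differential cache can be illustrated with a content-hash check: a file is only re-embedded when its bytes change between indexing runs. The sketch below is hypothetical (the cache layout and helper names are illustrative, not Owl-CLI's actual implementation):

```python
import hashlib

def file_digest(content: bytes) -> str:
    """SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(content).hexdigest()

def files_to_reembed(files: dict[str, bytes], cache: dict[str, str]) -> list[str]:
    """Return paths whose content hash differs from the cached one."""
    stale = []
    for path, content in files.items():
        digest = file_digest(content)
        if cache.get(path) != digest:
            stale.append(path)
            cache[path] = digest  # record the hash once (re)embedded
    return stale

# First pass: everything is new, so both files are embedded.
cache: dict[str, str] = {}
files = {"auth.py": b"def login(): ...", "db.py": b"def query(): ..."}
print(files_to_reembed(files, cache))  # ['auth.py', 'db.py']

# Second pass: only the modified file is re-embedded.
files["auth.py"] = b"def login(user): ..."
print(files_to_reembed(files, cache))  # ['auth.py']
```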
Example screenshots (query routing and an interactive session) are available in the Owl-CLI repository.
Quick Start
```bash
# Install
git clone https://github.com/Shun0212/Owl-CLI.git

# Index your codebase and search
owl search "function that handles authentication"

# JSON output for tool integration
owl search "parse config file" --json

# Start MCP server for AI agent integration
owl mcp
```
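The `--json` flag makes results easy to consume from scripts. The actual schema is documented in the Owl-CLI repository; the record shape below (`path`, `line`, `score` fields) is a hypothetical stand-in for illustration:

```python
import json

# Hypothetical result payload; Owl-CLI's real schema may differ.
raw = (
    '[{"path": "src/auth.py", "line": 42, "score": 0.91},'
    ' {"path": "src/session.py", "line": 7, "score": 0.83}]'
)

results = json.loads(raw)
for hit in sorted(results, key=lambda r: -r["score"]):
    print(f'{hit["path"]}:{hit["line"]}  score={hit["score"]:.2f}')
```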
For full documentation and installation instructions, see the Owl-CLI repository.