How to use thanhdath/embedding-0.6b-spider2.0-v2 with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("thanhdath/embedding-0.6b-spider2.0-v2")

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium."
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
embedding-0.6b-spider2.0-v2
Bi-encoder column retriever for text-to-SQL schema linking. Fine-tuned from Qwen3-Embedding-0.6B with an InfoNCE loss using LoRA (adapter merged into the weights) for 1 epoch. v2 adds BigQuery/Snowflake analytics domain-adaptation rows on top of v1's mix: synthetic questions over public schemas, held out from eval, so there is no question leakage.
Training data: thanhdath/embedding-0.6b-spider2.0-v2-data (42,965 rows: BIRD train + Spider train + Spider 2.0 BQ/SF synthetic; no SynSQL).
Results vs v1 (column recall@K)
- Spider 2.0-547 (flat): R@100 0.850 / R@300 0.939 / R@500 0.962 (v1: 0.812 / 0.916 / 0.946). With key-completion: R@500 0.972 @ ~226 cols.
- Spider 2.0-233q two-stage (top-50 tables→K): R@300 0.911 / R@500 0.937 / R@800 0.963 (v1: 0.904 / 0.930 / 0.956).
- BIRD dev (flat): R@50 0.962, R@200 1.000 (no regression).
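For readers who want to reproduce numbers of this kind, here is a minimal sketch of column recall@K. The card does not spell out the exact definition, so this assumes per-question recall (the fraction of gold columns found in the top-K retrieved columns) averaged over questions. The function names and the `(ranked, gold)` example format are hypothetical.

```python
def column_recall_at_k(ranked, gold, k):
    """Fraction of gold columns that appear among the first k ranked columns."""
    gold = set(gold)
    if not gold:
        raise ValueError("each question needs at least one gold column")
    return len(gold & set(ranked[:k])) / len(gold)


def mean_recall_at_k(examples, k):
    """Average per-question recall@k (assumed macro-average over questions).

    `examples` is a list of (ranked_columns, gold_columns) pairs.
    """
    scores = [column_recall_at_k(ranked, gold, k) for ranked, gold in examples]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    examples = [
        (["orders.total", "orders.id", "users.name"], {"orders.total", "users.name"}),
        (["users.id", "users.name"], {"users.id"}),
    ]
    print(mean_recall_at_k(examples, k=2))
```

If the reported figures were instead pooled over all gold columns (micro-average), the two functions would need to sum hits and gold counts before dividing.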
Usage
vllm serve thanhdath/embedding-0.6b-spider2.0-v2 --task embed --port 8001 --max-model-len 16384
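Once the server is running, embeddings can be requested from vLLM's OpenAI-compatible `/v1/embeddings` endpoint. The following is a minimal client sketch using only the standard library. It assumes the server is reachable on `localhost` at the port used above; the helper names (`build_request`, `parse_response`, `embed`) and the example column description are illustrative and not part of this model's tooling.

```python
import json
import urllib.request

# Assumption: the server from the command above is running locally on port 8001.
BASE_URL = "http://localhost:8001/v1/embeddings"
MODEL = "thanhdath/embedding-0.6b-spider2.0-v2"


def build_request(texts):
    """Build the JSON body for the OpenAI-compatible embeddings endpoint."""
    return {"model": MODEL, "input": list(texts)}


def parse_response(body):
    """Return one embedding (list of floats) per input, in input order."""
    rows = sorted(body["data"], key=lambda row: row["index"])
    return [row["embedding"] for row in rows]


def embed(texts, url=BASE_URL):
    """POST the texts to the server and return their embeddings."""
    payload = json.dumps(build_request(texts)).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return parse_response(json.load(response))


if __name__ == "__main__":
    # The column-description text format here is a made-up example.
    vectors = embed(["total revenue per customer", "orders.total: order amount"])
    print(len(vectors), len(vectors[0]))
```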
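Score each column as the dot product of the question embedding and the column-description embedding (score = question_emb · column_desc_emb), then take the top-K columns. For wide schemas, also add the PK/FK columns of the retrieved tables (key-completion) for +1.5 to 4.6 pp at near-zero cost.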
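The scoring and key-completion steps can be sketched as follows. This is an illustration under stated assumptions, not the author's pipeline: the column metadata format (`table`, `name`, `is_key`) is hypothetical, and the 2-dimensional vectors are toy values standing in for real embeddings.

```python
import numpy as np


def top_k_columns(question_emb, column_embs, k):
    """Indices of the k columns with the highest dot-product score, best first."""
    scores = column_embs @ question_emb
    k = min(k, scores.shape[0])
    return np.argsort(-scores, kind="stable")[:k].tolist()


def key_complete(selected, columns):
    """Append PK/FK columns of every table that already has a selected column."""
    tables = {columns[i]["table"] for i in selected}
    chosen = set(selected)
    extra = [
        i
        for i, col in enumerate(columns)
        if col["table"] in tables and col["is_key"] and i not in chosen
    ]
    return list(selected) + extra


# Hypothetical schema metadata; `is_key` marks primary or foreign key columns.
columns = [
    {"table": "orders", "name": "order_id", "is_key": True},
    {"table": "orders", "name": "customer_id", "is_key": True},
    {"table": "orders", "name": "total", "is_key": False},
    {"table": "customers", "name": "customer_id", "is_key": True},
    {"table": "customers", "name": "name", "is_key": False},
]
# Toy embeddings, one row per column above.
column_embs = np.array(
    [[0.1, 0.0], [0.2, 0.1], [0.9, 0.1], [0.0, 0.3], [0.1, 0.8]]
)
question_emb = np.array([1.0, 0.0])

if __name__ == "__main__":
    picked = top_k_columns(question_emb, column_embs, k=1)
    completed = key_complete(picked, columns)
    print([f"{columns[i]['table']}.{columns[i]['name']}" for i in completed])
```

Key-completion only touches tables that already contributed a retrieved column, so it adds a handful of columns per table rather than widening the candidate set across the whole schema.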