# 🦅 Ftel-Text2SQL

**Ftel-Text2SQL** is a Text-to-SQL model developed by FPT Telecom. Fine-tuned specifically for complex, cross-domain database querying, it achieves high execution accuracy on challenging benchmarks such as BIRD-SQL.

## 🎯 BIRD-SQL Results

- **Dev set:** 71.19
- **Test set:** 72.78

## 🚀 Model Details

- **Base Architecture:** [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct)
- **Developer:** FPT Telecom
- **Language(s):** English
- **License:** Apache 2.0
- **Task:** Text-to-SQL (Code Generation)

## 🧠 Training Methodology

This model was trained with a two-stage pipeline designed to improve both structural understanding and logical reasoning for SQL generation:

### Stage 1: Supervised Fine-Tuning (SFT)

In the first phase, the base Qwen3-Coder model underwent SFT on high-quality, curated Text-to-SQL datasets. This stage focused on teaching the model to handle complex database schemas, foreign key relationships, and advanced SQL constructs and dialect nuances (such as window functions and complex joins).

### Stage 2: Group Relative Policy Optimization (GRPO)

To further align the model's reasoning capabilities, we added a reinforcement learning phase using **GRPO** (Group Relative Policy Optimization). Unlike standard RLHF, which relies on a reward model trained on human preferences, our GRPO implementation used **execution-based reward functions**. The model was rewarded based on:

1. **Execution Correctness:** Whether the generated SQL query executed successfully without syntax errors.
2. **Result Equivalence:** Whether the executed output matched the ground-truth execution results.

This two-stage approach significantly reduced syntax hallucination and improved the model's multi-step reasoning on real-world database queries.

## 💻 Usage & Prompt Format

This model inherits the ChatML format from its Qwen3-Coder base model.
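One way to assemble a ChatML prompt is sketched below. It takes a schema, a question, and optional worked examples that are replayed as earlier user/assistant turns. The helper name `build_chatml_prompt`, the `(schema, question, sql)` example layout, and the exact system text are assumptions made for illustration, not part of the released model.

```python
from typing import Sequence, Tuple

# Assumed system instruction; adjust to whatever wording works best for you.
SYSTEM_PROMPT = (
    "You are an expert SQL developer. Generate a valid SQL query "
    "based on the given schema and question."
)


def _turn(role: str, content: str) -> str:
    """Wrap one message in ChatML start/end markers."""
    return f"<|im_start|>{role}\n{content}\n<|im_end|>\n"


def build_chatml_prompt(
    schema: str,
    question: str,
    examples: Sequence[Tuple[str, str, str]] = (),
) -> str:
    """Build a ChatML prompt; each example is a (schema, question, sql) triple."""
    parts = [_turn("system", SYSTEM_PROMPT)]
    # Few-shot examples are replayed as completed user/assistant exchanges.
    for ex_schema, ex_question, ex_sql in examples:
        parts.append(_turn("user", f"Schema: {ex_schema}\nQuestion: {ex_question}"))
        parts.append(_turn("assistant", ex_sql))
    parts.append(_turn("user", f"Schema: {schema}\nQuestion: {question}"))
    # Leave the assistant turn open so the model writes the SQL.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


demo_prompt = build_chatml_prompt(
    schema="customers(id, name); orders(id, customer_id, order_date)",
    question="How many customers made a purchase in 2023?",
    examples=[("users(id, age)", "How many users are there?", "SELECT COUNT(*) FROM users;")],
)
```

The returned string can be passed directly to a text-completion style API. If you use a tokenizer chat template instead, it would replace this manual formatting.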
We highly recommend a **few-shot** prompting strategy combined with self-consistency (majority voting) for optimal performance.

### Example Inference Code (vLLM)

```python
from vllm import LLM, SamplingParams

# "YourUsername/Ftel-Text2SQL" is a placeholder; use the actual model repository ID.
llm = LLM(model="YourUsername/Ftel-Text2SQL", tensor_parallel_size=2)
# temperature=0.0 means greedy decoding, i.e. one deterministic answer per prompt.
sampling_params = SamplingParams(temperature=0.0, max_tokens=512)

# Standard ChatML prompting
prompt = """<|im_start|>system
You are an expert SQL developer. Generate a valid SQL query based on the given schema and question.
<|im_end|>
<|im_start|>user
Schema: [Your Database Schema Here]
Question: How many customers made a purchase in 2023?
<|im_end|>
<|im_start|>assistant
"""

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```
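The self-consistency (majority voting) strategy recommended above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a SQLite database, assumes votes are counted per distinct execution result (the model card does not say how votes are tallied), and uses the hypothetical helper names `execute_sql` and `majority_vote`. In practice the candidate queries would come from sampling several completions per prompt, for example with a non-zero `temperature` and `n` greater than 1 in `SamplingParams`.

```python
import sqlite3
from collections import Counter
from typing import List, Optional, Tuple


def execute_sql(conn: sqlite3.Connection, sql: str) -> Optional[Tuple]:
    """Run a query; return its rows in a canonical order, or None if it fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Sort by repr so row order does not affect the comparison.
    return tuple(sorted(rows, key=repr))


def majority_vote(conn: sqlite3.Connection, candidates: List[str]) -> Optional[str]:
    """Return the first candidate whose execution result is the most common one."""
    executed = [(sql, execute_sql(conn, sql)) for sql in candidates]
    valid = [(sql, result) for sql, result in executed if result is not None]
    if not valid:
        return None  # every candidate failed to execute
    top_result, _ = Counter(result for _, result in valid).most_common(1)[0]
    return next(sql for sql, result in valid if result == top_result)


# Toy database and hand-written candidates standing in for sampled model outputs.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (customer_id INTEGER, year INTEGER);
    INSERT INTO orders VALUES (1, 2023), (2, 2023), (2, 2023), (3, 2022);
    """
)
candidates = [
    "SELECT COUNT(DISTINCT customer_id) FROM orders WHERE year = 2023",
    "SELECT COUNT(*) FROM (SELECT DISTINCT customer_id FROM orders WHERE year = 2023)",
    "SELECT COUNT(*) FROM orders WHERE year = 2023",
    "SELEC broken",
]
best = majority_vote(conn, candidates)
print(best)
```

Voting on execution results rather than on the SQL text lets syntactically different but equivalent queries reinforce each other, and candidates that fail to execute are dropped before the vote.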