--- language: - en license: apache-2.0 tags: - text2text-generation - natural-language-to-sql - natural-language-to-mongodb - query-generation - database - t5 datasets: - custom metrics: - bleu - accuracy pipeline_tag: text-generation widget: - text: "Find employees where salary is greater than 50000" example_title: "SELECT Query" - text: "Delete orders with total_amount less than 1000" example_title: "DELETE Query" - text: "Update employees set department to Sales where employee_id is 101" example_title: "UPDATE Query" - text: "Insert a new employee with name John Doe, email john@example.com, department Engineering" example_title: "INSERT Query" --- # NoNoQL - Natural Language to SQL/MongoDB Query Generator **NoNoQL** (formerly TexQL) is a T5-based transformer model that converts natural language queries into both SQL and MongoDB queries. It supports SELECT, INSERT, UPDATE, DELETE, and other database operations. ## 🎯 Model Description This model translates natural language database queries into syntactically correct SQL and MongoDB commands. It's trained on a custom dataset of 30,000+ query pairs covering various database operations, tables, and query patterns. ### Key Features - ✅ **Dual Output**: Generates both SQL and MongoDB queries from a single natural language input - ✅ **Multi-Operation Support**: SELECT, INSERT, UPDATE, DELETE, CREATE TABLE, and more - ✅ **Comparison Operators**: Handles greater than, less than, equal to, and other comparisons - ✅ **Complex Queries**: Supports WHERE clauses, aggregations, ordering, and limiting - ✅ **Post-Processing**: Includes fixes for common model hallucinations and syntax errors ## 📊 Model Details - **Model Architecture**: T5 (Text-to-Text Transfer Transformer) - **Base Model**: google/t5-small - **Parameters**: ~60M - **Training Data**: 30,000+ natural language to SQL/MongoDB query pairs - **Training Strategy**: Unified model trained on both SQL and MongoDB simultaneously - **Input Format**: `translate to {sql|mongodb}: {natural_language_query}` ### Supported Tables/Collections - **employees**: employee_id, name, email, department, salary, hire_date, age - **departments**: department_id, department_name, manager_id, budget, location - **projects**: project_id, project_name, start_date, end_date, budget, status - **orders**: order_id, customer_name, product_name, quantity, order_date, total_amount - **products**: product_id, product_name, category, price, stock_quantity, supplier ## 🚀 Usage ### Installation ```bash pip install transformers torch ``` ### Basic Usage ```python from transformers import AutoTokenizer, AutoModelForSeq2SeqLM # Load model and tokenizer model_name = "mohhhhhit/nonoql" # Replace with your HF model path tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForSeq2SeqLM.from_pretrained(model_name) # Generate SQL query def generate_query(natural_language, target_type='sql'): input_text = f"translate to {target_type}: {natural_language}" inputs = tokenizer(input_text, return_tensors="pt", max_length=256, truncation=True) outputs = model.generate( **inputs, max_length=512, num_beams=10, temperature=0.3, repetition_penalty=1.2, length_penalty=0.8, early_stopping=True ) return tokenizer.decode(outputs[0], skip_special_tokens=True) # Example usage nl_query = "Find employees where salary is greater than 50000" sql_query = generate_query(nl_query, target_type='sql') print(f"SQL: {sql_query}") # Output: SELECT * FROM employees WHERE salary > 50000; mongodb_query = generate_query(nl_query, target_type='mongodb') print(f"MongoDB: {mongodb_query}") # Output: db.employees.find({"salary": {$gt: 50000}}); ``` ### Example Queries | Natural Language | SQL Output | MongoDB Output | |-----------------|------------|----------------| | Show all employees | `SELECT * FROM employees;` | `db.employees.find({});` | | Find products where price is less than 100 | `SELECT * FROM products WHERE price < 100;` | `db.products.find({"price": {$lt: 100}});` | | Update employees set department to Sales where employee_id is 101 | `UPDATE employees SET department = 'Sales' WHERE employee_id = 101;` | `db.employees.updateMany({employee_id: 101}, {$set: {department: "Sales"}});` | | Delete orders with total_amount less than 1000 | `DELETE FROM orders WHERE total_amount < 1000;` | `db.orders.deleteMany({"total_amount": {$lt: 1000}});` | | Insert a new employee with name John, email john@example.com | `INSERT INTO employees (name, email) VALUES ('John', 'john@example.com');` | `db.employees.insertOne({"name": "John", "email": "john@example.com"});` | ## 🎓 Training ### Dataset - **Size**: 30,000+ query pairs - **Operations**: SELECT (40%), INSERT (20%), UPDATE (20%), DELETE (15%), CREATE (5%) - **Tables**: 5 main tables with realistic schemas - **Generation**: Synthetic data with varied patterns and complexity ### Training Configuration ```python training_args = { "learning_rate": 3e-4, "per_device_train_batch_size": 8, "per_device_eval_batch_size": 8, "num_train_epochs": 10, "weight_decay": 0.01, "warmup_steps": 500, "max_seq_length": 512, } ``` ### Evaluation Metrics - **BLEU Score**: ~85% - **Exact Match**: ~78% - **Syntax Correctness**: ~92% (after post-processing) ## ⚙️ Post-Processing The model includes several post-processing fixes to handle common issues: 1. **Comparison Operators**: Converts `=` to `>`, `<`, `>=`, `<=` based on keywords like "greater than", "less than" 2. **Operation Type**: Fixes wrong operations (e.g., SELECT when DELETE is intended) 3. **MongoDB Syntax**: Adds missing curly braces and converts to proper MongoDB operators 4. **UPDATE Queries**: Reconstructs malformed UPDATE statements 5. **CREATE TABLE**: Fixes hallucinated columns in table creation ## ⚠️ Limitations - **Schema Awareness**: Model is trained on specific tables; may not generalize to completely new schemas - **Complex Joins**: Limited support for multi-table JOINs and subqueries - **Advanced Features**: May struggle with window functions, CTEs, and advanced SQL features - **Hallucinations**: Can generate incorrect column names for unseen patterns (mitigated by post-processing) - **Case Sensitivity**: Works best with lowercase natural language inputs ## 📝 Known Issues & Fixes | Issue | Fix Applied | |-------|-------------| | Model outputs `=` instead of `>` or `<` | Post-processing detects comparison keywords and replaces operators | | MongoDB missing `{}` braces | Adds curly braces around query objects | | `SELECT` instead of `DELETE` | Detects operation intent from keywords | | Incomplete UPDATE queries | Reconstructs from natural language parsing | ## 🛠️ Use Cases - **Database Query Assistants**: Help non-technical users query databases - **Educational Tools**: Teach SQL/MongoDB syntax through examples - **Prototyping**: Quickly generate queries for testing - **Documentation**: Auto-generate query examples - **Migration Tools**: Convert between SQL and MongoDB syntaxes ## 📄 Citation If you use this model in your research or application, please cite: ```bibtex @misc{nonoql2026, title={NoNoQL: Natural Language to SQL and MongoDB Query Generation}, author={Mohit Panchal}, year={2026}, howpublished={\url{https://huggingface.co/mohhhhhit/nonoql}}, } ``` ## 📜 License This model is released under the Apache 2.0 License. ## 🤝 Contributing Contributions, feedback, and suggestions are welcome! Please feel free to: - Report issues or bugs - Suggest new features - Improve the training data - Add support for more database systems ## 🔗 Links - **Model Repository**: [Hugging Face](https://huggingface.co/mohhhhhit/nonoql) - **GitHub**: [Source Code](https://github.com/mohhhit/NoNoQL) - **Demo**: [Streamlit App](your-demo-url) ## 🙏 Acknowledgments - Built on the T5 architecture by Google Research - Trained using the Hugging Face Transformers library - Inspired by the need for more accessible database querying tools --- **Note**: This model is designed for educational and prototyping purposes. Always validate generated queries before executing them on production databases.