metadata
language:
  - en
license: apache-2.0
tags:
  - text2text-generation
  - natural-language-to-sql
  - natural-language-to-mongodb
  - query-generation
  - database
  - t5
datasets:
  - custom
metrics:
  - bleu
  - accuracy
pipeline_tag: text2text-generation
widget:
  - text: Find employees where salary is greater than 50000
    example_title: SELECT Query
  - text: Delete orders with total_amount less than 1000
    example_title: DELETE Query
  - text: Update employees set department to Sales where employee_id is 101
    example_title: UPDATE Query
  - text: >-
      Insert a new employee with name John Doe, email john@example.com,
      department Engineering
    example_title: INSERT Query

NoNoQL - Natural Language to SQL/MongoDB Query Generator

NoNoQL (formerly TexQL) is a T5-based transformer model that converts natural language queries into both SQL and MongoDB queries. It supports SELECT, INSERT, UPDATE, DELETE, and other database operations.

🎯 Model Description

This model translates natural language database queries into syntactically correct SQL and MongoDB commands. It's trained on a custom dataset of 30,000+ query pairs covering various database operations, tables, and query patterns.

Key Features

  • ✅ Dual Output: Generates both SQL and MongoDB queries from a single natural language input
  • ✅ Multi-Operation Support: SELECT, INSERT, UPDATE, DELETE, CREATE TABLE, and more
  • ✅ Comparison Operators: Handles greater than, less than, equal to, and other comparisons
  • ✅ Complex Queries: Supports WHERE clauses, aggregations, ordering, and limiting
  • ✅ Post-Processing: Includes fixes for common model hallucinations and syntax errors

📊 Model Details

  • Model Architecture: T5 (Text-to-Text Transfer Transformer)
  • Base Model: google-t5/t5-small
  • Parameters: ~60M
  • Training Data: 30,000+ natural language to SQL/MongoDB query pairs
  • Training Strategy: Unified model trained on both SQL and MongoDB simultaneously
  • Input Format: translate to {sql|mongodb}: {natural_language_query}
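The input format above can be wrapped in a small helper so the task prefix is always well formed. This is a minimal sketch; `build_prompt` and `VALID_TARGETS` are hypothetical names introduced for illustration and are not part of the released code.

```python
VALID_TARGETS = ("sql", "mongodb")  # the two prefixes named in the input format


def build_prompt(natural_language_query: str, target: str = "sql") -> str:
    """Return the task-prefixed string the model expects as input."""
    target = target.strip().lower()
    if target not in VALID_TARGETS:
        raise ValueError(f"target must be one of {VALID_TARGETS}, got {target!r}")
    # Collapse runs of whitespace so the prompt stays on a single line.
    query = " ".join(natural_language_query.split())
    return f"translate to {target}: {query}"


print(build_prompt("Show all employees", "mongodb"))
# translate to mongodb: Show all employees
```

Rejecting unknown targets early avoids sending the model a prefix it was never trained on.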

Supported Tables/Collections

  • employees: employee_id, name, email, department, salary, hire_date, age
  • departments: department_id, department_name, manager_id, budget, location
  • projects: project_id, project_name, start_date, end_date, budget, status
  • orders: order_id, customer_name, product_name, quantity, order_date, total_amount
  • products: product_id, product_name, category, price, stock_quantity, supplier
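Because the model only knows these five tables, it can help to keep the schema in code and check field names against it. The sketch below copies the column lists above into a dictionary; `SCHEMA` and `unknown_fields` are hypothetical names, not something shipped with the model.

```python
# Column names copied from the list of supported tables/collections.
SCHEMA = {
    "employees": {"employee_id", "name", "email", "department", "salary", "hire_date", "age"},
    "departments": {"department_id", "department_name", "manager_id", "budget", "location"},
    "projects": {"project_id", "project_name", "start_date", "end_date", "budget", "status"},
    "orders": {"order_id", "customer_name", "product_name", "quantity", "order_date", "total_amount"},
    "products": {"product_id", "product_name", "category", "price", "stock_quantity", "supplier"},
}


def unknown_fields(table: str, fields) -> set:
    """Return the fields that are not defined for `table` in SCHEMA."""
    if table not in SCHEMA:
        raise KeyError(f"unknown table or collection: {table!r}")
    return set(fields) - SCHEMA[table]


print(unknown_fields("employees", ["name", "wage"]))  # {'wage'}
```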

🚀 Usage

Installation

pip install transformers torch

Basic Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
model_name = "mohhhhhit/nonoql"  # Replace with your HF model path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

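# Generate a SQL or MongoDB query from a natural language request
def generate_query(natural_language, target_type='sql'):
    # The task prefix selects the output language: 'sql' or 'mongodb'
    input_text = f"translate to {target_type}: {natural_language}"
    inputs = tokenizer(input_text, return_tensors="pt", max_length=256, truncation=True)
    
    # Beam search is deterministic; `temperature` only takes effect when
    # do_sample=True, so it is omitted here.
    outputs = model.generate(
        **inputs,
        max_length=512,
        num_beams=10,
        repetition_penalty=1.2,
        length_penalty=0.8,
        early_stopping=True
    )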
    
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
nl_query = "Find employees where salary is greater than 50000"

sql_query = generate_query(nl_query, target_type='sql')
print(f"SQL: {sql_query}")
# Output: SELECT * FROM employees WHERE salary > 50000;

mongodb_query = generate_query(nl_query, target_type='mongodb')
print(f"MongoDB: {mongodb_query}")
# Output: db.employees.find({"salary": {$gt: 50000}});

Example Queries

| Natural Language | SQL Output | MongoDB Output |
|---|---|---|
| Show all employees | `SELECT * FROM employees;` | `db.employees.find({});` |
| Find products where price is less than 100 | `SELECT * FROM products WHERE price < 100;` | `db.products.find({"price": {$lt: 100}});` |
| Update employees set department to Sales where employee_id is 101 | `UPDATE employees SET department = 'Sales' WHERE employee_id = 101;` | `db.employees.updateMany({employee_id: 101}, {$set: {department: "Sales"}});` |
| Delete orders with total_amount less than 1000 | `DELETE FROM orders WHERE total_amount < 1000;` | `db.orders.deleteMany({"total_amount": {$lt: 1000}});` |
| Insert a new employee with name John, email john@example.com | `INSERT INTO employees (name, email) VALUES ('John', 'john@example.com');` | `db.employees.insertOne({"name": "John", "email": "john@example.com"});` |

🎓 Training

Dataset

  • Size: 30,000+ query pairs
  • Operations: SELECT (40%), INSERT (20%), UPDATE (20%), DELETE (15%), CREATE (5%)
  • Tables: 5 main tables with realistic schemas
  • Generation: Synthetic data with varied patterns and complexity

Training Configuration

training_args = {
    "learning_rate": 3e-4,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "num_train_epochs": 10,
    "weight_decay": 0.01,
    "warmup_steps": 500,
    "max_seq_length": 512,
}
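To put the warmup setting in context, the number of optimizer steps can be estimated from the figures above. This is back-of-the-envelope arithmetic under stated assumptions: exactly 30,000 training pairs (the card says "30,000+"), a single device, no gradient accumulation, and no held-out split, none of which the card confirms.

```python
import math

num_examples = 30_000   # assumption: lower bound of the "30,000+" figure
batch_size = 8          # per_device_train_batch_size
num_devices = 1         # assumption: single device, no gradient accumulation
num_epochs = 10         # num_train_epochs
warmup_steps = 500

steps_per_epoch = math.ceil(num_examples / (batch_size * num_devices))
total_steps = steps_per_epoch * num_epochs
warmup_fraction = warmup_steps / total_steps

print(steps_per_epoch)            # 3750
print(total_steps)                # 37500
print(f"{warmup_fraction:.1%}")   # 1.3%
```

Under these assumptions the learning rate ramps up over roughly the first 1.3% of training.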

Evaluation Metrics

  • BLEU Score: ~85%
  • Exact Match: ~78%
  • Syntax Correctness: ~92% (after post-processing)
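The card does not say how Exact Match was computed. One plausible definition, sketched here as an assumption, compares each prediction with its reference after collapsing whitespace and dropping a trailing semicolon; `normalize` and `exact_match` are hypothetical helper names.

```python
def normalize(query: str) -> str:
    """Collapse whitespace and drop a trailing semicolon before comparing."""
    return " ".join(query.split()).rstrip(";").strip()


def exact_match(predictions, references) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)


preds = ["SELECT * FROM employees;", "DELETE FROM orders WHERE total_amount = 1000;"]
refs = ["SELECT *  FROM employees", "DELETE FROM orders WHERE total_amount < 1000;"]
print(exact_match(preds, refs))  # 0.5
```

The second pair differs only in the comparison operator, which is exactly the kind of error an exact-match metric is meant to catch.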

⚙️ Post-Processing

Several post-processing fixes are applied to the raw model output to handle common issues:

  1. Comparison Operators: Replaces = with >, <, >=, or <= when the input contains keywords such as "greater than" or "less than"
  2. Operation Type: Fixes wrong operations (e.g., SELECT when DELETE is intended)
  3. MongoDB Syntax: Adds missing curly braces and converts to proper MongoDB operators
  4. UPDATE Queries: Reconstructs malformed UPDATE statements
  5. CREATE TABLE: Fixes hallucinated columns in table creation
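The first fix in the list can be illustrated with a short function. This is a simplified sketch and not the project's actual post-processing code: `fix_comparison_operator` and its keyword table are hypothetical, and it rewrites only the first bare `=` after `WHERE`, so queries with several conditions would need more careful handling.

```python
import re

# Longer phrases come first so "greater than or equal to" is not read as "greater than".
KEYWORD_TO_OPERATOR = [
    ("greater than or equal to", ">="),
    ("less than or equal to", "<="),
    ("greater than", ">"),
    ("less than", "<"),
]


def fix_comparison_operator(natural_language: str, sql: str) -> str:
    """Replace the first bare '=' in the WHERE clause when the request asked for a range comparison."""
    text = natural_language.lower()
    operator = next((op for phrase, op in KEYWORD_TO_OPERATOR if phrase in text), None)
    if operator is None:
        return sql
    match = re.search(r"\bWHERE\b", sql, flags=re.IGNORECASE)
    if match is None:
        return sql
    head, where_clause = sql[:match.end()], sql[match.end():]
    # Match a lone '=' that is not part of >=, <=, != or ==.
    fixed = re.sub(r"(?<![<>!=])=(?!=)", operator, where_clause, count=1)
    return head + fixed


print(fix_comparison_operator(
    "Find employees where salary is greater than 50000",
    "SELECT * FROM employees WHERE salary = 50000;",
))
# SELECT * FROM employees WHERE salary > 50000;
```

Restricting the substitution to the text after `WHERE` keeps assignments in an UPDATE statement's `SET` clause untouched.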

⚠️ Limitations

  • Schema Awareness: Model is trained on specific tables; may not generalize to completely new schemas
  • Complex Joins: Limited support for multi-table JOINs and subqueries
  • Advanced Features: May struggle with window functions, CTEs, and advanced SQL features
  • Hallucinations: Can generate incorrect column names for unseen patterns (mitigated by post-processing)
  • Case Sensitivity: Works best with lowercase natural language inputs

📝 Known Issues & Fixes

| Issue | Fix Applied |
|---|---|
| Model outputs = instead of > or < | Post-processing detects comparison keywords and replaces operators |
| MongoDB missing {} braces | Adds curly braces around query objects |
| SELECT instead of DELETE | Detects operation intent from keywords |
| Incomplete UPDATE queries | Reconstructs from natural language parsing |
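Detecting operation intent from keywords could look roughly like the sketch below. It assumes the first word of the request signals the operation, which holds for the examples in this card but is not confirmed as the project's actual rule; `INTENT_KEYWORDS`, `detect_intent`, and `operation_mismatch` are hypothetical names. The sketch only flags a mismatch and leaves the rewrite to the caller.

```python
import re

# Assumption: the first word of the request signals the intended operation.
INTENT_KEYWORDS = {
    "DELETE": ("delete", "remove"),
    "UPDATE": ("update", "change", "modify"),
    "INSERT": ("insert", "add"),
    "CREATE": ("create",),
    "SELECT": ("find", "show", "list", "get", "select"),
}


def detect_intent(natural_language: str):
    """Return the SQL operation suggested by the first word, or None."""
    words = re.findall(r"[a-z]+", natural_language.lower())
    if not words:
        return None
    for operation, keywords in INTENT_KEYWORDS.items():
        if words[0] in keywords:
            return operation
    return None


def operation_mismatch(natural_language: str, sql: str) -> bool:
    """True when the generated SQL starts with a different operation than the request implies."""
    intent = detect_intent(natural_language)
    stripped = sql.strip()
    generated = stripped.split(maxsplit=1)[0].upper() if stripped else ""
    return intent is not None and generated != intent


print(operation_mismatch(
    "Delete orders with total_amount less than 1000",
    "SELECT * FROM orders WHERE total_amount < 1000;",
))
# True
```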

🛠️ Use Cases

  • Database Query Assistants: Help non-technical users query databases
  • Educational Tools: Teach SQL/MongoDB syntax through examples
  • Prototyping: Quickly generate queries for testing
  • Documentation: Auto-generate query examples
  • Migration Tools: Generate equivalent SQL and MongoDB queries from the same natural language request

📄 Citation

If you use this model in your research or application, please cite:

@misc{nonoql2026,
  title={NoNoQL: Natural Language to SQL and MongoDB Query Generation},
  author={Mohit Panchal},
  year={2026},
  howpublished={\url{https://huggingface.co/mohhhhhit/nonoql}},
}

📜 License

This model is released under the Apache 2.0 License.

🀝 Contributing

Contributions, feedback, and suggestions are welcome! Please feel free to:

  • Report issues or bugs
  • Suggest new features
  • Improve the training data
  • Add support for more database systems


🙏 Acknowledgments

  • Built on the T5 architecture by Google Research
  • Trained using the Hugging Face Transformers library
  • Inspired by the need for more accessible database querying tools

Note: This model is designed for educational and prototyping purposes. Always validate generated queries before executing them on production databases.
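One lightweight way to validate generated SQL before it reaches a real database is to compile it against an empty in-memory SQLite copy of the schema. The sketch below is an illustration under assumptions: the column types are guesses (the card lists only column names), `check_sql` is a hypothetical helper, and SQLite's dialect differs from other engines, so this is a first-pass check only. Because the tables already exist in the scratch database, generated CREATE TABLE statements for them will be reported as errors.

```python
import sqlite3

# Assumption: column types are guesses; the model card lists only column names.
DDL = """
CREATE TABLE employees (employee_id INTEGER, name TEXT, email TEXT, department TEXT,
                        salary REAL, hire_date TEXT, age INTEGER);
CREATE TABLE departments (department_id INTEGER, department_name TEXT, manager_id INTEGER,
                          budget REAL, location TEXT);
CREATE TABLE projects (project_id INTEGER, project_name TEXT, start_date TEXT, end_date TEXT,
                       budget REAL, status TEXT);
CREATE TABLE orders (order_id INTEGER, customer_name TEXT, product_name TEXT, quantity INTEGER,
                     order_date TEXT, total_amount REAL);
CREATE TABLE products (product_id INTEGER, product_name TEXT, category TEXT, price REAL,
                       stock_quantity INTEGER, supplier TEXT);
"""


def check_sql(sql: str):
    """Compile `sql` against an empty in-memory copy of the schema without running it.

    Returns None when the statement compiles, otherwise the SQLite error message.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(DDL)
        # EXPLAIN makes SQLite prepare the statement and return its bytecode
        # instead of executing it, so DELETE and UPDATE change nothing.
        conn.execute("EXPLAIN " + sql)
        return None
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return str(exc)
    finally:
        conn.close()


print(check_sql("SELECT * FROM employees WHERE salary > 50000;"))  # None
print(check_sql("SELECT wage FROM employees;") is not None)        # True
```

Statement preparation resolves table and column names, so hallucinated columns are caught along with plain syntax errors.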