Instructions to use BrejBala/analytics-copilot-mistral7b-text2sql-adapter with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use BrejBala/analytics-copilot-mistral7b-text2sql-adapter with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="BrejBala/analytics-copilot-mistral7b-text2sql-adapter")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("BrejBala/analytics-copilot-mistral7b-text2sql-adapter", dtype="auto")

PEFT
How to use BrejBala/analytics-copilot-mistral7b-text2sql-adapter with PEFT:
```
Task type is invalid.
```
Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use BrejBala/analytics-copilot-mistral7b-text2sql-adapter with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "BrejBala/analytics-copilot-mistral7b-text2sql-adapter"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "BrejBala/analytics-copilot-mistral7b-text2sql-adapter",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/BrejBala/analytics-copilot-mistral7b-text2sql-adapter

SGLang

How to use BrejBala/analytics-copilot-mistral7b-text2sql-adapter with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "BrejBala/analytics-copilot-mistral7b-text2sql-adapter" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "BrejBala/analytics-copilot-mistral7b-text2sql-adapter",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "BrejBala/analytics-copilot-mistral7b-text2sql-adapter" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "BrejBala/analytics-copilot-mistral7b-text2sql-adapter",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use BrejBala/analytics-copilot-mistral7b-text2sql-adapter with Docker Model Runner:
```
docker model run hf.co/BrejBala/analytics-copilot-mistral7b-text2sql-adapter
```

BrejBala commited on Jan 11

Commit

1b3a99a

verified ·

1 Parent(s): 2b6326d

Update README.md

Browse files

Files changed (1) hide show

README.md +579 -109

README.md CHANGED Viewed

@@ -1,210 +1,680 @@
 ---
-base_model: unsloth/mistral-7b-instruct-v0.1-bnb-4bit
-library_name: peft
-pipeline_tag: text-generation
 tags:
-- base_model:adapter:unsloth/mistral-7b-instruct-v0.1-bnb-4bit
-- lora
-- sft
-- transformers
-- trl
-- unsloth
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
 ## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
 ## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]
-### Framework versions
-- PEFT 0.18.0

 ---
+language: en
+license: apache-2.0
 tags:
+  - text2sql
+  - sql
+  - structured-data
+  - natural-language-to-sql
+  - mistral
+  - qlora
+  - lora
+  - peft
+  - transformers
+  - huggingface
+  - streamlit
+  - evaluation
+  - spider
+datasets:
+  - b-mc2/sql-create-context
+library_name: transformers
+pipeline_tag: text2text-generation
+base_model: mistralai/Mistral-7B-Instruct-v0.1
 ---
+# Analytics Copilot (Text-to-SQL) – Mistral-7B QLoRA
+This repository contains a **Text-to-SQL** model built by fine-tuning
+**`mistralai/Mistral-7B-Instruct-v0.1`** with **QLoRA** on the
+**`b-mc2/sql-create-context`** dataset, plus an evaluation pipeline and a
+Streamlit UI for interactive usage.
+The model’s goal is to convert a **natural-language question** and a concrete
+**database schema** (as `CREATE TABLE` DDL) into a **single SQL query** that
+answers the question.
+> **Note:** This model card documents the *adapter* (QLoRA) or fine-tuned model
+> released from the Analytics Copilot Text-to-SQL project. It assumes the
+> underlying base model is `mistralai/Mistral-7B-Instruct-v0.1` and that
+> training was run using the public **`b-mc2/sql-create-context`** dataset.
+---
+## Model Summary
+- **Task:** Text-to-SQL (natural-language questions → SQL queries)
+- **Base model:** `mistralai/Mistral-7B-Instruct-v0.1`
+- **Fine-tuning method:** QLoRA (4-bit) with LoRA adapters
+- **Libraries:** `transformers`, `peft`, `trl`, `unsloth`, `bitsandbytes`
+- **Primary training data:** `b-mc2/sql-create-context`
+- **Evaluation datasets:**
+  - Internal: processed val split from `b-mc2/sql-create-context`
+  - External: Spider dev (via `xlangai/spider` + `richardr1126/spider-schema`)
+- **Input:** Schema (`CREATE TABLE` context) + natural-language question
+- **Output:** A single SQL query string
+- **Usage:** Mainly via Hugging Face Inference Endpoints + LoRA adapters, or
+  by loading the adapter with `transformers` + `peft`.
+---
+## Intended Use and Limitations
+### Intended Use
+This model is intended as a **developer-facing Text-to-SQL assistant**. Typical
+uses include:
+- Helping analysts and engineers generate SQL queries from natural language
+  when they:
+  - Already know the schema (or can paste it).
+  - Want to prototype queries quickly.
+- Powering a **Text-to-SQL copilot UI**, e.g., the included Streamlit app:
+  - Paste database schema (DDL) into a text area.
+  - Ask a question in natural language.
+  - Get suggested SQL as a starting point.
+- Serving as a **research / teaching artifact**:
+  - Demonstrates how to fine-tune an open LLM with QLoRA for Text-to-SQL.
+  - Provides a reproducible evaluation pipeline on a public dataset.
+### Out of Scope / Misuse
+The model is **not** intended for:
+- Direct, unsupervised execution against **production databases**:
+  - SQL may be syntactically valid but semantically off.
+  - The model is not aware of performance / cost implications.
+- Use as a general-purpose chatbot:
+  - It is trained specifically on schema + question → SQL.
+- Generating **arbitrary SQL** without schema:
+  - It is strongly conditioned on explicit schema context.
+- High-stakes domains:
+  - Healthcare, finance, safety-critical environments, or any domain where
+    incorrect queries can cause harm or large financial loss.
+### Limitations
+- **Hallucinations:** Despite having schema context, the model can:
+  - Refer to non-existent tables/columns.
+  - Misinterpret relationships between tables.
+- **No automatic execution safety:**
+  - The training objective does not enforce read-only queries.
+  - You must wrap the model in a strict execution layer (e.g., allow only
+    `SELECT`, enforce limits, static analysis).
+- **Domain coverage:**
+  - Training is driven by `b-mc2/sql-create-context` and Spider; behavior on
+    very different schemas or DB engines may degrade.
+- **Locale and language:**
+  - Primarily English; performance on non-English questions is untested.
+You should treat generated SQL as **suggestions** that require human review
+before execution.
+---
 ## Model Details
+### Architecture
+- **Base architecture:** Mistral-7B (decoder-only Transformer)
+- **Base model:** `mistralai/Mistral-7B-Instruct-v0.1`
+  - Licensed under **Apache 2.0**.
+- **Fine-tuning method:** QLoRA (Low-Rank Adapters with 4-bit quantized base)
+- **Adapter mechanism:** LoRA adapters (PEFT / Unsloth)
+Typical QLoRA configuration (as used in the training script/notebook):
+- `lora_r`: 16
+- `lora_alpha`: 16
+- `lora_dropout`: 0.0
+- `max_seq_length`: 2048
+- 4-bit quantization with bitsandbytes:
+  - `bnb_4bit_quant_type = "nf4"`
+  - `bnb_4bit_compute_dtype = "float16"` (on CUDA)
+  - `bnb_4bit_use_double_quant = True`
+### Training Configuration (QLoRA)
+The project defines a `TrainingConfig` dataclass with the following key fields:
+- `base_model` (str): e.g. `"mistralai/Mistral-7B-Instruct-v0.1"`
+- `max_steps` (int): e.g. 500
+- `per_device_train_batch_size` (int): typically small (e.g. 1)
+- `gradient_accumulation_steps` (int): e.g. 8 (to achieve an effective batch size)
+- `learning_rate` (float): e.g. `2e-4`
+- `warmup_steps` (int): e.g. 50
+- `weight_decay` (float): typically `0.0` for QLoRA
+- `max_seq_length` (int): e.g. 2048
+- `lora_r` (int): e.g. 16
+- `lora_alpha` (int): e.g. 16
+- `lora_dropout` (float): e.g. 0.0
+- `seed` (int): e.g. 42
+These values are exposed via the CLI script:
+```bash
+python scripts/train_qlora.py \
+  --train_path data/processed/train.jsonl \
+  --val_path data/processed/val.jsonl \
+  --base_model mistralai/Mistral-7B-Instruct-v0.1 \
+  --output_dir outputs/ \
+  --max_steps 500 \
+  --per_device_train_batch_size 1 \
+  --gradient_accumulation_steps 8 \
+  --learning_rate 2e-4 \
+  --warmup_steps 50 \
+  --weight_decay 0.0 \
+  --max_seq_length 2048 \
+  --lora_r 16 \
+  --lora_alpha 16 \
+  --lora_dropout 0.0 \
+  --seed 42
+```
+## Data and Preprocessing
+### Primary Training Dataset: `b-mc2/sql-create-context`
+- **Name:** `b-mc2/sql-create-context`
+- **Source:** Hugging Face Datasets
+- **Dataset page:** https://huggingface.co/datasets/b-mc2/sql-create-context
+**Fields:**
+- `question` – natural language question from the user
+- `context` – schema context as one or more `CREATE TABLE` statements
+- `answer` – gold SQL query
+**Example (conceptual):**
+    {
+      "question": "How many heads of the departments are older than 56?",
+      "context": "CREATE TABLE head (age INTEGER)",
+      "answer": "SELECT COUNT(*) FROM head WHERE age > 56"
+    }
+Please refer to the dataset page on Hugging Face for licensing and further details. This model inherits any legal constraints from both the base model and this dataset.
+---
+### Train / Validation Split
+The dataset only provides a `train` split. The project creates its own train/validation split using:
+- `datasets.Dataset.train_test_split` with:
+  - `test_size = val_ratio` (default: `0.08`)
+  - `seed = 42`
+Renames:
+- `train` → final training split
+- `test` → final validation split
+This yields:
+- `data/processed/train.jsonl` – training examples
+- `data/processed/val.jsonl` – validation examples
+---
+### Instruction-Tuning Format (Alpaca-style JSONL)
+Each processed example has:
+- `id` – e.g. `"sqlcc-train-000001"`
+- `instruction` – static instruction text
+- `input` – formatted schema + question
+- `output` – normalized SQL query
+- `source` – `"b-mc2/sql-create-context"`
+- `meta` – metadata (original split, row index, seed, etc.)
+**Example:**
+    {
+      "id": "sqlcc-train-000001",
+      "instruction": "Write a SQL query that answers the user's question using ONLY the tables and columns provided in the schema.",
+      "input": "### Schema:\nCREATE TABLE head (age INTEGER)\n\n### Question:\nHow many heads of the departments are older than 56 ?",
+      "output": "SELECT COUNT(*) FROM head WHERE age > 56",
+      "source": "b-mc2/sql-create-context",
+      "meta": {
+        "original_split": "train",
+        "row": 0,
+        "split": "train",
+        "val_ratio": 0.08,
+        "seed": 42,
+        "from_local_input": false
+      }
+    }
+---
+### Instruction Text
+The instruction is fixed:
+Write a SQL query that answers the user's question using ONLY the tables and columns provided in the schema.
+---
+### Input Formatting
+`input` is constructed as:
+    ### Schema:
+    <CREATE TABLE ...>
+    ### Question:
+    <question text>
+This is implemented in `text2sql.data_prep.build_input_text`.
+---
+### SQL Normalization
+The dataset builder applies light normalization to the answer:
+- Strip leading/trailing whitespace
+- Collapse runs of whitespace into a single space
+This is implemented as `text2sql.data_prep.normalize_sql`.
+---
+## Training Procedure
+### Prompt Format for Training
+To build the final training text, the project uses a simple prompt template:
+    ### Instruction:
+    <instruction>
+    ### Input:
+    <input>
+    ### Response:
+This template is defined as `PROMPT_TEMPLATE` in `src/text2sql/training/formatting.py`, and filled via:
+    from text2sql.training.formatting import build_prompt
+    prompt = build_prompt(instruction, input_text)
+    # Final training text is: prompt + output_sql
+`output_sql` is normalized SQL, optionally further cleaned with `ensure_sql_only` when used at inference time.
+---
+### Optimization
+- Optimizer & scheduler are provided by `trl.SFTTrainer` / `transformers`.
+- Mixed precision (e.g. bf16/fp16) is enabled when supported.
+- LoRA adapters are applied to a subset of projection layers; typical choices include attention and MLP projections (see training code for exact `target_modules`).
+---
+### Hardware
+Intended to run on a single modern GPU (e.g., A10, A100, L4) with ≥16GB VRAM using 4-bit quantization.
+The CLI script has:
+- `--dry_run` mode (no model load; checks dataset & formatting).
+- `--smoke` mode (lightweight config check; on CPU-only machines it skips loading the full model).
+---
+### Outputs
+After a full run you should obtain:
+- `outputs/adapters/` – LoRA adapter weights / config
+- `outputs/run_meta.json` – training config, data paths, etc.
+- `outputs/metrics.json` – training/eval metrics as reported by the trainer
+These artifacts can be published to the Hub via the helper script `scripts/publish_to_hub.py`.
+---
 ## Evaluation
+The project provides a dedicated evaluation pipeline for both internal and external validation.
+---
+### Metrics
+All evaluation flows share the same core metrics, implemented in `src/text2sql/eval/metrics.py`:
+#### Exact Match (EM) (normalized SQL)
+Uses `normalize_sql`:
+- Strip whitespace
+- Remove trailing semicolons
+- Collapse whitespace runs
+Checks exact string equality between normalized prediction and gold SQL.
+#### No-values Exact Match
+Uses `normalize_sql_no_values`:
+- Normalize SQL as above
+- Replace single-quoted string literals with a placeholder (`'__STR__'`)
+- Replace numeric literals (integers/decimals) with a placeholder (`__NUM__`)
+Captures structural equality even when literal values differ.
+#### SQL parse success rate
+Uses `sqlglot.parse_one` to parse the predicted SQL.
+Fraction of predictions that parse successfully.
+#### Schema adherence
+- Parses the `CREATE TABLE` context with `sqlglot` to recover:
+  - Tables and columns
+- Parses predicted SQL and extracts table/column references
+- A prediction is schema-adherent if all references exist in the schema.
+Metrics are aggregated as:
+    {
+      "n_examples": ...,
+      "exact_match": {"count": ..., "rate": ...},
+      "no_values_em": {"count": ..., "rate": ...},
+      "parse_success": {"count": ..., "rate": ...},
+      "schema_adherence": {"count": ..., "rate": ...}  // optional
+    }
+**Important:** At the time of writing, this model card does not include specific numeric metrics. After you run `scripts/evaluate_internal.py` and `scripts/evaluate_spider_external.py`, you should update this section with actual results from:
+- `reports/eval_internal.json` / `.md`
+- `reports/eval_spider.json` / `.md`
+---
+### Internal Evaluation (b-mc2/sql-create-context val)
+**Input:**
+`data/processed/val.jsonl` (same format as training)
+**Script:**
+    python scripts/evaluate_internal.py \
+      --val_path data/processed/val.jsonl \
+      --base_model mistralai/Mistral-7B-Instruct-v0.1 \
+      --adapter_dir /path/to/outputs/adapters \
+      --device auto \
+      --max_examples 200 \
+      --temperature 0.0 \
+      --top_p 0.9 \
+      --max_new_tokens 256 \
+      --out_dir reports/
+**Notes:**
+- `--device auto` chooses GPU when available.
+- 4-bit quantization is enabled by default on CUDA; configurable via:
+  - `--load_in_4bit` / `--no_load_in_4bit`
+  - `--bnb_4bit_quant_type`, `--bnb_4bit_compute_dtype`, etc.
+- `--smoke` runs a small subset; on CPU-only environments it falls back to mock mode (gold SQL as prediction) to exercise the metrics without loading the model.
+**Outputs:**
+- `reports/eval_internal.json`
+- `reports/eval_internal.md`
+---
+### External Validation (Spider dev)
+**Datasets:**
+- Examples: `xlangai/spider` (split: `validation`)
+- Schema helper: `richardr1126/spider-schema` (contains create_table_context)
+- License note: `richardr1126/spider-schema` is licensed under **CC BY-SA 4.0**. Spider is used only for evaluation, not training.
+**Prompt format:**
+    ### Schema:
+    <create_table_context>
+    ### Question:
+    <Spider question>
+Instruction text is the same as training. Prompts are constructed with the same formatter used for training (via helper functions in `text2sql.eval.spider`).
+**Script:**
+    python scripts/evaluate_spider_external.py \
+      --base_model mistralai/Mistral-7B-Instruct-v0.1 \
+      --adapter_dir /path/to/outputs/adapters \
+      --device auto \
+      --spider_source xlangai/spider \
+      --schema_source richardr1126/spider-schema \
+      --spider_split validation \
+      --max_examples 200 \
+      --temperature 0.0 \
+      --top_p 0.9 \
+      --max_new_tokens 256 \
+      --out_dir reports/
+**Outputs:**
+- `reports/eval_spider.json`
+- `reports/eval_spider.md`
+The same metrics (EM, no-values EM, parse success, schema adherence) are computed, but note:
+- This is not a full reproduction of official Spider evaluation (which includes component matching, execution metrics, etc.).
+- It is a lightweight proxy for cross-domain Text-to-SQL quality.
+---
+### Mock / Offline Modes
+Both evaluation scripts have `--mock` modes:
+- Use small fixtures from `tests/fixtures/`
+- Treat gold SQL as predictions
+- Avoid network / heavy model loads
+Ideal for CI and offline smoketests.
+---
+## Inference and Deployment
+### Basic Usage with Hugging Face Transformers (Adapters)
+Assuming this repo provides a LoRA adapter that you can load on top of `mistralai/Mistral-7B-Instruct-v0.1`:
+    from transformers import AutoModelForCausalLM, AutoTokenizer
+    from peft import PeftModel
+    BASE_MODEL = "mistralai/Mistral-7B-Instruct-v0.1"
+    ADAPTER_REPO = "your-username/analytics-copilot-text2sql-mistral7b-qlora"
+    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
+    base_model = AutoModelForCausalLM.from_pretrained(
+        BASE_MODEL,
+        load_in_4bit=True,
+        device_map="auto",
+    )
+    model = PeftModel.from_pretrained(base_model, ADAPTER_REPO)
+    schema = """CREATE TABLE orders (
+      id INTEGER PRIMARY KEY,
+      customer_id INTEGER,
+      amount NUMERIC,
+      created_at TIMESTAMP
+    );"""
+    question = "Total order amount per customer for the last 7 days."
+    instruction = (
+        "Write a SQL query that answers the user's question using ONLY "
+        "the tables and columns provided in the schema."
+    )
+    input_text = f"### Schema:\n{schema}\n\n### Question:\n{question}"
+    prompt = f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n"
+    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+    output_ids = model.generate(
+        **inputs,
+        max_new_tokens=256,
+        temperature=0.0,
+    )
+    raw_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
+    # Optionally, post-process with the project’s SQL cleaner:
+    # from text2sql.training.formatting import ensure_sql_only
+    # sql = ensure_sql_only(raw_text)
+    print(raw_text)
+---
+### Inference Endpoints + Multi-LoRA (Recommended for Production)
+If you host the base model in a Hugging Face Inference Endpoint with a Multi-LoRA configuration (via `LORA_ADAPTERS`), you can select this adapter at inference time by `adapter_id`.
+Example environment for TGI:
+    LORA_ADAPTERS='[
+      {"id": "text2sql-qlora", "source": "your-username/analytics-copilot-text2sql-mistral7b-qlora"}
+    ]'
+Then in Python:
+    from huggingface_hub import InferenceClient
+    ENDPOINT_URL = "https://your-endpoint-1234.us-east-1.aws.endpoints.huggingface.cloud"
+    HF_TOKEN = "hf_your_token_here"
+    client = InferenceClient(base_url=ENDPOINT_URL, api_key=HF_TOKEN)
+    schema = """CREATE TABLE orders (
+      id INTEGER PRIMARY KEY,
+      customer_id INTEGER,
+      amount NUMERIC,
+      created_at TIMESTAMP
+    );"""
+    question = "Total order amount per customer for the last 7 days."
+    prompt = f"""### Schema:
+    {schema}
+    ### Question:
+    {question}
+    Return only the SQL query."""
+    response = client.post(
+        json={
+            "inputs": prompt,
+            "parameters": {
+                "adapter_id": "text2sql-qlora",
+                "max_new_tokens": 256,
+                "temperature": 0.0,
+            },
+        }
+    )
+    print(response)
+---
+### Streamlit UI
+The accompanying repo includes a Streamlit app (`app/streamlit_app.py`) that:
+- Runs on Streamlit Community Cloud or locally.
+- Calls a Hugging Face Inference Endpoint or router via `InferenceClient`.
+- Reads config from Streamlit secrets or environment:
+  - `HF_TOKEN`
+  - `HF_ENDPOINT_URL` + `HF_ADAPTER_ID` (preferred, TGI endpoint + adapter)
+  - Or `HF_MODEL_ID` + `HF_PROVIDER` (router-based fallback, for merged models)
+- Optionally uses an OpenAI fallback model when HF inference fails.
+Deployment instructions are documented in `docs/deploy_streamlit_cloud.md`.
+---
+## Ethical Considerations and Risks
+### Data and Bias
+The training data (`b-mc2/sql-create-context`) may contain:
+- Synthetic or curated schemas and questions
+- Biases in naming conventions, example queries, or tasks
+The base model (`Mistral-7B-Instruct`) is trained on large-scale web and other data. It inherits any demographic, cultural, and representational biases present in those sources.
+As a result:
+- The model can produce SQL that, if combined with biased downstream usage (e.g., unfair filtering in a user database), may exacerbate existing biases.
+- The model is not aware of ethical / legal constraints around data access; it will happily generate queries that might retrieve sensitive fields (e.g., emails, PII) if such columns exist in the schema.
+---
+### Safety and Security
+Generated SQL may contain:
+- Expensive operations (full table scans on large tables)
+- Potentially unsafe patterns (e.g., missing `LIMIT`, cross joins)
+The model does not perform:
+- Access control
+- Row-level security
+- SQL injection detection
+You must implement:
+- A strict execution sandbox:
+  - Allow only `SELECT` (no `INSERT`, `UPDATE`, `DELETE`, `DROP`, etc.)
+  - Enforce timeouts and row limits
+- Appropriate logging and review of executed queries
+---
+### Human Oversight
+Always:
+- Present generated SQL to users for review
+- Encourage edits and manual validation
+- Provide clear warnings that the system is a copilot, not an oracle
+---
+### Environmental Impact
+Training details vary depending on your hardware and hyperparameters, but in general:
+- QLoRA + 4-bit quantization significantly reduces compute and memory compared to full fine-tuning:
+  - Fewer GPU-hours
+  - Lower VRAM requirements
+- The example configuration (7B model, QLoRA, moderate steps) is designed to fit on commodity cloud GPUs (e.g., single A10/A100-class instance).
+To be transparent, you should log and publish:
+- GPU type and count
+- Total training time
+- Number of runs and restarts
+---
+## How to Cite
+If you use this model or the underlying codebase in a research project or production system, please consider citing:
+- The base model authors: Mistral AI (`mistralai/Mistral-7B-Instruct-v0.1`)
+- The training dataset: `b-mc2/sql-create-context` (see dataset page for citation)
+- This project (replace with your own reference):
+  Analytics Copilot (Text-to-SQL) – Mistral-7B QLoRA,
+  GitHub: https://github.com/brej-29/analytics-copilot-text2sql
+You may also add a BibTeX entry, for example:
+    @misc{analytics_copilot_text2sql,
+      title        = {Analytics Copilot (Text-to-SQL) -- Mistral-7B QLoRA},
+      author       = {Your Name},
+      year         = {2026},
+      howpublished = {\url{https://github.com/brej-29/analytics-copilot-text2sql}},
+      note         = {Text-to-SQL fine-tuning of Mistral-7B using QLoRA on b-mc2/sql-create-context}
+    }
+---
+## Changelog
+- **Initial adapter / model card:**
+  - QLoRA fine-tuning on `b-mc2/sql-create-context`
+  - Internal and external evaluation pipelines implemented
+  - Streamlit UI for remote inference via Hugging Face Inference