Mnemosyne-3B
Mnemosyne-3B is a QLoRA fine-tune of Qwen/Qwen2.5-Coder-3B-Instruct for natural language to SQL generation, with a specialisation in laboratory, scientific, food safety, water quality, and environmental microbiology database schemas.
The model is designed for low-latency local or server-side SQL generation. It is released in full bf16 precision and in GGUF format (Q4_K_M and Q8_0) for use with llama.cpp, Ollama, and LM Studio.
Intended Use
Mnemosyne-3B is suited for:
- Generating SQL queries from natural language questions against a provided database schema
- Applications in laboratory information management systems (LIMS), food and water testing, and scientific data management
- General-purpose text-to-SQL use cases where low-latency local inference is required
- Developer tooling, data analyst assistants, and schema-aware chatbots
Mnemosyne-3B is not suited for:
- Tasks requiring external knowledge beyond the provided schema
- Applications without a schema context (schema must be provided at inference time)
- Safety-critical automated execution without a human review step
Model Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen2.5-Coder-3B-Instruct |
| Parameters | 3B |
| Fine-tuning method | QLoRA |
| LoRA rank / alpha | r=64, alpha=128 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training hardware | NVIDIA A100 40GB |
| Training framework | Unsloth + TRL SFTTrainer |
| Precision (training) | bf16 with 4-bit quantised base (QLoRA) |
| Precision (release) | bf16 (merged), Q4_K_M GGUF, Q8_0 GGUF |
| License | Apache 2.0 |
| Author | Zain Asad |
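To put r=64 in perspective, the sketch below counts the trainable parameters that the LoRA adapters add. Each adapted linear layer with a (d_out × d_in) weight gains a rank-r pair, A (r × d_in) and B (d_out × r), so it adds r·(d_in + d_out) parameters, and the update is scaled by alpha/r. The layer dimensions are assumptions taken from the public Qwen2.5-3B configuration, not from this card. Check the model's config.json before relying on the number.

```python
# Hypothetical sketch: trainable parameters added by LoRA adapters.
# ASSUMED dimensions from the public Qwen2.5-3B config (verify in config.json):
HIDDEN = 2048          # hidden_size
INTERMEDIATE = 11008   # intermediate_size (MLP width)
NUM_LAYERS = 36        # num_hidden_layers
KV_DIM = 2 * 128       # num_key_value_heads * head_dim (grouped-query attention)

# (d_in, d_out) of each LoRA target module in one decoder layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(rank: int) -> int:
    """Each adapted layer gains A (rank x d_in) and B (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


rank, alpha = 64, 128
print(f"LoRA trainable parameters at r={rank}: {lora_trainable_params(rank):,}")
print(f"adapter scaling alpha / r = {alpha / rank}")
```

Under these assumed dimensions the count comes to 119,734,272 adapter parameters (about 120M), with a scaling factor of 2.0. In QLoRA only these adapters are trained, and the 4-bit base weights stay frozen.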
Training
Hyperparameters
| Setting | Value |
|---|---|
| Epochs | 2 (best checkpoint at step 1000 of 1224) |
| Per-device batch size | 32 |
| Gradient accumulation steps | 2 |
| Effective batch size | 64 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Weight decay | 0.01 |
| Optimiser | AdamW 8-bit |
| Max sequence length | 2048 |
| Checkpoint selection | Best eval loss (load_best_model_at_end=True) |
The model converged at step 1000 — eval loss plateaued beyond this point, and checkpoint-1000 was selected as the final model.
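The table implies a schedule of linear warmup followed by cosine decay. The sketch below mirrors the shape of the `cosine` scheduler in Hugging Face `transformers`, where warmup steps = ceil(warmup_ratio × total steps). It assumes 1224 total optimiser steps on a single device, as stated above. It is an illustration, not the training code.

```python
import math


def effective_batch_size(per_device: int = 32, grad_accum: int = 2, num_devices: int = 1) -> int:
    """Examples contributing to each optimiser step."""
    return per_device * grad_accum * num_devices


def lr_at(step: int, total_steps: int = 1224, peak_lr: float = 2e-4,
          warmup_ratio: float = 0.05) -> float:
    """Linear warmup to peak_lr, then a single cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # 62 when total_steps=1224
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size())
for step in (0, 31, 62, 500, 1000, 1224):
    print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

By the selected checkpoint at step 1000, the learning rate has already decayed to a small fraction of its peak. This is consistent with eval loss flattening late in training.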
Prompt Format
Mnemosyne-3B uses the Qwen2.5 ChatML format with a task-specific system prompt:
<|im_start|>system
You are Mnemosyne, an expert SQL assistant specialising in laboratory,
scientific, food safety, water quality, and general-purpose database queries.
Given a database schema and a natural language question, generate a correct,
well-formatted SQL query. Return only the SQL with no explanation.<|im_end|>
<|im_start|>user
### Schema:
{DDL}
### Question:
{natural_language_question}<|im_end|>
<|im_start|>assistant
{sql_query}<|im_end|>
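The helpers below fill in the template above. `build_messages` returns the role/content list that `tokenizer.apply_chat_template` accepts. `render_chatml` produces the equivalent raw string for runtimes that take a plain prompt. Both helper names are hypothetical. The blank line between the schema and the `### Question:` header follows the Transformers example under How to Use, and the rendered string matches the prompt built there.

```python
SYSTEM_PROMPT = (
    "You are Mnemosyne, an expert SQL assistant specialising in laboratory, "
    "scientific, food safety, water quality, and general-purpose database queries. "
    "Given a database schema and a natural language question, generate a correct, "
    "well-formatted SQL query. Return only the SQL with no explanation."
)


def build_messages(schema: str, question: str) -> list[dict[str, str]]:
    """Role/content messages in the layout the model was fine-tuned on."""
    user = f"### Schema:\n{schema.strip()}\n\n### Question:\n{question.strip()}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
    ]


def render_chatml(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages as ChatML, optionally opening the assistant turn."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"  # the model continues from here
    return text


prompt = render_chatml(build_messages(
    "CREATE TABLE samples (sample_id VARCHAR(20) PRIMARY KEY, matrix VARCHAR(50));",
    "How many drinking water samples are there?",
))
print(prompt)
```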
Training Data
Mnemosyne-3B was trained on a combination of three datasets, capped at 20,000 examples per source and shuffled before training:
| Dataset | Examples used | Role |
|---|---|---|
| b-mc2/sql-create-context | 20,000 | General SQL foundation — single and multi-table queries |
| gretelai/synthetic_text_to_sql | 20,000 | Complex SQL (CTEs, window functions, subqueries) |
| Mnemosyne Lab Dataset (custom) | 579 | Laboratory / LIMS domain specialisation |
Combined training set: ~40,579 examples (after quality filtering).
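The sketch below shows the capping and shuffling described above, assuming each source is already a list of examples. The card does not say whether the capped sources were randomly subsampled or truncated to their first 20,000 rows. The random subsample and fixed seed here are therefore assumptions.

```python
import random


def mix_sources(sources: dict[str, list], cap: int = 20_000, seed: int = 42) -> list:
    """Cap each source at `cap` examples, pool them, and shuffle the pool."""
    rng = random.Random(seed)  # fixed seed so the mixture is reproducible
    pooled = []
    for examples in sources.values():
        pool = list(examples)
        if len(pool) > cap:
            pool = rng.sample(pool, cap)  # ASSUMPTION: random subsample, not first-N
        pooled.extend(pool)
    rng.shuffle(pooled)
    return pooled


# Toy stand-ins for the three sources (labels only, no real data)
toy_sources = {
    "b-mc2/sql-create-context": [("general", i) for i in range(50)],
    "gretelai/synthetic_text_to_sql": [("complex", i) for i in range(30)],
    "mnemosyne-lab": [("lab", i) for i in range(5)],
}
mixed = mix_sources(toy_sources, cap=20)
print(len(mixed), mixed[:5])
```

Because the small lab source sits far below the cap, all of its examples survive. They end up as a small minority of the pooled set, as in the real mixture (579 of roughly 40,579).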
Mnemosyne Lab Dataset
The lab domain dataset was purpose-built for this fine-tune. It covers an 8-table LIMS schema (clients, samples, analysts, methods, determinands, results, worksheets, worksheet_samples) spanning food safety, drinking water, surface water, and environmental microbiology testing.
All examples are entirely synthetic and fictional. Company names, client names, staff names, and sample identifiers are invented. Analyte names and method references (ISO 9308-1, ISO 11290-1, ISO 6579-1, EC 2073/2005, EU DWD 2020/2184, etc.) reflect public international standards and are not proprietary. The dataset contains no real personal data, no real employer information, and no confidential laboratory records.
The dataset covers three complexity tiers:
| Tier | Examples | Coverage |
|---|---|---|
| Simple | 217 | Single-table SELECT, basic WHERE/COUNT/LIMIT, NULL checks, date filters |
| Moderate | 215 | Multi-table JOINs, GROUP BY + HAVING, CASE WHEN, NOT EXISTS, turnaround calculations |
| Complex | 147 | CTEs, LAG/LEAD, RANK/DENSE_RANK/NTILE, rolling averages, correlated subqueries, UNION, year-on-year pivots |
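The example below shows what a Complex-tier query looks like in practice. It is a hypothetical query of that shape (a CTE plus `LAG` and a three-row rolling average) run against invented rows in a cut-down `results` table, which borrows column names from the lab example under How to Use. It is not drawn from the Mnemosyne Lab Dataset. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Toy rows for a single determinand; values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (sample_id TEXT, determinand_id INTEGER, "
    "numeric_value REAL, test_date TEXT)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?)",
    [
        ("S1", 1, 10.0, "2024-01-01"),
        ("S2", 1, 20.0, "2024-01-02"),
        ("S3", 1, 30.0, "2024-01-03"),
        ("S4", 1, 40.0, "2024-01-04"),
    ],
)

COMPLEX_TIER_STYLE_QUERY = """
WITH measured AS (
    SELECT sample_id, determinand_id, numeric_value, test_date
    FROM results
    WHERE numeric_value IS NOT NULL
)
SELECT sample_id,
       numeric_value,
       LAG(numeric_value) OVER (
           PARTITION BY determinand_id ORDER BY test_date
       ) AS previous_value,
       AVG(numeric_value) OVER (
           PARTITION BY determinand_id ORDER BY test_date
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS rolling_avg_3
FROM measured
ORDER BY test_date
"""

rows = conn.execute(COMPLEX_TIER_STYLE_QUERY).fetchall()
for row in rows:
    print(row)
```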
Evaluation
All evaluations use execution accuracy (EX): the result set of each generated query is compared with that of the gold query on a live SQLite database, rather than comparing the SQL text. EX is the preferred primary metric for text-to-SQL because syntactically different queries can return identical results. Exact match (EM) and valid SQL rate (VLD) are reported as supplementary metrics.
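The sketch below is a minimal execution-accuracy check in the spirit described above, using Python's built-in `sqlite3`. Row order is ignored by comparing result multisets, and a prediction that fails to execute counts as a miss. Both are common conventions but are assumptions here, not the author's harness. The official Spider evaluation script handles further cases (for example, queries with ORDER BY) more carefully.

```python
import sqlite3
from collections import Counter


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str,
                    order_matters: bool = False) -> bool:
    """True if both queries return the same rows on this database."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # unexecutable predictions count as misses
    gold = conn.execute(gold_sql).fetchall()  # gold SQL is assumed to run here
    if order_matters:
        return predicted == gold
    return Counter(predicted) == Counter(gold)  # multiset comparison ignores row order


def execution_accuracy(conn: sqlite3.Connection, pairs: list[tuple[str, str]]) -> float:
    """Percentage of (predicted, gold) pairs whose result sets match."""
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return 100.0 * hits / len(pairs)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (sample_id TEXT, matrix TEXT)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [("S1", "Drinking water"), ("S2", "Food"), ("S3", "Drinking water")],
)
pairs = [
    # Different SQL text, same result set: correct under EX, wrong under EM
    ("SELECT sample_id FROM samples WHERE matrix = 'Drinking water'",
     "SELECT sample_id FROM samples WHERE matrix LIKE 'Drinking%'"),
    ("SELECT COUNT(*) FROM samples", "SELECT COUNT(sample_id) FROM samples"),
    # Typo in the table name: fails to execute, so it is a miss
    ("SELECT sample_id FROM sample", "SELECT sample_id FROM samples"),
]
print(f"EX = {execution_accuracy(conn, pairs):.1f}%")
```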
Results
| Benchmark | n | Metric | Base (Qwen2.5-Coder-3B-Instruct) | Mnemosyne-3B | Delta (percentage points) |
|---|---|---|---|---|---|
| Spider (train split) | 500 | EX | 65.4% | 57.6% | -7.8 |
| Spider (train split) | 500 | EM | 17.8% | 15.2% | -2.6 |
| Spider (train split) | 500 | VLD | 98.0% | 95.0% | -3.0 |
| Lab Domain — Overall | 200 | EX | 30.0% | 78.0% | +48.0 |
| Lab Domain — Simple | 90 | EX | 53.3% | 98.9% | +45.6 |
| Lab Domain — Moderate | 71 | EX | 15.5% | 70.4% | +54.9 |
| Lab Domain — Complex | 39 | EX | 2.6% | 43.6% | +41.0 |
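The Overall row is the example-weighted mean of the three tiers, and the deltas are differences in percentage points. The arithmetic below reproduces it by back-calculating each tier's number of correct queries from its rounded percentage. Those counts are inferred by rounding, not reported in the card.

```python
# (n, base EX %, Mnemosyne-3B EX %) per tier, copied from the Results table
TIERS = {
    "Simple": (90, 53.3, 98.9),
    "Moderate": (71, 15.5, 70.4),
    "Complex": (39, 2.6, 43.6),
}


def correct(n: int, pct: float) -> int:
    """Back-calculate the number of correct queries from a rounded percentage."""
    return round(n * pct / 100)


def overall_ex(column: int) -> float:
    """Example-weighted EX across tiers; column 1 = base, 2 = Mnemosyne-3B."""
    total = sum(row[0] for row in TIERS.values())
    hits = sum(correct(row[0], row[column]) for row in TIERS.values())
    return 100 * hits / total


base, tuned = overall_ex(1), overall_ex(2)
print(f"overall EX: base {base:.1f}%, Mnemosyne-3B {tuned:.1f}%, delta {tuned - base:+.1f} pp")

complex_base, complex_tuned = correct(39, 2.6), correct(39, 43.6)
print(f"complex tier: {complex_base}/39 -> {complex_tuned}/39 correct "
      f"({complex_tuned / complex_base:.0f}x)")
```

This recovers the reported 30.0% and 78.0% (60 and 156 of 200 queries). On the Complex tier the change is 1 versus 17 correct queries out of 39, which is where the 17× figure comes from.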
Interpretation
Mnemosyne-3B shows the expected trade-off of targeted domain fine-tuning: a modest regression on general cross-domain SQL (Spider, -7.8 percentage points EX) in exchange for a large gain on laboratory-domain SQL (+48.0 points EX overall). On complex LIMS queries the base model scores near zero (2.6% EX, roughly 1 of 39 queries), while Mnemosyne-3B reaches 43.6% (roughly 17 of 39), a 17× improvement.
On the lab suite, exact match is higher than execution accuracy (EM = 89% vs EX = 78%). This is unusual, because a query that exactly matches the gold SQL would normally also execute correctly. It reflects a known limitation of the evaluation setup: the gold SQL was authored in PostgreSQL syntax, and SQLite does not natively support some PostgreSQL-specific constructs (DATE_TRUNC, INTERVAL, EXTRACT). A prediction that exactly matches the gold SQL can therefore fail to execute against SQLite even though it is correct for PostgreSQL. Execution accuracy on a PostgreSQL deployment would likely be higher than reported.
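The dialect problem is easy to reproduce. SQLite has no `DATE_TRUNC`, so a PostgreSQL-style query fails when SQLite compiles it. One possible workaround for an evaluation harness is to register a Python stand-in with `sqlite3.Connection.create_function`. The `pg_date_trunc` helper below is hypothetical and covers only the `day`, `month`, and `year` units. It does not help with `INTERVAL` arithmetic, which is syntax rather than a function.

```python
import sqlite3
from datetime import datetime


def pg_date_trunc(unit: str, value: str) -> str:
    """Approximate PostgreSQL DATE_TRUNC for 'day', 'month' and 'year' only."""
    ts = datetime.fromisoformat(value)
    unit = unit.lower()
    if unit == "year":
        ts = ts.replace(month=1, day=1)
    elif unit == "month":
        ts = ts.replace(day=1)
    elif unit != "day":
        raise ValueError(f"unsupported DATE_TRUNC unit: {unit}")
    # Every supported unit also zeroes the time-of-day fields
    ts = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return ts.isoformat(sep=" ")


conn = sqlite3.connect(":memory:")
query = "SELECT DATE_TRUNC('month', '2024-05-17 13:45:00')"

try:
    conn.execute(query)
    error_before_shim = None
except sqlite3.OperationalError as exc:
    error_before_shim = str(exc)  # SQLite has no built-in DATE_TRUNC
print("before shim:", error_before_shim)

conn.create_function("DATE_TRUNC", 2, pg_date_trunc)
print("after shim:", conn.execute(query).fetchone()[0])
```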
Limitations
- General SQL regression: Mnemosyne-3B trades approximately 8 percentage points of Spider EX for domain specialisation. For purely general-purpose SQL use cases, the base Qwen2.5-Coder-3B-Instruct may perform better.
- Schema required at inference time: The model has no implicit knowledge of any specific database. A DDL schema must be provided in every prompt.
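- Schema length: Very long schemas (many tables, many columns) may not fit within the 2048-token maximum sequence length used in training. They may also be truncated if the model is loaded with that limit, as in the Unsloth example below. Prioritise relevant tables where possible.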
- Complex SQL ceiling: At 3B parameters, performance on multi-CTE, deeply nested, or multi-schema queries is limited. Consider larger models for enterprise-grade analytical SQL.
- Dialect sensitivity: The model was primarily trained on ANSI/PostgreSQL-style SQL. Highly dialect-specific syntax (T-SQL, PL/pgSQL procedural blocks) is not a primary use case.
- No execution or error correction: The model generates each query in one shot and never sees execution results, so it cannot self-correct on execution errors. Downstream agents should implement error-feedback loops if needed.
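Below is a minimal error-feedback loop of the kind suggested above. Here `generate` is a hypothetical callable that takes an optional feedback string, for example a wrapper around `generate_sql` from How to Use. The model was not trained with feedback in its prompt, so how best to include that feedback is an assumption. Candidate queries are checked with SQLite's `EXPLAIN`, which compiles a statement without running it, so errors such as unknown columns surface without side effects.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical generator signature: (schema, question, feedback) -> SQL text
SqlGenerator = Callable[[str, str, Optional[str]], str]


def generate_with_feedback(generate: SqlGenerator, conn: sqlite3.Connection,
                           schema: str, question: str,
                           max_attempts: int = 3) -> tuple[str, int]:
    """Regenerate until the SQL compiles against `conn`, feeding errors back."""
    feedback: Optional[str] = None
    sql = ""
    for attempt in range(1, max_attempts + 1):
        sql = generate(schema, question, feedback)
        try:
            conn.execute(f"EXPLAIN {sql}")  # compile only; the query is not run
            return sql, attempt
        except sqlite3.Error as exc:
            feedback = f"The previous query failed with: {exc}\nPrevious query: {sql}"
    raise RuntimeError(f"no valid SQL after {max_attempts} attempts; last: {sql!r}")


# Demo with a stub in place of the model: the first answer has a misspelt column
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (sample_id TEXT PRIMARY KEY, matrix TEXT)")
canned = iter(["SELECT matrx FROM samples", "SELECT matrix FROM samples"])
seen_feedback: list[Optional[str]] = []


def stub_generate(schema: str, question: str, feedback: Optional[str]) -> str:
    seen_feedback.append(feedback)
    return next(canned)


sql, attempts = generate_with_feedback(
    stub_generate, conn, "CREATE TABLE samples (sample_id TEXT, matrix TEXT);",
    "List all sample matrices.",
)
print(attempts, sql)
print(seen_feedback[1])
```

A query that compiles can still be semantically wrong. This loop catches only execution-level errors, not incorrect logic.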
How to Use
Transformers (Python)
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_id = "EphAsad/Mnemosyne-3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto",
)
SYSTEM_PROMPT = (
"You are Mnemosyne, an expert SQL assistant specialising in laboratory, "
"scientific, food safety, water quality, and general-purpose database queries. "
"Given a database schema and a natural language question, generate a correct, "
"well-formatted SQL query. Return only the SQL with no explanation."
)
def generate_sql(schema: str, question: str) -> str:
    # Build the ChatML prompt by hand to mirror the Prompt Format section
    prompt = (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n"
        f"### Schema:\n{schema.strip()}\n\n"
        f"### Question:\n{question.strip()}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.1,
        do_sample=True,
        # Stop as soon as the assistant turn is closed
        eos_token_id=tokenizer.convert_tokens_to_ids("<|im_end|>"),
    )
    # Decode only the newly generated tokens, not the prompt
    decoded = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    return decoded.split("<|im_end|>")[0].strip()
# General SQL example
schema = """
CREATE TABLE products (
id INT PRIMARY KEY,
name VARCHAR(100),
price FLOAT,
category VARCHAR(50),
stock INT
);
"""
question = "Show the top 5 most expensive products in the Electronics category that are still in stock."
print(generate_sql(schema, question))
# Example output (may vary between runs):
# SELECT name, price FROM products
# WHERE category = 'Electronics' AND stock > 0
# ORDER BY price DESC LIMIT 5;
# Laboratory domain example
lab_schema = """
CREATE TABLE results (
result_id SERIAL PRIMARY KEY,
sample_id VARCHAR(20),
determinand_id INT,
numeric_value FLOAT,
pass_fail CHAR(1),
test_date DATE
);
CREATE TABLE determinands (
determinand_id SERIAL PRIMARY KEY,
determinand_name VARCHAR(100),
unit VARCHAR(20)
);
CREATE TABLE samples (
sample_id VARCHAR(20) PRIMARY KEY,
matrix VARCHAR(50),
collection_date DATE,
client_id INT
);
CREATE TABLE clients (
client_id SERIAL PRIMARY KEY,
client_name VARCHAR(100)
);
"""
lab_question = "Show all failed E. coli results from drinking water samples in the last 30 days, ordered by numeric value descending."
print(generate_sql(lab_schema, lab_question))
Unsloth (fast inference)
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="EphAsad/Mnemosyne-3B",
max_seq_length=2048,
dtype=None,
load_in_4bit=True,
)
FastLanguageModel.for_inference(model)
GGUF — Ollama
ollama run hf.co/EphAsad/Mnemosyne-3B:Q4_K_M
GGUF — llama.cpp
./llama-cli \
-m Mnemosyne-3B-Q4_K_M.gguf \
--temp 0.1 \
-n 256 \
-p "<|im_start|>system\nYou are Mnemosyne..."
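As an alternative to passing the ChatML string on the command line, you can run `llama-server` (for example, `llama-server -m Mnemosyne-3B-Q4_K_M.gguf`). It exposes an OpenAI-compatible `/v1/chat/completions` endpoint, on port 8080 by default, and applies the chat template stored in the GGUF. The client below is a standard-library sketch. It assumes the GGUF carries the Qwen ChatML template, and the helper names are hypothetical.

```python
import json
import urllib.request

SYSTEM_PROMPT = (
    "You are Mnemosyne, an expert SQL assistant specialising in laboratory, "
    "scientific, food safety, water quality, and general-purpose database queries. "
    "Given a database schema and a natural language question, generate a correct, "
    "well-formatted SQL query. Return only the SQL with no explanation."
)


def build_payload(schema: str, question: str) -> dict:
    """OpenAI-style chat request using the fine-tune's system prompt and layout."""
    user = f"### Schema:\n{schema.strip()}\n\n### Question:\n{question.strip()}"
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user},
        ],
        "temperature": 0.1,
        "max_tokens": 256,
    }


def ask_llama_server(schema: str, question: str,
                     base_url: str = "http://localhost:8080/v1") -> str:
    """POST to a running llama-server and return the generated SQL."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(schema, question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"].strip()


payload = build_payload("CREATE TABLE clients (client_id INT, client_name VARCHAR(100));",
                        "How many clients are there?")
print(json.dumps(payload, indent=2))
# With a server running: print(ask_llama_server(schema, question))
```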
Available Files
| File | Format | Size (approx) | Use case |
|---|---|---|---|
| model.safetensors (sharded) | bf16 | ~6.2 GB | Full-precision inference, further fine-tuning |
| Mnemosyne-3B-Q4_K_M.gguf | GGUF 4-bit | ~2.0 GB | llama.cpp, LM Studio, Ollama (recommended for most users) |
| Mnemosyne-3B-Q8_0.gguf | GGUF 8-bit | ~3.3 GB | llama.cpp, LM Studio, Ollama (higher quality) |
Ethical Considerations
Training data privacy: The custom lab domain dataset used in training contains no real personal data, no real employer information, and no confidential laboratory records. All company names, client names, staff names, and identifiers are entirely fictional. Analyte names and method references reflect public international standards (ISO, EN, EPA, EC regulations).
Intended for decision support, not autonomous operation: SQL generated by this model should be reviewed before execution in production systems. The model may produce syntactically valid but semantically incorrect queries, particularly on complex schemas it has not been trained on. Human review is strongly recommended in regulated environments.
Potential for misuse: As with all SQL generation models, outputs should not be executed with elevated database privileges without appropriate access controls. The model has no awareness of data sensitivity or access permissions.
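For SQLite, one way to enforce the access-control point is a connection-level authorizer. `sqlite3.Connection.set_authorizer` lets a callback approve or deny each action while a statement is being compiled. The sketch below allows only reads. The allowed action set is an assumption to adapt. On server databases, the analogous control is a database role granted SELECT only.

```python
import sqlite3

# Actions a read-only SQL assistant plausibly needs (adapt as required).
READ_ONLY_ACTIONS = {
    sqlite3.SQLITE_SELECT,
    sqlite3.SQLITE_READ,
    sqlite3.SQLITE_FUNCTION,
    getattr(sqlite3, "SQLITE_RECURSIVE", 33),  # recursive CTEs; 33 in sqlite3.h
}


def read_only_authorizer(action, arg1, arg2, db_name, trigger_or_view):
    """Deny anything that is not a read while a statement is compiled."""
    return sqlite3.SQLITE_OK if action in READ_ONLY_ACTIONS else sqlite3.SQLITE_DENY


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (sample_id TEXT, pass_fail TEXT)")
conn.executemany("INSERT INTO results VALUES (?, ?)", [("S1", "P"), ("S2", "F")])
conn.commit()

conn.set_authorizer(read_only_authorizer)  # applies to statements compiled from now on

failed = conn.execute("SELECT COUNT(*) FROM results WHERE pass_fail = 'F'").fetchone()[0]

try:
    conn.execute("DELETE FROM results")  # e.g. a generated query that writes
    delete_blocked = False
except sqlite3.DatabaseError:
    delete_blocked = True
print(failed, delete_blocked)
```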
Citation
If you use Mnemosyne-3B in your work, please cite:
@misc{asad2025mnemosyne3b,
author = {Zain Asad},
title = {Mnemosyne-3B: A Domain-Specialised Text-to-SQL Model for Laboratory and Scientific Databases},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/EphAsad/Mnemosyne-3B}
}
Acknowledgements
- Qwen team at Alibaba Cloud for the Qwen2.5-Coder base model
- Unsloth for the efficient QLoRA training framework
- b-mc2 and Gretel AI for the open SQL training datasets