# Quick Start Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "pawlaszc/DigitalForensicsText2SQLite",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("pawlaszc/DigitalForensicsText2SQLite")

# Example schema
schema = """
CREATE TABLE messages (
    _id INTEGER PRIMARY KEY,
    address TEXT,
    body TEXT,
    date INTEGER,
    read INTEGER
);
"""

# Example request
request = "Find all unread messages from yesterday"

# Generate SQL
prompt = f"""Generate a valid SQLite query for this forensic database request.
Database Schema:
{schema}
Request: {request}
SQLite Query:
"""

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)

# Extract only the newly generated tokens (skip the prompt)
input_length = inputs["input_ids"].shape[1]
generated_tokens = outputs[0][input_length:]
sql = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(sql.strip())
```
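Because a language model can occasionally emit SQL that does not parse or references columns outside the schema, it is worth validating generated queries before running them on evidence data. A minimal sketch using Python's built-in `sqlite3` module: it builds an in-memory database from the example schema and uses `EXPLAIN` to compile a statement without executing it (the `candidate` query below is illustrative, not actual model output):

```python
import sqlite3

schema = """
CREATE TABLE messages (
    _id INTEGER PRIMARY KEY,
    address TEXT,
    body TEXT,
    date INTEGER,
    read INTEGER
);
"""

def is_valid_sqlite(sql: str, schema: str) -> bool:
    """Return True if `sql` compiles against an in-memory DB built from `schema`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        # EXPLAIN prepares (compiles) the statement without running it,
        # so missing tables/columns and syntax errors are caught safely.
        conn.execute(f"EXPLAIN {sql}")
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

# Illustrative candidate query, not real model output
candidate = "SELECT * FROM messages WHERE read = 0;"
print(is_valid_sqlite(candidate, schema))            # True
print(is_valid_sqlite("SELECT * FROM nope;", schema))  # False
```

This catches parse errors and unknown tables/columns cheaply; it does not guarantee the query answers the analyst's request correctly.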
## GGUF Usage (llama.cpp)
```bash
# Download GGUF file (Q4_K_M recommended)
wget https://huggingface.co/pawlaszc/DigitalForensicsText2SQLite/resolve/main/forensic-sql-q4_k_m.gguf
# Run with llama.cpp
./llama-cli -m forensic-sql-q4_k_m.gguf -p "Your prompt here"
```
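For anything beyond a one-off prompt, it is easier to write the prompt to a file and pass it to llama.cpp with `-f` rather than quoting a multi-line prompt on the command line. A small sketch that builds the same prompt template as the Python example above (the template itself is assumed to match what the model expects):

```python
# Build the same prompt used in the transformers example and save it
# so llama-cli can read it with the -f flag.
schema = """
CREATE TABLE messages (
    _id INTEGER PRIMARY KEY,
    address TEXT,
    body TEXT,
    date INTEGER,
    read INTEGER
);
"""
request = "Find all unread messages from yesterday"

prompt = f"""Generate a valid SQLite query for this forensic database request.
Database Schema:
{schema}
Request: {request}
SQLite Query:
"""

with open("prompt.txt", "w") as f:
    f.write(prompt)
```

Then run, for example: `./llama-cli -m forensic-sql-q4_k_m.gguf -f prompt.txt -n 200`.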
## Available Files
- **Full model (FP16):** ~6 GB - Best quality
- **Q4_K_M.gguf:** ~2.3 GB - Recommended (95% quality, 2.5× faster)
- **Q5_K_M.gguf:** ~2.8 GB - Higher quality (97% quality)
- **Q8_0.gguf:** ~3.8 GB - Highest quality (99% quality)
## Performance
- Overall: 79% accuracy
- Easy queries: 94.3%
- Medium queries: 80.6%
- Hard queries: 61.8%