Smriti AI
What this is
Smriti AI is a memory-augmented inference layer for small language models. It adds external memory, semantic retrieval, knowledge-graph recall, identity continuity, and privacy-ready memory deletion without changing base model weights.
This repository layout is intended for a Hugging Face model-style deployment with a custom handler.py. The handler loads a base causal language model or calls a remote model endpoint, wraps it with Smriti AI memory, and returns model responses plus retrieved memories.
This model-card template targets Smriti AI v1.0.9. The companion public benchmark dataset is luciferai-devil/smriti-ai-benchmarks, and the CPU-safe demo Space target is luciferai-devil/smriti-ai-demo.
Discovery keywords
Smriti AI is designed for people searching for Gemma memory, Qwen memory, small model memory, agent memory, external memory, long-term memory, semantic recall, graph recall, and training-free memory augmentation.
What this is not
Smriti AI is not a newly trained foundation model. It is not a fine-tuned model unless a separate fine-tuned checkpoint is explicitly included. It is an inference-time wrapper around a base language model.
Do not interpret this repository as a standalone model checkpoint or a Gemma/Qwen release checkpoint. Use the original base-model repositories when you need the base checkpoint itself. The base model is configured through BASE_MODEL_ID or HF_ENDPOINT_URL.
Research Lineage
Smriti AI follows four principles:
- External memory: conversational facts live outside model weights in a persistent, inspectable store.
- Training-free recall: relevant facts are retrieved and injected at inference time without fine-tuning the base model.
- Identity continuity: persona evidence is tracked as an embedding fingerprint so outputs can be checked for drift.
- Small-model augmentation: small causal language models can become more useful when paired with explicit memory and retrieval.
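The training-free recall principle can be sketched as a minimal retrieve-and-inject loop. This is illustrative only; the scoring function and names below are assumptions for the sketch, not the Smriti AI implementation:

```python
# Illustrative training-free recall: score stored facts against the query by
# token overlap (a toy stand-in for TF-IDF/semantic scoring), then inject the
# top matches into the prompt. Not the actual Smriti AI code.

def score(query: str, fact: str) -> int:
    # Toy relevance score: number of shared lowercase tokens.
    return len(set(query.lower().split()) & set(fact.lower().split()))

def build_prompt(query: str, memories: list[str], k: int = 2) -> str:
    top = sorted(memories, key=lambda m: score(query, m), reverse=True)[:k]
    context = "\n".join(f"- {m}" for m in top if score(query, m) > 0)
    return f"Known facts about the user:\n{context}\n\nUser: {query}"

memories = [
    "Alex is a marine biologist.",
    "Alex lives in Goa.",
    "The weather was rainy yesterday.",
]
prompt = build_prompt("What does Alex do for work?", memories)
```

The base model never changes; only the prompt it receives does.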
Historical GodelAI-Lite results were measured on an earlier system. Current Smriti AI results are measured separately and should not be conflated with historical results.
Architecture
```
User request
  -> Smriti AI handler
  -> memory retrieval
  -> graph retrieval
  -> identity context
  -> base model inference
  -> response
  -> memory write/update
```
The handler supports JSON, SQLite, Redis, and Postgres memory backends. For production, use Redis/Postgres or another external durable store. Do not store private user memory in the Hugging Face model repository.
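The flow above can be sketched as a handler pipeline. Every step below is a stub, and the function names are assumptions for illustration; the real handler wires these stages to the configured memory backend and base model:

```python
# Illustrative handler pipeline mirroring the request flow above.
# Each stage is a stub, not the actual Smriti AI API.

def handle_request(user_id: str, message: str, store: dict) -> dict:
    memories = store.setdefault(user_id, [])          # memory retrieval
    graph_facts = [m for m in memories if "->" in m]  # graph retrieval (stub)
    identity = {"user_id": user_id}                   # identity context (stub)
    # base model inference (stub): echo with retrieved context size
    response = f"[{len(memories)} memories] {message}"
    memories.append(message)                          # memory write/update
    return {"response": response, "memories": memories,
            "graph": graph_facts, "identity": identity}

store: dict = {}
out = handle_request("customer-123", "My name is Alex.", store)
```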
Supported base models
Smriti AI is model-agnostic for Hugging Face causal language models.
Supported families depend on the installed transformers version and endpoint hardware:
- Gemma-style causal LMs when available, including the current benchmark path `google/gemma-4-E2B-it`.
- Qwen-style causal LMs such as `Qwen/Qwen2.5-1.5B-Instruct` when supported by the runtime.
- Llama/Phi/Mistral-style causal LMs if supported by the runtime environment.
- Deterministic CI checks are kept outside public benchmark claims.
Evaluation
Current benchmark artifacts in the main Smriti AI repository report real-model validation over generated public SmritiBench memory fixtures. They are not MLPerf certification, HELM certification, or final external industry benchmark evidence.
Benchmark-readiness audit status: benchmark_invalid_provenance.
The validation artifact is `results/current/industry_benchmark_summary.json`. It records model IDs, seeds, hardware/provider metadata, and privacy/delete/security counters, but it is labeled `real_model_structured_fixture_validation_not_public_claim` until an accepted external benchmark/dataset or third-party evaluation process is used.
Privacy
Smriti AI stores user memory. Treat it as user data.
- Memory can be encrypted by setting `SMRITI_ENCRYPTION_KEY`.
- `delete_memory` is supported by the handler.
- Production deployments should use external memory storage such as Redis/Postgres.
- Do not store private user memory in the Hugging Face model repository.
- Public/demo deployments should not receive real PII.
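The deletion contract can be illustrated with a minimal per-user store: `delete_memory` removes every entry for a user. This is a sketch only; the real handler persists to the configured JSON/SQLite/Redis/Postgres backend, optionally encrypted:

```python
# Minimal per-user memory store illustrating privacy-ready deletion.
# A sketch, not the Smriti AI storage layer.

class MemoryStore:
    def __init__(self) -> None:
        self._data: dict[str, list[str]] = {}

    def add(self, user_id: str, fact: str) -> None:
        self._data.setdefault(user_id, []).append(fact)

    def delete_memory(self, user_id: str) -> int:
        # Returns how many entries were erased, useful for audit counters.
        return len(self._data.pop(user_id, []))

store = MemoryStore()
store.add("customer-123", "Alex is a marine biologist.")
deleted = store.delete_memory("customer-123")
```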
Limitations
- Retrieval quality depends on the quality and specificity of stored memory.
- Public/demo deployments should not receive real PII.
- Durable memory requires external backend or persistent endpoint storage.
- Latency depends on the base model, backend, retrieval mode, and endpoint hardware.
- CPU demo mode validates handler plumbing but will not produce Gemma-quality answers.
- If no `BASE_MODEL_ID` or `HF_ENDPOINT_URL` is configured, the handler returns memory-only responses.
Environment variables
| Variable | Purpose |
|---|---|
| `BASE_MODEL_ID` | Hugging Face model ID to load inside the endpoint. |
| `HF_ENDPOINT_URL` | Optional remote model endpoint URL. If set, the handler calls this URL instead of loading a local base model. |
| `HF_TOKEN` | Token for gated/private base models or protected remote endpoints. |
| `SMRITI_MEMORY_BACKEND` | `json`, `sqlite`, `redis`, or `postgres`. |
| `SMRITI_MEMORY_PATH` | JSON user-memory directory or SQLite file path. |
| `REDIS_URL` | External Redis URL. Takes precedence when present. |
| `POSTGRES_DSN` | External Postgres DSN. Takes precedence when present and Redis is not configured. |
| `SMRITI_ENCRYPTION_KEY` | Memory encryption key. Do not commit it. |
| `SMRITI_RETRIEVAL_MODE` | `tfidf`, `semantic`, `semantic_graph`, or `semantic_graph_identity`. |
| `SMRITI_PUBLIC_DEMO` | `true` or `false`. Use `true` only for non-PII demos. |
| `SMRITI_MAX_MEMORY_ENTRIES` | Maximum retained entries per user/topic. |
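The backend precedence described in the table can be expressed as a small resolver. This encodes only the documented precedence (`REDIS_URL`, then `POSTGRES_DSN`, then `SMRITI_MEMORY_BACKEND` with an assumed `json` default) and is not the handler's actual code:

```python
# Illustrative backend resolution following the documented precedence:
# REDIS_URL wins, then POSTGRES_DSN, then SMRITI_MEMORY_BACKEND.
# The "json" default is an assumption for this sketch.

VALID_BACKENDS = {"json", "sqlite", "redis", "postgres"}

def resolve_backend(env: dict[str, str]) -> str:
    if env.get("REDIS_URL"):
        return "redis"
    if env.get("POSTGRES_DSN"):
        return "postgres"
    backend = env.get("SMRITI_MEMORY_BACKEND", "json")
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unknown backend: {backend}")
    return backend
```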
How to call the endpoint
Chat / fact injection
```json
{
  "inputs": {
    "operation": "chat",
    "user_id": "customer-123",
    "message": "My name is Alex and I am a marine biologist.",
    "retrieval_mode": "semantic_graph_identity"
  },
  "parameters": {
    "max_new_tokens": 256,
    "temperature": 0.7,
    "top_p": 0.9,
    "return_memories": true
  }
}
```
Recall
```json
{
  "inputs": {
    "operation": "chat",
    "user_id": "customer-123",
    "message": "What do you remember about me?",
    "retrieval_mode": "semantic_graph_identity"
  },
  "parameters": {
    "return_memories": true
  }
}
```
Delete memory
```json
{
  "inputs": {
    "operation": "delete_memory",
    "user_id": "customer-123"
  }
}
```
Health
```json
{
  "inputs": {
    "operation": "health"
  }
}
```
Local test
```shell
pip install -r requirements.txt
BASE_MODEL_ID=google/gemma-4-E2B-it HF_TOKEN=$HF_TOKEN \
  SMRITI_MEMORY_BACKEND=json SMRITI_MEMORY_PATH=/tmp/smriti_hf_test.json \
  python test_handler_local.py
```
Custom-container deployment
If the standard Hugging Face handler is insufficient for your model size, CUDA libraries, Redis client policy, or enterprise network requirements, deploy the same files in a custom container. Use the main Smriti AI repository Dockerfiles as the starting point, install this handler, and expose a compatible HTTP API through Hugging Face Inference Endpoints custom container support.
Harness Evolution Results
The base model remains frozen. Smriti AI is not fine-tuned; these numbers come from memory-harness evaluation.
| System | Recall | Precision@K | p95 latency ms | Token overhead | Privacy delete |
|---|---|---|---|---|---|
| baseline_frozen_model | 0.000 | 0.000 | 0.000 | 0 | True |
| smriti_seed_harness | 1.000 | 0.333 | 0.525 | 328 | True |
| smriti_evolved_harness | 1.000 | 0.333 | 0.168 | 328 | True |
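Recall and Precision@K in the table above are standard retrieval metrics. A minimal computation over retrieved versus relevant memory IDs (illustrative only, not the harness code):

```python
# Standard retrieval metrics as reported in memory-harness tables:
# recall = relevant items retrieved / all relevant items;
# precision@k = relevant items in the top-k / k. Illustrative only.

def recall(retrieved: list[str], relevant: set[str]) -> float:
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)

def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    top_k = retrieved[:k]
    return len(set(top_k) & relevant) / k if k else 0.0

# One relevant fact retrieved among three candidates: recall 1.0, P@3 ~ 0.333,
# matching the shape of the seed-harness row above.
r = recall(["m1", "m2", "m3"], {"m1"})
p = precision_at_k(["m1", "m2", "m3"], {"m1"}, 3)
```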
Cross-model harness validation:
| Model | Seed recall | Evolved recall | Gate |
|---|---|---|---|
| google/gemma-4-E2B-it | 1.000 | 1.000 | pass |
| meta-llama/Llama-3.2-1B | 1.000 | 1.000 | pass |
| microsoft/Phi-3-mini-4k-instruct | 1.000 | 1.000 | pass |
| mistralai/Mistral-7B-Instruct-v0.3 | 1.000 | 1.000 | pass |
| Qwen/Qwen2.5-1.5B-Instruct | 1.000 | 1.000 | pass |
Production gate report: results/production_gate_report.md
Historical GodelAI-Lite results remain separate lineage and are not conflated with current Smriti AI harness metrics. Deterministic CI checks are used only for stability and never counted as public benchmark evidence.