Smriti AI
What this is
Smriti AI is a memory-augmented inference layer for small language models. It adds external memory, semantic retrieval, knowledge-graph recall, identity continuity, and privacy-ready memory deletion without changing base model weights.
This repository layout is intended for a Hugging Face model-style deployment with a custom handler.py. The handler loads a base causal language model or calls a remote model endpoint, wraps it with Smriti AI memory, and returns model responses plus retrieved memories.
This model-card template targets Smriti AI v1.0.9. The companion public benchmark dataset is luciferai-devil/smriti-ai-benchmarks, and the CPU-safe demo Space target is luciferai-devil/smriti-ai-demo.
Discovery keywords
Smriti AI is designed for people searching for Gemma memory, Qwen memory, small model memory, agent memory, external memory, long-term memory, semantic recall, graph recall, and training-free memory augmentation.
What this is not
Smriti AI is not a newly trained foundation model. It is not a fine-tuned model unless a separate fine-tuned checkpoint is explicitly included. It is an inference-time wrapper around a base language model.
Do not interpret this repository as a standalone model checkpoint or a Gemma/Qwen release checkpoint. Use the original base-model repositories when you need the base checkpoint itself. The base model is configured through BASE_MODEL_ID or HF_ENDPOINT_URL.
Research Lineage
Smriti AI follows four principles:
- External memory: conversational facts live outside model weights in a persistent, inspectable store.
- Training-free recall: relevant facts are retrieved and injected at inference time without fine-tuning the base model.
- Identity continuity: persona evidence is tracked as an embedding fingerprint so outputs can be checked for drift.
- Small-model augmentation: small causal language models can become more useful when paired with explicit memory and retrieval.
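The training-free recall principle can be sketched as a minimal retrieve-and-inject loop. This is illustrative only; the scoring function and names below are assumptions for the sketch, not the Smriti AI implementation:

```python
# Illustrative training-free recall: score stored facts against the query by
# token overlap (a toy stand-in for TF-IDF/semantic scoring), then inject the
# top matches into the prompt. Not the actual Smriti AI code.

def score(query: str, fact: str) -> int:
    # Toy relevance score: number of shared lowercase tokens.
    return len(set(query.lower().split()) & set(fact.lower().split()))

def build_prompt(query: str, memories: list[str], k: int = 2) -> str:
    top = sorted(memories, key=lambda m: score(query, m), reverse=True)[:k]
    context = "\n".join(f"- {m}" for m in top if score(query, m) > 0)
    return f"Known facts about the user:\n{context}\n\nUser: {query}"

memories = [
    "Alex is a marine biologist.",
    "Alex lives in Goa.",
    "The weather was rainy yesterday.",
]
prompt = build_prompt("What does Alex do for work?", memories)
```

The base model never changes; only the prompt it receives does.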
Historical GodelAI-Lite results were measured on an earlier system. Current Smriti AI results are measured separately and should not be conflated with historical results.
Architecture
```
User request
  -> Smriti AI handler
  -> memory retrieval
  -> graph retrieval
  -> identity context
  -> base model inference
  -> response
  -> memory write/update
```
The handler supports JSON, SQLite, Redis, and Postgres memory backends. For production, use Redis/Postgres or another external durable store. Do not store private user memory in the Hugging Face model repository.
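The flow above can be sketched as a handler pipeline. Every step below is a stub, and the function names are assumptions for illustration; the real handler wires these stages to the configured memory backend and base model:

```python
# Illustrative handler pipeline mirroring the request flow above.
# Each stage is a stub, not the actual Smriti AI API.

def handle_request(user_id: str, message: str, store: dict) -> dict:
    memories = store.setdefault(user_id, [])          # memory retrieval
    graph_facts = [m for m in memories if "->" in m]  # graph retrieval (stub)
    identity = {"user_id": user_id}                   # identity context (stub)
    # base model inference (stub): echo with retrieved context size
    response = f"[{len(memories)} memories] {message}"
    memories.append(message)                          # memory write/update
    return {"response": response, "memories": memories,
            "graph": graph_facts, "identity": identity}

store: dict = {}
out = handle_request("customer-123", "My name is Alex.", store)
```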
Supported base models
Smriti AI is model-agnostic for Hugging Face causal language models.
Supported families depend on the installed transformers version and endpoint hardware:
- Gemma-style causal LMs when available, including the current benchmark path `google/gemma-4-E2B-it`.
- Qwen-style causal LMs such as `Qwen/Qwen2.5-1.5B-Instruct` when supported by the runtime.
- Llama/Phi/Mistral-style causal LMs if supported by the runtime environment.
- Deterministic CI checks are kept outside public benchmark claims.
Evaluation
Current benchmark artifacts in the main Smriti AI repository report real-model validation over generated public SmritiBench memory fixtures. They are not MLPerf certification, HELM certification, or final external industry benchmark evidence.
Benchmark-readiness audit status: benchmark_invalid_provenance.
The validation artifact is `results/current/industry_benchmark_summary.json`. It records model IDs, seeds, hardware/provider metadata, and privacy/delete/security counters, but it is labeled `real_model_structured_fixture_validation_not_public_claim` until an accepted external benchmark/dataset or third-party evaluation process is used.
Privacy
Smriti AI stores user memory. Treat it as user data.
- Memory can be encrypted by setting `SMRITI_ENCRYPTION_KEY`.
- `delete_memory` is supported by the handler.
- Production deployments should use external memory storage such as Redis/Postgres.
- Do not store private user memory in the Hugging Face model repository.
- Public/demo deployments should not receive real PII.
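The deletion contract can be illustrated with a minimal per-user store: `delete_memory` removes every entry for a user. This is a sketch only; the real handler persists to the configured JSON/SQLite/Redis/Postgres backend, optionally encrypted:

```python
# Minimal per-user memory store illustrating privacy-ready deletion.
# A sketch, not the Smriti AI storage layer.

class MemoryStore:
    def __init__(self) -> None:
        self._data: dict[str, list[str]] = {}

    def add(self, user_id: str, fact: str) -> None:
        self._data.setdefault(user_id, []).append(fact)

    def delete_memory(self, user_id: str) -> int:
        # Returns how many entries were erased, useful for audit counters.
        return len(self._data.pop(user_id, []))

store = MemoryStore()
store.add("customer-123", "Alex is a marine biologist.")
deleted = store.delete_memory("customer-123")
```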
Limitations
- Retrieval quality depends on the quality and specificity of stored memory.
- Public/demo deployments should not receive real PII.
- Durable memory requires external backend or persistent endpoint storage.
- Latency depends on the base model, backend, retrieval mode, and endpoint hardware.
- CPU demo mode validates handler plumbing but will not produce Gemma-quality answers.
- If no `BASE_MODEL_ID` or `HF_ENDPOINT_URL` is configured, the handler returns memory-only responses.
Environment variables
| Variable | Purpose |
|---|---|
| `BASE_MODEL_ID` | Hugging Face model ID to load inside the endpoint. |
| `HF_ENDPOINT_URL` | Optional remote model endpoint URL. If set, the handler calls this URL instead of loading a local base model. |
| `HF_TOKEN` | Token for gated/private base models or protected remote endpoints. |
| `SMRITI_MEMORY_BACKEND` | `json`, `sqlite`, `redis`, or `postgres`. |
| `SMRITI_MEMORY_PATH` | JSON user-memory directory or SQLite file path. |
| `REDIS_URL` | External Redis URL. Takes precedence when present. |
| `POSTGRES_DSN` | External Postgres DSN. Takes precedence when present and Redis is not configured. |
| `SMRITI_ENCRYPTION_KEY` | Memory encryption key. Do not commit it. |
| `SMRITI_RETRIEVAL_MODE` | `tfidf`, `semantic`, `semantic_graph`, or `semantic_graph_identity`. |
| `SMRITI_PUBLIC_DEMO` | `true` or `false`. Use `true` only for non-PII demos. |
| `SMRITI_MAX_MEMORY_ENTRIES` | Maximum retained entries per user/topic. |
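The backend precedence described in the table can be expressed as a small resolver. This encodes only the documented precedence (`REDIS_URL`, then `POSTGRES_DSN`, then `SMRITI_MEMORY_BACKEND` with an assumed `json` default) and is not the handler's actual code:

```python
# Illustrative backend resolution following the documented precedence:
# REDIS_URL wins, then POSTGRES_DSN, then SMRITI_MEMORY_BACKEND.
# The "json" default is an assumption for this sketch.

VALID_BACKENDS = {"json", "sqlite", "redis", "postgres"}

def resolve_backend(env: dict[str, str]) -> str:
    if env.get("REDIS_URL"):
        return "redis"
    if env.get("POSTGRES_DSN"):
        return "postgres"
    backend = env.get("SMRITI_MEMORY_BACKEND", "json")
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unknown backend: {backend}")
    return backend
```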
How to call the endpoint
Chat / fact injection
```json
{
  "inputs": {
    "operation": "chat",
    "user_id": "customer-123",
    "message": "My name is Alex and I am a marine biologist.",
    "retrieval_mode": "semantic_graph_identity"
  },
  "parameters": {
    "max_new_tokens": 256,
    "temperature": 0.7,
    "top_p": 0.9,
    "return_memories": true
  }
}
```
Recall
```json
{
  "inputs": {
    "operation": "chat",
    "user_id": "customer-123",
    "message": "What do you remember about me?",
    "retrieval_mode": "semantic_graph_identity"
  },
  "parameters": {
    "return_memories": true
  }
}
```
Delete memory
```json
{
  "inputs": {
    "operation": "delete_memory",
    "user_id": "customer-123"
  }
}
```
Health
```json
{
  "inputs": {
    "operation": "health"
  }
}
```
Local test
```shell
pip install -r requirements.txt
BASE_MODEL_ID=google/gemma-4-E2B-it HF_TOKEN=$HF_TOKEN \
  SMRITI_MEMORY_BACKEND=json SMRITI_MEMORY_PATH=/tmp/smriti_hf_test.json \
  python test_handler_local.py
```
Custom-container deployment
If the standard Hugging Face handler is insufficient for your model size, CUDA libraries, Redis client policy, or enterprise network requirements, deploy the same files in a custom container. Use the main Smriti AI repository Dockerfiles as the starting point, install this handler, and expose a compatible HTTP API through Hugging Face Inference Endpoints custom container support.
Harness Evolution Results
The base model remains frozen. Smriti AI is not fine-tuned; these numbers come from memory-harness evaluation.
| System | Recall | Precision@K | p95 latency ms | Token overhead | Privacy delete |
|---|---|---|---|---|---|
| baseline_frozen_model | 0.000 | 0.000 | 0.000 | 0 | True |
| smriti_seed_harness | 1.000 | 0.333 | 0.525 | 328 | True |
| smriti_evolved_harness | 1.000 | 0.333 | 0.168 | 328 | True |
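Recall and Precision@K in the table above are standard retrieval metrics. A minimal computation over retrieved versus relevant memory IDs (illustrative only, not the harness code):

```python
# Standard retrieval metrics as reported in memory-harness tables:
# recall = relevant items retrieved / all relevant items;
# precision@k = relevant items in the top-k / k. Illustrative only.

def recall(retrieved: list[str], relevant: set[str]) -> float:
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)

def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    top_k = retrieved[:k]
    return len(set(top_k) & relevant) / k if k else 0.0

# One relevant fact retrieved among three candidates: recall 1.0, P@3 ~ 0.333,
# matching the shape of the seed-harness row above.
r = recall(["m1", "m2", "m3"], {"m1"})
p = precision_at_k(["m1", "m2", "m3"], {"m1"}, 3)
```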
Cross-model harness validation:
| Model | Seed recall | Evolved recall | Gate |
|---|---|---|---|
| google/gemma-4-E2B-it | 1.000 | 1.000 | pass |
| meta-llama/Llama-3.2-1B | 1.000 | 1.000 | pass |
| microsoft/Phi-3-mini-4k-instruct | 1.000 | 1.000 | pass |
| mistralai/Mistral-7B-Instruct-v0.3 | 1.000 | 1.000 | pass |
| Qwen/Qwen2.5-1.5B-Instruct | 1.000 | 1.000 | pass |
Production gate report: results/production_gate_report.md
Historical GodelAI-Lite results remain separate lineage and are not conflated with current Smriti AI harness metrics. Deterministic CI checks are used only for stability and never counted as public benchmark evidence.