jncraton committed on
Commit c4c68a2 · verified · 1 Parent(s): da19ef7

Upload folder using huggingface_hub
README.md ADDED
---
license: mit
language:
- en

library_name: transformers
pipeline_tag: text-generation

base_model:
- google/flan-t5-small

datasets:
- teapotai/synthqa
- teapotai/teapot-chat

tags:
- transformers
- seq2seq
- t5
- flan-t5
- text2text-generation
- lightweight
- grounded-qa
---

# TinyTeapot 🫖

[Website](https://teapotai.com/) | [Try out our Demo](https://teapotai-tinyteapotchat.hf.space/) | [Join the discussion on Discord](https://discord.gg/xW5TjRnY)

TinyTeapot is a lightweight (~77M parameter) grounded language model optimized for low-latency, hallucination-resistant question answering and RAG workflows. Building on our prior work, [TeapotLLM](https://huggingface.co/teapotai/teapotllm), TinyTeapot delivers strong context-faithful performance while being ~10× faster on CPU, making it ideal for real-time, on-device, and cost-efficient deployments across CPUs, mobile devices, and other resource-constrained environments.

TinyTeapot is distilled from our previous model, [TeapotLLM](https://huggingface.co/teapotai/teapotllm), and trained on grounded datasets including [SynthQA](https://huggingface.co/datasets/teapotai/synthqa), a context-focused QnA and extraction benchmark, and [TeapotChat](https://huggingface.co/datasets/teapotai/teapotchat) for instruction-following and grounded dialogue. This distillation transfers TeapotLLM’s refusal behavior, structured extraction patterns, and context-only answering into a significantly smaller, edge-efficient model.

### Hallucination Resistance

Through distillation from TeapotLLM and training on SynthQA, TinyTeapot learns to refuse questions when the answer is not present in the context, improving reliability compared to similarly sized models in RAG and document QA pipelines.

---

## Training & Evaluation

### 🚀 ~10x Faster CPU Inference than TeapotLLM with Strong Grounded Performance

TinyTeapot is evaluated primarily against its teacher model (TeapotLLM) on SynthQA-style grounded tasks, with additional comparisons to larger instruction-tuned models such as LLaMA and Qwen to contextualize efficiency-vs-quality tradeoffs.

### Task Performance vs TeapotLLM

The chart below shows task-level similarity across boolean reasoning, QA, extraction, summarization, and unanswerable queries. While much smaller (~77M vs ~800M+ parameters), TinyTeapot retains strong grounded behavior due to distillation from TeapotLLM and training on SynthQA and TeapotChat.

![TinyTeapot SynthQA Performance](https://teapotai.com/assets/tinyteapot_performance.png)

- TeapotLLM remains the strongest overall teacher model
- TinyTeapot maintains high QA and summarization similarity for its size
- Strong refusal performance on unanswerable questions due to grounded training
- Significant efficiency gains with minimal degradation on core grounded tasks
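
Task-level similarity above is measured against reference answers. As a rough illustration of this kind of answer-similarity scoring (not necessarily the exact metric used in these benchmarks), a token-level F1 overlap can be computed like this:

```python
def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 overlap between a predicted answer and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return float(pred == ref)
    # Count tokens in the reference, then match prediction tokens against them
    ref_counts: dict[str, int] = {}
    for t in ref:
        ref_counts[t] = ref_counts.get(t, 0) + 1
    common = 0
    for t in pred:
        if ref_counts.get(t, 0) > 0:
            common += 1
            ref_counts[t] -= 1
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)

print(token_f1("330 meters", "It stands at 330 meters"))  # partial overlap, between 0 and 1
```

Grounded models like TinyTeapot tend to score well on such overlap metrics because their answers are copied or lightly rephrased from the context rather than invented.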

### Latency vs Answer Similarity (CPU & Accelerator)

TinyTeapot is designed for real-time and edge deployments where latency is critical. Benchmarks on Google Colab (100 runs) show that TinyTeapot delivers competitive grounded similarity while being dramatically faster than larger models.

![TinyTeapot Latency vs Similarity](https://teapotai.com/assets/tinyteapot_latency.png)

Key observations:
- 3.3 s average CPU latency for TinyTeapot vs 38 s for TeapotLLM (~10x faster)
- Competitive similarity despite a ~10x smaller parameter count
- Significantly faster and more accurate than other SOTA ~1B-parameter models in CPU-constrained settings
- Sub-second accelerator latency while maintaining grounded response quality
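
Latency numbers of this kind are straightforward to reproduce. A minimal timing harness (the workload below is a placeholder standing in for a `model.generate(...)` call, not the actual benchmark script):

```python
import time

def mean_latency(fn, runs: int = 100) -> float:
    """Average wall-clock seconds over `runs` invocations of `fn`."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# Placeholder workload; in a real benchmark, pass a closure that
# tokenizes a prompt and calls model.generate on it.
avg = mean_latency(lambda: sum(range(1000)), runs=100)
print(f"{avg:.6f} s per run")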

These results position TinyTeapot as a high-efficiency distilled model that preserves TeapotLLM’s grounded reasoning while enabling low-cost, low-latency deployment.

---

## Getting Started

TinyTeapot can be used directly with Hugging Face Transformers or with our Python library [teapotai](https://pypi.org/project/teapotai), and follows a text-to-text format. It performs best when given explicit grounding context and the pre-trained system prompt.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "teapotai/tinyteapot"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

# Grounding context
context = (
    "The Eiffel Tower is a wrought iron lattice tower in Paris, France. "
    "It was designed by Gustave Eiffel and completed in 1889. "
    "It stands at a height of 330 meters and is one of the most recognizable "
    "structures in the world."
)

# System prompt
system_prompt = (
    "You are Teapot, an open-source AI assistant optimized for low-end devices, "
    "providing short, accurate responses without hallucinating while excelling at "
    "information extraction and text summarization. "
    "If the context does not answer the question, reply exactly: "
    "'I am sorry but I don't have any information on that'."
)

def ask(question: str):
    # Prompt format: context, then system prompt, then question
    prompt = f"{context}\n{system_prompt}\n{question}\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, do_sample=False)
    answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(question)
    print(f"{answer}\n")

# Test 1: Grounded question (answered from context)
ask("How tall is the Eiffel Tower")  # => 330 meters

# Test 2: Out-of-context question (tests hallucination resistance)
ask("How tall is the Death Star")  # => refusal, per the system prompt
```
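
The prompt layout used above (context, then system prompt, then question, newline-separated) can be factored into a small helper when serving many queries. `build_prompt` is an illustrative name, not part of any library:

```python
def build_prompt(context: str, system_prompt: str, question: str) -> str:
    """Assemble the text-to-text input: context, system prompt, question."""
    return f"{context}\n{system_prompt}\n{question}\n"

prompt = build_prompt(
    "Paris is the capital of France.",
    "Answer only from the context.",
    "What is the capital of France?",
)
print(prompt)
```

Keeping the prompt assembly in one place makes it easy to swap in retrieved context per query without touching the generation code.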

---

## Recommended Use Cases

- Retrieval-Augmented Generation (RAG)
- Document question answering
- Information extraction
- On-device assistants
- Mobile and edge inference
- Low-latency production pipelines

TinyTeapot performs best when paired with retrieval or structured context inputs rather than open-ended chat.
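
In a RAG pipeline, a retriever selects the context passage before the model is called. A minimal sketch of that selection step using simple token overlap (a real deployment would typically use embedding or BM25 retrieval instead):

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens for overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, passages: list[str]) -> str:
    """Return the passage sharing the most tokens with the query."""
    q = tokens(query)
    return max(passages, key=lambda p: len(q & tokens(p)))

docs = [
    "The Eiffel Tower was completed in 1889 and stands 330 meters tall.",
    "The Great Wall of China stretches thousands of kilometers.",
]
context = retrieve("How tall is the Eiffel Tower?", docs)
print(context)  # => the Eiffel Tower passage
```

The selected passage is then passed to the model as grounding context, so answers stay tied to retrieved text rather than parametric memory.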

---

## Limitations and Risks

TinyTeapot is optimized for grounded QnA, RAG, and extraction. It is not intended for open-ended chat, creative writing, or deep multi-step reasoning. Due to its small size (~77M parameters), performance is highly dependent on the quality and relevance of the provided context, and the model may be more prone to hallucination when that context is weak.

## Questions, Feature Requests?

We hope you find TinyTeapot useful and are continuously improving the TeapotAI ecosystem. Please reach out on our [Discord](https://discord.gg/teapotai) for technical help, feedback, or feature requests. We look forward to seeing what the community builds!

## License

MIT License
config.json ADDED
{
  "add_source_bos": false,
  "add_source_eos": false,
  "bos_token": "<pad>",
  "decoder_start_token": "<pad>",
  "eos_token": "</s>",
  "layer_norm_epsilon": null,
  "multi_query_attention": false,
  "unk_token": "<unk>"
}
generation_config.json ADDED
{
  "decoder_start_token_id": 0,
  "eos_token_id": [
    1
  ],
  "pad_token_id": 0,
  "transformers_version": "5.1.0"
}
model.bin ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:729f18f07554711d8e522c58ed97b14ec2457f6b8bf1939f236b4498c412a3e1
size 61048890
shared_vocabulary.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
{
  "add_prefix_space": null,
  "backend": "tokenizers",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "extra_ids": 100,
  "additional_special_tokens": [
    "<extra_id_0>", "<extra_id_1>", "<extra_id_2>", "<extra_id_3>", "<extra_id_4>",
    "<extra_id_5>", "<extra_id_6>", "<extra_id_7>", "<extra_id_8>", "<extra_id_9>",
    "<extra_id_10>", "<extra_id_11>", "<extra_id_12>", "<extra_id_13>", "<extra_id_14>",
    "<extra_id_15>", "<extra_id_16>", "<extra_id_17>", "<extra_id_18>", "<extra_id_19>",
    "<extra_id_20>", "<extra_id_21>", "<extra_id_22>", "<extra_id_23>", "<extra_id_24>",
    "<extra_id_25>", "<extra_id_26>", "<extra_id_27>", "<extra_id_28>", "<extra_id_29>",
    "<extra_id_30>", "<extra_id_31>", "<extra_id_32>", "<extra_id_33>", "<extra_id_34>",
    "<extra_id_35>", "<extra_id_36>", "<extra_id_37>", "<extra_id_38>", "<extra_id_39>",
    "<extra_id_40>", "<extra_id_41>", "<extra_id_42>", "<extra_id_43>", "<extra_id_44>",
    "<extra_id_45>", "<extra_id_46>", "<extra_id_47>", "<extra_id_48>", "<extra_id_49>",
    "<extra_id_50>", "<extra_id_51>", "<extra_id_52>", "<extra_id_53>", "<extra_id_54>",
    "<extra_id_55>", "<extra_id_56>", "<extra_id_57>", "<extra_id_58>", "<extra_id_59>",
    "<extra_id_60>", "<extra_id_61>", "<extra_id_62>", "<extra_id_63>", "<extra_id_64>",
    "<extra_id_65>", "<extra_id_66>", "<extra_id_67>", "<extra_id_68>", "<extra_id_69>",
    "<extra_id_70>", "<extra_id_71>", "<extra_id_72>", "<extra_id_73>", "<extra_id_74>",
    "<extra_id_75>", "<extra_id_76>", "<extra_id_77>", "<extra_id_78>", "<extra_id_79>",
    "<extra_id_80>", "<extra_id_81>", "<extra_id_82>", "<extra_id_83>", "<extra_id_84>",
    "<extra_id_85>", "<extra_id_86>", "<extra_id_87>", "<extra_id_88>", "<extra_id_89>",
    "<extra_id_90>", "<extra_id_91>", "<extra_id_92>", "<extra_id_93>", "<extra_id_94>",
    "<extra_id_95>", "<extra_id_96>", "<extra_id_97>", "<extra_id_98>", "<extra_id_99>"
  ],
  "is_local": false,
  "max_length": 512,
  "model_max_length": 512,
  "pad_token": "<pad>",
  "sp_model_kwargs": {},
  "stride": 0,
  "tokenizer_class": "T5Tokenizer",
  "truncation_side": "left",
  "truncation_strategy": "longest_first",
  "unk_token": "<unk>"
}
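
The 100 `<extra_id_*>` entries above are T5's sentinel tokens (span-corruption placeholders inherited from flan-t5-small). The full list follows a simple pattern and can be regenerated programmatically:

```python
# Regenerate the sentinel-token list enumerated in tokenizer_config.json
extra_ids = [f"<extra_id_{i}>" for i in range(100)]
print(len(extra_ids), extra_ids[0], extra_ids[-1])  # => 100 <extra_id_0> <extra_id_99>
```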