jncraton committed on
Commit c4c68a2 · verified · 1 Parent(s): da19ef7

Upload folder using huggingface_hub
README.md ADDED
---
license: mit
language:
- en

library_name: transformers
pipeline_tag: text-generation

base_model:
- google/flan-t5-small

datasets:
- teapotai/synthqa
- teapotai/teapot-chat

tags:
- transformers
- seq2seq
- t5
- flan-t5
- text2text-generation
- lightweight
- grounded-qa
---

# TinyTeapot 🫖

[Website](https://teapotai.com/) | [Try out our Demo](https://teapotai-tinyteapotchat.hf.space/) | [Join the discussion on Discord](https://discord.gg/xW5TjRnY)

TinyTeapot is a lightweight (~77M parameter) grounded language model optimized for low-latency, hallucination-resistant question answering and RAG workflows. Building on our prior work, [TeapotLLM](https://huggingface.co/teapotai/teapotllm), TinyTeapot delivers strong context-faithful performance while being ~10× faster on CPU, making it ideal for real-time, on-device, and cost-efficient deployments across CPUs, mobile devices, and other resource-constrained environments.

TinyTeapot is distilled from our previous model, [TeapotLLM](https://huggingface.co/teapotai/teapotllm), and trained on grounded datasets including [SynthQA](https://huggingface.co/datasets/teapotai/synthqa), a context-focused QnA and extraction benchmark, and [TeapotChat](https://huggingface.co/datasets/teapotai/teapotchat) for instruction-following and grounded dialogue. This distillation transfers TeapotLLM’s refusal behavior, structured extraction patterns, and context-only answering into a significantly smaller, edge-efficient model.

### Hallucination Resistance

Through distillation from TeapotLLM and training on SynthQA, TinyTeapot learns to refuse questions when the answer is not present in the context, improving reliability compared to similarly sized models in RAG and document QA pipelines.

---

## Training & Evaluation

### 🚀 ~10x Faster CPU Inference than TeapotLLM with Strong Grounded Performance

TinyTeapot is evaluated primarily against its teacher model (TeapotLLM) on SynthQA-style grounded tasks, with additional comparisons to larger instruction-tuned models such as LLaMA and Qwen to contextualize efficiency-vs-quality tradeoffs.

### Task Performance vs TeapotLLM

The chart below shows task-level similarity across boolean reasoning, QA, extraction, summarization, and unanswerable queries. While much smaller (~77M vs ~800M+ parameters), TinyTeapot retains strong grounded behavior due to distillation from TeapotLLM and training on SynthQA and TeapotChat.

![TinyTeapot SynthQA Performance](https://teapotai.com/assets/tinyteapot_performance.png)

- TeapotLLM remains the strongest overall teacher model
- TinyTeapot maintains high QA and summarization similarity for its size
- Strong refusal performance on unanswerable questions due to grounded training
- Significant efficiency gains with minimal degradation on core grounded tasks
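
Task-level similarity above is measured against reference answers. As a rough illustration of this kind of answer-similarity scoring (not necessarily the exact metric used in these benchmarks), a token-level F1 overlap can be computed like this:

```python
def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 overlap between a predicted answer and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return float(pred == ref)
    # Count tokens in the reference, then match prediction tokens against them
    ref_counts: dict[str, int] = {}
    for t in ref:
        ref_counts[t] = ref_counts.get(t, 0) + 1
    common = 0
    for t in pred:
        if ref_counts.get(t, 0) > 0:
            common += 1
            ref_counts[t] -= 1
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)

print(token_f1("330 meters", "It stands at 330 meters"))  # partial overlap, between 0 and 1
```

Grounded models like TinyTeapot tend to score well on such overlap metrics because their answers are copied or lightly rephrased from the context rather than invented.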

### Latency vs Answer Similarity (CPU & Accelerator)

TinyTeapot is designed for real-time and edge deployments where latency is critical. Benchmarks on Google Colab (100 runs) show that TinyTeapot delivers competitive grounded similarity while being dramatically faster than larger models.

![TinyTeapot Latency vs Similarity](https://teapotai.com/assets/tinyteapot_latency.png)

Key observations:
- 3.3 s average CPU latency for TinyTeapot vs 38 s for TeapotLLM (~10x faster)
- Competitive similarity despite a ~10x smaller parameter count
- Significantly faster and more accurate than other SOTA ~1B-parameter models in CPU-constrained settings
- Sub-second accelerator latency while maintaining grounded response quality
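
Latency numbers of this kind are straightforward to reproduce. A minimal timing harness (the workload below is a placeholder standing in for a `model.generate(...)` call, not the actual benchmark script):

```python
import time

def mean_latency(fn, runs: int = 100) -> float:
    """Average wall-clock seconds over `runs` invocations of `fn`."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# Placeholder workload; in a real benchmark, pass a closure that
# tokenizes a prompt and calls model.generate on it.
avg = mean_latency(lambda: sum(range(1000)), runs=100)
print(f"{avg:.6f} s per run")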

These results position TinyTeapot as a high-efficiency distilled model that preserves TeapotLLM’s grounded reasoning while enabling low-cost, low-latency deployment.

---

## Getting Started

TinyTeapot can be used directly with Hugging Face Transformers or with our Python library [teapotai](https://pypi.org/project/teapotai), and follows a text-to-text format. It performs best when given explicit grounding context and the pre-trained system prompt.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "teapotai/tinyteapot"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

# Grounding context
context = (
    "The Eiffel Tower is a wrought iron lattice tower in Paris, France. "
    "It was designed by Gustave Eiffel and completed in 1889. "
    "It stands at a height of 330 meters and is one of the most recognizable "
    "structures in the world."
)

# System prompt
system_prompt = (
    "You are Teapot, an open-source AI assistant optimized for low-end devices, "
    "providing short, accurate responses without hallucinating while excelling at "
    "information extraction and text summarization. "
    "If the context does not answer the question, reply exactly: "
    "'I am sorry but I don't have any information on that'."
)

def ask(question: str):
    # Prompt format: context, then system prompt, then question
    prompt = f"{context}\n{system_prompt}\n{question}\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, do_sample=False)
    answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(question)
    print(f"{answer}\n")

# Test 1: Grounded question (answered from context)
ask("How tall is the Eiffel Tower")  # => 330 meters

# Test 2: Out-of-context question (tests hallucination resistance)
ask("How tall is the Death Star")  # => refusal, per the system prompt
```
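
The prompt layout used above (context, then system prompt, then question, newline-separated) can be factored into a small helper when serving many queries. `build_prompt` is an illustrative name, not part of any library:

```python
def build_prompt(context: str, system_prompt: str, question: str) -> str:
    """Assemble the text-to-text input: context, system prompt, question."""
    return f"{context}\n{system_prompt}\n{question}\n"

prompt = build_prompt(
    "Paris is the capital of France.",
    "Answer only from the context.",
    "What is the capital of France?",
)
print(prompt)
```

Keeping the prompt assembly in one place makes it easy to swap in retrieved context per query without touching the generation code.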

---

## Recommended Use Cases

- Retrieval-Augmented Generation (RAG)
- Document question answering
- Information extraction
- On-device assistants
- Mobile and edge inference
- Low-latency production pipelines

TinyTeapot performs best when paired with retrieval or structured context inputs rather than open-ended chat.
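
In a RAG pipeline, a retriever selects the context passage before the model is called. A minimal sketch of that selection step using simple token overlap (a real deployment would typically use embedding or BM25 retrieval instead):

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens for overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, passages: list[str]) -> str:
    """Return the passage sharing the most tokens with the query."""
    q = tokens(query)
    return max(passages, key=lambda p: len(q & tokens(p)))

docs = [
    "The Eiffel Tower was completed in 1889 and stands 330 meters tall.",
    "The Great Wall of China stretches thousands of kilometers.",
]
context = retrieve("How tall is the Eiffel Tower?", docs)
print(context)  # => the Eiffel Tower passage
```

The selected passage is then passed to the model as grounding context, so answers stay tied to retrieved text rather than parametric memory.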

---

## Limitations and Risks

TinyTeapot is optimized for grounded QnA, RAG, and extraction. It is not intended for open-ended chat, creative writing, or deep multi-step reasoning. Due to its small size (~77M parameters), performance is highly dependent on the quality and relevance of the provided context, and the model may be more prone to hallucination when that context is weak.

## Questions, Feature Requests?

We hope you find TinyTeapot useful and are continuously improving the TeapotAI ecosystem. Please reach out on our [Discord](https://discord.gg/teapotai) for technical help, feedback, or feature requests. We look forward to seeing what the community builds!

## License

MIT License
config.json ADDED
{
  "add_source_bos": false,
  "add_source_eos": false,
  "bos_token": "<pad>",
  "decoder_start_token": "<pad>",
  "eos_token": "</s>",
  "layer_norm_epsilon": null,
  "multi_query_attention": false,
  "unk_token": "<unk>"
}
generation_config.json ADDED
{
  "decoder_start_token_id": 0,
  "eos_token_id": [
    1
  ],
  "pad_token_id": 0,
  "transformers_version": "5.1.0"
}
model.bin ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:729f18f07554711d8e522c58ed97b14ec2457f6b8bf1939f236b4498c412a3e1
size 61048890
shared_vocabulary.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
{
  "add_prefix_space": null,
  "backend": "tokenizers",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "extra_ids": 100,
  "additional_special_tokens": [
    "<extra_id_0>", "<extra_id_1>", "<extra_id_2>", "<extra_id_3>", "<extra_id_4>",
    "<extra_id_5>", "<extra_id_6>", "<extra_id_7>", "<extra_id_8>", "<extra_id_9>",
    "<extra_id_10>", "<extra_id_11>", "<extra_id_12>", "<extra_id_13>", "<extra_id_14>",
    "<extra_id_15>", "<extra_id_16>", "<extra_id_17>", "<extra_id_18>", "<extra_id_19>",
    "<extra_id_20>", "<extra_id_21>", "<extra_id_22>", "<extra_id_23>", "<extra_id_24>",
    "<extra_id_25>", "<extra_id_26>", "<extra_id_27>", "<extra_id_28>", "<extra_id_29>",
    "<extra_id_30>", "<extra_id_31>", "<extra_id_32>", "<extra_id_33>", "<extra_id_34>",
    "<extra_id_35>", "<extra_id_36>", "<extra_id_37>", "<extra_id_38>", "<extra_id_39>",
    "<extra_id_40>", "<extra_id_41>", "<extra_id_42>", "<extra_id_43>", "<extra_id_44>",
    "<extra_id_45>", "<extra_id_46>", "<extra_id_47>", "<extra_id_48>", "<extra_id_49>",
    "<extra_id_50>", "<extra_id_51>", "<extra_id_52>", "<extra_id_53>", "<extra_id_54>",
    "<extra_id_55>", "<extra_id_56>", "<extra_id_57>", "<extra_id_58>", "<extra_id_59>",
    "<extra_id_60>", "<extra_id_61>", "<extra_id_62>", "<extra_id_63>", "<extra_id_64>",
    "<extra_id_65>", "<extra_id_66>", "<extra_id_67>", "<extra_id_68>", "<extra_id_69>",
    "<extra_id_70>", "<extra_id_71>", "<extra_id_72>", "<extra_id_73>", "<extra_id_74>",
    "<extra_id_75>", "<extra_id_76>", "<extra_id_77>", "<extra_id_78>", "<extra_id_79>",
    "<extra_id_80>", "<extra_id_81>", "<extra_id_82>", "<extra_id_83>", "<extra_id_84>",
    "<extra_id_85>", "<extra_id_86>", "<extra_id_87>", "<extra_id_88>", "<extra_id_89>",
    "<extra_id_90>", "<extra_id_91>", "<extra_id_92>", "<extra_id_93>", "<extra_id_94>",
    "<extra_id_95>", "<extra_id_96>", "<extra_id_97>", "<extra_id_98>", "<extra_id_99>"
  ],
  "is_local": false,
  "max_length": 512,
  "model_max_length": 512,
  "pad_token": "<pad>",
  "sp_model_kwargs": {},
  "stride": 0,
  "tokenizer_class": "T5Tokenizer",
  "truncation_side": "left",
  "truncation_strategy": "longest_first",
  "unk_token": "<unk>"
}
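
The 100 `<extra_id_*>` entries above are T5's sentinel tokens (span-corruption placeholders inherited from flan-t5-small). The full list follows a simple pattern and can be regenerated programmatically:

```python
# Regenerate the sentinel-token list enumerated in tokenizer_config.json
extra_ids = [f"<extra_id_{i}>" for i in range(100)]
print(len(extra_ids), extra_ids[0], extra_ids[-1])  # => 100 <extra_id_0> <extra_id_99>
```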