Spaces: davideuler/small-model-chatbot

davideuler committed on May 23, 2025

Commit

805fa9c

1 Parent(s): 9482433

for Huggingface push

Files changed (3)

README.md +98 -1
main.py +211 -21
requirements.txt +6 -0

README.md CHANGED

@@ -10,5 +10,102 @@ pinned: false
 license: mit
 short_description: Some small models chatbot
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+=======
+# Multi-Model Tiny Chatbot
+A lightweight, multi-model chat application featuring several small language models optimized for different tasks. Built with Gradio for an intuitive web interface and designed for local deployment.
+## 🌟 Features
+- **Multiple Model Support**: Choose from 4 specialized small language models
+- **Lazy Loading**: Models are loaded only when selected, optimizing memory usage
+- **Real-time Chat Interface**: Smooth conversational experience with Gradio
+- **Lightweight**: All models are under 200M parameters for fast inference
+- **Local Deployment**: Run entirely on your local machine
+## 🤖 Available Models
+### 1. SmolLM2 (135M Parameters)
+- **Purpose**: General conversation and instruction following
+- **Architecture**: HuggingFace SmolLM2-135M-Instruct
+- **Best For**: General Q&A, creative writing, coding help
+- **Language**: English
+### 2. NanoLM-25M (25M Parameters)
+- **Purpose**: Ultra-lightweight instruction following
+- **Architecture**: Mistral-based with chat template support
+- **Best For**: Quick responses, simple tasks, resource-constrained environments
+- **Language**: English
+### 3. NanoTranslator-S (9M Parameters)
+- **Purpose**: English to Chinese translation
+- **Architecture**: LLaMA-based translation model
+- **Best For**: Translating English text to Chinese
+- **Language**: English → Chinese
+### 4. NanoTranslator-XL (78M Parameters)
+- **Purpose**: Enhanced English to Chinese translation
+- **Architecture**: LLaMA-based with improved accuracy
+- **Best For**: High-quality English to Chinese translation
+- **Language**: English → Chinese
+## 🚀 Quick Start
+### Prerequisites
+- Python 3.8 or higher
+- 4GB+ RAM recommended
+- Internet connection for initial model downloads
+### Installation
+1. **Run the application**
+   ```bash
+   uv run main.py
+   ```
+2. **Open your browser**
+   - Navigate to `http://localhost:7860`
+   - Select a model and start chatting!
+## 🎯 Use Cases
+### General Conversation
+- Use **SmolLM2** or **NanoLM-25M** for general chat, Q&A, and assistance
+### Translation Tasks
+- Use **NanoTranslator-S** for quick English→Chinese translations
+- Use **NanoTranslator-XL** for higher quality English→Chinese translations
+### Resource-Constrained Environments
+- **NanoLM-25M** (25M params) for ultra-lightweight deployment
+- **NanoTranslator-S** (9M params) for minimal translation needs
+## 💡 Model Performance
+| Model | Parameters | Use Case | Memory Usage | Speed |
+|-------|------------|----------|--------------|-------|
+| SmolLM2 | 135M | General Chat | ~500MB | Fast |
+| NanoLM-25M | 25M | Lightweight Chat | ~100MB | Very Fast |
+| NanoTranslator-S | 9M | Quick Translation | ~50MB | Very Fast |
+| NanoTranslator-XL | 78M | Quality Translation | ~300MB | Fast |
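The Memory Usage column in the table above is roughly what one would expect if the weights are held as 32-bit floats. The sketch below is a back-of-the-envelope check, not a measurement: it assumes 4 bytes per parameter, counts 1 MB as 10^6 bytes, and ignores activations, tokenizer data, and framework overhead. The helper name `estimate_weight_mb` is hypothetical and does not appear in the commit.

```python
# Rough weight-memory estimate, assuming float32 storage (4 bytes per parameter).
# Activations and framework overhead are ignored, so real usage will be higher.

BYTES_PER_PARAM_FP32 = 4  # assumption: weights are loaded as float32

# Parameter counts as stated in the README table.
PARAM_COUNTS = {
    "SmolLM2": 135_000_000,
    "NanoLM-25M": 25_000_000,
    "NanoTranslator-S": 9_000_000,
    "NanoTranslator-XL": 78_000_000,
}


def estimate_weight_mb(num_params: int, bytes_per_param: int = BYTES_PER_PARAM_FP32) -> float:
    """Return the approximate size of the weights in MB (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1_000_000


if __name__ == "__main__":
    for name, params in PARAM_COUNTS.items():
        print(f"{name}: ~{estimate_weight_mb(params):.0f} MB")
```

Under these assumptions the estimates come to about 540, 100, 36, and 312 MB, which is close to the ~500MB, ~100MB, ~50MB, and ~300MB figures in the table.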
+### Model Sources
+- SmolLM2: `HuggingFaceTB/SmolLM2-135M-Instruct`
+- NanoLM-25M: `Mxode/NanoLM-25M-Instruct-v1.1`
+- NanoTranslator-S: `Mxode/NanoTranslator-S`
+- NanoTranslator-XL: `Mxode/NanoTranslator-XL`
+## 📝 License
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+## 🙏 Acknowledgments
+- [HuggingFace](https://huggingface.co/) for the Transformers library and model hosting
+- [Mxode](https://huggingface.co/Mxode) for the Nano series models
+- [Gradio](https://gradio.app/) for the amazing web interface framework
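The README's "Lazy Loading" feature (models are loaded only when selected) can be sketched as a small cache keyed by model name. This is an illustration under stated assumptions rather than the code in `main.py`: `LazyModelRegistry` and its injected `loader` callable are hypothetical names, and only the repository IDs are taken from the "Model Sources" list.

```python
from typing import Any, Callable, Dict, List

# Display names and Hub repository IDs, as listed under "Model Sources".
MODEL_SOURCES: Dict[str, str] = {
    "SmolLM2": "HuggingFaceTB/SmolLM2-135M-Instruct",
    "NanoLM-25M": "Mxode/NanoLM-25M-Instruct-v1.1",
    "NanoTranslator-S": "Mxode/NanoTranslator-S",
    "NanoTranslator-XL": "Mxode/NanoTranslator-XL",
}


class LazyModelRegistry:
    """Load each model at most once, and only when it is first requested."""

    def __init__(self, loader: Callable[[str], Any]):
        # `loader` is a hypothetical hook: it receives a repository ID and
        # returns whatever object the app wants to cache for that model.
        self._loader = loader
        self._loaded: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        if name not in MODEL_SOURCES:
            raise KeyError(f"unknown model: {name}")
        if name not in self._loaded:
            # First request for this model: pay the loading cost now.
            self._loaded[name] = self._loader(MODEL_SOURCES[name])
        return self._loaded[name]

    def loaded_names(self) -> List[str]:
        """Names of the models currently held in memory."""
        return sorted(self._loaded)
```

In the real app the loader would presumably wrap the `AutoTokenizer.from_pretrained` and `AutoModelForCausalLM.from_pretrained` calls that the commit uses; injecting it keeps the caching logic testable without downloading any weights.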

main.py CHANGED

@@ -1,5 +1,5 @@
 import gradio as gr
-from transformers import AutoModelForCausalLM, AutoTokenizer, T5ForConditionalGeneration, T5Tokenizer
 class MultiModelChat:
     def __init__(self):
@@ -15,10 +15,20 @@ class MultiModelChat:
                     'tokenizer': AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct"),
                     'model': AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")
                 }
-            elif model_name == 'FLAN-T5':
-                self.models['FLAN-T5'] = {
-                    'tokenizer': T5Tokenizer.from_pretrained("google/flan-t5-small"),
-                    'model': T5ForConditionalGeneration.from_pretrained("google/flan-t5-small")
                 }
             # Set pad token for the newly loaded model
@@ -30,8 +40,12 @@ class MultiModelChat:
     def chat(self, message, history, model_choice):
         if model_choice == "SmolLM2":
             return self.chat_smol(message, history)
-        elif model_choice == "FLAN-T5":
-            return self.chat_flan(message, history)
     def chat_smol(self, message, history):
         self.ensure_model_loaded('SmolLM2')
@@ -50,15 +64,79 @@ class MultiModelChat:
         response = tokenizer.decode(outputs[0], skip_special_tokens=True)
         return response.split("Assistant:")[-1].strip()
-    def chat_flan(self, message, history):
-        self.ensure_model_loaded('FLAN-T5')
-        tokenizer = self.models['FLAN-T5']['tokenizer']
-        model = self.models['FLAN-T5']['model']
-        inputs = tokenizer(f"Answer the question: {message}", return_tensors="pt")
-        outputs = model.generate(inputs.input_ids, max_length=100)
-        return tokenizer.decode(outputs[0], skip_special_tokens=True)
 chat_app = MultiModelChat()
@@ -66,18 +144,126 @@ def respond(message, history, model_choice):
     return chat_app.chat(message, history, model_choice)
 with gr.Blocks(theme="soft") as demo:
-    gr.Markdown("# Multi-Model Tiny Chatbot")
     with gr.Row():
         model_dropdown = gr.Dropdown(
-            choices=["SmolLM2", "FLAN-T5"],
-            value="SmolLM2",
-            label="Select Model"
         )
-    chatbot = gr.Chatbot(height=400)
-    msg = gr.Textbox(label="Message", placeholder="Type your message here...")
-    clear = gr.Button("Clear")
     def user_message(message, history):
         return "", history + [[message, None]]
@@ -88,9 +274,13 @@ with gr.Blocks(theme="soft") as demo:
         history[-1][1] = bot_response
         return history
     msg.submit(user_message, [msg, chatbot], [msg, chatbot]).then(
         bot_message, [chatbot, model_dropdown], chatbot
     )
     clear.click(lambda: None, None, chatbot, queue=False)
 demo.launch()
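The submit chain in the hunk above runs in two steps: `user_message` clears the textbox and appends a turn with an empty bot slot, then `bot_message` fills that slot. The body of `bot_message` is mostly elided in the diff, so the sketch below is an assumption about how the two steps fit together. In particular, the injected `respond` callable and the choice to pass only the earlier turns as history are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Chat history as a list of [user_text, bot_text] pairs, as in the diff.
History = List[List[Optional[str]]]


def user_message(message: str, history: History) -> Tuple[str, History]:
    """Step 1: clear the textbox and append the user turn with an empty bot slot."""
    return "", history + [[message, None]]


def bot_message(
    history: History,
    model_choice: str,
    respond: Callable[[str, History, str], str],
) -> History:
    """Step 2: fill the empty bot slot of the most recent turn."""
    user_text = history[-1][0]
    # Assumption: only completed turns are passed on as context.
    bot_response = respond(user_text, history[:-1], model_choice)
    history[-1][1] = bot_response
    return history


if __name__ == "__main__":
    # Stand-in for a real model call.
    echo = lambda msg, hist, model: f"[{model}] {msg}"
    textbox, hist = user_message("hello", [])
    print(bot_message(hist, "SmolLM2", echo))
```

Splitting the work this way lets the interface show the user's message immediately while the slower model call runs in the chained `.then(...)` step.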

 import gradio as gr
+from transformers import AutoModelForCausalLM, AutoTokenizer
 class MultiModelChat:
     def __init__(self):
                     'tokenizer': AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct"),
                     'model': AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")
                 }
+            elif model_name == 'NanoLM-25M':
+                self.models['NanoLM-25M'] = {
+                    'tokenizer': AutoTokenizer.from_pretrained("Mxode/NanoLM-25M-Instruct-v1.1"),
+                    'model': AutoModelForCausalLM.from_pretrained("Mxode/NanoLM-25M-Instruct-v1.1")
+                }
+            elif model_name == 'NanoTranslator-S':
+                self.models['NanoTranslator-S'] = {
+                    'tokenizer': AutoTokenizer.from_pretrained("Mxode/NanoTranslator-S"),
+                    'model': AutoModelForCausalLM.from_pretrained("Mxode/NanoTranslator-S")
+                }
+            elif model_name == 'NanoTranslator-XL':
+                self.models['NanoTranslator-XL'] = {
+                    'tokenizer': AutoTokenizer.from_pretrained("Mxode/NanoTranslator-XL"),
+                    'model': AutoModelForCausalLM.from_pretrained("Mxode/NanoTranslator-XL")
                 }
             # Set pad token for the newly loaded model
     def chat(self, message, history, model_choice):
         if model_choice == "SmolLM2":
             return self.chat_smol(message, history)
+        elif model_choice == "NanoLM-25M":
+            return self.chat_nanolm(message, history)
+        elif model_choice == "NanoTranslator-S":
+            return self.chat_translator(message, history)
+        elif model_choice == "NanoTranslator-XL":
+            return self.chat_translator_xl(message, history)
     def chat_smol(self, message, history):
         self.ensure_model_loaded('SmolLM2')
         response = tokenizer.decode(outputs[0], skip_special_tokens=True)
         return response.split("Assistant:")[-1].strip()
+    def chat_nanolm(self, message, history):
+        self.ensure_model_loaded('NanoLM-25M')
+        tokenizer = self.models['NanoLM-25M']['tokenizer']
+        model = self.models['NanoLM-25M']['model']
+        # Use chat template for NanoLM
+        messages = [
+            {"role": "system", "content": "You are a helpful assistant."},
+            {"role": "user", "content": message}
+        ]
+        text = tokenizer.apply_chat_template(
+            messages,
+            tokenize=False,
+            add_generation_prompt=True
+        )
+        inputs = tokenizer([text], return_tensors="pt")
+        outputs = model.generate(
+            inputs.input_ids,
+            max_new_tokens=100,
+            temperature=0.7,
+            do_sample=True,
+            pad_token_id=tokenizer.eos_token_id
+        )
+        generated_ids = [
+            output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs.input_ids, outputs)
+        ]
+        response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+        return response
+    def chat_translator(self, message, history):
+        self.ensure_model_loaded('NanoTranslator-S')
+        tokenizer = self.models['NanoTranslator-S']['tokenizer']
+        model = self.models['NanoTranslator-S']['model']
+        # Use translation prompt format
+        prompt = f"<|im_start|>{message}<|endoftext|>"
+        inputs = tokenizer([prompt], return_tensors="pt")
+        outputs = model.generate(
+            inputs.input_ids,
+            max_new_tokens=100,
+            temperature=0.55,
+            do_sample=True,
+            pad_token_id=tokenizer.eos_token_id
+        )
+        generated_ids = [
+            output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs.input_ids, outputs)
+        ]
+        response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+        return response
+    def chat_translator_xl(self, message, history):
+        self.ensure_model_loaded('NanoTranslator-XL')
+        tokenizer = self.models['NanoTranslator-XL']['tokenizer']
+        model = self.models['NanoTranslator-XL']['model']
+        # Use translation prompt format
+        prompt = f"<|im_start|>{message}<|endoftext|>"
+        inputs = tokenizer([prompt], return_tensors="pt")
+        outputs = model.generate(
+            inputs.input_ids,
+            max_new_tokens=100,
+            temperature=0.55,
+            do_sample=True,
+            pad_token_id=tokenizer.eos_token_id
+        )
+        generated_ids = [
+            output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs.input_ids, outputs)
+        ]
+        response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+        return response
 chat_app = MultiModelChat()
     return chat_app.chat(message, history, model_choice)
 with gr.Blocks(theme="soft") as demo:
+    gr.Markdown("# 🤖 Multi-Model Tiny Chatbot")
+    gr.Markdown("*Lightweight AI models for different tasks - Choose the right model for your needs!*")
     with gr.Row():
         model_dropdown = gr.Dropdown(
+            choices=["SmolLM2", "NanoLM-25M", "NanoTranslator-S", "NanoTranslator-XL"],
+            value="NanoLM-25M",
+            label="Select Model",
+            info="Choose the best model for your task"
+        )
+    # Model information display
+    with gr.Row():
+        model_info = gr.Markdown(
+            """
+            ## 📋 NanoLM-25M (25M) - Selected
+            **Best for:** Quick responses, simple tasks, resource-constrained environments
+            **Language:** English
+            **Memory:** ~100MB
+            **Speed:** Very Fast
+            💡 **Tip:** Ultra-lightweight model perfect for fast responses!
+            """,
+            visible=True
         )
+    chatbot = gr.Chatbot(height=400, show_label=False)
+    msg = gr.Textbox(
+        label="Message",
+        placeholder="Type your message here...",
+        lines=2
+    )
+    with gr.Row():
+        clear = gr.Button("🗑️ Clear Chat", variant="secondary")
+        submit = gr.Button("💬 Send", variant="primary")
+    # Usage tips
+    with gr.Accordion("📖 Model Usage Guide", open=False):
+        gr.Markdown("""
+        ### 🎯 When to use each model:
+        **🔵 SmolLM2 (135M)**
+        - General conversations and questions
+        - Creative writing tasks
+        - Coding help and explanations
+        - Educational content
+        **🟢 NanoLM-25M (25M)**
+        - Quick responses when speed matters
+        - Resource-constrained environments
+        - Simple Q&A tasks
+        - Mobile or edge deployment
+        **🔴 NanoTranslator-S (9M)**
+        - Fast English → Chinese translation
+        - Basic translation needs
+        - Ultra-low memory usage
+        - Real-time translation
+        **🟡 NanoTranslator-XL (78M)**
+        - High-quality English → Chinese translation
+        - Professional translation work
+        - Complex sentences and idioms
+        - Better context understanding
+        ### 💡 Pro Tips:
+        - Models load automatically when first selected (lazy loading)
+        - Translation models work best with clear, complete sentences
+        - For translation, input English text and get Chinese output
+        - Restart the app to free up memory from unused models
+        """)
+    def update_model_info(model_choice):
+        info_map = {
+            "SmolLM2": """
+            ## 📋 SmolLM2 (135M) - Selected
+            **Best for:** General conversation, Q&A, creative writing, coding help
+            **Language:** English
+            **Memory:** ~500MB
+            **Speed:** Fast
+            💡 **Tip:** Great all-around model for most conversational tasks!
+            """,
+            "NanoLM-25M": """
+            ## 📋 NanoLM-25M (25M) - Selected
+            **Best for:** Quick responses, simple tasks, resource-constrained environments
+            **Language:** English
+            **Memory:** ~100MB
+            **Speed:** Very Fast
+            💡 **Tip:** Ultra-lightweight model perfect for fast responses!
+            """,
+            "NanoTranslator-S": """
+            ## 📋 NanoTranslator-S (9M) - Selected
+            **Best for:** Fast English → Chinese translation
+            **Language:** English → Chinese
+            **Memory:** ~50MB
+            **Speed:** Very Fast
+            💡 **Tip:** Input English text to get Chinese translation. Great for quick translations!
+            """,
+            "NanoTranslator-XL": """
+            ## 📋 NanoTranslator-XL (78M) - Selected
+            **Best for:** High-quality English → Chinese translation
+            **Language:** English → Chinese
+            **Memory:** ~300MB
+            **Speed:** Fast
+            💡 **Tip:** Best translation quality for complex sentences and professional use!
+            """
+        }
+        return info_map.get(model_choice, "")
+    # Update model info when dropdown changes
+    model_dropdown.change(
+        update_model_info,
+        inputs=[model_dropdown],
+        outputs=[model_info]
+    )
     def user_message(message, history):
         return "", history + [[message, None]]
         history[-1][1] = bot_response
         return history
+    # Handle message submission
     msg.submit(user_message, [msg, chatbot], [msg, chatbot]).then(
         bot_message, [chatbot, model_dropdown], chatbot
     )
+    submit.click(user_message, [msg, chatbot], [msg, chatbot]).then(
+        bot_message, [chatbot, model_dropdown], chatbot
+    )
     clear.click(lambda: None, None, chatbot, queue=False)
 demo.launch()
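`chat_translator` and `chat_translator_xl` in this diff differ only in the model key they look up, so the prompt construction and the prompt-stripping step could be shared. The sketch below isolates those two steps as pure functions. The helper names are hypothetical, and plain Python lists stand in for the token tensors so the example runs without loading a model.

```python
from typing import List, Sequence


def build_translation_prompt(message: str) -> str:
    """Wrap the English input in the prompt format both translator methods use."""
    return f"<|im_start|>{message}<|endoftext|>"


def strip_prompt_tokens(
    input_ids: Sequence[Sequence[int]],
    output_ids: Sequence[Sequence[int]],
) -> List[List[int]]:
    """Drop the echoed prompt from each generated sequence.

    Causal language models return the prompt tokens followed by the new
    tokens, so slicing off len(prompt) leaves only the newly generated part.
    """
    return [list(out[len(inp):]) for inp, out in zip(input_ids, output_ids)]


if __name__ == "__main__":
    print(build_translation_prompt("Good morning"))
    # The token IDs below are made up for illustration.
    print(strip_prompt_tokens([[1, 2, 3]], [[1, 2, 3, 40, 41]]))
```

With helpers like these, a single translation method taking the model name as a parameter could serve both NanoTranslator-S and NanoTranslator-XL.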

requirements.txt ADDED

@@ -0,0 +1,6 @@

+gradio>=4.0.0
+transformers>=4.30.0
+torch>=2.0.0
+protobuf>=4.21.0
+accelerate>=0.20.0
+safetensors>=0.3.0