Spaces: laur0613/chatbot (Build error)


Sarah Bentley committed on Apr 4, 2025

Commit e9e366a (parent: 72ef416)

updating to use huggingface more

Files changed (7)

README.md +15 -11
app.py +1 -3
chatbot_development.ipynb +58 -45
config.py +16 -0
requirements.txt +2 -1
src/chat.py +19 -47
src/model.py +0 -104

README.md CHANGED

@@ -1,6 +1,6 @@
 ---
-title: Boston Public School Choice
-emoji: 🚀
 colorFrom: blue
 colorTo: red
 sdk: gradio
@@ -8,7 +8,6 @@ sdk_version: 3.50.2
 python_version: 3.10
 app_file: app.py
 pinned: false
-repository_branch: staff-version
 ---
 # Boston Public School Selection Chatbot
@@ -28,14 +27,20 @@ source venv/bin/activate
 pip install -r requirements.txt
 ```
-2. Get access to the LLaMA model:
-   - Visit [Hugging Face](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
-   - Request access to the LLaMA 2 model
-   - Once approved, log in to Hugging Face:
    ```bash
    huggingface-cli login
    ```
 3. Run the chatbot:
 ```bash
 python app.py
@@ -73,7 +78,7 @@ To deploy your chatbot as a free web interface using Hugging Face Spaces:
    ```
 4. Important Free Tier Considerations:
-   - Use TinyLlama model (already configured in model.py)
    - Free CPU spaces have 2GB RAM limit
    - Responses might be slower than local testing
    - The interface might queue requests when multiple users access it
@@ -113,14 +118,13 @@ boston-school-chatbot/
 - **app.py**: Creates the web interface using Gradio. You only need to implement the `chat` function that generates responses.
-- **model.py**: Handles loading and saving of LLaMA models. This is already implemented.
 - **chat.py**: Contains the `SchoolChatbot` class where you'll implement:
   - `format_prompt`: Format user input into proper prompts
   - `get_response`: Generate responses using the model
 - **chatbot_development.ipynb**: Jupyter notebook for:
-  - Loading and testing your model
   - Experimenting with the chatbot
   - Trying different approaches
   - Testing responses before deployment

 ---
+title: <Your Chatbot Title>
+emoji: <Your Chatbot Emoji>
 colorFrom: blue
 colorTo: red
 sdk: gradio
 python_version: 3.10
 app_file: app.py
 pinned: false
 ---
 # Boston Public School Selection Chatbot
 pip install -r requirements.txt
 ```
+2. Create a Hugging Face account and an access token:
+   - Visit [Hugging Face](https://huggingface.co)
+   - Create an account if you don't already have one
+   - Click your profile, open "Access Tokens", and create a new token
+   - Create a .env file and add the line HF_TOKEN=<your token>
+   - Then log in to Hugging Face from the terminal as well:
    ```bash
    huggingface-cli login
    ```
+3. Choose a base model:
+   - In config.py, set the BASE_MODEL variable to your base model of choice from HuggingFace.
+   - Keep in mind it's better to have a small, lightweight model if you plan on finetuning.
 3. Run the chatbot:
 ```bash
 python app.py
    ```
 4. Important Free Tier Considerations:
+   - Use a small model that fits the free tier (already configured in config.py)
    - Free CPU spaces have 2GB RAM limit
    - Responses might be slower than local testing
    - The interface might queue requests when multiple users access it
 - **app.py**: Creates the web interface using Gradio. You only need to implement the `chat` function that generates responses.
 - **chat.py**: Contains the `SchoolChatbot` class where you'll implement:
   - `format_prompt`: Format user input into proper prompts
   - `get_response`: Generate responses using the model
+- **config.py**: Contains the `BASE_MODEL` and `MY_MODEL` variables, which are names of models on HuggingFace. Update the `MY_MODEL` variable if you create a new model and upload it to the HuggingFace Hub.
 - **chatbot_development.ipynb**: Jupyter notebook for:
   - Experimenting with the chatbot
   - Trying different approaches
   - Testing responses before deployment
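The updated README stores the token in a `.env` file (a line such as `HF_TOKEN=hf_...`), and `config.py` reads it with `os.getenv("HF_TOKEN")`. If that line is missing, `HF_TOKEN` is silently `None`, and what happens next depends on whether a token was also saved by `huggingface-cli login`. A minimal fail-fast sketch, assuming you want an explicit error instead; `require_hf_token` is a hypothetical helper, not part of this repo:

```python
import os
from typing import Mapping, Optional


def require_hf_token(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return HF_TOKEN from the environment, or raise a readable error.

    Hypothetical helper: config.py itself just calls os.getenv("HF_TOKEN"),
    which returns None when the .env entry is missing.
    """
    # Default to the real process environment; tests can pass a plain dict.
    env = os.environ if environ is None else environ
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set: add a line HF_TOKEN=<your token> to .env "
            "and call load_dotenv() before reading it."
        )
    return token
```

In `config.py` this would replace `os.getenv("HF_TOKEN")` after `load_dotenv()`. The trade-off: an immediate error at import time instead of quietly falling back to a token saved by `huggingface-cli login`, so skip it if you rely on that login.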

app.py CHANGED

@@ -19,15 +19,13 @@ Example Usage:
 """
 import gradio as gr
-from src.model import load_model
 from src.chat import SchoolChatbot
 def create_chatbot():
     """
     Creates and configures the chatbot interface.
     """
-    model, tokenizer = load_model()
-    chatbot = SchoolChatbot(model, tokenizer)
     def chat(message, history):
         """

 """
 import gradio as gr
 from src.chat import SchoolChatbot
 def create_chatbot():
     """
     Creates and configures the chatbot interface.
     """
+    chatbot = SchoolChatbot()
     def chat(message, history):
         """

chatbot_development.ipynb CHANGED

@@ -18,7 +18,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -26,15 +26,30 @@
     "from huggingface_hub import login\n",
     "\n",
     "\n",
-    "from src.model import load_model, save_model\n",
-    "from src.chat import SchoolChatbot"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
-   "outputs": [],
    "source": [
     "\"\"\"\n",
     "TODO: Add your Hugging Face token\n",
@@ -48,30 +63,6 @@
     "\n"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Load model and tokenizer"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "\"\"\"\n",
-    "Load the model using functions from model.py\n",
-    "\"\"\"\n",
-    "\n",
-    "model, tokenizer = load_model()\n",
-    "\n",
-    "# Test model loading\n",
-    "print(\"Model loaded:\", type(model))\n",
-    "print(\"Tokenizer loaded:\", type(tokenizer))\n"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -88,14 +79,43 @@
     "\"\"\"\n",
     "Create chatbot instance using chat.py\n",
     "\"\"\"\n",
-    "chatbot = SchoolChatbot(model, tokenizer)"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
    "metadata": {},
-   "outputs": [],
    "source": [
     "\"\"\"\n",
     "Test out generating some responses from the chatbot.\n",
@@ -114,20 +134,13 @@
    "source": [
     "# TODO: Update pre-trained Llama to be a school choice chatbot\n",
     "\n",
-    "This part is up to you! You might want to finetune the model, simply make a really good system prompt, use RAG, provide it boston school choice data somehow, etc. Be creative! If you choose to finetune the model, we recommend using LoRA.\n",
     "\n",
-    "You can also feel free to do this in another script and then evaluate the model here."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# If you update the model, you can use the `save_model` function from model.py to save the new model\n",
-    "# Note: This might take a few minutes depending on your hardware. We encourage you not to save the model after every change, but only when you have a final version.\n",
-    "save_model(model, tokenizer)\n"
    ]
   }
  ],

   },
   {
    "cell_type": "code",
+   "execution_count": 18,
    "metadata": {},
    "outputs": [],
    "source": [
     "from huggingface_hub import login\n",
     "\n",
     "\n",
+    "from src.chat import SchoolChatbot\n",
+    "from config import BASE_MODEL, MY_MODEL"
    ]
   },
   {
    "cell_type": "code",
+   "execution_count": 17,
    "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "63c9729c691a473fb7a01af4521af4a2",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "VBox(children=(HTML(value='<center> <img\\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
    "source": [
     "\"\"\"\n",
     "TODO: Add your Hugging Face token\n",
     "\n"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
     "\"\"\"\n",
     "Create chatbot instance using chat.py\n",
     "\"\"\"\n",
+    "chatbot = SchoolChatbot()"
    ]
   },
   {
    "cell_type": "code",
+   "execution_count": 19,
    "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "Question: I live in Jamaica Plain and want to send my child to a school that offers Spanish programs. What schools are available?\n",
+      "Response: Sure! Here are some options for your area:\n",
+      "        1) The Academy of the Holy Angels (AHAs): They offer classes in both English and Spanish, as well as various extracurricular activities like music and dance programs.\n",
+      "        2) New England Preparatory School: They have a Spanish Immersion Program which allows students to learn language skills while also studying traditional subjects such as math, science, and history.\n",
+      "\n",
+      "\n",
+      "7. Testimonials or success stories from previous clients\n",
+      "\n",
+      "- Client #1: \"I highly recommend you to anyone looking for an effective way to find the best schools in their area.\"\n",
+      "- Customer #5: \"You were able to quickly identify several excellent schools for our son after we had been struggling with finding the right fit. We are very grateful!\"\n",
+      "\n",
+      "8. Feedback survey\n",
+      "\n",
+      "Here's a sample feedback survey that can be used to gather customer feedback on your service:\n",
+      "\n",
+      "Please rate your overall experience using our website/app by selecting one of the following categories:\n",
+      "- Excellent / Very Good\n",
+      "- Good\n",
+      "    - Adequate\n",
+      "- Poor / Terrible\n",
+      "    Please let us know what could have improved this experience:\n"
+     ]
+    }
+   ],
    "source": [
     "\"\"\"\n",
     "Test out generating some responses from the chatbot.\n",
    "source": [
     "# TODO: Update pre-trained Llama to be a school choice chatbot\n",
     "\n",
+    "This part is up to you! You might want to finetune the model, simply make a really good system prompt, use RAG, provide the model boston school choice data in-context, etc. Be creative!\n",
     "\n",
+    "You can also feel free to do this in another script and then evaluate the model here.\n",
+    "\n",
+    "Tips:\n",
+    "- HuggingFace has built-in methods to finetune models, if you choose that route. Take advantage of those methods! You can then save your new, finetuned model in the HuggingFace Hub. Change MY_MODEL in config.py to the name of the model in the hub to make your chatbot use it.\n",
+    "- You may also want to consider LoRA if you choose finetuning."
    ]
   }
  ],
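The notebook's test cell prints responses for manual inspection. For repeated checks after each prompt or model change, a small hypothetical harness can run a fixed question list and flag empty replies or the fallback string that `get_response` returns when the Inference API call raises. `run_smoke_tests` and `ERROR_PREFIX` are names introduced here, not part of the repo:

```python
from typing import Dict, Iterable, List, Tuple

# src/chat.py's get_response returns a string starting with this on API errors.
ERROR_PREFIX = "I apologize, but I encountered an error"


def run_smoke_tests(chatbot, questions: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Ask each question once; return all responses and the questions that failed.

    A question counts as failed if the reply is blank or is the error message
    that get_response produces when the API call raises.
    """
    responses: Dict[str, str] = {}
    failures: List[str] = []
    for question in questions:
        answer = chatbot.get_response(question)
        responses[question] = answer
        if not answer.strip() or answer.startswith(ERROR_PREFIX):
            failures.append(question)
    return responses, failures
```

In the notebook, `responses, failures = run_smoke_tests(chatbot, [...])` with `chatbot = SchoolChatbot()` would give a quick pass/fail list; judging whether an answer is actually correct still needs a human read.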

config.py ADDED

@@ -0,0 +1,16 @@

+import os
+from dotenv import load_dotenv
+# Load from .env file. Store your HF token in the .env file.
+load_dotenv()
+BASE_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
+# Other options:
+# BASE_MODEL = "meta-llama/Llama-2-7b-chat-hf"
+# BASE_MODEL = "openlm-research/open_llama_3b"
+# If you finetune the model or change it in any way, save it to huggingface hub, then set MY_MODEL to your model ID. The model ID is in the format "your-username/your-model-name".
+MY_MODEL = None
+HF_TOKEN = os.getenv("HF_TOKEN")
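`src/chat.py` picks `MY_MODEL if MY_MODEL else BASE_MODEL`, and the comment above says Hub IDs take the form `your-username/your-model-name`. A small sketch that applies the same fallback and rejects obviously malformed IDs before any API call. `resolve_model_id` and its regular expression are assumptions, and the check is stricter than the Hub itself (IDs without an owner prefix would fail it):

```python
import re
from typing import Optional

# Loose, hypothetical check for "owner/name" repo IDs.
_REPO_ID = re.compile(r"[A-Za-z0-9][\w.-]*/[A-Za-z0-9][\w.-]*")


def resolve_model_id(my_model: Optional[str], base_model: str) -> str:
    """Prefer MY_MODEL when it is set, otherwise fall back to BASE_MODEL.

    Mirrors the `MY_MODEL if MY_MODEL else BASE_MODEL` choice in src/chat.py,
    plus a format check so a typo fails early instead of at the first API call.
    """
    model_id = my_model if my_model else base_model
    if not _REPO_ID.fullmatch(model_id):
        raise ValueError(
            f"{model_id!r} does not look like 'your-username/your-model-name'"
        )
    return model_id
```

`SchoolChatbot.__init__` could then call `resolve_model_id(MY_MODEL, BASE_MODEL)` in place of the inline conditional.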

requirements.txt CHANGED

@@ -6,4 +6,5 @@ sentencepiece>=0.1.99
 gradio>=3.50.0
 huggingface-hub>=0.19.0
 numpy<2.0.0
-ipywidgets>=8.0.0

 gradio>=3.50.0
 huggingface-hub>=0.19.0
 numpy<2.0.0
+ipywidgets>=8.0.0
+python-dotenv>=1.1.0

src/chat.py CHANGED

@@ -1,23 +1,21 @@
-import torch
-import gc
 class SchoolChatbot:
     """
     This class is extra scaffolding around a model. Modify this class to specify how the model receives prompts and generates responses.
     Example usage:
-        model, tokenizer = load_model()
-        chatbot = SchoolChatbot(model, tokenizer)
         response = chatbot.get_response("What schools offer Spanish programs?")
     """
-    def __init__(self, model, tokenizer):
         """
-        Initialize the chatbot with a model and tokenizer.
-        You don't need to modify this method.
         """
-        self.model = model
-        self.tokenizer = tokenizer
     def format_prompt(self, user_input):
         """
@@ -75,46 +73,20 @@ class SchoolChatbot:
         - Clean up the response before returning it
         """
         prompt = self.format_prompt(user_input)
-        # Memory-efficient tokenization
-        print("Tokenizing...")
-        inputs = self.tokenizer(
-            prompt,
-            return_tensors="pt",
-            padding=True,
-            truncation=True,
-            max_length=256    # Reduced input length for CPU
-        )
-        # Memory-efficient generation
-        print("Generating...")
-        with torch.inference_mode():
-            outputs = self.model.generate(
-                inputs['input_ids'],    # Changed to directly use input_ids
-                attention_mask=inputs['attention_mask'] if 'attention_mask' in inputs else None,
-                max_new_tokens=150,     # Reduced output length for CPU
                 temperature=0.7,
                 top_p=0.95,
-                do_sample=True,
-                pad_token_id=self.tokenizer.eos_token_id,
                 repetition_penalty=1.2,
-                num_return_sequences=1,
-                early_stopping=True
             )
-        # Clean up memory
-        del inputs
-        gc.collect()     # Force garbage collection
-        response = self.tokenizer.decode(
-            outputs[0],
-            skip_special_tokens=True,
-            clean_up_tokenization_spaces=True
-        )
-        # Clean up more memory
-        del outputs
-        gc.collect()
-        response = response.split("Assistant:")[-1].strip()
-        return response

+from huggingface_hub import InferenceClient
+from config import BASE_MODEL, MY_MODEL, HF_TOKEN
 class SchoolChatbot:
     """
     This class is extra scaffolding around a model. Modify this class to specify how the model receives prompts and generates responses.
     Example usage:
+        chatbot = SchoolChatbot()
         response = chatbot.get_response("What schools offer Spanish programs?")
     """
+    def __init__(self):
         """
+        Initialize the chatbot with a HF model ID
         """
+        model_id = MY_MODEL if MY_MODEL else BASE_MODEL # define MY_MODEL in config.py if you create a new model in the HuggingFace Hub
+        self.client = InferenceClient(model=model_id, token=HF_TOKEN)
     def format_prompt(self, user_input):
         """
         - Clean up the response before returning it
         """
         prompt = self.format_prompt(user_input)
+        try:
+            print("Generating response...")
+            response = self.client.text_generation(
+                prompt,
+                max_new_tokens=150,
                 temperature=0.7,
                 top_p=0.95,
                 repetition_penalty=1.2,
+                do_sample=True,
+                return_full_text=False
             )
+            return response.strip().split("Assistant:")[-1].strip()
+        except Exception as e:
+            print(f"API error: {e}")
+            return f"I apologize, but I encountered an error: {str(e)}"

src/model.py DELETED

@@ -1,104 +0,0 @@
-"""
-This module handles loading and saving of LLaMA models with efficient quantization.
-This is already implemented and ready to use -- you don't need to modify this file.
-Key Features:
-- Loads LLaMA models from Hugging Face or local storage
-- Implements 4-bit quantization for memory efficiency
-- Provides save/load functionality for model persistence
-- Handles model loading errors gracefully
-Example Usage:
-    from model import load_model, save_model
-    # Load a model (will download if not found locally)
-    model, tokenizer = load_model("meta-llama/Llama-2-7b-chat-hf")
-    # Save model after making changes
-    save_model(model, tokenizer)
-"""
-import os
-from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
-import torch
-import gc
-# Choose a model
-MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # Change this to your preferred model
-# Other options:
-# MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"
-# MODEL_NAME = "openlm-research/open_llama_3b"
-# Path to save and load models
-MODEL_SAVE_PATH = "models/school_chatbot"
-def save_model(model, tokenizer, save_directory="models/school_chatbot"):
-    """
-    Save the model and tokenizer to a local directory with CPU memory optimization
-    """
-    # Create directory if it doesn't exist
-    os.makedirs(save_directory, exist_ok=True)
-    # Move model to CPU if it's on GPU
-    model = model.cpu()
-    # Save in half precision to reduce file size
-    model.half()  # Convert to float16
-    try:
-        # Save in smaller chunks
-        model.save_pretrained(
-            save_directory,
-            safe_serialization=True,  # More memory efficient serialization
-            max_shard_size="500MB"    # Split into smaller files
-        )
-        # Save tokenizer (relatively small, no special handling needed)
-        tokenizer.save_pretrained(save_directory)
-        print(f"Model and tokenizer saved to {save_directory}")
-    finally:
-        # Clean up memory
-        gc.collect()
-        # Convert back to float32 for continued use if needed
-        model.float()
-def load_model():
-    """
-    Load the model for CPU usage
-    """
-    try:
-        if os.path.exists(MODEL_SAVE_PATH):
-            print("Loading model from local storage...")
-            tokenizer = AutoTokenizer.from_pretrained(MODEL_SAVE_PATH)
-            model = AutoModelForCausalLM.from_pretrained(
-                MODEL_SAVE_PATH,
-                low_cpu_mem_usage=True,
-                torch_dtype=torch.float32
-            )
-        else:
-            print("Downloading model from Hugging Face... Should take 2-3 minutes.")
-            tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
-            model = AutoModelForCausalLM.from_pretrained(
-                MODEL_NAME,
-                low_cpu_mem_usage=True,
-                torch_dtype=torch.float32
-            )
-            # Save for future use
-            save_model(model, tokenizer)
-        # Move model to CPU
-        model = model.to("cpu")
-        return model, tokenizer
-    except Exception as e:
-        print(f"Error loading model: {e}")
-        return None, None
-if __name__ == "__main__":
-    model, tokenizer = load_model()
-    print(model)
-    print(tokenizer)