Commit 9b066ba (1 parent: 8deae53)
update

README.md CHANGED
@@ -10,4 +10,101 @@ pinned: false
license: afl-3.0
---
# H GPT4o - AI Assistant

## Overview

**H GPT4o** is an AI-powered assistant that combines conversation with image generation, web search, and image-based Q&A. It routes each request to one of several language and vision models, so users can explore creative ideas, gather information, and solve problems from a single chat interface.

## Model Description and Usage

### 1. **Gemma (Mistral-7B-Instruct-v0.3)**

- **Purpose**: Used for general conversation and generating responses to user queries.
- **Usage**: This model processes text-based inputs and generates human-like responses. It also orchestrates function calls for specific tasks like web search, image generation, and image-based Q&A.

### 2. **Mixtral (Nous-Hermes-2-Mixtral-8x7B-DPO)**

- **Purpose**: Handles complex queries, especially those requiring web-based information.
- **Usage**: Mixtral is responsible for summarizing web search results and generating detailed and structured responses based on external data.

### 3. **LLaMA (Meta-Llama-3-8B-Instruct)**

- **Purpose**: Acts as a fallback model for text generation tasks.
- **Usage**: When other models fail to generate a response, LLaMA is used to continue the conversation, ensuring uninterrupted interaction.

### 4. **Yi-1.5 (34B-Chat)**

- **Purpose**: Provides diverse and creative responses.
- **Usage**: Yi-1.5 enhances the assistant's ability to reply like a human friend with a friendly tone, short forms, and emojis.

### 5. **LLaVA (llava-interleave-qwen-0.5b-hf)**

- **Purpose**: Used for image-based Q&A and understanding visual content.
- **Usage**: LLaVA processes and analyzes images, enabling the assistant to answer questions related to the visual content provided by the user.
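
The role split above could be captured in a small lookup table. This is only a sketch: the role names, the `MODELS` dictionary, and the `model_for` helper are hypothetical, and the organisation prefixes in the repo IDs are assumptions, since this README gives only the model names. The first entry follows the checkpoint name in the heading (Mistral-7B-Instruct-v0.3) rather than the "Gemma" label.

```python
# Hypothetical role-to-model registry. The role keys and the organisation
# prefixes in the repo IDs are assumptions; the README lists only model names.
MODELS = {
    "chat": "mistralai/Mistral-7B-Instruct-v0.3",
    "web_summary": "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",
    "fallback": "meta-llama/Meta-Llama-3-8B-Instruct",
    "friendly": "01-ai/Yi-1.5-34B-Chat",
    "vision": "llava-hf/llava-interleave-qwen-0.5b-hf",
}


def model_for(role: str) -> str:
    """Return the repo ID for a role; unknown roles get the fallback model."""
    return MODELS.get(role, MODELS["fallback"])
```

Keeping the mapping in one place means a model can be swapped without touching the routing logic.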

## Pipeline Description

### Input Processing

H GPT4o distinguishes between different input types (text, images) and processes them accordingly:

- **Text Inputs**: Direct user queries or prompts are sent to the `respond` function, which handles conversation flow and decides if additional functions like web search or image generation are needed.
- **Image Inputs**: If an image is provided, the system utilizes LLaVA to analyze the image in context with the accompanying text. The assistant can then answer questions related to the visual content.
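
A minimal sketch of this routing, assuming the chat interface delivers each message as a dictionary with a `text` string and a `files` list. That message shape, the `route_input` name, and the `image_qa` callable are assumptions for illustration; only `respond` is named in this README, and its real signature may differ.

```python
from typing import Callable, List


def route_input(
    message: dict,
    respond: Callable[[str], str],
    image_qa: Callable[[str, List[str]], str],
) -> str:
    """Send messages with attached files to image Q&A, everything else to respond."""
    text = message.get("text") or ""
    files = message.get("files") or []
    if files:
        # An image was attached: analyze it together with the accompanying text.
        return image_qa(text, files)
    # Plain text: hand over to the conversational handler.
    return respond(text)
```

Passing the two handlers in as arguments keeps the routing decision testable without loading any model.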

### Function Call Management

The assistant has access to several function calls to extend its capabilities:

- **Web Search**: Initiated when the query requires external information, such as current events or detailed topics not covered by the model’s knowledge base.
- **Image Generation**: Generates images based on user prompts, leveraging powerful text-to-image models.
- **Image Q&A**: Answers questions related to images provided by the user.
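
One way to wire up such function calls is a name-to-handler table. The sketch below assumes the chat model emits a JSON object of the form `{"name": ..., "arguments": {...}}`; that format, the `TOOLS` table, the `dispatch` helper, and the stub handler bodies are all hypothetical. Only the `image_gen` name appears elsewhere in this README.

```python
import json
from typing import Optional


# Stub handlers standing in for the real implementations (assumed signatures).
def web_search(query: str) -> str:
    return f"[search results for: {query}]"


def image_gen(prompt: str) -> str:
    return f"[image generated for: {prompt}]"


def image_qa(question: str) -> str:
    return f"[answer about the image: {question}]"


TOOLS = {"web_search": web_search, "image_gen": image_gen, "image_qa": image_qa}


def dispatch(model_output: str) -> Optional[str]:
    """Run the requested tool, or return None if the output is not a tool call."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return None  # Ordinary text reply, not a function call.
    if not isinstance(call, dict):
        return None
    handler = TOOLS.get(call.get("name"))
    arguments = call.get("arguments", {})
    if handler is None or not isinstance(arguments, dict):
        return None
    return handler(**arguments)
```

A `None` result signals that the model's output should be shown to the user as a normal reply.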

### Conversational Flow

1. **Initial Response**: The conversation begins with Gemma handling general queries and responses.
2. **Function Execution**: Depending on the query, the assistant may call a specific function (e.g., web search, image generation).
3. **Web Search Integration**: If a web search is required, the Mixtral model processes and summarizes the results.
4. **Image Generation**: Image generation requests are handled by the `image_gen` function, which leverages external APIs or models.
5. **Fallbacks**: If primary models fail to provide a response, the LLaMA model is used to continue the conversation.
6. **Final Output**: The response, along with any generated images or information, is returned to the user.
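
The fallback step (step 5) can be sketched as a loop over candidate generators tried in order. The `generate_with_fallback` helper is hypothetical, and treating both exceptions and empty replies as failures is an assumption about what "fail to provide a response" means here.

```python
from typing import Callable, Iterable, Optional


def generate_with_fallback(
    prompt: str, generators: Iterable[Callable[[str], str]]
) -> Optional[str]:
    """Return the first non-empty reply, trying each generator in order."""
    for generate in generators:
        try:
            reply = generate(prompt)
        except Exception:
            # This model failed (timeout, API error, ...); try the next one.
            continue
        if reply and reply.strip():
            return reply
    return None  # Every generator failed or returned an empty reply.
```

In this scheme the primary model would come first in `generators` and the LLaMA fallback last.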

### Distinguishing Inputs

- **Text Inputs**: Any input in plain text is treated as a user query or command, processed by the appropriate text-generation model.
- **Image Inputs**: Files uploaded by the user are identified as images. The system automatically routes these to LLaVA for analysis. The combination of text and image is used to create a context-specific response.
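
This README does not say how uploaded files are recognized as images. One plausible check, shown here purely as an assumption, is to guess the MIME type from the file name with Python's standard `mimetypes` module; the `is_image` helper is hypothetical.

```python
import mimetypes


def is_image(path: str) -> bool:
    """Guess from the file name whether an upload is an image."""
    mime_type, _ = mimetypes.guess_type(path)
    return mime_type is not None and mime_type.startswith("image/")
```

A name-based guess is cheap but does not inspect file contents, so an application handling untrusted uploads would likely want to validate the file by opening it as well.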

### Example Usage

- **Text-Based Query**: "What is the latest trend in AI technology?"
- **Image-Based Query**: "Can you describe the content of this image?" (with an image file attached)
- **Image Generation**: "Generate an image of a futuristic city at sunset."

## How It Works

1. **User Interaction**: Users interact with H GPT4o through a chat interface, where they can input text and upload images.
2. **Input Processing**: The system processes the input to determine the type (text or image) and routes it through the appropriate pipeline.
3. **Model Execution**: Based on the input type, the system selects the relevant model and executes the necessary function (e.g., web search, image generation, image analysis).
4. **Response Generation**: The system generates a response, which may include text, images, or both, and displays it to the user.

## Installation

Install the following dependencies:

```bash
pip install gradio transformers huggingface-hub requests beautifulsoup4 Pillow
```
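
Several of these packages are imported under a different name than the one passed to `pip` (`huggingface-hub` as `huggingface_hub`, `beautifulsoup4` as `bs4`, Pillow as `PIL`). A quick check that everything is importable could look like the sketch below; the `REQUIRED_MODULES` list and the `missing_modules` helper are hypothetical and not part of the project.

```python
from importlib.util import find_spec
from typing import Iterable, List

# Import names (not pip names) of the dependencies listed above.
REQUIRED_MODULES = ["gradio", "transformers", "huggingface_hub", "requests", "bs4", "PIL"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]
```

Calling `missing_modules(REQUIRED_MODULES)` returns an empty list when every dependency is installed.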

## Running the Application

To run the application, execute the following command:

```bash
python app.py
```

This starts the Gradio interface, where users can interact with H GPT4o.

## Conclusion

H GPT4o is an interactive AI assistant that handles tasks ranging from general conversation to image analysis. It can generate images, search the web, and help explore creative ideas from a single chat interface.