SwarmChat: Unified Audio, Text, and Simulation Environment for Human-Swarm Interaction
SwarmChat lets users control a robot swarm through natural-language commands, spoken or typed. It combines audio transcription, text translation, and a safety filter with a live simulation that visualizes a swarm of agents executing behavior trees.
Features
Audio Input Processing:
- Record commands via a microphone.
- Translate speech into English using the facebook/seamless-m4t-v2-large model.
- Perform a safety check on the translated text before execution.
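The speech model consumes 16 kHz mono audio, while browser microphones often record at 44.1 or 48 kHz, possibly in stereo. The sketch below shows that preprocessing step, assuming the recording arrives as a (sample_rate, samples) pair holding a NumPy array, which is what Gradio's numpy audio type produces. prepare_mic_audio is a hypothetical helper, not a function from speech_processing.py, and linear interpolation stands in for a proper band-limited resampler.

```python
import numpy as np

TARGET_SR = 16_000  # sampling rate expected by the SeamlessM4T v2 feature extractor


def prepare_mic_audio(sample_rate: int, samples: np.ndarray) -> np.ndarray:
    """Convert a microphone recording into 16 kHz mono float32 audio."""
    audio = samples.astype(np.float32)
    # Signed integer PCM (e.g. int16) is scaled into roughly [-1.0, 1.0].
    if np.issubdtype(samples.dtype, np.signedinteger):
        audio /= float(np.iinfo(samples.dtype).max)
    # Multi-channel audio arrives as (n_samples, n_channels); average to mono.
    if audio.ndim == 2:
        audio = audio.mean(axis=1)
    if sample_rate == TARGET_SR or audio.size == 0:
        return audio
    # Linear-interpolation resampling: simple, but not band-limited.
    n_out = int(round(audio.size * TARGET_SR / sample_rate))
    t_in = np.arange(audio.size) / sample_rate
    t_out = np.arange(n_out) / TARGET_SR
    return np.interp(t_out, t_in, audio).astype(np.float32)
```

The returned array can then be handed to the model's processor from Transformers, with English requested as the target language.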
Text Input Processing:
- Enter text commands for swarm control.
- Translate text using EuroLLM (EuroLLM-9B-Instruct-Q4_K_M.gguf).
- Detect unsafe or inappropriate content with an integrated safety module.
Safety Module:
- Utilizes a fine-tuned LLaMA-based model (llama-guard-3-8b-q4_k_m.gguf) for safety classification.
- Identifies unsafe content across predefined categories (e.g., violent crimes, privacy violations, hate speech).
- Rejects commands classified as unsafe so that they are never executed by the swarm.
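Llama Guard 3 answers with a short completion: "safe", or "unsafe" followed by a line of comma-separated hazard codes (S1 to S14) from its model card. The sketch below shows how such a completion could be turned into a verdict, assuming safety_module.py receives the model's raw text output; parse_guard_output and SafetyVerdict are hypothetical names, and any answer other than "safe" is treated as unsafe (fail closed).

```python
from dataclasses import dataclass, field
from typing import List

# Hazard categories listed in the Llama Guard 3 model card.
LLAMA_GUARD_3_CATEGORIES = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Defamation",
    "S6": "Specialized Advice",
    "S7": "Privacy",
    "S8": "Intellectual Property",
    "S9": "Indiscriminate Weapons",
    "S10": "Hate",
    "S11": "Suicide & Self-Harm",
    "S12": "Sexual Content",
    "S13": "Elections",
    "S14": "Code Interpreter Abuse",
}


@dataclass
class SafetyVerdict:
    safe: bool
    categories: List[str] = field(default_factory=list)


def parse_guard_output(raw: str) -> SafetyVerdict:
    """Parse a completion such as 'safe' or 'unsafe\\nS1,S10'."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if not lines:
        # Fail closed: an empty answer is treated as unsafe.
        return SafetyVerdict(safe=False)
    if lines[0].lower() == "safe":
        return SafetyVerdict(safe=True)
    codes: List[str] = []
    if len(lines) > 1:
        codes = [c.strip() for c in lines[1].split(",") if c.strip()]
    # Unknown codes are kept verbatim rather than dropped.
    names = [LLAMA_GUARD_3_CATEGORIES.get(c, c) for c in codes]
    return SafetyVerdict(safe=False, categories=names)
```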
Swarm Simulation:
- Visualize a swarm of agents in a live simulation powered by Violet simulator and Pygame.
- Agents are controlled by behavior trees defined in an XML file (tree.xml), using the py_trees framework.
- Real-time simulation updates are streamed via a Gradio web interface.
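Streaming a live simulation into a web UI is a producer/consumer problem: the simulation advances in a background thread while the interface pulls the newest frames. A minimal sketch with the threading and queue modules follows, assuming the Gradio output is fed by a generator function (Gradio streams the values a generator yields). Frames are plain strings here instead of rendered Pygame images, and all function names are hypothetical.

```python
import queue
import threading
from typing import Iterator


def _put_latest(frames: queue.Queue, item) -> None:
    """Enqueue item, discarding the oldest frame if the queue is full (never blocks)."""
    while True:
        try:
            frames.put_nowait(item)
            return
        except queue.Full:
            try:
                frames.get_nowait()  # drop the stalest frame to keep the stream live
            except queue.Empty:
                pass


def simulation_worker(frames: queue.Queue, stop: threading.Event, n_steps: int) -> None:
    """Producer: advance the simulation and publish one frame per step."""
    for step in range(n_steps):
        if stop.is_set():
            break
        frame = f"frame {step}"  # hypothetical stand-in for a rendered Pygame surface
        _put_latest(frames, frame)
    _put_latest(frames, None)  # sentinel: no more frames


def stream_frames(n_steps: int = 5, maxsize: int = 2) -> Iterator[str]:
    """Consumer: yield frames as they arrive, as a Gradio streaming callback would."""
    frames: queue.Queue = queue.Queue(maxsize=maxsize)
    stop = threading.Event()
    worker = threading.Thread(
        target=simulation_worker, args=(frames, stop, n_steps), daemon=True
    )
    worker.start()
    try:
        while (frame := frames.get()) is not None:
            yield frame
    finally:
        stop.set()  # also stops the worker if the client disconnects early
        worker.join(timeout=1.0)
```

Dropping the oldest frame when the queue is full keeps the display close to real time at the cost of skipped frames; a blocking put would instead slow the simulation down to the viewer's pace.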
Behavior Tree Generator:
- Uses a DeepSeek model (DeepSeek-R1-Distill-Qwen-7B) to dynamically generate behavior trees in XML format.
- Automatically extracts available behaviors from the SwarmAgent class and constructs a detailed prompt using a predefined XML template.
- Generates and saves new behavior tree configurations (updating tree.xml) based on user-specified tasks.
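The generator's three steps (list the agent's behaviors, fill a prompt template, pull the XML out of the model's answer) can be sketched as follows. The SwarmAgent class, its methods, and the XML template below are hypothetical stand-ins, not the project's actual definitions. The extraction step strips the <think>...</think> block that DeepSeek-R1 distilled models emit before their final answer.

```python
import inspect
import re
from typing import Dict, Optional


class SwarmAgent:
    """Hypothetical stand-in for the project's SwarmAgent class."""

    def move_forward(self):
        """Move one step along the current heading."""

    def avoid_obstacle(self):
        """Turn away from the nearest obstacle."""

    def _update_sensors(self):
        """Private helper; not offered to the model."""


def list_behaviors(cls) -> Dict[str, str]:
    """Map each public method of cls to the first line of its docstring."""
    behaviors = {}
    for name, fn in inspect.getmembers(cls, predicate=inspect.isfunction):
        if name.startswith("_"):
            continue
        doc = inspect.getdoc(fn) or ""
        behaviors[name] = doc.splitlines()[0] if doc else ""
    return behaviors


# Hypothetical template; the real one lives in bt_generator.py.
XML_TEMPLATE = """<BehaviorTree>
  <Sequence name="root">
    <Action name="BEHAVIOR_NAME"/>
  </Sequence>
</BehaviorTree>"""


def build_prompt(task: str, behaviors: Dict[str, str]) -> str:
    listing = "\n".join(f"- {name}: {desc}" for name, desc in sorted(behaviors.items()))
    return (
        "Generate a behavior tree in XML for a robot swarm.\n"
        f"Task: {task}\n"
        f"Available behaviors:\n{listing}\n"
        f"Follow this template:\n{XML_TEMPLATE}\n"
        "Use only the listed behaviors and return only the XML."
    )


def extract_xml(response: str) -> Optional[str]:
    # Remove the reasoning block first so drafts inside it are ignored.
    response = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL)
    match = re.search(r"<BehaviorTree\b.*?</BehaviorTree>", response, flags=re.DOTALL)
    return match.group(0) if match else None
```

Deriving the behavior list from the class itself means newly added agent methods reach the prompt without editing the template.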
Integrated Interface:
- A unified Gradio web interface for both audio and text inputs.
- Live streaming of the simulation environment.
- Seamless switching between different input modalities.
Technology Stack
Backend:
- Python
- Transformers (Hugging Face)
- PyTorch
- Pygame
- Threading and Queue modules for simulation management
Frontend:
- Gradio for an interactive web-based interface.
AI Models:
- Speech Processing: facebook/seamless-m4t-v2-large for audio transcription and translation.
- Text Processing: EuroLLM (EuroLLM-9B-Instruct-Q4_K_M.gguf) for text translation.
- Safety Classification: LLaMA Guard (llama-guard-3-8b-q4_k_m.gguf) for content safety assessment.
- Behavior Tree Generation: DeepSeek (DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf, a Qwen-based distillation of DeepSeek-R1) for creating and updating behavior tree configurations.
Behavior Trees:
- Agents use behavior trees, parsed from XML and built with py_trees, to dictate their actions within the simulation.
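To illustrate how an XML file becomes executable agent logic, here is a dependency-free sketch that parses a tree with xml.etree.ElementTree and ticks it. It re-implements simplified Sequence and Selector semantics instead of calling py_trees, and the element names (BehaviorTree, Sequence, Selector, Action) are an assumed schema, not necessarily the one used in tree.xml. The composites are memoryless: every tick restarts from the first child, roughly like py_trees composites created with memory=False.

```python
import xml.etree.ElementTree as ET
from enum import Enum
from typing import Callable, Dict


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RUNNING = "RUNNING"


class Action:
    """Leaf node wrapping an agent behavior."""

    def __init__(self, name: str, fn: Callable[[], Status]):
        self.name, self.fn = name, fn

    def tick(self) -> Status:
        return self.fn()


class Sequence:
    """Runs children in order; returns the first non-SUCCESS status."""

    def __init__(self, children):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            status = child.tick()
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS


class Selector:
    """Fallback: returns the first child status that is not FAILURE."""

    def __init__(self, children):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            status = child.tick()
            if status != Status.FAILURE:
                return status
        return Status.FAILURE


def build(node: ET.Element, actions: Dict[str, Callable[[], Status]]):
    if node.tag == "Sequence":
        return Sequence([build(child, actions) for child in node])
    if node.tag == "Selector":
        return Selector([build(child, actions) for child in node])
    if node.tag == "Action":
        name = node.attrib["name"]
        return Action(name, actions[name])  # KeyError if the XML names an unknown behavior
    raise ValueError(f"unknown node type: {node.tag}")


def load_tree(xml_text: str, actions: Dict[str, Callable[[], Status]]):
    root = ET.fromstring(xml_text)
    # Accept either a bare composite or a <BehaviorTree> wrapper around one.
    if root.tag == "BehaviorTree":
        root = root[0]
    return build(root, actions)
```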
Installation
Clone the repository:
git clone https://github.com/Inventors-Hub/SwarmChat.git
cd SwarmChat
Install dependencies:
pip install -r requirements.txt
Set up AI Models:
- Place the EuroLLM model file (EuroLLM-9B-Instruct-Q4_K_M.gguf) at the path specified in text_processing.py.
- Place the LLaMA Guard model file (llama-guard-3-8b-q4_k_m.gguf) at the path specified in safety_module.py.
- Place the DeepSeek model file (DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf) at the path specified in bt_generator.py.
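Because each model file is loaded by a different module, a missing file may only surface when that module is first used. A small pre-flight check can report all missing files at once. The sketch assumes, hypothetically, that the three GGUF files sit together in one directory named models; adjust it to the paths the modules actually specify.

```python
from pathlib import Path
from typing import List

# Model files named in this README, keyed by the module that loads each one.
REQUIRED_MODELS = {
    "text_processing.py": "EuroLLM-9B-Instruct-Q4_K_M.gguf",
    "safety_module.py": "llama-guard-3-8b-q4_k_m.gguf",
    "bt_generator.py": "DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf",
}


def missing_models(model_dir: str = "models") -> List[str]:
    """Return a description of every required model file not found in model_dir."""
    base = Path(model_dir)  # hypothetical shared directory for all three models
    return [
        f"{filename} (needed by {module})"
        for module, filename in REQUIRED_MODELS.items()
        if not (base / filename).is_file()
    ]
```

Running missing_models() before python app.py and printing any entries it returns gives one complete list instead of failing on the first absent file.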
Run the Application:
python app.py
Access the Interface:
Open your browser and navigate to http://127.0.0.1:7860 to start using SwarmChat.
Overview of Modules
app.py
The main application integrates audio/text processing, behavior tree generation, and the live simulation. It sets up the Gradio interface, handles simulation streaming, and routes user inputs to the appropriate processing modules.
speech_processing.py
Implements audio transcription and translation using the facebook/seamless-m4t-v2-large model.
text_processing.py
Translates text commands using EuroLLM (EuroLLM-9B-Instruct-Q4_K_M.gguf).
safety_module.py
Uses LLaMA Guard to assess the safety of incoming commands, ensuring compliance with safety policies.
bt_generator.py
Dynamically generates behavior trees in XML format by extracting behaviors from the SwarmAgent class, constructing a prompt, and querying the DeepSeek model (DeepSeek-R1-Distill-Qwen-7B). The generated XML is saved to tree.xml for simulation use.
simulator_env.py
Powers the simulation environment, manages agent behaviors using XML-defined behavior trees, and handles real-time simulation updates.
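The routing performed by app.py can be pictured as a short pipeline: translate the command, check it for safety, and only then generate a behavior tree. The sketch below assumes that order, which follows the README's description of checking translated text before execution. The three stages are injected as plain callables standing in for text_processing.py (or speech_processing.py for audio input), safety_module.py, and bt_generator.py; handle_command and PipelineResult are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PipelineResult:
    english_text: str
    safe: bool
    tree_xml: Optional[str] = None
    message: str = ""


def handle_command(
    text: str,
    translate: Callable[[str], str],
    is_safe: Callable[[str], bool],
    generate_tree: Callable[[str], str],
) -> PipelineResult:
    """Route one command: translate, then safety check, then behavior-tree generation."""
    english = translate(text)
    if not is_safe(english):
        # Unsafe commands stop here; no behavior tree is generated.
        return PipelineResult(english, safe=False, message="Command rejected by safety check.")
    xml = generate_tree(english)
    return PipelineResult(english, safe=True, tree_xml=xml, message="Behavior tree updated.")
```

Passing the stages in as callables keeps the routing testable without loading any of the large models.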