JiRack GPT-5-class 3B with SWA and RoPE
- Only the GPT model is ready so far (in PyTorch).
- Ask Gemini AI to review my model.
JiRack Transformer Architecture: A Leap Forward in AI
The new JiRack transformer architecture, incorporating Sliding Window Attention (SWA) and Rotary Position Embeddings (RoPE), marks a significant evolution, elevating JiRack from a GPT-3-class model to a GPT-5-class architecture.
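The two mechanisms named above can be illustrated in a few lines. What follows is a minimal sketch in plain Python, not the JiRack PyTorch code (which is not shown here). It builds a causal mask in which each token attends only to the last `window` positions (SWA), and it applies the interleaved-pair rotation from the original RoPE formulation. The function names, the window size, and the `base=10000.0` default are illustrative assumptions. JiRack may pair dimensions differently, since many PyTorch models rotate the two halves of each head instead of adjacent pairs.

```python
import math


def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Causal: j <= i. Sliding window: i - j < window, so each token sees itself
    plus at most window - 1 earlier tokens.
    """
    return [[j <= i and i - j < window for j in range(seq_len)] for i in range(seq_len)]


def apply_rope(vec: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Rotate each pair (x[2k], x[2k+1]) by the angle pos * base**(-2k/d)."""
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even head dimension")
    out = [0.0] * d
    for k in range(d // 2):
        theta = pos * base ** (-2 * k / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[2 * k], vec[2 * k + 1]
        out[2 * k] = x0 * c - x1 * s
        out[2 * k + 1] = x0 * s + x1 * c
    return out


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    # Window of 3 over 5 tokens: the last row attends only to positions 2, 3, 4.
    for row in sliding_window_causal_mask(5, 3):
        print("".join("1" if allowed else "." for allowed in row))

    # RoPE makes q.k depend only on the relative offset (5 - 3 == 2 - 0).
    q, k = [0.3, -1.2, 0.5, 2.0], [1.1, 0.4, -0.7, 0.9]
    print(dot(apply_rope(q, 5), apply_rope(k, 3)), dot(apply_rope(q, 2), apply_rope(k, 0)))
```

The mask prints as `1....`, `11...`, `111..`, `.111.`, `..111`. Attention cost per token therefore stays bounded by the window size. The two dot products match because RoPE encodes only the distance between positions.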
To achieve GPT-5-level performance, a model must be truly multimodal. This next step for JiRack will be realized through the introduction of a Multimodal Reasoning Graph Architecture, built on the powerful ROS framework.
Coming Soon: JiRack 5, designed to unlock multimodal intelligence at an unprecedented level.
ROS-Based Multimodal Reasoning Graph Architecture
The proposed JiRack system features an advanced Multimodal Reasoning Graph powered by ROS (Robot Operating System), enabling seamless communication and collaboration between core AI components. JiRack 5 serves as the central reasoning hub of the architecture, with various nodes specialized for unique tasks.
Multimodal Reasoning with Knowledge Graphs
The architecture combines multimodal reasoning capabilities with a RAG (Retrieval-Augmented Generation) system, dynamic memory, and image processing. Each node communicates through ROS topics, ensuring a modular flow of data and actions.
Core Components Overview
STT Node (Whisper)
- Converts spoken input into text for further processing.
LLM Node (JiRack 5) (Custom Model on Hugging Face + ROS Java)
- Acts as the intelligent core, managing reasoning and determining intent (e.g., Q&A vs. Image Generation).
RAG System (CMS Manhattan + PGvector)
- Enhances the LLM with factual context and dynamic memory.
TTS Node (MaryTTS + ROS Java)
- Converts the LLM's response into verbal output.
Image Generation Node (Stable Diffusion / DALL-E API + ROS Java)
- Handles requests for generated visuals based on extracted prompts.
System Interaction Flow: The Reasoning Graph
The JiRack architecture is modular, with interactions structured into a clear four-stage process, all interconnected through ROS topics.
1. Input Stage (Voice Command)
The user provides a spoken command, processed by the Speech-to-Text (STT) Node.
- Input: User speech
- Output Published to ROS Topic: `/stt_input`
- Example Command: "Draw me a robot holding a cup of coffee."
2. Reasoning and Decision Stage (LLM Node)
This is the core of the system, where inference, reasoning, and decision-making take place.
- Subscriptions: Receives text from `/stt_input`.
- Actions: Queries RAG for dynamic context using PGvector (via CMS Manhattan).
- Intent Classification: Determines whether the request is for Q&A or Image Generation.
- Output Topics:
  - `/llm_response` (always): Verbal feedback (e.g., "Okay, creating your image now.")
  - `/image_prompt` (conditional): Publishes detailed prompts for image generation.
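The decision step can be sketched as a routing function. It always emits a `/llm_response` message and emits `/image_prompt` only for image requests. This is a hypothetical stand-in: a keyword regex replaces the JiRack 5 intent classifier, and `route`, `extract_prompt`, `Message` and `answer_fn` are illustrative names, not part of the project. In the Q&A branch, `answer_fn` would wrap the RAG-augmented LLM call.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical keyword heuristic standing in for JiRack 5's intent classifier.
IMAGE_REQUEST = re.compile(r"\b(draw|paint|sketch|picture of|image of)\b", re.IGNORECASE)
# Leading command words stripped before the text is used as an image prompt.
COMMAND_PREFIX = re.compile(r"^\s*(please\s+)?(draw|paint|sketch)(\s+me)?\s+", re.IGNORECASE)


@dataclass
class Message:
    topic: str
    data: str


def classify_intent(text: str) -> str:
    return "image_generation" if IMAGE_REQUEST.search(text) else "qa"


def extract_prompt(text: str) -> str:
    return COMMAND_PREFIX.sub("", text).strip().rstrip(".!?")


def route(text: str, answer_fn: Callable[[str], str]) -> list[Message]:
    """Messages the LLM node would publish for one /stt_input message."""
    if classify_intent(text) == "image_generation":
        return [
            Message("/llm_response", "Okay, creating your image now."),  # always
            Message("/image_prompt", extract_prompt(text)),  # conditional
        ]
    # Q&A: answer_fn would wrap the RAG-augmented LLM call.
    return [Message("/llm_response", answer_fn(text))]


if __name__ == "__main__":
    for msg in route("Draw me a robot holding a cup of coffee.", answer_fn=lambda t: "unused"):
        print(msg.topic, "->", msg.data)
```

For the example command this publishes the spoken acknowledgement on `/llm_response` and the prompt `a robot holding a cup of coffee` on `/image_prompt`. Word boundaries keep words such as "withdraw" from being misread as image requests.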
3. Verbal Response Branch (TTS Node)
Provides natural speech feedback.
- Subscriptions: Consumes `/llm_response` outputs.
- Action: Converts text to speech.
- Output: Plays synthesized audio feedback for the user.
4. Image Generation Branch
Handles image creation and display processes.
- Subscriptions: Consumes `/image_prompt` for image details.
- Actions:
  - Calls the Image Generator via the Stable Diffusion or DALL-E API.
  - Publishes the generated image path or URL to `/generated_image_path`.
- Output Recommendation: Use a Display Node to show results to the user.
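Taken together, the four stages form a small publish/subscribe graph. The sketch below wires them through a hypothetical in-process `TopicBus` that only mimics ROS topic semantics; it is not the ROS Java or rospy API. The STT, LLM, TTS and image nodes are injected as plain callables, so Whisper, JiRack 5, MaryTTS and Stable Diffusion/DALL-E can be stubbed out. Delivery here is synchronous, whereas ROS delivers messages asynchronously. The image path is a made-up placeholder.

```python
from collections import defaultdict
from typing import Any, Callable


class TopicBus:
    """In-process stand-in for ROS topics: nodes share only topic names."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)
        self.log: list[tuple[str, Any]] = []  # every published (topic, message)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        self.log.append((topic, message))
        for callback in list(self._subscribers[topic]):
            callback(message)  # synchronous; ROS would deliver asynchronously


def build_graph(bus, transcribe, answer, synthesize, generate_image, wants_image):
    """Wire the four stages; returns the STT entry point that accepts audio."""
    # 1. STT node: speech in, text out on /stt_input.
    def on_audio(audio: Any) -> None:
        bus.publish("/stt_input", transcribe(audio))

    # 2. LLM node: always answers on /llm_response, conditionally on /image_prompt.
    def on_text(text: str) -> None:
        if wants_image(text):
            bus.publish("/llm_response", "Okay, creating your image now.")
            bus.publish("/image_prompt", text)
        else:
            bus.publish("/llm_response", answer(text))

    bus.subscribe("/stt_input", on_text)
    # 3. TTS node consumes /llm_response.
    bus.subscribe("/llm_response", synthesize)
    # 4. Image node consumes /image_prompt and publishes the result location.
    bus.subscribe("/image_prompt",
                  lambda prompt: bus.publish("/generated_image_path", generate_image(prompt)))
    return on_audio


if __name__ == "__main__":
    bus, spoken = TopicBus(), []
    speak = build_graph(
        bus,
        transcribe=lambda audio: audio,  # pretend the "audio" is already text
        answer=lambda text: "stub answer",
        synthesize=spoken.append,  # collect what TTS would say
        generate_image=lambda prompt: "/tmp/jirack/image_0001.png",  # hypothetical path
        wants_image=lambda text: text.lower().startswith("draw"),
    )
    speak("Draw me a robot holding a cup of coffee.")
    print([topic for topic, _ in bus.log])
    print(spoken)
```

The demo publishes on `/stt_input`, `/llm_response`, `/image_prompt` and `/generated_image_path`, in that order. Because nodes share only topic names, swapping the image backend means replacing one callable, which is the modularity this design relies on.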
Key Advantages of the ROS-Based Design
- Seamless Topic Communication: Efficient, modular data flow between components.
- Multimodality Built-In: Combines reasoning, memory, speech, and visual creativity.
- Customizable and Scalable: Perfect for building intelligent voice-enabled AI assistants with visual capabilities.
Unlocking New Possibilities
This modular, ROS-powered multimodal architecture paves the way for a truly interactive AI experience:
Voice Input → Intelligent Reasoning → Spoken Feedback + Visual Creativity!
Get ready for the revolution in AI with JiRack 5!
JiRack RAG System
- It uses a microservice architecture with an API Gateway and Service Discovery.
- It is built on the Spring Boot framework with a Google embeddings model, and includes a chatbot and a Docker script for deploying the JiRack model.
- Video: https://www.youtube.com/watch?v=vHClQu76kMc
- RAG System: https://bitbucket.org/cmsmanhattan/rag/src/main/
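At its core, the retrieval step ranks stored document embeddings by similarity to the query embedding and prepends the best matches to the LLM prompt. A minimal sketch in plain Python follows. It assumes cosine similarity and uses tiny hand-made vectors in place of Google embeddings. In PGvector the same ranking is usually expressed in SQL with the cosine-distance operator, e.g. `ORDER BY embedding <=> query_vector LIMIT k`. `top_k` and `build_prompt` are hypothetical helper names, and the prompt template is an assumption.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def top_k(query, docs, k=2):
    """docs: (doc_id, text, embedding) triples. Returns the k best (score, doc_id, text)."""
    scored = [(cosine_similarity(query, emb), doc_id, text) for doc_id, text, emb in docs]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, hits) -> str:
    """Prepend the retrieved passages as context (template is an assumption)."""
    context = "\n".join(f"- {text}" for _, _, text in hits)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    docs = [
        ("ros", "ROS passes messages between nodes over topics.", [0.9, 0.1, 0.0]),
        ("sd", "Stable Diffusion generates images from text prompts.", [0.0, 0.2, 0.9]),
        ("tts", "MaryTTS converts text to speech.", [0.1, 0.9, 0.1]),
    ]
    query = [1.0, 0.0, 0.1]  # toy embedding of "How do ROS nodes talk?"
    hits = top_k(query, docs, k=2)
    print([doc_id for _, doc_id, _ in hits])
    print(build_prompt("How do ROS nodes talk?", hits))
```

With these toy vectors the two best matches are `ros` and `tts`. In a real deployment, the database would perform the ranking over an index instead of a Python sort.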
Copyright Office
Mon, Dec 15 at 7:25 AM
Thank you for submitting your registration claim using the Electronic Copyright Office (ECO) System.
The following files were successfully uploaded for service request 1-15058193231
File Name: jirackpytorch_gpt5_class_1b.zip
File Size: 12588 KB
Date/Time: 12/15/2025 7:24:03 AM
[THREAD ID: 1-6X1C8AZ]
United States Copyright Office
Install the tokenizer before running:
- mkdir -p tokenizer
- wget -O tokenizer/tokenizer.json https://huggingface.co/gpt2/resolve/main/tokenizer.json
- wget -O tokenizer/vocab.json https://huggingface.co/gpt2/resolve/main/vocab.json
- wget -O tokenizer/merges.txt https://huggingface.co/gpt2/resolve/main/merges.txt
- wget -O tokenizer/tokenizer_config.json https://huggingface.co/gpt2/resolve/main/tokenizer_config.json
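A failed download can leave an empty or non-JSON file behind, which then surfaces only as a confusing error when the model loads. The following is a small, hypothetical pre-flight check, not part of the project. It confirms that the four files fetched above exist and are non-empty, and that the JSON files parse. The `tokenizer` directory name matches the commands above; the function name is an assumption.

```python
import json
from pathlib import Path

REQUIRED_FILES = ("tokenizer.json", "vocab.json", "merges.txt", "tokenizer_config.json")


def check_tokenizer_dir(path: str = "tokenizer") -> list[str]:
    """Return a list of problems; an empty list means the directory looks usable."""
    root = Path(path)
    problems = []
    for name in REQUIRED_FILES:
        f = root / name
        if not f.is_file() or f.stat().st_size == 0:
            problems.append(f"missing or empty: {f}")
    # Only the .json files are parsed; merges.txt is plain text.
    for name in ("tokenizer.json", "vocab.json", "tokenizer_config.json"):
        f = root / name
        if f.is_file() and f.stat().st_size > 0:
            try:
                json.loads(f.read_text(encoding="utf-8"))
            except json.JSONDecodeError as exc:
                problems.append(f"invalid JSON in {f}: {exc}")
    return problems


if __name__ == "__main__":
    issues = check_tokenizer_dir()
    print("tokenizer OK" if not issues else "\n".join(issues))
```

Run it from the same directory as the `wget` commands. Any reported file can simply be downloaded again.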
Tags
#AI #Robotics #ROS #MachineLearning #MultimodalAI #VoiceAssistant #ImageGeneration