---
title: JPharmatron Parallel Chat
emoji: 💊
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
hardware: nvidia-l40s-x8
---
# JPharmatron Parallel Chat
A parallel request-processing interface for the JPharmatron-7B-chat model.
## Features
- **8-Way Parallel Request Processing**: Submit up to 8 prompts simultaneously
- **Independent Streaming Outputs**: Each response streams independently
- **Multi-GPU Architecture**: One vLLM engine instance per L40S GPU
- **True Parallelism**: Each request runs on its own GPU and engine, so requests do not contend for compute or memory
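
The fan-out and independent streaming described above can be sketched with `asyncio`. This is a minimal illustration, not the code in `app.py`: `fake_engine_stream`, `run_slot`, and `run_all` are hypothetical names, and the fake stream stands in for a real per-GPU engine so the example runs without a GPU.

```python
import asyncio
from typing import AsyncIterator

NUM_SLOTS = 8  # one slot per GPU/engine, as in the feature list


async def fake_engine_stream(slot: int, prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for one engine's token stream."""
    for word in f"[gpu{slot}] echo: {prompt}".split():
        await asyncio.sleep(0)  # yield control so other slots can progress
        yield word


async def run_slot(slot: int, prompt: str, outputs: list[str]) -> None:
    """Consume one stream and append each chunk to that slot's output only."""
    async for chunk in fake_engine_stream(slot, prompt):
        outputs[slot] = (outputs[slot] + " " + chunk).strip()


async def run_all(prompts: list[str]) -> list[str]:
    """Run up to NUM_SLOTS prompts concurrently, one per slot."""
    if len(prompts) > NUM_SLOTS:
        raise ValueError(f"at most {NUM_SLOTS} prompts are supported")
    outputs = [""] * NUM_SLOTS
    await asyncio.gather(*(run_slot(i, p, outputs) for i, p in enumerate(prompts)))
    return outputs


if __name__ == "__main__":
    for text in asyncio.run(run_all(["dose of aspirin?", "what is GMP?"])):
        if text:
            print(text)
```

Because each slot writes only to its own entry in `outputs`, a slow response in one slot does not block the text appearing in another.
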
## Hardware Requirements
This Space requires **8x NVIDIA L40S** GPUs (48GB VRAM each).
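- Each 7B model instance needs ~14GB VRAM for its fp16 weights (7B parameters x 2 bytes), leaving the rest of the 48GB for KV cache and activations
- 8 independent instances give up to 8x the throughput of a single instance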
- No inter-GPU communication overhead
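
The memory figures above follow from simple arithmetic, sketched below. It assumes "7B" means exactly 7 billion parameters and counts only the model weights; the actual footprint also depends on KV cache size and runtime overhead, which are not stated here.

```python
PARAMS = 7e9          # "7B" taken as exactly 7 billion parameters (approximation)
BYTES_PER_PARAM = 2   # fp16 stores each parameter in 2 bytes
GPU_VRAM_GB = 48      # per L40S, from the requirements above
NUM_GPUS = 8

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9   # weights only
headroom_gb = GPU_VRAM_GB - weights_gb        # left for KV cache, activations, overhead
total_vram_gb = GPU_VRAM_GB * NUM_GPUS

print(f"weights per instance: ~{weights_gb:.0f} GB")
print(f"headroom per GPU:     ~{headroom_gb:.0f} GB")
print(f"total VRAM:           {total_vram_gb} GB")
```
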
## Usage
1. Enter prompts in any of the 8 input text boxes
2. Select mode options (pharmaceutical expert, international standards, specific procedures)
3. Click "Run All in Parallel" to execute all prompts simultaneously
4. Watch responses stream in real-time in their corresponding output boxes
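
Since prompts may be entered in any subset of the 8 boxes, each prompt has to stay tied to its own output box. One way this could be handled is sketched below; `collect_jobs` is a hypothetical helper, and skipping blank boxes is an assumption rather than documented behavior of this Space.

```python
def collect_jobs(boxes: list[str]) -> list[tuple[int, str]]:
    """Return (slot_index, prompt) pairs for the non-empty input boxes.

    The slot index is kept so each response can be routed back to the
    output box that matches its input box.
    """
    return [(i, text.strip()) for i, text in enumerate(boxes) if text and text.strip()]


if __name__ == "__main__":
    boxes = ["What is GMP?", "", "   ", "Explain ICH Q7. ", "", "", "", ""]
    for slot, prompt in collect_jobs(boxes):
        print(slot, prompt)
```
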
## Model
This Space uses [EQUES/JPharmatron-7B-chat](https://huggingface.co/EQUES/JPharmatron-7B-chat), a model specialized for the pharmaceutical domain.