---
title: JPharmatron Parallel Chat
emoji: ๐
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
hardware: nvidia-l40s-x8
---

# JPharmatron Parallel Chat

Parallel request-processing interface for the JPharmatron-7B-chat model.

## Features

- **8 Parallel Requests**: Submit up to 8 prompts simultaneously
- **Independent Streaming Outputs**: Each response streams to its own output box
- **Multi-GPU Architecture**: One vLLM engine instance per L40S GPU
- **True Parallelism**: Each request runs on its own GPU, so requests do not contend for compute or memory
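
The one-engine-per-GPU layout can be illustrated with a minimal sketch. It assumes each engine runs in its own process and is pinned to a single device through the `CUDA_VISIBLE_DEVICES` environment variable; this README does not say how `app.py` actually assigns GPUs, so the helper `gpu_env` and the child command are hypothetical stand-ins rather than the Space's real code.

```python
import os
import subprocess
import sys

NUM_GPUS = 8  # one worker per L40S, per the Features list


def gpu_env(gpu_index: int) -> dict[str, str]:
    """Return a copy of the environment that exposes only one GPU (hypothetical helper)."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


# Stand-in for a real engine process: it only reports which device it can see.
CHILD = "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"

# Start all workers first so they run concurrently, then collect their output.
procs = [
    subprocess.Popen(
        [sys.executable, "-c", CHILD],
        env=gpu_env(i),
        stdout=subprocess.PIPE,
        text=True,
    )
    for i in range(NUM_GPUS)
]
visible = [p.communicate()[0].strip() for p in procs]
print(visible)
```

Because each process sees exactly one device, a model loaded inside it cannot spill onto a neighbouring GPU, which is what keeps the instances independent.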

## Hardware Requirements

This Space requires **8x NVIDIA L40S** GPUs (48GB VRAM each).

- Each 7B model instance needs ~14GB VRAM for its weights in fp16 (2 bytes per parameter)
- 8 independent instances give up to 8x the throughput of a single instance
- No inter-GPU communication overhead, since no model is sharded across GPUs
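
The ~14GB figure follows from simple arithmetic, sketched below. The parameter count is the nominal 7 billion implied by the model name, which is an assumption; the exact count may differ, and the remaining VRAM is not free in practice because the inference engine also needs room for the KV cache and activations.

```python
PARAMS = 7e9         # nominal parameter count of a "7B" model (assumption)
BYTES_PER_PARAM = 2  # fp16 stores each weight in 2 bytes
GPU_VRAM_GB = 48     # L40S capacity, from the requirements above

# Weight memory in decimal gigabytes.
weights_gb = PARAMS * BYTES_PER_PARAM / 1e9

# What is left on each GPU for KV cache, activations, and runtime overhead.
headroom_gb = GPU_VRAM_GB - weights_gb

print(f"weights: {weights_gb:.0f} GB, headroom per GPU: {headroom_gb:.0f} GB")
```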

## Usage

1. Enter prompts in any of the 8 input text boxes
2. Select mode options (pharmaceutical expert, international standards, specific procedures)
3. Click "Run All in Parallel" to execute all prompts simultaneously
4. Watch each response stream in real time in its corresponding output box
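
The flow in the steps above can be sketched with `asyncio`: every non-empty box is dispatched concurrently, and each stream writes only to its own output slot. This is an illustration under stated assumptions, not the Space's actual `app.py`. The function `fake_stream` is a hypothetical stand-in for a streaming model call, and skipping empty boxes is an assumed behaviour.

```python
import asyncio
from typing import AsyncIterator

NUM_SLOTS = 8  # one input/output box pair per GPU


async def fake_stream(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming model call: echoes the prompt word by word."""
    for word in prompt.split():
        await asyncio.sleep(0)  # yield control so other slots can make progress
        yield word + " "


async def run_slot(slot: int, prompt: str, outputs: list[str]) -> None:
    """Append streamed chunks to this slot's output only."""
    async for chunk in fake_stream(prompt):
        outputs[slot] += chunk


async def run_all(prompts: list[str]) -> list[str]:
    """Run every non-empty prompt concurrently; empty boxes keep an empty output."""
    outputs = [""] * NUM_SLOTS
    tasks = [
        run_slot(i, p, outputs)
        for i, p in enumerate(prompts[:NUM_SLOTS])
        if p.strip()
    ]
    await asyncio.gather(*tasks)
    return [o.strip() for o in outputs]


results = asyncio.run(run_all(["aspirin dosage", "", "GMP audit steps"]))
print(results[:3])
```

Keeping one output buffer per slot means a slow response in one box never blocks or interleaves with the text in another.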

## Model

Uses [EQUES/JPharmatron-7B-chat](https://huggingface.co/EQUES/JPharmatron-7B-chat), a model specialized for the pharmaceutical domain.