---
title: JPharmatron Parallel Chat
emoji: 💊
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
hardware: nvidia-l40s-x8
---

# JPharmatron Parallel Chat

Parallel request processing interface for the JPharmatron-7B-chat model.

## Features

- **8 Parallel Request Processing**: Submit up to 8 prompts simultaneously
- **Independent Streaming Outputs**: Each response streams independently
- **Multi-GPU Architecture**: One vLLM engine instance per L40S GPU
- **True Parallelism**: Each request runs on its own GPU, so requests do not contend with one another

## Hardware Requirements

This Space requires **8x NVIDIA L40S** GPUs (48GB VRAM each).

- Each 7B model instance needs ~14GB of VRAM for its weights in fp16
- 8 independent instances give up to 8x the aggregate throughput of a single instance
- No inter-GPU communication overhead, because each instance stays on its own GPU

## Usage

1. Enter prompts in any of the 8 input text boxes
2. Select mode options (pharmaceutical expert, international standards, specific procedures)
3. Click "Run All in Parallel" to execute all prompts simultaneously
4. Watch responses stream in real time in their corresponding output boxes

## Model

Uses [EQUES/JPharmatron-7B-chat](https://huggingface.co/EQUES/JPharmatron-7B-chat), a pharmaceutical domain expert model.
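
The parallel flow described under Features and Usage (one engine per GPU, each response streaming into its own output box) can be illustrated with a minimal sketch. This is not the Space's `app.py`: `fake_engine_stream`, `run_slot`, and `run_all` are hypothetical names, and the stub generator only stands in for a real per-GPU vLLM engine so that the example runs without any GPU.

```python
import asyncio
from typing import AsyncIterator

NUM_SLOTS = 8  # one slot per GPU, matching the 8 input boxes in the README


async def fake_engine_stream(gpu_id: int, prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a per-GPU engine: yields the prompt word by word."""
    for word in prompt.split():
        await asyncio.sleep(0)  # hand control back so other slots can make progress
        yield f"[gpu{gpu_id}] {word}"


async def run_slot(gpu_id: int, prompt: str, outputs: list[list[str]]) -> None:
    """Consume one stream and append each chunk to that slot's own buffer."""
    async for chunk in fake_engine_stream(gpu_id, prompt):
        outputs[gpu_id].append(chunk)


async def run_all(prompts: list[str]) -> list[list[str]]:
    """Run every non-empty prompt concurrently; empty boxes are skipped."""
    if len(prompts) > NUM_SLOTS:
        raise ValueError(f"at most {NUM_SLOTS} prompts are supported")
    outputs: list[list[str]] = [[] for _ in range(NUM_SLOTS)]
    tasks = [
        run_slot(slot, prompt, outputs)
        for slot, prompt in enumerate(prompts)
        if prompt.strip()
    ]
    await asyncio.gather(*tasks)
    return outputs


if __name__ == "__main__":
    results = asyncio.run(run_all(["What is a GMP audit?", "", "Define ICH Q7"]))
    for slot, chunks in enumerate(results):
        print(slot, chunks)
```

Because each slot writes only to its own buffer, the slots share no state while streaming. In a real deployment the stub would presumably be replaced by an engine instance pinned to one GPU, and the buffers by the Gradio output components.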