shinnosukeono

Initial commit: Parallel request processing for JPharmatron

86da8ad 19 days ago

metadata

title: JPharmatron Parallel Chat
emoji: 💊
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
hardware: nvidia-l40s-x8

JPharmatron Parallel Chat

A parallel request-processing interface for the JPharmatron-7B-chat model.

Features

8 Parallel Requests: Submit up to 8 prompts at once
Independent Streaming Outputs: Each response streams into its own output box, independently of the others
Multi-GPU Architecture: One vLLM engine instance per L40S GPU
True Parallelism: Each request runs on its own GPU and engine, so requests do not contend for GPU compute or memory
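
One common way to get one engine per GPU is to start each engine in its own worker process that can see exactly one device. The sketch below shows only that pinning step, under the assumption that slot i is served by GPU i. The helper names `worker_env` and `assign_slots` are hypothetical and are not taken from `app.py`; `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable for restricting visible devices.

```python
import os
from typing import Dict, List, Mapping, Optional

NUM_GPUS = 8  # from this README: one engine instance per L40S GPU


def worker_env(gpu_index: int,
               base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build the environment for a worker process that should see one GPU.

    Hypothetical helper: the real app may pin engines differently.
    """
    if not 0 <= gpu_index < NUM_GPUS:
        raise ValueError(f"gpu_index must be in [0, {NUM_GPUS}), got {gpu_index}")
    env = dict(os.environ if base_env is None else base_env)
    # Inside the worker, the selected GPU then appears as device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def assign_slots(prompts: List[str]) -> Dict[int, int]:
    """Map slot index -> GPU index for every non-empty prompt.

    Assumption: slot i always uses GPU i, so no two requests share a device.
    """
    if len(prompts) > NUM_GPUS:
        raise ValueError(f"at most {NUM_GPUS} prompts are supported")
    return {slot: slot for slot, prompt in enumerate(prompts) if prompt.strip()}
```

With a fixed slot-to-GPU mapping, an empty input box simply leaves its GPU idle rather than shifting other requests onto it.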

Hardware Requirements

This Space requires 8x NVIDIA L40S GPUs (48GB VRAM each).

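Each 7B model instance needs about 14 GB of VRAM for its fp16 weights (7 billion parameters x 2 bytes each)
8 independent instances give up to 8x the throughput of a single instance when all 8 slots are in use
No inter-GPU communication overhead, because each instance runs entirely on one GPU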
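
The memory figure above follows from simple arithmetic, sketched here. The parameter count is the nominal "7B" from the model name, which is an assumption (the exact count may differ), and the estimate covers weights only, not the KV cache or activations.

```python
NOMINAL_PARAMS = 7_000_000_000  # assumption: nominal "7B", not the exact count
BYTES_PER_PARAM_FP16 = 2        # fp16 stores each parameter in 2 bytes
GPU_VRAM_GB = 48                # L40S VRAM, as stated in this README


def weights_gb(params: int, bytes_per_param: int) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


model_gb = weights_gb(NOMINAL_PARAMS, BYTES_PER_PARAM_FP16)
headroom_gb = GPU_VRAM_GB - model_gb  # left over per GPU for cache and activations

print(f"weights: ~{model_gb:.0f} GB, headroom per GPU: ~{headroom_gb:.0f} GB")
```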

Usage

Enter prompts in any of the 8 input text boxes
Select mode options (pharmaceutical expert, international standards, specific procedures)
Click "Run All in Parallel" to execute all prompts simultaneously
Watch each response stream in real time into its corresponding output box
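
The steps above imply that several streams update several output boxes at once. A minimal sketch of that merging logic follows, assuming the UI handler is a generator that yields a full snapshot of all 8 boxes on every update (the pattern Gradio uses for streaming outputs). `fake_stream` is a hypothetical stand-in that echoes the prompt word by word; the real app would stream tokens from a vLLM engine instead, and `merged_snapshots` is an illustrative name, not a function from `app.py`.

```python
import asyncio
from typing import AsyncIterator, List

NUM_BOXES = 8  # from this README: 8 input boxes and 8 output boxes


async def fake_stream(prompt: str) -> AsyncIterator[str]:
    """Hypothetical engine stand-in: echoes the prompt one word at a time."""
    for n, word in enumerate(prompt.split()):
        await asyncio.sleep(0)  # yield control so other streams can advance
        yield word if n == 0 else " " + word


async def merged_snapshots(prompts: List[str]) -> AsyncIterator[List[str]]:
    """Yield the text of all output boxes each time any stream produces a chunk."""
    outputs = [""] * NUM_BOXES
    queue: asyncio.Queue = asyncio.Queue()

    async def pump(slot: int, prompt: str) -> None:
        async for chunk in fake_stream(prompt):
            await queue.put((slot, chunk))
        await queue.put((slot, None))  # sentinel: this slot has finished

    tasks = [
        asyncio.create_task(pump(slot, prompt))
        for slot, prompt in enumerate(prompts[:NUM_BOXES])
        if prompt.strip()  # empty boxes are skipped
    ]
    remaining = len(tasks)
    while remaining:
        slot, chunk = await queue.get()
        if chunk is None:
            remaining -= 1
            continue
        outputs[slot] += chunk
        yield list(outputs)  # copy, so earlier snapshots are not mutated
```

Because every slot pushes into one shared queue, a slow response never blocks updates to the other boxes.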

Model

Uses EQUES/JPharmatron-7B-chat, a model specialized for the pharmaceutical domain.