	---
	title: JPharmatron Parallel Chat
	emoji: 💊
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 5.45.0
	app_file: app.py
	pinned: false
	hardware: nvidia-l40s-x8
	---

	# JPharmatron Parallel Chat

	A parallel request-processing interface for the JPharmatron-7B-chat model.

	## Features

	- Parallel Request Processing: Submit up to 8 prompts at once
	- Independent Streaming Outputs: Each response streams into its own output box
	- Multi-GPU Architecture: One vLLM engine instance per L40S GPU
	- No Cross-Request Contention: Each request runs on its own GPU and engine, so requests do not compete for GPU compute or memory
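
The features above amount to a fan-out pattern: one streaming task per GPU slot, each writing only to its own output. The following is a minimal sketch of that pattern, not the Space's actual `app.py`. `fake_engine_stream` is a hypothetical stand-in for a per-GPU vLLM engine's streaming call, and the names `collect` and `run_all` are illustrative assumptions.

```python
import asyncio

NUM_GPUS = 8  # one engine per L40S GPU, per the feature list


async def fake_engine_stream(gpu_id: int, prompt: str):
    """Hypothetical stand-in for one per-GPU engine's streaming generate call."""
    for word in f"[gpu{gpu_id}] echo: {prompt}".split():
        await asyncio.sleep(0)  # yield control so the other streams can interleave
        yield word


async def collect(gpu_id: int, prompt: str, outputs: list[str]) -> None:
    """Append streamed chunks to this slot's own output buffer only."""
    async for chunk in fake_engine_stream(gpu_id, prompt):
        outputs[gpu_id] = (outputs[gpu_id] + " " + chunk).strip()


async def run_all(prompts: list[str]) -> list[str]:
    """Run up to NUM_GPUS prompts concurrently, one per GPU slot."""
    if len(prompts) > NUM_GPUS:
        raise ValueError(f"at most {NUM_GPUS} prompts are supported")
    outputs = [""] * len(prompts)
    await asyncio.gather(*(collect(i, p, outputs) for i, p in enumerate(prompts)))
    return outputs


if __name__ == "__main__":
    for line in asyncio.run(run_all(["What is aspirin?", "Define GMP."])):
        print(line)
```

In a real deployment each slot would presumably call its own engine instance pinned to one GPU; the sketch only shows how independent streams can progress concurrently without sharing an output buffer.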

	## Hardware Requirements

	This Space requires 8x NVIDIA L40S GPUs (48GB VRAM each).

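	- Each 7B model instance needs ~14 GB of VRAM for its fp16 weights (7B parameters x 2 bytes); the KV cache and activations use additional memory
	- 8 independent instances give up to 8x the throughput of a single instance when all 8 slots are busy
	- No inter-GPU communication overhead, since each model fits on a single GPU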
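
The ~14 GB figure follows from simple arithmetic on the weight size. The sketch below reproduces it, assuming a nominal 7 billion parameters (the exact count for JPharmatron-7B-chat may differ) and decimal gigabytes. The function name is illustrative.

```python
BYTES_PER_FP16_PARAM = 2   # fp16 stores each parameter in 2 bytes
GB = 1e9                   # decimal gigabytes, matching the ~14 GB figure
L40S_VRAM_GB = 48          # per-GPU VRAM stated above


def weight_vram_gb(num_params: float, bytes_per_param: int = BYTES_PER_FP16_PARAM) -> float:
    """Return the VRAM needed for model weights alone, in GB."""
    return num_params * bytes_per_param / GB


weights_gb = weight_vram_gb(7e9)          # nominal 7B parameters (assumption)
headroom_gb = L40S_VRAM_GB - weights_gb   # left for KV cache, activations, overhead

if __name__ == "__main__":
    print(f"weights: {weights_gb:.0f} GB, headroom per GPU: {headroom_gb:.0f} GB")
```

This counts weights only. How much of the remaining headroom an engine actually reserves depends on its configuration.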

	## Usage

	1. Enter prompts in any of the 8 input text boxes
	2. Select mode options (pharmaceutical expert, international standards, specific procedures)
	3. Click "Run All in Parallel" to execute all prompts simultaneously
	4. Watch each response stream in real time in its corresponding output box
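
Because prompts can go in any of the 8 boxes, the app has to decide which slots to run. The sketch below shows one plausible way to do that, assuming empty boxes are skipped and each box keeps its own slot index. That behavior is an assumption, not confirmed by this README, and `assign_slots` is a hypothetical helper.

```python
MAX_SLOTS = 8  # number of input text boxes described in the usage steps


def assign_slots(box_values: list[str]) -> list[tuple[int, str]]:
    """Pair each non-empty prompt with the index of the box it came from.

    Assumption: blank or whitespace-only boxes are skipped rather than sent
    to the model.
    """
    if len(box_values) != MAX_SLOTS:
        raise ValueError(f"expected {MAX_SLOTS} box values, got {len(box_values)}")
    return [
        (slot, text.strip())
        for slot, text in enumerate(box_values)
        if text and text.strip()
    ]


if __name__ == "__main__":
    boxes = ["What is GMP?", "", "  Explain ICH Q7.  ", "", "", "", "", ""]
    for slot, prompt in assign_slots(boxes):
        print(slot, prompt)
```

Keeping the original box index means each response can be routed back to the output box that matches its input.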

	## Model

	This Space uses [EQUES/JPharmatron-7B-chat](https://huggingface.co/EQUES/JPharmatron-7B-chat), a chat model specialized for the pharmaceutical domain.