	Model Inference Flow on Virtual GPU
	===================================

	1. Storage and VRAM Setup
	-------------------------
	[HTTPGPUStorage]
	       │        ╲
	       │         ╲ Zero-Copy
	       │          ╲ Memory Mapping
	       ▼           ▼
	[Local Storage]──>[Virtual VRAM]
	(Memory Pages)    (Page Tables)
	       │                │
	       └──────────────┐ │
	                      ▼ ▼
	                 [vGPU Device]
	                       │
	                       ▼
	2. Model Loading and Device Movement
	------------------------------------
	[Florence-2-Large] ---load---> [PyTorch Model]
	        │                             │
	        │                             ▼
	        │                  [to_vgpu() conversion]
	        │                             │
	        └─────────────────┐           │
	                          ▼           ▼
	                     [Model on vGPU Device]
	                                │
	                                ▼
	3. Input Processing and Inference
	---------------------------------
	[Input Text] -----> [Tokenizer] -----> [Tensor]
	                                           │
	                                           ▼
	                                [to_vgpu() conversion]
	                                           │
	                                           ▼
	                                   [Tensor on vGPU]
	                                           │
	                                           ▼
	4. Model Inference Flow
	-----------------------
	[Model Forward Pass]
	│
	▼
	[vGPU Computation]
	│
	▼
	[PyTorch Output Tensor]
	│
	▼
	[Last Hidden State]
	(Shape: [batch_size, seq_length, hidden_size])

	Data Flow and Memory Management:
	--------------------------------
	1. Storage Layer:
	- HTTPGPUStorage ──> Local Storage (Memory Pages)
	- Local Storage ──> Virtual VRAM (Zero-Copy)
	- Virtual VRAM manages page tables pointing to local storage

	2. Memory Architecture:
	- Local Storage: Physical memory pages
	- Virtual VRAM: Page tables and memory mappings
	- Zero-copy between Local Storage and Virtual VRAM
	- Direct memory access for GPU operations

	3. Processing Flow:
	- Model Layer: HF Model ──> PyTorch ──> vGPU
	- Input Layer: Text ──> Tokens ──> Tensor ──> vGPU
	- Output Layer: vGPU ──> PyTorch Tensor ──> Results
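The three processing flows above can be traced end to end with toy stand-ins. This is only a sketch under loud assumptions: ToyTensor, tokenize, forward, shape, and this version of to_vgpu are hypothetical placeholders written for illustration. They are not the project's real to_vgpu(), the Florence-2 tokenizer, or any PyTorch API. The sketch shows the order of the stages and the [batch_size, seq_length, hidden_size] shape of the final output, nothing more.

```python
from dataclasses import dataclass


@dataclass
class ToyTensor:
    """Hypothetical stand-in for a tensor: nested lists plus a device tag."""
    data: list
    device: str = "cpu"


def tokenize(texts, vocab):
    """Toy whitespace tokenizer; pads every row with id 0 to the longest row."""
    rows = [[vocab.setdefault(word, len(vocab)) for word in text.split()]
            for text in texts]
    width = max(len(row) for row in rows)
    return ToyTensor([row + [0] * (width - len(row)) for row in rows])


def to_vgpu(tensor):
    """Placeholder for to_vgpu(): retag the device and reuse the same data."""
    return ToyTensor(tensor.data, device="vgpu")


def forward(tokens, hidden_size):
    """Fake forward pass: one hidden vector per token, on the input's device."""
    hidden = [[[float(tok)] * hidden_size for tok in row] for row in tokens.data]
    return ToyTensor(hidden, device=tokens.device)


def shape(tensor):
    """Walk the nested lists to recover [batch_size, seq_length, hidden_size]."""
    dims, level = [], tensor.data
    while isinstance(level, list):
        dims.append(len(level))
        level = level[0]
    return dims


# Input Layer: Text -> Tokens -> Tensor -> vGPU
vocab = {"<pad>": 0}
cpu_tokens = tokenize(["describe the image", "caption"], vocab)
vgpu_tokens = to_vgpu(cpu_tokens)

# Output Layer: vGPU -> output tensor -> results
last_hidden_state = forward(vgpu_tokens, hidden_size=4)
print(shape(last_hidden_state), last_hidden_state.device)  # [2, 3, 4] vgpu
```

In this toy version the device move shares the underlying list instead of copying it, which mirrors the zero-copy intent of the outline without claiming anything about how the real conversion is implemented.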

	Key Components:
	---------------
	- HTTP Storage: HTTPGPUStorage (Network interface)
	- Local Storage: Memory pages (Physical storage)
	- Virtual VRAM: Page tables (Memory management)
	- Device: vGPU (Computation)
	- Model: Florence-2-Large (vision-language transformer)
	- Framework: PyTorch (ML operations)
	- Interface: to_vgpu() (Zero-copy transfer)
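The "zero-copy" relationship between Local Storage and Virtual VRAM in the component list can be illustrated with the Python standard library alone. This is a minimal sketch, assuming an anonymous memory mapping as a stand-in for Local Storage and a memoryview as a stand-in for Virtual VRAM. The names local_storage and vram_view are hypothetical and do not come from the project's code.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE  # page size reported by the operating system

# Hypothetical "Local Storage": four anonymous memory pages.
local_storage = mmap.mmap(-1, 4 * PAGE_SIZE)

# Hypothetical "Virtual VRAM": a view over the same pages, not a copy.
vram_view = memoryview(local_storage)

# A write through Local Storage is visible through the view...
local_storage[0:4] = b"\x01\x02\x03\x04"
first_bytes = bytes(vram_view[0:4])

# ...and a write through the view is visible in Local Storage.
vram_view[PAGE_SIZE] = 0xFF
byte_seen_by_storage = local_storage[PAGE_SIZE]

# Release the view before closing the mapping it exports.
vram_view.release()
local_storage.close()
```

Both names refer to the same physical pages, so no data moves when one side "hands over" a buffer to the other. Only the final bytes(...) call, used here to inspect the result, makes a copy.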

	Memory Management Details:
	--------------------------
	1. Local Storage:
	- Manages physical memory pages
	- Direct mapping into Virtual VRAM
	- Zero-copy access for GPU operations

	2. Virtual VRAM:
	- Page table management
	- Memory mapping to local storage
	- No physical copying of data
	- Direct GPU access to memory
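The page-table idea described above can be sketched as a small address translator. This is an illustration under stated assumptions, not the project's implementation: the PageTable class, its method names, and the 4096-byte PAGE_SIZE are all hypothetical, and a bytearray stands in for Local Storage.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class PageTable:
    """Maps virtual VRAM page numbers to pages inside a local backing buffer."""

    def __init__(self, backing):
        self.backing = memoryview(backing)  # a view, so reads never copy
        self.entries = {}                   # virtual page -> physical page

    def map_page(self, virtual_page, physical_page):
        if (physical_page + 1) * PAGE_SIZE > len(self.backing):
            raise ValueError("physical page lies outside local storage")
        self.entries[virtual_page] = physical_page

    def translate(self, virtual_address):
        """Turn a virtual address into an offset within the backing buffer."""
        virtual_page, offset = divmod(virtual_address, PAGE_SIZE)
        if virtual_page not in self.entries:
            raise KeyError(f"page fault: virtual page {virtual_page} is unmapped")
        return self.entries[virtual_page] * PAGE_SIZE + offset

    def read(self, virtual_address, length):
        """Return a zero-copy window; reads that span pages are not handled."""
        if virtual_address % PAGE_SIZE + length > PAGE_SIZE:
            raise ValueError("read crosses a page boundary")
        start = self.translate(virtual_address)
        return self.backing[start:start + length]


local_storage = bytearray(4 * PAGE_SIZE)   # four "physical" pages
vram = PageTable(local_storage)
vram.map_page(virtual_page=0, physical_page=2)

local_storage[2 * PAGE_SIZE + 10] = 42
window = vram.read(10, 1)                  # virtual address 10 -> page 2, offset 10
value_before = window[0]                   # 42

local_storage[2 * PAGE_SIZE + 10] = 7      # change the underlying page
value_after = window[0]                    # 7: the window tracks storage, no copy
```

Because read() returns a memoryview slice, a later change to the backing page shows up in a window that was obtained earlier. That is the property the "No physical copying of data" bullet describes.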