Model Inference Flow on Virtual GPU
===================================

1. Storage and VRAM Setup
-------------------------
[HTTPGPUStorage]
      │     ╲
      │      ╲    Zero-Copy
      │       ╲   Memory Mapping
      ▼        ▼
[Local Storage]──>[Virtual VRAM]
 (Memory Pages)     (Page Tables)
      │                 │
      └──────────────┐  │
                     ▼  ▼
                [vGPU Device]
                     │
                     ▼
2. Model Loading and Device Movement
------------------------------------
[Florence-2-Large] ---load---> [PyTorch Model]
         │                            │
         │                            ▼
         │                  [to_vgpu() conversion]
         │                            │
         └────────────────┐           │
                          ▼           ▼
                     [Model on vGPU Device]
                               │
                               ▼
3. Input Processing
-------------------
[Input Text] -----> [Tokenizer] -----> [Tensor]
                                          │
                                          ▼
                                [to_vgpu() conversion]
                                          │
                                          ▼
                                   [Tensor on vGPU]
                                          │
                                          ▼
4. Model Inference Flow
-----------------------
[Model Forward Pass]
       │
       ▼
[vGPU Computation]
       │
       ▼
[PyTorch Output Tensor]
       │
       ▼
[Last Hidden State]
(Shape: [batch_size, seq_length, hidden_size])
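The four stages above can be followed end to end by tracking tensor shapes. The sketch below is framework-free and illustrative only: it assumes a toy whitespace tokenizer, a made-up vocabulary, a padding id of 0, and a hypothetical `hidden_size` of 8. Its `forward()` is a deterministic placeholder, not Florence-2, PyTorch, or `to_vgpu()`; it only shows how `[batch_size, seq_length]` token ids become a `[batch_size, seq_length, hidden_size]` hidden state.

```python
from typing import Dict, List

PAD_ID = 0  # assumption: padding token id


def tokenize(texts: List[str], vocab: Dict[str, int]) -> List[List[int]]:
    """Toy whitespace tokenizer (illustrative): unknown words map to id 1, rows are right-padded."""
    ids = [[vocab.get(word, 1) for word in text.split()] for text in texts]
    seq_length = max(len(row) for row in ids)
    return [row + [PAD_ID] * (seq_length - len(row)) for row in ids]


def forward(input_ids: List[List[int]], hidden_size: int) -> List[List[List[float]]]:
    """Placeholder 'forward pass': emits one hidden vector of length hidden_size per token."""
    return [
        [[float(token_id * hidden_size + k) for k in range(hidden_size)] for token_id in row]
        for row in input_ids
    ]


def shape(nested) -> tuple:
    """Shape of a rectangular nested list, e.g. (batch_size, seq_length, hidden_size)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(dims)


vocab = {"describe": 2, "the": 3, "image": 4, "caption": 5}  # made-up vocabulary
input_ids = tokenize(["describe the image", "caption"], vocab)
last_hidden_state = forward(input_ids, hidden_size=8)  # assumption: hidden_size = 8

print(shape(input_ids))          # (2, 3) -> [batch_size, seq_length]
print(shape(last_hidden_state))  # (2, 3, 8) -> [batch_size, seq_length, hidden_size]
```

A real tokenizer and model would replace both helpers; the shape bookkeeping is the part that carries over.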

Data Flow and Memory Management:
--------------------------------
1. Storage Layer:
   - HTTPGPUStorage ──> Local Storage (Memory Pages)
   - Local Storage ──> Virtual VRAM (Zero-Copy)
   - Virtual VRAM manages page tables pointing to local storage

2. Memory Architecture:
   - Local Storage: Physical memory pages
   - Virtual VRAM: Page tables and memory mappings
   - Zero-copy mapping between Local Storage and Virtual VRAM
   - vGPU operations access the mapped pages directly

3. Processing Flow:
   - Model Layer:   HF Model ──> PyTorch ──> vGPU
   - Input Layer:   Text ──> Tokens ──> Tensor ──> vGPU
   - Output Layer:  vGPU ──> PyTorch Tensor ──> Results
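The Storage Layer bullets can be modeled in a few lines, assuming fixed-size pages and an in-memory byte string standing in for data fetched by HTTPGPUStorage (no network call is made). The names `LocalStorage`, `VirtualVRAM`, and the 4 KiB `PAGE_SIZE` are hypothetical, not the project's API. What the sketch shows: bytes are copied once into local pages, while the virtual VRAM records only page indices and hands out `memoryview` slices that share memory with those pages.

```python
from typing import Dict, List

PAGE_SIZE = 4096  # hypothetical page size in bytes


class LocalStorage:
    """Hypothetical Local Storage: physical memory pages held as mutable bytearrays."""

    def __init__(self) -> None:
        self.pages: List[bytearray] = []

    def load(self, payload: bytes) -> List[int]:
        """Copy a payload (e.g. bytes received over HTTP) into pages; return page indices."""
        indices = []
        for start in range(0, len(payload), PAGE_SIZE):
            page = bytearray(PAGE_SIZE)  # zero-filled, so the last page is padded
            chunk = payload[start:start + PAGE_SIZE]
            page[:len(chunk)] = chunk
            self.pages.append(page)
            indices.append(len(self.pages) - 1)
        return indices


class VirtualVRAM:
    """Hypothetical Virtual VRAM: a page table from virtual page numbers to physical pages."""

    def __init__(self, storage: LocalStorage) -> None:
        self.storage = storage
        self.page_table: Dict[int, int] = {}

    def map(self, physical_pages: List[int]) -> int:
        """Map physical pages to consecutive virtual pages; return the first virtual page."""
        base = len(self.page_table)
        for i, physical in enumerate(physical_pages):
            self.page_table[base + i] = physical
        return base

    def page_view(self, virtual_page: int) -> memoryview:
        """Zero-copy view of the physical page behind a virtual page."""
        return memoryview(self.storage.pages[self.page_table[virtual_page]])


storage = LocalStorage()
vram = VirtualVRAM(storage)
payload = b"weights" * 1000  # stand-in for a downloaded blob (7000 bytes)
base = vram.map(storage.load(payload))

view = vram.page_view(base)
storage.pages[0][0:4] = b"EDIT"  # write through Local Storage...
print(bytes(view[0:4]))           # ...is visible through Virtual VRAM: b'EDIT'
```

Only `load()` copies bytes; `map()` and `page_view()` deal in page indices and views, which is the zero-copy property the bullets describe.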

Key Components:
---------------
- HTTP Storage:  HTTPGPUStorage (Network interface)
- Local Storage: Memory pages (Physical storage)
- Virtual VRAM:  Page tables (Memory management)
- Device:        vGPU (Computation)
- Model:         Florence-2-Large (transformer)
- Framework:     PyTorch (ML operations)
- Interface:     to_vgpu() (Zero-copy transfer)
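The list names `to_vgpu()` as a zero-copy transfer but does not give its signature, so the following is a hypothetical stand-in: `HostTensor`, `VGPUTensor`, the `"vgpu:0"` label, and this `to_vgpu()` are illustrative names, and a tensor is modeled as a flat `array.array` plus a shape tuple. "Moving" the tensor only creates a new handle that references the same buffer.

```python
from array import array
from dataclasses import dataclass
from typing import Tuple


@dataclass
class HostTensor:
    data: array               # flat float32 storage in host (Local Storage) memory
    shape: Tuple[int, ...]


@dataclass
class VGPUTensor:
    buffer: memoryview        # view over the same memory; nothing is copied
    shape: Tuple[int, ...]
    device: str = "vgpu:0"    # hypothetical device label


def to_vgpu(tensor: HostTensor) -> VGPUTensor:
    """Hypothetical zero-copy transfer: wrap the host buffer instead of duplicating it."""
    return VGPUTensor(buffer=memoryview(tensor.data), shape=tensor.shape)


host = HostTensor(data=array("f", [0.0] * 6), shape=(2, 3))
dev = to_vgpu(host)

host.data[4] = 1.5                  # update on the host side
print(dev.buffer[4], dev.shape)     # 1.5 (2, 3): same memory, same shape
print(dev.buffer.obj is host.data)  # True
```

Because `VGPUTensor` holds a `memoryview`, Python refuses to resize the host array while the view exists (it raises `BufferError`), which is a sensible guard for a zero-copy handle.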

Memory Management Details:
--------------------------
1. Local Storage:
   - Manages physical memory pages
   - Maps directly into Virtual VRAM
   - Zero-copy access for vGPU operations

2. Virtual VRAM:
   - Page table management
   - Memory mapping to Local Storage
   - No physical copying of data
   - Direct vGPU access to mapped memory
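To make the page-table bullets concrete: a virtual address splits into a page number and an offset, the page table maps the page number to a physical page, and the offset carries over unchanged. With a hypothetical 4 KiB page size, address 10000 lies in virtual page 2 at offset 1808 (10000 = 2 * 4096 + 1808). The sketch assumes a plain dict as the page table; the `translate()` name and the mapping values are made up for illustration.

```python
PAGE_SIZE = 4096  # hypothetical page size in bytes


def translate(virtual_address: int, page_table: dict) -> int:
    """Translate a virtual VRAM address to a Local Storage (physical) address."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        raise KeyError(f"page fault: virtual page {page_number} is not mapped")
    return page_table[page_number] * PAGE_SIZE + offset


page_table = {0: 5, 1: 3, 2: 7}  # virtual page -> physical page (illustrative values)
print(translate(10000, page_table))  # 7 * 4096 + 1808 = 30480
```

Because translation only rewrites the page number, remapping a virtual page is a dictionary update; the bytes in Local Storage never move.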