ivnle committed · Commit 942beb7 · verified · 1 parent: 4ccfaf4

Upload README.md with huggingface_hub

Files changed (1): README.md (+16 -8)
@@ -20,11 +20,20 @@ Ivan Lee, Cheng Yang, Taylor Berg-Kirkpatrick
20
21   ## Available Checkpoints
22
23 - | Checkpoint | Objective | Hybrid | Training | PPL |
24 - |------------|-----------|--------|----------|-----|
25 - | `vision_base_h0_recon` | Reconstruction | 0 | - | 1.03 |
26 - | `vision_base_h0_lm` | LM | 0 | Direct | 5.08 |
27 - | `vision_base_h0_lm_recon-init` | LM | 0 | From reconstruction | 5.06 |
28
29   ## Naming Convention
30
@@ -35,15 +44,14 @@ Ivan Lee, Cheng Yang, Taylor Berg-Kirkpatrick
35   | Field | Values | Description |
36   |-------|--------|-------------|
37   | regime | vision, conv1d, meanpool, text | Compression architecture |
38 - | config | base/small/tiny/large, t500/t250, w10s10, ctx525 | Regime-specific config |
39 - | h{N} | h0, h100 | Hybrid text tokens (0 = pure vision) |
40   | objective | recon, lm | Training objective |
41   | recon-init | (optional) | LM initialized from reconstruction checkpoint |
42
43   ## Model Details
44
45   - **Architecture**: DeepSeek-OCR with trainable vision encoder
46 - - **Image Size**: 768x768 (base)
47   - **Encoder Status**: Trained (not frozen)
48   - **Dataset**: 510k samples from FineWiki
49
 
20
21   ## Available Checkpoints
22
23 + ### Vision (base, 768x768)
24 +
25 + | Checkpoint | Objective | Hybrid | Training | CR | PPL |
26 + |------------|-----------|--------|----------|-----|-----|
27 + | `vision_base_h0_recon` | Reconstruction | 0 | - | 3.60 | 1.03 |
28 + | `vision_base_h0_lm` | LM | 0 | Direct | 3.60 | 5.08 |
29 + | `vision_base_h0_lm_recon-init` | LM | 0 | From recon | 3.60 | 5.06 |
30 +
31 + ### Meanpool (w4s4)
32 +
33 + | Checkpoint | Objective | Hybrid | Training | CR | PPL |
34 + |------------|-----------|--------|----------|-----|-----|
35 + | `meanpool_w4s4_h0_recon` | Reconstruction | 0 | - | 3.97 | 1.04 |
36 + | `meanpool_w4s4_h0_lm_recon-init` | LM | 0 | From recon | 3.97 | 5.02 |
37
38   ## Naming Convention
39

44   | Field | Values | Description |
45   |-------|--------|-------------|
46   | regime | vision, conv1d, meanpool, text | Compression architecture |
47 + | config | base/small/tiny/large, t500/t250, w4s4/w10s10, ctx525 | Regime-specific config |
48 + | h{N} | h0, h100 | Hybrid text tokens (0 = pure vision/compression) |
49   | objective | recon, lm | Training objective |
50   | recon-init | (optional) | LM initialized from reconstruction checkpoint |
51
52   ## Model Details
53
54   - **Architecture**: DeepSeek-OCR with trainable vision encoder
55   - **Encoder Status**: Trained (not frozen)
56   - **Dataset**: 510k samples from FineWiki
57
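
The fields in the Naming Convention table, read together with the checkpoint names in the diff, suggest names of the form `{regime}_{config}_h{N}_{objective}` with an optional `_recon-init` suffix. The exact pattern line is not shown in this diff, so that ordering is an assumption inferred from names such as `vision_base_h0_lm_recon-init`. The sketch below splits a name into those fields; `CheckpointName` and `parse_checkpoint_name` are hypothetical helpers written for illustration, not part of the repository.

```python
import re
from typing import NamedTuple, Optional


class CheckpointName(NamedTuple):
    """Fields of a checkpoint name (hypothetical container, for illustration)."""
    regime: str         # vision, conv1d, meanpool, or text
    config: str         # regime-specific config, e.g. base, w4s4, t500, ctx525
    hybrid_tokens: int  # N in h{N}; 0 means pure vision/compression
    objective: str      # recon or lm
    recon_init: bool    # True if the name ends with _recon-init


# Assumed pattern: {regime}_{config}_h{N}_{objective}[_recon-init]
_NAME_RE = re.compile(
    r"^(?P<regime>vision|conv1d|meanpool|text)"
    r"_(?P<config>[A-Za-z0-9]+)"
    r"_h(?P<hybrid>\d+)"
    r"_(?P<objective>recon|lm)"
    r"(?P<init>_recon-init)?$"
)


def parse_checkpoint_name(name: str) -> Optional[CheckpointName]:
    """Return the parsed fields, or None if the name does not fit the assumed pattern."""
    match = _NAME_RE.match(name)
    if match is None:
        return None
    return CheckpointName(
        regime=match.group("regime"),
        config=match.group("config"),
        hybrid_tokens=int(match.group("hybrid")),
        objective=match.group("objective"),
        recon_init=match.group("init") is not None,
    )


if __name__ == "__main__":
    # Names taken from the checkpoint tables in the diff.
    for name in ("vision_base_h0_recon", "meanpool_w4s4_h0_lm_recon-init"):
        print(name, "->", parse_checkpoint_name(name))
```

Returning `None` for names that do not match keeps the helper safe to run over a mixed directory listing; a stricter variant could raise `ValueError` instead.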