
Trained LoRA Adapter for SGLang Embedding LoRA Testing

This is a fine-tuned LoRA adapter for testing SGLang's embedding LoRA implementation. Unlike randomly initialized adapters, this one produces coherent text outputs.

Configuration

  • Base model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
  • LoRA rank (r): 8
  • LoRA alpha: 16
  • Target modules: embed_tokens, lm_head, q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Training steps: 500
  • Training data: Alpaca dataset (instruction following)
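The configuration above corresponds to a PEFT-style adapter config. A minimal sketch of the matching adapter_config.json contents (field names follow PEFT's LoraConfig; the actual file may contain additional keys):

```python
import json

# Sketch of the adapter_config.json PEFT writes for this setup.
# Exact contents of the shipped file may differ slightly.
adapter_config = {
    "base_model_name_or_path": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "peft_type": "LORA",
    "r": 8,
    "lora_alpha": 16,
    "target_modules": [
        "embed_tokens", "lm_head",
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    # Deliberately NOT using modules_to_save for embed_tokens (see Purpose below)
    "modules_to_save": None,
}

print(json.dumps(adapter_config, indent=2))
```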

Weight Shapes

down_proj.lora_A: (8, 5632)
down_proj.lora_B: (2048, 8)
embed_tokens.lora_embedding_A: (8, 32000)
embed_tokens.lora_embedding_B: (2048, 8)
gate_proj.lora_A: (8, 2048)
gate_proj.lora_B: (5632, 8)
k_proj.lora_A: (8, 2048)
k_proj.lora_B: (256, 8)
lm_head.lora_A: (8, 2048)
lm_head.lora_B: (32000, 8)
o_proj.lora_A: (8, 2048)
o_proj.lora_B: (2048, 8)
q_proj.lora_A: (8, 2048)
q_proj.lora_B: (2048, 8)
up_proj.lora_A: (8, 2048)
up_proj.lora_B: (5632, 8)
v_proj.lora_A: (8, 2048)
v_proj.lora_B: (256, 8)

Purpose

This adapter tests that SGLang's ChunkedSgmvLoRABackend.run_lora_a_embedding() correctly handles embedding LoRA layers (embed_tokens, lm_head).

Key point: embed_tokens is handled via target_modules (low-rank LoRA decomposition), NOT via modules_to_save (a full copy of the embedding weights).
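For an embedding layer the LoRA update is applied as a lookup rather than a matrix product on the input: selecting token columns of lora_embedding_A plays the role of multiplying lora_A by a one-hot vector. A minimal NumPy sketch with toy dimensions (shape conventions as in PEFT; the real adapter uses r=8, alpha=16, so scaling = 2.0):

```python
import numpy as np

# Toy dims: vocab=10, hidden=4, rank=2, alpha=4 -> scaling = alpha / r
rng = np.random.default_rng(0)
vocab, hidden, r, alpha = 10, 4, 2, 4
scaling = alpha / r

E = rng.normal(size=(vocab, hidden))   # base embedding table
A = rng.normal(size=(r, vocab))        # lora_embedding_A
B = rng.normal(size=(hidden, r))       # lora_embedding_B

token_ids = np.array([1, 3, 7])

base = E[token_ids]                           # (seq, hidden) base lookup
delta = (A[:, token_ids].T @ B.T) * scaling   # (seq, r) @ (r, hidden) LoRA term
out = base + delta
print(out.shape)  # (3, 4)
```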

Usage with SGLang

# Used by: test/srt/lora/test_lora_hf_sgl_logprob_diff.py
# The adapter produces coherent outputs for meaningful CI/CD verification.
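A typical invocation serving this adapter through SGLang's LoRA path might look like the following (flag names taken from SGLang's LoRA serving options; the adapter path and name here are placeholders, adjust to your checkout and installed version):

```shell
# Launch an SGLang server with the adapter registered under a name.
python -m sglang.launch_server \
  --model-path TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --lora-paths lora0=./trained_embedding_lora_adapter \
  --max-loras-per-batch 1

# Then run the logprob-diff test against it:
python -m pytest test/srt/lora/test_lora_hf_sgl_logprob_diff.py
```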

Created with

python scripts/playground/lora/train_embedding_lora_adapter.py --num_train_steps 500