dl3239491
/

ml-clara

Model card Files Files and versions

ml-clara / docs /training.md

dl3239491's picture

Upload folder using huggingface_hub

30c14cd verified about 1 month ago

|

history blame contribute delete

3.02 kB

	---
	layout: default
	title: Training Guide
	permalink: /training/
	---

	# Training Guide

	This guide covers the three-stage training process in CLaRa.

	## Overview

	CLaRa uses a three-stage training approach:

	1. Stage 1: Compression Pretraining
	2. Stage 2: Compression Instruction Tuning
	3. Stage 3: End-to-End Fine-tuning (CLaRa)

	## Stage 1: Compression Pretraining

	Train the compressor to learn effective document compression.

	### Key Parameters

	- `--stage stage1`: Training stage identifier
	- `--compress_rate`: Compression rate (default: 32)
	- `--doc_max_length`: Maximum document length (default: 256)
	- `--mse_loss`: Use MSE loss for compression alignment
	- `--qa_loss`: Use QA loss for semantic preservation

	### Example Command

	```bash
	bash scripts/train_pretraining.sh
	```

	### Data Format

	Stage 1 Pretraining Data:
	```json
	{
	"data_type": "qa",
	"question": ["Question 1", "Question 2", ...],
	"answers": ["Answer 1", "Answer 2", ...],
	"docs": ["Document 1", "Document 2", ...]
	}
	```

	## Stage 2: Compression Instruction Tuning

	Fine-tune the compressor on instruction-following tasks.

	### Key Parameters

	- `--stage stage1_2`: Training stage identifier
	- `--pretrain_checkpoint`: Path to Stage 1 checkpoint
	- `--generation_top_k`: Top-k sampling (default: 5)
	- `--mse_loss`: Continue using MSE loss
	- `--do_eval_gen`: Enable generation evaluation

	### Example Command

	```bash
	bash scripts/train_instruction_tuning.sh
	```

	### Data Format

	Stage 2 Instruction Tuning Data:
	```json
	{
	"question": "Single question text",
	"docs": ["Document 1", "Document 2", ...],
	"gold_answer": "Reference answer",
	"answer": "Generated answer"
	}
	```

	## Stage 3: End-to-End Training

	Jointly train reranker and generator with retrieval.

	### Key Parameters

	- `--stage stage2`: Training stage identifier
	- `--pretrain_checkpoint`: Path to Stage 2 checkpoint
	- `--generation_top_k`: Top-k sampling for generation
	- `--do_eval_gen`: Enable generation evaluation

	### Example Command

	```bash
	bash scripts/train_stage_end_to_end.sh
	```

	### Data Format

	Stage 3 End-to-End Data:
	```json
	{
	"question": "Single question text",
	"docs": ["Document 1", "Document 2", ...],
	"gold_answer": "Reference answer"
	}
	```

	## Distributed Training

	All training stages support distributed training across multiple nodes and GPUs.

	### Key Parameters

	- `--max_len`: Maximum sequence length (2048 for stage1/stage2, 1024 for stage3)
	- `--train_batch_size`: Training batch size
	- `--micro_train_batch_size`: Micro batch size for gradient accumulation
	- `--learning_rate`: Learning rate (1e-4 for stage1/stage2, 5e-6 for stage3)
	- `--max_epochs`: Maximum training epochs
	- `--zero_stage`: ZeRO optimization stage (default: 2)
	- `--bf16`: Use bfloat16 precision
	- `--flash_attn`: Use Flash Attention 2

	## Monitoring Training

	Training progress is logged via:
	- Console output
	- Wandb (if configured)
	- Checkpoint files

	Checkpoints are saved at the path specified by `--save_path`.