# KerdosAI Quick Start Guide
## Installation
### Using pip (Recommended)
```bash
# Clone the repository
git clone https://github.com/bhaskarvilles/kerdosai.git
cd kerdosai
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Install development dependencies (optional)
pip install pytest pytest-cov black ruff mypy rich typer
```
### Using Docker
```bash
# Build the image
docker-compose build
# Run training
docker-compose run kerdosai-train
# Start API server
docker-compose up kerdosai-api
```
## Quick Start
### 1. Basic Training
```bash
# Train with default configuration
python cli.py train \
  --model gpt2 \
  --data ./data/train.json \
  --output ./output
# Train with custom configuration
python cli.py train --config configs/default.yaml
```
### 2. Using Configuration Files
Create a configuration file `my_config.yaml`:
```yaml
base_model: "gpt2"
output_dir: "./my_output"
training:
  epochs: 5
  batch_size: 8
  learning_rate: 0.00001

lora:
  enabled: true
  r: 16
  alpha: 64

data:
  train_file: "./data/train.json"
```
Then train:
```bash
python cli.py train --config my_config.yaml
```
### 3. Text Generation
```bash
python cli.py generate \
  ./output \
  --prompt "Once upon a time" \
  --max-length 200 \
  --temperature 0.8
```
### 4. Model Information
```bash
# View model details
python cli.py info ./output
# View KerdosAI version
python cli.py info
```
## Configuration Presets
KerdosAI includes several pre-configured training presets:
```bash
# Quick test (fast, minimal resources)
python cli.py train --config configs/training_presets.yaml#quick_test
# Small model (resource-constrained)
python cli.py train --config configs/training_presets.yaml#small_model
# Production (optimized settings)
python cli.py train --config configs/training_presets.yaml#production
```
## Python API
```python
from kerdosai.agent import KerdosAgent
from kerdosai.config import load_config

# Load configuration
config = load_config("configs/default.yaml")

# Initialize agent
agent = KerdosAgent(
    base_model="gpt2",
    training_data="./data/train.json",
)

# Prepare for efficient training (LoRA + 4-bit quantization)
agent.prepare_for_training(
    use_lora=True,
    lora_r=8,
    use_4bit=True,
)

# Train
metrics = agent.train(
    epochs=3,
    batch_size=4,
    learning_rate=2e-5,
)

# Save model
agent.save("./output")

# Generate text
output = agent.generate(
    "Hello, AI!",
    max_length=100,
    temperature=0.7,
)
print(output)
```
## Data Format
KerdosAI supports various data formats:
### JSON Format
```json
[
  {"text": "First training example..."},
  {"text": "Second training example..."}
]
```
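If you are assembling training data programmatically, a minimal sketch of writing a file in the JSON format above (the filename and example texts are placeholders):

```python
import json

# Each training example is an object with a "text" field,
# collected into a single JSON array.
examples = [
    {"text": "First training example..."},
    {"text": "Second training example..."},
]

# Write UTF-8 JSON so non-ASCII text survives round-tripping.
with open("train.json", "w", encoding="utf-8") as f:
    json.dump(examples, f, ensure_ascii=False, indent=2)
```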
### CSV Format
```csv
text
"First training example..."
"Second training example..."
```
### HuggingFace Datasets
```python
from kerdosai.config import KerdosConfig, DataConfig

config = KerdosConfig(
    base_model="gpt2",
    data=DataConfig(
        dataset_name="wikitext",
        dataset_config="wikitext-2-raw-v1",
    ),
)
```
## Advanced Features
### LoRA (Low-Rank Adaptation)
```python
config.lora.enabled = True
config.lora.r = 16 # Rank
config.lora.alpha = 64 # Alpha parameter
config.lora.dropout = 0.1
```
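For intuition on why LoRA is cheap: instead of updating a full `d x k` weight matrix, LoRA trains two small matrices `B` (`d x r`) and `A` (`r x k`), so the trainable parameter count drops from `d * k` to `r * (d + k)`. A quick sketch of the arithmetic (the `768` hidden size is illustrative, matching gpt2's projection matrices):

```python
def lora_params(d: int, k: int, r: int) -> int:
    # Only the low-rank factors A (r x k) and B (d x r) are trained.
    return r * (d + k)

d = k = 768  # illustrative hidden size (gpt2 attention projections)
r = 16       # rank, as in the config above

full = d * k                 # parameters if the matrix were fully fine-tuned
lora = lora_params(d, k, r)  # parameters trained with LoRA at r=16
print(full, lora)            # 589824 24576
```

At rank 16 the adapter trains roughly 4% of the parameters of the matrix it wraps, which is why LoRA fits on much smaller GPUs.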
### Quantization
```python
config.quantization.enabled = True
config.quantization.bits = 4 # 4-bit or 8-bit
config.quantization.quant_type = "nf4" # nf4 or fp4
```
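A back-of-the-envelope estimate of what quantization buys for the weights alone (activations and optimizer state are extra; the ~124M parameter count for gpt2 is approximate):

```python
# Approximate parameter count for gpt2.
params = 124_000_000

fp16_bytes = params * 2        # 16-bit: 2 bytes per weight
int4_bytes = params * 4 // 8   # 4-bit: half a byte per weight

print(fp16_bytes / 1e6, "MB vs", int4_bytes / 1e6, "MB")
```

Dropping from fp16 to 4-bit cuts the weight footprint by 4x, which is often the difference between fitting on a consumer GPU or not.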
### Mixed Precision Training
```python
config.training.fp16 = True # For NVIDIA GPUs
# or
config.training.bf16 = True # For newer GPUs
```
## Monitoring
### Weights & Biases
```bash
# Set API key
export WANDB_API_KEY=your_key_here
# Enable in config
python cli.py train --config configs/default.yaml
```
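What "enable in config" might look like: the key names below are an illustrative assumption, not the confirmed schema; check `configs/default.yaml` for the actual field names.

```yaml
# Hypothetical monitoring section -- verify against configs/default.yaml.
monitoring:
  wandb:
    enabled: true
    project: "kerdosai"   # placeholder project name
```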
### TensorBoard
```bash
# Start TensorBoard
tensorboard --logdir=./runs
# Or use Docker Compose
docker-compose up tensorboard
```
## Testing
```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=kerdosai --cov-report=html
# Run specific tests
pytest tests/test_config.py -v
```
## Troubleshooting
### Out of Memory
1. Reduce batch size: `--batch-size 2`
2. Enable gradient accumulation: `gradient_accumulation_steps: 4`
3. Use quantization: `--quantize`
4. Use a smaller base model (e.g. `distilgpt2` instead of `gpt2`)
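The first three mitigations can be combined in one config. `gradient_accumulation_steps` is the key named above; nesting it under `training:`, alongside the `quantization` fields shown earlier, follows the configuration examples in this guide and should be verified against your schema:

```yaml
training:
  batch_size: 2                    # smaller per-step batch
  gradient_accumulation_steps: 4   # effective batch size = 2 * 4 = 8

quantization:
  enabled: true                    # 4-bit weights cut memory further
  bits: 4
```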
### Slow Training
1. Enable mixed precision: `fp16: true`
2. Increase batch size if memory allows
3. Use multiple GPUs (see distributed training docs)
### Import Errors
```bash
# Ensure virtual environment is activated
source venv/bin/activate
# Reinstall dependencies
pip install -r requirements.txt
```
## Next Steps
- Read the [full documentation](docs/index.md)
- Check out [example notebooks](notebooks/)
- Join our [community](https://kerdos.in/community)
- Report issues on [GitHub](https://github.com/bhaskarvilles/kerdosai/issues)