# KerdosAI Quick Start Guide
## Installation
### Using pip (Recommended)
```bash
# Clone the repository
git clone https://github.com/bhaskarvilles/kerdosai.git
cd kerdosai
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Install development dependencies (optional)
pip install pytest pytest-cov black ruff mypy rich typer
```
### Using Docker
```bash
# Build the image
docker-compose build
# Run training
docker-compose run kerdosai-train
# Start API server
docker-compose up kerdosai-api
```
## Quick Start
### 1. Basic Training
```bash
# Train with default configuration
python cli.py train \
  --model gpt2 \
  --data ./data/train.json \
  --output ./output
# Train with custom configuration
python cli.py train --config configs/default.yaml
```
### 2. Using Configuration Files
Create a configuration file `my_config.yaml`:
```yaml
base_model: "gpt2"
output_dir: "./my_output"

training:
  epochs: 5
  batch_size: 8
  learning_rate: 0.00001

lora:
  enabled: true
  r: 16
  alpha: 64

data:
  train_file: "./data/train.json"
```
Then train:
```bash
python cli.py train --config my_config.yaml
```
### 3. Text Generation
```bash
python cli.py generate \
  ./output \
  --prompt "Once upon a time" \
  --max-length 200 \
  --temperature 0.8
```
### 4. Model Information
```bash
# View model details
python cli.py info ./output
# View KerdosAI version
python cli.py info
```
## Configuration Presets
KerdosAI includes several pre-configured training presets:
```bash
# Quick test (fast, minimal resources)
python cli.py train --config configs/training_presets.yaml#quick_test
# Small model (resource-constrained)
python cli.py train --config configs/training_presets.yaml#small_model
# Production (optimized settings)
python cli.py train --config configs/training_presets.yaml#production
```
## Python API
```python
from kerdosai.agent import KerdosAgent
from kerdosai.config import load_config
# Load configuration
config = load_config("configs/default.yaml")
# Initialize agent
agent = KerdosAgent(
    base_model="gpt2",
    training_data="./data/train.json"
)
# Prepare for efficient training
agent.prepare_for_training(
    use_lora=True,
    lora_r=8,
    use_4bit=True
)
# Train
metrics = agent.train(
    epochs=3,
    batch_size=4,
    learning_rate=2e-5
)
# Save model
agent.save("./output")
# Generate text
output = agent.generate(
    "Hello, AI!",
    max_length=100,
    temperature=0.7
)
print(output)
```
## Data Format
KerdosAI supports various data formats:
### JSON Format
```json
[
  {"text": "First training example..."},
  {"text": "Second training example..."}
]
```
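If you don't have training data yet, a minimal `train.json` in this format can be generated with the standard library. This is an illustrative sketch; the path and the `text` field name follow the example above:

```python
import json
import os

# A few records in the {"text": ...} shape shown above
examples = [
    {"text": "First training example..."},
    {"text": "Second training example..."},
]

# Write them to the path used in the training commands
os.makedirs("./data", exist_ok=True)
with open("./data/train.json", "w", encoding="utf-8") as f:
    json.dump(examples, f, indent=2)
```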
### CSV Format
```csv
text
"First training example..."
"Second training example..."
```
### HuggingFace Datasets
```python
from kerdosai.config import KerdosConfig, DataConfig

config = KerdosConfig(
    base_model="gpt2",
    data=DataConfig(
        dataset_name="wikitext",
        dataset_config="wikitext-2-raw-v1"
    )
)
```
## Advanced Features
### LoRA (Low-Rank Adaptation)
```python
config.lora.enabled = True
config.lora.r = 16 # Rank
config.lora.alpha = 64 # Alpha parameter
config.lora.dropout = 0.1
```
### Quantization
```python
config.quantization.enabled = True
config.quantization.bits = 4 # 4-bit or 8-bit
config.quantization.quant_type = "nf4" # nf4 or fp4
```
### Mixed Precision Training
```python
config.training.fp16 = True # For NVIDIA GPUs
# or
config.training.bf16 = True # For newer GPUs
```
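Which flag to set depends on your GPU: bf16 requires Ampere-class hardware or newer (CUDA compute capability 8.0+), while fp16 works on older cards. A small helper like this (illustrative, not part of the KerdosAI API) captures that rule; in practice you would query the capability with `torch.cuda.get_device_capability()`:

```python
def pick_precision(compute_capability: float) -> str:
    """Return the mixed-precision mode for a given CUDA compute capability.

    bf16 is supported on Ampere and newer GPUs (capability >= 8.0);
    fall back to fp16 elsewhere.
    """
    return "bf16" if compute_capability >= 8.0 else "fp16"

print(pick_precision(8.6))  # Ampere-class (e.g. RTX 30xx) -> bf16
print(pick_precision(7.5))  # Turing-class -> fp16
```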
## Monitoring
### Weights & Biases
```bash
# Set API key
export WANDB_API_KEY=your_key_here
# Enable in config
python cli.py train --config configs/default.yaml
```
### TensorBoard
```bash
# Start TensorBoard
tensorboard --logdir=./runs
# Or use Docker Compose
docker-compose up tensorboard
```
## Testing
```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=kerdosai --cov-report=html
# Run specific tests
pytest tests/test_config.py -v
```
## Troubleshooting
### Out of Memory
1. Reduce batch size: `--batch-size 2`
2. Enable gradient accumulation: `gradient_accumulation_steps: 4`
3. Use quantization: `--quantize`
4. Use a smaller base model
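Combined, these mitigations might look like the following config fragment (a sketch assuming the same schema as `my_config.yaml` above; with `gradient_accumulation_steps: 4` the effective batch size stays at 2 × 4 = 8):

```yaml
training:
  batch_size: 2
  gradient_accumulation_steps: 4
quantization:
  enabled: true
  bits: 4
```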
### Slow Training
1. Enable mixed precision: `fp16: true`
2. Increase batch size if memory allows
3. Use multiple GPUs (see distributed training docs)
### Import Errors
```bash
# Ensure virtual environment is activated
source venv/bin/activate
# Reinstall dependencies
pip install -r requirements.txt
```
## Next Steps
- Read the [full documentation](docs/index.md)
- Check out [example notebooks](notebooks/)
- Join our [community](https://kerdos.in/community)
- Report issues on [GitHub](https://github.com/bhaskarvilles/kerdosai/issues)