# KerdosAI Quick Start Guide

## Installation

### Using pip (Recommended)

```bash
# Clone the repository
git clone https://github.com/bhaskarvilles/kerdosai.git
cd kerdosai

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install development dependencies (optional)
pip install pytest pytest-cov black ruff mypy rich typer
```
### Using Docker

```bash
# Build the image
docker-compose build

# Run training
docker-compose run kerdosai-train

# Start API server
docker-compose up kerdosai-api
```
## Quick Start

### 1. Basic Training

```bash
# Train with the default configuration
python cli.py train \
    --model gpt2 \
    --data ./data/train.json \
    --output ./output

# Train with a custom configuration
python cli.py train --config configs/default.yaml
```
### 2. Using Configuration Files

Create a configuration file `my_config.yaml`:

```yaml
base_model: "gpt2"
output_dir: "./my_output"

training:
  epochs: 5
  batch_size: 8
  learning_rate: 0.00001

lora:
  enabled: true
  r: 16
  alpha: 64

data:
  train_file: "./data/train.json"
```

Then train:

```bash
python cli.py train --config my_config.yaml
```
### 3. Text Generation

```bash
python cli.py generate \
    ./output \
    --prompt "Once upon a time" \
    --max-length 200 \
    --temperature 0.8
```
### 4. Model Information

```bash
# View model details
python cli.py info ./output

# View KerdosAI version
python cli.py info
```
## Configuration Presets

KerdosAI ships with several pre-configured training presets:

```bash
# Quick test (fast, minimal resources)
python cli.py train --config configs/training_presets.yaml#quick_test

# Small model (resource-constrained)
python cli.py train --config configs/training_presets.yaml#small_model

# Production (optimized settings)
python cli.py train --config configs/training_presets.yaml#production
```
## Python API

```python
from kerdosai.agent import KerdosAgent
from kerdosai.config import load_config

# Load configuration
config = load_config("configs/default.yaml")

# Initialize agent
agent = KerdosAgent(
    base_model="gpt2",
    training_data="./data/train.json"
)

# Prepare for efficient training
agent.prepare_for_training(
    use_lora=True,
    lora_r=8,
    use_4bit=True
)

# Train
metrics = agent.train(
    epochs=3,
    batch_size=4,
    learning_rate=2e-5
)

# Save model
agent.save("./output")

# Generate text
output = agent.generate(
    "Hello, AI!",
    max_length=100,
    temperature=0.7
)
print(output)
```
## Data Format

KerdosAI supports several data formats:

### JSON Format

```json
[
  {"text": "First training example..."},
  {"text": "Second training example..."}
]
```

### CSV Format

```csv
text
"First training example..."
"Second training example..."
```
### HuggingFace Datasets

```python
from kerdosai.config import KerdosConfig, DataConfig

config = KerdosConfig(
    base_model="gpt2",
    data=DataConfig(
        dataset_name="wikitext",
        dataset_config="wikitext-2-raw-v1"
    )
)
```
## Advanced Features

### LoRA (Low-Rank Adaptation)

```python
config.lora.enabled = True
config.lora.r = 16        # Rank of the low-rank update matrices
config.lora.alpha = 64    # Scaling factor applied to the update
config.lora.dropout = 0.1
```
### Quantization

```python
config.quantization.enabled = True
config.quantization.bits = 4            # 4-bit or 8-bit
config.quantization.quant_type = "nf4"  # "nf4" or "fp4"
```

### Mixed Precision Training

```python
config.training.fp16 = True  # For NVIDIA GPUs
# or
config.training.bf16 = True  # For newer GPUs
```
## Monitoring

### Weights & Biases

```bash
# Set your API key
export WANDB_API_KEY=your_key_here

# Enable W&B in the config, then train as usual
python cli.py train --config configs/default.yaml
```

### TensorBoard

```bash
# Start TensorBoard
tensorboard --logdir=./runs

# Or use Docker Compose
docker-compose up tensorboard
```
## Testing

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=kerdosai --cov-report=html

# Run specific tests
pytest tests/test_config.py -v
```
## Troubleshooting

### Out of Memory

1. Reduce the batch size: `--batch-size 2`
2. Enable gradient accumulation: `gradient_accumulation_steps: 4`
3. Enable quantization: `--quantize`
4. Use a smaller base model
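Combined in a config file, the first two mitigations might look like the following (a sketch assuming the schema from the configuration example above; the placement of `gradient_accumulation_steps` under `training` is an assumption):

```yaml
training:
  batch_size: 2                   # smaller per-step footprint
  gradient_accumulation_steps: 4  # effective batch size of 8

quantization:
  enabled: true
  bits: 4
```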
### Slow Training

1. Enable mixed precision: `fp16: true`
2. Increase the batch size if memory allows
3. Use multiple GPUs (see the distributed training docs)

### Import Errors

```bash
# Ensure the virtual environment is activated
source venv/bin/activate

# Reinstall dependencies
pip install -r requirements.txt
```
## Next Steps

- Read the [full documentation](docs/index.md)
- Check out the [example notebooks](notebooks/)
- Join our [community](https://kerdos.in/community)
- Report issues on [GitHub](https://github.com/bhaskarvilles/kerdosai/issues)