# Contributing to BioRLHF
Thank you for your interest in contributing to BioRLHF! This document provides guidelines and instructions for contributing.
## Table of Contents
- [Code of Conduct](#code-of-conduct)
- [Getting Started](#getting-started)
- [Development Setup](#development-setup)
- [Making Changes](#making-changes)
- [Testing](#testing)
- [Submitting Changes](#submitting-changes)
- [Style Guidelines](#style-guidelines)
## Code of Conduct
Please be respectful and constructive in all interactions. We welcome contributors of all backgrounds and experience levels.
## Getting Started
1. **Fork the repository** on GitHub
2. **Clone your fork** locally:
```bash
git clone https://github.com/YOUR_USERNAME/BioRLHF.git
cd BioRLHF
```
3. **Add upstream remote**:
```bash
git remote add upstream https://github.com/jang1563/BioRLHF.git
```
## Development Setup
### Prerequisites
- Python 3.9 or higher
- CUDA-compatible GPU (recommended for training)
- Git
### Installation
1. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
2. Install the package in development mode with all dependencies:
```bash
pip install -e ".[dev]"
```
3. Install pre-commit hooks:
```bash
pre-commit install
```
### Verify Installation
```bash
# Run tests
pytest
# Check code formatting
black --check src/ tests/
ruff check src/ tests/
```
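For a quick sanity check that does not run the whole test suite, a small script can confirm the interpreter version and that the package is importable. This is a minimal sketch, not a script shipped with BioRLHF: `check_environment` is a hypothetical helper, and it assumes the import name is `biorlhf` (the name used in the coverage flag and import examples in this guide).

```python
import importlib.util
import sys

# Minimum version taken from the Prerequisites list.
MIN_PYTHON = (3, 9)


def check_environment(package: str = "biorlhf") -> dict[str, bool]:
    """Report whether the interpreter and package look ready for development.

    Hypothetical helper; the default package name "biorlhf" is an assumption.
    """
    return {
        # sys.version_info compares element-wise with a (major, minor) tuple.
        "python_ok": sys.version_info >= MIN_PYTHON,
        # find_spec returns None when a top-level package cannot be found.
        "package_importable": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    for check, passed in check_environment().items():
        print(f"{check}: {'OK' if passed else 'FAILED'}")
```

This only checks importability; it does not confirm that the editable install points at your working copy or that a GPU is available.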
## Making Changes
### Branch Naming
Create a descriptive branch for your changes:
- `feature/description` - New features
- `fix/description` - Bug fixes
- `docs/description` - Documentation updates
- `refactor/description` - Code refactoring
Example:
```bash
git checkout -b feature/add-new-evaluation-metric
```
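If you want to catch a mistyped prefix before pushing, the naming convention can be expressed as a regular expression. This is a minimal sketch, not a hook the project provides: `is_valid_branch_name` is a hypothetical helper, and the rule that descriptions are lowercase words joined by hyphens is an assumption drawn from the example branch name.

```python
import re

# Prefixes from the branch-naming list; the description format
# (lowercase alphanumeric words joined by single hyphens) is an assumption.
ALLOWED_PREFIXES = ("feature", "fix", "docs", "refactor")
BRANCH_PATTERN = re.compile(
    r"(?:" + "|".join(ALLOWED_PREFIXES) + r")/[a-z0-9]+(?:-[a-z0-9]+)*"
)


def is_valid_branch_name(name: str) -> bool:
    """Return True if name looks like '<prefix>/<hyphenated-description>'."""
    # fullmatch requires the whole string to match, not just a prefix of it.
    return BRANCH_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_branch_name("feature/add-new-evaluation-metric")` is true, while `bugfix/typo` (unknown prefix) and `feature/Add_Metric` (uppercase and underscore) are rejected.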
### Commit Messages
Write clear, concise commit messages:
- Use the present tense ("Add feature" not "Added feature")
- Use the imperative mood ("Move cursor to..." not "Moves cursor to...")
- Limit the first line to 72 characters
- Reference issues when applicable
Example:
```
Add calibration accuracy metric to evaluation module

- Implement uncertainty detection in model responses
- Add tests for calibration scoring
- Update documentation with new metric

Closes #42
```
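The mechanical parts of these rules can be checked in code; tense and mood still need a human reader. A minimal sketch follows, assuming you want a local check before committing: `check_commit_message` is hypothetical and not part of BioRLHF. It checks the 72-character subject limit and the standard Git convention of a blank line between the subject and the body.

```python
def check_commit_message(message: str, max_subject_len: int = 72) -> list[str]:
    """Return a list of problems found in a commit message (empty if none).

    Hypothetical helper: checks subject length and subject/body separation only.
    """
    problems: list[str] = []
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["subject line is empty"]

    subject = lines[0]
    if len(subject) > max_subject_len:
        problems.append(
            f"subject line is {len(subject)} characters; limit is {max_subject_len}"
        )

    # Git treats the text up to the first blank line as the subject, so a
    # non-blank second line would be folded into the subject.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank to separate subject from body")

    return problems
```

A message whose subject is short and followed by a blank line returns an empty list; each violation adds one human-readable entry.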
## Testing
### Running Tests
```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=biorlhf --cov-report=html
# Run specific test file
pytest tests/test_dataset.py
# Run tests matching a pattern
pytest -k "test_evaluation"
```
### Writing Tests
- Place tests in the `tests/` directory
- Name each test file after the module it covers (e.g., `src/biorlhf/data/dataset.py` → `tests/test_dataset.py`)
- Use descriptive test names
- Include docstrings explaining what the test verifies
Example:
```python
from datasets import Dataset

from biorlhf.data import load_dataset


def test_load_dataset_returns_expected_format():
    """Verify that load_dataset returns a HuggingFace Dataset object."""
    dataset = load_dataset("kmp_sft_final.json")
    assert isinstance(dataset, Dataset)
    assert "text" in dataset.column_names
```
## Submitting Changes
### Before Submitting
1. **Sync with upstream**:
```bash
git fetch upstream
git rebase upstream/main
```
2. **Run all checks**:
```bash
# Format code
black src/ tests/
# Check linting
ruff check src/ tests/
# Run tests
pytest
```
3. **Update documentation** if needed
### Pull Request Process
1. Push your branch to your fork:
```bash
git push origin feature/your-feature
```
2. Open a Pull Request on GitHub
3. Fill in the PR template with:
- Description of changes
- Related issue numbers
- Testing performed
- Screenshots (if UI changes)
4. Wait for review and address feedback
### Review Checklist
- [ ] Code follows style guidelines
- [ ] Tests pass locally
- [ ] New code has appropriate test coverage
- [ ] Documentation is updated
- [ ] Commit messages are clear
## Style Guidelines
### Python Code Style
We use [Black](https://black.readthedocs.io/) for code formatting and [Ruff](https://docs.astral.sh/ruff/) for linting.
Key conventions:
- Line length: 88 characters (Black default)
- Use type hints where practical
- Write docstrings for public functions and classes
- Use meaningful variable names
### Docstring Format
Use Google-style docstrings:
```python
def evaluate_model(model_path: str, test_data: str) -> dict:
    """Evaluate a trained model on test data.

    Args:
        model_path: Path to the trained model directory.
        test_data: Path to the test dataset JSON file.

    Returns:
        Dictionary containing evaluation metrics including
        factual_accuracy, reasoning_accuracy, and calibration_score.

    Raises:
        FileNotFoundError: If model_path or test_data doesn't exist.

    Example:
        >>> results = evaluate_model("./model", "test.json")
        >>> print(results["factual_accuracy"])
        0.9
    """
```
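An `Example:` section written in `>>>` form doubles as an executable test. The sketch below uses the standard-library `doctest` module to run those examples for a single function; `mean_accuracy` and `check_docstring_examples` are hypothetical names used only for illustration and are not part of BioRLHF. pytest can also collect such examples across a package with `pytest --doctest-modules`.

```python
import doctest


def mean_accuracy(scores: list[float]) -> float:
    """Return the mean of per-example accuracy scores.

    Args:
        scores: Non-empty list of accuracy values between 0 and 1.

    Returns:
        The arithmetic mean of ``scores``.

    Raises:
        ValueError: If ``scores`` is empty.

    Example:
        >>> mean_accuracy([0.5, 1.0])
        0.75
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum(scores) / len(scores)


def check_docstring_examples(func) -> doctest.TestResults:
    """Run the >>> examples in func's docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Supplying globs makes the function's own name visible to its examples.
    for test in finder.find(func, globs={func.__name__: func}):
        runner.run(test)
    # summarize(verbose=False) prints a report only when an example fails.
    return runner.summarize(verbose=False)


if __name__ == "__main__":
    print(check_docstring_examples(mean_accuracy))
```

Keep doctest examples fast and self-contained; examples that need a trained model on disk are better treated as illustrations than as tests.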
### Import Order
Organize imports in this order:
1. Standard library
2. Third-party packages
3. Local imports
Example:
```python
import json
from pathlib import Path

import torch
from transformers import AutoModelForCausalLM

from biorlhf.data import load_dataset
from biorlhf.utils import setup_quantization
```
## Questions?
If you have questions about contributing, feel free to:
- Open an issue for discussion
- Reach out to the maintainers
Thank you for contributing to BioRLHF!