# Practical Guide: Writing Clear and Reusable Python Code
**Objective**: Enable future maintainers to understand your code efficiently after several months.
## Two Common Challenges
- **Overly large files**: Linear code in extensive files leads to cognitive overload. Never write 1000-line functions.
- **Excessive abstraction layers**: Multiple nested classes make it difficult to locate actual business logic.
**Goal**: Find balance between these extremes.
## Target Outcomes
- Immediately comprehensible structure with well-named functions
- Code that reads naturally and fails explicitly when something is wrong
> **Core Principle**: Code readability is our primary concern. If understanding existing code takes more effort than rewriting it, the implementation needs improvement.
---
## Core Development Principles
### Fail Fast, Fail Loud
**NEVER use fallback to degraded solutions that silently create incorrect output.**
When something goes wrong, stop with a clear error rather than continue with incorrect results. Fallbacks create inconsistent behavior, silent data degradation, and false confidence.
Never leave an unfinished stub that returns silently, even during development. Make it fail explicitly (for example, by raising `NotImplementedError`). Silent, forgotten, unfinished implementations are a common source of bugs.
```python
# BAD - Silent degradation
def calculate_average(data):
    if not data:
        return 0  # Silently masks the issue
    return sum(data) / len(data)

# GOOD - Explicit failure
def calculate_average(data):
    if not data:
        raise ValueError("Cannot calculate average: data list is empty")
    return sum(data) / len(data)
```
**Key**: Failed execution with clear error > Successful execution with wrong data
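The same rule applies to unfinished stubs. The sketch below shows one way a placeholder can fail loudly; `export_to_parquet` and its parameters are hypothetical names chosen for illustration.

```python
def export_to_parquet(rows: list[dict], path: str) -> None:
    """Hypothetical unfinished feature that fails loudly instead of doing nothing.

    Args:
        rows: Records that would be exported.
        path: Destination file path.

    Raises:
        NotImplementedError: Always, until the export is actually implemented.
    """
    # An empty body (or `pass`) would let callers believe the file was written.
    raise NotImplementedError(
        f"export_to_parquet is not implemented yet; "
        f"cannot write {len(rows)} rows to {path}"
    )
```

Any caller that reaches the stub stops immediately with a message naming the missing feature, rather than continuing without an output file.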
### Clear Error Messages
Error messages should be **Specific + Actionable + Contextual**: what, where, why, how to fix.
```python
# BAD: Vague
raise ValueError("Invalid data")
# GOOD: Specific, actionable, contextual
raise ValueError(f"User age must be between 0 and 150, got {age}")
raise FileNotFoundError(f"Model not found at {model_path}. Run: python download_model.py")
```
### Input and Output Validation
Validate inputs before processing and outputs before saving.
```python
import cv2
import numpy as np

def resize_image(image: np.ndarray, target_size: tuple[int, int]) -> np.ndarray:
    # Validate inputs before processing
    if image is None or len(image.shape) not in (2, 3):
        raise ValueError(f"Expected 2D/3D image, got {image.shape if image is not None else None}")
    if target_size[0] <= 0 or target_size[1] <= 0:
        raise ValueError(f"Target size must be positive, got {target_size}")
    return cv2.resize(image, target_size)  # cv2 expects (width, height)

def generate_report(data: list[dict]) -> str:
    report = _create_report(data)
    # Validate the output; `report or ''` keeps the message safe if report is None
    if not report or len(report) < 10:
        raise ValueError(f"Report suspiciously short ({len(report or '')} chars)")
    return report
```
---
## 1. Maintain Simplicity
Implement the **most straightforward solution** that meets requirements.
Avoid adding complex layers for single use cases.
## 2. Avoid Unnecessary Structures
### Redundant Try-Catch / Single-Method Classes / Premature Abstractions
```python
# BAD: Simply re-raising
try:
    result = process_image(img_path)
except Exception as e:
    raise e  # Adds no value

# BAD: Hiding errors
try:
    predictions = model.predict(data)
except:
    predictions = []  # Lost error info

# BAD: Unnecessary class
class ImageProcessor:
    def process(self, image):
        return cv2.resize(image, (224, 224))

# GOOD: Simple and direct
def process_image(img_path):
    image = cv2.imread(img_path)
    if image is None:  # cv2.imread returns None on failure instead of raising
        raise OSError(f"Could not read image (missing or unreadable): {img_path}")
    return image

def load_and_predict(model_path, data):
    if not os.path.exists(model_path):
        raise FileNotFoundError(f"Model not found: {model_path}")
    return load_model(model_path).predict(data)  # Let exceptions propagate

def resize_image(image, size=(224, 224)):
    return cv2.resize(image, size)
```
---
## 3. When to Create Functions
**Pipelines**: Don't create 1000-line functions. One function = One purpose.
**Reusable code**: Code written once → keep as is. Twice → consider refactoring. Three+ times → definitely refactor.
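As an illustration of the "three or more times" case, the sketch below replaces three copy-pasted parsing snippets with one helper. The function `parse_percentage` and the field names are hypothetical, not part of any existing codebase.

```python
def parse_percentage(raw: str, field: str) -> float:
    """Parse a string such as "42%" into a fraction between 0 and 1.

    Args:
        raw: Text to parse, with or without a trailing percent sign.
        field: Name of the field, used only in error messages.

    Raises:
        ValueError: If the text is not a number or is outside 0% to 100%.
    """
    text = raw.strip().rstrip("%")
    try:
        value = float(text)
    except ValueError as e:
        raise ValueError(f"{field} must look like '42%', got {raw!r}") from e
    if not 0 <= value <= 100:
        raise ValueError(f"{field} must be between 0% and 100%, got {raw!r}")
    return value / 100

# Three call sites that previously each carried their own copy of the logic
discount = parse_percentage("15%", "discount")
tax = parse_percentage("20%", "tax")
tip = parse_percentage("12.5%", "tip")
```

The validation now lives in one place, so a fix to the parsing rule reaches every caller.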
## 4. Function Design Guidelines
- **Single Responsibility**: One clear purpose per function
- **Descriptive names**: `remove_outliers` not `do_stuff`
- **Size**: Under 20 lines (split larger functions)
```python
# BAD: Too many things
def process(df):
    # cleaning, normalization, encoding, split, training...
    return model

# GOOD: One task per function
def clean_columns(df): ...
def normalize(df): ...
def train_model(train, test): ...
```
---
## 5. Effective Documentation
**Philosophy: Documentation lives in code, not in separate markdown files.**
### Documentation Hierarchy
1. **Docstrings** (mandatory for public functions/classes)
2. **Inline comments** (explain "why", not "what")
3. **README.md** (project overview, setup, quick start)
4. **Auto-generated docs** (from docstrings)
**Do NOT create proliferating documentation files** - they get out of sync and duplicate docstrings.
Temporary reports go in the `./.temp/` directory, which is listed in `.gitignore`.
### Docstring Format
Use **Google Style Docstrings**.
```python
import re

def normalize_text(text: str, stopwords: list[str] | None = None) -> str:
    """Convert text to lowercase and remove punctuation.

    Args:
        text: Input text to normalize.
        stopwords: Optional list of words to remove.

    Returns:
        Normalized text in lowercase with punctuation removed.

    Raises:
        ValueError: If text is empty or whitespace only.
    """
    if not text or not text.strip():
        raise ValueError("Text cannot be empty")
    text = text.lower().strip()
    text = re.sub(r'[^\w\s]', '', text)
    if stopwords:
        text = " ".join([w for w in text.split() if w not in stopwords])
    return text
```
### Inline Comments
```python
# BAD: Repeats code
timestamp = frame.timestamp  # Get the timestamp

# GOOD: Explains why
if frame.timestamp <= 0:  # Skip pre-initialization frames
    continue
```
---
## 6. Type Annotations
**Always annotate parameters and return types** for public functions.
```python
# BAD: Unclear
def process_data(data, config, threshold):
    return result

# GOOD: Clear types
def process_data(data: np.ndarray, config: dict, threshold: float) -> list[dict]:
    return result
```
### Pydantic Models (Company Standard)
**MANDATORY: Use Pydantic for complex data structures and interfaces between business bricks.**
Benefits: Runtime validation, clear errors, JSON serialization, self-documenting.
```python
from pydantic import BaseModel, Field, field_validator

class Detection(BaseModel):
    """Object detection result."""
    bbox: tuple[int, int, int, int] = Field(..., description="(x1, y1, x2, y2)")
    confidence: float = Field(..., ge=0.0, le=1.0)
    class_id: int = Field(..., ge=0)
    class_name: str

    @field_validator('bbox')
    @classmethod
    def validate_bbox(cls, v: tuple[int, int, int, int]) -> tuple[int, int, int, int]:
        x1, y1, x2, y2 = v
        if x2 <= x1 or y2 <= y1:
            raise ValueError(f"bbox must satisfy x1 < x2 and y1 < y2, got {v}")
        return v
```
**CRITICAL: Use the same Pydantic models for shared concepts across all projects** (Camera, GPS, Detection, etc.) to ensure consistent validation and easy integration.
---
## 7. Prioritize Readability
Clear code takes precedence over premature optimization. Use descriptive names (`average_daily_temperature` not `adt`).
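A small before-and-after sketch of the naming advice: `adt` and `average_daily_temperature` come from the text above, while the parameter name `daily_readings_celsius` is an assumption added for illustration.

```python
# Harder to read: the abbreviations force the reader to guess the intent
def adt(r):
    return sum(r) / len(r)

# Easier to read: the names state what is computed and in which unit
def average_daily_temperature(daily_readings_celsius: list[float]) -> float:
    """Return the mean of the daily temperature readings, in degrees Celsius."""
    if not daily_readings_celsius:
        raise ValueError("Cannot average temperatures: no daily readings provided")
    return sum(daily_readings_celsius) / len(daily_readings_celsius)
```

Both functions compute the same mean for non-empty input; only the second can be understood without opening its call sites.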
---
## 8. Remove Dead Code
**If code is no longer used, delete it.** Dead code pollutes the codebase, creates confusion, and becomes outdated. Git history preserves everything.
**Exceptions**: Library coherence, public API compatibility.
Delete: commented-out code, unused imports/functions/classes/variables. Trust git history.
---
## 9. No Hardcoding of Frequently Changing Values
**Never hardcode values that change frequently or vary between environments.**
### Configuration Format
**Prefer**: JSON (company standard), .env (secrets), YAML/TOML (if needed). **Avoid**: Python files.
```json
{"database": {"host": "localhost", "password": "${DB_PASSWORD}"}}
```
### What to Configure
Credentials, URLs, file paths, thresholds, environment-specific values.
**Security**: Never commit secrets. Use .gitignore for `.env`, `secrets/`, `*.key`.
**Acceptable hardcoding**: Constants (`PI = 3.14159`), enums, default parameters.
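The JSON format does not expand placeholders such as `${DB_PASSWORD}` on its own, so something has to resolve them at load time. The sketch below is one possible loader that reads them from environment variables and fails loudly when one is missing; `load_config`, `_expand`, and the placeholder syntax handling are assumptions for illustration, not a company-provided API.

```python
import json
import re
from collections.abc import Mapping
from pathlib import Path

# Matches ${NAME} where NAME is a valid environment variable identifier
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def _expand(value, env: Mapping[str, str]):
    """Recursively replace ${NAME} placeholders in strings with values from env."""
    if isinstance(value, str):
        def substitute(match: re.Match) -> str:
            name = match.group(1)
            if name not in env:
                raise KeyError(
                    f"Config references ${{{name}}} but environment variable "
                    f"{name} is not set"
                )
            return env[name]
        return _PLACEHOLDER.sub(substitute, value)
    if isinstance(value, dict):
        return {key: _expand(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [_expand(item, env) for item in value]
    return value  # numbers, booleans, and null pass through unchanged

def load_config(path: Path, env: Mapping[str, str]) -> dict:
    """Load a JSON config file and resolve ${NAME} placeholders from env.

    Raises:
        FileNotFoundError: If the config file does not exist.
        ValueError: If the top-level JSON value is not an object.
        KeyError: If a placeholder names a variable missing from env.
    """
    if not path.is_file():
        raise FileNotFoundError(f"Config file not found: {path}")
    raw = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(raw, dict):
        raise ValueError(f"Config root must be a JSON object, got {type(raw).__name__}")
    return _expand(raw, env)
```

A typical call would be `load_config(Path("config.json"), os.environ)`. Passing the environment explicitly keeps the function easy to test with a plain dictionary.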
---
## 10. No "Hacks to Make It Compile"
**Write code that is correct by design, not code that tricks the type checker or linter.**
Hacks = code modifications made solely to silence errors without addressing the root cause.
### Common Hacks to Avoid
```python
# BAD: Silencing errors
result = process_data(input)  # type: ignore

try:
    result = risky_operation()
except:
    pass

from module import *  # noqa

return User(id=-1, name="")  # Dummy values

# GOOD: Fix the root cause
def process_data(input: list[dict]) -> list[dict]:
    return input

result = process_data(input)  # Properly typed

try:
    result = risky_operation()
except ValueError as e:
    raise ValueError(f"Failed: {e}") from e

from module import SpecificClass, specific_function

def get_user(user_id: int) -> User | None:
    if user_id < 0:
        return None  # Or raise ValueError
    return User(id=user_id, name="John")
```
### When Suppression is Acceptable
Only when you have a legitimate reason AND document why:
```python
arr[mask] = values # type: ignore[index] # numpy advanced indexing
result = external_lib.process(data) # type: ignore[no-untyped-call] # no type stubs
```
**Approach**: Understand error → Fix root cause → If suppression needed, document WHY → Review regularly.
**Remember**: Hacks hide bugs and accumulate technical debt. Fix the issue, don't silence the messenger.
---
## File Organization
**ALL temporary files** → `./.temp/` directory.
```bash
# Good: src/, tests/, .temp/, README.md
# Bad: test_something.py, output_analysis.csv, debug_report.txt in root
```
Code modification reports and temporary tests created by AI tools also go in `.temp/`.
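One way to make the `.temp/` rule hard to forget is to route every temporary file through a single helper. The sketch below assumes the process runs from the project root; `temp_path` and `TEMP_DIR` are hypothetical names.

```python
from pathlib import Path

# Assumption: the working directory is the project root
TEMP_DIR = Path(".temp")

def temp_path(filename: str, base_dir: Path = TEMP_DIR) -> Path:
    """Return a path inside the temp directory, creating the directory if needed.

    Args:
        filename: Bare file name such as "debug_report.txt" (no directories).
        base_dir: Directory that holds temporary files.

    Raises:
        ValueError: If filename is empty or contains directory components.
    """
    if not filename or filename == ".." or Path(filename).name != filename:
        raise ValueError(
            f"Expected a bare file name such as 'debug_report.txt', got {filename!r}"
        )
    base_dir.mkdir(parents=True, exist_ok=True)
    return base_dir / filename
```

For example, `temp_path("output_analysis.csv")` resolves to `.temp/output_analysis.csv`, which keeps the file out of the repository root.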
---
## Testing
Test error paths, not just success cases. Basic verification saves debugging time.
```python
import numpy as np
import pytest

def test_resize_image():
    assert resize_image(np.zeros((100, 100, 3)), (50, 50)).shape == (50, 50, 3)
    with pytest.raises(ValueError):
        resize_image(None, (50, 50))
```
---
## Version Control
**Commit strategy**: Create commits before and after major changes (core algorithms, multi-file refactoring, new features).
**Messages**: Use conventional commits (`feat:`, `fix:`, `refactor:`). Be descriptive.
```bash
# BAD: "fix stuff", "wip"
# GOOD: "fix: resolve race condition in data loader"
```
**Always commit**: Production code, tests, docs, non-sensitive config.
**Never commit**: Secrets (use `.env`, add to `.gitignore`).
---
## Using AI Tools Responsibly
**You are responsible for every line of code, whether written by you or AI.**
AI shall not modify **core business logic data structures** without review and explicit approval.
Pydantic models forming interfaces between business bricks require human judgment.
---
## Pre-Commit Checklist
**Code Quality**: Clear, descriptive names | Type hints on public functions | Pydantic for complex structures | Reusable without copying
**Error Handling**: No fallback logic | Specific error messages | Input/output validation
**AI Code**: All AI code reviewed and understood | Core interfaces human-designed | No black boxes
**Organization**: No temp files in root (use `.temp/`) | Docstrings on public functions | README updated
**Version Control**: Conventional commit format | No secrets | Logical grouping
---
## Summary
1. **Fail fast, fail loud** - Never silently degrade
2. **Clear errors** - Specific, actionable, contextual
3. **Validate I/O** - Catch problems early
4. **Straightforward code** - Simplest solution
5. **Small functions** - One purpose, <20 lines
6. **Type annotations** - Mandatory for public functions
7. **Pydantic for interfaces** - Complex structures, shared schemas
8. **Docs in code** - Docstrings, not proliferating files
9. **Organized projects** - `.temp/` for temporary files
10. **Remove dead code** - Git preserves history
11. **No hardcoding** - Use config files (JSON standard)
12. **No hacks** - Fix root cause, don't silence errors
13. **AI responsibly** - Review everything, humans decide
**When in doubt**: Raise clear error > guess behavior. Ask for clarification rather than corrupt data.
**Success metric**: Future maintainers understand, trust, and reuse your code without assistance.