# Contributing to BharatGraph

Read this before opening a pull request.

---

## Before You Start

Check the open issues before writing code. For any change that adds a new
file, modifies the graph schema, or adds a dependency, open an issue first
to discuss the approach. Small bug fixes can go straight to a pull request.

---

## Branch Strategy

One long-lived branch: `main`. All work happens on short-lived branches
that merge directly into `main`. There is no `develop` branch.

Naming:

```
feature/phase-N-short-name    New phase functionality
fix/issue-N-short-name        Bug fix referencing an issue number
docs/short-name               Documentation only
test/short-name               Tests only
```

Always branch from the latest `main`:

```bash
git checkout main
git pull origin main
git checkout -b feature/phase-5-risk-scoring
```
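
The naming scheme above can be checked mechanically, for example in a local hook. The following is a minimal sketch, not part of the repository: `BRANCH_PATTERN` and `is_valid_branch_name` are hypothetical names, and the pattern assumes that `N` is a number and that `short-name` is lowercase words joined by hyphens.

```python
import re

# Assumption: N is one or more digits and short-name is lowercase
# alphanumeric words joined by single hyphens.
SHORT_NAME = r"[a-z0-9]+(?:-[a-z0-9]+)*"

BRANCH_PATTERN = re.compile(
    r"(?:"
    rf"feature/phase-\d+-{SHORT_NAME}"
    rf"|fix/issue-\d+-{SHORT_NAME}"
    rf"|(?:docs|test)/{SHORT_NAME}"
    r")"
)


def is_valid_branch_name(name: str) -> bool:
    """Return True if the name matches one of the four branch formats."""
    return BRANCH_PATTERN.fullmatch(name) is not None
```

With this sketch, `is_valid_branch_name("feature/phase-5-risk-scoring")` is accepted, while `develop` and `feature/risk-scoring` are rejected.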

---

## Commit Messages

Format:

```
type(scope): imperative description under 72 characters

Optional body. Wrap at 72 characters. Explain why, not what.

Closes #12
```

Types: `feat`, `fix`, `docs`, `test`, `chore`, `refactor`.
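
The format above lends itself to an automated check. The sketch below is one possible interpretation, not a tool shipped with the project: `check_commit_message` is a hypothetical helper, the scope is treated as optional because the merge-conflict example in this guide uses `chore:` without one, the 72-character limit is applied to the whole subject line, and the blank line after the subject is the usual Git convention rather than a rule stated here.

```python
import re

ALLOWED_TYPES = ("feat", "fix", "docs", "test", "chore", "refactor")
MAX_LINE_LENGTH = 72

# Assumption: the "(scope)" part may be omitted.
SUBJECT_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()\s]+)\))?: (?P<description>\S.*)$"
)


def check_commit_message(message: str) -> list[str]:
    """Return the problems found in a commit message; empty if none."""
    lines = message.splitlines()
    if not lines:
        return ["message is empty"]

    problems: list[str] = []
    subject = lines[0]
    match = SUBJECT_PATTERN.match(subject)
    if match is None:
        problems.append("subject does not match 'type(scope): description'")
    elif match.group("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type: {match.group('type')}")
    if len(subject) > MAX_LINE_LENGTH:
        problems.append("subject is longer than 72 characters")

    # Git treats the first blank line as the subject/body separator.
    if len(lines) > 1 and lines[1].strip():
        problems.append("missing blank line after the subject")

    for number, line in enumerate(lines[2:], start=3):
        if len(line) > MAX_LINE_LENGTH:
            problems.append(f"line {number} is longer than 72 characters")
    return problems
```

An empty list means the message passed every check. The function does not try to judge whether the description is written in the imperative mood.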

---

## Python Code Standards

**Style**

Python 3.10 minimum. Follow PEP 8, except that the maximum line length is
88 characters.

**Comments**

No inline comments that describe what the code does. Write self-explanatory
code. Use comments only to explain non-obvious decisions or constraints.
Do not commit commented-out code blocks.

**Logging**

No `print` statements in any module except `__main__` blocks. Use `loguru`
for all diagnostic output.

**Type hints**

All function signatures have type hints. Return types are always annotated.

**Docstrings**

Every class and every public method has at least a one-line docstring.
Multi-line docstrings use the Google format.

**Environment variables**

Read all credentials and configuration with `os.getenv()`, with values
loaded from a `.env` file by `python-dotenv`. Do not hardcode any API key,
password, or URL that changes between environments.

**Encoding**

All file reads and writes specify `encoding="utf-8"`. All Python files are
ASCII-safe in their source text. No Unicode currency symbols or special
characters embedded in string literals.

**Imports**

Standard library imports first, then third-party, then local. One import
per line. No wildcard imports.
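
As an illustration of several of these rules together, here is a small sketch of a module written to the standards above. Every name in it (`load_records`, `save_records`, `get_api_base_url`, `EXAMPLE_API_BASE_URL`, `RUPEE_SIGN`) is hypothetical. To stay standard-library only, it leaves out the `loguru` and `python-dotenv` calls that a real module would use. Writing the currency symbol as an escape sequence is one reading of the ASCII-safe rule, not a confirmed project convention.

```python
import json
import os
from pathlib import Path

# Hypothetical names throughout; none are taken from the BharatGraph codebase.
# Assumption: an escape sequence is an acceptable way to keep the source ASCII.
RUPEE_SIGN = "\u20b9"


def save_records(path: Path, records: list[dict[str, str]]) -> None:
    """Write records to a UTF-8 JSON file."""
    with path.open("w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)


def load_records(path: Path) -> list[dict[str, str]]:
    """Load records from a UTF-8 JSON file.

    Args:
        path: Location of the JSON file to read.

    Returns:
        The records stored in the file.
    """
    with path.open("r", encoding="utf-8") as handle:
        return json.load(handle)


def get_api_base_url() -> str:
    """Return the API base URL configured for the current environment."""
    return os.getenv("EXAMPLE_API_BASE_URL", "")
```

The only comments explain constraints, every signature is annotated, and the multi-line docstring follows the Google layout.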

---

## Adding a New Scraper

Place the file in `scrapers/`. Inherit from `BaseScraper`. The class must
implement a method named `fetch_and_save` that saves to `data/samples/`
and returns the list of records. Add the scraper to `processing/pipeline.py`
under the appropriate method name. Document the source in the README data
sources table.
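
The shape of such a class might look like the sketch below. It is an assumption-heavy illustration: the `BaseScraper` defined here is only a stand-in so the example runs on its own, and the real base class may require a different constructor or additional methods. `ExampleNoticeScraper`, its `fetch` helper, the output file name, and the hard-coded record are all hypothetical.

```python
import json
from pathlib import Path


class BaseScraper:
    """Stand-in for the project's BaseScraper; the real interface may differ."""


class ExampleNoticeScraper(BaseScraper):
    """Hypothetical scraper that saves its records to the samples directory."""

    def __init__(self, output_dir: Path = Path("data/samples")) -> None:
        """Store the directory that fetch_and_save writes to."""
        self.output_dir = output_dir

    def fetch(self) -> list[dict[str, str]]:
        """Return placeholder records in place of a real network request."""
        return [{"id": "example-1", "title": "Example notice"}]

    def fetch_and_save(self) -> list[dict[str, str]]:
        """Fetch records, write them as UTF-8 JSON, and return them."""
        records = self.fetch()
        self.output_dir.mkdir(parents=True, exist_ok=True)
        output_path = self.output_dir / "example_notices.json"
        with output_path.open("w", encoding="utf-8") as handle:
            json.dump(records, handle, indent=2)
        return records
```

Calling `ExampleNoticeScraper().fetch_and_save()` writes `data/samples/example_notices.json` relative to the working directory and returns the same records it saved.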

---

## Adding a New Dependency

Add to `requirements.txt` with a minimum version constraint using `>=`.
Never pin to an exact version with `==` unless a specific incompatibility
requires it and a comment documents the reason. Verify the package is
available under a licence compatible with MIT.
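
The pinning rule can be spot-checked with a few lines of Python. This is a rough sketch under stated assumptions, not a project tool: `find_unexplained_pins` is a hypothetical helper, it accepts a comment either on the same line or on the line directly above the pin, and it does not parse environment markers, so a marker containing `==` would be flagged as well.

```python
def find_unexplained_pins(requirements_text: str) -> list[str]:
    """Return requirements pinned with == that have no explanatory comment."""
    flagged: list[str] = []
    previous_was_comment = False
    for raw_line in requirements_text.splitlines():
        line = raw_line.strip()
        if not line:
            previous_was_comment = False
            continue
        if line.startswith("#"):
            previous_was_comment = True
            continue
        # Assumption: anything after "#" on a requirement line is a comment.
        requirement, _, trailing_comment = line.partition("#")
        is_pinned = "==" in requirement
        is_explained = bool(trailing_comment.strip()) or previous_was_comment
        if is_pinned and not is_explained:
            flagged.append(requirement.strip())
        previous_was_comment = False
    return flagged
```

Passing the contents of `requirements.txt` returns the offending lines, so an empty list means every exact pin carries a documented reason.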

---

## Pull Request Process

1. Push your branch:

   ```bash
   git push origin feature/phase-5-risk-scoring
   ```

2. Open a pull request on GitHub targeting `main`.
3. Title format: `type(scope): description`
4. Description must include:
   - What the change does
   - How to test it locally
   - Issues it closes: `Closes #N`
5. Do not merge your own pull request without a review unless you are the
   sole contributor.
6. Resolve merge conflicts locally, never in the GitHub web editor:

   ```bash
   git checkout main && git pull origin main
   git checkout your-branch && git merge main
   # resolve conflicts in your editor
   git add . && git commit -m "chore: resolve merge conflicts"
   git push origin your-branch
   ```

---

## What Not to Contribute

- Scrapers targeting login-protected pages, private APIs, or sources not
  listed in the approved data sources section of the README.
- Any code that stores personally identifiable information beyond what
  appears in official public government filings.
- Language in any output that accuses a named individual of a crime without
  a court judgment as the source.
- Dependencies under GPL, AGPL, or other copyleft licences.
- Paid API calls. Every external service used must have a usable free tier.

---

## Questions

Open a GitHub issue with the label `question`.