Ranjit Behera committed on
Commit 2b1ff82 · 1 Parent(s): 7b52500

chore: add CHANGELOG.md, release script, and strict workflow

- CHANGELOG.md with all past releases
- scripts/release.py for automated releases
- Updated CONTRIBUTING.md with strict rules:
  - Never push to main directly
  - Feature → develop → main flow
  - Pre-release testing required
  - Changelog required for each release

Files changed (3)
  1. CHANGELOG.md +76 -0
  2. CONTRIBUTING.md +139 -81
  3. scripts/release.py +218 -0
CHANGELOG.md ADDED
@@ -0,0 +1,76 @@
+ # Changelog
+
+ All notable changes to FinEE will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## [Unreleased]
+ ### Added
+ - (Next features go here)
+
+ ### Changed
+ - (Changes to existing features)
+
+ ### Fixed
+ - (Bug fixes)
+
+ ---
+
+ ## [1.0.3] - 2025-01-11
+ ### Added
+ - Lakhs notation support (e.g., "1.5 Lakh" → 150000)
+ - `benchmark.py` script for accuracy verification
+ - Torture test suite for edge cases
+ - `CONTRIBUTING.md` with Git Flow guidelines
+ - Professional branching: main, develop, feature/*
+
+ ### Changed
+ - Moved Jupyter notebooks to `experiments/` folder
+ - Default `use_llm=False` for instant usage (no model download)
+ - Updated README with edge case examples
+
+ ### Fixed
+ - Double-escaped regex patterns in amount extraction
+
+ ---
+
+ ## [1.0.2] - 2025-01-11
+ ### Changed
+ - Default to regex-only mode (`use_llm=False`)
+ - Package works instantly without downloading 5GB model
+
+ ### Fixed
+ - Package build configuration (hatch sources)
+
+ ---
+
+ ## [1.0.1] - 2025-01-11
+ ### Fixed
+ - Package did not include source files (only 5KB)
+ - Fixed `pyproject.toml` build configuration
+
+ ---
+
+ ## [1.0.0] - 2025-01-11
+ ### Added
+ - Initial PyPI release
+ - 5-tier additive extraction pipeline (Cache/Regex/Rules/LLM/Validate)
+ - Multi-backend support (MLX, PyTorch, GGUF)
+ - CLI with `finee` command
+ - 88 unit tests
+ - Colab demo notebook
+ - JSON schema contract documentation
+ - Support for 5 banks: HDFC, ICICI, SBI, Axis, Kotak
+
+ ### Performance
+ - 94.5% field accuracy on multi-bank benchmark
+ - <1ms latency in regex-only mode
+ - 50,000+ messages/second throughput
+
+ ---
+
+ ## Links
+ - [PyPI](https://pypi.org/project/finee/)
+ - [GitHub](https://github.com/Ranjitbehera0034/Finance-Entity-Extractor)
+ - [Hugging Face](https://huggingface.co/Ranjit0034/finance-entity-extractor)
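Because the changelog above follows the Keep a Changelog heading layout, its release history can be read back programmatically. The following is a minimal sketch, not part of this commit: it assumes every release heading has the exact form `## [X.Y.Z] - YYYY-MM-DD`, and the names `RELEASE_HEADING` and `list_releases` are hypothetical.

```python
import re
from typing import List, Tuple

# Matches release headings such as "## [1.0.3] - 2025-01-11".
# "## [Unreleased]" is skipped on purpose: it carries no version or date.
RELEASE_HEADING = re.compile(
    r"^## \[(\d+\.\d+\.\d+)\] - (\d{4}-\d{2}-\d{2})[ \t]*$",
    re.MULTILINE,
)


def list_releases(changelog_text: str) -> List[Tuple[str, str]]:
    """Return (version, date) pairs in the order they appear in the text."""
    return RELEASE_HEADING.findall(changelog_text)


if __name__ == "__main__":
    # Hypothetical excerpt shaped like the CHANGELOG.md added in this commit.
    sample = (
        "## [Unreleased]\n"
        "### Added\n\n"
        "## [1.0.3] - 2025-01-11\n"
        "### Added\n"
        "- Lakhs notation support\n\n"
        "## [1.0.2] - 2025-01-11\n"
    )
    for version, released in list_releases(sample):
        print(version, released)
```

A check like this could run before a release to confirm that the version being published does not already have a heading.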
CONTRIBUTING.md CHANGED
@@ -2,117 +2,175 @@
 
  Thank you for your interest in contributing! Here's how to get started.
 
- ## 🌿 Branching Strategy
+ ## 🚨 Golden Rules
 
- We use **Git Flow** for professional development:
+ 1. **NEVER push directly to `main`** - All changes go through PRs
+ 2. **All features branch from `develop`** - Not from main
+ 3. **Tests MUST pass before merging** - No exceptions
+ 4. **Update CHANGELOG.md** - Every PR should update it
+
+ ## 🌿 Branching Strategy (Git Flow)
 
  ```
- main        ────●────●────●────────────●─────→ (stable releases only)
-                 │    │    │            │
-                 │    │    └──tag:v1.0.3│
-                 │    │                 │
- develop     ────●────●────●────●───────●─────→ (integration branch)
-                      │    │    │
- feature/xyz ─────────●────●────┘ (feature branches)
+ main       ──●────────────────●────────────●──→ (releases only)
+              │                │            │
+            v1.0.3           v1.0.4       v1.1.0
+              │                │            │
+ develop    ──●────●────●──────●───●────●───●──→ (integration)
+                   │    │          │    │
+ feature/a  ───────●────┘          │    │
+ feature/b  ───────────────────────●────┘
  ```
 
- ### Branches
-
- | Branch | Purpose | Merge To |
- |--------|---------|----------|
- | `main` | Stable releases only | - |
- | `develop` | Integration & testing | `main` (via PR) |
- | `feature/*` | New features | `develop` (via PR) |
- | `fix/*` | Bug fixes | `develop` (via PR) |
- | `hotfix/*` | Urgent production fixes | `main` + `develop` |
-
- ### Workflow
-
- 1. **New Feature**:
-    ```bash
-    git checkout develop
-    git pull origin develop
-    git checkout -b feature/my-feature
-    # ... make changes ...
-    git push -u origin feature/my-feature
-    # Create PR to develop
-    ```
-
- 2. **Bug Fix**:
-    ```bash
-    git checkout develop
-    git checkout -b fix/issue-123
-    # ... fix bug ...
-    git push -u origin fix/issue-123
-    # Create PR to develop
-    ```
-
- 3. **Release**:
-    ```bash
-    git checkout main
-    git merge develop
-    git tag -a v1.x.x -m "Release v1.x.x"
-    git push origin main --tags
-    ```
-
- ## 🧪 Running Tests
+ ### Branch Types
+
+ | Branch | Source | Merges To | Naming |
+ |--------|--------|-----------|--------|
+ | `main` | - | - | Protected, releases only |
+ | `develop` | `main` | `main` | Integration branch |
+ | `feature/*` | `develop` | `develop` | `feature/add-kotak-support` |
+ | `fix/*` | `develop` | `develop` | `fix/unicode-amount-parsing` |
+ | `hotfix/*` | `main` | `main` + `develop` | `hotfix/critical-regex-bug` |
+
+ ## 🔄 Development Workflow
+
+ ### 1. Start a New Feature
 
  ```bash
- # Install dev dependencies
- pip install -e ".[dev]"
+ # Always start from develop
+ git checkout develop
+ git pull origin develop
 
- # Run all tests
- pytest tests/ -v
+ # Create feature branch
+ git checkout -b feature/my-awesome-feature
 
- # Run specific test
- pytest tests/test_regex_engine.py -v
+ # Make changes...
+ # Write tests...
+ # Update CHANGELOG.md under [Unreleased]
 
- # Run with coverage
- pytest tests/ --cov=finee --cov-report=html
- ```
+ # Commit with conventional messages
+ git commit -m "feat: add support for Paytm VPA patterns"
 
- ## 📝 Code Style
+ # Push and create PR
+ git push -u origin feature/my-awesome-feature
+ # Create PR: feature/my-awesome-feature → develop
+ ```
 
- - **Black** for formatting (line length: 100)
- - **Ruff** for linting
- - **Type hints** for all public functions
+ ### 2. Fix a Bug
 
  ```bash
- # Format code
- black src/ tests/
+ git checkout develop
+ git pull origin develop
+ git checkout -b fix/issue-123-unicode-error
 
- # Lint
- ruff check src/ tests/
+ # Fix the bug...
+ # Add test to prevent regression...
 
- # Type check
- mypy src/finee/
+ git commit -m "fix: handle ₹ symbol in amount parsing"
+ git push -u origin fix/issue-123-unicode-error
+ # Create PR: fix/issue-123-unicode-error → develop
  ```
 
- ## 🚀 Publishing
+ ### 3. Create a Release
 
- Only maintainers can publish to PyPI:
+ **Use the release script:**
 
  ```bash
- # Bump version in pyproject.toml
- # Build
- python -m build
+ # Preview what will happen
+ python scripts/release.py 1.0.4 --dry-run
 
- # Upload
- twine upload dist/*
+ # Execute release
+ python scripts/release.py 1.0.4
  ```
 
- ## 📋 Commit Messages
+ The script will:
+ 1. ✅ Verify you're on `develop`
+ 2. ✅ Run all tests
+ 3. ✅ Run benchmark suite
+ 4. ✅ Update version in `pyproject.toml`
+ 5. ✅ Update `CHANGELOG.md`
+ 6. ✅ Merge to `main`
+ 7. ✅ Create git tag `v1.0.4`
+ 8. ✅ Build and upload to PyPI
+ 9. ✅ Return to `develop`
+
+ ## 🧪 Pre-Merge Checklist
 
- Use conventional commits:
+ Before your PR can be merged:
+
+ - [ ] All tests pass: `pytest tests/ -v`
+ - [ ] Benchmark runs: `python benchmark.py --all`
+ - [ ] CHANGELOG.md updated under `[Unreleased]`
+ - [ ] Code formatted: `black src/ tests/`
+ - [ ] Linting passes: `ruff check src/ tests/`
+ - [ ] New features have tests
+ - [ ] Documentation updated if needed
+
+ ## 📋 Commit Message Format
+
+ Use [Conventional Commits](https://www.conventionalcommits.org/):
 
  ```
- feat: add support for Lakhs notation
- fix: handle Unicode ₹ symbol in regex
+ <type>(<scope>): <description>
+
+ [optional body]
+
+ [optional footer]
+ ```
+
+ ### Types
+
+ | Type | Description |
+ |------|-------------|
+ | `feat` | New feature |
+ | `fix` | Bug fix |
+ | `docs` | Documentation only |
+ | `test` | Adding tests |
+ | `refactor` | Code refactoring |
+ | `perf` | Performance improvement |
+ | `chore` | Maintenance tasks |
+
+ ### Examples
+
+ ```
+ feat(regex): add Lakhs notation support
+ fix(parser): handle missing spaces in SMS
  docs: update README with torture tests
- test: add edge case tests for truncated SMS
+ test: add edge cases for Unicode symbols
  chore: move notebooks to experiments/
  ```
 
+ ## 📝 Changelog Guidelines
+
+ Update `CHANGELOG.md` in every PR under `[Unreleased]`:
+
+ ```markdown
+ ## [Unreleased]
+ ### Added
+ - New feature you added
+
+ ### Changed
+ - Behavior you modified
+
+ ### Fixed
+ - Bug you fixed
+ ```
+
+ **Categories:**
+ - **Added** - New features
+ - **Changed** - Changes to existing features
+ - **Deprecated** - Features to be removed
+ - **Removed** - Removed features
+ - **Fixed** - Bug fixes
+ - **Security** - Security fixes
+
+ ## 🛡️ Protected Branches
+
+ | Branch | Direct Push | Force Push | PR Required |
+ |--------|-------------|------------|-------------|
+ | `main` | ❌ | ❌ | ✅ (from develop only) |
+ | `develop` | ❌ | ❌ | ✅ (from feature/fix) |
+
  ## 🙏 Thank You!
 
- Every contribution helps make FinEE better for the Indian fintech community.
+ Every contribution makes FinEE better for the Indian fintech community!
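The `<type>(<scope>): <description>` format and the Types table in the new CONTRIBUTING.md lend themselves to an automated check on commit subjects. The sketch below is one possible way to do that and is not part of this commit: `ALLOWED_TYPES`, `SUBJECT_PATTERN`, and `is_conventional` are hypothetical names, and the rule that a scope contains no spaces or parentheses is an assumption rather than something the guidelines state.

```python
import re

# Types copied from the "Types" table in CONTRIBUTING.md.
ALLOWED_TYPES = ("feat", "fix", "docs", "test", "refactor", "perf", "chore")

# <type>(<scope>): <description>, where "(<scope>)" is optional.
# Assumption: a scope has no whitespace or parentheses.
SUBJECT_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r": (?P<description>\S.*)$"
)


def is_conventional(subject: str) -> bool:
    """Return True if the commit subject line matches the documented format."""
    return SUBJECT_PATTERN.match(subject) is not None


if __name__ == "__main__":
    # Subjects taken from the Examples block, plus one that should fail.
    for subject in (
        "feat(regex): add Lakhs notation support",
        "docs: update README with torture tests",
        "Added some stuff",
    ):
        print(subject, "->", is_conventional(subject))
```

Only the subject line is checked here; the optional body and footer are left unvalidated to keep the sketch short.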
scripts/release.py ADDED
@@ -0,0 +1,218 @@
+ #!/usr/bin/env python3
+ """
+ FinEE Release Script
+ ====================
+
+ Automates the release process with proper checks:
+ 1. Ensure on develop branch
+ 2. Run all tests
+ 3. Run benchmark
+ 4. Merge to main
+ 5. Create release tag
+ 6. Build and upload to PyPI
+ 7. Sync to Hugging Face
+
+ Usage:
+     python scripts/release.py 1.0.4
+     python scripts/release.py 1.0.4 --dry-run  # Preview only
+
+ Author: Ranjit Behera
+ """
+
+ import argparse
+ import subprocess
+ import sys
+ import re
+ from pathlib import Path
+
+
+ def run(cmd: str, check: bool = True, capture: bool = False):
+     """Run a shell command."""
+     print(f" $ {cmd}")
+     result = subprocess.run(cmd, shell=True, capture_output=capture, text=True)
+     if check and result.returncode != 0:
+         print(f" ❌ Command failed with exit code {result.returncode}")
+         if capture:
+             print(result.stderr)
+         sys.exit(1)
+     return result
+
+
+ def get_current_branch() -> str:
+     """Get current git branch."""
+     result = run("git branch --show-current", capture=True)
+     return result.stdout.strip()
+
+
+ def check_clean_working_dir() -> bool:
+     """Check if working directory is clean."""
+     result = run("git status --porcelain", capture=True)
+     return result.stdout.strip() == ""
+
+
+ def run_tests() -> bool:
+     """Run all tests."""
+     print("\n🧪 Running tests...")
+     result = subprocess.run("./venv/bin/pytest tests/ -v", shell=True)
+     return result.returncode == 0
+
+
+ def run_benchmark() -> bool:
+     """Run benchmark suite."""
+     print("\n📊 Running benchmark...")
+     result = subprocess.run("./venv/bin/python benchmark.py --all", shell=True)
+     return result.returncode == 0
+
+
+ def update_version(new_version: str):
+     """Update version in pyproject.toml."""
+     pyproject = Path("pyproject.toml")
+     content = pyproject.read_text()
+
+     # Find and replace version
+     pattern = r'version = "[^"]+"'
+     new_content = re.sub(pattern, f'version = "{new_version}"', content)
+
+     pyproject.write_text(new_content)
+     print(f" ✅ Updated pyproject.toml to version {new_version}")
+
+
+ def update_changelog(new_version: str):
+     """Move [Unreleased] to new version section."""
+     changelog = Path("CHANGELOG.md")
+     content = changelog.read_text()
+
+     from datetime import date
+     today = date.today().isoformat()
+
+     # Replace [Unreleased] with new version
+     content = content.replace(
+         "## [Unreleased]",
+         f"## [Unreleased]\n### Added\n- (Next features go here)\n\n### Changed\n\n### Fixed\n\n---\n\n## [{new_version}] - {today}"
+     )
+
+     changelog.write_text(content)
+     print(f" ✅ Updated CHANGELOG.md for version {new_version}")
+
+
+ def main():
+     parser = argparse.ArgumentParser(description="FinEE Release Script")
+     parser.add_argument("version", help="New version (e.g., 1.0.4)")
+     parser.add_argument("--dry-run", action="store_true", help="Preview only, no changes")
+     parser.add_argument("--skip-tests", action="store_true", help="Skip test suite")
+     parser.add_argument("--skip-benchmark", action="store_true", help="Skip benchmark")
+     args = parser.parse_args()
+
+     version = args.version
+     if not re.match(r'^\d+\.\d+\.\d+$', version):
+         print(f"❌ Invalid version format: {version}")
+         print(" Expected: X.Y.Z (e.g., 1.0.4)")
+         sys.exit(1)
+
+     print("=" * 60)
+     print(f"🚀 FinEE Release v{version}")
+     print("=" * 60)
+
+     if args.dry_run:
+         print("⚠️ DRY RUN - No changes will be made\n")
+
+     # Step 1: Check branch
+     print("\n📍 Step 1: Checking branch...")
+     branch = get_current_branch()
+     if branch != "develop":
+         print(f" ❌ Must be on 'develop' branch, currently on '{branch}'")
+         print(" Run: git checkout develop")
+         sys.exit(1)
+     print(f" ✅ On develop branch")
+
+     # Step 2: Check clean working directory
+     print("\n📍 Step 2: Checking working directory...")
+     if not check_clean_working_dir():
+         print(" ❌ Working directory not clean")
+         print(" Run: git status")
+         sys.exit(1)
+     print(" ✅ Working directory clean")
+
+     # Step 3: Run tests
+     if not args.skip_tests:
+         print("\n📍 Step 3: Running tests...")
+         if not args.dry_run:
+             if not run_tests():
+                 print(" ❌ Tests failed! Fix before releasing.")
+                 sys.exit(1)
+             print(" ✅ All tests passed")
+         else:
+             print(" ⏭️ Skipped (dry run)")
+
+     # Step 4: Run benchmark
+     if not args.skip_benchmark:
+         print("\n📍 Step 4: Running benchmark...")
+         if not args.dry_run:
+             if not run_benchmark():
+                 print(" ⚠️ Benchmark had failures (continuing...)")
+             print(" ✅ Benchmark complete")
+         else:
+             print(" ⏭️ Skipped (dry run)")
+
+     # Step 5: Update version
+     print("\n📍 Step 5: Updating version...")
+     if not args.dry_run:
+         update_version(version)
+         update_changelog(version)
+         run(f'git add pyproject.toml CHANGELOG.md')
+         run(f'git commit -m "chore: bump version to {version}"')
+         run('git push origin develop')
+     else:
+         print(f" ⏭️ Would update to version {version}")
+
+     # Step 6: Merge to main
+     print("\n📍 Step 6: Merging to main...")
+     if not args.dry_run:
+         run('git checkout main')
+         run('git pull origin main')
+         run('git merge develop --no-edit')
+         run('git push origin main')
+     else:
+         print(" ⏭️ Would merge develop → main")
+
+     # Step 7: Create tag
+     print("\n📍 Step 7: Creating release tag...")
+     if not args.dry_run:
+         run(f'git tag -a v{version} -m "Release v{version}"')
+         run(f'git push origin v{version}')
+     else:
+         print(f" ⏭️ Would create tag v{version}")
+
+     # Step 8: Build and upload to PyPI
+     print("\n📍 Step 8: Building and uploading to PyPI...")
+     if not args.dry_run:
+         run('rm -rf dist/')
+         run('./venv/bin/python -m build')
+         run('./venv/bin/python -m twine upload dist/*')
+     else:
+         print(" ⏭️ Would build and upload to PyPI")
+
+     # Step 9: Return to develop
+     print("\n📍 Step 9: Returning to develop branch...")
+     if not args.dry_run:
+         run('git checkout develop')
+         run('git merge main')
+         run('git push origin develop')
+     else:
+         print(" ⏭️ Would checkout develop")
+
+     # Done
+     print("\n" + "=" * 60)
+     if args.dry_run:
+         print("✅ DRY RUN COMPLETE - No changes made")
+         print(f" Run without --dry-run to release v{version}")
+     else:
+         print(f"🎉 RELEASE v{version} COMPLETE!")
+         print("\nLinks:")
+         print(f" PyPI: https://pypi.org/project/finee/{version}/")
+         print(" GitHub: https://github.com/Ranjitbehera0034/Finance-Entity-Extractor/releases")
+     print("=" * 60)
+
+
+ if __name__ == "__main__":
+     main()
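One detail of `update_version` in `scripts/release.py` is worth noting: its pattern `version = "[^"]+"` is unanchored and applied to every match, so it would also rewrite any other key that ends in `version = "..."`, such as a `target-version = "py310"` entry under a tool section. Whether the project's `pyproject.toml` contains such a key is not visible in this diff. The sketch below shows a stricter variant as an illustration only; `bump_version` is a hypothetical name, and the sample file contents in the usage block are made up.

```python
import re


def bump_version(pyproject_text: str, new_version: str) -> str:
    """Rewrite only the first line that starts with `version = "..."`.

    Anchoring with ^ (plus re.MULTILINE) and count=1 leaves keys such as
    `target-version` untouched.
    """
    updated, replacements = re.subn(
        r'^version = "[^"]+"',
        f'version = "{new_version}"',
        pyproject_text,
        count=1,
        flags=re.MULTILINE,
    )
    if replacements != 1:
        raise ValueError("no line starting with 'version = ' was found")
    return updated


if __name__ == "__main__":
    # Hypothetical pyproject.toml contents, for illustration only.
    sample = (
        "[project]\n"
        'name = "finee"\n'
        'version = "1.0.3"\n'
        "\n"
        "[tool.ruff]\n"
        'target-version = "py310"\n'
    )
    print(bump_version(sample, "1.0.4"))
```

Keeping the function pure (text in, text out) also makes it easy to unit test without touching the real `pyproject.toml`.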