Ranjit Behera committed on
Commit 438d5f9 · 1 Parent(s): 9101d7e

docs: Update CHANGELOG for v1.1.0 release

  1. CHANGELOG.md +36 -4
CHANGELOG.md CHANGED
@@ -9,11 +9,43 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  ### Added
  - (Next features go here)
 
- ### Changed
- - (Changes to existing features)
+ ---
 
- ### Fixed
- - (Bug fixes)
+ ## [1.1.0] - 2026-01-12
+ ### Added
+ - **Complete Data Pipeline** (`scripts/data_pipeline/`)
+   - `step1_unify.py`: Unifies MBOX, JSON, CSV, XML sources
+   - `step2_filter.py`: Removes OTPs, spam, marketing messages
+   - `step3_baseline.py`: Tests regex extractor accuracy
+   - `step4_label.py`: Creates labeled training data with ground truth
+
+ - **Synthetic Data Generator**
+   - `generate_synthetic.py`: Production-grade grammar-based generator
+     - 100K+ realistic Indian bank transactions
+     - All major banks (HDFC, ICICI, SBI, Axis, Kotak, PNB, BOB, etc.)
+     - Brokerages (Zerodha, Groww, Upstox, Angel One, 5Paisa, etc.)
+     - E-commerce, food, travel, utilities, entertainment categories
+   - `generate_advanced.py`: Advanced features
+     - Markov Chain for realistic message flow
+     - Real data calibration from actual samples
+     - Multilingual support (Hindi, Tamil, Telugu, Bengali, Kannada)
+     - Data augmentation and edge case oversampling
+
+ - **LLM Fine-tuning Pipeline** (`scripts/finetune.py`)
+   - Supports MLX (Apple Silicon) and PyTorch backends
+   - LoRA fine-tuning with automatic data preparation
+   - Model fusion and evaluation utilities
+
+ ### Performance
+ - Trained on 152,519 records (2,419 real + 100K synthetic + 50K multilingual)
+ - Val loss: 2.42 → 0.46 (81% reduction)
+ - 100% JSON parsing accuracy on test cases
+ - Multilingual extraction working (Hindi, Tamil, Telugu, Bengali, Kannada)
+ - Fine-tuned model: 7.6GB (Phi-3-mini + LoRA fused)
+
+ ### Models
+ - Fine-tuned model: `finetuned-v1/` on Hugging Face
+ - LoRA adapters: `lora-adapters/` on Hugging Face
 
  ---
 
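
As a quick sanity check on the Performance entry, the Python sketch below recomputes the validation-loss reduction from the two quoted values (2.42 and 0.46). It assumes the percentage is measured relative to the starting loss; the helper name `relative_reduction` is illustrative and is not part of the repository.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fractional drop from `before` to `after` (assumed definition)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Values quoted in the changelog's Performance section.
start_loss = 2.42
final_loss = 0.46

reduction = relative_reduction(start_loss, final_loss)
print(f"Val loss reduction: {reduction:.0%}")
```

Under that assumption the result rounds to the 81% stated in the changelog.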