---
language:
- en
license: mit
tags:
- invoice-extraction
- structured-data
- phi-3
- sft
- text-generation
- document-understanding
- financial-nlp
datasets:
- custom-invoice-dataset
pipeline_tag: text-generation
base_model: microsoft/Phi-3-mini-4k-instruct
---

# BrahmaNet: Phi-3 SFT for Invoice Extraction

![BrahmaNet Logo](https://img.shields.io/badge/BrahmaNet-Invoice%20Extraction-blue) ![Phi-3](https://img.shields.io/badge/Base%20Model-Phi--3-green) ![SFT](https://img.shields.io/badge/Method-Supervised%20Fine--Tuning-orange)

## Model Description

**BrahmaNet** is a language model fine-tuned from Microsoft's Phi-3-mini-4k-instruct to extract structured information from invoice documents. It is optimized to recognize common invoice formats and convert unstructured invoice text into structured JSON output.

- **Developed by:** Gokul Alex
- **Model type:** Causal Language Model
- **Language(s):** English
- **License:** MIT
- **Finetuned from model:** [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)

## Uses

### Direct Use

This model is designed to extract structured information from invoice documents, including:

- Invoice numbers and dates
- Supplier/vendor information
- Total amounts and line items
- Customer details
- Payment terms

### Downstream Use

The model can be fine-tuned further for:

- Receipt processing
- Purchase order extraction
- Financial document analysis
- Custom structured data extraction tasks

### Out-of-Scope Use

- General-purpose chat or conversation
- Mathematical reasoning beyond basic arithmetic
- Legal document analysis
- Medical or sensitive personal information extraction

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "gokulalex/BrahmaNet"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)

# Prepare prompt
prompt = """Extract invoice information as JSON:
Document: Invoice Number: INV-2023-001, Date: 2023-10-15, Supplier: ABC Corporation, Total Amount: $1,250.00
JSON:"""

# Tokenize and move the tensors to the device the model was placed on
inputs = tokenizer(prompt, return_tensors="pt", max_length=512, truncation=True).to(model.device)

# Generate response (passing **inputs also forwards the attention mask)
outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.3,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id
)

# Decode the full sequence (prompt followed by the generated completion)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
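
# Optional post-processing: a minimal sketch, not part of the original card.
# The decoded text above contains the prompt followed by the completion, so a
# caller may want only the JSON object. `extract_first_json` is a hypothetical
# helper name introduced here for illustration. It assumes the model emits one
# JSON object somewhere in its output, and it returns None when nothing parses.
import json


def extract_first_json(text):
    """Return the first parseable JSON object found in `text`, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            obj, _ = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # This "{" did not begin valid JSON; try the next one.
            start = text.find("{", start + 1)
    return None


# Usage with the variable defined above:
#   invoice = extract_first_json(response)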