---
language:
- en
license: mit
tags:
- invoice-extraction
- structured-data
- phi-3
- sft
- text-generation
- document-understanding
- financial-nlp
datasets:
- custom-invoice-dataset
pipeline_tag: text-generation
base_model: microsoft/Phi-3-mini-4k-instruct
---
# BrahmaNet: Phi-3 SFT for Invoice Extraction



## Model Description
**BrahmaNet** is a specialized language model fine-tuned from Microsoft's Phi-3-mini-4k-instruct for extracting structured information from invoice documents. The model is optimized to understand invoice formats and convert unstructured text into well-structured JSON output.
- **Developed by:** Gokul Alex
- **Model type:** Causal Language Model
- **Language(s):** English
- **License:** MIT
- **Finetuned from model:** [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
## Uses
### Direct Use
This model is designed to extract structured information from invoice documents, including:
- Invoice numbers and dates
- Supplier/vendor information
- Total amounts and line items
- Customer details
- Payment terms
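
As a rough illustration of how downstream code might consume these fields, the sketch below pulls the first JSON object out of a generated string and reports which expected keys are missing. The key names (`invoice_number`, `date`, `supplier`, `total_amount`) and the helpers `parse_invoice_json` and `missing_keys` are hypothetical. This card does not specify the model's output schema, so adjust them to match what the model actually emits.

```python
import json

# Hypothetical key names; the model's real output schema may differ.
EXPECTED_KEYS = ("invoice_number", "date", "supplier", "total_amount")


def parse_invoice_json(generated_text: str) -> dict:
    """Return the first JSON object found in generated_text, or {} if none parses."""
    decoder = json.JSONDecoder()
    start = generated_text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index
            obj, _ = decoder.raw_decode(generated_text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not valid JSON at this brace; try the next one
        start = generated_text.find("{", start + 1)
    return {}


def missing_keys(record: dict) -> list:
    """List the expected keys that are absent from the parsed record."""
    return [key for key in EXPECTED_KEYS if key not in record]


sample = 'JSON: {"invoice_number": "INV-2023-001", "date": "2023-10-15", "supplier": "ABC Corporation"}'
record = parse_invoice_json(sample)
print(record["invoice_number"])  # INV-2023-001
print(missing_keys(record))      # ['total_amount']
```

Returning an empty dict when nothing parses keeps the caller's handling simple: a record with every expected key missing can be routed to manual review instead of raising mid-batch.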
### Downstream Use
The model can be fine-tuned further for:
- Receipt processing
- Purchase order extraction
- Financial document analysis
- Custom structured data extraction tasks
### Out-of-Scope Use
The model is not intended for:
- General purpose chat or conversation
- Mathematical reasoning beyond basic arithmetic
- Legal document analysis
- Medical or sensitive personal information extraction
## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load model and tokenizer
model_name = "gokulalex/BrahmaNet"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
# Prepare prompt
prompt = """Extract invoice information as JSON:
Document: Invoice Number: INV-2023-001, Date: 2023-10-15, Supplier: ABC Corporation, Total Amount: $1,250.00
JSON:"""
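# Tokenize and move the tensors to the device the model was placed on
inputs = tokenizer(prompt, return_tensors="pt", max_length=512, truncation=True).to(model.device)
# Generate response
outputs = model.generate(
    **inputs,  # passes both input_ids and attention_mask
    max_new_tokens=150,
    do_sample=True,
    temperature=0.3,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
# The decoded text contains the prompt followed by the generated JSON
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)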