---
language:
- en
license: mit
tags:
- invoice-extraction
- structured-data
- phi-3
- sft
- text-generation
- document-understanding
- financial-nlp
datasets:
- custom-invoice-dataset
pipeline_tag: text-generation
base_model: microsoft/Phi-3-mini-4k-instruct
---

# BrahmaNet: Phi-3 SFT for Invoice Extraction

<div align="center">

![BrahmaNet Logo](https://img.shields.io/badge/BrahmaNet-Invoice%20Extraction-blue)
![Phi-3](https://img.shields.io/badge/Base%20Model-Phi--3-green)
![SFT](https://img.shields.io/badge/Method-Supervised%20Fine--Tuning-orange)

</div>

## Model Description

**BrahmaNet** is a supervised fine-tune (SFT) of Microsoft's Phi-3-mini-4k-instruct, specialized for extracting structured information from invoice documents. It takes unstructured invoice text as input and returns the extracted fields as JSON.

- **Developed by:** Gokul Alex
- **Model type:** Causal Language Model
- **Language(s):** English
- **License:** MIT
- **Finetuned from model:** [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)

## Uses

### Direct Use

This model is designed to extract structured information from invoice documents, including:
- Invoice numbers and dates
- Supplier/vendor information
- Total amounts and line items
- Customer details
- Payment terms
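
Because the model emits JSON as free text, downstream code usually needs to locate and parse the JSON object in the generated string before using it. The following is a minimal sketch of that step, not part of the released model. The helper names `extract_json_object` and `missing_keys`, and the key names in `EXPECTED_KEYS`, are assumptions for illustration; this card does not document the exact output schema.

```python
import json

# Hypothetical key names: the model card does not specify the output schema.
EXPECTED_KEYS = ("invoice_number", "date", "supplier", "total_amount")


def extract_json_object(text: str) -> dict:
    """Return the first parseable JSON object in `text`, or raise ValueError."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any trailing text after it.
            obj, _ = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


def missing_keys(record: dict, expected=EXPECTED_KEYS) -> list:
    """List the expected keys that are absent from the parsed record."""
    return [key for key in expected if key not in record]
```

A caller could pass the decoded generation to `extract_json_object` and then use `missing_keys` to decide whether to retry or flag the invoice for manual review.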

### Downstream Use

The model can be fine-tuned further for:
- Receipt processing
- Purchase order extraction
- Financial document analysis
- Custom structured data extraction tasks

### Out-of-Scope Use

- General purpose chat or conversation
- Mathematical reasoning beyond basic arithmetic
- Legal document analysis
- Medical or sensitive personal information extraction

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "gokulalex/BrahmaNet"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)

# Prepare prompt
prompt = """Extract invoice information as JSON:

Document: Invoice Number: INV-2023-001, Date: 2023-10-15, Supplier: ABC Corporation, Total Amount: $1,250.00

JSON:"""

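# Generate response
inputs = tokenizer(prompt, return_tensors="pt", max_length=512, truncation=True)
inputs = inputs.to(model.device)  # keep inputs on the same device as the model
outputs = model.generate(
    **inputs,  # passes both input_ids and attention_mask
    max_new_tokens=150,
    do_sample=True,
    temperature=0.3,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id
)

# Decode only the newly generated tokens, not the echoed prompt
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(response)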