---
language:
- en
license: mit
tags:
- invoice-extraction
- structured-data
- phi-3
- sft
- text-generation
- document-understanding
- financial-nlp
datasets:
- custom-invoice-dataset
pipeline_tag: text-generation
base_model: microsoft/Phi-3-mini-4k-instruct
---

# BrahmaNet: Phi-3 SFT for Invoice Extraction

<div align="center">

![BrahmaNet Logo](https://img.shields.io/badge/BrahmaNet-Invoice%20Extraction-blue)
![Phi-3](https://img.shields.io/badge/Base%20Model-Phi--3-green)
![SFT](https://img.shields.io/badge/Method-Supervised%20Fine--Tuning-orange)

</div>

## Model Description

BrahmaNet is a language model fine-tuned from Microsoft's Phi-3-mini-4k-instruct with supervised fine-tuning (SFT) to extract structured information from invoice documents. Given the text of an invoice, it converts the unstructured content into structured JSON output.

- Developed by: Gokul Alex
- Model type: Causal Language Model
- Language(s): English
- License: MIT
- Finetuned from model: [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)

## Uses

### Direct Use

This model is designed to extract structured information from invoice text, including:
- Invoice numbers and dates
- Supplier/vendor information
- Total amounts and line items
- Customer details
- Payment terms
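This card does not document the exact JSON schema the model emits, and generated text may contain extra tokens before or after the JSON object. A minimal post-processing sketch, assuming the output contains one JSON object: it pulls the first parseable object out of the generated text with Python's standard `json` module and reports which expected fields are absent. `parse_invoice_json`, `missing_fields`, and the keys in `EXPECTED_FIELDS` are hypothetical placeholders, not the model's documented output format.

```python
import json
from typing import Any

# Hypothetical field names: this card does not document the output schema.
EXPECTED_FIELDS = ("invoice_number", "date", "supplier", "total_amount")


def parse_invoice_json(generated_text: str) -> dict[str, Any]:
    """Return the first JSON object that can be parsed from generated_text.

    raw_decode() parses one JSON value from the start of a string and ignores
    whatever follows it, so extra text after the object is tolerated.
    """
    decoder = json.JSONDecoder()
    start = generated_text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(generated_text[start:])
            return obj  # a value starting with "{" always decodes to a dict
        except json.JSONDecodeError:
            # Not a valid object at this brace; try the next one.
            start = generated_text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


def missing_fields(
    record: dict[str, Any], expected: tuple[str, ...] = EXPECTED_FIELDS
) -> list[str]:
    """List expected keys that the extracted record does not contain."""
    return [key for key in expected if key not in record]


if __name__ == "__main__":
    text = 'JSON: {"invoice_number": "INV-2023-001", "total_amount": "$1,250.00"} extra'
    record = parse_invoice_json(text)
    print(record["invoice_number"])  # INV-2023-001
    print(missing_fields(record))    # ['date', 'supplier']
```

Scanning brace by brace, rather than slicing between the first `{` and last `}`, keeps a stray brace in the surrounding text from breaking the parse.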

### Downstream Use

The model can be fine-tuned further for:
- Receipt processing
- Purchase order extraction
- Financial document analysis
- Custom structured data extraction tasks
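The card does not describe its training data format. One option when fine-tuning further for one of the tasks above is to reuse the prompt from the quick-start example below and pair it with a target JSON string. The sketch assumes that template and a `prompt`/`completion` record layout, a common convention for SFT datasets; `build_prompt` and `build_sft_example` are hypothetical helpers, and the receipt in the usage lines is made up.

```python
import json
from typing import Any

# Template copied from this card's quick-start prompt. Whether BrahmaNet was
# trained on exactly this wording is an assumption.
PROMPT_TEMPLATE = "Extract invoice information as JSON:\n\nDocument: {document}\n\nJSON:"


def build_prompt(document: str) -> str:
    """Insert raw document text into the prompt template."""
    return PROMPT_TEMPLATE.format(document=document)


def build_sft_example(document: str, target: dict[str, Any]) -> dict[str, str]:
    """Return one prompt/completion record for supervised fine-tuning.

    The completion starts with a space so it reads naturally after "JSON:".
    """
    return {
        "prompt": build_prompt(document),
        "completion": " " + json.dumps(target, ensure_ascii=False),
    }


if __name__ == "__main__":
    # Made-up receipt, for illustration only.
    example = build_sft_example(
        "Receipt No: R-17, Date: 2024-01-05, Store: Example Mart, Total: $42.10",
        {"receipt_number": "R-17", "date": "2024-01-05", "total_amount": "$42.10"},
    )
    print(example["prompt"])
    print(example["completion"])
```

Reusing one template for both training and inference keeps the prompts the model sees consistent between the two.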

### Out-of-Scope Use

- General-purpose chat or conversation
- Mathematical reasoning beyond basic arithmetic
- Legal document analysis
- Extraction of medical or other sensitive personal information

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "gokulalex/BrahmaNet"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Prepare prompt
prompt = """Extract invoice information as JSON:

Document: Invoice Number: INV-2023-001, Date: 2023-10-15, Supplier: ABC Corporation, Total Amount: $1,250.00

JSON:"""

# Tokenize and move the tensors to the device the model was placed on
inputs = tokenizer(prompt, return_tensors="pt", max_length=512, truncation=True).to(model.device)

# Generate response (passes attention_mask along with input_ids)
outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.3,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, skipping the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)
```