
# ABIRMARv1 — Marathi-First Transformer Language Model

ABIRMARv1 is a custom transformer-based causal language model designed specifically for Marathi language understanding and generation. It is trained on curated Indic datasets with a custom tokenizer, optimized for efficient and accurate Marathi NLP.

The model aims to provide strong contextual understanding, efficient inference, and a foundation for further Marathi fine-tuning.


# Model Details

## Model Description

ABIRMARv1 is a decoder-only transformer language model designed for Marathi text generation and understanding. It builds upon Marathi-focused datasets and architecture optimizations to provide reliable Marathi NLP performance.

- Developed by: Abir Maheshwari
- Funded by: Independent research
- Shared by: Abir Maheshwari
- Model type: Causal language model (decoder-only Transformer)
- Language(s): Marathi
- License: MIT
- Base model: None (trained from scratch)

## Model Sources

- Repository: https://huggingface.co/abirmaheshwari/abirmarv1

# Uses

## Direct Use

ABIRMARv1 is suitable for:

- Marathi text generation
- Marathi conversational AI and chatbots
- Text completion
- NLP research and educational use

Example applications:

- Marathi AI assistants
- Marathi content generation
- Marathi NLP research

## Downstream Use

This model can be fine-tuned for:

- Marathi instruction-following models
- Question answering
- Domain-specific Marathi NLP tasks
- Conversational AI systems

## Out-of-Scope Use

Not recommended for:

- Medical or legal advice
- Safety-critical or high-risk decision systems

This is an early-stage research model.


# Bias, Risks, and Limitations

ABIRMARv1 may:

- Produce incorrect or incomplete outputs
- Reflect biases present in its training data
- Generate nonsensical responses in complex scenarios

These limitations are expected for models trained on limited or domain-specific datasets.


## Recommendations

Use this model for:

- Research and experimentation
- Marathi AI development
- Fine-tuning and further improvement

It is not recommended for production use without additional fine-tuning and evaluation.


# How to Get Started

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("abirmaheshwari/abirmarv1")
model = AutoModelForCausalLM.from_pretrained("abirmaheshwari/abirmarv1")

input_text = "महाराष्ट्र हा भारतातील एक महत्त्वाचा राज्य आहे कारण"

inputs = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=100,  # prefer max_new_tokens over the deprecated max_length
    do_sample=True,      # temperature has no effect under greedy decoding
    temperature=0.7,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

# Training Details

## Training Data

ABIRMARv1 was trained using curated Marathi language datasets designed to provide strong linguistic coverage and contextual understanding.

The training datasets include:

- ai4bharat/IndicCorpV2  
- ai4bharat/Bhasha-Abhijnaanam  

These datasets contain high-quality Marathi text covering multiple domains, enabling robust Marathi language modeling.

---

## Training Procedure

### Preprocessing

The dataset was processed using a custom-trained Byte Pair Encoding (BPE) tokenizer optimized for Marathi language modeling.

Tokenizer specifications:

- Vocabulary size: 32,000 tokens  
- Maximum sequence length: 512 tokens  
- Tokenizer trained from scratch on Marathi-focused datasets  
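
The card does not include the tokenizer-training code; the toy BPE merge loop below (pure Python, illustrative only) shows the core idea of learning frequent symbol merges from a corpus:

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    """Toy BPE: start from characters, repeatedly merge the most frequent pair."""
    vocab = Counter()
    for word in corpus.split():
        vocab[tuple(word) + ("</w>",)] += 1  # end-of-word marker
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])  # apply the merge
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

# Frequent character pairs in the corpus become single tokens.
print(learn_bpe("मराठी मराठी मराठी भाषा", 3))
```

A production tokenizer (e.g. the Hugging Face `tokenizers` library) adds byte-level fallback, normalization, and a vocabulary cap such as the 32,000 tokens listed above.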

---

### Training Hyperparameters

The model was trained using the following configuration:

- Optimizer: AdamW  
- Learning rate: 5e-5  
- Precision: FP16 mixed precision  
- Training objective: Causal Language Modeling  
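
A minimal sketch of one optimization step under the stated objective and optimizer (a toy stand-in model on CPU; the real run used fp16 mixed precision via AMP, omitted here, and all shapes below are illustrative assumptions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in: tied embedding + output head, so the causal-LM objective
# and AdamW settings are visible without the full network.
vocab_size, d_model = 100, 32
emb = nn.Embedding(vocab_size, d_model)
head = nn.Linear(d_model, vocab_size, bias=False)
head.weight = emb.weight  # weight tying, as in the model card

opt = torch.optim.AdamW(emb.parameters(), lr=5e-5)  # learning rate from the card

tokens = torch.randint(0, vocab_size, (2, 16))  # (batch, seq_len)
logits = head(emb(tokens))                      # (batch, seq_len, vocab)

# Causal LM objective: predict token t+1 from positions <= t (shift by one).
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
loss.backward()
opt.step()
print(f"loss: {loss.item():.2f}")
```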

---

### Training Hardware

Training was performed using GPU acceleration.

- GPU: NVIDIA GPU (CUDA-enabled)  
- Framework: PyTorch  
- Library: HuggingFace Transformers  

---

# Evaluation

## Testing Data

Evaluation was conducted using Marathi text samples representative of real-world Marathi language usage.

---

## Metrics

Evaluation metrics included:

- BLEU score  
- Training loss monitoring  
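
The card does not state which BLEU implementation was used; the simplified pure-Python sketch below (clipped n-gram precision with a brevity penalty) illustrates what the metric measures:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: clipped n-gram precision + brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0  # real BLEU applies smoothing here; we just short-circuit
        precisions.append(overlap / total)
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * geo_mean

cand = "महाराष्ट्र हे एक राज्य आहे".split()
ref = "महाराष्ट्र हे एक राज्य आहे".split()
print(sentence_bleu(cand, ref))  # identical sentences score 1.0
```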

---

## Results

ABIRMARv1 demonstrates successful learning of:

- Marathi sentence structure  
- Context-aware text generation  
- Marathi token relationships  
- Language continuity and coherence  

The model provides functional Marathi generation capability suitable for research and fine-tuning applications.

---

# Technical Specifications

## Architecture

ABIRMARv1 uses a decoder-only Transformer architecture consisting of:

- Token embedding layer  
- Learned positional embeddings  
- Multi-head self-attention layers  
- Feedforward neural network layers  
- GELU activation function  
- Weight tying between embedding and output layers  
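
The components above can be sketched in PyTorch; the hidden size, head count, and layer count below are small illustrative assumptions, not ABIRMARv1's actual dimensions:

```python
import torch
import torch.nn as nn

class TinyDecoderLM(nn.Module):
    # Illustrative shapes only; the real ABIRMARv1 configuration is not published here.
    def __init__(self, vocab_size=32_000, d_model=128, n_heads=4, n_layers=2, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)  # token embedding layer
        self.pos_emb = nn.Embedding(max_len, d_model)     # learned positional embeddings
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            activation="gelu", batch_first=True,          # GELU, as in the card
        )
        # Decoder-only = self-attention blocks restricted by a causal mask.
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size, bias=False)
        self.head.weight = self.tok_emb.weight            # weight tying

    def forward(self, tokens):
        _, t = tokens.shape
        x = self.tok_emb(tokens) + self.pos_emb(torch.arange(t, device=tokens.device))
        mask = nn.Transformer.generate_square_subsequent_mask(t)  # causal mask
        return self.head(self.blocks(x, mask=mask))

model = TinyDecoderLM()
logits = model(torch.randint(0, 32_000, (1, 16)))
print(tuple(logits.shape))  # (1, 16, 32000)
```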

---

## Model Size

- Total parameters: ~96 Million  
- Context length: 512 tokens  
- Vocabulary size: 32,000 tokens  
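
The ~96M total is consistent with, for example, a tied-weight configuration of hidden size 768 and 10 layers; the sketch below is a back-of-envelope check under those assumed (not published) dimensions:

```python
# Assumed dimensions (illustrative; the actual config is not published here).
vocab, d_model, n_layers, max_len = 32_000, 768, 10, 512

emb = vocab * d_model    # token embeddings, tied with the output head
pos = max_len * d_model  # learned positional embeddings
# Per block: attention Q, K, V, O projections (4*d^2) plus a 4x-wide FFN
# (8*d^2), ignoring biases and LayerNorm parameters.
per_layer = 12 * d_model ** 2
total = emb + pos + n_layers * per_layer
print(f"{total / 1e6:.1f}M parameters")  # → 95.7M parameters
```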

---

# Compute Infrastructure

## Hardware

- NVIDIA GPU  

---

## Software

- Python  
- PyTorch  
- HuggingFace Transformers  
- SafeTensors  

---

# Environmental Impact

Training specifications:

- Hardware type: NVIDIA GPU  
- Training duration: ~34 hours  
- Framework: PyTorch  

---

# Author

Abir Maheshwari  
Independent AI Researcher  

HuggingFace Profile:  
https://huggingface.co/abirmaheshwari  

---

# Version

ABIRMARv1  

Initial release version.

---

# Contact

For research inquiries, collaboration, or technical questions:

HuggingFace:  
https://huggingface.co/abirmaheshwari  

---