# ABIRMARv1 — Marathi-First Transformer Language Model
ABIRMARv1 is a custom transformer-based causal language model designed specifically for Marathi language understanding and generation. It is trained and optimized for Marathi text using curated Indic datasets and a custom tokenizer to ensure efficient and accurate Marathi NLP performance.
This model focuses on delivering strong contextual understanding, efficient inference, and scalable Marathi AI capabilities.
# Model Details
## Model Description
ABIRMARv1 is a decoder-only transformer language model designed for Marathi text generation and understanding. It builds upon Marathi-focused datasets and architecture optimizations to provide reliable Marathi NLP performance.
- Developed by: Abir Maheshwari
- Funded by: Independent Research
- Shared by: Abir Maheshwari
- Model type: Causal Language Model (Decoder-only Transformer)
- Language(s): Marathi
- License: MIT
- Base model: none (trained from scratch)
## Model Sources
- Repository: https://huggingface.co/abirmaheshwari/abirmarv1
- Datasets used:
  - ai4bharat/IndicCorpV2
  - ai4bharat/Bhasha-Abhijnaanam
- Framework: PyTorch + HuggingFace Transformers
# Uses
## Direct Use
ABIRMARv1 is suitable for:
- Marathi text generation
- Marathi conversational AI
- Marathi chatbots
- Text completion
- NLP research
- Educational purposes
Example applications:
- Marathi AI assistants
- Marathi content generation
- Marathi NLP research
## Downstream Use
This model can be fine-tuned for:
- Marathi instruction models
- Question answering
- Domain-specific Marathi NLP tasks
- Conversational AI systems
## Out-of-Scope Use
Not recommended for:
- Medical advice
- Legal advice
- Safety-critical systems
- High-risk decision systems
This is an early-stage research model.
# Bias, Risks, and Limitations
ABIRMARv1 may:
- Produce incorrect or incomplete outputs
- Reflect biases present in training data
- Generate nonsensical responses in complex scenarios
These limitations are expected for models trained on limited or domain-specific datasets.
## Recommendations
Use this model:
- For research
- For experimentation
- For Marathi AI development
- For fine-tuning and improvement
Not recommended for production use without further fine-tuning.
# How to Get Started

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("abirmaheshwari/abirmarv1")
model = AutoModelForCausalLM.from_pretrained("abirmaheshwari/abirmarv1")

input_text = "महाराष्ट्र हा भारतातील एक महत्त्वाचा राज्य आहे कारण"
inputs = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,  # temperature has no effect under default greedy decoding
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
---
# Training Details
## Training Data
ABIRMARv1 was trained using curated Marathi language datasets designed to provide strong linguistic coverage and contextual understanding.
The training datasets include:
- ai4bharat/IndicCorpV2
- ai4bharat/Bhasha-Abhijnaanam
These datasets contain high-quality Marathi text covering multiple domains, enabling robust Marathi language modeling.
---
## Training Procedure
### Preprocessing
The dataset was processed using a custom-trained Byte Pair Encoding (BPE) tokenizer optimized for Marathi language modeling.
Tokenizer specifications:
- Vocabulary size: 32,000 tokens
- Maximum sequence length: 512 tokens
- Tokenizer trained from scratch on Marathi-focused datasets
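For intuition, BPE builds its vocabulary by repeatedly merging the most frequent adjacent symbol pair in the corpus. The toy sketch below (Latin characters for readability, not the actual ABIRMARv1 tokenizer, which would normally be trained with a dedicated tokenizer library) shows one merge step:

```python
from collections import Counter

def most_frequent_pair(corpus):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in corpus.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(corpus, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in corpus.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Word frequencies, each word pre-split into characters (toy data).
corpus = {("l", "o", "w"): 5, ("l", "o", "t"): 3, ("l", "o", "w", "e", "r"): 2}
pair = most_frequent_pair(corpus)  # ("l", "o") occurs 10 times
corpus = merge_pair(corpus, pair)
print(corpus)  # {('lo', 'w'): 5, ('lo', 't'): 3, ('lo', 'w', 'e', 'r'): 2}
```

A real training run repeats this merge step until the target vocabulary size (here, 32,000) is reached.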
---
### Training Hyperparameters
The model was trained using the following configuration:
- Optimizer: AdamW
- Learning rate: 5e-5
- Precision: FP16 mixed precision
- Training objective: Causal Language Modeling
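The causal language modeling objective scores the logits at position t against the token at position t+1. A minimal PyTorch sketch of one optimizer step under the card's configuration (the embedding-plus-head model below is a stand-in for the real transformer, and in FP16 training the forward pass and loss would additionally be wrapped in `torch.autocast`):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d_model, seq_len, batch = 32000, 64, 16, 2

# Stand-in model: embedding + linear head (the real model has
# transformer blocks in between).
embed = nn.Embedding(vocab, d_model)
head = nn.Linear(d_model, vocab, bias=False)
optimizer = torch.optim.AdamW(
    list(embed.parameters()) + list(head.parameters()),
    lr=5e-5,  # learning rate from the card
)

input_ids = torch.randint(0, vocab, (batch, seq_len))
logits = head(embed(input_ids))  # (batch, seq, vocab)

# Shifted cross-entropy: position t predicts the token at position t+1.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab),
    input_ids[:, 1:].reshape(-1),
)
loss.backward()
optimizer.step()
```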
---
### Training Hardware
Training was performed using GPU acceleration.
- GPU: NVIDIA GPU (CUDA-enabled)
- Framework: PyTorch
- Library: HuggingFace Transformers
---
# Evaluation
## Testing Data
Evaluation was conducted using Marathi text samples representative of real-world Marathi language usage.
---
## Metrics
Evaluation metrics included:
- BLEU score
- Training loss monitoring
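As a reference for how BLEU is computed, here is a minimal sentence-level sketch (geometric mean of modified n-gram precisions times a brevity penalty, no smoothing, whitespace tokenization); the actual evaluation would more likely use an established implementation such as `sacrebleu`:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    """Minimal sentence-level BLEU with brevity penalty, no smoothing."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        overlap = sum((cand & ref).values())  # clipped n-gram matches
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_avg)

cand = "महाराष्ट्र हे एक राज्य आहे".split()
ref = "महाराष्ट्र हे एक मोठे राज्य आहे".split()
score = bleu(cand, ref)  # ~0.71: perfect unigrams, 3/4 bigrams, short-candidate penalty
```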
---
## Results
ABIRMARv1 demonstrates successful learning of:
- Marathi sentence structure
- Context-aware text generation
- Marathi token relationships
- Language continuity and coherence
The model provides functional Marathi generation capability suitable for research and fine-tuning applications.
---
# Technical Specifications
## Architecture
ABIRMARv1 uses a decoder-only Transformer architecture consisting of:
- Token embedding layer
- Learned positional embeddings
- Multi-head self-attention layers
- Feedforward neural network layers
- GELU activation function
- Weight tying between embedding and output layers
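Weight tying means the output projection reuses the token embedding matrix, saving `vocab × d_model` parameters. A PyTorch sketch (the hidden size is illustrative; the card does not state it):

```python
import torch.nn as nn

vocab, d_model = 32000, 64  # d_model here is a placeholder value

class TinyDecoderSkeleton(nn.Module):
    """Embedding and LM head only, to show the tying; the real model
    has attention and feedforward blocks in between."""
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, d_model)
        self.lm_head = nn.Linear(d_model, vocab, bias=False)
        self.lm_head.weight = self.tok_emb.weight  # tie: one shared matrix

model = TinyDecoderSkeleton()
# PyTorch deduplicates shared parameters, so the tied matrix counts once.
n_params = sum(p.numel() for p in model.parameters())
```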
---
## Model Size
- Total parameters: ~96 Million
- Context length: 512 tokens
- Vocabulary size: 32,000 tokens
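To put the parameter budget in perspective: with the 32,000-token vocabulary and a hypothetical hidden size of 768 (not stated in the card, used purely for illustration), the tied embedding matrix alone would account for roughly a quarter of the ~96M total:

```python
vocab = 32_000
d_model = 768  # hypothetical hidden size, for illustration only
emb_params = vocab * d_model
print(emb_params)  # 24576000 — counted once thanks to weight tying
```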
---
# Compute Infrastructure
## Hardware
- NVIDIA GPU
---
## Software
- Python
- PyTorch
- HuggingFace Transformers
- SafeTensors
---
# Environmental Impact
Training specifications:
- Hardware type: NVIDIA GPU
- Training duration: ~3–4 hours
- Framework: PyTorch
---
# Author
Abir Maheshwari
Independent AI Researcher
HuggingFace Profile:
https://huggingface.co/abirmaheshwari
---
# Version
ABIRMARv1
Initial release version.
---
# Contact
For research inquiries, collaboration, or technical questions:
HuggingFace:
https://huggingface.co/abirmaheshwari
---