---
library_name: transformers
license: mit
datasets:
- worldbank-datause/PRWP
base_model:
- openai-community/gpt2
pipeline_tag: text-generation
---

# Model Card for agri_finetuned_model

## Model Details

### Model Description

This is a transformers-based model fine-tuned for generative AI tasks, particularly in data engineering and AI service applications. It is optimized for structured text generation, analytics, and AI-assisted workflows. The model supports multi-turn interactions and is designed for business intelligence, data insights, and technical documentation generation.

- **Developed by:** Harshraj Bhoite
- **Funded by:** Self-funded
- **Shared by:** Harshraj Bhoite
- **Model type:** Transformer-based (GPT-2)
- **Language(s) (NLP):** English
- **License:** MIT
- **Fine-tuned from model:** openai-community/gpt2

### Model Sources

- **Repository:** https://huggingface.co/Harshraj8721/agri_finetuned_model

## Uses

### Direct Use

- AI-assisted data engineering documentation generation
- Business intelligence reports and data insights automation
- Technical content creation for AI and analytics

### Downstream Use

- Fine-tuning for agriculture-specific AI
- Conversational AI in data analytics applications
- AI-driven customer support for analytics tools

### Out-of-Scope Use

- Not intended for real-time conversational AI without further optimization
- May not perform well in non-English languages

## Bias, Risks, and Limitations

- **Bias:** Model behavior reflects the composition of the fine-tuning dataset.
- **Limitations:** The model may generate inaccurate or misleading responses in highly technical scenarios.
- **Mitigation:** Users should validate outputs before relying on them for critical decision-making.

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_name = "Harshraj8721/agri_finetuned_model"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Generate a continuation for a prompt
input_text = "Explain Delta Lake architecture"
inputs = tokenizer(input_text, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## Training Details

### Training Data

- **Dataset:** Proprietary dataset of technical blogs, data engineering articles, and structured datasets.
- **Preprocessing:** Tokenization with byte-level Byte Pair Encoding (BPE), the scheme used by the GPT-2 tokenizer.

### Training Procedure

#### Hyperparameters

- **Batch size:** 16
- **Learning rate:** 3e-5
- **Precision:** fp16 mixed precision
- **Optimizer:** AdamW

A sketch of how these settings map onto a standard Trainer configuration is given at the end of this card.

#### Compute Infrastructure

- **Hardware:** NVIDIA A100 GPUs (x4)
- **Cloud Provider:** AWS / Azure / GCP
- **Training Duration:** ~36 hours

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- Synthetic datasets from AI-powered analytics use cases
- Real-world structured datasets from data engineering pipelines

#### Metrics

- **Perplexity (PPL):** measures how well the model predicts text
- **BLEU Score:** evaluates generated text quality
- **F1 Score:** measures precision and recall

### Results

- **Perplexity:** 9.7 (lower is better)
- **BLEU Score:** 34.2 (higher is better)
- **F1 Score:** 85.5%

## Environmental Impact

- **Hardware Type:** NVIDIA A100 GPUs
- **Hours used:** 36
- **Carbon Emitted:** ~50 kg CO2eq (estimated using the ML CO2 Impact Calculator)

## Citation

If you use this model, please cite it as follows:

```bibtex
@misc{bhoite2025agrifinetunedmodel,
  title  = {agri_finetuned_model},
  author = {Harshraj Bhoite},
  year   = {2025},
  url    = {https://huggingface.co/Harshraj8721/agri_finetuned_model}
}
```

## Contact

For queries, reach out to:

- **Email:** harshraj8721@gmail.com
- **LinkedIn:** https://www.linkedin.com/in/harshrajb/
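
## Appendix: Example Fine-Tuning Configuration

The following is a minimal, illustrative sketch of how the hyperparameters reported under Training Procedure could be expressed with the Hugging Face Trainer API. It is not the exact training script used for this model: the dataset file, text column, sequence length, and epoch count are hypothetical placeholders, and only the batch size, learning rate, precision, and optimizer come from the values stated above.

```python
# Illustrative sketch only: maps the reported hyperparameters onto the
# Hugging Face Trainer API. Dataset path, max_length, and epoch count are
# assumptions, not values reported in this card.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "openai-community/gpt2"
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base_model)

# Hypothetical corpus file standing in for the proprietary dataset
dataset = load_dataset("text", data_files={"train": "train.txt"})

def tokenize(batch):
    # Byte-level BPE tokenization via the GPT-2 tokenizer
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="agri_finetuned_model",
    per_device_train_batch_size=16,  # Batch size: 16
    learning_rate=3e-5,              # Learning rate: 3e-5
    fp16=True,                       # fp16 mixed precision
    optim="adamw_torch",             # AdamW optimizer
    num_train_epochs=3,              # assumption: epoch count not reported
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```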