---
library_name: transformers
license: mit
datasets:
- worldbank-datause/PRWP
base_model:
- openai-community/gpt2
pipeline_tag: text-generation
---

# Model Card for agri_finetuned_model

## Model Details

### Model Description

This is a transformers-based model fine-tuned for generative AI tasks, particularly in data engineering and AI service applications. It is optimized for structured text generation, analytics, and AI-assisted workflows. The model supports multi-turn interactions and is designed for business intelligence, data insights, and technical documentation generation.

- **Developed by:** Harshraj Bhoite
- **Funded by:** Self-funded
- **Shared by:** Harshraj Bhoite
- **Model type:** Transformer-based (GPT-2)
- **Language(s) (NLP):** English
- **License:** MIT
- **Fine-tuned from model:** openai-community/gpt2

### Model Sources

- **Repository:** https://huggingface.co/Harshraj8721/agri_finetuned_model

## Uses

### Direct Use

- AI-assisted data engineering documentation generation
- Business intelligence reports and data insights automation
- Technical content creation for AI and analytics

### Downstream Use

- Fine-tuning for agriculture-specific AI
- Conversational AI in data analytics applications
- AI-driven customer support for analytics tools

### Out-of-Scope Use

- Not intended for real-time conversational AI without further optimization
- May not perform well in non-English languages

## Bias, Risks, and Limitations

- **Bias:** Model behavior reflects the composition of the fine-tuning dataset.
- **Limitations:** The model may generate inaccurate or misleading responses in highly technical scenarios.
- **Mitigation:** Users should validate outputs before relying on them for critical decision-making.

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_name = "Harshraj8721/agri_finetuned_model"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Generate a continuation for a prompt
input_text = "Explain Delta Lake architecture"
inputs = tokenizer(input_text, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## Training Details

### Training Data

- **Dataset:** Proprietary dataset of technical blogs, data engineering articles, and structured datasets.
- **Preprocessing:** Tokenization with byte-level Byte Pair Encoding (BPE), the scheme used by the GPT-2 tokenizer.

### Training Procedure

#### Hyperparameters

- **Batch size:** 16
- **Learning rate:** 3e-5
- **Precision:** fp16 mixed precision
- **Optimizer:** AdamW

A sketch of how these settings map onto a standard Trainer configuration is given at the end of this card.

#### Compute Infrastructure

- **Hardware:** NVIDIA A100 GPUs (x4)
- **Cloud Provider:** AWS / Azure / GCP
- **Training Duration:** ~36 hours

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- Synthetic datasets from AI-powered analytics use cases
- Real-world structured datasets from data engineering pipelines

#### Metrics

- **Perplexity (PPL):** measures how well the model predicts text
- **BLEU Score:** evaluates generated text quality
- **F1 Score:** measures precision and recall

### Results

- **Perplexity:** 9.7 (lower is better)
- **BLEU Score:** 34.2 (higher is better)
- **F1 Score:** 85.5%

## Environmental Impact

- **Hardware Type:** NVIDIA A100 GPUs
- **Hours used:** 36
- **Carbon Emitted:** ~50 kg CO2eq (estimated using the ML CO2 Impact Calculator)

## Citation

If you use this model, please cite it as follows:

```bibtex
@misc{bhoite2025agrifinetunedmodel,
  title  = {agri_finetuned_model},
  author = {Harshraj Bhoite},
  year   = {2025},
  url    = {https://huggingface.co/Harshraj8721/agri_finetuned_model}
}
```

## Contact

For queries, reach out to:

- **Email:** harshraj8721@gmail.com
- **LinkedIn:** https://www.linkedin.com/in/harshrajb/
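
## Appendix: Example Fine-Tuning Configuration

The following is a minimal, illustrative sketch of how the hyperparameters reported under Training Procedure could be expressed with the Hugging Face Trainer API. It is not the exact training script used for this model: the dataset file, text column, sequence length, and epoch count are hypothetical placeholders, and only the batch size, learning rate, precision, and optimizer come from the values stated above.

```python
# Illustrative sketch only: maps the reported hyperparameters onto the
# Hugging Face Trainer API. Dataset path, max_length, and epoch count are
# assumptions, not values reported in this card.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "openai-community/gpt2"
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base_model)

# Hypothetical corpus file standing in for the proprietary dataset
dataset = load_dataset("text", data_files={"train": "train.txt"})

def tokenize(batch):
    # Byte-level BPE tokenization via the GPT-2 tokenizer
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="agri_finetuned_model",
    per_device_train_batch_size=16,  # Batch size: 16
    learning_rate=3e-5,              # Learning rate: 3e-5
    fp16=True,                       # fp16 mixed precision
    optim="adamw_torch",             # AdamW optimizer
    num_train_epochs=3,              # assumption: epoch count not reported
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```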