---
library_name: transformers
license: mit
datasets:
- worldbank-datause/PRWP
base_model:
- openai-community/gpt2
pipeline_tag: text-generation
---

# Model Card for agri_finetuned_model

## Model Details

### Model Description

This is a transformers-based model fine-tuned for generative AI tasks, particularly in data engineering and AI service applications. It has been optimized for structured text generation, analytics, and AI-assisted workflows. The model supports multi-turn interactions and is designed for business intelligence, data insights, and technical documentation generation.

- **Developed by:** Harshraj Bhoite
- **Funded by:** Self-funded
- **Shared by:** Harshraj Bhoite
- **Model type:** Transformer-based (GPT-2)
- **Language(s) (NLP):** English
- **License:** MIT
- **Fine-tuned from model:** openai-community/gpt2

### Model Sources

- **Repository:** https://huggingface.co/Harshraj8721/agri_finetuned_model

## Uses

### Direct Use

- AI-assisted data engineering documentation generation
- Business intelligence reports and data insights automation
- Technical content creation for AI and analytics

### Downstream Use

- Fine-tuning for agriculture-specific AI
- Conversational AI in data analytics applications
- AI-driven customer support for analytics tools

### Out-of-Scope Use

- Not intended for real-time conversational AI without further optimization
- May not perform well in non-English languages

## Bias, Risks, and Limitations

- **Bias:** Outputs reflect the composition of the fine-tuning data; topics underrepresented in that data may be handled poorly.
- **Limitations:** The model may generate inaccurate or misleading responses in highly technical scenarios.
- **Mitigation:** Users should validate outputs before relying on them for critical decision-making.

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Harshraj8721/agri_finetuned_model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

input_text = "Explain Delta Lake architecture"
inputs = tokenizer(input_text, return_tensors="pt")

# Cap the output length and silence the pad-token warning (GPT-2 has no pad token)
output = model.generate(**inputs, max_new_tokens=100, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## Training Details

### Training Data

- **Dataset:** Proprietary dataset of technical blogs, data engineering articles, and structured datasets.
- **Preprocessing:** Tokenization with GPT-2's byte-level Byte Pair Encoding (BPE).

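To illustrate the BPE idea, here is a toy sketch of a single merge step: count adjacent symbol pairs and merge the most frequent one. This is illustrative only; the actual GPT-2 tokenizer applies a fixed, pre-learned merge table rather than learning merges at encode time.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent symbol pairs and return the most frequent one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]

def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = list("aababcabc")                # start from individual characters
pair = most_frequent_pair(tokens)         # ('a', 'b') occurs three times
tokens = merge_pair(tokens, pair)
print(tokens)                             # -> ['a', 'ab', 'ab', 'c', 'ab', 'c']
```

Repeating this step until a target vocabulary size is reached is, in essence, how a BPE vocabulary is learned.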
### Training Procedure

#### Hyperparameters

- **Batch size:** 16
- **Learning rate:** 3e-5
- **Precision:** fp16 mixed precision
- **Optimizer:** AdamW

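For reference, these settings map onto a standard PyTorch training step. A minimal sketch with a placeholder module (assuming `torch` is installed; the real run would use the GPT-2 model, a dataloader, and fp16 autocast rather than the dummy loss shown here):

```python
import torch

model = torch.nn.Linear(8, 8)  # placeholder standing in for the GPT-2 model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)  # AdamW, lr 3e-5 as above

batch = torch.randn(16, 8)           # batch size 16
loss = model(batch).pow(2).mean()    # dummy loss for illustration
loss.backward()
optimizer.step()
optimizer.zero_grad()
```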
#### Compute Infrastructure

- **Hardware:** NVIDIA A100 GPUs (x4)
- **Cloud Provider:** AWS / Azure / GCP
- **Training Duration:** ~36 hours

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- Synthetic datasets from AI-powered analytics use cases
- Real-world structured datasets from data engineering pipelines

#### Metrics

- **Perplexity (PPL):** Measures how well the model predicts the next token; lower is better
- **BLEU Score:** Measures n-gram overlap between generated and reference text; higher is better
- **F1 Score:** Harmonic mean of precision and recall

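Of these, perplexity follows directly from the model's loss: it is the exponential of the mean per-token cross-entropy. A minimal sketch of the relationship (the loss values here are illustrative, not taken from this evaluation):

```python
import math

def perplexity(token_nll):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(sum(token_nll) / len(token_nll))

losses = [2.1, 2.4, 2.3, 2.2]        # hypothetical per-token losses
print(round(perplexity(losses), 2))  # -> 9.49
```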
#### Results

- **Perplexity:** 9.7 (lower is better)
- **BLEU Score:** 34.2 (higher is better)
- **F1 Score:** 85.5%

## Environmental Impact

- **Hardware Type:** NVIDIA A100 GPUs
- **Hours used:** 36
- **Carbon Emitted:** ~50 kg CO2eq (estimated using the ML CO2 Impact calculator)

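The figure above is consistent with a simple energy-times-intensity estimate. A back-of-the-envelope sketch, in which the GPU power draw and grid carbon intensity are assumptions rather than measured values:

```python
gpus = 4
power_kw = 0.4   # assumed average A100 board power draw, kW
hours = 36
intensity = 0.9  # assumed grid carbon intensity, kg CO2eq per kWh

energy_kwh = gpus * power_kw * hours  # 57.6 kWh
co2_kg = energy_kwh * intensity       # ~51.8 kg CO2eq
print(round(co2_kg, 1))               # -> 51.8
```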
## Citation

If you use this model, please cite it as follows:

```bibtex
@misc{bhoite2025agrifinetunedmodel,
  title={agri\_finetuned\_model},
  author={Harshraj Bhoite},
  year={2025},
  url={https://huggingface.co/Harshraj8721/agri_finetuned_model}
}
```

## Contact

For queries, reach out to:

- **Email:** harshraj8721@gmail.com
- **LinkedIn:** https://www.linkedin.com/in/harshrajb/