Instruction Tuning with Pythia
Ludy Hasby Aulia
[Model Page] [Notebook]
Instruction Tuning, image taken from LM PRO
This project focuses on fine-tuning the large language model 'EleutherAI/pythia-410m' to enhance its ability to generate accurate and relevant responses to instruction-based prompts. By leveraging instruction-tuning techniques, we aim to:
- Reduce hallucinations and unwanted outputs
- Improve consistency and reliability in generated answers
- Enhance data privacy for company-specific use cases
- Lower operational costs by optimizing model performance
Fine-tuning also enables the model to better align with domain-specific requirements and organizational standards.
Key Libraries Used:
- PyTorch: For efficient deep learning model training and optimization
- Transformers: For state-of-the-art NLP model architectures and utilities
- Llama Library (Lamini): For streamlined instruction-tuning workflows
This repo contains:
- Fine Tune Model Tokenization
- Fine Tune Model Trainer
- Lamini Docs Dataset
- Notebook Model Development
- Inference App with HuggingFace
Usage and License Notices: The dataset is licensed CC BY by Lamini
- Overview
- LLM Selected
- Dataset Design and Preparation
- Fine Tuning Strategy
- Evaluation and Benchmarking
- Practical Implementation
Overview
Large Language Models (LLMs) have shown impressive generalization capabilities such as in-context learning and chain-of-thought reasoning. To enable LLMs to follow natural language instructions and complete real-world tasks, this project demonstrates the process of instruction-tuning an LLM, specifically EleutherAI/pythia-410m, to improve its ability to follow natural language instructions and generate high-quality, relevant responses. By leveraging the lamini_docs dataset, we fine-tune the base model to better align with real-world instruction-following tasks, reduce hallucinations, and enhance reliability.
Base Large Language Model
For this project, EleutherAI/pythia-410m was chosen due to the following reasons:
- Accessibility & Licensing: Pythia is fully open-source and available on Hugging Face, making it easy to use, modify, and deploy without restrictive licenses.
- Architecture: It is based on the transformer architecture, which is well-suited for understanding and generating coherent, context-aware text.
- Community Support: Pythia has strong community backing, with pre-trained weights, documentation, and integration with popular libraries like transformers.
- Performance: While smaller than some models, Pythia-410m offers a good balance between computational efficiency and output quality, making it suitable for experimentation and prototyping.
- Instruction-Tuning Compatibility: The model can be fine-tuned on instruction datasets (such as lamini_docs) to improve its ability to follow prompts and generate relevant, structured responses.
Other models like LLaMA, Mistral, or DeepSeek may offer higher performance or larger parameter sizes, but Pythia is a practical choice for projects focused on open-source, reproducibility, and ease of deployment.
Here is the EleutherAI/pythia-410m architecture:
GPTNeoXForCausalLM(
(gpt_neox): GPTNeoXModel(
(embed_in): Embedding(50304, 1024)
(emb_dropout): Dropout(p=0.0, inplace=False)
(layers): ModuleList(
(0-23): 24 x GPTNeoXLayer(
(input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(post_attention_dropout): Dropout(p=0.0, inplace=False)
(post_mlp_dropout): Dropout(p=0.0, inplace=False)
(attention): GPTNeoXAttention(
(rotary_emb): GPTNeoXRotaryEmbedding()
(query_key_value): Linear(in_features=1024, out_features=3072, bias=True)
(dense): Linear(in_features=1024, out_features=1024, bias=True)
(attention_dropout): Dropout(p=0.0, inplace=False)
)
(mlp): GPTNeoXMLP(
(dense_h_to_4h): Linear(in_features=1024, out_features=4096, bias=True)
(dense_4h_to_h): Linear(in_features=4096, out_features=1024, bias=True)
(act): GELUActivation()
)
)
)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(embed_out): Linear(in_features=1024, out_features=50304, bias=False)
)
Dataset Design and Preparation
Dataset Information
lamini_docs.jsonl contains 1,260 instruction-following examples with preferred responses about Lamini. Each line of the JSONL file has the following format:
- question: str, a natural language instruction or prompt describing the task.
- answer: str, the preferred answer to the instruction, generated by Lamini.
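For illustration, here is a minimal sketch of how such a JSONL file can be parsed; the two records below are paraphrased stand-ins, not the real file contents:

```python
import json

# Each line of lamini_docs.jsonl is a JSON object with "question" and "answer" keys.
# The two records below are illustrative stand-ins for the real file.
sample_jsonl = (
    '{"question": "Can Lamini generate technical documentation?", '
    '"answer": "Yes, Lamini can generate documentation."}\n'
    '{"question": "Why do we dream?", '
    '"answer": "Let\'s keep the discussion relevant to Lamini."}\n'
)

# One json.loads call per line reconstructs the question/answer pairs
records = [json.loads(line) for line in sample_jsonl.splitlines()]
print(len(records))                # 2
print(sorted(records[0].keys()))   # ['answer', 'question']
```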
Data Testing Example
Question input (test): Can Lamini generate technical documentation or user manuals for software projects?
Preferred answer from Lamini docs: Yes, Lamini can generate technical documentation and user manuals for software projects. It uses natural language generation techniques to create clear and concise documentation that is easy to understand for both technical and non-technical users. This can save developers a significant amount of time and effort in creating documentation, allowing them to focus on other aspects of their projects.
Data Preprocessing
Data is first loaded and then processed using the base model's tokenizer. The preprocessing steps include:
- Tokenization: Each question and answer is converted into tokens using the tokenizer from the pretrained model.
- Padding and Truncation:
- Questions are padded or truncated to a fixed length of 1000 tokens.
- Answers are padded or truncated to a fixed length of 100 tokens. This ensures all inputs and outputs have consistent shapes for efficient training.
- Train-Test Split:
After preprocessing, the dataset is split into training and testing sets to evaluate model performance.
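The padding/truncation rule can be sketched independently of the real tokenizer. `pad_or_truncate` below is a hypothetical helper illustrating the fixed-shape guarantee; in practice, the Hugging Face tokenizer's `padding`, `truncation`, and `max_length` arguments do this:

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Force a token-id sequence to exactly max_length tokens."""
    if len(token_ids) > max_length:
        return token_ids[:max_length]  # truncate overlong sequences
    # pad short sequences with the pad token id
    return token_ids + [pad_id] * (max_length - len(token_ids))

question_ids = list(range(1200))  # pretend-tokenized question, longer than the limit
answer_ids = list(range(40))      # pretend-tokenized answer, shorter than the limit

print(len(pad_or_truncate(question_ids, 1000)))  # 1000
print(len(pad_or_truncate(answer_ids, 100)))     # 100
```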
This workflow prepares the data for fine-tuning and ensures compatibility with the base model. We then build an inference pipeline that maps each input to an output, with the following steps:
- Generate Tokenization from Prompt: using model tokenizer
- Padding and Truncating : Since models expect inputs of fixed length, tokenized sequences are padded (adding special tokens to reach the required length) or truncated (cutting off tokens that exceed the maximum length). This ensures uniform input size for efficient batch processing.
- Generate Model Response
- Decode the Result from Tokenization: The output tokens produced by the model are converted back into human-readable text.
- Strip the Prompt: The decoded output often contains the original prompt followed by the model's response. To isolate the model's answer, the prompt portion is removed, leaving only the generated response for evaluation or further processing.
def inference(prompt, model, tokenizer, max_input_token=1000, max_output_token=100):
"""
Function to generate model response from prompt
"""
# Generate Tokenization from prompt
inputs = tokenizer.encode(
prompt,
return_tensors="pt",
truncation=True,
max_length=max_input_token
)
# Generate Response
device = model.device
generate_token = model.generate(
inputs.to(device),
max_new_tokens=max_output_token
)
# Decode the result from tokenization
response = tokenizer.batch_decode(generate_token,
skip_special_tokens=True)
# Strip the prompt
response = response[0][len(prompt):]
return response
Handling Irrelevant Information
To handle questions that are outside the scope of Lamini Docs, the dataset includes examples specifically designed to teach the model to respond appropriately. For instance:
Question: Why do we shiver when we're cold?
Answer: Let's keep the discussion relevant to Lamini.
Question: Why do we dream?
Answer: Let's keep the discussion relevant to Lamini.
This approach helps the model avoid answering unrelated questions and maintain focus on Lamini-related topics.
Fine-Tuning Strategy
Key Hyperparameters to Tune
learning_rate=1e-6,             # lowered learning rate to reduce the risk of overfitting
max_steps=100,                  # capped at 100 steps due to computation cost
per_device_train_batch_size=1,  # small per-device batch size; we train on CPU, not GPU
warmup_steps=1,                 # short warmup for training stability
per_device_eval_batch_size=1,   # CPU-only evaluation
optim="adamw_torch",            # AdamW optimizer
gradient_accumulation_steps=4,  # accumulate gradients to compensate for the tiny batch size
gradient_checkpointing=False,
load_best_model_at_end=True,
metric_for_best_model="eval_loss"
Training Result
Here are our training logs, which you can evaluate, or you can check our Notebook.
Potential Challenges
- Computational resources: limited RAM and GPU. Solution: choose a small base LLM (400M-1B parameters).
- Repeating answers. Solution: truncate the response.
- Bahasa Indonesia context. Solution: translate inputs to English before preprocessing, and translate responses back before returning them.
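One illustrative mitigation for repeating answers (a hypothetical heuristic, not the project's actual code) is to cut the response at the first repeated sentence:

```python
def trim_repeats(response):
    """Keep sentences up to the first exact repeat (a simple de-duplication heuristic)."""
    seen, kept = set(), []
    for sentence in response.split(". "):
        key = sentence.strip().lower().rstrip(".")
        if key in seen:
            break  # stop at the first sentence we've already emitted
        seen.add(key)
        kept.append(sentence)
    return ". ".join(kept)

text = "Lamini supports batch jobs. Lamini supports batch jobs. Lamini supports batch jobs."
print(trim_repeats(text))  # Lamini supports batch jobs
```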
Evaluation and Benchmarking
To assess the effectiveness of instruction tuning, we compare the responses generated by the baseline (pretrained) model and the fine-tuned model on both training and testing datasets. This benchmarking process highlights improvements in the model's ability to follow instructions and generate relevant answers.
Evaluation Steps:
- Select Sample Questions: Use representative questions from both the training and testing sets.
- Generate Responses: Obtain answers from the baseline model and the fine-tuned model for each question.
- Compare Outputs: Evaluate the quality, relevance, and alignment of the generated responses against the preferred answers from the Lamini Docs dataset.
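The comparison step can be sketched with a crude word-overlap score as a first-pass relevance check; the answers below are illustrative placeholders, not real model outputs:

```python
import re

def word_set(text):
    """Lowercase bag-of-words, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def side_by_side(question, baseline, finetuned, reference):
    """Compare two model answers against the reference by word overlap."""
    ref = word_set(reference)
    def overlap(answer):
        # fraction of reference words that appear in the answer
        return len(ref & word_set(answer)) / max(len(ref), 1)
    return {
        "question": question,
        "baseline_overlap": round(overlap(baseline), 2),
        "finetuned_overlap": round(overlap(finetuned), 2),
    }

report = side_by_side(
    "Can Lamini generate user manuals?",
    baseline="I like turtles.",
    finetuned="Yes, Lamini can generate user manuals and documentation.",
    reference="Yes, Lamini can generate technical documentation and user manuals.",
)
print(report)
```

In practice, human review or a stronger automatic metric should back up such a heuristic, since word overlap ignores meaning.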
Implementation for Basic Fine Tuning Pipeline
The fine-tuning pipeline can be sketched as follows:
Fine-Tuning Pipeline
The simplified implementation flow:
- Load a pre-trained open-source LLM: EleutherAI/pythia-410m
- Load the instruction dataset: lamini_docs.jsonl
- Tokenize and preprocess the data: base_model_tokenizer
- Configure training: TrainingArguments(...)
- Run the training using Hugging Face's Trainer(...) or a similar API
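Under the assumption that the flow above uses Hugging Face's standard APIs, it can be sketched as below; the output directory name and the 512-token tokenization length are assumptions made for this sketch, and dataset details are simplified:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

base_model_id = "EleutherAI/pythia-410m"

# 1. Load the pre-trained model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(base_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.pad_token = tokenizer.eos_token

# 2. Load the instruction dataset (one JSON object per line)
dataset = load_dataset("json", data_files="lamini_docs.jsonl", split="train")

# 3. Tokenize question + answer into fixed-length input_ids, with labels for causal LM
def tokenize_fn(example):
    text = example["question"] + example["answer"]
    tokens = tokenizer(text, truncation=True, padding="max_length", max_length=512)
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

tokenized = dataset.map(tokenize_fn)
splits = tokenized.train_test_split(test_size=0.1)

# 4. Training configuration (hyperparameters from the section above)
training_args = TrainingArguments(
    output_dir="lamini_docs_100_steps",  # assumed output directory
    learning_rate=1e-6,
    max_steps=100,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    warmup_steps=1,
    optim="adamw_torch",
    gradient_accumulation_steps=4,
    evaluation_strategy="steps",
    eval_steps=25,
    save_strategy="steps",
    save_steps=25,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
)

# 5. Run the training
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
)
trainer.train()
```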
Our Fine-Tuned Model
You can explore and use our fine-tuned model directly on Hugging Face:
Lamini Docs Instruction-Tuned Model
Workflow: Generate Instructions with the Fine-Tuned Model
Step-by-step guide to run inference from scratch:
Load the Fine-Tuned Model
from transformers import AutoModelForCausalLM
fine_model_id = "ludyhasby/lamini_docs_100_steps"
fine_model = AutoModelForCausalLM.from_pretrained(fine_model_id)
Load the Tokenizer
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(fine_model_id)
tokenizer.pad_token = tokenizer.eos_token
Prepare Your Instruction/Question
- Write your prompt or instruction as a string.
Preprocess the Input
- Tokenize your prompt using the loaded tokenizer.
Generate Model Response
- Pass the tokenized input to the fine-tuned model for inference.
Decode the Output
- Convert the model's output tokens back to human-readable text.
Example Usage:
prompt = "How does Lamini handle background jobs?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = fine_model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
We have also developed an interactive web interface that allows you to engage directly with our fine-tuned LLM.
You can input your own instructions or questions and receive real-time responses from the model, making it easy to explore its capabilities and evaluate its performance.
Try it now: Interact with our LLM on Hugging Face Spaces
This user-friendly interface is ideal for demonstrations, testing, and practical applications, with no coding required!
Run the Ready-to-Use Program
For a streamlined experience, simply run our provided script:
python src/load_fine_tune.py
This will automatically load the fine-tuned model and tokenizer, and prompt you for instructions.
Tip: You can further customize the inference pipeline for batch processing, web API integration, or evaluation.
Acknowledgement
This project benefits from Lamini and EleutherAI/Pythia.
Framework versions
- Transformers 4.37.2
- Pytorch 2.5.1+cpu
- Datasets 2.14.6
- Tokenizers 0.15.2
Model tree for ludyhasby/lamini_docs_100_steps
Base model
EleutherAI/pythia-410m