---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- conversational-ai
- code-generation
- python
- gpt-neo
- instruction-following
- codesearchnet
base_model: EleutherAI/gpt-neo-1.3B
datasets:
- OpenAssistant/oasst1
- code_search_net
model-index:
- name: gpt-neo-1.3b-code-conversation
  results:
  - task:
      type: text-generation
    dataset:
      type: code_search_net
      name: CodeSearchNet Python
    metrics:
    - type: loss
      value: 0.4554
      name: Training Loss
---

# GPT-Neo 1.3B Enhanced for Code and Conversation

A fine-tuned version of GPT-Neo 1.3B for conversational AI and Python code generation. The model combines instruction-following with Python programming knowledge through multi-layer fine-tuning, where "layer" refers to a sequential training stage rather than a network layer.

## Model Description

- **Base Model**: EleutherAI/gpt-neo-1.3B
- **Fine-tuning Approach**: Multi-layer sequential training
- **Specializations**: Conversation + Python Code Generation

### Training Layers:

1. **Conversational Foundation**: Fine-tuned on high-quality dialogue data for instruction-following
2. **Code Specialization**: Enhanced with 362,059 Python code examples from the CodeSearchNet dataset
3. **Integration**: Maintains conversational abilities while adding strong coding capabilities

## Training Details

- **Architecture**: GPT-Neo 1.3B (transformer-based autoregressive language model)
- **Training Infrastructure**: European HPC systems with AMD GPU acceleration
- **Distributed Training**: Multi-GPU setup with gradient accumulation
- **Final Training Loss**: 0.4554
- **CodeSearchNet Dataset**: 362,059 Python code-documentation pairs
- **Training Duration**: ~6 hours on 8x AMD MI250X GPUs
- **Optimization**: AdamW optimizer with a cosine annealing learning-rate schedule

## Capabilities

### Code Generation

- **Python Functions**: Complete implementations with proper documentation
- **Algorithm Development**: Data structures, algorithms, and problem-solving
- **Code Explanation**: Clear explanations of functionality and logic
- **Documentation**: Automatic docstring and comment generation

### Conversational AI

- **Instruction Following**: Responds appropriately to coding requests
- **Technical Explanations**: Breaks down complex programming concepts
- **Problem Solving**: Helps debug and optimize code solutions
- **Educational Content**: Teaches programming concepts step-by-step

## Usage Examples

### Python Code Generation

```python
from transformers import GPTNeoForCausalLM, GPT2Tokenizer

model = GPTNeoForCausalLM.from_pretrained("raimondskrauklis/gpt-neo-1.3b-code-conversation")
tokenizer = GPT2Tokenizer.from_pretrained("raimondskrauklis/gpt-neo-1.3b-code-conversation")
# The GPT-2 tokenizer has no padding token by default, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token

# Code generation example
prompt = "Human: Write a Python function that calculates the factorial of a number\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_length=200, temperature=0.7, do_sample=True, pad_token_id=tokenizer.eos_token_id)
# The decoded text contains the prompt followed by the model's continuation
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Code Explanation

```python
# Reuses `model` and `tokenizer` from the first example
prompt = "Human: Explain how binary search works in Python\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=300, temperature=0.7, do_sample=True, pad_token_id=tokenizer.eos_token_id)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Debugging Assistance

```python
# Reuses `model` and `tokenizer` from the first example
prompt = "Human: Why does this Python code give a list index error?\ncode: for i in range(len(data)+1): print(data[i])\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=250, temperature=0.7, do_sample=True, pad_token_id=tokenizer.eos_token_id)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

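The examples above all wrap the request in a `Human: ...\nAssistant:` template and print the full decoded text, which includes the prompt. The helper below is a minimal sketch of building that prompt and keeping only the reply. The function names `build_prompt` and `extract_reply` are hypothetical, and cutting the output at the next `Human:` turn is an assumption, not documented behavior of this model.

```python
def build_prompt(instruction: str) -> str:
    """Wrap a user request in the Human/Assistant template used by the examples."""
    return f"Human: {instruction.strip()}\nAssistant:"


def extract_reply(decoded: str, prompt: str, stop: str = "\nHuman:") -> str:
    """Return only the assistant's reply from the decoded generation.

    `decoded` is expected to be the prompt followed by the continuation.
    Cutting at `stop` is an assumption: it drops any follow-up turn the
    model may invent after its answer.
    """
    reply = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    cut = reply.find(stop)
    if cut != -1:
        reply = reply[:cut]
    return reply.strip()


prompt = build_prompt("Write a Python function that reverses a string")
# Stand-in for tokenizer.decode(outputs[0], skip_special_tokens=True)
decoded = prompt + " def reverse(s):\n    return s[::-1]\nHuman: Thanks!"
print(extract_reply(decoded, prompt))
```

With a real generation, `decoded` would be the `response` string produced in the examples above.
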
## Training Methodology

### Multi-Layer Fine-tuning Strategy

- **Base Selection**: Started with EleutherAI's GPT-Neo 1.3B pre-trained model
- **Layer 1 - Conversational**: Fine-tuned on dialogue data for instruction-following
- **Layer 2 - Code Enhancement**: Specialized training on the CodeSearchNet Python dataset
- **Quality Assurance**: Rigorous filtering for high-quality code-documentation pairs

### Technical Implementation

- **Distributed Training**: 8x AMD MI250X GPUs with proper CPU-GPU affinity
- **Batch Configuration**: Per-device batch size of 4 with gradient accumulation
- **Learning Rate**: 5e-6 with cosine annealing schedule
- **Sequence Length**: 512 tokens maximum
- **Epochs**: 2 epochs over the full dataset

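To illustrate the learning-rate setting above, here is a minimal sketch of a cosine annealing schedule starting at 5e-6. The total step count (2 epochs of 11,315 batches), the absence of warmup, and the minimum rate of 0 are assumptions for illustration; the card does not state them, and `cosine_lr` is a hypothetical helper rather than the training script's actual scheduler.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float = 5e-6, min_lr: float = 0.0) -> float:
    """Learning rate at `step` under cosine annealing from peak_lr down to min_lr."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Assumed step count: 2 epochs x 11,315 batches, one optimizer step per batch
total_steps = 2 * 11_315
for step in (0, total_steps // 2, total_steps):
    print(step, f"{cosine_lr(step, total_steps):.2e}")
```

Under these assumptions the rate starts at 5e-6, passes through half that value at the midpoint, and reaches zero at the final step.
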
## Performance Metrics

- **Training Loss Progression**: 0.9556 → 0.4554
- **Dataset Coverage**: 362,059 Python code examples
- **Training Efficiency**: ~11,315 batches per epoch
- **Model Size**: ~5.3GB (2x safetensors files)
- **Context Length**: 512 tokens during fine-tuning (the base architecture supports 2,048)

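The batch count above is consistent with the batch configuration given under Technical Implementation. The arithmetic below is a quick check, assuming one training process per listed GPU and that a "batch" means one global batch across all devices; neither is stated explicitly in this card.

```python
import math

num_examples = 362_059   # CodeSearchNet Python pairs (from this card)
num_devices = 8          # assumption: one process per listed GPU
per_device_batch = 4     # per-device batch size (from this card)

global_batch = num_devices * per_device_batch
batches_per_epoch = math.ceil(num_examples / global_batch)
print(global_batch, batches_per_epoch)  # 32 11315
```
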
## Limitations

- **Language Focus**: Primarily trained on Python code (limited coverage of other programming languages)
- **Code Complexity**: Best performance on functions under 100 lines
- **Validation Required**: Generated code should be tested before production use
- **Knowledge Cutoff**: Training data reflects pre-2024 coding practices
- **Context Window**: Fine-tuning used sequences of up to 512 tokens, so prompt and generated text together should stay within that length

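One way to respect the 512-token limit is to trim the prompt before generation so that prompt plus new tokens fit the window. The sketch below works on a plain list of token ids; `fit_to_window` is a hypothetical helper, and keeping the most recent tokens (rather than the earliest) is an assumption about what matters most in the prompt.

```python
def fit_to_window(token_ids: list[int], max_new_tokens: int, context_len: int = 512) -> list[int]:
    """Keep the most recent prompt tokens so prompt + generation fits in context_len."""
    budget = context_len - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than context_len")
    # Slicing from the end leaves short prompts unchanged
    return token_ids[-budget:]


# Stand-in for tokenizer(prompt)["input_ids"] on an overly long prompt
ids = list(range(600))
kept = fit_to_window(ids, max_new_tokens=200)
print(len(kept), kept[0], kept[-1])  # 312 288 599
```

Note that trimming from the front can remove the leading `Human:` marker, so shortening the request text itself may be preferable when that is possible.
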
## Ethical Considerations

- **Code Review**: All generated code should be reviewed for security and correctness
- **Bias Awareness**: May reflect biases present in training data
- **Responsible Use**: Not intended for malicious code generation
- **Attribution**: Based on open-source datasets and models

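As a first automated step before human review, generated code can at least be checked for syntactic validity without executing it. This is a minimal sketch using the standard-library `ast` module; `is_valid_python` is a hypothetical helper, and passing this check says nothing about security or correctness.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python; the code is never executed."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        return False
    return True


print(is_valid_python("def f(n):\n    return n + 1"))  # True
print(is_valid_python("def f(n) return n + 1"))        # False
```
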
## Technical Specifications

- **Model Type**: Causal Language Model (GPT-Neo architecture)
- **Parameters**: 1.3 billion
- **Vocabulary Size**: 50,257 tokens
- **Hidden Size**: 2,048
- **Attention Heads**: 16
- **Layers**: 24
- **Context Length**: 2,048 tokens (training used 512)

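The parameter count and the ~5.3GB model size reported in this card can be roughly cross-checked from the specifications above. The estimate below is a back-of-the-envelope sketch: it assumes the common approximation of 12·d² weights per transformer layer (4·d² for attention, 8·d² for an MLP with 4x expansion), tied input and output embeddings, and float32 storage, and it ignores biases and layer-norm parameters.

```python
hidden = 2_048      # hidden size (from this card)
layers = 24         # transformer layers (from this card)
vocab = 50_257      # vocabulary size (from this card)
positions = 2_048   # context length (from this card)

per_layer = 12 * hidden**2                 # assumed: 4*d^2 attention + 8*d^2 MLP
embeddings = (vocab + positions) * hidden  # token + position embeddings
approx_params = layers * per_layer + embeddings

print(f"{approx_params:,}")                            # 1,315,080,192
print(f"{approx_params * 4 / 1e9:.2f} GB in float32")  # 5.26 GB in float32
```

Both figures land close to the stated 1.3 billion parameters and ~5.3GB on disk.
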
## Citation

```bibtex
@misc{gpt-neo-code-conversation-2025,
  title={GPT-Neo 1.3B Enhanced for Code and Conversation},
  author={Raimonds Krauklis},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/raimondskrauklis/gpt-neo-1.3b-code-conversation},
  note={Fine-tuned on European HPC infrastructure using CodeSearchNet dataset}
}
```

## Acknowledgments

- **Base Model**: EleutherAI for GPT-Neo 1.3B
- **Dataset**: CodeSearchNet by GitHub/Microsoft Research
- **Infrastructure**: European high-performance computing systems
- **Framework**: Hugging Face Transformers and PyTorch ecosystem

## Model Card Contact

For questions about this model, please open an issue in the model repository or contact the author through Hugging Face.