Update model card with comprehensive training details
README.md
CHANGED
- python
- gpt-neo
- instruction-following
- codesearchnet
base_model: EleutherAI/gpt-neo-1.3B
datasets:
- OpenAssistant/oasst1
- code_search_net
metrics:
- name: Training Loss (Final)
  type: loss
  value: 0.4554
  verified: false
- name: Dataset Size (CodeSearchNet)
  type: examples
  value: 362059
  verified: false
model-index:
- name: gpt-neo-1.3b-code-conversation
  results:
  - task:
      type: text-generation
    dataset:
      type: code_search_net
      name: CodeSearchNet Python
    metrics:
    - type: loss
      value: 0.4554
      name: Training Loss
---

# GPT-Neo 1.3B Enhanced for Code and Conversation

A fine-tuned version of GPT-Neo 1.3B optimized for both conversational AI and Python code generation. This model combines instruction-following capabilities with comprehensive Python programming knowledge through a multi-layer fine-tuning approach.

## Model Description

- **Base Model**: EleutherAI/gpt-neo-1.3B
- **Fine-tuning Approach**: Multi-layer sequential training
- **Specializations**: Conversation + Python code generation

### Training Layers

1. **Conversational Foundation**: Fine-tuned on high-quality dialogue data for instruction-following
2. **Code Specialization**: Enhanced with 362,059 Python code examples from the CodeSearchNet dataset
3. **Integration**: Maintains conversational abilities while adding strong coding capabilities

## Training Details

- **Architecture**: GPT-Neo 1.3B (transformer-based autoregressive language model)
- **Training Infrastructure**: European HPC systems with AMD GPU acceleration
- **Distributed Training**: Multi-GPU setup with gradient accumulation
- **Final Training Loss**: 0.4554 (excellent convergence)
- **CodeSearchNet Dataset**: 362,059 high-quality Python code-documentation pairs (a loading sketch follows below)
- **Training Duration**: ~6 hours on 8x AMD MI250X GPUs
- **Optimization**: AdamW optimizer with cosine annealing schedule
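For readers who want to see how such code-documentation pairs can be prepared, the sketch below pulls the Python split of CodeSearchNet with the Hugging Face `datasets` library and keeps only functions that ship with a docstring. The column names follow the public Hub dataset; the docstring filter and the text formatting are assumptions, not a record of the exact preprocessing used for this model.

```python
from datasets import load_dataset

# Python split of CodeSearchNet (newer versions of `datasets` may additionally
# require trust_remote_code=True for this script-based dataset).
raw = load_dataset("code_search_net", "python", split="train")

# Keep only functions that come with a non-empty docstring (assumed quality filter).
filtered = raw.filter(lambda ex: len(ex["func_documentation_string"].strip()) > 0)

# Pair the documentation with the full function source as a single training text.
def to_training_text(ex):
    return {"text": ex["func_documentation_string"] + "\n" + ex["whole_func_string"]}

train_texts = filtered.map(to_training_text)
print(f"{len(train_texts):,} candidate examples")
```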
|
## Capabilities

### Code Generation

- **Python Functions**: Complete implementations with proper documentation
- **Algorithm Development**: Data structures, algorithms, and problem-solving
- **Code Explanation**: Clear explanations of functionality and logic
- **Documentation**: Automatic docstring and comment generation

### Conversational AI

- **Instruction Following**: Responds appropriately to coding requests
- **Technical Explanations**: Breaks down complex programming concepts
- **Problem Solving**: Helps debug and optimize code solutions
- **Educational Content**: Teaches programming concepts step-by-step

## Usage Examples

### Python Code Generation

```python
from transformers import GPTNeoForCausalLM, GPT2Tokenizer

model = GPTNeoForCausalLM.from_pretrained("raimondskrauklis/gpt-neo-1.3b-code-conversation")
tokenizer = GPT2Tokenizer.from_pretrained("raimondskrauklis/gpt-neo-1.3b-code-conversation")
tokenizer.pad_token = tokenizer.eos_token

# Code generation example (the prompt below is an illustrative placeholder)
prompt = "Human: Write a Python function that reverses a string\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Code Explanation

```python
prompt = "Human: Explain how binary search works in Python\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=300, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Debugging Assistance

```python
prompt = "Human: Why does this Python code give a list index error?\ncode: for i in range(len(data)+1): print(data[i])\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=250, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Training Methodology

### Multi-Layer Fine-tuning Strategy

1. **Base Selection**: Started with EleutherAI's GPT-Neo 1.3B pre-trained model
2. **Layer 1 - Conversational**: Fine-tuned on dialogue data for instruction-following
3. **Layer 2 - Code Enhancement**: Specialized training on the CodeSearchNet Python dataset (sketched below)
4. **Quality Assurance**: Rigorous filtering for high-quality code-documentation pairs
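As a rough illustration of this multi-layer strategy, the sketch below chains the two stages: stage one fine-tunes the public base model on dialogue data and saves an intermediate checkpoint, and stage two resumes from that checkpoint for the CodeSearchNet pass. The local paths are placeholders, not the paths used on the HPC system.

```python
from transformers import GPTNeoForCausalLM, GPT2Tokenizer

# Stage 1: start from the public base model and fine-tune on dialogue data.
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
tokenizer.pad_token = tokenizer.eos_token
# ... run conversational fine-tuning here, then save the intermediate checkpoint ...
model.save_pretrained("./gpt-neo-1.3b-conversational")   # illustrative path

# Stage 2: resume from the conversational checkpoint and continue on CodeSearchNet.
model = GPTNeoForCausalLM.from_pretrained("./gpt-neo-1.3b-conversational")
# ... run code fine-tuning on the filtered Python code-documentation pairs ...
model.save_pretrained("./gpt-neo-1.3b-code-conversation")  # final model
```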
|
### Technical Implementation

- **Distributed Training**: 8x AMD MI250X GPUs with proper CPU-GPU affinity
- **Batch Configuration**: Per-device batch size of 4 with gradient accumulation (see the configuration sketch below)
- **Learning Rate**: 5e-6 with cosine annealing schedule
- **Sequence Length**: 512 tokens maximum
- **Epochs**: 2 epochs over the full dataset for optimal convergence
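Assuming the Hugging Face `Trainer` API, the hyperparameters above map roughly onto the `TrainingArguments` below. The gradient-accumulation factor, logging cadence, and output directory are not documented for this model, so those values are placeholders.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./gpt-neo-1.3b-code-conversation",  # assumed output path
    per_device_train_batch_size=4,   # per-GPU batch size from the card
    gradient_accumulation_steps=1,   # actual accumulation factor not documented
    learning_rate=5e-6,              # as listed above
    lr_scheduler_type="cosine",      # cosine annealing schedule
    num_train_epochs=2,              # two passes over the full dataset
    logging_steps=100,               # placeholder
    save_strategy="epoch",
)
# Sequences are truncated/padded to 512 tokens during tokenization, e.g.:
# tokenizer(example["text"], truncation=True, max_length=512, padding="max_length")
```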
|
## Performance Metrics

- **Training Loss Progression**: 0.9556 → 0.4554 (excellent convergence)
- **Dataset Coverage**: 362,059 Python code examples
- **Training Efficiency**: ~11,315 batches per epoch
- **Model Size**: ~5.3GB (2x safetensors files)
- **Context Length**: 512 tokens
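For context, the batches-per-epoch figure is consistent with the configuration above if one micro-batch per GPU is counted per step: 362,059 examples / (4 per device × 8 GPUs) ≈ 11,314 batches, in line with the ~11,315 reported.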

## Limitations

- **Language Focus**: Primarily trained on Python code (limited coverage of other programming languages)
- **Code Complexity**: Best performance on functions under 100 lines
- **Validation Required**: Generated code should be tested before production use
- **Knowledge Cutoff**: Training data reflects pre-2024 coding practices
- **Context Window**: Limited to 512 tokens for generation
## Ethical Considerations

- **Code Review**: All generated code should be reviewed for security and correctness
- **Bias Awareness**: May reflect biases present in training data
- **Responsible Use**: Not intended for malicious code generation
- **Attribution**: Based on open-source datasets and models
## Technical Specifications

- **Model Type**: Causal language model (GPT-Neo architecture)
- **Parameters**: 1.3 billion
- **Vocabulary Size**: 50,257 tokens
- **Hidden Size**: 2,048
- **Attention Heads**: 16
- **Layers**: 24
- **Context Length**: 2,048 tokens (training used 512; see the verification snippet below)
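The values above can be confirmed directly from the model configuration; the snippet below reads them with `AutoConfig` (attribute names follow the Transformers `GPTNeoConfig`).

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("raimondskrauklis/gpt-neo-1.3b-code-conversation")
print(config.hidden_size)              # 2048
print(config.num_heads)                # 16 attention heads
print(config.num_layers)               # 24 transformer blocks
print(config.vocab_size)               # 50257
print(config.max_position_embeddings)  # 2048-token context window
```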

## Citation

```bibtex
@misc{gpt-neo-code-conversation-2025,
  author={Raimonds Krauklis},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/raimondskrauklis/gpt-neo-1.3b-code-conversation},
  note={Fine-tuned on European HPC infrastructure using CodeSearchNet dataset}
}
```

## Acknowledgments

- **Base Model**: EleutherAI for GPT-Neo 1.3B
- **Dataset**: CodeSearchNet by GitHub/Microsoft Research
- **Infrastructure**: European high-performance computing systems
- **Framework**: Hugging Face Transformers and PyTorch ecosystem

## Model Card Contact

For questions about this model, please open an issue in the model repository or contact through Hugging Face.