---
tags:
- code
- programming
- dataset
pretty_name: "Coding Dataset"
---
# Coding Dataset
A small demo dataset of code snippets paired with natural-language descriptions, intended for training AI coding agents.
## Dataset Summary
- **Total Examples**: 6 (demo)
- **Languages**: Python, JavaScript, Java
- **Task Types**: Code Generation
- **License**: CC0-1.0
## Dataset Structure
### Data Splits
- train: 70% of data
- validation: 15% of data
- test: 15% of data
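
The card does not say how the splits were produced. As a minimal sketch, assuming a seeded random shuffle followed by a 70/15/15 cut, the proportions above could be reproduced as follows. The function name `split_indices` and the `seed` parameter are hypothetical and not part of the dataset tooling.

```python
import random


def split_indices(n, train_frac=0.70, val_frac=0.15, seed=0):
    """Return (train, validation, test) index lists for n examples.

    Assumption: a seeded shuffle followed by a contiguous cut. The
    dataset card does not document the actual splitting procedure.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)

    n_train = round(n * train_frac)
    n_val = round(n * val_frac)

    train = indices[:n_train]
    validation = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to test
    return train, validation, test


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

With very small inputs such as the 6-example demo, rounding decides the split sizes, so the realized proportions can only approximate 70/15/15.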
### Features
- `id` (string): Unique identifier
- `code` (string): Source code snippet
- `code_description` (string): Natural language description
- `programming_language` (string): Language (python, javascript, java, etc.)
- `task_type` (string): Type of task
- `difficulty_level` (string): Difficulty (beginner, intermediate, advanced, expert)
- `quality_score` (float): Quality score from 0.0 to 1.0
- `is_tested` (bool): Whether the code has been tested
- `has_bugs` (bool): Whether the code has known bugs
- `lines_of_code` (int): Number of lines of code in the snippet
- `collected_at` (string): Collection timestamp
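
The feature list above can be turned into a simple record check. The following is a sketch in plain Python, assuming each example arrives as a dictionary keyed by the feature names. The helper `validate_record` and the sample values are hypothetical and only illustrate the schema; they are not taken from the dataset.

```python
# Field names and types as listed in the Features section.
EXPECTED_TYPES = {
    "id": str,
    "code": str,
    "code_description": str,
    "programming_language": str,
    "task_type": str,
    "difficulty_level": str,
    "quality_score": float,
    "is_tested": bool,
    "has_bugs": bool,
    "lines_of_code": int,
    "collected_at": str,
}

DIFFICULTY_LEVELS = {"beginner", "intermediate", "advanced", "expert"}


def validate_record(record):
    """Return a list of human-readable problems (empty if the record is valid)."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it for the int field.
        if expected is int and isinstance(value, bool):
            problems.append(f"{name}: expected int, got bool")
        elif not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )

    score = record.get("quality_score")
    if isinstance(score, float) and not 0.0 <= score <= 1.0:
        problems.append("quality_score: outside 0.0-1.0")

    level = record.get("difficulty_level")
    if isinstance(level, str) and level not in DIFFICULTY_LEVELS:
        problems.append(f"difficulty_level: unknown value {level!r}")
    return problems


# Hypothetical record, made up for illustration only.
sample = {
    "id": "example-0",
    "code": "print('hello')",
    "code_description": "Print a greeting.",
    "programming_language": "python",
    "task_type": "code_generation",
    "difficulty_level": "beginner",
    "quality_score": 0.9,
    "is_tested": True,
    "has_bugs": False,
    "lines_of_code": 1,
    "collected_at": "2025-10-25",
}
print(validate_record(sample))
```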
## Usage
```python
from datasets import load_dataset

# Load all splits from the Hugging Face Hub
dataset = load_dataset("romcmu863/code-dataset")

# Access the individual splits
train = dataset['train']
validation = dataset['validation']
test = dataset['test']

# Inspect the first training example
example = train[0]
print(example['code_description'])
print(example['code'])
```
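
The metadata fields also make it possible to select a subset of examples. The sketch below is one possible approach: it defines a predicate over a single example and applies it to a few hypothetical rows in plain Python. The predicate name `is_clean_python`, the `min_quality` threshold, and the sample rows are all assumptions made for illustration.

```python
def is_clean_python(example, min_quality=0.8):
    """Keep tested, bug-free Python snippets above a quality threshold.

    The 0.8 default is an arbitrary illustrative threshold.
    """
    return (
        example["programming_language"] == "python"
        and example["is_tested"]
        and not example["has_bugs"]
        and example["quality_score"] >= min_quality
    )


# Hypothetical rows, showing only the fields the predicate reads.
rows = [
    {"id": "a", "programming_language": "python", "is_tested": True,
     "has_bugs": False, "quality_score": 0.95},
    {"id": "b", "programming_language": "java", "is_tested": True,
     "has_bugs": False, "quality_score": 0.90},
    {"id": "c", "programming_language": "python", "is_tested": True,
     "has_bugs": True, "quality_score": 0.85},
]

selected = [row["id"] for row in rows if is_clean_python(row)]
print(selected)
```

On a loaded split, the same predicate can be passed to the `filter` method of a `datasets.Dataset`, for example `train.filter(is_clean_python)`.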
## License
CC0-1.0
## Created
2025-10-25