---
tags:
- code
- programming
- dataset
pretty_name: "Coding Dataset"
license: cc0-1.0
---
# Coding Dataset
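A dataset of code snippets paired with natural-language descriptions and quality metadata, intended for training AI coding agents. The current release is a 6-example demo.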
## Dataset Summary
- **Total Examples**: 6 (demo)
- **Languages**: Python, JavaScript, Java
- **Task Types**: Code Generation
- **License**: CC0-1.0
## Dataset Structure
### Data Splits
- train: 70% of data
- validation: 15% of data
- test: 15% of data
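
As a rough illustration of how a 70/15/15 ratio translates into example counts, the sketch below computes split sizes with integer arithmetic. The function name `split_sizes` and the choice to assign the remainder to the test split are assumptions made for illustration; the procedure actually used to split this dataset is not documented here.

```python
def split_sizes(n_examples: int, train_pct: int = 70, val_pct: int = 15) -> dict:
    """Return example counts for a train/validation/test split.

    Percentages are whole numbers. Whatever is left after train and
    validation goes to test (an assumption, not the documented procedure).
    """
    if n_examples < 0:
        raise ValueError("n_examples must be non-negative")
    if train_pct < 0 or val_pct < 0 or train_pct + val_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")

    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = n_examples * train_pct // 100
    n_val = n_examples * val_pct // 100
    n_test = n_examples - n_train - n_val
    return {"train": n_train, "validation": n_val, "test": n_test}


print(split_sizes(100))  # {'train': 70, 'validation': 15, 'test': 15}
```

For very small datasets the percentages do not map onto whole examples, so actual split sizes may differ from the nominal ratios.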
### Features
- `id` (string): Unique identifier
- `code` (string): Source code snippet
- `code_description` (string): Natural language description
- `programming_language` (string): Language (python, javascript, java, etc.)
- `task_type` (string): Type of task (e.g. code generation)
- `difficulty_level` (string): Difficulty (beginner, intermediate, advanced, expert)
- `quality_score` (float): Quality score from 0.0 to 1.0
- `is_tested` (bool): Whether the code has been tested
- `has_bugs` (bool): Whether the code has known bugs
- `lines_of_code` (int): Number of lines in the snippet
- `collected_at` (string): Collection timestamp
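
The field list above can be turned into a simple record check. The following is a minimal sketch in plain Python, assuming the types and value ranges are exactly as listed; `validate_record` and the sample record (including its `id` and timestamp values) are hypothetical and not part of the dataset.

```python
ALLOWED_DIFFICULTIES = {"beginner", "intermediate", "advanced", "expert"}

# Field names and types as listed in the Features section.
EXPECTED_TYPES = {
    "id": str,
    "code": str,
    "code_description": str,
    "programming_language": str,
    "task_type": str,
    "difficulty_level": str,
    "quality_score": float,
    "is_tested": bool,
    "has_bugs": bool,
    "lines_of_code": int,
    "collected_at": str,
}


def validate_record(record: dict) -> list:
    """Return a list of human-readable problems; empty means the record passes."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if expected in (int, float) and isinstance(value, bool):
            problems.append(f"{name}: expected {expected.__name__}, got bool")
        elif not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )

    score = record.get("quality_score")
    if isinstance(score, float) and not 0.0 <= score <= 1.0:
        problems.append("quality_score: must be between 0.0 and 1.0")

    level = record.get("difficulty_level")
    if isinstance(level, str) and level not in ALLOWED_DIFFICULTIES:
        problems.append(f"difficulty_level: unexpected value {level!r}")
    return problems


# Hypothetical record, made up for illustration only.
sample_record = {
    "id": "example-0001",
    "code": "def add(a, b):\n    return a + b",
    "code_description": "Add two numbers.",
    "programming_language": "python",
    "task_type": "code_generation",
    "difficulty_level": "beginner",
    "quality_score": 0.9,
    "is_tested": True,
    "has_bugs": False,
    "lines_of_code": 2,
    "collected_at": "2025-10-25T00:00:00Z",
}

print(validate_record(sample_record))  # []
```

The check is deliberately strict: an integer `quality_score` such as `1` is reported as a type problem, which may or may not match how the data is actually stored.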
## Usage
```python
from datasets import load_dataset
# Load dataset
dataset = load_dataset("romcmu863/code-dataset")
# Access splits
train = dataset['train']
validation = dataset['validation']
test = dataset['test']
# Get first example
example = train[0]
print(example['code_description'])
print(example['code'])
```
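
A common next step is to keep only examples that meet some quality bar. The sketch below filters rows by language, quality score, and the `is_tested` / `has_bugs` flags using plain Python. The helper `select_examples`, the 0.8 threshold, and the sample rows are assumptions for illustration, not values taken from the dataset.

```python
from typing import Iterable, Iterator


def select_examples(
    rows: Iterable[dict], language: str, min_quality: float = 0.8
) -> Iterator[dict]:
    """Yield rows in the given language that are tested, bug-free,
    and score at least `min_quality`."""
    for row in rows:
        if (
            row["programming_language"] == language
            and row["quality_score"] >= min_quality
            and row["is_tested"]
            and not row["has_bugs"]
        ):
            yield row


# Hypothetical rows using a subset of the dataset's fields.
sample_rows = [
    {"id": "a", "programming_language": "python", "quality_score": 0.95,
     "is_tested": True, "has_bugs": False},
    {"id": "b", "programming_language": "python", "quality_score": 0.60,
     "is_tested": True, "has_bugs": False},
    {"id": "c", "programming_language": "java", "quality_score": 0.90,
     "is_tested": True, "has_bugs": False},
    {"id": "d", "programming_language": "python", "quality_score": 0.99,
     "is_tested": True, "has_bugs": True},
]

selected = list(select_examples(sample_rows, "python"))
print([row["id"] for row in selected])  # ['a']
```

Because iterating over a split yields one dict per row, the same helper can be applied to a loaded split, for example `list(select_examples(train, "python"))`.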
## License
CC0-1.0
## Created
2025-10-25