---

tags:
- code
- programming
- dataset
pretty_name: "Coding Dataset"
---


# Coding Dataset

Dataset of annotated code snippets for training AI coding agents. The current release is a 6-example demo.

## Dataset Summary

- **Total Examples**: 6 (demo)
- **Languages**: Python, JavaScript, Java
- **Task Types**: Code Generation
- **License**: CC0-1.0

## Dataset Structure

### Data Splits

- train: 70% of data
- validation: 15% of data
- test: 15% of data
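
The card does not say how the splits are produced or how the percentages are rounded for small datasets. As a minimal sketch, assuming the validation and test sizes are rounded to the nearest integer and train receives the remainder (the function name `split_sizes` is hypothetical), the split sizes could be computed like this:

```python
def split_sizes(n_examples, val_frac=0.15, test_frac=0.15):
    """Return (train, validation, test) sizes for a 70/15/15 split.

    Assumption: validation and test are rounded to the nearest integer
    and train gets whatever is left, so the sizes always sum to n_examples.
    """
    n_val = round(n_examples * val_frac)
    n_test = round(n_examples * test_frac)
    n_train = n_examples - n_val - n_test
    return n_train, n_val, n_test


print(split_sizes(6))
print(split_sizes(1000))
```

Under this rounding assumption, the 6-example demo would yield 4 train, 1 validation, and 1 test example.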

### Features

- `id` (string): Unique identifier
- `code` (string): Source code snippet
- `code_description` (string): Natural language description
- `programming_language` (string): Language (python, javascript, java, etc.)
- `task_type` (string): Type of task
- `difficulty_level` (string): Difficulty (beginner, intermediate, advanced, expert)
- `quality_score` (float): Quality score from 0.0 to 1.0
- `is_tested` (bool): Whether the code has been tested
- `has_bugs` (bool): Whether the code has known bugs
- `lines_of_code` (int): Number of lines
- `collected_at` (string): Collection timestamp
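
The feature list above can be turned into a simple record check. The following is a sketch in plain Python, not part of the dataset tooling: the names `EXPECTED_TYPES`, `DIFFICULTY_LEVELS`, and `validate_record` are hypothetical, and the sample record is invented for illustration.

```python
# Field names and types follow the feature list; string -> str, float -> float, etc.
EXPECTED_TYPES = {
    "id": str,
    "code": str,
    "code_description": str,
    "programming_language": str,
    "task_type": str,
    "difficulty_level": str,
    "quality_score": float,
    "is_tested": bool,
    "has_bugs": bool,
    "lines_of_code": int,
    "collected_at": str,
}

# The four difficulty values named in the feature list.
DIFFICULTY_LEVELS = {"beginner", "intermediate", "advanced", "expert"}


def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif type(record[name]) is not expected:
            problems.append(f"wrong type for {name}")
    score = record.get("quality_score")
    if type(score) is float and not 0.0 <= score <= 1.0:
        problems.append("quality_score out of range")
    level = record.get("difficulty_level")
    if type(level) is str and level not in DIFFICULTY_LEVELS:
        problems.append("unknown difficulty_level")
    return problems


# Hypothetical record, invented for this example only.
sample = {
    "id": "demo-0001",
    "code": "def add(a, b):\n    return a + b",
    "code_description": "Add two numbers.",
    "programming_language": "python",
    "task_type": "code_generation",
    "difficulty_level": "beginner",
    "quality_score": 0.9,
    "is_tested": True,
    "has_bugs": False,
    "lines_of_code": 2,
    "collected_at": "2025-10-25",
}

print(validate_record(sample))
```

The type check is deliberately strict (`type(...) is`), so a `bool` is not accepted where an `int` is expected.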

## Usage

```python
from datasets import load_dataset

# Load dataset
dataset = load_dataset("romcmu863/code-dataset")

# Access splits
train = dataset['train']
validation = dataset['validation']
test = dataset['test']

# Get first example
example = train[0]
print(example['code_description'])
print(example['code'])
```
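
The metadata fields can also be used to select a subset of examples. The sketch below runs on hypothetical in-memory rows so that it works without downloading anything; the predicate name `is_clean_python` and the 0.8 quality threshold are assumptions chosen for illustration, not recommendations from the dataset authors.

```python
def is_clean_python(example, min_quality=0.8):
    """Keep tested, bug-free Python examples at or above a quality threshold."""
    return (
        example["programming_language"] == "python"
        and example["is_tested"]
        and not example["has_bugs"]
        and example["quality_score"] >= min_quality
    )


# Hypothetical rows that only carry the fields the predicate reads.
rows = [
    {"id": "a", "programming_language": "python", "is_tested": True,
     "has_bugs": False, "quality_score": 0.95},
    {"id": "b", "programming_language": "java", "is_tested": True,
     "has_bugs": False, "quality_score": 0.90},
    {"id": "c", "programming_language": "python", "is_tested": True,
     "has_bugs": True, "quality_score": 0.85},
    {"id": "d", "programming_language": "python", "is_tested": False,
     "has_bugs": False, "quality_score": 0.99},
]

selected = [row["id"] for row in rows if is_clean_python(row)]
print(selected)
```

With the dataset loaded as in the snippet above, the same predicate could be passed to `train.filter(is_clean_python)`, since `Dataset.filter` calls the function once per example.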

## License

CC0-1.0

## Created

2025-10-25