|
|
--- |
|
|
base_model: unsloth/gpt-oss-20b-unsloth-bnb-4bit |
|
|
tags: |
|
|
- text-generation-inference |
|
|
- transformers |
|
|
- unsloth |
|
|
- gpt_oss |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
new_version: EpistemeAI/VibeCoder-20B-alpha-0.001 |
|
|
--- |
|
|
# Model card |
|
|
|
|
|
# Test our endpoint |
|
|
[FriendliAI](https://friendli.ai/suite/WTHFpZnt6oAT/VGDaGrYOXeIm/dedicated-endpoints/depoqch056a4j4a/playground) |
|
|
|
|
|
# Summary |
|
|
This is a first-generation vibe-code alpha (preview) LLM. It is optimized to produce both natural-language and code completions directly from loosely structured, “vibe coding” prompts. Compared to earlier-generation LLMs, it has lower prompt-engineering overhead and smoother latent-space interpolation, making it easier to guide toward usable code. The following capabilities can be leveraged:
|
|
- **Agentic capabilities**: Use the native capabilities of OpenAI's gpt-oss-20b models for function calling, web browsing, Python code execution, and Structured Outputs.
|
|
- This model was trained on the [harmony response](https://github.com/openai/harmony) format and should only be used with the harmony format, as it will not work correctly otherwise.
|
|
|
|
|
# Vibe-Code LLM |
|
|
|
|
|
This is a **first-generation vibe-code LLM**. |
|
|
It’s optimized to produce both natural-language and code completions directly from loosely structured, *“vibe coding”* prompts. |
|
|
|
|
|
Unlike earlier LLMs that demanded rigid prompt engineering, vibe-code interaction lowers the overhead: you can sketch intent, describe functionality in free-form language, or mix pseudo-code with natural text. The model interpolates smoothly in latent space, making it easier to guide toward usable and executable code. |
|
|
|
|
|
--- |
|
|
|
|
|
## Key Features |
|
|
|
|
|
- **Low Prompt-Engineering Overhead** |
|
|
Accepts incomplete or intuitive instructions, reducing the need for explicit formatting or rigid templates. |
|
|
|
|
|
- **Latent-Space Interpolation** |
|
|
Transitions fluidly between natural-language reasoning and syntax-aware code generation. Produces semantically coherent code blocks even when the prompt is under-specified. |
|
|
|
|
|
- **Multi-Domain Support** |
|
|
Handles a broad range of programming paradigms: Python, JavaScript, C++, shell scripting, and pseudo-code scaffolding. |
|
|
|
|
|
- **Context-Sensitive Completion** |
|
|
Leverages attention mechanisms to maintain coherence across multi-turn coding sessions. |
|
|
|
|
|
- **Syntax-Aware Decoding** |
|
|
Biases output distribution toward syntactically valid tokens, improving out-of-the-box executability of code. |
|
|
|
|
|
- **Probabilistic Beam & Sampling Controls** |
|
|
Supports temperature scaling, top-k, and nucleus (top-p) sampling to modulate creativity vs. determinism. |
|
|
|
|
|
- **Hybrid Text + Code Responses** |
|
|
Generates inline explanations, design rationales, or docstrings alongside code for improved readability and maintainability. |
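To make the sampling controls above concrete, here is a small self-contained sketch (plain Python with toy numbers, not the model's internals) of how temperature scaling, top-k filtering, and nucleus (top-p) filtering reshape a next-token distribution:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(logits, temperature):
    # Lower temperature sharpens the distribution (more deterministic);
    # higher temperature flattens it (more creative).
    return [x / temperature for x in logits]

def top_k_filter(probs, k):
    # Keep only the k most likely tokens, then renormalize.
    kept = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]

def top_p_filter(probs, p):
    # Keep the smallest set of tokens whose cumulative probability >= p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= p:
            break
    filtered = [probs[i] if i in kept else 0.0 for i in range(len(probs))]
    total = sum(filtered)
    return [q / total for q in filtered]

logits = [2.0, 1.0, 0.5, 0.1]  # toy next-token logits
sharp = softmax(apply_temperature(logits, 0.5))   # sharper than softmax(logits)
print(top_k_filter(softmax(logits), 2))   # only the top-2 tokens survive
print(top_p_filter(softmax(logits), 0.9)) # nucleus of cumulative mass >= 0.9
```

In practice these correspond to the `temperature`, `top_k`, and `top_p` generation parameters exposed by common inference stacks.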
|
|
|
|
|
--- |
|
|
|
|
|
## Example Usage |
|
|
|
|
|
```plaintext |
|
|
Prompt: |
|
|
"make me a fast vibe function that sorts numbers but with a cool twist" |
|
|
|
|
|
Response: |
|
|
- Natural explanation of sorting method |
|
|
- Code snippet (e.g., Python quicksort variant) |
|
|
- Optional playful commentary to match the vibe |
|
|
``` |
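As a concrete (purely hypothetical) illustration, a response to the prompt above might include a code snippet along these lines: a quicksort variant with a random pivot and a playful `reverse_the_vibe` toggle. The function name and parameter are invented for this example, not actual model output.

```python
import random

def vibe_sort(numbers, reverse_the_vibe=False):
    """Quicksort with a randomly chosen pivot; set reverse_the_vibe=True
    to flip the result into descending order."""
    if len(numbers) <= 1:
        result = list(numbers)
    else:
        pivot = random.choice(numbers)
        lows = [n for n in numbers if n < pivot]
        mids = [n for n in numbers if n == pivot]
        highs = [n for n in numbers if n > pivot]
        result = vibe_sort(lows) + mids + vibe_sort(highs)
    return result[::-1] if reverse_the_vibe else result

print(vibe_sort([5, 3, 8, 1, 9, 2]))                         # [1, 2, 3, 5, 8, 9]
print(vibe_sort([5, 3, 8, 1, 9, 2], reverse_the_vibe=True))  # [9, 8, 5, 3, 2, 1]
```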
|
|
|
|
|
--- |
|
|
|
|
|
## Ideal Applications |
|
|
|
|
|
- Rapid prototyping & exploratory coding |
|
|
- Creative coding workflows with minimal boilerplate |
|
|
- Educational contexts where explanation + code matter equally |
|
|
- Interactive REPLs, notebooks, or editor assistants that thrive on loose natural-language input |
|
|
|
|
|
--- |
|
|
|
|
|
## Limitations |
|
|
|
|
|
- Not tuned for production-grade formal verification. |
|
|
- May require post-processing or linting to ensure strict compliance with project coding standards. |
|
|
- Designed for *“fast prototyping vibes”*, not for long-horizon enterprise-scale codebases. |
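A minimal post-processing sketch for the linting caveat above, assuming your pipeline receives responses with code wrapped in markdown fences (the helper names here are illustrative, not part of the model's API): extract fenced code blocks and syntax-check them before they touch your codebase.

```python
import ast
import re

# Matches fenced code blocks (optionally tagged python/py) in a response.
FENCE = re.compile(r"`{3}(?:python|py)?\n(.*?)`{3}", re.DOTALL)

def extract_code_blocks(response: str) -> list[str]:
    """Return the contents of all fenced code blocks in a model response."""
    return [m.strip() for m in FENCE.findall(response)]

def is_valid_python(code: str) -> bool:
    """Cheap lint gate: does the snippet parse as Python at all?"""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

fence = "`" * 3  # avoid writing literal fences inside this example
response = f"Sure!\n{fence}python\ndef add(a, b):\n    return a + b\n{fence}"
blocks = extract_code_blocks(response)
print(len(blocks), is_valid_python(blocks[0]))  # 1 True
```

A real pipeline would typically follow this gate with a proper linter or formatter to enforce project style.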
|
|
|
|
|
|
|
|
|
|
|
# Inference examples |
|
|
|
|
|
## Transformers |
|
|
|
|
|
You can use this model with Transformers. If you use the Transformers chat template, it will automatically apply the [harmony response format](https://github.com/openai/harmony). If you use `model.generate` directly, you need to apply the harmony format manually via the chat template or with the [openai-harmony](https://github.com/openai/harmony) package.
|
|
|
|
|
To get started, install the necessary dependencies to set up your environment:
|
|
|
|
|
```shell
|
|
pip install -U transformers kernels torch |
|
|
``` |
|
|
|
|
|
For Google Colab (free/Pro):

```shell
|
|
!pip install -q --upgrade torch |
|
|
|
|
|
!pip install -q transformers triton==3.4 kernels |
|
|
|
|
|
!pip uninstall -q torchvision torchaudio -y |
|
|
``` |
|
|
|
|
|
Once set up, you can run the model with the snippet below:
|
|
|
|
|
```py
from transformers import pipeline
import torch

model_id = "EpistemeAI/VibeCoder-20B-alpha"

pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Let’s start with the header and navigation for the landing page. Start by creating the top header section for the dashboard. We’ll add the content blocks below afterward."},
]

outputs = pipe(
    messages,
    max_new_tokens=3000,
)
print(outputs[0]["generated_text"][-1])
```
|
|
|
|
|
### Amazon SageMaker |
|
|
```py
import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'EpistemeAI/VibeCoder-20B-alpha',
    'SM_NUM_GPUS': json.dumps(1)
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="3.2.3"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

# send request
predictor.predict({
    "inputs": "Hi, what can you help me with?",
})
```
|
|
|
|
|
# Uploaded finetuned model |
|
|
|
|
|
- **Developed by:** EpistemeAI |
|
|
- **License:** apache-2.0 |
|
|
- **Finetuned from model:** unsloth/gpt-oss-20b-unsloth-bnb-4bit
|
|
|
|
|
This gpt_oss model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
|
|
|
|
|
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth) |