Why use Prompt Engineering to trigger AI control operations instead of using function calls?
Prompt engineering places very high demands on a model's instruction-following ability, whereas function calling, once the model is specifically trained for it, achieves more precise invocation and broader generalization.
https://github.com/gitker/-GELab-Zero I changed the prompt engineering to function calls, and it seems to work well with Qwen3 VL 30B A3B Instruct.
This isn’t prompt engineering. We trained the model using this format, and if we change it, we can’t guarantee there will be no performance degradation.
At the very beginning we did consider using tool calls or JSON as the output format, but we chose this format mainly to save tokens. In the future, we’ll try injecting the ability to follow different output formats into the model.
I tested this model and found that it often fails to output the format specified in the prompts, leading to errors during Python execution. When I replaced UI operations with function calls and connected them to other multimodal models that support function calls, format errors rarely occurred, and UI operations were more precise. I believe function calls are a more generalizable ability. If the model is trained to operate UIs through function calls, it would be much easier to modify the output format later as needed.
It’s essentially the same thing. In the GitHub code, the parser is responsible for parsing the model’s output into a structured dict. If needed, you can add a wrapper to do some additional conversion.
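A minimal sketch of that idea: a parser that turns a raw action line into a structured dict, plus a wrapper that converts it into an OpenAI-style tool call. The `click(x=100, y=200)` action format here is a hypothetical stand-in, not the project's actual trained format.

```python
import json
import re

def parse_action(raw: str) -> dict:
    """Parse a model output line like 'click(x=100, y=200)' into a dict.

    The action syntax here is hypothetical -- substitute the format
    your model was actually trained on.
    """
    match = re.match(r"(\w+)\((.*)\)", raw.strip())
    if not match:
        raise ValueError(f"unparseable action: {raw!r}")
    name, arg_str = match.groups()
    args = {}
    for pair in re.finditer(r"(\w+)\s*=\s*([^,]+)", arg_str):
        key, value = pair.groups()
        value = value.strip().strip("'\"")
        # Coerce integer-looking values; leave everything else as strings.
        args[key] = int(value) if value.lstrip("-").isdigit() else value
    return {"name": name, "arguments": args}

def to_tool_call(action: dict) -> dict:
    """Wrap a parsed action as an OpenAI-style tool-call dict."""
    return {
        "type": "function",
        "function": {
            "name": action["name"],
            "arguments": json.dumps(action["arguments"]),
        },
    }
```

With a wrapper like `to_tool_call`, downstream code that expects tool-call dicts never needs to know the model emitted a compact custom format.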
Prompt engineering and function calls are the same thing. If you use the OpenAI format, or LangChain, or whatever, you just don't see the function JSONs, but they're there. When you extract the calls using OpenAI or LangChain etc., the library also adds a prompt instructing the model to return a specific format.

So when you train a model you generally consider all these methods, and it can depend on the data available. In much training data we don't even have the functions to call; we only have the extracted JSON. So we can indeed use any method, since the output is the methodology for the given input. If, after training, you want to extract these calls, or later build a function caller, it won't matter whether you use OpenAI or LangChain, as they will be injecting their own prompt!

So if you don't find your function call in the OpenAI return (tool calls or function calls), it shouldn't matter. It only means that with prompt engineering you have to specify the output style you wish the model to return, e.g. surrounded in XML tags, so you can parse the raw response yourself. Your function caller will then use both methods: the library you're using and your own parsing!
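Parsing the raw response yourself can be a few lines. A sketch, assuming you prompted the model to wrap calls in an XML-style tag (the tag name `tool_call` is just an example; use whatever your prompt specifies):

```python
import re

def extract_tagged(raw: str, tag: str) -> list[str]:
    """Pull every <tag>...</tag> span out of a raw model response.

    The tag name is whatever your prompt told the model to use --
    the point is that you parse the raw text yourself instead of
    relying on the library's tool_calls field being populated.
    """
    pattern = re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
    return [m.strip() for m in pattern.findall(raw)]
```

For example, `extract_tagged('Sure. <tool_call>{"name": "open_app"}</tool_call>', "tool_call")` returns the inner JSON string, which you can then `json.loads` yourself.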
With text models you can specify any JSON or XML output format, even YAML etc., so it's down to you! You could be using Pydantic, which doesn't actually care about the output either, since it also uses templates and extractors; it makes it seem like the model is outputting clean objects, but it isn't!

The Pydantic method is useful but not strictly correct! The combination of the methods above is (even Pydantic adds to your prompt!).

So we find that the more libraries and steps your prompt goes through, the more potential injections you have no control over, hence raw is best!
If you train a model to use tools—and through tools to manipulate the UI—its generalization ability is higher and it is less prone to errors, because using tools becomes, to some extent, a core capability of the model.
If, instead, you train the model to output a specific format to operate the UI, then it can only operate the UI, and this kind of training will, to a certain degree, impair the model’s foundational capabilities.
I think you need to train all the different methods!

In the beginning I was training models with various datasets which produced a variety of output shapes, even chain of thought, forest of thoughts, etc. (some very good datasets out there!), but I did not see the model adapting to the outputs I expected it to produce, despite training it until it generalized, or at least indicated that it was trained. I never realised that to actually change the behaviour of the model it needed GRPO! Now we can design any form of output style and the model will produce it!

So when you do mass training, then for specific behaviours you need to do reward training for the new output style! Then it will take 100 percent, i.e. the DeepSeek R1 thinker!

So we could make an explainer instead of a thinker, i.e. the model always gives a big explanation for the answer instead of a massive pre-thought.

So we train generally first, then we lock in the desired behaviour, or leave it raw for the next fine-tuner, who will GRPO the style of response they desire. It will converge quickly, as it has been pretrained on the method!

I.e. when I attempted to make the deep thinker, it took the training very quickly, as it had been trained on this before (but never exhibited the behaviour; it was embedded deep!).

So fine-tuning should be very easy if you train on many task types!

Now I can train on sparse data and odd bits of metadata which make no sense to me, but the model accepts it, and later it gains knowledge!
These methods allow for integrating potential keywords for your responses and task types, like returning the response with tool calls in one shape, reasoning in another, and artifacts in another, so your UI can extract the various components easily: just take all the content as the response, then remove and extract the components, leaving the actual response behind.
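The extract-and-strip step described above can be sketched like this, assuming hypothetical `<reasoning>`, `<tool_call>`, and `<artifact>` tags (use whatever tags your training or prompt defines):

```python
import re

# Hypothetical component tags -- substitute your own.
COMPONENT_TAGS = ("reasoning", "tool_call", "artifact")

def split_response(raw: str) -> tuple[dict[str, list[str]], str]:
    """Extract each tagged component and return (components, plain response).

    Everything left outside the tags is treated as the actual response,
    so a UI can render each piece separately.
    """
    components: dict[str, list[str]] = {}
    remaining = raw
    for tag in COMPONENT_TAGS:
        pattern = re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
        components[tag] = [m.strip() for m in pattern.findall(remaining)]
        remaining = pattern.sub("", remaining)  # strip this component out
    return components, remaining.strip()
```

The UI then routes `components["tool_call"]` to the executor, `components["reasoning"]` to a collapsible panel, and shows the remaining text as the reply.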
It's all methodologies, bro!