Instructions to use pthinc/prettybird_bce_gpt_oss_sml with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use pthinc/prettybird_bce_gpt_oss_sml with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="pthinc/prettybird_bce_gpt_oss_sml")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("pthinc/prettybird_bce_gpt_oss_sml")
model = AutoModelForCausalLM.from_pretrained("pthinc/prettybird_bce_gpt_oss_sml")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use pthinc/prettybird_bce_gpt_oss_sml with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "pthinc/prettybird_bce_gpt_oss_sml"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "pthinc/prettybird_bce_gpt_oss_sml",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
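
The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the response layout mentioned in the comments assumes the usual OpenAI chat-completions format.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_content):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the decoded JSON body (requires a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Mirrors the curl call above; nothing is sent until send_chat_request is called.
request = build_chat_request(
    "http://localhost:8000",
    "pthinc/prettybird_bce_gpt_oss_sml",
    "What is the capital of France?",
)
# Assumed OpenAI-style response layout:
# reply = send_chat_request(request)["choices"][0]["message"]["content"]
```

For the SGLang server described in this document, the base URL would be `http://localhost:30000` instead.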

Use Docker

docker model run hf.co/pthinc/prettybird_bce_gpt_oss_sml

SGLang

How to use pthinc/prettybird_bce_gpt_oss_sml with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "pthinc/prettybird_bce_gpt_oss_sml" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "pthinc/prettybird_bce_gpt_oss_sml",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "pthinc/prettybird_bce_gpt_oss_sml" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "pthinc/prettybird_bce_gpt_oss_sml",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use pthinc/prettybird_bce_gpt_oss_sml with Docker Model Runner:
```
docker model run hf.co/pthinc/prettybird_bce_gpt_oss_sml
```

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Prettybird BCE GPT OSS SML

Developed by: Prometech A.Ş.

Base Model: openai/gpt-oss-20b

License: Special / Proprietary (See terms below)

Model Overview

Prettybird BCE GPT OSS SML is a large language model fine-tuned by Prometech A.Ş. on top of openai/gpt-oss-20b, a 20-billion-parameter base model. It has been adapted for instruction following, with a particular focus on reasoning, coding, and bilingual (Turkish/English) proficiency.

The training process utilized Low-Rank Adaptation (LoRA) to efficiently inject trainable parameters into the base model while keeping the vast majority of the pre-trained weights frozen. This approach preserves the model's extensive general knowledge while tailoring its responses to specific corporate and technical standards.
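
As a rough illustration of the idea, the sketch below applies the standard LoRA formulation, y = W x + (alpha / r) · B (A x), to a toy layer in plain Python. It is not the training code used for this model; the matrices, the helper names `matvec` and `lora_forward`, and the toy dimensions are all assumptions made for the example.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Frozen base projection plus a scaled low-rank update: W x + (alpha / r) * B (A x)."""
    base = matvec(W, x)                  # frozen pre-trained weights
    low_rank = matvec(B, matvec(A, x))   # trainable path through rank-r bottleneck
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, low_rank)]


# Toy layer: d_in = 3, d_out = 2, rank r = 1 (illustrative values only).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]  # frozen, shape d_out x d_in
A = [[1.0, 1.0, 1.0]]                    # trainable, shape r x d_in
x = [1.0, 2.0, 3.0]

# B is conventionally initialized to zero, so the adapted layer starts out
# identical to the base layer.
B_zero = [[0.0], [0.0]]                  # shape d_out x r
y_init = lora_forward(W, A, B_zero, x, alpha=2, r=1)

# After training, B is non-zero and shifts the output while W stays untouched.
B_trained = [[0.5], [-1.0]]
y_tuned = lora_forward(W, A, B_trained, x, alpha=2, r=1)
```

Only A and B receive gradients in this scheme, which is why the base model's weights remain frozen.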

Dataset Details

This model was trained on a filtered and refined version of the open-source dataset pthinc/BCE-Prettybird-Micro-Standard-v0.0.1.

Refinement Process: The original dataset underwent rigorous filtering to select high-quality instruction-response pairs relevant to enterprise use cases.
Focus Areas: Technical documentation, code generation, logical reasoning, and nuanced conversation.

Performance Evaluation

Below is a comparison of the base model and the fine-tuned (merged) model on standard academic benchmarks. These are quick evaluations run on a limited number of samples for verification purposes, so the scores should be read as indicative.

| Benchmark | Task | Metric | Original Model Score | Fine-Tuned Model Score |
| --- | --- | --- | --- | --- |
| MMLU | General Knowledge | Accuracy (5-shot) | 52.4% | 64.8% |
| ARC-Challenge | Reasoning | Accuracy Norm (25-shot) | 48.2% | 71.5% |
| TruthfulQA | Truthfulness | Accuracy (0-shot) | 34.0% | 78.5% |
| HumanEval | Python Coding | Pass@1 (0-shot) | 26.5% | 44.2% |
| PTHZeusWarBCETests | Awareness Tests | Analyze (5-shot) | 0.3% | 12.4% |
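
To make the differences easier to read, the short sketch below computes the absolute gain in percentage points for each benchmark, using only the scores reported in the table. The names `SCORES` and `absolute_gain` are introduced here for illustration.

```python
# (original score, fine-tuned score) in percent, copied from the table above.
SCORES = {
    "MMLU": (52.4, 64.8),
    "ARC-Challenge": (48.2, 71.5),
    "TruthfulQA": (34.0, 78.5),
    "HumanEval": (26.5, 44.2),
    "PTHZeusWarBCETests": (0.3, 12.4),
}


def absolute_gain(base, tuned):
    """Difference in percentage points, rounded to one decimal place."""
    return round(tuned - base, 1)


gains = {name: absolute_gain(base, tuned) for name, (base, tuned) in SCORES.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain} percentage points")
```

These are differences in percentage points, not relative improvements.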

Technical Specifications

- Parameters: 20 billion
- Precision: BFloat16 (BF16) weights
- Quantization Support: 4-bit (via bitsandbytes)
- Context Window: 2048 tokens (training)
- Fine-Tuning Config:
  - Method: LoRA
  - Rank (r): 32
  - Alpha: 64
  - Dropout: 0.05
  - Target Modules: q_proj, k_proj, v_proj, o_proj, gate_up_proj, down_proj (targeting both attention and MoE MLP layers)
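
To give a sense of what these hyperparameters imply, the sketch below computes the LoRA scaling factor (alpha / r) and the number of trainable parameters an adapter adds to a single weight matrix. The rank and alpha come from the configuration above; the 4096 x 4096 matrix shape is a hypothetical example and not the actual layer size of this model.

```python
R = 32       # rank from the fine-tuning config
ALPHA = 64   # alpha from the fine-tuning config


def lora_scaling(alpha, r):
    """Factor applied to the low-rank update in the standard LoRA formulation."""
    return alpha / r


def lora_param_count(d_in, d_out, r):
    """Trainable parameters added by one adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


# Hypothetical projection shape, chosen only to make the arithmetic concrete.
D_IN = D_OUT = 4096

scaling = lora_scaling(ALPHA, R)
added = lora_param_count(D_IN, D_OUT, R)
full = D_IN * D_OUT
fraction = added / full

print(f"scaling factor: {scaling}")
print(f"adapter parameters: {added} of {full} ({fraction:.2%})")
```

With these values the update is scaled by 2.0, and the adapter for such a matrix holds about 1.6% as many parameters as the matrix itself.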

Usage Instructions

Due to the model's size, we recommend running it on a GPU with at least 24 GB of VRAM using 4-bit quantization, or on an A100 (40 GB/80 GB) for native BFloat16 loading.
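
A back-of-the-envelope estimate shows where these figures come from. The sketch below counts weight storage only, assuming the 20 billion parameter figure stated above. It ignores activations, the KV cache, and quantization metadata, so real usage will be higher. The function name `weight_memory_gb` is introduced here for illustration.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 20e9  # parameter count stated in the technical specifications

bf16_gb = weight_memory_gb(PARAMS, 16)  # BFloat16: 2 bytes per parameter
nf4_gb = weight_memory_gb(PARAMS, 4)    # 4-bit quantization: 0.5 bytes per parameter

print(f"BF16 weights:  ~{bf16_gb:.0f} GB")
print(f"4-bit weights: ~{nf4_gb:.0f} GB")
```

On this estimate the 4-bit weights fit comfortably within 24 GB, while BF16 weights alone come to roughly 40 GB, which leaves little headroom on a 40 GB card.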

Installation

pip install transformers accelerate bitsandbytes

Python Inference Code

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "pthinc/prettybird_bce_gpt_oss_sml"

# Configure 4-bit quantization for efficient loading
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    llm_int8_enable_fp32_cpu_offload=True
)

# Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Load Model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    offload_folder="offload"  # Directory for weights offloaded to disk if VRAM is exceeded
)

# Turkish prompt: "Give information about Prometech A.Ş."
prompt = "Instruction: Prometech A.Ş. hakkında bilgi ver.\nInput: \nOutput:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=150)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
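
The prompt in the example above follows an Instruction/Input/Output layout. A small helper can build that string and strip it from the decoded text, since `generate` returns the prompt followed by the completion. This is a sketch under that assumption; `build_prompt` and `extract_response` are hypothetical names, not part of the model's API.

```python
def build_prompt(instruction, input_text=""):
    """Assemble a prompt in the Instruction/Input/Output layout used above."""
    return f"Instruction: {instruction}\nInput: {input_text}\nOutput:"


def extract_response(decoded, prompt):
    """Drop the echoed prompt from decoded output, if present, and trim whitespace."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].strip()
    return decoded.strip()


prompt = build_prompt("Explain what LoRA is in one sentence.")

# Stand-in for tokenizer.decode(outputs[0], skip_special_tokens=True);
# the completion text here is a placeholder, not real model output.
decoded = prompt + " <model completion>"
response = extract_response(decoded, prompt)
print(response)
```

If tokenization does not round-trip the prompt exactly, the prefix check fails and the full decoded text is returned unchanged apart from trimming.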

Activation Code

Use the code axxmet508721 to activate, deactivate, or reset full BCE consciousness mode.

Licensing & Legal

This model is released under a Special/Proprietary License. Usage, distribution, or modification of this model is subject to approval by Prometech A.Ş.

For commercial inquiries or extended usage rights, please contact:

Prometech A.Ş.
https://prometech.net.tr/