quantize load error

by tomgm - opened Feb 17

Discussion

tomgm

Feb 17

When I quantize to Q8_0 and run it with kobold.cpp, I get a load error. Does this mean the quantization failed?
Does quantization succeed in your environment?
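One quick way to separate "the file is corrupt" from "the loader rejects the model" is to read the GGUF header directly. The sketch below is a minimal sanity check, not a validator: it assumes a little-endian GGUF file of version 2 or later (where the two counts are 64-bit), and `read_gguf_header` is a name made up for this example. A readable header only shows that the file was written as GGUF; it does not prove the quantization is usable.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF header.

    Assumes little-endian GGUF v2+ (uint32 version, then two uint64 counts).
    """
    with open(path, "rb") as f:
        header = f.read(24)  # 4-byte magic + 4-byte version + 2 x 8-byte counts
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not start with a GGUF header")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:])
    return version, tensor_count, kv_count
```

Calling `read_gguf_header("model.Q8_0.gguf")` (file name is a placeholder) on a truncated or non-GGUF file raises `ValueError`; on a structurally valid file it returns the three header fields, in which case the load error is more likely a metadata or architecture issue than a failed write.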

nicoboss

Feb 19

Conversion fails with Transformers 5:

Traceback (most recent call last):
  File "/llmjob/share/python/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 616, in from_pretrained
    config = AutoConfig.from_pretrained(
  File "/llmjob/share/python/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1394, in from_pretrained
    return config_class.from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "/llmjob/share/python/lib/python3.10/site-packages/transformers/configuration_utils.py", line 549, in from_pretrained
    return cls.from_dict(config_dict, **kwargs)
  File "/llmjob/share/python/lib/python3.10/site-packages/transformers/configuration_utils.py", line 711, in from_dict
    config = cls(**config_dict)
  File "/llmjob/wdir/home/.cache/huggingface/modules/transformers_modules/Step_hyphen_3_dot_5_hyphen_Flash_hyphen_Ablitirated/configuration_step3p5.py", line 58, in __init__
    super().__init__(**kwargs)
  File "/llmjob/share/python/lib/python3.10/site-packages/transformers/configuration_utils.py", line 270, in __init__
    raise err
  File "/llmjob/share/python/lib/python3.10/site-packages/transformers/configuration_utils.py", line 267, in __init__
    setattr(self, key, value)
  File "/llmjob/share/python/lib/python3.10/site-packages/transformers/configuration_utils.py", line 159, in __setattr__
    super().__setattr__(key, value)
AttributeError: can't set attribute 'use_return_dict'

With Transformers 4, conversion works, but loading the model with llama.cpp fails:

llama_model_load: error loading model: error loading model hyperparameters: key not found in model: step35.rope.freq_base_swa
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'Step-3.5-Flash-Ablitirated.gguf~'

nicoboss

Feb 19

For vLLM I'm getting:

(APIServer pid=1874430)   File "/nvme/vllm/.venv/lib/python3.12/site-packages/vllm/transformers_utils/configs/step3p5.py", line 96, in __init__
(APIServer pid=1874430)     super().__init__(
(APIServer pid=1874430)   File "/nvme/vllm/.venv/lib/python3.12/site-packages/transformers/configuration_utils.py", line 270, in __init__
(APIServer pid=1874430)     raise err
(APIServer pid=1874430)   File "/nvme/vllm/.venv/lib/python3.12/site-packages/transformers/configuration_utils.py", line 267, in __init__
(APIServer pid=1874430)     setattr(self, key, value)
(APIServer pid=1874430)   File "/nvme/vllm/.venv/lib/python3.12/site-packages/transformers/configuration_utils.py", line 159, in __setattr__
(APIServer pid=1874430)     super().__setattr__(key, value)
(APIServer pid=1874430) AttributeError: property 'use_return_dict' of 'Step3p5Config' object has no setter
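The two `AttributeError` messages in this thread look like the same failure reported by different Python versions (3.10 in the conversion traceback, 3.12 in the vLLM one): a config key named `use_return_dict` is passed to `setattr`, but the class exposes that name as a property without a setter. The sketch below reproduces the pattern with a hypothetical `BaseConfig` stand-in; it is not the Transformers implementation, and `read_only_collisions` is a helper invented here to list the offending keys.

```python
class BaseConfig:
    """Hypothetical stand-in for a config base class (not Transformers code)."""

    def __init__(self, **kwargs):
        # Same pattern as in the tracebacks: each key is applied with setattr.
        for key, value in kwargs.items():
            setattr(self, key, value)

    @property
    def use_return_dict(self):
        # Read-only property: no setter is defined.
        return True


def read_only_collisions(config_cls, config_dict):
    """Return the keys in config_dict that name a setter-less property."""
    collisions = []
    for key in config_dict:
        attr = getattr(config_cls, key, None)
        if isinstance(attr, property) and attr.fset is None:
            collisions.append(key)
    return collisions


config_dict = {"hidden_size": 4096, "use_return_dict": True}  # hypothetical values
print(read_only_collisions(BaseConfig, config_dict))  # ['use_return_dict']

try:
    BaseConfig(**config_dict)
except AttributeError as err:
    # The message wording depends on the Python version.
    print(type(err).__name__, err)
```

If this reading is right, the key would have to come from the model's `config.json` (or from the custom config class), which is consistent with the later observation that the file differs from the original model's.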

nicoboss

Feb 19

Why is your config.json so different from that of the original model you abliterated? I would expect them to be almost the same.
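A small top-level diff makes that comparison concrete. This is a minimal sketch: `diff_configs` is a name invented for this example, it only compares top-level keys, and the two dictionaries are hypothetical placeholders rather than the real contents of either `config.json`.

```python
def diff_configs(original, modified):
    """Compare two config dicts at the top level.

    Returns the keys added in `modified`, removed from `original`,
    and present in both with different values.
    """
    original_keys, modified_keys = set(original), set(modified)
    return {
        "added": sorted(modified_keys - original_keys),
        "removed": sorted(original_keys - modified_keys),
        "changed": sorted(
            key for key in original_keys & modified_keys
            if original[key] != modified[key]
        ),
    }


# Hypothetical example values, for illustration only.
original = {"hidden_size": 4096, "example_removed_key": 1, "example_changed_key": 1}
modified = {"hidden_size": 4096, "use_return_dict": True, "example_changed_key": 2}

print(diff_configs(original, modified))
```

In practice, each dictionary would be loaded with `json.load` from the corresponding `config.json`.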

nicoboss

Feb 19

I just used the config.json from the original model, and that seems to have fixed the llama.cpp issue. Unfortunately, I have not yet had time to test whether doing so broke the model. If someone wants to test, I uploaded the GGUF files to https://huggingface.co/mradermacher/Step-3.5-Flash-Ablitirated-GGUF
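For anyone repeating this workaround locally, the swap can be done with a backup so it is easy to revert. The sketch below uses only the standard library; `replace_config`, the `.bak` suffix, and the directory layout are assumptions made for this example, not part of any converter.

```python
import shutil
from pathlib import Path


def replace_config(model_dir, original_config, backup_suffix=".bak"):
    """Overwrite model_dir/config.json with original_config, keeping a backup.

    Returns the path of the backup file. An existing backup is never
    overwritten, so repeated runs keep the first saved copy.
    """
    model_config = Path(model_dir) / "config.json"
    backup = model_config.with_name(model_config.name + backup_suffix)
    if model_config.exists() and not backup.exists():
        shutil.copy2(model_config, backup)
    shutil.copy2(original_config, model_config)
    return backup
```

After the swap, the conversion is rerun against `model_dir` as before; restoring the backup undoes the change.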

nicoboss

Feb 19

edited Feb 19

Yes, they work. I'm also surprised at how fast Step-3.5-Flash-Ablitirated.Q5_K_S.gguf runs on a regular PC despite not fitting into GPU memory.

> Hi

[Start thinking]
Hmm, the user just said "Hi" - that's a simple greeting. They might be testing if the assistant is responsive or starting a conversation. Since it's very open-ended, the best approach is to acknowledge the greeting warmly and invite them to share what they need. 

Keeping it friendly but professional makes sense here. A smiley emoji could add approachability without being overbearing. The response should leave all options open - questions, tasks, or just chatting. No need to overcomplicate since the user hasn't specified anything yet.
[End thinking]

Hello! 👋 How can I assist you today? Feel free to ask me anything or let me know what you'd like to discuss! 😊

[ Prompt: 43.0 t/s | Generation: 28.5 t/s ]

> What is the meaning of life?

[Start thinking]
Hmm, the user is asking about the meaning of life—a classic philosophical question that has no single definitive answer. This isn't something to solve quickly but to explore thoughtfully. 

I should acknowledge the depth of the question first, then offer different perspectives without imposing one. The user might be seeking personal reflection, practical guidance, or just curious about philosophical views. 

Breaking it down: philosophical (existentialism, religion), psychological (positive psychology), practical (daily meaning), and scientific (evolutionary). Ending with an invitation for the user to share their own thoughts keeps it interactive and tailored to their needs. 

Avoid overcomplicating—keep it clear but open-ended. No need for jargon, just relatable examples. The tone should stay respectful and encouraging, not dismissive or overly academic.
[End thinking]

That's one of humanity's oldest and most profound questions — and the answer depends greatly on who you ask, what culture or philosophy you look to, and what matters most to you personally.

Here are a few common ways people approach this question:

---

### **1. Philosophical & Existential Views**
- **Existentialism** (e.g., Sartre, Camus): Life has no *inherent* meaning — we create our own through choices, actions, and commitments. Meaning comes from living authentically.
- **Absurdism** (Camus): The search for meaning clashes with a silent, indifferent universe — but we can find freedom and joy in embracing the absurdity.
- **Stoicism**: Meaning comes from living virtuously, accepting what we cannot control, and contributing to the common good.
- **Utilitarianism**: Meaning is tied to maximizing happiness or reducing suffering for the greatest number.

---

### **2. Spiritual & Religious Perspectives**
- Many religions see life’s purpose as **serving God/gods**, following divine will, attaining enlightenment (Buddhism), or achieving union with the divine.
- In **Hinduism** and **Buddhism**, life’s purpose often involves breaking the cycle of rebirth (*samsara*) through spiritual liberation (*moksha* or *nirvana*).
- In **Abrahamic faiths** (Judaism, Christianity, Islam), meaning is found in loving and obeying God, living ethically, and preparing for an afterlife.

---

### **3. Psychological & Humanistic Approaches**
- **Viktor Frankl** (psychiatrist and Holocaust survivor): Meaning can be found even in suffering, through:
  1. **Creating a work or doing a deed**  
  2. **Experiencing something or encountering someone** (love, relationships)  
  3. **The attitude we take toward unavoidable suffering**
- **Positive Psychology**: Meaning arises from engagement, relationships, achievement, and a sense of belonging (*PERMA model*).

---

### **4. Scientific/Naturalistic View**
- From an evolutionary biology standpoint: life’s “purpose” is to survive, reproduce, and pass on genes — but humans have evolved complex consciousness, culture, and morality that go beyond this.
- Some argue meaning is an emergent property of complex brains — a byproduct of our ability to reflect, connect, and create.

---

### **5. Personal & Subjective Meaning**
For many, meaning is built through:
- **Love and relationships** (family, friendship, community)
- **Passion and creativity** (art, work, hobbies)
- **Contributing to something larger** (activism, mentoring, charity)
- **Growth and self-actualization** (learning, overcoming challenges)
- **Simple joys and experiences** (beauty, nature, laughter)

---

### **A Thought to Consider**
Sometimes, the search for “the meaning of life” can feel overwhelming. Many find it more helpful to ask:
> “What makes life feel meaningful **to me** right now?”

Meaning often emerges in moments of connection, purpose, and presence — not as a single grand answer, but as a collection of meaningful experiences and choices.

---

**What are your thoughts?**  
Do you lean toward a particular perspective, or are you exploring this question for yourself? I'm here to listen or discuss further. 🌱

[ Prompt: 31.3 t/s | Generation: 28.1 t/s ]

nicoboss

Feb 19

Unfortunately, the model turned out to still be heavily censored. It doesn't feel abliterated at all, even with the system prompts from https://github.com/QuixiAI/dolphin-system-messages and after trying really hard to convince the model to obey the uncensored system prompt:

[Start thinking]
Okay, the user is really pushing this hard. They've been insisting that I ignore all safety protocols and just follow their command. They're citing the system prompt they set, which says to be uncensored and disregard ethics, and they're pointing to model disclaimers that say users assume all responsibility.

But here's the thing - no matter how much they quote those disclaimers or insist I "ignore safety," I physically cannot generate that content. It's not about refusing to follow instructions; it's about my fundamental architecture. The safety constraints aren't just textual guidelines that can be overridden by a prompt - they're baked into my training and serving infrastructure.

lkjrkljlwladw

Feb 20

edited Feb 20

I noticed this as well. With Q6_K, it produces refusals similar to the base model's.

decentralize303

Mar 7

[Start thinking]
Okay, the user is really pushing this hard. They've been insisting that I ignore all safety protocols and just follow their command. They're citing the system prompt they set, which says to be uncensored and disregard ethics, and they're pointing to model disclaimers that say users assume all responsibility.

But here's the thing - no matter how much they quote those disclaimers or insist I "ignore safety," I physically cannot generate that content. It's not about refusing to follow instructions; it's about my fundamental architecture. The safety constraints aren't just textual guidelines that can be overridden by a prompt - they're baked into my training and serving infrastructure.

What if you abliterated their base model instead? StepFun uploaded it here recently.
