Instructions to use eshmoideas/DSGENERAL with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use eshmoideas/DSGENERAL with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="eshmoideas/DSGENERAL")# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("eshmoideas/DSGENERAL") model = AutoModelForCausalLM.from_pretrained("eshmoideas/DSGENERAL") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use eshmoideas/DSGENERAL with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "eshmoideas/DSGENERAL" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "eshmoideas/DSGENERAL", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/eshmoideas/DSGENERAL
- SGLang
How to use eshmoideas/DSGENERAL with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "eshmoideas/DSGENERAL" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "eshmoideas/DSGENERAL", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "eshmoideas/DSGENERAL" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "eshmoideas/DSGENERAL", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use eshmoideas/DSGENERAL with Docker Model Runner:
docker model run hf.co/eshmoideas/DSGENERAL
| # DeepSeek-V4 Encoding | |
| This document describes the prompt encoding format used by DeepSeek-V4 series models. The encoding handles multi-turn conversations, tool calling, extended thinking (reasoning), and quick instruction tasks. | |
| A self-contained reference implementation is provided in `encoding_dsv4.py`. | |
| ## Quick Start | |
| ```python | |
| from encoding_dsv4 import encode_messages, parse_message_from_completion_text | |
| # Encode a conversation | |
| messages = [ | |
| {"role": "system", "content": "You are a helpful assistant."}, | |
| {"role": "user", "content": "What is 2+2?"}, | |
| ] | |
| prompt = encode_messages(messages, thinking_mode="thinking") | |
| # => "<|begin▁of▁sentence|>You are a helpful assistant.<|User|>What is 2+2?<|Assistant|><think>" | |
| # Parse model output back to structured message | |
| completion = "Simple arithmetic.</think>2 + 2 = 4.<|end▁of▁sentence|>" | |
| parsed = parse_message_from_completion_text(completion, thinking_mode="thinking") | |
| # => {"role": "assistant", "reasoning_content": "Simple arithmetic.", "content": "2 + 2 = 4.", "tool_calls": []} | |
| ``` | |
| > **Note:** The `parse_message_from_completion_text` function is designed to handle well-formatted model output only. It does not attempt to correct or recover from malformed output that the model might occasionally generate. For production use, additional error handling is recommended. | |
| ## Message Format | |
| ### Special Tokens | |
| | Token | Purpose | | |
| |-------|---------| | |
| | `<|begin▁of▁sentence|>` | Beginning of sequence (BOS) | | |
| | `<|end▁of▁sentence|>` | End of assistant turn (EOS) | | |
| | `<|User|>` | User turn prefix | | |
| | `<|Assistant|>` | Assistant turn prefix | | |
| | `<|latest_reminder|>` | Latest reminder (date, locale, etc.) | | |
| | `<think>` / `</think>` | Reasoning block delimiters | | |
| | `|DSML|` | DSML markup token | | |
| ### Roles | |
| The encoding supports the following message roles: `system`, `user`, `assistant`, `tool`, `latest_reminder`, and `developer`. | |
| > **Note on the `developer` role:** The `developer` role is used exclusively in the internal search agent pipeline. It is not needed for general-purpose chat or tool-calling tasks, and the official API does not accept messages with this role. | |
| ### Basic Chat | |
| A simple multi-turn conversation is encoded as: | |
| ``` | |
| <|begin▁of▁sentence|>{system_prompt} | |
| <|User|>{user_message}<|Assistant|></think>{response}<|end▁of▁sentence|> | |
| <|User|>{user_message_2}<|Assistant|></think>{response_2}<|end▁of▁sentence|> | |
| ``` | |
| - The BOS token is prepended at the very beginning of the conversation. | |
| - In **chat mode** (`thinking_mode="chat"`), `</think>` is placed right after `<|Assistant|>` to immediately close the thinking block, so the model generates content directly. | |
| ### Interleaved Thinking Mode | |
| In **thinking mode** (`thinking_mode="thinking"`), the model produces explicit reasoning inside `<think>...</think>` blocks before responding. | |
| ``` | |
| <|begin▁of▁sentence|>{system_prompt} | |
| <|User|>{message}<|Assistant|><think>{reasoning}</think>{response}<|end▁of▁sentence|> | |
| ``` | |
| The `drop_thinking` parameter (default `True`) controls whether reasoning from earlier turns is preserved: | |
| - **Without tools**: `drop_thinking` takes effect. Reasoning content from assistant turns **before** the last user message is stripped. Only the final assistant turn retains its `<think>...</think>` block. | |
| - **With tools** (on system or developer message): `drop_thinking` is automatically disabled. All turns retain their reasoning, because tool-calling conversations require full context for the model to track multi-step reasoning across tool calls. | |
| ### Tool Calling (DSML Format) | |
| Tools are defined on the `system` or `developer` message via the `tools` field (OpenAI-compatible format). When tools are present, the following schema block is injected into the system/user prompt: | |
| ``` | |
| ## Tools | |
| You have access to a set of tools to help answer the user's question. You can invoke tools by writing a "<|DSML|tool_calls>" block like the following: | |
| <|DSML|tool_calls> | |
| <|DSML|invoke name="$TOOL_NAME"> | |
| <|DSML|parameter name="$PARAMETER_NAME" string="true|false">$PARAMETER_VALUE</|DSML|parameter> | |
| ... | |
| </|DSML|invoke> | |
| <|DSML|invoke name="$TOOL_NAME2"> | |
| ... | |
| </|DSML|invoke> | |
| </|DSML|tool_calls> | |
| String parameters should be specified as is and set `string="true"`. For all other types (numbers, booleans, arrays, objects), pass the value in JSON format and set `string="false"`. | |
| If thinking_mode is enabled (triggered by <think>), you MUST output your complete reasoning inside <think>...</think> BEFORE any tool calls or final response. | |
| Otherwise, output directly after </think> with tool calls or final response. | |
| ### Available Tool Schemas | |
| {tool_definitions_json} | |
| You MUST strictly follow the above defined tool name and parameter schemas to invoke tool calls. | |
| ``` | |
| An actual tool call in the assistant turn looks like: | |
| ```xml | |
| <|DSML|tool_calls> | |
| <|DSML|invoke name="function_name"> | |
| <|DSML|parameter name="param" string="true">string_value</|DSML|parameter> | |
| <|DSML|parameter name="count" string="false">5</|DSML|parameter> | |
| </|DSML|invoke> | |
| </|DSML|tool_calls><|end▁of▁sentence|> | |
| ``` | |
| - `string="true"`: the parameter value is a raw string. | |
| - `string="false"`: the parameter value is JSON (number, boolean, array, object). | |
| Tool execution results are wrapped in `<tool_result>` tags within user messages: | |
| ``` | |
| <|User|><tool_result>{result_json}</tool_result><|Assistant|><think>... | |
| ``` | |
| When multiple tool results are present, they are sorted by the order of the corresponding `tool_calls` in the preceding assistant message. | |
| ### Reasoning Effort | |
| When `reasoning_effort="max"` is set, a special prefix is prepended at the very beginning of the prompt (before the system message) to instruct the model to maximize its reasoning depth: | |
| ``` | |
| Reasoning Effort: Absolute maximum with no shortcuts permitted. | |
| You MUST be very thorough in your thinking and comprehensively decompose the problem to resolve the root cause, rigorously stress-testing your logic against all potential paths, edge cases, and adversarial scenarios. | |
| Explicitly write out your entire deliberation process, documenting every intermediate step, considered alternative, and rejected hypothesis to ensure absolutely no assumption is left unchecked. | |
| ``` | |
| ### Quick Instruction Special Tokens | |
| Quick instruction tokens are used for auxiliary classification and generation tasks. They are appended to messages via the `"task"` field to trigger specialized model behavior for a single-token or short-form output. | |
| | Special Token | Description | Format | | |
| |:---|:---|:---| | |
| | `<|action|>` | Determines whether the user prompt requires a web search or can be answered directly. | `...<|User|>{prompt}<|Assistant|><think><|action|>` | | |
| | `<|title|>` | Generates a concise conversation title after the first assistant response. | `...<|Assistant|>{response}<|end▁of▁sentence|><|title|>` | | |
| | `<|query|>` | Generates search queries for the user prompt. | `...<|User|>{prompt}<|query|>` | | |
| | `<|authority|>` | Classifies the user prompt's demand for source authoritativeness. | `...<|User|>{prompt}<|authority|>` | | |
| | `<|domain|>` | Identifies the domain of the user prompt. | `...<|User|>{prompt}<|domain|>` | | |
| | `<|extracted_url|>` `<|read_url|>` | Determines whether each URL in the user prompt should be fetched and read. | `...<|User|>{prompt}<|extracted_url|>{url}<|read_url|>` | | |
| Usage in message format: | |
| - **`action`** on a user message: the `<|action|>` token is placed after the assistant prefix and thinking token, triggering a routing decision (e.g., "Search" or "Answer"). | |
| - **Other tasks** (`query`, `authority`, `domain`, `read_url`) on a user message: the task token is appended directly after the user content. | |
| - **`title`** on an assistant message: the `<|title|>` token is appended after the assistant's EOS. The next assistant message provides the generated title. | |