Mingke977 committed on
Commit
fda2d4a
·
verified ·
1 Parent(s): f794b33

Add files using upload-large-folder tool

Files changed (50)
  1. .gitignore +2 -0
  2. README.md +378 -0
  3. chat_template.jinja +103 -0
  4. config.json +58 -0
  5. configuration_deepseek.py +247 -0
  6. docs/deploy_guidance.md +42 -0
  7. model.safetensors.index.json +0 -0
  8. modeling_deepseek.py +1030 -0
  9. tokenizer.json +0 -0
  10. tokenizer_config.json +34 -0
  11. venv/bin/Activate.ps1 +247 -0
  12. venv/bin/activate +69 -0
  13. venv/bin/activate.csh +26 -0
  14. venv/bin/activate.fish +69 -0
  15. venv/bin/hf +10 -0
  16. venv/bin/httpx +10 -0
  17. venv/bin/markdown-it +10 -0
  18. venv/bin/pip +10 -0
  19. venv/bin/pip3 +10 -0
  20. venv/bin/pip3.10 +10 -0
  21. venv/bin/pygmentize +10 -0
  22. venv/bin/tiny-agents +10 -0
  23. venv/bin/tqdm +10 -0
  24. venv/bin/typer +10 -0
  25. venv/lib/python3.10/site-packages/_distutils_hack/__init__.py +132 -0
  26. venv/lib/python3.10/site-packages/_distutils_hack/__pycache__/__init__.cpython-310.pyc +0 -0
  27. venv/lib/python3.10/site-packages/_distutils_hack/__pycache__/override.cpython-310.pyc +0 -0
  28. venv/lib/python3.10/site-packages/_distutils_hack/override.py +1 -0
  29. venv/lib/python3.10/site-packages/_yaml/__init__.py +33 -0
  30. venv/lib/python3.10/site-packages/_yaml/__pycache__/__init__.cpython-310.pyc +0 -0
  31. venv/lib/python3.10/site-packages/annotated_doc-0.0.4.dist-info/INSTALLER +1 -0
  32. venv/lib/python3.10/site-packages/annotated_doc-0.0.4.dist-info/METADATA +145 -0
  33. venv/lib/python3.10/site-packages/annotated_doc-0.0.4.dist-info/RECORD +11 -0
  34. venv/lib/python3.10/site-packages/annotated_doc-0.0.4.dist-info/WHEEL +4 -0
  35. venv/lib/python3.10/site-packages/annotated_doc-0.0.4.dist-info/entry_points.txt +4 -0
  36. venv/lib/python3.10/site-packages/annotated_doc-0.0.4.dist-info/licenses/LICENSE +21 -0
  37. venv/lib/python3.10/site-packages/annotated_doc/__init__.py +3 -0
  38. venv/lib/python3.10/site-packages/annotated_doc/__pycache__/__init__.cpython-310.pyc +0 -0
  39. venv/lib/python3.10/site-packages/annotated_doc/__pycache__/main.cpython-310.pyc +0 -0
  40. venv/lib/python3.10/site-packages/annotated_doc/main.py +36 -0
  41. venv/lib/python3.10/site-packages/annotated_doc/py.typed +0 -0
  42. venv/lib/python3.10/site-packages/anyio/__init__.py +111 -0
  43. venv/lib/python3.10/site-packages/anyio/from_thread.py +578 -0
  44. venv/lib/python3.10/site-packages/anyio/functools.py +375 -0
  45. venv/lib/python3.10/site-packages/anyio/lowlevel.py +196 -0
  46. venv/lib/python3.10/site-packages/anyio/py.typed +0 -0
  47. venv/lib/python3.10/site-packages/anyio/pytest_plugin.py +302 -0
  48. venv/lib/python3.10/site-packages/anyio/to_interpreter.py +246 -0
  49. venv/lib/python3.10/site-packages/typing_extensions.py +0 -0
  50. venv/pyvenv.cfg +3 -0
.gitignore ADDED
@@ -0,0 +1,2 @@
+ .joycode/
+ venv/
README.md ADDED
@@ -0,0 +1,378 @@
+ ---
+ language:
+ - zh
+ - en
+ pipeline_tag: text-generation
+ library_name: transformers
+ ---
+ <div align="center">
+ <picture>
+ <img src="figures/joyai-logo.png" width="30%" alt="JoyAI-LLM Flash">
+ </picture>
+ </div>
+ <hr>
+
+ <div align="center" style="line-height: 1;">
+ <a href="https://huggingface.co/jdopensource" target="_blank"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-JD-ffc107?color=ffc107&logoColor=white"/></a>
+ <a href="https://huggingface.co/jdopensource/JoyAI-LLM-Flash/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Modified_MIT-f5de53?&color=f5de53"/></a>
+ </div>
+
+ ## 1. Model Introduction
+
+ JoyAI-LLM-Flash is a state-of-the-art medium-sized instruct language model with 3 billion activated parameters and 48 billion total parameters. It was pretrained on 20 trillion text tokens with the Muon optimizer, followed by large-scale supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning (RL) across diverse environments. JoyAI-LLM-Flash achieves strong performance on frontier knowledge, reasoning, and coding tasks, as well as agentic capabilities.
+
+ ### Key Features
+
+ - Fiber Bundle RL: introduces fiber bundle theory into reinforcement learning through a novel optimization framework, FiberPO. The method is designed for large-scale, heterogeneous agent training and improves stability and robustness under complex data distributions.
+ - Training-Inference Collaboration: applies the Muon optimizer with dense MTP and develops optimization techniques that resolve instabilities at scale, delivering 1.3× to 1.7× the throughput of the non-MTP version.
+ - Agentic Intelligence: designed for tool use, reasoning, and autonomous problem-solving.
+
+ ## 2. Model Summary
+
+ | | |
+ | :-----------------------------------------: | :----------------------: |
+ | **Architecture** | Mixture-of-Experts (MoE) |
+ | **Total Parameters** | 48B |
+ | **Activated Parameters** | 3B |
+ | **Number of Layers** (Dense layer included) | 40 |
+ | **Number of Dense Layers** | 1 |
+ | **Attention Hidden Dimension** | 2048 |
+ | **MoE Hidden Dimension** (per Expert) | 768 |
+ | **Number of Attention Heads** | 32 |
+ | **Number of Experts** | 256 |
+ | **Selected Experts per Token** | 8 |
+ | **Number of Shared Experts** | 1 |
+ | **Vocabulary Size** | 129K |
+ | **Context Length** | 128K |
+ | **Attention Mechanism** | MLA |
+ | **Activation Function** | SwiGLU |
+
+
+ ## 3. Evaluation Results
+
+ <table>
+ <thead>
+ <tr>
+ <th align="center">Benchmark</th>
+ <th align="center"><sup>JoyAI-LLM Flash</sup></th>
+ <th align="center"><sup>Qwen3-30B-A3B-Instruct-2507</sup></th>
+ <th align="center"><sup>GLM-4.7-Flash<br>(Non-thinking)</sup></th>
+ </tr>
+ </thead>
+ <tbody>
+
+ <tr>
+ <td align="center" colspan=4><strong>Knowledge &amp; Alignment</strong></td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">MMLU</td>
+ <td align="center" style="vertical-align: middle"><strong>89.50</strong></td>
+ <td align="center" style="vertical-align: middle">86.87</td>
+ <td align="center" style="vertical-align: middle">80.53</td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">MMLU-Pro</td>
+ <td align="center" style="vertical-align: middle"><strong>81.02</strong></td>
+ <td align="center" style="vertical-align: middle">73.88</td>
+ <td align="center" style="vertical-align: middle">63.62</td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">CMMLU</td>
+ <td align="center" style="vertical-align: middle"><strong>87.03</strong></td>
+ <td align="center" style="vertical-align: middle">85.88</td>
+ <td align="center" style="vertical-align: middle">75.85</td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">GPQA-Diamond</td>
+ <td align="center" style="vertical-align: middle"><strong>74.43</strong></td>
+ <td align="center" style="vertical-align: middle">68.69</td>
+ <td align="center" style="vertical-align: middle">39.90</td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">SuperGPQA</td>
+ <td align="center" style="vertical-align: middle"><strong>55.00</strong></td>
+ <td align="center" style="vertical-align: middle">52.00</td>
+ <td align="center" style="vertical-align: middle">32.00</td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">LiveBench</td>
+ <td align="center" style="vertical-align: middle"><strong>72.90</strong></td>
+ <td align="center" style="vertical-align: middle">59.70</td>
+ <td align="center" style="vertical-align: middle">43.10</td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">IFEval</td>
+ <td align="center" style="vertical-align: middle"><strong>86.69</strong></td>
+ <td align="center" style="vertical-align: middle">83.18</td>
+ <td align="center" style="vertical-align: middle">82.44</td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">AlignBench</td>
+ <td align="center" style="vertical-align: middle"><strong>8.24</strong></td>
+ <td align="center" style="vertical-align: middle">8.07</td>
+ <td align="center" style="vertical-align: middle">6.85</td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">HellaSwag</td>
+ <td align="center" style="vertical-align: middle"><strong>91.79</strong></td>
+ <td align="center" style="vertical-align: middle">89.90</td>
+ <td align="center" style="vertical-align: middle">60.84</td>
+ </tr>
+
+ <tr>
+ <td align="center" colspan=4><strong>Coding</strong></td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">HumanEval</td>
+ <td align="center" style="vertical-align: middle"><strong>96.34</strong></td>
+ <td align="center" style="vertical-align: middle">95.12</td>
+ <td align="center" style="vertical-align: middle">74.39</td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">LiveCodeBench</td>
+ <td align="center" style="vertical-align: middle"><strong>65.60</strong></td>
+ <td align="center" style="vertical-align: middle">39.71</td>
+ <td align="center" style="vertical-align: middle">27.43</td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">SciCode</td>
+ <td align="center" style="vertical-align: middle"><strong>3.08/22.92</strong></td>
+ <td align="center" style="vertical-align: middle"><strong>3.08/22.92</strong></td>
+ <td align="center" style="vertical-align: middle">3.08/15.11</td>
+ </tr>
+ <tr>
+ <td align="center" colspan=4><strong>Mathematics</strong></td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">GSM8K</td>
+ <td align="center" style="vertical-align: middle"><strong>95.83</strong></td>
+ <td align="center" style="vertical-align: middle">79.83</td>
+ <td align="center" style="vertical-align: middle">81.88</td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">AIME2025</td>
+ <td align="center" style="vertical-align: middle"><strong>65.83</strong></td>
+ <td align="center" style="vertical-align: middle">62.08</td>
+ <td align="center" style="vertical-align: middle">24.17</td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">MATH 500</td>
+ <td align="center" style="vertical-align: middle"><strong>97.10</strong></td>
+ <td align="center" style="vertical-align: middle">89.80</td>
+ <td align="center" style="vertical-align: middle">90.90</td>
+ </tr>
+
+ <tr>
+ <td align="center" colspan=4><strong>Agentic</strong></td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">SWE-bench Verified</td>
+ <td align="center" style="vertical-align: middle"><strong>60.60</strong></td>
+ <td align="center" style="vertical-align: middle">24.44</td>
+ <td align="center" style="vertical-align: middle">51.60</td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">Tau2-Retail</td>
+ <td align="center" style="vertical-align: middle"><strong>67.55</strong></td>
+ <td align="center" style="vertical-align: middle">53.51</td>
+ <td align="center" style="vertical-align: middle">62.28</td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">Tau2-Airline</td>
+ <td align="center" style="vertical-align: middle"><strong>54.00</strong></td>
+ <td align="center" style="vertical-align: middle">32.00</td>
+ <td align="center" style="vertical-align: middle">52.00</td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">Tau2-Telecom</td>
+ <td align="center" style="vertical-align: middle">79.83</td>
+ <td align="center" style="vertical-align: middle">4.39</td>
+ <td align="center" style="vertical-align: middle"><strong>88.60</strong></td>
+ </tr>
+
+ <tr>
+ <td align="center" colspan=4><strong>Long Context</strong></td>
+ </tr>
+ <tr>
+ <td align="center" style="vertical-align: middle">RULER</td>
+ <td align="center" style="vertical-align: middle"><strong>95.60</strong></td>
+ <td align="center" style="vertical-align: middle">89.66</td>
+ <td align="center" style="vertical-align: middle">56.12</td>
+ </tr>
+ </tbody>
+ </table>
+
+
+ ## 4. Deployment
+
+ > [!Note]
+ > You can access the JoyAI-LLM Flash API at https://docs.jdcloud.com/cn/jdaip/chat; an OpenAI/Anthropic-compatible API is provided.
+ > Currently, JoyAI-LLM-Flash-FP8 is recommended to run on the following inference engines:
+
+ * vLLM
+ * SGLang
+
+ The minimum version requirement for `transformers` is `4.57.1`.
+
+ Deployment examples can be found in the [Model Deployment Guide](docs/deploy_guidance.md).
+
+
+
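Before deploying, it can help to confirm that the installed `transformers` meets the stated minimum. A minimal preflight sketch (the `parse` helper below is illustrative, not part of this repo):

```python
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = (4, 57, 1)


def parse(v: str) -> tuple:
    """Parse the leading numeric dot-separated components of a version string."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:  # stop at non-numeric suffixes like "dev0"
            break
        parts.append(int(digits))
    return tuple(parts)


try:
    installed = parse(version("transformers"))
    if installed < MIN_VERSION:
        print(f"transformers {installed} is older than the required {MIN_VERSION}")
    else:
        print("transformers version OK")
except PackageNotFoundError:
    print("transformers is not installed")
```

Tuple comparison handles the common cases; a full resolver such as `packaging.version` is more robust for exotic version strings.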
+ ## 5. Model Usage
+
+ The usage demos below show how to call our official API.
+
+ For third-party APIs deployed with vLLM or SGLang, please note:
+
+ > [!Note]
+ > Recommended sampling parameters: `temperature=0.6`, `top_p=1.0`
+
+ ### Chat Completion
+
+ This simple chat-completion script shows how to call the JoyAI-Flash API.
+
+ ```python
+ from openai import OpenAI
+
+ client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")
+
+
+ def simple_chat(client: OpenAI):
+     messages = [
+         {
+             "role": "user",
+             "content": [
+                 {
+                     "type": "text",
+                     "text": "which one is bigger, 9.11 or 9.9? think carefully.",
+                 }
+             ],
+         },
+     ]
+     model_name = client.models.list().data[0].id
+     response = client.chat.completions.create(
+         model=model_name, messages=messages, stream=False, max_tokens=4096
+     )
+     print(f"response: {response.choices[0].message.content}")
+
+
+ if __name__ == "__main__":
+     simple_chat(client)
+ ```
+
+
+ ### Tool Call Completion
+
+ This simple tool-call script shows how to call the JoyAI-Flash API.
+
+ ```python
+ import json
+
+ from openai import OpenAI
+
+ client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")
+
+
+ def my_calculator(expression: str) -> str:
+     # Demo only: never eval() untrusted input in production.
+     return str(eval(expression))
+
+
+ def rewrite(text: str) -> str:
+     return str(text)
+
+
+ def simple_tool_call(client: OpenAI):
+     messages = [
+         {
+             "role": "user",
+             "content": [
+                 {
+                     "type": "text",
+                     "text": "use my functions to compute the results for the equations: 6+1",
+                 },
+             ],
+         },
+     ]
+     tools = [
+         {
+             "type": "function",
+             "function": {
+                 "name": "my_calculator",
+                 "description": "A calculator that can evaluate a mathematical equation and compute its results.",
+                 "parameters": {
+                     "type": "object",
+                     "properties": {
+                         "expression": {
+                             "type": "string",
+                             "description": "The mathematical expression to evaluate.",
+                         },
+                     },
+                     "required": ["expression"],
+                 },
+             },
+         },
+         {
+             "type": "function",
+             "function": {
+                 "name": "rewrite",
+                 "description": "Rewrite a given text for improved clarity",
+                 "parameters": {
+                     "type": "object",
+                     "properties": {
+                         "text": {
+                             "type": "string",
+                             "description": "The input text to rewrite",
+                         }
+                     },
+                     "required": ["text"],
+                 },
+             },
+         },
+     ]
+     model_name = client.models.list().data[0].id
+     response = client.chat.completions.create(
+         model=model_name,
+         messages=messages,
+         temperature=1.0,
+         max_tokens=1024,
+         tools=tools,
+         tool_choice="auto",
+     )
+     tool_calls = response.choices[0].message.tool_calls
+
+     results = []
+     for tool_call in tool_calls:
+         function_name = tool_call.function.name
+         function_args = json.loads(tool_call.function.arguments)
+         if function_name == "my_calculator":
+             results.append(my_calculator(**function_args))
+         elif function_name == "rewrite":
+             results.append(rewrite(**function_args))
+     messages.append({"role": "assistant", "tool_calls": tool_calls})
+     for tool_call, result in zip(tool_calls, results):
+         messages.append(
+             {
+                 "role": "tool",
+                 "tool_call_id": tool_call.id,
+                 "name": tool_call.function.name,
+                 "content": result,
+             }
+         )
+     response = client.chat.completions.create(
+         model=model_name,
+         messages=messages,
+         temperature=1.0,
+         max_tokens=1024,
+     )
+     print(response.choices[0].message.content)
+
+
+ if __name__ == "__main__":
+     simple_tool_call(client)
+ ```
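The calculator tool above uses `eval` for brevity, which is unsafe for untrusted model output. A minimal safer alternative using the standard-library `ast` module (the `safe_eval` name and the operator whitelist are our own illustrative choices):

```python
import ast
import operator

# Whitelisted binary operators for plain arithmetic expressions.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def safe_eval(expression: str) -> float:
    """Evaluate a pure-arithmetic expression without calling eval()."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    return walk(ast.parse(expression, mode="eval"))
```

Anything outside the whitelist (names, calls, attribute access) raises `ValueError` instead of executing, so a hostile `expression` argument cannot run arbitrary code.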
+
+ ---
+
+ ## 6. License
+
+ Both the code repository and the model weights are released under the [Modified MIT License](LICENSE).
chat_template.jinja ADDED
@@ -0,0 +1,103 @@
+ {%- macro render_extra_keys(json_dict, handled_keys) -%}
+ {%- if json_dict is mapping -%}
+ {%- for json_key in json_dict if json_key not in handled_keys -%}
+ {%- if json_dict[json_key] is mapping or (json_dict[json_key] is sequence and json_dict[json_key] is not string) -%}
+ {{- '\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | tojson | safe) ~ '</' ~ json_key ~ '>' -}}
+ {%- else -%}
+ {{- '\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | string) ~ '</' ~ json_key ~ '>' -}}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- endif -%}
+ {%- endmacro -%}
+
+ {%- if not add_generation_prompt is defined -%}{%- set add_generation_prompt = false -%}{%- endif -%}
+
+ {%- set ns = namespace(system_prompt='', is_first_sp=true, is_last_user=false) -%}
+ {%- set default_system = "You are JoyAI , a large language model trained by JD(京东)that can interact with a computer to solve tasks. Answer as concisely as possible." -%}
+ {%- set ns.system_prompt = default_system -%}
+
+ {%- for message in messages -%}
+ {%- if message['role'] == 'system' -%}
+ {%- if ns.is_first_sp -%}
+ {%- set ns.system_prompt = message['content'] -%}
+ {%- set ns.is_first_sp = false -%}
+ {%- else -%}
+ {%- set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] -%}
+ {%- endif -%}
+ {%- endif -%}
+ {%- endfor -%}
+
+ {{- bos_token -}}{{- ns.system_prompt -}}
+ {%- if tools is iterable and tools | length > 0 -%}
+ {{- "\n\n# Tools\n\nYou have access to the following functions:\n\n" }}
+ {{- "<tools>" }}
+ {%- for tool in tools %}
+ {%- if tool.function is defined %}
+ {%- set tool = tool.function %}
+ {%- endif %}
+ {{- "\n<function>\n<name>" ~ tool.name ~ "</name>" }}
+ {%- if tool.description is defined %}
+ {{- '\n<description>' ~ (tool.description | trim) ~ '</description>' }}
+ {%- endif %}
+ {{- '\n<parameters>' }}
+ {%- if tool.parameters is defined and tool.parameters is mapping and tool.parameters.properties is defined and tool.parameters.properties is mapping %}
+ {%- for param_name, param_fields in tool.parameters.properties|items %}
+ {{- '\n<parameter>' }}
+ {{- '\n<name>' ~ param_name ~ '</name>' }}
+ {%- if param_fields.type is defined %}
+ {{- '\n<type>' ~ (param_fields.type | string) ~ '</type>' }}
+ {%- endif %}
+ {%- if param_fields.description is defined %}
+ {{- '\n<description>' ~ (param_fields.description | trim) ~ '</description>' }}
+ {%- endif %}
+ {%- set handled_keys = ['name', 'type', 'description'] %}
+ {{- render_extra_keys(param_fields, handled_keys) }}
+ {{- '\n</parameter>' }}
+ {%- endfor %}
+ {%- endif %}
+ {% set handled_keys = ['type', 'properties'] %}
+ {{- render_extra_keys(tool.parameters, handled_keys) }}
+ {{- '\n</parameters>' }}
+ {%- set handled_keys = ['type', 'name', 'description', 'parameters'] %}
+ {{- render_extra_keys(tool, handled_keys) }}
+ {{- '\n</function>' }}
+ {%- endfor %}
+ {{- "\n</tools>" }}
+ {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+ {%- endif %}
+ {%- for message in messages -%}
+ {%- if message['role'] == 'user' -%}
+ {%- set ns.is_last_user = true -%}
+ {{- '<|User|>' + message['content'] -}}
+ {%- elif message['role'] == 'assistant' -%}
+ {%- if ns.is_last_user -%}
+ {{ '<|Assistant|>' }}
+ {%- endif -%}
+ {%- set ns.is_last_user = false -%}
+ {%- set content = message.get('content') | default('', true) -%}
+ {{ '<|end_of_thought|>' + content }}
+ {%- if message['tool_calls'] is defined and message['tool_calls'] is not none -%}
+ {%- for tool in message['tool_calls'] -%}
+ {%- if tool.function is defined %}{% set tool = tool.function %}{% endif -%}
+ {{- '\n<tool_call>\n<function=' + tool.name + '>\n' -}}
+ {%- if tool.arguments is defined -%}
+ {%- if tool.arguments is string -%}{%- set args_data = tool.arguments | from_json -%}{%- else -%}{%- set args_data = tool.arguments -%}{%- endif -%}
+ {%- for args_name, args_value in args_data.items() -%}
+ {{- '<parameter=' + args_name + '>\n' -}}
+ {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string -%}
+ {{- args_value -}}{{- '\n</parameter>\n' -}}
+ {%- endfor -%}
+ {%- endif -%}
+ {{- '</function>\n</tool_call>' -}}
+ {%- endfor -%}
+ {%- endif -%}
+ {{ '<|end▁of▁sentence|>' }}
+ {%- elif message['role'] == 'tool' -%}
+ {%- set ns.is_last_user = true -%}
+ {{ '\n<tool_response>\n' + message['content'] + '\n</tool_response>' }}
+ {%- endif -%}
+ {%- endfor -%}
+
+ {%- if add_generation_prompt -%}
+ {{ '<|Assistant|>' }}{{ '<|end_of_thought|>' }}
+ {%- endif -%}
config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "architectures": [
+     "DeepseekV3ForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "auto_map": {
+     "AutoConfig": "configuration_deepseek.DeepseekV3Config",
+     "AutoModel": "modeling_deepseek.DeepseekV3Model",
+     "AutoModelForCausalLM": "modeling_deepseek.DeepseekV3ForCausalLM"
+   },
+   "bos_token_id": 0,
+   "eos_token_id": 1,
+   "ep_size": 1,
+   "first_k_dense_replace": 1,
+   "hidden_act": "silu",
+   "hidden_size": 2048,
+   "initializer_range": 0.02,
+   "intermediate_size": 7168,
+   "kv_lora_rank": 512,
+   "max_position_embeddings": 131072,
+   "model_type": "joyai_llm_flash",
+   "moe_intermediate_size": 768,
+   "moe_layer_freq": 1,
+   "n_group": 1,
+   "n_routed_experts": 256,
+   "n_shared_experts": 1,
+   "norm_topk_prob": true,
+   "num_attention_heads": 32,
+   "num_experts_per_tok": 8,
+   "num_hidden_layers": 40,
+   "num_key_value_heads": 32,
+   "num_nextn_predict_layers": 1,
+   "q_lora_rank": 1536,
+   "qk_nope_head_dim": 128,
+   "qk_rope_head_dim": 64,
+   "quantization_config": {
+     "activation_scheme": "dynamic",
+     "fmt": "e4m3",
+     "quant_method": "fp8",
+     "weight_block_size": [
+       128,
+       128
+     ]
+   },
+   "rms_norm_eps": 1e-06,
+   "rope_theta": 32000000,
+   "routed_scaling_factor": 2.5,
+   "scoring_func": "sigmoid",
+   "tie_word_embeddings": false,
+   "topk_group": 1,
+   "topk_method": "noaux_tc",
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.44.2",
+   "use_cache": true,
+   "v_head_dim": 128,
+   "vocab_size": 129280
+ }
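The MLA settings in this config imply a much smaller KV cache than standard multi-head attention: MLA caches one shared compressed latent per token (`kv_lora_rank`) plus the decoupled RoPE key (`qk_rope_head_dim`), instead of full per-head K/V. A back-of-the-envelope sketch using the values above (illustrative accounting only; actual engine memory layouts differ):

```python
# Values taken from the config above.
cfg = {
    "num_attention_heads": 32,
    "qk_nope_head_dim": 128,
    "qk_rope_head_dim": 64,
    "v_head_dim": 128,
    "kv_lora_rank": 512,
}

# Standard MHA would cache one full K and one full V vector per head per token.
mha_cache_per_token = cfg["num_attention_heads"] * (
    cfg["qk_nope_head_dim"] + cfg["qk_rope_head_dim"] + cfg["v_head_dim"]
)  # 32 * 320 = 10240 elements

# MLA caches the shared compressed KV latent plus the decoupled RoPE key.
mla_cache_per_token = cfg["kv_lora_rank"] + cfg["qk_rope_head_dim"]  # 576 elements

print(f"MHA:  {mha_cache_per_token} elements/token")
print(f"MLA:  {mla_cache_per_token} elements/token")
print(f"reduction: {mha_cache_per_token / mla_cache_per_token:.1f}x")
```

At the 128K context length, this per-token reduction is what makes long-context serving of the model practical on modest hardware.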
configuration_deepseek.py ADDED
@@ -0,0 +1,247 @@
1
+ # coding=utf-8
2
+ # Copyright 2025 bzantium and the HuggingFace Inc. team. All rights reserved.
3
+ #
4
+ # This code is based on the DeepSeekV3 implementations from the DeepSeek AI team. (https://huggingface.co/deepseek-ai/DeepSeek-V3)
5
+
6
+ # Licensed under the Apache License, Version 2.0 (the "License");
7
+ # you may not use this file except in compliance with the License.
8
+ # You may obtain a copy of the License at
9
+ #
10
+ # http://www.apache.org/licenses/LICENSE-2.0
11
+ #
12
+ # Unless required by applicable law or agreed to in writing, software
13
+ # distributed under the License is distributed on an "AS IS" BASIS,
14
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
15
+ # See the License for the specific language governing permissions and
16
+ # limitations under the License.
17
+ """DeepSeekV3 model configuration"""
18
+
19
+ from transformers.configuration_utils import PretrainedConfig
20
+ from transformers.modeling_rope_utils import rope_config_validation
21
+
22
+
23
+ DEEPSEEK_PRETRAINED_CONFIG_ARCHIVE_MAP = {}
24
+
25
+
26
+ class DeepseekV3Config(PretrainedConfig):
27
+ r"""
28
+ This is the configuration class to store the configuration of a [`DeepseekV3Model`]. It is used to instantiate an DeepSeek
29
+ model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
30
+ defaults will yield a similar configuration to that of the DeepSeek-V3.
31
+ e.g. [bzantium/tiny-deepseek-v3](https://huggingface.co/bzantium/tiny-deepseek-v3)
32
+ Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
33
+ documentation from [`PretrainedConfig`] for more information.
34
+
35
+
36
+ Args:
37
+ vocab_size (`int`, *optional*, defaults to 129280):
38
+ Vocabulary size of the Deep model. Defines the number of different tokens that can be represented by the
39
+ `inputs_ids` passed when calling [`DeepseekV3Model`]
40
+ hidden_size (`int`, *optional*, defaults to 7168):
41
+ Dimension of the hidden representations.
42
+ intermediate_size (`int`, *optional*, defaults to 18432):
43
+ Dimension of the MLP representations.
44
+ moe_intermediate_size (`int`, *optional*, defaults to 2048):
45
+ Dimension of the MoE representations.
46
+ num_hidden_layers (`int`, *optional*, defaults to 61):
47
+ Number of hidden layers in the Transformer decoder.
48
+ num_attention_heads (`int`, *optional*, defaults to 128):
49
+ Number of attention heads for each attention layer in the Transformer decoder.
50
+ num_key_value_heads (`int`, *optional*, defaults to 128):
51
+ This is the number of key_value heads that should be used to implement Grouped Query Attention. If
52
+ `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
53
+ `num_key_value_heads=1 the model will use Multi Query Attention (MQA) otherwise GQA is used. When
54
+ converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
55
+ by meanpooling all the original heads within that group. For more details checkout [this
56
+ paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to
57
+ `num_attention_heads`.
58
+ n_shared_experts (`int`, *optional*, defaults to 1):
59
+ Number of shared experts.
60
+ n_routed_experts (`int`, *optional*, defaults to 256):
61
+ Number of routed experts.
62
+ routed_scaling_factor (`float`, *optional*, defaults to 2.5):
63
+ Scaling factor or routed experts.
64
+ kv_lora_rank (`int`, *optional*, defaults to 512):
65
+ Rank of the LoRA matrices for key and value projections.
66
+ q_lora_rank (`int`, *optional*, defaults to 1536):
67
+ Rank of the LoRA matrices for query projections.
68
+ qk_rope_head_dim (`int`, *optional*, defaults to 64):
69
+ Dimension of the query/key heads that use rotary position embeddings.
70
+ v_head_dim (`int`, *optional*, defaults to 128):
71
+ Dimension of the value heads.
72
+ qk_nope_head_dim (`int`, *optional*, defaults to 128):
73
+ Dimension of the query/key heads that don't use rotary position embeddings.
74
+ n_group (`int`, *optional*, defaults to 8):
75
+ Number of groups for routed experts.
76
+ topk_group (`int`, *optional*, defaults to 4):
77
+ Number of selected groups for each token(for each token, ensuring the selected experts is only within `topk_group` groups).
78
+ num_experts_per_tok (`int`, *optional*, defaults to 8):
79
+ Number of selected experts, None means dense model.
80
+ first_k_dense_replace (`int`, *optional*, defaults to 3):
81
+ Number of dense layers in shallow layers(embed->dense->dense->...->dense->moe->moe...->lm_head).
82
+ \--k dense layers--/
83
+ norm_topk_prob (`bool`, *optional*, defaults to `True`):
84
+ Whether to normalize the weights of the routed experts.
85
+ hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
86
+ The non-linear activation function (function or string) in the decoder.
87
+ max_position_embeddings (`int`, *optional*, defaults to 4096):
88
+ The maximum sequence length that this model might ever be used with.
89
+ initializer_range (`float`, *optional*, defaults to 0.02):
90
+ The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
91
+ rms_norm_eps (`float`, *optional*, defaults to 1e-06):
92
+ The epsilon used by the rms normalization layers.
93
+ use_cache (`bool`, *optional*, defaults to `True`):
94
+ Whether or not the model should return the last key/values attentions (not used by all models). Only
95
+ relevant if `config.is_decoder=True`.
96
+ pad_token_id (`int`, *optional*):
97
+ Padding token id.
98
+ bos_token_id (`int`, *optional*, defaults to 0):
99
+ Beginning of stream token id.
100
+ eos_token_id (`int`, *optional*, defaults to 1):
101
+ End of stream token id.
102
+ pretraining_tp (`int`, *optional*, defaults to 1):
103
+ Experimental feature. Tensor parallelism rank used during pretraining. Please refer to [this
104
+ document](https://huggingface.co/docs/transformers/parallelism) to understand more about it. This value is
105
+ necessary to ensure exact reproducibility of the pretraining results. Please refer to [this
106
+ issue](https://github.com/pytorch/pytorch/issues/76232).
107
+ tie_word_embeddings (`bool`, *optional*, defaults to `False`):
108
+ Whether to tie the input and output word embeddings.
109
+ rope_theta (`float`, *optional*, defaults to 10000.0):
110
+ The base period of the RoPE embeddings.
111
+ rope_scaling (`Dict`, *optional*):
112
+ Dictionary containing the scaling configuration for the RoPE embeddings. Currently supports two scaling
113
+ strategies: linear and dynamic. Their scaling factor must be a float greater than 1. The expected format is
114
+ `{"type": strategy name, "factor": scaling factor}`. When using this flag, don't update
115
+ `max_position_embeddings` to the expected new maximum.
116
+ rope_interleave (`bool`, *optional*, defaults to `True`):
117
+ Whether to interleave the rotary position embeddings.
118
+ attention_bias (`bool`, *optional*, defaults to `False`):
119
+ Whether to use a bias in the query, key, value and output projection layers during self-attention.
120
+ attention_dropout (`float`, *optional*, defaults to 0.0):
121
+ The dropout ratio for the attention probabilities.
122
+
123
+ ```python
124
+ >>> from transformers import DeepseekV3Model, DeepseekV3Config
125
+
126
+ >>> # Initializing a Deepseek-V3 style configuration
127
+ >>> configuration = DeepseekV3Config()
128
+
129
+ >>> # Initializing a model from the configuration
+ >>> model = DeepseekV3Model(configuration)
130
+ >>> # Accessing the model configuration
+ >>> configuration = model.config
131
+ ```"""
132
+
133
+ model_type = "deepseek_v3"
134
+ keys_to_ignore_at_inference = ["past_key_values"]
135
+ base_model_tp_plan = { # TODO: only replicate attention layers when > first_k_dense_replace
136
+ "layers.*.mlp.experts.*.gate_proj": "local_colwise",
137
+ "layers.*.mlp.experts.*.up_proj": "local_colwise",
138
+ "layers.*.mlp.experts.*.down_proj": "local_rowwise",
139
+ "layers.*.mlp.experts.*": "local", # each expert is wrapped in a module list
140
+ "layers.*.mlp.shared_experts.gate_proj": "local_colwise",
141
+ "layers.*.mlp.shared_experts.up_proj": "local_colwise",
142
+ "layers.*.mlp.shared_experts.down_proj": "local_rowwise",
143
+ "layers.*.mlp.shared_experts": "local",
144
+ "layers.*.mlp.gate_proj": "local_colwise",
145
+ "layers.*.mlp.up_proj": "local_colwise",
146
+ "layers.*.mlp.down_proj": "local_rowwise",
147
+ "layers.*.mlp": "gather", # This is the only moment where results are gathered
148
+ }
149
+ base_model_pp_plan = {
150
+ "embed_tokens": (["input_ids"], ["inputs_embeds"]),
151
+ "layers": (["hidden_states", "attention_mask"], ["hidden_states"]),
152
+ "norm": (["hidden_states"], ["hidden_states"]),
153
+ }
154
+
155
+ def __init__(
156
+ self,
157
+ vocab_size=129280,
158
+ hidden_size=7168,
159
+ intermediate_size=18432,
160
+ moe_intermediate_size=2048,
161
+ num_hidden_layers=61,
162
+ num_attention_heads=128,
163
+ num_key_value_heads=128,
164
+ n_shared_experts=1,
165
+ n_routed_experts=256,
166
+ routed_scaling_factor=2.5,
167
+ kv_lora_rank=512,
168
+ q_lora_rank=1536,
169
+ qk_rope_head_dim=64,
170
+ v_head_dim=128,
171
+ qk_nope_head_dim=128,
172
+ n_group=8,
173
+ topk_group=4,
174
+ num_experts_per_tok=8,
175
+ first_k_dense_replace=3,
176
+ norm_topk_prob=True,
177
+ hidden_act="silu",
178
+ max_position_embeddings=4096,
179
+ initializer_range=0.02,
180
+ rms_norm_eps=1e-6,
181
+ use_cache=True,
182
+ pad_token_id=None,
183
+ bos_token_id=0,
184
+ eos_token_id=1,
185
+ pretraining_tp=1,
186
+ tie_word_embeddings=False,
187
+ rope_theta=10000.0,
188
+ rope_scaling=None,
189
+ rope_interleave=True,
190
+ attention_bias=False,
191
+ attention_dropout=0.0,
192
+ **kwargs,
193
+ ):
194
+ self.vocab_size = vocab_size
195
+ self.max_position_embeddings = max_position_embeddings
196
+ self.hidden_size = hidden_size
197
+ self.intermediate_size = intermediate_size
198
+ self.moe_intermediate_size = moe_intermediate_size
199
+ self.num_hidden_layers = num_hidden_layers
200
+ self.num_attention_heads = num_attention_heads
201
+ self.n_shared_experts = n_shared_experts
202
+ self.n_routed_experts = n_routed_experts
203
+ self.routed_scaling_factor = routed_scaling_factor
204
+ self.kv_lora_rank = kv_lora_rank
205
+ self.q_lora_rank = q_lora_rank
206
+ self.qk_rope_head_dim = qk_rope_head_dim
207
+ self.v_head_dim = v_head_dim
208
+ self.qk_nope_head_dim = qk_nope_head_dim
209
+ self.qk_head_dim = qk_nope_head_dim + qk_rope_head_dim
210
+ self.head_dim = qk_rope_head_dim
211
+ self.n_group = n_group
212
+ self.topk_group = topk_group
213
+ self.num_experts_per_tok = num_experts_per_tok
214
+ self.first_k_dense_replace = first_k_dense_replace
215
+ self.norm_topk_prob = norm_topk_prob
216
+ self.rope_interleave = rope_interleave
217
+
218
+ # for backward compatibility
219
+ if num_key_value_heads is None:
220
+ num_key_value_heads = num_attention_heads
221
+
222
+ self.num_key_value_heads = num_key_value_heads
223
+ self.hidden_act = hidden_act
224
+ self.initializer_range = initializer_range
225
+ self.rms_norm_eps = rms_norm_eps
226
+ self.pretraining_tp = pretraining_tp
227
+ self.use_cache = use_cache
228
+ self.rope_theta = rope_theta
229
+ self.rope_scaling = rope_scaling
230
+ self.attention_bias = attention_bias
231
+ self.attention_dropout = attention_dropout
232
+ # Validate the correctness of rotary position embeddings parameters
233
+ # BC: if there is a 'type' field, copy it to 'rope_type'.
234
+ if self.rope_scaling is not None and "type" in self.rope_scaling:
235
+ self.rope_scaling["rope_type"] = self.rope_scaling["type"]
236
+ rope_config_validation(self)
237
+
238
+ super().__init__(
239
+ pad_token_id=pad_token_id,
240
+ bos_token_id=bos_token_id,
241
+ eos_token_id=eos_token_id,
242
+ tie_word_embeddings=tie_word_embeddings,
243
+ **kwargs,
244
+ )
245
+
246
+
247
+ __all__ = ["DeepseekV3Config"]
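
For orientation, the interplay between `first_k_dense_replace` and `num_hidden_layers` determines which decoder layers use a dense MLP and which use MoE, and `qk_head_dim` is derived from the two query/key head dimensions. A minimal pure-Python sketch using the defaults above (no `transformers` import needed; this mirrors the `layer_idx >= first_k_dense_replace` check in the model):

```python
# Sketch: derive the dense/MoE layer layout from the config defaults above.
num_hidden_layers = 61
first_k_dense_replace = 3
qk_nope_head_dim, qk_rope_head_dim = 128, 64

# Layers with index < first_k_dense_replace use a dense MLP; the rest use MoE.
layer_kinds = [
    "dense" if i < first_k_dense_replace else "moe"
    for i in range(num_hidden_layers)
]

# As computed in __init__: qk_head_dim = qk_nope_head_dim + qk_rope_head_dim
qk_head_dim = qk_nope_head_dim + qk_rope_head_dim

print(layer_kinds[:4])  # ['dense', 'dense', 'dense', 'moe']
print(qk_head_dim)      # 192
```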
docs/deploy_guidance.md ADDED
@@ -0,0 +1,42 @@
1
+ # Model Deployment Guide
2
+
3
+ > [!NOTE]
4
+ > This guide offers a selection of deployment command examples for JoyAI-LLM Flash; they may not be the optimal configuration. Given the rapid evolution of inference engines, we recommend consulting their official documentation for the latest updates to ensure peak performance.
5
+
6
+ > Support for JoyAI-LLM Flash’s dense MTP architecture is currently being integrated into vLLM and SGLang. Until these PRs are merged into a stable release, please use the nightly Docker image for access to these features.
7
+
8
+ ## vLLM Deployment
9
+
10
+ Here is an example of serving this model on a single GPU via vLLM:
11
+
12
+ 1. Pull the Docker image:
13
+ ```bash
14
+ docker pull jdopensource/joyai-llm-vllm:v0.15.1-joyai_llm_flash
15
+ ```
16
+ 2. Launch the JoyAI-LLM Flash model with dense MTP (also quantized to FP8):
17
+ ```bash
18
+ vllm serve jdopensource/JoyAI-LLM-Flash-FP8 -tp 1 --trust-remote-code \
19
+ --tool-call-parser qwen3_coder --enable-auto-tool-choice \
20
+ --speculative-config $'{"method": "mtp", "num_speculative_tokens": 3}'
21
+ ```
22
+ **Key notes**
23
+ - `--tool-call-parser qwen3_coder`: Required to enable tool calling.
24
+
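+ Once the server above is running, it can be queried through vLLM's OpenAI-compatible API. A minimal sketch of the request payload (the port 8000 and the endpoint path are assumptions based on vLLM's defaults, not part of this guide):

```python
import json

# Hypothetical request body for the vLLM OpenAI-compatible endpoint started above,
# e.g. POST http://localhost:8000/v1/chat/completions (default port assumed).
payload = {
    "model": "jdopensource/JoyAI-LLM-Flash-FP8",
    "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
    "max_tokens": 64,
}
body = json.dumps(payload)
print(body[:40])
```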
25
+ ## SGLang Deployment
26
+
27
+ Similarly, here is an example of serving on a single GPU via SGLang:
28
+
29
+ 1. Pull the Docker image:
30
+ ```bash
31
+ docker pull jdopensource/joyai-llm-sglang:v0.5.8-joyai_llm_flash
32
+ ```
33
+ 2. Launch the JoyAI-LLM Flash model with dense MTP (also quantized to FP8):
34
+
35
+ ```bash
36
+ python3 -m sglang.launch_server --model-path jdopensource/JoyAI-LLM-Flash-FP8 --tp-size 1 --trust-remote-code \
37
+ --tool-call-parser qwen3_coder \
38
+ --speculative-algorithm EAGLE \
39
+ --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4
40
+ ```
41
+ **Key notes:**
42
+ - `--tool-call-parser qwen3_coder`: Required to enable tool calling.
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
modeling_deepseek.py ADDED
@@ -0,0 +1,1030 @@
1
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
2
+ # This file was automatically generated from src/transformers/models/deepseek_v3/modular_deepseek_v3.py.
3
+ # Do NOT edit this file manually as any edits will be overwritten by the generation of
4
+ # the file from the modular. If any change should be done, please apply the change to the
5
+ # modular_deepseek_v3.py file directly. One of our CI enforces this.
6
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
7
+ import math
8
+ from functools import partial
9
+ from typing import Callable, Optional, Tuple, Union
10
+
11
+ import torch
12
+ import torch.nn.functional as F
13
+ from torch import nn
14
+ from transformers.activations import ACT2FN
15
+ from transformers.cache_utils import Cache, DynamicCache, StaticCache
16
+ from transformers.generation import GenerationMixin
17
+ from transformers.modeling_attn_mask_utils import AttentionMaskConverter
18
+ from transformers.modeling_flash_attention_utils import FlashAttentionKwargs
19
+ from transformers.modeling_outputs import (
20
+ BaseModelOutputWithPast,
21
+ CausalLMOutputWithPast,
22
+ )
23
+ from transformers.modeling_rope_utils import ROPE_INIT_FUNCTIONS, dynamic_rope_update
24
+ from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS, PreTrainedModel
25
+ from transformers.processing_utils import Unpack
26
+ from transformers.utils import (
27
+ LossKwargs,
28
+ add_start_docstrings,
29
+ add_start_docstrings_to_model_forward,
30
+ can_return_tuple,
31
+ is_torch_flex_attn_available,
32
+ logging,
33
+ replace_return_docstrings,
34
+ )
35
+ from transformers.utils.deprecation import deprecate_kwarg
36
+
37
+ from .configuration_deepseek import DeepseekV3Config
38
+
39
+ if is_torch_flex_attn_available():
40
+ from torch.nn.attention.flex_attention import BlockMask
41
+ from transformers.integrations.flex_attention import make_flex_block_causal_mask
42
+
43
+
44
+ logger = logging.get_logger(__name__)
45
+ _CONFIG_FOR_DOC = "DeepseekV3Config"
46
+
47
+
48
+ class DeepseekV3RMSNorm(nn.Module):
49
+ def __init__(self, hidden_size, eps=1e-6):
50
+ """
51
+ DeepseekV3RMSNorm is equivalent to T5LayerNorm
52
+ """
53
+ super().__init__()
54
+ self.weight = nn.Parameter(torch.ones(hidden_size))
55
+ self.variance_epsilon = eps
56
+
57
+ def forward(self, hidden_states):
58
+ input_dtype = hidden_states.dtype
59
+ hidden_states = hidden_states.to(torch.float32)
60
+ variance = hidden_states.pow(2).mean(-1, keepdim=True)
61
+ hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
62
+ return self.weight * hidden_states.to(input_dtype)
63
+
64
+ def extra_repr(self):
65
+ return f"{tuple(self.weight.shape)}, eps={self.variance_epsilon}"
66
+
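+ The normalization above can be checked with a small pure-Python replica of the same formula (no torch needed; the learnable weight is taken as all-ones, matching its initialization):

```python
import math

def rms_norm(xs, eps=1e-6):
    # x / sqrt(mean(x^2) + eps), with the weight initialized to ones as in the module
    variance = sum(x * x for x in xs) / len(xs)
    inv = 1.0 / math.sqrt(variance + eps)
    return [x * inv for x in xs]

out = rms_norm([3.0, 4.0])  # mean of squares = 12.5
print(out)                  # ~[0.8485, 1.1314]; the output has RMS ~1
```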
67
+
68
+ class DeepseekV3RotaryEmbedding(nn.Module):
69
+ def __init__(self, config: DeepseekV3Config, device=None):
70
+ super().__init__()
71
+ # BC: "rope_type" was originally "type"
72
+ if hasattr(config, "rope_scaling") and config.rope_scaling is not None:
73
+ self.rope_type = config.rope_scaling.get("rope_type", config.rope_scaling.get("type"))
74
+ else:
75
+ self.rope_type = "default"
76
+ self.max_seq_len_cached = config.max_position_embeddings
77
+ self.original_max_seq_len = config.max_position_embeddings
78
+
79
+ self.config = config
80
+ self.rope_init_fn = ROPE_INIT_FUNCTIONS[self.rope_type]
81
+
82
+ inv_freq, self.attention_scaling = self.rope_init_fn(self.config, device)
83
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
84
+ self.original_inv_freq = self.inv_freq
85
+
86
+ @torch.no_grad()
87
+ @dynamic_rope_update # power user: used with advanced RoPE types (e.g. dynamic rope)
88
+ def forward(self, x, position_ids):
89
+ inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1).to(x.device)
90
+ position_ids_expanded = position_ids[:, None, :].float()
91
+
92
+ device_type = x.device.type if isinstance(x.device.type, str) and x.device.type != "mps" else "cpu"
93
+ with torch.autocast(device_type=device_type, enabled=False): # Force float32
94
+ freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
95
+ emb = torch.cat((freqs, freqs), dim=-1)
96
+ cos = emb.cos() * self.attention_scaling
97
+ sin = emb.sin() * self.attention_scaling
98
+
99
+ return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)
100
+
101
+
102
+ class DeepseekV3MLP(nn.Module):
103
+ def __init__(self, config, hidden_size=None, intermediate_size=None):
104
+ super().__init__()
105
+ self.config = config
106
+ self.hidden_size = config.hidden_size if hidden_size is None else hidden_size
107
+ self.intermediate_size = config.intermediate_size if intermediate_size is None else intermediate_size
108
+
109
+ self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
110
+ self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
111
+ self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
112
+ self.act_fn = ACT2FN[config.hidden_act]
113
+
114
+ def forward(self, x):
115
+ down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
116
+ return down_proj
117
+
118
+
119
+ class DeepseekV3TopkRouter(nn.Module):
120
+ def __init__(self, config):
121
+ super().__init__()
122
+ self.config = config
123
+ self.top_k = config.num_experts_per_tok
124
+ self.n_routed_experts = config.n_routed_experts
125
+ self.routed_scaling_factor = config.routed_scaling_factor
126
+ self.n_group = config.n_group
127
+ self.topk_group = config.topk_group
128
+ self.norm_topk_prob = config.norm_topk_prob
129
+
130
+ self.weight = nn.Parameter(torch.empty((self.n_routed_experts, config.hidden_size)))
131
+ self.register_buffer("e_score_correction_bias", torch.zeros((self.n_routed_experts)))
132
+
133
+ @torch.no_grad()
134
+ def get_topk_indices(self, scores):
135
+ scores_for_choice = scores.view(-1, self.n_routed_experts) + self.e_score_correction_bias.unsqueeze(0)
136
+ group_scores = (
137
+ scores_for_choice.view(-1, self.n_group, self.n_routed_experts // self.n_group)
138
+ .topk(2, dim=-1)[0]
139
+ .sum(dim=-1)
140
+ )
141
+ group_idx = torch.topk(group_scores, k=self.topk_group, dim=-1, sorted=False)[1]
142
+ group_mask = torch.zeros_like(group_scores)
143
+ group_mask.scatter_(1, group_idx, 1)
144
+ score_mask = (
145
+ group_mask.unsqueeze(-1)
146
+ .expand(-1, self.n_group, self.n_routed_experts // self.n_group)
147
+ .reshape(-1, self.n_routed_experts)
148
+ )
149
+ scores_for_choice = scores_for_choice.masked_fill(~score_mask.bool(), 0.0)
150
+ topk_indices = torch.topk(scores_for_choice, k=self.top_k, dim=-1, sorted=False)[1]
151
+ return topk_indices
152
+
153
+ def forward(self, hidden_states):
154
+ hidden_states = hidden_states.view(-1, self.config.hidden_size)
155
+ router_logits = F.linear(hidden_states.type(torch.float32), self.weight.type(torch.float32))
156
+ scores = router_logits.sigmoid()
157
+ topk_indices = self.get_topk_indices(scores)
158
+ topk_weights = scores.gather(1, topk_indices)
159
+ if self.norm_topk_prob:
160
+ denominator = topk_weights.sum(dim=-1, keepdim=True) + 1e-20
161
+ topk_weights /= denominator
162
+ topk_weights = topk_weights * self.routed_scaling_factor
163
+ return topk_indices, topk_weights
164
+
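+ The two-stage selection in `get_topk_indices` (first pick `topk_group` groups by the sum of their top-2 expert scores, then pick `top_k` experts restricted to those groups) can be sketched in pure Python for a single token (toy sizes, not the real 256-expert config):

```python
# 8 experts in n_group=4 groups of 2; keep topk_group=2 groups, top_k=2 experts.
scores = [0.1, 0.9, 0.2, 0.3, 0.8, 0.7, 0.05, 0.15]
n_group, topk_group, top_k = 4, 2, 2
group_size = len(scores) // n_group

# Stage 1: score each group by the sum of its top-2 experts.
group_scores = [
    sum(sorted(scores[g * group_size:(g + 1) * group_size], reverse=True)[:2])
    for g in range(n_group)
]
kept_groups = sorted(range(n_group), key=lambda g: group_scores[g], reverse=True)[:topk_group]

# Stage 2: top-k experts restricted to the kept groups.
allowed = [i for g in kept_groups for i in range(g * group_size, (g + 1) * group_size)]
topk = sorted(allowed, key=lambda i: scores[i], reverse=True)[:top_k]
print(sorted(topk))  # [1, 4]: the best experts inside the two best groups
```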
165
+
166
+ class DeepseekV3MoE(nn.Module):
167
+ """
168
+ A mixed expert module containing shared experts.
169
+ """
170
+
171
+ def __init__(self, config):
172
+ super().__init__()
173
+ self.config = config
174
+ self.experts = nn.ModuleList(
175
+ [
176
+ DeepseekV3MLP(config, intermediate_size=config.moe_intermediate_size)
177
+ for _ in range(config.n_routed_experts)
178
+ ]
179
+ )
180
+ self.gate = DeepseekV3TopkRouter(config)
181
+ self.shared_experts = DeepseekV3MLP(
182
+ config=config, intermediate_size=config.moe_intermediate_size * config.n_shared_experts
183
+ )
184
+
185
+ def moe(self, hidden_states: torch.Tensor, topk_indices: torch.Tensor, topk_weights: torch.Tensor):
186
+ r"""
187
+ CALL FOR CONTRIBUTION! This loop is not optimized: the expert weights should be fused
188
+ so that we don't have to iterate over every expert here (DeepSeek has 256 routed experts).
189
+ """
190
+ final_hidden_states = torch.zeros_like(hidden_states, dtype=topk_weights.dtype)
191
+ expert_mask = torch.nn.functional.one_hot(topk_indices, num_classes=len(self.experts))
192
+ expert_mask = expert_mask.permute(2, 0, 1)
193
+
194
+ for expert_idx in range(len(self.experts)):
195
+ expert = self.experts[expert_idx]
196
+ mask = expert_mask[expert_idx]
197
+ token_indices, weight_indices = torch.where(mask)
198
+
199
+ if token_indices.numel() > 0:
200
+ expert_weights = topk_weights[token_indices, weight_indices]
201
+ expert_input = hidden_states[token_indices]
202
+ expert_output = expert(expert_input)
203
+ weighted_output = expert_output * expert_weights.unsqueeze(-1)
204
+ final_hidden_states.index_add_(0, token_indices, weighted_output)
205
+
206
+ # in the original DeepSeek, the outputs of the experts are gathered once we leave this module;
207
+ # thus the MoE module is itself an IsolatedParallel module,
208
+ # and all experts are "local", meaning we shard but do not gather
209
+ return final_hidden_states.type(hidden_states.dtype)
210
+
211
+ def forward(self, hidden_states):
212
+ residuals = hidden_states
213
+ orig_shape = hidden_states.shape
214
+ topk_indices, topk_weights = self.gate(hidden_states)
215
+ hidden_states = hidden_states.view(-1, hidden_states.shape[-1])
216
+ hidden_states = self.moe(hidden_states, topk_indices, topk_weights).view(*orig_shape)
217
+ hidden_states = hidden_states + self.shared_experts(residuals)
218
+ return hidden_states
219
+
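+ The dispatch loop in `moe` above reduces to a weighted sum of each selected expert's output. A toy pure-Python version for one token, with scalar "experts" that just scale their input (illustrative only; the real experts are `DeepseekV3MLP` modules):

```python
# Toy experts: expert i scales its input by (i + 1).
experts = [lambda x, k=k: (k + 1) * x for k in range(4)]

hidden = 2.0
topk_indices = [1, 3]        # this token is routed to experts 1 and 3
topk_weights = [0.25, 0.75]  # normalized routing weights from the gate

# Weighted combination, as done via index_add_ in the module above.
out = sum(w * experts[i](hidden) for i, w in zip(topk_indices, topk_weights))
print(out)  # 0.25*4.0 + 0.75*8.0 = 7.0
```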
220
+
221
+ def rotate_half(x):
222
+ """Rotates half the hidden dims of the input."""
223
+ x1 = x[..., : x.shape[-1] // 2]
224
+ x2 = x[..., x.shape[-1] // 2 :]
225
+ return torch.cat((-x2, x1), dim=-1)
226
+
227
+
228
+ def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
229
+ """Applies Rotary Position Embedding to the query and key tensors.
230
+
231
+ Args:
232
+ q (`torch.Tensor`): The query tensor.
233
+ k (`torch.Tensor`): The key tensor.
234
+ cos (`torch.Tensor`): The cosine part of the rotary embedding.
235
+ sin (`torch.Tensor`): The sine part of the rotary embedding.
236
+ position_ids (`torch.Tensor`, *optional*):
237
+ Deprecated and unused.
238
+ unsqueeze_dim (`int`, *optional*, defaults to 1):
239
+ The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and
240
+ sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note
241
+ that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and
242
+ k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes
243
+ cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have
244
+ the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2.
245
+ Returns:
246
+ `tuple(torch.Tensor)` comprising of the query and key tensors rotated using the Rotary Position Embedding.
247
+ """
248
+ cos = cos.unsqueeze(unsqueeze_dim)
249
+ sin = sin.unsqueeze(unsqueeze_dim)
250
+ q_embed = (q * cos) + (rotate_half(q) * sin)
251
+ k_embed = (k * cos) + (rotate_half(k) * sin)
252
+ return q_embed, k_embed
253
+
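+ The rotation above is the standard 2D rotation applied pairwise: with `rotate_half`, a pair (q0, q1) maps to (q0·cosθ − q1·sinθ, q1·cosθ + q0·sinθ). A pure-Python check on a single rotary pair:

```python
import math

def rotate_half(x):
    # Mirrors the tensor version: [-x2, x1] for the two halves of the last dim.
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    return [-v for v in x2] + x1

def apply_rope(q, theta):
    cos, sin = math.cos(theta), math.sin(theta)
    return [v * cos + r * sin for v, r in zip(q, rotate_half(q))]

q = [1.0, 0.0]                   # head_dim = 2, i.e. one rotary pair
out = apply_rope(q, math.pi / 2)
print(out)                       # ~[0.0, 1.0]: the pair rotated by 90 degrees
```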
254
+
255
+ def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
256
+ """
257
+ This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch,
258
+ num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim)
259
+ """
260
+ batch, num_key_value_heads, slen, head_dim = hidden_states.shape
261
+ if n_rep == 1:
262
+ return hidden_states
263
+ hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_key_value_heads, n_rep, slen, head_dim)
264
+ return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)
265
+
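+ `repeat_kv` duplicates each KV head `n_rep` times contiguously (A, B becomes A, A, B, B), not interleaved round-robin. A list-based sketch of the same expand+reshape:

```python
# Each KV head is repeated n_rep times in place, matching the expand+reshape above.
kv_heads = ["A", "B"]  # stand-ins for (seqlen, head_dim) tensors per head
n_rep = 3              # num_attention_heads // num_key_value_heads

repeated = [h for h in kv_heads for _ in range(n_rep)]
print(repeated)  # ['A', 'A', 'A', 'B', 'B', 'B']
```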
266
+
267
+ def eager_attention_forward(
268
+ module: nn.Module,
269
+ query: torch.Tensor,
270
+ key: torch.Tensor,
271
+ value: torch.Tensor,
272
+ attention_mask: Optional[torch.Tensor],
273
+ scaling: float,
274
+ dropout: float = 0.0,
275
+ **kwargs,
276
+ ):
277
+ key_states = repeat_kv(key, module.num_key_value_groups)
278
+ value_states = repeat_kv(value, module.num_key_value_groups)
279
+
280
+ attn_weights = torch.matmul(query, key_states.transpose(2, 3)) * scaling
281
+ if attention_mask is not None:
282
+ causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
283
+ attn_weights = attn_weights + causal_mask
284
+
285
+ attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype)
286
+ attn_weights = nn.functional.dropout(attn_weights, p=dropout, training=module.training)
287
+ attn_output = torch.matmul(attn_weights, value_states)
288
+ attn_output = attn_output.transpose(1, 2).contiguous()
289
+
290
+ return attn_output, attn_weights
291
+
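+ The core of `eager_attention_forward` (scaled scores, additive causal mask, softmax, weighted sum of values) can be replicated for a single head in pure Python (seq_len=2, head_dim=1, so the 1/sqrt(head_dim) scaling is 1):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

q = [[1.0], [1.0]]
k = [[1.0], [1.0]]
v = [[10.0], [20.0]]
neg_inf = float("-inf")
causal_mask = [[0.0, neg_inf], [0.0, 0.0]]  # position 0 cannot attend to position 1

attn = []
for i, qi in enumerate(q):
    scores = [qi[0] * kj[0] + causal_mask[i][j] for j, kj in enumerate(k)]
    weights = softmax(scores)
    attn.append(sum(w * vj[0] for w, vj in zip(weights, v)))
print(attn)  # [10.0, 15.0]: masked row sees only v0, full row averages v0 and v1
```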
292
+
293
+ def apply_rotary_pos_emb_interleave(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
294
+ r"""
295
+ TODO let's just use the original freqcis computation to not have the view
296
+ transpose + reshape! This is not optimized!
297
+ Applies Rotary Position Embedding to the query and key tensors.
298
+
299
+ Args:
300
+ q (`torch.Tensor`): The query tensor.
301
+ k (`torch.Tensor`): The key tensor.
302
+ cos (`torch.Tensor`): The cosine part of the rotary embedding.
303
+ sin (`torch.Tensor`): The sine part of the rotary embedding.
304
+ position_ids (`torch.Tensor`):
305
+ The position indices of the tokens corresponding to the query and key tensors. For example, this can be
306
+ used to pass offsetted position ids when working with a KV-cache.
307
+ unsqueeze_dim (`int`, *optional*, defaults to 1):
308
+ The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and
309
+ sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note
310
+ that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and
311
+ k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes
312
+ cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have
313
+ the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2.
314
+ Returns:
315
+ `tuple(torch.Tensor)` comprising of the query and key tensors rotated using the Rotary Position Embedding.
316
+ """
317
+ cos = cos.unsqueeze(unsqueeze_dim)
318
+ sin = sin.unsqueeze(unsqueeze_dim)
319
+
320
+ b, h, s, d = q.shape
321
+ q = q.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)
322
+
323
+ b, h, s, d = k.shape
324
+ k = k.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)
325
+
326
+ q_embed = (q * cos) + (rotate_half(q) * sin)
327
+ k_embed = (k * cos) + (rotate_half(k) * sin)
328
+ return q_embed, k_embed
329
+
330
+
331
+ def yarn_get_mscale(scale=1, mscale=1):
332
+ if scale <= 1:
333
+ return 1.0
334
+ return 0.1 * mscale * math.log(scale) + 1.0
335
+
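+ `yarn_get_mscale` grows logarithmically with the scaling factor, and when `mscale_all_dim` is set the attention `scaling` above is multiplied by mscale squared. A quick numeric check:

```python
import math

def yarn_get_mscale(scale=1, mscale=1):
    if scale <= 1:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0

m = yarn_get_mscale(scale=4, mscale=1)
print(round(m, 4))  # 1.1386 = 0.1 * ln(4) + 1
print(m * m)        # the mscale**2 factor applied to self.scaling above
```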
336
+
337
+ class DeepseekV3Attention(nn.Module):
338
+ """Multi-headed attention from 'Attention Is All You Need' paper"""
339
+
340
+ def __init__(self, config: DeepseekV3Config, layer_idx: int):
341
+ super().__init__()
342
+ self.config = config
343
+ self.layer_idx = layer_idx
344
+ self.num_key_value_groups = config.num_attention_heads // config.num_key_value_heads
345
+ self.attention_dropout = config.attention_dropout
346
+ self.num_heads = config.num_attention_heads
347
+ self.rope_theta = config.rope_theta
348
+ self.q_lora_rank = config.q_lora_rank
349
+ self.qk_rope_head_dim = config.qk_rope_head_dim
350
+ self.kv_lora_rank = config.kv_lora_rank
351
+ self.v_head_dim = config.v_head_dim
352
+ self.qk_nope_head_dim = config.qk_nope_head_dim
353
+ self.qk_head_dim = config.qk_head_dim
354
+
355
+ self.is_causal = True
356
+ self.q_a_proj = nn.Linear(config.hidden_size, config.q_lora_rank, bias=config.attention_bias)
357
+ self.q_a_layernorm = DeepseekV3RMSNorm(config.q_lora_rank)
358
+ self.q_b_proj = nn.Linear(config.q_lora_rank, self.num_heads * self.qk_head_dim, bias=False)
359
+
360
+ self.kv_a_proj_with_mqa = nn.Linear(
361
+ config.hidden_size,
362
+ self.kv_lora_rank + self.qk_rope_head_dim,
363
+ bias=config.attention_bias,
364
+ )
365
+ self.kv_a_layernorm = DeepseekV3RMSNorm(self.kv_lora_rank)
366
+ self.kv_b_proj = nn.Linear(
367
+ self.kv_lora_rank,
368
+ self.num_heads * (self.qk_nope_head_dim + self.v_head_dim),
369
+ bias=False,
370
+ )
371
+
372
+ self.o_proj = nn.Linear(
373
+ self.num_heads * self.v_head_dim,
374
+ config.hidden_size,
375
+ bias=config.attention_bias,
376
+ )
377
+
378
+ self.scaling = self.qk_head_dim ** (-0.5)
379
+ if self.config.rope_scaling is not None:
380
+ mscale_all_dim = self.config.rope_scaling.get("mscale_all_dim", 0)
381
+ scaling_factor = self.config.rope_scaling["factor"]
382
+ if mscale_all_dim:
383
+ mscale = yarn_get_mscale(scaling_factor, mscale_all_dim)
384
+ self.scaling = self.scaling * mscale * mscale
385
+
386
+ def forward(
387
+ self,
388
+ hidden_states: torch.Tensor,
389
+ position_embeddings: Tuple[torch.Tensor, torch.Tensor],
390
+ attention_mask: Optional[torch.Tensor],
391
+ past_key_value: Optional[Cache] = None,
392
+ cache_position: Optional[torch.LongTensor] = None,
393
+ **kwargs: Unpack[FlashAttentionKwargs],
394
+ ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
395
+ batch_size, seq_length = hidden_states.shape[:-1]
396
+ query_shape = (batch_size, seq_length, -1, self.qk_head_dim)
397
+ key_shape = (batch_size, seq_length, -1, self.qk_nope_head_dim + self.v_head_dim)
398
+
399
+ q_states = self.q_b_proj(self.q_a_layernorm(self.q_a_proj(hidden_states))).view(query_shape).transpose(1, 2)
400
+ q_pass, q_rot = torch.split(q_states, [self.qk_nope_head_dim, self.qk_rope_head_dim], dim=-1)
401
+
402
+ compressed_kv = self.kv_a_proj_with_mqa(hidden_states)
403
+ k_pass, k_rot = torch.split(compressed_kv, [self.kv_lora_rank, self.qk_rope_head_dim], dim=-1)
404
+
405
+ k_pass = self.kv_b_proj(self.kv_a_layernorm(k_pass)).view(key_shape).transpose(1, 2)
406
+ k_pass, value_states = torch.split(k_pass, [self.qk_nope_head_dim, self.v_head_dim], dim=-1)
407
+
408
+ k_rot = k_rot.view(batch_size, 1, seq_length, self.qk_rope_head_dim)
409
+
410
+ cos, sin = position_embeddings
411
+ if self.config.rope_interleave: # support using interleaved weights for efficiency
412
+ q_rot, k_rot = apply_rotary_pos_emb_interleave(q_rot, k_rot, cos, sin)
413
+ else:
414
+ q_rot, k_rot = apply_rotary_pos_emb(q_rot, k_rot, cos, sin)
415
+ k_rot = k_rot.expand(*k_pass.shape[:-1], -1)
416
+
417
+ query_states = torch.cat((q_pass, q_rot), dim=-1)
418
+ key_states = torch.cat((k_pass, k_rot), dim=-1)
419
+
420
+ if past_key_value is not None:
421
+ # sin and cos are specific to RoPE models; cache_position needed for the static cache
422
+ cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
423
+ key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
424
+
425
+ if self.config._attn_implementation == "flash_attention_2" and self.qk_head_dim != self.v_head_dim:
426
+ value_states = F.pad(value_states, [0, self.qk_head_dim - self.v_head_dim])
427
+
428
+ attention_interface: Callable = eager_attention_forward
429
+ if self.config._attn_implementation != "eager":
430
+ if self.config._attn_implementation == "sdpa" and kwargs.get("output_attentions", False):
431
+ logger.warning_once(
432
+ "`torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True`. Falling back to "
433
+ 'eager attention. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.'
434
+ )
435
+ else:
436
+ attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
437
+
438
+ attn_output, attn_weights = attention_interface(
439
+ self,
440
+ query_states,
441
+ key_states,
442
+ value_states,
443
+ attention_mask,
444
+ dropout=0.0 if not self.training else self.attention_dropout,
445
+ scaling=self.scaling,
446
+ **kwargs,
447
+ )
448
+
449
+ if self.config._attn_implementation == "flash_attention_2" and self.qk_head_dim != self.v_head_dim:
450
+ attn_output = attn_output[:, :, :, : self.v_head_dim]
451
+
452
+ attn_output = attn_output.reshape(batch_size, seq_length, -1).contiguous()
453
+ attn_output = self.o_proj(attn_output)
454
+ return attn_output, attn_weights
455
+
456
+
457
+ class DeepseekV3DecoderLayer(nn.Module):
458
+ def __init__(self, config: DeepseekV3Config, layer_idx: int):
459
+ super().__init__()
460
+ self.hidden_size = config.hidden_size
461
+
462
+ self.self_attn = DeepseekV3Attention(config=config, layer_idx=layer_idx)
463
+
464
+ if layer_idx >= config.first_k_dense_replace:
465
+ self.mlp = DeepseekV3MoE(config)
466
+ else:
467
+ self.mlp = DeepseekV3MLP(config)
468
+
469
+ self.input_layernorm = DeepseekV3RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
470
+ self.post_attention_layernorm = DeepseekV3RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
471
+
472
+ def forward(
473
+ self,
474
+ hidden_states: torch.Tensor,
475
+ attention_mask: Optional[torch.Tensor] = None,
476
+ position_ids: Optional[torch.LongTensor] = None,
477
+ past_key_value: Optional[Cache] = None,
478
+ output_attentions: Optional[bool] = False,
479
+ use_cache: Optional[bool] = False,
480
+ cache_position: Optional[torch.LongTensor] = None,
481
+ position_embeddings: Optional[Tuple[torch.Tensor, torch.Tensor]] = None, # necessary, but kept here for BC
482
+ **kwargs: Unpack[FlashAttentionKwargs],
483
+ ) -> Tuple[torch.FloatTensor, Optional[Tuple[torch.FloatTensor, torch.FloatTensor]]]:
484
+ residual = hidden_states
485
+
486
+ hidden_states = self.input_layernorm(hidden_states)
487
+
488
+ # Self Attention
489
+ hidden_states, self_attn_weights = self.self_attn(
490
+ hidden_states=hidden_states,
491
+ attention_mask=attention_mask,
492
+ position_ids=position_ids,
493
+ past_key_value=past_key_value,
494
+ output_attentions=output_attentions,
495
+ use_cache=use_cache,
496
+ cache_position=cache_position,
497
+ position_embeddings=position_embeddings,
498
+ **kwargs,
499
+ )
500
+ hidden_states = residual + hidden_states
501
+
502
+ # Fully Connected
503
+ residual = hidden_states
504
+ hidden_states = self.post_attention_layernorm(hidden_states)
505
+ hidden_states = self.mlp(hidden_states)
506
+ hidden_states = residual + hidden_states
507
+
508
+ outputs = (hidden_states,)
509
+ if output_attentions:
510
+ outputs += (self_attn_weights,)
511
+
512
+ return outputs
513
+
514
+
515
+ DEEPSEEK_V3_START_DOCSTRING = r"""
516
+ This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
517
+ library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads
518
+ etc.)
519
+
520
+ This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
521
+ Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage
522
+ and behavior.
523
+
524
+ Parameters:
525
+ config ([`DeepseekV3Config`]):
526
+ Model configuration class with all the parameters of the model. Initializing with a config file does not
527
+ load the weights associated with the model, only the configuration. Check out the
528
+ [`~PreTrainedModel.from_pretrained`] method to load the model weights.
529
+ """
530
+
531
+
532
+ @add_start_docstrings(
533
+ "The bare DeepseekV3 Model outputting raw hidden-states without any specific head on top.",
534
+ DEEPSEEK_V3_START_DOCSTRING,
535
+ )
536
+ class DeepseekV3PreTrainedModel(PreTrainedModel):
537
+ config_class = DeepseekV3Config
538
+ base_model_prefix = "model"
539
+ supports_gradient_checkpointing = True
540
+ _no_split_modules = ["DeepseekV3DecoderLayer"]
541
+ _skip_keys_device_placement = ["past_key_values"]
542
+ _supports_flash_attn_2 = True
543
+ _supports_sdpa = True
544
+ _supports_flex_attn = True
545
+ _supports_cache_class = True
546
+ _supports_quantized_cache = True
547
+ _supports_static_cache = True
548
+ _supports_attention_backend = True
549
+
550
+ def _init_weights(self, module):
551
+ std = self.config.initializer_range
552
+ if isinstance(module, nn.Linear):
553
+ module.weight.data.normal_(mean=0.0, std=std)
554
+ if module.bias is not None:
555
+ module.bias.data.zero_()
556
+ elif isinstance(module, nn.Embedding):
557
+ module.weight.data.normal_(mean=0.0, std=std)
558
+ if module.padding_idx is not None:
559
+ module.weight.data[module.padding_idx].zero_()
560
+ elif isinstance(module, DeepseekV3TopkRouter):
561
+ module.weight.data.normal_(mean=0.0, std=std)
562
+ elif isinstance(module, nn.Parameter):
563
+ module.data.normal_(mean=0.0, std=std)
564
+
565
+
566
+ DEEPSEEK_V3_INPUTS_DOCSTRING = r"""
567
+ Args:
568
+ input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
569
+ Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide
570
+ it.
571
+
572
+ Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
573
+ [`PreTrainedTokenizer.__call__`] for details.
574
+
575
+ [What are input IDs?](../glossary#input-ids)
576
+ attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
577
+ Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
578
+
579
+ - 1 for tokens that are **not masked**,
580
+ - 0 for tokens that are **masked**.
581
+
582
+ [What are attention masks?](../glossary#attention-mask)
583
+
584
+ Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
585
+ [`PreTrainedTokenizer.__call__`] for details.
586
+
587
+ If `past_key_values` is used, optionally only the last `input_ids` have to be input (see
588
+ `past_key_values`).
589
+
590
+ If you want to change padding behavior, you should read [`modeling_opt._prepare_decoder_attention_mask`]
591
+ and modify to your needs. See diagram 1 in [the paper](https://arxiv.org/abs/1910.13461) for more
592
+ information on the default strategy.
593
+
594
+ - 1 indicates the head is **not masked**,
595
+ - 0 indicates the head is **masked**.
596
+ position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
597
+ Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0,
598
+ config.n_positions - 1]`.
599
+
600
+ [What are position IDs?](../glossary#position-ids)
601
+ past_key_values (`Cache`, *optional*):
602
+ Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention
603
+ blocks) that can be used to speed up sequential decoding. This typically consists of the `past_key_values`
604
+ returned by the model at a previous stage of decoding, when `use_cache=True` or `config.use_cache=True`.
605
+
606
+ It is a [`~cache_utils.Cache`] instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache).
607
+
608
+ If `past_key_values` are used, the user can optionally input only the last `input_ids` (those that don't
609
+ have their past key value states given to this model) of shape `(batch_size, 1)` instead of all `input_ids`
610
+ of shape `(batch_size, sequence_length)`.
611
+ inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
612
+ Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
613
+ is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
614
+ model's internal embedding lookup matrix.
615
+ use_cache (`bool`, *optional*):
616
+ If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
617
+ `past_key_values`).
618
+ output_attentions (`bool`, *optional*):
619
+ Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
620
+ tensors for more detail.
621
+ output_hidden_states (`bool`, *optional*):
622
+ Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
623
+ more detail.
624
+ return_dict (`bool`, *optional*):
625
+ Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
626
+ cache_position (`torch.LongTensor` of shape `(sequence_length)`, *optional*):
627
+ Indices depicting the position of the input sequence tokens in the sequence. Contrarily to `position_ids`,
628
+ this tensor is not affected by padding. It is used to update the cache in the correct position and to infer
629
+ the complete sequence length.
630
+ """
631
+
632
+
633
+ @add_start_docstrings(
634
+ "The bare DeepseekV3 Model outputting raw hidden-states without any specific head on top.",
635
+ DEEPSEEK_V3_START_DOCSTRING,
636
+ )
637
+ class DeepseekV3Model(DeepseekV3PreTrainedModel):
638
+ """
639
+ Transformer decoder consisting of *config.num_hidden_layers* layers. Each layer is a [`DeepseekV3DecoderLayer`]
640
+
641
+ Args:
642
+ config: DeepseekV3Config
643
+ """
644
+
645
+ _keys_to_ignore_on_load_unexpected = [r"model\.layers\.61.*"]
646
+
647
+ def __init__(self, config: DeepseekV3Config):
648
+ super().__init__(config)
649
+ self.padding_idx = config.pad_token_id
650
+ self.vocab_size = config.vocab_size
651
+
652
+ self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
653
+ self.layers = nn.ModuleList(
654
+ [DeepseekV3DecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
655
+ )
656
+ self.norm = DeepseekV3RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
657
+ self.rotary_emb = DeepseekV3RotaryEmbedding(config=config)
658
+ self.gradient_checkpointing = False
659
+
660
+ # Initialize weights and apply final processing
661
+ self.post_init()
662
+
663
+ def get_input_embeddings(self):
664
+ return self.embed_tokens
665
+
666
+ def set_input_embeddings(self, value):
667
+ self.embed_tokens = value
668
+
669
+ @can_return_tuple
670
+ @add_start_docstrings_to_model_forward(DEEPSEEK_V3_INPUTS_DOCSTRING)
671
+ def forward(
672
+ self,
673
+ input_ids: Optional[torch.LongTensor] = None,
674
+ attention_mask: Optional[torch.Tensor] = None,
675
+ position_ids: Optional[torch.LongTensor] = None,
676
+ past_key_values: Optional[Cache] = None,
677
+ inputs_embeds: Optional[torch.FloatTensor] = None,
678
+ use_cache: Optional[bool] = None,
679
+ output_attentions: Optional[bool] = None,
680
+ output_hidden_states: Optional[bool] = None,
681
+ cache_position: Optional[torch.LongTensor] = None,
682
+ **flash_attn_kwargs: Unpack[FlashAttentionKwargs],
683
+ ) -> BaseModelOutputWithPast:
684
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
685
+ output_hidden_states = (
686
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
687
+ )
688
+ use_cache = use_cache if use_cache is not None else self.config.use_cache
689
+
690
+ if (input_ids is None) ^ (inputs_embeds is not None):
691
+ raise ValueError("You must specify exactly one of input_ids or inputs_embeds")
692
+
693
+ if self.gradient_checkpointing and self.training and use_cache:
694
+ logger.warning_once(
695
+ "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`."
696
+ )
697
+ use_cache = False
698
+
699
+ # TODO (joao): remove this exception in v4.56 -- it exists for users that try to pass a legacy cache
700
+ if not isinstance(past_key_values, (type(None), Cache)):
701
+ raise ValueError("The `past_key_values` should be either a `Cache` object or `None`.")
702
+
703
+ if inputs_embeds is None:
704
+ inputs_embeds = self.embed_tokens(input_ids)
705
+
706
+ if use_cache and past_key_values is None:
707
+ past_key_values = DynamicCache()
708
+
709
+ if cache_position is None:
710
+ past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
711
+ cache_position = torch.arange(
712
+ past_seen_tokens, past_seen_tokens + inputs_embeds.shape[1], device=inputs_embeds.device
713
+ )
714
+
715
+ if position_ids is None:
716
+ position_ids = cache_position.unsqueeze(0)
717
+
718
+ causal_mask = self._update_causal_mask(
719
+ attention_mask, inputs_embeds, cache_position, past_key_values, output_attentions
720
+ )
721
+
722
+ hidden_states = inputs_embeds
723
+
724
+ # create position embeddings to be shared across the decoder layers
725
+ position_embeddings = self.rotary_emb(hidden_states, position_ids)
726
+
727
+ # decoder layers
728
+ all_hidden_states = () if output_hidden_states else None
729
+ all_self_attns = () if output_attentions else None
730
+
731
+ for decoder_layer in self.layers[: self.config.num_hidden_layers]:
732
+ if output_hidden_states:
733
+ all_hidden_states += (hidden_states,)
734
+
735
+ if self.gradient_checkpointing and self.training:
736
+ layer_outputs = self._gradient_checkpointing_func(
737
+ partial(decoder_layer.__call__, **flash_attn_kwargs),
738
+ hidden_states,
739
+ causal_mask,
740
+ position_ids,
741
+ past_key_values,
742
+ output_attentions,
743
+ use_cache,
744
+ cache_position,
745
+ position_embeddings,
746
+ )
747
+ else:
748
+ layer_outputs = decoder_layer(
749
+ hidden_states,
750
+ attention_mask=causal_mask,
751
+ position_ids=position_ids,
752
+ past_key_value=past_key_values,
753
+ output_attentions=output_attentions,
754
+ use_cache=use_cache,
755
+ cache_position=cache_position,
756
+ position_embeddings=position_embeddings,
757
+ **flash_attn_kwargs,
758
+ )
759
+
760
+ hidden_states = layer_outputs[0]
761
+
762
+ if output_attentions:
763
+ all_self_attns += (layer_outputs[1],)
764
+
765
+ hidden_states = self.norm(hidden_states)
766
+
767
+ # add hidden states from the last decoder layer
768
+ if output_hidden_states:
769
+ all_hidden_states += (hidden_states,)
770
+
771
+ return BaseModelOutputWithPast(
772
+ last_hidden_state=hidden_states,
773
+ past_key_values=past_key_values if use_cache else None,
774
+ hidden_states=all_hidden_states,
775
+ attentions=all_self_attns,
776
+ )
777
+
778
+ def _update_causal_mask(
779
+ self,
780
+ attention_mask: torch.Tensor,
781
+ input_tensor: torch.Tensor,
782
+ cache_position: torch.Tensor,
783
+ past_key_values: Cache,
784
+ output_attentions: bool = False,
785
+ ):
786
+ if self.config._attn_implementation == "flash_attention_2":
787
+ if attention_mask is not None and (attention_mask == 0.0).any():
788
+ return attention_mask
789
+ return None
790
+ if self.config._attn_implementation == "flex_attention":
791
+ if isinstance(attention_mask, torch.Tensor):
792
+ attention_mask = make_flex_block_causal_mask(attention_mask)
793
+ if isinstance(attention_mask, BlockMask):
794
+ return attention_mask
795
+
796
+ # For SDPA, when possible, we will rely on its `is_causal` argument instead of its `attn_mask` argument, in
797
+ # order to dispatch on Flash Attention 2. This feature is not compatible with static cache, as SDPA will fail
798
+ # to infer the attention mask.
799
+ past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
800
+ using_static_cache = isinstance(past_key_values, StaticCache)
801
+
802
+ # When output attentions is True, sdpa implementation's forward method calls the eager implementation's forward
803
+ if self.config._attn_implementation == "sdpa" and not using_static_cache and not output_attentions:
804
+ if AttentionMaskConverter._ignore_causal_mask_sdpa(
805
+ attention_mask,
806
+ inputs_embeds=input_tensor,
807
+ past_key_values_length=past_seen_tokens,
808
+ is_training=self.training,
809
+ ):
810
+ return None
811
+
812
+ dtype, device = input_tensor.dtype, input_tensor.device
813
+ sequence_length = input_tensor.shape[1]
814
+ if using_static_cache:
815
+ target_length = past_key_values.get_max_cache_shape()
816
+ else:
817
+ target_length = (
818
+ attention_mask.shape[-1]
819
+ if isinstance(attention_mask, torch.Tensor)
820
+ else past_seen_tokens + sequence_length + 1
821
+ )
822
+
823
+ # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
824
+ causal_mask = self._prepare_4d_causal_attention_mask_with_cache_position(
825
+ attention_mask,
826
+ sequence_length=sequence_length,
827
+ target_length=target_length,
828
+ dtype=dtype,
829
+ device=device,
830
+ cache_position=cache_position,
831
+ batch_size=input_tensor.shape[0],
832
+ )
833
+
834
+ if (
835
+ self.config._attn_implementation == "sdpa"
836
+ and attention_mask is not None
837
+ and attention_mask.device.type in ["cuda", "xpu"]
838
+ and not output_attentions
839
+ ):
840
+ # Attend to all tokens in fully masked rows in the causal_mask, for example the relevant first rows when
841
+ # using left padding. This is required by F.scaled_dot_product_attention memory-efficient attention path.
842
+ # Details: https://github.com/pytorch/pytorch/issues/110213
843
+ min_dtype = torch.finfo(dtype).min
844
+ causal_mask = AttentionMaskConverter._unmask_unattended(causal_mask, min_dtype)
845
+
846
+ return causal_mask
847
+
848
+ @staticmethod
849
+ def _prepare_4d_causal_attention_mask_with_cache_position(
850
+ attention_mask: torch.Tensor,
851
+ sequence_length: int,
852
+ target_length: int,
853
+ dtype: torch.dtype,
854
+ device: torch.device,
855
+ cache_position: torch.Tensor,
856
+ batch_size: int,
857
+ **kwargs,
858
+ ):
859
+ """
860
+ Creates a causal 4D mask of shape `(batch_size, 1, query_length, key_value_length)` from a 2D mask of shape
861
+ `(batch_size, key_value_length)`, or if the input `attention_mask` is already 4D, do nothing.
862
+
863
+ Args:
864
+ attention_mask (`torch.Tensor`):
865
+ A 2D attention mask of shape `(batch_size, key_value_length)` or a 4D attention mask of shape
866
+ `(batch_size, 1, query_length, key_value_length)`.
867
+ sequence_length (`int`):
868
+ The sequence length being processed.
869
+ target_length (`int`):
870
+ The target length: when generating with static cache, the mask should be as long as the static cache,
871
+ to account for the 0 padding, the part of the cache that is not filled yet.
872
+ dtype (`torch.dtype`):
873
+ The dtype to use for the 4D attention mask.
874
+ device (`torch.device`):
875
+ The device to place the 4D attention mask on.
876
+ cache_position (`torch.Tensor`):
877
+ Indices depicting the position of the input sequence tokens in the sequence.
878
+ batch_size (`int`):
879
+ Batch size.
880
+ """
881
+ if attention_mask is not None and attention_mask.dim() == 4:
882
+ # In this case we assume that the mask comes already in inverted form and requires no inversion or slicing.
883
+ causal_mask = attention_mask
884
+ else:
885
+ min_dtype = torch.finfo(dtype).min
886
+ causal_mask = torch.full(
887
+ (sequence_length, target_length), fill_value=min_dtype, dtype=dtype, device=device
888
+ )
889
+ if sequence_length != 1:
890
+ causal_mask = torch.triu(causal_mask, diagonal=1)
891
+ causal_mask *= torch.arange(target_length, device=device) > cache_position.reshape(-1, 1)
892
+ causal_mask = causal_mask[None, None, :, :].expand(batch_size, 1, -1, -1)
893
+ if attention_mask is not None:
894
+ causal_mask = causal_mask.clone() # copy to contiguous memory for in-place edit
895
+ mask_length = attention_mask.shape[-1]
896
+ padding_mask = causal_mask[:, :, :, :mask_length] + attention_mask[:, None, None, :].to(
897
+ causal_mask.device
898
+ )
899
+ padding_mask = padding_mask == 0
900
+ causal_mask[:, :, :, :mask_length] = causal_mask[:, :, :, :mask_length].masked_fill(
901
+ padding_mask, min_dtype
902
+ )
903
+
904
+ return causal_mask
905
+
906
+
907
+ class KwargsForCausalLM(FlashAttentionKwargs, LossKwargs): ...
908
+
909
+
910
+ class DeepseekV3ForCausalLM(DeepseekV3PreTrainedModel, GenerationMixin):
911
+ _tied_weights_keys = ["lm_head.weight"]
912
+ _tp_plan = {"lm_head": "colwise_rep"}
913
+ _pp_plan = {"lm_head": (["hidden_states"], ["logits"])}
914
+
915
+ def __init__(self, config):
916
+ super().__init__(config)
917
+ self.model = DeepseekV3Model(config)
918
+ self.vocab_size = config.vocab_size
919
+ self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
920
+
921
+ # Initialize weights and apply final processing
922
+ self.post_init()
923
+
924
+ def get_input_embeddings(self):
925
+ return self.model.embed_tokens
926
+
927
+ def set_input_embeddings(self, value):
928
+ self.model.embed_tokens = value
929
+
930
+ def get_output_embeddings(self):
931
+ return self.lm_head
932
+
933
+ def set_output_embeddings(self, new_embeddings):
934
+ self.lm_head = new_embeddings
935
+
936
+ def set_decoder(self, decoder):
937
+ self.model = decoder
938
+
939
+ def get_decoder(self):
940
+ return self.model
941
+
942
+ @can_return_tuple
943
+ @deprecate_kwarg("num_logits_to_keep", version="4.50", new_name="logits_to_keep")
944
+ @add_start_docstrings_to_model_forward(DEEPSEEK_V3_INPUTS_DOCSTRING)
945
+ @replace_return_docstrings(output_type=CausalLMOutputWithPast, config_class=_CONFIG_FOR_DOC)
946
+ def forward(
947
+ self,
948
+ input_ids: Optional[torch.LongTensor] = None,
949
+ attention_mask: Optional[torch.Tensor] = None,
950
+ position_ids: Optional[torch.LongTensor] = None,
951
+ past_key_values: Optional[Cache] = None,
952
+ inputs_embeds: Optional[torch.FloatTensor] = None,
953
+ labels: Optional[torch.LongTensor] = None,
954
+ use_cache: Optional[bool] = None,
955
+ output_attentions: Optional[bool] = None,
956
+ output_hidden_states: Optional[bool] = None,
957
+ cache_position: Optional[torch.LongTensor] = None,
958
+ logits_to_keep: Union[int, torch.Tensor] = 0,
959
+ **kwargs: Unpack[KwargsForCausalLM],
960
+ ) -> CausalLMOutputWithPast:
961
+ r"""
962
+ labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
963
+ Labels for computing the masked language modeling loss. Indices should either be in `[0, ...,
964
+ config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored
965
+ (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.
966
+
967
+ logits_to_keep (`int` or `torch.Tensor`, *optional*):
968
+ If an `int`, compute logits for the last `logits_to_keep` tokens. If `0`, calculate logits for all
969
+ `input_ids` (special case). Only last token logits are needed for generation, and calculating them only for that
970
+ token can save memory, which becomes pretty significant for long sequences or large vocabulary size.
971
+ If a `torch.Tensor`, must be 1D corresponding to the indices to keep in the sequence length dimension.
972
+ This is useful when using packed tensor format (single dimension for batch and sequence length).
973
+
974
+ Returns:
975
+
976
+ Example:
977
+
978
+ ```python
979
+ >>> from transformers import AutoTokenizer, DeepseekV3ForCausalLM
980
+
981
+ >>> model = DeepseekV3ForCausalLM.from_pretrained("deepseek-ai/DeepSeek-V3")
982
+ >>> tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3")
983
+
984
+ >>> prompt = "Hey, are you conscious? Can you talk to me?"
985
+ >>> inputs = tokenizer(prompt, return_tensors="pt")
986
+
987
+ >>> # Generate
988
+ >>> generate_ids = model.generate(inputs.input_ids, max_length=30)
989
+ >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
990
+ "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
991
+ ```"""
992
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
993
+ output_hidden_states = (
994
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
995
+ )
996
+
997
+ # decoder outputs consist of (dec_features, layer_state, dec_hidden, dec_attn)
998
+ outputs: BaseModelOutputWithPast = self.model(
999
+ input_ids=input_ids,
1000
+ attention_mask=attention_mask,
1001
+ position_ids=position_ids,
1002
+ past_key_values=past_key_values,
1003
+ inputs_embeds=inputs_embeds,
1004
+ use_cache=use_cache,
1005
+ output_attentions=output_attentions,
1006
+ output_hidden_states=output_hidden_states,
1007
+ cache_position=cache_position,
1008
+ **kwargs,
1009
+ )
1010
+
1011
+ hidden_states = outputs.last_hidden_state
1012
+ # Only compute necessary logits, and do not upcast them to float if we are not computing the loss
1013
+ slice_indices = slice(-logits_to_keep, None) if isinstance(logits_to_keep, int) else logits_to_keep
1014
+ logits = self.lm_head(hidden_states[:, slice_indices, :])
1015
+
1016
+ loss = None
1017
+ if labels is not None:
1018
+ loss = self.loss_function(logits=logits, labels=labels, vocab_size=self.config.vocab_size, **kwargs)
1019
+
1020
+ return CausalLMOutputWithPast(
1021
+ loss=loss,
1022
+ logits=logits,
1023
+ past_key_values=outputs.past_key_values,
1024
+ hidden_states=outputs.hidden_states,
1025
+ attentions=outputs.attentions,
1026
+ )
1027
+
1028
+
1029
+ __all__ = ["DeepseekV3PreTrainedModel", "DeepseekV3Model", "DeepseekV3ForCausalLM"]
1030
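The 4D mask construction in `_prepare_4d_causal_attention_mask_with_cache_position` above can be sketched outside the model. The numpy version below is an illustrative re-derivation, not the Transformers implementation: `MIN = -1e9` stands in for `torch.finfo(dtype).min`, and the function name and signature are simplified for exposition. It reproduces the two steps the method performs: open the causal (lower-triangular, cache-aware) region, then re-mask slots that point at padding tokens.

```python
import numpy as np

MIN = -1e9  # stand-in for torch.finfo(dtype).min

def build_causal_mask(attention_mask, sequence_length, target_length, cache_position):
    # Start fully masked, then open the causal region.
    batch_size = attention_mask.shape[0]
    causal = np.full((sequence_length, target_length), MIN, dtype=np.float32)
    if sequence_length != 1:
        causal = np.triu(causal, k=1)  # keep MIN strictly above the diagonal
    # With a KV cache, absolute key position j is visible iff j <= cache_position[i].
    causal = causal * (np.arange(target_length) > cache_position.reshape(-1, 1))
    causal = np.broadcast_to(
        causal[None, None], (batch_size, 1, sequence_length, target_length)
    ).copy()
    # Fold in the 2D padding mask: slots that are causally open (0) but point
    # at padding tokens (attention_mask == 0) get re-masked to MIN.
    mask_length = attention_mask.shape[-1]
    padding = (causal[:, :, :, :mask_length] + attention_mask[:, None, None, :]) == 0
    causal[:, :, :, :mask_length] = np.where(padding, MIN, causal[:, :, :, :mask_length])
    return causal
```

For a prompt of length 3 with no cache and no padding, each query row attends only to itself and earlier positions; with left padding (`[0, 1, 1]`) the first key column is masked for every row, matching the `_unmask_unattended` discussion in the SDPA branch above.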
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,34 @@
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": false,
4
+ "bos_token": {
5
+ "__type": "AddedToken",
6
+ "content": "<|begin▁of▁sentence|>",
7
+ "lstrip": false,
8
+ "normalized": true,
9
+ "rstrip": false,
10
+ "single_word": false
11
+ },
12
+ "clean_up_tokenization_spaces": false,
13
+ "eos_token": {
14
+ "__type": "AddedToken",
15
+ "content": "<|end▁of▁sentence|>",
16
+ "lstrip": false,
17
+ "normalized": true,
18
+ "rstrip": false,
19
+ "single_word": false
20
+ },
21
+ "legacy": true,
22
+ "model_max_length": 131072,
23
+ "pad_token": {
24
+ "__type": "AddedToken",
25
+ "content": "<|▁pad▁|>",
26
+ "lstrip": false,
27
+ "normalized": true,
28
+ "rstrip": false,
29
+ "single_word": false
30
+ },
31
+ "sp_model_kwargs": {},
32
+ "unk_token": null,
33
+ "tokenizer_class": "LlamaTokenizerFast"
34
+ }
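The `"add_bos_token": true` / `"add_eos_token": false` pair in the config above means encodings are prefixed with the BOS token and not suffixed with EOS. The sketch below is a toy illustration of that wrapping behaviour only (it is not the actual `LlamaTokenizerFast` logic; `wrap_special_tokens` is a hypothetical helper):

```python
# Toy illustration of the add_bos_token / add_eos_token defaults above.
BOS = "<|begin▁of▁sentence|>"
EOS = "<|end▁of▁sentence|>"

def wrap_special_tokens(tokens, add_bos_token=True, add_eos_token=False):
    # Defaults mirror tokenizer_config.json: BOS prepended, EOS not appended.
    out = list(tokens)
    if add_bos_token:
        out.insert(0, BOS)
    if add_eos_token:
        out.append(EOS)
    return out
```

With the defaults, `wrap_special_tokens(["Hello"])` yields the BOS token followed by the input; EOS is only appended when explicitly requested (e.g. when preparing training targets).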
venv/bin/Activate.ps1 ADDED
@@ -0,0 +1,247 @@
1
+ <#
2
+ .Synopsis
3
+ Activate a Python virtual environment for the current PowerShell session.
4
+
5
+ .Description
6
+ Pushes the python executable for a virtual environment to the front of the
7
+ $Env:PATH environment variable and sets the prompt to signify that you are
8
+ in a Python virtual environment. Makes use of the command line switches as
9
+ well as the `pyvenv.cfg` file values present in the virtual environment.
10
+
11
+ .Parameter VenvDir
12
+ Path to the directory that contains the virtual environment to activate. The
13
+ default value for this is the parent of the directory that the Activate.ps1
14
+ script is located within.
15
+
16
+ .Parameter Prompt
17
+ The prompt prefix to display when this virtual environment is activated. By
18
+ default, this prompt is the name of the virtual environment folder (VenvDir)
19
+ surrounded by parentheses and followed by a single space (ie. '(.venv) ').
20
+
21
+ .Example
22
+ Activate.ps1
23
+ Activates the Python virtual environment that contains the Activate.ps1 script.
24
+
25
+ .Example
26
+ Activate.ps1 -Verbose
27
+ Activates the Python virtual environment that contains the Activate.ps1 script,
28
+ and shows extra information about the activation as it executes.
29
+
30
+ .Example
31
+ Activate.ps1 -VenvDir C:\Users\MyUser\Common\.venv
32
+ Activates the Python virtual environment located in the specified location.
33
+
34
+ .Example
35
+ Activate.ps1 -Prompt "MyPython"
36
+ Activates the Python virtual environment that contains the Activate.ps1 script,
37
+ and prefixes the current prompt with the specified string (surrounded in
38
+ parentheses) while the virtual environment is active.
39
+
40
+ .Notes
41
+ On Windows, it may be required to enable this Activate.ps1 script by setting the
42
+ execution policy for the user. You can do this by issuing the following PowerShell
43
+ command:
44
+
45
+ PS C:\> Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
46
+
47
+ For more information on Execution Policies:
48
+ https://go.microsoft.com/fwlink/?LinkID=135170
49
+
50
+ #>
51
+ Param(
52
+ [Parameter(Mandatory = $false)]
53
+ [String]
54
+ $VenvDir,
55
+ [Parameter(Mandatory = $false)]
56
+ [String]
57
+ $Prompt
58
+ )
59
+
60
+ <# Function declarations --------------------------------------------------- #>
61
+
62
+ <#
63
+ .Synopsis
64
+ Remove all shell session elements added by the Activate script, including the
65
+ addition of the virtual environment's Python executable from the beginning of
66
+ the PATH variable.
67
+
68
+ .Parameter NonDestructive
69
+ If present, do not remove this function from the global namespace for the
70
+ session.
71
+
72
+ #>
73
+ function global:deactivate ([switch]$NonDestructive) {
74
+ # Revert to original values
75
+
76
+ # The prior prompt:
77
+ if (Test-Path -Path Function:_OLD_VIRTUAL_PROMPT) {
78
+ Copy-Item -Path Function:_OLD_VIRTUAL_PROMPT -Destination Function:prompt
79
+ Remove-Item -Path Function:_OLD_VIRTUAL_PROMPT
80
+ }
81
+
82
+ # The prior PYTHONHOME:
83
+ if (Test-Path -Path Env:_OLD_VIRTUAL_PYTHONHOME) {
84
+ Copy-Item -Path Env:_OLD_VIRTUAL_PYTHONHOME -Destination Env:PYTHONHOME
85
+ Remove-Item -Path Env:_OLD_VIRTUAL_PYTHONHOME
86
+ }
87
+
88
+ # The prior PATH:
89
+ if (Test-Path -Path Env:_OLD_VIRTUAL_PATH) {
90
+ Copy-Item -Path Env:_OLD_VIRTUAL_PATH -Destination Env:PATH
91
+ Remove-Item -Path Env:_OLD_VIRTUAL_PATH
92
+ }
93
+
94
+ # Just remove the VIRTUAL_ENV altogether:
95
+ if (Test-Path -Path Env:VIRTUAL_ENV) {
96
+ Remove-Item -Path env:VIRTUAL_ENV
97
+ }
98
+
99
+ # Just remove VIRTUAL_ENV_PROMPT altogether.
100
+ if (Test-Path -Path Env:VIRTUAL_ENV_PROMPT) {
101
+ Remove-Item -Path env:VIRTUAL_ENV_PROMPT
102
+ }
103
+
104
+ # Just remove the _PYTHON_VENV_PROMPT_PREFIX altogether:
105
+ if (Get-Variable -Name "_PYTHON_VENV_PROMPT_PREFIX" -ErrorAction SilentlyContinue) {
106
+ Remove-Variable -Name _PYTHON_VENV_PROMPT_PREFIX -Scope Global -Force
107
+ }
108
+
109
+ # Leave deactivate function in the global namespace if requested:
110
+ if (-not $NonDestructive) {
111
+ Remove-Item -Path function:deactivate
112
+ }
113
+ }
114
+
115
+ <#
116
+ .Description
117
+ Get-PyVenvConfig parses the values from the pyvenv.cfg file located in the
+ given folder, and returns them in a map.
+
+ For each line in the pyvenv.cfg file, if that line can be parsed into exactly
+ two strings separated by `=` (with any amount of whitespace surrounding the =)
+ then it is considered a `key = value` line. The left hand string is the key,
+ the right hand is the value.
+
+ If the value starts with a `'` or a `"` then the first and last character is
+ stripped from the value before being captured.
+
+ .Parameter ConfigDir
+ Path to the directory that contains the `pyvenv.cfg` file.
+ #>
+ function Get-PyVenvConfig(
+     [String]
+     $ConfigDir
+ ) {
+     Write-Verbose "Given ConfigDir=$ConfigDir, obtain values in pyvenv.cfg"
+
+     # Ensure the file exists, and issue a warning if it doesn't (but still allow the function to continue).
+     $pyvenvConfigPath = Join-Path -Resolve -Path $ConfigDir -ChildPath 'pyvenv.cfg' -ErrorAction Continue
+
+     # An empty map will be returned if no config file is found.
+     $pyvenvConfig = @{ }
+
+     if ($pyvenvConfigPath) {
+
+         Write-Verbose "File exists, parse `key = value` lines"
+         $pyvenvConfigContent = Get-Content -Path $pyvenvConfigPath
+
+         $pyvenvConfigContent | ForEach-Object {
+             $keyval = $PSItem -split "\s*=\s*", 2
+             if ($keyval[0] -and $keyval[1]) {
+                 $val = $keyval[1]
+
+                 # Remove extraneous quotations around a string value.
+                 if ("'""".Contains($val.Substring(0, 1))) {
+                     $val = $val.Substring(1, $val.Length - 2)
+                 }
+
+                 $pyvenvConfig[$keyval[0]] = $val
+                 Write-Verbose "Adding Key: '$($keyval[0])'='$val'"
+             }
+         }
+     }
+     return $pyvenvConfig
+ }
+
+
+ <# Begin Activate script --------------------------------------------------- #>
+
+ # Determine the containing directory of this script
+ $VenvExecPath = Split-Path -Parent $MyInvocation.MyCommand.Definition
+ $VenvExecDir = Get-Item -Path $VenvExecPath
+
+ Write-Verbose "Activation script is located in path: '$VenvExecPath'"
+ Write-Verbose "VenvExecDir Fullname: '$($VenvExecDir.FullName)"
+ Write-Verbose "VenvExecDir Name: '$($VenvExecDir.Name)"
+
+ # Set values required in priority: CmdLine, ConfigFile, Default
+ # First, get the location of the virtual environment, it might not be
+ # VenvExecDir if specified on the command line.
+ if ($VenvDir) {
+     Write-Verbose "VenvDir given as parameter, using '$VenvDir' to determine values"
+ }
+ else {
+     Write-Verbose "VenvDir not given as a parameter, using parent directory name as VenvDir."
+     $VenvDir = $VenvExecDir.Parent.FullName.TrimEnd("\\/")
+     Write-Verbose "VenvDir=$VenvDir"
+ }
+
+ # Next, read the `pyvenv.cfg` file to determine any required value such
+ # as `prompt`.
+ $pyvenvCfg = Get-PyVenvConfig -ConfigDir $VenvDir
+
+ # Next, set the prompt from the command line, or the config file, or
+ # just use the name of the virtual environment folder.
+ if ($Prompt) {
+     Write-Verbose "Prompt specified as argument, using '$Prompt'"
+ }
+ else {
+     Write-Verbose "Prompt not specified as argument to script, checking pyvenv.cfg value"
+     if ($pyvenvCfg -and $pyvenvCfg['prompt']) {
+         Write-Verbose "  Setting based on value in pyvenv.cfg='$($pyvenvCfg['prompt'])'"
+         $Prompt = $pyvenvCfg['prompt'];
+     }
+     else {
+         Write-Verbose "  Setting prompt based on parent's directory's name. (Is the directory name passed to venv module when creating the virtual environment)"
+         Write-Verbose "  Got leaf-name of $VenvDir='$(Split-Path -Path $venvDir -Leaf)'"
+         $Prompt = Split-Path -Path $venvDir -Leaf
+     }
+ }
+
+ Write-Verbose "Prompt = '$Prompt'"
+ Write-Verbose "VenvDir='$VenvDir'"
+
+ # Deactivate any currently active virtual environment, but leave the
+ # deactivate function in place.
+ deactivate -nondestructive
+
+ # Now set the environment variable VIRTUAL_ENV, used by many tools to determine
+ # that there is an activated venv.
+ $env:VIRTUAL_ENV = $VenvDir
+
+ if (-not $Env:VIRTUAL_ENV_DISABLE_PROMPT) {
+
+     Write-Verbose "Setting prompt to '$Prompt'"
+
+     # Set the prompt to include the env name
+     # Make sure _OLD_VIRTUAL_PROMPT is global
+     function global:_OLD_VIRTUAL_PROMPT { "" }
+     Copy-Item -Path function:prompt -Destination function:_OLD_VIRTUAL_PROMPT
+     New-Variable -Name _PYTHON_VENV_PROMPT_PREFIX -Description "Python virtual environment prompt prefix" -Scope Global -Option ReadOnly -Visibility Public -Value $Prompt
+
+     function global:prompt {
+         Write-Host -NoNewline -ForegroundColor Green "($_PYTHON_VENV_PROMPT_PREFIX) "
+         _OLD_VIRTUAL_PROMPT
+     }
+     $env:VIRTUAL_ENV_PROMPT = $Prompt
+ }
+
+ # Clear PYTHONHOME
+ if (Test-Path -Path Env:PYTHONHOME) {
+     Copy-Item -Path Env:PYTHONHOME -Destination Env:_OLD_VIRTUAL_PYTHONHOME
+     Remove-Item -Path Env:PYTHONHOME
+ }
+
+ # Add the venv to the PATH
+ Copy-Item -Path Env:PATH -Destination Env:_OLD_VIRTUAL_PATH
+ $Env:PATH = "$VenvExecDir$([System.IO.Path]::PathSeparator)$Env:PATH"
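The `Get-PyVenvConfig` function above follows simple parsing rules: split each line on the first `=`, trim surrounding whitespace, and strip one pair of surrounding quotes from the value. As a minimal sketch (a hypothetical helper, not part of any of the files in this commit), the same rules in Python look like:

```python
# Minimal Python sketch of the pyvenv.cfg parsing rules implemented by
# Get-PyVenvConfig: split each line on the first "=", trim whitespace,
# and strip one pair of surrounding quotation characters from the value.
def parse_pyvenv_cfg(text: str) -> dict:
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a `key = value` line
        key, value = key.strip(), value.strip()
        if not key or not value:
            continue
        # Remove extraneous quotation marks around a string value.
        if value[0] in ("'", '"'):
            value = value[1:-1]
        config[key] = value
    return config
```

As in the PowerShell version, a missing file or a malformed line simply yields no entry, so callers can fall back to defaults.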
venv/bin/activate ADDED
@@ -0,0 +1,69 @@
+ # This file must be used with "source bin/activate" *from bash*
+ # you cannot run it directly
+
+ deactivate () {
+     # reset old environment variables
+     if [ -n "${_OLD_VIRTUAL_PATH:-}" ] ; then
+         PATH="${_OLD_VIRTUAL_PATH:-}"
+         export PATH
+         unset _OLD_VIRTUAL_PATH
+     fi
+     if [ -n "${_OLD_VIRTUAL_PYTHONHOME:-}" ] ; then
+         PYTHONHOME="${_OLD_VIRTUAL_PYTHONHOME:-}"
+         export PYTHONHOME
+         unset _OLD_VIRTUAL_PYTHONHOME
+     fi
+
+     # This should detect bash and zsh, which have a hash command that must
+     # be called to get it to forget past commands.  Without forgetting
+     # past commands the $PATH changes we made may not be respected
+     if [ -n "${BASH:-}" -o -n "${ZSH_VERSION:-}" ] ; then
+         hash -r 2> /dev/null
+     fi
+
+     if [ -n "${_OLD_VIRTUAL_PS1:-}" ] ; then
+         PS1="${_OLD_VIRTUAL_PS1:-}"
+         export PS1
+         unset _OLD_VIRTUAL_PS1
+     fi
+
+     unset VIRTUAL_ENV
+     unset VIRTUAL_ENV_PROMPT
+     if [ ! "${1:-}" = "nondestructive" ] ; then
+         # Self destruct!
+         unset -f deactivate
+     fi
+ }
+
+ # unset irrelevant variables
+ deactivate nondestructive
+
+ VIRTUAL_ENV=/mnt/llm-data/users/xieshuai/codes/hf_model/omni/deepseek_40b/20260211-dpo-0210-0208-v2-dpoaddid-965-mtp-qiangzhifeisikao/fp8_model/venv
+ export VIRTUAL_ENV
+
+ _OLD_VIRTUAL_PATH="$PATH"
+ PATH="$VIRTUAL_ENV/"bin":$PATH"
+ export PATH
+
+ # unset PYTHONHOME if set
+ # this will fail if PYTHONHOME is set to the empty string (which is bad anyway)
+ # could use `if (set -u; : $PYTHONHOME) ;` in bash
+ if [ -n "${PYTHONHOME:-}" ] ; then
+     _OLD_VIRTUAL_PYTHONHOME="${PYTHONHOME:-}"
+     unset PYTHONHOME
+ fi
+
+ if [ -z "${VIRTUAL_ENV_DISABLE_PROMPT:-}" ] ; then
+     _OLD_VIRTUAL_PS1="${PS1:-}"
+     PS1='(venv) '"${PS1:-}"
+     export PS1
+     VIRTUAL_ENV_PROMPT='(venv) '
+     export VIRTUAL_ENV_PROMPT
+ fi
+
+ # This should detect bash and zsh, which have a hash command that must
+ # be called to get it to forget past commands.  Without forgetting
+ # past commands the $PATH changes we made may not be respected
+ if [ -n "${BASH:-}" -o -n "${ZSH_VERSION:-}" ] ; then
+     hash -r 2> /dev/null
+ fi
venv/bin/activate.csh ADDED
@@ -0,0 +1,26 @@
+ # This file must be used with "source bin/activate.csh" *from csh*.
+ # You cannot run it directly.
+ # Created by Davide Di Blasi <davidedb@gmail.com>.
+ # Ported to Python 3.3 venv by Andrew Svetlov <andrew.svetlov@gmail.com>
+
+ alias deactivate 'test $?_OLD_VIRTUAL_PATH != 0 && setenv PATH "$_OLD_VIRTUAL_PATH" && unset _OLD_VIRTUAL_PATH; rehash; test $?_OLD_VIRTUAL_PROMPT != 0 && set prompt="$_OLD_VIRTUAL_PROMPT" && unset _OLD_VIRTUAL_PROMPT; unsetenv VIRTUAL_ENV; unsetenv VIRTUAL_ENV_PROMPT; test "\!:*" != "nondestructive" && unalias deactivate'
+
+ # Unset irrelevant variables.
+ deactivate nondestructive
+
+ setenv VIRTUAL_ENV /mnt/llm-data/users/xieshuai/codes/hf_model/omni/deepseek_40b/20260211-dpo-0210-0208-v2-dpoaddid-965-mtp-qiangzhifeisikao/fp8_model/venv
+
+ set _OLD_VIRTUAL_PATH="$PATH"
+ setenv PATH "$VIRTUAL_ENV/"bin":$PATH"
+
+
+ set _OLD_VIRTUAL_PROMPT="$prompt"
+
+ if (! "$?VIRTUAL_ENV_DISABLE_PROMPT") then
+     set prompt = '(venv) '"$prompt"
+     setenv VIRTUAL_ENV_PROMPT '(venv) '
+ endif
+
+ alias pydoc python -m pydoc
+
+ rehash
venv/bin/activate.fish ADDED
@@ -0,0 +1,69 @@
+ # This file must be used with "source <venv>/bin/activate.fish" *from fish*
+ # (https://fishshell.com/); you cannot run it directly.
+
+ function deactivate -d "Exit virtual environment and return to normal shell environment"
+     # reset old environment variables
+     if test -n "$_OLD_VIRTUAL_PATH"
+         set -gx PATH $_OLD_VIRTUAL_PATH
+         set -e _OLD_VIRTUAL_PATH
+     end
+     if test -n "$_OLD_VIRTUAL_PYTHONHOME"
+         set -gx PYTHONHOME $_OLD_VIRTUAL_PYTHONHOME
+         set -e _OLD_VIRTUAL_PYTHONHOME
+     end
+
+     if test -n "$_OLD_FISH_PROMPT_OVERRIDE"
+         set -e _OLD_FISH_PROMPT_OVERRIDE
+         # prevents error when using nested fish instances (Issue #93858)
+         if functions -q _old_fish_prompt
+             functions -e fish_prompt
+             functions -c _old_fish_prompt fish_prompt
+             functions -e _old_fish_prompt
+         end
+     end
+
+     set -e VIRTUAL_ENV
+     set -e VIRTUAL_ENV_PROMPT
+     if test "$argv[1]" != "nondestructive"
+         # Self-destruct!
+         functions -e deactivate
+     end
+ end
+
+ # Unset irrelevant variables.
+ deactivate nondestructive
+
+ set -gx VIRTUAL_ENV /mnt/llm-data/users/xieshuai/codes/hf_model/omni/deepseek_40b/20260211-dpo-0210-0208-v2-dpoaddid-965-mtp-qiangzhifeisikao/fp8_model/venv
+
+ set -gx _OLD_VIRTUAL_PATH $PATH
+ set -gx PATH "$VIRTUAL_ENV/"bin $PATH
+
+ # Unset PYTHONHOME if set.
+ if set -q PYTHONHOME
+     set -gx _OLD_VIRTUAL_PYTHONHOME $PYTHONHOME
+     set -e PYTHONHOME
+ end
+
+ if test -z "$VIRTUAL_ENV_DISABLE_PROMPT"
+     # fish uses a function instead of an env var to generate the prompt.
+
+     # Save the current fish_prompt function as the function _old_fish_prompt.
+     functions -c fish_prompt _old_fish_prompt
+
+     # With the original prompt function renamed, we can override with our own.
+     function fish_prompt
+         # Save the return status of the last command.
+         set -l old_status $status
+
+         # Output the venv prompt; color taken from the blue of the Python logo.
+         printf "%s%s%s" (set_color 4B8BBE) '(venv) ' (set_color normal)
+
+         # Restore the return status of the previous command.
+         echo "exit $old_status" | .
+         # Output the original/"old" prompt.
+         _old_fish_prompt
+     end
+
+     set -gx _OLD_FISH_PROMPT_OVERRIDE "$VIRTUAL_ENV"
+     set -gx VIRTUAL_ENV_PROMPT '(venv) '
+ end
venv/bin/hf ADDED
@@ -0,0 +1,10 @@
+ #!/bin/sh
+ '''exec' /mnt/llm-data/users/xieshuai/codes/hf_model/omni/deepseek_40b/20260211-dpo-0210-0208-v2-dpoaddid-965-mtp-qiangzhifeisikao/fp8_model/venv/bin/python3 "$0" "$@"
+ ' '''
+ # -*- coding: utf-8 -*-
+ import re
+ import sys
+ from huggingface_hub.cli.hf import main
+ if __name__ == '__main__':
+     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
+     sys.exit(main())
venv/bin/httpx ADDED
@@ -0,0 +1,10 @@
+ #!/bin/sh
+ '''exec' /mnt/llm-data/users/xieshuai/codes/hf_model/omni/deepseek_40b/20260211-dpo-0210-0208-v2-dpoaddid-965-mtp-qiangzhifeisikao/fp8_model/venv/bin/python3 "$0" "$@"
+ ' '''
+ # -*- coding: utf-8 -*-
+ import re
+ import sys
+ from httpx import main
+ if __name__ == '__main__':
+     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
+     sys.exit(main())
venv/bin/markdown-it ADDED
@@ -0,0 +1,10 @@
+ #!/bin/sh
+ '''exec' /mnt/llm-data/users/xieshuai/codes/hf_model/omni/deepseek_40b/20260211-dpo-0210-0208-v2-dpoaddid-965-mtp-qiangzhifeisikao/fp8_model/venv/bin/python3 "$0" "$@"
+ ' '''
+ # -*- coding: utf-8 -*-
+ import re
+ import sys
+ from markdown_it.cli.parse import main
+ if __name__ == '__main__':
+     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
+     sys.exit(main())
venv/bin/pip ADDED
@@ -0,0 +1,10 @@
+ #!/bin/sh
+ '''exec' /mnt/llm-data/users/xieshuai/codes/hf_model/omni/deepseek_40b/20260211-dpo-0210-0208-v2-dpoaddid-965-mtp-qiangzhifeisikao/fp8_model/venv/bin/python3 "$0" "$@"
+ ' '''
+ # -*- coding: utf-8 -*-
+ import re
+ import sys
+ from pip._internal.cli.main import main
+ if __name__ == '__main__':
+     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
+     sys.exit(main())
venv/bin/pip3 ADDED
@@ -0,0 +1,10 @@
+ #!/bin/sh
+ '''exec' /mnt/llm-data/users/xieshuai/codes/hf_model/omni/deepseek_40b/20260211-dpo-0210-0208-v2-dpoaddid-965-mtp-qiangzhifeisikao/fp8_model/venv/bin/python3 "$0" "$@"
+ ' '''
+ # -*- coding: utf-8 -*-
+ import re
+ import sys
+ from pip._internal.cli.main import main
+ if __name__ == '__main__':
+     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
+     sys.exit(main())
venv/bin/pip3.10 ADDED
@@ -0,0 +1,10 @@
+ #!/bin/sh
+ '''exec' /mnt/llm-data/users/xieshuai/codes/hf_model/omni/deepseek_40b/20260211-dpo-0210-0208-v2-dpoaddid-965-mtp-qiangzhifeisikao/fp8_model/venv/bin/python3 "$0" "$@"
+ ' '''
+ # -*- coding: utf-8 -*-
+ import re
+ import sys
+ from pip._internal.cli.main import main
+ if __name__ == '__main__':
+     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
+     sys.exit(main())
venv/bin/pygmentize ADDED
@@ -0,0 +1,10 @@
+ #!/bin/sh
+ '''exec' /mnt/llm-data/users/xieshuai/codes/hf_model/omni/deepseek_40b/20260211-dpo-0210-0208-v2-dpoaddid-965-mtp-qiangzhifeisikao/fp8_model/venv/bin/python3 "$0" "$@"
+ ' '''
+ # -*- coding: utf-8 -*-
+ import re
+ import sys
+ from pygments.cmdline import main
+ if __name__ == '__main__':
+     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
+     sys.exit(main())
venv/bin/tiny-agents ADDED
@@ -0,0 +1,10 @@
+ #!/bin/sh
+ '''exec' /mnt/llm-data/users/xieshuai/codes/hf_model/omni/deepseek_40b/20260211-dpo-0210-0208-v2-dpoaddid-965-mtp-qiangzhifeisikao/fp8_model/venv/bin/python3 "$0" "$@"
+ ' '''
+ # -*- coding: utf-8 -*-
+ import re
+ import sys
+ from huggingface_hub.inference._mcp.cli import app
+ if __name__ == '__main__':
+     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
+     sys.exit(app())
venv/bin/tqdm ADDED
@@ -0,0 +1,10 @@
+ #!/bin/sh
+ '''exec' /mnt/llm-data/users/xieshuai/codes/hf_model/omni/deepseek_40b/20260211-dpo-0210-0208-v2-dpoaddid-965-mtp-qiangzhifeisikao/fp8_model/venv/bin/python3 "$0" "$@"
+ ' '''
+ # -*- coding: utf-8 -*-
+ import re
+ import sys
+ from tqdm.cli import main
+ if __name__ == '__main__':
+     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
+     sys.exit(main())
venv/bin/typer ADDED
@@ -0,0 +1,10 @@
+ #!/bin/sh
+ '''exec' /mnt/llm-data/users/xieshuai/codes/hf_model/omni/deepseek_40b/20260211-dpo-0210-0208-v2-dpoaddid-965-mtp-qiangzhifeisikao/fp8_model/venv/bin/python3 "$0" "$@"
+ ' '''
+ # -*- coding: utf-8 -*-
+ import re
+ import sys
+ from typer.cli import main
+ if __name__ == '__main__':
+     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
+     sys.exit(main())
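All of the console-script wrappers above share the same boilerplate: an sh/Python polyglot header that re-executes the file under the venv's interpreter, and a `re.sub` that cleans `sys.argv[0]` before dispatching to the tool's entry point. As a small illustration (a hypothetical helper name, not part of any file in this commit), the argv[0] cleanup works like this:

```python
import re

# The wrappers rewrite sys.argv[0] with this regex before calling main(),
# stripping the Windows launcher suffixes "-script.pyw" and ".exe" so the
# tool reports a clean program name in help and error messages.
def clean_argv0(argv0: str) -> str:
    return re.sub(r'(-script\.pyw|\.exe)?$', '', argv0)
```

On POSIX systems the name is usually already clean, in which case the substitution is a no-op.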
venv/lib/python3.10/site-packages/_distutils_hack/__init__.py ADDED
@@ -0,0 +1,132 @@
+ import sys
+ import os
+ import re
+ import importlib
+ import warnings
+
+
+ is_pypy = '__pypy__' in sys.builtin_module_names
+
+
+ warnings.filterwarnings('ignore',
+                         r'.+ distutils\b.+ deprecated',
+                         DeprecationWarning)
+
+
+ def warn_distutils_present():
+     if 'distutils' not in sys.modules:
+         return
+     if is_pypy and sys.version_info < (3, 7):
+         # PyPy for 3.6 unconditionally imports distutils, so bypass the warning
+         # https://foss.heptapod.net/pypy/pypy/-/blob/be829135bc0d758997b3566062999ee8b23872b4/lib-python/3/site.py#L250
+         return
+     warnings.warn(
+         "Distutils was imported before Setuptools, but importing Setuptools "
+         "also replaces the `distutils` module in `sys.modules`. This may lead "
+         "to undesirable behaviors or errors. To avoid these issues, avoid "
+         "using distutils directly, ensure that setuptools is installed in the "
+         "traditional way (e.g. not an editable install), and/or make sure "
+         "that setuptools is always imported before distutils.")
+
+
+ def clear_distutils():
+     if 'distutils' not in sys.modules:
+         return
+     warnings.warn("Setuptools is replacing distutils.")
+     mods = [name for name in sys.modules if re.match(r'distutils\b', name)]
+     for name in mods:
+         del sys.modules[name]
+
+
+ def enabled():
+     """
+     Allow selection of distutils by environment variable.
+     """
+     which = os.environ.get('SETUPTOOLS_USE_DISTUTILS', 'stdlib')
+     return which == 'local'
+
+
+ def ensure_local_distutils():
+     clear_distutils()
+
+     # With the DistutilsMetaFinder in place,
+     # perform an import to cause distutils to be
+     # loaded from setuptools._distutils. Ref #2906.
+     add_shim()
+     importlib.import_module('distutils')
+     remove_shim()
+
+     # check that submodules load as expected
+     core = importlib.import_module('distutils.core')
+     assert '_distutils' in core.__file__, core.__file__
+
+
+ def do_override():
+     """
+     Ensure that the local copy of distutils is preferred over stdlib.
+
+     See https://github.com/pypa/setuptools/issues/417#issuecomment-392298401
+     for more motivation.
+     """
+     if enabled():
+         warn_distutils_present()
+         ensure_local_distutils()
+
+
+ class DistutilsMetaFinder:
+     def find_spec(self, fullname, path, target=None):
+         if path is not None:
+             return
+
+         method_name = 'spec_for_{fullname}'.format(**locals())
+         method = getattr(self, method_name, lambda: None)
+         return method()
+
+     def spec_for_distutils(self):
+         import importlib.abc
+         import importlib.util
+
+         class DistutilsLoader(importlib.abc.Loader):
+
+             def create_module(self, spec):
+                 return importlib.import_module('setuptools._distutils')
+
+             def exec_module(self, module):
+                 pass
+
+         return importlib.util.spec_from_loader('distutils', DistutilsLoader())
+
+     def spec_for_pip(self):
+         """
+         Ensure stdlib distutils when running under pip.
+         See pypa/pip#8761 for rationale.
+         """
+         if self.pip_imported_during_build():
+             return
+         clear_distutils()
+         self.spec_for_distutils = lambda: None
+
+     @staticmethod
+     def pip_imported_during_build():
+         """
+         Detect if pip is being imported in a build script. Ref #2355.
+         """
+         import traceback
+         return any(
+             frame.f_globals['__file__'].endswith('setup.py')
+             for frame, line in traceback.walk_stack(None)
+         )
+
+
+ DISTUTILS_FINDER = DistutilsMetaFinder()
+
+
+ def add_shim():
+     sys.meta_path.insert(0, DISTUTILS_FINDER)
+
+
+ def remove_shim():
+     try:
+         sys.meta_path.remove(DISTUTILS_FINDER)
+     except ValueError:
+         pass
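`_distutils_hack` works by installing a finder on `sys.meta_path` that intercepts `import distutils` and serves `setuptools._distutils` instead. A self-contained sketch of that meta-path technique (using a made-up module name, `demo_alias`, rather than `distutils`) looks like this:

```python
import importlib.abc
import importlib.util
import sys
import types

# Minimal version of the sys.meta_path shim used by _distutils_hack: a
# finder that answers only for one top-level name ("demo_alias", a
# hypothetical module) and builds a substitute module for it.
class AliasFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, fullname, path, target=None):
        if path is not None or fullname != "demo_alias":
            return None  # only handle top-level imports of "demo_alias"
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        # Like DistutilsLoader, hand back a pre-built substitute module.
        module = types.ModuleType(spec.name)
        module.answer = 42
        return module

    def exec_module(self, module):
        pass  # module was fully built in create_module

finder = AliasFinder()
sys.meta_path.insert(0, finder)  # mirrors add_shim()
try:
    import demo_alias
    print(demo_alias.answer)
finally:
    sys.meta_path.remove(finder)  # mirrors remove_shim()
```

The real shim adds a second wrinkle, `spec_for_pip`, which disables the substitution entirely when the import is coming from pip's build machinery.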
venv/lib/python3.10/site-packages/_distutils_hack/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (5.21 kB).
 
venv/lib/python3.10/site-packages/_distutils_hack/__pycache__/override.cpython-310.pyc ADDED
Binary file (337 Bytes).
 
venv/lib/python3.10/site-packages/_distutils_hack/override.py ADDED
@@ -0,0 +1 @@
+ __import__('_distutils_hack').do_override()
venv/lib/python3.10/site-packages/_yaml/__init__.py ADDED
@@ -0,0 +1,33 @@
+ # This is a stub package designed to roughly emulate the _yaml
+ # extension module, which previously existed as a standalone module
+ # and has been moved into the `yaml` package namespace.
+ # It does not perfectly mimic its old counterpart, but should get
+ # close enough for anyone who's relying on it even when they shouldn't.
+ import yaml
+
+ # in some circumstances, the yaml module we imported may be from a different version, so we need
+ # to tread carefully when poking at it here (it may not have the attributes we expect)
+ if not getattr(yaml, '__with_libyaml__', False):
+     from sys import version_info
+
+     exc = ModuleNotFoundError if version_info >= (3, 6) else ImportError
+     raise exc("No module named '_yaml'")
+ else:
+     from yaml._yaml import *
+     import warnings
+     warnings.warn(
+         'The _yaml extension module is now located at yaml._yaml'
+         ' and its location is subject to change. To use the'
+         ' LibYAML-based parser and emitter, import from `yaml`:'
+         ' `from yaml import CLoader as Loader, CDumper as Dumper`.',
+         DeprecationWarning
+     )
+     del warnings
+     # Don't `del yaml` here because yaml is actually an existing
+     # namespace member of _yaml.
+
+ __name__ = '_yaml'
+ # If the module is top-level (i.e. not a part of any specific package)
+ # then the attribute should be set to ''.
+ # https://docs.python.org/3.8/library/types.html
+ __package__ = ''
venv/lib/python3.10/site-packages/_yaml/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (838 Bytes).
 
venv/lib/python3.10/site-packages/annotated_doc-0.0.4.dist-info/INSTALLER ADDED
@@ -0,0 +1 @@
+ pip
venv/lib/python3.10/site-packages/annotated_doc-0.0.4.dist-info/METADATA ADDED
@@ -0,0 +1,145 @@
+ Metadata-Version: 2.4
+ Name: annotated-doc
+ Version: 0.0.4
+ Summary: Document parameters, class attributes, return types, and variables inline, with Annotated.
+ Author-Email: =?utf-8?q?Sebasti=C3=A1n_Ram=C3=ADrez?= <tiangolo@gmail.com>
+ License-Expression: MIT
+ License-File: LICENSE
+ Classifier: Intended Audience :: Information Technology
+ Classifier: Intended Audience :: System Administrators
+ Classifier: Operating System :: OS Independent
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python
+ Classifier: Topic :: Internet
+ Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
+ Classifier: Topic :: Software Development :: Libraries
+ Classifier: Topic :: Software Development
+ Classifier: Typing :: Typed
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: Programming Language :: Python :: 3 :: Only
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Programming Language :: Python :: 3.14
+ Project-URL: Homepage, https://github.com/fastapi/annotated-doc
+ Project-URL: Documentation, https://github.com/fastapi/annotated-doc
+ Project-URL: Repository, https://github.com/fastapi/annotated-doc
+ Project-URL: Issues, https://github.com/fastapi/annotated-doc/issues
+ Project-URL: Changelog, https://github.com/fastapi/annotated-doc/release-notes.md
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+
+ # Annotated Doc
+
+ Document parameters, class attributes, return types, and variables inline, with `Annotated`.
+
+ <a href="https://github.com/fastapi/annotated-doc/actions?query=workflow%3ATest+event%3Apush+branch%3Amain" target="_blank">
+     <img src="https://github.com/fastapi/annotated-doc/actions/workflows/test.yml/badge.svg?event=push&branch=main" alt="Test">
+ </a>
+ <a href="https://coverage-badge.samuelcolvin.workers.dev/redirect/fastapi/annotated-doc" target="_blank">
+     <img src="https://coverage-badge.samuelcolvin.workers.dev/fastapi/annotated-doc.svg" alt="Coverage">
+ </a>
+ <a href="https://pypi.org/project/annotated-doc" target="_blank">
+     <img src="https://img.shields.io/pypi/v/annotated-doc?color=%2334D058&label=pypi%20package" alt="Package version">
+ </a>
+ <a href="https://pypi.org/project/annotated-doc" target="_blank">
+     <img src="https://img.shields.io/pypi/pyversions/annotated-doc.svg?color=%2334D058" alt="Supported Python versions">
+ </a>
+
+ ## Installation
+
+ ```bash
+ pip install annotated-doc
+ ```
+
+ Or with `uv`:
+
+ ```bash
+ uv add annotated-doc
+ ```
+
+ ## Usage
+
+ Import `Doc` and pass a single literal string with the documentation for the specific parameter, class attribute, return type, or variable.
+
+ For example, to document a parameter `name` in a function `hi` you could do:
+
+ ```Python
+ from typing import Annotated
+
+ from annotated_doc import Doc
+
+ def hi(name: Annotated[str, Doc("Who to say hi to")]) -> None:
+     print(f"Hi, {name}!")
+ ```
+
+ You can also use it to document class attributes:
+
+ ```Python
+ from typing import Annotated
+
+ from annotated_doc import Doc
+
+ class User:
+     name: Annotated[str, Doc("The user's name")]
+     age: Annotated[int, Doc("The user's age")]
+ ```
+
+ The same way, you could document return types and variables, or anything that could have a type annotation with `Annotated`.
+
+ ## Who Uses This
+
+ `annotated-doc` was made for:
+
+ * [FastAPI](https://fastapi.tiangolo.com/)
+ * [Typer](https://typer.tiangolo.com/)
+ * [SQLModel](https://sqlmodel.tiangolo.com/)
+ * [Asyncer](https://asyncer.tiangolo.com/)
+
+ `annotated-doc` is supported by [griffe-typingdoc](https://github.com/mkdocstrings/griffe-typingdoc), which powers reference documentation like the one in the [FastAPI Reference](https://fastapi.tiangolo.com/reference/).
+
+ ## Reasons not to use `annotated-doc`
+
+ You are already comfortable with one of the existing docstring formats, like:
+
+ * Sphinx
+ * numpydoc
+ * Google
+ * Keras
+
+ Your team is already comfortable using them.
+
+ You prefer having the documentation about parameters all together in a docstring, separated from the code defining them.
+
+ You care about a specific set of users, using one specific editor, and that editor already has support for the specific docstring format you use.
+
+ ## Reasons to use `annotated-doc`
+
+ * No micro-syntax to learn for newcomers, it's **just Python** syntax.
+ * **Editing** would be already fully supported by default by any editor (current or future) supporting Python syntax, including syntax errors, syntax highlighting, etc.
+ * **Rendering** would be relatively straightforward to implement by static tools (tools that don't need runtime execution), as the information can be extracted from the AST they normally already create.
+ * **Deduplication of information**: the name of a parameter would be defined in a single place, not duplicated inside of a docstring.
+ * **Elimination** of the possibility of having **inconsistencies** when removing a parameter or class variable and **forgetting to remove** its documentation.
+ * **Minimization** of the probability of adding a new parameter or class variable and **forgetting to add its documentation**.
+ * **Elimination** of the possibility of having **inconsistencies** between the **name** of a parameter in the **signature** and the name in the docstring when it is renamed.
+ * **Access** to the documentation string for each symbol at **runtime**, including existing (older) Python versions.
+ * A more formalized way to document other symbols, like type aliases, that could use `Annotated`.
+ * **Support** for apps using FastAPI, Typer and others.
+ * **AI Accessibility**: AI tools will have an easier time understanding each parameter, as the documentation sits much closer to it.
+
+ ## History
+
+ I ([@tiangolo](https://github.com/tiangolo)) originally wanted this to be part of the Python standard library (in [PEP 727](https://peps.python.org/pep-0727/)), but the proposal was withdrawn as there was a fair amount of negative feedback and opposition.
+
+ The conclusion was that this was better done as an external effort, in a third-party library.
+
+ So, here it is, with a simpler approach, as a third-party library, in a way that can be used by others, starting with FastAPI and friends.
+
+ ## License
+
+ This project is licensed under the terms of the MIT license.
venv/lib/python3.10/site-packages/annotated_doc-0.0.4.dist-info/RECORD ADDED
@@ -0,0 +1,11 @@
+ annotated_doc-0.0.4.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4
+ annotated_doc-0.0.4.dist-info/METADATA,sha256=Irm5KJua33dY2qKKAjJ-OhKaVBVIfwFGej_dSe3Z1TU,6566
+ annotated_doc-0.0.4.dist-info/RECORD,,
+ annotated_doc-0.0.4.dist-info/WHEEL,sha256=9P2ygRxDrTJz3gsagc0Z96ukrxjr-LFBGOgv3AuKlCA,90
+ annotated_doc-0.0.4.dist-info/entry_points.txt,sha256=6OYgBcLyFCUgeqLgnvMyOJxPCWzgy7se4rLPKtNonMs,34
+ annotated_doc-0.0.4.dist-info/licenses/LICENSE,sha256=__Fwd5pqy_ZavbQFwIfxzuF4ZpHkqWpANFF-SlBKDN8,1086
+ annotated_doc/__init__.py,sha256=VuyxxUe80kfEyWnOrCx_Bk8hybo3aKo6RYBlkBBYW8k,52
+ annotated_doc/__pycache__/__init__.cpython-310.pyc,,
+ annotated_doc/__pycache__/main.cpython-310.pyc,,
+ annotated_doc/main.py,sha256=5Zfvxv80SwwLqpRW73AZyZyiM4bWma9QWRbp_cgD20s,1075
+ annotated_doc/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
venv/lib/python3.10/site-packages/annotated_doc-0.0.4.dist-info/WHEEL ADDED
@@ -0,0 +1,4 @@
+ Wheel-Version: 1.0
+ Generator: pdm-backend (2.4.5)
+ Root-Is-Purelib: true
+ Tag: py3-none-any
venv/lib/python3.10/site-packages/annotated_doc-0.0.4.dist-info/entry_points.txt ADDED
@@ -0,0 +1,4 @@
+ [console_scripts]
+
+ [gui_scripts]
+
venv/lib/python3.10/site-packages/annotated_doc-0.0.4.dist-info/licenses/LICENSE ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2025 Sebastián Ramírez
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
venv/lib/python3.10/site-packages/annotated_doc/__init__.py ADDED
@@ -0,0 +1,3 @@
+ from .main import Doc as Doc
+
+ __version__ = "0.0.4"
venv/lib/python3.10/site-packages/annotated_doc/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (342 Bytes).
 
venv/lib/python3.10/site-packages/annotated_doc/__pycache__/main.cpython-310.pyc ADDED
Binary file (1.72 kB).
 
venv/lib/python3.10/site-packages/annotated_doc/main.py ADDED
@@ -0,0 +1,36 @@
+ class Doc:
+     """Define the documentation of a type annotation using `Annotated`, to be
+     used in class attributes, function and method parameters, return values,
+     and variables.
+
+     The value should be a positional-only string literal to allow static tools
+     like editors and documentation generators to use it.
+
+     This complements docstrings.
+
+     The string value passed is available in the attribute `documentation`.
+
+     Example:
+
+     ```Python
+     from typing import Annotated
+     from annotated_doc import Doc
+
+     def hi(name: Annotated[str, Doc("Who to say hi to")]) -> None:
+         print(f"Hi, {name}!")
+     ```
+     """
+
+     def __init__(self, documentation: str, /) -> None:
+         self.documentation = documentation
+
+     def __repr__(self) -> str:
+         return f"Doc({self.documentation!r})"
+
+     def __hash__(self) -> int:
+         return hash(self.documentation)
+
+     def __eq__(self, other: object) -> bool:
+         if not isinstance(other, Doc):
+             return NotImplemented
+         return self.documentation == other.documentation
venv/lib/python3.10/site-packages/annotated_doc/py.typed ADDED
File without changes
venv/lib/python3.10/site-packages/anyio/__init__.py ADDED
@@ -0,0 +1,111 @@
+ from __future__ import annotations
+
+ from ._core._contextmanagers import AsyncContextManagerMixin as AsyncContextManagerMixin
+ from ._core._contextmanagers import ContextManagerMixin as ContextManagerMixin
+ from ._core._eventloop import current_time as current_time
+ from ._core._eventloop import get_all_backends as get_all_backends
+ from ._core._eventloop import get_available_backends as get_available_backends
+ from ._core._eventloop import get_cancelled_exc_class as get_cancelled_exc_class
+ from ._core._eventloop import run as run
+ from ._core._eventloop import sleep as sleep
+ from ._core._eventloop import sleep_forever as sleep_forever
+ from ._core._eventloop import sleep_until as sleep_until
+ from ._core._exceptions import BrokenResourceError as BrokenResourceError
+ from ._core._exceptions import BrokenWorkerInterpreter as BrokenWorkerInterpreter
+ from ._core._exceptions import BrokenWorkerProcess as BrokenWorkerProcess
+ from ._core._exceptions import BusyResourceError as BusyResourceError
+ from ._core._exceptions import ClosedResourceError as ClosedResourceError
+ from ._core._exceptions import ConnectionFailed as ConnectionFailed
+ from ._core._exceptions import DelimiterNotFound as DelimiterNotFound
+ from ._core._exceptions import EndOfStream as EndOfStream
+ from ._core._exceptions import IncompleteRead as IncompleteRead
+ from ._core._exceptions import NoEventLoopError as NoEventLoopError
+ from ._core._exceptions import RunFinishedError as RunFinishedError
+ from ._core._exceptions import TypedAttributeLookupError as TypedAttributeLookupError
+ from ._core._exceptions import WouldBlock as WouldBlock
+ from ._core._fileio import AsyncFile as AsyncFile
+ from ._core._fileio import Path as Path
+ from ._core._fileio import open_file as open_file
+ from ._core._fileio import wrap_file as wrap_file
+ from ._core._resources import aclose_forcefully as aclose_forcefully
+ from ._core._signals import open_signal_receiver as open_signal_receiver
+ from ._core._sockets import TCPConnectable as TCPConnectable
+ from ._core._sockets import UNIXConnectable as UNIXConnectable
+ from ._core._sockets import as_connectable as as_connectable
+ from ._core._sockets import connect_tcp as connect_tcp
+ from ._core._sockets import connect_unix as connect_unix
+ from ._core._sockets import create_connected_udp_socket as create_connected_udp_socket
+ from ._core._sockets import (
+     create_connected_unix_datagram_socket as create_connected_unix_datagram_socket,
+ )
+ from ._core._sockets import create_tcp_listener as create_tcp_listener
+ from ._core._sockets import create_udp_socket as create_udp_socket
+ from ._core._sockets import create_unix_datagram_socket as create_unix_datagram_socket
+ from ._core._sockets import create_unix_listener as create_unix_listener
+ from ._core._sockets import getaddrinfo as getaddrinfo
+ from ._core._sockets import getnameinfo as getnameinfo
+ from ._core._sockets import notify_closing as notify_closing
+ from ._core._sockets import wait_readable as wait_readable
+ from ._core._sockets import wait_socket_readable as wait_socket_readable
+ from ._core._sockets import wait_socket_writable as wait_socket_writable
+ from ._core._sockets import wait_writable as wait_writable
+ from ._core._streams import create_memory_object_stream as create_memory_object_stream
+ from ._core._subprocesses import open_process as open_process
+ from ._core._subprocesses import run_process as run_process
+ from ._core._synchronization import CapacityLimiter as CapacityLimiter
+ from ._core._synchronization import (
+     CapacityLimiterStatistics as CapacityLimiterStatistics,
+ )
+ from ._core._synchronization import Condition as Condition
+ from ._core._synchronization import ConditionStatistics as ConditionStatistics
+ from ._core._synchronization import Event as Event
+ from ._core._synchronization import EventStatistics as EventStatistics
+ from ._core._synchronization import Lock as Lock
+ from ._core._synchronization import LockStatistics as LockStatistics
+ from ._core._synchronization import ResourceGuard as ResourceGuard
+ from ._core._synchronization import Semaphore as Semaphore
+ from ._core._synchronization import SemaphoreStatistics as SemaphoreStatistics
+ from ._core._tasks import TASK_STATUS_IGNORED as TASK_STATUS_IGNORED
+ from ._core._tasks import CancelScope as CancelScope
+ from ._core._tasks import create_task_group as create_task_group
+ from ._core._tasks import current_effective_deadline as current_effective_deadline
+ from ._core._tasks import fail_after as fail_after
+ from ._core._tasks import move_on_after as move_on_after
+ from ._core._tempfile import NamedTemporaryFile as NamedTemporaryFile
+ from ._core._tempfile import SpooledTemporaryFile as SpooledTemporaryFile
+ from ._core._tempfile import TemporaryDirectory as TemporaryDirectory
+ from ._core._tempfile import TemporaryFile as TemporaryFile
+ from ._core._tempfile import gettempdir as gettempdir
+ from ._core._tempfile import gettempdirb as gettempdirb
+ from ._core._tempfile import mkdtemp as mkdtemp
+ from ._core._tempfile import mkstemp as mkstemp
+ from ._core._testing import TaskInfo as TaskInfo
+ from ._core._testing import get_current_task as get_current_task
+ from ._core._testing import get_running_tasks as get_running_tasks
+ from ._core._testing import wait_all_tasks_blocked as wait_all_tasks_blocked
+ from ._core._typedattr import TypedAttributeProvider as TypedAttributeProvider
+ from ._core._typedattr import TypedAttributeSet as TypedAttributeSet
+ from ._core._typedattr import typed_attribute as typed_attribute
+
+ # Re-export imports so they look like they live directly in this package
+ for __value in list(locals().values()):
+     if getattr(__value, "__module__", "").startswith("anyio."):
+         __value.__module__ = __name__
+
+
+ del __value
+
+
+ def __getattr__(attr: str) -> type[BrokenWorkerInterpreter]:
+     """Support deprecated aliases."""
+     if attr == "BrokenWorkerIntepreter":
+         import warnings
+
+         warnings.warn(
+             "The 'BrokenWorkerIntepreter' alias is deprecated, use 'BrokenWorkerInterpreter' instead.",
+             DeprecationWarning,
+             stacklevel=2,
+         )
+         return BrokenWorkerInterpreter
+
+     raise AttributeError(f"module {__name__!r} has no attribute {attr!r}")
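The deprecated-alias hook at the end of this `__init__.py` relies on PEP 562 module-level `__getattr__`. The same pattern can be sketched in isolation with a throwaway module (the module name `_alias_demo` and the class names are illustrative, not part of anyio):

```python
import sys
import types
import warnings

# Build a throwaway module (hypothetical name) whose misspelled OldName
# attribute resolves through a deprecation shim, mirroring the module-level
# __getattr__ hook in anyio/__init__.py above.
mod = types.ModuleType("_alias_demo")


class NewName:
    pass


def _module_getattr(attr: str) -> type:
    if attr == "OldName":
        warnings.warn(
            "'OldName' is deprecated, use 'NewName' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return NewName
    raise AttributeError(f"module '_alias_demo' has no attribute {attr!r}")


mod.NewName = NewName
mod.__getattr__ = _module_getattr  # PEP 562: consulted for missing attributes
sys.modules["_alias_demo"] = mod

import _alias_demo

with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    print(_alias_demo.OldName is _alias_demo.NewName)  # → True
```

Because `__getattr__` only runs for names missing from the module's `__dict__`, the shim costs nothing on normal attribute access.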
venv/lib/python3.10/site-packages/anyio/from_thread.py ADDED
@@ -0,0 +1,578 @@
+ from __future__ import annotations
+
+ __all__ = (
+     "BlockingPortal",
+     "BlockingPortalProvider",
+     "check_cancelled",
+     "run",
+     "run_sync",
+     "start_blocking_portal",
+ )
+
+ import sys
+ from collections.abc import Awaitable, Callable, Generator
+ from concurrent.futures import Future
+ from contextlib import (
+     AbstractAsyncContextManager,
+     AbstractContextManager,
+     contextmanager,
+ )
+ from dataclasses import dataclass, field
+ from functools import partial
+ from inspect import isawaitable
+ from threading import Lock, Thread, current_thread, get_ident
+ from types import TracebackType
+ from typing import (
+     Any,
+     Generic,
+     TypeVar,
+     cast,
+     overload,
+ )
+
+ from ._core._eventloop import (
+     get_cancelled_exc_class,
+     threadlocals,
+ )
+ from ._core._eventloop import run as run_eventloop
+ from ._core._exceptions import NoEventLoopError
+ from ._core._synchronization import Event
+ from ._core._tasks import CancelScope, create_task_group
+ from .abc._tasks import TaskStatus
+ from .lowlevel import EventLoopToken, current_token
+
+ if sys.version_info >= (3, 11):
+     from typing import TypeVarTuple, Unpack
+ else:
+     from typing_extensions import TypeVarTuple, Unpack
+
+ T_Retval = TypeVar("T_Retval")
+ T_co = TypeVar("T_co", covariant=True)
+ PosArgsT = TypeVarTuple("PosArgsT")
+
+
+ def _token_or_error(token: EventLoopToken | None) -> EventLoopToken:
+     if token is not None:
+         return token
+
+     try:
+         return threadlocals.current_token
+     except AttributeError:
+         raise NoEventLoopError(
+             "Not running inside an AnyIO worker thread, and no event loop token was "
+             "provided"
+         ) from None
+
+
+ def run(
+     func: Callable[[Unpack[PosArgsT]], Awaitable[T_Retval]],
+     *args: Unpack[PosArgsT],
+     token: EventLoopToken | None = None,
+ ) -> T_Retval:
+     """
+     Call a coroutine function from a worker thread.
+
+     :param func: a coroutine function
+     :param args: positional arguments for the callable
+     :param token: an event loop token to use to get back to the event loop thread
+         (required if calling this function from outside an AnyIO worker thread)
+     :return: the return value of the coroutine function
+     :raises MissingTokenError: if no token was provided and called from outside an
+         AnyIO worker thread
+     :raises RunFinishedError: if the event loop tied to ``token`` is no longer running
+
+     .. versionchanged:: 4.11.0
+         Added the ``token`` parameter.
+
+     """
+     explicit_token = token is not None
+     token = _token_or_error(token)
+     return token.backend_class.run_async_from_thread(
+         func, args, token=token.native_token if explicit_token else None
+     )
+
+
+ def run_sync(
+     func: Callable[[Unpack[PosArgsT]], T_Retval],
+     *args: Unpack[PosArgsT],
+     token: EventLoopToken | None = None,
+ ) -> T_Retval:
+     """
+     Call a function in the event loop thread from a worker thread.
+
+     :param func: a callable
+     :param args: positional arguments for the callable
+     :param token: an event loop token to use to get back to the event loop thread
+         (required if calling this function from outside an AnyIO worker thread)
+     :return: the return value of the callable
+     :raises MissingTokenError: if no token was provided and called from outside an
+         AnyIO worker thread
+     :raises RunFinishedError: if the event loop tied to ``token`` is no longer running
+
+     .. versionchanged:: 4.11.0
+         Added the ``token`` parameter.
+
+     """
+     explicit_token = token is not None
+     token = _token_or_error(token)
+     return token.backend_class.run_sync_from_thread(
+         func, args, token=token.native_token if explicit_token else None
+     )
+
+
+ class _BlockingAsyncContextManager(Generic[T_co], AbstractContextManager):
+     _enter_future: Future[T_co]
+     _exit_future: Future[bool | None]
+     _exit_event: Event
+     _exit_exc_info: tuple[
+         type[BaseException] | None, BaseException | None, TracebackType | None
+     ] = (None, None, None)
+
+     def __init__(
+         self, async_cm: AbstractAsyncContextManager[T_co], portal: BlockingPortal
+     ):
+         self._async_cm = async_cm
+         self._portal = portal
+
+     async def run_async_cm(self) -> bool | None:
+         try:
+             self._exit_event = Event()
+             value = await self._async_cm.__aenter__()
+         except BaseException as exc:
+             self._enter_future.set_exception(exc)
+             raise
+         else:
+             self._enter_future.set_result(value)
+
+         try:
+             # Wait for the sync context manager to exit.
+             # This next statement can raise `get_cancelled_exc_class()` if
+             # something went wrong in a task group in this async context
+             # manager.
+             await self._exit_event.wait()
+         finally:
+             # In case of cancellation, it could be that we end up here before
+             # `_BlockingAsyncContextManager.__exit__` is called, and an
+             # `_exit_exc_info` has been set.
+             result = await self._async_cm.__aexit__(*self._exit_exc_info)
+
+         return result
+
+     def __enter__(self) -> T_co:
+         self._enter_future = Future()
+         self._exit_future = self._portal.start_task_soon(self.run_async_cm)
+         return self._enter_future.result()
+
+     def __exit__(
+         self,
+         __exc_type: type[BaseException] | None,
+         __exc_value: BaseException | None,
+         __traceback: TracebackType | None,
+     ) -> bool | None:
+         self._exit_exc_info = __exc_type, __exc_value, __traceback
+         self._portal.call(self._exit_event.set)
+         return self._exit_future.result()
+
+
+ class _BlockingPortalTaskStatus(TaskStatus):
+     def __init__(self, future: Future):
+         self._future = future
+
+     def started(self, value: object = None) -> None:
+         self._future.set_result(value)
+
+
+ class BlockingPortal:
+     """
+     An object that lets external threads run code in an asynchronous event loop.
+
+     :raises NoEventLoopError: if no supported asynchronous event loop is running in the
+         current thread
+     """
+
+     def __init__(self) -> None:
+         self._token = current_token()
+         self._event_loop_thread_id: int | None = get_ident()
+         self._stop_event = Event()
+         self._task_group = create_task_group()
+
+     async def __aenter__(self) -> BlockingPortal:
+         await self._task_group.__aenter__()
+         return self
+
+     async def __aexit__(
+         self,
+         exc_type: type[BaseException] | None,
+         exc_val: BaseException | None,
+         exc_tb: TracebackType | None,
+     ) -> bool:
+         await self.stop()
+         return await self._task_group.__aexit__(exc_type, exc_val, exc_tb)
+
+     def _check_running(self) -> None:
+         if self._event_loop_thread_id is None:
+             raise RuntimeError("This portal is not running")
+         if self._event_loop_thread_id == get_ident():
+             raise RuntimeError(
+                 "This method cannot be called from the event loop thread"
+             )
+
+     async def sleep_until_stopped(self) -> None:
+         """Sleep until :meth:`stop` is called."""
+         await self._stop_event.wait()
+
+     async def stop(self, cancel_remaining: bool = False) -> None:
+         """
+         Signal the portal to shut down.
+
+         This marks the portal as no longer accepting new calls and exits from
+         :meth:`sleep_until_stopped`.
+
+         :param cancel_remaining: ``True`` to cancel all the remaining tasks, ``False``
+             to let them finish before returning
+
+         """
+         self._event_loop_thread_id = None
+         self._stop_event.set()
+         if cancel_remaining:
+             self._task_group.cancel_scope.cancel("the blocking portal is shutting down")
+
+     async def _call_func(
+         self,
+         func: Callable[[Unpack[PosArgsT]], Awaitable[T_Retval] | T_Retval],
+         args: tuple[Unpack[PosArgsT]],
+         kwargs: dict[str, Any],
+         future: Future[T_Retval],
+     ) -> None:
+         def callback(f: Future[T_Retval]) -> None:
+             if f.cancelled():
+                 if self._event_loop_thread_id == get_ident():
+                     scope.cancel("the future was cancelled")
+                 elif self._event_loop_thread_id is not None:
+                     self.call(scope.cancel, "the future was cancelled")
+
+         try:
+             retval_or_awaitable = func(*args, **kwargs)
+             if isawaitable(retval_or_awaitable):
+                 with CancelScope() as scope:
+                     future.add_done_callback(callback)
+                     retval = await retval_or_awaitable
+             else:
+                 retval = retval_or_awaitable
+         except get_cancelled_exc_class():
+             future.cancel()
+             future.set_running_or_notify_cancel()
+         except BaseException as exc:
+             if not future.cancelled():
+                 future.set_exception(exc)
+
+             # Let base exceptions fall through
+             if not isinstance(exc, Exception):
+                 raise
+         else:
+             if not future.cancelled():
+                 future.set_result(retval)
+         finally:
+             scope = None  # type: ignore[assignment]
+
+     def _spawn_task_from_thread(
+         self,
+         func: Callable[[Unpack[PosArgsT]], Awaitable[T_Retval] | T_Retval],
+         args: tuple[Unpack[PosArgsT]],
+         kwargs: dict[str, Any],
+         name: object,
+         future: Future[T_Retval],
+     ) -> None:
+         """
+         Spawn a new task using the given callable.
+
+         :param func: a callable
+         :param args: positional arguments to be passed to the callable
+         :param kwargs: keyword arguments to be passed to the callable
+         :param name: name of the task (will be coerced to a string if not ``None``)
+         :param future: a future that will resolve to the return value of the callable,
+             or the exception raised during its execution
+
+         """
+         run_sync(
+             partial(self._task_group.start_soon, name=name),
+             self._call_func,
+             func,
+             args,
+             kwargs,
+             future,
+             token=self._token,
+         )
+
+     @overload
+     def call(
+         self,
+         func: Callable[[Unpack[PosArgsT]], Awaitable[T_Retval]],
+         *args: Unpack[PosArgsT],
+     ) -> T_Retval: ...
+
+     @overload
+     def call(
+         self, func: Callable[[Unpack[PosArgsT]], T_Retval], *args: Unpack[PosArgsT]
+     ) -> T_Retval: ...
+
+     def call(
+         self,
+         func: Callable[[Unpack[PosArgsT]], Awaitable[T_Retval] | T_Retval],
+         *args: Unpack[PosArgsT],
+     ) -> T_Retval:
+         """
+         Call the given function in the event loop thread.
+
+         If the callable returns a coroutine object, it is awaited on.
+
+         :param func: any callable
+         :raises RuntimeError: if the portal is not running or if this method is called
+             from within the event loop thread
+
+         """
+         return cast(T_Retval, self.start_task_soon(func, *args).result())
+
+     @overload
+     def start_task_soon(
+         self,
+         func: Callable[[Unpack[PosArgsT]], Awaitable[T_Retval]],
+         *args: Unpack[PosArgsT],
+         name: object = None,
+     ) -> Future[T_Retval]: ...
+
+     @overload
+     def start_task_soon(
+         self,
+         func: Callable[[Unpack[PosArgsT]], T_Retval],
+         *args: Unpack[PosArgsT],
+         name: object = None,
+     ) -> Future[T_Retval]: ...
+
+     def start_task_soon(
+         self,
+         func: Callable[[Unpack[PosArgsT]], Awaitable[T_Retval] | T_Retval],
+         *args: Unpack[PosArgsT],
+         name: object = None,
+     ) -> Future[T_Retval]:
+         """
+         Start a task in the portal's task group.
+
+         The task will be run inside a cancel scope which can be cancelled by cancelling
+         the returned future.
+
+         :param func: the target function
+         :param args: positional arguments passed to ``func``
+         :param name: name of the task (will be coerced to a string if not ``None``)
+         :return: a future that resolves with the return value of the callable if the
+             task completes successfully, or with the exception raised in the task
+         :raises RuntimeError: if the portal is not running or if this method is called
+             from within the event loop thread
+         :rtype: concurrent.futures.Future[T_Retval]
+
+         .. versionadded:: 3.0
+
+         """
+         self._check_running()
+         f: Future[T_Retval] = Future()
+         self._spawn_task_from_thread(func, args, {}, name, f)
+         return f
+
+     def start_task(
+         self,
+         func: Callable[..., Awaitable[T_Retval]],
+         *args: object,
+         name: object = None,
+     ) -> tuple[Future[T_Retval], Any]:
+         """
+         Start a task in the portal's task group and wait until it signals for readiness.
+
+         This method works the same way as :meth:`.abc.TaskGroup.start`.
+
+         :param func: the target function
+         :param args: positional arguments passed to ``func``
+         :param name: name of the task (will be coerced to a string if not ``None``)
+         :return: a tuple of (future, task_status_value) where the ``task_status_value``
+             is the value passed to ``task_status.started()`` from within the target
+             function
+         :rtype: tuple[concurrent.futures.Future[T_Retval], Any]
+
+         .. versionadded:: 3.0
+
+         """
+
+         def task_done(future: Future[T_Retval]) -> None:
+             if not task_status_future.done():
+                 if future.cancelled():
+                     task_status_future.cancel()
+                 elif future.exception():
+                     task_status_future.set_exception(future.exception())
+                 else:
+                     exc = RuntimeError(
+                         "Task exited without calling task_status.started()"
+                     )
+                     task_status_future.set_exception(exc)
+
+         self._check_running()
+         task_status_future: Future = Future()
+         task_status = _BlockingPortalTaskStatus(task_status_future)
+         f: Future = Future()
+         f.add_done_callback(task_done)
+         self._spawn_task_from_thread(func, args, {"task_status": task_status}, name, f)
+         return f, task_status_future.result()
+
+     def wrap_async_context_manager(
+         self, cm: AbstractAsyncContextManager[T_co]
+     ) -> AbstractContextManager[T_co]:
+         """
+         Wrap an async context manager as a synchronous context manager via this portal.
+
+         Spawns a task that will call both ``__aenter__()`` and ``__aexit__()``, stopping
+         in the middle until the synchronous context manager exits.
+
+         :param cm: an asynchronous context manager
+         :return: a synchronous context manager
+
+         .. versionadded:: 2.1
+
+         """
+         return _BlockingAsyncContextManager(cm, self)
+
+
+ @dataclass
+ class BlockingPortalProvider:
+     """
+     A manager for a blocking portal. Used as a context manager. The first thread to
+     enter this context manager causes a blocking portal to be started with the specific
+     parameters, and the last thread to exit causes the portal to be shut down. Thus,
+     there will be exactly one blocking portal running in this context as long as at
+     least one thread has entered this context manager.
+
+     The parameters are the same as for :func:`~anyio.run`.
+
+     :param backend: name of the backend
+     :param backend_options: backend options
+
+     .. versionadded:: 4.4
+     """
+
+     backend: str = "asyncio"
+     backend_options: dict[str, Any] | None = None
+     _lock: Lock = field(init=False, default_factory=Lock)
+     _leases: int = field(init=False, default=0)
+     _portal: BlockingPortal = field(init=False)
+     _portal_cm: AbstractContextManager[BlockingPortal] | None = field(
+         init=False, default=None
+     )
+
+     def __enter__(self) -> BlockingPortal:
+         with self._lock:
+             if self._portal_cm is None:
+                 self._portal_cm = start_blocking_portal(
+                     self.backend, self.backend_options
+                 )
+                 self._portal = self._portal_cm.__enter__()
+
+             self._leases += 1
+             return self._portal
+
+     def __exit__(
+         self,
+         exc_type: type[BaseException] | None,
+         exc_val: BaseException | None,
+         exc_tb: TracebackType | None,
+     ) -> None:
+         portal_cm: AbstractContextManager[BlockingPortal] | None = None
+         with self._lock:
+             assert self._portal_cm
+             assert self._leases > 0
+             self._leases -= 1
+             if not self._leases:
+                 portal_cm = self._portal_cm
+                 self._portal_cm = None
+                 del self._portal
+
+         if portal_cm:
+             portal_cm.__exit__(None, None, None)
+
+
+ @contextmanager
+ def start_blocking_portal(
+     backend: str = "asyncio",
+     backend_options: dict[str, Any] | None = None,
+     *,
+     name: str | None = None,
+ ) -> Generator[BlockingPortal, Any, None]:
+     """
+     Start a new event loop in a new thread and run a blocking portal in its main task.
+
+     The parameters are the same as for :func:`~anyio.run`.
+
+     :param backend: name of the backend
+     :param backend_options: backend options
+     :param name: name of the thread
+     :return: a context manager that yields a blocking portal
+
+     .. versionchanged:: 3.0
+         Usage as a context manager is now required.
+
+     """
+
+     async def run_portal() -> None:
+         async with BlockingPortal() as portal_:
+             if name is None:
+                 current_thread().name = f"{backend}-portal-{id(portal_):x}"
+
+             future.set_result(portal_)
+             await portal_.sleep_until_stopped()
+
+     def run_blocking_portal() -> None:
+         if future.set_running_or_notify_cancel():
+             try:
+                 run_eventloop(
+                     run_portal, backend=backend, backend_options=backend_options
+                 )
+             except BaseException as exc:
+                 if not future.done():
+                     future.set_exception(exc)
+
+     future: Future[BlockingPortal] = Future()
+     thread = Thread(target=run_blocking_portal, daemon=True, name=name)
+     thread.start()
+     try:
+         cancel_remaining_tasks = False
+         portal = future.result()
+         try:
+             yield portal
+         except BaseException:
+             cancel_remaining_tasks = True
+             raise
+         finally:
+             try:
+                 portal.call(portal.stop, cancel_remaining_tasks)
+             except RuntimeError:
+                 pass
+     finally:
+         thread.join()
+
+
+ def check_cancelled() -> None:
+     """
+     Check if the cancel scope of the host task's running the current worker thread has
+     been cancelled.
+
+     If the host task's current cancel scope has indeed been cancelled, the
+     backend-specific cancellation exception will be raised.
+
+     :raises RuntimeError: if the current thread was not spawned by
+         :func:`.to_thread.run_sync`
+
+     """
+     try:
+         token: EventLoopToken = threadlocals.current_token
+     except AttributeError:
+         raise NoEventLoopError(
+             "This function can only be called inside an AnyIO worker thread"
+         ) from None
+
+     token.backend_class.check_cancelled()
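The core move in `BlockingPortal` — a worker thread submitting work to an event loop running in another thread, then blocking on a `concurrent.futures.Future` — can be sketched with the stdlib alone. This is an illustration of the pattern, not anyio's implementation; it uses plain `asyncio` instead of anyio's backend abstraction:

```python
import asyncio
import threading
from concurrent.futures import Future

# Start an event loop in a dedicated thread, much like start_blocking_portal().
loop_ready: Future = Future()


def run_loop() -> None:
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop_ready.set_result(loop)
    loop.run_forever()


thread = threading.Thread(target=run_loop, daemon=True)
thread.start()
loop = loop_ready.result()


async def greet(name: str) -> str:
    await asyncio.sleep(0)  # yield once, proving we run on the event loop
    return f"Hi, {name}!"


# The "portal call": schedule the coroutine on the loop thread and block on the
# returned concurrent.futures.Future, as BlockingPortal.call() does.
result = asyncio.run_coroutine_threadsafe(greet("world"), loop).result()
print(result)  # → Hi, world!

# Shut the loop down, mirroring portal.stop() plus thread.join().
loop.call_soon_threadsafe(loop.stop)
thread.join()
```

anyio's version layers cancel scopes, task groups, and backend dispatch on top of this same handoff.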
venv/lib/python3.10/site-packages/anyio/functools.py ADDED
@@ -0,0 +1,375 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from __future__ import annotations
2
+
3
+ __all__ = (
4
+ "AsyncCacheInfo",
5
+ "AsyncCacheParameters",
6
+ "AsyncLRUCacheWrapper",
7
+ "cache",
8
+ "lru_cache",
9
+ "reduce",
10
+ )
11
+
12
+ import functools
13
+ import sys
14
+ from collections import OrderedDict
15
+ from collections.abc import (
16
+ AsyncIterable,
17
+ Awaitable,
18
+ Callable,
19
+ Coroutine,
20
+ Hashable,
21
+ Iterable,
22
+ )
23
+ from functools import update_wrapper
24
+ from inspect import iscoroutinefunction
25
+ from typing import (
26
+ Any,
27
+ Generic,
28
+ NamedTuple,
29
+ TypedDict,
30
+ TypeVar,
31
+ cast,
32
+ final,
33
+ overload,
34
+ )
35
+ from weakref import WeakKeyDictionary
36
+
37
+ from ._core._synchronization import Lock
38
+ from .lowlevel import RunVar, checkpoint
39
+
40
+ if sys.version_info >= (3, 11):
41
+ from typing import ParamSpec
42
+ else:
43
+ from typing_extensions import ParamSpec
44
+
45
+ T = TypeVar("T")
46
+ S = TypeVar("S")
47
+ P = ParamSpec("P")
48
+ lru_cache_items: RunVar[
49
+ WeakKeyDictionary[
50
+ AsyncLRUCacheWrapper[Any, Any],
51
+ OrderedDict[Hashable, tuple[_InitialMissingType, Lock] | tuple[Any, None]],
52
+ ]
53
+ ] = RunVar("lru_cache_items")
54
+
55
+
56
+ class _InitialMissingType:
57
+ pass
58
+
59
+
60
+ initial_missing: _InitialMissingType = _InitialMissingType()
61
+
62
+
63
+ class AsyncCacheInfo(NamedTuple):
+     hits: int
+     misses: int
+     maxsize: int | None
+     currsize: int
+
+
+ class AsyncCacheParameters(TypedDict):
+     maxsize: int | None
+     typed: bool
+     always_checkpoint: bool
+
+
+ class _LRUMethodWrapper(Generic[T]):
+     def __init__(self, wrapper: AsyncLRUCacheWrapper[..., T], instance: object):
+         self.__wrapper = wrapper
+         self.__instance = instance
+
+     def cache_info(self) -> AsyncCacheInfo:
+         return self.__wrapper.cache_info()
+
+     def cache_parameters(self) -> AsyncCacheParameters:
+         return self.__wrapper.cache_parameters()
+
+     def cache_clear(self) -> None:
+         self.__wrapper.cache_clear()
+
+     async def __call__(self, *args: Any, **kwargs: Any) -> T:
+         if self.__instance is None:
+             return await self.__wrapper(*args, **kwargs)
+
+         return await self.__wrapper(self.__instance, *args, **kwargs)
+
+
+ @final
+ class AsyncLRUCacheWrapper(Generic[P, T]):
+     def __init__(
+         self,
+         func: Callable[P, Awaitable[T]],
+         maxsize: int | None,
+         typed: bool,
+         always_checkpoint: bool,
+     ):
+         self.__wrapped__ = func
+         self._hits: int = 0
+         self._misses: int = 0
+         self._maxsize = max(maxsize, 0) if maxsize is not None else None
+         self._currsize: int = 0
+         self._typed = typed
+         self._always_checkpoint = always_checkpoint
+         update_wrapper(self, func)
+
+     def cache_info(self) -> AsyncCacheInfo:
+         return AsyncCacheInfo(self._hits, self._misses, self._maxsize, self._currsize)
+
+     def cache_parameters(self) -> AsyncCacheParameters:
+         return {
+             "maxsize": self._maxsize,
+             "typed": self._typed,
+             "always_checkpoint": self._always_checkpoint,
+         }
+
+     def cache_clear(self) -> None:
+         if cache := lru_cache_items.get(None):
+             cache.pop(self, None)
+
+         self._hits = self._misses = self._currsize = 0
+
+     async def __call__(self, *args: P.args, **kwargs: P.kwargs) -> T:
+         # Easy case first: if maxsize == 0, no caching is done
+         if self._maxsize == 0:
+             value = await self.__wrapped__(*args, **kwargs)
+             self._misses += 1
+             return value
+
+         # The key is constructed as a flat tuple to avoid memory overhead
+         key: tuple[Any, ...] = args
+         if kwargs:
+             # initial_missing is used as a separator
+             key += (initial_missing,) + sum(kwargs.items(), ())
+
+         if self._typed:
+             key += tuple(type(arg) for arg in args)
+             if kwargs:
+                 key += (initial_missing,) + tuple(type(val) for val in kwargs.values())
+
+         try:
+             cache = lru_cache_items.get()
+         except LookupError:
+             cache = WeakKeyDictionary()
+             lru_cache_items.set(cache)
+
+         try:
+             cache_entry = cache[self]
+         except KeyError:
+             cache_entry = cache[self] = OrderedDict()
+
+         cached_value: T | _InitialMissingType
+         try:
+             cached_value, lock = cache_entry[key]
+         except KeyError:
+             # We're the first task to call this function
+             cached_value, lock = (
+                 initial_missing,
+                 Lock(fast_acquire=not self._always_checkpoint),
+             )
+             cache_entry[key] = cached_value, lock
+
+         if lock is None:
+             # The value was already cached
+             self._hits += 1
+             cache_entry.move_to_end(key)
+             if self._always_checkpoint:
+                 await checkpoint()
+
+             return cast(T, cached_value)
+
+         async with lock:
+             # Check if another task filled the cache while we acquired the lock
+             if (cached_value := cache_entry[key][0]) is initial_missing:
+                 self._misses += 1
+                 if self._maxsize is not None and self._currsize >= self._maxsize:
+                     cache_entry.popitem(last=False)
+                 else:
+                     self._currsize += 1
+
+                 value = await self.__wrapped__(*args, **kwargs)
+                 cache_entry[key] = value, None
+             else:
+                 # Another task filled the cache while we were waiting for the lock
+                 self._hits += 1
+                 cache_entry.move_to_end(key)
+                 value = cast(T, cached_value)
+
+         return value
+
+     def __get__(
+         self, instance: object, owner: type | None = None
+     ) -> _LRUMethodWrapper[T]:
+         wrapper = _LRUMethodWrapper(self, instance)
+         update_wrapper(wrapper, self.__wrapped__)
+         return wrapper
+
+
+ class _LRUCacheWrapper(Generic[T]):
+     def __init__(self, maxsize: int | None, typed: bool, always_checkpoint: bool):
+         self._maxsize = maxsize
+         self._typed = typed
+         self._always_checkpoint = always_checkpoint
+
+     @overload
+     def __call__(  # type: ignore[overload-overlap]
+         self, func: Callable[P, Coroutine[Any, Any, T]], /
+     ) -> AsyncLRUCacheWrapper[P, T]: ...
+
+     @overload
+     def __call__(
+         self, func: Callable[..., T], /
+     ) -> functools._lru_cache_wrapper[T]: ...
+
+     def __call__(
+         self, f: Callable[P, Coroutine[Any, Any, T]] | Callable[..., T], /
+     ) -> AsyncLRUCacheWrapper[P, T] | functools._lru_cache_wrapper[T]:
+         if iscoroutinefunction(f):
+             return AsyncLRUCacheWrapper(
+                 f, self._maxsize, self._typed, self._always_checkpoint
+             )
+
+         return functools.lru_cache(maxsize=self._maxsize, typed=self._typed)(f)  # type: ignore[arg-type]
+
+
+ @overload
+ def cache(  # type: ignore[overload-overlap]
+     func: Callable[P, Coroutine[Any, Any, T]], /
+ ) -> AsyncLRUCacheWrapper[P, T]: ...
+
+
+ @overload
+ def cache(func: Callable[..., T], /) -> functools._lru_cache_wrapper[T]: ...
+
+
+ def cache(
+     func: Callable[..., T] | Callable[P, Coroutine[Any, Any, T]], /
+ ) -> AsyncLRUCacheWrapper[P, T] | functools._lru_cache_wrapper[T]:
+     """
+     A convenient shortcut for :func:`lru_cache` with ``maxsize=None``.
+
+     This is the asynchronous equivalent to :func:`functools.cache`.
+
+     """
+     return lru_cache(maxsize=None)(func)
+
+
+ @overload
+ def lru_cache(
+     *, maxsize: int | None = ..., typed: bool = ..., always_checkpoint: bool = ...
+ ) -> _LRUCacheWrapper[Any]: ...
+
+
+ @overload
+ def lru_cache(  # type: ignore[overload-overlap]
+     func: Callable[P, Coroutine[Any, Any, T]], /
+ ) -> AsyncLRUCacheWrapper[P, T]: ...
+
+
+ @overload
+ def lru_cache(func: Callable[..., T], /) -> functools._lru_cache_wrapper[T]: ...
+
+
+ def lru_cache(
+     func: Callable[P, Coroutine[Any, Any, T]] | Callable[..., T] | None = None,
+     /,
+     *,
+     maxsize: int | None = 128,
+     typed: bool = False,
+     always_checkpoint: bool = False,
+ ) -> (
+     AsyncLRUCacheWrapper[P, T] | functools._lru_cache_wrapper[T] | _LRUCacheWrapper[Any]
+ ):
+     """
+     An asynchronous version of :func:`functools.lru_cache`.
+
+     If a synchronous function is passed, the standard library
+     :func:`functools.lru_cache` is applied instead.
+
+     :param always_checkpoint: if ``True``, every call to the cached function will be
+         guaranteed to yield control to the event loop at least once
+
+     .. note:: Caches and locks are managed on a per-event loop basis.
+
+     """
+     if func is None:
+         return _LRUCacheWrapper[Any](maxsize, typed, always_checkpoint)
+
+     if not callable(func):
+         raise TypeError("the first argument must be callable")
+
+     return _LRUCacheWrapper[T](maxsize, typed, always_checkpoint)(func)
+
+
+ @overload
+ async def reduce(
+     function: Callable[[T, S], Awaitable[T]],
+     iterable: Iterable[S] | AsyncIterable[S],
+     /,
+     initial: T,
+ ) -> T: ...
+
+
+ @overload
+ async def reduce(
+     function: Callable[[T, T], Awaitable[T]],
+     iterable: Iterable[T] | AsyncIterable[T],
+     /,
+ ) -> T: ...
+
+
+ async def reduce(  # type: ignore[misc]
+     function: Callable[[T, T], Awaitable[T]] | Callable[[T, S], Awaitable[T]],
+     iterable: Iterable[T] | Iterable[S] | AsyncIterable[T] | AsyncIterable[S],
+     /,
+     initial: T | _InitialMissingType = initial_missing,
+ ) -> T:
+     """
+     Asynchronous version of :func:`functools.reduce`.
+
+     :param function: a coroutine function that takes two arguments: the accumulated
+         value and the next element from the iterable
+     :param iterable: an iterable or async iterable
+     :param initial: the initial value (if missing, the first element of the iterable
+         is used as the initial value)
+
+     """
+     element: Any
+     function_called = False
+     if isinstance(iterable, AsyncIterable):
+         async_it = iterable.__aiter__()
+         if initial is initial_missing:
+             try:
+                 value = cast(T, await async_it.__anext__())
+             except StopAsyncIteration:
+                 raise TypeError(
+                     "reduce() of empty sequence with no initial value"
+                 ) from None
+         else:
+             value = cast(T, initial)
+
+         async for element in async_it:
+             value = await function(value, element)
+             function_called = True
+     elif isinstance(iterable, Iterable):
+         it = iter(iterable)
+         if initial is initial_missing:
+             try:
+                 value = cast(T, next(it))
+             except StopIteration:
+                 raise TypeError(
+                     "reduce() of empty sequence with no initial value"
+                 ) from None
+         else:
+             value = cast(T, initial)
+
+         for element in it:
+             value = await function(value, element)
+             function_called = True
+     else:
+         raise TypeError("reduce() argument 2 must be an iterable or async iterable")
+
+     # Make sure there is at least one checkpoint, even if an empty iterable and an
+     # initial value were given
+     if not function_called:
+         await checkpoint()
+
+     return value
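The `AsyncLRUCacheWrapper` above deliberately mirrors the bookkeeping contract of the synchronous original. For comparison, the standard library's :func:`functools.lru_cache` exposes the same `cache_info()` / `cache_clear()` behavior that the async wrapper reimplements (this snippet uses only the stdlib, not the module shown above):

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def square(x: int) -> int:
    return x * x

# First call misses, repeat call hits, a new argument misses again
square(2)
square(2)
square(3)
info = square.cache_info()
assert (info.hits, info.misses, info.currsize) == (1, 2, 2)

# cache_clear() empties the cache and resets all counters
square.cache_clear()
assert square.cache_info().currsize == 0
```

The async wrapper adds one thing the stdlib cannot: a per-key `Lock`, so that concurrent tasks calling with the same arguments await a single underlying call instead of racing.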
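The `reduce` above is a left fold over an (async) iterable. A minimal stdlib-only sketch of the same semantics, written against plain `asyncio` so it runs without this module (the names `areduce`, `_MISSING`, `add`, and `numbers` are illustrative, not part of the API shown above):

```python
import asyncio
from typing import Any, AsyncIterable, Awaitable, Callable

_MISSING = object()  # sentinel standing in for this module's `initial_missing`


async def areduce(
    function: Callable[[Any, Any], Awaitable[Any]],
    iterable: AsyncIterable[Any],
    initial: Any = _MISSING,
) -> Any:
    """Fold an async iterable left-to-right, awaiting each combining step."""
    it = iterable.__aiter__()
    if initial is _MISSING:
        try:
            value = await it.__anext__()  # first element becomes the seed
        except StopAsyncIteration:
            raise TypeError(
                "reduce() of empty sequence with no initial value"
            ) from None
    else:
        value = initial

    async for element in it:
        value = await function(value, element)

    return value


async def main() -> int:
    async def numbers():
        for n in (1, 2, 3, 4):
            yield n

    async def add(a: int, b: int) -> int:
        return a + b

    return await areduce(add, numbers())


print(asyncio.run(main()))  # 10
```

The real function additionally accepts plain synchronous iterables and guarantees at least one checkpoint even when the fold body never runs.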
venv/lib/python3.10/site-packages/anyio/lowlevel.py ADDED
@@ -0,0 +1,196 @@
+ from __future__ import annotations
+
+ __all__ = (
+     "EventLoopToken",
+     "RunvarToken",
+     "RunVar",
+     "checkpoint",
+     "checkpoint_if_cancelled",
+     "cancel_shielded_checkpoint",
+     "current_token",
+ )
+
+ import enum
+ from dataclasses import dataclass
+ from types import TracebackType
+ from typing import Any, Generic, Literal, TypeVar, final, overload
+ from weakref import WeakKeyDictionary
+
+ from ._core._eventloop import get_async_backend
+ from .abc import AsyncBackend
+
+ T = TypeVar("T")
+ D = TypeVar("D")
+
+
+ async def checkpoint() -> None:
+     """
+     Check for cancellation and allow the scheduler to switch to another task.
+
+     Equivalent to (but more efficient than)::
+
+         await checkpoint_if_cancelled()
+         await cancel_shielded_checkpoint()
+
+     .. versionadded:: 3.0
+
+     """
+     await get_async_backend().checkpoint()
+
+
+ async def checkpoint_if_cancelled() -> None:
+     """
+     Enter a checkpoint if the enclosing cancel scope has been cancelled.
+
+     This does not allow the scheduler to switch to a different task.
+
+     .. versionadded:: 3.0
+
+     """
+     await get_async_backend().checkpoint_if_cancelled()
+
+
+ async def cancel_shielded_checkpoint() -> None:
+     """
+     Allow the scheduler to switch to another task, but without checking for
+     cancellation.
+
+     Equivalent to (but potentially more efficient than)::
+
+         with CancelScope(shield=True):
+             await checkpoint()
+
+     .. versionadded:: 3.0
+
+     """
+     await get_async_backend().cancel_shielded_checkpoint()
+
+
+ @final
+ @dataclass(frozen=True, repr=False)
+ class EventLoopToken:
+     """
+     An opaque object that holds a reference to an event loop.
+
+     .. versionadded:: 4.11.0
+     """
+
+     backend_class: type[AsyncBackend]
+     native_token: object
+
+
+ def current_token() -> EventLoopToken:
+     """
+     Return a token object that can be used to call code in the current event loop
+     from another thread.
+
+     :raises NoEventLoopError: if no supported asynchronous event loop is running in
+         the current thread
+
+     .. versionadded:: 4.11.0
+
+     """
+     backend_class = get_async_backend()
+     raw_token = backend_class.current_token()
+     return EventLoopToken(backend_class, raw_token)
+
+
+ _run_vars: WeakKeyDictionary[object, dict[RunVar[Any], Any]] = WeakKeyDictionary()
+
+
+ class _NoValueSet(enum.Enum):
+     NO_VALUE_SET = enum.auto()
+
+
+ class RunvarToken(Generic[T]):
+     __slots__ = "_var", "_value", "_redeemed"
+
+     def __init__(self, var: RunVar[T], value: T | Literal[_NoValueSet.NO_VALUE_SET]):
+         self._var = var
+         self._value: T | Literal[_NoValueSet.NO_VALUE_SET] = value
+         self._redeemed = False
+
+     def __enter__(self) -> RunvarToken[T]:
+         return self
+
+     def __exit__(
+         self,
+         exc_type: type[BaseException] | None,
+         exc_val: BaseException | None,
+         exc_tb: TracebackType | None,
+     ) -> None:
+         self._var.reset(self)
+
+
+ class RunVar(Generic[T]):
+     """
+     Like a :class:`~contextvars.ContextVar`, except scoped to the running event loop.
+
+     Can be used as a context manager that, just like
+     :class:`~contextvars.ContextVar`, will reset the variable to its previous value
+     when the context block is exited.
+     """
+
+     __slots__ = "_name", "_default"
+
+     NO_VALUE_SET: Literal[_NoValueSet.NO_VALUE_SET] = _NoValueSet.NO_VALUE_SET
+
+     def __init__(
+         self, name: str, default: T | Literal[_NoValueSet.NO_VALUE_SET] = NO_VALUE_SET
+     ):
+         self._name = name
+         self._default = default
+
+     @property
+     def _current_vars(self) -> dict[RunVar[T], T]:
+         native_token = current_token().native_token
+         try:
+             return _run_vars[native_token]
+         except KeyError:
+             run_vars = _run_vars[native_token] = {}
+             return run_vars
+
+     @overload
+     def get(self, default: D) -> T | D: ...
+
+     @overload
+     def get(self) -> T: ...
+
+     def get(
+         self, default: D | Literal[_NoValueSet.NO_VALUE_SET] = NO_VALUE_SET
+     ) -> T | D:
+         try:
+             return self._current_vars[self]
+         except KeyError:
+             if default is not RunVar.NO_VALUE_SET:
+                 return default
+             elif self._default is not RunVar.NO_VALUE_SET:
+                 return self._default
+
+             raise LookupError(
+                 f'Run variable "{self._name}" has no value and no default set'
+             )
+
+     def set(self, value: T) -> RunvarToken[T]:
+         current_vars = self._current_vars
+         token = RunvarToken(self, current_vars.get(self, RunVar.NO_VALUE_SET))
+         current_vars[self] = value
+         return token
+
+     def reset(self, token: RunvarToken[T]) -> None:
+         if token._var is not self:
+             raise ValueError("This token does not belong to this RunVar")
+
+         if token._redeemed:
+             raise ValueError("This token has already been used")
+
+         if token._value is _NoValueSet.NO_VALUE_SET:
+             try:
+                 del self._current_vars[self]
+             except KeyError:
+                 pass
+         else:
+             self._current_vars[self] = token._value
+
+         token._redeemed = True
+
+     def __repr__(self) -> str:
+         return f"<RunVar name={self._name!r}>"
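The heart of `RunVar` is the token-based set/reset protocol: `set()` snapshots the previous value into a token, and `reset()` restores it (deleting the entry if nothing was set before). A toy, stdlib-only sketch of that protocol — scoped to a plain dict instead of a running event loop, with `ToyVar` and `_NO_VALUE` being illustrative names, not anyio API:

```python
from typing import Any

_NO_VALUE = object()  # sentinel, like _NoValueSet.NO_VALUE_SET above


class ToyVar:
    def __init__(self, name: str, default: Any = _NO_VALUE) -> None:
        self._name = name
        self._default = default
        self._store: dict[ToyVar, Any] = {}  # stand-in for the per-loop dict

    def get(self, default: Any = _NO_VALUE) -> Any:
        if self in self._store:
            return self._store[self]
        if default is not _NO_VALUE:
            return default
        if self._default is not _NO_VALUE:
            return self._default
        raise LookupError(f"{self._name!r} has no value and no default set")

    def set(self, value: Any) -> tuple["ToyVar", Any]:
        # Snapshot the old value into the token before overwriting
        token = (self, self._store.get(self, _NO_VALUE))
        self._store[self] = value
        return token

    def reset(self, token: tuple["ToyVar", Any]) -> None:
        var, old = token
        if var is not self:
            raise ValueError("token does not belong to this variable")
        if old is _NO_VALUE:
            self._store.pop(self, None)  # there was no prior value: delete
        else:
            self._store[self] = old      # otherwise restore the snapshot


v = ToyVar("flag", default="off")
token = v.set("on")
assert v.get() == "on"
v.reset(token)  # back to the default, since nothing was set before
assert v.get() == "off"
```

The real class adds the pieces the toy omits: keying the store by the event loop's native token (via a `WeakKeyDictionary`, so loop teardown drops the values) and rejecting already-redeemed tokens.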
venv/lib/python3.10/site-packages/anyio/py.typed ADDED
File without changes
venv/lib/python3.10/site-packages/anyio/pytest_plugin.py ADDED
@@ -0,0 +1,302 @@
+ from __future__ import annotations
+
+ import socket
+ import sys
+ from collections.abc import Callable, Generator, Iterator
+ from contextlib import ExitStack, contextmanager
+ from inspect import isasyncgenfunction, iscoroutinefunction, ismethod
+ from typing import Any, cast
+
+ import pytest
+ from _pytest.fixtures import SubRequest
+ from _pytest.outcomes import Exit
+
+ from . import get_available_backends
+ from ._core._eventloop import (
+     current_async_library,
+     get_async_backend,
+     reset_current_async_library,
+     set_current_async_library,
+ )
+ from ._core._exceptions import iterate_exceptions
+ from .abc import TestRunner
+
+ if sys.version_info < (3, 11):
+     from exceptiongroup import ExceptionGroup
+
+ _current_runner: TestRunner | None = None
+ _runner_stack: ExitStack | None = None
+ _runner_leases = 0
+
+
+ def extract_backend_and_options(backend: object) -> tuple[str, dict[str, Any]]:
+     if isinstance(backend, str):
+         return backend, {}
+     elif isinstance(backend, tuple) and len(backend) == 2:
+         if isinstance(backend[0], str) and isinstance(backend[1], dict):
+             return cast(tuple[str, dict[str, Any]], backend)
+
+     raise TypeError("anyio_backend must be either a string or tuple of (string, dict)")
+
+
+ @contextmanager
+ def get_runner(
+     backend_name: str, backend_options: dict[str, Any]
+ ) -> Iterator[TestRunner]:
+     global _current_runner, _runner_leases, _runner_stack
+     if _current_runner is None:
+         asynclib = get_async_backend(backend_name)
+         _runner_stack = ExitStack()
+         if current_async_library() is None:
+             # Since we're in control of the event loop, we can cache the name of the
+             # async library
+             token = set_current_async_library(backend_name)
+             _runner_stack.callback(reset_current_async_library, token)
+
+         backend_options = backend_options or {}
+         _current_runner = _runner_stack.enter_context(
+             asynclib.create_test_runner(backend_options)
+         )
+
+     _runner_leases += 1
+     try:
+         yield _current_runner
+     finally:
+         _runner_leases -= 1
+         if not _runner_leases:
+             assert _runner_stack is not None
+             _runner_stack.close()
+             _runner_stack = _current_runner = None
+
+
+ def pytest_addoption(parser: pytest.Parser) -> None:
+     parser.addini(
+         "anyio_mode",
+         default="strict",
+         help='AnyIO plugin mode (either "strict" or "auto")',
+     )
+
+
+ def pytest_configure(config: pytest.Config) -> None:
+     config.addinivalue_line(
+         "markers",
+         "anyio: mark the (coroutine function) test to be run asynchronously via anyio.",
+     )
+     if (
+         config.getini("anyio_mode") == "auto"
+         and config.pluginmanager.has_plugin("asyncio")
+         and config.getini("asyncio_mode") == "auto"
+     ):
+         config.issue_config_time_warning(
+             pytest.PytestConfigWarning(
+                 "AnyIO auto mode has been enabled together with pytest-asyncio auto "
+                 "mode. This may cause unexpected behavior."
+             ),
+             1,
+         )
+
+
+ @pytest.hookimpl(hookwrapper=True)
+ def pytest_fixture_setup(fixturedef: Any, request: Any) -> Generator[Any]:
+     def wrapper(anyio_backend: Any, request: SubRequest, **kwargs: Any) -> Any:
+         # Rebind any fixture methods to the request instance
+         if (
+             request.instance
+             and ismethod(func)
+             and type(func.__self__) is type(request.instance)
+         ):
+             local_func = func.__func__.__get__(request.instance)
+         else:
+             local_func = func
+
+         backend_name, backend_options = extract_backend_and_options(anyio_backend)
+         if has_backend_arg:
+             kwargs["anyio_backend"] = anyio_backend
+
+         if has_request_arg:
+             kwargs["request"] = request
+
+         with get_runner(backend_name, backend_options) as runner:
+             if isasyncgenfunction(local_func):
+                 yield from runner.run_asyncgen_fixture(local_func, kwargs)
+             else:
+                 yield runner.run_fixture(local_func, kwargs)
+
+     # Only apply this to coroutine functions and async generator functions in
+     # requests that involve the anyio_backend fixture
+     func = fixturedef.func
+     if isasyncgenfunction(func) or iscoroutinefunction(func):
+         if "anyio_backend" in request.fixturenames:
+             fixturedef.func = wrapper
+             original_argname = fixturedef.argnames
+
+             if not (has_backend_arg := "anyio_backend" in fixturedef.argnames):
+                 fixturedef.argnames += ("anyio_backend",)
+
+             if not (has_request_arg := "request" in fixturedef.argnames):
+                 fixturedef.argnames += ("request",)
+
+             try:
+                 return (yield)
+             finally:
+                 fixturedef.func = func
+                 fixturedef.argnames = original_argname
+
+     return (yield)
+
+
+ @pytest.hookimpl(tryfirst=True)
+ def pytest_pycollect_makeitem(
+     collector: pytest.Module | pytest.Class, name: str, obj: object
+ ) -> None:
+     if collector.istestfunction(obj, name):
+         inner_func = obj.hypothesis.inner_test if hasattr(obj, "hypothesis") else obj
+         if iscoroutinefunction(inner_func):
+             anyio_auto_mode = collector.config.getini("anyio_mode") == "auto"
+             marker = collector.get_closest_marker("anyio")
+             own_markers = getattr(obj, "pytestmark", ())
+             if (
+                 anyio_auto_mode
+                 or marker
+                 or any(marker.name == "anyio" for marker in own_markers)
+             ):
+                 pytest.mark.usefixtures("anyio_backend")(obj)
+
+
+ @pytest.hookimpl(tryfirst=True)
+ def pytest_pyfunc_call(pyfuncitem: Any) -> bool | None:
+     def run_with_hypothesis(**kwargs: Any) -> None:
+         with get_runner(backend_name, backend_options) as runner:
+             runner.run_test(original_func, kwargs)
+
+     backend = pyfuncitem.funcargs.get("anyio_backend")
+     if backend:
+         backend_name, backend_options = extract_backend_and_options(backend)
+
+         if hasattr(pyfuncitem.obj, "hypothesis"):
+             # Wrap the inner test function unless it's already wrapped
+             original_func = pyfuncitem.obj.hypothesis.inner_test
+             if original_func.__qualname__ != run_with_hypothesis.__qualname__:
+                 if iscoroutinefunction(original_func):
+                     pyfuncitem.obj.hypothesis.inner_test = run_with_hypothesis
+
+             return None
+
+         if iscoroutinefunction(pyfuncitem.obj):
+             funcargs = pyfuncitem.funcargs
+             testargs = {arg: funcargs[arg] for arg in pyfuncitem._fixtureinfo.argnames}
+             with get_runner(backend_name, backend_options) as runner:
+                 try:
+                     runner.run_test(pyfuncitem.obj, testargs)
+                 except ExceptionGroup as excgrp:
+                     for exc in iterate_exceptions(excgrp):
+                         if isinstance(exc, (Exit, KeyboardInterrupt, SystemExit)):
+                             raise exc from excgrp
+
+                     raise
+
+             return True
+
+     return None
+
+
+ @pytest.fixture(scope="module", params=get_available_backends())
+ def anyio_backend(request: Any) -> Any:
+     return request.param
+
+
+ @pytest.fixture
+ def anyio_backend_name(anyio_backend: Any) -> str:
+     if isinstance(anyio_backend, str):
+         return anyio_backend
+     else:
+         return anyio_backend[0]
+
+
+ @pytest.fixture
+ def anyio_backend_options(anyio_backend: Any) -> dict[str, Any]:
+     if isinstance(anyio_backend, str):
+         return {}
+     else:
+         return anyio_backend[1]
+
+
+ class FreePortFactory:
+     """
+     Manages port generation based on specified socket kind, ensuring no duplicate
+     ports are generated.
+
+     This class provides functionality for generating available free ports on the
+     system. It is initialized with a specific socket kind and can generate ports
+     for given address families while avoiding reuse of previously generated ports.
+
+     Users should not instantiate this class directly, but use the
+     ``free_tcp_port_factory`` and ``free_udp_port_factory`` fixtures instead. For
+     simple use cases, ``free_tcp_port`` and ``free_udp_port`` can be used instead.
+     """
+
+     def __init__(self, kind: socket.SocketKind) -> None:
+         self._kind = kind
+         self._generated = set[int]()
+
+     @property
+     def kind(self) -> socket.SocketKind:
+         """
+         The type of socket connection (e.g., :data:`~socket.SOCK_STREAM` or
+         :data:`~socket.SOCK_DGRAM`) used to bind for checking port availability
+
+         """
+         return self._kind
+
+     def __call__(self, family: socket.AddressFamily | None = None) -> int:
+         """
+         Return an unbound port for the given address family.
+
+         :param family: if omitted, both IPv4 and IPv6 addresses will be tried
+         :return: a port number
+
+         """
+         if family is not None:
+             families = [family]
+         else:
+             families = [socket.AF_INET]
+             if socket.has_ipv6:
+                 families.append(socket.AF_INET6)
+
+         while True:
+             port = 0
+             with ExitStack() as stack:
+                 for family in families:
+                     sock = stack.enter_context(socket.socket(family, self._kind))
+                     addr = "::1" if family == socket.AF_INET6 else "127.0.0.1"
+                     try:
+                         sock.bind((addr, port))
+                     except OSError:
+                         break
+
+                     if not port:
+                         port = sock.getsockname()[1]
+                 else:
+                     if port not in self._generated:
+                         self._generated.add(port)
+                         return port
+
+
+ @pytest.fixture(scope="session")
+ def free_tcp_port_factory() -> FreePortFactory:
+     return FreePortFactory(socket.SOCK_STREAM)
+
+
+ @pytest.fixture(scope="session")
+ def free_udp_port_factory() -> FreePortFactory:
+     return FreePortFactory(socket.SOCK_DGRAM)
+
+
+ @pytest.fixture
+ def free_tcp_port(free_tcp_port_factory: Callable[[], int]) -> int:
+     return free_tcp_port_factory()
+
+
+ @pytest.fixture
+ def free_udp_port(free_udp_port_factory: Callable[[], int]) -> int:
+     return free_udp_port_factory()
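`FreePortFactory.__call__` above relies on a standard OS trick: binding a socket to port 0 asks the kernel to assign any free port, which `getsockname()` then reports back. Stripped of the IPv4/IPv6 double bind and the duplicate-port bookkeeping, the core of it looks like this (the helper name `free_tcp_port` here is illustrative, not the fixture itself):

```python
import socket


def free_tcp_port() -> int:
    # Bind to port 0 and let the OS pick a free port, then read the
    # assigned port back from the socket before closing it.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


port = free_tcp_port()
assert 1024 <= port <= 65535
```

Note the inherent race: the port is free at the moment of the check but could be taken by another process before the test binds it, which is why the factory also tracks ports it has already handed out.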
venv/lib/python3.10/site-packages/anyio/to_interpreter.py ADDED
@@ -0,0 +1,246 @@
+ from __future__ import annotations
+
+ __all__ = (
+     "run_sync",
+     "current_default_interpreter_limiter",
+ )
+
+ import atexit
+ import os
+ import sys
+ from collections import deque
+ from collections.abc import Callable
+ from typing import Any, Final, TypeVar
+
+ from . import current_time, to_thread
+ from ._core._exceptions import BrokenWorkerInterpreter
+ from ._core._synchronization import CapacityLimiter
+ from .lowlevel import RunVar
+
+ if sys.version_info >= (3, 11):
+     from typing import TypeVarTuple, Unpack
+ else:
+     from typing_extensions import TypeVarTuple, Unpack
+
+ if sys.version_info >= (3, 14):
+     from concurrent.interpreters import ExecutionFailed, create
+
+     def _interp_call(
+         func: Callable[..., Any], args: tuple[Any, ...]
+     ) -> tuple[Any, bool]:
+         try:
+             retval = func(*args)
+         except BaseException as exc:
+             return exc, True
+         else:
+             return retval, False
+
+     class _Worker:
+         last_used: float = 0
+
+         def __init__(self) -> None:
+             self._interpreter = create()
+
+         def destroy(self) -> None:
+             self._interpreter.close()
+
+         def call(
+             self,
+             func: Callable[..., T_Retval],
+             args: tuple[Any, ...],
+         ) -> T_Retval:
+             try:
+                 res, is_exception = self._interpreter.call(_interp_call, func, args)
+             except ExecutionFailed as exc:
+                 raise BrokenWorkerInterpreter(exc.excinfo) from exc
+
+             if is_exception:
+                 raise res
+
+             return res
+ elif sys.version_info >= (3, 13):
+     import _interpqueues
+     import _interpreters
+
+     UNBOUND: Final = 2  # I have no clue how this works, but it was used in the stdlib
+     FMT_UNPICKLED: Final = 0
+     FMT_PICKLED: Final = 1
+     QUEUE_PICKLE_ARGS: Final = (FMT_PICKLED, UNBOUND)
+     QUEUE_UNPICKLE_ARGS: Final = (FMT_UNPICKLED, UNBOUND)
+
+     _run_func = compile(
+         """
+ import _interpqueues
+ from _interpreters import NotShareableError
+ from pickle import loads, dumps, HIGHEST_PROTOCOL
+
+ QUEUE_PICKLE_ARGS = (1, 2)
+ QUEUE_UNPICKLE_ARGS = (0, 2)
+
+ item = _interpqueues.get(queue_id)[0]
+ try:
+     func, args = loads(item)
+     retval = func(*args)
+ except BaseException as exc:
+     is_exception = True
+     retval = exc
+ else:
+     is_exception = False
+
+ try:
+     _interpqueues.put(queue_id, (retval, is_exception), *QUEUE_UNPICKLE_ARGS)
+ except NotShareableError:
+     retval = dumps(retval, HIGHEST_PROTOCOL)
+     _interpqueues.put(queue_id, (retval, is_exception), *QUEUE_PICKLE_ARGS)
+ """,
+         "<string>",
+         "exec",
+     )
+
+     class _Worker:
+         last_used: float = 0
+
+         def __init__(self) -> None:
+             self._interpreter_id = _interpreters.create()
+             self._queue_id = _interpqueues.create(1, *QUEUE_UNPICKLE_ARGS)
+             _interpreters.set___main___attrs(
+                 self._interpreter_id, {"queue_id": self._queue_id}
+             )
+
+         def destroy(self) -> None:
+             _interpqueues.destroy(self._queue_id)
+             _interpreters.destroy(self._interpreter_id)
+
+         def call(
+             self,
+             func: Callable[..., T_Retval],
+             args: tuple[Any, ...],
+         ) -> T_Retval:
+             import pickle
+
+             item = pickle.dumps((func, args), pickle.HIGHEST_PROTOCOL)
+             _interpqueues.put(self._queue_id, item, *QUEUE_PICKLE_ARGS)
+             exc_info = _interpreters.exec(self._interpreter_id, _run_func)
+             if exc_info:
+                 raise BrokenWorkerInterpreter(exc_info)
+
+             res = _interpqueues.get(self._queue_id)
+             (res, is_exception), fmt = res[:2]
+             if fmt == FMT_PICKLED:
+                 res = pickle.loads(res)
+
+             if is_exception:
+                 raise res
+
+             return res
+ else:
+
+     class _Worker:
+         last_used: float = 0
+
+         def __init__(self) -> None:
+             raise RuntimeError("subinterpreters require at least Python 3.13")
+
+         def call(
+             self,
+             func: Callable[..., T_Retval],
+             args: tuple[Any, ...],
+         ) -> T_Retval:
+             raise NotImplementedError
+
+         def destroy(self) -> None:
+             pass
+
+
+ DEFAULT_CPU_COUNT: Final = 8  # this is just an arbitrarily selected value
+ MAX_WORKER_IDLE_TIME = (
+     30  # seconds a subinterpreter can be idle before becoming eligible for pruning
+ )
+
+ T_Retval = TypeVar("T_Retval")
+ PosArgsT = TypeVarTuple("PosArgsT")
+
+ _idle_workers = RunVar[deque[_Worker]]("_available_workers")
+ _default_interpreter_limiter = RunVar[CapacityLimiter]("_default_interpreter_limiter")
+
+
+ def _stop_workers(workers: deque[_Worker]) -> None:
+     for worker in workers:
+         worker.destroy()
+
+     workers.clear()
+
+
+ async def run_sync(
+     func: Callable[[Unpack[PosArgsT]], T_Retval],
+     *args: Unpack[PosArgsT],
+     limiter: CapacityLimiter | None = None,
+ ) -> T_Retval:
+     """
+     Call the given function with the given arguments in a subinterpreter.
+
+     .. warning:: On Python 3.13, the :mod:`concurrent.interpreters` module was not
+         yet available, so the code path for that Python version relies on an
+         undocumented, private API. As such, it is recommended to not rely on this
+         function for anything mission-critical on Python 3.13.
+
+     :param func: a callable
+     :param args: the positional arguments for the callable
+     :param limiter: capacity limiter to use to limit the total number of
+         subinterpreters running (if omitted, the default limiter is used)
+     :return: the result of the call
+     :raises BrokenWorkerInterpreter: if there's an internal error in a subinterpreter
+
+     """
+     if limiter is None:
+         limiter = current_default_interpreter_limiter()
+
+     try:
+         idle_workers = _idle_workers.get()
+     except LookupError:
+         idle_workers = deque()
+         _idle_workers.set(idle_workers)
+         atexit.register(_stop_workers, idle_workers)
+
+     async with limiter:
+         try:
+             worker = idle_workers.pop()
+         except IndexError:
+             worker = _Worker()
+
+         try:
+             return await to_thread.run_sync(
+                 worker.call,
+                 func,
+                 args,
+                 limiter=limiter,
+             )
+         finally:
+             # Prune workers that have been idle for too long
+             now = current_time()
+             while idle_workers:
+                 if now - idle_workers[0].last_used <= MAX_WORKER_IDLE_TIME:
+                     break
+
+                 await to_thread.run_sync(
+                     idle_workers.popleft().destroy, limiter=limiter
+                 )
+
+             worker.last_used = current_time()
+             idle_workers.append(worker)
+
+
+ def current_default_interpreter_limiter() -> CapacityLimiter:
+     """
+     Return the capacity limiter used by default to limit the number of concurrently
+     running subinterpreters.
+
+     Defaults to the number of CPU cores.
+
+     :return: a capacity limiter object
+
+     """
+     try:
+         return _default_interpreter_limiter.get()
+     except LookupError:
+         limiter = CapacityLimiter(os.cpu_count() or DEFAULT_CPU_COUNT)
+         _default_interpreter_limiter.set(limiter)
+         return limiter
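Both worker code paths above use the same wire protocol: the caller sends a pickled `(func, args)` tuple, and the worker replies with a `(retval, is_exception)` pair so exceptions can be re-raised on the caller's side. That round-trip can be sketched in isolation, with no subinterpreters involved (`run_pickled` is an illustrative stand-in for the worker side, not anyio API):

```python
import pickle


def run_pickled(payload: bytes) -> bytes:
    # Worker side: unpickle (func, args), call it, and reply with
    # (retval, is_exception) so the caller can re-raise failures.
    func, args = pickle.loads(payload)
    try:
        retval = func(*args)
    except BaseException as exc:
        result = (exc, True)
    else:
        result = (retval, False)
    return pickle.dumps(result, pickle.HIGHEST_PROTOCOL)


# Caller side: pickle the call, "send" it, then unpack the reply.
reply = run_pickled(pickle.dumps((divmod, (7, 3)), pickle.HIGHEST_PROTOCOL))
retval, is_exception = pickle.loads(reply)
assert not is_exception and retval == (2, 1)
```

The real 3.13 path adds one refinement: it first tries to put the result on the interpreter queue unpickled, and only falls back to pickling when the object is not shareable across interpreters.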
venv/lib/python3.10/site-packages/typing_extensions.py ADDED
The diff for this file is too large to render. See raw diff
 
venv/pyvenv.cfg ADDED
@@ -0,0 +1,3 @@
+ home = /usr/bin
+ include-system-site-packages = false
+ version = 3.10.12