Initial upload: Anima Preview V2 diffusers conversion
- .gitattributes +3 -0
- LICENSE.md +53 -0
- README.md +50 -0
- anima_comparison.json +1 -0
- example.png +3 -0
- llm_adapter/config.json +12 -0
- llm_adapter/diffusion_pytorch_model.safetensors +3 -0
- llm_adapter/modeling_llm_adapter.py +215 -0
- model_index.json +32 -0
- pipeline.py +371 -0
- scheduler/scheduler_config.json +6 -0
- t5_tokenizer/tokenizer.json +0 -0
- t5_tokenizer/tokenizer_config.json +113 -0
- text_encoder/config.json +22 -0
- text_encoder/generation_config.json +13 -0
- text_encoder/merges.txt +0 -0
- text_encoder/model.safetensors +3 -0
- text_encoder/tokenizer.json +3 -0
- text_encoder/tokenizer_config.json +239 -0
- text_encoder/vocab.json +0 -0
- tokenizer/chat_template.jinja +89 -0
- tokenizer/tokenizer.json +3 -0
- tokenizer/tokenizer_config.json +29 -0
- transformer/config.json +36 -0
- transformer/diffusion_pytorch_model.safetensors +3 -0
- vae/config.json +56 -0
- vae/diffusion_pytorch_model.safetensors +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+example.png filter=lfs diff=lfs merge=lfs -text
+text_encoder/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
LICENSE.md
ADDED
@@ -0,0 +1,53 @@
CircleStone Labs Non-Commercial License v1.0

CircleStone Labs LLC (“we” or “our” or “Company”) is pleased to make the weights, parameters, and inference code for the CircleStone Models (as defined below) freely available for your non-commercial and non-production use as set forth in this CircleStone Labs Non-Commercial License (“License”). “Models” includes all model elements, such as weights, algorithms, software, checkpoints, parameters, source code (inference code, evaluation code, and if applicable, fine-tuning code) and any other materials associated with the CircleStone AI models made available by Company under this License, including if any, the technical documentation, manuals, and instructions for the use and operation thereof (individually and collectively, the “CircleStone Models”). Note that we may also make available certain elements of what is included in the definition of “CircleStone Model” under a separate license, such as the inference code, and nothing in this License will be deemed to restrict or limit any other licenses granted by us in such elements.

By downloading, accessing, using, Distributing (as defined below), or creating a Derivative (as defined below) of the CircleStone Model, you agree to the terms of this License. If you do not agree to this License, then you do not have any rights to access, use, Distribute or create a Derivative of the CircleStone Model and you must immediately cease using the CircleStone Model. If you are agreeing to be bound by the terms of this License on behalf of your employer or other entity, you represent and warrant to us that you have full legal authority to bind your employer or such entity to this License. If you do not have the requisite authority, you may not accept the License or access the CircleStone Model on behalf of your employer or other entity.

1. Definitions.

- a. “Derivative” means any (i) modified version of the CircleStone Model (including but not limited to any customized or fine-tuned version thereof), (ii) work based on the CircleStone Model, including Low-rank Adaptations (“LoRAs”) and textual inversions based on a CircleStone Model, or (iii) any other derivative work thereof. For the avoidance of doubt, Outputs are not considered Derivatives under this License.
- b. “Distribution” or “Distribute” or “Distributing” means providing or making available, by any means, a copy of the CircleStone Models and/or the Derivatives as the case may be, including by making the CircleStone Models and/or the Derivatives available to third-parties on a hosted basis.
- c. “Non-Commercial Purpose” means any of the following uses, but only so far as you do not receive any direct or indirect payment arising from the use of the CircleStone Model, or Derivatives: (i) personal use for research, experimentation, and testing for the benefit of public knowledge, personal study, private entertainment, hobby projects, or otherwise not directly or indirectly connected to any commercial activities, business operations, or employment responsibilities; (ii) internal use by commercial or for-profit entities for testing, evaluation, or non-commercial research and development in a non-production environment; and (iii) use by any charitable organization for charitable purposes, or for testing or evaluation. For clarity, use (a) for revenue-generating activity, (b) in direct interactions with or that has impact on third-party end users, or (c) to train, fine tune, or distill other models for commercial use, in each case, is not a Non-Commercial Purpose.
- d. “Outputs” means any content generated by the operation of the CircleStone Models or Derivatives from an input (such as an image input) or prompt (i.e., text instructions) provided by users. For the avoidance of doubt, Outputs do not include any components of the CircleStone Models, such as any fine-tuned versions of the CircleStone Models, any LoRAs or textual inversions, the weights, or parameters.
- e. “you” or “your” means the individual or entity entering into this License with Company.

2. License Grant.

- a. License. Subject to your compliance with this License, Company grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free, and limited license to access, use, create Derivatives of, and Distribute the CircleStone Models and Derivatives solely for your Non-Commercial Purposes. The foregoing license is personal to you, and you may not assign or sublicense this License or any other rights or obligations under this License without Company’s prior written consent; any such assignment or sublicense will be void and will automatically and immediately terminate this License. Any restrictions set forth herein regarding the CircleStone Model also apply to any Derivative you create or that are created on your behalf.
- b. Non-Commercial Use Only. You may only access, use, Distribute, or create Derivatives of the CircleStone Model or Derivatives for Non-Commercial Purposes. If you want to use a CircleStone Model or a Derivative for any purpose that is not expressly authorized under this License, such as for a commercial activity, you must request a license from Company, which Company may grant to you in Company’s sole discretion and which additional use may be subject to a fee, royalty or other revenue share. Please refer to the details in the Model Card if you would like a commercial license.
- c. Reserved Rights. The grant of rights expressly set forth in this License are the complete grant of rights to you in the CircleStone Model, and no other licenses are granted, whether by waiver, estoppel, implication, equity, or otherwise except as otherwise agreed in writing by Company. Company and its licensors reserve all rights not expressly granted by this License.
- d. Outputs. We claim no ownership rights in and to the Outputs. You are solely responsible for the Outputs you generate and their subsequent uses in accordance with this License. You may use Output for any purpose (including for commercial purposes), except as expressly prohibited herein. You may not use the Output to train, fine-tune, or distill a model that is competitive with a CircleStone Model.
- e. You are solely responsible for your use of all Outputs generated through the Models and/or any Derivatives, including for ensuring that such Outputs are used in a manner that is non-infringing and otherwise compliant with all applicable laws, rules and regulations.

3. Distribution. Subject to this License, you may Distribute copies of the CircleStone Model and/or Derivatives made by you, under the following conditions:

- a. you must make available a copy of this License to third-party recipients of the CircleStone Models and/or Derivatives you Distribute, and specify that any rights to use the CircleStone Models and/or Derivatives shall be directly granted by Company to said third-party recipients pursuant to this License;
- b. you must prominently display the following notice alongside the Distribution of the CircleStone Model or Derivative (such as via a “Notice” text file distributed as part of such CircleStone Model or Derivative) (the “Attribution Notice”):

“The CircleStone Model is licensed by CircleStone Labs LLC under the CircleStone Non-Commercial License. Copyright CircleStone Labs LLC.
IN NO EVENT SHALL CIRCLESTONE LABS LLC BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH USE OF THIS MODEL.”

- c. with respect to each licensee who purchases a commercial license to the Models, you agree that such licensee shall have the right to exercise such commercial license with respect to all Derivatives developed by you pursuant to the same terms and conditions agreed by the Company and such commercial licensee with respect to the Models; provided, that, you will not be entitled to receive any royalty fees, license fees or other remuneration in connection with the commercial licensee’s use of your Derivatives under such commercial license, and you hereby waive all right to receive any consideration in connection therewith;
- d. in the case of Distribution of Derivatives made by you: (i) you must also include in the Attribution Notice a statement that you have modified the applicable CircleStone Model; (ii) any terms and conditions you impose on any third-party recipients relating to Derivatives made by or for you shall neither limit such third-party recipients’ use of the CircleStone Model or any Derivatives made by or for Company in accordance with this License nor conflict with any of its terms and conditions and must include disclaimer of warranties and limitation of liability provisions that are at least as protective of Company as those set forth herein; and (iii) you must not misrepresent or imply, through any means, that the Derivatives made by or for you and/or any modified version of the CircleStone Model you Distribute under your name and responsibility is an official product of the Company or has been endorsed, approved or validated by the Company, unless you are authorized by Company to do so in writing; and

4. Restrictions. You will not, and will not permit, assist or cause any third party to

- a. use, modify, copy, reproduce, create Derivatives of, or Distribute the CircleStone Model (or any Derivative thereof, or any data produced by the CircleStone Model), in whole or in part, (i) for any commercial or production purposes, (ii) in any manner that infringes, misappropriates, or otherwise violates (or is likely to infringe, misappropriate, or otherwise violate) any third party’s legal rights, including rights of publicity or “digital replica” rights, (iii) in any unlawful, fraudulent, defamatory, or abusive activity, (iv) to generate unlawful content, including child sexual abuse material, or non-consensual intimate images; or (v) in any manner that violates any applicable laws, rules, regulations, directives, or governmental requirements;
- b. alter or remove copyright and other proprietary notices which appear on or in any portion of the CircleStone Model;
- c. utilize any equipment, device, software, or other means to circumvent or remove any security or protection used by Company in connection with the CircleStone Model, or to circumvent or remove any usage restrictions, or to enable functionality disabled by CircleStone Model;
- d. offer or impose any terms on the CircleStone Model that alter, restrict, or are inconsistent with the terms of this License; and
- e. directly or indirectly Distribute, export, or otherwise transfer CircleStone Model (i) to any individual, entity, or country prohibited by any applicable U.S. and non-U.S. export control and trade sanctions laws (“Export Laws”); (ii) to anyone on U.S. or non-U.S. government restricted parties lists; (iii) for any purpose prohibited by Export Laws, including nuclear, chemical or biological weapons, or missile technology applications; or (iv) use or download CircleStone Model if you or they are (a) located in a comprehensively sanctioned jurisdiction, (b) currently listed on any U.S. or non-U.S. restricted parties list, or (c) for any purpose prohibited by Export Laws.

5. DISCLAIMERS. THE CIRCLESTONE MODELS ARE PROVIDED “AS IS” AND “WITH ALL FAULTS” WITH NO WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. COMPANY EXPRESSLY DISCLAIMS ALL REPRESENTATIONS AND WARRANTIES, EXPRESS OR IMPLIED, WHETHER BY STATUTE, CUSTOM, USAGE OR OTHERWISE AS TO ANY MATTERS RELATED TO THE CIRCLESTONE MODELS, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, SATISFACTORY QUALITY, OR NON-INFRINGEMENT. COMPANY MAKES NO WARRANTIES OR REPRESENTATIONS THAT THE CIRCLESTONE MODELS WILL BE ERROR FREE OR FREE OF VIRUSES OR OTHER HARMFUL COMPONENTS, OR PRODUCE ANY PARTICULAR RESULTS.

6. LIMITATION OF LIABILITY. TO THE FULLEST EXTENT PERMITTED BY LAW, IN NO EVENT WILL COMPANY BE LIABLE TO YOU OR YOUR EMPLOYEES, AFFILIATES, USERS, OFFICERS OR DIRECTORS (A) UNDER ANY THEORY OF LIABILITY, WHETHER BASED IN CONTRACT, TORT, NEGLIGENCE, STRICT LIABILITY, WARRANTY, OR OTHERWISE UNDER THIS LICENSE, OR (B) FOR ANY INDIRECT, CONSEQUENTIAL, EXEMPLARY, INCIDENTAL, PUNITIVE OR SPECIAL DAMAGES OR LOST PROFITS, EVEN IF COMPANY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THE CIRCLESTONE MODELS, THEIR CONSTITUENT COMPONENTS, AND ANY OUTPUT (COLLECTIVELY, “MODEL MATERIALS”) ARE NOT DESIGNED OR INTENDED FOR USE IN ANY APPLICATION OR SITUATION WHERE FAILURE OR FAULT OF THE MODEL MATERIALS COULD REASONABLY BE ANTICIPATED TO LEAD TO SERIOUS INJURY OF ANY PERSON, INCLUDING POTENTIAL DISCRIMINATION OR VIOLATION OF AN INDIVIDUAL’S PRIVACY RIGHTS, OR TO SEVERE PHYSICAL, PROPERTY, OR ENVIRONMENTAL DAMAGE (EACH, A “HIGH-RISK USE”). IF YOU ELECT TO USE ANY OF THE MODEL MATERIALS FOR A HIGH-RISK USE, YOU DO SO AT YOUR OWN RISK. YOU AGREE TO DESIGN AND IMPLEMENT APPROPRIATE DECISION-MAKING AND RISK-MITIGATION PROCEDURES AND POLICIES IN CONNECTION WITH A HIGH-RISK USE SUCH THAT EVEN IF THERE IS A FAILURE OR FAULT IN ANY OF THE MODEL MATERIALS, THE SAFETY OF PERSONS OR PROPERTY AFFECTED BY THE ACTIVITY STAYS AT A LEVEL THAT IS REASONABLE, APPROPRIATE, AND LAWFUL FOR THE FIELD OF THE HIGH-RISK USE.

7. INDEMNIFICATION. You will indemnify, defend and hold harmless Company and our subsidiaries and affiliates, and each of our respective shareholders, directors, officers, employees, agents, successors, and assigns (collectively, the “Company Parties”) from and against any losses, liabilities, damages, fines, penalties, and expenses (including reasonable attorneys’ fees) incurred by any Company Party in connection with any claim, demand, allegation, lawsuit, proceeding, or investigation (collectively, “Claims”) arising out of or related to (a) your access to or use of the CircleStone Models (including in connection with any Output, results or data generated from such access or use), including any High-Risk Use; (b) your violation of this License; or (c) your violation, misappropriation or infringement of any rights of another (including intellectual property or other proprietary rights and privacy rights). You will promptly notify the Company Parties of any such Claims, and cooperate with Company Parties in defending such Claims. You will also grant the Company Parties sole control of the defense or settlement, at Company’s sole option, of any Claims. This indemnity is in addition to, and not in lieu of, any other indemnities or remedies set forth in a written agreement between you and Company or the other Company Parties.

8. Termination; Survival.

- a. This License will automatically terminate upon any breach by you of the terms of this License.
- b. If you initiate any legal action or proceedings against Company or any other entity (including a cross-claim or counterclaim in a lawsuit), alleging that the CircleStone Models, any Derivative, or any part thereof, infringe upon intellectual property or other rights owned or licensable by you, then any licenses granted to you under this License will immediately terminate as of the date such legal action or claim is filed or initiated.
- c. Upon termination of this License, you must cease all use, access or Distribution of the CircleStone Model and any Derivatives. The following sections survive termination of this License: 2(c), 2(d), 4-11.

9. Third Party Materials. The CircleStone Models may contain third-party software or other components (including free and open source software) (all of the foregoing, “Third Party Materials”), which are subject to the license terms of the respective third-party licensors. Your dealings or correspondence with third parties and your use of or interaction with any Third Party Materials are solely between you and the third party. Company does not control or endorse, and makes no representations or warranties regarding, any Third Party Materials, and your access to and use of such Third Party Materials are at your own risk.

10. Trademarks. You have not been granted any trademark license as part of this License and may not use any name, logo or trademark associated with Company without the prior written permission of Company, except to the extent necessary to make the reference required in the Attribution Notice as specified above or as is reasonably necessary in describing the CircleStone Model and its creators.

11. General. This License will be governed and construed under the laws of the State of Delaware without regard to conflicts of law provisions. If any provision or part of a provision of this License is unlawful, void or unenforceable, that provision or part of the provision is deemed severed from this License, and will not affect the validity and enforceability of any remaining provisions. The failure of Company to exercise or enforce any right or provision of this License will not operate as a waiver of such right or provision. This License does not confer any third-party beneficiary rights upon any other person or entity. This License, together with the documentation, contains the entire understanding between you and Company regarding the subject matter of this License, and supersedes all other written or oral agreements and understandings between you and Company regarding such subject matter. Notwithstanding the foregoing, if you enter into a commercial license with the Company, then such commercial license will supersede this License in all respects.
README.md
ADDED
@@ -0,0 +1,50 @@
---
license: other
license_name: circlestone-labs-non-commercial-license
license_link: LICENSE.md
library_name: diffusers
tags:
- diffusers
- safetensors
- diffusion-single-file
- sdnext
pipeline_tag: text-to-image
---

# Anima Preview V2 — Diffusers Conversion

Unofficial diffusers-format conversion of [circlestone-labs/Anima](https://huggingface.co/circlestone-labs/Anima) **Preview V2** for use with [SD.Next](https://github.com/vladmandic/sdnext).

## What Is This

This repo contains the **Preview V2** weights converted to the HuggingFace diffusers layout so they can be loaded directly with `from_pretrained`. The custom pipeline (`AnimaTextToImagePipeline`) and all non-transformer components (text encoder, VAE, tokenizers, LLM adapter, scheduler) are identical to the [Preview V1 diffusers conversion](https://huggingface.co/CalamitousFelicitousness/Anima-sdnext-diffusers) — only the transformer (diffusion model) weights differ.
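Since the repo ships its own `pipeline.py`, loading with `from_pretrained` would go through diffusers' custom-pipeline mechanism. A minimal sketch, assuming the bundled `AnimaTextToImagePipeline` is resolvable via `trust_remote_code` (the repo path, prompt, and dtype below are illustrative placeholders, not verified against this conversion):

```python
def load_anima(repo: str):
    """Load the Anima diffusers conversion from a local path or hub id.

    Imports are lazy so this sketch can be inspected without torch or
    diffusers installed.
    """
    import torch
    from diffusers import DiffusionPipeline

    return DiffusionPipeline.from_pretrained(
        repo,
        custom_pipeline=repo,        # pick up the repo's bundled pipeline.py (assumption)
        torch_dtype=torch.bfloat16,  # illustrative dtype choice
        trust_remote_code=True,
    )


if __name__ == "__main__":
    # "./Anima-Preview-V2-diffusers" is a placeholder for wherever this repo lives.
    pipe = load_anima("./Anima-Preview-V2-diffusers").to("cuda")
    image = pipe(
        "1girl, masterpiece, best quality",
        num_inference_steps=30,
        guidance_scale=4.5,
    ).images[0]
    image.save("anima.png")
```

SD.Next performs an equivalent load internally, so this is only needed for standalone scripting.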
| 21 |
+
|
| 22 |
+
## Preview V2 Improvements
|
| 23 |
+
|
| 24 |
+
From the upstream model card:
|
| 25 |
+
|
| 26 |
+
- **Retraining with better hyperparameters** — makes the model more robust to finetuning
|
| 27 |
+
- **Extended medium-resolution training** — longer training at medium resolutions for more character knowledge
|
| 28 |
+
- **Regularization dataset** — improves natural language comprehension and preserves non-anime knowledge
|
| 29 |
+
- **Base model philosophy** — no aesthetic tuning; maximum knowledge breadth over aesthetic consistency
|
| 30 |
+
|
| 31 |
+
## Usage with SD.Next

Select this repo as your model in SD.Next — it will be loaded automatically via the Anima pipeline.

## Generation Settings

- **Resolution**: ~1 MP (1024×1024, 896×1152, 1152×896, etc.)
- **Steps**: 30–50
- **CFG**: 4–5
- **Samplers**: `er_sde` (default), `euler_a`, `dpmpp_2m_sde_gpu`
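The ~1 MP sizes listed above all work out to widths and heights that are multiples of 64. A small hypothetical helper to pick such a bucket for an arbitrary aspect ratio (the multiple-of-64 snapping is an assumption inferred from the listed sizes, not a documented constraint of the model):

```python
import math


def pick_resolution(aspect: float, target_mp: float = 1.0, multiple: int = 64) -> tuple[int, int]:
    """Return a (width, height) pair near `target_mp` megapixels for the
    given width/height aspect ratio, snapped to a multiple of `multiple`."""
    pixels = target_mp * 1024 * 1024
    height = math.sqrt(pixels / aspect)   # solve w*h = pixels with w = aspect*h
    width = aspect * height
    snap = lambda v: max(multiple, round(v / multiple) * multiple)
    return snap(width), snap(height)
```

For example, `pick_resolution(1.0)` gives `(1024, 1024)`, and aspect ratios of `896/1152` and `1152/896` recover the portrait and landscape buckets listed above.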
## Credits

- **Model**: [CircleStone Labs](https://huggingface.co/circlestone-labs) & [Comfy Org](https://huggingface.co/comfyanonymous)
- **Architecture**: Derivative of [NVIDIA Cosmos-Predict2-2B](https://huggingface.co/nvidia/Cosmos-Predict2-2B-Text2Image) with Qwen3-0.6B text encoder
- **Conversion**: Automated via `diffusers.CosmosTransformer3DModel.from_single_file()`
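The conversion step named in the credits can be sketched as follows — a hedged outline only, with illustrative file names; the actual conversion script for this repo is not included here:

```python
def convert_transformer(single_file_ckpt: str, out_dir: str = "transformer") -> None:
    """Convert a single-file transformer checkpoint to the diffusers layout.

    Lazy import so the sketch can be read without diffusers installed.
    """
    from diffusers import CosmosTransformer3DModel

    # from_single_file maps the original checkpoint's keys onto the
    # diffusers module layout (per the Credits note above).
    model = CosmosTransformer3DModel.from_single_file(single_file_ckpt)

    # Writes config.json + diffusion_pytorch_model.safetensors into out_dir.
    model.save_pretrained(out_dir)
```

The other components (text encoder, VAE, tokenizers, LLM adapter, scheduler) were carried over from the Preview V1 conversion, so only the transformer needs this step.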
## License

Non-commercial use only — see [LICENSE.md](LICENSE.md). Subject to the [CircleStone Labs Non-Commercial License](https://huggingface.co/circlestone-labs/Anima/blob/main/LICENSE.md) and the NVIDIA Open Model License Agreement.
anima_comparison.json
ADDED
@@ -0,0 +1 @@
{"id":"353006e1-36b9-49e9-a613-f2674646f636","revision":0,"last_node_id":89,"last_link_id":178,"nodes":[{"id":3,"type":"CLIPTextEncode","pos":[1176.8700960653628,2648.7360072210595],"size":[218.659765625,88],"flags":{},"order":24,"mode":0,"inputs":[{"localized_name":"clip","name":"clip","type":"CLIP","link":5},{"localized_name":"text","name":"text","type":"STRING","widget":{"name":"text"},"link":6}],"outputs":[{"localized_name":"CONDITIONING","name":"CONDITIONING","type":"CONDITIONING","links":[117]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.65","Node name for S&R":"CLIPTextEncode"},"widgets_values":["worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts"]},{"id":4,"type":"CLIPTextEncode","pos":[928.2467022387905,2648.8758950384426],"size":[218.659765625,88],"flags":{},"order":28,"mode":0,"inputs":[{"localized_name":"clip","name":"clip","type":"CLIP","link":7},{"localized_name":"text","name":"text","type":"STRING","widget":{"name":"text"},"link":8}],"outputs":[{"localized_name":"CONDITIONING","name":"CONDITIONING","type":"CONDITIONING","links":[116]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.65","Node name for S&R":"CLIPTextEncode"},"widgets_values":["masterpiece, best quality, 1girl, fern \\(sousou no frieren\\), sousou no frieren, @izei1337, purple hair, black robe, lips, sidelocks, feet out of frame, very long hair, puffy sleeves, white dress, butterfly on hand, eyelashes, simple background, closed mouth, mage staff, arm at side, straight hair, blush, solo, purple eyes, chromatic aberration, purple pupils, looking at viewer, hand up, standing, bug, robe, black background, signature, bright pupils, black coat, coat, long sleeves, blue butterfly, upturned eyes, wide sleeves, blunt bangs, from above, dress, blunt ends, long hair, purple butterfly, butterfly, tsurime, half 
updo"]},{"id":7,"type":"StringConcatenate","pos":[938.9968503444543,2278.4242080808717],"size":[210,166],"flags":{},"order":29,"mode":0,"inputs":[{"localized_name":"string_a","name":"string_a","type":"STRING","widget":{"name":"string_a"},"link":13},{"localized_name":"string_b","name":"string_b","type":"STRING","widget":{"name":"string_b"},"link":14},{"localized_name":"delimiter","name":"delimiter","type":"STRING","widget":{"name":"delimiter"},"link":null}],"outputs":[{"localized_name":"STRING","name":"STRING","type":"STRING","links":[170]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.49","Node name for S&R":"StringConcatenate"},"widgets_values":["You are an assistant designed to generate anime images based on textual prompts. <Prompt Start>\n","masterpiece, best quality, 1girl, fern \\(sousou no frieren\\), sousou no frieren, @izei1337, purple hair, black robe, lips, sidelocks, feet out of frame, very long hair, puffy sleeves, white dress, butterfly on hand, eyelashes, simple background, closed mouth, mage staff, arm at side, straight hair, blush, solo, purple eyes, chromatic aberration, purple pupils, looking at viewer, hand up, standing, bug, robe, black background, signature, bright pupils, black coat, coat, long sleeves, blue butterfly, upturned eyes, wide sleeves, blunt bangs, from above, dress, blunt ends, long hair, purple butterfly, butterfly, tsurime, half updo",""]},{"id":11,"type":"CLIPTextEncode","pos":[935.3422949648543,1712.146751416409],"size":[218.659765625,88],"flags":{},"order":36,"mode":0,"inputs":[{"localized_name":"clip","name":"clip","type":"CLIP","link":21},{"localized_name":"text","name":"text","type":"STRING","widget":{"name":"text"},"link":144}],"outputs":[{"localized_name":"CONDITIONING","name":"CONDITIONING","type":"CONDITIONING","slot_index":0,"links":[36]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.40","Node name for S&R":"CLIPTextEncode"},"widgets_values":["masterpiece, best quality, 1girl, fern \\(sousou no frieren\\), sousou 
no frieren, @izei1337, purple hair, black robe, lips, sidelocks, feet out of frame, very long hair, puffy sleeves, white dress, butterfly on hand, eyelashes, simple background, closed mouth, mage staff, arm at side, straight hair, blush, solo, purple eyes, chromatic aberration, purple pupils, looking at viewer, hand up, standing, bug, robe, black background, signature, bright pupils, black coat, coat, long sleeves, blue butterfly, upturned eyes, wide sleeves, blunt bangs, from above, dress, blunt ends, long hair, purple butterfly, butterfly, tsurime, half updo"]},{"id":12,"type":"CLIPTextEncode","pos":[1173.6115778017693,1710.171933148586],"size":[218.659765625,88],"flags":{},"order":27,"mode":0,"inputs":[{"localized_name":"clip","name":"clip","type":"CLIP","link":23},{"localized_name":"text","name":"text","type":"STRING","widget":{"name":"text"},"link":24}],"outputs":[{"localized_name":"CONDITIONING","name":"CONDITIONING","type":"CONDITIONING","slot_index":0,"links":[37]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.40","Node name for S&R":"CLIPTextEncode"},"widgets_values":["worst quality, low quality, score_1, score_2, score_3, blurry, jpeg 
artifacts"]},{"id":16,"type":"KSampler","pos":[1479.099355898651,1975.5894331113927],"size":[297.15631103515625,266.5079040527344],"flags":{},"order":43,"mode":0,"inputs":[{"localized_name":"model","name":"model","type":"MODEL","link":26},{"localized_name":"positive","name":"positive","type":"CONDITIONING","link":165},{"localized_name":"negative","name":"negative","type":"CONDITIONING","link":168},{"localized_name":"latent_image","name":"latent_image","type":"LATENT","link":150},{"localized_name":"seed","name":"seed","type":"INT","widget":{"name":"seed"},"link":30},{"localized_name":"steps","name":"steps","type":"INT","widget":{"name":"steps"},"link":31},{"localized_name":"cfg","name":"cfg","type":"FLOAT","widget":{"name":"cfg"},"link":32},{"localized_name":"sampler_name","name":"sampler_name","type":"COMBO","widget":{"name":"sampler_name"},"link":33},{"localized_name":"scheduler","name":"scheduler","type":"COMBO","widget":{"name":"scheduler"},"link":null},{"localized_name":"denoise","name":"denoise","type":"FLOAT","widget":{"name":"denoise"},"link":null}],"outputs":[{"localized_name":"LATENT","name":"LATENT","type":"LATENT","slot_index":0,"links":[136]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.40","Node name for 
S&R":"KSampler"},"widgets_values":[807882066116956,"fixed",30,100,"er_sde","simple",1]},{"id":20,"type":"KSampler","pos":[1477.6429724550899,1663.7469007999534],"size":[297.15631103515625,266.5079040527344],"flags":{},"order":41,"mode":0,"inputs":[{"localized_name":"model","name":"model","type":"MODEL","link":35},{"localized_name":"positive","name":"positive","type":"CONDITIONING","link":36},{"localized_name":"negative","name":"negative","type":"CONDITIONING","link":37},{"localized_name":"latent_image","name":"latent_image","type":"LATENT","link":38},{"localized_name":"seed","name":"seed","type":"INT","widget":{"name":"seed"},"link":39},{"localized_name":"steps","name":"steps","type":"INT","widget":{"name":"steps"},"link":40},{"localized_name":"cfg","name":"cfg","type":"FLOAT","widget":{"name":"cfg"},"link":41},{"localized_name":"sampler_name","name":"sampler_name","type":"COMBO","widget":{"name":"sampler_name"},"link":42},{"localized_name":"scheduler","name":"scheduler","type":"COMBO","widget":{"name":"scheduler"},"link":null},{"localized_name":"denoise","name":"denoise","type":"FLOAT","widget":{"name":"denoise"},"link":null}],"outputs":[{"localized_name":"LATENT","name":"LATENT","type":"LATENT","slot_index":0,"links":[103]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.40","Node name for S&R":"KSampler"},"widgets_values":[807882066116956,"fixed",30,100,"er_sde","simple",1]},{"id":28,"type":"UNETLoader","pos":[442.9893155544135,2254.2616876340953],"size":[383.29998779296875,82],"flags":{},"order":0,"mode":0,"inputs":[{"localized_name":"unet_name","name":"unet_name","type":"COMBO","widget":{"name":"unet_name"},"link":null},{"localized_name":"weight_dtype","name":"weight_dtype","type":"COMBO","widget":{"name":"weight_dtype"},"link":null}],"outputs":[{"localized_name":"MODEL","name":"MODEL","type":"MODEL","links":[73]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.49","Node name for 
S&R":"UNETLoader"},"widgets_values":["NetaYumev4_unet.safetensors","default"]},{"id":29,"type":"VAELoader","pos":[1849.4883445443165,1971.8196200599718],"size":[270,58],"flags":{},"order":1,"mode":0,"inputs":[{"localized_name":"vae_name","name":"vae_name","type":"COMBO","widget":{"name":"vae_name"},"link":null}],"outputs":[{"localized_name":"VAE","name":"VAE","type":"VAE","links":[132]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.40","Node name for S&R":"VAELoader"},"widgets_values":["qwen_image_vae.safetensors"]},{"id":36,"type":"ModelSamplingAuraFlow","pos":[492.68421606710893,2151.2510064817516],"size":[315,58],"flags":{},"order":21,"mode":0,"inputs":[{"localized_name":"model","name":"model","type":"MODEL","link":73},{"localized_name":"shift","name":"shift","type":"FLOAT","widget":{"name":"shift"},"link":null}],"outputs":[{"localized_name":"MODEL","name":"MODEL","type":"MODEL","slot_index":0,"links":[107]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.27","Node name for S&R":"ModelSamplingAuraFlow"},"widgets_values":[3.0000000000000004]},{"id":42,"type":"PrimitiveNode","pos":[62.918407961640185,2246.1623223997194],"size":[351.99847412109375,94.76183319091797],"flags":{},"order":2,"mode":0,"inputs":[],"outputs":[{"name":"STRING","type":"STRING","widget":{"name":"string_a"},"links":[1,13]}],"properties":{"Run widget replace on values":false},"widgets_values":["You are an assistant designed to generate anime images based on textual prompts. 
<Prompt Start>\n"]},{"id":52,"type":"ImagesGridByColumns","pos":[2384.7483897121274,2395.2956872201703],"size":[270,102],"flags":{},"order":52,"mode":0,"inputs":[{"localized_name":"images","name":"images","type":"IMAGE","link":148},{"localized_name":"annotation","name":"annotation","shape":7,"type":"GRID_ANNOTATION","link":null},{"localized_name":"gap","name":"gap","type":"INT","widget":{"name":"gap"},"link":null},{"localized_name":"max_columns","name":"max_columns","type":"INT","widget":{"name":"max_columns"},"link":null}],"outputs":[{"localized_name":"IMAGE","name":"IMAGE","type":"IMAGE","links":[138]}],"properties":{"cnr_id":"images-grid-comfy-plugin","ver":"852db490ef93702e1c68fe9774bdf65aaa7d3574","Node name for S&R":"ImagesGridByColumns"},"widgets_values":[0,1]},{"id":56,"type":"CLIPLoader","pos":[464.06200171164016,2398.2439309812694],"size":[315,106],"flags":{},"order":3,"mode":0,"inputs":[{"localized_name":"clip_name","name":"clip_name","type":"COMBO","widget":{"name":"clip_name"},"link":null},{"localized_name":"type","name":"type","type":"COMBO","widget":{"name":"type"},"link":null},{"localized_name":"device","name":"device","shape":7,"type":"COMBO","widget":{"name":"device"},"link":null}],"outputs":[{"localized_name":"CLIP","name":"CLIP","type":"CLIP","slot_index":0,"links":[169,172]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.27","Node name for S&R":"CLIPLoader"},"widgets_values":["gemma_2_2b_fp16.safetensors","lumina2","default"]},{"id":57,"type":"ModelSamplingAuraFlow","pos":[614.7192173927585,2613.9984246891036],"size":[270,58],"flags":{},"order":32,"mode":0,"inputs":[{"localized_name":"model","name":"model","type":"MODEL","link":102},{"localized_name":"shift","name":"shift","type":"FLOAT","widget":{"name":"shift"},"link":null}],"outputs":[{"localized_name":"MODEL","name":"MODEL","type":"MODEL","links":[115]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.58","Node name for 
S&R":"ModelSamplingAuraFlow"},"widgets_values":[3]},{"id":59,"type":"PrimitiveNode","pos":[881.7510774992556,3096.214937795815],"size":[540.936329143017,98.54346267734627],"flags":{},"order":4,"mode":0,"inputs":[],"outputs":[{"name":"STRING","type":"STRING","widget":{"name":"string_b"},"links":[2,6,24,155,167]}],"title":"Negative","properties":{"Run widget replace on values":false},"widgets_values":["worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts"]},{"id":15,"type":"UNETLoader","pos":[415.5075337787751,1801.7634524245975],"size":[365.867753462358,82],"flags":{},"order":5,"mode":0,"inputs":[{"localized_name":"unet_name","name":"unet_name","type":"COMBO","widget":{"name":"unet_name"},"link":null},{"localized_name":"weight_dtype","name":"weight_dtype","type":"COMBO","widget":{"name":"weight_dtype"},"link":null}],"outputs":[{"localized_name":"MODEL","name":"MODEL","type":"MODEL","links":[25]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.70","Node name for S&R":"UNETLoader"},"widgets_values":["Chroma1-Base.safetensors","default"]},{"id":70,"type":"CLIPLoader","pos":[255.53420217726813,2737.141285077774],"size":[315,106],"flags":{},"order":6,"mode":0,"inputs":[{"localized_name":"clip_name","name":"clip_name","type":"COMBO","widget":{"name":"clip_name"},"link":null},{"localized_name":"type","name":"type","type":"COMBO","widget":{"name":"type"},"link":null},{"localized_name":"device","name":"device","shape":7,"type":"COMBO","widget":{"name":"device"},"link":null}],"outputs":[{"localized_name":"CLIP","name":"CLIP","type":"CLIP","slot_index":0,"links":[5,7]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.27","Node name for 
S&R":"CLIPLoader"},"widgets_values":["qwen_3_06b_base.safetensors","stable_diffusion","default"]},{"id":51,"type":"KSampler","pos":[1487.3794191514357,1362.0183677119146],"size":[270,262],"flags":{},"order":47,"mode":0,"inputs":[{"localized_name":"model","name":"model","type":"MODEL","link":87},{"localized_name":"positive","name":"positive","type":"CONDITIONING","link":159},{"localized_name":"negative","name":"negative","type":"CONDITIONING","link":162},{"localized_name":"latent_image","name":"latent_image","type":"LATENT","link":90},{"localized_name":"seed","name":"seed","type":"INT","widget":{"name":"seed"},"link":91},{"localized_name":"steps","name":"steps","type":"INT","widget":{"name":"steps"},"link":92},{"localized_name":"cfg","name":"cfg","type":"FLOAT","widget":{"name":"cfg"},"link":93},{"localized_name":"sampler_name","name":"sampler_name","type":"COMBO","widget":{"name":"sampler_name"},"link":94},{"localized_name":"scheduler","name":"scheduler","type":"COMBO","widget":{"name":"scheduler"},"link":null},{"localized_name":"denoise","name":"denoise","type":"FLOAT","widget":{"name":"denoise"},"link":null}],"outputs":[{"localized_name":"LATENT","name":"LATENT","type":"LATENT","links":[96]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.76","Node name for S&R":"KSampler"},"widgets_values":[807882066116956,"fixed",30,6,"er_sde","simple",1]},{"id":46,"type":"ModelSamplingAuraFlow","pos":[635.068394769162,1347.7188376736178],"size":[285,58],"flags":{},"order":25,"mode":0,"inputs":[{"localized_name":"model","name":"model","type":"MODEL","link":81},{"localized_name":"shift","name":"shift","type":"FLOAT","widget":{"name":"shift"},"link":null}],"outputs":[{"localized_name":"MODEL","name":"MODEL","type":"MODEL","slot_index":0,"links":[87]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.27","Node name for 
S&R":"ModelSamplingAuraFlow"},"widgets_values":[3.0000000000000004]},{"id":38,"type":"DualCLIPLoader","pos":[550.3576509675082,1514.3304079215532],"size":[382.72727272727275,130],"flags":{},"order":7,"mode":0,"inputs":[{"localized_name":"clip_name1","name":"clip_name1","type":"COMBO","widget":{"name":"clip_name1"},"link":null},{"localized_name":"clip_name2","name":"clip_name2","type":"COMBO","widget":{"name":"clip_name2"},"link":null},{"localized_name":"type","name":"type","type":"COMBO","widget":{"name":"type"},"link":null},{"localized_name":"device","name":"device","shape":7,"type":"COMBO","widget":{"name":"device"},"link":null}],"outputs":[{"localized_name":"CLIP","name":"CLIP","type":"CLIP","links":[157,160]}],"properties":{"cnr_id":"comfy-core","ver":"0.5.1","Node name for S&R":"DualCLIPLoader"},"widgets_values":["gemma_3_4b_it_bf16.safetensors","jina_clip_v2_bf16.safetensors","newbie","default"]},{"id":37,"type":"UNETLoader","pos":[301.7626096451953,1339.3717302356004],"size":[304.5454545454545,82],"flags":{},"order":8,"mode":0,"inputs":[{"localized_name":"unet_name","name":"unet_name","type":"COMBO","widget":{"name":"unet_name"},"link":null},{"localized_name":"weight_dtype","name":"weight_dtype","type":"COMBO","widget":{"name":"weight_dtype"},"link":null}],"outputs":[{"localized_name":"MODEL","name":"MODEL","type":"MODEL","links":[81]}],"properties":{"cnr_id":"comfy-core","ver":"0.4.0","Node name for S&R":"UNETLoader"},"widgets_values":["NewBie-Image-Exp0.1-bf16.safetensors","default"]},{"id":43,"type":"PrimitiveNode","pos":[-103.34972169605462,1335.6671792131683],"size":[386.54392866654825,101.12546955455434],"flags":{},"order":9,"mode":0,"inputs":[],"outputs":[{"name":"STRING","type":"STRING","widget":{"name":"string_a"},"links":[151,154]}],"properties":{"Run widget replace on values":false},"widgets_values":["You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts. 
<Prompt Start> "]},{"id":14,"type":"ModelSamplingAuraFlow","pos":[454.35048880847546,1710.854353949167],"size":[315,58],"flags":{},"order":23,"mode":0,"inputs":[{"localized_name":"model","name":"model","type":"MODEL","link":25},{"localized_name":"shift","name":"shift","type":"FLOAT","widget":{"name":"shift"},"link":null}],"outputs":[{"localized_name":"MODEL","name":"MODEL","type":"MODEL","slot_index":0,"links":[35]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.27","Node name for S&R":"ModelSamplingAuraFlow"},"widgets_values":[3.0000000000000004]},{"id":13,"type":"CLIPLoader","pos":[135.160449039784,1710.6890533434541],"size":[270,106],"flags":{},"order":10,"mode":0,"inputs":[{"localized_name":"clip_name","name":"clip_name","type":"COMBO","widget":{"name":"clip_name"},"link":null},{"localized_name":"type","name":"type","type":"COMBO","widget":{"name":"type"},"link":null},{"localized_name":"device","name":"device","shape":7,"type":"COMBO","widget":{"name":"device"},"link":null}],"outputs":[{"localized_name":"CLIP","name":"CLIP","type":"CLIP","links":[21,23]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.30","Node name for 
S&R":"CLIPLoader"},"widgets_values":["t5xxl_fp8_e4m3fn_scaled.safetensors","chroma","default"]},{"id":50,"type":"RegexReplace","pos":[129.481608565417,1892.92077962926],"size":[210,292],"flags":{},"order":30,"mode":0,"inputs":[{"localized_name":"string","name":"string","type":"STRING","widget":{"name":"string"},"link":86},{"localized_name":"regex_pattern","name":"regex_pattern","type":"STRING","widget":{"name":"regex_pattern"},"link":null},{"localized_name":"replace","name":"replace","type":"STRING","widget":{"name":"replace"},"link":null},{"localized_name":"case_insensitive","name":"case_insensitive","shape":7,"type":"BOOLEAN","widget":{"name":"case_insensitive"},"link":null},{"localized_name":"multiline","name":"multiline","shape":7,"type":"BOOLEAN","widget":{"name":"multiline"},"link":null},{"localized_name":"dotall","name":"dotall","shape":7,"type":"BOOLEAN","widget":{"name":"dotall"},"link":null},{"localized_name":"count","name":"count","shape":7,"type":"INT","widget":{"name":"count"},"link":null}],"outputs":[{"localized_name":"STRING","name":"STRING","type":"STRING","links":[144,152,164]}],"properties":{"cnr_id":"comfy-core","ver":"0.5.1","Node name for S&R":"RegexReplace"},"widgets_values":["masterpiece, best quality, 1girl, fern \\(sousou no frieren\\), sousou no frieren, @izei1337, purple hair, black robe, lips, sidelocks, feet out of frame, very long hair, puffy sleeves, white dress, butterfly on hand, eyelashes, simple background, closed mouth, mage staff, arm at side, straight hair, blush, solo, purple eyes, chromatic aberration, purple pupils, looking at viewer, hand up, standing, bug, robe, black background, signature, bright pupils, black coat, coat, long sleeves, blue butterfly, upturned eyes, wide sleeves, blunt bangs, from above, dress, blunt ends, long hair, purple butterfly, butterfly, tsurime, half 
updo","@","",true,false,false,0]},{"id":62,"type":"KSampler","pos":[1480.2090506618333,2279.071415112123],"size":[297.15631103515625,266.5079040527344],"flags":{},"order":40,"mode":0,"inputs":[{"localized_name":"model","name":"model","type":"MODEL","link":107},{"localized_name":"positive","name":"positive","type":"CONDITIONING","link":171},{"localized_name":"negative","name":"negative","type":"CONDITIONING","link":174},{"localized_name":"latent_image","name":"latent_image","type":"LATENT","link":110},{"localized_name":"seed","name":"seed","type":"INT","widget":{"name":"seed"},"link":111},{"localized_name":"steps","name":"steps","type":"INT","widget":{"name":"steps"},"link":112},{"localized_name":"cfg","name":"cfg","type":"FLOAT","widget":{"name":"cfg"},"link":113},{"localized_name":"sampler_name","name":"sampler_name","type":"COMBO","widget":{"name":"sampler_name"},"link":114},{"localized_name":"scheduler","name":"scheduler","type":"COMBO","widget":{"name":"scheduler"},"link":null},{"localized_name":"denoise","name":"denoise","type":"FLOAT","widget":{"name":"denoise"},"link":null}],"outputs":[{"localized_name":"LATENT","name":"LATENT","type":"LATENT","slot_index":0,"links":[98]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.40","Node name for 
S&R":"KSampler"},"widgets_values":[807882066116956,"fixed",30,100,"er_sde","simple",1]},{"id":63,"type":"KSampler","pos":[1478.3728206364995,2583.2187512826804],"size":[297.15631103515625,266.5079040527344],"flags":{},"order":39,"mode":0,"inputs":[{"localized_name":"model","name":"model","type":"MODEL","link":115},{"localized_name":"positive","name":"positive","type":"CONDITIONING","link":116},{"localized_name":"negative","name":"negative","type":"CONDITIONING","link":117},{"localized_name":"latent_image","name":"latent_image","type":"LATENT","link":118},{"localized_name":"seed","name":"seed","type":"INT","widget":{"name":"seed"},"link":119},{"localized_name":"steps","name":"steps","type":"INT","widget":{"name":"steps"},"link":120},{"localized_name":"cfg","name":"cfg","type":"FLOAT","widget":{"name":"cfg"},"link":121},{"localized_name":"sampler_name","name":"sampler_name","type":"COMBO","widget":{"name":"sampler_name"},"link":122},{"localized_name":"scheduler","name":"scheduler","type":"COMBO","widget":{"name":"scheduler"},"link":null},{"localized_name":"denoise","name":"denoise","type":"FLOAT","widget":{"name":"denoise"},"link":null}],"outputs":[{"localized_name":"LATENT","name":"LATENT","type":"LATENT","slot_index":0,"links":[131]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.40","Node name for S&R":"KSampler"},"widgets_values":[807882066116956,"fixed",30,4,"er_sde","simple",1]},{"id":80,"type":"PreviewImage","pos":[1788.2084911341244,2708.2497222885086],"size":[485.7264404296875,684.8994140625],"flags":{},"order":53,"mode":0,"inputs":[{"localized_name":"images","name":"images","type":"IMAGE","link":138}],"outputs":[],"properties":{"cnr_id":"comfy-core","ver":"0.3.40","Node name for 
S&R":"PreviewImage"},"widgets_values":[]},{"id":79,"type":"PrimitiveNode","pos":[872.7232322790164,2784.029524520941],"size":[554.3851318359375,277.14129638671875],"flags":{},"order":11,"mode":0,"inputs":[],"outputs":[{"name":"STRING","type":"STRING","widget":{"name":"text"},"links":[8,14,86]}],"title":"Positive","properties":{"Run widget replace on values":false},"widgets_values":["masterpiece, best quality, 1girl, fern \\(sousou no frieren\\), sousou no frieren, @izei1337, purple hair, black robe, lips, sidelocks, feet out of frame, very long hair, puffy sleeves, white dress, butterfly on hand, eyelashes, simple background, closed mouth, mage staff, arm at side, straight hair, blush, solo, purple eyes, chromatic aberration, purple pupils, looking at viewer, hand up, standing, bug, robe, black background, signature, bright pupils, black coat, coat, long sleeves, blue butterfly, upturned eyes, wide sleeves, blunt bangs, from above, dress, blunt ends, long hair, purple butterfly, butterfly, tsurime, half updo"]},{"id":78,"type":"CheckpointLoaderSimple","pos":[376.89820843771423,1935.989786657534],"size":[427.6972351074219,142.3275909423828],"flags":{},"order":12,"mode":0,"inputs":[{"localized_name":"ckpt_name","name":"ckpt_name","type":"COMBO","widget":{"name":"ckpt_name"},"link":null},{"localized_name":"weight_dtype","name":"weight_dtype","type":"COMBO","widget":{"name":"weight_dtype"},"link":null}],"outputs":[{"localized_name":"MODEL","name":"MODEL","type":"MODEL","slot_index":0,"links":[26]},{"localized_name":"CLIP","name":"CLIP","type":"CLIP","slot_index":1,"links":[163,166]},{"localized_name":"VAE","name":"VAE","type":"VAE","slot_index":2,"links":[137]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.26","Node name for 
S&R":"CheckpointLoaderSimple"},"widgets_values":["naiXLVpred102d_custom.safetensors","default"]},{"id":1,"type":"StringConcatenate","pos":[1176.7807706325398,2279.9932388425905],"size":[210,166],"flags":{},"order":22,"mode":0,"inputs":[{"localized_name":"string_a","name":"string_a","type":"STRING","widget":{"name":"string_a"},"link":1},{"localized_name":"string_b","name":"string_b","type":"STRING","widget":{"name":"string_b"},"link":2},{"localized_name":"delimiter","name":"delimiter","type":"STRING","widget":{"name":"delimiter"},"link":null}],"outputs":[{"localized_name":"STRING","name":"STRING","type":"STRING","links":[173]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.49","Node name for S&R":"StringConcatenate"},"widgets_values":["You are an assistant designed to generate anime images based on textual prompts. <Prompt Start>\n","worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts",""]},{"id":67,"type":"UNETLoader","pos":[235.3622063276584,2613.645363446915],"size":[342.6,82],"flags":{},"order":13,"mode":0,"inputs":[{"localized_name":"unet_name","name":"unet_name","type":"COMBO","widget":{"name":"unet_name"},"link":null},{"localized_name":"weight_dtype","name":"weight_dtype","type":"COMBO","widget":{"name":"weight_dtype"},"link":null}],"outputs":[{"localized_name":"MODEL","name":"MODEL","type":"MODEL","links":[102]}],"properties":{"cnr_id":"comfy-core","ver":"0.10.0","Node name for 
S&R":"UNETLoader"},"widgets_values":["anima-preview.safetensors","default"]},{"id":25,"type":"ImageStitch","pos":[2053.3585969002243,2132.589910055101],"size":[270,150],"flags":{},"order":50,"mode":0,"inputs":[{"localized_name":"image1","name":"image1","type":"IMAGE","link":177},{"localized_name":"image2","name":"image2","shape":7,"type":"IMAGE","link":178},{"localized_name":"direction","name":"direction","type":"COMBO","widget":{"name":"direction"},"link":null},{"localized_name":"match_image_size","name":"match_image_size","type":"BOOLEAN","widget":{"name":"match_image_size"},"link":null},{"localized_name":"spacing_width","name":"spacing_width","type":"INT","widget":{"name":"spacing_width"},"link":null},{"localized_name":"spacing_color","name":"spacing_color","type":"COMBO","widget":{"name":"spacing_color"},"link":null}],"outputs":[{"localized_name":"IMAGE","name":"IMAGE","type":"IMAGE","links":[147]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.49","Node name for S&R":"ImageStitch"},"widgets_values":["right",true,0,"white"]},{"id":74,"type":"ImageStitch","pos":[2057.6069377412496,2341.641823368795],"size":[270,150],"flags":{},"order":51,"mode":0,"inputs":[{"localized_name":"image1","name":"image1","type":"IMAGE","link":147},{"localized_name":"image2","name":"image2","shape":7,"type":"IMAGE","link":149},{"localized_name":"direction","name":"direction","type":"COMBO","widget":{"name":"direction"},"link":null},{"localized_name":"match_image_size","name":"match_image_size","type":"BOOLEAN","widget":{"name":"match_image_size"},"link":null},{"localized_name":"spacing_width","name":"spacing_width","type":"INT","widget":{"name":"spacing_width"},"link":null},{"localized_name":"spacing_color","name":"spacing_color","type":"COMBO","widget":{"name":"spacing_color"},"link":null}],"outputs":[{"localized_name":"IMAGE","name":"IMAGE","type":"IMAGE","links":[148]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.49","Node name for 
S&R":"ImageStitch"},"widgets_values":["right",true,0,"white"]},{"id":66,"type":"VAEDecode","pos":[1828.5406367624264,2613.2421957083784],"size":[140,46],"flags":{},"order":44,"mode":0,"inputs":[{"localized_name":"samples","name":"samples","type":"LATENT","link":131},{"localized_name":"vae","name":"vae","type":"VAE","link":132}],"outputs":[{"localized_name":"IMAGE","name":"IMAGE","type":"IMAGE","slot_index":0,"links":[149]}],"title":"Anima","properties":{"cnr_id":"comfy-core","ver":"0.3.40","Node name for S&R":"VAEDecode"},"widgets_values":[],"color":"#223","bgcolor":"#335"},{"id":81,"type":"Note","pos":[2359.2481749947306,2200.144149944883],"size":[278,88],"flags":{},"order":14,"mode":0,"inputs":[],"outputs":[],"properties":{},"widgets_values":["To compare multiple models, chain together Image Stitch nodes, then pipe the final one to ImagesGridByColumns."],"color":"#432","bgcolor":"#653"},{"id":76,"type":"PrimitiveFloat","pos":[1507.4591318081732,3165.5300826688263],"size":[210,58],"flags":{},"order":15,"mode":0,"inputs":[{"localized_name":"value","name":"value","type":"FLOAT","widget":{"name":"value"},"link":null}],"outputs":[{"localized_name":"FLOAT","name":"FLOAT","type":"FLOAT","links":[32,41,93,113,121]}],"title":"CFG","properties":{"cnr_id":"comfy-core","ver":"0.3.65","Node name for S&R":"PrimitiveFloat"},"widgets_values":[4]},{"id":65,"type":"PrimitiveInt","pos":[1506.5903533137168,3259.382377419693],"size":[210,82],"flags":{},"order":16,"mode":0,"inputs":[{"localized_name":"value","name":"value","type":"INT","widget":{"name":"value"},"link":null}],"outputs":[{"localized_name":"INT","name":"INT","type":"INT","links":[31,40,92,112,120]}],"title":"Steps","properties":{"cnr_id":"comfy-core","ver":"0.10.0","Node name for 
S&R":"PrimitiveInt"},"widgets_values":[30,"fixed"]},{"id":75,"type":"EmptyLatentImage","pos":[1513.1087719325296,2885.1589982916853],"size":[210,106],"flags":{},"order":17,"mode":0,"inputs":[{"localized_name":"width","name":"width","type":"INT","widget":{"name":"width"},"link":null},{"localized_name":"height","name":"height","type":"INT","widget":{"name":"height"},"link":null},{"localized_name":"batch_size","name":"batch_size","type":"INT","widget":{"name":"batch_size"},"link":null}],"outputs":[{"localized_name":"LATENT","name":"LATENT","type":"LATENT","links":[38,90,110,118,150]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.40","Node name for S&R":"EmptyLatentImage"},"widgets_values":[896,1152,4]},{"id":27,"type":"VAELoader","pos":[1851.0660410182818,1871.9845375486557],"size":[270,58],"flags":{},"order":18,"mode":0,"inputs":[{"localized_name":"vae_name","name":"vae_name","type":"COMBO","widget":{"name":"vae_name"},"link":null}],"outputs":[{"localized_name":"VAE","name":"VAE","type":"VAE","links":[97,99,104]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.40","Node name for S&R":"VAELoader"},"widgets_values":["flux_vae.safetensors"]},{"id":82,"type":"StringConcatenate","pos":[997.6601256454367,1345.839808695974],"size":[210,166],"flags":{},"order":37,"mode":0,"inputs":[{"localized_name":"string_a","name":"string_a","type":"STRING","widget":{"name":"string_a"},"link":151},{"localized_name":"string_b","name":"string_b","type":"STRING","widget":{"name":"string_b"},"link":152},{"localized_name":"delimiter","name":"delimiter","type":"STRING","widget":{"name":"delimiter"},"link":null}],"outputs":[{"localized_name":"STRING","name":"STRING","type":"STRING","links":[158]}],"properties":{"cnr_id":"comfy-core","ver":"0.11.0","Node name for S&R":"StringConcatenate"},"widgets_values":["You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts. 
<Prompt Start> ","",""]},{"id":83,"type":"StringConcatenate","pos":[1235.4440459335233,1346.4088394576927],"size":[210,166],"flags":{},"order":26,"mode":0,"inputs":[{"localized_name":"string_a","name":"string_a","type":"STRING","widget":{"name":"string_a"},"link":154},{"localized_name":"string_b","name":"string_b","type":"STRING","widget":{"name":"string_b"},"link":155},{"localized_name":"delimiter","name":"delimiter","type":"STRING","widget":{"name":"delimiter"},"link":null}],"outputs":[{"localized_name":"STRING","name":"STRING","type":"STRING","links":[161]}],"properties":{"cnr_id":"comfy-core","ver":"0.11.0","Node name for S&R":"StringConcatenate"},"widgets_values":["You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts. <Prompt Start> ","worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts",""]},{"id":84,"type":"CLIPTextEncode","pos":[1000.8262460088314,1557.214799966456],"size":[218.659765625,88],"flags":{},"order":42,"mode":0,"inputs":[{"localized_name":"clip","name":"clip","type":"CLIP","link":157},{"localized_name":"text","name":"text","type":"STRING","widget":{"name":"text"},"link":158}],"outputs":[{"localized_name":"CONDITIONING","name":"CONDITIONING","type":"CONDITIONING","links":[159]}],"properties":{"cnr_id":"comfy-core","ver":"0.11.0","Node name for S&R":"CLIPTextEncode"},"widgets_values":[""]},{"id":85,"type":"CLIPTextEncode","pos":[1226.836657111563,1555.7122261033703],"size":[218.659765625,88],"flags":{},"order":34,"mode":0,"inputs":[{"localized_name":"clip","name":"clip","type":"CLIP","link":160},{"localized_name":"text","name":"text","type":"STRING","widget":{"name":"text"},"link":161}],"outputs":[{"localized_name":"CONDITIONING","name":"CONDITIONING","type":"CONDITIONING","links":[162]}],"properties":{"cnr_id":"comfy-core","ver":"0.11.0","Node name for 
S&R":"CLIPTextEncode"},"widgets_values":[""]},{"id":86,"type":"CLIPTextEncode","pos":[936.7986784084183,2023.9892837278485],"size":[218.659765625,88],"flags":{},"order":38,"mode":0,"inputs":[{"localized_name":"clip","name":"clip","type":"CLIP","link":163},{"localized_name":"text","name":"text","type":"STRING","widget":{"name":"text"},"link":164}],"outputs":[{"localized_name":"CONDITIONING","name":"CONDITIONING","type":"CONDITIONING","links":[165]}],"properties":{"cnr_id":"comfy-core","ver":"0.11.0","Node name for S&R":"CLIPTextEncode"},"widgets_values":[""]},{"id":87,"type":"CLIPTextEncode","pos":[1175.0679612453314,2022.0144654600253],"size":[218.659765625,88],"flags":{},"order":31,"mode":0,"inputs":[{"localized_name":"clip","name":"clip","type":"CLIP","link":166},{"localized_name":"text","name":"text","type":"STRING","widget":{"name":"text"},"link":167}],"outputs":[{"localized_name":"CONDITIONING","name":"CONDITIONING","type":"CONDITIONING","links":[168]}],"properties":{"cnr_id":"comfy-core","ver":"0.11.0","Node name for S&R":"CLIPTextEncode"},"widgets_values":["worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts"]},{"id":88,"type":"CLIPTextEncode","pos":[933.1603413334465,2494.768474482655],"size":[218.659765625,88],"flags":{},"order":35,"mode":0,"inputs":[{"localized_name":"clip","name":"clip","type":"CLIP","link":169},{"localized_name":"text","name":"text","type":"STRING","widget":{"name":"text"},"link":170}],"outputs":[{"localized_name":"CONDITIONING","name":"CONDITIONING","type":"CONDITIONING","links":[171]}],"properties":{"cnr_id":"comfy-core","ver":"0.11.0","Node name for 
S&R":"CLIPTextEncode"},"widgets_values":[""]},{"id":89,"type":"CLIPTextEncode","pos":[1173.9600383298084,2493.926786277414],"size":[218.659765625,88],"flags":{},"order":33,"mode":0,"inputs":[{"localized_name":"clip","name":"clip","type":"CLIP","link":172},{"localized_name":"text","name":"text","type":"STRING","widget":{"name":"text"},"link":173}],"outputs":[{"localized_name":"CONDITIONING","name":"CONDITIONING","type":"CONDITIONING","links":[174]}],"properties":{"cnr_id":"comfy-core","ver":"0.11.0","Node name for S&R":"CLIPTextEncode"},"widgets_values":[""]},{"id":60,"type":"VAEDecode","pos":[1830.7758654624859,2320.7156031347176],"size":[140,46],"flags":{},"order":46,"mode":0,"inputs":[{"localized_name":"samples","name":"samples","type":"LATENT","link":103},{"localized_name":"vae","name":"vae","type":"VAE","link":104}],"outputs":[{"localized_name":"IMAGE","name":"IMAGE","type":"IMAGE","slot_index":0,"links":[]}],"title":"Chroma","properties":{"cnr_id":"comfy-core","ver":"0.3.40","Node name for S&R":"VAEDecode"},"widgets_values":[],"color":"#223","bgcolor":"#335"},{"id":77,"type":"VAEDecode","pos":[1832.4649910011908,2417.633876664859],"size":[140,46],"flags":{},"order":48,"mode":0,"inputs":[{"localized_name":"samples","name":"samples","type":"LATENT","link":136},{"localized_name":"vae","name":"vae","type":"VAE","link":137}],"outputs":[{"localized_name":"IMAGE","name":"IMAGE","type":"IMAGE","slot_index":0,"links":[]}],"title":"SDXL","properties":{"cnr_id":"comfy-core","ver":"0.3.40","Node name for 
S&R":"VAEDecode"},"widgets_values":[],"color":"#223","bgcolor":"#335"},{"id":53,"type":"VAEDecode","pos":[1832.6967066160307,2226.890968703358],"size":[140,46],"flags":{},"order":49,"mode":0,"inputs":[{"localized_name":"samples","name":"samples","type":"LATENT","link":96},{"localized_name":"vae","name":"vae","type":"VAE","link":97}],"outputs":[{"localized_name":"IMAGE","name":"IMAGE","type":"IMAGE","links":[177]}],"title":"Newbie","properties":{"cnr_id":"comfy-core","ver":"0.3.76","Node name for S&R":"VAEDecode"},"widgets_values":[],"color":"#223","bgcolor":"#335"},{"id":54,"type":"VAEDecode","pos":[1831.3260515145544,2512.9007926114596],"size":[140,46],"flags":{},"order":45,"mode":0,"inputs":[{"localized_name":"samples","name":"samples","type":"LATENT","link":98},{"localized_name":"vae","name":"vae","type":"VAE","link":99}],"outputs":[{"localized_name":"IMAGE","name":"IMAGE","type":"IMAGE","slot_index":0,"links":[178]}],"title":"Lumina","properties":{"cnr_id":"comfy-core","ver":"0.3.40","Node name for S&R":"VAEDecode"},"widgets_values":[],"color":"#223","bgcolor":"#335"},{"id":68,"type":"PrimitiveNode","pos":[1505.834568533806,3376.9321557251296],"size":[210,82],"flags":{},"order":19,"mode":0,"inputs":[],"outputs":[{"name":"INT","type":"INT","widget":{"name":"seed"},"links":[30,39,91,111,119]}],"title":"Seed","properties":{"Run widget replace on values":false},"widgets_values":[807882066116956,"randomize"]},{"id":72,"type":"PrimitiveNode","pos":[1512.1890205524262,3023.724054825973],"size":[210,106],"flags":{},"order":20,"mode":0,"inputs":[],"outputs":[{"name":"COMBO","type":"COMBO","widget":{"name":"sampler_name"},"links":[33,42,94,114,122]}],"title":"Sampler","properties":{"Run widget replace on 
values":false},"widgets_values":["er_sde","fixed",""]}],"links":[[1,42,0,1,0,"STRING"],[2,59,0,1,1,"STRING"],[5,70,0,3,0,"CLIP"],[6,59,0,3,1,"STRING"],[7,70,0,4,0,"CLIP"],[8,79,0,4,1,"STRING"],[13,42,0,7,0,"STRING"],[14,79,0,7,1,"STRING"],[21,13,0,11,0,"CLIP"],[23,13,0,12,0,"CLIP"],[24,59,0,12,1,"STRING"],[25,15,0,14,0,"MODEL"],[26,78,0,16,0,"MODEL"],[30,68,0,16,4,"INT"],[31,65,0,16,5,"INT"],[32,76,0,16,6,"FLOAT"],[33,72,0,16,7,"COMBO"],[35,14,0,20,0,"MODEL"],[36,11,0,20,1,"CONDITIONING"],[37,12,0,20,2,"CONDITIONING"],[38,75,0,20,3,"LATENT"],[39,68,0,20,4,"INT"],[40,65,0,20,5,"INT"],[41,76,0,20,6,"FLOAT"],[42,72,0,20,7,"COMBO"],[73,28,0,36,0,"MODEL"],[81,37,0,46,0,"MODEL"],[86,79,0,50,0,"STRING"],[87,46,0,51,0,"MODEL"],[90,75,0,51,3,"LATENT"],[91,68,0,51,4,"INT"],[92,65,0,51,5,"INT"],[93,76,0,51,6,"FLOAT"],[94,72,0,51,7,"COMBO"],[96,51,0,53,0,"LATENT"],[97,27,0,53,1,"VAE"],[98,62,0,54,0,"LATENT"],[99,27,0,54,1,"VAE"],[102,67,0,57,0,"MODEL"],[103,20,0,60,0,"LATENT"],[104,27,0,60,1,"VAE"],[107,36,0,62,0,"MODEL"],[110,75,0,62,3,"LATENT"],[111,68,0,62,4,"INT"],[112,65,0,62,5,"INT"],[113,76,0,62,6,"FLOAT"],[114,72,0,62,7,"COMBO"],[115,57,0,63,0,"MODEL"],[116,4,0,63,1,"CONDITIONING"],[117,3,0,63,2,"CONDITIONING"],[118,75,0,63,3,"LATENT"],[119,68,0,63,4,"INT"],[120,65,0,63,5,"INT"],[121,76,0,63,6,"FLOAT"],[122,72,0,63,7,"COMBO"],[131,63,0,66,0,"LATENT"],[132,29,0,66,1,"VAE"],[136,16,0,77,0,"LATENT"],[137,78,2,77,1,"VAE"],[138,52,0,80,0,"IMAGE"],[144,50,0,11,1,"STRING"],[147,25,0,74,0,"IMAGE"],[148,74,0,52,0,"IMAGE"],[149,66,0,74,1,"IMAGE"],[150,75,0,16,3,"LATENT"],[151,43,0,82,0,"STRING"],[152,50,0,82,1,"STRING"],[154,43,0,83,0,"STRING"],[155,59,0,83,1,"STRING"],[157,38,0,84,0,"CLIP"],[158,82,0,84,1,"STRING"],[159,84,0,51,1,"CONDITIONING"],[160,38,0,85,0,"CLIP"],[161,83,0,85,1,"STRING"],[162,85,0,51,2,"CONDITIONING"],[163,78,1,86,0,"CLIP"],[164,50,0,86,1,"STRING"],[165,86,0,16,1,"CONDITIONING"],[166,78,1,87,0,"CLIP"],[167,59,0,87,1,"STRING"],[168,87,0,16,2,"CONDITIONING"],
[169,56,0,88,0,"CLIP"],[170,7,0,88,1,"STRING"],[171,88,0,62,1,"CONDITIONING"],[172,56,0,89,0,"CLIP"],[173,1,0,89,1,"STRING"],[174,89,0,62,2,"CONDITIONING"],[177,53,0,25,0,"IMAGE"],[178,54,0,25,1,"IMAGE"]],"groups":[],"config":{},"extra":{"workflowRendererVersion":"LG","ds":{"scale":1.1,"offset":[-429.0514136495865,-2646.8508439051707]}},"version":0.4}
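The workflow JSON above wires several loader nodes (UNETLoader, VAELoader, CLIPLoader, CheckpointLoaderSimple) into parallel sampling branches for model comparison. In the ComfyUI export format shown, each loader node stores its selected file as the first entry of `widgets_values`, so the set of model files a workflow depends on can be enumerated with a small script. A minimal sketch, using a hypothetical excerpt that mirrors the node structure above:

```python
import json

# Tiny excerpt in the same shape as the exported workflow above (illustrative sample).
workflow = json.loads("""
{"nodes": [
  {"id": 28, "type": "UNETLoader", "widgets_values": ["NetaYumev4_unet.safetensors", "default"]},
  {"id": 67, "type": "UNETLoader", "widgets_values": ["anima-preview.safetensors", "default"]},
  {"id": 29, "type": "VAELoader", "widgets_values": ["qwen_image_vae.safetensors"]}
]}
""")

# Loader nodes keep the chosen filename as their first widget value.
models = [n["widgets_values"][0] for n in workflow["nodes"]
          if n["type"].endswith("Loader")]
print(models)
```

The same filter run over the full workflow would also surface the checkpoint and dual-CLIP files referenced by the other comparison branches.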
example.png
ADDED
Git LFS Details
llm_adapter/config.json
ADDED
@@ -0,0 +1,12 @@
+{
+  "_class_name": "AnimaLLMAdapter",
+  "_diffusers_version": "0.37.0",
+  "source_dim": 1024,
+  "target_dim": 1024,
+  "model_dim": 1024,
+  "num_layers": 6,
+  "num_heads": 16,
+  "mlp_ratio": 4.0,
+  "vocab_size": 32128,
+  "use_self_attn": true
+}
llm_adapter/diffusion_pytorch_model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:149d3c0ae9a1b76c5a02a722288a7eadeec306769e2a60f5b34513155c8a2105
+size 269339368
llm_adapter/modeling_llm_adapter.py ADDED

@@ -0,0 +1,215 @@

import torch
from torch import nn
import torch.nn.functional as F
from diffusers.configuration_utils import ConfigMixin, register_to_config
from diffusers.models.modeling_utils import ModelMixin


def rotate_half(x):
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)


def apply_rotary_pos_emb(x, cos, sin, unsqueeze_dim=1):
    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)
    return (x * cos) + (rotate_half(x) * sin)


class RotaryEmbedding(nn.Module):
    def __init__(self, head_dim):
        super().__init__()
        self.rope_theta = 10000
        inv_freq = 1.0 / (
            self.rope_theta
            ** (torch.arange(0, head_dim, 2, dtype=torch.int64).to(dtype=torch.float) / head_dim)
        )
        self.register_buffer("inv_freq", inv_freq, persistent=False)

    @torch.no_grad()
    def forward(self, x, position_ids):
        inv_freq_expanded = (
            self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1).to(x.device)
        )
        position_ids_expanded = position_ids[:, None, :].float()

        device_type = x.device.type if isinstance(x.device.type, str) and x.device.type != "mps" else "cpu"
        with torch.autocast(device_type=device_type, enabled=False):
            freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
            emb = torch.cat((freqs, freqs), dim=-1)
            cos = emb.cos()
            sin = emb.sin()
        return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)


class Attention(nn.Module):
    def __init__(self, query_dim, context_dim, n_heads, head_dim):
        super().__init__()
        inner_dim = head_dim * n_heads
        self.n_heads = n_heads
        self.head_dim = head_dim

        self.q_proj = nn.Linear(query_dim, inner_dim, bias=False)
        self.q_norm = nn.RMSNorm(head_dim, eps=1e-6)
        self.k_proj = nn.Linear(context_dim, inner_dim, bias=False)
        self.k_norm = nn.RMSNorm(head_dim, eps=1e-6)
        self.v_proj = nn.Linear(context_dim, inner_dim, bias=False)
        self.o_proj = nn.Linear(inner_dim, query_dim, bias=False)

    def forward(self, x, mask=None, context=None, position_embeddings=None, position_embeddings_context=None):
        context = x if context is None else context
        input_shape = x.shape[:-1]
        q_shape = (*input_shape, self.n_heads, self.head_dim)
        context_shape = context.shape[:-1]
        kv_shape = (*context_shape, self.n_heads, self.head_dim)

        query_states = self.q_norm(self.q_proj(x).view(q_shape)).transpose(1, 2)
        key_states = self.k_norm(self.k_proj(context).view(kv_shape)).transpose(1, 2)
        value_states = self.v_proj(context).view(kv_shape).transpose(1, 2)

        if position_embeddings is not None:
            assert position_embeddings_context is not None
            cos, sin = position_embeddings
            query_states = apply_rotary_pos_emb(query_states, cos, sin)
            cos, sin = position_embeddings_context
            key_states = apply_rotary_pos_emb(key_states, cos, sin)

        attn_output = F.scaled_dot_product_attention(query_states, key_states, value_states, attn_mask=mask)
        attn_output = attn_output.transpose(1, 2).reshape(*input_shape, -1).contiguous()
        return self.o_proj(attn_output)


class TransformerBlock(nn.Module):
    def __init__(self, source_dim, model_dim, num_heads=16, mlp_ratio=4.0, use_self_attn=True):
        super().__init__()
        self.use_self_attn = use_self_attn

        if self.use_self_attn:
            self.norm_self_attn = nn.RMSNorm(model_dim, eps=1e-6)
            self.self_attn = Attention(
                query_dim=model_dim,
                context_dim=model_dim,
                n_heads=num_heads,
                head_dim=model_dim // num_heads,
            )

        self.norm_cross_attn = nn.RMSNorm(model_dim, eps=1e-6)
        self.cross_attn = Attention(
            query_dim=model_dim,
            context_dim=source_dim,
            n_heads=num_heads,
            head_dim=model_dim // num_heads,
        )

        self.norm_mlp = nn.RMSNorm(model_dim, eps=1e-6)
        self.mlp = nn.Sequential(
            nn.Linear(model_dim, int(model_dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(model_dim * mlp_ratio), model_dim),
        )

    def forward(
        self,
        x,
        context,
        target_attention_mask=None,
        source_attention_mask=None,
        position_embeddings=None,
        position_embeddings_context=None,
    ):
        if self.use_self_attn:
            normed = self.norm_self_attn(x)
            attn_out = self.self_attn(
                normed,
                mask=target_attention_mask,
                position_embeddings=position_embeddings,
                position_embeddings_context=position_embeddings,
            )
            x = x + attn_out

        normed = self.norm_cross_attn(x)
        attn_out = self.cross_attn(
            normed,
            mask=source_attention_mask,
            context=context,
            position_embeddings=position_embeddings,
            position_embeddings_context=position_embeddings_context,
        )
        x = x + attn_out
        x = x + self.mlp(self.norm_mlp(x))
        return x


class AnimaLLMAdapter(ModelMixin, ConfigMixin):
    @register_to_config
    def __init__(
        self,
        source_dim: int = 1024,
        target_dim: int = 1024,
        model_dim: int = 1024,
        num_layers: int = 6,
        num_heads: int = 16,
        mlp_ratio: float = 4.0,
        vocab_size: int = 32128,
        use_self_attn: bool = True,
    ):
        super().__init__()

        self.embed = nn.Embedding(vocab_size, target_dim)
        if model_dim != target_dim:
            self.in_proj = nn.Linear(target_dim, model_dim)
        else:
            self.in_proj = nn.Identity()
        self.rotary_emb = RotaryEmbedding(model_dim // num_heads)
        self.blocks = nn.ModuleList(
            [
                TransformerBlock(
                    source_dim,
                    model_dim,
                    num_heads=num_heads,
                    mlp_ratio=mlp_ratio,
                    use_self_attn=use_self_attn,
                )
                for _ in range(num_layers)
            ]
        )
        self.out_proj = nn.Linear(model_dim, target_dim)
        self.norm = nn.RMSNorm(target_dim, eps=1e-6)

    def forward(
        self,
        source_hidden_states: torch.Tensor,
        target_input_ids: torch.Tensor,
        target_attention_mask: torch.Tensor = None,
        source_attention_mask: torch.Tensor = None,
    ) -> torch.Tensor:
        if target_attention_mask is not None:
            target_attention_mask = target_attention_mask.to(torch.bool)
            if target_attention_mask.ndim == 2:
                target_attention_mask = target_attention_mask.unsqueeze(1).unsqueeze(1)

        if source_attention_mask is not None:
            source_attention_mask = source_attention_mask.to(torch.bool)
            if source_attention_mask.ndim == 2:
                source_attention_mask = source_attention_mask.unsqueeze(1).unsqueeze(1)

        x = self.in_proj(self.embed(target_input_ids))
        context = source_hidden_states

        position_ids = torch.arange(x.shape[1], device=x.device).unsqueeze(0)
        position_ids_context = torch.arange(context.shape[1], device=x.device).unsqueeze(0)
        position_embeddings = self.rotary_emb(x, position_ids)
        position_embeddings_context = self.rotary_emb(x, position_ids_context)

        for block in self.blocks:
            x = block(
                x,
                context,
                target_attention_mask=target_attention_mask,
                source_attention_mask=source_attention_mask,
                position_embeddings=position_embeddings,
                position_embeddings_context=position_embeddings_context,
            )

        return self.norm(self.out_proj(x))
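The `rotate_half`/`apply_rotary_pos_emb` pair in the adapter implements standard rotary position embeddings: each half-dimension pair is rotated by a position-dependent angle, which is an orthogonal transform. The key properties can be checked in isolation with a small NumPy re-implementation (illustrative sketch, not the shipped module):

```python
import numpy as np

def rotate_half(x):
    # Split the last dimension in half and rotate: (x1, x2) -> (-x2, x1)
    x1, x2 = x[..., : x.shape[-1] // 2], x[..., x.shape[-1] // 2 :]
    return np.concatenate((-x2, x1), axis=-1)

def apply_rotary_pos_emb(x, cos, sin):
    # Same formula as the adapter's helper: x * cos + rotate_half(x) * sin
    return (x * cos) + (rotate_half(x) * sin)

head_dim = 8
theta = 10000.0
inv_freq = 1.0 / (theta ** (np.arange(0, head_dim, 2) / head_dim))
pos = np.arange(4)[:, None]                    # positions 0..3
freqs = pos * inv_freq[None, :]                # (4, head_dim // 2)
emb = np.concatenate((freqs, freqs), axis=-1)  # (4, head_dim)
cos, sin = np.cos(emb), np.sin(emb)

x = np.random.default_rng(0).standard_normal((4, head_dim))
x_rot = apply_rotary_pos_emb(x, cos, sin)

# The per-position rotation is orthogonal, so vector norms are unchanged.
assert np.allclose(np.linalg.norm(x, axis=-1), np.linalg.norm(x_rot, axis=-1))
# Position 0 has zero rotation angle, so it is left untouched.
assert np.allclose(x[0], x_rot[0])
```

In the adapter, queries are rotated with the target sequence's angles and keys with the context sequence's angles, so cross-attention scores become relative-position aware without any learned position table.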
model_index.json ADDED

@@ -0,0 +1,32 @@

{
  "_class_name": "AnimaTextToImagePipeline",
  "_diffusers_version": "0.37.0",
  "text_encoder": [
    "transformers",
    "Qwen3Model"
  ],
  "tokenizer": [
    "transformers",
    "PreTrainedTokenizerFast"
  ],
  "t5_tokenizer": [
    "transformers",
    "T5TokenizerFast"
  ],
  "llm_adapter": [
    "modeling_llm_adapter",
    "AnimaLLMAdapter"
  ],
  "transformer": [
    "diffusers",
    "CosmosTransformer3DModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKLWan"
  ],
  "scheduler": [
    "diffusers",
    "FlowMatchEulerDiscreteScheduler"
  ]
}
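Each non-underscore entry in `model_index.json` maps a pipeline component to a `(library_or_module, class_name)` pair; note that `llm_adapter` points at the local `modeling_llm_adapter` module shipped in this repo rather than an installed library, which is why loading it involves custom code. A minimal sketch of how that mapping can be inspected with plain `json` (no diffusers needed):

```python
import json

model_index = json.loads("""
{
  "_class_name": "AnimaTextToImagePipeline",
  "_diffusers_version": "0.37.0",
  "text_encoder": ["transformers", "Qwen3Model"],
  "tokenizer": ["transformers", "PreTrainedTokenizerFast"],
  "t5_tokenizer": ["transformers", "T5TokenizerFast"],
  "llm_adapter": ["modeling_llm_adapter", "AnimaLLMAdapter"],
  "transformer": ["diffusers", "CosmosTransformer3DModel"],
  "vae": ["diffusers", "AutoencoderKLWan"],
  "scheduler": ["diffusers", "FlowMatchEulerDiscreteScheduler"]
}
""")

# Underscore keys are metadata; the rest are loadable components.
components = {k: v for k, v in model_index.items() if not k.startswith("_")}
assert len(components) == 7
# The adapter is resolved from a local module, not an installed package.
assert components["llm_adapter"][0] == "modeling_llm_adapter"
```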
pipeline.py ADDED

@@ -0,0 +1,371 @@

import inspect
from typing import Callable, Dict, List, Optional, Union

import numpy as np
import torch
from transformers import PreTrainedModel, PreTrainedTokenizerFast

from diffusers.callbacks import MultiPipelineCallbacks, PipelineCallback
from diffusers.models import AutoencoderKLWan, CosmosTransformer3DModel
from diffusers.schedulers import FlowMatchEulerDiscreteScheduler
from diffusers.utils import logging
from diffusers.utils.torch_utils import randn_tensor
from diffusers.video_processor import VideoProcessor
from diffusers.pipelines.pipeline_utils import DiffusionPipeline
from diffusers.pipelines.cosmos.pipeline_output import CosmosImagePipelineOutput

logger = logging.get_logger(__name__)


def retrieve_timesteps(scheduler, num_inference_steps=None, device=None, timesteps=None, sigmas=None, **kwargs):
    if timesteps is not None and sigmas is not None:
        raise ValueError("Only one of `timesteps` or `sigmas` can be passed.")
    if timesteps is not None:
        scheduler.set_timesteps(timesteps=timesteps, device=device, **kwargs)
        timesteps = scheduler.timesteps
        num_inference_steps = len(timesteps)
    elif sigmas is not None:
        scheduler.set_timesteps(sigmas=sigmas, device=device, **kwargs)
        timesteps = scheduler.timesteps
        num_inference_steps = len(timesteps)
    else:
        scheduler.set_timesteps(num_inference_steps, device=device, **kwargs)
        timesteps = scheduler.timesteps
    return timesteps, num_inference_steps


class AnimaTextToImagePipeline(DiffusionPipeline):
    """Pipeline for text-to-image generation using the Anima model.

    Anima uses a Cosmos Predict2 backbone with a Qwen3 text encoder and an LLM adapter
    that cross-attends T5 token embeddings to Qwen3 hidden states.
    """

    model_cpu_offload_seq = "text_encoder->llm_adapter->transformer->vae"
    _callback_tensor_inputs = ["latents", "prompt_embeds", "negative_prompt_embeds"]

    def __init__(
        self,
        text_encoder: PreTrainedModel,
        tokenizer: PreTrainedTokenizerFast,
        t5_tokenizer: PreTrainedTokenizerFast,
        llm_adapter,
        transformer: CosmosTransformer3DModel,
        vae: AutoencoderKLWan,
        scheduler: FlowMatchEulerDiscreteScheduler,
    ):
        super().__init__()

        self.register_modules(
            text_encoder=text_encoder,
            tokenizer=tokenizer,
            t5_tokenizer=t5_tokenizer,
            llm_adapter=llm_adapter,
            transformer=transformer,
            vae=vae,
            scheduler=scheduler,
        )

        self.vae_scale_factor_temporal = 2 ** sum(self.vae.temperal_downsample) if getattr(self, "vae", None) else 4
        self.vae_scale_factor_spatial = 2 ** len(self.vae.temperal_downsample) if getattr(self, "vae", None) else 8
        self.video_processor = VideoProcessor(vae_scale_factor=self.vae_scale_factor_spatial)

    def _encode_prompt(
        self,
        prompt: Union[str, List[str]],
        device: torch.device,
        dtype: torch.dtype,
        max_sequence_length: int = 512,
    ):
        """Encode prompt through Qwen3 and run LLM adapter with T5 token IDs."""
        prompt = [prompt] if isinstance(prompt, str) else prompt
        batch_size = len(prompt)

        # Check for empty prompts - return zero embeddings directly
        all_empty = all(p.strip() == "" for p in prompt)
        if all_empty:
            return torch.zeros(batch_size, 512, self.llm_adapter.config.target_dim, device=device, dtype=dtype)

        # Tokenize with Qwen3 tokenizer
        qwen_inputs = self.tokenizer(
            prompt,
            padding=True,
            truncation=True,
            max_length=max_sequence_length,
            return_tensors="pt",
        )
        qwen_input_ids = qwen_inputs.input_ids.to(device)
        qwen_attention_mask = qwen_inputs.attention_mask.to(device)

        # Get Qwen3 hidden states
        qwen_outputs = self.text_encoder(
            input_ids=qwen_input_ids,
            attention_mask=qwen_attention_mask,
        )
        qwen_hidden_states = qwen_outputs.last_hidden_state.to(dtype=dtype)

        # Tokenize with T5 tokenizer (we only need the IDs for the adapter embedding)
        t5_inputs = self.t5_tokenizer(
            prompt,
            padding=True,
            truncation=True,
            max_length=max_sequence_length,
            return_tensors="pt",
        )
        t5_input_ids = t5_inputs.input_ids.to(device)

        # Run LLM adapter: T5 token embeddings attend to Qwen3 hidden states
        adapted_embeds = self.llm_adapter(
            source_hidden_states=qwen_hidden_states,
            target_input_ids=t5_input_ids,
        )

        # Pad to 512 sequence length if shorter
        if adapted_embeds.shape[1] < 512:
            adapted_embeds = torch.nn.functional.pad(
                adapted_embeds, (0, 0, 0, 512 - adapted_embeds.shape[1])
            )

        return adapted_embeds

    def encode_prompt(
        self,
        prompt: Union[str, List[str]],
        negative_prompt: Optional[Union[str, List[str]]] = None,
        do_classifier_free_guidance: bool = True,
        num_images_per_prompt: int = 1,
        prompt_embeds: Optional[torch.Tensor] = None,
        negative_prompt_embeds: Optional[torch.Tensor] = None,
        max_sequence_length: int = 512,
        device: Optional[torch.device] = None,
        dtype: Optional[torch.dtype] = None,
    ):
        device = device or self._execution_device
        dtype = dtype or self.text_encoder.dtype
        prompt = [prompt] if isinstance(prompt, str) else prompt

        if prompt is not None:
            batch_size = len(prompt)
        else:
            batch_size = prompt_embeds.shape[0]

        if prompt_embeds is None:
            prompt_embeds = self._encode_prompt(prompt, device, dtype, max_sequence_length)
        _, seq_len, _ = prompt_embeds.shape
        prompt_embeds = prompt_embeds.repeat(1, num_images_per_prompt, 1)
        prompt_embeds = prompt_embeds.view(batch_size * num_images_per_prompt, seq_len, -1)

        if do_classifier_free_guidance and negative_prompt_embeds is None:
            negative_prompt = negative_prompt or ""
            negative_prompt = batch_size * [negative_prompt] if isinstance(negative_prompt, str) else negative_prompt
            negative_prompt_embeds = self._encode_prompt(negative_prompt, device, dtype, max_sequence_length)
            _, seq_len, _ = negative_prompt_embeds.shape
            negative_prompt_embeds = negative_prompt_embeds.repeat(1, num_images_per_prompt, 1)
            negative_prompt_embeds = negative_prompt_embeds.view(batch_size * num_images_per_prompt, seq_len, -1)

        return prompt_embeds, negative_prompt_embeds

    def prepare_latents(
        self,
        batch_size: int,
        num_channels_latents: int,
        height: int,
        width: int,
        num_frames: int = 1,
        dtype: torch.dtype = None,
        device: torch.device = None,
        generator=None,
        latents: torch.Tensor = None,
    ):
        num_latent_frames = (num_frames - 1) // self.vae_scale_factor_temporal + 1
        latent_height = height // self.vae_scale_factor_spatial
        latent_width = width // self.vae_scale_factor_spatial

        if latents is not None:
            return latents.to(device=device, dtype=dtype)

        shape = (batch_size, num_channels_latents, num_latent_frames, latent_height, latent_width)
        latents = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
        return latents

    def check_inputs(self, prompt, height, width, prompt_embeds=None):
        if height % 16 != 0 or width % 16 != 0:
            raise ValueError(f"`height` and `width` have to be divisible by 16 but are {height} and {width}.")
        if prompt is not None and prompt_embeds is not None:
            raise ValueError("Cannot forward both `prompt` and `prompt_embeds`.")
        elif prompt is None and prompt_embeds is None:
            raise ValueError("Provide either `prompt` or `prompt_embeds`.")

    @property
    def guidance_scale(self):
        return self._guidance_scale

    @property
    def do_classifier_free_guidance(self):
        return self._guidance_scale > 1.0

    @property
    def num_timesteps(self):
        return self._num_timesteps

    @property
    def interrupt(self):
        return self._interrupt

    @torch.no_grad()
    def __call__(
        self,
        prompt: Union[str, List[str]] = None,
        negative_prompt: Optional[Union[str, List[str]]] = None,
        height: int = 768,
        width: int = 1360,
        num_inference_steps: int = 35,
        guidance_scale: float = 7.0,
        num_images_per_prompt: Optional[int] = 1,
        generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
        latents: Optional[torch.Tensor] = None,
        prompt_embeds: Optional[torch.Tensor] = None,
        negative_prompt_embeds: Optional[torch.Tensor] = None,
        output_type: Optional[str] = "pil",
        return_dict: bool = True,
        callback_on_step_end: Optional[
            Union[Callable[[int, int, Dict], None], PipelineCallback, MultiPipelineCallbacks]
        ] = None,
        callback_on_step_end_tensor_inputs: List[str] = ["latents"],
        max_sequence_length: int = 512,
    ):
        if isinstance(callback_on_step_end, (PipelineCallback, MultiPipelineCallbacks)):
            callback_on_step_end_tensor_inputs = callback_on_step_end.tensor_inputs

        num_frames = 1

        self.check_inputs(prompt, height, width, prompt_embeds)
        self._guidance_scale = guidance_scale
        self._current_timestep = None
        self._interrupt = False

        device = self._execution_device

        if prompt is not None and isinstance(prompt, str):
            batch_size = 1
        elif prompt is not None and isinstance(prompt, list):
            batch_size = len(prompt)
        else:
            batch_size = prompt_embeds.shape[0]

        # Encode prompt
        prompt_embeds, negative_prompt_embeds = self.encode_prompt(
            prompt=prompt,
            negative_prompt=negative_prompt,
            do_classifier_free_guidance=self.do_classifier_free_guidance,
            num_images_per_prompt=num_images_per_prompt,
            prompt_embeds=prompt_embeds,
            negative_prompt_embeds=negative_prompt_embeds,
            device=device,
            max_sequence_length=max_sequence_length,
        )

        # Prepare timesteps - use default descending schedule (1→0)
        timesteps, num_inference_steps = retrieve_timesteps(
            self.scheduler, num_inference_steps=num_inference_steps, device=device
        )

        # Prepare latents
        transformer_dtype = self.transformer.dtype
        num_channels_latents = self.transformer.config.in_channels
        latents = self.prepare_latents(
            batch_size * num_images_per_prompt,
            num_channels_latents,
            height,
            width,
            num_frames,
            torch.float32,
            device,
            generator,
            latents,
        )

        padding_mask = latents.new_zeros(1, 1, height, width, dtype=transformer_dtype)

        # Denoising loop using CONST preconditioning (flow matching velocity model):
        # - c_in = 1.0 (no input scaling)
        # - timestep = sigma (passed directly)
        # - model output is the velocity: denoised = x - velocity * sigma
        # - CFG applied to velocity (equivalent to applying to denoised for linear preconditioning)
        num_warmup_steps = len(timesteps) - num_inference_steps * self.scheduler.order
        self._num_timesteps = len(timesteps)

        with self.progress_bar(total=num_inference_steps) as progress_bar:
            for i, t in enumerate(timesteps):
                if self.interrupt:
                    continue

                self._current_timestep = t
                sigma = self.scheduler.sigmas[i]

                # Pass sigma directly as timestep (CONST preconditioning)
                timestep = sigma.expand(latents.shape[0]).to(transformer_dtype)
                latent_model_input = latents.to(transformer_dtype)

                # Model predicts velocity (raw output IS the velocity for CONST)
                velocity = self.transformer(
                    hidden_states=latent_model_input,
                    timestep=timestep,
                    encoder_hidden_states=prompt_embeds,
                    padding_mask=padding_mask,
                    return_dict=False,
                )[0].float()

                if self.do_classifier_free_guidance:
                    velocity_uncond = self.transformer(
                        hidden_states=latent_model_input,
                        timestep=timestep,
                        encoder_hidden_states=negative_prompt_embeds,
                        padding_mask=padding_mask,
                        return_dict=False,
                    )[0].float()
                    velocity = velocity_uncond + self.guidance_scale * (velocity - velocity_uncond)

                # Euler step: scheduler computes x_next = x + (sigma_next - sigma) * velocity
                latents = self.scheduler.step(velocity, t, latents, return_dict=False)[0]

                if callback_on_step_end is not None:
                    callback_kwargs = {}
                    for k in callback_on_step_end_tensor_inputs:
                        callback_kwargs[k] = locals()[k]
                    callback_outputs = callback_on_step_end(self, i, t, callback_kwargs)
                    latents = callback_outputs.pop("latents", latents)
                    prompt_embeds = callback_outputs.pop("prompt_embeds", prompt_embeds)
                    negative_prompt_embeds = callback_outputs.pop("negative_prompt_embeds", negative_prompt_embeds)

                if i == len(timesteps) - 1 or ((i + 1) > num_warmup_steps and (i + 1) % self.scheduler.order == 0):
                    progress_bar.update()

        self._current_timestep = None

        if not output_type == "latent":
            latents_mean = (
                torch.tensor(self.vae.config.latents_mean)
                .view(1, self.vae.config.z_dim, 1, 1, 1)
                .to(latents.device, latents.dtype)
            )
            latents_std = 1.0 / torch.tensor(self.vae.config.latents_std).view(1, self.vae.config.z_dim, 1, 1, 1).to(
                latents.device, latents.dtype
            )
            latents = latents / latents_std + latents_mean
            video = self.vae.decode(latents.to(self.vae.dtype), return_dict=False)[0]
            video = self.video_processor.postprocess_video(video, output_type=output_type)
            image = [batch[0] for batch in video]
            if isinstance(video, torch.Tensor):
                image = torch.stack(image)
            elif isinstance(video, np.ndarray):
                image = np.stack(image)
        else:
            image = latents[:, :, 0]

        self.maybe_free_model_hooks()

        if not return_dict:
            return (image,)

        return CosmosImagePipelineOutput(images=image)
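The denoising-loop comments in `pipeline.py` describe a flow-matching velocity model with Euler integration: at each step the update is `x_next = x + (sigma_next - sigma) * velocity`, and the implied denoised estimate is `x - velocity * sigma`. A toy NumPy sketch with an "oracle" velocity (the straight-line flow `noise - x0`, an illustrative assumption standing in for the transformer's prediction) shows why this integration recovers the clean sample exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)        # the "clean" sample the flow should reach
noise = rng.standard_normal(4)     # starting latent at sigma = 1

sigmas = np.linspace(1.0, 0.0, 9)  # descending schedule (1 -> 0), 8 Euler steps
x = noise.copy()
for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
    velocity = noise - x0          # oracle prediction along the straight path
    # Denoised estimate at this sigma, matching the pipeline comment
    # "denoised = x - velocity * sigma":
    denoised = x - velocity * sigma
    # Euler step, exactly what the scheduler applies:
    x = x + (sigma_next - sigma) * velocity

# Since x_t = x0 + sigma * velocity along the straight-line flow,
# integrating from sigma = 1 down to sigma = 0 lands on x0 exactly.
assert np.allclose(x, x0)
assert np.allclose(denoised, x0)
```

With a real model the velocity varies per step, but the update rule is the same; classifier-free guidance simply replaces `velocity` with `v_uncond + g * (v_cond - v_uncond)` before the Euler step, as in the loop above.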
scheduler/scheduler_config.json ADDED

@@ -0,0 +1,6 @@

{
  "_class_name": "FlowMatchEulerDiscreteScheduler",
  "_diffusers_version": "0.37.0",
  "num_train_timesteps": 1000,
  "shift": 3.0
}
t5_tokenizer/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff.
t5_tokenizer/tokenizer_config.json
ADDED
{
  "backend": "tokenizers",
  "clean_up_tokenization_spaces": true,
  "eos_token": "</s>",
  "extra_ids": 100,
  "extra_special_tokens": {
    "extra_id_0": "<extra_id_0>",
    "extra_id_1": "<extra_id_1>",
    "extra_id_2": "<extra_id_2>",
    "extra_id_3": "<extra_id_3>",
    "extra_id_4": "<extra_id_4>",
    "extra_id_5": "<extra_id_5>",
    "extra_id_6": "<extra_id_6>",
    "extra_id_7": "<extra_id_7>",
    "extra_id_8": "<extra_id_8>",
    "extra_id_9": "<extra_id_9>",
    "extra_id_10": "<extra_id_10>",
    "extra_id_11": "<extra_id_11>",
    "extra_id_12": "<extra_id_12>",
    "extra_id_13": "<extra_id_13>",
    "extra_id_14": "<extra_id_14>",
    "extra_id_15": "<extra_id_15>",
    "extra_id_16": "<extra_id_16>",
    "extra_id_17": "<extra_id_17>",
    "extra_id_18": "<extra_id_18>",
    "extra_id_19": "<extra_id_19>",
    "extra_id_20": "<extra_id_20>",
    "extra_id_21": "<extra_id_21>",
    "extra_id_22": "<extra_id_22>",
    "extra_id_23": "<extra_id_23>",
    "extra_id_24": "<extra_id_24>",
    "extra_id_25": "<extra_id_25>",
    "extra_id_26": "<extra_id_26>",
    "extra_id_27": "<extra_id_27>",
    "extra_id_28": "<extra_id_28>",
    "extra_id_29": "<extra_id_29>",
    "extra_id_30": "<extra_id_30>",
    "extra_id_31": "<extra_id_31>",
    "extra_id_32": "<extra_id_32>",
    "extra_id_33": "<extra_id_33>",
    "extra_id_34": "<extra_id_34>",
    "extra_id_35": "<extra_id_35>",
    "extra_id_36": "<extra_id_36>",
    "extra_id_37": "<extra_id_37>",
    "extra_id_38": "<extra_id_38>",
    "extra_id_39": "<extra_id_39>",
    "extra_id_40": "<extra_id_40>",
    "extra_id_41": "<extra_id_41>",
    "extra_id_42": "<extra_id_42>",
    "extra_id_43": "<extra_id_43>",
    "extra_id_44": "<extra_id_44>",
    "extra_id_45": "<extra_id_45>",
    "extra_id_46": "<extra_id_46>",
    "extra_id_47": "<extra_id_47>",
    "extra_id_48": "<extra_id_48>",
    "extra_id_49": "<extra_id_49>",
    "extra_id_50": "<extra_id_50>",
    "extra_id_51": "<extra_id_51>",
    "extra_id_52": "<extra_id_52>",
    "extra_id_53": "<extra_id_53>",
    "extra_id_54": "<extra_id_54>",
    "extra_id_55": "<extra_id_55>",
    "extra_id_56": "<extra_id_56>",
    "extra_id_57": "<extra_id_57>",
    "extra_id_58": "<extra_id_58>",
    "extra_id_59": "<extra_id_59>",
    "extra_id_60": "<extra_id_60>",
    "extra_id_61": "<extra_id_61>",
    "extra_id_62": "<extra_id_62>",
    "extra_id_63": "<extra_id_63>",
    "extra_id_64": "<extra_id_64>",
    "extra_id_65": "<extra_id_65>",
    "extra_id_66": "<extra_id_66>",
    "extra_id_67": "<extra_id_67>",
    "extra_id_68": "<extra_id_68>",
    "extra_id_69": "<extra_id_69>",
    "extra_id_70": "<extra_id_70>",
    "extra_id_71": "<extra_id_71>",
    "extra_id_72": "<extra_id_72>",
    "extra_id_73": "<extra_id_73>",
    "extra_id_74": "<extra_id_74>",
    "extra_id_75": "<extra_id_75>",
    "extra_id_76": "<extra_id_76>",
    "extra_id_77": "<extra_id_77>",
    "extra_id_78": "<extra_id_78>",
    "extra_id_79": "<extra_id_79>",
    "extra_id_80": "<extra_id_80>",
    "extra_id_81": "<extra_id_81>",
    "extra_id_82": "<extra_id_82>",
    "extra_id_83": "<extra_id_83>",
    "extra_id_84": "<extra_id_84>",
    "extra_id_85": "<extra_id_85>",
    "extra_id_86": "<extra_id_86>",
    "extra_id_87": "<extra_id_87>",
    "extra_id_88": "<extra_id_88>",
    "extra_id_89": "<extra_id_89>",
    "extra_id_90": "<extra_id_90>",
    "extra_id_91": "<extra_id_91>",
    "extra_id_92": "<extra_id_92>",
    "extra_id_93": "<extra_id_93>",
    "extra_id_94": "<extra_id_94>",
    "extra_id_95": "<extra_id_95>",
    "extra_id_96": "<extra_id_96>",
    "extra_id_97": "<extra_id_97>",
    "extra_id_98": "<extra_id_98>",
    "extra_id_99": "<extra_id_99>"
  },
  "is_local": false,
  "model_max_length": 512,
  "pad_token": "<pad>",
  "tokenizer_class": "T5Tokenizer",
  "unk_token": "<unk>"
}
text_encoder/config.json
ADDED
{
  "architectures": [
    "Qwen3Model"
  ],
  "model_type": "qwen3",
  "vocab_size": 151936,
  "hidden_size": 1024,
  "intermediate_size": 3072,
  "num_hidden_layers": 28,
  "num_attention_heads": 16,
  "num_key_value_heads": 8,
  "head_dim": 128,
  "hidden_act": "silu",
  "max_position_embeddings": 32768,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "attention_bias": false,
  "attention_dropout": 0.0,
  "use_cache": false,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16"
}
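Two details of this config are easy to misread: Qwen3 decouples `head_dim` from `hidden_size`, and it uses grouped-query attention (`num_key_value_heads` < `num_attention_heads`). A short sketch of the projection shapes these numbers imply (arithmetic only, no model code):

```python
# Attention projection shapes implied by text_encoder/config.json.
# Qwen3's q_proj maps hidden_size -> num_attention_heads * head_dim,
# which is NOT hidden_size here because head_dim is set independently.
hidden_size = 1024
num_attention_heads = 16
num_key_value_heads = 8
head_dim = 128

q_out = num_attention_heads * head_dim        # query projection width: 2048
kv_out = num_key_value_heads * head_dim       # key/value projection width: 1024
group_size = num_attention_heads // num_key_value_heads  # 2 query heads per KV head

print(q_out, kv_out, group_size)
```

So the attention operates in a 2048-dim multi-head space even though the residual stream is 1024-dim, with each KV head shared by two query heads.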
text_encoder/generation_config.json
ADDED
{
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.6,
  "top_k": 20,
  "top_p": 0.95,
  "transformers_version": "4.51.0"
}
text_encoder/merges.txt
ADDED
The diff for this file is too large to render. See raw diff.
text_encoder/model.safetensors
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:d10aa56a4da8a95d954d99228d9e20e27f96ac5fc8aa41b89a41532b16bb4817
size 1192135064
text_encoder/tokenizer.json
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
size 11422654
text_encoder/tokenizer_config.json
ADDED
{
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151665": {
      "content": "<tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151666": {
      "content": "</tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151667": {
      "content": "<think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151668": {
      "content": "</think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0].role == 'system' %}\n        {{- messages[0].content + '\\n\\n' }}\n    {%- endif %}\n    {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0].role == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n    {%- set index = (messages|length - 1) - loop.index0 %}\n    {%- if ns.multi_step_tool and message.role == \"user\" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n        {%- set ns.multi_step_tool = false %}\n        {%- set ns.last_query_index = index %}\n    {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n    {%- if message.content is string %}\n        {%- set content = message.content %}\n    {%- else %}\n        {%- set content = '' %}\n    {%- endif %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n        {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {%- set reasoning_content = '' %}\n        {%- if message.reasoning_content is string %}\n            {%- set reasoning_content = message.reasoning_content %}\n        {%- else %}\n            {%- if '</think>' in content %}\n                {%- set reasoning_content = content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n                {%- set content = content.split('</think>')[-1].lstrip('\\n') %}\n            {%- endif %}\n        {%- endif %}\n        {%- if loop.index0 > ns.last_query_index %}\n            {%- if loop.last or (not loop.last and reasoning_content) %}\n                {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n            {%- else %}\n                {{- '<|im_start|>' + message.role + '\\n' + content }}\n            {%- endif %}\n        {%- else %}\n            {{- '<|im_start|>' + message.role + '\\n' + content }}\n        {%- endif %}\n        {%- if message.tool_calls %}\n            {%- for tool_call in message.tool_calls %}\n                {%- if (loop.first and content) or (not loop.first) %}\n                    {{- '\\n' }}\n                {%- endif %}\n                {%- if tool_call.function %}\n                    {%- set tool_call = tool_call.function %}\n                {%- endif %}\n                {{- '<tool_call>\\n{\"name\": \"' }}\n                {{- tool_call.name }}\n                {{- '\", \"arguments\": ' }}\n                {%- if tool_call.arguments is string %}\n                    {{- tool_call.arguments }}\n                {%- else %}\n                    {{- tool_call.arguments | tojson }}\n                {%- endif %}\n                {{- '}\\n</tool_call>' }}\n            {%- endfor %}\n        {%- endif %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n    {%- if enable_thinking is defined and enable_thinking is false %}\n        {{- '<think>\\n\\n</think>\\n\\n' }}\n    {%- endif %}\n{%- endif %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
}
text_encoder/vocab.json
ADDED
The diff for this file is too large to render. See raw diff.
tokenizer/chat_template.jinja
ADDED
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
{%- endfor %}
{%- for message in messages %}
    {%- if message.content is string %}
        {%- set content = message.content %}
    {%- else %}
        {%- set content = '' %}
    {%- endif %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is string %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '</think>' in content %}
                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {%- if loop.index0 > ns.last_query_index %}
            {%- if loop.last or (not loop.last and reasoning_content) %}
                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
            {%- else %}
                {{- '<|im_start|>' + message.role + '\n' + content }}
            {%- endif %}
        {%- else %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is false %}
        {{- '<think>\n\n</think>\n\n' }}
    {%- endif %}
{%- endif %}
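For a plain system + user exchange (no tools, no thinking content), the template above reduces to the familiar ChatML layout. A hand-traced sketch of the string it should produce with `add_generation_prompt=True` (traced from the template source; the tokenizer's actual `apply_chat_template` output is the source of truth):

```python
# Hand-traced rendering of tokenizer/chat_template.jinja for a simple
# system + user conversation with add_generation_prompt=True.
# The message contents here are made-up examples.
messages = [
    {"role": "system", "content": "You are a captioning assistant."},
    {"role": "user", "content": "Describe the image."},
]

expected = (
    "<|im_start|>system\nYou are a captioning assistant.<|im_end|>\n"
    "<|im_start|>user\nDescribe the image.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
```

Since `enable_thinking` is not set, no empty `<think>` block is emitted; the model decides itself whether to open one.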
tokenizer/tokenizer.json
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:be75606093db2094d7cd20f3c2f385c212750648bd6ea4fb2bf507a6a4c55506
size 11422650
tokenizer/tokenizer_config.json
ADDED
{
  "add_prefix_space": false,
  "backend": "tokenizers",
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": {
    "im_start": "<|im_start|>",
    "im_end": "<|im_end|>",
    "object_ref_start": "<|object_ref_start|>",
    "object_ref_end": "<|object_ref_end|>",
    "box_start": "<|box_start|>",
    "box_end": "<|box_end|>",
    "quad_start": "<|quad_start|>",
    "quad_end": "<|quad_end|>",
    "vision_start": "<|vision_start|>",
    "vision_end": "<|vision_end|>",
    "vision_pad": "<|vision_pad|>",
    "image_pad": "<|image_pad|>",
    "video_pad": "<|video_pad|>"
  },
  "is_local": false,
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
}
transformer/config.json
ADDED
{
  "_class_name": "CosmosTransformer3DModel",
  "_diffusers_version": "0.37.0.dev0",
  "adaln_lora_dim": 256,
  "attention_head_dim": 128,
  "concat_padding_mask": true,
  "controlnet_block_every_n": null,
  "crossattn_proj_in_channels": 1024,
  "encoder_hidden_states_channels": 1024,
  "extra_pos_embed_type": null,
  "img_context_dim_in": null,
  "img_context_dim_out": 2048,
  "img_context_num_tokens": 256,
  "in_channels": 16,
  "max_size": [
    128,
    240,
    240
  ],
  "mlp_ratio": 4.0,
  "num_attention_heads": 16,
  "num_layers": 28,
  "out_channels": 16,
  "patch_size": [
    1,
    2,
    2
  ],
  "rope_scale": [
    1.0,
    4.0,
    4.0
  ],
  "text_embed_dim": 1024,
  "use_crossattn_projection": false
}
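The transformer patchifies latents with `patch_size = [1, 2, 2]` (time, height, width), so its sequence length is determined by the latent grid. A rough sketch of the token count for a single image, assuming the Wan VAE's usual 8x spatial downsampling (three downsampling stages; the 8x factor is an assumption here, not stated in this config):

```python
# Sketch: DiT sequence length for one image, assuming an 8x spatial
# VAE downsample and patch_size = (1, 2, 2) from transformer/config.json.
vae_spatial_factor = 8   # assumption: Wan VAE downsamples H and W by 8
patch_t, patch_h, patch_w = 1, 2, 2

def num_tokens(height, width, frames=1):
    lat_h = height // vae_spatial_factor
    lat_w = width // vae_spatial_factor
    return (frames // patch_t) * (lat_h // patch_h) * (lat_w // patch_w)

print(num_tokens(1024, 1024))  # 128x128 latent -> 64x64 patches
```

The `max_size` of [128, 240, 240] bounds the latent grid itself, so the largest supported sequence would be 128 * 120 * 120 patch tokens.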
transformer/diffusion_pytorch_model.safetensors
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:bc5b769a7b2e050d7c30269bb4c94e00b445e28cdb772c1fe65001388cad17ae
size 7825687184
vae/config.json
ADDED
{
  "_class_name": "AutoencoderKLWan",
  "_diffusers_version": "0.33.0.dev0",
  "attn_scales": [],
  "base_dim": 96,
  "dim_mult": [
    1,
    2,
    4,
    4
  ],
  "dropout": 0.0,
  "latents_mean": [
    -0.7571,
    -0.7089,
    -0.9113,
    0.1075,
    -0.1745,
    0.9653,
    -0.1517,
    1.5508,
    0.4134,
    -0.0715,
    0.5517,
    -0.3632,
    -0.1922,
    -0.9497,
    0.2503,
    -0.2921
  ],
  "latents_std": [
    2.8184,
    1.4541,
    2.3275,
    2.6558,
    1.2196,
    1.7708,
    2.6052,
    2.0743,
    3.2687,
    2.1526,
    2.8652,
    1.5579,
    1.6382,
    1.1253,
    2.8251,
    1.916
  ],
  "num_res_blocks": 2,
  "temperal_downsample": [
    false,
    true,
    true
  ],
  "z_dim": 16
}
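The `latents_mean` / `latents_std` arrays give per-channel statistics for the 16-channel (`z_dim`) latent space. Wan-style pipelines typically standardize raw VAE latents with these before the diffusion transformer and invert the mapping before decoding; a minimal sketch of that normalization (the exact place it is applied in this pipeline is pipeline.py's concern, not this config's):

```python
# Sketch: per-channel latent standardization using the statistics
# from vae/config.json. z_norm = (z - mean) / std, inverted on decode.
latents_mean = [-0.7571, -0.7089, -0.9113, 0.1075, -0.1745, 0.9653, -0.1517,
                1.5508, 0.4134, -0.0715, 0.5517, -0.3632, -0.1922, -0.9497,
                0.2503, -0.2921]
latents_std = [2.8184, 1.4541, 2.3275, 2.6558, 1.2196, 1.7708, 2.6052, 2.0743,
               3.2687, 2.1526, 2.8652, 1.5579, 1.6382, 1.1253, 2.8251, 1.916]

def normalize(z_channels):
    # z_channels: 16 per-channel latent values at one spatial location
    return [(z - m) / s for z, m, s in zip(z_channels, latents_mean, latents_std)]

def denormalize(z_channels):
    return [z * s + m for z, m, s in zip(z_channels, latents_mean, latents_std)]
```

A latent equal to the channel means maps to the zero vector, which is what puts the transformer's input roughly at unit scale per channel.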
vae/diffusion_pytorch_model.safetensors
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:3b5bf326a6c4f66fb2b2250687fdccd1f126ee7c977d2f0170cb56fdacc70a9a
size 253806934