Super-squash branch 'main' using huggingface_hub

Co-authored-by: mli0603 <mli0603@users.noreply.huggingface.co>
Co-authored-by: tangyue0820 <tangyue0820@users.noreply.huggingface.co>
Co-authored-by: CYChenv <CYChenv@users.noreply.huggingface.co>
Co-authored-by: ybalaji <ybalaji@users.noreply.huggingface.co>
Co-authored-by: zekunhao <zekunhao@users.noreply.huggingface.co>
Co-authored-by: harrim-nv <harrim-nv@users.noreply.huggingface.co>
Co-authored-by: liang1225 <liang1225@users.noreply.huggingface.co>
Co-authored-by: sgururani <sgururani@users.noreply.huggingface.co>
Co-authored-by: HaotianZhangDIR <HaotianZhangDIR@users.noreply.huggingface.co>
Co-authored-by: mbalaNV <mbalaNV@users.noreply.huggingface.co>
Co-authored-by: nv-spectralflight <nv-spectralflight@users.noreply.huggingface.co>
Co-authored-by: OmarNvidia <OmarNvidia@users.noreply.huggingface.co>

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitattributes +43 -0
  2. BIAS.md +11 -0
  3. EXPLAINABILITY.md +16 -0
  4. PRIVACY.md +6 -0
  5. README.md +944 -0
  6. SAFETY.md +11 -0
  7. assets/example_action_fd_agibotworld_4chunk_output.mp4 +3 -0
  8. assets/example_action_fd_agibotworld_action_chunks.json +2008 -0
  9. assets/example_action_fd_agibotworld_first_frame.png +3 -0
  10. assets/example_action_id_av_0_input.mp4 +3 -0
  11. assets/example_action_id_av_0_output.json +669 -0
  12. assets/example_action_id_av_0_output.png +3 -0
  13. assets/example_action_id_av_1_input.mp4 +3 -0
  14. assets/example_action_id_av_1_output.json +669 -0
  15. assets/example_action_id_av_1_output.png +3 -0
  16. assets/example_i2v_input.jpg +3 -0
  17. assets/example_i2v_output.mp4 +3 -0
  18. assets/example_i2v_prompt.json +124 -0
  19. assets/example_i2vs_output.mp4 +3 -0
  20. assets/example_reasoning_input.png +3 -0
  21. assets/example_reasoning_prompt.json +4 -0
  22. assets/example_t2v_diffusers_output.mp4 +3 -0
  23. assets/example_t2v_output.mp4 +3 -0
  24. assets/example_t2v_prompt.json +115 -0
  25. assets/example_t2v_prompt_short.txt +1 -0
  26. assets/example_t2vs_output.mp4 +3 -0
  27. assets/example_t2vs_prompt.json +136 -0
  28. assets/negative_prompt.json +108 -0
  29. chat_template.json +3 -0
  30. checkpoint.json +1 -0
  31. config.json +260 -0
  32. generation_config.json +14 -0
  33. images/benchmark-action-1.png +3 -0
  34. images/benchmark-overall.png +3 -0
  35. images/benchmark-reasoning.png +3 -0
  36. images/benchmark-visual-audio.png +3 -0
  37. merges.txt +0 -0
  38. model.safetensors.index.json +0 -0
  39. model_index.json +28 -0
  40. preprocessor_config.json +21 -0
  41. scheduler/scheduler_config.json +33 -0
  42. sound_tokenizer.ckpt +3 -0
  43. sound_tokenizer.json +42 -0
  44. sound_tokenizer/config.json +64 -0
  45. sound_tokenizer/diffusion_pytorch_model.safetensors +3 -0
  46. text_tokenizer/added_tokens.json +28 -0
  47. text_tokenizer/chat_template.jinja +120 -0
  48. text_tokenizer/merges.txt +0 -0
  49. text_tokenizer/special_tokens_map.json +31 -0
  50. text_tokenizer/tokenizer.json +3 -0
.gitattributes ADDED
@@ -0,0 +1,43 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ text_tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ images/benchmark-overall.png filter=lfs diff=lfs merge=lfs -text
+ images/benchmark-reasoning.png filter=lfs diff=lfs merge=lfs -text
+ images/benchmark-visual-audio.png filter=lfs diff=lfs merge=lfs -text
+ images/benchmark-action-1.png filter=lfs diff=lfs merge=lfs -text
+ assets/*.mp4 filter=lfs diff=lfs merge=lfs -text
+ assets/*.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/*.png filter=lfs diff=lfs merge=lfs -text
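The patterns above decide which files Git routes through LFS. As a rough illustration, the sketch below checks a repository-relative path against a subset of them. It is only an approximation: it relies on Python's `fnmatch`, whose `*` also matches `/`, whereas Git applies its own gitignore-style matching rules. The helper name `is_lfs_tracked` and the `LFS_PATTERNS` subset are illustrative assumptions, not part of the commit.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the 43 patterns added in .gitattributes (chosen for illustration).
LFS_PATTERNS = [
    "*.safetensors",
    "*.ckpt",
    "*.bin",
    "text_tokenizer/tokenizer.json",
    "images/benchmark-overall.png",
    "assets/*.mp4",
    "assets/*.jpg",
    "assets/*.png",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching for a repository-relative path.

    A pattern without a slash is compared against the file's basename;
    a pattern with a slash is compared against the whole path.
    Caveat: fnmatch lets '*' cross '/' boundaries, which Git does not.
    """
    posix = PurePosixPath(path)
    for pattern in patterns:
        target = str(posix) if "/" in pattern else posix.name
        if fnmatchcase(target, pattern):
            return True
    return False
```

Under these assumptions, `is_lfs_tracked("assets/example_i2v_output.mp4")` is true while `is_lfs_tracked("config.json")` is false, which is consistent with the file list showing `config.json` as a plain-text addition.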
BIAS.md ADDED
@@ -0,0 +1,11 @@
+ ## Bias
+
+ | Field | Response |
+ | :---- | :---- |
+ | Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing | None. |
+ | Measures taken to mitigate against unwanted bias | Training, evaluation, and testing data are curated before release to filter restricted content, including content relating to protected classes. Model behavior is evaluated across Physical AI domains — robotics, autonomous vehicles, human-centric scenes, common scenes, industry, miscellaneous, and physics-oriented benchmarks — with attention to coverage across diverse demographic and contextual characteristics that affect protected-class outcomes. |
+ | Which characteristic (feature) show(s) the greatest difference in performance?: | Greatest performance differences are observed in tasks requiring long-horizon temporal consistency, fine-grained physical interactions, and embodiment-specific action generation. Performance is generally stronger on common visual reasoning and world-generation tasks than on complex multi-agent, robotics-control, or tightly synchronized multimodal generation scenarios. |
+ | Which feature(s) have the worst performance overall? | Performance is generally weakest in tasks requiring long-horizon temporal consistency, precise physical interactions, embodiment-specific action control, and strict audio-visual synchronization. |
+ | If using internal data, description of methods implemented in data acquisition or processing, if any, to address the prevalence of identifiable biases in the training, testing, and validation data: | Bias-specific methods applied during data processing include person-presence screening, demographic-taxonomy classification (age, gender, ethnicity), embedding-based diversity analysis, and dataset balancing across sources. Internal analysis surfaced: non-person scenes are more prevalent than person-centric content; demographic-taxonomy outputs on person-present samples are most frequently "uncertain" across age, gender, and ethnicity dimensions; and source-type variation, with people-centric image and video datasets showing higher demographic signal than document-, object-, robotics-, or scene-focused datasets. *(Quantitative details in the row below.)* Downstream deployments should add bias audits, fairness evaluation, red-teaming, demographically balanced fine-tuning, or counterfactual augmentation as mitigations. |
+ | Tools used to assess statistical imbalances and highlight patterns that may introduce bias into AI models: | Dataset analytics pipelines, metadata distribution analysis, heuristic quality checks, embedding-based clustering, model-assisted filtering systems, and benchmark evaluation suites are used to assess statistical imbalances and identify patterns that may introduce bias into model behavior. |
+ | Tools used to assess statistical imbalances and highlight patterns that may introduce bias into AI models: | These datasets, such as OpenImages-derived detection-to-NLP datasets, visual grounding and VQA datasets, document/image understanding datasets, video/action understanding datasets, and NVIDIA-created or curated visual datasets, do not collectively or exhaustively represent all demographic groups (and proportionally therein). For instance, automated person-presence screening did not identify a person in approximately 58% of visual samples analyzed across approximately 400 datasets, while person-present signals were identified in approximately 42% of analyzed samples. In the subset where person-present signals were identified, these datasets contain uneven representation splits across the measured visual taxonomies: age outputs were most frequently uncertain, followed by child and adult; gender outputs were most frequently uncertain, followed by male and female; and ethnicity outputs were most frequently uncertain, followed by Hispanic and White as the most frequent identified categories. Dataset-level results vary by source type, with people-centric image and video datasets containing higher person-present and demographic-taxonomy signals than document-, object-, robotics-, or scene-focused datasets. To mitigate these imbalances, we recommend considering evaluation techniques such as bias audits, task-specific fairness evaluation, and red-teaming, along with fine-tuning with demographically balanced datasets and counterfactual data augmentation to align with the desired model behavior. This evaluation used a baseline of 200 samples across all datasets, with larger subsets of up to 3,000 samples utilized for certain in-depth analyses, identified as optimal thresholds for maximizing embedder accuracy. |
EXPLAINABILITY.md ADDED
@@ -0,0 +1,16 @@
+ ## Explainability
+
+ | Field | Response |
+ | :---- | :---- |
+ | Intended Application & Domain | World reasoning and generation for Physical AI. |
+ | Model Type | Mixture-of-Transformers architecture with two towers. One is an autoregressive model for Physical AI reasoning; the other is a diffusion model for Physical AI generation. |
+ | Intended Users | Physical AI developers, researchers, and practitioners building or evaluating autonomous vehicle, robotics, and world-generation workflows. |
+ | Output | Images, videos, audio, and action commands. |
+ | Tools used to evaluate datasets to identify synthetic data and ensure data authenticity. | Dataset provenance analysis, metadata validation, watermark and artifact detection, embedding-based clustering, heuristic quality checks, and model-assisted data validation pipelines are used to identify synthetic content patterns, assess dataset authenticity, and improve data quality during dataset curation. |
+ | Describe how the model works | Cosmos3 is an Omni world foundation model that generates text, images, videos, audio, and action commands from combinations of text, image, video, and action trajectory inputs. Input tokens from multiple modalities are packed into a shared sequence and processed by a Mixture-of-Transformers backbone with modality-specific output heads. |
+ | Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | None. |
+ | Technical Limitations | The model may not follow text, image, video, audio, or action trajectory inputs accurately in challenging cases, especially where the input contains complex scene composition, unusual camera motion, multiple interacting agents, low lighting, high motion blur, or fine-grained physical interactions. Generated outputs may contain temporal inconsistency, object morphing, inaccurate 3D structure, or implausible physical dynamics. Generated audio may not accurately render intelligible speech, or maintain strict temporal and semantic alignment with the visual context. |
+ | Verified to have met prescribed NVIDIA quality standards | Yes. |
+ | Performance Metrics | Video generation is measured using PAIBench-G, RBench, PhysicsIQ, and the Artificial Analysis Image2Video benchmark. Image generation uses UniGenBench and the Artificial Analysis Text2Image benchmark. For transfer evaluation, we use PAIBench-C and AVBench-C. Audio generation uses internal benchmarks. Action prediction uses metrics such as action MSE, Absolute Translation Error, Relative Translation Error, Relative Rotation Error, PSNR, and robotic task completion success rate. |
+ | Potential Known Risks | This model can generate synthetic media and may produce content that is offensive, unsafe, misleading, indecent, or unsuitable for a target deployment. Users should implement robust safety guardrails — including content filtering, abuse monitoring, and access controls — to reduce the risk of harmful outputs. Users are responsible for ensuring that their use of the model complies with all applicable laws and regulations, and for regularly reviewing and updating their guardrails as risks evolve. |
+ | Licensing | [OpenMDW1.1](https://openmdw.ai/) |
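Two of the metrics named in the Performance Metrics row, MSE and PSNR, have standard textbook definitions. The sketch below implements those definitions on flat sequences of values. How the listed benchmarks actually compute them (normalization, per-frame averaging, value range) is not described in the source, so the function names and the default `max_value` of 255 are assumptions.

```python
import math
from typing import Sequence


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average of squared element-wise differences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equally long")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred: Sequence[float], target: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    mse = mean_squared_error(pred, target)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)
```

For action MSE, `pred` and `target` would be flattened predicted and reference action values; for PSNR they would be flattened pixel values in the range `[0, max_value]`.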
PRIVACY.md ADDED
@@ -0,0 +1,6 @@
+ ## Privacy
+ | Privacy Information |
+ |---|
+ | The model was trained on large-scale publicly available data that may contain images, audio-video, and text relating to people. NVIDIA collected and used this data in compliance with applicable data protection and privacy laws. This model was not designed to derive insights or otherwise learn from any personal data contained in the datasets. |
+ | NVIDIA uses a combination of filters, data minimization techniques, and other guardrails to help prevent personal data from being recited by our models. We employ automated tools and data processing techniques during pre-training or training to identify and filter certain categories of personal data. For example, for text-bearing source and document components, our automated tools identified potential personal data such as person names, locations, and possible business or public-facing contact information such as email addresses and phone numbers. We reviewed and removed any verified instances of personal data through a combination of automated filtering and human-in-the-loop validation. |
+ | Please review NVIDIA's [Privacy Policy](https://www.nvidia.com/en-us/about-nvidia/privacy-policy/) for more information. |
README.md ADDED
@@ -0,0 +1,944 @@
+ ---
+ license: other
+ license_name: openmdw1.1-license
+ license_link: >-
+   https://openmdw.ai/license/1-1/
+ library_name: cosmos
+ tags:
+ - nvidia
+ - cosmos
+ - cosmos3
+ - vllm
+ - vllm-omni
+ - diffusers
+ - text, image, video, audio, and action generation
+ - omnimodel
+ ---
+
+ # **Cosmos 3: Omnimodal World Models for Physical AI**
+ **[Model Collection](https://huggingface.co/collections/nvidia/cosmos3)** | **[Code](https://github.com/nvidia/cosmos)** | **[White Paper](https://research.nvidia.com/labs/cosmos-lab/cosmos3/technical-report.pdf)** | **[Website](https://research.nvidia.com/labs/cosmos-lab/cosmos3/)**
+
+ [NVIDIA Cosmos™](https://github.com/nvidia/cosmos) is a world foundation model platform designed to accelerate the development of Physical AI by enabling machines to understand, simulate, and interact with the physical world across robotics, autonomous driving, and smart space environments, including industrial and factory-scale applications.
+
+ # Model Overview: Cosmos3-Super
+
+ ## Description
+
+ Cosmos3 is a collection of Omnimodal world models capable of generating dynamic, high-quality video, image, audio, and action commands from combinations of text, image, video, and action trajectory inputs. It serves as a foundational building block for a broad range of Physical AI applications and research spanning world understanding, world generation, simulation, and embodied policy learning.
+
+ This model is ready for commercial and non-commercial use.
+
+ **Model Developer:** NVIDIA
+
+ ### Model Versions
+ - Cosmos3-Nano:
+   - Given multimodal inputs including text, images, video, audio, and action trajectories, generate coherent text, images, video, audio, and action outputs for multimodal understanding, world simulation, future prediction, action reasoning, and Physical AI applications.
+
+ - Cosmos3-Super:
+   - Given multimodal inputs including text, images, video, audio, and action trajectories, generate coherent text, images, video, audio, and action outputs for multimodal understanding, world simulation, future prediction, action reasoning, and Physical AI applications.
+
+ - Cosmos3-Nano-Policy-DROID:
+   - Given language instructions and visual observations from the DROID robot platform, generate robot action trajectories for manipulation and control tasks.
+
+ - Cosmos3-Super-Image2Video:
+   - Given one input image and text instructions, generate temporally coherent video sequences that are consistent with the provided visual content.
+
+ - Cosmos3-Super-Text2Image:
+   - Given text input, generate high-fidelity images that are consistent with the provided description.
+
+ ### License
+
+ This model is released under the [OpenMDW1.1](https://openmdw.ai/license/1-1/) license.
+
+ ### Deployment Geography
+
+ Global
+
+ ### Use Case
+
+ Physical AI: Encompassing robotics, autonomous vehicles (AV), and smart space environments, including industrial and factory-scale applications.
+
+ ### Release Date
+
+ - Hugging Face: 05/31/2026 via [https://huggingface.co/collections/nvidia/cosmos3](https://huggingface.co/collections/nvidia/cosmos3)
+ - GitHub: 05/31/2026 via [https://github.com/nvidia/cosmos](https://github.com/nvidia/cosmos)
+
+ ## Model Architecture
+
+ **Architecture Type:** Transformer
+
+ **Network Architecture:** Mixture-of-Transformers (MoT)
+
+ Cosmos3 is an Omni-modal foundation model built on a Mixture-of-Transformers (MoT) architecture consisting of two complementary transformer towers: an autoregressive transformer for discrete token generation and a diffusion transformer for continuous multimodal generation. During inference, text is generated through standard next-token autoregressive decoding, while non-text modalities, such as images, video, audio, and actions, are synthesized through iterative denoising. This unified architecture enables Cosmos3 to model heterogeneous modalities within a single framework while preserving generation mechanisms best suited to each modality.
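The split described above (text through autoregressive decoding, everything else through iterative denoising) can be summarized as a small lookup. This is only a reading aid for the paragraph, not the Cosmos3 implementation: the `Tower` enum, the `OUTPUT_TOWER` table, and `tower_for` are hypothetical names introduced here.

```python
from enum import Enum


class Tower(Enum):
    """Generation mechanism used for an output modality (hypothetical names)."""
    AUTOREGRESSIVE = "autoregressive next-token decoding"
    DIFFUSION = "iterative denoising"


# Mapping taken from the architecture description: text is decoded token by
# token, all non-text modalities are synthesized by the diffusion tower.
OUTPUT_TOWER = {
    "text": Tower.AUTOREGRESSIVE,
    "image": Tower.DIFFUSION,
    "video": Tower.DIFFUSION,
    "audio": Tower.DIFFUSION,
    "action": Tower.DIFFUSION,
}


def tower_for(modality: str) -> Tower:
    """Return the tower responsible for generating the given output modality."""
    try:
        return OUTPUT_TOWER[modality.lower()]
    except KeyError:
        raise ValueError(f"unsupported output modality: {modality!r}") from None
```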
+
+ **This model was developed based on:** [Cosmos Framework](https://github.com/nvidia/cosmos-framework)
+
+ **Number of trainable model parameters:**
+
+ - Cosmos3-Nano: 16B
+ - Cosmos3-Super: 64B
+ - Cosmos3-Nano-Policy-DROID: 16B
+ - Cosmos3-Super-Image2Video: 64B
+ - Cosmos3-Super-Text2Image: 64B
+
+ ## Input/Output Specifications
+
+ - **Generator Input**
+   - **Input Type(s)**: Text, Image, Video (with audio or without audio), Action Trajectory
+   - **Input Format(s)**:
+     - Text: String
+     - Image: jpg, png, jpeg, webp
+     - Video (with or without audio): mp4
+     - Action: json (1D list)
+   - **Input Parameters**:
+     - Text: One-dimensional (1D)
+     - Image: Two-dimensional (2D)
+     - Video: Three-dimensional (3D)
+     - Audio: One-dimensional (1D)
+     - Action trajectory: One-dimensional (1D)
+   - **Other Properties Related to Input**:
+     - For video inputs, we accept various resolutions, including 720p, 480p, and 256p.
+     - When using input video with audio muxed into the video MP4 file, the audio should have 2 channels (stereo) and a 48 kHz sample rate.
+     - Image and video inputs are RGB color (8 bits per channel, sRGB color space); grayscale inputs are not supported.
+     - Action input is a per-frame sequence of robot/agent state or control values (e.g., joint positions, gripper state, camera pose). The full input is a 2D array shaped (T, D), where T is the number of frames and D is the embodiment-specific dimensionality listed below.
+     - Input action is only supported for compatible embodiments, including general camera motion (9D), autonomous vehicle (9D), egocentric motion (57D), single Franka Panda arm with RobotiQ gripper (10D), dual Franka Panda arm with RobotiQ gripper (20D), Agibot (29D), UR (10D), Google robot (10D), WidowX 250 (10D), UMI (9D).
+   - **Input Size and Length limits:**
+     - **Text:** 4096 tokens
+     - **Image:** 256p, 480p, and 720p resolution at one of these aspect ratios (16:9, 4:3, 1:1, 3:4, 9:16)
+     - **Video:** 256p, 480p, and 720p resolution at one of these aspect ratios (16:9, 4:3, 1:1, 3:4, 9:16). Max number of frames = 5.
+     - **Audio:** Max 0.5 seconds
+     - **Action:** 16 – 400 video frames
+ - **Generator Output**
+   - **Output Type(s)**: Image, video, audio, action, text
+   - **Output Format(s)**:
+     - Image: JPG
+     - Video: MP4
+     - Audio: Advanced Audio Coding (AAC) stream (muxed within the MP4)
+     - Action: 1D list (.json)
+     - Text: string
+   - **Output Parameters**:
+     - Image: Two-dimensional (2D)
+     - Video: Three-dimensional (3D)
+     - Audio: One-dimensional (1D)
+     - Action: One-dimensional (1D)
+     - Text: One-dimensional (1D)
+   - **Other Properties Related to Output**:
+     - The generated video is an MP4 file, with the resolution, frame rate, and duration specified in the input. The generated audio is encoded in AAC format, muxed into the video MP4 file with 2 channels (stereo) and a 48 kHz sample rate.
+     - Video generation supports durations from 5 to 400 frames, with 189 frames as the default generation duration.
+     - The generated action is only supported for compatible embodiments, including general camera motion (9D), autonomous vehicle (9D), egocentric motion (57D), single Franka Panda arm with RobotiQ gripper (10D), dual Franka Panda arm with RobotiQ gripper (20D), Agibot (29D), UR (10D), Google robot (10D), WidowX 250 (10D), UMI (9D).
+     - Audio: 48 kHz stereo AAC stream muxed into video mp4
+     - Video: mp4 at the FPS specified in input
+     - Image: JPEG
+ - **Reasoner Input**
+   - **Input Type(s)**: Text, Text+Image, Text+Video
+   - **Input Format(s)**:
+     - Text: String
+     - Image: jpg, png, jpeg, webp
+     - Video: mp4
+   - **Input Parameters**:
+     - Text: One-dimensional (1D)
+     - Image: Two-dimensional (2D)
+     - Video: Three-dimensional (3D)
+   - **Other Properties Related to Input**:
+     - Video inputs are recommended at a frame rate of 4 fps.
+     - Long-context inputs supported up to 256K tokens.
+   - **Input Size and Length limits:**
+     - **Text:** Up to 256K tokens (context window).
+     - **Image:** Standard input image formats; passed as file or URL.
+     - **Video:** mp4 at the recommended 4 fps.
+ - **Reasoner Output**
+   - **Output Type(s)**: Text
+   - **Output Format(s)**:
+     - Text: string
+   - **Output Parameters**:
+     - Text: One-dimensional (1D)
+   - **Other Properties Related to Output**:
+     - A `max_tokens` value of 4096 or higher is recommended for reasoning outputs; longer outputs may be requested.
+     - Reasoning outputs may include structured chain-of-thought, 2D/3D point localization, and bounding-box coordinates for vision-based tasks.
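The action-input rules in the list above (a `(T, D)` array, an embodiment-specific `D`, and 16 to 400 frames) can be checked before a request is sent. The following is a minimal sketch under those stated limits. The dictionary keys are hypothetical labels invented for this example, and the function is not part of any Cosmos API. The spec also describes the JSON format as a "1D list", so how the `(T, D)` array is serialized is left open here.

```python
from typing import Sequence, Tuple

# Per-frame dimensionality D for each supported embodiment, as listed above.
# The keys are hypothetical labels chosen for this sketch.
ACTION_DIMS = {
    "camera_motion": 9,
    "autonomous_vehicle": 9,
    "egocentric_motion": 57,
    "franka_single_arm": 10,
    "franka_dual_arm": 20,
    "agibot": 29,
    "ur": 10,
    "google_robot": 10,
    "widowx_250": 10,
    "umi": 9,
}
MIN_FRAMES, MAX_FRAMES = 16, 400  # "Action: 16 - 400 video frames"


def validate_action(actions: Sequence[Sequence[float]],
                    embodiment: str) -> Tuple[int, int]:
    """Check that `actions` is a (T, D) array valid for `embodiment`.

    Returns the (T, D) shape on success and raises ValueError otherwise.
    """
    if embodiment not in ACTION_DIMS:
        raise ValueError(f"unsupported embodiment: {embodiment!r}")
    expected_dim = ACTION_DIMS[embodiment]
    num_frames = len(actions)
    if not MIN_FRAMES <= num_frames <= MAX_FRAMES:
        raise ValueError(
            f"expected {MIN_FRAMES} to {MAX_FRAMES} frames, got {num_frames}")
    for index, frame in enumerate(actions):
        if len(frame) != expected_dim:
            raise ValueError(
                f"frame {index} has {len(frame)} values, expected {expected_dim}")
    return num_frames, expected_dim
```

For example, sixteen frames of nine values each pass for the `"umi"` label, while the same array is rejected for `"agibot"`, which expects 29 values per frame.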
+
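The Reasoner input notes recommend video at 4 fps. One way to meet that recommendation is to subsample frames from a higher-rate clip before encoding. The sketch below computes which source frame indices to keep. It assumes uniform sampling by flooring fractional positions, which is a choice made for this example; the source does not specify how resampling should be done, and `sample_frame_indices` is a hypothetical helper.

```python
from typing import List


def sample_frame_indices(num_frames: int, source_fps: float,
                         target_fps: float = 4.0) -> List[int]:
    """Pick source frame indices that approximate `target_fps`.

    If the source is already at or below the target rate, every frame is kept.
    """
    if num_frames < 0 or source_fps <= 0 or target_fps <= 0:
        raise ValueError("num_frames must be >= 0 and frame rates must be > 0")
    if target_fps >= source_fps:
        return list(range(num_frames))
    step = source_fps / target_fps  # source frames per kept frame
    indices: List[int] = []
    i = 0
    while int(i * step) < num_frames:
        indices.append(int(i * step))
        i += 1
    return indices
```

With these assumptions, one second of 30 fps video (30 frames) yields four indices, and two seconds of 24 fps video (48 frames) yields eight.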
+ The video content visualizes the input text description as a short animated scene, capturing key elements within the specified time constraints.
+
+ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
+
+ ## Software Integration
+
+ **Runtime Engine(s):**
+
+ - [PyTorch](https://github.com/nvidia/cosmos3)
+ - [vLLM-Omni](https://github.com/vllm-project/vllm-omni)
+ - [Hugging Face Diffusers](https://huggingface.co/docs/diffusers/en/index)
+
+ **Supported Hardware Microarchitecture Compatibility:**
+
+ - NVIDIA Ampere
+ - NVIDIA Blackwell
+ - NVIDIA Hopper
+
+ **Operating System(s):**
+
+ - Linux (We have not tested on other operating systems.)
+
+ **Note:** Only BF16 precision is tested. Other precisions like FP4, FP8, and FP16 are not officially supported.
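Because only BF16 is tested, a back-of-the-envelope weight footprint follows from the parameter counts listed earlier: each BF16 parameter occupies 2 bytes. The sketch below does that arithmetic. It treats the rounded "16B" and "64B" figures as exact, counts weights only (no activations, caches, or auxiliary components such as tokenizers), and is not an official hardware requirement.

```python
BYTES_PER_BF16_PARAM = 2  # bfloat16 stores each value in 16 bits

# Rounded trainable-parameter counts from the model card.
PARAMETER_COUNTS = {
    "Cosmos3-Nano": 16_000_000_000,
    "Cosmos3-Super": 64_000_000_000,
    "Cosmos3-Nano-Policy-DROID": 16_000_000_000,
    "Cosmos3-Super-Image2Video": 64_000_000_000,
    "Cosmos3-Super-Text2Image": 64_000_000_000,
}


def bf16_weight_gigabytes(model_name: str) -> float:
    """Approximate size of the BF16 weights in decimal gigabytes (1e9 bytes)."""
    return PARAMETER_COUNTS[model_name] * BYTES_PER_BF16_PARAM / 1e9
```

Under these assumptions the Nano-sized models need roughly 32 GB for weights and the Super-sized models roughly 128 GB, so the latter would generally have to be sharded across several GPUs.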
+
+ The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
+
+ ## Training, Testing, and Evaluation Datasets
+
+ ### Dataset Overview
+
+ - **Total Size:** 1.3B data points
+ - **Total Number of Datasets:** 393 dataset entries
+ - **Dataset partition:** Training [100%], Testing [N/A; evaluation benchmarks used separately], Validation [N/A; evaluation benchmarks used separately]
+ - **Time period for training data collection:** 2024–2026
+ - **Time period for testing data collection:** N/A (standard public benchmarks)
+ - **Time period for validation data collection:** N/A (standard public benchmarks)
+
+ Raw data from internal and external sources is transformed into training-ready data through multiple stages of curation, filtering, and quality review. Data acquisition spans diverse multimodal sources — robotics, autonomous driving, industrial environments, indoor and outdoor scenes, varied lighting and weather conditions, camera viewpoints, object categories, and human activities — to broaden coverage across Physical AI operating environments. Automated filtering pipelines remove corrupted, duplicate, low-quality, and restricted content. Metadata analysis, heuristic rules, and model-assisted classifiers are applied during preprocessing to flag anomalous distributions and low-diversity subsets. Human review supplements automated filtering for selected datasets, benchmark construction, and targeted quality analysis. Datasets are balanced across modalities and task categories — visual reasoning, text-to-image, text-to-video, image-to-video, audio generation, video transfer, action-conditioned generation, and action command generation — to reduce overrepresentation of narrow domains. Synthetic and simulation-based augmentation supplements coverage of rare physical interactions and edge-case scenarios. Deduplication and provenance tracking are applied across the corpus. The resulting processed data is converted into model-ready tokenized or encoded representations through modality-specific preprocessors before training begins.
+
+ Training datasets passed through multiple layers of automated and manual safeguards designed to reduce the presence of harmful or policy-violating content across categories including weapons and weapons-related instructional content, criminal planning, child sexual abuse material (CSAM), non-consensual intimate imagery (NCII), sexual content involving minors, harassment, hate speech, profanity, threats and incitement to violence, self-harm or suicide-related content, and graphic violence. Data sources are reviewed for licensing compatibility, provenance, and alignment with internal data governance and safety policies before admission into training corpora. Automated filtering pipelines combine multiple detection strategies: hash-matching against known CSAM and NCII reference databases; classifier-based moderation models trained for explicit sexual content, hate speech, violence, weapons imagery, and other restricted categories; keyword and regex-based screening for criminal-planning, threats, and self-harm phrases in text data; metadata and provenance heuristics for source-level risk signals; and embedding-based anomaly detection to surface samples that fall outside expected distributions. Human review and targeted audits supplement automated filtering for selected datasets, benchmark construction, and safety-sensitive evaluation. For multimodal Physical AI data (robotics, autonomous driving, industrial scenes), additional filtering targets invalid action trajectories, physically implausible interactions, and unsafe control sequences. Synthetic and simulation-generated data are evaluated through internal validation before inclusion. Benchmark evaluations and red-team testing are applied post-training to surface remaining safety gaps across world generation, reasoning, audio, and action tasks. No large-scale data-filtering process can guarantee complete removal of all harmful content; residual risks may remain, particularly in rare edge cases or open-world deployment settings. Ongoing monitoring and dataset review continue post-release.
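Two of the detection strategies named above, hash-matching against a reference set and regex screening of text, can be sketched as a small filter. This is a deliberately simplified illustration, not NVIDIA's pipeline: it uses exact SHA-256 digests, whereas production reference databases commonly rely on perceptual hashes, and the `Sample` type, function names, and patterns are all hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Iterable, List, Pattern, Set


@dataclass(frozen=True)
class Sample:
    payload: bytes  # raw media bytes
    caption: str    # associated text


def sha256_hex(data: bytes) -> str:
    """Hex digest used as the lookup key into the blocked-hash set."""
    return hashlib.sha256(data).hexdigest()


def filter_samples(samples: Iterable[Sample],
                   blocked_hashes: Set[str],
                   blocked_text: Pattern[str]) -> List[Sample]:
    """Drop samples whose bytes match a blocked hash or whose caption
    matches the blocked-phrase pattern; keep everything else."""
    kept: List[Sample] = []
    for sample in samples:
        if sha256_hex(sample.payload) in blocked_hashes:
            continue  # exact match against the reference set
        if blocked_text.search(sample.caption):
            continue  # keyword/regex screen on text
        kept.append(sample)
    return kept
```

A real pipeline would layer the classifier, metadata, and embedding-based stages described in the paragraph on top of checks like these.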
+
+ **Data Modality and Training Data Size**
+
+ | Modality | Reasoning Data Sample Count | Generation Data Sample Count |
+ | -------- | ------------------- | -------------------- |
+ | Text | 22M | Not Applicable |
+ | Image | 19M | 767M |
+ | Video | 1M | 348M |
+ | Audio | Not Applicable | 139M |
+ | Action | Not Applicable | 8M |
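As a quick consistency check, the per-modality counts in this table can be summed and compared with the "1.3B data points" figure under Dataset Overview. The sketch below does that arithmetic. It assumes the overall total is simply the sum of both columns, which the source does not state explicitly.

```python
# Sample counts in millions, copied from the table; None marks "Not Applicable".
SAMPLES_MILLIONS = {
    "Text":   {"reasoning": 22,   "generation": None},
    "Image":  {"reasoning": 19,   "generation": 767},
    "Video":  {"reasoning": 1,    "generation": 348},
    "Audio":  {"reasoning": None, "generation": 139},
    "Action": {"reasoning": None, "generation": 8},
}


def column_total(column: str) -> int:
    """Sum one column of the table, skipping 'Not Applicable' cells."""
    return sum(row[column] for row in SAMPLES_MILLIONS.values()
               if row[column] is not None)


reasoning_total = column_total("reasoning")      # 42 (million)
generation_total = column_total("generation")    # 1262 (million)
grand_total = reasoning_total + generation_total  # 1304 (million), about 1.3B
```

The sum of 1,304M samples rounds to the 1.3B total reported above.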
+
+ **Data Collection Method by dataset**
+
+ - Hybrid: Automatic/Sensors, Synthetic, Automated
+
+ **Labeling Method by dataset**
+
+ - Hybrid: Human, Automated
+
+ **Properties:** The training, testing, and evaluation datasets consist of diverse multimodal video, image, audio, action, synthetic, and sensor-conditioned data sourced from NVIDIA-owned data and publicly available, commercially permissive datasets. These datasets are curated to exclude known restricted content and to support building an Omni model that learns to generate and reason about dynamic physical environments across world reasoning and generation tasks.
219
+
220
+ ### Public Datasets
221
+
222
+ | Dataset | Samples |
223
+ |---|---|
224
+ | OpenImage | 1.2M |
225
+ | Coyo700M | 100M |
226
+ | YouTube Video | 340M |
227
+ | UMI | 4.5M |
228
+
229
+ ### Private Datasets
230
+
231
+ | Dataset | Samples |
232
+ |---|---|
233
+ | Egocentric | 7M |
234
+ | Nexar | 0.6M |
235
+ | AgiBot | 0.2M |
236
+ | HOI | 0.3M |
237
+
238
+ ### Synthetic Datasets
239
+
240
+ | Dataset | Samples |
241
+ |---|---|
242
+ | synthetic images generated using HiDream-I1 | 15M |
243
+ | synthetic images generated using Qwen-Image-2512 | 14M |
244
+ | synthetic captions generated using Qwen3-VL | 1115M |
245
+
246
+ ## Evaluation Datasets
247
+
248
+ **Data Collection Method by dataset**
249
+
250
+ - Hybrid: Automatic/Sensors, Synthetic, Automated
251
+
252
+ **Labeling Method by dataset**
253
+
254
+ - Hybrid: Human, Automated
255
+
256
+ **Properties:** The training, testing, and evaluation datasets consist of diverse multimodal video, image, audio, action, synthetic, and sensor-conditioned data sourced from NVIDIA-owned data and publicly available, commercially permissive datasets. These datasets are curated to exclude known restricted content and to support building an Omni model that learns to generate and reason about dynamic physical environments across world reasoning and generation tasks.
257
+
258
+ ## Benchmarks
259
+
260
+ Please see our [technical paper](https://research.nvidia.com/labs/cosmos-lab/cosmos3/technical-report.pdf) for detailed evaluations of the base model.
261
+
262
+ ### Overall
263
+
264
+ ![Overall benchmark results](images/benchmark-overall.png)
265
+
266
+ ### Reasoning Benchmarks
267
+
268
+ ![Reasoning benchmarks](images/benchmark-reasoning.png)
269
+
270
+ ### Generation Benchmarks
271
+
272
+ #### Visual-Audio Generation
273
+
274
+ ![Visual & audio generation benchmarks](images/benchmark-visual-audio.png)
275
+
276
+ #### Action
277
+
278
+ ![Action benchmarks — forward and inverse dynamics](images/benchmark-action-1.png)
279
+
280
+ ## Usage
281
+
282
+ - See [Cosmos](https://github.com/nvidia/cosmos) for details.
283
+
284
+ ### Prompt upsampling
285
+
286
+ For optimal quality, text prompts should be upsampled into a specific JSON structure. Description and code can be found [here](https://github.com/nvidia/cosmos-framework/blob/main/docs/prompt_upsampling.md).
287
+
288
+ For example, for text-to-video upsampling using Opus-4.6:
289
+
290
+ ```bash
291
+ git clone https://github.com/NVIDIA/cosmos-framework.git packages/cosmos-framework
292
+ pip install -e packages/cosmos-framework
293
+
294
+ export PROMPT_UPSAMPLER_ENDPOINT_URL="https://api.anthropic.com/v1/"
295
+ export PROMPT_UPSAMPLER_MODEL_NAME="claude-opus-4-6"
296
+ export PROMPT_UPSAMPLER_API_TOKEN="<your_token>"
297
+
298
+ python -m cosmos_framework.inference.prompt_upsampling \
299
+ --input assets/example_t2v_prompt_short.txt \
300
+ --output /tmp/upsampled_t2v_opus/ \
301
+ --mode text2video \
302
+ --endpoint-url "${PROMPT_UPSAMPLER_ENDPOINT_URL}" \
303
+ --model "${PROMPT_UPSAMPLER_MODEL_NAME}" \
304
+ --api-token "${PROMPT_UPSAMPLER_API_TOKEN}" \
305
+ --resolution 720 \
306
+ --aspect-ratio "16,9"
307
+ ```
308
+
309
+ The JSON-upsampled version of `assets/example_t2v_prompt_short.txt` is saved in `assets/example_t2v_prompt.json` for convenience, and is used for the video generation examples below.
310
+
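The request examples in the sections below pass the upsampled prompt to the server as a single string by calling `json.dumps` on the loaded JSON object. The following minimal sketch shows that round trip. The helper name `to_form_field` and the keys in `example_prompt` are hypothetical placeholders; the real schema is the one described in the prompt-upsampling documentation linked above.

```python
import json


def to_form_field(prompt_obj: dict) -> str:
    """Serialize a structured prompt into one string for a form field."""
    return json.dumps(prompt_obj, ensure_ascii=False)


# Hypothetical prompt structure, for illustration only.
example_prompt = {"scene": "a robot arm on a table", "camera": {"motion": "static"}}

field = to_form_field(example_prompt)

# Decoding the string gives back the original object, so nothing is lost
# when the prompt travels as text.
assert json.loads(field) == example_prompt
print(type(field).__name__, len(field))
```

Keeping the prompt as a parsed object until the last moment makes it easy to inspect or adjust individual fields before sending the request.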
311
+ ### vLLM-Omni
312
+
313
+ #### Container
314
+
315
+ ```bash
316
+ docker pull vllm/vllm-omni:cosmos3
317
+ ```
318
+
319
+ #### General Invocation
320
+
321
+ You can use the release-tested `vllm-omni` package for deploying an OpenAI-compatible API inference endpoint.
322
+ The recommended vLLM-Omni serving configuration for nvidia/Cosmos3-Super on 8xH200, 8xH100, or 8xA100 is:
323
+
324
+ ```bash
325
+ vllm serve nvidia/Cosmos3-Super \
326
+ --omni \
327
+ --host 0.0.0.0 \
328
+ --port 8000 \
329
+ --cfg-parallel-size 2 \
330
+ --ulysses-degree 4 \
331
+ --use-hsdp \
332
+ --hsdp-shard-size 8 \
333
+ --init-timeout 1800
334
+ ```
335
+
336
+ With this configuration, video generation with 50 steps should take approximately 55 seconds on H200 GPUs. For 2xH200, one can simply use `--cfg-parallel-size 2 --use-hsdp --hsdp-shard-size 2`, and a video should take approximately 3 minutes to generate. Tensor parallelism is also supported by setting `--tensor-parallel-size`. Setting `--enable-layerwise-offload` can help reduce memory usage on GPUs with less available memory.
337
+
338
+ #### Examples
339
+
340
+ ##### Download example prompts
341
+
342
+ The example inputs (`assets/`) live in this model repo. Download just this folder with the Hugging Face CLI:
343
+
344
+ ```bash
345
+ pip install -U "huggingface_hub[cli]"
346
+ hf download nvidia/Cosmos3-Super assets/ --local-dir Cosmos3-Super
347
+ cd Cosmos3-Super
348
+ ```
349
+
350
+ Run all commands below from the downloaded repo root.
351
+
352
+ ---
353
+
354
+ ##### Image to video generation
355
+
356
+ ```python
357
+ import json
358
+ import mimetypes
359
+ from pathlib import Path
360
+
361
+ import requests
362
+
363
+ # 1. Read JSON-upsampled prompt and negative prompt
364
+ json_prompt = json.load(open("assets/example_i2v_prompt.json"))
365
+ negative_prompt = json.load(open("assets/negative_prompt.json"))
366
+
367
+ # 2. Build and send the multipart API request
368
+ url = "http://localhost:8000/v1/videos/sync"
369
+ image_path = Path("assets/example_i2v_input.jpg")
370
+ mime_type = mimetypes.guess_type(image_path)[0] or "image/png"
371
+ data = {
372
+ "prompt": json.dumps(json_prompt),
373
+ "negative_prompt": json.dumps(negative_prompt),
374
+ "size": "1280x720",
375
+ "num_frames": "189",
376
+ "fps": "24",
377
+ "num_inference_steps": "35",
378
+ "guidance_scale": "6.0",
379
+ "max_sequence_length": "4096",
380
+ "flow_shift": "10.0",
381
+ "extra_params": json.dumps(
382
+ {
383
+ "use_resolution_template": False,
384
+ "use_duration_template": False,
385
+ "guardrails": True,
386
+ }
387
+ ),
388
+ "seed": "17",
389
+ }
390
+
391
+ with image_path.open("rb") as image_file:
392
+ files = {
393
+ "input_reference": (image_path.name, image_file, mime_type),
394
+ }
395
+ print("Sending request to server...")
396
+ response = requests.post(
397
+ url,
398
+ data=data,
399
+ files=files,
400
+ headers={"Accept": "video/mp4"},
401
+ )
402
+ response.raise_for_status()
403
+
404
+ # 3. Save the generated video
405
+ output_path = Path("/tmp/cosmos3_super_i2v.mp4")
406
+ output_path.write_bytes(response.content)
407
+ print(f"Saved video to {output_path}")
408
+ ```
409
+
410
+ Example output:
411
+
412
+ <video controls width="1280" height="720" src="https://huggingface.co/nvidia/Cosmos3-Super/resolve/main/assets/example_i2v_output.mp4"></video>
413
+
414
+ ---
415
+
416
+ ##### Text to video generation
417
+
418
+ ```python
419
+ import json
420
+ from pathlib import Path
421
+
422
+ import requests
423
+
424
+ # 1. Read JSON-upsampled prompt and negative prompt
425
+ json_prompt = json.load(open("assets/example_t2v_prompt.json"))
426
+ negative_prompt = json.load(open("assets/negative_prompt.json"))
427
+
428
+ # 2. Build your API payload
429
+ data = {
430
+ "prompt": json.dumps(json_prompt),
431
+ "negative_prompt": json.dumps(negative_prompt),
432
+ "size": "1280x720",
433
+ "num_frames": "189",
434
+ "fps": "24",
435
+ "num_inference_steps": "35",
436
+ "guidance_scale": "6.0",
437
+ "max_sequence_length": "4096",
438
+ "flow_shift": "10.0",
439
+ "extra_params": json.dumps(
440
+ {
441
+ "use_resolution_template": False,
442
+ "use_duration_template": False,
443
+ "guardrails": True,
444
+ }
445
+ ),
446
+ "seed": "17",
447
+ }
448
+
449
+ # 3. Send the POST request
450
+ url = "http://localhost:8000/v1/videos/sync"
451
+ print("Sending request to server...")
452
+ response = requests.post(
453
+ url,
454
+ data=data,
455
+ headers={"Accept": "video/mp4"},
456
+ )
457
+ response.raise_for_status()
458
+
459
+ # 4. Save the generated video
460
+ output_path = Path("/tmp/cosmos3_super_t2v.mp4")
461
+ output_path.write_bytes(response.content)
462
+ print(f"Saved video to {output_path}")
463
+ ```
464
+
465
+ Example output:
466
+
467
+ <video controls width="1280" height="720" src="https://huggingface.co/nvidia/Cosmos3-Super/resolve/main/assets/example_t2v_output.mp4"></video>
468
+
469
+ ---
470
+
471
+ ##### Image to Video + Audio generation
472
+
473
+ ```python
474
+ import json
475
+ import mimetypes
476
+ from pathlib import Path
477
+
478
+ import requests
479
+
480
+ # 1. Read JSON-upsampled prompt and negative prompt
481
+ json_prompt = json.load(open("assets/example_i2v_prompt.json"))
482
+ negative_prompt = json.load(open("assets/negative_prompt.json"))
483
+
484
+ # 2. Build and send the multipart API request
485
+ url = "http://localhost:8000/v1/videos/sync"
486
+ image_path = Path("assets/example_i2v_input.jpg")
487
+ mime_type = mimetypes.guess_type(image_path)[0] or "image/png"
488
+ data = {
489
+ "prompt": json.dumps(json_prompt),
490
+ "negative_prompt": json.dumps(negative_prompt),
491
+ "size": "1280x720",
492
+ "num_frames": "189",
493
+ "fps": "24",
494
+ "num_inference_steps": "35",
495
+ "guidance_scale": "6.0",
496
+ "max_sequence_length": "4096",
497
+ "generate_sound": "true",
498
+ "sound_duration": "7.875",
499
+ "flow_shift": "10.0",
500
+ "extra_params": json.dumps(
501
+ {
502
+ "use_resolution_template": False,
503
+ "use_duration_template": False,
504
+ "guardrails": True,
505
+ }
506
+ ),
507
+ "seed": "17",
508
+ }
509
+
510
+ with image_path.open("rb") as image_file:
511
+ files = {
512
+ "input_reference": (image_path.name, image_file, mime_type),
513
+ }
514
+ print("Sending request to server...")
515
+ response = requests.post(
516
+ url,
517
+ data=data,
518
+ files=files,
519
+ headers={"Accept": "video/mp4"},
520
+ )
521
+ response.raise_for_status()
522
+
523
+ # 3. Save the generated video
524
+ output_path = Path("/tmp/cosmos3_super_i2vs.mp4")
525
+ output_path.write_bytes(response.content)
526
+ print(f"Saved video to {output_path}")
527
+ ```
528
+
529
+ Example output:
530
+
531
+ <video controls width="1280" height="720" src="https://huggingface.co/nvidia/Cosmos3-Super/resolve/main/assets/example_i2vs_output.mp4"></video>
532
+
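The `sound_duration` value in the request above equals the clip length implied by `num_frames` and `fps` (189 / 24 = 7.875 seconds). If you change either value, a small helper along these lines can recompute the duration. This is a sketch: the function name is hypothetical, and whether the server requires the two durations to match exactly is an assumption not confirmed here.

```python
from fractions import Fraction


def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Return the playback duration of num_frames shown at fps frames per second."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    # Fraction avoids accumulating rounding error before the final conversion.
    return float(Fraction(num_frames, fps))


duration = clip_duration_seconds(189, 24)
print(duration)  # 7.875
```

The result can be passed as `str(duration)` for the `sound_duration` form field.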
533
+ ---
534
+
535
+ ##### Text to Video + Audio generation
536
+
537
+ ```python
538
+ import json
539
+ from pathlib import Path
540
+
541
+ import requests
542
+
543
+ # 1. Read JSON-upsampled prompt and negative prompt
544
+ json_prompt = json.load(open("assets/example_t2vs_prompt.json"))
545
+ negative_prompt = json.load(open("assets/negative_prompt.json"))
546
+
547
+ # 2. Build your API payload
548
+ data = {
549
+ "prompt": json.dumps(json_prompt),
550
+ "negative_prompt": json.dumps(negative_prompt),
551
+ "size": "1280x720",
552
+ "num_frames": "189",
553
+ "fps": "24",
554
+ "num_inference_steps": "35",
555
+ "guidance_scale": "6.0",
556
+ "max_sequence_length": "4096",
557
+ "generate_sound": "true",
558
+ "sound_duration": "7.875",
559
+ "flow_shift": "10.0",
560
+ "extra_params": json.dumps(
561
+ {
562
+ "use_resolution_template": False,
563
+ "use_duration_template": False,
564
+ "guardrails": True,
565
+ }
566
+ ),
567
+ "seed": "17",
568
+ }
569
+
570
+ # 3. Send the POST request
571
+ url = "http://localhost:8000/v1/videos/sync"
572
+ print("Sending request to server...")
573
+ response = requests.post(
574
+ url,
575
+ data=data,
576
+ headers={"Accept": "video/mp4"},
577
+ )
578
+ response.raise_for_status()
579
+
580
+ # 4. Save the generated video
581
+ output_path = Path("/tmp/cosmos3_super_t2vs.mp4")
582
+ output_path.write_bytes(response.content)
583
+ print(f"Saved video to {output_path}")
584
+ ```
585
+
586
+ Example output:
587
+
588
+ <video controls width="1280" height="720" src="https://huggingface.co/nvidia/Cosmos3-Super/resolve/main/assets/example_t2vs_output.mp4"></video>
589
+
590
+ ---
591
+
592
+ ##### Action generation
593
+
594
+ The forward-dynamics example uses AgiBotWorld-Beta robotics action trajectories, and the inverse-dynamics examples use autonomous-vehicle (AV) action trajectories. Source files:
595
+
596
+ - Forward dynamics first frame: `assets/example_action_fd_agibotworld_first_frame.png`
597
+ - Forward dynamics action chunks: `assets/example_action_fd_agibotworld_action_chunks.json`
598
+ - Forward dynamics output video: `assets/example_action_fd_agibotworld_4chunk_output.mp4`
599
+ - Inverse dynamics source videos: `assets/example_action_id_av_0_input.mp4`, `assets/example_action_id_av_1_input.mp4`
600
+ - Inverse dynamics predicted actions: `assets/example_action_id_av_0_output.json`, `assets/example_action_id_av_1_output.json`
601
+
602
+ ###### Action forward dynamics
603
+
604
+ The example below performs a 4-chunk AgiBotWorld-Beta robotics rollout with the vLLM-Omni `/v1/videos/sync` inference endpoint. Each request sends one conditioning frame through `input_reference` and one 16-step normalized 29-D action chunk through `extra_params["action"]`. The request also sets the top-level `size` field to the input image resolution, so vLLM-Omni returns each chunk at the same resolution as the conditioning image without reflection padding. The stitched output drops each chunk's conditioning frame, producing 64 generated frames. The script extracts the last generated frame from each chunk and uses it as the next chunk's conditioning frame.
605
+
606
+ ```python
607
+ import json
608
+ import mimetypes
609
+ from pathlib import Path
610
+
611
+ import imageio.v3 as iio
612
+ import numpy as np
613
+ import requests
614
+ from PIL import Image
615
+
616
+ url = "http://localhost:8000/v1/videos/sync"
617
+ first_frame_path = Path("assets/example_action_fd_agibotworld_first_frame.png")
618
+ action_spec = json.loads(Path("assets/example_action_fd_agibotworld_action_chunks.json").read_text())
619
+ action_chunks = action_spec["action_chunks"]
620
+
621
+ prompt = action_spec.get("prompt", "Pickup items in the supermarket")
622
+ fps = int(action_spec.get("fps", 10))
623
+ action_chunk_size = int(action_spec.get("action_chunk_size", 16))
624
+ current_frame_path = first_frame_path
625
+ input_width, input_height = Image.open(first_frame_path).size
626
+ chunk_video_paths = []
627
+ stitch_frames = []
628
+
629
+ for chunk_idx, action_chunk in enumerate(action_chunks):
630
+ mime_type = mimetypes.guess_type(current_frame_path)[0] or "image/png"
631
+ extra_params = {
632
+ "action_mode": "forward_dynamics",
633
+ "domain_name": action_spec.get("domain_name", "agibotworld"),
634
+ "action_chunk_size": action_chunk_size,
635
+ "image_size": action_spec.get("image_size", 480),
636
+ "view_point": action_spec.get("view_point", "concat_view"),
637
+ "action": action_chunk,
638
+ "guardrails": True,
639
+ }
640
+ data = {
641
+ "prompt": prompt,
642
+ "num_frames": str(action_chunk_size + 1), # conditioning frame + generated frames
643
+ "fps": str(fps),
644
+ "size": f"{input_width}x{input_height}", # return chunks at input resolution
645
+ "num_inference_steps": "30",
646
+ "guidance_scale": "1.0",
647
+ "flow_shift": "10.0",
648
+ "seed": "0",
649
+ "extra_params": json.dumps(extra_params),
650
+ }
651
+
652
+ with current_frame_path.open("rb") as image_file:
653
+ files = {"input_reference": (current_frame_path.name, image_file, mime_type)}
654
+ print(f"Sending action FD chunk {chunk_idx} to vLLM-Omni...")
655
+ response = requests.post(
656
+ url,
657
+ data=data,
658
+ files=files,
659
+ headers={"Accept": "video/mp4"},
660
+ timeout=600,
661
+ )
662
+ response.raise_for_status()
663
+
664
+ chunk_video_path = Path(f"/tmp/cosmos3_super_action_fd_chunk_{chunk_idx:02d}.mp4")
665
+ chunk_video_path.write_bytes(response.content)
666
+ chunk_video_paths.append(chunk_video_path)
667
+
668
+ # The returned chunk contains the conditioning frame followed by generated frames.
669
+ # Drop the conditioning frame when stitching the generated-only rollout.
670
+ frames = iio.imread(chunk_video_path)
671
+ stitch_frames.extend(frames[1:])
672
+
673
+ # Autoregressive conditioning: use the final generated frame from this chunk
674
+ # as the input image for the next vLLM-Omni request.
675
+ if chunk_idx + 1 < len(action_chunks):
676
+ current_frame_path = Path(f"/tmp/cosmos3_super_action_fd_ar_frame_{chunk_idx + 1:02d}.png")
677
+ iio.imwrite(current_frame_path, frames[-1])
678
+
679
+ stitched_path = Path("/tmp/cosmos3_super_action_fd_agibotworld_4chunk.mp4")
680
+ iio.imwrite(stitched_path, np.asarray(stitch_frames), fps=fps)
681
+ print("Generated chunk videos:", chunk_video_paths)
682
+ print("Saved stitched rollout:", stitched_path)
683
+ print("stitched resolution:", f"{input_width}x{input_height}")
684
+ ```
685
+
686
+
687
+ Example output:
688
+
689
+ <video width="640" controls src="https://huggingface.co/nvidia/Cosmos3-Super/resolve/main/assets/example_action_fd_agibotworld_4chunk_output.mp4"></video>
690
+
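The stitching and autoregressive hand-off in the script above can be checked without a running server. The sketch below replaces frames with integer labels and uses a hypothetical stand-in, `fake_generate`, that mimics the response layout described above: the conditioning frame followed by `chunk_size` generated frames.

```python
def fake_generate(conditioning_frame: int, chunk_size: int) -> list[int]:
    """Stand-in for one server request (hypothetical): returns the conditioning
    frame followed by chunk_size generated frames, labelled by integers."""
    return [conditioning_frame] + [conditioning_frame + i for i in range(1, chunk_size + 1)]


def rollout(first_frame: int, num_chunks: int, chunk_size: int) -> list[int]:
    stitched = []
    current = first_frame
    for _ in range(num_chunks):
        frames = fake_generate(current, chunk_size)
        stitched.extend(frames[1:])  # drop the conditioning frame
        current = frames[-1]         # last generated frame seeds the next chunk
    return stitched


stitched = rollout(first_frame=0, num_chunks=4, chunk_size=16)
print(len(stitched))  # 64
```

Four chunks of 16 generated frames give the 64-frame rollout, and no conditioning frame appears twice in the stitched result.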
691
+ ###### Action inverse dynamics
692
+
693
+ ```python
694
+ import json
695
+ import time
696
+ from pathlib import Path
697
+
698
+ import requests
699
+
700
+ base_url = "http://localhost:8000"
701
+ input_videos = {
702
+ "av_inverse_0": Path("assets/example_action_id_av_0_input.mp4"),
703
+ "av_inverse_1": Path("assets/example_action_id_av_1_input.mp4"),
704
+ }
705
+
706
+ for name, video_path in input_videos.items():
707
+ extra_params = {
708
+ "action_mode": "inverse_dynamics",
709
+ "domain_name": "av",
710
+ "action_chunk_size": 60,
711
+ "image_size": 480,
712
+ "view_point": "ego_view",
713
+ "raw_action_dim": 9,
714
+ "guardrails": True,
715
+ }
716
+ data = {
717
+ "prompt": "You are an autonomous vehicle planning system.",
718
+ "num_frames": "61",
719
+ "fps": "10",
720
+ "num_inference_steps": "30",
721
+ "guidance_scale": "1.0",
722
+ "flow_shift": "10.0",
723
+ "seed": "0",
724
+ "extra_params": json.dumps(extra_params),
725
+ }
726
+
727
+ with video_path.open("rb") as video_file:
728
+ files = {
729
+ "input_reference": (video_path.name, video_file, "video/mp4"),
730
+ }
731
+ print(f"Submitting {name} request to server...")
732
+ response = requests.post(f"{base_url}/v1/videos", data=data, files=files)
733
+ response.raise_for_status()
734
+ initial = response.json()
735
+
736
+ while True:
737
+ response = requests.get(f"{base_url}/v1/videos/{initial['id']}", timeout=30)
738
+ response.raise_for_status()
739
+ final = response.json()
740
+ print(initial["id"], final.get("status"), f"{final.get('progress', 0)}%")
741
+ if final.get("status") == "completed":
742
+ break
743
+ if final.get("status") in {"failed", "cancelled"}:
744
+ raise RuntimeError(json.dumps(final, indent=2))
745
+ time.sleep(2)
746
+
747
+ action = final.get("action")
748
+ if not action or "data" not in action:
749
+ raise RuntimeError(f"Response did not include action data: {json.dumps(final, indent=2)}")
750
+
751
+ output_path = Path(f"/tmp/cosmos3_super_action_id_{name}.json")
752
+ output_path.write_text(json.dumps(action, indent=2))
753
+ print(f"Saved predicted action to {output_path}")
754
+ print("action shape:", action.get("shape"), "dtype:", action.get("dtype"))
755
+ ```
756
+
757
+ Example outputs:
758
+
759
+ - [av_inverse_0 predicted action JSON](https://huggingface.co/nvidia/Cosmos3-Super/blob/main/assets/example_action_id_av_0_output.json)
760
+ - [av_inverse_1 predicted action JSON](https://huggingface.co/nvidia/Cosmos3-Super/blob/main/assets/example_action_id_av_1_output.json)
761
+
762
+ <img width="1280" src="https://huggingface.co/nvidia/Cosmos3-Super/resolve/main/assets/example_action_id_av_0_output.png">
763
+
764
+ <img width="1280" src="https://huggingface.co/nvidia/Cosmos3-Super/resolve/main/assets/example_action_id_av_1_output.png">
765
+
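The polling loop in the inverse-dynamics example runs until the job reaches a terminal state. The sketch below shows the same loop with an overall deadline, so a stalled job cannot block the script forever. `wait_for_job` and its `fetch_status` argument are hypothetical: in practice `fetch_status` would wrap the `requests.get(...)` call from the example. The `completed`, `failed`, and `cancelled` status strings come from that example, while `queued` and `in_progress` are assumed placeholders.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], dict],
    timeout_s: float = 1800.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll fetch_status until the job completes, fails, or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        job = fetch_status()
        status = job.get("status")
        if status == "completed":
            return job
        if status in {"failed", "cancelled"}:
            raise RuntimeError(f"Job ended with status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"Job still {status!r} after {timeout_s} s")
        sleep(interval_s)


# Usage with canned responses instead of a live server.
responses = iter(
    [{"status": "queued"}, {"status": "in_progress"}, {"status": "completed", "action": {}}]
)
final = wait_for_job(lambda: next(responses), sleep=lambda _: None)
print(final["status"])  # completed
```

Injecting `sleep` and `clock` keeps the helper testable without real waiting.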
766
+ ### vLLM
767
+
768
+ #### General Invocation
769
+
770
+ You can use the release-tested `vllm` package for deploying an OpenAI-compatible API endpoint:
771
+
772
+ ```shell
773
+ uv venv --python 3.13 --seed --managed-python
774
+ source .venv/bin/activate
775
+ uv pip install --torch-backend=cu130 "vllm==0.21.0" \
776
+ "vllm-cosmos3 @ git+https://github.com/NVIDIA/cosmos-framework.git#subdirectory=packages/vllm-cosmos3" \
777
+ openai
778
+ ```
779
+
780
+ Use `--torch-backend=cu130 "vllm==0.21.0"` for CUDA 13 drivers. For CUDA 12.8 drivers, use `--torch-backend=cu128 "vllm==0.19.1"`.
781
+
782
+ Start the Reasoner server:
783
+
784
+ ```shell
785
+ CUDA_VISIBLE_DEVICES=0,1,2,3 \
786
+ vllm serve nvidia/Cosmos3-Super \
787
+ --hf-overrides '{"architectures": ["Cosmos3ReasonerForConditionalGeneration"]}' \
788
+ --tensor-parallel-size 4 \
789
+ --mm-encoder-tp-mode data \
790
+ --async-scheduling \
791
+ --allowed-local-media-path / \
792
+ --media-io-kwargs '{"video": {"num_frames": -1}}' \
793
+ --port 8000
794
+ ```
795
+
796
+ #### Examples
797
+
798
+ ##### Reasoning
799
+
800
+ Run this example from the model repository root. It reads the robot planning prompt from `assets/example_reasoning_prompt.json` and sends `assets/example_reasoning_input.png` to the local vLLM server.
801
+
802
+ ```python
803
+ import json
804
+ from pathlib import Path
805
+
806
+ import openai
807
+
808
+ # 1. Read the image reasoning prompt
809
+ example = json.load(open("assets/example_reasoning_prompt.json"))
810
+ image_path = Path("assets/example_reasoning_input.png").resolve()
811
+ image_url = image_path.as_uri()
812
+
813
+ # 2. Query the OpenAI-compatible vLLM server
814
+ client = openai.OpenAI(
815
+ api_key="EMPTY",
816
+ base_url="http://localhost:8000/v1",
817
+ )
818
+
819
+ response = client.chat.completions.create(
820
+ model=client.models.list().data[0].id,
821
+ messages=[
822
+ {
823
+ "role": "user",
824
+ "content": [
825
+ {"type": "image_url", "image_url": {"url": image_url}},
826
+ {"type": "text", "text": example["prompt"]},
827
+ ],
828
+ },
829
+ ],
830
+ max_tokens=example["max_tokens"],
831
+ seed=0,
832
+ )
833
+
834
+ # 3. Print the generated reasoning output
835
+ print(response.choices[0].message.content)
836
+ ```
837
+
838
+ Example input:
839
+
840
+ <img src="assets/example_reasoning_input.png" width="640">
841
+
842
+ Prompt:
843
+
844
+ ```text
845
+ The task is to put flower into the red bottle. Generate a plan consisting of subtasks for accomplish the task.
846
+ ```
847
+
848
+ Example output from the command above:
849
+
850
+ ```text
851
+ 1. Move the arm to the left side of the table.
852
+ 2. Pick up the flower.
853
+ 3. Move the arm to the right side of the table.
854
+ 4. Place the flower into the red bottle.
855
+ ```
856
+
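If the plan is consumed by downstream code, the numbered text can be split into individual subtasks. The following is a minimal sketch that assumes the model keeps the `<number>. <subtask>` line format shown in the example output; the format is not guaranteed, and the helper name `parse_plan` is hypothetical.

```python
import re


def parse_plan(text: str) -> list[str]:
    """Extract subtasks from lines of the form '<number>. <subtask>'."""
    steps = []
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\.\s+(.*\S)", line)
        if match:
            steps.append(match.group(2))
    return steps


sample = """1. Move the arm to the left side of the table.
2. Pick up the flower.
3. Move the arm to the right side of the table.
4. Place the flower into the red bottle."""

for index, step in enumerate(parse_plan(sample), start=1):
    print(index, step)
```

Lines that do not match the numbered pattern are skipped, so any preamble in the response does not end up in the plan.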
857
+ ### Diffusers
858
+
859
+ #### Container
860
+
861
+ To install diffusers with Cosmos3OmniPipeline:
862
+
863
+ ```shell
864
+ uv venv --python 3.13 --seed --managed-python
865
+ source .venv/bin/activate
866
+ uv pip install \
867
+ "diffusers @ git+https://github.com/huggingface/diffusers.git" \
868
+ accelerate \
869
+ av \
870
+ cosmos_guardrail \
871
+ huggingface_hub \
872
+ imageio \
873
+ imageio-ffmpeg \
874
+ torch \
875
+ torchvision \
876
+ transformers
877
+
878
+ ```
879
+
880
+ #### Examples
881
+
882
+ ##### Text to video generation
883
+
884
+ Run this example from the model repository root. It reads the JSON-upsampled prompt from `assets/example_t2v_prompt.json` and the negative prompt from `assets/negative_prompt.json`, loads the pipeline, generates the video, and saves it to an MP4 file.
885
+
886
+ ```python
887
+ import json
888
+ import torch
889
+ from diffusers import Cosmos3OmniPipeline
890
+ from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler
891
+ from diffusers.utils import export_to_video
892
+
893
+ # Read JSON-upsampled prompt and negative prompt
894
+ json_prompt = json.load(open("assets/example_t2v_prompt.json"))
895
+ negative_prompt = json.load(open("assets/negative_prompt.json"))
896
+
897
+ pipe = Cosmos3OmniPipeline.from_pretrained(
898
+ "nvidia/Cosmos3-Super",
899
+ torch_dtype=torch.bfloat16,
900
+ device_map="cuda",
901
+ enable_safety_checker=True,
902
+ )
903
+ pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=10.0)
904
+
905
+ result = pipe(
906
+ prompt=json.dumps(json_prompt),
907
+ negative_prompt=json.dumps(negative_prompt),
908
+ num_frames=189,
909
+ height=720,
910
+ width=1280,
911
+ num_inference_steps=35,
912
+ guidance_scale=6.0,
913
+ generator=torch.Generator(device="cuda").manual_seed(123),
914
+ )
915
+
916
+ export_to_video(result.video, "/tmp/cosmos3_super_t2v_diffusers.mp4", fps=24)
917
+ print("Saved video to /tmp/cosmos3_super_t2v_diffusers.mp4")
918
+ ```
919
+
920
+ Example output:
921
+
922
+ <video controls width="1280" height="720" src="https://huggingface.co/nvidia/Cosmos3-Super/resolve/main/assets/example_t2v_diffusers_output.mp4"></video>
923
+
924
+ ## Limitations
925
+
926
+ Cosmos3 may produce imperfect outputs in challenging scenarios. Generation artifacts include temporal inconsistency, unstable camera or object motion, imprecise physical interactions, inaccurate audio-video synchronization, and action-state drift — especially in long-horizon or high-resolution outputs. Reasoning may also be incorrect: object states, causal relationships, spatial geometry, temporal ordering, agent intent, and future outcomes can be misinferred, and complex or long-context inputs may yield hallucinated entities, inconsistent interpretations, or implausible predictions. Because the model lacks an explicit physics simulator, 3D geometry, 4D space-time evolution, object permanence, contact dynamics, and physical laws are only approximated — producing artifacts such as disappearing or morphing objects, unrealistic collisions, and physically implausible motions. Quality further degrades in out-of-distribution environments, safety-critical edge cases, and domains underrepresented in training.
927
+
928
+ Cosmos3 outputs should not be treated as physically accurate simulation, reliable ground-truth reasoning, or safety-certified decision making. Applications involving robotics control, autonomous systems, scientific simulation, or safety-critical planning require additional validation, external constraints, system-level safety analysis, and domain-specific guardrails before deployment.
929
+
930
+ ## Inference
931
+
932
+ **Acceleration Engine:** [PyTorch](https://pytorch.org/), [vLLM](https://github.com/vllm-project/vllm), [vLLM-Omni](https://github.com/vllm-project/vllm-omni), [Hugging Face Diffusers](https://github.com/huggingface/diffusers)
933
+
934
+ **Test Hardware:** GB200 and H100
935
+
936
+ ## Ethical Considerations
937
+
938
+ NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. Developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
939
+
940
+ Please make sure you have proper rights and permissions for all input image and video content; if an image or video includes people, personal health information, or intellectual property, the generated image or video will not blur or maintain the proportions of the subjects included.
941
+
942
+ Users are responsible for model inputs and outputs. Users are responsible for ensuring safe integration of this model, including implementing guardrails as well as other safety mechanisms, prior to deployment.
943
+
944
+ For more detailed information on ethical considerations for this model, please see the Model Card++ [Explainability](EXPLAINABILITY.md), [Bias](BIAS.md), [Safety & Security](SAFETY.md), and [Privacy](PRIVACY.md) subcards. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
SAFETY.md ADDED
@@ -0,0 +1,11 @@
1
+ ## Safety & Security
2
+
3
+ | Field | Response |
4
+ | :---- | :---- |
5
+ | Model Application(s) | World reasoning and generation for Physical AI. |
6
+ | Describe the life critical impact: | This model is not a safety-certified component and must not be used as the sole basis for life-critical decisions or control without additional system-level validation, safety analysis, and safeguards. The model is not designed or tested by NVIDIA for use in any system or application where the use of or failure of such system or application developed with the model could result in injury, death, or catastrophic damage. NVIDIA is not liable to any party, in whole or in part, for any claims or damages arising from those uses. Any system or application developed with the model must include sufficient safety and redundancy features and comply with applicable legal and regulatory standards and requirements. |
7
+ | Description of methods implemented in data acquisition or processing, if any, to address other types of potentially harmful data in the training, testing, and validation data: | Training, evaluation, and validation datasets pass through multi-stage automated and manual filtering to reduce harmful, unsafe, restricted, or policy-violating content. Pipelines include source-licensing review, deduplication, metadata-based and classifier-based moderation, embedding-based anomaly detection, and human audits on selected datasets. For Physical AI data (robotics, autonomous driving, industrial scenes), filtering also targets invalid action trajectories, physically implausible interactions, and unsafe control sequences. Synthetic and simulation-generated data are evaluated through internal validation before inclusion. Benchmark and red-team testing surface remaining safety gaps across world generation, reasoning, audio, and action tasks. No data-filtering process can guarantee complete removal; developers are responsible for application-specific safeguards and validation before deployment. |
8
+ | Description of any methods implemented in data acquisition or processing, if any, to address illegal or harmful content in the training data, including, but not limited to, child sexual abuse material (CSAM) and non-consensual intimate imagery (NCII) | In addition to the general unsafe-content filtering described above, training data acquisition and preprocessing apply CSAM- and NCII-specific safeguards: hash-matching systems against known CSAM databases, classifier-based moderation models trained specifically for explicit content and NCII detection, and provenance and licensing review for sources containing human imagery. Identified content is removed at ingest, with human review and targeted audits supplementing automated filtering for selected datasets. Despite these safeguards, no large-scale data-filtering system can guarantee complete detection. Ongoing monitoring and dataset review continue post-release. |
9
+ | Use Case Restrictions | Use is governed by the [OpenMDW1.1](https://openmdw.ai/) license. |
10
+ | Model and dataset restrictions | The principle of least privilege (PoLP) is applied to limit access for dataset generation and model development. Restrictions on dataset access are enforced during training, and dataset license constraints are adhered to. |
11
+ | Responsible Data Handling | This AI model was developed in accordance with our policies for responsible data handling and risk mitigation. Consistent with those policies, the training datasets have been scanned for harmful and illegal content, including Child Sexual Abuse Material (CSAM). Ongoing review and monitoring mechanisms, based on our policies, are in place to maintain data integrity. |
assets/example_action_fd_agibotworld_4chunk_output.mp4 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9b01266b6cd27478514133b00ada9c33db3f9444167f09942c11f880c629c8c0
3
+ size 672918
assets/example_action_fd_agibotworld_action_chunks.json ADDED
@@ -0,0 +1,2008 @@
1
+ {
2
+ "prompt": "Pickup items in the supermarket",
3
+ "domain_name": "agibotworld",
4
+ "view_point": "concat_view",
5
+ "fps": 10,
6
+ "image_size": 480,
7
+ "action_chunk_size": 16,
8
+ "num_chunks": 4,
9
+ "action_shape_per_chunk": [
10
+ 16,
11
+ 29
12
+ ],
13
+ "description": "Four normalized 29-D AgiBotWorld-Beta action chunks. Use the first frame as the initial conditioning image. For chunks 1-3, condition on the final generated frame from the previous chunk.",
14
+ "action_chunks": [
15
+ [
16
+ [
17
+ 0.009043693542480469,
18
+ 0.20256006717681885,
19
+ 0.04290425777435303,
20
+ 1.0,
21
+ 0.0,
22
+ 0.0,
23
+ 0.0,
24
+ 1.0,
25
+ 0.0,
26
+ -0.010607659816741943,
27
+ 0.005519270896911621,
28
+ 0.04532516002655029,
29
+ 1.0,
30
+ 2.4557113647460938e-05,
31
+ 2.968311309814453e-05,
32
+ -2.4557113647460938e-05,
33
+ 1.0,
34
+ 2.181529998779297e-05,
35
+ 0.41925930976867676,
36
+ 0.033257126808166504,
37
+ 0.0046384334564208984,
38
+ 0.03365051746368408,
39
+ 1.0,
40
+ -8.106231689453125e-06,
41
+ 7.033348083496094e-06,
42
+ 8.106231689453125e-06,
43
+ 1.0,
44
+ 4.792213439941406e-05,
45
+ 0.41629624366760254
46
+ ],
47
+ [
48
+ 0.009057760238647461,
49
+ 0.20257973670959473,
50
+ 0.04291045665740967,
51
+ 1.0,
52
+ 0.0,
53
+ 0.0,
54
+ 0.0,
55
+ 1.0,
56
+ 0.0,
57
+ -0.010164499282836914,
58
+ 0.006109714508056641,
59
+ 0.04489457607269287,
60
+ 1.0,
61
+ -2.1159648895263672e-05,
62
+ -1.5616416931152344e-05,
63
+ 2.110004425048828e-05,
64
+ 1.0,
65
+ -1.7881393432617188e-06,
66
+ 0.41925930976867676,
67
+ 0.033304810523986816,
68
+ 0.005009293556213379,
69
+ 0.03424406051635742,
70
+ 1.0,
71
+ 1.7881393432617188e-06,
72
+ -3.2186508178710938e-06,
73
+ -1.8477439880371094e-06,
74
+ 1.0,
75
+ 2.4437904357910156e-05,
76
+ 0.41629624366760254
77
+ ],
78
+ [
79
+ 0.009057760238647461,
80
+ 0.20257973670959473,
81
+ 0.04291045665740967,
82
+ 1.0,
83
+ 0.0,
84
+ 0.0,
85
+ 0.0,
86
+ 1.0,
87
+ 0.0,
88
+ -0.01051628589630127,
89
+ 0.006176114082336426,
90
+ 0.04440665245056152,
91
+ 1.0,
92
+ 3.147125244140625e-05,
93
+ 3.0994415283203125e-06,
94
+ -3.147125244140625e-05,
95
+ 1.0,
96
+ -1.6987323760986328e-05,
97
+ 0.41925930976867676,
98
+ 0.03370165824890137,
99
+ 0.0064890384674072266,
100
+ 0.03395986557006836,
101
+ 1.0,
102
+ -3.349781036376953e-05,
103
+ -1.424551010131836e-05,
104
+ 3.349781036376953e-05,
105
+ 1.0,
106
+ -6.020069122314453e-05,
107
+ 0.41629624366760254
108
+ ],
109
+ [
110
+ 0.009057760238647461,
111
+ 0.20257973670959473,
112
+ 0.04291045665740967,
113
+ 1.0,
114
+ 0.0,
115
+ 0.0,
116
+ 0.0,
117
+ 1.0,
118
+ 0.0,
119
+ -0.010625958442687988,
120
+ 0.006899237632751465,
121
+ 0.044704556465148926,
122
+ 1.0,
123
+ 3.254413604736328e-05,
124
+ -1.043081283569336e-05,
125
+ -3.254413604736328e-05,
126
+ 1.0,
127
+ -1.3887882232666016e-05,
128
+ 0.41925930976867676,
129
+ 0.03398740291595459,
130
+ 0.0045958757400512695,
131
+ 0.03477442264556885,
132
+ 1.0,
133
+ 2.9325485229492188e-05,
134
+ -3.647804260253906e-05,
135
+ -2.9325485229492188e-05,
136
+ 1.0,
137
+ 2.3365020751953125e-05,
138
+ 0.41629624366760254
139
+ ],
140
+ [
141
+ 0.01160132884979248,
142
+ 0.20257973670959473,
143
+ 0.04291045665740967,
144
+ 1.0,
145
+ 9.5367431640625e-07,
146
+ -2.5033950805664062e-06,
147
+ -9.5367431640625e-07,
148
+ 1.0,
149
+ 0.0,
150
+ -0.009872138500213623,
151
+ 0.005211949348449707,
152
+ 0.045044898986816406,
153
+ 1.0,
154
+ -3.808736801147461e-05,
155
+ -3.814697265625e-06,
156
+ 3.814697265625e-05,
157
+ 1.0,
158
+ -5.4836273193359375e-06,
159
+ 0.41925930976867676,
160
+ 0.03165924549102783,
161
+ 0.0048122406005859375,
162
+ 0.0322871208190918,
163
+ 1.0,
164
+ -1.7881393432617188e-06,
165
+ 7.87973403930664e-05,
166
+ 1.7881393432617188e-06,
167
+ 1.0,
168
+ 0.0,
169
+ 0.41629624366760254
170
+ ],
171
+ [
172
+ 0.00906062126159668,
173
+ 0.20257973670959473,
174
+ 0.04291045665740967,
175
+ 1.0,
176
+ 0.0,
177
+ 0.0,
178
+ 0.0,
179
+ 1.0,
180
+ 0.0,
181
+ -0.01051628589630127,
182
+ 0.0053795576095581055,
183
+ 0.045049309730529785,
184
+ 1.0,
185
+ -4.76837158203125e-06,
186
+ 9.5367431640625e-06,
187
+ 4.76837158203125e-06,
188
+ 1.0,
189
+ 3.504753112792969e-05,
190
+ 0.41925930976867676,
191
+ 0.034897446632385254,
192
+ 0.005955934524536133,
193
+ 0.0352022647857666,
194
+ 1.0,
195
+ 1.1920928955078125e-05,
196
+ -6.92605972290039e-05,
197
+ -1.1920928955078125e-05,
198
+ 1.0,
199
+ -2.1636486053466797e-05,
200
+ 0.41629624366760254
201
+ ],
202
+ [
203
+ 0.009052157402038574,
204
+ 0.20257973670959473,
205
+ 0.042908430099487305,
206
+ 1.0,
207
+ 0.0,
208
+ 0.0,
209
+ 0.0,
210
+ 1.0,
211
+ 0.0,
212
+ -0.010749280452728271,
213
+ 0.007395386695861816,
214
+ 0.0448070764541626,
215
+ 1.0,
216
+ 2.384185791015625e-06,
217
+ 1.4185905456542969e-05,
218
+ -2.384185791015625e-06,
219
+ 1.0,
220
+ -4.285573959350586e-05,
221
+ 0.41925930976867676,
222
+ 0.031437039375305176,
223
+ 0.0054999589920043945,
224
+ 0.03345012664794922,
225
+ 1.0,
226
+ -2.47955322265625e-05,
227
+ 8.356571197509766e-05,
228
+ 2.47955322265625e-05,
229
+ 1.0,
230
+ -1.710653305053711e-05,
231
+ 0.41629624366760254
232
+ ],
233
+ [
234
+ 0.009069085121154785,
235
+ 0.20257973670959473,
236
+ 0.04291260242462158,
237
+ 1.0,
238
+ 0.0,
239
+ 0.0,
240
+ 0.0,
241
+ 1.0,
242
+ 0.0,
243
+ -0.009954392910003662,
244
+ 0.005501866340637207,
245
+ 0.04485750198364258,
246
+ 1.0,
247
+ -2.002716064453125e-05,
248
+ -2.4437904357910156e-05,
249
+ 2.002716064453125e-05,
250
+ 1.0,
251
+ 2.8252601623535156e-05,
252
+ 0.41925930976867676,
253
+ 0.03527843952178955,
254
+ 0.005295157432556152,
255
+ 0.03457236289978027,
256
+ 1.0,
257
+ -1.049041748046875e-05,
258
+ -8.213520050048828e-05,
259
+ 1.049041748046875e-05,
260
+ 1.0,
261
+ 3.4332275390625e-05,
262
+ 0.41629624366760254
263
+ ],
264
+ [
265
+ 0.009052157402038574,
266
+ 0.20257973670959473,
267
+ 0.042908430099487305,
268
+ 1.0,
269
+ 0.0,
270
+ 0.0,
271
+ 0.0,
272
+ 1.0,
273
+ 0.0,
274
+ -0.0008448958396911621,
275
+ -0.017412126064300537,
276
+ 0.05560195446014404,
277
+ 0.9999996423721313,
278
+ -0.0009481906890869141,
279
+ -5.638599395751953e-05,
280
+ 0.0009481906890869141,
281
+ 0.9999994039535522,
282
+ 0.0004875659942626953,
283
+ 0.41925930976867676,
284
+ 0.0332942008972168,
285
+ 0.005210161209106445,
286
+ 0.0339665412902832,
287
+ 1.0,
288
+ 0.0,
289
+ 0.0,
290
+ 0.0,
291
+ 1.0,
292
+ 0.0,
293
+ 0.41629624366760254
294
+ ],
295
+ [
296
+ 0.00906062126159668,
297
+ 0.20257973670959473,
298
+ 0.04291045665740967,
299
+ 1.0,
300
+ 0.0,
301
+ 0.0,
302
+ 0.0,
303
+ 1.0,
304
+ 0.0,
305
+ 0.03963160514831543,
306
+ -0.046320974826812744,
307
+ 0.07942080497741699,
308
+ 0.9999939203262329,
309
+ -0.0032505393028259277,
310
+ -0.0012335777282714844,
311
+ 0.0032502412796020508,
312
+ 0.9999947547912598,
313
+ -0.0002353191375732422,
314
+ 0.41925930976867676,
315
+ 0.0332942008972168,
316
+ 0.005217909812927246,
317
+ 0.03396797180175781,
318
+ 1.0,
319
+ 0.0,
320
+ 0.0,
321
+ 0.0,
322
+ 1.0,
323
+ 0.0,
324
+ 0.41629624366760254
325
+ ],
326
+ [
327
+ 0.00911402702331543,
328
+ 0.20263886451721191,
329
+ 0.04292500019073486,
330
+ 1.0,
331
+ 0.0,
332
+ -5.960464477539063e-08,
333
+ 0.0,
334
+ 1.0,
335
+ -2.682209014892578e-06,
336
+ 0.09385919570922852,
337
+ -0.05113154649734497,
338
+ 0.09816932678222656,
339
+ 0.9999802112579346,
340
+ -0.005321979522705078,
341
+ -0.0033246874809265137,
342
+ 0.005314230918884277,
343
+ 0.9999830722808838,
344
+ -0.0023511648178100586,
345
+ 0.41925930976867676,
346
+ 0.0332942008972168,
347
+ 0.005229473114013672,
348
+ 0.033969879150390625,
349
+ 1.0,
350
+ 0.0,
351
+ 0.0,
352
+ 0.0,
353
+ 1.0,
354
+ 0.0,
355
+ 0.41629624366760254
356
+ ],
357
+ [
358
+ 0.009057760238647461,
359
+ 0.20257973670959473,
360
+ 0.04291045665740967,
361
+ 1.0,
362
+ 0.0,
363
+ 0.0,
364
+ 0.0,
365
+ 1.0,
366
+ -5.960464477539063e-08,
367
+ 0.12021923065185547,
368
+ -0.03888678550720215,
369
+ 0.11576962471008301,
370
+ 0.9999756813049316,
371
+ -0.005499899387359619,
372
+ -0.004302918910980225,
373
+ 0.005483388900756836,
374
+ 0.9999775886535645,
375
+ -0.003832995891571045,
376
+ 0.41925930976867676,
377
+ 0.0332942008972168,
378
+ 0.005214095115661621,
379
+ 0.03396737575531006,
380
+ 1.0,
381
+ 0.0,
382
+ 0.0,
383
+ 0.0,
384
+ 1.0,
385
+ 0.0,
386
+ 0.41629624366760254
387
+ ],
388
+ [
389
+ 0.009049415588378906,
390
+ 0.20256006717681885,
391
+ 0.04290628433227539,
392
+ 1.0,
393
+ 0.0,
394
+ 0.0,
395
+ 0.0,
396
+ 1.0,
397
+ -5.960464477539063e-08,
398
+ 0.1539161205291748,
399
+ -0.02273625135421753,
400
+ 0.12730515003204346,
401
+ 0.9999697208404541,
402
+ -0.005183577537536621,
403
+ -0.00581127405166626,
404
+ 0.005152106285095215,
405
+ 0.9999721050262451,
406
+ -0.005426645278930664,
407
+ 0.41925930976867676,
408
+ 0.0332942008972168,
409
+ 0.005210161209106445,
410
+ 0.0339665412902832,
411
+ 1.0,
412
+ 0.0,
413
+ 0.0,
414
+ 0.0,
415
+ 1.0,
416
+ 0.0,
417
+ 0.41629624366760254
418
+ ],
419
+ [
420
+ 0.011651992797851562,
421
+ 0.2025994062423706,
422
+ 0.04295003414154053,
423
+ 1.0,
424
+ 9.5367431640625e-07,
425
+ -2.5033950805664062e-06,
426
+ -9.5367431640625e-07,
427
+ 1.0,
428
+ -9.5367431640625e-07,
429
+ 0.1874760389328003,
430
+ -0.02913987636566162,
431
+ 0.13934898376464844,
432
+ 0.999955415725708,
433
+ -0.0063391923904418945,
434
+ -0.007002890110015869,
435
+ 0.0062923431396484375,
436
+ 0.9999579191207886,
437
+ -0.0066814422607421875,
438
+ 0.41925930976867676,
439
+ 0.03327834606170654,
440
+ 0.005256533622741699,
441
+ 0.0339813232421875,
442
+ 1.0,
443
+ -7.748603820800781e-07,
444
+ -5.960464477539063e-08,
445
+ 8.344650268554688e-07,
446
+ 1.0,
447
+ -5.364418029785156e-07,
448
+ 0.41629624366760254
449
+ ],
450
+ [
451
+ 0.00906062126159668,
452
+ 0.2025994062423706,
453
+ 0.042908430099487305,
454
+ 1.0,
455
+ 0.0,
456
+ 0.0,
457
+ 0.0,
458
+ 1.0,
459
+ 0.0,
460
+ 0.20743560791015625,
461
+ -0.020664572715759277,
462
+ 0.17266845703125,
463
+ 0.9999444484710693,
464
+ -0.007667183876037598,
465
+ -0.00723874568939209,
466
+ 0.007605195045471191,
467
+ 0.9999344348907471,
468
+ -0.008556544780731201,
469
+ 0.41925930976867676,
470
+ 0.03328895568847656,
471
+ 0.005217909812927246,
472
+ 0.03396928310394287,
473
+ 1.0,
474
+ 0.0,
475
+ 0.0,
476
+ 0.0,
477
+ 1.0,
478
+ 0.0,
479
+ 0.41629624366760254
480
+ ],
481
+ [
482
+ 0.00906062126159668,
483
+ 0.2025994062423706,
484
+ 0.042908430099487305,
485
+ 1.0,
486
+ 0.0,
487
+ 0.0,
488
+ 0.0,
489
+ 1.0,
490
+ 0.0,
491
+ 0.31465721130371094,
492
+ -0.045416176319122314,
493
+ 0.24895763397216797,
494
+ 0.9998956918716431,
495
+ -0.010226964950561523,
496
+ -0.010205447673797607,
497
+ 0.010100603103637695,
498
+ 0.9998726844787598,
499
+ -0.012354671955108643,
500
+ 0.41925930976867676,
501
+ 0.03328895568847656,
502
+ 0.005217909812927246,
503
+ 0.03396928310394287,
504
+ 1.0,
505
+ 0.0,
506
+ 0.0,
507
+ 0.0,
508
+ 1.0,
509
+ 0.0,
510
+ 0.41629624366760254
511
+ ]
512
+ ],
513
+ [
514
+ [
515
+ 0.009052157402038574,
516
+ 0.20257973670959473,
517
+ 0.042908430099487305,
518
+ 1.0,
519
+ 0.0,
520
+ 0.0,
521
+ 0.0,
522
+ 1.0,
523
+ 0.0,
524
+ -0.010411202907562256,
525
+ 0.006074786186218262,
526
+ 0.044891953468322754,
527
+ 1.0,
528
+ 0.00010097026824951172,
529
+ 1.5497207641601562e-06,
530
+ -0.00010102987289428711,
531
+ 1.0,
532
+ 1.1920928955078125e-06,
533
+ 0.41925930976867676,
534
+ 0.034119606018066406,
535
+ 0.0051136016845703125,
536
+ 0.034647583961486816,
537
+ 1.0,
538
+ 3.62396240234375e-05,
539
+ -4.118680953979492e-05,
540
+ -3.618001937866211e-05,
541
+ 1.0,
542
+ -1.3947486877441406e-05,
543
+ 0.41629624366760254
544
+ ],
545
+ [
546
+ 0.009069085121154785,
547
+ 0.20257973670959473,
548
+ 0.04291260242462158,
549
+ 1.0,
550
+ 0.0,
551
+ 0.0,
552
+ 0.0,
553
+ 1.0,
554
+ 0.0,
555
+ -0.010630488395690918,
556
+ 0.006099224090576172,
557
+ 0.04487788677215576,
558
+ 1.0,
559
+ -5.799531936645508e-05,
560
+ 2.002716064453125e-05,
561
+ 5.793571472167969e-05,
562
+ 1.0,
563
+ -1.3232231140136719e-05,
564
+ 0.41925930976867676,
565
+ 0.03215658664703369,
566
+ 0.00477755069732666,
567
+ 0.032892584800720215,
568
+ 1.0,
569
+ 2.384185791015625e-06,
570
+ 5.4001808166503906e-05,
571
+ -2.384185791015625e-06,
572
+ 1.0,
573
+ 1.2516975402832031e-05,
574
+ 0.41629624366760254
575
+ ],
576
+ [
577
+ 0.009066224098205566,
578
+ 0.2025994062423706,
579
+ 0.042914628982543945,
580
+ 1.0,
581
+ 0.0,
582
+ 0.0,
583
+ 0.0,
584
+ 1.0,
585
+ 0.0,
586
+ -0.009620904922485352,
587
+ 0.0038878917694091797,
588
+ 0.046347856521606445,
589
+ 1.0,
590
+ -0.00011581182479858398,
591
+ -1.0132789611816406e-06,
592
+ 0.00011587142944335938,
593
+ 1.0,
594
+ 7.665157318115234e-05,
595
+ 0.41925930976867676,
596
+ 0.034696340560913086,
597
+ 0.006465911865234375,
598
+ 0.03510606288909912,
599
+ 1.0,
600
+ -3.439188003540039e-05,
601
+ -5.424022674560547e-05,
602
+ 3.445148468017578e-05,
603
+ 1.0,
604
+ -5.364418029785156e-06,
605
+ 0.41629624366760254
606
+ ],
607
+ [
608
+ 0.009055018424987793,
609
+ 0.20256006717681885,
610
+ 0.04290628433227539,
611
+ 1.0,
612
+ 0.0,
613
+ 0.0,
614
+ 0.0,
615
+ 1.0,
616
+ 0.0,
617
+ 0.009932160377502441,
618
+ -0.03261244297027588,
619
+ 0.06218290328979492,
620
+ 0.9999983310699463,
621
+ -0.0017368793487548828,
622
+ -0.0003128647804260254,
623
+ 0.0017371177673339844,
624
+ 0.9999983310699463,
625
+ 0.0006134510040283203,
626
+ 0.41925930976867676,
627
+ 0.0332942008972168,
628
+ 0.005214095115661621,
629
+ 0.03396737575531006,
630
+ 1.0,
631
+ 0.0,
632
+ 0.0,
633
+ 0.0,
634
+ 1.0,
635
+ 0.0,
636
+ 0.41629624366760254
637
+ ],
638
+ [
639
+ 0.00906062126159668,
640
+ 0.20257973670959473,
641
+ 0.04291045665740967,
642
+ 1.0,
643
+ 0.0,
644
+ 0.0,
645
+ 0.0,
646
+ 1.0,
647
+ 0.0,
648
+ 0.05640697479248047,
649
+ -0.046244144439697266,
650
+ 0.08853423595428467,
651
+ 0.999990701675415,
652
+ -0.003899216651916504,
653
+ -0.001820981502532959,
654
+ 0.0038973093032836914,
655
+ 0.9999918937683105,
656
+ -0.0010201334953308105,
657
+ 0.41925930976867676,
658
+ 0.0332942008972168,
659
+ 0.005217909812927246,
660
+ 0.03396797180175781,
661
+ 1.0,
662
+ 0.0,
663
+ 0.0,
664
+ 0.0,
665
+ 1.0,
666
+ 0.0,
667
+ 0.41629624366760254
668
+ ],
669
+ [
670
+ 0.009085893630981445,
671
+ 0.20261919498443604,
672
+ 0.042914628982543945,
673
+ 1.0,
674
+ 0.0,
675
+ -5.960464477539063e-08,
676
+ 0.0,
677
+ 1.0,
678
+ -2.682209014892578e-06,
679
+ 0.10252106189727783,
680
+ -0.050013601779937744,
681
+ 0.10689389705657959,
682
+ 0.9999780654907227,
683
+ -0.00561290979385376,
684
+ -0.003548860549926758,
685
+ 0.005602836608886719,
686
+ 0.9999803304672241,
687
+ -0.0028455257415771484,
688
+ 0.41925930976867676,
689
+ 0.0332942008972168,
690
+ 0.005210161209106445,
691
+ 0.0339665412902832,
692
+ 1.0,
693
+ 0.0,
694
+ 0.0,
695
+ 0.0,
696
+ 1.0,
697
+ 0.0,
698
+ 0.41629624366760254
699
+ ],
700
+ [
701
+ 0.00907754898071289,
702
+ 0.2025994062423706,
703
+ 0.04291880130767822,
704
+ 1.0,
705
+ 0.0,
706
+ 0.0,
707
+ 0.0,
708
+ 1.0,
709
+ -5.960464477539063e-08,
710
+ 0.13923311233520508,
711
+ -0.03134775161743164,
712
+ 0.11226391792297363,
713
+ 0.9999709129333496,
714
+ -0.005446195602416992,
715
+ -0.00535351037979126,
716
+ 0.005421996116638184,
717
+ 0.9999749660491943,
718
+ -0.004530847072601318,
719
+ 0.41925930976867676,
720
+ 0.0332942008972168,
721
+ 0.005225658416748047,
722
+ 0.03396928310394287,
723
+ 1.0,
724
+ 0.0,
725
+ 0.0,
726
+ 0.0,
727
+ 1.0,
728
+ 0.0,
729
+ 0.41629624366760254
730
+ ],
731
+ [
732
+ 0.011612653732299805,
733
+ 0.20257973670959473,
734
+ 0.04290628433227539,
735
+ 1.0,
736
+ 9.5367431640625e-07,
737
+ -2.5033950805664062e-06,
738
+ -9.5367431640625e-07,
739
+ 1.0,
740
+ 0.0,
741
+ 0.1607367992401123,
742
+ -0.0290105938911438,
743
+ 0.1363682746887207,
744
+ 0.9999675750732422,
745
+ -0.00552743673324585,
746
+ -0.0058705806732177734,
747
+ 0.005494356155395508,
748
+ 0.9999691247940063,
749
+ -0.005625605583190918,
750
+ 0.41925930976867676,
751
+ 0.0332942008972168,
752
+ 0.005217909812927246,
753
+ 0.03396797180175781,
754
+ 1.0,
755
+ 0.0,
756
+ 0.0,
757
+ 0.0,
758
+ 1.0,
759
+ 0.0,
760
+ 0.41629624366760254
761
+ ],
762
+ [
763
+ 0.009085893630981445,
764
+ 0.20257973670959473,
765
+ 0.042948007583618164,
766
+ 1.0,
767
+ 0.0,
768
+ 0.0,
769
+ 0.0,
770
+ 1.0,
771
+ -1.0132789611816406e-06,
772
+ 0.1929306983947754,
773
+ -0.014666199684143066,
774
+ 0.15205836296081543,
775
+ 0.9999532699584961,
776
+ -0.006599485874176025,
777
+ -0.007064223289489746,
778
+ 0.006544232368469238,
779
+ 0.9999480247497559,
780
+ -0.007818400859832764,
781
+ 0.41925930976867676,
782
+ 0.03327834606170654,
783
+ 0.005248904228210449,
784
+ 0.03397989273071289,
785
+ 1.0,
786
+ -7.748603820800781e-07,
787
+ -5.960464477539063e-08,
788
+ 8.344650268554688e-07,
789
+ 1.0,
790
+ -5.364418029785156e-07,
791
+ 0.41629624366760254
792
+ ],
793
+ [
794
+ 0.00906062126159668,
795
+ 0.2025994062423706,
796
+ 0.042908430099487305,
797
+ 1.0,
798
+ 0.0,
799
+ 0.0,
800
+ 0.0,
801
+ 1.0,
802
+ 0.0,
803
+ 0.30631983280181885,
804
+ -0.04655855894088745,
805
+ 0.2282034158706665,
806
+ 0.9998886585235596,
807
+ -0.01070857048034668,
808
+ -0.010384440422058105,
809
+ 0.010587453842163086,
810
+ 0.9998760223388672,
811
+ -0.011655926704406738,
812
+ 0.41925930976867676,
813
+ 0.0332942008972168,
814
+ 0.005214095115661621,
815
+ 0.03396880626678467,
816
+ 1.0,
817
+ 0.0,
818
+ 0.0,
819
+ 0.0,
820
+ 1.0,
821
+ 0.0,
822
+ 0.41629624366760254
823
+ ],
824
+ [
825
+ 0.00906062126159668,
826
+ 0.2025994062423706,
827
+ 0.042908430099487305,
828
+ 1.0,
829
+ 0.0,
830
+ 0.0,
831
+ 0.0,
832
+ 1.0,
833
+ 0.0,
834
+ 0.3243972063064575,
835
+ -0.058551788330078125,
836
+ 0.2604410648345947,
837
+ 0.9998958110809326,
838
+ -0.010265886783599854,
839
+ -0.010132789611816406,
840
+ 0.010138750076293945,
841
+ 0.9998701810836792,
842
+ -0.01252216100692749,
843
+ 0.41925930976867676,
844
+ 0.03328359127044678,
845
+ 0.00520634651184082,
846
+ 0.033963799476623535,
847
+ 1.0,
848
+ 4.76837158203125e-07,
849
+ 3.5762786865234375e-07,
850
+ -4.172325134277344e-07,
851
+ 1.0,
852
+ 3.5762786865234375e-07,
853
+ 0.41629624366760254
854
+ ],
855
+ [
856
+ 0.00906062126159668,
857
+ 0.2025994062423706,
858
+ 0.042908430099487305,
859
+ 1.0,
860
+ 0.0,
861
+ 0.0,
862
+ 0.0,
863
+ 1.0,
864
+ 0.0,
865
+ 0.331770658493042,
866
+ -0.0866466760635376,
867
+ 0.27182281017303467,
868
+ 0.9998891353607178,
869
+ -0.011347353458404541,
870
+ -0.009640693664550781,
871
+ 0.011227130889892578,
872
+ 0.9998596906661987,
873
+ -0.012438654899597168,
874
+ 0.41925930976867676,
875
+ 0.033225417137145996,
876
+ 0.005071163177490234,
877
+ 0.033928513526916504,
878
+ 1.0,
879
+ 5.0067901611328125e-06,
880
+ 2.7418136596679688e-06,
881
+ -5.0067901611328125e-06,
882
+ 1.0,
883
+ 4.0531158447265625e-06,
884
+ 0.41629624366760254
885
+ ],
886
+ [
887
+ 0.009066224098205566,
888
+ 0.2025994062423706,
889
+ 0.04291045665740967,
890
+ 1.0,
891
+ 0.0,
892
+ 0.0,
893
+ 0.0,
894
+ 1.0,
895
+ 0.0,
896
+ 0.36654579639434814,
897
+ -0.0934380292892456,
898
+ 0.30077290534973145,
899
+ 0.9997755289077759,
900
+ -0.018354415893554688,
901
+ -0.010581493377685547,
902
+ 0.01818561553955078,
903
+ 0.9997092485427856,
904
+ -0.01583230495452881,
905
+ 0.41925930976867676,
906
+ 0.03137350082397461,
907
+ 0.0008289813995361328,
908
+ 0.03272604942321777,
909
+ 1.0,
910
+ 0.0001519918441772461,
911
+ 8.416175842285156e-05,
912
+ -0.00015205144882202148,
913
+ 1.0,
914
+ 0.00012350082397460938,
915
+ 0.41629624366760254
916
+ ],
917
+ [
918
+ 0.009055018424987793,
919
+ 0.20257973670959473,
920
+ 0.04290628433227539,
921
+ 1.0,
922
+ 0.0,
923
+ 0.0,
924
+ 0.0,
925
+ 1.0,
926
+ 0.0,
927
+ 0.3391350507736206,
928
+ -0.11247771978378296,
929
+ 0.30669987201690674,
930
+ 0.9997462034225464,
931
+ -0.02040243148803711,
932
+ -0.009555697441101074,
933
+ 0.020258665084838867,
934
+ 0.9996836185455322,
935
+ -0.014911115169525146,
936
+ 0.41925930976867676,
937
+ 0.03739488124847412,
938
+ 0.014652729034423828,
939
+ 0.03664207458496094,
940
+ 1.0,
941
+ -0.00032657384872436523,
942
+ -0.00018095970153808594,
943
+ 0.00032651424407958984,
944
+ 0.9999998807907104,
945
+ -0.00026535987854003906,
946
+ 0.41629624366760254
947
+ ],
948
+ [
949
+ 0.009260416030883789,
950
+ 0.2028360366821289,
951
+ 0.04296672344207764,
952
+ 1.0,
953
+ 1.1920928955078125e-07,
954
+ -2.980232238769531e-07,
955
+ -1.1920928955078125e-07,
956
+ 1.0,
957
+ -1.5914440155029297e-05,
958
+ 0.33494579792022705,
959
+ -0.07520890235900879,
960
+ 0.2943739891052246,
961
+ 0.9997798204421997,
962
+ -0.01838594675064087,
963
+ -0.010106980800628662,
964
+ 0.018214821815490723,
965
+ 0.9996933937072754,
966
+ -0.016768813133239746,
967
+ 0.41925930976867676,
968
+ 0.024235844612121582,
969
+ -0.014741063117980957,
970
+ 0.02833724021911621,
971
+ 0.9999996423721313,
972
+ 0.0006879568099975586,
973
+ 0.0004069805145263672,
974
+ -0.0006881952285766602,
975
+ 0.9999995231628418,
976
+ 0.0005651712417602539,
977
+ 0.41629624366760254
978
+ ],
979
+ [
980
+ 0.009069085121154785,
981
+ 0.20256006717681885,
982
+ 0.042908430099487305,
983
+ 1.0,
984
+ 0.0,
985
+ 0.0,
986
+ 0.0,
987
+ 1.0,
988
+ 0.0,
989
+ 0.29727888107299805,
990
+ 0.02291703224182129,
991
+ 0.24155020713806152,
992
+ 0.999849796295166,
993
+ -0.014276325702667236,
994
+ -0.00983363389968872,
995
+ 0.014076709747314453,
996
+ 0.9996992349624634,
997
+ -0.020082950592041016,
998
+ 0.41925930976867676,
999
+ 0.040045738220214844,
1000
+ 0.019493699073791504,
1001
+ 0.038599610328674316,
1002
+ 0.9999997615814209,
1003
+ -0.00048214197158813477,
1004
+ -0.0002892613410949707,
1005
+ 0.0004819631576538086,
1006
+ 0.9999997615814209,
1007
+ -0.00040209293365478516,
1008
+ 0.41629624366760254
1009
+ ]
1010
+ ],
1011
+ [
1012
+ [
1013
+ 0.009085893630981445,
1014
+ 0.20261919498443604,
1015
+ 0.042914628982543945,
1016
+ 1.0,
1017
+ 0.0,
1018
+ -5.960464477539063e-08,
1019
+ 0.0,
1020
+ 1.0,
1021
+ -2.682209014892578e-06,
1022
+ 0.11258077621459961,
1023
+ -0.04367637634277344,
1024
+ 0.10712110996246338,
1025
+ 0.999976396560669,
1026
+ -0.005554258823394775,
1027
+ -0.004074573516845703,
1028
+ 0.005540609359741211,
1029
+ 0.9999790191650391,
1030
+ -0.003357231616973877,
1031
+ 0.41925930976867676,
1032
+ 0.0332942008972168,
1033
+ 0.005210161209106445,
1034
+ 0.0339665412902832,
1035
+ 1.0,
1036
+ 0.0,
1037
+ 0.0,
1038
+ 0.0,
1039
+ 1.0,
1040
+ 0.0,
1041
+ 0.41629624366760254
1042
+ ],
1043
+ [
1044
+ 0.00907754898071289,
1045
+ 0.2025994062423706,
1046
+ 0.04291880130767822,
1047
+ 1.0,
1048
+ 0.0,
1049
+ 0.0,
1050
+ 0.0,
1051
+ 1.0,
1052
+ -5.960464477539063e-08,
1053
+ 0.14761626720428467,
1054
+ -0.03002721071243286,
1055
+ 0.12956809997558594,
1056
+ 0.9999697208404541,
1057
+ -0.005600392818450928,
1058
+ -0.005403280258178711,
1059
+ 0.0055730342864990234,
1060
+ 0.9999716281890869,
1061
+ -0.005077004432678223,
1062
+ 0.41925930976867676,
1063
+ 0.0332942008972168,
1064
+ 0.005225658416748047,
1065
+ 0.03396928310394287,
1066
+ 1.0,
1067
+ 0.0,
1068
+ 0.0,
1069
+ 0.0,
1070
+ 1.0,
1071
+ 0.0,
1072
+ 0.41629624366760254
1073
+ ],
1074
+ [
1075
+ 0.011612653732299805,
1076
+ 0.20257973670959473,
1077
+ 0.04290628433227539,
1078
+ 1.0,
1079
+ 9.5367431640625e-07,
1080
+ -2.5033950805664062e-06,
1081
+ -9.5367431640625e-07,
1082
+ 1.0,
1083
+ 0.0,
1084
+ 0.17086505889892578,
1085
+ -0.018072426319122314,
1086
+ 0.13277769088745117,
1087
+ 0.999964714050293,
1088
+ -0.005352973937988281,
1089
+ -0.00646662712097168,
1090
+ 0.005311727523803711,
1091
+ 0.9999654293060303,
1092
+ -0.006388425827026367,
1093
+ 0.41925930976867676,
1094
+ 0.0332942008972168,
1095
+ 0.005217909812927246,
1096
+ 0.03396797180175781,
1097
+ 1.0,
1098
+ 0.0,
1099
+ 0.0,
1100
+ 0.0,
1101
+ 1.0,
1102
+ 0.0,
1103
+ 0.41629624366760254
1104
+ ],
1105
+ [
1106
+ 0.009085893630981445,
1107
+ 0.20257973670959473,
1108
+ 0.042948007583618164,
1109
+ 1.0,
1110
+ 0.0,
1111
+ 0.0,
1112
+ 0.0,
1113
+ 1.0,
1114
+ -1.0132789611816406e-06,
1115
+ 0.20010781288146973,
1116
+ -0.0246087908744812,
1117
+ 0.15921831130981445,
1118
+ 0.9999470710754395,
1119
+ -0.00736159086227417,
1120
+ -0.007179737091064453,
1121
+ 0.0073053836822509766,
1122
+ 0.9999427795410156,
1123
+ -0.00782400369644165,
1124
+ 0.41925930976867676,
1125
+ 0.03327834606170654,
1126
+ 0.005248904228210449,
1127
+ 0.03397989273071289,
1128
+ 1.0,
1129
+ -7.748603820800781e-07,
1130
+ -5.960464477539063e-08,
1131
+ 8.344650268554688e-07,
1132
+ 1.0,
1133
+ -5.364418029785156e-07,
1134
+ 0.41629624366760254
1135
+ ],
1136
+ [
1137
+ 0.00906062126159668,
1138
+ 0.2025994062423706,
1139
+ 0.042908430099487305,
1140
+ 1.0,
1141
+ 0.0,
1142
+ 0.0,
1143
+ 0.0,
1144
+ 1.0,
1145
+ 0.0,
1146
+ 0.31356537342071533,
1147
+ -0.04342836141586304,
1148
+ 0.2481241226196289,
1149
+ 0.9998904466629028,
1150
+ -0.010683715343475342,
1151
+ -0.01024484634399414,
1152
+ 0.01055598258972168,
1153
+ 0.9998668432235718,
1154
+ -0.01244121789932251,
1155
+ 0.41925930976867676,
1156
+ 0.03328895568847656,
1157
+ 0.005221843719482422,
1158
+ 0.03397011756896973,
1159
+ 1.0,
1160
+ -1.7881393432617188e-07,
1161
+ 0.0,
1162
+ 1.1920928955078125e-07,
1163
+ 1.0,
1164
+ -1.1920928955078125e-07,
1165
+ 0.41629624366760254
1166
+ ],
1167
+ [
1168
+ 0.00906062126159668,
1169
+ 0.2025994062423706,
1170
+ 0.042908430099487305,
1171
+ 1.0,
1172
+ 0.0,
1173
+ 0.0,
1174
+ 0.0,
1175
+ 1.0,
1176
+ 0.0,
1177
+ 0.33043670654296875,
1178
+ -0.07546043395996094,
1179
+ 0.26550519466400146,
1180
+ 0.9998939037322998,
1181
+ -0.010528624057769775,
1182
+ -0.010073602199554443,
1183
+ 0.010405302047729492,
1184
+ 0.9998712539672852,
1185
+ -0.012217998504638672,
1186
+ 0.41925930976867676,
1187
+ 0.03329956531524658,
1188
+ 0.005252718925476074,
1189
+ 0.03398323059082031,
1190
+ 1.0,
1191
+ -1.1920928955078125e-06,
1192
+ -7.748603820800781e-07,
1193
+ 1.1920928955078125e-06,
1194
+ 1.0,
1195
+ -1.0132789611816406e-06,
1196
+ 0.41629624366760254
1197
+ ],
1198
+ [
1199
+ 0.00906062126159668,
1200
+ 0.2025994062423706,
1201
+ 0.042908430099487305,
1202
+ 1.0,
1203
+ 0.0,
1204
+ 0.0,
1205
+ 0.0,
1206
+ 1.0,
1207
+ 0.0,
1208
+ 0.3373715877532959,
1209
+ -0.09454900026321411,
1210
+ 0.26748788356781006,
1211
+ 0.9998712539672852,
1212
+ -0.012611329555511475,
1213
+ -0.009919703006744385,
1214
+ 0.01248788833618164,
1215
+ 0.999845027923584,
1216
+ -0.012407541275024414,
1217
+ 0.41925930976867676,
1218
+ 0.033394813537597656,
1219
+ 0.005472898483276367,
1220
+ 0.0340428352355957,
1221
+ 1.0,
1222
+ -8.881092071533203e-06,
1223
+ -4.887580871582031e-06,
1224
+ 8.940696716308594e-06,
1225
+ 1.0,
1226
+ -7.212162017822266e-06,
1227
+ 0.41629624366760254
1228
+ ],
1229
+ [
1230
+ 0.009066224098205566,
1231
+ 0.2025994062423706,
1232
+ 0.04291045665740967,
1233
+ 1.0,
1234
+ 0.0,
1235
+ 0.0,
1236
+ 0.0,
1237
+ 1.0,
1238
+ 0.0,
1239
+ 0.3717309236526489,
1240
+ -0.0959848165512085,
1241
+ 0.3111867904663086,
1242
+ 0.9997556209564209,
1243
+ -0.019321084022521973,
1244
+ -0.010733485221862793,
1245
+ 0.019145727157592773,
1246
+ 0.9996854066848755,
1247
+ -0.016201794147491455,
1248
+ 0.41925930976867676,
1249
+ 0.0340878963470459,
1250
+ 0.007068514823913574,
1251
+ 0.03449106216430664,
1252
+ 1.0,
1253
+ -6.383657455444336e-05,
1254
+ -3.534555435180664e-05,
1255
+ 6.389617919921875e-05,
1256
+ 1.0,
1257
+ -5.1856040954589844e-05,
1258
+ 0.41629624366760254
1259
+ ],
1260
+ [
1261
+ 0.009055018424987793,
1262
+ 0.20257973670959473,
1263
+ 0.04290628433227539,
1264
+ 1.0,
1265
+ 0.0,
1266
+ 0.0,
1267
+ 0.0,
1268
+ 1.0,
1269
+ 0.0,
1270
+ 0.3436669111251831,
1271
+ -0.10583305358886719,
1272
+ 0.3016718626022339,
1273
+ 0.9997358322143555,
1274
+ -0.020739614963531494,
1275
+ -0.009907543659210205,
1276
+ 0.020581960678100586,
1277
+ 0.9996639490127563,
1278
+ -0.01575833559036255,
1279
+ 0.41925930976867676,
1280
+ 0.030542850494384766,
1281
+ -0.0011028051376342773,
1282
+ 0.03217768669128418,
1283
+ 1.0,
1284
+ 0.0002186298370361328,
1285
+ 0.00012099742889404297,
1286
+ -0.0002186298370361328,
1287
+ 1.0,
1288
+ 0.00017762184143066406,
1289
+ 0.41629624366760254
1290
+ ],
1291
+ [
1292
+ 0.009260416030883789,
1293
+ 0.2028360366821289,
1294
+ 0.04296672344207764,
1295
+ 1.0,
1296
+ 1.1920928955078125e-07,
1297
+ -2.980232238769531e-07,
1298
+ -1.1920928955078125e-07,
1299
+ 1.0,
1300
+ -1.5914440155029297e-05,
1301
+ 0.3160369396209717,
1302
+ -0.03754878044128418,
1303
+ 0.2892639636993408,
1304
+ 0.999807596206665,
1305
+ -0.017129719257354736,
1306
+ -0.009551525115966797,
1307
+ 0.016954421997070312,
1308
+ 0.9996918439865112,
1309
+ -0.018134236335754395,
1310
+ 0.41925930976867676,
1311
+ 0.03539478778839111,
1312
+ 0.009861946105957031,
1313
+ 0.03578758239746094,
1314
+ 1.0,
1315
+ -0.00015670061111450195,
1316
+ -8.600950241088867e-05,
1317
+ 0.00015664100646972656,
1318
+ 1.0,
1319
+ -0.00012248754501342773,
1320
+ 0.41629624366760254
1321
+ ],
1322
+ [
1323
+ 0.009069085121154785,
1324
+ 0.20256006717681885,
1325
+ 0.042908430099487305,
1326
+ 1.0,
1327
+ 0.0,
1328
+ 0.0,
1329
+ 0.0,
1330
+ 1.0,
1331
+ 0.0,
1332
+ 0.28902363777160645,
1333
+ 0.0013829469680786133,
1334
+ 0.23963022232055664,
1335
+ 0.9998641014099121,
1336
+ -0.013593852519989014,
1337
+ -0.009318351745605469,
1338
+ 0.013410806655883789,
1339
+ 0.9997210502624512,
1340
+ -0.01943725347518921,
1341
+ 0.41925930976867676,
1342
+ 0.04227328300476074,
1343
+ 0.02469015121459961,
1344
+ 0.03992271423339844,
1345
+ 0.9999997615814209,
1346
+ -0.0006698369979858398,
1347
+ -0.0003890395164489746,
1348
+ 0.0006695985794067383,
1349
+ 0.9999997615814209,
1350
+ -0.0005470514297485352,
1351
+ 0.41629624366760254
1352
+ ],
1353
+ [
1354
+ 0.009069085121154785,
1355
+ 0.20256006717681885,
1356
+ 0.042908430099487305,
1357
+ 1.0,
1358
+ 0.0,
1359
+ 0.0,
1360
+ 0.0,
1361
+ 1.0,
1362
+ 0.0,
1363
+ 0.32505500316619873,
1364
+ 0.011157870292663574,
1365
+ 0.26268625259399414,
1366
+ 0.9998914003372192,
1367
+ -0.00958329439163208,
1368
+ -0.011197924613952637,
1369
+ 0.009337067604064941,
1370
+ 0.9997179508209229,
1371
+ -0.021832704544067383,
1372
+ 0.41925930976867676,
1373
+ 0.02175426483154297,
1374
+ -0.027143001556396484,
1375
+ 0.02682507038116455,
1376
+ 0.9999990463256836,
1377
+ 0.0012558698654174805,
1378
+ 0.0006737709045410156,
1379
+ -0.0012564659118652344,
1380
+ 0.9999986886978149,
1381
+ 0.0009870529174804688,
1382
+ 0.41629624366760254
1383
+ ],
1384
+ [
1385
+ 0.009091615676879883,
1386
+ 0.20256006717681885,
1387
+ 0.04294168949127197,
1388
+ 1.0,
1389
+ 0.0,
1390
+ 0.0,
1391
+ 0.0,
1392
+ 1.0,
1393
+ -9.5367431640625e-07,
1394
+ 0.2935601472854614,
1395
+ 0.02565944194793701,
1396
+ 0.24912917613983154,
1397
+ 0.9998788833618164,
1398
+ -0.010689914226531982,
1399
+ -0.011303961277008057,
1400
+ 0.010461926460266113,
1401
+ 0.9997446537017822,
1402
+ -0.020034193992614746,
1403
+ 0.41925930976867676,
1404
+ 0.0381091833114624,
1405
+ 0.015552878379821777,
1406
+ 0.03541207313537598,
1407
+ 0.9999998807907104,
1408
+ -0.00038051605224609375,
1409
+ -0.00010246038436889648,
1410
+ 0.00038051605224609375,
1411
+ 0.9999998807907104,
1412
+ -0.0002970099449157715,
1413
+ 0.41629624366760254
1414
+ ],
1415
+ [
1416
+ 0.009063482284545898,
1417
+ 0.20257973670959473,
1418
+ 0.04291045665740967,
1419
+ 1.0,
1420
+ 0.0,
1421
+ 0.0,
1422
+ 0.0,
1423
+ 1.0,
1424
+ -5.960464477539063e-08,
1425
+ 0.28693127632141113,
1426
+ 0.1092977523803711,
1427
+ 0.31335508823394775,
1428
+ 0.9998719692230225,
1429
+ -0.006708920001983643,
1430
+ -0.014529645442962646,
1431
+ 0.006423592567443848,
1432
+ 0.9997873306274414,
1433
+ -0.019595563411712646,
1434
+ 0.41925930976867676,
1435
+ 0.043781280517578125,
1436
+ 0.03405153751373291,
1437
+ 0.0410383939743042,
1438
+ 0.9999994039535522,
1439
+ -0.0010578632354736328,
1440
+ -0.0003858208656311035,
1441
+ 0.0010575056076049805,
1442
+ 0.9999990463256836,
1443
+ -0.0008174777030944824,
1444
+ 0.41629624366760254
1445
+ ],
1446
+ [
1447
+ 0.009069085121154785,
1448
+ 0.20257973670959473,
1449
+ 0.042914628982543945,
1450
+ 1.0,
1451
+ 0.0,
1452
+ 0.0,
1453
+ 0.0,
1454
+ 1.0,
1455
+ -5.960464477539063e-08,
1456
+ 0.2642625570297241,
1457
+ 0.19131505489349365,
1458
+ 0.3423175811767578,
1459
+ 0.9998579025268555,
1460
+ -0.005951404571533203,
1461
+ -0.01576918363571167,
1462
+ 0.005600333213806152,
1463
+ 0.9997375011444092,
1464
+ -0.022212564945220947,
1465
+ 0.41925930976867676,
1466
+ 0.04341089725494385,
1467
+ 0.013211607933044434,
1468
+ 0.03693842887878418,
1469
+ 1.0,
1470
+ -0.00010991096496582031,
1471
+ -0.00019061565399169922,
1472
+ 0.00010991096496582031,
1473
+ 1.0,
1474
+ 0.0001575946807861328,
1475
+ 0.41629624366760254
1476
+ ],
1477
+ [
1478
+ 0.009063482284545898,
1479
+ 0.20256006717681885,
1480
+ 0.04291260242462158,
1481
+ 1.0,
1482
+ 0.0,
1483
+ 0.0,
1484
+ 0.0,
1485
+ 1.0,
1486
+ -5.960464477539063e-08,
1487
+ 0.16873621940612793,
1488
+ 0.25091099739074707,
1489
+ 0.359053373336792,
1490
+ 0.9999058246612549,
1491
+ -0.006036102771759033,
1492
+ -0.012328088283538818,
1493
+ 0.005730032920837402,
1494
+ 0.999678373336792,
1495
+ -0.02470952272415161,
1496
+ 0.41925930976867676,
1497
+ 0.022193431854248047,
1498
+ -0.010541379451751709,
1499
+ 0.02903127670288086,
1500
+ 0.9999994039535522,
1501
+ 0.0006458759307861328,
1502
+ 0.0008552074432373047,
1503
+ -0.0006463527679443359,
1504
+ 0.9999996423721313,
1505
+ 0.0005235671997070312,
1506
+ 0.41629624366760254
1507
+ ]
1508
+ ],
1509
+ [
1510
+ [
1511
+ 0.00906062126159668,
1512
+ 0.2025994062423706,
1513
+ 0.042908430099487305,
1514
+ 1.0,
1515
+ 0.0,
1516
+ 0.0,
1517
+ 0.0,
1518
+ 1.0,
1519
+ 0.0,
1520
+ 0.33918070793151855,
1521
+ -0.08039677143096924,
1522
+ 0.269872784614563,
1523
+ 0.9998908042907715,
1524
+ -0.01066964864730835,
1525
+ -0.010227024555206299,
1526
+ 0.010541915893554688,
1527
+ 0.9998667240142822,
1528
+ -0.012470245361328125,
1529
+ 0.41925930976867676,
1530
+ 0.03326773643493652,
1531
+ 0.005175471305847168,
1532
+ 0.03395795822143555,
1533
+ 1.0,
1534
+ 1.430511474609375e-06,
1535
+ 8.344650268554688e-07,
1536
+ -1.430511474609375e-06,
1537
+ 1.0,
1538
+ 1.0728836059570312e-06,
1539
+ 0.41629624366760254
1540
+ ],
1541
+ [
1542
+ 0.00906062126159668,
1543
+ 0.2025994062423706,
1544
+ 0.042908430099487305,
1545
+ 1.0,
1546
+ 0.0,
1547
+ 0.0,
1548
+ 0.0,
1549
+ 1.0,
1550
+ 0.0,
1551
+ 0.35213232040405273,
1552
+ -0.09297341108322144,
1553
+ 0.27886962890625,
1554
+ 0.9998326301574707,
1555
+ -0.015117108821868896,
1556
+ -0.01030886173248291,
1557
+ 0.014972925186157227,
1558
+ 0.9997909069061279,
1559
+ -0.013926804065704346,
1560
+ 0.41925930976867676,
1561
+ 0.03278625011444092,
1562
+ 0.004058837890625,
1563
+ 0.033640503883361816,
1564
+ 1.0,
1565
+ 3.993511199951172e-05,
1566
+ 2.2172927856445312e-05,
1567
+ -3.993511199951172e-05,
1568
+ 1.0,
1569
+ 3.24249267578125e-05,
1570
+ 0.41629624366760254
1571
+ ],
1572
+ [
1573
+ 0.009055018424987793,
1574
+ 0.20257973670959473,
1575
+ 0.04290628433227539,
1576
+ 1.0,
1577
+ 0.0,
1578
+ 0.0,
1579
+ 0.0,
1580
+ 1.0,
1581
+ 0.0,
1582
+ 0.3511180877685547,
1583
+ -0.10789769887924194,
1584
+ 0.3101799488067627,
1585
+ 0.9997737407684326,
1586
+ -0.018787026405334473,
1587
+ -0.009967148303985596,
1588
+ 0.018638968467712402,
1589
+ 0.9997174739837646,
1590
+ -0.014750242233276367,
1591
+ 0.41925930976867676,
1592
+ 0.034955620765686035,
1593
+ 0.009042859077453613,
1594
+ 0.03505551815032959,
1595
+ 1.0,
1596
+ -0.00013250112533569336,
1597
+ -7.349252700805664e-05,
1598
+ 0.00013256072998046875,
1599
+ 1.0,
1600
+ -0.00010764598846435547,
1601
+ 0.41629624366760254
1602
+ ],
1603
+ [
1604
+ 0.009266018867492676,
1605
+ 0.2028360366821289,
1606
+ 0.042970895767211914,
1607
+ 1.0,
1608
+ 1.1920928955078125e-07,
1609
+ -2.980232238769531e-07,
1610
+ -1.1920928955078125e-07,
1611
+ 1.0,
1612
+ -1.5914440155029297e-05,
1613
+ 0.33728480339050293,
1614
+ -0.08807200193405151,
1615
+ 0.30838990211486816,
1616
+ 0.9997389316558838,
1617
+ -0.020735502243041992,
1618
+ -0.009606242179870605,
1619
+ 0.020569682121276855,
1620
+ 0.9996428489685059,
1621
+ -0.017057418823242188,
1622
+ 0.41925930976867676,
1623
+ 0.0355377197265625,
1624
+ 0.010410547256469727,
1625
+ 0.03543984889984131,
1626
+ 1.0,
1627
+ -0.00017952919006347656,
1628
+ -9.948015213012695e-05,
1629
+ 0.00017952919006347656,
1630
+ 1.0,
1631
+ -0.00014573335647583008,
1632
+ 0.41629624366760254
1633
+ ],
1634
+ [
1635
+ 0.009069085121154785,
1636
+ 0.20256006717681885,
1637
+ 0.042908430099487305,
1638
+ 1.0,
1639
+ 0.0,
1640
+ 0.0,
1641
+ 0.0,
1642
+ 1.0,
1643
+ 0.0,
1644
+ 0.3088233470916748,
1645
+ -0.02397996187210083,
1646
+ 0.2727888822555542,
1647
+ 0.9998228549957275,
1648
+ -0.016184329986572266,
1649
+ -0.009604811668395996,
1650
+ 0.016006827354431152,
1651
+ 0.9997048377990723,
1652
+ -0.01827824115753174,
1653
+ 0.41925930976867676,
1654
+ 0.03309321403503418,
1655
+ 0.003977775573730469,
1656
+ 0.03438091278076172,
1657
+ 1.0,
1658
+ 5.173683166503906e-05,
1659
+ 1.6927719116210938e-05,
1660
+ -5.179643630981445e-05,
1661
+ 1.0,
1662
+ 4.4345855712890625e-05,
1663
+ 0.41629624366760254
1664
+ ],
1665
+ [
1666
+ 0.009069085121154785,
1667
+ 0.20256006717681885,
1668
+ 0.042908430099487305,
1669
+ 1.0,
1670
+ 0.0,
1671
+ 0.0,
1672
+ 0.0,
1673
+ 1.0,
1674
+ 0.0,
1675
+ 0.300417423248291,
1676
+ 0.014766693115234375,
1677
+ 0.24292731285095215,
1678
+ 0.9998749494552612,
1679
+ -0.01231163740158081,
1680
+ -0.009922266006469727,
1681
+ 0.01210319995880127,
1682
+ 0.9997103214263916,
1683
+ -0.020803868770599365,
1684
+ 0.41925930976867676,
1685
+ 0.020145773887634277,
1686
+ -0.02418738603591919,
1687
+ 0.024831175804138184,
1688
+ 0.9999992847442627,
1689
+ 0.0010178089141845703,
1690
+ 0.0006074905395507812,
1691
+ -0.0010182857513427734,
1692
+ 0.9999990463256836,
1693
+ 0.0008230209350585938,
1694
+ 0.41629624366760254
1695
+ ],
1696
+ [
1697
+ 0.009091615676879883,
1698
+ 0.20256006717681885,
1699
+ 0.04294168949127197,
1700
+ 1.0,
1701
+ 0.0,
1702
+ 0.0,
1703
+ 0.0,
1704
+ 1.0,
1705
+ -9.5367431640625e-07,
1706
+ 0.31132233142852783,
1707
+ -0.0029664039611816406,
1708
+ 0.25871026515960693,
1709
+ 0.9998986721038818,
1710
+ -0.009363293647766113,
1711
+ -0.010725855827331543,
1712
+ 0.009142041206359863,
1713
+ 0.9997479915618896,
1714
+ -0.020500898361206055,
1715
+ 0.41925930976867676,
1716
+ 0.041992902755737305,
1717
+ 0.02042865753173828,
1718
+ 0.03982365131378174,
1719
+ 0.9999997615814209,
1720
+ -0.00036710500717163086,
1721
+ -0.00019168853759765625,
1722
+ 0.00036704540252685547,
1723
+ 0.9999998807907104,
1724
+ -0.0003796815872192383,
1725
+ 0.41629624366760254
1726
+ ],
1727
+ [
1728
+ 0.009069085121154785,
1729
+ 0.20257973670959473,
1730
+ 0.042914628982543945,
1731
+ 1.0,
1732
+ 0.0,
1733
+ 0.0,
1734
+ 0.0,
1735
+ 1.0,
1736
+ -5.960464477539063e-08,
1737
+ 0.316475510597229,
1738
+ 0.06765854358673096,
1739
+ 0.2514406442642212,
1740
+ 0.9998515844345093,
1741
+ -0.01094353199005127,
1742
+ -0.01331174373626709,
1743
+ 0.010651946067810059,
1744
+ 0.9997060298919678,
1745
+ -0.021778404712677002,
1746
+ 0.41925930976867676,
1747
+ 0.035611748695373535,
1748
+ 0.004885673522949219,
1749
+ 0.034360408782958984,
1750
+ 1.0,
1751
+ -4.6253204345703125e-05,
1752
+ -3.796815872192383e-05,
1753
+ 4.6253204345703125e-05,
1754
+ 1.0,
1755
+ 4.5299530029296875e-05,
1756
+ 0.41629624366760254
1757
+ ],
1758
+ [
1759
+ 0.009063482284545898,
1760
+ 0.20256006717681885,
1761
+ 0.04291260242462158,
1762
+ 1.0,
1763
+ 0.0,
1764
+ 0.0,
1765
+ 0.0,
1766
+ 1.0,
1767
+ -5.960464477539063e-08,
1768
+ 0.253512978553772,
1769
+ 0.10438930988311768,
1770
+ 0.3327815532684326,
1771
+ 0.9998939037322998,
1772
+ -0.005053579807281494,
1773
+ -0.01365971565246582,
1774
+ 0.0048171281814575195,
1775
+ 0.9998389482498169,
1776
+ -0.017287731170654297,
1777
+ 0.41925930976867676,
1778
+ 0.03974413871765137,
1779
+ 0.03579390048980713,
1780
+ 0.036423683166503906,
1781
+ 0.9999992847442627,
1782
+ -0.001103818416595459,
1783
+ -6.61611557006836e-05,
1784
+ 0.0011037588119506836,
1785
+ 0.9999990463256836,
1786
+ -0.0007817745208740234,
1787
+ 0.41629624366760254
1788
+ ],
1789
+ [
1790
+ 0.009057760238647461,
1791
+ 0.20256006717681885,
1792
+ 0.042908430099487305,
1793
+ 1.0,
1794
+ 0.0,
1795
+ 0.0,
1796
+ 0.0,
1797
+ 1.0,
1798
+ -5.960464477539063e-08,
1799
+ 0.2416396141052246,
1800
+ 0.21881604194641113,
1801
+ 0.3564988374710083,
1802
+ 0.9998669624328613,
1803
+ -0.0066040754318237305,
1804
+ -0.014912307262420654,
1805
+ 0.006239175796508789,
1806
+ 0.999683141708374,
1807
+ -0.024385035037994385,
1808
+ 0.41925930976867676,
1809
+ 0.03320956230163574,
1810
+ -0.01241910457611084,
1811
+ 0.030895113945007324,
1812
+ 0.9999996423721313,
1813
+ 0.0008132457733154297,
1814
+ 0.00021898746490478516,
1815
+ -0.0008133649826049805,
1816
+ 0.9999994039535522,
1817
+ 0.0007567405700683594,
1818
+ 0.41629624366760254
1819
+ ],
1820
+ [
1821
+ 0.009069085121154785,
1822
+ 0.20257973670959473,
1823
+ 0.042914628982543945,
1824
+ 1.0,
1825
+ 0.0,
1826
+ 0.0,
1827
+ 0.0,
1828
+ 1.0,
1829
+ -5.960464477539063e-08,
1830
+ 0.1626189947128296,
1831
+ 0.28461992740631104,
1832
+ 0.3566627502441406,
1833
+ 0.9999043941497803,
1834
+ -0.005653977394104004,
1835
+ -0.012622654438018799,
1836
+ 0.005325913429260254,
1837
+ 0.9996511936187744,
1838
+ -0.025870025157928467,
1839
+ 0.41925930976867676,
1840
+ 0.02411937713623047,
1841
+ -0.018249154090881348,
1842
+ 0.03223073482513428,
1843
+ 0.9999994039535522,
1844
+ 0.0008907318115234375,
1845
+ 0.0007027387619018555,
1846
+ -0.0008912086486816406,
1847
+ 0.9999992847442627,
1848
+ 0.0007742643356323242,
1849
+ 0.41629624366760254
1850
+ ],
1851
+ [
1852
+ 0.009063482284545898,
1853
+ 0.20256006717681885,
1854
+ 0.04291260242462158,
1855
+ 1.0,
1856
+ 0.0,
1857
+ 0.0,
1858
+ 0.0,
1859
+ 1.0,
1860
+ -5.960464477539063e-08,
1861
+ 0.11943340301513672,
1862
+ 0.3110710382461548,
1863
+ 0.38945770263671875,
1864
+ 0.9999172687530518,
1865
+ -0.004807233810424805,
1866
+ -0.011935949325561523,
1867
+ 0.004476189613342285,
1868
+ 0.9996087551116943,
1869
+ -0.02760899066925049,
1870
+ 0.41925930976867676,
1871
+ 0.028484582901000977,
1872
+ -0.02769160270690918,
1873
+ 0.031121253967285156,
1874
+ 0.9999990463256836,
1875
+ 0.0012061595916748047,
1876
+ 0.0004590749740600586,
1877
+ -0.0012065768241882324,
1878
+ 0.9999988079071045,
1879
+ 0.0009518861770629883,
1880
+ 0.41629624366760254
1881
+ ],
1882
+ [
1883
+ 0.008874893188476562,
1884
+ 0.20234322547912598,
1885
+ 0.04285633563995361,
1886
+ 1.0,
1887
+ -1.1920928955078125e-07,
1888
+ 2.384185791015625e-07,
1889
+ 1.1920928955078125e-07,
1890
+ 1.0,
1891
+ 1.33514404296875e-05,
1892
+ 0.0724560022354126,
1893
+ 0.22733843326568604,
1894
+ 0.41579723358154297,
1895
+ 0.9999501705169678,
1896
+ -0.00439983606338501,
1897
+ -0.008969545364379883,
1898
+ 0.004159092903137207,
1899
+ 0.9996351003646851,
1900
+ -0.02668970823287964,
1901
+ 0.41925930976867676,
1902
+ 0.03157460689544678,
1903
+ -0.01598513126373291,
1904
+ 0.032421112060546875,
1905
+ 0.9999997615814209,
1906
+ 0.0007050037384033203,
1907
+ 0.00028765201568603516,
1908
+ -0.0007051825523376465,
1909
+ 0.9999996423721313,
1910
+ 0.0005202293395996094,
1911
+ 0.41629624366760254
1912
+ ],
1913
+ [
1914
+ 0.00907754898071289,
1915
+ 0.20257973670959473,
1916
+ 0.042914628982543945,
1917
+ 1.0,
1918
+ 0.0,
1919
+ 0.0,
1920
+ 0.0,
1921
+ 1.0,
1922
+ 0.0,
1923
+ 0.06938600540161133,
1924
+ 0.06680262088775635,
1925
+ 0.4239891767501831,
1926
+ 0.9999661445617676,
1927
+ -0.0036103129386901855,
1928
+ -0.007386624813079834,
1929
+ 0.0034453868865966797,
1930
+ 0.9997472763061523,
1931
+ -0.022213459014892578,
1932
+ 0.41925930976867676,
1933
+ 0.032421231269836426,
1934
+ -0.025632381439208984,
1935
+ 0.031023621559143066,
1936
+ 0.9999992847442627,
1937
+ 0.0012047290802001953,
1938
+ 0.0001195669174194336,
1939
+ -0.0012047886848449707,
1940
+ 0.9999990463256836,
1941
+ 0.0006859302520751953,
1942
+ 0.41629624366760254
1943
+ ],
1944
+ [
1945
+ 0.009063482284545898,
1946
+ 0.20257973670959473,
1947
+ 0.042908430099487305,
1948
+ 1.0,
1949
+ 0.0,
1950
+ 0.0,
1951
+ 0.0,
1952
+ 1.0,
1953
+ 0.0,
1954
+ 0.03682661056518555,
1955
+ -0.025924086570739746,
1956
+ 0.5240132808685303,
1957
+ 0.9999792575836182,
1958
+ -0.004476070404052734,
1959
+ -0.004615306854248047,
1960
+ 0.004358172416687012,
1961
+ 0.999671459197998,
1962
+ -0.025255680084228516,
1963
+ 0.41925930976867676,
1964
+ 0.016828179359436035,
1965
+ 0.05714380741119385,
1966
+ 0.034651994705200195,
1967
+ 0.9999949932098389,
1968
+ -0.003073453903198242,
1969
+ 0.0007290840148925781,
1970
+ 0.003074049949645996,
1971
+ 0.9999948740005493,
1972
+ -0.0009063482284545898,
1973
+ 0.41629624366760254
1974
+ ],
1975
+ [
1976
+ 0.0065114498138427734,
1977
+ 0.20257973670959473,
1978
+ 0.042908430099487305,
1979
+ 1.0,
1980
+ -9.5367431640625e-07,
1981
+ 2.5033950805664062e-06,
1982
+ 9.5367431640625e-07,
1983
+ 1.0,
1984
+ -5.960464477539063e-08,
1985
+ 0.01458740234375,
1986
+ -0.02258777618408203,
1987
+ 0.4755741357803345,
1988
+ 0.9999876022338867,
1989
+ 0.0021882057189941406,
1990
+ -0.004497945308685303,
1991
+ -0.0022587180137634277,
1992
+ 0.9998735189437866,
1993
+ -0.01574110984802246,
1994
+ 0.41925930976867676,
1995
+ 0.03340005874633789,
1996
+ 0.036458492279052734,
1997
+ 0.033553123474121094,
1998
+ 0.9999995231628418,
1999
+ -0.0007740259170532227,
2000
+ -0.0005014538764953613,
2001
+ 0.0007735490798950195,
2002
+ 0.9999990463256836,
2003
+ -0.0010625720024108887,
2004
+ 0.41629624366760254
2005
+ ]
2006
+ ]
2007
+ ]
2008
+ }
assets/example_action_fd_agibotworld_first_frame.png ADDED

Git LFS Details

  • SHA256: 78b7288846b05c2265f4aa9a31b7aa905d75e0154c2126adcf8472432a81053c
  • Pointer size: 131 Bytes
  • Size of remote file: 638 kB
assets/example_action_id_av_0_input.mp4 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ff205f86ae169031c0e12c8c6bc8aabe24aeb44b84a16cba977669b29912f5fc
3
+ size 1125838
assets/example_action_id_av_0_output.json ADDED
@@ -0,0 +1,669 @@
1
+ {
2
+ "data": [
3
+ [
4
+ -0.004169374704360962,
5
+ 0.0013064965605735779,
6
+ 0.04288285970687866,
7
+ 0.9930733442306519,
8
+ -0.0027953684329986572,
9
+ 0.00307542085647583,
10
+ -0.002344876527786255,
11
+ 1.0027554035186768,
12
+ -0.0005963500589132309
13
+ ],
14
+ [
15
+ -0.000341169536113739,
16
+ 0.0040023550391197205,
17
+ 0.030853301286697388,
18
+ 1.0044922828674316,
19
+ 0.0050698816776275635,
20
+ 0.0007043546065688133,
21
+ -0.0014923065900802612,
22
+ 1.0103027820587158,
23
+ -0.0039251744747161865
24
+ ],
25
+ [
26
+ 0.00016222894191741943,
27
+ 0.002112939953804016,
28
+ 0.004467234015464783,
29
+ 1.0028852224349976,
30
+ 0.003892749547958374,
31
+ 0.00044799596071243286,
32
+ -0.0021587014198303223,
33
+ 1.0042486190795898,
34
+ -0.001470804214477539
35
+ ],
36
+ [
37
+ 0.00045252684503793716,
38
+ 0.003622382879257202,
39
+ 0.0019504725933074951,
40
+ 1.0071654319763184,
41
+ -0.00045464932918548584,
42
+ 0.0014991164207458496,
43
+ -0.004504382610321045,
44
+ 0.9974222183227539,
45
+ 0.0005441531538963318
46
+ ],
47
+ [
48
+ -0.003174394369125366,
49
+ -0.0017216801643371582,
50
+ -0.003694474697113037,
51
+ 1.003875494003296,
52
+ 0.000801771879196167,
53
+ 5.6371092796325684e-05,
54
+ -4.088878631591797e-05,
55
+ 1.0089592933654785,
56
+ 0.0010747015476226807
57
+ ],
58
+ [
59
+ -0.0021718740463256836,
60
+ 0.0004750937223434448,
61
+ -0.0029296129941940308,
62
+ 1.0002799034118652,
63
+ 0.002264350652694702,
64
+ 0.00046096742153167725,
65
+ -0.0009260214865207672,
66
+ 1.00685453414917,
67
+ 0.0006359070539474487
68
+ ],
69
+ [
70
+ -0.002009839750826359,
71
+ -0.0009101033210754395,
72
+ 0.0028893351554870605,
73
+ 0.9993182420730591,
74
+ 0.0020936131477355957,
75
+ -1.9840896129608154e-05,
76
+ 0.0015915334224700928,
77
+ 1.0037269592285156,
78
+ -0.004174880683422089
79
+ ],
80
+ [
81
+ 0.0034651458263397217,
82
+ 0.0037903208285570145,
83
+ 0.0032828450202941895,
84
+ 1.000267505645752,
85
+ -0.0036886483430862427,
86
+ -0.0017361268401145935,
87
+ -0.002902016043663025,
88
+ 0.9957513809204102,
89
+ 0.002555273473262787
90
+ ],
91
+ [
92
+ 0.006279021501541138,
93
+ 0.00949704647064209,
94
+ 0.006110593676567078,
95
+ 1.0021371841430664,
96
+ -0.0037842243909835815,
97
+ -0.0012855827808380127,
98
+ -0.001484893262386322,
99
+ 1.0039020776748657,
100
+ -0.00035249069333076477
101
+ ],
102
+ [
103
+ -0.001515701413154602,
104
+ -0.0002789795398712158,
105
+ -0.0068062953650951385,
106
+ 1.0103508234024048,
107
+ 0.001192629337310791,
108
+ 0.0030622780323028564,
109
+ 0.0011264681816101074,
110
+ 1.007310390472412,
111
+ 0.002111390233039856
112
+ ],
113
+ [
114
+ -0.0014958996325731277,
115
+ 0.0031677186489105225,
116
+ -0.004236042499542236,
117
+ 1.0023982524871826,
118
+ -0.002504110336303711,
119
+ -0.0016808435320854187,
120
+ 0.0012012720108032227,
121
+ 1.0030899047851562,
122
+ 0.0005880743265151978
123
+ ],
124
+ [
125
+ -0.00013130903244018555,
126
+ 0.004746794700622559,
127
+ 0.0033633261919021606,
128
+ 1.0057588815689087,
129
+ 0.003953307867050171,
130
+ 0.000896356999874115,
131
+ -0.0007526576519012451,
132
+ 1.004878044128418,
133
+ -0.00039952993392944336
134
+ ],
135
+ [
136
+ 0.0026078522205352783,
137
+ -0.0008465610444545746,
138
+ 0.009034380316734314,
139
+ 1.0077221393585205,
140
+ -0.0009244084358215332,
141
+ -0.0018756091594696045,
142
+ 0.0006937459111213684,
143
+ 1.0071992874145508,
144
+ 0.0007385127246379852
145
+ ],
146
+ [
147
+ -0.0015690997242927551,
148
+ -0.000245044007897377,
149
+ 0.008958447724580765,
150
+ 1.000647783279419,
151
+ 0.001012757420539856,
152
+ 0.000528484582901001,
153
+ -0.0014475509524345398,
154
+ 1.0043647289276123,
155
+ -0.0019880682229995728
156
+ ],
157
+ [
158
+ 0.0004642903804779053,
159
+ 0.001843869686126709,
160
+ 0.01116645336151123,
161
+ 1.0001274347305298,
162
+ 0.004533201456069946,
163
+ 0.003915667533874512,
164
+ 0.0015954822301864624,
165
+ 1.004312515258789,
166
+ 4.3004751205444336e-05
167
+ ],
168
+ [
169
+ 0.0014523789286613464,
170
+ -0.004249289631843567,
171
+ 0.01856423169374466,
172
+ 1.0013573169708252,
173
+ 0.002036675810813904,
174
+ -0.001862620934844017,
175
+ -0.002076610689982772,
176
+ 1.0027811527252197,
177
+ -0.00468212366104126
178
+ ],
179
+ [
180
+ -0.002909451723098755,
181
+ 0.002633988857269287,
182
+ 0.029188022017478943,
183
+ 1.0077584981918335,
184
+ 0.00022396445274353027,
185
+ 0.00390547513961792,
186
+ 0.0020011961460113525,
187
+ 1.0060248374938965,
188
+ -0.0012506172060966492
189
+ ],
190
+ [
191
+ -0.0009786635637283325,
192
+ 0.002693958580493927,
193
+ 0.04165208339691162,
194
+ 1.0019495487213135,
195
+ -0.00014898180961608887,
196
+ 0.0024263858795166016,
197
+ -1.5497207641601562e-05,
198
+ 1.010059118270874,
199
+ -3.0219554901123047e-05
200
+ ],
201
+ [
202
+ -0.0005752965807914734,
203
+ -0.0005559623241424561,
204
+ 0.046234145760536194,
205
+ 1.0054540634155273,
206
+ 0.0020374152809381485,
207
+ -0.0015819594264030457,
208
+ -0.0029042065143585205,
209
+ 1.0099618434906006,
210
+ 0.000680871307849884
211
+ ],
212
+ [
213
+ -0.0007889866828918457,
214
+ -0.0012893080711364746,
215
+ 0.059357136487960815,
216
+ 1.000758171081543,
217
+ 0.002288408577442169,
218
+ 0.00029518455266952515,
219
+ -0.006809890270233154,
220
+ 1.003616452217102,
221
+ 0.0005193725228309631
222
+ ],
223
+ [
224
+ -0.0005115699023008347,
225
+ -2.2411346435546875e-05,
226
+ 0.05575209856033325,
227
+ 1.0020880699157715,
228
+ 0.002775445580482483,
229
+ 0.002016425132751465,
230
+ 1.4990568161010742e-05,
231
+ 1.0059070587158203,
232
+ 0.000255754217505455
233
+ ],
234
+ [
235
+ -0.0032242387533187866,
236
+ 0.0023158714175224304,
237
+ 0.07140621542930603,
238
+ 1.0127027034759521,
239
+ -0.006227180361747742,
240
+ 0.0001257285475730896,
241
+ 0.003853335976600647,
242
+ 0.989827036857605,
243
+ -0.004152536392211914
244
+ ],
245
+ [
246
+ -0.004788219928741455,
247
+ 0.001452852040529251,
248
+ 0.0808199793100357,
249
+ 1.0067814588546753,
250
+ 0.0018687620759010315,
251
+ 0.002240225672721863,
252
+ 0.0007859393954277039,
253
+ 1.009403109550476,
254
+ -0.0021954476833343506
255
+ ],
256
+ [
257
+ -0.00179203599691391,
258
+ 0.0022020190954208374,
259
+ 0.09712601453065872,
260
+ 1.0037879943847656,
261
+ 0.0016462206840515137,
262
+ -0.0021682195365428925,
263
+ -0.002309828996658325,
264
+ 1.007390022277832,
265
+ -0.002376168966293335
266
+ ],
267
+ [
268
+ 0.000849604606628418,
269
+ -0.0019690990447998047,
270
+ 0.09653186798095703,
271
+ 1.0044101476669312,
272
+ -0.00022560358047485352,
273
+ -0.0021284837275743484,
274
+ 0.0007652044296264648,
275
+ 1.0043072700500488,
276
+ -1.784414052963257e-05
277
+ ],
278
+ [
279
+ -0.0006387382745742798,
280
+ 0.0005827993154525757,
281
+ 0.11596892774105072,
282
+ 1.0059268474578857,
283
+ 0.004413425922393799,
284
+ -0.0023792535066604614,
285
+ -0.004380702972412109,
286
+ 1.0090959072113037,
287
+ -0.001389428973197937
288
+ ],
289
+ [
290
+ -0.0010459311306476593,
291
+ -0.004505783319473267,
292
+ 0.1201993077993393,
293
+ 1.0040526390075684,
294
+ -0.00030914321541786194,
295
+ 0.0003615003079175949,
296
+ 0.004234045743942261,
297
+ 1.0040135383605957,
298
+ 0.005033165216445923
299
+ ],
300
+ [
301
+ 0.002282150089740753,
302
+ 0.005020305514335632,
303
+ 0.134363055229187,
304
+ 1.0035362243652344,
305
+ -0.007463634014129639,
306
+ -0.002947300672531128,
307
+ 0.0028315000236034393,
308
+ 1.0030007362365723,
309
+ 0.0013876184821128845
310
+ ],
311
+ [
312
+ 0.005076408386230469,
313
+ -0.0007408089004456997,
314
+ 0.14487546682357788,
315
+ 1.0071125030517578,
316
+ 0.0022079087793827057,
317
+ -0.0019074305891990662,
318
+ 0.0012646764516830444,
319
+ 1.0073299407958984,
320
+ -0.0017407238483428955
321
+ ],
322
+ [
323
+ 0.0033898651599884033,
324
+ 0.0016630440950393677,
325
+ 0.14222049713134766,
326
+ 1.0055317878723145,
327
+ 0.0016905665397644043,
328
+ -0.0016445815563201904,
329
+ 0.0008559226989746094,
330
+ 1.0037096738815308,
331
+ -0.00027045607566833496
332
+ ],
333
+ [
334
+ -0.00047904253005981445,
335
+ 0.0027755796909332275,
336
+ 0.15001334249973297,
337
+ 1.0068082809448242,
338
+ -0.002282470464706421,
339
+ -0.0004724264144897461,
340
+ 0.0015332624316215515,
341
+ 0.9986047744750977,
342
+ -0.00040721893310546875
343
+ ],
344
+ [
345
+ 0.0012988895177841187,
346
+ -0.0017725825309753418,
347
+ 0.15345412492752075,
348
+ 1.0003080368041992,
349
+ -0.004808247089385986,
350
+ 0.0008027921430766582,
351
+ 0.0003478825092315674,
352
+ 1.0045523643493652,
353
+ -0.0006252527236938477
354
+ ],
355
+ [
356
+ 0.0042798519134521484,
357
+ -0.003797680139541626,
358
+ 0.1682387888431549,
359
+ 1.0006740093231201,
360
+ 0.0014480054378509521,
361
+ -0.004346251487731934,
362
+ -0.0011912062764167786,
363
+ 1.0039736032485962,
364
+ 0.004550337791442871
365
+ ],
366
+ [
367
+ 0.0047804368659853935,
368
+ 0.0020081400871276855,
369
+ 0.18152475357055664,
370
+ 1.0033659934997559,
371
+ -0.003238588571548462,
372
+ -0.0002244710922241211,
373
+ 0.001187354326248169,
374
+ 1.001799464225769,
375
+ 0.00035965442657470703
376
+ ],
377
+ [
378
+ 0.002194732427597046,
379
+ 0.004285037517547607,
380
+ 0.17377570271492004,
381
+ 1.0024428367614746,
382
+ 2.7358531951904297e-05,
383
+ 0.0007259398698806763,
384
+ 0.002799510955810547,
385
+ 1.0095839500427246,
386
+ 0.000986546277999878
387
+ ],
388
+ [
389
+ 0.011655598878860474,
390
+ -0.004363194108009338,
391
+ 0.1733943223953247,
392
+ 1.0060877799987793,
393
+ 0.005308032035827637,
394
+ 0.00019598007202148438,
395
+ 0.0020811930298805237,
396
+ 1.005480170249939,
397
+ 0.0034663397818803787
398
+ ],
399
+ [
400
+ 0.005494013428688049,
401
+ 0.001217433251440525,
402
+ 0.19090485572814941,
403
+ 1.0007939338684082,
404
+ -0.0017113089561462402,
405
+ -0.0029762238264083862,
406
+ -0.0029486119747161865,
407
+ 1.007235050201416,
408
+ 0.0011887773871421814
409
+ ],
410
+ [
411
+ 0.009677506983280182,
412
+ -0.006571769714355469,
413
+ 0.18290859460830688,
414
+ 1.0043387413024902,
415
+ 0.0041519105434417725,
416
+ -0.005755007266998291,
417
+ 0.0021400973200798035,
418
+ 1.0117175579071045,
419
+ 0.0013759732246398926
420
+ ],
421
+ [
422
+ 0.009254217147827148,
423
+ 0.00086212158203125,
424
+ 0.19618983566761017,
425
+ 1.004202127456665,
426
+ -0.0024811476469039917,
427
+ -0.009780704975128174,
428
+ -9.959936141967773e-05,
429
+ 1.0029432773590088,
430
+ 0.002646505832672119
431
+ ],
432
+ [
433
+ 0.009517982602119446,
434
+ 0.0005913078784942627,
435
+ 0.1987060010433197,
436
+ 1.0014187097549438,
437
+ -0.003507643938064575,
438
+ -0.0070085227489471436,
439
+ 0.003143489360809326,
440
+ 1.002164363861084,
441
+ 0.0010729804635047913
442
+ ],
443
+ [
444
+ 0.011235184967517853,
445
+ -0.0012075230479240417,
446
+ 0.1985381841659546,
447
+ 1.00819993019104,
448
+ 0.0004249662160873413,
449
+ -0.007858432829380035,
450
+ 0.0010521560907363892,
451
+ 1.0044105052947998,
452
+ 0.0019039809703826904
453
+ ],
454
+ [
455
+ 0.01611921191215515,
456
+ -0.001196291297674179,
457
+ 0.2129121720790863,
458
+ 1.0026592016220093,
459
+ -0.0043135881423950195,
460
+ -0.008859358727931976,
461
+ 0.0018898062407970428,
462
+ 1.0028712749481201,
463
+ 0.001296408474445343
464
+ ],
465
+ [
466
+ 0.012898094952106476,
467
+ -0.00257091224193573,
468
+ 0.22263681888580322,
469
+ 1.0003653764724731,
470
+ -0.0003224611282348633,
471
+ -0.006597660481929779,
472
+ 0.001808561384677887,
473
+ 1.0038889646530151,
474
+ 0.0026077479124069214
475
+ ],
476
+ [
477
+ 0.013466209173202515,
478
+ 0.0018514543771743774,
479
+ 0.2376641482114792,
480
+ 1.0033774375915527,
481
+ -0.0018234401941299438,
482
+ -0.0027636736631393433,
483
+ -0.002201095223426819,
484
+ 0.9944919347763062,
485
+ -0.003013297915458679
486
+ ],
487
+ [
488
+ 0.016432642936706543,
489
+ 0.0016770362854003906,
490
+ 0.2475210428237915,
491
+ 0.9992994070053101,
492
+ 0.0004978589713573456,
493
+ -0.007409483194351196,
494
+ 0.001736219972372055,
495
+ 1.0018973350524902,
496
+ 0.0006765872240066528
497
+ ],
498
+ [
499
+ 0.014197006821632385,
500
+ -0.0011444091796875,
501
+ 0.2522944509983063,
502
+ 1.007161021232605,
503
+ -0.0002377331256866455,
504
+ -0.0076867565512657166,
505
+ 0.00048054754734039307,
506
+ 1.0041539669036865,
507
+ 4.76837158203125e-07
508
+ ],
509
+ [
510
+ 0.013951335102319717,
511
+ 0.00016671419143676758,
512
+ 0.26611241698265076,
513
+ 1.003096580505371,
514
+ 7.110834121704102e-05,
515
+ -0.007618337869644165,
516
+ -0.0012885034084320068,
517
+ 1.0024244785308838,
518
+ -0.0006037205457687378
519
+ ],
520
+ [
521
+ 0.021747827529907227,
522
+ -0.0007416084408760071,
523
+ 0.2753673195838928,
524
+ 1.004169225692749,
525
+ -0.0007948616985231638,
526
+ -0.0035905838012695312,
527
+ 0.0017324388027191162,
528
+ 1.0070855617523193,
529
+ 0.0011792778968811035
530
+ ],
531
+ [
532
+ 0.0197313129901886,
533
+ 0.0010002106428146362,
534
+ 0.28150367736816406,
535
+ 1.000486135482788,
536
+ 0.0008391812443733215,
537
+ -0.00998014211654663,
538
+ 0.00013040006160736084,
539
+ 1.0073860883712769,
540
+ 0.00011730194091796875
541
+ ],
542
+ [
543
+ 0.013363130390644073,
544
+ -0.0013455301523208618,
545
+ 0.29323309659957886,
546
+ 1.0011693239212036,
547
+ 0.0004394911229610443,
548
+ -0.010259315371513367,
549
+ 0.002908453345298767,
550
+ 1.00381338596344,
551
+ -0.0015798062086105347
552
+ ],
553
+ [
554
+ 0.011962294578552246,
555
+ 0.0017336606979370117,
556
+ 0.3051491379737854,
557
+ 1.0071864128112793,
558
+ -0.00198325514793396,
559
+ -0.007243074476718903,
560
+ 0.002382628619670868,
561
+ 1.0005227327346802,
562
+ 0.002617824822664261
563
+ ],
564
+ [
565
+ 0.013890508562326431,
566
+ 0.005148977041244507,
567
+ 0.31365522742271423,
568
+ 1.0045785903930664,
569
+ -0.0014152824878692627,
570
+ -0.0037764757871627808,
571
+ -1.0967254638671875e-05,
572
+ 1.0036839246749878,
573
+ -0.0013008862733840942
574
+ ],
575
+ [
576
+ 0.022371917963027954,
577
+ 0.0010523945093154907,
578
+ 0.32621023058891296,
579
+ 0.9973301887512207,
580
+ 7.641687989234924e-05,
581
+ -0.008195962756872177,
582
+ -0.0024689435958862305,
583
+ 1.0000015497207642,
584
+ 0.0024291127920150757
585
+ ],
586
+ [
587
+ 0.013878557831048965,
588
+ 0.0006828606128692627,
589
+ 0.338645339012146,
590
+ 0.9975997805595398,
591
+ -0.00040869414806365967,
592
+ -0.00787343829870224,
593
+ -0.003890112042427063,
594
+ 1.0047005414962769,
595
+ 0.002561509609222412
596
+ ],
597
+ [
598
+ 0.015209555625915527,
599
+ 0.003898248076438904,
600
+ 0.3533927798271179,
601
+ 0.9961752891540527,
602
+ -0.0017392784357070923,
603
+ -0.011618092656135559,
604
+ -0.002202928066253662,
605
+ 1.004539132118225,
606
+ 0.0014745891094207764
607
+ ],
608
+ [
609
+ 0.014465630054473877,
610
+ 0.004823088645935059,
611
+ 0.3626636266708374,
612
+ 1.0013577938079834,
613
+ -0.001902550458908081,
614
+ -0.007857277989387512,
615
+ 0.0013346821069717407,
616
+ 1.0042797327041626,
617
+ 0.002392292022705078
618
+ ],
619
+ [
620
+ 0.01207183301448822,
621
+ -0.0008889138698577881,
622
+ 0.3779425024986267,
623
+ 1.0077900886535645,
624
+ 0.003589630126953125,
625
+ -0.006766825914382935,
626
+ -0.0006845667958259583,
627
+ 1.0097190141677856,
628
+ 0.0004222095012664795
629
+ ],
630
+ [
631
+ 0.014950022101402283,
632
+ 0.00266294926404953,
633
+ 0.38445812463760376,
634
+ 1.0014134645462036,
635
+ -0.0014687180519104004,
636
+ -0.010563522577285767,
637
+ -0.001019701361656189,
638
+ 1.0056054592132568,
639
+ 0.0019721686840057373
640
+ ],
641
+ [
642
+ 0.017121687531471252,
643
+ 0.0029426664113998413,
644
+ 0.39281076192855835,
645
+ 1.0034687519073486,
646
+ -0.00029963254928588867,
647
+ -0.014892727136611938,
648
+ 0.00041075050830841064,
649
+ 1.001159906387329,
650
+ -0.0016954541206359863
651
+ ],
652
+ [
653
+ 0.013786494731903076,
654
+ 0.004227876663208008,
655
+ 0.4070053696632385,
656
+ 1.0027498006820679,
657
+ 0.0027508288621902466,
658
+ -0.006377458572387695,
659
+ -0.0003670156002044678,
660
+ 0.998216986656189,
661
+ -0.0007525086402893066
662
+ ]
663
+ ],
664
+ "shape": [
665
+ 60,
666
+ 9
667
+ ],
668
+ "dtype": "float32"
669
+ }
assets/example_action_id_av_0_output.png ADDED

Git LFS Details

  • SHA256: bc0adf920602d9c1ecd0a2e2c1d370ac0b5af22eaddf7d9e1a75efca268e6823
  • Pointer size: 131 Bytes
  • Size of remote file: 136 kB
assets/example_action_id_av_1_input.mp4 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:169e65cee76ef7c6366987a8e4c8e4ec6f803f5e624ba2ef39ba1f0c672925aa
3
+ size 1639407
assets/example_action_id_av_1_output.json ADDED
@@ -0,0 +1,669 @@
1
+ {
2
+ "data": [
3
+ [
4
+ 0.07009392976760864,
5
+ 0.006693661212921143,
6
+ 1.1420012712478638,
7
+ 1.0078833103179932,
8
+ 0.0012149810791015625,
9
+ -0.024351507425308228,
10
+ 0.003295496106147766,
11
+ 1.0031427145004272,
12
+ 0.000886848196387291
13
+ ],
14
+ [
15
+ 0.06402561813592911,
16
+ -0.0018425136804580688,
17
+ 1.1193486452102661,
18
+ 1.0022863149642944,
19
+ -0.003297761082649231,
20
+ -0.021648261696100235,
21
+ 0.003286510705947876,
22
+ 1.008054256439209,
23
+ 0.0026217997074127197
24
+ ],
25
+ [
26
+ 0.06708404421806335,
27
+ -0.011202313005924225,
28
+ 1.0937659740447998,
29
+ 1.0016952753067017,
30
+ -0.0032352805137634277,
31
+ -0.02467486262321472,
32
+ 0.002940252423286438,
33
+ 1.0018234252929688,
34
+ 0.005707293748855591
35
+ ],
36
+ [
37
+ 0.07562961429357529,
38
+ -0.015175968408584595,
39
+ 1.06215238571167,
40
+ 1.0063788890838623,
41
+ -0.004444316029548645,
42
+ -0.022849813103675842,
43
+ 0.012309730052947998,
44
+ 1.005750298500061,
45
+ 0.006214454770088196
46
+ ],
47
+ [
48
+ 0.08489534258842468,
49
+ -0.01856812834739685,
50
+ 1.0337228775024414,
51
+ 0.9988476037979126,
52
+ -0.006072938442230225,
53
+ -0.030434921383857727,
54
+ 0.003413841128349304,
55
+ 1.0011348724365234,
56
+ 0.0019939541816711426
57
+ ],
58
+ [
59
+ 0.09222845733165741,
60
+ -0.010918542742729187,
61
+ 1.0194001197814941,
62
+ 0.9980852603912354,
63
+ -0.0031263232231140137,
64
+ -0.03405023366212845,
65
+ 0.004428897053003311,
66
+ 1.001816987991333,
67
+ 0.0022291988134384155
68
+ ],
69
+ [
70
+ 0.10272128880023956,
71
+ 0.000903397798538208,
72
+ 1.0029981136322021,
73
+ 0.9994226694107056,
74
+ -0.0014013946056365967,
75
+ -0.0366886705160141,
76
+ 0.00678609311580658,
77
+ 1.0000828504562378,
78
+ -0.003755345940589905
79
+ ],
80
+ [
81
+ 0.11606456339359283,
82
+ 0.014216836541891098,
83
+ 0.9876351356506348,
84
+ 0.9964910745620728,
85
+ -0.0012535229325294495,
86
+ -0.04123753681778908,
87
+ 0.0015066340565681458,
88
+ 1.0026605129241943,
89
+ -0.006498083472251892
90
+ ],
91
+ [
92
+ 0.12204641103744507,
93
+ 0.015004783868789673,
94
+ 0.9837003350257874,
95
+ 0.9974825382232666,
96
+ -0.0012949556112289429,
97
+ -0.04007109999656677,
98
+ 0.00612206757068634,
99
+ 0.9999562501907349,
100
+ -0.0036430060863494873
101
+ ],
102
+ [
103
+ 0.12851455807685852,
104
+ 0.015528619289398193,
105
+ 0.9764279723167419,
106
+ 0.9981354475021362,
107
+ 0.0018989741802215576,
108
+ -0.04426932334899902,
109
+ 0.004600256681442261,
110
+ 0.9985778331756592,
111
+ -0.003377959132194519
112
+ ],
113
+ [
114
+ 0.13196571171283722,
115
+ 0.00893557071685791,
116
+ 0.9678859114646912,
117
+ 1.0045433044433594,
118
+ 0.002807438373565674,
119
+ -0.051086850464344025,
120
+ -0.0002923011779785156,
121
+ 1.0062892436981201,
122
+ -8.818507194519043e-05
123
+ ],
124
+ [
125
+ 0.1350717842578888,
126
+ 0.010051608085632324,
127
+ 0.9643991589546204,
128
+ 1.0022790431976318,
129
+ -0.001323610544204712,
130
+ -0.04617614299058914,
131
+ 0.0016942918300628662,
132
+ 1.0036096572875977,
133
+ 0.0013676732778549194
134
+ ],
135
+ [
136
+ 0.14420342445373535,
137
+ 0.01007811725139618,
138
+ 0.955908477306366,
139
+ 0.9972440600395203,
140
+ -0.002124980092048645,
141
+ -0.049623310565948486,
142
+ -0.0002627819776535034,
143
+ 1.000319242477417,
144
+ -0.0005148919299244881
145
+ ],
146
+ [
147
+ 0.14022719860076904,
148
+ 0.013264013454318047,
149
+ 0.9560319185256958,
150
+ 0.9995477199554443,
151
+ 0.0035576671361923218,
152
+ -0.04792314022779465,
153
+ -0.001525580883026123,
154
+ 1.003888487815857,
155
+ -0.0013465248048305511
156
+ ],
157
+ [
158
+ 0.1345217525959015,
159
+ 0.01220095157623291,
160
+ 0.9655839204788208,
161
+ 1.0048763751983643,
162
+ 0.00546342134475708,
163
+ -0.04668119549751282,
164
+ -0.0010053887963294983,
165
+ 1.0068625211715698,
166
+ -0.0029554516077041626
167
+ ],
168
+ [
169
+ 0.13503174483776093,
170
+ 0.011316493153572083,
171
+ 0.9516907334327698,
172
+ 1.0043013095855713,
173
+ 0.002426639199256897,
174
+ -0.045573264360427856,
175
+ -0.0013574131298810244,
176
+ 1.0036004781723022,
177
+ -0.003350198268890381
178
+ ],
179
+ [
180
+ 0.1309308409690857,
181
+ 0.009759962558746338,
182
+ 0.9560006856918335,
183
+ 0.9998754262924194,
184
+ 0.004482567310333252,
185
+ -0.04438298940658569,
186
+ -0.001865878701210022,
187
+ 1.0063111782073975,
188
+ 0.0013731084764003754
189
+ ],
190
+ [
191
+ 0.12462389469146729,
192
+ 0.011290190741419792,
193
+ 0.9650402069091797,
194
+ 1.0036883354187012,
195
+ -0.001193612813949585,
196
+ -0.04106159508228302,
197
+ 0.005374550819396973,
198
+ 1.0081455707550049,
199
+ -0.0027110427618026733
200
+ ],
201
+ [
202
+ 0.12767966091632843,
203
+ 0.012917861342430115,
204
+ 0.9721789360046387,
205
+ 1.0006978511810303,
206
+ -0.0012182332575321198,
207
+ -0.04295356199145317,
208
+ 5.793571472167969e-05,
209
+ 1.0048669576644897,
210
+ -0.0009505599737167358
211
+ ],
212
+ [
213
+ 0.12459099292755127,
214
+ 0.016084089875221252,
215
+ 0.9755645990371704,
216
+ 0.99851393699646,
217
+ 0.002149254083633423,
218
+ -0.040429264307022095,
219
+ -0.004845976829528809,
220
+ 1.0035099983215332,
221
+ -0.0011209547519683838
222
+ ],
223
+ [
224
+ 0.12318825721740723,
225
+ 0.020346134901046753,
226
+ 0.9927335381507874,
227
+ 1.001643180847168,
228
+ 0.00202333927154541,
229
+ -0.04205331206321716,
230
+ -0.004307150840759277,
231
+ 1.0037580728530884,
232
+ -0.0029827356338500977
233
+ ],
234
+ [
235
+ 0.12292107939720154,
236
+ 0.022956013679504395,
237
+ 1.0020689964294434,
238
+ 1.002467393875122,
239
+ 0.005101993680000305,
240
+ -0.04205179214477539,
241
+ -0.0018941611051559448,
242
+ 1.0055075883865356,
243
+ -0.0008601397275924683
244
+ ],
245
+ [
246
+ 0.12235790491104126,
247
+ 0.020232800394296646,
248
+ 1.0193688869476318,
249
+ 0.9982290863990784,
250
+ 0.0059588029980659485,
251
+ -0.03710612654685974,
252
+ -0.0030762851238250732,
253
+ 1.0036814212799072,
254
+ 0.003678947687149048
255
+ ],
256
+ [
257
+ 0.11874179542064667,
258
+ 0.02020188421010971,
259
+ 1.0269538164138794,
260
+ 1.00148344039917,
261
+ 0.003334537148475647,
262
+ -0.04146520793437958,
263
+ -0.0018786042928695679,
264
+ 1.0038044452667236,
265
+ -0.0026959776878356934
266
+ ],
267
+ [
268
+ 0.11215314269065857,
269
+ 0.015729710459709167,
270
+ 1.04316246509552,
271
+ 1.000847339630127,
272
+ 0.002756774425506592,
273
+ -0.039638493210077286,
274
+ -0.0009539872407913208,
275
+ 1.0064407587051392,
276
+ -0.0033986493945121765
277
+ ],
278
+ [
279
+ 0.1063896119594574,
280
+ 0.021251197904348373,
281
+ 1.059739112854004,
282
+ 1.0029915571212769,
283
+ 0.008030325174331665,
284
+ -0.03942205011844635,
285
+ -0.006604194641113281,
286
+ 1.0077309608459473,
287
+ -0.0016417503356933594
288
+ ],
289
+ [
290
+ 0.10521477460861206,
291
+ 0.018115460872650146,
292
+ 1.071585774421692,
293
+ 0.9995502829551697,
294
+ 0.002011328935623169,
295
+ -0.03752366453409195,
296
+ -0.003705829381942749,
297
+ 0.996671199798584,
298
+ -0.002115488052368164
299
+ ],
300
+ [
301
+ 0.10078927874565125,
302
+ 0.015093140304088593,
303
+ 1.0889707803726196,
304
+ 1.0062332153320312,
305
+ 0.00926429033279419,
306
+ -0.03409150242805481,
307
+ -0.0007918514311313629,
308
+ 1.0127429962158203,
309
+ -0.002360224723815918
310
+ ],
311
+ [
312
+ 0.09784230589866638,
313
+ 0.016763487830758095,
314
+ 1.1208187341690063,
315
+ 1.0020859241485596,
316
+ 0.003513324074447155,
317
+ -0.03430382162332535,
318
+ 0.0005292966961860657,
319
+ 1.004651427268982,
320
+ -0.005986005067825317
321
+ ],
322
+ [
323
+ 0.07926556468009949,
324
+ 0.012953147292137146,
325
+ 1.140628695487976,
326
+ 0.9961432218551636,
327
+ 0.0038596540689468384,
328
+ -0.030215993523597717,
329
+ -0.0013297051191329956,
330
+ 1.0029637813568115,
331
+ -0.00047488510608673096
332
+ ],
333
+ [
334
+ 0.0753796398639679,
335
+ 0.00887972116470337,
336
+ 1.1544592380523682,
337
+ 1.0071207284927368,
338
+ 0.004703924059867859,
339
+ -0.029834091663360596,
340
+ 0.004652403295040131,
341
+ 1.008349895477295,
342
+ -0.0002899765968322754
343
+ ],
344
+ [
345
+ 0.06969903409481049,
346
+ 0.008346021175384521,
347
+ 1.1774840354919434,
348
+ 1.004931926727295,
349
+ 0.00202101469039917,
350
+ -0.025847388431429863,
351
+ 0.0042855143547058105,
352
+ 1.0110220909118652,
353
+ 0.001279592514038086
354
+ ],
355
+ [
356
+ 0.06848141551017761,
357
+ 0.003982707858085632,
358
+ 1.194669485092163,
359
+ 0.9975674748420715,
360
+ 0.0007158070802688599,
361
+ -0.02307547628879547,
362
+ 0.0026876721531152725,
363
+ 1.0068185329437256,
364
+ 0.001052945852279663
365
+ ],
366
+ [
367
+ 0.057075899094343185,
368
+ 0.01037299633026123,
369
+ 1.2149968147277832,
370
+ 1.0047249794006348,
371
+ 8.045509457588196e-05,
372
+ -0.02383589744567871,
373
+ 0.006846204400062561,
374
+ 1.007742166519165,
375
+ 0.002701878547668457
376
+ ],
377
+ [
378
+ 0.06148111820220947,
379
+ 0.010166585445404053,
380
+ 1.2267282009124756,
381
+ 0.9979768991470337,
382
+ -0.0002384483814239502,
383
+ -0.026486635208129883,
384
+ 0.006823986768722534,
385
+ 1.0042164325714111,
386
+ -0.0020662546157836914
387
+ ],
388
+ [
389
+ 0.054221928119659424,
390
+ 0.01702176034450531,
391
+ 1.2517765760421753,
392
+ 1.0032732486724854,
393
+ -0.003955543041229248,
394
+ -0.02357548475265503,
395
+ 0.0030149370431900024,
396
+ 1.0078377723693848,
397
+ 3.4650787711143494e-05
398
+ ],
399
+ [
400
+ 0.05204963684082031,
401
+ 0.020570974797010422,
402
+ 1.2836378812789917,
403
+ 0.9974656701087952,
404
+ -0.0004451274871826172,
405
+ -0.02366816997528076,
406
+ -0.0014014989137649536,
407
+ 1.0038502216339111,
408
+ -0.0017639398574829102
409
+ ],
410
+ [
411
+ 0.05103199928998947,
412
+ 0.026325732469558716,
413
+ 1.292293906211853,
414
+ 1.001286506652832,
415
+ -0.0010189414024353027,
416
+ -0.020195096731185913,
417
+ 0.0006793439388275146,
418
+ 1.00400972366333,
419
+ -0.0025037825107574463
420
+ ],
421
+ [
422
+ 0.04431256651878357,
423
+ 0.021927133202552795,
424
+ 1.3272300958633423,
425
+ 1.0045137405395508,
426
+ 0.000967755913734436,
427
+ -0.01706451177597046,
428
+ -3.7297606468200684e-05,
429
+ 1.0049716234207153,
430
+ -0.0011880025267601013
431
+ ],
432
+ [
433
+ 0.03555189073085785,
434
+ 0.01882302761077881,
435
+ 1.3371576070785522,
436
+ 1.0066249370574951,
437
+ -0.0009097158908843994,
438
+ -0.014253586530685425,
439
+ -0.0018957853317260742,
440
+ 1.0077271461486816,
441
+ 0.0016103871166706085
442
+ ],
443
+ [
444
+ 0.025582119822502136,
445
+ 0.01724805310368538,
446
+ 1.3608332872390747,
447
+ 1.0031605958938599,
448
+ 0.0026828348636627197,
449
+ -0.012003101408481598,
450
+ 0.003360927104949951,
451
+ 1.004669189453125,
452
+ 6.693601608276367e-05
453
+ ],
454
+ [
455
+ 0.01674354076385498,
456
+ 0.012093983590602875,
457
+ 1.3856382369995117,
458
+ 1.0011014938354492,
459
+ -0.0013251304626464844,
460
+ -0.011342525482177734,
461
+ 0.0018950244411826134,
462
+ 1.0027016401290894,
463
+ 0.0013082325458526611
464
+ ],
465
+ [
466
+ 0.015420682728290558,
467
+ 0.015567049384117126,
468
+ 1.4040814638137817,
469
+ 0.9977735877037048,
470
+ 0.0011159181594848633,
471
+ -0.009455356746912003,
472
+ -3.6954879760742188e-06,
473
+ 1.0064294338226318,
474
+ 0.001549556851387024
475
+ ],
476
+ [
477
+ 0.010074913501739502,
478
+ 0.015535108745098114,
479
+ 1.4403393268585205,
480
+ 1.0040385723114014,
481
+ -0.0023125112056732178,
482
+ -0.0028183162212371826,
483
+ 0.0014582127332687378,
484
+ 1.0049704313278198,
485
+ 6.872415542602539e-05
486
+ ],
487
+ [
488
+ 0.008311420679092407,
489
+ 0.017948955297470093,
490
+ 1.4601424932479858,
491
+ 0.9989641308784485,
492
+ 0.001294134184718132,
493
+ -0.005496010184288025,
494
+ 0.0015567168593406677,
495
+ 1.005094051361084,
496
+ -0.0007513612508773804
497
+ ],
498
+ [
499
+ 0.0035017579793930054,
500
+ 0.020573198795318604,
501
+ 1.4812562465667725,
502
+ 1.0009287595748901,
503
+ 0.004388183355331421,
504
+ -0.0034145116806030273,
505
+ 0.0032011419534683228,
506
+ 1.0060547590255737,
507
+ 0.0006353855133056641
508
+ ],
509
+ [
510
+ 0.0016254633665084839,
511
+ 0.025684982538223267,
512
+ 1.5020363330841064,
513
+ 1.0030529499053955,
514
+ 0.0016576647758483887,
515
+ -0.005533203482627869,
516
+ 0.006280124187469482,
517
+ 1.0066683292388916,
518
+ -0.004450388252735138
519
+ ],
520
+ [
521
+ 0.004891932010650635,
522
+ 0.022001147270202637,
523
+ 1.5159921646118164,
524
+ 0.9982910752296448,
525
+ 0.0019840586464852095,
526
+ 0.0025368332862854004,
527
+ 0.003538757562637329,
528
+ 1.0018901824951172,
529
+ -0.002766549587249756
530
+ ],
531
+ [
532
+ 0.005643934011459351,
533
+ 0.017889663577079773,
534
+ 1.5284485816955566,
535
+ 0.9957029223442078,
536
+ 0.0005564317107200623,
537
+ -0.005682408809661865,
538
+ 0.004203110933303833,
539
+ 1.0006530284881592,
540
+ -0.002186410129070282
541
+ ],
542
+ [
543
+ 0.0013659894466400146,
544
+ 0.017303138971328735,
545
+ 1.5459176301956177,
546
+ 0.9986610412597656,
547
+ 0.0025620944797992706,
548
+ -0.0046241506934165955,
549
+ 0.0023546814918518066,
550
+ 1.0049318075180054,
551
+ -0.0027633383870124817
552
+ ],
553
+ [
554
+ 0.0020876526832580566,
555
+ 0.01876453310251236,
556
+ 1.5635592937469482,
557
+ 1.0048083066940308,
558
+ 0.002034813165664673,
559
+ -0.00023903697729110718,
560
+ 0.0019740089774131775,
561
+ 1.0029975175857544,
562
+ -0.0030378326773643494
563
+ ],
564
+ [
565
+ -0.000969788758084178,
566
+ 0.02315378189086914,
567
+ 1.5827395915985107,
568
+ 1.0053610801696777,
569
+ 0.0017460882663726807,
570
+ 0.0014020651578903198,
571
+ 0.005061998963356018,
572
+ 1.0060739517211914,
573
+ -0.001481384038925171
574
+ ],
575
+ [
576
+ -0.0009270310401916504,
577
+ 0.021047085523605347,
578
+ 1.591257095336914,
579
+ 0.9957737326622009,
580
+ 0.000979650765657425,
581
+ -0.00326402485370636,
582
+ 0.0065198540687561035,
583
+ 1.0065511465072632,
584
+ -0.004033505916595459
585
+ ],
586
+ [
587
+ 0.0006775943329557776,
588
+ 0.021081745624542236,
589
+ 1.6065882444381714,
590
+ 0.9954938888549805,
591
+ -0.00036662817001342773,
592
+ -0.0016272813081741333,
593
+ 0.0027448534965515137,
594
+ 1.0027532577514648,
595
+ -0.0011054277420043945
596
+ ],
597
+ [
598
+ 0.0006171315908432007,
599
+ 0.023595139384269714,
600
+ 1.63612699508667,
601
+ 0.9980154037475586,
602
+ 0.0018391646444797516,
603
+ -0.0013407617807388306,
604
+ 0.0035662055015563965,
605
+ 1.0019949674606323,
606
+ 0.0002149343490600586
607
+ ],
608
+ [
609
+ 0.00248805433511734,
610
+ 0.024855956435203552,
611
+ 1.6411882638931274,
612
+ 1.003147840499878,
613
+ -0.0025542080402374268,
614
+ -0.0012470632791519165,
615
+ 0.0017508864402770996,
616
+ 1.0075249671936035,
617
+ 0.004809528589248657
618
+ ],
619
+ [
620
+ 0.004023965448141098,
621
+ 0.02341434359550476,
622
+ 1.6552321910858154,
623
+ 0.9999925494194031,
624
+ 0.003260374069213867,
625
+ -0.004240959882736206,
626
+ -0.0003981068730354309,
627
+ 1.0055534839630127,
628
+ -0.002688378095626831
629
+ ],
630
+ [
631
+ 0.005011148750782013,
632
+ 0.021908849477767944,
633
+ 1.6845402717590332,
634
+ 0.9952720403671265,
635
+ 0.0013539791107177734,
636
+ -0.004400491714477539,
637
+ -0.0024556964635849,
638
+ 1.0060759782791138,
639
+ 0.0018045902252197266
640
+ ],
641
+ [
642
+ 0.00039759278297424316,
643
+ 0.022936120629310608,
644
+ 1.6785025596618652,
645
+ 0.9988229274749756,
646
+ 0.0002734065055847168,
647
+ -0.006101280450820923,
648
+ -0.00011269748210906982,
649
+ 1.0011308193206787,
650
+ -0.0015612244606018066
651
+ ],
652
+ [
653
+ -0.00012701749801635742,
654
+ 0.022358208894729614,
655
+ 1.6973373889923096,
656
+ 1.0032126903533936,
657
+ 0.001924470067024231,
658
+ -0.0016700327396392822,
659
+ 0.0007153153419494629,
660
+ 1.0081400871276855,
661
+ 0.0002083033323287964
662
+ ]
663
+ ],
664
+ "shape": [
665
+ 60,
666
+ 9
667
+ ],
668
+ "dtype": "float32"
669
+ }
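Both `example_action_id_av_*_output.json` files above share one layout: a nested `data` list plus `shape` and `dtype` fields. The sketch below is one possible way to load such a payload and check that the nested list matches the declared shape. The helper name `load_action_array` and the tiny inline sample are illustrative assumptions, not part of this repository, and the meaning of the nine columns is not stated in this diff, so the code does not interpret them.

```python
import json


def load_action_array(text):
    """Parse {"data": [[...], ...], "shape": [rows, cols], "dtype": "..."}.

    Hypothetical helper: returns (rows, dtype) after checking that the
    nested list really has the declared number of rows and columns.
    """
    payload = json.loads(text)
    n_rows, n_cols = payload["shape"]
    data = payload["data"]
    if len(data) != n_rows or any(len(row) != n_cols for row in data):
        raise ValueError(f"data does not match declared shape {payload['shape']}")
    return data, payload["dtype"]


# Illustrative stand-in with the same keys as the files above (made-up values).
sample = '{"data": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], "shape": [2, 3], "dtype": "float32"}'
rows, dtype = load_action_array(sample)
print(len(rows), len(rows[0]), dtype)
```

For the real files, the text would come from reading the JSON file from disk, and the declared shape would be `[60, 9]`.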
assets/example_action_id_av_1_output.png ADDED

Git LFS Details

  • SHA256: 056ff91da285cb32e267d5e2115e7a3f7085bd26dd2d903457082b002fffa769
  • Pointer size: 131 Bytes
  • Size of remote file: 134 kB
assets/example_i2v_input.jpg ADDED

Git LFS Details

  • SHA256: 1de51eb5c6d51ec97aa3351a6685717b0253e36073eb9cdd3ef65affa115a4e2
  • Pointer size: 131 Bytes
  • Size of remote file: 861 kB
assets/example_i2v_output.mp4 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dce2d76053d4283342bb4103a3a32a28b5529a97b15cacc386dd75c812601779
3
+ size 17298256
assets/example_i2v_prompt.json ADDED
@@ -0,0 +1,124 @@
1
+ {
2
+ "subjects": [
3
+ {
4
+ "description": "A car (viewed from the driver's perspective/dashcam) traveling along a coastal mountain road. The vehicle's grey dashboard and windshield frame are visible at the bottom of the frame.",
5
+ "appearance_details": "Grey/dark interior dashboard, windshield wipers visible, clean windshield providing clear view of the road ahead",
6
+ "relationship": "Primary subject and point-of-view vehicle navigating the mountain road",
7
+ "location": "Bottom foreground, providing the POV perspective",
8
+ "relative_size": "Large within frame (dashboard occupies bottom portion)",
9
+ "orientation": "Facing forward along the road direction",
10
+ "pose": "",
11
+ "action": "Driving fast along the curved coastal road, then making a sudden emergency stop",
12
+ "state_changes": "Transitions from fast forward motion to abrupt deceleration and complete stop",
13
+ "clothing": "",
14
+ "expression": "",
15
+ "gender": "",
16
+ "age": "",
17
+ "skin_tone_and_texture": "",
18
+ "facial_features": "",
19
+ "number_of_subjects": 1,
20
+ "number_of_arms": 0,
21
+ "number_of_legs": 0
22
+ },
23
+ {
24
+ "description": "Falling rocks and debris from the steep rocky mountain cliff face, creating a landslide that tumbles down toward the road surface",
25
+ "appearance_details": "Grey and brown angular rocks of various sizes, accompanied by dust and smaller debris, mixed with displaced green vegetation",
26
+ "relationship": "Obstacle that forces the car to make an emergency stop",
27
+ "location": "Center and center-right of frame, falling from the mountain cliff face toward the road",
28
+ "relative_size": "Medium within frame, growing larger as rocks approach the road",
29
+ "orientation": "Falling downward from the cliff face",
30
+ "pose": "",
31
+ "action": "Tumbling and cascading down the steep rocky mountainside toward the road",
32
+ "state_changes": "Begins as small movement on the cliff face, escalates into a significant rockfall with dust cloud",
33
+ "clothing": "",
34
+ "expression": "",
35
+ "gender": "",
36
+ "age": "",
37
+ "skin_tone_and_texture": "",
38
+ "facial_features": "",
39
+ "number_of_subjects": 0,
40
+ "number_of_arms": 0,
41
+ "number_of_legs": 0
42
+ }
43
+ ],
44
+ "background_setting": "A dramatic coastal mountain highway carved along steep rocky cliffs. The road is a well-maintained two-lane highway with white dashed center lines, solid white edge lines, and raised reflective road markers. A metal guardrail runs along the right side protecting from a drop toward the ocean. The mountain rises steeply on the left with exposed grey and brown rock faces partially covered with lush green native vegetation including ferns and shrubs. The ocean is visible in the distance to the right under a partly cloudy blue sky. The road curves sharply to the left around the mountain headland.",
45
+ "lighting": {
46
+ "conditions": "Bright natural daylight with partly cloudy skies, strong ambient illumination",
47
+ "direction": "Top-lit and slightly front-lit from the sun positioned high in the sky",
48
+ "shadows": "Moderate shadows cast by the mountain cliff creating areas of shade on the road and vegetation, soft shadows from clouds",
49
+ "illumination_effect": "Bright, clear visibility with good contrast between the sunlit road surface and shaded mountain areas, natural coastal light"
50
+ },
51
+ "aesthetics": {
52
+ "composition": "Driver's POV/dashcam perspective with the road curving ahead as the central leading line, mountain dominating the upper-left portion, ocean visible to the right, dashboard framing the bottom",
53
+ "color_scheme": "Dominant greens from vegetation, grey-brown rocky cliff faces, dark grey asphalt road with white markings, blue sky and ocean, yellow warning sign providing accent color",
54
+ "mood_atmosphere": "Initially scenic and adventurous, transitioning to tense and dangerous as the landslide occurs",
55
+ "patterns": "Repeating metal guardrail posts along the right side, dashed center line markings, layered rock strata on the cliff face"
56
+ },
57
+ "cinematography": {
58
+ "camera_motion": "Forward-moving dashcam perspective with slight vibration from vehicle speed, sudden forward lurch during emergency braking",
59
+ "framing": "Wide shot from driver's POV through windshield",
60
+ "camera_angle": "Eye-level from driver's seated position",
61
+ "depth_of_field": "Deep",
62
+ "focus": "Road ahead and mountain cliff face in sharp focus",
63
+ "lens_focal_length": "Wide-angle (approximately 28-35mm equivalent)"
64
+ },
65
+ "style_medium": "Live-action video",
66
+ "artistic_style": "Realistic dashcam footage, dramatic documentary style",
67
+ "context": "Dashcam footage capturing a dangerous landslide event on a coastal mountain highway, requiring emergency driving maneuver",
68
+ "actions": [
69
+ {
70
+ "time": "0:00-0:03",
71
+ "description": "The car drives fast along the winding coastal mountain road, approaching the left curve with the road rushing beneath"
72
+ },
73
+ {
74
+ "time": "0:03-0:05",
75
+ "description": "Rocks begin to dislodge from the steep cliff face ahead, tumbling down with increasing intensity as a landslide develops, dust rising from impacts"
76
+ },
77
+ {
78
+ "time": "0:05-0:07",
79
+ "description": "The car makes a sudden emergency stop, the camera lurching forward from the deceleration as rocks and debris continue falling onto the road ahead"
80
+ }
81
+ ],
82
+ "text_and_signage_elements": [
83
+ {
84
+ "text": "Left curve arrow",
85
+ "category": "scene_sign",
86
+ "appearance": "Black arrow on yellow diamond-shaped warning sign, mounted on a metal pole",
87
+ "spatial_temporal": "Left side of road, visible from 0:00 and passing by as car moves forward",
88
+ "context": "Road warning sign indicating a sharp left curve ahead, standard highway signage"
89
+ }
90
+ ],
91
+ "segments": [
92
+ {
93
+ "segment_index": 0,
94
+ "time_range": "0:00-0:03",
95
+ "description": "The car speeds along the scenic coastal mountain highway, navigating the curve. The road surface rushes beneath, white lines blur slightly from speed, and the mountain looms ahead.",
96
+ "key_changes": "Road perspective shifts as car navigates the curve, passing the warning sign, guardrail moves in peripheral view",
97
+ "camera": "Forward-moving dashcam with slight road vibration, steady forward momentum"
98
+ },
99
+ {
100
+ "segment_index": 1,
101
+ "time_range": "0:03-0:05",
102
+ "description": "Rocks begin breaking loose from the cliff face ahead. Small stones first, then larger boulders cascade down the mountainside creating a growing dust cloud and debris field on and near the road.",
103
+ "key_changes": "Mountain transitions from static backdrop to active hazard, dust cloud forms, debris appears on road surface ahead",
104
+       "camera": "Still moving forward but beginning to slow, slight camera shake increases"
+     },
+     {
+       "segment_index": 2,
+       "time_range": "0:05-0:07",
+       "description": "The car performs an emergency stop. The camera pitches forward from sudden braking force. The vehicle comes to a complete halt with the landslide debris visible on the road ahead, dust still settling.",
+       "key_changes": "Rapid deceleration, camera lurches forward then settles, motion stops completely, dust and small rocks still visible falling",
+       "camera": "Abrupt forward pitch from braking, then stabilizes to static as vehicle stops"
+     }
+   ],
+   "transitions": [],
+   "temporal_caption": "The video begins with a dashcam view of a car driving fast along a scenic coastal mountain road, the asphalt rushing beneath and a sharp left curve sign visible on the left. The car navigates the curve at speed with the steep rocky cliff towering above and the ocean visible to the right. Around the 3-second mark, small rocks begin to dislodge from the cliff face ahead, quickly escalating into a significant rockfall with larger boulders tumbling down the mountainside and a dust cloud forming. By second 5, the driver reacts with emergency braking - the camera lurches forward dramatically from the sudden deceleration. The car comes to a complete stop by second 6-7, with the road ahead partially blocked by fallen rocks and debris, dust still settling in the air around the landslide zone.",
+   "audio_description": "Initially the sound of a car engine at moderate-high RPM and tire noise on asphalt with wind passing over the vehicle. Around second 3, the rumbling and cracking sounds of rocks breaking loose begin, growing louder with impacts of stones hitting the road surface and each other. Loud thuds and crashes as larger rocks land. At second 5, the sharp screech of tires braking hard on asphalt, followed by the settling sounds of smaller pebbles and dust, with distant rumbling of the remaining rockfall subsiding.",
+   "resolution": {
+     "W": 1280,
+     "H": 720
+   },
+   "aspect_ratio": "16,9",
+   "duration": "7s",
+   "fps": 24
+ }
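The trailing metadata fields of this prompt can be cross-checked against each other. The sketch below is not part of the commit. It assumes that `aspect_ratio` is a comma-separated `width,height` pair and that `duration` is a whole number of seconds followed by `s`, both inferred only from the values shown above. The helper name `check_video_meta` is hypothetical.

```python
from fractions import Fraction


def check_video_meta(meta: dict) -> int:
    """Validate resolution against aspect_ratio and return the nominal frame count."""
    # Assumption: "16,9" means width 16 : height 9.
    ratio_w, ratio_h = (int(part) for part in meta["aspect_ratio"].split(","))
    res = meta["resolution"]
    if Fraction(res["W"], res["H"]) != Fraction(ratio_w, ratio_h):
        raise ValueError("resolution does not match aspect_ratio")
    # Assumption: duration looks like "7s" (integer seconds).
    seconds = int(meta["duration"].rstrip("s"))
    return seconds * meta["fps"]


# Values copied from the end of the prompt file above.
meta = {
    "resolution": {"W": 1280, "H": 720},
    "aspect_ratio": "16,9",
    "duration": "7s",
    "fps": 24,
}
nominal_frames = check_video_meta(meta)
print(nominal_frames)  # 7 s * 24 fps = 168
```

The result is only the product of `duration` and `fps`; whether the generated video contains exactly that many frames is not stated in this diff.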
assets/example_i2vs_output.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:421401e0d16252575c042512ed9f6e3b296fe239e71ee861bc693ed542534d46
+ size 15752836
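The three added lines are a Git LFS pointer, not the video itself: each line is a key, a space, and a value. A minimal sketch of reading such a pointer follows. The function name `parse_lfs_pointer` is hypothetical and the code is illustrative only, not part of this commit.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value is "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer contents copied from the diff above.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:421401e0d16252575c042512ed9f6e3b296fe239e71ee861bc693ed542534d46
size 15752836
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```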
assets/example_reasoning_input.png ADDED

Git LFS Details

  • SHA256: 6686b937bdb28e2c9804c435bd63a7ab56ac1531ffc9b482f2b8a75aed8770f5
  • Pointer size: 131 Bytes
  • Size of remote file: 231 kB
assets/example_reasoning_prompt.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "prompt": "The task is to put flower into the red bottle. Generate a plan consisting of subtasks to accomplish the task.",
+   "max_tokens": 4096
+ }
assets/example_t2v_diffusers_output.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d59ff3a81c442dba15951dca10876e0df855ca054503a2e905cf63996b852327
+ size 1693143
assets/example_t2v_output.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f235b9e0cbc8fe455e3d730a7e89e861be546e9d240e86515f4bece4d60334e
+ size 9514932
assets/example_t2v_prompt.json ADDED
@@ -0,0 +1,115 @@
+ {
+   "subjects": [
+     {
+       "description": "A modern industrial robotic arm with a silver and dark gray metallic body, featuring multiple articulated joints and a flat rubber-padded gripper end-effector holding a green sponge. The arm has visible hydraulic cables and a smooth polished finish.",
+       "appearance_details": "The robotic arm has a sturdy base bolted to the countertop, with branded serial number markings on the shoulder joint. The gripper holds a standard green and yellow kitchen sponge. Small LED indicator lights glow blue near the base.",
+       "relationship": "Primary subject interacting with the dirty plate",
+       "location": "Center-right of frame",
+       "relative_size": "Large within frame",
+       "orientation": "Angled toward camera at roughly 45 degrees, arm extending downward toward the plate",
+       "pose": "Extended downward with the end-effector pressing the sponge against the plate surface",
+       "action": "Wiping a dirty plate with circular and sweeping motions",
+       "state_changes": "The arm moves fluidly from one side of the plate to the other, rotating its wrist joint to apply even pressure across the plate surface",
+       "clothing": "",
+       "expression": "",
+       "gender": "",
+       "age": "",
+       "skin_tone_and_texture": "",
+       "facial_features": "",
+       "number_of_subjects": 1,
+       "number_of_arms": 0,
+       "number_of_legs": 0
+     },
+     {
+       "description": "A white ceramic dinner plate with dried food residue, sauce stains, and grease marks covering its surface. Standard 10-inch round plate with a slightly raised rim.",
+       "appearance_details": "The plate has brownish-orange dried sauce, bits of dried food particles, and oily smears. As the sponge wipes across, clean white ceramic is revealed underneath.",
+       "relationship": "Object being cleaned by the robotic arm",
+       "location": "Center-left foreground, resting on the countertop",
+       "relative_size": "Medium within frame",
+       "orientation": "Flat, face-up on the counter",
+       "pose": "Stationary on the kitchen countertop",
+       "action": "Being wiped clean",
+       "state_changes": "Progressively becomes cleaner as the robotic arm wipes away the food residue, transitioning from dirty to mostly clean",
+       "clothing": "",
+       "expression": "",
+       "gender": "",
+       "age": "",
+       "skin_tone_and_texture": "",
+       "facial_features": "",
+       "number_of_subjects": 1,
+       "number_of_arms": 0,
+       "number_of_legs": 0
+     }
+   ],
+   "background_setting": "A modern, well-organized residential kitchen with light gray granite countertops, white cabinetry with brushed nickel handles, a stainless steel sink visible to the left, and a tiled backsplash in a subtle herringbone pattern. A dish rack with a few clean plates sits near the sink. A window above the sink lets in natural daylight. Small potted herbs sit on the windowsill.",
+   "lighting": {
+     "conditions": "Bright, mixed natural and artificial lighting \u2014 daylight from the window supplemented by warm overhead LED kitchen lights",
+     "direction": "Primary light from the left (window) with soft overhead fill from above",
+     "shadows": "Soft shadows cast by the robotic arm onto the countertop and plate, with a gentle shadow beneath the plate",
+     "illumination_effect": "Clean, well-lit domestic atmosphere with slight warm tones from the overhead lights blending with cooler daylight from the window"
+   },
+   "aesthetics": {
+     "composition": "The robotic arm and plate are centered in the frame with the kitchen environment providing context in the background. The diagonal line of the arm creates dynamic visual interest.",
+     "color_scheme": "Neutral palette of whites, grays, and silver metallics with pops of green from the sponge and herbs, warm wood tones from a cutting board in the background",
+     "mood_atmosphere": "Futuristic domestic, clean, efficient, slightly whimsical",
+     "patterns": "Herringbone tile pattern on the backsplash"
+   },
+   "cinematography": {
+     "camera_motion": "Slow, subtle push-in toward the plate as it gets cleaner",
+     "framing": "Medium close-up shot capturing the robotic arm, plate, and immediate countertop area",
+     "camera_angle": "Slightly high angle, approximately 30 degrees above eye level, looking down at the plate",
+     "depth_of_field": "Shallow",
+     "focus": "Sharp focus on the sponge-plate contact point and the robotic arm's gripper",
+     "lens_focal_length": "50mm equivalent"
+   },
+   "style_medium": "Live-action video",
+   "artistic_style": "Realistic, clean tech-demo aesthetic with cinematic color grading",
+   "context": "Demonstration of a robotic kitchen assistant performing a household chore, suitable for a technology showcase or smart home advertisement",
+   "actions": [
+     {
+       "time": "0:00-0:02",
+       "description": "The robotic arm lowers the sponge onto the dirty plate and begins its first wiping pass from the center outward"
+     },
+     {
+       "time": "0:02-0:05",
+       "description": "The arm performs circular wiping motions across the plate surface, rotating its wrist joint, progressively removing dried food and sauce stains"
+     },
+     {
+       "time": "0:05-0:07",
+       "description": "The arm makes final sweeping passes across the now mostly-clean plate, then lifts the sponge slightly, revealing the cleaned surface"
+     }
+   ],
+   "text_and_signage_elements": [],
+   "segments": [
+     {
+       "segment_index": 0,
+       "time_range": "0:00-0:02",
+       "description": "The robotic arm descends and makes initial contact with the dirty plate, beginning the wiping process",
+       "key_changes": "Arm transitions from hovering position to active contact with plate; first streak of clean ceramic appears",
+       "camera": "Static with very subtle push-in beginning"
+     },
+     {
+       "segment_index": 1,
+       "time_range": "0:02-0:05",
+       "description": "Main cleaning action as the robotic arm performs systematic circular wiping motions across the plate",
+       "key_changes": "Plate progressively becomes cleaner; food residue is visibly displaced by the sponge; water droplets and suds appear",
+       "camera": "Continuing slow push-in, maintaining focus on the cleaning action"
+     },
+     {
+       "segment_index": 2,
+       "time_range": "0:05-0:07",
+       "description": "Final cleaning passes and the arm lifts away to reveal the clean plate",
+       "key_changes": "Plate transitions from mostly clean to fully clean; arm lifts sponge and retracts slightly upward",
+       "camera": "Push-in completes; camera holds steady on the clean plate"
+     }
+   ],
+   "transitions": [],
+   "temporal_caption": "The video opens with a robotic arm positioned above a dirty white plate on a kitchen countertop. In the first two seconds, the arm lowers its sponge-equipped gripper onto the plate and begins a sweeping motion from center to edge. From seconds two through five, the arm performs methodical circular wiping motions, its wrist joint rotating smoothly as dried food and sauce stains are progressively removed, revealing clean white ceramic beneath. Small water droplets and light suds form on the plate surface. In the final two seconds, the arm completes its last passes across the now-clean plate and lifts the sponge upward, pausing briefly as if inspecting its work, with the gleaming clean plate fully visible below.",
+   "resolution": {
+     "W": 1280,
+     "H": 720
+   },
+   "aspect_ratio": "16,9",
+   "duration": "7s",
+   "fps": 24
+ }
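In this prompt the `segments` entries are numbered from 0 and their `time_range` values run back to back from `0:00` to the end of the `7s` duration. The sketch below checks that property. It is not part of the commit, and it assumes timestamps are always `M:SS` and that contiguous coverage is actually required, which the diff does not state. The names `to_seconds` and `check_segments` are hypothetical.

```python
def to_seconds(stamp: str) -> int:
    """Convert an "M:SS" timestamp to whole seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def check_segments(prompt: dict) -> None:
    """Raise ValueError unless segments tile [0, duration] without gaps or overlaps."""
    expected_start = 0
    for index, segment in enumerate(prompt["segments"]):
        if segment["segment_index"] != index:
            raise ValueError(f"segment {index} has an unexpected segment_index")
        start, end = (to_seconds(s) for s in segment["time_range"].split("-"))
        if start != expected_start or end <= start:
            raise ValueError(f"segment {index} leaves a gap or overlap")
        expected_start = end
    # Assumption: duration looks like "7s" (integer seconds).
    if expected_start != int(prompt["duration"].rstrip("s")):
        raise ValueError("segments do not cover the full duration")


# Trimmed copy of the prompt above: only the fields the check reads.
prompt = {
    "segments": [
        {"segment_index": 0, "time_range": "0:00-0:02"},
        {"segment_index": 1, "time_range": "0:02-0:05"},
        {"segment_index": 2, "time_range": "0:05-0:07"},
    ],
    "duration": "7s",
}
check_segments(prompt)  # passes silently for the values in this file
```

In practice the dictionary would come from `json.load` on the prompt file rather than being typed by hand.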
assets/example_t2v_prompt_short.txt ADDED
@@ -0,0 +1 @@
+ A robot arm is cleaning a plate in the kitchen
assets/example_t2vs_output.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:929bb8ea9a4da34da334c9cedfd696bb89a80b9f018ce732a586143be45331f0
+ size 4407722
assets/example_t2vs_prompt.json ADDED
@@ -0,0 +1,136 @@
+ {
+   "subjects": [
+     {
+       "description": "A sleek industrial robot arm with a silver and dark gray metallic finish, featuring multiple articulated joints and a gripper end-effector that holds a clear glass jar filled with water.",
+       "appearance_details": "The robot arm has visible servo motors at each joint, subtle branding embossed on the upper arm segment, black rubber gaskets at joint connections, and a precision two-finger parallel gripper clamping the jar securely.",
+       "relationship": "Primary actor performing the pouring action; positioned above and beside the cup.",
+       "location": "Center-right of frame",
+       "relative_size": "Large within frame",
+       "orientation": "Angled toward the left side of frame, gripper tilted downward toward the cup",
+       "pose": "Extended arm with elbow joint bent, wrist rotated to tilt the jar at a pouring angle",
+       "action": "Pouring water from a glass jar into a cup",
+       "state_changes": "Begins in a neutral upright hold, then gradually tilts the jar to pour, and returns slightly upright as the pour finishes.",
+       "clothing": "",
+       "expression": "",
+       "gender": "",
+       "age": "",
+       "skin_tone_and_texture": "",
+       "facial_features": "",
+       "number_of_subjects": 1,
+       "number_of_arms": 0,
+       "number_of_legs": 0
+     },
+     {
+       "description": "A clear glass jar approximately three-quarters full of water, with a wide mouth opening, held by the robot arm's gripper.",
+       "appearance_details": "Transparent glass with slight green tint, smooth cylindrical body, no label, water visible sloshing gently inside as the jar tilts.",
+       "relationship": "Held by the robot arm; source of the water being poured.",
+       "location": "Center-right, elevated above the cup",
+       "relative_size": "Medium within frame",
+       "orientation": "Tilting progressively toward the left as the pour occurs",
+       "pose": "Held at an angle by the gripper",
+       "action": "Being tilted to pour water",
+       "state_changes": "Water level decreases as the pour progresses; jar tilts from near-vertical to approximately 45 degrees.",
+       "clothing": "",
+       "expression": "",
+       "gender": "",
+       "age": "",
+       "skin_tone_and_texture": "",
+       "facial_features": "",
+       "number_of_subjects": 1,
+       "number_of_arms": 0,
+       "number_of_legs": 0
+     },
+     {
+       "description": "A white ceramic cup sitting on a flat surface, positioned to receive the poured water.",
+       "appearance_details": "Simple cylindrical mug shape with a small handle on the right side, matte white glaze, clean and empty at the start.",
+       "relationship": "Receiving vessel for the water being poured from the jar.",
+       "location": "Center-left foreground, on the table surface",
+       "relative_size": "Small within frame",
+       "orientation": "Upright, opening facing upward",
+       "pose": "Stationary on the table",
+       "action": "Receiving water",
+       "state_changes": "Gradually fills with water as the pour continues.",
+       "clothing": "",
+       "expression": "",
+       "gender": "",
+       "age": "",
+       "skin_tone_and_texture": "",
+       "facial_features": "",
+       "number_of_subjects": 1,
+       "number_of_arms": 0,
+       "number_of_legs": 0
+     }
+   ],
+   "background_setting": "A clean, minimalist laboratory or workshop environment with a plain light gray wall in the background and a smooth white tabletop surface. The space is uncluttered, with no other objects or distractions visible, contributing to a very quiet and sterile atmosphere. The edges of the table are just visible at the bottom of the frame.",
+   "lighting": {
+     "conditions": "Soft, even studio lighting with minimal harsh highlights, creating a clean and controlled look.",
+     "direction": "Front-lit with slight top-down component, providing even illumination across the robot arm and objects.",
+     "shadows": "Soft, diffused shadows beneath the cup and the robot arm's base, with gentle shadow on the table surface.",
+     "illumination_effect": "The lighting emphasizes the metallic sheen of the robot arm and the transparency of the water and glass jar, giving the scene a polished, technical demonstration feel."
+   },
+   "aesthetics": {
+     "composition": "The robot arm dominates the right half of the frame while the cup sits in the left-center foreground, creating a clear visual flow from right to left following the pouring action. Negative space above and behind keeps focus on the action.",
+     "color_scheme": "Neutral palette dominated by silver, gray, and white with the clear blue-tinted transparency of water providing subtle color contrast.",
+     "mood_atmosphere": "Calm, precise, clinical, quietly impressive",
+     "patterns": ""
+   },
+   "cinematography": {
+     "camera_motion": "Static",
+     "framing": "Medium shot",
+     "camera_angle": "Eye-level, slightly elevated",
+     "depth_of_field": "Shallow",
+     "focus": "Sharp focus on the pouring point where water exits the jar and enters the cup",
+     "lens_focal_length": "50mm equivalent"
+   },
+   "style_medium": "Live-action video",
+   "artistic_style": "Realistic, clean technical demonstration",
+   "context": "A robotics demonstration showcasing a robot arm's precision and dexterity in performing a delicate pouring task.",
+   "actions": [
+     {
+       "time": "0:00-0:02",
+       "description": "The robot arm holds the jar upright in a steady position above the cup, making small preparatory adjustments to align the jar's mouth with the cup opening."
+     },
+     {
+       "time": "0:02-0:05",
+       "description": "The robot arm smoothly tilts the jar, and a steady stream of clear water flows from the jar into the white ceramic cup below. The water stream is smooth and controlled."
+     },
+     {
+       "time": "0:05-0:07",
+       "description": "The robot arm gradually returns the jar to a more upright position, the water stream thins and stops, and the cup is now partially filled with water. The arm holds still in its final position."
+     }
+   ],
+   "text_and_signage_elements": [],
+   "segments": [
+     {
+       "segment_index": 0,
+       "time_range": "0:00-0:02",
+       "description": "Opening shot establishes the robot arm holding the jar above the cup. The arm makes slight positional adjustments.",
+       "key_changes": "Minor wrist rotation as the arm aligns the jar with the cup.",
+       "camera": "Static, medium shot at eye level."
+     },
+     {
+       "segment_index": 1,
+       "time_range": "0:02-0:05",
+       "description": "The main pouring action occurs as the robot arm tilts the jar and water flows in a controlled stream into the cup.",
+       "key_changes": "Jar tilts from near-vertical to approximately 45 degrees; water stream begins and maintains a steady flow; cup fills progressively.",
+       "camera": "Static, maintaining focus on the pouring point."
+     },
+     {
+       "segment_index": 2,
+       "time_range": "0:05-0:07",
+       "description": "The pour concludes as the arm returns the jar upright. The water stream tapers off and the scene settles into stillness.",
+       "key_changes": "Jar returns toward vertical; water stream narrows and ceases; cup now holds water; arm stabilizes.",
+       "camera": "Static, same framing held throughout."
+     }
+   ],
+   "transitions": [],
+   "temporal_caption": "The video opens with a silver robotic arm holding a clear glass jar of water above a white ceramic cup on a clean white table against a gray background. During the first two seconds, the arm makes subtle alignment adjustments. At around two seconds, the wrist joint rotates and the jar begins to tilt, releasing a smooth, steady stream of clear water that arcs downward into the cup. The pouring continues for about three seconds, with the water level in the jar visibly decreasing and the cup gradually filling. Around the five-second mark, the arm begins to level the jar back upright, the stream of water thins to a trickle, then stops entirely. For the remaining two seconds, the robot arm holds the jar still in a slightly tilted resting position, and the filled cup sits motionless on the table.",
+   "audio_description": "The scene is very quiet. The dominant sound is the gentle splashing and trickling of water as it pours from the jar into the cup, starting softly and becoming slightly louder as the cup fills. There is a faint mechanical whir from the robot arm's servo motors during movement. No music, no speech, and minimal ambient noise, emphasizing the tranquility of the environment.",
+   "resolution": {
+     "H": 720,
+     "W": 1280
+   },
+   "aspect_ratio": "16,9",
+   "duration": "7s",
+   "fps": 24
+ }
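Compared with `assets/example_t2v_prompt.json`, this file has the same top-level keys plus one extra, `audio_description`. The sketch below shows one way to surface that difference. It is not part of the commit: the key lists were copied by hand from the two files in this diff, and the helper name `top_level_key_diff` is hypothetical.

```python
def top_level_key_diff(left: dict, right: dict):
    """Return (keys only in left, keys only in right)."""
    return set(left) - set(right), set(right) - set(left)


# Top-level keys common to both prompt files in this diff.
shared = [
    "subjects", "background_setting", "lighting", "aesthetics",
    "cinematography", "style_medium", "artistic_style", "context",
    "actions", "text_and_signage_elements", "segments", "transitions",
    "temporal_caption", "resolution", "aspect_ratio", "duration", "fps",
]
# Values are placeholders; only the key names matter for this comparison.
t2v_prompt = dict.fromkeys(shared)
t2vs_prompt = dict.fromkeys(shared + ["audio_description"])

only_t2v, only_t2vs = top_level_key_diff(t2v_prompt, t2vs_prompt)
print(only_t2v, only_t2vs)
```

With real files, the two dictionaries would be loaded with `json.load` instead of being built from a key list.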
assets/negative_prompt.json ADDED
@@ -0,0 +1,108 @@
+ {
+   "subjects": [
+     {
+       "description": "Blurry, poorly defined subjects with inconsistent shapes and unrealistic proportions.",
+       "appearance_details": "Distorted features, visible compression artifacts, muddy textures lacking fine detail, color bleeding between elements, and unnatural skin tones or surface textures that appear artificial or computer-generated.",
+       "relationship": "Subjects appear disconnected from the environment, floating or improperly grounded in the scene without proper occlusion or spatial coherence.",
+       "location": "Subjects are poorly placed within the frame, appearing at awkward positions that violate basic compositional rules.",
+       "relative_size": "Inconsistent scale relationships between subjects and the environment, with objects appearing too large or too small relative to their surroundings.",
+       "orientation": "Unnatural orientations that defy physics and spatial logic.",
+       "pose": "Stiff, mannequin-like poses with unnatural joint angles and impossible limb positions that look computer-generated.",
+       "action": "Incoherent motion with visible frame-to-frame discontinuities. Movement appears as a slideshow rather than smooth animation. Limbs and appendages pop between positions without interpolation.",
+       "state_changes": "Visual state transitions are abrupt and jarring. Colors shift without motivation. Surface textures flicker between different materials randomly. Outlines shimmer and vibrate.",
+       "clothing": "Clothing appears painted on with no sense of material weight or drape. Fabric textures are flat and repeat visibly.",
+       "expression": "Frozen, uncanny valley expressions or expressions that change abruptly without natural transition.",
+       "gender": "",
+       "age": "",
+       "skin_tone_and_texture": "Waxy, plastic-looking skin with visible artifacts and inconsistent texture resolution across the frame.",
+       "facial_features": "Asymmetric facial features, extra fingers or limbs, teeth that appear blurry or malformed.",
+       "number_of_subjects": 0,
+       "number_of_arms": 0,
+       "number_of_legs": 0
+     },
+     {
+       "description": "Extremely low-quality subjects with visible rendering artifacts, broken mesh geometry, and completely unrealistic proportions throughout.",
+       "appearance_details": "Distorted features, visible compression artifacts, muddy textures lacking fine detail, color bleeding between elements, and unnatural skin tones or surface textures that appear artificial or computer-generated.",
+       "relationship": "Subjects appear disconnected from the environment, floating or improperly grounded in the scene without proper occlusion or spatial coherence.",
+       "location": "Subjects are poorly placed within the frame, appearing at awkward positions that violate basic compositional rules.",
+       "relative_size": "Inconsistent scale relationships between subjects and the environment, with objects appearing too large or too small relative to their surroundings.",
+       "orientation": "Unnatural orientations that defy physics and spatial logic.",
+       "pose": "Stiff, mannequin-like poses with unnatural joint angles and impossible limb positions that look computer-generated.",
+       "action": "Incoherent motion with visible frame-to-frame discontinuities. Movement appears as a slideshow rather than smooth animation. Limbs and appendages pop between positions without interpolation.",
+       "state_changes": "Visual state transitions are abrupt and jarring. Colors shift without motivation. Surface textures flicker between different materials randomly. Outlines shimmer and vibrate.",
+       "clothing": "Clothing appears painted on with no sense of material weight or drape. Fabric textures are flat and repeat visibly.",
+       "expression": "Frozen, uncanny valley expressions or expressions that change abruptly without natural transition.",
+       "gender": "",
+       "age": "",
+       "skin_tone_and_texture": "Waxy, plastic-looking skin with visible artifacts and inconsistent texture resolution across the frame.",
+       "facial_features": "Asymmetric facial features, extra fingers or limbs, teeth that appear blurry or malformed.",
+       "number_of_subjects": 0,
+       "number_of_arms": 0,
+       "number_of_legs": 0
+     },
+     {
+       "description": "Poorly generated subjects exhibiting all hallmarks of failed neural rendering \u2014 flickering edges, inconsistent depth, and uncanny spatial relationships.",
+       "appearance_details": "Distorted features, visible compression artifacts, muddy textures lacking fine detail, color bleeding between elements, and unnatural skin tones or surface textures that appear artificial or computer-generated.",
+       "relationship": "Subjects appear disconnected from the environment, floating or improperly grounded in the scene without proper occlusion or spatial coherence.",
+       "location": "Subjects are poorly placed within the frame, appearing at awkward positions that violate basic compositional rules.",
+       "relative_size": "Inconsistent scale relationships between subjects and the environment, with objects appearing too large or too small relative to their surroundings.",
+       "orientation": "Unnatural orientations that defy physics and spatial logic.",
+       "pose": "Stiff, mannequin-like poses with unnatural joint angles and impossible limb positions that look computer-generated.",
+       "action": "Incoherent motion with visible frame-to-frame discontinuities. Movement appears as a slideshow rather than smooth animation. Limbs and appendages pop between positions without interpolation.",
+       "state_changes": "Visual state transitions are abrupt and jarring. Colors shift without motivation. Surface textures flicker between different materials randomly. Outlines shimmer and vibrate.",
+       "clothing": "Clothing appears painted on with no sense of material weight or drape. Fabric textures are flat and repeat visibly.",
+       "expression": "Frozen, uncanny valley expressions or expressions that change abruptly without natural transition.",
+       "gender": "",
+       "age": "",
+       "skin_tone_and_texture": "Waxy, plastic-looking skin with visible artifacts and inconsistent texture resolution across the frame.",
+       "facial_features": "Asymmetric facial features, extra fingers or limbs, teeth that appear blurry or malformed.",
+       "number_of_subjects": 0,
+       "number_of_arms": 0,
+       "number_of_legs": 0
+     }
+   ],
+   "background_setting": "A poorly rendered, flat background with visible seams, repeated textures, and inconsistent depth cues. The environment lacks volumetric depth and appears as a painted backdrop rather than a three-dimensional space. Vegetation looks like flat cutouts with no volumetric depth. The background appears to have been composited from multiple source materials at different resolutions, creating visible seams and edge artifacts where elements meet. Textures swim and shift across surfaces in a way that breaks the illusion of solidity \u2014 patterns drift laterally rather than staying anchored to the geometry they belong to. Background elements flicker in and out of existence between frames, particularly at the edges of the field of view. The rendering resolution is visibly lower for distant elements, creating a jarring transition between near and far objects. Cloud textures repeat obviously in the sky with visible tiling. Water surfaces lack proper reflection and refraction, appearing as flat animated textures. Fog and atmospheric effects pop in and out rather than smoothly transitioning. Trees and vegetation exhibit obvious LOD (level-of-detail) switching. Building facades have inconsistent window spacing and pattern repetition. The overall scene feels like a poorly assembled collage of individually rendered elements rather than a coherent whole.",
+   "lighting": {
+     "conditions": "Harsh, flat lighting with no natural variation. The scene appears uniformly lit as if by a single overhead fluorescent light, removing all sense of depth and atmosphere.",
+     "direction": "Inconsistent light sources \u2014 shadows point in multiple contradictory directions, breaking physical plausibility.",
+     "shadows": "Hard-edged, unrealistic shadows that pop in and out of existence between frames. Some objects cast no shadows while others have impossibly dark ones that don't animate smoothly with the object's motion. Shadow edges exhibit visible staircase aliasing artifacts. Shadow maps appear to have been rendered at extremely low resolution, creating blocky patterns. Self-shadowing on characters shows visible peter-panning artifacts where shadows detach from their source. Contact shadows between objects and the ground appear and disappear as objects move slightly. Shadow color is pure black with no ambient contribution, creating an unnaturally harsh contrast that flattens the image. Multiple shadow cascades have visible boundaries where resolution changes. The shadow rendering appears to be temporally unstable \u2014 even static objects have shadows that shimmer and crawl frame to frame, breaking the illusion of a stable light source.",
+     "illumination_effect": "No bounce light, no ambient occlusion, no subtle color interactions between surfaces. The scene looks like a poorly lit 3D render from the early 2000s."
+   },
+   "aesthetics": {
+     "composition": "Cluttered, poorly framed composition with no clear focal point. Important elements are cut off by the frame edges. The rule of thirds is completely ignored, leading to an unbalanced and visually unpleasant arrangement.",
+     "color_scheme": "Oversaturated, garish colors that clash violently. Color banding is visible in gradient areas. The overall palette feels artificial and digitally processed rather than natural.",
+     "mood_atmosphere": "Unsettling, uncanny atmosphere that fails to evoke any intended emotional response. The scene feels lifeless and sterile despite attempting to portray dynamic action.",
+     "patterns": "Visible tiling artifacts in textures, moir\u00e9 patterns, and aliasing on edges."
+   },
+   "cinematography": {
+     "camera_motion": "Extremely shaky, unstable camera with visible rolling shutter artifacts. The motion is jerky and discontinuous, causing motion sickness and making the scene impossible to follow.",
+     "framing": "Poorly framed shots that cut off important elements and include unnecessary empty space.",
+     "camera_angle": "Awkward, disorienting camera angles that provide no useful spatial information about the scene. The camera path exhibits visible mathematical artifacts suggesting simple interpolation between keyframes rather than natural camera operation. Camera motion is completely disconnected from the scene content \u2014 panning away from action, dollying during dialogue, and shaking during still moments. The camera appears to pass through solid objects occasionally. Zoom is applied digitally rather than optically, revealing progressively worse resolution. Camera motion exhibits non-physical acceleration profiles \u2014 instant starts and stops rather than smooth ease-in/ease-out. Rolling shutter simulation is applied inconsistently, present in some frames but not others. The camera occasionally exhibits impossible motion like teleporting between positions. Virtual camera stabilization creates an uncanny floating sensation disconnected from any physical camera rig.",
+     "depth_of_field": "Uniform focus throughout, creating a flat, documentary-like appearance with no cinematic depth separation.",
+     "focus": "Soft, out-of-focus imagery with visible chromatic aberration and lens distortion that was not corrected in post-processing.",
+     "lens_focal_length": "Inappropriate focal length causing barrel distortion and unnatural perspective compression."
+   },
+   "style_medium": "Low quality compressed digital video with visible encoding artifacts",
+   "artistic_style": "Amateur, unpolished with inconsistent visual style",
+   "context": "A poorly produced video with numerous technical and artistic flaws that detract from any intended narrative or visual impact.",
+   "actions": [
+     {
+       "time": "0:00-0:08",
+       "description": "Subjects attempt to move but their motion is jerky, temporally inconsistent, and physically implausible. Background elements flicker and shift between frames."
+     }
+   ],
+   "text_and_signage_elements": [],
+   "segments": [
+     {
+       "segment_index": 0,
+       "time_range": "0:00-0:08",
+       "description": "A single continuous shot suffering from severe temporal inconsistencies \u2014 subjects that morph and deform between frames, backgrounds that shift and wobble, and rendering quality that fluctuates visibly over time. Motion blur is applied incorrectly, smearing in directions that don't match actual movement. Frame-to-frame coherence breaks down with individual pixels changing color randomly in flat areas. Texture detail level fluctuates between frames as if the rendering budget varied shot to shot. Color grading drifts over the duration with no creative motivation. Noise patterns change between frames in ways that draw attention rather than being invisible. Overall visual quality degrades progressively from start to finish.",
+       "key_changes": "No meaningful progression or narrative development. Visual quality degrades over time.",
+       "camera": "Unstable, poorly controlled camera work with visible mathematical interpolation artifacts."
+     }
+   ],
+   "transitions": [],
+   "temporal_caption": "The scene opens at 0.0 seconds with a poorly rendered establishing shot that immediately reveals low production quality. At 1.0 seconds, subjects begin to move but their motion is jerky and inconsistent, with limbs bending at unnatural angles and objects clipping through each other. From 2.0 to 4.0 seconds, the camera shakes violently while the scene exhibits visible compression artifacts, color banding in the sky, and flickering in the shadows. Between 4.0 and 6.0 seconds, temporal coherence breaks down as elements appear and disappear between frames, textures swim and morph unnaturally, and the lighting shifts abruptly without physical cause. In the final 2 seconds, the overall visual quality deteriorates further with increasing noise, blur, and a general loss of spatial coherence that makes the scene nearly unwatchable. Additionally, the frame rate appears inconsistent with visible judder and stuttering throughout. Color temperature shifts randomly between warm and cool tones with no motivation. The encode quality degrades in complex regions showing macro-blocking and mosquito noise around moving edges. Temporal noise patterns are spatially correlated, creating swimming artifacts on flat surfaces.",
+   "audio_description": "",
+   "physical_realism": "No adherence to physical laws. Objects defy gravity, pass through solid surfaces, and change mass and momentum without cause. Fluid dynamics, cloth simulation, and rigid body physics are all fundamentally broken. Furthermore, conservation of energy is violated as objects gain or lose kinetic energy spontaneously. Elastic collisions produce inelastic results and vice versa. Surface friction is inconsistent \u2014 objects slide on rough surfaces while sticking to smooth ones. Air resistance appears to affect only some objects while others move through the atmosphere unimpeded."
108
+ }
chat_template.json ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {%- if messages[0].content is string %}\n {{- messages[0].content }}\n {%- else %}\n {%- for content in messages[0].content %}\n {%- if 'text' in content %}\n {{- content.text }}\n {%- endif %}\n {%- endfor %}\n {%- endif %}\n {{- '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].content is string %}\n {{- messages[0].content }}\n {%- else %}\n {%- for content in messages[0].content %}\n {%- if 'text' in content %}\n {{- content.text }}\n {%- endif %}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set image_count = namespace(value=0) %}\n{%- set video_count = namespace(value=0) %}\n{%- for message in messages %}\n {%- if message.role == \"user\" %}\n {{- '<|im_start|>' + message.role + '\\n' }}\n {%- if message.content is string %}\n {{- message.content }}\n {%- else %}\n {%- for content in message.content %}\n {%- if content.type == 'image' or 'image' in content or 'image_url' in content %}\n {%- set image_count.value = image_count.value + 1 %}\n {%- if add_vision_id %}Picture {{ image_count.value }}: {% endif -%}\n <|vision_start|><|image_pad|><|vision_end|>\n {%- elif content.type == 'video' or 'video' in content %}\n {%- set video_count.value = video_count.value + 1 %}\n {%- if add_vision_id %}Video {{ video_count.value }}: {% endif -%}\n <|vision_start|><|video_pad|><|vision_end|>\n {%- elif 'text' in content %}\n {{- content.text }}\n {%- endif %}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role + '\\n' }}\n {%- if message.content is string %}\n {{- message.content }}\n {%- else %}\n {%- for content_item in message.content %}\n {%- if 'text' in content_item %}\n {{- content_item.text }}\n {%- endif %}\n {%- endfor %}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and message.content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {%- if message.content is string %}\n {{- message.content }}\n {%- else %}\n {%- for content in message.content %}\n {%- if content.type == 'image' or 'image' in content or 'image_url' in content %}\n {%- set image_count.value = image_count.value + 1 %}\n {%- if add_vision_id %}Picture {{ image_count.value }}: {% endif -%}\n <|vision_start|><|image_pad|><|vision_end|>\n {%- elif content.type == 'video' or 'video' in content %}\n {%- set video_count.value = video_count.value + 1 %}\n {%- if add_vision_id %}Video {{ video_count.value }}: {% endif -%}\n <|vision_start|><|video_pad|><|vision_end|>\n {%- elif 'text' in content %}\n {{- content.text }}\n {%- endif %}\n {%- endfor %}\n {%- endif %}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n"
3
+ }
checkpoint.json ADDED
@@ -0,0 +1 @@
1
+ {}
config.json ADDED
@@ -0,0 +1,260 @@
1
+ {
2
+ "allow_patterns_overrides": [
3
+ "*/*.safetensors"
4
+ ],
5
+ "architectures": [
6
+ "Cosmos3ForConditionalGeneration"
7
+ ],
8
+ "image_token_id": 151655,
9
+ "model": {
10
+ "_recursive_": false,
11
+ "_target": "omni_mot_model",
12
+ "config": {
13
+ "_type": "omni_mot_model_config",
14
+ "action_gen": true,
15
+ "activation_checkpointing": {
16
+ "_type": "activation_checkpointing_config",
17
+ "determinism_check": "default",
18
+ "mode": "full",
19
+ "preserve_rng_state": true,
20
+ "save_ops_regex": [
21
+ "fmha"
22
+ ]
23
+ },
24
+ "causal_training_strategy": "none",
25
+ "diffusion_expert_config": {
26
+ "_type": "diffusion_expert_config",
27
+ "base_fps": 24,
28
+ "enable_fps_modulation": true,
29
+ "load_weights_from_pretrained": false,
30
+ "max_vae_latent_side_after_patchify": 20,
31
+ "patch_spatial": 2,
32
+ "position_embedding_type": "unified_3d_mrope",
33
+ "rope_h_extrapolation_ratio": 1.0,
34
+ "rope_t_extrapolation_ratio": 1.0,
35
+ "rope_w_extrapolation_ratio": 1.0,
36
+ "timestep_range": 1.0,
37
+ "unified_3d_mrope_reset_spatial_ids": true,
38
+ "unified_3d_mrope_temporal_modality_margin": 15000
39
+ },
40
+ "ema": {
41
+ "_type": "ema_config",
42
+ "enabled": false,
43
+ "iteration_shift": 0,
44
+ "rate": 0.1
45
+ },
46
+ "fixed_step_sampler_config": null,
47
+ "input_caption_key": "ai_caption",
48
+ "input_image_key": "images",
49
+ "input_video_key": "video",
50
+ "joint_attn_implementation": "two_way",
51
+ "latent_downsample_factor": 16,
52
+ "lbl": {
53
+ "_type": "lbl_config",
54
+ "coeff_gen": null,
55
+ "coeff_und": null,
56
+ "method": "local"
57
+ },
58
+ "log_enc_time_every_n": 100,
59
+ "lora_alpha": 32,
60
+ "lora_enabled": false,
61
+ "lora_rank": 16,
62
+ "lora_target_modules": "q_proj_moe_gen,k_proj_moe_gen,v_proj_moe_gen,o_proj_moe_gen",
63
+ "max_action_dim": 64,
64
+ "max_num_tokens_after_packing": 74000,
65
+ "natten_parameter_list": null,
66
+ "net": null,
67
+ "num_embodiment_domains": 32,
68
+ "parallelism": {
69
+ "_type": "parallelism_config",
70
+ "cfg_parallel_shard_degree": 1,
71
+ "compile_dynamic": true,
72
+ "compiled_region": "language",
73
+ "context_parallel_shard_degree": 1,
74
+ "coordinate_descent_tuning": false,
75
+ "data_parallel_replicate_degree": 1,
76
+ "data_parallel_shard_degree": 128,
77
+ "enable_inference_mode": false,
78
+ "max_autotune_pointwise": false,
79
+ "precision": "bfloat16",
80
+ "use_cuda_graphs": false,
81
+ "use_torch_compile": true
82
+ },
83
+ "rectified_flow_inference_config": {
84
+ "_type": "rectified_flow_inference_config",
85
+ "num_train_timesteps": 1000,
86
+ "scheduler_type": "unipc",
87
+ "shift": 1,
88
+ "use_dynamic_shifting": false
89
+ },
90
+ "rectified_flow_training_config": {
91
+ "_type": "rectified_flow_training_config",
92
+ "action_loss_weight": 10.0,
93
+ "high_sigma_ratio": 0.05,
94
+ "high_sigma_timesteps_max": 1000,
95
+ "high_sigma_timesteps_min": 995,
96
+ "image_loss_scale": null,
97
+ "independent_action_schedule": false,
98
+ "independent_sound_schedule": false,
99
+ "loss_scale": 10.0,
100
+ "normalize_loss_by_active": false,
101
+ "shift": {
102
+ "256": 3,
103
+ "480": 5,
104
+ "720": 10
105
+ },
106
+ "shift_action": null,
107
+ "shift_sound": null,
108
+ "sound_loss_scale": 2.0,
109
+ "train_time_action_distribution": "logitnormal",
110
+ "train_time_image_distribution": "logitnormal",
111
+ "train_time_sound_distribution": "logitnormal",
112
+ "train_time_video_distribution": "waver",
113
+ "train_time_weight": "uniform",
114
+ "use_discrete_rf": false,
115
+ "use_dynamic_shift": false,
116
+ "use_high_sigma_strategy": false,
117
+ "use_high_sigma_strategy_action": false,
118
+ "use_high_sigma_strategy_sound": false
119
+ },
120
+ "resolution": "720",
121
+ "sound_dim": 64,
122
+ "sound_gen": true,
123
+ "sound_latent_fps": 25,
124
+ "sound_tokenizer": {
125
+ "_target": "avae_interface",
126
+ "audio_channels": 2,
127
+ "avae_config_path": "",
128
+ "avae_path": "pretrained/tokenizers/audio/avae/avae_48k_noncausal_25hz_64ch.ckpt",
129
+ "bucket_name": "bucket",
130
+ "hop_size": 1920,
131
+ "io_channels": 64,
132
+ "latent_mean": null,
133
+ "latent_std": null,
134
+ "normalization_type": "none",
135
+ "normalize_latents": false,
136
+ "object_store_credential_path_pretrained": "credentials/gcp_training.secret",
137
+ "sample_rate": 48000,
138
+ "tanh_clamp": 0.995,
139
+ "tanh_input_scale": 1.5,
140
+ "tanh_output_scale": 3.5
141
+ },
142
+ "state_ch": 48,
143
+ "state_t": 300,
144
+ "tokenizer": {
145
+ "_target": "wan2pt2_vae_interface",
146
+ "bucket_name": "bucket",
147
+ "chunk_duration": 93,
148
+ "encode_bucket_multiple": null,
149
+ "encode_chunk_frames": {
150
+ "256": 68,
151
+ "480": 24,
152
+ "720": 12
153
+ },
154
+ "encode_exact_durations": [
155
+ 17,
156
+ 61,
157
+ 73
158
+ ],
159
+ "keep_decoder_cache": false,
160
+ "object_store_credential_path_pretrained": "credentials/gcp_training.secret",
161
+ "spatial_compression_factor": 16,
162
+ "temporal_compression_factor": 4,
163
+ "temporal_window": null,
164
+ "use_streaming_encode": false,
165
+ "vae_path": "pretrained/tokenizers/video/wan2pt2/Wan2.2_VAE.pth"
166
+ },
167
+ "video_temporal_causal": false,
168
+ "vision_gen": true,
169
+ "vlm_config": {
170
+ "_type": "vlm_config",
171
+ "layer_module": null,
172
+ "model_instance": {
173
+ "_target": "qwen3_vl_text_for_causal_lm",
174
+ "config": {
175
+ "_target": "create_vlm_config",
176
+ "base_config": {
177
+ "_target": "qwen3_vl_mot_config_from_json_file",
178
+ "json_file": "cosmos3://vfm/models/vlm/qwen3_vl/configs/Qwen3-VL-32B-Instruct.json"
179
+ },
180
+ "qk_norm_for_text": true
181
+ }
182
+ },
183
+ "model_name": "nvidia/Cosmos3-Super-Reasoner",
184
+ "pretrained_weights": {
185
+ "_type": "pretrained_weights_config",
186
+ "backbone_path": "s3://bucket/cosmos3/pretrained/huggingface/Cosmos-Reason/Cosmos3-Super-Reasoner-b6df0d1/",
187
+ "checkpoint_format": null,
188
+ "credentials_path": "credentials/gcp_checkpoint.secret",
189
+ "enable_gcs_patch_in_boto3": true,
190
+ "enabled": false
191
+ },
192
+ "qk_norm": false,
193
+ "tie_word_embeddings": false,
194
+ "tokenizer": {
195
+ "_target": "create_qwen2_tokenizer_with_download",
196
+ "config_variant": "gcp",
197
+ "pretrained_model_name": "Qwen/Qwen3-VL-32B-Instruct"
198
+ },
199
+ "use_system_prompt": false
200
+ }
201
+ }
202
+ },
203
+ "model_type": "cosmos3_omni",
204
+ "text_config": {
205
+ "attention_bias": false,
206
+ "attention_dropout": 0.0,
207
+ "bos_token_id": 151643,
208
+ "dtype": "bfloat16",
209
+ "eos_token_id": 151645,
210
+ "head_dim": 128,
211
+ "hidden_act": "silu",
212
+ "hidden_size": 5120,
213
+ "initializer_range": 0.02,
214
+ "intermediate_size": 25600,
215
+ "max_position_embeddings": 262144,
216
+ "model_type": "qwen3_vl_text",
217
+ "num_attention_heads": 64,
218
+ "num_hidden_layers": 64,
219
+ "num_key_value_heads": 8,
220
+ "rms_norm_eps": 1e-06,
221
+ "rope_scaling": {
222
+ "mrope_interleaved": true,
223
+ "mrope_section": [
224
+ 24,
225
+ 20,
226
+ 20
227
+ ],
228
+ "rope_type": "default"
229
+ },
230
+ "rope_theta": 5000000,
231
+ "use_cache": true,
232
+ "vocab_size": 151936
233
+ },
234
+ "tie_word_embeddings": false,
235
+ "transformers_version": "4.57.0.dev0",
236
+ "video_token_id": 151656,
237
+ "vision_config": {
238
+ "deepstack_visual_indexes": [
239
+ 8,
240
+ 16,
241
+ 24
242
+ ],
243
+ "depth": 27,
244
+ "hidden_act": "gelu_pytorch_tanh",
245
+ "hidden_size": 1152,
246
+ "in_channels": 3,
247
+ "initializer_range": 0.02,
248
+ "intermediate_size": 4304,
249
+ "model_type": "qwen3_vl",
250
+ "num_heads": 16,
251
+ "num_position_embeddings": 2304,
252
+ "out_hidden_size": 5120,
253
+ "patch_size": 16,
254
+ "spatial_merge_size": 2,
255
+ "temporal_patch_size": 2
256
+ },
257
+ "vision_end_token_id": 151653,
258
+ "vision_start_token_id": 151652
259
+ }
260
+
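Note on the `text_config` block above: the attention dimensions do not follow the common `hidden_size = num_attention_heads * head_dim` pattern, which is easy to misread. The sketch below only re-derives numbers from values already present in `config.json`. The interpretation of `mrope_section` as counts of rotary channel pairs (summing to half of `head_dim`) is an assumption based on the usual multimodal RoPE convention, not something the diff states.

```python
# Values copied from the text_config block of config.json above.
text_config = {
    "hidden_size": 5120,
    "head_dim": 128,
    "num_attention_heads": 64,
    "num_key_value_heads": 8,
    "mrope_section": [24, 20, 20],
}

# Width of the concatenated query heads; it need not equal hidden_size.
q_width = text_config["num_attention_heads"] * text_config["head_dim"]

# Width of the concatenated key (or value) heads under grouped-query attention.
kv_width = text_config["num_key_value_heads"] * text_config["head_dim"]

# Number of query heads that share a single key/value head.
gqa_group = text_config["num_attention_heads"] // text_config["num_key_value_heads"]

# Assumption: each mrope_section entry counts rotary channel pairs,
# so the entries should sum to head_dim / 2.
mrope_total = sum(text_config["mrope_section"])

print(q_width, kv_width, gqa_group, mrope_total)
```

Under these values the query projection is wider (8192) than the residual stream (5120), and each key/value head serves 8 query heads.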
generation_config.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "bos_token_id": 151643,
3
+ "pad_token_id": 151643,
4
+ "do_sample": true,
5
+ "eos_token_id": [
6
+ 151645,
7
+ 151643
8
+ ],
9
+ "top_p": 0.8,
10
+ "top_k": 20,
11
+ "temperature": 0.7,
12
+ "repetition_penalty": 1.0,
13
+ "transformers_version": "4.56.0"
14
+ }
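Note on `generation_config.json` above: with `do_sample` enabled, `temperature`, `top_k`, and `top_p` jointly shape the sampling distribution, while a `repetition_penalty` of 1.0 leaves scores unchanged. The following is a minimal pure-Python sketch of that filtering order (temperature, then top-k, then top-p). It is an illustration, not the `transformers` implementation; the function name `sample_token` and its signature are hypothetical.

```python
import math
import random


def sample_token(logits, temperature=0.7, top_k=20, top_p=0.8, rng=random):
    """Illustrative temperature -> top-k -> top-p sampling over a list of logits."""
    # Temperature scaling: values below 1.0 sharpen the distribution.
    scaled = [x / temperature for x in logits]

    # Keep only the top_k highest-scoring token ids (stable for ties).
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]

    # Softmax over the survivors; subtracting the max keeps exp() stable.
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Nucleus filter: keep the shortest prefix whose cumulative mass reaches top_p.
    kept_ids, kept_probs, cumulative = [], [], 0.0
    for token_id, p in zip(ranked, probs):
        kept_ids.append(token_id)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break

    # random.choices treats the weights as relative, so no renormalization is needed.
    return rng.choices(kept_ids, weights=kept_probs, k=1)[0]
```

For example, `sample_token([10.0, 0.0, -5.0])` returns index 0, because after temperature scaling the first token alone already exceeds the 0.8 nucleus mass.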
images/benchmark-action-1.png ADDED

Git LFS Details

  • SHA256: 11772abc8515d29e62403d12a19194308e513d323852de9c7878a1006a3660ee
  • Pointer size: 131 Bytes
  • Size of remote file: 101 kB
images/benchmark-overall.png ADDED

Git LFS Details

  • SHA256: 25425f6be1f1f2099c484f3e887b420d18728de1f79e4600770461ef94ed44c7
  • Pointer size: 131 Bytes
  • Size of remote file: 269 kB
images/benchmark-reasoning.png ADDED

Git LFS Details

  • SHA256: ba95a3b243fc23c644e12dd8b22794af7002a1ee5a469322dbe1a55f080a12b4
  • Pointer size: 132 Bytes
  • Size of remote file: 2.68 MB
images/benchmark-visual-audio.png ADDED

Git LFS Details

  • SHA256: 2faab808cd9871de94c099ccc4bca428206814d818224879726d54e5a80b2578
  • Pointer size: 131 Bytes
  • Size of remote file: 366 kB
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
model_index.json ADDED
@@ -0,0 +1,28 @@
1
+ {
2
+ "_class_name": "Cosmos3OmniDiffusersPipeline",
3
+ "_diffusers_version": "0.37.1",
4
+ "scheduler": [
5
+ "diffusers",
6
+ "UniPCMultistepScheduler"
7
+ ],
8
+ "text_tokenizer": [
9
+ "transformers",
10
+ "Qwen2TokenizerFast"
11
+ ],
12
+ "transformer": [
13
+ "diffusers",
14
+ "Cosmos3OmniTransformer"
15
+ ],
16
+ "vae": [
17
+ "diffusers",
18
+ "AutoencoderKLWan"
19
+ ],
20
+ "vision_encoder": [
21
+ "transformers",
22
+ "Qwen3VLVisionModel"
23
+ ],
24
+ "sound_tokenizer": [
25
+ "diffusers",
26
+ "Cosmos3AVAEAudioTokenizer"
27
+ ]
28
+ }
preprocessor_config.json ADDED
@@ -0,0 +1,21 @@
1
+ {
2
+ "size": {
3
+ "longest_edge": 16777216,
4
+ "shortest_edge": 65536
5
+ },
6
+ "patch_size": 16,
7
+ "temporal_patch_size": 2,
8
+ "merge_size": 2,
9
+ "image_mean": [
10
+ 0.5,
11
+ 0.5,
12
+ 0.5
13
+ ],
14
+ "image_std": [
15
+ 0.5,
16
+ 0.5,
17
+ 0.5
18
+ ],
19
+ "processor_class": "Qwen3VLProcessor",
20
+ "image_processor_type": "Qwen2VLImageProcessorFast"
21
+ }
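Note on `preprocessor_config.json` above: the `size` values are large enough that they are unlikely to be edge lengths. Both are perfect squares, which suggests they act as total pixel-count bounds; that reading is an assumption about how the processor consumes them, not something stated in the diff. The sketch also shows what the 0.5 mean and 0.5 standard deviation do to a pixel value already scaled to [0, 1].

```python
import math

# Values copied from preprocessor_config.json above.
size = {"longest_edge": 16777216, "shortest_edge": 65536}
patch_size, merge_size = 16, 2
image_mean, image_std = 0.5, 0.5

# Assumption: these are pixel-count bounds, so take square roots to get
# the side length of an equivalent square image.
max_side = math.isqrt(size["longest_edge"])
min_side = math.isqrt(size["shortest_edge"])

# Assumption: merge_size groups patches along each spatial axis, so one
# merged visual token spans patch_size * merge_size pixels per side.
pixels_per_token_side = patch_size * merge_size


def normalize(pixel):
    """Map a pixel value in [0, 1] to [-1, 1] using mean 0.5 and std 0.5."""
    return (pixel - image_mean) / image_std


print(max_side, min_side, pixels_per_token_side, normalize(0.0), normalize(1.0))
```

Under that reading, the bounds correspond to square images between 256 and 4096 pixels per side.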
scheduler/scheduler_config.json ADDED
@@ -0,0 +1,33 @@
1
+ {
2
+ "_class_name": "UniPCMultistepScheduler",
3
+ "_diffusers_version": "0.37.1",
4
+ "beta_end": 0.02,
5
+ "beta_schedule": "linear",
6
+ "beta_start": 0.0001,
7
+ "disable_corrector": [],
8
+ "dynamic_thresholding_ratio": 0.995,
9
+ "final_sigmas_type": "zero",
10
+ "flow_shift": 1.0,
11
+ "lower_order_final": true,
12
+ "num_train_timesteps": 1000,
13
+ "predict_x0": true,
14
+ "prediction_type": "flow_prediction",
15
+ "rescale_betas_zero_snr": false,
16
+ "sample_max_value": 1.0,
17
+ "shift_terminal": null,
18
+ "sigma_max": 200.0,
19
+ "sigma_min": 0.147,
20
+ "solver_order": 2,
21
+ "solver_p": null,
22
+ "solver_type": "bh2",
23
+ "steps_offset": 0,
24
+ "thresholding": false,
25
+ "time_shift_type": "exponential",
26
+ "timestep_spacing": "linspace",
27
+ "trained_betas": null,
28
+ "use_beta_sigmas": false,
29
+ "use_dynamic_shifting": false,
30
+ "use_exponential_sigmas": false,
31
+ "use_flow_sigmas": true,
32
+ "use_karras_sigmas": true
33
+ }
sound_tokenizer.ckpt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6daeb68a219f3e86c0918f616d78b9ebf073f3d700df63ff1c02d214c081d72d
3
+ size 1985246007
sound_tokenizer.json ADDED
@@ -0,0 +1,42 @@
1
+ {
2
+ "model_type": "autoencoder_v2",
3
+ "sampling_rate": 48000,
4
+ "stereo": true,
5
+ "use_wav_as_input": true,
6
+ "normalize_volume": true,
7
+ "hop_size": 1920,
8
+ "input_channels": 1,
9
+ "enc_type": "spec_convnext",
10
+ "enc_dim": 192,
11
+ "enc_intermediate_dim": 768,
12
+ "enc_num_layers": 12,
13
+ "enc_num_blocks": 2,
14
+ "enc_n_fft": 64,
15
+ "enc_hop_length": 16,
16
+ "enc_latent_dim": 128,
17
+ "enc_c_mults": [1, 2, 4],
18
+ "enc_strides": [4, 5, 6],
19
+ "enc_identity_init": false,
20
+ "enc_use_snake": true,
21
+ "dec_type": "oobleck",
22
+ "dec_dim": 320,
23
+ "dec_c_mults": [1, 2, 4, 8, 16],
24
+ "dec_strides": [2, 4, 5, 6, 8],
25
+ "dec_use_snake": true,
26
+ "dec_final_tanh": false,
27
+ "dec_out_channels": 2,
28
+ "dec_anti_aliasing": false,
29
+ "dec_use_nearest_upsample": false,
30
+ "dec_use_tanh_at_final": false,
31
+ "bottleneck_type": "vae",
32
+ "bottleneck": {"type": "vae"},
33
+ "activation": "snakebeta",
34
+ "snake_logscale": true,
35
+ "anti_aliasing": false,
36
+ "use_cuda_kernel": false,
37
+ "causal": false,
38
+ "padding_mode": "zeros",
39
+ "vocoder_input_dim": 64,
40
+ "latent_mean": null,
41
+ "latent_std": null
42
+ }
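Note on `sound_tokenizer.json` above: the stride and hop values can be cross-checked against the `sound_latent_fps` of 25 in `config.json`. The sketch below uses only numbers from the two files. Treating the encoder's total downsampling as `enc_hop_length` times the product of `enc_strides` is an assumption about the architecture; the diff itself does not say how these fields combine.

```python
from math import prod

# Values copied from sound_tokenizer.json above.
sampling_rate = 48000
hop_size = 1920
enc_hop_length = 16
enc_strides = [4, 5, 6]
dec_strides = [2, 4, 5, 6, 8]

# Latent frames per second of audio.
latent_fps = sampling_rate / hop_size

# Assumption: the encoder first frames the waveform with enc_hop_length,
# then downsamples further by each stride.
encoder_downsample = enc_hop_length * prod(enc_strides)

# Total upsampling factor of the decoder.
decoder_upsample = prod(dec_strides)

# Hypothetical example: number of latent frames for an 8-second clip.
frames_for_8s = int(8 * latent_fps)

print(latent_fps, encoder_downsample, decoder_upsample, frames_for_8s)
```

All three factors come out to 1920 samples per latent frame, which is consistent with the 25 Hz latent rate.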
sound_tokenizer/config.json ADDED
@@ -0,0 +1,64 @@
1
+ {
2
+ "model_type": "autoencoder_v2",
3
+ "sampling_rate": 48000,
4
+ "stereo": true,
5
+ "use_wav_as_input": true,
6
+ "normalize_volume": true,
7
+ "hop_size": 1920,
8
+ "input_channels": 1,
9
+ "enc_type": "spec_convnext",
10
+ "enc_dim": 192,
11
+ "enc_intermediate_dim": 768,
12
+ "enc_num_layers": 12,
13
+ "enc_num_blocks": 2,
14
+ "enc_n_fft": 64,
15
+ "enc_hop_length": 16,
16
+ "enc_latent_dim": 128,
17
+ "enc_c_mults": [
18
+ 1,
19
+ 2,
20
+ 4
21
+ ],
22
+ "enc_strides": [
23
+ 4,
24
+ 5,
25
+ 6
26
+ ],
27
+ "enc_identity_init": false,
28
+ "enc_use_snake": true,
29
+ "dec_type": "oobleck",
30
+ "dec_dim": 320,
31
+ "dec_c_mults": [
32
+ 1,
33
+ 2,
34
+ 4,
35
+ 8,
36
+ 16
37
+ ],
38
+ "dec_strides": [
39
+ 2,
40
+ 4,
41
+ 5,
42
+ 6,
43
+ 8
44
+ ],
45
+ "dec_use_snake": true,
46
+ "dec_final_tanh": false,
47
+ "dec_out_channels": 2,
48
+ "dec_anti_aliasing": false,
49
+ "dec_use_nearest_upsample": false,
50
+ "dec_use_tanh_at_final": false,
51
+ "bottleneck_type": "vae",
52
+ "bottleneck": {
53
+ "type": "vae"
54
+ },
55
+ "activation": "snakebeta",
56
+ "snake_logscale": true,
57
+ "anti_aliasing": false,
58
+ "use_cuda_kernel": false,
59
+ "causal": false,
60
+ "padding_mode": "zeros",
61
+ "vocoder_input_dim": 64,
62
+ "latent_mean": null,
63
+ "latent_std": null
64
+ }
sound_tokenizer/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9d4c61cde38acfb0cad9048a140c3533750277a8462b19dc08450d9fe1ad9879
3
+ size 1892409600
text_tokenizer/added_tokens.json ADDED
@@ -0,0 +1,28 @@
1
+ {
2
+ "</think>": 151668,
3
+ "</tool_call>": 151658,
4
+ "</tool_response>": 151666,
5
+ "<think>": 151667,
6
+ "<tool_call>": 151657,
7
+ "<tool_response>": 151665,
8
+ "<|box_end|>": 151649,
9
+ "<|box_start|>": 151648,
10
+ "<|endoftext|>": 151643,
11
+ "<|file_sep|>": 151664,
12
+ "<|fim_middle|>": 151660,
13
+ "<|fim_pad|>": 151662,
14
+ "<|fim_prefix|>": 151659,
15
+ "<|fim_suffix|>": 151661,
16
+ "<|im_end|>": 151645,
17
+ "<|im_start|>": 151644,
18
+ "<|image_pad|>": 151655,
19
+ "<|object_ref_end|>": 151647,
20
+ "<|object_ref_start|>": 151646,
21
+ "<|quad_end|>": 151651,
22
+ "<|quad_start|>": 151650,
23
+ "<|repo_name|>": 151663,
24
+ "<|video_pad|>": 151656,
25
+ "<|vision_end|>": 151653,
26
+ "<|vision_pad|>": 151654,
27
+ "<|vision_start|>": 151652
28
+ }
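Note on `text_tokenizer/added_tokens.json` above: the special-token ids here should agree with the ids referenced in `config.json` and `generation_config.json`. The sketch below is a small consistency check using values copied from those files. The pairing of each config field with a token string (the `expected` mapping) is my own reading of the field names and should be treated as an assumption.

```python
# Subset of text_tokenizer/added_tokens.json above.
added_tokens = {
    "<|endoftext|>": 151643,
    "<|im_end|>": 151645,
    "<|vision_start|>": 151652,
    "<|vision_end|>": 151653,
    "<|image_pad|>": 151655,
    "<|video_pad|>": 151656,
    "</think>": 151668,
}

# Ids copied from config.json (top level and text_config).
config_ids = {
    "bos_token_id": 151643,
    "eos_token_id": 151645,
    "vision_start_token_id": 151652,
    "vision_end_token_id": 151653,
    "image_token_id": 151655,
    "video_token_id": 151656,
}
vocab_size = 151936

# Assumed correspondence between config fields and token strings.
expected = {
    "bos_token_id": "<|endoftext|>",
    "eos_token_id": "<|im_end|>",
    "vision_start_token_id": "<|vision_start|>",
    "vision_end_token_id": "<|vision_end|>",
    "image_token_id": "<|image_pad|>",
    "video_token_id": "<|video_pad|>",
}

# Collect every field whose id disagrees with the tokenizer's id.
mismatches = [
    field for field, token in expected.items()
    if added_tokens[token] != config_ids[field]
]

# Every added token id must fit inside the embedding table.
ids_in_range = all(token_id < vocab_size for token_id in added_tokens.values())

print(mismatches, ids_in_range)
```

With the values in this commit the check finds no mismatches, and the highest added id (151668) fits within the 151936-entry vocabulary.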
text_tokenizer/chat_template.jinja ADDED
@@ -0,0 +1,120 @@
1
+ {%- if tools %}
2
+ {{- '<|im_start|>system\n' }}
3
+ {%- if messages[0].role == 'system' %}
4
+ {%- if messages[0].content is string %}
5
+ {{- messages[0].content }}
6
+ {%- else %}
7
+ {%- for content in messages[0].content %}
8
+ {%- if 'text' in content %}
9
+ {{- content.text }}
10
+ {%- endif %}
11
+ {%- endfor %}
12
+ {%- endif %}
13
+ {{- '\n\n' }}
14
+ {%- endif %}
15
+ {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
16
+ {%- for tool in tools %}
17
+ {{- "\n" }}
18
+ {{- tool | tojson }}
19
+ {%- endfor %}
20
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
21
+ {%- else %}
22
+ {%- if messages[0].role == 'system' %}
23
+ {{- '<|im_start|>system\n' }}
24
+ {%- if messages[0].content is string %}
25
+ {{- messages[0].content }}
26
+ {%- else %}
27
+ {%- for content in messages[0].content %}
28
+ {%- if 'text' in content %}
29
+ {{- content.text }}
30
+ {%- endif %}
31
+ {%- endfor %}
32
+ {%- endif %}
33
+ {{- '<|im_end|>\n' }}
34
+ {%- endif %}
35
+ {%- endif %}
36
+ {%- set image_count = namespace(value=0) %}
37
+ {%- set video_count = namespace(value=0) %}
38
+ {%- for message in messages %}
39
+ {%- if message.role == "user" %}
40
+ {{- '<|im_start|>' + message.role + '\n' }}
41
+ {%- if message.content is string %}
42
+ {{- message.content }}
43
+ {%- else %}
44
+ {%- for content in message.content %}
45
+ {%- if content.type == 'image' or 'image' in content or 'image_url' in content %}
46
+ {%- set image_count.value = image_count.value + 1 %}
47
+ {%- if add_vision_id %}Picture {{ image_count.value }}: {% endif -%}
48
+ <|vision_start|><|image_pad|><|vision_end|>
49
+ {%- elif content.type == 'video' or 'video' in content %}
50
+ {%- set video_count.value = video_count.value + 1 %}
51
+ {%- if add_vision_id %}Video {{ video_count.value }}: {% endif -%}
52
+ <|vision_start|><|video_pad|><|vision_end|>
53
+ {%- elif 'text' in content %}
54
+ {{- content.text }}
55
+ {%- endif %}
56
+ {%- endfor %}
57
+ {%- endif %}
58
+ {{- '<|im_end|>\n' }}
59
+ {%- elif message.role == "assistant" %}
60
+ {{- '<|im_start|>' + message.role + '\n' }}
61
+ {%- if message.content is string %}
62
+ {{- message.content }}
63
+ {%- else %}
64
+ {%- for content_item in message.content %}
65
+ {%- if 'text' in content_item %}
66
+ {{- content_item.text }}
67
+ {%- endif %}
68
+ {%- endfor %}
69
+ {%- endif %}
70
+ {%- if message.tool_calls %}
71
+ {%- for tool_call in message.tool_calls %}
72
+ {%- if (loop.first and message.content) or (not loop.first) %}
73
+ {{- '\n' }}
74
+ {%- endif %}
75
+ {%- if tool_call.function %}
76
+ {%- set tool_call = tool_call.function %}
77
+ {%- endif %}
78
+ {{- '<tool_call>\n{"name": "' }}
79
+ {{- tool_call.name }}
80
+ {{- '", "arguments": ' }}
81
+ {%- if tool_call.arguments is string %}
82
+ {{- tool_call.arguments }}
83
+ {%- else %}
84
+ {{- tool_call.arguments | tojson }}
85
+ {%- endif %}
86
+ {{- '}\n</tool_call>' }}
87
+ {%- endfor %}
88
+ {%- endif %}
89
+ {{- '<|im_end|>\n' }}
90
+ {%- elif message.role == "tool" %}
91
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
92
+ {{- '<|im_start|>user' }}
93
+ {%- endif %}
94
+ {{- '\n<tool_response>\n' }}
95
+ {%- if message.content is string %}
96
+ {{- message.content }}
97
+ {%- else %}
98
+ {%- for content in message.content %}
99
+ {%- if content.type == 'image' or 'image' in content or 'image_url' in content %}
100
+ {%- set image_count.value = image_count.value + 1 %}
101
+ {%- if add_vision_id %}Picture {{ image_count.value }}: {% endif -%}
102
+ <|vision_start|><|image_pad|><|vision_end|>
103
+ {%- elif content.type == 'video' or 'video' in content %}
104
+ {%- set video_count.value = video_count.value + 1 %}
105
+ {%- if add_vision_id %}Video {{ video_count.value }}: {% endif -%}
106
+ <|vision_start|><|video_pad|><|vision_end|>
107
+ {%- elif 'text' in content %}
108
+ {{- content.text }}
109
+ {%- endif %}
110
+ {%- endfor %}
111
+ {%- endif %}
112
+ {{- '\n</tool_response>' }}
113
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
114
+ {{- '<|im_end|>\n' }}
115
+ {%- endif %}
116
+ {%- endif %}
117
+ {%- endfor %}
118
+ {%- if add_generation_prompt %}
119
+ {{- '<|im_start|>assistant\n' }}
120
+ {%- endif %}
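Note on `text_tokenizer/chat_template.jinja` above: for the simplest case (text-only string content, no `tools`, no `tool_calls`, no `tool` role) the template reduces to a ChatML-style layout. The function below is a hand-written approximation of that reduced case in plain Python, not a Jinja renderer; the name `render_text_chat` is hypothetical, and image, video, and tool handling are deliberately left out.

```python
def render_text_chat(messages, add_generation_prompt=True):
    """Approximate the template for text-only chats without tools or tool calls."""
    parts = []
    for index, message in enumerate(messages):
        role, content = message["role"], message["content"]
        # The template emits a system turn only when it is the first message;
        # a system message anywhere else matches no branch and is skipped.
        if role == "system" and index != 0:
            continue
        if role not in ("system", "user", "assistant"):
            raise ValueError(f"unsupported role in this sketch: {role}")
        # Each turn is wrapped in <|im_start|>role ... <|im_end|>.
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = render_text_chat(
    [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Describe the scene."},
    ]
)
print(example)
```

In practice the template would normally be applied through the tokenizer or processor that ships with the model; this sketch is only meant to make the resulting prompt layout easier to see.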
text_tokenizer/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
text_tokenizer/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|im_start|>",
4
+ "<|im_end|>",
5
+ "<|object_ref_start|>",
6
+ "<|object_ref_end|>",
7
+ "<|box_start|>",
8
+ "<|box_end|>",
9
+ "<|quad_start|>",
10
+ "<|quad_end|>",
11
+ "<|vision_start|>",
12
+ "<|vision_end|>",
13
+ "<|vision_pad|>",
14
+ "<|image_pad|>",
15
+ "<|video_pad|>"
16
+ ],
17
+ "eos_token": {
18
+ "content": "<|im_end|>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ },
24
+ "pad_token": {
25
+ "content": "<|endoftext|>",
26
+ "lstrip": false,
27
+ "normalized": false,
28
+ "rstrip": false,
29
+ "single_word": false
30
+ }
31
+ }
text_tokenizer/tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
3
+ size 11422654