The code on GitHub is just fake code????

#1
by sjmind - opened

You can choose to publish nothing, but you shouldn't tell a lie.

Can the person in charge come explain this? ziyuan?


inclusionAI org

Hi, that's an excellent question, and thank you for your close attention to our work!

You're right to point out that the current public interfaces for understanding and generation are separate. This was a deliberate choice for two primary reasons:

  1. Clear Evaluation: It allows the community to independently verify the model's performance on both tasks, which is a standard practice for benchmarking.
  2. Inference Pipelines: The two tasks currently have slightly different preprocessing needs during inference (e.g., mixed-resolution and classifier-free guidance for generation).

The key thing to emphasize is that this separation is only at the interface level, not within the model's core architecture.
That said, a unified interface for generation and understanding is essential to natively unify visual understanding and generation within a single autoregressive framework. We are actively working on it, and the corresponding code will be released in the coming days.

We’d love to have you involved in shaping the project—feel free to open issues, suggest features, or submit PRs so we can build this together!

Best regards,
Ming team

inclusionAI org

Hi, thank you again for your feedback!

We’re excited to share that we’ve now released the unified interface for image understanding, generation, and editing! This update allows seamless multimodal interactions within a single autoregressive framework, supporting flexible input types ("text" and "image"), mixed input orders, and multi-turn conversations via internal state management.

Key features:

  • Image generation: Use descriptive prompts with output_image_prefix to save generated images.
  • Image understanding: Include both "image" and "text" in the same message for joint reasoning.
  • Image editing: Chain multiple generate(..., for_edit=True) calls with unique output_image_prefix names.
  • Multi-turn interactions: Supported via the model’s internal state — call model.reset_inner_state() to reset when needed.
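To make the features above concrete, here is a minimal sketch of what a session might look like. Only the `build_message` helper is runnable as-is; the model calls in the comments reuse the names from this announcement (`generate`, `for_edit`, `output_image_prefix`, `reset_inner_state`), but the exact loading code and signatures are assumptions, so please check the README for the authoritative usage.

```python
# Hypothetical sketch of the unified interface described above.
# The message-assembly helper is concrete; the model calls are
# illustrative only and may differ from the released code.

def build_message(role, *parts):
    """Assemble one chat message; each part is ('text', str) or ('image', path)."""
    return {"role": role, "content": [{"type": t, t: v} for t, v in parts]}

# Image understanding: "image" and "text" in the same message for joint reasoning.
msg = build_message(
    "user",
    ("image", "cat.png"),
    ("text", "What breed is this cat?"),
)

# The calls below assume a loaded `model` (loading omitted):
#
#   out = model.generate(messages=[msg])                    # understanding
#   model.generate(messages=[gen_msg],
#                  output_image_prefix="scene")             # generation, saves scene*.png
#   model.generate(messages=[edit_msg], for_edit=True,
#                  output_image_prefix="scene_edit1")       # editing, chained calls
#   model.reset_inner_state()                               # start a fresh conversation
```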

You can find detailed usage examples in the updated README. We’d love for you to try it out and let us know what you think!
As always, we welcome your contributions — feel free to open issues, suggest improvements, or submit PRs. Let’s build this together!

Best regards,
Ming team

zyhuangnus changed discussion status to closed
