posted an update 1 day ago
VQASynth is the open-source implementation of the paper at https://huggingface.co/papers/2401.12168. It assembles the data synthesis pipeline behind https://huggingface.co/remyxai/SpaceQwen2.5-VL-3B-Instruct, https://huggingface.co/remyxai/SpaceThinker-Qwen2.5VL-3B, and several other spatial reasoning models we've shared here on HF.
From early development through production, different categories of evidence become available to guide what to try next. The strongest decisions combine evidence across categories rather than relying on any one.
Stage 1: Development history
Commit history holds the moments where things changed. For VQASynth, those moments include changes to how scenes get parsed, how captions get generated, and how spatial relations get encoded. Even before a model is in production, those milestones are a strong signal for which methods are semantically relevant to where the system is now.
Stage 2: Observational outcomes
Once a model is serving, the same commit history delineates changes against real-world results. That opens up quasi-experiments. You get causal evidence about which changes drove which outcomes, and inference on questions you haven't directly tested.
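As a rough illustration of the simplest quasi-experiment, the sketch below compares a metric before and after a change date. The function name, the dates, and the accuracy values are all made up for the example, and a naive before/after difference like this ignores trends and confounders that a real analysis would need to handle.

```python
from datetime import date
from statistics import mean


def before_after_effect(observations, change_date):
    """Naive before/after estimate: mean(after) - mean(before).

    observations: list of (date, metric_value) pairs.
    change_date: the date the change (e.g. a commit) went live.
    """
    before = [v for d, v in observations if d < change_date]
    after = [v for d, v in observations if d >= change_date]
    if not before or not after:
        raise ValueError("need observations on both sides of the change")
    return mean(after) - mean(before)


# Hypothetical daily accuracy values around a hypothetical commit date.
observations = [
    (date(2025, 1, 1), 0.70),
    (date(2025, 1, 2), 0.72),
    (date(2025, 1, 3), 0.71),
    (date(2025, 1, 4), 0.76),
    (date(2025, 1, 5), 0.78),
    (date(2025, 1, 6), 0.77),
]
effect = before_after_effect(observations, date(2025, 1, 4))
print(f"estimated effect: {effect:+.3f}")
```

Here the change date plays the role that a commit timestamp would play in practice: it splits the outcome series into a "before" and an "after" window.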
Stage 3: Controlled experiments
When teams start running interventions, the results of those interventions tighten the estimates further. This is the regime most people associate with rigor, but it's expensive and gated by traffic.
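To make "gated by traffic" concrete, here is a minimal sketch of a two-proportion z-test for an A/B comparison, using only the standard library. The counts are invented, and the pooled-variance z-test is just one common choice, not necessarily the method used for VQASynth.

```python
from math import erf, sqrt


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (lift, z, two-sided p-value) for arm B versus arm A."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled success rate under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return p_b - p_a, z, p_value


# Hypothetical counts: 1000 requests per arm.
lift, z, p_value = two_proportion_z_test(480, 1000, 520, 1000)
print(f"lift={lift:.3f} z={z:.2f} p={p_value:.3f}")
```

With these made-up numbers, a four-point lift at 1000 requests per arm does not reach the conventional 0.05 threshold, which is the kind of traffic constraint the paragraph above refers to.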
Stage 4: Counterfactual perturbations
When A/B testing becomes the operational bottleneck, instrumenting decision points in the production system lets you probe what would have happened under alternative choices. Shadow mode first, live traffic once audits pass.
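One way to picture an instrumented decision point in shadow mode is sketched below: the production policy's choice is always served, while a candidate policy's choice is only logged for later comparison. The class name, the threshold policies, and the scores are hypothetical and are not taken from any Remyx API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ShadowDecisionPoint:
    name: str
    production: Callable[[Any], Any]  # policy whose choice is served
    candidate: Callable[[Any], Any]   # policy evaluated in shadow only
    log: List[Dict[str, Any]] = field(default_factory=list)

    def decide(self, context):
        served = self.production(context)
        try:
            shadow = self.candidate(context)
        except Exception as exc:
            # A failing candidate must never affect what is served.
            shadow = f"error: {exc!r}"
        self.log.append({"context": context, "served": served, "shadow": shadow})
        return served

    def disagreement_rate(self):
        """Fraction of logged decisions where the candidate differed."""
        if not self.log:
            return 0.0
        differing = sum(1 for row in self.log if row["served"] != row["shadow"])
        return differing / len(self.log)


# Hypothetical policies: accept a sample when its score clears a threshold.
point = ShadowDecisionPoint(
    name="accept_sample",
    production=lambda score: score >= 0.5,
    candidate=lambda score: score >= 0.7,
)
served = [point.decide(score) for score in [0.4, 0.6, 0.8, 0.65]]
print(served, point.disagreement_rate())
```

The log of disagreements is what an audit would inspect before the candidate is allowed to act on live traffic.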
Experimentation maturity is a journey, and every stage offers something to learn from.
More on these ideas: https://docs.remyx.ai/concepts/maturity-progression
salma-remyx/vqasynth_testing_evals_eval • Updated • 5 • 22
salma-remyx/vqasynth_testing_evals • Updated • 5 • 22
salma-remyx/vqasynth_testing_evals_full_reasoning • Updated • 5 • 20
salma-remyx/vqasynth_sample_processed • Updated • 5 • 11
salma-remyx/vqasynth_sample_processed_full • Updated • 5 • 19
salma-remyx/remyxai_docker_images_with_content • Updated • 10.4k • 18 • 1
salma-remyx/remyxai_docker_images • Updated • 10.4k • 11 • 1
salma-remyx/vqasynth_sample_processed_test • Updated • 5 • 7
salma-remyx/vqasynth_sample_processed_test_full • Updated • 5 • 6
salma-remyx/SpaceOm_MindCube_Results • Updated • 17
salma-remyx/SpaceThinker_SpatialScore-Hard • Updated • 11
salma-remyx/SpaceOm_SpatialScore-Hard
salma-remyx/SpaceOm_OmniSpatial
salma-remyx/SpaceThinker_SpaCE-10_Results • Updated • 6
salma-remyx/SpaceQwen_SpaCE-10_Results • Updated • 4
salma-remyx/SpaceOm_SpaCE-10_Results • Updated • 4
salma-remyx/SpaceOm_SpatialScore • Updated • 10 • 1
salma-remyx/SpaceThinker_SpatialScore • Updated • 6 • 1
salma-remyx/Q-Spatial-Bench-sMAPE-Comparison • Updated • 13 • 8 • 1
salma-remyx/vqasynth_sample_processed_dummy • Updated • 5 • 9
salma-remyx/vqasynth_sample_processed_dummy_full • Updated • 5 • 7
salma-remyx/localllama-sentiment-Why-new-models-feel-dumber • Updated • 20 • 12 • 1
• Updated • 8 • 5
salma-remyx/vqasynth_processed_r1_12k • Updated • 12.7k • 15
salma-remyx/vqasynth_processed_r1_12k_full_reasoning • Updated • 12.7k • 18
salma-remyx/ffmperative-sample • Updated • 1.89k • 7
• Updated • 6.38k • 14 • 1
salma-remyx/vqasynth_nas_example_ds • Updated • 51 • 8
salma-remyx/vqasynth_nas_example_ds_full • Updated • 51 • 9
salma-remyx/nas_example_ds • Updated • 58 • 7