# Examples

PixDLM evaluates reasoning segmentation samples in the DRSeg format.

Minimal sample fields:

```json
{
  "id": "example_frame_0001",
  "width": 1024,
  "height": 1024,
  "metadata": {
    "time_of_day": "day",
    "location": "urban_road",
    "altitude": "60m",
    "camera_angle": "90deg"
  },
  "ann_list": [
    {
      "bbox": [100.0, 120.0, 80.0, 60.0],
      "segmentation": [[100.0, 120.0, 180.0, 120.0, 180.0, 180.0, 100.0, 180.0]],
      "area": 4800.0,
      "category_name": "car"
    }
  ],
  "questions": [
    "Which vehicle is closest to the intersection and may affect traffic flow?"
  ],
  "answers": [
    "<think>The target vehicle is positioned nearest to the intersection and aligned with the traffic lane.</think> <answer>The vehicle closest to the intersection is the target.</answer>"
  ],
  "reasoning_types": ["spatial"]
}
```
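
The sample above can be sanity-checked before evaluation. The following is a minimal sketch, not part of PixDLM itself: the helper names (`polygon_area`, `polygon_bbox`, `split_answer`, `check_sample`) are hypothetical, and it assumes that `bbox` is `[x, y, width, height]`, that `segmentation` holds flat `[x0, y0, x1, y1, ...]` polygons, and that `area` equals the summed polygon area. These assumptions hold for the example values but are not stated by the format description.

```python
import re

# Trimmed copy of the sample above; only the fields the checks read.
SAMPLE = {
    "id": "example_frame_0001",
    "width": 1024,
    "height": 1024,
    "ann_list": [{
        "bbox": [100.0, 120.0, 80.0, 60.0],
        "segmentation": [[100.0, 120.0, 180.0, 120.0, 180.0, 180.0, 100.0, 180.0]],
        "area": 4800.0,
        "category_name": "car",
    }],
    "questions": ["Which vehicle is closest to the intersection and may affect traffic flow?"],
    "answers": ["<think>The target vehicle is positioned nearest to the intersection and "
                "aligned with the traffic lane.</think> <answer>The vehicle closest to the "
                "intersection is the target.</answer>"],
    "reasoning_types": ["spatial"],
}


def polygon_area(flat):
    """Shoelace area of a flat [x0, y0, x1, y1, ...] polygon."""
    xs, ys = flat[0::2], flat[1::2]
    n = len(xs)
    twice = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n))
    return abs(twice) / 2.0


def polygon_bbox(flat):
    """Tight [x, y, width, height] box around flat polygon coordinates (assumed layout)."""
    xs, ys = flat[0::2], flat[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


def split_answer(text):
    """Return (reasoning, answer) from the <think>/<answer> tags, or None where a tag is absent."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (think.group(1).strip() if think else None,
            answer.group(1).strip() if answer else None)


def check_sample(sample, tol=1e-6):
    """Collect human-readable consistency problems; an empty list means none were found."""
    problems = []
    if len(sample["questions"]) != len(sample["answers"]):
        problems.append("questions and answers differ in length")
    for i, ann in enumerate(sample["ann_list"]):
        x, y, w, h = ann["bbox"]
        if x < 0 or y < 0 or x + w > sample["width"] or y + h > sample["height"]:
            problems.append(f"ann {i}: bbox extends outside the image")
        # Concatenate all polygons of the annotation to get one enclosing box.
        points = [v for poly in ann["segmentation"] for v in poly]
        if any(abs(a - b) > tol for a, b in zip(polygon_bbox(points), ann["bbox"])):
            problems.append(f"ann {i}: bbox does not match the polygon extent")
        total = sum(polygon_area(poly) for poly in ann["segmentation"])
        if abs(total - ann["area"]) > tol:
            problems.append(f"ann {i}: area does not match the polygon area")
    return problems


print(check_sample(SAMPLE))
print(split_answer(SAMPLE["answers"][0]))
```

If real DRSeg annotations use a different box convention or store a mask-derived `area`, the bbox and area checks would need to be relaxed or dropped.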

Evaluation outputs are written to:

```text
outputs/<exp_name>/<dataset_name>/with_cot/
```

Each sample produces the following files:

- `*_input.jpg`
- `*_pred_mask.png`
- `*_gt_mask.png`
- `*_overlay_pred_red_gt_green.jpg`
- `*_result.json`
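
To confirm that a run wrote every expected file, the output folder can be scanned and grouped by sample. This is a rough sketch under assumptions: it treats whatever precedes each suffix as the per-sample prefix (the README does not say what `*` expands to), and `output_dir`, `group_by_sample`, and `missing_files` are hypothetical helpers, not PixDLM functions. It does not read `*_result.json`, since its fields are not documented here.

```python
from pathlib import Path

# The five per-sample suffixes listed above.
SUFFIXES = (
    "_input.jpg",
    "_pred_mask.png",
    "_gt_mask.png",
    "_overlay_pred_red_gt_green.jpg",
    "_result.json",
)


def output_dir(exp_name, dataset_name, root="outputs"):
    """Build outputs/<exp_name>/<dataset_name>/with_cot/ as a Path."""
    return Path(root) / exp_name / dataset_name / "with_cot"


def group_by_sample(filenames):
    """Map each sample prefix to {suffix: filename} for the files that are present."""
    groups = {}
    for name in filenames:
        for suffix in SUFFIXES:
            if name.endswith(suffix):
                prefix = name[: -len(suffix)]
                groups.setdefault(prefix, {})[suffix] = name
                break  # a filename matches at most one suffix
    return groups


def missing_files(groups):
    """Return {prefix: [missing suffixes]} for samples with an incomplete file set."""
    return {
        prefix: [s for s in SUFFIXES if s not in found]
        for prefix, found in groups.items()
        if len(found) < len(SUFFIXES)
    }


# Illustrative filenames only; "my_exp" and "my_dataset" are placeholder names.
demo = [f"example_frame_0001{s}" for s in SUFFIXES] + ["example_frame_0002_input.jpg"]
print(output_dir("my_exp", "my_dataset").as_posix())
print(missing_files(group_by_sample(demo)))
```

On a real run, the list passed to `group_by_sample` could come from `[p.name for p in output_dir(exp, ds).iterdir()]`.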