Marcus Gawronsky
marcusinthesky
AI & ML interests
Representation Learning
Recent Activity
published an article 14 days ago: Models are Markup, Tokens are Features...
upvoted an article 14 days ago: The Five Technologies driving AI 3.0
published an article 14 days ago: The Five Technologies driving AI 3.0
VLM Benchmarks
- MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
  Paper • 2410.10139 • Published • 51
- MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
  Paper • 2410.10563 • Published • 37
- LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content
  Paper • 2410.10783 • Published • 26
- TVBench: Redesigning Video-Language Evaluation
  Paper • 2410.07752 • Published • 6
Multi-modal Mamba
- Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
  Paper • 2403.14520 • Published • 35
- ZigMa: Zigzag Mamba Diffusion Model
  Paper • 2403.13802 • Published • 18
- SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
  Paper • 2403.15360 • Published • 13
- MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
  Paper • 2403.19888 • Published • 12
Tiny VLM Decoder
Foundational
Datasets
Open-vocabulary object detection (OVD).
- Simple Open-Vocabulary Object Detection with Vision Transformers
  Paper • 2205.06230 • Published • 4
- google/owlvit-base-patch32
  Zero-Shot Object Detection • 0.2B • Updated • 175k • 149
- Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers
  Paper • 2305.07011 • Published • 6
- Multi-Modal Classifiers for Open-Vocabulary Object Detection
  Paper • 2306.05493 • Published • 6
Multimodal Embeddings
- MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
  Paper • 2403.19651 • Published • 26
- No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
  Paper • 2404.04125 • Published • 29
- Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
  Paper • 2404.08197 • Published • 30
- Gecko: Versatile Text Embeddings Distilled from Large Language Models
  Paper • 2403.20327 • Published • 47
PeFT
Decoder Upcycled to Embeddings
ZecRec
- Item-Language Model for Conversational Recommendation
  Paper • 2406.02844 • Published • 13
- Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation
  Paper • 2412.18176 • Published • 16
- hkuds/RecGPT_model
  Updated • 4
- hkuds/easyrec-roberta-base
  Updated • 19 • 3