Similarity between embeddings
Hello,
when comparing the two embeddings from the original T5 and the distilled one, how much similarity should I expect? And if the two embeddings are similar, would doing LoRA fine-tuning on the student model be beneficial?
I didn't calculate the similarity between the embeddings, but they should be rather different. The student model learns a local minimum that works well for Flux but does not extend to other diffusion models.
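If you want to measure this yourself, here is a minimal sketch that computes the mean per-token cosine similarity between two embedding tensors. The tensors below are random stand-ins; in practice they would be the encoder outputs of the original T5 and the distilled student for the same prompts (and if the two models use different hidden sizes, you would first need a learned projection to compare them).

```python
import torch
import torch.nn.functional as F

def mean_cosine_similarity(teacher_emb: torch.Tensor, student_emb: torch.Tensor) -> float:
    """Mean per-token cosine similarity between two (batch, seq, dim) embedding tensors."""
    return F.cosine_similarity(teacher_emb, student_emb, dim=-1).mean().item()

# Random stand-ins for the two models' encoder outputs (hypothetical shapes).
torch.manual_seed(0)
teacher = torch.randn(2, 8, 4096)
student = torch.randn(2, 8, 4096)

sim = mean_cosine_similarity(teacher, student)
```

For unrelated high-dimensional vectors this value sits near zero, while well-aligned embeddings push it toward 1, so it gives a quick sanity check on how close the student really is.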
Thank you so much for replying! I have many questions and I'm very interested.
Hope I'm not asking too many questions 😅
Can I fine-tune the student with this loss for a specific domain where I observed prompt adherence is not as good, while skipping the VLoss, since I have limited compute resources?
Loss = MSE(teacher_embeddings, student_embeddings) + λ * MSE(student_embeddings, original_student_embeddings)
where
teacher_embeddings: T5-XXL embeddings for furnishings prompts
student_embeddings: your distilled T5's embeddings for the same prompts
original_student_embeddings: embeddings from the original (pre-fine-tuning) student for the same prompts
λ: regularization weight
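The loss above can be sketched in a few lines of PyTorch. This is only an illustration of the proposed objective, not tested code; the embedding tensors and the λ value are placeholders, and in practice `original_student_emb` would come from a frozen copy of the student evaluated without gradients.

```python
import torch
import torch.nn.functional as F

def distill_loss(teacher_emb: torch.Tensor,
                 student_emb: torch.Tensor,
                 original_student_emb: torch.Tensor,
                 lam: float = 0.1) -> torch.Tensor:
    # Main term: pull the student toward the teacher on the domain prompts.
    match_term = F.mse_loss(student_emb, teacher_emb)
    # Regularizer: keep the student close to its original (frozen) outputs
    # to reduce the risk of drifting away from what Flux was trained against.
    reg_term = F.mse_loss(student_emb, original_student_emb)
    return match_term + lam * reg_term

# Toy usage with random stand-in embeddings.
torch.manual_seed(0)
teacher = torch.randn(2, 8, 4096)
student = torch.randn(2, 8, 4096)
frozen_student = torch.randn(2, 8, 4096)

loss = distill_loss(teacher, student, frozen_student, lam=0.1)
```

The λ knob trades off adapting to the new domain against staying compatible with the embeddings Flux already expects.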
You are welcome to ask any questions. I think the regularization term may help mitigate mode collapse, but it could also limit the student model’s ability to fully capture the capacity of T5-XXL. In addition, since T5 is trained using a cross-entropy loss, you may also consider experimenting with that objective.
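If you do experiment with a cross-entropy-style objective, one common form is soft-target distillation over logits, sketched below. Note this is an assumption on my part about what such an objective might look like here: it requires token-level logits (e.g. from a decoder or LM head), which an encoder-embedding-only setup may not expose, and the temperature `T` is a hypothetical knob.

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(student_logits: torch.Tensor,
                       teacher_logits: torch.Tensor,
                       T: float = 1.0) -> torch.Tensor:
    # Cross-entropy between the teacher's softened distribution (soft targets)
    # and the student's predicted distribution, averaged over positions.
    log_p = F.log_softmax(student_logits / T, dim=-1)
    q = F.softmax(teacher_logits / T, dim=-1)
    return -(q * log_p).sum(dim=-1).mean()

# Toy usage with random stand-in logits over a small vocabulary.
torch.manual_seed(0)
t_logits = torch.randn(2, 8, 32)
s_logits = torch.randn(2, 8, 32)

ce = soft_cross_entropy(s_logits, t_logits, T=2.0)
```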