Training Details
#13
by RedFairy - opened
Thanks for the great work!
I have a question about the training details of the conditioning mechanism. Specifically, does the model take a point cloud rendered at the novel view as an image condition, or does it work simply by adding camera tokens (as text tokens) to the prompt?
Thank you!