DLM variant?
#21
by danieledll - opened
I wonder how well it would perform if it were converted to a DLM. Here is an interesting approach that was applied to Qwen with impressive results, both in terms of performance and in terms of retention:
https://github.com/tencent/WeDLM