Data correction
#1 by TWei-flm - opened
LFM2.5-1.2B-Thinking excels at long-context inference. For example, on AMD Ryzen™ NPUs with FastFlowLM, decoding throughput sustains ~52 tok/s at 16K context and ~46 tok/s even at the full 32K context, indicating robust long-context scalability. More details on long-context benchmarks on AMD Ryzen™ NPUs with FastFlowLM are available here.
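For context on how throughput figures like these are typically computed (newly generated tokens divided by decode wall time), here is a minimal backend-agnostic sketch; `decode_throughput`, `generate_fn`, and the dummy backend are hypothetical illustrations, not FastFlowLM's actual API:

```python
import time

def decode_throughput(generate_fn, prompt_tokens, max_new_tokens=256):
    """Return decode throughput in tok/s: new tokens / decode wall time.

    `generate_fn` is a placeholder for any runtime's generate call
    (e.g. a FastFlowLM or llama.cpp binding); it is assumed to return
    the list of newly generated token ids.
    """
    start = time.perf_counter()
    new_tokens = generate_fn(prompt_tokens, max_new_tokens)
    elapsed = time.perf_counter() - start
    return len(new_tokens) / elapsed

# Dummy backend that just "generates" max_new_tokens ids, to show the
# call shape at a 16K-token prompt; swap in a real runtime binding.
dummy = lambda prompt, n: list(range(n))
print(f"{decode_throughput(dummy, [0] * 16384):.1f} tok/s")
```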
to highlight long context capability
minor correction on FLM perf. data
mlabonne changed pull request status to merged