The Future of the Global Open-Source AI Ecosystem: From DeepSeek to AI+
hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag! Running uvx hf-mem --model-id ... --experimental will automatically pull the required information from the Hugging Face Hub and include the KV cache estimation, when applicable. Alternatively, you can set the --max-model-len, --batch-size, and --kv-cache-dtype arguments (à la vLLM) manually if preferred.
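To give a sense of what a KV cache estimate involves, here is a minimal sketch of the standard sizing formula for a dense transformer. This is a hypothetical illustration, not hf-mem's actual implementation; the parameter names mirror the CLI flags above, and the example config is an assumed Llama-3-8B-like shape.

```python
# Bytes per element for common KV cache dtypes (cf. the --kv-cache-dtype flag).
DTYPE_BYTES = {"float32": 4, "float16": 2, "bfloat16": 2, "float8": 1}

def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   max_model_len, batch_size, kv_cache_dtype="float16"):
    """Estimate KV cache size: one key and one value vector per layer,
    per KV head, per token (hypothetical sketch of the formula)."""
    per_token = 2 * num_layers * num_kv_heads * head_dim  # 2 = key + value
    return per_token * max_model_len * batch_size * DTYPE_BYTES[kv_cache_dtype]

# Assumed Llama-3-8B-like config: 32 layers, 8 KV heads (GQA), head_dim 128.
gib = kv_cache_bytes(32, 8, 128, max_model_len=8192, batch_size=1) / 2**30
print(f"~{gib:.2f} GiB")  # prints "~1.00 GiB"
```

Note how grouped-query attention (8 KV heads instead of 32) divides the cache size by four, which is exactly the kind of model-specific detail the tool pulls from the Hub config.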
- Multi-GPU and multi-process inference: pass device=["cuda:0", "cuda:1"] or device=["cpu"]*4 on the model.predict or model.rank calls.
- Benchmarks can be selected via dataset_id, e.g. dataset_id="lightonai/NanoBEIR-de" for the German benchmark.
- Pass output_scores=True to get similarity scores returned. This can be useful for some distillation losses!
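As a rough illustration of the output_scores behavior described above, here is a self-contained toy rank function. The signature and the token-overlap scorer are assumptions for demonstration only, standing in for a real reranker's forward pass; only the output_scores=True semantics (scores returned alongside the ranking) follow the announcement.

```python
from dataclasses import dataclass

@dataclass
class RankResult:
    corpus_id: int  # index of the document in the input list
    score: float    # similarity score (useful for distillation losses)

def rank(query, documents, output_scores=False):
    """Hypothetical sketch: rank documents against a query.

    The scorer is a toy token-overlap ratio, not a real model.
    """
    q = set(query.lower().split())
    scores = [len(q & set(d.lower().split())) / max(len(q), 1)
              for d in documents]
    order = sorted(range(len(documents)),
                   key=lambda i: scores[i], reverse=True)
    if output_scores:
        return [RankResult(i, scores[i]) for i in order]
    return order

results = rank("open source ai",
               ["open source ai models", "cooking recipes"],
               output_scores=True)
print(results[0])  # best match first, with its score attached
```

With output_scores=False the call returns only the ranked indices; requesting the scores is what makes the output usable as a soft target for distillation.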