8GB VRAM Local LLMs - Practitioner Tested Collection Real practitioner benchmarks of small/mid open-source LLMs on consumer 8GB VRAM hardware (RTX 4060 Ti). • 4 items • Updated about 14 hours ago • 3
Granite 4.1 Language Models Collection Efficient language models for multilingual generation, coding, RAG, and AI assistant workflows. • 6 items • Updated 7 days ago • 45
APEX Quants (GGUF) Collection MoE models quantized with the APEX Quantization technique (https://github.com/mudler/apex-quant) • 27 items • Updated 7 days ago • 87
1930 Coder Collection Fine-tuning the Talkie 13B 1930 model on agentic trajectories • 4 items • Updated about 7 hours ago • 4
Laguna XS.2 Collection Designed for agentic coding and long-horizon work on a local machine. Apache 2.0. • 4 items • Updated 8 days ago • 18
privacy-filter Collection OpenAI's privacy-filter fine-tuned models • 6 items • Updated 2 days ago • 8
talkie-13b Collection talkie-1930-13b is a vintage language model trained on pre-1931 English-language text. See https://github.com/talkie-lm/talkie to run talkie. • 3 items • Updated 15 days ago • 45
Pushing the Limits of Large Language Model Quantization via the Linearity Theorem Paper • 2411.17525 • Published Nov 26, 2024 • 6
HIGGS Collection Models prequantized with [HIGGS](https://arxiv.org/abs/2411.17525) zero-shot quantization. Requires the latest `transformers` to run. • 18 items • Updated Feb 18 • 15
TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification Paper • 2604.14531 • Published 20 days ago • 7
BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation Paper • 2604.09497 • Published 26 days ago • 29