Ring and Ling models
please update llama.cpp
https://github.com/ggml-org/llama.cpp/pull/16063
some cool models!
1T models:
https://huggingface.co/inclusionAI/Ring-1T
https://huggingface.co/inclusionAI/Ling-1T
103B models:
https://huggingface.co/inclusionAI/Ling-flash-2.0
https://huggingface.co/inclusionAI/Ring-flash-2.0
16B models:
Nice, I got Ling-1T quantizing on nico1 despite mradermacher not yet having updated llama.cpp, by building llama.cpp from source and using the recently introduced llama argument for the first time ☺️
# convert the HF model to GGUF (run from /llmjob/llama.cpp-nico after the build steps below):
venv/bin/python convert_hf_to_gguf.py /cpool/Ling-1T --outtype=source --outfile=/transfer/Ling-1T.gguf
# remove any previous checkout and clone nicoboss's llama.cpp fork:
rm -rf /llmjob/llama.cpp-nico
rm -rf llama.cpp
git clone --recursive https://github.com/nicoboss/llama.cpp.git
mv llama.cpp /llmjob/llama.cpp-nico
cd /llmjob/llama.cpp-nico

# set up a virtual environment for the conversion script's dependencies:
python -m venv venv
venv/bin/pip install -r requirements.txt

# point the build at the CUDA 13.0 toolkit (CUDA_HOME is the toolkit root, not its bin directory):
export CUDA_HOME=/usr/local/cuda-13.0
export PATH=/usr/local/cuda-13.0/bin:$PATH
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda-13.0/lib64

# build with CUDA support:
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
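A quick sanity check after a build like this, assuming the default build layout (binaries under build/bin):

# should report the freshly built commit:
build/bin/llama-cli --version
# confirms the conversion script's dependencies are installed:
venv/bin/python convert_hf_to_gguf.py --help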
nico1 ~# llmc add -8000 si https://huggingface.co/inclusionAI/Ling-1T llama nico worker nico1
submit tokens: ["-8000","static","imatrix","llama","nico","worker","nico1","https://huggingface.co/inclusionAI/Ling-1T"]
https://huggingface.co/inclusionAI/Ling-1T
nico1 ~# llmc push-model nico1 Ling-1T
nico1 Ling-1T: run job static (399093560965.7, 2199374439386.3, 1200000000000, -8000)
Ling-1T submitted to nico1
Usually I would say that you can check the status under https://hf.tst.eu/status.html, but it is currently down and llmc restart-llmstatusd does not fix it, so you will likely have to wait for quants to appear under https://hf.tst.eu/model#Ling-1T-GGUF
I usually don't give my opinion about models, but I have to say Ling is really good, with its own unique writing style and knowledge, while Ring is kind of bad: definitely worse than Ling for Q&A questions, at least for the 103B models I have extensively tested using vLLM during the past week.
I have one mainline llama.cpp compatible big Ling-1T-GGUF quant available if you have about 256GB RAM+VRAM: https://huggingface.co/ubergarm/Ling-1T-GGUF/tree/main/smol-IQ2_XXS
The rest of those quants are ik_llama.cpp only; I made the smol-IQ2_XXS to help test the PR, and it seems to be working fine in limited testing by myself and a couple of reports.
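A minimal sketch for trying that quant with mainline llama.cpp, assuming the PR above is in your build; the local paths, shard filename pattern and layer-offload count here are illustrative, not taken from the repo:

# fetch just the smol-IQ2_XXS shards from the repo:
huggingface-cli download ubergarm/Ling-1T-GGUF --include "smol-IQ2_XXS/*" --local-dir ./Ling-1T-GGUF
# point llama.cpp at the first shard; the remaining shards are picked up automatically.
# -ngl sets how many layers go to VRAM, the rest stay in system RAM:
build/bin/llama-cli -m ./Ling-1T-GGUF/smol-IQ2_XXS/Ling-1T-smol-IQ2_XXS-00001-of-*.gguf -ngl 20 -c 8192 -p "Hello"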
Thanks @nicoboss and team mradermacher for crunching this big one, and cool to hear another report. I've heard it is good for logic / STEM questions, and in some of my own very limited role-playing experiments it seems steerable even if it might initially refuse.
Curious to see how the ling-flash-2.0-103B-A6.1B compares with the upcoming GLM-4.6-Air, which has a somewhat similar footprint... Lots of models to play with, for sure!
I see Ling but not Ring yet ;)
I prioritized Ling because I personally prefer it over Ring, but no worries, they will all be quantized as soon as possible. I even put the custom updated version of llama.cpp on rich1 so it can work on them as well.
The status page is now available again so you can always check it for the latest status: https://hf.tst.eu/status.html
nico1 is currently working on: Ling-1T, Ling-mini-2.0 and Ling-flash-2.0
rich1 is currently working on Ring-mini-2.0 and static-only Ring-flash-2.0
All static Ling-flash-2.0 quants and its imatrix are already computed, while the imatrix quants are waiting for more storage budget, which they should get once the Q8_0 quants of Ling-1T are uploaded. Regarding Ring-flash-2.0 imatrix computation and imatrix quants, I will check in a few hours, when the storage situation on nico1 has hopefully improved.
not yet having updated llama.cpp by building llama.cpp from source
you hacker 😄 I'm also happy it worked :)
llama.cpp has been updated
Wow, a Q8_0 is going to be 1TB.
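That matches a back-of-envelope calculation: Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale, i.e. 34 bytes per 32 weights or ~8.5 bits per weight, so for a 1T-parameter model:

# ~8.5 bits per weight across 1000B parameters, result in GB:
echo "1000 * 8.5 / 8" | bc -l    # 1062.5 GB, so roughly 1 TB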
And, yeah, the status page was down, together with dozens of other sites, but finally kaos is running trixie. Only rain is left (and a few services).