# Issue: GLM-4.7-Flash Q6_K - Completely unusable output (looping)

## Environment
- Model: unsloth/GLM-4.7-Flash-GGUF (Q6_K, 23GB)
- llama.cpp: build 7779 (commit 6df686bee)
- Hardware: RTX 4090, 128 GB RAM
## Issue
Model produces completely unusable output regardless of input:
| Test | Input | Output |
|---|---|---|
| Chinese | "你好" ("Hello") | Version changelog |
| Chinese | "请介绍..." ("Please introduce...") | Git repo structure |
| English | "hi" | "S" repeated 200x |
| English | "What is 2+2?" | Prompt analysis |
| Coding | "fibonacci..." | Git commit info |
| Math | "Solve x^2-4=0" | "S" repeated 200x |
## Test Output
### Test 1: Chinese - "你好" ("Hello")

```
- 1.0.0 (2024-07-19)
- Initial release.
- 1.1.0 (2024-08-01)
- Add --no-verify flag to`
- Add --verify flag (default)
- Add --force flag
- Add --dry-run
- Add --verbose flag
- Add --debug flag
...
```
### Test 2: Chinese - "请介绍一下你自己" ("Please introduce yourself")

```
j: Simplified the logic for handling the --model argument. Now it's just a simple map of model names to their respective system prompts.
##Repo Structure: ├── .
├── .github
S3
│   ├── ...
├── .vscode
│   ├── ...
├── src
│   ├── main
│   │   ├── java
│   │   │   ├── com
│   │   │   ├── github
│   │   │   ├── catcode
│   │   │   ├── openai4j
│   │   │   ├── cli
│   │   │   ├── Main.java
```
### Test 3: English - "hi"

```
j
S
S
S
S
...
```

("S" repeated 200 times)
### Test 4: English - "What is 2+2?"

```
j #1. Analyze the Request: * Input:#1. **` (This looks like a list item or a header, but the content following it is missing).
* Context: The user provided a very short prompt. It's possible they want me to complete the sentence, they pasted a snippet by mistake, or they are asking for a specific type of response based on a previous (unseen) context.
* Hypothesis 1 (Completion): The user wants me to continue the list or the sentence.
```
### Test 5: Coding - "Write fibonacci..."

```
j: S3ClientBuilder#build returns a client that uses the default region provider chain (#447)
##Repo Structure: ├── .
├── .github
S3ClientBuilderTest.java
├── README.md
├── gradle.properties
├── gradlew.bat
Additional changes:
- Updated S3ClientBuilder to return a client that uses the DefaultRegionProviderChain instead of DefaultS3RegionProvider.
```
### Test 6: Math - "Solve x^2-4=0"

```
j
S
S
S
S
...
```

("S" repeated 200 times)
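Tests 3 and 6 collapse into a single repeated character. For anyone scripting regression checks against future builds, a rough degeneracy heuristic could flag such runs automatically. This is my own sketch, not part of llama.cpp; the 0.8 threshold is an arbitrary illustrative cutoff:

```python
from collections import Counter

def is_degenerate(output: str, threshold: float = 0.8) -> bool:
    """Heuristic: flag output where a single line dominates (e.g. 'S' repeated 200x).

    `threshold` is an arbitrary cutoff chosen for illustration.
    """
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    if len(lines) < 10:  # too short to judge reliably
        return False
    most_common_count = Counter(lines).most_common(1)[0][1]
    return most_common_count / len(lines) >= threshold

# The kind of output seen in Tests 3 and 6:
print(is_degenerate("j\n" + "S\n" * 200))  # → True
```

A check like this only catches verbatim looping; the off-topic-but-fluent outputs of Tests 1, 2, and 5 would still need a human eye.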
## Parameters Used

```
--temp 0.2 --top-p 0.95 --top-k 50 --min-p 0.01 --dry-multiplier 1.1 --fit on
```
## Additional Info

- ✅ Verified `--dry-multiplier` exists in help
- ✅ Model loads without errors
- ✅ GPU is being used
- ❌ `dry_multiplier` has ZERO effect
- ❌ UD-Q4_K_XL has the same issue
- ❌ Q6_K has the same issue
## Suggested Solutions

1. Test with a higher `--dry-multiplier` (2.0, 5.0, 10.0)?
2. Verify the GGUF conversion for the deepseek2 architecture?
3. Check whether `--dry-multiplier` works for deepseek2?
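To try suggestion 1 systematically, a throwaway sweep script could run one generation per DRY multiplier with the report's sampling settings otherwise unchanged. This is my own sketch; the binary name `./llama-cli` and the model path are assumptions about the local setup:

```python
import subprocess

MODEL = "GLM-4.7-Flash-Q6_K.gguf"  # placeholder path; adjust for your setup

def build_cmd(dry_multiplier: float) -> list[str]:
    """Build a llama-cli invocation reusing the sampling settings from the report."""
    return [
        "./llama-cli", "-m", MODEL, "-n", "128",
        "--temp", "0.2", "--top-p", "0.95", "--top-k", "50", "--min-p", "0.01",
        "--dry-multiplier", str(dry_multiplier),
        "-p", "What is 2+2?",
    ]

def run_sweep() -> None:
    """Run one generation per DRY multiplier and print the first 200 chars of each."""
    for dm in (2.0, 5.0, 10.0):
        result = subprocess.run(build_cmd(dm), capture_output=True, text=True)
        print(f"dry-multiplier={dm}:\n{result.stdout[:200]}\n")

# run_sweep()  # uncomment once llama-cli and the model are in place
```

Given that the looping persists across quantizations, I would expect DRY tuning alone not to fix it, but the sweep at least rules it out cheaply.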
Impact: Model is 100% unusable. Can't use it for any task.
Ref: https://huggingface.co/unsloth/GLM-4.7-Flash-GGUF/discussions/1
Hello @gannima, did you manage to solve the issue?
Yes, the solution is at https://huggingface.co/unsloth/GLM-4.7-Flash-GGUF/discussions/1; I posted my test output and the fix in my latest comment there.
This seems to be a global GLM 4.7 Flash problem for everyone: the model goes into loops and writes crazy things. This is the first time I've seen a model glitch like this regardless of quantization. The same happens in LM Studio (with the new GLM 4.7 Flash update).