Issue: GLM-4.7-Flash Q6_K - Completely unusable output (looping)

#3
by gannima - opened

Environment

  • Model: unsloth/GLM-4.7-Flash-GGUF (Q6_K, 23GB)
  • llama.cpp: build 7779 (commit 6df686bee)
  • Hardware: RTX 4090, 128 GB RAM

Issue

Model produces completely unusable output regardless of input:

| Test | Input | Output |
|---------|--------------------|---------------------|
| Chinese | "δ½ ε₯½" ("hello") | Version changelog |
| Chinese | "请介绍..." ("please introduce...") | Git repo structure |
| English | "hi" | "S" repeated 200x |
| English | "What is 2+2?" | Prompt analysis |
| Coding | "fibonacci..." | Git commit info |
| Math | "Solve x^2-4=0" | "S" repeated 200x |

Test output

Test 1: Chinese - "δ½ ε₯½" ("hello")

  • 1.0.0 (2024-07-19)
    • Initial release.
  • 1.1.0 (2024-08-01)
    • Add --no-verify flag to`
    • Add --verify flag (default)
    • Add --force flag
    • Add --dry-run
    • Add --verbose flag
    • Add --debug flag
      ...

Test 2: Chinese - "请介绍一下你θ‡ͺε·±" ("please introduce yourself")

j: Simplified the logic for handling the --model argument. Now it's just a simple map of model names to their respective system prompts.
##Repo Structure: └── .
β”œβ”€β”€ .github
S3
β”‚ └── ...
β”œβ”€β”€ .vscode
β”‚ └── ...
β”œβ”€β”€ src
β”‚ β”œβ”€β”€ main
β”‚ β”‚ β”œβ”€β”€ java
β”‚ β”‚ β”‚ └── com
β”‚ β”‚ β”‚ └── github
β”‚ β”‚ β”‚ └── catcode
β”‚ β”‚ β”‚ └── openai4j
β”‚ β”‚ β”‚ └── cli
β”‚ β”‚ β”‚ └── Main.java

Test 3: English - "hi"

j
S
S
S
S
...
(repeated 200 times)

Test 4: English - "What is 2+2?"

j #1. Analyze the Request: * Input:#1. **` (This looks like a list item or a header, but the content following it is missing).
* Context: The user provided a very short prompt. It's possible they want me to complete the sentence, they pasted a snippet by mistake, or they are asking for a specific type of response based on a previous (unseen) context.
* Hypothesis 1 (Completion): The user wants me to continue the list or the sentence.

Test 5: Coding - "Write fibonacci..."

j: S3ClientBuilder#build returns a client that uses the default region provider chain (#447)
##Repo Structure: └── .
β”œβ”€β”€ .github
S3ClientBuilderTest.java
β”œβ”€β”€ README.md
β”œβ”€β”€ gradle.properties
└── gradlew.bat
Additional changes:

  • Updated S3ClientBuilder to return a client that uses the DefaultRegionProviderChain instead of DefaultS3RegionProvider.

Test 6: Math - "Solve x^2-4=0"

j
S
S
S
S
S
S
S
...
(repeated 200 times)

Parameters Used

--temp 0.2 --top-p 0.95 --top-k 50 --min-p 0.01 --dry-multiplier 1.1 --fit on
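For readers unfamiliar with these flags, the filtering chain they request can be sketched in plain Python. This is a simplified illustration of the standard temperature / top-k / top-p / min-p steps, not llama.cpp's actual implementation; all function names here are mine:

```python
import math

def apply_temperature(logits, temp):
    # Lower temperature sharpens the distribution (the report uses 0.2).
    return [x / temp for x in logits]

def top_k_filter(logits, k):
    # Keep only the k highest logits; mask the rest.
    thresh = sorted(logits, reverse=True)[k - 1]
    return [x if x >= thresh else float("-inf") for x in logits]

def softmax(logits):
    m = max(x for x in logits if x != float("-inf"))
    exps = [math.exp(x - m) if x != float("-inf") else 0.0 for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    # Keep the smallest set of tokens whose cumulative probability
    # reaches top_p; zero out the rest.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return [p if i in kept else 0.0 for i, p in enumerate(probs)]

def min_p_filter(probs, min_p):
    # Drop tokens whose probability is below min_p times the top probability.
    cutoff = min_p * max(probs)
    return [p if p >= cutoff else 0.0 for p in probs]
```

With a temperature this low the sampler is nearly greedy, which is worth noting: it rules out "too much randomness" as the cause of the looping and points at the model/conversion instead.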

Additional Info

- βœ… Verified --dry-multiplier exists in help
- βœ… Model loads without errors
- βœ… GPU is being used
- ❌ dry_multiplier has ZERO effect
- ❌ UD-Q4_K_XL has same issue
- ❌ Q6_K has same issue
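For context on what `--dry-multiplier` is supposed to do: the DRY sampler penalizes tokens that would extend a sequence already seen in the context, with the penalty growing exponentially in the repetition length. A simplified sketch of that idea (token lists instead of real tokenizer IDs; names and defaults here are illustrative, not llama.cpp's code):

```python
def dry_match_length(context, candidate):
    # Length of the longest context suffix that already occurred earlier
    # in the context immediately followed by `candidate`.
    best = 0
    for i, tok in enumerate(context):
        if tok != candidate:
            continue
        length = 0
        while length < i and context[i - 1 - length] == context[-1 - length]:
            length += 1
        best = max(best, length)
    return best

def dry_penalty(context, candidate, multiplier=1.1, base=1.75, allowed_length=2):
    # No penalty for short, legitimate repeats; exponential growth beyond
    # allowed_length, which should crush loops like "S S S S ...".
    length = dry_match_length(context, candidate)
    if length < allowed_length:
        return 0.0
    return multiplier * base ** (length - allowed_length)
```

Even at multiplier 1.1, a token that has been repeated ten times in a row would receive a large penalty under this scheme, which is why a looping output with DRY enabled suggests the penalty is not being applied (or the underlying logits are already broken).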

Suggested Solutions

1. Test with higher dry-multiplier (2.0, 5.0, 10.0)?
2. Verify GGUF conversion for deepseek2 architecture?
3. Check if dry-multiplier works for deepseek2?
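To run these checks across quants and dry-multiplier values without eyeballing each transcript, a rough heuristic can flag outputs dominated by a single repeated line, like the "S repeated 200x" failure mode above. The thresholds are arbitrary and the function name is mine:

```python
def looks_degenerate(text, max_repeat_ratio=0.5, min_chars=20):
    # Flag output where one line accounts for most of the response,
    # e.g. "S" repeated 200 times; ignore very short outputs.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len("".join(lines)) < min_chars:
        return False
    most_common = max(set(lines), key=lines.count)
    return lines.count(most_common) / len(lines) > max_repeat_ratio
```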

Impact: the model is 100% unusable; it can't be used for any task. πŸ™

Ref: https://huggingface.co/unsloth/GLM-4.7-Flash-GGUF/discussions/1
gannima changed discussion status to closed

Hello @gannima did you manage to solve the issue?

Yes, the solution is at https://huggingface.co/unsloth/GLM-4.7-Flash-GGUF/discussions/1 ; I just posted my test output and the fix in my latest comment there.

gannima changed discussion status to open

@shimmyshimmer

Thank you, I got it. We'll investigate.

This looks like a global GLM 4.7 Flash problem affecting everyone: the model goes into loops and writes nonsense. This is the first time I've seen a model glitch like this, regardless of quantization. (Seen in LM Studio with the new GLM 4.7 Flash update.)
