	---
	base_model:
	- naver-hyperclovax/HyperCLOVAX-SEED-Think-32B
	---

	Thanks to naver-hyperclovax



	# HyperCLOVA X SEED 32B Think - 4bit Quantized
	This is a 4-bit quantized version of [naver-hyperclovax/HyperCLOVAX-SEED-Think-32B](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B) using bitsandbytes NF4 quantization with double quantization for optimal memory efficiency.

	## Model Overview
	HyperCLOVA X SEED 32B Think is an advanced vision-language thinking model that extends the SEED Think 14B line.


	## Quantization Details
	- Quantization Method: bitsandbytes NF4 (NormalFloat 4-bit)
	- Compute dtype: bfloat16
	- Storage dtype: uint8
	- Double Quantization: Enabled
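
For reference, the settings above map onto keyword arguments of `transformers.BitsAndBytesConfig`. The sketch below is an assumption about how a checkpoint like this could be produced from the base model, not a record of the exact procedure used here. The names `NF4_KWARGS` and `build_quant_config` are illustrative only.

```python
# Keyword arguments mirroring the quantization details listed above.
# Dtypes are given as strings, which BitsAndBytesConfig resolves to torch dtypes.
NF4_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",          # NormalFloat 4-bit
    "bnb_4bit_compute_dtype": "bfloat16",  # dtype used for matmuls at runtime
    "bnb_4bit_quant_storage": "uint8",     # dtype used to pack the 4-bit weights
    "bnb_4bit_use_double_quant": True,     # also quantize the per-block scales
}


def build_quant_config():
    """Build the config object (hypothetical helper; needs transformers and torch)."""
    from transformers import BitsAndBytesConfig  # imported lazily on purpose

    return BitsAndBytesConfig(**NF4_KWARGS)


print(NF4_KWARGS["bnb_4bit_quant_type"])
```

Loading this repository does not require passing such a config, since the quantization config is stored in `config.json`. It would only matter when quantizing the base model yourself, via `from_pretrained(..., quantization_config=build_quant_config())`.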






	## Installation

	### Requirements

	```bash
	pip install torch transformers bitsandbytes accelerate
	```
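
Before loading the model, it can help to confirm that the packages above are importable. This is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper name, and the check only looks for the packages without importing them or verifying versions.

```python
from importlib.util import find_spec

# Top-level import names for the packages installed above.
REQUIRED = ("torch", "transformers", "bitsandbytes", "accelerate")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```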

	### Quick Start

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = "jjjssjs/HyperCLOVAX-SEED-Think-32B-4bit"

	# Load tokenizer
	tokenizer = AutoTokenizer.from_pretrained(
	    model_id,
	    trust_remote_code=True,
	    fix_mistral_regex=True,  # addresses the tokenizer regex warning
	)

	# Load quantized model (quantization config is in config.json)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    device_map="auto",
	    trust_remote_code=True,
	    torch_dtype=torch.bfloat16,
	)

	# Generate (the Korean prompt means "What is quantum mechanics?")
	inputs = tokenizer("양자역학이 뭐야?", return_tensors="pt").to(model.device)

	with torch.no_grad():
	    outputs = model.generate(
	        **inputs,
	        max_new_tokens=100,
	        do_sample=True,
	        temperature=0.7,
	        top_p=0.9,
	    )

	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```






	## Usage Examples

	### Basic Text Generation

	```python
	prompt = "Explain quantum computing in simple terms."

	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

	outputs = model.generate(
	    **inputs,
	    max_new_tokens=200,
	    do_sample=True,  # temperature and top_p only take effect when sampling
	    temperature=0.7,
	    top_p=0.9,
	)

	response = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(response)
	```

	### Image Understanding

	```python
	from PIL import Image

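	# Load image
	image = Image.open("example.jpg")

	# Prepare inputs
	# NOTE: the tokenizer call below encodes only the text, so `image` is never
	# passed to the model in this snippet. Image inputs have to go through the
	# model's multimodal preprocessing; check the base model card for details.
	text = "Describe this image in detail."
	inputs = tokenizer(text, return_tensors="pt").to(model.device)

	# Generate response
	outputs = model.generate(
	    **inputs,
	    max_new_tokens=150,
	)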

	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```

	### Multi-turn Conversation

	```python
	conversation = [
	    {"role": "user", "content": "What is machine learning?"},
	    {"role": "assistant", "content": "Machine learning is..."},
	    {"role": "user", "content": "Can you give me an example?"},
	]

	# Process conversation
	inputs = tokenizer.apply_chat_template(
	    conversation,
	    add_generation_prompt=True,  # end the prompt where the assistant reply starts
	    return_dict=True,  # return input_ids and attention_mask together
	    return_tensors="pt",
	).to(model.device)

	outputs = model.generate(**inputs, max_new_tokens=200)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```


	Features:
	- Reasoning mode with `<think>...</think>` output
	- Multi-turn conversation support
	- Image/Video understanding
	- Korean-centric reasoning
	- Long-context understanding (128K tokens)
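
Since reasoning mode wraps its chain of thought in `<think>...</think>`, it is often useful to separate that part from the final answer. The sketch below assumes the tags appear literally in the decoded text; if the tokenizer treats them as special tokens, decoding with `skip_special_tokens=False` may be needed to keep them. `split_reasoning` is a hypothetical helper, not part of the model's API.

```python
import re

# Matches the first <think>...</think> span, including newlines inside it.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer) extracted from a decoded generation."""
    match = _THINK_RE.search(text)
    if match is None:
        # No reasoning block found: treat the whole text as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


reasoning, answer = split_reasoning("<think>2 + 2 is 4.</think>The answer is 4.")
print(answer)  # The answer is 4.
```

In a multi-turn conversation, appending only `answer` to the history keeps earlier reasoning out of later prompts; whether that matches the model's chat template is worth checking against the base model card.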




	## Performance Considerations

	### Advantages of 4-bit Quantization

	- Memory Efficient: Fits on consumer GPUs
	- Fast Loading: ~8 seconds vs minutes for full precision
	- Cost Effective: No need for expensive A100 80GB GPUs
	- Practical Deployment: Suitable for edge devices and personal use
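
To put the memory claim in numbers, here is a rough back-of-the-envelope estimate of weight memory. It assumes a nominal 32 billion parameters, all stored as NF4, with a block size of 64 for the weights and 256 for the double-quantized scales (the defaults described for QLoRA-style NF4). In practice some layers stay in 16-bit and activations and the KV cache need extra memory, so real usage will be higher.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Weight storage in GB (1 GB = 1e9 bytes) for a given bits-per-parameter."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 32e9  # assumption: nominal parameter count taken from the "32B" name

# Each block of 64 weights needs one scale constant.
# Without double quantization the scale is a 32-bit float: 32 / 64 extra bits per weight.
bits_nf4 = 4 + 32 / 64
# With double quantization the scale is 8-bit, plus one 32-bit constant per 256 blocks.
bits_nf4_double = 4 + 8 / 64 + 32 / (64 * 256)

bf16_gb = weight_memory_gb(N_PARAMS, 16)
nf4_gb = weight_memory_gb(N_PARAMS, bits_nf4)
nf4_double_gb = weight_memory_gb(N_PARAMS, bits_nf4_double)

print(f"bf16:               {bf16_gb:.1f} GB")
print(f"NF4:                {nf4_gb:.1f} GB")
print(f"NF4 + double quant: {nf4_double_gb:.1f} GB")
```

Under these assumptions the weights shrink from about 64 GB in bfloat16 to roughly 16.5 GB.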

	### Trade-offs

	- Slight Quality Loss: Minor degradation in output quality compared to full precision
	- Inference Speed: ~4.5 tokens/sec (may vary by hardware)
	- Precision: 4-bit weights vs 16-bit (original)
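
Because throughput varies by hardware, it may be worth measuring it on your own setup and budgeting `max_new_tokens` accordingly, especially in reasoning mode where outputs can be long. The helpers below are a hypothetical sketch; `generate_fn` stands for any zero-argument callable that runs one generation.

```python
import time


def estimated_seconds(n_new_tokens, tokens_per_second=4.5):
    """Estimate generation time, defaulting to the ~4.5 tokens/sec quoted above."""
    return n_new_tokens / tokens_per_second


def measure_tokens_per_second(generate_fn, n_new_tokens):
    """Time one call to `generate_fn` and return the observed tokens per second."""
    start = time.perf_counter()
    generate_fn()
    elapsed = time.perf_counter() - start
    return n_new_tokens / elapsed


print(f"{estimated_seconds(200):.1f} s for 200 new tokens")
```

With a loaded model, `generate_fn` could be something like `lambda: model.generate(**inputs, max_new_tokens=200)`. Note that generation may stop before `max_new_tokens`, so counting the tokens actually produced gives a more accurate rate.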

	## Known Issues

	- Tokenizer warning about regex pattern (can be ignored or fixed with `fix_mistral_regex=True`)
	- Some vision packages may show import warnings (does not affect text-only inference)




	## Benchmark Results
	Note: Quantized model benchmarks pending. Performance may differ slightly from the original model.
	For original model benchmarks, see: [HyperCLOVAX-SEED-Think-32B](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Think-32B)

	## License
	This model is licensed under the HyperCLOVA X SEED 32B Think Model License Agreement.