theprint/TiTan-Gemma3-1B

theprint committed on Aug 12, 2025

Commit 63c9a30 (verified) · 1 Parent(s): dcb95bd

Update README.md

Files changed (1)

README.md +11 -10

@@ -29,6 +29,17 @@ This model is a fine-tuned version of google/gemma-3-1b-it using the Unsloth fra
 - **Base model:** google/gemma-3-1b-it
 - **Fine-tuning method:** LoRA with rank 128
+## GGUF Quantized Versions
+Quantized GGUF versions are available at [theprint/TiTan-Gemma3-1B-GGUF](https://huggingface.co/theprint/TiTan-Gemma3-1B-GGUF) for use with llama.cpp:
+- `TiTan-Gemma3-1B-f16.gguf` (2489.6 MB) - 16-bit float (original precision, largest file)
+- `TiTan-Gemma3-1B-q3_k_m.gguf` (850.9 MB) - 3-bit quantization (medium quality)
+- `TiTan-Gemma3-1B-q4_k_m.gguf` (966.7 MB) - 4-bit quantization (medium, recommended for most use cases)
+- `TiTan-Gemma3-1B-q5_k_m.gguf` (1027.9 MB) - 5-bit quantization (medium, good quality)
+- `TiTan-Gemma3-1B-q6_k.gguf` (1270.9 MB) - 6-bit quantization (high quality)
+- `TiTan-Gemma3-1B-q8_0.gguf` (1325.8 MB) - 8-bit quantization (very high quality)
 ## Intended Use
 Title and tag generation.
@@ -99,16 +110,6 @@ outputs = model.generate(inputs, max_new_tokens=256, temperature=0.7, do_sample=
 response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
 print(response)
 ```
-## GGUF Quantized Versions
-Quantized GGUF versions are available in the `gguf/` directory for use with llama.cpp:
-- `TiTan-Gemma3-1B-f16.gguf` (2489.6 MB) - 16-bit float (original precision, largest file)
-- `TiTan-Gemma3-1B-q3_k_m.gguf` (850.9 MB) - 3-bit quantization (medium quality)
-- `TiTan-Gemma3-1B-q4_k_m.gguf` (966.7 MB) - 4-bit quantization (medium, recommended for most use cases)
-- `TiTan-Gemma3-1B-q5_k_m.gguf` (1027.9 MB) - 5-bit quantization (medium, good quality)
-- `TiTan-Gemma3-1B-q6_k.gguf` (1270.9 MB) - 6-bit quantization (high quality)
-- `TiTan-Gemma3-1B-q8_0.gguf` (1325.8 MB) - 8-bit quantization (very high quality)
 ### Using with llama.cpp
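
The quantized files listed in the README trade file size against quality. As a rough illustration of choosing among them, the sketch below picks the largest listed file that fits a given size budget. The file names and sizes are copied from the README list; the names `GGUF_FILES` and `pick_gguf` are hypothetical, and the rule "larger file means higher quality" is an assumption drawn from the README's own descriptions, not a guarantee. The budget here is on-disk size only and does not account for runtime memory.

```python
# File sizes in MB, copied from the README list in this commit.
GGUF_FILES = {
    "TiTan-Gemma3-1B-q3_k_m.gguf": 850.9,
    "TiTan-Gemma3-1B-q4_k_m.gguf": 966.7,
    "TiTan-Gemma3-1B-q5_k_m.gguf": 1027.9,
    "TiTan-Gemma3-1B-q6_k.gguf": 1270.9,
    "TiTan-Gemma3-1B-q8_0.gguf": 1325.8,
    "TiTan-Gemma3-1B-f16.gguf": 2489.6,
}


def pick_gguf(budget_mb):
    """Return the largest listed file whose size fits budget_mb, or None.

    Assumption (hypothetical helper): a larger file is treated as the
    higher-quality choice, following the README's descriptions.
    """
    fitting = [(size, name) for name, size in GGUF_FILES.items() if size <= budget_mb]
    if not fitting:
        return None
    # max() compares by size first, so this yields the largest fitting file.
    return max(fitting)[1]


if __name__ == "__main__":
    print(pick_gguf(1000))
```

The returned file name could then be passed as the `filename` argument to `hf_hub_download` from the `huggingface_hub` package, with `repo_id="theprint/TiTan-Gemma3-1B-GGUF"`, to fetch that file.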