---
license: apache-2.0
tags:
- llama.cpp
- gguf
- gemma
- quantized
- cuda
language:
- en
pipeline_tag: text-generation
---

# llcuda Models

Optimized GGUF models for llcuda, a zero-config, CUDA-accelerated LLM inference package.

## Models

### google_gemma-3-1b-it-Q4_K_M.gguf

- **Model**: Google Gemma 3 1B Instruct
- **Quantization**: Q4_K_M (4-bit)
- **Size**: 769 MB
- **Use case**: General-purpose chat, Q&A, code assistance
- **Recommended for**: GPUs with 1 GB+ VRAM

**Performance:**
- Tesla T4 (Colab/Kaggle): ~15 tok/s
- Tesla P100 (Colab): ~18 tok/s
- GeForce 940M (1GB): ~15 tok/s
- RTX 30xx/40xx: ~25+ tok/s

## Usage

### With llcuda (Recommended)

```python
# Install first (shell command): pip install llcuda

import llcuda

engine = llcuda.InferenceEngine()
engine.load_model("gemma-3-1b-Q4_K_M")
result = engine.infer("What is AI?")
print(result.text)
```

### With llama.cpp

```bash
# Download the model
huggingface-cli download waqasm86/llcuda-models google_gemma-3-1b-it-Q4_K_M.gguf --local-dir ./models

# Run the llama.cpp server; -ngl 26 offloads 26 layers to the GPU
./llama-server -m ./models/google_gemma-3-1b-it-Q4_K_M.gguf -ngl 26
```
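
The same download can be scripted from Python; a minimal sketch using `huggingface_hub` (the `local_dir` value is just an example matching the CLI command above):

```python
from huggingface_hub import hf_hub_download

# Fetch the GGUF file from this repo; returns the local file path
model_path = hf_hub_download(
    repo_id="waqasm86/llcuda-models",
    filename="google_gemma-3-1b-it-Q4_K_M.gguf",
    local_dir="./models",  # example location, matching the CLI command above
)
print(model_path)  # pass this path to llama-server via -m
```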
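
Once the server is up, it exposes an OpenAI-compatible HTTP API; a minimal standard-library client sketch, assuming the default `127.0.0.1:8080` address:

```python
import json
import urllib.request

# Ask the running llama-server (default port 8080) a question via its
# OpenAI-compatible chat endpoint; assumes the server command above is running.
payload = {
    "messages": [{"role": "user", "content": "What is AI?"}],
    "max_tokens": 128,
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)
print(reply["choices"][0]["message"]["content"])
```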

## Supported Platforms

- ✅ Google Colab (T4, P100, V100, A100)
- ✅ Kaggle (Tesla T4)
- ✅ Local GPUs (GeForce, RTX, Tesla)
- ✅ All NVIDIA GPUs with compute capability 5.0+
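
To verify that last requirement on a local machine, one option is to ask PyTorch, assuming a CUDA-enabled build is installed (nvidia-smi works just as well):

```python
import torch  # assumes a CUDA-enabled PyTorch build is available

if torch.cuda.is_available():
    # get_device_capability returns (major, minor), e.g. (7, 5) for a Tesla T4
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{torch.cuda.get_device_name(0)}: compute capability {major}.{minor}")
else:
    print("No CUDA device visible to PyTorch")
```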

## Links

- **PyPI**: [pypi.org/project/llcuda](https://pypi.org/project/llcuda/)
- **GitHub**: [github.com/waqasm86/llcuda](https://github.com/waqasm86/llcuda)
- **Documentation**: [waqasm86.github.io](https://waqasm86.github.io/)

## License

Apache 2.0. Models are provided as-is for educational and research purposes.

## Credits

- Model: Google Gemma 3 1B
- Quantization: llama.cpp GGUF format
- Package: llcuda by Waqas Muhammad