tch-ai / README.md

feat(ai): Gemma 4 31B primary with retry + thinking model extraction

	---
	title: TwoCentsHustler AI
	emoji: 📈
	colorFrom: blue
	colorTo: indigo
	sdk: docker
	pinned: false
	license: apache-2.0
	---

	# TwoCentsHustler AI Space

	Runs local inference on the `cpu-basic` hardware tier (free, unlimited), loading `gemma-4-E4B-it-Q4_K_M.gguf` (~2.7 GB) via `llama-cpp-python`.

	It serves as a fallback provider for the TwoCentsHustler financial news platform.

	## Endpoint

	`POST /api/ai` with a JSON body: `{ "operation": "analyze" | "summarize" | "cluster", "payload": {...} }`
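
A minimal client sketch for this endpoint, using only the Python standard library. Several things here are assumptions rather than documented behavior: the base URL is a placeholder, the `payload` contents are left to the caller because the schema is not described in this README, and the response is assumed to be JSON. The function names `build_request` and `call_ai` are hypothetical. The default timeout is chosen to leave headroom over the 20-40 s latency noted under Hardware.

```python
import json
import urllib.request

# Operations listed in the endpoint description.
ALLOWED_OPERATIONS = ("analyze", "summarize", "cluster")


def build_request(base_url: str, operation: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/ai."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation!r}")
    body = json.dumps({"operation": operation, "payload": payload}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/ai",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_ai(base_url: str, operation: str, payload: dict, timeout: float = 120.0) -> dict:
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_request(base_url, operation, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating `build_request` from `call_ai` lets the request shape be inspected or unit-tested without touching the network. A call might look like `call_ai("https://example.hf.space", "summarize", {...})`, with the placeholder URL replaced by the Space's real address.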

	## Environment Variables

	| Variable | Default | Description |
	|----------|---------|-------------|
	| `GGUF_REPO` | `unsloth/gemma-4-E4B-it-GGUF` | HF repo containing the GGUF file |
	| `GGUF_FILE` | `gemma-4-E4B-it-Q4_K_M.gguf` | Quantization variant to load |
	| `N_THREADS` | `2` | CPU threads for inference |
	| `N_CTX` | `4096` | Context window size |
	| `HF_TOKEN` | (none) | Optional: for gated models |
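
The Space's actual startup code is not shown in this README, so the following is only a sketch of how these variables could be read and passed to `llama-cpp-python`. The names `Settings`, `load_settings`, and `load_model` are hypothetical; the defaults are the ones from the table, and the download step assumes the `huggingface_hub` package is available.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    gguf_repo: str
    gguf_file: str
    n_threads: int
    n_ctx: int
    hf_token: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    return Settings(
        gguf_repo=env.get("GGUF_REPO", "unsloth/gemma-4-E4B-it-GGUF"),
        gguf_file=env.get("GGUF_FILE", "gemma-4-E4B-it-Q4_K_M.gguf"),
        n_threads=int(env.get("N_THREADS", "2")),
        n_ctx=int(env.get("N_CTX", "4096")),
        hf_token=env.get("HF_TOKEN") or None,  # unset or empty means no token
    )


def load_model(settings: Settings):
    """Download the GGUF file and open it with llama-cpp-python."""
    # Imported lazily so load_settings works without these packages installed.
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    model_path = hf_hub_download(
        repo_id=settings.gguf_repo,
        filename=settings.gguf_file,
        token=settings.hf_token,
    )
    return Llama(
        model_path=model_path,
        n_ctx=settings.n_ctx,
        n_threads=settings.n_threads,
    )
```

Passing the environment as a mapping keeps `load_settings` easy to test: `load_settings({"N_THREADS": "4"})` overrides one value and leaves the rest at their defaults.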

	## Hardware

	`cpu-basic`: 2 vCPU, 16 GB RAM.
	Inference takes roughly 20-40 s per call.