Nikhil K. PRO

hexgridcloud · https://hexgrid.cloud

AI & ML interests

One-click deployment of open-source LLMs on managed and dedicated GPUs.

Recent Activity

New activity in google/gemma-4-31B-it 4 days ago

Benchmarked on HexGrid Cloud : Gemma-4 31B + vLLM + RTX 6000 PRO : 1168 tokens/sec and still asking for more...

#123 opened 4 days ago by

upvoted an article 5 days ago

Article

Gemma-4 31B + vLLM on RTX 6000 PRO : A Real-Load Benchmark

hexgridcloud • 5 days ago • 3

published an article 5 days ago

Article

Gemma-4 31B + vLLM on RTX 6000 PRO : A Real-Load Benchmark

hexgridcloud • 5 days ago • 3

New activity in Qwen/Qwen3.5-9B 10 days ago

Deployed on HexGrid Cloud: 1x RTX 5090 + Qwen3.5 9B BF16 — 1280 tok/s peak, then TTFT goes from 0.7s to 18s, ShareGPT, concurrency 16–128

#58 opened 19 days ago by

updated 3 collections 28 days ago

One-click LLM deployments on Private GPU

Every model deployable on HexGrid Cloud with one click. Dedicated GPU, private API endpoint, OpenAI-compatible. Visit https://hexgrid.cloud • 10 items • Updated 28 days ago

Best Open-Source Coding LLMs for Private Deployment

Code generation, debugging, review, and test writing. All deployable privately on dedicated GPUs at hexgrid.cloud • 3 items • Updated 28 days ago

Production-Ready Quantized Chat LLMs — 4-bit & 8-bit

FP8, AWQ-4Bit and W8A8 quantized versions of popular models. Lower VRAM, same production quality. Deploy at hexgrid.cloud in one click. • 9 items • Updated 28 days ago