Commit History

Switch to Qwen-1.5-1.8B-Chat - verified multilingual model with good Indic support
3862877

hardkpentium101 Qwen-Coder committed on

Pass HF_TOKEN explicitly to model loading - more reliable in Docker
aefd0f7

hardkpentium101 Qwen-Coder committed on
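Passing the token explicitly matters in Docker because the cached `huggingface-cli login` credential from a dev machine is not present inside a fresh container. A minimal sketch of the idea, assuming the `HF_TOKEN` environment variable named in this Space's setup and a helper name (`hf_load_kwargs`) invented for illustration:

```python
import os

def hf_load_kwargs(env=None):
    """Build kwargs for a from_pretrained() call, forwarding the HF token
    explicitly instead of relying on an implicit CLI login session."""
    env = os.environ if env is None else env
    kwargs = {}
    token = env.get("HF_TOKEN")
    if token:
        kwargs["token"] = token  # from_pretrained accepts a `token=` kwarg
    return kwargs

# Usage (model id taken from the commit above):
# model = AutoModelForCausalLM.from_pretrained(
#     "Qwen/Qwen1.5-1.8B-Chat", **hf_load_kwargs())
```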

Switch to AI4Bharat IndicLLM - better support for 11 Indic languages
057cc64

hardkpentium101 Qwen-Coder committed on

Update max_new_tokens to 1024
a9b1188

hardkpentium101 committed on

Simplify validation - use general patterns, not hardcoded lists
dd9966e

hardkpentium101 Qwen-Coder committed on

Set generation_config on model only, not passed to pipeline - fixes duplicate arg error
53643df

hardkpentium101 Qwen-Coder committed on
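The duplicate-argument error arises when the same generation parameters are given both on the model and to `pipeline()`. A sketch of the fix this commit describes, assuming the Qwen model id used elsewhere in this history and the sampling values from a later commit (temperature 0.9, top_p 0.92, max_new_tokens 1024); not the verbatim Space code:

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          GenerationConfig, pipeline)

model_id = "Qwen/Qwen1.5-1.8B-Chat"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach generation settings to the model itself...
model.generation_config = GenerationConfig(
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.9,
    top_p=0.92,
)

# ...and do NOT pass the same parameters to pipeline() as well;
# supplying them in both places is what raised the duplicate-arg error.
gen = pipeline("text-generation", model=model, tokenizer=tok)
```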

Suppress max_new_tokens/max_length warning - cosmetic only
e9cffd1

hardkpentium101 Qwen-Coder committed on

Create gen_config once, pass to pipeline once - clean implementation
288b50e

hardkpentium101 Qwen-Coder committed on

Use GenerationConfig object passed to pipeline - proper format
bff2384

hardkpentium101 Qwen-Coder committed on

Pass generation params directly to pipeline - no GenerationConfig object
2bbaeb3

hardkpentium101 Qwen-Coder committed on

Remove redundant config override - set generation_config once
b0da3b5

hardkpentium101 Qwen-Coder committed on

Force override model.config max_length and max_new_tokens to fix warning
52b77b9

hardkpentium101 Qwen-Coder committed on

Use structured prompt format (CONTEXT/OBJECTIVE/STYLE/TONE/AUDIENCE/RESPONSE)
bea43dd

hardkpentium101 Qwen-Coder committed on
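The section names come from the commit message; how they are assembled into the final prompt is an assumption, sketched here with a hypothetical `build_prompt` helper (the wording of the trailing `RESPONSE:` cue is likewise assumed):

```python
def build_prompt(context, objective, style, tone, audience):
    """Assemble the structured CONTEXT/OBJECTIVE/STYLE/TONE/AUDIENCE/RESPONSE
    prompt; a bare RESPONSE: cue at the end invites the model to answer
    directly rather than echo the scaffolding."""
    return (
        f"CONTEXT: {context}\n"
        f"OBJECTIVE: {objective}\n"
        f"STYLE: {style}\n"
        f"TONE: {tone}\n"
        f"AUDIENCE: {audience}\n"
        "RESPONSE:"
    )
```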

Simplify prompt and validation - robust creative output
e6027ef

hardkpentium101 Qwen-Coder committed on

Fix syntax error: missing quote in invalid_patterns
9a520d5

hardkpentium101 Qwen-Coder committed on

Use comprehensive creative writer prompt with strict output validation
1c91cf7

hardkpentium101 Qwen-Coder committed on

Fix prompt leakage: simpler prompt format, stricter filtering for exact leakage patterns
3b06a13

hardkpentium101 Qwen-Coder committed on

Set max_length=None in GenerationConfig to override model defaults
9164057

hardkpentium101 Qwen-Coder committed on

Fix max_length conflict by explicitly setting max_length=None in pipeline
355f389

hardkpentium101 Qwen-Coder committed on

Add poem/story specifications: type, theme, length, style
5737e4a

hardkpentium101 Qwen-Coder committed on

Add output validation and strengthen prompt against meta-commentary
e9c1b7a

hardkpentium101 Qwen-Coder committed on

Add user-selected language to prompt for proper language response
5c2c171

hardkpentium101 Qwen-Coder committed on

Restructure prompt for direct creative output, improve meta-commentary filtering
5d1a0cf

hardkpentium101 Qwen-Coder committed on

Improve output cleaning: remove </s>, [INST], bracket numbers
bdbc8d1

hardkpentium101 Qwen-Coder committed on
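A minimal sketch of the cleaning pass this commit describes, covering the three artifact classes it names (`</s>` end-of-sequence tags, `[INST]`/`[/INST]` chat markers, bracketed numbers); the exact regexes in the Space may differ:

```python
import re

def clean_output(text):
    """Strip generation artifacts: EOS tags, instruction markers,
    and citation-style bracketed numbers like [1]."""
    text = text.replace("</s>", "")
    text = re.sub(r"\[/?INST\]", "", text)   # [INST] and [/INST]
    text = re.sub(r"\[\d+\]", "", text)      # bracket numbers
    # Collapse the double spaces left behind by the removals.
    return re.sub(r"[ \t]{2,}", " ", text).strip()
```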

Strengthen prompt: never refer to or explain context
eae5493

hardkpentium101 Qwen-Coder committed on

Add output cleaning to filter informal/garbled text
ff4f2a3

hardkpentium101 Qwen-Coder committed on

Update prompt for creative writing, increase temperature to 0.9 and top_p to 0.92
bebbcff

hardkpentium101 Qwen-Coder committed on

Set generation_config on model directly to avoid duplicate param error
177f8b5

hardkpentium101 Qwen-Coder committed on

Use GenerationConfig to avoid parameter conflict warnings
d8b5182

hardkpentium101 Qwen-Coder committed on

Fix max_length conflict, set max_new_tokens to 1024
93d3286

hardkpentium101 Qwen-Coder committed on

Increase max_tokens to 4096, send top 3 docs, truncate context to 800 chars
cf6e686

hardkpentium101 Qwen-Coder committed on
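The numbers here (top 3 documents, 800-character truncation) come straight from the commit; the join logic is an assumed sketch with a hypothetical helper name:

```python
def build_context(docs, top_k=3, max_chars=800):
    """Join the top-k retrieved documents into one RAG context string,
    truncating each document to max_chars so the prompt stays within
    the generation budget (max_tokens 4096 per the commit)."""
    return "\n\n".join(d[:max_chars] for d in docs[:top_k])
```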

Increase max_new_tokens to 1024 for longer responses
27f1789

hardkpentium101 Qwen-Coder committed on

Update prompt for Hindi literature expertise and multilingual support
8e3ac92

hardkpentium101 Qwen-Coder committed on

Fix prompt template and increase max_new_tokens to 512
c8b8fcf

hardkpentium101 Qwen-Coder committed on

Fix CPU inference: auto-detect GPU, use float16 on CPU
7e8fd52

hardkpentium101 Qwen-Coder committed on
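A sketch of the auto-detect logic, with one caveat: float16 halves memory on GPU, but many CPU builds lack fast float16 kernels, so float32 is the usual CPU fallback; the dtype split below is that convention, an assumption rather than a verbatim reading of commit 7e8fd52:

```python
def pick_runtime(cuda_available):
    """Choose (device, dtype name) based on GPU availability:
    float16 on CUDA for memory savings, float32 on CPU for kernel support."""
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "float32"

# Usage with torch (not imported here to keep the sketch self-contained):
# device, dtype = pick_runtime(torch.cuda.is_available())
# model = AutoModelForCausalLM.from_pretrained(
#     model_id, torch_dtype=getattr(torch, dtype)).to(device)
```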

Use bitsandbytes 4-bit quantization instead of AirLLM (more stable)
83eb81f

hardkpentium101 Qwen-Coder committed on
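A typical bitsandbytes 4-bit setup for comparison; the NF4 quant type and compute dtype below are common defaults, not necessarily the exact flags used in commit 83eb81f, and the Sarvam-1 model id is assumed from the surrounding messages:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization: weights stored in 4 bits, matmuls computed
# in float16, cutting the ~2B-parameter model to roughly 1.5 GB.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "sarvamai/sarvam-1",  # model id assumed
    quantization_config=bnb,
)
```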

Add airllm and optimum to requirements
4e571e5

hardkpentium101 Qwen-Coder committed on

Use AirLLM 4-bit quantization for Sarvam-1 (uses ~1.5GB RAM)
c47fb58

hardkpentium101 Qwen-Coder committed on

Switch to TinyLlama-1.1B with float16 for lower memory
916bdad

hardkpentium101 Qwen-Coder committed on

Tighten prompt to reduce meta-commentary
4dd4fff

hardkpentium101 Qwen-Coder committed on

Improve system prompt for better RAG responses
a059043

hardkpentium101 Qwen-Coder committed on

Add debug logging for prompt/response
71ceb5b

hardkpentium101 Qwen-Coder committed on

Optimize generation params for HF free tier CPU
3343db3

hardkpentium101 Qwen-Coder committed on

Remove local_files_only from pipeline - not valid for model_kwargs
404a31f

hardkpentium101 Qwen-Coder committed on

Fix Sarvam-1 model loading: enable download and use correct dtype parameter
a9f11ed

hardkpentium101 Qwen-Coder committed on

Fix duplicate local_files_only keyword argument in Sarvam-1 initialization
52796bf

hardkpentium101 Qwen-Coder committed on

Pre-download models in Dockerfile, use cache at runtime
d69e53e

hardkpentium101 Qwen-Coder committed on
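One common way to implement this pre-download step is a small build-time script invoked from the Dockerfile (e.g. `RUN python download_models.py`), so the weights land in the Hugging Face cache inside an image layer and the container starts without a cold download. The script name and model id below are assumptions:

```python
# download_models.py — run at image build time so model weights are
# baked into the HF cache layer and reused at runtime.
from huggingface_hub import snapshot_download

snapshot_download("Qwen/Qwen1.5-1.8B-Chat")
```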