Switch to Qwen-1.5-1.8B-Chat - verified multilingual model with good Indic support (3862877, hardkpentium101 and Qwen-Coder, committed 27 days ago)
Pass HF_TOKEN explicitly to model loading - more reliable in Docker (aefd0f7, hardkpentium101 and Qwen-Coder, committed 27 days ago)
Switch to AI4Bharat IndicLLM - better support for 11 Indic languages (057cc64, hardkpentium101 and Qwen-Coder, committed 27 days ago)
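Commit aefd0f7 passes HF_TOKEN explicitly rather than relying on an implicit login, which can be missing inside a Docker container. A minimal sketch of that pattern; the helper name and model ID are illustrative, not taken from the repo:

```python
import os

def auth_kwargs() -> dict:
    """Read HF_TOKEN from the environment and return kwargs to pass
    explicitly to from_pretrained(); implicit CLI login state is often
    absent inside a Docker container."""
    token = os.environ.get("HF_TOKEN")
    return {"token": token} if token else {}

# Intended use (network call, shown for context only):
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-1.8B-Chat", **auth_kwargs())
```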
Simplify validation - use general patterns, not hardcoded lists (dd9966e, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Set generation_config on model only, not passed to pipeline - fixes duplicate arg error (53643df, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Suppress max_new_tokens/max_length warning - cosmetic only (e9cffd1, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Create gen_config once, pass to pipeline once - clean implementation (288b50e, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Use GenerationConfig object passed to pipeline - proper format (bff2384, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Pass generation params directly to pipeline - no GenerationConfig object (2bbaeb3, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Remove redundant config override - set generation_config once (b0da3b5, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Force override model.config max_length and max_new_tokens to fix warning (52b77b9, hardkpentium101 and Qwen-Coder, committed 28 days ago)
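The run of commits from 52b77b9 through 53643df is a back-and-forth over where generation parameters live; the resolution (53643df) is to build one GenerationConfig and set it on the model rather than also passing the same kwargs to pipeline(). A sketch of that final shape, with parameter values taken from the commit subjects (max_new_tokens=1024, temperature=0.9, top_p=0.92); assumes transformers is installed:

```python
from transformers import GenerationConfig

gen_config = GenerationConfig(
    max_new_tokens=1024,  # use max_new_tokens, not the conflicting max_length
    do_sample=True,
    temperature=0.9,
    top_p=0.92,
)

# Set it once on the model; do NOT also pass these kwargs to pipeline(),
# otherwise transformers raises a duplicate-argument error:
# model.generation_config = gen_config
# pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
```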
Use structured prompt format (CONTEXT/OBJECTIVE/STYLE/TONE/AUDIENCE/RESPONSE) (bea43dd, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Simplify prompt and validation - robust creative output (e6027ef, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Fix syntax error: missing quote in invalid_patterns (9a520d5, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Use comprehensive creative writer prompt with strict output validation (1c91cf7, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Fix prompt leakage: simpler prompt format, stricter filtering for exact leakage patterns (3b06a13, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Set max_length=None in GenerationConfig to override model defaults (9164057, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Fix max_length conflict by explicitly setting max_length=None in pipeline (355f389, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Add poem/story specifications: type, theme, length, style (5737e4a, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Add output validation and strengthen prompt against meta-commentary (e9c1b7a, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Add user-selected language to prompt for proper language response (5c2c171, hardkpentium101 and Qwen-Coder, committed 28 days ago)
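Commits bea43dd, 5737e4a, and 5c2c171 converge on a structured prompt carrying the poem/story specifications and the user-selected language. A sketch of such a builder; the function name, parameter names, and defaults are assumptions, only the section headings and fields come from the commit subjects:

```python
def build_prompt(context: str, topic: str, language: str,
                 kind: str = "poem", length: str = "short",
                 style: str = "lyrical", tone: str = "warm",
                 audience: str = "general readers") -> str:
    """Structured CONTEXT/OBJECTIVE/STYLE/TONE/AUDIENCE/RESPONSE prompt;
    the target language is stated explicitly so the model answers in it."""
    return (
        f"CONTEXT: {context}\n"
        f"OBJECTIVE: Write a {length} {kind} about {topic} in {language}.\n"
        f"STYLE: {style}\n"
        f"TONE: {tone}\n"
        f"AUDIENCE: {audience}\n"
        f"RESPONSE:"
    )
```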
Restructure prompt for direct creative output, improve meta-commentary filtering (5d1a0cf, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Improve output cleaning: remove </s>, [INST], bracket numbers (bdbc8d1, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Strengthen prompt: never refer to or explain context (eae5493, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Add output cleaning to filter informal/garbled text (ff4f2a3, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Update prompt for creative writing, increase temperature to 0.9 and top_p to 0.92 (bebbcff, hardkpentium101 and Qwen-Coder, committed 28 days ago)
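Several commits around here (ff4f2a3, bdbc8d1, 5d1a0cf) add output cleaning: strip chat-template markers such as </s> and [INST], drop bracketed numbers, and filter meta-commentary lines. A minimal sketch of that kind of cleaner; the exact marker list and meta-commentary patterns the repo uses are assumptions:

```python
import re

STOP_MARKERS = ["</s>", "[INST]", "[/INST]"]  # assumed marker list

def clean_output(text: str) -> str:
    """Strip template markers, bracketed numbers like [1], and lines
    that look like meta-commentary about the prompt or context."""
    for marker in STOP_MARKERS:
        text = text.replace(marker, "")
    text = re.sub(r"\[\d+\]", "", text)
    lines = [
        ln for ln in text.splitlines()
        if not re.match(r"\s*(Note:|As an AI|Based on the context)", ln, re.I)
    ]
    return "\n".join(lines).strip()
```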
Set generation_config on model directly to avoid duplicate param error (177f8b5, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Use GenerationConfig to avoid parameter conflict warnings (d8b5182, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Fix max_length conflict, set max_new_tokens to 1024 (93d3286, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Increase max_tokens to 4096, send top 3 docs, truncate context to 800 chars (cf6e686, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Increase max_new_tokens to 1024 for longer responses (27f1789, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Update prompt for Hindi literature expertise and multilingual support (8e3ac92, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Fix prompt template and increase max_new_tokens to 512 (c8b8fcf, hardkpentium101 and Qwen-Coder, committed 28 days ago)
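Commit cf6e686 bounds the RAG context by keeping only the top 3 retrieved documents and truncating each to 800 characters. A sketch of that step; the function name and joining convention are assumptions, the limits come from the commit subject:

```python
def build_context(docs: list[str], top_k: int = 3, max_chars: int = 800) -> str:
    """Keep the top-k retrieved documents, truncating each to max_chars
    so the assembled prompt stays within the model's context window."""
    return "\n\n".join(doc[:max_chars] for doc in docs[:top_k])
```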
Fix CPU inference: auto-detect GPU, use float16 on CPU (7e8fd52, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Use bitsandbytes 4-bit quantization instead of AirLLM (more stable) (83eb81f, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Use AirLLM 4-bit quantization for Sarvam-1 (uses ~1.5GB RAM) (c47fb58, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Switch to TinyLlama-1.1B with float16 for lower memory (916bdad, hardkpentium101 and Qwen-Coder, committed 28 days ago)
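Commit 7e8fd52 auto-detects the GPU and picks a dtype accordingly. The subject line reads "use float16 on CPU", but float16 inference is generally a GPU-side choice, so the sketch below shows the conventional mapping (float16 on GPU, float32 on CPU) and is an assumption, not the repo's actual code:

```python
def pick_device_and_dtype(cuda_available: bool) -> tuple[str, str]:
    """Conventional mapping: half precision on GPU, full precision on CPU."""
    return ("cuda", "float16") if cuda_available else ("cpu", "float32")

# Intended use (torch/transformers calls shown for context only):
# device, dtype = pick_device_and_dtype(torch.cuda.is_available())
# model = AutoModelForCausalLM.from_pretrained(model_id,
#                                              torch_dtype=getattr(torch, dtype))
```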
Improve system prompt for better RAG responses (a059043, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Optimize generation params for HF free tier CPU (3343db3, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Remove local_files_only from pipeline - not valid for model_kwargs (404a31f, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Fix Sarvam-1 model loading: enable download and use correct dtype parameter (a9f11ed, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Fix duplicate local_files_only keyword argument in Sarvam-1 initialization (52796bf, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Pre-download models in Dockerfile, use cache at runtime (d69e53e, hardkpentium101 and Qwen-Coder, committed 29 days ago)
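Commit d69e53e bakes the model weights into the image at build time so runtime cold-starts read from the cache. A build-time script such a Dockerfile step might invoke (e.g. `RUN python predownload.py`); the script name, cache path, and model ID are assumptions (the ID follows commit 3862877's switch to Qwen-1.5-1.8B-Chat):

```python
"""Build-time pre-download so runtime serving reads weights from the cache."""
import os

MODEL_ID = "Qwen/Qwen1.5-1.8B-Chat"  # assumed; the repo's actual ID may differ

def predownload() -> None:
    # Imported lazily so this module can be inspected without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    os.environ.setdefault("HF_HOME", "/app/.cache/huggingface")
    AutoTokenizer.from_pretrained(MODEL_ID)
    AutoModelForCausalLM.from_pretrained(MODEL_ID)

if __name__ == "__main__":
    predownload()
```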