Switch to Qwen-1.5-1.8B-Chat - verified multilingual model with good Indic support (3862877, hardkpentium101 and Qwen-Coder, committed 27 days ago)
Pass HF_TOKEN explicitly to model loading - more reliable in Docker (aefd0f7, hardkpentium101 and Qwen-Coder, committed 27 days ago)
Switch to AI4Bharat IndicLLM - better support for 11 Indic languages (057cc64, hardkpentium101 and Qwen-Coder, committed 27 days ago)
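Commit aefd0f7 passes HF_TOKEN explicitly rather than relying on an implicit login, which can be missing inside a Docker container. A minimal sketch of that pattern; the helper name and model ID are illustrative, not taken from the repo:

```python
import os

def auth_kwargs() -> dict:
    """Read HF_TOKEN from the environment and return kwargs to pass
    explicitly to from_pretrained(); implicit CLI login state is often
    absent inside a Docker container."""
    token = os.environ.get("HF_TOKEN")
    return {"token": token} if token else {}

# Intended use (network call, shown for context only):
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-1.8B-Chat", **auth_kwargs())
```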
Simplify validation - use general patterns, not hardcoded lists (dd9966e, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Set generation_config on model only, not passed to pipeline - fixes duplicate arg error (53643df, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Suppress max_new_tokens/max_length warning - cosmetic only (e9cffd1, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Create gen_config once, pass to pipeline once - clean implementation (288b50e, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Use GenerationConfig object passed to pipeline - proper format (bff2384, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Pass generation params directly to pipeline - no GenerationConfig object (2bbaeb3, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Remove redundant config override - set generation_config once (b0da3b5, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Force override model.config max_length and max_new_tokens to fix warning (52b77b9, hardkpentium101 and Qwen-Coder, committed 28 days ago)
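The run of commits from 52b77b9 through 53643df is a back-and-forth over where generation parameters live; the resolution (53643df) is to build one GenerationConfig and set it on the model rather than also passing the same kwargs to pipeline(). A sketch of that final shape, with parameter values taken from the commit subjects (max_new_tokens=1024, temperature=0.9, top_p=0.92); assumes transformers is installed:

```python
from transformers import GenerationConfig

gen_config = GenerationConfig(
    max_new_tokens=1024,  # use max_new_tokens, not the conflicting max_length
    do_sample=True,
    temperature=0.9,
    top_p=0.92,
)

# Set it once on the model; do NOT also pass these kwargs to pipeline(),
# otherwise transformers raises a duplicate-argument error:
# model.generation_config = gen_config
# pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
```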
Use structured prompt format (CONTEXT/OBJECTIVE/STYLE/TONE/AUDIENCE/RESPONSE) (bea43dd, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Simplify prompt and validation - robust creative output (e6027ef, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Fix syntax error: missing quote in invalid_patterns (9a520d5, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Use comprehensive creative writer prompt with strict output validation (1c91cf7, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Fix prompt leakage: simpler prompt format, stricter filtering for exact leakage patterns (3b06a13, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Set max_length=None in GenerationConfig to override model defaults (9164057, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Fix max_length conflict by explicitly setting max_length=None in pipeline (355f389, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Add poem/story specifications: type, theme, length, style (5737e4a, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Add output validation and strengthen prompt against meta-commentary (e9c1b7a, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Add user-selected language to prompt for proper language response (5c2c171, hardkpentium101 and Qwen-Coder, committed 28 days ago)
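Commits bea43dd, 5737e4a, and 5c2c171 converge on a structured prompt carrying the poem/story specifications and the user-selected language. A sketch of such a builder; the function name, parameter names, and defaults are assumptions, only the section headings and fields come from the commit subjects:

```python
def build_prompt(context: str, topic: str, language: str,
                 kind: str = "poem", length: str = "short",
                 style: str = "lyrical", tone: str = "warm",
                 audience: str = "general readers") -> str:
    """Structured CONTEXT/OBJECTIVE/STYLE/TONE/AUDIENCE/RESPONSE prompt;
    the target language is stated explicitly so the model answers in it."""
    return (
        f"CONTEXT: {context}\n"
        f"OBJECTIVE: Write a {length} {kind} about {topic} in {language}.\n"
        f"STYLE: {style}\n"
        f"TONE: {tone}\n"
        f"AUDIENCE: {audience}\n"
        f"RESPONSE:"
    )
```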
Restructure prompt for direct creative output, improve meta-commentary filtering (5d1a0cf, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Improve output cleaning: remove </s>, [INST], bracket numbers (bdbc8d1, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Strengthen prompt: never refer to or explain context (eae5493, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Add output cleaning to filter informal/garbled text (ff4f2a3, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Update prompt for creative writing, increase temperature to 0.9 and top_p to 0.92 (bebbcff, hardkpentium101 and Qwen-Coder, committed 28 days ago)
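Several commits around here (ff4f2a3, bdbc8d1, 5d1a0cf) add output cleaning: strip chat-template markers such as </s> and [INST], drop bracketed numbers, and filter meta-commentary lines. A minimal sketch of that kind of cleaner; the exact marker list and meta-commentary patterns the repo uses are assumptions:

```python
import re

STOP_MARKERS = ["</s>", "[INST]", "[/INST]"]  # assumed marker list

def clean_output(text: str) -> str:
    """Strip template markers, bracketed numbers like [1], and lines
    that look like meta-commentary about the prompt or context."""
    for marker in STOP_MARKERS:
        text = text.replace(marker, "")
    text = re.sub(r"\[\d+\]", "", text)
    lines = [
        ln for ln in text.splitlines()
        if not re.match(r"\s*(Note:|As an AI|Based on the context)", ln, re.I)
    ]
    return "\n".join(lines).strip()
```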
Set generation_config on model directly to avoid duplicate param error (177f8b5, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Use GenerationConfig to avoid parameter conflict warnings (d8b5182, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Fix max_length conflict, set max_new_tokens to 1024 (93d3286, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Increase max_tokens to 4096, send top 3 docs, truncate context to 800 chars (cf6e686, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Increase max_new_tokens to 1024 for longer responses (27f1789, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Update prompt for Hindi literature expertise and multilingual support (8e3ac92, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Fix prompt template and increase max_new_tokens to 512 (c8b8fcf, hardkpentium101 and Qwen-Coder, committed 28 days ago)
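Commit cf6e686 bounds the RAG context by keeping only the top 3 retrieved documents and truncating each to 800 characters. A sketch of that step; the function name and joining convention are assumptions, the limits come from the commit subject:

```python
def build_context(docs: list[str], top_k: int = 3, max_chars: int = 800) -> str:
    """Keep the top-k retrieved documents, truncating each to max_chars
    so the assembled prompt stays within the model's context window."""
    return "\n\n".join(doc[:max_chars] for doc in docs[:top_k])
```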
Fix CPU inference: auto-detect GPU, use float16 on CPU (7e8fd52, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Use bitsandbytes 4-bit quantization instead of AirLLM (more stable) (83eb81f, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Use AirLLM 4-bit quantization for Sarvam-1 (uses ~1.5GB RAM) (c47fb58, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Switch to TinyLlama-1.1B with float16 for lower memory (916bdad, hardkpentium101 and Qwen-Coder, committed 28 days ago)
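Commit 7e8fd52 auto-detects the GPU and picks a dtype accordingly. The subject line reads "use float16 on CPU", but float16 inference is generally a GPU-side choice, so the sketch below shows the conventional mapping (float16 on GPU, float32 on CPU) and is an assumption, not the repo's actual code:

```python
def pick_device_and_dtype(cuda_available: bool) -> tuple[str, str]:
    """Conventional mapping: half precision on GPU, full precision on CPU."""
    return ("cuda", "float16") if cuda_available else ("cpu", "float32")

# Intended use (torch/transformers calls shown for context only):
# device, dtype = pick_device_and_dtype(torch.cuda.is_available())
# model = AutoModelForCausalLM.from_pretrained(model_id,
#                                              torch_dtype=getattr(torch, dtype))
```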
Improve system prompt for better RAG responses (a059043, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Optimize generation params for HF free tier CPU (3343db3, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Remove local_files_only from pipeline - not valid for model_kwargs (404a31f, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Fix Sarvam-1 model loading: enable download and use correct dtype parameter (a9f11ed, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Fix duplicate local_files_only keyword argument in Sarvam-1 initialization (52796bf, hardkpentium101 and Qwen-Coder, committed 28 days ago)
Pre-download models in Dockerfile, use cache at runtime (d69e53e, hardkpentium101 and Qwen-Coder, committed 29 days ago)
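Commit d69e53e bakes the model weights into the image at build time so runtime cold-starts read from the cache. A build-time script such a Dockerfile step might invoke (e.g. `RUN python predownload.py`); the script name, cache path, and model ID are assumptions (the ID follows commit 3862877's switch to Qwen-1.5-1.8B-Chat):

```python
"""Build-time pre-download so runtime serving reads weights from the cache."""
import os

MODEL_ID = "Qwen/Qwen1.5-1.8B-Chat"  # assumed; the repo's actual ID may differ

def predownload() -> None:
    # Imported lazily so this module can be inspected without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    os.environ.setdefault("HF_HOME", "/app/.cache/huggingface")
    AutoTokenizer.from_pretrained(MODEL_ID)
    AutoModelForCausalLM.from_pretrained(MODEL_ID)

if __name__ == "__main__":
    predownload()
```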