Commit History

Enable concurrency_limit=10 for better parallel processing
0b2f34f

david167 committed on
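
The setting named in this commit is a one-liner in Gradio's request queue. A minimal sketch of where it plugs in, assuming the Gradio 4.x API (`generate` is a hypothetical handler, not the repo's actual code):

```python
# Hedged sketch, assuming Gradio 4.x: raise the queue's default concurrency
# limit so up to 10 generation requests are processed in parallel.
CONCURRENCY_LIMIT = 10  # value taken from the commit message

# import gradio as gr
# demo = gr.Interface(fn=generate, inputs="text", outputs="text")
# demo.queue(default_concurrency_limit=CONCURRENCY_LIMIT)
# demo.launch()
```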

Upgrade to Llama 3.1 8B-Instruct for better long-form content
6ea58d5

david167 committed on

Fix build error: Remove flash-attn dependency
7629837

david167 committed on

Speed optimizations: Switch to Mistral-7B + optimize generation params
fac0be2

david167 committed on

COMPLETE API REBUILD: ZERO TRUNCATION PRINCIPLE - Intelligent extraction, generous tokens, never cut content
7822d6f

david167 committed on

ULTRA CONSERVATIVE EXTRACTION: Find JSON array boundaries properly, extensive logging, no aggressive cutting
d82dc35

david167 committed on

FIX TRUNCATION: Improved response extraction logic, conservative cutting, detailed logging - NO MORE TRUNCATION && git push
f52c60e

david167 committed on

FIX TUPLE ISSUE: Return single string output instead of tuple - eliminates ('content', '') wrapper
1644c5e

david167 committed on

DIRECT CONTENT API: Return just the generated content, no API wrappers or tuples - perfect for client parsing
0460f5e

david167 committed on

ULTRA SIMPLE FIX: Remove ALL JSON components, use only Textbox inputs/outputs, no state anywhere
02333f2

david167 committed on

BULLETPROOF API: Remove ALL State components, use JSON inputs instead, proper input/output matching, ZERO GRADIO ERRORS
caf4bcb

david167 committed on

SIMPLE WORKING API: Fix Gradio interface issues, use simple Interface instead of Blocks, proper API structure
657d622

david167 committed on

Fix Gradio interface: Remove chatbot format issues, add proper API endpoint structure
b2df124

david167 committed on

ELEGANT API REWRITE: Clean architecture, smart token allocation, proper JSON extraction - eliminate placeholder generation
0cdc4eb

david167 committed on

MAXIMUM TOKEN SETTINGS: Use 131k context, 16k max_new_tokens, 2k min_tokens for CoT - eliminate all truncation
14f445d

david167 committed on

Aggressive fix for CoT truncation: increase min_new_tokens to 1500, suppress EOS token for CoT requests, cap max_new_tokens
b394386

david167 committed on
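
The anti-truncation settings in this commit (min_new_tokens of 1500, EOS suppression for chain-of-thought requests, a max_new_tokens cap) map onto Hugging Face `transformers` `generate()` keyword arguments roughly as below. A hedged sketch only: the parameter names are the real `generate()` API, but the surrounding model and tokenizer are assumed, and `EOS_TOKEN_ID` is a hypothetical placeholder:

```python
# Hedged sketch: kwargs mirroring the commit's anti-truncation settings.
# Parameter names follow the Hugging Face transformers generate() API.
EOS_TOKEN_ID = 128009  # hypothetical; use tokenizer.eos_token_id in practice
gen_kwargs = {
    "min_new_tokens": 1500,             # force long chain-of-thought output
    "max_new_tokens": 8192,             # cap, per the earlier token-limit commit
    "suppress_tokens": [EOS_TOKEN_ID],  # block early end-of-sequence for CoT
    "do_sample": True,
}
# output_ids = model.generate(**inputs, **gen_kwargs)  # requires a loaded model
```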

Fix CoT truncation: increase min_new_tokens to 1000, add generation logging, improve truncated JSON handling
2e7d584

david167 committed on

Improve Chain of Thinking support: increase min_new_tokens to 500 for CoT requests, improve JSON bracket tracking for nested objects
04a4f80

david167 committed on

Fix response truncation - improve extraction logic to find actual content start
678e0f9

david167 committed on

Simplify API - remove all templates, just prompt-in response-out
6bf8feb

david167 committed on

Add 'list' template for better summarization with specific content extraction
19607d6

david167 committed on

Fix JSON templates - use instructional format instead of literal examples
07655f2

david167 committed on

Fix response extraction - prevent truncation at beginning of JSON responses
7f68863

david167 committed on

Fix JSON array generation - add explicit array requirements and improve JSON parsing
8860e75

david167 committed on

Increase max_new_tokens to 8192 for unlimited length responses
1ba70a2

david167 committed on

Improve model generation parameters and add logging - fix response truncation issues
342694d

david167 committed on

Fix JSON response templates for better prompt generation - simplified templates for more reliable JSON parsing
4ad994e

david167 committed on

Fix invisible text with comprehensive CSS targeting
f83cad9

david167 committed on

Fix font visibility: Add dark text colors for better contrast
02ad4bf

david167 committed on

Fix UI: Reduce dialog size and prevent input focus layout shifts
6849ba1

david167 committed on

Fix requirements.txt - Add missing transformers and ML dependencies
0f21de6

david167 committed on

Major update: Add NFL training data generation and improve model handling
992eedb

david167 committed on

Fix layout jumping when focusing input field
c106c31

david167 committed on

Add complete JSON functionality to Gradio interface
f364fe3

david167 committed on

Add JSON imports for structured response functionality
f093b76

david167 committed on

Update UI for full-width display on big screens with responsive design
a7cf970

david167 committed on

Fix: Add missing generated_text variable definition
8b4bf36

david167 committed on

DEBUG: Show complete raw model output and prompt to identify clipping source
0c60639

david167 committed on

COMPLETE REWRITE: Clean ChatGPT-style interface with proper response handling
fcef7cd

david167 committed on

TEMPORARY: Show full model response for debugging clipping issue
8106bb9

david167 committed on

Fix response truncation: disable early stopping, increase token limits to 4096, add debugging logs
4185c2a

david167 committed on

Fix response clipping: use robust assistant header detection instead of prompt length
0d85e38

david167 committed on

Replace with simplified raw chat interface for prompt testing
625d819

david167 committed on

Fix syntax errors: correct comma placement and indentation
0b607e8

david167 committed on

Fix remaining device_map auto in gradio_app.py
c86959d

david167 committed on

Force all CUDA operations to cuda:0 and use device_map to prevent multi-GPU distribution
01a04bc

david167 committed on

Fix multi-GPU device placement error: disable device_map auto and ensure tensors on same device
0331461

david167 committed on
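
The fix described in this commit and the one above it (disable `device_map="auto"`, force everything onto `cuda:0`) can be sketched as below. Hedged: API names follow Hugging Face `transformers`, and the exact model ID is assumed from the later Llama commits, not taken from the repo's code:

```python
# Hedged sketch: pin the whole model to a single GPU instead of
# device_map="auto", which can scatter layers across GPUs and trigger
# "tensors on different devices" errors during generation.
DEVICE_MAP = {"": 0}  # empty key means "entire model" -> cuda:0

# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "meta-llama/Llama-3.1-8B-Instruct",  # assumed model ID
#     device_map=DEVICE_MAP,               # instead of device_map="auto"
#     torch_dtype="auto",
# )
# inputs = {k: v.to("cuda:0") for k, v in inputs.items()}  # same device as model
```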

Fix duplicate import in app.py
000a38b

david167 committed on

Switch to Llama-3.1-8B-Instruct: update model loading, prompts, and generation parameters
8b5e9db

david167 committed on

Switch back to Llama-3.1-8B-Instruct model: update prompts, generation params, and UI descriptions
e6b5afc

david167 committed on