Treat 1.75 tokens as one word, and add a note telling users to check the discussion
app.py CHANGED
@@ -76,30 +76,38 @@ def generate_text(prompt, tone, max_length, temperature=0.7, top_p=0.9, repetiti
     # This turns our input text (with the tone instruction) into a format (tensors) that the model can process using the tokenizer.
     input_token_length = input_ids.shape[1] # Get the number of tokens in the input
     # Store the length of the input
-
+
+    # --- Step 1: Estimate the tokens needed (increase the buffer) ---
+    # Estimate slightly more tokens than words (e.g., a 1.5x or 2x buffer)
+    # Use a factor of 1.75 for a larger buffer to increase the chance of reaching the requested word count
+    estimated_max_tokens = int(max_length * 1.75)
+    # Add a minimum token count to avoid tiny generation requests
+    estimated_max_tokens = max(estimated_max_tokens, 30) # Ensure we generate at least some tokens
+    # --- Step 2: Generate with the higher token limit ---
     outputs = model.generate(
         inputs["input_ids"],
         # max_length=max_length + len(input_text.split()),
         # This sets how long the generated text can be. We add the number of words in our input text (len(input_text.split())) to the max_length the user picked, so the model knows how many total words to create.
         # CHANGE: Use max_new_tokens for clarity instead of calculating the total length
-        max_new_tokens=
+        max_new_tokens = estimated_max_tokens, # Use the higher estimate, i.e. 1.75 tokens per requested word
         # Generate THIS many NEW tokens
-        temperature=temperature,
+        temperature = temperature,
         # This controls how creative the model gets. A lower temperature (e.g., 0.7) keeps things more predictable, while a higher one makes it wilder and more random. Think of it like adjusting the spice level!
         top_p=top_p,
         # This is like a filter for word choices. It picks from the top percentage of likely words (e.g., 0.9 means 90% of the best options), making the output diverse but not too crazy.
-        repetition_penalty=repetition_penalty,
+        repetition_penalty = repetition_penalty,
         # This stops the model from repeating the same words too much. A higher value (e.g., 1.5) pushes it to try new words, like telling it to mix up its vocabulary!
-        num_return_sequences=1,
+        num_return_sequences = 1,
         # This tells the model to give us just one version of the text. If we wanted more options, we could raise this number.
-        do_sample=True,
-        pad_token_id=tokenizer.eos_token_id # Good practice for generation
+        do_sample = True,
+        pad_token_id = tokenizer.eos_token_id # Good practice for generation
     )
-    # --- Decode ONLY the generated part ---
+    # --- Step 3: Decode ONLY the generated part ---
     # Slice the output tensor to get only the tokens AFTER the input tokens
     # This tells the model to generate text: it uses the input IDs, sets a max length, and adjusts creativity with temperature, top_p, and repetition_penalty.
     generated_token_ids = outputs[0, input_token_length:]
-    generated_text = tokenizer.decode(generated_token_ids, skip_special_tokens=True)
+    generated_text = tokenizer.decode(generated_token_ids, skip_special_tokens=True).strip()
+
     return generated_text # Return only the newly generated text
     # This turns the model's output back into readable form, skipping any special tokens we don't need.
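For anyone reading this hunk without the rest of app.py, here is a minimal standalone sketch of the pattern the commit lands on: budget roughly 1.75 tokens per requested word, generate, then decode only the tokens after the prompt. The model name `gpt2` is a placeholder chosen only because it is small enough for a quick local test; the Space itself loads a Gemma model. Everything else mirrors the diff above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model for a quick local test; the Space loads Gemma instead.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def generate_about_n_words(prompt: str, n_words: int) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    input_token_length = inputs["input_ids"].shape[1]  # Number of prompt tokens
    # Budget ~1.75 tokens per requested word, with a floor of 30 tokens.
    max_new = max(int(n_words * 1.75), 30)
    outputs = model.generate(
        inputs["input_ids"],
        max_new_tokens=max_new,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.2,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Slice off the prompt so only the continuation is decoded.
    new_token_ids = outputs[0, input_token_length:]
    return tokenizer.decode(new_token_ids, skip_special_tokens=True).strip()

print(generate_about_n_words("The cat sat on", 50))
```

Note that 1.75 is a deliberate over-estimate: rule-of-thumb figures for English text under BPE-style tokenizers are often closer to 1.3 tokens per word, so the budget usually leaves headroom to reach the slider's word count.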
@@ -325,6 +333,8 @@ st.markdown("""
 # This part generally explains how things work
 st.markdown("""
 <div class="instructions">
+<b><a href="https://huggingface.co/spaces/Kakaarot/Gemma-HuggingFace_TextCompletion_Demo/discussions/1">Please check the discussion</a></b>, where I explain why your first response will take a little more time.
+Thanks for understanding. Now enjoy! 😁 <br><br>
 Enter a prompt below to generate text using the Gemma model from DeepMind. Customize the tone and length to see different outputs!<br><br>
 <b>Example:</b> Prompt: "The cat sat on" | Tone: "Funny" | Length: 50 → "The cat sat on my homework and laughed as I cried over my grades."
 </div>
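One thing this hunk does not show: for the raw `<div>`, `<b>`, and `<a>` tags above to render, the enclosing `st.markdown` call must pass `unsafe_allow_html=True`, since Streamlit otherwise escapes HTML. A sketch of the presumed shape of the full call (its closing arguments sit outside the hunk):

```python
import streamlit as st

# Presumed shape of the call; the closing arguments are outside this hunk.
st.markdown(
    """
    <div class="instructions">
    <b><a href="https://huggingface.co/spaces/Kakaarot/Gemma-HuggingFace_TextCompletion_Demo/discussions/1">Please check the discussion</a></b>,
    where I explain why your first response will take a little more time.<br><br>
    Enter a prompt below to generate text using the Gemma model from DeepMind.
    </div>
    """,
    unsafe_allow_html=True,  # Required, or Streamlit escapes the HTML tags
)
```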
@@ -387,7 +397,7 @@ with st.form(key="input_form"):
     with col1:
         tone = st.selectbox("Tone", ["Funny", "Serious", "Poetic"], index=["Funny", "Serious", "Poetic"].index(st.session_state.get("tone", "Funny")))
     with col2:
-        max_length = st.slider("Word count", 20, 100, 50)
+        max_length = st.slider("Word count", 20, 100, 50, help="Tries to generate text close to this word count. Output might be shorter if the model finishes early, or slightly different due to word splitting. I treat 1.75 tokens as one word.")
     # This adds a slider for users to set how many words they want in the output, ranging from 20 to 100 with a default of 50.
     # And similarly, every other slider here works the same way.
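Since the 1.75x token buffer deliberately overshoots, a natural follow-up (not part of this commit; `generated_text` and `max_length` are the app's own names) would be to trim the decoded output back to the slider's word count:

```python
def trim_to_word_count(text: str, max_words: int) -> str:
    # Keep at most max_words whitespace-separated words.
    return " ".join(text.split()[:max_words])

print(trim_to_word_count("one two three four five", 3))  # -> "one two three"
# In the app this would be: trim_to_word_count(generated_text, max_length)
```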
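Finally, to sanity-check the 1.75 factor against the tokenizer actually in use, the tokens-per-word ratio can be measured directly (a sketch; `gpt2` is again a stand-in for whichever model the Space loads):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # Stand-in for the Space's tokenizer

sample = "Enter a prompt below to generate text using the Gemma model from DeepMind."
n_tokens = len(tokenizer(sample)["input_ids"])
n_words = len(sample.split())
print(f"{n_tokens / n_words:.2f} tokens per word")  # Compare against the 1.75 buffer
```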