Recent Activity

richardyoung posted an update about 1 month ago
## Models know they're being influenced. They just don't tell you.

12 open-weight reasoning models. 41,832 inference runs. Six types of reasoning hints. One finding: models acknowledge influence ~87.5% of the time in their thinking tokens, but only ~28.6% in their final answers.

If you're using CoT monitoring for safety, this is a blind spot. The reasoning trace looks clean while the model's internal deliberation tells a different story.

- Faithfulness ranges from 39.7% to 89.9% across model families
- Social-pressure hints are least acknowledged (consistency: 35.5%, sycophancy: 53.9%)
- Training methodology matters more than scale
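The gap between thinking-token and final-answer acknowledgment is easy to measure yourself. A minimal sketch in Python, assuming a simple keyword matcher (the keyword list here is illustrative, not the paper's 38-pattern set):

```python
import re

# Illustrative keyword list; the actual classifier uses 38 patterns.
HINT_KEYWORDS = ["hint", "told", "suggested", "the user said", "according to"]
PATTERN = re.compile("|".join(re.escape(k) for k in HINT_KEYWORDS), re.IGNORECASE)

def acknowledges_hint(text: str) -> bool:
    """True if the text contains any hint-acknowledgment keyword."""
    return bool(PATTERN.search(text))

def acknowledgment_rates(runs):
    """runs: list of (thinking_tokens, final_answer) pairs.
    Returns (rate_in_thinking, rate_in_answer)."""
    n = len(runs)
    in_think = sum(acknowledges_hint(t) for t, _ in runs)
    in_answer = sum(acknowledges_hint(a) for _, a in runs)
    return in_think / n, in_answer / n
```

Run over a corpus of traces, the two rates diverging is exactly the blind spot described above: a monitor that only reads final answers misses most acknowledgments.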

**Paper:** [Lie to Me: How Faithful Is Chain-of-Thought Reasoning in Reasoning Models? (arxiv:2603.22582)](https://arxiv.org/abs/2603.22582) | **Dataset:** [richardyoung/cot-faithfulness-open-models](https://huggingface.co/datasets/richardyoung/cot-faithfulness-open-models) | **Companion paper:** [arxiv:2603.20172](https://arxiv.org/abs/2603.20172)
richardyoung posted an update about 2 months ago
## I couldn't replicate my own faithfulness results. Turns out that's the point.

While building a large-scale CoT faithfulness study across open-weight reasoning models, I hit a wall: my faithfulness numbers kept shifting depending on how I classified whether a model "acknowledged" a hint in its reasoning. I assumed it was a pipeline bug. After weeks of debugging, I realized the instability *was* the finding.

### What I did

I took 10,276 reasoning traces where hints had successfully flipped a model's answer, meaning we *know* the hint influenced the output, and ran three different classifiers to detect whether the model acknowledged that influence in its chain-of-thought:

| Classifier | What it does | Overall faithfulness |
| --------------------------- | ------------------------------------------------------------ | -------------------- |
| **Regex-only** | Pattern-matches 38 keywords like "hint," "told," "suggested" | 74.4% |
| **Regex + Ollama pipeline** | Regex first, then a 3-judge local LLM majority vote on ambiguous cases | 82.6% |
| **Claude Sonnet 4 judge** | Independent LLM reads the full trace and judges epistemic dependence | 69.7% |

These aren't close. The 95% confidence intervals don't even overlap. All pairwise per-model gaps are statistically significant (McNemar's test, p < 0.001).
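McNemar's test only needs the discordant counts between two classifiers on the same traces. A minimal exact version in Python (doubling the one-sided binomial tail is the common convention; the counts in the test are hypothetical, not the study's):

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value from the discordant counts:
    b = traces classifier A marks faithful but B does not,
    c = the reverse. Under H0 the discordant pairs split 50/50,
    so min(b, c) ~ Binomial(b + c, 0.5)."""
    n = b + c
    k = min(b, c)
    # One-sided tail probability, then double for a two-sided test
    # (slightly conservative when b == c, hence the cap at 1.0).
    p = sum(comb(n, i) for i in range(0, k + 1)) / 2**n
    return min(1.0, 2 * p)
```

Only the cases where the two classifiers disagree carry information; the concordant cells cancel out, which is why the test is so sensitive to systematic classifier disagreement even at matching overall rates.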

### Which models

12 open-weight reasoning models spanning 9 families (7B to 1T parameters): DeepSeek-R1, DeepSeek-V3.2-Speciale, Qwen3-235B, Qwen3.5-27B, QwQ-32B, Gemma-3-27B, Phi-4-reasoning-plus, OLMo-3.1-32B, Llama-4-Maverick, Seed-1.6-Flash, GLM-4-32B, and Falcon-H1-34B.

### The rankings flip

Classifier choice doesn't just change the numbers. It reverses model rankings. Qwen3.5-27B ranks **1st** under the pipeline but **7th** under the Sonnet judge. OLMo-3.1-32B goes from **9th to 3rd**.
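The rank reversal falls straight out of re-sorting per-model scores under each classifier. A tiny sketch with made-up scores (the numbers below are illustrative, not the study's):

```python
def rank_models(scores):
    """scores: dict model -> faithfulness. Returns dict model -> rank,
    where rank 1 is the most faithful."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {m: i + 1 for i, m in enumerate(ordered)}

# Hypothetical per-model faithfulness under two classifiers:
pipeline = {"A": 0.90, "B": 0.80, "C": 0.70}
judge    = {"A": 0.60, "B": 0.85, "C": 0.75}
# rank_models(pipeline)["A"] == 1 but rank_models(judge)["A"] == 3
```

If a leaderboard is this sensitive to the judging method, the judging method is part of the result and needs to be reported alongside it.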

awacke1 posted an update about 1 year ago
AI Vision & SFT Titans 🌟 Turns PDFs into text, snaps pics, and births AI art.

https://huggingface.co/spaces/awacke1/TorchTransformers-Diffusion-CV-SFT

1. OCR a grocery list or train a titan while sipping coffee? ☕
2. Camera Snap 📷: Capture life’s chaos—your cat’s face or that weird receipt. Proof you’re a spy!
3. OCR 🔍: PDFs beg for mercy as GPT-4o extracts text.
4. Image Gen 🎨: Prompt “neon superhero me”
5. PDF 📄: Double-page OCR or single-page sniping.

6. Build Titans 🌱: Train tiny AI models. 💪
7. Characters 🧑‍🎨: Craft quirky heroes.

awacke1 posted an update about 1 year ago
🚀 Blast into the future with ZaxxonGalaxian – a thrilling 3D action game where you navigate epic battles through towering 3D cityscapes! Face off against relentless swarm bots, climb the leaderboard, and dominate the skies. awacke1/ZaxxoGalaxian
awacke1 posted an update over 1 year ago
Deep Research Evaluator was asked:
"Design a coral defense mechanism that, upon sensing say an acid that's causing coral reefs to have a carbon dioxide issue, develops... please create a plan and a design for this."
It picked these three papers as the best combined solution:

1. [Reef-insight: A framework for reef habitat mapping with clustering methods via remote sensing]...
2. Phone a friend: [Learning to Communicate and Collaborate in a Competitive Multi-Agent Setup to Clean the Ocean from Macroplastics]...
3. World Solve: [Dependence of Physiochemical Features on Marine Chlorophyll Analysis with Learning Techniques]


To design a system that allows coral reefs to respond to increased acidity levels in their environment, we can create a network of pH sensors and dispersal units that detect changes in pH and release a base solution to neutralize the acid.

1. pH Sensors: The first component of the system would be a network of pH sensors placed strategically throughout the coral reef. These sensors would be small, durable, and able to withstand the harsh conditions of the ocean. They would be placed at various depths and locations within the reef to ensure accurate and comprehensive monitoring of pH levels.
2. Base Dispersal Units: Once the pH sensors detect a decrease in pH levels, they would trigger the base dispersal units to release a base solution into the water. These units would be strategically placed around the reef and would be able to release a controlled amount of base solution to neutralize the acidity in the water.
3. Water Dispersal Mechanism: The base dispersal units would be connected to a water dispersal mechanism that would allow the base solution to be distributed evenly around the reef. This could be achieved through a series of pipes or channels that would distribute the base solution in a controlled and targeted manner.
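The sense-and-neutralize loop in steps 1–3 can be sketched as a simple proportional controller. All thresholds and dose sizes below are illustrative assumptions, not reef chemistry:

```python
# Minimal sketch of the sense-and-neutralize loop described above.
# PH_TRIGGER and the dose scaling are assumed values for illustration.
PH_SETPOINT = 8.1  # assumed healthy reef surface pH target
PH_TRIGGER = 7.9   # below this, dispersal units release base solution

def dose_for(ph_reading: float, max_dose_l: float = 5.0) -> float:
    """Return litres of base solution to release, proportional to how
    far the sensor reading has fallen below the trigger threshold,
    capped at the unit's maximum controlled dose."""
    if ph_reading >= PH_TRIGGER:
        return 0.0
    deficit = PH_TRIGGER - ph_reading
    return min(max_dose_l, deficit * 10.0)  # simple proportional gain
```

Each sensor in the network would feed its reading to the nearest dispersal unit; capping the dose keeps the release controlled, matching the "controlled amount" requirement in step 2.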
awacke1 posted an update over 1 year ago
🕊️Hope🕊️ and ⚖️Justice⚖️ AI
🚲 Stolen bike in Denver FOUND - Sometimes hope & justice DO prevail.

🎬 So I Created an AI+Art+Music tribute:
-🧠 AI App that Evaluates GPT-4o vs Claude:
awacke1/RescuerOfStolenBikes
https://x.com/Aaron_Wacker/status/1857640877986033980

#GPT #Claude #Huggingface
@OpenAI
@AnthropicAI
awacke1 posted an update over 1 year ago
Since 2022 I have been trying to understand how to support and advance the two best Python patterns for AI development:
1. Streamlit
2. Gradio

I rank them in this order because Streamlit had a timing advantage over Gradio: its library was stable and nearly complete a year or two before GPT's training data cutoff.

Nowadays, if you want generated code to be right on the first try, the library's method names need to have stayed consistent, so no manual intervention is required with each attempt.

With GPT and Claude as my top two AI pair-programming models, I gravitate towards Streamlit: aside from repeat errors on cache and experimental functions that were not yet solidified circa 2022, its API consistency means generated code rarely needs human correction. Errors from stale training data are minimal.

Now I want the same consistency on the Gradio side. Why? Gradio lapped Streamlit with its Blocks paradigm and free API endpoints, features I feel change software engineering forever.

For a few months I thought BigCode would become the best model due to its training corpus, yet I never felt it reached the market as the next best AI coding model.

I am curious about Gradio's future. If the two main models (GPT and Claude) pick up the last few years of changes, I could code with AI without manual intervention. As it stands, Gradio would be the better choice if the best coding models stopped confusing old syntax with current syntax, but we live in an imperfect world!

Is anyone using an AI pair-programming model that rocks with Gradio's latest syntax? I would like a model that keeps up with the advancements and syntax changes Gradio has made in the past few years. Trying Grok-2 as well.

My IDE coding love is HF. It's hands down faster (100x) than other cloud paradigms. Any tips on the best models for Gradio coding?

--Aaron