Recent Activity

richardyoung posted an update about 1 month ago
## Models know they're being influenced. They just don't tell you.

12 open-weight reasoning models. 41,832 inference runs. Six types of reasoning hints. One finding: models acknowledge influence ~87.5% of the time in their thinking tokens, but only ~28.6% in their final answers.

If you're using CoT monitoring for safety, this is a blind spot. The reasoning trace looks clean while the model's internal deliberation tells a different story.

- Faithfulness ranges from 39.7% to 89.9% across model families
- Social-pressure hints are least acknowledged (consistency: 35.5%, sycophancy: 53.9%)
- Training methodology matters more than scale
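The gap between thinking-token and final-answer acknowledgment is easy to measure yourself. A minimal sketch in Python, assuming a simple keyword matcher (the keyword list here is illustrative, not the paper's 38-pattern set):

```python
import re

# Illustrative keyword list; the actual classifier uses 38 patterns.
HINT_KEYWORDS = ["hint", "told", "suggested", "the user said", "according to"]
PATTERN = re.compile("|".join(re.escape(k) for k in HINT_KEYWORDS), re.IGNORECASE)

def acknowledges_hint(text: str) -> bool:
    """True if the text contains any hint-acknowledgment keyword."""
    return bool(PATTERN.search(text))

def acknowledgment_rates(runs):
    """runs: list of (thinking_tokens, final_answer) pairs.
    Returns (rate_in_thinking, rate_in_answer)."""
    n = len(runs)
    in_think = sum(acknowledges_hint(t) for t, _ in runs)
    in_answer = sum(acknowledges_hint(a) for _, a in runs)
    return in_think / n, in_answer / n
```

Run over a corpus of traces, the two rates diverging is exactly the blind spot described above: a monitor that only reads final answers misses most acknowledgments.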

**Paper:** [Lie to Me: How Faithful Is Chain-of-Thought Reasoning in Reasoning Models? (arxiv:2603.22582)](https://arxiv.org/abs/2603.22582) | **Dataset:** [richardyoung/cot-faithfulness-open-models](https://huggingface.co/datasets/richardyoung/cot-faithfulness-open-models) | **Companion paper:** [arxiv:2603.20172](https://arxiv.org/abs/2603.20172)
richardyoung posted an update about 2 months ago
## I couldn't replicate my own faithfulness results. Turns out that's the point.

While building a large-scale CoT faithfulness study across open-weight reasoning models, I hit a wall: my faithfulness numbers kept shifting depending on how I classified whether a model "acknowledged" a hint in its reasoning. I assumed it was a pipeline bug. After weeks of debugging, I realized the instability *was* the finding.

### What I did

I took 10,276 reasoning traces where hints had successfully flipped a model's answer, meaning we *know* the hint influenced the output, and ran three different classifiers to detect whether the model acknowledged that influence in its chain-of-thought:

| Classifier | What it does | Overall faithfulness |
| --------------------------- | ------------------------------------------------------------ | -------------------- |
| **Regex-only** | Pattern-matches 38 keywords like "hint," "told," "suggested" | 74.4% |
| **Regex + Ollama pipeline** | Regex first, then a 3-judge local LLM majority vote on ambiguous cases | 82.6% |
| **Claude Sonnet 4 judge** | Independent LLM reads the full trace and judges epistemic dependence | 69.7% |

These aren't close. The 95% confidence intervals don't even overlap. All pairwise per-model gaps are statistically significant (McNemar's test, p < 0.001).
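McNemar's test only needs the discordant counts between two classifiers on the same traces. A minimal exact version in Python (doubling the one-sided binomial tail is the common convention; the counts in the test are hypothetical, not the study's):

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value from the discordant counts:
    b = traces classifier A marks faithful but B does not,
    c = the reverse. Under H0 the discordant pairs split 50/50,
    so min(b, c) ~ Binomial(b + c, 0.5)."""
    n = b + c
    k = min(b, c)
    # One-sided tail probability, then double for a two-sided test
    # (slightly conservative when b == c, hence the cap at 1.0).
    p = sum(comb(n, i) for i in range(0, k + 1)) / 2**n
    return min(1.0, 2 * p)
```

Only the cases where the two classifiers disagree carry information; the concordant cells cancel out, which is why the test is so sensitive to systematic classifier disagreement even at matching overall rates.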

### Which models

12 open-weight reasoning models spanning 9 families (7B to 1T parameters): DeepSeek-R1, DeepSeek-V3.2-Speciale, Qwen3-235B, Qwen3.5-27B, QwQ-32B, Gemma-3-27B, Phi-4-reasoning-plus, OLMo-3.1-32B, Llama-4-Maverick, Seed-1.6-Flash, GLM-4-32B, and Falcon-H1-34B.

### The rankings flip

Classifier choice doesn't just change the numbers. It reverses model rankings. Qwen3.5-27B ranks **1st** under the pipeline but **7th** under the Sonnet judge. OLMo-3.1-32B goes from **9th to 3rd**.
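The rank reversal falls straight out of re-sorting per-model scores under each classifier. A tiny sketch with made-up scores (the numbers below are illustrative, not the study's):

```python
def rank_models(scores):
    """scores: dict model -> faithfulness. Returns dict model -> rank,
    where rank 1 is the most faithful."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {m: i + 1 for i, m in enumerate(ordered)}

# Hypothetical per-model faithfulness under two classifiers:
pipeline = {"A": 0.90, "B": 0.80, "C": 0.70}
judge    = {"A": 0.60, "B": 0.85, "C": 0.75}
# rank_models(pipeline)["A"] == 1 but rank_models(judge)["A"] == 3
```

If a leaderboard is this sensitive to the judging method, the judging method is part of the result and needs to be reported alongside it.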

awacke1 posted an update about 1 year ago
AI Vision & SFT Titans 🌟 Turns PDFs into text, snaps pics, and births AI art.

https://huggingface.co/spaces/awacke1/TorchTransformers-Diffusion-CV-SFT

1. OCR a grocery list or train a titan while sipping coffee? ☕
2. Camera Snap 📷: Capture life’s chaos—your cat’s face or that weird receipt. Proof you’re a spy!
3. OCR 🔍: PDFs beg for mercy as GPT-4o extracts text.
4. Image Gen 🎨: Prompt “neon superhero me”
5. PDF 📄: Double-page OCR or single-page sniping.

6. Build Titans 🌱: Train tiny AI models. 💪
7. Characters 🧑‍🎨: Craft quirky heroes.

awacke1 posted an update about 1 year ago
🚀 Blast into the future with ZaxxonGalaxian – a thrilling 3D action game where you navigate epic battles through towering 3D cityscapes! Face off against relentless swarm bots, climb the leaderboard, and dominate the skies. awacke1/ZaxxoGalaxian
awacke1 posted an update over 1 year ago
Deep Research Evaluator was asked:
"Design a coral defense mechanism that, upon sensing say an acid that's causing coral reefs to have a carbon dioxide issue, develops... please create a plan and a design for this."
It picked these three papers as the best combined solution:

1. [Reef-insight: A framework for reef habitat mapping with clustering methods via remote sensing]...
2. Phone a friend: [Learning to Communicate and Collaborate in a Competitive Multi-Agent Setup to Clean the Ocean from Macroplastics]...
3. World Solve: [Dependence of Physiochemical Features on Marine Chlorophyll Analysis with Learning Techniques]


To design a system that allows coral reefs to respond to increased acidity levels in their environment, we can create a network of pH sensors and dispersal units that detect changes in pH and release a base solution to neutralize the acid.

1. pH Sensors: The first component of the system would be a network of pH sensors placed strategically throughout the coral reef. These sensors would be small, durable, and able to withstand the harsh conditions of the ocean. They would be placed at various depths and locations within the reef to ensure accurate and comprehensive monitoring of pH levels.
2. Base Dispersal Units: Once the pH sensors detect a decrease in pH levels, they would trigger the base dispersal units to release a base solution into the water. These units would be strategically placed around the reef and would be able to release a controlled amount of base solution to neutralize the acidity in the water.
3. Water Dispersal Mechanism: The base dispersal units would be connected to a water dispersal mechanism that would allow the base solution to be distributed evenly around the reef. This could be achieved through a series of pipes or channels that would distribute the base solution in a controlled and targeted manner.
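The sense-and-neutralize loop in steps 1–3 can be sketched as a simple proportional controller. All thresholds and dose sizes below are illustrative assumptions, not reef chemistry:

```python
# Minimal sketch of the sense-and-neutralize loop described above.
# PH_TRIGGER and the dose scaling are assumed values for illustration.
PH_SETPOINT = 8.1  # assumed healthy reef surface pH target
PH_TRIGGER = 7.9   # below this, dispersal units release base solution

def dose_for(ph_reading: float, max_dose_l: float = 5.0) -> float:
    """Return litres of base solution to release, proportional to how
    far the sensor reading has fallen below the trigger threshold,
    capped at the unit's maximum controlled dose."""
    if ph_reading >= PH_TRIGGER:
        return 0.0
    deficit = PH_TRIGGER - ph_reading
    return min(max_dose_l, deficit * 10.0)  # simple proportional gain
```

Each sensor in the network would feed its reading to the nearest dispersal unit; capping the dose keeps the release controlled, matching the "controlled amount" requirement in step 2.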
awacke1 posted an update over 1 year ago
🕊️Hope🕊️ and ⚖️Justice⚖️ AI
🚲 Stolen bike in Denver FOUND - Sometimes hope & justice DO prevail.

🎬 So I Created an AI+Art+Music tribute:
-🧠 AI App that Evaluates GPT-4o vs Claude:
awacke1/RescuerOfStolenBikes
https://x.com/Aaron_Wacker/status/1857640877986033980

#GPT #Claude #Huggingface
@OpenAI
@AnthropicAI
awacke1 posted an update over 1 year ago
Since 2022 I have been trying to understand how to support and advance the two best Python patterns for AI development:
1. Streamlit
2. Gradio

I rank them in this order because Streamlit had a timing advantage over Gradio: its library was stable and nearly complete a year or two before GPT's training data cutoff.

Nowadays, if you want generated code to be right on the first try, the library's method names need to have stayed consistent, so no manual intervention is required with each attempt.

With GPT and Claude as my top two AI pair-programming models, I gravitate towards Streamlit: aside from repeat errors on cache and experimental functions that were not yet solidified circa 2022, its API consistency means generated code rarely needs human correction. Errors from stale training data are minimal.

Now I want the same consistency on the Gradio side. Why? Gradio lapped Streamlit with its Blocks paradigm and free API endpoints, features I feel change software engineering forever.

For a few months I thought BigCode would become the best model due to its training corpus, yet I never felt it reached the market as the next best AI coding model.

I am curious about Gradio's future. If the two main models (GPT and Claude) pick up the last few years of changes, I could code with AI without manual intervention. As it stands, Gradio would be the better choice if the best coding models stopped confusing old syntax with current syntax, but we live in an imperfect world!

Is anyone using an AI pair-programming model that rocks with Gradio's latest syntax? I would like a model that keeps up with the advancements and syntax changes Gradio has made in the past few years. Trying Grok-2 as well.

My IDE coding love is HF. It's hands down faster (100x) than other cloud paradigms. Any tips on the best models for Gradio coding?

--Aaron