Task 3 Report: Concept Vectors and PCA-Based Steering
1. Objective
Task 3 explores whether decoder hidden states contain a measurable direction corresponding to paraphrase diversity. The idea is:
- collect hidden states from many validation samples
- fit PCA to the hidden-state space
- find a principal direction correlated with output diversity
- steer generation along that direction
This is an advanced representation-learning experiment. Its value for mentor evaluation lies in showing that the project is not limited to training and inference, but also investigates controllable generation.
2. Implementation Approach
The implementation is in analysis/concept_vectors.py. Hidden states are captured from the decoder during cached inference and pooled across sequence positions.
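The capture-and-pool step can be sketched with a PyTorch forward hook. This is an illustrative toy, not the actual code in analysis/concept_vectors.py: the module names (`ToyDecoder`, `capture_hook`) and dimensions are assumptions, but the pattern (hook on the decoder's last hidden layer, mean-pool over sequence positions) matches the description above.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: capture decoder hidden states with a forward hook
# and mean-pool across sequence positions. All names and shapes here are
# illustrative stand-ins, not the project's real modules.
captured = []

def capture_hook(module, inputs, output):
    # output: (batch, seq_len, d_model) -> pool to one vector per sample
    pooled = output.mean(dim=1)
    captured.append(pooled.detach().cpu())

class ToyDecoder(nn.Module):
    """Stand-in for the decoder's final hidden layer."""
    def __init__(self, d_model=16):
        super().__init__()
        self.layer = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        return self.layer(x)

model = ToyDecoder()
handle = model.register_forward_hook(capture_hook)

with torch.no_grad():
    model(torch.randn(4, 10, 16))  # 4 samples, 10 positions each

handle.remove()
hidden_matrix = torch.cat(captured).numpy()  # (4, 16), ready for PCA
```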
PCA Fitting Snippet
def fit_pca(hidden_matrix, n_components=50):
    from sklearn.decomposition import PCA
    # Cap the component count at what the data can support:
    # at most (n_samples - 1) and at most the hidden dimension.
    n_comp = min(n_components, hidden_matrix.shape[0] - 1, hidden_matrix.shape[1])
    pca = PCA(n_components=n_comp)
    pca.fit(hidden_matrix)
    return pca
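After fitting, the diversity-correlated component is found by correlating each component's projection scores with a per-sample diversity measure. The helper below is a minimal sketch of that selection step, assuming a fitted scikit-learn PCA object and a precomputed `diversity_scores` array; the function name and signature are illustrative, not the project's actual API.

```python
import numpy as np

def select_diversity_component(pca, hidden_matrix, diversity_scores):
    """Pick the principal component whose projection correlates most
    strongly (in absolute value) with a per-sample diversity score.
    Illustrative helper, not the project's actual API."""
    scores = pca.transform(hidden_matrix)  # (n_samples, n_components)
    corrs = np.array([
        np.corrcoef(scores[:, i], diversity_scores)[0, 1]
        for i in range(scores.shape[1])
    ])
    best = int(np.nanargmax(np.abs(corrs)))
    # Return index, unit-direction in hidden space, and its correlation.
    return best, pca.components_[best], corrs[best]
```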
Steering Snippet
if alpha != 0.0:
    # Shift hidden states along the concept direction before the head.
    x = x + alpha * dir_tensor.unsqueeze(0).unsqueeze(0)
logits = inner.head(x)
The steering mechanism adds the PCA-derived direction, scaled by a coefficient alpha, to the hidden states immediately before projection to logits.
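Sweeping alpha traces out the "diversity spectrum" described later in this report. The sketch below uses synthetic NumPy stand-ins (`W_head`, `x`, `direction` are not the project's real tensors) to show how varying alpha shifts the pre-logit hidden state along the concept axis.

```python
import numpy as np

# Synthetic stand-ins for the model's head weights, a pooled hidden
# state, and the extracted concept direction (all hypothetical shapes).
d_model, vocab = 16, 32
rng = np.random.default_rng(0)
W_head = rng.normal(size=(d_model, vocab))
x = rng.normal(size=(1, d_model))
direction = rng.normal(size=d_model)
direction /= np.linalg.norm(direction)  # unit-norm steering vector

for alpha in (-2.0, 0.0, 2.0):
    steered = x + alpha * direction  # shift along the concept axis
    logits = steered @ W_head        # (1, vocab)
```

At alpha = 0.0 the steered state equals the original, so the sweep always includes the unsteered baseline for comparison.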
3. Experimental Setup
Task 3 was run from the shared analysis driver and generated:
- analysis/outputs/task3_concept_space.png
- analysis/outputs/task3_diversity_direction.npy
- analysis/outputs/task3_report.txt
The run used 500 validation examples for hidden-state extraction.
4. Results
Observed summary:
- PCA components retained: 50
- total explained variance: 96.1%
- selected diversity principal component: PC 1
- absolute correlation with output length: 0.303
On paper, these values suggest that hidden-state variation is structured and that at least one direction correlates with output-length changes. That is a positive sign from a representation-analysis standpoint.
However, the actual diversity spectrum outputs are not semantically convincing. The steered generations are highly repetitive and mostly malformed token sequences rather than clear paraphrases with controlled variation.
5. Interpretation
This task should be presented carefully.
What is supported:
- hidden states are rich enough for PCA analysis
- the representation space is not random noise
- controllable steering infrastructure has been implemented successfully
What is not yet supported:
- interpretable semantic control
- high-quality paraphrase diversity
- evidence that the identified direction reflects useful linguistic variation
For mentor evaluation, this is best framed as a promising exploratory experiment rather than a finished result.
6. Benefits
Benefits of the task include:
- opens a path toward controllable paraphrase generation
- demonstrates hidden-state instrumentation beyond standard inference
- provides a research direction for future work on style and diversity control
- connects model analysis with possible user-facing controllability
7. Limitations
The main limitation is output quality. Even though the PCA statistics look reasonable, the steered generations are not linguistically strong enough to claim meaningful semantic control. This makes the current result more useful as a prototype than as a validated research finding.
8. Conclusion
Task 3 is not yet strong enough as a final evaluation result, but it is valuable as research evidence of advanced model analysis. For mentor discussion, it should be described as an experimental controllability framework that has been implemented successfully but still requires better base model quality before the steering outputs become persuasive.