ReadPaper Blog
Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings
This paper investigates why large language models with strong zero-shot generation ability often perform poorly as off-the-shelf text embedding models. It argues that raw LLM embeddings are biased toward frequent, semantically uninformative tokens, identifies the unembedding matrix as a mechanistic lens on that failure, and introduces EmbedFilter, a training-free linear transformation that improves zero-shot embedding quality while enabling dimensionality reduction.
Source: Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings

The Missing Power of LLM Embeddings
The paper addresses a practical gap between the broad zero-shot competence of large language models and their weaker performance when used directly as text embedding models. Wu, Chen, Liu, Cui, Li, and Yan frame this as a representation problem rather than merely a prompting problem: LLM hidden states can encode useful semantics, but their pooled embeddings are not automatically well shaped for similarity search or retrieval. The authors note that prior prompt-engineering approaches such as PromptEOL and ECHO can improve LLM-derived embeddings, yet these methods are sensitive to prompt design or add computational overhead. Their central claim is that an internal property of the model’s vocabulary projection machinery helps explain the weakness of raw embeddings. By focusing on the unembedding matrix, the paper shifts the problem from surface-level extraction heuristics to a mechanistic account of how token-prediction structure contaminates sentence-level representations.

The Logit Lens Reveal
The first major evidence comes from applying Logit Lens to text embeddings extracted from LLM backbones. Logit Lens projects hidden representations into vocabulary space, allowing the authors to inspect which tokens a given embedding most strongly aligns with. Across examples involving Qwen-2.5-0.5B, Llama-3.1-8B-Instruct, and Mistral-7B-Instruct-V0.3, the highest-probability tokens are reported to be frequent but semantically weak, rather than the most informative words from the input. The paper interprets this as a form of representation collapse: raw embeddings are pulled toward common vocabulary directions that do not discriminate meaning well. This observation motivates the search for a structural source of the bias inside the model rather than treating the failure as accidental noise.
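The projection step can be illustrated with a minimal sketch. It assumes a toy unembedding matrix and a made-up five-token vocabulary, so `vocab`, `unembedding`, `embedding`, and the helper `logit_lens_top_k` are all hypothetical. A real Logit Lens setup would take the matrix from the model's output head and would usually apply the final normalization layer first, which is omitted here.

```python
import numpy as np

def logit_lens_top_k(embedding, unembedding, vocab, k=3):
    """Return the k tokens an embedding aligns with most strongly in vocabulary space."""
    logits = unembedding @ embedding            # shape: (vocab_size,)
    shifted = logits - logits.max()             # subtract the max for a stable softmax
    probs = np.exp(shifted) / np.exp(shifted).sum()
    top = np.argsort(-probs)[:k]                # indices of the k largest probabilities
    return [(vocab[i], float(probs[i])) for i in top]

# Hypothetical vocabulary with one orthogonal direction per token.
vocab = ["the", ",", "of", "volcano", "eruption"]
unembedding = np.eye(len(vocab), 8)             # toy stand-in for a (vocab, hidden) matrix

# A made-up embedding dominated by the directions of frequent tokens.
embedding = 2.0 * unembedding[0] + 1.5 * unembedding[1] + 0.3 * unembedding[3]

top_tokens = logit_lens_top_k(embedding, unembedding, vocab)
print(top_tokens)
```

In this toy case the embedding ranks "the" and "," above "volcano", which mirrors the pattern the paper reports: the content word is present in the representation but is outweighed by common-token directions.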

The Hidden Bias Zone
The paper connects this common-token dominance to anisotropy in text embeddings, a known phenomenon in which representations occupy a narrow cone instead of spreading evenly through the embedding space. The authors hypothesize that the centroid of this narrow region behaves like an “average” token, corresponding to a frequency-weighted average embedding over the training corpus. Under this interpretation, raw LLM embeddings contain a strong commonality component that overshadows input-specific semantic features. The Logit Lens results become mechanistically meaningful because the embeddings’ projection into vocabulary space exposes the influence of this average-token region. The implication is that improving LLM embeddings may require suppressing a shared frequency-driven component, not simply choosing a better pooling rule or adding a more elaborate prompt.
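A small numerical sketch makes the anisotropy argument concrete. It assumes synthetic embeddings built from one large shared vector plus small input-specific noise. The data, the dimensions, and the helper `mean_pairwise_cosine` are all hypothetical. Mean-centring is used only as a generic stand-in for "suppressing the shared component" and is not the method the paper proposes.

```python
import numpy as np

def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all distinct pairs of rows."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(embeddings)
    # Drop the diagonal (self-similarity) before averaging.
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))

rng = np.random.default_rng(0)
n_sentences, dim = 50, 16

# Hypothetical data: one large shared component plus small input-specific parts.
shared = 5.0 * np.ones(dim)
specific = rng.normal(size=(n_sentences, dim))
embeddings = shared + specific

before = mean_pairwise_cosine(embeddings)

# Mean-centring is a generic baseline for removing the shared component,
# not the filtering method described in the paper.
centred = embeddings - embeddings.mean(axis=0, keepdims=True)
after = mean_pairwise_cosine(centred)

print(f"mean cosine before: {before:.3f}, after centring: {after:.3f}")
```

When the shared component dominates, almost every pair of embeddings looks similar. Once it is removed, the remaining similarity reflects only the input-specific parts.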

The Edge Spectrum Threat
To identify the source of this bias, the paper uses Logit Spectroscopy on the LLM unembedding matrix. The unembedding matrix maps hidden states into vocabulary logits, and its singular value decomposition provides spectral directions that can be tested for their effect on token probabilities. The authors report a latent “edge spectrum” space, spanned by right singular vectors associated with the smallest and largest singular values, that actively writes frequent tokens into the embedding space. When the projection of the reverse-engineered average token onto this edge spectrum is truncated, the logits of frequent tokens are significantly disrupted. This analysis supports the paper’s thesis that the unembedding matrix is not only an output layer for next-token prediction but also a feature lens for diagnosing and refining text embeddings.
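The truncation experiment can be sketched as follows, assuming a random matrix in place of a real unembedding matrix and a random vector in place of the reverse-engineered average token. The helper names `edge_spectrum_basis` and `truncate_edge`, and the choice of four directions at each end of the spectrum, are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def edge_spectrum_basis(unembedding, n_large, n_small):
    """Right singular vectors for the n_large largest and n_small smallest singular values."""
    # np.linalg.svd returns singular values sorted in descending order.
    _, _, vt = np.linalg.svd(unembedding, full_matrices=False)
    rank = vt.shape[0]
    if n_large + n_small > rank:
        raise ValueError("edge spectrum would exceed the available directions")
    edge_rows = np.concatenate([vt[:n_large], vt[rank - n_small:]], axis=0)
    return edge_rows.T                      # shape: (hidden, n_large + n_small)

def truncate_edge(vector, edge):
    """Remove the component of `vector` that lies in the edge-spectrum subspace."""
    return vector - edge @ (edge.T @ vector)

rng = np.random.default_rng(0)
vocab_size, hidden = 200, 32
unembedding = rng.normal(size=(vocab_size, hidden))   # random stand-in for a real matrix

# Subspace sizes are chosen arbitrarily for illustration.
edge = edge_spectrum_basis(unembedding, n_large=4, n_small=4)

average_token = rng.normal(size=hidden)   # placeholder for the reverse-engineered vector
truncated = truncate_edge(average_token, edge)

logits_before = unembedding @ average_token
logits_after = unembedding @ truncated
max_logit_shift = float(np.abs(logits_before - logits_after).max())
print(f"largest logit change after truncation: {max_logit_shift:.3f}")
```

With a random matrix the logit shift carries no linguistic meaning. The sketch only shows the mechanics: build the subspace from both ends of the spectrum, project it out, and compare logits before and after.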

EmbedFilter Saves the Day
The proposed method, EmbedFilter, operationalizes this interpretation as a simple linear transformation derived from the unembedding matrix. It filters out the edge spectrum subspace so that the influence of high-frequency tokens is suppressed while semantic content is preserved or improved. The paper presents it as a post-processing technique that requires no additional training and can be applied to embeddings from multiple LLM backbones and extraction setups. In zero-shot text embedding evaluations, the authors report consistent gains, including an improvement of up to 14.1% on MTEB. Because the authors describe the transformation as distance-preserving and it naturally supports dimensionality reduction, they argue that EmbedFilter can also reduce index storage and speed up retrieval in large-scale embedding deployments.
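A filter of this kind might look like the sketch below. It rests on an assumption that goes beyond the summary above: the transformation is taken to be a projection onto the right singular vectors that are not part of the edge spectrum, which yields lower-dimensional outputs directly. The paper's exact construction may differ, and the helper `build_filter`, the random matrices, and the subspace sizes are hypothetical.

```python
import numpy as np

def build_filter(unembedding, n_large, n_small):
    """Rows are the right singular vectors kept after dropping both ends of the spectrum."""
    _, _, vt = np.linalg.svd(unembedding, full_matrices=False)
    rank = vt.shape[0]
    if n_large + n_small >= rank:
        raise ValueError("nothing left after removing the edge spectrum")
    return vt[n_large:rank - n_small]       # shape: (kept_dims, hidden)

rng = np.random.default_rng(1)
vocab_size, hidden = 200, 32
unembedding = rng.normal(size=(vocab_size, hidden))   # random stand-in for a real matrix

# Assumed sizes: drop 4 directions at each end, keep the 24 in between.
filter_matrix = build_filter(unembedding, n_large=4, n_small=4)

embeddings = rng.normal(size=(10, hidden))            # placeholder sentence embeddings
filtered = embeddings @ filter_matrix.T               # reduced to (10, 24)

# The same filtered vectors expressed in the original 32-dimensional space.
full_dim = filtered @ filter_matrix

# Because the kept rows are orthonormal, distances agree in both views.
d_reduced = float(np.linalg.norm(filtered[0] - filtered[1]))
d_full = float(np.linalg.norm(full_dim[0] - full_dim[1]))
print(filtered.shape, round(d_reduced, 6), round(d_full, 6))
```

Under this reading, filtering and compression are a single matrix multiplication. The reduced vectors keep the same pairwise distances as the filtered full-size vectors, so an index can store the smaller representation without changing nearest-neighbor results within the retained subspace.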
