ReadPaper Blog
LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents
LatentSkill addresses a central efficiency and modularity problem in LLM agents: reusable textual skills help agents solve long-horizon tasks, but repeatedly inserting those skills into prompts consumes context, raises prefill cost, and exposes procedural knowledge as plaintext. The paper proposes a hypernetwork-based framework that compiles each textual skill into a plug-and-play LoRA adapter, storing procedural knowledge in weight space while preserving modular loading, scaling, and composition. Experiments on ALFWorld and Search-QA show that this in-weight skill representation can outperform in-context skill prompting while substantially reducing skill-token overhead.
Source: (none provided)

Skill Text vs Skill Weights
The paper starts from the observation that modern LLM agents increasingly rely on reusable textual skills to encode task procedures, tool-use patterns, and recovery strategies. This design is attractive because a skill library can be retrieved and injected into the prompt whenever an agent needs specialized behavior, but the same skill text may be repeated across many decision steps in a long interaction. LatentSkill targets the resulting context overhead, especially the prefill cost created by repeatedly supplying procedural text to the model. The authors also emphasize that plaintext skills expose potentially proprietary procedures and share the instruction channel with environment observations that may be untrusted. The paper’s core motivation is therefore to keep the practical benefits of skill libraries while moving the skill content out of the prompt.

The Gap in Existing Approaches
The paper frames existing approaches as a trilemma between efficiency, privacy-like reduced exposure, and modularity. In-context skill prompting is simple and reusable, but it becomes costly as interactions lengthen and skill libraries grow, and prior work cited by the authors suggests that long inputs can make it harder for models to robustly use all supplied information. Parametric alternatives such as agent fine-tuning or curriculum learning avoid inserting skill text at inference time, but they fuse skills into the backbone model and make individual skills difficult to update, remove, or combine. LatentSkill is positioned between these extremes: it avoids per-step textual skill insertion without permanently merging skills into the frozen backbone. This positioning motivates the paper’s use of adapters as a non-destructive substrate for skill storage.

LatentSkill’s Core Trick
LatentSkill’s main method is a skill compiler, a trained hypernetwork that maps a textual skill document into a set of LoRA updates for a frozen backbone LLM. Formally, the compiler produces a latent skill Δs = Gϕ(s), and the model conditions on task history through the augmented parameters θ ⊕ αΔs rather than through a prompt containing the skill text. For each selected target module, the generated update follows the standard LoRA form with low-rank matrices Bs and As, added to the frozen weight with a scaling coefficient α and rank r. The compiler is trained in two stages: document-level pretraining, where reconstruction and completion objectives force skill information to pass through the generated adapter, and trajectory-supervised fine-tuning, where a single generated adapter must support teacher actions across all steps of an agent trajectory. This training design encourages the adapter to encode stable, skill-level procedural knowledge rather than step-specific prompt content.

Why It Helps in Practice
The paper evaluates LatentSkill on ALFWorld and Search-QA, comparing it with the corresponding in-context skill baseline that uses the same skill descriptions in the prompt. On ALFWorld, LatentSkill improves success by 21.4 points on the seen split and 13.4 points on the unseen split while using 64.1% fewer prefill tokens. On Search-QA, it improves exact match by 3.0 points while reducing skill-token overhead by 72.2%. These results support the paper’s claim that moving skills into LoRA adapters is not merely a compression trick, because the generated weights can preserve or improve task performance while reducing prompt cost. The experiments also reinforce the practical value of keeping the backbone frozen while allowing skills to be loaded, unloaded, replaced, or scaled at inference time.

Takeaway: Skills Become a Weight-Space Toolkit
Beyond benchmark performance, the paper argues that generated skill LoRAs form a useful weight-space representation of procedural knowledge. Its analysis reports that skills from different domains form separable clusters in the generated LoRA weight space, suggesting a structured semantic geometry rather than arbitrary adapter weights. The authors also show that the effect of a latent skill can be controlled through the LoRA scaling coefficient α, making injection strength an explicit inference-time knob. A further analysis studies composition through parameter-space arithmetic, with positive results when skill descriptions are decomposed into semantically aligned components before combining their adapters. The paper’s broader implication is that LLM agent skills can become modular weight-space objects that reduce context overhead, limit plaintext exposure, and support inspection, control, and reuse.
