ReadPaper Blog
AUDITFLOW: Executable Symbolic Environments for Structured Financial Reporting Verification
AUDITFLOW addresses the problem that language-model agents struggle to verify structured financial reports when correctness depends on XBRL facts, US-GAAP taxonomy constraints, calculation relations, dimensional contexts, and numerical recomputation rather than text retrieval alone. The paper proposes a graph-grounded multi-agent framework that lets LLMs guide the search while deterministic symbolic tools perform fact retrieval, taxonomy traversal, numerical checking, and rule evaluation. Its results on a FinAuditing-derived FinMR sample show that executable symbolic verification substantially improves audit accuracy and that removing deterministic checks sharply degrades performance.
Source: AUDITFLOW: Executable Symbolic Environments for Structured Financial Reporting Verification

When Numbers Become a Trap
The paper frames XBRL audit verification as a problem in structured evidence, not ordinary document question answering. In public-company filings, a reported value is tied to a concept, period, unit, context, and often to calculation or dimensional relationships defined by the US-GAAP taxonomy. AUDITFLOW argues that verifying such a value requires linking dispersed filing facts to taxonomy concepts, traversing relevant relationships, recomputing an expected value, and then applying an audit rule. This formulation explains why a filing number cannot be treated as self-validating evidence: correctness depends on how that number interacts with regulatory constraints and neighboring facts. The paper therefore defines the central challenge as graph-grounded numerical consistency verification, where the final answer must include not only a verdict but also an expected value, action path, supporting evidence, and trustworthiness score.
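The four steps named above (link facts, traverse relationships, recompute, apply a rule) can be made concrete with a minimal sketch. Everything here is an illustrative assumption rather than the paper's implementation: the `Fact` and `AuditResult` field names, the `verify_calculation` helper, the placeholder trust score (the share of expected child facts that were found), and the example figures.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class Fact:
    """One reported fact; field names are hypothetical."""
    concept: str
    period: str
    unit: str
    value: Decimal


@dataclass
class AuditResult:
    """The output shape described in the text: verdict plus its justification."""
    verdict: str                                   # "consistent" or "inconsistent"
    expected_value: Decimal
    action_path: list[str] = field(default_factory=list)
    evidence: list[Fact] = field(default_factory=list)
    trust: float = 0.0                             # placeholder, not the paper's formula


def verify_calculation(target: Fact, children: list[Fact],
                       weights: dict[str, Decimal],
                       tolerance: Decimal = Decimal("0")) -> AuditResult:
    """Recompute the target from weighted child facts and compare with the report."""
    path = [f"retrieve:{target.concept}"]
    expected = Decimal("0")
    used = []
    for child in children:
        # Only children named by the calculation relation, in the same period, count.
        if child.concept not in weights or child.period != target.period:
            continue
        path.append(f"traverse:calculation:{child.concept}")
        expected += weights[child.concept] * child.value
        used.append(child)
    path.append("check:recompute")
    consistent = abs(expected - target.value) <= tolerance
    return AuditResult(
        verdict="consistent" if consistent else "inconsistent",
        expected_value=expected,
        action_path=path,
        evidence=[target, *used],
        trust=len(used) / len(weights) if weights else 0.0,
    )


PERIOD = "2023-12-31"
assets = Fact("us-gaap:Assets", PERIOD, "USD", Decimal("1500"))
current = Fact("us-gaap:AssetsCurrent", PERIOD, "USD", Decimal("900"))
noncurrent = Fact("us-gaap:AssetsNoncurrent", PERIOD, "USD", Decimal("500"))
weights = {"us-gaap:AssetsCurrent": Decimal("1"), "us-gaap:AssetsNoncurrent": Decimal("1")}

result = verify_calculation(assets, [current, noncurrent], weights)
print(result.verdict, result.expected_value)  # inconsistent 1400
```

The reported total of 1500 looks plausible on its own. It fails only once the neighboring facts and the calculation relation are brought in, which is the sense in which a filing number is not self-validating.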

The Gap in Current AI Audits
The paper situates AUDITFLOW against tool-augmented agents, retrieval-augmented systems, graph-grounded methods, neuro-symbolic approaches, and multi-agent debate frameworks. Its key critique is that many prior systems improve evidence access but still leave final arithmetic, rule interpretation, or trust judgment inside the language model. The authors connect this limitation to financial-audit benchmarks and systems such as FinAuditing, FinRule-Bench, XBRL-Agent, FinReporting, Herculean, and FinVault, which expose persistent difficulty in numerical consistency and rule-based verification. The paper also notes that similar weaknesses appear in domains such as legal compliance and clinical guideline checking, where answers depend on external rules and structured evidence. AUDITFLOW’s contribution is to move the load-bearing verification step out of natural-language reasoning and into executable operations over a structured environment.

AUDITFLOW’s Big Idea
AUDITFLOW’s main methodological idea is a separation between adaptive search and deterministic computation. The language-model agents decide what concepts, facts, contexts, or relationships to inspect next, but the symbolic environment performs fact retrieval, taxonomy traversal, numerical checks, and rule evaluation. This design treats the XBRL filing and the US-GAAP taxonomy as an executable symbolic environment rather than as text to be summarized. The paper formalizes the task around a filing, target concept, reporting period, and taxonomy release, and asks whether the reported value is consistent with the value implied by taxonomy constraints, filing context, and an applicable audit rule. By exposing verification through typed tools, AUDITFLOW constrains the model to select structured operations while deterministic implementations produce the observations needed for an auditable verdict.
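One way to picture this separation is a small action-observation loop in which the model only names a tool and its arguments, and the environment does the rest. The sketch below is an assumption-laden illustration, not the paper's interface: the `Action`, `Observation`, and `SymbolicEnv` names, the three tool names, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Action:
    """What the LLM is allowed to emit: a tool name and structured arguments."""
    tool: str
    args: dict[str, Any]


@dataclass(frozen=True)
class Observation:
    """What the environment returns after deterministic execution."""
    ok: bool
    payload: Any


class SymbolicEnv:
    """Hypothetical environment: the model chooses actions, the code computes."""

    def __init__(self, facts, calc_children):
        self._facts = facts                    # (concept, period) -> value
        self._calc_children = calc_children    # concept -> [(child, weight)]
        self._tools: dict[str, Callable[..., Any]] = {
            "get_fact": self._get_fact,
            "calc_children": self._get_children,
            "check_sum": self._check_sum,
        }

    def step(self, action: Action) -> Observation:
        tool = self._tools.get(action.tool)
        if tool is None:
            return Observation(False, f"unknown tool: {action.tool}")
        try:
            return Observation(True, tool(**action.args))
        except (KeyError, TypeError) as exc:
            # Missing facts or malformed arguments become explicit failures.
            return Observation(False, f"bad arguments: {exc!r}")

    def _get_fact(self, concept: str, period: str):
        return self._facts[(concept, period)]

    def _get_children(self, concept: str):
        return self._calc_children.get(concept, [])

    def _check_sum(self, concept: str, period: str):
        children = self._calc_children[concept]
        expected = sum(w * self._facts[(c, period)] for c, w in children)
        reported = self._facts[(concept, period)]
        return {"expected": expected, "reported": reported, "match": expected == reported}


env = SymbolicEnv(
    facts={("Assets", "FY2023"): 1500,
           ("AssetsCurrent", "FY2023"): 900,
           ("AssetsNoncurrent", "FY2023"): 500},
    calc_children={"Assets": [("AssetsCurrent", 1), ("AssetsNoncurrent", 1)]},
)
checked = env.step(Action("check_sum", {"concept": "Assets", "period": "FY2023"}))
unknown = env.step(Action("guess_total", {}))
malformed = env.step(Action("get_fact", {"concept": "Assets"}))
```

In this toy version the model never performs arithmetic. A request for a tool that does not exist, or one with missing arguments, comes back as a failed observation instead of a guessed number.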

How the Environment Works
The paper implements this environment as a dual graph composed of a static US-GAAP taxonomy graph and a dynamic filing-specific evidence graph. The taxonomy graph contains concepts, labels, definitions, data types, period types, abstract flags, balance attributes, and relationships such as presentation, calculation, and dimensional edges. The filing graph contains reported facts, contexts, periods, units, dimensional assignments, values, decimals, and concept links, with bridge edges connecting filing facts to governing taxonomy concepts. AUDITFLOW exposes these graphs through a typed action-observation interface covering retrieval tools, traversal tools, forensic comparison tools, and deterministic rule-specific checkers. Its multi-agent protocol assigns two junior auditors to inspect the same case from regulatory and evidentiary perspectives, while a senior auditor reviews disagreements and can request further investigation before evidential aggregation produces the final verdict, evidence trail, expected value, and trustworthiness score.
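The dual-graph layout can be sketched with a tiny labeled graph class. This is a toy under stated assumptions: the `Graph` class, the edge type names (`calculation`, `in_context`, `bridge`), the node attributes, and the sample values are all hypothetical and far smaller than a real taxonomy or filing.

```python
from collections import defaultdict


class Graph:
    """Minimal labeled graph: nodes carry attributes, edges carry a type."""

    def __init__(self):
        self.nodes = {}
        self.edges = defaultdict(list)   # (source, edge_type) -> [(target, attrs)]

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_edge(self, source, edge_type, target, **attrs):
        self.edges[(source, edge_type)].append((target, attrs))

    def neighbors(self, source, edge_type):
        return list(self.edges.get((source, edge_type), []))


# Static side: taxonomy concepts and their calculation relationships.
taxonomy = Graph()
for name in ("Assets", "AssetsCurrent", "AssetsNoncurrent"):
    taxonomy.add_node(name, period_type="instant", balance="debit", abstract=False)
taxonomy.add_edge("Assets", "calculation", "AssetsCurrent", weight=1)
taxonomy.add_edge("Assets", "calculation", "AssetsNoncurrent", weight=1)

# Dynamic side: facts reported in one filing, each tied to a context.
filing = Graph()
filing.add_node("ctx_2023", period="2023-12-31", dimensions={})
reported = {"Assets": 1500, "AssetsCurrent": 900, "AssetsNoncurrent": 500}
for i, (concept, value) in enumerate(reported.items()):
    fact_id = f"fact_{i}"
    filing.add_node(fact_id, value=value, unit="USD", decimals=0)
    filing.add_edge(fact_id, "in_context", "ctx_2023")
    filing.add_edge(fact_id, "bridge", concept)   # fact -> governing taxonomy concept


def fact_value(concept, context):
    """Follow bridge and context edges to find the reported value."""
    for fact_id, attrs in filing.nodes.items():
        governs = any(t == concept for t, _ in filing.neighbors(fact_id, "bridge"))
        in_ctx = any(t == context for t, _ in filing.neighbors(fact_id, "in_context"))
        if governs and in_ctx:
            return attrs["value"]
    raise LookupError(f"no fact for {concept} in {context}")


def expected_value(concept, context):
    """Traverse taxonomy calculation edges, then pull child values from the filing."""
    return sum(edge["weight"] * fact_value(child, context)
               for child, edge in taxonomy.neighbors(concept, "calculation"))


print(fact_value("Assets", "ctx_2023"), expected_value("Assets", "ctx_2023"))  # 1500 1400
```

The bridge edges are what let a traversal start in the taxonomy (which children roll up into a concept) and finish in the filing (what the company actually reported for those children).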

What the Experiments Show
The experiments evaluate AUDITFLOW on a FinAuditing-derived FinMR sample covering three Data Quality Committee (DQC) rule families: sign consistency under DQC.US.0015, dimensional aggregation consistency under DQC.US.0117, and calculation-tree consistency under DQC.US.0126. Under GPT-5.5, AUDITFLOW reaches 82.09% joint audit accuracy and outperforms the strongest reported baseline, Single Agent, by 14.93 percentage points, which implies a baseline score of 67.16%. The decisive ablation is the removal of deterministic checks, which drops accuracy to 17.91% and raises the invalid-output rate to 35.82%, indicating that the executable symbolic environment performs verification steps that the model cannot reliably replace. The paper also reports stable performance across strong backbones, with GPT-4o, Claude Sonnet 4.6, and Qwen-397B each reaching 80.60% joint accuracy. These results support the paper’s broader implication that reliable financial AI agents should combine LLM-guided search with deterministic symbolic verification when correctness depends on structured rules, graph relations, and recomputed values.
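To give a feel for what a deterministic check might look like in two of these rule families, here is a deliberately simplified sketch. It is not the official DQC rule logic and not the paper's checker code: the `NON_NEGATIVE` concept list, the function names, the choice of the sign-flipped value as the expected value, and the segment figures are all assumptions made for illustration.

```python
from decimal import Decimal

# Illustrative stand-in for the set of concepts a sign rule would cover.
NON_NEGATIVE = {"us-gaap:Assets", "us-gaap:InventoryNet"}


def check_sign(concept: str, value: Decimal) -> dict:
    """Flag a negative value on a concept assumed to be non-negative."""
    if concept in NON_NEGATIVE and value < 0:
        # Assumption: the corrected value is the same magnitude with the sign flipped.
        return {"rule": "sign", "passed": False, "expected": -value}
    return {"rule": "sign", "passed": True, "expected": value}


def check_dimensional_total(total: Decimal, members: dict,
                            tolerance: Decimal = Decimal("0")) -> dict:
    """Compare a reported total with the sum of its dimensional member values."""
    expected = sum(members.values(), Decimal("0"))
    return {
        "rule": "dimensional_aggregation",
        "passed": abs(expected - total) <= tolerance,
        "expected": expected,
    }


sign_result = check_sign("us-gaap:InventoryNet", Decimal("-250"))
dim_result = check_dimensional_total(
    Decimal("1000"),
    {"SegmentA": Decimal("600"), "SegmentB": Decimal("350")},
)
print(sign_result["passed"], sign_result["expected"])  # False 250
print(dim_result["passed"], dim_result["expected"])    # False 950
```

Both checks return an expected value alongside the pass or fail flag, mirroring the requirement that a verdict come with a recomputed number. A calculation-tree check would follow the same pattern, using weighted children from the taxonomy's calculation relationships.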
