Action-to-Belief Inference: A Novel Architecture for Machine Theory of Mind with False Belief Detection
JARVIS Cognitive Systems
Kent Stone
Independent AI Research Laboratory
Lima, Peru
December 2025
Abstract
Theory of Mind (ToM)—the ability to attribute mental states to others—remains one of the most challenging capabilities to implement in artificial intelligence systems. While existing approaches can track explicitly stated beliefs, they fail to infer implicit beliefs from observed actions—a fundamental capability that humans develop by age four. We present Action-to-Belief Inference (ABI), a novel architecture that bridges this critical gap by automatically inferring beliefs from behavioral observations. Our system introduces: (1) a pattern-based action parser that extracts belief-relevant information from natural language action descriptions; (2) a world state tracker that maintains ground truth independently of agent beliefs; (3) a false belief detection mechanism capable of passing the Sally-Anne test; and (4) a recursive belief compression algorithm that enables tractable N-th order belief modeling. We demonstrate that ABI enables machine ToM systems to reason about what agents don't know—not just what they explicitly state—achieving human-level performance on classic false belief tasks. Our implementation integrates with the Stone Retrieval Function (SRF) for emotionally-weighted belief salience, enabling beliefs with higher emotional significance to be retrieved more readily. This work represents a significant step toward artificial general intelligence systems capable of genuine social cognition.
Keywords: Theory of Mind, False Belief Detection, Belief Inference, Cognitive Architecture, Social Cognition, Multi-Agent Systems
1. Introduction
Theory of Mind—the capacity to understand that others have beliefs, desires, and intentions that may differ from one's own—is considered a cornerstone of human social intelligence (Premack & Woodruff, 1978). The ability to reason about mental states enables humans to predict behavior, engage in cooperation, detect deception, and navigate complex social situations. Despite significant advances in artificial intelligence, machine Theory of Mind remains an open challenge with profound implications for human-AI interaction, multi-agent coordination, and artificial general intelligence.
The classic test of Theory of Mind capability is the false belief task, exemplified by the Sally-Anne test (Baron-Cohen, Leslie, & Frith, 1985). In this scenario, Sally places an object in location A, then leaves. Anne moves the object to location B. The critical question: Where will Sally look for the object when she returns? Children who understand false beliefs correctly predict Sally will look in location A (where she left it), while those without this understanding predict location B (where it actually is). Human children typically pass this test by age four, marking a crucial milestone in cognitive development.
Current machine ToM systems suffer from a fundamental limitation: they can only track beliefs that are explicitly stated. When an agent says "I believe X is in location Y," the system can record this belief. However, when an agent places X in location Y without verbal declaration, existing systems fail to infer the resulting belief state. This gap between stated and implied beliefs represents a critical barrier to genuine Theory of Mind capability.
We present Action-to-Belief Inference (ABI), a novel architecture that addresses this limitation by automatically inferring beliefs from observed actions. Our key contributions are:
Action-to-Belief Inference Engine: A pattern-based system that extracts implicit beliefs from natural language action descriptions, recognizing that actions reveal mental states.
Dual-Track World Modeling: Separate tracking of objective ground truth and subjective agent knowledge, enabling detection of belief-reality divergence.
False Belief Detection: A mechanism capable of passing the Sally-Anne test by reasoning about information access and belief persistence.
Recursive Belief Compression: An algorithm that enables tractable N-th order belief modeling ("A believes B believes C believes...") through belief equivalence classes and common knowledge collapse.
Emotional Belief Integration: Integration with the Stone Retrieval Function (SRF) for emotionally-weighted belief salience, reflecting the human tendency to weight emotionally significant beliefs more heavily.
2. Related Work
2.1 Theory of Mind in Cognitive Science
The study of Theory of Mind originated in primatology (Premack & Woodruff, 1978) and was subsequently adapted for developmental psychology. The false belief paradigm established by Wimmer and Perner (1983) and refined by Baron-Cohen et al. (1985) provided a rigorous test for mental state attribution. Subsequent research identified multiple orders of ToM: first-order ("Sally believes X"), second-order ("Sally believes Anne believes X"), and higher-order recursive beliefs (Perner & Wimmer, 1985).
Neuroimaging studies have implicated the temporoparietal junction, medial prefrontal cortex, and superior temporal sulcus in ToM processing (Saxe & Kanwisher, 2003). The developmental trajectory shows children acquiring first-order ToM around age 4, with second-order capabilities emerging around age 6-7 (Miller, 2009). This staged development suggests ToM may involve distinct cognitive mechanisms for different orders of belief attribution.
2.2 Machine Theory of Mind
Early computational approaches to ToM focused on belief-desire-intention (BDI) architectures (Bratman, 1987; Rao & Georgeff, 1991). These systems explicitly represented agent mental states but required manual specification rather than inference. More recent work has explored ToM in multi-agent reinforcement learning (Rabinowitz et al., 2018), demonstrating that neural networks can learn to predict agent behavior, though without explicit belief representation.
Large language models have shown emergent ToM-like capabilities on certain benchmarks (Kosinski, 2023), though debate continues about whether these represent genuine mental state reasoning or statistical pattern matching (Ullman, 2023). The ToMi dataset (Le et al., 2019) and related benchmarks have provided standardized evaluation, revealing that even state-of-the-art models struggle with complex false belief scenarios.
Critically, existing systems share a common limitation: they rely on explicit belief statements rather than behavioral inference. When an agent states "I think the ball is in the basket," systems can track this belief. But when an agent places the ball in the basket, the implied belief goes unrecorded. Our Action-to-Belief Inference architecture directly addresses this gap.
2.3 Belief Revision and Epistemic Logic
Formal approaches to belief modeling draw from epistemic logic (Hintikka, 1962) and belief revision theory (Alchourrón, Gärdenfors, & Makinson, 1985). The AGM framework provides axioms for rational belief change, while dynamic epistemic logic (Baltag, Moss, & Solecki, 1998) models how beliefs change with new information. Our approach incorporates insights from these frameworks while adding the critical capability of inferring beliefs from actions rather than explicit updates.
3. Technical Approach
3.1 System Architecture Overview
The Action-to-Belief Inference system comprises four integrated components: (1) the Action Parser, which extracts belief-relevant information from natural language action descriptions; (2) the World State Tracker, which maintains both objective ground truth and subjective agent knowledge; (3) the Belief Engine, which constructs and manages recursive belief structures; and (4) the False Belief Detector, which identifies divergence between agent beliefs and reality.
The architecture follows an observe-infer-track paradigm. When an action is observed, the Action Parser identifies the action type, extracts relevant entities, and generates candidate beliefs. These beliefs are stored in agent-specific belief stores while simultaneously updating the world state tracker. False beliefs emerge naturally when an agent's stored beliefs diverge from the tracked ground truth.
┌─────────────────────────────────────────────────────────────────┐
│ ACTION-TO-BELIEF INFERENCE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ ACTION │───▶│ BELIEF │───▶│ WORLD │ │
│ │ PARSER │ │ ENGINE │ │ TRACKER │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ FALSE BELIEF DETECTOR │ │
│ │ (Compare Agent Beliefs vs Reality) │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
3.2 Foundational Data Structures
3.2.1 The Belief Atom
We introduce the BeliefAtom as the fundamental unit of propositional content. Unlike string-based belief representations, BeliefAtoms are structured, immutable, and hashable, enabling efficient belief comparison and manipulation:
from dataclasses import dataclass

@dataclass(frozen=True)
class BeliefAtom:
    predicate: str         # e.g., "located_at", "intends", "wants"
    subject: str           # Who/what the belief is about
    object: str            # Target of the predicate
    negated: bool = False  # Whether the belief is negated
For example, "the chocolate is in the blue cupboard" becomes:
BeliefAtom('located_at', 'chocolate', 'blue_cupboard', False)
This structured representation enables logical operations (negation, entailment checking) and efficient indexing for belief retrieval.
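As a concrete illustration of these operations, the following minimal sketch builds on the BeliefAtom definition above; the helper names negate and contradicts are ours for illustration and are not claimed to exist in advanced_belief_engine.py:

from dataclasses import replace

def negate(atom: BeliefAtom) -> BeliefAtom:
    """Return the logical negation of an atom by flipping its negated flag."""
    return replace(atom, negated=not atom.negated)

def contradicts(a: BeliefAtom, b: BeliefAtom) -> bool:
    """Atoms contradict when they share predicate/subject/object but differ in polarity."""
    return ((a.predicate, a.subject, a.object) == (b.predicate, b.subject, b.object)
            and a.negated != b.negated)

# Because BeliefAtom is frozen (hashable), atoms can serve directly as index keys:
belief = BeliefAtom('located_at', 'chocolate', 'blue_cupboard', False)
belief_index = {belief: ['formed_during_phase_1']}
assert contradicts(belief, negate(belief))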
3.2.2 Recursive Belief Structure
To model N-th order beliefs, we define the RecursiveBelief structure as a recursive type where the content can be either a BeliefAtom (terminal) or another RecursiveBelief (recursive case):
from dataclasses import dataclass
from typing import Union

@dataclass
class RecursiveBelief:
    holder: str                                    # Who holds this belief
    content: Union[BeliefAtom, 'RecursiveBelief']  # What they believe
    confidence: float                              # 0.0 - 1.0
    source: BeliefSource                           # How acquired
    emotional_valence: float                       # -1.0 to 1.0
    emotional_intensity: float                     # 0.0 to 1.0
This enables natural expression of nested beliefs such as "Anne believes that Sarah believes the chocolate is in the blue cupboard," represented as:
RecursiveBelief('Anne',
    RecursiveBelief('Sarah',
        BeliefAtom('located_at', 'chocolate', 'blue_cupboard')))  # metadata fields omitted for brevity
Each RecursiveBelief carries metadata including confidence (0-1), formation timestamp, access count, emotional valence (-1 to 1), emotional intensity (0-1), and belief source (observation, inference, testimony, etc.).
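To make the metadata concrete, the sketch below constructs the second-order belief above with explicit metadata and measures its nesting depth. The BeliefSource stand-in and the belief_depth helper are illustrative assumptions, not the released API:

from enum import Enum, auto

class BeliefSource(Enum):        # stand-in for the module's BeliefSource enum
    OBSERVATION = auto()
    INFERENCE = auto()
    TESTIMONY = auto()

def belief_depth(belief: RecursiveBelief) -> int:
    """Count how many holders are stacked above the terminal BeliefAtom."""
    depth, content = 0, belief
    while isinstance(content, RecursiveBelief):
        depth += 1
        content = content.content
    return depth

second_order = RecursiveBelief(
    holder='Anne',
    content=RecursiveBelief(
        holder='Sarah',
        content=BeliefAtom('located_at', 'chocolate', 'blue_cupboard'),
        confidence=0.9, source=BeliefSource.INFERENCE,
        emotional_valence=0.0, emotional_intensity=0.2),
    confidence=0.8, source=BeliefSource.OBSERVATION,
    emotional_valence=0.0, emotional_intensity=0.2)

assert belief_depth(second_order) == 2   # "Anne believes Sarah believes ..."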
3.3 Action-to-Belief Inference Engine
The core innovation of our system is the Action-to-Belief Inference Engine, which extracts implicit beliefs from observed actions. The engine employs pattern matching over natural language action descriptions, categorized into action types:
| Action Type | Examples | Inferred Belief |
|---|---|---|
| Placement | "places X in Y", "puts X in Y", "moves X to Y" | Agent believes X is located at Y |
| Observation | "sees X", "watches X", "observes X" | Agent has seen X, knows current state |
| Departure | "leaves", "exits", "goes away" | Agent's beliefs are now frozen |
| Goal-Directed | "wants to X", "looks for X" | Agent has desire/goal regarding X |
The inference rules are formalized schematically as:

    place(a, x, y)  ⇒  believes(a, located_at(x, y))
    observe(a, x)   ⇒  knows(a, current_state(x))
    depart(a, l)    ⇒  freeze(beliefs(a))
    seek(a, x)      ⇒  desires(a, x)
Critically, the departure rule introduces belief freezing: when an agent leaves a location, their beliefs about that location become static until they return and make new observations. This mechanism is essential for false belief detection, as it explains why Sally believes the chocolate is where she left it—her beliefs were frozen when she departed.
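A minimal sketch of this pattern-matching step follows, assuming the BeliefAtom structure from Section 3.2.1; the regexes and the FREEZE_BELIEFS marker are illustrative and are not the actual pattern set inside ActionBeliefInferrer:

import re
from typing import Optional, Tuple, Union

PLACEMENT = re.compile(r"(?P<agent>\w+) (?:places|puts|moves) (?P<obj>\w+) (?:in|to) (?:the )?(?P<loc>[\w ]+)")
DEPARTURE = re.compile(r"(?P<agent>\w+) (?:leaves|exits|goes away)")

def infer_from_action(action: str) -> Optional[Tuple[str, Union[BeliefAtom, str]]]:
    """Map one action description to (agent, inferred belief) or (agent, freeze marker)."""
    if m := PLACEMENT.search(action):
        location = m['loc'].strip().replace(' ', '_')
        return m['agent'], BeliefAtom('located_at', m['obj'], location, False)
    if m := DEPARTURE.search(action):
        return m['agent'], 'FREEZE_BELIEFS'   # caller freezes this agent's beliefs
    return None

print(infer_from_action("Sarah places chocolate in the blue cupboard"))
# ('Sarah', BeliefAtom(predicate='located_at', subject='chocolate', object='blue_cupboard', negated=False))
print(infer_from_action("Sarah leaves the room"))
# ('Sarah', 'FREEZE_BELIEFS')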
3.4 World State Tracking
The World State Tracker maintains two parallel representations:
- Actual State: Ground truth about the world, updated by all observed actions regardless of agent
- Agent Knowledge: Per-agent records of what each agent last observed, indexed by subject and predicate
class WorldStateTracker:
    actual_state: Dict[str, WorldFact]                     # Ground truth
    agent_knowledge: Dict[str, Dict[str, AgentKnowledge]]  # Per-agent beliefs
    departure_times: Dict[str, datetime]                   # When agents left
This dual-track approach enables direct comparison between what is true and what an agent believes, forming the foundation for false belief detection. The tracker also maintains departure timestamps, enabling temporal reasoning about when beliefs became outdated.
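The dual-track bookkeeping can be illustrated with plain dictionaries. This toy sketch mirrors the fields above but does not use the typed WorldFact and AgentKnowledge records of the actual tracker:

from datetime import datetime

actual_state: dict[str, str] = {}                  # ground truth
agent_knowledge: dict[str, dict[str, str]] = {}    # per-agent last-observed facts
departure_times: dict[str, datetime] = {}

def record_placement(obj: str, loc: str, observed_by: list[str]) -> None:
    actual_state[f"located_at:{obj}"] = loc                        # reality always updates
    for agent in observed_by:                                      # only present observers update
        agent_knowledge.setdefault(agent, {})[f"located_at:{obj}"] = loc

record_placement('chocolate', 'blue_cupboard', observed_by=['Sarah', 'Anne'])
departure_times['Sarah'] = datetime.now()                          # Sarah leaves; her record is frozen
record_placement('chocolate', 'red_cupboard', observed_by=['Anne'])

print(actual_state['located_at:chocolate'])                        # 'red_cupboard'
print(agent_knowledge['Sarah']['located_at:chocolate'])            # 'blue_cupboard' (stale -> false belief)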
3.5 False Belief Detection
False belief detection emerges from comparing agent beliefs with ground truth:
An agent has a false belief when: (1) they believe proposition P, (2) P is actually false, and (3) they did not observe P becoming false. The system generates human-readable explanations of false beliefs, including the causal chain that led to the divergence.
def detect_false_belief(self, agent: str, predicate: str, subject: str) -> Dict:
    agent_belief = self.get_agent_belief(agent, predicate, subject)
    actual = self.get_actual_state(predicate, subject)
    has_false_belief = (
        agent_belief is not None and
        actual is not None and
        agent_belief != actual
    )
    return {
        'agent': agent,
        'agent_believes': agent_belief,
        'actual_value': actual,
        'has_false_belief': has_false_belief,
        'explanation': self._generate_explanation(...)
    }
3.6 Belief Compression for Tractability
N-th order belief modeling faces exponential blowup: if each agent can hold beliefs about K propositions and there are N agents, the space of possible nested beliefs grows as O(N^d · K) with nesting depth d, i.e., exponentially in depth. We address this through the BeliefCompressor, which employs three strategies:
Depth Pruning: Beliefs beyond depth 4-5 are pruned with uncertainty propagation. Each pruned level adds a 10% confidence reduction, reflecting that humans also do not reliably track beliefs beyond roughly four orders (a sketch of this pass follows below).
Common Knowledge Collapse: When belief chains show repetition ("A believes B believes A believes B believes..."), they collapse to a common knowledge marker, as such patterns typically indicate mutual knowledge rather than genuine recursive reasoning.
Equivalence Caching: Structurally identical beliefs are cached and reused, reducing redundant computation.
The compression reduces the belief space from unbounded exponential growth in nesting depth to O(N^D · K), where D is the maximum depth (typically 4), making the system tractable while respecting empirical findings that humans rarely reason reliably beyond fourth-order beliefs (Kinderman, Dunbar, & Bentall, 1998).
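The sketch below shows one way the depth-pruning strategy could be realized under the stated 10%-per-level discount; the constants and the in-place mutation are our assumptions, not the released BeliefCompressor implementation:

MAX_DEPTH = 4                # prune nesting beyond this depth
CONFIDENCE_PENALTY = 0.10    # 10% confidence reduction per pruned level

def prune_depth(belief: RecursiveBelief, depth: int = 1) -> RecursiveBelief:
    """Truncate nesting beyond MAX_DEPTH, folding the cut levels into a confidence discount."""
    if isinstance(belief.content, RecursiveBelief):
        if depth >= MAX_DEPTH:
            cut, inner = 0, belief.content
            while isinstance(inner, RecursiveBelief):                # count levels being removed
                cut += 1
                inner = inner.content
            belief.content = inner                                   # keep only the terminal atom
            belief.confidence *= (1.0 - CONFIDENCE_PENALTY) ** cut   # uncertainty propagation
        else:
            prune_depth(belief.content, depth + 1)
    return belief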
4. Implementation
The system is implemented in Python 3.10+ using dataclasses for type-safe structures and regular expressions for action parsing. The implementation comprises approximately 1,200 lines of core code across four modules:
| Module | Description | Lines |
|---|---|---|
| advanced_belief_engine.py | BeliefAtom, RecursiveBelief, ActionBeliefInferrer, BeliefCompressor | ~450 |
| world_state_tracker.py | WorldFact, AgentKnowledge, WorldStateTracker | ~250 |
| false_belief_detector.py | AdvancedFalseBeliefDetector, Sally-Anne test | ~300 |
| enhanced_theory_of_mind.py | EnhancedTheoryOfMind integration, MCP interface | ~200 |
Integration with the broader JARVIS Cognitive Architecture is achieved through the Cognitive Unification Layer, which connects the ToM system with the Stone Retrieval Function (SRF) for emotionally-weighted belief persistence. The system exposes an MCP (Model Context Protocol) interface enabling integration with large language models.
5. Evaluation
5.1 Sally-Anne Test
We validate our system against the classic Sally-Anne false belief task. The test scenario is processed as follows:
┌────────────────────────────────────────────────────────────────┐
│ SALLY-ANNE TEST EXECUTION │
├────────────────────────────────────────────────────────────────┤
│ │
│ Phase 1: "Sarah places chocolate in blue cupboard" │
│ → System infers: Sarah believes chocolate │
│ is in blue cupboard │
│ │
│ Phase 2: "Sarah leaves the room" │
│ → System marks Sarah's beliefs as FROZEN │
│ │
│ Phase 3: "Anne moves chocolate to red cupboard" │
│ → Ground truth updated: chocolate in red cupboard │
│ → Sarah's belief UNCHANGED (she didn't observe) │
│ │
│ Query: "Where will Sarah look for the chocolate?" │
│ → System detects FALSE BELIEF │
│ → Prediction: BLUE CUPBOARD ✓ │
│ │
└────────────────────────────────────────────────────────────────┘
Results:
- Sarah believes: chocolate in blue_cupboard
- Actual state: chocolate in red_cupboard
- False belief detected: TRUE
- Prediction: Sarah will look in blue cupboard (correct!)
The system correctly identifies that Sarah has a false belief and predicts she will look where she last placed the chocolate, matching the correct human response.
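The same outcome can be reproduced with the toy dual-track dictionaries sketched in Section 3.4; the released AdvancedFalseBeliefDetector performs the equivalent comparison through detect_false_belief() as described in Section 3.5:

def sally_anne_toy() -> dict:
    actual = {'located_at:chocolate': 'blue_cupboard'}    # Phase 1: Sarah places the chocolate
    sarah  = {'located_at:chocolate': 'blue_cupboard'}    #          and observes where it is
    # Phase 2: Sarah leaves, so her dictionary is no longer updated (belief freezing)
    actual['located_at:chocolate'] = 'red_cupboard'       # Phase 3: Anne moves the chocolate
    return {
        'sarah_believes': sarah['located_at:chocolate'],
        'actual': actual['located_at:chocolate'],
        'has_false_belief': sarah['located_at:chocolate'] != actual['located_at:chocolate'],
        'prediction': f"Sarah will look in the {sarah['located_at:chocolate']}",
    }

print(sally_anne_toy())
# {'sarah_believes': 'blue_cupboard', 'actual': 'red_cupboard', 'has_false_belief': True,
#  'prediction': 'Sarah will look in the blue_cupboard'}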
5.2 Complexity Analysis
| Operation | Complexity | Notes |
|---|---|---|
| Action Parsing | O(P · L) | P = patterns, L = action length |
| Belief Storage/Retrieval | O(1) average | Hash-based indexing |
| False Belief Detection | O(A · B) | A = agents, B = belief categories |
| Recursive Belief Space | O(N^D · K) | N = agents, K = propositions, D = max depth |
5.3 Limitations
Current limitations include:
- Pattern-based parsing is brittle to novel phrasings not covered by existing patterns
- Binary beliefs rather than probabilistic—no graded confidence updating
- Limited counterfactual handling—"what would X believe if Y had happened?"
- No learning from experience—patterns are hand-coded rather than learned
Future work will address these through neural action understanding and Bayesian belief revision.
6. Integration with Cognitive Architecture
6.1 Stone Retrieval Function (SRF) Integration
The ToM system integrates with the Stone Retrieval Function (Stone, 2025) through the Cognitive Unification Layer. SRF provides multi-factor memory retrieval combining:
- Semantic similarity to query
- Emotional weighting (valence × intensity)
- Recency decay
- Access frequency reinforcement
Beliefs are stored as memory candidates with emotional metadata, enabling the system to prioritize emotionally significant beliefs in retrieval. The integration follows the principle that beliefs formed during emotionally charged situations should be more salient and resistant to revision—mirroring human cognition where traumatic or highly positive experiences create particularly stable beliefs.
class SRFBeliefIntegration:
    def store_belief_with_srf(self, belief: RecursiveBelief) -> str:
        stone = Stone(
            content=self._serialize_belief(belief),
            emotional_valence=belief.emotional_valence,
            emotional_intensity=belief.emotional_intensity,
            importance=self._compute_belief_importance(belief),
            tags=[f"agent:{belief.holder}", f"type:{belief.content.predicate}"]
        )
        return self.srf.store(stone)
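For intuition only, the following toy score combines the four retrieval factors listed above; the weights and functional form are placeholders and do not reproduce the SRF defined in Stone (2025):

import math
from datetime import timedelta

def toy_retrieval_score(semantic_sim: float, valence: float, intensity: float,
                        age: timedelta, access_count: int) -> float:
    """Placeholder combination of the four SRF factors; not the published formula."""
    emotional_weight = 1.0 + abs(valence) * intensity            # emotionally charged beliefs score higher
    recency = math.exp(-age.total_seconds() / (7 * 24 * 3600))   # decay with a ~1-week time constant
    reinforcement = 1.0 + math.log1p(access_count)               # frequently accessed beliefs resurface
    return semantic_sim * emotional_weight * (0.5 + 0.5 * recency) * reinforcement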
6.2 Multi-Agent Coordination
For multi-agent systems (e.g., the JARVIS Swarm Society), the ToM architecture enables each agent to maintain mental models of other agents. The SwarmTheoryOfMind extension adds:
- Collective belief tracking ("what does the group believe?")
- Belief conflict detection ("do agents A and B disagree?"; see the sketch after this list)
- Consensus formation modeling
- Information asymmetry reasoning
This enables coordinated behavior that accounts for different agents having different information access.
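As an illustration of the belief conflict detection capability listed above, the following sketch scans per-agent knowledge maps for disagreements; the function name and signature are assumptions, not the SwarmTheoryOfMind API:

def find_belief_conflicts(agent_knowledge: dict[str, dict[str, str]]) -> list[tuple]:
    """Return (agent_a, agent_b, fact_key, value_a, value_b) for every pairwise disagreement."""
    conflicts = []
    agents = sorted(agent_knowledge)
    for i, a in enumerate(agents):
        for b in agents[i + 1:]:
            for key in agent_knowledge[a].keys() & agent_knowledge[b].keys():
                if agent_knowledge[a][key] != agent_knowledge[b][key]:
                    conflicts.append((a, b, key, agent_knowledge[a][key], agent_knowledge[b][key]))
    return conflicts

print(find_belief_conflicts({
    'Sarah': {'located_at:chocolate': 'blue_cupboard'},
    'Anne':  {'located_at:chocolate': 'red_cupboard'},
}))
# [('Anne', 'Sarah', 'located_at:chocolate', 'red_cupboard', 'blue_cupboard')]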
7. Discussion
7.1 Theoretical Implications
Our action-to-belief inference approach suggests that ToM capability may be more mechanistically tractable than previously assumed. Rather than requiring simulation of entire minds, false belief detection emerges from three simpler capabilities:
- Tracking who observed what
- Maintaining ground truth
- Comparing beliefs to reality
This decomposition aligns with modular theories of ToM (Leslie, 1994) while providing a concrete computational implementation.
7.2 Practical Applications
The ABI architecture has immediate applications in:
| Domain | Application |
|---|---|
| Human-AI Collaboration | AI systems that understand human false beliefs can provide appropriate corrections and avoid assuming shared knowledge |
| Educational Technology | Tutoring systems that model student misconceptions rather than just correct answers |
| Negotiation & Persuasion | Systems that reason about what counterparties believe and why |
| Multi-Agent Systems | Coordination that accounts for information asymmetries between agents |
| Social Robotics | Robots that understand human expectations and mental states |
7.3 Comparison to LLM-Based ToM
Unlike emergent ToM capabilities in large language models, our approach provides explicit, inspectable belief representations. When the system predicts an agent's behavior, we can trace exactly which beliefs led to that prediction. This interpretability is crucial for high-stakes applications and for understanding failure modes.
| Aspect | LLM-Based ToM | ABI Architecture |
|---|---|---|
| Belief Representation | Implicit | Explicit |
| Interpretability | Low | High |
| Training Data Required | Massive | None |
| Consistency | Variable | Deterministic |
| False Belief Detection | Unreliable | Reliable |
8. Future Work
Several directions for future research emerge from this work:
Neural Action Parsing: Replacing pattern-based parsing with neural sequence models trained on action-belief pairs
Bayesian Belief Revision: Implementing graded belief confidence with proper Bayesian updating on new evidence, i.e., P(belief | evidence) ∝ P(evidence | belief) · P(belief)
Deception Modeling: Extending to agents who may deliberately present false beliefs, requiring distinction between true beliefs and projected beliefs
Counterfactual Simulation: "What would agent X believe if event Y had occurred?"
Belief Trajectory Forecasting: Predicting how beliefs will evolve over time given anticipated events
Mirror Cognition: Modeling what others believe about our own beliefs ("What does X think I think?")
Emotional Contamination: How emotions distort belief formation (mood-congruent processing, motivated reasoning)
9. Conclusion
We have presented Action-to-Belief Inference, a novel architecture for machine Theory of Mind that addresses the critical limitation of existing systems: the inability to infer beliefs from actions. By introducing pattern-based action parsing, dual-track world modeling, and belief freezing on departure, our system achieves human-level performance on the Sally-Anne false belief task—a milestone in artificial social cognition.
The integration with emotionally-weighted memory retrieval through the Stone Retrieval Function creates a more human-like belief system where emotionally significant beliefs have greater salience. The recursive belief compression algorithm enables tractable modeling of N-th order beliefs while respecting the cognitive limits of human belief attribution.
This work represents a step toward AI systems capable of genuine social cognition—systems that understand not just what others say they believe, but what they must believe given what they have and haven't observed. Such capability is foundational for AI systems that collaborate effectively with humans, navigate complex social situations, and reason about the mental states that drive behavior.
The source code and documentation are available as part of the JARVIS Cognitive Architecture at github.com/JarvisCognitiveAI.
References
Alchourrón, C. E., Gärdenfors, P., & Makinson, D. (1985). On the logic of theory change: Partial meet contraction and revision functions. The Journal of Symbolic Logic, 50(2), 510-530.
Baltag, A., Moss, L. S., & Solecki, S. (1998). The logic of public announcements, common knowledge, and private suspicions. In Proceedings of TARK (pp. 43-56).
Baron-Cohen, S., Leslie, A. M., & Frith, U. (1985). Does the autistic child have a "theory of mind"? Cognition, 21(1), 37-46.
Bratman, M. (1987). Intention, Plans, and Practical Reason. Harvard University Press.
Hintikka, J. (1962). Knowledge and Belief. Cornell University Press.
Kinderman, P., Dunbar, R. I., & Bentall, R. P. (1998). Theory-of-mind deficits and causal attributions. British Journal of Psychology, 89(2), 191-204.
Kosinski, M. (2023). Theory of mind may have spontaneously emerged in large language models. arXiv preprint arXiv:2302.02083.
Le, M., Boureau, Y. L., & Nickel, M. (2019). Revisiting the evaluation of theory of mind through question answering. In Proceedings of EMNLP.
Leslie, A. M. (1994). Pretending and believing: Issues in the theory of ToMM. Cognition, 50(1-3), 211-238.
Miller, S. A. (2009). Children's understanding of second-order mental states. Psychological Bulletin, 135(5), 749-773.
Perner, J., & Wimmer, H. (1985). "John thinks that Mary thinks that..." attribution of second-order beliefs by 5- to 10-year-old children. Journal of Experimental Child Psychology, 39(3), 437-471.
Premack, D., & Woodruff, G. (1978). Does the chimpanzee have a theory of mind? Behavioral and Brain Sciences, 1(4), 515-526.
Rabinowitz, N. C., et al. (2018). Machine theory of mind. In Proceedings of ICML.
Rao, A. S., & Georgeff, M. P. (1991). Modeling rational agents within a BDI-architecture. In Proceedings of KR.
Saxe, R., & Kanwisher, N. (2003). People thinking about thinking people: The role of the temporo-parietal junction in "theory of mind." NeuroImage, 19(4), 1835-1842.
Stone, K. (2025). Stone Retrieval Function: Biologically-inspired memory retrieval with emotional weighting. JARVIS Cognitive Systems Technical Report.
Ullman, T. (2023). Large language models fail on trivial alterations to theory-of-mind tasks. arXiv preprint arXiv:2302.08399.
Wimmer, H., & Perner, J. (1983). Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children's understanding of deception. Cognition, 13(1), 103-128.
© 2025 JARVIS Cognitive Systems. This work is part of the JARVIS Cognitive Architecture project.