Issue-Fixture Recovery Diagnostics for Hermes-Agent
A bounded Lab v1 simulation artifact using real issue and PR-linked fixtures.
Abstract
This Lab v1 artifact converts real Hermes-Agent GitHub issues and PR-linked regressions into fixed input fixtures for recovery-diagnostic evaluation. It compares a baseline representation against a second node with a LeWorldModel-inspired expectation trace observer. The observer records expected-vs-actual transitions, heuristic surprise scores, failure categories, and recovery hints. The purpose is not to establish runtime superiority, but to test whether issue history can become a reusable evaluation surface for agent recovery behavior.
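The kind of record such an observer might emit can be sketched as follows. This is a minimal illustration, not the artifact's implementation: the names `TraceEntry`, `observe`, and `HINTS`, the binary surprise score, and the example failure category are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup from failure category to recovery hint (illustrative only).
HINTS = {
    "tool_error": "retry the tool call with validated arguments",
}


@dataclass
class TraceEntry:
    """One expected-vs-actual transition recorded by the observer."""
    step: str
    expected: str
    actual: str
    surprise: float                      # heuristic surprise score
    failure_category: Optional[str] = None
    recovery_hint: Optional[str] = None


def observe(step: str, expected: str, actual: str,
            category: Optional[str] = None) -> TraceEntry:
    """Compare the expected and actual outcome of a step.

    Assumed heuristic: surprise is 1.0 on any mismatch and 0.0 otherwise.
    A category and hint are attached only when a mismatch occurs.
    """
    mismatch = expected != actual
    return TraceEntry(
        step=step,
        expected=expected,
        actual=actual,
        surprise=1.0 if mismatch else 0.0,
        failure_category=category if mismatch else None,
        recovery_hint=HINTS.get(category) if mismatch and category else None,
    )


entry = observe("call_tool", expected="ok", actual="error", category="tool_error")
```

A matching transition yields a surprise of 0.0 and no hint, so only mismatches carry diagnostic fields. A real scorer would likely use a graded surprise value rather than this binary one.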
Final Conclusion
On 12 fixed issue/PR-linked fixtures, the expectation trace observer scored better than the baseline representation on three fixture-level heuristic metrics: failure detection rose from 0.83 to 1.00, average recovery-hint quality rose from 0.92 to 2.75, and average diagnosis steps fell from 3.67 to 1.00 (lower is better). These results are bounded to the selected fixtures and scoring rubric. They are not production reliability measurements, not statistical evidence, and not evidence of a full LeWorldModel implementation.
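Averages of this kind can be computed from per-fixture scores roughly as sketched below. The `FixtureScore` fields, the integer hint-quality scale, and the two toy fixtures are assumptions made for illustration; they are not the artifact's rubric or data.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class FixtureScore:
    """Heuristic scores for one fixture (field names are hypothetical)."""
    fixture_id: str
    failure_detected: bool   # did the representation flag the failure?
    hint_quality: int        # assumed ordinal rubric score; higher is better
    diagnosis_steps: int     # steps needed to reach a diagnosis; lower is better


def summarize(scores: List[FixtureScore]) -> Dict[str, float]:
    """Average each metric over the fixtures, rounded to two decimals."""
    return {
        "failure_detection": round(
            mean([1.0 if s.failure_detected else 0.0 for s in scores]), 2),
        "avg_hint_quality": round(mean([s.hint_quality for s in scores]), 2),
        "avg_diagnosis_steps": round(
            mean([s.diagnosis_steps for s in scores]), 2),
    }


# Toy inputs only; these are not the 12 fixtures evaluated in this artifact.
toy_scores = [
    FixtureScore("fixture-a", failure_detected=True, hint_quality=3, diagnosis_steps=1),
    FixtureScore("fixture-b", failure_detected=False, hint_quality=1, diagnosis_steps=4),
]
summary = summarize(toy_scores)
```

Because each figure is a plain mean over a small fixed set, a single fixture can move it noticeably, which is one reason the results are described as bounded rather than statistical.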
Lab v1 Summary
Fixture-level heuristic scores on 12 fixed Hermes-Agent issue/PR-linked inputs. For diagnosis steps, lower is better.