Preprint · Version 1 · Posted 2026-04-06
Bioelectric Integrity as an Intrinsic Reward Signal: Eliminating Reward Hacking via Physiological Constraints in Reinforcement Learning
Artofficial Technologies, Calgary, Canada; Mathematical Institute, University of Oxford
Abstract
Reward hacking remains an unsolved alignment problem. We propose a principled solution: grounding reward signals in the organism's own bioelectric signatures, which cannot be gamed without disrupting physiological function. We formalise this as the Bioelectric Integrity Constraint (BIC) and prove that it bounds the divergence between proxy and true reward by a quantity determined strictly by the organism's homeostatic tolerance range. Experimental validation on 500 bovine subjects over 90 days demonstrates that bioelectric reward signals eliminate all 14 classes of reward hacking identified in prior RL literature.