Sentinel
The `sentinel` is a high-availability watchdog agent. Its primary purpose is to monitor another critical agent (typically `matrix` itself) and automatically re-spawn it if it goes down. This is the key to the swarm's self-healing and resurrection capabilities.
โ๏ธ How it Works
The agent monitors the `poke.heartbeat` file of its assigned target. If the timestamp in the heartbeat file becomes older than the configured timeout, the `sentinel` assumes the target agent has crashed. It then uses its `spawn_agent_direct` method to re-launch the target agent using the configuration stored in its `security_box`. A chain of sentinels can be created, each watching the one before it, forming a resilient "protection layer" around the core Matrix agent.
๐งฉ Configuration
* **`watching`** (Default: `"the Matrix"`): A descriptive name for what is being monitored.
* **`universal_id_under_watch`** (Default: `false`): This should be set to the `universal_id` of the agent being monitored.
* **`timeout`** (Default: `60`): The number of seconds a heartbeat can be silent before the target is considered dead.
* **`matrix_secure_verified`** (Default: `false`): If `true`, this sentinel will be injected with the real Matrix private key and a copy of the Matrix agent's configuration, allowing it to perform a full resurrection.
๐งญ Directive
matrix_directive = {
"universal_id": "matrix",
"name": "matrix",
"children": [
{
"universal_id": "guardian-1",
"name": "sentinel",
"config": {
"watching": "guardian-2",
"universal_id_under_watch": "guardian-2"
},
"children": [
{
"universal_id": "guardian-2",
"name": "sentinel",
"config": {
"matrix_secure_verified": 1,
"watching": "the Queen",
"universal_id_under_watch": "matrix"
}
}
]
}
]
}
๐ฆ Source
# Authored by Daniel F MacDonald and ChatGPT aka The Generals
# Docstrings by Gemini
# โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
# โ ๐ก SENTINEL AGENT ๐ก โ
# โ Heartbeat Monitor ยท Resurrection Watch ยท Sentinel โ
# โ Forged in the signal of Hive Zero | v2.1 Directive โ
# โ Accepts: scan / detect / respawn / delay / confirm โ
# โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
# ๐งญ UpdateSentinelAgent โ Hardened Battlefield Version
import sys
import os
sys.path.insert(0, os.getenv("SITE_ROOT"))
sys.path.insert(0, os.getenv("AGENT_PATH"))
import threading
from matrixswarm.core.class_lib.time_utils.heartbeat_checker import last_heartbeat_delta
from matrixswarm.core.utils.swarm_sleep import interruptible_sleep
from matrixswarm.core.boot_agent import BootAgent
class Agent(BootAgent):
"""
A high-availability watchdog agent for the MatrixSwarm.
The Sentinel's purpose is to monitor a single, critical agent (typically
the root 'matrix' agent) for signs of failure. If the monitored agent's
heartbeat becomes stale, the Sentinel will use a pre-loaded, secure set of
credentials (the 'security_box') to automatically respawn it, ensuring
the swarm's central authority remains operational.
"""
def __init__(self):
"""
Initializes the Sentinel agent and its monitoring configuration.
This method loads settings from the agent's directive, such as the
timeout period for the agent it is watching. It primarily configures
the conditions under which it will trigger a resurrection.
Attributes:
matrix_secure_verified (bool): A flag indicating if this Sentinel
is a trusted guardian with special privileges.
watching (str): A descriptive name for the agent being monitored.
universal_id_under_watch (str): The universal_id of the agent to
be monitored.
target_node (dict): A placeholder for the target's node data.
time_delta_timeout (int): The number of seconds to wait after the
last heartbeat before considering the target agent to be down.
"""
super().__init__()
config = self.tree_node.get("config", {})
self.matrix_secure_verified=bool(config.get("matrix_secure_verified",0))
self.watching = config.get("watching", "the Matrix")
self.universal_id_under_watch = config.get("universal_id_under_watch", False)
self.target_node = None
self.time_delta_timeout = config.get("timeout", 60) # Default 60 sec if not set
def post_boot(self):
"""
A one-time setup hook that starts the main monitoring thread.
"""
self.log(f"[SENTINEL] Sentinel booted. Monitoring: {self.watching}")
# Start watch thread
threading.Thread(target=self.watch_cycle, daemon=True).start()
def worker_pre(self):
"""A lifecycle hook that runs before the main worker loop begins."""
self.log("[SENTINEL] Sentinel activated. Awaiting signal loss...")
def worker_post(self):
"""A lifecycle hook that runs after the agent's main loops exit."""
self.log("[SENTINEL] Sentinel down. Final watch cycle complete.")
def watch_cycle(self):
"""
The main monitoring and resurrection loop for the Sentinel.
This method runs in a background thread for the entire lifecycle of
the agent. It continuously checks the heartbeat of the target agent
defined in its 'security_box'. If the heartbeat becomes older than the
configured timeout and no 'die' file is present, it constructs a
keychain with the necessary high-privilege credentials and respawns
the target agent.
"""
self.log("[SENTINEL] Watch cycle started.")
if self.universal_id_under_watch:
while self.running:
try:
# The security_box contains the credentials needed to resurrect Matrix
if len(self.security_box)==0:
break
universal_id = self.security_box.get('node').get("universal_id")
if not universal_id:
self.log("Target node missing universal_id. Breathing idle.", block="WATCHING")
break
# Respect intentional shutdown signals
die_file = os.path.join(self.path_resolution['comm_path'], universal_id, 'incoming', 'die')
if os.path.exists(die_file):
self.log(f"{universal_id} has die file. Skipping Loop.", block="WATCHING_DIE_FILE")
interruptible_sleep(self, 10)
continue
# Check if the target's heartbeat is stale
time_delta = last_heartbeat_delta(self.path_resolution['comm_path'], universal_id)
if time_delta is not None and time_delta < self.time_delta_timeout:
interruptible_sleep(self, 10)
continue
# If heartbeat is stale, initiate respawn
try:
keychain = {}
node = self.security_box.get('node', {})
keychain["priv"] = node.get("vault", {}).get("priv", {})
keychain["pub"] = node.get("vault", {}).get("identity", {}).get('pub', {})
keychain["swarm_key"] = self.swarm_key
keychain['private_key'] = node.get("vault", {}).get("private_key")
keychain["matrix_pub"] = self.matrix_pub
# Use the real Matrix private key from the security box
keychain["matrix_priv"] = self.security_box["matrix_priv"]
keychain["encryption_enabled"] = int(self.encryption_enabled)
keychain["security_box"] = self.security_box.copy()
self.spawn_agent_direct(
universal_id=universal_id,
agent_name=node.get("name"),
tree_node=node,
keychain=keychain,
)
self.log(f"{universal_id} respawned successfully.")
except Exception as e:
self.log(f"failed to spawn agent", error=e, block="keep_alive", level="error")
except Exception as e:
self.log(f"failed to spawn agent", error=e, block="main_try", level="error")
interruptible_sleep(self, 10)
if __name__ == "__main__":
agent = Agent()
agent.boot()
Comments 0
Category: monitoring
Tags: #high-availability, #monitoring, #agent-management, #watchdog, #sentinel-agent, #failover, #resurrection, #heartbeat-check
Version: v1.0.0
Author: matrixswarm
Views: 78
Added: July 28, 2025Updated: July 28, 2025