The Uli SDK's approach to reliability rests on one core principle: a task is executed only when the system is in a validated state, with both software and hardware ready for the operation. To enforce this principle, the SDK implements a two-tiered state machine architecture. A Lifecycle Management state machine governs the behavior of each individual mission-critical application, and a top-level Subsystem State machine aggregates these individual states to determine the overall system status and serves as the primary interface to the operator. This layering lets applications track system states, process execution commands deterministically, and provide clear status feedback, so that behavior stays predictable and controlled.
The graph below shows the interactions (indicated by arrows) among the participating services:
The arrows in the graph are explained below:
1. An operator or external system issues high-level commands (RESET, SHUTDOWN, RENDER USELESS) to the Subsystem State service.
2 and 3. The Subsystem State service translates the high-level command into specific directives (Initialize, Shutdown, Render Useless) and sends them down to the Lifecycle Management service of each mission-critical application.
4 and 5. Each Lifecycle Management service reports its current state back up to the Subsystem State service. The Subsystem State service then aggregates these individual states to determine its own transition.
Example: The Subsystem State service transitions from INITIALIZE to OPERATIONAL only after all Lifecycle Management services have reported that they are in the STANDBY or READY state.
6. The mission client engages an application by continuously sending GO commands to its Lifecycle Management service. The first GO command transitions the application from STANDBY to READY, and the continuing stream serves as a keep-alive signal that maintains the READY state during operation.
7. The Subsystem State service reports its state back to the Subsystem State Client, providing the operator with a clear, unified view of the system’s current status.
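The GO keep-alive in step 6 can be illustrated with a minimal Python sketch. It is not the SDK's API: the class `GoKeepAlive`, its methods, and the `timeout_s` value are all hypothetical, and the sketch assumes that an application falls back to STANDBY once no GO command has arrived within the timeout window.

```python
from enum import Enum


class LifecycleState(Enum):
    STANDBY = "STANDBY"
    READY = "READY"


class GoKeepAlive:
    """Derives STANDBY/READY from how recently a GO command arrived."""

    def __init__(self, timeout_s=1.0):
        # timeout_s is a made-up value; the source does not state the GO timeout.
        self.timeout_s = timeout_s
        self.last_go = None  # time of the most recent GO, None if never engaged

    def on_go(self, now):
        """Record a GO command received at time `now` (in seconds)."""
        self.last_go = now

    def state(self, now):
        """READY while the latest GO is within the timeout, otherwise STANDBY."""
        if self.last_go is not None and now - self.last_go <= self.timeout_s:
            return LifecycleState.READY
        return LifecycleState.STANDBY


app = GoKeepAlive(timeout_s=1.0)
before_go = app.state(now=0.0)   # no GO received yet
app.on_go(now=0.0)
after_go = app.state(now=0.5)    # first GO is still fresh
app.on_go(now=0.9)               # a second GO renews the window
renewed = app.state(now=1.5)     # 0.6 s since the last GO
timed_out = app.state(now=2.5)   # 1.6 s since the last GO
```

Passing the clock in as `now` keeps the sketch deterministic; a real client would presumably send GO on a timer shorter than the timeout.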
At the application level, this reliability is realized through the Lifecycle Management state machine framework. Inherited from the Management Service, this framework provides each mission-critical application with a standardized set of states to govern its behavior throughout its entire operational lifecycle. The behavior of an application in each state is strictly defined to ensure safety and predictability:
Upon startup, the application enters the INITIALIZE state to perform all necessary procedures and self-tests.
Only after successful validation does it proceed to STANDBY, awaiting engagement, and then to READY, where it executes its core commands and operations.
Should an unrecoverable error occur, the application transitions to the FAULT state, where its sole responsibility is to report the error via the Health Reporter.
The lifecycle also includes states for controlled interruptions (PAUSE), orderly termination (SHUTDOWN), and secure decommissioning (RENDER USELESS).
Here is the Lifecycle Management state machine:
The following table lists the state transitions and their triggers:
Transition | Trigger |
1 | Initialization procedures are performed and tests are successful. |
2 | Receive INITIALIZE command. |
3 | Client engages, receive GO command. |
4 | GO command times out. |
5 | Recoverable error or pause condition is detected. |
6 | Receive CONTINUE command. |
7 | Receive SHUTDOWN command. |
8 | Receive RENDER USELESS command. |
9 | Receive SHUTDOWN command. |
10 | Receive RENDER USELESS command. |
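A table-driven state machine is one way to picture these transitions. The Python sketch below is illustrative only and does not reproduce the SDK's implementation. The table above names the triggers but not the source and destination of each numbered transition, so every endpoint marked "assumed" is a guess, as are the names `State`, `Trigger`, `TRANSITIONS`, and `step`. The FAULT state is declared but left without transitions because the table does not list any.

```python
from enum import Enum, auto


class State(Enum):
    INITIALIZE = auto()
    STANDBY = auto()
    READY = auto()
    PAUSE = auto()
    FAULT = auto()
    SHUTDOWN = auto()
    RENDER_USELESS = auto()


class Trigger(Enum):
    INIT_OK = auto()
    INITIALIZE_CMD = auto()
    GO = auto()
    GO_TIMEOUT = auto()
    PAUSE_CONDITION = auto()
    CONTINUE_CMD = auto()
    SHUTDOWN_CMD = auto()
    RENDER_USELESS_CMD = auto()


# (source state, trigger) -> destination state.
# "assumed" marks endpoints that the source text does not spell out.
TRANSITIONS = {
    (State.INITIALIZE, Trigger.INIT_OK): State.STANDBY,
    (State.STANDBY, Trigger.INITIALIZE_CMD): State.INITIALIZE,          # assumed source
    (State.STANDBY, Trigger.GO): State.READY,
    (State.READY, Trigger.GO_TIMEOUT): State.STANDBY,                   # assumed destination
    (State.READY, Trigger.PAUSE_CONDITION): State.PAUSE,                # assumed source
    (State.PAUSE, Trigger.CONTINUE_CMD): State.STANDBY,                 # assumed destination
    (State.STANDBY, Trigger.SHUTDOWN_CMD): State.SHUTDOWN,              # assumed source
    (State.PAUSE, Trigger.SHUTDOWN_CMD): State.SHUTDOWN,                # assumed source
    (State.STANDBY, Trigger.RENDER_USELESS_CMD): State.RENDER_USELESS,  # assumed source
    (State.PAUSE, Trigger.RENDER_USELESS_CMD): State.RENDER_USELESS,    # assumed source
}


def step(state, trigger):
    """Return the next state, or the unchanged state if the trigger does not apply."""
    return TRANSITIONS.get((state, trigger), state)


state = State.INITIALIZE
trace = [state]
for trigger in (Trigger.GO, Trigger.INIT_OK, Trigger.GO, Trigger.GO_TIMEOUT):
    state = step(state, trigger)
    trace.append(state)
```

Ignoring triggers that have no entry for the current state reflects the principle that commands take effect only in a validated state: here the first GO, sent during INITIALIZE, leaves the state unchanged.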
The Subsystem State service acts as the central orchestrator for the entire subsystem, governing its overall behavior and serving as the primary interface for the operator. Its core responsibilities are twofold: it translates high-level operator commands into specific directives for each mission-critical application's Lifecycle Management service, and conversely, it aggregates the individual states reported by each application to determine a unified, overall subsystem state. This aggregated state is then reported back to the operator, providing a clear and authoritative view of the system's operational status.
This behavior is governed by a dedicated state machine with five primary states: INITIALIZE, OPERATIONAL, PAUSE, SHUTDOWN, and RENDER USELESS:
The following table lists the state transitions and their triggers:
Transition | Trigger |
1 | All Lifecycle Management services are in either the STANDBY or READY state. |
2 | Receive RESET command. |
3 | At least one Lifecycle Management service is in the PAUSE state. |
4 | Receive CONTINUE command. |
5 | Receive SHUTDOWN command. |
6 | Receive RENDER USELESS command. |
7 | Receive SHUTDOWN command. |
8 | Receive RENDER USELESS command. |
The transition from INITIALIZE to OPERATIONAL is not triggered by a direct command but is an automatic validation step. This transition only occurs once all Lifecycle Management services report they are in either a STANDBY or READY state, confirming that the entire subsystem has successfully initialized.
If even a single application reports a PAUSE condition, the entire subsystem will enter the PAUSE state, requiring an explicit CONTINUE command from the operator to resume.
Direct commands such as SHUTDOWN and RENDER USELESS will move the subsystem into its terminal states from either an OPERATIONAL or PAUSE state.
Finally, a RESET command will return the subsystem to the INITIALIZE state, beginning the startup and validation sequence anew.
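The rules above can be sketched in Python as two small functions, one for the automatic transitions driven by reported application states and one for operator commands. This is a hedged illustration, not the SDK's code: the names `Sub`, `on_reports`, and `on_command` are hypothetical, application states are passed as plain strings for brevity, and the sketch assumes that CONTINUE returns the subsystem to OPERATIONAL and that RESET is ignored in the terminal states.

```python
from enum import Enum


class Sub(Enum):
    INITIALIZE = "INITIALIZE"
    OPERATIONAL = "OPERATIONAL"
    PAUSE = "PAUSE"
    SHUTDOWN = "SHUTDOWN"
    RENDER_USELESS = "RENDER USELESS"


def on_reports(state, app_states):
    """Apply the automatic transitions driven by aggregated application states."""
    # Assumption: an empty report list never counts as "all applications ready".
    all_ready = bool(app_states) and all(s in ("STANDBY", "READY") for s in app_states)
    if state is Sub.INITIALIZE and all_ready:
        return Sub.OPERATIONAL
    if state is Sub.OPERATIONAL and "PAUSE" in app_states:
        return Sub.PAUSE
    return state


def on_command(state, command):
    """Apply an operator command; commands that do not apply are ignored."""
    if command in ("SHUTDOWN", "RENDER USELESS") and state in (Sub.OPERATIONAL, Sub.PAUSE):
        return Sub(command)  # look up the terminal state by its value
    if command == "CONTINUE" and state is Sub.PAUSE:
        return Sub.OPERATIONAL  # assumed destination
    if command == "RESET" and state in (Sub.OPERATIONAL, Sub.PAUSE):
        return Sub.INITIALIZE  # assumed: terminal states do not accept RESET
    return state


s0 = Sub.INITIALIZE
s1 = on_reports(s0, ["STANDBY", "INITIALIZE"])  # one application still initializing
s2 = on_reports(s1, ["STANDBY", "READY"])       # every application validated
s3 = on_reports(s2, ["READY", "PAUSE"])         # a single PAUSE pauses the subsystem
s4 = on_command(s3, "CONTINUE")
s5 = on_command(s4, "RESET")
s6 = on_command(s5, "SHUTDOWN")                 # not accepted while in INITIALIZE
```

Keeping report handling separate from command handling mirrors the text: the INITIALIZE to OPERATIONAL step can only come from aggregated reports, never from a direct command.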
Ultimately, the Uli SDK's reliability framework provides a robust mechanism for enforcing operational integrity by decoupling high-level operator commands from low-level application execution. The Subsystem State acts as a gatekeeper and translator, while the Lifecycle Management state machine ensures each application adheres strictly to its defined behavior. This hierarchical control structure prevents unintended actions and guarantees that the entire system behaves as a cohesive, predictable whole.