Redundancy Protocols

Failure-tolerant systems are designed to fail gracefully, not to avoid failure. OnionHat builds redundancy into the architecture rather than relying on fallback mechanisms that may never have been exercised under production conditions.

Redundancy Principles

Active-Active Operation

Redundant components operate continuously, not as cold standbys. Failover is not a special event; it is part of normal operation.
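
As a rough illustration of active-active operation, the sketch below rotates requests across every replica and treats a failed replica as just another step in the same dispatch loop. The names Replica and ActiveActivePool are hypothetical and are not part of OnionHat; this is a minimal sketch, not a description of the actual implementation.

```python
from itertools import cycle


class Replica:
    """Hypothetical stand-in for one redundant component."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        return f"{self.name} handled {request}"


class ActiveActivePool:
    """Rotates requests across all replicas, so every replica takes live traffic."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self._order = cycle(self.replicas)

    def dispatch(self, request):
        # Try each replica at most once, starting from the next one in rotation.
        for _ in range(len(self.replicas)):
            replica = next(self._order)
            try:
                return replica.handle(request)
            except ConnectionError:
                # Failover uses the same loop as ordinary dispatch.
                continue
        raise RuntimeError("all replicas failed")
```

Because there is no separate standby path, the failover branch runs whenever any replica is unhealthy, which is what keeps it exercised.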

Independent Failure Domains

Redundant components should not share failure modes. Common dependencies (power, network, provider) are eliminated where possible.
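
One way to audit this principle is to list each replica's dependencies and flag any pair that overlaps. The sketch below assumes a simple mapping from replica name to a set of dependency labels; the replica names and labels shown in the usage note are invented for illustration.

```python
from itertools import combinations


def shared_dependencies(replicas):
    """Return {(name_a, name_b): shared dependency set} for every overlapping pair.

    `replicas` maps a replica name to an iterable of dependency labels.
    """
    overlaps = {}
    for a, b in combinations(sorted(replicas), 2):
        common = set(replicas[a]) & set(replicas[b])
        if common:
            overlaps[(a, b)] = common
    return overlaps
```

For example, given three replicas where replica-1 and replica-3 both list network:carrier-x, the function returns a single entry for that pair, pointing at the dependency that would take both down together.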

Tested Regularly

Redundancy that is never exercised is not redundancy. Failure scenarios are tested as part of normal operation.
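
A failure drill can be sketched as a loop that disables each replica in turn, probes the service, and then restores the replica. The function names and the boolean health map below are assumptions made for the example; a real drill would act on real components.

```python
def run_failure_drill(replicas, probe):
    """Disable each replica in turn and record whether the probe still succeeds.

    `replicas` maps a replica name to True (up) or False (down).
    `probe` takes that mapping and returns True if the service still answers.
    """
    results = {}
    for name in list(replicas):
        replicas[name] = False  # inject the failure
        try:
            results[name] = probe(replicas)
        finally:
            replicas[name] = True  # always restore, even if the probe raises
    return results


def any_replica_up(replicas):
    """Simplest possible probe: the service answers if any replica is up."""
    return any(replicas.values())
```

A result of False for any replica means that losing it alone causes an outage, so that component is not actually redundant.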

Levels of Redundancy

Data Redundancy

Data is replicated across independent storage systems. Replication is synchronous where consistency is required, asynchronous where latency is prioritized.
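
The difference between the two modes can be shown with an in-memory toy. In this sketch, assuming a hypothetical ReplicatedStore class, a synchronous write returns only after every replica holds the value, while an asynchronous write returns after the first replica and queues the rest.

```python
from collections import deque


class ReplicatedStore:
    """Toy store with one dict per replica; not a real storage system."""

    def __init__(self, replica_count):
        self.replicas = [dict() for _ in range(replica_count)]
        self.pending = deque()  # writes not yet applied to the other replicas

    def write_sync(self, key, value):
        # Return only after every replica has the value (consistency first).
        for replica in self.replicas:
            replica[key] = value

    def write_async(self, key, value):
        # Return after the first replica; the others catch up later (latency first).
        self.replicas[0][key] = value
        self.pending.append((key, value))

    def flush(self):
        # Apply queued writes to the remaining replicas in arrival order.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas[1:]:
                replica[key] = value
```

Until flush runs, a read from a secondary replica can miss an asynchronous write. That window is the cost of the lower write latency.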

Service Redundancy

Services run on multiple independent hosts. Load distribution and failover are automatic.

Network Redundancy

Multiple network paths exist between components. Routing automatically adapts to path failures.
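
Path adaptation can be illustrated with a breadth-first search over a link graph that skips failed links. This is a generic sketch, not OnionHat's routing logic; the function name and the node labels in the usage note are assumptions.

```python
from collections import deque


def find_path(links, source, target, failed=frozenset()):
    """Return a shortest path from source to target that avoids failed links.

    `links` is an iterable of (a, b) pairs treated as bidirectional.
    `failed` is a set of frozenset({a, b}) entries naming links that are down.
    Returns a list of nodes, or None if no path remains.
    """
    neighbours = {}
    for a, b in links:
        if frozenset((a, b)) in failed:
            continue
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

With links A-B, B-D, A-C, C-E and E-D, the search returns the two-hop route through B. If the B-D link is marked failed, it returns the longer route through C and E. Once E-D fails as well, it returns None.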

Geographic Redundancy

Critical systems are replicated across jurisdictions. Regional failures do not cause global outages.
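
A placement map makes this property checkable: for any single jurisdiction, list the systems that would have no replica left if it failed. The system names and jurisdiction labels in the usage note are placeholders, not actual deployments.

```python
def systems_lost(placement, failed_jurisdiction):
    """Return the systems with no replica outside the failed jurisdiction.

    `placement` maps a system name to the jurisdictions hosting its replicas.
    """
    return sorted(
        system
        for system, jurisdictions in placement.items()
        if not (set(jurisdictions) - {failed_jurisdiction})
    )
```

For example, if a system called directory runs in J1, J2 and J3 while billing runs only in J2, a failure of J2 loses billing and nothing else. An empty result for every jurisdiction means no regional failure becomes a global outage.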

What Redundancy Does Not Solve

Correlated failures across all replicas
Bugs that affect all instances simultaneously
Operator errors that propagate to all systems
Adversaries with access to all redundant components

Redundancy reduces risk. It does not eliminate it.
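
The effect of correlated failures can be made concrete with a simplified model. The sketch below assumes each replica fails independently with probability p_single, and that a separate common-cause event (a shared bug, an operator error, a compromised credential) takes down all replicas at once with probability p_common. Both parameters and the model itself are illustrative assumptions, not measurements.

```python
def outage_probability(p_single, replicas, p_common=0.0):
    """Probability that all replicas are down under a simple common-cause model.

    Either the common cause occurs, or it does not and every replica
    fails independently.
    """
    independent = p_single ** replicas
    return p_common + (1 - p_common) * independent
```

With p_single = 0.01 and three replicas, the independent term is one in a million. Adding a common-cause probability of 0.001 raises the total to just over one in a thousand, so the shared failure mode dominates and further replicas barely change the result.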