Reliability
infrastructure
for Solana

Routing policies, observability, and incident response for high-throughput RPC infrastructure.

SLO: P99 < 200ms | 99.99% uptime
Interfaces: API, SDK, CLI, Console

System model

Four-stage pipeline from signal to traffic decision

1.SignalsMetrics, traces, health checks
2.ScoringWeight by latency, errors, capacity
3.Policy evalApply rules, thresholds, constraints
4.Traffic decisionRoute, shift, or circuit-break

Interfaces

API-first with SDKs, CLI, and optional web console

API
- /v1/routes - query routing table state
- /v1/policies - manage routing policies
- /v1/telemetry - stream metrics and traces
- /v1/incidents - incident history and context
SDK
- TypeScript client (npm: @gate/client)
- Rust client (crates.io: gate-client)
CLI
$ gate status
$ gate policy apply latency-failover.yaml
$ gate routes list
$ gate incidents show --last 24h
Web Console
- Optional UI for metrics dashboards and incident review
- All functionality available via API and CLI

Primitives

Core abstractions and guarantees

[ Observability ]

Track P50 / P95 / P99 latency, error budgets, and saturation.

  • - Multi-dimensional metrics
  • - Distributed trace context
  • - Endpoint-level breakdowns
[ Routing Policies ]

Declarative failover logic and traffic shaping.

  • - Versioned policy configs
  • - Priority, weights, decay, hysteresis
  • - Health-based routing
  • - Gradual rollouts
[ Endpoint Scoring ]

Weight endpoints by latency, availability, and cost.

  • - Real-time scoring
  • - Geographic affinity
  • - Capacity-aware selection
[ Audit Model ]

Complete history of routing decisions and config changes.

  • - Config hash verification
  • - Signed changes with attribution
  • - Immutable logs
  • - Full diff viewing and rollback

Failure modes handled

Designed for real-world degradation patterns

- RPC timeout / connection failures
- Partial endpoint degradation (latency increase without total failure)
- Skewed endpoints (one endpoint significantly slower than pool)
- Error rate spikes with intermittent recovery
- Capacity saturation (connection pool exhaustion)
- Network partition / geographic isolation

Policy examples

Declarative YAML configs for routing behavior

# policy/latency-failover.yaml
name: latency_failover
priority: 100
trigger:
metric: p99_latency
threshold: 200ms
window: 30s
action: shift_traffic
target: fallback_pool
hysteresis: 60s
# policy/error-budget.yaml
name: error_budget_enforcement
priority: 200
trigger:
metric: error_rate
threshold: 0.1%
burn_rate: 5x
lookback: 1h
action: circuit_break
recovery_threshold: 0.01%
# Routing table dump
$ gate routes list --format=table
ENDPOINT SCORE WEIGHT STATUS P99
us-west-1a 0.92 0.35 OK 142ms
us-west-1b 0.88 0.32 OK 156ms
us-east-1a 0.85 0.28 OK 178ms
eu-central-1 0.42 0.05 DEGRADED 421ms

Request early access

Join infrastructure teams already using Gate in production.

Get Started