Documentation

❤️System Health Monitoring

Real-time monitoring, diagnostics, and health checks to ensure your AI system runs at peak performance

← Back to Documentation

Always-On System Health

Monitor every aspect of your Hive AI system in real-time. Track API health, model availability, performance metrics, and get instant alerts when issues arise. Keep your AI infrastructure running smoothly.

System Status

All Systems Operational
99.9%
API Uptime
320+
Models Online
45ms
Avg Response
0
Active Incidents

🔌 API Health

Provider connections, response times, error rates

🤖 Model Status

Availability, performance, rate limits

💾 System Resources

Database, storage, memory usage

💻 Terminal/CLI Commands

Quick Health Check

# Basic health status
hive health

✅ API Connection: Healthy
✅ License: Valid (Enterprise)
✅ Models: 320/320 available
✅ Database: Connected
✅ Overall Status: Operational

Detailed Diagnostics

# Comprehensive health report
hive health --detailed

# Check specific component
hive health --component database
hive health --component providers
hive health --component models

# Performance diagnostics
hive health --performance

Monitor & Watch

# Real-time monitoring (updates every 5s)
hive health --watch

# Monitor with custom interval
hive health --watch --interval 10

# Export health metrics
hive health --export health-report.json

Troubleshooting

# Run diagnostic tests
hive health diagnose

# Check error logs
hive health errors --last 24h

# Test specific provider
hive health test openai
hive health test anthropic

🔧 IDE Integration (Claude Code, Cursor, Windsurf)

💡 MCP Tool Name: hive_health - Monitor system health directly from your IDE

Basic Health Check

// MCP Tool Call
hive_health({
component: "all",
detailed: false
})

Quick system status check

Detailed Report

// MCP Tool Call
hive_health({
component: "all",
detailed: true
})

Comprehensive diagnostics

Component Check

// MCP Tool Call
hive_health({
component: "providers",
detailed: true
})

Check specific components

Performance Stats

// MCP Tool Call
hive_health({
component: "performance",
detailed: true
})

Performance metrics

📊 Health Metrics Explained

🔌 API Health Metrics

  • Connection Status: Active/Failed/Retry
  • Response Time: Average latency in ms
  • Error Rate: Failed requests percentage
  • Rate Limits: Usage vs available quota
Healthy Range:
• Response: <100ms
• Error Rate: <1%
• Uptime: >99.5%

🤖 Model Health Metrics

  • Availability: Online/Offline status
  • Performance: Token generation speed
  • Queue Time: Wait time for requests
  • Success Rate: Completed vs failed
Key Indicators:
• Availability: 99%+
• Queue: <5 seconds
• Success: >98%

💾 System Resource Metrics

  • Database: Connection pool status
  • Storage: Used vs available space
  • Memory: RAM usage patterns
  • Cache: Hit rate and efficiency
Thresholds:
• Storage: <80% used
• Memory: <90% used
• Cache Hit: >85%

🔐 Circuit Breakers & Protection

Hive AI includes automatic circuit breakers that protect your system from cascading failures:

Provider Circuit Breakers

  • Auto-disable failing providers
  • Retry with exponential backoff
  • Automatic recovery testing
  • Fallback to alternate providers
# Check circuit status
hive health circuits

Cost Protection

  • Budget limit enforcement
  • Spike detection alerts
  • Automatic pause on anomalies
  • Usage trend monitoring
# Check cost limits
hive health budget

🚨 Alerts & Notifications

Configure Health Alerts

# Set up email alerts
hive health alerts add --email team@company.com

# Add webhook notifications
hive health alerts add --webhook https://alerts.company.com

# Configure alert thresholds
hive health alerts config --error-rate 5 --response-time 1000

🔴 Critical Alerts

  • API connection lost
  • License expired
  • Database failure

🟡 Warning Alerts

  • High error rate
  • Slow response times
  • Budget threshold

🔵 Info Alerts

  • New models available
  • System updates
  • Usage reports

📈 Health Dashboard

Real-Time Monitoring View

hive health --dashboard
╔══════════════ HIVE AI HEALTH DASHBOARD ══════════════╗
║ ║
System Status: HEALTHY Uptime: 99.95% ║
║ ║
║ APIs: OpenRouter Anthropic Google ║
║ Models: 320/320 online Queue: 0.2s avg ║
║ Database: Connected Cache Hit: 92% ║
║ ║
║ Response Times: [▁▂▁▃▂▁▂▁] 45ms avg ║
║ Error Rate: [▁▁▁▁▁▁▁▁] 0.02% ║
╚══════════════════════════════════════════════════════╝

The dashboard updates in real-time, showing system vitals, performance graphs, and active alerts. Use arrow keys to navigate between sections, press 'q' to quit.

✅ Health Monitoring Best Practices

Regular Checks

  • Run daily health checks
  • Monitor after deployments
  • Check before critical tasks
  • Review weekly reports
  • Test recovery procedures

Proactive Monitoring

  • Set up automated alerts
  • Monitor usage trends
  • Track performance baselines
  • Document incident responses
  • Plan capacity ahead

🛡️ Enterprise-Grade Reliability

With comprehensive health monitoring, you'll know your AI system status at all times. Get alerts before issues impact your work, and maintain peak performance with proactive diagnostics.

# Check your system health now
hive health --detailed
Start free trial