Comprehensive guide for monitoring Stable nodes and performing routine maintenance tasks.

Monitoring Stack Overview

  • Prometheus: Metrics collection
  • Grafana: Visualization and dashboards
  • AlertManager: Alert routing and management
  • Node Exporter: System metrics
  • Loki: Log aggregation (optional)

Quick Monitoring Setup

Step 1: Enable Prometheus Metrics

# Edit ~/.stabled/config/config.toml
[instrumentation]
prometheus = true
prometheus_listen_addr = ":26660"
namespace = "stablebft"
Restart node:
sudo systemctl restart ${SERVICE_NAME}
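If metrics are enabled correctly, the node exposes them over HTTP on the configured port after the restart. A quick sanity check (assumes the default address above and the "stablebft" namespace set in the config):

# Confirm the metrics endpoint responds
curl -s http://localhost:26660/metrics | grep stablebft_consensus_height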

Step 2: Install Prometheus

# Download Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xvf prometheus-2.45.0.linux-amd64.tar.gz
sudo mv prometheus-2.45.0.linux-amd64 /opt/prometheus

# Create config
sudo tee /opt/prometheus/prometheus.yml > /dev/null <<EOF
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'stable-node'
    static_configs:
      - targets: ['localhost:26660']
        labels:
          instance: 'mainnode'

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']
EOF

# Create systemd service
sudo tee /etc/systemd/system/prometheus.service > /dev/null <<EOF
[Unit]
Description=Prometheus
After=network.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/opt/prometheus/prometheus \
  --config.file=/opt/prometheus/prometheus.yml \
  --storage.tsdb.path=/opt/prometheus/data

[Install]
WantedBy=multi-user.target
EOF

# Start Prometheus
sudo useradd -rs /bin/false prometheus
sudo chown -R prometheus:prometheus /opt/prometheus
sudo systemctl enable prometheus
sudo systemctl start prometheus
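Once Prometheus is running, its targets API can confirm that both scrape jobs are healthy; a quick check (assumes the default web port 9090 and that jq is installed):

# List scrape targets and their health
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'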

Step 3: Install Grafana

# Add Grafana repository
sudo apt-get install -y software-properties-common
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -

# Install Grafana
sudo apt-get update
sudo apt-get install grafana

# Start Grafana
sudo systemctl enable grafana-server
sudo systemctl start grafana-server

# Access at http://your-ip:3000
# Default login: admin/admin
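Grafana also needs Prometheus registered as a data source. This can be done in the UI or with a provisioning file; a minimal sketch, assuming the default package paths and a local Prometheus on port 9090:

# Provision the Prometheus data source
sudo tee /etc/grafana/provisioning/datasources/prometheus.yml > /dev/null <<EOF
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF

sudo systemctl restart grafana-server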

Key Metrics to Monitor

Node Health Metrics

Metric                               | Description           | Alert Threshold
-------------------------------------|-----------------------|--------------------
up                                   | Node availability     | = 0 for 5m
stablebft_consensus_height           | Current block height  | No increase for 5m
stablebft_consensus_validators       | Active validators     | N/A
stablebft_consensus_rounds           | Consensus rounds      | > 3
stablebft_consensus_block_interval   | Block time            | > 10s
stablebft_p2p_peers                  | Connected peers       | < 3
stablebft_mempool_size               | Mempool size          | > 1500
stablebft_mempool_failed_txs         | Failed transactions   | > 100/min

System Metrics

Metric                             | Description        | Alert Threshold
-----------------------------------|--------------------|------------------
node_cpu_seconds_total             | CPU usage          | > 80% for 5m
node_memory_MemAvailable_bytes     | Available memory   | < 10%
node_filesystem_avail_bytes        | Available disk     | < 10%
node_network_receive_bytes_total   | Network RX         | > 100MB/s
node_disk_io_time_seconds_total    | Disk I/O           | > 80%
node_load15                        | System load        | > CPU cores * 2
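The system metrics above are exposed by Node Exporter, which the Prometheus scrape config targets on port 9100 but which is not installed in the steps above. A minimal install sketch (the version shown is an assumption; adjust as needed):

# Download and install Node Exporter (v1.6.1 used here as an example version)
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
tar xvf node_exporter-1.6.1.linux-amd64.tar.gz
sudo mv node_exporter-1.6.1.linux-amd64/node_exporter /usr/local/bin/

# Minimal systemd unit, reusing the prometheus user created earlier
sudo tee /etc/systemd/system/node_exporter.service > /dev/null <<EOF
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable node_exporter
sudo systemctl start node_exporter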

Grafana Dashboard Setup

Import Stable Dashboard

{
  "dashboard": {
    "title": "Stable Node Monitoring",
    "panels": [
      {
        "title": "Block Height",
        "targets": [
          {
            "expr": "stablebft_consensus_height{chain_id=\"stabletestnet_2201-1\"}"
          }
        ]
      },
      {
        "title": "Peers",
        "targets": [
          {
            "expr": "stablebft_p2p_peers"
          }
        ]
      },
      {
        "title": "Block Time",
        "targets": [
          {
            "expr": "rate(stablebft_consensus_height[1m]) * 60"
          }
        ]
      },
      {
        "title": "Mempool Size",
        "targets": [
          {
            "expr": "stablebft_mempool_size"
          }
        ]
      }
    ]
  }
}

Custom Dashboard Import

Import dashboards via Grafana UI:
# Navigate to Dashboards > Import > Upload JSON file
# Or use Dashboard ID in Grafana's dashboard library
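The same payload shape can also be pushed through Grafana's HTTP API; a sketch, assuming the JSON above is saved as dashboard.json and the default admin credentials are still in place:

# Import the dashboard via the Grafana API
curl -s -X POST http://admin:admin@localhost:3000/api/dashboards/db \
  -H "Content-Type: application/json" \
  -d @dashboard.json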

AlertManager Configuration

Install AlertManager

# Download AlertManager
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar xvf alertmanager-0.26.0.linux-amd64.tar.gz
sudo mv alertmanager-0.26.0.linux-amd64 /opt/alertmanager

# Configure
sudo tee /opt/alertmanager/alertmanager.yml > /dev/null <<EOF
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'team-notifications'

receivers:
  - name: 'team-notifications'
    webhook_configs:
      - url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        send_resolved: true
    email_configs:
      - to: 'alerts@yourteam.com'
        from: 'prometheus@yournode.com'
        smarthost: 'smtp.gmail.com:587'
        auth_username: 'your@gmail.com'
        auth_password: 'app-specific-password'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
EOF
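
# Create a systemd unit for AlertManager (required before the enable/start
# commands below; this unit is a sketch modeled on the Prometheus unit above)
sudo tee /etc/systemd/system/alertmanager.service > /dev/null <<EOF
[Unit]
Description=AlertManager
After=network.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/opt/alertmanager/alertmanager \
  --config.file=/opt/alertmanager/alertmanager.yml \
  --storage.path=/opt/alertmanager/data

[Install]
WantedBy=multi-user.target
EOF

sudo chown -R prometheus:prometheus /opt/alertmanager
sudo systemctl daemon-reload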

# Start AlertManager
sudo systemctl enable alertmanager
sudo systemctl start alertmanager

Alert Rules

# /opt/prometheus/alerts.yml
groups:
  - name: stable_alerts
    rules:
      - alert: NodeDown
        expr: up{job="stable-node"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.instance }} is down"

      - alert: BlockProductionStopped
        expr: increase(stablebft_consensus_height[5m]) == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Block production stopped"

      - alert: LowPeerCount
        expr: stablebft_p2p_peers < 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low peer count: {{ $value }}"

      - alert: HighMempool
        expr: stablebft_mempool_size > 1500
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High mempool size: {{ $value }}"

      - alert: DiskSpaceLow
        expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} < 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space: {{ $value | humanizePercentage }}"

      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage: {{ $value }}%"

Log Monitoring

Systemd Logs

# View recent logs
sudo journalctl -u ${SERVICE_NAME} -n 100

# Follow logs
sudo journalctl -u ${SERVICE_NAME} -f

# Filter by time
sudo journalctl -u ${SERVICE_NAME} --since "1 hour ago"

# Export logs
sudo journalctl -u ${SERVICE_NAME} --since today > stable-logs-$(date +%Y%m%d).log

Log Analysis Scripts

#!/bin/bash
# analyze-logs.sh

# Default to the "stable" service if SERVICE_NAME is not set in the environment
SERVICE_NAME=${SERVICE_NAME:-stable}

# Count errors in last hour
echo "Errors in last hour:"
sudo journalctl -u ${SERVICE_NAME} --since "1 hour ago" | grep -c ERROR

# Show peer connections
echo "Peer connections:"
sudo journalctl -u ${SERVICE_NAME} --since "10 minutes ago" | grep "Peer connection" | tail -10

# Check for consensus issues
echo "Consensus rounds:"
sudo journalctl -u ${SERVICE_NAME} --since "30 minutes ago" | grep -E "enterNewRound|Timeout" | tail -20

# Memory usage patterns
echo "Memory warnings:"
sudo journalctl -u ${SERVICE_NAME} --since "1 day ago" | grep -i memory

Loki Setup (Optional)

# Install Loki
wget https://github.com/grafana/loki/releases/download/v2.9.0/loki-linux-amd64.zip
unzip loki-linux-amd64.zip
sudo mv loki-linux-amd64 /usr/local/bin/loki

# Install Promtail
wget https://github.com/grafana/loki/releases/download/v2.9.0/promtail-linux-amd64.zip
unzip promtail-linux-amd64.zip
sudo mv promtail-linux-amd64 /usr/local/bin/promtail

# Configure Promtail
sudo tee /etc/promtail-config.yml > /dev/null <<EOF
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: stable
    journal:
      matches: "_SYSTEMD_UNIT=stabled.service"
      labels:
        job: stable
        host: localhost
EOF

# Run Promtail in the foreground for testing (Loki itself needs its own config file and service)
promtail -config.file=/etc/promtail-config.yml

Health Check Endpoints

HTTP Endpoints

# Basic health check
curl -s http://localhost:26657/health

# Node status
curl -s http://localhost:26657/status | jq

# Net info
curl -s http://localhost:26657/net_info | jq

# Consensus state
curl -s http://localhost:26657/consensus_state | jq

# Unconfirmed transactions
curl -s http://localhost:26657/num_unconfirmed_txs | jq

Health Check Script

#!/bin/bash
# health-check.sh

set -e

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
export SERVICE_NAME="stable"

echo "=== Stable Node Health Check ==="
echo

# Check if service is running
if systemctl is-active --quiet ${SERVICE_NAME}; then
    echo -e "${GREEN}✓${NC} Service is running"
else
    echo -e "${RED}✗${NC} Service is not running"
    exit 1
fi

# Check node sync status
SYNC_STATUS=$(curl -s localhost:26657/status | jq -r '.result.sync_info.catching_up')
if [ "$SYNC_STATUS" = "false" ]; then
    echo -e "${GREEN}✓${NC} Node is synced"
else
    echo -e "${YELLOW}⚠${NC} Node is syncing"
fi

# Check peer count
PEERS=$(curl -s localhost:26657/net_info | jq -r '.result.n_peers')
if [ "$PEERS" -ge 3 ]; then
    echo -e "${GREEN}✓${NC} Connected peers: $PEERS"
else
    echo -e "${YELLOW}⚠${NC} Low peer count: $PEERS"
fi

# Check disk space
DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -lt 80 ]; then
    echo -e "${GREEN}✓${NC} Disk usage: ${DISK_USAGE}%"
else
    echo -e "${YELLOW}⚠${NC} High disk usage: ${DISK_USAGE}%"
fi

# Check memory
MEM_AVAILABLE=$(free -m | awk 'NR==2 {print $7}')
MEM_TOTAL=$(free -m | awk 'NR==2 {print $2}')
MEM_PERCENT=$((100 - (MEM_AVAILABLE * 100 / MEM_TOTAL)))
if [ "$MEM_PERCENT" -lt 80 ]; then
    echo -e "${GREEN}✓${NC} Memory usage: ${MEM_PERCENT}%"
else
    echo -e "${YELLOW}⚠${NC} High memory usage: ${MEM_PERCENT}%"
fi

echo
echo "=== Health Check Complete ==="

Maintenance Tasks

Daily Maintenance

#!/bin/bash
# daily-maintenance.sh

# Rotate logs
sudo journalctl --rotate
sudo journalctl --vacuum-time=7d

# Clear cache
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

# Check for updates
echo "Checking for updates..."
curl -s https://api.github.com/repos/stable-chain/stable/releases/latest | jq -r '.tag_name'

# Backup important config files
mkdir -p ~/backups
cp ~/.stabled/config/node_key.json ~/backups/node_key_$(date +%Y%m%d).json

# Generate report
mkdir -p ~/reports
echo "Daily report generated: $(date)" > ~/reports/daily_$(date +%Y%m%d).log
curl -s localhost:26657/status | jq >> ~/reports/daily_$(date +%Y%m%d).log

Weekly Maintenance

#!/bin/bash
# weekly-maintenance.sh

# Prune old data
stabled prune

# Compact database
stabled compact

# Update peer list (rewrite persistent_peers instead of appending raw text to config.toml)
wget -q -O peers.txt https://raw.githubusercontent.com/stable-chain/networks/main/testnet/peers.txt
PEERS=$(paste -sd, peers.txt)   # assumes one peer per line
sed -i "s|^persistent_peers *=.*|persistent_peers = \"$PEERS\"|" ~/.stabled/config/config.toml

# Create snapshot (optional)
./create-snapshot.sh

# System updates
sudo apt update
sudo apt upgrade -y

# Restart node (during low activity)
sudo systemctl restart ${SERVICE_NAME}
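Both maintenance scripts can be scheduled with cron; example entries (paths are assumptions, and note that the weekly script uses sudo, so it needs passwordless sudo or a root crontab):

# Daily at 03:00, weekly on Sunday at 04:00
0 3 * * * $HOME/daily-maintenance.sh >> $HOME/maintenance.log 2>&1
0 4 * * 0 $HOME/weekly-maintenance.sh >> $HOME/maintenance.log 2>&1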

Database Maintenance

# Check database size
du -sh ~/.stabled/data/

# Analyze database
stabled debug db stats ~/.stabled/data

Performance Monitoring

Resource Usage Tracking

#!/bin/bash
# track-resources.sh
# Logs one CSV line per minute: stabled CPU%, MEM% (from top) and disk %util (from iostat)

mkdir -p ~/metrics

while true; do
    TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
    CPU=$(top -bn1 | grep "stabled" | awk '{print $9}')    # %CPU column of top
    MEM=$(top -bn1 | grep "stabled" | awk '{print $10}')   # %MEM column of top
    IO=$(iostat -x 1 2 | tail -n2 | awk 'NF {print $NF}')  # %util is the last column

    echo "$TIMESTAMP,CPU:$CPU,MEM:$MEM,IO:$IO" >> ~/metrics/resources.csv

    sleep 60
done

Query Performance

# Monitor RPC response times
while true; do
    START=$(date +%s%N)
    curl -s http://localhost:26657/status > /dev/null
    END=$(date +%s%N)
    DIFF=$((($END - $START) / 1000000))
    echo "RPC response time: ${DIFF}ms"
    sleep 5
done

Monitoring Best Practices

  1. Set Up Redundant Monitoring
    • Use external monitoring services
    • Implement cross-node monitoring
    • Set up dead man’s switch alerts
  2. Alert Fatigue Prevention
    • Tune alert thresholds based on baseline
    • Use alert grouping and inhibition
    • Implement escalation policies
  3. Data Retention
    • Keep metrics for 30 days minimum
    • Archive important logs
    • Regular backup of monitoring configs
  4. Security
    • Secure Grafana with strong passwords
    • Use HTTPS for all endpoints
    • Restrict prometheus access
  5. Documentation
    • Document all custom metrics
    • Maintain runbooks for alerts
    • Keep dashboard descriptions updated

Next Steps