incident-commander/docs/SETUP.md
2026-03-12 13:32:12 +03:00

3.7 KiB

Setup Guide — Hermes Incident Commander

Quick Start (Demo Only — No Hermes Required)

# 1. Clone the repo
git clone https://github.com/YOUR_USERNAME/hermes-incident-commander
cd hermes-incident-commander

# 2. Install demo dependencies
pip install anthropic rich

# 3. Set API key
export ANTHROPIC_API_KEY=sk-ant-...

# 4. Run a demo incident
python demo/demo_incident.py --scenario disk-full-logs

Full Setup (With Hermes Agent)

Step 1: Install Hermes Agent

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

This installs Python 3.11, uv, and Hermes. Takes about 60 seconds.

Step 2: Run Setup Wizard

hermes setup

Choose your model provider:

  • Nous Portal (recommended) — OAuth login, access to Hermes models
  • OpenRouter — API key, access to all models
  • Custom endpoint — VLLM, Ollama, or any OpenAI-compatible API

Step 3: Install the Incident Commander Skill

# Copy skill to Hermes's skills directory
cp -r skills/incident-commander ~/.hermes/skills/

# Verify it loaded
hermes
# In the Hermes CLI:
> /skills
# You should see "incident-commander" in the list

Step 4: Set Up Messaging Gateway (for alerts)

hermes gateway setup

Follow the prompts to connect:

  • Telegram (recommended) — Create a bot via @BotFather, paste the token
  • Discord — Create a bot in Discord Developer Portal
  • Slack — Create a Slack app with webhook URL

Then start the gateway:

hermes gateway install  # Installs as systemd service (runs on boot)
hermes gateway          # Or run manually

Step 5: Configure Monitoring Cron Jobs

Open a Hermes conversation and say:

Set up incident monitoring with these schedules:
- Every 5 minutes: run a critical health check, alert me on Telegram if severity is P0 or P1
- Every hour: run a comprehensive system audit and save it to ~/.hermes/incidents/
- Every day at 08:00: send me a morning briefing on Telegram with any trends or risks

Hermes will create the cron jobs automatically.

Step 6: Test It

# Trigger a test incident
hermes
> I'm testing the incident response system. 
> Please investigate the current system health and generate a test incident report.

RL Training Setup (Advanced)

Prerequisites

  • Hermes Agent installed
  • Atropos: pip install atroposlib
  • For GPU training: VLLM installed

Generate SFT Training Data

# Set your API key
export OPENROUTER_API_KEY=sk-or-...

# Generate 100 training trajectories (SFT mode)
python environments/incident_env.py process \
    --config environments/incident_config.yaml \
    --num-episodes 100

Output: ShareGPT-format JSONL in ~/.hermes/trajectories/

Full RL Training

# Edit config to point to your VLLM server
vim environments/incident_config.yaml
# Set: server_type: vllm, model_name: your-model

# Start training
python environments/incident_env.py serve \
    --config environments/incident_config.yaml

Metrics logged to Weights & Biases (configure wandb.entity in config).


Troubleshooting

"hermes: command not found"

source ~/.bashrc  # or ~/.zshrc
# Or: ~/.local/bin/hermes

"Skill not loading"

ls ~/.hermes/skills/incident-commander/SKILL.md  # Should exist
hermes doctor  # Runs full diagnostics

"Cron jobs not running"

hermes gateway  # Gateway must be running for cron
systemctl status hermes-gateway  # If installed as service

Demo script errors

pip install anthropic rich  # Make sure dependencies are installed
python -c "import anthropic; print(anthropic.__version__)"  # Should be >= 0.49