# Recovery — bootstrap the fleet from scratch

If tmux died, the GVL server rebooted, you're on a fresh box, or you have
no idea what happened — follow this.

## TL;DR (same machine, fleet just died)

```bash
cd /data/cameron/agents_stuff
git pull
./bootstrap.sh
```

Done. The agents re-read their roles and latest scrollback logs and pick up
where they left off.

## Full recovery (new machine / wiped disk)

### 1. Install prerequisites

```bash
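# tmux, python3, git, claude, cloudflared
sudo apt-get update
sudo apt-get install -y tmux python3 python3-pip git
# On newer Debian/Ubuntu, pip may refuse to install outside a virtualenv;
# in that case use: sudo apt-get install -y python3-yaml
python3 -m pip install pyyaml
# install claude per https://claude.ai/code
# install cloudflared per https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/downloads/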
```
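
To confirm the install worked before moving on, a small check like the following can help. It is a minimal sketch: `missing_cmds` is a hypothetical helper (not part of the repo), and it only tests that each tool is on `PATH`, not that the version is suitable.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each command from the argument list
# that is not found on PATH.
missing_cmds() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Tools named in the install step above.
missing="$(missing_cmds tmux python3 git claude cloudflared)"
if [ -n "$missing" ]; then
  printf 'Missing prerequisites:\n%s\n' "$missing"
else
  echo "All prerequisites found on PATH."
fi
```

PyYAML is a Python module rather than a command, so check it separately with `python3 -c 'import yaml'`.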

### 2. Clone the repo

```bash
git clone https://github.com/cameronosmith/agents_stuff.git /data/cameron/agents_stuff
cd /data/cameron/agents_stuff
```

### 3. Set up secrets

```bash
cp .env.example .env
chmod 600 .env
$EDITOR .env   # fill in real values
```

You'll need:

- `ANTHROPIC_API_KEY` — generate at console.anthropic.com (NEW key — rotate
  the one that was previously committed in plaintext to `~/.cameron_claude_backup_notes`)
- `DASHBOARD_PASSWORD` — for the `/agents` route on omidlab.net
- Cloudflare tunnel credentials at `~/.cloudflared/` — recover from your
  password manager or re-create the tunnel via `cloudflared tunnel login`
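
A quick sanity check on `.env` can catch a forgotten value or loose permissions before the fleet starts. This is a sketch under assumptions: it presumes `.env` uses plain `KEY=value` lines (the actual format is defined by `.env.example`), the helper names are hypothetical, and `stat -c` is the GNU form found on Linux.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each required key that is absent or empty in
# an env file. Assumes plain KEY=value lines, optionally quoted.
missing_env_keys() {
  local file="$1"; shift
  local key pat
  for key in "$@"; do
    # KEY=, an optional quote, then at least one non-quote, non-space char.
    pat="^${key}=[\"']?[^\"'[:space:]]"
    grep -Eq "$pat" "$file" 2>/dev/null || printf '%s\n' "$key"
  done
}

# Succeeds only if the file mode is exactly 600 (GNU stat).
env_mode_ok() {
  [ "$(stat -c '%a' "$1" 2>/dev/null)" = "600" ]
}

# Run from the repo root once .env has been filled in.
if [ -f .env ]; then
  missing_env_keys .env ANTHROPIC_API_KEY DASHBOARD_PASSWORD
  env_mode_ok .env || echo ".env should be chmod 600"
fi
```

The check prints nothing when both keys are set and the mode is correct.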

### 4. Set up project directories

The agents expect these paths to exist:

- `/data/cameron/para/` — PARA research project
- `/data/cameron/life/` — personal knowledge base (for `life_manager`)
- `/data/cameron/567_augmentation_viewpoint_project/` — course project

If you're on a fresh machine, clone the relevant repos for those.
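
To see which of those directories still need cloning, something like this works. It is a minimal sketch; `missing_dirs` is a hypothetical helper, and it only checks that each path exists as a directory, not that it contains the right repo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every path from the argument list that is
# not an existing directory.
missing_dirs() {
  local dir
  for dir in "$@"; do
    [ -d "$dir" ] || printf '%s\n' "$dir"
  done
}

# Paths listed above; anything printed still needs to be created or cloned.
missing_dirs \
  /data/cameron/para \
  /data/cameron/life \
  /data/cameron/567_augmentation_viewpoint_project
```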

### 5. Bootstrap the fleet

```bash
./bootstrap.sh
```

You'll be dropped into the `manager` tmux window. Switch between agents with
`Ctrl-b 0`, `Ctrl-b 1`, etc., or `Ctrl-b w` for a window list.

### 6. Start the dashboard

In the `dashboard` tmux window (created but not running anything), paste the
command shown in the banner. It starts Flask + cloudflared.

### 7. Install the cron backup

```bash
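# Drop any existing save_scrollback.sh entry first, so re-running this
# after a later recovery does not add a duplicate line.
(crontab -l 2>/dev/null | grep -vF save_scrollback.sh; \
 echo "0 0 * * * bash /data/cameron/agents_stuff/shared/save_scrollback.sh >> /data/cameron/agents_stuff/logs/cron.log 2>&1") \
 | crontab -
crontab -l   # confirm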
```

### 8. Verify

- `tmux list-windows -t agents` — every agent + service window present
- `https://omidlab.net` resolves to the dashboard
- Each agent has printed a one-line status summarizing its task (the
  onboarding prompt asks for this)
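
The first check can be scripted by comparing tmux's window list against the names you expect. This is a sketch, not part of the repo: `missing_windows` is a hypothetical helper, and the three names in the usage comment are just the ones this document mentions; the full list is whatever `bootstrap.sh` creates on your setup.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read window names on stdin (one per line) and print
# each expected name, given as an argument, that is absent.
missing_windows() {
  local actual name
  actual="$(cat)"
  for name in "$@"; do
    printf '%s\n' "$actual" | grep -Fxq -- "$name" || printf '%s\n' "$name"
  done
}

# Usage (needs the fleet's tmux server to be running):
#   tmux list-windows -t agents -F '#{window_name}' \
#     | missing_windows manager life_manager dashboard
```

No output means every expected window is present.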

## What survives a full disk loss vs what doesn't

**Survives** (committed to GitHub):
- All `ROLE.md` files
- `config.yaml`, `bootstrap.sh`, shared scripts
- Latest scrollback log per agent (`logs/<name>_latest.log`)

**Lost** (local-only):
- Claude session JSONL files in `~/.claude/projects/` — full conversational
  context. Agents re-onboard from ROLE + scrollback log instead.
- `inbox.md` / `outbox.md` / `status.md` — transient runtime state
- `~/.cloudflared/` tunnel credentials — recover from password manager

## What if an agent comes up confused

If an agent's onboarding prompt didn't take effect (e.g., claude was slow to
start and missed the input), nudge it manually:

```bash
NAME=manager   # replace with the window name of the agent to nudge
tmux send-keys -t "agents:$NAME" "Read your ROLE.md at /data/cameron/agents_stuff/agents/$NAME/ROLE.md and the latest scrollback at /data/cameron/agents_stuff/logs/${NAME}_latest.log, then give me a one-line status." Enter
sleep 1
tmux send-keys -t "agents:$NAME" Enter
```

If the onboarding prompt consistently arrives too early on your machine,
increase the `sleep 4` in `bootstrap.sh` (the wait between launching claude
and sending the prompt).
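
An alternative to tuning a fixed sleep is to poll until the pane shows that claude is ready. The sketch below is not how `bootstrap.sh` works today. `wait_for` and `pane_has` are hypothetical helpers, and the marker text to look for is a placeholder you would need to pick by inspecting what claude prints on startup.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) elapses. Returns 1 on timeout.
wait_for() {
  local timeout="$1"; shift
  local waited=0
  until "$@"; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
}

# Hypothetical helper: succeed if the visible text of a tmux pane
# contains the given string.
pane_has() {
  tmux capture-pane -p -t "$1" | grep -Fq -- "$2"
}

# Usage (READY_MARKER is a placeholder, not a known claude string):
#   wait_for 30 pane_has agents:manager "READY_MARKER" \
#     && tmux send-keys -t agents:manager "your onboarding prompt" Enter
```

The trade-off is one more moving part in exchange for not having to guess a delay that fits every machine.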

## Splitting the dashboard onto a separate VPS later

The bootstrap is structured so this is easy:

1. Spin up a $4 Hetzner box (or similar)
2. `git clone` this repo there
3. `./bootstrap.sh --only=dashboard` — only the service window comes up
4. Move the cloudflared tunnel cert to the new box
5. Run the agent fleet on the GPU box without the dashboard

See conversation with claude on 2026-04-24 for the full reasoning.
