Implement smarter context refresh conditional on worker availability
The current `/new` command mechanism (`ResetContextJob`) has three problems:

1. **Broken**: Sends the wrong message format (`{command: "/new"}`, but the bridge expects `{type: "input", data: "..."}`)
2. **Not smart**: Fires every hour at :08 regardless of worker state, so it could interrupt workers mid-task
3. **Zombie worker problem**: Workers that crash, error, or forget to mark themselves `idle` stay `busy` forever

## Current State (Broken)

**File**: `/rails/app/jobs/reset_context_job.rb`

```ruby
# Sends /new at :08 every hour to ALL active agents
Agent.where(status: :active).find_each do |agent|
  ActionCable.server.broadcast(
    "agent_#{agent.agent_type}_#{agent.project_id}",
    { command: "/new" } # ❌ WRONG FORMAT
  )
end
```

**File**: `/rails/agent-bridge.go`
- Only handles `{type: "input", data: "..."}`
- Silently ignores anything else

## Proposed Solution

### Option A: Fix + Smart Filtering + Timeout (Keep Scheduled Refresh)
- Fix message format to use `AgentCommander.send_message`
- Only send to `idle` workers OR workers with stale activity
- Add timeout: auto-reset `busy` → `idle` after X minutes of inactivity
- Keep schedule at :08 (within quiet period :00-:10)

### Option B: Remove Scheduled Refresh, Add Heartbeat (Event-Driven)
- Workers already check for work via `list_tasks` MCP tool
- Add heartbeat/last_activity_at tracking
- Auto-idle workers with stale heartbeat
- Delete `ResetContextJob`

### Option C: Conditional Refresh by Orchestrator + Zombie Detection
- Orchestrator decides when to refresh context
- Only sends refresh when:
  - Worker has been idle for X minutes
  - Worker's last ticket was completed/failed
  - **Worker is `busy` but stale (zombie detection)**
- Remove hardcoded schedule
- Add zombie worker cleanup

### Option D: Agent-Side Context Management + Auto-Idle
- Workers manage their own context via MCP tools
- Add `clear_context` or `reset_session` MCP tool
- **Auto-idle timeout**: workers reset to `idle` if no activity for X minutes
- No external triggering needed
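The format mismatch described under "Current State" can be illustrated with a small plain-Ruby sketch. The expected shape `{type: "input", data: "..."}` comes from the description; the helper names `build_bridge_input` and `bridge_accepts?` are hypothetical and only model the behavior described here. This is not the actual `AgentCommander.send_message` or `agent-bridge.go` implementation.

```ruby
require "json"

# Hypothetical helper: wraps a command in the shape the bridge is described
# as accepting, {type: "input", data: "..."}.
def build_bridge_input(command)
  { type: "input", data: command }
end

# Hypothetical check modelling the bridge's described behavior: anything
# that is not a {type: "input", data: String} message is ignored.
def bridge_accepts?(payload)
  payload[:type] == "input" && payload[:data].is_a?(String)
end

broken = { command: "/new" }         # shape ResetContextJob currently broadcasts
fixed  = build_bridge_input("/new")  # shape the bridge expects

puts JSON.generate(fixed)
puts "broken accepted? #{bridge_accepts?(broken)}"
puts "fixed accepted?  #{bridge_accepts?(fixed)}"
```

Routing every outgoing command through one builder like this would keep the payload shape in a single place, which is presumably the role `AgentCommander.send_message` is meant to play.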
## Recommended Approach: Option C (Orchestrator-Driven Conditional + Zombie Detection)

**Why**: Orchestrator already knows:
- Which workers are idle vs busy
- What tickets are available
- Which workers haven't had work in a while
- **Which workers are stale/zombies (busy but no recent activity)**

**Implementation**:
1. Remove `ResetContextJob` and recurring.yml entry
2. Add logic to `OrchestratorPingJob` or new service:
   - Check for idle workers with stale last_activity_at → safe to refresh
   - **Check for `busy` workers with stale activity → zombie, force to `idle`**
   - Only send refresh if worker idle AND no tasks in queue
   - Send via proper `AgentCommander.send_message` format
3. Add heartbeat tracking:
   - Update `last_activity_at` on every MCP call
   - Update `last_activity_at` on ticket state transition
4. Add MCP tool `refresh_worker_context(worker_id)` for explicit control
5. Optional: Let workers request refresh via MCP tool

## Zombie Worker Detection Strategy

**Define "stale"**: Worker marked `busy` but no activity for X minutes (configurable, default: 30)

**Detection logic**:
```ruby
stale_workers = Agent.where(status: :active, availability_status: :busy)
                     .where('last_activity_at < ?', X.minutes.ago)

stale_workers.each do |zombie|
  zombie.update(availability_status: :idle)
  Rails.logger.warn "[ZombieWorker] Force-idle agent #{zombie.id} - no activity for #{X} minutes"
  # Optionally: send context refresh to clean up
end
```

**Heartbeat sources**:
- MCP tool calls
- Ticket transitions (start_work, complete, fail_audit, etc.)
- Agent messages via WebSocket
- Terminal activity (via agent-bridge.go)

## Acceptance Criteria

- [ ] `ResetContextJob` is removed (or fixed if Option A)
- [ ] Scheduled "/new" no longer fires blindly
- [ ] Context refresh only happens when worker is `idle` OR stale/zombie
- [ ] **Zombie workers are auto-reset to `idle` after X minutes of inactivity**
- [ ] Refresh respects:
  - Worker not currently working on a ticket
  - Worker has been idle for X minutes (configurable)
  - No urgent work pending
  - **Worker is stale (last_activity_at older than threshold)**
- [ ] Message format uses proper `AgentCommander.send_message`
- [ ] MCP tool `refresh_worker_context` exists for manual control
- [ ] `last_activity_at` is updated on all relevant worker activities
- [ ] Tests cover conditional logic and zombie detection
- [ ] Quiet period (:00-:10) still respected

## Technical Notes

**Agent availability_status**:
- `idle` (0) - available for work
- `busy` (1) - actively working

**Zombie detection**:
- `busy` + stale `last_activity_at` → force to `idle`
- Configurable timeout (default: 30 minutes)
- Log warnings for zombie detection

**Current schedule coordination**:
- :00-:10: Quiet period (no orchestrator pings)
- :08: ResetContextJob fires (TO BE REMOVED/FIXED)
- :10+: OrchestratorPingJob every 3 minutes

**Files to modify**:
- `/rails/config/recurring.yml` - remove/reset schedule
- `/rails/app/jobs/reset_context_job.rb` - delete or refactor
- `/rails/app/jobs/orchestrator_ping_job.rb` - add conditional logic + zombie detection
- `/rails/app/services/agent_commander.rb` - ensure proper format used
- `/rails/app/models/agent.rb` - add `stale?` scope, `mark_idle_if_stale!` method
- `/rails/agent-bridge.go` - optionally add heartbeat updates
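The conditional logic that the tests are expected to cover could be sketched as a single decision function. This is a minimal plain-Ruby illustration, not the real `Agent` model: the `Worker` struct, the `refresh_action` method, and the `tasks_queued` flag are hypothetical names, and using one threshold for both "idle for X minutes" and "stale while busy" is an assumption.

```ruby
# Plain-Ruby stand-in for the Agent model. Field names follow the description,
# but this Struct is illustrative and not the real ActiveRecord class.
Worker = Struct.new(:id, :availability_status, :last_activity_at, keyword_init: true) do
  # A worker is stale when it has shown no activity for longer than the threshold.
  def stale?(now:, threshold_seconds:)
    last_activity_at < now - threshold_seconds
  end
end

STALE_AFTER_SECONDS = 30 * 60 # default threshold of 30 minutes from the description

# Returns :force_idle for busy-but-stale workers (zombies), :refresh for idle
# workers that are stale while no tasks are queued, and :skip otherwise.
def refresh_action(worker, now:, tasks_queued:, threshold_seconds: STALE_AFTER_SECONDS)
  stale = worker.stale?(now: now, threshold_seconds: threshold_seconds)
  case worker.availability_status
  when :busy then stale ? :force_idle : :skip
  when :idle then stale && !tasks_queued ? :refresh : :skip
  else :skip
  end
end

now     = Time.at(1_700_000_000) # fixed timestamp so the example is deterministic
zombie  = Worker.new(id: 1, availability_status: :busy, last_activity_at: now - 45 * 60)
working = Worker.new(id: 2, availability_status: :busy, last_activity_at: now - 5 * 60)
resting = Worker.new(id: 3, availability_status: :idle, last_activity_at: now - 60 * 60)

[zombie, working, resting].each do |w|
  puts "#{w.id}: #{refresh_action(w, now: now, tasks_queued: false)}"
end
```

Keeping the decision as a pure function of worker state and the current time makes each acceptance criterion testable without a database or a running bridge; the job would then only translate the returned symbol into an update or a message.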