CloudWatch Helper

Use these CloudWatch Logs Insights queries to debug GitHub webhook behavior end-to-end.

1) Trace One Webhook Delivery End-To-End

Use this when you have the GitHub delivery ID (the value of the X-GitHub-Delivery request header).

fields @timestamp, @message, level, msg, event, deliveryId, action, repo, pullRequestNumber, issueNumber, ignored, ignoredReason, trigger, automation, reviewState, implementationAgentId, implementationPullRequestUrl
| filter deliveryId = "REPLACE_WITH_DELIVERY_ID"
| sort @timestamp asc
| limit 200
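To run the same trace from a script instead of the console, the query can be submitted through the CloudWatch Logs API. The following is a minimal sketch, not the project's own tooling: `build_delivery_query` and `run_query` are hypothetical helper names, `logs_client` is assumed to be a `boto3.client("logs")` instance supplied by the caller, and the log group names are whatever your deployment uses. Only `start_query` and `get_query_results` are actual boto3 calls.

```python
import time

# Same query as above, with a placeholder for the delivery ID.
QUERY_TEMPLATE = """fields @timestamp, @message, level, msg, event, deliveryId, action, repo, pullRequestNumber, issueNumber, ignored, ignoredReason, trigger, automation, reviewState, implementationAgentId, implementationPullRequestUrl
| filter deliveryId = "{delivery_id}"
| sort @timestamp asc
| limit 200"""


def build_delivery_query(delivery_id: str) -> str:
    """Fill in the delivery ID, rejecting values that would break the quoted string."""
    if not delivery_id or any(ch in delivery_id for ch in '"\\\n'):
        raise ValueError(f"unexpected delivery ID: {delivery_id!r}")
    return QUERY_TEMPLATE.format(delivery_id=delivery_id)


def run_query(logs_client, log_groups, query, start_time, end_time, poll_seconds=1.0):
    """Start a Logs Insights query and poll until it is no longer Scheduled or Running.

    logs_client is assumed to be boto3.client("logs").
    start_time and end_time are epoch seconds.
    """
    query_id = logs_client.start_query(
        logGroupNames=list(log_groups),
        startTime=int(start_time),
        endTime=int(end_time),
        queryString=query,
    )["queryId"]
    while True:
        response = logs_client.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            break
        time.sleep(poll_seconds)
    if response["status"] != "Complete":
        raise RuntimeError(f"query ended with status {response['status']}")
    # Each result row is a list of {"field": ..., "value": ...} pairs; flatten to dicts.
    return [{cell["field"]: cell["value"] for cell in row} for row in response["results"]]
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, and the same `run_query` function works for any of the other queries on this page.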

2) Debug Request-Changes Events That Were Ignored

fields @timestamp, msg, event, deliveryId, action, repo, pullRequestNumber, reviewState, ignored, ignoredReason, senderType
| filter event = "pull_request_review"
| filter trigger = "changes_requested" or ignoredReason = "no-changes-requested-review"
| sort @timestamp desc
| limit 200

3) Debug One PR Review Webhook Flow

This shows accepted event handling and downstream automation logs.

fields @timestamp, msg, event, deliveryId, repo, pullRequestNumber, trigger, automation, ignored, ignoredReason, feedbackCount, routeTarget, implementationProvider, implementationAgentId, implementationAgentUrl, implementationPullRequestUrl
| filter repo = "your-org/your-repo"
| filter pullRequestNumber = 123
| filter event = "pull_request_review" or automation like /review-/
| sort @timestamp asc
| limit 300

4) Failures Only (Quick Triage)

fields @timestamp, level, msg, event, deliveryId, repo, pullRequestNumber, issueNumber, automation, @message
| filter level >= 50 or msg like /(?i)Failed|Rejected|invalid/
| sort @timestamp desc
| limit 200

5) Trace a Request Across Services by trace_id

Every accepted webhook and its downstream worker execution share the same trace_id (set from the GitHub delivery ID). Use this to follow a single request through core → orchestrator → worker in one query across all log groups.

fields @timestamp, @log, service, component, msg, final_status
| filter trace_id = "REPLACE_WITH_TRACE_ID"
| sort @timestamp asc
| limit 500

Run this against a log group that spans all services, or use a log group pattern (/ecs/alakai/sandbox/*) if your account supports it.
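Once the rows for a trace_id are exported, it can help to collapse them into one entry per service to see how far the request got. The sketch below is illustrative only: `hops_for_trace` is a hypothetical helper, the rows are assumed to be dicts keyed by the field names in the query above, and the sample values in the usage note are made up.

```python
def hops_for_trace(rows):
    """Group trace rows by service, in the order each service first logged.

    Returns a dict mapping service name to its first timestamp, log count,
    and the last non-empty final_status seen for that service.
    """
    # Logs Insights timestamps are fixed-width strings, so they sort lexicographically.
    ordered = sorted(rows, key=lambda row: row["@timestamp"])
    hops = {}
    for row in ordered:
        service = row.get("service", "unknown")
        entry = hops.setdefault(
            service,
            {"first_seen": row["@timestamp"], "logs": 0, "final_status": None},
        )
        entry["logs"] += 1
        if row.get("final_status"):
            entry["final_status"] = row["final_status"]
    return hops
```

For example, if `list(hops_for_trace(rows))` yields only `["core", "orchestrator"]`, the request never produced a worker log under that trace_id, which narrows the search to the orchestrator's dispatch step.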

6) Structured Event Summary (Latency + Outcome)

The logger emits one structured event log per operation with event, outcome, duration_ms, and steps. Use this to get a latency and success-rate summary across all operations:

fields @timestamp, service, event, outcome, final_status, duration_ms
| filter ispresent(event) and ispresent(duration_ms)
| stats avg(duration_ms) as avg_ms,
        max(duration_ms) as max_ms,
        count(*) as total,
        sum(outcome = "error") as errors
        by event, outcome
| sort event, outcome
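To make the aggregation concrete, here is the same grouping applied locally to a few structured event records. This is a sketch for understanding the output shape, not a replacement for the query: `summarize` is a hypothetical helper, and any records passed to it are assumed to be dicts with the `event`, `outcome`, and `duration_ms` fields described above.

```python
from collections import defaultdict


def summarize(events):
    """Mirror the stats query: avg/max duration, count, and error count per (event, outcome)."""
    groups = defaultdict(list)
    for record in events:
        # Equivalent of: filter ispresent(event) and ispresent(duration_ms)
        if "event" in record and "duration_ms" in record:
            key = (record["event"], record.get("outcome", ""))
            groups[key].append(record["duration_ms"])

    summary = {}
    # Equivalent of: sort event, outcome
    for key in sorted(groups):
        durations = groups[key]
        _, outcome = key
        summary[key] = {
            "avg_ms": sum(durations) / len(durations),
            "max_ms": max(durations),
            "total": len(durations),
            # Within a single (event, outcome) group, every row shares the same outcome.
            "errors": len(durations) if outcome == "error" else 0,
        }
    return summary
```

Because the query groups by both event and outcome, the errors column equals total on error rows and 0 elsewhere; to get an error rate per event, compare the totals of the success and error rows for that event.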

7) Token Usage per Model

Track AI token consumption over time to monitor cost:

fields @timestamp, agent.model, agent.input_tokens, agent.output_tokens
| filter ispresent(agent.model)
| stats sum(agent.input_tokens) as total_input,
        sum(agent.output_tokens) as total_output,
        count(*) as runs
        by agent.model
| sort total_input desc

8) Error Rate by Repo

fields @timestamp, outcome, repo.full_name
| filter ispresent(event) and outcome = "error"
| stats count(*) as errors by repo.full_name
| sort errors desc

Metric Filters to Create

Create these metric filters on the log groups to get CloudWatch metrics you can alarm on:

| Metric name | Log group | Filter pattern | Value |
| --- | --- | --- | --- |
| coding.prompt.error | /ecs/alakai/*/core | { $.event = "coding.prompt" && $.outcome = "error" } | 1 |
| coding.implementation.error | /ecs/alakai/*/orchestrator | { $.event = "coding.implementation" && $.outcome = "error" } | 1 |
| coding.prompt.duration_ms | /ecs/alakai/*/core | { $.event = "coding.prompt" && $.outcome = "success" } | $.duration_ms |
| worker.fatal | /ecs/alakai/*/workers | { $.level = 60 } | 1 |

Once an alarm is set on coding.implementation.error, you are notified automatically when a worker fails, so there is no need to watch CloudWatch manually.
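The four metric filters can also be expressed as request parameters for the CloudWatch Logs `put_metric_filter` API. The sketch below only builds the parameters and makes several assumptions: `metric_filter_requests` is a hypothetical helper, the `*` in each log group is taken to stand for an environment name such as sandbox, and the metric namespace `Alakai/Automation` is a placeholder that you should replace with your own.

```python
# (metric name, service segment of the log group, filter pattern, metric value)
METRIC_FILTERS = [
    ("coding.prompt.error", "core",
     '{ $.event = "coding.prompt" && $.outcome = "error" }', "1"),
    ("coding.implementation.error", "orchestrator",
     '{ $.event = "coding.implementation" && $.outcome = "error" }', "1"),
    ("coding.prompt.duration_ms", "core",
     '{ $.event = "coding.prompt" && $.outcome = "success" }', "$.duration_ms"),
    ("worker.fatal", "workers",
     "{ $.level = 60 }", "1"),
]


def metric_filter_requests(environment, namespace="Alakai/Automation"):
    """Build one put_metric_filter keyword-argument dict per filter.

    environment fills the wildcard segment of the log group name (assumption).
    namespace is a placeholder, not a name taken from the project.
    """
    requests = []
    for metric_name, service, pattern, value in METRIC_FILTERS:
        requests.append({
            "logGroupName": f"/ecs/alakai/{environment}/{service}",
            "filterName": metric_name,
            "filterPattern": pattern,
            "metricTransformations": [{
                "metricName": metric_name,
                "metricNamespace": namespace,
                "metricValue": value,
            }],
        })
    return requests
```

With a boto3 Logs client, each dict could then be applied with `logs_client.put_metric_filter(**request)`. Keeping the definitions in one list makes it straightforward to create the same filters in every environment.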

Tips

  • Use deliveryId when possible to correlate a single webhook from ingress to completion.
  • Use trace_id to follow a request across services (core → orchestrator → worker) — it matches the GitHub delivery ID for webhook-triggered flows.
  • For request-changes debugging, focus first on ignored and ignoredReason, then follow automation logs.
  • If an automation starts but no follow-up action occurs, check error-level entries for the same deliveryId.
  • The structured event log (msg ends in -> success or -> error) is always the last log for an operation and contains the full summary — filter on it first for triage.

See Tracing Guide for the full cross-component tracing runbook.