CloudWatch Helper
Use these CloudWatch Logs Insights queries to debug GitHub webhook behavior end-to-end.
1) Trace One Webhook Delivery End-To-End
Use this when you have the GitHub delivery ID (x-github-delivery).
fields @timestamp, @message, level, msg, event, deliveryId, action, repo, pullRequestNumber, issueNumber, ignored, ignoredReason, trigger, automation, reviewState, implementationAgentId, implementationPullRequestUrl
| filter deliveryId = "REPLACE_WITH_DELIVERY_ID"
| sort @timestamp asc
| limit 200
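To run this query from a script rather than the console, a minimal sketch follows. It assumes a boto3 CloudWatch Logs client (`boto3.client("logs")`) is passed in. The function names, the shortened field list, the one-hour lookback, and the log group in the usage comment are illustrative assumptions, not part of this project.

```python
import time


def build_delivery_query(delivery_id: str) -> str:
    """Return the delivery-trace query with the placeholder filled in."""
    # Reject characters that would break out of the quoted string literal.
    if not delivery_id or any(ch in delivery_id for ch in '"\\\n'):
        raise ValueError("delivery_id is empty or contains unsafe characters")
    return (
        "fields @timestamp, level, msg, event, deliveryId, action, repo, "
        "ignored, ignoredReason\n"
        f'| filter deliveryId = "{delivery_id}"\n'
        "| sort @timestamp asc\n"
        "| limit 200"
    )


def run_query(logs_client, log_group, query, lookback_s=3600, poll_s=1.0):
    """Start a Logs Insights query and poll until it leaves the running states."""
    now = int(time.time())
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=now - lookback_s,  # epoch seconds
        endTime=now,
        queryString=query,
    )["queryId"]
    while True:
        resp = logs_client.get_query_results(queryId=query_id)
        if resp["status"] not in ("Scheduled", "Running"):
            break
        time.sleep(poll_s)
    if resp["status"] != "Complete":
        raise RuntimeError(f"query ended with status {resp['status']}")
    # Each result row is a list of {"field": ..., "value": ...} cells.
    return [{cell["field"]: cell["value"] for cell in row} for row in resp["results"]]


# Usage (hypothetical log group name):
#   import boto3
#   rows = run_query(boto3.client("logs"), "/ecs/alakai/sandbox/core",
#                    build_delivery_query("REPLACE_WITH_DELIVERY_ID"))
```

Passing the client in as an argument keeps the helper easy to exercise with a stub in tests.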
2) Debug Request-Changes Events That Were Ignored
fields @timestamp, msg, event, deliveryId, action, repo, pullRequestNumber, reviewState, trigger, ignored, ignoredReason, senderType
| filter event = "pull_request_review"
| filter trigger = "changes_requested" or ignoredReason = "no-changes-requested-review"
| sort @timestamp desc
| limit 200
3) Debug One PR Review Webhook Flow
This shows accepted event handling and downstream automation logs.
fields @timestamp, msg, event, deliveryId, repo, pullRequestNumber, trigger, automation, ignored, ignoredReason, feedbackCount, routeTarget, implementationProvider, implementationAgentId, implementationAgentUrl, implementationPullRequestUrl
| filter repo = "your-org/your-repo"
| filter pullRequestNumber = 123
| filter event = "pull_request_review" or automation like /review-/
| sort @timestamp asc
| limit 300
4) Failures Only (Quick Triage)
fields @timestamp, level, msg, event, deliveryId, repo, pullRequestNumber, issueNumber, automation, @message
| filter level >= 50 or msg like /(?i)failed|rejected|invalid/
| sort @timestamp desc
| limit 200
5) Trace a Request Across Services by trace_id
Every accepted webhook and its downstream worker execution share the same trace_id (set from the GitHub delivery ID). Use this to follow a single request through core → orchestrator → worker in one query across all log groups.
fields @timestamp, @log, service, component, msg, final_status
| filter trace_id = "REPLACE_WITH_TRACE_ID"
| sort @timestamp asc
| limit 500
Run this against a log group that spans all services, or use a log group pattern (/ecs/alakai/sandbox/*) if your account supports it.
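When no single log group spans all services, one option is to resolve the pattern into concrete log group names and pass them to one query. The sketch below assumes a boto3 CloudWatch Logs client and uses its `describe_log_groups` and `start_query` calls. The helper names and the default cap of 50 log groups per query are assumptions to verify against your account's limits.

```python
def resolve_log_groups(logs_client, prefix, max_groups=50):
    """List log group names that start with `prefix`, following pagination."""
    names = []
    kwargs = {"logGroupNamePrefix": prefix}
    while True:
        page = logs_client.describe_log_groups(**kwargs)
        names.extend(g["logGroupName"] for g in page.get("logGroups", []))
        token = page.get("nextToken")
        if not token or len(names) >= max_groups:
            break
        kwargs["nextToken"] = token
    return names[:max_groups]


def start_trace_query(logs_client, trace_id, log_groups, start_time, end_time):
    """Start the trace_id query across several log groups; returns the query ID."""
    if not trace_id or any(ch in trace_id for ch in '"\\\n'):
        raise ValueError("trace_id is empty or contains unsafe characters")
    query = (
        "fields @timestamp, @log, service, component, msg, final_status\n"
        f'| filter trace_id = "{trace_id}"\n'
        "| sort @timestamp asc\n"
        "| limit 500"
    )
    resp = logs_client.start_query(
        logGroupNames=list(log_groups),
        startTime=start_time,  # epoch seconds
        endTime=end_time,
        queryString=query,
    )
    return resp["queryId"]


# Usage (prefix taken from the pattern above, without the trailing wildcard):
#   import boto3, time
#   logs = boto3.client("logs")
#   groups = resolve_log_groups(logs, "/ecs/alakai/sandbox/")
#   now = int(time.time())
#   query_id = start_trace_query(logs, "REPLACE_WITH_TRACE_ID", groups, now - 3600, now)
```

The `@log` field in the results then shows which log group each line came from.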
6) Structured Event Summary (Latency + Outcome)
The logger emits one structured event log per operation with event, outcome, duration_ms, and steps. Use this to get a latency and success-rate summary across all operations:
fields @timestamp, service, event, outcome, final_status, duration_ms
| filter ispresent(event) and ispresent(duration_ms)
| stats avg(duration_ms) as avg_ms,
max(duration_ms) as max_ms,
count(*) as total,
sum(outcome = "error") as errors
by event, outcome
| sort event, outcome
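To make the shape of the result concrete, the sketch below reproduces the same aggregation locally in Python over a few made-up records. The sample events and the `summarize` helper are hypothetical, and treating "key is present in the dict" as equivalent to `ispresent` is an approximation.

```python
from collections import defaultdict


def summarize(events):
    """Group by (event, outcome) and compute avg/max duration, total, and errors."""
    groups = defaultdict(list)
    for e in events:
        # Mirrors: filter ispresent(event) and ispresent(duration_ms)
        if "event" in e and "duration_ms" in e:
            groups[(e["event"], e.get("outcome"))].append(e)

    rows = []
    # Mirrors: sort event, outcome
    for (event, outcome), items in sorted(
        groups.items(), key=lambda kv: (kv[0][0], str(kv[0][1]))
    ):
        durations = [i["duration_ms"] for i in items]
        rows.append({
            "event": event,
            "outcome": outcome,
            "avg_ms": sum(durations) / len(durations),
            "max_ms": max(durations),
            "total": len(items),
            "errors": sum(1 for i in items if i.get("outcome") == "error"),
        })
    return rows


# Made-up sample records, for illustration only.
SAMPLE = [
    {"event": "coding.prompt", "outcome": "success", "duration_ms": 100},
    {"event": "coding.prompt", "outcome": "success", "duration_ms": 300},
    {"event": "coding.prompt", "outcome": "error", "duration_ms": 500},
    {"event": "coding.implementation", "outcome": "success", "duration_ms": 1000},
    {"msg": "unstructured line without event or duration"},
]

if __name__ == "__main__":
    for row in summarize(SAMPLE):
        print(row)
```

Because the query groups by outcome as well as event, errors in each row is either 0 or equal to total. Dropping outcome from the by clause would give one row per event with a true error count.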
7) Token Usage per Model
Track AI token consumption over time to monitor cost:
fields @timestamp, agent.model, agent.input_tokens, agent.output_tokens
| filter ispresent(agent.model)
| stats sum(agent.input_tokens) as total_input,
sum(agent.output_tokens) as total_output,
count(*) as runs
by agent.model
| sort total_input desc
8) Error Rate by Repo
fields @timestamp, outcome, repo.full_name
| filter ispresent(event) and outcome = "error"
| stats count(*) as errors by repo.full_name
| sort errors desc
Metric Filters to Create
Create these metric filters on the log groups to get CloudWatch metrics you can alarm on:
| Metric name | Log group | Filter pattern | Value |
|---|---|---|---|
| coding.prompt.error | /ecs/alakai/*/core | { $.event = "coding.prompt" && $.outcome = "error" } | 1 |
| coding.implementation.error | /ecs/alakai/*/orchestrator | { $.event = "coding.implementation" && $.outcome = "error" } | 1 |
| coding.prompt.duration_ms | /ecs/alakai/*/core | { $.event = "coding.prompt" && $.outcome = "success" } | $.duration_ms |
| worker.fatal | /ecs/alakai/*/workers | { $.level = 60 } | 1 |
With an alarm on coding.implementation.error, you are notified automatically when a worker fails, so there is no need to watch CloudWatch manually.
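The table can be turned into `put_metric_filter` calls with a short script. The sketch below only builds the request parameters; it assumes the `*` in each log group stands for an environment name such as sandbox. The metric namespace `Alakai/Automation` is a hypothetical choice, and reusing the metric name as the filter name is a convention chosen for this example.

```python
# (metric name, service segment of the log group, filter pattern, metric value)
METRIC_FILTERS = [
    ("coding.prompt.error", "core",
     '{ $.event = "coding.prompt" && $.outcome = "error" }', "1"),
    ("coding.implementation.error", "orchestrator",
     '{ $.event = "coding.implementation" && $.outcome = "error" }', "1"),
    ("coding.prompt.duration_ms", "core",
     '{ $.event = "coding.prompt" && $.outcome = "success" }', "$.duration_ms"),
    ("worker.fatal", "workers", "{ $.level = 60 }", "1"),
]


def metric_filter_params(environment, namespace="Alakai/Automation"):
    """Build one put_metric_filter kwargs dict per table row for an environment."""
    if not environment or "*" in environment or "/" in environment:
        raise ValueError("environment must be a concrete name such as 'sandbox'")
    params = []
    for name, service, pattern, value in METRIC_FILTERS:
        params.append({
            "logGroupName": f"/ecs/alakai/{environment}/{service}",
            "filterName": name,  # assumption: filter named after its metric
            "filterPattern": pattern,
            "metricTransformations": [{
                "metricName": name,
                "metricNamespace": namespace,  # hypothetical namespace
                "metricValue": value,  # the API expects a string here
            }],
        })
    return params


# Usage:
#   import boto3
#   logs = boto3.client("logs")
#   for p in metric_filter_params("sandbox"):
#       logs.put_metric_filter(**p)
```

Metric filters attach to one concrete log group each, so the script needs to run once per environment.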
Tips
- Use `deliveryId` when possible to correlate a single webhook from ingress to completion.
- Use `trace_id` to follow a request across services (core → orchestrator → worker); it matches the GitHub delivery ID for webhook-triggered flows.
- For request-changes debugging, focus first on `ignored` and `ignoredReason`, then follow `automation` logs.
- If an automation starts but no follow-up action occurs, check error-level entries for the same `deliveryId`.
- The structured event log (`msg` ends in `-> success` or `-> error`) is always the last log for an operation and contains the full summary, so filter on it first for triage.
See Tracing Guide for the full cross-component tracing runbook.