CloudWatch Helper

Use these CloudWatch Logs Insights queries to debug GitHub webhook behavior end-to-end.

1) Trace One Webhook Delivery End-To-End

Use this when you have the GitHub delivery ID (the value of the X-GitHub-Delivery request header).

fields @timestamp, @message, level, msg, event, deliveryId, action, repo, pullRequestNumber, issueNumber, ignored, ignoredReason, trigger, automation, reviewState, implementationAgentId, implementationPullRequestUrl
| filter deliveryId = "REPLACE_WITH_DELIVERY_ID"
| sort @timestamp asc
| limit 200
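To run this query from a script instead of the console, something like the following Python sketch may help. It assumes boto3 is installed and AWS credentials are configured. The function names, the log group argument, and the one-hour lookback are illustrative and not part of this project.

```python
import re
import time

QUERY_TEMPLATE = """fields @timestamp, @message, level, msg, event, deliveryId, action, repo, pullRequestNumber, issueNumber, ignored, ignoredReason, trigger, automation, reviewState, implementationAgentId, implementationPullRequestUrl
| filter deliveryId = "{delivery_id}"
| sort @timestamp asc
| limit 200"""

# GitHub delivery IDs are GUIDs; validating the shape keeps stray quotes out of the query.
_GUID = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")


def build_delivery_query(delivery_id: str) -> str:
    """Return the trace query with the delivery ID substituted in."""
    if not _GUID.match(delivery_id):
        raise ValueError(f"not a GUID-shaped delivery ID: {delivery_id!r}")
    return QUERY_TEMPLATE.format(delivery_id=delivery_id)


def run_query(log_group: str, query: str, lookback_s: int = 3600) -> list:
    """Run a Logs Insights query and return each row as a dict (hypothetical helper)."""
    import boto3  # imported lazily so build_delivery_query works without boto3

    logs = boto3.client("logs")
    now = int(time.time())
    query_id = logs.start_query(
        logGroupNames=[log_group],
        startTime=now - lookback_s,
        endTime=now,
        queryString=query,
    )["queryId"]
    # Poll until the query leaves the Scheduled/Running states.
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            break
        time.sleep(1)
    if response["status"] != "Complete":
        raise RuntimeError(f"query ended with status {response['status']}")
    # Each result row is a list of {"field": ..., "value": ...} cells.
    return [{cell["field"]: cell["value"] for cell in row} for row in response["results"]]
```

A call such as run_query("/ecs/alakai/sandbox/core", build_delivery_query(delivery_id)) would return the matching log lines oldest first; the log group shown is only an example.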

2) Debug Request-Changes Events That Were Ignored

fields @timestamp, msg, event, deliveryId, action, repo, pullRequestNumber, reviewState, ignored, ignoredReason, senderType
| filter event = "pull_request_review"
| filter trigger = "changes_requested" or ignoredReason = "no-changes-requested-review"
| sort @timestamp desc
| limit 200

3) Debug One PR Review Webhook Flow

This shows accepted event handling and downstream automation logs.

fields @timestamp, msg, event, deliveryId, repo, pullRequestNumber, trigger, automation, ignored, ignoredReason, feedbackCount, routeTarget, implementationProvider, implementationAgentId, implementationAgentUrl, implementationPullRequestUrl
| filter repo = "your-org/your-repo"
| filter pullRequestNumber = 123
| filter event = "pull_request_review" or automation like /review-/
| sort @timestamp asc
| limit 300

4) Failures Only (Quick Triage)

fields @timestamp, level, msg, event, deliveryId, repo, pullRequestNumber, issueNumber, automation, @message
| filter level >= 50 or msg like /(?i)Failed|Rejected|invalid/
| sort @timestamp desc
| limit 200

5) Trace a Request Across Services by trace_id

Every accepted webhook and its downstream worker execution share the same trace_id (set from the GitHub delivery ID). Use this to follow a single request through core → orchestrator → worker in one query across all log groups.

fields @timestamp, @log, service, component, msg, final_status
| filter trace_id = "REPLACE_WITH_TRACE_ID"
| sort @timestamp asc
| limit 500

Run this with every service's log group selected, or select log groups by prefix (/ecs/alakai/sandbox/*) if your account supports it.
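Once the trace query has returned rows, it can be useful to see which log groups the request passed through and in what order. The sketch below assumes the rows have already been converted to dicts keyed by field name, and that @log has the usual account-id:log-group-name form. The function name hop_order is hypothetical.

```python
def hop_order(rows):
    """Return log group names in order of first appearance along the trace."""
    seen = []
    # Logs Insights timestamps are zero-padded strings, so they sort lexicographically.
    for row in sorted(rows, key=lambda r: r["@timestamp"]):
        # Drop the leading account ID from "123456789012:/ecs/...".
        group = row["@log"].split(":", 1)[-1]
        if group not in seen:
            seen.append(group)
    return seen
```

If the returned list stops at the core or orchestrator group, the request probably never reached a worker, which narrows down where to look next.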

6) Structured Event Summary (Latency + Outcome)

The logger emits one structured event log per operation with event, outcome, duration_ms, and steps. Use this to get a latency and success-rate summary across all operations:

fields @timestamp, service, event, outcome, final_status, duration_ms
| filter ispresent(event) and ispresent(duration_ms)
| stats avg(duration_ms) as avg_ms,
        max(duration_ms) as max_ms,
        count(*) as total,
        sum(outcome = "error") as errors
  by event, outcome
| sort event, outcome
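Because the query groups by both event and outcome, each event appears on one row per outcome. A per-event error rate can be derived from those rows afterwards. The following Python sketch assumes the rows are dicts with the event, outcome, and total columns from the query above, with numeric values returned as strings; the function name error_rates is hypothetical.

```python
from collections import defaultdict


def error_rates(rows):
    """Map each event name to errors / total across all of its outcome rows."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for row in rows:
        count = int(row["total"])
        totals[row["event"]] += count
        if row["outcome"] == "error":
            errors[row["event"]] += count
    # Skip events with a zero total to avoid dividing by zero.
    return {event: errors[event] / totals[event] for event in totals if totals[event]}
```

An event with only success rows gets a rate of 0.0, since its error count defaults to zero.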

7) Token Usage per Model

Track AI token consumption over time to monitor cost:

fields @timestamp, agent.model, agent.input_tokens, agent.output_tokens
| filter ispresent(agent.model)
| stats sum(agent.input_tokens) as total_input,
        sum(agent.output_tokens) as total_output,
        count(*) as runs
  by agent.model
| sort total_input desc
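To turn these totals into a rough cost figure, the rows can be multiplied by per-token prices. The sketch below is only an illustration: the price table is supplied by the caller, no real prices are assumed, and the function name estimate_cost is hypothetical. Rows are expected as dicts keyed by the column names from the query above.

```python
def estimate_cost(rows, prices_per_million):
    """Estimate spend per model.

    prices_per_million maps a model name to (input_price, output_price),
    each expressed per one million tokens. Models without a price are skipped.
    """
    costs = {}
    for row in rows:
        model = row["agent.model"]
        if model not in prices_per_million:
            continue
        input_price, output_price = prices_per_million[model]
        # Query results arrive as strings, so convert before doing arithmetic.
        input_tokens = float(row["total_input"])
        output_tokens = float(row["total_output"])
        costs[model] = (
            input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price
        )
    return costs
```

Passing the prices in explicitly keeps the helper honest about where the numbers come from, since model pricing changes independently of the logs.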

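8) Error Rate by Repo

This counts error outcomes per repository. It is an absolute count, not a ratio of errors to total events.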

fields @timestamp, outcome, repo.full_name
| filter ispresent(event) and outcome = "error"
| stats count(*) as errors by repo.full_name
| sort errors desc

Metric Filters to Create

Create these metric filters on the log groups to get CloudWatch metrics you can alarm on:

Metric name	Log group	Filter pattern	Value
`coding.prompt.error`	`/ecs/alakai/*/core`	`{ $.event = "coding.prompt" && $.outcome = "error" }`	`1`
`coding.implementation.error`	`/ecs/alakai/*/orchestrator`	`{ $.event = "coding.implementation" && $.outcome = "error" }`	`1`
`coding.prompt.duration_ms`	`/ecs/alakai/*/core`	`{ $.event = "coding.prompt" && $.outcome = "success" }`	`$.duration_ms`
`worker.fatal`	`/ecs/alakai/*/workers`	`{ $.level = 60 }`	`1`

Once an alarm is set on coding.implementation.error, you are notified automatically when an implementation fails, so there is no need to watch CloudWatch manually.
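The filters in the table can also be created from a script. The Python sketch below builds the arguments for the boto3 put_metric_filter call. It assumes the * in each log group name stands for an environment name such as sandbox, and it uses a made-up metric namespace, Alakai; both are assumptions to adjust for your setup. The function names are hypothetical.

```python
# (metric name, service suffix of the log group, filter pattern, metric value)
METRIC_FILTERS = [
    ("coding.prompt.error", "core",
     '{ $.event = "coding.prompt" && $.outcome = "error" }', "1"),
    ("coding.implementation.error", "orchestrator",
     '{ $.event = "coding.implementation" && $.outcome = "error" }', "1"),
    ("coding.prompt.duration_ms", "core",
     '{ $.event = "coding.prompt" && $.outcome = "success" }', "$.duration_ms"),
    ("worker.fatal", "workers", "{ $.level = 60 }", "1"),
]


def metric_filter_requests(env, namespace="Alakai"):
    """Build one put_metric_filter keyword-argument dict per table row."""
    return [
        {
            "logGroupName": f"/ecs/alakai/{env}/{service}",
            "filterName": name,
            "filterPattern": pattern,
            "metricTransformations": [
                {
                    "metricName": name,
                    "metricNamespace": namespace,  # assumed namespace, not from this doc
                    "metricValue": value,
                }
            ],
        }
        for name, service, pattern, value in METRIC_FILTERS
    ]


def apply_metric_filters(env):
    """Create or update the filters in the given environment."""
    import boto3  # imported lazily so the request builder works without boto3

    logs = boto3.client("logs")
    for request in metric_filter_requests(env):
        logs.put_metric_filter(**request)
```

Calling apply_metric_filters("sandbox") would create all four filters; put_metric_filter overwrites an existing filter with the same name on the same log group, so the script should be safe to re-run.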

Tips

Use deliveryId when possible to correlate a single webhook from ingress to completion.
Use trace_id to follow a request across services (core → orchestrator → worker). For webhook-triggered flows it matches the GitHub delivery ID.
For request-changes debugging, focus first on ignored and ignoredReason, then follow automation logs.
If an automation starts but no follow-up action occurs, check error-level entries for the same deliveryId.
The structured event log (msg ends in -> success or -> error) is always the last log for an operation and contains the full summary, so filter on it first for triage.

See Tracing Guide for the full cross-component tracing runbook.
