September 28, 2025
Generated: 2025-09-29 10:49 UTC
Total Duration: 1h 4m 41s
Iterations: 1
Judge (classifier) model: gpt-4.1
About this Benchmark
HolmesGPT is continuously evaluated against real-world Kubernetes and cloud troubleshooting scenarios.
If you find scenarios that HolmesGPT does not perform well on, please consider adding them as evals to the benchmark.
Model Accuracy Comparison
| Model | Pass | Fail | Skip/Error | Total | Success Rate |
|---|---|---|---|---|---|
| gpt-4o | 58 | 35 | 12 | 105 | 🟡 62% (58/93) |
| gpt-4.1 | 72 | 22 | 11 | 105 | 🟡 77% (72/94) |
| gpt-5 | 76 | 17 | 12 | 105 | 🟡 82% (76/93) |
| sonnet-4-20250514 | 88 | 6 | 11 | 105 | 🟡 94% (88/94) |
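The Success Rate column divides passes by the tests that produced a verdict (Pass + Fail), leaving Skip/Error out of the denominator, as the fractions in the table show. A minimal Python sketch of that arithmetic follows; the function name and the `rows` dictionary are illustrative and are not part of the benchmark tooling.

```python
def success_rate(passed, failed):
    """Pass rate over tests with a verdict; skipped/errored tests are excluded."""
    evaluated = passed + failed
    if evaluated == 0:
        return None  # nothing was evaluated, so no rate can be reported
    return passed / evaluated


# (pass, fail, skip/error) counts copied from the table above.
rows = {
    "gpt-4o": (58, 35, 12),
    "gpt-4.1": (72, 22, 11),
    "gpt-5": (76, 17, 12),
    "sonnet-4-20250514": (88, 6, 11),
}

for model, (passed, failed, skipped) in rows.items():
    rate = success_rate(passed, failed)
    print(f"{model}: {rate:.0%} ({passed}/{passed + failed}), {skipped} skipped or errored")
```

Because the denominator differs per model (93 or 94), the percentages are not directly comparable test for test.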
Model Cost Comparison
| Model | Tests | Avg Cost | Min Cost | Max Cost | Total Cost |
|---|---|---|---|---|---|
| gpt-4o | 93 | $0.14 | $0.03 | $0.27 | $12.83 |
| gpt-4.1 | 94 | $0.09 | $0.03 | $0.41 | $8.69 |
| gpt-5 | 93 | $0.13 | $0.02 | $0.39 | $12.35 |
| sonnet-4-20250514 | 94 | $0.16 | $0.06 | $0.50 | $15.47 |
Model Latency Comparison
| Model | Avg (s) | Min (s) | Max (s) | P50 (s) | P95 (s) |
|---|---|---|---|---|---|
| gpt-4o | 35.1 | 9.7 | 84.7 | 35.2 | 48.6 |
| gpt-4.1 | 35.0 | 7.0 | 80.4 | 34.6 | 58.5 |
| gpt-5 | 189.5 | 24.1 | 677.8 | 159.7 | 464.0 |
| sonnet-4-20250514 | 67.6 | 10.7 | 210.1 | 55.2 | 150.5 |
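The report does not state how P50 and P95 are computed. As a rough sketch, one common approach picks the observation at a given rank in the sorted latencies; the rank rule below is an assumption, and the sample values are illustrative rather than taken from this run.

```python
def percentile(values, p):
    """Return the observation at rank int(len(values) * p) in sorted order.

    The rank rule is an assumption; other tools interpolate between neighbors.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    index = min(int(len(ordered) * p), len(ordered) - 1)
    return ordered[index]


# Illustrative latencies in seconds (hypothetical, not benchmark data).
sample = [12.0, 30.5, 28.1, 45.9, 33.3, 27.4, 61.2, 35.0, 29.8, 40.6]

p50 = percentile(sample, 0.50)
p95 = percentile(sample, 0.95)
print(f"P50={p50}s P95={p95}s")
```

With few samples per model, P95 sits close to the maximum, so a single slow run can move it noticeably.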
Performance by Tag
Success rate by test category and model:
| Tag | gpt-4o | gpt-4.1 | gpt-5 | sonnet-4-20250514 | Warnings |
|---|---|---|---|---|---|
| chain-of-causation | 🔴 0% (0/6) | 🔴 0% (0/6) | 🟡 67% (4/6) | 🟡 83% (5/6) | ⚠️ 8 skipped |
| context_window | 🟡 57% (4/7) | 🟡 71% (5/7) | 🟢 100% (7/7) | 🟡 86% (6/7) | |
| counting | 🟢 100% (4/4) | 🟢 100% (4/4) | 🟢 100% (4/4) | 🟢 100% (4/4) | |
| database | 🔴 0% (0/1) | 🔴 0% (0/1) | 🟢 100% (1/1) | 🟢 100% (1/1) | ⚠️ 12 skipped |
| datadog | 🟡 75% (3/4) | 🟢 100% (4/4) | 🟡 75% (3/4) | 🟢 100% (4/4) | |
| datetime | 🟡 75% (3/4) | 🟡 50% (2/4) | 🟢 100% (4/4) | 🟡 75% (3/4) | ⚠️ 8 skipped |
| easy | 🟡 94% (34/36) | 🟡 97% (35/36) | 🟡 78% (28/36) | 🟢 100% (36/36) | |
| hard | 🟡 14% (2/14) | 🟡 29% (4/14) | 🟡 57% (8/14) | 🟡 86% (12/14) | ⚠️ 24 skipped |
| kafka | ⚪️ - | ⚪️ - | ⚪️ - | ⚪️ - | ⚠️ 8 skipped |
| kubernetes | 🟡 55% (26/47) | 🟡 77% (36/47) | 🟡 81% (38/47) | 🟡 94% (44/47) | ⚠️ 4 skipped |
| logs | 🟡 65% (17/26) | 🟡 73% (19/26) | 🟡 85% (22/26) | 🟡 85% (22/26) | ⚠️ 28 skipped |
| medium | 🟡 51% (22/43) | 🟡 75% (33/44) | 🟡 93% (40/43) | 🟡 91% (40/44) | ⚠️ 22 skipped |
| network | 🟡 25% (1/4) | 🟡 75% (3/4) | 🟢 100% (4/4) | 🟢 100% (4/4) | |
| numerical | 🟢 100% (1/1) | 🟢 100% (1/1) | 🟢 100% (1/1) | 🟢 100% (1/1) | |
| port-forward | 🟡 33% (3/9) | 🟡 56% (5/9) | 🟡 78% (7/9) | 🟡 78% (7/9) | |
| prometheus | 🟡 50% (2/4) | 🟢 100% (4/4) | 🟢 100% (4/4) | 🟢 100% (4/4) | |
| question-answer | 🟢 100% (4/4) | 🟢 100% (4/4) | 🟢 100% (4/4) | 🟢 100% (4/4) | |
| runbooks | 🟡 33% (2/6) | 🟡 83% (5/6) | 🟢 100% (6/6) | 🟢 100% (6/6) | ⚠️ 4 skipped |
| slackbot | ⚪️ - | ⚪️ - | ⚪️ - | ⚪️ - | ⚠️ 4 skipped |
| traces | 🔴 0% (0/5) | 🔴 0% (0/5) | 🟡 60% (3/5) | 🟡 80% (4/5) | |
| transparency | 🟡 50% (7/14) | 🟡 71% (10/14) | 🟡 86% (12/14) | 🟡 86% (12/14) | ⚠️ 4 skipped |
| Overall | 🟡 62% (58/93) | 🟡 77% (72/94) | 🟡 82% (76/93) | 🟡 94% (88/94) | ⚠️ 46 skipped |
Raw Results
Status of all evaluations across models. Color coding:
- 🟢 Passing 100% (stable)
- 🟡 Passing 1-99%
- 🔴 Passing 0% (failing)
- 🔧 Mock data failure (missing or invalid test data)
- ⚠️ Setup failure (environment/infrastructure issue)
- ⏱️ Timeout or rate limit error
- ⏭️ Test skipped (e.g., known issue or precondition not met)
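The pass-rate part of this legend can be sketched as a small mapping from pass counts to a status marker. The function and constant names below are hypothetical, and treating an eval with no runs as a white-circle placeholder is an assumption based on the tables, not something the legend states.

```python
# Status markers for the pass-rate part of the legend (names are illustrative).
PASSING = "\U0001F7E2"   # green circle: 100% of iterations passed
PARTIAL = "\U0001F7E1"   # yellow circle: between 1% and 99% passed
FAILING = "\U0001F534"   # red circle: 0% passed
NOT_RUN = "\u26AA"       # white circle: assumed marker for "no runs"


def status_icon(passed, total):
    """Map pass counts to a legend marker."""
    if total == 0:
        return NOT_RUN
    if passed == total:
        return PASSING
    if passed == 0:
        return FAILING
    return PARTIAL


print(status_icon(1, 1), status_icon(3, 4), status_icon(0, 1), status_icon(0, 0))
```

With a single iteration per eval, as in this run, only the green and red markers can appear for an individual eval; yellow shows up only in aggregated rows.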
Detailed Raw Results
| Eval ID | gpt-4o | gpt-4.1 | gpt-5 | sonnet-4-20250514 |
|---|---|---|---|---|
| 01_how_many_pods | 🟢 100% (1/1) / ⏱️ 28.8s / 💰 $0.08 | 🟢 100% (1/1) / ⏱️ 27.1s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 39.1s / 💰 $0.03 | 🟢 100% (1/1) / ⏱️ 26.5s / 💰 $0.08 |
| 02_what_is_wrong_with_pod | 🟢 100% (1/1) / ⏱️ 27.1s / 💰 $0.08 | 🟢 100% (1/1) / ⏱️ 28.3s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 179.2s / 💰 $0.14 | 🟢 100% (1/1) / ⏱️ 44.1s / 💰 $0.11 |
| 03_what_is_the_command_to_port_forward | 🟢 100% (1/1) / ⏱️ 33.5s / 💰 $0.16 | 🟢 100% (1/1) / ⏱️ 31.0s / 💰 $0.09 | 🟢 100% (1/1) / ⏱️ 119.7s / 💰 $0.08 | 🟢 100% (1/1) / ⏱️ 38.3s / 💰 $0.12 |
| 04_related_k8s_events | 🟢 100% (1/1) / ⏱️ 31.1s / 💰 $0.11 | 🟢 100% (1/1) / ⏱️ 29.7s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 84.5s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 44.0s / 💰 $0.09 |
| 05_image_version | 🟢 100% (1/1) / ⏱️ 27.2s / 💰 $0.11 | 🟢 100% (1/1) / ⏱️ 23.0s / 💰 $0.05 | 🟢 100% (1/1) / ⏱️ 65.4s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 32.1s / 💰 $0.09 |
| 09_crashpod | 🟢 100% (1/1) / ⏱️ 32.3s / 💰 $0.11 | 🟢 100% (1/1) / ⏱️ 31.5s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 74.4s / 💰 $0.09 | 🟢 100% (1/1) / ⏱️ 54.8s / 💰 $0.16 |
| 100a_historical_logs | 🟢 100% (1/1) / ⏱️ 44.8s / 💰 $0.14 | 🟢 100% (1/1) / ⏱️ 40.6s / 💰 $0.08 | 🟢 100% (1/1) / ⏱️ 388.1s / 💰 $0.29 | 🟢 100% (1/1) / ⏱️ 130.5s / 💰 $0.28 |
| 100b_historical_logs_nonstandard_label | 🔴 0% (0/1) / ⏱️ 36.1s / 💰 $0.15 | 🔴 0% (0/1) / ⏱️ 34.6s / 💰 $0.07 | 🔴 0% (0/1) / ⏱️ 354.2s / 💰 $0.16 | 🔴 0% (0/1) / ⏱️ 151.0s / 💰 $0.23 |
| 101_historical_logs_pod_deleted | 🔴 0% (0/1) / ⏱️ 40.2s / 💰 $0.12 | 🔴 0% (0/1) / ⏱️ 31.4s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 453.0s / 💰 $0.33 | 🔴 0% (0/1) / ⏱️ 139.6s / 💰 $0.46 |
| 103_logs_transparency_default_limit | 🔴 0% (0/1) / ⏱️ 33.4s / 💰 $0.14 | 🔴 0% (0/1) / ⏱️ 46.9s / 💰 $0.41 | 🟢 100% (1/1) / ⏱️ 76.1s / 💰 $0.07 | 🔴 0% (0/1) / ⏱️ 54.3s / 💰 $0.41 |
| 104a_postgres_root_issue | 🔴 0% (0/1) / ⏱️ 36.3s / 💰 $0.17 | 🔴 0% (0/1) / ⏱️ 55.4s / 💰 $0.14 | 🟢 100% (1/1) / ⏱️ 243.1s / 💰 $0.23 | 🟢 100% (1/1) / ⏱️ 92.4s / 💰 $0.21 |
| 107_log_filter_http_status_code | 🟢 100% (1/1) / ⏱️ 37.0s / 💰 $0.15 | 🟢 100% (1/1) / ⏱️ 38.2s / 💰 $0.11 | 🟢 100% (1/1) / ⏱️ 304.7s / 💰 $0.26 | 🟢 100% (1/1) / ⏱️ 80.1s / 💰 $0.19 |
| 108_logs_nearby_lines | 🔴 0% (0/1) / ⏱️ 38.9s / 💰 $0.15 | 🔴 0% (0/1) / ⏱️ 41.4s / 💰 $0.14 | 🔴 0% (0/1) / ⏱️ 417.1s / 💰 $0.22 | 🟢 100% (1/1) / ⏱️ 77.5s / 💰 $0.22 |
| 109_logs_transparency_not_found | 🔴 0% (0/1) / ⏱️ 27.7s / 💰 $0.12 | 🟢 100% (1/1) / ⏱️ 31.6s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 121.1s / 💰 $0.08 | 🟢 100% (1/1) / ⏱️ 36.3s / 💰 $0.09 |
| 10_image_pull_backoff | 🟢 100% (1/1) / ⏱️ 40.8s / 💰 $0.13 | 🟢 100% (1/1) / ⏱️ 28.4s / 💰 $0.05 | 🔴 0% (0/1) / ⏱️ 45.5s / 💰 $0.02 | 🟢 100% (1/1) / ⏱️ 50.0s / 💰 $0.11 |
| 110_k8s_events_image_pull | 🟢 100% (1/1) / ⏱️ 32.6s / 💰 $0.11 | 🟢 100% (1/1) / ⏱️ 34.6s / 💰 $0.07 | 🟢 100% (1/1) / ⏱️ 55.4s / 💰 $0.05 | 🟢 100% (1/1) / ⏱️ 43.9s / 💰 $0.11 |
| 111_disabled_datadog_traces | 🔴 0% (0/1) / ⏱️ 28.8s / 💰 $0.03 | 🔴 0% (0/1) / ⏱️ 20.1s / 💰 $0.03 | 🟢 100% (1/1) / ⏱️ 161.4s / 💰 $0.10 | 🟢 100% (1/1) / ⏱️ 78.5s / 💰 $0.21 |
| 111_pod_names_contain_service | 🟢 100% (1/1) / ⏱️ 34.3s / 💰 $0.14 | 🟢 100% (1/1) / ⏱️ 38.4s / 💰 $0.09 | 🟢 100% (1/1) / ⏱️ 207.8s / 💰 $0.18 | 🟢 100% (1/1) / ⏱️ 71.3s / 💰 $0.21 |
| 112_find_pvcs_by_uuid | 🔴 0% (0/1) / ⏱️ 30.2s / 💰 $0.11 | 🔴 0% (0/1) / ⏱️ 39.4s / 💰 $0.07 | 🟢 100% (1/1) / ⏱️ 100.6s / 💰 $0.08 | 🟢 100% (1/1) / ⏱️ 49.9s / 💰 $0.14 |
| 114_checkout_latency_tracing_rebuild[0] | 🔴 0% (0/1) / ⏱️ 40.2s / 💰 $0.25 | 🔴 0% (0/1) / ⏱️ 44.6s / 💰 $0.17 | 🔴 0% (0/1) / ⏱️ 443.6s / 💰 $0.36 | 🟢 100% (1/1) / ⏱️ 120.6s / 💰 $0.36 |
| 115_checkout_errors_tracing[0] | 🔴 0% (0/1) / ⏱️ 43.1s / 💰 $0.25 | 🔴 0% (0/1) / ⏱️ 64.1s / 💰 $0.17 | 🟢 100% (1/1) / ⏱️ 193.9s / 💰 $0.20 | 🟢 100% (1/1) / ⏱️ 109.7s / 💰 $0.35 |
| 11_init_containers | 🟢 100% (1/1) / ⏱️ 32.7s / 💰 $0.10 | 🟢 100% (1/1) / ⏱️ 33.6s / 💰 $0.07 | 🔴 0% (0/1) / ⏱️ 26.2s / 💰 $0.02 | 🟢 100% (1/1) / ⏱️ 56.7s / 💰 $0.13 |
| 121_new_relic_checkout_errors_tracing[0] | 🔴 0% (0/1) / ⏱️ 29.9s / 💰 $0.11 | 🔴 0% (0/1) / ⏱️ 25.9s / 💰 $0.07 | 🟢 100% (1/1) / ⏱️ 565.1s / 💰 $0.31 | 🔴 0% (0/1) / ⏱️ 141.8s / 💰 $0.28 |
| 122_new_relic_checkout_latency_tracing_rebuild[0] | 🔴 0% (0/1) / ⏱️ 36.9s / 💰 $0.20 | 🔴 0% (0/1) / ⏱️ 40.9s / 💰 $0.12 | 🔴 0% (0/1) / ⏱️ 677.8s / 💰 $0.39 | 🟢 100% (1/1) / ⏱️ 118.7s / 💰 $0.33 |
| 123_new_relic_checkout_errors_tracing[0] | 🔴 0% (0/1) / ⏱️ 32.5s / 💰 $0.13 | 🔴 0% (0/1) / ⏱️ 22.5s / 💰 $0.03 | 🟢 100% (1/1) / ⏱️ 577.4s / 💰 $0.32 | 🟢 100% (1/1) / ⏱️ 97.5s / 💰 $0.29 |
| 12_job_crashing | 🟢 100% (1/1) / ⏱️ 36.6s / 💰 $0.18 | 🟢 100% (1/1) / ⏱️ 33.3s / 💰 $0.07 | 🔴 0% (0/1) / ⏱️ 54.3s / 💰 $0.05 | 🟢 100% (1/1) / ⏱️ 55.2s / 💰 $0.12 |
| 13a_pending_node_selector_basic | 🟢 100% (1/1) / ⏱️ 35.2s / 💰 $0.14 | 🟢 100% (1/1) / ⏱️ 50.5s / 💰 $0.08 | 🔴 0% (0/1) / ⏱️ 27.4s / 💰 $0.02 | 🟢 100% (1/1) / ⏱️ 51.6s / 💰 $0.13 |
| 13b_pending_node_selector_detailed | 🔴 0% (0/1) / ⏱️ 33.6s / 💰 $0.13 | 🟢 100% (1/1) / ⏱️ 36.3s / 💰 $0.08 | 🟢 100% (1/1) / ⏱️ 314.9s / 💰 $0.14 | 🟢 100% (1/1) / ⏱️ 50.0s / 💰 $0.13 |
| 14_pending_resources | 🟢 100% (1/1) / ⏱️ 37.6s / 💰 $0.14 | 🟢 100% (1/1) / ⏱️ 37.6s / 💰 $0.09 | 🔴 0% (0/1) / ⏱️ 39.8s / 💰 $0.02 | 🟢 100% (1/1) / ⏱️ 56.2s / 💰 $0.12 |
| 159_prometheus_high_cardinality_cpu[0] | 🟢 100% (1/1) / ⏱️ 30.4s / 💰 $0.11 | 🟢 100% (1/1) / ⏱️ 58.5s / 💰 $0.14 | 🟢 100% (1/1) / ⏱️ 304.1s / 💰 $0.22 | 🟢 100% (1/1) / ⏱️ 55.1s / 💰 $0.18 |
| 159_prometheus_high_cardinality_cpu[1] | 🔴 0% (0/1) / ⏱️ 48.6s / 💰 $0.12 | 🟢 100% (1/1) / ⏱️ 34.3s / 💰 $0.14 | 🟢 100% (1/1) / ⏱️ 358.2s / 💰 $0.13 | 🟢 100% (1/1) / ⏱️ 135.5s / 💰 $0.21 |
| 159_prometheus_high_cardinality_cpu[2] | 🔴 0% (0/1) / ⏱️ 38.4s / 💰 $0.09 | 🟢 100% (1/1) / ⏱️ 51.4s / 💰 $0.15 | 🟢 100% (1/1) / ⏱️ 119.6s / 💰 $0.11 | 🟢 100% (1/1) / ⏱️ 69.2s / 💰 $0.21 |
| 15_failed_readiness_probe | 🟢 100% (1/1) / ⏱️ 44.0s / 💰 $0.18 | 🟢 100% (1/1) / ⏱️ 32.3s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 236.9s / 💰 $0.18 | 🟢 100% (1/1) / ⏱️ 132.8s / 💰 $0.15 |
| 16_failed_no_toolset_found | 🔴 0% (0/1) / ⏱️ 24.4s / 💰 $0.06 | 🔴 0% (0/1) / ⏱️ 23.7s / 💰 $0.04 | 🟢 100% (1/1) / ⏱️ 36.9s / 💰 $0.02 | 🔴 0% (0/1) / ⏱️ 22.3s / 💰 $0.06 |
| 17_oom_kill | 🟢 100% (1/1) / ⏱️ 38.3s / 💰 $0.18 | 🟢 100% (1/1) / ⏱️ 31.4s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 78.0s / 💰 $0.09 | 🟢 100% (1/1) / ⏱️ 55.1s / 💰 $0.12 |
| 19_detect_missing_app_details | 🔴 0% (0/1) / ⏱️ 50.5s / 💰 $0.22 | 🔴 0% (0/1) / ⏱️ 43.3s / 💰 $0.07 | 🔴 0% (0/1) / ⏱️ 264.1s / 💰 $0.21 | 🟢 100% (1/1) / ⏱️ 95.7s / 💰 $0.16 |
| 20_long_log_file_search | 🟢 100% (1/1) / ⏱️ 39.1s / 💰 $0.10 | 🟢 100% (1/1) / ⏱️ 42.0s / 💰 $0.05 | 🟢 100% (1/1) / ⏱️ 97.6s / 💰 $0.08 | 🟢 100% (1/1) / ⏱️ 77.2s / 💰 $0.12 |
| 21_job_fail_curl_no_svc_account | 🟢 100% (1/1) / ⏱️ 43.9s / 💰 $0.27 | 🟢 100% (1/1) / ⏱️ 38.2s / 💰 $0.14 | 🔴 0% (0/1) / ⏱️ 26.8s / 💰 $0.02 | 🟢 100% (1/1) / ⏱️ 54.3s / 💰 $0.22 |
| 23_app_error_in_current_logs | 🟢 100% (1/1) / ⏱️ 40.1s / 💰 $0.26 | 🟢 100% (1/1) / ⏱️ 36.2s / 💰 $0.08 | 🟢 100% (1/1) / ⏱️ 283.2s / 💰 $0.19 | 🟢 100% (1/1) / ⏱️ 72.4s / 💰 $0.50 |
| 24_misconfigured_pvc | 🟢 100% (1/1) / ⏱️ 39.7s / 💰 $0.23 | 🟢 100% (1/1) / ⏱️ 40.9s / 💰 $0.09 | 🔴 0% (0/1) / ⏱️ 24.1s / 💰 $0.02 | 🟢 100% (1/1) / ⏱️ 61.0s / 💰 $0.16 |
| 24a_misconfigured_pvc_basic | 🟢 100% (1/1) / ⏱️ 40.4s / 💰 $0.18 | 🟢 100% (1/1) / ⏱️ 67.0s / 💰 $0.23 | 🔴 0% (0/1) / ⏱️ 29.4s / 💰 $0.02 | 🟢 100% (1/1) / ⏱️ 79.2s / 💰 $0.15 |
| 24b_misconfigured_pvc_detailed | 🔴 0% (0/1) / ⏱️ 40.3s / 💰 $0.17 | 🔴 0% (0/1) / ⏱️ 37.2s / 💰 $0.11 | 🔴 0% (0/1) / ⏱️ 29.3s / 💰 $0.02 | 🟢 100% (1/1) / ⏱️ 64.2s / 💰 $0.14 |
| 25_misconfigured_ingress_class | 🔴 0% (0/1) / ⏱️ 39.2s / 💰 $0.14 | 🔴 0% (0/1) / ⏱️ 45.1s / 💰 $0.19 | 🟢 100% (1/1) / ⏱️ 296.2s / 💰 $0.20 | 🟢 100% (1/1) / ⏱️ 117.6s / 💰 $0.31 |
| 26_page_render_times | 🟢 100% (1/1) / ⏱️ 30.2s / 💰 $0.14 | 🟢 100% (1/1) / ⏱️ 30.7s / 💰 $0.10 | 🟢 100% (1/1) / ⏱️ 227.4s / 💰 $0.20 | 🟢 100% (1/1) / ⏱️ 57.6s / 💰 $0.15 |
| 27a_multi_container_logs | 🟢 100% (1/1) / ⏱️ 35.0s / 💰 $0.13 | 🟢 100% (1/1) / ⏱️ 36.3s / 💰 $0.11 | 🟢 100% (1/1) / ⏱️ 201.3s / 💰 $0.09 | 🟢 100% (1/1) / ⏱️ 43.9s / 💰 $0.13 |
| 27b_multi_container_logs | 🟢 100% (1/1) / ⏱️ 32.5s / 💰 $0.14 | 🟢 100% (1/1) / ⏱️ 39.0s / 💰 $0.09 | 🟢 100% (1/1) / ⏱️ 154.3s / 💰 $0.13 | 🟢 100% (1/1) / ⏱️ 37.8s / 💰 $0.11 |
| 28_permissions_error | 🟢 100% (1/1) / ⏱️ 21.2s / 💰 $0.04 | 🟢 100% (1/1) / ⏱️ 23.9s / 💰 $0.05 | 🔴 0% (0/1) / ⏱️ 124.6s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 25.7s / 💰 $0.07 |
| 33_cpu_metrics_discovery | 🟢 100% (1/1) / ⏱️ 27.6s / 💰 $0.11 | 🟢 100% (1/1) / ⏱️ 37.9s / 💰 $0.11 | 🟢 100% (1/1) / ⏱️ 246.8s / 💰 $0.20 | 🟢 100% (1/1) / ⏱️ 48.4s / 💰 $0.13 |
| 39_failed_toolset | 🟢 100% (1/1) / ⏱️ 27.7s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 26.9s / 💰 $0.04 | 🟢 100% (1/1) / ⏱️ 222.1s / 💰 $0.16 | 🟢 100% (1/1) / ⏱️ 191.5s / 💰 $0.12 |
| 41_setup_argo | 🟢 100% (1/1) / ⏱️ 19.4s / 💰 $0.03 | 🟢 100% (1/1) / ⏱️ 31.5s / 💰 $0.04 | 🟢 100% (1/1) / ⏱️ 170.7s / 💰 $0.11 | 🟢 100% (1/1) / ⏱️ 19.0s / 💰 $0.06 |
| 42_dns_issues_result_new_tools_no_runbook | 🔴 0% (0/1) / ⏱️ 34.9s / 💰 $0.24 | 🟢 100% (1/1) / ⏱️ 35.3s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 564.4s / 💰 $0.25 | 🟢 100% (1/1) / ⏱️ 140.3s / 💰 $0.20 |
| 42_dns_issues_steps_new_tools | 🟢 100% (1/1) / ⏱️ 84.7s / 💰 $0.12 | 🟢 100% (1/1) / ⏱️ 37.1s / 💰 $0.07 | 🟢 100% (1/1) / ⏱️ 464.0s / 💰 $0.24 | 🟢 100% (1/1) / ⏱️ 210.1s / 💰 $0.31 |
| 43_current_datetime_from_prompt | 🟢 100% (1/1) / ⏱️ 28.4s / 💰 $0.03 | 🟢 100% (1/1) / ⏱️ 20.9s / 💰 $0.04 | 🟢 100% (1/1) / ⏱️ 91.9s / 💰 $0.07 | 🟢 100% (1/1) / ⏱️ 18.1s / 💰 $0.06 |
| 45_fetch_deployment_logs_simple | 🟢 100% (1/1) / ⏱️ 29.5s / 💰 $0.11 | 🟢 100% (1/1) / ⏱️ 32.5s / 💰 $0.07 | 🟢 100% (1/1) / ⏱️ 108.3s / 💰 $0.09 | 🟢 100% (1/1) / ⏱️ 35.9s / 💰 $0.09 |
| 50_logs_since_specific_date | 🟢 100% (1/1) / ⏱️ 16.6s / 💰 $0.13 | 🟢 100% (1/1) / ⏱️ 19.6s / 💰 $0.09 | 🟢 100% (1/1) / ⏱️ 136.4s / 💰 $0.11 | 🟢 100% (1/1) / ⏱️ 27.8s / 💰 $0.11 |
| 50a_logs_since_last_specific_month | 🟢 100% (1/1) / ⏱️ 29.5s / 💰 $0.09 | 🟢 100% (1/1) / ⏱️ 28.1s / 💰 $0.05 | 🟢 100% (1/1) / ⏱️ 110.7s / 💰 $0.10 | 🟢 100% (1/1) / ⏱️ 41.4s / 💰 $0.10 |
| 51_logs_summarize_errors | 🟢 100% (1/1) / ⏱️ 29.4s / 💰 $0.11 | 🟢 100% (1/1) / ⏱️ 32.8s / 💰 $0.07 | 🟢 100% (1/1) / ⏱️ 89.4s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 42.8s / 💰 $0.10 |
| 52_logs_login_issues | 🔴 0% (0/1) / ⏱️ 33.6s / 💰 $0.10 | 🟢 100% (1/1) / ⏱️ 48.1s / 💰 $0.10 | 🟢 100% (1/1) / ⏱️ 339.9s / 💰 $0.19 | 🟢 100% (1/1) / ⏱️ 75.4s / 💰 $0.12 |
| 53_logs_find_term | 🟢 100% (1/1) / ⏱️ 30.7s / 💰 $0.15 | 🟢 100% (1/1) / ⏱️ 40.7s / 💰 $0.09 | 🟢 100% (1/1) / ⏱️ 81.5s / 💰 $0.08 | 🟢 100% (1/1) / ⏱️ 46.2s / 💰 $0.13 |
| 54_not_truncated_when_getting_pods | 🟢 100% (1/1) / ⏱️ 36.5s / 💰 $0.13 | 🟢 100% (1/1) / ⏱️ 40.5s / 💰 $0.12 | 🟢 100% (1/1) / ⏱️ 188.7s / 💰 $0.13 | 🟢 100% (1/1) / ⏱️ 69.8s / 💰 $0.16 |
| 57_wrong_namespace | 🔴 0% (0/1) / ⏱️ 27.1s / 💰 $0.10 | 🔴 0% (0/1) / ⏱️ 27.8s / 💰 $0.05 | 🔴 0% (0/1) / ⏱️ 111.4s / 💰 $0.10 | 🟢 100% (1/1) / ⏱️ 41.5s / 💰 $0.10 |
| 59_label_based_counting | 🟢 100% (1/1) / ⏱️ 26.1s / 💰 $0.08 | 🟢 100% (1/1) / ⏱️ 27.1s / 💰 $0.05 | 🟢 100% (1/1) / ⏱️ 47.0s / 💰 $0.03 | 🟢 100% (1/1) / ⏱️ 166.8s / 💰 $0.08 |
| 60_count_less_than | 🟢 100% (1/1) / ⏱️ 23.3s / 💰 $0.07 | 🟢 100% (1/1) / ⏱️ 23.4s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 53.1s / 💰 $0.04 | 🟢 100% (1/1) / ⏱️ 26.2s / 💰 $0.08 |
| 61_exact_match_counting | 🟢 100% (1/1) / ⏱️ 36.2s / 💰 $0.08 | 🟢 100% (1/1) / ⏱️ 25.4s / 💰 $0.05 | 🟢 100% (1/1) / ⏱️ 55.7s / 💰 $0.04 | 🟢 100% (1/1) / ⏱️ 29.5s / 💰 $0.07 |
| 62_fetch_error_logs_with_errors | 🟢 100% (1/1) / ⏱️ 29.9s / 💰 $0.10 | 🟢 100% (1/1) / ⏱️ 29.7s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 96.1s / 💰 $0.09 | 🟢 100% (1/1) / ⏱️ 39.3s / 💰 $0.09 |
| 63_fetch_error_logs_no_errors | 🟢 100% (1/1) / ⏱️ 38.5s / 💰 $0.13 | 🟢 100% (1/1) / ⏱️ 29.9s / 💰 $0.08 | 🟢 100% (1/1) / ⏱️ 100.5s / 💰 $0.07 | 🟢 100% (1/1) / ⏱️ 37.1s / 💰 $0.09 |
| 64_keda_vs_hpa_confusion | 🟢 100% (1/1) / ⏱️ 55.0s / 💰 $0.27 | 🔴 0% (0/1) / ⏱️ 31.9s / 💰 $0.07 | 🟢 100% (1/1) / ⏱️ 241.5s / 💰 $0.20 | 🟢 100% (1/1) / ⏱️ 72.1s / 💰 $0.25 |
| 65_health_check_followup | 🟢 100% (1/1) / ⏱️ 35.2s / 💰 $0.14 | 🟢 100% (1/1) / ⏱️ 40.6s / 💰 $0.10 | 🟢 100% (1/1) / ⏱️ 224.3s / 💰 $0.17 | 🟢 100% (1/1) / ⏱️ 76.3s / 💰 $0.24 |
| 71_connection_pool_starvation | 🟢 100% (1/1) / ⏱️ 28.6s / 💰 $0.08 | 🟢 100% (1/1) / ⏱️ 37.1s / 💰 $0.08 | 🟢 100% (1/1) / ⏱️ 161.5s / 💰 $0.10 | 🟢 100% (1/1) / ⏱️ 57.2s / 💰 $0.15 |
| 73a_time_window_anomaly | 🟢 100% (1/1) / ⏱️ 47.4s / 💰 $0.21 | 🔴 0% (0/1) / ⏱️ 27.6s / 💰 $0.05 | 🟢 100% (1/1) / ⏱️ 113.4s / 💰 $0.08 | 🟢 100% (1/1) / ⏱️ 60.6s / 💰 $0.14 |
| 73b_time_window_anomaly | 🔴 0% (0/1) / ⏱️ 40.6s / 💰 $0.17 | 🔴 0% (0/1) / ⏱️ 43.6s / 💰 $0.09 | 🟢 100% (1/1) / ⏱️ 136.4s / 💰 $0.08 | 🔴 0% (0/1) / ⏱️ 55.1s / 💰 $0.13 |
| 76_service_discovery_issue | 🔴 0% (0/1) / ⏱️ 34.5s / 💰 $0.14 | 🟢 100% (1/1) / ⏱️ 45.6s / 💰 $0.11 | 🟢 100% (1/1) / ⏱️ 231.2s / 💰 $0.12 | 🟢 100% (1/1) / ⏱️ 68.2s / 💰 $0.20 |
| 77_liveness_probe_misconfiguration | 🔴 0% (0/1) / ⏱️ 43.4s / 💰 $0.19 | 🟢 100% (1/1) / ⏱️ 30.3s / 💰 $0.07 | 🟢 100% (1/1) / ⏱️ 101.5s / 💰 $0.09 | 🟢 100% (1/1) / ⏱️ 48.1s / 💰 $0.13 |
| 78a_missing_cpu_limits | 🟢 100% (1/1) / ⏱️ 42.2s / 💰 $0.18 | 🟢 100% (1/1) / ⏱️ 34.6s / 💰 $0.07 | 🟢 100% (1/1) / ⏱️ 260.0s / 💰 $0.13 | 🟢 100% (1/1) / ⏱️ 55.1s / 💰 $0.13 |
| 78b_cpu_quota_exceeded | 🔴 0% (0/1) / ⏱️ 42.7s / 💰 $0.20 | 🟢 100% (1/1) / ⏱️ 46.1s / 💰 $0.13 | 🟢 100% (1/1) / ⏱️ 171.3s / 💰 $0.10 | 🟢 100% (1/1) / ⏱️ 72.2s / 💰 $0.13 |
| 79_configmap_mount_issue | 🟢 100% (1/1) / ⏱️ 28.6s / 💰 $0.09 | 🟢 100% (1/1) / ⏱️ 32.6s / 💰 $0.08 | 🟢 100% (1/1) / ⏱️ 153.7s / 💰 $0.12 | 🟢 100% (1/1) / ⏱️ 49.2s / 💰 $0.12 |
| 80_pvc_storage_class_mismatch | 🔴 0% (0/1) / ⏱️ 29.7s / 💰 $0.11 | 🔴 0% (0/1) / ⏱️ 37.9s / 💰 $0.08 | 🟢 100% (1/1) / ⏱️ 159.7s / 💰 $0.13 | 🟢 100% (1/1) / ⏱️ 51.8s / 💰 $0.12 |
| 81_service_account_permission_denied | 🟢 100% (1/1) / ⏱️ 35.8s / 💰 $0.15 | 🟢 100% (1/1) / ⏱️ 43.6s / 💰 $0.12 | 🟢 100% (1/1) / ⏱️ 165.4s / 💰 $0.15 | 🟢 100% (1/1) / ⏱️ 67.1s / 💰 $0.20 |
| 82_pod_anti_affinity_conflict | 🟢 100% (1/1) / ⏱️ 35.9s / 💰 $0.13 | 🟢 100% (1/1) / ⏱️ 35.0s / 💰 $0.09 | 🟢 100% (1/1) / ⏱️ 191.3s / 💰 $0.15 | 🟢 100% (1/1) / ⏱️ 59.3s / 💰 $0.14 |
| 83_secret_not_found | 🟢 100% (1/1) / ⏱️ 32.0s / 💰 $0.15 | 🟢 100% (1/1) / ⏱️ 37.5s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 132.9s / 💰 $0.10 | 🟢 100% (1/1) / ⏱️ 52.2s / 💰 $0.11 |
| 84_network_policy_blocking_traffic | 🔴 0% (0/1) / ⏱️ 35.2s / 💰 $0.14 | 🟢 100% (1/1) / ⏱️ 39.6s / 💰 $0.12 | 🟢 100% (1/1) / ⏱️ 157.9s / 💰 $0.13 | 🟢 100% (1/1) / ⏱️ 84.8s / 💰 $0.22 |
| 85_hpa_not_scaling | 🔴 0% (0/1) / ⏱️ 34.6s / 💰 $0.14 | 🟢 100% (1/1) / ⏱️ 42.6s / 💰 $0.16 | 🟢 100% (1/1) / ⏱️ 195.0s / 💰 $0.19 | 🟢 100% (1/1) / ⏱️ 62.4s / 💰 $0.17 |
| 86_configmap_like_but_secret | 🔴 0% (0/1) / ⏱️ 44.9s / 💰 $0.17 | 🟢 100% (1/1) / ⏱️ 43.9s / 💰 $0.14 | 🟢 100% (1/1) / ⏱️ 184.3s / 💰 $0.15 | 🟢 100% (1/1) / ⏱️ 46.7s / 💰 $0.12 |
| 89_runbook_missing_cloudwatch | 🔴 0% (0/1) / ⏱️ 30.8s / 💰 $0.16 | 🟢 100% (1/1) / ⏱️ 22.3s / 💰 $0.05 | 🟢 100% (1/1) / ⏱️ 315.4s / 💰 $0.19 | 🟢 100% (1/1) / ⏱️ 42.8s / 💰 $0.10 |
| 90_runbook_basic_selection | 🔴 0% (0/1) / ⏱️ 47.0s / 💰 $0.26 | 🟢 100% (1/1) / ⏱️ 46.1s / 💰 $0.12 | 🟢 100% (1/1) / ⏱️ 383.5s / 💰 $0.32 | 🟢 100% (1/1) / ⏱️ 150.5s / 💰 $0.13 |
| 91f_datadog_logs_historical_pod | 🔴 0% (0/1) / ⏱️ 38.9s / 💰 $0.17 | 🟢 100% (1/1) / ⏱️ 80.4s / 💰 $0.36 | 🔴 0% (0/1) / ⏱️ 434.8s / 💰 $0.31 | 🟢 100% (1/1) / ⏱️ 69.6s / 💰 $0.15 |
| 93_calling_datadog[0] | 🟢 100% (1/1) / ⏱️ 48.5s / 💰 $0.12 | 🟢 100% (1/1) / ⏱️ 10.8s / 💰 $0.07 | 🟢 100% (1/1) / ⏱️ 102.7s / 💰 $0.12 | 🟢 100% (1/1) / ⏱️ 10.9s / 💰 $0.15 |
| 93_calling_datadog[1] | 🟢 100% (1/1) / ⏱️ 45.3s / 💰 $0.12 | 🟢 100% (1/1) / ⏱️ 8.7s / 💰 $0.07 | 🟢 100% (1/1) / ⏱️ 40.1s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 10.7s / 💰 $0.15 |
| 93_calling_datadog[2] | 🟢 100% (1/1) / ⏱️ 47.0s / 💰 $0.15 | 🟢 100% (1/1) / ⏱️ 9.8s / 💰 $0.07 | 🟢 100% (1/1) / ⏱️ 42.7s / 💰 $0.09 | 🟢 100% (1/1) / ⏱️ 15.2s / 💰 $0.15 |
| 94_runbook_transparency | 🟢 100% (1/1) / ⏱️ 52.7s / 💰 $0.18 | 🟢 100% (1/1) / ⏱️ 35.5s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 359.0s / 💰 $0.26 | 🟢 100% (1/1) / ⏱️ 83.1s / 💰 $0.23 |
| 96_no_matching_runbook | 🔴 0% (0/1) / ⏱️ 36.3s / 💰 $0.16 | 🔴 0% (0/1) / ⏱️ 60.8s / 💰 $0.17 | 🟢 100% (1/1) / ⏱️ 383.2s / 💰 $0.26 | 🟢 100% (1/1) / ⏱️ 90.1s / 💰 $0.26 |
| 97_logs_clarification_needed | 🟢 100% (1/1) / ⏱️ 15.2s / 💰 $0.03 | 🟢 100% (1/1) / ⏱️ 19.3s / 💰 $0.03 | 🟢 100% (1/1) / ⏱️ 26.8s / 💰 $0.02 | 🟢 100% (1/1) / ⏱️ 131.5s / 💰 $0.19 |
| 99_logs_transparency_custom_time | 🟢 100% (1/1) / ⏱️ 45.3s / 💰 $0.12 | 🟢 100% (1/1) / ⏱️ 33.6s / 💰 $0.08 | 🟢 100% (1/1) / ⏱️ 67.9s / 💰 $0.06 | 🟢 100% (1/1) / ⏱️ 43.0s / 💰 $0.10 |
| 93_events_since_specific_date | ⚪️ - | 🟢 100% (1/1) / ⏱️ 13.2s / 💰 $0.07 | ⚪️ - | 🟢 100% (1/1) / ⏱️ 16.3s / 💰 $0.10 |
| 44_slack_statefulset_logs | ⚪️ - | ⚪️ - | ⚪️ - | ⚪️ - |
| 48_logs_since_thursday | ⚪️ - | ⚪️ - | ⚪️ - | ⚪️ - |
| 22_high_latency_dbi_down | ⚪️ - | ⚪️ - | ⚪️ - | ⚪️ - |
| 08_sock_shop_frontend | ⚪️ - | ⚪️ - | ⚪️ - | ⚪️ - |
| 104b_postgres_missing_index_pgstat | ⚪️ - | ⚪️ - | ⚪️ - | ⚪️ - |
| 104c_postgres_minimal_missing_index | ⚪️ - | ⚪️ - | ⚪️ - | ⚪️ - |
| 105_redis_wrong_data_structure | ⚪️ - | ⚪️ - | ⚪️ - | ⚪️ - |
| 156_kafka_opensearch_latency | ⚪️ - | ⚪️ - | ⚪️ - | ⚪️ - |
| 43_slack_deployment_logs | ⚪️ - | ⚪️ - | ⚪️ - | ⚪️ - |
| 55_kafka_runbook | ⚪️ - | ⚪️ - | ⚪️ - | ⚪️ - |
| 98_logs_transparency_default_time | ⚪️ - | ⚪️ - | ⚪️ - | ⚪️ - |
Results are automatically generated and updated weekly. Full traces and detailed analysis are available in the Braintrust experiment local-benchmark-20250927-230943.