datadog-cli
by softaworks

datadog-cli helps agents run Datadog CLI workflows for logs, traces, metrics, services, and dashboards. Learn setup with DD_API_KEY and DD_APP_KEY, run `npx @leoflores/datadog-cli` commands, and handle the `--site` flag plus dashboard-update safety during incident triage.
This skill scores 82/100, which means it is a solid directory listing candidate for users who want Datadog debugging workflows an agent can invoke with less guesswork than a generic prompt. The repository gives substantial command coverage, concrete examples, and reference docs, though install/setup guidance is slightly fragmented between the skill and README.
- Strong operational references cover logs, metrics, query syntax, dashboards, and common workflows, reducing command guesswork for agents.
- Good triggerability: the description and examples clearly map to real debugging tasks like incident triage, trace following, log tailing, and dashboard work.
- Trust-building safety guidance is explicit, especially the dashboards reference warning that updates are destructive and should follow a backup-first workflow.
- Setup/install path is split between SKILL.md's direct `npx @leoflores/datadog-cli` usage and README's plugin install flow, which may cause some adoption guesswork.
- The skill depends on users already having valid Datadog API/app keys and Datadog query familiarity; there is no bundled automation or helper scripts.
Overview of datadog-cli skill
The datadog-cli skill helps an agent use Datadog from the command line for practical observability work: searching logs, tracing requests, querying metrics, listing services, and managing dashboards. It is best for engineers, SREs, platform teams, and AI-assisted incident responders who already have Datadog access and want faster triage without manually clicking through the UI.
What datadog-cli is for
Use datadog-cli when the real job is not “summarize Datadog,” but “investigate a production symptom with repeatable commands.” The skill is strongest when you need to:
- narrow an incident by service, error type, or time window
- pivot from logs to trace context
- check whether a spike is new or normal
- pull metrics quickly for a service or environment
- inspect or update dashboards with CLI-driven workflows
Best-fit users
This datadog-cli skill fits users who:
- already use Datadog for logs, metrics, traces, or dashboards
- want an agent to generate correct commands instead of vague search suggestions
- need incident triage workflows, not generic observability advice
- are comfortable providing service names, time ranges, trace IDs, or dashboard IDs
If you do not have Datadog keys or do not know your service/tag conventions, setup and prompt quality will matter more than the skill itself.
Why this skill is more useful than a generic prompt
A normal prompt might say “look at Datadog logs.” This skill gives the agent a command-level path: logs search, logs tail, logs trace, logs context, logs patterns, logs compare, metrics query, errors, services, and dashboard operations. It also points to reference docs that matter for correct execution, especially query syntax and the dashboard update warnings.
Key adoption blockers to know first
The main blockers are operational, not conceptual:
- `DD_API_KEY` and `DD_APP_KEY` are required
- non-US Datadog accounts may need `--site`, such as `datadoghq.eu`
- results depend heavily on correct Datadog query syntax
- dashboard updates are destructive if fields are omitted
Those are the first things to verify before you judge datadog-cli usage quality.
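Those credential blockers can be caught before any command runs. A minimal preflight sketch (the `check_dd_env` helper is ours for illustration, not a datadog-cli command):

```shell
# Preflight sketch: verify Datadog credentials before invoking the CLI.
# check_dd_env is a hypothetical helper, not part of datadog-cli itself.
check_dd_env() {
  if [ -z "$DD_API_KEY" ] || [ -z "$DD_APP_KEY" ]; then
    echo "Set DD_API_KEY and DD_APP_KEY before running datadog-cli" >&2
    return 1
  fi
  echo "Datadog credentials present"
}
```

Run it before the first `npx @leoflores/datadog-cli` call; note that non-US accounts still need `--site` on each command.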
How to Use datadog-cli skill
Install and runtime context
The skill itself lives in softaworks/agent-toolkit, but the actual CLI it teaches the agent to run is:
npx @leoflores/datadog-cli <command>
Set credentials first:
export DD_API_KEY="your-api-key"
export DD_APP_KEY="your-app-key"
For non-US Datadog sites, pass --site:
npx @leoflores/datadog-cli logs search --query "*" --site datadoghq.eu
Before deciding whether to adopt datadog-cli, validate both dependencies: the external CLI itself and working Datadog API access.
Read these files before first real use
This skill is unusually reference-driven. Read in this order:
- `SKILL.md`
- `references/query-syntax.md`
- `references/logs-commands.md`
- `references/metrics.md`
- `references/workflows.md`
- `references/dashboards.md`
That path reduces most first-run mistakes: bad filters, weak time windows, and unsafe dashboard edits.
Inputs the skill needs to work well
The datadog-cli skill performs best when your request includes at least some of:
- service name, team name, or environment
- time window like `15m`, `1h`, or `24h`
- symptom type: errors, latency, failed requests, deployment regression
- trace ID, request ID, or timestamp if you have one
- whether you want logs, metrics, dashboards, or a triage workflow
- Datadog site if not default US
Weak input: “Check Datadog.”
Strong input: “Investigate payment-api 5xx errors in prod for the last hour, compare against the previous hour, then pull any related traces and CPU metrics.”
Turn a rough goal into a usable prompt
A good datadog-cli guide prompt should tell the agent both the objective and the narrowing dimensions.
Try this pattern:
Use datadog-cli for Observability triage.
Goal: identify why checkout failures increased after the last deploy.
Scope: service:payment-api env:prod
Time: last 1h, compare with previous 1h
Need: error summary, common log patterns, likely trace IDs, and key metrics
Site: datadoghq.eu
Why this works:
- it gives the agent a workflow, not a single command
- it includes query tags the CLI can actually use
- it prevents the agent from searching too broadly
Best first commands for common jobs
For incident triage, start broad, then narrow:
npx @leoflores/datadog-cli errors --from 1h --pretty
npx @leoflores/datadog-cli logs compare --query "status:error" --period 1h --pretty
npx @leoflores/datadog-cli logs patterns --query "status:error" --from 1h --pretty
Then scope to service:
npx @leoflores/datadog-cli logs search --query "service:payment-api status:error env:prod" --from 1h --pretty
If you already have a trace:
npx @leoflores/datadog-cli logs trace --id "TRACE_ID" --from 24h --pretty
For service health:
npx @leoflores/datadog-cli metrics query --query "avg:system.cpu.user{env:prod,service:payment-api}" --from 1h --pretty
Query syntax matters more than most users expect
Many weak datadog-cli usage results are really query-quality problems. The skill relies on Datadog search syntax like:
- `service:api status:error`
- `@http.status_code:>=500`
- `service:api OR service:payment`
- `@duration:[1000 TO 5000]`
- `-status:info`
If you know your fields, include them explicitly. If you do not, ask the agent to start with broader discovery queries, then tighten based on returned attributes.
Practical workflow for incident response
A strong investigation loop with datadog-cli is:
- get an error overview with `errors`
- compare the current period with the prior period using `logs compare`
- cluster repeated failures with `logs patterns`
- narrow by service/env using `logs search`
- inspect surrounding activity with `logs context`
- pivot into the distributed flow using `logs trace`
- confirm resource or throughput signals with `metrics query`
This is much better than repeatedly asking for “more logs,” because each command answers a different diagnostic question.
Dashboards need extra caution
The most important safety note in this repo is that `dashboards update` replaces the whole dashboard rather than patching only the changed fields. If fields like template variables, description, or notify list are omitted, they can be removed.
Before any update, the safe workflow is:
- fetch the dashboard to a temp file with `--output`
- preserve existing fields
- update using the full retained structure
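That workflow can be sketched as a helper function. Only the `--output` flag is confirmed by the skill's references; the subcommand names (`dashboards get` / `dashboards update`) and the `--file` flag are assumptions to be checked against references/dashboards.md before any real use:

```shell
# Backup-first dashboard update, sketched as a helper function.
# CAUTION: the subcommand names ("dashboards get" / "dashboards update")
# and the --file flag are assumptions; confirm the real syntax in
# references/dashboards.md before running anything destructive.
safe_dashboard_update() {
  dash_id="$1"
  backup="/tmp/dashboard-${dash_id}.json"
  # 1. Export the complete current dashboard first.
  npx @leoflores/datadog-cli dashboards get "$dash_id" --output "$backup" || return 1
  # 2. Edit the backup copy, keeping template variables, description,
  #    and notify list intact.
  # 3. Push back the full retained structure, never a partial document.
  npx @leoflores/datadog-cli dashboards update "$dash_id" --file "$backup"
}
```

The point of the function shape is that the export step cannot be skipped: if the backup fails, the update never runs.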
This makes the datadog-cli skill suitable for dashboard work only if you are disciplined about backups and full-state updates.
Output-quality tips that change results
To get better answers from the agent:
- specify whether you want discovery, explanation, or exact commands
- include service and env tags together when possible
- choose a bounded time window first; widen only if needed
- ask for comparison against a previous period when evaluating regressions
- prefer a trace ID or timestamp if you already have one
- ask for `--pretty` when human review matters
The biggest quality gain usually comes from giving a precise query target, not from asking for more verbose analysis.
When to use logs vs metrics vs dashboards
Use logs when you need concrete events, errors, or request details.
Use metrics when you need trends, resource usage, or rate/latency signals.
Use dashboards when you need existing operational context or want to package a view for a team.
If you ask the agent for all three at once, tell it the decision goal: root cause, blast radius, regression check, or dashboard creation.
datadog-cli skill FAQ
Is datadog-cli good for beginners?
Yes, if you already have Datadog access and understand basic concepts like services, tags, and time windows. No, if you are still learning what logs, traces, and metrics represent. The skill reduces command guesswork, but it does not remove the need to know your environment names and observability conventions.
What makes this different from using Datadog UI directly?
datadog-cli is better when you want repeatable, scriptable, agent-generated investigation steps. It is especially useful for rapid triage, prompt-driven debugging, and sharing exact commands. The UI is still better for deep visual exploration and ad hoc browsing.
When is datadog-cli not a good fit?
Do not use this skill if:
- your organization blocks Datadog API key use
- you need UI-only features not exposed by the CLI workflow
- you want broad observability theory rather than Datadog-specific execution
- you cannot provide enough context for the agent to form valid queries
Do I need to install anything besides the skill?
Yes. The critical runtime dependency is the Datadog CLI invoked as:
npx @leoflores/datadog-cli <command>
You also need DD_API_KEY and DD_APP_KEY. For some accounts, you must pass --site.
Is datadog-cli for Observability only, or can it change things too?
Mostly it helps inspect and investigate, but dashboard commands can modify state. That is where caution matters most. Read references/dashboards.md before allowing any update flow.
Is it better than asking an agent to “check logs”?
Yes, because the skill gives the agent concrete command families and reference docs. That usually means faster narrowing, fewer malformed queries, and more useful incident workflows than ordinary freeform prompting.
How to Improve datadog-cli skill
Start prompts with operational constraints
The fastest way to improve datadog-cli output is to include the constraints the CLI actually needs:
- Datadog site
- environment
- service names
- time range
- identifiers like trace ID or dashboard ID
- whether the task is read-only or allowed to modify dashboards
Without that, the agent often defaults to broad, low-signal commands.
Ask for a workflow, not just one command
A common failure mode is prompting for a single lookup when the problem needs a sequence. Better prompt:
Use datadog-cli to triage a spike in 5xx responses for service:checkout in env:prod over the last hour.
First compare against the prior hour, then identify top error patterns, then pull relevant traces, then check CPU and memory metrics.
This produces better investigations because it maps onto the repo's workflow references.
Provide stronger query ingredients
Good inputs include actual Datadog fields:
- `service:payment-api`
- `env:prod`
- `@http.status_code:>=500`
- `@error.kind:TimeoutError`
- `@duration:>=1000`
If you only provide natural language like “the API is slow,” the agent must guess field names and filters. Better field-level inputs lead to better datadog-cli usage.
Handle dashboard edits with a safety-first prompt
If your task touches dashboards, explicitly require a backup-first workflow:
Use datadog-cli to update dashboard abc-def-ghi, but first export the current dashboard to a temp file, preserve template variables and description, and show the exact safe update command.
Do not produce a partial update.
This sharply reduces the biggest destructive risk in the skill.
Iterate after first output instead of broadening blindly
After the first command set, improve results by narrowing:
- from all errors to one service
- from 24h to the exact failure window
- from generic logs to pattern grouping
- from symptom to trace-level evidence
- from logs to confirming metrics
This is better than asking the agent for “more detail,” which often just expands noise.
Common mistakes to avoid
The most common adoption and output problems are:
- missing `DD_API_KEY` or `DD_APP_KEY`
- forgetting `--site` for non-US Datadog
- using weak or invalid query syntax
- searching too wide a time range first
- treating dashboard update as patch-like instead of full replacement
- asking for observability help without naming the affected service or env
What to inspect in the repo when results feel weak
If the agent seems generic, go back to:
- `references/query-syntax.md` for filter precision
- `references/logs-commands.md` for command choice
- `references/workflows.md` for investigation order
- `references/dashboards.md` for safe modification patterns
That reading path usually fixes poor prompts faster than rewriting the whole request from scratch.
Best way to evaluate datadog-cli after installation
A practical acceptance test for datadog-cli install is:
- run a known `logs search`
- run a scoped `metrics query`
- test one workflow command like `errors` or `logs patterns`
- confirm `--site` behavior if outside the US
- avoid dashboard writes until the backup workflow is verified
If those succeed, the datadog-cli skill is likely ready for real incident and observability work.
