datadog-cli
by softaworks

datadog-cli helps agents run Datadog CLI workflows for logs, traces, metrics, services, and dashboards. Learn setup with DD_API_KEY and DD_APP_KEY, run `npx @leoflores/datadog-cli` commands, and handle the `--site` flag plus dashboard-update safety during incident triage.
This skill scores 82/100, which means it is a solid directory listing candidate for users who want Datadog debugging workflows an agent can invoke with less guesswork than a generic prompt. The repository gives substantial command coverage, concrete examples, and reference docs, though install/setup guidance is slightly fragmented between the skill and README.
- Strong operational references cover logs, metrics, query syntax, dashboards, and common workflows, reducing command guesswork for agents.
- Good triggerability: the description and examples clearly map to real debugging tasks like incident triage, trace following, log tailing, and dashboard work.
- Trust-building safety guidance is explicit, especially the dashboards reference warning that updates are destructive and should follow a backup-first workflow.
- Setup/install path is split between SKILL.md's direct `npx @leoflores/datadog-cli` usage and README's plugin install flow, which may cause some adoption guesswork.
- The skill depends on users already having valid Datadog API/app keys and Datadog query familiarity; there is no bundled automation or helper scripts.
Overview of datadog-cli skill
The datadog-cli skill helps an agent use Datadog from the command line for practical observability work: searching logs, tracing requests, querying metrics, listing services, and managing dashboards. It is best for engineers, SREs, platform teams, and AI-assisted incident responders who already have Datadog access and want faster triage without manually clicking through the UI.
What datadog-cli is for
Use datadog-cli when the real job is not “summarize Datadog,” but “investigate a production symptom with repeatable commands.” The skill is strongest when you need to:
- narrow an incident by service, error type, or time window
- pivot from logs to trace context
- check whether a spike is new or normal
- pull metrics quickly for a service or environment
- inspect or update dashboards with CLI-driven workflows
Best-fit users
This datadog-cli skill fits users who:
- already use Datadog for logs, metrics, traces, or dashboards
- want an agent to generate correct commands instead of vague search suggestions
- need incident triage workflows, not generic observability advice
- are comfortable providing service names, time ranges, trace IDs, or dashboard IDs
If you do not have Datadog keys or do not know your service/tag conventions, setup and prompt quality will matter more than the skill itself.
Why this skill is more useful than a generic prompt
A normal prompt might say “look at Datadog logs.” This skill gives the agent a command-level path: logs search, logs tail, logs trace, logs context, logs patterns, logs compare, metrics query, errors, services, and dashboard operations. It also points to reference docs that matter for correct execution, especially query syntax and the dashboard update warnings.
Key adoption blockers to know first
The main blockers are operational, not conceptual:
- `DD_API_KEY` and `DD_APP_KEY` are required
- non-US Datadog accounts may need `--site`, such as `datadoghq.eu`
- results depend heavily on correct Datadog query syntax
- dashboard updates are destructive if fields are omitted
Those are the first things to verify before you judge datadog-cli usage quality.
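Those credential blockers can be caught before any command runs. A minimal preflight sketch (the `check_dd_env` helper is ours for illustration, not a datadog-cli command):

```shell
# Preflight sketch: verify Datadog credentials before invoking the CLI.
# check_dd_env is a hypothetical helper, not part of datadog-cli itself.
check_dd_env() {
  if [ -z "$DD_API_KEY" ] || [ -z "$DD_APP_KEY" ]; then
    echo "Set DD_API_KEY and DD_APP_KEY before running datadog-cli" >&2
    return 1
  fi
  echo "Datadog credentials present"
}
```

Run it before the first `npx @leoflores/datadog-cli` call; note that non-US accounts still need `--site` on each command.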
How to Use datadog-cli skill
Install and runtime context
The skill itself lives in softaworks/agent-toolkit, but the actual CLI it teaches the agent to run is:
npx @leoflores/datadog-cli <command>
Set credentials first:
export DD_API_KEY="your-api-key"
export DD_APP_KEY="your-app-key"
For non-US Datadog sites, pass --site:
npx @leoflores/datadog-cli logs search --query "*" --site datadoghq.eu
Before deciding whether to adopt datadog-cli, validate both dependencies: the external CLI itself and working Datadog API access.
Read these files before first real use
This skill is unusually reference-driven. Read in this order:
- `SKILL.md`
- `references/query-syntax.md`
- `references/logs-commands.md`
- `references/metrics.md`
- `references/workflows.md`
- `references/dashboards.md`
That path reduces most first-run mistakes: bad filters, weak time windows, and unsafe dashboard edits.
Inputs the skill needs to work well
The datadog-cli skill performs best when your request includes at least some of:
- service name, team name, or environment
- time window like `15m`, `1h`, or `24h`
- symptom type: errors, latency, failed requests, deployment regression
- trace ID, request ID, or timestamp if you have one
- whether you want logs, metrics, dashboards, or a triage workflow
- Datadog site if not default US
Weak input: “Check Datadog.”
Strong input: “Investigate payment-api 5xx errors in prod for the last hour, compare against the previous hour, then pull any related traces and CPU metrics.”
Turn a rough goal into a usable prompt
A good datadog-cli guide prompt should tell the agent both the objective and the narrowing dimensions.
Try this pattern:
Use datadog-cli for Observability triage.
Goal: identify why checkout failures increased after the last deploy.
Scope: service:payment-api env:prod
Time: last 1h, compare with previous 1h
Need: error summary, common log patterns, likely trace IDs, and key metrics
Site: datadoghq.eu
Why this works:
- it gives the agent a workflow, not a single command
- it includes query tags the CLI can actually use
- it prevents the agent from searching too broadly
Best first commands for common jobs
For incident triage, start broad, then narrow:
npx @leoflores/datadog-cli errors --from 1h --pretty
npx @leoflores/datadog-cli logs compare --query "status:error" --period 1h --pretty
npx @leoflores/datadog-cli logs patterns --query "status:error" --from 1h --pretty
Then scope to service:
npx @leoflores/datadog-cli logs search --query "service:payment-api status:error env:prod" --from 1h --pretty
If you already have a trace:
npx @leoflores/datadog-cli logs trace --id "TRACE_ID" --from 24h --pretty
For service health:
npx @leoflores/datadog-cli metrics query --query "avg:system.cpu.user{env:prod,service:payment-api}" --from 1h --pretty
Query syntax matters more than most users expect
Many weak datadog-cli usage results are really query-quality problems. The skill relies on Datadog search syntax like:
- `service:api status:error`
- `@http.status_code:>=500`
- `service:api OR service:payment`
- `@duration:[1000 TO 5000]`
- `-status:info`
If you know your fields, include them explicitly. If you do not, ask the agent to start with broader discovery queries, then tighten based on returned attributes.
Practical workflow for incident response
A strong investigation loop with datadog-cli is:
- get an error overview with `errors`
- compare the current period with the prior period using `logs compare`
- cluster repeated failures with `logs patterns`
- narrow by service/env using `logs search`
- inspect surrounding activity with `logs context`
- pivot into the distributed flow using `logs trace`
- confirm resource or throughput signals with `metrics query`
This is much better than repeatedly asking for “more logs,” because each command answers a different diagnostic question.
Dashboards need extra caution
The most important safety note in this repo is that `dashboards update` replaces the whole dashboard rather than patching only the changed fields. If fields like template variables, description, or notify list are omitted, they can be removed.
Before any update, the safe workflow is:
- fetch the dashboard to a temp file with `--output`
- preserve existing fields
- update using the full retained structure
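That workflow can be sketched as a helper function. Only the `--output` flag is confirmed by the skill's references; the subcommand names (`dashboards get` / `dashboards update`) and the `--file` flag are assumptions to be checked against references/dashboards.md before any real use:

```shell
# Backup-first dashboard update, sketched as a helper function.
# CAUTION: the subcommand names ("dashboards get" / "dashboards update")
# and the --file flag are assumptions; confirm the real syntax in
# references/dashboards.md before running anything destructive.
safe_dashboard_update() {
  dash_id="$1"
  backup="/tmp/dashboard-${dash_id}.json"
  # 1. Export the complete current dashboard first.
  npx @leoflores/datadog-cli dashboards get "$dash_id" --output "$backup" || return 1
  # 2. Edit the backup copy, keeping template variables, description,
  #    and notify list intact.
  # 3. Push back the full retained structure, never a partial document.
  npx @leoflores/datadog-cli dashboards update "$dash_id" --file "$backup"
}
```

The point of the function shape is that the export step cannot be skipped: if the backup fails, the update never runs.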
This makes the datadog-cli skill suitable for dashboard work only if you are disciplined about backups and full-state updates.
Output-quality tips that change results
To get better answers from the agent:
- specify whether you want discovery, explanation, or exact commands
- include service and env tags together when possible
- choose a bounded time window first; widen only if needed
- ask for comparison against a previous period when evaluating regressions
- prefer a trace ID or timestamp if you already have one
- ask for `--pretty` when human review matters
The biggest quality gain usually comes from giving a precise query target, not from asking for more verbose analysis.
When to use logs vs metrics vs dashboards
Use logs when you need concrete events, errors, or request details.
Use metrics when you need trends, resource usage, or rate/latency signals.
Use dashboards when you need existing operational context or want to package a view for a team.
If you ask the agent for all three at once, tell it the decision goal: root cause, blast radius, regression check, or dashboard creation.
datadog-cli skill FAQ
Is datadog-cli good for beginners?
Yes, if you already have Datadog access and understand basic concepts like services, tags, and time windows. No, if you are still learning what logs, traces, and metrics represent. The skill reduces command guesswork, but it does not remove the need to know your environment names and observability conventions.
What makes this different from using Datadog UI directly?
datadog-cli is better when you want repeatable, scriptable, agent-generated investigation steps. It is especially useful for rapid triage, prompt-driven debugging, and sharing exact commands. The UI is still better for deep visual exploration and ad hoc browsing.
When is datadog-cli not a good fit?
Do not use this skill if:
- your organization blocks Datadog API key use
- you need UI-only features not exposed by the CLI workflow
- you want broad observability theory rather than Datadog-specific execution
- you cannot provide enough context for the agent to form valid queries
Do I need to install anything besides the skill?
Yes. The critical runtime dependency is the Datadog CLI invoked as:
npx @leoflores/datadog-cli <command>
You also need DD_API_KEY and DD_APP_KEY. For some accounts, you must pass --site.
Is datadog-cli for Observability only, or can it change things too?
Mostly it helps inspect and investigate, but dashboard commands can modify state. That is where caution matters most. Read references/dashboards.md before allowing any update flow.
Is it better than asking an agent to “check logs”?
Yes, because the skill gives the agent concrete command families and reference docs. That usually means faster narrowing, fewer malformed queries, and more useful incident workflows than ordinary freeform prompting.
How to Improve datadog-cli skill
Start prompts with operational constraints
The fastest way to improve datadog-cli output is to include the constraints the CLI actually needs:
- Datadog site
- environment
- service names
- time range
- identifiers like trace ID or dashboard ID
- whether the task is read-only or allowed to modify dashboards
Without that, the agent often defaults to broad, low-signal commands.
Ask for a workflow, not just one command
A common failure mode is prompting for a single lookup when the problem needs a sequence. Better prompt:
Use datadog-cli to triage a spike in 5xx responses for service:checkout in env:prod over the last hour.
First compare against the prior hour, then identify top error patterns, then pull relevant traces, then check CPU and memory metrics.
This produces better investigations because it maps onto the repo's workflow references.
Provide stronger query ingredients
Good inputs include actual Datadog fields:
- `service:payment-api`
- `env:prod`
- `@http.status_code:>=500`
- `@error.kind:TimeoutError`
- `@duration:>=1000`
If you only provide natural language like “the API is slow,” the agent must guess field names and filters. Better field-level inputs lead to better datadog-cli usage.
Handle dashboard edits with a safety-first prompt
If your task touches dashboards, explicitly require a backup-first workflow:
Use datadog-cli to update dashboard abc-def-ghi, but first export the current dashboard to a temp file, preserve template variables and description, and show the exact safe update command.
Do not produce a partial update.
This sharply reduces the biggest destructive risk in the skill.
Iterate after first output instead of broadening blindly
After the first command set, improve results by narrowing:
- from all errors to one service
- from 24h to the exact failure window
- from generic logs to pattern grouping
- from symptom to trace-level evidence
- from logs to confirming metrics
This is better than asking the agent for “more detail,” which often just expands noise.
Common mistakes to avoid
The most common adoption and output problems are:
- missing `DD_API_KEY` or `DD_APP_KEY`
- forgetting `--site` for non-US Datadog
- using weak or invalid query syntax
- searching too wide a time range first
- treating dashboard update as patch-like instead of full replacement
- asking for observability help without naming the affected service or env
What to inspect in the repo when results feel weak
If the agent seems generic, go back to:
- `references/query-syntax.md` for filter precision
- `references/logs-commands.md` for command choice
- `references/workflows.md` for investigation order
- `references/dashboards.md` for safe modification patterns
That reading path usually fixes poor prompts faster than rewriting the whole request from scratch.
Best way to evaluate datadog-cli after installation
A practical acceptance test for datadog-cli install is:
- run a known `logs search`
- run a scoped `metrics query`
- test one workflow command like `errors` or `logs patterns`
- confirm `--site` behavior if outside the US
- avoid dashboard writes until the backup workflow is verified
If those succeed, the datadog-cli skill is likely ready for real incident and observability work.
