data-analyst

by Shubhamsaboo

data-analyst is a minimal GitHub skill that guides agents toward SQL, pandas, and basic statistical analysis for data exploration. Best for users who want code-backed queries, transformations, and interpretations from a single SKILL.md prompt layer.

Stars: 104.2k

Favorites: 0

Comments: 0

Added: Apr 1, 2026

Category: Data Analysis

Install Command

npx skills add Shubhamsaboo/awesome-llm-apps --skill data-analyst

Curation Score

This skill scores 66/100, which means it is acceptable to list for directory users who want a lightweight data-analysis prompting aid, but they should expect limited operational depth. The repository clearly signals when to invoke the skill and what topics it covers, yet it stops short of providing concrete workflows, examples, or implementation artifacts that would reduce guesswork as much as a stronger skill would.

Strengths

The description and "When to Apply" section make triggering straightforward for data analysis, SQL, pandas, and statistics requests.
It defines a coherent scope around common analyst tasks such as querying, cleaning, transformations, and pattern finding.
The output guidance asks for commented SQL/pandas code, example results, performance notes, and interpretation, which is more actionable than a bare role prompt.

Cautions

No runnable examples, support files, or install/use commands, so agents must infer execution details from generic prose.
The skill lists broad competencies but gives few constraints or decision rules for choosing SQL vs. pandas vs. statistical methods in specific situations.

Tags: SQL, Python, Analytics

Overview

Overview of data-analyst skill

The data-analyst skill is a lightweight, focused prompt layer for data analysis tasks that need SQL, pandas, and basic statistical reasoning. It is best for users who already have a dataset, table schema, query goal, or exploratory question and want more reliable analytical output than a generic chat prompt usually gives.

What data-analyst is designed to do

This data-analyst skill steers an agent toward:

writing SQL for extraction and transformation
using pandas for cleaning, grouping, reshaping, and time-based work
applying descriptive statistics, correlation checks, and simple hypothesis-testing logic
returning code plus interpretation, not just commentary

The real job-to-be-done is not “be analytical” in the abstract. It is to turn a vague request like “find churn drivers” or “help me explore this CSV” into executable analysis steps, code, and findings you can inspect.

Who should install the data-analyst skill

Best fit:

analysts who want faster first-draft SQL or pandas workflows
engineers who occasionally need data exploration help
AI users who want code-backed answers instead of high-level advice
teams using agents for ad hoc analysis, data cleaning, or exploratory diagnostics

Less ideal:

users expecting automated chart rendering, notebook execution, or database connectivity from the skill alone
advanced statisticians needing rigorous model selection, causal inference, or production-grade ML pipelines

What makes this data-analyst skill different from a generic prompt

The main advantage of data-analyst is scope clarity. The skill explicitly centers SQL, pandas, and statistics, so the agent is more likely to:

choose the right analytical tool for the question
produce structured code instead of hand-wavy explanation
include comments, example outputs, performance notes, and interpretation
stay anchored to common data analysis workflows

That makes it more useful for real work than a broad “analyze this data” prompt, especially when you need something you can run or adapt quickly.

What the repository includes

This skill is intentionally minimal. The repository contains only a single SKILL.md file, with no helper scripts, rules, references, or sample datasets. That matters for adoption:

setup is simple
behavior is easy to understand
there is less hidden logic
output quality depends heavily on the quality of your prompt and data context

If you want a deeply opinionated framework with test assets or decision trees, this is not that. If you want a clean data-analyst skill you can invoke quickly for SQL/pandas/statistics work, it is a good fit.

How to Use data-analyst skill

Install context for data-analyst skill

If your agent environment supports GitHub-hosted skills, install data-analyst from the repository that contains it:

npx skills add Shubhamsaboo/awesome-llm-apps --skill data-analyst

If your client uses a different skills loader, adapt the source path to:

awesome_agent_skills/data-analyst

Because this repo exposes only SKILL.md, there are no extra dependency files you need to inspect before deciding whether to try it.

Read this file first before using data-analyst

Start with:

awesome_agent_skills/data-analyst/SKILL.md

There are no supporting README.md, metadata.json, rules/, or resources/ files in this skill directory, so nearly all of the usable guidance is in that one file. Read it to understand:

when the skill should be applied
its expected competency areas
the preferred output style

What input the data-analyst skill needs

The data-analyst install step is easy; good results depend on the input you provide after installation. At minimum, give the agent some combination of:

table schema or CSV column names
data types and date fields
business question
sample rows
desired grain, filters, or time range
output preference: SQL, pandas, stats explanation, or all three

Weak input:

“Analyze my sales data.”

Strong input:

“Use the data-analyst skill. I have an orders table with order_id, customer_id, order_date, country, channel, revenue, and is_refunded. Write SQL to calculate monthly revenue, refund rate, and repeat-purchase rate for 2024 by country and channel. Then explain what patterns to look for.”

The stronger version reduces guesswork on metrics, dimensions, and time scope.
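For orientation, here is a minimal sketch of the kind of query the strong prompt could produce, run against an in-memory SQLite database through Python. The sample rows are hypothetical, and two metric definitions are assumptions rather than anything the skill prescribes: revenue is gross (refunded orders included), and repeat-purchase rate is the share of customers with more than one order in the same month, country, and channel.

```python
import sqlite3

# Hypothetical sample rows matching the orders schema from the prompt.
rows = [
    # order_id, customer_id, order_date, country, channel, revenue, is_refunded
    (1, 101, "2024-01-05", "US", "web", 120.0, 0),
    (2, 101, "2024-01-20", "US", "web", 80.0, 0),
    (3, 102, "2024-01-11", "US", "web", 50.0, 1),
    (4, 103, "2024-02-02", "DE", "app", 200.0, 0),
    (5, 104, "2023-12-30", "US", "web", 999.0, 0),  # outside 2024, excluded
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE orders (
           order_id INTEGER PRIMARY KEY, customer_id INTEGER,
           order_date TEXT, country TEXT, channel TEXT,
           revenue REAL, is_refunded INTEGER)"""
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

QUERY = """
WITH per_customer AS (
    -- One row per customer per month/country/channel
    SELECT
        strftime('%Y-%m', order_date) AS month,
        country,
        channel,
        customer_id,
        COUNT(*)         AS n_orders,
        SUM(revenue)     AS cust_revenue,
        SUM(is_refunded) AS n_refunded
    FROM orders
    -- Half-open range keeps every 2024 order and nothing else
    WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'
    GROUP BY month, country, channel, customer_id
)
SELECT
    month,
    country,
    channel,
    SUM(cust_revenue)                    AS revenue,
    1.0 * SUM(n_refunded) / SUM(n_orders) AS refund_rate,
    -- Assumed definition: customers with 2+ orders in the group
    1.0 * SUM(CASE WHEN n_orders > 1 THEN 1 ELSE 0 END) / COUNT(*)
                                         AS repeat_purchase_rate
FROM per_customer
GROUP BY month, country, channel
ORDER BY month, country, channel
"""

results = conn.execute(QUERY).fetchall()
for row in results:
    print(row)
```

If your definition of repeat purchase spans the whole year or the customer's lifetime, say so in the prompt; the query shape changes noticeably.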

How to turn a rough goal into a usable prompt

A good data-analyst prompt usually contains five parts:

Context — what dataset or system you have
Question — what decision or insight you need
Structure — schema, columns, joins, date rules
Constraints — SQL dialect, pandas only, no plotting, etc.
Output format — query, code, interpretation, validation checks

Example prompt:

“Use the data-analyst skill for Data Analysis. I need pandas code to inspect a customer support CSV. Columns: ticket_id, created_at, resolved_at, priority, channel, csat_score, agent_id. Clean missing values, compute resolution time in hours, summarize by priority and channel, flag outliers, and explain what metrics might indicate process issues. Assume the file is already loaded into a DataFrame named df.”
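A response to that prompt might look roughly like the sketch below. The six rows are hypothetical stand-ins so the code runs on its own, and the 1.5 × IQR rule for flagging slow tickets is one common convention, not something the skill mandates.

```python
import pandas as pd

# Hypothetical rows; in practice df is already loaded from the CSV.
df = pd.DataFrame({
    "ticket_id": [1, 2, 3, 4, 5, 6],
    "created_at": ["2024-03-01 09:00", "2024-03-01 10:00", "2024-03-02 08:00",
                   "2024-03-02 09:30", "2024-03-03 12:00", "2024-03-03 13:00"],
    "resolved_at": ["2024-03-01 11:00", "2024-03-01 13:00", "2024-03-02 10:00",
                    None, "2024-03-03 14:00", "2024-03-10 13:00"],
    "priority": ["high", "high", "low", "low", "low", "low"],
    "channel": ["email", "email", "chat", "chat", "chat", "chat"],
    "csat_score": [5, 4, None, 3, 4, 1],
    "agent_id": [10, 11, 10, 12, 11, 12],
})

# Parse timestamps; unparseable or missing values become NaT.
for col in ("created_at", "resolved_at"):
    df[col] = pd.to_datetime(df[col], errors="coerce")

# Missing-value diagnostics before any cleaning decision.
missing = df.isna().sum()

# Resolution time in hours; unresolved tickets stay NaN rather than 0.
df["resolution_hours"] = (
    (df["resolved_at"] - df["created_at"]).dt.total_seconds() / 3600
)

# Summary by priority and channel (NaN rows are skipped by the aggregations).
summary = (
    df.groupby(["priority", "channel"])["resolution_hours"]
      .agg(["count", "median", "mean"])
)

# Flag unusually slow tickets with the 1.5 * IQR rule (an assumed convention).
q1, q3 = df["resolution_hours"].quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
df["is_outlier"] = df["resolution_hours"] > upper_fence

print(missing)
print(summary)
print(df.loc[df["is_outlier"], ["ticket_id", "resolution_hours"]])
```

A large gap between mean and median within a group is the kind of signal worth asking the agent to interpret.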

Best workflow for SQL tasks

For SQL-heavy work, use this sequence:

provide schema and join keys
define the metric precisely
name the SQL dialect if it matters
ask for both query and explanation
ask for edge-case checks before running

Useful prompt addition:

“State any assumptions about nulls, duplicate keys, and date boundaries before writing the final query.”

This improves output because SQL errors often come from unstated assumptions, not syntax.
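A date-boundary assumption is a typical example. The sketch below uses a hypothetical events table in SQLite, with timestamps stored as text, to show how a BETWEEN filter on date literals can silently drop rows from the last day of the month while a half-open range does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; timestamps are stored as ISO-8601 text.
conn.execute("CREATE TABLE events (event_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2024-01-01 00:00:00"),
        (2, "2024-01-31 15:30:00"),  # last day of January, mid-afternoon
        (3, "2024-02-01 00:00:00"),
    ],
)

# BETWEEN with bare dates: the upper bound compares as '2024-01-31',
# so anything later that day falls outside the range.
between_count = conn.execute(
    "SELECT COUNT(*) FROM events "
    "WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31'"
).fetchone()[0]

# Half-open range: start inclusive, next month exclusive.
half_open_count = conn.execute(
    "SELECT COUNT(*) FROM events "
    "WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01'"
).fetchone()[0]

print(between_count, half_open_count)
```

Both queries are syntactically valid, which is why asking for the assumption up front is more useful than asking for a syntax check.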

Best workflow for pandas tasks

For pandas work, tell the skill:

the DataFrame name
whether dates are already parsed
expected row count or memory constraints
whether you need one-off analysis or reusable transformation code

A stronger pandas request:

“Use pandas only. df has 4 million rows, so avoid unnecessary copies. Show memory-conscious cleaning steps, groupby summaries, and missing-value diagnostics.”

That helps the agent choose more practical code instead of toy examples.
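As a rough illustration of what memory-conscious pandas code can look like, the sketch below shrinks column dtypes in place and measures the effect. The column names and the 10,000-row random frame are hypothetical stand-ins for the 4-million-row df in the prompt.

```python
import numpy as np
import pandas as pd

# Small hypothetical stand-in for a much larger DataFrame.
n = 10_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "channel": rng.choice(["web", "app", "store"], size=n),
    "quantity": rng.integers(1, 50, size=n),  # 64-bit integers by default
    "revenue": rng.random(n) * 100,           # 64-bit floats by default
})

before = df.memory_usage(deep=True).sum()

# Convert column by column instead of building a second full copy of df.
df["channel"] = df["channel"].astype("category")           # few distinct values
df["quantity"] = pd.to_numeric(df["quantity"], downcast="integer")
df["revenue"] = df["revenue"].astype("float32")            # trades precision for size

after = df.memory_usage(deep=True).sum()

# Missing-value diagnostics: share of nulls per column.
null_rates = df.isna().mean()

# observed=True limits the result to categories that actually occur.
summary = df.groupby("channel", observed=True)["revenue"].agg(["size", "mean"])

print(f"memory: {before:,} -> {after:,} bytes")
print(null_rates)
print(summary)
```

Downcasting to float32 is only appropriate when the reduced precision is acceptable for the metric; mention that constraint in the prompt if it matters.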

How to ask for statistical analysis well

The data-analyst skill is most useful when the statistical question is concrete. Ask for:

the hypothesis
the variables involved
whether comparison groups exist
what level of rigor you need

Better:

“Compare average order value between paid search and organic traffic. Recommend an appropriate significance test, explain assumptions, and show pandas code to run it.”

Worse:

“Do some stats on this data.”

The skill covers descriptive statistics, correlation analysis, and basic testing logic, but it is not a substitute for a specialized statistical review when decisions are high stakes.
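For the average-order-value comparison above, one reasonable choice (not the only one) is Welch's t-test, which does not assume equal variances between groups. The sketch below computes the test statistic and degrees of freedom by hand with pandas so the arithmetic is visible; the ten order values and the column names are hypothetical. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns a p-value.

```python
import math

import pandas as pd

# Hypothetical order values for two traffic sources.
df = pd.DataFrame({
    "traffic_source": ["paid_search"] * 5 + ["organic"] * 5,
    "order_value": [52.0, 61.0, 58.0, 70.0, 64.0,
                    48.0, 55.0, 50.0, 47.0, 60.0],
})


def welch_t(a: pd.Series, b: pd.Series) -> tuple[float, float]:
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each group mean (sample variance / n).
    se2_a = a.var(ddof=1) / len(a)
    se2_b = b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return float(t), float(dof)


paid = df.loc[df["traffic_source"] == "paid_search", "order_value"]
organic = df.loc[df["traffic_source"] == "organic", "order_value"]

t_stat, dof = welch_t(paid, organic)
print(f"mean difference: {paid.mean() - organic.mean():.2f}")
print(f"Welch t = {t_stat:.3f}, degrees of freedom = {dof:.2f}")
```

The test still assumes independent observations and roughly normal group means, so with heavily skewed order values or very small samples a nonparametric alternative may be a better request.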

Output to expect from data-analyst usage

According to the skill definition, good outputs should include:

SQL queries or pandas code
clear comments
example results
performance considerations
interpretation of findings

That output shape is valuable in practice because it gives you something to run plus enough explanation to sanity-check the logic before execution.

Practical tips that improve output quality

Small prompt upgrades materially improve data-analyst output:

Specify whether you want exploration or a final metric.
Tell it if the data is messy, sparse, or wide.
Mention suspected issues like duplicates, missing timestamps, or inconsistent categories.
Ask for validation queries, not just the main query.
Request alternative approaches when there are tradeoffs.

Example:

“After the main SQL, add a validation query to check duplicate customer_id + order_date combinations and null rates in revenue columns.”
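A validation step along those lines could be sketched as follows, again with SQLite through Python. The four sample rows are hypothetical and include one deliberate duplicate and one null revenue so both checks have something to find.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER, customer_id INTEGER, order_date TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 101, "2024-01-05", 120.0),
        (2, 101, "2024-01-05", 120.0),  # same customer and date: duplicate?
        (3, 102, "2024-01-06", None),   # missing revenue
        (4, 103, "2024-01-07", 75.0),
    ],
)

# Check 1: customer_id + order_date combinations that appear more than once.
duplicates = conn.execute(
    """
    SELECT customer_id, order_date, COUNT(*) AS n_rows
    FROM orders
    GROUP BY customer_id, order_date
    HAVING COUNT(*) > 1
    """
).fetchall()

# Check 2: share of rows where revenue is null.
null_rate = conn.execute(
    """
    SELECT 1.0 * SUM(CASE WHEN revenue IS NULL THEN 1 ELSE 0 END) / COUNT(*)
    FROM orders
    """
).fetchone()[0]

print("duplicate keys:", duplicates)
print("revenue null rate:", null_rate)
```

Whether two orders on the same day are a true duplicate or a legitimate repeat purchase is a business question, so treat the first check as a flag to investigate rather than a rule.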

What this skill does not do for you

Because the skill is only a prompt file, it does not itself:

connect to databases
execute SQL
load files
profile your environment
enforce statistical correctness

You still need your own runtime, database access, and judgment. The skill improves the agent’s analytical framing; it does not replace tools or domain review.

data-analyst skill FAQ

Is data-analyst skill worth installing if I already use normal prompts?

Usually yes, if you often ask for SQL, pandas, or exploratory analysis. The value is not hidden automation; it is a better default analytical posture. A generic prompt may answer broadly, whereas the data-analyst skill is more likely to give code, assumptions, and interpretation aligned to common analyst work.

Is the data-analyst skill beginner-friendly?

Yes, with one caveat: beginners still need to provide schema and business context. The skill can help you structure an analysis, but it will not rescue an underspecified request. If you are new to SQL or pandas, ask it to explain each step and comment the code heavily.

When should I not use data-analyst?

Skip data-analyst when your task is mainly:

dashboard design
advanced machine learning
causal inference
data engineering orchestration
visualization-specific work

It is strongest in exploratory analysis, transformation logic, querying, and straightforward statistical reasoning.

Does data-analyst support a specific database or library stack?

The skill mentions SQL, Python with pandas, and statistical analysis, but it does not lock you to one SQL engine or one data platform. That flexibility is helpful, but it means you should state your dialect explicitly when needed, such as PostgreSQL, BigQuery, Snowflake, or SQLite.

Is this skill enough for production analytics work?

It can accelerate production work, but it is not production assurance by itself. Review generated SQL for performance, confirm metric definitions with stakeholders, and validate outputs on real data. The skill is a drafting and reasoning aid, not an execution guarantee.

How to Improve data-analyst skill

Give the data-analyst skill better analytical context

The biggest quality lever is context density. Include:

schema
business definitions
sample records
known data quality issues
success criteria

Without those, the skill may still respond fluently, but the analysis can drift from your actual metric logic.

Ask for assumptions before final code

One of the most effective ways to improve data-analyst skill output is to force assumptions into the open.

Try:

“Before writing the final SQL, list assumptions about joins, null handling, duplicate events, and time windows.”

This catches common failure modes early:

inflated counts from one-to-many joins
wrong date grain
misread categorical values
invalid statistical comparisons
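The first failure mode in that list, inflated counts from a one-to-many join, is easy to reproduce. The sketch below uses two hypothetical tables in pandas: summing revenue after the join double-counts an order that has two shipment rows, and the `validate` argument of `DataFrame.merge` turns the hidden assumption into an explicit error.

```python
import pandas as pd

# Hypothetical tables: order 1 has two shipment rows.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [101, 102, 103],
    "revenue": [100.0, 50.0, 25.0],
})
shipments = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "shipment_id": [9001, 9002, 9003, 9004],
})

joined = orders.merge(shipments, on="order_id", how="left")

true_total = orders["revenue"].sum()   # revenue at order grain
naive_total = joined["revenue"].sum()  # order 1 is counted twice after the join

# Make the one-to-one assumption explicit; pandas raises if it does not hold.
try:
    orders.merge(shipments, on="order_id", how="left", validate="one_to_one")
    join_is_one_to_one = True
except pd.errors.MergeError:
    join_is_one_to_one = False

print(f"rows before join: {len(orders)}, after join: {len(joined)}")
print(f"true total: {true_total}, naive total: {naive_total}")
print(f"join is one-to-one: {join_is_one_to_one}")
```

Comparing row counts before and after a join is the cheapest version of this check and works in SQL as well.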

Request validation steps, not just answers

A high-quality data-analyst prompt asks the model to verify its own work.

Useful additions:

“Provide one validation query.”
“Show sanity checks for row counts before and after filtering.”
“Point out which result would be suspicious and why.”
“List possible confounders before interpreting the correlation.”

This is often more valuable than asking for longer explanations.

Narrow the task when the first answer is too broad

If the initial response mixes SQL, pandas, and stats all at once, split the workflow:

schema understanding
extraction query
cleaning/transformation
statistical interpretation
summary for stakeholders

The data-analyst skill performs better when each pass has a single analytical objective.

Improve pandas results with runtime constraints

Pandas output gets better when you tell the model what matters operationally:

memory sensitivity
notebook vs script style
vectorized operations preferred
readability vs speed tradeoff

Example:

“Optimize for readable notebook code, but avoid row-wise apply unless necessary.”

That kind of instruction changes code quality in a way that generic prompts often miss.
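To make the difference concrete, here is a small sketch contrasting a row-wise apply with the equivalent vectorized expression. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical line items.
df = pd.DataFrame({
    "quantity": [2, 1, 5],
    "unit_price": [10.0, 99.5, 3.0],
    "discount_pct": [0.0, 0.1, 0.2],
})

# Row-wise apply: calls a Python function once per row, slow on large frames.
row_wise = df.apply(
    lambda r: r["quantity"] * r["unit_price"] * (1 - r["discount_pct"]),
    axis=1,
)

# Vectorized: the same arithmetic on whole columns at once.
df["net_revenue"] = df["quantity"] * df["unit_price"] * (1 - df["discount_pct"])

# Both approaches should agree up to floating-point rounding.
max_diff = (row_wise - df["net_revenue"]).abs().max()
print(df)
print("largest difference between approaches:", max_diff)
```

Row-wise apply remains a fair choice when the logic cannot be expressed as column operations, which is what the "unless necessary" in the prompt leaves room for.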

Iterate on interpretation, not only on code

After the first answer, ask follow-ups like:

“Which conclusion is strongest, and what evidence supports it?”
“What could make this result misleading?”
“What segment cut would you check next?”
“What additional column would most improve confidence?”

This is where data-analyst becomes more than code generation. It helps move from extraction to decision support.

Common failure modes to watch for

Even with the data-analyst skill, review outputs for:

incorrect joins
unspoken metric assumptions
null handling mistakes
overconfident statistical claims
example outputs that do not match your schema
inefficient SQL on large tables

The skill is compact and useful, but not deeply constrained by rules or test fixtures, so your review process matters.
