chdb-datastore
by ClickHouse

chdb-datastore is a pandas-compatible skill for fast data analysis with a ClickHouse-backed DataStore API. It supports file, database, and cloud connectors, cross-source joins, and pandas-style workflows with minimal code changes. Use this chdb-datastore guide when you want a drop-in analysis layer for larger datasets.
This skill scores 88/100, which means it is a solid directory candidate with good install value for agents that need a pandas-like interface over ClickHouse-backed data access. The repository gives users enough evidence to decide it is worth installing: clear trigger phrases, a defined import pattern, supported connectors/formats, runnable examples, and a verification script. It is not perfect, but it is operationally clear enough to reduce guesswork versus a generic prompt.
- Explicit triggerability: the README lists concrete prompts and SKILL.md says when not to use it.
- Strong operational surface: import pattern, constructor/API reference, and connector docs cover the main workflows.
- Good install confidence: runnable examples plus scripts/verify_install.py help users validate the environment.
- The skill is focused on Python/pandas-style workflows only; it is not for raw SQL or non-Python use cases.
- The install path is slightly fragmented: SKILL.md has no install command, so users must rely on README/docs to set it up.
Overview of chdb-datastore skill
What chdb-datastore does
The chdb-datastore skill helps you use chdb.datastore as a pandas-compatible layer for fast data analysis. It is best for people who want to keep familiar pandas-style code, but run it on a ClickHouse-backed engine that can handle larger data and cross-source joins more efficiently. If your goal is data analysis with chdb-datastore, this skill is a strong fit when you need to read files, query databases, or combine remote sources without rewriting your workflow around raw SQL.
Who should use it
Use the chdb-datastore skill if you already think in DataFrames and want to:
- speed up slow pandas workflows,
- read local files or cloud data directly,
- join data across systems like MySQL, PostgreSQL, S3, and Parquet,
- keep analysis code close to standard pandas syntax.
It is less useful if you mainly want ClickHouse server administration, SQL-only analytics, or a non-Python workflow.
What makes it different
The main differentiator is the “drop-in” style: you often change the import, not the whole analysis. The skill is centered on import chdb.datastore as pd or from datastore import DataStore, then using normal pandas operations. That reduces adoption friction, but only if your input is already shaped like an analysis task. The skill also matters when users care about one practical outcome: faster execution with fewer code changes.
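As a minimal sketch of that drop-in pattern (the `import chdb.datastore as pd` path comes from the skill docs; the fallback to plain pandas is an assumption added here only so the snippet runs without chdb installed):

```python
# Drop-in style: swap the import, keep the pandas-style analysis code.
try:
    import chdb.datastore as pd  # ClickHouse-backed, pandas-compatible
except ImportError:
    import pandas as pd  # fallback so this sketch runs anywhere

df = pd.DataFrame({"region": ["EU", "US", "EU"], "revenue": [10, 20, 30]})
eu_total = df[df["region"] == "EU"]["revenue"].sum()
print(eu_total)
```

The rest of the script stays ordinary DataFrame code; only the import line decides which engine executes it.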
How to Use chdb-datastore skill
Install and verify the environment
For the chdb-datastore install step, confirm that the skill is installed from the repository and that the runtime meets these assumptions:
- Python 3.9+ on macOS or Linux
- chdb available in the environment
- the DataStore import path you plan to use
The repository includes scripts/verify_install.py, which is the fastest way to catch environment problems before you write analysis code. Use it when installation seems correct but imports fail, or when you are unsure whether datastore and chdb.datastore both resolve correctly.
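Before reaching for the repository script, a quick import probe can narrow down environment problems. The helper below is a hypothetical stand-in written for this guide, not part of verify_install.py:

```python
import importlib.util


def can_import(name: str) -> bool:
    """Return True if `name` resolves to an importable module."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. `chdb`) is missing entirely.
        return False


# Check both import paths the skill relies on.
for mod in ("chdb", "chdb.datastore", "datastore"):
    print(f"{mod}: {'ok' if can_import(mod) else 'MISSING'}")
```

If any line prints MISSING, fix the environment first; scripts/verify_install.py can then confirm the full setup.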
Give the skill the right kind of task
The chdb-datastore usage pattern works best when the request includes:
- the source type: file, S3 object, MySQL table, PostgreSQL table, or mixed sources,
- the desired output shape: filtered table, grouped summary, join, export, or inspection,
- any schema hints for ambiguous files,
- the size or performance constraint if speed is the reason for using chdb.
A weak prompt is: “Analyze this data.”
A stronger prompt is: “Use chdb-datastore to load sales.parquet, filter rows where region == 'EU', group by product, and return total revenue and order count. Keep pandas-style code and note any required import changes.”
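The stronger prompt maps to pandas-style code along these lines. This is a sketch using plain pandas as a stand-in for the chdb.datastore API; in the real workflow the frame would come from reading sales.parquet rather than an in-memory literal:

```python
import pandas as pd  # stand-in; the skill swaps this for chdb.datastore

# Small in-memory frame keeps the sketch self-contained; the real task
# would load sales.parquet instead.
sales = pd.DataFrame({
    "region": ["EU", "EU", "US", "EU"],
    "product": ["a", "b", "a", "a"],
    "revenue": [100.0, 50.0, 75.0, 25.0],
})

eu = sales[sales["region"] == "EU"]
summary = eu.groupby("product").agg(
    total_revenue=("revenue", "sum"),
    order_count=("revenue", "count"),
).reset_index()
print(summary)
```

Note how the prompt's three pieces (filter, group key, output columns) each become one line of code; that is the level of detail the skill needs to avoid guessing.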
Read these files first
For the most useful chdb-datastore guide workflow, read in this order:
- SKILL.md for the activation logic and core positioning
- examples/examples.md for runnable patterns and failure modes
- references/connectors.md for connection methods and source-specific options
- references/api-reference.md for supported operations and method signatures
- scripts/verify_install.py to validate the local setup
This order helps you distinguish the common path from edge-case connector behavior before you ask the model to generate code.
Practical workflow for better output
Use a three-step prompt structure:
- State the data source and file/database details.
- Say whether you want pandas-compatible code, a migration from pandas, or a new analysis.
- Add output constraints such as joins, aggregation, export, or minimal code changes.
Example prompt pattern:
Use chdb-datastore to replace pandas in this script. Load the Parquet file from S3, join it with a PostgreSQL table on user_id, then compute monthly revenue by country. Keep the code readable and mention any connector assumptions.
That kind of prompt gives the skill enough context to choose the right connector, avoid overexplaining, and preserve the pandas mental model.
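The pandas-shaped result that prompt asks for looks roughly like the following. The frames here are illustrative assumptions: in practice `orders` would come from an S3 Parquet read and `users` from a PostgreSQL connector, calls whose exact signatures live in references/connectors.md:

```python
import pandas as pd  # stand-in for chdb.datastore's pandas-compatible API

# Hypothetical stand-ins for the two connector-backed sources.
orders = pd.DataFrame({
    "user_id": [1, 2, 1],
    "month": ["2024-01", "2024-01", "2024-02"],
    "revenue": [100.0, 200.0, 50.0],
})
users = pd.DataFrame({"user_id": [1, 2], "country": ["DE", "FR"]})

# Join on user_id, then compute monthly revenue by country.
joined = orders.merge(users, on="user_id", how="inner")
monthly = joined.groupby(["month", "country"])["revenue"].sum().reset_index()
print(monthly)
```

Because the analysis logic is ordinary merge-and-groupby code, the only chdb-specific decisions are which connectors produce the two input frames.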
chdb-datastore skill FAQ
Is chdb-datastore just pandas with a different import?
Mostly, yes, from the user’s point of view. The chdb-datastore skill is designed for pandas-style analysis with a ClickHouse-backed engine underneath. That means many familiar DataFrame operations stay the same, but performance and execution behavior differ.
When should I not use chdb-datastore?
Do not use it for raw SQL tasks, ClickHouse server tuning, or cases where the user wants to author database-side SQL directly. It is also a poor fit if the job is non-Python or if the source data is already best handled by a specialized library rather than a DataFrame workflow.
Is it beginner-friendly?
Yes, if the beginner already understands basic pandas concepts. The learning curve is usually lower than learning a new query language because the skill preserves familiar DataFrame operations. The main beginner risk is assuming every pandas pattern will behave identically without checking connector constraints or execution triggers.
How is it different from an ordinary prompt?
An ordinary prompt may produce a generic pandas answer. The chdb-datastore page gives the model concrete cues about import style, supported connectors, repository files to inspect, and when the skill is the wrong tool. That tends to produce better install decisions and fewer broken examples.
How to Improve chdb-datastore skill
Provide source-specific details
The biggest quality boost comes from naming the data source precisely. chdb-datastore works better when you say sales.csv, s3://bucket/path.parquet, or from_mysql(...) instead of “a table” or “some data.” If the schema is uncertain, include the column names you expect and the join keys you need.
Mention the pandas pattern you want preserved
Say whether you need filtering, groupby, sorting, joins, window-like logic, or simple inspection. The skill is strongest when the requested output is framed as a pandas workflow, because that makes it easier to choose the right DataStore method and avoid unnecessary SQL-style rewriting.
Watch for the common failure modes
The most common mistakes are:
- leaving out the connector type,
- assuming unsupported raw SQL behavior,
- skipping schema hints for semi-structured files,
- asking for performance gains without saying what is slow.
If the first answer is too generic, iterate by adding the exact file path, database type, and the final shape of the result. For chdb-datastore usage, a precise problem statement is usually more valuable than a longer one.
Iterate with a concrete target
If your first output is close but not usable, refine it by asking for one of these:
- “keep the code as close to pandas as possible”
- “show the connector setup explicitly”
- “optimize for readability, not brevity”
- “prefer one example that I can run immediately”
That approach helps the chdb-datastore skill produce analysis code that is easier to install, test, and adapt in a real project.
