chdb-datastore
by ClickHouse

chdb-datastore is a pandas-compatible skill for fast data analysis with a ClickHouse-backed DataStore API. It supports file, database, and cloud connectors, cross-source joins, and pandas-style workflows with minimal code changes. Use this chdb-datastore guide when you want a drop-in analysis layer for larger datasets.
This skill scores 88/100, which means it is a solid directory candidate with good install value for agents that need a pandas-like interface over ClickHouse-backed data access. The repository gives users enough evidence to decide it is worth installing: clear trigger phrases, a defined import pattern, supported connectors/formats, runnable examples, and a verification script. It is not perfect, but it is operationally clear enough to reduce guesswork versus a generic prompt.
- Explicit triggerability: the README lists concrete prompts and SKILL.md says when not to use it.
- Strong operational surface: import pattern, constructor/API reference, and connector docs cover the main workflows.
- Good install confidence: runnable examples plus scripts/verify_install.py help users validate the environment.
- The skill is focused on Python/pandas-style workflows only; it is not for raw SQL or non-Python use cases.
- The install path is slightly fragmented: SKILL.md has no install command, so users must rely on README/docs to set it up.
Overview of chdb-datastore skill
What chdb-datastore does
The chdb-datastore skill helps you use chdb.datastore as a pandas-compatible layer for fast data analysis. It is best for people who want to keep familiar pandas-style code, but run it on a ClickHouse-backed engine that can handle larger data and cross-source joins more efficiently. If your goal is data analysis with chdb-datastore, this skill is a strong fit when you need to read files, query databases, or combine remote sources without rewriting your workflow around raw SQL.
Who should use it
Use the chdb-datastore skill if you already think in DataFrames and want to:
- speed up slow pandas workflows,
- read local files or cloud data directly,
- join data across systems like MySQL, PostgreSQL, S3, and Parquet,
- keep analysis code close to standard pandas syntax.
It is less useful if you mainly want ClickHouse server administration, SQL-only analytics, or a non-Python workflow.
What makes it different
The main differentiator is the “drop-in” style: you often change the import, not the whole analysis. The skill is centered on import chdb.datastore as pd or from datastore import DataStore, then using normal pandas operations. That reduces adoption friction, but only if your input is already shaped like an analysis task. The skill also matters when users care about one practical outcome: faster execution with fewer code changes.
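As a minimal sketch of that drop-in pattern (the `import chdb.datastore as pd` path comes from the skill docs; the fallback to plain pandas is an assumption added here only so the snippet runs without chdb installed):

```python
# Drop-in style: swap the import, keep the pandas-style analysis code.
try:
    import chdb.datastore as pd  # ClickHouse-backed, pandas-compatible
except ImportError:
    import pandas as pd  # fallback so this sketch runs anywhere

df = pd.DataFrame({"region": ["EU", "US", "EU"], "revenue": [10, 20, 30]})
eu_total = df[df["region"] == "EU"]["revenue"].sum()
print(eu_total)
```

The rest of the script stays ordinary DataFrame code; only the import line decides which engine executes it.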
How to Use chdb-datastore skill
Install and verify the environment
For the chdb-datastore install step, confirm that the skill is installed from the repository and that the runtime meets these assumptions:
- Python 3.9+ on macOS or Linux
- chdb available in the environment
- the DataStore import path you plan to use
The repository includes scripts/verify_install.py, which is the fastest way to catch environment problems before you write analysis code. Use it when installation seems correct but imports fail, or when you are unsure whether datastore and chdb.datastore both resolve correctly.
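Before reaching for the repository script, a quick import probe can narrow down environment problems. The helper below is a hypothetical stand-in written for this guide, not part of verify_install.py:

```python
import importlib.util


def can_import(name: str) -> bool:
    """Return True if `name` resolves to an importable module."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. `chdb`) is missing entirely.
        return False


# Check both import paths the skill relies on.
for mod in ("chdb", "chdb.datastore", "datastore"):
    print(f"{mod}: {'ok' if can_import(mod) else 'MISSING'}")
```

If any line prints MISSING, fix the environment first; scripts/verify_install.py can then confirm the full setup.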
Give the skill the right kind of task
The chdb-datastore usage pattern works best when the request includes:
- the source type: file, S3 object, MySQL table, PostgreSQL table, or mixed sources,
- the desired output shape: filtered table, grouped summary, join, export, or inspection,
- any schema hints for ambiguous files,
- the size or performance constraint if speed is the reason for using chdb.
A weak prompt is: “Analyze this data.”
A stronger prompt is: “Use chdb-datastore to load sales.parquet, filter rows where region == 'EU', group by product, and return total revenue and order count. Keep pandas-style code and note any required import changes.”
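The stronger prompt maps to pandas-style code along these lines. This is a sketch using plain pandas as a stand-in for the chdb.datastore API; in the real workflow the frame would come from reading sales.parquet rather than an in-memory literal:

```python
import pandas as pd  # stand-in; the skill swaps this for chdb.datastore

# Small in-memory frame keeps the sketch self-contained; the real task
# would load sales.parquet instead.
sales = pd.DataFrame({
    "region": ["EU", "EU", "US", "EU"],
    "product": ["a", "b", "a", "a"],
    "revenue": [100.0, 50.0, 75.0, 25.0],
})

eu = sales[sales["region"] == "EU"]
summary = eu.groupby("product").agg(
    total_revenue=("revenue", "sum"),
    order_count=("revenue", "count"),
).reset_index()
print(summary)
```

Note how the prompt's three pieces (filter, group key, output columns) each become one line of code; that is the level of detail the skill needs to avoid guessing.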
Read these files first
For the most useful chdb-datastore guide workflow, read in this order:
- SKILL.md for the activation logic and core positioning
- examples/examples.md for runnable patterns and failure modes
- references/connectors.md for connection methods and source-specific options
- references/api-reference.md for supported operations and method signatures
- scripts/verify_install.py to validate the local setup
This order helps you distinguish the common path from edge-case connector behavior before you ask the model to generate code.
Practical workflow for better output
Use a three-step prompt structure:
- State the data source and file/database details.
- Say whether you want pandas-compatible code, a migration from pandas, or a new analysis.
- Add output constraints such as joins, aggregation, export, or minimal code changes.
Example prompt pattern:
Use chdb-datastore to replace pandas in this script. Load the Parquet file from S3, join it with a PostgreSQL table on user_id, then compute monthly revenue by country. Keep the code readable and mention any connector assumptions.
That kind of prompt gives the skill enough context to choose the right connector, avoid overexplaining, and preserve the pandas mental model.
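The pandas-shaped result that prompt asks for looks roughly like the following. The frames here are illustrative assumptions: in practice `orders` would come from an S3 Parquet read and `users` from a PostgreSQL connector, calls whose exact signatures live in references/connectors.md:

```python
import pandas as pd  # stand-in for chdb.datastore's pandas-compatible API

# Hypothetical stand-ins for the two connector-backed sources.
orders = pd.DataFrame({
    "user_id": [1, 2, 1],
    "month": ["2024-01", "2024-01", "2024-02"],
    "revenue": [100.0, 200.0, 50.0],
})
users = pd.DataFrame({"user_id": [1, 2], "country": ["DE", "FR"]})

# Join on user_id, then compute monthly revenue by country.
joined = orders.merge(users, on="user_id", how="inner")
monthly = joined.groupby(["month", "country"])["revenue"].sum().reset_index()
print(monthly)
```

Because the analysis logic is ordinary merge-and-groupby code, the only chdb-specific decisions are which connectors produce the two input frames.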
chdb-datastore skill FAQ
Is chdb-datastore just pandas with a different import?
Mostly, yes, from the user’s point of view. The chdb-datastore skill is designed for pandas-style analysis with a ClickHouse-backed engine underneath. That means many familiar DataFrame operations stay the same, but performance and execution behavior differ.
When should I not use chdb-datastore?
Do not use it for raw SQL tasks, ClickHouse server tuning, or cases where the user wants to author database-side SQL directly. It is also a poor fit if the job is non-Python or if the source data is already best handled by a specialized library rather than a DataFrame workflow.
Is it beginner-friendly?
Yes, if the beginner already understands basic pandas concepts. The learning curve is usually lower than learning a new query language because the skill preserves familiar DataFrame operations. The main beginner risk is assuming every pandas pattern will behave identically without checking connector constraints or execution triggers.
How is it different from an ordinary prompt?
An ordinary prompt may produce a generic pandas answer. The chdb-datastore page gives the model concrete cues about import style, supported connectors, repository files to inspect, and when the skill is the wrong tool. That tends to produce better install decisions and fewer broken examples.
How to Improve chdb-datastore skill
Provide source-specific details
The biggest quality boost comes from naming the data source precisely. chdb-datastore works better when you say sales.csv, s3://bucket/path.parquet, or from_mysql(...) instead of “a table” or “some data.” If the schema is uncertain, include the column names you expect and the join keys you need.
Mention the pandas pattern you want preserved
Say whether you need filtering, groupby, sorting, joins, window-like logic, or simple inspection. The skill is strongest when the requested output is framed as a pandas workflow, because that makes it easier to choose the right DataStore method and avoid unnecessary SQL-style rewriting.
Watch for the common failure modes
The most common mistakes are:
- leaving out the connector type,
- assuming unsupported raw SQL behavior,
- skipping schema hints for semi-structured files,
- asking for performance gains without saying what is slow.
If the first answer is too generic, iterate by adding the exact file path, database type, and the final shape of the result. For chdb-datastore usage, a precise problem statement is usually more valuable than a longer one.
Iterate with a concrete target
If your first output is close but not usable, refine it by asking for one of these:
- “keep the code as close to pandas as possible”
- “show the connector setup explicitly”
- “optimize for readability, not brevity”
- “prefer one example that I can run immediately”
That approach helps the chdb-datastore skill produce analysis code that is easier to install, test, and adapt in a real project.
