Data Processing

Data Processing taxonomy generated by the site skill importer.

15 skills
regex-vs-llm-structured-text

by affaan-m

regex-vs-llm-structured-text is a skill for choosing between regex and an LLM when extracting structured text. Start with deterministic parsing and add LLM validation only for low-confidence edge cases. The result is a cheaper, more reliable pipeline for documents, forms, invoices, and data analysis.

Data Analysis
Favorites 0 | GitHub 156.2k
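
The regex-first approach described in the entry above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the field patterns, the `confidence` heuristic (fraction of expected fields matched), and the `llm_fallback` callable are all hypothetical names introduced here.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical patterns for an invoice-like document (assumption, not from the skill).
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE),
    "date": re.compile(r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(r"Total\s*:?\s*\$?\s*(\d+(?:\.\d{2})?)", re.IGNORECASE),
}


def regex_extract(text: str) -> Dict[str, str]:
    """Deterministic pass: keep only the fields whose pattern matches."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
    return fields


def confidence(fields: Dict[str, str]) -> float:
    """Assumed heuristic: share of expected fields that the regex pass found."""
    return len(fields) / len(FIELD_PATTERNS)


def extract(
    text: str,
    llm_fallback: Optional[Callable[[str], Dict[str, str]]] = None,
    threshold: float = 1.0,
) -> Dict[str, str]:
    """Run regex first; call the (hypothetical) LLM only when confidence is low."""
    fields = regex_extract(text)
    if confidence(fields) >= threshold or llm_fallback is None:
        return fields
    llm_fields = llm_fallback(text)
    # Regex results take precedence; the LLM only fills the gaps.
    return {**llm_fields, **fields}
```

In this sketch the LLM is never called for documents the patterns fully cover, which is where the cost saving would come from. A real pipeline would likely use a richer confidence signal than field coverage.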
omero-integration

by K-Dense-AI

omero-integration is a skill for OMERO Python workflows. Use it to connect to OMERO, retrieve projects, datasets, images, ROIs, annotations, and tables, and run batch scripts.

Backend Development
Favorites 0 | GitHub 21.3k
hypogenic

by K-Dense-AI

hypogenic is a skill for generating and testing hypotheses on tabular or text-derived datasets with LLM support. It turns empirical questions into structured, testable workflows for classification interpretation, content analysis, and deception detection. Use it when you need evidence-backed hypotheses, not just brainstorming.

Data Analysis
Favorites 0 | GitHub 21.3k
dnanexus-integration

by K-Dense-AI

dnanexus-integration is a practical skill for DNAnexus cloud genomics work. Use it to build apps and applets, manage uploads and downloads, run workflows, and automate pipelines with dxpy. It covers tasks involving FASTQ, BAM, and VCF files, plus platform-specific configuration and job execution.

Backend Development
Favorites 0 | GitHub 21.3k
huggingface-datasets

by huggingface

Use the huggingface-datasets skill for Hugging Face Dataset Viewer API workflows: validate datasets, resolve splits, preview and paginate rows, search text, apply filters, and fetch parquet links or statistics. It is intended for read-only dataset exploration.

Web Scraping
Favorites 0 | GitHub 10.4k
Workspace Data Analyst

by VoltAgent

Workspace Data Analyst is a lightweight skill for data analysis in your workspace. It analyzes CSV files, checks headers, summarizes totals, averages, and outliers, and provides concise next-step insights. The Workspace Data Analyst skill is ideal for quick file-aware reviews before deeper modeling.

Data Analysis
Favorites 0 | GitHub 8.5k
azure-storage-file-datalake-py

by microsoft

azure-storage-file-datalake-py is the Python skill for Azure Data Lake Storage Gen2. It helps backend developers and agents install, authenticate, and use the Azure SDK for hierarchical file system tasks like listing, uploading, downloading, and managing directories and files.

Backend Development
Favorites 0 | GitHub 2.3k
azure-cosmos-py

by microsoft

The azure-cosmos-py skill helps you install, configure, and use the Azure Cosmos DB Python SDK for NoSQL CRUD, queries, container setup, partitioning, and authentication. It is especially useful for Database Engineering workflows where partition keys and query cost matter.

Database Engineering
Favorites 0 | GitHub 2.2k
clickhouse-best-practices

by ClickHouse

clickhouse-best-practices is a skill for ClickHouse database engineering. It gives rule-based recommendations for schema design, query tuning, insert strategy, and agent connectivity, so the guidance is easier to trigger, review, and cite in ClickHouse workflows.

Database Engineering
Favorites 0 | GitHub 412
tinybird

by tinybirdco

The tinybird skill covers Tinybird best practices for project files, SQL rules, optimization patterns, and file-based workflows. Use it when you need help with datasources, pipes, endpoints, or materialized views, or want deployment-safe guidance grounded in the repo rules.

Backend Development
Favorites 0 | GitHub 16
pymatgen

by K-Dense-AI

pymatgen is a Python materials science toolkit for crystal structures, phase diagrams, electronic structure, and file conversion. This pymatgen skill helps with scientific workflows using CIF, POSCAR, VASP, and Materials Project data.

Scientific
Favorites 0 | GitHub 0
exploratory-data-analysis

by K-Dense-AI

The exploratory-data-analysis skill turns scientific files into format-aware EDA reports. It detects file type, summarizes structure and quality, extracts key metadata, and suggests downstream analysis. Use it across chemistry, bioinformatics, microscopy, spectroscopy, proteomics, metabolomics, and other scientific file formats.

Data Analysis
Favorites 0 | GitHub 0
astropy

by K-Dense-AI

astropy is a Python toolkit for astronomy and astrophysics workflows. Use this astropy skill for celestial coordinates, units, FITS files, time scales, tables, WCS, and cosmology. It helps with practical astronomy tasks like coordinate transforms, unit conversion, and data processing.

Data Analysis
Favorites 0 | GitHub 0
aeon

by K-Dense-AI

The aeon skill covers aeon, a scikit-learn-compatible Python toolkit for time series machine learning. Use it for classification, regression, clustering, forecasting, anomaly detection, segmentation, similarity search, and other temporal data workflows. It suits univariate and multivariate analysis when you need specialized methods beyond generic tabular ML.

Data Analysis
Favorites 0 | GitHub 0
postgres

by sanjay3290

The postgres skill lets you inspect live PostgreSQL databases with read-only SQL. Use it for schema discovery, table checks, and SELECT-based analysis across multiple connections with description-based auto-selection. It is built for Database Engineering workflows and blocks writes like INSERT, UPDATE, DELETE, and DROP for safety.

Database Engineering
Favorites 0 | GitHub 0
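
The write-blocking behavior described in the postgres entry could be approximated with a guard like the one below. This is a sketch under stated assumptions, not the skill's real code: the function name `is_read_only` and the keyword-based check are hypothetical, and the keyword list contains only the four statements the description names.

```python
import re

# Statements the description says are blocked; a real guard would likely cover more.
WRITE_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "DROP")
_WRITE_RE = re.compile(r"\b(" + "|".join(WRITE_KEYWORDS) + r")\b", re.IGNORECASE)
_READ_START_RE = re.compile(r"(SELECT|WITH)\b", re.IGNORECASE)


def is_read_only(sql: str) -> bool:
    """Conservatively decide whether a single SQL statement only reads data."""
    statement = sql.strip().rstrip(";").strip()
    # Reject multi-statement input such as "SELECT 1; DROP TABLE t".
    if ";" in statement:
        return False
    # Only statements that start with SELECT or WITH are considered.
    if not _READ_START_RE.match(statement):
        return False
    # Reject any write keyword, even inside a CTE or a string literal.
    return _WRITE_RE.search(statement) is None
```

The check is deliberately strict: it will reject a harmless SELECT whose string literal happens to contain a word like DROP. A keyword filter is only a first line of defense; connecting with a database role that has no write privileges is the more robust safeguard.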