Dataset

Dataset skills and workflows surfaced by the site skill importer.

7 skills
dummy-dataset

by phuryn

dummy-dataset generates realistic test data in CSV, JSON, SQL, or Python script form. It helps with mock datasets, demos, database seeding, QA, and data cleaning by letting you define columns, row counts, and constraints for believable sample records.

Data Cleaning
Favorites 0 | GitHub 11.1k
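As a rough illustration of the "columns, row counts, and constraints" idea in the description above, here is a minimal standard-library sketch. It is not the skill's own implementation: the function names `generate_rows` and `to_csv`, the column names, and the value ranges are all hypothetical.

```python
import csv
import io
import random


def generate_rows(columns, row_count, seed=0):
    """Return row_count dicts. Each column maps to a callable (rng, index) -> value."""
    rng = random.Random(seed)  # seeded so the same spec reproduces the same data
    return [
        {name: make(rng, i) for name, make in columns.items()}
        for i in range(row_count)
    ]


def to_csv(rows):
    """Serialize a non-empty list of row dicts to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical column spec: each constraint lives inside the generator itself.
columns = {
    "user_id": lambda rng, i: i + 1,                         # sequential, unique
    "age": lambda rng, i: rng.randint(18, 90),               # bounded integer
    "plan": lambda rng, i: rng.choice(["free", "pro", "team"]),
    "monthly_spend": lambda rng, i: round(rng.uniform(0, 500), 2),
}

rows = generate_rows(columns, row_count=5, seed=42)
csv_text = to_csv(rows)
```

Keeping each constraint inside its column generator means the same spec could feed a JSON or SQL writer without changes.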
huggingface-datasets

by huggingface

Use the huggingface-datasets skill to work with the Hugging Face Dataset Viewer API: validate datasets, resolve splits, preview and paginate rows, search text, apply filters, and fetch Parquet links or statistics. It is intended for read-only dataset exploration.

Web Scraping
Favorites 0 | GitHub 10.4k
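The workflow in the description above can be sketched as a small URL builder. This makes no network calls. The base URL, endpoint names (`is-valid`, `splits`, `rows`), query parameters, and the 100-row page size reflect my understanding of the public Dataset Viewer API and should be checked against its documentation. The helper names and the dataset `user/some_dataset` are hypothetical.

```python
from urllib.parse import urlencode

# Assumed public base URL of the Dataset Viewer API.
BASE_URL = "https://datasets-server.huggingface.co"


def viewer_url(endpoint, **params):
    """Build a read-only request URL, skipping parameters that are None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}/{endpoint}?{query}"


def page_urls(dataset, config, split, total_rows, page_size=100):
    """Return one /rows URL per page, using offset/length pagination."""
    urls = []
    for offset in range(0, total_rows, page_size):
        length = min(page_size, total_rows - offset)  # last page may be short
        urls.append(
            viewer_url("rows", dataset=dataset, config=config, split=split,
                       offset=offset, length=length)
        )
    return urls


# Hypothetical dataset name, used only to show the URL shapes.
valid_url = viewer_url("is-valid", dataset="user/some_dataset")
splits_url = viewer_url("splits", dataset="user/some_dataset")
pages = page_urls("user/some_dataset", "default", "train", total_rows=250)
```

Each URL could then be fetched with any HTTP client; validating the dataset and resolving its splits first avoids paginating a config or split that does not exist.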
pytdc

by K-Dense-AI

pytdc is a skill for Therapeutics Data Commons. It provides AI-ready drug discovery datasets and benchmarks covering ADME, toxicity, drug-target interaction (DTI), drug-drug interaction (DDI), molecule generation, scaffold splits, and pharmacological prediction.

Data Analysis
Favorites 0 | GitHub 0
pydeseq2

by K-Dense-AI

pydeseq2 is a Python DESeq2 skill for bulk RNA-seq differential gene expression analysis. Use it to compare conditions, fit single- or multi-factor designs, apply Wald tests and FDR correction, and generate volcano or MA plots in pandas and AnnData workflows.

Data Analysis
Favorites 0 | GitHub 0
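To make the "FDR correction" step in the description above concrete, here is a plain-Python sketch of the Benjamini-Hochberg procedure, a standard way to control the false discovery rate. It does not call pydeseq2, and the function name and p-values are hypothetical. Check the pydeseq2 documentation for how the library itself computes adjusted p-values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: a smaller raw p-value never gets a larger adjusted one.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from a Wald test.
raw = [0.01, 0.04, 0.03, 0.005]
padj = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in padj]
```

Genes would then be called significant by thresholding the adjusted values rather than the raw ones.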
molfeat

by K-Dense-AI

molfeat is a molecular featurization skill for machine learning and data analysis. It helps convert SMILES strings or RDKit molecules into fingerprints, descriptors, and pretrained embeddings for QSAR, virtual screening, similarity search, and chemical space analysis. Use it to choose practical representations and build reusable featurization pipelines.

Data Analysis
Favorites 0 | GitHub 0
lamindb

by K-Dense-AI

The lamindb skill helps you work with LaminDB, an open-source biology data framework for making data queryable, traceable, reproducible, and FAIR. Use it for data analysis, metadata curation, ontology-based annotation, schema validation, and lineage-aware workflows across notebooks and pipelines.

Data Analysis
Favorites 0 | GitHub 0
cellxgene-census

by K-Dense-AI

cellxgene-census is a skill for querying the CELLxGENE Census programmatically. Use it to explore expression data, metadata, embeddings, and cross-dataset patterns across tissues, diseases, and cell types. It is best suited to population-scale single-cell analysis and reference atlas comparisons; for your own data, use scanpy or scvi-tools.

Data Analysis
Favorites 0 | GitHub 0