pytdc
by K-Dense-AI

pytdc is a skill for Therapeutics Data Commons, providing AI-ready drug discovery datasets and benchmarks for ADME, toxicity, DTI, DDI, molecule generation, scaffold splits, and pharmacological prediction.
This skill scores 78/100, which means it is a solid listing candidate for directory users who need a practical PyTDC workflow for therapeutics ML. The repository gives enough operational detail to help an agent recognize when to use it, install it, and work with key dataset/benchmark tasks with less guesswork than a generic prompt.
- Explicit use cases cover ADME, toxicity, drug-target interaction, molecule generation, and benchmark evaluation.
- Installation and upgrade commands are provided with a concrete pip/uv path, improving triggerability and adoption.
- Long, structured SKILL.md with many headings and workflow sections suggests substantive operational guidance rather than a placeholder.
- Repository tree shows no scripts, references, resources, or install command metadata beyond SKILL.md, so some workflows may rely on narrative instructions only.
- The excerpt indicates broad coverage but not a fully visible end-to-end quick start here, so users may still need some trial-and-error for specific tasks.
Overview of pytdc skill
What pytdc is for
pytdc is the skill for using Therapeutics Data Commons in AI-driven drug discovery workflows. It helps you get to curated, AI-ready datasets and benchmarks for ADME, toxicity, bioactivity, drug-target interaction, drug-drug interaction, generation, and related evaluation tasks without inventing your own data schema.
Who should install it
Install the pytdc skill if you are doing therapeutic ML, pharmacological prediction, or benchmarking models on standardized splits and metrics. It is a strong fit for data scientists who need reproducible dataset access; it is a weaker fit if you only need a generic chemistry prompt with no dataset loading or evaluation step.
Why it matters
The main value of the pytdc skill is not just dataset access, but the structure around it: task-specific loaders, standard splits such as scaffold or cold splits, and benchmark-friendly evaluation choices. That reduces the usual adoption blockers in drug discovery work, where inconsistent preprocessing and ad hoc splitting can make results hard to trust.
How to Use pytdc skill
Install pytdc in your environment
Use the install command from the skill instructions first:
uv pip install PyTDC
For updating an existing setup, use:
uv pip install PyTDC --upgrade
If your workflow uses a different package manager, map the same package name into that environment rather than rewriting the skill’s assumptions.
Start from the right files
Begin with SKILL.md, then read the sections on overview, when to use, installation, and quick start before jumping into code. If you need broader project context, inspect any nearby documentation the repo exposes through the skill file tree; in this repository, the skill content itself is the main source of truth.
Turn a rough goal into a usable prompt
The pytdc usage works best when your prompt names the task, dataset family, split strategy, and output goal. For example, instead of asking for “help with PyTDC,” ask for:
- “Load an ADME dataset in pytdc, use a scaffold split, and prepare a baseline regression workflow.”
- “Show a pytdc guide for DTI benchmarking with train/validation/test splits and metric reporting.”
- “Set up pytdc for Data Analysis on a toxicity dataset and summarize label balance, missingness, and split design.”
Those details help the skill choose the right task path and avoid generic code that does not match your experiment.

Workflow that usually works best
First identify the therapeutic task, then confirm the dataset class and split policy, then load the data and inspect labels before modeling. If you are benchmarking, decide early whether you need a scaffold split, a cold split, or another predefined evaluation setup, because that choice affects comparability more than model choice does.
pytdc skill FAQ
Is pytdc only for drug discovery models?
Mostly yes. The pytdc skill is built around therapeutic ML and pharmacology use cases, especially datasets and benchmarks rather than general-purpose tabular analysis. If your project is unrelated to compounds, proteins, or drug interaction tasks, a different skill is probably a better fit.
Do I need PyTDC experience before using the skill?
No. The skill is useful for beginners who can describe a dataset goal in plain language. What matters most is being specific about the target task, desired split, and whether you need analysis, prediction, or generation.
How is this different from a normal prompt?
A normal prompt can describe one-off loading or modeling steps, but the pytdc skill is more useful when you want repeatable data access and benchmark discipline. That is especially important when you need standard splits and evaluation conventions that make results easier to compare.
When should I not use pytdc?
Do not use pytdc if you do not need TDC datasets or therapeutic benchmarks, or if you only want a high-level overview of medicinal chemistry concepts. It is also not the best choice if your data is proprietary and unrelated to the supported therapeutic task families.
How to Improve pytdc skill
Provide the task before the model idea
The most useful improvement to a pytdc request is clearer problem framing. Say whether you need property prediction, DTI, DDI, molecule generation, or retrosynthesis before mentioning architectures or metrics. That lets the skill choose the right dataset and preprocessing assumptions.
Specify split and metric expectations
Many failures come from underspecified evaluation. If you care about a scaffold split, cold split, ROC-AUC, PR-AUC, RMSE, or ranking metrics, say so up front in your pytdc prompt. The output is much better when the split strategy and metric are fixed before the modeling discussion starts.
Share your constraints and data shape
If you need notebook-ready code, a lightweight data audit, or compatibility with a specific stack, include that in the request. For pytdc for Data Analysis, mention whether you want class balance, missing-value checks, descriptor summaries, or train/test leakage risk checks so the output focuses on the right diagnostics.
Iterate by tightening the dataset target
If the first answer is too broad, narrow it by dataset family, task type, and output format. A better follow-up might be: “Keep the same pytdc workflow, but switch to toxicity classification, use a scaffold split, and return only the data-loading and evaluation steps.”
