pytdc
by K-Dense-AI

pytdc is a skill for Therapeutics Data Commons, providing AI-ready drug discovery datasets and benchmarks for ADME, toxicity, DTI, DDI, molecule generation, scaffold splits, and pharmacological prediction.
This skill scores 78/100, which means it is a solid listing candidate for directory users who need a practical PyTDC workflow for therapeutics ML. The repository gives enough operational detail to help an agent recognize when to use it, install it, and work with key dataset/benchmark tasks with less guesswork than a generic prompt.
- Explicit use cases cover ADME, toxicity, drug-target interaction, molecule generation, and benchmark evaluation.
- Installation and upgrade commands are provided with a concrete pip/uv path, improving triggerability and adoption.
- Long, structured SKILL.md with many headings and workflow sections suggests substantive operational guidance rather than a placeholder.
- Repository tree shows no scripts, references, resources, or install command metadata beyond SKILL.md, so some workflows may rely on narrative instructions only.
- The excerpt indicates broad coverage but not a fully visible end-to-end quick start here, so users may still need some trial-and-error for specific tasks.
Overview of pytdc skill
What pytdc is for
pytdc is the skill for using Therapeutics Data Commons in AI-driven drug discovery workflows. It helps you get to curated, AI-ready datasets and benchmarks for ADME, toxicity, bioactivity, drug-target interaction, drug-drug interaction, generation, and related evaluation tasks without inventing your own data schema.
Who should install it
Install the pytdc skill if you are doing therapeutic ML, pharmacological prediction, or benchmarking models on standardized splits and metrics. It is a strong fit for data scientists who need reproducible dataset access; it is a weaker fit if you only need a generic chemistry prompt with no dataset loading or evaluation step.
Why it matters
The main value of the pytdc skill is not just dataset access, but the structure around it: task-specific loaders, standard splits such as scaffold or cold splits, and benchmark-friendly evaluation choices. That reduces the usual adoption blockers in drug discovery work, where inconsistent preprocessing and ad hoc splitting can make results hard to trust.
How to Use pytdc skill
Install pytdc in your environment
Use the install command from the skill instructions first:
uv pip install PyTDC
For updating an existing setup, use:
uv pip install PyTDC --upgrade
If your workflow uses a different package manager, map the same package name into that environment rather than rewriting the skill’s assumptions.
Start from the right files
Begin with SKILL.md, then read the sections on overview, when to use, installation, and quick start before jumping into code. If you need broader project context, inspect any nearby documentation the repo exposes through the skill file tree; in this repository, the skill content itself is the main source of truth.
Turn a rough goal into a usable prompt
The pytdc usage works best when your prompt names the task, dataset family, split strategy, and output goal. For example, instead of asking for “help with PyTDC,” ask for:
- “Load an ADME dataset in pytdc, use a scaffold split, and prepare a baseline regression workflow.”
- “Show a pytdc guide for DTI benchmarking with train/validation/test splits and metric reporting.”
- “Set up pytdc for Data Analysis on a toxicity dataset and summarize label balance, missingness, and split design.”
Those details help the skill choose the right task path and avoid generic code that does not match your experiment.

Workflow that usually works best
First identify the therapeutic task, then confirm the dataset class and split policy, then load the data and inspect labels before modeling. If you are benchmarking, decide early whether you need a scaffold split, a cold split, or another predefined evaluation setup, because that choice affects comparability more than model choice does.
pytdc skill FAQ
Is pytdc only for drug discovery models?
Mostly yes. The pytdc skill is built around therapeutic ML and pharmacology use cases, especially datasets and benchmarks rather than general-purpose tabular analysis. If your project is unrelated to compounds, proteins, or drug interaction tasks, a different skill is probably a better fit.
Do I need PyTDC experience before using the skill?
No. The skill is useful for beginners who can describe a dataset goal in plain language. What matters most is being specific about the target task, desired split, and whether you need analysis, prediction, or generation.
How is this different from a normal prompt?
A normal prompt can describe one-off loading or modeling steps, but the pytdc skill is more useful when you want repeatable data access and benchmark discipline. That is especially important when you need standard splits and evaluation conventions that make results easier to compare.
When should I not use pytdc?
Do not use pytdc if you do not need TDC datasets or therapeutic benchmarks, or if you only want a high-level overview of medicinal chemistry concepts. It is also not the best choice if your data is proprietary and unrelated to the supported therapeutic task families.
How to Improve pytdc skill
Provide the task before the model idea
The most useful improvement to a pytdc request is clearer problem framing. Say whether you need property prediction, DTI, DDI, molecule generation, or retrosynthesis before mentioning architectures or metrics. That lets the skill choose the right dataset and preprocessing assumptions.
Specify split and metric expectations
Many failures come from underspecified evaluation. If you care about a scaffold split, cold split, ROC-AUC, PR-AUC, RMSE, or ranking metrics, say so up front in your pytdc prompt. The output is much better when the split strategy and metric are fixed before the modeling discussion starts.
Share your constraints and data shape
If you need notebook-ready code, a lightweight data audit, or compatibility with a specific stack, include that in the request. For pytdc for Data Analysis, mention whether you want class balance, missing-value checks, descriptor summaries, or train/test leakage risk checks so the output focuses on the right diagnostics.
Iterate by tightening the dataset target
If the first answer is too broad, narrow it by dataset family, task type, and output format. A better follow-up might be: “Keep the same pytdc workflow, but switch to toxicity classification, use a scaffold split, and return only the data-loading and evaluation steps.”
