molfeat

by K-Dense-AI

molfeat is a molecular featurization skill for ML and Data Analysis. It helps convert SMILES or RDKit molecules into fingerprints, descriptors, and pretrained embeddings for QSAR, virtual screening, similarity search, and chemical space analysis. Use this molfeat guide to pick practical representations and build reusable featurization pipelines.

Added: May 14, 2026

Category: Data Analysis

Install Command

npx skills add K-Dense-AI/claude-scientific-skills --skill molfeat

Curation Score

This skill scores 78/100, which means it is a solid listing candidate for Agent Skills Finder. The repository gives users enough evidence that an agent can trigger it for molecular featurization tasks, understand its purpose quickly, and get real workflow leverage beyond a generic prompt, though a few adoption details are still under-specified.

Strengths

Clear, domain-specific trigger: the skill is explicitly for molecular featurization, QSAR/QSPR, virtual screening, similarity search, and SMILES-to-features workflows.
Strong operational depth: the body is substantial (14k+ chars) with many headings and workflow signals, suggesting usable guidance rather than a stub.
Concrete installation and capability framing: it names 100+ featurizers and includes install commands plus optional dependency variants for specific model families.

Cautions

No embedded scripts, references, or support files were provided in the repo snapshot, so users must trust the prose without extra executable or validation assets.
The excerpt shows installation detail but not a fully visible end-to-end quick-start in the provided evidence, so some edge-case triggering may still require user interpretation.

Python, Scikit Learn, Machine Learning, Chemistry, Dataset, Bioinformatics

Overview of molfeat skill

What the molfeat skill does

The molfeat skill helps you turn molecules into machine-learning features. It is best suited to QSAR, QSPR, virtual screening, similarity search, and chemical space analysis. Instead of writing one-off feature code, you get a standard way to convert SMILES or RDKit molecules into numeric vectors: fingerprints, descriptors, and pretrained embeddings.

Who should use it

Use the molfeat skill if you are doing molecular machine learning, building featurization pipelines, or comparing representation choices across models. It is especially useful when you want scikit-learn-style transformers, parallel processing, and caching without assembling every featurizer manually.

Why it is different

The main value of molfeat is breadth plus consistency: many featurizers in one library, unified inputs, and outputs that fit downstream ML workflows. The tradeoff is that you still need to choose the right representation for your task, and some embeddings depend on optional extras. If you only need one fingerprint, a plain RDKit script may be simpler; if you need repeatable feature generation across many molecule types, molfeat is the stronger fit.

How to Use molfeat skill

Install molfeat and the right extras

For most users, the molfeat install step is straightforward: install the base package, then add extras only for the featurizers you actually need. A common starting point is:

uv pip install molfeat
# or, if you need broader support
uv pip install "molfeat[all]"

If your workflow depends on graph models, pretrained language-model embeddings, or a specific backend, verify the optional dependency before you design the pipeline.
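One way to verify optional dependencies before designing the pipeline is to check whether the relevant modules can be found at all. The sketch below uses only the standard library; the module names in `optional` are assumptions for illustration, not a list taken from the skill, so substitute whatever your chosen featurizers require.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the names from module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised for dotted names with a missing parent package,
            # or for modules whose spec is unset.
            found = False
        if not found:
            missing.append(name)
    return missing


# Hypothetical examples of backends a pretrained or graph featurizer might need.
optional = ["torch", "transformers", "dgl"]
print("Missing optional modules:", missing_modules(optional))
```

Running this check first makes a missing extra show up as a short list rather than as an import error partway through featurization.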

Start from the input you already have

The skill works best when you state your actual molecule format, task, and output shape up front. Good inputs include: a column of SMILES, an RDKit molecule list, a desired fingerprint family, and the downstream model type. For example, “Convert 50k SMILES into cached Morgan fingerprints for a scikit-learn classification model” is much better than “featurize these compounds.”
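If the input is a CSV with a SMILES column, it can help to extract and sanity-check that column before asking for features. The following is a minimal standard-library sketch; the column name `smiles` and the sample rows are assumptions for the example.

```python
import csv
import io


def load_smiles_column(csv_text, column="smiles"):
    """Return (smiles, skipped): cleaned SMILES strings and the 1-based
    data-row numbers whose SMILES cell was empty or missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in CSV header")
    smiles, skipped = [], []
    for row_number, row in enumerate(reader, start=1):
        value = (row[column] or "").strip()
        if value:
            smiles.append(value)
        else:
            skipped.append(row_number)
    return smiles, skipped


# Hypothetical assay export with one blank SMILES cell.
csv_text = "id,smiles,active\n1,CCO,1\n2,,0\n3, c1ccccc1 ,1\n"
smiles, skipped = load_smiles_column(csv_text)
print(smiles, skipped)
```

Reporting the skipped rows alongside the cleaned list keeps the molecule count honest when you later align features with labels.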

Read the right files first

For this repo, start with SKILL.md and the installation section, then scan the overview and the “When to Use This Skill” guidance. That gives you the fastest route to the supported workflows, dependency expectations, and the featurizer families most likely to matter. Because the repo is compact, the main decision value is in understanding fit and dependencies, not in hunting for helper files.

Practical prompt pattern

When invoking the skill, include the task, molecule source, preferred representation, and constraints. A strong request looks like: “I have a CSV of SMILES, need a reproducible featurization step for QSAR, prefer scikit-learn compatibility, and want to compare ECFP, MACCS, and physicochemical descriptors.” That lets the skill choose a sensible path instead of guessing at your intent.
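If you issue such requests repeatedly, a small template can make sure none of the four pieces is forgotten. The sketch below is one possible structure, not part of the skill itself; the class name `FeaturizationRequest` and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeaturizationRequest:
    """Hypothetical container for the four parts of a featurization request."""
    task: str
    molecule_source: str
    representations: List[str]
    constraints: List[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        # Refuse to build a prompt that omits a required part.
        if not (self.task and self.molecule_source and self.representations):
            raise ValueError("task, molecule_source, and representations are required")
        lines = [
            f"Task: {self.task}",
            f"Molecule source: {self.molecule_source}",
            f"Representations to compare: {', '.join(self.representations)}",
        ]
        if self.constraints:
            lines.append(f"Constraints: {'; '.join(self.constraints)}")
        return "\n".join(lines)


request = FeaturizationRequest(
    task="QSAR",
    molecule_source="CSV of SMILES",
    representations=["ECFP", "MACCS", "physicochemical descriptors"],
    constraints=["reproducible", "scikit-learn compatible"],
)
print(request.to_prompt())
```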

molfeat skill FAQ

Is molfeat only for cheminformatics experts?

No. The molfeat skill is beginner-friendly if you can describe your molecules and your prediction goal. The hard part is not syntax; it is choosing a representation that matches your dataset and model.

When should I not use molfeat?

Skip molfeat if you only need a single trivial descriptor, or if your workflow does not involve molecular data at all. It is also a weaker choice if you want a full training pipeline rather than just featurization.

How is this different from a generic prompt?

A generic prompt may explain fingerprints in theory, but molfeat gives a concrete install-and-use path for molecular features, caching, and scikit-learn-style transformer workflows. That matters when you need output that is ready for actual modeling, not just conceptual advice.

What usually blocks adoption?

The main blockers are missing optional dependencies, unclear input format, and choosing an overcomplicated featurizer for the task. If you know whether you are working from SMILES or RDKit objects, and whether you need classical descriptors or pretrained embeddings, adoption is much easier.

How to Improve molfeat skill

Give the skill better molecule context

The strongest way to improve molfeat results is to specify the molecule source, batch size, and target use case. For example: “SMILES from an assay CSV, 20k rows, binary classification, need compact features for random forest” is more actionable than “make features.”

State the constraints that matter

If you care about speed, memory, reproducibility, or model compatibility, say so directly. Those constraints change whether the best molfeat option is a simple fingerprint, a descriptor set, or a pretrained embedding with extra dependencies.

Ask for a comparison when choosing representations

If you are unsure which representation to use, ask for a side-by-side recommendation instead of a single answer. For example: “Compare ECFP, MACCS, and pretrained embeddings for a small QSAR dataset with limited compute.” That kind of prompt forces the skill to explain tradeoffs that affect final model quality.

Iterate from a baseline

Start with one stable featurization, confirm the output shape and missing-value behavior, then expand to alternatives. In practice, the fastest improvement path is to validate a simple molfeat pipeline first, then refine with caching, batching, or a richer feature set once the baseline works.
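The baseline check described above (output shape and missing-value behavior) can be sketched without any featurization library. The example assumes features arrive as one row per molecule, with `None` standing in for a molecule that failed to featurize; that convention and the function name `summarize_features` are illustrative assumptions.

```python
import math


def summarize_features(rows):
    """Summarize a feature table given as a list of rows.

    Each row is a sequence of numbers, or None for a molecule that could
    not be featurized (an assumed convention for this sketch).
    """
    # Use the first successful row to define the expected width.
    n_features = next((len(r) for r in rows if r is not None), 0)
    report = {
        "n_rows": len(rows),
        "n_features": n_features,
        "failed": [],      # indices of rows that are None
        "ragged": [],      # indices of rows with the wrong width
        "non_finite": [],  # indices of rows containing NaN or infinity
    }
    for i, row in enumerate(rows):
        if row is None:
            report["failed"].append(i)
        elif len(row) != n_features:
            report["ragged"].append(i)
        elif not all(math.isfinite(v) for v in row):
            report["non_finite"].append(i)
    return report


# Toy table: one clean row, one with NaN, one failure, one with the wrong width.
rows = [[0.0, 1.0, 0.0], [1.0, float("nan"), 0.0], None, [1.0, 0.0]]
report = summarize_features(rows)
print(report)
```

A report with empty `failed`, `ragged`, and `non_finite` lists is a reasonable signal that the baseline is stable enough to start comparing alternatives.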
