vector-index-tuning
by wshobson
vector-index-tuning helps tune vector search indexes for latency, recall, and memory. Use it to choose index types, adjust HNSW settings, and compare quantization options for RAG workflows.
This skill scores 71/100, which means it is acceptable to list for directory users who want reusable guidance on vector index optimization, but they should expect a documentation-heavy reference rather than a tightly operational workflow. The repository evidence shows substantial content with concrete tuning topics like HNSW parameters, index selection, and quantization tradeoffs, so an agent can likely trigger it correctly. However, the lack of support files, install instructions, and stronger procedural signals means users may still need to translate the guidance into their own stack.
- Strong triggerability from a specific description covering HNSW tuning, quantization, latency, recall, and scaling use cases.
- Substantial skill content with structured sections, tables, and code fences that go beyond a placeholder or thin prompt wrapper.
- Useful decision guidance for common vector-search choices, including index type ranges and parameter tradeoffs.
- Operational clarity is limited by the absence of scripts, references, or repo/file integration examples, so execution still requires interpretation.
- No install command or practical quick-start path is evident in SKILL.md, which weakens confidence for fast adoption.
Overview of vector-index-tuning skill
What vector-index-tuning is for
The vector-index-tuning skill helps you choose and tune vector search index settings for real production tradeoffs: latency, recall, memory usage, build time, and scale. It is most useful when a RAG system works in principle but retrieval quality, query speed, or infrastructure cost is no longer acceptable.
Who should use this skill
This vector-index-tuning skill is a good fit for:
- engineers running semantic search or RAG in production
- teams choosing between Flat, HNSW, quantized HNSW, IVF+PQ, or disk-backed indexes
- builders who need concrete parameter guidance instead of generic “optimize your embeddings” advice
If you are still validating whether vector search is needed at all, this may be too early.
The real job-to-be-done
Users typically do not want “index theory.” They want answers to questions like:
- Why is recall dropping after quantization?
- Which HNSW settings should I try first?
- At what data size should I stop using exact search?
- How do I reduce RAM without making RAG retrieval obviously worse?
vector-index-tuning for RAG Workflows is strongest when you already know your corpus size, dimensionality, latency budget, and acceptable recall loss.
What makes it different from a generic prompt
A normal prompt often produces hand-wavy suggestions. vector-index-tuning is more useful because it gives a practical decision frame:
- index type by dataset scale
- HNSW parameter roles (M, efConstruction, efSearch)
- quantization options by memory/quality tradeoff
- production-oriented thinking for large collections
That makes it easier to move from “our retrieval feels slow” to a concrete tuning plan.
What to know before installing
This skill is a single SKILL.md guide with no helper scripts or benchmark harness. That means adoption is lightweight, but execution depends on the quality of your own metrics and test setup. Install it if you want structured tuning guidance; do not expect ready-made automation.
How to Use vector-index-tuning skill
vector-index-tuning install
Install from the repository with:
npx skills add https://github.com/wshobson/agents --skill vector-index-tuning
Because the skill lives as one markdown guide, install is simple. The main practical work happens after install: supplying enough system details for the model to make good tuning recommendations.
Read this file first
Start with:
SKILL.md
There are no support scripts, references, or rules folders here, so nearly all usable guidance is in that one file. This is good for fast review, but it also means you should bring your own benchmark data rather than expecting embedded test assets.
What input the skill needs to work well
For strong vector-index-tuning usage, give the model:
- number of vectors
- embedding dimension
- current index type
- current HNSW settings if applicable
- memory budget
- target p95 or p99 latency
- required recall target or acceptable quality loss
- update pattern: mostly static, batch refresh, or high-write
- RAG retrieval setup: top-k, reranking, filtering, metadata constraints
Without those inputs, the skill can only return generic recommendations.
Turn a rough goal into a usable prompt
Weak prompt:
Tune my vector index.
Stronger prompt:
Use the vector-index-tuning skill. I have 18M vectors at 768 dimensions for a RAG system. Current index is HNSW with M=16, efConstruction=100, efSearch=40. p95 latency is 140ms, RAM is too high, and recall@10 versus brute-force is 0.91. I can tolerate recall@10 down to 0.88 if p95 falls below 80ms and RAM drops by 30%. Recommend index strategy, parameter changes, and a benchmark plan.
This works better because it exposes the real optimization target and acceptable tradeoff boundary.
Best workflow for vector-index-tuning for RAG Workflows
A practical sequence is:
- Describe corpus size and current retrieval architecture.
- State the business constraint first: latency, memory, or recall.
- Ask the skill to choose an index family before tuning fine-grained parameters.
- Benchmark against a fixed query set and ground-truth method.
- Iterate one variable group at a time.
This matters because many teams jump straight to parameter sweeps without confirming they are using the right index type for their scale.
How to choose index family first
The skill’s core decision table is useful as a first-pass filter:
- under ~10K vectors: Flat exact search is often simpler and good enough
- ~10K to 1M: HNSW is usually the default candidate
- ~1M to 100M: HNSW plus quantization becomes relevant
- above ~100M: IVF+PQ or DiskANN-style approaches become more plausible
Treat these as starting points, not laws. If your vectors are heavily filtered, frequently updated, or deployed on very tight memory budgets, your best choice may differ.
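As a rough illustration, that first-pass filter can be written as a tiny helper. The thresholds below simply encode the guide's ranges; they are starting points, not hard rules:

```python
def suggest_index_family(num_vectors: int) -> str:
    """First-pass index-family filter based on corpus size alone.

    Thresholds mirror the guide's rough ranges. Filtering, update
    rate, and memory budget can all shift the right answer.
    """
    if num_vectors < 10_000:
        return "Flat (exact search)"
    if num_vectors < 1_000_000:
        return "HNSW"
    if num_vectors < 100_000_000:
        return "HNSW + quantization (FP16 / INT8 / PQ)"
    return "IVF+PQ or DiskANN-style disk-backed index"
```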
How to use HNSW guidance well
When asking for HNSW help, include all three major knobs:
- M: graph connectivity, usually better recall with more memory
- efConstruction: build quality versus build cost
- efSearch: query-time recall versus latency
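For concreteness, here is a minimal sketch of where each knob lives, assuming faiss as the backend (the skill itself is store-agnostic) and illustrative values throughout:

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 768                                             # embedding dimension (illustrative)
xb = np.random.rand(100_000, d).astype("float32")   # stand-in for real corpus vectors
xq = np.random.rand(100, d).astype("float32")       # stand-in for real queries

index = faiss.IndexHNSWFlat(d, 32)    # M=32: graph connectivity, fixed at build time
index.hnsw.efConstruction = 200       # build quality vs. build cost, set before add()
index.add(xb)

index.hnsw.efSearch = 64              # query-time recall vs. latency, adjustable per run
distances, ids = index.search(xq, 10) # top-10 neighbors per query
```

Note that M and efConstruction require a rebuild to change, while efSearch can be adjusted between query batches, which makes it the cheapest knob to sweep.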
A useful prompt pattern is:
Use the vector-index-tuning skill to propose a minimal test matrix for M, efConstruction, and efSearch that fits my latency and recall targets, and explain which parameter I should lock first.
This gets you an ordered tuning plan instead of an unstructured list of values.
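Continuing the faiss sketch above (same index and xq), the cheapest slice of such a matrix is an efSearch sweep over a fixed query set, with M and efConstruction held constant:

```python
import time

# Sweep the cheapest knob on a fixed query set; ground-truth ids for
# recall should come from an exact-search baseline (see the recall
# sketch later in this guide).
for ef in (16, 32, 64, 128, 256):
    index.hnsw.efSearch = ef
    t0 = time.perf_counter()
    _, ids = index.search(xq, 10)
    mean_ms = (time.perf_counter() - t0) * 1000 / len(xq)
    # For p95 latency, time each query individually rather than the batch.
    print(f"efSearch={ef}: ~{mean_ms:.2f} ms/query")
```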
How to use quantization guidance well
If memory is the main pain point, ask the skill to compare:
- FP32
- FP16
- INT8 scalar quantization
- Product Quantization
- binary representations when relevant
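To anchor those options, here is a hedged faiss sketch of the same ladder; parameter values are illustrative, and the quantized variants need training on a representative sample before vectors are added:

```python
import faiss

d, M = 768, 32

# FP32 baseline: full-precision vectors inside the HNSW graph
hnsw_fp32 = faiss.IndexHNSWFlat(d, M)

# Scalar quantization: roughly 2x (FP16) or 4x (INT8) less vector storage
hnsw_fp16 = faiss.IndexHNSWSQ(d, faiss.ScalarQuantizer.QT_fp16, M)
hnsw_int8 = faiss.IndexHNSWSQ(d, faiss.ScalarQuantizer.QT_8bit, M)

# Product quantization with IVF for very large corpora:
# 96 subquantizers x 8 bits = 96 bytes/vector instead of 3072
coarse = faiss.IndexFlatL2(d)
ivf_pq = faiss.IndexIVFPQ(coarse, d, 4096, 96, 8)

# Quantized indexes must be trained before add(), e.g.:
#   hnsw_int8.train(sample); ivf_pq.train(sample)
```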
Good prompt:
I need a 2-4x memory reduction for 50M vectors and can accept modest recall loss in first-stage retrieval because a reranker follows. Use the vector-index-tuning skill to compare FP16, INT8, and PQ for this RAG pipeline.
This is stronger than asking “should I quantize?” because it links compression tolerance to downstream reranking.
What outputs you should expect
The best outcome is not one magic parameter set. It is:
- a narrowed index choice
- a short candidate parameter grid
- an evaluation plan
- tradeoff explanations you can test
If the model gives only one configuration with no benchmark method, ask it to revise into an experiment plan.
Practical repository-reading path
Since only SKILL.md is present, focus on these sections in order:
- When to Use This Skill
- Core Concepts
- Index Type Selection
- HNSW Parameters
- Quantization Types
- code templates near the bottom
That reading path gets you decision logic first, then tuning knobs, then implementation patterns.
Common adoption blockers
Teams usually stall for one of these reasons:
- no recall baseline against exact search
- no fixed query set for comparing runs
- trying to optimize latency and recall without a memory budget
- using synthetic benchmarks that do not resemble real RAG queries
The skill helps with tuning decisions, but it cannot replace representative evaluation data.
vector-index-tuning skill FAQ
Is vector-index-tuning good for beginners
Yes, if you already understand what a vector index is. No, if you are still deciding between keyword search, hybrid search, and dense retrieval. The skill assumes you are past basic retrieval architecture selection and need tuning guidance.
When is vector-index-tuning not the right tool
Do not start with vector-index-tuning if your real issue is:
- poor chunking
- bad embeddings
- weak document preprocessing
- missing metadata filters
- no reranking where one is needed
Index tuning will not fix relevance problems caused upstream.
Is this better than asking an LLM directly
Usually yes, because the vector-index-tuning skill keeps the conversation centered on measurable tradeoffs and known parameter levers instead of generic optimization advice. The gain is structure, not automation.
Does it help with vector-index-tuning for RAG Workflows specifically
Yes. The skill is especially relevant for first-stage retrieval in RAG, where you often need to balance recall and cost before reranking. It becomes more useful when you explicitly tell it whether a reranker exists, what top-k you use, and whether metadata filtering shrinks the candidate set.
Does the skill include runnable benchmarking tools
No. Based on the repository structure, this skill is documentation-driven. You should expect conceptual guidance and code examples, not a complete harness for measuring recall, build time, and latency in your environment.
What if my collection updates frequently
Use the skill, but mention update frequency explicitly. Some index choices look excellent for static corpora and less attractive for heavy write workloads. Omitting update frequency is one of the easiest ways to get an answer that sounds smart but is operationally wrong.
How to Improve vector-index-tuning skill
Give the skill hard constraints, not preferences
The fastest way to improve vector-index-tuning results is to replace vague goals with numbers:
- “under 75ms p95”
- “under 64GB RAM”
- “recall@20 must stay above 0.9”
- “nightly rebuild is acceptable”
- “ingest is continuous, no long offline rebuilds”
Numeric constraints force clearer recommendations.
Provide a baseline and a target delta
Better input:
Current HNSW index uses 92GB RAM, p95 is 110ms, recall@10 is 0.93. Need 30% lower memory and under 85ms p95.
This lets the skill reason from a real starting point. Without baseline metrics, its output will be too generic to trust.
Ask for a benchmark matrix, not a single answer
A high-value prompt is:
Use the vector-index-tuning skill to produce a 6-run benchmark matrix prioritized by information gain, not exhaustiveness.
That usually yields better practical results than requesting “best settings,” because vector index performance depends heavily on data distribution and workload.
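As a purely hypothetical illustration of the shape such a matrix might take (every value below is invented), each run should change one variable group so any regression is attributable to a single decision:

```python
# Hypothetical 6-run matrix: one variable group changes per run.
runs = [
    {"name": "baseline",      "index": "HNSW",   "M": 16, "efSearch": 40,  "quant": "FP32"},
    {"name": "wider-graph",   "index": "HNSW",   "M": 32, "efSearch": 40,  "quant": "FP32"},
    {"name": "deeper-search", "index": "HNSW",   "M": 16, "efSearch": 128, "quant": "FP32"},
    {"name": "fp16",          "index": "HNSW",   "M": 16, "efSearch": 40,  "quant": "FP16"},
    {"name": "int8",          "index": "HNSW",   "M": 16, "efSearch": 40,  "quant": "INT8"},
    {"name": "ivf-pq",        "index": "IVF+PQ", "nlist": 4096, "nprobe": 32, "quant": "PQ96"},
]
```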
Separate retrieval quality from final answer quality
In RAG, users often judge index changes by end-answer quality alone. Improve results by asking the skill to separate:
- raw retrieval recall
- latency
- memory footprint
- downstream reranker impact
- end-task quality
This avoids over-tuning the index for a metric your application does not actually optimize.
State whether filtering changes the search space
If your system applies tenant, language, date, or product filters before or during search, say so. Filtered search can change the best index decision materially. This is especially important for vector-index-tuning for RAG Workflows in multi-tenant systems.
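One concrete reason filters matter: with post-filtering, the index has to over-fetch to keep an effective top-k, and selective filters make that expensive. A minimal sketch of the pattern, where keep_fn is a hypothetical predicate over your own metadata store:

```python
def filtered_search(index, query, k, keep_fn, overfetch=4):
    """Post-filter ANN results by over-fetching, a common fallback when
    the index cannot apply metadata filters natively."""
    distances, ids = index.search(query, k * overfetch)
    kept = [(d, i) for d, i in zip(distances[0], ids[0]) if keep_fn(i)]
    return kept[:k]  # may return fewer than k if the filter is selective
```

Highly selective filters force large over-fetch factors, which shifts the latency/recall tradeoff and can make pre-filtered or even exact search the better choice.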
Common failure modes to watch for
The most common mistakes are:
- raising efSearch without checking whether HNSW graph quality is the real bottleneck
- compressing too aggressively before establishing a recall floor
- comparing indexes on different query sets
- choosing IVF/PQ for scale alone without validating query distribution
- ignoring build and refresh costs
These are exactly the cases where a seemingly faster setup underperforms in production.
How to iterate after the first output
After the first recommendation, reply with results in a compact table:
- configuration
- RAM
- build time
- p95 latency
- recall@k
- notes on retrieval errors
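For example, a follow-up reply might look like this (all numbers invented for illustration):

config A (M=16, efSearch=40, FP32): 92GB RAM, 41min build, 110ms p95, recall@10 0.93, misses rare entities
config B (M=16, efSearch=24, INT8): 58GB RAM, 44min build, 78ms p95, recall@10 0.90, slight tail-query degradation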
Then ask:
Revise the tuning plan using these measurements and eliminate dominated configurations.
That second-pass loop is where the skill becomes materially better than a one-shot prompt.
Improve trust by requesting explicit tradeoff language
Ask the skill to label each recommendation as:
- likely win
- risky but high upside
- low effort
- requires benchmark confirmation
This makes it easier to prioritize changes and reduces the chance of copying a suggestion that only works under ideal assumptions.
Pair the skill with your own exact-search ground truth
The single best upgrade to vector-index-tuning usage is a small exact-search benchmark on representative queries. Even a few hundred labeled or brute-force-evaluated queries dramatically improves decision quality, because every tuning recommendation can be tested against a known recall baseline.
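A minimal sketch of that loop, assuming faiss and L2 distance (swap in your own metric and stack):

```python
import faiss

def recall_at_k(approx_index, xb, xq, k=10):
    """Recall@k of an ANN index against a brute-force baseline.

    xb: corpus vectors, xq: representative query vectors.
    """
    exact = faiss.IndexFlatL2(xb.shape[1])  # exact search = ground truth
    exact.add(xb)
    _, true_ids = exact.search(xq, k)
    _, approx_ids = approx_index.search(xq, k)
    hits = sum(len(set(t) & set(a)) for t, a in zip(true_ids, approx_ids))
    return hits / (len(xq) * k)
```

Since brute force is the expensive part, compute the ground-truth ids once for a fixed query set and cache them across tuning runs.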
What success looks like
A good use of vector-index-tuning ends with:
- a justified index family choice
- a short parameter shortlist
- benchmark evidence for recall, speed, and memory
- a deployment decision aligned to your RAG workload
If you do not leave with a testable plan, ask the skill to be more operational and less descriptive.
