ai-rag-pipeline
by inferen-sh

Build Retrieval Augmented Generation (RAG) pipelines that combine web search tools (Tavily, Exa) with LLMs (Claude, GPT-4, Gemini via OpenRouter) using the inference.sh CLI. Ideal for research agents, fact-checkers, and AI assistants that need grounded, cited answers.
Overview
What is ai-rag-pipeline?
The ai-rag-pipeline skill helps you build Retrieval Augmented Generation (RAG) workflows that combine live web search with large language models via the inference.sh (infsh) CLI. It provides a simple pattern for:
- Running web research with tools like Tavily Search and Exa
- Passing those search results into LLMs such as Claude, GPT-4, or Gemini (via OpenRouter)
- Getting grounded, source-aware answers instead of unsupported guesses
Under the hood, ai-rag-pipeline is a shell-friendly pattern for chaining infsh app run calls, so you can quickly prototype RAG-style research, question answering, and fact-checking pipelines.
Who is this skill for?
ai-rag-pipeline is a good fit if you:
- Use inference.sh to orchestrate LLM tools from the command line
- Want research-style answers with citations or explicit web context
- Are building AI research agents or assistant workflows that must stay up to date
- Need fact-checking or web-grounded summaries from multiple sources
It is especially useful for developers, data/AI researchers, and power users who are comfortable with Bash, CLIs, and JSON inputs.
What problems does ai-rag-pipeline solve?
This skill focuses on a common RAG use case: combining search and LLMs in a repeatable, scriptable way. It helps you:
- Move beyond single-prompt chat to pipeline-style research
- Use Tavily or Exa to pull in fresh, relevant information
- Feed that content into Claude, GPT-4, or Gemini (via OpenRouter) through infsh
- Produce answers that can be inspected, audited, and reused in other tools or agents
If you want a Perplexity-like workflow using your own tools and models, ai-rag-pipeline gives you the building blocks.
When is ai-rag-pipeline not a good fit?
Consider other skills or approaches if:
- You are not using the inference.sh CLI or cannot install it
- You need a fully packaged web app or GUI (this is CLI/bash oriented)
- You need deep, domain-specific indexing over private docs (this skill focuses on live web search patterns rather than full vector database setup)
For higher-level automation around documents, knowledge bases, or agents, use ai-rag-pipeline as a low-level RAG building block and compose it with other skills.
How to Use
Prerequisites
Before installing ai-rag-pipeline, make sure:
- You have the inference.sh CLI (infsh) installed. The upstream repo links to install instructions at:
  https://raw.githubusercontent.com/inference-sh/skills/refs/heads/main/cli-install.md
- You can run infsh login and authenticate successfully.
- You have access to the tools you intend to use (e.g., Tavily, Exa, OpenRouter-backed models) through inference.sh.
Install the ai-rag-pipeline skill
In an Agent Skills–enabled environment, install the skill with:
```shell
npx skills add https://github.com/inferen-sh/skills --skill ai-rag-pipeline
```
This pulls the ai-rag-pipeline definition from tools/llm/ai-rag-pipeline in the inferen-sh/skills repository and makes it available to your agent or workspace.
After installation, open the Files view and review:
- SKILL.md – core description and quick start
Quick start: Simple search + answer RAG pipeline
The SKILL file illustrates a minimal RAG flow using the infsh CLI.
- Log in to inference.sh:

  ```shell
  infsh login
  ```

- Run a Tavily search and store the result in a Bash variable:

  ```shell
  SEARCH=$(infsh app run tavily/search-assistant --input '{"query": "latest AI developments 2024"}')
  ```

- Pass that research into an LLM (example: a Claude model via OpenRouter) for summarization:

  ```shell
  infsh app run openrouter/claude-sonnet-45 --input "{
    \"prompt\": \"Based on this research, summarize the key trends: $SEARCH\"
  }"
  ```
This pattern demonstrates the core idea of ai-rag-pipeline:
- Retrieval – tavily/search-assistant performs web research
- Augmentation – the search output is embedded in your prompt as $SEARCH
- Generation – the Claude model generates a summary grounded in that research
You can swap in other search tools (e.g., Exa Search / Exa Answer) or other models (e.g., GPT-4, Gemini via OpenRouter) as supported by your inference.sh setup.
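One caveat with the quick-start pattern: interpolating $SEARCH directly into a hand-written JSON string breaks as soon as the search output contains double quotes or newlines. A more robust sketch builds the payload with jq (assuming jq is installed; the literal SEARCH value below is a stand-in for a real tavily/search-assistant result):

```shell
# Build the LLM payload with jq instead of interpolating $SEARCH into a
# hand-written JSON string, so quotes and newlines in the search output
# cannot break the JSON. The literal value below stands in for a real
# `infsh app run tavily/search-assistant ...` result.
SEARCH='{"results":[{"title":"Example","content":"AI news with \"quotes\""}]}'
PROMPT=$(jq -n --arg research "$SEARCH" \
  '{prompt: ("Based on this research, summarize the key trends: " + $research)}')
echo "$PROMPT"
# Then pass the payload straight through:
#   infsh app run openrouter/claude-sonnet-45 --input "$PROMPT"
```

Because jq escapes the embedded search output, this works even when the retrieved pages contain characters that would otherwise terminate the JSON string early.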
Customizing the RAG workflow
Once the basic flow works, adapt it to your use case:
1. Change the research query
Tailor the query field to your domain:
```shell
SEARCH=$(infsh app run tavily/search-assistant --input '{"query": "impact of LLMs on healthcare regulation"}')
```
2. Use a different model
Swap openrouter/claude-sonnet-45 with a different LLM app configured in inference.sh, such as a GPT-4 or Gemini-backed route, if available in your environment.
3. Adjust the output style
Modify the prompt text to request:
- Bullet summaries
- Pros/cons lists
- Fact-check reports
- Study notes with citations
For example:
```shell
infsh app run openrouter/claude-sonnet-45 --input "{
  \"prompt\": \"Using the sources in $SEARCH, write a concise fact-check report. Highlight any conflicting claims and clearly list cited URLs.\"
}"
```
4. Wrap into reusable scripts
Because ai-rag-pipeline is Bash-friendly, you can place the pattern into shell scripts for reuse:
- research-topic.sh – takes a topic and returns a web-grounded summary
- fact-check.sh – takes a claim, runs search, and produces a fact-check report
- briefing.sh – creates a briefing from the latest sources in a given domain
Call these scripts from your agents or CI jobs to automate research workflows.
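As a starting point, here is a minimal sketch of the research-topic pattern as a shell function. The app IDs (tavily/search-assistant, openrouter/claude-sonnet-45) are taken from the quick start and assume your inference.sh setup exposes them; jq is assumed to be installed for safe JSON construction:

```shell
# research_topic: hypothetical sketch of research-topic.sh as a function.
# Assumes tavily/search-assistant and openrouter/claude-sonnet-45 are
# available via `infsh` in your environment, and that jq is installed.
research_topic() {
  local topic="$1"
  local query search prompt
  # Build both inputs with jq so the topic and the raw search output
  # are safely JSON-escaped.
  query=$(jq -n --arg q "$topic" '{query: $q}')
  search=$(infsh app run tavily/search-assistant --input "$query")
  prompt=$(jq -n --arg r "$search" \
    '{prompt: ("Summarize the key findings and list cited URLs: " + $r)}')
  infsh app run openrouter/claude-sonnet-45 --input "$prompt"
}
# Usage: research_topic "impact of LLMs on healthcare regulation"
```

The fact-check and briefing scripts follow the same shape; only the query construction and the final prompt change.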
Integrating with AI agents and workflows
ai-rag-pipeline is designed to work with agent frameworks that:
- Can call Bash (the SKILL allows Bash(infsh *))
- Need a RAG step to fetch and ground context before generating answers
Typical integrations:
- AI research assistants that automatically call infsh app run tavily/... before answering
- Fact-checking agents that run a search pipeline before confirming or rejecting claims
- Knowledge base updaters that periodically fetch and summarize the latest news on specific topics
By standardizing on this pattern, your agents can reuse the same RAG logic while changing queries, tools, or models as needed.
FAQ
What is ai-rag-pipeline in simple terms?
ai-rag-pipeline is a small, CLI-focused RAG blueprint that teaches your agents how to:
- Call web search tools through the inference.sh CLI
- Capture the search output
- Feed it into an LLM for grounded, source-based responses
It does not try to be a full framework; instead, it gives you a practical pattern you can customize.
Do I need inference.sh to use ai-rag-pipeline?
Yes. The skill is built around the inference.sh (infsh) CLI. The Quick Start and example commands depend on infsh app run .... If you do not use inference.sh, this skill will not be directly applicable.
Which tools and models can I use with ai-rag-pipeline?
The SKILL description references these families of tools and models, as long as they are exposed through your inference.sh setup:
- Search / retrieval: Tavily Search, Exa Search, Exa Answer
- LLMs via OpenRouter: Claude variants, GPT-4, Gemini (and other OpenRouter routes supported by your account)
Run infsh app list to see the exact apps available in your environment.
Can I use ai-rag-pipeline for fact-checking?
Yes. Fact-checking is one of the main intended uses. A typical flow is:
- Formulate the claim or question
- Use Tavily or Exa to gather multiple sources
- Ask the LLM to compare sources, highlight conflicts, and provide a justified conclusion
Because the answer is grounded in retrieved content, you can inspect and verify the underlying sources.
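The flow above can be sketched as a small shell function. This is only an illustration: the Tavily and OpenRouter app IDs come from the quick start, and jq is assumed to be installed:

```shell
# fact_check: hypothetical sketch of the fact-checking flow above.
# App IDs (tavily/search-assistant, openrouter/claude-sonnet-45) are
# assumptions from the quick start -- check `infsh app list` for the
# names in your environment; jq is assumed installed.
fact_check() {
  local claim="$1"
  local query sources prompt
  query=$(jq -n --arg q "$claim" '{query: $q}')
  sources=$(infsh app run tavily/search-assistant --input "$query")
  prompt=$(jq -n --arg c "$claim" --arg s "$sources" \
    '{prompt: ("Fact-check this claim: " + $c
               + "\n\nSources:\n" + $s
               + "\n\nCompare the sources, highlight conflicting claims, and give a justified verdict with cited URLs.")}')
  infsh app run openrouter/claude-sonnet-45 --input "$prompt"
}
# Usage: fact_check "Some claim you want to verify"
```

To gather multiple source sets, run a second retrieval step (e.g., an Exa-backed app) and concatenate its output into the same prompt before the generation step.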
Is this a full RAG framework with vector databases?
No. ai-rag-pipeline focuses on live web search–driven RAG via inference.sh. It does not configure databases, embeddings, or index management for private corpora. You can, however, combine its patterns with your own indexing layer if your environment exposes those tools through infsh.
How do I debug issues with the pipeline?
If something goes wrong:
- Run each infsh app run ... command separately to confirm it returns valid JSON
- Echo or log the $SEARCH variable to see the raw search output
- Simplify the prompt and add instructions like “show your reasoning and list the URLs you used”
- Consult the upstream SKILL.md for any updated quick-start examples
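The first check (confirming a step returned valid JSON) is easy to script as a guard between pipeline stages. A minimal sketch, assuming jq is installed; the literal value stands in for a real infsh result:

```shell
# Guard sketch: verify a pipeline stage returned parseable JSON before
# chaining it into the next step (jq assumed installed). The literal
# value below stands in for a real `infsh app run ...` result.
SEARCH='{"results": []}'
if printf '%s' "$SEARCH" | jq -e . >/dev/null 2>&1; then
  echo "search output is valid JSON"
else
  echo "search step returned invalid JSON; raw output follows:" >&2
  printf '%s\n' "$SEARCH" >&2
  exit 1
fi
```

Dropping a guard like this between the retrieval and generation steps makes failures surface at the step that broke, rather than as a confusing LLM response.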
Where can I learn more about RAG concepts?
The SKILL file includes a short explanation of RAG as a three-step process: Retrieval → Augmentation → Generation. For deeper conceptual material, use ai-rag-pipeline itself:
```shell
SEARCH=$(infsh app run tavily/search-assistant --input '{"query": "introduction to retrieval augmented generation"}')
infsh app run openrouter/claude-sonnet-45 --input "{
  \"prompt\": \"Using the following sources, explain RAG to a developer audience: $SEARCH\"
}"
```
This lets you bootstrap your RAG learning using the pipeline you plan to deploy.
