kreuzberg

by kreuzberg-dev

The kreuzberg skill helps you install and use Kreuzberg for document extraction across 91+ formats, including PDFs, Office files, images, HTML, email, and archives. It covers Python, Node.js/TypeScript, Rust, and CLI workflows for OCR, tables, metadata, batch processing, and practical parsing guidance.


Added: May 9, 2026

Category: PDF Processing

Install Command

npx skills add kreuzberg-dev/kreuzberg --skill kreuzberg

Curation Score

This skill scores 91/100, which means it is a strong listing candidate for directory users: it is highly triggerable, covers a broad real workflow, and provides enough operational detail for an agent to install and use it with relatively little guesswork. The repository clearly explains when to use Kreuzberg, how to install it across multiple runtimes, and where to look for deeper API/CLI/reference guidance.


Strengths

Explicit, actionable trigger: extract text, tables, metadata, and images from 91+ formats across Python, Node.js/TypeScript, Rust, and CLI.
Strong operational coverage: installation, sync/async extraction, configuration, batch processing, OCR, error handling, and plugins are all called out in the skill description and references.
Good progressive disclosure: multiple reference files provide language-specific APIs, CLI commands, configuration, supported formats, and advanced features.

Cautions

Some install paths are spread across many references, so first-time adopters may need to read beyond SKILL.md to choose the right runtime and feature set.
No install command in SKILL.md itself, so users relying on the skill file alone may need to consult the references for exact setup details and feature flags.

Tags: Python, Node.js, TypeScript, Rust, CLI, API, MCP, Documents


Overview of the kreuzberg skill

What kreuzberg does

The kreuzberg skill helps you use Kreuzberg to extract text, tables, metadata, images, and OCR-backed content from 91+ document formats, with native support for Python, Node.js/TypeScript, Rust, and a CLI. It is best for people who need reliable document processing code, not just a one-off prompt that guesses at parsing.

Who should install it

Install kreuzberg if your task is to turn PDFs, Office files, images, HTML, email, archives, or academic files into structured output, especially when scan quality, batch runs, or language-specific OCR matter. It is a strong fit for ingestion pipelines, document search, RAG prep, and extraction tooling.

Why it is different

The main value of the kreuzberg skill is that it is implementation-oriented: it covers install paths, extraction modes, config, batch processing, error handling, and plugins across multiple runtimes. That makes it more useful than a generic “analyze this document” prompt when you need code you can actually run.

How to Use the kreuzberg skill

Install and confirm the target runtime

For a fast kreuzberg install, start from the runtime you will actually ship:

pip install kreuzberg
npm install @kreuzberg/node
cargo install kreuzberg-cli

Then read the matching API reference first: references/python-api.md, references/nodejs-api.md, or references/rust-api.md. If you are using the CLI, begin with references/cli-reference.md. The skill is most effective when you choose one runtime and one document type first instead of asking for everything at once.
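A minimal Python starting point might look like the sketch below. It assumes the `kreuzberg` package exposes a synchronous `extract_file_sync(path, mime_type=...)` function whose result carries the extracted text in a `.content` attribute; confirm both against references/python-api.md for your installed version. The `extract_text` wrapper and its injectable `extractor` parameter are hypothetical, added so the wrapper can be exercised without the library installed.

```python
from typing import Any, Callable, Optional


def extract_text(
    path: str,
    mime_type: Optional[str] = None,
    extractor: Optional[Callable[..., Any]] = None,
) -> str:
    """Extract the text of one file (hypothetical wrapper).

    Assumes kreuzberg provides extract_file_sync(path, mime_type=...) and that
    the returned result exposes the extracted text as `.content`.
    """
    if extractor is None:
        # Lazy import: only needed when no custom extractor is supplied.
        from kreuzberg import extract_file_sync

        extractor = extract_file_sync
    # Pass a MIME hint only when one is given; otherwise let the library
    # detect the file type itself.
    kwargs = {"mime_type": mime_type} if mime_type else {}
    result = extractor(path, **kwargs)
    return result.content
```

With kreuzberg installed, `extract_text("invoice.pdf")` relies on automatic type detection, while `extract_text("upload.bin", mime_type="application/pdf")` passes an explicit hint.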

Turn a rough request into a usable prompt

A good kreuzberg usage prompt names the file type, extraction goal, runtime, and constraints. For example: “Use kreuzberg in Python to extract invoice text, tables, and OCR from scanned PDFs, keep line breaks, and return JSON suitable for downstream parsing.” That is better than “extract data from PDFs” because it tells the skill whether to optimize for tables, OCR, or clean text.
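The four elements of that example prompt can be captured as a small structure, so a request that leaves out the file type, goal, or runtime fails early instead of producing a vague prompt. `ExtractionRequest` is a hypothetical helper, not part of kreuzberg; it only assembles prompt text.

```python
from dataclasses import dataclass


@dataclass
class ExtractionRequest:
    """The elements a good kreuzberg prompt names (hypothetical helper)."""

    file_type: str
    goal: str
    runtime: str
    constraints: str = ""

    def to_prompt(self) -> str:
        # File type, goal, and runtime are required; constraints are optional.
        missing = [
            name
            for name in ("file_type", "goal", "runtime")
            if not getattr(self, name).strip()
        ]
        if missing:
            raise ValueError(f"missing: {', '.join(missing)}")
        prompt = (
            f"Use kreuzberg in {self.runtime} to extract {self.goal} "
            f"from {self.file_type}"
        )
        if self.constraints:
            prompt += f", {self.constraints}"
        return prompt + "."
```

For instance, `ExtractionRequest("scanned PDFs", "invoice text, tables, and OCR", "Python", "keep line breaks, and return JSON suitable for downstream parsing").to_prompt()` reproduces the example prompt word for word.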

Read these files first

For practical work with kreuzberg, read the files in this order: SKILL.md, references/configuration.md, the runtime API file, and references/supported-formats.md. Then open references/advanced-features.md if you need plugins, OCR tuning, or batch behavior. This order surfaces the decisions that most affect adoption: install shape, supported inputs, and configuration defaults.

Use the workflow that matches your job

If you are processing one file, start with a simple extract_file or CLI extract call, then add MIME hints or config only if the output is wrong. If you are processing many files, check the batch helpers and error handling early. For PDF processing with kreuzberg, OCR settings and output format usually matter more than the base extraction call, so validate those before you scale up.
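When you move from one file to many, a loop that records failures per file instead of stopping at the first error makes problem formats easy to spot. This is a generic sketch: `extract_many` and the `extract_one` callable are hypothetical names, and kreuzberg's own batch helpers (see the runtime API reference) may be the better choice once the single-file path works.

```python
from typing import Callable, Dict, List, Tuple


def extract_many(
    paths: List[str],
    extract_one: Callable[[str], str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run extract_one over every path, collecting failures instead of aborting.

    extract_one is any single-file function, for example a wrapper around
    kreuzberg's single-file API (hypothetical here).
    """
    results: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for path in paths:
        try:
            results[path] = extract_one(path)
        except Exception as exc:  # keep going; record what failed and why
            errors[path] = f"{type(exc).__name__}: {exc}"
    return results, errors
```

Reviewing `errors` on a small sample before scaling up shows whether failures cluster by format, by scan quality, or by OCR settings.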

kreuzberg skill FAQ

Is kreuzberg only for PDFs?

No. PDF is a major use case, but the skill also covers Office documents, images, HTML, email, archives, and academic formats. If your workload is mixed-format ingestion, kreuzberg is a better fit than a PDF-only tool.

Do I need to know the library before using the skill?

No, but you do need to know your target runtime and output goal. The kreuzberg skill is beginner-friendly if you can describe the document type, whether OCR is needed, and whether you want plain text, markdown, JSON, or structured metadata.

When should I not use kreuzberg?

Skip kreuzberg if your task is primarily semantic summarization rather than extraction, or if you only need a quick manual prompt for a single document with no code output. It may also be overkill if your pipeline does not need OCR, tables, or multi-format support.

How is it different from a normal prompt?

A normal prompt can describe the task, but the kreuzberg skill is about getting the right install, API call, configuration, and failure handling for document extraction. That makes it the better choice when output quality depends on runtime setup, OCR backend choice, or batch processing details.

How to Improve the kreuzberg skill

Provide the input shape upfront

The best kreuzberg skill results come from prompts that specify file type, source quality, and desired output. Include details like “scanned PDF,” “digital PDF,” “invoice tables,” “email attachments,” or “preserve headings.” Those details change whether OCR, chunking, or plain extraction is the right default.
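One way to make the input shape explicit is to record it as a few flags and derive the starting steps from them. The `InputShape` fields and the step ordering in `default_pipeline` are a hypothetical heuristic for illustration, not defaults documented by kreuzberg.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InputShape:
    """Facts about the input worth stating up front (hypothetical fields)."""

    scanned: bool  # "scanned PDF" rather than "digital PDF"
    feeds_search_or_rag: bool  # output will be split for indexing or retrieval
    preserve_headings: bool = False  # e.g. "preserve headings"


def default_pipeline(shape: InputShape) -> List[str]:
    """Return ordered starting steps; a heuristic, not a kreuzberg rule."""
    # Image-only scans need OCR to yield text; digital files usually do not.
    steps = ["ocr" if shape.scanned else "plain extraction"]
    if shape.preserve_headings:
        # Markdown keeps heading structure that plain text flattens.
        steps.append("markdown output")
    if shape.feeds_search_or_rag:
        steps.append("chunking")
    return steps
```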

State the failure mode you want to avoid

If your first output is poor, tell the skill what went wrong: missing tables, broken line breaks, slow OCR, bad language detection, or noisy images. For PDF processing, this helps narrow down whether to adjust the OCR backend, the configuration, or the output format instead of rewriting the whole workflow.
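The failure modes named in that paragraph can be kept as a small triage table that points at the first lever to revisit. The mapping below is an assumption for illustration; a real fix for a given file may involve more than one lever.

```python
# Hypothetical triage table: symptom -> first lever to revisit.
FIRST_LEVER = {
    "missing tables": "configuration",
    "broken line breaks": "output format",
    "slow ocr": "ocr backend",
    "bad language detection": "configuration",
    "noisy images": "ocr backend",
}


def next_adjustment(symptom: str) -> str:
    """Map a reported failure to the first setting to change, or ask for detail."""
    # Normalize case and surrounding whitespace before looking up the symptom.
    return FIRST_LEVER.get(
        symptom.strip().lower(), "describe the failure more concretely"
    )
```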

Iterate with concrete examples

A stronger improvement loop is to paste one failing file description and one target result, such as: “This scanned invoice should produce invoice number, total, vendor, and line items in JSON.” That is more useful than asking to “make it more accurate,” because the skill can tune extraction advice to the actual schema and document type.
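To make the target result checkable from one iteration to the next, a small validator can report which target fields a run failed to fill. The field names below are hypothetical snake_case versions of the example's invoice number, total, vendor, and line items, and the function assumes the extraction step already produced a JSON object.

```python
import json
from typing import List

# Hypothetical target schema taken from the example request; adjust to yours.
REQUIRED_FIELDS = ("invoice_number", "total", "vendor", "line_items")


def missing_fields(raw_json: str) -> List[str]:
    """Return required fields that are absent or empty in an extraction result."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = []
    for field in REQUIRED_FIELDS:
        value = data.get(field)
        # Treat None, "" and [] as unfilled; numeric zero counts as present.
        if value is None or value == "" or value == []:
            missing.append(field)
    return missing
```

An empty list means the run matched the target shape; anything else is a concrete failure to report back instead of asking to "make it more accurate."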

Start narrow, then expand

Begin with one runtime, one format, and one extraction mode. Once the base kreuzberg install and extraction path are working, add batch processing, plugins, or advanced configuration. This reduces confusion and makes it easier to verify whether the problem is installation, OCR, or downstream parsing.


More skills

pdf

by anthropics

The pdf skill guides PDF Processing tasks like text extraction, merge and split operations, rendering pages to images, and PDF form workflows. It is especially useful for checking fillable fields, extracting form metadata, and validating non-fillable form layouts with scripts.

PDF Processing

GitHub: 105.1k

azure-ai-document-intelligence-ts

by microsoft

azure-ai-document-intelligence-ts is a TypeScript skill for extracting text, tables, key-value fields, and structured data with Azure Document Intelligence. Use it for OCR Extraction from invoices, receipts, IDs, and forms, or when you need prebuilt and custom model workflows in Node.js with Azure REST SDK authentication.

OCR Extraction

GitHub: 2.3k

azure-ai-contentunderstanding-py

by microsoft

azure-ai-contentunderstanding-py is the Python skill for Azure AI Content Understanding. It extracts structured content from documents, images, audio, and video for RAG workflows and automation. Use it when you need reliable multimodal extraction, Azure authentication, and repeatable pipeline-ready output.

RAG Workflows

GitHub: 2.2k

azure-ai-document-intelligence-dotnet

by microsoft

azure-ai-document-intelligence-dotnet helps .NET developers install and use Azure AI Document Intelligence to extract text, tables, key-value pairs, and structured fields from invoices, receipts, IDs, and custom documents. It includes practical setup, authentication, and OCR Extraction guidance for reliable document analysis.

OCR Extraction

GitHub: 2.2k

nutrient-document-processing

by PSPDFKit-labs

nutrient-document-processing is a workflow skill for PDF Processing with Nutrient DWS. It helps you install, understand, and use repeatable document workflows for convert, merge, split, OCR, extract, redact, sign, optimize, and compliance outputs like PDF/A or PDF/UA.

PDF Processing

GitHub: 0

visa-doc-translate

by affaan-m

visa-doc-translate translates visa application document images to English and creates a bilingual PDF with the original page and translation. It is built for structured visa paperwork, OCR fallback, rotation handling, and preserving names, dates, and amounts.

Translation

GitHub: 156.3k

nutrient-document-processing

by affaan-m

nutrient-document-processing is a skill for PDF processing and document automation with the Nutrient DWS API. It can convert, OCR, extract, redact, sign, watermark, and fill files such as PDFs, DOCX, XLSX, PPTX, HTML, and images.

PDF Processing

GitHub: 156.2k

hv-analysis

by KKKKhazix

hv-analysis is a horizontal-vertical research skill for turning a product, company, concept, technology, or person into a structured analysis report. Use it for deep research, competitive comparison, and report-ready output, especially for data analysis or a polished PDF workflow.

Data Analysis

GitHub: 9k

azure-ai-formrecognizer-java

by microsoft

The azure-ai-formrecognizer-java skill helps Java developers use Azure AI Document Intelligence for OCR extraction, tables, key-value pairs, invoices, receipts, IDs, and custom document models. It aligns with the current com.azure:azure-ai-documentintelligence SDK and is useful when you need practical Java setup, API guidance, and repeatable document analysis.

OCR Extraction

GitHub: 2.2k

markitdown

by K-Dense-AI

markitdown converts files and office documents to Markdown for easier reading, chunking, search, and LLM workflows. The skill supports PDF, DOCX, PPTX, XLSX, HTML, CSV, JSON, XML, ZIP, EPUB, images with OCR, and audio transcription, making it a practical choice for format conversion.

Format Conversion

GitHub: 0

analyzing-malicious-pdf-with-peepdf

by mukul975

analyzing-malicious-pdf-with-peepdf is a static malware analysis skill for suspicious PDFs. Use peepdf, pdfid, and pdf-parser to triage phishing attachments, inspect objects, extract embedded JavaScript or shellcode, and review suspicious streams safely without execution.

Malware Analysis

GitHub: 0

analyzing-pdf-malware-with-pdfid

by mukul975

analyzing-pdf-malware-with-pdfid is a PDF malware triage skill for detecting embedded JavaScript, exploit markers, object streams, attachments, and suspicious actions before opening a file. It supports static analysis for malicious PDF investigation, incident response, and security audit workflows.

Security Audit

GitHub: 0

pdf

by openai

Use the pdf skill for PDF Processing tasks where layout, pagination, and rendered output matter. It helps you read, create, edit, and review PDFs with a visual-first workflow: render pages, inspect the result, then adjust. Use it when you need reliable, practical guidance for producing accurate PDFs.

PDF Processing

GitHub: 0

pdf

by K-Dense-AI

The pdf skill is a practical guide for PDF Processing when you need to read, extract, transform, or create PDF files in a workflow you can ship. It covers text extraction, merging, splitting, rotation, form filling, encryption, image extraction, and OCR for scanned PDFs. Use it when you need a repeatable pdf guide instead of a one-off prompt.

PDF Processing

GitHub: 0

Resume Formatter

by Paramchoudhary

Resume Formatter helps turn rough resumes into clean, ATS-friendly documents with clear hierarchy, balanced spacing, and professional structure. It is useful for resume writing, job applications, and redesigns that need to stay readable on screen and on paper.

Resume Writing

GitHub: 443

minimax-pdf

by MiniMax-AI

The minimax-pdf skill helps you create, fill, or reformat polished PDFs when visual quality and document identity matter. Use it for CREATE, FILL, or REFORMAT workflows with a token-based design system that turns rough input into print-ready output. It covers installation, usage, and choosing the right route for better results.

PDF Processing

GitHub: 0