pdf

by anthropics

The pdf skill guides PDF processing tasks such as text extraction, merging and splitting, rendering pages to images, and PDF form workflows. It is especially useful for checking fillable fields, extracting form metadata, and validating non-fillable form layouts with scripts.

Stars: 105.1k

Added: Mar 28, 2026

Category: PDF Processing

Install Command

npx skills add anthropics/skills --skill pdf

Curation Score

This skill scores 84/100, which makes it a strong directory listing candidate for agents that need to work with PDFs. Directory users get broad trigger coverage, substantial procedural content, and concrete helper scripts, especially for form filling, so an agent can usually act with less guesswork than with a generic prompt. The main gap is that environment and setup expectations are not fully spelled out in the skill itself.

Strengths

Very strong triggerability: the description explicitly says to use it whenever the user mentions a .pdf or asks to produce one, and names many common PDF tasks.
Operationally useful workflow content: SKILL.md provides examples for core PDF operations, while forms.md gives ordered instructions and command-level steps for fillable vs non-fillable forms.
Real execution leverage from included scripts: the repo ships multiple utilities for checking form fields, extracting structure, converting PDFs to images, validating bounding boxes, and filling forms.

Cautions

Install/runtime requirements are implied rather than clearly packaged: SKILL.md has no install command, even though the skill relies on Python libraries and command-line tooling.
The scope is very broad, but some advanced capabilities are pushed into reference material, so users may still need to choose among libraries and approaches.

Tags: PDF, OCR, Python, CLI, Workflow

Overview of pdf skill

What the pdf skill does

The pdf skill is a practical guide for PDF Processing tasks, with the strongest value in routine operations and form workflows. It helps an agent choose working tools and steps for reading PDFs, extracting text, merging or splitting files, rendering pages to images, and especially filling PDF forms correctly.

Who should install this pdf skill

This pdf skill is best for users who regularly handle PDFs in automation, data entry, document pipelines, or agent workflows. It is a strong fit if you want more than a generic “use a PDF library” answer and need concrete paths for fillable vs non-fillable forms, page rendering, and validation.

Real job-to-be-done

Most users do not need a broad PDF theory guide. They need a dependable way to answer questions like:

“How do I extract text from this PDF?”
“How do I merge or split pages safely?”
“Does this form have actual fillable fields?”
“If not, how do I locate where values should be placed?”
“How do I validate that my field boxes do not overlap?”

This skill is useful because it turns those questions into a workflow instead of leaving the agent to guess.

What makes pdf different from a generic prompt

The main differentiator is form handling discipline. The repository includes dedicated instructions in forms.md and helper scripts such as:

scripts/check_fillable_fields.py
scripts/extract_form_field_info.py
scripts/extract_form_structure.py
scripts/fill_fillable_fields.py
scripts/fill_pdf_form_with_annotations.py
scripts/check_bounding_boxes.py
scripts/create_validation_image.py

That means the pdf guide is not just about libraries; it gives a decision path for forms and validation, which is where many PDF automations fail.

Best-fit and misfit cases

Use the pdf skill when you need actionable instructions for Python-based workflows, image conversion, rendering, or form filling.

It is less compelling if you only need a one-line reminder for a standard library call, or if your stack is entirely outside Python and you do not want to translate examples from reference.md.

How to Use pdf skill

Install context for pdf

Install the skill from the Anthropic skills repository:

npx skills add https://github.com/anthropics/skills --skill pdf

After install, work from the skill directory rather than only skimming the top file, because the most valuable guidance is split across SKILL.md, forms.md, reference.md, and the scripts/ folder.

Read these files first

For a fast adoption path, open files in this order:

SKILL.md
forms.md
reference.md
scripts/check_fillable_fields.py
scripts/extract_form_field_info.py
scripts/fill_fillable_fields.py

Why this order matters:

SKILL.md covers common operations and library direction.
forms.md contains the strict branching logic for form tasks.
reference.md expands into rendering and JavaScript options.
The scripts show the real expected inputs and outputs.

Choose the right workflow before writing code

A good pdf usage pattern starts with task classification:

Text extraction
Page manipulation
Render PDF pages as images
Fill a form
Build a PDF from data

Do this first because form tasks follow a very different path from merge/split/extract tasks. The repository is explicit that form filling should not start with ad hoc code.
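The classification step can be sketched as a small lookup from task type to the files this guide names as starting points. `TaskType` and `starting_points` are hypothetical names used only for illustration, and the entry for building a PDF from data is an assumption, since the guide lists that task without saying which file covers it.

```python
from enum import Enum


class TaskType(Enum):
    TEXT_EXTRACTION = "text extraction"
    PAGE_MANIPULATION = "page manipulation"
    RENDERING = "render pages as images"
    FORM_FILLING = "fill a form"
    CREATION = "build a PDF from data"


# Starting points as described in this guide. The CREATION entry is an
# assumption: the guide names the task but not the file that covers it.
_STARTING_POINTS = {
    TaskType.TEXT_EXTRACTION: ["SKILL.md"],
    TaskType.PAGE_MANIPULATION: ["SKILL.md"],
    TaskType.RENDERING: ["reference.md", "scripts/convert_pdf_to_images.py"],
    TaskType.FORM_FILLING: ["forms.md", "scripts/check_fillable_fields.py"],
    TaskType.CREATION: ["SKILL.md", "reference.md"],
}


def starting_points(task: TaskType) -> list[str]:
    """Return the files to open first for the given task type."""
    return list(_STARTING_POINTS[task])


for task in TaskType:
    print(f"{task.value}: {', '.join(starting_points(task))}")
```

Keeping form filling as its own entry mirrors the point above: it branches to forms.md before any code is written.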

How to handle ordinary PDF operations

For basic PDF Processing, the skill points first to pypdf. That is the default path for:

reading PDFs
counting pages
extracting text
merging files
splitting pages

If your task is “combine these files” or “extract the text page by page,” the examples in SKILL.md are the quickest starting point.
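As a rough illustration of that default path, the sketch below uses pypdf's `PdfReader` and `PdfWriter` for per-page text extraction, merging, and splitting. It is not copied from SKILL.md; the function names and the `part_001.pdf` naming scheme are hypothetical, and pypdf is imported inside each function so the pure planning helper works without it installed.

```python
from pathlib import Path


def page_chunks(num_pages: int, chunk_size: int) -> list[range]:
    """Plan a split as consecutive page-index ranges of at most chunk_size pages."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        range(start, min(start + chunk_size, num_pages))
        for start in range(0, num_pages, chunk_size)
    ]


def extract_text_by_page(pdf_path: str) -> list[str]:
    """Return one string per page; pages without a text layer yield ''."""
    from pypdf import PdfReader  # third-party, imported lazily

    reader = PdfReader(pdf_path)
    return [(page.extract_text() or "") for page in reader.pages]


def merge_pdfs(inputs: list[str], output: str) -> None:
    """Append every page of every input, in order, to a single output file."""
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    for path in inputs:
        for page in PdfReader(path).pages:
            writer.add_page(page)
    with open(output, "wb") as fh:
        writer.write(fh)


def split_pdf(pdf_path: str, out_dir: str, chunk_size: int = 1) -> list[Path]:
    """Write the document out as files of chunk_size pages each."""
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(pdf_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for number, chunk in enumerate(page_chunks(len(reader.pages), chunk_size), start=1):
        writer = PdfWriter()
        for index in chunk:
            writer.add_page(reader.pages[index])
        target = out / f"part_{number:03d}.pdf"  # hypothetical naming scheme
        with open(target, "wb") as fh:
            writer.write(fh)
        written.append(target)
    return written
```

Calling `split_pdf("report.pdf", "out", chunk_size=1)` would write one file per page while preserving the original page order.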

How to handle rendering and image conversion

If your goal is page screenshots, previews, visual inspection, or image-based downstream processing, use the rendering-oriented materials:

reference.md for pypdfium2
scripts/convert_pdf_to_images.py for PNG conversion

This matters when text extraction alone is insufficient, such as scanned PDFs, visual form review, or validating page layout before annotation.
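A minimal rendering sketch with pypdfium2 follows, assuming Pillow is also installed so the bitmap can be saved as PNG. It is an illustration rather than the contents of scripts/convert_pdf_to_images.py, which this guide says uses pdf2image; the function names, the 150 dpi default, and the output file naming are assumptions.

```python
from pathlib import Path

PDF_UNITS_PER_INCH = 72  # default PDF user space: 72 units per inch


def dpi_to_scale(dpi: float) -> float:
    """Convert a target resolution to a render scale (scale 1.0 is 72 dpi)."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return dpi / PDF_UNITS_PER_INCH


def page_image_name(pdf_path: str, page_number: int) -> str:
    """Hypothetical naming scheme: <stem>_page_<NNN>.png, 1-based page numbers."""
    return f"{Path(pdf_path).stem}_page_{page_number:03d}.png"


def render_pages(pdf_path: str, out_dir: str, dpi: float = 150) -> list[Path]:
    """Render every page to a PNG file and return the written paths."""
    import pypdfium2 as pdfium  # third-party, imported lazily

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    pdf = pdfium.PdfDocument(str(pdf_path))
    written = []
    try:
        for index in range(len(pdf)):
            bitmap = pdf[index].render(scale=dpi_to_scale(dpi))
            target = out / page_image_name(pdf_path, index + 1)
            bitmap.to_pil().save(target)  # to_pil() requires Pillow
            written.append(target)
    finally:
        pdf.close()
    return written
```

Recording the dpi used for rendering is worthwhile, because any box located on the image later has to be scaled back to page coordinates.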

The critical branch for PDF forms

For forms, the skill gives a stricter process than generic prompting. Start with:

python scripts/check_fillable_fields.py <file.pdf>

This answers the first decision that blocks many automations:

If the PDF has fillable fields, extract field info and populate those fields directly.
If it does not, use the non-fillable workflow from forms.md, which relies on visual structure and bounding boxes.

Skipping this check is the most common way to waste time, because the two branches rely on different scripts and different inputs.
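One way to wire that first check into a script is sketched below. The command line is the one shown above; the `check_command` and `run_check` helpers and the `skill_dir` parameter are hypothetical. The script's output format is not described here, so the sketch returns its stdout unparsed for a person or agent to read.

```python
import subprocess
import sys
from pathlib import Path


def check_command(pdf_path: str, skill_dir: str = ".") -> list[str]:
    """Build the argv for the fillable-field check named in this guide."""
    script = Path(skill_dir) / "scripts" / "check_fillable_fields.py"
    return [sys.executable, str(script), str(pdf_path)]


def run_check(pdf_path: str, skill_dir: str = ".") -> str:
    """Run the check and return whatever it prints.

    The output is deliberately not parsed: its exact wording is not
    documented in this guide, so any parsing would be guesswork.
    """
    result = subprocess.run(
        check_command(pdf_path, skill_dir),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script fails
    )
    return result.stdout.strip()
```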

Inputs that produce better pdf results

When invoking the pdf skill, provide:

the exact file path or file names
whether the PDF is digital or scanned
the intended output format
whether forms are fillable
whether you need text fidelity, layout fidelity, or visual output
whether you can run Python scripts locally

A weak request:

“Help with this PDF.”

A strong request:

“I need to fill a 6-page government form PDF. First determine whether it has fillable fields. If yes, extract field metadata to JSON. If no, convert pages to images, identify entry regions, and generate a validation image before placing values.”

The stronger version lets the agent trigger the right path immediately.

How to prompt the pdf skill well

A reliable prompt format is:

goal
file(s)
constraints
desired output
validation requirement

Example:

Goal: extract tables and page text from report.pdf
Constraints: Python only, no cloud OCR
Desired output: CSV tables plus a text dump per page
Validation: preserve page numbers and report pages with no text

This is better than just asking for “PDF extraction” because the skill covers multiple methods and quality depends on choosing the correct one.
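If requests are generated programmatically, the five-part format can be kept consistent with a small template. This is only a sketch; the `PdfRequest` class and its field names are hypothetical and not part of the skill.

```python
from dataclasses import dataclass


@dataclass
class PdfRequest:
    goal: str
    files: list[str]
    constraints: str
    desired_output: str
    validation: str

    def render(self) -> str:
        """Render the request as the five labelled lines described above."""
        return "\n".join(
            [
                f"Goal: {self.goal}",
                f"Files: {', '.join(self.files)}",
                f"Constraints: {self.constraints}",
                f"Desired output: {self.desired_output}",
                f"Validation: {self.validation}",
            ]
        )


request = PdfRequest(
    goal="extract tables and page text",
    files=["report.pdf"],
    constraints="Python only, no cloud OCR",
    desired_output="CSV tables plus a text dump per page",
    validation="preserve page numbers and report pages with no text",
)
print(request.render())
```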

Form workflow for fillable PDFs

If the PDF has real fields, the useful next step is:

python scripts/extract_form_field_info.py <input.pdf> <field_info.json>

The extracted JSON includes field IDs, page numbers, rectangles, and field types such as:

text
checkbox
radio_group
choice

This is the practical core of the pdf guide for forms, because it gives structured targets instead of relying on visual guessing.
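Once the metadata is on disk, a quick summary by field type helps decide what values are needed. The sketch below assumes the JSON is a list of objects with `field_id` and `type` keys; those key names and the top-level shape are assumptions, so check the real output of extract_form_field_info.py before relying on them.

```python
import json
from collections import defaultdict


def load_field_info(path: str) -> list[dict]:
    """Load the extracted field metadata (assumed to be a JSON list)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def group_fields_by_type(fields: list[dict]) -> dict[str, list[str]]:
    """Map each field type to the IDs of the fields of that type."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for field in fields:
        # "type" and "field_id" are assumed key names, not confirmed ones.
        grouped[field["type"]].append(field["field_id"])
    return dict(grouped)


# Made-up sample data in the assumed shape.
sample = [
    {"field_id": "full_name", "type": "text", "page": 1},
    {"field_id": "is_citizen", "type": "checkbox", "page": 1},
    {"field_id": "email", "type": "text", "page": 2},
]
print(group_fields_by_type(sample))
```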

Form workflow for non-fillable PDFs

If the PDF is not fillable, forms.md indicates that you must visually determine where values belong. The supporting scripts suggest a workflow like:

convert the PDF to images
infer form structure and bounding boxes
validate box placement
write annotations or filled output

This is slower than fillable-field handling, but the repository gives a more realistic path than “just OCR it.”
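A detail that often matters in this workflow is coordinate conversion: boxes located on a rendered image are in pixels with the origin at the top left, while PDF page coordinates are usually in points with the origin at the bottom left. The helper below is a sketch under that assumption; whether the repository scripts expect image or PDF coordinates is not stated here, so treat the function and its name as hypothetical.

```python
def image_box_to_pdf_rect(
    box: tuple[float, float, float, float],
    image_size: tuple[float, float],
    page_size: tuple[float, float],
) -> tuple[float, float, float, float]:
    """Convert (left, top, right, bottom) in pixels to (x0, y0, x1, y1) in points.

    Assumes the image shows the whole page, unrotated, with a top-left
    origin, and that the PDF page uses a bottom-left origin.
    """
    left, top, right, bottom = box
    image_width, image_height = image_size
    page_width, page_height = page_size
    scale_x = page_width / image_width
    scale_y = page_height / image_height
    # Flip the vertical axis: the image's bottom edge becomes the lower y.
    return (
        left * scale_x,
        page_height - bottom * scale_y,
        right * scale_x,
        page_height - top * scale_y,
    )


# A US Letter page (612 x 792 points) rendered at twice its size.
print(image_box_to_pdf_rect((100, 200, 300, 250), (1224, 1584), (612.0, 792.0)))
```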

Use validation scripts before trusting output

Two scripts materially improve reliability:

scripts/check_bounding_boxes.py
scripts/create_validation_image.py

Use them when working with non-fillable forms or inferred field locations. They help catch overlapping entry areas, label collisions, and placement mistakes before you generate final output.

That is a real advantage of installing this pdf skill: it includes validation helpers, not just transformation code.
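To make the idea of an overlap check concrete, here is an independent sketch of axis-aligned rectangle intersection. It is not the implementation of scripts/check_bounding_boxes.py; the `Box` type and the function names are hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class Box(NamedTuple):
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if the two rectangles share interior area (touching edges do not count)."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def find_overlaps(boxes: list[Box]) -> list[tuple[str, str]]:
    """Return the label pairs of every overlapping pair of boxes."""
    return [
        (a.label, b.label)
        for a, b in combinations(boxes, 2)
        if boxes_overlap(a, b)
    ]


boxes = [
    Box("name", 0, 0, 100, 20),
    Box("date", 90, 10, 150, 30),
    Box("signature", 0, 40, 100, 60),
]
print(find_overlaps(boxes))
```

The pairwise comparison is quadratic in the number of boxes, which is acceptable for the few dozen entry areas on a typical form page.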

Libraries and tool choices inside the skill

The repository’s practical tool split is:

pypdf for standard document operations
pypdfium2 for rendering and image-oriented work
pdf2image in the helper script for conversion to PNG
pdf-lib in reference.md if you prefer JavaScript for creation/manipulation

If you are deciding whether to install this pdf skill, that tool coverage is useful: it is not locked to one library, but it still has a clear default path.

pdf skill FAQ

Is this pdf skill only for form filling?

No. The pdf skill also covers extraction, merge/split operations, rendering, creation, and general PDF manipulation. But form workflows are where it adds the most decision value over an ordinary prompt.

Is pdf good for beginners?

Yes, if you can run Python scripts. The best beginner path is to start with SKILL.md for simple operations, then use forms.md only when your task is actually a form. The scripts reduce guesswork, but they do assume a local Python environment and basic command-line comfort.

What does this skill do better than a normal LLM prompt?

It gives a concrete workflow for branching between fillable and non-fillable PDFs, plus validation tooling. A normal prompt may suggest libraries; this skill shows when to inspect fields, when to render pages, and how to verify bounding boxes.

When should I not use this pdf guide?

Do not rely on this pdf guide if:

you need a fully packaged end-user app rather than a skill/workflow
you cannot execute local scripts
you need advanced OCR-first pipelines beyond what the repository explicitly supports
you want a single opinionated production framework instead of a mixed-reference toolkit

Does pdf support JavaScript too?

Partly. The main workflow is Python-first, but reference.md includes pdf-lib examples for JavaScript. If your team is JS-native, the skill still helps with concepts and task decomposition, but the strongest operational support is in Python.

Can this skill handle scanned PDFs?

Partially. It can help render pages to images and structure workflows around visual processing. But scanned PDFs often require OCR or visual placement logic, so results depend heavily on document quality and your chosen downstream tools.

How to Improve pdf skill

Start with the right PDF diagnosis

The best way to improve pdf usage is to classify the document before acting:

text-based vs scanned
fillable vs non-fillable
document extraction vs form completion
visual fidelity vs text fidelity

Most failures come from choosing the wrong path, not from bad code syntax.
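The text-based versus scanned question can be approximated from extracted text alone. The heuristic below is a sketch, not part of the skill: the function names, the 20-character minimum, and the 80% threshold are all assumptions that should be tuned on real documents.

```python
def pages_lacking_text(page_texts: list[str], min_chars: int = 20) -> list[int]:
    """Return 1-based numbers of pages whose extracted text is nearly empty."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars
    ]


def looks_scanned(
    page_texts: list[str], min_chars: int = 20, threshold: float = 0.8
) -> bool:
    """Guess that a document is scanned if most pages yield almost no text."""
    if not page_texts:
        return False
    lacking = pages_lacking_text(page_texts, min_chars)
    return len(lacking) / len(page_texts) >= threshold


texts = ["", "  \n", "Invoice 2024-001 total due: 1,250.00 USD"]
print(pages_lacking_text(texts))
print(looks_scanned(texts))
```

The input is one string per page, for example the result of calling `extract_text()` on each page with pypdf.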

Provide stronger task inputs

Better inputs produce better outputs. Include:

sample file name
number of pages
whether there are tables, forms, or signatures
whether you need editable output or just extracted data
the exact fields to fill, preferably as a JSON mapping

For forms, this is much better than a prose list because the scripts and workflows naturally map to structured data.
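A structured mapping also makes it easy to compare the values you intend to fill against the fields the form actually has. The sketch below assumes the field metadata is a list of objects with a `field_id` key; that key name and the `compare_mapping` helper are hypothetical.

```python
def compare_mapping(
    values: dict[str, object], fields: list[dict]
) -> tuple[list[str], list[str]]:
    """Return (unknown_ids, unfilled_ids) for a value mapping.

    unknown_ids: keys in the mapping that match no field in the form.
    unfilled_ids: form fields for which the mapping supplies no value.
    """
    known = {field["field_id"] for field in fields}  # assumed key name
    unknown = sorted(set(values) - known)
    unfilled = sorted(known - set(values))
    return unknown, unfilled


# Made-up sample data in the assumed shape.
fields = [
    {"field_id": "full_name", "type": "text"},
    {"field_id": "is_citizen", "type": "checkbox"},
    {"field_id": "email", "type": "text"},
]
values = {"full_name": "Ada Lovelace", "e_mail": "ada@example.com"}
print(compare_mapping(values, fields))
```

Catching a misspelled field ID such as `e_mail` at this stage is cheaper than discovering an empty field in the finished PDF.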

Validate before scaling up

Do not test on 200 PDFs first. Run the pdf skill on one representative file and inspect:

extracted text quality
field metadata completeness
page image rendering
bounding box overlap warnings
final visual output

This small-batch validation catches the errors that become expensive later.

Common failure modes in pdf workflows

Watch for these:

assuming a PDF is fillable without checking
using text extraction on scanned files and getting near-empty output
writing field values without first inspecting field IDs and field types
skipping validation images for non-fillable forms
treating rendering output as if it were structured text extraction

These are exactly the areas where the repository’s scripts help.

Improve prompts by asking for the full workflow

A better prompt for the pdf skill asks the agent to:

identify the document type
select the library/tool path
show intermediate outputs
validate before finalizing

Example:
“Use the pdf skill to inspect application.pdf. First check if it has fillable fields. If yes, extract field metadata and propose a JSON payload for completion. If no, convert each page to images, identify entry regions, generate a validation image for page 1, and only then suggest the filling approach.”

This kind of prompt improves both accuracy and trust.

Iterate after the first output

If the first result is weak, do not just ask for “better.” Ask for a narrower correction:

“Re-run using rendered images because text extraction returned little content.”
“List all checkbox and radio fields separately.”
“Generate validation overlays for pages 2 and 3.”
“Preserve original page order and output one file per page.”

Specific iteration requests make the pdf skill much more effective than broad retries.

Use repository scripts as truth anchors

When agent output and document reality differ, trust the repository scripts over freeform reasoning. For this skill, the scripts are the strongest source of operational truth because they define expected inputs, field structures, and validation checks.

Know the adoption tradeoff

Installing the pdf skill is worth it if PDF forms, layout-sensitive workflows, or repeated document handling are part of your work. If your use case is only occasional page merging, a generic prompt may be enough. The skill pays off most when you need repeatable, validated PDF processing rather than one-off advice.
