Multimodal

Multimodal skills and workflows surfaced by the site skill importer.

4 skills

gemini-interactions-api

by google-gemini

Use the gemini-interactions-api skill to build Gemini API code for chat, multimodal prompts, streaming, structured output, tool use, and image generation. It also helps with migration from older generateContent patterns and provides practical guidance for API Development in Python and TypeScript.

API Development

Favorites 0, GitHub 3.4k

azure-ai-contentunderstanding-py

by microsoft

azure-ai-contentunderstanding-py is the Python skill for Azure AI Content Understanding. It extracts structured content from documents, images, audio, and video for RAG workflows and automation. Use it when you need reliable multimodal extraction, Azure authentication, and repeatable pipeline-ready output.

RAG Workflows

Favorites 0, GitHub 2.2k

azure-ai-vision-imageanalysis-java

by microsoft

azure-ai-vision-imageanalysis-java helps you build Java image analysis apps with Azure AI Vision. Use it for captioning, OCR, object detection, tagging, people detection, and smart cropping. It also covers SDK setup, authentication, and examples for API development.

API Development

Favorites 0, GitHub 2.2k

transform-generate-image-with-transloadit

by transloadit

transform-generate-image-with-transloadit is a one-off image generation skill that creates a local image file from a text prompt, or from a prompt plus reference images, using the transloadit CLI. Use it for fast, prompt-driven image generation with clear output-path control and optional model selection.

Image Generation

Favorites 0, GitHub 0