Gemini Omni API

Gemini Omni API is a video generation interface for Google's latest multimodal model family. Developers and creative teams use it to turn text prompts or static images into short, high-fidelity video clips without running their own GPU hardware. Video synthesis happens in the cloud behind a single endpoint.

The system supports several input modes, including text-to-video, image-to-video, and a three-image fusion mode. Fusion mode combines a scene (background), a character, and a product into one coherent motion shot. The API is useful for social media marketing, rapid prototyping for film, or adding dynamic content to landing pages.

Key features

Resolution options of 720p, 1080p, and 4K
Flexible clip durations of 4, 6, 8, or 10 seconds
Three-image fusion mode to combine scene, character, and product
Single API endpoint for all video generation tasks
Asynchronous processing with task ID polling
Credit-based billing where you only pay for completed files
Support for motion prompts to guide how images are animated
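
The limits listed above can be checked on the client before any credits are spent. The following is a minimal sketch, assuming a JSON request body; the field names (prompt, resolution, duration, image_urls) and the helper name build_generation_payload are hypothetical and are not taken from the official API reference.

```python
# Documented limits from the feature list; the payload field names are assumptions.
ALLOWED_RESOLUTIONS = {"720p", "1080p", "4K"}
ALLOWED_DURATIONS = {4, 6, 8, 10}  # seconds
MAX_REFERENCE_IMAGES = 3           # scene, character, product


def build_generation_payload(prompt, resolution="720p", duration=4, image_urls=None):
    """Validate options against the documented limits and return a request body.

    The keys of the returned dict are hypothetical placeholders.
    """
    image_urls = list(image_urls or [])
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {sorted(ALLOWED_DURATIONS)}")
    if len(image_urls) > MAX_REFERENCE_IMAGES:
        raise ValueError("at most three reference images are supported")
    if not prompt and not image_urls:
        raise ValueError("provide a text prompt, image URLs, or both")

    payload = {"prompt": prompt, "resolution": resolution, "duration": duration}
    if image_urls:
        payload["image_urls"] = image_urls
    return payload
```

Rejecting an unsupported duration such as 5 seconds locally avoids a round trip to the service for a request that could not succeed.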

How to use

Sign up for a reAPI account and generate an API key from the dashboard.
Add your API key to the authorization header of each request.
Send a POST request to the video generations endpoint with your prompt or image URLs.
Capture the task ID returned in the initial response to track progress.
Poll the task status endpoint until the status shows as completed.
Download the final video file from the provided URL once the task finishes.
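
The submit-and-poll flow in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the documented client: the base URL, the paths /video/generations and /tasks/{id}, the Bearer scheme, the response fields task_id, status, and video_url, and the status value failed are all hypothetical. Only the completed status comes from the steps above.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.example.com/v1"  # placeholder host, not the real one


def _request(method, path, api_key, body=None):
    """Send an authenticated JSON request and return the decoded JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",  # auth scheme is an assumption
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def submit_task(api_key, payload, request=_request):
    """POST the generation request and return the task ID (assumed field name)."""
    resp = request("POST", "/video/generations", api_key, payload)
    return resp["task_id"]


def wait_for_video(api_key, task_id, request=_request,
                   interval=5.0, max_polls=120, sleep=time.sleep):
    """Poll the task until it completes and return the video URL."""
    for _ in range(max_polls):
        resp = request("GET", f"/tasks/{task_id}", api_key)
        status = resp.get("status")
        if status == "completed":
            return resp["video_url"]  # assumed field name
        if status == "failed":        # assumed failure status
            raise RuntimeError(f"task {task_id} failed: {resp}")
        sleep(interval)
    raise TimeoutError(f"task {task_id} not completed after {max_polls} polls")
```

The request and sleep parameters are injectable so the flow can be exercised with a stub instead of live calls. Capping the number of polls keeps a stuck task from blocking the caller indefinitely.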

Use cases

Animating a single product photo for a social media advertisement.
Creating a 4K hero video for a website header using a text prompt.
Fusing a character reference and a background image into a consistent scene.
Generating short B-roll clips for professional video editing workflows.

Pricing

Costs are calculated per generation based on resolution and length. A 4-second 720p clip starts at $0.18 (180 credits), while a 10-second 4K video costs $0.48 (480 credits). Check the official website for current pricing.
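
Both quoted prices imply the same rate of 1,000 credits per US dollar (180 credits for $0.18, 480 credits for $0.48). The sketch below uses that inferred rate to convert credits to dollars and to estimate how many clips a budget covers. The rate is an inference from these two data points, not a published figure, and the function names are illustrative.

```python
from decimal import Decimal

# Inferred from the two quoted prices; not an officially stated rate.
CREDITS_PER_USD = 1000


def credits_to_usd(credits):
    """Convert a credit amount to US dollars at the inferred rate."""
    return Decimal(credits) / Decimal(CREDITS_PER_USD)


def clips_within_budget(budget_usd, credits_per_clip):
    """Return how many whole clips a dollar budget covers."""
    budget_credits = int(Decimal(str(budget_usd)) * CREDITS_PER_USD)
    return budget_credits // credits_per_clip


print(credits_to_usd(180))           # 0.18
print(clips_within_budget(10, 480))  # 20
```

At the quoted prices, a $10 budget covers 20 ten-second 4K clips, since 10,000 credits divided by 480 leaves a remainder too small for a 21st clip.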

FAQ

What is Gemini Omni API?

It is a video generation interface for Google DeepMind's multimodal models that creates clips of 4 to 10 seconds from text or images.

Is Gemini Omni API free?

You get free credits upon signing up to test the service, but subsequent generations require purchasing credits.

Does it support audio?

No, the output currently has no audio track, so any sound must be added in post-production.

Can I use multiple reference images?

Yes, the API supports up to three reference images to control scene, character, and product details simultaneously.

What is the maximum resolution?

The API can generate video at 4K resolution for high-fidelity production needs.

How long does it take to generate a video?

Generation time varies with resolution and clip length. Because processing is asynchronous, you poll the task ID until the status shows as completed.
