Grok Imagine Video 1.5
Image-to-Video Only · 30 Copy-Ready Prompts · Preview · June 2026

Grok Imagine Video 1.5 Prompt Guide:
Write Prompts That Actually Work

Grok Imagine Video 1.5 Preview is xAI's image-to-video model that animates still images into short videos with synchronized audio. The model already sees your uploaded image — your prompt should focus on motion, camera, atmosphere, and audio. This guide follows the official image-to-video prompt structure with copy-ready examples and the mistakes to avoid.

Director's 5 Rules
30 Copy-Ready Examples
Camera + Audio Guide
1–15s · 480p/720p

// Image-to-Video Prompt Formula

The Image-to-Video Prompt Structure

Grok Imagine Video 1.5 is image-to-video only. Your uploaded image provides the scene — your prompt describes what should change: motion, camera, atmosphere, and audio.

// The Formula

1 Motion · 2 Camera · 3 Atmosphere · 4 Audio
Step 1

Motion / Action

What should change — the action, movement, or transformation. Be specific about degree and speed.

  • turns head to the right and smiles
  • car racing past at high speed
  • steam rising gently
Step 2

Camera Movement

How the camera moves — use standard cinematic language the model understands.

  • gentle camera push-in
  • camera orbiting at eye level
  • slow pan left
Step 3

Atmosphere / Lighting

Mood, time of day, or light quality — not what's already visible in the image.

  • soft morning window light
  • golden hour warmth
  • cozy kitchen mood
Step 4

Audio

Native audio is generated automatically. Describe music, SFX, ambience, or dialogue.

  • soft room tone
  • footsteps on gravel
  • upbeat electronic music
// Complete Example
1 · Motion: The woman slowly turns her head to the right and smiles, soft breeze moving her hair.
2 · Camera: Gentle camera push-in.
3 · Atmosphere: Soft morning window light, cozy mood.
4 · Audio: Quiet room tone with faint birdsong.
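For readers who assemble prompts in scripts, the four-step formula can be sketched as a small helper. This is a minimal illustration, not an official schema: the `PromptParts` class and `build_prompt` function are hypothetical names, and the only rule taken from the guide is the slot order (motion, camera, atmosphere, audio).

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    """The four slots of the formula. Field names are illustrative only."""
    motion: str
    camera: str = ""
    atmosphere: str = ""
    audio: str = ""


def build_prompt(parts: PromptParts) -> str:
    """Join the non-empty slots in formula order: motion, camera, atmosphere, audio."""
    if not parts.motion.strip():
        raise ValueError("motion is required: the prompt must say what should change")
    sentences = []
    for text in (parts.motion, parts.camera, parts.atmosphere, parts.audio):
        text = text.strip()
        if not text:
            continue  # optional slot left empty
        # End each slot with a period so the slots read as separate sentences.
        if text[-1] not in ".!?'\"":
            text += "."
        sentences.append(text)
    return " ".join(sentences)


example = PromptParts(
    motion="The woman slowly turns her head to the right and smiles, soft breeze moving her hair",
    camera="Gentle camera push-in",
    atmosphere="Soft morning window light, cozy mood",
    audio="Quiet room tone with faint birdsong",
)
print(build_prompt(example))
```

Run as written, this prints the complete example above as one paragraph. Keeping the slots separate makes it easy to change one of them later without retyping the rest.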

// Camera Movements

Cinematic Camera Language

Grok Imagine Video 1.5 understands standard cinematic camera terminology. Always specify a shot type and camera movement in your image-to-video prompts.

Movement | What it does
Pan left / right | Camera rotates horizontally to reveal a scene
Tilt up / down | Camera rotates vertically for dramatic reveals
Zoom in / out | Lens zooms closer or further
Dolly in / out | Camera physically moves forward or backward (more cinematic than zoom)
Tracking / follow shot | Camera follows a moving subject
Orbit / surround | Camera circles around the subject
Aerial / drone | Elevated bird's-eye perspective
Handheld | Natural shake for documentary feel or urgency
Slow push-in | Gradual forward movement to build tension
Static / tripod | No camera movement for stable, formal compositions

// Audio Prompts

Native Audio Generation

Grok Imagine Video 1.5 generates audio natively alongside the video. Mention music, sound effects, ambience, or short dialogue in your prompt.

Background music

  • with upbeat electronic music
  • dramatic orchestral score

Sound effects

  • footsteps on gravel
  • wind howling
  • engine revving

Ambient audio

  • quiet café ambience
  • forest sounds with birdsong

Short dialogue

  • a quiet whisper: 'We made it.'
  • urgent shout: 'Stop him!'
Tip: You can add an AUDIO: section at the end of your prompt for clarity. This helps separate visual and audio instructions.
// AUDIO: Section Example
Close-up of hands pulling apart a warm cinnamon roll, steam rising, soft morning window light, slow camera push-in, cozy kitchen mood. AUDIO: soft room tone, faint kettle hiss, gentle pastry tear sound, a quiet satisfied whisper: 'Perfect.'
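If you generate prompts programmatically, the AUDIO: convention from the tip can be sketched as a helper that keeps visual and audio instructions apart until the last step. The function name `with_audio_section` is hypothetical. The only behavior taken from the guide is appending an `AUDIO:` section at the end of the prompt.

```python
def with_audio_section(visual: str, audio_cues: list[str]) -> str:
    """Append audio cues to a visual prompt as a trailing 'AUDIO:' section."""
    visual = visual.strip()
    cues = [cue.strip() for cue in audio_cues if cue.strip()]
    if not cues:
        return visual  # nothing to add; leave the visual prompt as it is
    # Close the visual part with a period so 'AUDIO:' starts a new sentence.
    if visual and visual[-1] not in ".!?":
        visual += "."
    return f"{visual} AUDIO: {', '.join(cues)}"


prompt = with_audio_section(
    "Close-up of hands pulling apart a warm cinnamon roll, steam rising, "
    "soft morning window light, slow camera push-in, cozy kitchen mood",
    [
        "soft room tone",
        "faint kettle hiss",
        "gentle pastry tear sound",
        "a quiet satisfied whisper: 'Perfect.'",
    ],
)
print(prompt)
```

This reproduces the example prompt above. Passing the cues as a list makes it simple to drop or swap a single sound between generations.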

// Prompt Keywords

Prompt Keyword Library

Combine motion, camera, atmosphere, and audio terms for image-to-video prompts.

Motion & Action

Camera Movement

Atmosphere & Lighting

Audio (Native Generation)

// 30 Copy-Ready Prompts

Prompt Examples by Scene

All examples are motion-focused for image-to-video: upload your image first, then paste a prompt and customize it for your scene.

The sneaker rotates smoothly on the pedestal, camera orbiting at eye level, dramatic spotlight sweeping across the surface.

Slow 360-degree rotation. Studio lighting sweeping across surface. Subtle electronic hum.

Static shot, steam rising from cup. Natural kitchen sounds with distant conversation.

Water droplets falling on watch face in slow motion. Dramatic rim light sweeping. Orchestral swell building.

Product slowly tilts forward revealing details. Clean studio lighting. Quiet ambient tone.

// Bad vs Good Prompts

What Makes a Prompt Actually Work

The fastest way to learn is to see what doesn't work next to what does. These image-to-video prompt comparisons show exactly what to fix — motion, camera, and audio instead of re-describing your image.

Re-describing the image
Bad Prompt

A woman with brown hair and blue dress walking on a beach at sunset with waves

Good Prompt

Slow pull-back as she walks forward. Ocean breeze moving her hair. Ambient wave sounds.

Vague motion
Bad Prompt

Car passing

Good Prompt

Car racing past at high speed. Static wide shot. Engine revving loudly.

Contradicting the image (the source photo shows a man)
Bad Prompt

A woman dances gracefully

Good Prompt

The man slowly nods and smiles. Gentle camera push-in. Soft room tone.

No camera direction
Bad Prompt

Steam rising from the coffee cup

Good Prompt

Static close-up, steam rising gently. Morning window light. Quiet café ambience.

Negative prompts
Bad Prompt

No blur, avoid shaking, without grain

Good Prompt

Sharp focus, stable tripod shot, clean cinematic look.

// Common Mistakes

7 Prompt Mistakes That Kill Your Output

These are the most frequent problems we see with Grok Imagine Video 1.5 prompts. Each mistake leads to blurry, inconsistent, or off-target results — and each has a simple fix.

Mistake 1

Re-describing the image

The model already sees it. Describing what's in the photo wastes prompt budget and can cause the output to drift from your source image.

Fix: Focus on motion, camera movement, atmosphere, and audio.

Mistake 2

Contradicting the source image

Writing actions or subjects that don't match the uploaded photo confuses the output.

Fix: Match your prompt to what's actually in the image.

Mistake 3

Tag stacking

"knight, castle, epic, 8K, cinematic" doesn't help — the model needs intent, not keywords.

Fix: Write a natural sentence with clear motion and camera direction.

Mistake 4

Too many simultaneous actions

Multiple unrelated actions at once produce inconsistent results.

Fix: Keep it to one subject, one action, one camera move — or list multi-beat actions in order.

Mistake 5

No camera direction

Without camera direction, the model defaults to static or unpredictable motion.

Fix: Always specify a shot type and camera movement.

Mistake 6

Vague motion

"The thing moves" gives the model nothing to work with.

Fix: Use specific verbs with intensity modifiers — 'racing past at high speed' not 'passing.'

Mistake 7

Using negative prompts

"No blur", "avoid shaking" — the model ignores negative instructions entirely.

Fix: Describe what you want instead.
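Some of these mistakes can be caught before you generate. The sketch below is a rough heuristic checker, not part of the model or any official tool. The word lists, thresholds, and the `lint_prompt` name are all assumptions made for illustration, and it covers only mistakes 3, 5, 6, and 7. Mistakes that depend on the uploaded image (1 and 2) cannot be detected from the text alone.

```python
import re

# Assumed word lists, drawn from the examples in this guide; extend as needed.
NEGATIVE_PATTERN = re.compile(r"\b(no|not|avoid|without|don't|never)\b", re.IGNORECASE)
CAMERA_PATTERN = re.compile(
    r"\b(pan|tilt|zoom|dolly|tracking|follow|orbit|orbiting|aerial|drone|"
    r"handheld|push-in|pull-back|static|tripod)\b",
    re.IGNORECASE,
)


def lint_prompt(prompt: str) -> list[str]:
    """Return heuristic warnings for a prompt; an empty list means nothing was flagged."""
    warnings = []

    # Mistake 7: negative phrasing such as "no blur" or "avoid shaking".
    if NEGATIVE_PATTERN.search(prompt):
        warnings.append("negative phrasing: describe what you want instead")

    # Mistake 5: no recognizable camera term.
    if not CAMERA_PATTERN.search(prompt):
        warnings.append("no camera direction: add a shot type and camera movement")

    # Mistake 3: four or more comma-separated fragments of at most two words each.
    fragments = [f.strip() for f in prompt.split(",") if f.strip()]
    if len(fragments) >= 4 and all(len(f.split()) <= 2 for f in fragments):
        warnings.append("tag stacking: write a natural sentence instead of keywords")

    # Mistake 6: fewer than four words rarely carries a specific verb and intensity.
    if len(prompt.split()) < 4:
        warnings.append("vague motion: use a specific verb with an intensity modifier")

    return warnings


for candidate in (
    "No blur, avoid shaking, without grain",
    "knight, castle, epic, 8K, cinematic",
    "Car passing",
    "Car racing past at high speed. Static wide shot. Engine revving loudly.",
):
    print(candidate, "->", lint_prompt(candidate))
```

The negative-word check is deliberately blunt and will also flag quoted dialogue such as 'No way!', so treat the warnings as prompts to reread your text rather than as hard errors.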

// Think Like a Director

Write Prompts for Image-to-Video

Think like a director — your image is the scene. Write about motion, camera, and audio, not description. Every generation requires an input image.

1

Don't re-describe the image

The model sees it. Tell it what should change — the action, the camera movement, the atmosphere.

2

Don't contradict the image

If there's a man in the photo, don't write 'a woman dances.' Match your prompt to what's actually there.

3

Be specific about motion

'Car passing' is vague — 'car racing past at high speed' gives the model something to work with.

4

Anchor the subject

Mention prominent features: 'the old man wearing glasses' or 'the woman in the red jacket.'

5

Negative prompts don't work

The model ignores them. Describe what you want instead.

// What You Can Make

Use Cases for Image-to-Video

Grok Imagine Video 1.5 animates still images into short videos with synchronized audio. It handles both visual generation and audio synthesis in one pass.

Product Showcases

Transform static product photography into dynamic demonstrations. A watch photo becomes a luxury ad with an elegant wrist turn. A sneaker shot gets a 360-degree rotation with dramatic lighting.

Character Animation

Turn illustrated characters into smooth animations. The model understands cartoon physics and exaggerated motion, creating professional-quality animation that would typically require an entire animation team.

Portrait Videos

Animate professional headshots into video introductions with natural human motion. The model handles realistic facial expressions, head turns, and body language.

Creative Projects

Bring concept art to life, animate historical photos, or turn memes into short video clips with appropriate sound effects and music.

// FAQ

Frequently Asked Questions

Common questions about writing prompts for Grok Imagine Video 1.5 — covering length, structure, audio, camera control, and consistency.

Q: Can Grok Imagine Video 1.5 generate video from a text prompt alone?

No. Grok Imagine Video 1.5 Preview is image-to-video only, so every generation requires an input image. Write prompts focused on motion, camera movement, atmosphere, and audio rather than describing the scene from scratch.

Q: How long should a prompt be?

Keep it short: 1–3 sentences focusing on motion, camera movement, atmosphere, and audio. The image already provides the visual context, so you don't need to describe what's in the photo.

Q: Should I describe what's in my image?

No. The model already sees your uploaded image. Describing what's in the photo wastes prompt budget and can cause the output to drift from your source. Focus only on what should change.

Q: How do I add audio?

Include audio descriptions in your prompt: 'ambient rain sounds', 'orchestral swell', 'footsteps on gravel'. Grok Imagine Video 1.5 generates synchronized audio automatically. You can add an 'AUDIO:' section at the end of your prompt for clarity.

Q: Can I control the camera?

Yes. Use standard cinematic language: 'slow push-in', 'aerial drone shot', 'handheld tracking', 'orbit around subject', 'static tripod shot'. The model understands professional camera terminology.

Q: What prompt structure should I use?

[Motion/Action] + [Camera Movement] + [Atmosphere] + [Audio]. Example: 'Slow zoom out, leaves falling gently. Ocean breeze moving her hair. Ambient wave sounds.' Don't describe what's in the image; focus on what should change.

Q: How do I get consistent results?

Keep prompts simple: one subject, one action, one camera move. Use specific verbs with intensity modifiers. Iterate in small steps, changing one element at a time. Shorter clips (5–8 seconds) are more stable than 15-second clips. Match aspect ratio to your platform.
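The advice to iterate in small steps, changing one element at a time, can be sketched as a helper that produces prompt variants differing from a base prompt in exactly one slot. The function name `single_change_variants` and the slot keys are hypothetical, chosen to mirror the four-part formula used in this guide.

```python
def single_change_variants(
    base: dict[str, str], alternatives: dict[str, list[str]]
) -> list[dict[str, str]]:
    """Return copies of `base` that each differ from it in exactly one slot."""
    variants = []
    for slot, options in alternatives.items():
        if slot not in base:
            raise KeyError(f"unknown slot: {slot}")
        for option in options:
            if option == base[slot]:
                continue  # identical to the base, so not a real change
            variant = dict(base)  # copy so the base prompt stays untouched
            variant[slot] = option
            variants.append(variant)
    return variants


base = {
    "motion": "Steam rising gently",
    "camera": "Static close-up",
    "atmosphere": "Morning window light",
    "audio": "Quiet café ambience",
}
alternatives = {
    "camera": ["Slow push-in", "Gentle camera orbit at eye level"],
    "audio": ["Soft room tone"],
}
variants = single_change_variants(base, alternatives)
for variant in variants:
    print(". ".join(variant.values()) + ".")
```

Because every variant changes one slot only, any difference in the generated clip can be attributed to that slot, which is the point of iterating in small steps.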

// Ready to Generate

Put Your Prompts to Work

Upload a still image, paste a motion-focused prompt from this guide, and generate a short clip with native audio — no API key required on this site.
