Seedance 2.0 via Kie.ai API: How to Use ByteDance's Multimodal Video Model (2026)

Q: What makes Seedance 2.0 different from Sora 2 or Veo 3.1?

Seedance 2.0 is the first frontier video model that accepts all four modalities in a single request — text, image, video, and audio — with up to twelve reference files at once. It generates native audio in the same pass as video (no post-production lip-sync), supports multi-shot prompting with consistent characters, and ships clips of four to fifteen seconds. On paper its 720p pricing via Kie.ai is the cheapest of the three.

Q: Seedance 2.0 or Seedance 2.0 Fast — which should I use?

Standard for production-quality video where motion, lip-sync, and character consistency matter. Fast for iteration and batch generation where you are testing prompts or running a lot of variations. Fast only supports 480p and 720p, not 1080p, and finishes in about four minutes vs five minutes for standard.

Q: Can I upload my own video as a reference?

Yes. Seedance 2.0 accepts up to three reference videos per request with a combined length of fifteen seconds, plus up to nine reference images and three reference audio clips. This is the unlock for filmmakers — you can record a rough shot on your phone and have Seedance replicate the blocking, camera path, and lighting in a cinematic version.

Q: What's the best prompt structure for multi-shot sequences?

Give each shot its own line. Describe the shot type (close-up, wide, tracking), the action, the camera movement, and the lighting per shot. Name the character once and use the same handle across every shot so the model locks on. Keep total duration under fifteen seconds for a single-clip multi-shot; for longer sequences, chain multiple generations with the same character reference image.
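Sequences longer than one clip have to be split into generations that each stay under the fifteen-second ceiling. Here is a minimal greedy sketch in Python. It assumes each shot is a (seconds, description) pair. `chunk_shots` is a hypothetical helper of our own, not part of Kie.ai or any SDK:

```python
from typing import List, Tuple

MAX_CLIP_SECONDS = 15  # Seedance 2.0 per-clip ceiling quoted in this article

Shot = Tuple[int, str]  # (duration in seconds, shot description)


def chunk_shots(shots: List[Shot], limit: int = MAX_CLIP_SECONDS) -> List[List[Shot]]:
    """Greedily pack consecutive shots into clips whose total duration <= limit."""
    clips: List[List[Shot]] = []
    current: List[Shot] = []
    total = 0
    for seconds, desc in shots:
        if seconds > limit:
            raise ValueError(f"shot of {seconds}s exceeds the {limit}s clip limit")
        # Start a new clip when this shot would push the current one over the limit.
        if current and total + seconds > limit:
            clips.append(current)
            current, total = [], 0
        current.append((seconds, desc))
        total += seconds
    if current:
        clips.append(current)
    return clips
```

Each inner list would become one createTask call. Pass the same character image in reference_image_urls each time so identity carries across the chained clips. Greedy packing keeps shots in order and never splits one shot across clips. For this use case, that matters more than minimizing the number of generations.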

Key Takeaways

Seedance 2.0 is a multimodal-first video model — text, image, video, and audio all feed the same prompt.
Native audio in a single pass — synced dialogue, music, and ambient soundscapes without a second pipeline.
Multi-shot prompting — one prompt can produce five to fifteen consistent shots with a locked character.
Via Kie.ai: from $0.0575/s at 480p. A 10-second 720p shot with a video reference is $1.25.
Two tiers: Standard (up to 1080p) and Fast (up to 720p, ~4 min generation).

Table of Contents

What Seedance 2.0 actually is
Six capabilities that matter
Why Kie.ai is the cleanest access point
Prompt recipes that consistently work
The API walkthrough — createTask to download
Pricing, tiers, and the Fast vs Standard call
Seedance 2.0 vs Sora 2 vs Veo 3.1
Watch the full walkthrough
FAQ


ByteDance launched Seedance 2.0 in February 2026 and quietly moved the video-generation frontier. It is the first major video model that accepts all four modalities in one request (text, image, video, and audio), with up to twelve reference files at once. Audio generates natively alongside the picture. Multi-shot prompts stay consistent across fifteen shots. A ten-second 720p clip with a video reference costs $1.25 via Kie.ai. We spent the week since launch putting it through real production work and writing down what actually happens.

Sketch-note overview of Seedance 2.0's feature landscape: multimodal inputs, native audio, and multi-shot prompting.

Ship with Seedance 2.0 today

Try Seedance 2.0 on Kie.ai — 5,000 free credits

One API key for Seedance 2.0, Veo 3.1, Suno, Midjourney, Nano Banana Pro, and dozens more — at 30-80% below official rates.

What Seedance 2.0 actually is

Seedance 2.0 is ByteDance's unified multimodal audio-video model. The unusual word in that sentence is unified. Most video models take one input shape. Runway takes a prompt plus an image. Veo takes a prompt plus an image. Sora mostly takes a prompt. Seedance takes all four — text, image, video, and audio — in a single request, and it composes them into one clip.

In practice that means you can upload a phone clip of your dog, a photo of your dog from two years ago, a short audio snippet of the bark you want, and a prompt that says "my dog runs through a snowy Alpine village, golden hour, dolly-back shot". The model renders a cinematic version where the dog looks right, moves right, and barks at the right moment. None of the other frontier video models can take all four of those at once.

Seedance 2.0 on seed.bytedance.com, ByteDance's in-house research page for the unified multimodal audio-video model.

The catch: ByteDance's own web interface is limited to China. International access happens through third-party platforms. Replicate has it. fal has it. Kie.ai has it — and Kie is where we run it, because the billing, pricing, and documentation are the least painful of the set.

Six capabilities that matter in production

After the first week of real usage, here are the six features that change how we work. Everything else is details.

The six capabilities (multimodal input, native audio, 15-second duration, multi-shot, reference control, cinematic motion), ranked by how often they change our shot-list decisions.

1. Multimodal input in a single request

Up to nine reference images, three reference videos (combined fifteen seconds), three reference audio files (combined fifteen seconds). Text prompt on top. The model references everything with an @-mention system, so you can name specific uploads in the prompt — "follow the camera path of @ref_video_1, apply the lighting from @ref_image_3".
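The caps interact. Nine images, three videos, and three audio files add up to fifteen, but the per-request total is twelve. That makes it worth checking a request before spending credits on it.

Here is a sketch that validates the limits quoted above and checks that every @-mention resolves. It assumes `@ref_image_2` means the second URL in reference_image_urls. That is our reading of the mention convention, not documented behavior. `build_input` is a hypothetical helper, and the caller must supply clip durations:

```python
import re
from typing import Dict, Sequence

# Per-request limits as described in this article.
MAX_IMAGES, MAX_VIDEOS, MAX_AUDIOS, MAX_TOTAL = 9, 3, 3, 12
MAX_VIDEO_SECONDS = MAX_AUDIO_SECONDS = 15

# Assumption: @ref_<kind>_<n> refers to the n-th (1-based) upload of that kind.
MENTION = re.compile(r"@ref_(image|video|audio)_(\d+)")


def build_input(prompt: str,
                images: Sequence[str] = (),
                videos: Sequence[str] = (),
                audios: Sequence[str] = (),
                video_seconds: Sequence[float] = (),
                audio_seconds: Sequence[float] = ()) -> Dict[str, object]:
    """Validate reference limits and @-mentions, then return an 'input' dict."""
    if len(images) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} reference images")
    if len(videos) > MAX_VIDEOS:
        raise ValueError(f"at most {MAX_VIDEOS} reference videos")
    if len(audios) > MAX_AUDIOS:
        raise ValueError(f"at most {MAX_AUDIOS} reference audio files")
    if len(images) + len(videos) + len(audios) > MAX_TOTAL:
        raise ValueError(f"at most {MAX_TOTAL} reference files in total")
    if sum(video_seconds) > MAX_VIDEO_SECONDS:
        raise ValueError("reference videos exceed 15 s combined")
    if sum(audio_seconds) > MAX_AUDIO_SECONDS:
        raise ValueError("reference audio exceeds 15 s combined")
    # Every mention in the prompt must point at an upload that actually exists.
    counts = {"image": len(images), "video": len(videos), "audio": len(audios)}
    for kind, n in MENTION.findall(prompt):
        if not 1 <= int(n) <= counts[kind]:
            raise ValueError(f"@ref_{kind}_{n} does not match an upload")
    return {
        "prompt": prompt,
        "reference_image_urls": list(images),
        "reference_video_urls": list(videos),
        "reference_audio_urls": list(audios),
    }
```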

2. Native audio that syncs on first try

Set generate_audio: true and the model produces a clip where dialogue matches mouth movement, footsteps land when feet hit the ground, and background music follows the emotional arc of the scene. The alternative is to generate silent video, run a TTS pass, run a lip-sync pass, and then layer music. That is three additional tools and three places for things to break. With Seedance, the audio is part of the same render.

3. Multi-shot consistency in one prompt

Write five or ten shots in one prompt and the model keeps the character's face, clothes, and voice locked across every shot. Lighting stays consistent within a scene. Camera logic — who is where relative to whom — carries over. This is the single biggest practical unlock. Earlier video models would drift character identity across shots; you had to regenerate each shot individually against a reference image.

4. Flexible four-to-fifteen second duration

Most social clips want six seconds. Most cinematic shots want eight to twelve. Fifteen is plenty for a multi-shot sequence that makes narrative sense. Pass the exact duration you want as an integer in seconds.

5. Reference-driven workflow instead of prompt guessing

If you want a specific camera move — an orbit, a whip pan, a match cut — upload a four-second reference video of that move on anything (a YouTube clip, a phone recording) and tell Seedance to replicate the movement. No more reverse-engineering cinematic terminology into a prompt string. Show, don't tell.

6. Physics-aware motion

Objects have weight. A falling knife falls at knife speed. A ball bouncing on grass stops faster than a ball bouncing on concrete. This was the most embarrassing failure mode in earlier models — objects would glide, float, or teleport — and it is largely fixed.

Quick-recall flashcards — Seedance 2.0 fundamentals

Which company developed the Seedance 2.0 multimodal video model?

ByteDance

Seedance 2.0 is described as a _____ model because text, image, video, and audio all feed the same prompt.

multimodal-first

How many total reference files can be included in a single Seedance 2.0 request?

Up to twelve

What four input modalities can Seedance 2.0 accept simultaneously in one request?

Text, image, video, and audio

The ability of Seedance 2.0 to generate synced dialogue and music in the same render is called _____.

native audio

How many consistent shots can a single multi-shot prompt produce in Seedance 2.0?

Five to fifteen shots

Why Kie.ai is the cleanest access point

Seedance 2.0 is available on a handful of platforms. We use Kie.ai for day-to-day work for three reasons: the unified API shape, the predictable USD pricing, and the fact that you can route Seedance alongside Veo 3.1, Suno, Midjourney, Flux, and Nano Banana Pro through the same API key and billing line.

Kie.ai's Seedance 2.0 page, showing pricing tiers, model variants, and input parameters. The Playground is the fastest way to test before wiring up the API.

If you already use Kie.ai for Veo or Suno, Seedance 2.0 is a drop-in addition — same /api/v1/jobs/createTask endpoint, same Bearer auth, same recordInfo polling shape. We've been writing about Kie.ai as an API gateway since launch; if you want the full platform-level review, read our Kie.ai review from earlier this month.

Kie.ai's catalog of frontier AI APIs. Seedance 2.0 joined Veo, Suno, Midjourney, Flux, Nano Banana Pro, and Runway Aleph.

Prompt recipes that consistently work

Three prompt patterns account for nine out of ten of the Seedance clips we've produced this week. None of them are secret — they are just what the model rewards.

Recipe one — the bullet-time freeze

A single action frozen mid-motion while the camera orbits. The ingredient that makes it work: a specific action verb, a specific orbit direction, and a reference image of the subject mid-action. Example:

A goalkeeper dives horizontally to block a shot, frozen mid-air at the
apex. Camera orbits 180 degrees counter-clockwise around the keeper at
the moment of contact with the ball. Stadium floodlights, wet turf
reflecting. After three seconds of frozen orbit, time resumes and the
keeper crashes to the grass. Duration: 8 seconds. Native audio: crowd
roar building, impact thud, grass scuff.
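The three ingredients map naturally onto slots, which keeps the structure intact when you swap subjects. Here is a sketch. `bullet_time_prompt` and its parameter names are hypothetical, and the 4-15 second check mirrors the duration range quoted earlier:

```python
def bullet_time_prompt(action: str, orbit_degrees: int, direction: str,
                       setting: str, freeze_seconds: int, resume: str,
                       duration: int, audio: str) -> str:
    """Compose a bullet-time prompt from the recipe's ingredients."""
    # The recipe depends on an explicit orbit direction, so refuse vague values.
    if direction not in ("clockwise", "counter-clockwise"):
        raise ValueError("orbit direction must be 'clockwise' or 'counter-clockwise'")
    if not 4 <= duration <= 15:
        raise ValueError("duration must be 4-15 seconds")
    if freeze_seconds >= duration:
        raise ValueError("the freeze must end before the clip does")
    return (
        f"{action}, frozen at the peak of the motion. "
        f"Camera orbits {orbit_degrees} degrees {direction} around the subject. "
        f"{setting}. "
        f"After {freeze_seconds} seconds of frozen orbit, time resumes and {resume}. "
        f"Duration: {duration} seconds. Native audio: {audio}."
    )
```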

Recipe two — the multi-shot sequence

Numbered shots, one per line, with the character's name repeated in every shot. Specify shot type, action, camera, and lighting for each. Example from a breakfast ad we shot:

Character: @ref_image_1 — a woman in her 30s, messy hair, white t-shirt.

Shot 1 (2s): Close-up of a coffee mug sliding across a wooden table.
  Kitchen morning light. Camera locked.
Shot 2 (2s): Overhead on the woman pouring oat milk. Light streaming
  through the window. Camera slowly zooming in.
Shot 3 (3s): Medium shot, she takes the first sip. Warm smile. Camera
  gentle dolly out.
Shot 4 (2s): Close-up of her eyes opening wider, almost surprised.
  Natural window light.
Shot 5 (3s): Cut to black, title card "GOOD MORNING".

Duration: 12 seconds. Native audio: ambient kitchen, wooden clinks,
soft upbeat piano entering at shot 3.
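Recipe two's layout is regular enough to generate. The sketch below renders numbered shots in the same format as the breakfast prompt and refuses shot lists longer than one clip. `multi_shot_prompt` is a hypothetical helper, and the fifteen-second ceiling is the per-clip limit quoted in this article:

```python
from typing import List, Tuple


def multi_shot_prompt(character: str, shots: List[Tuple[int, str]], audio: str,
                      max_seconds: int = 15) -> str:
    """Render numbered shots in the Recipe two layout and enforce the clip ceiling."""
    total = sum(seconds for seconds, _ in shots)
    if total > max_seconds:
        raise ValueError(f"{total}s of shots exceeds the {max_seconds}s clip limit")
    lines = [f"Character: {character}", ""]
    for i, (seconds, text) in enumerate(shots, start=1):
        lines.append(f"Shot {i} ({seconds}s): {text}")
    # The stated duration is derived from the shots, so the two can't disagree.
    lines += ["", f"Duration: {total} seconds. Native audio: {audio}"]
    return "\n".join(lines)
```

Deriving the duration line from the shot list removes one common mistake: a prompt whose per-shot timings don't add up to the duration you request.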

Recipe three — the reference-driven VFX shot

Upload a reference video of the camera move or VFX you want, then describe the new scene. Seedance copies the motion and applies it to your subject. This is the pattern we use most for product shots and brand film.
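In API terms, the motion clip goes in reference_video_urls, the subject goes in reference_image_urls, and the prompt points at each by mention. Below is a sketch of the request body. It reuses the field names from the createTask example in the API walkthrough. The URLs, the prompt wording, and `reference_shot_body` are placeholders of our own:

```python
def reference_shot_body(motion_ref_url: str, subject_ref_url: str, scene: str,
                        duration: int = 8) -> dict:
    """Request body asking Seedance to copy the motion of one uploaded clip."""
    if not 4 <= duration <= 15:
        raise ValueError("duration must be 4-15 seconds")
    # Hypothetical prompt wording; assumes @ref_video_1 / @ref_image_1 refer to
    # the first URL in each reference list.
    prompt = (f"Replicate the camera movement of @ref_video_1 exactly. "
              f"Subject: @ref_image_1. Scene: {scene}")
    return {
        "model": "bytedance/seedance-2",
        "input": {
            "prompt": prompt,
            "reference_image_urls": [subject_ref_url],
            "reference_video_urls": [motion_ref_url],
            "reference_audio_urls": [],
            "generate_audio": True,
            "resolution": "720p",
            "aspect_ratio": "16:9",
            "duration": duration,
        },
    }
```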

A useful LLM trick: take any Seedance prompt that works, paste it into Claude or ChatGPT, and ask for a variation where only the setting changes. The structure is what the video model rewards; the words inside are replaceable. We've reused the breakfast template above for six different products and got usable output every time.
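For simple swaps you can skip the LLM entirely: keep the shot skeleton fixed and substitute only the nouns. The sketch below uses Python's standard `string.Template`. The three-shot skeleton is a shortened, hypothetical version of the breakfast prompt, not the one we shipped:

```python
from string import Template

# Fixed shot structure; only the $-slots change between variations.
AD_TEMPLATE = Template(
    "Shot 1 (2s): Close-up of $product sliding across $surface. $light. Camera locked.\n"
    "Shot 2 (3s): Medium shot, the character reaches for $product. Camera gentle dolly out.\n"
    "Shot 3 (2s): Cut to black, title card \"$title\"."
)


def variation(product: str, surface: str, light: str, title: str) -> str:
    """Fill the fixed shot structure with a new product and setting."""
    # substitute() raises KeyError if a slot is left unfilled.
    return AD_TEMPLATE.substitute(product=product, surface=surface,
                                  light=light, title=title)
```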

Five steps from idea to downloadable MP4: image first, cinematic prompt, add references, POST to Kie, then poll and download. The image-first step is where most people get it wrong.

Quick-recall flashcards — prompts & pricing

What is the specific duration range for clips generated by Seedance 2.0?

Four to fifteen seconds

In multimodal inputs, what is the maximum number of reference images allowed?

Nine reference images

In multimodal inputs, what is the maximum number of reference videos allowed?

Three reference videos

In multimodal inputs, what is the maximum number of reference audio files allowed?

Three reference audio files

What is the combined maximum duration allowed for reference video files?

Fifteen seconds

What is the combined maximum duration allowed for reference audio files?

Fifteen seconds

The API walkthrough — createTask to download

Seedance 2.0 on Kie.ai follows the platform's standard task pattern: POST a job, poll for the result, and download the video URL once state becomes success. Two endpoints total.

POST https://api.kie.ai/api/v1/jobs/createTask
Authorization: Bearer YOUR_KIE_API_KEY
Content-Type: application/json

{
  "model": "bytedance/seedance-2",
  "input": {
    "prompt": "A goalkeeper dives horizontally to block a shot...",
    "reference_image_urls": ["https://.../keeper.jpg"],
    "reference_video_urls": [],
    "reference_audio_urls": [],
    "generate_audio": true,
    "resolution": "720p",
    "aspect_ratio": "16:9",
    "duration": 8,
    "first_frame_url": "",
    "last_frame_url": ""
  }
}

The response carries the task ID in data.taskId. Poll for the result:

GET https://api.kie.ai/api/v1/jobs/recordInfo?taskId=<taskId>
Authorization: Bearer YOUR_KIE_API_KEY

// Response when complete:
{
  "code": 200,
  "data": {
    "state": "success",
    "resultJson": "{\"resultUrls\":[\"https://cdn.kie.ai/videos/....mp4\"]}"
  }
}

Standard mode finishes in about five minutes, Fast in about four. Note that data.resultJson is itself a JSON-encoded string: parse it, take the first URL from resultUrls, and download.
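The nested encoding is where people trip up. resultJson arrives as a string inside the JSON body, so it has to be decoded a second time. Here is a small sketch of that step, assuming the response shape shown above. `first_result_url` is a hypothetical helper:

```python
import json


def first_result_url(record: dict) -> str:
    """Pull the first video URL out of a recordInfo response body."""
    data = record.get("data") or {}
    if data.get("state") != "success":
        raise ValueError(f"task not finished: state={data.get('state')!r}")
    # resultJson is a JSON document stored as a string, so decode it again here.
    result = json.loads(data["resultJson"])
    urls = result.get("resultUrls") or []
    if not urls:
        raise ValueError("no resultUrls in resultJson")
    return urls[0]
```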

Kie.ai's documentation hub (getting started, auth, rate limits). The same task pattern applies to every model on the platform.

A minimal Python client covering submit, poll, and download fits in about sixty lines:

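import json
import os
import time

import requests

BASE = "https://api.kie.ai/api/v1/jobs"
KEY = os.environ["KIE_API_KEY"]
H = {"Authorization": f"Bearer {KEY}", "Content-Type": "application/json"}


def submit(prompt, duration=8, reference_images=None, generate_audio=True):
    """Create a Seedance 2.0 task and return its taskId."""
    body = {
        "model": "bytedance/seedance-2",
        "input": {
            "prompt": prompt,
            "reference_image_urls": reference_images or [],
            "reference_video_urls": [],
            "reference_audio_urls": [],
            "generate_audio": generate_audio,
            "resolution": "720p",
            "aspect_ratio": "16:9",
            "duration": duration,
        },
    }
    resp = requests.post(f"{BASE}/createTask", json=body, headers=H, timeout=30)
    resp.raise_for_status()
    r = resp.json()
    # The body carries its own status code (200 on success, as in the example above).
    if r.get("code") != 200:
        raise RuntimeError(f"createTask failed: {r}")
    return r["data"]["taskId"]


def wait(task_id, timeout=600, interval=10):
    """Poll recordInfo until the task succeeds or fails; return the first video URL."""
    t0 = time.time()
    while time.time() - t0 < timeout:
        time.sleep(interval)
        resp = requests.get(f"{BASE}/recordInfo", params={"taskId": task_id},
                            headers=H, timeout=20)
        resp.raise_for_status()
        data = resp.json().get("data") or {}
        if data.get("state") == "success":
            # resultJson is a JSON string nested inside the JSON response.
            return json.loads(data["resultJson"])["resultUrls"][0]
        if data.get("state") == "fail":
            raise RuntimeError(data.get("failMsg"))
    raise TimeoutError(f"task {task_id} not finished after {timeout}s")


def download(url, path):
    """Stream the finished MP4 to disk and return the local path."""
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)
    return path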
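Fast is pitched at batch iteration, and each task takes minutes, so waiting on variations one at a time wastes wall-clock time. The sketch below submits every prompt and then polls concurrently with a thread pool. It takes `submit` and `wait` as parameters, so the client functions above fit, and it can be tested offline. `run_batch` is a hypothetical helper. Kie.ai's concurrency limits are not modeled here, so check the rate-limit docs before raising `max_workers`:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def run_batch(prompts: List[str],
              submit: Callable[[str], str],
              wait: Callable[[str], str],
              max_workers: int = 4) -> List[Tuple[str, str]]:
    """Submit every prompt, then wait on all tasks in parallel.

    Returns (prompt, video_url) pairs in the same order as `prompts`.
    """
    # createTask calls return quickly, so submit sequentially.
    task_ids = [submit(p) for p in prompts]
    # Each wait() blocks while polling; threads let the waits overlap.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        urls = list(pool.map(wait, task_ids))
    return list(zip(prompts, urls))
```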

Reference — Seedance 2.0 Standard vs Fast on Kie.ai

Feature	Seedance 2.0 Standard	Seedance 2.0 Fast
Max resolution	1080p	720p
Generation time	~5 minutes	~4 minutes
Price at 480p with video reference	$0.0575 per second	$0.0575 per second
Price at 720p with video reference	$0.125 per second	$0.125 per second
Price at 1080p with video reference	$0.31 per second	✗ Not supported
Max clip duration	15 seconds	15 seconds
Native audio support	✓ Yes	✓ Yes
Recommended use case	Production-quality video where motion, lip-sync, and character consistency matter	Iteration and batch generation

Pricing, tiers, and the Fast vs Standard call

Kie.ai's pricing has a quirk worth understanding. There are two price points per resolution: with a video reference and without. Counterintuitively, the with-video rate is the cheaper one. Our best explanation is that Kie bills input and output separately in that mode. Without a reference video, the model has to build all the motion from scratch.

Full Kie.ai pricing breakdown for Seedance 2.0 at 480p, 720p, and 1080p, with and without a video reference. All numbers in USD per second, pulled from Kie.ai on 2026-04-20.

Working math on real clips:

6s Instagram reel at 720p with reference: $0.75
10s YouTube Short at 1080p without reference: $5.10
15s multi-shot storytelling at 720p with reference: $1.88
15s pilot pitch clip at 1080p without reference: $7.65
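The arithmetic behind these numbers is per-second price times duration. Below is a sketch of a cost estimator using the USD/s figures quoted in this article (pulled 2026-04-20). It assumes Fast costs the same as Standard at every listed rate, including the text-only rates the Fast column doesn't list. `clip_cost` is a hypothetical helper:

```python
# USD per second on Kie.ai as quoted in this article; key = (resolution, has_video_reference).
PRICE_PER_SECOND = {
    ("480p", True): 0.0575, ("480p", False): 0.095,
    ("720p", True): 0.125, ("720p", False): 0.205,
    ("1080p", True): 0.31, ("1080p", False): 0.51,
}


def clip_cost(seconds: int, resolution: str, video_reference: bool,
              tier: str = "standard", topup_bonus: float = 0.0) -> float:
    """Estimated USD cost of one clip; assumes Fast and Standard share prices."""
    if tier not in ("standard", "fast"):
        raise ValueError("tier must be 'standard' or 'fast'")
    if not 4 <= seconds <= 15:
        raise ValueError("Seedance 2.0 clips run 4-15 seconds")
    if tier == "fast" and resolution == "1080p":
        raise ValueError("the Fast tier tops out at 720p")
    list_price = PRICE_PER_SECOND[(resolution, video_reference)] * seconds
    # A bonus b means each dollar buys (1 + b) dollars of credit.
    return round(list_price / (1 + topup_bonus), 4)
```

The topup_bonus argument models a credit bonus on top-ups. A 10% bonus divides the list price by 1.1 rather than subtracting 10%.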

The practical rule we use: prototype in Fast at 480p (under $0.50 for an eight-second clip with a video reference), iterate the prompt and references until the shot is right, then run the final pass in Standard at 720p. Use 1080p only when the client brief demands it; 720p is the sweet spot for quality per dollar.

High-tier Kie.ai top-ups include a 10% bonus: $100 buys $110 of credit, so the effective cost is about 9% below list (1/1.1 ≈ 0.91). If you're doing volume, that stacks.

Seedance 2.0 vs Sora 2 vs Veo 3.1

We route work between three frontier video models in 2026, and they are not interchangeable. Here is how we split it.

Seedance 2.0 — our default for anything that needs multi-shot consistency, reference-video input, or native audio. Best $/s at 720p.

Sora 2 — longer single-shot duration (up to 20s), slightly better photorealism on human faces, but limited multimodal input and significantly pricier per clip.

Veo 3.1 — the strongest on physical realism and weather / atmosphere. Also the model we turn to when the client insists on a Google-hosted solution. No multi-shot in a single prompt; you chain clips.

fal.ai's Seedance 2.0 listing. The model is also available via fal.ai and Replicate; we use Kie.ai because it hosts the whole frontier stack.

Replicate's Seedance 2.0 model page. Replicate runs the same ByteDance weights, which is handy if you already have a Replicate account.

Replicate's Seedance 2.0 API docs: roughly the same task shape as Kie.ai, with different auth and billing.

Watch the full walkthrough

If you want the moving pictures — the bullet-time freeze, the breakfast-ad multi-shot, the polar-bear VR sequence, the dancer-on-a-whale — the video version walks through the prompt ideas that triggered each clip.

Quick-recall flashcards — API & comparisons

Seedance 2.0 uses a(n) _____ system in prompts to reference specific uploaded files.

@-mention (e.g., @ref_image_1)

Which capability ensures that dialogue matches mouth movement and footsteps hit when feet hit the ground?

Native audio syncing

What character elements stay locked across shots in a multi-shot prompt?

Face, clothes, and voice

The 'reference-driven workflow' allows users to replicate specific _____ by uploading a short clip.

camera movements (e.g., orbit, whip pan, match cut)

How does 'physics-aware motion' improve video quality in Seedance 2.0?

It ensures objects have realistic weight and interact accurately with surfaces.

Which API platform is recommended for accessing Seedance 2.0 due to its unified billing and documentation?

Kie.ai


Frequently asked questions

❓ What makes Seedance 2.0 different from Sora 2 or Veo 3.1?

Seedance is the first frontier video model that accepts all four modalities in a single request — text, image, video, and audio — with up to twelve reference files at once. It generates native audio in the same pass, supports multi-shot prompting with consistent characters, and ships clips of four to fifteen seconds. On paper its 720p pricing via Kie.ai is the cheapest of the three.

❓ How much does Seedance 2.0 cost on Kie.ai?

480p is $0.0575 per second with a reference video and $0.095 per second text-only. 720p is $0.125 per second with reference and $0.205 without. 1080p is $0.31 per second with reference and $0.51 per second without. A 10-second 720p clip with reference costs $1.25.

❓ Seedance 2.0 or Seedance 2.0 Fast — which should I use?

Standard for production-quality where motion, lip-sync, and character consistency matter. Fast for iteration and batch generation. Fast only supports 480p and 720p, not 1080p, and finishes in about four minutes vs five minutes for standard.

❓ Can I upload my own video as a reference?

Yes. Up to three reference videos per request, combined length fifteen seconds, plus up to nine reference images and three reference audio clips. Record a rough shot on your phone and have Seedance replicate the blocking and camera path in a cinematic version.

❓ Does Seedance 2.0 really generate audio?

Yes, natively. Dialogue with lip-sync, ambient soundscapes, and music in the same pass as video. Turn it off with generate_audio: false if you want to dub yourself.

❓ What's the best prompt structure for multi-shot sequences?

Give each shot its own line. Describe shot type, action, camera movement, and lighting per shot. Name the character once and reuse the same handle across every shot so the model locks on. Keep total duration under fifteen seconds for a single multi-shot clip.

The bottom line

Seedance 2.0 is the first video model that treats multimodality as a first-class input, not an afterthought. For anyone producing short-form brand film, social video, or product demos, that changes the routing: you go to Seedance for multi-shot consistency and reference-video control, to Sora for long single shots, and to Veo for physical realism.

Access it through Kie.ai because the billing is in USD, the API shape matches every other model on the platform, and the same key routes Veo 3.1, Suno, Midjourney, Flux, and Nano Banana Pro. The 5,000 free credits on signup are enough for dozens of test clips. Get the credits, burn through them on the three recipes above, and decide whether it fits your pipeline.
