Ivory Creative OS

Your AI creative production system. Select a block to get started.

AI UGC Generation

Generate ultra-realistic AI video ads using kie.ai + Sora 2 Pro. Covers all content formats: talking head, b-roll, dialogue, podcast, censored method.

Format

4 Types

Framework

Modules

4 Videos

Ivory Creative OS · AI UGC Generation · V5

Sora 2 Pro → Publish-Ready Ad

The complete system for generating ultra-realistic AI UGC video ads using kie.ai. Every step from prompt to final export — exactly how it's done.

kie.ai

Sora 2 Pro

Claude

Topaz

CapCut

11 Labs

Nano Banana Pro

Cling 2.6

Adobe Podcast

Captions App

Edits (IG)

Content Formats

UGC / Discovery Format

Organic selfie-style · Most natural-looking

The go-to format. An actor/actress filmed in a casual, organic setting — bathroom, bedroom, kitchen. Feels like a real person sharing a genuine product discovery.

Key technique: the "Put the camera down" method. Prompt the character to set the phone on a surface and keep filming, which adds a passive, voyeuristic quality that increases watch time.

Best for: Supplements, beauty, wellness, skincare, hair care. Any product where personal testimony matters.

Pro tip: Match the UGC style with discovery-method product clips (real or found product videos). Don't pair a podcast-style hook with discovery b-roll — keep the energy congruent.

Podcast Format

Authoritative · Brand-level presence

Actor/actress filmed in a podcast-style setup (microphone, professional lighting, eye contact). Feels more authoritative and brand-driven. Works well for supplement brands targeting older demographics.

Best paired with: High-quality B-roll footage of the product. Use 11 Labs voice clone for voiceover continuity.

Pro tip: Stick to this format if you want to project brand authority. Don't mix with casual UGC product clips — it breaks the narrative.

Censored Method

Curiosity-gap · High engagement

Uses deliberate information withholding to force watch-time. A censored texture overlays the product name/key claim, paired with a bleep audio effect — making it feel like the video is revealing something you shouldn't know.

Structure: Big text bar (e.g. "Skincare is a SCAM") → censored footage → reveal that natural remedy solves it → your brand as the solution.

In CapCut: Add a "censor texture" image (found via Google Images) → apply a circular blur (Body Effects → Blur) over the product name → extract the audio → add a bleep from the audio effects → cut the word underneath the bleep.

Best for: Skin care, hair care, consumer health. Any category where you can position mainstream products as the villain.

Dialogue / Reaction Split Screen

Two actors · Conversational hook

Prompt two actors having a conversation about the product. Clip at moments where only one is speaking to avoid the AI sync glitch (both talking simultaneously). Composite side by side in CapCut.

Watch out: AI sometimes makes both characters say the same thing at the same time. Always preview before committing — clip around these moments.
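The clipping rule above can be sketched in a few lines of Python. This is only an illustration: the speech timestamps are hypothetical values you would note down by hand while previewing the clip, and `solo_segments` is a made-up helper name, not part of any tool mentioned here.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def solo_segments(a: List[Interval], b: List[Interval]) -> List[Tuple[float, float, str]]:
    """Return (start, end, speaker) spans where exactly one actor is talking."""
    # Every start/end is a boundary; between two boundaries the state is constant.
    cuts = sorted({t for start, end in a + b for t in (start, end)})
    result: List[Tuple[float, float, str]] = []
    for left, right in zip(cuts, cuts[1:]):
        mid = (left + right) / 2
        a_on = any(s <= mid < e for s, e in a)
        b_on = any(s <= mid < e for s, e in b)
        if a_on == b_on:
            continue  # both talking (sync glitch risk) or silence: skip
        speaker = "A" if a_on else "B"
        if result and result[-1][2] == speaker and result[-1][1] == left:
            # Extend the previous span instead of starting a new one.
            result[-1] = (result[-1][0], right, speaker)
        else:
            result.append((left, right, speaker))
    return result


# Hypothetical timestamps noted during preview of a 15-second clip.
actor_a = [(0.0, 4.0), (9.0, 12.0)]
actor_b = [(3.0, 8.0), (11.0, 15.0)]
print(solo_segments(actor_a, actor_b))
```

The spans it prints are the safe places to cut; the overlaps (3 to 4 seconds and 11 to 12 seconds in this example) are the moments to clip around.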

Full V5 Production Workflow

Find Reference Image (Pinterest Method)

Tool: Pinterest

Pinterest is your creative home base for every ad you make. Before you write a single word of a prompt, you need a reference image that defines your actor's look, body type, and energy. This image becomes the anchor for everything in Nano Banana Pro.

How to search by niche:
• Hair care → "woman over 50 natural hair", "60s hair growth before after"
• Weight loss → "fitness before and after", "plus size transformation"
• Skincare → "skincare model natural", "glowing skin close up"
• Supplements → "[your niche] supplement lifestyle", "healthy woman 40s 50s"
• General → "perfect influencer", "UGC content creator selfie"

What to look for: Realistic, neutral, influencer-style photography. Natural lighting, relatable settings (bathroom, bedroom, gym, kitchen). Avoid runway/editorial shots, obvious professional studio lighting, and any images that look AI-generated — Nano Banana will replicate the AI tell.

Before/after split technique: If you're building a transformation ad, find a single image with good visual contrast between two states. Screenshot the full image at high resolution, then crop it exactly in half. Import the left half as your "before" reference and the right half as your "after" reference when prompting in Nano Banana Pro.
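If you prefer to make the half-and-half crop programmatically instead of by eye, the arithmetic is small. The sketch below only computes the two crop rectangles; `half_crop_boxes` is a hypothetical helper, and the (left, upper, right, lower) box order is chosen to match the convention used by common image libraries such as Pillow's `Image.crop`.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower) in pixels


def half_crop_boxes(width: int, height: int) -> Tuple[Box, Box]:
    """Return crop boxes for the left ("before") and right ("after") halves."""
    half = width // 2
    before = (0, 0, half, height)
    # For odd widths the single centre column is dropped so both halves match.
    after = (width - half, 0, width, height)
    return before, after


before_box, after_box = half_crop_boxes(2000, 1500)
print(before_box)  # left half
print(after_box)   # right half
```

Both boxes always have the same width, which keeps the "before" and "after" references the same size when you upload them.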

Pro tip: Save 3–5 reference images per archetype. Different angles give Nano Banana more to work with and produce more consistent outputs when you feed multiple files.

Build Your Prompt with Claude

Tool: Claude (Build Tab above)

Use the Build tab in this tool — Claude will guide you through the full prompt wizard and generate a complete, paste-ready Sora 2 Pro prompt. You don't need to use ChatGPT manually anymore.

What Claude will generate for you:
• Full physical description of the actor (age, hair texture and colour, skin tone, makeup level, outfit, jewellery, body language)
• Precise scene description (room type, lighting direction, camera angle and distance, time of day)
• Character personality and movement direction (e.g. "puts camera down casually", "looks directly into lens with slight smirk")
• 3 hook options to choose from
• Complete 15-second script calibrated to exact read time

Hook rules — non-negotiable:
• Must stop the scroll within the first 2 words
• Focus on HOPE not pain: "My hair grew past my shoulders at 60 for the first time" — not "I was losing my hair"
• Never start with "I" — it's weak
• The hook should make the viewer feel like they've stumbled onto a secret
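Two of the rules above are mechanical enough to check before you paste anything into Sora: the hook must not open with "I", and the script has to fit the 15-second target. A rough sketch follows. The speaking pace of 2.5 words per second (about 150 words per minute) is an assumption, not a figure from this framework, so time a real read-through to calibrate it; the function names are illustrative.

```python
import re
from typing import List

WORDS_PER_SECOND = 2.5  # assumption: conversational pace, adjust after a real read


def word_list(text: str) -> List[str]:
    return re.findall(r"[\w']+", text)


def estimated_read_seconds(script: str, words_per_second: float = WORDS_PER_SECOND) -> float:
    """Rough read time based on word count alone."""
    return len(word_list(script)) / words_per_second


def hook_problems(hook: str) -> List[str]:
    """Flag hooks that break the 'never start with I' rule."""
    words = word_list(hook)
    if not words:
        return ["hook is empty"]
    first = words[0].lower()
    if first == "i" or first.startswith("i'"):
        return ['starts with "I"']
    return []


print(hook_problems("My hair grew past my shoulders at 60 for the first time"))
print(hook_problems("I was losing my hair"))
print(estimated_read_seconds("word " * 30) <= 15)
```

This does not judge whether a hook is good; it only catches the two failures that can be detected from the text itself.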

Style references that work: "Billie Eilish energy — nonchalant, edgy, doesn't care if you believe her", "Emma Chamberlain casual — like she forgot the camera was on", "Good looking girl next door — relatable but aspirational". These dramatically improve output quality.

Iterate aggressively. Generate 3 hooks, pick the best one, then ask Claude to make it better. A great hook can 3x your CTR vs a mediocre one on the exact same video.

Test on Standalone Sora First

Tool: sora.com — NOT kie.ai yet

Before you spend $3 on kie.ai and wait 12–18 minutes, paste your exact prompt into sora.com (standalone). A Sora test costs ~$0.30 and generates in 2–3 minutes. This is your validation step.

What to evaluate on the test output:
• Is this the right actor archetype? (age, skin tone, hair, body type)
• Is the camera angle correct? (too close, too far, wrong orientation)
• Does the personality and vibe match? (too stiff, too over-the-top, feels fake)
• If podcast format — is the microphone the right size and position?
• Is there good eye contact with the camera?
• Does the setting feel authentic to the character?

How to fix a bad test: Go back to the Build tab → use the Refine field to describe what's wrong in plain language. Examples: "make her less stiff, more casual", "age her up 10 years", "move the camera back — too close to her face", "make the background look more like a real bathroom, less staged". Re-test on Sora until the concept is locked.

Never skip this step. kie.ai uses the exact same Sora 2 Pro engine. One unvalidated prompt on kie.ai = $3 and 15 minutes wasted. At 10 bad prompts a week that's $30 and 2.5 hours — gone. Test on Sora first, every single time.
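The arithmetic behind that warning, written out so you can plug in your own numbers. The prices and times come from this section; the midpoints (15 minutes for kie.ai, 2.5 minutes for a Sora test) are a simplifying assumption.

```python
KIE_COST_USD = 3.00        # per kie.ai generation
KIE_MINUTES = 15           # midpoint of the 12-18 minute range
SORA_TEST_COST_USD = 0.30  # per standalone Sora test
SORA_TEST_MINUTES = 2.5    # midpoint of the 2-3 minute range


def weekly_cost(bad_prompts: int, cost_usd: float, minutes: float):
    """Return (dollars, hours) spent on prompts that turn out to be wrong."""
    return bad_prompts * cost_usd, bad_prompts * minutes / 60


kie_usd, kie_hours = weekly_cost(10, KIE_COST_USD, KIE_MINUTES)
sora_usd, sora_hours = weekly_cost(10, SORA_TEST_COST_USD, SORA_TEST_MINUTES)
print(f"Caught on kie.ai: ${kie_usd:.2f} and {kie_hours:.1f} h")
print(f"Caught on Sora:   ${sora_usd:.2f} and {sora_hours:.2f} h")
```

Under these assumptions, catching the same ten bad prompts on the Sora test costs about a tenth of the money and a sixth of the time.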

Generate Final Video on kie.ai

Tool: kie.ai → Sora 2 Pro → Text to Video

Prompt is locked and Sora-tested. Now go to kie.ai → Sora 2 Pro → Text to Video. Paste your exact finalized prompt. Do not change anything between the Sora test and kie.ai — the outputs are from the same engine and the prompt is already validated.

Format: Portrait
Duration: 15 seconds
Quality: High
Cost: ~$3/video
Gen time: 12–18 min
Watermark: Remove ($0.05)

Character setup — critical: When setting up a character in kie.ai, there are two name fields. The username/handle is what you tag in the prompt (e.g. @stimulox). The display name is what the actor will actually say out loud in the video. Set the display name to your product name exactly as you want it spoken — e.g. "Stimulox Hair Gummies" not "@stimuloxofficial".
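One way to keep the two name fields straight is to write them down as a pair and sanity-check the display name before generating. This is a note-keeping sketch only: the `Character` class and its check are hypothetical and do not talk to kie.ai.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Character:
    handle: str        # what you tag in the prompt, e.g. "@stimulox"
    display_name: str  # what the actor will say out loud

    def problems(self) -> List[str]:
        name = self.display_name.strip()
        if not name:
            return ["display name is empty"]
        if name.startswith("@"):
            return ["display name looks like a handle and would be spoken as one"]
        return []


good = Character(handle="@stimulox", display_name="Stimulox Hair Gummies")
bad = Character(handle="@stimulox", display_name="@stimuloxofficial")
print(good.problems())
print(bad.problems())
```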

Multi-language: kie.ai supports any language and ethnicity in prompts. Simply request the output language and specify the actor's ethnicity. The model handles it natively — no post-processing needed.

Standard quality vs High: Always use High for hero ads. If you plan to upscale in Topaz anyway, you can use Standard to reduce cost — Topaz recovers most of the detail. But for final ads going live, High is worth the extra spend.

Upscale in Topaz Video AI

Tool: Topaz Video AI

Topaz upscaling is the single biggest differentiator between your ads and every other AI UGC creator's content. The raw kie.ai output is ~1024×1792 — slightly under 1080p. Topaz brings it to ~2.7K resolution with added grain and motion smoothing that makes it look shot on a real phone.

Model

Proteus

Upscale

Frame Rate

60 FPS

Grain Amount

Grain Size

Add Noise

None

Recover Detail

Output Size

~1.2–1.5 GB

Why 60 FPS upscale then descale to 30? Upscaling to 60 FPS and then descaling to 30 FPS in CapCut creates smoother motion interpolation than going direct to 30 FPS. The intermediate 60 FPS pass forces Topaz to generate more in-between frames, which when halved back to 30 FPS produces cleaner, more natural motion.
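To make the "halved back to 30 FPS" step concrete, here is what halving looks like on frame indices. This assumes the 60 to 30 conversion simply keeps every second frame; an editor may blend frames instead, so treat it as a mental model rather than a description of CapCut's internals.

```python
from typing import List


def decimate(frames: List[int], keep_every: int = 2) -> List[int]:
    """Keep every Nth frame, starting with the first."""
    return frames[::keep_every]


one_second_at_60 = list(range(60))   # frame indices 0..59
kept = decimate(one_second_at_60)
print(len(kept))                     # frames left per second
print(kept[:5])                      # which source frames survive
print([round(i / 60, 3) for i in kept[:3]])  # their timestamps in seconds
```

Each surviving frame sits exactly 1/30 of a second after the previous one, so the timing stays even after the halving.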

For Cling 2.6 animated product images: Use Chronos Fast model (not Proteus) with the same 60 FPS setting. Same resolution. This is also the fix for Sora character upload failures — Sora registers raw Cling output as AI, but a Topaz-upscaled version passes through.

GPU intensive. Close all other applications before running. On an RTX 4090 this takes ~1 minute; on a standard gaming PC expect 10–15 minutes. Output files of 1.2–1.5 GB are expected. You descale next.

Descale in CapCut

Tool: CapCut → Export Settings

Import the Topaz output into CapCut. The 1.2–1.5 GB file is too large to work with directly — you need to compress it down while preserving the quality gains from Topaz. The goal is a file that Instagram can auto-descale to 2K without losing the resolution advantage.

Resolution: 4K
Frame Rate: 60 FPS
Bit Rate: Recommended
Output Size: ~22 MB
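For a sense of how much compression this step applies, you can convert the file sizes into average bitrates. The sketch assumes a 15-second clip (the duration set in kie.ai) and decimal units (1 MB = 1,000,000 bytes); the function name is illustrative.

```python
CLIP_SECONDS = 15  # assumption: the 15-second duration used at generation


def average_bitrate_mbps(size_bytes: float, duration_s: float) -> float:
    """Average bitrate in megabits per second."""
    return size_bytes * 8 / duration_s / 1_000_000


topaz_low = average_bitrate_mbps(1.2e9, CLIP_SECONDS)
topaz_high = average_bitrate_mbps(1.5e9, CLIP_SECONDS)
capcut = average_bitrate_mbps(22e6, CLIP_SECONDS)

print(f"Topaz output:  {topaz_low:.0f}-{topaz_high:.0f} Mbit/s")
print(f"CapCut export: {capcut:.1f} Mbit/s")
print(f"Reduction:     about {topaz_low / capcut:.0f}x to {topaz_high / capcut:.0f}x")
```

If your own export lands far from the ~22 MB mark for a clip of this length, the bitrate setting is the first thing to check.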

Why 4K and not 1080p? When you upload a 4K file to Instagram or Meta Ads, their CDN auto-descales it to 2K. When you upload a 1080p file, it is served at 1080p. The 4K → 2K path gives you a meaningfully sharper output on the viewer's screen, particularly on newer phones. Always export 4K here even though the source was roughly 1080p: Topaz added the resolution, CapCut preserves it, Instagram delivers it.

After export, transfer to phone (AirDrop on Mac, Google Drive otherwise) and import directly into the Edits app. Do not re-compress further.

Enhance Audio — Adobe Podcast

Tool: podcast.adobe.com/en/enhance

The tinny, slightly metallic quality of AI-generated voice is the #1 tell that a video is fake. Adobe Podcast Enhance is a free tool that runs a noise reduction and audio clarity pass — transforming the audio to sound like it was recorded on an iPhone in a real room.

Process: Go to podcast.adobe.com/en/enhance → upload your CapCut export → adjust the sliders → listen to both original and enhanced in real time.

Enhance slider: 50–70%
BG Noise removal: 45–60%

How to calibrate: Play the original and enhanced back to back. You're listening for: does it sound like it was recorded in a real space? Is there natural room tone? Does it sound like iPhone audio or like a podcast microphone? You want it in the middle — present and clear but not clinical. If it sounds too clean and studio-like, dial both sliders back.

Export correctly: When downloading, deselect Video and select Audio only → MP3 format. You only need the audio file for 11 Labs cloning. Don't download the video — you already have that from CapCut.

Note: This step is especially important if you're using VEO 3 or another engine that produces more tinny audio. Sora 2 Pro via kie.ai has better native audio, but the enhancement still adds meaningful realism.

Clone the Voice in 11 Labs

Tool: ElevenLabs → Instant Voice Clone

This is the Founder's Method — you clone the AI actor's voice from the Sora video so that all subsequent narration (over your Cling 2.6 product clips) sounds like the same person. It creates a seamless, congruent video that feels like one continuous piece of UGC.

Process: Go to elevenlabs.io → Voices → Add Voice → Instant Voice Clone → upload the MP3 from Adobe Podcast → name it (e.g. "hairgummies_influencer_v1") → save.

Critical: Do NOT enable "Remove Background Noise" inside 11 Labs when cloning. The ambient background noise from the original recording is what makes the cloned voice sound like it was captured in the same real-world environment. Removing it produces a sterile, obviously synthetic clone. Keep that option switched off so the noise stays in.

Writing the voiceover script: This is the narration that plays over your Cling 2.6 animated product shots at the end of the video. It should:
• Pick up naturally where the Sora hook left off
• Mention the product name naturally (not salesy)
• Include a specific benefit or result
• Close with a soft CTA ("I literally won't stop using it")
• Match the same tempo and energy as the original — listen to the Sora clip first, then write to that rhythm

Enhance setting in 11 Labs: Use V3 model, set to Enhance. Generates in seconds. Listen through before downloading — if it sounds robotic on a specific word, regenerate that sentence in isolation.

Create Product Stills — Nano Banana Pro

Tool: kie.ai → Nano Banana Pro

Nano Banana Pro is inside kie.ai. It generates photorealistic product lifestyle images — your actor holding the product, using it, reacting to it. These images become the source material you animate in Cling 2.6.

Capturing still frames from your Sora video: Open the CapCut project containing your Sora footage → scrub to moments where the actor's face is clearly lit and fully visible → use Export Still Frame to save 4–5 images at different angles. These become your actor reference files in Nano Banana.

Nano Banana prompt structure:
"This exact girl from file 1, file 2, file 3 [always upload multiple for consistency], holding [product name] from file 6, in [setting]. Same photo composition and lighting as file 7 [your Pinterest reference]. [Product] text is legible and facing forward."

Product image setup: For your product file — use a clean product shot on white or transparent background. Nano Banana will place it in-scene. If the label text renders incorrectly (common), iterate: "Improve the legibility of the label text on the product."
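If you generate many stills, filling the prompt structure above by hand gets error-prone. A small template function keeps the wording fixed and only swaps the variables. This is a sketch: `nano_banana_prompt` is a hypothetical helper, and the file numbers are assumed to follow your upload order, as in the example prompt.

```python
from typing import Sequence


def nano_banana_prompt(
    actor_files: Sequence[int],
    product_name: str,
    product_file: int,
    setting: str,
    reference_file: int,
) -> str:
    """Fill the prompt structure with file numbers and scene details."""
    actor_refs = ", ".join(f"file {n}" for n in actor_files)
    return (
        f"This exact girl from {actor_refs}, "
        f"holding {product_name} from file {product_file}, in {setting}. "
        f"Same photo composition and lighting as file {reference_file}. "
        f"{product_name} text is legible and facing forward."
    )


print(
    nano_banana_prompt(
        actor_files=[1, 2, 3],
        product_name="Stimulox Hair Gummies",
        product_file=6,
        setting="a bright bathroom",
        reference_file=7,
    )
)
```

Keeping the actor files as a list makes it easy to follow the advice to always upload several references.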

Resolution

Format

9:16 or 1:1

Cost

Same as 1K

Congruency check: After generating, compare side by side with your Sora video still. Does the skin tone match? Hair colour? If there's drift, use CapCut's Retouch tool to adjust skin tone on the Nano Banana image — don't try to fix it in Nano Banana itself (it's faster in CapCut).

Also use Nano Banana for: Before/after images using your Pinterest split references, static ad images for Facebook (no animation needed), Shopify product page lifestyle shots, and founder-style images holding the product.

Animate Product Images — Cling 2.6

Tool: Cling 2.6 → Image to Video

Cling 2.6 brings your Nano Banana product stills to life. These animated clips play at the end of the video over the 11 Labs voiceover — they're the product demonstration and close section of your ad.

Navigate correctly: Go to Cling → Image to Video. NOT Motion Control — that's a different mode with different output behaviour. Make sure you're in Image to Video before uploading.

Sample prompts by clip type:
• Product hold: "Natural hand movement holding the product, slight finger adjust, static camera, she glances down at it then back up"
• Product use: "She opens the bottle, pours one gummy into her palm, smiles slightly, static camera"
• Reaction: "She runs fingers through her hair, eyes widen slightly, looks at camera, slight smile, handheld camera natural sway"
• Product on surface: "Realistic handheld camera slowly pans toward the supplement bottle on a marble countertop, natural indoor lighting, slight camera shake"

After Cling generates: Cling exports at 30 FPS. In CapCut, apply Optical Flow to interpolate to 60 FPS — this smooths the motion dramatically. Then set playback speed to 1.2–2x depending on how slow the movement is. Natural movement should feel like someone casually filming on their phone, not slow-motion.
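Speeding a clip up shortens it, which matters when you are fitting product clips under a fixed-length voiceover. The sketch below works out the resulting duration and frame counts. The 5-second clip length is a hypothetical example, and the 60 FPS timeline is assumed from the interpolation step above.

```python
from typing import Tuple


def playback_seconds(clip_seconds: float, speed: float) -> float:
    """Timeline duration of a clip after a speed change."""
    return clip_seconds / speed


def frame_counts(
    clip_seconds: float, speed: float, source_fps: int = 30, timeline_fps: int = 60
) -> Tuple[int, int]:
    """Return (frames in the source clip, frames needed on the timeline)."""
    source_frames = round(clip_seconds * source_fps)
    output_frames = round(playback_seconds(clip_seconds, speed) * timeline_fps)
    return source_frames, output_frames


for speed in (1.0, 1.2, 2.0):
    seconds = playback_seconds(5.0, speed)
    print(speed, round(seconds, 2), frame_counts(5.0, speed))
```

At 1.2x the timeline still needs more frames than the source contains, so interpolation has real work to do; at 2x the two counts are equal.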

Fix for Sora character upload fail: Raw Cling output is flagged as AI by Sora's detection. Run it through Topaz first (Chronos Fast, 60 FPS, same resolution) → descale in CapCut to 1080p/30 FPS → Sora will now accept it as a character reference.

Assemble Final Video in CapCut

Tool: CapCut

Assembly order: Topaz-upscaled + CapCut-descaled Sora video → 11 Labs voiceover audio → Cling 2.6 animated product clips (with Optical Flow applied)

Cling clip settings:
• Apply Optical Flow (right-click clip → Interpolation → Optical Flow)
• Speed: 1.2–2x depending on motion (preview and judge — it should feel like casual phone filming)
• Zoom to 102–119% to eliminate black bars from aspect ratio differences
• If a clip is too fast at 1.2 and too slow at 1x, try 1.1x with Optical Flow
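The zoom needed to remove black bars can be estimated from the clip and canvas dimensions instead of nudging the slider. The sketch assumes that 100% zoom means the clip is fitted fully inside the canvas (letterboxed); that matches how the bars appear, but it is an assumption about the editor, and `fill_zoom_percent` is an illustrative name.

```python
def fill_zoom_percent(clip_w: int, clip_h: int, canvas_w: int, canvas_h: int) -> float:
    """Zoom, relative to a fitted clip, at which the clip just covers the canvas."""
    scale_w = canvas_w / clip_w
    scale_h = canvas_h / clip_h
    fit = min(scale_w, scale_h)    # largest scale with the whole clip visible
    fill = max(scale_w, scale_h)   # smallest scale with no bars left
    return 100 * fill / fit


# Raw 1024x1792 generation placed on a 1080x1920 (9:16) canvas.
print(round(fill_zoom_percent(1024, 1792, 1080, 1920), 1))
# A clip that already matches the canvas needs no extra zoom.
print(fill_zoom_percent(1080, 1920, 1080, 1920))
```

For the raw generation size quoted earlier, the result sits at the low end of the 102–119% range; clips with a more different aspect ratio need more.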

Handheld zoom effect (cubic ease): Select the clip → add a keyframe at the start (e.g. 119% zoom) → move playhead forward 1–2 seconds → add another keyframe at 128% zoom → right-click the second keyframe → Show All Presets → select Cubic Ease. This creates a smooth, organic zoom-in that mimics a real person physically adjusting their phone grip.
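To see why an eased keyframe feels more organic than a linear one, it helps to look at the zoom values over time. The sketch uses the standard ease-in-out cubic curve as a stand-in; the exact curve behind CapCut's Cubic Ease preset is not documented here, so that choice is an assumption, as is the 1.5-second span between keyframes.

```python
def ease_in_out_cubic(t: float) -> float:
    """Standard ease-in-out cubic: slow start, fast middle, slow end (t in 0..1)."""
    if t < 0.5:
        return 4 * t ** 3
    return 1 - (-2 * t + 2) ** 3 / 2


def zoom_at(seconds: float, duration: float = 1.5, start: float = 119.0, end: float = 128.0) -> float:
    """Zoom percentage at a point between the two keyframes."""
    t = min(max(seconds / duration, 0.0), 1.0)
    return start + (end - start) * ease_in_out_cubic(t)


for s in (0.0, 0.375, 0.75, 1.125, 1.5):
    print(s, round(zoom_at(s), 2))
```

The zoom barely moves in the first and last quarter and covers most of the distance in the middle, which is what reads as a hand adjusting its grip instead of a mechanical push-in.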

Color filter — apply to every clip consistently:

Temperature

Exposure

–4

Contrast

–8

Highlights

–18

Shadows

Particles

Fade

Save this as a template in CapCut (Adjustment → Save as Template → name it "Ivory UGC") so you can one-click apply it to every future project without manually entering values.

TikTok comment overlay: Generate at fakecommentgenerator.com. Rules: keep it organic — if it's a podcast clip, have "someone" tag the brand handle. If it's UGC, make it look like a friend tagging the brand. Character limit matches real TikTok (150 chars for text comments). Place for 5–7 seconds, fade out. Never use generic comments like "this is amazing!" — specificity is what sells it as real.
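The comment rules above are easy to check automatically before you build the overlay. In this sketch the 150-character limit comes from the text, while the list of "generic" phrases is a hypothetical starting point you would extend yourself.

```python
from typing import List

MAX_COMMENT_CHARS = 150
# Hypothetical examples of comments too generic to pass as real.
GENERIC_PHRASES = ("this is amazing", "love this", "so good")


def comment_problems(comment: str) -> List[str]:
    """Return reasons the comment would look fake, or an empty list."""
    problems = []
    if len(comment) > MAX_COMMENT_CHARS:
        problems.append(f"too long: {len(comment)} characters")
    lowered = comment.lower()
    if any(phrase in lowered for phrase in GENERIC_PHRASES):
        problems.append("reads as generic")
    return problems


print(comment_problems("@stimulox ok my sister has been on these since March and her hairline filled in"))
print(comment_problems("This is amazing!"))
```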

Format for Facebook in-feed (1:1): Switch canvas to 3:4 → stretch clip to fill → move the comment overlay higher in frame. Still reads great and performs well on Facebook placements.

Skin realism tip: If any clip has that smooth, oily AI skin look that's a tell, go to Effects → Body Effects → Skin → apply a slight matte skin effect. Or use CapCut's Retouch → Smooth at 20–30% to flatten it without making it look filtered.

Add Captions + Export via Edits App

Tool: Captions App (iOS) → Edits (Instagram app)

Upload your 4K/60 FPS CapCut export to your phone. Open in Captions app (iOS) → auto-add captions → add zoom feature (increases retention at the 3–5s drop-off point). Export.

Then import into Edits (Instagram's editing app). Anything edited in Edits gets an algorithm push. Add final captions here if needed, then export. Instagram auto-descales to 2K.

Total time per ad: ~20–25 minutes once you have the workflow dialed in. Done one at a time, that is roughly 2–3 quality ads per hour; reaching 3–4 per hour means working on several ads in parallel.

Product & Lifestyle Images — Whisk (Google)

Whisk for Product Page Lifestyle Images

Tool: Whisk by Google · Free

Use Whisk (labs.google/whisk) for all lifestyle product photography on your Shopify store. It's free, owned by Google, and produces $20K shoot-level quality.

What works: Specific location prompts (Aman resort, gym, poolside), multi-racial model variations (target all 4 — Asian, Black, Latina, White — then double down on your top buyer demo), product with logo placed on clothing or held by model.

Format: 16:9 for hero images, 9:16 for vertical/testimonial images. Max at 2K resolution (same quality as 4K, cheaper).

Iteration method: Don't rewrite the whole prompt. Just target what's wrong: "Improve the physical appearance of the girl" / "Make the product clearer in frame" / "Improve her body composition."

Bonus: Whisk allows celebrity likeness in prompts — you can reference public figures to create mock lifestyle imagery. Use carefully and with discretion.

I'm your AI UGC production assistant, trained on the full v1– framework. Ask me anything about kie.ai, Sora 2 Pro, Topaz settings, 11 Labs voice cloning, Cling 2.6, the censored method, CapCut editing — anything in the workflow. What do you need?