The go-to format. An actor/actress filmed in a casual, organic setting — bathroom, bedroom, kitchen. Feels like a real person sharing a genuine product discovery.
Key technique: the "Put the camera down" method. Prompt the character to set the phone on a surface and keep filming from there, which adds a passive, voyeuristic quality that increases watch time.
Best for: Supplements, beauty, wellness, skincare, hair care. Any product where personal testimony matters.
Actor/actress filmed in a podcast-style setup (microphone, professional lighting, eye contact). Feels more authoritative and brand-driven. Works well for supplement brands targeting older demographics.
Best paired with: High-quality B-roll footage of the product. Use 11 Labs voice clone for voiceover continuity.
Uses deliberate information withholding to force watch-time. A censored texture overlays the product name/key claim, paired with a bleep audio effect — making it feel like the video is revealing something you shouldn't know.
Structure: Big text bar (e.g. "Skincare is a SCAM") → censored footage → reveal that a natural remedy solves it → your brand as the solution.
In CapCut: Add Google image of "censor texture" → circular blur effect (Body Effects → Blur) over product name → extract audio → add bleep from audio effects → cut word underneath bleep.
Prompt two actors having a conversation about the product. Cut each clip at moments where only one actor is speaking, to avoid the AI sync glitch that appears when both talk at once. Composite the two clips side by side in CapCut.
Pinterest is your creative home base for every ad you make. Before you write a single word of a prompt, you need a reference image that defines your actor's look, body type, and energy. This image becomes the anchor for everything in Nano Banana Pro.
How to search by niche:
• Hair care → "woman over 50 natural hair", "60s hair growth before after"
• Weight loss → "fitness before and after", "plus size transformation"
• Skincare → "skincare model natural", "glowing skin close up"
• Supplements → "[your niche] supplement lifestyle", "healthy woman 40s 50s"
• General → "perfect influencer", "UGC content creator selfie"
What to look for: Realistic, neutral, influencer-style photography. Natural lighting, relatable settings (bathroom, bedroom, gym, kitchen). Avoid runway/editorial shots, obvious professional studio lighting, and any images that look AI-generated — Nano Banana will replicate the AI tell.
Before/after split technique: If you're building a transformation ad, find a single image with good visual contrast between two states. Screenshot the full image at high resolution, then crop it exactly in half. Import the left half as your "before" reference and the right half as your "after" reference when prompting in Nano Banana Pro.
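If you would rather compute the crop than eyeball it, the split can be sketched in Python as below. This is a minimal sketch, not part of any tool in this workflow: the function name `half_crop_boxes` and the 2048×1536 example size are hypothetical. The boxes use the (left, upper, right, lower) order that Pillow's `Image.crop` expects, in case you script the crop.

```python
Box = tuple[int, int, int, int]  # (left, upper, right, lower), in pixels


def half_crop_boxes(width: int, height: int) -> tuple[Box, Box]:
    """Return crop boxes for the left ("before") and right ("after") halves."""
    half = width // 2
    before_box = (0, 0, half, height)
    # Start the right half at width - half so both halves are equally wide,
    # even when the image width is odd (the middle column is dropped).
    after_box = (width - half, 0, width, height)
    return before_box, after_box


before_box, after_box = half_crop_boxes(2048, 1536)  # hypothetical image size
print(before_box)  # (0, 0, 1024, 1536)
print(after_box)   # (1024, 0, 2048, 1536)
```

Both halves come out the same width, so the two reference images keep matching framing.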
Use the Build tab in this tool — Claude will guide you through the full prompt wizard and generate a complete, paste-ready Sora 2 Pro prompt. You don't need to use ChatGPT manually anymore.
What Claude will generate for you:
• Full physical description of the actor (age, hair texture and colour, skin tone, makeup level, outfit, jewellery, body language)
• Precise scene description (room type, lighting direction, camera angle and distance, time of day)
• Character personality and movement direction (e.g. "puts camera down casually", "looks directly into lens with slight smirk")
• 3 hook options to choose from
• Complete 15-second script calibrated to exact read time
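To sanity-check a "15-second" script yourself, a rough word budget can be sketched in Python. The 150 words-per-minute pace is an assumption for illustration, not a figure from this guide; time your own clips and adjust it. The function names are hypothetical.

```python
ASSUMED_WORDS_PER_MINUTE = 150.0  # assumption: conversational pace, tune to your actor


def word_budget(seconds: float, wpm: float = ASSUMED_WORDS_PER_MINUTE) -> int:
    """Largest whole number of words that fits in the given time."""
    return int(seconds * wpm / 60)


def estimated_read_time(script: str, wpm: float = ASSUMED_WORDS_PER_MINUTE) -> float:
    """Estimated seconds needed to speak the script at the assumed pace."""
    return len(script.split()) * 60 / wpm


print(word_budget(15))  # 37

hook = "My hair grew past my shoulders at 60 for the first time"
print(round(estimated_read_time(hook), 1))  # 4.8
```

At this assumed pace the example hook takes roughly a third of the 15 seconds, which leaves about 25 words for the rest of the script.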
Hook rules — non-negotiable:
• Must stop the scroll within the first 2 words
• Focus on HOPE not pain: "My hair grew past my shoulders at 60 for the first time" — not "I was losing my hair"
• Never start with "I" — it's weak
• The hook should make the viewer feel like they've stumbled onto a secret
Style references that work: "Billie Eilish energy — nonchalant, edgy, doesn't care if you believe her", "Emma Chamberlain casual — like she forgot the camera was on", "Good looking girl next door — relatable but aspirational". These dramatically improve output quality.
Before you spend $3 on kie.ai and wait 12–18 minutes, paste your exact prompt into sora.com (standalone). A Sora test costs ~$0.30 and generates in 2–3 minutes. This is your validation step.
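The saving adds up quickly once you iterate. Here is a small Python sketch of the arithmetic, using the cost and time figures quoted above. The number of revision rounds is hypothetical, and real pricing and render times may differ.

```python
# Figures quoted in this guide; treat them as approximate.
KIE_COST_USD = 3.00
KIE_MINUTES = (12, 18)          # (fastest, slowest) render time
SORA_TEST_COST_USD = 0.30
SORA_TEST_MINUTES = (2, 3)


def totals(rounds: int, cost_usd: float, minutes: tuple[int, int]) -> tuple[float, tuple[int, int]]:
    """Total spend and (min, max) waiting time for a number of generations."""
    fastest, slowest = minutes
    return round(rounds * cost_usd, 2), (rounds * fastest, rounds * slowest)


rounds = 4  # hypothetical number of prompt revisions before the concept is locked
print(totals(rounds, SORA_TEST_COST_USD, SORA_TEST_MINUTES))  # (1.2, (8, 12))
print(totals(rounds, KIE_COST_USD, KIE_MINUTES))              # (12.0, (48, 72))
```

Four revisions cost about $1.20 and 8 to 12 minutes as Sora tests, versus $12 and 48 to 72 minutes if each one were a full kie.ai render.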
What to evaluate on the test output:
• Is this the right actor archetype? (age, skin tone, hair, body type)
• Is the camera angle correct? (too close, too far, wrong orientation)
• Does the personality and vibe match? (too stiff, too over-the-top, feels fake)
• If podcast format — is the microphone the right size and position?
• Is there good eye contact with the camera?
• Does the setting feel authentic to the character?
How to fix a bad test: Go back to the Build tab → use the Refine field to describe what's wrong in plain language. Examples: "make her less stiff, more casual", "age her up 10 years", "move the camera back — too close to her face", "make the background look more like a real bathroom, less staged". Re-test on Sora until the concept is locked.
Prompt is locked and Sora-tested. Now go to kie.ai → Sora 2 Pro → Text to Video. Paste your exact finalized prompt. Do not change anything between the Sora test and kie.ai — the outputs are from the same engine and the prompt is already validated.
Character setup — critical: When setting up a character in kie.ai, there are two name fields. The username/handle is what you tag in the prompt (e.g. @stimulox). The display name is what the actor will actually say out loud in the video. Set the display name to your product name exactly as you want it spoken — e.g. "Stimulox Hair Gummies" not "@stimuloxofficial".
Multi-language: kie.ai supports any language and ethnicity in prompts. Simply request the output language and specify the actor's ethnicity. The model handles it natively — no post-processing needed.
Topaz upscaling is the single biggest differentiator between your ads and every other AI UGC creator's content. The raw kie.ai output is ~1024×1792, slightly under vertical 1080p (1080×1920). Topaz brings it to ~2.7K resolution with added grain and motion smoothing that make it look like it was shot on a real phone.
Why go up to 60 FPS and then back down to 30? Having Topaz output 60 FPS and then reducing to 30 FPS in CapCut produces smoother motion than going directly to 30 FPS. The intermediate 60 FPS pass forces Topaz to generate more in-between frames, and halving back to 30 FPS then yields cleaner, more natural motion.
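The frame arithmetic behind the 60 to 30 step can be sketched in Python. This only illustrates the timing, under the assumption that the reduction keeps every second frame; it does not model how Topaz synthesizes frames or how CapCut actually resamples, neither of which is specified here. The function names are hypothetical.

```python
from fractions import Fraction


def frame_times(fps: int, duration_s: int) -> list[Fraction]:
    """Exact timestamps (in seconds) of each frame in a clip."""
    return [Fraction(i, fps) for i in range(fps * duration_s)]


def halve_frame_rate(frames: list[Fraction]) -> list[Fraction]:
    """Keep every second frame (assumed behaviour for a 60 -> 30 FPS reduction)."""
    return frames[::2]


sixty = frame_times(60, 1)
thirty = halve_frame_rate(sixty)
print(len(sixty), len(thirty))       # 60 30
print(thirty == frame_times(30, 1))  # True
```

The surviving frames land exactly on the 30 FPS grid, so nothing is retimed in the final step; any difference in smoothness comes from how the 60 FPS frames were generated.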
For Cling 2.6 animated product images: Use Chronos Fast model (not Proteus) with the same 60 FPS setting. Same resolution. This is also the fix for Sora character upload failures — Sora registers raw Cling output as AI, but a Topaz-upscaled version passes through.
Import the Topaz output into CapCut. The 1.2–1.5 GB file is too large to work with directly — you need to compress it down while preserving the quality gains from Topaz. The goal is a file that Instagram can auto-descale to 2K without losing the resolution advantage.
Why 4K and not 1080p? When you upload a 4K file to Instagram or Meta Ads, their CDN auto-descales it to 2K. When you upload a 1080p file, they descale to 1080p. The 4K → 2K path gives you a meaningfully sharper output on the viewer's screen, particularly on newer phones. Always export 4K here even though the source was 1080p — Topaz added the resolution, CapCut preserves it, Instagram delivers it.
After export, transfer to phone (AirDrop on Mac, Google Drive otherwise) and import directly into the Edits app. Do not re-compress further.
The tinny, slightly metallic quality of AI-generated voice is the #1 tell that a video is fake. Adobe Podcast Enhance is a free tool that runs a noise reduction and audio clarity pass — transforming the audio to sound like it was recorded on an iPhone in a real room.
Process: Go to podcast.adobe.com/en/enhance → upload your CapCut export → adjust the sliders → listen to both original and enhanced in real time.
How to calibrate: Play the original and enhanced back to back. You're listening for: does it sound like it was recorded in a real space? Is there natural room tone? Does it sound like iPhone audio or like a podcast microphone? You want it in the middle — present and clear but not clinical. If it sounds too clean and studio-like, dial both sliders back.
Export correctly: When downloading, deselect Video and select Audio only → MP3 format. You only need the audio file for 11 Labs cloning. Don't download the video — you already have that from CapCut.
This is the Founder's Method — you clone the AI actor's voice from the Sora video so that all subsequent narration (over your Cling 2.6 product clips) sounds like the same person. It creates a seamless, congruent video that feels like one continuous piece of UGC.
Process: Go to elevenlabs.io → Voices → Add Voice → Instant Voice Clone → upload the MP3 from Adobe Podcast → name it (e.g. "hairgummies_influencer_v1") → save.
Writing the voiceover script: This is the narration that plays over your Cling 2.6 animated product shots at the end of the video. It should:
• Pick up naturally where the Sora hook left off
• Mention the product name naturally (not salesy)
• Include a specific benefit or result
• Close with a soft CTA ("I literally won't stop using it")
• Match the same tempo and energy as the original — listen to the Sora clip first, then write to that rhythm
Enhance setting in 11 Labs: Use V3 model, set to Enhance. Generates in seconds. Listen through before downloading — if it sounds robotic on a specific word, regenerate that sentence in isolation.
Nano Banana Pro is inside kie.ai. It generates photorealistic product lifestyle images — your actor holding the product, using it, reacting to it. These images become the source material you animate in Cling 2.6.
Capturing still frames from your Sora video: Open the CapCut project containing your Sora footage → scrub to moments where the actor's face is clearly lit and fully visible → use Export Still Frame to save 4–5 images at different angles. These become your actor reference files in Nano Banana.
Nano Banana prompt structure:
"This exact girl from file 1, file 2, file 3 [always upload multiple for consistency], holding [product name] from file 6, in [setting]. Same photo composition and lighting as file 7 [your Pinterest reference]. [Product] text is legible and facing forward."
Product image setup: For your product file — use a clean product shot on white or transparent background. Nano Banana will place it in-scene. If the label text renders incorrectly (common), iterate: "Improve the legibility of the label text on the product."
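If you generate many variations, the prompt structure above can be filled in programmatically. The Python sketch below is a convenience for building the text you paste in; it is not a Nano Banana or kie.ai API. The function name, the parameter names, the two-file minimum and the example setting are all hypothetical.

```python
def build_nano_banana_prompt(
    actor_files: list[int],
    product_file: int,
    reference_file: int,
    product_name: str,
    setting: str,
) -> str:
    """Fill the prompt template with upload slot numbers and product details."""
    if len(actor_files) < 2:
        # The guide says to always upload multiple actor stills for consistency;
        # requiring at least two is this sketch's own rule.
        raise ValueError("upload at least two actor reference files")
    actors = ", ".join(f"file {n}" for n in actor_files)
    return (
        f"This exact girl from {actors}, holding {product_name} from file {product_file}, "
        f"in {setting}. Same photo composition and lighting as file {reference_file}. "
        f"{product_name} text is legible and facing forward."
    )


prompt = build_nano_banana_prompt(
    actor_files=[1, 2, 3],
    product_file=6,
    reference_file=7,                 # your Pinterest reference
    product_name="Stimulox Hair Gummies",
    setting="a bright bathroom",      # hypothetical setting
)
print(prompt)
```

Keeping the file numbers as parameters makes it harder to mix up which upload slot holds the actor, the product and the Pinterest reference.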
Congruency check: After generating, compare side by side with your Sora video still. Does the skin tone match? Hair colour? If there's drift, use CapCut's Retouch tool to adjust skin tone on the Nano Banana image — don't try to fix it in Nano Banana itself (it's faster in CapCut).
Cling 2.6 brings your Nano Banana product stills to life. These animated clips play at the end of the video over the 11 Labs voiceover — they're the product demonstration and close section of your ad.
Navigate correctly: Go to Cling → Image to Video. NOT Motion Control — that's a different mode with different output behaviour. Make sure you're in Image to Video before uploading.
Sample prompts by clip type:
• Product hold: "Natural hand movement holding the product, slight finger adjust, static camera (no camera movement), she glances down at it then back up"
• Product use: "She opens the bottle, pours one gummy into her palm, smiles slightly, static camera"
• Reaction: "She runs fingers through her hair, eyes widen slightly, looks at camera, slight smile, handheld camera natural sway"
• Product on surface: "Realistic handheld camera slowly pans toward the supplement bottle on a marble countertop, natural indoor lighting, slight camera shake"
After Cling generates: Cling exports at 30 FPS. In CapCut, apply Optical Flow to interpolate to 60 FPS — this smooths the motion dramatically. Then set playback speed to 1.2–2x depending on how slow the movement is. Natural movement should feel like someone casually filming on their phone, not slow-motion.
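Speeding a clip up also shortens it, which matters when you line clips up against the voiceover. A quick Python sketch of the arithmetic follows; the 5-second clip length and the function name are hypothetical.

```python
def played_length(clip_seconds: float, speed: float) -> float:
    """Seconds a clip occupies on the timeline at a given playback speed."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return clip_seconds / speed


clip_seconds = 5.0  # hypothetical clip length
for speed in (1.0, 1.2, 2.0):
    print(f"{speed}x -> {played_length(clip_seconds, speed):.2f} s")
# 1.0x -> 5.00 s
# 1.2x -> 4.17 s
# 2.0x -> 2.50 s
```

At 2x a 5-second clip covers only 2.5 seconds of narration, so budget more clips for longer voiceovers.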
Assembly order: Topaz-upscaled + CapCut-descaled Sora video → 11 Labs voiceover audio → Cling 2.6 animated product clips (with Optical Flow applied)
Cling clip settings:
• Apply Optical Flow (right-click clip → Interpolation → Optical Flow)
• Speed: 1.2–2x depending on motion (preview and judge — it should feel like casual phone filming)
• Zoom to 102–119% to eliminate black bars from aspect ratio differences
• If a clip is too fast at 1.2x and too slow at 1x, try 1.1x with Optical Flow
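Where the low end of that zoom range comes from can be sketched in Python. This assumes the editor first fits the whole clip inside the canvas (leaving bars) and that the canvas is 1080×1920; both are assumptions, as is the function name. The 1024×1792 clip size is the raw output size quoted earlier.

```python
def zoom_to_fill(clip_w: int, clip_h: int, canvas_w: int, canvas_h: int) -> float:
    """Extra zoom (1.0 = 100%) needed on a fitted clip so no black bars remain."""
    scale_x = canvas_w / clip_w
    scale_y = canvas_h / clip_h
    fit = min(scale_x, scale_y)    # whole clip visible, bars on one axis
    fill = max(scale_x, scale_y)   # canvas fully covered, slight crop on the other axis
    return fill / fit


zoom = zoom_to_fill(1024, 1792, 1080, 1920)  # assumed 1080x1920 canvas
print(f"{zoom:.1%}")  # 101.6%
```

That rounds to the 102% lower bound; clips whose aspect ratio is further from the canvas need proportionally more.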
Handheld zoom effect (cubic ease): Select the clip → add a keyframe at the start (e.g. 119% zoom) → move playhead forward 1–2 seconds → add another keyframe at 128% zoom → right-click the second keyframe → Show All Presets → select Cubic Ease. This creates a smooth, organic zoom-in that mimics a real person physically adjusting their phone grip.
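To see why an eased zoom reads as more organic than a linear one, here is a Python sketch of a cubic ease between the two keyframes. It uses the common ease-in-out cubic formula as an assumption; the exact curve behind CapCut's Cubic Ease preset is not documented here and may differ. The function names are hypothetical.

```python
def cubic_ease_in_out(t: float) -> float:
    """Standard ease-in-out cubic: slow start, fast middle, slow finish (t in 0..1)."""
    if t < 0.5:
        return 4 * t ** 3
    return 1 - ((-2 * t + 2) ** 3) / 2


def zoom_at(t: float, start: float = 119.0, end: float = 128.0) -> float:
    """Zoom percentage at progress t between the two keyframes."""
    return start + (end - start) * cubic_ease_in_out(t)


for t in (0.0, 0.25, 0.5, 1.0):
    print(t, zoom_at(t))
# 0.0 119.0
# 0.25 119.5625
# 0.5 123.5
# 1.0 128.0
```

A quarter of the way in, the zoom has moved only about half a percentage point, which is the gentle start that makes it feel like a hand adjusting its grip rather than a mechanical push-in.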
Color filter — apply to every clip consistently:
Save this as a template in CapCut (Adjustment → Save as Template → name it "Ivory UGC") so you can one-click apply it to every future project without manually entering values.
TikTok comment overlay: Generate at fakecommentgenerator.com. Rules: keep it organic — if it's a podcast clip, have "someone" tag the brand handle. If it's UGC, make it look like a friend tagging the brand. Character limit matches real TikTok (150 chars for text comments). Place for 5–7 seconds, fade out. Never use generic comments like "this is amazing!" — specificity is what sells it as real.
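A quick length check before you generate the overlay can be sketched in Python. The 150-character figure is the one quoted above; the function name and the sample comment are hypothetical.

```python
TIKTOK_COMMENT_LIMIT = 150  # character limit quoted in this guide


def check_comment(text: str) -> tuple[bool, int]:
    """Return (fits within the limit, character count) for a comment."""
    length = len(text)
    return length <= TIKTOK_COMMENT_LIMIT, length


# Hypothetical comment in the "friend tagging the brand" style
comment = "@stimulox ok my sister started these in March and her hair is past her shoulders now"
fits, length = check_comment(comment)
print(fits)  # True
```

Anything over the limit could not exist as a real comment, so it is an easy tell to rule out.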
Format for Facebook in-feed (1:1): Switch canvas to 3:4 → stretch clip to fill → move the comment overlay higher in frame. Still reads great and performs well on Facebook placements.
Upload your 4K/60 FPS CapCut export to your phone. Open in Captions app (iOS) → auto-add captions → add zoom feature (increases retention at the 3–5s drop-off point). Export.
Then import into Edits (Instagram's editing app). Anything edited in Edits gets an algorithm push. Add final captions here if needed, then export. Instagram auto-descales to 2K.
Use Whisk (labs.google/whisk) for all lifestyle product photography on your Shopify store. It's a free Google Labs tool, and with the right prompts its output can pass for a $20K professional shoot.
What works: Specific location prompts (Aman resort, gym, poolside), multi-racial model variations (target all 4 — Asian, Black, Latina, White — then double down on your top buyer demo), product with logo placed on clothing or held by model.
Format: 16:9 for hero images, 9:16 for vertical/testimonial images. Max at 2K resolution (same quality as 4K, cheaper).
Iteration method: Don't rewrite the whole prompt. Just target what's wrong: "Improve the physical appearance of the girl" / "Make the product clearer in frame" / "Improve her body composition."