Looking for the best way to create AI video with synchronized audio? This guide covers everything about LTX 2.3 — the latest open-source video model from Lightricks that generates video with native audio sync and lip-sync. No post-production needed — audio and video come out together, perfectly matched.
What Is LTX 2.3?
LTX 2.3 is a production-grade open-source AI video model, available on Hugging Face. It builds on Lightricks’ LTX-Video architecture with significant improvements in visual quality, audio generation, and lip-sync accuracy. What makes LTX 2.3 stand out is its ability to generate synchronized audio alongside video — including speech, music, and ambient sound — from a single prompt or image input. It runs on our RunPod GPU backend, so you don’t need any hardware of your own.
Why Choose LTX 2.3 for AI Video?
🎵 Native Audio Sync
LTX 2.3 generates matching audio — dialogue, music, ambient sounds — all perfectly synchronized with the video output.
👄 Lip-Sync Mode
Upload your own voiceover or song and LTX 2.3 generates video with accurate lip movements — talking or singing.
🎨 Custom LoRA Support
Load up to two LTX 2.3-compatible LoRAs for consistent characters and styles. Train your own or grab one from CivitAI.
⚡ Fast & Affordable
Results in minutes, not hours. Pay-as-you-go pricing starting from just $0.26 per generation — no subscription required.
LTX 2.3 Workflows on druidcat.com
LTX 2.3 is available as two dedicated workflows on druidcat.com — and both are also built into the free Kitty AI Studio desktop app.
🎬 Text or Image to Video
The core LTX 2.3 workflow supports both text-to-video and image-to-video in a single endpoint:
- Text to Video — describe a scene with visual and audio cues. LTX 2.3 generates matching video with synchronized sound.
- Image to Video — upload a starting image and describe the animation. The image comes to life with motion and audio.
- Up to 15 seconds at 24 fps
- Full negative prompt support for precise control
Example prompt: “A cheerful girl with curly hair holding a red umbrella while rain falls gently around her. She sings ‘I love the rain’ with a melodic tune as soft ambient rain sounds fill the scene.”
Pricing: 10 seconds — $0.26 | 15 seconds — $0.32
🎧 Audio Sync — Image to Video with Your Audio
This is where LTX 2.3 really shines. Upload your own audio — voiceover, song, sound effects — and an image. LTX 2.3 generates video synced to your audio with accurate lip-sync.
- Three lip-sync modes: Person Talks, Person Sings, or No Lip Sync
- Up to 30 seconds at 24 fps
- Two LoRA slots with adjustable strength (0–1.5) for character/style consistency
- Great for music from Udio or Suno, voiceovers, or any audio track
Perfect for: AI music videos, talking-head content, AI influencers, product demos with narration, and lip-synced character animation.
Pricing: $0.29 per 5 seconds | 5s = $0.29 | 15s = $0.87 | 30s = $1.74
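To budget a batch of Audio Sync clips, the per-block pricing above can be turned into a small estimator. This is a rough sketch, not an official calculator: the function name is made up for illustration, and it assumes a partial 5-second block is billed as a full block, which the pricing line does not state.

```python
import math

PRICE_PER_BLOCK_CENTS = 29  # $0.29 per 5 seconds, from the pricing line above
BLOCK_SECONDS = 5
MAX_SECONDS = 30            # Audio Sync duration limit stated above


def audio_sync_cost(duration_seconds: float) -> float:
    """Estimate the Audio Sync price in dollars for one clip.

    Assumption (not confirmed by the pricing line): a partial
    5-second block is rounded up and billed as a full block.
    """
    if not 0 < duration_seconds <= MAX_SECONDS:
        raise ValueError(f"duration must be in (0, {MAX_SECONDS}] seconds")
    blocks = math.ceil(duration_seconds / BLOCK_SECONDS)
    # Multiply in whole cents first, then convert to dollars once.
    return blocks * PRICE_PER_BLOCK_CENTS / 100
```

For example, `audio_sync_cost(15)` reproduces the listed $0.87, and `sum(audio_sync_cost(s) for s in (30, 30, 10))` estimates a 70-second music video built from three clips.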
LTX 2.3 Workflows at a Glance
| Workflow | Input | Output | Max Duration | Price |
|---|---|---|---|---|
| Text to Video | Text prompt | Video + Audio | 15s | from $0.26 |
| Image to Video | Image + Prompt | Video + Audio | 15s | from $0.26 |
| Audio Sync I2V | Image + Your Audio | Lip-synced video | 30s | from $0.29 |
How LTX 2.3 Audio Sync Works
Unlike older approaches that generate video first and dub audio on top, LTX 2.3 processes audio and video jointly during generation. Here’s the workflow:
- Upload a face image — a clear portrait works best
- Upload your audio — speech, singing, or any sound (up to 30 seconds)
- Choose the lip-sync mode — “Person Talks” for speech, “Person Sings” for music, or “No Lip Sync” for ambient video
- Write your prompt — describe the scene, background, and any additional motion
- Optionally load LoRAs — up to two custom LoRAs for character or style consistency
- Generate — LTX 2.3 outputs video with mouth movements synced to your audio
The result is a single video file with your original audio perfectly matched to the generated lip movements. No manual alignment needed.
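If you prepare many generations at once, it can help to sanity-check your settings against the limits listed in the steps above before uploading anything. The sketch below is purely illustrative: the class and field names are hypothetical and do not correspond to any druidcat.com or Lightricks API. Only the limits (three lip-sync modes, 30 seconds of audio, two LoRA slots, strength 0 to 1.5) come from this guide.

```python
from dataclasses import dataclass, field

# Limits taken from this guide; all names below are hypothetical.
LIP_SYNC_MODES = {"Person Talks", "Person Sings", "No Lip Sync"}
MAX_AUDIO_SECONDS = 30
MAX_LORAS = 2
LORA_STRENGTH_RANGE = (0.0, 1.5)


@dataclass
class LoraSlot:
    name: str              # label for an LTX 2.3-compatible LoRA
    strength: float = 0.6  # moderate starting value


@dataclass
class AudioSyncSettings:
    lip_sync_mode: str
    audio_seconds: float
    prompt: str
    loras: list[LoraSlot] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return readable issues; an empty list means the settings fit the limits."""
        issues = []
        if self.lip_sync_mode not in LIP_SYNC_MODES:
            issues.append(f"unknown lip-sync mode: {self.lip_sync_mode!r}")
        if not 0 < self.audio_seconds <= MAX_AUDIO_SECONDS:
            issues.append(f"audio must be over 0 s and at most {MAX_AUDIO_SECONDS} s")
        if len(self.loras) > MAX_LORAS:
            issues.append(f"at most {MAX_LORAS} LoRA slots are available")
        low, high = LORA_STRENGTH_RANGE
        for slot in self.loras:
            if not low <= slot.strength <= high:
                issues.append(f"LoRA {slot.name!r} strength must be between {low} and {high}")
        return issues
```

Calling `AudioSyncSettings("Person Sings", 20, "singer on a rooftop at dusk").problems()` returns an empty list, while an unsupported mode or a 45-second track is reported before you spend a generation on it.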
🎨 Custom LoRA Support
The Audio Sync workflow supports two LoRA slots with adjustable strength (0 to 1.5). This means you can:
- Load a character LoRA for consistent face and body across multiple generations
- Add a style LoRA for a specific art direction — anime, cinematic, vintage, etc.
- Stack both for maximum control over the final look
LoRAs must be LTX 2.3-compatible. You can find community-trained models on Hugging Face or CivitAI, or train your own using Lightricks’ published training scripts.
💡 Pro Tips for LTX 2.3
- Describe sounds in your prompt: LTX 2.3 reads audio cues from text. Mention “soft piano music,” “city traffic sounds,” or “she whispers ‘hello’” for matching audio output.
- Use clear portraits for lip-sync: a well-lit, front-facing face produces the best mouth movement accuracy.
- Keep LoRA strength moderate: start at 0.6 and adjust. Going above 1.0 can introduce artifacts.
- Leverage negative prompts: put terms such as “blurry, low quality, static, no movement” in the negative prompt to steer the model away from them.
- Chain clips for longer content: generate multiple segments and combine them in Kitty AI Studio’s timeline editor for seamless longer videos.
- Smooth the result: use Frame Interpolation to boost frame rate from 24 to 48 or 60 fps for silky playback.
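The clip-chaining and frame-rate tips above come down to simple arithmetic, sketched here as a planning aid. The helper names are hypothetical, and the script only splits a target length into segments that fit a workflow's maximum duration and counts frames; it does not generate or interpolate anything.

```python
FPS = 24  # native output frame rate stated in this guide


def plan_segments(total_seconds: int, max_segment_seconds: int) -> list[int]:
    """Split a target length into clip durations no longer than the workflow limit."""
    if total_seconds <= 0 or max_segment_seconds <= 0:
        raise ValueError("durations must be positive")
    full, rest = divmod(total_seconds, max_segment_seconds)
    segments = [max_segment_seconds] * full
    if rest:
        segments.append(rest)  # final, shorter clip
    return segments


def frame_count(seconds: int, fps: int = FPS) -> int:
    """Number of frames in a clip of the given length at the given frame rate."""
    return seconds * fps
```

A 70-second piece in the Audio Sync workflow would be planned as `plan_segments(70, 30)`, giving clips of 30, 30, and 10 seconds. A 15-second clip holds `frame_count(15)` = 360 frames at 24 fps, and interpolating it to 48 fps doubles that to 720.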
🐱 Use LTX 2.3 in Kitty AI Studio Desktop App
Both LTX 2.3 workflows are available directly inside the free Kitty AI Studio desktop app — a full video editor with 50+ AI workflows, multi-track timeline, animated captions, color grading, and frame-precise editing.
Generate your LTX 2.3 clips, drag them onto the timeline, trim, arrange, add music, captions, and export — all without leaving the app. It’s the fastest way to turn AI-generated clips into finished content.
Download Kitty App – Free Desktop App →
🚀 Get Started with LTX 2.3
- Go to druidcat.com/kitty-ai-studio or open the Kitty AI Studio desktop app
- Choose your workflow — Text/Image to Video for general content, Audio Sync for lip-synced video
- Upload your inputs — image, audio, or just write a text prompt
- Set duration and options — pick duration, lip-sync mode, optional LoRAs
- Generate — your video with synchronized audio is ready in minutes
LTX 2.3 brings professional audio-synced AI video to everyone — open-source, affordable, and available right now on druidcat.com.
Need GPU power for your own LTX 2.3 projects? Check out RunPod for affordable cloud GPU rentals.