LTX-2.3 Studio Complete Guide

Looking for the best way to create AI video with synchronized audio? This guide covers everything about LTX 2.3 — the latest open-source video model from Lightricks that generates video with native audio sync and lip-sync. No post-production needed — audio and video come out together, perfectly matched.

What Is LTX 2.3?

LTX 2.3 is a production-grade open-source AI video model, available on Hugging Face. It builds on Lightricks’ LTX-Video architecture with significant improvements in visual quality, audio generation, and lip-sync accuracy. What makes LTX 2.3 stand out is its ability to generate synchronized audio alongside video — including speech, music, and ambient sound — from a single prompt or image input. It runs on our RunPod GPU backend, so you don’t need any hardware of your own.


Why Choose LTX 2.3 for AI Video?

🎵 Native Audio Sync

LTX 2.3 generates matching audio — dialogue, music, ambient sounds — all perfectly synchronized with the video output.

👄 Lip-Sync Mode

Upload your own voiceover or song and LTX 2.3 generates video with accurate lip movements — talking or singing.

🎨 Custom LoRA Support

Load up to two LTX 2.3-compatible LoRAs for consistent characters and styles. Train your own or grab one from CivitAI.

⚡ Fast & Affordable

Results in minutes, not hours. Pay-as-you-go pricing starting from just $0.26 per generation — no subscription required.


LTX 2.3 Workflows on druidcat.com

LTX 2.3 is available as two dedicated workflows on druidcat.com — and both are also built into the free Kitty AI Studio desktop app.

🎬 Text or Image to Video

The core LTX 2.3 workflow supports both text-to-video and image-to-video in a single endpoint:

  • Text to Video — describe a scene with visual and audio cues. LTX 2.3 generates matching video with synchronized sound.
  • Image to Video — upload a starting image and describe the animation. The image comes to life with motion and audio.
  • Up to 15 seconds at 24 fps
  • Full negative prompt support for precise control

Example prompt: “A cheerful girl with curly hair holding a red umbrella while rain falls gently around her. She sings ‘I love the rain’ with a melodic tune as soft ambient rain sounds fill the scene.”

Pricing: 10 seconds — $0.26  |  15 seconds — $0.32

🎧 Audio Sync — Image to Video with Your Audio

This is where LTX 2.3 really shines. Upload your own audio — voiceover, song, sound effects — and an image. LTX 2.3 generates video synced to your audio with accurate lip-sync.

  • Three lip-sync modes: Person Talks, Person Sings, or No Lip Sync
  • Up to 30 seconds at 24 fps
  • Two LoRA slots with adjustable strength (0–1.5) for character/style consistency
  • Great for music from Udio or Suno, voiceovers, or any audio track

Perfect for: AI music videos, talking-head content, AI influencers, product demos with narration, and lip-synced character animation.

Pricing: $0.29 per 5 seconds  |  5s = $0.29  |  15s = $0.87  |  30s = $1.74
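
The Audio Sync prices listed above follow a flat rate of $0.29 per 5 seconds. A minimal Python sketch of that arithmetic is shown here. It assumes that partial blocks are rounded up to the next 5 seconds, which this page does not state, and the function name audio_sync_price is made up for illustration:

```python
import math
from decimal import Decimal

PRICE_PER_BLOCK = Decimal("0.29")  # listed price per 5 seconds
BLOCK_SECONDS = 5
MAX_SECONDS = 30                   # Audio Sync limit stated above


def audio_sync_price(seconds: float) -> Decimal:
    """Estimate the Audio Sync price for a clip of the given length.

    Assumption (not confirmed by the pricing text): a partial block is
    billed as a full 5-second block.
    """
    if not 0 < seconds <= MAX_SECONDS:
        raise ValueError(f"duration must be in (0, {MAX_SECONDS}] seconds")
    blocks = math.ceil(seconds / BLOCK_SECONDS)
    return PRICE_PER_BLOCK * blocks


for s in (5, 15, 30):
    print(f"{s}s = ${audio_sync_price(s)}")
```

For 5, 15, and 30 seconds this reproduces the listed prices of $0.29, $0.87, and $1.74. Decimal is used instead of float so the cents do not pick up rounding noise.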


LTX 2.3 Workflows at a Glance

Workflow        | Input              | Output           | Max Duration | Price
Text to Video   | Text prompt        | Video + Audio    | 15s          | from $0.26
Image to Video  | Image + Prompt     | Video + Audio    | 15s          | from $0.26
Audio Sync I2V  | Image + Your Audio | Lip-synced video | 30s          | from $0.29

How LTX 2.3 Audio Sync Works

Unlike older approaches that generate video first and dub audio on top, LTX 2.3 processes audio and video jointly during generation. Here’s the workflow:

  1. Upload a face image — a clear portrait works best
  2. Upload your audio — speech, singing, or any sound (up to 30 seconds)
  3. Choose the lip-sync mode — “Person Talks” for speech, “Person Sings” for music, or “No Lip Sync” for ambient video
  4. Write your prompt — describe the scene, background, and any additional motion
  5. Optionally load LoRAs — up to two custom LoRAs for character or style consistency
  6. Generate — LTX 2.3 outputs video with mouth movements synced to your audio

The result is a single video file with your original audio perfectly matched to the generated lip movements. No manual alignment needed.
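
The limits mentioned in the steps above (three lip-sync modes, audio up to 30 seconds, at most two LoRAs with strength between 0 and 1.5) can be checked before submitting a job. The following Python sketch is only an illustration of that checklist. The class names, field names, and file paths are hypothetical and do not reflect the actual druidcat.com or Kitty AI Studio interface:

```python
from dataclasses import dataclass, field

# Limits taken from the workflow description; everything else is hypothetical.
LIP_SYNC_MODES = ("Person Talks", "Person Sings", "No Lip Sync")
MAX_AUDIO_SECONDS = 30
MAX_LORAS = 2
LORA_STRENGTH_RANGE = (0.0, 1.5)


@dataclass
class Lora:
    name: str              # hypothetical identifier of an LTX 2.3-compatible LoRA
    strength: float = 0.6  # suggested starting point


@dataclass
class AudioSyncJob:
    image_path: str
    audio_path: str
    audio_seconds: float
    lip_sync_mode: str
    prompt: str
    loras: list[Lora] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means the settings fit the limits."""
        issues = []
        if self.lip_sync_mode not in LIP_SYNC_MODES:
            issues.append(f"unknown lip-sync mode: {self.lip_sync_mode!r}")
        if not 0 < self.audio_seconds <= MAX_AUDIO_SECONDS:
            issues.append(f"audio must be at most {MAX_AUDIO_SECONDS} seconds")
        if len(self.loras) > MAX_LORAS:
            issues.append(f"at most {MAX_LORAS} LoRAs are supported")
        low, high = LORA_STRENGTH_RANGE
        for lora in self.loras:
            if not low <= lora.strength <= high:
                issues.append(f"LoRA {lora.name!r} strength must be {low} to {high}")
        return issues


job = AudioSyncJob(
    image_path="portrait.png",    # placeholder file names
    audio_path="voiceover.wav",
    audio_seconds=12.0,
    lip_sync_mode="Person Talks",
    prompt="A woman speaks to the camera in a sunlit cafe",
    loras=[Lora("my_character", 0.6)],
)
print(job.problems())  # prints []
```

Returning a list of problems rather than raising on the first one lets a caller show every issue at once.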


🎨 Custom LoRA Support

The Audio Sync workflow supports two LoRA slots with adjustable strength (0 to 1.5). This means you can:

  • Load a character LoRA for consistent face and body across multiple generations
  • Add a style LoRA for a specific art direction — anime, cinematic, vintage, etc.
  • Stack both for maximum control over the final look

LoRAs must be LTX 2.3-compatible. You can find community-trained models on Hugging Face or CivitAI, or train your own using Lightricks’ published training scripts.


💡 Pro Tips for LTX 2.3

  • Describe sounds in your prompt: LTX 2.3 reads audio cues from text. Mention “soft piano music,” “city traffic sounds,” or “she whispers ‘hello’” for matching audio output.
  • Use clear portraits for lip-sync: a well-lit, front-facing face produces the most accurate mouth movement.
  • Keep LoRA strength moderate: start at 0.6 and adjust. Going above 1.0 can introduce artifacts.
  • Use negative prompts: list terms such as “blurry, low quality, static, no movement” in the negative prompt to steer the model away from them.
  • Chain clips for longer content: generate multiple segments and combine them in Kitty AI Studio’s timeline editor to build longer videos.
  • Smooth the result: use Frame Interpolation to raise the frame rate from 24 fps to 48 or 60 fps for smoother playback.
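
To make the clip-chaining and frame-rate tips concrete, here is a small Python sketch that splits a longer audio track into segments that fit the 30-second Audio Sync limit and counts the frames of a segment at different frame rates. The function names are hypothetical, and splitting into equal-length segments is an assumption made for simplicity. In practice, cutting at pauses in the speech or music will usually look better:

```python
import math

MAX_CLIP_SECONDS = 30  # Audio Sync limit per generation
BASE_FPS = 24          # native output frame rate


def plan_segments(total_seconds: float, max_len: float = MAX_CLIP_SECONDS):
    """Split a track into equal-length segments no longer than max_len.

    Returns a list of (start, end) times in seconds.
    """
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    count = math.ceil(total_seconds / max_len)
    length = total_seconds / count
    return [(round(i * length, 3), round((i + 1) * length, 3)) for i in range(count)]


def frame_count(seconds: float, fps: int = BASE_FPS) -> int:
    """Number of frames in a clip of the given length at the given frame rate."""
    return round(seconds * fps)


segments = plan_segments(75)     # a 75-second song
print(segments)                  # [(0.0, 25.0), (25.0, 50.0), (50.0, 75.0)]
print(frame_count(25))           # 600 frames at 24 fps
print(frame_count(25, fps=60))   # 1500 frames after interpolation to 60 fps
```

Each (start, end) pair marks the slice of audio to upload for one generation. The resulting clips can then be placed back to back on the timeline.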

🐱 Use LTX 2.3 in Kitty AI Studio Desktop App

Both LTX 2.3 workflows are available directly inside the free Kitty AI Studio desktop app — a full video editor with 50+ AI workflows, multi-track timeline, animated captions, color grading, and frame-precise editing.

Generate your LTX 2.3 clips, drag them onto the timeline, trim, arrange, add music, captions, and export — all without leaving the app. It’s the fastest way to turn AI-generated clips into finished content.

Download Kitty App – Free Desktop App →


🚀 Get Started with LTX 2.3

  1. Go to druidcat.com/kitty-ai-studio or open the Kitty AI Studio desktop app
  2. Choose your workflow — Text/Image to Video for general content, Audio Sync for lip-synced video
  3. Upload your inputs — image, audio, or just write a text prompt
  4. Set duration and options — pick duration, lip-sync mode, optional LoRAs
  5. Generate — your video with synchronized audio is ready in minutes

LTX 2.3 brings professional audio-synced AI video to everyone — open-source, affordable, and available right now on druidcat.com.

Need GPU power for your own LTX 2.3 projects? Check out RunPod for affordable cloud GPU rentals.

Druid Cat

AI content creation tutorials, ComfyUI workflows, and tools for creating AI influencers. Visit our YouTube for video tutorials.