
HuMo

Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

* Equal contribution, Project lead, § Corresponding author

Tsinghua University; Intelligent Creation Team, ByteDance

Research Paper | GitHub | Demo (Bilibili)

Faceless Thrones

A short demo video generated by HuMo in its TA and TIA modes. Please turn on the sound when watching.

Video Generation from Text & Image (TI)
Our method generates high-quality, text-aligned, and subject-consistent videos. The text prompt for each video is listed below.
A handsome man with neatly combed hair dons a black suit and a white shirt. He then gracefully puts on a pair of rich, deep brown leather gloves with fine stitching and a slightly ruched wrist area. The man's posture is confident as he focuses on the task of wearing the gloves, and the frame captures the entire scene in a complete and detailed manner.
A woman with long, curly black hair, wearing a light-purple knitted short-sleeved top, a necklace, and pearl earrings, is sound asleep on a large bed in a room. She has on silver-gray over-ear headphones. The room features round portholes with a view of the blue sea outside, wooden wall panels, and a white table lamp. A small Chihuahua with brown and white fur and perked-up ears is quietly lying on the bed beside her.
A woman with short black hair and golden spherical earrings dons a casual outfit consisting of a light-beige short-sleeved top printed with "The Sneaker Culture" and light-blue ripped jeans. She also adorns a flower-pearl necklace decorated with pearls, white flowers, and a transparent butterfly. The woman appears fully in the frame, facing the camera, showcasing her stylish ensemble.
A man with gray-white hair, wearing a dark green sweater, a man with glasses, dressed in a suit and a blue-and-white striped tie, a bearded man in a dark blue T-shirt, and a woman with long flowing hair, wearing a black off the shoulder top, step into an ancient Chinese Buddhist temple. A slow push in shot, starting from a long shot and moving to a close up.
Warm sunlight bathes the grassy field. A young girl with twin-tails and green bows in her hair, wearing a light-green frilly dress, crouches beside blooming daisies. A brown and white dog, its fluffy tail wagging merrily, walks toward her. The girl smiles and holds up a toy camera with a yellow and red color scheme and blue buttons, ready to capture the moment. Slow motion.
A young witch, adorned with a large red bow on her head, wearing a black top and a white apron, takes flight on a broomstick. Accompanying her is a black kitten with a red bow around its neck. They soar through the gaps between lush, green trees, where sunlight filters through the leaves. Above them is a clear blue sky dotted with fluffy white clouds.
Video Generation from Text & Audio (TA)
Our method generates text-aligned and audio-synced videos. Please turn on the sound when watching. The text prompt for each video is listed below.
A handheld tracking shot follows a female warrior walking through a cave. Her determined eyes are locked straight ahead as she grips a blazing torch tightly in her hand. She speaks with intensity.
A sophisticated woman aged 25 wears a chic, season-appropriate blazer and blouse, embodying an intellectual host. Outdoors on a high-end street, she speaks professionally, gesturing with her hands and slightly raising her right palm outward.
A woman dressed as a mermaid swims gracefully underwater, her long wavy blonde hair flowing freely. She wears a silver tail and a patterned top, with delicate earrings. Sunlight filters through the clear blue water, creating a serene, magical atmosphere as she explores peacefully.
A close-up of an elderly sailor in weathered yellow oilskins, holding a pipe and speaking with a serious expression while looking right. His yellow cat curls up beside him as gentle waves lap the hull and seabirds circle overhead. The static camera captures his expressive face and upper body throughout.
A medium shot of a man in a red soccer jersey clenching his fists in anger as he witnesses an injustice unfolding on the field. He speaks while his hands remain firmly curled, his emotions evident in every gesture. The blurred background of the soccer field and players adds to the intensity of the moment.
A scientist with shoulder-length blonde hair, wearing a white lab coat over a dark blue blouse, sits right in a lab chair, curiously tilting a vial of glowing blue liquid. Facing left, she speaks to a partially blurred colleague across from her in a high-tech lab. The camera remains static throughout.
Video Generation from Text, Image & Audio (TIA)
Our method generates text-aligned, subject-consistent, and audio-synced videos. Please turn on the sound when watching. The text prompt for each video is listed below.
A close-up of a flight attendant with short blonde hair, seated upright in a tan leather airplane seat, speaking on a silver corded phone. She wears a dark uniform with a matching neck scarf, maintaining a composed, professional demeanor. Closed window blinds and a softly lit table lamp set the cabin background as she continues speaking, facing forward.
A camera tracking shot of a female warrior rushing, her armor glinting under the light. Her determined eyes are locked ahead. She speaks with intensity while charging forward.
A woman in a bulky space suit with a large helmet speaks to the camera against a blurred Mars backdrop. Her face remains clearly visible, framed by loose strands of tied-back hair and dark, determined eyes. A static close-up captures her expressive face and upper body as she continues speaking.
A woman with long, wavy dark hair walks through a flower-filled park, wearing a light blue-gray, long-sleeved dress with a tied waist—matching one seen on a hanger. Warm, window-like light bathes her figure, highlighting the dress's elegant design and her graceful movements.
A man with short dark hair, wearing a crisp white shirt and dark suit, joyfully plays with his black Labrador Retriever in front of a cozy house. Smiling brightly, he tosses a toy as the sleek, droopy-eared dog responds eagerly on the well-kept lawn, creating a warm, lively moment under the open sky.
A man with wavy brown hair, wearing a light collared shirt and a cheerful smile, energetically plays with a brown leather football on a sunlit beach. The golden sand glistens, waves lap the shore, and seagulls soar under a clear blue sky, as he enthusiastically passes and catches the ball in the lively seaside breeze.
In a neon-lit cyberpunk corridor, a young woman with long silvery hair moves gracefully in black cybernetic armor with glowing blue accents. Her suit reveals her shoulders and midriff, with mechanical gear on her arm, embodying the futuristic vibe.
A reddish-brown haired and bearded man sits pensively against swirling blue-and-white brushstrokes, dressed in a blue coat and dark waistcoat. The artistic backdrop and his thoughtful pose evoke a Post-Impressionist style in a studio-like setting.
On a blue snowflake-patterned stage with floating ice particles, a woman with a long blonde braid sings passionately, wearing a shimmering teal gown with sheer sleeves, expressing emotion with an extended arm.
Text Control / Edit
Please turn on the sound when watching. Given the same subject reference images but different text prompts, our method achieves collaborative text-image controllability.
1. He wears light gray thin framed glasses, a white inner shirt, and a black outer coat, with short black hair.
2. He wears stylish black rectangular glasses, a light blue dress shirt, and a tailored navy blazer, with neatly styled short brown hair.
1. He wears sunglasses, a dark suit jacket over a crisp white dress shirt.
2. He wears a stylish fedora hat, a tailored navy blazer over a light blue dress shirt, and a vibrant patterned tie. His accessories include a sleek watch and a leather briefcase.
1. A man donning a light blue button down shirt with a chest pocket.
2. A man wearing a crisp white t-shirt layered under a stylish navy blazer. He sports a sleek watch on his wrist and a pair of trendy sunglasses resting on his head.
1. A baby with short, light brown hair. The baby is clad in a garment featuring a blue striped collar and gray fabric adorned with a small dark emblem.
2. A baby with short, light brown hair styled in soft curls. The baby is dressed in a vibrant red onesie with a white polka dot collar and gray fabric adorned with a small dark emblem. The baby wears a cute matching headband with a bow, adding a playful touch.
1. A woman with platinum blonde hair that transitions into aqua-green at the tips. She wears long dangling earrings.
2. A woman with deep chestnut hair styled in loose waves, adorned with a delicate floral headband. She wears elegant gold hoop earrings and a flowing white sundress that complements the lush greenery around her.
1. A man dressed in a tailored dark gray suit with a small red emblem on the left lapel, a crisp white dress shirt, and a dark blue tie adorned with light-blue square patterns.
2. The video presents a close-up shot of a young man. He has short, styled black hair with a modern quiff. A silver chain necklace is around his neck, complementing a fitted black turtleneck sweater.
Subject Preservation Comparison
Compared with other methods, our method shows strong text-following and subject-preservation ability.
The notable points in the video are marked in red in the given prompt.
A young witch, adorned with a large red bow on her head, wearing a black top and a white apron, takes flight on a broomstick. Accompanying her is a black kitten with a red bow around its neck. They soar through the gaps between lush, green trees, where sunlight filters through the leaves. Above them is a clear blue sky dotted with fluffy white clouds.
A handsome man with neatly combed hair dons a black suit and a white shirt. He then gracefully puts on a pair of rich, deep brown leather gloves with fine stitching and a slightly ruched wrist area. The man's posture is confident as he focuses on the task of wearing the gloves, and the frame captures the entire scene in a complete and detailed manner.
A gray-haired man, a suited man with glasses, a bearded man, and a woman in a black top, step into an ancient Chinese Buddhist temple. A long shot slowly moving to a close up.
Audio-Visual Sync Comparison
Please turn on the sound when watching. Compared with other methods, our method shows strong text-following, subject-preservation, and audio-visual sync ability.
The notable points in the video are marked in red in the given prompt.
A man in a checkered shirt and headphones sings, plays a silver guitar, and speaks to the camera in a recording studio. A static front shot captures his rhythmic movements and deeply focused, emotionally engaged expression against a lit, card-decorated black wall.
A gray-haired, bearded man in his 60s speaks thoughtfully to the camera at a Paris café. Wearing a wool coat, button-down shirt, brown beret, and glasses, he has a professorial look. A static close-up captures his mostly still upper body, lit by cinematic golden light with Parisian streets in the background.
A woman in a bulky space suit with a large helmet speaks to the camera against a blurred Mars backdrop. Her face remains clearly visible, framed by loose strands of tied-back hair and dark, determined eyes. A static close-up captures her expressive face and upper body as she continues speaking.