🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.
vArmor is a cloud-native container sandbox based on LSM. It includes multiple built-in protection rules that are ready to use out of the box.
Core native components of the Katalyst system, including multiple agents and centralized components
DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer
The official code for "GUI-ReWalk: Massive Data Generation for GUI Agent via Stochastic Exploration and Intent-Aware Reasoning"
A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.