Latest in Open Source Multimodal AI (open.substack.com)
from yogthos@lemmy.ml to technology@lemmy.ml on 23 Dec 16:10
https://lemmy.ml/post/40709616

PE-AV - Audiovisual Perception with Code

T5Gemma 2 - Open Encoder-Decoder

Qwen-Image-Layered - Open Image Decomposition

N3D-VLM - Open 3D Vision-Language Model

Generative Refocusing - Open Depth Control

StereoPilot - Open 2D to 3D Conversion

Chatterbox Turbo - MIT Licensed TTS

FunctionGemma - Open Function Calling

FoundationMotion - Open Motion Analysis

DeContext - Open Image Protection

EgoX - Open Perspective Transformation

Step-GUI - Open GUI Automation

IC-Effect - Open Video Effects

#technology
