Add Youtu-VL #1018

Open
MollySophia wants to merge 1 commit into Blaizzy:main from MollySophia:main

@MollySophia

HF: https://huggingface.co/tencent/Youtu-VL-4B-Instruct
Screenshot 2026-04-13 11 09 21

# molly @ MOLLYJI-MB1 in ~/workspace/mlx-vlm on git:main x .venv [11:03:33] 
$ mlx_vlm.convert \
    --hf-path /Users/molly/workspace-youtulm-ncnn/Youtu-VL-4B-Instruct \
    --mlx-path /Users/molly/workspace/Youtu-VL-4B-Instruct-4bit-test \
    -q --q-bits 4 --trust-remote-code
[INFO] Loading
[INFO] Using dtype: bfloat16
[INFO] Quantizing
[INFO] Quantized model with 5.460 bits per weight.
# molly @ MOLLYJI-MB1 in ~/workspace/mlx-vlm on git:main x .venv [11:11:41] 
$ mlx_vlm.generate --model ../Youtu-VL-4B-Instruct-4bit-test --max-tokens 512 --prompt "Please describe this image shortly" --trust-remote-code --image /Users/molly/workspace/Youtu-VL-4B-Instruct-6bit-mlx/assets/youtu-vl-logo.png
==========
Files: ['/Users/molly/workspace/Youtu-VL-4B-Instruct-6bit-mlx/assets/youtu-vl-logo.png'] 

Prompt: <|begin_of_text|>system
You are a helpful assistant.<|end_of_text|>
<|begin_of_text|>user
<|vision_start|><|image_pad|><|vision_end|>Please describe this image shortly<|end_of_text|>
<|begin_of_text|>assistant

This image is a promotional banner for a project called “Youtu-VL.”

It features:

- A cute cartoon penguin on the left, wearing headphones and a colorful scarf, suggesting a fun or accessible AI tool.
- The main title “Youtu-VL” in large, gray, sans-serif font centered on a black background.
- The tagline below: “Unleashing Visual Potential via Unified Vision-Language Supervision,” indicating the project’s focus on combining visual and language understanding.
- A colorful abstract logo on the right, composed of three interconnected shapes in blue, pink, and teal — likely representing the Youtu Lab’s branding.

The overall design is clean, modern, and tech-oriented, aimed at conveying innovation and accessibility in vision-language AI.
==========
Prompt: 348 tokens, 131.744 tokens-per-sec
Generation: 165 tokens, 42.101 tokens-per-sec
Peak memory: 4.887 GB
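
For reviewers who want wall-clock numbers, the throughput figures above can be turned into approximate per-phase times. This is a minimal sketch, assuming each reported rate is simply the token count divided by the elapsed time for that phase; `elapsed_seconds` is a hypothetical helper, not part of mlx-vlm.

```python
def elapsed_seconds(tokens: int, tokens_per_sec: float) -> float:
    """Return the time implied by a token count and a throughput figure.

    Assumption: the logged rate equals tokens / elapsed seconds for the phase.
    """
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return tokens / tokens_per_sec


# Figures copied from the log above.
prompt_s = elapsed_seconds(348, 131.744)      # prompt processing
generation_s = elapsed_seconds(165, 42.101)   # token generation
total_s = prompt_s + generation_s

print(f"prompt: {prompt_s:.2f} s")
print(f"generation: {generation_s:.2f} s")
print(f"total: {total_s:.2f} s")
```

Under that assumption, the run spends roughly 2.6 s on the prompt and 3.9 s on generation, so generation dominates even for this short reply.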

Signed-off-by: mollyji <[email protected]>