🐱MaineCoon AI

Benchmarks

Numbers that back the claims

SocialVideo Bench results, inference speed, and cost metrics — sourced from the official technical report.

47.5 FPS

Single H100 GPU

<3s

First frame latency

10+ min

Continuous generation

<$0.001/s

Generation cost
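
As a rough illustration of what the per-second figure implies, the sketch below turns the quoted upper bound into a ceiling on the cost of a clip. The function name `max_clip_cost_usd` and the assumption that cost scales linearly with duration are illustrative only and do not come from the technical report.

```python
# Upper bound quoted on this page: generation costs less than $0.001 per second.
COST_PER_SECOND_UPPER_BOUND_USD = 0.001


def max_clip_cost_usd(duration_seconds: float) -> float:
    """Return a ceiling on generation cost.

    Assumption (not from the report): cost scales linearly with clip duration.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds * COST_PER_SECOND_UPPER_BOUND_USD


if __name__ == "__main__":
    ten_minutes = 10 * 60  # the "10+ min" continuous-generation figure, in seconds
    print(f"10-minute clip: under ${max_clip_cost_usd(ten_minutes):.2f}")
```

Under that linear assumption, a 10-minute continuous generation would stay under about $0.60.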

Output quality in practice

Benchmark scores tell only part of the story; this is what state-of-the-art output looks like in motion. The clip is a real MaineCoon generation with synchronized audio (unmute to check the lip sync).

MaineCoon

SocialVideo Bench — Overall Score

Catnip's benchmark for social-interaction video, covering 7 scenarios and 9 evaluation metrics. MaineCoon surpasses all 7 compared models.

Model                   Overall score
MaineCoon               0.934
SoulX-FlashTalk         0.895
Other baselines (×5)    < 0.89

7 Scenarios

  • Dense speech
  • Two-person interaction
  • Musical performance
  • Emotional acting
  • Dance
  • Creative challenges
  • Social memes

9 Metrics

  • Visual quality
  • Motion quality
  • Audio quality
  • Audio-visual alignment
  • Overall quality
  • Temporal consistency
  • Character consistency
  • Lip sync accuracy
  • Emotional expressiveness

Speed Comparison

Model                   FPS    Notes
MaineCoon (22B)         47.5   Single H100
MaineCoon (22B)         30+    RTX Pro 6000
Streaming AV peers      6–7    Typical range
1.3B streaming video    19.1   MaineCoon is over 2× faster despite about 17× the parameters
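
The "2×" and "17×" figures in the last row can be checked directly from the numbers in the table. The snippet below is a small sanity check, not part of any official tooling; the constant and function names are made up for illustration.

```python
# Figures copied from the speed comparison table.
MAINECOON_FPS = 47.5         # MaineCoon (22B) on a single H100
SMALL_MODEL_FPS = 19.1       # 1.3B streaming video model
MAINECOON_PARAMS_B = 22.0    # parameters, in billions
SMALL_MODEL_PARAMS_B = 1.3   # parameters, in billions


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, rejecting non-positive denominators."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


speedup = ratio(MAINECOON_FPS, SMALL_MODEL_FPS)
param_ratio = ratio(MAINECOON_PARAMS_B, SMALL_MODEL_PARAMS_B)

if __name__ == "__main__":
    print(f"Throughput ratio: {speedup:.2f}x")      # about 2.49x
    print(f"Parameter ratio:  {param_ratio:.1f}x")  # about 16.9x
```

The throughput ratio comes out at roughly 2.5 and the parameter ratio at roughly 16.9, consistent with the rounded figures in the table.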

What is SocialVideo Bench?

A benchmark created by Catnip specifically for social-interaction video generation. It evaluates models across 7 social scenarios and 9 quality metrics including visual quality, motion, audio, alignment, and consistency.

How was the 47.5 FPS measured?

The 47.5 FPS figure was measured on a single NVIDIA H100 GPU during streaming inference. An RTX Pro 6000 reaches 30+ FPS, which is sufficient for real-time playback at standard frame rates.
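
To make "sufficient for real-time playback" concrete, the sketch below compares generation throughput with common playback rates. It assumes that the reported FPS counts output video frames per wall-clock second and that 24, 25, and 30 fps are the "standard" rates meant here; the helper names are hypothetical.

```python
# Common playback frame rates (assumed to be the "standard" rates referred to).
PLAYBACK_RATES_FPS = (24, 25, 30)


def realtime_factor(generation_fps: float, playback_fps: float) -> float:
    """Seconds of video produced per wall-clock second of generation."""
    if playback_fps <= 0:
        raise ValueError("playback_fps must be positive")
    return generation_fps / playback_fps


def keeps_up(generation_fps: float, playback_fps: float) -> bool:
    """True when frames are generated at least as fast as they are played."""
    return realtime_factor(generation_fps, playback_fps) >= 1.0


if __name__ == "__main__":
    # 47.5 is the H100 figure; 30 is the lower bound of the "30+" RTX Pro 6000 figure.
    for label, gen_fps in (("H100", 47.5), ("RTX Pro 6000", 30.0)):
        for rate in PLAYBACK_RATES_FPS:
            factor = realtime_factor(gen_fps, rate)
            print(f"{label} at {rate} fps playback: {factor:.2f}x real time")
```

A factor of 1.0 or more means generation keeps pace with playback. Under these assumptions the H100 figure leaves headroom at every listed rate, while the 30 FPS lower bound exactly matches 30 fps playback.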

Can I reproduce these benchmarks?

The technical report on arXiv contains methodology details. Model weights and code are on Hugging Face and GitHub.

Experience MaineCoon live

Input a prompt and watch real-time streaming audio-visual generation on the official platform.