AI Featured Updates
Smart score: 60
SVG Generation Benchmark Released
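AI recommendation rationale
This benchmark focuses on the ability of LLMs to generate SVG and relies on large-scale human ratings, offering a new reference point for evaluating multimodal generation.
Key takeaways
Rapidata has released an SVG benchmark on ModelScope that compares 30 frontier LLMs on static SVG generation. The human evaluation comprises 188,754 comparisons, 500 prompts, and 1,355,161 human responses. Claude Fable 5 Thinking ranks first with 1232.9 ELO.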
Full text
Rapidata SVG Benchmark just landed on ModelScope, comparing 30 frontier LLMs on static SVG generation from text prompts, with 1.35M+ human votes across preference, coherence, and prompt alignment. 🚀
🤖 https://t.co/DUNsYVKVHY
📊 Scale: 188,754 head-to-head comparisons, 500 prompts, 14,872 rasterized SVG images, and 1,355,161 human responses
🎨 Evaluation target: raw SVG markup generated by LLMs, rendered to 768x768 PNGs, then ranked by humans instead of automated metrics
🏆 Overall ranking: Claude Fable 5 Thinking leads with 1232.9 ELO, followed by Claude Fable 5 and Gemini 3.1 Pro Preview
License: CC-BY-4.0 for the benchmark prompts, with generated outputs governed by each model provider's terms.
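
The post reports ELO scores derived from head-to-head human comparisons but does not say how they are computed. As a rough illustration only, here is a minimal sketch of the textbook Elo update applied to a list of pairwise votes. The K-factor, the initial rating of 1000, the function names, and the model names are all assumptions made for this example, not details of the Rapidata pipeline, which may use a different fitting method.

```python
from collections import defaultdict


def expected_score(r_a, r_b):
    """Probability that a player rated r_a beats one rated r_b under Elo."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(comparisons, k=32.0, initial=1000.0):
    """Run sequential Elo updates over (winner, loser) pairs.

    k and initial are illustrative defaults, not values from the benchmark.
    """
    ratings = defaultdict(lambda: initial)
    for winner, loser in comparisons:
        e_w = expected_score(ratings[winner], ratings[loser])
        delta = k * (1.0 - e_w)  # winner scored 1, was expected to score e_w
        ratings[winner] += delta
        ratings[loser] -= delta  # zero-sum: total rating is conserved
    return dict(ratings)


# Hypothetical votes: each tuple is (preferred model, other model).
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
]
ratings = elo_ratings(votes)
for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Sequential Elo depends on the order in which votes are processed, so leaderboards built from large vote pools often fit all comparisons jointly instead (for example with a Bradley-Terry model). The sketch above is only meant to show how pairwise preferences can translate into a single score per model.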