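AI Curated Updates
Smart score: 68
DeepAdapt launches a runtime intelligence layer: AI operating costs down by up to 82%, inference 33X faster
Why the AI recommends this
By quantifying concrete gains such as lower cost and higher speed, DeepAdapt's technology goes beyond what many open-source approaches would lead one to expect. The original report is worth reading directly.
Key takeaways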
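DeepAdapt has released a runtime intelligence layer called Adaptive Continual Intelligence (ACI) that offloads repetitive tasks from GPUs to CPUs, cutting AI operating costs by up to 82% and making inference 33X faster. DeepAdapt's figures show up to 90% lower token consumption, 5.7X lower production cost, a median latency of 159 ms, 96% accuracy versus 85% without ACI, 85.7% lower energy per 1,000 decisions, 4.8X fewer rule violations, and 89.8% of decisions handled without a GPU. According to DeepAdapt, ACI goes live without any training: it learns and audits in real time from the first request, so GPU dependence and energy use fall as the system matures.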
Full text
DeepAdapt has launched a runtime intelligence layer that cuts AI operating costs by up to 82% and makes inference 33X faster by shifting repetitive workloads from GPUs to standard CPUs.
They are calling it Adaptive Continual Intelligence, ACI.
ACI is a runtime learning layer where analytical learning, supervised learning, and reinforcement learning work together while the system is already in production.
ACI is not caching, memory, a knowledge graph, routing, or a simple optimization trick.
This technique learns from model decisions, corrections, labels, outcomes, and experience, then serves known decisions locally on CPU. Only new, uncertain, or complex requests are routed back to the underlying model.
ACI can also be pre-trained for specific domains, making continual learning faster and cheaper.
DeepAdapt is rolling ACI out first for cloud-based LLM agents, but the same architecture becomes even more important on personal devices, where compute, battery, latency, and local inference reliability are much tighter constraints.
In their benchmarks, ACI has shown up to 90% lower token consumption, 5.7X lower production-scale cost, 33X faster inference with 159 ms median latency, 96% accuracy vs. 85% without ACI, 85.7% lower energy per 1,000 decisions, and 4.8X fewer rule violations.
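The benchmark list mixes "N times lower" figures with percentage figures, which makes them hard to compare at a glance. The short sketch below converts between the two. The function names are made up for illustration, and the baseline-latency calculation assumes the 33X speedup refers to the same median latency as the 159 ms figure, which the thread does not state.

```python
def fold_to_percent_reduction(fold: float) -> float:
    """Convert an 'N times lower' figure into a percent reduction."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return (1.0 - 1.0 / fold) * 100.0


def implied_baseline_ms(latency_ms: float, speedup: float) -> float:
    """Baseline latency implied by a speedup.

    Assumption: the speedup and the latency describe the same metric.
    """
    return latency_ms * speedup


# Figures quoted in the thread.
cost_reduction = fold_to_percent_reduction(5.7)       # 5.7X lower production-scale cost
violation_reduction = fold_to_percent_reduction(4.8)  # 4.8X fewer rule violations
baseline_ms = implied_baseline_ms(159, 33)            # 159 ms median, 33X faster

print(f"5.7X lower cost is a {cost_reduction:.1f}% reduction")
print(f"4.8X fewer violations is a {violation_reduction:.1f}% reduction")
print(f"Implied baseline median latency: {baseline_ms:.0f} ms")
```

Read this way, 5.7X lower cost is roughly an 82.5% reduction, which lines up with the "up to 82%" headline figure, though the thread does not say whether one number was derived from the other.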
DeepAdapt intercepts user requests, serving known answers instantly from a standard CPU to completely bypass the expensive GPU.
New questions go to the GPU, but the system logs the output and any human corrections to learn for the next time.
This keeps the underlying language model entirely frozen while the outer software layer handles all real-time learning and auditing.
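The intercept, serve, and learn loop described above can be sketched in a few lines. This is a deliberately simplified illustration of the routing pattern, not DeepAdapt's implementation: the class `RuntimeLayer`, its methods, and the exact-match lookup are all hypothetical. The thread says ACI is not caching, so the dictionary here is only a stand-in for whatever learned decision function ACI actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class RuntimeLayer:
    """Hypothetical runtime layer sitting in front of a frozen model."""

    model: Callable[[str], str]  # frozen underlying model (the GPU path)
    known: Dict[str, str] = field(default_factory=dict)  # learned decisions (the CPU path)
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    @staticmethod
    def _key(request: str) -> str:
        # Toy normalization; a real system would need something far richer.
        return request.strip().lower()

    def handle(self, request: str) -> str:
        key = self._key(request)
        if key in self.known:
            # Known decision: answer locally and skip the model entirely.
            answer = self.known[key]
            self.audit_log.append((request, answer, "cpu"))
            return answer
        # New request: call the model, then record the output for next time.
        answer = self.model(request)
        self.known[key] = answer
        self.audit_log.append((request, answer, "gpu"))
        return answer

    def correct(self, request: str, corrected: str) -> None:
        # A human correction overrides what was learned; the model is untouched.
        self.known[self._key(request)] = corrected
        self.audit_log.append((request, corrected, "correction"))


gpu_calls: List[str] = []


def fake_model(prompt: str) -> str:
    """Stand-in for an expensive model call; counts how often it is used."""
    gpu_calls.append(prompt)
    return f"model answer to: {prompt}"


layer = RuntimeLayer(model=fake_model)
first = layer.handle("What is the refund policy?")   # goes to the model
second = layer.handle("what is the refund policy?")  # served locally
layer.correct("What is the refund policy?", "Refunds are accepted within 30 days.")
third = layer.handle("What is the refund policy?")   # served locally, corrected

print(len(gpu_calls), third)
```

In this toy run the model is called once for three requests, and the correction changes later answers without any change to the model itself, which is the property the thread attributes to keeping the model frozen.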
ACI requires zero training. No fine-tuning. No retraining pipelines. You wire it into your existing stack and it starts learning from real use on the very first request. Every improvement happens at runtime.
The effect: GPU dependency and cost decrease as the system matures, and energy consumption drops proportionally.
In ACI-native agents, everything else becomes a tool inside the ACI runtime: the LLM, memory, tools, knowledge graphs, prompts, workflows, APIs, and external systems. ACI decides what can be handled locally, what should be learned, what must be enforced, and when the system actually needs to fall back to the model.
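One way to picture that decision step is as a small routing function. The sketch below is an assumption-laden illustration, not ACI's API: the `Route` enum, the `decide` function, the confidence score, and the 0.9 threshold are all invented here, and the thread does not say how ACI measures certainty or represents rules.

```python
from enum import Enum
from typing import Callable, Iterable, Optional


class Route(Enum):
    BLOCK = "block"        # a rule must be enforced
    LOCAL = "local"        # confident enough to serve without the model
    FALLBACK = "fallback"  # new or uncertain: call the model and learn from it


def decide(
    request: str,
    confidence: Optional[float],
    rules: Iterable[Callable[[str], bool]],
    threshold: float = 0.9,  # hypothetical cutoff, not a documented ACI setting
) -> Route:
    """Pick a route for one request.

    Each rule returns True when the request violates it.
    confidence is None when the request has never been seen.
    """
    if any(rule(request) for rule in rules):
        return Route.BLOCK
    if confidence is not None and confidence >= threshold:
        return Route.LOCAL
    return Route.FALLBACK


# Hypothetical rule: never process requests that mention a card number.
rules = [lambda text: "card number" in text.lower()]

blocked = decide("Store this card number for later", 0.99, rules)
local = decide("Classify this ticket as billing", 0.97, rules)
uncertain = decide("Classify this ticket as billing", 0.55, rules)
unseen = decide("Summarize this new contract type", None, rules)

print(blocked, local, uncertain, unseen)
```

Checking rules before confidence means an enforced rule wins even when a request is otherwise well known, which matches the thread's framing of enforcement as a separate concern from what can be handled locally.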
Inference is becoming one of AI’s biggest cost centers. Token prices may fall, but total AI bills keep rising because usage is exploding. The real leverage is avoiding unnecessary GPU calls altogether.
With ACI, the LLM is no longer the center of the architecture, because ACI becomes the runtime intelligence layer that decides what can be inferred locally, what should be learned, what must be enforced, and when the model is actually needed.
Rohan Paul (@rohanpaul_ai): Check them out here
https://t.co/DMZkSGwtIJ
Learn about their core thesis
https://t.co/zgdLDiVLtd
Rohan Paul (@rohanpaul_ai): 🧵 7. DeepAdapt’s official numbers for Adaptive Continual Intelligence — ACI.
It can keep 89.8% of decisions off the GPU, reduce energy per 1,000 decisions by 85.7%, and lower production cost by 5.7× on its 867-task run.
These gains come from serving repeated, certified decisions locally after ACI has learned them, instead of sending the same kind of work back through a GPU model every time.