🧠 AI/NLP Daily Intelligence

2026-04-24 · 88 items
⭐ Editor's Pick
🏆 Today's must-read: Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows

Why it matters: This paper targets a core bottleneck in scaling AI agent deployments, the tool-invocation overhead known as the MCP/Tools Tax, and proposes two solutions: dynamic tool gating and lazy schema loading. It has high engineering value for the efficiency and scalability of multi-agent systems, and marks key progress on a practical obstacle to real-world agent deployment.
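The paper's actual design is not reproduced here; as a rough illustration of the two ideas, the following is a minimal Python sketch (all names, such as `Tool` and `gate_tools`, are hypothetical) in which each tool's token-expensive JSON schema is materialized only after a simple relevance gate has selected it for the agent's context:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Tool:
    name: str
    summary: str                          # short description, always visible
    _schema_loader: Callable[[], dict]    # full JSON schema, fetched lazily
    _schema: Optional[dict] = None

    def schema(self) -> dict:
        # Lazy schema loading: the schema is built only when the gate
        # has actually admitted this tool into the context.
        if self._schema is None:
            self._schema = self._schema_loader()
        return self._schema

def gate_tools(tools: list, query: str, k: int = 2) -> list:
    # Toy "dynamic gating": rank tools by keyword overlap with the
    # query and expose only the top-k; everything else stays unloaded.
    words = set(query.lower().split())
    ranked = sorted(
        tools,
        key=lambda t: -len(words & set(t.summary.lower().split())),
    )
    return ranked[:k]

tools = [
    Tool("web_search", "search the web for pages",
         lambda: {"type": "object", "properties": {"q": {"type": "string"}}}),
    Tool("calculator", "evaluate arithmetic expressions",
         lambda: {"type": "object", "properties": {"expr": {"type": "string"}}}),
    Tool("file_read", "read a file from disk",
         lambda: {"type": "object", "properties": {"path": {"type": "string"}}}),
]

chosen = gate_tools(tools, "search the web for MCP papers", k=1)
print([t.name for t in chosen])  # → ['web_search']
print(chosen[0].schema())        # schema is built only at this point
```

A real MCP-backed version would presumably gate over `tools/list` results and defer the schema fetch to the server; the sketch only shows why deferring it shrinks per-turn context.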

📝 Today's Highlights

Today's AI highlights cluster around two themes: **practical deployment and safety of large language models (LLMs)**. The most noteworthy topic is **LLM breakthroughs in scientific automation and agent applications**, such as automating the path from research question to scientific workflow (item 3) and end-to-end optimization of multi-agent language systems (item 12). **Model safety and ethics** also come up repeatedly, including non-verbatim memorization risks (item 7), multi-turn dialogue vulnerabilities (item 9), and cultural bias (item 16). Meanwhile, **evaluation and interactivity** are emerging trends, with user-customized leaderboards (item 14) and narrative similarity tasks (item 13) signaling a shift from chasing raw performance toward evaluations grounded in real-world scenarios. Overall, AI is moving from "what it can do" to "how to do it more reliably and controllably."

📄 Latest arXiv Papers (29 items)

👤 Thibault Bañeras-Roux, Shashi Kumar, Driss Khalil, Sergio Burdisso et al.
📂 cs.CL
Automatic Speech Recognition (ASR) is traditionally evaluated using Word Error Rate (WER), a metric that is insensitive to meaning. Embedding-based semantic metrics are better correlated with human pe...
💬 Using LLMs to evaluate speech recognition opens a new paradigm for ASR evaluation; a refreshingly original idea.
👤 Zhiqiu Xu, Shibo Jin, Shreya Arya, Mayur Naik
📂 cs.CL, cs.SE
As frontier language models attain near-ceiling performance on static mathematical benchmarks, existing evaluations are increasingly unable to differentiate model capabilities, largely because they ca...
💬 Having LLMs both pose and solve problems tests mathematical reasoning from every angle; a novel setup.
👤 Bartosz Balis, Michal Orzechowski, Piotr Kica, Michal Dygas et al.
📂 cs.AI
Scientific workflow systems automate execution -- scheduling, fault tolerance, resource management -- but not the semantic translation that precedes it. Scientists still manually convert research ques...
💬 Uses AI agents to automate the entire research pipeline, from question to experiment in one pass.
👤 Chee Wei Tan, Yuchen Wang, Shangxin Guo
📂 cs.AI
This paper introduces a new paradigm for AI game programming, leveraging large language models (LLMs) to extend and operationalize Claude Shannon's taxonomy of game-playing machines. Central to this p...
💬 Turns LLMs into game-playing AI that teaches strategy through play; a great aid for interactive learning.
👤 Praval Sharma, Ashok Samal, Leen-Kiat Soh, Deepti Joshi
📂 cs.CL
Event extraction identifies the central aspects of events from text. It supports event understanding and analysis, which is crucial for tasks such as informed decision-making in emergencies. Therefore...
💬 A large-scale open-domain event extraction dataset that sets a benchmark for document-level information extraction.
👤 Jun Wang, Ziyin Zhang, Rui Wang, Hang Yu et al.
📂 cs.CL, cs.AI, cs.LG
Real-time detection and mitigation of technical anomalies are critical for large-scale cloud-native services, where even minutes of downtime can result in massive financial losses and diminished user ...
💬 Enterprise-grade real-time risk event detection that pinpoints key signals in a sea of noise.
👤 Yuto Nishida, Naoki Shikoda, Yosuke Kishinami, Ryo Fujii et al.
📂 cs.CL
Understanding what kinds of factual knowledge large language models (LLMs) memorize is essential for evaluating their reliability and limitations. Entity-based QA is a common framework for analyzing n...
💬 Revisits non-verbatim memorization in LLMs, revealing the key role of entities' surface forms.
👤 Jiseon Kim, Jea Kwon, Luiz Felipe Vecchietti, Wenchao Dong et al.
📂 cs.CL
Human moral judgment is context-dependent and modulated by interpersonal relationships. As large language models (LLMs) increasingly function as decision-support systems, determining whether they enco...
💬 Probes AI behavior in moral dilemmas, comparing human judgments with model decisions.
👤 Naheed Rayhan, Sohely Jahan
📂 cs.CR, cs.AI
Large language models (LLMs) are increasingly integrated into sensitive workflows, raising the stakes for adversarial robustness and safety. This paper introduces Transient Turn Injection(TTI), a new ...
💬 Exposes the fragility of LLMs in multi-turn dialogue, using transient injection attacks to reveal stateless vulnerabilities.
👤 Haolin Zhang, William Reber, Yuxuan Zhang, Guofei Gu et al.
📂 cs.CR, cs.AI
Modern phishing campaigns increasingly evade snapshot-based URL classifiers using interaction gates (e.g., checkbox/slider challenges), delayed content rendering, and logo-less credential harvesters. ...
💬 An interactive URL classification tool; decoupled checklist adjudication makes threat triage more efficient.
👤 Anuj Sadani, Deepak Kumar
📂 cs.AI
The Model Context Protocol (MCP) has become a common interface for connecting large language model (LLM) agents to external tools, but its reliance on stateless, eager schema injection imposes a hidde...
💬 Dynamic tool gating plus lazy schema loading eliminates tool-invocation overhead in agent workflows.
👤 Ye Yu, Heming Liu, Haibo Jin, Xiaopeng Yuan et al.
📂 cs.AI, cs.CL, cs.MA
Multi-agent systems built on large language models have shown strong performance on complex reasoning tasks, yet most work focuses on agent roles and orchestration while treating inter-agent communica...
💬 End-to-end optimization of multi-agent language systems teaches AI to communicate and collaborate efficiently.
👤 Hans Ole Hatzel, Ekaterina Artemova, Haimo Paul Stiemer, Evelyn Gius et al.
📂 cs.CL
We present the shared task on narrative similarity and narrative representation learning - NSNRL (pronounced "nass-na-rel"). The task operationalizes narrative similarity as a binary classification pr...
💬 A new shared task on narrative similarity and representation learning takes semantic understanding up a level.
👤 Minji Jung, Minjae Lee, Yejin Kim, Sarang Choi et al.
📂 cs.AI, cs.CY, cs.HC
LLM leaderboards are widely used to compare models and guide deployment decisions. However, leaderboard rankings are shaped by evaluation priorities set by benchmark designers, rather than by the dive...
💬 Lets users define their own LLM leaderboard criteria, breaking the single authority of "best."
👤 Guangxiang Zhao, Qilong Shi, Xusen Xiao, Xiangzheng Zhang et al.
📂 cs.AI
Reasoning LLMs often spend substantial tokens on long intermediate reasoning traces (e.g., chain-of-thought) when solving new problems. We propose to summarize and store reusable reasoning skills dist...
💬 Reasoning skills guide LLM thinking, trading fewer tokens for higher accuracy.
👤 Joseba Fernandez de Landa, Carla Perez-Almendros, Jose Camacho-Collados
📂 cs.CL, cs.AI, cs.CY
LLMs have been showing limitations when it comes to cultural coverage and competence, and in some cases show regional biases such as amplifying Western and Anglocentric viewpoints. While there have be...
💬 Unpacks the "Japanese culture" myth in LLMs, revealing hidden cultural and regional biases.
👤 Buqiang Xu, Yijun Chen, Jizhan Fang, Ruobin Zhong et al.
📂 cs.CL, cs.AI, cs.IR, cs.LG, cs.MA
Long-term conversational agents need memory systems that capture relationships between events, not merely isolated facts, to support temporal reasoning and multi-hop question answering. Current approa...
💬 Structured memory enables long-horizon planning in LLMs, curing conversational amnesia.
👤 Wujiang Xu, Jiaojiao Han, Minghao Guo, Kai Mei et al.
📂 cs.CL, cs.AI, cs.CE
LLM agents increasingly operate in open-ended environments spanning hundreds of sequential episodes, yet they remain largely stateless: each task is solved from scratch without converting past experie...
💬 Agents learn and evolve autonomously in open-ended environments, moving toward truly general intelligence.
👤 Minh Duc Bui, Xenia Heilmann, Mattia Cerrato, Manuel Mager et al.
📂 cs.CL, cs.SE
Prior work evaluates code generation bias primarily through simple conditional statements, which represent only a narrow slice of real-world programming and reveal solely overt, explicitly encoded bia...
💬 From if-statements to ML pipelines, tracing bias in code generation.
👤 Jiali Wei, Ming Fan, Guoheng Sun, Xicheng Zhang et al.
📂 cs.CR, cs.AI, cs.CL
The growing application of large language models (LLMs) in safety-critical domains has raised urgent concerns about their security. Many recent studies have demonstrated the feasibility of backdoor at...
💬 Backdoors implanted via natural-style triggers pose a new challenge for LLM security defenses.
👤 Qizhuo Xie, Yunhui Liu, Yu Xing, Qianzi Hou et al.
📂 cs.AI, cs.CL
Large Language Models (LLMs) have shown immense potential in Knowledge Graph Completion (KGC), yet bridging the modality gap between continuous graph embeddings and discrete LLM tokens remains a criti...
💬 Granular semantics plus generative structural quantization make knowledge graph completion more precise.
👤 Lester James V. Miranda, Songbo Hu, Roi Reichart, Anna Korhonen
📂 cs.CL, cs.CY
Where and how language models (LMs) are deployed determines who can benefit from them. However, there are several challenges that prevent effective deployment of LMs in non-English-speaking and hardwa...
💬 Edge-side language models for the Global South bring AI's benefits to more language communities.
👤 Hao-Yuan Chen
📂 cs.CL, cs.AI
Inference-time scaling for LLM reasoning has focused on three axes: chain depth, sample breadth, and learned step-scorers (PRMs). We introduce a fourth axis, granularity of external verbal supervision...
💬 Process supervision via verbal critiques markedly boosts LLM reasoning.
👤 Kaushitha Silva, Srinath Perera
📂 cs.SE, cs.AI
Multi-agent frameworks are widely used in autonomous code generation and have applications in complex algorithmic problem-solving. Recent work has addressed the challenge of generating functionally co...
💬 Probes the role of public tests in LLM code generation, exposing evaluation blind spots.
👤 Linjuan Wu, Haoran Wei, Jialong Tang, Shuang Luo et al.
📂 cs.CL
As LLMs reduce English-centric bias, a surprising trend emerges: non-English responses sometimes outperform English on reasoning tasks. We hypothesize that language functions as a latent variable that...
💬 Treats language as a latent variable for optimizing reasoning; clever and practical.
👤 Maximilian Westermann, Ben Griffin, Aaron Ontoyin Yin, Zakari Salifu et al.
📂 cs.AI, cs.CE, cs.LG
Feature discovery from complex unstructured data is fundamentally a reasoning problem: it requires identifying abstractions that are predictive of a target outcome while avoiding leakage, proxies, and...
💬 Controls the LLM reasoning process for feature discovery.
👤 Milan De Koning, Ali Asgari, Pouria Derakhshanfar, Annibale Panichella
📂 cs.SE, cs.AI
LLM-based automated program repair (APR) techniques have shown promising results in reducing debugging costs. However, prior results can be affected by data leakage: large language models (LLMs) may m...
👤 Chris Schneider, Philipp Schoenegger, Ben Bariach
📂 cs.AI, cs.LG
Current model training approaches incorporate user information directly into shared weights, making individual data removal computationally infeasible without retraining. This paper presents a three-l...
👤 Rodrigo Nogueira, Giovana Kerche Bonás, Thales Sales Almeida, Andrea Roque et al.
📂 cs.CL
Large language models increasingly shape the information people consume: they are embedded in search, consulted for professional advice, deployed as agents, and used as a first stop for questions abou...

🤗 Trending HuggingFace Papers (20 items)

👤 Yueyang Ding, HaoPeng Zhang, Rui Dai · 👍 75
Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent am...
👤 Kanzhi Cheng, Zehao Li, Zheng Ma · 👍 27
Mobile agents powered by vision-language models have demonstrated impressive capabilities in automating mobile tasks, with recent leading models achieving a marked performance leap, e.g., nearly 70% s...
👤 Chaitanya Dwivedi, Binxuan Huang, Himanshu Gupta · 👍 15
Mixture-of-Experts (MoE) has become the dominant architecture for scaling large language models: frontier models routinely decouple total parameters from per-token computation through sparse expert ro...
👤 Yen-Siang Wu, Rundong Luo, Jingsen Zhu · 👍 13
How can we tell whether a video has been sped up or slowed down? How can we generate videos at different speeds? Although videos have been central to modern computer vision research, little attention ...
👤 Xiyang Wu, Zongxia Li, Guangyao Shi · 👍 10
Long horizon interactive environments are a testbed for evaluating agents skill usage abilities. These environments demand multi step reasoning, the chaining of multiple skills over many timesteps, an...
👤 Wenhong Zhu, Ruobing Xie, Rui Wang · 👍 9
Knowledge distillation (KD) is a powerful paradigm for compressing large language models (LLMs), whose effectiveness depends on intertwined choices of divergence direction, optimization strategy, and ...
👤 Jun Wang, Ziyin Zhang, Rui Wang · 👍 9
Real-time detection and mitigation of technical anomalies are critical for large-scale cloud-native services, where even minutes of downtime can result in massive financial losses and diminished user ...
👤 Skylar Zhai, Jingcheng Liang, Dongyeop Kang · 👍 8
Reinforcement fine-tuning improves the reasoning ability of large language models, but it can also encourage them to answer unanswerable queries by guessing or hallucinating missing information. Exist...
👤 Valentin Gabeur, Shangbang Long, Songyou Peng · 👍 7
Recent works show that image and video generators exhibit zero-shot visual understanding behaviors, in a way reminiscent of how LLMs develop emergent capabilities of language understanding and reasoni...
👤 Qijun Han, Haoqin Tu, Zijun Wang · 👍 5
Autonomous GUI agents face two fundamental challenges: early stopping, where agents prematurely declare success without verifiable evidence, and repetitive loops, where agents cycle through the same f...
👤 Ceyuan Yang, Zhijie Lin, Yang Zhao · 👍 5
We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D geometry, and hidden representations. We find that such training enables Context ...
👤 Juyong Jiang, Chenglin Cai, Chansung Park · 👍 3
While Large Language Models (LLMs) excel at function-level code generation, project-level tasks such as generating functional and visually aesthetic multi-page websites remain highly challenging. Exis...
👤 Yanran Zhang, Wenzhao Zheng, Yifei Li · 👍 3
In recent years, significant progress has been made in both image generation and generated image detection. Despite their rapid, yet largely independent, development, these two fields have evolved dis...
👤 Hardy Chen, Nancy Lau, Haoqin Tu · 👍 3
Frontier coding agents are increasingly used in workflows where users supervise progress primarily through repeated improvement of a public score, namely the reported score on a public evaluation file...
👤 Vipula Rawte, Ryan Rossi, Franck Dernoncourt · 👍 2
Large Language Models (LLMs) have demonstrated remarkable fluency and versatility across a wide range of NLP tasks, yet they remain prone to factual inaccuracies and hallucinations. This limitation po...
👤 Noah Flynn · 👍 2
Large language models (LLMs) often exhibit performance disparities across languages, with naive multilingual fine-tuning frequently degrading performance due to negative cross-lingual interference. To...
👤 Benjamin K. Johnson, Thomas Goralski, Ayush Semwal · 👍 2
Semi-Markov Conditional Random Fields (semi-CRFs) assign labels to segments of a sequence rather than to individual positions, enabling exact inference over segment-level features and principled uncer...
👤 Yao Zhang, Zhuchenyang Liu, Thomas Ploetz · 👍 1
The world knowledge and reasoning capabilities of text-based large language models (LLMs) are advancing rapidly, yet current approaches to human motion understanding, including motion question answeri...
👤 Jaechul Roh, Amir Houmansadr · 👍 1
Prior work shows that fine-tuning aligned models on benign data degrades safety in text and vision modalities, and that proximity to harmful content in representation space predicts which samples caus...
👤 Mikhail Menschikov, Dmitry Evseev, Victoria Dochkina · 👍 0
Personalizing language models by effectively incorporating user interaction history remains a central challenge in the development of adaptive AI systems. While large language models (LLMs), combined ...

🔥 Trending GitHub Projects (7 items)

⭐ 8,845 (+706 today) · TypeScript
Code search MCP for Claude Code. Makes the entire codebase available as context for any coding agent.
⭐ 2,821 (+228 today) · Python
Build your own AI SRE agents. The open source toolkit for the AI era ✨
⭐ 107,319 (+203 today) · Python
100+ AI Agent & RAG apps you can actually run — clone, customize, ship.
⭐ 31,240 (+188 today) · Python
LLM-powered stock analysis system for A/H/US markets: multi-source market data + real-time news + an LLM decision dashboard + multi-channel notifications; runs on a schedule at zero cost, entirely on free tiers.
⭐ 7,772 (+49 today) · Python
An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.
⭐ 25,535 (+20 today) · Python
The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.
⭐ 8,965 (+17 today) · Python
A collection of sample agents built with Agent Development Kit (ADK)

🤖 Trending HuggingFace Models (6 items)

❤️ 13,307 · 📥 4,011,990 downloads · text-generation
❤️ 6,523 · 📥 3,330,280 downloads · text-generation
❤️ 6,057 · 📥 9,668,911 downloads · text-to-speech
❤️ 5,623 · 📥 4,904,571 downloads · automatic-speech-recognition
❤️ 4,998 · 📥 6,630 downloads · text-generation
❤️ 4,728 · 📥 3,666,745 downloads · text-generation

🚀 New AI Products (10 items)

63. BAND
Coordinate and govern multi-agent work in a single chat
64. Beezi AI
Make AI development structured, secure, and cost-efficient.
Emotional intelligence AI for live sales calls
66. MailCue
Run as a fully hardened production email server.
AI that learns your listening habits and curates your live radio
Find the right product, just ask
69. Spira AI
An AI influencer that is always on trend; create and grow your brand
AI design agent with full HTML/CSS control and SSR
71. LifeOS
Turn your AI chats and memory into intros with real humans
A generative audio workstation with VSTs

📢 AI Industry News (16 items)

Mary Minno launches Treehub, an early-stage startup accelerator program, and AI Health Fund, an early-stage firm aimed at backing startups working at the…
74. OpenAI Blog Introducing GPT-5.5
Introducing GPT-5.5, our smartest model yet—faster, more capable, and built for complex tasks like coding, research, and data analysis across tools.
75. OpenAI Blog GPT-5.5 System Card
76. OpenAI Blog GPT-5.5 Bio Bug Bounty
Explore the GPT-5.5 Bio Bug Bounty: a red-teaming challenge to find universal jailbreaks for bio safety risks, with rewards up to $25,000.
OpenAI makes ChatGPT for Clinicians free for verified U.S. physicians, nurse practitioners, and pharmacists, supporting clinical care, documentation…
78. OpenAI Blog Workspace agents
Learn how to build, use, and scale workspace agents in ChatGPT to automate repeatable workflows, connect tools, and streamline team operations.
A deep dive into the Codex agent loop, showing how WebSockets and connection-scoped caching reduced API overhead and improved model latency.
Workspace agents in ChatGPT are Codex-powered agents that automate complex workflows, run in the cloud, and help teams scale work across tools securely.
OpenAI Privacy Filter is an open-weight model for detecting and redacting personally identifiable information (PII) in text with state-of-the-art accuracy.
The eighth generation of Google’s TPU includes two specialized chips that will power the future of AI.
One-click access to open-source SOTA large-model capabilities
The industry's first standardized capability-package specification for multi-agent collaboration
Leads both domestic and open-source peers in agent capability, world knowledge, and reasoning performance.
Half-marathon champion robot "闪电" (Lightning) from 荣耀 makes a surprise appearance