AI Academic Digest [12.16]

Column: Entertainment News  Date: 2023-08-14

  cs.AI Artificial Intelligence, 42 papers in total

  【1】 Textless Speech-to-Speech Translation on Real Data  Link: https://arxiv.org/abs/2112.08352

  Authors: Ann Lee, Hongyu Gong, Paul-Ambroise Duquenne, Holger Schwenk, Peng-Jen Chen, Changhan Wang, Sravya Popuri, Juan Pino, Jiatao Gu, Wei-Ning Hsu  Abstract: We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another and can be built without the need for any text data. Unlike existing work in the literature, we tackle the challenge of modeling multi-speaker target speech and train the system with real-world S2ST data. The key to our approach is a self-supervised unit-based speech normalization technique, which finetunes a pre-trained speech encoder with paired audio from multiple speakers and a single reference speaker to reduce the variations due to accents while preserving the lexical content. With only 10 minutes of paired data for speech normalization, we obtain an average gain of 3.2 BLEU when training the S2ST model on the vp S2ST dataset, compared to a baseline trained on un-normalized speech targets. We also incorporate automatically mined S2ST data and show an additional 2.0 BLEU gain. To our knowledge, we are the first to establish a textless S2ST technique that can be trained with real-world data and works for multiple language pairs.

  【2】 Rethinking Influence Functions of Neural Networks in the Over-parameterized Regime  Link: https://arxiv.org/abs/2112.08297

  Authors: Rui Zhang, Shihua Zhang  Note: To appear in AAAI 2022  Abstract: Understanding the black-box predictions of neural networks is challenging. To achieve this, early studies designed influence functions (IF) to measure the effect of removing a single training point on a neural network. However, the classic implicit Hessian-vector product (IHVP) method for calculating IF is fragile, and theoretical analysis of IF in the context of neural networks is still lacking. To this end, we utilize neural tangent kernel (NTK) theory to calculate IF for neural networks trained with regularized mean-square loss, and prove that the approximation error can be arbitrarily small when the width is sufficiently large for two-layer ReLU networks. We analyze the error bound for the classic IHVP method in the over-parameterized regime to understand when and why it fails. Specifically, our theoretical analysis reveals that (1) the accuracy of IHVP depends on the regularization term, and is quite low under weak regularization; (2) the accuracy of IHVP is significantly correlated with the probability density of the corresponding training points. We further borrow NTK theory to understand IFs better, including quantifying the complexity of influential samples and depicting the variation of IFs during the training dynamics. Numerical experiments on real-world data confirm our theoretical results and demonstrate our findings.

  【3】 Programming Knowledge Tracing: A Comprehensive Dataset and A New Model  Link: https://arxiv.org/abs/2112.08273

  Authors: Renyu Zhu, Dongxiang Zhang, Chengcheng Han, Ming Gao, Xuesong Lu, Weining Qian, Aoying Zhou  Abstract: In this paper, we study knowledge tracing in the domain of programming education and make two important contributions. First, we harvest and publish the most comprehensive dataset to date, namely BePKT, which covers various online behaviors in an online judge (OJ) system, including programming text problems, knowledge annotations, user-submitted code, and system-logged events. Second, we propose a new model, PDKT, to exploit the enriched context for accurate student behavior prediction. More specifically, we construct a bipartite graph for programming problem embedding, and design an improved pre-training model, PLCodeBERT, for code embedding, as well as a double-sequence RNN model with exponential decay attention for effective feature fusion. Experimental results on the new dataset BePKT show that our proposed model establishes state-of-the-art performance in programming knowledge tracing. In addition, we verify that our code embedding strategy based on PLCodeBERT is complementary to existing knowledge tracing models and further enhances their accuracy. As a by-product, PLCodeBERT also yields better performance on other programming-related tasks such as code clone detection.
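The "exponential decay attention" component can be illustrated in isolation. Below is a hedged toy sketch; the function name and decay parameterization are our own, not the PDKT implementation. Raw attention scores over past interactions are damped in log-space by the elapsed time before the softmax, so recent attempts dominate older ones.

```python
import numpy as np

def exp_decay_attention(scores, positions, t_now, decay=0.5):
    """Toy exponential-decay attention: raw scores for past interactions
    are damped by exp(-decay * time_gap) before the softmax, so recent
    practice weighs more than old practice."""
    gaps = t_now - np.asarray(positions, dtype=float)
    damped = np.asarray(scores, dtype=float) - decay * gaps  # log-space damping
    damped -= damped.max()                                   # numerical stability
    w = np.exp(damped)
    return w / w.sum()

# Three past attempts with equal raw scores at time steps 1, 5, 9; query at t=10.
w = exp_decay_attention(scores=[1.0, 1.0, 1.0], positions=[1, 5, 9], t_now=10)
print(w)  # weights increase toward the most recent attempt
```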

  【4】 Prescriptive Machine Learning for Automated Decision Making: Challenges and Opportunities  Link: https://arxiv.org/abs/2112.08268

  Authors: Eyke Hüllermeier  Abstract: Recent applications of machine learning (ML) reveal a noticeable shift from its use for predictive modeling, in the sense of a data-driven construction of models mainly used for the purpose of predicting ground-truth facts, to its use for prescriptive modeling. What is meant by this is the task of learning a model that stipulates appropriate decisions about the right course of action in real-world scenarios: Which medical therapy should be applied? Should this person be hired for the job? As argued in this article, prescriptive modeling comes with new technical conditions for learning and new demands regarding reliability, responsibility, and the ethics of decision making. Therefore, to support the data-driven design of decision-making agents that act in a rational but at the same time responsible manner, a rigorous methodological foundation for prescriptive ML is needed. The purpose of this short paper is to elaborate on specific characteristics of prescriptive ML and to highlight some key challenges it implies. In addition, drawing connections to other branches of contemporary AI research, the grounding of prescriptive ML in a (generalized) decision-theoretic framework is advocated.

  【5】 Est-ce que vous compute? Code-switching, cultural identity, and AI  Link: https://arxiv.org/abs/2112.08256

  Authors: Arianna Falbo, Travis LaCroix  Note: 19 pages. Under review. Please cite the published version, if available.  Abstract: Cultural code-switching concerns how we adjust our overall behaviours, manners of speaking, and appearance in response to a perceived change in our social environment. We defend the need to investigate cultural code-switching capacities in artificial intelligence systems. We explore a series of ethical and epistemic issues that arise when bringing cultural code-switching to bear on artificial intelligence. Building upon Dotson's (2014) analysis of testimonial smothering, we discuss how emerging technologies in AI can give rise to epistemic oppression, and specifically, a form of self-silencing that we call 'cultural smothering'. By leaving the socio-dynamic features of cultural code-switching unaddressed, AI systems risk negatively impacting already-marginalised social groups by widening opportunity gaps and further entrenching social inequalities.

  【6】 An Experimental Study of the Impact of Pre-training on the Pruning of a Convolutional Neural Network  Link: https://arxiv.org/abs/2112.08227

  Authors: Nathan Hubens, Matei Mancas, Bernard Gosselin, Marius Preda, Titus Zaharia  Note: 7 pages, published at APPIS 2020  Abstract: In recent years, deep neural networks have achieved wide success in various application domains. However, they require significant computational and memory resources, which severely hinders their deployment, notably on mobile devices or for real-time applications. Neural networks usually involve a large number of parameters, which correspond to the weights of the network. Such parameters, obtained through a training process, are determinant for the performance of the network. However, they are also highly redundant. Pruning methods attempt to reduce the size of the parameter set by identifying and removing the irrelevant weights. In this paper, we examine the impact of the training strategy on pruning efficiency. Two training modalities are considered and compared: (1) fine-tuning and (2) training from scratch. Experimental results obtained on four datasets (CIFAR10, CIFAR100, SVHN and Caltech101) and two different CNNs (VGG16 and MobileNet) demonstrate that a network pre-trained on a large corpus (e.g. ImageNet) and then fine-tuned on a particular dataset can be pruned much more efficiently (up to 80% parameter reduction) than the same network trained from scratch.
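The abstract does not specify the pruning criterion; as context for the "80% parameter reduction" figure, here is a minimal magnitude-pruning sketch, a common baseline in which the smallest-magnitude weights are zeroed. The thresholding logic and the 80% sparsity level are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Magnitude pruning: zero out the `sparsity` fraction of weights
    with the smallest absolute value (a common baseline criterion)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))                  # stand-in for one layer's weights
W_pruned = magnitude_prune(W, sparsity=0.8)    # the 80% reduction cited above
kept = np.count_nonzero(W_pruned) / W.size
print(kept)  # roughly 0.2 of the weights survive
```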

  【7】 Single Image Automatic Radial Distortion Compensation Using Deep Convolutional Network  Link: https://arxiv.org/abs/2112.08198

  Authors: Igor Janos, Wanda Benesova  Abstract: In many computer vision domains, the input images must conform to the pinhole camera model, where straight lines in the real world are projected as straight lines in the image. Performing computer vision tasks on live sports broadcast footage imposes challenging requirements: the algorithms cannot rely on a specific calibration pattern, must cope with unknown and uncalibrated cameras and with radial distortion originating from complex television lenses, have few visual clues by which to compensate for distortion, and must run in real time. We present a novel method for single-image automatic lens distortion compensation based on deep convolutional neural networks, capable of real-time performance and accuracy, using the two highest-order coefficients of the polynomial distortion model, operating in the application domain of sports broadcast. Keywords: Deep Convolutional Neural Network, Radial Distortion, Single Image Rectification
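The polynomial radial distortion model with its two highest-order coefficients can be written down directly. The sketch below, with made-up coefficients k1 and k2 rather than values estimated by the paper's network, applies the model and inverts it by fixed-point iteration, which is how compensation is typically carried out once the coefficients are known.

```python
import numpy as np

def distort(points, k1, k2):
    """Apply the two-coefficient polynomial radial distortion model
    r_d = r * (1 + k1*r^2 + k2*r^4), with points in normalized image
    coordinates centered at the principal point."""
    r2 = np.sum(points**2, axis=-1, keepdims=True)
    return points * (1.0 + k1 * r2 + k2 * r2**2)

def undistort(points, k1, k2, iters=20):
    """Invert the model by fixed-point iteration (no closed form exists)."""
    guess = points.copy()
    for _ in range(iters):
        r2 = np.sum(guess**2, axis=-1, keepdims=True)
        guess = points / (1.0 + k1 * r2 + k2 * r2**2)
    return guess

pts = np.array([[0.3, 0.4], [-0.2, 0.5]])
k1, k2 = -0.1, 0.02                        # hypothetical lens coefficients
round_trip = undistort(distort(pts, k1, k2), k1, k2)
print(np.abs(round_trip - pts).max())      # round-trip error is tiny
```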

  【8】 N3H-Core: Neuron-designed Neural Network Accelerator via FPGA-based Heterogeneous Computing Cores  Link: https://arxiv.org/abs/2112.08193

  Authors: Yu Gong, Zhihan Xu, Zhezhi He, Weifeng Zhang, Xiaobing Tu, Xiaoyao Liang, Li Jiang  Note: 11 pages, 12 figures. In Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA'22), February 27-March 1, 2022, Virtual Event, CA, USA  Abstract: Accelerating neural network inference with FPGAs has emerged as a popular option, since the reconfigurability and high-performance computing capability of FPGAs intrinsically satisfy the computation demand of fast-evolving neural algorithms. However, popular neural accelerators on FPGAs (e.g., the Xilinx DPU) mainly utilize DSP resources for constructing their processing units, while the rich LUT resources are not well exploited. Via a software-hardware co-design approach, in this work we develop an FPGA-based heterogeneous computing system for neural network acceleration. From the hardware perspective, the proposed accelerator consists of DSP- and LUT-based GEneral Matrix-Multiplication (GEMM) computing cores, which form the entire computing system in a heterogeneous fashion. The DSP- and LUT-based GEMM cores compute with respect to a unified Instruction Set Architecture (ISA) and unified buffers. Along the data flow of the neural network inference path, the computation of each convolution/fully-connected layer is split into two portions, handled asynchronously by the DSP- and LUT-based GEMM cores. From the software perspective, we mathematically and systematically model the latency and resource utilization of the proposed heterogeneous accelerator under varying system design configurations. By leveraging reinforcement learning, we construct a framework to achieve end-to-end selection and optimization of the design specification of the target heterogeneous accelerator, including the workload split strategy, the mixed-precision quantization scheme, and the resource allocation of the DSP and LUT cores. By virtue of the proposed design framework and heterogeneous computing system, our design outperforms the state-of-the-art Mix&Match design, with latency reduced by 1.12-1.32x and higher inference accuracy. N3H-Core is open-sourced at: https://github.com/elliothe/N3H_Core.
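The idea of splitting one layer's GEMM between two heterogeneous cores can be sketched in a few lines. This is only a functional illustration; the split axis, ratio, and function names are assumptions, and the real design partitions work across DSP and LUT cores asynchronously in hardware. The point is simply that partial products computed by each "core" concatenate to the full result.

```python
import numpy as np

def split_gemm(A, B, split_ratio=0.6):
    """Toy sketch of splitting one GEMM across two compute cores: the
    first `split_ratio` of B's columns go to a 'DSP core' and the rest
    to a 'LUT core'; the partial outputs are concatenated."""
    cut = int(B.shape[1] * split_ratio)
    out_dsp = A @ B[:, :cut]      # stand-in for the DSP-based GEMM core
    out_lut = A @ B[:, cut:]      # stand-in for the LUT-based GEMM core
    return np.concatenate([out_dsp, out_lut], axis=1)

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 16))      # activations
B = rng.normal(size=(16, 32))     # weights
C = split_gemm(A, B)
print(np.allclose(C, A @ B))      # the split preserves the layer's output
```

In the actual system, choosing `split_ratio` per layer is part of the reinforcement-learning search described above.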

  【9】 Planning with Biological Neurons and Synapses  Link: https://arxiv.org/abs/2112.08186

  Authors: Francesco d'Amore, Daniel Mitropolsky, Pierluigi Crescenzi, Emanuele Natale, Christos H. Papadimitriou  Abstract: We revisit the planning problem in the blocks world, and we implement a known heuristic for this task. Importantly, our implementation is biologically plausible, in the sense that it is carried out exclusively through the spiking of neurons. Even though much has been accomplished in the blocks world over the past five decades, we believe that this is the first algorithm of its kind. The input is a sequence of symbols encoding an initial set of block stacks as well as a target set, and the output is a sequence of motion commands such as "put the top block in stack 1 on the table". The program is written in the Assembly Calculus, a recently proposed computational framework meant to model computation in the brain by bridging the gap between neural activity and cognitive function. Its elementary objects are assemblies of neurons (stable sets of neurons whose simultaneous firing signifies that the subject is thinking of an object, concept, word, etc.), its commands include project and merge, and its execution model is based on widely accepted tenets of neuroscience. A program in this framework essentially sets up a dynamical system of neurons and synapses that eventually, with high probability, accomplishes the task. The purpose of this work is to establish empirically that reasonably large programs in the Assembly Calculus can execute correctly and reliably, and that rather realistic, if idealized, higher cognitive functions, such as planning in the blocks world, can be implemented successfully by such programs.

  【10】 Learning Cross-Lingual IR from an English Retriever  Link: https://arxiv.org/abs/2112.08185

  Authors: Yulong Li, Martin Franz, Md Arafat Sultan, Bhavani Iyer, Young-Suk Lee, Avirup Sil  Note: 6 pages  Abstract: We present a new cross-lingual information retrieval (CLIR) model trained using multi-stage knowledge distillation (KD). The teacher and the student are heterogeneous systems: the former is a pipeline that relies on machine translation and monolingual IR, while the latter executes a single CLIR operation. We show that the student can learn both multilingual representations and CLIR by optimizing two corresponding KD objectives. Learning multilingual representations from an English-only retriever is accomplished using a novel cross-lingual alignment algorithm that greedily re-positions the teacher tokens for alignment. Evaluation on the XOR-TyDi benchmark shows that the proposed model is far more effective than the existing approach of fine-tuning with cross-lingual labeled IR data, with a gain of 25.4% in Recall@5kt.
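The distillation objectives are standard in form. As a hedged sketch, not the paper's exact retrieval objectives, the snippet below computes a temperature-softened KL divergence between teacher and student relevance distributions; the temperature `T` and the toy logits are assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax."""
    z = np.asarray(z, dtype=float) / T
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Generic distillation objective: KL(teacher || student) over
    temperature-softened distributions, a toy stand-in for the paper's
    multi-stage KD objectives."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = [3.0, 1.0, 0.2]                 # toy relevance scores over 3 passages
matched = kd_loss([3.0, 1.0, 0.2], teacher)
off = kd_loss([0.2, 1.0, 3.0], teacher)
print(matched, off)  # zero when the student matches the teacher exactly
```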

  【11】 Interpretable Feature Learning Framework for Smoking Behavior Detection  Link: https://arxiv.org/abs/2112.08178

  Authors: Nakayiza Hellen, Ggaliwango Marvin  Note: 15 pages  Abstract: Smoking in public has been proven to be more harmful to nonsmokers, making it a huge public health concern with an urgent need for proactive measures and attention from authorities. With the world moving towards the 4th Industrial Revolution, there is a need for reliable, eco-friendly detection measures against this behavior, which harms public health in and out of smart cities. We developed an interpretable feature learning framework for smoking behavior detection which utilizes a deep learning VGG-16 pretrained network to predict and classify the input image class, and Layer-wise Relevance Propagation (LRP) to explain the network's detection or prediction of smoking behavior based on the most relevant learned features, pixels, or neurons. The network's classification decision is based mainly on features located at the mouth; the smoke, in particular, appears to be of high importance to the network's decision. The outline of the smoke is highlighted as evidence for the corresponding class. Some elements are seen as having a negative effect on the smoke neuron and are consequently highlighted differently. It is interesting to see that the network distinguishes important from unimportant features based on the image regions. The technology can also detect other smokeable drugs such as weed, shisha, and marijuana. The framework allows for reliable identification of smokers in unsafe zones such as schools, shopping malls, bus stops, railway compartments, or other places where smoking is prohibited under the government's regulatory health policies. Once smoking zones are clearly defined and the system installed, the technology can detect smokers outside the permitted range.
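LRP redistributes a prediction's relevance backwards, layer by layer. The following minimal sketch implements the epsilon rule for a single linear layer, a simplification of applying LRP to a full VGG-16; the shapes and epsilon value are illustrative. It checks the conservation property (relevance in ≈ relevance out) that makes the resulting heatmaps interpretable.

```python
import numpy as np

def lrp_linear(a, w, relevance_out, eps=1e-9):
    """LRP epsilon rule for one linear layer: relevance flowing into
    output neuron k is redistributed to inputs j in proportion to the
    contribution a_j * w_jk, so total relevance is (almost) conserved."""
    z = a @ w                                    # pre-activations, shape (k,)
    s = relevance_out / (z + eps * np.sign(z))   # stabilized ratio
    return a * (w @ s)                           # relevance per input, shape (j,)

rng = np.random.default_rng(0)
a = rng.random(4)               # activations of the layer below
w = rng.normal(size=(4, 3))     # layer weights
R_out = np.array([0.7, 0.2, 0.1])
R_in = lrp_linear(a, w, R_out)
print(R_in.sum())  # approximately R_out.sum(): relevance is conserved
```

Applied recursively from the output down to the pixels, this redistribution produces the mouth- and smoke-focused heatmaps described above.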

  【12】 AMSER: Adaptive Multi-modal Sensing for Energy Efficient and Resilient eHealth Systems  Link: https://arxiv.org/abs/2112.08176

  Authors: Emad Kasaeyan Naeini, Sina Shahhosseini, Anil Kanduri, Pasi Liljeberg, Amir M. Rahmani, Nikil Dutt  Abstract: eHealth systems deliver critical digital healthcare and wellness services by continuously monitoring users' physiological and contextual data. eHealth applications use multi-modal machine learning kernels to analyze data from different sensor modalities and automate decision-making. Noisy inputs and motion artifacts during sensory data acquisition affect (i) the prediction accuracy and resilience of eHealth services and (ii) energy efficiency, through the processing of garbage data. Monitoring raw sensory inputs to identify and drop data and features from noisy modalities can improve both prediction accuracy and energy efficiency. We propose AMSER, a closed-loop monitoring and control framework for multi-modal eHealth applications that can mitigate garbage-in garbage-out by (i) monitoring input modalities, (ii) analyzing raw input to selectively drop noisy data and features, and (iii) choosing appropriate machine learning models that fit the configured data and feature vector, to improve prediction accuracy and energy efficiency. We evaluate our AMSER approach on multi-modal eHealth applications for pain assessment and stress monitoring, over different levels and types of noisy components incurred via different sensor modalities. Our approach achieves up to a 22% improvement in prediction accuracy and a 5.6x reduction in energy consumption in the sensing phase compared to the state-of-the-art multi-modal monitoring application.
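Step (ii), selectively dropping noisy modalities, can be illustrated with a crude gate. This toy sketch is our own assumption, not AMSER's actual signal-quality logic: it estimates a rough per-modality signal-to-noise ratio (signal variance over first-difference variance) and keeps only modalities that clear a threshold.

```python
import numpy as np

def select_modalities(signals, snr_threshold=5.0):
    """Toy gate in the spirit of selective modality dropping: estimate a
    crude SNR per modality (signal power over high-frequency 'noise'
    power via first differences) and keep only modalities above it."""
    kept = {}
    for name, x in signals.items():
        x = np.asarray(x, dtype=float)
        noise = np.diff(x).var() + 1e-12
        snr = x.var() / noise
        if snr >= snr_threshold:
            kept[name] = x
    return kept

t = np.linspace(0, 4 * np.pi, 400)
clean = np.sin(t)                      # smooth physiological signal
noisy = np.sin(t) + np.random.default_rng(0).normal(scale=1.0, size=t.size)
kept = select_modalities({"ppg": clean, "eda": noisy})
print(sorted(kept))  # only the clean modality survives the gate
```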

  【13】 Text Gestalt: Stroke-Aware Scene Text Image Super-Resolution  Link: https://arxiv.org/abs/2112.08171

  Authors: Jingye Chen, Haiyang Yu, Jianqi Ma, Bin Li, Xiangyang Xue  Note: Accepted to AAAI 2022. Code is available at https://github.com/FudanVI/FudanOCR/tree/main/text-gestalt  Abstract: In the last decade, the blossoming of deep learning has driven the rapid development of scene text recognition. However, the recognition of low-resolution scene text images remains a challenge. Even though some super-resolution methods have been proposed to tackle this problem, they usually treat text images as general images while ignoring the fact that the visual quality of strokes (the atomic unit of text) plays an essential role in text recognition. According to Gestalt psychology, humans are capable of composing parts of details into the most similar objects guided by prior knowledge. Likewise, when humans observe a low-resolution text image, they inherently use partial stroke-level details to recover the appearance of the holistic characters. Inspired by Gestalt psychology, we put forward a stroke-aware scene text image super-resolution method containing a Stroke-Focused Module (SFM) to concentrate on the stroke-level internal structures of characters in text images. Specifically, we design rules for decomposing English characters and digits at the stroke level, then pre-train a text recognizer to provide stroke-level attention maps as positional clues, with the purpose of controlling the consistency between the generated super-resolution image and the high-resolution ground truth. Extensive experimental results validate that the proposed method can indeed generate more distinguishable images on TextZoom and the manually constructed Chinese character dataset Degraded-IC13. Furthermore, since the proposed SFM is only used to provide stroke-level guidance during training, it brings no time overhead in the test phase. Code is available at https://github.com/FudanVI/FudanOCR/tree/main/text-gestalt.

  【14】 Improving Conversational Recommendation Systems' Quality with Context-Aware Item Meta Information  Link: https://arxiv.org/abs/2112.08140

  Authors: Bowen Yang, Cong Han, Yu Li, Lei Zuo, Zhou Yu  Abstract: Conversational recommendation systems (CRS) engage with users by inferring user preferences from dialog history, providing accurate recommendations, and generating appropriate responses. Previous CRSs use knowledge graph (KG) based recommendation modules and integrate KGs with language models for response generation. Although KG-based approaches prove effective, two issues remain to be solved. First, KG-based approaches ignore the information in the conversational context and rely only on entity relations and bags of words to recommend items. Second, maintaining KGs that model domain-specific relations requires substantial engineering effort, leading to less flexibility. In this paper, we propose a simple yet effective architecture comprising a pre-trained language model (PLM) and an item metadata encoder. The encoder learns to map item metadata to embeddings that reflect the semantic information in the dialog context. The PLM then consumes the semantically aligned item embeddings together with the dialog context to generate high-quality recommendations and responses. Instead of modeling entity relations with KGs, our model reduces engineering complexity by directly converting each item to an embedding. Experimental results on the benchmark dataset ReDial show that our model obtains state-of-the-art results on both recommendation and response generation tasks.

  【15】 Characterizing the Program Expressive Power of Existential Rule Languages  Link: https://arxiv.org/abs/2112.08136

  Authors: Heng Zhang  Note: To be published in AAAI-22  Abstract: Existential rule languages are a family of ontology languages that have been widely used in ontology-mediated query answering (OMQA). However, for most of them, the expressive power of representing domain knowledge for OMQA, known as the program expressive power, is not yet well understood. In this paper, we establish a number of novel characterizations of the program expressive power of several important existential rule languages, including tuple-generating dependencies (TGDs), linear TGDs, and disjunctive TGDs. The characterizations employ natural model-theoretic properties, and sometimes automata-theoretic ones, thus providing powerful tools for identifying the definability of domain knowledge for OMQA in these languages.

  【16】 Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration  Link: https://arxiv.org/abs/2112.08132

  Authors: Yu Wang, Jingyang Lin, Jingjing Zou, Yingwei Pan, Ting Yao, Tao Mei  Note: NeurIPS 2021; code is publicly available at https://github.com/ssl-codelab/uota  Abstract: Our work reveals a structural shortcoming of existing mainstream self-supervised learning methods. Whereas self-supervised learning frameworks usually take the prevailing perfect instance-level invariance hypothesis for granted, we carefully investigate the pitfalls behind it. In particular, we argue that the existing augmentation pipeline for generating multiple positive views naturally introduces out-of-distribution (OOD) samples that undermine the learning of downstream tasks. Generating diverse positive augmentations of the input does not always pay off in benefiting downstream tasks. To overcome this inherent deficiency, we introduce a lightweight latent variable model, UOTA, targeting the view sampling issue in self-supervised learning. UOTA adaptively searches for the most important sampling region from which to produce views, and provides a viable choice for outlier-robust self-supervised learning approaches. Our method generalizes directly to many mainstream self-supervised learning approaches, regardless of whether the loss is contrastive in nature. We empirically show UOTA's advantage over state-of-the-art self-supervised paradigms by an evident margin, which well justifies the existence of the OOD sample issue embedded in the existing approaches. In particular, we theoretically prove that the merits of the proposal boil down to guaranteed estimator variance and bias reduction. Code is available at https://github.com/ssl-codelab/uota.

  【17】 Towards Controllable Agent in MOBA Games with Generative Modeling  Link: https://arxiv.org/abs/2112.08093

  Authors: Shubao Zhang  Note: Human-Compatible AI; Human-AI Cooperation; AI Control; AI Alignment  Abstract: We propose novel methods to develop an action-controllable agent that behaves like a human and has the ability to align with human players in Multiplayer Online Battle Arena (MOBA) games. By modeling the control problem as an action generation process, we devise a deep latent alignment neural network model for training the agent, and a corresponding sampling algorithm for controlling the agent's actions. In particular, we propose deterministic and stochastic attention implementations of the core latent alignment model. Both simulated and online experiments in the game Honor of Kings demonstrate the efficacy of the proposed methods.

  【18】 Cognition-aware Cognate Detection  Link: https://arxiv.org/abs/2112.08087

  Authors: Diptesh Kanojia, Prashant Sharma, Sayali Ghodekar, Pushpak Bhattacharyya, Gholamreza Haffari, Malhar Kulkarni  Note: Published at EACL 2021  Abstract: Automatic detection of cognates helps the downstream NLP tasks of machine translation, cross-lingual information retrieval, computational phylogenetics, and cross-lingual named entity recognition. Previous approaches to the task of cognate detection use feature sets based on orthographic, phonetic, and semantic similarity. In this paper, we propose a novel method for enriching these feature sets with cognitive features extracted from human readers' gaze behaviour. We collect gaze behaviour data for a small sample of cognates and show that the extracted cognitive features help the task of cognate detection. However, gaze data collection and annotation is a costly task. We use the collected gaze behaviour data to predict cognitive features for a larger sample and show that the predicted cognitive features also significantly improve task performance. We report improvements of 10% with the collected gaze features, and 12% using the predicted gaze features, over the previously proposed approaches. Furthermore, we release the collected gaze behaviour data along with our code and cross-lingual models.

  【19】 Optimal Latent Space Forecasting for Large Collections of Short Time Series Using Temporal Matrix Factorization 标题:基于时间矩阵分解的大规模短时间序列集合最优潜在空间预测 链接:https://arxiv.org/abs/2112.08052

  作者:Himanshi Charotia,Abhishek Garg,Gaurav Dhama,Naman Maheshwari 摘要:在时间序列预测的背景下,通常的做法是评估多种方法并选择其中一种方法或一个集合来生成最佳预测。然而,在不同的集合中选择多种方法仍然是一项具有挑战性的任务,随着方法数量的增加,这项任务将经历组合爆炸。在需求预测或收入预测方面,由于业务环境不断变化,大量时间序列以及可用的有限历史数据点进一步加剧了这一挑战。尽管深度学习预测方法旨在同时预测大量时间序列,但由于可用的历史有限,它们在此类场景中的应用变得很有挑战性,并且可能不会产生理想的结果。我们提出了一个预测高维短时间序列数据的框架,该框架将低秩时间矩阵分解和使用交叉验证的潜在时间序列的最优模型选择相结合。我们证明,与直接对时间序列应用不同的单变量模型相比,预测潜在因素可以显著提高性能。性能已在M4月度数据集的截断版本上验证,该数据集包含来自多个域的时间序列数据,显示了该方法的普遍适用性。此外,由于潜在因素的数量较少,因此有利于纳入分析师对未来的看法,而在将预测方法直接应用于高维数据集时这通常是不切实际的。 摘要:In the context of time series forecasting, it is a common practice to evaluate multiple methods and choose one of these methods or an ensemble for producing the best forecasts. However, choosing among different ensembles over multiple methods remains a challenging task that undergoes a combinatorial explosion as the number of methods increases. In the context of demand forecasting or revenue forecasting, this challenge is further exacerbated by a large number of time series as well as limited historical data points available due to changing business context. Although deep learning forecasting methods aim to simultaneously forecast large collections of time series, they become challenging to apply in such scenarios due to the limited history available and might not yield desirable results. We propose a framework for forecasting short high-dimensional time series data by combining low-rank temporal matrix factorization and optimal model selection on latent time series using cross-validation. We demonstrate that forecasting the latent factors leads to significant performance gains as compared to directly applying different uni-variate models on time series. Performance has been validated on a truncated version of the M4 monthly dataset which contains time series data from multiple domains showing the general applicability of the method. Moreover, it is amenable to incorporating the analyst view of the future owing to the low number of latent factors which is usually impractical when applying forecasting methods directly to high dimensional datasets.
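  论文“低秩时间矩阵分解 + 对少量潜在序列做单变量预测”的流程,可以用如下极简草图说明(纯属示意:用截断 SVD 代替论文的时间矩阵分解,用 AR(1) 代替交叉验证选出的最优单变量模型,数据均为随机模拟):

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, k = 48, 200, 3                       # 时间步数、序列条数、潜在因子数
latent = np.cumsum(rng.normal(size=(T, k)), axis=0)   # 模拟 k 条潜在时间序列
loadings = rng.normal(size=(k, N))
Y = latent @ loadings + 0.1 * rng.normal(size=(T, N)) # 高维短时间序列集合

# 低秩分解 Y ≈ U_k S_k V_k^T,其中 U_k S_k 充当潜在序列,V_k 为载荷
U, S, Vt = np.linalg.svd(Y, full_matrices=False)
Z = U[:, :k] * S[:k]                       # (T, k) 潜在时间序列

# 只需为 k 条潜在序列各拟合一个单变量模型(这里用最小二乘 AR(1))
forecasts = []
for j in range(k):
    z = Z[:, j]
    a, b = np.polyfit(z[:-1], z[1:], 1)    # z_t ≈ a * z_{t-1} + b
    forecasts.append(a * z[-1] + b)
Y_next = np.array(forecasts) @ Vt[:k]      # 映射回 N 条原始序列的下一步预测
print(Y_next.shape)                        # (200,)
```

这也体现了摘要末尾的观点:潜在因子数量很少(此处 k=3),分析师可以直接在潜在空间里修正对未来的判断,而直接面对 200 条原始序列时这几乎不可行。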

  【20】 TLogic: Temporal Logical Rules for Explainable Link Forecasting on Temporal Knowledge Graphs 标题:TLogic:时态知识图上可解释链接预测的时态逻辑规则 链接:https://arxiv.org/abs/2112.08025

  作者:Yushan Liu,Yunpu Ma,Marcel Hildebrandt,Mitchell Joblin,Volker Tresp 备注:Accepted at AAAI 2022 (36th AAAI Conference on Artificial Intelligence) 摘要:传统的静态知识图将关系数据中的实体建模为节点,由特定关系类型的边连接。然而,信息和知识不断发展,时间动态不断出现,预计将影响未来局势。在时态知识图中,时间信息通过为每条边配备时间戳或时间范围而集成到图中。基于嵌入的方法已经被引入到时态知识图的链接预测中,但它们大多缺乏可解释性和可理解的推理链。特别是,它们通常并非为链接预报(link forecasting)——即涉及未来时间戳的事件预测——而设计。我们研究时态知识图上的链接预报任务,并介绍了可解释框架TLogic,该框架基于通过时态随机游走提取的时态逻辑规则。在三个基准数据集上,我们将TLogic与最先进的基线进行了比较,显示出更好的总体性能,同时我们的方法还提供保持时间一致性的解释。此外,与大多数最先进的基于嵌入的方法相比,TLogic在归纳设置下表现良好,即把已学到的规则迁移到具有共同词汇表的相关数据集上。 摘要:Conventional static knowledge graphs model entities in relational data as nodes, connected by edges of specific relation types. However, information and knowledge evolve continuously, and temporal dynamics emerge, which are expected to influence future situations. In temporal knowledge graphs, time information is integrated into the graph by equipping each edge with a timestamp or a time range. Embedding-based methods have been introduced for link prediction on temporal knowledge graphs, but they mostly lack explainability and comprehensible reasoning chains. Particularly, they are usually not designed to deal with link forecasting -- event prediction involving future timestamps. We address the task of link forecasting on temporal knowledge graphs and introduce TLogic, an explainable framework that is based on temporal logical rules extracted via temporal random walks. We compare TLogic with state-of-the-art baselines on three benchmark datasets and show better overall performance while our method also provides explanations that preserve time consistency. Furthermore, in contrast to most state-of-the-art embedding-based methods, TLogic works well in the inductive setting where already learned rules are transferred to related datasets with a common vocabulary.
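  文中“基于时态随机游走提取时态逻辑规则”的核心步骤——只沿时间戳递减的边回溯游走——可以用如下玩具示例说明(纯属示意,并非 TLogic 的原始实现;边列表与关系名均为虚构):

```python
import random
from collections import defaultdict

edges = [  # (头实体, 关系, 尾实体, 时间戳)
    ("A", "visits", "B", 3),
    ("B", "meets",  "C", 2),
    ("C", "calls",  "D", 1),
    ("B", "meets",  "E", 5),  # 时间戳晚于当前边,游走时应被过滤掉
]
by_head = defaultdict(list)
for h, r, t, ts in edges:
    by_head[h].append((r, t, ts))

def temporal_walk(start, start_ts, length, seed=0):
    """从 start 出发,只沿时间戳严格递减的边游走,返回走过的关系路径。"""
    rng = random.Random(seed)
    node, ts, path = start, start_ts, []
    for _ in range(length):
        cand = [(r, t, e_ts) for r, t, e_ts in by_head[node] if e_ts < ts]
        if not cand:
            break
        r, node, ts = rng.choice(cand)   # 实际方法中可按时间差加权采样
        path.append(r)
    return path

print(temporal_walk("A", start_ts=4, length=3))  # ['visits', 'meets', 'calls']
```

采到的关系路径(如 `visits ∧ meets ∧ calls`)即可作为时序规则体的候选,这正是规则可读、解释能保持时间一致性的来源。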

  【21】 Segmentation-Reconstruction-Guided Facial Image De-occlusion 标题:基于分割重建的人脸图像去遮挡 链接:https://arxiv.org/abs/2112.08022

  作者:Xiangnan Yin,Di Huang,Zehua Fu,Yunhong Wang,Liming Chen 摘要:遮挡在野外人脸图像中非常常见,导致人脸相关任务的性能下降。尽管人们在去除人脸图像中的遮挡方面做了大量的工作,但遮挡的形状和纹理的变化仍然对当前方法的鲁棒性提出了挑战。因此,当前的方法要么依赖于手动遮挡掩模,要么仅适用于特定遮挡。本文提出了一种新的基于人脸分割和三维人脸重建的人脸去遮挡模型,该模型能够自动去除各类人脸遮挡,即使其边界模糊(例如头发)。该模型由三维人脸重建模块、人脸分割模块和图像生成模块组成。图像生成模块利用前两者分别预测的人脸先验信息和遮挡掩模信息,能够忠实地恢复缺失的人脸纹理。为了监督训练,我们进一步构建了一个大型的遮挡数据集,包括手动标记和合成遮挡。定性和定量结果证明了该方法的有效性和鲁棒性。 摘要:Occlusions are very common in face images in the wild, leading to the degraded performance of face-related tasks. Although much effort has been devoted to removing occlusions from face images, the varying shapes and textures of occlusions still challenge the robustness of current methods. As a result, current methods either rely on manual occlusion masks or only apply to specific occlusions. This paper proposes a novel face de-occlusion model based on face segmentation and 3D face reconstruction, which automatically removes all kinds of face occlusions with even blurred boundaries, e.g., hairs. The proposed model consists of a 3D face reconstruction module, a face segmentation module, and an image generation module. With the face prior and the occlusion mask predicted by the first two, respectively, the image generation module can faithfully recover the missing facial textures. To supervise the training, we further build a large occlusion dataset, with both manually labeled and synthetic occlusions. Qualitative and quantitative results demonstrate the effectiveness and robustness of the proposed method.

  【22】 Predicting Media Memorability: Comparing Visual, Textual and Auditory Features 标题:预测媒体记忆力:比较视觉、文本和听觉特征 链接:https://arxiv.org/abs/2112.07969

  作者:Lorin Sweeney,Graham Healy,Alan F. Smeaton 备注:3 pages 摘要:本文介绍了我们在 MediaEval 2021 预测媒体记忆性任务中的方法,该任务旨在通过设置自动预测视频记忆性的任务来研究媒体记忆性问题。今年,我们从比较的角度处理这项任务,希望对三种被考察模态中的每一种都有更深入的了解,并将去年(2020年)提交的结果作为参考点。与去年一样,在TRECVid2019数据集上测试的最佳短期记忆性模型(0.132)是一个未在任何TRECVid数据上训练的基于帧的CNN,而在Memento10k数据集上测试的最佳短期记忆性模型(0.524)是一个拟合DenseNet121视觉特征的贝叶斯岭回归器。 摘要:This paper describes our approach to the Predicting Media Memorability task in MediaEval 2021, which aims to address the question of media memorability by setting the task of automatically predicting video memorability. This year we tackle the task from a comparative standpoint, looking to gain deeper insights into each of three explored modalities, and using our results from last year's submission (2020) as a point of reference. Our best performing short-term memorability model (0.132) tested on the TRECVid2019 dataset -- just like last year -- was a frame based CNN that was not trained on any TRECVid data, and our best short-term memorability model (0.524) tested on the Memento10k dataset, was a Bayesian Ridge Regressor fit with DenseNet121 visual features.
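  摘要中“用贝叶斯岭回归拟合视觉特征来预测记忆性分数”的流程可以用如下草图说明(纯属示意:这里用闭式解的普通岭回归近似代替贝叶斯岭回归,特征与分数均为随机模拟,并非真实的 DenseNet121 特征):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 16))                 # 模拟 100 个视频的 16 维视觉特征
w_true = rng.normal(size=16)
y = X @ w_true + 0.05 * rng.normal(size=100)   # 模拟记忆性分数标签

lam = 1.0                                      # 正则化强度(贝叶斯视角下对应先验精度之比)
w = np.linalg.solve(X.T @ X + lam * np.eye(16), X.T @ y)  # 岭回归闭式解
pred = X @ w
corr = np.corrcoef(pred, y)[0, 1]              # 任务官方常用 Spearman 相关,此处用 Pearson 示意
```

实践中可直接换用 scikit-learn 的 `BayesianRidge`,其优点是正则化强度由证据最大化自动确定,而无需手工设定 `lam`。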

  【23】 Efficient Geometry-aware 3D Generative Adversarial Networks 标题:高效的几何感知3D生成对抗网络 链接:https://arxiv.org/abs/2112.07945

  作者:Eric R. Chan,Connor Z. Lin,Matthew A. Chan,Koki Nagano,Boxiao Pan,Shalini De Mello,Orazio Gallo,Leonidas Guibas,Jonathan Tremblay,Sameh Khamis,Tero Karras,Gordon Wetzstein 备注:Project page: this https URL 摘要:仅使用单视图2D照片集无监督生成高质量多视图一致图像和3D形状一直是一个长期的挑战。现有的3D GAN要么计算量巨大,要么采用并非3D一致的近似;前者限制了生成图像的质量和分辨率,后者对多视图一致性和形状质量产生不利影响。在这项工作中,我们在不过度依赖这些近似的情况下提高了3D GAN的计算效率和图像质量。为此,我们引入了一种富有表现力的显式-隐式混合网络架构,它与其他设计选择相结合,不仅能实时合成高分辨率多视图一致的图像,还能生成高质量的三维几何。通过解耦特征生成和神经渲染,我们的框架能够利用最先进的2D CNN生成器(如StyleGAN2),并继承它们的效率和表达能力。我们在FFHQ和AFHQ Cats等数据集上演示了最先进的3D感知合成,并开展了其他实验。 摘要:Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent; the former limits quality and resolution of the generated images and the latter adversely affects multi-view consistency and shape quality. In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations. For this purpose, we introduce an expressive hybrid explicit-implicit network architecture that, together with other design choices, synthesizes not only high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry. By decoupling feature generation and neural rendering, our framework is able to leverage state-of-the-art 2D CNN generators, such as StyleGAN2, and inherit their efficiency and expressiveness. We demonstrate state-of-the-art 3D-aware synthesis with FFHQ and AFHQ Cats, among other experiments.

  【24】 Graph-based Ensemble Machine Learning for Student Performance Prediction 标题:基于图的集成机器学习在学生成绩预测中的应用 链接:https://arxiv.org/abs/2112.07893

  作者:Yinkai Wang,Aowei Ding,Kaiyi Guan,Shixi Wu,Yuanqi Du 备注:5 pages, 3 figures and 3 tables 摘要:学生成绩预测是了解学生需求、提供适当的学习机会/资源以及提高教学质量的关键研究问题。然而,传统的机器学习方法无法产生稳定、准确的预测结果。在本文中,我们提出了一种基于图的集成机器学习方法,旨在通过多种方法的一致性来提高单个机器学习方法的稳定性。具体来说,我们同时利用有监督预测方法和无监督聚类方法,构建一种在二部图中传播并收敛到更稳定、更准确预测结果的迭代方法。大量的实验证明了我们提出的方法在预测更准确的学生成绩方面的有效性。具体来说,我们的模型在预测精度上比最好的传统机器学习算法最多高出14.8%。 摘要:Student performance prediction is a critical research problem to understand the students' needs, present proper learning opportunities/resources, and develop the teaching quality. However, traditional machine learning methods fail to produce stable and accurate prediction results. In this paper, we propose a graph-based ensemble machine learning method that aims to improve the stability of single machine learning methods via the consensus of multiple methods. To be specific, we leverage both supervised prediction methods and unsupervised clustering methods, build an iterative approach that propagates in a bipartite graph as well as converges to more stable and accurate prediction results. Extensive experiments demonstrate the effectiveness of our proposed method in predicting more accurate student performance. Specifically, our model outperforms the best traditional machine learning algorithms by up to 14.8% in prediction accuracy.
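  文中“监督预测与无监督聚类在二部图中迭代传播直至收敛”的思想,可用如下极简草图说明(纯属示意,并非论文的原始算法;分数、分组与混合权重均为虚构):

```python
import numpy as np

preds = np.array([0.9, 0.8, 0.3, 0.2])    # 监督模型给出的初始成绩预测
groups = np.array([0, 0, 1, 1])            # 无监督聚类得到的学生分组
scores = preds.copy()
for _ in range(50):
    # 二部图一侧:分组分数 = 组内学生分数的均值
    group_mean = np.array([scores[groups == g].mean() for g in range(2)])
    # 另一侧:学生分数在“自身预测”与“所属分组均值”之间平滑
    new_scores = 0.5 * preds + 0.5 * group_mean[groups]
    if np.abs(new_scores - scores).max() < 1e-8:   # 收敛即停止
        break
    scores = new_scores
print(scores.round(3))                     # [0.875 0.825 0.275 0.225]
```

由于每轮更新都是压缩映射(混合系数 0.5),迭代必然收敛;收敛点同时兼顾了单模型预测与聚类共识,这正是“多方法一致性提升稳定性”的直观体现。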

  【25】 Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data 标题:基于查询学习的弱标签数据零样本音源分离 链接:https://arxiv.org/abs/2112.07891

  作者:Ke Chen,Xingjian Du,Bilei Zhu,Zejun Ma,Taylor Berg-kirkpatrick,Shlomo Dubnov 备注:9 pages, 3 figures, 5 tables, preprint version for Association for the Advancement of Artificial Intelligence Conference, AAAI 2022 摘要:将音频分离为不同声源的深度学习技术面临若干挑战。标准体系结构要求针对不同类型的音频源训练不同的模型。尽管一些通用分离器采用单一模型来定位多个源,但它们很难推广到未见过的源。在本文中,我们提出了一个三组件流水线,从一个大型但弱标记的数据集AudioSet中训练通用音频源分离器。首先,我们提出了一个基于Transformer的声音事件检测系统,用于处理弱标记的训练数据。其次,我们设计了一个基于查询的音频分离模型,利用这些数据进行模型训练。第三,我们设计了一个潜在嵌入处理器,对指定分离目标的查询进行编码,从而实现零样本泛化。我们的方法使用单一模型来分离多种声音类型的源,并且仅依赖弱标记数据进行训练。此外,所提出的音频分离器可用于零样本设置,学习分离训练中从未见过的音频源类型。为了评估分离性能,我们在MUSDB18上测试了我们的模型,同时在不相交的AudioSet上进行训练。我们还通过对训练中未出现的音频源类型进行另一项实验,进一步验证了零样本性能。在这两种情况下,该模型的源失真比(SDR)性能与当前监督模型相当。 摘要:Deep learning techniques for separating audio into different sound sources face several challenges. Standard architectures require training separate models for different types of audio sources. Although some universal separators employ a single model to target multiple sources, they have difficulty generalizing to unseen sources. In this paper, we propose a three-component pipeline to train a universal audio source separator from a large, but weakly-labeled dataset: AudioSet. First, we propose a transformer-based sound event detection system for processing weakly-labeled training data. Second, we devise a query-based audio separation model that leverages this data for model training. Third, we design a latent embedding processor to encode queries that specify audio targets for separation, allowing for zero-shot generalization. Our approach uses a single model for source separation of multiple sound types, and relies solely on weakly-labeled data for training. In addition, the proposed audio separator can be used in a zero-shot setting, learning to separate types of audio sources that were never seen in training. To evaluate the separation performance, we test our model on MUSDB18, while training on the disjoint AudioSet. We further verify the zero-shot performance by conducting another experiment on audio source types that are held-out from training. The model achieves comparable Source-to-Distortion Ratio (SDR) performance to current supervised models in both cases.
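  基于查询的分离思路——用查询嵌入与混合音频的逐帧特征计算相似度、再生成软掩码——可用如下概念性草图说明(纯属示意,并非论文模型;维度与数据均为随机模拟):

```python
import numpy as np

rng = np.random.default_rng(1)
frames = rng.normal(size=(20, 8))          # 混合音频的 20 帧、8 维特征
query = rng.normal(size=8)                 # 目标声源的查询嵌入(由查询编码器给出)

sim = frames @ query                       # 逐帧与查询的相似度
mask = 1 / (1 + np.exp(-sim))              # sigmoid 压到 (0, 1) 的软掩码
separated = frames * mask[:, None]         # 掩码作用于混合特征,保留目标声源成分
print(separated.shape)                     # (20, 8)
```

零样本能力的直观来源即在于此:换一个查询嵌入就换一个分离目标,而分离网络本身无需针对新声源重新训练。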

  【26】 Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases 标题:用于检测社会偏见的预训练语言模型少样本指令提示 链接:https://arxiv.org/abs/2112.07868

  作者:Shrimai Prabhumoye,Rafal Kocielnik,Mohammad Shoeybi,Anima Anandkumar,Bryan Catanzaro 摘要:检测文本中的社会偏见具有挑战性,原因在于其细微性、主观性以及难以规模化获得高质量的标注数据集,特别是考虑到社会偏见和社会本身都在不断演变。为了应对这些挑战,我们提出了一种基于少样本指令的方法来提示预训练语言模型(LMs)。我们从一个小型支持库中选出若干标签均衡、且在嵌入空间中与待标注查询最接近的示例。然后,我们向LM提供由这一小组标注示例、待分类的查询文本和偏见定义组成的指令,并提示其做出判断。我们证明,在少样本场景中使用的大型LMs可以检测不同类型的细粒度偏见,其精度与微调模型相当,有时甚至更高。我们观察到,与较小的模型相比,最大的530B参数模型在检测社会偏见方面明显更有效(AUC指标相比其他模型至少提高20%)。即使标注库缩减到仅100个样本,它在少样本设置下仍能保持较高的AUC(下降不到5%)。因此,大型预训练语言模型使构建新的偏见检测器变得更容易、更快。 摘要:Detecting social bias in text is challenging due to nuance, subjectivity, and difficulty in obtaining good quality labeled datasets at scale, especially given the evolving nature of social biases and society. To address these challenges, we propose a few-shot instruction-based method for prompting pre-trained language models (LMs). We select a few label-balanced exemplars from a small support repository that are closest to the query to be labeled in the embedding space. We then provide the LM with instruction that consists of this subset of labeled exemplars, the query text to be classified, a definition of bias, and prompt it to make a decision. We demonstrate that large LMs used in a few-shot context can detect different types of fine-grained biases with similar and sometimes superior accuracy to fine-tuned models. We observe that the largest 530B parameter model is significantly more effective in detecting social bias compared to smaller models (achieving at least 20% improvement in AUC metric compared to other models). It also maintains a high AUC (dropping less than 5%) in a few-shot setting with a labeled repository reduced to as few as 100 samples. Large pretrained language models thus make it easier and quicker to build new bias detectors.
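  文中“从支持库中挑选标签均衡、且在嵌入空间中与查询最接近的示例”这一步,可用如下草图说明(纯属示意:嵌入、标签与用余弦距离度量“最接近”均为假设,并非论文公开的具体实现细节):

```python
import numpy as np

rng = np.random.default_rng(7)
emb = rng.normal(size=(10, 4))             # 支持库中 10 条样本的嵌入
labels = np.array([0, 1] * 5)              # 两类标签(有偏见 / 无偏见)
query = rng.normal(size=4)                 # 待标注查询文本的嵌入

def pick_exemplars(k_per_label=2):
    """每个标签各取与查询余弦距离最近的 k 条,拼成标签均衡的少样本示例集。"""
    dist = 1 - (emb @ query) / (np.linalg.norm(emb, axis=1) * np.linalg.norm(query))
    chosen = []
    for lab in (0, 1):
        idx = np.where(labels == lab)[0]
        chosen.extend(idx[np.argsort(dist[idx])[:k_per_label]].tolist())
    return chosen

ex = pick_exemplars()
print(len(ex))                             # 4
```

选出的示例随后与偏见定义、查询文本一起拼入指令提示;标签均衡可避免少样本提示把模型推向某一类别。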

  【27】 Interscript: A dataset for interactive learning of scripts through error feedback 标题:InterScript:用于通过错误反馈交互学习脚本的数据集 链接:https://arxiv.org/abs/2112.07867

  作者:Niket Tandon,Aman Madaan,Peter Clark,Keisuke Sakaguchi,Yiming Yang 备注:AAAI'22-Workshop on Interactive Machine Learning 摘要:如果已部署的结构化预测模型忽略人类语言的结构复杂性而产生不一致的输出,最终用户该如何提供反馈?这是一个新兴话题,最近在合成或受限环境中取得了进展,而下一个重大飞跃将需要在真实世界环境中测试和调整模型。我们提供了一个新的数据集Interscript,其中包含用户对一个生成复杂日常任务的已部署模型的反馈。Interscript包含8466个数据点——输入是可能有误的脚本和一条用户反馈,输出是修改后的脚本。我们提出了两个可能显著推进交互式学习最新水平的用例。该数据集可从以下网址获得:https://github.com/allenai/interscript. 摘要:How can an end-user provide feedback if a deployed structured prediction model generates inconsistent output, ignoring the structural complexity of human language? This is an emerging topic with recent progress in synthetic or constrained settings, and the next big leap would require testing and tuning models in real-world settings. We present a new dataset, Interscript, containing user feedback on a deployed model that generates complex everyday tasks. Interscript contains 8,466 data points -- the input is a possibly erroneous script and a user feedback, and the output is a modified script. We posit two use-cases of ours that might significantly advance the state-of-the-art in interactive learning. The dataset is available at: https://github.com/allenai/interscript.

  【28】 HyObscure: Hybrid Obscuring for Privacy-Preserving Data Publishing 标题:HyObscure:面向隐私保护数据发布的混合遮蔽 链接:https://arxiv.org/abs/2112.07850

  作者:Xiao Han,Yuncong Yang,Junjie Wu 摘要:在隐私保护数据发布任务中,在确保数据效用的同时最大限度地减少隐私泄漏,是数据持有者面临的关键问题。以往研究大多只涉及单一类型的数据,并诉诸单一的遮蔽方法(例如混淆或泛化)来实现隐私-效用权衡,这不足以保护现实中的异构数据,也难以抵御日益增长的基于机器学习的推理攻击。本工作对同时采用泛化和混淆操作保护异构数据的隐私保护数据发布进行了先导研究。为此,我们首先提出了隐私和效用量化的新度量,并形式化了混合隐私保护数据遮蔽问题,以刻画泛化和混淆的联合效应。然后,我们设计了一种称为HyObscure的新型混合保护机制,在一定的效用保证下交叉迭代优化泛化和混淆操作,以获得最大的隐私保护。我们还从理论上给出了迭代过程的收敛性和HyObscure的隐私泄漏界。大量实验表明,面对不同场景下的各种推理攻击,HyObscure的性能明显优于多种最先进的基线方法。HyObscure还可随数据规模线性扩展,并在关键参数变化时表现稳健。 摘要:Minimizing privacy leakage while ensuring data utility is a critical problem to data holders in a privacy-preserving data publishing task. Most prior research concerns only with one type of data and resorts to a single obscuring method, e.g., obfuscation or generalization, to achieve a privacy-utility tradeoff, which is inadequate for protecting real-life heterogeneous data and is hard to defend ever-growing machine learning based inference attacks. This work takes a pilot study on privacy-preserving data publishing when both generalization and obfuscation operations are employed for heterogeneous data protection. To this end, we first propose novel measures for privacy and utility quantification and formulate the hybrid privacy-preserving data obscuring problem to account for the joint effect of generalization and obfuscation. We then design a novel hybrid protection mechanism called HyObscure, to cross-iteratively optimize the generalization and obfuscation operations for maximum privacy protection under a certain utility guarantee. The convergence of the iterative process and the privacy leakage bound of HyObscure are also provided in theory. Extensive experiments demonstrate that HyObscure significantly outperforms a variety of state-of-the-art baseline methods when facing various inference attacks under different scenarios. HyObscure also scales linearly to the data size and behaves robustly with varying key parameters.

  【29】 Probabilistic Logic Gate in Asynchronous Game of Life with Critical Property 标题:具有临界性的异步生命游戏中的概率逻辑门 链接:https://arxiv.org/abs/2112.07846

  作者:Yukio-Pegio Gunji,Yoshihiko Ohzawa,Terutaka Tanaka 备注:None 摘要:元启发式和自组织临界性(SOC)有助于在扰动环境下进行鲁棒计算。在处于临界状态的计算系统中实现逻辑门,是研究元启发式和SOC作用的有趣方法之一。在这里,我们研究了细胞自动机生命游戏(GL)在异步更新下的行为,并使用异步GL实现概率逻辑门。我们发现异步GL表现出相变,态为1的密度在临界点按幂律衰减,且处于临界点的系统在异步GL中具有最大的可计算性。我们在具有临界性的异步GL中实现了AND门和OR门,表现出良好的性能。由于调节扰动在操作逻辑门中起着至关重要的作用,我们的研究揭示了概率逻辑门中操纵与扰动之间的相互干扰。 摘要:Metaheuristic and self-organizing criticality (SOC) could contribute to robust computation under perturbed environments. Implementing a logic gate in a computing system in a critical state is one of the intriguing ways to study the role of metaheuristics and SOCs. Here, we study the behavior of cellular automaton, game of life (GL), in asynchronous updating and implement probabilistic logic gates by using asynchronous GL. We find that asynchronous GL shows a phase transition, that the density of the state of 1 decays with the power law at the critical point, and that systems at the critical point have the most computability in asynchronous GL. We implement AND and OR gates in asynchronous GL with criticality, which shows good performance. Since tuning perturbations play an essential role in operating logic gates, our study reveals the interference between manipulation and perturbation in probabilistic logic gates.
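  摘要中的“异步更新生命游戏”可以用如下玩具实现说明:每步随机抽取一个细胞、按 Conway 规则就地更新,而非全网格同步更新(纯属示意,网格大小与步数均为随意取值,并非论文的临界性实验设置):

```python
import random

def step_async(grid, rng):
    """随机选一个细胞,按 Conway 规则(环面边界)就地异步更新。"""
    n = len(grid)
    i, j = rng.randrange(n), rng.randrange(n)
    live = sum(grid[(i + di) % n][(j + dj) % n]
               for di in (-1, 0, 1) for dj in (-1, 0, 1)
               if (di, dj) != (0, 0))
    if grid[i][j] == 1:
        grid[i][j] = 1 if live in (2, 3) else 0   # 存活规则
    else:
        grid[i][j] = 1 if live == 3 else 0        # 繁殖规则

rng = random.Random(0)
grid = [[rng.randint(0, 1) for _ in range(8)] for _ in range(8)]
for _ in range(1000):
    step_async(grid, rng)
density = sum(map(sum, grid)) / 64  # 论文关注的正是该密度在临界点附近的幂律衰减
```

与同步更新相比,这种逐细胞更新天然引入随机性扰动,这正是文中概率逻辑门所依赖(同时也需要调节)的因素。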

  【30】 Mining Minority-class Examples With Uncertainty Estimates 标题:利用不确定性估计挖掘少数类样本 链接:https://arxiv.org/abs/2112.07835

  作者:Gursimran Singh,Lingyang Chu,Lanjun Wang,Jian Pei,Qi Tian,Yong Zhang 摘要:在现实世界中,对象的出现频率自然呈现偏斜,形成长尾类分布,这导致统计上罕见的类性能较差。一个很有希望的解决方案是挖掘尾部类示例以平衡训练数据集。然而,挖掘尾部类示例是一项非常具有挑战性的任务。例如,多数原本行之有效的基于不确定性的挖掘方法,会因数据偏斜造成的类概率失真而难以奏效。在这项工作中,我们提出了一种有效而简单的方法来克服这些挑战。我们的框架先增强被压制的尾部类激活,然后使用以单类数据为中心的方法来有效识别尾部类示例。我们在横跨两个计算机视觉任务的三个数据集上对框架进行了详尽的评估。少数类挖掘的实质性改进和微调模型的性能提升有力地证实了我们提出的解决方案的价值。 摘要:In the real world, the frequency of occurrence of objects is naturally skewed forming long-tail class distributions, which results in poor performance on the statistically rare classes. A promising solution is to mine tail-class examples to balance the training dataset. However, mining tail-class examples is a very challenging task. For instance, most of the otherwise successful uncertainty-based mining approaches struggle due to distortion of class probabilities resulting from skewness in data. In this work, we propose an effective, yet simple, approach to overcome these challenges. Our framework enhances the subdued tail-class activations and, thereafter, uses a one-class data-centric approach to effectively identify tail-class examples. We carry out an exhaustive evaluation of our framework on three datasets spanning over two computer vision tasks. Substantial improvements in minority-class mining and fine-tuned model performance strongly corroborate the value of our proposed solution.
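  文中“增强被压制的尾部类激活,再以单类视角挑选尾部类候选”的思路,可用如下小例子说明(纯属示意:数值为人为构造,增益与阈值均为假设,并非论文的原始框架):

```python
import numpy as np

# 4 个样本、2 个类别的激活值:类 0 为头部类,类 1 为尾部类(激活整体被压制)
logits = np.array([
    [4.0, 0.1],   # 头部类样本
    [3.5, 0.2],   # 头部类样本
    [3.8, 0.0],   # 头部类样本
    [0.5, 0.9],   # 尾部类样本:即便属于类 1,其激活数值仍偏低
])
boost = np.array([0.0, 1.5])               # 对尾部类激活施加增强
tail_score = (logits + boost)[:, 1]        # 以增强后的尾部类激活作为单类打分
# 未增强时类 1 的最大激活仅 0.9,固定阈值 2.0 下无样本入选;增强后尾部样本脱颖而出
candidates = np.where(tail_score > 2.0)[0]
print(candidates.tolist())                 # [3]
```

与直接用类概率做不确定性挖掘不同,这种单类打分不受头部类概率挤压的影响,从而缓解摘要中提到的“偏斜导致类概率失真”的问题。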
