Title: Optimal Subsampling for Online Data Streams with Categorical Responses
Speaker: Professor Ai Mingyao (艾明要), Peking University
Time: Friday, March 1, 2024, 15:30-16:30
Venue: Room 114 (small lecture hall)
Host: Professor Liu Zhenxin (柳振鑫)  Contact: 84706570
Abstract: Timely analysis of categorical data that arrive quickly in large chunks is in high demand, especially when storing or accessing the historical data is not always possible or desirable. This work introduces an efficient subsampling procedure for online data streams under a multinomial logistic model, which sequentially updates the parameter estimator. The proposed online subsampling and estimation algorithm is computationally efficient, requires minimal storage, and accommodates the scenario in which data labels are expensive to measure and are not all provided initially. Theoretical properties that quantify the asymptotic behavior of the proposed estimator are established. Optimal subsampling probabilities are derived according to the $A$-optimality criterion. An adaptive subsampling algorithm is suggested for ease of practical implementation. The advantages of the proposed method are illustrated through numerical studies on both simulated and real data sets.
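About the speaker: Professor Ai is a Second-Grade Professor of Statistics and doctoral supervisor at Peking University. Concurrent appointments include member of the National Steering Committee for Graduate Education in the Professional Degree of Applied Statistics and head of its training group, vice president of the Chinese Association for Applied Statistics, secretary-general of the 11th council of the Chinese Society of Probability and Statistics of the Chinese Mathematical Society, and executive council member of the National Statistical Society of China. Professor Ai serves on the editorial boards of four major international SCI journals (Stat Sinica, JSPI, SPL, and Stat), of the Chinese core journals Journal of Systems Science and Mathematical Sciences (《系统科学与数学》), Journal of Applied Statistics and Management (《数理统计与管理》), and Advances in Mathematics (China) (《数学进展》), and of the Science Press book series on statistics and data science (《统计与数据科学丛书》). Teaching and research interests cover sampling theory and algorithms for big data, design and analysis of experiments, computer simulation experiments and modeling, and applied statistics. Professor Ai has published more than eighty papers in leading domestic and international journals, including AOS, JASA, Biometrika, and Science China (《中国科学》), has led multiple projects funded by the National Natural Science Foundation of China, including Key Program and General Program projects, and has participated in and completed several projects under the Ministry of Science and Technology's key research and development program. Honors include recognition as an Outstanding Doctoral Dissertation Supervisor of Peking University and as a lead lecturer of Peking University's general education core courses, and a second prize of the Beijing Higher Education Outstanding Teaching Achievement Award.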