Abstract:
Clinical decision-making often involves selecting tests that are costly, invasive, or time-consuming, motivating individualized, sequential strategies for what to measure and when to stop ascertaining. We study the problem of learning cost-optimal sequential decision policies from retrospective data, where test availability depends on prior results, inducing informative missingness. Under a sequential missing-at-random mechanism, we develop a doubly robust Q-learning framework for estimating optimal policies. The method introduces path-specific inverse probability weights that account for heterogeneous test trajectories and satisfy a normalization property conditional on the observed history. By combining these weights with auxiliary contrast models, we construct orthogonal pseudo-outcomes that enable unbiased policy learning when either the acquisition model or the contrast model is correctly specified. We establish oracle inequalities for the stage-wise contrast estimators, along with convergence rates, regret bounds, and misclassification rates for the learned policy. Simulations demonstrate improved cost-adjusted performance over weighted and complete-case baselines, and an application to a prostate cancer cohort study illustrates how the method reduces testing cost without compromising predictive accuracy.
About the speaker:
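Doudou Zhou is an Assistant Professor in the Department of Statistics and Data Science at the National University of Singapore. He received his bachelor's degree from the University of Science and Technology of China and his Ph.D. from the University of California, Davis, and completed postdoctoral research at the Harvard T.H. Chan School of Public Health. His research interests include electronic health records, high-dimensional statistics, transfer learning, and federated learning. His work has been published in Nature Communications, JASA, JRSS-B, JMLR, IEEE Trans. Inf. Theory, J. Biomed. Inform., Bioinformatics, npj Health Systems, Biostatistics, and top computer science conferences.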