Online Learning with Optimism and Delay. Genevieve Flaspohler 1,2, Francesco Orabona 3, Judah Cohen 4, Soukayna Mouatadid 5, Miruna Oprescu 6, Paulo Orenstein 7, Lester Mackey 6. Abstract: ial online learning algorithms provide robust perfor...
Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning. Sebastian Curi 1, Ilija Bogunovic 1, Andreas Krause 1. Abstract: unpredictable ways. The main goal is then to learn a policy that provab...
Anytime Online-to-Batch, Optimism and Acceleration. Ashok Cutkosky 1. Abstract: optimal or near-optimal guarantees. This has helped fuel the widespread adoption of online learning algorithms as A standard way to obtain convergence...
Why is Posterior Sampling Better than Optimism for Reinforcement Learning? Ian Osband 1,2, Benjamin Van Roy 1. Abstract: mate of future value and selects the action with the greatest estimate. If a selected action is not near-optimal, the...