Momentum-Based Policy Gradient Methods. Feihu Huang¹, Shangqian Gao¹, Jian Pei², Heng Huang¹,³. Abstract: timesteps, and then maximizes the long-term cumulative rewards to obtain an optimal policy. Due to easy imple- In the paper, we prop...
Learning to Score Behaviors for Guided Policy Optimization. Aldo Pacchiano¹, Jack Parker-Holder², Yunhao Tang³, Anna Choromanska⁴, Krzysztof Choromanski⁵, Michael I. Jordan¹. Abstract: propose is that: We introduce a new approach for co...
From Importance Sampling to Doubly Robust Policy Gradient. Jiawei Huang¹, Nan Jiang¹. Abstract: Summary of the Paper. We provide a simple and positive answer to the above question in the episodic RL setting. In We show that on-policy policy...
Efficient Policy Learning from Surrogate-Loss Classification Reductions. Andrew Bennett¹, Nathan Kallus¹. Abstract: approaches may incorrectly infer that a policy of always assigning less invasive treatments will obtain better o...
Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits. Nian Si¹, Fan Zhang¹, Zhengyuan Zhou², Jose Blanchet¹. Abstract: nomenon in these applications, can be intelligently exploited to achieve better ou...
Deep Reinforcement Learning with Smooth Policy. Qianli Shen¹, Yan Li², Haoming Jiang², Zhaoran Wang³, Tuo Zhao². Abstract: quires a significant amount of training data, and suffers from numerous training difficulties such as overfitting...
Bidirectional Model-based Policy Optimization. Hang Lai¹, Jian Shen¹, Weinan Zhang¹, Yong Yu¹. Abstract: behind their model-free counterparts due to model error, which is especially severe for multi-step rollout because of Model-base...
A Distributional View on Multi-Objective Policy Optimization. Abbas Abdolmaleki¹, Sandy H. Huang¹, Leonard Hasenclever¹, Michael Neunert¹, H. Francis Song¹, Martina Zambelli¹, Murilo F. Martins¹, Nicolas Heess¹, Raia Hadsell¹, Martin Ri...
Understanding the Impact of Entropy on Policy Optimization. Zafarali Ahmed¹,², Nicolas Le Roux¹,³, Mohammad Norouzi³, Dale Schuurmans³,⁴. Abstract: lis, 2000; Greensmith et al., 2004; Schulman et al., 2015b; Tucker et al., 2018). Entropy r...
Transfer of Samples in Policy Search via Multiple Importance Sampling. Andrea Tirinzoni¹, Mattia Salvini¹, Marcello Restelli¹. Abstract: agent is supposed to reuse knowledge acquired from a set of source tasks to accelerate the learnin...
Safe Policy Improvement with Baseline Bootstrapping. Romain Laroche¹, Paul Trichelair¹, Remi Tachet des Combes¹. Abstract: is a key challenge of modern RL that needs to be tackled before any wide-scale adoption. This paper considers Saf...
Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation. Ruohan Wang¹, Carlo Ciliberto¹, Pierluigi V. Amadori¹, Yiannis Demiris¹. Abstract: 2016). Despite its simplicity, BC typically requires a large amo...
Projections for Approximate Policy Iteration Algorithms. Riad Akrour¹, Joni Pajarinen¹,², Gerhard Neumann³,⁴, Jan Peters¹,⁵. Abstract: dient, a key breakthrough was the use of natural gradient that follows the steepest descent in behavio...
Predictor-Corrector Policy Optimization. Ching-An Cheng¹,², Xinyan Yan¹, Nathan Ratliff², Byron Boots¹,². Abstract: Model-based RL methods improve sample efficiency by leveraging an accurate model that can cheaply simulate in- We pres...
Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules. Daniel Ho¹,², Eric Liang¹, Ion Stoica¹, Pieter Abbeel¹,³, Xi Chen¹,³. Abstract: Baseline AutoAugment Population Based Augmentation 4 A key challenge in l...
Policy Consolidation for Continual Reinforcement Learning. Christos Kaplanis¹,², Murray Shanahan¹,³, Claudia Clopath². Abstract: way that cannot be discretised easily into separate tasks. In reinforcement learning (RL), for example...
POLITEX: Regret Bounds for Policy Iteration Using Expert Prediction. Yasin Abbasi-Yadkori¹, Peter L. Bartlett², Kush Bhatia², Nevena Lazić³, Csaba Szepesvári⁴, Gellért Weisz⁴. Abstract: model-based algorithms, and theoretical ev...
Policy Certificates: Towards Accountable Reinforcement Learning. Christoph Dann¹, Lihong Li², Wei Wei², Emma Brunskill³. Abstract: ploration. Even sharp drops in policy performance during learning are common, e.g., when the agent sta...
Optimistic Policy Optimization via Multiple Importance Sampling. Matteo Papini¹, Alberto Maria Metelli¹, Lorenzo Lupo¹, Marcello Restelli¹. Abstract: peholt et al., 2018). This is well motivated, as interacting with some environment...
Importance Sampling Policy Evaluation with an Estimated Behavior Policy. Josiah P. Hanna¹, Scott Niekum¹, Peter Stone¹. Abstract: determine the expected return (sum of rewards) that an evaluation policy, π_e, will obtain when deploy...