mxxhcm's blog

Tag: Reinforcement Learning

Actor-Mimic

10-14

Policy Distillation

10-13

gym wrappers and monitors

10-09

gradient method deep deterministic policy gradient

10-06

reinforcement learning importance sampling

09-27

gradient method proximal policy optimization

09-23

gym retro

09-15

log derivative trick

09-12

gradient method trust region policy optimization

09-08

gradient method natural policy gradient

09-07
马晓鑫爱马荟荟

Notes on what I learned during my three-year master's program

© 2022 马晓鑫爱马荟荟