CHANGELOG
2024.06.27(v0.5.2)
- env: add taxi env (#799) (#807)
- env: add ising model env (#782)
- env: add new Frozen Lake env (#781)
- env: optimize ppo continuous config in MuJoCo (#801)
- env: fix masac smac config multi_agent=True bug (#791)
- env: update/speed up pendulum ppo
- algo: fix gtrxl compatibility bug (#796)
- algo: fix complex obs demo for ppo pipeline (#786)
- algo: add naive PWIL demo
- algo: fix marl nstep td compatibility bug
- feature: add GPU utils (#788)
- feature: add deprecated function decorator (#778)
- style: relax flask requirement (#811)
- style: add new badge (hellogithub) in readme (#805)
- style: update discord link and badge in readme (#795)
- style: fix typo in config.py (#776)
- style: polish rl_utils api docs
- style: add constraint about numpy<2
- style: polish macos platform test version to 12
- style: polish ci python version
2024.02.04(v0.5.1)
- env: add MADDPG pettingzoo example (#774)
- env: polish NGU Atari configs (#767)
- env: fix bug in cliffwalking env (#759)
- env: add PettingZoo replay video demo
- env: change default max retry in env manager from 5 to 1
- algo: add QGPO diffusion-model related algorithm (#757)
- algo: add HAPPO multi-agent algorithm (#717)
- algo: add DreamerV3 + MiniGrid adaption (#725)
- algo: fix hppo entropy_weight to avoid nan error in log_prob (#761)
- algo: fix structured action bug (#760)
- algo: polish Decision Transformer entry (#754)
- algo: fix EDAC policy/model bug
- fix: env typos
- fix: pynng requirements bug
- fix: communication module unittest bug
- style: polish policy API doc (#762) (#764) (#768)
- style: add agent API doc (#758)
- style: polish torch_utils/utils API doc (#745) (#747) (#752) (#755) (#763)
2023.11.06(v0.5.0)
- env: add tabmwp env (#667)
- env: polish anytrading env issues (#731)
- algo: add PromptPG algorithm (#667)
- algo: add Plan Diffuser algorithm (#700)
- algo: add new pipeline implementation of IMPALA algorithm (#713)
- algo: add dropout layers to DQN-style algorithms (#712)
- feature: add new pipeline agent for sac/ddpg/a2c/ppo and Hugging Face support (#637) (#730) (#737)
- feature: add more unittest cases for model (#728)
- feature: add collector logging in new pipeline (#735)
- fix: logger middleware problems (#715)
- fix: ppo parallel bug (#709)
- fix: typo in optimizer_helper.py (#726)
- fix: mlp dropout if condition bug
- fix: drex collecting data unittest bugs
- style: polish env manager/wrapper comments and API doc (#742)
- style: polish model comments and API doc (#722) (#729) (#734) (#736) (#741)
- style: polish policy comments and API doc (#732)
- style: polish rl_utils comments and API doc (#724)
- style: polish torch_utils comments and API doc (#738)
- style: update README.md and Colab demo (#733)
- style: update metaworld docker image
2023.08.23(v0.4.9)
- env: add cliffwalking env (#677)
- env: add lunarlander ppo config and example
- algo: add BCQ offline RL algorithm (#640)
- algo: add DreamerV3 model-based RL algorithm (#652)
- algo: add tensor stream merge network tools (#673)
- algo: add scatter connection model (#680)
- algo: refactor Decision Transformer in new pipeline and support img input and discrete output (#693)
- algo: add three variants of Bilinear classes and a FiLM class (#703)
- feature: polish offpolicy RL multi-gpu DDP training (#679)
- feature: add middleware for Ape-X distributed pipeline (#696)
- feature: add example for evaluating trained DQN (#706)
- fix: to_ndarray fails to assign dtype for scalars (#708)
- fix: evaluator return episode_info compatibility bug
- fix: cql example entry wrong config bug
- fix: enable_save_figure env interface
- fix: redundant env info bug in evaluator
- fix: to_item unittest bug
- style: polish and simplify requirements (#672)
- style: add Hugging Face Model Zoo badge (#674)
- style: add openxlab Model Zoo badge (#675)
- style: fix py37 macos ci bug and update default pytorch from 1.7.1 to 1.12.1 (#678)
- style: fix mujoco-py compatibility issue for cython<3 (#711)
- style: fix type spelling error (#704)
- style: fix pypi release actions ubuntu 18.04 bug
- style: update contact information (e.g. wechat)
- style: polish algorithm doc tables
2023.05.25(v0.4.8)
- env: fix gym hybrid reward dtype bug (#664)
- env: fix atari env id noframeskip bug (#655)
- env: fix typo in gym any_trading env (#654)
- env: update td3bc d4rl config (#659)
- env: polish bipedalwalker config
- algo: add EDAC offline RL algorithm (#639)
- algo: add LN and GN norm_type support in ResBlock (#660)
- algo: add normal value norm baseline for PPOF (#658)
- algo: polish last layer init/norm in MLP (#650)
- algo: polish TD3 monitor variable
- feature: add MAPPO/MASAC task example (#661)
- feature: add PPO example for complex env observation (#644)
- feature: add barrier middleware (#570)
- fix: abnormal collector log and add record_random_collect option (#662)
- fix: to_item compatibility bug (#646)
- fix: trainer dtype transform compatibility bug
- fix: pettingzoo 1.23.0 compatibility bug
- fix: ensemble head unittest bug
- style: fix incompatible gym version bug in Dockerfile.env (#653)
- style: add more algorithm docs
2023.04.11(v0.4.7)
- env: add dmc2gym env support and baseline (#451)
- env: update pettingzoo to the latest version (#597)
- env: polish icm/rnd+onppo config bugs and add app_door_to_key env (#564)
- env: add lunarlander continuous TD3/SAC config
- env: polish lunarlander discrete C51 config
- algo: add Procedure Cloning (PC) imitation learning algorithm (#514)
- algo: add Munchausen Reinforcement Learning (MDQN) algorithm (#590)
- algo: add reward/value norm methods: popart & value rescale & symlog (#605)
- algo: polish reward model config and training pipeline (#624)
- algo: add PPOF reward space demo support (#608)
- algo: add PPOF Atari demo support (#589)
- algo: polish dqn default config and env examples (#611)
- algo: polish comment and clean code about SAC
- feature: add language model (e.g. GPT) training utils (#625)
- feature: remove policy cfg sub fields requirements (#620)
- feature: add full wandb support (#579)
- fix: confusing shallow copy operation about next_obs (#641)
- fix: unsqueeze action_args in PDQN when shape is 1 (#599)
- fix: evaluator return_info tensor type bug (#592)
- fix: deque buffer wrapper PER bug (#586)
- fix: reward model save method compatibility bug
- fix: logger assertion and unittest bug
- fix: bfs test py3.9 compatibility bug
- fix: zergling collector unittest bug
- style: add DI-engine torch-rpc p2p communication docker (#628)
- style: add D4RL docker (#591)
- style: correct typo in task (#617)
- style: correct typo in time_helper (#602)
- style: polish readme and add treetensor example
- style: update contributing doc
2023.02.16(v0.4.6)
- env: add metadrive env and related ppo config (#574)
- env: add acrobot env and related dqn config (#577)
- env: add carracing in box2d (#575)
- env: add new gym hybrid viz (#563)
- env: update cartpole IL config (#578)
- algo: add BDQ algorithm (#558)
- algo: add procedure cloning model (#573)
- feature: add simplified PPOF (PPO × Family) interface (#567) (#568) (#581) (#582)
- fix: to_device and prev_state bug when using ttorch (#571)
- fix: py38 and numpy unittest bugs (#565)
- fix: typo in contrastive_loss.py (#572)
- fix: dizoo envs pkg installation bugs
- fix: multi_trainer middleware unittest bug
- style: add evogym docker (#580)
- style: fix metaworld docker bug
- style: fix setuptools high version incompatibility bug
- style: extend treetensor lowest version
2022.12.13(v0.4.5)
- env: add beergame supply chain optimization env (#512)
- env: add env gym_pybullet_drones (#526)
- env: rename eval reward to episode return (#536)
- algo: add policy gradient algo implementation (#544)
- algo: add MADDPG algo implementation (#550)
- algo: add IMPALA continuous algo implementation (#551)
- algo: add MADQN algo implementation (#540)
- feature: add new task IMPALA-type distributed training scheme (#321)
- feature: add load and save method for replaybuffer (#542)
- feature: add more DingEnvWrapper example (#525)
- feature: add evaluator more info viz support (#538)
- feature: add traceback log for subprocess env manager (#534)
- fix: halfcheetah td3 config file (#537)
- fix: mujoco action_clip args compatibility bug (#535)
- fix: atari a2c config entry bug
- fix: drex unittest compatibility bug
- style: add Roadmap issue of DI-engine (#548)
- style: update related project link and new env doc
2022.10.31(v0.4.4)
- env: add modified gym-hybrid including moving, sliding and hardmove (#505) (#519)
- env: add evogym support (#495) (#527)
- env: add save_replay_gif option (#506)
- env: adapt minigrid_env and related config to latest MiniGrid v2.0.0 (#500)
- algo: add pcgrad optimizer (#489)
- algo: add some features in MLP and ResBlock (#511)
- algo: delete mcts related modules (#518)
- feature: add wandb middleware and demo (#488) (#523) (#528)
- feature: add new properties in Context (#499)
- feature: add single env policy wrapper for policy deployment
- feature: add custom model demo and doc
- fix: build logger args and unittests (#522)
- fix: total_loss calculation in PDQN (#504)
- fix: save gif function bug
- fix: level sample unittest bug
- style: update contact email address (#503)
- style: polish env log and resblock name
- style: add details button in readme
2022.09.23(v0.4.3)
- env: add rule-based gomoku expert (#465)
- algo: fix a2c policy batch size bug (#481)
- algo: enable activation option in collaq attention and mixer
- algo: minor fix about IBC (#477)
- feature: add IGM support (#486)
- feature: add tb logger middleware and demo
- fix: the type conversion in ding_env_wrapper (#483)
- fix: di-orchestrator version bug in unittest (#479)
- fix: data collection errors caused by shallow copies (#475)
- fix: gym==0.26.0 seed args bug
- style: add readme tutorial link (environment & algorithm) (#490) (#493)
- style: adjust location of the default_model method in policy (#453)
2022.09.08(v0.4.2)
- env: add rocket env (#449)
- env: update pettingzoo env and improve related performance (#457)
- env: add mario env demo (#443)
- env: add MAPPO multi-agent config (#464)
- env: add mountain car (discrete action) environment (#452)
- env: fix multi-agent mujoco gym compatibility bug
- env: fix gfootball env save_replay variable init bug
- algo: add IBC (Implicit Behaviour Cloning) algorithm (#401)
- algo: add BCO (Behaviour Cloning from Observation) algorithm (#270)
- algo: add continuous PPOPG algorithm (#414)
- algo: add PER in CollaQ (#472)
- algo: add activation option in QMIX and CollaQ
- feature: update ctx to dataclass (#467)
- fix: base_env FinalMeta bug about gym 0.25.0-0.25.1
- fix: config inplace modification bug
- fix: ding cli no argument problem
- fix: import errors after running setup.py (jinja2, markupsafe)
- fix: conda py3.6 and cross platform build bug
- style: add project state and datetime in log dir (#455)
- style: polish notes for q-learning model (#427)
- style: revision to mujoco dockerfile and validation (#474)
- style: add dockerfile for cityflow env
- style: polish default output log format
2022.08.12(v0.4.1)
- env: add gym trading env (#424)
- env: add board games env (tictactoe, gomoku, chess) (#356)
- env: add sokoban env (#397) (#429)
- env: add BC and DQN demo for gfootball (#418) (#423)
- env: add discrete pendulum env (#395)
- algo: add STEVE model-based algorithm (#363)
- algo: add PLR algorithm (#408)
- algo: plugin ST-DIM in PPO (#379)
- feature: add final result saving in training pipeline
- fix: random policy randomness bug
- fix: action_space seed compatibility bug
- fix: discard message sent by self in redis mq (#354)
- fix: remove pace controller (#400)
- fix: import error in serial_pipeline_trex (#410)
- fix: unittest hang and fail bug (#413)
- fix: DREX collect data unittest bug
- fix: remove unused import cv2
- fix: ding CLI env/policy option bug
- style: upgrade Python version from 3.6-3.8 to 3.7-3.9
- style: upgrade gym version from 0.20.0 to 0.25.0
- style: upgrade torch version from 1.10.0 to 1.12.0
- style: upgrade mujoco bin from 2.0.0 to 2.1.0
- style: add buffer api description (#371)
- style: polish VAE comments (#404)
- style: unittest for FQF (#412)
- style: add metaworld dockerfile (#432)
- style: remove opencv requirement in default setting
- style: update long description in setup.py
2022.06.21(v0.4.0)
- env: add MAPPO/MASAC all configs in SMAC (#310) **(SOTA results in SMAC!!!)**
- env: add dmc2gym env (#344) (#360)
- env: remove DI-star requirements of dizoo/smac, use official pysc2 (#302)
- env: add latest GAIL mujoco config (#298)
- env: polish procgen env (#311)
- env: add MBPO ant and humanoid config for mbpo (#314)
- env: fix slime volley env obs space bug when agent_vs_agent
- env: fix smac env obs space bug
- env: fix import path error in lunarlander (#362)
- algo: add Decision Transformer algorithm (#327) (#364)
- algo: add on-policy PPG algorithm (#312)
- algo: add DDPPO & add model-based SAC with lambda-return algorithm (#332)
- algo: add infoNCE loss and ST-DIM algorithm (#326)
- algo: add FQF distributional RL algorithm (#274)
- algo: add continuous BC algorithm (#318)
- algo: add pure policy gradient PPO algorithm (#382)
- algo: add SQIL + SAC algorithm (#348)
- algo: polish NGU and related modules (#283) (#343) (#353)
- algo: add marl distributional td loss (#331)
- feature: add new worker middleware (#236)
- feature: refactor model-based RL pipeline (ding/world_model) (#332)
- feature: refactor logging system in the whole DI-engine (#316)
- feature: add env supervisor design (#330)
- feature: support async reset for envpool env manager (#250)
- feature: add log videos to tensorboard (#320)
- feature: refactor impala cnn encoder interface (#378)
- fix: env save replay bug
- fix: transformer mask inplace operation bug
- fix: transition_with_policy_data bug in SAC and PPG
- style: add dockerfile for ding:hpc image (#337)
- style: fix mpire 2.3.5 which handles default processes more elegantly (#306)
- style: use FORMAT_DIR instead of ./ding (#309)
- style: update quickstart colab link (#347)
- style: polish comments in ding/model/common (#315)
- style: update mujoco docker download path (#386)
- style: fix protobuf new version compatibility bug
- style: fix torch1.8.0 torch.div compatibility bug
- style: update doc links in readme
- style: add outline in readme and update wechat image
- style: update head image and refactor docker dir
2022.04.23(v0.3.1)
- env: polish and standardize dizoo config (#252) (#255) (#249) (#246) (#262) (#261) (#266) (#273) (#263) (#280) (#259) (#286) (#277) (#290) (#289) (#299)
- env: add GRF academic env and config (#281)
- env: update env interface of GRF (#258)
- env: update D4RL offline RL env and config (#285)
- env: polish PomdpAtariEnv (#254)
- algo: DREX algorithm (#218)
- feature: separate mq and parallel modules, add redis (#247)
- feature: rename env variables; fix attach_to parameter (#244)
- feature: env implementation check (#275)
- feature: adjust and set the max column number of tabulate in log (#296)
- feature: add drop_extra option for sample collect
- feature: speed up GTrXL forward method + GRU unittest (#253) (#292)
- fix: add act_scale in DingEnvWrapper; fix envpool env manager (#245)
- fix: auto_reset=False and env_ref bug in env manager (#248)
- fix: data type and deepcopy bug in RND (#288)
- fix: share_memory bug and multi_mujoco env (#279)
- fix: some bugs in GTrXL (#276)
- fix: update gym_vector_env_manager and add more unittest (#241)
- fix: mdpolicy random collect bug (#293)
- fix: gym.wrapper save video replay bug
- fix: collect abnormal step format bug and add unittest
- test: add buffer benchmark & socket test (#284)
- style: upgrade mpire (#251)
- style: add GRF(google research football) docker (#256)
- style: update policy and gail comment
2022.03.24(v0.3.0)
- env: add bitflip HER DQN benchmark (#192) (#193) (#197)
- env: slime volley league training demo (#229)
- algo: Gated TransformXL (GTrXL) algorithm (#136)
- algo: TD3 + VAE(HyAR) latent action algorithm (#152)
- algo: stochastic dueling network (#234)
- algo: use log prob instead of using prob in ACER (#186)
- feature: support envpool env manager (#228)
- feature: add league main and other improvements in new framework (#177) (#214)
- feature: add pace controller middleware in new framework (#198)
- feature: add auto recover option in new framework (#242)
- feature: add k8s parser in new framework (#243)
- feature: support async event handler and logger (#213)
- feature: add grad norm calculator (#205)
- feature: add gym vector env manager (#147)
- feature: add train_iter and env_step in serial pipeline (#212)
- feature: add rich logger handler (#219) (#223) (#232)
- feature: add naive lr_scheduler demo
- refactor: new BaseEnv and DingEnvWrapper (#171) (#231) (#240)
- polish: MAPPO and MASAC smac config (#209) (#239)
- polish: QMIX smac config (#175)
- polish: R2D2 atari config (#181)
- polish: A2C atari config (#189)
- polish: GAIL box2d and mujoco config (#188)
- polish: ACER atari config (#180)
- polish: SQIL atari config (#230)
- polish: TREX atari/mujoco config
- polish: IMPALA atari config
- polish: MBPO/D4PG mujoco config
- fix: random_collect compatible to episode collector (#190)
- fix: remove default n_sample/n_episode value in policy config (#185)
- fix: PDQN model bug on gpu device (#220)
- fix: TREX algorithm CLI bug (#182)
- fix: DQfD JE computation bug and move to AdamW optimizer (#191)
- fix: pytest problem for parallel middleware (#211)
- fix: mujoco numpy compatibility bug
- fix: markupsafe 2.1.0 bug
- fix: framework parallel module network emit bug
- fix: mpire bug and disable algotest in py3.8
- fix: lunarlander env import and env_id bug
- fix: icm unittest repeat name bug
- fix: buffer thruput close bug
- test: resnet unittest (#199)
- test: SAC/SQN unittest (#207)
- test: CQL/R2D3/GAIL unittest (#201)
- test: NGU td unittest (#210)
- test: model wrapper unittest (#215)
- test: MAQAC model unittest (#226)
- style: add doc docker (#221)
2022.01.01(v0.2.3)
- env: add multi-agent mujoco env (#146)
- env: add delay reward mujoco env (#145)
- env: fix port conflict in gym_soccer (#139)
- algo: MASAC algorithm (#112)
- algo: TREX algorithm (#119) (#144)
- algo: H-PPO hybrid action space algorithm (#140)
- algo: residual link in R2D2 (#150)
- algo: gumbel softmax (#169)
- algo: move actor_head_type to action_space field
- feature: new main pipeline and async/parallel framework (#142) (#166) (#168)
- feature: refactor buffer, separate algorithm and storage (#129)
- feature: cli in new pipeline (ditask) (#160)
- feature: add multiprocess tblogger, fix circular reference problem (#156)
- feature: add multiple seed cli
- feature: polish eps_greedy_multinomial_sample in model_wrapper (#154)
- fix: R2D3 abs priority problem (#158) (#161)
- fix: multi-discrete action space policies random action bug (#167)
- fix: doc generate bug with enum_tools (#155)
- style: more comments about R2D2 (#149)
- style: add doc about how to migrate a new env
- style: add doc about env tutorial in dizoo
- style: add conda auto release (#148)
- style: update zh doc link
- style: update kaggle tutorial link
2021.12.03(v0.2.2)
- env: apple key to door treasure env (#128)
- env: add bsuite memory benchmark (#138)
- env: polish atari impala config
- algo: Guided Cost IRL algorithm (#57)
- algo: ICM exploration algorithm (#41)
- algo: MP-DQN hybrid action space algorithm (#131)
- algo: add loss statistics and polish r2d3 pong config (#126)
- feature: add renew env mechanism in env manager and update timeout mechanism (#127) (#134)
- fix: async subprocess env manager reset bug (#137)
- fix: keepdims name bug in model wrapper
- fix: on-policy ppo value norm bug
- fix: GAE and RND unittest bug
- fix: hidden state wrapper h tensor compatibility
- fix: naive buffer auto config create bug
- style: add supporters list
2021.11.22(v0.2.1)
- env: gym-hybrid env (#86)
- env: gym-soccer (HFO) env (#94)
- env: Go-Bigger env baseline (#95)
- env: add the bipedalwalker config of sac and ppo (#121)
- algo: DQfD Imitation Learning algorithm (#48) (#98)
- algo: TD3BC offline RL algorithm (#88)
- algo: MBPO model-based RL algorithm (#113)
- algo: PADDPG hybrid action space algorithm (#109)
- algo: PDQN hybrid action space algorithm (#118)
- algo: fix R2D2 bugs and produce benchmark, add naive NGU (#40)
- algo: self-play training demo in slime_volley env (#23)
- algo: add example of GAIL entry + config for mujoco (#114)
- feature: enable arbitrary policy num in serial sample collector
- feature: add torch DataParallel for single machine multi-GPU
- feature: add registry force_overwrite argument
- feature: add naive buffer periodic thruput seconds argument
- test: add pure docker setting test (#103)
- test: add unittest for dataset and evaluator (#107)
- test: add unittest for on-policy algorithm (#92)
- test: add unittest for ppo and td (MARL case) (#89)
- test: polish collector benchmark test
- fix: target model wrapper hard reset bug
- fix: learn state_dict target model bug
- fix: ppo bugs and update atari ppo offpolicy config (#108)
- fix: pyyaml version bug (#99)
- fix: small fix on bsuite environment (#117)
- fix: discrete cql unittest bug
- fix: release workflow bug
- fix: base policy model state_dict overlap bug
- fix: remove on_policy option in dizoo config and entry
- fix: remove torch in env
- style: gym version > 0.20.0
- style: torch version >= 1.1.0, <= 1.10.0
- style: ale-py == 0.7.0
2021.9.30(v0.2.0)
- env: overcooked env (#20)
- env: procgen env (#26)
- env: modified predator env (#30)
- env: d4rl env (#37)
- env: imagenet dataset (#27)
- env: bsuite env (#58)
- env: move atari_py to ale-py
- algo: SQIL algorithm (#25) (#44)
- algo: CQL algorithm (discrete/continuous) (#37) (#68)
- algo: MAPPO algorithm (#62)
- algo: WQMIX algorithm (#24)
- algo: D4PG algorithm (#76)
- algo: update multi discrete policy (dqn, ppo, rainbow) (#51) (#72)
- feature: image classification training pipeline (#27)
- feature: add force_reproducibility option in subprocess env manager
- feature: add/delete/restart replicas via cli for k8s
- feature: add league metric (trueskill and elo) (#22)
- feature: add tb in naive buffer and modify tb in advanced buffer (#39)
- feature: add k8s launcher and di-orchestrator launcher, add related unittest (#45) (#49)
- feature: add hyper-parameter scheduler module (#38)
- feature: add plot function (#59)
- fix: acer bug and update atari result (#21)
- fix: mappo nan bug and dict obs cannot unsqueeze bug (#54)
- fix: r2d2 hidden state and obs arange bug (#36) (#52)
- fix: ppo bug when use dual_clip and adv > 0
- fix: qmix double_q hidden state bug
- fix: spawn context problem in interaction unittest (#69)
- fix: formatted config no eval bug (#53)
- fix: the catch statements that will never succeed and system proxy bug (#71) (#79)
- fix: lunarlander config
- fix: c51 head dimension mismatch bug
- fix: mujoco config typo bug
- fix: ppg atari config bug
- fix: max use and priority update special branch bug in advanced_buffer
- style: add docker deploy in github workflow (#70) (#78) (#80)
- style: support PyTorch 1.9.0
- style: add algo/env list in README
- style: rename advanced_buffer register name to advanced
2021.8.3(v0.1.1)
- env: selfplay/league demo (#12)
- env: pybullet env (#16)
- env: minigrid env (#13)
- env: atari enduro config (#11)
- algo: on policy PPO (#9)
- algo: ACER algorithm (#14)
- feature: polish experiment directory structure (#10)
- refactor: split doc to new repo (#4)
- fix: atari env info action space bug
- fix: env manager retry wrapper raise exception info bug
- fix: dist entry disable-flask-log typo
- style: codestyle optimization by lgtm (#7)
- style: code/comment statistics badge
- style: github CI workflow
2021.7.8(v0.1.0)