---
license: mit
language:
- en
tags:
- reinforcement-learning
- bpql
- delay
- rl
- pytorch
- belief
- projection
pipeline_tag: reinforcement-learning
---

<div align="center">
<h1>Belief Projection-Based Q-Learning</h1>
<a href="https://www.python.org/">
<img src="https://img.shields.io/badge/Python-3.8-blue?logo=python&style=flat-square" alt="Python Badge"/>
</a>
<a href="https://pytorch.org/">
<img src="https://img.shields.io/badge/PyTorch-2.1.0-%23EE4C2C?logo=pytorch&style=flat-square" alt="PyTorch Badge"/>
</a>
<a href="https://proceedings.neurips.cc/paper_files/paper/2023/hash/0252a434b18962c94910c07cd9a7fecc-Abstract-Conference.html">
<img src="https://img.shields.io/badge/NeurIPS%202023-Paper-%23007ACC?style=flat-square" alt="NeurIPS 2023 Badge"/>
</a>
<br/><br/>
<img src="./figures/neurips_logo.png" width="200px" style="margin: 0 10px;"/>
</div>

## [NeurIPS 2023] Official Implementation of Belief Projection-Based Q-Learning (BPQL)

This repository contains the PyTorch implementation of **BPQL**, introduced in the paper **_Belief Projection-Based Reinforcement Learning for Environments with Delayed Feedback_** by Jangwon Kim et al., presented at Advances in Neural Information Processing Systems (NeurIPS), 2023.

## 📄 Paper Link

> The paper is available here: https://proceedings.neurips.cc/paper_files/paper/2023/hash/0252a434b18962c94910c07cd9a7fecc-Abstract-Conference.html

## 🚀 Achieves S.O.T.A. Performance, Yet Very Simple to Implement

* **Supports observation delay, action delay, and their combination** (the sketch below the plot illustrates what these delays mean)
* **Performance Plot ⬇️**

<p align="center">
<img src="./figures/plot.png" alt="BPQL Performance Plot" width="600"/>
</p>
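
To make the delay setting concrete, here is a minimal sketch of what constant observation/action delay looks like as a Gym wrapper. This is an illustrative toy written for this card (the class name `DelayedEnvWrapper` is hypothetical, and the gym 0.26 API with a continuous action space is assumed); it is not the repository's implementation.

```python
# Minimal sketch of constant observation/action delay (illustration only,
# not the repository's implementation). Assumes the gym 0.26 step/reset API
# and a continuous (Box) action space.
from collections import deque

import gym
import numpy as np

class DelayedEnvWrapper(gym.Wrapper):
    """Delays observations by `obs_delay` steps and actions by `act_delay` steps."""

    def __init__(self, env, obs_delay=0, act_delay=0):
        super().__init__(env)
        self.obs_delay = obs_delay
        self.act_delay = act_delay

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        # Until fresh observations arrive, the agent keeps seeing the initial one.
        self.obs_buffer = deque([obs] * (self.obs_delay + 1), maxlen=self.obs_delay + 1)
        # Actions chosen now take effect later; pad the pipeline with zero actions.
        zero_act = np.zeros(self.env.action_space.shape, dtype=self.env.action_space.dtype)
        self.act_buffer = deque([zero_act] * self.act_delay, maxlen=self.act_delay + 1)
        return self.obs_buffer[0], info

    def step(self, action):
        self.act_buffer.append(action)
        executed = self.act_buffer.popleft()  # action selected `act_delay` steps ago
        obs, reward, terminated, truncated, info = self.env.step(executed)
        self.obs_buffer.append(obs)           # newest obs enters, oldest is returned
        return self.obs_buffer[0], reward, terminated, truncated, info
```

For example, `DelayedEnvWrapper(gym.make("HalfCheetah-v3"), obs_delay=5, act_delay=4)` mirrors the delay settings used in the run command below.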

## ▶️ How to Run?

### Option 1: Run the script file

```
chmod +x run.sh
./run.sh
```

### Option 2: Run main.py with arguments

```
python main.py --env-name HalfCheetah-v3 --random-seed 2023 --obs-delayed-steps 5 --act-delayed-steps 4 --max-step 1000000
```

Here, `--obs-delayed-steps` and `--act-delayed-steps` set the number of steps by which observations and actions are delayed, respectively.

---

## ✅ Test Environment

```
python == 3.8.10
gym == 0.26.2
mujoco_py == 2.1.2.14
pytorch == 2.1.0
numpy == 1.24.3
```
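
For a quick check that your local setup matches the versions above, a snippet along these lines can help (a convenience sketch written for this card, not part of the repository; `mujoco_py` is omitted because importing it requires a working MuJoCo install):

```python
# Convenience sketch: print installed versions and compare with the tested setup.
import sys

import gym
import numpy
import torch

print("python :", sys.version.split()[0])  # tested: 3.8.10
print("gym    :", gym.__version__)         # tested: 0.26.2
print("pytorch:", torch.__version__)       # tested: 2.1.0
print("numpy  :", numpy.__version__)       # tested: 1.24.3
```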

## 📚 Citation Example

```
@inproceedings{kim2023cocel,
  author = {Kim, Jangwon and Kim, Hangyeol and Kang, Jiwook and Baek, Jongchan and Han, Soohee},
  booktitle = {Advances in Neural Information Processing Systems},
  pages = {678--696},
  title = {Belief Projection-Based Reinforcement Learning for Environments with Delayed Feedback},
  volume = {36},
  year = {2023}
}
```