Commit 6b23ca2d authored by sbl1996@126.com

Update README

parent 9a2a21e3
@@ -2,6 +2,11 @@
YGO Agent is a project to create a Yu-Gi-Oh! AI using deep learning (LLMs, RL). It consists of a game environment and a set of AI agents.
## News
- April 7, 2024: We have switched to JAX for training and evaluation due to its better performance and flexibility. The scripts are in the `scripts/jax` directory, and the documentation is in progress. PyTorch scripts are still available in the `scripts` directory, but they are no longer maintained.
## Table of Contents
- [Subprojects](#subprojects)
  - [ygoenv](#ygoenv)
@@ -15,6 +20,7 @@ YGO Agent is a project to create a Yu-Gi-Oh! AI using deep learning (LLMs, RL).
- [Training](#training)
  - [Single GPU Training](#single-gpu-training)
  - [Distributed Training](#distributed-training)
- [Training (JAX)](#training-jax)
- [Plan](#plan)
  - [Training](#training-1)
  - [Inference](#inference)
@@ -158,21 +164,29 @@ OMP_NUM_THREADS=4 torchrun --nnodes=2 --nproc-per-node=8 --node-rank=1 \
The script options are mostly the same as for single-GPU training. We only scale the batch size and the number of environments with the number of available CPUs and GPUs; the learning rate is then scaled according to the batch size.
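As a rough illustration, here is a minimal sketch of that scaling rule in Python, assuming linear learning-rate scaling; the function name and base values are hypothetical, not taken from the actual training scripts:

```python
# Minimal sketch of the scaling rule described above. The base values and the
# linear learning-rate scaling assumption are illustrative, not taken from the
# actual training scripts.

def scale_hyperparams(num_gpus, num_cpus,
                      base_batch_size=128, base_num_envs=16, base_lr=2.5e-4):
    # Scale the batch size with the number of GPUs and the environment
    # count with the number of CPUs.
    batch_size = base_batch_size * num_gpus
    num_envs = base_num_envs * num_cpus
    # Scale the learning rate linearly with the batch size.
    lr = base_lr * batch_size / base_batch_size
    return batch_size, num_envs, lr

# Example: 2 nodes x 8 GPUs, 64 CPUs per node.
print(scale_hyperparams(num_gpus=16, num_cpus=128))
```

Linear scaling keeps the per-sample contribution to each gradient step roughly constant as the global batch grows.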
## Plan
### Environment
- Fix information leak in the history actions
### Training
- Add opponent history actions and turn info to the history actions
- Evaluation with old models during training
- LSTM for memory
- League training following AlphaStar and ROA-Star
### Inference
- MCTS-based planning
- Support for playing in YGOPro
### Documentation
- JAX training and evaluation
## Sponsors
This work is supported with Cloud TPUs from Google's [TPU Research Cloud (TRC)](https://sites.research.google/trc/about/).
## Related Projects
- [ygopro-core](https://github.com/Fluorohydride/ygopro-core)
- [envpool](https://github.com/sail-sg/envpool)
- [yugioh-ai](https://github.com/melvinzhang/yugioh-ai)
- [yugioh-game](https://github.com/tspivey/yugioh-game)