Commit 91e612fc authored by Biluo Shen

Add docs

parent ff67e8c4
@@ -2,26 +2,49 @@
YGO Agent is a project to create a Yu-Gi-Oh! AI using deep learning (LLMs, RL). It consists of a game environment and a set of AI agents.
## Table of Contents
- [Subprojects](#subprojects)
- [ygoenv](#ygoenv)
- [ygoai](#ygoai)
- [Building](#building)
- [Evaluation](#evaluation)
- [Obtain a trained agent](#obtain-a-trained-agent)
- [Play against the agent](#play-against-the-agent)
- [Battle between two agents](#battle-between-two-agents)
- [Serialize agent](#serialize-agent)
- [Training](#training)
- [Single GPU Training](#single-gpu-training)
- [Distributed Training](#distributed-training)
- [Plan](#plan)
- [Training](#training-1)
- [Inference](#inference)
- [Related Projects](#related-projects)
## Subprojects
### ygoenv
`ygoenv` is a high-performance game environment for Yu-Gi-Oh! It was initially inspired by [yugioh-ai](https://github.com/melvinzhang/yugioh-ai) and [yugioh-game](https://github.com/tspivey/yugioh-game), and is now implemented on top of [envpool](https://github.com/sail-sg/envpool).
### ygoai
`ygoai` is a set of AI agents for playing Yu-Gi-Oh! It aims to achieve superhuman performance like AlphaGo and AlphaZero, with or without human knowledge. Currently, we focus on using reinforcement learning to train the agents.
## Building
### Prerequisites
The following build instructions have only been tested on Ubuntu (WSL2) and may not work on other platforms.
To build the project, you need to install the following prerequisites first:
- gcc 10+ or clang 11+
- [xmake](https://xmake.io/#/getting_started)
- PyTorch 2.0 or later with CUDA support
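If you want a quick sanity check of the PyTorch prerequisite before building, a minimal sketch like the following (not part of the repository) can help:
```python
# Minimal sanity check for the PyTorch prerequisite (illustrative only).
import torch

print(torch.__version__)          # expect 2.0 or later
print(torch.cuda.is_available())  # expect True if CUDA support is set up
```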
After that, you can build with the following commands:
```bash
git clone https://github.com/sbl1996/ygo-agent.git
cd ygo-agent
git checkout eval_with_ptj # switch to the stable branch
xmake f -y
make
```
@@ -29,13 +52,13 @@ make
Sometimes xmake may fail to install the required libraries automatically (e.g., `glog` and `gflags`). In that case, install them manually and add them to the search path (`LD_LIBRARY_PATH` or similar), and xmake will find them.
After building, you can run the following command to test the environment. If you see episode logs, the environment is working. See the next section for more usage.
```bash
cd scripts
python -u eval.py --env-id "YGOPro-v0" --deck ../assets/deck/ --num_episodes 32 --strategy random --lang chinese --num_envs 16
```
## Evaluation
### Obtain a trained agent
@@ -70,7 +93,70 @@
After training, we can serialize the trained agent model to a file for later use:
```bash
python -u eval.py --agent --checkpoint checkpoints/1234_1000M.pt --num_embeddings 999 --convert --optimize
```
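The `--convert --optimize` flags suggest the agent is exported as a TorchScript module. If that is the case, loading it later might look like the sketch below; the filename here is a placeholder, not the actual path produced by `eval.py`.
```python
# Hedged sketch: load a serialized agent, assuming it is a TorchScript module.
# "serialized_agent.ptj" is a placeholder filename, not the script's real output path.
import torch

agent = torch.jit.load("serialized_agent.ptj", map_location="cpu")
agent.eval()
print(agent)  # inspect the serialized module
```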
## Training
Training an agent requires a lot of computational resources, typically 8x RTX 4090 GPUs and a 128-core CPU for a few days. We don't recommend training the agent on your local machine. Reducing the number of decks used for training may reduce the computational resources required.
### Single GPU Training
We can train the agent with a single GPU using the following command:
```bash
python -u ppo.py --deck ../assets/deck --seed 1 --embedding_file embed.pkl \
--minibatch-size 128 --learning-rate 1e-4 --update-epochs 2 --save_interval 100 \
--compile reduce-overhead --env_threads 16 --num_envs 64 --eval_episodes 32
```
#### Deck
`deck` can be a directory containing `.ydk` files or a single `.ydk` file (e.g., `deck/` or `deck/BlueEyes.ydk`). The well-tested and supported decks are in the `assets/deck` directory.
Supported cards are listed in `scripts/code_list.txt`. New decks that contain only supported cards can be used, but errors may still occur due to the complexity of the game.
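As a rough illustration, a deck can be checked against the supported-card list like this. The sketch assumes `code_list.txt` holds one card code per line and that `.ydk` files list card codes on their own lines (section markers start with `#` or `!`); neither assumption is verified against the repository.
```python
# Hedged sketch: report cards in a .ydk deck that are not in code_list.txt.
def load_codes(path):
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

def deck_codes(ydk_path):
    with open(ydk_path) as f:
        return [line.strip() for line in f
                if line.strip() and not line.startswith(("#", "!"))]

supported = load_codes("scripts/code_list.txt")
deck = deck_codes("assets/deck/BlueEyes.ydk")
unsupported = [code for code in deck if code not in supported]
print("unsupported cards:", unsupported or "none")
```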
#### Embedding
To handle the diverse and complex card effects, we have converted the card information and effects into text and used large language models (LLMs) to generate embeddings from the text. The embeddings are stored in a file (e.g., `embed.pkl`).
We provide one in the [releases](https://github.com/sbl1996/ygo-agent/releases/tag/v0.1), named `embed{n}.pkl`, where `n` is the number of cards in `code_list.txt`.
You can choose not to use the embeddings by skipping the `--embedding_file` option. If you do so, remember to set `--num_embeddings` to `999` in the `eval.py` script.
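If you are curious about the embedding file, it is presumably a regular pickle and can be inspected as below; its internal structure (e.g., a mapping from card code to embedding vector) is an assumption here, not something documented above.
```python
# Hedged sketch: peek inside the embedding file provided in the releases.
import pickle

with open("embed.pkl", "rb") as f:
    embeddings = pickle.load(f)

# The exact structure is an assumption; start by inspecting its type and size.
print(type(embeddings))
print(len(embeddings))
```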
#### Compile
We use `torch.compile` to speed up the overall training process. It is very important and can reduce the overall training time by 2x or more. If compilation fails, try updating PyTorch to the latest version.
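For reference, the `reduce-overhead` value passed to `--compile` appears to correspond to a `torch.compile` mode. A minimal, self-contained illustration with a stand-in module (not the repository's agent network):
```python
# Illustration of torch.compile with the "reduce-overhead" mode.
import torch
import torch.nn as nn

model = nn.Linear(128, 2)  # stand-in for the actual agent network
compiled = torch.compile(model, mode="reduce-overhead")

out = compiled(torch.randn(4, 128))  # first call triggers compilation
print(out.shape)
```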
#### Seed
The `seed` option sets the random seed for reproducibility. However, many of the optimizations used in training are not deterministic, so the results may still vary.
For debugging, you can set `--compile None --torch-deterministic` with the same seed to get deterministic results.
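The kind of setup these options imply might look like the following sketch; the exact flags `ppo.py` sets internally are assumptions here.
```python
# Hedged sketch of seeding plus the determinism switches that
# --seed and --torch-deterministic typically control.
import random
import numpy as np
import torch

seed = 1
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)

# Determinism at the cost of speed; what --torch-deterministic usually toggles.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```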
#### Hyperparameters
More PPO hyperparameters can be found in the `ppo.py` script. Tuning them may improve the performance but requires more computational resources.
### Distributed Training
The `ppo.py` script supports single-node and multi-node distributed training with `torchrun`. Start distributed training like this:
```bash
# single node
OMP_NUM_THREADS=4 torchrun --standalone --nnodes=1 --nproc-per-node=8 ppo.py \
# multi-node, on node 0
OMP_NUM_THREADS=4 torchrun --nnodes=2 --nproc-per-node=8 --node-rank=0 \
--rdzv-id=12941 --master-addr=$MASTER_ADDR --master-port=$MASTER_PORT ppo.py \
# multi-node, on node 1
OMP_NUM_THREADS=4 torchrun --nnodes=2 --nproc-per-node=8 --node-rank=1 \
--rdzv-id=12941 --master-addr=$MASTER_ADDR --master-port=$MASTER_PORT ppo.py \
# script options (append these to whichever launch command above you use)
--deck ../assets/deck --seed 1 --embedding_file embed.pkl \
--minibatch-size 2048 --learning-rate 5e-4 --update-epochs 2 --save_interval 100 \
--compile reduce-overhead --env_threads 128 --num_envs 1024 --eval_episodes 128
```
The script options are mostly the same as for single-GPU training. We only scale the batch size and the number of environments with the number of available CPUs and GPUs, and then scale the learning rate according to the batch size.
## Plan
### Training
- Evaluation with old models during training
@@ -98,7 +98,7 @@ class Args:
"""the target KL divergence threshold"""
learn_opponent: bool = True
"""if toggled, the samples from the opponent will be used to train the agent"""
collect_length: Optional[int] = None
"""the length of the buffer, only the first `num_steps` will be used for training (partial GAE)"""
backend: Literal["gloo", "nccl", "mpi"] = "nccl"