Commit c0689b01 authored by sbl1996@126.com

Update README

parent 350c29a8
YGO Agent is a project to create a Yu-Gi-Oh! AI using deep learning (LLMs, RL).
## News
- April 18, 2024: We have fully switched to JAX for training and evaluation. Check the evaluation sections for more details and try the new JAX-trained agents.
- April 14, 2024: LSTM has been implemented and well tested. See `scripts/jax/ppo.py` for more details.
- April 7, 2024: We have switched to JAX for training and evaluation due to its better performance and flexibility. The scripts are in the `scripts/jax` directory. The documentation is in progress. PyTorch scripts are still available in the `scripts` directory, but they are no longer maintained.
- [ygoenv](#ygoenv)
- [ygoai](#ygoai)
- [Building](#building)
- [Common Issues](#common-issues)
- [Evaluation](#evaluation)
- [Obtain a trained agent](#obtain-a-trained-agent)
- [Play against the agent](#play-against-the-agent)
- [Battle between two agents](#battle-between-two-agents)
- [Training (Deprecated, to be updated)](#training-deprecated-to-be-updated)
- [Single GPU Training](#single-gpu-training)
- [Distributed Training](#distributed-training)
- [Plan](#plan)
- [Training](#training-1)
- [Inference](#inference)
- [Documentation](#documentation)
- [Sponsors](#sponsors)
- [Related Projects](#related-projects)
To build the project, you need to install the following prerequisites first:
- gcc 10+ or clang 11+
- CMake 3.12+
- [xmake](https://xmake.io/#/getting_started)
- jax 0.4.25+, flax 0.8.2+, distrax 0.1.5+ (CUDA is optional)
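
If you manage Python dependencies with pip, a minimal install matching the pins above might look like this sketch (GPU wheels need the extra steps from the official JAX installation guide):

```bash
# A sketch assuming pip and a CPU-only setup; for CUDA wheels,
# follow the official JAX installation guide instead.
pip install "jax>=0.4.25" "flax>=0.8.2" "distrax>=0.1.5"
```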
After that, you can build with the following commands:
```bash
git clone https://github.com/sbl1996/ygo-agent.git
cd ygo-agent
git checkout stable # switch to the stable branch
xmake f -y
make
```
After building, you can run the following command to test the environment. If you see episode logs, it means the environment is working. Try more usage in the next section!
```bash
cd scripts
python -u eval.py --env-id "YGOPro-v0" --deck ../assets/deck/ --num_episodes 32 --strategy random --lang chinese --num_envs 16
```
### Common Issues
#### Package version not found by xmake
Delete the `repositories`, `cache`, and `packages` directories under `~/.xmake`, then run `xmake f -y` again.
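
For example, the reset can be done in one go (a sketch of the steps above):

```bash
# Clear xmake's cached package state, then reconfigure.
rm -rf ~/.xmake/repositories ~/.xmake/cache ~/.xmake/packages
xmake f -y
```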
#### Package installation failed with xmake
Sometimes xmake may fail to automatically install the required libraries (e.g., `glog` and `gflags`). You can install them manually (e.g., with `apt install`) and put them in the search path (`$LD_LIBRARY_PATH` or similar); xmake will then find them.
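
For example, on Debian/Ubuntu (the package names here are an assumption; adjust them for your distribution):

```bash
# Install glog and gflags system-wide so xmake can find them.
sudo apt install libgoogle-glog-dev libgflags-dev
# Make sure the library directory is on the search path.
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
xmake f -y
```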
#### GLIBC and GLIBCXX version conflict
Usually this is because the `libstdc++` from `$CONDA_PREFIX` is older than the system one: xmake compiles the libraries against the system `libstdc++`, while you run programs with the `$CONDA_PREFIX` copy. If so, you can delete the old `libstdc++` from `$CONDA_PREFIX` (back it up first) and make a soft link to the system one.
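
A sketch of that workaround, assuming a typical x86_64 Linux layout (verify the paths on your system first):

```bash
cd $CONDA_PREFIX/lib
mv libstdc++.so.6 libstdc++.so.6.bak                           # back up the conda copy
ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6 libstdc++.so.6  # link to the system one
```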
#### Other xmake issues
Open a new terminal and try again. God bless you.
## Evaluation
### Obtain a trained agent
We provide trained agents in the [releases](https://github.com/sbl1996/ygo-agent/releases/tag/v0.1). Check the Flax checkpoint files named `{commit_hash}_{exp_id}_{step}.flax_model` and download the latest one to your local machine. The following usage assumes you have it.
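
For example, to fetch the checkpoint used in the commands below (the download URL pattern is an assumption based on GitHub release conventions; check the releases page for the exact asset name):

```bash
mkdir -p checkpoints
wget -P checkpoints https://github.com/sbl1996/ygo-agent/releases/download/v0.1/350c29a_7565_6700M.flax_model
```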
If you are not on the `stable` branch or encounter other runtime issues, try switching to the corresponding `commit_hash` commit before using the agent. You may need to rebuild the project after switching:
```bash
xmake f -c               # clear the cached build configuration
xmake b -r ygopro_ygoenv # force-rebuild the ygopro_ygoenv target
```
### Play against the agent
We can use `eval.py` to play against the trained agent with a MUD-like interface in the terminal. We add `--xla_device cpu` to run the agent on the CPU.
```bash
python -u eval.py --deck ../assets/deck --lang chinese --xla_device cpu --checkpoint checkpoints/350c29a_7565_6700M.flax_model --play
```
Enter `quit` to exit the game. Run `python eval.py --help` for more options; for example, `--player 0` makes the agent play as the first player, and `--deck1 TenyiSword` forces the first player to use the TenyiSword deck.
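
For instance, the options can be combined like this (a sketch assuming the flags behave as described above):

```bash
python -u eval.py --deck ../assets/deck --lang chinese --xla_device cpu \
  --checkpoint checkpoints/350c29a_7565_6700M.flax_model --play \
  --player 0 --deck1 TenyiSword
```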
### Battle between two agents
We can use `battle.py` to let two agents play against each other and find out which one is better.
```bash
python -u battle.py --deck ../assets/deck --checkpoint1 checkpoints/350c29a_7565_6700M.flax_model --checkpoint2 checkpoints/350c29a_1166_6200M.flax_model --num-episodes 32 --num_envs 8 --seed 0
```
We can set `--record` to write `.yrp` replay files to the `replay` directory. The `.yrp` files can be replayed in YGOPro-compatible clients (YGOPro, YGOPro2, KoishiPro, MDPro). Change `--seed` to generate different games.
```bash
python -u battle.py --deck ../assets/deck --checkpoint1 checkpoints/350c29a_7565_6700M.flax_model --checkpoint2 checkpoints/350c29a_1166_6200M.flax_model --num-episodes 16 --record --seed 0
```
## Training (Deprecated, to be updated)
Training an agent requires a lot of computational resources, typically 8x 4090 GPUs and a 128-core CPU for a few days. We don't recommend training the agent on your local machine. Reducing the number of decks used for training may reduce the computational resources required.
### Training
- League training (AlphaStar, ROA-Star)
- Nash equilibrium training (OSFP, DeepNash)
- Individual agents for the first and second player
- Centralized critic with full observation
### Inference
- Support for play in YGOPro
### Documentation
- JAX training
## Sponsors