Training an agent requires substantial computational resources, typically 8x RTX 4090 GPUs and a 128-core CPU for a few days. We don't recommend training the agent on your local machine. Reducing the number of decks used for training may reduce the computational resources required.
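For example, assuming decks are supplied as `.ydk` files in a directory that the training script reads (the directory layout and file names below are placeholders, not the repository's exact paths), a smaller training pool can be assembled like this:

```bash
# Illustrative only: gather a small subset of decks into a separate directory
# and point the training script at it. Directory layout, deck file names, and
# the .ydk extension are assumptions about this setup.
mkdir -p assets/deck_small
cp assets/deck/deck1.ydk assets/deck/deck2.ydk assets/deck_small/
```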
We can train the agent with a single GPU using a command along the lines of the sketch below.
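The flag names follow the options discussed in this section (`--seed`, `--embedding_file`); the embedding file name and seed value are placeholders, and `python ppo.py --help` lists the authoritative options for your version of the script.

```bash
# Single-GPU training sketch; option values are placeholders and the full
# option set may differ across versions of ppo.py.
python -u ppo.py --seed 1 --embedding_file embed10000.pkl
```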
#### Embeddings

To handle the diverse and complex card effects, we have converted the card information into embeddings.
We provide one in the [releases](https://github.com/sbl1996/ygo-agent/releases/tag/v0.1), named `embed{n}.pkl`, where `n` is the number of cards in `code_list.txt`.

You can choose not to use the embeddings by skipping the `--embedding_file` option. If you do so, remember to set `--num_embeddings` to `999` when running the `eval.py` script.
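A quick way to sanity-check that an embedding file matches your card list is sketched below; the file names are placeholders, and the only assumption about the pickle's structure is that it contains one entry per card.

```python
# Illustrative sanity check: the embedding file should contain one entry per
# card code listed in code_list.txt. File names below are placeholders.
import pickle

with open("embed10000.pkl", "rb") as f:
    embeddings = pickle.load(f)

with open("code_list.txt") as f:
    codes = [line.strip() for line in f if line.strip()]

print(len(embeddings), len(codes))  # the two counts should match
```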
#### Compile
We use `torch.compile` to speed up the training process. It is very important and can reduce the overall training time by 2x or more. If compilation fails, try updating PyTorch to the latest version.
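For reference, this is the general pattern `torch.compile` follows; the actual integration in `ppo.py` may differ, and the model below is a stand-in.

```python
# Minimal torch.compile illustration (PyTorch 2.x); the model is a stand-in.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 1))
compiled = torch.compile(model)  # pass mode="reduce-overhead" or "max-autotune" to trade compile time for runtime speed

x = torch.randn(32, 128)
out = compiled(x)  # the first call triggers compilation; later calls reuse the compiled graph
```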
#### Seed
The `seed` option sets the random seed for reproducibility. However, many optimizations used during training are not deterministic, so results may still vary.
For debugging, you can pass `--compile None --torch-deterministic` together with a fixed seed to get deterministic results.
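The setup implied by these options looks roughly like the sketch below; the function name is illustrative and the actual handling inside `ppo.py` may differ.

```python
# Sketch of seeding plus optional determinism (roughly what --seed and
# --torch-deterministic imply); ppo.py's actual implementation may differ.
import random

import numpy as np
import torch


def seed_everything(seed: int, torch_deterministic: bool = False) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    if torch_deterministic:
        # Deterministic cuDNN kernels trade speed for reproducibility.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
```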
#### Hyperparameters
More PPO hyperparameters can be found in the `ppo.py` script. Tuning them may improve performance but requires more computational resources.
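Typical PPO knobs look like the following; the names and default values here are generic PPO conventions, not necessarily the exact argument names or defaults used in `ppo.py`.

```python
# Generic PPO hyperparameters (names and values are illustrative,
# not ppo.py's exact flags or defaults).
ppo_config = {
    "learning_rate": 2.5e-4,   # optimizer step size
    "gamma": 0.99,             # discount factor
    "gae_lambda": 0.95,        # GAE smoothing parameter
    "clip_coef": 0.2,          # PPO clipping range
    "ent_coef": 0.01,          # entropy bonus weight
    "vf_coef": 0.5,            # value-loss weight
    "update_epochs": 4,        # optimization epochs per rollout
    "minibatch_size": 256,     # minibatch size per update
    "max_grad_norm": 0.5,      # gradient clipping threshold
}
```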
### Distributed Training
The `ppo.py` script supports single-node and multi-node distributed training with `torchrun`. Start distributed training as sketched below.
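This sketch shows a two-node launch with eight GPUs per node; the `torchrun` flags are standard, while the addresses, port, and script arguments are placeholders.

```bash
# Sketch of a 2-node run with 8 GPUs per node; addresses and script arguments
# are placeholders. Run the same command on every node with its own --node_rank.
torchrun --nnodes=2 --nproc_per_node=8 \
    --node_rank=0 --master_addr=192.168.1.100 --master_port=29500 \
    ppo.py --seed 1 --embedding_file embed10000.pkl
```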
The script options are mostly the same as for single-GPU training. We only scale the batch size and the number of environments to the number of available CPUs and GPUs; the learning rate is then scaled according to the batch size.