Commit 0ecf0a00 authored by sbl1996@126.com

Update README

parent d2dcecde
@@ -4,6 +4,8 @@ YGO Agent is a project to create a Yu-Gi-Oh! AI using deep learning (LLMs, RL).
## News
- April 14, 2024: LSTM has been implemented and well tested. See `scripts/jax/ppo.py` for more details.
- April 7, 2024: We have switched to JAX for training and evaluation due to the better performance and flexibility. The scripts are in the `scripts/jax` directory. The documentation is in progress. PyTorch scripts are still available in the `scripts` directory, but they are not maintained.
@@ -20,10 +22,11 @@ YGO Agent is a project to create a Yu-Gi-Oh! AI using deep learning (LLMs, RL).
- [Training](#training)
- [Single GPU Training](#single-gpu-training)
- [Distributed Training](#distributed-training)
- [Training (JAX)](#training-jax)
- [Plan](#plan)
- [Environment](#environment)
- [Training](#training-1)
- [Inference](#inference)
- [Documentation](#documentation)
- [Related Projects](#related-projects)
@@ -170,8 +173,9 @@ The script options are mostly the same as the single GPU training. We only scale
- Fix information leak in the history actions
### Training
-- Evaluation with old models during training
-- League training following AlphaStar and ROA-Star
+- League training (AlphaStar, ROA-Star)
+- Nash equilibrium training (OSFP, DeepNash)
+- Centralized critic with full observation
### Inference
- MCTS-based planning