From 421776514eeb66169402d2c6f06834393930ae8e Mon Sep 17 00:00:00 2001
From: beanbun <yuanzhonghang@pjlab.org.cn>
Date: Mon, 13 Apr 2026 12:53:53 +0800
Subject: [PATCH 1/4] Update README: add RLVR results table

---
 README.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/README.md b/README.md
index 3989f5c9..b730083f 100644
--- a/README.md
+++ b/README.md
@@ -99,7 +99,16 @@ Here is post-training result which **over 50% SFT data** comes from GraphGen and
 |   Math    |                          AIME24                           | **20.6** |              16.7              |
 |           |                          AIME25                           | **22.7** |              7.2               |
 
+### RLVR
+We applied reinforcement learning directly to the Qwen2.5-7B base model without any prior SFT. Here are the results.
+|  Domain   |                          Dataset                          |   Ours   | Qwen2.5-7B-Instruct (baseline) |
+|:---------:|:---------------------------------------------------------:|:--------:|:------------------------------:|
+|   Plant   |                           SeedBench                       | **66.8** |              51.5              |
+|    law    |                           LawBench                        | **55.2** |              54.76             |
+|  Medicine |                            MedQA                          | **87.1** |              80.7              |
+|  General  |                             BBH                           | **55.3** |              49.6              |
 
+More details can be found at `examples/generate/generate_masked_fill_in_blank_qa`
 
 ## ⚙️ Support List
 

From e97a243ade6b1f0ac954ed97174d5ea1ae75c95a Mon Sep 17 00:00:00 2001
From: chenzihong <58508660+ChenZiHong-Gavin@users.noreply.github.com>
Date: Mon, 13 Apr 2026 14:52:35 +0800
Subject: [PATCH 2/4] Apply suggestion from @gemini-code-assist[bot]: link SeedBench in RLVR table

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b730083f..0140412f 100644
--- a/README.md
+++ b/README.md
@@ -103,7 +103,7 @@ Here is post-training result which **over 50% SFT data** comes from GraphGen and
 We applied reinforcement learning directly to the Qwen2.5-7B base model without any prior SFT. Here are the results.
 |  Domain   |                          Dataset                          |   Ours   | Qwen2.5-7B-Instruct (baseline) |
 |:---------:|:---------------------------------------------------------:|:--------:|:------------------------------:|
-|   Plant   |                           SeedBench                       | **66.8** |              51.5              |
+|   Plant   | [SeedBench](https://github.com/open-sciencelab/SeedBench) | **66.8** |              51.5              |
 |    law    |                           LawBench                        | **55.2** |              54.76             |
 |  Medicine |                            MedQA                          | **87.1** |              80.7              |
 |  General  |                             BBH                           | **55.3** |              49.6              |

From f6c778e7603980fcbd7cd73bac76774c2a0c05a5 Mon Sep 17 00:00:00 2001
From: chenzihong <58508660+ChenZiHong-Gavin@users.noreply.github.com>
Date: Mon, 13 Apr 2026 14:53:19 +0800
Subject: [PATCH 3/4] Apply suggestion from @gemini-code-assist[bot]: capitalize "Law" in RLVR table

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 0140412f..669460b9 100644
--- a/README.md
+++ b/README.md
@@ -104,7 +104,7 @@ We applied reinforcement learning directly to the Qwen2.5-7B base model without
 |  Domain   |                          Dataset                          |   Ours   | Qwen2.5-7B-Instruct (baseline) |
 |:---------:|:---------------------------------------------------------:|:--------:|:------------------------------:|
 |   Plant   | [SeedBench](https://github.com/open-sciencelab/SeedBench) | **66.8** |              51.5              |
-|    law    |                           LawBench                        | **55.2** |              54.76             |
+|    Law    |                           LawBench                        | **55.2** |              54.76             |
 |  Medicine |                            MedQA                          | **87.1** |              80.7              |
 |  General  |                             BBH                           | **55.3** |              49.6              |
 

From e97b6c104d2e20b9f110e1d7f5429a4308070266 Mon Sep 17 00:00:00 2001
From: chenzihong <58508660+ChenZiHong-Gavin@users.noreply.github.com>
Date: Mon, 13 Apr 2026 14:53:40 +0800
Subject: [PATCH 4/4] Apply suggestion from @gemini-code-assist[bot]: link RLVR example directory

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 669460b9..5384fa4c 100644
--- a/README.md
+++ b/README.md
@@ -108,7 +108,7 @@ We applied reinforcement learning directly to the Qwen2.5-7B base model without
 |  Medicine |                            MedQA                          | **87.1** |              80.7              |
 |  General  |                             BBH                           | **55.3** |              49.6              |
 
-More details can be found at `examples/generate/generate_masked_fill_in_blank_qa`
+More details can be found at [`examples/generate/generate_masked_fill_in_blank_qa`](./examples/generate/generate_masked_fill_in_blank_qa).
 
 ## ⚙️ Support List