From 421776514eeb66169402d2c6f06834393930ae8e Mon Sep 17 00:00:00 2001
From: beanbun
Date: Mon, 13 Apr 2026 12:53:53 +0800
Subject: [PATCH 1/4] update README

---
 README.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/README.md b/README.md
index 3989f5c9..b730083f 100644
--- a/README.md
+++ b/README.md
@@ -99,7 +99,16 @@ Here is post-training result which **over 50% SFT data** comes from GraphGen and
 | Math | AIME24 | **20.6** | 16.7 |
 | | AIME25 | **22.7** | 7.2 |
 
+### RLVR
+We applied reinforcement learning directly to the Qwen2.5-7B base model without any prior SFT. Here are the results.
+| Domain | Dataset | Ours | Qwen2.5-7B-Instruct (baseline) |
+|:---------:|:---------------------------------------------------------:|:--------:|:------------------------------:|
+| Plant | SeedBench | **66.8** | 51.5 |
+| law | LawBench | **55.2** | 54.76 |
+| Medicine | MedQA | **87.1** | 80.7 |
+| General | BBH | **55.3** | 49.6 |
 
+More details can be found at `examples/generate/generate_masked_fill_in_blank_qa`
 
 ## ⚙️ Support List
 

From e97a243ade6b1f0ac954ed97174d5ea1ae75c95a Mon Sep 17 00:00:00 2001
From: chenzihong <58508660+ChenZiHong-Gavin@users.noreply.github.com>
Date: Mon, 13 Apr 2026 14:52:35 +0800
Subject: [PATCH 2/4] Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b730083f..0140412f 100644
--- a/README.md
+++ b/README.md
@@ -103,7 +103,7 @@ Here is post-training result which **over 50% SFT data** comes from GraphGen and
 We applied reinforcement learning directly to the Qwen2.5-7B base model without any prior SFT. Here are the results.
 | Domain | Dataset | Ours | Qwen2.5-7B-Instruct (baseline) |
 |:---------:|:---------------------------------------------------------:|:--------:|:------------------------------:|
-| Plant | SeedBench | **66.8** | 51.5 |
+| Plant | [SeedBench](https://github.com/open-sciencelab/SeedBench) | **66.8** | 51.5 |
 | law | LawBench | **55.2** | 54.76 |
 | Medicine | MedQA | **87.1** | 80.7 |
 | General | BBH | **55.3** | 49.6 |

From f6c778e7603980fcbd7cd73bac76774c2a0c05a5 Mon Sep 17 00:00:00 2001
From: chenzihong <58508660+ChenZiHong-Gavin@users.noreply.github.com>
Date: Mon, 13 Apr 2026 14:53:19 +0800
Subject: [PATCH 3/4] Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 0140412f..669460b9 100644
--- a/README.md
+++ b/README.md
@@ -104,7 +104,7 @@ We applied reinforcement learning directly to the Qwen2.5-7B base model without
 | Domain | Dataset | Ours | Qwen2.5-7B-Instruct (baseline) |
 |:---------:|:---------------------------------------------------------:|:--------:|:------------------------------:|
 | Plant | [SeedBench](https://github.com/open-sciencelab/SeedBench) | **66.8** | 51.5 |
-| law | LawBench | **55.2** | 54.76 |
+| Law | LawBench | **55.2** | 54.76 |
 | Medicine | MedQA | **87.1** | 80.7 |
 | General | BBH | **55.3** | 49.6 |
 

From e97b6c104d2e20b9f110e1d7f5429a4308070266 Mon Sep 17 00:00:00 2001
From: chenzihong <58508660+ChenZiHong-Gavin@users.noreply.github.com>
Date: Mon, 13 Apr 2026 14:53:40 +0800
Subject: [PATCH 4/4] Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 669460b9..5384fa4c 100644
--- a/README.md
+++ b/README.md
@@ -108,7 +108,7 @@ We applied reinforcement learning directly to the Qwen2.5-7B base model without
 | Medicine | MedQA | **87.1** | 80.7 |
 | General | BBH | **55.3** | 49.6 |
 
-More details can be found at `examples/generate/generate_masked_fill_in_blank_qa`
+More details can be found at [`examples/generate/generate_masked_fill_in_blank_qa`](./examples/generate/generate_masked_fill_in_blank_qa).
 
 ## ⚙️ Support List
 