
It seems that MoA does not work on MATH and QA with either weak or strong LLMs #41

Description

@yanan1116

I have thoroughly tested MoA (with one layer) on several objective benchmarks (less subjective than MT-Bench), such as GSM8K and HotpotQA.
It seems that when the LLMs are at the 7B level, MoA no longer helps.
In my setting, the three LLMs in layer one are mistralai/Mistral-7B-Instruct-v0.1, v0.2 and v0.3, while the aggregator is meta-llama/Meta-Llama-3.1-8B-Instruct.
(Before the experiment, I tested each model's ability to solve the problems on its own; the strongest one is Llama-3.1-8B.)
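To make the setup concrete, here is a minimal sketch of a one-layer MoA pipeline as I understand it: each proposer answers the question, then the aggregator receives all proposer answers as references. The function names, the prompt wording and the `rounds` semantics are assumptions for illustration, not the official MoA code; the models are plain callables so the sketch runs without any API.

```python
from typing import Callable, List

# A "model" is any function mapping a prompt to a completion.
# In a real run these would wrap the Mistral-7B / Llama-3.1-8B endpoints (not shown).
Model = Callable[[str], str]


def build_aggregate_prompt(question: str, references: List[str]) -> str:
    """Hypothetical aggregation prompt: list the reference answers, then restate the question."""
    lines = [
        "You are given responses from several models to the question below.",
        "Synthesize them into a single, accurate answer.",
        "",
    ]
    for i, ref in enumerate(references, start=1):
        lines.append(f"Response {i}: {ref}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


def moa_answer(question: str, proposers: List[Model], aggregator: Model, rounds: int = 1) -> str:
    """rounds=0: the aggregator answers alone.
    rounds>=1: each round, every proposer answers (seeing the previous round's
    responses, if any); the aggregator then sees the last round's responses."""
    references: List[str] = []
    for _ in range(rounds):
        prompt = build_aggregate_prompt(question, references) if references else question
        references = [propose(prompt) for propose in proposers]
    if not references:
        return aggregator(question)
    return aggregator(build_aggregate_prompt(question, references))
```

With three Mistral proposers and a Llama aggregator, `rounds=0` corresponds to the Llama-only baseline and `rounds=1` to the one-layer MoA described here.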

Then, when applying MoA, I find that performance decreases. For example, on GSM8K the accuracy drops from 75.1 to 61.3 (13.8 points): Llama-3.1 alone achieves 75.1 (rounds=0), while 61.3 comes from rounds=1, where the intermediate layer consists of Mistral-7B v0.1/v0.2/v0.3.
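For reference, a GSM8K accuracy number like the ones above is typically computed by exact numeric match against the gold answer, which in GSM8K follows a `####` marker at the end of each reference solution. The sketch below assumes a common heuristic (take the last number in the model output as its prediction); the actual extraction used for the 75.1 / 61.3 numbers may differ, and the helper names are hypothetical.

```python
import re
from typing import List, Optional

# Matches integers and decimals, allowing thousands separators like "1,000".
_NUM = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_gold(solution: str) -> str:
    # GSM8K reference solutions end with a line of the form "#### 72".
    return solution.split("####")[-1].strip().replace(",", "")


def extract_pred(completion: str) -> Optional[str]:
    # Assumed heuristic: the last number in the completion is the prediction.
    matches = _NUM.findall(completion)
    return matches[-1].replace(",", "") if matches else None


def is_correct(completion: str, solution: str) -> bool:
    pred = extract_pred(completion)
    if pred is None:
        return False
    try:
        return abs(float(pred) - float(extract_gold(solution))) < 1e-6
    except ValueError:
        return False


def accuracy(completions: List[str], solutions: List[str]) -> float:
    """Percentage of completions whose final number matches the gold answer."""
    assert len(completions) == len(solutions)
    correct = sum(is_correct(c, s) for c, s in zip(completions, solutions))
    return 100.0 * correct / len(solutions)
```

Running `accuracy` on the rounds=0 outputs and on the rounds=1 outputs over the same test split gives the two numbers being compared. Since the extraction heuristic can interact with the aggregator's output format, it is worth checking that the drop is not partly an answer-parsing artifact.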

The same finding holds for HotpotQA.

Has anyone else observed something similar? Any suggestions on how to make MoA work with 7B-level LLMs?
