Partial VLM Quantization Shows Minimal LIBERO Impact #897

@fenghao999

Hi all,

We wanted to share an initial quantization result on the libero-pi0-fine-tuned model and get feedback from the community.

We used ModelOpt to run post-training quantization (PTQ) on the "lerobot/libero-pi0-fine-tuned" model in two low-precision formats, FP8 and NVFP4. We quantized only the MLP projection weights in the PaliGemma VLM backbone, leaving the rest of the model unchanged, and evaluated both formats across four LIBERO suites: libero_object, libero_spatial, libero_goal, and libero_10.
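
For reference, the weight-only PTQ setup looks roughly like the config fragment below. This is a sketch rather than our exact script: `mtq.quantize` and the default format configs are real ModelOpt entry points, but the wildcard patterns used to restrict quantization to the PaliGemma MLP projections are assumptions that depend on the module names in the policy implementation.

```python
# Config sketch for partial weight-only PTQ with NVIDIA ModelOpt.
# The disable patterns below are illustrative; check your model's module
# names (e.g. via model.named_modules()) before relying on them.
import modelopt.torch.quantization as mtq

config = mtq.FP8_DEFAULT_CFG                 # or mtq.NVFP4_DEFAULT_CFG
quant_cfg = config["quant_cfg"]
quant_cfg["*input_quantizer"] = {"enable": False}   # weight-only PTQ

# Hypothetical patterns: disable weight quantizers outside the
# PaliGemma MLP projections so only those layers are quantized.
quant_cfg["*self_attn*"] = {"enable": False}
quant_cfg["*action_expert*"] = {"enable": False}

# forward_loop should push a handful of calibration batches through the
# model so ModelOpt can collect amax statistics:
# model = mtq.quantize(model, config, forward_loop=calib_loop)
```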

One important clarification is that this experiment does not include kernel-level optimization. This is an inference evaluation only: the quantized FP8/NVFP4 weights are dequantized back to BF16 at runtime, rather than being executed with native low-precision kernels.
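
To make the "dequantized back to BF16" point concrete, here is a minimal, self-contained sketch of fake quantization. It is a hypothetical illustration, not our actual code: it rounds to a symmetric 8-bit integer grid for simplicity, whereas FP8/NVFP4 round to floating-point grids, but the quantize-dequantize structure is the same.

```python
# Fake quantization: round weights to a low-precision grid, then
# immediately dequantize, so inference still runs in full precision
# with slightly perturbed weights.

def fake_quantize(weights, num_bits=8):
    """Quantize a list of floats to a symmetric integer grid, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    amax = max(abs(w) for w in weights)     # per-tensor absolute max
    if amax == 0.0:
        return list(weights)
    scale = amax / qmax
    # Round to the integer grid, clamp, then map back to floats.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]

w = [0.5, -1.27, 0.003, 1.27]
wq = fake_quantize(w)
# Worst-case per-weight error is bounded by scale / 2.
err = max(abs(a - b) for a, b in zip(w, wq))
```

The policy then runs entirely in BF16 on these perturbed weights, so any accuracy change comes from the rounding itself, not from low-precision execution.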

The main observation is that, in this setup, we do not see any significant deterioration in task success rate after quantization. These results suggest that quantizing only the PaliGemma MLP projection weights may preserve policy performance reasonably well, at least for these LIBERO suites. There is a small drop on libero_10 (the long-horizon suite) with NVFP4, but across the other suites the change is negligible. The comparison figure below was evaluated over 2,000 episodes per suite.

Question / discussion points:

  • Is this behavior expected given the current open-pi architecture?
  • Has anyone benchmarked similar partial quantization settings in open-pi?
  • If useful, I can also share more details about the quantization setup and evaluation procedure.
[Figure: task success rates for BF16 vs. FP8 vs. NVFP4 across the four LIBERO suites]
