Stochastic GCG for optimizing prompts on datasets #6

@amanb2000


Let's generalize the easy_gcg code to optimize prompts on a dataset of (x, y) pairs, where each x is the question and y is the answer.

We want to solve u* := argmax_u E[P(y | u + x)], where the expectation is taken over dataset pairs (x, y) ~ D and u + x denotes the prompt u followed by the question x.
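
For concreteness, here is one way the objective could be estimated in practice. This is a sketch, not something stated in the issue: the symbols u* (the maximizer), B (a sampled minibatch of the dataset) and |B| (its size) are notation introduced here, and the minibatch average is presumably what "stochastic" in the title refers to.

```latex
u^\ast
  = \arg\max_u \; \mathbb{E}_{(x, y) \sim D}\big[ P(y \mid u + x) \big]
  \approx \arg\max_u \; \frac{1}{|B|} \sum_{(x_i, y_i) \in B} P(y_i \mid u + x_i)
```

Taking B to be the whole dataset recovers the exact empirical average; smaller batches trade accuracy of the estimate for cheaper steps.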

We can start by simply aggregating gradients for the swaps in GCG over multiple elements of the batch (

def stochastic_easy_gcg_qa_ids(question_ids: list[torch.Tensor],
).
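
A minimal sketch of what aggregating the swap gradients over a batch might look like. Everything here is an assumption for illustration: the toy model (`embed`, `head`, `toy_logits`), the helper names `answer_loss` and `aggregated_swap_gradient`, and the choice to average rather than sum are not taken from the easy_gcg code, and the real `stochastic_easy_gcg_qa_ids` signature is not reproduced.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, DIM = 20, 8

# Toy stand-in for a causal LM (an assumption for illustration only).
embed = torch.nn.Embedding(VOCAB, DIM)
head = torch.nn.Linear(DIM, VOCAB)

def toy_logits(emb: torch.Tensor) -> torch.Tensor:
    """Causal toy model: position t sees the running mean of embeddings 0..t."""
    steps = torch.arange(1, emb.shape[0] + 1, dtype=emb.dtype).unsqueeze(1)
    return head(torch.cumsum(emb, dim=0) / steps)

def answer_loss(prompt_emb, question_ids, answer_ids):
    """Mean negative log-likelihood of the answer given prompt + question."""
    full = torch.cat([prompt_emb, embed(question_ids), embed(answer_ids)], dim=0)
    logits = toy_logits(full)
    start = full.shape[0] - answer_ids.shape[0]
    # Logits at position t predict the token at position t + 1.
    return F.cross_entropy(logits[start - 1 : -1], answer_ids)

def aggregated_swap_gradient(prompt_ids, question_ids, answer_ids):
    """Average d(loss)/d(one-hot prompt tokens) over a batch of (x, y) pairs."""
    total = torch.zeros(prompt_ids.shape[0], VOCAB)
    for q, a in zip(question_ids, answer_ids):
        one_hot = F.one_hot(prompt_ids, VOCAB).float().requires_grad_(True)
        loss = answer_loss(one_hot @ embed.weight, q, a)
        total += torch.autograd.grad(loss, one_hot)[0]
    return total / len(question_ids)

prompt_ids = torch.tensor([1, 2, 3])
question_ids = [torch.tensor([4, 5]), torch.tensor([6, 7, 8])]
answer_ids = [torch.tensor([9]), torch.tensor([10, 11])]

grad = aggregated_swap_gradient(prompt_ids, question_ids, answer_ids)
# The most negative entries mark swaps expected to lower the loss the most.
top_k_swaps = (-grad).topk(k=4, dim=1).indices  # [prompt_len, k]
```

Looping over examples keeps the sketch simple when questions and answers have different lengths; padding and a single batched backward pass would be the natural next step.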

All that remains is to create an efficient batch_compute_score_dataset() function to compute the scores of each potential new prompt w.r.t. the dataset (

alt_scores = batch_compute_score_dataset(alt_prompt_ids,
)
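
One possible shape for such a scoring function, as a rough sketch. The name `score_prompts_on_dataset`, its arguments, and the toy model are hypothetical and do not reflect the actual `batch_compute_score_dataset()` signature. The idea shown is to batch over candidate prompts (which all share one length) and loop over dataset pairs (which may not), returning the average of P(y | u + x) per candidate.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, DIM = 20, 8

# Toy stand-in for a causal LM (illustration only, not the repo's model).
embed = torch.nn.Embedding(VOCAB, DIM)
head = torch.nn.Linear(DIM, VOCAB)

def toy_logits(ids: torch.Tensor) -> torch.Tensor:
    """[batch, seq] token ids -> [batch, seq, VOCAB] logits (causal running mean)."""
    emb = embed(ids)
    steps = torch.arange(1, ids.shape[1] + 1, dtype=emb.dtype).view(1, -1, 1)
    return head(torch.cumsum(emb, dim=1) / steps)

@torch.no_grad()
def score_prompts_on_dataset(alt_prompt_ids, question_ids, answer_ids):
    """Return [num_candidates] averages of P(y | u + x) over the dataset."""
    num_candidates = alt_prompt_ids.shape[0]
    scores = torch.zeros(num_candidates)
    for q, a in zip(question_ids, answer_ids):
        # Share one (x, y) pair across all candidates: [C, P + Q + A].
        tail = torch.cat([q, a]).expand(num_candidates, -1)
        ids = torch.cat([alt_prompt_ids, tail], dim=1)
        log_probs = F.log_softmax(toy_logits(ids), dim=-1)
        start = ids.shape[1] - a.shape[0]
        # Logits at position t predict the token at position t + 1.
        pred = log_probs[:, start - 1 : -1, :]
        target = a.expand(num_candidates, -1).unsqueeze(-1)
        answer_log_prob = pred.gather(-1, target).squeeze(-1).sum(dim=1)
        scores += answer_log_prob.exp()
    return scores / len(question_ids)

alt_prompt_ids = torch.tensor([[1, 2, 3], [1, 2, 4], [5, 2, 3]])
question_ids = [torch.tensor([4, 5]), torch.tensor([6, 7, 8])]
answer_ids = [torch.tensor([9]), torch.tensor([10, 11])]

alt_scores = score_prompts_on_dataset(alt_prompt_ids, question_ids, answer_ids)
best_prompt = alt_prompt_ids[alt_scores.argmax()]
```

Averaging log-probabilities instead of probabilities is a common, numerically safer alternative, but it optimizes a different objective from the expectation written above.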
