Hi, I am working on adapting the method (prompt gating) from BART (an encoder-decoder transformer) to GPT-2 (a decoder-only transformer).
Following the paper, I used prefix-tuning to train my condition vectors (e.g., sentiment, topic, ...) and added prompt gates as norms, just as you did for BART in the paper.
However, the results were a little disappointing: when I concatenate the condition vectors to do MCTG, the output is not much better than without the norms.
I am wondering whether this idea generalizes smoothly to GPT-style models. Do you have any experience with this, or any suggestions?
I would really appreciate a response from you :D