
✨ Adapt RL training to the translation task #1

Open
Manuel-2011 wants to merge 9 commits into main from feature/adapt_training_to_task

Conversation

@Manuel-2011
Owner

No description provided.

@Manuel-2011 Manuel-2011 self-assigned this May 8, 2025
Comment thread datasets/dev.es.txt
while line != '' and len(all_matches) < max_matches:
data = line.strip().split(',')
if re.search(rf'\b{spanish_word}\b', data[0], re.IGNORECASE):
if re.search(rf'\b{re.escape(spanish_word)}\b', data[0], re.IGNORECASE):

This way, whatever is in spanish_word is not interpreted as a regex.
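A minimal sketch of the difference (the word `s.l` and the CSV line are made-up examples; without `re.escape`, a metacharacter in the search word can cause false positives, or even a `re.error` for characters like `(`):

```python
import re

# Hypothetical CSV line and search word; 's.l' contains a regex
# metacharacter ('.'), so without re.escape the dot matches any
# character and "sal" becomes a false positive.
word = "s.l"
line = "sal de mesa,table salt"
data = line.strip().split(',')

unsafe = re.search(rf'\b{word}\b', data[0], re.IGNORECASE)           # matches "sal"
safe = re.search(rf'\b{re.escape(word)}\b', data[0], re.IGNORECASE)  # no match

print(unsafe is not None, safe is not None)  # True False
```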

inputs[j] += list(output.outputs[0].token_ids)
responses[j] += tokenizer.eos_token
inputs[j] += [tokenizer.eos_token_id]
mask[j] += [1]

When the generation token limit is exceeded, the tokens are not concatenated, so as not to spend compute on extra tokens that won't lead to any reward anyway.
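The guard can be sketched as follows (`MAX_NEW_TOKENS` and `maybe_append` are hypothetical names; the PR's actual check lives inside the generation loop shown above):

```python
MAX_NEW_TOKENS = 8  # assumed generation budget for illustration

def maybe_append(inputs_row, mask_row, new_ids, eos_id, budget=MAX_NEW_TOKENS):
    """Append the sampled token ids plus EOS only while the budget
    allows it; tokens past the limit are dropped, since they cannot
    earn any reward and would only cost compute."""
    if len(new_ids) <= budget:
        inputs_row = inputs_row + new_ids + [eos_id]
        mask_row = mask_row + [1] * (len(new_ids) + 1)
    return inputs_row, mask_row
```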


def extract_answer(response, transform_fn = lambda x: x, nan_val = None)->str|None:
ans = re.match('.*?<answer>(.*?)</answer>', response, re.DOTALL|re.MULTILINE)
ans = re.match('.*?<answer>(.*?)</answer>\s*$', response, re.DOTALL|re.MULTILINE)

To earn the reward, the response must end with the answer tags.

logger.debug(f'Rewards: {rewards[torch.arange(len(samples)), eos_index]}')

return rewards


Work by Meli @mvrobles!

rewards = get_rewards_translation(inputs, is_terminal, wayuu_text)
return inputs, rewards, is_terminal, complete_prompts, prompt_length, mask



Adaptation to generate an episode: generate x simulations from a Spanish translation and evaluate them with BLEU.
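The scoring step can be sketched with a toy BLEU (a real run would use sacrebleu or an equivalent smoothed implementation; the sentences below are invented placeholders):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def simple_bleu(hypothesis, reference, max_n=4):
    """Toy sentence BLEU: clipped n-gram precisions with uniform
    weights and a brevity penalty. No smoothing, so any missing
    n-gram order zeroes the score."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        precisions.append(overlap / max(sum(hyp_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    brevity = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return brevity * geo_mean

# Score several sampled "simulations" against one reference translation.
reference = "a b c d"
sims = ["a b c d", "a x c d"]
rewards = [simple_bleu(s, reference) for s in sims]
```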

wayuu = self.wayuu_lines[idx]

return spa, wayuu
# %% [markdown]

Reading the translation dataset. This isn't the most efficient way, because the whole dataset is kept in RAM.
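One lower-memory alternative, as a sketch (the class and its interface are hypothetical, mirroring the `__getitem__` above): index byte offsets once, then seek per item instead of holding every line in RAM.

```python
class LazyParallelDataset:
    """Index the byte offset of each line on construction, then read a
    single (spanish, wayuu) pair on demand instead of keeping both
    files fully in memory."""

    def __init__(self, spa_path, wayuu_path):
        self.spa_path, self.wayuu_path = spa_path, wayuu_path
        self.spa_offsets = self._index(spa_path)
        self.wayuu_offsets = self._index(wayuu_path)

    @staticmethod
    def _index(path):
        offsets, pos = [], 0
        with open(path, 'rb') as f:
            line = f.readline()
            while line:
                offsets.append(pos)
                pos = f.tell()
                line = f.readline()
        return offsets

    @staticmethod
    def _read_line(path, offset):
        with open(path, 'rb') as f:
            f.seek(offset)
            return f.readline().decode('utf-8').rstrip('\n')

    def __len__(self):
        return len(self.spa_offsets)

    def __getitem__(self, idx):
        spa = self._read_line(self.spa_path, self.spa_offsets[idx])
        wayuu = self._read_line(self.wayuu_path, self.wayuu_offsets[idx])
        return spa, wayuu
```

Reopening the file per item trades speed for memory; keeping a cached open handle (or mmap) would be the next step if this is too slow.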

no_kl=True
enabled_tools = ['calculator', 'spa_to_wayu']
logger.info(f'Hyperparameters:\nupdate_epochs:{update_epochs}\nrl_steps:{rl_steps}\nsims_per_prompt:{sims_per_prompt}\nminibatch_size:{minibatch_size}\npolicy_lr:{policy_lr}\nwarmup_steps:{warmup_steps}\ngae_lambda: {gae_lambda}\nnormalize advantage:{normalize_advantage}\nlower_clip:{lower_clip}\nupper_clip:{upper_clip}\nkl_penalty_coef:{kl_penalty_coef}\ntemperature:{temperature}\ndr_grpo:{dr_grpo}\nno_kl={no_kl}\nuse_deepspeed={use_deepspeed}\nuse_vllm={use_vllm}\nenabled_tools={enabled_tools}\nbase_model_name={base_model_name}')
max_new_tokens=512

Increased the number of tokens that can be generated.

Comment thread grpo_trainer_with_tools.py Outdated
scheduling_policy="fcfs",
dtype=torch.bfloat16,
max_model_len=2048,
max_model_len=768,

I'm not sure, but this might reduce memory usage; that many tokens aren't needed anyway.
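A back-of-envelope check of what the smaller `max_model_len` saves in KV cache per sequence (the layer/head numbers below are hypothetical, not the actual model's):

```python
# Per-sequence KV-cache size: keys and values for every layer, each
# holding max_model_len tokens of num_kv_heads * head_dim values.
num_layers, num_kv_heads, head_dim = 24, 8, 64  # assumed model dims
bytes_per_value = 2                             # bfloat16

def kv_cache_bytes(max_model_len):
    # factor of 2 for keys and values
    return 2 * num_layers * num_kv_heads * head_dim * max_model_len * bytes_per_value

saving_mib = (kv_cache_bytes(2048) - kv_cache_bytes(768)) / 2**20
print(f"{saving_mib:.0f} MiB per sequence")  # 60 MiB with these dims
```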

acc = eval_multiplication(model_engine, tokenizer, epochs=40, batch_size=64)
else:
with model_engine.disable_adapter():
acc = eval_multiplication(model_engine, tokenizer, epochs=40, batch_size=64)

Still missing: an evaluation function to get an estimate of performance as we train.
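A hypothetical shape for that function (`generate_fn` and the `dataset[i] -> (spa, wayuu)` interface are assumed; a real version would decode with the model and score with BLEU rather than this crude unigram overlap, which only keeps the sketch self-contained):

```python
def unigram_overlap(hypothesis, reference):
    """Crude stand-in for BLEU: fraction of reference words present
    in the hypothesis."""
    hyp, ref = set(hypothesis.split()), set(reference.split())
    return len(hyp & ref) / max(len(ref), 1)

def eval_translation(generate_fn, dataset, num_samples=32):
    """Translate a slice of the dev set and report the mean score,
    mirroring the role eval_multiplication plays above."""
    scores = []
    for i in range(min(num_samples, len(dataset))):
        spa, wayuu = dataset[i]
        scores.append(unigram_overlap(generate_fn(spa), wayuu))
    return sum(scores) / max(len(scores), 1)
```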

