From 374dfce612058e6791953443ac4b62d07ee66a46 Mon Sep 17 00:00:00 2001
From: My Chiffon Nguyen
Date: Thu, 5 Mar 2026 17:53:30 -0800
Subject: [PATCH 1/9] fix: table x-overflow

---
 _sass/_base.scss | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/_sass/_base.scss b/_sass/_base.scss
index 10d3440..c61d528 100644
--- a/_sass/_base.scss
+++ b/_sass/_base.scss
@@ -41,6 +41,7 @@
 
 html {
   scroll-behavior: smooth;
+  overflow-x: hidden;
 }
 
 body {
@@ -249,6 +250,8 @@ table {
   border-collapse: collapse;
   font-size: 0.9rem;
   line-height: 1.5;
+  table-layout: fixed;
+  word-wrap: break-word;
 
   thead {
     border-bottom: 2px solid var(--border);
From 11289d6feee01a4075bdd29485cc728fd4adae7f Mon Sep 17 00:00:00 2001
From: My Chiffon Nguyen
Date: Thu, 5 Mar 2026 17:54:00 -0800
Subject: [PATCH 2/9] fix: resources link in navbar

---
 _data/navbar.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_data/navbar.yml b/_data/navbar.yml
index e17acea..fa0f429 100644
--- a/_data/navbar.yml
+++ b/_data/navbar.yml
@@ -9,7 +9,7 @@
 - name: "Publications"
   href: "/publications"
 - name: "Resources"
-  href: "/resources/"
+  href: "/resources"
 - name: "Blog"
   href: "/posts/"
From 84c438cea516cb7088905800c052eee54339903b Mon Sep 17 00:00:00 2001
From: My Chiffon Nguyen
Date: Thu, 5 Mar 2026 17:55:17 -0800
Subject: [PATCH 3/9] content: close sea-vl phase 2 and adjust apprenticeship

---
 _pages/apprenticeship.md        | 10 +++++-----
 _projects/2025-seavl-phase-2.md | 11 ++---------
 2 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/_pages/apprenticeship.md b/_pages/apprenticeship.md
index b00132f..4d71aea 100644
--- a/_pages/apprenticeship.md
+++ b/_pages/apprenticeship.md
@@ -27,7 +27,7 @@ faq:
       Examples include work on SEA languages, regional datasets, or SEA-specific social or cultural AI challenges.
   - title: "Am I qualified if I speak a language spoken in Southeast Asia, like Thai/Hokkien Chinese/etc?"
     content: |
-      This alone does not qualify you for the program.
+      This alone does NOT qualify you for the program.
       However, speaking a language from Southeast Asia can help, especially if it informs your research interests.
       We encourage you to highlight any relevant language skills and how they connect to your research goals in your application.
   - title: "Can I apply as a group/team?"
@@ -125,7 +125,7 @@ We offer five cutting-edge research projects:
 
 There are no formal eligibility or age limits. We’re a growth-first programme and value potential, motivation, and effort more than credentials.
 
-We welcome **anyone** who meets at least one of the following:
+We welcome anyone who can **commit 10+ hours/week** to the program and meets **at least one** of the following:
 
 - Your affiliation (e.g., school, organization, company) is from Southeast Asia (SEA).
 - You were born and raised in SEA (living there more than 10 years).
@@ -136,14 +136,14 @@ and share [our vision](/about#our-vision).
 ### What Increases Your Chances
 
 - Bachelor’s degree with a publication or Master’s degree.
-- Clear AI research goals (pre-PhD programs, early-year PhD, or prior collaboration with mentors).
-- Fit with project topics, capability, motivation, and mentors (assessed via application + interview).
+- Clear AI research goals (pre-doctoral programs, early-year PhD, or prior collaboration with mentors).
+- Fit with project topic, capability, motivation, mentors (assessed via application + interview).
 
 ## What You'll Gain
 
 - **Certificate of achievement** upon completion
 - **Letter of recommendation** for PhD/job applications (for strong contributors)
-- **Potential publication** at top ML/AI/NLP venues (first authorship reserved for project leads)
+- **Potential publication** at top ML/AI/NLP venues. The order of authorship is determined by your actual contribution, and usually first authorship is reserved for project lead.
 - **Mentorship** from experienced AI researchers
 - **Peer network** with similar research interests
 - **Hands-on experience** in collaborative AI research
diff --git a/_projects/2025-seavl-phase-2.md b/_projects/2025-seavl-phase-2.md
index 3ffac8d..d970787 100644
--- a/_projects/2025-seavl-phase-2.md
+++ b/_projects/2025-seavl-phase-2.md
@@ -3,7 +3,8 @@ title: SEA-VL Phase 2
 thumbnail: seavl-phase2-banner.png
 description: Develop and benchmark an open-source state-of-the-art vision language model (VLM) for Southeast Asia
 fromDate: 2025-03-01
-status: ongoing
+toDate: 2026-01-31
+status: completed
 keywords:
   [
     "SEA-VL",
@@ -18,10 +19,6 @@ keywords:
   ]
 ---
 
-
-
 After the success of [SEA-VL Phase 1](/projects/2025-seavl-phase-1), we are proud to launch **SEA-VL Phase 2**!
 We believe it's high time to create a model that truly understands Southeast Asian culture and language.
 We want the model to reflect the visual and linguistic richness of the SEA region through diverse contributions: high-quality data curation, annotation, prompting, model training, and evaluation.
@@ -120,10 +117,6 @@ Go to this [form](https://docs.google.com/forms/d/e/1FAIpQLSdbcxgN_sOqDResVkcj3s7IkYTWI8WDV6dZw0mZOIkH7puNSg/viewform?usp=dialog) to contribute.
-### Task 6: Submit High-Quality Text Prompts
-
-_(To be opened at a later date)_
-
 ## Contribution Point System for Tasks
 
 Each task has its corresponding point-per-submission, calibrated for task difficulty and its
From 200f54751d4a6146f2b1a17d516514f9e9dd0519 Mon Sep 17 00:00:00 2001
From: My Chiffon Nguyen
Date: Thu, 5 Mar 2026 17:55:49 -0800
Subject: [PATCH 4/9] feat: add mentors and mentees to apprenticeship projects

---
 .../2026/coral-contextual-relevance.md       |  7 ++
 .../knowledge-distillation-vision-text.md    |  9 +++
 .../multilingual-agentic-underrepresented.md |  9 +++
 .../2026/reasoning-agentic-router.md         |  7 ++
 .../2026/selective-memory-layer.md           | 10 +++
 _includes/project-cards.html                 | 69 ++++++++++++++-----
 _sass/_card.scss                             | 34 +++++++--
 _sass/_projects.scss                         | 23 ++++++-
 8 files changed, 144 insertions(+), 24 deletions(-)

diff --git a/_apprentice_projects/2026/coral-contextual-relevance.md b/_apprentice_projects/2026/coral-contextual-relevance.md
index 47a8a66..13a7a5f 100644
--- a/_apprentice_projects/2026/coral-contextual-relevance.md
+++ b/_apprentice_projects/2026/coral-contextual-relevance.md
@@ -3,6 +3,13 @@ batch: 2026
 order: 2
 title: "CoRaL: Contextual Relevance and Linguistic Enrichment"
 summary: "A multi-dimensional data curation framework to balance quality, relevance, and cultural coverage in low-resource corpora."
+mentors:
+  - name: "Fajri Koto"
+  - name: "M Dehan Al-Kautsar"
+mentees:
+  - name: "Thanh-Nhi Nguyen"
+  - name: "Feliks Victor Parningotan Samosir"
+  - name: "Michael Christlambert Sinanta"
 ---
 
 Low-resource language corpora often suffer from noise, domain imbalance, and linguistic mixing, making naive filtering harmful to both quantity and cultural representation.
diff --git a/_apprentice_projects/2026/knowledge-distillation-vision-text.md b/_apprentice_projects/2026/knowledge-distillation-vision-text.md
index 4cbd6f4..552acd9 100644
--- a/_apprentice_projects/2026/knowledge-distillation-vision-text.md
+++ b/_apprentice_projects/2026/knowledge-distillation-vision-text.md
@@ -3,6 +3,15 @@ batch: 2026
 order: 5
 title: "Knowledge Distillation in Multilingual Vision-Text Model"
 summary: "Distill compact multilingual vision-text embeddings from large multimodal teachers for real-world deployment."
+mentors:
+  - name: "Peerat Limkonchotiwat"
+  - name: "Ekapol Chuangsuwanich"
+  - name: "Pume Tuchinda"
+mentees:
+  - name: "Ashvanth S"
+  - name: "Faiz Assabil Firdaus"
+  - name: "Ilma Aliya Fiddien"
+  - name: "Puja Ahmad Habibi"
 ---
 
 We propose a training framework to distill a small vision-text embedding model from a large multimodal teacher. Existing KD approaches often assume a base-sized teacher and focus on monolingual settings, leaving large teachers and multilingual scenarios underexplored.
diff --git a/_apprentice_projects/2026/multilingual-agentic-underrepresented.md b/_apprentice_projects/2026/multilingual-agentic-underrepresented.md
index 713511c..b9e8cec 100644
--- a/_apprentice_projects/2026/multilingual-agentic-underrepresented.md
+++ b/_apprentice_projects/2026/multilingual-agentic-underrepresented.md
@@ -3,6 +3,15 @@ batch: 2026
 order: 1
 title: "Multilingual Agentic for Underrepresented Regions"
 summary: "Build an environment and evaluation benchmark for agentic LLMs in low-resource languages and underrepresented regions."
+mentors:
+  - name: "Samuel Cahyawijaya"
+  - name: "Patomporn Payoungkhamdee"
+mentees:
+  - name: "Aulia Adila"
+  - name: "Kittiphat Leesombatwathana"
+  - name: "My (Chiffon) Nguyen"
+  - name: "Saksorn Ruangtanusak"
+  - name: "Vissuta Gunawan Lim"
 ---
 
 In this work, we address the gap in enabling LLMs with agentic capabilities for low-resource languages and underrepresented regions. Most existing environments and evaluation benchmarks (e.g., Taubench) are Anglocentric, leaving a critical void in assessing performance across diverse linguistic contexts.
diff --git a/_apprentice_projects/2026/reasoning-agentic-router.md b/_apprentice_projects/2026/reasoning-agentic-router.md
index 6775b45..53901ed 100644
--- a/_apprentice_projects/2026/reasoning-agentic-router.md
+++ b/_apprentice_projects/2026/reasoning-agentic-router.md
@@ -3,6 +3,13 @@ batch: 2026
 order: 3
 title: "Reasoning Agentic LLM Router"
 summary: "Develop skill-based routing to reduce inference costs while preserving strong generalization."
+mentors:
+  - name: "Genta Indra Winata"
+  - name: "David Anugraha"
+mentees:
+  - name: "Napol Rachatasumrit"
+  - name: "Quyen Le Hoang Tran"
+  - name: "Jaycent Gunawan Ongris"
 ---
 
 Learning to route effectively is crucial for improving the efficiency of LLM inference by leveraging model capabilities. Prior work explores routing strategies, but does not thoroughly examine fine-grained, skill-based routing that can substantially reduce costs while preserving strong generalization.
diff --git a/_apprentice_projects/2026/selective-memory-layer.md b/_apprentice_projects/2026/selective-memory-layer.md
index 076e2f0..5867c73 100644
--- a/_apprentice_projects/2026/selective-memory-layer.md
+++ b/_apprentice_projects/2026/selective-memory-layer.md
@@ -3,6 +3,16 @@ batch: 2026
 order: 4
 title: "Selective Memory Layer Finetuning"
 summary: "Explore memory-layer finetuning strategies to improve continual learning without catastrophic forgetting."
+mentors:
+  - name: "Alham Fikri Aji"
+  - name: "Farid Adilazuarda"
+  - name: "Muhammad Reza Qorib"
+mentees:
+  - name: "Faeyza Rishad Ardi"
+  - name: "Izaaz Inhar"
+  - name: "Phudish Prateepamornkul"
+  - name: "Quang Minh Nguyen"
+  - name: "Tri Vo"
 ---
 
 We tackle continual learning from an architectural perspective. Instead of LoRA, whose parameters grow with the number of tasks or languages, we explore memory layers where the model can store or learn context by injecting key-value information during inference.
diff --git a/_includes/project-cards.html b/_includes/project-cards.html
index 2e90f94..8f78243 100644
--- a/_includes/project-cards.html
+++ b/_includes/project-cards.html
@@ -1,8 +1,7 @@
 {% comment %}
-  Usage:
-  {% include project-cards.html projects=projects batch_key=batch_key %}
+  Usage: {% include project-cards.html projects=projects
+  batch_key=batch_key %}
 {% endcomment %}
-
 {% assign projects = include.projects | default: site.apprentice_projects %}
 {% if projects and projects.size > 0 %}
@@ -15,9 +14,8 @@
     %}
     {% assign mentor_names = project.mentors | map: 'name' | join: ', ' %}
     {% assign mentee_names = project.mentees | map: 'name' | join: ', ' %}
-    {% capture project_preview %}
-{{ project.summary | default: project.content | markdownify }}
-    {% endcapture %}
+    {% capture project_preview %} {{ project.summary | default: project.content | markdownify
+    }} {% endcapture %}
     {% assign summary_text = project_preview
       | strip_html
       | strip_newlines
@@ -32,11 +30,26 @@
     >

{{ project.title }}

- {% if mentor_names != '' %} -

Mentors: {{ mentor_names }}

- {% endif %}

{{ summary_text }}

+ {% if mentor_names != '' or mentee_names != '' %} +
+ {% if mentor_names != '' %} +

+ Mentors + {{ mentor_names }} +

+ {% endif %} + {% if project.mentees and project.mentees.size > 0 %} +

+ Mentees ({{ project.mentees.size }}) + {{ mentee_names }} +

+ {% endif %} +
+ {% endif %}