feat(integration-whatsapp): core ingest for the WhatsApp collector #273

Draft
Clintonrocha98 wants to merge 7 commits into 4.x from feat/integration-whatsapp-ingest

Clintonrocha98 (Member) commented May 21, 2026

📊 Visual summary (practical): open artifact
Self-contained HTML explaining the flow, the data model, the decisions, and the tests in this PR.


What it is

First slice of the integration-whatsapp module: the ingest layer (Laravel side) of the metrics collector for He4rt's WhatsApp groups. An HMAC-authenticated webhook receives the raw events emitted by the external collector (wpp-tui, a separate Node/Baileys repo) and persists them in a data lake (3 tables) for the data team to explore later.

This phase is about mapping: capture everything raw and decide what to measure afterwards. No metric aggregation, no dashboards; those come in future PRs.

Architecture

WhatsApp servers
   ↓ WebSocket (Baileys)
[ wpp-tui: separate Node repo ]  ← runtime, session, sends ALL events (no filtering)
   ↓ POST /api/integrations/whatsapp/events   (X-Signature: HMAC-SHA256, X-Event-Id: uuid)
[ integration-whatsapp: THIS module ]
   ├─ VerifyWhatsAppSignature      ← validates HMAC + X-Event-Id (timing-safe)
   ├─ WhatsAppWebhookController     ← validates body, checks for duplicates, dispatch → 202
   ├─ ProcessWhatsAppEvent (Job/Horizon)
   │     ├─ upsert whatsapp_groups
   │     ├─ upsert whatsapp_participants
   │     └─ insert whatsapp_events (raw jsonb payload)
   └─ Models/{WhatsAppGroup, WhatsAppParticipant, WhatsAppEvent}

The controller responds quickly (202) and queues the heavy processing on Horizon. Idempotency is enforced in two layers: a short-circuit on event_id in the controller, and firstOrCreate(event_id) in the Job (safe under races and retries).
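
The two idempotency layers can be modelled in isolation. What follows is a minimal in-memory sketch in TypeScript (the collector's language, not the module's PHP), not the PR's actual code: `EventStore`, `processEvent`, and `handleWebhook` are hypothetical names, and the `Map` stands in for the UNIQUE constraint on event_id.

```typescript
// Minimal in-memory model of the two idempotency layers (illustrative only).
// The Map plays the role of the UNIQUE index on whatsapp_events.event_id.
type RawEvent = { type: string; [key: string]: unknown };

class EventStore {
  private rows = new Map<string, RawEvent>();

  exists(eventId: string): boolean {
    return this.rows.has(eventId);
  }

  // Mirrors firstOrCreate semantics: return the existing row when present,
  // otherwise insert. `created` tells the caller which path was taken.
  firstOrCreate(eventId: string, payload: RawEvent): { row: RawEvent; created: boolean } {
    const existing = this.rows.get(eventId);
    if (existing !== undefined) {
      return { row: existing, created: false };
    }
    this.rows.set(eventId, payload);
    return { row: payload, created: true };
  }

  count(): number {
    return this.rows.size;
  }
}

// Layer 2 (job): safe to run more than once for the same event_id.
function processEvent(store: EventStore, eventId: string, payload: RawEvent): boolean {
  return store.firstOrCreate(eventId, payload).created;
}

// Layer 1 (controller): skip the dispatch when the event is already stored.
// It always answers 202; `dispatched` records whether a job was queued.
function handleWebhook(
  store: EventStore,
  queue: Array<() => void>,
  eventId: string,
  payload: RawEvent,
): { status: number; dispatched: boolean } {
  if (store.exists(eventId)) {
    return { status: 202, dispatched: false };
  }
  queue.push(() => { processEvent(store, eventId, payload); });
  return { status: 202, dispatched: true };
}
```

In this model, two deliveries that arrive before the first job has run are both dispatched, and the second layer is what keeps the stored row count at one.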

Data model (data lake)

3 tables with UUID primary keys (time-ordered UUIDv7 via HasUuids) and a payload column (jsonb) holding the raw Baileys event. Only type, FKs, and timestamps are materialized as columns (for indexing).

  • whatsapp_groups: external_jid UNIQUE, tenant_id?, display_name, internal_name, payload, first/last_seen_at
  • whatsapp_participants: global per phone number (external_jid UNIQUE, 1 row per person), push_name, identity_id? (→ users, for future linking), payload
  • whatsapp_events: event_id UNIQUE (idempotency), type (indexed), group_id?, participant_id?, tenant_id?, occurred_at, received_at, payload. Composite indexes (type, occurred_at), (group_id, occurred_at), (participant_id, occurred_at)
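
To make the "materialize only what is indexed" rule concrete, here is a small TypeScript sketch of a whatsapp_events row. The column names come from the list above; the TypeScript types and the `toEventRow` helper are assumptions made for illustration, not part of the module.

```typescript
// Shape of a whatsapp_events row. Column names follow the PR description;
// the concrete TypeScript types are assumptions.
interface WhatsAppEventRow {
  event_id: string;                 // UNIQUE, idempotency key
  type: string;                     // indexed, free-form string
  group_id: string | null;          // nullable FK to whatsapp_groups
  participant_id: string | null;    // nullable FK to whatsapp_participants
  tenant_id: string | null;
  occurred_at: Date;
  received_at: Date;
  payload: Record<string, unknown>; // raw event, stored untouched
}

// Hypothetical helper: promote only the indexable fields to columns and keep
// the whole raw event in `payload`. Body validation is assumed to have
// happened earlier, so `type` is expected to be a string here.
function toEventRow(
  eventId: string,
  raw: Record<string, unknown>,
  refs: { groupId: string | null; participantId: string | null },
  occurredAt: Date,
  receivedAt: Date,
): WhatsAppEventRow {
  return {
    event_id: eventId,
    type: String(raw["type"]),
    group_id: refs.groupId,
    participant_id: refs.participantId,
    tenant_id: null,
    occurred_at: occurredAt,
    received_at: receivedAt,
    payload: raw,
  };
}
```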

Design decisions

  • UUIDv7, not bigint: the monolith already uses uuid in users; UUIDv7 is time-ordered → no index fragmentation on inserts into the high-volume table.
  • Global participant (1 row per phone number): resolves an internal contradiction in the spec. Identity is per number; the link to a group lives in the events. Explicit membership (if needed) → a future pivot table, without bloating participants.
  • HMAC-SHA256 on the webhook: integrity + anti-replay (X-Event-Id), via the VerifyWhatsAppSignature middleware.
  • collection_policy dropped in this phase: the bot sends everything and Laravel stores everything raw, which fits the data lake better and removes the polling/cache subsystem. Revisit only if a hardening phase requires it.
  • type as a string (not an enum): the set of Baileys events is open and changes between versions; an enum would reject new types and break the mapping.
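
The timing-safe HMAC check mentioned above can be sketched as follows. This is a TypeScript illustration using Node's built-in crypto module, not the VerifyWhatsAppSignature middleware itself. It assumes X-Signature carries the hex-encoded HMAC-SHA256 of the raw request body, an encoding the PR description does not specify; `verifySignature` is a hypothetical name.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: the signature header is the hex-encoded HMAC-SHA256 of the raw body.
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const provided = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws when the lengths differ, so reject those up front.
  if (provided.length !== expected.length) {
    return false;
  }
  // Constant-time comparison avoids leaking how many leading bytes matched.
  return timingSafeEqual(provided, expected);
}
```

The signature has to be computed over the exact bytes received, before any JSON parsing or re-serialization, or a valid request could fail verification.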

Privacy: a deliberate stance (exploratory phase)

⚠️ Real phone numbers (unhashed), raw content, and no TTL. A deliberate, temporary decision for the mapping phase, recorded in docs/adr/0001. It is an LGPD liability to be revisited at the end of the exploration (hashing, content minimization, retention).

Tests

12 Pest tests (37 assertions), all green; Pint clean.

  • Models (3): UUID, jsonb casts, nullable FKs
  • Job (4): group + participant upsert, event without group/participant, idempotency by event_id, no duplicate groups
  • Webhook (5): valid HMAC → 202 + dispatch; invalid signature → 401; missing X-Event-Id → 401; invalid body → 422; duplicate → 202 without dispatch
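
The five webhook cases imply an order of checks, which a pure function can express. This is a hedged TypeScript sketch, not the controller's code. `respond`, `IncomingRequest`, and `isValidBody` are hypothetical names, the body rule (an object with a non-empty string `type`) is an assumption, and the `Set` stands in for the database lookup on event_id.

```typescript
type Outcome = { status: 202 | 401 | 422; dispatched: boolean };

interface IncomingRequest {
  signatureValid: boolean;     // result of the HMAC check
  eventId: string | undefined; // X-Event-Id header, if present
  body: unknown;               // parsed JSON body
}

// Assumption: a body is valid when it is an object with a non-empty string `type`.
function isValidBody(body: unknown): boolean {
  if (typeof body !== "object" || body === null) return false;
  const type = (body as Record<string, unknown>)["type"];
  return typeof type === "string" && type.length > 0;
}

// Checks run in the order the test list suggests: auth, header, body, duplicate.
// For simplicity the id is marked as seen at dispatch time; in the PR the row
// is only written later, by the job.
function respond(req: IncomingRequest, seen: Set<string>): Outcome {
  if (!req.signatureValid) return { status: 401, dispatched: false };
  if (req.eventId === undefined || req.eventId === "") return { status: 401, dispatched: false };
  if (!isValidBody(req.body)) return { status: 422, dispatched: false };
  if (seen.has(req.eventId)) return { status: 202, dispatched: false };
  seen.add(req.eventId);
  return { status: 202, dispatched: true };
}
```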

Out of scope (future PRs)

Filament resource for administration, the identity-linking flow, wiring the webhook into wpp-tui, and operations (deploy, observability, retention, member onboarding/consent).

Config

Requires WHATSAPP_WEBHOOK_SECRET (added to .env.example), the same secret wpp-tui uses to sign the body.
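
On the collector side, signing with the shared secret could look roughly like this. The wiring in wpp-tui is out of scope for this PR, so this is only a TypeScript sketch under assumptions: `buildSignedRequest` and `loadSecret` are hypothetical helpers, and hex encoding for X-Signature is assumed rather than specified.

```typescript
import { createHmac, randomUUID } from "node:crypto";

// Read the shared secret from the environment; fail fast when it is missing.
function loadSecret(env: Record<string, string | undefined>): string {
  const secret = env["WHATSAPP_WEBHOOK_SECRET"];
  if (secret === undefined || secret === "") {
    throw new Error("WHATSAPP_WEBHOOK_SECRET is not set");
  }
  return secret;
}

// Hypothetical collector-side helper. Assumption: the signature is the
// hex-encoded HMAC-SHA256 of the exact string sent as the request body.
function buildSignedRequest(
  event: Record<string, unknown>,
  secret: string,
  eventId: string = randomUUID(), // pass the same id again when retrying
) {
  const body = JSON.stringify(event); // sign the same string that is sent
  const signature = createHmac("sha256", secret).update(body, "utf8").digest("hex");
  return {
    method: "POST" as const,
    path: "/api/integrations/whatsapp/events",
    headers: {
      "Content-Type": "application/json",
      "X-Signature": signature,
      "X-Event-Id": eventId,
    },
    body,
  };
}
```

A retry of the same event would need to reuse its X-Event-Id; generating a fresh UUID per attempt would defeat the deduplication on the Laravel side.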

Documentation included

CONTEXT.md, docs/spec.md, docs/adr/0001-data-lake-approach.md, and docs/plans/0001-ingest-implementation.md.

Clintonrocha98 marked this pull request as draft May 21, 2026 04:14

coderabbitai Bot commented May 21, 2026

📝 Walkthrough

This PR introduces a complete WhatsApp event ingestion module for the Laravel monolith. The implementation follows a data-lake schema-on-read approach with HMAC-authenticated webhook endpoints, idempotency checks via X-Event-Id, and asynchronous Horizon jobs that persist raw event payloads into three tables: whatsapp_groups, whatsapp_participants, and whatsapp_events. The module includes migrations, Eloquent models with UUID support, test factories, request validation, signature verification middleware, and comprehensive feature tests validating the full request-to-persistence flow.

Suggested reviewers

  • gvieira18
  • danielhe4rt
  • 1pride

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 38.24% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The pull request title references the WhatsApp integration ingest layer (core ingest) which is the main focus of this changeset, clearly summarizing the primary addition of webhook infrastructure, event processing, and data persistence for WhatsApp group metrics collection. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
app-modules/integration-whatsapp/docs/spec.md (1)

63-78: ⚡ Quick win

Add language tags to fenced diagrams to satisfy markdown linting.

Both fenced blocks should declare a language (e.g., text) to avoid MD040 warnings and keep docs CI clean.

Suggested diff
-```
+```text
 WhatsApp servers
    ↓ WebSocket (Baileys)
 [ wpp-tui ]  ← repo Node separado · runtime · sessão · envia todos os eventos (sem filtro)
@@
-```
+```

-```
+```text
 whatsapp_groups
 ├─ id                uuid PK
@@
-```
+```

Also applies to: 109-141

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app-modules/integration-whatsapp/docs/spec.md` around lines 63 - 78, The two
fenced diagram blocks in the spec (the block starting with "WhatsApp servers ↓
WebSocket (Baileys) [ wpp-tui ]" and the block listing "whatsapp_groups ├─ id
uuid PK ...") are missing language tags; update their opening ``` fences to
include a language (e.g., ```text) so both diagrams and the other affected block
range (lines around the second diagram referenced) are annotated and satisfy
markdown linting (MD040).

app-modules/integration-whatsapp/docs/plans/0001-ingest-implementation.md (1)

32-54: ⚡ Quick win

Add language identifiers to all unlabeled fenced blocks.

Several fenced blocks are missing a language tag; adding text, bash, or env as appropriate will clear MD040 warnings and improve readability.

Also applies to: 58-69, 73-86, 90-96, 102-130, 226-228

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app-modules/integration-whatsapp/docs/plans/0001-ingest-implementation.md`
around lines 32 - 54, The Markdown file contains multiple unlabeled fenced code
blocks (e.g., the architecture diagram and payload examples surrounding symbols
like VerifyWhatsAppSignature, WhatsAppWebhookController, ProcessWhatsAppEvent,
and the POST body schema) which trigger MD040; add appropriate language
identifiers to each fence — use "text" for the ASCII diagram block, "json" for
the request body example ({ type, group_jid, ... }), "bash" for any CLI
snippets, and "env" for environment samples — so every fenced block has a
language tag to clear the lint warnings and improve readability.

app-modules/integration-whatsapp/CONTEXT.md (1)

25-38: ⚡ Quick win

Declare a language for the architecture fence block.

Use a language like text on this fenced diagram to satisfy markdownlint MD040.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app-modules/integration-whatsapp/CONTEXT.md` around lines 25 - 38, The fenced
architecture diagram at the top of the file is missing a language tag which
triggers markdownlint MD040; update the opening triple-backtick of that diagram
(the top-level architecture fence block) to include a language identifier such
as text so the fence becomes ```text, ensuring the markdown linter recognizes it
as a code block with an explicit language.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro

Run ID: b6fb1b2d-3870-4d40-9845-c6d4061b98d8

📥 Commits

Reviewing files that changed from the base of the PR and between efd05ff and c264605.

⛔ Files ignored due to path filters (1)
  • composer.lock is excluded by !**/*.lock
📒 Files selected for processing (26)
  • .env.example
  • app-modules/integration-whatsapp/CONTEXT.md
  • app-modules/integration-whatsapp/composer.json
  • app-modules/integration-whatsapp/config/whatsapp.php
  • app-modules/integration-whatsapp/database/factories/WhatsAppEventFactory.php
  • app-modules/integration-whatsapp/database/factories/WhatsAppGroupFactory.php
  • app-modules/integration-whatsapp/database/factories/WhatsAppParticipantFactory.php
  • app-modules/integration-whatsapp/database/migrations/2026_05_20_120000_create_whatsapp_groups_table.php
  • app-modules/integration-whatsapp/database/migrations/2026_05_20_120001_create_whatsapp_participants_table.php
  • app-modules/integration-whatsapp/database/migrations/2026_05_20_120002_create_whatsapp_events_table.php
  • app-modules/integration-whatsapp/docs/adr/0001-data-lake-approach.md
  • app-modules/integration-whatsapp/docs/plans/0001-ingest-implementation.md
  • app-modules/integration-whatsapp/docs/spec.md
  • app-modules/integration-whatsapp/routes/whatsapp-routes.php
  • app-modules/integration-whatsapp/src/Ingest/Http/Controllers/WhatsAppWebhookController.php
  • app-modules/integration-whatsapp/src/Ingest/Http/Middleware/VerifyWhatsAppSignature.php
  • app-modules/integration-whatsapp/src/Ingest/Http/Requests/IngestEventRequest.php
  • app-modules/integration-whatsapp/src/Ingest/Jobs/ProcessWhatsAppEvent.php
  • app-modules/integration-whatsapp/src/IntegrationWhatsappServiceProvider.php
  • app-modules/integration-whatsapp/src/Models/WhatsAppEvent.php
  • app-modules/integration-whatsapp/src/Models/WhatsAppGroup.php
  • app-modules/integration-whatsapp/src/Models/WhatsAppParticipant.php
  • app-modules/integration-whatsapp/tests/Feature/Ingest/ProcessWhatsAppEventTest.php
  • app-modules/integration-whatsapp/tests/Feature/Ingest/WebhookIngestTest.php
  • app-modules/integration-whatsapp/tests/Feature/Models/WhatsAppModelsTest.php
  • composer.json

Comment on lines +70 to +77
This decision **must be revisited when the exploration phase ends** — concretely, once the data team
has defined the metrics worth keeping. At that point, evaluate:

- Aggregating useful signals into a persistent metrics table and **dropping or truncating raw payloads**.
- Introducing a **retention TTL** for `whatsapp_events`.
- Re-introducing **phone hashing / content minimization** (with a shared pepper if hashing, to keep
identity linking working).
- Whether `presence.update` is still worth collecting at volume.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make the privacy review trigger concrete (date + owner).

“Revisit when exploration phase ends” is too open-ended for a decision that explicitly stores raw phone/content with no TTL. Add a hard deadline and accountable owner to avoid indefinite retention by drift.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app-modules/integration-whatsapp/docs/adr/0001-data-lake-approach.md` around
lines 70 - 77, Update the ADR text to replace the vague “revisit when the
exploration phase ends” with a concrete privacy review trigger: add a specific
calendar deadline (e.g., "Review by YYYY-MM-DD") and an accountable owner (e.g.,
"Owner: Data Privacy Lead / <name or role>"). Also document where the review
will evaluate the listed items (aggregating signals, TTL for whatsapp_events,
phone hashing/content minimization, and presence.update) so the owner and date
are clearly tied to those symbols (`whatsapp_events`, `presence.update`) in the
same paragraph.
