feat: Add CLI tools for validating and searching news DOM JSON data #280

Open
seonghobae wants to merge 2 commits into develop from feat/dom-tools-17499131751374055448

@seonghobae

Collaborator

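What (Implementation)

  • Added the tools/validate_dom.py CLI tool, which verifies that parsed NewsDOM JSON output conforms exactly to the ParseResponse Pydantic schema.
  • Added the tools/search_dom.py CLI tool, which searches article headlines (headline) and body text (body_blocks) in a news DOM JSON file for a text keyword and returns the match locations (page, article ID, and so on).
  • Wrote unit tests for both tools with 100% branch coverage, carefully mocking edge cases (the if __name__ == "__main__": branch, the sys.path injection branch, exception paths, and so on).
  • Recorded the changes in CHANGELOG.md in Korean.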
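
The `if __name__ == "__main__":` branch mentioned above is awkward to cover because a plain import never enters it. The following is a minimal sketch of one way to exercise such a branch using only the standard library (`runpy.run_path` with `run_name="__main__"`). The script text and the `run_as_main` helper are hypothetical stand-ins for illustration; they are not the actual tools/validate_dom.py or the PR's test code, which may use a different approach.

```python
import runpy
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for a CLI script; NOT the real tools/validate_dom.py.
SCRIPT = textwrap.dedent(
    '''
    import sys

    def main() -> int:
        return 0

    if __name__ == "__main__":
        sys.exit(main())
    '''
)


def run_as_main(source: str) -> int:
    """Run `source` as if it were executed as a script; return its exit code."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "fake_tool.py"
        path.write_text(source, encoding="utf-8")
        try:
            # run_name="__main__" makes the guard at the bottom evaluate to True.
            runpy.run_path(str(path), run_name="__main__")
        except SystemExit as exc:
            if isinstance(exc.code, int):
                return exc.code
            # A bare sys.exit() carries None (success); a message means failure.
            return 0 if exc.code is None else 1
    # The script finished without calling sys.exit().
    return 0


exit_code = run_as_main(SCRIPT)
```

In a pytest suite, a helper like this would typically be called from a test function that asserts on the returned exit code.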

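Why (Problem solved)

  • To reduce the burden of manually checking the integrity of extracted JSON data during deployment or in data pipelines, and to support script-based validation automation.
  • To address the difficulty of manually browsing huge JSON files when debugging or looking for a specific event or article.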

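Impact (Performance/Effects)

  • This adds a standalone tools/ toolchain that does not affect the core package's execution speed or behavior.
  • The search script uses a compiled regular expression so that results come back quickly even for large DOM structures.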
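
As a rough illustration of the compiled-regex search described above, here is a minimal sketch. Only `headline` and `body_blocks` come from the PR description; the `pages`, `articles`, `page_number`, `article_id`, and `text` keys, the `search_dom` function name, and the case-insensitive matching are assumptions made for this example and may not match the real NewsDOM schema or tools/search_dom.py.

```python
import re
from typing import Any


def search_dom(dom: dict[str, Any], keyword: str) -> list[dict[str, Any]]:
    """Return the location of every headline or body block containing `keyword`."""
    # Compile once and reuse the pattern for every article and block.
    # re.escape treats the keyword as literal text (an assumption of this sketch).
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    hits: list[dict[str, Any]] = []

    for page in dom.get("pages", []):  # "pages" is an assumed key
        for article in page.get("articles", []):  # "articles" is an assumed key
            location = {
                "page": page.get("page_number"),  # assumed key
                "article_id": article.get("article_id"),  # assumed key
            }
            if pattern.search(article.get("headline") or ""):
                hits.append({**location, "field": "headline"})
            for index, block in enumerate(article.get("body_blocks") or []):
                if pattern.search(block.get("text") or ""):  # "text" is assumed
                    hits.append(
                        {**location, "field": "body_blocks", "block_index": index}
                    )
    return hits


sample = {
    "pages": [
        {
            "page_number": 1,
            "articles": [
                {
                    "article_id": "a1",
                    "headline": "Fire at the harbor",
                    "body_blocks": [
                        {"text": "The fire spread quickly."},
                        {"text": "No injuries were reported."},
                    ],
                }
            ],
        }
    ]
}
results = search_dom(sample, "fire")
```

Compiling the pattern once outside the loops avoids repeated pattern lookups per block, which is the kind of saving the description attributes to the compiled regular expression.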

Measurement (Verification method)

  • Passed code-quality checks via uv run ruff check and uv run ruff format.
  • Ran uv run pytest --cov=tools --cov-branch --cov-report=term-missing and confirmed 100% test coverage for the tools/ package.

PR created automatically by Jules for task 17499131751374055448 started by @seonghobae

- Write `validate_dom.py` to strictly validate the DOM schema
- Write `search_dom.py` to search article headlines and bodies
- Write the related test code and reach 100% coverage
- Record the changes in `CHANGELOG.md`
@google-labs-jules


👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

Comment thread tests/test_tools_search_dom.py Fixed
- Removed the duplicate `import json` statement inside the `test_search_dom_unknown_type` function to fix the `opencode-review` CI failure.
@opencode-agent

opencode-agent Bot commented Jul 5, 2026

Contributor

OpenCode Review Overview

  • Head SHA: 55639b00fbb64bf9f1b92ddc31b908d1ac7aa0f0
  • Workflow run: 28727484835
  • Workflow attempt: 1
  • Gate result: APPROVE (approval step)

Pull request overview

OpenCode reviewed the current-head bounded evidence and found no blocking issues.

Findings

No blocking findings.

Summary

Approval sufficiency: bounded evidence supplied affirmative approval evidence for changed files, coverage/docstring posture, risk surfaces, and current-head verification; approval is not based merely on the absence of known blockers.
Verification posture: CodeGraph evidence was initialized and bounded current-head evidence reviewed for changed-file evidence including CHANGELOG.md, tests/test_tools_search_dom.py, tests/test_tools_validate_dom.py, tools/search_dom.py, tools/validate_dom.py.
Linter/static: workflow/static review evidence is bounded by the current-head GitHub Checks gate and changed-file evidence.
TDD/regression: coverage execution evidence and focused changed hunks were reviewed from bounded-review-evidence.md.
Coverage: coverage execution evidence reports supported repository test suites passed.
Docstring coverage: coverage execution evidence reports configured repository docstring gates passed or docstring coverage was advisory.
DAG: CodeGraph/source-backed behavior map connects CHANGELOG.md to the affected review, runtime, or workflow path and required checks.
PoC/execution: coverage-evidence job executed on the current head and reported PASS.
DDD/domain: workflow and repository-governance invariants were reviewed against changed files in bounded evidence.
CDD/context: CodeGraph evidence, changed-file history, and focused hunks were reviewed from bounded-review-evidence.md.
Similar issues: changed-file history evidence was reviewed for comparable local precedents.
Claim/concept check: bounded evidence, repository source, current-head workflow evidence, and, where numeric, scientific, statistical, or literature-backed claims are affected, original-paper/formula evidence and parameter-recovery expectations were used for claims.
Standards search: standards and external-source checks are delegated to configured OpenCode web_search/Context7/DeepWiki sources when applicable; no evidence-backed standards blocker is present in bounded evidence.
Compatibility/convention: changed workflow/script conventions, object naming, and reserved-word safety for schema/API/config/code surfaces were checked in bounded evidence.
Breaking-change/backcompat: deployment evidence and changed-file history were checked for backward-compatibility risk.
Performance: changed surfaces were checked for performance risk in bounded evidence.
Developer experience: changed automation, review, test, setup, and maintenance surfaces were checked for helpful or obstructive DX impact in bounded evidence.
User experience: connected user, operator, API, CLI, documentation, review-comment, status-check, rendering, and workflow-reader behavior was checked for contradictions against code, docs, and tests in bounded evidence.
Visual/DOM: Playwright visual, DOM locator, ARIA snapshot, console, and responsive evidence were checked when a web UI surface was present; for non-web surfaces, API/CLI/log/docs/workflow interaction evidence was reviewed instead.
Accessibility/i18n: accessibility, localization, and human-readable text surfaces were checked where UI, CLI, API message, docs, logs, or review text changed.
Supply-chain/license: dependency, package, model, container, and external-tool changes were checked in bounded evidence.
Packaging: package, build, test, lint, and security contracts were checked in bounded evidence.
Security/privacy: workflow-token, review-gate, and repository-automation security/privacy boundaries were checked in bounded evidence.

  • Result: APPROVE
  • Reason: New validation tool meets all quality standards
  • Head SHA: 55639b00fbb64bf9f1b92ddc31b908d1ac7aa0f0
  • Workflow run: 28727484835
  • Workflow attempt: 1

Changed-File Evidence Map

flowchart LR
  PR["PR changed files"] --> Evidence["OpenCode bounded evidence"]
  Evidence --> S1["Changed file (3 files)"]
  S1 --> I1["repository behavior"]
  I1 --> R1["Review risk: Changed file (3 files)"]
  R1 --> V1["required checks"]
  Evidence --> S2["Test (2 files)"]
  S2 --> I2["regression suite"]
  I2 --> R2["Review risk: Test (2 files)"]
  R2 --> V2["targeted test run"]

@opencode-agent opencode-agent Bot left a comment



@github-actions github-actions Bot enabled auto-merge (squash) July 5, 2026 04:11