ozefe/yoktez

yoktez

yoktez mascot generated by Google's Nano Banana 2

Typed Python client for the National Thesis Center of Turkey.

yoktez wraps the YOK NTC JSP/AJAX surface behind a single synchronous Client with frozen-dataclass return types, a deterministic exception hierarchy, and bilingual-aware fields. Built for application and CLI developers who need a typed surface and a small install footprint without writing bespoke scraping code for each project.

Installation

pip install yoktez

Requires Python 3.14+.

Quickstart

"""End-to-end yoktez quickstart: search -> metadata -> assets.

Demonstrates the typical three-call flow without writing files to disk.

Run with: `python examples/quickstart.py`
"""

from yoktez import AssetStatus, Client

_QUERY = "yapay zeka"


with Client() as client:
    results = client.search.simple(_QUERY)
    print(f"{results.total} matches for {_QUERY!r}")

    thesis = results[0]
    print(f"  title:   {thesis.title}")
    print(f"  author:  {thesis.author}")
    print(f"  year:    {thesis.year}")
    print(f"  keys:    {thesis.registration_no} / {thesis.thesis_no}")

    metadata = client.metadata.get(thesis)
    print(f"  advisor: {metadata.supervisor}")
    if metadata.affiliation is not None:
        print(f"  uni:     {metadata.affiliation.university}")
    if metadata.keywords is not None:
        print(f"  tags:    {len(metadata.keywords)} keywords")

    assets = client.assets.get(thesis)
    print(f"  status:  {assets.status.name}")
    if assets.status is AssetStatus.AVAILABLE:
        print(f"  pdf_key: {assets.pdf_key}")

Sample output:

6841 matches for 'yapay zeka'
  title:   Kimya eğitiminde yapay zekâ araştırmalarına ilişkin bir meta-sentez çalışması
  author:  MURAT EBUBEKİR YAYLA
  year:    2026
  keys:    nslbSyAODG1_FIruL8qUAA / THvIvDpZXvJIiHZpuqpKVw
  advisor: PROF. DR. MUSA ÜCE
  uni:     MARMARA ÜNİVERSİTESİ
  tags:    5 keywords
  status:  AVAILABLE
  pdf_key: 5T1_CZ5-UGb9QCmoURec4AbpuuyvqUeed_1PcCh_6DVZ4b1fbX7Gcu-DQFLIcE11

Features

  • Four search modes: simple, advanced, detail, and recent, all under a single client.search namespace and all returning a sliceable SearchResults that carries the database-wide match total alongside the result window.
  • Structured metadata: client.metadata.get(thesis) returns a typed ThesisMetadata with bilingual keywords (Bilingual(raw, tr, en)), a tiered Affiliation, and pre-formatted citation strings (APA / IEEE / MLA / Chicago / Harvard).
  • Two-step asset download: client.assets.get(thesis) resolves to one of AVAILABLE / UNDER_EMBARGO / NO_PERMIT / PREPARING before any bytes move; the available branch exposes a pdf_key (and optional appendix_key) to feed download_pdf / download_appendix.
  • Catalog lookups: client.lookups covers universities (TR / INT), institutes, divisions, subjects, departments, sections, and keywords, with per-instance memoization and an explicit refresh().
  • Typed value objects: every returned record is a @dataclass(frozen=True, slots=True); values are immutable, hashable where field types allow, and ship with py.typed for downstream type checkers.
  • Sync-only, thread-friendly: no async/await surface; the recommended concurrency pattern is one Client per thread.
  • Small dependency surface: httpx, beautifulsoup4, and lxml. No Rust core, no auth, no hidden state.

Usage

All snippets use Client as a context manager (with Client() as client:) so that the underlying HTTP connection pool is cleaned up deterministically.

Search

Simple search by free text, optionally narrowed to a single field:

from yoktez import Client, SearchField

with Client() as client:
    results = client.search.simple("yapay zeka", field=SearchField.ABSTRACT)

    print(f"{results.total} matches")
    for thesis in results[:5]:
        print(thesis.year, thesis.title)

Advanced search joins up to three terms with boolean operators:

from yoktez import AdvancedOperator, Client, MatchType

with Client() as client:
    results = client.search.advanced(
        "sosyal",
        term2="medya",
        op1=AdvancedOperator.AND,
        match=MatchType.INCLUDES,
    )

Detail search accepts the full filter surface; enum-shaped parameters also accept the member name as a string or the raw int code:

from yoktez import Client, ThesisType

with Client() as client:
    unis = client.lookups.universities()
    results = client.search.detail(
        university=unis[0],
        year_min=2020,
        year_max=2025,
        degree_type=ThesisType.MASTER,  # also accepts "MASTER" or 1
    )

Recently added theses (server-fixed 15-day window):

from yoktez import Client

with Client() as client:
    results = client.search.recent()

Metadata

from yoktez import Client

with Client() as client:
    thesis = client.search.simple("makine öğrenmesi")[0]
    metadata = client.metadata.get(thesis)

    if metadata.affiliation is not None:
        print(metadata.affiliation.university)
    if metadata.keywords:
        print(metadata.keywords[0].tr, "=", metadata.keywords[0].en)
    if metadata.references is not None:
        print(metadata.references.apa)

Assets (two-step download)

from yoktez import AssetStatus, Client

with Client() as client:
    thesis = client.search.simple("yapay zeka")[0]
    assets = client.assets.get(thesis)

    if assets.status is AssetStatus.AVAILABLE and assets.pdf_key is not None:
        client.assets.download_pdf(assets.pdf_key, "thesis.pdf")

        if assets.appendix_key is not None:
            client.assets.download_appendix(assets.appendix_key, "thesis-ek.rar")

download_pdf and download_appendix accept a filesystem path (Path or str, opened and closed for you) or a pre-opened binary file-like (written to but not closed — ownership stays with the caller).

Lookups

from yoktez import Client, UniversitySource

with Client() as client:
    unis = client.lookups.universities(UniversitySource.TR)
    institutes = client.lookups.institutes(unis[0])
    divisions = client.lookups.divisions(unis[0], institutes[0])

    # Bulk catalogs; keywords() also accepts group / language / first_letter / search.
    keywords = client.lookups.all_keywords()

Every client.lookups.* call is memoized on the Client instance. Call client.lookups.refresh() to clear the cache if YOKSIS IDs are suspected to have rotated.
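As a rough illustration of per-instance memoization with an explicit refresh, consider the sketch below. MemoizedLookups and fake_fetch are hypothetical; yoktez's internal cache may be structured quite differently.

```python
from collections.abc import Callable


class MemoizedLookups:
    """Hypothetical per-instance cache; yoktez's internals may differ."""

    def __init__(self, fetch: Callable[[str], list[str]]) -> None:
        self._fetch = fetch
        self._cache: dict[str, list[str]] = {}

    def get(self, catalog: str) -> list[str]:
        # Only the first call per catalog reaches the fetch function.
        if catalog not in self._cache:
            self._cache[catalog] = self._fetch(catalog)
        return self._cache[catalog]

    def refresh(self) -> None:
        # Drop everything so the next call fetches fresh data.
        self._cache.clear()


calls: list[str] = []


def fake_fetch(catalog: str) -> list[str]:
    # Stand-in for a network request; records each call for inspection.
    calls.append(catalog)
    return [f"{catalog}-1", f"{catalog}-2"]


lookups = MemoizedLookups(fake_fetch)
lookups.get("universities")
lookups.get("universities")  # served from the cache
lookups.refresh()
lookups.get("universities")  # fetched again after refresh()
```

Because the cache lives on the instance, two separate instances never share entries, which fits the one-Client-per-thread guidance.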

HTTP client configuration

Client accepts keyword-only overrides for the underlying httpx.Client:

from yoktez import Client

with Client(timeout=60, retries=5, user_agent="my-app/1.0") as client:
    ...

For full control, inject a pre-built httpx.Client via http_client=. Ownership stays with the caller; Client.close() is a no-op for an injected client:

import httpx
from yoktez import Client

http = httpx.Client(timeout=30.0, follow_redirects=True)
try:
    with Client(http_client=http) as client:
        ...
finally:
    http.close()

Concurrency

yoktez.Client is single-threaded by design: create one instance per thread and never share an instance across threads. The library ships no concurrency primitives; threading strategy is the caller's choice.
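One way to get a client per thread is threading.local combined with a thread pool. The sketch below is an assumption-laden illustration, not the project's own example file: StubClient is a hypothetical stand-in for yoktez.Client so the code runs offline, and get_client and run are names invented here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StubClient:
    """Hypothetical stand-in for yoktez.Client so the sketch runs offline."""

    def __init__(self) -> None:
        self.owner = threading.get_ident()

    def search(self, query: str) -> tuple[str, int]:
        # A shared instance would trip this check when called from another thread.
        assert self.owner == threading.get_ident()
        return query, self.owner


_local = threading.local()


def get_client() -> StubClient:
    # Lazily create one client per worker thread and reuse it for later tasks.
    client = getattr(_local, "client", None)
    if client is None:
        client = _local.client = StubClient()
    return client


def run(query: str) -> tuple[str, int]:
    return get_client().search(query)


queries = ["yapay zeka", "makine öğrenmesi", "sosyal medya", "kimya eğitimi"]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run, queries))
```

With a real client, each per-thread instance would also need to be closed when its worker finishes; the stub has nothing to release, so that step is omitted here.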

Design principles

  • Synchronous-only API: Sync is sufficient for YOK NTC's IO patterns; an async surface would double the API and complicate testing for no proven benefit. Concurrency strategy belongs to the caller, and examples/multithreaded_pool.py demonstrates the one-Client-per-thread pattern.
  • Frozen-dataclass value objects: Every returned record is @dataclass(frozen=True, slots=True). Stdlib-only, immutable, hashable where field types allow, and lightweight per instance thanks to slots.
  • Coerce-on-input enum handling: Enum-shaped parameters accept the matching Enum member, its name (e.g., "MASTER"), or its raw int code; the raw-int passthrough tolerates new YOK NTC codes the library hasn't yet enumerated, so wire-side additions don't gate a release.
  • Two-step download flow: client.assets.get(...) resolves status first; download_pdf and download_appendix run only on the available branch. Honest to the underlying YOK NTC flow, and lets callers inspect embargo dates and appendix availability before committing to a second request.
  • Hierarchical logger naming: Every sub-package logs under yoktez.<concern> (yoktez.http, yoktez.search, yoktez.lookups, yoktez.assets). Operators can silence the high-volume HTTP DEBUG channel while preserving the rarer parser WARNING channels; a single logging.getLogger("yoktez").setLevel(...) still catches every child through parent propagation.
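The logger hierarchy described in the last bullet can be configured with the standard logging module alone. The logger names below come from the list above; the chosen levels are only an example.

```python
import logging

logging.basicConfig(level=logging.DEBUG)

# Parent setting: every yoktez.* child inherits DEBUG unless it sets its own level.
logging.getLogger("yoktez").setLevel(logging.DEBUG)

# Quiet only the high-volume HTTP channel; warnings and above still get through.
logging.getLogger("yoktez.http").setLevel(logging.WARNING)

http_log = logging.getLogger("yoktez.http")
search_log = logging.getLogger("yoktez.search")
```

Here search_log has no level of its own and falls back to the "yoktez" parent, while http_log's explicit WARNING level overrides it.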

Limitations

yoktez is intentionally narrow. The following are out of scope and will not land in this package:

  • No async API: Synchronous code throughout; no async def, no asyncio surface.
  • No multi-threaded helper functions: Concurrency strategy is the caller's choice.
  • No authentication or login flows (e-Devlet): Anonymous public-data access only; features requiring login (favorites, history) are excluded.
  • No bypassing access restrictions: Embargoed and no-permit theses surface their state via AssetStatus and the matching exception types; the library does not attempt to circumvent these.
  • No data hosting or mirroring: The library fetches on demand; no bundled snapshots of the YOK NTC database.
  • No CLI shipped from this package: A separate package may add one later — out of scope here.

License

MIT — see LICENSE.

About

Typed Python client for searching, fetching metadata, and downloading theses from the National Thesis Center of Turkey (YÖK Ulusal Tez Merkezi)
