Performance: Parallelize ingestion pipelines #66

@mikewaters

Description

In-Pipeline

I can use num_workers to parallelize an individual ingestion.
BLOCKER: SQLite allows only one writer at a time, and our transforms run inside a database session that carries through to each worker (now parallelized via multiprocessing). That causes database locking issues, so this does not work as-is.
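
One possible way around the blocker, sketched under assumptions: keep the transforms pure so no database session crosses the process boundary, and let only the parent process write to SQLite. The names `transform`, `ingest`, and the `chunks` table are hypothetical and do not come from this codebase. `num_workers` here only mirrors the parameter name mentioned above and is not the library's implementation.

```python
import sqlite3
from concurrent.futures import ProcessPoolExecutor


def transform(text: str) -> tuple[str, int]:
    """Hypothetical pure transform: picklable, touches no database session."""
    normalized = " ".join(text.split()).lower()
    return normalized, len(normalized.split())


def ingest(texts: list[str], db_path: str, num_workers: int = 1) -> int:
    """Transform in worker processes, then write from the parent only."""
    if num_workers > 1:
        # Workers receive plain strings and return plain tuples.
        with ProcessPoolExecutor(max_workers=num_workers) as pool:
            rows = list(pool.map(transform, texts, chunksize=16))
    else:
        # Serial fallback, same code path for the write below.
        rows = [transform(t) for t in texts]

    # Single writer: one connection, one transaction, opened in the parent.
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS chunks (body TEXT, word_count INTEGER)"
            )
            conn.executemany("INSERT INTO chunks VALUES (?, ?)", rows)
    finally:
        conn.close()
    return len(rows)
```

Calling `ingest(texts, "ingest.db", num_workers=4)` from a script guarded by `if __name__ == "__main__":` would fan the transforms out and still produce a single write transaction. This only helps if the transforms can be decoupled from the session, which may not be true for all of ours.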

Across Pipelines

I can use LlamaIndex Workflows to parallelize multiple jobs, which I think I will do at the Substrate layer.
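
A rough sketch of the shape this could take, using plain `asyncio` as a stand-in rather than the LlamaIndex Workflows API. `run_pipeline`, `run_all`, and `max_concurrent` are hypothetical names, and the sleep is a placeholder for a real ingestion job.

```python
import asyncio


async def run_pipeline(name: str, delay: float) -> str:
    """Placeholder for one ingestion job; real work would go here."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def run_all(jobs: dict[str, float], max_concurrent: int = 2) -> list[str]:
    """Run several pipelines concurrently, capped at max_concurrent at a time."""
    limit = asyncio.Semaphore(max_concurrent)

    async def bounded(name: str, delay: float) -> str:
        async with limit:
            return await run_pipeline(name, delay)

    # gather returns results in the order the jobs were submitted.
    return await asyncio.gather(*(bounded(n, d) for n, d in jobs.items()))
```

For example, `asyncio.run(run_all({"docs": 0.2, "notes": 0.1}))` runs both jobs concurrently. If the pipelines end up writing to the same SQLite file, the single-writer limit from the In-Pipeline section still applies, so writes would likely need to be serialized or split across databases.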


    Labels

    enhancement (New feature or request)
