Releases: smart-models/Normalized-Semantic-Chunker
Release 1.0.0
Normalized Semantic Chunker - Docker Image Published
This release publishes a single Docker image with CUDA support that works on both GPU and CPU machines.
Pull the image
docker pull ghcr.io/smart-models/normalized-semantic-chunker:1.0.0
# or
docker pull ghcr.io/smart-models/normalized-semantic-chunker:latest
Run with GPU (recommended)
docker run --gpus all -p 8000:8000 ghcr.io/smart-models/normalized-semantic-chunker:1.0.0
Run on CPU (fallback)
docker run -p 8000:8000 ghcr.io/smart-models/normalized-semantic-chunker:1.0.0
Docker Compose
cd docker
# With GPU
docker compose --profile gpu up -d
# CPU only
docker compose --profile cpu up -d
Verify installation
curl http://localhost:8000/
For more information, see the README.
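The verification step can be scripted so deployment tooling waits for the container to come up before proceeding. This is a minimal sketch, assuming only that the root endpoint returns an HTTP 2xx once the service is ready; the URL and retry count are adjustable:

```shell
#!/bin/sh
# Poll the service's root endpoint until it answers, then report success.
# Assumes the container started above is listening on localhost:8000.
wait_for_service() {
  url=$1
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl fail on HTTP errors, -sS keeps output quiet but shows errors
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "service is up"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "service did not respond after $tries attempts" >&2
  return 1
}

wait_for_service "http://localhost:8000/" 10 || echo "not reachable yet; is the container running?" >&2
```

The non-zero return code on timeout makes the function easy to use as a gate in CI or startup scripts.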
Full Changelog: v0.7.2...v1.0.0
Release v0.7.2
🚀 Normalized Semantic Chunker - Docker Images Published
This release includes Docker images for both CPU and GPU variants of the Normalized Semantic Chunker:
CPU Image
docker pull ghcr.io/smart-models/normalized-semantic-chunker:v0.7.2-cpu
docker pull ghcr.io/smart-models/normalized-semantic-chunker:latest-cpu
GPU Image (CUDA 12.1)
docker pull ghcr.io/smart-models/normalized-semantic-chunker:v0.7.2-gpu
docker pull ghcr.io/smart-models/normalized-semantic-chunker:latest-gpu
Docker Compose
# CPU deployment
cd docker
docker compose --profile cpu up -d
# GPU deployment (requires NVIDIA GPU and drivers)
cd docker
docker compose --profile gpu up -d
Quick Start
# Run CPU version
docker run -p 8080:8080 ghcr.io/smart-models/normalized-semantic-chunker:v0.7.2-cpu
# Run GPU version (requires nvidia-docker)
docker run --gpus all -p 8080:8080 ghcr.io/smart-models/normalized-semantic-chunker:v0.7.2-gpu
Features
- Intelligent text normalization and semantic chunking
- Support for multiple languages and document formats
- GPU acceleration for improved performance
- RESTful API with comprehensive documentation
- Docker containerization for easy deployment
For more information, see the README.
v0.7.1
Normalized Semantic Chunker v0.7.1
✨ New Features
JSON File Support: Processing of JSON files with format {"chunks": [{"text": "..."}, ...]}
Dynamic Memory Management: Smart worker allocation based on system resources
Verbosity Controls: Configurable logging for debugging and production
Configurable Parameters: Control via environment variables
File Validation: Input size and format checks
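The JSON input format described above can be produced like this; `sample_input.json` is an arbitrary example filename, not one the service requires:

```shell
#!/bin/sh
# Create a minimal input file in the JSON format the chunker accepts:
# {"chunks": [{"text": "..."}, ...]}
cat > sample_input.json <<'EOF'
{
  "chunks": [
    {"text": "First passage of the source document."},
    {"text": "Second passage of the source document."}
  ]
}
EOF

# Sanity-check that the file is valid JSON before submitting it.
python3 -m json.tool sample_input.json >/dev/null && echo "valid JSON"
```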
🚀 Performance Improvements
Batch Processing: Prevents OOM errors for large documents (>20K sentences)
Model Caching: Cache system with automatic expiration (1h default)
Adaptive Workers: Scalability based on document size
Memory Cleanup: Optimized GPU memory management
Adaptive Step Size: Optimization based on document size
🛡️ Robustness
Error Handling: Smart fallback mechanisms for tiktoken errors
Automatic Recovery: Recovery mechanisms from processing failures
Improved Logging: Detailed and configurable logging system
Input Validation: Comprehensive checks on file size, format, and content
📈 Improvement Metrics
⬇️ Memory Usage: 30–40% lower for large documents
⚡ Speed: 15–25% faster for documents >10K sentences
🛠️ Reliability: ~95% fewer processing errors
🔧 Configurability: Full control via environment variables
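Since configuration is read from environment variables, settings can be passed to the container with `-e`. The variable names below are illustrative placeholders, not documented names, and the image tag is taken from the published v0.7.2 release; check the README for the actual variables the service reads:

```shell
# Hypothetical variable names for illustration only; consult the README
# for the real configuration keys.
docker run -p 8080:8080 \
  -e LOG_LEVEL=debug \
  -e MODEL_CACHE_TTL=3600 \
  -e MAX_WORKERS=4 \
  ghcr.io/smart-models/normalized-semantic-chunker:latest-cpu
```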
Changelog
- Updated the Torch library to 2.6.0
- Fixed the Docker Compose configuration
v0.7.0
Normalized Semantic Chunker v0.7.0
✨ New Features
JSON File Support: Processing of JSON files with format {"chunks": [{"text": "..."}, ...]}
Dynamic Memory Management: Smart worker allocation based on system resources
Verbosity Controls: Configurable logging for debugging and production
Configurable Parameters: Control via environment variables
File Validation: Input size and format checks
🚀 Performance Improvements
Batch Processing: Prevents OOM errors for large documents (>20K sentences)
Model Caching: Cache system with automatic expiration (1h default)
Adaptive Workers: Scalability based on document size
Memory Cleanup: Optimized GPU memory management
Adaptive Step Size: Optimization based on document size
🛡️ Robustness
Error Handling: Smart fallback mechanisms for tiktoken errors
Automatic Recovery: Recovery mechanisms from processing failures
Improved Logging: Detailed and configurable logging system
Input Validation: Comprehensive checks on file size, format, and content
📈 Improvement Metrics
⬇️ Memory Usage: 30–40% lower for large documents
⚡ Speed: 15–25% faster for documents >10K sentences
🛠️ Reliability: ~95% fewer processing errors
🔧 Configurability: Full control via environment variables
v0.5.0
Normalized Semantic Chunker v0.5.0
A cutting-edge tool that processes text documents and splits them into semantically coherent segments while ensuring optimal chunk size for downstream NLP tasks. Ideal for retrieval-augmented generation (RAG) and other token-sensitive applications.
Key Features
- Adaptive semantic chunking with precise token limit control
- Parallel multi-percentile optimization and GPU acceleration
- Intelligent handling of small and oversized chunks
- REST API with FastAPI
Prerequisites
- Docker and Docker Compose (for Docker deployment)
- NVIDIA GPU with CUDA support (recommended)
- NVIDIA Container Toolkit (for GPU passthrough in Docker)
- Python 3.10–3.12 (3.11 recommended; Python 3.13 is not supported due to dependency compatibility issues)
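A small detection script can help check the GPU-related prerequisites before choosing a deployment mode. It assumes only that `nvidia-smi` is on PATH when an NVIDIA driver is installed, and suggests the CPU Compose profile otherwise:

```shell
#!/bin/sh
# Detect whether an NVIDIA driver is usable and suggest the matching
# docker compose profile. Falls back to CPU when no GPU is visible.
GPU_AVAILABLE=no
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
  GPU_AVAILABLE=yes
fi
echo "GPU available: $GPU_AVAILABLE"

if [ "$GPU_AVAILABLE" = yes ]; then
  echo "suggested: docker compose --profile gpu up -d"
else
  echo "suggested: docker compose --profile cpu up -d"
fi
```

Note that a visible driver is necessary but not sufficient for Docker GPU passthrough; the NVIDIA Container Toolkit must also be installed for `--gpus all` to work.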