This example shows how to use the Resume API to process a large file. It uses the file component to read a large text file. The route reads 30 lines of the file and then stops. When it restarts, it continues from the last line processed in the previous execution. The offsets are stored in a Kafka topic.
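The resume idea can be illustrated without Camel. The following is a minimal bash sketch, not the Camel Resume API: it keeps the offset (the number of lines already processed) in a plain file instead of a Kafka topic, and the function name resume_batch and the file names are hypothetical. Each call processes the next batch of lines (30 by default, matching the demo) and then commits the new offset.

```shell
#!/usr/bin/env bash
# Minimal sketch of the resume idea; NOT the Camel Resume API.
# Assumption: the offset lives in a plain file instead of a Kafka topic.

# resume_batch FILE OFFSET_FILE [BATCH]  (hypothetical helper)
# Processes up to BATCH lines of FILE that follow the offset stored in
# OFFSET_FILE, then writes the updated offset back to OFFSET_FILE.
resume_batch() {
  local input="$1" offset_file="$2" batch="${3:-30}"
  local offset=0 processed=0 line

  # Load the last committed offset; the very first run starts at 0.
  if [[ -f "$offset_file" ]]; then
    offset=$(<"$offset_file")
  fi

  local start=$((offset + 1))
  local end=$((offset + batch))

  # sed prints lines start..end only and quits as soon as it reaches end.
  while IFS= read -r line; do
    echo "processing: $line"   # stand-in for the real per-line work
    processed=$((processed + 1))
  done < <(sed -n "${start},${end}p;${end}q" "$input")

  # Commit the new offset so the next execution resumes after it.
  echo "$((offset + processed))" > "$offset_file"
}
```

For example, calling resume_batch data.txt offset.txt twice prints lines 1 to 30 and then lines 31 to 60, leaving 60 in offset.txt. Because this sketch commits the offset only after the whole batch, a crash in the middle of a batch would cause those lines to be processed again on the next run.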
Note: this demo runs in a container. Although it is possible to run it outside a container, doing so requires additional infrastructure, so that setup is not covered by this documentation.
Note 2: the code is deliberately slowed down so that the execution is easier to follow.
To build the demo and the containers:
mvn clean package && docker compose build
To run the demo:
docker compose up -d && docker compose logs --no-log-prefix -f example ; docker compose down
To create a large file for processing, run:
seq -f '%010.0f' 1 10000000 > data.txt
Verify command (counts the distinct lines that appear more than once in summary.txt; 0 means no line is repeated):
cat summary.txt | sort | uniq --count --repeated | wc -l
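The verify pipeline works because sort places identical lines next to each other, uniq --repeated (short form -d) then keeps one copy of each line that occurs more than once, and wc -l counts those copies; the --count flag only prefixes each line with its number of occurrences and does not change the total. The bash sketch below wraps that pipeline in a hypothetical count_repeated function and adds a hypothetical check_data_file function that sanity-checks the generated data.txt, assuming one 10-digit, zero-padded number per line.

```shell
# count_repeated FILE  (hypothetical helper)
# Prints how many distinct lines occur more than once in FILE.
# sort groups identical lines; uniq -d keeps one copy of each repeated
# line; wc -l counts those copies.
count_repeated() {
  sort "$1" | uniq -d | wc -l | tr -d ' '
}

# check_data_file FILE COUNT  (hypothetical helper)
# Sanity-checks a generated data file, assuming one 10-digit, zero-padded
# number per line from 0000000001 up to COUNT.
check_data_file() {
  local file="$1" count="$2"
  local lines first last expected_last
  lines=$(wc -l < "$file" | tr -d ' ')
  first=$(head -n 1 "$file")
  last=$(tail -n 1 "$file")
  expected_last=$(printf '%010d' "$count")

  if [[ "$lines" != "$count" ]]; then
    echo "expected $count lines, found $lines" >&2
    return 1
  fi
  if [[ "$first" != "0000000001" || "$last" != "$expected_last" ]]; then
    echo "unexpected range: $first .. $last" >&2
    return 1
  fi
  if [[ "$(count_repeated "$file")" != "0" ]]; then
    echo "repeated lines found" >&2
    return 1
  fi
  echo "ok: $lines lines, $first .. $last"
}
```

For example, check_data_file data.txt 10000000 prints an ok line for a correctly generated file, and count_repeated summary.txt yields the same number as the verify command. Dropping wc -l from the original pipeline (sort summary.txt | uniq --count --repeated) lists each repeated line together with its number of occurrences instead.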
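To create a large directory tree for processing, run one of the following sets of commands. Each set leaves the shell inside the newly created directory, so return to the parent directory (cd ..) before creating another tree.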
Very Small (500 files)
mkdir very-small && cd very-small
for dir in $(seq 1 5) ; do mkdir $dir && (cd $dir && (for file in $(seq 1 100) ; do echo $RANDOM > $file ; done) ; cd ..) ; done
Small (25000 files)
mkdir small && cd small
for dir in $(seq 1 5) ; do mkdir $dir && (cd $dir && (for file in $(seq 1 5000) ; do echo $RANDOM > $file ; done) ; cd ..) ; done
Medium (50000 files)
mkdir medium && cd medium
for dir in $(seq 1 5) ; do mkdir $dir && (cd $dir && (for file in $(seq 1 10000) ; do echo $RANDOM > $file ; done) ; cd ..) ; done
Large (100000 files)
mkdir large && cd large
for dir in $(seq 1 10) ; do mkdir $dir && (cd $dir && (for file in $(seq 1 10000) ; do echo $RANDOM > $file ; done) ; cd ..) ; done
Very Large (2000000 files)
mkdir very-large && cd very-large
for dir in $(seq 1 2) ; do mkdir $dir && (cd $dir && (for file in $(seq 1 1000000) ; do echo $RANDOM > $file ; done) ; cd ..) ; done
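The five trees above differ only in their name, the number of subdirectories, and the number of files per subdirectory. The bash sketch below captures that pattern in a single hypothetical make_tree function (not part of the example project). It runs in a subshell, so unlike the commands above it leaves the caller in the original directory.

```shell
# make_tree NAME DIRS FILES  (hypothetical helper)
# Creates NAME/1 .. NAME/DIRS, each holding files 1 .. FILES that contain
# a random number, mirroring the layout of the commands above.
make_tree() {
  local name="$1" dirs="$2" files="$3"
  (
    # Subshell: the cd below does not change the caller's directory.
    mkdir -p "$name" && cd "$name" || exit 1
    for dir in $(seq 1 "$dirs"); do
      mkdir -p "$dir" || exit 1
      for file in $(seq 1 "$files"); do
        echo "$RANDOM" > "$dir/$file"
      done
    done
  )
}
```

For example, make_tree very-small 5 100, make_tree small 5 5000, make_tree medium 5 10000, make_tree large 10 10000, and make_tree very-large 2 1000000 reproduce the five layouts. Because mkdir -p is used, rerunning the function on an existing tree overwrites the files instead of failing.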