
(How to?) Improve performance when parsing many strings in the same format #17

@robin-xyzt-ai


I was wondering whether there is an option to improve performance even further when parsing many strings that all share the same format.
My use case is parsing timestamps from a CSV file with millions of rows, where every timestamp is in the same format.
Ideally, I could tell the parser: "Remember the format you detected for the previous string. I'm pretty sure this string has the same format, so try that one first."
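The "remember the last format" idea can be sketched with plain `java.time` standing in for dateparser's own format detection. This is a hypothetical wrapper, not an existing dateparser API: `CachedFormatParser`, `lastHit`, and the candidate patterns are assumptions chosen for illustration. It keeps the index of the formatter that matched the previous string and tries it first, scanning the full candidate list only when that fails.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

/**
 * Hypothetical helper (not part of dateparser): tries the formatter that
 * succeeded on the previous call first, and falls back to the remaining
 * candidates only when that fails.
 */
final class CachedFormatParser {

    private final List<DateTimeFormatter> candidates;
    // Index of the formatter that matched the previous input; -1 until the first success.
    private int lastHit = -1;

    CachedFormatParser(List<DateTimeFormatter> candidates) {
        this.candidates = List.copyOf(candidates);
    }

    LocalDateTime parse(String text) {
        // Fast path: reuse the format detected for the previous string.
        if (lastHit >= 0) {
            try {
                return LocalDateTime.parse(text, candidates.get(lastHit));
            } catch (DateTimeParseException ignored) {
                // The format changed; fall through to the full search.
            }
        }
        // Slow path: try every other candidate and remember the one that works.
        for (int i = 0; i < candidates.size(); i++) {
            if (i == lastHit) {
                continue; // already tried on the fast path
            }
            try {
                LocalDateTime result = LocalDateTime.parse(text, candidates.get(i));
                lastHit = i;
                return result;
            } catch (DateTimeParseException ignored) {
                // Try the next candidate.
            }
        }
        throw new DateTimeParseException("No candidate format matched", text, 0);
    }

    int lastHit() {
        return lastHit;
    }
}
```

Because a mismatch is signalled by `DateTimeParseException`, the fast path is only cheap while the format stays stable; inputs that keep switching formats pay for at least one exception per call. The wrapper is also not thread-safe, since `lastHit` is a plain mutable field.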

To illustrate, my situation is similar to this benchmark:

```java
package com.github.sisyphsu.dateparser.benchmark;

import com.github.sisyphsu.dateparser.DateParser;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import java.util.Random;
import java.util.concurrent.TimeUnit;

@Warmup(iterations = 2, time = 2)
@BenchmarkMode(Mode.AverageTime)
@Fork(2)
@Measurement(iterations = 3, time = 3)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class MultiSameBenchmark {

    // 10 million timestamps, all in the same "yyyy-MM-dd HH:mm:ss UTC" layout.
    private static final String[] TEXTS = new String[10_000_000];

    static {
        Random random = new Random(123456789L); // fixed seed for reproducible input
        for (int i = 0; i < TEXTS.length; i++) {
            TEXTS[i] = String.format("2020-0%d-1%d 00:%d%d:00 UTC",
                    random.nextInt(8) + 1, // month 1..8
                    random.nextInt(8) + 1, // day 11..18
                    random.nextInt(5),     // minute tens digit 0..4
                    random.nextInt(9));    // minute ones digit 0..8
        }
    }

    @Benchmark
    public void parser(Blackhole bh) {
        DateParser parser = DateParser.newBuilder().build();
        for (String text : TEXTS) {
            // Consume the result so the JIT cannot eliminate the parse as dead code.
            bh.consume(parser.parseDate(text));
        }
    }
}
```
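To estimate how much of the benchmark's cost comes from format detection, one option is to compare against a parser that is told the format up front. The sketch below assumes every input matches the benchmark's `2020-0%d-1%d 00:%d%d:00 UTC` layout; `KnownFormatBaseline` is a hypothetical name, and the trailing `UTC` is matched as a literal rather than parsed as a zone.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;

/** Hypothetical baseline: parses one fixed, known format with no detection step. */
final class KnownFormatBaseline {

    // Assumption: inputs look like "2020-03-14 00:25:00 UTC".
    // DateTimeFormatter is immutable and thread-safe, so one shared instance is enough.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss 'UTC'");

    private KnownFormatBaseline() {
    }

    static Date parseDate(String text) {
        // "UTC" is consumed as literal text, so the zone is attached explicitly here.
        LocalDateTime local = LocalDateTime.parse(text, FORMAT);
        return Date.from(local.toInstant(ZoneOffset.UTC));
    }
}
```

A second `@Benchmark` method could loop over the same `TEXTS`, call `KnownFormatBaseline.parseDate(text)`, and pass each result to a JMH `Blackhole`; the gap between the two timings gives a rough upper bound on what a "remember the last format" option could save.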

Is there already such an option on the parser that I overlooked?

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
