Open
Labels
enhancement, help wanted
Description
I was wondering if there is an option to improve the performance even further when parsing many strings that are all in the same format.
My use case is parsing timestamps from a CSV file that has millions of rows, where every timestamp is in the same format.
It would be ideal if I could just say to the parser: "remember that format you detected for the previous string. I'm pretty sure this string is in the same format, so try that first when parsing this string".
To illustrate, my situation is similar to this benchmark:
    package com.github.sisyphsu.dateparser.benchmark;

    import com.github.sisyphsu.dateparser.DateParser;
    import org.openjdk.jmh.annotations.*;

    import java.util.Random;
    import java.util.concurrent.TimeUnit;

    @Warmup(iterations = 2, time = 2)
    @BenchmarkMode(Mode.AverageTime)
    @Fork(2)
    @Measurement(iterations = 3, time = 3)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public class MultiSameBenchmark {

        // Ten million timestamps that all share one format, e.g. "2020-03-14 00:25:00 UTC".
        private static final String[] TEXTS;

        static {
            Random random = new Random(123456789L); // fixed seed for reproducible input
            TEXTS = new String[10000000];
            for (int i = 0; i < TEXTS.length; i++) {
                TEXTS[i] = String.format("2020-0%d-1%d 00:%d%d:00 UTC",
                        random.nextInt(8) + 1,
                        random.nextInt(8) + 1,
                        random.nextInt(5),
                        random.nextInt(9));
            }
        }

        @Benchmark
        public void parser() {
            // A single parser instance is reused for every string.
            DateParser parser = DateParser.newBuilder().build();
            for (String text : TEXTS) {
                parser.parseDate(text);
            }
        }
    }

Is there already such an option on the parser that I overlooked?
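
To make the request more concrete, here is a minimal sketch of the "try the previously detected format first" idea. It is written against plain `java.time`, not against this library, so it says nothing about how `DateParser` works internally. The class name `LastFormatCache`, its methods, and the explicit pattern list are all hypothetical and only there for illustration; the sketch is also not thread-safe.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of any library): remembers which format
// matched the previous input and tries that one first on the next input.
final class LastFormatCache {
    private final List<DateTimeFormatter> candidates = new ArrayList<>();
    private DateTimeFormatter last;   // format that matched the previous string, if any
    private int fullSearches = 0;     // how often the slow path over all candidates ran

    LastFormatCache(String... patterns) {
        for (String pattern : patterns) {
            candidates.add(DateTimeFormatter.ofPattern(pattern));
        }
    }

    LocalDateTime parse(String text) {
        // Fast path: assume this string has the same format as the previous one.
        if (last != null) {
            try {
                return LocalDateTime.parse(text, last);
            } catch (DateTimeParseException ignored) {
                // The format changed; fall through to the full search.
            }
        }
        // Slow path: try every known format and remember the one that matches.
        fullSearches++;
        for (DateTimeFormatter candidate : candidates) {
            try {
                LocalDateTime result = LocalDateTime.parse(text, candidate);
                last = candidate;
                return result;
            } catch (DateTimeParseException ignored) {
                // Not this format; try the next candidate.
            }
        }
        throw new IllegalArgumentException("No known format matches: " + text);
    }

    int getFullSearches() {
        return fullSearches;
    }
}
```

With input like my CSV, only the first row would pay for the format search; every later row would go straight through the remembered format, and a row in a different format would simply fall back to the full search.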