
cat: Improve performance on Linux #1978

Merged
sylvestre merged 6 commits into uutils:master from ArniDagur:use-splice2 on Apr 1, 2021

@ArniDagur (Contributor) commented Mar 30, 2021

This PR should improve cat performance, especially on Linux. It does this by:

  • Using generics, which are monomorphized at compile time, rather than trait objects (Box<dyn Trait>).
  • Using splice(2) on Linux.
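To illustrate the first bullet, here is a minimal sketch of the two styles. It is not the code from this PR: `open_boxed`, `cat_path` and `write_all_from` are hypothetical names, and `io::copy` merely stands in for whatever copy loop cat actually uses. With `Box<dyn Read>`, both branches must produce the same erased type and every `read` goes through a vtable; with a generic function, the compiler emits one specialized copy per concrete reader type, so calls can be resolved statically and inlined.

```rust
use std::fs::File;
use std::io::{self, Read, Write};

/// Before (trait objects): both branches must yield the same type, so the
/// concrete reader (`Stdin` or `File`) is erased behind `Box<dyn Read>`.
/// Every `read` made through the box is dispatched at run time via a vtable.
pub fn open_boxed(path: &str) -> io::Result<Box<dyn Read>> {
    let reader: Box<dyn Read> = if path == "-" {
        Box::new(io::stdin())
    } else {
        Box::new(File::open(path)?)
    };
    Ok(reader)
}

/// After (monomorphization): each branch calls the generic function with its
/// own concrete type, so `write_all_from` is compiled once for `StdinLock`
/// and once for `File`.
pub fn cat_path<W: Write>(path: &str, out: &mut W) -> io::Result<u64> {
    if path == "-" {
        let stdin = io::stdin();
        let mut lock = stdin.lock();
        write_all_from(&mut lock, out)
    } else {
        let mut file = File::open(path)?;
        write_all_from(&mut file, out)
    }
}

/// Generic copy; `io::copy` is only a stand-in for cat's real copy loop.
pub fn write_all_from<R: Read, W: Write>(reader: &mut R, out: &mut W) -> io::Result<u64> {
    io::copy(reader, out)
}
```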

This PR supersedes #1289.

Unscientific benchmarks

GNU cat

[arni][~/src/coreutils][use-splice2]% cat /tmp/bigfile | pv -r > /dev/null
[4.47GiB/s]
[arni][~/src/coreutils][use-splice2]% cat /tmp/bigfile | pv -r > /dev/null
[4.68GiB/s]
[arni][~/src/coreutils][use-splice2]% cat /tmp/bigfile | pv -r > /dev/null
[4.65GiB/s]

Before removing the trait object (Box<dyn Trait>), without the Linux specialization

[arni][~/src/coreutils][use-splice2]% target/release/cat /tmp/bigfile | pv -r > /dev/null
[3.41GiB/s]
[arni][~/src/coreutils][use-splice2]% target/release/cat /tmp/bigfile | pv -r > /dev/null
[3.40GiB/s]
[arni][~/src/coreutils][use-splice2]% target/release/cat /tmp/bigfile | pv -r > /dev/null
[3.52GiB/s]

After removing the trait object, without the Linux specialization

This is the throughput you should expect on non-Linux platforms.

[arni][~/src/coreutils][use-splice2]% target/release/cat /tmp/bigfile | pv -r > /dev/null
[7.76GiB/s]
[arni][~/src/coreutils][use-splice2]% target/release/cat /tmp/bigfile | pv -r > /dev/null
[5.67GiB/s]
[arni][~/src/coreutils][use-splice2]% target/release/cat /tmp/bigfile | pv -r > /dev/null
[6.61GiB/s]

Linux specialization

[arni][~/src/coreutils][use-splice2]% target/release/cat /tmp/bigfile | pv -r > /dev/null
[29.6GiB/s]
--- 22:33:36 ---
[arni][~/src/coreutils][use-splice2]% target/release/cat /tmp/bigfile | pv -r > /dev/null
[14.4GiB/s]
--- 22:33:37 ---
[arni][~/src/coreutils][use-splice2]% target/release/cat /tmp/bigfile | pv -r > /dev/null
[22.2GiB/s]

On my 2020 MacBook Pro, not using io::copy is around 25% faster.
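The PR text does not say what replaced io::copy. One plausible shape, sketched below under assumptions, is a hand-written read/write loop over a larger buffer, which reduces the number of read and write system calls per byte copied. `copy_manual` and the 64 KiB `BUF_SIZE` are hypothetical choices for illustration, not values taken from this PR.

```rust
use std::io::{self, Read, Write};

/// Hypothetical buffer size (64 KiB); the PR does not state one.
const BUF_SIZE: usize = 64 * 1024;

/// Copies `reader` to `writer` with an explicit read/write loop instead of
/// `std::io::copy`. Returns the number of bytes copied.
pub fn copy_manual<R: Read, W: Write>(reader: &mut R, writer: &mut W) -> io::Result<u64> {
    // Heap-allocate the buffer so a large size does not bloat the stack.
    let mut buf = vec![0u8; BUF_SIZE];
    let mut total: u64 = 0;
    loop {
        match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => {
                writer.write_all(&buf[..n])?;
                total += n as u64;
            }
            // A signal interrupted the read before any data arrived: retry.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    writer.flush()?;
    Ok(total)
}
```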
@ArniDagur changed the title from "cat: Improve performance, especially on Linux" to "cat: Improve performance on Linux" on Mar 31, 2021

@Arcterus (Collaborator) left a comment

Looks pretty good. In the future, I think we'll want to extract the splice() bits into a function or a type that serves as both a reader and a writer and throw it into uucore, but that can be done later.
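Below is a rough sketch of what the reviewer's suggested helper might look like; it is not the PR's code or an existing uucore API. splice(2) requires one of its two file descriptors to be a pipe, so the hypothetical `splice_copy` moves data from the input into an intermediate pipe and from the pipe into the output, without copying it through a user-space buffer. It is Linux-only, declares `pipe` and `splice` through raw FFI (an assumption; a real implementation might use the `libc` or `nix` crates instead), and assumes a 64 KiB chunk size.

```rust
use std::fs::File;
use std::io;
use std::os::raw::{c_int, c_uint};
use std::os::unix::io::{AsRawFd, FromRawFd, RawFd};

// Assumption: raw declarations matching the Linux (glibc/musl) prototypes.
extern "C" {
    fn pipe(fds: *mut c_int) -> c_int;
    fn splice(
        fd_in: c_int,
        off_in: *mut i64,
        fd_out: c_int,
        off_out: *mut i64,
        len: usize,
        flags: c_uint,
    ) -> isize;
}

/// Hypothetical chunk size: 64 KiB, the default Linux pipe capacity.
const CHUNK: usize = 64 * 1024;

/// One splice(2) call with NULL offsets (each fd's own file position is used
/// and advanced), retried if interrupted by a signal.
fn splice_once(from: RawFd, to: RawFd, len: usize) -> io::Result<usize> {
    loop {
        let n = unsafe { splice(from, std::ptr::null_mut(), to, std::ptr::null_mut(), len, 0) };
        if n >= 0 {
            return Ok(n as usize);
        }
        let err = io::Error::last_os_error();
        if err.kind() != io::ErrorKind::Interrupted {
            return Err(err);
        }
    }
}

/// Hypothetical helper: copies all of `input` to `output` via an intermediate
/// pipe, so neither end has to be a pipe itself.
pub fn splice_copy(input: &impl AsRawFd, output: &impl AsRawFd) -> io::Result<u64> {
    let mut fds: [c_int; 2] = [0; 2];
    if unsafe { pipe(fds.as_mut_ptr()) } != 0 {
        return Err(io::Error::last_os_error());
    }
    // Wrap both pipe ends in `File` so they are closed on every return path.
    let (pipe_rd, pipe_wr) = unsafe { (File::from_raw_fd(fds[0]), File::from_raw_fd(fds[1])) };
    let mut total: u64 = 0;
    loop {
        // Input into the pipe.
        let n = splice_once(input.as_raw_fd(), pipe_wr.as_raw_fd(), CHUNK)?;
        if n == 0 {
            return Ok(total); // end of input
        }
        // Pipe into the output: drain exactly what was just written.
        let mut left = n;
        while left > 0 {
            let m = splice_once(pipe_rd.as_raw_fd(), output.as_raw_fd(), left)?;
            if m == 0 {
                return Err(io::Error::new(io::ErrorKind::WriteZero, "splice wrote 0 bytes"));
            }
            left -= m;
        }
        total += n as u64;
    }
}
```

A caller would still need a fallback: splice fails with EINVAL when, for example, the output file is opened in append mode or its file system does not support splicing, in which case cat can drop back to an ordinary read/write loop.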


3 participants