Use splice syscall on Linux to greatly improve cat performance#1289

Closed
ArniDagur wants to merge 6 commits into uutils:master from ArniDagur:use-splice

Conversation

@ArniDagur
Contributor

This is what mre/fcat does to great effect (3x faster than GNU cat). Unfortunately, when testing the source branch, I get the same error as described in mre/fcat#12, so there are definitely bugs that need to be dealt with; I have therefore labelled this PR a work in progress.

If someone here could test the source branch on their machine to see if they get the same error, do code review, or give other feedback I would very much appreciate it :)

@Arcterus
Collaborator

Based on what I'm seeing in that thread, I suspect it might be due to some kernel setting (or combination of kernel settings). Have you tested with different shells? I'm not using Linux at the moment, so I can't check myself until later this week.

@ArniDagur
Contributor Author

@Arcterus It turns out that the problem lay in my use of the following two lines in my ~/.bashrc:

eval $(keychain --agents gpg,ssh --quiet --eval id_rsa)
export GPG_TTY=$(tty)

I intend to file an issue with the keychain people soon, as this could be a preventable bug.

Fastcat and my not-so-great tee clone now work as intended, and on the topic of this pull request, a quick test yields positive results:

Source branch (this pull request):

[arni: coreutils (use-splice)]$ target/release/cat /tmp/largefile | pv -r > /dev/null
[10.8GiB/s]
[arni: coreutils (use-splice)]$ target/release/cat /tmp/largefile | pv -r > /dev/null
[11.2GiB/s]
[arni: coreutils (use-splice)]$ target/release/cat /tmp/largefile | pv -r > /dev/null
[11.0GiB/s]

Master:

[arni: coreutils (master)]$ target/release/cat /tmp/largefile | pv -r > /dev/null
[3.94GiB/s]
[arni: coreutils (master)]$ target/release/cat /tmp/largefile | pv -r > /dev/null
[3.96GiB/s]
[arni: coreutils (master)]$ target/release/cat /tmp/largefile | pv -r > /dev/null
[3.87GiB/s]

Please follow up later this week when you get back to your Linux machine :)

@ArniDagur
Contributor Author

@Arcterus The unit tests pass for me. Do you mind taking a look at this PR?

@ArniDagur ArniDagur changed the title WIP: Use splice syscall on Linux to greatly improve cat performance Use splice syscall on Linux to greatly improve cat performance Jan 26, 2019
Collaborator

@Arcterus Arcterus left a comment

I think this is good. It might be nice to add a feature letting the user just use the standard behavior on Linux as well (it's not necessary though).

@ArniDagur
Contributor Author

@Arcterus Does this PR look better to you now?

Collaborator

@Arcterus Arcterus left a comment

If you have time, I'd be interested in adding a reader/writer like this into uucore.

Árni Dagur added 2 commits May 18, 2019 12:49
* Use `io::copy` as `write_fast` fallback.
* Document `write_fast_using_splice` with Rustdoc comment.
* Inline `write_fast_using_splice`.
* Ignore all `splice` related errors.
src/cat/cat.rs Outdated
},
// If we're not on Linux or Android, or the splice() call failed,
// fall back on slower writing.
io::copy(&mut handle.reader, &mut writer_handle).unwrap();
Collaborator

The old implementation was actually incorrect in that this should not panic/cause an error. It should write out the error message and then bump the error count like in the Err(error) clause below.
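A sketch of what the reviewer is asking for (the helper name is hypothetical, not the PR's code): when the io::copy fallback fails, report the error and bump an error count rather than calling unwrap() and panicking.

```rust
use std::io::{self, Read, Write};

/// Copy reader to writer; on failure, print the error and count it
/// instead of panicking, matching cat's per-file error handling.
fn copy_counting_errors<R: Read, W: Write>(
    reader: &mut R,
    writer: &mut W,
    error_count: &mut usize,
) {
    if let Err(error) = io::copy(reader, writer) {
        eprintln!("cat: {}", error);
        *error_count += 1;
    }
}

fn main() {
    let mut input: &[u8] = b"hello";
    let mut output: Vec<u8> = Vec::new();
    let mut errors = 0usize;
    copy_counting_errors(&mut input, &mut output, &mut errors);
    assert_eq!(output, b"hello");
    assert_eq!(errors, 0);
}
```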

Collaborator

@Arcterus Arcterus left a comment

Sorry for the late response! If you don't want to make these changes, at least make issues for them so we can keep track.

@@ -456,7 +518,8 @@ fn write_nonprint_to_end<W: Write>(in_buf: &[u8], writer: &mut W, tab: &[u8]) ->
128...159 => writer.write_all(&[b'M', b'-', b'^', byte - 64]),
Collaborator

Can you change this to use byte literals (and maybe fix the other issues mentioned by the person on IRC)?
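One possible shape for that request (the helper name is hypothetical): byte literals for the match and the modern inclusive-range syntax `..=` in place of the deprecated `...`.

```rust
use std::io::Write;

/// Render one byte in cat -v's meta notation: bytes 128..=159 become M-^X.
fn nonprint_repr(byte: u8) -> Vec<u8> {
    let mut out = Vec::new();
    match byte {
        0x80..=0x9F => out.write_all(&[b'M', b'-', b'^', byte - 64]).unwrap(),
        _ => out.write_all(&[byte]).unwrap(),
    }
    out
}

fn main() {
    assert_eq!(nonprint_repr(128), b"M-^@"); // 128 - 64 == b'@'
    assert_eq!(nonprint_repr(b'a'), b"a");
}
```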

return Ok(InputHandle {
#[cfg(any(target_os = "linux", target_os = "android"))]
file_descriptor: stdin.as_raw_fd(),
reader: Box::new(stdin) as Box<Read>,
Collaborator

I mentioned this on the other PR as well, but could you change this to avoid Box?

@the8472

the8472 commented May 12, 2020

For the specific case of catting from regular file(s) to a regular file one could also try copy_file_range, which will do the same as splice but with fewer syscalls. The generic splice approach is only supported since 5.3 though.

@rivy rivy force-pushed the master branch 4 times, most recently from f07992e to 813e57d Compare June 15, 2020 04:39
@the8472

the8472 commented Mar 9, 2021

Note that from Rust 1.50 onwards, the standard library's io::copy is specialized on Linux to use better syscalls when you pass files as reader and writer. rust-lang/rust#75272

You'll probably get the same benefit by simply switching to io::copy as the default strategy.
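That suggestion reduces to very little code. A minimal sketch (the helper name is hypothetical): since the specialization kicks in inside io::copy itself when both ends are file descriptors, the caller just writes the generic version. Copying into a Vec, as in the demo below, takes the generic path; the demo only shows the API shape.

```rust
use std::fs::File;
use std::io;

/// cat one file into any writer; on Linux with fd-backed ends,
/// io::copy itself picks copy_file_range/sendfile/splice.
fn cat_path(path: &std::path::Path, out: &mut impl io::Write) -> io::Result<u64> {
    let mut file = File::open(path)?;
    io::copy(&mut file, out)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("io_copy_demo.txt");
    std::fs::write(&path, b"fast path")?;
    let mut out = Vec::new();
    let n = cat_path(&path, &mut out)?;
    assert_eq!(n, 9);
    assert_eq!(out, b"fast path");
    std::fs::remove_file(&path).ok();
    Ok(())
}
```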

@sylvestre
Contributor

@ArniDagur As wc landed, do you want to finish this one too? :)

@ArniDagur
Contributor Author

New PR: #1978

@ArniDagur ArniDagur closed this Mar 30, 2021