This run failed when one of the DocumentationQueryTests displayed unexpected behavior: https://github.com/FoundationDB/fdb-record-layer/actions/runs/21530330669
From the logs included in the test reports:
2026-01-30 22:08:57,721 [INFO] c.a.f.r.y.s.ExternalServer - Started external server err_file="/home/runner/work/fdb-record-layer/fdb-record-layer/yaml-tests/.out/tmp-test/fdb-relational-server-!current_version-1111-err.1754494390310835624.log" grpc_port="1111" http_port="1112" jar="/home/runner/work/fdb-record-layer/fdb-record-layer/yaml-tests/.out/externalServer/fdb-relational-server-4.9.6.0-all.jar" out_file="/home/runner/work/fdb-record-layer/fdb-record-layer/yaml-tests/.out/tmp-test/fdb-relational-server-!current_version-1111-out.10676378663490580244.log" version="!current_version" {}
2026-01-30 22:09:00,742 [INFO] c.a.f.r.y.s.ExternalServer - Started external server err_file="/home/runner/work/fdb-record-layer/fdb-record-layer/yaml-tests/.out/tmp-test/fdb-relational-server-4.9.4.0-1111-err.4689561780696515867.log" grpc_port="1111" http_port="1112" jar="/home/runner/work/fdb-record-layer/fdb-record-layer/yaml-tests/.out/externalServer/fdb-relational-server-4.9.4.0-all.jar" out_file="/home/runner/work/fdb-record-layer/fdb-record-layer/yaml-tests/.out/tmp-test/fdb-relational-server-4.9.4.0-1111-out.4538351290628074477.log" version="4.9.4.0" {}
It looks like the 4.9.6.0 and 4.9.4.0 servers were both started on port 1111. We then hit an error that looked as if a request targeting 4.9.4.0 had actually gone to 4.9.6.0. That would make sense if, say, the 4.9.4.0 server was never actually able to start because the port it was supposed to listen on was already in use.
The files covered by YamlIntegrationTests ran at around that same time, and those all succeeded. For those tests, we see:
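To illustrate the suspected failure mode, here is a minimal sketch of a check that would notice a port is already taken before a second server is launched on it. This is not the `ExternalServer` code from the test framework: `PortCheck` and `isPortAvailable` are hypothetical names, and the sketch assumes the servers bind on the loopback interface.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class PortCheck {
    /**
     * Hypothetical helper: returns true if a listener can currently bind to the
     * given port on the loopback interface, false if the bind fails (for example
     * because another process is already listening there).
     */
    public static boolean isPortAvailable(int port) {
        try (ServerSocket probe = new ServerSocket(port, 1, InetAddress.getLoopbackAddress())) {
            // The probe socket is closed again as soon as this block exits.
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hold an ephemeral port open, then probe it while it is still in use.
        try (ServerSocket first = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            int port = first.getLocalPort();
            System.out.println("port " + port + " available while held: " + isPortAvailable(port));
        }
    }
}
```

A check like this is inherently racy (the port can be taken between the probe and the real bind), so it would only narrow the window rather than close it.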
2026-01-30 22:10:37,979 [INFO] c.a.f.r.y.s.ExternalServer - Started external server err_file="/home/runner/work/fdb-record-layer/fdb-record-layer/yaml-tests/.out/tmp-test/fdb-relational-server-!current_version-1111-err.9034997590691910401.log" grpc_port="1111" http_port="1112" jar="/home/runner/work/fdb-record-layer/fdb-record-layer/yaml-tests/.out/externalServer/fdb-relational-server-4.9.6.0-all.jar" out_file="/home/runner/work/fdb-record-layer/fdb-record-layer/yaml-tests/.out/tmp-test/fdb-relational-server-!current_version-1111-out.2053876967657620012.log" version="!current_version" {}
2026-01-30 22:10:40,982 [INFO] c.a.f.r.y.s.ExternalServer - Started external server err_file="/home/runner/work/fdb-record-layer/fdb-record-layer/yaml-tests/.out/tmp-test/fdb-relational-server-4.9.4.0-1113-err.18149555724328578421.log" grpc_port="1113" http_port="1114" jar="/home/runner/work/fdb-record-layer/fdb-record-layer/yaml-tests/.out/externalServer/fdb-relational-server-4.9.4.0-all.jar" out_file="/home/runner/work/fdb-record-layer/fdb-record-layer/yaml-tests/.out/tmp-test/fdb-relational-server-4.9.4.0-1113-out.10891905594048222494.log" version="4.9.4.0" {}
That is, the two versions were allocated separate ports (1111 and 1113). That is consistent with the failure being caused by the port collision: when the ports are allocated correctly, the tests succeed.
Unfortunately, we don't have any more server logs from this run, so we don't have more details as to why the builds failed. Prior to this build, we didn't even log the ports, so it's hard to say conclusively whether this explains other similar problems we saw during release, but it seems likely. Since then, the failure has not recurred, though we did add additional validation of the server version (see #3906). I did try to see whether setting the port to a static 1111 would reproduce the failure, but it instead led to initialization errors (see #3928). So while I'm fairly confident that a port collision is consistent with the strange behavior we saw, I suppose there is still a little bit of wiggle room remaining.
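For reference, one common way to hand out non-colliding ports is to ask the OS for ephemeral ports and keep every probe socket open until all of them have been chosen. The sketch below is only an assumption about how that could look, not how the test framework actually allocates ports; `PortAllocator` and `allocateDistinctPorts` are hypothetical names.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

public class PortAllocator {
    /**
     * Hypothetical helper: asks the OS for `count` ephemeral ports on loopback.
     * All probe sockets stay open until every port has been picked, so the OS
     * cannot hand out the same port twice within a single call.
     */
    public static List<Integer> allocateDistinctPorts(int count) throws IOException {
        List<ServerSocket> held = new ArrayList<>();
        List<Integer> ports = new ArrayList<>();
        try {
            for (int i = 0; i < count; i++) {
                // Port 0 means "let the OS choose a free port".
                ServerSocket socket = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
                held.add(socket);
                ports.add(socket.getLocalPort());
            }
        } finally {
            // Release the ports so the real servers can bind to them.
            for (ServerSocket socket : held) {
                socket.close();
            }
        }
        return ports;
    }

    public static void main(String[] args) throws IOException {
        // Two servers, each needing a gRPC port and an HTTP port.
        List<Integer> ports = allocateDistinctPorts(4);
        System.out.println("allocated ports: " + ports);
    }
}
```

The ports are released before the servers start, so another process could still grab one in between; validating the server version after startup, as in #3906, remains a useful backstop.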