
[Perf] Refactor performance test for different kv store backends#52

Merged
0oshowero0 merged 30 commits intoAscend:mainfrom
tianyi-ge:feat/perftest-refactor
Mar 28, 2026

Conversation

@tianyi-ge
Contributor

@tianyi-ge tianyi-ge commented Mar 19, 2026

Description

  1. Support different KV store backends.
  2. Support intra-node and inter-node client placement for Yuanrong.
  3. Output results to CSV.
  4. Remove the non-tensor part when creating the complex test case.
  5. Remove the Ray bandwidth test.
  6. Add a README for the perf test.
  7. Run each test 3 times to mitigate variance (with warmup).
  8. Use the KV client to simplify usage.
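
Item 7 (repeat runs with a warmup) can be sketched as follows; `benchmark`, `run_once`, and the warmup count are illustrative, not the PR's actual code:

```python
import statistics
import time


def benchmark(run_once, num_iterations=3, num_warmup=1):
    """Time `run_once` several times, excluding warmup runs from the stats.

    The warmup runs absorb one-off costs (connection setup, allocator
    growth) so the reported numbers reflect steady-state throughput.
    """
    for _ in range(num_warmup):
        run_once()
    timings = []
    for _ in range(num_iterations):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return {"mean_s": statistics.mean(timings), "stdev_s": statistics.pstdev(timings)}
```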

Usage

usage: perftest.py [-h] --backend_config BACKEND_CONFIG [--backend BACKEND] [--device {cpu,npu,gpu}] [--global_batch_size GLOBAL_BATCH_SIZE] [--field_num FIELD_NUM]
                   [--seq_len SEQ_LEN] [--num_test_iterations NUM_TEST_ITERATIONS] --head_node_ip HEAD_NODE_IP [--worker_node_ip WORKER_NODE_IP]
                   [--output_csv OUTPUT_CSV] [--use_complex_case]

TransferQueue Throughput Test

options:
  -h, --help            show this help message and exit
  --backend_config BACKEND_CONFIG
                        Path to backend config YAML file
  --backend BACKEND     Override storage_backend in config (e.g. SimpleStorage, Yuanrong, MooncakeStore)
  --device {cpu,npu,gpu}
                        Device to use (default: cpu)
  --global_batch_size GLOBAL_BATCH_SIZE
                        Global batch size (default: 1024)
  --field_num FIELD_NUM
                        Number of fields (default: 10)
  --seq_len SEQ_LEN     Sequence length (default: 8192)
  --num_test_iterations NUM_TEST_ITERATIONS
                        Number of test iterations (default: 4)
  --head_node_ip HEAD_NODE_IP
                        Head node IP address
  --worker_node_ip WORKER_NODE_IP
                        Worker node IP address (required for Yuanrong)
  --output_csv OUTPUT_CSV
                        Path to output CSV file (optional)
  --use_complex_case    Use complex test case with nested tensors and nontensor fields (default: False, simple case)
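
The options above map directly onto an argparse parser. A sketch consistent with the help text (defaults and choices are taken from the usage output; the builder function name is hypothetical):

```python
import argparse


def build_parser():
    """Build a parser matching the usage text above (a sketch, not the PR's exact code)."""
    p = argparse.ArgumentParser(description="TransferQueue Throughput Test")
    p.add_argument("--backend_config", required=True, help="Path to backend config YAML file")
    p.add_argument("--backend", help="Override storage_backend in config")
    p.add_argument("--device", choices=["cpu", "npu", "gpu"], default="cpu")
    p.add_argument("--global_batch_size", type=int, default=1024)
    p.add_argument("--field_num", type=int, default=10)
    p.add_argument("--seq_len", type=int, default=8192)
    p.add_argument("--num_test_iterations", type=int, default=4)
    p.add_argument("--head_node_ip", required=True)
    p.add_argument("--worker_node_ip", help="Required for Yuanrong")
    p.add_argument("--output_csv", help="Path to output CSV file (optional)")
    p.add_argument("--use_complex_case", action="store_true")
    return p
```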

closes #51

@ascend-robot

CLA Signature Guide

@tianyi-ge , thanks for your pull request.

The following commit(s) are not associated with a signed Contributor License Agreement (CLA).

Commit Reason
1e71d496 refactor perftest 1. support dif... the email used in the commit is not linked to a signed CLA!
please verify that it matches the email you used when signing the CLA.

To sign CLA, click here.

To check if your email is configured correctly, refer to the FAQs.

Once you've signed the CLA or updated your email, please comment /check-cla to revalidate the CLA status.

@tianyi-ge tianyi-ge force-pushed the feat/perftest-refactor branch from 1e71d49 to c8e038d Compare March 20, 2026 01:05
@ascend-robot

CLA Signature Guide

@tianyi-ge , thanks for your pull request.

The following commit(s) are not associated with a signed Contributor License Agreement (CLA).

Commit Reason
c8e038dd refactor perftest 1. support dif... the email used in the commit is not linked to a signed CLA!
please verify that it matches the email you used when signing the CLA.

To sign CLA, click here.

To check if your email is configured correctly, refer to the FAQs.

Once you've signed the CLA or updated your email, please comment /check-cla to revalidate the CLA status.

@tianyi-ge
Contributor Author

/check-cla

@ascend-robot

CLA Signature Pass

tianyi-ge, thanks for your pull request. All authors of the commits have signed the CLA. 👍

Comment on lines +323 to +324
if self.device in ["npu", "gpu"]:
device_resource = {self.device: 1}
Contributor

should device be uppercase?

Contributor Author

Yes, I fixed it, and handled the gpu case as well.
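
The fix discussed here amounts to uppercasing the resource key when requesting an accelerator from Ray. Ray's built-in GPU resource is spelled "GPU"; "NPU" as a custom resource is an assumption about the Ascend setup. A minimal sketch:

```python
def device_resources(device):
    """Map a lowercase --device flag to a Ray resource request (a sketch).

    Ray's built-in accelerator resource is uppercase ("GPU"); "NPU" is
    assumed to be registered as a custom resource on Ascend nodes.
    """
    if device in ("npu", "gpu"):
        return {device.upper(): 1}
    return {}  # cpu: no accelerator resource needed
```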

Comment on lines +136 to +137
self.test_data, self.total_data_size_gb = create_complex_test_case(batch_size, seq_length, field_num, device)
return list(self.test_data.keys()), self.total_data_size_gb
Contributor

Is create_complex_test_case placed inside TQClientActor to avoid an extra copy?

Contributor Author

Yes. This way we don't need to pass the large dataset from the driver to the writer.

Comment on lines +335 to +339
# Initialize storage managers
logger.info(f"Using {self.manager_type} as storage backend.")

w = self.writer.initialize_storage_manager.remote(manager_type=self.manager_type, config=self.writer_config)
r = self.reader.initialize_storage_manager.remote(manager_type=self.manager_type, config=self.reader_config)
Contributor

Perhaps we should move initialize_storage_manager to TQClientActor.__init__

Contributor Author

I think it's fine so far

Comment on lines +82 to +87
### Inter-node test with yuanrong backend
```bash
python perftest.py --backend=yuanrong --client_placement=inter_node \
--backend_config=configs/yuanrong.yaml \
--head_node_ip=192.168.0.1 --worker_node_ip=192.168.0.2
```
Contributor

Yuanrong needs some pre-operations, like starting etcd and the datasystem.

Contributor Author

It's briefly described in the prerequisites; this doc focuses on perftest usage.

Comment on lines +38 to +41
| `--backend` | Backend type: default, yuanrong, mooncake | default |
| `--client_placement` | Client placement: intra_node or inter_node | intra_node |
| `--backend_config` | Path to YAML config file (optional) | None |
| `--device` | Device: cpu, npu, gpu | cpu |
Contributor

We should list which devices each backend supports.

Contributor Author

I've described it below



| Argument | Description | Default |
|----------|-------------|---------|
| `--backend` | Backend type: default, yuanrong, mooncake | default |
Collaborator

For --backend, we should also align with the main config names: SimpleStorage, Yuanrong, and MooncakeStore.

--backend=[default|yuanrong|mooncake] \
--client_placement=[intra_node|inter_node] \
--backend_config=xxx.yaml \
--device=[cpu|npu|gpu] \
Collaborator

Do we have to distinguish npu and gpu?

Contributor Author

It's used to determine the target devices of tensors during creation, like device="cuda:0" or device="npu:0"
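
Concretely, the flag selects the tensor's target device string at creation time: "gpu" maps to CUDA ("cuda:0") and "npu" to Ascend ("npu:0"). A sketch of that mapping (the helper name is hypothetical, not the PR's code):

```python
def torch_device_string(device, index=0):
    """Translate the --device flag into a torch device string (a sketch)."""
    if device == "gpu":
        return f"cuda:{index}"  # CUDA devices are addressed as "cuda:N"
    if device == "npu":
        return f"npu:{index}"   # Ascend devices are addressed as "npu:N"
    return "cpu"
```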


## Examples

### Intra-node test with default backend
Collaborator

We may need to explain what's the expected behavior for each kind of backend.

# Tensor field
tensor_data = torch.randn(batch_size, seq_length, dtype=torch.float32, device=torch_device)
fields[field_name] = tensor_data
else:
Collaborator

I'm thinking of deleting the non-tensor part. As a performance test, we cannot cover every scenario; we can just illustrate tensor performance and let users implement and test their own cases.

"""Ray actor that holds a TransferQueueClient."""

def __init__(self, client_id: str, controller_info: Any):
self.client = TransferQueueClient(
Collaborator

Use tq.init(config)?

"""Put data to storage."""
self.client.put(data=self.test_data, partition_id=partition_id)

def get_meta(
Collaborator

I think we'd better illustrate the high-level KV API to reduce the cost of understanding TQ.

Comment on lines +227 to +236
def _get_manager_type(self) -> str:
"""Get the storage manager type based on backend."""
if self.backend == "default":
return "AsyncSimpleStorageManager"
elif self.backend == "yuanrong":
return "YuanrongStorageManager"
elif self.backend == "mooncake":
return "MooncakeStorageManager"
else:
raise ValueError(f"Unknown backend: {self.backend}")
Collaborator

Just align with the main config
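
Aligning the CLI values with the main config names would reduce _get_manager_type to a lookup keyed on SimpleStorage / Yuanrong / MooncakeStore. A sketch (the manager class names are taken from the snippet above; the lookup itself is illustrative):

```python
# Config-style backend name -> storage manager class name
MANAGER_TYPES = {
    "SimpleStorage": "AsyncSimpleStorageManager",
    "Yuanrong": "YuanrongStorageManager",
    "MooncakeStore": "MooncakeStorageManager",
}


def get_manager_type(backend):
    """Look up the storage manager class for a config-style backend name."""
    try:
        return MANAGER_TYPES[backend]
    except KeyError:
        raise ValueError(f"Unknown backend: {backend}") from None
```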


self.data_system_storage_units = {}

if storage_unit_placement == "remote":
Collaborator

For the SimpleStorage backend, we can just illustrate the common use case where we distribute data across all the nodes. We can provide another script to manually validate bandwidth efficiency.

def _initialize_clients(self) -> None:
"""Initialize writer and reader TQClientActors."""
# Determine node placement
if self.client_placement == "intra_node":
Collaborator

Should we preserve this config, since:

  1. We only demonstrate the normal usage for the SimpleStorage backend
  2. It doesn't affect MooncakeStore
  3. For Yuanrong, only inter_node is reasonable since it prefers local storage by default?
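
For reference, the placement decision can be expressed with Ray custom resources of the form node:&lt;ip&gt; (the same labels used by the ray start commands later in this thread). The helper below is a sketch; which client moves to the worker node under inter_node is my assumption, not the PR's code:

```python
def client_node_resources(placement, head_ip, worker_ip=None):
    """Return (writer_label, reader_label) Ray custom-resource names (a sketch).

    intra_node pins both clients to the head node; inter_node places the
    reader on the worker node (an assumption about the PR's behavior).
    """
    head = f"node:{head_ip}"
    if placement == "intra_node":
        return head, head
    if placement == "inter_node":
        if worker_ip is None:
            raise ValueError("inter_node placement requires a worker node IP")
        return head, f"node:{worker_ip}"
    raise ValueError(f"Unknown placement: {placement}")
```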


ray start --head --resources='{"node:192.168.0.1":1}'

# On worker node
ray start --address=192.168.0.1 --resources='{"node:192.168.0.2":1}'
Collaborator

Need port

Collaborator

I will modify it later


1. support different kv backends
2. support intra-node and inter-node client placement
3. remove ray bandwidth test

Signed-off-by: tianyi-ge <tianyig@outlook.com>
Signed-off-by: tianyi-ge <tianyig@outlook.com>

2. remove delete time stats

Signed-off-by: tianyi-ge <tianyig@outlook.com>

@tianyi-ge
Contributor Author

dscli start --cpunodebind 6 --localalloc --timeout 600 -w --worker_address 10.170.27.237:31501 --etcd_address 10.170.27.237:2379 --arena_per_tenant 1 --max_log_size 1024 --max_log_file_num 10 --node_timeout_s 600 --node_dead_timeout_s 1800 --enable_fallocate false --enable_worker_worker_batch_get true --shared_memory_populate true --shared_memory_size_mb 40960 --remote_h2d_device_ids "0" --enable_huge_tlb true
dscli start --cpunodebind 6 --localalloc --timeout 600 -w --worker_address 10.170.27.158:31501 --etcd_address 10.170.27.237:2379 --arena_per_tenant 1 --max_log_size 1024 --max_log_file_num 10 --node_timeout_s 600 --node_dead_timeout_s 1800 --enable_fallocate false --enable_worker_worker_batch_get true --shared_memory_populate true --shared_memory_size_mb 40960 --remote_h2d_device_ids "0" --enable_huge_tlb true

…s will connect to the head node

Signed-off-by: tianyi-ge <tianyig@outlook.com>

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

pyproject.toml Outdated
mooncake = [
"mooncake-transfer-engine"
]
perftest = [
Collaborator

These are not needed.

Signed-off-by: tianyi-ge <tianyig@outlook.com>
self.backend = self.full_config["backend"]["storage_backend"]

# For Yuanrong, always use inter_node
self.use_inter_node = self.backend == "Yuanrong"
Collaborator

Actually, we can use inter-node as the default for all backends.


return env_value_lower in true_values


def get_local_ip_addresses() -> list[str]:
Collaborator

We can move these functions into yuanrong_client.py, since only Yuanrong requires these utils now.

Comment on lines +39 to +42
# Memory segment size in bytes for mounting (default: 4GB)
global_segment_size: 4294967296
# Local buffer size in bytes (default: 1GB)
local_buffer_size: 1073741824
Collaborator

Suggested change
  Before: # Memory segment size in bytes for mounting (default: 4GB)
          global_segment_size: 4294967296
          # Local buffer size in bytes (default: 1GB)
          local_buffer_size: 1073741824
  After:  # Memory segment size in bytes for mounting
          global_segment_size: 86294967296
          # Local buffer size in bytes
          local_buffer_size: 86294967296

# Address of local host. Set to "" to use Ray IP as local host address
local_hostname: ""
# Protocol for transmission. Choose from: tcp, rdma. (default: tcp)
protocol: tcp
Collaborator

Suggested change
  Before: protocol: tcp
  After:  protocol: rdma

2. modify default mooncake store perftest config

Signed-off-by: tianyi-ge <tianyig@outlook.com>

except ValueError:
logger.info("Some other rank has initialized TransferQueueController. Try to connect to existing controller.")
_init_from_existing()
return
Collaborator

This should not be deleted

Signed-off-by: tianyi-ge <tianyig@outlook.com>

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

declare -a SETTINGS=(
"1024,9,8192,Small"
"4096,15,32768,Medium"
"8192,21,128000,Large"
Collaborator

Suggested change
  Before: "8192,21,128000,Large"
  After:  "8192,18,100000,Large"

### Test Matrix

- **Backends**: SimpleStorage, Yuanrong, MooncakeStore, Ray (baseline)
- **Data sizes**: Small (batch=1024, fields=9, seq=8192), Medium (batch=4096, fields=15, seq=32768), Large (batch=8192, fields=21, seq=128000)
Collaborator

Suggested change
  Before: - **Data sizes**: Small (batch=1024, fields=9, seq=8192), Medium (batch=4096, fields=15, seq=32768), Large (batch=8192, fields=21, seq=128000)
  After:  - **Data sizes**: Small (batch=1024, fields=9, seq=8192), Medium (batch=4096, fields=15, seq=32768), Large (batch=8192, fields=18, seq=100000)
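
Assuming the simple case generates one float32 tensor of shape (batch, seq_len) per field, the payload for each setting works out with a few lines of arithmetic (the Large row uses the suggested fields=18, seq=100000; the helper name is illustrative):

```python
def payload_gb(batch, fields, seq_len, bytes_per_elem=4):
    """Total data size in GiB for `fields` float32 tensors of shape (batch, seq_len)."""
    return batch * fields * seq_len * bytes_per_elem / 1024**3


# Settings from the test matrix above
for name, (b, f, s) in {
    "Small": (1024, 9, 8192),
    "Medium": (4096, 15, 32768),
    "Large": (8192, 18, 100000),
}.items():
    print(f"{name}: {payload_gb(b, f, s):.2f} GiB")
```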

- `Yuanrong`: `cpu`, `npu`
- `MooncakeStore`: `cpu`, `gpu`

## Test Data Format
Collaborator

Now we have 2 scenarios, and we need to illustrate both the simple case and the complex case: https://www.yuque.com/haomingzi-lfse7/lhp4el/tml8ke0zkgn6roey?singleDoc# 《TransferQueue Performance Test - 0.1.6》

Signed-off-by: tianyi-ge <tianyig@outlook.com>

@0oshowero0 0oshowero0 merged commit 0c3ac24 into Ascend:main Mar 28, 2026
8 checks passed


Development

Successfully merging this pull request may close these issues.

[RFC][Perf] Refactor performance test for different kv store backends

4 participants