Tensorlink is a Python library and decentralized compute platform for running PyTorch and Hugging Face models across peer-to-peer networks. It lets you run, train, and serve large models securely on distributed hardware without relying on centralized cloud inference providers.
With Tensorlink, models can be automatically sharded across multiple GPUs, enabling execution beyond local VRAM limits. You can host models on your own devices, expose them through a REST API, stream tokens in real time, and optionally route requests only to your own hardware for private usage. Tensorlink supports both distributed training with optimizers and low-latency inference across the network.
- Native PyTorch & REST API Access – Use models directly in Python or via HTTP endpoints.
- Run Large Models – Automatic offloading and model sharding across peers.
- Plug-and-Play Distributed Execution – No manual cluster setup required.
- Streaming Generation – Token-by-token responses for real-time apps.
- Privacy Controls – Route traffic exclusively to your own machines, or use hybrid workflows for enhanced privacy.
Early Access: Tensorlink is under active development. APIs and internals may evolve.
Join our Discord for updates, support, and roadmap discussions.
```bash
pip install tensorlink
```

Requirements: Python 3.10+, PyTorch 2.3+, Linux/macOS (Windows: use WSL)
Execute Hugging Face models distributed across the network.
```python
from tensorlink.ml import DistributedModel

MODEL_NAME = "Qwen/Qwen3-8B"

# Connect to a pre-trained model on the network
model = DistributedModel(
    model=MODEL_NAME,
    training=True,
    device="cuda"
)
optimizer = model.create_optimizer(lr=0.001)
```

See Examples for streaming generation, distributed training, custom models, and network configurations.
Access models via HTTP on the public network, or configure your own hardware for private API access. Tensorlink exposes OpenAI-style endpoints for distributed inference:
```python
import requests

response = requests.post(
    "http://smartnodes.ddns.net/tensorlink-api/v1/generate",
    json={
        "hf_name": "Qwen/Qwen2.5-7B-Instruct",
        "message": "Explain quantum computing in one sentence.",
        "max_new_tokens": 50,
        "stream": False,
    }
)
print(response.json())
```

See Examples for streaming, chat completions, and the full API reference.
Run Tensorlink nodes to host models, shard workloads across GPUs, and expose them via Python and HTTP APIs. Nodes can act as workers (run models), validators (route requests + expose API), or both. This allows you to build private clusters, public compute providers, or local development environments.
1. Download the latest `tensorlink-node` from Releases.
2. Edit `config.json` to configure your node.
3. Run: `./run-node.sh`
By default, the config is set for running a public worker node. Your GPU will process network workloads and earn rewards via the networking layer (Smartnodes). See Examples for different device and network configurations.
Your `config.json` controls networking, rewards, and model execution behavior.
| Field | Type | Description |
|---|---|---|
| `type` | `str` | Node type (`worker`\|`validator`\|`both`): validators accept job & API requests, workers run models |
| `mode` | `str` | Network type (`public`\|`private`\|`local`): public (earn rewards), private (your devices), local (testing) |
| `endpoint` | `bool` | Enables the REST API server on this node (validator role) |
| `endpoint_url` | `str` | Address the API binds to; use `0.0.0.0` to expose on LAN |
| `endpoint_port` | `int` | Port for the HTTP API (default: `64747`) |
| `priority_nodes` | `List[List[str, int]]` | Trusted bootstrap peers to connect to first (e.g., `[["192.168.2.42", 38751]]`) |
| `logging` | `int` | Console logging mode (e.g., `DEBUG`\|`INFO`\|`WARNING`) |
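As an illustrative sketch only (these values are examples, not shipped defaults), a `config.json` for a private validator exposing the API on the LAN might combine the fields above like this:

```json
{
    "type": "validator",
    "mode": "private",
    "endpoint": true,
    "endpoint_url": "0.0.0.0",
    "endpoint_port": 64747,
    "priority_nodes": [["192.168.2.42", 38751]]
}
```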
| Field | Type | Description |
|---|---|---|
| `trusted` | `bool` | Allows execution of custom user-supplied models |
| `max_vram_gb` | `int` | Limits VRAM usage per node to prevent overload |
| Field | Type | Description |
|---|---|---|
| `address` | `str` | Wallet address used for identity and rewards |
| `mining` | `bool` | Contribute GPU compute to the public network for rewards |
| `mining_script` | `str` | Path to mining / GPU workload executable |
| `seed_validators` | `List[List[str, int, str]]` | Validator nodes to connect to on startup |
For common configuration recipes and examples, see Examples: Node Configuration
Tensorlink exposes OpenAI-compatible HTTP endpoints for distributed inference.
- `POST /v1/generate` – Simple text generation
- `POST /v1/chat/completions` – OpenAI-compatible chat interface
- `POST /request-model` – Preload models across the network
Simple generation endpoint with flexible output formats.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `hf_name` | string | required | Hugging Face model identifier |
| `message` | string | required | Input text to generate from |
| `prompt` | string | `null` | Alternative to `message` |
| `model_type` | string | `"auto"` | Model architecture hint |
| `max_length` | int | `2048` | Maximum total sequence length |
| `max_new_tokens` | int | `2048` | Maximum tokens to generate |
| `temperature` | float | `0.7` | Sampling temperature (0.01–2.0) |
| `do_sample` | bool | `true` | Enable sampling vs. greedy decoding |
| `num_beams` | int | `1` | Beam search width |
| `stream` | bool | `false` | Enable streaming responses |
| `input_format` | string | `"raw"` | `"chat"` or `"raw"` |
| `history` | array | `null` | Chat history for multi-turn conversations |
| `output_format` | string | `"simple"` | `"simple"`, `"openai"`, or `"raw"` |
| `is_chat_completion` | bool | `false` | Whether to format output as a chat completion |
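The parameters above compose into a single JSON payload. As a client-side sketch (the helper `build_generate_payload` is illustrative, not part of Tensorlink), multi-turn requests pair `history` with `input_format="chat"`:

```python
def build_generate_payload(hf_name, message, history=None, **options):
    """Build a JSON payload for POST /v1/generate.

    `history` is a list of {"role": ..., "content": ...} dicts for
    multi-turn chat; when present, input_format switches to "chat".
    Extra keyword options (temperature, max_new_tokens, ...) pass through.
    """
    payload = {"hf_name": hf_name, "message": message}
    if history:
        payload["history"] = history
        payload["input_format"] = "chat"
    payload.update(options)
    return payload

payload = build_generate_payload(
    "Qwen/Qwen2.5-7B-Instruct",
    "What about entanglement?",
    history=[{"role": "user", "content": "Explain quantum computing."}],
    max_new_tokens=128,
)
```

The resulting dict can be passed directly as the `json=` argument to `requests.post`, as in the examples below.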
```python
import requests

r = requests.post(
    "http://localhost:64747/v1/generate",
    json={
        "hf_name": "Qwen/Qwen2.5-7B-Instruct",
        "message": "Explain quantum computing in one sentence.",
        "max_new_tokens": 64,
        "temperature": 0.7,
        "stream": False,
    }
)
print(r.json()["generated_text"])
```

```python
r = requests.post(
    "http://localhost:64747/v1/generate",
    json={
        "hf_name": "Qwen/Qwen2.5-7B-Instruct",
        "message": "What about entanglement?",
        "input_format": "chat",
        "output_format": "openai",
        "history": [
            {"role": "user", "content": "Explain quantum computing."},
            {"role": "assistant", "content": "Quantum computing uses..."}
        ],
        "max_new_tokens": 128,
    }
)
print(r.json())
```

OpenAI-compatible chat completions endpoint with full streaming support.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | required | Hugging Face model identifier |
| `messages` | array | required | Array of chat messages |
| `temperature` | float | `0.7` | Sampling temperature (0.01–2.0) |
| `top_p` | float | `1.0` | Nucleus sampling threshold |
| `n` | int | `1` | Number of completions to generate |
| `stream` | bool | `false` | Enable SSE streaming |
| `stop` | string/array | `null` | Stop sequences |
| `max_tokens` | int | `1024` | Maximum tokens to generate |
| `presence_penalty` | float | `0.0` | Presence penalty (−2.0 to 2.0) |
| `frequency_penalty` | float | `0.0` | Frequency penalty (−2.0 to 2.0) |
| `user` | string | `null` | User identifier for tracking |
```json
{
    "role": "system" | "user" | "assistant",
    "content": "message text"
}
```

```python
import requests

r = requests.post(
    "http://localhost:64747/v1/chat/completions",
    json={
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum computing in simple terms."}
        ],
        "max_tokens": 128,
        "temperature": 0.7,
    }
)
response = r.json()
print(response["choices"][0]["message"]["content"])
```

```python
import json
import requests

r = requests.post(
    "http://localhost:64747/v1/chat/completions",
    json={
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "messages": [
            {"role": "system", "content": "You are helpful."},
            {"role": "user", "content": "Explain quantum computing."}
        ],
        "max_tokens": 128,
        "stream": True
    },
    stream=True,
)

for line in r.iter_lines():
    if line and line.decode().startswith("data: "):
        data = line.decode()[6:]  # Remove "data: " prefix
        if data != "[DONE]":
            chunk = json.loads(data)
            if chunk["choices"][0]["delta"].get("content"):
                print(chunk["choices"][0]["delta"]["content"], end="", flush=True)
```

```json
{
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1234567890,
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Quantum computing harnesses quantum mechanics..."
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 20,
        "completion_tokens": 50,
        "total_tokens": 70
    }
}
```

```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"Qwen/Qwen2.5-7B-Instruct","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"Qwen/Qwen2.5-7B-Instruct","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"Qwen/Qwen2.5-7B-Instruct","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

...

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"Qwen/Qwen2.5-7B-Instruct","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```
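Parsing this stream reduces to extracting `choices[0].delta.content` from each `data:` line. The following is an illustrative client-side sketch (not a Tensorlink API) that turns raw SSE lines like those above back into text:

```python
import json

def extract_sse_content(lines):
    """Yield assistant text deltas from raw SSE lines.

    Skips blank and non-data lines and the final [DONE] sentinel,
    then pulls `choices[0].delta.content` out of each JSON chunk.
    """
    for raw in lines:
        line = raw.decode() if isinstance(raw, bytes) else raw
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"]
        if delta.get("content"):
            yield delta["content"]

# Reassemble the example stream shown above:
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}',
    'data: [DONE]',
]
print("".join(extract_sse_content(sample)))  # Quantum computing
```

In practice the `lines` argument would be `r.iter_lines()` from a streaming `requests.post` call, as in the example above.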
Preload a model across the distributed network before making generation requests.
| Parameter | Type | Description |
|---|---|---|
| `hf_name` | string | Hugging Face model identifier |
```python
import requests

r = requests.post(
    "http://localhost:64747/request-model",
    json={"hf_name": "Qwen/Qwen3-8B"}
)
print(r.json())
# {"status": "success", "message": "Model loading initiated"}
```

Tensorlink is designed to support any Hugging Face model, but some models may produce errors. Please report any bugs via Issues.
- Temperature: Values below `0.01` automatically disable sampling to prevent numerical instability
- Streaming: Both endpoints support Server-Sent Events (SSE) streaming via `stream: true`
- Token IDs: Missing pad/eos tokens are handled automatically with safe fallbacks
- Format Control: Use `input_format="chat"` and `output_format="openai"` for seamless integration
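The temperature rule above can be mirrored client-side. This is a hypothetical sketch of the described behavior (the helper and its fallback values are assumptions, not actual Tensorlink code):

```python
def normalize_sampling(temperature, do_sample=True):
    """Mirror the documented temperature rule: values below 0.01
    disable sampling (greedy decoding) to avoid numerical instability;
    otherwise keep temperature within the supported 0.01-2.0 range.
    """
    if temperature < 0.01:
        # Greedy decode; temperature is ignored, so reset to a neutral value
        return {"do_sample": False, "temperature": 1.0}
    return {"do_sample": do_sample, "temperature": min(temperature, 2.0)}

print(normalize_sampling(0.0))  # {'do_sample': False, 'temperature': 1.0}
print(normalize_sampling(3.5))  # {'do_sample': True, 'temperature': 2.0}
```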
For complete examples, error handling, and advanced usage, see Examples: HTTP API
- Documentation – Full API reference and guides
- Examples – Comprehensive usage patterns and recipes
- Discord Community – Get help and connect with developers
- Live Demo – Try the chatbot demo powered by a model on Tensorlink
- Litepaper – Technical overview and architecture
Read our contribution guide.
Tensorlink is released under the MIT License.
