Feat: Codex migration #1

Draft
nipsysdev wants to merge 4 commits into main from feat/codex_migration

Conversation

@nipsysdev (Owner) commented Nov 26, 2025

This PR migrates the application from a centralized HTTP/Tor-based server to a decentralized Codex network node. Instead of serving PMTiles files via HTTP endpoints, they are now uploaded to Codex, a persistent distributed storage network.

Copilot AI left a comment

Pull request overview

This pull request represents a major architectural migration from a centralized HTTP/Tor-based server to a decentralized Codex network node. The application transitions from serving PMTiles files via HTTP endpoints to uploading them to a persistent Codex distributed storage network and maintaining CID (Content Identifier) mappings.

Key changes:

  • Replaced Tor/Axum HTTP server architecture with Codex network integration using codex-bindings v0.1.3
  • Introduced CodexNodeManager to maintain a single persistent Codex node instance throughout the application lifecycle
  • Implemented NodeOps service for batch upload processing with queue management
  • Separated database concerns: WhosOnFirst (read-only locality data) and CID mappings (read-write upload tracking)
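The single-persistent-node idea behind CodexNodeManager can be sketched with a process-wide singleton. This is a minimal std-only sketch: MockNode is a hypothetical stand-in for codex_bindings::CodexNode, and the real manager additionally handles start/stop/upload; only the "one instance for the whole lifecycle" pattern is shown.

```rust
use std::sync::{Mutex, OnceLock};

// Hypothetical stand-in for codex_bindings::CodexNode so the sketch
// compiles without the crate.
struct MockNode {
    running: bool,
}

// Manager owning the single node instance for the whole process.
struct CodexNodeManager {
    node: Mutex<MockNode>,
}

static MANAGER: OnceLock<CodexNodeManager> = OnceLock::new();

// Every caller gets a reference to the same manager (and thus the same node).
fn manager() -> &'static CodexNodeManager {
    MANAGER.get_or_init(|| CodexNodeManager {
        node: Mutex::new(MockNode { running: true }),
    })
}

fn main() {
    // Two lookups yield the same instance, never a second node.
    assert!(std::ptr::eq(manager(), manager()));
    assert!(manager().node.lock().unwrap().running);
    println!("single node manager instance confirmed");
}
```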

Reviewed changes

Copilot reviewed 24 out of 26 changed files in this pull request and generated 18 comments.

Summary per file:

  • Cargo.toml: Removed Tor/Arti and Axum dependencies; added codex-bindings and test dependencies
  • .env.example: Added Codex configuration variables; removed obsolete server/Tor settings
  • src/main.rs: Replaced HTTP server with Codex node lifecycle and upload processing flow
  • src/config.rs: Introduced LocalitySrvConfig with Codex-specific settings; removed HTTP server config
  • src/node/manager.rs: New module implementing managed Codex node lifecycle with start/stop/upload operations
  • src/services/node_ops.rs: New service for processing locality uploads with batch queuing and statistics tracking
  • src/services/database.rs: Simplified initialization; added CID mapping table and query methods
  • src/services/extraction.rs: Removed remote planet URL fetching; requires local planet file in decentralized mode
  • src/models/storage.rs: New models for upload queue, pending/completed uploads, and statistics
  • src/initialization.rs: Added Codex node initialization and directory permission handling
  • tests/single_node_usage_test.rs: Unit tests for node manager ensuring single instance usage
  • tests/integration_test.rs: Integration tests with real Codex uploads and verification
  • src/services/tor.rs: Deleted; Tor hidden service no longer needed
  • src/api/*: Deleted; HTTP API endpoints removed in favor of direct Codex storage


Comment on lines +16 to +37
# Protomaps Configuration
PROTOMAPS_BUILDS_URL=https://build-metadata.protomaps.dev/builds.json

# Local Planet PMTiles Configuration
# Set this to the path of a local planet.pmtiles file to use it instead of downloading from remote
# Example: PLANET_PMTILES_PATH=/path/to/planet.pmtiles
# If not set or empty, the system will fetch the latest planet pmtiles from the remote URL
PLANET_PMTILES_PATH=

# Target Countries (comma-separated, empty or ALL for all countries)
TARGET_COUNTRIES=AE,AF

# Codex Node Configuration
CODEX_DATA_DIR=./.codex-data
CODEX_STORAGE_QUOTA_GB=100
CODEX_DISCOVERY_PORT=8090
CODEX_LISTEN_ADDRS=/ip4/0.0.0.0/tcp/8090,/ip4/127.0.0.1/tcp/8090
LOG_LEVEL=info

# Performance Settings
MAX_CONCURRENT_EXTRACTIONS=10
DB_CONNECTION_POOL_SIZE=10
Copilot AI Nov 26, 2025

These environment variables in the example file (PROTOMAPS_BUILDS_URL and DB_CONNECTION_POOL_SIZE) are no longer used in the codebase. They should be removed from the example configuration file to avoid confusion.

Comment on lines +33 to +36
// Only create CID table if this is not a WhosOnFirst database
if !database_path.contains("whosonfirst") {
    service.create_optimized_indexes().await?;
}
Copilot AI Nov 26, 2025

The conditional check database_path.contains("whosonfirst") relies on a string pattern in the path to determine whether to create CID tables. This is fragile and could fail if:

  1. The WhosOnFirst database is named differently
  2. The CID database path accidentally contains "whosonfirst" in its path

Consider passing an explicit flag or using separate methods (new_whosonfirst_db() and new_cid_db()) to make the intent clearer and more robust.
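The reviewer's suggestion can be sketched with an explicit kind enum instead of path matching. DatabaseService here is a simplified stand-in for the real service in src/services/database.rs (no SQL involved); the point is only that the CID table decision no longer depends on what the file path happens to contain.

```rust
// Explicit database kind, replacing database_path.contains("whosonfirst").
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum DatabaseKind {
    WhosOnFirst,
    CidMappings,
}

// Simplified stand-in for the real DatabaseService.
struct DatabaseService {
    kind: DatabaseKind,
    has_cid_table: bool,
}

impl DatabaseService {
    fn new(kind: DatabaseKind) -> Self {
        // The CID table is created only for the CID-mappings database,
        // regardless of the file path.
        DatabaseService {
            kind,
            has_cid_table: kind == DatabaseKind::CidMappings,
        }
    }

    fn new_whosonfirst_db() -> Self {
        Self::new(DatabaseKind::WhosOnFirst)
    }

    fn new_cid_db() -> Self {
        Self::new(DatabaseKind::CidMappings)
    }
}

fn main() {
    assert!(!DatabaseService::new_whosonfirst_db().has_cid_table);
    assert!(DatabaseService::new_cid_db().has_cid_table);
    println!("explicit database kind works");
}
```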

Comment on lines +80 to +91
println!("Real Codex integration test completed successfully!");
println!("WhosOnFirst database saved at: {:?}", whosonfirst_db_path);
println!("CID mappings database saved at: {:?}", cid_db_path);
println!(
    "You can inspect the databases with: sqlite3 {:?} and sqlite3 {:?}",
    whosonfirst_db_path, cid_db_path
);

// Clean up test data after successful test
cleanup_test_data(&test_resources_dir).await?;
println!("✓ Cleaned up test data after test completion");

Copilot AI Nov 26, 2025

Test data cleanup issue: The test calls cleanup_test_data() at the end (line 89), which removes the databases and PMTiles files that were just created. However, the test also logs on lines 81-86 that users can inspect these databases. This is contradictory - the files will be deleted before users can inspect them.

Consider either:

  1. Only cleaning up on test failure, or
  2. Making cleanup optional via an environment variable, or
  3. Removing the "you can inspect" messages if cleanup always happens
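Option 2 above can be sketched as a small toggle. KEEP_TEST_DATA is a hypothetical variable name (pick whatever fits the project); by default the test still cleans up, and setting the variable preserves the databases so the "you can inspect" hint stays honest.

```rust
use std::env;

// Cleanup is skipped when the (hypothetical) KEEP_TEST_DATA variable is
// set to "1" or "true"; otherwise the test cleans up as before.
fn should_cleanup() -> bool {
    match env::var("KEEP_TEST_DATA") {
        Ok(v) => !(v == "1" || v.eq_ignore_ascii_case("true")),
        Err(_) => true, // default: clean up after the test
    }
}

fn main() {
    env::remove_var("KEEP_TEST_DATA");
    assert!(should_cleanup());

    env::set_var("KEEP_TEST_DATA", "1");
    assert!(!should_cleanup());

    println!("cleanup toggle behaves as expected");
}
```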

Comment on lines +194 to +287
async fn test_real_codex_uploads(
    _node_ops: &NodeOps,
    db_service: &Arc<DatabaseService>,
) -> Result<(String, u64), Box<dyn std::error::Error>> {
    println!("Testing real Codex uploads...");

    // Test direct Codex upload using the bindings
    println!("  Testing direct Codex upload...");

    let test_file_path =
        std::path::PathBuf::from("tests/resources/localities/AE/421168683.pmtiles");
    if !test_file_path.exists() {
        return Err("Test PMTiles file not found".into());
    }

    // Create a Codex node using test resources directory
    let test_resources_dir = std::path::Path::new("tests/resources");
    let config = codex_bindings::CodexConfig::new()
        .log_level(codex_bindings::LogLevel::Info)
        .data_dir(test_resources_dir.join(".codex-test-data"))
        .storage_quota(100 * 1024 * 1024) // 100MB
        .discovery_port(0); // Use random port

    let mut node = codex_bindings::CodexNode::new(config)?;
    node.start()?;

    println!("  Started temporary Codex node for upload test");

    // Perform real upload
    let upload_options = codex_bindings::UploadOptions::new()
        .filepath(&test_file_path)
        .on_progress(|progress| {
            let percentage = (progress.percentage * 100.0) as u32;
            println!(
                "  Upload progress: {} bytes ({}%)",
                progress.bytes_uploaded, percentage
            );
        });

    let upload_result = codex_bindings::upload_file(&node, upload_options).await?;

    println!("  ✓ Real upload completed!");
    println!("    CID: {}", upload_result.cid);
    println!("    Size: {} bytes", upload_result.size);
    println!("    Duration: {} ms", upload_result.duration_ms);

    // Verify the uploaded content exists
    let exists = codex_bindings::exists(&node, &upload_result.cid).await?;
    assert!(exists, "Uploaded content should exist in Codex node");
    println!("  ✓ Upload verification passed - content exists in node");

    // Test downloading to verify round-trip
    let test_resources_dir = std::path::Path::new("tests/resources");
    let download_path = test_resources_dir.join("downloaded.pmtiles");
    let download_options =
        codex_bindings::DownloadStreamOptions::new(&upload_result.cid).filepath(&download_path);

    let download_result =
        codex_bindings::download_stream(&node, &upload_result.cid, download_options).await?;

    println!("  ✓ Download verification completed!");
    println!("    Downloaded size: {} bytes", download_result.size);

    // Verify file sizes match
    let original_size = tokio::fs::metadata(&test_file_path).await?.len();
    let downloaded_size = tokio::fs::metadata(&download_path).await?.len();

    assert_eq!(
        original_size, downloaded_size as u64,
        "Downloaded file should match original size"
    );
    println!("  ✓ File size verification passed: {} bytes", original_size);

    // Store the real CID in the database
    let real_cid_mapping = vec![(
        "AE".to_string(),
        421168683,
        upload_result.cid.clone(),
        upload_result.size as u64,
    )];

    db_service
        .batch_insert_cid_mappings(&real_cid_mapping)
        .await?;
    println!("  ✓ Stored real CID in database: {}", upload_result.cid);

    // Cleanup
    node.stop()?;
    node.destroy()?;

    println!("  ✓ Real Codex upload test completed successfully");

    Ok((upload_result.cid, upload_result.size as u64))
}
Copilot AI Nov 26, 2025

Resource leak: The test creates a temporary Codex node on lines 217-218 but never stops or destroys it if an error occurs between creation and the cleanup on lines 281-282. If the test fails at any point (e.g., during upload or download), the node will continue running.

Wrap the node lifecycle in a guard or use defer-like cleanup to ensure the node is always stopped and destroyed, even on test failure.
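The guard pattern suggested here can be sketched with a Drop wrapper. MockNode is a self-contained stand-in for codex_bindings::CodexNode (the real guard would also call destroy()); the point is that the node is stopped on any exit path, including an early error return.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Stand-in for codex_bindings::CodexNode; records that stop() was called.
struct MockNode {
    stopped: Arc<AtomicBool>,
}

impl MockNode {
    fn stop(&mut self) {
        self.stopped.store(true, Ordering::SeqCst);
    }
}

// Stops the node when dropped, so early returns (or panics) in the test
// body can no longer leak a running node.
struct NodeGuard {
    node: Option<MockNode>,
}

impl Drop for NodeGuard {
    fn drop(&mut self) {
        if let Some(mut node) = self.node.take() {
            node.stop();
        }
    }
}

fn run_test_body(stopped: Arc<AtomicBool>) -> Result<(), String> {
    let _guard = NodeGuard {
        node: Some(MockNode { stopped }),
    };
    // Simulate a failure partway through the test.
    Err("upload failed".to_string())
    // _guard is dropped here and stops the node despite the error.
}

fn main() {
    let stopped = Arc::new(AtomicBool::new(false));
    let _ = run_test_body(stopped.clone());
    assert!(stopped.load(Ordering::SeqCst)); // node was stopped anyway
    println!("guard stopped the node on failure");
}
```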

let downloaded_size = tokio::fs::metadata(&download_path).await?.len();

assert_eq!(
    original_size, downloaded_size as u64,
Copilot AI Nov 26, 2025

Redundant cast: downloaded_size is assigned from metadata.len() on line 259, which already returns u64, so the as u64 cast on line 262 does nothing.

Remove the as u64 cast on line 262 since both variables are already u64.

Suggested change:
-     original_size, downloaded_size as u64,
+     original_size, downloaded_size,

Comment on lines +148 to +155
impl Drop for CodexNodeManager {
    fn drop(&mut self) {
        // Note: This is a synchronous drop, so we can't use async operations here
        // In a real implementation, you might want to ensure proper cleanup
        // by calling stop() explicitly before the manager goes out of scope
        info!("CodexNodeManager being dropped - ensure stop() was called explicitly");
    }
}
Copilot AI Nov 26, 2025

The Drop implementation for CodexNodeManager only logs a message but doesn't actually stop the node. This could lead to resource leaks if the node is not explicitly stopped before the manager is dropped.

Consider either:

  1. Documenting that stop() must be called before dropping, or
  2. Implementing graceful shutdown in Drop (though this is challenging with async operations)

At minimum, the log message should be upgraded to a warning to make it more visible.

Comment on lines +163 to +182
let test_mappings = vec![
    (
        "AE".to_string(),
        421168683,
        "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi421168683".to_string(),
        10965857, // Real file size
    ),
    (
        "AE".to_string(),
        421168685,
        "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi421168685".to_string(),
        1327870, // Real file size
    ),
    (
        "AE".to_string(),
        421168687,
        "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi421168687".to_string(),
        7484944, // Real file size
    ),
];
Copilot AI Nov 26, 2025

Hardcoded test file size: The "real file size" values on lines 168, 174, and 180 are hardcoded. If the actual PMTiles files change, these values will become outdated and the test will insert incorrect data.

Consider reading the actual file sizes dynamically using tokio::fs::metadata() instead of hardcoding them.
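The dynamic-size lookup can be sketched like this. It uses std::fs for a self-contained example; the async test would call tokio::fs::metadata the same way.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

// Read the on-disk size instead of hardcoding it, so the test data stays
// correct if the PMTiles fixtures change.
fn file_size(path: &Path) -> std::io::Result<u64> {
    Ok(fs::metadata(path)?.len())
}

fn main() -> std::io::Result<()> {
    // Demonstrate with a temporary file of known length.
    let path = std::env::temp_dir().join("size_probe.bin");
    fs::File::create(&path)?.write_all(&[0u8; 1234])?;

    assert_eq!(file_size(&path)?, 1234);
    fs::remove_file(&path)?;

    println!("dynamic size lookup works");
    Ok(())
}
```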

Comment on lines +184 to 190
_ = async {
    let mut sig_term = tokio::signal::unix::signal(tokio::signal::unix::SignalKind::terminate())
        .expect("Failed to setup SIGTERM handler");
    sig_term.recv().await;
} => {
    info!("Received termination signal, shutting down gracefully...");
}
Copilot AI Nov 26, 2025

Platform-specific code without fallback: The SIGTERM signal handler on lines 184-190 uses tokio::signal::unix::signal, which is Unix-specific and will fail to compile on Windows. This makes the code non-portable.

Consider wrapping this in a conditional compilation block:

#[cfg(unix)]
_ = async {
    let mut sig_term = tokio::signal::unix::signal(tokio::signal::unix::SignalKind::terminate())
        .expect("Failed to setup SIGTERM handler");
    sig_term.recv().await;
} => {
    info!("Received termination signal, shutting down gracefully...");
}

Or provide a Windows-compatible alternative using tokio::signal::ctrl_c() only on non-Unix platforms.

@@ -0,0 +1,37 @@
# Server Configuration
SERVER_PORT=8000
Copilot AI Nov 26, 2025

The SERVER_PORT environment variable on line 2 is not used in the new configuration implementation (LocalitySrvConfig::from_env()). It should be removed from the example file as the application no longer runs an HTTP server.

Suggested change:
- SERVER_PORT=8000

Comment on lines +289 to +290
// Check how many are already uploaded - simplified for now
let uploaded_count = 0u32;
Copilot AI Nov 26, 2025

The uploaded_count is hardcoded to 0 on line 290, which means the upload readiness check will always show 0 uploaded files even if some files have already been uploaded. This defeats the purpose of checking readiness.

The comment says "simplified for now", but this should query the CID database service to get the actual count of uploaded files for accurate reporting. Consider calling cid_db_service.has_cid_mapping() for each locality or implementing a batch count query.

Suggested change:
- // Check how many are already uploaded - simplified for now
- let uploaded_count = 0u32;
+ // Query the actual number of uploaded files for this country
+ let uploaded_count = db_service.get_uploaded_file_count(country_code).await?;
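The per-country count can be sketched against an in-memory stand-in for the CID-mappings table. In the real service, the suggested get_uploaded_file_count would run a single SQL COUNT over the mappings table instead of iterating in Rust.

```rust
use std::collections::HashMap;

// In-memory stand-in for the CID-mappings table, keyed by
// (country_code, locality_id) with the CID as the value.
fn uploaded_count(mappings: &HashMap<(String, u64), String>, country: &str) -> u32 {
    mappings.keys().filter(|(c, _)| c == country).count() as u32
}

fn main() {
    let mut mappings = HashMap::new();
    // Illustrative entries; CIDs are placeholder strings.
    mappings.insert(("AE".to_string(), 421168683), "cid-a".to_string());
    mappings.insert(("AE".to_string(), 421168685), "cid-b".to_string());
    mappings.insert(("AF".to_string(), 1), "cid-c".to_string());

    assert_eq!(uploaded_count(&mappings, "AE"), 2);
    assert_eq!(uploaded_count(&mappings, "ZZ"), 0);
    println!("uploaded count for AE: {}", uploaded_count(&mappings, "AE"));
}
```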

Labels: enhancement (New feature or request)

2 participants