Conversation
Pull request overview
This pull request represents a major architectural migration from a centralized HTTP/Tor-based server to a decentralized Codex network node. The application transitions from serving PMTiles files via HTTP endpoints to uploading them to a persistent Codex distributed storage network and maintaining CID (Content Identifier) mappings.
Key changes:
- Replaced the Tor/Axum HTTP server architecture with Codex network integration using `codex-bindings` v0.1.3
- Introduced `CodexNodeManager` to maintain a single persistent Codex node instance throughout the application lifecycle
- Implemented a `NodeOps` service for batch upload processing with queue management
- Separated database concerns: WhosOnFirst (read-only locality data) and CID mappings (read-write upload tracking)
Reviewed changes
Copilot reviewed 24 out of 26 changed files in this pull request and generated 18 comments.
Show a summary per file
| File | Description |
|---|---|
| `Cargo.toml` | Removed Tor/Arti and Axum dependencies; added `codex-bindings` and test dependencies |
| `.env.example` | Added Codex configuration variables; removed obsolete server/Tor settings |
| `src/main.rs` | Replaced HTTP server with Codex node lifecycle and upload processing flow |
| `src/config.rs` | Introduced `LocalitySrvConfig` with Codex-specific settings; removed HTTP server config |
| `src/node/manager.rs` | New module implementing managed Codex node lifecycle with start/stop/upload operations |
| `src/services/node_ops.rs` | New service for processing locality uploads with batch queuing and statistics tracking |
| `src/services/database.rs` | Simplified initialization; added CID mapping table and query methods |
| `src/services/extraction.rs` | Removed remote planet URL fetching; requires a local planet file in decentralized mode |
| `src/models/storage.rs` | New models for upload queue, pending/completed uploads, and statistics |
| `src/initialization.rs` | Added Codex node initialization and directory permission handling |
| `tests/single_node_usage_test.rs` | Unit tests for node manager ensuring single instance usage |
| `tests/integration_test.rs` | Integration tests with real Codex uploads and verification |
| `src/services/tor.rs` | Deleted; Tor hidden service no longer needed |
| `src/api/*` | Deleted; HTTP API endpoints removed in favor of direct Codex storage |
```shell
# Protomaps Configuration
PROTOMAPS_BUILDS_URL=https://build-metadata.protomaps.dev/builds.json

# Local Planet PMTiles Configuration
# Set this to the path of a local planet.pmtiles file to use it instead of downloading from remote
# Example: PLANET_PMTILES_PATH=/path/to/planet.pmtiles
# If not set or empty, the system will fetch the latest planet pmtiles from the remote URL
PLANET_PMTILES_PATH=

# Target Countries (comma-separated, empty or ALL for all countries)
TARGET_COUNTRIES=AE,AF

# Codex Node Configuration
CODEX_DATA_DIR=./.codex-data
CODEX_STORAGE_QUOTA_GB=100
CODEX_DISCOVERY_PORT=8090
CODEX_LISTEN_ADDRS=/ip4/0.0.0.0/tcp/8090,/ip4/127.0.0.1/tcp/8090
LOG_LEVEL=info

# Performance Settings
MAX_CONCURRENT_EXTRACTIONS=10
DB_CONNECTION_POOL_SIZE=10
```
These environment variables in the example file (PROTOMAPS_BUILDS_URL and DB_CONNECTION_POOL_SIZE) are no longer used in the codebase. They should be removed from the example configuration file to avoid confusion.
```rust
// Only create CID table if this is not a WhosOnFirst database
if !database_path.contains("whosonfirst") {
    service.create_optimized_indexes().await?;
}
```
The conditional check `database_path.contains("whosonfirst")` relies on a string pattern in the path to determine whether to create CID tables. This is fragile and could fail if:
- The WhosOnFirst database is named differently
- The CID database path accidentally contains "whosonfirst"

Consider passing an explicit flag or using separate methods (`new_whosonfirst_db()` and `new_cid_db()`) to make the intent clearer and more robust.
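A minimal sketch of that suggestion, with `DatabaseService`, `DatabaseKind`, and the constructor names as hypothetical stand-ins for whatever the real service exposes:

```rust
// Make the database role explicit instead of inferring it from a
// "whosonfirst" substring in the file path.
#[derive(Clone, Copy, PartialEq, Debug)]
enum DatabaseKind {
    WhosOnFirst, // read-only locality data; no CID table
    CidMappings, // read-write upload tracking; needs the CID table
}

struct DatabaseService {
    path: String,
    kind: DatabaseKind,
}

impl DatabaseService {
    fn new_whosonfirst_db(path: &str) -> Self {
        Self { path: path.to_string(), kind: DatabaseKind::WhosOnFirst }
    }

    fn new_cid_db(path: &str) -> Self {
        Self { path: path.to_string(), kind: DatabaseKind::CidMappings }
    }

    // The CID table is created only when the caller asked for a CID
    // database, regardless of what the path happens to contain.
    fn needs_cid_table(&self) -> bool {
        self.kind == DatabaseKind::CidMappings
    }
}

fn main() {
    // A CID database whose path happens to contain "whosonfirst" is still
    // handled correctly - the substring check would get this wrong.
    let cid_db = DatabaseService::new_cid_db("/data/whosonfirst-cids.db");
    assert!(cid_db.needs_cid_table());

    let wof_db = DatabaseService::new_whosonfirst_db("/data/wof.db");
    assert!(!wof_db.needs_cid_table());
    println!("kind-based check ok for {} / {}", cid_db.path, wof_db.path);
}
```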
```rust
println!("Real Codex integration test completed successfully!");
println!("WhosOnFirst database saved at: {:?}", whosonfirst_db_path);
println!("CID mappings database saved at: {:?}", cid_db_path);
println!(
    "You can inspect the databases with: sqlite3 {:?} and sqlite3 {:?}",
    whosonfirst_db_path, cid_db_path
);

// Clean up test data after successful test
cleanup_test_data(&test_resources_dir).await?;
println!("✓ Cleaned up test data after test completion");
```
Test data cleanup issue: the test calls `cleanup_test_data()` at the end (line 89), which removes the databases and PMTiles files that were just created. However, lines 81-86 tell users they can inspect those databases. This is contradictory: the files will be deleted before anyone can inspect them.
Consider either:
- Only cleaning up on test failure, or
- Making cleanup optional via an environment variable, or
- Removing the "you can inspect" messages if cleanup always happens
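The "optional via an environment variable" route could look like the sketch below. `KEEP_TEST_DATA` is a hypothetical variable name; the real test would read it with `std::env::var`:

```rust
// Decide whether to delete test artifacts. Passing the variable's value in
// as a parameter keeps the function easy to test without mutating the
// process environment.
fn should_cleanup(keep_test_data: Option<&str>) -> bool {
    // Any "truthy" value opts out of cleanup so the databases survive
    // for manual inspection with sqlite3.
    !matches!(keep_test_data, Some("1") | Some("true") | Some("yes"))
}

fn main() {
    // In the real test: should_cleanup(std::env::var("KEEP_TEST_DATA").ok().as_deref())
    assert!(should_cleanup(None));       // default: clean up, as today
    assert!(!should_cleanup(Some("1"))); // opt out to inspect the databases
    println!("cleanup gating works");
}
```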
```rust
async fn test_real_codex_uploads(
    _node_ops: &NodeOps,
    db_service: &Arc<DatabaseService>,
) -> Result<(String, u64), Box<dyn std::error::Error>> {
    println!("Testing real Codex uploads...");

    // Test direct Codex upload using the bindings
    println!(" Testing direct Codex upload...");

    let test_file_path =
        std::path::PathBuf::from("tests/resources/localities/AE/421168683.pmtiles");
    if !test_file_path.exists() {
        return Err("Test PMTiles file not found".into());
    }

    // Create a Codex node using test resources directory
    let test_resources_dir = std::path::Path::new("tests/resources");
    let config = codex_bindings::CodexConfig::new()
        .log_level(codex_bindings::LogLevel::Info)
        .data_dir(test_resources_dir.join(".codex-test-data"))
        .storage_quota(100 * 1024 * 1024) // 100MB
        .discovery_port(0); // Use random port

    let mut node = codex_bindings::CodexNode::new(config)?;
    node.start()?;

    println!(" Started temporary Codex node for upload test");

    // Perform real upload
    let upload_options = codex_bindings::UploadOptions::new()
        .filepath(&test_file_path)
        .on_progress(|progress| {
            let percentage = (progress.percentage * 100.0) as u32;
            println!(
                " Upload progress: {} bytes ({}%)",
                progress.bytes_uploaded, percentage
            );
        });

    let upload_result = codex_bindings::upload_file(&node, upload_options).await?;

    println!(" ✓ Real upload completed!");
    println!(" CID: {}", upload_result.cid);
    println!(" Size: {} bytes", upload_result.size);
    println!(" Duration: {} ms", upload_result.duration_ms);

    // Verify the uploaded content exists
    let exists = codex_bindings::exists(&node, &upload_result.cid).await?;
    assert!(exists, "Uploaded content should exist in Codex node");
    println!(" ✓ Upload verification passed - content exists in node");

    // Test downloading to verify round-trip
    let test_resources_dir = std::path::Path::new("tests/resources");
    let download_path = test_resources_dir.join("downloaded.pmtiles");
    let download_options =
        codex_bindings::DownloadStreamOptions::new(&upload_result.cid).filepath(&download_path);

    let download_result =
        codex_bindings::download_stream(&node, &upload_result.cid, download_options).await?;

    println!(" ✓ Download verification completed!");
    println!(" Downloaded size: {} bytes", download_result.size);

    // Verify file sizes match
    let original_size = tokio::fs::metadata(&test_file_path).await?.len();
    let downloaded_size = tokio::fs::metadata(&download_path).await?.len();

    assert_eq!(
        original_size, downloaded_size as u64,
        "Downloaded file should match original size"
    );
    println!(" ✓ File size verification passed: {} bytes", original_size);

    // Store the real CID in the database
    let real_cid_mapping = vec![(
        "AE".to_string(),
        421168683,
        upload_result.cid.clone(),
        upload_result.size as u64,
    )];

    db_service
        .batch_insert_cid_mappings(&real_cid_mapping)
        .await?;
    println!(" ✓ Stored real CID in database: {}", upload_result.cid);

    // Cleanup
    node.stop()?;
    node.destroy()?;

    println!(" ✓ Real Codex upload test completed successfully");

    Ok((upload_result.cid, upload_result.size as u64))
}
```
Resource leak: The test creates a temporary Codex node on lines 217-218 but never stops or destroys it if an error occurs between creation and the cleanup on lines 281-282. If the test fails at any point (e.g., during upload or download), the node will continue running.
Wrap the node lifecycle in a guard or use defer-like cleanup to ensure the node is always stopped and destroyed, even on test failure.
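One way to get defer-like cleanup in Rust is an RAII drop guard that owns the node, so `stop()` runs on both the success path and any early `?` return. A self-contained sketch, with `MockNode` standing in for `codex_bindings::CodexNode`:

```rust
use std::cell::Cell;
use std::rc::Rc;

// Stand-in for codex_bindings::CodexNode; the shared flag lets the
// sketch observe whether stop() actually ran.
struct MockNode {
    stopped: Rc<Cell<bool>>,
}

impl MockNode {
    fn stop(&mut self) {
        self.stopped.set(true);
    }
}

// Owns the node; Drop runs on success AND on every early-exit path.
struct NodeGuard {
    node: MockNode,
}

impl Drop for NodeGuard {
    fn drop(&mut self) {
        // In the real test: node.stop() then node.destroy(), logging
        // (not unwrapping) any failure, since panicking in Drop is bad.
        self.node.stop();
    }
}

// Simulates the test failing partway through (e.g. during upload).
fn failing_test(node: MockNode) -> Result<(), String> {
    let _guard = NodeGuard { node };
    Err("upload failed mid-test".to_string()) // early exit; guard still drops
}

fn main() {
    let stopped = Rc::new(Cell::new(false));
    let node = MockNode { stopped: stopped.clone() };
    assert!(failing_test(node).is_err());
    assert!(stopped.get(), "node must be stopped even on failure");
    println!("guard stopped the node on the error path");
}
```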
```rust
let downloaded_size = tokio::fs::metadata(&download_path).await?.len();

assert_eq!(
    original_size, downloaded_size as u64,
```
Redundant cast: on line 259, `downloaded_size` is assigned from `metadata.len()`, which already returns `u64`, so the `as u64` cast on line 262 is unnecessary. Remove it, since both sides of the comparison are already `u64`:

```diff
-        original_size, downloaded_size as u64,
+        original_size, downloaded_size,
```
```rust
impl Drop for CodexNodeManager {
    fn drop(&mut self) {
        // Note: This is a synchronous drop, so we can't use async operations here
        // In a real implementation, you might want to ensure proper cleanup
        // by calling stop() explicitly before the manager goes out of scope
        info!("CodexNodeManager being dropped - ensure stop() was called explicitly");
    }
}
```
The `Drop` implementation for `CodexNodeManager` only logs a message but doesn't actually stop the node. This could lead to resource leaks if the node is not explicitly stopped before the manager is dropped.
Consider either:
- Documenting that `stop()` must be called before dropping, or
- Implementing graceful shutdown in `Drop` (though this is challenging with async operations)

At minimum, the log message should be upgraded to a warning to make it more visible.
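The "upgrade to a warning" option could be combined with a flag so the warning only fires when `stop()` was actually skipped. A sketch, with `eprintln!` standing in for tracing's `warn!` and a simplified manager:

```rust
// Simplified manager: tracks whether stop() ran so Drop can warn only
// when cleanup was skipped. Drop is synchronous, so no async work here.
struct CodexNodeManager {
    stopped: bool,
}

impl CodexNodeManager {
    fn new() -> Self {
        Self { stopped: false }
    }
    fn stop(&mut self) {
        // Real code would stop the underlying Codex node here.
        self.stopped = true;
    }
    fn is_stopped(&self) -> bool {
        self.stopped
    }
}

impl Drop for CodexNodeManager {
    fn drop(&mut self) {
        if !self.stopped {
            // warn!(...) in the real code - loud, and only when it matters.
            eprintln!("WARN: CodexNodeManager dropped without stop(); node may leak");
        }
    }
}

fn main() {
    let mut mgr = CodexNodeManager::new();
    mgr.stop();
    assert!(mgr.is_stopped());
    drop(mgr); // silent: stop() was called

    let mgr2 = CodexNodeManager::new();
    assert!(!mgr2.is_stopped());
    drop(mgr2); // emits the warning
    println!("drop-warning sketch ok");
}
```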
```rust
let test_mappings = vec![
    (
        "AE".to_string(),
        421168683,
        "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi421168683".to_string(),
        10965857, // Real file size
    ),
    (
        "AE".to_string(),
        421168685,
        "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi421168685".to_string(),
        1327870, // Real file size
    ),
    (
        "AE".to_string(),
        421168687,
        "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi421168687".to_string(),
        7484944, // Real file size
    ),
];
```
Hardcoded test file size: The "real file size" values on lines 168, 174, and 180 are hardcoded. If the actual PMTiles files change, these values will become outdated and the test will insert incorrect data.
Consider reading the actual file sizes dynamically using tokio::fs::metadata() instead of hardcoding them.
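A sketch of measuring the size at test time instead of hardcoding it. The real test would point at `tests/resources/localities/AE/421168683.pmtiles`; here a temporary stand-in file keeps the example self-contained:

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

// Read the on-disk size; the sync std::fs version of the
// tokio::fs::metadata() call suggested in the review.
fn file_size(path: &Path) -> std::io::Result<u64> {
    Ok(fs::metadata(path)?.len())
}

fn main() -> std::io::Result<()> {
    // Stand-in fixture; the real test would use the PMTiles path above.
    let dir = std::env::temp_dir().join("pmtiles-size-demo");
    fs::create_dir_all(&dir)?;
    let path = dir.join("421168683.pmtiles");
    fs::File::create(&path)?.write_all(&[0u8; 1024])?;

    // No hardcoded "real file size" constants to go stale.
    let size = file_size(&path)?;
    assert_eq!(size, 1024);
    println!("measured size: {} bytes", size);

    fs::remove_dir_all(&dir)?;
    Ok(())
}
```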
```rust
_ = async {
    let mut sig_term = tokio::signal::unix::signal(tokio::signal::unix::SignalKind::terminate())
        .expect("Failed to setup SIGTERM handler");
    sig_term.recv().await;
} => {
    info!("Received termination signal, shutting down gracefully...");
}
```
Platform-specific code without fallback: The SIGTERM signal handler on lines 184-190 uses tokio::signal::unix::signal, which is Unix-specific and will fail to compile on Windows. This makes the code non-portable.
Consider wrapping this in a conditional compilation block:

```rust
#[cfg(unix)]
_ = async {
    let mut sig_term = tokio::signal::unix::signal(tokio::signal::unix::SignalKind::terminate())
        .expect("Failed to setup SIGTERM handler");
    sig_term.recv().await;
} => {
    info!("Received termination signal, shutting down gracefully...");
}
```

Or provide a Windows-compatible alternative using `tokio::signal::ctrl_c()` only on non-Unix platforms.
```diff
@@ -0,0 +1,37 @@
+# Server Configuration
+SERVER_PORT=8000
```
The SERVER_PORT environment variable on line 2 is not used in the new configuration implementation (LocalitySrvConfig::from_env()). It should be removed from the example file as the application no longer runs an HTTP server.
```diff
-SERVER_PORT=8000
```
```rust
// Check how many are already uploaded - simplified for now
let uploaded_count = 0u32;
```
The uploaded_count is hardcoded to 0 on line 290, which means the upload readiness check will always show 0 uploaded files even if some files have already been uploaded. This defeats the purpose of checking readiness.
The comment says "simplified for now", but this should query the CID database service to get the actual count of uploaded files for accurate reporting. Consider calling cid_db_service.has_cid_mapping() for each locality or implementing a batch count query.
```diff
-    // Check how many are already uploaded - simplified for now
-    let uploaded_count = 0u32;
+    // Query the actual number of uploaded files for this country
+    let uploaded_count = db_service.get_uploaded_file_count(country_code).await?;
```
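A sketch of the counting logic the suggested `get_uploaded_file_count` (a hypothetical method name) would implement. In the real service this would be a single SQL query such as `SELECT COUNT(*) FROM cid_mappings WHERE country_code = ?1`; here the CID table is modeled as an in-memory slice of `(country_code, locality_id)` pairs:

```rust
// Count how many localities for a given country already have a CID
// mapping, i.e. have been uploaded.
fn uploaded_count(mappings: &[(&str, u64)], country_code: &str) -> u32 {
    mappings
        .iter()
        .filter(|(country, _)| *country == country_code)
        .count() as u32
}

fn main() {
    // Locality IDs reused from the test fixtures above; the AF entry
    // is hypothetical.
    let mappings = [
        ("AE", 421168683u64),
        ("AE", 421168685),
        ("AF", 900000001),
    ];
    assert_eq!(uploaded_count(&mappings, "AE"), 2);
    assert_eq!(uploaded_count(&mappings, "AF"), 1);
    assert_eq!(uploaded_count(&mappings, "ZZ"), 0); // nothing uploaded yet
    println!("per-country uploaded counts computed, not hardcoded to 0");
}
```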
This PR migrates the application from a centralized HTTP/Tor-based server to a decentralized Codex network node, replacing HTTP serving of PMTiles files with uploads to Codex, a persistent distributed storage network.