Skip to content

Conversation

@CTTY
Copy link
Collaborator

@CTTY CTTY commented Jan 13, 2026

Which issue does this PR close?

What changes are included in this PR?

  • Support a CREATE EXTERNAL TABLE with catalog so the table is writable

Are these changes tested?

Added some uts and sqllogictests

@CTTY CTTY marked this pull request as ready for review January 14, 2026 01:45
Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CTTY for this pr, just finished first round of review, it looks great!


# Create external table with bare name (uses "default" namespace)
statement ok
CREATE EXTERNAL TABLE ns_table STORED AS ICEBERG LOCATION ''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a test for partitioned table?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's not included in this PR, it should be doable by changing things here

I just filed a separate issue to track it: #2050

catalog.create_namespace(&namespace, HashMap::new()).await?;

// Create partitioned test table (unpartitioned tables are now created via SQL)
Self::create_partitioned_table(&catalog, &namespace).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove this one.

@liurenjie1024
Copy link
Contributor

After second thought, I still fee that the logic of creating external table in datafusion is a little confusion. IIUC, the calling stack of creating external table is like following:

let table_provider = table_provier_factory.create(...);
schema_provider.register(table_name, table_provider);

If the schema_provider is non iceberg schema provider, for example MemorySchemaProvider, the logic in this pr is absolute correct. But loading a catalog managed iceberg table to put it in non iceberg catalog/schema seem odd to me.

When the schema_provider is an iceberg schema provider, which is encapsulated by iceberg catalog provider, things get complicated. I think in this case we should create a table rather than simply loading from catalog, which always return error(it has been checked in schema_provider).

Due to the complexity, I think a better approach would be to rename exisiting TableProviderFactory to StaticTableProviderFactory, which only loads table from a metadata location. And we create a new CatalogTableProviderFactory, which is backed an iceberg catalog. This CatalogTableProviderFactory is supposed to be used with IcebergCatalogProvider, IcebergSchemaProvider together, which provides fully functional table factory, even creating a new table which doesn't exists yet.

{
return Err(to_datafusion_error(Error::new(
ErrorKind::TableNotFound,
format!("Table '{table_ident}' not found in catalog"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
format!("Table '{table_ident}' not found in catalog"),
format!("Table '{table_ident}' not found in iceberg catalog"),

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support CREATE EXTERNAL TABLE backed by a Catalog with DataFusion

2 participants