HIVE-29578: Iceberg: add support for native views#6449

Open
difin wants to merge 2 commits into apache:master from difin:iceberg_native_views

Contributor

@difin difin commented Apr 23, 2026

What changes were proposed in this pull request?

Added support for Iceberg native views in Hive for both HMS and REST catalogs.

There is a limitation in the current implementation: when Hive uses a REST catalog and creates a view on a partitioned Iceberg table, querying the view only works with CBO disabled. This limitation will be addressed in a follow-up PR.

Why are the changes needed?

To support Iceberg native views. This can be especially useful for REST Catalog clients.

Does this PR introduce any user-facing change?

Yes, new HQL syntax:

create view <view_name> as select * from <src_tbl> stored by iceberg;

How was this patch tested?

Created new unit and integration tests, and updated existing ones, with Iceberg native view test cases.

@difin difin force-pushed the iceberg_native_views branch from 4fdad42 to 252c608 Compare April 24, 2026 23:06
@difin difin force-pushed the iceberg_native_views branch from 252c608 to e10eba5 Compare April 24, 2026 23:31
@difin difin marked this pull request as ready for review April 24, 2026 23:31
@difin difin changed the title HIVE-29578: Iceberg: support for Iceberg native views HIVE-29578: Iceberg: support native views Apr 24, 2026
@difin difin changed the title HIVE-29578: Iceberg: support native views HIVE-29578: Iceberg: add support for native views Apr 24, 2026
@difin difin force-pushed the iceberg_native_views branch from e10eba5 to 96fa476 Compare April 25, 2026 20:46
@difin difin requested review from deniskuzZ and kasakrisz April 25, 2026 20:46
@difin difin force-pushed the iceberg_native_views branch from 96fa476 to 114412a Compare April 26, 2026 15:12

delete from src_ice where last_name in ('ln1a', 'ln2a', 'ln7a');

create view v_ice as select * from src_ice stored by iceberg;
Contributor

IMHO the syntax should follow the materialized view syntax:

create materialized view mat1 stored by iceberg stored as orc tblproperties ('format-version'='1') as
select tbl_ice.b, tbl_ice.c from tbl_ice where tbl_ice.c > 52;

I checked some other database engines (Trino, Dremio) that support Iceberg logical views. None of them adds extra keywords to the SQL syntax; instead, they let you define the catalog where the view should be stored, and that catalog should be an Iceberg catalog.

Comment on lines +205 to +209
/**
 * Optional trailing {@code tableFileFormat} on CREATE VIEW: only {@code STORED BY ICEBERG} is allowed
 * (no serde properties or {@code STORED AS} tail).
 */
private boolean validateOptionalViewStorageClause(ASTNode storageRoot) throws SemanticException {
Contributor

The keywords STORED BY ICEBERG are a bit confusing because no data is actually stored in the case of logical views. Some engines do not require extra keywords to specify when creating Iceberg logical views.

If we insist on using keywords, how about something like these?

create view <view_name> viewproperties(format='iceberg')
as select...;

create view <view_name> format iceberg
as select...;

If we decide to go with the STORED BY ICEBERG keywords, please create a new grammar rule specifically for views—similar to tableFileFormat—called viewMetadataFormat. This should limit the grammar to the STORED BY <identifier> syntax. By doing this, you can eliminate the need for extra validation checks in the analyzer.

I recommend checking the configuration setting hive.default.storage.handler.class when deciding where to store the view metadata. If a storage handler is set that supports views, let's use the Storage Handler API to store the metadata.

Contributor Author

@difin difin May 5, 2026

  • Created a rule viewMetadataFormat.
  • Moved STORED BY ICEBERG closer to the view definition, similar to CTAS, i.e. `STORED BY ICEBERG as SELECT`.
  • Made STORED BY ICEBERG optional. If it is not specified, the view type is deduced from the hive.default.storage.handler.class configuration.
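
For readers following the thread, the resolution order described in the last bullet can be sketched roughly as below. This is only an illustration under stated assumptions: `ViewFormatResolver`, `ViewFormat`, and `resolve` are hypothetical names that do not come from the PR, and the handler class name is assumed to be the Iceberg storage handler's fully qualified name. Only the configuration key `hive.default.storage.handler.class` is taken from the discussion.

```java
import java.util.Map;

/** Hypothetical sketch: decide where a new view's metadata should live. */
final class ViewFormatResolver {
  // Configuration key named in the review thread.
  static final String DEFAULT_HANDLER_KEY = "hive.default.storage.handler.class";
  // Assumed class name of the Iceberg storage handler; used here only for string matching.
  static final String ICEBERG_HANDLER = "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler";

  enum ViewFormat { HIVE, ICEBERG }

  private ViewFormatResolver() {}

  /**
   * @param storedBy identifier from an optional STORED BY clause, or null when the clause is absent
   * @param conf     session configuration as plain key/value pairs
   */
  static ViewFormat resolve(String storedBy, Map<String, String> conf) {
    if (storedBy != null) {
      // An explicit clause always wins over the configured default.
      if ("iceberg".equalsIgnoreCase(storedBy)) {
        return ViewFormat.ICEBERG;
      }
      throw new IllegalArgumentException("Unsupported view format: " + storedBy);
    }
    // No clause: fall back to the configured default storage handler, if any.
    String handler = conf.get(DEFAULT_HANDLER_KEY);
    return ICEBERG_HANDLER.equals(handler) ? ViewFormat.ICEBERG : ViewFormat.HIVE;
  }
}
```

In this sketch a missing or unrelated default handler yields a regular Hive view; how the actual patch treats other handlers is not stated in the thread.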

result.setLastAccessTime(nowSec);
result.setRetention(Integer.MAX_VALUE);

boolean hiveEngineEnabled = false;
Contributor

What is hiveEngineEnabled and why is it false?

Contributor Author

@difin difin May 5, 2026

hiveEngineEnabled switches how HiveOperationsBase.storageDescriptor fills the Storage Descriptor: with HiveIcebergInputFormat / HiveIcebergOutputFormat / HiveIcebergSerDe when true, or the usual placeholder FileInputFormat / FileOutputFormat / LazySimpleSerDe when false.

Why it’s false in toHiveView:

This path materializes an HMS VIRTUAL_VIEW for REST catalogs that expose Iceberg view metadata through the HMS API. That row isn’t meant to drive a Hive table scan the way a real Iceberg table commit does; execution still comes from the view definition / catalog, not from wiring Iceberg MR formats on the stub. HiveViewOperations does the same thing (hiveEngineEnabled = false).

So we keep a minimal SD consistent with normal virtual views and avoid implying this HMS object is an Iceberg-backed table for the Hive engine. For tables, HiveTableOperations still turns engine integration on/off via metadata + ConfigProperties.ENGINE_HIVE_ENABLED where that actually matters.
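
The switch described above can be pictured with the following simplified sketch. It is not the code in HiveOperationsBase: `StorageFormats` and `StorageDescriptorSketch` are hypothetical stand-ins, and the class names are kept as short strings taken from the explanation rather than fully qualified names.

```java
/** Hypothetical holder for the three class names a storage descriptor records. */
final class StorageFormats {
  final String inputFormat;
  final String outputFormat;
  final String serde;

  StorageFormats(String inputFormat, String outputFormat, String serde) {
    this.inputFormat = inputFormat;
    this.outputFormat = outputFormat;
    this.serde = serde;
  }
}

final class StorageDescriptorSketch {
  private StorageDescriptorSketch() {}

  /** Picks Iceberg MR classes when the Hive engine flag is on, placeholder classes otherwise. */
  static StorageFormats formatsFor(boolean hiveEngineEnabled) {
    if (hiveEngineEnabled) {
      return new StorageFormats(
          "HiveIcebergInputFormat", "HiveIcebergOutputFormat", "HiveIcebergSerDe");
    }
    // Views take this branch: the HMS row is a stub and never drives a table scan.
    return new StorageFormats("FileInputFormat", "FileOutputFormat", "LazySimpleSerDe");
  }
}
```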

private static ViewBuilder applyCommentAndTblProps(
    ViewBuilder builder, Map<String, String> tblProperties, String comment) {
  ViewBuilder viewBuilder = builder;
  if (comment != null && !comment.isEmpty()) {
Contributor

How about isNotBlank?

Contributor Author

Done
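
The practical difference behind this suggestion is that a whitespace-only comment passes a `!isEmpty()` check but not a blank check. The sketch below shows this with plain JDK calls (Java 11+ `String.isBlank`); `CommentCheck` is a hypothetical helper, and which utility class the patch actually uses is not shown in the thread.

```java
/** Hypothetical helper contrasting the original check with a blank-aware one. */
final class CommentCheck {
  private CommentCheck() {}

  /** Mirrors the original condition: null and "" are rejected, "   " is accepted. */
  static boolean notEmpty(String s) {
    return s != null && !s.isEmpty();
  }

  /** Blank-aware variant: null, "" and whitespace-only strings are all rejected. */
  static boolean notBlank(String s) {
    return s != null && !s.isBlank();
  }
}
```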

return (ViewCatalog) catalog;
}

private static ViewBuilder startViewBuilder(
Contributor

nit: startViewBuilder, applyCommentAndTblProps and commitView don't add much value when we already have a builder.

Contributor Author

Done. Replaced the overly verbose methods with inline code.

Comment on lines +57 to +58
if (cat.viewExists(id)) {
  cat.dropView(id);
Contributor

What is the result of dropView when a view with the specified name doesn't exist? I'm wondering whether the cat.viewExists(id) check is necessary.

Contributor Author

dropView returns false when a view with the specified name doesn't exist, and true otherwise. I removed the cat.viewExists(id) check.
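
A minimal sketch of why the guard is redundant, using a hypothetical in-memory stand-in (`InMemoryViewCatalog`) rather than a real Iceberg catalog: when the drop call itself reports whether anything was removed and does not fail on a missing view, checking existence first adds nothing.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory stand-in that mimics the boolean drop contract described above. */
final class InMemoryViewCatalog {
  private final Set<String> views = new HashSet<>();

  void createView(String name) {
    views.add(name);
  }

  boolean viewExists(String name) {
    return views.contains(name);
  }

  /** Returns true if the view existed and was removed, false if there was nothing to drop. */
  boolean dropView(String name) {
    return views.remove(name);
  }
}
```

With this contract, test cleanup can call `dropView` unconditionally and ignore the return value.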

}

@Test
public void testIfNotExistsReturnsFalseWhenViewExists() throws Exception {
Contributor

Isn't the test method name testIfNotExistsReturnsFalseWhenViewExists misleading? We are testing createOrReplaceNativeView, not an IfNotExists method.

Contributor Author

Yes, the name was vague: IfNotExists is not a method but one of the parameters.
I renamed the method to:
testCreateOrReplaceNativeViewSkipsWhenViewExistsAndIfNotExistsFlagTrue
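
The behaviour the renamed test targets can be illustrated with the sketch below. It is an assumption-laden model, not the PR's createOrReplaceNativeView: `NativeViewSketch`, its parameter list, and the handling of the replace flag and the exception are hypothetical. Only the "skip and return false when the view exists and the if-not-exists flag is true" case is taken from the thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of create-or-replace semantics with an if-not-exists flag. */
final class NativeViewSketch {
  private final Map<String, String> views = new HashMap<>();

  /**
   * @return false when the view already exists and ifNotExists is set (nothing is changed),
   *         true when the view was created or replaced
   */
  boolean createOrReplace(String name, String sql, boolean ifNotExists, boolean replace) {
    if (views.containsKey(name)) {
      if (ifNotExists) {
        return false; // skip: keep the existing definition untouched
      }
      if (!replace) {
        // Assumed behaviour for a plain CREATE on an existing view.
        throw new IllegalStateException("View already exists: " + name);
      }
    }
    views.put(name, sql);
    return true;
  }

  String definition(String name) {
    return views.get(name);
  }
}
```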

create view v_ice as select * from src_ice stored by iceberg;

select * from v_ice;

Contributor

Could you please add tests for:

  • a logical view that does some transformation on its base table, plus a query against it?
  • creating views both when the schema is specified and when it is not.

Contributor Author

@difin difin May 5, 2026

logical view which does some transformation on its base table and query from it?

This is not supported by Hive itself:

update v_ice set last_name = last_name + 'a' 
fname=iceberg_native_view.q

See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ for specific test cases logs.
 org.apache.hadoop.hive.ql.parse.SemanticException: You cannot update or delete records in a view
	at org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.validateTargetTable(RewriteSemanticAnalyzer.java:265)
	at org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyze(RewriteSemanticAnalyzer.java:84)
	at org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:73)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:358)
	at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
	at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:109)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:499)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:451)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:415)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:234)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:358)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:790)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:760)
	at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:115)
	at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:139)

create views when the schema is specified and not specified.

Done

break;
}
}
boolean icebergNativeView = validateOptionalViewStorageClause(storageClause);
Contributor

Please do not hardcode anything like Iceberg into compiler code. The compiler is independent of the storage handler. I'm aware that we already have lots of code which violates this principle, and it already causes lots of trouble.

Contributor Author

@difin difin May 5, 2026

Fixed - moved all Iceberg-specific code into HiveIcebergStorageHandler and kept generic interfaces in the Compiler.
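
One way to picture the separation described here is a capability method on a generic handler interface, so that compiler-side code never names a specific format. The sketch is purely illustrative: `StorageHandlerSketch`, `supportsNativeViews`, `IcebergLikeHandler`, `PlainHandler`, and `ViewPlanner` are hypothetical names and are not the interfaces introduced by the patch.

```java
/** Hypothetical generic interface; the compiler side depends only on this. */
interface StorageHandlerSketch {
  /** Handlers that can persist view metadata themselves override this. */
  default boolean supportsNativeViews() {
    return false;
  }
}

/** Hypothetical format-specific handler that opts in. */
final class IcebergLikeHandler implements StorageHandlerSketch {
  @Override
  public boolean supportsNativeViews() {
    return true;
  }
}

/** Hypothetical handler that keeps the default (no native views). */
final class PlainHandler implements StorageHandlerSketch {}

/** Compiler-side decision that contains no format-specific knowledge. */
final class ViewPlanner {
  private ViewPlanner() {}

  static String target(StorageHandlerSketch handler) {
    boolean handlerManaged = handler != null && handler.supportsNativeViews();
    return handlerManaged ? "handler-managed view" : "HMS virtual view";
  }
}
```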

private static final long serialVersionUID = 1L;

/** HMS table property set when the view is declared with {@code STORED BY ICEBERG} (native Iceberg view). */
public static final String ICEBERG_NATIVE_VIEW_PROPERTY = "hive.iceberg.native.view";
Contributor

Please remove this from here.

Contributor Author

Done

private final boolean ifNotExists;
private final boolean replace;
private final List<FieldSchema> partitionColumns;
private final boolean icebergNativeView;
Contributor

Please remove this from here.

Contributor Author

Done

Comment on lines +104 to +107
@Explain(displayName = "iceberg native view", displayOnlyOnTrue = true)
public boolean isIcebergNativeView() {
  return icebergNativeView;
}
Contributor

Please remove this from here.

Contributor Author

Done


sonarqubecloud Bot commented May 5, 2026
