[CARBONDATA-4045] Add TPCDS TestCase for Spark on CarbonData Integration Test#3997

Open
marchpure wants to merge 1 commit into apache:master from marchpure:tpcds

@marchpure
Contributor

@marchpure marchpure commented Oct 25, 2020

Why is this PR needed?

There are no TPC-DS test cases in the current source code, which makes it difficult to debug TPC-DS queries on a small dataset. TPC-DS test cases would also help uncover possible issues.

What changes were proposed in this PR?

  1. Add a small TPC-DS dataset
  2. Add the TPC-DS SQL queries
  3. Add 54 test cases (TPC-DS has 99 queries, but only 54 of them run without returning an empty result)
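
The selection in item 3 can be sketched as follows. This is only an illustration, not the code in this PR: it uses Python's built-in `sqlite3` as a stand-in for Spark on CarbonData, and the helper `select_non_empty`, the two sample queries, and the inserted rows are all hypothetical (only the `store_sales` table and column names come from the TPC-DS schema).

```python
import sqlite3


def select_non_empty(conn, queries):
    """Return the names of the queries that yield at least one row."""
    kept = []
    for name, sql in queries.items():
        # fetchone() returns None when the result set is empty.
        if conn.execute(sql).fetchone() is not None:
            kept.append(name)
    return kept


# Tiny in-memory dataset; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store_sales (ss_item_sk INTEGER, ss_quantity INTEGER)")
conn.executemany("INSERT INTO store_sales VALUES (?, ?)", [(1, 5), (2, 40)])

# Hypothetical queries, not actual TPC-DS query text.
queries = {
    "q_small": "SELECT ss_item_sk FROM store_sales WHERE ss_quantity < 10",
    "q_huge": "SELECT ss_item_sk FROM store_sales WHERE ss_quantity > 1000",
}

kept = select_non_empty(conn, queries)
```

A query that matches nothing in a tiny dataset cannot distinguish a correct result from a wrong one, which is presumably why such queries are left out of the test cases.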

Does this PR introduce any user interface change?

  • No

Is any new testcase added?

  • Yes

@CarbonDataQA1

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2920/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4677/

@CarbonDataQA1

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2921/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4678/

@marchpure
Contributor Author

retest this please

@marchpure marchpure changed the title [WIP]Add TPCDS TestCase [WIP] Add TPCDS TestCase Oct 25, 2020
@CarbonDataQA1

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2923/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4680/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4681/

@CarbonDataQA1

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2924/

@CarbonDataQA1

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2927/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4684/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4690/

@CarbonDataQA1

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2933/

@CarbonDataQA1

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4693/

@CarbonDataQA1

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2936/

@marchpure marchpure changed the title [WIP] Add TPCDS TestCase [CARBONDATA-4045] Add TPCDS TestCase Oct 27, 2020
@CarbonDataQA1

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4697/

@CarbonDataQA1

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2940/

@marchpure marchpure changed the title [CARBONDATA-4045] Add TPCDS TestCase [CARBONDATA-4045] Add TPCDS TestCase for Spark Integration Testing Oct 27, 2020
@marchpure marchpure changed the title [CARBONDATA-4045] Add TPCDS TestCase for Spark Integration Testing [CARBONDATA-4045] Add TPCDS TestCase for Spark Integration Test Oct 27, 2020
@marchpure marchpure changed the title [CARBONDATA-4045] Add TPCDS TestCase for Spark Integration Test [CARBONDATA-4045] Add TPCDS TestCase for Spark on CarbonData Integration Test Oct 27, 2020
@ajantha-bhat
Member

@marchpure : The reason we don't have TPCH and TPCDS in UT is that they need a huge dataset, and loading huge data takes time.

What is the reason behind adding this? We can anyway have separate TPCH or TPCDS machines with an automation script that gives a performance benchmark on every release.

Also, there is no need to run TPCH and TPCDS on every PR builder. Running once per release is enough.

@QiangCai , @kunal642 : What's your opinion on this?

@marchpure
Contributor Author

marchpure commented Oct 27, 2020

> @marchpure : The reason why we don't have TPCH and TPCDS in UT is we need a huge data set, loading huge data takes time.
>
> What is the reason behind adding this? we can anyways have separate TPCH or TPCDS machines that can have automation script to give performance benchmark on every release
>
> Also no need to run TPCH and TPCDS on every PR builder. Running once per release is enough.
>
> @QiangCai , @kunal642 : What's your opinion on this?

  1. The TPCDS dataset in this PR is really small (33 KB in total). It won't take much time to load and query, and it may help catch possible issues at an acceptable overhead.
  2. It also helps us debug TPCDS, for example by running explain plan and analyse in a local environment.

The inspiration for adding the TPCDS test cases is CARBONDATA-4008, in which Spark on CarbonData fails on TPCDS Query 83. That issue seems to have been there for a long time, which implies that our UT coverage is not enough.
I believe we can add a profile to turn the TPCDS tests on or off in the future, once the automated TPCDS machine is ready.

Maybe we can have a module named 'carbondata-integration-test'?
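
A rough sketch of the on/off switch mentioned above, under loud assumptions: it swaps the build profile for a plain environment variable, and the variable name `TPCDS_TESTS`, the helper `tpcds_enabled`, and the test class are hypothetical names used only for illustration. It is written with Python's `unittest` rather than the project's own test framework.

```python
import os
import unittest


def tpcds_enabled(env=os.environ):
    """Return True unless the hypothetical TPCDS_TESTS variable is set to 'off'."""
    return env.get("TPCDS_TESTS", "on").lower() != "off"


class TpcdsSmokeTest(unittest.TestCase):
    # The whole test is skipped when the switch is off, so a PR builder
    # can opt out while a dedicated benchmark machine keeps it on.
    @unittest.skipUnless(tpcds_enabled(), "TPCDS tests are switched off")
    def test_query_returns_rows(self):
        # Placeholder body: a real test would load the small dataset
        # and assert on the result of one TPCDS query.
        self.assertTrue(True)
```

Defaulting to "on" keeps the tests running unless someone explicitly disables them, which matches the intent of catching regressions early.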
