pythonKaggleHousePrice/step at master · Wkalpha/pythonKaggleHousePrice · GitHub

1. Import data (of course)
Both the train and test data should be imported.
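
A minimal sketch of this step, assuming pandas and Kaggle-style CSV files. The file names in the comment and the tiny inline rows are made-up stand-ins so the snippet runs on its own; they are not the real data.

```python
import io
import pandas as pd

# Assumption: on disk you would call pd.read_csv("train.csv") and
# pd.read_csv("test.csv"). Inline text is used here so the sketch runs anywhere.
TRAIN_CSV = """Id,LotArea,Neighborhood,SalePrice
1,8450,CollgCr,208500
2,9600,Veenker,181500
3,11250,CollgCr,223500
"""
TEST_CSV = """Id,LotArea,Neighborhood
4,9550,Crawfor
5,14260,NoRidge
"""

train = pd.read_csv(io.StringIO(TRAIN_CSV))
test = pd.read_csv(io.StringIO(TEST_CSV))

# The test set has the same feature columns but no target column.
print(train.shape, test.shape)
```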

2. Exploratory data analysis (EDA)
Understanding your data helps you decide how to preprocess it.
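
A few quick checks that often guide preprocessing, sketched with pandas. The column names and values are illustrative assumptions, not real data.

```python
import numpy as np
import pandas as pd

# Made-up sample standing in for the training data.
df = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "GarageArea": [548.0, 460.0, np.nan, 642.0, 836.0],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})

# How many values are missing in each column?
missing_counts = df.isna().sum()

# Basic statistics (count, mean, std, min, quartiles, max) per column.
summary = df.describe()

# Correlation of each feature with the target.
corr_with_target = df.corr()["SalePrice"].drop("SalePrice")

print(missing_counts)
print(summary)
print(corr_with_target)
```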

3. Preprocess data
Fill NaN: Many models can't handle NaN values, so we need to fill them with the median, zero, or another suitable value.
Delete outliers: Outliers can reduce model performance, so we need to decide whether to delete them.
Convert data types: Linear models usually can't handle text features, so we need to convert them to integers.
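
The three preprocessing steps above can be sketched as follows. The data, the "Missing" placeholder, and the 4000 cutoff are assumptions chosen for illustration; a real cutoff should come from looking at your own data.

```python
import numpy as np
import pandas as pd

# Made-up sample with a missing number, a missing label, and one extreme row.
df = pd.DataFrame({
    "GrLivArea": [1500.0, 1700.0, np.nan, 1600.0, 5600.0],
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr", None, "Veenker"],
    "SalePrice": [200000, 220000, 210000, 205000, 160000],
})

# Fill NaN: median for the numeric column, a placeholder label for the text column.
df["GrLivArea"] = df["GrLivArea"].fillna(df["GrLivArea"].median())
df["Neighborhood"] = df["Neighborhood"].fillna("Missing")

# Delete outliers: drop rows above a hypothetical living-area cutoff.
df = df[df["GrLivArea"] <= 4000].reset_index(drop=True)

# Convert data type: map each text label to an integer code.
df["Neighborhood"] = df["Neighborhood"].astype("category").cat.codes

print(df)
```

Integer codes imply an order between categories, so for linear models one-hot encoding with pd.get_dummies is often a safer choice.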

4. Feature engineering
This is often the most important step in data science, because it largely determines your model's performance.
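
As one possible illustration, new features can be built by combining existing columns. The features below (TotalSF, HouseAge, LogSalePrice) are assumed examples, not a list of what this project uses, and the rows are made up.

```python
import numpy as np
import pandas as pd

# Made-up rows with Kaggle-style column names.
df = pd.DataFrame({
    "TotalBsmtSF": [856, 1262, 920],
    "1stFlrSF": [856, 1262, 920],
    "2ndFlrSF": [854, 0, 866],
    "YearBuilt": [2003, 1976, 2001],
    "YrSold": [2008, 2007, 2008],
    "SalePrice": [208500, 181500, 223500],
})

# Combine related area columns into a single total-area feature.
df["TotalSF"] = df["TotalBsmtSF"] + df["1stFlrSF"] + df["2ndFlrSF"]

# Age of the house at the time of sale.
df["HouseAge"] = df["YrSold"] - df["YearBuilt"]

# Log-transform the target to reduce skew; np.expm1 reverses it.
df["LogSalePrice"] = np.log1p(df["SalePrice"])

print(df[["TotalSF", "HouseAge", "LogSalePrice"]])
```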

5. Training model
Train XGBRegressor (the XGBoost regressor) or another model on your input data.
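
A rough sketch of the training step. To keep it runnable without extra installs, it swaps in scikit-learn's GradientBoostingRegressor as the "other model"; XGBoost's XGBRegressor follows the same fit/predict interface. The data is synthetic and the hyperparameter values are arbitrary assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: the target depends linearly on two of three features plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Hold out a validation set to measure performance on unseen rows.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
model.fit(X_train, y_train)

# Root mean squared error on the validation set.
preds = model.predict(X_valid)
rmse = np.sqrt(mean_squared_error(y_valid, preds))
print("validation RMSE:", rmse)
```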

6. Tuning model parameters
This step can improve your model's performance.
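
One common way to tune parameters is a cross-validated grid search, sketched below with scikit-learn's GridSearchCV. The model (GradientBoostingRegressor as a stand-in), the grid values, and the synthetic data are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data.
rng = np.random.default_rng(0)
X = rng.uniform(size=(150, 3))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=150)

# Hypothetical grid: every combination is scored with 3-fold cross-validation.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)

# best_score_ is a negated RMSE, so flip the sign to read it as an error.
print("best params:", search.best_params_)
print("best CV RMSE:", -search.best_score_)
```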

In summary, we probably know how to build a model in theory, but we don't actually know until we try, because we can't picture the result.
We need hands-on practice. See below.