Solution to Kaggle Intro to Machine Learning: Random Forests
Recap
Here’s the code you’ve written so far.
In [1]:
# Code you have previously used to load data
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
# Path of the file to read
iowa_file_path = '../input/home-data-for-ml-course/train.csv'
home_data = pd.read_csv(iowa_file_path)
# Create target object and call it y
y = home_data.SalePrice
# Create X
features = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']
X = home_data[features]
# Split into validation and training data
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
# Specify Model
iowa_model = DecisionTreeRegressor(random_state=1)
# Fit Model
iowa_model.fit(train_X, train_y)
# Make validation predictions and calculate mean absolute error
val_predictions = iowa_model.predict(val_X)
val_mae = mean_absolute_error(val_y, val_predictions)
print("Validation MAE when not specifying max_leaf_nodes: {:,.0f}".format(val_mae))
# Using best value for max_leaf_nodes
iowa_model = DecisionTreeRegressor(max_leaf_nodes=100, random_state=1)
iowa_model.fit(train_X, train_y)
val_predictions = iowa_model.predict(val_X)
val_mae = mean_absolute_error(val_y, val_predictions)
print("Validation MAE for best value of max_leaf_nodes: {:,.0f}".format(val_mae))
# Set up code checking
from learntools.core import binder
binder.bind(globals())
from learntools.machine_learning.ex6 import *
print("\nSetup complete")
Validation MAE when not specifying max_leaf_nodes: 29,653
Validation MAE for best value of max_leaf_nodes: 27,283
Setup complete
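The "best value" of max_leaf_nodes=100 used above would typically come from a small grid search, as in the earlier underfitting/overfitting lesson. Here is a minimal self-contained sketch of that search; the synthetic data and the candidate sizes are assumptions for illustration, not the Iowa dataset:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for home_data (assumption: any regression data works here)
rng = np.random.RandomState(1)
X = rng.rand(500, 7)
y = X @ rng.rand(7) * 100 + rng.randn(500)

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    # Fit a tree of the given size and score it on the validation split
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=1)
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))

candidate_sizes = [5, 25, 50, 100, 250, 500]  # candidate values are an assumption
maes = {n: get_mae(n, train_X, val_X, train_y, val_y) for n in candidate_sizes}
best_size = min(maes, key=maes.get)
print("best max_leaf_nodes:", best_size)
```

Whichever candidate yields the lowest validation MAE becomes the value passed to the final model.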
Exercises
Data science isn’t always this easy. But replacing the decision tree with a Random Forest is going to be an easy win.
Step 1: Use a Random Forest
In [2]:
from sklearn.ensemble import RandomForestRegressor
# Define the model. Set random_state to 1
rf_model = RandomForestRegressor(random_state=1)
# fit your model
rf_model.fit(train_X, train_y)
rf_model_preds = rf_model.predict(val_X)
# Calculate the mean absolute error of your Random Forest model on the validation data
rf_val_mae = mean_absolute_error(val_y, rf_model_preds)
print("Validation MAE for Random Forest Model: {}".format(rf_val_mae))
# Check your answer
step_1.check()
The long array printout and the checker message below come from a run in which rf_val_mae was bound to the raw predictions array rather than to a single MAE value:
Incorrect: Expected rf_val_mae to be a number with a decimal type. Observed type <class 'numpy.ndarray'>
With rf_val_mae = mean_absolute_error(val_y, rf_model_preds) as written above, rf_val_mae is a single number and step_1.check() passes.
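The type error above is easy to reproduce in isolation: predict returns an ndarray with one prediction per row, while mean_absolute_error collapses those into one number. A minimal sketch with made-up values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([100.0, 200.0, 300.0])   # made-up targets
y_pred = np.array([110.0, 190.0, 330.0])   # made-up predictions

mae = mean_absolute_error(y_true, y_pred)  # (10 + 10 + 30) / 3
print(type(mae), mae)                      # a single float, not an ndarray
```

The checker expects rf_val_mae to have this scalar type, so assigning it the predictions array instead of the metric triggers the error.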
In [ ]:
# The lines below will show you a hint or the solution.
# step_1.hint()
# step_1.solution()
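The "easy win" is worth seeing directly: an untuned Random Forest usually beats an untuned single tree on the same split, because averaging many bootstrapped trees reduces variance. A self-contained sketch on synthetic data (the data is an assumption, not the Iowa file):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data as a stand-in for home_data (assumption)
rng = np.random.RandomState(1)
X = rng.rand(800, 7)
y = X @ np.arange(1.0, 8.0) * 10 + rng.randn(800)

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

tree = DecisionTreeRegressor(random_state=1).fit(train_X, train_y)
forest = RandomForestRegressor(random_state=1).fit(train_X, train_y)

tree_mae = mean_absolute_error(val_y, tree.predict(val_X))
forest_mae = mean_absolute_error(val_y, forest.predict(val_X))
print("single tree MAE:", tree_mae)
print("random forest MAE:", forest_mae)
```

The single fully grown tree fits the training noise; the forest averages it away, which is why the swap tends to lower validation MAE with no tuning at all.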
So far, you have followed specific instructions at each step of your project. This helped you learn key ideas and build your first model, but now you know enough to try things on your own.
Machine Learning competitions are a great way to try your own ideas and learn more as you independently navigate a machine learning project.