In the following notebook, there is a possible mistake in a code cell:
07-how-random-forests-really-work.ipynb
The code cell in question:
pd.DataFrame(dict(cols=trn_xs.columns, imp=m.feature_importances_)).plot('cols', 'imp', 'barh');
The text around the cell indicates that it should plot the feature_importances_ of the RandomForestClassifier, but m is the DecisionTreeClassifier created in an earlier code cell:
m = DecisionTreeClassifier(max_leaf_nodes=4).fit(trn_xs, trn_y);
As I understand it, the code cell should be:
pd.DataFrame(dict(cols=trn_xs.columns, imp=rf.feature_importances_)).plot('cols', 'imp', 'barh');
For context, rf is defined as:
rf = RandomForestClassifier(100, min_samples_leaf=5)
I'd be happy to open a PR once someone confirms that this is a valid correction. Thanks! 🤗
Below is a screenshot showing that the two versions do plot different things.
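The difference can also be reproduced without the notebook's data. The snippet below is a minimal sketch that assumes scikit-learn and pandas are installed. The synthetic data from make_classification, the column names feat_0 to feat_5, and the random_state values are hypothetical stand-ins for the notebook's trn_xs/trn_y, added only for reproducibility. A tree limited to max_leaf_nodes=4 has only 3 splits, so at most 3 features can get a non-zero importance. The 100-tree forest spreads importance across many more features.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the notebook's trn_xs / trn_y (synthetic data, not the Titanic set)
X, y = make_classification(n_samples=500, n_features=6, n_informative=4, random_state=42)
trn_xs = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(X.shape[1])])
trn_y = pd.Series(y)

# The small decision tree from earlier in the notebook (what `m` refers to)
m = DecisionTreeClassifier(max_leaf_nodes=4).fit(trn_xs, trn_y)

# The random forest the surrounding text describes; it has to be fitted
# before feature_importances_ is available
rf = RandomForestClassifier(100, min_samples_leaf=5, random_state=42).fit(trn_xs, trn_y)

# Put both importance vectors side by side, one row per column of trn_xs
imps = pd.DataFrame(dict(cols=trn_xs.columns,
                         tree_imp=m.feature_importances_,
                         rf_imp=rf.feature_importances_))
print(imps)
```

To get the bar chart from the notebook, `imps.plot('cols', 'rf_imp', 'barh')` plots the forest's importances and `imps.plot('cols', 'tree_imp', 'barh')` plots the tree's (both need matplotlib).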
