Chapter 9

What is a continuous variable?
What is a categorical variable?
Provide two of the words that are used for the possible values of a categorical variable.
What is a "dense layer"?
How do entity embeddings reduce memory usage and speed up neural networks?
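A minimal NumPy sketch of the mechanism behind this question. The embedding matrix `W`, its size, and the integer codes in `idx` are all hypothetical. Multiplying a one-hot matrix by `W` gives the same result as looking up rows of `W` by index. The lookup never builds the one-hot matrix and skips the multiplication entirely.

```python
import numpy as np

rng = np.random.default_rng(0)
n_levels, emb_dim = 5, 3

# Hypothetical embedding matrix: one learnable row per category level.
W = rng.normal(size=(n_levels, emb_dim))

# Three rows of integer-coded categorical data.
idx = np.array([4, 0, 2])

# Dense-layer view: build a (3, 5) one-hot matrix, then matrix-multiply.
one_hot = np.eye(n_levels)[idx]
via_matmul = one_hot @ W

# Embedding view: look up the rows directly; no one-hot matrix, no multiply.
via_lookup = W[idx]

print(one_hot.nbytes, idx.nbytes)  # one-hot storage grows with the number of levels
```

With thousands of levels, the one-hot matrix is almost entirely zeros. That waste is where the memory and speed savings come from.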
What kinds of datasets are entity embeddings especially useful for?
What are the two main families of machine learning algorithms?
Why do some categorical columns need a special ordering in their classes? How do you do this in pandas?
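A minimal pandas sketch, using a hypothetical handful of `ProductSize` values. The column is converted to a categorical dtype, and then `cat.set_categories(..., ordered=True)` imposes an explicit order. The result is that integer codes and comparisons follow size order instead of alphabetical order.

```python
import pandas as pd

# Hypothetical sample of the ProductSize column.
df = pd.DataFrame({"ProductSize": ["Small", "Large", "Medium", "Mini",
                                   "Large / Medium", "Compact", "Large"]})

# Desired order of the levels, largest to smallest.
sizes = ["Large", "Large / Medium", "Medium", "Small", "Mini", "Compact"]

# Convert to a categorical dtype, then impose the order explicitly.
df["ProductSize"] = df["ProductSize"].astype("category")
df["ProductSize"] = df["ProductSize"].cat.set_categories(sizes, ordered=True)

# The integer codes now follow the order of `sizes`, not alphabetical order.
codes = df["ProductSize"].cat.codes.tolist()
```

Reassigning the result, rather than modifying in place, avoids relying on an `inplace` argument that recent pandas versions no longer accept for categorical methods.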
Summarize what a decision tree algorithm does.
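The greedy procedure can be sketched in plain Python. This is a hypothetical minimal regression tree, not fastai's or scikit-learn's implementation. For every column and every candidate split value, it divides the rows into two groups and scores the split by the squared error around each group's mean. It keeps the best split and recurses. Recursion stops when no split improves the error or when every split would leave fewer than `min_samples_leaf` rows in a group.

```python
import statistics

def _sse(vals):
    """Sum of squared errors of vals around their mean."""
    m = statistics.fmean(vals)
    return sum((v - m) ** 2 for v in vals)

def build_tree(rows, y, min_samples_leaf=2):
    """Hypothetical minimal regression tree; rows is a list of numeric feature tuples."""
    best = None
    for col in range(len(rows[0])):
        for t in sorted({r[col] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[col] <= t]
            right = [i for i, r in enumerate(rows) if r[col] > t]
            if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
                continue
            # Score the split by the total squared error around each group's mean.
            err = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or err < best[0]:
                best = (err, col, t, left, right)
    if best is None or best[0] >= _sse(y):  # no useful split: this node becomes a leaf
        return statistics.fmean(y)
    _, col, t, left, right = best
    return (col, t,
            build_tree([rows[i] for i in left], [y[i] for i in left], min_samples_leaf),
            build_tree([rows[i] for i in right], [y[i] for i in right], min_samples_leaf))

def predict(tree, row):
    """Walk down the tree until reaching a leaf (a plain number)."""
    while isinstance(tree, tuple):
        col, t, left, right = tree
        tree = left if row[col] <= t else right
    return tree
```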
Why is a date different from a regular categorical or continuous variable, and how can you preprocess it to allow it to be used in a model?
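fastai's `add_datepart` performs this kind of expansion for a whole DataFrame column. The helper below is a hypothetical, standard-library-only sketch for a single date. It shows the idea of replacing one date with several columns a model can split on, such as year, month, day of week, and month-end flags. Its column names and the days-based `Elapsed` are illustrative choices, not fastai's exact output.

```python
from datetime import date, timedelta

def expand_date(d: date) -> dict:
    """Hypothetical helper: turn one date into several numeric/boolean features."""
    return {
        "Year": d.year,
        "Month": d.month,
        "Week": d.isocalendar()[1],                  # ISO week number
        "Day": d.day,
        "Dayofweek": d.weekday(),                    # Monday=0 ... Sunday=6
        "Dayofyear": d.timetuple().tm_yday,
        "Is_month_start": d.day == 1,
        "Is_month_end": (d + timedelta(days=1)).day == 1,
        "Elapsed": (d - date(1970, 1, 1)).days,      # days since 1970-01-01
    }

features = expand_date(date(2012, 2, 29))
```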
Should you pick a random validation set in the bulldozer competition? If not, what kind of validation set should you pick?
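A sketch of a time-based split, using hypothetical `(saledate, log_price)` rows and a hypothetical cutoff date. Every validation row comes after every training row. This mimics predicting future sales from past ones, which a random split would not do.

```python
from datetime import date

# Hypothetical (saledate, log_price) rows; real data would come from the competition files.
sales = [
    (date(2009, 3, 14), 10.1),
    (date(2011, 11, 2), 10.6),
    (date(2010, 7, 30), 9.8),
    (date(2012, 1, 19), 10.9),
    (date(2011, 4, 5), 10.3),
]

cutoff = date(2011, 10, 1)  # hypothetical cutoff: later sales form the validation set
train = [r for r in sales if r[0] < cutoff]
valid = [r for r in sales if r[0] >= cutoff]
```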
What is pickle and what is it useful for?
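A minimal round trip with Python's standard `pickle` module. The dictionary stands in for an expensive-to-rebuild object, such as preprocessed tabular data, and its contents are hypothetical. The object is serialized to a file once and can be restored later without redoing the preprocessing.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical object standing in for preprocessed data.
processed = {"cat_names": ["ProductSize"], "cont_names": ["YearMade"],
             "rows": [(3, 2004), (0, 1998)]}

path = Path(tempfile.mkdtemp()) / "processed.pkl"

# Serialize the Python object to bytes on disk...
with open(path, "wb") as f:
    pickle.dump(processed, f)

# ...and later (e.g. in a new session) rebuild an equal object without redoing the work.
with open(path, "rb") as f:
    restored = pickle.load(f)
```

Only unpickle files you trust: loading a pickle can execute arbitrary code.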
How are mse, samples, and values calculated in the decision tree drawn in this chapter?
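Each box in a regression-tree plot summarizes the training targets that reach that node. Recent scikit-learn versions may label the error `squared_error` instead of `mse`. The sketch below, with hypothetical targets, shows how the three numbers relate.

```python
def node_stats(y):
    """Statistics shown in each box of a regression-tree plot, for the targets y reaching that node."""
    samples = len(y)                                   # number of training rows in the node
    value = sum(y) / samples                           # the node's prediction: mean target
    mse = sum((v - value) ** 2 for v in y) / samples   # mean squared error around that mean
    return samples, value, mse

samples, value, mse = node_stats([1.0, 2.0, 3.0])
```

Applied to the root node, this gives the mean and variance of the whole training target. Each child node repeats the calculation on its own subset of rows.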
How do we deal with outliers before building a decision tree?
How do we handle categorical variables in a decision tree?
What is bagging?
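Bagging can be sketched in plain Python: draw a bootstrap sample of rows (with replacement), train a model on it, repeat, and average the predictions. The base learner here is a hypothetical 1-nearest-neighbor model. It was chosen because it is cheap and high-variance, and it stands in for the decision trees a random forest would use.

```python
import random
import statistics

def fit_1nn(xs, ys):
    """Hypothetical high-variance base learner: predict the y of the nearest training x."""
    pts = list(zip(xs, ys))
    return lambda x: min(pts, key=lambda p: abs(p[0] - x))[1]

def bagging(xs, ys, fit, n_models=25, seed=0):
    """Train n_models copies of `fit` on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n rows drawn with replacement
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))
    # The ensemble prediction is the plain average of the individual models' predictions.
    return lambda x: statistics.fmean(m(x) for m in models)

xs = list(range(20))
step_model = bagging(xs, [0.0] * 10 + [10.0] * 10, fit_1nn)
```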
What is the difference between max_samples and max_features when creating a random forest?
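A sketch of where each parameter acts, using hypothetical helper functions and integer counts (scikit-learn's `RandomForestRegressor` also accepts fractions). `max_samples` limits how many rows each tree is trained on. Those rows are drawn once per tree, with replacement, as scikit-learn does when `bootstrap=True`. `max_features` limits how many columns are considered as candidates at each individual split.

```python
import random

def rows_for_tree(n_rows, max_samples, rng):
    """max_samples: how many rows one tree is trained on (drawn once per tree, with replacement)."""
    return [rng.randrange(n_rows) for _ in range(max_samples)]

def cols_for_split(n_cols, max_features, rng):
    """max_features: how many distinct columns are candidates at one split (redrawn per split)."""
    return sorted(rng.sample(range(n_cols), max_features))

rng = random.Random(0)
rows = rows_for_tree(n_rows=1000, max_samples=500, rng=rng)
cols = cols_for_split(n_cols=10, max_features=5, rng=rng)
```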
If you increase n_estimators to a very high value, can that lead to overfitting? Why or why not?
In the section "Creating a Random Forest", just after <>, why did preds.mean(0) give the same result as our random forest?
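The reason can be seen with a toy NumPy array of hypothetical per-tree predictions, with one row per tree and one column per validation row. A random forest regressor predicts the average over its trees. `mean(0)` averages over axis 0, which is the trees, and so leaves one prediction per validation row.

```python
import numpy as np

# Hypothetical predictions: 3 trees (rows) x 4 validation rows (columns).
preds = np.array([[1.0, 2.0, 3.0, 4.0],
                  [3.0, 2.0, 1.0, 4.0],
                  [2.0, 2.0, 2.0, 4.0]])

# Average over axis 0 (the trees): one ensemble prediction per validation row.
forest_pred = preds.mean(0)

# Averaging over axis 1 instead would give one number per tree, which is not a forest prediction.
per_tree_avg = preds.mean(1)
```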
What is "out-of-bag error"?
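A plain-Python sketch of the idea. It assumes bootstrap sampling with replacement and uses a hypothetical least-squares line as the base model in place of decision trees. Each row is scored only by the models that did not see it during training, so the result behaves like a validation error without needing a separate validation set. scikit-learn's `RandomForestRegressor` computes out-of-bag predictions when created with `oob_score=True` and stores them in its `oob_prediction_` attribute.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Hypothetical base learner: ordinary least-squares line y = a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + b * (x - mx)

def oob_rmse(xs, ys, fit, n_models=50, seed=0):
    """RMSE where each row is predicted only by models whose bootstrap sample left it out."""
    rng = random.Random(seed)
    n = len(xs)
    oob_preds = [[] for _ in range(n)]
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (with replacement)
        in_bag = set(idx)
        model = fit([xs[i] for i in idx], [ys[i] for i in idx])
        for i in range(n):
            if i not in in_bag:                       # row i is "out of bag" for this model
                oob_preds[i].append(model(xs[i]))
    scored = [i for i in range(n) if oob_preds[i]]   # rows left out by at least one model
    sq_errs = [(statistics.fmean(oob_preds[i]) - ys[i]) ** 2 for i in scored]
    return math.sqrt(statistics.fmean(sq_errs))

xs = list(range(30))
```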