- What is a continuous variable?
- What is a categorical variable?
- Provide two of the words that are used for the possible values of a categorical variable.
- What is a "dense layer"?
- How do entity embeddings reduce memory usage and speed up neural networks?
- What kinds of datasets are entity embeddings especially useful for?
- What are the two main families of machine learning algorithms?
- Why do some categorical columns need a special ordering in their classes? How do you do this in Pandas?
- Summarize what a decision tree algorithm does.
- Why is a date different from a regular categorical or continuous variable, and how can you preprocess it to allow it to be used in a model?
- Should you pick a random validation set in the bulldozer competition? If no, what kind of validation set should you pick?
- What is pickle and what is it useful for?
- How are `mse`, `samples`, and `values` calculated in the decision tree drawn in this chapter?
- How do we deal with outliers before building a decision tree?
- How do we handle categorical variables in a decision tree?
- What is bagging?
- What is the difference between `max_samples` and `max_features` when creating a random forest?
- If you increase `n_estimators` to a very high value, can that lead to overfitting? Why or why not?
- In the section "Creating a Random Forest", just after <>, why did `preds.mean(0)` give the same result as our random forest?
- What is "out-of-bag error"?