  • What is a continuous variable?
  • What is a categorical variable?
  • Provide two of the words that are used for the possible values of a categorical variable.
  • What is a "dense layer"?
  • How do entity embeddings reduce memory usage and speed up neural networks?
  • What kinds of datasets are entity embeddings especially useful for?
  • What are the two main families of machine learning algorithms?
  • Why do some categorical columns need a special ordering in their classes? How do you do this in Pandas?
  • Summarize what a decision tree algorithm does.
  • Why is a date different from a regular categorical or continuous variable, and how can you preprocess it to allow it to be used in a model?
  • Should you pick a random validation set in the bulldozer competition? If not, what kind of validation set should you pick?
  • What is pickle and what is it useful for?
  • How are mse, samples, and values calculated in the decision tree drawn in this chapter?
  • How do we deal with outliers before building a decision tree?
  • How do we handle categorical variables in a decision tree?
  • What is bagging?
  • What is the difference between max_samples and max_features when creating a random forest?
  • If you increase n_estimators to a very high value, can that lead to overfitting? Why or why not?
  • In the section "Creating a Random Forest", just after <>, why did preds.mean(0) give the same result as our random forest?
  • What is "out-of-bag error"?