Data choice. Choose any dataset from the repository that has at least five attributes and for which the default task is classification. Convert this dataset into a format suitable for loading into your chosen analytics software.
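As an illustration of the conversion step, the sketch below assumes the raw file is headerless, comma-separated text with "?" marking missing values. That layout, the toy rows and the column names are all assumptions for illustration, not properties of any particular dataset. It attaches column names while loading and writes the result out as CSV with a header row.

```python
import io
import pandas as pd

# Hypothetical raw content: no header row, "?" marks a missing value.
raw_text = (
    "5.1,3.5,?,0.2,A\n"
    "4.9,3.0,1.4,0.2,A\n"
    "6.3,3.3,6.0,2.5,B\n"
)

# Hypothetical attribute names; take the real ones from the dataset's documentation.
column_names = ["attr1", "attr2", "attr3", "attr4", "class"]

# header=None stops pandas treating the first data row as a header;
# na_values="?" turns the placeholder into a proper missing value (NaN).
df = pd.read_csv(
    io.StringIO(raw_text),
    header=None,
    names=column_names,
    na_values="?",
)

# With no path argument, to_csv returns the CSV text; pass a filename to save it.
csv_text = df.to_csv(index=False)
print(csv_text)
```

With a real file, the `io.StringIO(...)` argument would be replaced by the path to the downloaded file.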

Background information. Write a description of the dataset and project. Provide an overview of what the dataset is about, including where and how it was gathered, and for what purpose.

Data description. State how many instances the dataset contains, how many attributes it has and what they are called, and which attribute is the class attribute.

Include in your description details of any missing values and any other relevant characteristics. Use appropriate pandas functions for an initial analysis of the data, for instance descriptive statistics for each attribute, including the range of values each attribute can take, and visualise these graphically.
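One possible way to gather these figures with pandas is sketched below. The toy data, the attribute names and the choice of "label" as the class attribute are assumptions made for illustration only.

```python
import io
import pandas as pd

# Hypothetical toy data standing in for the chosen dataset.
raw = io.StringIO(
    "age,income,owns_home,city,label\n"
    "25,32000,yes,Leeds,no\n"
    "41,,no,York,yes\n"
    "37,58000,yes,,yes\n"
    "52,61000,no,Leeds,no\n"
)
df = pd.read_csv(raw)

# Number of instances (rows) and attributes (columns), and the attribute names.
n_instances, n_attributes = df.shape
attribute_names = list(df.columns)

# Assumption: the class attribute is the column called "label".
class_attribute = "label"

# Count of missing values in each attribute.
missing_per_attribute = df.isna().sum()

# Descriptive statistics (count, mean, std, min, quartiles, max) for numeric attributes.
numeric_summary = df.describe()

# Distribution of the class attribute.
class_counts = df[class_attribute].value_counts()

print(n_instances, n_attributes, attribute_names)
print(missing_per_attribute)
print(numeric_summary)
print(class_counts)
```

For the graphical part, `df.hist()` draws one histogram per numeric attribute and `class_counts.plot(kind="bar")` draws the class distribution; both rely on matplotlib being installed.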

Initial analysis. You will need to decide which features to include in your dataframe and how to deal with missing values (if any exist). You might need to preprocess the dataset attributes. Useful techniques include removing certain attributes, exploring different ways of discretizing continuous attributes, and replacing missing values. Discretizing is the conversion of numeric attributes into “nominal” ones by binning numeric values into intervals. If you replaced missing values, explain what strategy you used to select the replacement values.
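The preprocessing techniques named above can be sketched in pandas as follows. The toy dataframe, the decision to drop "record_id", the median and mode replacement strategies, and the three-bin discretization are all illustrative assumptions. Other strategies may suit your dataset better, and whichever one you use should be justified in your report.

```python
import pandas as pd

# Hypothetical toy data with one uninformative attribute and some missing values.
df = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5, 6],
    "age": [22, 35, None, 58, 41, 67],
    "city": ["Leeds", None, "York", "Leeds", "York", "Leeds"],
    "label": ["no", "yes", "yes", "no", "yes", "no"],
})

# Remove an attribute that carries no predictive information.
df = df.drop(columns=["record_id"])

# Replace missing numeric values with the median of the observed values.
age_median = df["age"].median()
df["age"] = df["age"].fillna(age_median)

# Replace missing nominal values with the most frequent observed value (the mode).
city_mode = df["city"].mode()[0]
df["city"] = df["city"].fillna(city_mode)

# Equal-width discretization: three intervals of the same width over the age range.
df["age_equal_width"] = pd.cut(df["age"], bins=3, labels=["low", "mid", "high"])

# Equal-frequency discretization: three intervals holding roughly equal numbers of rows.
df["age_equal_freq"] = pd.qcut(df["age"], q=3, labels=["low", "mid", "high"])

print(df)
```

Comparing the two binned columns shows how the choice of discretization method changes which interval a value falls into, which is worth reporting when the attribute is skewed.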

