CF 207D7 - The Beaver's Problem - 3

Rating: 1600
Tags: -
Solve time: 30s
Verified: no

Solution

Problem Understanding

The task is to classify a text document into one of three subjects, numbered 1, 2, and 3, using a training set of labeled documents. Every document consists of an identifier, a title, and a body of text. The identifier carries no information about the subject, so only the title and body matter for classification. The output is a single integer: the predicted subject of the input document.
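
The input handling described above can be sketched as follows. This is a minimal sketch, not a definitive implementation: it assumes the identifier sits on the first line, the title on the second, and the body on all remaining lines. The helper names `parse_document` and `classification_text` are hypothetical and chosen only for illustration.

```python
def parse_document(raw):
    """Split raw document text into (identifier, title, body).

    Assumed layout (not confirmed by the write-up): line 1 is the
    identifier, line 2 is the title, every later line is the body.
    """
    lines = raw.splitlines()
    doc_id = lines[0].strip() if len(lines) > 0 else ""
    title = lines[1].strip() if len(lines) > 1 else ""
    body = "\n".join(lines[2:])
    return doc_id, title, body


def classification_text(raw):
    """Return only the text that matters for classification.

    The identifier is dropped; title and body are kept together.
    """
    _, title, body = parse_document(raw)
    return (title + "\n" + body).strip()
```

A full solution would read the whole input (for example with `sys.stdin.read()`), pass it through `classification_text`, and hand the result to whatever classifier is built from the training set.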

The document size is limited to 10 kilobytes. Since there are only three classes, this is effectively a small multiclass text classification problem. The training set contains example documents for each class, which can be used to extract features, typ