What is Weka?

Explore how Weka simplifies machine learning with built-in datasets for hands-on data analysis and model evaluation.

Ceyhun Enki Aksan, Entrepreneur, Maker

Among the applications frequently referenced in machine learning projects and academic research, there are a few I’d like to briefly mention. One of them is Weka.

At its core, Weka is an open-source data mining program written in Java and distributed by the University of Waikato. It brings together machine learning algorithms and data preprocessing tools. Weka’s native file format is ARFF, stored with the .arff extension. You can visit the official website for version details and download instructions [1]. Stable and development versions are available for Windows, macOS, and Linux. On macOS, you can also install Weka through the Homebrew package manager:

brew install --cask weka

Using Weka

Upon downloading and launching the Weka application, you’ll encounter a very simple user interface. The available tools are as follows: Explorer, Experimenter, KnowledgeFlow, Workbench, and Simple CLI. You can also access visualization and other tools through the menu.

Weka Explorer

As mentioned above, Weka uses the ARFF file format. It is also worth mentioning *.csv (comma-separated values) files, which are common in data mining. As the name suggests, CSV files store data with values separated by commas. An example of such content is shown below.

Vote csv

Data stored in Excel must first be saved in CSV format before it can be used in Weka. We then need to restructure the CSV content into the ARFF format [2]. Let’s proceed with an example. For this purpose, I will use the house-votes-84.csv dataset, which I previously used in an R language example [3].

The ARFF document contains @RELATION, @ATTRIBUTE, and @DATA constructs.

  1. The @RELATION (relation) line specifies the name of our dataset.
  2. Each @ATTRIBUTE (attribute) line defines one variable, in column order. The data types that can be assigned to variables are: numeric (numerical values; real and integer are also accepted and treated as numeric), string (text), nominal (categorical values, listed in braces), and date (dates).
  3. The @DATA (data) represents our dataset.
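
To make the three constructs more concrete, here is a minimal Python sketch that splits ARFF text into its relation name, attribute list, and data rows. It is only an illustration under simplifying assumptions, not how Weka itself reads ARFF: the function name parse_arff is my own, the sample is a shortened two-attribute variant of the vote data, and quoted names, sparse rows, and date formats are not handled.

```python
def parse_arff(text):
    """Split ARFF text into (relation, attributes, rows).

    Simplified sketch: no quoted names, no sparse data, no date formats.
    """
    relation = None
    attributes = []  # (name, type) pairs in declaration order
    rows = []        # each data row as a list of strings
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.upper()  # ARFF keywords are case-insensitive
        if keyword == "@RELATION":
            relation = rest.strip()
        elif keyword == "@ATTRIBUTE":
            name, _, type_spec = rest.strip().partition(" ")
            attributes.append((name, type_spec.strip()))
        elif keyword == "@DATA":
            in_data = True  # everything after this line is data
    return relation, attributes, rows


# Shortened, hypothetical sample in the spirit of the vote dataset.
sample = """\
@RELATION house-votes-84
@ATTRIBUTE v1 NUMERIC
@ATTRIBUTE party {republican,democrat}

@DATA
1,republican
-1,democrat
"""

relation, attributes, rows = parse_arff(sample)
print(relation)
print(attributes)
print(rows)
```

The attribute order matters: the n-th value of every data row belongs to the n-th @ATTRIBUTE declaration.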

When we format the content of house-votes-84.csv as ARFF, the resulting content will look like this:

@RELATION house-votes-84
@ATTRIBUTE v16 NUMERIC
@ATTRIBUTE v1 NUMERIC
@ATTRIBUTE v2 NUMERIC
@ATTRIBUTE v3 NUMERIC
@ATTRIBUTE v4 NUMERIC
@ATTRIBUTE v5 NUMERIC
@ATTRIBUTE v6 NUMERIC
@ATTRIBUTE v7 NUMERIC
@ATTRIBUTE v8 NUMERIC
@ATTRIBUTE v9 NUMERIC
@ATTRIBUTE v10 NUMERIC
@ATTRIBUTE v11 NUMERIC
@ATTRIBUTE v12 NUMERIC
@ATTRIBUTE v13 NUMERIC
@ATTRIBUTE v14 NUMERIC
@ATTRIBUTE v15 NUMERIC
@ATTRIBUTE party {republican,democrat}

@DATA
1,-1,1,-1,1,1,1,-1,-1,-1,1,-1,1,1,1,-1,republican
-1,-1,1,-1,1,1,1,-1,-1,-1,-1,-1,1,1,1,-1,republican
-1,-1,1,1,-1,1,1,-1,-1,-1,-1,1,-1,1,1,-1,democrat
...
Weka vote data
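
The manual restructuring described above can also be sketched in a few lines of Python. This is a rough illustration under my own assumptions rather than Weka's conversion logic: the helper names is_number and csv_to_arff are hypothetical, a column is declared NUMERIC only if every value parses as a number, and every other column is treated as nominal. The three-column sample is made up to keep the example short.

```python
import csv
import io


def is_number(value):
    """Return True if the string can be read as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Build ARFF text from CSV text whose first row holds the column names."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # ignore empty lines

    lines = ["@RELATION " + relation]
    for index, name in enumerate(header):
        column = [row[index] for row in rows]
        if all(is_number(value) for value in column):
            type_spec = "NUMERIC"
        else:
            # Nominal attribute: distinct values in first-seen order.
            distinct = list(dict.fromkeys(column))
            type_spec = "{" + ",".join(distinct) + "}"
        lines.append("@ATTRIBUTE {} {}".format(name, type_spec))

    lines.append("")
    lines.append("@DATA")
    lines.extend(",".join(row) for row in rows)
    return "\n".join(lines) + "\n"


# Made-up miniature of the vote data: two votes and the party label.
sample_csv = "v1,v2,party\n1,-1,republican\n-1,1,democrat\n"
arff_text = csv_to_arff(sample_csv, "votes-sample")
print(arff_text)
```

For real datasets I would still rely on Weka's own converter, since this sketch ignores missing values, quoting, and string or date attributes.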

The Explorer application provides tabs for preprocessing, classification (including regression), clustering, association rules, attribute selection, and visualization. The outputs of these operations can also be visualized through the corresponding tabs and menus.

So, how can we quickly convert a CSV file to ARFF format using Weka’s Simple CLI tool? I frequently discuss practical command-line approaches, and those familiar with both the interface and the command syntax will find the Simple CLI quite straightforward. The Primer [4] and Command redirection [5] pages provide examples and explanations of the available commands. For the CSV-to-ARFF conversion mentioned earlier, we will use the weka.core.converters.CSVLoader converter demonstrated on the Primer page:

# java weka.core.converters.CSVLoader [csv-file-path] > [output-file-path]
java weka.core.converters.CSVLoader /Users/kullanici-adi/Desktop/house-votes-84.csv > /Users/kullanici-adi/Desktop/house-votes-84.arff
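
Outside the Simple CLI, the same converter can be run from a regular terminal or script, as long as weka.jar is on the Java classpath. The following Python sketch shows one way this might be wrapped. The function names are my own, and the default location "weka.jar" is an assumption that you would replace with the actual path of the jar on your machine.

```python
import subprocess


def build_csvloader_command(csv_path, weka_jar="weka.jar"):
    """Return the argument list that runs Weka's CSVLoader on a CSV file.

    weka_jar is an assumed location; point it at your own weka.jar.
    """
    return ["java", "-cp", weka_jar, "weka.core.converters.CSVLoader", csv_path]


def convert(csv_path, arff_path, weka_jar="weka.jar"):
    """Run CSVLoader and write its standard output to arff_path.

    This mirrors the '>' redirection used in the command above.
    Requires Java and weka.jar to be available.
    """
    command = build_csvloader_command(csv_path, weka_jar)
    with open(arff_path, "w") as out:
        subprocess.run(command, stdout=out, check=True)


# Only build and show the command here; convert() needs Java and Weka installed.
command = build_csvloader_command("house-votes-84.csv")
print(" ".join(command))
```

Calling convert("house-votes-84.csv", "house-votes-84.arff") would then perform the conversion, provided Java and the jar are in place.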

For practical training on using Weka, you may explore the application’s website. In particular, the content under Book [6] and the free courses available in the Courses [7] section are highly recommended. Additionally, you may consider the free 15-week Practical Data Mining course offered by FutureLearn [8][9]. Going forward, I will continue with basic examples and written explanations as the topics call for them. Before that, you may review the Sample Weka Data Sets [10] and Auto-WEKA: Sample Datasets [11] collections for use in the application.

CSV: Comma-separated values
ARFF: Attribute-Relation File Format

Footnotes

  1. Machine Learning at the University of Waikato
  2. ARFF (stable version). WikiSpaces
  3. golearn/examples/datasets/house-votes-84.csv. GitHub
  4. Primer. WikiSpaces
  5. Command redirection. WikiSpaces
  6. Data Mining: Practical Machine Learning Tools and Techniques
  7. Machine Learning Courses. Data Mining with Weka
  8. FutureLearn
  9. Practical Data Mining. FutureLearn
  10. Gary M. Weiss, Ph.D. Sample Weka Data Sets
  11. Auto-WEKA : Sample Datasets