Data mining tools
Usually, when I work on data mining problems using genetics-based machine learning, I tend to compare the results with the ones obtained using non-evolutionary methods. I know Martin and Jaume have also been using some of these tools in their data-mining-related papers.
The first one I started using was WEKA. It has a nice collection of classification, regression, and clustering algorithms. Written in Java, it is easy to use and provides a flexible environment for rapid preliminary filtering and analysis of raw data. Recently, I have noticed that more than 20 different projects are built on top of the framework.
Lately, I have moved from WEKA to D2K, a data mining framework developed by the Automated Learning Group (ALG) at the National Center for Supercomputing Applications. Like WEKA, it is pure Java. What I like most about D2K, and one of the reasons I switched, is the data-flow-oriented paradigm it uses. With an intuitive graphical editor, complicated analysis and visualization tasks can be deployed rapidly by simple drag and drop. I have been using D2K heavily in the DISCUS project, and I have no regrets about leaving WEKA behind. I have nothing but good things to say about D2K's quality and about the effort the ALG people have put into making it a great package that is easy to extend and customize.
Another tool I want to mention is a fairly specialized library. LIBSVM is integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR), and distribution estimation (one-class SVM). Sources are provided in C++, Java, and C# .NET, along with interfaces to Python, R, Matlab, Perl, and Ruby. I have been using it in some of my recent research, and if you are interested in these areas, I definitely recommend taking a look at it.
And this leads me to one of my favorite tools, R. R is a language and environment for statistical computing and graphics. It is a GNU project similar to the S language and environment developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. Contributed packages cover all sorts of tools; I just want to point out the project on graphical models. I would recommend R to anyone who wants to speed up the analysis of the results their GAs generate.
Posted by admin on January 27th, 2005 under Illigal-blogging