Preface
This is a book about (machine) learning from (experimental) data. Many
books devoted to this broad field have been published recently.
One even feels tempted to begin the previous sentence with the
word "extremely". Thus, there is an urgent need to
introduce both the motives for and the content of the present
volume in order to highlight its distinguishing features.
Before doing that, a few words about the very broad meaning of data are in
order. Today, we are surrounded by an ocean of all kinds of
experimental data (i.e., examples, samples, measurements, records,
patterns, pictures, tunes, observations, etc.) produced by
various sensors, cameras, microphones, pieces of software and/or
other human-made devices. The amount of data produced is enormous
and ever increasing. The first obvious consequence of this
is that humans cannot handle such massive quantities of data, which
usually appear in numeric form as huge (rectangular or
square) matrices. Typically, the number of rows (n)
gives the number of data pairs collected, and the number of
columns (m) represents the dimensionality of the data. Thus,
faced with giga- and terabyte-sized data files, one has to
develop new approaches, algorithms and procedures. A few techniques
for coping with huge data size problems are presented here. This,
possibly, explains the appearance of the wording "huge data
sets" in the title of the book.
Another direct consequence is that (instead of attempting to dive into
the sea of hundreds of thousands or millions of high-dimensional
data pairs) we are developing other "machines" or
"devices" for analyzing, recognizing and/or learning
from such huge data sets. The so-called "learning
machine" is predominantly a piece of software that implements
both the learning algorithm and the function (network, model)
whose parameters have to be determined by the learning part of the
software. Today, it turns out that some models used for solving
machine learning tasks are either originally based on
kernels (e.g., support vector machines), or their newest
extensions are obtained by introducing kernel functions
into the existing standard techniques. Many classic data mining
algorithms have been extended to applications in a
high-dimensional feature space. The list is long and fast
growing, and only the most recent extensions are mentioned
here. They are: kernel principal component analysis, kernel
independent component analysis, kernel least squares, kernel
discriminant analysis, kernel k-means clustering, kernel
self-organizing feature map, kernel Mahalanobis distance, kernel
subspace classification methods and kernel-function-based
dimensionality reduction. What kernels are, as well as why and
how they became so popular in tasks of learning from data sets,
will be shown shortly. For now, their wide use, as well as their
efficiency in the numeric part of the algorithms (achieved by
avoiding the calculation of scalar products between extremely
high-dimensional feature vectors), explains their appearance in
the title of the book.
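The remark about avoiding scalar products between very high-dimensional feature vectors can be illustrated with a small sketch. It assumes a homogeneous polynomial kernel of degree 2, chosen here only because its feature map is easy to write down. The helper names `phi`, `dot` and `poly2_kernel` are hypothetical and are not taken from the book.

```python
import math

def dot(a, b):
    """Ordinary scalar product of two equally long vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit feature map (an illustrative assumption): all pairwise
    products x_i * x_j. An m-dimensional input becomes m*m-dimensional."""
    return [xi * xj for xi in x for xj in x]

def poly2_kernel(x, z):
    """Kernel evaluated in the input space: k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2

x = [1.0, 2.0, 3.0]
z = [0.5, -1.0, 2.0]

# Scalar product computed explicitly in the 9-dimensional feature space.
explicit = dot(phi(x), phi(z))
# The same value obtained from a single 3-dimensional scalar product.
implicit = poly2_kernel(x, z)

print(len(phi(x)), explicit, implicit)
print(math.isclose(explicit, implicit))
```

For inputs of dimension m the explicit route works with m*m features, whereas the kernel needs only one m-dimensional scalar product. That gap is the saving the paragraph above refers to, and it grows quickly with m and with the degree of the polynomial.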
Next,
it is worth clarifying the fact that many authors tend to label
similar (or even the same) models, approaches and algorithms by
different names. One is simply destined to cope with the concepts of data
mining, knowledge discovery, neural networks, Bayesian networks,
machine learning, pattern recognition, classification, regression,
statistical learning, decision trees, decision making, etc. All of
them usually have a lot in common, and they often use the same set
of techniques for adjusting, tuning, training or learning the
parameters defining the models. The common object for all of them
is a training data set. All the various approaches mentioned start
with a set of data pairs (x_{i}, y_{i}),
where x_{i} represent the input variables (causes,
observations, records) and y_{i} denote the
measured outputs (responses, labels, meanings). However, even at
the very starting point of machine learning (namely, with the
training data set collected), real life has been tossing a
coin and providing us either with

a set of genuine training data pairs (x_{i}, y_{i}), where for each
input x_{i} there is a corresponding output y_{i},
or with

partially labeled data containing both the pairs (x_{i}, y_{i}) and
inputs x_{i} alone, without associated known outputs y_{i},
or, in the worst case scenario, with

a set of inputs (observations or records) x_{i} alone,
without any information about the possible desired output
values (labels, meanings) y_{i}.
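The three situations listed above correspond to what are usually called supervised, semi-supervised and unsupervised learning. As a minimal sketch, they can be told apart by checking which outputs are known. The function name `learning_setting` and the use of `None` for a missing output are illustrative assumptions, not conventions from the book.

```python
def learning_setting(y):
    """Classify a training set by how many outputs y_i are known.

    `y` holds one entry per input x_i; None marks an unknown output
    (an encoding assumed here for illustration only).
    """
    if not y:
        raise ValueError("the training set is empty")
    n_labeled = sum(label is not None for label in y)
    if n_labeled == len(y):
        return "supervised"        # every x_i has its y_i
    if n_labeled == 0:
        return "unsupervised"      # inputs only, no outputs at all
    return "semi-supervised"       # pairs mixed with unlabeled inputs

print(learning_setting([+1, -1, +1, -1]))
print(learning_setting([+1, None, None, -1]))
print(learning_setting([None, None, None]))
```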
It is a genuine
challenge indeed to try to solve such differently posed machine
learning problems with a single approach and methodology. In fact,
this is exactly what did not happen in real life, because the
development of the field followed a natural path of inventing
different tools for unlike tasks. The answer to the challenge was
a more or less independent (although with some overlap and
mutual impact) development of three large and distinct subareas
of machine learning: supervised, semi-supervised and unsupervised
learning. This is where both the subtitle and the structure of the
book originate. Here, all three approaches are
introduced and presented in detail, which should enable the reader
not only to acquire various techniques but also to equip
him/herself with all the basic knowledge and requisites for
further development in all three fields on his/her own.
The
presentation in the book follows the order mentioned above. It
starts with what is seemingly the most powerful
supervised learning approach at the moment for solving classification (pattern
recognition) problems and regression (function approximation) tasks, namely support vector
machines (SVMs). Then, it continues with the two
most popular and promising semi-supervised approaches (graph-based semi-supervised learning
algorithms: the Gaussian random fields
model (GRFM) and the consistency method (CM)).
Both the original setting of the methods and their improved versions
will be introduced. This makes the volume
the first book on semi-supervised learning.
The book's final part focuses on the two most appealing
and widely used unsupervised methods,
principal component analysis (PCA)
and independent component analysis (ICA). These two algorithms are the workhorses of unsupervised learning today, and
their presentation, as well as a discussion of
their major characteristics, capacities and differences, is given the highest care here.
The
models and algorithms for all three parts of machine learning
mentioned are given in a way that equips
the reader to implement them directly. This
is achieved not only by presenting them but also by applying the models and algorithms to
some low-dimensional (and thus easy to
understand, visualize and follow) examples. The equations and models provided will be able to handle much bigger
problems (ones having much more data of
much higher dimensionality) in the same way as they handle the
ones we can follow and "see" in the examples provided. In the
authors' experience and opinion, the approach adopted here is the
most accessible, pleasant and useful way to
master material containing many new (and potentially
difficult) concepts.
