\documentclass[twoside, a4paper, final]{article}
\usepackage[english]{babel}
\usepackage{a4wide}
\usepackage{eurosym}
\usepackage{times}
\usepackage{type1cm}
\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc}
\usepackage{amssymb, amsmath}
\usepackage{xr-hyper}
\usepackage[colorlinks=true,linkcolor=blue,pdftex]{hyperref}

\title{PANDA MVA documentation}
\author{M. Babai}
\date{14 May 2010}
%\date{\today}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\maketitle
\begin{center}
  \rule[0.1mm]{5.0cm}{2.0mm}
\end{center}
\tableofcontents
\hrulefill
\newpage

%========================= KNN =====================================
\section{k-nearest neighbor algorithm (KNN)}
The current package, located in ``pandaroot/PndTools/MVA'', contains two
different implementations of this algorithm:
\begin{description}
\item [KNN:] A very simple implementation of KNN based on linear search.
  For large data sets and a large number of features (parameters) this
  implementation becomes very slow and almost impractical; it is included
  for validation purposes only.
\item [TMVAkd\_KNN:] This implementation uses a kd-tree to store and search
  the available examples. The data structure is fast to construct, and all
  search and traversal operations have $O(\log n)$ complexity, where $n$ is
  the number of examples inserted into the data structure. It is very fast,
  but it needs a large number of examples to build a representative and
  useful database, and therefore a large amount of memory.
\end{description}
The current implementations run in linear time in the worst case. The result
is an estimate of the probability densities, normalized over the available
classes (labels).

\subsection{Using KNN}
The directories ``KNN'' and ``TMVAkd\_KNN'' both contain examples that show
how to use these implementations; they are a good starting point and can be
modified to perform pattern classification.\newline
{\bf Notes:}
\begin{description}
\item [Training:] For KNN there is no need to train the classifier. If the
  input weight file has already been pre-processed (normalized,
  decorrelated, \ldots), it can be used directly for classification.
\item [Weights:] Up-to-date example weight files for this algorithm can be
  fetched with the scripts available in the directory ``macro/scripts/''.
  They are not available via svn merely because of their size
  ($\pm\,800$~MB).
\end{description}

%========================= LVQ =====================================
\section{Learning Vector Quantization (LVQ)}
This directory contains a very simple implementation of the LVQ1 and LVQ2.1
algorithms. These functions can be used directly to create a set of
prototypes and to train them with either of the two algorithms. The
classifier itself is also implemented. At this point the classifier only
returns the shortest distance to every class type; no decision is made. If
the decision is to be made by the classifier, the user needs to modify the
code and add a discriminator based on this output.

\subsection{Using LVQ}
\begin{description}
\item [Training:] The directory LVQ contains an example implementation
  (LVQtrain.cpp). This sample program can be used as a starting point for a
  training scheme.
\item [Weights:] Up-to-date example weight files for this algorithm can be
  fetched with the scripts available in the directory ``macro/scripts/''.
  They are not available via svn merely because of their size.
\item [Classification:] The directory LVQ contains an example implementation
  (LVQclassify.cpp). This sample program can be used as a starting point for
  a classification scheme. The smallest MVA value indicates a
  \underline{{\it{\bf better match}}}: {\it in other words, the current
  example is most likely to belong to the label with the smallest MVA
  value} (see the sketch after this list).
\end{description}
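As an illustration of this output, the sketch below shows a minimal,
self-contained nearest-prototype classification in C++. It is {\it not} the
code in the LVQ directory; all names (\texttt{Prototype},
\texttt{lvqDistances}) are hypothetical. For every class (label) it returns
the distance to the closest prototype of that class; a user-supplied
discriminator would then simply pick the label with the smallest value.
\begin{verbatim}
// Illustrative sketch only (not the PndTools/MVA code); all names are
// hypothetical. Computes, per class, the distance to the closest prototype.
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Prototype {
  std::vector<double> pos;  // prototype (codebook) vector
  std::string label;        // class this prototype represents
};

// Squared Euclidean distance between two vectors of equal length.
double dist2(const std::vector<double>& a, const std::vector<double>& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
  return s;
}

// For each label, the distance to its closest prototype; the smallest
// entry corresponds to the best match.
std::map<std::string, double> lvqDistances(const std::vector<Prototype>& protos,
                                           const std::vector<double>& x) {
  std::map<std::string, double> best;
  for (const Prototype& p : protos) {
    double d = std::sqrt(dist2(x, p.pos));
    auto it = best.find(p.label);
    if (it == best.end() || d < it->second) best[p.label] = d;
  }
  return best;  // a discriminator would pick the label with the smallest value
}
\end{verbatim}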
%========================= K-means Clustering ===================
\section{K-means Clustering}
The directory ``Clusters'' contains an implementation of this algorithm.
Given a set of parameter vectors and the number of expected clusters
(centroids), ``k-Means'' generates for each cluster a centroid that
represents the mean vector of that particular set of parameter vectors. At
the moment only ``Hard K-means'' is implemented, which means that each
parameter vector is assigned to exactly one cluster; a schematic sketch of
the procedure is given below. It is also possible to generate weighted
centroids whose member vectors belong partially to two or more clusters;
the latter is called ``Soft K-means'' and is not implemented yet.
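The sketch below illustrates hard K-means; it is {\it not} the code in the
``Clusters'' directory, and all names (\texttt{kMeans}) are hypothetical.
Each iteration assigns every parameter vector to its nearest centroid (hard
membership) and then replaces each centroid by the mean of the vectors
assigned to it.
\begin{verbatim}
// Illustrative sketch of hard K-means only (not the "Clusters" code);
// all names are hypothetical. "data" must be non-empty and "centroids"
// must hold the k initial centers.
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

static double dist2(const Vec& a, const Vec& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
  return s;
}

void kMeans(const std::vector<Vec>& data, std::vector<Vec>& centroids,
            int iterations = 50) {
  const std::size_t k = centroids.size();
  const std::size_t dim = data.front().size();
  for (int it = 0; it < iterations; ++it) {
    std::vector<Vec> sum(k, Vec(dim, 0.0));
    std::vector<std::size_t> count(k, 0);
    // Assignment step: each vector belongs to exactly one (nearest) centroid.
    for (const Vec& x : data) {
      std::size_t best = 0;
      for (std::size_t c = 1; c < k; ++c)
        if (dist2(x, centroids[c]) < dist2(x, centroids[best])) best = c;
      for (std::size_t d = 0; d < dim; ++d) sum[best][d] += x[d];
      ++count[best];
    }
    // Update step: each centroid becomes the mean of its assigned vectors.
    for (std::size_t c = 0; c < k; ++c)
      if (count[c] > 0)
        for (std::size_t d = 0; d < dim; ++d)
          centroids[c][d] = sum[c][d] / count[c];
  }
}
\end{verbatim}
\end{document}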