\documentclass[twoside, a4paper, final]{article}
\usepackage[english]{babel}
\usepackage{a4wide}
\usepackage{eurosym}
\usepackage{times}
\usepackage{type1cm}
\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc}
\usepackage{amssymb, amsmath}
\usepackage{xr-hyper}
\usepackage[colorlinks=true,linkcolor=blue,pdftex]{hyperref}

\title{PANDA MVA documentation}
\author{M. Babai}
\date{14 May 2010}
%\date{\today}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\maketitle
\begin{center}
\rule[0.1mm]{5.0cm}{2.0mm}
\end{center}
\tableofcontents
\hrulefill
\newpage

%========================= KNN =====================================
\section{k-nearest neighbor algorithm (KNN)}
The current package, located in ``pandaroot/PndTools/MVA'', contains two
different implementations of this algorithm:
\begin{description}
\item [KNN:] A very simple implementation of KNN, based on linear search.
  For large data sets and a large number of features (parameters) this
  implementation becomes very slow and almost impractical; it is included
  for validation purposes only.
\item [TMVAkd\_KNN:] This implementation uses a kd-tree to store and search
  through the available examples. This data structure is fast to construct,
  and search and traversal operations have, on average, $O(\log n)$
  complexity, where $n$ is the number of examples inserted into the
  structure. It is very fast, but it needs many examples in order to build
  a representative and useful database, and it requires a large amount of
  memory.
\end{description}
Both implementations run in linear time in the worst case, and the result is
an estimate of the probability density (pdf) for each class, normalized over
the available classes (labels).

\subsection{Using KNN}
Both directories, ``KNN'' and ``TMVAkd\_KNN'', contain examples that show how
to use these implementations and are a good starting point; one can modify
them to perform pattern classification.\newline
{\bf Notes:}
\begin{description}
\item [Training:] KNN does not require training of the classifier. If the
  input weight file has already been pre-processed (normalized, decorrelated,
  \dots), it can be used directly for classification. Note that the same
  transformations need to be applied to the new, yet to be classified,
  patterns.
\item [Weights:] Up-to-date example weight files for this algorithm can be
  fetched using the scripts available in the directory ``macro/scripts/''.
  These files are not available via svn, merely because of their size
  ($\pm\ 800$~MB).
\end{description}
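For illustration, the following minimal, self-contained sketch shows the idea
behind the linear-search variant: the $k$ nearest neighbors of a query
pattern are found by brute force, and the label counts among them are
normalized to give the per-class probability estimates described above. This
is only a sketch of the technique, not the interface of the classes in this
package; all names in it are hypothetical.

\begin{verbatim}
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One training example: a feature (parameter) vector plus its class label.
struct Example {
  std::vector<double> features;
  std::string label;
};

// Squared Euclidean distance between two feature vectors of equal length.
static double dist2(const std::vector<double>& a, const std::vector<double>& b) {
  double d = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    d += (a[i] - b[i]) * (a[i] - b[i]);
  return d;
}

// Linear-search kNN: return, for every label, the fraction of the k nearest
// training examples carrying that label (a normalized probability estimate).
std::map<std::string, double> knnClassify(const std::vector<Example>& train,
                                          const std::vector<double>& query,
                                          std::size_t k) {
  // Distance of the query to every stored example: O(n).
  std::vector<std::pair<double, std::size_t> > dists;
  dists.reserve(train.size());
  for (std::size_t i = 0; i < train.size(); ++i)
    dists.push_back(std::make_pair(dist2(train[i].features, query), i));

  // Keep only the k smallest distances.
  k = std::min(k, dists.size());
  std::partial_sort(dists.begin(), dists.begin() + k, dists.end());

  // Count the labels among the k nearest neighbors and normalize.
  std::map<std::string, double> prob;
  for (std::size_t i = 0; i < k; ++i)
    prob[train[dists[i].second].label] += 1.0;
  for (std::map<std::string, double>::iterator it = prob.begin();
       it != prob.end(); ++it)
    it->second /= static_cast<double>(k);
  return prob;
}
\end{verbatim}

The kd-tree based variant replaces the brute-force scan by a tree search, but
produces the same kind of normalized per-class output.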
%========================= LVQ =====================================
\section{Learning Vector Quantization (LVQ)}
This directory contains a very simple implementation of the LVQ1 and LVQ2.1
algorithms. One can use these functions directly to create a set of
prototypes and perform the training with either of the two algorithms. Tools
are also provided for creating ROC curves and for performing k-fold
cross-validation. The classifier itself is also implemented. At this point
the classifier only returns the shortest distance to each class type,
normalized by the sum of the outputs; no decision is made. If one wants the
classifier to make the decision, the user needs to modify the code and add a
discriminator based on this output.

\subsection{Using LVQ}
\begin{description}
\item [Training:] The directory ``LVQ'' contains an example implementation
  (LVQtrain.cpp). This sample program can be used as a starting point for a
  training scheme, cross-validation, training error estimation or error
  evolution.
\item [Weights:] Up-to-date example weight files for this algorithm can be
  fetched using the scripts available in the directory ``macro/scripts/''.
  These are not available via svn, merely because of their number.
\item [Classification:] The directory ``LVQ'' contains an example
  implementation (LVQclassify.cpp). This sample program can be used as a
  starting point for a classification scheme or for ROC production. The
  smallest MVA value indicates the \underline{{\it{\bf best match}}}: {\it in
  other words, the current example is most likely to belong to the label with
  the smallest MVA value}. Furthermore, the output is normalized by the sum
  of the outputs over all labels.
\end{description}

%========================= K-means Clustering ===================
\section{K-means Clustering}
The directory ``Clusters'' contains an implementation of this algorithm.
Given a set of parameter vectors and the number of expected clusters
(centroids), k-means generates for each cluster a centroid that represents
the mean vector of the parameter vectors assigned to that cluster. At the
moment only ``hard k-means'' is implemented, which means that each
initialized center belongs to a single cluster. It is also possible to
generate weighted centroids that are partial members of two or more clusters;
this variant is called ``soft k-means'' and is not implemented yet.

\section{TMVA\_MCL}
This directory contains the implementation of a number of wrappers. These are
interfaces to multi-class implementations of the algorithms available in TMVA
(distributed as part of the ROOT package), and they are provided in order to
have a common interface to all algorithms available in pandaroot. After
creation of the objects, control is fully passed to TMVA; this means that
setting and changing the parameters follows the TMVA guidelines. For further
information on how to use, train and understand these methods, see the TMVA
manual.
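As an orientation only, the following sketch shows how a method would
typically be booked and trained through TMVA's classic two-class ``Factory''
interface; the file, tree and variable names are placeholders, and the option
strings follow the TMVA guidelines. The wrappers in this directory add the
multi-class handling on top of this scheme; consult the TMVA manual for the
authoritative set of options.

\begin{verbatim}
// Hypothetical stand-alone ROOT macro; all file, tree and variable
// names are placeholders and have to be adapted to the actual data.
#include "TFile.h"
#include "TTree.h"
#include "TMVA/Factory.h"
#include "TMVA/Types.h"

void bookKnnExample() {
  TFile* input = TFile::Open("patterns.root");
  TTree* sig   = (TTree*) input->Get("signalTree");
  TTree* bkg   = (TTree*) input->Get("backgroundTree");

  TFile* output = TFile::Open("TMVA_output.root", "RECREATE");
  TMVA::Factory factory("PandaMVA", output,
                        "!V:AnalysisType=Classification");

  // Declare the features (parameters) used for the classification.
  factory.AddVariable("p",   'F');
  factory.AddVariable("emc", 'F');

  factory.AddSignalTree(sig, 1.0);
  factory.AddBackgroundTree(bkg, 1.0);
  factory.PrepareTrainingAndTestTree("",
                                     "SplitMode=Random:NormMode=NumEvents");

  // Book a kNN method; the option string follows the TMVA conventions.
  factory.BookMethod(TMVA::Types::kKNN, "KNN", "nkNN=20");

  factory.TrainAllMethods();
  factory.TestAllMethods();
  factory.EvaluateAllMethods();

  output->Close();
}
\end{verbatim}

\end{document}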