\documentclass{TEMA}
\usepackage[brazil]{babel}   
\usepackage[latin1]{inputenc}  
\usepackage[dvips]{graphics}
\usepackage{subfigure}
\usepackage{graphicx}
\usepackage{epsfig}
\usepackage{hyperref}
\usepackage{framed}
\usepackage{psfrag}
\usepackage{tikz} 
\newcommand{\B}{{\tt\symbol{92}}}
\newcommand{\til}{{\tt\symbol{126}}}
\newcommand{\chap}{{\tt\symbol{94}}}
\newcommand{\agud}{{\tt\symbol{13}}}
\newcommand{\crav}{{\tt\symbol{18}}}
\begin{document}

\title{
     Arboreal identification supported by fuzzy modeling for trunk texture recognition%
     \thanks{Supported by the Coordination for the Improvement of Higher Education Personnel (CAPES).}
}
\author{
     A. BRESSANE%
     \thanks{adriano.bressane@posgrad.sorocaba.unesp.br},
      S{\~a}o Paulo State University (Unesp), Institute of Science and Technology, Campus at Sorocaba city, 03 March Avenue, 511, Brazil
     \\ \\
     F. H. FENGLER%
     \thanks{felipe.fengler@posgrad.sorocaba.unesp.br},
     S{\~a}o Paulo State University (Unesp), Institute of Science and Technology, Campus at Sorocaba city, 03 March Avenue, 511, Brazil
      \\ \\
     S. R. M. M. ROVEDA%
     \thanks{sandra@sorocaba.unesp.br},
     S{\~a}o Paulo State University (Unesp), Institute of Science and Technology, Campus at Sorocaba city, 03 March Avenue, 511, Brazil
       \\ \\
     J. A. F. ROVEDA%
     \thanks{roveda@sorocaba.unesp.br},
     S{\~a}o Paulo State University (Unesp), Institute of Science and Technology, Campus at Sorocaba city, 03 March Avenue, 511, Brazil
       \\ \\
      A. C. G. MARTINS%
     \thanks{amartins@sorocaba.unesp.br},
     S{\~a}o Paulo State University (Unesp), Institute of Science and Technology, Campus at Sorocaba city, 03 March Avenue, 511, Brazil
}
\criartitulo
\runningheads{Bressane et al.}%
{ Arboreal identification supported by fuzzy modeling}

\begin{abstract}
{\bf Abstract}. Due to the natural variability of the arboreal bark, there are texture patterns in trunk images with values belonging to more than one species. Thus, the present study analyzed the usage of fuzzy modeling as an alternative to handle the uncertainty in trunk texture recognition, in comparison with other machine learning algorithms. A total of 2160 samples, belonging to 20 tree species from the Brazilian native deciduous forest, were used in the experimental analyses. After transforming the images from RGB to HSV, 70 texture patterns were extracted based on first- and second-order statistics. Then, an exploratory factor analysis was performed to deal with redundant information and optimize the computational effort, and only the first dimensions with higher cumulative variability were selected as input variables in the predictive modeling. As a result, fuzzy modeling reached a generalization ability that outperformed algorithms widely used in classification tasks, besides obtaining an almost perfect agreement with the most accurate classifier in the validation tests. Therefore, fuzzy modeling can be considered a competitive approach, with reliable performance in arboreal trunk texture recognition.

{\bf Keywords}. soft computing, image processing,
pattern matching, bioinformatics.

\end{abstract}

\section{Introduction}

The usage of computational intelligence for feature extraction and pattern recognition from biological data has been increasingly studied to support arboreal identification. However, as the studies carried out so far have focused on leaf image processing, their techniques are not applicable when the leaf structure is not available, as occurs with deciduous species at certain times of the year. 

As an alternative, the texture recognition in tree trunk images still has few outcomes reported in the literature, in which the predictive modeling has been performed using machine learning algorithms based on k-Nearest Neighbors (\cite{Porebski2007}, \cite{Wan2004}), Artificial Neural Networks (\cite{Huangetal2006}), Support Vector Machine (\cite{Boman2013}, \cite{Fiel2011}, \cite{Huang2006}), and Decision Tree (\cite{Bressane2015}).

By analyzing statistical properties in tree trunk images, \cite{Bressane2015} found that, due to the natural variability of the arboreal bark, its texture patterns commonly have values belonging to more than one species, i.e., there is an overlap between neighboring subspaces. As a consequence, this overlapping can lead to ambiguity in the pattern matching during predictive modeling. 

In these cases, there is some uncertainty about which species the sample belongs to, undermining the texture discriminant analysis by means of predictor variables with sharply defined boundaries. Therefore, the present study aims to analyze the usage of fuzzy modeling as an approach to deal with the uncertainty in trunk texture recognition, in comparison with other machine learning algorithms.

In the mid-1960s, fuzzy set theory was developed by \cite{Zadeh1965} as an extension of classical set theory to provide a mathematical treatment for complex phenomena, becoming popular after the 1980s (\cite{Zadeh2008}, \cite{Pedrycz2007}). Fuzzy modeling is a soft-computing method capable of processing uncertain knowledge or data. Thus, by affording a convenient formalism for integrating different kinds of variables, by means of a user-friendly structure with transparency and interpretability, the usage of fuzzy modeling has become more and more common, with several applications in the environmental sciences over the years (e.g. \cite{Bressane2016}, \cite{Liu2012}, \cite{Liu2010}, \cite{Lermontov2009}, \cite{Ascough2008}, \cite{Adriaenssens2004}, \cite{Silvert2000}). 

According to \cite{Ishibuchi2001}, the main applications of fuzzy modeling used to be optimization and control problems. Nowadays, however, many other areas can be highlighted, such as intelligent systems for supporting decision making, data mining, signal processing, diagnosis, forecasting, regression, and classification from numerical data using pattern recognition based on graded membership (\cite{Singh2013}). Thereby, fuzzy modeling can achieve a competitive performance when compared to other machine learning algorithms in classification tasks involving uncertainty, vagueness, or partial truth, which demand predictors without hard boundaries (\cite{Arunpriya2015}, \cite{Riza2015}).

\section{Methods}

\subsection{Data collection and feature extraction}

The data were collected using a digital camera, capturing outer bark images at different heights of the trunk, at a 50 mm distance around the trees. Due to the three-dimensional shape of the arboreal trunk, only a central area was used for extracting features, in order to avoid distortion at the image edges. Then, using a moving mask of 512 x 512 pixels, 2160 samples were obtained, 108 from each of the 20 tree species from the Brazilian native deciduous forest shown in Figure 1.

\begin{figure}[h!]
  \centering
  \includegraphics*[width=0.69\linewidth]{Fig1.pdf}  
  \caption{Tree trunk images (512x512 pixels) from: \textit{Anadenanthera falcata} (\textit{Af}), \textit{Anadenanthera macrocarpa} (\textit{Am}), \textit{Bauhinia forficate} (\textit{Bf}), \textit{Caesalpinia peltophoroides} (\textit{Ca}), \textit{Caesalpinia echinata} (\textit{Ce}), \textit{Cedrela fissilis} (\textit{Cf}), \textit{Caesalpinia peltophoroides} (\textit{Cp}), \textit{Ceiba speciosa} (\textit{Cs}), \textit{Centrolobium tomentosum }(\textit{Ct}), \textit{Enterolobium contortisiliquum} (\textit{Ec}), \textit{Erythrina speciosa} (\textit{Es}), \textit{Gochnatia polymorpha} (\textit{Gp}), \textit{Guazuma ulmifolia} (\textit{Gu}), \textit{Hymenaea courbaril} (\textit{Hc}), \textit{Inga vera} (\textit{Iv}), \textit{Piptadenia gonoacantha }(\textit{Pg}), \textit{Schizolobiun parahyba} (\textit{Sp}), \textit{Tibouchina granulosa} (\textit{Tg}), \textit{Tabebuia roseoalba }(\textit{Tr}), and \textit{Zanthoxylum kleinii} (\textit{Zk}).}  
\end{figure}

To reduce the influence of environmental conditions and image acquisition settings, before starting the feature extraction the images were transformed from the RGB (red-green-blue) system to the HSV (hue-saturation-value) space. Then, features based on first- and second-order statistics were extracted using the V channel as the grayscale image. 
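The V channel used here is straightforward to obtain: in the HSV space, the value component is simply the maximum of the normalized R, G, B components. A minimal illustrative sketch in Python follows (the function names are ours, not part of the study's pipeline):

```python
def rgb_to_v(r, g, b):
    """V (value) channel of HSV for one RGB pixel: in HSV, the value
    component is the maximum of the normalized R, G, B components."""
    return max(r, g, b) / 255.0

def value_channel(rgb_image):
    """Map a nested list of (R, G, B) tuples to the V-channel grayscale
    image used for feature extraction."""
    return [[rgb_to_v(*px) for px in row] for row in rgb_image]
```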


The first-order statistics included 6 texture features, namely uniformity, entropy, skewness, smoothness, intensity, and standard deviation, described below following \cite{Gonzales2008}. 

As a measure of the proximity of the gray levels, the uniformity ($u$) is given by:
  
\begin{equation}
u = \displaystyle \sum_{i=0}^{L-1} p^2(z_{i})
\label{u}
\end{equation}
where $L$ corresponds to the number of gray levels in the image, $z_i$ is the intensity, and $p(z_i)$ is the normalized image histogram.

The first-order entropy ($e$) measures the randomness in the image, as in:
 
\begin{equation}
e = -\displaystyle \sum_{i=0}^{L-1} p(z_i)\log_2p(z_i)
\label{e}
\end{equation}

The skewness ($\mu_3$) is a measure of the asymmetry, and the smoothness ($s$) takes into account the transition of gray shades; they are respectively obtained by:

\begin{equation}
\mu_3 = \displaystyle \sum_{i=0}^{L-1} (z_i-\mu_1)^3p(z_i)
\label{mi}
\end{equation}
and
\begin{equation}
s = 1 - \displaystyle\frac{1}{1+\mu_{2}^2}
\label{s}
\end{equation}
where ($\mu_1$) is the intensity that returns the gray level average, and ($\mu_2$) is the standard deviation, calculated by:
\begin{equation}
\mu_1 = \displaystyle \sum_{i=0}^{L-1} z_i p(z_i)
\label{mi1}
\end{equation}
and
\begin{equation}
\mu_2 = \left(\displaystyle\frac{\displaystyle\sum_{i=1}^{n}(z_i-\mu_1)^2}{n-1}\right)^{\!1/2}
\label{mi2}
\end{equation}
where $n$ is the number of image pixels.
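The six first-order descriptors above can all be computed from the normalized gray-level histogram. The sketch below is an illustrative Python rendering (not the code used in the study); it estimates the variance from the histogram itself, which agrees with the pixel-wise form above up to the $(n-1)$ correction:

```python
import math

def first_order_features(hist):
    """Six first-order texture features from a normalized gray-level
    histogram, where hist[z] = p(z) for intensity levels z = 0..L-1.
    Returns (uniformity, entropy, intensity, std_dev, skewness, smoothness)."""
    levels = range(len(hist))
    u = sum(p * p for p in hist)                        # uniformity
    e = -sum(p * math.log2(p) for p in hist if p > 0)   # first-order entropy
    m1 = sum(z * hist[z] for z in levels)               # mean intensity
    var = sum((z - m1) ** 2 * hist[z] for z in levels)  # histogram variance
    m2 = math.sqrt(var)                                 # standard deviation
    m3 = sum((z - m1) ** 3 * hist[z] for z in levels)   # skewness
    s = 1.0 - 1.0 / (1.0 + var)                         # smoothness
    return u, e, m1, m2, m3, s
```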

In turn, the second-order statistics included contrast, correlation, energy, and homogeneity, measured at 16 positions ($\phi$), corresponding to distances between pixels of 1, 3, 5, and 7, at rotation angles of 0, 45, 90, and 135 degrees, producing 64 texture features. These descriptors are described below following \cite{Haralick1973} and \cite{Gonzales2009}. 

Contrast ($c$) compares the intensity of neighboring pixels and it is computed by:
\begin{equation}
c_\phi = \displaystyle\sum_{i=1}^{k}\sum_{j=1}^{k}(i-j)^2p_{ij}
\label{c}
\end{equation}
where $k$ is the co-occurrence matrix dimension, and $p_{ij}$ is the probability of the pixel pair $(i,j)$ satisfying $\phi$.

The correlation ($r$) measures the probability of occurrence of specified pixel pairs, given by:
\begin{equation}
r_\phi = \displaystyle\sum_{i=1}^{k}\sum_{j=1}^{k}\frac{(i-m_{row})(j-m_{col})}{\sigma_{row}\,\sigma_{col}} p_{ij}
\label{r}
\end{equation}
where $m_{row}$, $m_{col}$, $\sigma_{row}$, and $\sigma_{col}$ are the means and standard deviations of the row and column marginal distributions of the co-occurrence matrix. Energy ($\varepsilon$) adds the squared elements of the co-occurrence matrix, and homogeneity ($h$) measures the closeness of the gray levels in their spatial distribution over the image, respectively obtained by:  
\begin{equation}
\varepsilon_\phi = \displaystyle\sum_{i=1}^{k}\sum_{j=1}^{k}p_{ij}^2
\label{epsilon}
\end{equation}
and
\begin{equation}
h_\phi = \displaystyle\sum_{i=1}^{k}\sum_{j=1}^{k}\frac{p_{ij}}{1+|i-j|} 
\label{h_phi}
\end{equation}
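The four second-order descriptors above share the same double sum over the normalized co-occurrence matrix, so they can be computed in a single pass. The following is an illustrative Python sketch (our own rendering; the row/column marginal means and deviations are those assumed in the correlation formula):

```python
def glcm_features(p):
    """Contrast, correlation, energy and homogeneity from a normalized
    k x k co-occurrence matrix `p` (nested list of probabilities)."""
    k = len(p)
    idx = range(k)
    # means and standard deviations of the row/column marginals
    m_r = sum(i * p[i][j] for i in idx for j in idx)
    m_c = sum(j * p[i][j] for i in idx for j in idx)
    s_r = sum((i - m_r) ** 2 * p[i][j] for i in idx for j in idx) ** 0.5
    s_c = sum((j - m_c) ** 2 * p[i][j] for i in idx for j in idx) ** 0.5
    contrast = sum((i - j) ** 2 * p[i][j] for i in idx for j in idx)
    correlation = sum((i - m_r) * (j - m_c) * p[i][j]
                      for i in idx for j in idx) / (s_r * s_c)
    energy = sum(p[i][j] ** 2 for i in idx for j in idx)
    homogeneity = sum(p[i][j] / (1 + abs(i - j)) for i in idx for j in idx)
    return contrast, correlation, energy, homogeneity
```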

From the foregoing, the total number of measured variables amounted to 70 texture features. Taking into account that some features may be highly correlated, an Exploratory Factor Analysis (EFA) was performed. As a multivariate analysis technique, the EFA finds a coordinate system that maximizes the variance shared among variables, making it possible to reduce the data dimensionality and prevent the use of redundant information (\cite{Costello2009}). 

In the new m-dimensional space found by EFA, the standardized original variables ($z$) correspond to linear combinations of underlying factors ($z'$), given by (\cite{Yong2013}):
 \begin{equation}
z_j=a_{j1}z'_{1}+a_{j2}z'_{2}+...+a_{jm}z'_{m}
\label{z}
\end{equation}

For that, the EFA was carried out using the Spearman’s coefficient, a non-parametric alternative regarded as robust for general distributions (non-normal data), the principal factors as extraction method, and the communalities ($h_i$) based on the squared multiple correlations, as in:
\begin{equation}
h_i = \displaystyle\sum_{j=1}^{m}l_{ij}^{2}
\label{hi}
\end{equation}
where $l_{ij}$ is the correlation between the $i^{th}$ principal factor and the $j^{th}$ original variable (texture feature), previously standardized by means of:
\begin{equation}
z_i = \displaystyle\frac{x_i - \bar{x}}{\sigma}
\label{zi}
\end{equation}
where $x_i$ is the measured original variable, $\bar{x}$ and $\sigma$ are respectively its mean and standard deviation.

Thus, the features extracted from tree trunk images have been reduced to fewer latent variables (principal factors), which were used as predictors for generating and learning fuzzy if-then rules in the texture patterns recognition.
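In essence, the extraction step reduces to rank-transforming the data and eigendecomposing the resulting Spearman correlation matrix. A minimal numpy sketch under that simplification is given below (a full EFA would additionally iterate the communalities and rotate the loadings; the function names are ours):

```python
import numpy as np

def _ranks(a):
    # ordinal ranks (ties are ignored in this sketch)
    r = np.empty(len(a))
    r[np.argsort(a)] = np.arange(len(a))
    return r

def principal_factors(X, n_factors):
    """Latent-variable scores, loadings and explained variability from the
    eigendecomposition of the Spearman correlation matrix of X
    (samples x variables)."""
    X = np.asarray(X, dtype=float)
    ranks = np.column_stack([_ranks(col) for col in X.T])
    rho = np.corrcoef(ranks, rowvar=False)       # Spearman = Pearson on ranks
    w, v = np.linalg.eigh(rho)                   # eigenvalues, ascending
    top = np.argsort(w)[::-1][:n_factors]        # keep largest eigenvalues
    loadings = v[:, top] * np.sqrt(np.abs(w[top]))   # correlations l_ij
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1) # standardized variables
    scores = z @ v[:, top]                       # reduced predictors
    explained = w[top].sum() / w.sum()           # cumulative variability
    return scores, loadings, explained
```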

\subsection{Fuzzy modeling for the pattern recognition }
Although dating from the mid-1990s, the development of fuzzy modeling for classification tasks is relatively recent in comparison to other applications. Notwithstanding, since then several approaches have been proposed, including space partitioning (\cite{Chi1996}), neural-network-based methods (\cite{Nauck1997}), clustering techniques (\cite{Abe1997}), genetic algorithms (\cite{Gonzalez1999}), and fuzzy partition using certainty grades (\cite{Ishibuchi2001}).  

For the predictive modeling in the present study, we used a fuzzy rule-based classification system, created and described by \cite{Riza2015} as FRBCS.W algorithm, made available in R programming language by means of the ‘frbs’ package. The FRBCS.W algorithm has been developed based on the Ishibuchi's method (\cite{Ishibuchi2001}). 

As aforementioned, Ishibuchi's method is a learning method from numerical data that consists of fuzzy partitioning with certainty grades. In its learning process, the antecedent parts of the rules are determined by a grid-type fuzzy partition. That is, the partitioning divides the input space of the predictor variables ($x_i$) into regular fuzzy regions, resulting in uniform and symmetrical intervals corresponding to the antecedent terms ($a_{ij}$), as can be seen in Figure 2. 

\begin{figure}[h!]
  \centering
  \includegraphics*[width=0.7\linewidth]{Fig2.pdf}  
  \caption{Grid-type fuzzy partition: (a) partitioning of the predictor variable $x_i$; (b) intervals of certainty and uncertainty that comprise the fuzzy region.} 
\end{figure}
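The grid-type partition of Figure 2 can be illustrated concretely. Below, a hypothetical Python sketch builds gaussian antecedent terms with evenly spaced centers over a normalized input domain; the spread (half the distance between centers) is our own heuristic, not a value prescribed by the method:

```python
import math

def grid_partition(n_terms, lo=0.0, hi=1.0):
    """Gaussian membership functions for a uniform grid-type partition:
    the j-th function peaks at the j-th of `n_terms` evenly spaced
    centers over [lo, hi] (n_terms >= 2 assumed)."""
    step = (hi - lo) / (n_terms - 1)
    sigma = step / 2.0  # heuristic spread: half the center spacing
    def make(center):
        return lambda x: math.exp(-((x - center) ** 2) / (2 * sigma ** 2))
    return [make(lo + j * step) for j in range(n_terms)]
```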

By using the grid-type fuzzy partition, the total number of rules ($N$) is determined by the number of possible combinations of the antecedent terms. The rule base is then generated by pattern matching, calculating the membership degrees ($\varphi$) of the training data in the antecedent terms ($a_{ij}$) of each predictor variable ($x_i$). Thus, the consequent part is defined as the dominant categorical variable ($C_j$) in the fuzzy region corresponding to the antecedents of the rule (\cite{Ishibuchi2001}): 
\begin{eqnarray}
\mbox{Rule}\, R_j: &\mbox{IF}& x_1\; \mbox{is}\; a_{1j}\; \mbox{AND}\, ... \, \mbox{AND}\; x_m\; \mbox{is}\; a_{mj} \nonumber \\
&\mbox{THEN}& C_j\; \mbox{with}\; CF_j,\; j=1,2,...,N 
\label{rule}
\end{eqnarray}
where $x$ is an $m$-dimensional vector of predictor variables ($x_i$), $CF_j$ is the certainty grade of the rule $R_j$, and $C_j$ is the dominant categorical variable corresponding to the output class, determined taking into account:
 \begin{equation}
\displaystyle\sum_{p\, \in \, class\, C_j} \varphi_j (x_p) = \max \left\{ \displaystyle\sum_{p\, \in \, class\, k}\varphi_j (x_p): k=1,2,...,c \right\}
\label{max}
\end{equation}
where $x_p=(x_{p1},...,x_{pm})$ is a pattern of the training data, and $c$ is the number of output classes.
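The determination of the consequent class can be sketched as follows: for each rule, the training patterns' compatibility grades are summed per class, and the class with the largest sum wins. An illustrative Python fragment (names are ours), using the product as the aggregation operator:

```python
def dominant_class(rule_mfs, training):
    """Consequent class of one rule: the class whose training patterns
    have the largest summed compatibility with the rule's antecedents.

    `rule_mfs` holds one membership function per predictor (the rule's
    antecedent terms); `training` is a list of (pattern, label) pairs."""
    totals = {}
    for pattern, label in training:
        phi = 1.0                      # product t-norm aggregation
        for mf, x in zip(rule_mfs, pattern):
            phi *= mf(x)
        totals[label] = totals.get(label, 0.0) + phi
    return max(totals, key=totals.get)
```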

After generating the predictive model, the classification of new instances is based on a single winner rule, determined by the maximum product of the rule certainty grade ($CF_j$) and the instance compatibility grade with the rule $R_j$ ($\varphi_j$), as in:
\begin{equation}
\varphi_{j^*}(x_p)\cdot CF_{j^*} = \max \left\{\varphi_j(x_p)\cdot CF_j : j=1,2,...,N  \right\}
\label{varphiCF}
\end{equation}
where $\varphi_j(x_p)$ is the instance compatibility grade, given by the aggregation of the membership values of its predictor variable vector in the antecedents of the rule. In turn, the rule certainty grade ($CF_j$) is a real number in the interval [0, 1] that works as the weight of the rule, given by:
 \begin{equation}
CF_j = \displaystyle\frac{\beta_{class\, C_j}(R_j)-\bar{\beta}}{\displaystyle\sum_{k=1}^{c}\beta_{class\, k}(R_j)}
\label{CFj}
\end{equation}
where
\begin{equation}
\bar{\beta} = \displaystyle\frac{\displaystyle\sum_{k\neq C_j}\beta_{class\, k}(R_j)}{(c-1)}
\label{betabarra}
\end{equation}
and
\begin{equation}
\beta_{class\, k}(R_j) = \displaystyle\sum_{x_p\, \in \, class\, k}\varphi_j (x_p), k=1,2,...,c
\label{betaclass}
\end{equation}
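The certainty grade and the single-winner decision above can be sketched together. In the illustrative Python fragment below (our own rendering), `beta` holds the per-class compatibility sums of one rule, and classification picks the rule maximizing the product $\varphi_j(x_p)\cdot CF_j$:

```python
def certainty_grade(beta):
    """Certainty grade CF_j from the per-class compatibility sums.
    `beta` maps class label -> beta_class(R_j); the rule's consequent
    is the class with the largest sum."""
    c_j = max(beta, key=beta.get)
    beta_bar = sum(b for k, b in beta.items() if k != c_j) / (len(beta) - 1)
    return c_j, (beta[c_j] - beta_bar) / sum(beta.values())

def classify(rules, phi):
    """Single-winner classification: `rules` is a list of
    (consequent_class, CF) pairs and `phi[j]` the compatibility grade of
    the new pattern with rule j; returns the winning rule's class."""
    j = max(range(len(rules)), key=lambda j: phi[j] * rules[j][1])
    return rules[j][0]
```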

\subsection{Benchmarking experiment}
From the database with 2160 samples, 70\% were randomly selected for the machine learning process. During this process, a 5-fold cross-validation was carried out over the learning dataset, in order to find the best setting of the control parameters. Then, a hold-out validation was performed using the remaining 30\% as testing dataset, to assess the generalization ability of the Fuzzy Rule-Based Classification System (FRBCS) in the trunk texture pattern recognition (Figure 3).
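The 70/30 split with 5-fold cross-validation over the learning portion can be sketched as follows (a hypothetical Python fragment; the seed is arbitrary):

```python
import random

def split_folds(samples, train_frac=0.7, k=5, seed=42):
    """Shuffle the data, hold out (1 - train_frac) for final testing, and
    split the rest into k folds for cross-validation, mirroring the
    70/30 protocol described above."""
    data = list(samples)
    random.Random(seed).shuffle(data)
    cut = round(train_frac * len(data))
    learn, test = data[:cut], data[cut:]
    folds = [learn[i::k] for i in range(k)]    # k interleaved folds
    return folds, test
```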

 \begin{figure}[htb!]
  \centering
  \includegraphics*[width=0.7\linewidth]{Fig3.pdf}  
  \caption{Split of the database for the learning process and for assessing the generalization ability based on the testing dataset.} 
\end{figure}

Furthermore, as a reference to assess the performance of the fuzzy-based approach, a benchmarking experiment was carried out using the same database for training, checking, and testing other machine learning algorithms, as shown in Table 1.

\begin{center} 
{\footnotesize {\bf Table 1.} Machine learning algorithms considered for performance comparison and control parameters settings adjusted during the learning process, which provide the best results in the cross-validation over the checking dataset.}

\vspace{0.2cm}
\begin{tabular}{lll}
\hline \hline
Learning algorithm & Control parameters settings & Available in 
\\ \hline 
\hline
Fuzzy-Based System & frbcs.w, mf: gaussian,  & Package `{\em frbs}' \\
(FRBCS) & t-norm: product, &  R language \\
 & antecedent terms: 23 & \\ \hline
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Boosted Rule-Based  & subset: false, no global   & Package `C5.0' \\
Model (C5) &  pruning: false, CF: 0.25, & R language \\
& trials: 100 & \\\hline
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Cascade-Correlation   & kernel: sigmoid and & Algorithm `CNN'\\ 
Neural Network (CNN) & gaussian, neuron: 0-$10^3$, & C language \\
  & candid. $10^2$, epoch $10^3$ \\
  & overf.: prune to opt. size \\ \hline
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
$k$-Nearest Neighbors & model: knn kernel, & Pack `CORElearn'  \\
(KNN) & weighting: gaussian & R language \\
& kernel, type: probability \\ \hline
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Probabilistic Neural & sigma: each var., steps 20, & Algorithm PNN \\
Network (PNN) &  kernel: gaussian, prior prob.:  &  C language \\
 &  frequency distribution &  \\\hline
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Multilayer Perceptron & layers: 3, overfitting control:  & Algorithm MLP \\
Network (MLP) & min. holdout val. error over  & C language \\
 & 10\% train, function: logistic  & \\ \hline
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Random Dec. Forest & importance: true, & Pack `randomForest' \\
(Random Forest)  & proximity: true, & R language \\
& number of trees: 300  \\\hline
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Single Decision Tree & minimum node to split: 3, & Algorithm SDT \\
(SDT) & maximum tree levels: 300, & C language \\
  &  overfitting control: prune &  \\
  &  to min cross-val error   \\ \hline
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 Stochastic Gradient & trees: 300, depth: 8, & Algor. `TreeBoost'\\
 Boosting (TreeBoost) & min size node to split: 10, & C language\\
 & prune series to min. err,  &  \\
 & minimum trees in series: 10  &  \\ \hline
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Support Vector  & type: bound-constraint, & Package `kernlab' \\
Machine (SVM) & kernel function: gaussian, &  R language \\
& sigma:  0.1, C: 24 &   \\ \hline
\end{tabular}
\end{center}

Based on the testing results, the learning algorithms performance has been assessed according to the overall accuracy ($\theta$), which measures the ratio of samples correctly classified by the total number of samples ($n_T$), as in:
\begin{equation}
\theta = n_{T}^{-1} \displaystyle\sum_{i=1}^{n_{sp}}TP_{sp_{i}}
\label{theta}
\end{equation}
where $TP_{sp_i}$ is the total number of true positive samples for species $i$, and $n_{sp}$ is the total number of tree species.

In addition, the Kappa index ($K$) was also used to assess how well the fuzzy-based model agrees with an already established algorithm (gold standard), namely the one achieving the best performance during the experiments, by means of (\cite{Carletta1996}):
\begin{equation}
K = \displaystyle\frac{\theta_1 - \theta_2}{1 - \theta_2}
\label{K}
\end{equation}
and
\begin{equation}
\theta_2 = \displaystyle\frac{1}{n_{T}^{2}}\sum_{i=1}^{n_{sp}}\left(V_{sp_i}\cdot I_{sp_i} \right)
\label{theta2}
\end{equation}
where $\theta_2$ is the proportion of times for which an agreement is expected by chance, $I_{sp_i}$ is the total number of samples predicted as belonging to the species $i$ ($sp_i$), and $V_{sp_i}$ is the total number of samples actually belonging to $sp_i$ according to the algorithm adopted as a reference (gold standard).
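Both assessment measures are simple to compute once the per-species counts are available, as in the illustrative Python fragment below (not from the study's code):

```python
def overall_accuracy(true_pos_per_species, n_total):
    """Overall accuracy: fraction of samples correctly classified."""
    return sum(true_pos_per_species) / n_total

def kappa(theta1, pred_counts, ref_counts, n_total):
    """Kappa index between a classifier and a reference (gold standard):
    `theta1` is the observed agreement, `pred_counts[i]` the number of
    samples the classifier assigns to species i, and `ref_counts[i]`
    those the reference assigns to species i."""
    # chance agreement from the product of the marginal counts
    theta2 = sum(p * r for p, r in zip(pred_counts, ref_counts)) / n_total**2
    return (theta1 - theta2) / (1 - theta2)
```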

\section{Results and discussion}

Regarding the requirements for data pre-processing using multivariate analysis, the Cronbach's alpha equivalent to 0.9 indicated an excellent internal consistency, and the Kaiser-Meyer-Olkin equal to 0.97 confirmed a good sampling adequacy, verifying sufficient conditions to perform the Exploratory Factor Analysis (EFA), whose result is shown in Figure 4. 

\begin{figure}[h!]
  \centering
  \includegraphics*[width=0.7\linewidth]{Fig4.pdf}  
  \caption{Eigenvalues and cumulative variability explained by the first 20 latent variables (principal factors) produced from the Exploratory Factor Analysis.} 
\end{figure}

From Figure 4, we find that the first 20 principal factors explain 99.0\% of the cumulative variability. Therefore, the EFA was capable of reducing the data dimensionality while retaining almost all the information available in the 70 original variables. Thus, these principal factors were used as predictor variables in the modeling process, affording the results shown in Table 2. 

\begin{center} 
{\footnotesize {\bf Table 2.} Performance of the machine learning algorithms in the benchmarking experiments, based on the settings that reach the best accuracy over checking dataset during the learning process, using the first 20 principal factors as predictor variables.}

\vspace{0.2cm}
\begin{tabular}{lccc}\hline \hline
 & \multicolumn{2}{c}{5-fold Cross-validation (\%)} & Hold-out validation (\%) \\
Learning algorithms & Training data  & Checking data  & Testing data  \\ \hline \hline
& & & \\
FRBCS  & 100 & 93.5 & 94.0 \\
& & & \\ %\hline
C5 & 100 & 85.3 & 86.5 \\ 
 & & & \\
CNN & 86.2 & 76.5 & 78.5 \\
 & & & \\
KNN & 96.3 & 89.1 & 89.7 \\
  & & & \\
PNN & 100 & 95.3 & 96.1 \\
 & & & \\
MLP & 94.9 & 88.5 & 90.8 \\
  & & & \\
Random Forest & 100 & 88.9 & 89.5 \\
  & & & \\
SDT & 91.6 & 72.3 & 72.3 \\
 & & & \\
TreeBoost & 100 & 85.7 & 87.3 \\
  & & & \\
 SVM & 100 & 95.9 & 96.2 \\ 
  & & & \\ \hline
\end{tabular}
\end{center} 

In general, each machine learning algorithm has properties which can provide better performance than others, depending on the characteristics of the case under analysis. Thus, the performance of the algorithms in the benchmarking experiments is discussed taking such properties into account. In this sense, by analyzing Table 2, three performance groups can be noted according to the accuracy over the testing dataset. 

With accuracy below 80\%, the first group comprises the Single Decision Tree (SDT) and the Cascade-Correlation Neural Network (CNN). The CNN is a self-organizing network that determines its own size and topology by adding neurons to the architecture. The SDT also grows by adding nodes to its structure, both seeking greater precision during the learning process. As a consequence, these algorithms can overfit the training data, losing some generalization ability. Thus, we applied an overfitting control, pruning the models to the minimum cross-validated error over the checking dataset. Despite this, the CNN performance decreased from 86.2\% during training to 78.5\% in testing, and the SDT from 91.6\% to 72.3\%, respectively. Therefore, these findings can be considered an indicator of the complexity of the arboreal trunk texture, which makes the classification task harder.

Capable of handling this issue better than the single tree-based model (SDT), the Decision Tree Forest (Random Forest), Stochastic Gradient Boosting (TreeBoost), and Boosted Rule-Based Model (C5.0) are in the second group of algorithms with medium-performance in the testing (from 80 to 90\%), along with \textit{k}-Nearest Neighbors (KNN). 

The Random Forest and TreeBoost are ensembles based on different strategies for creating a collection of decision trees. The Random Forest uses the bagging (Bootstrap Aggregating) technique, creating trees grown in parallel, which afforded a generalization ability of 89.5\%. On the other hand, the TreeBoost uses a sequential training (boosting) that resulted in a series of trees with 87.3\% accuracy. Similarly, C5.0 is a voting classification algorithm also based on a boosting technique to create a collection of rules, which achieved 85.3\% accuracy. Boosting usually provides more accuracy than the bagging strategy, except when there is noise in the data, such as outliers (\cite{Bauer1999}). Therefore, as the Random Forest outperformed the boosting-based models in the present analysis, we can consider some influence of outliers. Notwithstanding, as the bark texture of the arboreal trunk is a biological feature subject to imperfections, these outliers have not been removed, because they can be caused by natural variability. In turn, the KNN is a non-parametric, instance-based learning algorithm, in which a pattern is recognized by majority voting according to the similarity with its \textit{k} nearest neighbors. By using kernel functions to weight the votes of the neighbors, the KNN provided 89.7\% accuracy, slightly higher than the ensemble-based models. 

The third group with high-performance, more than 90\% of accuracy over testing dataset, has been formed by the Support Vector Machine (SVM), Probabilistic Neural Network (PNN), Fuzzy Rule-Based Classification System (FRBCS), and Multilayer Perceptron Neural Network (MLP).

The SVM operates by finding an n-dimensional hyperplane that optimizes the separation of the different data classes. Although similar to artificial neural networks (ANN) in some aspects, the SVM is less prone to overfitting and copes well with high-dimensional spaces and outliers, because it selects the most suitable features and considers only the most relevant points. Besides that, the SVM has a global and unique solution, whilst an ANN can suffer from multiple local minima. Thus, in our analysis the SVM provided a significant improvement in comparison with most of the learning algorithms, reaching 96.2\% over the testing dataset. 

Among the neural networks, the PNN performs classification based on the estimation of probability density functions, being capable of dealing with erroneous data and computing nonlinear decision boundaries as complex as necessary, in order to approach the Bayes optimum, i.e., to minimize the error in a probabilistic manner as much as possible. Thus, relatively insensitive to outliers, the PNN achieved virtually the same performance as the SVM, with 96.1\% accuracy over the testing dataset. In turn, the MLP allows nonlinear mappings, using logistic activation functions and the back-propagation algorithm for adjusting the neural network weights. To prevent overfitting, we used the MLP architecture with the minimum validation error during the learning process, which reached a significant generalization ability of 90.8\% accuracy, though still lower than that of the PNN.

Regarding the FRBCS, the focus of the present study, a more detailed description of its machine learning process is given below, before presenting its accuracy over the testing dataset. During the training, we found that the gaussian membership function afforded a better performance than the triangular and trapezoidal-shaped functions. Then, using gaussian functions for the fuzzy partitioning, variations of the number of antecedent terms were assessed in combination with the minimum and product t-norms (Figure 5).

\begin{figure}[h!]
  \centering
  \includegraphics*[width=0.7\linewidth]{Fig5.pdf}  
  \caption{Performance of different settings of the fuzzy rule-based classification model, varying the number of antecedent terms in combination with the minimum and product t-norms.} 
\end{figure}

Analyzing Figure 5, it can be noted that, for both t-norms (minimum and product), about 10 antecedent terms were sufficient for the fuzzy rule-based classifier to reduce the error to zero during training, but a higher accuracy over the checking dataset required a greater number of terms. In that regard, one of the main aspects to highlight is the difference in performance provided by the minimum and product t-norms. 

Both product and minimum t-norm allowed aggregating the predictor variables via fuzzy intersections, modeling the simultaneous occurrence of patterns that characterize the same arboreal species. However, the product t-norm operates multiplying all the membership values and, in contrast, the minimum t-norm takes into account only the lowest membership during the aggregation process (Figure 6). 

\begin{figure}[h!]
  \centering
  \includegraphics*[width=0.7\linewidth]{Fig6.pdf}  
  \caption{Aggregation process of predictor variables ($x_i$) in the rules 1 ($R_1$) and 2 ($R_2$), using minimum and product t-norm.} 
\end{figure}

Figure 6 depicts a case in which a given sample has features (pattern values) belonging to more than one arboreal species, i.e., a sample with membership in the consequent classes of both rules 1 and 2, but with different membership degrees. By using the minimum t-norm, the most critical condition, given by the lowest membership, becomes decisive; hence we have a more rigorous classifier, but one that can be naive by disregarding the other predictor variables. 

As a consequence, for the case in Figure 6 the minimum t-norm would result in the arboreal species identification supported by rule 2 ($\varphi_{min}(R_2)>\varphi_{min}(R_1)$). However, the sample has a higher membership in the majority of the fuzzy regions corresponding to the consequent of rule 1, as computed by the product t-norm ($\varphi_{prod}(R_1)>\varphi_{prod}(R_2)$). 

Thus, by taking all the predictors into account, the product t-norm seems to afford a more assertive predictive modeling, so that it provided a better performance than the minimum t-norm in all settings assessed in the present study (see Figure 5).
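The contrast between the two aggregation operators can be made concrete with a short Python sketch; the membership degrees below are hypothetical, chosen only to reproduce the kind of situation described for Figure 6, and are not taken from the study's dataset.

```python
# Rule activation under the two t-norms: the minimum keeps only the weakest
# antecedent match, while the product combines all of them.

def activation_min(memberships):
    return min(memberships)

def activation_prod(memberships):
    prod = 1.0
    for m in memberships:
        prod *= m
    return prod

# A sample matching rule R1 strongly on most predictors but weakly on one,
# and rule R2 only moderately on every predictor (hypothetical degrees):
r1 = [0.9, 0.9, 0.9, 0.3]
r2 = [0.5, 0.5, 0.5, 0.5]

print(activation_min(r1), activation_min(r2))    # 0.3 vs 0.5 -> R2 wins
print(activation_prod(r1), activation_prod(r2))  # 0.2187 vs 0.0625 -> R1 wins
```

The minimum t-norm lets the single weak antecedent veto rule 1, while the product rewards the consistently strong matches, mirroring the behavior discussed above.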

During the learning process, a tendency of accuracy improvement over the checking dataset can be noted as the number of fuzzy regions increases, which was more significant up to about 15 antecedent terms. This improvement seems to occur due to the increase of the decision areas ($D_j$) formed by each fuzzy if-then rule, as can be seen in Figure 7. 

\begin{figure}[h!]
  \centering
  \includegraphics*[width=0.7\linewidth]{Fig7.pdf}  
  \caption{Increase in the decision areas formed by the fuzzy if-then rules as a consequence of the increment in the number of antecedent terms.} 
\end{figure}

Nevertheless, after a certain point there was a performance fluctuation that demanded an exhaustive search for the best accuracy over the checking dataset (93.5\%), which was found using the Gaussian membership function, the product t-norm, and 23 antecedent terms. Using this setting, the fuzzy-based model reached 94.0\% accuracy over the testing dataset. Furthermore, considering the SVM as a gold standard, the FRBCS obtained a Kappa index of 0.95.
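The agreement measure reported above is Cohen's Kappa, which corrects the raw agreement between two classifiers for agreement expected by chance. A minimal Python sketch of its computation follows; the label vectors in the usage line are hypothetical, not the study's predictions.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two classifiers' label vectors."""
    n = len(labels_a)
    # observed proportion of samples on which both classifiers agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # expected agreement by chance, from each classifier's class frequencies
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# hypothetical predictions of two classifiers over four samples
kappa = cohens_kappa([0, 0, 1, 1], [0, 0, 1, 0])
```

A Kappa of 1 indicates perfect agreement and 0 indicates only chance-level agreement, so the 0.95 reported above places the FRBCS in near-perfect agreement with the SVM.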

\vspace{2cm}
\section{Conclusions}
In the present study we analyzed the applicability of fuzzy-based pattern recognition for dealing with the complexity related to the natural variability of texture in arboreal trunks, which can cause uncertainty due to ambiguity in the pattern matching. 

By providing a nonlinear and smooth discriminant function, with the differential of taking into account the graded membership of a given sample in the matching patterns of different classes (arboreal species), the Fuzzy Rule-Based Classification System (FRBCS) afforded a high generalization ability, outperforming most of the assessed learning algorithms, including ensembles with many classifiers and kernel-based models, as well as artificial neural networks widely used in pattern recognition tasks. 

Furthermore, the Kappa index indicates that the FRBCS had an almost perfect agreement with the most accurate classifier in the benchmarking experiment. Therefore, fuzzy modeling can be considered an alternative approach, with competitive and reliable performance for arboreal trunk texture recognition, supporting arboreal species identification through computational intelligence.

\bibliographystyle{ieeetr}
\bibliography{Bressane2}
\end{document}
