Discrimination of civet coffee using visible spectroscopy

Civet coffee is considered highly marketable and rare. This specialty coffee has a distinctive flavor and a higher price than regular coffee, and it is restricted in supply. Establishing a straightforward and efficient approach to distinguish civet coffee is therefore fundamental for quality control and consumer protection. This study used visible spectroscopy as a nondestructive and quick technique to obtain the absorbance, from 450 nm to 650 nm, of civet coffee and non-civet coffee samples. Overall, 160 samples were analyzed, and a total of 960 spectra were accumulated. The data gathered from the first 120 samples were fed to the Classification Learner application as the training data set. The remaining samples were used to test the classification algorithms. The study shows that civet coffee bean samples have lower absorbance values in the visible spectrum than non-civet coffee bean samples. The process yields classification scores of 96.7 % to 100 % for quadratic discriminant analysis and logistic regression. Of the two classification algorithms, logistic regression had the shorter training time, 14.050 seconds. The application of visible spectroscopy combined with data mining algorithms is effective in discriminating civet coffee from non-civet coffee.


I. INTRODUCTION
Civet coffee is produced by an unconventional process called digestive bio-processing, in which regular coffee beans are fermented in the civet cat's intestine. The partially metabolized coffee beans excreted by the Alamid are gathered, washed, and dried [1]. Civet coffee is considered highly marketable and rare [2], making it prone to fraud and scams. Civet coffee enthusiasts condemn cheap imitations of kopi luwak (civet coffee), in which ordinary coffee bean varieties are mixed in or passed off as original civet coffee [3]. For the time being, there is no globally agreed way of determining whether a bean is civet coffee [4].
Various researchers have carried out studies to validate civet coffee's authentic properties. Several studies characterize civet coffee in terms of aroma and volatiles [4], sensory attributes [5], metabolites [6], caffeine, α-tocopherol, and chosen elements such as carbon (C) and oxygen (O) [7], and protein, sugar, fat, and acidity [8].
Some studies applied spectroscopy, which deals with the analysis of how light interacts with a given object [9], to the discrimination of civet coffee. Suhandy and Yulia [10] used selected fluorescence spectra and the SIMCA method to differentiate the two coffee varieties, reporting the results in terms of specificity and sensitivity. Another study combined UV-Visible spectroscopy with various chemometric methods [11] and reported a 100 % civet coffee classification rate. However, the major disadvantage of the studies mentioned [4]-[11] is that they require the samples to be ground or mixed with certain chemicals, which can be time-consuming and complicated. Some samples are also wasted because they cannot be reused for other purposes.
Arboleda [12] used near-infrared spectroscopy (NIRS) with an artificial neural network (ANN) to classify civet coffee and Arabica coffee, with classification results ranging from 95 % to 100 %. That study requires neither ground coffee beans nor chemical mixtures. Nevertheless, accounts of spectroscopy-based civet coffee discrimination that use whole coffee beans and no chemicals remain very limited.
Spectroscopic methods have significant benefits in speed and cost per study [13]. The use of visible spectroscopy analysis is increasing, and it can provide an objective, repeatable, nondestructive method to monitor and assess the quality of food and other agricultural commodities [14]. Ultraviolet and visible spectroscopy is the oldest and easiest process used for food authentication [15], that is, for verifying that a food meets its label definition [16].
Quality control has been the pillar of food safety in the industry, which has begun to incorporate advanced food safety and quality management programs [17]. In response to rising food safety concerns, regulations, policies, and guidelines on food protection and quality control have been established [18]. Authentication is becoming an important topic in coffee trading. Specialty coffee is restricted in supply because of its special flavor and high price relative to regular coffee. The spectroscopic approach used for coffee authentication is most often paired with pattern recognition methods [10]. It is widely used in various plant studies for its two-dimensional aspect [13].
Classification aims to assign data to categories, where each data item carries a category type [19]. The problem arises when a subject has to be assigned to a specified group, given a set of identified characteristics associated with that subject [20]. Pattern identification refers to methods in which information about the sample's class membership is used for classification purposes [21]. Machine learning is intended to teach systems how to process data and gain knowledge from it [22]. A classifier is an algorithm that maps input data to a specific category using similar features [23]. The two sorts of classifiers are binary classifiers and multi-class classifiers [24]. Binary classification is concerned with two classes, whereas multi-class classification deals with more than two classes [25]. The classification model is built from training data with defined groups [26]. Training the classifiers identifies the best possible way to separate two data classes and ensure accuracy [27].
This study was carried out to determine the difference between civet coffee and non-civet coffee in the visible spectrum. To the authors' knowledge, little to no evidence of civet coffee discrimination using visible spectroscopy has been documented in the literature. This study also aims to identify whether data mining algorithms can distinguish between the two coffee varieties. The research concerns the efficacy of the procedure in discriminating civet coffee.

A. Research design
Civet coffee is made from coffee berries digested by the Alamid, scientifically named Paradoxurus hermaphroditus [2]. The distinct flavor of this coffee variety is hard to verify. In cheap coffee imitations, ordinary coffee bean varieties are mixed in or claimed to be original civet coffee [3]. Because of its marketable characteristics, civet coffee has become a target for fraud and imitation, in which ordinary coffee beans, or civet coffee mixed with ordinary coffee beans, are sold as pure civet coffee.
This study aims to use visible spectroscopy to identify civet coffee by means of a classification algorithm. MATLAB R2018a was used to access and preprocess spectral data from civet and non-civet coffee beans. It includes different algorithms that were used to train on the gathered data. The accuracy of each classifier algorithm was evaluated to choose the algorithm to be used for testing.

B. Research procedure
The research procedure was divided into four main tasks: sample preparation, data collection and variable selection, training of classifier algorithms, and evaluation of classifier performance. A Raspberry Pi Zero was used to gather spectral data from the visible sensor, and the data were stored directly on a USB flash drive for data set compilation. The Classification Learner app from MATLAB was used for data processing. A total of 60 civet coffee bean samples and 60 non-civet coffee bean samples were used to train the classifier algorithms, while 20 samples each of civet and non-civet coffee beans were used as test samples to evaluate whether the chosen classifier algorithm properly classifies civet and non-civet coffee.

Sample preparation
One hundred sixty Robusta coffee bean samples were used for data gathering in this study. The samples were split equally: 80 Robusta coffee beans that had been fed to a wild civet formed the first class, and the other 80 Robusta coffee beans formed the non-civet class. The civet coffee beans were collected after being excreted by a wild civet cat. The samples are shown in Figure 1. Both civet and non-civet coffee beans were washed and sun-dried. The parchment layer was removed from the civet beans, while the outer skin, the pulp (mesocarp) layer, and the parchment layer were removed from the non-civet beans. The samples were stored at room temperature, around 20 °C to 25 °C.

Data collection
The visible sensor used in this study is the AS7262. The wavelengths in the visible range detectable by the sensor are 450 nm, 500 nm, 550 nm, 570 nm, 600 nm, and 650 nm, each with a full width at half maximum (FWHM) of 40 nm. This sensor gathered reflectance from the sample. The spectral data were converted from .csv to Excel (.xls) format and then converted to absorbance. The absorbance is determined from the amount of light reflected and transmitted when light is incident on the object, using Eq. 1, which satisfies the Beer-Lambert law [28]:

$$A = \log_{10}\left(\frac{I_o}{I}\right) \tag{1}$$

where A denotes absorbance, Io is the original intensity of light, and I is the transmitted intensity of light.
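The conversion described above can be illustrated with a minimal Python sketch. It assumes the Beer-Lambert form A = log10(Io / I) and treats the sensor reading for each channel as I and a reference reading as Io; the function names and the numeric readings are hypothetical and are not taken from the study's data.

```python
import math

# AS7262 channel centre wavelengths in nm, as listed in the text.
WAVELENGTHS_NM = (450, 500, 550, 570, 600, 650)


def absorbance(incident, measured):
    """Return A = log10(Io / I) for one channel.

    incident (Io) and measured (I) must be positive and in the same units.
    """
    if incident <= 0 or measured <= 0:
        raise ValueError("intensities must be positive")
    return math.log10(incident / measured)


def spectrum_to_absorbance(incident_by_channel, measured_by_channel):
    """Convert one six-channel reading into a {wavelength_nm: absorbance} dict."""
    return {
        wl: absorbance(io, i)
        for wl, io, i in zip(WAVELENGTHS_NM, incident_by_channel, measured_by_channel)
    }


# Hypothetical readings in arbitrary units: a reference (Io) and one bean (I).
reference = [1000.0] * 6
bean = [100.0, 200.0, 250.0, 400.0, 500.0, 1000.0]
result = spectrum_to_absorbance(reference, bean)
```

A channel whose reading equals the reference has zero absorbance, and a tenfold drop in intensity corresponds to an absorbance of 1.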

Variable selection
Data cleaning was carried out to remove unnecessary, incomplete, and fluctuating data [29]. It aimed to retain only the data essential to the study in order to generate more accurate results. The wavelengths with large differences in absorbance values between the two classes were selected as the features for model training. After the features were chosen, principal component analysis (PCA) in Classification Learner was used to reduce dimensionality and avoid overfitting.
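One simple way to pick wavelengths with large absorbance differences is to rank them by the gap between the class means. The sketch below is an assumption about how such a ranking could be done, not the authors' procedure; the function name and the absorbance values are made up for illustration, and the PCA step is not shown.

```python
from statistics import mean

WAVELENGTHS_NM = (450, 500, 550, 570, 600, 650)


def rank_wavelengths(civet, non_civet, wavelengths):
    """Rank wavelengths by |mean(non-civet) - mean(civet)|, largest gap first.

    civet / non_civet are lists of rows; each row holds one sample's
    absorbance at every wavelength, in the same order as `wavelengths`.
    """
    gaps = []
    for j, wl in enumerate(wavelengths):
        gap = abs(mean(row[j] for row in non_civet) - mean(row[j] for row in civet))
        gaps.append((gap, wl))
    gaps.sort(reverse=True)
    return [wl for _, wl in gaps]


# Hypothetical absorbance rows (two samples per class), not measured data.
civet_rows = [
    [0.50, 0.52, 0.40, 0.45, 0.30, 0.20],
    [0.52, 0.50, 0.42, 0.47, 0.32, 0.22],
]
non_civet_rows = [
    [0.52, 0.55, 0.60, 0.50, 0.55, 0.50],
    [0.54, 0.53, 0.62, 0.52, 0.57, 0.52],
]

ranking = rank_wavelengths(civet_rows, non_civet_rows, WAVELENGTHS_NM)
selected = ranking[:3]  # keep the three most discriminating channels
```

Keeping the top three channels is an arbitrary cut-off chosen for this sketch; in practice the threshold would depend on the measured gaps.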

Classification
MATLAB was used for application development, simulation, data processing, analysis, and mathematical computation [30], [31]. Its Classification Learner app was used to train on the data gathered from the samples. The application covers five categories of classifiers and includes a total of 23 classifiers [32]. The gathered data were analyzed, and relevant features were selected as the training set for higher accuracy. This feature selection approach reduces data dimensions by choosing the important elements of the primary data set [33].
Moreover, a validation method was used to prevent overfitting to the training dataset. A 25-fold cross-validation scheme split the data into 25 disjoint sets; in k-fold cross-validation the data are grouped into k sets, where k is the number of folds [34]. In addition, the data were divided into 75 % for training and 25 % for testing: of the 160 samples gathered, 120 were used for training and 40 for testing. This approach provides a reasonable approximation of the accuracy of the classification models trained with the dataset. Table 1 shows all the classification algorithms used to train the gathered data.
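The two partitioning steps mentioned above, a 75 %/25 % hold-out and a k-fold split, can be sketched in plain Python. The helper names, the per-class (stratified) hold-out, and the choice to build the folds from the training portion only are assumptions of this sketch, not details confirmed by the text.

```python
import random


def stratified_holdout(labels, test_fraction, seed=0):
    """Split sample indices into train/test, keeping the class ratio in both."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def k_fold_indices(n_samples, k, seed=0):
    """Partition range(n_samples) into k disjoint folds of near-equal size."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


# 80 civet and 80 non-civet samples, as in the study.
labels = ["civet"] * 80 + ["non-civet"] * 80
train_idx, test_idx = stratified_holdout(labels, test_fraction=0.25)

# Folds hold positions within train_idx (0..119), not original sample numbers.
folds = k_fold_indices(len(train_idx), k=25)
```

With 120 training samples and k = 25, each fold holds four or five samples; during cross-validation each fold serves once as the validation set while the other 24 are used for fitting.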

C. Evaluation
The measurement criteria used to compare the two classifiers are accuracy and precision [35], [36]. Accuracy is the ratio of the sum of true positives and true negatives to the total number of events (Eq. 2). True Positive (TP) is the number of positive predictions made for actual positive samples, while False Positive (FP) is the number of positive predictions made for actual negative samples. Likewise, True Negative (TN) is the number of negative predictions made for actual negative samples, while False Negative (FN) is the number of negative predictions made for actual positive samples. Precision is the ratio of true positives to the total number of predicted positive observations (Eq. 3).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$
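As a worked example of the two criteria, the following sketch computes accuracy and precision from confusion counts. The counts are hypothetical values for a 40-sample test set and are not results from this study.

```python
def accuracy(tp, tn, fp, fn):
    """(TP + TN) divided by all classified events."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no events to score")
    return (tp + tn) / total


def precision(tp, fp):
    """TP divided by all events predicted as positive."""
    if tp + fp == 0:
        raise ValueError("no positive predictions")
    return tp / (tp + fp)


# Hypothetical counts, treating civet coffee as the positive class.
tp, tn, fp, fn = 18, 19, 1, 2

acc = accuracy(tp, tn, fp, fn)   # 37 correct out of 40 events
prec = precision(tp, fp)         # 18 correct out of 19 positive predictions
```

Note that precision ignores false negatives: a classifier that misses some civet beans can still have high precision as long as the beans it does label as civet are mostly correct.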

A. The absorbance values of civet and non-civet coffee
The differences between the spectral values at 550 nm, 600 nm, and 650 nm are shown in Table 2. The graph of the absorbance values for civet and non-civet coffee is shown in Figure 2. It can be observed that civet coffee has lower absorbance in the visible spectrum than non-civet coffee.

B. Training of classifier algorithms
A total of 60 civet coffee beans and 60 non-civet coffee beans were used to create the training dataset. All types of classifier models in the Classification Learner app were trained. The accuracy of each classification algorithm was displayed in the History List once it finished training, with the highest overall accuracy highlighted in a box. In choosing the most reliable and effective classifier model, significant factors such as training time and accuracy have to be evaluated. Table 3 shows that the highest accuracy is approximately 98 %. The confusion matrix was used to evaluate the accuracy percentage of Quadratic Discriminant Analysis and Logistic Regression. It shows how accurately civet coffee beans and non-civet coffee beans are classified according to their group. Table 4 indicates that Quadratic Discriminant Analysis properly classifies 100 % of civet coffee beans and approximately 97 % of non-civet beans. Moreover, Table 5 shows that Logistic Regression correctly classifies approximately 98 % of both civet and non-civet coffee beans.
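To show how per-class percentages like those above are read off a confusion matrix, the sketch below builds a matrix from true and predicted labels and computes the fraction of each class that was classified correctly. The label lists are hypothetical: they are one set of counts that would give 100 % for civet and roughly 97 % for non-civet beans, and they do not reproduce Table 4.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


def per_class_rate(matrix):
    """Fraction of each true class (row) that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


classes = ["civet", "non-civet"]

# Hypothetical outcome for 60 + 60 training beans:
# every civet bean correct, two non-civet beans mislabeled as civet.
true_labels = ["civet"] * 60 + ["non-civet"] * 60
predicted = ["civet"] * 60 + ["civet"] * 2 + ["non-civet"] * 58

matrix = confusion_matrix(true_labels, predicted, classes)
rates = per_class_rate(matrix)
overall = sum(matrix[i][i] for i in range(len(classes))) / len(true_labels)
```

With these illustrative counts, the non-civet rate is 58/60 and the overall accuracy is 118/120.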

C. Testing of the final models
A selection of 20 civet coffee beans and 20 non-civet coffee beans was used to test the final models. Table 6 shows that Quadratic Discriminant Analysis and Logistic Regression classify the samples according to their group with 100 % accuracy, which is slightly better than the 95 % to 100 % accuracy reported in [12] using ANN classification. This study successfully implements spectroscopy for civet coffee discrimination using whole coffee beans and no chemicals, in contrast with [4]-[11]. Figure 3 shows that, in terms of accuracy, Quadratic Discriminant Analysis is higher. However, in terms of precision, Logistic Regression scored 98 %, which is approximately 1 % higher than Quadratic Discriminant Analysis. Logistic Regression also has the advantage of being simple and precise, which produces good predictions. It is also commonly used in fraud detection research [37]-[39], which is one of the study's main objectives.

Figure 2. Absorbance values of civet and non-civet coffee beans.
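The train-then-test workflow can be illustrated with a small logistic regression written in plain Python. This is a sketch under stated assumptions, not the MATLAB Classification Learner implementation: the absorbance values are synthetic, the class levels (0.3 and 0.6) and noise are invented, the features are mean-centered as a convenience of this sketch, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def center(rows, means):
    return [[x - m for x, m in zip(row, means)] for row in rows]


def train_logistic(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (civet) when the linear score is non-negative, else 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0


def synthetic_beans(n_per_class, rng):
    """Made-up absorbances at three wavelengths; 1 = civet, 0 = non-civet."""
    X, y = [], []
    for label, level in ((1, 0.3), (0, 0.6)):  # civet assumed lower, as in the text
        for _ in range(n_per_class):
            X.append([rng.gauss(level, 0.03) for _ in range(3)])
            y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = synthetic_beans(60, rng)  # 60 + 60 training beans
X_test, y_test = synthetic_beans(20, rng)    # 20 + 20 test beans

means = column_means(X_train)                # centering uses training data only
w, b = train_logistic(center(X_train, means), y_train)
predictions = [predict(w, b, x) for x in center(X_test, means)]
test_accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
```

Because the synthetic classes are well separated, the fitted weights come out negative: lower absorbance pushes the score toward the civet class, mirroring the trend reported for the measured spectra.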

IV. CONCLUSION
The application of visible spectroscopy in combination with a classification algorithm was effective in discriminating civet coffee from non-civet coffee. The classification models correctly distinguish civet from non-civet beans by their absorbance levels. The classification models, namely quadratic discriminant analysis and logistic regression, yield 100 % accuracy on the test samples for both civet and non-civet classifications. Among all the classifier models, logistic regression is the most effective since it combines high accuracy and precision with the fastest training time, 14.050 seconds.

ACKNOWLEDGMENTS
The authors would like to express their sincere gratitude to the faculty members of Cavite State University-Indang for providing invaluable guidance, comments, and suggestions about the study. The authors would also like to thank Mrs. Rizalyn Gacot Latube for the samples used in the study and for guidance on sample preparation.