Segmentation-Based Fractal Texture Analysis (SFTA) to Detect Mass in Mammogram Images

In Indonesia, breast cancer is the most common cancer, with 58,256 cases, or 16.7% of the total 348,809 cancer cases. A system is needed to assist experts in detecting breast cancer in women by analyzing mammogram images. An abnormality in a mammogram appears as a textured region with a particular shape and boundary, usually called a ‘mass.’ Image acquisition is the first step, followed by segmentation using k-means clustering and thresholding. The segmented image then undergoes morphological processing using opening and masking, after which features are extracted with SFTA and classified with a Support Vector Machine (SVM). The system achieved an accuracy of 90%, a precision of 87.5%, a recall of 93.33%, and an F1-score of 90.32%, with the SFTA number of thresholds (nt) set to 3.


INTRODUCTION
Cancer is one of the deadliest diseases globally, characterized by excessive and uncontrolled cell growth (Arafah & Notobroto, 2017) (Larasati & Prawira, 2018) (Supriyanti, 2014). Breast cancer is one example. It usually grows in the lobules, namely the glands that produce milk, and in the ducts (Biswas et al., 2016) (Fajrin et al., 2015). According to the World Health Organization, around 627,000 women died from breast cancer in 2018, approximately 15% of all cancer deaths among women. In Indonesia, breast cancer accounts for the largest share of cancer cases, with 58,256 cases, or 16.7% of 348,809 cancer cases.
Mammography is a specialized screening and radiological examination that uses low-dose X-rays (Nur, 2014) (Listia et al., 2014) to identify abnormalities in the breast, such as cancer (Junita, 2017). The result of mammography is called a mammogram. The presence of a mass indicates an abnormality in the mammogram image. Diagnosing breast cancer from mammogram images requires the skills and experience of a radiologist. Computer-Aided Diagnosis (CAD) is a system that can support the diagnosis and identification of breast cancer (Nababan et al., 2017), and it is expected to help radiologists detect breast cancer. CAD has two stages: segmentation of the mammogram image and detection of breast cancer (Hariraj et al., 2017).
Based on this background, this study designed a system to detect masses in mammogram images using the Segmentation-Based Fractal Texture Analysis (SFTA) method. Several studies have addressed mammogram mass detection. Setiawan & Putra (2019) used k-means for segmentation, the Gray Level Co-occurrence Matrix (GLCM) for texture feature extraction, and SVM for classification, obtaining an accuracy of 80% for classifying mammogram images as normal or abnormal. Suresh et al. (2019) performed segmentation with ARKFCM, hybrid feature extraction with GLCM and Histogram of Oriented Gradients (HOG), and classification with a DNN, obtaining an accuracy of 98.8%.
The weakness of the first study lies in its use of GLCM, which extracted values from the area of the mass and used only four features. Suresh et al. (2019) combined GLCM with HOG to extract an optimal feature set for cancer cells. HOG calculates the gradient orientation and illumination of edges or boundaries. The cancer cell area is called the mass, where there are textural patterns with specific shapes and borders (Junita, 2017). Texture features can be extracted from particular areas and borders of the mass with the SFTA method, which is based on the fractal value of the edges in an image (Öztürk & Akdemir, 2018). SFTA also produces the mean gray level and the area (pixel count) of each region.
An earlier study of the SFTA method was conducted by Costa et al. (2012). It compared three feature extraction methods, namely GLCM, Gabor filters, and SFTA, for image retrieval and image classification. The results show that the SFTA algorithm is simple but effective: SFTA extraction is 3.7 times faster than Gabor filters and 1.6 times faster than GLCM. Öztürk & Akdemir (2018) compared texture feature extraction methods for histopathological image classification and found that SFTA feature extraction combined with SVM classification gave better accuracy than the other combinations, at 94%.
Based on the review of the literature, this study aims to measure the accuracy of the SFTA method in obtaining the fractal dimension, mean, and area features of the mammogram mass, and of the Support Vector Machine (SVM) method in classifying the mammogram mass.

B. Segmentation
The mammogram image entered into the system during image acquisition then undergoes segmentation. The segmentation stage in this study uses k-means clustering and thresholding. K-means classifies n objects with attributes into k groups, where k < n (Amaliah et al., 2018) (Setiawan & Putra, 2019). This research used k = 4 and repeatedly searched for the closest centroid. The steps are as follows.
1. Determine the number of clusters k. This study uses k = 4. The value of k is a positive integer and sets the number of segments generated later.
2. Determine the initial centroid positions from the image pixel values at coordinates f(x, y).
3. Calculate the distance between each centroid and the other objects using the Euclidean distance in Equation (1):
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{1}$$
Where:
d(x, y) = Euclidean distance
n = number of data
x_i = the i-th value of data x
y_i = the i-th value of data y
4. Group the pixels by assigning each image pixel to the nearest of the k clusters. The nearest cluster is the one with the smallest Euclidean distance.
5. Update each centroid by calculating the mean of the pixels assigned to its cluster. Compare the result with the previous centroid; if the centroid has changed, repeat steps 3 and 4 until the centroid is stable.
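The steps above can be sketched in plain Python for one-dimensional gray levels. This is a minimal illustration, not the study's implementation: the function name `kmeans_1d`, the sample pixel values, and the initial centroids are all assumptions chosen so the result is easy to follow.

```python
def kmeans_1d(pixels, centroids, max_iter=100):
    """Cluster gray levels with k-means; returns (centroids, labels)."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Steps 3-4: assign each pixel to the nearest centroid. For a single
        # gray value the Euclidean distance is an absolute difference.
        labels = [min(range(len(centroids)),
                      key=lambda c, p=p: abs(p - centroids[c]))
                  for p in pixels]
        # Step 5: recompute each centroid as the mean of its assigned pixels.
        updated = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:  # centroids are stable, so stop
            break
        centroids = updated
    return centroids, labels

# Hypothetical gray levels and k = 4 initial centroids.
pixels = [0, 5, 10, 80, 90, 150, 160, 240, 250]
centroids, labels = kmeans_1d(pixels, [0, 80, 150, 240])
print(centroids)  # [5.0, 85.0, 155.0, 245.0]
print(labels)     # [0, 0, 0, 1, 1, 2, 2, 3, 3]
```

On a real mammogram the same loop would run over every pixel of the image, and the label map would give the four segments.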
The k-means clustering results are then passed to the thresholding stage with a value of T = 158. Each pixel whose value is greater than or equal to the threshold becomes 255, while each pixel below the threshold becomes 0.
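The thresholding rule can be written in a few lines. This is a sketch on a nested-list image; the function name `threshold` and the sample values are illustrative assumptions, while T = 158 is the value stated in the text.

```python
def threshold(image, t=158):
    """Binarize: pixels >= t become 255, all others become 0."""
    return [[255 if pixel >= t else 0 for pixel in row] for row in image]

# Hypothetical 2 x 3 gray-level image.
sample = [[12, 157, 158],
          [200, 90, 255]]
binary = threshold(sample)
print(binary)  # [[0, 0, 255], [255, 0, 255]]
```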

C. Morphology
The thresholding results are used as input to the morphological process. A morphological operation takes two arrays: the first is the input image to be processed, and the second is the structuring element (SE), or strel (Luthfi et al., 2019). The morphological operations used in digital image processing are dilation, erosion, opening, and closing. The morphology used in this study is opening.
Opening consists of an erosion step followed by a dilation step. The predefined strel is matched against each pixel of the binary input image, with its center point placed on that pixel. The strel used in this study is a 15 x 15 square.
1. Erosion thins or shrinks the edges of an object by changing a pixel to 0 (background) when the strel centered on that pixel extends outside the object.
2. The erosion result then goes through dilation, which is the opposite of erosion: it thickens or enlarges the edges of the object. A pixel overlapped by the strel is set to 255 (white) at the center point.
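The two steps can be sketched in plain Python for a square structuring element. This is a minimal sketch under stated assumptions: the function names are illustrative, the element size is assumed to be odd, and positions outside the image are treated as background, which is one of several possible border conventions and is not specified in the text. The example uses a 3 x 3 element so the effect is visible on a small image; the study uses 15 x 15.

```python
def erode(image, size):
    """Binary erosion with a size x size square element (size assumed odd).
    A pixel stays 255 only if every pixel under the element is 255;
    positions outside the image count as background (an assumption)."""
    h, w, r = len(image), len(image[0]), size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if all(0 <= y + dy < h and 0 <= x + dx < w
                   and image[y + dy][x + dx] == 255
                   for dy in range(-r, r + 1) for dx in range(-r, r + 1)):
                out[y][x] = 255
    return out

def dilate(image, size):
    """Binary dilation: a pixel becomes 255 if any pixel under the element is 255."""
    h, w, r = len(image), len(image[0]), size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if any(0 <= y + dy < h and 0 <= x + dx < w
                   and image[y + dy][x + dx] == 255
                   for dy in range(-r, r + 1) for dx in range(-r, r + 1)):
                out[y][x] = 255
    return out

def opening(image, size=15):
    """Opening = erosion followed by dilation with the same element."""
    return dilate(erode(image, size), size)

# Hypothetical image: a 3 x 3 object plus one isolated noise pixel at the right.
sample = [[0,   0,   0,   0, 0,   0],
          [0, 255, 255, 255, 0,   0],
          [0, 255, 255, 255, 0, 255],
          [0, 255, 255, 255, 0,   0],
          [0,   0,   0,   0, 0,   0]]
opened = opening(sample, size=3)
```

After opening, the 3 x 3 object is unchanged and the isolated pixel is gone, which is the small-object removal that this stage relies on.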
Masking is applied to the result of the opening process to restore the pixel values of the original acquired image. The flowchart of the morphology stage can be seen in Figure 4.

D. Feature Extraction
The results of the morphological stage then go through the texture feature extraction process. GLCM is one example of a texture-based feature extraction method, but it has the disadvantages of high error rates, long execution times, and low classification accuracy (Edwin et al., 2017). This study uses the SFTA method for fractal feature extraction (Ergen & Baykara, 2014). The method is divided into two parts. The first is Two-Threshold Binary Decomposition (TTBD), in which the grayscale image is converted into a set of binary images. The second is the feature extraction stage, which obtains the fractal dimension, mean, and area of each binary image. In the first stage of the TTBD algorithm, the threshold values T are calculated using multilevel Otsu thresholding (Usha & Perumal, 2019). Each threshold is chosen to give the smallest within-class variance (Paramkusham et al., 2018). The Otsu algorithm is applied repeatedly until the number of thresholds reaches the nt (number of thresholds) value; the binary images are then obtained using Equation (2) and Equation (3).
The number of thresholds, nt, is a user-defined parameter; this study uses nt = 3. Equation (2) produces the first set of binary images, and Equation (3) produces the second set.
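Equations (2) and (3) did not survive in this text. As a hedged reconstruction, the two sets can be written in the form commonly used for SFTA (Costa et al., 2012). The symbols are assumptions: I is the grayscale image, I_b a binary image, t a single threshold from the set T, t_l and t_u a pair of contiguous thresholds, and n_l the maximum gray level.

```latex
% Equation (2), first set: one binary image per threshold t in T
I_b(x, y) =
\begin{cases}
1 & \text{if } I(x, y) > t \\
0 & \text{otherwise}
\end{cases}

% Equation (3), second set: one binary image per pair of contiguous
% thresholds t_l < t_u taken from T \cup \{ n_l \}
I_b(x, y) =
\begin{cases}
1 & \text{if } t_l < I(x, y) \le t_u \\
0 & \text{otherwise}
\end{cases}
```

Under this reading, nt thresholds give nt images in each set, or 2 nt binary images in total.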
The boundary of each binary image, denoted ∆, is used to find the fractal dimension value. This value is obtained using box counting (El-henawy et al., 2016): the image is covered with a grid of boxes, the boxes that cover part of the boundary are counted, and the process is repeated with different box sizes until a pattern structure is formed (Shanmugavadivu et al., 2017).
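Box counting can be sketched as follows. This is an illustrative version, not the study's code: the function names, the box sizes (1, 2, 4, 8), and the least-squares fit of log N against log(1/size) are assumptions, and returning 0.0 for an empty boundary is a convention chosen here.

```python
import math

def box_count(border, box):
    """Number of box x box grid cells that contain at least one border pixel."""
    h, w = len(border), len(border[0])
    return sum(
        1
        for top in range(0, h, box)
        for left in range(0, w, box)
        if any(border[y][x]
               for y in range(top, min(top + box, h))
               for x in range(left, min(left + box, w)))
    )

def fractal_dimension(border, boxes=(1, 2, 4, 8)):
    """Slope of log N(box) against log(1 / box), fitted by least squares."""
    if not any(any(row) for row in border):
        return 0.0  # no border pixels at all (convention assumed here)
    xs = [math.log(1.0 / b) for b in boxes]
    ys = [math.log(box_count(border, b)) for b in boxes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Sanity checks on shapes with a known dimension (16 x 16 test images).
filled = [[1] * 16 for _ in range(16)]                   # a plane: about 2
line = [[1] * 16] + [[0] * 16 for _ in range(15)]        # a line: about 1
print(fractal_dimension(filled), fractal_dimension(line))
```

A mass boundary would be expected to fall between these two extremes, with more irregular borders giving larger values.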

E. Classification
The classification process takes the features extracted from the mammogram and compares the test data with the characteristics of the training data features. First, a model is built from the training data using a Support Vector Machine (SVM), a machine learning method for classification or regression that finds the best hyperplane separating two classes (Setiawan & Putra, 2019). The test data features are then compared with this model. The result determines the class of the tested mammogram image: normal for images that fall in the negative area, or abnormal for those in the positive area.
Figure 5. Flowchart of Feature Extraction
Figure 6. Flowchart of Classification
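The decision rule described in this section, where the side of the hyperplane determines the class, can be illustrated for a linear SVM that has already been trained. This is only a sketch: the weights, bias, and feature vectors below are hypothetical numbers, not values from the study, and the text does not state which kernel was used.

```python
def decision_value(features, weights, bias):
    """Signed score w . x + b for one feature vector."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def classify(features, weights, bias):
    """Positive side of the hyperplane -> abnormal, otherwise normal."""
    return "abnormal" if decision_value(features, weights, bias) > 0 else "normal"

# Hypothetical hyperplane and (fractal dimension, mean, area) feature vectors.
weights, bias = [2.0, 0.01, 0.001], -4.0
print(classify([1.2, 40.0, 300.0], weights, bias))  # normal
print(classify([1.8, 90.0, 900.0], weights, bias))  # abnormal
```

Training consists of finding the weights and bias that separate the two classes with the widest margin; the rule above is what is applied to each test image afterwards.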

RESULT AND DISCUSSION
This section describes the results of each stage of the system and the accuracy obtained by applying the SFTA method. The test results were evaluated by calculating accuracy with Equation (5), precision with Equation (6), recall with Equation (7), and F1-score with Equation (8).
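Equations (5) to (8) are not reproduced in this text, so the sketch below assumes the standard confusion-matrix definitions. The counts in the example are not reported in the study; they are one hypothetical set, assuming 30 test images, that reproduces the system test figures of 90%, 87.5%, 93.33%, and 90.32%.

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts: 14 true positives, 13 true negatives,
# 2 false positives, 1 false negative (30 images in total).
accuracy, precision, recall, f1 = metrics(tp=14, tn=13, fp=2, fn=1)
print([round(100 * v, 2) for v in (accuracy, precision, recall, f1)])
# [90.0, 87.5, 93.33, 90.32]
```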

Labeling
Labeling is the first stage of this research and prepares the data for training. In this stage, the mass in each abnormal image is marked using its pixel coordinates (x, y) and radius, both of which are obtained from the dataset. The results of the labeling stage are shown in Figure 7. In the next step, texture features are extracted from each labeled image using SFTA and stored in a .csv file as feature vectors containing the fractal dimension, mean, and area values.
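Marking a mass from a center and radius can be sketched as building a circular mask. This is an illustration only: the function name is hypothetical, and the coordinate origin is assumed to be the top-left corner, which may differ from the convention of the dataset used in the study.

```python
def circular_mask(height, width, cx, cy, radius):
    """255 inside the circle centred on (cx, cy), 0 elsewhere.
    Assumes x counts columns and y counts rows from the top-left corner."""
    return [[255 if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 else 0
             for x in range(width)]
            for y in range(height)]

# Hypothetical 5 x 5 image with a mass of radius 1 centred at (2, 2).
mask = circular_mask(5, 5, cx=2, cy=2, radius=1)
for row in mask:
    print(row)
```

The mask selects the labeled region whose pixels are then passed to feature extraction.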

Segmentation
Segmentation aims to separate the suspected mass from the background using k-means clustering followed by thresholding. The k-means clustering stage in this study uses k = 4, and thresholding uses T = 158. Both are the optimal values found in testing, as shown in Table 1 for k and Table 2 for T.
The segmentation test for the k value (number of clusters) in Table 1 starts from k = 3, because with k = 1 the image is only a black background, while with k = 2 it contains only the background and the outer part of the breast. The optimal value, k = 4, classifies the mass with an accuracy of 73.33%, the highest among the k values tested.
The next segmentation step is thresholding. Testing of the T value starts from T = 138 because the mass in a mammogram image is bright, meaning that its pixel intensities are relatively high. Based on the test in Table 2, T = 158 is optimal, with an accuracy of 73.33%, the highest among the T values tested.

Morphology
The morphology at this stage aims to remove small groups of pixels using opening. This study uses a 15 x 15 square structuring element. A masking process then follows the morphology to restore the pixel values of the image. The results of the morphology and masking are shown in Figure 9.
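The masking step can be sketched as copying the original gray levels wherever the opened binary image is white. The function name and sample values are illustrative assumptions.

```python
def apply_mask(original, mask):
    """Keep original gray levels where the mask is 255, set the rest to 0."""
    return [[pixel if m == 255 else 0 for pixel, m in zip(orow, mrow)]
            for orow, mrow in zip(original, mask)]

# Hypothetical original image and the binary result of opening.
original = [[30, 180, 200],
            [25, 190, 40]]
mask = [[0, 255, 255],
        [0, 255, 0]]
masked = apply_mask(original, mask)
print(masked)  # [[0, 180, 200], [0, 190, 0]]
```

The masked image keeps the texture of the candidate mass, which is what the fractal, mean, and area features are computed from.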

SFTA Feature Extraction
The SFTA method in this study is used to extract the texture features of the mammogram mass, producing feature vectors that contain the fractal dimension, mean, and area values. This research used nt = 3 as the input parameter to the SFTA method. Table 3 shows that nt = 3 is the optimal value, with an accuracy of 90%, supported by the optimal values of the segmentation process. The value nt = 3 produces six different binary images. Each binary image has three features, namely the fractal dimension, mean, and area, so nt = 3 yields 18 features in total. Producing a larger number of features with SFTA did not raise the accuracy further.
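The count of six binary images for nt = 3 can be checked with a small decomposition sketch. It assumes the usual SFTA scheme, in which each threshold gives one image and each pair of contiguous thresholds gives another, with the maximum gray level closing the last pair. The thresholds here are hypothetical; in the method they would come from multilevel Otsu thresholding.

```python
def ttbd(image, thresholds, max_gray=255):
    """Two-Threshold Binary Decomposition: returns 2 * nt binary images."""
    ts = sorted(thresholds)
    # First set: one image per threshold (pixels above t).
    bands = [(t, max_gray) for t in ts]
    # Second set: one image per pair of contiguous thresholds, with the
    # maximum gray level closing the last pair.
    bands += list(zip(ts, ts[1:] + [max_gray]))
    return [[[1 if lo < p <= hi else 0 for p in row] for row in image]
            for lo, hi in bands]

def mean_and_area(image, binary):
    """Mean gray level and pixel count of the region selected by `binary`."""
    values = [p for row, brow in zip(image, binary)
              for p, b in zip(row, brow) if b]
    area = len(values)
    return (sum(values) / area if area else 0.0), area

# Hypothetical 2 x 2 image and nt = 3 thresholds.
image = [[10, 70],
         [130, 200]]
binaries = ttbd(image, [60, 120, 180])
print(len(binaries), len(binaries) * 3)    # 6 18
print(mean_and_area(image, binaries[3]))   # (70.0, 1)
```

Each binary image contributes a fractal dimension, a mean, and an area, which gives the 18-value feature vector described above.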
Extracting features from 215 normal class training images and 185 abnormal class training images using the SFTA method with the optimal value nt = 3 gives the ranges of normal and abnormal values for the fractal dimension (D), mean (v), and area (A) shown in Table 4.

Training
At this stage, the training data are validated using the texture features extracted by the SFTA method as input parameters. The validation set provides an evaluation of the model built from the training data before it is applied to the test data. The training data cover the fractal dimension, mean, and area features extracted using SFTA.
The number of features generated depends on the nt parameter used. The validation results are obtained by dividing the data into 70% training data and 30% testing data. Table 5 shows the highest accuracy, precision, recall, and F1-score: 99.17%, 99.28%, 99%, and 99.13%, respectively.
The system testing results are shown in Table 6, using the optimal values k = 4, T = 158, and nt = 3. The images tested were different from those used as training data. Classification errors occurred in images 17, 19, and 21. The error on image 17 occurred because segmentation could not find the mass, so during extraction the fractal dimension, mean, and area values were all zero. The errors on images 19 and 21 occurred during classification: the feature values extracted from these images fell within the abnormal class. To overcome these misclassifications, the authors added a new segmentation process; that is, a preprocessing stage was carried out first and new training data were added so that the fractal dimension, mean, and area values cover the entire abnormal class.
The system test using the SFTA method gives an accuracy, precision, recall, and F1-score of 90%, 87.5%, 93.33%, and 90.32%, respectively, with nt = 3. Based on the validation set results and the performance testing of the mass detection system on mammogram images, the authors conclude that neither underfitting nor overfitting occurred, because the resulting accuracies did not differ greatly, with the test accuracy at 90%.
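The 70/30 division used for validation can be sketched with a seeded shuffle. This is an assumption-laden illustration: the function name and seed are hypothetical, and the text does not say whether the split was random or stratified by class. If it was applied to the 400 labeled images (215 normal and 185 abnormal), it gives 280 and 120 images, and the reported 99.17% accuracy would correspond to 119 of 120 correct.

```python
import random

def split_70_30(samples, seed=0):
    """Shuffle a copy of `samples` and split it into 70% and 30% parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]

# 400 placeholder sample indices standing in for the labeled images.
train, validation = split_70_30(range(400))
print(len(train), len(validation))  # 280 120
```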

CONCLUSION
Based on performance testing of the system using the Segmentation-Based Fractal Texture Analysis (SFTA) method to detect cancer cells, the system can classify mammogram images with an accuracy of 90%, a precision of 87.5%, a recall of 93.33%, and an F1-score of 90.32%. The best result in this research was reached with k = 4 in the k-means clustering process, a threshold of T = 158, and SFTA feature extraction with nt = 3.