Clustering Productive Palm Land using the K-Means Clustering Algorithm

medium productive plantation group (C1), and 7 blocks or 44% including the small productive plantation group (C0).


INTRODUCTION
Indonesia is a tropical country with many plantations, including oil palm plantations. Oil palm plantations in Indonesia are managed by both the government and private companies [1]. According to the Central Statistics Agency (BPS), the area of oil palm plantations in Indonesia was estimated at 14.62 million hectares in 2021, a slight increase of 0.24% over the previous year's 14.59 million hectares. The data show that the area of Indonesian oil palm plantations has increased steadily, particularly over the last decade, with a rise of 60.11% between 2011 and 2021 [2]. East Kotawaringin is the district with the largest area of oil palm plantations in Indonesia. According to data from the Ministry of Agriculture, the total area of oil palm land in various districts of Central Kalimantan Province is approximately 425,000 hectares (Ha), comprising 406,000 hectares of national and foreign private plantation land and 19,000 hectares of community plantations. The total area of oil palm in Central Kalimantan is 1.48 million hectares, the second largest on the island of Kalimantan after West Kalimantan with 1.5 million hectares [3].
CV. Alkema Deo is a company that manages oil palm plantations in Sampit City, located in the East Kotawaringin Regency of Central Kalimantan. Established in 2016, the company has two plantation sites, situated on Jl. General Sudirman Km. 18, East Kotawaringin, and in Seibabi Village, Telawang District, East Kotawaringin. CV. Alkema Deo operates two divisions of oil palm land, which are further divided into 16 blocks covering a total area of approximately 33 hectares. Each block has an average size of 2-3 hectares and contains around 272 productive palm trees. Each block can produce roughly 3.2 tons of palm fruit per hectare each month. The oil palm land belonging to CV. Alkema Deo is organized by area, block, and productivity. This grouping is designed to enable daily production control and land management so that yields can be monitored effectively. This is critical to ensuring that palm oil production runs smoothly, while also maintaining the land and detecting any errors or declining production in each block. Cluster analysis is a technique used to group objects that share similar characteristics and properties. The objective is to create homogeneous groups whose members share common traits [4].
JINITA Vol. 5, No. 2, December 2023, DOI: doi.org/10.35970/jinita.v5i2.2051
The K-Means algorithm groups a data set into several clusters. It is a straightforward and adaptable algorithm that is widely used, and K-Means clustering is a significant part of data mining. The clustering process depends on the data, and conclusions are drawn from it. Clustering in data mining is highly beneficial for identifying distribution patterns and analyzing data [5]. Data mining is the process of uncovering knowledge from databases. It draws on statistical analysis, mathematics, artificial intelligence, and machine learning to extract useful information and related insights from massive databases [6].
Based on the above, information about productive oil palm land at CV. Alkema Deo is needed, and the K-Means clustering algorithm can be used to obtain it. Pulungan, I. M., Saifullah, S., Fauzan, M., and Windarto, A. P. studied how to determine the most productive oil palm blocks using the K-Means clustering algorithm, grouping oil palm plant blocks based on data [7]. Other previous studies applied the K-Means clustering algorithm with different criteria. Research by Hajar, S., Novany, AA, Windarto, AP, Wanto, A., and Irawan, E. (2020), entitled "Application of K-Means Clustering on palm oil exports by destination country", shows that palm oil exports can be grouped using the K-Means clustering algorithm [8]. Research by Nofiar, Andri, and Sarjon Defit; Tambunan, HS (2018), entitled "Determining Palm Oil Quality Using the K-Means Clustering Method", shows that K-Means clustering can be used to process data with data mining concepts when grouping data based on criteria [9]. Research by Nuraisana, Nuraisana (2019), entitled "Clustering analysis to determine the level of oil palm potential based on crop area using the K-means algorithm at the North Sumatra statistical agency", shows that the K-Means algorithm can assess the prospective capacity of oil palm plantations according to the area of cultivated land in North Sumatra province [10]. Research by Febiola, Yessica Inggir, Imam Cholissodin, and Candra Dewi (2019), entitled "Forecasting Palm Oil Yields Using the Multifactor High Order Fuzzy Time Series Method Optimized with K-Means Clustering (Case Study: PT. Sandabi Indah Lestari, Bengkulu City)", used several factors consisting of monthly oil palm yield, land area, plant age, and amount of palm oil staples [11]. Research by Sari, KAMI, Muslimin, M., Franz, A., and Sugiartawan, P. (2022), entitled "Detection of Maturity Levels of Fresh Palm Fruit Bunch using the K-Means Algorithm", describes K-Means clustering as a method of grouping objects based on their proximity to the average value of the cluster center for a given number of clusters, k [12]. Research by Pratama, Faiq Husain, Agung Triayudi, and Eri Mardiani (2022), entitled "K-medoids and k-means data mining for the classification of palm oil production potential in Indonesia", groups plantations on the basis of similarities in area, production, and productivity [13]. Building on this previous research, the present study classifies productive oil palm land at CV. Alkema Deo using data mining methods with 4 variables, namely land area, number of stems, average production yield, and percentage of achievement.
Therefore, this research aims to analyze and identify groupings of highly productive, medium productive, and small productive oil palm land using the K-Means clustering algorithm. It takes into account the variables of land area, number of rows, average production yield, and percentage of achievement of the specified targets as the basis for grouping. The benefit of this research is that the company can learn how its oil palm land is grouped by productivity level, which can help in land management and in making decisions that are expected to increase oil palm production.

RESEARCH METHOD
This study applied a qualitative approach using a descriptive research pattern. The secondary data used in this study are plantation production data by area at CV. Alkema Deo from January to June 2022. The object of research is the area of CV. Alkema Deo, located at Jendral Sudirman Km. 18 and in the village of Seibabi, Telawang District, East Kotawaringin Regency. The area studied is 33 hectares (ha), consisting of 2 divisions totaling 16 blocks of productive oil palm land. Based on the research object, the population is all the oil palm land at CV. Alkema Deo. The research sample is the productive oil palm land described by division, block, area, empty fruit bunch, average yield, and yield percentage.

Data Collection Methods
The 2022 production data are processed by data mining in the following steps: data cleaning, data integration, data selection, data transformation, and presentation of the knowledge obtained from the data mining processing. Calculations were performed using Excel software [14]. The grouping of oil palm land at CV. Alkema Deo uses the secondary data mentioned above. All steps of the data mining processing can be done in Excel, whose features and functions can be used to clean, integrate, select, and transform the data and to present the knowledge obtained.

Data Processing Process
The 2022 production data are then processed using the data mining method in the following stages: data cleaning, data integration, data selection, data transformation, data mining, and representation of the knowledge obtained from the data mining processing [15]. Each stage of data processing has its own procedure. The computation is carried out using Excel and RapidMiner software to simplify data processing. Segmentation with the K-Means clustering technique in data mining involves several stages, and the procedure for each stage can be carried out as shown in Figure 2. The calculation results from this technique produce a segmentation of the oil palm plantations at CV. Alkema Deo into several groups based on similar results in the 2022 production data [16]. The analysis step is the last in the plantation segmentation process. After the data are processed using the K-Means clustering technique, they are analyzed further to find out the details of the oil palm plantation segments in the various areas of CV. Alkema Deo, based on the similarity of area (Ha), the number of shoots produced, the production yield against the specified target (%), and the average production yield per hectare (kg) [16].

Research Results
Based on the author's research at CV. Alkema Deo, the company uses Excel quite well for processing data, but CV. Alkema Deo does not yet group its land by productivity level. This makes it difficult to see the level achieved in 6 months against the set target, and to carry out daily production control by area and block.

Data and Information Collection
Data obtained from CV. Alkema Deo are grouped by area, block, and productivity. This grouping aims to provide daily production control and land control to monitor yields, which play an important role in ensuring that palm oil production continues to run well.

Data Cleanup
The data in Table 2 are the 2022 production data that the author obtained from CV. Alkema Deo. They contain no duplicated or inconsistent data, so data cleaning is not necessary.

Data Integration
The data that the author uses in the data mining process come from one data source, namely the 2022 production data. The 2022 production data can be seen in Table 2, and the integrated data can be seen in Table 3.

Data Selection
The cleaned data are then screened again, and the appropriate data are selected for grouping the productive oil palm land of CV. Alkema Deo. The data selection can be seen in Table 4.

Data Transformation
Data transformation resulted in 16 data records to be processed using the K-Means clustering technique. Before this technique is run, each variable is assigned a special attribute to facilitate processing of the selected data, which are converted to the required format in the form of:
a. Average production yield =

Clustering K-means
Based on the data processed in the early stages of data mining, land area, number of stems, average yield, and percentage of achievement are the variables used in the clustering calculations. The steps for completing K-Means are as follows:
1. The number of clusters formed is 3, where cluster 2 (C2) is highly productive, cluster 1 (C1) is productive, and cluster 0 (C0) is low productive.
2. As the starting cluster centers (centroids), the author uses values from the January 2022 production data, taking the maximum value for C0, the average value for C1, and the minimum value for C2. The initial centroids for iteration 1 are shown in Table 7. The optimal number of clusters is determined by the maximum average silhouette value S(i); that is, the number of K-Means clusters is the value that maximizes the average silhouette validity index S(i). The initial centroids are:
Cluster 0 (C0): Area = 2, empty fruit bunch = 578, average yield = 1075, percentage of achievement = 77.06.
Cluster 1 (C1): Area = 3, empty fruit bunch = 810, average yield = 1086.66, percentage of achievement = 84.12.
Cluster 2 (C2): Area = 2, empty fruit bunch = 317, average yield = 675, percentage of achievement = 72.97.
From the calculation, the distance between the production data of block 1A and cluster 2 is 642.3693.
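The iteration 1 distance quoted above can be reproduced with a short sketch. It assumes that block 1A has the values (3, 810, 1086.66, 84.12), which is how the block appears in the worked Euclidean distances for later iterations; the function and variable names are illustrative and not part of the study.

```python
import math

# Block 1A as it appears in the worked distances (assumed values):
# area, second variable, average yield, percentage of achievement.
block_1a = (3, 810, 1086.66, 84.12)

# Initial centroids for iteration 1, copied from the text.
initial_centroids = {
    "C0": (2, 578, 1075, 77.06),
    "C1": (3, 810, 1086.66, 84.12),
    "C2": (2, 317, 675, 72.97),
}


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Distance from block 1A to every initial centroid.
distances = {name: euclidean(block_1a, c) for name, c in initial_centroids.items()}

# The block is assigned to the centroid with the smallest distance.
nearest = min(distances, key=distances.get)

for name, d in distances.items():
    print(f"1A{name} = {d:.4f}")
print("nearest cluster:", nearest)
```

Because no variable is rescaled, the two variables with the largest numeric ranges contribute almost all of the squared distance, while the area term contributes only 1.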

Data source: Processed 2023 data
In Table 8, the iteration 1 calculations give the grouping C0 = 6 (38%), C1 = 1 (6%), and C2 = 9 (56%). The iteration process continues until the cluster membership is the same as in the previous iteration; when the membership of the last iteration matches that of the previous one, the K-Means process stops. Otherwise, if the resulting grouping changes, each data point is reassigned using the new centroids.
5. In testing this sample, the iteration process is carried out 3 times; because the cluster membership from the 2nd and 3rd iterations is the same, the iteration process stops. The new centroids for iteration 2 are shown in Table 9:
Cluster 0 (C0): Area = 2, empty fruit bunch = 542.5, average yield = 987.5, percentage of achievement = 79.96167.
Cluster 1 (C1): Area = 3, empty fruit bunch = 810, average yield = 1086.66, percentage of achievement = 84.12.
Cluster 2 (C2): Area = 2, empty fruit bunch = 420.6667, average yield = 677.2222, percentage of achievement = 72.93889.
After all points have been assigned to the nearest cluster, the calculation is repeated with the centroids of iteration 2.
6. Calculation of iteration 2, finding the shortest distance using the Euclidean distance formula:
$$\text{1AC0} = \sqrt{(3 - 2)^2 + (810 - 542.5)^2 + (1086.66 - 987.5)^2 + (84.12 - 79.96167)^2} = 285.3196$$
The distance between the production data of block 1A and cluster 0 is therefore 285.3196.
$$\text{1AC1} = \sqrt{(3 - 3)^2 + (810 - 810)^2 + (1086.66 - 1086.66)^2 + (84.12 - 84.12)^2} = 0$$
The distance between the production data of block 1A and cluster 1 is therefore 0. The distance between the production data of block 1A and cluster 2 is 565.1069.
7. The results of the iteration 2 calculations can be seen in Table 10. Table 11 gives the centroids for iteration 3:
Cluster 0 (C0): Area = 2, empty fruit bunch = 530.5714, average yield = 967.8571, percentage of achievement = 80.68143.
Cluster 1 (C1): Area = 3, empty fruit bunch = 810, average yield = 1086.66, percentage of achievement = 84.12.
Cluster 2 (C2): Area = 2, empty fruit bunch = 415.875, average yield = 655.625, percentage of achievement = 71.43125.
After all points have been assigned to the nearest cluster, the calculation is repeated with the centroids of iteration 3. From the calculation, the distance between the production data of block 1A and cluster 1 is 0.
$$\text{1AC2} = \sqrt{(3 - 2)^2 + (810 - 415.875)^2 + (1086.66 - 655.625)^2 + (84.12 - 71.43125)^2} = 584.1983$$
The distance between the production data of block 1A and cluster 2 is therefore 584.1983.
9. After the centroids are updated, the next step is to repeat the iteration with the new centroids, using the same calculations as in iteration 1. The iteration 3 cluster results are then obtained as in Table 12 and Table 13.
The data mining technique used in this segmentation is K-Means clustering, which divides the oil palm plantations into several segments based on four variables: area, length, average yield, and percentage of achievement. Segmentation is done using RapidMiner software. The first step is to build the K-Means process design by importing the data in Table 13. The number of clusters formed is 3, where cluster 0 (C0) is low productive, cluster 1 (C1) is productive, and cluster 2 (C2) is highly productive. The starting centroid values are obtained randomly and automatically, with a numerical measure type.
The results of the K-Means clustering process from RapidMiner with three segments are shown in Figure 4. The segmentation shows that the first segment, cluster 0, consists of 7 blocks; the second segment, cluster 1, consists of 1 block; and the third segment, cluster 2, consists of 8 blocks.
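The stopping rule used in the manual calculation (repeat the assignment and centroid update until cluster membership no longer changes) can be sketched as below. This is a minimal illustration and not the RapidMiner process: the five blocks and their values are hypothetical, the initial centroids are simply three of those hypothetical blocks, and all function names are assumptions.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for each point."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def update(points, labels, centroids):
    """New centroid = mean of the members; an empty cluster keeps its old centroid."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Iterate until the membership equals that of the previous iteration."""
    labels = None
    for iteration in range(1, max_iter + 1):
        new_labels = assign(points, centroids)
        if new_labels == labels:  # membership unchanged: stop
            return labels, centroids, iteration
        labels = new_labels
        centroids = update(points, labels, centroids)
    return labels, centroids, max_iter


# Hypothetical blocks: (area, stems, average yield, percentage of achievement).
blocks = [
    (3, 800, 1100, 85),
    (2, 560, 1000, 80),
    (2, 540, 950, 78),
    (2, 420, 680, 73),
    (2, 400, 650, 70),
]
# Hypothetical starting centroids for C0, C1, C2.
start = [blocks[1], blocks[0], blocks[4]]

labels, centroids, iterations = kmeans(blocks, start)
print(labels, iterations)
```

With these hypothetical values the membership found in the first pass is confirmed in the second, so the loop stops there, in the same way that the study stops when iterations 2 and 3 produce the same members.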

K-Means Clustering Representation
The calculation results show that the content of each oil palm plantation segment at CV. Alkema Deo is different. The representation of the K-Means groupings can be seen in Figure 5.

Data Analysis
Cluster 0 is a plantation group with low yield potential, consisting of blocks 1B, 1C, 1G, 1H, 1I, 2B, and 2F. In cluster 0, the total area lies in the range of 2 Ha, the length in the range of 530.571 to 459, the average production yield in the range of 967.857 Kg/Ha to 850 Kg/Ha, and the percentage of achievement in the range of 80.681 to 71.94.
Cluster 1 is a plantation group with moderate yield potential, consisting of block 1A. In cluster 1, the total area is 3 Ha, the length is 810, the average production yield is 1086.660 Kg/Ha, and the percentage of achievement is 84.120.
Cluster 2 is a plantation group with high yield potential, consisting of blocks 1D, 1E, 1F, 2A, 2C, 2D, 2E, and 2G. In cluster 2, the total area lies in the range of 2 Ha, the length in the range of 415.875 to 280, the average production yield in the range of 655.625 Kg/Ha to 550 Kg/Ha, and the percentage of achievement in the range of 71.431 to 92.43.
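The share of blocks in each cluster follows directly from the memberships listed above. The short sketch below recomputes those shares as rounded percentages of the 16 blocks; the dictionary layout and variable names are illustrative only.

```python
# Cluster memberships as listed in the data analysis.
clusters = {
    "C0": ["1B", "1C", "1G", "1H", "1I", "2B", "2F"],
    "C1": ["1A"],
    "C2": ["1D", "1E", "1F", "2A", "2C", "2D", "2E", "2G"],
}

# Total number of blocks across all clusters.
total = sum(len(blocks) for blocks in clusters.values())

# Share of each cluster, rounded to a whole percentage.
shares = {name: round(100 * len(blocks) / total) for name, blocks in clusters.items()}

for name, blocks in clusters.items():
    print(f"{name}: {len(blocks)} blocks ({shares[name]}%)")
```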

The data mining processing steps are as follows:
1. Data Cleaning:
a. Identify and treat missing or blank values in data such as area, empty fruit bunch, average yield, and percentage of achievement.
b. Identify and handle invalid or inconsistent data, for example data that exceed logical limits or differ from the specified format.
c. Eliminate data duplication, if any.
2. Data Integration:
a. Combine data from the various relevant sources, such as data on area, empty fruit bunch, average yield, and percentage of achievement, into one integrated dataset.
b. Ensure uniformity and consistency of data formats between columns.
3. Data Selection:
a. Select the most relevant and significant variables or attributes for grouping oil palm land.
b. Remove variables that are irrelevant or have little effect on the grouping.
4. Data Transformation:
a. Perform data transformations when necessary, such as normalization or scaling, to ensure uniformity and consistency of the data in the dataset.
b. Apply appropriate methods or techniques to change data formats or representations where necessary.
5. Knowledge Presentation:
a. Use appropriate data mining techniques, such as clustering algorithms, to classify oil palm areas based on the existing data.
b. Analyze the grouping results and present them in informative forms, such as tables, graphs, or data visualizations.
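Step 4a names normalization or scaling as an optional transformation. As an illustration only, a min-max scaling of one column could look like the sketch below. The worked distances in this study use unscaled values, so this is an assumption about what such a step might look like rather than a description of what was done; the function name is hypothetical and the sample column reuses the three initial centroid values of the second variable.

```python
def min_max_scale(column):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Illustrative column: second-variable values of the three initial centroids.
stems = [810, 578, 317]
scaled = min_max_scale(stems)
print(scaled)
```

Scaling every variable to a common range would keep variables with large numeric ranges from dominating the Euclidean distance.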

Figure 1. Data processing with K-means
Figure 2. Data used in the K-means data mining process

The distance is computed with numerical measures of distance. The centroid value in each cluster is based on area, length, average production yield, and percentage of achievement, as shown in the centroid table in Figure 3.

Figure 3. Data in the centroid table from RapidMiner

Figure 4. Productive Segmentation Results for January Using K-Means Clustering from RapidMiner

In Figure 5, cluster 0 is displayed in green and is the group of plantations with low oil palm yields. Cluster 1, shown in blue, is the group with moderate oil palm yields. Cluster 2, shown in orange, is the group of plantations with high oil palm yields.

Figure 5. Representation of K-Means Clustering from RapidMiner

Monitoring also covers maintaining the land and checking for errors or decreased production in each block according to the 2022 production data. The oil palm plantation data used to assess the potential comprise 16 records, shown in Table 2. Production data for 2022 consist of block, Op, area (Ha), principal amount, empty fruit bunch, production, and production distribution.
Table 2. Production data for 2022

Table 3. Data Integration

Table 4. Data Selection

Table 5. Data Transformation

Table 10. Grouping in Iteration 2
In Table 10, the iteration 2 calculations give the grouping C0 = 7 (44%), C1 = 1 (6%), and C2 = 8 (50%). The iteration process continues until the cluster membership is the same as in the previous iteration.