Prediction of citations increase in scopus documents using the naïve bayes algorithm / Nadia Arianesya

Arianesya, Nadia (2025) Prediction of citations increase in scopus documents using the naïve bayes algorithm / Nadia Arianesya. Diploma thesis, Universitas Negeri Malang.

Full text not available from this repository.

Abstract

Many researchers around the world now work on scientific projects and write research articles. As a result, many scientific articles are published every day, with varying quality and scientific impact. The potential impact of academic work is one of the considerations researchers weigh when improving their research. Citations are one measure of the quality of scientific work. Apart from serving as an evaluation metric for scientific work, citations are also often used to estimate future trends and to help allocate research funds effectively. The research method comprises four stages: data collection, pre-processing, process configuration, and evaluation. Data were collected by exporting records from the Scopus database for publications dated 1 January 2021 to 30 July 2024. Pre-processing then consisted of attribute selection and labeling using the K-Means algorithm. In process configuration, the naïve Bayes algorithm was trained with cross-validation and evaluated by calculating accuracy, precision, recall, and F1-score. This research shows that the naïve Bayes algorithm was successfully applied to predict the increase in the number of citations of Scopus documents using attributes including document type, open access, number of foreign affiliates, number of authors, number of foreign-affiliated authors, and labels obtained from the clustering process. Once the data had been prepared through the clustering step, predictions were made with the naïve Bayes algorithm under cross-validation. The initial model showed a high accuracy of 96.57%, with excellent performance on the “low” class (F1-score 98.25%) but poor performance on the “high” class (F1-score 10.66%) due to data imbalance.
To address this, data balancing using SMOTE was performed, which increased the F1-score of the “high” class to 76.09% but decreased the overall accuracy to 75.19%. Although the model became better at recognizing the minority class, the altered data distribution led to a decrease in performance on the majority class.
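
The gap between high overall accuracy and a low F1-score on the minority class follows directly from how the metrics are defined. The sketch below is a minimal illustration, not the thesis code: the function name `class_metrics` and all confusion-matrix counts are hypothetical and chosen only to mimic a heavily imbalanced "low"/"high" split.

```python
def class_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 with one class treated as positive."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts (NOT the thesis data): 1000 documents, 40 truly "high".
# The classifier labels almost everything "low".
high = class_metrics(tp=2, fp=1, fn=38, tn=959)   # "high" as positive class
low = class_metrics(tp=959, fp=38, fn=1, tn=2)    # "low" as positive class

print(f"accuracy: {high['accuracy']:.3f}")   # 0.961
print(f"F1 high:  {high['f1']:.3f}")         # 0.093
print(f"F1 low:   {low['f1']:.3f}")          # 0.980
```

With these assumed counts the classifier is right 96.1% of the time yet finds only 2 of the 40 "high" documents, so recall and F1 for that class collapse. This is why per-class F1, rather than accuracy alone, is the informative number on imbalanced data.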

Item Type:	Thesis (Diploma)
Subjects:	L Education > LC Special aspects of education > LC5201 Education extension. Adult education. Continuing education; L Education > L Education (General) > LMO Model Pembelajaran
Divisions:	Fakultas Teknik (FT) > Departemen Teknik Elektro (TE) > S1 Teknik Informatika
Depositing User:	library UM
Date Deposited:	09 May 2025 04:29
Last Modified:	01 Apr 2026 02:10
URI:	http://repository.um.ac.id/id/eprint/400285
