Programming Languages Prediction from Stack Overflow Questions Using Deep Learning
Abstract
Understanding programming languages is vital in the ever-evolving world of software development. With constant updates and the emergence of new languages, staying informed is essential for any programmer. Tagging systems are also a widely accepted practice for organizing stored data. In this study, questions were selected from a Stack Overflow dataset using random sampling. The tags were then cleaned, and the data were separated into three variants: title, title + body, and body. After preprocessing, tokenizing, and padding, the data were randomly split into training and testing sets. Several deep learning models, namely Long Short-Term Memory (LSTM), Bidirectional LSTM, Multilayer Perceptron, Convolutional Neural Network, Feedforward Neural Network, Gated Recurrent Unit, Recurrent Neural Network, and Artificial Neural Network, were then applied to the dataset to identify the programming language from the question tags. This study aims to assist in identifying the programming language from question tags, which can help programmers better understand a problem or more easily approach unfamiliar programming languages.
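The preprocessing pipeline described above (tokenizing, padding, and a random train/test split) can be sketched as follows. This is a minimal illustration using a hypothetical mini-corpus and hand-rolled helpers, not the study's actual code; the real dataset, vocabulary, and deep learning models are assumptions left out here.

```python
import random

# Hypothetical mini-corpus standing in for Stack Overflow question titles;
# each label is the programming-language tag to be predicted.
corpus = [
    ("how to reverse a list", "python"),
    ("segfault when freeing a pointer", "c"),
    ("undefined is not a function", "javascript"),
    ("list comprehension syntax error", "python"),
]

def build_vocab(texts):
    """Map each word to a positive integer id; 0 is reserved for padding."""
    vocab = {}
    for text in texts:
        for word in text.split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab

def tokenize_and_pad(texts, vocab, maxlen):
    """Convert texts to fixed-length integer sequences, padding with 0."""
    sequences = []
    for text in texts:
        ids = [vocab.get(word, 0) for word in text.split()][:maxlen]
        sequences.append(ids + [0] * (maxlen - len(ids)))
    return sequences

texts = [text for text, _ in corpus]
labels = [label for _, label in corpus]
vocab = build_vocab(texts)
padded = tokenize_and_pad(texts, vocab, maxlen=6)

# Random split into training and testing sets, as described in the study.
random.seed(0)
indices = list(range(len(corpus)))
random.shuffle(indices)
split = int(0.75 * len(indices))
train_idx, test_idx = indices[:split], indices[split:]
```

The fixed-length padded sequences produced here are the form of input the listed sequence models (LSTM, BiLSTM, GRU, RNN) typically consume, usually via an embedding layer.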
Copyright (c) 2024 Tapu Biswas
This work is licensed under a Creative Commons Attribution 4.0 International License.