A semi-supervised approach for Vietnamese stock news classification with deep learning

Nguyen Minh Nhat¹, Tran Huynh Minh Tan¹
¹ Ho Chi Minh University of Banking (HUB), State Bank of Vietnam, Vietnam

Main content of the article

Abstract

Stock-related news and articles on Vietnamese economic websites and blogs are increasing rapidly, but they are mixed with entertainment news, miscellaneous topics, and advertisements. This is frustrating for investors and analysts who want to find and analyze only the stock-related information that matters (Boudoukh, 2013). This research introduces a novel method for automatically labeling the relevance of news articles to the stock market, based on a set of criteria derived from financial domain knowledge. The study also develops a deep learning classifier that builds on the BERT architecture and the Vietnamese language model viBERT (Tran, 2020) to score stock market news accurately and efficiently. This approach helps investors and analysts filter out irrelevant content on Vietnamese economic websites and reach the information most useful for their analysis of stock movements.
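
The automatic labeling step can be illustrated with a minimal Python sketch. The abstract does not list the actual criteria, so the term list, the ticker pattern, the threshold, and the names `STOCK_TERMS`, `TICKER_PATTERN`, `relevance_score`, and `weak_label` are all hypothetical assumptions made for illustration, not the authors' rules. Labels produced this way could then serve as training targets for a BERT-based classifier.

```python
import re
import unicodedata

# Hypothetical criteria: the abstract does not specify the real ones.
# Vietnamese finance terms: "stock", "securities", the VN-Index,
# "dividend", "listed".
STOCK_TERMS = tuple(
    unicodedata.normalize("NFC", term)
    for term in ("cổ phiếu", "chứng khoán", "vn-index", "cổ tức", "niêm yết")
)

# Hypothetical pattern for an exchange-qualified ticker such as "HOSE: VCB".
TICKER_PATTERN = re.compile(r"\b(?:HOSE|HNX|UPCOM)\s*:\s*[A-Z]{3}\b")


def relevance_score(text: str) -> int:
    """Count how many of the assumed criteria an article satisfies."""
    # Normalize so composed and decomposed diacritics compare equal.
    normalized = unicodedata.normalize("NFC", text)
    lowered = normalized.lower()
    score = sum(1 for term in STOCK_TERMS if term in lowered)
    if TICKER_PATTERN.search(normalized):
        score += 1
    return score


def weak_label(text: str, threshold: int = 2) -> int:
    """Return 1 (stock-relevant) or 0 (irrelevant) from the rule score."""
    return 1 if relevance_score(text) >= threshold else 0


articles = [
    "Cổ phiếu ngân hàng tăng mạnh, VN-Index vượt mốc 1.200 điểm (HOSE: VCB).",
    "Ca sĩ nổi tiếng ra mắt album mới vào cuối tuần này.",
]
labels = [weak_label(article) for article in articles]
print(labels)
```

The threshold trades precision for coverage: a higher value yields fewer but cleaner automatic labels, which matters when those labels are used to train a downstream classifier.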

References

Allen, D. E. (2019). Daily market news sentiment and stock prices. Applied Economics, 51(30), 3212-3235. https://doi.org/10.1080/00036846.2018.1564115
Blum, A. A. (1998). Combining labeled and unlabeled data with co-training. Proceedings of the Eleventh Annual Conference on Computational Learning Theory (pp. 92-100).
Boudoukh, J. A. (2013). Which news moves stock prices? A textual analysis. National Bureau of Economic Research.
Devlin, J. A.-W. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Duong, D., Nguyen, T., & Dang, M. (2016, January). Stock market prediction using financial news articles on Ho Chi Minh Stock Exchange. In Proceedings of the 10th International Conference on Ubiquitous Information Management and Communication (pp. 1-6).
Francisco Viveros-Jiménez, M. A.-P.-A.-D. (2018). Improving the boilerpipe algorithm for boilerplate removal in news articles using HTML tree structure. Computacion y Sistemas, 22, 483-489.
Gidofalvi, G. A. (2001). Using news articles to predict stock price movements. Department of Computer Science and Engineering, University of California, San Diego. https://people.kth.se/~gyozo/docs/financial-prediction.pdf
Khan, W., Ghazanfar, M., Azam, M. A., Karami, A., Alyoubi, K. H., & Alfakeeh, A. S. (2020). Stock market prediction using machine learning classifiers and social media, news. Journal of Ambient Intelligence and Humanized Computing, 13, 3433-3456. https://doi.org/10.1007/s12652-020-01839-w
Kohlschütter, C. (2022). Boilerpipe. Retrieved 01 2024, from Boilerpipe: https://github.com/kohlschutter/boilerpipe
Lison, P. A. (2020). Named entity recognition without labelled data: A weak supervision approach. arXiv preprint arXiv:2004.14723.
Makrehchi, M. A. (2013). Stock prediction using event-based sentiment analysis. 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 1, 337-342.
Minaee, S. A. (2021). Deep learning-based text classification: a comprehensive review. ACM Computing Surveys (CSUR), 54, 1-40.
Nguyen Van, P. (2015). A good news or bad news has greater impact on the Vietnamese stock market? (No. 61194). University Library of Munich, Germany.
Office, G. S. (2023). Socio-economic situation report in the first quarter of 2023. https://www.gso.gov.vn/en/highlight/2023/07/socio-economic-situation-report-in-the-first-quarter-of-2023/
Qing Li, T. W. (2014). The effect of news and public mood on stock movements. Information Sciences, 826-840.
Sun, C. A. (2019). How to fine-tune BERT for text classification? Chinese Computational Linguistics: 18th China National Conference, CCL 2019 (pp. 194-206). Kunming, China: Springer.
Sun, Y. M. (2018). A novel stock recommendation system using Guba sentiment analysis. Personal and Ubiquitous Computing, 22, 575-587.
Tran Duc Anh, N. S. (2023). Stock Market Outlook 2024. KB Securities Vietnam, Macro & Strategy. KB Securities Vietnam.
Tran, T. O. (2020). Improving sequence tagging for Vietnamese text using transformer-based neural models. In Proceedings of the 34th Pacific Asia conference on language, information and computation (pp. 13-20).
Triguero, I. A. (2015). Self-labeled techniques for semi-supervised learning: taxonomy, software and empirical study. Knowledge and Information Systems, 42, 245-284.
Tun, N. A. (2021). Stock article title sentiment-based classification using PhoBERT. In CEUR Workshop Proceedings (Vol. 3026, pp. 225-233).
Usmani, S. A. (2021). News sensitive stock market prediction: literature review and suggestions. PeerJ Computer Science, 7, e490.
Van de Kauter, M. a. (2015). Fine-grained analysis of explicit and implicit sentiment in financial news articles. Expert Systems with Applications, 4999-5010.
Vaswani, A. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. arXiv:1706.03762.
Villamil, L. A. (2023). Improved stock price movement classification using news articles based on embeddings and label smoothing. arXiv preprint arXiv:2301.10458.
Wu, H., Liu, Y., & Wang, J. (2020). Review of text classification methods on deep learning. Computers, Materials and Continua, 63(3), 1309-1321. https://doi.org/10.32604/cmc.2020.010172
Xiaodong Li, H. X. (2014). News impact on stock price return via sentiment analysis. Knowledge-Based Systems, 69, 14-23.
Zhu, X. (2005). Semi-Supervised Learning Literature Survey. World, 10. http://www2.denizyuret.com/ref/zhu/ssl_survey.pdf