USING ARTIFICIAL INTELLIGENCE TO AUTOMATE THE PROCESS OF COLLECTING AND ANALYZING DATA FROM ONLINE JOB POSTINGS
Keywords:
classification method, machine learning, neural network language models, natural language processing, information extraction, entity recognition.
Abstract
The article discusses an approach to information extraction that uses online
learning and is based on measuring the semantic proximity between sentence vectors and
knowledge base entities, using neural network language models trained without supervision
on a large text corpus from the subject area. It also provides a detailed review of modern
supervised and unsupervised information extraction methods. The approach achieves acceptable
quality in analyzing current labor market requirements without the labor-intensive procedure
of text corpus annotation and without rule-based approaches.
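The core idea of matching sentence vectors against knowledge base entities by semantic proximity can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional word vectors, the entity names, the averaging of word vectors into a sentence vector, and the 0.8 threshold are all assumptions chosen for illustration. In practice the vectors would come from a language model trained without labels on a corpus of job postings.

```python
import math

# Toy word vectors (hypothetical values). A real system would load vectors
# from a language model trained on a large corpus of job postings.
WORD_VECTORS = {
    "python": [0.9, 0.1, 0.0],
    "programming": [0.8, 0.2, 0.1],
    "experience": [0.3, 0.3, 0.3],
    "teamwork": [0.0, 0.9, 0.2],
    "communication": [0.1, 0.8, 0.3],
    "skills": [0.3, 0.4, 0.3],
}

# Hypothetical knowledge base: each entity is described by a few terms.
KNOWLEDGE_BASE = {
    "Software development": ["python", "programming"],
    "Soft skills": ["teamwork", "communication"],
}


def mean_vector(tokens):
    """Average the vectors of known tokens; return None if none are known."""
    vectors = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_entity(sentence, threshold=0.8):
    """Return (entity, score) for the closest entity, or (None, score)
    when the best similarity stays below the threshold."""
    sentence_vec = mean_vector(sentence.lower().split())
    if sentence_vec is None:
        return None, 0.0
    best_name, best_score = None, 0.0
    for name, terms in KNOWLEDGE_BASE.items():
        score = cosine(sentence_vec, mean_vector(terms))
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


if __name__ == "__main__":
    for text in ("Python programming experience",
                 "Teamwork and communication skills"):
        print(text, "->", match_entity(text))
```

Because the matching relies only on vector proximity, no annotated corpus or hand-written rules are needed in this sketch; the threshold controls how conservative the assignment of a sentence to an entity is.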
References
Al-Nabki, W., Eduardo, F., Enrique, A., & Laura, F.-R. (2020). Improving named entity recognition in noisy user-generated text with local distance neighbor feature. Neurocomputing, 382, 1–11. https://doi.org/10.1016/j.neucom.2019.11.091
Zhang, Z., & Iria, J. (2009). A novel approach to automatic gazetteer generation using Wikipedia. In Proceedings of the 2009 Workshop on the People’s Web Meets NLP (ACL-IJCNLP) (pp. 1–9). https://doi.org/10.3115/1699765.1699766
Zahraa, S. A., Mark, C., & Gholamreza, H. (2017). Multi-domain evaluation framework for named entity recognition tools. Computer Speech & Language, 43, 34–55. https://doi.org/10.1016/j.csl.2016.09.004
Rekia, K., Yu, Z., Weinan, Z., & Ting, L. (2017). CCG supertagging via bidirectional LSTM-CRF neural architecture. Neurocomputing, 283, 31–37. https://doi.org/10.1016/j.neucom.2017.01.018
Wang, Y., Tong, H., Zhu, Z., & Li, Y. (2022). Nested named entity recognition: A survey. ACM Transactions on Knowledge Discovery from Data, 1–29. https://doi.org/10.1145/3544931
Hu, X., et al. (2023). Location reference recognition from texts: A survey and comparison. ACM Computing Surveys, 1–37. https://doi.org/10.1145/3583559
Shao, Y., Hardmeier, C., & Nivre, J. (2016). Multilingual named entity recognition using hybrid neural networks. In The Sixth Swedish Language Technology Conference (SLTC).
Huang, Z., Xu, W., & Yu, K. (2015). Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint, arXiv:1508.01991. https://arxiv.org/abs/1508.01991
Luan, S., & Anderson, F. (2021). An entity resolution approach based on word embeddings and knowledge bases for microblog texts. Association for Computing Machinery, Article 53, 1–8. https://doi.org/10.1145/3431234
Kumarjeet, P., Pramit, M., & Vaishali, G. (2020). Named entity recognition using Word2vec. International Research Journal of Engineering and Technology (IRJET), 7(9), 1818–1820.
Nogueira, D., Madaan, P., Fersini, E., Palmonari, M., & Messina, E. (2021). Learning to adapt with word embeddings: Domain adaptation of named entity recognition systems. Information Processing & Management, 58(3), 102479. https://doi.org/10.1016/j.ipm.2021.102479
Nguyen, T. H., Plank, B., & Grishman, R. (2015). Semantic representations for domain adaptation: A case study on the tree kernel-based method for relation extraction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP) (pp. 635–644). https://doi.org/10.3115/v1/P15-1062
Guo, W., Yang, X., Yu, M., Huang, K., Peng, W., & Zhang, Z. (2022). MarkerGenie: An NLP-enabled text-mining system for biomedical entity relation extraction. Bioinformatics Advances, 2(1), vbac035. https://doi.org/10.1093/bioadv/vbac035
Wang, Z., Wang, Z., Li, J., et al. (2012). Knowledge extraction from Chinese wiki encyclopedias. Journal of Zhejiang University - Science C, 13(4), 268–280. https://doi.org/10.1631/jzus.C1100273
Nayak, T., & Ng, H. T. (2019). Effective modeling of encoder-decoder architecture for joint entity and relation extraction. arXiv preprint, arXiv:1911.09886. https://doi.org/10.48550/arXiv.1911.09886
Gensim Developers. (n.d.). Gensim. PyPI. https://pypi.org/project/gensim/ (Accessed: February 10, 2025)