Exploring Data Scientists in Scientific Literature: LDA Topic Modeling on the Semantic Scholar Database


Łukasz Iwasiński 

Affiliation: Poland

Len Krawczyk 

Affiliation: Uniwersytet Warszawski, Wydział Socjologii, Poland

Mateusz Szymański 


Abstract

Purpose/Thesis: This paper explores the representation of data scientists in scientific literature. It aims to answer the following questions: How has the number of publications on data scientists evolved over time? How are papers on data scientists distributed across fields of study? In what contexts are data scientists represented in scientific literature?

Approach/Methods: The authors applied Latent Dirichlet Allocation (LDA) topic modeling to the resources available through the Semantic Scholar API.
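
As an illustration of the technique named above, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is not the authors' pipeline: the toy documents, the function name `fit_lda`, the number of topics, and the hyperparameters `alpha` and `beta` are all assumptions made for the example, and Gibbs sampling is used here in place of whatever inference procedure the study relied on. In practice the documents would be titles or abstracts retrieved through the Semantic Scholar API and preprocessed (tokenized, stop words removed) before fitting.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    # Give every token a random initial topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


# Toy, made-up "abstracts" (already tokenized); not data from the study.
docs = [
    "data scientist machine learning software model".split(),
    "data scientist software engineering machine learning".split(),
    "data scientist genomics clinical patient health".split(),
    "clinical health patient data scientist genomics".split(),
]

doc_topic, topic_word = fit_lda(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    top_words = [w for w, c in counts.most_common(4) if c > 0]
    print(f"topic {k}: {', '.join(top_words)}")
```

The per-topic word counts give the most characteristic terms of each topic, and the per-document topic counts indicate which topic dominates each document. On a corpus this small the topics are not meaningful; the sketch only shows the mechanics.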

Results and conclusions: The number of publications on data scientists has increased since 2008. The analysis found a robust connection between data scientists and information technology, as well as biomedical research. Little of the literature discusses data scientists in a sociocultural context.

Originality/Value: To our knowledge, no studies have been devoted to the representation of data scientists in scientific literature. The research may contribute to the conceptualization of this notion.

Keywords

Data Science; Text Mining; Latent Dirichlet Allocation; Topic Modeling; Semantic Scholar



Published: 2025-04-08



Łukasz Iwasiński (lukiwas@gmail.com)

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).