UPH Digital Library Miner: A Topic Modelling-based Software Application for Mining Document Collections of a Digital Library
10.5120/ijca2015907559
Toluwase A Olowookere, Ayodeji I Fasiku and Ifeanyi C Emeto. Article: UPH Digital Library Miner: A Topic Modelling-based Software Application for Mining Document Collections of a Digital Library. International Journal of Computer Applications 132(13):1-8, December 2015. Published by Foundation of Computer Science (FCS), NY, USA.
BibTeX
@article{key:article,
  author = {Toluwase A. Olowookere and Ayodeji I. Fasiku and Ifeanyi C. Emeto},
  title = {Article: UPH Digital Library Miner: A Topic Modelling-based Software Application for Mining Document Collections of a Digital Library},
  journal = {International Journal of Computer Applications},
  year = {2015},
  volume = {132},
  number = {13},
  pages = {1-8},
  month = {December},
  note = {Published by Foundation of Computer Science (FCS), NY, USA}
}
Abstract
With changing user expectations, many traditional libraries are moving toward digital content storage. Accessible from anywhere at any time, the digital content stored in digital libraries gives users efficient, on-demand access to information. As a result, the amount of digital content made available to users, especially digital text documents, has increased tremendously over the years. These documents carry hidden information in the form of the variety of topics of discourse inherent in them, which leads to information overload. Users, mostly computational researchers, therefore find it challenging to discover and identify the range of topical content in the collections of a digital library, making it imperative to develop a means of automatically discovering the topics that pervade those collections. This paper therefore presents UPH Digital Library Miner, a software application for mining the document collections of a digital library for topical structure discovery and for topic-based similarity search between collection pairs, using a topic modeling algorithm and an inverted Kullback-Leibler divergence measure. The application is integrated, via a loose-coupling approach, with document collections built in a widely used digital library software system, the Greenstone digital library system. Results obtained from using the application on Greenstone document collections containing the abstracts of about 628 documents from IEEE Transactions on Software Engineering show its ability to discover latent topical structures in collections and to report collections that are similar based on their discovered topical structure.
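The topic-based similarity search mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the "inverted Kullback-Leibler divergence measure", so everything specific here is an assumption rather than the authors' method: the symmetrised divergence, the 1 / (1 + D) inversion, the function names `kl_divergence` and `inverted_kl_similarity`, and the made-up topic proportions for three hypothetical collections.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same topics."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Smooth and renormalise so zero probabilities do not produce log(0).
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    sp, sq = sum(p), sum(q)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(p, q))


def inverted_kl_similarity(p, q):
    """Assumed similarity score: 1 / (1 + symmetrised KL divergence).

    Identical distributions score 1.0; the score falls toward 0 as the
    topic distributions move apart.
    """
    sym = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return 1.0 / (1.0 + sym)


# Made-up topic proportions for three hypothetical collections.
collections = {
    "A": [0.70, 0.20, 0.10],
    "B": [0.65, 0.25, 0.10],
    "C": [0.05, 0.15, 0.80],
}

# Rank collection pairs from most to least similar.
pairs = [("A", "B"), ("A", "C"), ("B", "C")]
ranked = sorted(
    pairs,
    key=lambda ab: inverted_kl_similarity(collections[ab[0]], collections[ab[1]]),
    reverse=True,
)

for x, y in ranked:
    score = inverted_kl_similarity(collections[x], collections[y])
    print(f"{x} vs {y}: {score:.4f}")
```

In a real pipeline the per-collection topic proportions would come from a fitted topic model rather than hand-written lists. The divergence is symmetrised here only so that the order of the two collections in a pair does not change the score.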
Keywords
Digital Library, Document Collection, Text Mining, Topic Modeling, Topical Structure.