Research Article

Automated Movie Genre Classification with LDA-based Topic Modeling

by Brandon Chao, Ankit Sirmorya
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 145 - Number 13
Year of Publication: 2016

Brandon Chao, Ankit Sirmorya. Automated Movie Genre Classification with LDA-based Topic Modeling. International Journal of Computer Applications. 145, 13 (Jul 2016), 1-5. DOI=10.5120/ijca2016910822

@article{ 10.5120/ijca2016910822,
author = { Brandon Chao and Ankit Sirmorya },
title = { Automated Movie Genre Classification with LDA-based Topic Modeling },
journal = { International Journal of Computer Applications },
issue_date = { Jul 2016 },
volume = { 145 },
number = { 13 },
month = { Jul },
year = { 2016 },
issn = { 0975-8887 },
pages = { 1-5 },
numpages = {9},
url = { },
doi = { 10.5120/ijca2016910822 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:48:41.949360+05:30
%A Brandon Chao
%A Ankit Sirmorya
%T Automated Movie Genre Classification with LDA-based Topic Modeling
%J International Journal of Computer Applications
%@ 0975-8887
%V 145
%N 13
%P 1-5
%D 2016
%I Foundation of Computer Science (FCS), NY, USA

Movie genre classification is a challenging problem with many potential applications. Whereas many prior approaches rely on image, audio, or motion features to classify movies, we instead analyze textual content, which is comparatively less computationally expensive and less time-consuming. In this paper, we present a novel system for movie genre classification whose main component is probabilistic topic modeling of the movie’s script. Our approach uses latent Dirichlet allocation (LDA), a topic modeling algorithm, to train a model that discovers the themes common to movie scripts of the same genre. We then compute the cosine similarity between the feature vectors of the trained and test models and use this value to identify each movie’s genre.
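
The final step described in the abstract, comparing feature vectors by cosine similarity, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each genre is summarized by a vector of topic proportions (as an LDA model might produce) and that a test script is assigned the genre whose vector it most resembles. The function names, the three-topic vectors, and the genre labels are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Similarity with an all-zero vector is undefined; treat it as 0 here.
        return 0.0
    return dot / (norm_u * norm_v)


def classify_genre(script_topics, genre_profiles):
    """Pick the genre whose topic vector is most similar to the script's.

    Returns the best-matching genre and the similarity score of every genre.
    """
    scores = {
        genre: cosine_similarity(script_topics, profile)
        for genre, profile in genre_profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical topic proportions per genre (assumed for illustration only;
# in practice they would come from a topic model trained on labeled scripts).
GENRE_PROFILES = {
    "action": [0.70, 0.20, 0.10],
    "comedy": [0.10, 0.75, 0.15],
    "horror": [0.15, 0.10, 0.75],
}

# Hypothetical topic proportions inferred for an unseen script.
test_script = [0.60, 0.25, 0.15]

genre, scores = classify_genre(test_script, GENRE_PROFILES)
print(genre)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Cosine similarity depends only on the direction of the vectors, not their length, so scripts of different sizes can be compared once they are expressed as topic proportions.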

Index Terms

Computer Science
Information Sciences


Video Genre Identification, Latent Dirichlet Allocation, LDA