
Content Modeling Paradigm: an interplay of relationship between Author, Document, Topic, and Words

© 2010 by IJCA Journal
Number 2 - Article 3
Year of Publication: 2010
Deepak Gupta

Deepak Gupta. Article: Content Modeling Paradigm: an interplay of relationship between Author, Document, Topic, and Words. IJCA, Special Issue on CASCT (2):61–68, 2010. Published By Foundation of Computer Science.

BibTeX:

	author = {Deepak Gupta},
	title = {Article: Content Modeling Paradigm: an interplay of relationship between Author, Document, Topic, and Words},
	journal = {IJCA, Special Issue on CASCT},
	year = {2010},
	number = {2},
	pages = {61--68},
	note = {Published By Foundation of Computer Science}


For any work of literature, a fundamental problem is to identify the individual(s) who wrote it and, conversely, to identify all of the works that belong to a given individual, the individuals who write many papers on the same topic, or the topics that an author works on. Information extraction techniques (such as author name and topic recognition) have long been used to extract useful pieces of information from text. The types of information to be extracted are generally fixed and well defined. In some cases, however, the user's goal is more abstract and the information types cannot be narrowly defined. For example, a reader of online user reviews typically wants to make a good choice and is interested in learning about the different aspects of the relation between topics and authors (e.g., the prominent authors on a topic, or an author's papers and research field). Some of these aspects may be known to the reader, while others must be discovered from the inherent text structure of a large collection. Even for the known aspects (such as "author name" and "topic"), the challenge is to recognize hidden aspects such as the number of papers written by an author, the author's research field, and the author's popularity.

In this paper, we develop a content modeling paradigm to extract the relationships among authors, documents, topics, and words, with topics represented as identifiable word distributions across the documents of various authors. We review several probabilistic graphical models (such as Latent Dirichlet Allocation) and propose a new model, called the content modeling paradigm, which is based on the frequency of words within a document.
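The abstract does not specify how the proposed model is computed, so the following is only a minimal sketch of the general idea of a frequency-based word distribution per author. The input format (a list of author and text pairs), the function names, and the toy corpus are all assumptions made for illustration. They are not taken from the paper.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_distribution(tokens):
    """Return relative word frequencies (values sum to 1) for a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def author_word_distributions(corpus):
    """Pool each author's documents and compute one distribution per author.

    `corpus` is a list of (author, text) pairs: a hypothetical input format.
    """
    pooled = defaultdict(list)
    for author, text in corpus:
        pooled[author].extend(tokenize(text))
    return {author: word_distribution(tokens) for author, tokens in pooled.items()}


# Toy data invented purely for illustration.
corpus = [
    ("Author A", "topic models describe documents as mixtures of topics"),
    ("Author A", "topic models use word counts"),
    ("Author B", "audit trails record system events"),
]
dists = author_word_distributions(corpus)

# The most frequent word of an author is a crude hint at that author's topic.
top_word = max(dists["Author A"], key=dists["Author A"].get)
print(top_word)
```

Raw frequency is the simplest possible statistic here. A probabilistic topic model such as Latent Dirichlet Allocation would instead infer latent topics shared across authors, at the cost of an inference procedure.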

