Call for Paper - March 2023 Edition
IJCA solicits original research papers for the March 2023 Edition. Last date of manuscript submission is February 20, 2023. Read More

Multistep Sparse Approximation Technology in Information Retrieval

Print
PDF
International Journal of Computer Applications
© 2014 by IJCA Journal
Volume 99 - Number 10
Year of Publication: 2014
Authors:
Chi Shen
Wen Li
Mike F. Unuakhalu
10.5120/17406-7991

Chi Shen, Wen Li and Mike F Unuakhalu. Article: Multistep Sparse Approximation Technology in Information Retrieval. International Journal of Computer Applications 99(10):1-8, August 2014. Full text available. BibTeX

@article{key:article,
	author = {Chi Shen and Wen Li and Mike F. Unuakhalu},
	title = {Article: Multistep Sparse Approximation Technology in Information Retrieval},
	journal = {International Journal of Computer Applications},
	year = {2014},
	volume = {99},
	number = {10},
	pages = {1-8},
	month = {August},
	note = {Full text available}
}

Abstract

With large sets of text documents increasing rapidly, being able to efficiently utilize this vast volume of new information and service resource presents challenges to computational scientists. Text documents are usually modeled as a term-document matrix which has high dimensional and space vectors. To reduce the high dimensions, one of the various dimensionality reduction methods, concept decomposition, has been developed by some researchers. This method is based on document clustering techniques and leastsquare matrix approximation to approximate the matrix of vectors. However the numerical computation is expensive, as an inverse of a dense matrix formed by the concept vector matrix is required. In this paper we presented a class of multistep spare matrix strategies for concept decomposition matrix approximation. In this approach, a series of simple sparse matrices are used to approximate the decompositions. Our numerical experiments on both small and large datasets show the advantage of such an approach in terms of storage costs and query time compared with the least-squares based approach while maintaining comparable retrieval quality.

References

  • M. Berry and M. Browne. Understanding Search Engines: Mathematical modeling and text retrieval. SIAM,1999
  • C. M. Chen, N. Stoffel, M. Post, C. Basu, D. Bassu, and C. Behrens. Telcordia LSI engine: Implementation and scalability issues. In Proceedings of the Eleventh International Workshop on Research Issues in Data Engineering (RIDE 2001), Heidelberg, Germany, April 2001.
  • J. Chen and Y. Saad. Divide and Conquer Strategies for Effective Information Retrieval, Proceedings of the Ninth SIAM International Conference on Data Mining (SDM), 2009.
  • E. Chow. A priori sparsity patterns for parallel sparse approximate inverse preconditioners. SIAM J. Sci. Comput. , 21(5):1804–1822, 2000.
  • I. S. Dhillon and D. S. Modha. Concept decompositions for large sparse text data using clustering. Machine Learning, 42(1):143–175, 2001.
  • J. Dobsa and B. J. Basic. Concept decomposition by fuzzy kmeans algorithm, In Proceedings of IEEE Web Intelligence Conference (WI2003), pp. 684-688, October 2003
  • J. Gao and J. Zhang. Text retrieval using sparsified concept decomposition matrix. Book Chapter of Computational and Information Science, Springer Berlin/Heidelberg, vol. 3314, pp. 523 – 529, 2005
  • J. Gao and J. Zhang. Clustered SVD strategies in latent semantic indexing. in Information Processing and Management, 41(5):1051-1063, 2005.
  • P. Husbands, H. Simon, and C. Ding. On the use of singular value decomposition for text retrieval. In M. W. Berry, editor, Computational Information Retrieval, pages 145–156. SIAM, Philadelphia, PA, 2001.
  • E. R. Jessup and J. H. Martin. Taking a new look at the latent semantic analysis approach to information retrieval. In Proceedings of the SIAM Workshop on Computational Information Retrieval, Raleigh, NC, 2000.
  • A. Kontostathis, W. M. Pottenger, and B. D. Davison. Identifcation of critical values in Latent Semantic Indexing (LSI). Book chapter of Foundations of Data Mining and Knowledge Discovery, pp. 333-346, 2005.
  • A. Kontostathis and W. M. Pottenger. A framework for understanding Latent Semantic Indexing(LSI) performance. In-formation Processing and Management , 42(1):56-73, 2006.
  • A. Kontostathis, W. M. Pottenger, and B. D. Davison. Identification of critical values in Latent Semantic Indexing(LSI). In Foundations of Data Mining and Knowledge Discovery pp. 333-346, 2005.
  • A. Kontostathis. Essential Dimensions of Latent Semantic Indexing (LSI). 40th Annual Hawaii International Conference on System Sciences, pp. 73, 2007.
  • V. Raghavan, P. Bollmann, and G. S. Jung. A critical investigation of recall and precision as measures of retrieval system performance. ACM Transactions on Information Systems, Vol. 7, pp. 205-229, 1989.
  • C. Tang, S. Dwarkadas, and Z. Xu. On scaling latent semantic indexing for large peer-to-peer systems. In Proceedings of the 27th Annual International ACM SIGIR Conference, Sheffield, UK, 2004
  • K. Wang and J. Zhang. MSP:A class of parallel multistep successive sparse approximate inverse preconditioning strategies. SIAM Journal on Scientific Computing, Vol. 24, No. 4, pp. 1141-1151, 2003
  • C. Shen and D. Williams. Approximate Matrix Decomposition Technniques in Information Retrieval, In Proceedings of World Congress on Engineering and Computer Science 2007, pp. 347 - 352, October 24-26, 2007.