ImageFARMER: Introducing a Data Mining Framework for the Creation of Large-scale Content-based Image Retrieval Systems

International Journal of Computer Applications
© 2013 by IJCA Journal
Volume 79 - Number 13
Year of Publication: 2013
Juan M. Banda
Rafal A. Angryk
Petrus C. Martens

Juan M. Banda, Rafal A. Angryk and Petrus C. Martens. ImageFARMER: Introducing a Data Mining Framework for the Creation of Large-scale Content-based Image Retrieval Systems. International Journal of Computer Applications 79(13):8-13, October 2013.

BibTeX:

	author = {Juan M. Banda and Rafal A. Angryk and Petrus C. Martens},
	title = {Article: ImageFARMER: Introducing a Data Mining Framework for the Creation of Large-scale Content-based Image Retrieval Systems},
	journal = {International Journal of Computer Applications},
	year = {2013},
	volume = {79},
	number = {13},
	pages = {8-13},
	month = {October},
	note = {Full text available}


In this paper we introduce imageFARMER, a framework that allows information retrieval researchers and educators to develop and customize domain-specific content-based image retrieval systems with ease while gaining a deeper understanding of the underlying representation of domain-specific image data. imageFARMER incorporates several aspects of image processing and content-based information retrieval: image representation via image parameter extraction, validation via image parameters, analysis of multiple dissimilarity measures for accurate data analysis, testing of dimensionality reduction methods for storage and processing optimization, and indexing algorithms for fast and efficient querying. These capabilities have not previously been available together in an open-source software package designed for research. Combined, they offer enhanced knowledge discovery and validation of every step involved in creating large-scale content-based image retrieval systems.
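
The stages named in the abstract can be illustrated with a minimal sketch. It is not imageFARMER's code or API: the function names, the grid-based mean and standard deviation parameters, the toy images, and the brute-force search (standing in for a real indexing algorithm) are all assumptions chosen for illustration. Dimensionality reduction is omitted.

```python
import math
from statistics import mean, pstdev


def extract_parameters(image, grid=2):
    """Split a grayscale image (list of rows) into grid x grid cells and
    return the mean and standard deviation of each cell as one flat vector.
    (Hypothetical image parameters, chosen only for illustration.)"""
    rows, cols = len(image), len(image[0])
    cell_h, cell_w = rows // grid, cols // grid
    features = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [image[y][x]
                    for y in range(gy * cell_h, (gy + 1) * cell_h)
                    for x in range(gx * cell_w, (gx + 1) * cell_w)]
            features.extend([mean(cell), pstdev(cell)])
    return features


def euclidean(a, b):
    """Euclidean (L2) dissimilarity between two parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) dissimilarity between two parameter vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def query(database, query_vector, k=1, dissimilarity=euclidean):
    """Brute-force k-nearest-neighbor search over the stored vectors.
    A large-scale system would replace this scan with an index."""
    ranked = sorted(database,
                    key=lambda name: dissimilarity(database[name], query_vector))
    return ranked[:k]


# Toy 4x4 grayscale images (assumed data, not from the paper).
images = {
    "dark": [[0, 0, 0, 0]] * 4,
    "bright": [[200, 200, 200, 200]] * 4,
    "striped": [[0, 255, 0, 255]] * 4,
}
database = {name: extract_parameters(img) for name, img in images.items()}
probe = extract_parameters([[10, 10, 10, 10]] * 4)

by_euclidean = query(database, probe, k=3, dissimilarity=euclidean)
by_manhattan = query(database, probe, k=3, dissimilarity=manhattan)
print(by_euclidean)  # ['dark', 'striped', 'bright']
print(by_manhattan)  # ['dark', 'bright', 'striped']
```

On this toy data the two dissimilarity measures agree on the nearest image but order the remaining two differently, which is the kind of effect that comparing multiple measures is meant to expose.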

