Year : 2011  |  Volume : 1  |  Issue : 1  |  Page : 12-18

CBMIR: Content-based image retrieval algorithm for medical image databases

Department of Computer Engineering, Bu Ali Sina University, Hamedan, Iran

Date of Web Publication: 23-Sep-2019

Correspondence Address:
Abdol Hamid Pilevar
Department of Computer Engineering, Medical Intelligence and Language Engineering Laboratory, Bu Ali Sina University, Hamedan

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/2228-7477.83460


We propose a novel algorithm for retrieving images from medical image databases by content. The aim of this article is to present a content-based retrieval algorithm that is robust to scaling and translation of objects within an image. For efficient representation and retrieval of medical images, attention is focused on the methodology: the content of a medical image is represented by its regions (objects), the Image Attributes (IA) of those regions, and the relationships between them. The CBMIR employs a new model in which each image is first decomposed into regions. The similarity measurement between images is based on a regional-matching scheme that integrates the properties of all the regions in the images. The method can answer queries by example. The efficiency and performance of the presented method have been evaluated using a dataset of about 5,000 simulated but realistic computed tomography and magnetic resonance images, whose original images were selected from three large medical image databases. The results of our experiments show a success rate of more than 93 percent, which is satisfactory.

Keywords: Content-based image retrieval, medical image databases, region matching

How to cite this article:
Pilevar AH. CBMIR: Content-based image retrieval algorithm for medical image databases. J Med Signals Sens 2011;1:12-8

How to cite this URL:
Pilevar AH. CBMIR: Content-based image retrieval algorithm for medical image databases. J Med Signals Sens [serial online] 2011 [cited 2022 Aug 19];1:12-8. Available from: https://www.jmssjournal.net/text.asp?2011/1/1/12/83460

  Introduction

Content-based image retrieval (CBIR) refers to techniques for retrieving similar images from image databases, based on automated feature extraction methods. In recent years, the medical imaging field has grown and is generating much more interest in methods and tools for managing the analysis of medical images. To support clinical decision-making, many imaging modalities, such as magnetic resonance imaging (MRI), X-ray computed tomography (CT), digital radiography, and ultrasound, are currently available. For administrative, clinical, teaching, and research activities, medical image database systems are emerging as an important component of Picture Archiving and Communication Systems (PACS). Usually, in a CBIR system, a feature signature is computed for each image from its pixel values. The signature serves as the image representation, and the components of the signature are called features. A rule for comparing images is defined, so that images matching a given query can be retrieved from a large database of images. The main reason for using the signature is to improve the correlation between image representation and semantics. This is done by mapping one or several signatures to d-dimensional points in some metric space and building an index on all signatures for fast retrieval. A function such as the Euclidean distance is used for calculating distances between each pair of signatures. The index is used to efficiently locate signatures close to the query point. The matched images are returned to the user.

The existing general-purpose CBIR systems roughly fall into two categories, depending on the approach used to extract signatures: image-based search and region-based search. Some systems use a weighted-sum matching metric to combine the retrieval results from individual algorithms [1] or other algorithms. [2] Once the signatures are extracted, a comparison rule is determined, including a querying scheme and the definition of a similarity measure between images.

In most image retrieval systems, a query is specified by an image to be matched. We refer to this as an overall search, as similarity is based on the overall properties of images. By contrast, there are also partial search querying systems that retrieve based on a particular region in an image. [3] A content-based image retrieval method named CBDIR segments the teeth of dental study models (plaster casts of the dentition) that exhibit varieties of malocclusions. [4] A medical and general purpose image retrieval (MGIR) method is used for retrieving medical and general purpose images from databases, and is robust to scaling and translation of objects within an image. [5] The content-based regional-matching (CBRM) method is a method in which each image is first decomposed into regions. A measure of the overall similarity between images is developed using a region-matching scheme that integrates the properties of all the regions in the images. [6]

In a clinical decision-making system, more than a query by series ID, patient name, or study ID for images is needed. It is important and beneficial to find other images of the same disease and the same modality in the same anatomic region. [7],[8],[9]

Methods that focus only on color, texture, and shape, for example, do not show how to handle inter-relationships between multiple objects or regions in a query. The Content-Based Medical Image Retrieval (CBMIR) algorithm mainly focuses on spatial relationships. The main contributions of this study are as follows:

  1. A method for efficient retrieval and representation of medical images based on Image Attributes (IA)
  2. An effective method for examining the retrieval process in the MRI and CT medical images.

The CBMIR system is interactive and the user is allowed to correct the results of the segmented images. The user can identify and extract interesting images or regions from all segmented images. The user can even specify the class to which an image belongs. Based on the properties of individual regions and spatial relationships between such regions, the CBMIR system takes the responsibility of efficient storage, representation, and retrieval of images.

The rest of this article is organized as follows:

  • A short presentation of the underlying theory on Image Attributes is presented in Section 2
  • An approach to the CBMIR algorithm for medical images is discussed in Section 3
  • The indexing and search method is explained in Section 4
  • Feature vector and similarity measure are discussed in Section 5
  • Experimental results are discussed in Section 6
  • The conclusion and issues for future research are presented in Section 7.

  Attribute Extraction Method

Given a collection of images, appropriate representations of their features, together with an organization of those representations in the database, are needed so that one can search for images similar to a query image. The images are described by the properties of their objects and the relationships between those objects. The segmentation of CT and MRI images is in general very difficult and is currently the subject of independent research activities. [1],[10] The image segmentation process is done under the supervision of related experts (e.g., a clinician). In the first step, the images are segmented after the necessary edge detection, using a low-pass filter. By editing, deleting, or correcting the insignificant segments, the experts provide the desired segmentations and shapes.

Different features are specified for image representation; the features and original gray-level images are stored in the database and used for browsing or retrieving the images.

The images are segmented into disjoint regions or objects. [Figure 1] shows an example of the edge-detected form of a gray-level image: the complete contour-detected image is shown in [Figure 1]a, and its corresponding segmented polygonal components are depicted in [Figure 1] (P1 to P4).
Figure 1: (A) Contour detection of an image. (P1, P2, P3, and P4) its segmented components


The images are classified into specific predefined anatomical pathology classes [Table 1]. The classes are defined based on parts of the body (e.g., neck, head, etc.), or a part of the image (i.e., a region, an object, or a segment). They are classified into predefined classes, which correspond to the normality or abnormality of anatomical structures (e.g., hematoma, ventricle, tumor, etc.). The category is organized based on diagnostic and anatomical hierarchies. The experts have classified the images into appropriate anatomical classes by selecting their names in the class hierarchy. In this article, we focus on MRI and CT images, and the effectiveness of the proposed method is illustrated by looking at the image content, its representation, and retrieval. [Figure 2] illustrates four representative images of these classes. However, the proposed method can be applied to other kinds of medical images too.
Figure 2: Four samples of the medical images which are saved in training and test databases

Table 1: Category of the anatomical pathology


  CBMIR Algorithm

Based on closed contour correspondence, the images are segmented into dominant image objects or regions, and then the image components are labeled by domain experts [Figure 3]. The classes of images are characterized by certain objects (e.g., liver, body outline, and spine for CT or MRI images of the abdomen; skull and ventricles for images of the head). Such objects are present in almost all of the images of the same class. Unexpected and additional objects are identified and classified into classes such as tumor, hematoma, and the like.
Figure 3: A sample of extracted components P1…P4, and their signature S1…S4


Three databases are implemented in this study:

  • All the original MRI or CT images are saved in a database named Dd
  • Images of the closed contour components (i.e., [Figure 1] P1…P4) are saved in a database named Ds
  • ID numbers of the images and the numbers of their components are saved in database Dy.

In addition, the number of components and the feature vectors of the images are saved in database Dv.

Step 1: CBMIR algorithm for creating the Dv database:

for k = 1 : N                    /* N is the number of images in database Dd */
    read image k;                /* read the k-th image from database Dd */
    for i = 1 : P_k              /* P_k is the number of components of the k-th image */
        read component i;        /* read component i from database Ds (i.e., [Figure 3] P_i) */
        calculate C_i;           /* C_i is the centroid point of component region i */
        find S_i;                /* S_i is the signature of component i (i.e., [Figure 3] S_i) */
        calculate F_i1, F_i2, …, F_i8;
                                 /* eight feature values F_i of signature S_i, taken at the
                                    points (0, π/4, π/2, 3π/4, π, 5π/4, 3π/2, 7π/4) */
        save k, i, F_i1, F_i2, …, F_i8;
                                 /* save ten values for component i in database Dv */
    end
end
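Step 1 can be illustrated with a short Python sketch. The article does not define how the signature S_i is computed, so the sketch assumes the common centroid-distance signature (the distance from the centroid to the boundary as a function of angle) and approximates the centroid of the region by the mean of its boundary points. The function names `centroid`, `signature_features`, and `make_record` are hypothetical and are not taken from the CBMIR software.

```python
import math

# The eight sampling angles named in Step 1: 0, pi/4, ..., 7*pi/4.
ANGLES = [k * math.pi / 4 for k in range(8)]


def centroid(points):
    """Mean of the boundary points (an assumed stand-in for the region centroid C_i)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def signature_features(points):
    """Sample an assumed centroid-distance signature at the eight angles."""
    cx, cy = centroid(points)
    # Polar form (angle in [0, 2*pi), radius) of every boundary point.
    polar = [
        (math.atan2(y - cy, x - cx) % (2 * math.pi), math.hypot(x - cx, y - cy))
        for x, y in points
    ]

    def angular_gap(a, b):
        d = abs(a - b)
        return min(d, 2 * math.pi - d)

    features = []
    for theta in ANGLES:
        # Use the boundary point whose direction is closest to theta.
        _, radius = min(polar, key=lambda p: angular_gap(p[0], theta))
        features.append(radius)
    return features


def make_record(image_id, component_id, points):
    """Ten values per component: k, i, F_i1 ... F_i8."""
    return (image_id, component_id, *signature_features(points))
```

For a component whose boundary is a circle, all eight features returned by `signature_features` equal the radius, which is one way to sanity-check an implementation of this kind.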



Step 2: CBMIR algorithm for query matching:

input Q;                         /* Q is the query image, segmented into closed-contour form */
find F_q;                        /* calculate the feature vector of query image Q, as in Step 1 */
for j = 1 : N                    /* N is the number of images in database Dd */
    compare Q with image j;      /* search for the most similar images in database Dy */
                                 /* (compare number of components and feature vectors) */
end
report similarities;             /* report partially or completely similar cases */
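The query-matching step can be sketched in Python as follows. The article does not specify how components of the query are paired with components of a stored image, so this sketch assumes a simple rule: each query component is matched to its nearest stored component by Euclidean distance, and the mean of those distances is the image score. The function name `rank_images` and the "complete"/"partial" labels are illustrative assumptions.

```python
import math
from collections import defaultdict


def rank_images(query_components, records):
    """Rank stored images by similarity to the query.

    query_components: list of eight-value feature vectors, one per query component.
    records: iterable of (image_id, component_id, F1, ..., F8) tuples.
    Returns (score, image_id, kind) tuples sorted by increasing score.
    """
    # Group the stored component feature vectors by image.
    by_image = defaultdict(list)
    for image_id, _component_id, *features in records:
        by_image[image_id].append(features)

    results = []
    for image_id, components in by_image.items():
        # Assumed matching rule: nearest stored component for each query component.
        score = sum(
            min(math.dist(q, c) for c in components) for q in query_components
        ) / len(query_components)
        # Equal component counts are reported as a complete match candidate.
        kind = "complete" if len(components) == len(query_components) else "partial"
        results.append((score, image_id, kind))
    return sorted(results)
```

A stored image with extra components can still receive a low score under this rule, which corresponds to the partially similar cases mentioned in Step 2.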


All the images in the database are normalized into 400 × 400 pixel size images, to reduce the fuzziness of the


Two images are compared by the system for object similarities and their relationships, and outcome results are evaluated by referring to instructions of the radiologists.

  Indexing and Search Method Top

The images are coded with character strings of a length of six characters, as shown in [Figure 4]. For locating an image address in database Dd, these six character string codes, which are called class-codes, are used.
Figure 4: The structure of character string codes (class-code) of the images in database Dd


To search for the feature vectors of the components in database Dy, string codes with a length of eight characters are used [Figure 5].
Figure 5: The structure of character string codes of the features in database Dy


A special tree search technique called component feature codes (CF-Codes) is applied; the feature vectors are looked for by using a tree-based searching technique as displayed in [Figure 6].
Figure 6: Depiction of the CF-code tree-based search technique


For example, a query image is given and its CF-code is extracted by the system, as shown in [Figure 7], then the searching process will be followed through the nodes of the tree, as marked in [Figure 6].
Figure 7: An example of a CF code, consisting of class-code (02), sub-class-code (12), sub-sub-class-code (24), and component-code (09)


An appropriate software system is provided. Every time a query image is given to the system, the following steps are taken:

  1. Image is segmented into components
  2. The number of components is found
  3. Contours of the components are detected
  4. Signatures of the components are found
  5. The feature vector of the components is computed
  6. The class, sub-class, and sub-sub-class are suggested and class-code is extracted from database D d
  7. The CF-code of the image is provided
  8. The records with similar CF-codes in database D y are marked
  9. The records with the most similar feature vectors in database D y are found
  10. The related images in database D d that correspond to the marked records in database D y are detected
  11. The detected images are reported in order of their similarity.
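The CF-code lookup can be sketched in Python as below. The sketch assumes, based only on the example in [Figure 7], that a CF-code is the concatenation of four two-digit fields (class, sub-class, sub-sub-class, component), so that the first six characters form the class-code. The nested-dictionary tree and the names `make_cf_code`, `build_tree`, and `lookup` are hypothetical illustrations, not the data structure used by the CBMIR software.

```python
def make_cf_code(cls, sub, subsub, component):
    """Build an eight-character CF-code from four two-digit fields (assumed layout)."""
    parts = (cls, sub, subsub, component)
    if not all(0 <= p <= 99 for p in parts):
        raise ValueError("each field must fit in two decimal digits")
    return "".join(f"{p:02d}" for p in parts)


def build_tree(cf_codes):
    """Index CF-codes in a three-level tree: class, sub-class, sub-sub-class."""
    tree = {}
    for code in cf_codes:
        node = tree
        for start in range(0, 6, 2):
            node = node.setdefault(code[start:start + 2], {})
        # The leaf keeps the component codes stored under this class-code.
        node.setdefault("components", []).append(code[6:8])
    return tree


def lookup(tree, cf_code):
    """Follow the class-code of cf_code through the tree and return the leaf entries."""
    node = tree
    for start in range(0, 6, 2):
        node = node.get(cf_code[start:start + 2])
        if node is None:
            return []
    return node.get("components", [])
```

With this layout, the example of [Figure 7] would be produced by `make_cf_code(2, 12, 24, 9)`, and a lookup only visits records that share the query's class, sub-class, and sub-sub-class before any feature vectors are compared.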

  Feature Vector and Similarity Measure

Features are extracted as follows:

Let I be the set of images and V f be the vector space defined for feature f; then the function between I and V f is defined as (1):

$$E_f : I \rightarrow V_f \qquad (1)$$

For any extracted feature there exists a feature vector into which the images are mapped. All images are known by their related feature vectors. Therefore, many feature vectors have to be supported by the image database system, while different measurement strategies can be applied. There are different metric functions for determining the degree of similarity between images. A metric vector is defined as a tuple (V f , M f), where V f is a set of features and M f is a metric for calculating the similarity between a pair of given features in V f , as follows:

$$M_f : V_f \times V_f \rightarrow \mathbb{R}$$

where V f × V f is the Cartesian product between the features of the same vector.

Such that:

  1. M f (x, y) ≥ 0. Non-negativity
  2. M f (x, y) = 0, if and only if x = y. Identity
  3. M f (x, y) = M f (y, x). Symmetry
  4. M f (x, z) ≤ M f (x, y) + M f (y, z). Triangle inequality

If a linear combination of different metrics and many features are used, then better comparisons are expected to be achieved:

- Let E f be the feature extraction function of a feature f.

- Let x,y ∈ I be images.

- Let M f be a metric in the feature space V f .

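If the importance weight of feature f is defined as W f , and M denotes the linear combination of the metrics M f , then the similarity function of the features is defined as:

$$M(x, y) = \sum_{f} W_f \, M_f\big(E_f(x), E_f(y)\big)$$

In this study, the number of features in each feature vector is eight; therefore:

$$M(x, y) = \sum_{f=1}^{8} W_f \, M_f\big(E_f(x), E_f(y)\big)$$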

Measures can be evaluated using distance functions, and it is important to determine the most suitable function for each type of feature vector. The following metric has been experimentally evaluated. Let H and H′ be vectors, both with M elements, and let H m be the m th individual element.

The Euclidean distance is used to evaluate distances in n-dimensional vector spaces:

$$d(H, H') = \sqrt{\sum_{m=1}^{M} \left(H_m - H'_m\right)^2}$$

In this study, the number of elements M in each vector is set to eight features.
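As a minimal illustration of the measures described above, the following Python sketch computes the Euclidean distance between two eight-element feature vectors and a weighted linear combination of per-feature metrics. The choice of the absolute difference as the per-feature metric and the equal weights in the usage note are assumptions made for the example; the article does not state which weights W f were used.

```python
import math


def euclidean(h1, h2):
    """Euclidean distance between two feature vectors of equal length."""
    if len(h1) != len(h2):
        raise ValueError("vectors must have the same number of elements")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def combined_metric(x_features, y_features, weights, metric=lambda a, b: abs(a - b)):
    """Weighted sum of per-feature metrics, one weight W_f per feature f."""
    if not (len(x_features) == len(y_features) == len(weights)):
        raise ValueError("features and weights must have the same length")
    return sum(w * metric(a, b) for w, a, b in zip(weights, x_features, y_features))
```

For example, with eight features and equal weights of 1/8, `combined_metric` returns the mean absolute difference between the two vectors, while `euclidean` treats all eight elements as coordinates of a single point.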

The ten images most similar to the query image are selected from database D d . Among these ten selected images, the most similar image is found by using the Fourier descriptors method as follows:

The Fourier descriptors (FDs) can be used to match similar shapes even if they have a different size and orientation. If a(k) and b(k) are the FDs of two boundaries u(n) and v(n), respectively, then their shapes are similar if the distance

$$d(u_0, \alpha, \theta_0, n_0) = \min_{u_0,\, \alpha,\, n_0,\, \theta_0} \left\{ \sum_{n=0}^{N-1} \left| u(n) - \alpha\, v(n + n_0)\, e^{j\theta_0} - u_0 \right|^2 \right\}$$

is small. The parameters u 0 , α , n 0 , and θ 0 are chosen to minimize the effects of translation, scaling, starting point, and rotation, respectively. If u(n) and v(n) are normalized so that Σ u(n) = Σ v(n) = 0, then for a given shift n 0 , the above distance is minimum when

u 0 = 0

and α and θ 0 take their least-squares optimal values, which are expressed in terms of c(k), ψ k , and Φ, where a(k)b*(k) = c(k)e^{jψ k }, Φ ≜ -2π n 0 /N, and c(k) is a real quantity. These conditions give α and θ 0 , from which the minimum distance d is given by

$$d = \min_{\Phi} \, d(\Phi)$$

The distance d(Φ ) can be evaluated for each Φ = Φ (n 0 ), n 0 = 0, 1,…,N-1, followed by a minimum search to obtain d. The quantity d is then a useful measure of the difference between the two shapes.

The image with minimum difference is reported as the most similar image to the query image.
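The Fourier-descriptor comparison can be sketched in pure Python as below. This is an illustrative least-squares formulation under stated assumptions, not the authors' implementation: boundaries are given as equal-length lists of complex numbers x + jy, translation is removed by subtracting the mean (so that u 0 = 0), and for each starting-point shift n 0 the optimal scale and rotation are obtained together as a single complex gain. The helper names `dft` and `fd_distance` are hypothetical.

```python
import cmath
import math


def dft(z):
    """Naive discrete Fourier transform of a complex sequence."""
    n = len(z)
    return [
        sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fd_distance(u, v):
    """Minimum squared FD distance between two closed boundaries of equal length."""
    n = len(u)
    if len(v) != n:
        raise ValueError("boundaries must have the same number of points")
    # Remove the mean so that the translation term u_0 is zero.
    mean_u, mean_v = sum(u) / n, sum(v) / n
    a = dft([p - mean_u for p in u])
    b = dft([p - mean_v for p in v])
    energy_b = sum(abs(x) ** 2 for x in b)
    if energy_b == 0:
        raise ValueError("second boundary is degenerate")

    best = float("inf")
    for n0 in range(n):
        # DFT of v(n + n0): a starting-point shift is a linear phase on b(k).
        shifted = [b[k] * cmath.exp(2j * math.pi * k * n0 / n) for k in range(n)]
        # Complex gain alpha * exp(j * theta0) that minimizes the squared error.
        gain = sum(a[k] * shifted[k].conjugate() for k in range(n)) / energy_b
        d = sum(abs(a[k] - gain * shifted[k]) ** 2 for k in range(n))
        best = min(best, d)
    return best
```

Under these assumptions, a boundary compared with a translated, scaled, rotated, and re-indexed copy of itself gives a distance close to zero, while boundaries of different shape give a clearly larger value. The returned value is in unnormalized DFT units, so it is only meaningful for ranking candidates against the same query.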

  Experiments and Results

A dataset of about 5,000 simulated, but realistic computed tomography and magnetic resonance images (MRI) is used. In the medical field, there is a great amount of anatomical information gathered during the past centuries, which is well accepted by physicians and radiologists. The original images are selected from three different and large medical image databases, such that almost all classes of the category 'anatomical pathology' [Table 1] are covered.

As an example, a query image is given to the CBMIR's software system and the features are extracted as shown in [Table 2].
Table 2: Feature vectors of a query image with five components


The feature vector of the query image is compared with the feature vectors of the images in the dataset. The Euclidean distance is used to calculate the distances between the query image and the extracted images. The most similar images are reported. In [Table 3], the feature vector of the most similar extracted image is shown.
Table 3: Feature vectors of the most similar images to the query image


The two primary measures for evaluating the overall performance of a retrieval system are recall and precision. Recall is defined as the fraction of retrieved relevant images over the total number of relevant images in the database. Precision is defined as the fraction of relevant images retrieved over all the images retrieved by the system. The two measures are usually correlated in such a way that maximizing one deteriorates the other. For each query, the precision is computed and the accuracy of the method is measured.
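The two measures defined above can be computed per query as in the following Python sketch. The function name `precision_recall` and the use of image identifiers as set elements are illustrative assumptions.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers of the images returned by the system.
    relevant: identifiers of all relevant images in the database.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant images that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Averaging the first returned value over all queries gives an average precision figure of the kind reported for the experiments.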

Our experiments on the CBMIR system show that the average precision is more than 93%; that is, more than 93 percent of the retrieved images are correct.

For example, [Figure 8] shows a query image with four components [Figure 8]a. One of the images extracted by the CBMIR algorithm is image 2, with six components [Figure 8]b; four of its components are similar to those of the query image and two components are extra. Such a difference can be significant for experts from a diagnostic point of view.
Figure 8: (1) Query image with four components, and (2) extracted image with six components


In this article, we have proposed a novel content-based retrieval algorithm, CBMIR, that is robust to translation and scaling of objects within an image. CBMIR employs a novel technique in which each image is first decomposed into components. An efficient software system is used for illustrating the signatures and computing the feature vectors. Unlike traditional approaches, which are based on a single signature for each image, CBMIR builds a set of eight signatures for an image and stores the set of signatures as feature vectors in a grid-structured file. Experimental results on real-life sets show that the images retrieved by CBMIR are semantically more related to the query image than those retrieved by similar algorithms. In this article we have shown that the CBMIR system works efficiently and effectively for medical applications.

In future studies, we plan to apply this technique to different types of large and very large medical and biomedical image databases; the results will be presented in our next article.

  References

1. D. Comaniciu, P. Meer, and D. J. Foran, "Image-guided decision support system for pathology", Machine Vision and Applications, vol. 11, no. 4, pp. 213-223, 1999.
2. S. C. Orphanoudakis, C. Chronaki, and D. Vamvaka, "I-Cnet: content-based similarity search in geographically distributed repositories of medical images", Computerized Medical Imaging and Graphics, vol. 20, no. 4, pp. 193-207, 1996.
3. E. G. M. Petrakis and C. Faloutsos, "Similarity searching in medical image databases", IEEE Trans. on Knowledge and Data Engineering, vol. 9, no. 3, pp. 435-447, 1997.
4. A. H. Pilevar, M. Sukumar, A. R. Gowda, and E. T. Roy, "CBDIR: content-based dental image retrieval", Journal of Indian Orthodontics Society (JIOS), 2005.
5. A. H. Pilevar and M. T. Pilevar, "MGIR: an image retrieval method for medical and general image databases", 11th Joint Conference on Information Science (JCIS 2008), Dec. 15-20, 2008.
6. A. H. Pilevar and M. T. Pilevar, "CBRM: a new content-based regional-matching image retrieval method", Visualization, Imaging and Image Processing (IASTED 2008), Sept. 1-3, 2008.
7. K. P. Andriole, "The Society for Computer Applications in Radiology Transforming the Radiological Interpretation Process (TRIP) Initiative", White Paper (http://www.siimweb.org), November 2005. Last accessed: March 26, 2007.
8. A. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-based image retrieval at the end of the early years", IEEE Trans. Pattern Analysis & Machine Intel., vol. 22, no. 12, pp. 1349-80, 2000.
9. A. A. A. Youssif, A. A. Darwish, and R. A. Mohamed, "Content based medical image retrieval based on pyramid structure wavelet", International Journal of Computer Science and Network Security, vol. 10, no. 3, pp. 157-164, March 2010.
10. W. W. Chu, C.-C. Hsu, A. Cardenas, and R. K. Taira, "Knowledge-based image retrieval with spatial and temporal constructs", IEEE Trans. on Knowledge and Data Engineering, vol. 10, no. 6, pp. 872-888, 1998.

  Authors

Abdol Hamid Pilevar is an Assistant Professor in the Computer Engineering Department, Bu Ali Sina University, Hamedan, Iran. He received his B.Sc. and M.Sc. degrees in Computer Systems from Florida Atlantic University, Florida, U.S.A. Dr. Pilevar received his PhD degree in computer science from the University of Mysore, Mysore, India in 2005. He was a Research Associate and Post Doctoral Fellow at the Indian Institute of Science (IISC, Bangalore, India) and at Mediscan Prenatal Diagnosis & Fetal Therapy Center (Chennai, India) in 2005-2006. His fields of interest include Medical Intelligence & Image Processing, 3D Modeling, and Speech & Natural Language Processing. He has published more than 40 papers in international and national journals and conferences.

