Reviews

27 views

A compact color descriptor for image retrieval

The resource usage in Content-Based Image Retrieval is a frequently neglected issue. This paper describes a novel compact feature vector based on image color histograms in the HSL color space. The images are represented using only 10 bytes per image.
of 6
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Share
Transcript
  c  2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in anycurrent or future media, including reprinting/republishing this material for advertising or promotional purposes, creating newcollective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in otherworks.  A Compact Color Descriptor for Image Retrieval Vedran Ljubovic, Haris SupicUniversity of Sarajevo, Faculty of Electrical Engineering SarajevoZmaja od Bosne bb71000 Sarajevo, Bosnia and Herzegovinavljubovic@etf.unsa.ba, hsupic@etf.unsa.ba  Abstract —The resource usage in Content-Based Image Re-trieval is a frequently neglected issue. This paper describes anovel compact feature vector based on image color histogramsin the HSL color space. The images are represented using only10 bytes per image. It is shown that, in the context of Query-by-Example (QbE) usage scenarios, the method described achievesretrieval performance close to the state of the art image retrievalmethods that use considerably more memory. It is also shown thatthe described method outperforms other methods with similarmemory usage.  Keywords —  content-based image retrieval, histograms, his- togram distance, color spaces, HSL, Matsushita distance, query- by-example I. I NTRODUCTION A crucial element of any Content-Based Image Retrieval(CBIR) system is the feature extraction, a process that attemptsto represent a computer image with a limited amount of data. The created feature vector database represents collectionsof images. In the Query-by-Example (QbE) application suchdatabase can be searched for images that are most similar tothe given query image.Most CBIR systems include a feature vector that representsimage coloration. Color histograms are one of the earliest andbest known such color features [17].A large number of papers focus on getting the optimalretrieval performance using metrics common in informationretrieval science such as precision and recall. There arerelatively few papers that focus specifically on the featurevector size, such as [3]. Resource usage in CBIR systems isstill an important issue, even with the progress of computingtechnology [6].This paper explores a novel color descriptor that can beused alone or in combination with other descriptors for texture,layout etc. It features fast calculation and has low memoryusage. The paper also gives optimal approach to calculatingfeature distance. The latest datasets and performance metricsare used to compare our approach to other methods mentionedin literature, both state-of-the-art ones and those specificallydesigned for the lower memory usage. Overall the describedretrieval method seems deceptively simple, but a large numberof experiments (over 4000) were performed to tune all theparameters to reach the achieved result.The remainder of this paper is structured as follows:Chapter II gives an overview of the previous work in thisarea. Chapter III gives a detailed description of the algorithmsand methods used. Chapter IV explains the experimentalmethodology and apparatus, and presents experimental resultsin a brief discussion. Finally, Chapter V gives the conclusionand some directions for the future work.II. P REVIOUS  W ORK It is long known that large feature vectors created by thecommon retrieval methods are not usable in practical imageretrieval scenarios. In 2001 the MPEG-7 working group hasrecommended a number of image retrieval compact descriptorsthat can be used individually or in combination [9]. TheMPEG-7 effort also contributed to the image retrieval methodcomparison methodology.After this seminal work, researchers focused on bench-marking and suggesting improvements to MPEG-7, for exam-ple in [20]. The paper [5] proposes another novel method based on fuzzy processing of color histograms, while [2] combinessuch histogram with texture for a compact image descriptorthat combines both types of features. In [3] the same authorspropose an improved compact composite descriptor that pro-vides the best overall performance of all tested descriptors.The earliest work on using color histograms in imageretrieval is [17] which uses histograms in RGB color space.The RGB model is convenient for image representation incomputers because it models the way images are displayed onCRT monitors. A number of other papers on color histogramsalso use RGB color model (e.g. [4]).However, RGB model doesn’t correspond to the wayhumans perceive color. HSV color space is explicitly designedto model human color perception, and is therefore used in mostpapers on histograms as a global feature in CBIR (e.g. [3], [16], [21], [12]). It is also used for the Scalable Color Descriptor (SCD) feature of the MPEG-7 standard for multimedia contentdescription [9].A closely related color model is HSL, however there arefew papers that use this particular color model in the imageretrieval. It is commonly – and mistakenly – assumed thatthis model is identical to the HSV for the purposes of imageretrieval.None of the above color spaces is uniform. Papers [5]and [14] suggest that using a uniform color space suchas CIE L*a*b* or L*u*v* should deliver superior retrievalperformance. However, conversion from RGB to Lab requiresan extra step of conversion to XYZ model and a calculationof cube root, making it computationally expensive [3].Another problem with the color histograms in image re-trieval is that of dimensionality reduction. Computer images  are usually represented using 24 bits per pixel. Resultinghistogram with 16,7 million bins would typically use morememory than the image itself. Most papers use some approachto reduce the number of bins.The most common approach since the earliest work onhistograms [17] is the following: Each of color channels isfirst quantized by taking a number of most-significant bits.Then these bits are concatenated to form an index into a singlehistogram [3], [16]. Such histogram is called linked. For example, a commonly used quantization scheme isHSV 16*4*4 as is the case with MPEG-7 Scalable ColorDescriptor [9]. Here, value for Hue (H) is quantized as a 4-bitinteger (value in range [0,15]), while Saturation (S) and Value(V) are quantized as 2-bit integers (values in range [0,3]). Theresulting values are combined to form an index I to a singlebig histogram using the following formula: I   = 16  ·  H   + 4  ·  S   +  V   We see that I is an 8-bit integer (value in range [0,255]) andtherefore a histogram with 256 bins is obtained.Sometimes heuristic can be used to further reduce thenumber of colors. For example, in the HSV color model,gray can be described as any pixel where saturation (S) isbelow certain threshold [16]. Similarly, black is a color wherevalue (V) is below certain threshold. However, white colorcharacterization requires observation of both S and V. InHSL color model white is described simpler: any pixel wherelightness (L) is above the given threshold. Another problemwith the HSV model is that the human perception of color(hue) varies with saturation [16]. This problem likewise is lesspronounced in HSL model.The hue can be quantized by a formula [12], howevera better approach is to use the table of values for hue andtheir mapping to perceived colors [21]. Results can be furtherimproved by using fuzzy color processing [3], [5]. The issue of how histograms are represented in memory i.e.how many bits are used to represent each bin is surprisinglyfrequently omitted from papers. It seems that most papersassume using the native floating point type of a programminglanguage, typically of 32 bits size. MPEG-7 SCD uses onlyone bit per bin, which is processed using Haar transform [9].Our analysis of source code to LIRe, the open-source imageretrieval platform [8] revealed that its color histogram featureis linearly quantized to 10 bits per bin.A conclusion reached after a large number of experimentsis that, when using linear quantization of histogram bins,decreasing the number of bits per bin results in a rapidlydecreased retrieval performance [7]. Optimal number of bitsper bin is between 8 and 12 bits. However some papersuse nonlinear quantization. In CEDD and FCTH histogramis quantized to 3 bits per bin using coefficients obtained usingGustaffson Kessel fuzzy classifier [3].Finally, there must be a method for calculating distancebetween two histograms. Most relevant papers use the Manhat-tan distance (L1) or the Euclidean distance (L2), although laterwork suggests that the optimal distance metrics for histogramsmay be the Jeffrey divergence (Jensen-Shannon divergence) [4] Table I. E XPERIMENTALLY OBTAINED OPTIMAL QUANTIZATION FOR H UE AND ITS INTERPRETATION IN TERMS OF VISIBLE COLORS Hue (0-360) English language color name0-20 red20-70 yellow70-140 green140-210 cyan210-240 blue240-280 magenta280-360 pink  Figure 1. The palette of 10 basic colors used in the method described in thepaper or histogram intersection [12], [14]. A comprehensive study on the distance metrics for color histograms [7] found that usingthe Matsushita distance gives the best results, especially whenusing a large number of bits per bin and a small number of bins.III. D ESCRIPTION OF  R ETRIEVAL  M ETHOD Feature extraction starts with the image conversion to HSL(Hue-Saturation-Lightness) color model. Computer images aretypically represented using the RGB color model. Conversionto HSL is fast and direct compared to perceptually uniformcolor models such as the CIE L*a*b* used in [5]. Addedcomplexity of conversion to the CIE L*a*b* is not justifiedby improved retrieval performance [3].An important feature of the HSL color model is that black,white and gray can be characterized by extreme values of individual components (Lightness or Saturation respectively).In the more common HSV model, characterization of whitecolor requires that both Saturation be low and Value be high.Another consequence is that in the HSV model the effect of saturation on perceptible color can’t be ignored as is done inthis paper.Upon conversion, the color of each pixel is quantized asfollows: If Lightness is lower than a black threshold (BT) thecolor is black. If it is higher than a white threshold (WT)the color is white. Otherwise, if Saturation is lower then agray threshold (GT) the color is gray. If none of the aboverules apply, Saturation and Lightness are discarded and Hueis quantized into one of the seven values representing sevenbasic colors.Through a large number of experiments that can be foundon authors’ website 1 , the best retrieval results were obtainedwith BT=15, WT=95 and GT=15 assuming that S and L arevalues on interval [0,100]. The optimal quantization for Hueand its interpretation is given in Table I.Thus the whole image is represented using 10 basic colorsgiven in Fig. 1. 1 http://people.etf.unsa.ba/~vljubovic/etfis/   However, there are still certain nonlinearities in HSL colorperception. For example, let’s observe a pixel with H=100(green), S=20 and L=95. A human observer would not noticeany coloration in this pixel, i.e. it would be recognized assomewhere between gray and white, closer to white. However,if we increase S to 100, the pixel gets an unmistakable greentint. Similar tint is obtained if we keep S at 20 but reduce Lto 80.To address these nonlinearities, fuzzy color matching in-spired by [3], [5], [2] is used. A set of 20 rules is used with fuzzy antecedents and crisp consequents evaluated using theMulti-Participant algorithm. First the Lightness (L) is evaluatedto determine membership to the Black and White bins. Thenthe Saturation (S) is evaluated for membership to the Gray bin.Finally, the Hue is evaluated for membership into 7 colors.Through a large number of experiments, ideal values forfuzzy limits were established. The BT thus became a fuzzyvalue [6,17], WT is [60,96], while GT is [10,21]. Each of thecolor transitions in the Table I is given a 20 degree fuzzy zone.For example, colors with the Hue in range [0,10] are classifiedas red, [10,30] are between red and yellow, [30,60] is yellow,etc.A histogram is formed of these 10 basic colors having10 bins. Upon calculation the histogram is normalized andquantized to 8 bits per bin using the following formula: h N  ( i ) =   h ( i ) n  ·  2 8  where  h  is the starting histogram,  n  is the total number of pixels in image, and  h N   is the normalized and quantizedhistogram.The following important issue in information retrievalis that of calculating distance between feature vectors. Theretrieval method presented in this paper uses the Matsushitadistance which is defined with the following formula: D  =   i ( √  x i  − √  y i ) 2 Through a large number of experiments available onauthors’ website it was found that the Matsushita distancedelivers superior performance to other distance metrics suchas L1 distance, L2 distance, Jensen-Shannon divergence [4],histogram intersection [12], [14], etc. IV. E XPERIMENTAL  M ETHODOLOGY AND  R ESULTS Use of standardized, annotated datasets is an importantissue in the CBIR [10]. To this purpose, we used three datasetsconsisting of photograph collections that are often cited inliterature: Wang SIMPLYcity [19], UW [1] and UCID [15]. Wang dataset consists of 1000 images in 10 categories (100images each). The results are considered relevant if they arein the same category.UW dataset is accompanied with a "ground truth" filewhere all or most of the images are annotated using severalwords (tags) describing image content such as "trees", "sky","buildings" etc. A result is considered relevant if any of the tagsare common between the query image and the result image.Finally, the UCID dataset includes a "ground truth" filewhere a set of relevant results is given for each query image.For many images in this dataset only one relevant result isgiven. The consequence is that most CBIR methods exhibitlow precision but high recall in this dataset.Measuring is done through an exhaustive search i.e. eachimage in the dataset is used as a query image (images with noground truth are skipped). The query image itself is omittedfrom the results.A common retrieval methods comparison metric is thePrecision-Recall graph. Two methods are compared by over-laying their graphs [11]. This, however, is impractical forcomparing a large number of retrieval methods in manyconfigurations across several datasets. Preferred approach is tohave a simple numerical metric for each query, which is thenaveraged across a large number of queries. After evaluatingliterature, we chose the following metrics:Mean Average Precision (MAP) - mean value for precisionis calculated across the 11 usual points in Precision-Recallgraph [11], [18]. This is a standard measure used by NIST in its TREC contest.AP15 - average value for precision in the first 15 retrievedimages [11]. This metric simulates a search engine whichreturns results in pages of 10-20 results. Research suggeststhat most users don’t look beyond the first page [13].Average Normalized Modified Retrieval Rank (ANMRR)is explained in [9]. This is the standard metric used in theImageCLEF competition.In addition we wanted to gauge the ability of tested meth-ods to quickly retrieve results of higher relevance, especiallyin UW dataset. To this purpose we introduce relevance scoreas number of tags that are common between query image andretrieved image, divided by number of tags in query image: Relevance  =  Nr.common tagsNr.query tags Relevance is a value in range [0,1] where the highervalue corresponds to the higher relevance of each result. Thusweighted precision (WP) can be calculated using the followingformula: WP   =  N i =1  Relevance i N  where N is number of retrieved images. This allows us tocalculate average weighted precision (AWP15) as a metriccomparable to AP15 given above.The method presented in this paper is compared to anumber of CBIR methods described in literature.The most common type of the color feature is histogramin the RGB color space where the three color components arecombined (or linked) so that the three most significant bitsare used from each component and then concatenated givinga total of 9 bits. Thus the resulting histogram has 512 bins.  Table II. E XPERIMENTAL RESULTS . F OR  MAP  AND  AWP15  METRICS MORE IS BETTER ,  FOR  ANMRR  LESS IS BETTER . R IGHT COLUMN FOR EACHMETRIC GIVES RANKING  ( E . G . 1  IS BEST PERFORMING METHOD ). T IMES ARE IN SECONDS . “E XPER .  TIME ”  IS TIME REQUIRED TO PERFORM THEEXPERIMENT  ( EXHAUSTIVE SEARCH IN ALL DATASETS ). Dataset WANG UW UCIDFt. size Index time Exper. timeMethod MAP ANMRR MAP AWP15 ANMRR MAP ANMRR This paper  0.4977 2 0.4188 4 0.6307 1 0.3738 6 0.2601 3 0.3621 1 0.5612 1 10 79 83RGB 0.4733 5 0.4274 5 0.6114 6 0.3820 3 0.2643 7 0.2369 5 0.6380 5 512 36 54MPEG-7 SCD 0.3108 8 0.5994 8 0.6048 7 0.3210 8 0.2604 4 0.1223 8 0.8272 8 ~32 59 58MPEG-7 CLD 0.4365 7 0.4684 7 0.5971 8 0.3239 7 0.2819 8 0.1683 6 0.7557 7 8 36 40CEDD 0.4982 1 0.4080 1 0.6155 5 0.3851 2 0.2615 6 0.2718 3 0.5915 3 54 94 100C.CEDD 0.4872 4 0.4183 3 0.6186 3 0.3797 4 0.2600 2 0.2547 4 0.6133 4 23 93 93FCTH 0.4906 3 0.4107 2 0.6185 4 0.3936 1 0.2604 4 0.2742 2 0.5892 2 72 112 106C.FCTH 0.4699 6 0.4277 6 0.6198 2 0.3768 5 0.2577 1 0.1287 7 0.7472 6 30 101 109 The histogram is then quantized to 10 bits per bin. The L1distance is used for distance metric.A very compact color feature based on the color histogramsis Scalable Color Descriptor (SCD) which is a part of MPEG-7specification for multimedia content description [9]. Anothercompact feature used in the MPEG-7 is Color Layout De-scriptor (CLD) which is not derived from the color histogramsbut uses Discrete Cosine Transform (DCT) to create a schemeof spatial distribution of color in an image. SCD and CLDare evaluated using implementations given in the LIRe open-source image retrieval platform [8]. The configuration for SCDis the default used in LIRe: 256 coefficients, resulting in cca.32 bytes per image whereas CLD always uses 8 bytes perimage [20].There are several proposals for improving MPEG-7 per-formance maintaining a compact vector. We used CEDD [3]and FCTH [2], as well as their compact versions – C.CEDDand C.FCTH, as state-of-the-art methods for comparison withour method. These methods combine color histograms using24 bins (standard versions) and 10 bins (compact versions)with texture features (FCTH and C.FCTH) and edge histogram(CEDD and C.CEDD) respectively. Again, implementations inLIRe were used in experiments.The experimental results are presented in Table II. For theMAP and AWP15 higher values mean better results, whilefor the ANMRR lower values correspond to better retrieval.In addition to performance metrics described above, featuresize and timing is given. The total time for indexing allthree datasets and for performing all experiments (exhaustivesearch) is given on an Intel Core i5 PC. A highly efficientimplementation in C++ is used for the method presented inour paper.The Table II shows that the method presented in thispaper outperforms the common RGB histogram which usesexponentially larger amount of memory. Our method alsoconsistently outperforms the similar MPEG-7 SCD, as wellas the MPEG-7 CLD. The state-of-the-art CEDD and FCTHmethods deliver better retrieval performance at greater memoryusage, however it must be noted that these methods combinecolor information with texture, whereas method in this paperuses just color information.V. C ONCLUSIONS AND FUTURE WORK The method presented in this paper does not give the bestpossible retrieval performance, nor is it unique in its smallfeature vector size. However, it does represent a very rea-sonable compromise between the two requirements. The bestoverall results are obtained using CEDD, followed by FCTH.However, these methods have considerably larger feature sizeand indexing time compared to method presented in this paper.The principal contributions of this paper are: use of theHSL color space and corresponding new coefficients for fuzzycolor matching, and use of the Matsushita distance. Usinglatest methodology and datasets, we have shown that themethod presented in this paper gives superior performanceto other methods that have similarly small feature vector. Itwas also shown that this method gives results comparable toretrieval methods that use considerably more memory.Therefore, it should be interesting to combine the colorfeature described here with some texture feature in the mannersimilar to that described in [3].Given very low resource usage for this method it shouldbe especially interesting to apply such approach in JPEGcompressed domain [6].Another problem that should be addressed is that of theefficient search of feature vectors. It is evident from timingsin Table II that all tested methods use linear search for findingthe image with the lowest distance, which is unacceptable forsearching large image collections (millions of images) or videocollections.R EFERENCES[1] Annotated groundtruth database. Department of Com-puter Science and Engineering, University of Wash-ington. Retrieved on February 13th, 2013 from:http://www.cs.washington.edu/research/imagedatabase/groundtruth/.[2] Savvas A Chatzichristofis and Yiannis S Boutalis. FCTH: Fuzzycolor and texture histogram – A low level feature for accurate imageretrieval. In  Image Analysis for Multimedia Interactive Services, 2008.WIAMIS’08. Ninth International Workshop on , pages 191–196. IEEE,2008.[3] Savvas A Chatzichristofis, Konstantinos Zagoris, Yiannis S Boutalis,and Nikos Papamarkos. Accurate image retrieval based on compactcomposite descriptors and relevance feedback information.  Inter-national Journal of Pattern Recognition and Artificial Intelligence ,24(02):207–244, 2010.[4] Thomas Deselaers, Daniel Keysers, and Hermann Ney. Features forimage retrieval: An experimental comparison.  Information Retrieval ,11(2):77–107, 2008.[5] K Konstantinidis, A Gasteratos, and I Andreadis. Image retrievalbased on fuzzy color histogram processing.  Optics Communications ,248(4):375–386, 2005.[6] Vedran Ljubovic and Haris Supic. Issue of resource usage in content-based image retrieval algorithms. In  Telecommunications (BIHTEL),2012 IX International Symposium on , pages 1–5. IEEE, 2012.
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks