Content-Based Image Retrieval using Deep Learning (Master's thesis)

Καπλάνογλου, Παντελής Ι. (Kaplanoglou, Pantelis I.)


Deep Convolutional Neural Networks (CNNs) have opened new perspectives in Computer Vision and have recently been applied to Content-Based Image Retrieval (CBIR). Nevertheless, their applicability to real-world image recognition in Robotics and Medical Imaging remains an open research question. Further improvements in accuracy, combined with reductions in computational cost, are key issues that future proposals need to address. This Master's thesis proposes a new approach by introducing the BioCNN. The Bio-inspired Convolutional Neural Network is a novel architecture that imitates the human visual system, starting from the photoreceptors and neurons of the retina, passing through the lateral geniculate nucleus (LGN), and ending with the V1 cells of the primary visual cortex. The network is trained for classification with gradient descent, using a new variable-learning-rate technique called Throttled Gradient Descent (TGD). Generalizing the various Bag of Visual Words (BoVW) approaches, the term Visual Features Language (VFL) is used to describe an image representation made of visual words that encapsulate the image's local features. The set of feature descriptors for the image regions is assembled from the activations of a BioCNN convolutional layer. This set is then clustered into visual words using the Rooting k-Means (RkMeans) algorithm, a variation of Hierarchical k-Means (HkMeans) with a variable branching factor. Visual feature fusion is implemented by assembling the different VFLs into a single textual image representation that is indexed by a text search engine.
The aim of this thesis is to explore this research field and conduct an initial study using the BioCNN-CBIR pipeline. In this study, the BioCNN architecture showed promising results on CIFAR10, achieving performance similar to ImageNet-winning architectures with an order of magnitude fewer network parameters.
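The abstract names the vocabulary step (clustering convolutional-layer descriptors into visual words with a hierarchical k-means variant) without detailing it. As an illustration only, here is a minimal sketch of plain hierarchical k-means in which the branching factor may differ per tree level; reading "variable branching factor" as a per-level schedule is our assumption, and this is not the thesis's RkMeans, whose splitting rule is not described on this page. Descriptors are plain lists of floats standing in for BioCNN activations, and every function name (`build_tree`, `visual_word`, `image_to_document`) is hypothetical.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of descriptors."""
    return [sum(c) / len(points) for c in zip(*points)]


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every descriptor.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return centroids, labels


class Node:
    def __init__(self, centroid: List[float]):
        self.centroid = centroid
        self.children: List["Node"] = []


def build_tree(points: List[Vector], branching: List[int], depth: int = 0) -> Node:
    """Hierarchical k-means; branching[depth] children per node (assumed schedule)."""
    node = Node(mean(points))
    if depth >= len(branching) or len(points) < branching[depth]:
        return node  # a leaf corresponds to one visual word
    _, labels = kmeans(points, branching[depth])
    for j in sorted(set(labels)):  # skip clusters that ended up empty
        members = [p for p, lab in zip(points, labels) if lab == j]
        node.children.append(build_tree(members, branching, depth + 1))
    return node


def visual_word(root: Node, descriptor: Vector, prefix: str = "w") -> str:
    """Descend greedily to the nearest child; the index path names the word."""
    node, path = root, []
    while node.children:
        j = min(range(len(node.children)),
                key=lambda i: sq_dist(descriptor, node.children[i].centroid))
        path.append(str(j))
        node = node.children[j]
    return prefix + "_".join(path)


def image_to_document(root: Node, descriptors: List[Vector], prefix: str = "w") -> str:
    """One token per local descriptor: the image becomes a 'visual document'."""
    return " ".join(visual_word(root, d, prefix) for d in descriptors)
```

For example, `build_tree(points, [2, 3])` splits the descriptor set into 2 groups and each group into up to 3 leaves, so a descriptor maps to a token such as `w0_2`. A greedy descent costs one distance per child per level instead of one per leaf, which is the usual reason for using a tree-shaped vocabulary on large descriptor sets.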
Institution and School/Department of submitter: School of Technological Applications / Department of Informatics Engineering (Σχολή Τεχνολογικών Εφαρμογών / Τμήμα Μηχανικών Πληροφορικής)
Subject classification: Deep learning (Machine learning)
Content-based image retrieval
Image reconstruction--Digital techniques
Image processing--Digital techniques
Keywords: Deep Learning; Image Recognition; Information Retrieval; Machine Learning
Description: Master's thesis - School of Technological Applications - Department of Informatics Engineering, 2017, serial no. 8817
URI: http://195.251.240.227/jspui/handle/123456789/13728
Item type: masterThesis
General Description / Additional Comments: Master's thesis
Submission Date: 2021-12-23T15:02:11Z
Item language: en
Item access scheme: free
Publication date: 2017-07-05
Bibliographic citation: Kaplanoglou, Pantelis I., Content-Based Image Retrieval using Deep Learning Methods (Greek title: Ανάκτηση Εικόνων με Βάση το Περιεχόμενο με χρήση Μεθόδων Βαθιάς Μάθησης), School of Technological Applications / Department of Informatics Engineering, International Hellenic University, 2017
Abstract: Deep Convolutional Neural Networks (CNNs) have created new prospects for Computer Vision and have recently been applied to Content-Based Image Retrieval (CBIR). However, their real-world applicability to image recognition in Robotics and Medical Imaging is still an open subject for investigation. Further improvements in accuracy, combined with a reduction in computational cost, are the main problems that future proposals must address. This Master's thesis proposes a new approach by introducing the BioCNN. The Bio-inspired Convolutional Neural Network is a novel kind of architecture that mimics the human visual system, starting from the photoreceptors and the neurons of the retina, passing through the lateral geniculate nucleus (LGN), and ending at the V1 cells of the primary visual cortex. The network is trained for classification with gradient descent, using a new variable-learning-rate technique called Throttled Gradient Descent (TGD). Generalizing the various Bag of Visual Words (BoVW) approaches, the term Visual Features Language (VFL) is used to describe an image representation made of "visual words" that contain the image's local features. The set of feature descriptors for the image regions is assembled from the activations of a BioCNN convolutional layer. It is then clustered into visual words by Rooting k-Means (RkMeans), a variant of Hierarchical k-Means (HkMeans) with a variable branching factor. Visual feature fusion is implemented by assembling the different VFLs into a unified image representation, which is indexed by a text search engine.
The aim of this thesis is to explore the research field and to conduct an initial study using the BioCNN-CBIR pipeline. During this study, the BioCNN architecture showed promising results, reaching performance on CIFAR10 similar to ImageNet-winning architectures while using an order of magnitude fewer network parameters.
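The final pipeline step in the abstract (fusing several VFLs into one textual representation and indexing it with a text search engine) can be illustrated with a toy vector-space model. The sketch below is an assumption-laden stand-in, not the thesis's setup: the table of contents lists the Terrier IR search engine, whose API and weighting model are not reproduced here. Tokens from each VFL are prefixed with a hypothetical VFL name (e.g. `L3`) so vocabularies built from different layers cannot collide; documents are weighted with smoothed TF-IDF and ranked by cosine similarity. The names `fuse` and `VisualIndex`, and token strings such as `w0_1`, are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def fuse(vfl_docs: Dict[str, List[str]]) -> str:
    """Concatenate the token lists of several VFLs into one text document.

    Each token is prefixed with its VFL name so that words from different
    vocabularies stay distinct in the shared index.
    """
    return " ".join(f"{name}_{tok}"
                    for name, toks in sorted(vfl_docs.items())
                    for tok in toks)


class VisualIndex:
    """Toy vector-space index: TF-IDF weights, cosine-similarity ranking."""

    def __init__(self) -> None:
        self.docs: Dict[str, Counter] = {}

    def add(self, image_id: str, text: str) -> None:
        self.docs[image_id] = Counter(text.split())

    def _idf(self, term: str) -> float:
        df = sum(1 for tf in self.docs.values() if term in tf)
        return math.log((1 + len(self.docs)) / (1 + df)) + 1.0  # smoothed IDF

    def _weights(self, tf: Counter) -> Dict[str, float]:
        return {t: count * self._idf(t) for t, count in tf.items()}

    @staticmethod
    def _norm(v: Dict[str, float]) -> float:
        return math.sqrt(sum(w * w for w in v.values()))

    def search(self, text: str, top_k: int = 5) -> List[Tuple[str, float]]:
        q = self._weights(Counter(text.split()))
        q_norm = self._norm(q)
        scored = []
        for image_id, tf in self.docs.items():
            d = self._weights(tf)
            d_norm = self._norm(d)
            dot = sum(w * d.get(t, 0.0) for t, w in q.items())
            score = dot / (q_norm * d_norm) if q_norm and d_norm else 0.0
            scored.append((image_id, score))
        # Highest cosine first; ties broken by id for a stable order.
        scored.sort(key=lambda s: (-s[1], s[0]))
        return scored[:top_k]
```

Querying with a fused document identical to an indexed one returns that image first with a cosine score of about 1.0, while images sharing only some visual words score strictly between 0 and 1. Because fusion happens at the token level, adding another VFL only lengthens the document; the index itself needs no change. A real engine applies its own tokenizer, so the prefix separator should be one that survives it.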
Table of contents:
1 Introduction
  1.1 Image Recognition and Retrieval
  1.2 Deep Learning
  1.3 Biological Inspiration and the Vision for AI
  1.4 Contributions of the Thesis
  1.5 Outline of the thesis
2 Image Recognition with Deep Learning
  2.1 Introduction
  2.2 The Image Recognition Problem Setting
    2.2.1 Digital Imaging and Digital Image Representation
    2.2.2 Image Recognition and Image Retrieval
    2.2.3 Properties and Quality Factors of an Image Recognition System
  2.3 Deep Learning
    2.3.1 Diversity of Deep Networks, Groups and Meta-Groups
    2.3.2 A Short History of Deep Learning for Image Recognition
  2.4 CNN for Image Classification
    2.4.1 Convolution Operation on Color Images
    2.4.2 Convolutional Layers
    2.4.3 Pooling Layers
    2.4.4 Activation Functions
    2.4.5 Pre-training a CNN for Classification
  2.5 State-Of-The-Art CNN architectures
    2.5.1 Advances in Computer Vision through ILSVRC
    2.5.2 The CNN architecture evolution through ILSVRC
  2.6 Future Research Directions for CNNs
    2.6.1 Evaluation of the Current State-of-the-Art
    2.6.2 Open Research Subjects for Deep Convolutional Neural Networks
    2.6.3 Alternative Directions
3 Describing Images in a Language of Visual Features
  3.1 Introduction
  3.2 Content-based Image Retrieval using a Visual Features Language
    3.2.1 Image Retrieval based on Visual Features
    3.2.2 Text Retrieval and the Bag-of-Words
    3.2.3 Vector Space Model for Information Retrieval
    3.2.4 Image Retrieval using Bag-of-Visual-Words
    3.2.5 The Visual Features Language (VFL)
    3.2.6 Handcrafted Feature Extractors
    3.2.7 Creation of the Visual Vocabulary
  3.3 Using Convolutional Neural Networks for CBIR
    3.3.1 The Deep Neural Content-based Image Retrieval Pipeline
    3.3.2 DNN Feature Extractors before ILSVRC2012
    3.3.3 Review of CNN-based CBIR
    3.3.4 Open Research Subjects for CNN-CBIR
  3.4 Bio-inspired Models for Universal Visual Features
    3.4.1 Intuition from ZFNet Visualization
    3.4.2 Inspiration from the Human Visual System
    3.4.3 Foundations of the Bio-inspired Convolutional Neural Network
    3.4.4 Models Inspired by the Human Visual System
4 The Bio-inspired Convolutional Neural Network (BioCNN)
  4.1 Introduction
  4.2 Overview of the Bio-Inspired CNN
    4.2.1 Inspiration
    4.2.2 Motivation
    4.2.3 Hypotheses
    4.2.4 Architectural Overview
  4.3 Artificial Retinal Neural Network (ARNN)
    4.3.1 Color Space Transformation with the Photoreceptors Abstraction
    4.3.2 Modeling the Outer Plexiform Layer
    4.3.3 Modeling the Inner Plexiform Layer
  4.4 Artificial Primary Visual Cortex (APVC)
    4.4.1 Lateral Geniculate Nucleus
    4.4.2 Gabor V1 Simple Cells
    4.4.3 Characteristics of BioCNN stem output
    4.4.4 Higher Visual Cortex Convolutional Layers
5 Creating a VFL Vocabulary from BioCNN Features
  5.1 Introduction
  5.2 Convolutional Layer Activations as Features
    5.2.1 Semantics of Neural Activations
    5.2.2 Assemblage of the Descriptor Set
  5.3 High Dimensionality Clustering on Large Descriptor Sets
    5.3.1 Hierarchical k-Means
    5.3.2 Rooting k-Means
    5.3.3 Visual Word Assignment
    5.3.4 Extensibility into a Visual Features Language Grammar
  5.4 Indexing the Image as a Visual Document
    5.4.1 Composing a Visual Document
    5.4.2 Using text for the Visual Document image representation
    5.4.3 Indexing an Image Collection
6 BioCNN-CBIR Evaluation
  6.1 Introduction
  6.2 Metrics for evaluating CNN-CBIR models
    6.2.1 The Complexity of Evaluating CNN-CBIR
    6.2.2 Supervised Classification Metrics
    6.2.3 Custom Metrics for Monitoring the Training Process
    6.2.4 Metrics for the Creation of a Visual Features Language
    6.2.5 Information Retrieval Metrics
  6.3 Evaluation System for Experiments
    6.3.1 Training, Validation and Test Sets
    6.3.2 10-fold Cross Validation
    6.3.3 Gradient Descent Learning
    6.3.4 Minibatches, Sample Shuffling and Subfolds
    6.3.5 Visual Features Language Setup
    6.3.6 Image Retrieval Environment
  6.4 Image Datasets
    6.4.1 CIFAR10 and CIFAR100 dataset
    6.4.2 Caltech 101 dataset
    6.4.3 The Caltech101-2017 randomly chosen image collection
    6.4.4 The HIQ-1 Handpicked Images Set
  6.5 Experiments Setup
    6.5.1 Classification Training Process
    6.5.2 Nominal TGD Gradient Training Hyperparameters
    6.5.3 AlexNet / ZFNet architecture reproduction
    6.5.4 BioCNN Architectures for CIFAR10
    6.5.5 BioCNN Architectures for Caltech101-2017
  6.6 Image Classification Results
    6.6.1 Classification Results on CIFAR10
    6.6.2 Classification Results on Caltech101
  6.7 VFL Creation and Image Retrieval
    6.7.1 VFL Creation Experiments
    6.7.2 Image Retrieval Experiments
  6.8 Discussion
    6.8.1 Best Performance on CIFAR10 Using a Fraction of Parameters
    6.8.2 Confirming the Transferability of BioCNN Features
    6.8.3 Competitive Performance on Caltech101-2017
    6.8.4 Feature Quality of Sub-Optimal Classifiers
    6.8.5 Benefits from the Visual Features Language Concept
    6.8.6 Understanding of the CNN-CBIR complexity
7 Software Engineering Details
  7.1 Three Tier Software Architecture
  7.2 Machine Learning R&D Environment
    7.2.1 Accelerated Computing with CUDA
    7.2.2 Google Tensorflow, Python
    7.2.3 SciKit-Learn, MatPlotLib
    7.2.4 Terrier IR Search Engine
    7.2.5 TALOS Framework
    7.2.6 Data Memory Tier
8 Conclusion
  8.1 Summary
    8.1.1 Introducing the BioCNN Deep Neural Network
    8.1.2 Content-Based Image Retrieval with the Visual Features Language
    8.1.3 Software Implementation of a Web 4.0 Intelligence Tier
  8.2 Future Perspectives
9 Bibliography
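Advisor name: Διαμαντάρας, Κωνσταντίνος (Diamantaras, Konstantinos)
Examining committee: Διαμαντάρας, Κωνσταντίνος (Diamantaras, Konstantinos)
Publishing department/division: School of Technological Applications / Department of Informatics Engineering
Publishing institution: International Hellenic University (IHU)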
Number of pages: 212
Appears in Collections: Master's Theses

Files in This Item:
File | Description | Size | Format
Pantelis I. Kaplanoglou.pdf | Kaplanoglou, Master's thesis | 9.29 MB | Adobe PDF




Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.