A Discussion with Robert Haralick on
the Emergence of Deep Learning

28th December 2017

Dr. Robert Haralick is a leader in the field of computer vision. He has pioneered methods and paradigms that break problems down into smaller parts, such as feature extraction and subspace classification of data. He was a keynote speaker at the International Conference on Advances in Pattern Recognition (ICAPR) 2017, held at the Indian Statistical Institute, Bangalore. Our junior researcher Suryoday Basak was able to catch up with him and ask him a few questions on the emergence of deep learning and how emerging problems should be solved with the use of machine learning.

Suryoday Basak: You have contributed extensively to image processing, and a large portion of your work has been on smart feature engineering and extraction. This approach seems to be in contrast with methods of deep learning, which are becoming more popular by the day, where the features are automatically detected and the details are often lost or not looked at. What advantages or disadvantages do deep learning methods offer over more traditional machine learning methods?

Robert Haralick: I guess you’ve already stated one of the advantages – it’s that you can look at the last layer of a neural network and you can get some features out, so that’s an advantage. However, the calculations that you have to go through from the beginning to that layer are a large, extensive set of calculations and in some sense, you don’t know what those calculations mean. You can find what the feature means by your own pattern recognition system where you start to look at images and what the values in the images are so that you can observe some kind of correspondence. The techniques which start out from the features’ side and develop the feature, for example, like mathematical morphology features – these kinds of features where you do morphological transforms on an image, such as opening, or closing, or dilation, or erosion, or cascading of such operations – here the number of calculations is very small and it is based on the essence of shape. So the results that you get inherently have to do with shape because mathematical morphology is the study of shape. A deep learning method may come up with something at the end that is related to a mathematical morphology feature extractor but after it does so, you don’t know that it was a mathematical morphology feature extractor! So I would not make the generalization that the whole world is going to become deep learning! That certainly doesn’t mean throwing deep learning away. It has a place and it has a value but we must remember that understanding comes from being able to do something in the simplest way in which it can be done. That’s the principle of Occam’s razor and deep learning doesn’t have that. Feature extraction as it has been going on in image analysis for fifty years – every one of those methods has something that tells you why it actually works and what it means. I wouldn’t throw that away, and I don’t think that deep learning actually replaces that.
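
Editorial illustration (not part of the interview): the morphological transforms named in the answer above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not Dr. Haralick's own implementation: images are binary lists of rows, the structuring element is a fixed 3x3 square, pixels outside the image are treated as background, and the function names `erode`, `dilate`, `opening` and `closing` are chosen here for illustration.

```python
# Binary morphology on a list-of-rows image of 0/1 values.
# Assumptions: 3x3 square structuring element; pixels outside the image count as 0.

OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def _pixel(img, r, c):
    """Return img[r][c], or 0 (background) when (r, c) is outside the image."""
    if 0 <= r < len(img) and 0 <= c < len(img[0]):
        return img[r][c]
    return 0


def erode(img):
    """A pixel stays 1 only if its whole 3x3 neighbourhood is 1."""
    return [[int(all(_pixel(img, r + dr, c + dc) for dr, dc in OFFSETS))
             for c in range(len(img[0]))]
            for r in range(len(img))]


def dilate(img):
    """A pixel becomes 1 if any pixel in its 3x3 neighbourhood is 1."""
    return [[int(any(_pixel(img, r + dr, c + dc) for dr, dc in OFFSETS))
             for c in range(len(img[0]))]
            for r in range(len(img))]


def opening(img):
    """Erosion followed by dilation: removes shapes smaller than the element."""
    return dilate(erode(img))


def closing(img):
    """Dilation followed by erosion: fills holes smaller than the element."""
    return erode(dilate(img))


# A 3x3 square plus one isolated pixel at row 5, column 5.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
opened = opening(image)  # the isolated pixel is removed; the 3x3 square survives
```

Each output pixel depends only on a 3x3 neighbourhood, and the outcome has a direct reading in terms of shape: opening keeps only the regions that the square fits inside. That is the sense in which the number of calculations is small and the meaning of the feature is known in advance.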

Suryoday Basak: My next question is about something on your website that caught my attention. You mention that one of your research interests is to further the ‘science of computer vision’. Could you please elaborate on what you mean by the ‘science of computer vision’?

Robert Haralick: Yeah, that goal is actually a goal that I’ve had since the 1980s and you have to understand the context of that. When people first began to work on computer vision, there was a whole group of people that was doing it by demonstration. They would take a dataset, often small, and would fine-tune their method for that dataset, but the method was very brittle and it did not work when you extended it out. Also in the 1980s, I said that computer vision has to be posed as an optimization problem: if you haven’t gotten to the place where you’re posing it as an optimization problem, then you don’t fully understand what it is that you’re doing. I said this multiple times in various workshops and conferences, where the people from Stanford, Carnegie Mellon, Maryland, the Californian universities and all other places would challenge the statement that I made. Now from this point of view, we can go back to the earlier question on deep learning. Deep learning is posed as an optimization problem, but the downside is that it’s too complex. So what we’re looking for really is the simplest optimization problem that you can pose that gets you the required results. The ‘science’ part comes from the fact that you did it as an optimization problem. So let me tell you what the researchers from MIT or Stanford said to me. They said, “Haralick, you’re a dreamer.” That’s what they said; and I knew that all the things they were talking about were a fad. And that fad lasted from 1982 to 1985. And then there was a new fad – 1985 to 1987. And, well, one fad replaced another fad, and then that was replaced by another fad, and it was all a fad because it was never posed as an optimization problem.

I had a paper that was published in that same period of time, probably in the middle to late 1980s*. It was called ‘Computer vision theory: The lack thereof’. It was a widely referenced paper where I tried to refine this business that you had to make computer vision into an optimization problem. I even mentioned what kind of an optimization problem you have to make it into. That is what I regarded as the science – when you get to the place where you can define the optimization problem and solve it. That’s when you’ve accomplished the science.

Suryoday Basak: I’m just trying to imagine how you’ve gone from one stage to another. I suppose that it would be appropriately described as an emotional journey, where people are calling you a dreamer, as well as a technical one, where you fortified everything you believed in with the use of mathematics. It really is quite overwhelming! Here is my last question for you, and it concerns something for which a lot of people have been facing criticism of late. Often, a problem is looked at in a context-specific manner and solved using feature extraction and traditional machine learning. What has come to light of late is that if authors are not using the more edgy methods, their papers are less likely to get accepted. The problems that I’m talking about are, however, of a different flavor, wherein the size of the data is of the order of tens of megabytes, but there are inherent complexities and limitations of the data. What are your thoughts on using deep learning to solve such problems, or to solve problems in general, in a very non-context-specific sense and in small datasets?

Robert Haralick: I guess there’s some part of that question that needs to be answered first. I mentioned that what actually happens in the research area is that you get a sequence of fads and the fads last many years and after that there’s a new fad. And what happens is that different researchers jump on the bandwagon or the train of the fad. Concurrently, if you’re doing something that’s not related to what’s happening in the fad, then they’re going to downgrade you saying that you’re not doing good research. There’s an account in the history of science which is very much like this: it’s from a book that was published by Thomas Kuhn in the 1960s. The title of the book is ‘The Structure of Scientific Revolutions’. He didn’t call it a ‘fad’, he called it a ‘paradigm’. What he shows is that there’s always a ‘current paradigm’ and everybody locks on to the current paradigm. But the current paradigm doesn’t solve all the problems, it doesn’t match all of the observations – we’re talking about physics – and they start to look at the anomalies. Then they realize, “oh, there’s something wrong with the current paradigm” and then there’s a ‘new paradigm’. In the beginning, Thomas Kuhn says that all the people that are using the current paradigm even make personal attacks on other scientists saying that they need to diversify. Ultimately, when enough people work on a newer paradigm that satisfies all the requirements of all the data that was worked on earlier plus some new data, they constitute the next fad, which becomes the ‘new paradigm’ and everybody starts to join the train of that paradigm and the cycle just repeats again, and again, and again. It’s unfortunate that researchers don’t realize that this is the bigger picture.

Suryoday Basak with Dr. Robert Haralick

*Robert Haralick, 1986, Computer vision theory: The lack thereof, https://doi.org/10.1016/0734-189X(86)90082-4