Aesthetic Quality Inference Engine – “Intelligent, Unbiased and Instant Assessment of Photos”
Science, snake oil or just a bit of fun?
Last week Terry Love alerted the PhD-Design community to the ACQUINE web tool for rating the “aesthetic” quality of images. Obviously this kind of thing is a red rag to most of us in the art school tradition, but it does point to some serious questions. My reactions went from having a bit of fun with the tool, to questioning its credentials, and finally to a feeling that it pointed to a valid direction for science to explore as a long-term inquiry, but also to a rich well of snake oil right now for those out to make money from gullible businesses. Most of what I’ve said below was originally posted in four recent messages to the PhD-Design discussion list at JISCmail.ac.uk.
Round 1 – Having fun with Acquine
I started out by trying the online tool to assess my personal beauty.
First I tried two images, one of me in my office with a green chair and grey table and the same photo cropped to just show head and shoulders.
The complete photo received 43.1% but the cropped one 31.4% indicating that my office furniture may be better looking than me. Or maybe that we should take some office furniture along as an accessory when going on a blind date to improve that vital first impression.
I then tried a different head and shoulders shot and a very close crop where my face completely fills the photo.
This time the head and shoulders was rated 36.6% but the “in your face” shot received a thumping 60.9%. So maybe I should stand very close to people if I want them to find me attractive.
A cartoon of me drawn by a student who was bored in one of my lectures (Reuben Wu, last heard of in Boston, Massachusetts, in a band called Ladytron) received a measly 12.3%, and a 1983 photo of my fellow students at Coventry Polytechnic, laughing on the top deck of a tram at a transport museum, was rated 18.1%.
Back in the real world, both of these aesthetically challenged items evoke very warm feelings in me.
I felt that this very entertaining gadget reveals far more about some scientists than it does about aesthetics.
Round 2 – So what’s the problem?
Following my rather facetious comment, Bengi Turgan posted a helpful explanation of some of the thinking behind ACQUINE, so I thought I should say something more serious about this interesting project.
In recent years a great deal of attention has been paid to the possibility of treating affect as something that can be engineered. Those who try to do this, often using the most carefully considered psychological and statistical principles, sometimes suffer from a fundamental misapprehension that “aesthetics” is something material and fixed like hydraulics whose behaviour can be predicted reliably from past performance and fairly simple basic principles.
What I had hoped to say in a humorous way was that the affect of complex artefacts, what my film-making colleagues might call our ‘visceral’ response to an experience, is dependent on a context which is constantly changing and subject to an enormous number of factors. The ACQUINE project seems to be attempting to predict the future based on a limited sampling of individual responses taken out of context. It has some of the characteristics of those stock market trading programs that all make the same judgement at the same time and therefore all contribute to mutual collapse when the system tips out of balance. An early paper from the ACQUINE team (there are three available on their website) indicates that they looked at some theoretical discussions of aesthetics and concluded that the picture was too confusing. In their most recent publication they say simply that
We leave aside subjectivity for now and consider aesthetic attributes to be a consensus measure over the entire population (Datta et al 2008)
So although they acknowledged early on that semantics play a part, they do not seem to be worried about the dynamic nature of semantics, or the quite profound differences that can exist between the perceptions of different groups and individuals, seeing these as secondary factors that might modify the ‘big picture’ but not invalidate it.
Meanwhile, I’m not saying this work has no value. I just think it would need a hugely complex and powerful artificial intelligence machine, capable of monitoring and modelling the whole of human society and its material activities and making valid predictions for all of that stuff, before it could hope to make reliable predictions for affect. Even then it would have to be capable of modelling and rating all the possible combinations of possible designs and other developments in the relevant future before it could be reliable. The social implications of such a machine go well beyond the minor concerns of designing appealing gadgets.
Of course such systems are already used for fine tuning immediate affect in established designs, such as the colour or typography of a package. But they are no defence against somebody coming up with a fresh idea that makes the whole proposition of the old idea look stale.
Round 3 – So does it have any value at all?
Following that second comment Keith Russell wrote:
While I agree with the general thrust of your comments about “treating affect as something that can be engineered”… I am concerned that the proving of the difficulty might also be used as a general excuse for designers to slip into fine art mode. I argue that there is much more than can be known about affect than we culturally allow.
Of course he’s right. I think I am responding to the hubris of the (quite large number of) examples that I have seen where people make extremely facile assumptions about how quickly you can move from a valid scientific exploration of affect to developing practical tools.
So my ideal all-knowing artificial intelligence may be an overstatement, but it is also a challenge and maybe even an optimistic prediction (I’m a fan of Iain M. Banks). I’d be interested to see an argument for how something that is less than all-knowing could make useful predictions.
At the moment I believe there are techniques which allow marketeers to work out which shade of red might persuade somebody to buy a particular bottle of gin this week. However these techniques do not cope with the effect of the almost endless range of possibilities for new designs of bottles and labels that your competitors might throw at you the week after. And that doesn’t go anywhere near the altered perceptions that might be induced by future changes in supermarket decor, the typography of new government health warnings, a popular movie about a gin-swilling terrorist or any of a squillion other possible influences that we might not be able to predict right now.
Meanwhile I was talking only yesterday to a marketeer who was very persuasive on the point that business people really like the apparently reliable numbers that come out of affective engineering studies. He didn’t really understand the methodologies involved but he was impressed by the academic credentials of the affect engineers and the way their work made it easier to sell design and marketing advice to industry.
So let’s do the science by all means but right now there’s too much close-to-market snake oil for my liking. These days, every time an engineer uses the word “aesthetics” I’m strongly inclined to reach for my gun.
Round 4 – What about the science?
After I had made these comments, Don Norman responded with a strong defence of the value of the ACQUINE project as well as strong criticisms of the way that I had used the term “visceral”. (I subsequently added the caveat above about my usage following that of my film-making colleagues rather than Don’s usage from psychology.)
Don included the statement: “But the work itself is solid. Someday, research of this sort, will be very useful.” I feel uncomfortable with this. I agree that research of this sort will be useful but the solidity of this particular project has to be questioned. I am sure it has some real strength in the way it is developing machine vision and artificial intelligence techniques but there are some flaws in the basic assumptions about aesthetics, revealed in the ACQUINE publications.
First, as I’ve indicated above, they adopt a key concept: that it is possible to isolate some kind of universal aesthetic value not dependent on ‘semantics’. Don Norman points out that humans do have some universal responses like “fear of heights, darkness, crowds”, but I expect he would also agree that these influences can be moderated by context. We have a capacity to completely reverse our response and, in some circumstances, to value images that, for example, convey fear. This is quite evident in many of the images on the project website. I can’t help feeling that the ACQUINE approach has been to just ignore the non-universal factors.
There is internal evidence to support this in one of their papers (Datta et al 2006). The website, photo.net, used by professional and amateur photographers, includes aesthetic ratings provided by its members reviewing each other’s work and these ratings were used to develop the ACQUINE engine. The first problem is in their assertion that while the professional photographers might focus on technical detail the amateurs represent “the general population”. No arguments or evidence are presented for this and I feel that it is equally valid to suggest that, while the mass-market membership of flickr.com might represent a general population, the participants in photo.net are a specialist group. There is a long history of serious amateur photography based on the kind of “professional” values and institutions characterised by Richard Sennett (2008, 24-27) in his discussion of craftsmanship among the open-source software development community and I would expect the judgements of the photo.net community to be moderated by these professional concerns.
In fact that can be seen in the data. The assessments on photo.net actually use two factors, “Aesthetics” and “Originality”. The ACQUINE team report a strong correlation between the two. Given that we would expect an artistic community to put a high value on originality it seems that the photographers may not be able to separate aesthetics from originality, possibly because they have no strong separate concept of aesthetics. This correlation implies that the photo.net amateurs, as members of a specialist artistic community, may not be a good guide to the general population. Datta et al note that this correlation also implies a strong semantic factor in the aesthetic judgements which makes it extremely difficult to isolate their ‘universal’ aesthetics from the other kind. Having noted that problem, they appear to have moved on regardless.
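For readers who want to see what a “strong correlation” between two rating scales amounts to, here is a minimal sketch of the standard Pearson coefficient. The ratings are invented purely for illustration; they are not photo.net data, and this is not the ACQUINE team’s own analysis. The function name `pearson` and both lists are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of ratings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two ratings")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from each mean (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list never varies")
    return cov / sqrt(var_x * var_y)


# Invented example: six photos, each given an "Aesthetics" and an
# "Originality" score by the same hypothetical community.
aesthetics = [5.2, 6.1, 4.8, 6.7, 5.9, 4.3]
originality = [5.0, 6.3, 4.5, 6.9, 5.6, 4.4]

r = pearson(aesthetics, originality)
print(f"r = {r:.2f}, shared variance = {r * r:.0%}")
```

When the coefficient is close to 1, as it is with these made-up numbers, the two scales rise and fall almost in step, so there is little in the data to show that the raters were judging two different things.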
So we have two problems in the most basic raw material of the research: the population chosen appear to have a specialised concept of aesthetics, highly influenced by their artistic context; and the researchers have no way of isolating their looked-for universal “consensus measure” from the messy background of semantics. They are optimistically pursuing a convenient, narrow and possibly spurious factor that they cannot isolate, apparently because it is too difficult to address the very complex reality they have in front of them. If there were no other way to deal with this scientific problem I might have some sympathy for them, but I believe there are many possible avenues to advance knowledge and technique in this area. Maybe they are just not as appealing to the public or as rich in immediate snake oil potential.
Ritendra Datta, Dhiraj Joshi, Jia Li and James Z. Wang (2006) Studying Aesthetics in Photographic Images Using a Computational Approach, Lecture Notes in Computer Science, vol. 3953, Proceedings of the European Conference on Computer Vision, Part III, pp. 288-301, Graz, Austria, May 2006, available online at http://www-db.stanford.edu/~wangz/project/imsearch/Aesthetics/ECCV06/
Ritendra Datta, Jia Li and James Z. Wang (2008) Algorithmic Inferencing of Aesthetics and Emotion in Natural Images: An Exposition, Proceedings of the IEEE International Conference on Image Processing (ICIP), Special Session on Image Aesthetics, Mood and Emotion, pp. 105-108, San Diego, California, IEEE, October 2008, available online from http://infolab.stanford.edu/~wangz/project/imsearch/Aesthetics/ICIP08/
Richard Sennett (2008) The Craftsman, London, Allen Lane