The group’s motivation was to produce an installation with an enthralling user interaction, one that grabs the user’s attention and provides multisensory engagement. The installation serves as an enhanced, exaggerated interactive simulation of chromesthesia, and of how people who have it may perceive the world around them.

In keeping with this aim of multisensory engagement, the perceptual phenomenon of synesthesia served as the inspiration for the concept of this installation.

Synesthesia is a perceptual phenomenon in which stimulation of one sensory or cognitive pathway leads to involuntary experiences in a second sensory or cognitive pathway [1].


In particular, the form of synesthesia of interest for this installation was chromesthesia, in which sound is involuntarily associated with colour [2].



First attempt: 

The first version of the installation used the frequency content of the user’s speech, mapping it to a specific colour that would then alter the saturation of a live video feed of the user.

Alongside this, the live video feed would be distorted according to the amplitude of the speech. 

Using the attributes of the user’s speech to modify the visual output creates a link between the two senses that embodies an exaggerated and whimsical imagination of what it could feel like to have sound and colour associated. 

Because colour is simply light of a particular wavelength, it too has an associated frequency. Mapping the range of speech frequencies onto colour values allows a colour to be generated from three input frequencies.
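As a rough sketch of this idea, a speech frequency can be mapped linearly onto the 0–255 range of a single RGB channel. The frequency bounds below are illustrative assumptions, not the values used in the actual patch:

```javascript
// Hypothetical sketch: linearly map a speech frequency onto the
// 0-255 range of one RGB channel. fMin/fMax (e.g. ~85-3000 Hz for
// speech) are assumed bounds, not the patch's actual values.
function freqToChannel(freq, fMin, fMax) {
    // Clamp to the assumed speech band, then scale linearly to 0-255.
    var clamped = Math.min(Math.max(freq, fMin), fMax);
    return Math.round(((clamped - fMin) / (fMax - fMin)) * 255);
}

// Three frequency coefficients would yield one RGB triple:
var rgb = [440, 1200, 2600].map(function (f) {
    return freqToChannel(f, 85, 3000);
});
```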

The first step was to record a sound input, which was done in the following way:

This recording was stored in a buffer, from which the frequency components were then extracted.

An FFT operation was then carried out on the buffer to extract the frequency coefficients, using Volker Bohm’s external object vb.FFTWbuf~. His input to this project was very helpful: I contacted him to better understand his object, and he provided the source code so that it could be compiled for Windows as well.



This is illustrated here:

Here, the recorded buffer is used to perform a single FFT on the data it contains.

Once this was done, it was necessary to extract three frequency coefficients to match the three RGB values of a colour. This colour would then be converted to HSL, and the S value, which represents saturation, would be used to change the saturation of the live video feed.
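The RGB-to-HSL step only needs the saturation component. A minimal sketch of that conversion, using the standard HSL saturation formula (this is a generic reconstruction, not the patch’s actual code):

```javascript
// Sketch: compute only the HSL saturation (S) of an RGB colour,
// since S is the value used to drive the live-feed saturation.
// Inputs r, g, b are in the 0-255 range.
function rgbSaturation(r, g, b) {
    var rn = r / 255, gn = g / 255, bn = b / 255;
    var max = Math.max(rn, gn, bn);
    var min = Math.min(rn, gn, bn);
    if (max === min) return 0; // achromatic: grey has no saturation
    var l = (max + min) / 2;   // lightness
    var d = max - min;
    // Standard HSL saturation formula, branching on lightness.
    return l > 0.5 ? d / (2 - max - min) : d / (max + min);
}
```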

The three frequency coefficients chosen were the minimum, the maximum, and the midrange values. This was done using a simple JavaScript script added to the externals:

This created three outlets that output these values, which were then mapped to the 0–255 range used by RGB colours. The input frequencies were therefore the output of the JavaScript min/max/midrange finder.
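A hypothetical reconstruction of that helper is shown below. In the actual Max js object the three values would feed three outlets; here they are simply returned, and midrange is taken as the average of the minimum and maximum (an assumption about what "midrange" meant):

```javascript
// Hypothetical sketch of the min/max/midrange helper: given a list
// of frequency coefficients, find the minimum, the maximum, and the
// midrange, assumed here to be (min + max) / 2.
function minMaxMid(coeffs) {
    var min = Math.min.apply(null, coeffs);
    var max = Math.max.apply(null, coeffs);
    return { min: min, max: max, mid: (min + max) / 2 };
}

// e.g. minMaxMid([100, 400, 900]) -> { min: 100, max: 900, mid: 500 }
```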

The saturation value was then used to alter the saturation of the live feed. 

All of the aforementioned was my contribution to the first implementation.


Trevor used the amplitude of the speech to stretch the live feed video. Esther created a filter to remove background noise from the speech and helped compile the vb.FFTW~ object, since she uses macOS. We then worked together to join all of the patches into a main patch, test that it worked, and record it.


However, there were issues using the saturation value generated from the frequency coefficients, as the live feed video was rendered with jit.gen. Other issues arose that led us to discard the FFT method of extracting frequencies from the speech, and to use the fiddle~ and mtof objects instead.

Second attempt:

Esther incorporated the fiddle~ and mtof objects so that pitch, rather than the FFT coefficients, varies the saturation of the live feed, and added the panning mechanism that can be seen in the patch below. The final result uses pitch and amplitude to vary the visual appearance of the live feed video.

With an improved user interface, the final patch takes in a live video feed of the user, adjusts its colour saturation according to the pitch of the speech, and zooms into the user, distorting their image according to the amplitude of their speech. The apparent localisation of the sound source can also be varied by changing the live gain of each channel; wearing headphones makes this effect audible.
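The pitch-to-saturation chain can be sketched as follows. The mtof conversion is the standard equal-temperament formula that the mtof object implements; the scaling of pitch into a 0–1 saturation value, and the vocal-range bounds, are illustrative assumptions rather than the patch’s actual parameters:

```javascript
// Standard MIDI-to-frequency conversion, as performed by Max's mtof
// object: A440 equal temperament.
function mtof(midi) {
    return 440 * Math.pow(2, (midi - 69) / 12);
}

// Hypothetical scaling of a fiddle~-style MIDI pitch into a 0-1
// saturation value. loNote/hiNote are assumed bounds of the
// expected vocal range, not the patch's actual settings.
function pitchToSaturation(midi, loNote, hiNote) {
    var clamped = Math.min(Math.max(midi, loNote), hiNote);
    return (clamped - loNote) / (hiNote - loNote);
}
```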

The installation would have consisted of a semi-transparent acrylic sheet placed in front of the user, onto which the distorted live feed would have been projected. The user would wear a pair of headphones while standing in front of the acrylic sheet. The laptop and the projector would be concealed inside a black box, with the acrylic sheet placed on a stand on top of it. This creates a mirror-like experience: the user expects to see their reflection in the acrylic, but instead their face is distorted and changes colour according to their speech, and the apparent source of their speech can be moved using amplitude panning.

To test this yourself, download Max/MSP, then download the zip file and extract its contents. Place the fiddle~ patch under msp extensions in the Max 8 folder in your program files. Then open the OPEN_ME file and view it in presentation mode.

[1] Cytowic, Richard E. (2002). Synesthesia: A Union of the Senses (2nd edition). Cambridge, Massachusetts: MIT Press. ISBN 978-0-262-03296-4. OCLC 49395033.

[2] "Absolute Pitch and Synesthesia | The Feinstein Institute for Medical Research". The Feinstein Institute for Medical Research. Retrieved 2 May 2016.
