Recently I had the privilege of talking to Wilfried Van Baelen, the enthusiastic founder and CEO of Galaxy Studios and Auro Technologies. Auro Technologies is at the forefront of new sound technologies that are changing the way we experience, and ultimately ‘hear’, sound: more immersive, more lifelike, and for the first time, true 3-dimensional sound that more accurately replicates the way humans hear, providing a far more natural and enjoyable listening experience.
Wilfried has been recognized since the mid-1990s as a pioneer in the production of high-end Surround Sound for both music and film. He built his first studio (Studio Galaxy) at the age of 18, before becoming a celebrated artist with gold records to his credit, performing over 160 concerts on synthesizers and pop-organs. Wilfried then decided to focus his energies on his career as a producer-engineer, using his studio as a laboratory. In 1991 he developed the first interactive modular studio complex, Galaxy Studios, known as probably the most advanced music recording facility in the world. Wilfried went on to produce and engineer over 20 platinum albums for a host of international artists, while also delving into his passion for film, mixing more than 20 films as dubbing engineer and receiving international acclaim for his work.
Wilfried is now busily working with his ground-breaking company, Auro Technologies. The company is growing daily in adoption and regard as more and more people discover (and listen to) its powerful technologies. So without further ado, I introduce to you Wilfried and the company Auro Technologies.
Auro Technologies is a spin-off of the Galaxy Studios Group, and owner of the Auro-3D® Technology Suite. Galaxy Studios is renowned worldwide for its state-of-the-art leadership in audio innovation for music and sound for film. The Auro-3D® suite offers ground-breaking, easy-to-use and unprecedented levels of sound reproduction capabilities to the professional, automotive, broadcast and consumer electronics markets (such as gaming, smart phones, multimedia PC, notebooks, tablets, audio players, digital TV, media libraries and packaged media).
I agree, getting closer to virtual reality by improving sound and picture enhances the immersive impact. Now sound has more impact on that than many people would believe. It has been proven that sound in 3D gives a higher immersive result compared to picture in 3D. The big difference is that sound does more work on the subconscious mind compared to picture, and therefore has such a huge emotional impact. It was clear to me in 2005, when I developed the format, that the addition of height, the missing dimension, all around the audience is the key to immersive sound. Now putting a new format on the market always has many challenges because there are three different things that need to be solved: the creation of the content, the distribution, and a compatible playback installation. Firstly, the Auro-3D® format is designed as an end-to-end solution with full backwards compatibility; all Auro-3D® speaker layouts are based on the existing 5.1 / 7.1 Surround standard (SMPTE for cinema, ITU for home). Secondly, the workflow is based on the existing workflow and hardware, meaning there is no extra hardware needed, except of course the Auro-3D® speaker system. And thirdly, our groundbreaking technical solutions allow us to use the existing delivery formats like Blu-Ray™ or DCP (Digital Cinema Package), because they allow us to carry multiple mixes in one PCM carrier.
“Sound does more work on the subconscious mind compared to picture, and therefore has such a huge emotional impact.”
We now have more than 500 Auro cinemas installed or confirmed, more than 50 international films recorded and/or mixed in the Auro-3D® format, and more than 35 leading mixing studio installs confirmed by the end of this summer. Auro-3D® is already proving to be common practice in the film industry, and is also starting in other industries such as Music, Gaming, Home Entertainment, and Mobile.
It is important to understand what “immersive sound” is. The definition is related to the reproduction of a natural, true 3D sound all around and above the listener. This means that it is not related to an object based technology, as Dolby or DTS try to make us believe through their marketing. Immersive sound can be achieved even better and more efficiently with channel based technology than with object based technologies. To achieve being immersed, the addition of the third and missing dimension ‘height’ all around and above the audience is key. This means that the speaker layout becomes the most important aspect, and not the way that sound is going to be delivered by a channel or object based technology. The term “immersive sound” was used for the first time in a flyer we made together with Barco in 2010 to launch a new true 3D experience in sound. But because “3D Sound” was used to market “Surround Sound” formats like 5.1 and 7.1, there was too much confusion and a new name was needed. We have chosen the term “immersive sound” because the addition of a good spread of sound in the vertical axis (height) is what is creating that total immersive experience.
Now we see different proposals, some with channel based technology, and some with both channel and object based technologies. Where our competitors are more related to an object based approach, the Auro-3D® format is more based on a channel based approach (but with the ability to do object based as well). It is important to understand that you cannot have both technologies together without any compromise. In fact, if you want to play channel and object based technology at the same time in a cinema theatre, then you would need different speakers for the objects, and different speakers for the channels. In fact, the larger the room, the more issues there are with object based technology, because the sweetspot becomes too small. For very large theaters, the Auro-3D® format wants to be careful with solutions based on the reproduction of a full discrete channel by each single speaker, as our competitors do. We keep our rooms in zones to get a much larger sweetspot that allows many more people in the theatre to have the same experience. Additionally, we have a 3 layer system that gives a much better vertical spread and vertical resolution of sound compared to the 2 layered system from Dolby Atmos.
Coming back to the immersive effect, immersive is related to the subconscious. All day we humans have an audio/visual experience, and we do not really get tired of it. It is a natural experience that our brains use little energy to process. The moment you see a film, an audio/visual experience, one can see brain activity going up; scientists were able to analyse this more precisely about 10 years ago. Now the question is where does this increase in brain activity come from? What people don’t understand is that when you have an audio/visual experience you are not seeing moving pictures; it is a repetition of stills, and our brain does see this. It is not our conscious brain, but our subconscious brain that is noticing it and has to interpret it; our conscious is not fast enough to notice. The conscious brain can only handle about 12 items a second, so seeing 24 frames in one second is too fast, and it looks like it is moving when there is a logical relationship between those pictures. Our subconscious, on the other hand, can handle upwards of 12 million items of information a second. It is amazing what is processed in every second of our life! So only about one millionth of the information our brain is taking in every second is coming through our conscious brain.
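The arithmetic behind the "one millionth" figure can be checked in a few lines. This is only a sketch that takes the numbers quoted in the interview at face value (12 items per second, 12 million items per second, 24 frames per second); the constant and variable names are illustrative and do not come from any Auro material.

```python
# Figures as quoted in the interview; they are not independently verified here.
CONSCIOUS_ITEMS_PER_SECOND = 12
SUBCONSCIOUS_ITEMS_PER_SECOND = 12_000_000
FILM_FRAMES_PER_SECOND = 24

# Share of the incoming information that would pass through the conscious brain.
conscious_share = CONSCIOUS_ITEMS_PER_SECOND / SUBCONSCIOUS_ITEMS_PER_SECOND

# How many film frames arrive for every item the conscious brain can handle.
frames_per_conscious_item = FILM_FRAMES_PER_SECOND / CONSCIOUS_ITEMS_PER_SECOND

print(f"conscious share of information: {conscious_share:.0e}")
print(f"frames per conscious item: {frames_per_conscious_item}")
```

With these inputs the share comes out at one part in a million, and 24 frames per second delivers two frames for every item the conscious brain is said to handle.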
“You have to know that sound is the first sense we develop in life, when you’re in your mother’s belly.”
So we can question how many frames per second humans need in order to require less brain power and to create a more lifelike, natural experience. Scientists tested many people on it, and it came in at about 54-56 frames per second. This is why you are seeing film directors like James Cameron interested in implementing 60 frames per second in their films. This increase makes a huge difference; it feels so much more relaxing, so much more real, which is not the case with 48 frames per second. Watching 60 frames per second in stereoscopic 3D feels like you are much closer to the action on screen; it is almost like looking through a window of reality, which means it is much more immersive, and can be used by creative people to enhance the emotional experience of the movie-goers.
Visual stimuli take up the majority of our conscious brain, approximately 70%, so this is not leaving much room in our conscious brain for sound. I have noticed this many times as a mix engineer: if I am mixing and close my eyes, I hear more. A lot of engineers do this if they are listening to a new sound system for a movie; they close their eyes and hear more! A lot of people underestimate the audio visual relationship; sound is so much more subconscious, which is also why it is so emotionally powerful. You have to know that sound is the first sense we develop in life, when you’re in your mother’s belly. We begin to develop our hearing 4-5 months after being conceived; it is the only sense we develop before we are born, and all the rest of the senses develop later.
So our hearing is the sense that has the most nerve strength, like a hot line to our reptile brain. This is all to do with survival. Our reptile brain has, let’s say, a radar which is going on all the time; the moment something comes from behind you, where you don’t have any visual information, immediately your subconscious brain is knocking on your conscious brain and saying, HEY WATCH! This is why you get a fright when you hear an unexpected sound, particularly loud ones coming from your non-visual field. This helps explain why playing too obviously with object based sounds in the non-visual field can be distracting, and might be perceived as less immersive, as it can divert moviegoers from the story-telling on screen.
“So if you better understand how our brain and hearing work, then you better understand why Auro-3D is such a fantastic system.”
This is also the reason why we are so sensitive to sounds from ear level up to the first reflection level; this is where our brains are most carefully scanning. Human beings are less sensitive to sounds coming from above because of the millions of years of evolution we have in our DNA based on survival; enemies were not typically located there. That made our hearing system less sensitive to sounds located directly above us; this is why we don’t have an ear on the top of our head. It’s very clear that the whole immersive experience is coming from the data mining of the brain, from these multiple height layers which give us directional clues. So if you better understand how our brain and hearing work, then you better understand why Auro-3D® is such a fantastic system.
Even with a million speakers around us we will never be able to reproduce natural sound. If you start to understand how critical our ears are, how air moves, and how low, mid, and high frequencies move, then you will understand that we will never be able to accurately reproduce it. So the art is to recreate that immersive experience with a minimum amount of channels and speakers.
Exactly! You know the expression “less is more”? You need a minimum, but there also comes a point when you can have too much, and it becomes problematic. Every extra channel you add to a speaker system is adding phase issues, is adding workflow issues, is adding extra cost. So is the experience better? I don’t know.
I have done a lot of testing, up to 25 channels (24.1). In a small room I don’t need this amount, absolutely not. When I hear the difference, it’s not worth the investment. If you go to a large room, let’s say a large 500 seat cinema, then it can make sense to use up to 24 channels, but in a smaller theatre this is not needed. But having said that, the key to immersive sound is a natural balance between the horizontal and vertical spread of sound around the audience, and this can be perfectly achieved in a large theatre in Auro 11.1.
Most people forget that true 3D sound has to be divided in the room as a natural spread of energy, meaning it is not enough to have great precision in the horizontal field and not in the vertical axis. The vertical axis is very new for almost every engineer, and most of them make the mistake of believing that it is just another couple of added elevated speakers. It does so much more; it creates extra harmonics, more natural colors, more depth, extra transparency, etc., but it brings new challenges as well. Moving sounds vertically works completely differently from what we are used to doing horizontally. This is because a vertical move produces almost no time difference between our left and right ears, a cue to which we are extremely sensitive (about 4 to 5 millionths of a second). So that’s why you need more layers in the vertical axis.
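The point about left/right time differences can be illustrated with a textbook approximation. The sketch below uses Woodworth's rigid-sphere formula for the interaural time difference (ITD), which is a standard simplification and not something taken from Auro; the head radius, the speed of sound and the function name `itd_seconds` are assumptions chosen for illustration.

```python
import math

HEAD_RADIUS_M = 0.0875        # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0    # assumed speed of sound in air at room temperature

def itd_seconds(azimuth_deg, elevation_deg):
    """Approximate interaural time difference for a distant source.

    Woodworth's spherical-head model: ITD = (r / c) * (theta + sin(theta)),
    where theta is the lateral angle between the source direction and the
    median plane (the vertical plane that splits the head into left and right).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    lateral = math.asin(math.sin(az) * math.cos(el))
    return HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (lateral + math.sin(lateral))

# Moving a source sideways changes the ITD; raising it straight ahead does not.
for azimuth, elevation in [(0, 0), (30, 0), (0, 30), (0, 60)]:
    microseconds = itd_seconds(azimuth, elevation) * 1e6
    print(f"azimuth {azimuth:>2} deg, elevation {elevation:>2} deg: ITD {microseconds:6.1f} us")
```

In this simplified model a source directly in front of the listener keeps an ITD of zero however high it is raised, so the ear has to rely on other, weaker cues for height.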
“Even with a million speakers around us we will never be able to reproduce natural sound.”
You can position phantom sources in the vertical axis much more easily with Auro-3D®’s unique 3-layered system. The vertical resolution is much more precise and important for an immersive real-life experience. It is not because you can pan vertically, because you can’t do that, but you can pan in a triangle; this is how you easily create a phantom source in the Auro 11.1 system. We have for instance 6 screen channels divided over 2 vertical layers in which we have many triangle relationships, many more than any system from our competitors.
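Panning inside a triangle of speakers can be sketched with vector base amplitude panning (VBAP), a generic published technique. This is not Auro's own panner, which the interview does not describe. The speaker angles, the function names and the choice of a constant-power normalization are all assumptions made for illustration.

```python
import math

def unit_vector(azimuth_deg, elevation_deg):
    """Direction as (x, y, z): x points to the front, y to the left, z up."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def triangle_gains(source, speakers):
    """Solve source = g1*s1 + g2*s2 + g3*s3 by Cramer's rule, then normalize."""
    s1, s2, s3 = speakers
    d = det3(s1, s2, s3)
    if abs(d) < 1e-12:
        raise ValueError("speakers do not span a triangle")
    raw = (det3(source, s2, s3) / d, det3(s1, source, s3) / d, det3(s1, s2, source) / d)
    if min(raw) < -1e-9:
        raise ValueError("source direction lies outside the speaker triangle")
    clipped = [max(g, 0.0) for g in raw]
    norm = math.sqrt(sum(g * g for g in clipped))
    return tuple(g / norm for g in clipped)   # squared gains sum to 1 (constant power)

# Hypothetical triangle: left and right screen speakers plus a centre height speaker.
speakers = (unit_vector(30, 0), unit_vector(-30, 0), unit_vector(0, 30))
gains = triangle_gains(unit_vector(0, 10), speakers)
print([round(g, 3) for g in gains])
```

A phantom source 10 degrees above the screen centre gets equal gain from the two lower speakers plus a contribution from the height speaker, so it appears at a point where no speaker is installed.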
If I bring you into a dark room, and I play you a demo through Auro 11.1, and I say to you: “This is our new system, the new 3D audio system” and at the end I ask you: “how many channels do you think you have heard?” Everyone says “At least 20 but I’m not sure, I couldn’t count them exactly.” You get the impression that you heard more channels because you hear sounds in the hemisphere where there are no speakers. It truly is an amazing experience.
George Lucas famously stated that “Sound is 50% of the movie experience”, and all film directors agree with this because the emotion of a movie is largely created by the sound. But in practice, sound almost always takes a back seat to the visual. I believe this has more to do with the way our brain works: the visual is more related to the conscious, while sound is much more subconscious, and therefore so powerful! Now what makes sound in 3D (called “immersive sound”) so special is that the audience is immersed in a true 3D sound field with an arrangement of speakers all around and above them, thus creating a unique new experience. This cannot be compared with picture in 3D because that technology is not based on the way we experience 3D picture in real life. This is different for sound in 3D, which can be heard by everybody (even those with hearing problems) and comes closer to our natural way of hearing. Immersive 3D Sound has much more impact than current Surround formats, even with a standard 2D picture format.
The 3-layered Auro-3D® system allows a huge amount of extra creativity in sound design. It is the vertical Stereo-field between the existing Surround layer and Auro-3D®’s unique second layer (Height Layer), on the screen as well as around the audience, that creates a dimension of new possibilities for immersive sound design. The third layer (Top Layer / Overhead Layer) is the cherry on the cake, but is not as necessary to get the most immersive sound as is commonly believed. The top layer is good for special effects like fly-overs, but is in general less efficient for immersive reflections, since in nature there are almost no reflections coming from above.
Auro-3D® is not only a mixing or reproduction system, but also a recording format in 3D. This is different from the competitors’ format because native object based recordings are not possible. In a true 3D (native) recording of e.g. city traffic, the objects (like moving cars, bikes etc.) cannot be extracted to use them afterwards as objects, nor the direct sound, nor the 3D reflections around those objects. But the millions of reflections in 3D around each source are the most important information to reproduce a natural sound. Those 3D reflections can be easily captured with the Auro-3D® recording rigs and reproduced over an Auro-3D® system. That’s the place where the vertical Stereo-field of the Auro-3D® system has huge advantages, not only for native recordings but also for creative sound design in 3D space. We all know that the immersive impact of native 3D recordings is not comparable with the artificial recreation of such sounds by using Mono or Stereo sources, and then adding some artificial reverb to it.
To improve such an experience with an object-based format, those reflections should be rendered for each object at the playback side, which requires more than a thousand times the DSP (Digital Signal Processing) power of the renderers which are currently on the market. Additionally, it needs to be played back over a fully amplified system with at least 25 channels. For this reason, many people know that good quality object-based production is still many years away from us. Although Auro-3D® supports object and channel based technology, we see that for many reasons, it is better to stay with a mainly channel based approach and have object based functionality as a next upgrade.
The Auro-3D® format vision is to first get the most immersive experience with the minimum amount of channels, and as a second step clients can easily upgrade to add object-based technology by adding some amplifiers and the renderer without changing the speaker layout (because the 3 layered system is needed as well when using object based technology). Even using object-based, we will keep it in the reproduction zones like Auro 22.1, because we do not want to lose our large sweetspot (unique to Auro-3D®) when upgrading to object-based technology.
The immersive impact on sound for film can be seriously enhanced by using native sound recorded in the Auro-3D® format. This is not only obvious for music recordings, but also a strong new tool for sound designers. We have even carried out tests capturing the room reflections in 3D around some dialogue recordings, which is really amazing. It makes you feel like you are standing on the position of the camera on set; it is so much more immersive! This is most obvious when the dialogue is almost the only thing happening, not being masked by music or other complex sound design. So to work as efficiently as possible, to get the most immersive sound experience, the sound supervisor should already look into the script before the recording on set starts, so as to advise the recording engineers which sounds could be good to have recorded in native Auro-3D®. This can easily be done with the different Auro-3D® rigs developed, from small to very large. In the meantime there are already companies busy with delivering a library of native Auro-3D® recorded sounds. Tonsturm recently made many recordings of forest sounds; they sound so extremely natural if you hear them played back over an Auro-3D® system. (You can find more information on this on the Tonsturm website.)
The scalable and compatible Auro-3D® listening formats have the advantage of allowing sound designers to prepare the full sound design in 3D on an Auro 9.1 system. Additionally, the Auro-3D® Creative Tool Suite allows the pre-production for all formats, meaning object and channel based configurations, but with the advantage that there is no need for a hardware renderer. These are serious advantages compared to the competitors’ format. Each sound can be routed to channels or to objects without losing automation, and there is no limitation on the number of objects used. So pre-production using the Auro-3D® tools makes workflow easy, and totally compatible.
I had the concept already in 2005 when I designed the Auro-3D® format. But the technical development started in 2006 with a team of engineers led by Guido Van den Berghe (co-inventor of the technical solution I had in mind), to bring that new unique audio experience to the market in the existing technology and formats. Full backwards compatibility was always key for an easy integration in the market. I wanted a simple solution, but simplicity is not always simple. So it took our team about four years to have our technology ready and implemented in plug-ins. I demoed the technology in October 2010 in Japan during the first spatial audio convention of AES (Audio Engineering Society). We are still improving the Codec every day, specifically the mathematics around it. Our ability to have multiple channels in one PCM carrier is continually improving. In fact, we started with only 2 channels in one, and then we had 3 channels in one, and last year we started with 4 channels in one. Last month they told me “Wilfried, the quality of the 4 channels in one is already better than 3 channels in one a year ago”, so it is getting better all the time.
Due to this groundbreaking technology, the Auro-3D® format allows a very easy integration because there are no distribution issues, there is no new format needed, no new packaged media; everything is in the standard. PCM is the biggest digital standard in audio, in fact the only uncompressed standard digital format. So yes, it opens a lot of doors for an easy cross-market solution. Another important aspect of our codec is that the extra information is stored in the noise floor, a noise floor that you can’t even hear. We just changed that noise; it sounds exactly like white noise, but you can’t hear it. (For the real audiophiles and coding types, you can find out more about the inner workings of the codec in the Auro-3D® Octopus Codec White Paper.)
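The general idea of carrying extra information in the inaudible low-order part of a PCM word can be shown with a toy example. The sketch below is plain least-significant-bit embedding, a generic data-hiding technique swapped in for illustration. It is not the Auro codec, whose actual method is described only in the white paper mentioned above. The function names, sample values and payload bits are hypothetical.

```python
import math

def embed_bits(samples, bits):
    """Overwrite the least significant bit of each PCM sample with one payload bit."""
    if len(bits) > len(samples):
        raise ValueError("payload is longer than the carrier")
    out = list(samples)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | (bit & 1)
    return out

def extract_bits(samples, count):
    """Read the payload back from the least significant bits."""
    return [s & 1 for s in samples[:count]]

# Hypothetical 24-bit PCM sample values and a hypothetical payload.
carrier = [1000, -2001, 35, 0, -7]
payload = [1, 0, 1, 1, 0]

encoded = embed_bits(carrier, payload)
recovered = extract_bits(encoded, len(payload))

# Worst-case change per sample is one least significant bit of a 24-bit word.
lsb_level_dbfs = 20 * math.log10(1 / 2 ** 23)
print(recovered, f"max error is about {lsb_level_dbfs:.0f} dBFS")
```

An ordinary PCM player would simply play `encoded` as audio, with an error far below audibility, while a decoder that knows where to look can read the payload back. That is the backwards-compatibility property the interview describes, shown here in its simplest possible form.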
The Auro-3D® engine recognizes if the PCM stream is Auro-encoded or not. In case it is Auro-encoded, the Auro-Codec® Decoder will extract from that standard PCM stream the original native Auro-3D® mix as it was intended by its creators. But for when there is no native mix in Auro-3D®, we developed a groundbreaking technology that can up-mix any source (Mono, Stereo, 5.1) to an Auro-3D® experience with a very natural quality. That technology is called Auro-Matic®, and was developed by a team of Engineers under the lead of Ralph Kessler. Our up-mixing technology is amazing. Everyone says: “Wow, this is so natural sounding. I have never heard anything like this before.” I tell you, if you heard the up-mixing of a Mono sound source to Auro 9.1 you would not believe it. People think this is not possible, that you can’t do it with a plug-in, but it really sounds incredible, like it has been re-mastered.
Up-mixing technologies typically use a combination of changes in equalization (in the spectrum) and an addition of reflection patterns. The approach of Auro-Matic® is different; I didn’t want to hear changes in reverb compared to the original mix, and didn’t want to hear phase issues due to the up-mix in so many channels. As a Producer/Engineer, I was always very sensitive about making sure the original intent of the creators is respected. So we solved this by designing algorithms which are also related to the Head-Related Transfer Function (HRTF). This is a new technology, taking into account the way humans hear in the natural environment. One of the derivatives of this technology is our ‘Beautifyer™’ App for iPhone and iPad, used for Stereo enhancement.
To find out more information about Auro Technologies and its products and developments, you can find them through the following channels: