What do machines hear that humans cannot? Artist Florian Hecker explores the formal, perceptual, and aesthetic possibilities afforded by custom machine-listening software in “1935.” Through the use of computer audition, Hecker probes degrees of perceptual resolution hitherto inaccessible to human listening.
1935
00:00–06:58 CC weight = 0.1 iter = 500
07:01–11:53 scattering transform Q = 12 J = 10 sc = tf wvlt = gam
11:53–19:35 CC weight ordered
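A technical gloss on the listing above: the parameters for the middle section read like the settings of a scattering transform, with Q = 12 wavelets per octave, J = 10 octaves of scale, a time-frequency (“tf”) variant, and gammatone (“gam”) wavelets. As a rough sketch only, and assuming that reading of the notation, a plain one-dimensional scattering analysis with comparable Q and J can be set up with the open-source Kymatio library; the library, its Morlet-type wavelets, and the placeholder input below are stand-ins, not the model used in 1935, which is discussed later.

```python
# Illustrative sketch only: a plain 1-D scattering analysis with settings
# comparable to those listed above (Q = 12 wavelets per octave, J = 10 octaves).
# Kymatio's Morlet-type wavelets and the white-noise input are stand-ins for the
# time-frequency scattering with gammatone wavelets indicated by the listing.
import numpy as np
from kymatio.numpy import Scattering1D

T = 2 ** 16                                 # analysis frame length in samples (assumed)
J = 10                                      # log2 of the largest wavelet scale
Q = 12                                      # wavelets per octave, first-order filter bank

scattering = Scattering1D(J=J, shape=T, Q=Q)

x = np.random.randn(T).astype(np.float32)   # placeholder for an input texture
Sx = scattering(x)                          # coefficients: one row per scattering path
meta = scattering.meta()                    # per-path metadata (order, center frequency)

order1 = Sx[meta['order'] == 1]             # first-order paths: a smoothed spectrogram
order2 = Sx[meta['order'] == 2]             # second-order paths: modulation structure
```

The second-order coefficients are the relevant part here: they register modulation structure, the texture-like detail that a plain spectrogram averages away, which is what makes scattering representations attractive for texture synthesis of the kind heard in the work’s middle section.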
The field of psychoacoustics combines research from physics, physiology, neurology, and psychology to qualify and subsequently quantify the subjective auditory experience of sound. It attempts to unlock principles underlying human audition by probing the inner workings of the listening mind in order to codify the ways in which the ear, brain, and “movement of mind” work together to synthesize sense from raw sensation. This interdisciplinary field has pursued this project by examining the relations between specific stimuli and the individual perceptions they trigger. Objective characteristics—say, the discrete frequency, spectrum, or loudness of a sound—are often measured against subjective impressions such as those offered by individuals evaluating sounds using intuitive descriptions. Accordingly, the aim of psychoacoustics is to define quantitative relations between empirical measurement and perception with increasingly refined degrees of resolution. Or, put another way, its mandate is nothing short of the denaturalization of phenomenal listening as such—and it has made significant headway in this pursuit since its inception.
However, many categories fundamental to human audition continue to frustrate such a project. This is of particular relevance when discussing high-dimensional epiphenomena that synthesize into coherent qualities in perception—categories that appear “given” to the listener, but perhaps exist only within perception itself. For example, when considering timbre—the perceived quality or “tone color” of a sound—the Canadian psychoacousticians Albert Bregman and Stephen McAdams note with frustration that it can only be defined as the “multidimensional waste-basket category for everything that cannot be labeled pitch or loudness.” Though the field has identified certain universal mechanisms that structure subjective audition, the emergent categories that synthesize multiple dimensions of sound into coherent “given” features of perception continue to defy theoretical and practical explanation.
Despite such impasses, psychoacoustics has had a profound influence on the design of machine listening. Instead of simply having intelligent machines parse audio data without any a priori categories of analysis or description, machine listeners are endowed with listening abilities designed to parallel our own through the algorithmization of psychoacoustic theory. Using various mathematical generalizations of the inner workings of the ear-brain complex, today’s machine-listening software can recognize specific voices and genres of music and drive the realistic synthesis of acoustically complex phenomena with ease. However, as is evidenced by the inability of underlying psychoacoustic theory to grasp emergent perceptual syntheses—as in the case of timbre—these technologies may well face certain conceptual limitations. Crucially, though, this is not to say that such tools are unable to shed any light on the complexity that certain epiphenomenal perceptual emergences exhibit. In spite of theoretical impasses, through the intensification of analytic resolution in the crucible of inhuman computation, bit by bit, frame by frame, software might offer glimpses into the inner workings of perceptual categories that synthesize into the monolithic phenomenal “given.”
Drawing on compositional techniques and concepts developed in Untitled (F.A.N.N.) (2006/2013), Chimerization (2012), Articulação (2014) / Articulação Sintetico (2015), and A Script for Machine Synthesis (2013–15), and staged most recently in the 2017 exhibitions Synopsis at Tramway, Glasgow, and Halluzination, Perspektive, Synthese at Kunsthalle Wien, 1935 (2018) is the latest iteration of Florian Hecker’s long-standing interest in synthetic listening. When describing the point of departure for this work, Hecker points toward David Tudor’s Neural Synthesis (1989–95)—a series of works for which the virtuoso-pianist-turned-electroacoustic-improviser experimented with neural networks to synthesize and organize sound in real time. Hecker explains:
My curiosity in AI systems and their role and impact in sound was first triggered through listening to Neural Synthesis. While spending some months in Los Angeles during 2004, I encountered the Tudor archives at the Getty Center. This resulted in regular visits going through the truly heterogeneous materials that have been indexed by the archive—I remember such gems as the request for a quote to ship a Tandoor oven from India to Tudor’s home in Stony Point, New York, recipes for the best gin and tonic, and postcards from Karlheinz and Doris Stockhausen. Most important was a box of approximately eighty audio CD-Rs, all featuring digitized versions of Tudor pieces, recordings of rehearsals, material for installations, hours and hours. … At the time the “archive fever” (to call Suely Rolnik’s commentary to mind) of experimental music had not fully started yet and only a few Tudor vinyls and CDs had been released. Among those archived works, Neural Synthesis continues to stick out to me. What was the drive for Tudor, who had been labeled as a virtuoso instrumentalist for decades, to share the authorship of a work with a machine?
Proceeding from these coordinates, 1935 dramatizes two trajectories of machine-listening analyses that subsequently transform and resynthesize new computer-generated sounds. On the one hand, Hecker employs algorithms to share authorial agency with highly formalized yet dynamic processes, and, on the other, to explore the formal, conceptual, and perceptual possibilities afforded by machine listening. In 1935, the computer’s specific capacities for synthetic sensation are materialized through resynthesis to become the very raw materials that Hecker uses for composition. Through the use of these by-products of computer audition, Hecker asks how we might grapple with this paradoxical invitation to listen to the machine’s listening, which, in turn, imperfectly models our perception.
At the heart of these processes of machine listening, analysis, and resynthesis are two computational frameworks: first is a model for audio texture synthesis, using time-frequency scattering, developed by Vincent Lostanlen (heard at 07:01–11:53 of the work); second is an algorithm for the synthesis and transformation of sounds from time-frequency statistics, developed by Axel Röbel and members of the Analysis/Synthesis group at IRCAM, Paris (heard at 00:00–06:58 and 11:53–19:35). The algorithmic process of the latter begins with the analysis and extraction of statistical “descriptors” from a given input sound. The computer identifies structures in audio data and subsequently generates a representative model—a statistical “descriptor”—of that sound. Crucially, the identification of relevant structures is informed by psychoacoustic theories of perception; such theories are programmed into the analysis phase of the software to ensure that descriptors are as perceptually “relevant” to human listening as possible. What results is a high-dimensional representation of the original sound that can subsequently drive “realistic” resynthesis—or, more interestingly and more relevantly to 1935, that allows specific statistical descriptors derived from the analysis of an input sound to be scaled in weight before the same sound is resynthesized. These descriptors may well correspond with relevant perceptual categories intuitive to phenomenal listening. They may be used to resynthesize an input with fidelity and realism from the perspective of a human listener. But, behind this capacity for representation, they often exhibit measures of abstraction and scales of resolution foreign to human conceptual and perceptual capacities. It is precisely this capacity for abstract description that Hecker instrumentalizes in 1935. Here, Hecker does not resynthesize input sounds to the end of “realistic”—that is to say, normative—representation per se. Instead, the aim is to highlight the inner workings of machine listening—by showing that there is a fundamental, generative difference between synthetic and human listening.
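To make that loop concrete, the following is a deliberately minimal sketch of the principle rather than of the IRCAM software: a toy set of statistical descriptors is extracted from an input (here, nothing more than each band’s time-averaged envelope level), and resynthesis then shapes band-limited noise with those statistics at a chosen weighting. The band layout, the descriptor set, and the weighting scheme are all illustrative assumptions; the weight parameter only loosely echoes the “CC weight” notation in the listing near the top.

```python
# Toy illustration of the loop described above: analysis -> statistical
# "descriptors" -> resynthesis at a scaled weighting. The descriptor set
# (per-band envelope statistics) and the noise-shaping resynthesis are
# deliberate simplifications, not the IRCAM software itself.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

SR = 44100                                        # sample rate in Hz (assumed)

def subband_envelopes(x, edges):
    """Split x into band-limited channels and return their amplitude envelopes."""
    envs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype='bandpass', fs=SR, output='sos')
        band = sosfiltfilt(sos, x)
        envs.append(np.abs(hilbert(band)))        # analytic-signal envelope
    return np.array(envs)

def descriptors(envs):
    """A toy descriptor: each band's time-averaged envelope level.
    (A real texture model would extract far richer statistics.)"""
    return envs.mean(axis=1)

def resynthesize(envs, band_means, edges, weight=0.1):
    """Shape band-limited noise with a weighted blend of envelope and descriptor."""
    n = envs.shape[1]
    out = np.zeros(n)
    for k, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        sos = butter(4, [lo, hi], btype='bandpass', fs=SR, output='sos')
        carrier = sosfiltfilt(sos, np.random.randn(n))
        # weight = 1 keeps the band's full temporal envelope; weight = 0 keeps
        # only its time-averaged statistic, a flat band at the descriptor mean.
        env = (1.0 - weight) * band_means[k] + weight * envs[k]
        out += carrier * np.maximum(env, 0.0)
    return out / (np.abs(out).max() + 1e-12)

edges = np.geomspace(80.0, 8000.0, 9)             # eight log-spaced bands (assumed)
x = np.random.randn(SR * 2)                       # placeholder for an input texture
envs = subband_envelopes(x, edges)
band_means = descriptors(envs)
y = resynthesize(envs, band_means, edges, weight=0.1)
```

Sweeping weight from 1 toward 0 trades the sound’s own temporal detail for the machine’s statistical summary of it, a crude analogue of resynthesizing the same sound with scaled weightings.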
This harnessing of the abstract capacities of machine listening comes to the fore when considering Hecker’s use of Röbel’s algorithm to derive “descriptors” from sounds that do not readily invite neat analysis due to the limits of psychoacoustic theory—in particular, sounds generated using the various particle-synthesis methods for which he has become well known. Crucially, the intention here is not the creative misuse of a tool; Hecker does not use these sounds to push algorithmic processes to the point of failure or technical breakdown. Instead, it is to investigate the abstract description derived from this process, which both rigorously maintains certain aspects of the original and uncovers something else along the way. This something else is precisely a function of the computer’s minute quantification of sensory input. Here, sounds that are texturally or timbrally too complex to be codified by psychoacoustic theory are nevertheless described statistically, thereby shedding light on facets of human perception that have hitherto evaded analysis—the facets of sound that synthesize in phenomenal listening into “given” emergent wholes.
1935 consists almost entirely of resynthesized sounds derived from various descriptors. Abstract qualities originating from sounds that are no longer heard become the phantoms that inhabit sonic matter, fusing to form states that are both potential and contingent. Nevertheless, however computationally rigorous or philosophically compelling these transformations may be in and of themselves, 1935 recognizes that it is an individual listener who must construe some sort of cohesion from these new materializations of synthetic sensation. Ultimately, a human listener must become entrained by the differences afforded by the machine’s descriptors and subsequent resynthesis. But Hecker knows this. In 1935, the resultant resynthesized sounds are choreographed in time so as to encourage the active investigation and comparison of synthetic impressions. Each of the three sections of 1935 facilitates this comparison through the presence of continuous sound textures. Serving as the sonic foundation upon which various descriptors act, each texture stands in as a sort of cantus firmus for its section of the work. The statistical model of the descriptor fuses with this foundation, leaving only traces of the original; crucially, through the referencing of this underlying texture at any given moment, the listener can treat this continuous sound field as a control through which the different statistical filters can be heard and compared.
For the listener, no whole emerges from the formalist interplay of these machine-driven impressions. The resynthesized descriptors are continually juxtaposed with hard cuts—cuts that are further distinguished spatially in the stereo field to ensure the disparity between “original” texture and resynthesis is maximized. Through this disparity we might misunderstand Hecker’s intention and attempt a triangulation between original and resynthesis to reveal the genetic disturbance introduced by the descriptor and glean impressions of a lost sound source. We might wonder: what did the machine originally listen to in order to derive its statistical impression? As with strategies of “reduced listening,” we might attempt to deduce the character of some unknown sound source from the computer’s provisional impressions—referencing sound objects that are never experienced directly as such. But Hecker does not attempt to bridge this transcendental impasse—claims to direct or even partial access are vigorously frustrated. If anything, this impasse is exacerbated. Instead, 1935 suggests a perpetual splitting through the continual bifurcation of given perceptual unities, human and machine alike. The aforementioned formal bluntness further encourages this splitting, definitively denying the promise of phenomenal synthesis. Instead, what’s suggested is that this process will only continue—that formalization might even follow a recursive sequence into territories of ever-increasing abstraction. The disintegration of boundaries between sounding “reality” and artifice, and sensory perception and model, is precisely what propels 1935 forward. It is here that we understand Hecker’s interest in the artificialization of listening: both computer and human listening are identified as fundamentally provisional, and the continual deferral of an emergent whole attests to that fact. Both human and machine sensation perpetually split into finer and finer degrees of analysis, definitively denying the prospect of any Archimedean point. But, for Hecker, this is not a source of defeatism: on the contrary, in this recognition the artist identifies an invitation for experimentation. The denaturalization of given listening through psychoacoustic theorization and endlessly refining computational modeling opens up to new, ever-transforming domains. After all, we do not yet know what the ear-brain complex, synthetic or organic, can do. We don’t know what multiplicitous impressions it can create—what hallucinatory experiences might await? In 1935, computation accelerates denaturalization and projects phenomenal listening beyond its bounds through the cleaving of the given from one into many. Through computation, listening is further freed from its naturalized role—freed into the delirium in which it is swimming.