Summoning Music and the Aesthetics of Responsibility: The Concept of Composition in the Generative AI Era


The world of music production is at a historic turning point, symbolized by Suno, an AI that generates music from text prompts. Suno lets anyone create music through the highly accessible medium of natural language, with no need for specialized knowledge of scales and harmony or proficiency in DAW operation.

From the perspective of democratizing musical expression, this change is one of the greatest achievements in human cultural history and deserves wide recognition. Many people are now experiencing the “sensation of expression” by giving musical form to their own concepts and are enjoying the pleasure of creation.

However, behind these joyful voices, I find myself confronting a fundamental question: “What is the structural difference between the act of generation by AI and composition in its traditional sense?”

Exploring this question is not an attempt to deny new technologies or the trend of democratization. Rather, it is a constructive effort to acknowledge and embrace their wonders while looking more deeply at the core of what we have called ‘composition.’ At the heart of this inquiry lies the suspicion that “physical commitment” and “ethical responsibility” are being structurally bypassed by the AI process.

This article therefore dissects Suno’s generation process, focusing on two main motifs to uncover the nature of this structural difference:

Physicality (the direct intervention in musical qualia)
Ethical Responsibility (the locus of the ethical subject)

By redefining the boundary between this process and the traditional concept of composition, I will attempt to construct a meta-theory of composition.

Furthermore, by viewing Suno as a “mirror that retro-reflects the concept of composition,” I will also attempt to present a new role for composers and an aesthetics of responsibility in the age of generative AI. This is not an inquiry into whether “AI can replace composition,” but an attempt to shed light on the fundamental question of “what is irreplaceable in composition?”

---- Contents ----

Chapter 1: The Physicality of Composition – A Dialogue of Feedback Loops
Chapter 2: Dissecting the Suno Process – The “Summoned” Music
Chapter 3: Placing in Historical Context – A Comparison with Graphic Scores
Chapter 4: Composition as a “Social Action” – The Choice of Creative Commitment
Chapter 5: Redefining Composition and the “Aesthetics of Responsibility”
Addendum: On the Scope and Issues of This Essay

Chapter 1: The Physicality of Composition – A Dialogue of Feedback Loops

In this age of AI, reconsidering the definition of composition requires viewing the act of composing as a process rather than as a set of specific elements or conditions. I see the core of the compositional act as “an active feedback loop that includes the step of interaction and communion with musical materials and components.” This loop is a structure common to the diverse methods composers use, such as notation, performance, and DAW editing.

For example, a composer might enter a phrase into a DAW, arrange and edit a sample, or play a riff on a guitar. They then immediately “listen” to the result of that “act,” make an “aesthetic judgment,” and connect it to the next “correction or operation.” This constant “dialogue” is crucial for precisely realizing creative intent. This feedback loop is not mere trial and error but a process of deepening that involves responsibility for the texture and structure of the sound. An act lacking this process cannot, I believe, be included in the strict definition of ‘composition’ (the concept redefined in this essay).

For this feedback loop to become a creative dialogue rather than mechanical trial and error, the intervention of “physicality” is essential. In this article, I will call this requirement “the intervention of the listening body.” To be clear, the “physicality” mentioned here is not about the amount of physical movement. Its core lies in the directness of operation and the resolution of feedback: ‘how directly and with what fine resolution one can manipulate sonic qualia (the raw texture of sound and the emotions it evokes) and receive immediate feedback on the result.’

From this perspective, the nature of mouse operations in a DAW, which at first glance seem non-musical and non-physical, becomes clear. A mouse click or drag, though physically slight and not outwardly musical, is the direct execution of a highly specific intention based on sonic qualia: raising the pitch of a MIDI note by a semitone, for instance, is an act of “making this sound this high.” While the tool (the mouse) is indirect, the intervention on the object (the sound) is direct.

Similarly, in a compositional style where a phrase heard in one’s head is written directly onto a score and then refined there, the object the composer is manipulating is the “imagined sonic qualia itself.” Musical thought directly simulates the sound, judging its resonance in terms such as “how would it sound if an A♭, not a G, followed this C?” Thus, playing the piano, operating a DAW, and constructing sound in the mind differ in their interfaces but share a common structure: their feedback loops directly target “sonic qualia.”

In contrast, Suno’s process structurally bypasses this “direct intervention in sonic qualia.” The user gives instructions in language and receives a result from the AI, remaining only indirectly involved. The user’s feedback loop in Suno targets “linguistic instructions” rather than “sonic qualia,” and this is the crucial difference. Such a musically detached operation can ground neither an active judgment on the delicate texture of qualia nor subjective responsibility for it.

To be clear, this principle of “the intervention of the listening body,” or more strictly, “direct intervention in sonic qualia,” is not intended to negate the act of generation by Suno. It is meant to draw a “logical boundary” that respects the purity and historical weight of the concept of “composition.” In the next chapter, based on this principle, I will analyze Suno’s process in detail and consider the concept of “Summoning,” which I propose as a new creative genre.

Chapter 2: Dissecting the Suno Process – The “Summoned” Music

In light of the definition from Chapter 1, music generation by Suno would not be considered composition in the strict sense. The reason is a structural shift in the object of the feedback loop. In traditional composition, the object of the feedback loop is the “sonic material or component” itself, and the composer intervenes physically in the sonic qualia.

In Suno’s process, however, the user’s operation is limited to a conceptual medium: “linguistic instructions (prompts).” When a user adjusts their musical intent, they do so by “modifying the prompt,” not by “directly manipulating the generated sonic material.” This structure places the user in a passive role as a “presenter of concepts and a selector of results,” rather than a “designer of the music.” The most critical step of creative judgment—the value judgment and decision of “why this sound is beautiful”—is delegated to the statistical black box of the AI model, hollowing out the creative subject.

To aptly describe this hollowing out of the creative subject and the non-physical nature of the process, this essay proposes to define and name the act of generation by Suno as “the Summoning of music.” Whereas composition is an “intentional construction (Generation)” through dialogue with materials, “Summoning” signifies “the act of calling forth a being beyond one’s own intent with a spell (prompt).”

However, qualitatively separating “Summoning” from “composition” is by no means a denial of the creative endeavor inherent in summoning. Rather, I regard summoning as a new creative activity that does not fit into the narrow concept of traditional composition and holds potential for an unforeseen future. This definition of “Summoning” embodies two important elements.

1. Expressing the Black Box Nature
Generative AI probabilistically produces sound that matches a prompt, using a model trained on vast amounts of data, and sometimes yields outputs that exceed the user’s expectations. Calling this process “Summoning” accurately expresses its black-box nature: the user cannot be held responsible for the resulting sonic structure.

2. Affirming the Sense of Indirect Operation
The act of “Summoning” involves indirect operation (a spell). This does not negate the “sense of manipulating the music production process, albeit indirectly” that Suno users enjoy. Instead, it positively accepts “operation by language” as a new form of creation, while simultaneously suggesting its essential boundary.

Incidentally, the term “Summoning” is not an original neologism of this essay. It originated in the early days of image generation AI a few years ago, when many users intuitively described the experience of obtaining unintended high-quality results from complex prompts as “summoning.” This essay borrows this shared metaphor, not merely as internet slang, but as a concept for analyzing the structure of creative acts. In other words, the term “Summoning” is used as a key to connect the user’s intuitive feeling of “not knowing what will come out” to more structural issues such as “black-box nature,” “non-intervention in qualia,” and “the absence of an ethical subject.”

In addition, summoning is “the act of calling forth,” which implies the “otherness” and “unpredictability” of a visitor. In terms of temporal structure, the word also carries the sense of “appearing in an instant,” which sets it in contrast to existing production processes.

Needless to say, the term “Summoning” does not deny the creativity of users who meticulously craft prompts and select the best from countless outputs. That effort involves undeniable creative judgment, much like a great film director choosing the best take or a curator breathing life into an exhibition. The reason for calling it “Summoning” here is to clarify that this type of creativity is qualitatively different from “construction,” which involves a direct struggle with materials. Just as a director or curator works with “the creations of others” (an actor’s performance or an artist’s work), a Suno user also confronts a “visitor beyond their own intent,” produced by the black box of the AI.

Defined as “the Summoning of music,” Suno’s act gains value not just as “advanced selection” but as “concept design.” However, for summoned music to be elevated to composition in the strict sense, the user must actively intervene in the sonic qualia of the summoned sound with their own “listening body” and “professional knowledge and experience,” overwriting the AI’s statistical judgment with “human ethical intent.” The next chapter will examine how this structure of “instruction and interpretation/generation” in “Summoning” differs from the graphic scores of past music history and will locate Suno’s position within that history.

Chapter 3: Placing in Historical Context – A Comparison with Graphic Scores

The structure seen in Suno’s process, where a “medium” is placed between natural-language instruction and sonic realization, is not entirely without precedent in music history. In mid-20th-century contemporary music, the graphic score attempted this “separation of instruction and interpretation” by entrusting the composer’s intent to abstract shapes and symbols.

Among the pioneers of this approach were Earle Brown and John Cage. Earle Brown’s “December 1952” consists of horizontal and vertical bars of varying length and thickness, reminiscent of a Mondrian abstract painting. The performer subjectively translates (interprets) this visual information into pitch, duration, dynamics, and so on, making active judgments on the spot. Brown’s intention was to entrust the performer with “subjective interpretation” and “structural responsibility.” John Cage took a similar approach in “Fontana Mix” and, in many other works, introduced chance operations to thoroughly eliminate the composer’s ego and heighten the indeterminacy of the performance.

The structure of these graphic scores may at first seem similar to Suno’s “prompt → AI → sound” flow, but the presence or absence of a “human body” in between constitutes an essential difference. This difference might appear to be one of interface, but its essence lies in whether the interpreter/executor is an “ethical, responsible body.”

The structure of the graphic score was premised on the idea that “a human performer” would interpret and realize the composer’s abstract instructions through their physicality. For example, a performer reading a score by Brown must make an active creative judgment: “How, with my technique and physical perception, and with what sonic qualia as a background, will I responsibly realize this shape?” The performer was a vicarious creative subject, bearing professional responsibility for their interpretation. At times, Cage may even have aimed to destroy this very chain of responsibility. Even in his radical practice, however, the fact remains that at the final gate, where chance instructions were brought to life as sound, there was always the physical interpretation of a “living performer” bearing ethical responsibility.

In this respect, graphic scores expanded the possibilities of human creative interpretation. But they were also a perilous experiment in which the composer stepped back from the core of composition. Nevertheless, the act of composition remained viable because its core principle, “direct intervention in sonic qualia,” was maintained, if only barely, by being delegated to another human subject: the performer.

In Suno’s case, however, this “interpreting subject with a human body” has been replaced by “probabilistic statistical processing based on training data.” The AI mechanically generates a statistically probable combination, drawn from its learned model of past sonic patterns, to realize the instructed concept. There is no physical commitment to the qualia, no sense of “why this sound is beautiful.”

Thus, while graphic scores expanded the “possibilities of human creative interpretation” and maintained the principle of “direct intervention in sonic qualia” in composition by delegating it to the performer, Suno has replaced the very “necessity of physical and ethical interpretation” with technology. It is here that we can recognize a structural rupture between the two.

The perspective of treating AI as a “new type of performer” seems appealing at first glance. However, the essential difference here is the “presence or absence of accountability.” A human performer is an ethical subject who, when asked “why did you choose this performance?”, can answer based on their own musical views and aesthetics (qualia). In contrast, AI bears no ethical accountability for its output. Its decisions rest purely on statistical probability, with no “aesthetic conviction” involved. This “absence of an ethical subject” is the definitive reason why AI cannot be spoken of in the same breath as a human performer, and it is the basis for arguing that Suno stands not as an extension of compositional history, but at a “point of rupture.”

As a result of this comparative analysis, Suno can be seen not as an evolution on the extension of compositional history that graphic scores opened up in terms of “freedom of instruction,” but as standing at a qualitatively different point, having bypassed the creative core of “human interpretation.” Whereas graphic scores maintained the chain of responsibility by shifting the site of active human involvement from composer to performer, Suno structurally lacks the most crucial link in that chain: the “interpreter with a body.”

Therefore, a deep understanding of this structural feature—the “absence of an ethical subject”—becomes the starting point for considering the fundamental issue of the “professional ethics of a composer,” which will be discussed in the next chapter. Of course, this is a principled discussion based on a structural analysis of the creative process and does not deny that contemporary creators can fluidly move back and forth between the acts of “Summoning” and “composition” in their practice. It is noteworthy that technology itself is rapidly evolving to bridge this principled rupture at the practical level, as seen in the MIDI export and part-separation/editing features of the latest version of Suno.

Chapter 4: Composition as a “Social Action” – The Choice of Creative Commitment

In the preceding chapters, we have examined the internal processes of the compositional act, namely “physicality” and “responsibility.” However, the act of composition is not something that is completed solely within the individual. It is essentially a “social action” that functions within relationships with society and others.

At the most fundamental level, the very will to create music and deliver it to others (listeners) is already a social act. It is a form of communication, an attempt to share one’s inner world with others and to influence their hearts in some way. The nature of composition as a social action becomes more concrete in today’s diverse creative forms, such as internet-based collaborations between musicians or with creators from other genres.

This nature is most pronounced in the world of composite arts, such as film and stage music. Here, composers are required to collaborate deeply with directors, producers, actors, and various other creators, sharing responsibility for the overall success of the work. From this collaborative perspective, the creation of music is not only the “precise realization of intent” but also the “sharing of creative commitment” with everyone involved. It is this commitment that may become a crucial value for the profession of a composer in the age of AI.

Now, to understand how Suno’s music creation process structurally differs from the traditional compositional process, let us consider the analogy of an architect. After receiving a concept (the client’s request), an architect engages in the meticulous work of “design,” actively working through every structure and material with their own knowledge and experience.

In this analogy, a Suno user first acts as a “client,” providing a concept to the AI. They then make the “choice” to entrust the subsequent “design” step to the AI. That is, by choosing to delegate the design to the AI, the user enjoys the convenience of “instantly turning a concept into sound,” but in return, the responsibility for “actively determining all sonic structures” is handed over to the AI’s black box. If, instead, they choose the role of a composer, they impose on themselves an active commitment to the “design.”

This perspective of “choosing to engage in the design” is a proposal for pursuing the joy of creation even more deeply. It means fully utilizing the “concept realization ability” provided by AI, while actively intervening in its generated output with one’s own physical feedback (instruments, DAW operations, musical knowledge and experience) to sublimate the AI’s statistical judgment into human ethical intent. This may be the path for creators in the age of AI to fulfill their creative commitment of “taking responsibility for every sound” while benefiting from the fruits of democratization.

Suno has made the path to creation surprisingly short, but I believe that any further creative leap from there will always be made through one’s own hands and ears. In the final chapter, I will summarize the content so far and discuss the “aesthetics of responsibility.”

Chapter 5: Redefining Composition and the “Aesthetics of Responsibility”

In this essay, I have attempted to redefine the boundaries of the creative act of composition by structurally analyzing the expansion of the concept of composition brought about by music-generating AIs like Suno. I have concluded that composition is not merely the generation of sound, but an act accompanied by an active feedback loop and ethical responsibility.

Synthesizing these points, a new definition of composition in the age of AI would be as follows:

“Composition is a social action involving physical feedback, where one bears the ethical responsibility as a designer for the final determination of the sound.”

In light of this definition, music generation by AI is characterized by two structural features: the “non-use of physical feedback” and the “delegation of design responsibility.” Therefore, it seems appropriate to position this creative act as a new creative genre, “the Summoning of music,” independent of the domain of composition in the strict sense defined in this essay.

In a sense, this new endeavor can no longer be contained within the narrow, humanly constrained domain of composition. Supported by AI technology and encompassing the yet-unseen potential of the future, it can be positively positioned as a vaster, newer creative genre.

Looking again at how creation differs between humans and AI: AI belongs to the world of “the speakable” (language, data, patterns), whereas human creation extends into the realm of “the unspeakable.” This includes the subtleties of emotion that can never be captured by words, the texture of the world, and the mystery of existence. Composition, I believe, is an endless process of exploration in which the composer enters this realm of “the unspeakable” with their own body and sensibility, savors it, and ultimately, through the non-verbal order of sound, attempts to outline its contours. One can instruct an AI to create “sad music,” but the existential struggle of confronting the texture (qualia) of one’s own “unnamable sadness” one-on-one through sound is something only a human can do.

If we summarize the discussion so far, the answer to the question posed at the beginning of this essay, “what is irreplaceable in composition?”, naturally emerges. Composition by humans encompasses a trinity: dialogue with qualia through the body, the acceptance of ethical responsibility, and the exploration of the unspeakable. These are, in essence, things that AI cannot replace.

Suno has democratically fulfilled people’s desire to “turn concepts into sound” by lowering the high barrier of the compositional process to its initial stage, concept design. This is a literal “democratization of musical expression” and a wonderful achievement that deserves high praise. An important, though less visible, role of Suno that I want to highlight is its function as a “mirror that retro-reflects the concept of composition.” The revelation of AI’s non-physical process has, paradoxically, led to a renewed recognition of the meaning and value of the “physical effort,” “responsibility for sound,” and “struggle with materials” that human composers have long undertaken without being conscious of it.

Considering these points, the emphasis of the composer’s role in the age of AI may shift from “technical superiority” to “ethical execution.” This is a rise in the importance of a value system that could be called the “aesthetics of responsibility.” The aesthetics of responsibility can be expressed as follows: “The beauty of a work lies not only in the perfection of the sonic result, but also in the very fact that the author has taken responsibility for every corner of that sound with physical effort and creative decisions.”

And to those who enjoy summoning with Suno, I would like to extend the following “invitation” as a partner in exploring the joy of creation: fully utilize the concept generation ability provided by AI, and actively intervene in its generated output with your own physical feedback (instruments, DAW operations, musical knowledge and experience, etc.), overwriting the AI’s statistical judgment with human ethical intent. This path of “AI-mediated hybrid composition” may offer creators in the age of AI a new value to explore, one that embodies the “aesthetics of responsibility” while fully enjoying the fruits of the democratization of creation.

Earlier, in Chapter 3, I mentioned a “principled rupture” between Suno and the history of composition. By this I did not mean an impassable cliff. Rather, it was an accurate map showing us where we should build a bridge. And gratifyingly, that bridge is already being built. The latest version of Suno (Suno Studio), released in September 2025, has implemented editing functions that were previously possible only in professional DAWs, such as “separation and individual generation of parts” and “conversion of parts to MIDI data.” Suno users can now go beyond accepting the results of AI summoning as a black box: they can place those results on their own DAW workbench and bring them into the process of composition, where they take responsibility for the sound. Technology has opened a path for anyone to shift from “advanced curator” to “responsible creator.”

“Co-creating with AI” is a simple phrase, but as discussed, it is layered with many situations and intentions. If we build on the theory of composition reached through these reflections, this co-creation will begin to unfold before our eyes as a “possibility with deep colors.”

To conclude this essay, let me ask two questions. If you are filled with the joy of summoning, at what moment does that magic shine brightest? Is it when you think of the perfect spell, or when you encounter an unexpected treasure? And if you are embarking on the path of composition, when do you feel, “I am responsible for this sound”?

The answers to both questions should reveal your own irreplaceable creative subjectivity in the age of AI.

Addendum: On the Scope and Issues of This Essay

1. In emphasizing the “physicality” of traditional composition, I may have idealized it. Also, my caution towards the acceleration of AI technology may have limited my imagination regarding the “unforeseen creative possibilities” that AI will bring.

2. Regarding ethical responsibility, I focused mainly on the “composer’s internal commitment,” leaving broader external issues, such as social and legal responsibility, as important topics for the future. I would like to add here that the “aesthetics of responsibility” presented in this essay is, at this point, a limited concept derived from my personal practice.

3. This essay positions Suno as a “point of rupture” in the history of composition solely to point out the break in the principled chain of human physical interpretation. In the creative practice of individual contemporary creators, on the other hand, the acts of “Summoning” and “composition” can form a fluid spectrum, and the word “rupture” does not imply an exclusive disconnection between the two. Indeed, the editing features and MIDI export function of Suno Studio show that anyone can experience this spectrum from summoning to composition.

4. I would like to state again that this essay has no intention of denying or disrespecting the joy of those who enjoy the fruits of the democratization of musical expression.

5. Whether this essay has succeeded in creating a point of reference by shedding light on the question “what is it about composition that cannot be replaced?” is something I would like to leave to you, the reader.