A hermeneutic inquiry into musical meaning in AI-generated music: a case study of Suno AI’s text-to-music system

Novia Ratnasari
Aji Prasetya Wibawa
Syaad Patmanthara

Abstract

This study investigates how generative artificial intelligence participates in the creation and interpretation of musical meaning, focusing on Suno AI’s text-to-music system as a concrete case. The research addresses the problem of how machine-generated sound can be understood hermeneutically: specifically, how linguistic prompts, probabilistic modeling, and audio-generation processes shape meaning, emotion, and musical intention. The objective of the study is to examine the extent to which generative AI functions as an epistemic partner rather than a passive tool, and to identify how its outputs align with or diverge from human interpretive expectations. Using a digital epistemological hermeneutic framework, operationalized through prompt-based observation, semantic interpretation, and comparative listening analysis, the study conducted a series of controlled experiments varying genre, instrument, mood, and tempo. For each generated output, the analysis evaluated changes in expressive quality, emotional valence, stylistic coherence, and prompt-response fidelity; these evaluation criteria allow the hermeneutic framework to be applied systematically rather than only conceptually. The findings show that generative AI constructs musical meaning through inference and representational mapping, producing sonic forms that partially reflect the semantic cues encoded in linguistic prompts. While the system does not demonstrate human-like intentionality, its probabilistic structures generate patterns that resonate with human affective and interpretive frameworks, revealing a co-creative space in which human prompts and machine inference jointly shape musical expression. These results contribute to music and AI studies by demonstrating how hermeneutics can serve as a methodological lens for understanding AI-mediated creativity, and by highlighting the implications of prompt design, model transparency, and human-machine interpretation for future research in computational musicology and creative AI systems.
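
The abstract describes controlled experiments in which prompt attributes (genre, instrument, mood, tempo) are varied and each generated output is judged against four listening criteria. The following is a minimal illustrative sketch, not the study’s actual instrument: the attribute values, prompt wording, and criterion labels below are hypothetical, and it shows only how such a prompt-variation grid and per-output evaluation record might be organized.

```python
# Illustrative sketch only: hypothetical prompt-variation grid and evaluation
# record for a text-to-music experiment of the kind the abstract describes.
from itertools import product
from dataclasses import dataclass, field

# Hypothetical attribute values (not taken from the study's materials).
GENRES = ["ambient", "jazz", "folk"]
INSTRUMENTS = ["piano", "strings", "synth"]
MOODS = ["melancholic", "hopeful"]
TEMPOS = ["slow", "fast"]

# The four evaluation criteria named in the abstract.
CRITERIA = ["expressive_quality", "emotional_valence",
            "stylistic_coherence", "prompt_response_fidelity"]

@dataclass
class Trial:
    """One prompt sent to the text-to-music system and its listener ratings."""
    prompt: str
    scores: dict = field(default_factory=dict)  # criterion -> rating

def build_prompts():
    """Enumerate one text prompt per combination of the varied attributes."""
    for genre, instrument, mood, tempo in product(GENRES, INSTRUMENTS, MOODS, TEMPOS):
        yield Trial(prompt=f"A {tempo}, {mood} {genre} piece led by {instrument}")

if __name__ == "__main__":
    trials = list(build_prompts())
    print(f"{len(trials)} prompt variations; criteria: {CRITERIA}")
```

In such a setup, each trial’s prompt would be submitted to the generative system and the resulting audio rated on the four criteria, so comparative listening analysis can proceed systematically across the full attribute grid.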
