Aug 8th 2007 Publications
Jorge Gonzalez Alonso, Maria del Pilar Garcia Mayo, and Julián Villegas. Processing of English compounds by Basque-Spanish bilinguals: The role of dominance. In Hispanic Linguistics Symposium, Gainesville, Florida (USA), Oct. 2012.
Word-formation processes vary greatly among languages, although those which are typologically close tend to cluster around particular configurations which may or may not differ from those of other linguistic families. The case of compound words in Romance and Germanic languages has received a considerable amount of attention from both theoretical linguists (Contreras, 1985; Yoon, 2009) and acquisitionists (Liceras & Díaz, 2000; Slabakova, 2002; García Mayo, 2006), with the second focusing more on the interplay between two or more systems in a multilingual setting. The case of deverbal N+N compounds (e.g. can opener) in English as compared to their V+N Spanish semantic equivalents (e.g. abrelatas ‘can opener’, lit. ‘opens-cans’) is particularly interesting. What seems apparent is that Spanish and English do not lexicalise verb-noun relationships in the same way. Basque, on the other hand, does seem to have direct parallels with English: Basque deverbal compounds are also right-headed N+N constructions, in which the deverbal head has been nominalised through affixation (e.g. kontu kontalaria, lit. ‘story teller’). In light of this, are there any facilitatory effects in processing for those bilinguals whose L1 is similar to the L3 in the formation of deverbal compounds? We carried out an experiment in which we controlled for both language profile and proficiency. Sixty-six participants belonging to one of three language groups (L1-Spanish monolinguals, L1Basque-L2Spanish bilinguals and L1Spanish-L2Basque bilinguals) were assigned to one of three levels of proficiency in English (high, medium or low) based on their scores on the standardised Oxford Placement Test, and further tested in a lexical decision task, where they were asked to respond whether the items appearing on screen were actual English words. For the critical condition, 42 high-frequency English compounds and 42 pseudo-compounds (non-words) were used. The design was completed with 168 fillers: 84 non-compound words and 84 non-words. We predicted practically equal accuracy rates for all groups at comparable levels of proficiency, since the effect is not expected to override lexical knowledge; a faster performance of the monolingual group, due to an attested higher processing cost in bilinguals (Costa, 2005); and shorter response latencies for the Basque-dominant bilinguals as opposed to their Spanish-dominant counterparts, since the critical structure is hypothesised to be more readily available for the former group. Results have largely matched our predictions: two-way ANOVAs performed on the data indicated no significant effect of the participants’ linguistic profile on their accuracy rates, a factor which was however significantly influential when it came to their response latencies to the critical conditions. That is, while all participants, irrespective of language group, performed equally well when compared to their proficiency-matched counterparts, Basque-dominant bilinguals were significantly faster at processing English deverbal compounds than their Spanish-dominant peers. These results will be considered in light of models of L3 transfer, for which they might have important implications.
 Michael Cohen, Rasika Ranaweera, Kensuke Nishimura, Yuya Sasamoto, Yukihiro Nishikawa, Tetunobu Ohashi, Ryo Kanno, Tomohiro Oyama, Anzu Nakada, and Julián Villegas. Whirled Sequencing of Spatial Music. In Proc. Audio Eng. Soc. Japan Sect. Conf., Sendai, Oct. 2012.
“Poi,” originally a Maori performance art featuring whirled tethered weights, combines elements of dance and juggling. It has been embraced by contemporary festival culture (especially rave-style electronic music events), including extension to “glowstringing,” in which a glow stick (chemiluminescent plastic tube) is whirled at the end of a string. We further modernize this activity, opening it up to internet-amplified multimedia. The ubiquity of the contemporary smartphone makes it an attractive platform for even location-based attractions. By sensing its magnetometer, the twirling of a mobile phone can be used to sequence score-following music. Synchronizing this sequencing with sound spatialization, also modulated by the azimuth of the whirled phone, as through an annular (ring-shaped) speaker array, allows interactive, multimodal interaction.
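As a rough illustration of the idea above, the following Python sketch derives an azimuth from two horizontal magnetometer components and advances through a score each time the whirled phone enters a new angular sector, using the sector index as a stand-in for a speaker in a ring-shaped array. The function names, the equal-width sector mapping, and the eight-speaker default are assumptions made for this example, not details taken from the paper.

```python
import math

def azimuth_degrees(mx, my):
    """Heading in degrees from horizontal magnetometer components (device assumed flat)."""
    return math.degrees(math.atan2(my, mx)) % 360.0

def sector(azimuth, n_sectors):
    """Index of the equal-width angular sector containing the azimuth."""
    return int(azimuth // (360.0 / n_sectors)) % n_sectors

def sequence_notes(readings, score, n_sectors=8):
    """Emit (note, sector) each time the phone enters a new sector.

    The sector index doubles as a hypothetical speaker index in an annular array.
    """
    events = []
    position = 0   # next note of the score to play
    last = None    # sector of the previous reading
    for mx, my in readings:
        current = sector(azimuth_degrees(mx, my), n_sectors)
        if last is not None and current != last and position < len(score):
            events.append((score[position], current))
            position += 1
        last = current
    return events
```

Each sector crossing both triggers the next note and selects where it is rendered, which is one simple way to keep sequencing and spatialization synchronized.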
We investigated the effect on objective speech intelligibility of scaling the fundamental frequency (f0) of voiced regions in a set of utterances. The frequency scaling was driven by maximising the glimpse proportion in voiced epochs, inspired by musical consonance maximisation techniques. Results show that depending on the energetic masker and the signal to noise ratio, f0 modifications increased the mean glimpse proportion by up to 15%. On average, lower mean f0 changes resulted in greater glimpse proportions. It was also found that the glimpse proportion could be a good predictor of music consonance.
In this talk, the origins of speech intelligibility and musical consonance are discussed. Physical, perceptual, and cognitive causes of these complex phenomena have been identified and the understanding of their interactions is still a very active topic of research. We will focus on acoustic and psycho-acoustic features of speech and music and present their effect on intelligibility and consonance. Particularly, the role of fundamental frequency in both phenomena will be discussed: we will show the effect of scaling the fundamental frequency of voiced regions on objective speech intelligibility. The frequency scaling is driven by maximizing the glimpse proportion, inspired by musical consonance maximization techniques.
Speech rate has been identified as one of the main differences between speaking styles. Clear speech (or the speech produced by someone who has been asked to speak clearly) and Lombard speech (or speech produced in noisy environments) are more intelligible than other styles (like casual and “normal” speech), and also have a slower speech rate. In this talk, we will discuss the role of speech rate on intelligibility, and show how to artificially modify speech rate either by stretching or compressing an utterance or by time-aligning one to another. Rather than discussing the inner mechanisms of the signal processing, we will focus on understanding the differences between the two modalities and how to use existing software applications to modify duration.
 Elizabeth Godoy, Yannis Stylianou, and Julián Villegas. Unsupervised Normal-to-Lombard Spectral Envelope Transformation; Examining Loudness, Voicing & Stationarity. In Proc. The Listening Talker Wkshp., Edinburgh, May 2012.
When speaking in noisy environments, humans modify their speech in order to make it more intelligible: this phenomenon is known as the Lombard effect. It has been shown that, among the various Lombard modifications, those to the spectral envelope account for the largest increases in speech intelligibility. The present work examines and seeks to exploit the spectral envelope differences between Normal and Lombard speech for multiple (4 male, 4 female) speakers of the GRID corpus in an unsupervised context, i.e., in the absence of segmentation or phonetic labeling. Our goals are twofold: 1) to transform the Normal speech spectral envelope towards that of the Lombard; 2) to isolate acoustic criteria that help to identify and better understand important (e.g. perceived) spectral differences between Normal and Lombard speech.
Speech produced in the presence of noise – Lombard speech (LS) – has been found to be more intelligible than ‘normal’ speech when presented in equivalent amounts of noise. However, the origin of the Lombard speech advantage remains unclear. Part of the benefit appears to stem from spectral changes in LS which shift energy into the 1-4 kHz region where it better escapes energetic masking by speech-shaped noise. Other parameters which show changes in LS include F0 and duration. Lu & Cooke (2009) modified the mean F0 and spectrum (both independently and jointly) of normal speech, demonstrating a clear advantage of spectral modification but no effect of F0. The current study extended Lu & Cooke (2009) in two directions. First, durational modifications to reflect differences between normal and Lombard speech were included. Second, as well as global (per utterance) changes, local (per frame) modifications were applied. Four male and four female talkers produced simple sentences containing spoken letter and number keywords in quiet and in intense speech-shaped noise (96 dB SPL). A perception experiment explored global versus local modifications of spectral and durational parameters applied independently. Spectral changes based on global or local spectral difference measures were equally beneficial. Taken across all talkers, durational changes to normal speech produced no intelligibility benefit, whether applied globally (i.e. linear stretching or compression) or locally (using dynamic time warping to align normal and Lombard frames). However, when talkers were partitioned in two groups according to their speech rate in noise, some effect of durational modifications was observed: normal speech modified to the faster speech rate group was significantly less intelligible than unmodified speech, while conversely for the group with slower Lombard speech a small intelligibility benefit was present. These findings suggest that durational differences between normal and Lombard speech can affect intelligibility, but the benefit or otherwise depends on individual differences in speech rate.
 Julián Villegas, Martin Cooke, and Catherine Mayo. The influence of temporal and spectral modifications on the intelligibility of normal and Lombard speech. In Proc. SPiN–2012: The 4th Int. Wkshp. on Speech in Noise: Intelligibility and Quality, Cardiff, Jan 2012.
The current study independently manipulated spectral and durational parameters of ‘normal’ and Lombard utterances. For each parameter, normal speech was modified to take on the values observed in Lombard speech, while Lombard speech was modified to match those values found in normal speech. Modifications were applied globally or instantaneously. Durational modifications had no effect on intelligibility, while spectral changes led to large gains. Global modifications produced larger effects than instantaneous modifications. These findings suggest that most of the intelligibility benefit of Lombard speech is due to the release from energetic masking resulting from spectral changes. However, Lombard speech retains some residual intelligibility benefit. The current study demonstrates that this residual gain is unlikely to be due to the slower speaking rate observed in Lombard speech.
 Michael Cohen, Rasika Ranaweera, Hayato Ito, Shun Endo, Sascha Holesch, and Julián Villegas. Whirled worlds: Pointing and spinning smartphones and tablets to control multimodal augmented reality displays. In HotMobile: Proc. Int. Wkshp. on Mobile Computing Systems and Applications, San Diego, 2012.
 Vincent Aubanel, Julián Villegas, and Martin Cooke. Conversing in the presence of another conversation: interactive and Lombard effects. In Wkshp. on the production and comprehension of conversational speech, Nijmegen, the Netherlands, Dec. 2011.
Conversational speech is usually characterized by well-described departures in production from carefully pronounced speech, while it maintains a fair level of comprehension. Less is known, however, about the speech production modifications induced by a background conversation on a foreground conversation, and how speakers manage to maintain intelligibility and comprehension in this scenario. Extending a previous study with Spanish native speakers, we recorded pairs of English native speakers engaging in natural dialogues in the absence or presence of another talker pair. We observed small but significant increases in energy and F1 across conversations, and larger prosodic effects (increase of f0, decrease in speech rate) within dialogues, which are not easily explained in purely energetic masking terms. Indeed, background conversations are different from noise used in traditional Lombard studies in that they consist of intelligible speech, which creates the potential for informational masking at the ears of the interlocutor. We further tested whether speakers’ eye contact with their interlocutor could influence their capacity to cope with the disruptive effect of a background conversation. Preliminary results point to an attenuation of Lombard effects (i.e., intensity, f0, F1, speech rate) in the condition without eye contact, which could be the sign of an active monitoring of the informational content of the conversation, thus further minimizing the masking effect of the background conversation.
 Michael Cohen, Rasika Ranaweera, Hayato Ito, Shun Endo, Sascha Holesch, and Julián Villegas. Whirling interfaces: Smartphones & tablets as spinnable affordances. In Proc. of ICAT: The 21st Int. Conf. on Artificial Reality and Telexistence, Osaka, Nov. 2011.
Interfaces featuring smartphones and tablets that use magnetometer-derived orientation sensing can be used to modulate virtual displays. Embedding such devices into a spinnable affordance allows a “spinning plate”-style interface, a novel interaction technique. Either static (pointing) or dynamic (whirled) mode can be used to control multimodal display, including panoramic and turnoramic images, the positions of avatars in virtual environments, and spatial sound. “Spinning,” in which a flattish object is whirled with an extended finger or stick, is a disappearing art. We hope to re-motivate this vanishing skill, modernizing it and opening it up to internet-amplified multimedia. The ubiquity of the modern smartphone makes it an attractive platform for even location-based attractions. We are experimenting with embedding mobile devices into suitable affordances that encourage their spinning. Using azimuthal (yaw) tracking especially allows such devices to control horizontal planar displays such as periphonic spatial sound, as well as avatar heading and (QTVR-style) panoramic and turnoramic image-based rendering.
 Martin Cooke, Vincent Aubanel, Julián Villegas, and Maria Luisa Garcia Lecumberri. Lombard, interactional and overlap effects while conversing in the presence of competing speech. In Proc. of Int. Wkshp. “Computational Audition”, Delmenhorst, October 2011.
One aspect of the cocktail party problem which has hitherto received little attention is the role played by the interlocutors themselves in facilitating comprehension during conversations which take place in noise. Studying speech produced in noise (Lombard speech) is not new, but less is known about a talker’s response to ‘noise’ consisting of competing speech, especially in real conversations, where modifications to both low-level acoustic parameters as well as higher-level interactional aspects can be expected. Understanding a talker’s response to adverse conditions during spoken communication might lead to improvements in speech output technologies (for instance, rendering equal intelligibility at lower presentation levels, or more appropriate timing of interventions in dialogue systems). Here, we present results from two studies involving speech produced in the presence of intelligible speech.
 Julián Villegas, Vincent Aubanel, and Martin Cooke. Temporal changes in conversational interactions induced by the presence of a simultaneous conversation. In Proc. ESCOP, the 17th Meeting of the European Soc. for Cognitive Psychology, Donostia – San Sebastián, Spain, Sep. 2011.
This study aims to better understand the changes in foreground conversations induced by background conversations, particularly modifications in the temporal domain including overlaps between foreground and background speech. Understanding the strategies that humans adopt to orally communicate with a peer in the presence of competing dialogs could give some useful insights for developing improved human–computer interfaces, delivering aural information more effectively, etc. In comparison to the acoustic effects of a background dialog in a conversation, our knowledge of the interactional effects of background conversations is rather limited. In experiments involving simultaneous conversations, we have found intensity and fundamental frequency increments, speech rate decrements, and other changes associated with the Lombard effect in speech produced in the presence of competing talkers. Interactional effects such as a greater number of interruptions and dysfluencies, and less accurate turn taking were also seen. Unlike previous studies, we observed no reduction in overlap between foreground and background speech. We hypothesise that this unexpected result could be explained by visual cues used by the subjects during the conversation, methodological differences (i.e., as opposed to free conversations, previous reports focused on task-oriented experiments), or stimuli differences (a single competing talker instead of a spontaneous talking pair).
Major challenges to adapt all forms of speech output to a given auditory context (e.g., noisy or highly reverberant environments, second language or hearing-impaired listeners, etc.) based on human speaker strategies are discussed. Ongoing research aimed at increasing speech intelligibility in real-time without compromising speech quality (or fatiguing the listener) is described, and software applications used in this research are presented. This talk will also present auditory demonstrations of natural and artificial speech modifications.
 Michael Cohen and Julián Villegas. From Whereware to Whence- and Whitherware: Augmented Audio Reality for Position-Aware Services. In Proc. Int. Symp. on VR Innovation (ISVRI), Singapore, March 2011.
Since audition is omnidirectional, it is especially receptive to orientation modulation. Position can be defined as the combination of location and orientation information. Location-based or location-aware services do not generally require orientation information, but position-based services are explicitly parameterized by angular bearing as well as place. “Whereware” suggests using hyperlocal georeferences to allow applications location-awareness; “whence- and whitherware” suggests the potential of position-awareness to enhance navigation and situation awareness, especially in realtime high-definition communication interfaces, such as spatial sound augmented reality applications. Combining literal direction effects and metaphorical (remapped) distance effects in whence- and whitherware position-aware applications invites over-saturation of interface channels, encouraging interface strategies such as audio windowing, narrowcasting, and multipresence.
Wireless technologies allow the introduction of sensors in otherwise unanticipated devices, increasing the opportunities of interaction with real-world, daily-life things. In this paper the idea of controlling three-dimensional audio by means of flying discs is presented. The prototype implementation (based on gyroscopes, Xbee radios, Arduino micro-controllers, Pure-Data and Quartz Composer patches) helps to understand the challenges, capabilities and limitations of the underlying technologies that make such interactions possible.
 Martin Cooke, Julián Villegas, Vincent Aubanel, and Marco Piccolini-Boniforti. mtrans: A MATLAB Tool for Multi-Channel, Multi-Tier Speech Annotation. In Proc. Int. Wkshp. New Tools and Methods for Very-Large-Scale Phonetics Research, Philadelphia, USA, Jan. 28–31 2011.
A common requirement is to transcribe speech recordings with more than two channels, e.g. recordings of multi-person meetings or those exploring the effect of background conversations on foreground conversations. While some annotation tools and signal editors exist which can handle N–channel audio for N > 2, they rarely possess the flexibility in audio output and visual display required when transcribing multichannel conversational speech. Overlaps, frequent in natural conversations, are particularly problematic, and their analysis would benefit from support for visual signal separation. A further issue is the mapping from annotation tiers to channels, where typical 1:1 assumptions are too restrictive in cases where the tier corresponds to events in more than one tier. To support rapid and flexible annotation of multichannel speech with many levels of annotation, we have developed mtrans, a freely available MATLAB tool.
 Vincent Aubanel, Martin Cooke, Julián Villegas, and Maria Luisa Garcia Lecumberri. Conversing in the presence of a competing conversation: effects on speech production. In Proc. Interspeech, 2011.
This study investigates how background conversations affect foreground conversations, and how speakers may adjust their speech to overcome the perturbations. Three pairs of speakers were recorded in different combinations of simultaneous dialogues, and speech production modifications were investigated at an acoustical and interactional level. In addition to displaying standard Lombard effects, speakers were found to produce fewer back-channels and more interruptions in the presence of a background conversation. A decrease in the precision of turn taking was also observed. These results provide a better understanding of the strategies speakers may be developing in dealing with a concurrent conversation, with a view to incorporating them into spoken dialogue systems.
MTRANS, a freely available tool for annotating multi-channel speech, is presented. This software tool is designed to provide the visual and aural display flexibility required for transcribing multi-party conversations; in particular, it eases the analysis of speech overlaps by overlaying waveforms and spectrograms (with controllable transparency), and the mapping from media channels to annotation tiers by allowing arbitrary associations between them. MTRANS supports interoperability with other tools via the Open Sound Control protocol.
 Yuya Sasamoto, Julián Villegas, and Michael Cohen. Spatial Sound Control with the Yamaha Tenori-On. In Proc. HC-2010: 13th Int. Conf. on Humans and Computers, Aizu-Wakamatsu, Japan, Dec. 8–10, 2010.
In this research, we explore control of spatial sound using the Yamaha Tenori-On and the University of Aizu Business Innovation Center (UBIC 3D) Theater speaker array. This project explores the spatialization of music, by mapping Tenori-On performances and sound localization in a single operation, allowing the notes to freely move around the room.
 Julián Villegas and Michael Cohen. Hrir˜: Modulating Range in Headphone-Reproduced Spatial Audio. In Proc. of VRCAI2010: Ninth Int. Conf. on VRCAI (VR Continuum and Its Applications in Industry), COEX Seoul, Korea, December 12–13, 2010.
Hrir˜, a new software audio filter for Head-Related Impulse Response (HRIR) convolution, is presented. The filter, implemented as a Pure-Data object, allows dynamic modification of a sound source’s apparent location by modulating its virtual azimuth, elevation, and range in realtime; the last attribute is missing from the similar applications surveyed. With hrir˜, users can virtually localize monophonic sources around a listener’s head in a region delimited by elevations between -40° and 90°, and ranges between 20 and 160 cm from the center of the virtual listener’s head. An application based on hrir˜ is presented to illustrate its benefits.
We have retrofitted a vehicle with location-aware advisories/announcements, delivered via wireless headphones for passengers and bone-conduction headphones for the driver. Our prototype differs from other research in the spatialization of the aural information. Besides the commonly used landmarks to trigger audio stream delivery, our prototype uses geo-located virtual sources to synthesize the spatial soundscapes. Intended as a “proof of concept” and testbed for future research, our development features multilingual tourist information, navigation instructions, and traffic advisories rendered simultaneously.
This tutorial introduces the theory and practice of spatial sound for entertainment computing, including the psychophysical (psychoacoustic) basis of spatial hearing; outlines the mechanisms for creating and displaying spatial sound, the hardware and software used to realize such systems, and display configurations; and reviews some applications of spatial sound to entertainment computing, especially multimodal interfaces featuring spatial sound. Many case studies reify the explanations; animations, videos, and live demonstrations are featured.
 Julián Villegas. Beating and Roughness. The Wolfram Demonstrations Project, September 2010. [Online; accessed 23-Sep-2010]: http://demonstrations.wolfram.com/BeatingAndRoughness.
A demonstration of beating sinusoids showing fluctuation strength, roughness, and tone separation.
 Kazuhide Hamao, Julián Villegas, Michael Cohen, Jun Yamadera, and Kensuke Shimizu. “cyberbus” spatial sound navigation seminar. Technical report, University of Aizu, Aizu-Wakamatsu, Japan, June 2010.
We present a novel application of spatial (“3D”) sound to way-finding systems exploiting position-awareness (GPS) and hypermedia (GIS car navigation maps). We believe that this application supports drivers in finding the correct path to follow, so they do not need to take their eyes off the road to retrieve visual information from the car navigation system, thereby increasing driving safety. We are currently working on the integration of our system with already existing vehicular multimedia systems, mobile technologies, and driver guidance (Intelligent Traffic Systems–ITS). We envision a future where automobiles require little human intervention and, in those cases, the information presented to the driver matches the best sensory modality possible, which in many cases could be the auditory one.
GoldenM is a computer-generated composition created in Pd. It is an arrangement for three voices and filtered white noise. In GoldenM, each voice has a spectrum based on the golden ratio (about 1.61803), and the pitch set was selected using the minima of the dissonance function as proposed by Vassilakis. Rhythm and spatialization are generated using Markov chains. The purpose of the composition is to use the golden ratio in unnatural ways while preserving, to some extent, its esthetic nuance. The result, at moderate volume, resembles (at least to the author) the sound of chimes and bells used in Asian musical traditions.
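The two generative ingredients named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how the spectrum is built from the golden ratio, so spacing the partials at successive powers of the ratio is a guess, and the transition table RHYTHM (note durations in beats) is entirely hypothetical. The original piece was made in Pd, not Python.

```python
import random

PHI = (1 + 5 ** 0.5) / 2  # golden ratio, about 1.61803

def golden_partials(f0, count):
    """Partial frequencies at f0 * PHI**k (assumed spacing, not the piece's actual spectrum)."""
    return [f0 * PHI ** k for k in range(count)]

def markov_sequence(transitions, start, length, rng):
    """Walk a first-order Markov chain given as {state: {next_state: probability}}."""
    state = start
    sequence = [state]
    for _ in range(length - 1):
        choices, weights = zip(*transitions[state].items())
        state = rng.choices(choices, weights=weights)[0]
        sequence.append(state)
    return sequence

# Hypothetical transition table over note durations (in beats).
RHYTHM = {
    0.25: {0.25: 0.5, 0.5: 0.5},
    0.5: {0.25: 0.3, 0.5: 0.4, 1.0: 0.3},
    1.0: {0.5: 1.0},
}

durations = markov_sequence(RHYTHM, 0.5, 16, random.Random(0))
```

The same `markov_sequence` helper could drive spatialization by using speaker positions as states instead of durations.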
We have found evidence suggesting that for musically naïve participants, when selecting among similar renditions of the same musical fragment, psychoacoustic roughness is an influencing factor on preference. We designed an experiment to compare the acceptability of three different music fragments, rendered with three different intonations, and contrasted the results with those of isolated chords of the same fragment–intonation combinations.
The goal of this study is to help to understand the influence of psychoacoustic roughness in music. Roughness is an auditory attribute produced by rapid temporal envelope fluctuations (normally resulting from wave interferences), and it has been related to musical dissonance. After reviewing the main theories that explain the origin of roughness, a software program created for the purpose of this research is presented. This software application, based on a spectral model to predict roughness (a physical predictor of the auditory attribute), is able to control (usually, to reduce) the predicted roughness of a sound ensemble in realtime. Experimental results were analyzed with a standard measurement software tool to corroborate the predicted roughness reduction. The audio output of this software application was compared by human subjects with renditions of the same musical content using some well known tuning systems (twelve tones equal tempered and just tuning). The results of these subjective experiments are presented and analyzed. Preliminary results on binaural roughness perception are presented at the end of the dissertation as a new direction of research. Contributions of the present work include the creation of an adaptive tuning program capable of retuning audio streams in realtime to minimize the measured roughness due to the interaction between sounds (extrinsic roughness). To the best of our knowledge, this procedure had been applied only to MIDI sequences for which realtime constraints implied oversimplifications that are not assumed in our program. We were able to determine that roughness by itself can explain musical preference among musically naïve participants. In the analysis of that experiment, we found that, contrary to popular belief, predicted roughness of 12-TET intervals is not always greater than pure intervals. This discovery correlates with preference choices reported by participants. We also show that current roughness models need to be revised to include the effect of binaural cues. Other minor contributions include several entries in Wikipedia (e.g., Vicentino’s keyboard layout) and to Mutopia (e.g., Bach chorale BWV 264).
 Julián Villegas. Encuentros entre Colombia y Japón: homenaje a 100 años de amistad, chapter De como el mundo es un pañuelo y de las misteriosas maneras (Of how the world is a handkerchief and the mysterious ways). Colombian Ministry of Foreign Affairs, Bogotá D.C., Colombia, 2010. (Fiction) In Spanish.
 Julián Villegas and Michael Cohen. Mapping Musical Scales Onto Virtual 3D Spaces. In Yôiti Suzuki, Douglas Brungart, Hiroaki Kato, Kazuhiro Iida, Densil Cabrera, and Yukio Iwaya, editors, Principles and Applications of Spatial Hearing. World Scientific, 2010.
We introduce an enhancement in the Helical Keyboard, an interactive installation displaying three-dimensional musical scales aurally and visually. This improvement in the audio display is intended to facilitate didactic purposes by enhancing users’ immersion in a virtual environment. The new system allows spatialization of audio sources with elevation angles between -40° and +90° and azimuth angles between 0° and 355°. In this fashion, we could overcome previous limitations on the audio display of the Helical Keyboard, for which we heretofore usually displayed only azimuth.
We have created a reintonation system that minimizes measured roughness of parallel sonorities as they are produced. Intonation adjustments are performed by finding, within a user-defined vicinity, a combination of fundamental frequencies that yields minimal roughness. The vicinity imposition limits pitch drift and eases realtime computation. Prior knowledge of the temperament and notes being played is not necessary for the operation of the algorithm. We test a proof of concept prototype adjusting equal temperament intervals reproduced with a harmonic spectrum towards pure intervals in realtime. Pitch drift of the rendered music is not prevented but limited. This prototype exemplifies musical and perceptual characteristics of roughness minimization by adaptive techniques. We discuss the results obtained, limitations, possible improvements, and future work.
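The vicinity-limited search described above can be illustrated with a small Python sketch. It is not the authors' implementation: the roughness measure here is a swapped-in, widely used Plomp-Levelt style approximation with the constants from Sethares' parameterization, only one of the two tones is retuned, and the function names, the six-harmonic 1/k spectrum, and the 1-cent grid are all assumptions made for the example.

```python
import math

def pair_roughness(f1, a1, f2, a2):
    """Plomp-Levelt style roughness of two partials (Sethares' parameterization)."""
    s = 0.24 / (0.0207 * min(f1, f2) + 18.96)
    x = s * abs(f2 - f1)
    return min(a1, a2) * (math.exp(-3.51 * x) - math.exp(-5.75 * x))

def harmonic_partials(f0, count=6):
    """(frequency, amplitude) pairs of a harmonic spectrum with 1/k amplitudes."""
    return [(f0 * k, 1.0 / k) for k in range(1, count + 1)]

def extrinsic_roughness(f0_a, f0_b):
    """Roughness from interactions between the partials of two different tones."""
    return sum(pair_roughness(fa, aa, fb, ab)
               for fa, aa in harmonic_partials(f0_a)
               for fb, ab in harmonic_partials(f0_b))

def retune(f0_fixed, f0_free, vicinity_cents=20, step_cents=1):
    """Search offsets within +/- vicinity_cents for the least rough fundamental."""
    def shifted(cents):
        return f0_free * 2 ** (cents / 1200)
    best = min(range(-vicinity_cents, vicinity_cents + 1, step_cents),
               key=lambda cents: extrinsic_roughness(f0_fixed, shifted(cents)))
    return best, shifted(best)

# An equal-tempered fifth above 220 Hz, nudged toward lower roughness.
offset_cents, tuned = retune(220.0, 220.0 * 2 ** (700 / 1200))
```

Bounding the search to a small vicinity keeps the number of candidates fixed per chord, which is what makes this kind of adjustment cheap enough to repeat as the music plays, and it also caps how far any note can drift from its nominal pitch.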
 Julián Villegas and Michael Cohen. Mapping Topological Representations of Musical Scales Onto Virtual 3D Spaces. In Proc. IWPASH: Int. Wkshp. on the Principles and Applications of Spatial Hearing, Zao, Japan, November 2009. [Online; accessed 5-April-2010]: http://eproceedings.worldscinet.com/9789814299312/toc.shtml.
We have developed a Collaborative Virtual Environment (CVE) client that allows directionalization of audio streams using a Head-Related Transfer Function (HRTF) filter. The CVE is a suite of multimedia and multimodal clients, authored mostly in Java by members of our laboratory. Its simple but robust synchronization mechanism is used for sharing information regarding location and position of virtual objects among multiple applications. The new client has been deployed in conjunction with the Helical Keyboard, an interactive installation displaying three-dimensional musical scales aurally and visually, to offer a more realistic user experience and musical immersion. It allows spatialization of audio sources with elevation angles between -40° and +90° and azimuth angles between 0° and 355°. In this fashion we could overcome previous limitations on the auditory display of our objects, for which we heretofore usually displayed only azimuth.
 Michael Cohen, Julián Villegas, Mamoru Ishikawa, Akira Inoue, Hiromitsu Sato, Hiroki Tsubakihara, and Jun Yamadera. “VMP My Ride”: Windshield Wipers that Swing. In Proc. Asiagraph, pages 126–130, Tokyo, October 2009.
“VMP” (pronounced /vimp/) is acronymic for ‘Visual Music Player,’ a synæsthetic audio-visual renderer. Inspired by the hit TV show “Pimp My Ride,” in which cars are outrageously customized, we recast the windshield wipers of an automobile with advanced multimedia technology, allowing wipers to dance to music. We have programmed a beat detector using “Pure Data.” This musical beat signal is filtered to drive a phase-locked loop (PLL), which triggers choreographed, articulated gestures in virtual, actual, and model windshield wipers. We use a cyclic buffer to implement a moving (“windowed”) average, effectively a low-pass filter of inter-event intervals, which serves as a digital phase-locked loop (DPLL) that propels the selected rhythm. A choreography module articulates the wiper sweeping gesture, mapping the DPLL beats into segmented gestures used to drive windshield wipers in three varieties: virtual, through a driving simulator; real, pulsing a standard wiper motor; and miniaturized, through a stepping-motor actuated model wiper. The driving simulator, implemented in Java3D, renders the virtual gestures. Mechatronic interfaces use signals generated by the DPLL to trigger the physical wiper cycles; signals sent via USB to custom circuits driving Peripheral Interface Controllers motivate the motors.
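The windowed average of inter-event intervals mentioned in this abstract can be illustrated as follows. This is a minimal sketch, not the Pure Data patch itself: the class name, the window length, and the use of timestamps in seconds are all assumptions made for illustration.

```python
from collections import deque

class BeatPeriodEstimator:
    """Moving average of inter-beat intervals held in a cyclic buffer (illustrative sketch)."""

    def __init__(self, window=4):
        # A deque with maxlen drops the oldest interval automatically,
        # so it behaves like a fixed-size cyclic buffer.
        self.intervals = deque(maxlen=window)
        self.last_onset = None

    def on_beat(self, t):
        """Register a detected beat at time t (seconds) and return the smoothed period."""
        if self.last_onset is not None:
            self.intervals.append(t - self.last_onset)
        self.last_onset = t
        return self.period()

    def period(self):
        """Mean of the buffered intervals, or None until two beats have been seen."""
        if not self.intervals:
            return None
        return sum(self.intervals) / len(self.intervals)

    def next_beat(self):
        """Predicted time of the next beat, usable to schedule a wiper gesture."""
        p = self.period()
        return None if p is None else self.last_onset + p

# Usage: a slightly late fifth beat only nudges the estimated period.
est = BeatPeriodEstimator(window=4)
for onset in [0.0, 0.5, 1.0, 1.5, 2.1]:
    est.on_beat(onset)
print(est.period(), est.next_beat())
```

Averaging over a short window smooths jitter in the detected beats while still following gradual tempo changes; a longer window trades responsiveness for stability.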
 Michael Cohen, Julián Villegas, Mamoru Ishikawa, Akira Inoue, Hiromitsu Sato, Hiroki Tsubakihara, and Jun Yamadera. “VMP My Ride”: Windshield Wipers that Swing. In Proc. ARTECH, page 144, Tokyo, October 2009.
Instead of adding to the driving cacophony, windshield wipers can enhance musical audition, reinforcing a beat by hiding their cabin noise in the rhythm, increasing the signal-to-noise ratio by increasing the signal and masking the noise, and providing “visual music,” the dance of the wipers, as a welcome side-effect. We have implemented such a visual music player, which takes an arbitrary musical source and renders its beats as pulses of windshield wipers, both virtual and real. “VMP” (pronounced /vimp/) is acronymic for ‘Visual Music Player,’ a synæsthetic audio-visual renderer. Inspired by the hit TV show “Pimp My Ride,” in which cars are outrageously customized, we recast the windshield wipers of an automobile with advanced multimedia technology, allowing the wipers to dance to music.
 Julián Villegas and Michael Cohen. 音楽を探るオペレーションズ・リサーチ的手法を使って (Exploring Tonal Music Through Operational Research Methodology). Communications of the Operations Research Society of Japan, 54(9):554–562, October 2009. In Japanese.
Two operational research applications in music are presented. Initially, the mapping of musical scales into multi-dimensional topologies is discussed, and the advantages of projecting these structures into simple spaces explained. We also present the Helical Keyboard, an interactive installation displaying three-dimensional musical scales aurally and visually. Subsequently, the problem of minimizing musical dissonance between audio streams in realtime is discussed, and a solution based on local minima search described.
In traditional conferencing systems, participants have little or no privacy, as their voices are by default shared with all others in a session. Such systems cannot offer participants the options of muting and deafening other members. The concept of narrowcasting can be applied to make these kinds of filters available in multimedia conferencing systems. Our system treats media sinks (in the simplest case, listeners) as full citizens, peers of the media sources (conversants’ voices), and we therefore defined duals of mute & select: deafen & attend, which respectively block a sink or focus on it to the exclusion of others. In this article, we describe our prototyped application, which uses existing standard Session Initiation Protocol (SIP) methods to control fine-grained narrowcasting sessions. The runtime system considers the policy configured by the participants and provides a policy evaluation algorithm for media mixing and delivery. We have integrated a “virtual reality”-style interface with this SIP backend to display and control articulated narrowcasting with figurative avatars.
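One way to read the four narrowcasting operations is as a pair of symmetric predicates: a source or sink is inactive if it is explicitly blocked (mute, deafen), or if some peer has been singled out (select, attend) and it has not. The sketch below assumes a single session-wide policy and hypothetical function names; the evaluation algorithm in the chapter works from per-participant policies and is not reproduced here.

```python
def is_active(name, excluded, included):
    """True unless explicitly excluded, or implicitly excluded because a peer is included."""
    if name in excluded:
        return False
    if included and name not in included:
        return False
    return True

def audible_pairs(sources, sinks, muted=(), selected=(), deafened=(), attended=()):
    """List the (source, sink) pairs whose media should be mixed and delivered.

    Sources are filtered by mute/select; sinks by the dual pair deafen/attend.
    """
    muted, selected = set(muted), set(selected)
    deafened, attended = set(deafened), set(attended)
    live_sources = [s for s in sources if is_active(s, muted, selected)]
    live_sinks = [k for k in sinks if is_active(k, deafened, attended)]
    # A participant does not need to receive their own voice.
    return [(s, k) for s in live_sources for k in live_sinks if s != k]

# Usage: selecting alice implicitly mutes everyone else; carol is deafened.
people = ["alice", "bob", "carol"]
print(audible_pairs(people, people, selected={"alice"}, deafened={"carol"}))
```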
 Sabbir Alam, Michael Cohen, Julián Villegas, and Ashir Ahmed. Narrowcasting in SIP: Articulated Privacy Control. In Syed Ahson and Mohammad Ilyas, editors, SIP Handbook: Services, Technologies, and Security of Session Initiation Protocol, chapter 14, pages 323–345. CRC Press, 2009.
A software tool capable of determining auditory roughness in real-time is presented. This application, based on Pure-Data (Pd), calculates the roughness of audio streams using a spectral method originally proposed by Vassilakis. The processing speed is adequate for many realtime applications, and results indicate limited but significant agreement with an internet application of the chosen model. Finally, the usage of this tool is illustrated by the computation of a roughness profile of a musical composition that can be compared to its perceived patterns of ‘tension’ and ‘relaxation.’
 Mohammad Sabbir Alam, Michael Cohen, Ashir Ahmed, and Julián Villegas. Figurative Privacy Control of SIP-based Narrowcasting. In IEEE AINA: Int. Conf. on Advanced Information Networking and Applications, Ginowan, Japan, March 2008.
In traditional conferencing systems, participants have little or no privacy, as their voices are by default shared with all others in a session. Such systems cannot offer participants the options of muting and deafening other members. The concept of narrowcasting can be applied to make these kinds of filters available in multimedia conferencing systems. Our system treats media sinks (in the simplest case, listeners) as full citizens, peers of the media sources (conversants’ voices), and we therefore defined duals of mute & select: deafen & attend, which respectively block a sink or focus on it to the exclusion of others. In this article, we describe our prototyped system, which uses existing standard Session Initiation Protocol (SIP) methods to control fine-grained narrowcasting sessions. The design considers the policy configured by the participants and provides a policy evaluation algorithm for media mixing and delivery. We have integrated a ‘virtual reality’-style interface with this SIP backend to display and control articulated narrowcasting with figurative avatars.
 Michael Cohen, Ishara Jaysingha, and Julián Villegas. Spin-Around: Phase-Locked Synchronized Rotation and Revolution in Multistandpoint Panoramic Browsers. In Proc. IEEE CIT 2007: 7th Int. Conf. on Computer and Information Technology, pages 511–516, Aizu Wakamatsu, Japan, October 2007.
Using multistandpoint panoramic browsers as displays, we have developed a control function that synchronizes revolution and rotation of a visual perspective around a designated point of regard in a virtual environment. The phase-locked orbit is uniquely determined by the focus and the start point, and the user can parameterize direction, step size, and cycle speed, and invoke an animated or single-stepped gesture. The images can be monoscopic or stereoscopic, and the rendering supports the usual scaling functions (zoom/unzoom). Additionally, via sibling clients that can directionalize realtime audio streams, spatialize HDD-resident audio files, or render rotation via a personal rotary motion platform, spatial sound and proprioceptive sensations can be synchronized with such gestures, providing complementary multimodal displays.
 Julián Villegas and Michael Cohen. Synæsthetic Music or the Ultimate Ocular Harpsichord. In Proc. IEEE CIT: 7th Int. Conf. on Computer and Information Technology, pages 523–527, Aizu Wakamatsu, Japan, October 2007.
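The geometry behind such a phase-locked orbit can be sketched in two dimensions: the standpoint revolves on the circle fixed by the focus and the start point, while the view direction rotates by the same angle so that it keeps facing the focus. The function name, the planar simplification, and the parameter names are assumptions made for illustration.

```python
import math

def orbit_pose(focus, start, step, steps_per_cycle, clockwise=False):
    """Return (x, y, heading) after `step` steps of an orbit around `focus`.

    The radius and initial phase follow from `focus` and `start` alone.
    `heading` is the view direction in radians and always points at the focus.
    """
    fx, fy = focus
    sx, sy = start
    radius = math.hypot(sx - fx, sy - fy)
    start_angle = math.atan2(sy - fy, sx - fx)
    delta = 2.0 * math.pi * step / steps_per_cycle
    if clockwise:
        delta = -delta
    angle = start_angle + delta
    x = fx + radius * math.cos(angle)   # revolution: move along the circle
    y = fy + radius * math.sin(angle)
    heading = math.atan2(fy - y, fx - x)  # rotation: turn to face the focus
    return x, y, heading

# Usage: a quarter of a counterclockwise cycle around the origin.
print(orbit_pose((0.0, 0.0), (2.0, 0.0), step=1, steps_per_cycle=4))
```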
We address the problem of visualizing microtuned scales and chords such that each representation is unique and therefore distinguishable. Using colors to represent the different pitches, we aim to capture aspects of the musical scale that are impossible to represent with numerical ratios. Inspired by the neurological phenomenon known as synæsthesia, we built a system to reproduce microtuned MIDI sequences aurally and visually. This system can be related to Castel’s historic idea of the ‘Ocular Harpsichord.’
 Julián Villegas and Michael Cohen. Möbius Tones and Shepard Geometries: An Alternative Synæsthetic Analogy (poster). In Proc. SIGGRAPH NPAR: 5th Int. Symp. on Non-Photorealistic Animation and Rendering, San Diego, August 2007.
“Why did the chicken cross the Möbius strip?” – “Because it wanted to get to the same side.” We created a 3D animation to illustrate a different visual analogy for Shepard tones based on the well-known “Möbius Strip II” by Escher. This animation presents a sphere that, like the ants in Escher’s woodcut, moves longitudinally over the surface. The path followed by the ball varies its transverse position randomly but smoothly. We sample this path to render a melody of Shepard tone dyads, each tone in the dyad having a frequency equivalent to the position of the ball relative to the edge of the surface. The idea of using a Möbius strip to create music is not new. Tremblay, for example, used it to show how to construct a music-box able to play sequences backward and forward. However, our work differs from other developments in the use of this non-orientable geometry to illustrate the mentioned aural paradox.
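For readers unfamiliar with Shepard tones, the sketch below builds one from a normalized position, such as a sampled transverse position of the ball scaled to the range 0 to 1. A Shepard tone stacks octave-spaced sinusoids under a fixed bell-shaped envelope over log-frequency, so that a position of 1 sounds identical to a position of 0. The raised-cosine envelope, the base frequency, and the number of octaves are assumptions; the poster does not specify them, nor how the two tones of each dyad are derived.

```python
import math

def shepard_partials(position, f_min=27.5, octaves=8):
    """Octave-spaced (frequency, amplitude) pairs for a normalized position.

    `position` is cyclic: p and p + 1 give the same partials.
    """
    frac = position % 1.0
    partials = []
    for k in range(octaves):
        log_pos = (k + frac) / octaves          # 0..1 across the spectral envelope
        freq = f_min * 2.0 ** (k + frac)
        # Raised-cosine envelope: silent at both extremes, loudest mid-range.
        amp = 0.5 * (1.0 - math.cos(2.0 * math.pi * log_pos))
        partials.append((freq, amp))
    return partials

def shepard_sample(position, t):
    """Instantaneous amplitude of the tone at time t (seconds)."""
    return sum(a * math.sin(2.0 * math.pi * f * t)
               for f, a in shepard_partials(position))

# Usage: a position a quarter of the way across and the same position one cycle later.
print(shepard_partials(0.25)[:2])
print(shepard_partials(1.25)[:2])
```

Because the lowest partial fades in from silence as the highest one fades out, sweeping the position around the cycle produces the endlessly rising or falling illusion that the Möbius strip is meant to visualize.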
This article discusses the challenges of applying the tonotopic consonance theory to minimize the dissonance of concurrent sounds in real-time. It reviews previous solutions, proposes an alternative model, and presents a prototype programmed in Pd that aims to surmount the difficulties of prior solutions.
The problem of maximizing consonance in tonal music has been addressed before, with every solution reflecting the technological advances of its epoch, and current theories explaining this psychoacoustical phenomenon are generally satisfactory. Nevertheless, large parts of this area remain unexplored, since even the most recent solutions lack adequate mechanisms for applying such techniques in realtime scenarios. In general, the most advanced achievements in this field rely on the MIDI protocol to control the pitch of simultaneous notes, and so inherit the protocol’s limitations: dependence on the quality of the synthesizer for satisfactory results, and restricted scalability, accuracy, veracity, etc. Moreover, these techniques generally assume that timbres are known a priori, so applying them to unknown timbres requires digitization and analysis of sound samples, which makes them unsuitable for realtime situations. This thesis summarizes the main theories about consonance and its relation to musical scales, reviews several previous solutions as well as the state of the art, proposes an alternative model to adaptively adjust consonance in a polyphonic scenario based on the tonotopic dissonance paradigm (presented by Plomp and Levelt and later developed by Sethares), and presents a prototype of this model, programmed in Pure Data (Pd), a dataflow DSP environment for realtime audio applications, that aims to surmount the difficulties of prior solutions by performing realtime analysis and pitch adjustment. The results are analyzed to determine the efficacy and efficiency of the proposed solution.
This paper introduces a new way to express harmonic stretching in realtime, based on Pd, that overcomes the limitations imposed by the MIDI protocol in previous solutions. The technique is applicable to non-percussive sounds.
 Enrique López de Lara, Jerold A. DeHart, Julián Villegas, and Subhash Bhalla. RssKanji: Japanese Vocabulary Learning using RSS Feeds and Cellular Phones. In Proc. of JALT2006: 32nd Annual International Conference on Language Teaching and Learning & Educational Materials Exposition, Kitakyushu, Fukuoka, Japan, 2006.
The acquisition of vocabulary plays a central role in the learning of a new language. For learners of Japanese, the task involves memorizing thousands of new words and three main aspects of each word: the readings, the pictograms used, and the meaning of the word in the learner’s native language. We developed a prototype of a system to improve retention of new words. The system incorporates vocabulary learning techniques such as visual mnemonics, repeated exposure to words, short quizzes followed by verification of accuracy, and grouping of words by user-generated tags. Our prototype differs from existing vocabulary learning systems in that it uses RSS feeds to provide ubiquitous access to the words being learned. The system provides a Web interface to a native XML database where users can store new words. Using this database, the system generates word lists in RSS format at regular intervals from randomly selected words. A user can subscribe to his own feed or to the feeds of other users, allowing for collaborative vocabulary learning. If one of the words in the list is not yet memorized by the user, the user can click on it to take a small quiz about the correct readings, meaning, or pictograms for the word. The RSS feeds can be accessed from a PC inside the classroom or from a mobile phone outside the classroom.
An extended version of the paper published in Proc. HC-2005: Eighth International Conference on Humans and Computers, introducing other possibilities to achieve harmonic stretching using only the MIDI protocol.
A new technique to visually and aurally express melodic stretching in realtime based on the Helical Keyboard (a Java 3D application) and the MIDI protocol.
 Julián Villegas, Yuuta Kawano, and Michael Cohen. Harmonic Stretching with the Helical Keyboard. In Proc. HC-2005: Eighth Int. Conf. on Human and Computer, pages 261–266, Aizu Wakamatsu, Japan, September 2005.
A new technique to visually and aurally express harmonic stretching in realtime, based on the Helical Keyboard (a Java 3D application), the MIDI protocol, and Java Sound Synthesis capabilities.
 Felipe Millán Constaín, Juan Camilo Paz, Alfredo Roa, Julián Villegas, Nicolás Carranza, Diego Briceño, and Alex Mera. Medición de la Productividad del Valor Agregado (Added Value Productivity Measurement). Servicio Nacional de Aprendizaje (SENA), 2nd edition, 2003. In Spanish. [Online; accessed 31-Aug-2008]: http://cnp.org.co.
 Julián Villegas. Diseño e Implementación de un Algoritmo Genético para la Asignación de Aulas en la Universidad del Valle (Design and Implementation of a Genetic Algorithm for Timetabling at the University of Valle). Undergraduate honors thesis, University of Valle, Cali, 2001. In Spanish.
The aim of this thesis is to show how to apply the advantages of parallel processing and artificial intelligence techniques, specifically Genetic Algorithms, to the classroom assignment problem, initially at the University of Valle, and to compare the efficiency of the current system with the newly proposed scheme. The first chapter presents an introduction to the theory of parallel computing: it shows the different sources of parallelism in a program and the different software and hardware architectures available for achieving parallelism, and compares these options against the needs of this thesis. The second chapter gives a general introduction to genetic algorithms: it explains how they work and presents the most common genetic operators and the most widely accepted selection strategies; it also shows the relationship between parallel computing and genetic algorithms. Finally, a popular implementation of a parallel genetic algorithm is presented. The third chapter presents the general classroom assignment and timetabling problem and its main characteristics. The fourth chapter presents the evolution of the classroom assignment and timetabling problem and the state of the art. The fifth chapter analyzes the current (January–May 2000) classroom assignment process at the University of Valle: the inputs and outputs of the process are determined, the size of the problem is calculated, the requirements of the parties involved are gathered, and the constraints and priorities taken into account in the assignment process are identified.
The sixth chapter proposes a solution to the classroom assignment problem based on parallel genetic algorithms: it summarizes the PVM installation process, the configuration used, and some details that are not very clear in the documentation distributed with the PVM sources; it also shows the installation of SSH and how to use it together with PVM in a network where security is important. The seventh chapter analyzes the results obtained and discusses possible further developments, improvements, and refinements of the proposed genetic algorithm. The eighth chapter appends the C source code of the programs that make up the solution (the code of the master and slave genetic algorithms), as well as the SQL code of the queries made to the database to extract the information needed for the assignment, together with a paper presented at GECCO-2000 (Genetic and Evolutionary Computation Conference 2000). The ninth chapter presents the general conclusions of this degree project. The tenth chapter lists the bibliographic references used in the development of this project.
 Takashi Mikuriya, Masataka Shimizu, and Michael Cohen. A Collaborative Virtual Environment Featuring Multimodal Information Controlled by a Dynamic Map. In Proc. HC2000: Third Int. Conf. on Human and Computer, pages 77–80, Aizu-Wakamatsu, Japan, 2000.
 Julián Villegas. Original Music for “El Proyecto del Diablo”. Broadcast on TV by Rostros y Rastros (UVTV), 1999. Documentary directed by Óscar Campo (25 min.). www.imdb.com/title/tt0483127.
“All roads lead me to hell… But I am hell!” is the line that opens the monologue narrated throughout the documentary by La Larva, of whom we do not know whether he is alive or dead. In El Proyecto del Diablo we learn the story and the most revealing thoughts of this man: from the time he first tried marijuana at school, through his riotous experiences at university, to his moments of greatest ecstasy with drugs and his twelve years in prison. We thus discover the mindset of someone who let himself be trapped and carried away by evil.
 Julián Villegas and Claudia Villegas. Hipertexto Hidráulica Aplicada al Agua Potable y Saneamiento Básico (Hypertext on applied hydraulics and basic drinkable water). Servicio Nacional de Aprendizaje (SENA), Cali, Colombia, 1999. In Spanish.
A hypertext with Java applets created for the distance-learning program on drinkable water and basic hygiene.
This is one of the many stories that can take place in the Colombian countryside. One night, on a farm, three children listen to horror tales told by a woman, who warns them not to leave the house, because the devil may take them to a horrible death. At dawn one of the children has disappeared; his two brothers search the whole house without finding him anywhere. The eldest brother dares to go out in search of him and manages to return home with his brother; on this journey, however, their memories are confronted with a harsh truth.
 Julián Villegas. Original Music for “El Terminal”. Broadcast on TV by Rostros y Rastros (UVTV), 1998. Documentary directed by Margarita Arbeláez, Luz Elena Luna, Ximena Bedoya, Andrea Rosales, Juan Camilo Duque, Claudia Villegas, and María Fernanda Gutiérrez (26 min.).
This documentary portrays the routines, characters, and events of El Terminal de Transportes de Cali, the city’s bus terminal. Time in the documentary passes to the rhythm of the travelers’ waiting and anxiety: people come and go; some say goodbye and leave; others arrive without knowing the place. The day passes and night falls, marking the different rhythms of life at El Terminal.
A four-part series, supported by a grant from the Ministry of Culture of Colombia, about the myths and legends of the Colombian Pacific coast, which draws a thin line between fact and fiction. El Ojo de Buziraco is a series of four episodes that brings some Colombian myths (preserved through oral tradition) up to date in urban settings.
El ojo de Buziraco I: Nadie vio nada. In the first episode, titled Nadie vio nada, two middle-aged men reconstruct, through their testimony, the story they heard from their forebears about the myth of the Mohan, monster of the rivers; in parallel, a fictional story unfolds in which a criminal gang in the city is stalked and attacked by this character.
El ojo de Buziraco II: Tente en el aire. In the second episode, the story revolves around a group of university students of audiovisual media who investigate the role of “scary stories” in the upbringing of three elderly people, to whom these stories were told by their parents and grandparents. The girlfriend of one of the young men has recently died, and as the interview testimonies describe the myth of La Tunda, he begins to experience encounters with the deceased: he sees her in recurring nightmares, on the screens of the editing monitors, and through the camera; in the end, the ghostly presence appears and makes him one more victim of La Tunda.
El ojo de Buziraco III: La guerra de Mandrágora. La guerra de Mandrágora is the third episode of the series, in which the interviewees maintain that witches do exist. In the staged story, a woman goes to a witch seeking a remedy for the nightmares that torment her husband; later, alone and suspecting that her husband is cheating on her, the woman returns to the witch, who has decided that she must be her successor. Once the initiation rites are completed, the devil appears, punishing the witch, and the story has an unexpected ending.
El ojo de Buziraco IV: El Vampiríparo. The last episode of the series takes place in a classroom where the most “dim-witted” student, Medardo, begins to have a series of hallucinations. In moments of greatest pressure and frustration, he sees himself as a nosferatu, always in search of a mentor: a “real” vampire. This obsession leads Medardo into many situations in which his integrity is put at risk. Meanwhile, the interviewees, who provide the documentary component of the piece, maintain in their testimonies that vampires do not exist and recount stories that became urban myths and that possibly inspired the legend about the existence of vampirism in the city of Cali.
A futuristic tale in which machines exercise full control over humans. This repressive techno-society, however, hides a truth that is revealed to Mario from a kind of underworld inhabited by those who have decided to resist: behind the power of the machines lies the power of a few men.
No, no…baby is a story that combines elements of drama and suspense around the strange, violent, and perverse relationships woven within a group of friends who are passionately devoted to the consumption of flesh. Their excesses unleash their cannibalistic instincts just as romantic conflicts arise among them. The story’s outcome is taken to the extreme when we see the group end up devouring one another while watching Stanley Kubrick films.
July 3, 2013
Created by julovi